An Introduction to Open-Source Football Analytics

Federația Română de Fotbal




Joris Bekkers

Contents

  • Introduction

  • Football Data & Analytics
    • Basics
    • Visualizations
    • Analytics
    • etc.

  • Research & Innovation (Tracking Data)
    • Formation and Position Label Assignment
    • Pressing Intensity
    • Graph Neural Networks

Introduction

  • Joris Bekkers
    • 35 y/o
    • Breda, Netherlands

  • Football Analytics Consultant
    • Research
    • Development
    • Deployment

  • PySport Board Member
    • Unite Sports Analytics Community
    • Grow and Maintain Open-Source


Introduction to Open-Source Football Analytics

⏩ Open-Source Football Analytics Tools!

What is kloppy?

Load, Filter, Transform, Export Standardized Football Data.

Powered by PySport

Used by
  • Universities and Researchers
  • Football Data Companies
  • Football Clubs, Leagues and Federations (e.g. Premier League, Bundesliga, Italian Serie A, Eredivisie etc.)

Why kloppy?

Standardization!

WyScout
{"id": 2570396900, "matchId": 5683859, "matchPeriod": "1H", ... "type": {"primary": "pass", "secondary": ["forward_pass", 
"short_or_medium_pass"]}, "location": {"x": 49, "y": 49}, "team": {"id": 9109, "name": "Austria", "formation": "4-2-3-1"}...}
Opta / StatsPerform
<Event event_id="17" type_id="1" period_id="1" min="0" sec="35" player_id="89076" team_id="497" outcome="1" x="41.4" y="16.4">
    <Q id="3925254967" qualifier_id="56" value="Right" /><Q id="3925254973" qualifier_id="212" value="23.1" />
    <Q id="3925254969" qualifier_id="140" value="61.2" /><Q id="3925366067" qualifier_id="178" />
    <Q id="3925254971" qualifier_id="141" value="1.7" /><Q id="3925254975" qualifier_id="213" value="5.8" />
</Event>
StatsBomb
{"period" : 1, "minute" : 0, "second" : 3, "type" : {"id" : 30, "name" : "Pass"},
... "possession" : 2, "location" : [ 36.8, 27.3 ], "pass" : { "recipient" : { "id" : 6613}, "length" : 68.33, "angle" : 0.756}

Kloppy

<PassEvent event_id='2570396900' time='P1T00:01' player='Romano Schmid' result='COMPLETE'>
<CarryEvent event_id='2570396909' time='P1T00:02' player='Christoph Baumgartner' result='COMPLETE'>
<PassEvent event_id='2570396910' time='P1T00:04' player='Christoph Baumgartner' result='COMPLETE'>
<DuelEvent event_id='2570396803' time='P1T00:05' player='Denis Drăguş' result='WON'>
<DuelEvent event_id='2570396912' time='P1T00:05' player='Romano Schmid' result='LOST'>
<CarryEvent event_id='2570396804' time='P1T00:06' player='Nicolae Stanciu' result='COMPLETE'>
<PassEvent event_id='2570396805' time='P1T00:10' player='Nicolae Stanciu' result='INCOMPLETE'>
<InterceptionEvent event_id='2570396918' time='P1T00:10' player='Romano Schmid' result='LOST'>
<DuelEvent event_id='2570396919' time='P1T00:12' player='Romano Schmid' result='LOST'>
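To make the standardization idea concrete, here is a minimal, hypothetical sketch of mapping each provider's own event-type encoding onto one shared vocabulary, using the type fields visible in the raw snippets above (Wyscout type.primary, StatsBomb type.name, Opta type_id). The EventType, StandardEvent and from_* names are illustrative only and are not kloppy's API; kloppy's real deserializers cover far more event types, qualifiers and coordinates.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    # Hypothetical, tiny shared vocabulary (kloppy's own EventType is much richer)
    PASS = "PASS"
    SHOT = "SHOT"
    GENERIC = "GENERIC"


@dataclass
class StandardEvent:
    provider: str
    event_id: str
    event_type: EventType


# Provider-specific codes for the same concepts
WYSCOUT_TYPES = {"pass": EventType.PASS, "shot": EventType.SHOT}    # type.primary
STATSBOMB_TYPES = {"Pass": EventType.PASS, "Shot": EventType.SHOT}  # type.name
OPTA_TYPES = {1: EventType.PASS, 13: EventType.SHOT, 14: EventType.SHOT,
              15: EventType.SHOT, 16: EventType.SHOT}               # type_id


def from_wyscout(raw: dict) -> StandardEvent:
    event_type = WYSCOUT_TYPES.get(raw["type"]["primary"], EventType.GENERIC)
    return StandardEvent("wyscout", str(raw["id"]), event_type)


def from_statsbomb(raw: dict) -> StandardEvent:
    event_type = STATSBOMB_TYPES.get(raw["type"]["name"], EventType.GENERIC)
    return StandardEvent("statsbomb", str(raw["id"]), event_type)


def from_opta(attrs: dict) -> StandardEvent:
    # XML attributes arrive as strings, so cast type_id before the lookup
    event_type = OPTA_TYPES.get(int(attrs["type_id"]), EventType.GENERIC)
    return StandardEvent("opta", attrs["event_id"], event_type)


events = [
    from_wyscout({"id": 2570396900, "type": {"primary": "pass"}}),
    # The StatsBomb id is elided in the snippet above; "example-id" is a placeholder
    from_statsbomb({"id": "example-id", "type": {"id": 30, "name": "Pass"}}),
    from_opta({"event_id": "17", "type_id": "1"}),
]
print([e.event_type.name for e in events])  # ['PASS', 'PASS', 'PASS']
```

Three different raw formats end up as the same PASS type, which is what lets downstream code ignore the provider.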

Supported Providers

Provider              Event     Tracking   Public Data
Hawkeye (2D)                    ✅
Metrica               ✅        ✅          🔗
PFF                   🟠        ✅          🔗
SecondSpectrum        PR #437   ✅
Signality                       ✅
SkillCorner                     ✅          🔗
Sportec               ✅        ✅          🔗
StatsPerform / Opta   ✅        ✅
Tracab                          ✅
DataFactory           ✅
StatsBomb             ✅                    🔗
WyScout               ✅                    🔗

✅ Implemented 🟠 Not yet implemented

Kloppy - Event Data

What is Event Data?

  • Shots
  • Passes
  • Defensive Actions (Interceptions, Tackles, Clearances, Fouls etc.)
  • Dribbles
  • Substitutions
  • Cards
  • etc.


🔗 Event data Documentation

Load Event Data

from kloppy import wyscout

event_dataset = wyscout.load(
    event_data="data/austria-romania.json"
)
home_team, away_team = event_dataset.metadata.teams
print(f"{home_team.name} v {away_team.name}")

>>> Austria v Romania


Metadata

The metadata contains a lot of information about the match, for example the player_id of each player.

for p in away_team.players:
    print(f"{p.full_name} - {p.player_id}")

>>> Andrei Rațiu - 504303
>>> Horaţiu Moldovan - 513850
>>> Mihai Popescu - 363609
>>> Răzvan Gabriel Marin - 236820
>>> Daniel Bîrligea - 511261


🔗 Player Documentation

Metadata

Pitch Dimensions
event_dataset.metadata.pitch_dimensions

>>> NormalizedPitchDimensions(x_dim=Dimension(min=0, max=1), y_dim=Dimension(min=0, max=1),
standardized=True, unit=<Unit.NORMED: 'normed'>, ..., pitch_length=105, pitch_width=68)
Orientation (the team executing the action attacks from left to right)
event_dataset.metadata.orientation

>>> action-executing-team


🔗 Metadata Documentation
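The action-executing-team orientation can be pictured as follows: whenever the acting team is not already attacking left to right, its coordinates are rotated 180° around the centre spot. This is a hypothetical sketch in a centred metric system (origin at the centre spot, like the SecondSpectrum system used later); the Event class and the per-team direction flags are assumptions, and kloppy handles orientation itself, so this only illustrates the geometry.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    team: str
    x: float  # metres, origin at the centre spot, x towards the right
    y: float  # metres, origin at the centre spot


def to_action_executing_team(
    events: List[Event], attacks_left_to_right: Dict[str, bool]
) -> List[Event]:
    """Rotate every event 180 degrees if its team attacks right to left."""
    oriented = []
    for event in events:
        if attacks_left_to_right[event.team]:
            oriented.append(event)
        else:
            # A 180-degree rotation around the centre spot flips both axes
            oriented.append(Event(event.team, -event.x, -event.y))
    return oriented


# Hypothetical first half: Austria attacks left to right, Romania right to left
first_half = {"Austria": True, "Romania": False}
events = [Event("Austria", 30.0, 10.0), Event("Romania", -30.0, -10.0)]
print(to_action_executing_team(events, first_half))
# [Event(team='Austria', x=30.0, y=10.0), Event(team='Romania', x=30.0, y=10.0)]
```

After the rotation both events sit in the same attacking half, which is what makes actions from both teams directly comparable.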

Filter

player = away_team.get_player_by_id("236820")

player_events = event_dataset.filter(
    lambda event: (
        event.player == player if event.player is not None else False
    )
)

>>> <EventDataset record_count=32>


🔗 Filter Documentation

Transform

transformed_event_dataset = (
    event_dataset
    .transform(to_coordinate_system="secondspectrum")
)
transformed_event_dataset.metadata.pitch_dimensions

>>> MetricPitchDimensions(x_dim=Dimension(min=-52.5, max=52.5),
 y_dim=Dimension(min=-34.0, max=34.0), 
 standardized=False, unit=<Unit.METERS: 'm'>, 
 ... pitch_length=105.0, pitch_width=68.0)


🔗 Transformation Documentation

Transform - Coordinate Systems


🔗 Coordinate System Documentation
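As an illustration of what a coordinate-system transform does, the sketch below maps normalized coordinates (0 to 1 on both axes) onto a 105 x 68 m pitch with the origin at the centre spot, matching the Dimension(min=-52.5, max=52.5) and Dimension(min=-34.0, max=34.0) ranges printed above. It assumes the normalized system has its origin in the top-left corner with y increasing downwards, and that the target system has y increasing upwards. This is a simplified linear version, not kloppy's implementation, which also has to account for each provider's own pitch conventions.

```python
from typing import Tuple


def normalized_to_centred_metres(
    x: float, y: float, pitch_length: float = 105.0, pitch_width: float = 68.0
) -> Tuple[float, float]:
    """Linear sketch: normalized (origin top-left, y down) -> metres (origin centre, y up)."""
    x_m = (x - 0.5) * pitch_length  # 0..1 becomes -52.5..52.5
    y_m = (0.5 - y) * pitch_width   # flip the vertical axis: 0..1 becomes 34..-34
    return x_m, y_m


print(normalized_to_centred_metres(0.5, 0.5))  # (0.0, 0.0), the centre spot
print(normalized_to_centred_metres(1.0, 0.0))  # (52.5, 34.0)
```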

Export

df = (
  transformed_event_dataset
  .to_df(
    "event_type", "team_id", "player_id", "coordinates_*",
    match_id="match_5678",
    is_pass=lambda e: e.event_type.name == "PASS",
    engine="polars"  # or "pandas"
  )
)
event_type  team_id  player_id  coordinates_x  coordinates_y  match_id    is_pass
PASS        614      205651     -0.101667      -0.15255       match_5678  true
PASS        614      116643     -14.217961     -1.2266        match_5678  true
PASS        614      440077     -12.979126     11.191847      match_5678  true


🔗 Exporting to Pandas / Polars Documentation

mplsoccer


📽️ Andy Rowlinson at PySport X StatsPerform


🔗 mplsoccer Examples

mplsoccer!

from mplsoccer.pitch import Pitch
pitch = Pitch(pitch_color='grass', line_color='white', stripe=True)
fig, ax = pitch.draw()

🔗 mplsoccer Documentation

from kloppy import wyscout

event_dataset = wyscout.load(
    event_data="data/extract_Austria_Romania_events.json"
)
home_team, away_team = event_dataset.metadata.teams

player = away_team.get_player_by_id("236820")

player_events = (
    event_dataset
    .filter(lambda event: (
        event.player == player if event.player is not None else False
    ))
    .transform(to_coordinate_system="secondspectrum")
)
from mplsoccer import Pitch

def heatmap(xs, ys):
    pitch = Pitch(
        line_zorder=2,
        pitch_type="secondspectrum",
        pitch_length=player_events.metadata.pitch_dimensions.pitch_length,
        pitch_width=player_events.metadata.pitch_dimensions.pitch_width,
    )
    fig, ax = pitch.draw()
    
    ax.set_title(f"{player.last_name} - {player.team.name}")
    pitch.kdeplot(xs, ys, ax=ax, cmap="YlOrRd", fill=True, levels=100)

xs = [event.coordinates.x for event in player_events if event.coordinates is not None]
ys = [event.coordinates.y for event in player_events if event.coordinates is not None]

heatmap(xs, ys)


🔗 mplsoccer Gallery
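The KDE heatmap smooths the player's event locations into a density. A cruder but easily inspected alternative is to count events per pitch zone. The sketch below assumes the same centred SecondSpectrum-style coordinates (origin at the centre spot, 105 x 68 m) and an arbitrary 6 x 3 grid; zone_counts is a hypothetical helper, not part of kloppy or mplsoccer (mplsoccer has its own binning utilities, e.g. Pitch.bin_statistic).

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def zone_counts(
    xs: Sequence[float],
    ys: Sequence[float],
    pitch_length: float = 105.0,
    pitch_width: float = 68.0,
    n_x: int = 6,
    n_y: int = 3,
) -> Dict[Tuple[int, int], int]:
    """Count events per (column, row) zone on a pitch centred at (0, 0)."""
    counts: Counter = Counter()
    for x, y in zip(xs, ys):
        # Shift to 0..length / 0..width, scale to zone indices, clamp to the grid
        col = int((x + pitch_length / 2) / pitch_length * n_x)
        row = int((y + pitch_width / 2) / pitch_width * n_y)
        col = min(max(col, 0), n_x - 1)
        row = min(max(row, 0), n_y - 1)
        counts[(col, row)] += 1
    return dict(counts)


print(zone_counts([-50.0, 0.0, 10.0, 52.5], [0.0, 0.0, 30.0, -34.0]))
# {(0, 1): 1, (3, 1): 1, (3, 2): 1, (5, 0): 1}
```

Clamping keeps points on the touchline or goal line (such as x = 52.5) inside the last zone instead of creating an extra one.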

Shotmap

from kloppy.domain import EventType
from mplsoccer import Pitch

# Keep only shots; STATIC_AWAY_HOME makes the away team attack left to right in both periods
team_shots = (
    event_dataset
    .filter(
        lambda event: (
            event.event_type == EventType.SHOT
        )
    ).transform(
        to_orientation="STATIC_AWAY_HOME",
        to_coordinate_system="secondspectrum"
    )
)
home_team, away_team = event_dataset.metadata.teams
pitch = Pitch(
    line_zorder=2,
    pitch_type="secondspectrum",
    half=False,
    pitch_length=team_shots.metadata.pitch_dimensions.pitch_length,
    pitch_width=team_shots.metadata.pitch_dimensions.pitch_width,
)
fig, ax = pitch.draw()

ax.set_title(f"{home_team.name} & {away_team.name} Shots")

for event in team_shots:
    # Skip shots without a start or end location
    if event.coordinates is not None and event.result_coordinates is not None:
        color = "#DB2129" if event.team.name == "Austria" else "#CE9D00"
        x = event.coordinates.x
        y = event.coordinates.y
        x_end = event.result_coordinates.x
        y_end = event.result_coordinates.y

        # Shot origin as a dot, shot trajectory as an arrow
        pitch.scatter(x, y, ax=ax, s=100, color=color, edgecolors="black")

        pitch.arrows(x, y, x_end, y_end, ax=ax, width=2, headwidth=5,
                     headlength=5, color=color, alpha=0.6)

Advanced Event Data Analytics

Socceraction

from kloppy import wyscout
from socceraction.spadl.kloppy import convert_to_actions
import socceraction.atomic.spadl as atomicspadl

event_dataset = wyscout.load(
    event_data="data/extract_Austria_Romania_events.json"
)

# Convert kloppy events to SPADL actions, then to Atomic-SPADL
spadl_actions = convert_to_actions(event_dataset)
atomic = atomicspadl.convert_to_atomic(spadl_actions)


🔗 Socceraction / Atomic VAEP (KU Leuven)
🔗 Example Notebooks (KU Leuven)


🔗 Decroos et al. 2019 [PDF]
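VAEP (Decroos et al. 2019) values an action by how much it changes the probability that the acting team scores and concedes within the next few actions: V(a_i) = ΔP_scores(a_i) - ΔP_concedes(a_i). The sketch below applies that formula to probabilities that would normally come from trained models; here they are made-up numbers. When possession switches, the previous action's probabilities are swapped, since one team's chance of scoring is the other team's chance of conceding. This is a simplified illustration (vaep_values is hypothetical), not socceraction's implementation, which handles further edge cases.

```python
from typing import List, Sequence


def vaep_values(
    teams: Sequence[str], p_scores: Sequence[float], p_concedes: Sequence[float]
) -> List[float]:
    """Simplified VAEP: value = gain in P(scores) - gain in P(concedes)."""
    values = []
    for i in range(len(teams)):
        if i == 0:
            # No previous state: the first action gets zero value
            prev_scores, prev_concedes = p_scores[0], p_concedes[0]
        elif teams[i] == teams[i - 1]:
            prev_scores, prev_concedes = p_scores[i - 1], p_concedes[i - 1]
        else:
            # Possession changed: view the previous state from the new team's side
            prev_scores, prev_concedes = p_concedes[i - 1], p_scores[i - 1]
        offensive = p_scores[i] - prev_scores
        defensive = -(p_concedes[i] - prev_concedes)
        values.append(offensive + defensive)
    return values


# Made-up probabilities for three consecutive actions
teams = ["Austria", "Austria", "Romania"]
p_scores = [0.02, 0.05, 0.01]
p_concedes = [0.01, 0.01, 0.03]
print([round(v, 3) for v in vaep_values(teams, p_scores, p_concedes)])  # [0.0, 0.03, 0.02]
```

Romania's action earns 0.02 because, from Romania's side, the chance of conceding dropped from Austria's previous 0.05 scoring probability to 0.03.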

Advanced Football Analytics

Positional Tracking Data


The unravelsports package aims to aid researchers, analysts and enthusiasts by providing intermediary steps in the complex process of converting raw sports data into meaningful information and actionable insights.

1. Load Kloppy & Convert

from kloppy import sportec

from unravel.soccer import KloppyPolarsDataset

kloppy_dataset = sportec.load_open_tracking_data()

kloppy_polars_dataset = KloppyPolarsDataset(
    kloppy_dataset=kloppy_dataset
)

KloppyPolarsDataset

   period_id  timestamp        frame_id  ball_state  id              x       y      z  team_id         position_name  game_id         vx      vy      vz  v      ax  ay  az  a  ball_owning_team_id  is_ball_carrier
0  1          0 days 00:00:00  10000     alive       DFL-OBJ-00008F  -20.67  -4.56  0  DFL-CLU-000005  RCB            DFL-MAT-J03WPY  0.393   -0.214  0   0.447  0   0   0   0  DFL-CLU-00000P       False
1  1          0 days 00:00:00  10000     alive       DFL-OBJ-0000EJ  -8.86   -0.94  0  DFL-CLU-000005  UNK            DFL-MAT-J03WPY  -0.009  0.018   0   0.02   0   0   0   0  DFL-CLU-00000P       False
2  1          0 days 00:00:00  10000     alive       DFL-OBJ-0000F8  -2.12   9.85   0  DFL-CLU-00000P  RM             DFL-MAT-J03WPY  0       0       0   0      0   0   0   0  DFL-CLU-00000P       False
3  1          0 days 00:00:00  10000     alive       DFL-OBJ-0000NZ  0.57    23.23  0  DFL-CLU-00000P  RB             DFL-MAT-J03WPY  0.179   -0.134  0   0.223  0   0   0   0  DFL-CLU-00000P       False
4  1          0 days 00:00:00  10000     alive       DFL-OBJ-0001HW  -46.26  0.08   0  DFL-CLU-000005  GK             DFL-MAT-J03WPY  0.357   0.071   0   0.364  0   0   0   0  DFL-CLU-00000P       False
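The vx, vy and v columns are velocities derived from successive positions. A minimal sketch with backward finite differences, assuming a 25 Hz frame rate and no smoothing; velocities is a hypothetical helper, and unravelsports may compute these columns differently (e.g. with smoothing or another difference scheme). The speed v is the magnitude of (vx, vy), which is consistent with the first row above: √(0.393² + 0.214²) ≈ 0.447.

```python
import math
from typing import List, Sequence, Tuple


def velocities(
    xs: Sequence[float], ys: Sequence[float], fps: float = 25.0
) -> List[Tuple[float, float, float]]:
    """Backward finite differences; returns (vx, vy, speed) per frame in m/s."""
    dt = 1.0 / fps
    out = [(0.0, 0.0, 0.0)]  # no previous frame, so the first frame gets zero velocity
    for i in range(1, len(xs)):
        vx = (xs[i] - xs[i - 1]) / dt
        vy = (ys[i] - ys[i - 1]) / dt
        out.append((vx, vy, math.hypot(vx, vy)))
    return out


# A player moving 0.1 m per frame along x, i.e. about 2.5 m/s at 25 Hz
print(velocities([0.0, 0.1, 0.2], [0.0, 0.0, 0.0]))
```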

Advanced Football Analytics

Elastic Formation and Position Identification (EFPI)

from unravel.soccer import EFPI

model = EFPI(dataset=kloppy_polars_dataset)
model.fit(
    # Default: all 65 formations, or specify a subset (e.g. ["442", "433"])
    formations=None,
    # Fixed time intervals (e.g. "1m", "1m14s", "2m30s"), or "possession", "period" or "frame"
    every="5m",
    substitutions="drop",
    change_threshold=0.1,
    change_after_possession=True,
)

🌀 EFPI Example
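A core step in identifying positions is matching each player to one slot of a formation template so that the total distance is minimal. The brute-force sketch below shows that idea for a handful of players; the template coordinates are made up and assign_positions is hypothetical. It is not the EFPI implementation: a practical version would solve the assignment with the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment) and would typically normalize player positions first.

```python
import math
from itertools import permutations
from typing import Dict, Tuple

Point = Tuple[float, float]


def assign_positions(players: Dict[str, Point], template: Dict[str, Point]) -> Dict[str, str]:
    """Return the player -> slot mapping with the smallest summed distance (brute force)."""
    names = list(players)
    best_cost, best = math.inf, {}
    # Try every way of giving each player a distinct slot (only feasible for small n)
    for slots in permutations(template, len(names)):
        cost = sum(math.dist(players[n], template[s]) for n, s in zip(names, slots))
        if cost < best_cost:
            best_cost, best = cost, dict(zip(names, slots))
    return best


# Made-up template slots and player positions (metres, team attacking left to right)
template = {"LCB": (-20.0, 12.0), "RCB": (-20.0, -12.0), "CM": (0.0, 0.0)}
players = {"A": (-18.0, -9.0), "B": (2.0, 3.0), "C": (-22.0, 10.0)}
print(assign_positions(players, template))  # {'A': 'RCB', 'B': 'CM', 'C': 'LCB'}
```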

Advanced Football Analytics

Pressing Intensity

Develop an intuitive measure for pressing that can be used by coaches, assistants and (data) analysts to identify and analyze pressing situations (e.g. on video, using event data), compute derived metrics (e.g. passes under pressure) and analyze specific in-game situations (e.g. players that don’t follow through on their pressing).

Pressing Intensity

Requirements

  • Use positional tracking data
  • Multipurpose / Extendable
  • No ML / prediction model
  • Not too complex
  • Not too simple
  • Measurable at every moment
  • Summable
  • Measure pressing in the whole "system"
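"What is the chance that a defender (🏃‍♂️) reaches an attacker (🏃‍♀️) or the ball (⚽️) within a certain amount of time, based on their current direction and speed?"
Total Pressure = 1 - (1 - 86%) × (1 - 19%) = 1 - 0.14 × 0.81 ≈ 89%
(the chance that at least one of two defenders, with individual probabilities of 86% and 19%, gets there)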


UnravelSports Blog
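The Total Pressure example combines individual probabilities as if they were independent: the chance that at least one defender gets there is one minus the chance that all of them fail. The sketch below implements that rule, together with a logistic mapping from an (assumed, precomputed) time-to-intercept to a probability, of the kind used in pitch-control models. The 1.5 s threshold and σ = 0.45 are illustrative assumptions, not necessarily unravelsports' defaults, and both function names are hypothetical.

```python
import math
from typing import Sequence


def probability_to_intercept(
    time_to_intercept: float, time_threshold: float = 1.5, sigma: float = 0.45
) -> float:
    """Logistic chance of reaching the target within time_threshold seconds.

    time_threshold and sigma are illustrative assumptions.
    """
    k = math.pi / (math.sqrt(3.0) * sigma)
    return 1.0 / (1.0 + math.exp(-k * (time_threshold - time_to_intercept)))


def total_pressure(probabilities: Sequence[float]) -> float:
    """1 - prod(1 - p_i): chance that at least one defender gets there."""
    all_fail = 1.0
    for p in probabilities:
        all_fail *= 1.0 - p
    return 1.0 - all_fail


print(round(total_pressure([0.86, 0.19]), 4))  # 0.8866, i.e. about 89%
print(probability_to_intercept(1.5))  # 0.5: arriving exactly at the threshold
```

Because each extra defender multiplies in another (1 - p) factor, Total Pressure can only grow as defenders are added, which keeps the measure summable across the whole "system".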

2. Pressing Intensity!

from unravel.soccer import PressingIntensity

import polars as pl

model = PressingIntensity(
    dataset=kloppy_polars_dataset
)
model.fit(
    start_time=pl.duration(minutes=1, seconds=53),
    end_time=pl.duration(minutes=2, seconds=32),
    period_id=1,
    method="teams",
    ball_method="max",
    orient="home_away",
    speed_threshold=2.0,
) 

>>> PressingIntensity(n_frames=143)

Probability to Intercept


[[0.07 0.   0.   0.   0.   0.03 0.01 0.   0.   0.   0.23]
 [0.28 0.   0.   0.   0.   0.02 0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.09 0.   0.   0.   0.   0.1  0.06 0.   0.   0.   0.05]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.09 0.6  0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.83 0.08 0.   0.   0.   0.  ]
 [0.02 0.   0.   0.   0.   0.02 0.   0.   0.   0.01 0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.71]
 [0.3  0.   0.   0.   0.   0.06 0.   0.   0.   0.   0.03]]

3. Video

🌀 Pressing Intensity Video Example

Advanced Football Analytics

Graph Neural Networks

Graph Neural Network



Attack Outcome Predictions


Di Maria 36' (2-0)
from unravel.soccer import SoccerGraphConverter
from unravel.utils import GraphDataset

converter = SoccerGraphConverter(dataset=kloppy_polars_dataset)

dataset = GraphDataset(graphs=converter.to_spektral_graphs())
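Conceptually, the converter turns each tracking frame into a graph: players and the ball become nodes with feature vectors, and edges connect nodes that may influence each other. The sketch below builds such a graph from raw positions using a simple distance threshold; frame_to_graph, its features (x, y, attacking flag) and the threshold rule are hypothetical and are not SoccerGraphConverter's actual feature set or adjacency options.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_to_graph(
    positions: Sequence[Tuple[float, float]],
    is_attacking: Sequence[bool],
    max_distance: Optional[float] = None,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return (adjacency matrix, node features) for one frame."""
    n = len(positions)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Connect distinct nodes; optionally only those within max_distance metres
            if i != j and (
                max_distance is None
                or math.dist(positions[i], positions[j]) <= max_distance
            ):
                adjacency[i][j] = 1
    features = [
        [x, y, 1.0 if attacking else 0.0]
        for (x, y), attacking in zip(positions, is_attacking)
    ]
    return adjacency, features


# Three made-up nodes: two attackers and one defender
adjacency, features = frame_to_graph(
    [(0.0, 0.0), (5.0, 0.0), (30.0, 0.0)], [True, True, False], max_distance=10.0
)
print(adjacency)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

A graph neural network then learns from the node features and the adjacency together, so the same model can handle frames where players are in very different places.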

PyData London 2025

Thanks! Questions? :)
