Advanced Soccer Analytics, Open-Sourced

You don't have to reinvent the wheel!




Joris Bekkers

Contents

  • Introduction
    • Career Path

  • Open-Source
    • PySport
    • Kloppy
    • unravelsports
    • mplsoccer
    • etc.

  • Research & Innovation
    • Pressing Intensity
    • Graph Neural Networks

Introduction

  • Joris Bekkers
    • 35 y/o
    • Breda, Netherlands

  • Football Analytics Consultant
    • Research
    • Development
    • Deployment

  • PySport Board Member
    • Unite Sports Analytics Community
    • Grow and Maintain Open-Source

Career Path

  • Online Poker Player ('07 - '12)
  • Civil Engineering [Undergrad / BSc] ('09 - '14)
  • ???
  • Operations Mgmt. & Logistics [Graduate / MSc] ('14 - '17)
  • MIT Sloan Sports Analytics Conference (2017)
  • Bekkers & Dabadghao (2017)
  • AFC Bournemouth (2017)
  • U.S. Soccer Federation ('18 - now)
  • Football Analytics Consultant ('19 - forever?)
  • MIT Sloan Sports Analytics Conference (2023)
  • Bekkers & Sahasrabudhe (2023)
  • PySport ('24 - now)

Open-Source Soccer Analytics!

Why?

  • Learn out in the open
  • Learn from your mistakes
  • Collaborate and connect with industry pros
  • Share knowledge
  • Don't reinvent the wheel!

Open Source Resources

What is kloppy?

Load, Filter, Transform, Export Standardized Soccer Data.


Supported Providers

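| Provider | Event | Tracking | Public Data |
|---|---|---|---|
| Hawkeye (2D) | | ✅ | |
| Metrica | ✅ | ✅ | 🔗 |
| PFF | 🟠 | ✅ | 🔗 |
| SecondSpectrum | PR #437 | ✅ | |
| Signality | | ✅ | |
| SkillCorner | | ✅ | 🔗 |
| Sportec | ✅ | ✅ | 🔗 |
| StatsPerform / Opta | ✅ | ✅ | |
| Tracab | | ✅ | |
| DataFactory | ✅ | | |
| StatsBomb | ✅ | | 🔗 |
| WyScout | ✅ | | 🔗 |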

✅ Implemented 🟠 Not yet implemented

How does it work?

Kloppy basics using Open DFL Dataset

| match_id | home | away |
|---|---|---|
| J03WMX | 1. FC Köln | FC Bayern München |
| J03WN1 | VfL Bochum 1848 | Bayer 04 Leverkusen |
| J03WPY | Fortuna Düsseldorf | 1. FC Nürnberg |
| J03WOH | Fortuna Düsseldorf | SSV Jahn Regensburg |
| J03WQQ | Fortuna Düsseldorf | FC St. Pauli |
| J03WOY | Fortuna Düsseldorf | F.C. Hansa Rostock |
| J03WR9 | Fortuna Düsseldorf | 1. FC Kaiserslautern |

Open DFL Dataset (Bassek et al. 2025)

Documentation (under construction)

Load Event Data

from kloppy import sportec

dataset = sportec.load_open_event_data(
    match_id="J03WN1", 
    coordinates="secondspectrum"
)

home_team, away_team = dataset.metadata.teams
print(f"{home_team.name} v {away_team.name}")

>>> VfL Bochum 1848 v Bayer 04 Leverkusen

Metadata

Within the metadata we can find a lot of information, for example the player_id of each player.

home_team, away_team = dataset.metadata.teams
for p in away_team.players:
    print(f"#{p.jersey_no}.", p, " - ", p.player_id)

>>> #30. Jeremie Frimpong  -  DFL-OBJ-002GOI
>>> #27. Florian Wirtz  -  DFL-OBJ-002GBK
>>> #23. A. Hložek  -  DFL-OBJ-J01G0J

Florian Wirtz' Events

player_id = "DFL-OBJ-002GBK"

player_events = dataset.filter(
    lambda event: (
        event.player.player_id == player_id if event.player is not None else False
    )
)

>>> <EventDataset record_count=72>

mplsoccer!

from mplsoccer.pitch import Pitch
pitch = Pitch(pitch_color='grass', line_color='white', stripe=True)
fig, ax = pitch.draw()

Documentation

from mplsoccer import Pitch

# Look up the Player object used in the plot title
# (Wirtz plays for the away team in this match)
player = away_team.get_player_by_id(player_id)


def heatmap(xs, ys):
    pitch = Pitch(
        line_zorder=2,
        pitch_type="secondspectrum",
        pitch_length=dataset.metadata.pitch_dimensions.pitch_length,
        pitch_width=dataset.metadata.pitch_dimensions.pitch_width,
    )
    fig, ax = pitch.draw()
    ax.set_title(f"#{player.jersey_no} - {player.last_name} - {player.team.name}")
    pitch.kdeplot(xs, ys, ax=ax, cmap="YlOrRd", fill=True, levels=100)


# Orient all events so the acting team always attacks in the same direction
transformed_events = player_events.transform(
    to_orientation="ACTION_EXECUTING_TEAM"
)

xs = [event.coordinates.x for event in transformed_events if event.coordinates is not None]
ys = [event.coordinates.y for event in transformed_events if event.coordinates is not None]

heatmap(xs, ys)

Advanced Soccer Analytics

Pressing Intensity

Develop an intuitive measure for pressing that can be used by coaches, assistants and (data) analysts to identify and analyze pressing situations (e.g. on video, using event data), compute derived metrics (e.g. passes under pressure) and analyze specific in-game situations (e.g. players that don’t follow through on their pressing).

Pressing Intensity

Requirements

  • Use positional tracking data
  • Multipurpose / Extendable
  • No ML / prediction model
  • Not too complex
  • Not too simple
  • Measurable at every moment
  • Summable
  • Measure pressing in the whole "system"

Total Pressure = 1 - [(1 - 86%) * (1 - 19%)] = 89%
"What is the chance 🏃‍♂️ defense will reach 🏃‍♀️ attack or ⚽️ within some amount of time, given their current direction and speed?"
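The arithmetic above can be read as the chance that at least one presser arrives in time: one minus the product of each presser's chance of not arriving. A minimal sketch of that calculation, assuming the individual probabilities are treated as independent; `total_pressure` is a hypothetical helper name and not part of unravelsports:

```python
import math


def total_pressure(probabilities):
    """Combine individual intercept probabilities into a single value.

    Assumes independence: the result is the chance that at least one
    of the pressing players arrives in time.
    """
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# The two values from the slide: 86% and 19%.
print(round(total_pressure([0.86, 0.19]), 2))  # 0.89
```

Because the product only shrinks as more pressers are added, the combined value never drops below the largest individual probability.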



UnravelSports Blog

More Open-Source!



pip install unravelsports

Make sure you have Python 3.11 and unravelsports 0.3.0+


The unravelsports package aims to aid researchers, analysts and enthusiasts by providing intermediary steps in the complex process of converting raw sports data into meaningful information and actionable insights.

1. Load Kloppy & Convert

from kloppy import sportec

from unravel.soccer import KloppyPolarsDataset

kloppy_dataset = sportec.load_open_tracking_data()

kloppy_polars_dataset = KloppyPolarsDataset(
    kloppy_dataset=kloppy_dataset
)

KloppyPolarsDataset

| | period_id | timestamp | frame_id | ball_state | id | x | y | z | team_id | position_name | game_id | vx | vy | vz | v | ax | ay | az | a | ball_owning_team_id | is_ball_carrier |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 days 00:00:00 | 10000 | alive | DFL-OBJ-00008F | -20.67 | -4.56 | 0 | DFL-CLU-000005 | RCB | DFL-MAT-J03WPY | 0.393 | -0.214 | 0 | 0.447 | 0 | 0 | 0 | 0 | DFL-CLU-00000P | False |
| 1 | 1 | 0 days 00:00:00 | 10000 | alive | DFL-OBJ-0000EJ | -8.86 | -0.94 | 0 | DFL-CLU-000005 | UNK | DFL-MAT-J03WPY | -0.009 | 0.018 | 0 | 0.02 | 0 | 0 | 0 | 0 | DFL-CLU-00000P | False |
| 2 | 1 | 0 days 00:00:00 | 10000 | alive | DFL-OBJ-0000F8 | -2.12 | 9.85 | 0 | DFL-CLU-00000P | RM | DFL-MAT-J03WPY | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | DFL-CLU-00000P | False |
| 3 | 1 | 0 days 00:00:00 | 10000 | alive | DFL-OBJ-0000NZ | 0.57 | 23.23 | 0 | DFL-CLU-00000P | RB | DFL-MAT-J03WPY | 0.179 | -0.134 | 0 | 0.223 | 0 | 0 | 0 | 0 | DFL-CLU-00000P | False |
| 4 | 1 | 0 days 00:00:00 | 10000 | alive | DFL-OBJ-0001HW | -46.26 | 0.08 | 0 | DFL-CLU-000005 | GK | DFL-MAT-J03WPY | 0.357 | 0.071 | 0 | 0.364 | 0 | 0 | 0 | 0 | DFL-CLU-00000P | False |

2. Pressing Intensity!

from unravel.soccer import PressingIntensity

import polars as pl

model = PressingIntensity(
    dataset=kloppy_polars_dataset
)
model.fit(
    start_time=pl.duration(minutes=1, seconds=53),
    end_time=pl.duration(minutes=2, seconds=32),
    period_id=1,
    method="teams",
    ball_method="max",
    orient="home_away",
    speed_threshold=2.0,
) 

>>> PressingIntensity(n_frames=143)

Model Output in Polars


model.output
| | game_id | period_id | frame_id | timestamp | time_to_intercept | probability_to_intercept | columns | rows |
|---|---|---|---|---|---|---|---|---|
| 0 | DFL-MAT-J03WN1 | 1 | 12825 | 0 days 00:01:53 | 11x11 array | 11x11 array | ['DFL-OBJ-0001UP' 'DFL-OBJ-0002LI'... 'DFL-OBJ-0026NY' 'DFL-OBJ-002G02'] | ['DFL-OBJ-0000XU' 'DFL-OBJ-0000YV'... 'DFL-OBJ-000172' 'DFL-OBJ-000199'] |

🐻‍❄️ Polars Documentation

Probability to Intercept


import numpy as np

# Stack the per-row arrays into one 2D array so it can be rounded
>>> np.array([x for x in model.output.to_pandas().iloc[0]["probability_to_intercept"]]).round(2)

[[0.07 0.   0.   0.   0.   0.03 0.01 0.   0.   0.   0.23]
 [0.28 0.   0.   0.   0.   0.02 0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.09 0.   0.   0.   0.   0.1  0.06 0.   0.   0.   0.05]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.09 0.6  0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.83 0.08 0.   0.   0.   0.  ]
 [0.02 0.   0.   0.   0.   0.02 0.   0.   0.   0.01 0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.71]
 [0.3  0.   0.   0.   0.   0.06 0.   0.   0.   0.   0.03]]
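One way to read such a matrix is to look up which pair of players carries the highest probability, using the `rows` and `columns` id lists from the model output. A minimal sketch in plain Python; the 3x3 values and player ids are made up for illustration, `strongest_pair` is a hypothetical helper, and which team ends up on the rows depends on how the model was fitted, so that is left open here:

```python
def strongest_pair(matrix, row_ids, column_ids):
    """Return (row_id, column_id, value) for the largest entry in matrix."""
    best = None
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if best is None or value > best[2]:
                best = (row_ids[i], column_ids[j], value)
    return best


# Illustrative values only, not taken from the match above.
probabilities = [
    [0.07, 0.00, 0.23],
    [0.00, 0.83, 0.08],
    [0.30, 0.06, 0.03],
]
row_ids = ["row-player-A", "row-player-B", "row-player-C"]
column_ids = ["col-player-X", "col-player-Y", "col-player-Z"]

print(strongest_pair(probabilities, row_ids, column_ids))
# ('row-player-B', 'col-player-Y', 0.83)
```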

3. Video

🌀 Full Example

Advanced Soccer Analytics

Graph Neural Networks

Graph Neural Network


Counter Attack Predictions


GNN Architecture


PFF Open World Cup Tracking + Event Data

| match_id | home_team | away_team | date |
|---|---|---|---|
| 10517 | Argentina | France | 2022-12-18T15:00:00 |
| 10516 | Croatia | Morocco | 2022-12-17T15:00:00 |
| 10515 | France | Morocco | 2022-12-14T19:00:00 |
| 10514 | Argentina | Croatia | 2022-12-13T19:00:00 |
| 10513 | England | France | 2022-12-10T19:00:00 |
| ... | ... | ... | ... |
| 3815 | United States | Wales | 2022-11-21T19:00:00 |
| 3812 | Senegal | Netherlands | 2022-11-21T16:00:00 |
| 3813 | England | Iran | 2022-11-21T13:00:00 |
| 3814 | Qatar | Ecuador | 2022-11-20T16:00:00 |

0. Install "beta" version of kloppy and unravelsports





pip install git+https://github.com/PySport/kloppy.git

pip install git+https://github.com/UnravelSports/unravelsports.git





1. Load Kloppy ...


from kloppy import pff

match_id = "10510"

dataset = pff.load_tracking(
    raw_data=f"{match_id}.jsonl.bz2",
    meta_data=f"{match_id}.json",
    roster_meta_data=f"{match_id}.json",
)
home_team, away_team = dataset.metadata.teams
print(f"{home_team.name} v {away_team.name}")

>>> Croatia v Brazil

1. ... and convert to Polars DataFrame


from unravel.soccer import KloppyPolarsDataset

kloppy_polars_dataset = KloppyPolarsDataset(
    kloppy_dataset=dataset
)

2. Add Labels


kloppy_polars_dataset.data = (
    kloppy_polars_dataset.data
    .join(
        some_label_dataframe.select(
            ["game_id", "period_id", "frame_id", "label"]
        ),
        on=["game_id", "period_id", "frame_id"],
        how="left"
    )
)
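The left join above keeps every tracking row and attaches a label wherever the three key columns match; rows without a match end up with a null label. A plain-Python sketch of that behavior, using made-up rows and a hypothetical `left_join_labels` helper (Polars does this natively through `join(..., how="left")`):

```python
KEYS = ("game_id", "period_id", "frame_id")


def left_join_labels(rows, label_rows):
    """Attach a 'label' to each row by key; unmatched rows get None."""
    lookup = {tuple(r[k] for k in KEYS): r["label"] for r in label_rows}
    return [
        {**row, "label": lookup.get(tuple(row[k] for k in KEYS))}
        for row in rows
    ]


# Made-up frames and labels for illustration.
frames = [
    {"game_id": "10510", "period_id": 1, "frame_id": 100, "x": 1.5},
    {"game_id": "10510", "period_id": 1, "frame_id": 101, "x": 1.7},
]
labels = [{"game_id": "10510", "period_id": 1, "frame_id": 100, "label": 1}]

joined = left_join_labels(frames, labels)
print([row["label"] for row in joined])  # [1, None]
```

Frames that come back with a missing label need to be filled or dropped before training, depending on how the labels were defined.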

Where to get Labels?

⚠️ Currently unreleased

from kloppy import pff

match_id = "10510"

event_dataset = pff.load_event()

>>> Croatia v Brazil

Kloppy's Pattern Matching Functionality

Pattern Matching

3. Convert to Graphs


from unravel.soccer import SoccerGraphConverterPolars

kloppy_polars_dataset.add_dummy_labels()
kloppy_polars_dataset.add_graph_ids(by=["frame_id"])

converter = SoccerGraphConverterPolars(dataset=kloppy_polars_dataset)


For each frame...


4. Custom Graph Dataset


from unravel.utils import CustomSpektralDataset

graphs = CustomSpektralDataset(graphs=converter.to_spektral_graphs())

>>> CustomSpektralDataset(n_graphs=100_000)

Under the hood (Graph)


graphs[0]

>>> Graph(n_nodes=23, n_node_features=15, n_edge_features=6, n_labels=1)
  • 23 nodes
    • (22 players + ball)
  • 15 node features
    • (e.g. location, speed, direction)
  • 6 edge features
    • (e.g. distance between 2 players, speed difference between 2 players etc.)
  • 1 label
    • (0 or 1)

Under the hood (Node Feature Matrix)


graphs[0].x

>>> 
array([[0.62, 0.37, 0.05, 0.55, 1.  , 0.33, 0.12, 1.  , 0.  , 0.  , 0.61, 0.99, 0.79, 0.1 , 0.  ],
       [0.5 , 0.2 , 0.07, 0.92, 0.78, 0.45, 0.16, 1.  , 0.  , 0.  , 0.68, 0.97, 1.  , 0.52, 0.  ],
       [0.5 , 0.49, 0.01, 0.81, 0.9 , 0.42, 0.01, 1.  , 0.  , 0.  , 0.5 , 1.  , 0.5 , 1.  , 1.  ],
       [0.57, 0.38, 0.03, 0.91, 0.78, 0.36, 0.09, 1.  , 0.  , 0.  , 0.59, 0.99, 0.87, 0.17, 0.  ],
       [0.75, 0.42, 0.03, 0.67, 0.97, 0.21, 0.21, 1.  , 1.  , 0.  , 0.6 , 0.99, 0.59, 0.01, 0.  ],
       [0.32, 0.25, 0.02, 0.54, 1.  , 0.59, 0.2 , 0.1 , 0.  , 0.  , 0.61, 0.99, 0.82, 0.88, 0.  ],
       [0.37, 0.45, 0.  , 0.81, 0.9 , 0.53, 0.12, 0.1 , 0.  , 0.  , 0.53, 1.  , 0.61, 0.99, 0.  ],
       ...
       [0.03, 0.5 , 0.13, 0.73, 0.94, 0.82, 0.4 , 0.1 , 1.  , 0.  , 0.5 , 1.  , 0.5 , 1.  , 0.  ],
       [0.59, 0.53, 0.01, 0.89, 0.81, 0.34, 0.07, 1.  , 0.  , 0.  , 0.48, 1.  , 0.38, 0.01, 0.  ],
       [0.34, 0.6 , 0.04, 0.53, 1.  , 0.56, 0.15, 0.1 , 0.  , 0.  , 0.45, 1.  , 0.32, 0.97, 0.  ],
       [0.39, 0.22, 0.02, 0.71, 0.95, 0.53, 0.18, 0.1 , 0.  , 0.  , 0.64, 0.98, 0.92, 0.77, 0.  ],
       [0.51, 0.49, 0.15, 0.73, 0.95, 0.41, 0.  , 0.1 , 0.  , 1.  , 0.5 , 1.  , 0.5 , 1.  , 0.  ]])
  • A row for each "object" (player or ball)

Under the hood (Edge Feature Matrix)


graphs[0].e

>>> 
array([[ 0.  ,  0.  ,  1.  ,  0.5 ,  0.5 ,  1.  ],
       [ 0.14,  0.01,  0.87,  0.84,  0.63,  0.98],
       [ 0.12, -0.  ,  0.91,  0.22,  0.66,  0.97],
       ...,
       [ 0.15, -0.  ,  0.97,  0.32,  0.02,  0.62],
       [ 0.18, -0.  ,  0.77,  0.92,  0.3 ,  0.96],
       [ 0.  ,  0.  ,  1.  ,  0.5 ,  0.5 ,  1.  ]])
  • A row for each "connection"

Under the hood (Adjacency Matrix)


graphs[0].a

>>> <Compressed Sparse Row sparse matrix of dtype 'int64'
	with 287 stored elements and shape (23, 23)>
  • Describes connections
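The 287 stored elements are consistent with one particular connectivity scheme: every player linked to all teammates (self-loops included), no links between opponents, and the ball linked to every node including itself. Whether this is exactly how unravelsports builds the matrix by default is an assumption; the sketch below only shows that the scheme reproduces the count for 11 v 11 plus a ball. `build_adjacency` is a hypothetical helper.

```python
def build_adjacency(n_home=11, n_away=11):
    """Build a dense 0/1 adjacency matrix for two teams plus a ball.

    Assumed scheme: teammates are fully connected (with self-loops),
    opponents are not connected, and the ball (last node) is connected
    to every node including itself.
    """
    n = n_home + n_away + 1
    ball = n - 1
    team = [0] * n_home + [1] * n_away + [None]  # None marks the ball
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_team = team[i] is not None and team[i] == team[j]
            if same_team or i == ball or j == ball:
                adjacency[i][j] = 1
    return adjacency


a = build_adjacency()
print(len(a), sum(map(sum, a)))  # 23 287
```

The count breaks down as 121 + 121 teammate links, 44 player-ball links and 1 ball self-loop. Under this scheme the edge feature matrix would hold one row per stored element.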

5. Draw the rest of the Owl!

  • split_test_train_validation
  • model = CrystalGraphClassifier()
  • model.fit()
  • model.evaluate()
  • model.predict()

🌀 Full Example

6. Combine Open-Source!

  • kloppy
  • unravelsports
  • mplsoccer
  • spektral
  • tensorflow & keras
  • polars

Contributing?

  • Learn how to use git / GitHub
  • Open Issues
  • Reach out to package owners / maintainers / contributors
  • Create a Pull Request with a good idea
  • Build your own open-source tool(s)
  • This stuff isn't easy
  • At least you don't need to reinvent the wheel
  • Don't be afraid to make mistakes
  • Try to break things
  • Try to build things that others have not built
  • Embrace the continuous learning process

  • Licenses

  • This presentation is made with Marp

Thanks!
