An Introduction to Football Analytics

Cameroon at the 2022 World Cup




Joris Bekkers

Contents

  • Introduction
    • Career Path

  • Football Data & Analytics
    • Basics
    • Visualizations
    • Analytics
    • etc.

  • Research & Innovation
    • Pressing Intensity
    • Graph Neural Networks

Introduction

  • Joris Bekkers
    • 35 y/o
    • Breda, Netherlands

  • Football Analytics Consultant
    • Research
    • Development
    • Deployment

  • PySport Board Member
    • Unite Sports Analytics Community
    • Grow and Maintain Open-Source

Career Path

  • Operations Mgmt. & Logistics [Graduate / MSc] ('14 - '17)
  • MIT Sloan Sports Analytics Conference (2017)
    • Bekkers & Dabadghao (2017)
  • AFC Bournemouth (2017)
  • U.S. Football Federation ('18 - now)
  • Football Analytics Consultant ('19 - forever?)
  • MIT Sloan Sports Analytics Conference (2023)
    • Bekkers & Sahasrabudhe (2023)
  • PySport ('24 - now)

Introduction to Football Analytics

Types of Data in Football

  • Box Score Data
  • Event Data
  • (Broadcast) Positional Tracking Data
  • GPS / LPS Data
  • Health & Questionnaire Data
    • Carmichael, Mikaeli A., et al. (2021)
  • Body Pose Data

Open-Source Football Analytics Tools!

What is kloppy?

Load, Filter, Transform, Export Standardized Football Data.

Powered by PySport

Why kloppy?

Standardization!

Opta / StatsPerform
<Event event_id="17" type_id="1" period_id="1" min="0" sec="35" player_id="89076" team_id="497" outcome="1" x="41.4" y="16.4">
  <Q id="3925254967" qualifier_id="56" value="Right" />
  <Q id="3925254973" qualifier_id="212" value="23.1" />
  <Q id="3925254969" qualifier_id="140" value="61.2" />
  <Q id="3925366067" qualifier_id="178" />
  <Q id="3925254971" qualifier_id="141" value="1.7" />
  <Q id="3925254975" qualifier_id="213" value="5.8" />
</Event>
StatsBomb
{"period" : 1, "minute" : 0, "second" : 3, "type" : {"id" : 30, "name" : "Pass"},
... "possession" : 2, "location" : [ 36.8, 27.3 ], "pass" : { "recipient" : { "id" : 6613}, "length" : 68.33, "angle" : 0.756}
etc.
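
To make the standardization problem concrete, here is a minimal sketch that maps trimmed copies of the two snippets above onto one shared record. The target schema, the function names and the pitch scales (0-100 on both axes for Opta, 120 x 80 for StatsBomb) are assumptions for illustration, and axis orientation is ignored. This is not how kloppy is implemented.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed copies of the two provider snippets from the slide
OPTA_XML = (
    '<Event event_id="17" type_id="1" period_id="1" '
    'min="0" sec="35" x="41.4" y="16.4" />'
)
STATSBOMB_JSON = (
    '{"period": 1, "minute": 0, "second": 3, '
    '"type": {"id": 30, "name": "Pass"}, "location": [36.8, 27.3]}'
)

# Assumed pitch scales per provider (length, width)
OPTA_SCALE = (100.0, 100.0)
STATSBOMB_SCALE = (120.0, 80.0)


def from_opta(xml_text):
    """Map an Opta <Event> element onto the shared (hypothetical) schema."""
    node = ET.fromstring(xml_text)
    return {
        "period": int(node.get("period_id")),
        "seconds": int(node.get("min")) * 60 + int(node.get("sec")),
        "x": float(node.get("x")) / OPTA_SCALE[0],
        "y": float(node.get("y")) / OPTA_SCALE[1],
    }


def from_statsbomb(json_text):
    """Map a StatsBomb event object onto the same schema."""
    raw = json.loads(json_text)
    x, y = raw["location"]
    return {
        "period": raw["period"],
        "seconds": raw["minute"] * 60 + raw["second"],
        "x": x / STATSBOMB_SCALE[0],
        "y": y / STATSBOMB_SCALE[1],
    }


events = [from_opta(OPTA_XML), from_statsbomb(STATSBOMB_JSON)]
for event in events:
    print(event)
```

Both records now share the same keys and a 0-1 coordinate range, so downstream code no longer needs to know which provider the event came from.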

Supported Providers

Provider               Event     Tracking   Public Data
Hawkeye (2D)           -         ✅         -
Metrica                ✅        ✅         🔗
PFF                    🟠        ✅         🔗
SecondSpectrum         PR #437   ✅         -
Signality              -         ✅         -
SkillCorner            -         ✅         🔗
Sportec                ✅        ✅         🔗
StatsPerform / Opta    ✅        ✅         -
Tracab                 -         ✅         -
DataFactory            ✅        -          -
StatsBomb              ✅        -          🔗
WyScout                ✅        -          🔗

✅ Implemented 🟠 Not yet implemented

Event Data

Using Open-Source Football Analytics Tools!

What is Event Data?

  • Shots
  • Passes
  • Defensive Actions (Interceptions, Tackles, Clearances, Fouls etc.)
  • Dribbles
  • Substitutions
  • Cards
  • etc.

Load Event Data

from kloppy import opta

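event_dataset = opta.load(
    f7_data='data/srml-4-2022-f2277835-matchresults.xml',
    f24_data='data/f24-4-2022-2277835-eventdetails.xml',
    # Metric, pitch-centered coordinates; this matches the pitch dimensions
    # printed below and the "secondspectrum" pitch type used in the plots
    coordinates="secondspectrum",
)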
home_team, away_team = event_dataset.metadata.teams
print(f"{home_team.name} v {away_team.name}")

>>> Cameroon v Brazil

Metadata

The metadata holds match-level information, for example the jersey number, name and player_id of each player.

for p in home_team.players:
    print(f"{p.jersey_no}. {p.full_name} - {p.player_id}")

>>> 20. Bryan Mbeumo - 446008
>>> 8. André-Frank Zambo Anguissa - 203325
>>> 13. Eric Choupo-Moting - 42564
>>> 10. Vincent Aboubakar - 83465

Metadata

event_dataset.metadata.pitch_dimensions

>>> MetricPitchDimensions(x_dim=Dimension(min=-52.5, max=52.5),
 y_dim=Dimension(min=-34.0, max=34.0), 
 standardized=False, unit=<Unit.METERS: 'm'>, 
 ... pitch_length=105.0, pitch_width=68.0)
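
The pitch dimensions above describe a metric coordinate system centered on the centre spot. A minimal sketch of how such coordinates relate to a normalized 0-1 system is shown below. The function names are hypothetical and the 105 x 68 m size is taken from the printed output. kloppy handles this through its own coordinate-system transforms.

```python
# Pitch size in meters, as printed in the metadata output
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0


def metric_to_normalized(x, y):
    """Centered meters (-52.5..52.5, -34..34) -> fractions of the pitch (0..1)."""
    return (
        (x + PITCH_LENGTH / 2) / PITCH_LENGTH,
        (y + PITCH_WIDTH / 2) / PITCH_WIDTH,
    )


def normalized_to_metric(x, y):
    """Fractions of the pitch (0..1) -> centered meters."""
    return (
        x * PITCH_LENGTH - PITCH_LENGTH / 2,
        y * PITCH_WIDTH - PITCH_WIDTH / 2,
    )


# The centre spot sits at (0, 0) in meters and at (0.5, 0.5) when normalized
print(metric_to_normalized(0.0, 0.0))
print(normalized_to_metric(1.0, 1.0))
```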

Anguissa's Events

player = home_team.get_player_by_id("203325")

player_events = event_dataset.filter(
    lambda event: (
        event.player == player if event.player is not None else False
    )
)

>>> <EventDataset record_count=84>

mplsoccer!

from mplsoccer.pitch import Pitch
pitch = Pitch(pitch_color='grass', line_color='white', stripe=True)
fig, ax = pitch.draw()

mplsoccer Documentation

from mplsoccer import Pitch

def heatmap(xs, ys):
    pitch = Pitch(
        line_zorder=2,
        pitch_type="secondspectrum",
        pitch_length=transformed_events.metadata.pitch_dimensions.pitch_length,
        pitch_width=transformed_events.metadata.pitch_dimensions.pitch_width,
    )
    fig, ax = pitch.draw()
    
    ax.set_title(f"#{player.jersey_no} - {player.last_name} - {player.team.name}")
    pitch.kdeplot(xs, ys, ax=ax, cmap="YlOrRd", fill=True, levels=100)


transformed_events = player_events.transform(
    to_orientation="ACTION_EXECUTING_TEAM"
)

xs = [event.coordinates.x for event in transformed_events if event.coordinates is not None]
ys = [event.coordinates.y for event in transformed_events if event.coordinates is not None]

heatmap(xs, ys)
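
The orientation transform used above makes every event look as if the team performing it attacks in the same direction, which is what a heatmap needs. Conceptually this can be sketched as a mirror through the centre spot for the periods in which the team attacks the other way. The helper name, the example events and the direction-per-period mapping are hypothetical, and kloppy's own implementation may differ.

```python
def to_attacking_left_to_right(x, y, attacks_left_to_right):
    """Mirror a pitch-centered coordinate through the centre spot when needed."""
    if attacks_left_to_right:
        return x, y
    return -x, -y


# Two illustrative events by the same team at the same relative spot,
# recorded in different halves (pitch-centered meters)
events = [
    {"period": 1, "x": 20.0, "y": -5.0},
    {"period": 2, "x": -20.0, "y": 5.0},
]

# Assumption: the team attacks left to right in period 1 and switches after half-time
attacks_ltr = {1: True, 2: False}

oriented = [
    to_attacking_left_to_right(e["x"], e["y"], attacks_ltr[e["period"]])
    for e in events
]
print(oriented)
```

After the flip both events land on the same spot, so actions from both halves can be aggregated in one plot.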

mplsoccer Gallery

Shotmap

from kloppy.domain import EventType

team_shots = (
    event_dataset
    .filter(
        lambda event: (
            event.event_type == EventType.SHOT
        )
    ).transform(
        to_orientation="STATIC_HOME_AWAY"
    )
)
pitch = Pitch(
    line_zorder=2,
    pitch_type="secondspectrum",
    half=False,
    pitch_length=team_shots.metadata.pitch_dimensions.pitch_length,
    pitch_width=team_shots.metadata.pitch_dimensions.pitch_width,
)
fig, ax = pitch.draw()

ax.set_title("Brazil & Cameroon Shots")

# Iterate over the filtered shots of both teams (team_shots, built above)
for event in team_shots:
    if event.coordinates is not None and event.result_coordinates is not None:
        color = "#479A50" if event.team == home_team else "#FFDC02"
        x = event.coordinates.x
        y = event.coordinates.y
        x_end = event.result_coordinates.x
        y_end = event.result_coordinates.y
        
        pitch.scatter(x, y, ax=ax, s=100, color=color, edgecolors="black")
        
        pitch.arrows(x, y, x_end, y_end, ax=ax, width=2, headwidth=5, 
                        headlength=5, color=color, alpha=0.6)

Momentum


Positional Tracking Data

PFF Open World Cup Tracking + Event Data

match_id   home_team       away_team     date
10517      Argentina       France        2022-12-18T15:00:00
10516      Croatia         Morocco       2022-12-17T15:00:00
10515      France          Morocco       2022-12-14T19:00:00
10514      Argentina       Croatia       2022-12-13T19:00:00
10513      England         France        2022-12-10T19:00:00
...        ...             ...           ...
3815       United States   Wales         2022-11-21T19:00:00
3812       Senegal         Netherlands   2022-11-21T16:00:00
3813       England         Iran          2022-11-21T13:00:00
3814       Qatar           Ecuador       2022-11-20T16:00:00

from kloppy import pff

match_id = "3840"

dataset = pff.load_tracking(
    raw_data=f"{match_id}.jsonl.bz2",
    meta_data=f"{match_id}.json",
    roster_meta_data=f"{match_id}.json",
    coordinates="secondspectrum",
)
home_team, away_team = dataset.metadata.teams
print(f"{home_team.name} v {away_team.name}")

>>> Cameroon v Serbia

len(dataset.records)

>>> 105709

dataset.records[0]

>>> <Frame>

# Plot a single frame; the first frame is used here, any index works
frame = dataset.records[0]

pitch = Pitch(
    line_zorder=2,
    pitch_type="secondspectrum",
    half=False,
    pitch_length=dataset.metadata.pitch_dimensions.pitch_length,
    pitch_width=dataset.metadata.pitch_dimensions.pitch_width,
)
fig, ax = pitch.draw()

ax.set_title(f"Period {frame.period.id} - {frame.timestamp}")

for player, coordinates in frame.players_coordinates.items():
    if coordinates is None:
        continue
    color = "#479A50" if player.team.name == "Cameroon" else "#ffffff"
    pitch.scatter(coordinates.x, coordinates.y, ax=ax, s=100, color=color, edgecolors="black")
    
if frame.ball_coordinates is not None:
    pitch.scatter(frame.ball_coordinates.x, frame.ball_coordinates.y, ax=ax, s=10, color="white", edgecolors="black")

Aboubakar v Serbia '63

Advanced Football Analytics

Advanced Football Analytics

Pitch Control


Spearman (2018)
Fernández & Bornn (2017)

Advanced Football Analytics

Pressing Intensity

Develop an intuitive measure for pressing that can be used by coaches, assistants and (data) analysts to identify and analyze pressing situations (e.g. on video, using event data), compute derived metrics (e.g. passes under pressure) and analyze specific in-game situations (e.g. players that don’t follow through on their pressing).

Pressing Intensity

Requirements

  • Use positional tracking data
  • Multipurpose / Extendable
  • No ML / prediction model
  • Not too complex
  • Not too simple
  • Measurable at every moment
  • Summable
  • Measure pressing in the whole "system"

"What is the chance 🏃‍♂️ defense will reach 🏃‍♀️ attack or ⚽️ within some amount of time, given their current direction and speed?"

Total Pressure = 1 - [(1 - 86%) * (1 - 19%)] = 89%
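
The Total Pressure number on this slide is consistent with combining the individual probabilities as the chance that at least one pressing player arrives in time. The general form below is a sketch that assumes the individual probabilities $p_i$ are independent; the symbols $P_{\text{total}}$, $p_i$ and $n$ are notation introduced here, not taken from the slide.

```latex
P_{\text{total}} = 1 - \prod_{i=1}^{n} (1 - p_i)
\qquad\Rightarrow\qquad
1 - (1 - 0.86)(1 - 0.19) = 1 - 0.14 \times 0.81 = 0.8866 \approx 89\%
```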



UnravelSports Blog



The unravelsports package aims to aid researchers, analysts and enthusiasts by providing intermediary steps in the complex process of converting raw sports data into meaningful information and actionable insights.

1. Load Kloppy & Convert

from kloppy import sportec

from unravel.soccer import KloppyPolarsDataset

kloppy_dataset = sportec.load_open_tracking_data()

kloppy_polars_dataset = KloppyPolarsDataset(
    kloppy_dataset=kloppy_dataset
)

KloppyPolarsDataset

period_id timestamp frame_id ball_state id x y z team_id position_name game_id vx vy vz v ax ay az a ball_owning_team_id is_ball_carrier
0 1 0 days 00:00:00 10000 alive DFL-OBJ-00008F -20.67 -4.56 0 DFL-CLU-000005 RCB DFL-MAT-J03WPY 0.393 -0.214 0 0.447 0 0 0 0 DFL-CLU-00000P False
1 1 0 days 00:00:00 10000 alive DFL-OBJ-0000EJ -8.86 -0.94 0 DFL-CLU-000005 UNK DFL-MAT-J03WPY -0.009 0.018 0 0.02 0 0 0 0 DFL-CLU-00000P False
2 1 0 days 00:00:00 10000 alive DFL-OBJ-0000F8 -2.12 9.85 0 DFL-CLU-00000P RM DFL-MAT-J03WPY 0 0 0 0 0 0 0 0 DFL-CLU-00000P False
3 1 0 days 00:00:00 10000 alive DFL-OBJ-0000NZ 0.57 23.23 0 DFL-CLU-00000P RB DFL-MAT-J03WPY 0.179 -0.134 0 0.223 0 0 0 0 DFL-CLU-00000P False
4 1 0 days 00:00:00 10000 alive DFL-OBJ-0001HW -46.26 0.08 0 DFL-CLU-000005 GK DFL-MAT-J03WPY 0.357 0.071 0 0.364 0 0 0 0 DFL-CLU-00000P False
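
The v column in the table above appears to be the magnitude of the velocity components. The short check below recomputes it from vx and vy for three of the rows (vz is 0 in all of them). This reading of the columns is an assumption based on the numbers shown, not on the package documentation.

```python
import math

# (vx, vy, v) copied from three rows of the table above, in m/s
rows = [
    (0.393, -0.214, 0.447),
    (0.179, -0.134, 0.223),
    (0.357, 0.071, 0.364),
]

for vx, vy, v in rows:
    # Speed is the Euclidean norm of the velocity vector
    speed = math.hypot(vx, vy)
    print(f"vx={vx:+.3f} vy={vy:+.3f} -> speed={speed:.3f} (table: {v})")
```

The recomputed values agree with the table to within rounding of the displayed components.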

2. Pressing Intensity!

from unravel.soccer import PressingIntensity

import polars as pl

model = PressingIntensity(
    dataset=kloppy_polars_dataset
)
model.fit(
    start_time=pl.duration(minutes=1, seconds=53),
    end_time=pl.duration(minutes=2, seconds=32),
    period_id=1,
    method="teams",
    ball_method="max",
    orient="home_away",
    speed_threshold=2.0,
) 

>>> PressingIntensity(n_frames=143)

Probability to Intercept


[[0.07 0.   0.   0.   0.   0.03 0.01 0.   0.   0.   0.23]
 [0.28 0.   0.   0.   0.   0.02 0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.09 0.   0.   0.   0.   0.1  0.06 0.   0.   0.   0.05]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.09 0.6  0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.83 0.08 0.   0.   0.   0.  ]
 [0.02 0.   0.   0.   0.   0.02 0.   0.   0.   0.01 0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.71]
 [0.3  0.   0.   0.   0.   0.06 0.   0.   0.   0.   0.03]]
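
A matrix like this can be reduced to one number per player with the Total Pressure formula from the earlier slide. The sketch below assumes each row is a pressing player and each column a player being pressed; if the package uses the opposite layout, the matrix should be transposed first. The helper name is hypothetical.

```python
import math

# Probability-to-intercept matrix copied from the slide (11 x 11)
matrix = [
    [0.07, 0.0, 0.0, 0.0, 0.0, 0.03, 0.01, 0.0, 0.0, 0.0, 0.23],
    [0.28, 0.0, 0.0, 0.0, 0.0, 0.02, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.09, 0.0, 0.0, 0.0, 0.0, 0.1, 0.06, 0.0, 0.0, 0.0, 0.05],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.09, 0.6, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.83, 0.08, 0.0, 0.0, 0.0, 0.0],
    [0.02, 0.0, 0.0, 0.0, 0.0, 0.02, 0.0, 0.0, 0.0, 0.01, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.71],
    [0.3, 0.0, 0.0, 0.0, 0.0, 0.06, 0.0, 0.0, 0.0, 0.0, 0.03],
]


def total_pressure(probabilities):
    """Chance that at least one pressing player arrives in time."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# Assumption: columns are pressed players, rows are pressing players
columns = list(zip(*matrix))
pressure_per_column = [total_pressure(column) for column in columns]

for index, pressure in enumerate(pressure_per_column):
    print(f"column {index}: {pressure:.2f}")
```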

3. Video

🌀 Full Example

Advanced Football Analytics

Graph Neural Networks

Graph Neural Network
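
A graph neural network needs each tracking frame expressed as nodes and edges. The sketch below shows one simple way to do that: players become nodes with their position as features, and two players are connected when they are close to each other. The positions are illustrative, the 15 m threshold and all names are hypothetical, and this is not how the unravelsports converter builds its graphs.

```python
import math

# Illustrative (x, y) positions in meters for five players in one frame
positions = [
    (-20.67, -4.56),
    (-8.86, -0.94),
    (-2.12, 9.85),
    (0.57, 23.23),
    (-46.26, 0.08),
]

MAX_EDGE_DISTANCE = 15.0  # hypothetical threshold in meters


def build_adjacency(points, max_distance):
    """Symmetric 0/1 adjacency matrix; no self-loops."""
    n = len(points)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= max_distance:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Node feature matrix: one row per player
node_features = [[x, y] for x, y in positions]
adjacency = build_adjacency(positions, MAX_EDGE_DISTANCE)

for row in adjacency:
    print(row)
```

The pair (node_features, adjacency) is the kind of input a graph layer consumes; real converters typically add more node and edge features such as speed, direction or distance to the ball.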



Counter Attack Predictions


GNN Architecture


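from spektral.data import DisjointLoader
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.metrics import AUC, BinaryAccuracy
from tensorflow.keras.optimizers import Adam

from unravel.classifiers import CrystalGraphClassifier
from unravel.soccer import SoccerGraphConverterPolars
from unravel.utils import CustomSpektralDataset

# Example values; the slide does not specify them
batch_size = 32
epochs = 10

# Add placeholder labels and group rows into one graph per frame
kloppy_polars_dataset.add_dummy_labels()
kloppy_polars_dataset.add_graph_ids(by=["frame_id"])

# Convert the tracking frames into Spektral graphs
converter = SoccerGraphConverterPolars(dataset=kloppy_polars_dataset)
dataset = CustomSpektralDataset(graphs=converter.to_spektral_graphs())

# 4:1:1 train / test / validation split
train, test, val = dataset.split_test_train_validation(
    split_train=4, split_test=1, split_validation=1, random_seed=43
)

model = CrystalGraphClassifier()
# A Keras model must be compiled before fit(); these settings are assumed,
# they are not shown on the slide
model.compile(
    loss=BinaryCrossentropy(), optimizer=Adam(), metrics=[AUC(), BinaryAccuracy()]
)

loader_tr = DisjointLoader(train, batch_size=batch_size)
loader_va = DisjointLoader(val, epochs=1, shuffle=False, batch_size=batch_size)

model.fit(
    loader_tr.load(),
    epochs=epochs,
    steps_per_epoch=loader_tr.steps_per_epoch,
    use_multiprocessing=True,
    validation_data=loader_va.load(),
    callbacks=[EarlyStopping(monitor="loss", patience=5, restore_best_weights=True)],
)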

Thanks!
