In a recent phone interview I was asked a fascinating question about the use of analytics in soccer. To paraphrase: “Is tracking data relevant to soccer, where there are so few outcomes and traditional metrics are still in their infancy?” In all honesty, my first inclination was to describe how there remains significant low-hanging fruit in soccer analytics, and to argue that information from event data (shots, tackles, touches, etc.) should be mined before teams and individuals delve into the complex, messy world of tracking data. I had started down this path before I realized that, although each of those points is true, the logic is fundamentally flawed.
The perception of tracking data as a way to go “one step beyond” traditional box score metrics is a commonly held belief in the analytics community. The trickle-down effect of this is a belief that tracking data has little of value to offer organizations and individuals just beginning to build analytic capacity. There is truth to this. Anyone with a basic understanding of regression and some play-by-play data can calculate wins above replacement, regression-adjusted plus-minus, and other staples of the sports analyst’s toolbox. In contrast, the tools that form the foundation of tracking data analyses, such as Kalman filters, hidden Markov models, and Gaussian processes, are often first encountered only in graduate programs in statistics and machine learning. Further, the volume of tracking data is enormous, with a single second of trajectories containing as many data points as an entire game of play-by-play.
The deck seems to be stacked heavily against the use of tracking data. Why go through so much trouble when box score and play-by-play data are so simple and manageable? It’s an appealing argument, and in fact I would agree that most teams or analysts just starting out in the world of data should not start with raw tracking data. But the argument misses what tracking data is actually for.
Think for a minute about how coaches and analysts describe a soccer game. Phrases such as “Messi always seems to find open space” and “there was no space to move, let alone pass” would not be out of place. In fact, those who really understand the game move quickly beyond discussing the score line to talk about opportunity creation, spatial positioning, and effort and fatigue. From this we see that managers and experts understand the game in ways that event data cannot quantify: how players create space for their teammates, how players are able to accelerate beyond the back line, and how players eliminate passing opportunities, to name a few.
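To make “finding open space” a little more concrete, here is a minimal sketch of one way it could be measured from a single frame of tracking data: each attacker’s distance to the nearest defender. Everything in it is an assumption for illustration only. The player IDs, the coordinates (in metres), and the function names are hypothetical, and nearest-defender distance is a crude stand-in for space, not an established metric.

```python
import math

# Hypothetical single frame of tracking data: player id -> (x, y) in metres.
# These positions are invented for illustration, not real match data.
attackers = {"A9": (80.0, 30.0), "A10": (70.0, 40.0)}
defenders = {"D4": (82.0, 31.0), "D5": (75.0, 52.0)}


def nearest_defender_distance(position, defenders):
    """Return the distance from `position` to the closest defender."""
    x, y = position
    return min(math.hypot(x - dx, y - dy) for dx, dy in defenders.values())


def open_space(attackers, defenders):
    """Map each attacker to their distance from the nearest defender."""
    return {
        pid: nearest_defender_distance(pos, defenders)
        for pid, pos in attackers.items()
    }


space = open_space(attackers, defenders)
for pid, dist in sorted(space.items(), key=lambda item: -item[1]):
    print(f"{pid}: {dist:.1f} m from the nearest defender")
```

Nothing in an event log of shots, tackles, and touches records where the players without the ball are standing, so even a number this simple cannot be computed from event data alone.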
Tracking data, therefore, is not a tool for getting a slightly more refined understanding of things we already know (though, admittedly, it can also accomplish this). Rather, tracking data allows us to understand entire features of players and the game as a whole that cannot be captured with play-by-play data. As an example, it is well known that traditional metrics are heavily skewed towards measuring offensive characteristics. Quantifying a soccer player’s defensive abilities with only event data (tackles and interceptions, primarily) is sure to lead to a severely limited and biased perspective. Many great defenders seldom need to tackle, for instance, because their spatial positioning is so solid that no such intervention is required. It is no wonder that soccer analytics is viewed with a skeptical eye by managers and insiders: purchasing a center-back based on event data would be akin to buying a house based solely on a picture of the dining room.
So you’re just starting out in the world of sports analytics? By all means, download some box score or play-by-play data and start messing around. But don’t think of tracking data as only necessary for refining the work you’re doing; rather, think of it as information that is completely orthogonal to what’s available in traditional hand-collected data. We don’t study tracking data to better understand the things we already know. We study it to understand completely different (and as yet unquantified) aspects of the game. Of course, if you think spacing, speed, agility, defensive pressure, and opportunity creation are only tangential to the game of soccer (or basketball, hockey, or American football, for that matter), then feel free to leave tracking data on your “to do” list.