No one can predict the outcome of a football match with certainty. But can we predict the outcome of a match from the history of a competition? This is the question that predictive algorithms try to answer. From mathematical models to Big Data, discover what lies at the heart of the most efficient algorithms.
What do a bank, a supercomputer and the FIFA World Cup have in common? Answer: a predictive algorithm!
Like punters and bookmakers, algorithms analyze past matches and the strengths of the teams involved to predict the outcome of a football match. Their premise: the past is as important as the future. As you will see, data has revolutionized football, its economy, and the sports betting business.
When data revolutionized football…
Today, between players wearing GPS-enabled bibs, cameras tracking actions in real time, and recruitment departments connected to databases, data is everywhere. The economic and sporting stakes are so high that data analysis has become a source of competitive advantage over other teams. Two anecdotes, now famous in the world of “football analytics”, illustrate the weight of data analysis.
The first concerns the English league title won in 2012 by Manchester City and its Italian manager, Roberto Mancini. That season, after sifting through thousands of statistics, including 400 corners taken in previous seasons, the Citizens’ analysts concluded that the most dangerous corners are inswinging ones, an area where City did not excel. Mancini demanded that his team generate more corner kicks. Manchester City went on to score 15 goals from corners.
The second famous anecdote takes place at Arsenal in 2004. To replace Patrick Vieira, who was on his way out, Arsène Wenger analyzed player statistics from all the European leagues to find a midfielder capable of running 14 kilometers per game. He discovered a player unknown to the general public who was emerging at OM (Olympique de Marseille): Mathieu Flamini.
Some companies quickly exploited this new gold mine and, over the years, built databases on competitions, teams and players that are worth a fortune. A small handful of them now provide bookmakers, the media and professional clubs with databases and statistical tools for decision-making. The two largest in the world are Sportradar and Stats Perform (formerly Opta).
Algorithms, banks and the World Cup
The international bank Goldman Sachs mobilized its macroeconomic research team for the last three World Cups to produce an algorithmic prediction of the winner. Although the powerful algorithm got it wrong each time (picking Brazil every time), its approach is a useful illustration of how a prediction algorithm works:
We collected data on team characteristics, players and the most recent team performance and put it through 4 machine learning models to analyze the number of goals scored in each game. The model then learned the relationship between these characteristics and goals scored, using scores from all World Cup and European Cup matches since 2005…
As for the Goldman Sachs algorithm, it still managed to identify 13 of the 16 teams in the Round of 16, with a 68% success rate.
How does a predictive algorithm work?
To develop a predictive algorithm, you need a good computer, up-to-date data, and a mathematical model. Okay, that is a rather simplistic view. In reality, the two essential components of a good predictive algorithm are the data and the models applied to it.
The number of goals scored and conceded, possession of the ball, the number of shots on goal, successful passes, corner kicks, playing areas… every event in a match can be quantified. But can we establish trends by observing these criteria? With all due respect to defenders of the glorious uncertainty of sport, the answer is yes.
In the book The Numbers Game: Why Everything You Know About Soccer Is Wrong, David Sally and Chris Anderson attempt to debunk various preconceived ideas about soccer through the analysis of statistics, starting with the importance of corner kicks:
The total number of goals a team scores does not increase with the number of corner kicks they win. The correlation is basically zero. You can have a corner or 17 – this will not have a significant impact on the number of goals you score.
But which criterion carries the most weight in a team’s probability of victory? Naturally, the authors focus on what is rarest in football: goals.
Sally and Anderson therefore used the bookmakers’ favorite probability distribution, Poisson’s law, to predict the distribution of the number of goals per game for the Big 5 teams between 1993 and 2011 (the number of games with no goals, with one goal, and so on).
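For reference, Poisson’s law can be written as a single formula. The symbols are notation chosen here for illustration, not taken from the book: X is the number of goals a team scores in a game, k is a possible tally, and λ is the average number of goals the team is expected to score.

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

With a hypothetical average of λ = 1.5 goals per game, for instance, the probability of a team failing to score is e^{-1.5}, or roughly 22%.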
Poisson’s law or how to predict the score of a match
You will find out how bookmakers and football prediction bots manage to predict the score of a football match and ultimately the most likely outcome.
From a substantial sample of matches (at least one season), a bookmaker can calculate a probable score and convert the estimated probabilities into odds. The method breaks down into five steps:
- Calculate the attacking strength: the ratio of the average number of goals scored by the team to the average scored by its rivals (the league average).
- Calculate the defensive strength: the ratio of the average number of goals conceded by the team to the average conceded by its rivals (the league average).
- Predict the number of goals scored by the home and away teams: for each team, multiply its attacking strength by the opponent’s defensive strength, then by the average number of goals scored at home (for the home team) or away (for the away team).
- Use Poisson’s law to estimate the probability of each number of goals for each team: from the expected number of goals of each team, the formula gives the probability of every possible tally (from 0 to 6 goals). Multiplying the two teams’ probabilities then gives the chance of a specific score, for example that the away team scores 1 goal and the home team 2.
- Deduce the expected score: taking the most probable number of goals for each team gives the expected score.
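The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any bookmaker’s actual implementation. It assumes that strengths are measured against the league average, that the two teams’ goal tallies are independent, and that the odds are "fair" decimal odds with no bookmaker margin. All function names, dictionary keys and figures are hypothetical.

```python
from math import exp, factorial

MAX_GOALS = 6  # the text considers 0 to 6 goals per team


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return lam ** k * exp(-lam) / factorial(k)


def expected_goals(home, away, league):
    """Steps 1-3: attack and defense strengths, then expected goals per team."""
    # Assumption: each strength is a ratio to the league average.
    home_attack = home["scored"] / league["home_goals"]
    home_defense = home["conceded"] / league["away_goals"]
    away_attack = away["scored"] / league["away_goals"]
    away_defense = away["conceded"] / league["home_goals"]
    lam_home = home_attack * away_defense * league["home_goals"]
    lam_away = away_attack * home_defense * league["away_goals"]
    return lam_home, lam_away


def score_matrix(lam_home, lam_away):
    """Step 4: probability of each score, assuming independent tallies."""
    return {
        (h, a): poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
        for h in range(MAX_GOALS + 1)
        for a in range(MAX_GOALS + 1)
    }


def most_likely_score(matrix):
    """Step 5: the (home, away) score with the highest probability."""
    return max(matrix, key=matrix.get)


def outcome_odds(matrix):
    """Fair decimal odds (1 / probability) for home win, draw and away win."""
    total = sum(matrix.values())  # just under 1 because tallies are capped
    probs = {
        "home": sum(p for (h, a), p in matrix.items() if h > a) / total,
        "draw": sum(p for (h, a), p in matrix.items() if h == a) / total,
        "away": sum(p for (h, a), p in matrix.items() if h < a) / total,
    }
    return {outcome: 1 / p for outcome, p in probs.items()}


# Hypothetical per-game averages over one season.
league = {"home_goals": 1.5, "away_goals": 1.2}
home_team = {"scored": 1.8, "conceded": 0.9}  # in its home games
away_team = {"scored": 1.2, "conceded": 1.5}  # in its away games

lam_home, lam_away = expected_goals(home_team, away_team, league)
matrix = score_matrix(lam_home, lam_away)
print(most_likely_score(matrix))
print(outcome_odds(matrix))
```

With these made-up averages, the home team is expected to score 1.8 goals and the away team 0.9, and the single most likely score is 1-0. Real odds would be lower than the fair odds computed here, since a bookmaker adds a margin.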
Of course, although Poisson’s law reveals some logic in the randomness of goals, it is far from perfect. A more advanced model would take into account, for example, how important the match is for the two teams, the absence of a key player, or the arrival of a new coach.
The ideal algorithm would be an intelligent one that learns from its errors and has access to hundreds of parameters, so that it can measure the weight of each factor in the result of a match and adapt accordingly.
The Dixon-Coles model: an improvement of the Poisson distribution
According to mathematicians, Poisson’s law has two major drawbacks: it misjudges the frequency of low scores (0-0, 1-0, 1-1) and it places as much weight on old matches as on recent ones. The Dixon-Coles model corrects both points. Betegy’s prediction algorithm uses a modified version of this model that also integrates the current form of the two teams.
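The two Dixon-Coles corrections can be sketched as follows. This is an illustration of the published model’s general form, not Betegy’s modified version. The first function rescales the probabilities of the four lowest scores with a dependence parameter rho, and the second gives older matches an exponentially smaller weight through a decay rate xi. The values of rho and xi below are hypothetical; in practice both are estimated from match data.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return lam ** k * exp(-lam) / factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for home goals x and away goals y.

    lam and mu are the expected home and away goals; rho controls how
    strongly the four lowest scores are rescaled (rho = 0 means no change).
    """
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def score_probability(x, y, lam, mu, rho):
    """Probability of the score x-y after the low-score adjustment."""
    return tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)


def time_weight(days_ago, xi):
    """Weight given to a past match when fitting: recent matches count more."""
    return exp(-xi * days_ago)


# Hypothetical values, for illustration only.
lam, mu, rho, xi = 1.8, 0.9, -0.1, 0.005

for score in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]:
    plain = poisson_pmf(score[0], lam) * poisson_pmf(score[1], mu)
    adjusted = score_probability(score[0], score[1], lam, mu, rho)
    print(score, round(plain, 4), round(adjusted, 4))

print(time_weight(30, xi), time_weight(365, xi))
```

With a negative rho, as in this example, the 0-0 and 1-1 draws become more likely than under the plain Poisson model, while scores outside the four adjusted cells are unchanged. The adjustment is built so that the probabilities still sum to one.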