
Leicester City Looks Good for the Champions League: How Many Points Will It Take For 4th Place?

After the mid-week fixtures, I tweeted the following:

These optimistic forecasts rely on a set of probabilities that, despite doing fairly well, have still managed to underrate Leicester City’s quality so far this season. With their top 4 result seeming fairly secure, I had originally planned on writing a piece talking about this, but Mike Goodman beat me to it this morning. It’s a typically strong effort from him that you all should read, going through the points per game Leicester City would need to secure a Champions League spot given historical norms for 4th place. However, as we’ve seen, this season doesn’t fit any sort of historical pattern. So how well does Leicester City need to do for the rest of the season if they want to finish in 4th place (or higher)?

To answer this, I took MOTSON’s 10,000 simulated seasons and removed Leicester City from each of the final tables. Then I sorted each table by number of points, which gave me what the final table would look like without Leicester City. From this, I can see how many points the “4th place” team (if Leicester City never existed) finishes with. To earn a spot in the Champions League, Leicester City would need to earn that number of points +1. I show these point totals in the figure below.

Jan 15 Blog 1

The minimum number of points needed this season is likely going to be considerably lower than normal: probably somewhere around 59-62 points. It’s a weird year, with the expected points for the title being around 80, so it’s somewhat unsurprising that the number for 4th place is quite a bit lower than normal.
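The procedure described above (drop Leicester City from each simulated table, sort the rest, and add one point to the 4th-placed total) can be sketched in a few lines. This is only an illustration: the two "seasons" below are made-up numbers, not MOTSON output, and `points_needed` is a hypothetical helper name. Like the original procedure, it ignores tiebreakers.

```python
from collections import Counter


def points_needed(final_table, team, place=4):
    """Points `team` needs to finish in `place` or better in one simulated season.

    final_table maps each club to its simulated final points total.
    """
    # Drop the team of interest and sort everyone else by points, best first.
    others = sorted(
        (pts for club, pts in final_table.items() if club != team),
        reverse=True,
    )
    # The club sitting in `place` without our team sets the bar; beat it by one.
    return others[place - 1] + 1


# Two made-up simulated seasons (hypothetical numbers, for illustration only).
simulated_seasons = [
    {"Arsenal": 78, "Man City": 75, "Tottenham": 68, "Man United": 61,
     "West Ham": 58, "Leicester City": 72},
    {"Arsenal": 80, "Man City": 74, "Tottenham": 65, "Man United": 59,
     "West Ham": 60, "Leicester City": 70},
]

thresholds = [points_needed(s, "Leicester City") for s in simulated_seasons]
distribution = Counter(thresholds)  # how often each threshold occurs
print(thresholds)
```

With 10,000 simulated seasons instead of two, `distribution` would hold the kind of counts plotted in the figure.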

To replicate Mike’s analysis, and to determine the number of points per game “The Fighting Lesters”™ would need, I subtracted Leicester City’s current point total (43) from the point totals in the previous figure. This tells me how many more points Leicester City needs to come in 4th, and if I divide it by the number of remaining games (17), I get the required points per game (PPG). Below is the CDF (cumulative distribution function) of how likely Leicester City is to qualify for the Champions League given a certain number of PPG.

Jan 15 Blog 2

Even if they only manage 1 PPG (approximately the number usually associated with barely avoiding relegation), they’re still about 50% to qualify for the Champions League. If they maintain anywhere near the ~2.1 PPG they’ve averaged so far (even slipping as low as 1.5), they’re basically guaranteed to finish in 4th place or better.
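As a rough sketch of how a curve like this can be built: convert each simulated threshold into the PPG required from here on, then count the share of simulations that a given pace would satisfy. The current total (43) and games remaining (17) come from the post; the list of thresholds is invented for illustration, and the function names are hypothetical.

```python
CURRENT_POINTS = 43  # Leicester City's total at the time of writing
GAMES_LEFT = 17


def required_ppg(threshold, current_points=CURRENT_POINTS, games_left=GAMES_LEFT):
    """Points per game needed over the remaining games to reach `threshold`."""
    return max(threshold - current_points, 0) / games_left


def chance_of_qualifying(ppg, thresholds):
    """Empirical CDF: share of simulated seasons in which `ppg` is enough."""
    needed = [required_ppg(t) for t in thresholds]
    return sum(1 for r in needed if r <= ppg) / len(needed)


# Hypothetical thresholds from eight simulated seasons (not real MOTSON output).
thresholds = [58, 59, 60, 60, 61, 62, 63, 65]

for pace in (1.0, 1.5):
    print(pace, chance_of_qualifying(pace, thresholds))
```

Evaluating `chance_of_qualifying` over a grid of PPG values and plotting the result gives the step-shaped CDF.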

For my final analysis, I repeat the procedure above for 3rd place (the first spot that qualifies directly for the UCL without a qualifier), looking at what it would take to finish 3rd instead of 4th. Here are the expected points needed for 3rd.

Jan 15 Blog 3

They’ll have to do better here: they’re expected to need around 65 points to finish 3rd, which means 22 points from their last 17 games (about 1.3 PPG). This is still below their current pace, so it allows for some regression in form. Finally, I present the CDF for 3rd place.

Jan 15 Blog 4

All Leicester needs to have a 50% chance of finishing third is 1.3 PPG, which is reasonable, and if they can keep up a 1.6 PPG pace they’ll be a virtual lock for 3rd and guaranteed qualification for the Champions League.

Even if Leicester City slips in the second half of the season, a simple fact remains: in such a low points season, they quite frankly don’t even have to be that good over the last 17 games to qualify for the Champions League. I don’t want to jinx it, but if I were a Leicester City fan I’d start getting ready to book my travel for Wednesday nights in the fall.

How Do We Make Analytics More Accessible?

The world doesn’t need another hot take on whether analytics are good, or useful, or the temperature at which stats people keep their offices. I honestly stopped reading the media pieces on the topic after the Rory Smith debacle a week or two ago, but I do think there is room to discuss ways to make analytics more accessible and interesting to a larger audience. Academia deals with this quite a bit, especially in disciplines like Political Science where what we study has consistent relevance to the media, so I wanted to share some strategies for making often dense statistical research more accessible to media and practitioners.¹

  1. DON’T assume anyone knows anything about math.
    • Most people haven’t taken a math class since college, and even then they didn’t like it. Not only should you not get lost in the math, you should avoid it entirely. Have it in reserve if they ask for more details, but be prepared for people’s eyes to glaze over when you start.
  2. DON’T assume anyone cares about math.
    • People don’t care about the method. They care about what they can learn.
  3. DON’T use jargon.
    • Many of the metrics we use have technical sounding names or abbreviations. Maybe the measure is clear and interesting, but when you lead with the technical part people will lose interest. Similarly, instead of saying “R²” talk about “correct predictions.” Present confidence intervals (don’t call them that) instead of p-values, and if possible show me, don’t tell me. Model fit can be intuitive when shown on a graph, but it can be daunting when it’s explained.
  4. DO start with a question.
    • What do people care about? Is there a story in the news that you can shed light on with analytics? What can we learn from your method? Analytics that answer a question people care about are more likely to be embraced by media/practitioners than those that simply present a measure.
  5. DO focus on what we learn, rather than how you did it.
    • What new insights does your method give us? What did we not know before that we do now? How does your analysis teach us something about soccer that we didn’t know before?
  6. DO explain why people should care about what you did.
    • In academia, we call this the “so what?” question, and it’s the most important part of this whole process. You did a bunch of math, so what? Why does this matter to a larger audience? Why should people care about what you did here?
  7. DO focus on clear, concise presentation of results.
    • I know of several high-profile studies in political science that only got attention because they had nice infographics attached to them. It could be a clever, clear infographic, or an interactive tool of some sort that people can play with. If you’re not good with graphics, it could be a table. Or a short paragraph, or anything that is clear, attention-getting, and concise. Save description for those who want it.
  8. DO be ready for people to not accept your conclusions.
    • Confirmation bias is a real thing, and it’s difficult to overcome. Analytics that confirm what people already believe are much easier to accept than those that don’t. And beyond that, some people just don’t like stats – you’ll never convert them and it’s not worth trying.

That’s all I’ve got – I’ve tried to distill a couple dozen articles about outreach in academia to a few bullet points. I’m confident that if soccer analytics folks focus on these things, and are patient enough, things are going to change. They changed in politics pretty quickly, and there’s no reason the same thing won’t happen in soccer.

  1. I don’t have a lot of experience with mass media and soccer, but I have had some folks in the industry reach out to me privately. In my political life, I’ve been quoted multiple times in USA Today and The New York Times, have had op-eds placed in The Washington Post and USA Today, and have had my research featured in The Huffington Post. I’ve also been on Voice of America and Al-Jazeera America, and am a regular guest on the News and Views radio show out of Minnesota, so I know a fair amount about public outreach for arcane topics.

Champions League Predictor

I’ve finally finished the interactive match predictor I’ve been working on for a couple of weeks now, just in time for the UEFA Champions League Final. You can test it out at:

I wanted to post some of the details behind the program here for anyone interested in the mechanics behind it. There are two steps to the process – the first is calculating the “Skill” level of the two teams, which is based on the results of all games this season. The short version is that I apply a Generalized Partial Credit Model (GPCM), which is a member of the Rasch family, to all league results during the season to calculate a team’s “skill” rating.¹ Full details can be found at
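For readers who have not met the GPCM before, here is a minimal sketch of how the model turns a latent skill value into probabilities for ordered categories. It is the textbook category-probability formula only, not the fitting code behind the predictor. The mapping of categories to loss/draw/win and every parameter value below are assumptions made for illustration.

```python
import math


def gpcm_probabilities(theta, discrimination, steps):
    """Category probabilities under a Generalized Partial Credit Model.

    theta: latent skill; discrimination: slope a; steps: step parameters b_1..b_m.
    Returns m + 1 probabilities, one for each ordered category 0..m.
    """
    # Cumulative sums of a * (theta - b_j); category 0 has an empty sum of 0.
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + discrimination * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed category order: 0 = loss, 1 = draw, 2 = win (hypothetical parameters).
loss, draw, win = gpcm_probabilities(theta=0.8, discrimination=1.0, steps=[-0.3, 0.3])
print(round(loss, 3), round(draw, 3), round(win, 3))
```

A higher `theta` shifts probability toward the top category, which is the sense in which it behaves like a skill rating.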

The next step was to collect game stats, which I did from a variety of sources around the internet. I merged various offensive, defensive, and discipline statistics with each result, whether a team was at home or not, the skill levels, and goals scored/against in the game. The full dataset has something like 31 variables for each game. I entered most of the EPL games into the dataset, and each game was broken up into two separate entries (stats for the home and away teams). I ended up with 696 observations × 31 variables in the final dataset.
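The "two entries per game" layout can be sketched as follows. The field names and the shot counts are hypothetical (the post does not list its 31 variables); only the scoreline is taken from the prediction tables on this page.

```python
def split_match(match):
    """Turn one match record into two observations, one per team."""
    rows = []
    for side, other in (("home", "away"), ("away", "home")):
        rows.append({
            "team": match[f"{side}_team"],
            "opponent": match[f"{other}_team"],
            "is_home": side == "home",
            "goals_for": match[f"{side}_goals"],
            "goals_against": match[f"{other}_goals"],
            "shots": match[f"{side}_shots"],  # stand-in for the in-game stats
        })
    return rows


match = {
    "home_team": "Chelsea", "away_team": "Crystal Palace",
    "home_goals": 1, "away_goals": 0,
    "home_shots": 14, "away_shots": 9,  # made-up values
}

for row in split_match(match):
    print(row)
```

Under this layout, 696 observations would correspond to 348 matches.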

The next step was to apply some machine learning algorithms to the data. So I started with something simple: k-means clustering. This method separates the observations into groups, or “clusters”, based on how close they are to one another (the plots project the data onto two dimensions), and I then attempt to classify outcomes based on membership in these clusters. The classic example here seems to be based on classification of different species of iris. Here’s an example of one of the plots you’ll see from a nicely differentiated k-means clustering application.



Here’s what happened when I ran k-means clustering on my data:

cluster plot

As you can see, this wasn’t quite as clean as the canonical “iris” dataset. It also doesn’t predict nearly as well, classifying 27% of the observations correctly. For a point of reference, if I had said “every team lost every game” I would have predicted 38% correctly.
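To make the comparison above concrete, here is a small pure-Python sketch of the general idea: cluster the observations, label each cluster with its most common outcome, and compare the resulting accuracy against the "always guess the most common outcome" baseline. It is not the code behind the plot. The points are invented and deliberately well separated, and the deterministic farthest-point initialisation is a choice made here so the example is reproducible.

```python
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        centroids.append(max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        ))
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


def cluster_accuracy(assignment, labels):
    """Label each cluster with its most common outcome, then score the result."""
    correct = 0
    for j in set(assignment):
        members = [lab for a, lab in zip(assignment, labels) if a == j]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


def baseline_accuracy(labels):
    """Accuracy of always predicting the single most common outcome."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Invented, cleanly separated toy data (two features per observation).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
          (0.1, 5.0), (0.3, 5.2), (0.0, 4.8)]
labels = ["loss"] * 4 + ["win"] * 3 + ["draw"] * 3

assignment = kmeans(points, k=3)
print(cluster_accuracy(assignment, labels), baseline_accuracy(labels))
```

On tidy data like this the clusters line up with the outcomes. On the match data they did not, which is why the clustering scored below the baseline.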

I did a couple other steps that I may edit in here later, but in the interest of finishing this promptly, I finished with a Support Vector Machine (SVM) model, which does something similar to k-means clustering, but adds multiple dimensions. Instead of using a 2-dimensional method, it cuts the data using multi-dimensional hyperplanes to predict outcomes correctly. This method ended up predicting 78% of all outcomes correctly, 65% of all “goals scored” correctly, and 67% of all “goals allowed” correctly. So that’s the method I use in the predictor for goals scored/allowed.
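The "cutting the data with a hyperplane" idea can be illustrated with a bare-bones linear SVM. This is a sketch under stated assumptions, not the model in the predictor: the post does not say which kernel or settings were used, the data below are invented, the problem is reduced to two classes, and training uses plain batch subgradient descent on the regularised hinge loss rather than a library solver.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by batch subgradient descent on the L2-regularised hinge loss.

    X: list of feature tuples; y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 does x fall on?"""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Invented two-feature data: -1 and +1 stand in for two match outcomes.
X = [(-2, -2), (-3, -1), (-1, -3), (2, 2), (3, 1), (1, 3)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
print(w, b, accuracy)
```

Unlike k-means, the SVM is trained on the known outcomes, so the hyperplane is placed specifically to separate them.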

Plots are created using the “waffle” library in R, and the interactive data visualization is done in Shiny.



  1. I’m working on a method to use in-game stats to predict the skill rating, but that’s not a priority until the fall unless someone knows where I can find detailed women’s soccer stats in time for the World Cup.

Link Roundup: All Major European League Predictions 5/1/2015

Here are this week’s predictions. Some of the highlights are:

  • Chelsea has an ~80% chance of clinching the EPL title this weekend (66% to win, which clinches it outright; Arsenal has a 63% chance of winning against Hull, which combined with a Chelsea loss would keep them alive one more week).
  • Real Madrid is a slight underdog on the road against Sevilla (44% to win, 47% to lose)
  • Cagliari is a 44% favorite over Parma to help them in their relegation battle
  • Bayern Munich is a surprisingly big favorite on the road against Bayer Leverkusen (81% to win)

Links to each league below:


La Liga:

Serie A:


Bundesliga Week 31 Predictions

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Wolfsburg v. Hannover 96 | 51% | 36% | 11% | Wolfsburg 2 - Hannover 2 | No (2nd most likely)
Hoffenheim v. Dortmund | 15% | 37% | 46% | Hoffenheim 1 - Dortmund 1 | No (2nd most likely)
Schalke 04 v. VfB Stuttgart | 37% | 28% | 34% | Schalke 3 - Stuttgart 2 | Yes
Werder Bremen v. Eintracht | 25% | 30% | 45% | Werder 1 - Eintracht 0 | No (3rd most likely)
FC Augsburg v. FC Koln | 29% | 1% | 71% | Augsburg 0 - Koln 0 | No (3rd most likely)
SC Freiburg v. Paderborn | 34% | 47% | 17% | Freiburg 1 - Paderborn 2 | No (3rd most likely)
Bayer v. Bayern Munich | 5% | 14% | 81% | Bayer 2 - Bayern 0 | No (3rd most likely)
Mainz 05 v. Hamburger SV | 29% | 51% | 19% | Mainz 05 1 - Hamburger 2 | No (3rd most likely)
Hertha BSC v. Monchengladbach | 21% | 35% | 41% | |

Serie A Predictions Week 34

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Sampdoria v. Juventus | 5% | 26% | 68% | Sampdoria 0 - Juventus 1 | Yes
Sassuolo v. Palermo | 33% | 36% | 31% | Sassuolo 0 - Palermo 0 | Yes
Roma v. Genoa | 42% | 45% | 13% | Roma 2 - Genoa 0 | No (2nd most likely)
Verona v. Udinese | 44% | 33% | 23% | Verona 0 - Udinese 1 | No (3rd most likely)
Fiorentina v. Cesena | 76% | 18% | 6% | Fiorentina 3 - Cesena 1 | Yes
Inter Milan v. Chievo | 44% | 37% | 18% | Inter 0 - Chievo 0 | No (2nd most likely)
Atalanta v. Lazio | 4% | 37% | 60% | Atalanta 1 - Lazio 1 | No (2nd most likely)
Napoli v. Milan | 34% | 21% | 45% | |
Cagliari v. Parma | 44% | 32% | 24% | |
Torino v. Empoli | 53% | 25% | 21% | |

La Liga Predictions Week 35

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Real Sociedad v. Levante | 16% | 59% | 24% | Real Sociedad 3 - Levante 0 | No (3rd most likely)
Cordoba v. Barcelona | 9% | 27% | 63% | Cordoba 0 - Barcelona 8 | Yes
Atletico Madrid v. Athletic | 76% | 18% | 6% | Atletico 0 - Athletic 0 | No (2nd most likely)
Sevilla v. Real Madrid | 47% | 8% | 44% | Sevilla 2 - Real Madrid 3 | No (2nd most likely)
Deportivo v. Villareal | 21% | 59% | 19% | Deportivo 1 - Villareal 1 | Yes
Espanyol v. Rayo Vallecano | 24% | 27% | 48% | Espanyol 1 - Rayo 1 | No (2nd most likely)
Getafe CF v. Granada | 54% | 32% | 13% | |
Valencia v. Eibar | 65% | 28% | 7% | |
Malaga v. Elche | 61% | 13% | 25% | |
Almeria v. Celta Vigo | 56% | 7% | 36% | |

EPL Predictions Week 35

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Leicester City v. Newcastle | 13% | 14% | 72% | Leicester City 3 - Newcastle 0 | No (3rd most likely)
Swansea City v. Stoke City | 28% | 22% | 50% | Swansea City 2 - Stoke City 0 | No (2nd most likely)
Liverpool v. QPR | 75% | 6% | 18% | Liverpool 2 - QPR 1 | Yes
Sunderland v. Southampton | 9% | 47% | 43% | Sunderland 2 - Southampton 1 | No (3rd most likely)
West Ham v. Burnley FC | 37% | 42% | 20% | West Ham 1 - Burnley 0 | No (2nd most likely)
Aston Villa v. Everton | 32% | 13% | 54% | Aston Villa 3 - Everton 1 | No (2nd most likely)
Manchester United v. West Bromwich Albion | 31% | 49% | 19% | Man United 0 - WBA 1 | No (3rd most likely)
Chelsea v. Crystal Palace | 66% | 22% | 11% | Chelsea 1 - Crystal Palace 0 | Yes
Tottenham v. Manchester City | 29% | 16% | 55% | |
Hull City v. Arsenal | 5% | 32% | 63% | |

La Liga Mid-Week Predictions

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Barcelona v. Getafe | 87% | 9% | 4% | Barcelona 6 - Getafe 0 | Yes
Athletic v. Real Sociedad | 63% | 21% | 15% | Athletic 1 - Real Sociedad 1 | No (3rd most likely)
Levante v. Cordoba | 45% | 34% | 21% | Levante 1 - Cordoba 0 | Yes
Eibar v. Sevilla | 42% | 22% | 35% | Eibar 1 - Sevilla 3 | No (2nd most likely)
Celta Vigo v. Malaga | 19% | 35% | 45% | Celta Vigo 1 - Malaga 0 | No (3rd most likely)
Real Madrid v. Almeria | 93% | 4% | 4% | Madrid 3 - Almeria 0 | Yes
Elche v. Deportivo | 57% | 29% | 13% | Elche 4 - Deportivo 0 | Yes
Villareal v. Atletico Madrid | 20% | 61% | 19% | Villareal 0 - Atletico 1 | No (3rd most likely)
Rayo v. Valencia | 25% | 8% | 66% | Rayo 1 - Valencia 1 | No (3rd most likely)
Granada v. Espanyol | 13% | 37% | 48% | Granada 1 - Espanyol 2 | Yes
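The “Correct Prediction” column in these tables appears to record where the actual result ranked among the three forecast probabilities. The sketch below reproduces that reading. The rule is inferred from the tables rather than stated in the posts, and the function names are hypothetical.

```python
def outcome_rank(home_pct, draw_pct, away_pct, home_goals, away_goals):
    """Rank (1 = most likely) the forecast gave to the result that happened."""
    forecast = {"home": home_pct, "draw": draw_pct, "away": away_pct}
    if home_goals > away_goals:
        actual = "home"
    elif home_goals < away_goals:
        actual = "away"
    else:
        actual = "draw"
    # Order the three outcomes from most to least likely.
    ranked = sorted(forecast, key=forecast.get, reverse=True)
    return ranked.index(actual) + 1


def describe(rank):
    """Format a rank the way the tables do."""
    if rank == 1:
        return "Yes"
    return f"No ({'2nd' if rank == 2 else '3rd'} most likely)"


# Three rows taken from the La Liga mid-week table.
print(describe(outcome_rank(87, 9, 4, 6, 0)))    # Barcelona v. Getafe
print(describe(outcome_rank(42, 22, 35, 1, 3)))  # Eibar v. Sevilla
print(describe(outcome_rank(25, 8, 66, 1, 1)))   # Rayo v. Valencia
```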

Predictions of La Liga Winner: Pre-Week 33

I ran the simulations for the run-in for each of the top 3 teams in Spain right now, and Barcelona’s 2-point lead over Real Madrid translates into a 68% chance of finishing the season on top of La Liga. Madrid has a 17% chance of winning the league outright, and an 11% chance of finishing the season with the same number of points as Barcelona.¹

La Liga Week 33

Other interesting stats: Barcelona has about a 33% chance of winning the rest of their games and finishing with the max 93 points, while Madrid only has a 26% chance of winning out and finishing with 91. Atletico Madrid still has about a 3% chance of winning the league.

  1. La Liga’s tiebreaker differs from the typical “goal differential”: it looks at results between the tied teams first, and Madrid has a +1 goal differential over the two games.