Champions League Predictor

I’ve finally finished the interactive match predictor I’ve been working on for a couple of weeks now, just in time for the UEFA Champions League Final. You can test it out at:

https://chadmurphy.shinyapps.io/CL_Finals_Predictor

I wanted to post some of the details behind the program here for anyone interested in how it works. There are two steps to the process. The first is calculating the “skill” level of each of the two teams, based on the results of all games this season. The short version is that I apply a Generalized Partial Credit Model (GPCM), a member of the Rasch family of models, to all league results during the season to calculate each team’s “skill” rating.[1] Full details can be found at http://soccer.chadmurphy.org/predicting-late-season-outcomes-the-method/
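For readers unfamiliar with the GPCM, here is a minimal Python sketch of how the model turns a latent trait into probabilities for ordered categories. Treating a match result as the ordered categories loss < draw < win, using a single skill difference as the trait, and the parameter values in the example are all assumptions for illustration; the linked write-up describes the actual specification.

```python
import math


def gpcm_probs(theta, a, thresholds):
    """Category probabilities under a generalized partial credit model.

    P(X = k) is proportional to exp(sum_{v=1..k} a * (theta - b_v)),
    with category 0 getting an empty sum (0).
    theta: latent trait (here, hypothetically, one team's skill minus the other's)
    a: discrimination parameter
    thresholds: step difficulties b_1..b_m, giving categories 0..m
    """
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    # Normalize with a softmax; subtracting the max avoids overflow.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters: categories 0 = loss, 1 = draw, 2 = win.
print([round(p, 3) for p in gpcm_probs(theta=0.5, a=1.0, thresholds=[-0.3, 0.4])])
```

Raising theta shifts probability from "loss" toward "win", which is the property that makes an ordered-category model a natural fit for match results.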

The next step was to collect game stats, which I gathered from a variety of sources around the internet. I merged various offensive, defensive, and discipline statistics with each result, along with whether a team was at home, the skill levels, and goals scored/against in the game. I entered most of the EPL games into the dataset, splitting each game into two separate entries (stats for the home team and for the away team), and ended up with 696 observations × 31 variables in the final dataset.
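Splitting each game into one row per team can be sketched in Python as below. The `Game` record, its field names, and the single `shots` stat are hypothetical stand-ins for the roughly 31 real variables; the point is only that each row is written from one team's point of view, so "goals for" on one row is "goals against" on the other.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical fields; the real dataset carries many more stats per team.
    home_team: str
    away_team: str
    home_goals: int
    away_goals: int
    home_shots: int
    away_shots: int


def result(goals_for, goals_against):
    """Outcome from the perspective of the team that scored goals_for."""
    if goals_for > goals_against:
        return "win"
    if goals_for == goals_against:
        return "draw"
    return "loss"


def team_rows(g):
    """Split one game into two rows: the home team's view, then the away team's."""
    return [
        {"team": g.home_team, "opponent": g.away_team, "home": 1,
         "shots": g.home_shots, "goals_for": g.home_goals,
         "goals_against": g.away_goals, "result": result(g.home_goals, g.away_goals)},
        {"team": g.away_team, "opponent": g.home_team, "home": 0,
         "shots": g.away_shots, "goals_for": g.away_goals,
         "goals_against": g.home_goals, "result": result(g.away_goals, g.home_goals)},
    ]


rows = team_rows(Game("Home FC", "Away FC", 2, 1, 14, 9))
```

With two rows per game, 696 observations correspond to 348 games.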

The next step was to apply some machine learning algorithms to the data, so I started with something simple: k-means clustering. This method partitions the observations into k groups, or “clusters”, by repeatedly assigning each observation to the nearest cluster center and moving each center to the average of its members; observations can then be classified based on which cluster they belong to. The familiar two-dimensional cluster plot is only a projection of the data for display; the clustering itself uses all of the variables. The classic example is classifying different species of iris. Here’s an example of one of the plots you’ll see from a nicely differentiated k-means clustering application.

 

Source: http://things-about-r.tumblr.com/post/65925089613/dream-team-combining-tableau-and-r
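A bare-bones version of k-means (Lloyd's algorithm) in Python, plus one common way to turn clusters into predictions: name each cluster after its most frequent outcome. The post doesn't say what k was, how clusters were mapped to outcomes, or how variables were scaled, so those choices here are assumptions. Because k-means uses Euclidean distance, variables on very different scales usually need to be standardized first, and results can depend on the random starting centers.

```python
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. points: list of equal-length tuples. Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # start from k random data points

    def nearest(pt):
        # Index of the center with the smallest squared Euclidean distance to pt.
        return min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centers[c])))

    labels = []
    for _ in range(iters):
        labels = [nearest(pt) for pt in points]  # assignment step
        new_centers = []
        for c in range(k):  # update step: move each center to its members' mean
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels


def cluster_accuracy(labels, outcomes):
    """Name each cluster after its most common outcome; return the share of rows that name gets right."""
    majority = {
        c: Counter(o for lab, o in zip(labels, outcomes) if lab == c).most_common(1)[0][0]
        for c in set(labels)
    }
    hits = sum(majority[lab] == o for lab, o in zip(labels, outcomes))
    return hits / len(outcomes)
```

Note that the outcomes are never used to form the clusters; they only label the clusters afterward, which is why k-means can score poorly when the outcome classes don't line up with the natural groupings in the data.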

Here’s what happened when I ran k-means clustering on my data:

[Figure: k-means cluster plot of the match data]

As you can see, this wasn’t quite as clean as the canonical “iris” dataset. It also doesn’t predict nearly as well, classifying only 27% of the observations correctly. For a point of reference, if I had said “every team lost every game,” I would have predicted 38% correctly.
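The 38% reference point is a majority-class baseline. Because every game appears twice in the dataset (once from each side), every home win is paired with an away loss, so wins and losses always occur equally often: 38% losses means 38% wins as well, leaving roughly 24% draws. Guessing uniformly at random among the three outcomes would be right about one time in three on average, so 27% is below chance. A small Python sketch of both points (function names are hypothetical; the example scores are a few of the EPL Week 35 results listed further down):

```python
from collections import Counter


def outcomes_from_scores(scores):
    """Two rows per game (home view, then away view), mirroring the dataset layout."""
    rows = []
    for home_goals, away_goals in scores:
        for gf, ga in ((home_goals, away_goals), (away_goals, home_goals)):
            rows.append("win" if gf > ga else "draw" if gf == ga else "loss")
    return rows


def majority_baseline(outcomes):
    """Accuracy of always predicting the single most common outcome."""
    label, count = Counter(outcomes).most_common(1)[0]
    return label, count / len(outcomes)


rows = outcomes_from_scores([(3, 0), (2, 0), (2, 1), (1, 0), (0, 1)])
counts = Counter(rows)
print(counts["win"] == counts["loss"], majority_baseline(rows))
```

Since wins and losses are tied by construction, "every team lost" and "every team won" give exactly the same baseline.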

I also tried a couple of other things that I may write up here later, but in the interest of finishing this promptly, I settled on a Support Vector Machine (SVM) model. Unlike k-means, an SVM is trained on the known outcomes: instead of grouping similar games, it separates the outcome classes with hyperplanes in the full multi-dimensional space of variables, choosing the boundary that leaves the widest margin between classes. This method ended up predicting 78% of all outcomes correctly, 65% of all “goals scored” correctly, and 67% of all “goals allowed” correctly, so that’s the method I use in the predictor for goals scored/allowed.
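The post doesn't say which SVM implementation, kernel, or settings were used. As a rough illustration only, here is a linear two-class SVM trained with the Pegasos subgradient method, a swapped-in training algorithm that is not necessarily what the predictor uses. Win/draw/loss has three classes; libraries such as LIBSVM handle that by combining binary classifiers one-vs-one.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 is appended to each vector as a bias term (it is regularized too).
    """
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Regularization: shrink the weights toward zero.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge loss: points on the wrong side of, or inside, the margin push the hyperplane.
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Which side of the hyperplane w . [x, 1] = 0 the point falls on."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

The margin condition is what distinguishes an SVM from simply drawing any separating line: points comfortably on the correct side stop influencing the boundary.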

Plots are created using the “waffle” library in R, and the interactive data visualization is done in Shiny.
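A waffle chart shows a percentage as a share of a 10 × 10 grid of squares. The forecast percentages don't always sum to exactly 100 because of rounding (for example 29% / 1% / 71%), so some rounding rule is needed. This text-only Python sketch uses largest-remainder rounding; that rule and the W/D/L symbols are assumptions, not a description of how the R package handles it.

```python
def waffle_counts(percents, cells=100):
    """Allocate `cells` squares in proportion to `percents` (largest-remainder rounding)."""
    total = sum(percents)
    raw = [p * cells / total for p in percents]
    counts = [int(r) for r in raw]
    # Give the leftover squares to the parts with the largest fractional remainders.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: cells - sum(counts)]:
        counts[i] += 1
    return counts


def waffle_text(percents, symbols="WDL", width=10):
    """Render home win / draw / away win shares as rows of `width` characters."""
    squares = "".join(s * c for s, c in zip(symbols, waffle_counts(percents)))
    return "\n".join(squares[i:i + width] for i in range(0, len(squares), width))


print(waffle_text([66, 22, 11]))
```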

 

 

  1. I’m working on a method to use in-game stats to predict the skill rating, but that’s not a priority until the fall unless someone knows where I can find detailed women’s soccer stats in time for the World Cup.

Link Roundup: All Major European League Predictions 5/1/2015

Here are this week’s predictions. Some of the highlights are:

  • Chelsea has an ~80% chance of clinching the EPL title this weekend. A win (66%) clinches it outright; failing that, they still clinch unless Arsenal beats Hull (63%), since an Arsenal win combined with a Chelsea draw or loss would keep Arsenal alive one more week.
  • Real Madrid is a slight underdog on the road against Sevilla (44% to win, 47% to lose)
  • Cagliari is favored over Parma (44% to win, 24% to lose), a result that would help them in their relegation battle
  • Bayern Munich is a surprisingly big favorite on the road against Bayer Leverkusen (81% to win)
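The ~80% figure for Chelsea combines two matches: Chelsea clinch if they win, or if they don't win and Arsenal also fail to win. A minimal Python sketch of that arithmetic, assuming the two results are independent (the function name is hypothetical):

```python
def clinch_probability(p_chelsea_win, p_arsenal_win):
    """P(clinch) = P(Chelsea win) + P(Chelsea no win) * P(Arsenal no win), assuming independence."""
    return p_chelsea_win + (1 - p_chelsea_win) * (1 - p_arsenal_win)


print(round(clinch_probability(0.66, 0.63), 3))  # prints 0.786
```

That is 0.66 + 0.34 × 0.37 ≈ 0.79, which rounds to the ~80% quoted above.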

Links to each league below:

EPL: http://soccer.chadmurphy.org/uncategorized/epl-predictions-week-35/

La Liga: http://soccer.chadmurphy.org/uncategorized/la-liga-predictions-week-35/

Serie A: http://soccer.chadmurphy.org/uncategorized/serie-a-predictions-week-34/

Bundesliga: http://soccer.chadmurphy.org/uncategorized/bundesliga-week-31-predictions/

Bundesliga Week 31 Predictions

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Wolfsburg v. Hannover 96 | 51% | 36% | 11% | Wolfsburg 2 - Hannover 2 | No (2nd most likely)
Hoffenheim v. Dortmund | 15% | 37% | 46% | Hoffenheim 1 - Dortmund 1 | No (2nd most likely)
Schalke 04 v. VfB Stuttgart | 37% | 28% | 34% | Schalke 3 - Stuttgart 2 | Yes
Werder Bremen v. Eintracht | 25% | 30% | 45% | Werder 1 - Eintracht 0 | No (3rd most likely)
FC Augsburg v. FC Koln | 29% | 1% | 71% | Augsburg 0 - Koln 0 | No (3rd most likely)
SC Freiburg v. Paderborn | 34% | 47% | 17% | Freiburg 1 - Paderborn 2 | No (3rd most likely)
Bayer v. Bayern Munich | 5% | 14% | 81% | Bayer 2 - Bayern 0 | No (3rd most likely)
Mainz 05 v. Hamburger SV | 29% | 51% | 19% | Mainz 05 1 - Hamburger 2 | No (3rd most likely)
Hertha BSC v. Monchengladbach | 21% | 35% | 41% | |
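The “Correct Prediction” column ranks the actual result among the three forecast probabilities: “Yes” if it was the most likely outcome, otherwise its rank. A minimal Python sketch of that rule (the function name is hypothetical; ties, which don't occur in these tables, are resolved in favor of the better rank):

```python
def prediction_label(home_pct, draw_pct, away_pct, home_goals, away_goals):
    """Rank the actual result among the three forecasts, as in the tables' last column."""
    probs = {"home": home_pct, "draw": draw_pct, "away": away_pct}
    if home_goals > away_goals:
        actual = "home"
    elif home_goals == away_goals:
        actual = "draw"
    else:
        actual = "away"
    # Rank 1 = the outcome the model rated most likely.
    rank = sorted(probs.values(), reverse=True).index(probs[actual]) + 1
    if rank == 1:
        return "Yes"
    return f"No ({'2nd' if rank == 2 else '3rd'} most likely)"


print(prediction_label(51, 36, 11, 2, 2))  # Wolfsburg 2 - Hannover 2
```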

Serie A Predictions Week 34

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Sampdoria v. Juventus | 5% | 26% | 68% | Sampdoria 0 - Juventus 1 | Yes
Sassuolo v. Palermo | 33% | 36% | 31% | Sassuolo 0 - Palermo 0 | Yes
Roma v. Genoa | 42% | 45% | 13% | Roma 2 - Genoa 0 | No (2nd most likely)
Verona v. Udinese | 44% | 33% | 23% | Verona 0 - Udinese 1 | No (3rd most likely)
Fiorentina v. Cesena | 76% | 18% | 6% | Fiorentina 3 - Cesena 1 | Yes
Inter Milan v. Chievo | 44% | 37% | 18% | Inter 0 - Chievo 0 | No (2nd most likely)
Atalanta v. Lazio | 4% | 37% | 60% | Atalanta 1 - Lazio 1 | No (2nd most likely)
Napoli v. Milan | 34% | 21% | 45% | |
Cagliari v. Parma | 44% | 32% | 24% | |
Torino v. Empoli | 53% | 25% | 21% | |

La Liga Predictions Week 35

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Real Sociedad v. Levante | 16% | 59% | 24% | Real Sociedad 3 - Levante 0 | No (3rd most likely)
Cordoba v. Barcelona | 9% | 27% | 63% | Cordoba 0 - Barcelona 8 | Yes
Atletico Madrid v. Athletic | 76% | 18% | 6% | Atletico 0 - Athletic 0 | No (2nd most likely)
Sevilla v. Real Madrid | 47% | 8% | 44% | Sevilla 2 - Real Madrid 3 | No (2nd most likely)
Deportivo v. Villarreal | 21% | 59% | 19% | Deportivo 1 - Villarreal 1 | Yes
Espanyol v. Rayo Vallecano | 24% | 27% | 48% | Espanyol 1 - Rayo 1 | No (2nd most likely)
Getafe CF v. Granada | 54% | 32% | 13% | |
Valencia v. Eibar | 65% | 28% | 7% | |
Malaga v. Elche | 61% | 13% | 25% | |
Almeria v. Celta Vigo | 56% | 7% | 36% | |

EPL Predictions Week 35

Game | Home Team Win % | Draw % | Visiting Team Win % | Actual Outcome | Correct Prediction
Leicester City v. Newcastle | 13% | 14% | 72% | Leicester City 3 - Newcastle 0 | No (3rd most likely)
Swansea City v. Stoke City | 28% | 22% | 50% | Swansea City 2 - Stoke City 0 | No (2nd most likely)
Liverpool v. QPR | 75% | 6% | 18% | Liverpool 2 - QPR 1 | Yes
Sunderland v. Southampton | 9% | 47% | 43% | Sunderland 2 - Southampton 1 | No (3rd most likely)
West Ham v. Burnley FC | 37% | 42% | 20% | West Ham 1 - Burnley 0 | No (2nd most likely)
Aston Villa v. Everton | 32% | 13% | 54% | Aston Villa 3 - Everton 1 | No (2nd most likely)
Manchester United v. West Bromwich Albion | 31% | 49% | 19% | Man United 0 - WBA 1 | No (3rd most likely)
Chelsea v. Crystal Palace | 66% | 22% | 11% | Chelsea 1 - Crystal Palace 0 | Yes
Tottenham v. Manchester City | 29% | 16% | 55% | |
Hull City v. Arsenal | 5% | 32% | 63% | |