Category Archives: Predictions

Week 25 EPL Model Comparisons

This is the fourth week of my in-season, results-based model (TAM), and I wanted to continue comparing its performance to MOTSON’s. My goal is, with enough data, to learn as much as I can about the advantages and disadvantages of the two approaches, how they might complement each other, and how I can improve my predictions for next season. You can see last week’s post here, and I’ll be updating every week. Below is a table with each model’s modal category and the actual result.

Game | MOTSON (Pre-Season) | TAM (In-Season) | Actual Result
Stoke City v. Everton | Stoke | Draw | Everton
Manchester City v. Leicester City | Man City | Man City | Leicester City
Southampton v. West Ham United | Southampton | Draw | Southampton
Bournemouth v. Arsenal | Arsenal | Arsenal | Arsenal
Chelsea v. Manchester United | Chelsea | Manchester United | Draw
Newcastle v. West Brom | Draw | Draw | Newcastle
Tottenham Hotspur v. Watford | Tottenham | Tottenham | Tottenham
Aston Villa v. Norwich City | Aston Villa | Draw | Aston Villa
Liverpool v. Sunderland | Liverpool | Liverpool | Draw
Swansea City v. Crystal Palace | Swansea | Draw | Draw

Not a great week for either model, and MOTSON wins again with a meager 4-3 (22-18 overall). That’s not surprising after such a hot week last week, but what can we learn? First, both MOTSON and the TAM picked Man City over Leicester, which is a testament to how strongly Leicester is over-performing. Even using this season’s results, the models were wrong: Leicester pulled out what turned out to be a fairly comfortable victory over Manchester City. We’re witnessing something special here, and no model seems able to capture exactly how special.

With Southampton regressing to the mean, MOTSON picked their game correctly while the TAM struggled with it, and MOTSON also correctly liked Aston Villa to win at home against Norwich City. The TAM, on the other hand, picked the Swansea v. Crystal Palace draw correctly.

I’m not seeing any patterns emerging (other than over-valuing Chelsea), but if anyone has hypotheses about where the TAM outperforms MOTSON, I’d appreciate you sharing them with me on Twitter (@Soccermetric). Check back soon for my Serie A and Bundesliga prediction diagnostics – the TAM’s results in those leagues weren’t as good as last week’s either.

The Math Behind Why Leicester Will Win The Title

After Week 25, MOTSON has Arsenal as a 67% favorite to win the EPL, with Leicester City and Manchester City behind at around 14-15% each. However, while I’m confident that City’s overall number is correct (relative to Arsenal), MOTSON has significantly underestimated Leicester’s performance this season, so the question is whether Leicester are in a better position than Arsenal for the run-in. I think they are, and I want to look at the numbers.

First things first: next week’s game is almost literally a title decider. The winner becomes MOTSON’s prohibitive favorite to win the league, while the loser becomes the favorite for 2nd (and could settle into 4th or even 5th with some bad luck). Here’s the heat map of the predictions:

Week 26 Arsenal v Leicester

The plots look similar, with Arsenal faring a little better on account of MOTSON’s belief that they’re a stronger team overall. But what is the likelihood that MOTSON gets it wrong and Leicester outperforms Arsenal in the run-in?

Leicester’s real opportunity here is that MOTSON, as optimistic as it was about Leicester in pre-season, still predicted they’d finish around 8th place. So all of its predictions for them are based on the question “How would the 8th-best team in the Premier League do?” Compare this to Arsenal, which MOTSON has listed as the title favorite for the entire season and predicts each match accordingly. Arsenal has very little room for error, including the Week 26 game against Leicester City, while Leicester can drop some points and still improve further on their expected points. Arsenal can only hold serve, while Leicester still has some tough, but realistic, opportunities to improve.

After Week 25, Arsenal is a five-point favorite in the Expected Final Table. Almost the entire difference comes from Week 26, where Arsenal is expected to take 2.63 points to Leicester’s 0.22. An Arsenal loss is therefore a swing of roughly 5.4 points (Arsenal drops 2.63 expected points while Leicester gains 2.78, the gap between a win’s 3 points and their 0.22 expectation), so it’s basically a must-win if Arsenal want to maintain their lead.
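For anyone who wants to replicate that back-of-the-envelope swing, here’s a minimal sketch in plain Python. It uses only the expected-points figures quoted above and the standard 3/1/0 point values; the variable names are mine, not anything MOTSON outputs.

```python
# Expected points for the Week 26 fixture, as quoted above.
arsenal_xp = 2.63
leicester_xp = 0.22

# Swing if Arsenal lose: Arsenal drop all of their expected points,
# while Leicester gain the gap between a win (3 points) and their expectation.
swing_if_arsenal_lose = (arsenal_xp - 0.0) + (3.0 - leicester_xp)  # ~5.41 points

# Swing on a draw: both sides take 1 point.
swing_if_draw = (arsenal_xp - 1.0) + (1.0 - leicester_xp)  # ~2.41 points

print(f"Swing on an Arsenal loss: {swing_if_arsenal_lose:.2f}")
print(f"Swing on a draw:          {swing_if_draw:.2f}")
```

The title chase for both teams depends on the run-in, so let’s look at the remainder of the games.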

Week | Opponent | Expected Points | Chance to Gain or Hold Serve
Week 27 | Norwich | 2.13 | Hold
Week 28 | West Brom | 1.98 | Hold
Week 29 | @ Watford | 1.46 | Gain
Week 30 | Newcastle | 2.02 | Hold
Week 31 | @ Crystal Palace | 0.72 | Gain
Week 32 | Southampton | 1.73 | Hold
Week 33 | @ Sunderland | 1.09 | Gain
Week 34 | West Ham | 1.97 | Hold
Week 35 | Swansea City | 1.97 | Hold
Week 36 | @ Manchester United | 0.50 | Gain
Week 37 | Everton | 1.98 | Hold
Week 38 | @ Chelsea | 0.27 | Gain

I’ve listed MOTSON’s Expected Points for each of the remaining games (after the Arsenal match), and subjectively labeled each as either a chance to gain some expected points or a game where Leicester needs to hold serve. Games where Leicester seems like an appropriate favorite require a win to hold serve, while games where they are less favored or an outright underdog are listed as a chance to gain. In 5 of the 12 remaining games MOTSON underestimates them, so they have a good opportunity to pick up points. In particular, the trip to Sunderland in Week 33 and the trip to Stamford Bridge in Week 38 are both big chances to earn significant points over what MOTSON expects. I’m not revising the model mid-season, but a qualitative look would expect Leicester to gain at least a few points. They’ll likely drop a couple in at least one of the “hold serve” games, but the Chelsea game is a perfect storm of over-estimating Chelsea and under-estimating Leicester, so that could be problematic for Arsenal.
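If you want to reproduce the table’s bookkeeping, a small sketch like the following works. The fixtures and expected points come straight from the table above, but the 1.5-point cutoff is my own ad hoc way of recovering the subjective Gain/Hold labels; it isn’t something MOTSON outputs.

```python
# Leicester's remaining fixtures with MOTSON's expected points (from the table above).
fixtures = [
    ("Week 27", "Norwich", 2.13), ("Week 28", "West Brom", 1.98),
    ("Week 29", "@ Watford", 1.46), ("Week 30", "Newcastle", 2.02),
    ("Week 31", "@ Crystal Palace", 0.72), ("Week 32", "Southampton", 1.73),
    ("Week 33", "@ Sunderland", 1.09), ("Week 34", "West Ham", 1.97),
    ("Week 35", "Swansea City", 1.97), ("Week 36", "@ Manchester United", 0.50),
    ("Week 37", "Everton", 1.98), ("Week 38", "@ Chelsea", 0.27),
]

GAIN_CUTOFF = 1.5  # assumed threshold separating "hold serve" games from chances to gain

for week, opponent, xp in fixtures:
    label = "Gain" if xp < GAIN_CUTOFF else "Hold"
    print(f"{week}: {opponent:<22} {xp:.2f} -> {label}")

total_xp = sum(xp for _, _, xp in fixtures)
print(f"Expected points from the run-in after Week 26: {total_xp:.2f}")  # ~17.82
```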

A win against Arsenal puts Leicester in the driver’s seat for MOTSON’s expected final table, but a loss isn’t as tragic for their title chances as the model thinks. I’ve learned enough about Arsenal fans to know they won’t get too comfortable, and they shouldn’t. They’ll still be the favorites, but there are plenty of chances for Leicester to pick up even more points vs. expectations in the last third of the season.






Week 24 EPL Model Comparisons

This is the third week of my in-season, results-based model (TAM), and I wanted to continue comparing its performance to MOTSON’s. My goal is, with enough data, to learn as much as I can about the advantages and disadvantages of the two approaches, how they might complement each other, and how I can improve my predictions for next season. You can see last week’s post here, and I’ll be updating every week. Below is a table with each model’s modal category and the actual result.

Game | MOTSON | TAM (In-Season) | Actual Result
Norwich v. Tottenham | Tottenham | Draw | Tottenham
West Ham v. Aston Villa | West Ham | West Ham | West Ham
Leicester City v. Liverpool | Leicester City/Draw (equal) | Leicester City | Leicester City
Crystal Palace v. Bournemouth | Crystal Palace | Bournemouth | Bournemouth
Arsenal v. Southampton | Arsenal | Arsenal | Draw
Sunderland v. Man City | Man City | Man City | Man City
Man United v. Stoke City | Man United | Draw | Man United
West Brom v. Swansea City | Draw | Draw | Draw
Watford v. Chelsea | Chelsea | Draw | Draw
Everton v. Newcastle | Everton | Everton | Everton

MOTSON did well this week, getting 7/10 games correct, and the TAM stepped up, also getting 7/10 correct. MOTSON missed on Crystal Palace, Arsenal, and Chelsea, while the TAM missed on Tottenham, Man United, and Arsenal. So what can we learn from these games?

Once again, MOTSON overestimates Chelsea. There’s not much to learn here because we’ve already learned this a bunch of times – I still don’t have any good ideas how I could have modeled this pre-season, and maybe this is just a statistical anomaly. Either way, it’s a known issue with the model that seems to be corrected fairly well by the in-season results model.

Both models missing on Arsenal is a bit surprising, and this may just be a legit low-probability event. If two disparate models predict the same outcome and both get it wrong, then that might be the explanation.

TAM’s misses on Tottenham and Man United are surprising. I would have picked both of those teams to be favorites, especially with Man United at home. Maybe Spurs has struggled to convert road fixtures to wins, which explains that prediction, but United drawing at home against Stoke is a weird one to me. Not sure why that is – worth thinking about more.

MOTSON’s miss on Crystal Palace v. Bournemouth is a tough one – I would have picked Palace to win, but TAM got this one right so apparently the data are better than my intuition and the pre-season model here. I don’t have any real insight here, but it’s important to take notice of this in case a pattern arises that would give me an opportunity to improve the model next year.

Overall MOTSON still leads with 18 correct picks over TAM’s 15. Both models had good weeks, and I’m curious to see if the TAM model improves as it gets more data while MOTSON keeps the same inputs from last year. In-season results did much better this week than the past two, so I’m curious to see if this is just because the results were more predictable this week or if it’s improving.


Is The EPL More Unpredictable Than Other Leagues? An (Early) Comparison of Models

Parental Discretion Advised: Over-generalizations from incredibly small sample sizes to follow.

Those of you who follow me on Twitter (which I assume is almost all of my readers, but if not you should follow me @Soccermetric) probably know I debuted prediction models for Serie A and the Bundesliga this weekend. They’re based on the TAM model (which needs a better nickname – I want to backronym NAGBE if possible), which is described in full here. The short version is that it looks at basically two variables: in-season results and goal differential. The math is a lot more complicated than that, but that’s the basic idea. The TAM model hasn’t done especially well through its first two weeks in the EPL, predicting 8/20 matches correctly – a meager 40%, barely better than the 33% we’d expect from flipping a three-sided coin.

However, it did much better in Serie A and the Bundesliga this weekend. It predicted 6/9 correct this week in the Bundesliga, or 67%. Here are the first week’s predictions:

Week 24 (19) Bundesliga Predictions

The errors were Wolfsburg, Hamburg, and Hertha Berlin. I’m incredibly pleased with 6/9, and am equally pleased with the 6/10 in Serie A (even though it failed to predict Milan’s win in the Derby).

Week 24 (22) Serie A TAM

In one week, the two new models each got 6 outcomes correct, which is almost as many as the same EPL model got correct in two weeks (MOTSON outperforms the simple model). This is obviously a small sample, and counting “correct” outcomes isn’t really the right way to measure this, but it’s evidence that maybe the EPL is just really difficult to predict (especially this season). I’m not planning on putting together a model for La Liga, but that one might be even easier: you’d be hard-pressed to put together a bad model as long as you started with Barcelona > Real Madrid = Atletico.

The success of a model depends on the difficulty of the task, and if the same model performs much better in other leagues, then we may have evidence that EPL results are harder to predict than those in the other major European leagues, and we should adjust our expectations of our models accordingly.
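As a rough sanity check on whether those early hit rates are distinguishable from noise, here’s a minimal sketch. The counts are just the ones quoted above, and the normal-approximation confidence intervals are my own quick-and-dirty choice, not part of the TAM.

```python
import math

# (correct, played) for the TAM so far, per the counts quoted above.
leagues = {"EPL": (8, 20), "Bundesliga": (6, 9), "Serie A": (6, 10)}

for league, (correct, played) in leagues.items():
    p = correct / played
    # Normal-approximation 95% CI; very rough at these sample sizes.
    half_width = 1.96 * math.sqrt(p * (1 - p) / played)
    lo, hi = max(0.0, p - half_width), min(1.0, p + half_width)
    print(f"{league}: {p:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Those intervals overlap heavily, which is consistent with the small-sample caveat above: suggestive, but nothing close to proof that the EPL is harder to predict.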






Week 23 EPL Model Comparisons (Plus Bonus Serie A Diagnostics)

Last week I posted my first set of comparisons between MOTSON (my predictive model calculated with pre-season data) and my TAM model (the model calculated using in-season results). I wanted to follow it up every week, so here are this week’s predictions and results:

Game | MOTSON | TAM | Actual
Norwich v. Liverpool | Liverpool | Draw | Liverpool
Manchester United v. Southampton | Manchester United | Manchester United | Southampton
Leicester City v. Stoke City | Leicester City | Leicester City | Leicester City
Watford v. Newcastle | Newcastle United | Watford | Watford
Crystal Palace v. Tottenham Hotspur | Palace/Draw (even) | Tottenham Hotspur | Tottenham Hotspur
Sunderland v. Bournemouth | Draw | Bournemouth | Draw
West Brom v. Aston Villa | Draw | West Brom | Draw
West Ham United v. Manchester City | Draw | Draw | Draw
Everton v. Swansea City | Everton | Everton | Swansea
Arsenal v. Chelsea | Arsenal | Arsenal | Chelsea

MOTSON got 5/10 games correct, predicting correct outcomes in the following games:1

  • Norwich v. Liverpool
  • Leicester v. Stoke
  • Sunderland v. Bournemouth
  • West Brom v. Villa
  • West Ham v. Man City

The TAM model only got 3/10 correct, predicting correct outcomes in:

  • Leicester v. Stoke
  • Watford v. Newcastle
  • Palace v. Spurs

I’m less than thrilled with 3/10, which is less than the 5/10 last week. Overall MOTSON is winning the prediction competition with 11 correct picks to 8 for the in-season form model. Two of the three TAM correct picks were ones MOTSON got wrong: Watford v. Newcastle and Palace v. Spurs. MOTSON has consistently underestimated Watford, so a home win from them isn’t entirely surprising. I’m not sure why MOTSON didn’t like Spurs more away against Palace – away fixtures are generally tough, but even based on last year’s form I would have thought Spurs would have fared better (although as I remember it their coefficient wasn’t *that* much above the mid-table pack at home last year so that may be what it’s thinking).

Quick note on the TAM’s Serie A predictions – it fared much better here, picking 5/10 correct. The 5 correct predictions were:

  • Juventus over Roma
  • Napoli over Sampdoria
  • Empoli drew Milan (bah)
  • Verona drew Genoa
  • Lazio over Chievo

The TAM is about as simple an in-season model as one can build, but it’s interesting to compare it against a model that knows nothing about 2015-2016, and to see that the (more sophisticated) pre-season model seems to be doing better so far.

  1. A quick note: Adding up the predicted probabilities for the most likely category had MOTSON getting 5.6 games correct this weekend, so predictions were pretty consistent with results.

Model Comparison: Pre-Season Predictions vs. In-Season Results Only

Warning: over-reading into a very small sample size ahead. I plan on re-visiting this topic over the coming weeks, but figured there was no reason not to start some quick discussion now. 

Regular readers will know that I calculated my predictions based on last season’s data, and have consciously not updated them since then because I wanted to let the model run for a season and see how it works. But as of a couple of weeks ago I decided I had enough data to at least start building a predictive model using the current season’s results, which you can read about here. I thought it was important to test the two models against each other to learn about the strengths and weaknesses of each. And like everything I’ve done, I think doing it publicly is important. Here are MOTSON’s predictions for this week.1

Week 22 Predictions

Using the quick diagnostic method of “Did the category with the highest predicted percentage happen?” MOTSON predicted 5/9 games correctly: Arsenal v. Stoke ended in a draw, Man City beat Crystal Palace, Aston Villa drew against Leicester, Spurs beat Sunderland, and Southampton beat West Brom. This is actually pretty solid in a week where there weren’t a lot of overwhelming favorites (and one of them was Chelsea…..). Now let’s see how the TAM model performed.

Week 22 TAM

A quick note on reading this image because I think it’s a little less intuitive: the zones are based on how difficult the away fixture is, and the circle is how strong the home team is. If the circle is in the red area, it means the model predicts a home win. If it’s in the grey zone, that means it predicts a draw, and if it’s in the blue zone that means it predicts an away win. The probabilities are a little more complicated, but the quick explanation is that the deeper the circle is into the red zone the more likely a home win is while the deeper it is in the blue zone the more likely an away win is.

The TAM model also predicted 5/9 correct: Chelsea v. Everton, Stoke v. Arsenal, Man City v. Crystal Palace, Liverpool v. Man United, and Spurs v. Sunderland. The overlap between the two is Stoke v. Arsenal, City v. Palace, and Spurs v. Sunderland. The differences were that MOTSON got Villa v. Leicester and Southampton v. West Brom right, while the TAM got Chelsea v. Everton and Liverpool v. United right.

Quick diagnostics: the TAM obviously has a better handle on how (not) good Chelsea is this season, unsurprising given the model inputs. Liverpool v. United isn’t much of a difference given that MOTSON had a 35% likelihood of a draw, only a few points lower than Liverpool’s 40% to win. Not a big miss, so I don’t think there’s a big advantage there.

MOTSON correctly predicting how Aston Villa would fare at home against Leicester City was potentially a real coup. The table has Leicester as a huge favorite over Aston Villa, but the 1-1 draw *may* have even been a little generous to Leicester. That being said, I don’t want to read too much into a single result that may have been a fluke. Southampton v. West Brom is another tough one: I still think of Southampton as a good team, as does MOTSON, but their form hasn’t really matched that. TAM recognizes this, having them drawing at home against West Brom, while MOTSON still thinks they’re a fairly good team. Southampton has started to come back up to expectations, sitting only 5 points under MOTSON’s predictions at this point, so MOTSON may have a better appreciation for their quality than the TAM does.

Like I said in the beginning, these were just some quick thoughts on the two models. I think there’s a more important question here too: do underlying statistics predict better than simple wins/losses? I don’t include “recent form” in my model because it actually hurt the model’s predictions in training data last season. Beyond modeling, I think humans dramatically overestimate the value of recent form in predictions, and it’s nice to test that empirically.

It’s obviously a simple model, but I think it raises an important question: if models using current-season data don’t predict any better than the pre-season predictions, what do we get from the ones that rely so heavily on in-season performance? If the pre-season models work, maybe they’re all we need and in-season form is overvalued. These are all empirical questions that deserve a more systematic exploration, and I’m hoping to do some of that here. I’m going to keep looking at these things over the coming weeks, but this is just the start of the process. Plenty of soccer left to be played, plenty of analyses to do.


  1. A note: I’m writing this on Sunday night before the Swansea v. Watford game so we don’t have a result there yet. When I refer to the denominator for this week being 9 games, this is why.

Regardless of Method, Prediction Models Basically Agree

A number of different blog posts comparing some of the different prediction models have been going around lately. If you haven’t seen them, you should check out Alexej Behnisch’s piece comparing various models (the post where I got data for this piece), and Constantinos Chappas’s work at StatsBomb doing a “poll of models.”

In his post, Alexej pointed out how many similarities there were between the different models’ predictions, and highlighted some of the major differences. However, something I’ve noticed in the past is that the middle of the table is basically one big tie, so predicting Southampton for 10th vs. 13th doesn’t necessarily mean there’s much of an actual difference between the models. My most recent heat map of probabilities might help illustrate what I mean:

Week 20-3 Heat Map

You can look at the brightness of the colors and almost see the clear boxes. The table has basically separated itself into five tiers: the top 2, the next 5 (contenders for Europe), the next 7 (mid-table malaise), the next 4 (mostly safe but with a chance of relegation), and the bottom 2 (getting ready for life in the Championship). One of the things Alexej’s article points out is the differences in 3rd-5th place predictions, but if you look at the probabilities in my model they’re roughly tied. The mid-week results could easily see those three completely switch, and switch again after the weekend’s fixtures.1

So the question I’m interested in is how similar the various models’ predictions are. In statistical terms, how well do the different models’ predicted point totals correlate with each other? The answer: incredibly highly. In fact, they correlate so highly that I checked and re-checked my analysis against the original data several times because I didn’t believe it. Here’s a plot of the data to show you how highly they correlate.

Correlation Major Models

To see the correlation between two models, line up the name on the bottom row and the name on the left side. The lower diagonal shows a scatterplot for the two models specified on the bottom and left sides, and all of the scatterplots show basically a 1:1 relationship with almost no variance from that line. This indicates a high correlation, and the number for each pair is given in the upper diagonal. The lowest correlation is between my model (Soccermetric) and Michael Caley’s (MC_of_A), and even that is 0.978, which is incredibly high. All of the major models here have basically the same predicted values for end-of-season points.
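For anyone curious about the mechanics, the comparison is just a pairwise correlation of each model’s projected end-of-season points. A minimal sketch, with made-up numbers standing in for the real projections:

```python
import pandas as pd

# Each column is one model's projected end-of-season points per team.
# These numbers are placeholders, not the actual projections.
projections = pd.DataFrame({
    "Soccermetric": [78, 70, 66, 64, 60],
    "MC_of_A":      [76, 71, 67, 62, 61],
    "Market":       [79, 69, 66, 63, 59],
}, index=["Arsenal", "Man City", "Leicester", "Spurs", "Man United"])

# Pearson correlation between every pair of models.
print(projections.corr())

# A scatterplot matrix is the visual version of the same comparison:
# pd.plotting.scatter_matrix(projections)
```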

With so little variation there’s not much else to be squeezed out of these data, but I wanted to present one more comparison because it’s a question that has been on my mind. There are basically two types of models out there – the pre-season prediction models and the in-season models.2 They use dramatically different data, so I’ve always been curious to see how the two types match up. I highlighted the cells where the models “match” (both in-season or both pre-season) in blue, and the cells with mis-matched models (one pre-season, one in-season) in red. The results are below.

Correlation matrix with matched model pairs highlighted in blue and mis-matched pairs in red

There’s certainly no statistically distinguishable relationship between the two groups. Mis-matched models (red cells) tend to have slightly lower correlations than the matched pairs (blue cells), but we’re looking at a comparison of 0.98 to 0.99 here, so I don’t think it’s worth drawing any conclusions from it. The similarity between all the models is striking to me, and at least by Week 20 the in-season models seem to have roughly the same predictions as the pre-season models. This may be my bias as a pre-season modeler, but in my mind that speaks highly of the pre-season models.

The moral of the story is that despite using such different inputs, these models produce roughly the same predictions, correlating at 0.97 or better. More importantly, they correlate with market expectations at a similar level, which means either that stats conform to the wisdom of crowds or that gamblers are listening to our models and betting accordingly. At the end of the day, look at the model inputs and pick the model you’re most comfortable with; whichever one you pick, you’ll see roughly the same outcomes. It speaks well of the type of work being done online, and it’s an exciting time to be following soccer analytics.





  1. This isn’t to criticize the piece – Alexej’s article makes the important point that there can be a huge difference in where a team finishes within these groups, and nowhere is this more true than in the 3rd-5th place spots.
  2. I didn’t know how to classify the EuroClubIndex because it uses both historical and current season data so I omitted it from this analysis. I considered market data as in-season data because they update every week with new information.

Does MOTSON Predict Better than a Former Footballer? Halfway Point Model Diagnostics

I’m probably breaking one of the cardinal rules of Soccer Analytics Twitter™ by publicly posting these things, but I’m a firm believer in transparency in my predictions and in sharing my model’s successes and opportunities for improvement. We don’t learn much from pretending that our models are always right, and the best way to learn is to be completely open about where analytics are effective and where they are less effective. So I wanted to present a few diagnostics and then a few thoughts afterward.

Overall the model is working well. The first thing I did was create a variable for MOTSON’s “most likely outcome.” This was simply done by looking at which of the three outcomes (Home Win, Away Win, Draw) had the highest predicted percentage. So if a model predicted 50% home win, 20% away win, and 30% draw, it was coded as a “predicted home win” for these first two tests.
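In code, that coding step is just an argmax over the three predicted probabilities. A minimal sketch (the function and variable names are mine, not MOTSON’s internals):

```python
def modal_outcome(p_home, p_away, p_draw):
    """Return the most likely category from a set of predicted probabilities."""
    probs = {"Home Win": p_home, "Away Win": p_away, "Draw": p_draw}
    return max(probs, key=probs.get)

# The example from the text: 50% home win, 20% away win, 30% draw.
print(modal_outcome(0.50, 0.20, 0.30))  # -> Home Win
```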

Question #1: How Many Does MOTSON Get Right?

The most important question is “Does MOTSON predict better than a former footballer?” The answer so far is yes – I compare the overall correct predictions to two separate “models.” The first is random chance, which would predict a three-outcome game correctly 1/3, or 33%, of the time. The second is what I call the “Home Team Naive” model, where someone predicts that the home team wins every game, which would be correct about 37% of the time this season. MOTSON gets it right 45% of the time, which is significantly different (p < 0.05 in a two-tailed, one-sample t-test) from both of these models.

Week 18 Diagnostics Comparison
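Here’s a sketch of that comparison, treating each match as a 1 (modal pick correct) or 0 (incorrect) and running the same one-sample t-test against the two baseline rates. The results array below is simulated for illustration, not the actual 180-game record.

```python
import numpy as np
from scipy import stats

# 1 = modal prediction matched the result, 0 = it didn't (illustrative data only).
rng = np.random.default_rng(0)
correct = rng.binomial(1, 0.45, size=180)

for name, baseline in [("random chance", 1 / 3), ("home-team naive", 0.37)]:
    t_stat, p_value = stats.ttest_1samp(correct, popmean=baseline)
    print(f"vs {name}: t = {t_stat:.2f}, p = {p_value:.4f}")
```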

So far so good, although I’d like to see it be “right” more of the time. However, it’s important to note that “right” really means getting the correct probabilities for each outcome rather than having the highest probability assigned to the actual outcome. Even if the model predicts a team has an 80% chance of winning, and it’s “correct,” we’d still expect to see another outcome 20% of the time. As of today, the model expects to pick about 53% of the games correctly, and the observed 45% falls about 1% outside the 95% confidence interval around that expectation. This means the model is somewhat under-performing, which is unsurprising given the two major outliers (Leicester City and Chelsea, which I’ll discuss later).
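The “expects to pick about 53%” figure comes from averaging the model’s own maximum probability across games. Here’s a sketch of one way to get both that expectation and a confidence band around it; the inputs are placeholders and the CI construction is my assumption, not necessarily the exact method behind the number above.

```python
import numpy as np

# One row per match: predicted probabilities for (home win, away win, draw).
# Placeholder values; the real inputs are MOTSON's 180 sets of predictions.
predictions = np.array([
    [0.50, 0.20, 0.30],
    [0.61, 0.14, 0.25],
    [0.35, 0.33, 0.32],
])

max_probs = predictions.max(axis=1)
expected_accuracy = max_probs.mean()             # what the model thinks it should get right
variance = (max_probs * (1 - max_probs)).sum()   # sum of per-game Bernoulli variances
ci_half_width = 1.96 * np.sqrt(variance) / len(max_probs)

print(f"Expected proportion correct: {expected_accuracy:.2f} +/- {ci_half_width:.2f}")
```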

Home, Away, or Draw?

The next thing I tested was which predictions were most likely to be correct: home wins, away wins, or draws. The model predicted a home win in 100/180 fixtures (about 56%), a draw in 45/180 (25%), and an away win in 35/180 (19%). Predicted home wins are quite a bit higher than the actual rate of home wins (about 37% as of today), but the average predicted probability for these games was more in line with historical values (47%). The disappearance of home field advantage this season is worth noting, and it’s a potential roadblock here. My model is in line with last season, but there are significantly fewer home wins this year, so either this is an anomaly or the model needs to be re-calibrated. As I’ve mentioned in the past, I’m letting the model run for a whole season, so I’ll re-train it at that point.

Week 18 Diagnostics by Category

This graph shows that MOTSON does well on predicted home wins: about 47% of them are correct. It does similarly well on predicted away wins, at about 45% correct. Several people have noticed that the model over-values draws, which is borne out by the fact that only 33% of predicted draws are correct. It definitely seems to be over-valuing draws right now.

#confidence: Prediction Error by Probability

As I said earlier, the most likely outcome isn’t necessarily the best way to do this type of analysis, so I also looked at outcomes by the certainty of the prediction. Basically, I’d expect the model to be “right” more frequently for predictions where it assigned a higher likelihood to the outcome than for predictions where it assigned a lower likelihood. If the model were perfectly calibrated, it should only be “right” 2/5 times when it predicts a 40% chance of a home win, but should be “right” 4/5 times when it predicts an 80% chance of a home win.

Week 18 Diagnostics

To test this, I “binned” the predictions into three categories based on the probability of the most likely outcome: low (0.3-0.5), medium (0.5-0.7), and high (0.7-1.0). Interestingly, the model performs very well on “medium” picks, getting about the proportion “correct” that I’d expect, around 0.54; the mean predicted probability in this bin was around 0.56, so 0.54 is really solid. “Low” and “high” are both lower than expected, “low” by about 0.07 and “high” by about 0.10. That “high” underperforms is definitely unsurprising, and I’d probably attribute it to Chelsea’s poor season. MOTSON really likes Chelsea at home against just about everyone, especially teams that were in the bottom half of the expected table. A few big misses there would hurt the model’s accuracy significantly.
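Here’s a sketch of that binning procedure with placeholder inputs; the bin edges match the ones in the text, and a well-calibrated model should show observed and expected rates roughly agreeing within each bin.

```python
import numpy as np

# Placeholder data: the model's maximum predicted probability per match,
# and whether the modal pick was correct (1) or not (0).
max_prob = np.array([0.42, 0.55, 0.48, 0.73, 0.61, 0.38, 0.81, 0.66])
correct = np.array([0, 1, 0, 1, 1, 0, 0, 1])

bins = {"low (0.3-0.5)": (0.3, 0.5), "medium (0.5-0.7)": (0.5, 0.7), "high (0.7-1.0)": (0.7, 1.0)}

for label, (low, high) in bins.items():
    mask = (max_prob >= low) & (max_prob < high)
    if mask.any():
        observed = correct[mask].mean()   # how often these picks actually hit
        expected = max_prob[mask].mean()  # how often a calibrated model should hit
        print(f"{label}: observed {observed:.2f} vs expected {expected:.2f} (n={mask.sum()})")
```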

Expected Points

The original goal of this project wasn’t to predict individual games, but to predict points over the course of a season. I post this graph on Twitter semi-regularly; it shows the deviation for each team from the points my model expected them to earn through the first 18 weeks.

Week 18 Deviation

First, the bad news. Not surprisingly, Chelsea and Leicester bookend this table. MOTSON originally picked Leicester City to finish 9th, which was significantly above most people’s expectations, but even given those high expectations they’ve significantly over-performed. Similarly, pre-season MOTSON had Chelsea in 2nd place, and they’re way below expectations. The only other pick I’d consider a “bad” pick for MOTSON is Watford here, who have performed considerably above expectations. Villa seems to have slightly turned the corner and I’d be shocked if they didn’t make up some of those lost points, and Swansea isn’t as bad as their numbers here seem to say so I’m expecting them to regress to the mean.

The good news: thirteen teams are within 4 points of their expected points as we approach the halfway point, which I’m very happy with. Even if individual predictions aren’t doing as well as I’d like, the aggregate predictions seem to be working out, which bodes well for the overall accuracy of the model.

Also good is the correlation between my expected points and actual points earned. Overall the model is at 0.53, which is in the good range, but if you exclude the two outliers it’s at a really strong 0.76.

Finally, the slope of the relationship between expected points and actual points is 1.0. This means that for every one-point increase in predicted points, teams earn one additional actual point. This is the relationship I want to see with this model, so it’s good to see that it has held up after 18 weeks.

Week 18 - Deviation Line
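The correlation and slope figures come from a straightforward linear fit of actual points on expected points. A minimal sketch, again with placeholder numbers rather than the real 20-team table:

```python
import numpy as np

# Placeholder data: expected points and actual points after 18 weeks.
expected = np.array([38, 33, 30, 29, 26, 24, 22, 20, 18, 15])
actual = np.array([36, 35, 31, 27, 27, 23, 24, 19, 17, 14])

correlation = np.corrcoef(expected, actual)[0, 1]
slope, intercept = np.polyfit(expected, actual, 1)

print(f"Correlation: {correlation:.2f}")
print(f"Slope: {slope:.2f} (1.0 means a point of expectation buys a point of reality)")
```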

Concluding Thoughts

Overall I’m happy with the model’s performance, especially given two significantly weird aspects to the season so far (the rise of Leicester/fall of Chelsea, and the disappearance of home field advantage). I’d be surprised if any pre-season model predicted Leicester/Chelsea, and honestly I don’t think anyone could have properly weighted home field advantage.

As has been discussed (far too much) on Twitter, the model does over-predict draws. I couldn’t disagree more with those who say the maximum probability for any given game to end in a draw is capped around 33-35%, but I do think the model probably over-predicts draws by about 10%.1 It’s also over-valuing home field advantage right now, so visitors aren’t getting nearly enough credit. It remains to be seen if this holds up over the course of a season, or if it’s some sort of anomaly over the first half that resolves itself over the next 20 weeks.

Another note on error: all the initial predictions were calculated with a “full-strength” squad. This is a hobby for me, and I’ve decided it’s far too much work to update the spreadsheets every week with the various injuries, so there will be some error there. Individual injuries tend not to make a big difference in model predictions, but this adds some noise that isn’t in the model naturally and is instead induced by incorrect inputs. I tend to think this balances out over the course of the season (as an example, I was talking to someone about Arsenal v. City, and City losing Kompany is roughly equal to Arsenal losing Coquelin), but in short samples this could be a source of added error.

Final thoughts: I’d encourage everyone who does any sort of statistical modeling to do a similar sort of open diagnostic of their models. I think the best way to move forward is to think about where we succeed and where we can improve, so I’d encourage the xG/xA modelers and game-prediction modelers to do something similar with their models. It’s not the easiest thing to do, especially for people who do this for money rather than as a hobby, but coming from an academic background I’m a firm believer that putting work out publicly and transparently for people to discuss is what you do.

  1. A note, I’m also completely over this debate in my mentions so don’t @ me on Twitter about it. I firmly believe the math is on my side and have explained myself enough. I’m over it at this point.

Frequently Asked Questions about MOTSON’s Predictions

I’ve been surprised by how well Twitter has embraced my model (MOTSON: “Model of the Same Old Nonsense”), and feel fortunate to have people interested in what was initially a fun side project for me. Because the same questions pop up in my mentions every game day, and to help new followers, I wanted to post a list of answers to the “frequently asked questions” I get.

What underlying stats do you use?

There’s a whole post about the method for anyone who is interested, but basically I use a number of offensive and defensive statistics from last season (2014-2015) combined with a “team strength” coefficient calculated from last year’s results.

Your model really likes Chelsea, what’s up with that?

The predictions I posted were all made based on 2014-2015 statistics, and more importantly, the “team strength” coefficient was calculated from a year in which Chelsea won the Premier League title. The model thinks they are good, especially at home, where they went undefeated. They are not this season, so my model has really struggled by over-valuing them.

Your model isn’t giving Leicester City enough credit this week, they’re way better than ______.

Just as with Chelsea, I’m using last year’s numbers. Leicester City have dramatically out-performed expectations this season. Unlike Chelsea, who would shock me if they climbed anywhere near their pre-season predictions, Leicester may still regress to the mean. We’ll see.

But Chelsea aren’t any good this year. Why don’t you update your predictions?

This was a personal decision – I’ve decided not to update the model throughout the season so I can see how it does. Creating a model based on recent results is a different challenge, and there are a number of ways to tackle it, but that’s not my goal here. Initially I aimed to create a model where I could measure individual player contributions, and to do that I wanted to predict season performance. The individual game predictions were a byproduct of this model, and they’ve become far more popular than I had imagined, but they’re more of a diagnostic of how my model is doing overall.

I could update the model with a more recent team strength coefficient, but I’m a big believer in letting the model run its course.

Why does your model like Arsenal so much?

One of the features of the SVM model is that I don’t know what individual variables make the model say what it says – I do know it predicts a pretty significant home field advantage. Other than that, it’s a black box that I can’t unpack.


Title Challenge Consequences of Arsenal v. Manchester City

This will be a short blog post, but I wanted to show the huge consequences of this weekend’s biggest game: Arsenal v. Manchester City. My model has Arsenal as a surprisingly big favorite here, giving City only a 14% chance to win at the Emirates.1 Clearly Arsenal has a lot more to lose here, with 2.05 expected points compared to City’s 0.70. Arsenal currently has about a 6-point lead over City at the top of the expected table, so even a draw would close that gap to ~4.5 points.

Week 17-0 Arsenal City

So what will happen to the title chase after Monday’s game? I ran the usual 10,000 simulated seasons based on an Arsenal win, a draw, and a City win, and looked at the results. Here they are:

Consequences of Arsenal v City

This game has huge implications for the title race. Currently I have Arsenal at about 74% to win the league, but if they lose they drop a net 4.3 points in the title race (losing their ~2 expected points while City picks up 2.3), which closes the 6-point gap to a little over a point. An Arsenal loss/City win gives Arsenal a 52% chance to win the league, bumping City up to 40%.

A tie hurts Arsenal as well, but not too badly. They’d be at 66.5% to win the league, while City would move up to 24.5%. Still big favorites, but with a little less breathing room at the top.

However, a win puts Arsenal over 80% to win the league, giving them an ~8-point cushion in the expected final table and dropping City’s chances to only 13%.
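For anyone curious what “the usual 10,000 simulated seasons” looks like mechanically, here’s a heavily simplified sketch. It tracks only Arsenal and City, uses made-up probabilities for the remaining fixtures, and fixes the Arsenal v. City result before simulating; the real model simulates all 20 teams from MOTSON’s per-match probabilities.

```python
import random

N_SIMS = 10_000

# Made-up (home win, draw, away win) probabilities for each team's remaining fixtures.
arsenal_fixtures = [(0.55, 0.25, 0.20)] * 21  # placeholder: ~21 games left after this one
city_fixtures = [(0.50, 0.27, 0.23)] * 21

def simulate_points(fixtures):
    """Draw one season's worth of results and return total points."""
    points = 0
    for p_win, p_draw, _ in fixtures:
        r = random.random()
        if r < p_win:
            points += 3
        elif r < p_win + p_draw:
            points += 1
    return points

def arsenal_ahead_rate(arsenal_game_points, city_game_points, arsenal_lead=6):
    """Fraction of simulations where Arsenal finish ahead of City, given a fixed head-to-head result."""
    wins = 0
    for _ in range(N_SIMS):
        arsenal_total = arsenal_lead + arsenal_game_points + simulate_points(arsenal_fixtures)
        city_total = city_game_points + simulate_points(city_fixtures)
        if arsenal_total > city_total:
            wins += 1
    return wins / N_SIMS

for label, (ars, cty) in {"Arsenal win": (3, 0), "Draw": (1, 1), "City win": (0, 3)}.items():
    print(f"{label}: Arsenal ahead of City in {arsenal_ahead_rate(ars, cty):.1%} of simulations")
```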

This is probably the biggest game of the season so far, and really borders on a must-win for City if they want to make a title challenge. However, a draw seems to suit both teams just fine so I wouldn’t be surprised if we see a 0-0 draw with no one taking too many chances and content to live to fight another day (while letting Leicester City presumably build on their lead at the top of the table).

  1. Personally that seems far too low to me, but I’m a City supporter so I may be biased here. City doesn’t have a great record recently on the road in big matches so the model may be smarter than me.