I’m probably breaking one of the cardinal rules of Soccer Analytics Twitter™ by publicly posting these things, but I’m a firm believer in transparency in my predictions and sharing my model’s successes and opportunities for improvement. We don’t learn much from pretending that our models are always right, and the best way to learn is to be completely open with where analytics are effective and where they are less effective. So I wanted to present a few diagnostics and then a few thoughts afterward.
Overall the model is working well. The first thing I did was create a variable for MOTSON’s “most likely outcome.” This was simply done by looking at which of the three outcomes (Home Win, Away Win, Draw) had the highest predicted percentage. So if a model predicted 50% home win, 20% away win, and 30% draw, it was coded as a “predicted home win” for these first two tests.
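As a rough sketch of that coding step (not MOTSON’s actual code), the “most likely outcome” is just the label with the largest predicted probability. The function name and dictionary layout here are assumptions made for illustration; the 50/20/30 split is the example from the paragraph above.

```python
def most_likely_outcome(probs):
    """Return the outcome label with the highest predicted probability."""
    return max(probs, key=probs.get)

# The example from the text: 50% home win, 20% away win, 30% draw.
example = {"home": 0.50, "away": 0.20, "draw": 0.30}
print(most_likely_outcome(example))  # home
```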
Question #1: How Many Does MOTSON Get Right?
The most important question is “Does MOTSON predict better than a former footballer?” The answer so far is yes – I compare the overall correct predictions to two separate “models.” The first is random chance, which would predict a three-outcome game correctly 1/3, or 33%, of the time. The second is what I call the “Home Team Naive” model, where someone predicts that the home team wins every game, which would be correct about 37% of the time this season. MOTSON gets it right 45% of the time, which is significantly different (p < 0.05 in a two-tailed, one-sample t-test) from these two models.
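For anyone who wants to check this kind of comparison by hand, here is a minimal sketch of a one-sample t statistic on 0/1 “correct pick” data. It assumes 180 fixtures and 81 correct picks (81 is inferred from the rounded 45%, so treat it as approximate), and the function name is hypothetical.

```python
import math

def one_sample_t(hits, n, baseline):
    """t statistic for the mean of n 0/1 results against a fixed baseline rate."""
    p_hat = hits / n
    # Sample variance of 0/1 data, with the n-1 (Bessel) correction.
    var = p_hat * (1 - p_hat) * n / (n - 1)
    se = math.sqrt(var / n)
    return (p_hat - baseline) / se

n, hits = 180, 81  # assumption: 81/180 = 45%, back-calculated from the text
t_random = one_sample_t(hits, n, 1 / 3)  # versus random chance
t_home = one_sample_t(hits, n, 0.37)     # versus "Home Team Naive"
print(round(t_random, 2), round(t_home, 2))
```

With 179 degrees of freedom the two-tailed 5% critical value is roughly 1.97, and both statistics come out above it under these assumed counts.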
So far so good, although I’d like to see it be “right” more of the time. However, it’s important to note that “right” means getting the correct probabilities for each outcome rather than having the highest probability assigned to the actual outcome. Even if the model predicts a team has an 80% chance of winning, if it’s “correct” we’d still expect to see another outcome 20% of the time. As of today, the model expects to pick about 53% of the games correctly, which is 1% outside of the 95% confidence interval for the average here. This means the model is somewhat under-performing, which is unsurprising given the two major outliers (Leicester City and Chelsea, which I’ll discuss later).
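To make the “expected” figure concrete: if the probabilities are right, the share of picks the model should get correct is simply the average of each fixture’s highest probability. The sketch below computes that, plus a normal-approximation 95% interval around the observed hit rate. The probabilities are made up, the 81/180 count is inferred from the rounded 45%, and the interval method is an assumption (the post doesn’t say which was used).

```python
import math

def expected_accuracy(top_probs):
    """Average of each fixture's highest predicted probability."""
    return sum(top_probs) / len(top_probs)

def accuracy_ci(hits, n, z=1.96):
    """Normal-approximation 95% interval around the observed hit rate."""
    p_hat = hits / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

top_probs = [0.50, 0.62, 0.41, 0.78, 0.45]  # hypothetical top probabilities
print(expected_accuracy(top_probs))

low, high = accuracy_ci(81, 180)  # assumption: 81 correct out of 180
print(round(low, 3), round(high, 3))
```

Under these assumptions the upper end of the interval lands a little above 52%, which squares with an expected 53% sitting about a point outside it.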
Home, Away, or Draw?
The next thing I tested was which predictions were most likely to be correct: home wins, away wins, or draws. The model predicted a home win in 100/180 fixtures (about 56%), a draw in 45/180 (25%), and an away win in 35/180 (19%). Predicted home wins are quite a bit higher than the actual outcomes (at about 37% as of today), but the average percentage for these predictions was a bit more in line with historical values (47%). The disappearance of home field advantage this season is worth noting, and it’s a potential roadblock here. My model is in line with last season, but there are significantly fewer home wins this year, so either this is an anomaly or the model needs to be re-calibrated. As I’ve mentioned in the past, I’m letting the model run for a whole season, so I’ll re-train it at that point.
This graph shows that MOTSON does well when it picks a home win: about 47% of predicted home wins are correct. It does similarly well for away wins, where about 45% of predictions are correct. Several people have noticed that the model over-values draws, which is borne out by the fact that only 33% of predicted draws are correct.
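The per-outcome numbers are hit rates grouped by what was predicted, not by what happened. A small sketch of that grouping, using made-up picks and results and a hypothetical function name:

```python
from collections import Counter

def hit_rate_by_pick(picks, results):
    """Share of correct picks, grouped by the outcome that was picked."""
    made = Counter(picks)
    correct = Counter(p for p, r in zip(picks, results) if p == r)
    return {outcome: correct[outcome] / made[outcome] for outcome in made}

# Hypothetical data: the model's pick and the actual result for six fixtures.
picks = ["home", "home", "draw", "away", "draw", "home"]
results = ["home", "away", "home", "away", "draw", "home"]
print(hit_rate_by_pick(picks, results))
```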
#confidence: Prediction Error by Probability
As I said earlier, most likely outcome isn’t necessarily the best way to do this type of analysis, so I also looked at outcome by certainty of the prediction. Basically, I’d expect it to be “right” more frequently for predictions where it had a higher likelihood of the outcome occurring than for predictions where it had a lower likelihood. If the model perfectly predicted, it should only be “right” 2/5 times if it predicts a 40% chance of a home team win, but should be “right” 4/5 times if it predicts an 80% chance of a home team win.
To test this, I “binned” the predictions into three categories based on the likelihood of the highest probability prediction: low (0.3-0.5), medium (0.5-0.7), and high (0.7-1.0). Interestingly, the model performs very well in “medium” picks, getting the proportion “correct” I’d expect it to, around 0.54. The mean proportion of this category was around 0.56, so 0.54 is really solid here. “Low” and “high” are both lower than expected, low by about 0.07 and “high” by about 0.10. That “high” is low is definitely unsurprising, and I’d probably attribute that to Chelsea’s poor season. MOTSON really likes Chelsea at home against just about everyone, especially teams who were in the bottom half of the expected table. A few big misses there would hurt the model’s accuracy significantly.
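The binning check can be sketched as follows: group picks by their top probability, then compare each bin’s mean predicted probability with the share of picks that were actually correct. The data is invented, the function name is hypothetical, and putting boundary values such as exactly 0.5 in the higher bin is an assumption, since the post doesn’t say which side they fall on.

```python
def calibration_by_bin(top_probs, correct, edges=(0.3, 0.5, 0.7, 1.0)):
    """Per bin: (mean predicted probability, observed hit rate, difference)."""
    labels = ("low", "medium", "high")
    bins = {label: [] for label in labels}
    for p, hit in zip(top_probs, correct):
        for label, lo, hi in zip(labels, edges, edges[1:]):
            # Bins are half-open [lo, hi); the last one also includes p == 1.0.
            if lo <= p < hi or (hi == edges[-1] and p == hi):
                bins[label].append((p, hit))
                break
    summary = {}
    for label, rows in bins.items():
        if rows:
            mean_p = sum(p for p, _ in rows) / len(rows)
            rate = sum(h for _, h in rows) / len(rows)
            summary[label] = (mean_p, rate, rate - mean_p)
    return summary

# Hypothetical top probabilities and whether each pick was correct (1) or not (0).
top_probs = [0.40, 0.45, 0.55, 0.60, 0.65, 0.80, 0.75]
correct = [1, 0, 1, 0, 1, 1, 0]
for label, (mean_p, rate, gap) in calibration_by_bin(top_probs, correct).items():
    print(label, round(mean_p, 3), round(rate, 3), round(gap, 3))
```

A negative gap means the bin was “right” less often than its probabilities implied, which is the pattern described above for “low” and “high.”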
The goal of this project originally wasn’t to predict individual games, but to predict points over the course of a season. I post this graph on Twitter semi-regularly, but this shows the deviation for each team from the points my model has expected them to earn through the first 18 weeks.
First, the bad news. Not surprisingly, Chelsea and Leicester bookend this table. MOTSON originally picked Leicester City to finish 9th, which was significantly above most people’s expectations, but even given those high expectations they’ve significantly over-performed. Similarly, pre-season MOTSON had Chelsea in 2nd place, and they’re way below expectations. The only other pick I’d consider a “bad” pick for MOTSON is Watford here, who have performed considerably above expectations. Villa seems to have slightly turned the corner and I’d be shocked if they didn’t make up some of those lost points, and Swansea isn’t as bad as their numbers here seem to say so I’m expecting them to regress to the mean.
The good news. Thirteen teams are within 4 points of their expected points as we approach the halfway point, which I’m very happy with. Even if individual predictions aren’t doing well, aggregate predictions seem to be working out well, which bodes well for the overall accuracy of the model.
Also good is the correlation between my expected points and actual points earned. Overall the model is at 0.53, which is in the good range, but if you exclude the two outliers it’s at a really strong 0.76.
Finally, the slope of the relationship between expected points and actual points is 1.0. This means that for every one point increase in predicted points, teams earn a one point increase in actual points. This is the relationship I want to see with this model, so it’s good to see that the relationship has held up after 18 weeks.
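Both the correlation and the slope come from the same sums of squares, so they are easy to compute together. This is a minimal sketch with invented points totals, not the real table, and it assumes a plain least-squares fit of actual points on expected points (the post doesn’t spell out the fitting method).

```python
import math

def correlation_and_slope(expected, actual):
    """Pearson correlation and least-squares slope of actual on expected."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(actual) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, actual))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in actual)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

# Hypothetical expected and actual points for five teams.
expected = [30.0, 25.0, 20.0, 18.0, 35.0]
actual = [28.0, 27.0, 17.0, 21.0, 33.0]
r, slope = correlation_and_slope(expected, actual)
print(round(r, 2), round(slope, 2))
```

A slope near 1.0 says each extra expected point shows up as roughly one extra actual point; the correlation says how tightly teams cluster around that line.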
Overall I’m happy with the model’s performance, especially given two significantly weird aspects to the season so far (the rise of Leicester/fall of Chelsea, and the disappearance of home field advantage). I’d be surprised if any pre-season model predicted Leicester/Chelsea, and honestly I don’t think anyone could have properly weighted home field advantage.
As has been discussed (far too much) on Twitter, the model does over-predict draws. I couldn’t disagree more with those who say the maximum probability for any given game to end in a draw is capped around 33-35%, but I do think the model probably over-predicts draws by about 10%.[1] It’s also over-valuing home field advantage right now, so visitors aren’t getting nearly enough credit. It remains to be seen if this holds up over the course of a season, or if it’s some sort of anomaly over the first half that resolves itself over the next 20 weeks.
Another note on error: all the initial predictions were calculated with a “full-strength” squad. This is a hobby for me, and I’ve decided it’s far too much work to update the spreadsheets every week with the various injuries, so there will be some error there. Individual injuries tend not to make a big difference in model predictions, but this is adding some noise that isn’t necessarily in the model naturally but is induced by incorrect inputs. I tend to think this balances out over the course of the season (as an example, I was talking to someone about Arsenal v. City and City losing Kompany is roughly equal to Arsenal losing Coquelin), but in short samples this could be a source of added error.
Final thoughts: I’d encourage everyone who does any sort of statistical modeling to do a similar sort of open diagnostic of your models. I think the best way to move forward is to think about where we succeed and where we can improve, so I’d encourage the xG/xA modelers and game prediction modelers to do something similar with their models. It’s not the easiest thing to do, especially for people who do this for money rather than as a hobby, but coming from an academic background I’m a firm believer that putting out work publicly and transparently for people to discuss is what you do.
[1] A note: I’m also completely over this debate in my mentions, so don’t @ me on Twitter about it. I firmly believe the math is on my side and have explained myself enough. I’m over it at this point.