Warning: over-reading into a very small sample size ahead. I plan on re-visiting this topic over the coming weeks, but figured there was no reason not to start some quick discussion now.
Regular readers will know that I calculated my predictions based on last season’s data and have consciously not updated them since, because I wanted to let the model run for a season and see how it works. But as of a couple of weeks ago I decided I had enough data to at least start building a predictive model using the current season’s results, which you can read about here. I thought it was important to test the two models against each other and try to learn about the strengths and weaknesses of each. And like everything I’ve done, I think doing it publicly is important. Here are MOTSON’s predictions for this week.1
Using the quick diagnostic method of “Did the category with the highest predicted percentage happen?”, MOTSON predicted 5/9 games correctly: Arsenal v. Stoke ended in a draw, Man City beat Crystal Palace, Aston Villa drew against Leicester, Spurs beat Sunderland, and Southampton beat West Brom. That’s actually pretty solid in a week where there weren’t a lot of overwhelming favorites (and one of them was Chelsea…). Now let’s see how the TAM model performed.
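That quick diagnostic is just an argmax check: take the outcome with the highest predicted probability and see whether it happened. A minimal sketch of the idea, with made-up probabilities and results (the numbers below are hypothetical illustrations, not MOTSON’s actual outputs):

```python
# Sketch of the "highest predicted percentage" diagnostic.
# All probabilities and results here are hypothetical examples.

def argmax_outcome(probs):
    """Return the outcome ('H', 'D', or 'A') with the highest predicted probability."""
    return max(probs, key=probs.get)

def diagnostic_accuracy(predictions, results):
    """Count matches where the most likely predicted outcome actually occurred."""
    correct = sum(1 for match, probs in predictions.items()
                  if argmax_outcome(probs) == results[match])
    return correct, len(predictions)

predictions = {
    "Arsenal v. Stoke":   {"H": 0.30, "D": 0.40, "A": 0.30},  # draw most likely
    "Man City v. Palace": {"H": 0.60, "D": 0.25, "A": 0.15},  # home win most likely
}
results = {"Arsenal v. Stoke": "D", "Man City v. Palace": "H"}

correct, total = diagnostic_accuracy(predictions, results)
print(f"{correct}/{total} correct")  # -> 2/2 correct
```

It’s a blunt instrument (a 40% favorite that loses counts the same as a 90% favorite that loses), but it’s a fast first pass for comparing two models on the same slate of games.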
A quick note on reading this image, because I think it’s a little less intuitive: the zones are based on how difficult the away fixture is, and the circle is how strong the home team is. If the circle is in the red area, the model predicts a home win. If it’s in the grey zone, it predicts a draw, and if it’s in the blue zone it predicts an away win. The probabilities are a little more complicated, but the quick explanation is that the deeper the circle is into the red zone, the more likely a home win is, while the deeper it is into the blue zone, the more likely an away win is.
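Reading the plot amounts to comparing home-team strength against the away side’s “draw band.” A tiny sketch of that logic, with entirely hypothetical zone boundaries (the real TAM boundaries aren’t specified here):

```python
# Sketch of reading the TAM plot: classify a fixture by where the home-strength
# "circle" falls relative to the away team's zones. The boundary values used
# below are hypothetical, purely to illustrate the idea.

def classify(home_strength, draw_low, draw_high):
    """Map home strength against the away side's draw band to a predicted outcome."""
    if home_strength > draw_high:   # circle lands in the red zone
        return "home win"
    if home_strength < draw_low:    # circle lands in the blue zone
        return "away win"
    return "draw"                   # circle lands in the grey zone

print(classify(1.8, 0.9, 1.4))  # -> home win
print(classify(1.1, 0.9, 1.4))  # -> draw
```

The distance past either boundary is what stands in for confidence: the further the circle sits inside the red or blue zone, the stronger the predicted result.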
The TAM model also predicted 5/9 correctly: Chelsea v. Everton, Stoke v. Arsenal, Man City v. Crystal Palace, Liverpool v. Man United, and Spurs v. Sunderland. The overlap between the two is Stoke v. Arsenal, City v. Palace, and Spurs v. Sunderland. The differences: MOTSON got Villa v. Leicester and Southampton v. West Brom right, while the TAM got Chelsea v. Everton and Liverpool v. United right.
Quick diagnostics: the TAM obviously has a better handle on how (not) good Chelsea is this season, which is unsurprising given the model inputs. Liverpool v. United isn’t much of a difference: MOTSON had a 35% likelihood of a draw, only a few points lower than Liverpool’s 40% chance to win. Not a big miss, so I don’t think the TAM gains much of an advantage there.
MOTSON correctly predicting how strong Aston Villa would be at home against Leicester City was potentially a real coup. The table has Leicester as a huge favorite over Aston Villa, but the 1-1 draw *may* have even been a little generous to Leicester. That being said, I don’t want to read too much into a single result that may have been a fluke. Southampton v. West Brom is another tough one: I still think of Southampton as a good team, as does MOTSON, but their form hasn’t really matched that. The TAM recognizes this, having them drawing at home against West Brom, but MOTSON still thinks they’re a fairly good team. Southampton have started to come back up to expectations, sitting only 5 points under MOTSON’s predictions at this point, so MOTSON may have a better appreciation for their quality than the TAM does.
Like I said in the beginning, these were just some quick thoughts on the two models. I think there’s a more important question here too: do underlying statistics predict better than simple wins/losses? I don’t include “recent form” in my model because it actually hurt the model’s predictions in training data last season. Beyond modeling, I think humans dramatically overestimate the value of recent form in predictions, and it’s nice to test that empirically.
It’s obviously a simple model, but I think it’s also important to ask what models using current-season data add: if they don’t predict any better than the pre-season predictions, then what do we get from the ones that rely so heavily on in-season performance? If the pre-season models work, maybe they’re all we need and in-season form is overvalued. These are all empirical questions that deserve a more systematic exploration, and I’m hoping to do some of that here. I’m going to keep looking at these things over the coming weeks, but this was just the start of the process. Plenty of soccer left to be played, plenty of analyses to do.
- A note: I’m writing this on Sunday night before the Swansea v. Watford game, so we don’t have a result there yet. That’s why the denominator for this week is 9 games. ↩