Information Processing, Statistical Modeling, and “Expertise”

I recently wrote a guest post over at Scoreboard Journalism showing that statistical models are beating media experts pretty soundly in predicting the EPL table this year, and I saw a pattern in some of the feedback. A number of people suggested that modelers are beating the experts because the media folks didn’t burn a lot of calories on their predictions, while the modelers spent far more time and energy on the competition. There are a few issues with this that I won’t address here, but I did want to point out the fundamental flaw in the argument: “time spent” isn’t about how long it takes to fill out the form; it’s about how much time you spend taking in information about soccer.

For people who didn’t read the original post, I wanted to post the dataviz showing how the modelers are consistently beating the media. The vertical line represents the simple model of “everything is the same as last year”, a line which virtually no media experts or fans beat, while over 20 models are ahead of that point. If the experts can’t even beat the simple model, that brings their expertise into question.

Week 26 - Prediction Data

Even in the face of this evidence, a number of commenters weren’t satisfied. My best explanation is that statistical modelers are up against some serious motivated reasoning: the tendency to reject arguments that disagree with our preconceived notions regardless of the quality of the evidence, while accepting only those that fit what we already believe. But even if that’s not what is going on, I wholeheartedly reject the idea that modelers spend more time on their predictions. Even if they spent more conscious time building the model than the media experts spent on their picks, that’s not really how information processing works.1

Borrowing from psychology (and, more importantly for my background, political psychology), we don’t actively study most topics as if we’re going to take a college exam on them. We also take in far more information than our active memory can actually process and convert to long-term memory. As a way of coping with the overwhelming amount of information we encounter, we resort to something called “online information processing.” Basically, this means that we keep a running tally in our head of whether we think something is good or bad. We don’t know exactly why we feel the way we do, but we update our preferences with new information as it comes in.

This process works in soccer pretty simply: every time you see a team win you update their information and think they are better, and every time you see a team lose you update their information and think they are worse. Shocking results stand out more for people – Leicester City beating Manchester City made a lot of people update their belief on whether Leicester City could win the title. Injuries, transfers, coaching changes, etc., all add to the running tally in our head and help us update our beliefs. We’re not actively doing anything other than watching soccer or reading articles, but our subconscious mind is updating that running tally with every piece of information that we come across.
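To make the running tally concrete, here is a toy sketch in Python. It is only an illustration of the idea described above, not a model from the psychology literature and not anything I actually use: the function name `update_tally`, the one-point step size, and the `surprise` multiplier are all assumptions made up for the example.

```python
def update_tally(tally, team, result, surprise=1.0):
    """Return a new tally with `team` nudged up after a win, down after a loss.

    `surprise` is a hypothetical multiplier: shocking results move the
    tally more than expected ones.
    """
    step = {"win": 1.0, "draw": 0.0, "loss": -1.0}[result]
    updated = dict(tally)  # copy so the caller's tally is not mutated
    updated[team] = updated.get(team, 0.0) + step * surprise
    return updated


tally = {}
# An ordinary win moves the tally a little.
tally = update_tally(tally, "Leicester City", "win")
# An upset is weighted more heavily, for both the winner and the loser.
tally = update_tally(tally, "Leicester City", "win", surprise=3.0)
tally = update_tally(tally, "Manchester City", "loss", surprise=3.0)
print(tally)
```

Notice that the tally only stores the current score, not the reasons behind it, which is the point of online processing: we remember how we feel about a team, not every result that got us there.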

I read a lot of articles and tweets, watch a lot of games, and do all the things that media people do, but my model hasn’t changed based on any of this information. I found some data, scraped it, built the model, ran the script to calculate the final table positions, and haven’t touched the model since. If you don’t count the headaches of finding useful public soccer data, I spent less than 10 hours actually building it. That’s more time than the media people spent filling out their predictions and e-mailing them to Simon, but it’s far less time than is involved in online processing and building the running tally that contributed to media experts’ predictions.

Media experts do this for a living and are likely paid quite handsomely to follow soccer. My impression is that most of the modelers follow soccer as a hobby, albeit a fairly obsessive one for most of us. And even with that, we haven’t updated our models since the pre-season based on all of that information. My model was completed sometime in July, updated August 1 with new rosters, and I’ve been done with it since. Every minute people spend reading/watching/learning counts toward their subjective predictions: the running tally in your head is always being updated. The real issue is that statistical models are better at processing all of this information than we are. They update the running tally in a scientific way, weighting the data properly and ignoring irrelevant information better than our brains do.2 It’s not a matter of time; it’s a matter of the limits on our brains’ ability to process information correctly. MOTSON runs on my laptop in a few seconds, and can do the 10,000 simulated seasons in under a minute. My brain can’t do anything nearly that quickly or nearly that accurately, and that’s why the modelers are winning.
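For readers who haven’t seen a season simulation before, here is a minimal sketch of the general idea of simulating many seasons and averaging the results. To be clear, this is not MOTSON: the team names, the strength numbers, the flat 25% draw rate, and the helper names `simulate_season` and `average_points` are all hypothetical choices made for illustration, and home advantage is ignored entirely.

```python
import random
from itertools import permutations


def simulate_season(strengths, rng, draw_prob=0.25):
    """Play one double round-robin season and return points per team.

    Assumed toy match model: a flat chance of a draw, otherwise the win
    goes to a team with probability proportional to its strength.
    """
    points = {team: 0 for team in strengths}
    # permutations(..., 2) yields each ordered pair once, so every pair
    # of teams meets twice (once "home", once "away").
    for home, away in permutations(strengths, 2):
        if rng.random() < draw_prob:
            points[home] += 1
            points[away] += 1
            continue
        p_home = strengths[home] / (strengths[home] + strengths[away])
        if rng.random() < p_home:
            points[home] += 3
        else:
            points[away] += 3
    return points


def average_points(strengths, n_seasons=10_000, seed=0):
    """Average each team's points over many simulated seasons."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    totals = {team: 0 for team in strengths}
    for _ in range(n_seasons):
        for team, pts in simulate_season(strengths, rng).items():
            totals[team] += pts
    return {team: total / n_seasons for team, total in totals.items()}


# Hypothetical four-team league with made-up strength ratings.
strengths = {"Team A": 3.0, "Team B": 2.0, "Team C": 1.5, "Team D": 1.0}
expected = average_points(strengths)
for team, pts in sorted(expected.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {pts:.1f} points on average")
```

A real model would estimate the strengths from data rather than assume them, but the loop is the part worth noticing: it applies the same rule to every match in every simulated season, which no human forecaster can do.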






  1. Another issue is that media people knew their predictions would be made public, so I would think that knowledge would encourage them to try harder than simply throwing some numbers together without a lot of forethought.
  2. All of this assumes the model is built correctly, which is a big assumption.
