EPL Power Rankings: 2015-2016 Performance Adjusted for Strength of Schedule

I’ve written previously about how MOTSON relies on last season’s results for its predictions, but I wanted to do something with this season’s data now that we’re at the halfway point.

The model is simple: it’s a generalized partial credit model (GPCM), which is basically the same model used for standardized tests like the SAT or GRE. I’ve written in more detail about the approach in my initial post for my blog, but the basic idea is that I treat each team as a person taking a test, and each home game as a question on the test. If you win the game, you get full credit; if you draw, you get partial credit; and if you lose, you get no credit. GPCMs are good because they are agnostic as to history, payroll, Big Club™ status, or any of the other things that confuse regular human brains. All they know are the results that have occurred this season between the teams that have played against each other.

These models also do well with missing data, which matters because in a half-season each team has hosted only about half of the other teams. The GPCM fills in these gaps, adjusting each team’s strength for the difficulty of the fixtures it has played.

So based on home results, here is each team’s “Strength” coefficient.

Week 19 Power Rankings

Arsenal is head and shoulders above the rest of the league, dominating at home, well above #2 Leicester City. The odd results here are Crystal Palace, who is far under-performing at home, and Swansea, who is performing at home far above their position in the table. Their positions are basically reversed, which brings me to the second part of the equation: difficulty to beat on the road.

The strength coefficient in the previous graph shows how strong each team has been at home and how good they are at beating teams on the road. As stated earlier, this is the equivalent of the score a test-taker would earn. Now we turn to the difficulty of the question being asked, or the strength of teams on the road.

Week 19 Away Fixture Difficulty

The dark red point in this graph represents the strength coefficient needed for a 50% probability of the home team winning a game against the opposition (answering the question “correctly”). The blue point represents the minimum strength coefficient needed to secure a 50% probability of a draw (earning “partial credit”). Points to the left of the draw zone mean a higher likelihood of losing, points to the right of the draw zone mean a higher likelihood of winning based on results so far this season.
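To make the two thresholds concrete, here is a minimal sketch of how a GPCM turns a home team's strength and an away team's two difficulty thresholds into loss/draw/win probabilities. It uses the textbook GPCM parameterization; the function name, the `discrimination` parameter, and the example numbers are assumptions for illustration, not the fitted values behind the graphs. At a threshold, the two outcomes on either side of it are equally likely, which is the 50/50 point between them.

```python
import math

def gpcm_probs(theta, thresholds, discrimination=1.0):
    """Return [P(loss), P(draw), P(win)] for a home side of strength `theta`.

    `thresholds` is (draw_threshold, win_threshold) for the visiting side.
    Textbook GPCM form; all names here are illustrative assumptions.
    """
    # Category k gets the running sum of a * (theta - b_j) for j <= k;
    # the lowest category (a loss) has an empty sum of 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + discrimination * (theta - b))
    # Subtract the max before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical away side with draw threshold -0.5 and win threshold 0.5.
example = gpcm_probs(theta=0.5, thresholds=(-0.5, 0.5))
print([round(p, 3) for p in example])
```

With `theta` sitting exactly on the win threshold, the draw and win probabilities come out equal; raising `theta` further shifts weight toward the win.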

Interestingly, title favorites Arsenal and Manchester City are near the center of this table, seemingly weak on the road. However, when you compare their away difficulty coefficient to the home coefficients, only six teams (excluding themselves) would have a strong chance against them. This feels about right to me – the top 6 teams should have a good chance of beating title contenders at home, but beyond that it should be much more difficult.

On the other side, I’ve been skeptical until now, but Chelsea look to be at least a semi-legitimate relegation contender right now. They are the 6th weakest home team (a far cry from the undefeated season at Stamford Bridge last year), and everyone except for Aston Villa would be favored to take at least a point off of them at home. I knew they weren’t as good as MOTSON says, but this model has them in serious trouble unless things turn around. Maybe not relegation-level, but bottom 5 wouldn’t be surprising based on the eyeball test here.

Disclaimer: while these coefficients are accurate given results so far, we’re obviously at a very small sample size with a substantial amount of missing data that will be filled in over the next 19 weeks so these numbers could possibly change quite a bit. However, there’s a lot of logic in these numbers, and they match up at least somewhat well with my expected final table as of today. 

Also, one can calculate predicted probabilities for each outcome (and presumably expected points over the season) off of these models. I don’t know that I’m going to do that, but if there’s interest I can probably put it together in the next couple of weeks.

Does MOTSON Predict Better than a Former Footballer? Halfway Point Model Diagnostics

I’m probably breaking one of the cardinal rules of Soccer Analytics Twitter™ by publicly posting these things, but I’m a firm believer in transparency in my predictions and sharing my model’s successes and opportunities for improvement. We don’t learn much from pretending that our models are always right, and the best way to learn is to be completely open with where analytics are effective and where they are less effective. So I wanted to present a few diagnostics and then a few thoughts afterward.

Overall the model is working well. The first thing I did was create a variable for MOTSON’s “most likely outcome.” This was simply done by looking at which of the three outcomes (Home Win, Away Win, Draw) had the highest predicted percentage. So if a model predicted 50% home win, 20% away win, and 30% draw, it was coded as a “predicted home win” for these first two tests.
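The "most likely outcome" coding can be sketched in a few lines. The dictionary keys and function names are assumptions made for illustration; the probabilities are the example from the paragraph above.

```python
def most_likely_outcome(probs):
    """Return the outcome label with the highest predicted probability."""
    return max(probs, key=probs.get)

def share_correct(predictions, results):
    """Fraction of games where the most likely outcome matched the result."""
    hits = sum(
        1 for probs, actual in zip(predictions, results)
        if most_likely_outcome(probs) == actual
    )
    return hits / len(results)

# The example from the text: 50% home win, 20% away win, 30% draw.
example = {"home_win": 0.50, "away_win": 0.20, "draw": 0.30}
print(most_likely_outcome(example))
```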

Question #1: How Many Does MOTSON Get Right?

The most important question is “Does MOTSON predict better than a former footballer?” The answer so far is yes – I compare the overall correct predictions to two separate “models.” The first is random chance, which would predict a 3 outcome game correctly 1/3, or 33% of the time. The second is what I call the “Home Team Naive” model where someone predicts that the home team wins every game, which would be correct about 37% of the time this season. MOTSON gets it right 45% of the time, which is significantly different (p < 0.05 in a two-tailed, one sample t-test) from these two models.
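A rough check of the comparison above can be done with a one-sample t-test on 0/1 "correct pick" data. The counts below are assumptions: 180 fixtures (the figure used later in this post) and 81 correct picks (45% of 180). The p-value uses a normal approximation to the t distribution, which is close with this many games; this is a sketch, not the exact test that was run.

```python
import math
from statistics import NormalDist

def one_sample_t(hits, n, baseline):
    """t statistic and approximate two-tailed p-value for 0/1 data."""
    mean = hits / n
    # Sample variance of n zeros and ones containing `hits` ones.
    variance = mean * (1 - mean) * n / (n - 1)
    std_error = math.sqrt(variance / n)
    t_stat = (mean - baseline) / std_error
    # Normal approximation to the t distribution (assumption: large n).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

N_GAMES = 180  # assumption: fixture count quoted later in the post
HITS = 81      # assumption: 45% of 180

for name, baseline in [("random chance", 1 / 3), ("home team naive", 0.37)]:
    t_stat, p_value = one_sample_t(HITS, N_GAMES, baseline)
    print(f"{name}: t = {t_stat:.2f}, p = {p_value:.4f}")
```

Under these assumed counts both baselines come out below the 0.05 threshold, in line with the claim in the text.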

Week 18 Diagnostics Comparison

So far so good, although I’d like to see it be “right” more of the time. However, it’s important to note that “right” means getting the correct probabilities for each outcome rather than having the highest probability assigned to the actual outcome. Even if the model predicts a team has an 80% chance of winning, if it’s “correct” we’d still expect to see another outcome 20% of the time. As of today, the model expects to pick about 53% of the games correctly, which is 1% outside of the 95% confidence interval for the average here. This means the model is somewhat under-performing, which is unsurprising given the two major outliers (Leicester City and Chelsea, which I’ll discuss later).
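The "expected to pick correctly" figure and the confidence interval can be sketched as follows. For a well-calibrated model, the expected share of correct picks is the average of the top probability across games. The interval uses the same assumed counts as before (81 correct out of 180) and a normal approximation; the function names are illustrative.

```python
import math

def expected_accuracy(top_probs):
    """Expected share of correct picks if the probabilities are calibrated."""
    return sum(top_probs) / len(top_probs)

def accuracy_interval(hits, n, z=1.96):
    """Approximate 95% confidence interval for the observed hit rate."""
    rate = hits / n
    std_error = math.sqrt(rate * (1 - rate) / n)
    return rate - z * std_error, rate + z * std_error

# Assumption: 81 correct picks out of 180 fixtures (45%).
low, high = accuracy_interval(81, 180)
print(f"95% CI: {low:.3f} to {high:.3f}")
print("0.53 inside interval:", low <= 0.53 <= high)
```

Under these assumptions the upper bound lands just under 0.53, which matches the "about 1% outside" reading above.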

Home, Away, or Draw?

The next thing I tested was which predictions were most likely to be correct: home wins, away wins, or draws. In 100/180 fixtures, or about 56% of the fixtures, the model predicted a home win; 45/180, or 25%, were predicted draws; and 35/180, or 19%, were predicted away wins. Predicted home wins are quite a bit higher than the actual outcomes (at about 37% as of today), but the average percentage for these predictions was a bit more in line with historical values (47%). The disappearance of home field advantage this season is worth noting, and is a potential roadblock here. My model is in line with last season, but there are significantly fewer home wins this year, so either this is an anomaly or the model needs to be re-calibrated. As I’ve mentioned in the past, I’m letting the model run for a whole season so I’ll re-train it at that point.

Week 18 Diagnostics by Category

This graph shows that MOTSON does well when it predicts a home win: about 47% of predicted home wins are correct. It does similarly well for away wins, where about 45% of predictions are correct. Several people have noticed that the model over-values draws, which is borne out by the fact that only 33% of predicted draws are correct.

#confidence: Prediction Error by Probability

As I said earlier, most likely outcome isn’t necessarily the best way to do this type of analysis, so I also looked at outcome by certainty of the prediction. Basically, I’d expect it to be “right” more frequently for predictions where it had a higher likelihood of the outcome occurring than for predictions where it had a lower likelihood. If the model perfectly predicted, it should only be “right” 2/5 times if it predicts a 40% chance of a home team win, but should be “right” 4/5 times if it predicts an 80% chance of a home team win.

Week 18 Diagnostics

To test this, I “binned” the predictions into three categories based on the likelihood of the highest probability prediction: low (0.3-0.5), medium (0.5-0.7), and high (0.7-1.0). Interestingly, the model performs very well in “medium” picks, getting the proportion “correct” I’d expect it to, around 0.54. The mean proportion of this category was around 0.56, so 0.54 is really solid here. “Low” and “high” are both lower than expected, low by about 0.07 and “high” by about 0.10. That “high” is low is definitely unsurprising, and I’d probably attribute that to Chelsea’s poor season. MOTSON really likes Chelsea at home against just about everyone, especially teams who were in the bottom half of the expected table. A few big misses there would hurt the model’s accuracy significantly.
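The binning step can be sketched like this. The bin edges come from the text; which bin a probability of exactly 0.5 or 0.7 falls into is an assumption, and the sample picks are made-up data for illustration only, not MOTSON's predictions.

```python
from collections import defaultdict

def bin_label(top_prob):
    """Assign a pick to low / medium / high by its top probability."""
    if top_prob < 0.5:
        return "low"
    if top_prob < 0.7:
        return "medium"
    return "high"

def calibration_by_bin(picks):
    """Map each bin to (count, mean predicted probability, observed hit rate).

    `picks` is a list of (top_probability, was_correct) pairs.
    """
    grouped = defaultdict(list)
    for top_prob, correct in picks:
        grouped[bin_label(top_prob)].append((top_prob, correct))
    summary = {}
    for label, rows in grouped.items():
        count = len(rows)
        mean_prob = sum(p for p, _ in rows) / count
        hit_rate = sum(1 for _, c in rows if c) / count
        summary[label] = (count, mean_prob, hit_rate)
    return summary

# Hypothetical picks, purely to show the shape of the output.
sample_picks = [(0.45, True), (0.40, False), (0.55, True),
                (0.60, False), (0.80, True)]
for label, stats in calibration_by_bin(sample_picks).items():
    print(label, stats)
```

A bin is well calibrated when its hit rate is close to its mean predicted probability, which is the comparison made for the "medium" picks above.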

Expected Points

The goal of this project originally wasn’t to predict individual games, but to predict points over the course of a season. I post this graph on Twitter semi-regularly, but this shows the deviation for each team from the points my model has expected them to earn through the first 18 weeks.

Week 18 Deviation

First, the bad news. Not surprisingly, Chelsea and Leicester bookend this table. MOTSON originally picked Leicester City to finish 9th, which was significantly above most people’s expectations, but even given those high expectations they’ve significantly over-performed. Similarly, pre-season MOTSON had Chelsea in 2nd place, and they’re way below expectations. The only other pick I’d consider a “bad” pick for MOTSON is Watford here, who have performed considerably above expectations. Villa seems to have slightly turned the corner and I’d be shocked if they didn’t make up some of those lost points, and Swansea isn’t as bad as their numbers here seem to say so I’m expecting them to regress to the mean.

The good news. Thirteen teams are within 4 points of their expected points as we approach the halfway point, which I’m very happy with. Even if individual predictions aren’t doing well, aggregate predictions seem to be working out well, which bodes well for the overall accuracy of the model.

Also good is the correlation between my expected points and actual points earned. Overall the model is at 0.53, which is in the good range, but if you exclude the two outliers it’s at a really strong 0.76.

Finally, the slope of the relationship between expected points and actual points is 1.0. This means that for every one point increase in predicted points, teams earn a one point increase in actual points. This is the relationship I want to see with this model, so it’s good to see that the relationship has held up after 18 weeks.
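The expected-points diagnostics in this section boil down to three small calculations: expected points per game from the outcome probabilities, the correlation between expected and actual points, and the slope of a simple least-squares line. A minimal sketch follows; the function names are illustrative and the team totals are hypothetical numbers, not the model's output.

```python
import math

def expected_points(p_win, p_draw):
    """Expected points from one game: 3 for a win, 1 for a draw."""
    return 3 * p_win + p_draw

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(xs, ys):
    """Slope of the least-squares line predicting ys from xs."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical expected and actual points for four teams.
expected = [30.0, 24.5, 18.0, 27.0]
actual = [33, 22, 20, 26]
deviation = [a - e for a, e in zip(actual, expected)]
print("deviations:", deviation)
print("r:", round(pearson_r(expected, actual), 2))
print("slope:", round(ols_slope(expected, actual), 2))
```

A slope near 1.0 means an extra expected point translates into roughly one extra actual point, which is the relationship described above.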

Week 18 - Deviation Line

Concluding Thoughts

Overall I’m happy with the model’s performance, especially given two significantly weird aspects to the season so far (the rise of Leicester/fall of Chelsea, and the disappearance of home field advantage). I’d be surprised if any pre-season model predicted Leicester/Chelsea, and honestly I don’t think anyone could have properly weighted home field advantage.

As has been discussed (far too much) on Twitter, the model does over-predict draws. I couldn’t disagree more with those who say the maximum probability for any given game to end in a draw is capped around 33-35%, but I do think the model probably over-predicts draws by about 10%.¹ It’s also over-valuing home field advantage right now, so visitors aren’t getting nearly enough credit. It remains to be seen if this holds up over the course of a season, or if it’s some sort of anomaly over the first half that resolves itself over the next 20 weeks.

Another note on error: all the initial predictions were calculated with a “full-strength” squad. This is a hobby for me, and I’ve decided it’s far too much work to update the spreadsheets every week with the various injuries, so there will be some error there. Individual injuries tend not to make a big difference in model predictions, but this is adding some noise that isn’t necessarily in the model naturally but is induced by incorrect inputs. I tend to think this balances out over the course of the season (as an example, I was talking to someone about Arsenal v. City and City losing Kompany is roughly equal to Arsenal losing Coquelin), but in short samples this could be a source of added error.

Final thoughts: I’d encourage everyone who does any sort of statistical modeling to do a similar sort of open diagnostic of their models. I think the best way to move forward is to think about where we succeed and where we can improve, so I’d encourage the xG/xA modelers and game prediction modelers to do something similar with their models. It’s not the easiest thing to do, especially for people who do this for money rather than as a hobby, but coming from an academic background I’m a firm believer that putting out work publicly and transparently for people to discuss is what you do.

  1. A note, I’m also completely over this debate in my mentions so don’t @ me on Twitter about it. I firmly believe the math is on my side and have explained myself enough. I’m over it at this point.

Frequently Asked Questions about MOTSON’s Predictions

I’ve been surprised by how well Twitter has embraced my model (MOTSON: “Model of the Same Old Nonsense”), and feel fortunate to have people be interested in what was initially a fun side project for me. Because the same questions pop up in my mentions every game day and to help new followers, I wanted to post a list of answers to “frequently asked questions” I get.

What underlying stats do you use?

There’s a whole post about the method for anyone who is interested, but basically I use a number of offensive and defensive statistics from last season (2014-2015) combined with a “team strength” coefficient calculated from last year’s results.

Your model really likes Chelsea, what’s up with that?

The predictions I posted were all made based on 2014-2015 statistics, and more importantly the “team strength” coefficient was calculated on a year where Chelsea won the Premier League title. The model thinks they are good, especially at home where they were undefeated. They are not this season, so my model has really struggled with over-valuing them.

Your model isn’t giving Leicester City enough credit this week, they’re way better than ______.

Just like Chelsea, I’m using last year’s numbers. Leicester City have dramatically out-performed expectations this season. Unlike Chelsea, who would shock me if they improved anywhere near pre-season predictions, Leicester may still regress to the mean. We’ll see.

But Chelsea aren’t any good this year. Why don’t you update your predictions?

This was a personal decision – I’ve decided not to update the model throughout the season to see how it does. Creating a model based on recent results is a different challenge, and there are a number of ways to tackle it, but that’s not my goal here. Initially I aimed at creating a model where I could measure individual player contributions, and to do that I wanted to predict season performance. The individual game predictions were a byproduct of this model, and they’ve become far more popular than I had imagined, but they’re more of a diagnostic to how my model is doing overall.

I could update the model with a more recent team strength coefficient, but I’m a big believer in letting the model run its course.

Why does your model like Arsenal so much?

One of the features of the SVM model is that I don’t know what individual variables make the model say what it says – I do know it predicts a pretty significant home field advantage. Other than that, it’s a black box that I can’t unpack.



We Should Have Seen It Coming: Evaluating Jamie Vardy Against the EPL’s Elite Strikers

I did some updates to my interactive transfer evaluator – the stats are all the same (2014-2015 season stats) but I cleaned up some code and listed a few players under multiple positions to get it ready for the January window. I thought a good way to introduce it to my new followers was to show a demonstration with the EPL’s biggest story of the first half of the season: Jamie Vardy.

A quick discussion of the method behind the transfer evaluator: my model’s predictions are a function of player stats (aggregated to the team level) and a “team strength” coefficient. To evaluate how well a player would do on a new team, I remove the player currently in the position, subtracting his stats, and then substitute the new player, adding his stats. The model then recalculates the probabilities for all 38 games, adds the expected values up, and gives the new points. It’s a fun interactive app, and was a lot of fun to build so I’m pleased to see how much people have enjoyed it.¹
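The swap-and-recalculate procedure can be sketched as below. This only shows the bookkeeping: the stat names are hypothetical, and `predict` is a stand-in for whatever model returns win/draw/loss probabilities for a fixture. It is not the actual model behind the app.

```python
def swap_player(team_stats, outgoing, incoming):
    """Team totals with the outgoing player's stats removed and the
    incoming player's stats added. All arguments map stat name -> value."""
    keys = set(team_stats) | set(outgoing) | set(incoming)
    return {
        key: team_stats.get(key, 0) - outgoing.get(key, 0) + incoming.get(key, 0)
        for key in keys
    }

def season_points(team_stats, fixtures, predict):
    """Sum expected points over all fixtures.

    `predict(team_stats, fixture)` must return (p_win, p_draw, p_loss);
    it is a placeholder for the real prediction model.
    """
    total = 0.0
    for fixture in fixtures:
        p_win, p_draw, _p_loss = predict(team_stats, fixture)
        total += 3 * p_win + p_draw
    return total

def toy_predict(team_stats, fixture):
    """Toy stand-in model (assumption): more shots, slightly better odds."""
    p_win = min(0.9, 0.3 + team_stats.get("shots", 0) / 2000)
    p_draw = (1 - p_win) / 2
    return p_win, p_draw, 1 - p_win - p_draw

fixtures = list(range(38))  # placeholder for the 38 league games
before = {"shots": 400}
after = swap_player(before, outgoing={"shots": 80}, incoming={"shots": 100})
gain = season_points(after, fixtures, toy_predict) - season_points(before, fixtures, toy_predict)
print(f"points change: {gain:+.2f}")
```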

So I wanted to test what my model thought of Jamie Vardy compared to the EPL’s elite strikers and on some Big Clubs™. It found that many of these teams would have done well to sign Vardy last summer – the results are presented in the graph below.


Keep in mind all of this was done before this year happened. Vardy’s record-breaking goal streak isn’t included in this model, only the underlying stats from 2014-2015 (for strikers these are mostly shots, shots on target, shooting accuracy, and probably some passing stats). He’s an improvement over Wayne Rooney (+4 points for United), Christian Benteke (+3 for Liverpool), Olivier Giroud (+2 for Arsenal), and a small improvement over Romelu Lukaku at Everton. He’s also only a 1 point downgrade from Diego Costa and Sergio Aguero, and a 4 point downgrade from Harry Kane following his amazing season. Being an improvement over some great strikers, and basically break-even against world-class strikers like Costa and Aguero, is pretty remarkable, and one could easily argue that only being a 4 point downgrade from Harry Kane given his amazing season is strong as well.

The Benteke finding is the most interesting one to me: Vardy was reportedly available for somewhere around £15 million, while Liverpool reportedly paid over £30 million for Benteke to score 4 goals in 14 appearances so far. My model would have identified Vardy as a better signing, and maybe Liverpool would have been the beneficiaries of the purple patch Vardy (and Leicester City with him) have gone through this season.² With the margins being as thin as they are for the top 4, this knowledge could have been invaluable to Liverpool.

Other than Liverpool, my transfer evaluator shows him as a good backup for Aguero at Manchester City and Costa at Chelsea. City probably doesn’t need him with Bony as their #2 striker, but Chelsea would clearly have benefited from another striking option other than Loic Remy (especially given Jose Mourinho’s tactical nous that seemed limited to “We’re down a goal, replace a midfielder with a striker” for the first 15 games of the season or so). Distribution from Mahrez has been unbelievably important for Vardy, but Oscar/Willian/Hazard might all be having better seasons if they had Vardy to aim at.

The transfer evaluator clearly didn’t anticipate how big Vardy would have been this year, but it did recognize that he’s either an improvement or a decent replacement for many of the Premier League’s elite teams and did so before the season even started. Take it for a spin and see who your favorite team should sign for the second half of the season.

  1. I get 25 free “active hours” a month, and my Twitter followers used those up in 4 days this month. I’ve bought 500 hours to hold me through January 25 – hopefully that’s enough.
  2. This all assumes someone provides at the level Mahrez has this year, which is a big assumption. Even if he didn’t score as many goals, he’s still a 3 point upgrade over Benteke given the underlying stats. Although given the price, maybe Liverpool could have signed Mahrez and Vardy for the cost of Benteke.

Small Clubs Need Scouts (and Analysts) the Most

Following up on my previous post, “Big Clubs Need Scouts (and Analysts) the Most”, I want to make the case that small clubs also need scouts (and analysts) the most.

Small clubs have a couple of disadvantages compared to Big Clubs™ that make scouting/analytics more important. Specifically, small clubs have less money than their counterparts, so presumably they would have a harder time writing off a mistake. Manchester City can choose not to start Raheem Sterling, United can choose to bench Memphis Depay, but it’s much harder for a smaller club to justify benching a major summer signing. The opportunity cost is potentially higher for smaller clubs – signing a new striker in season 1 makes the justification for signing a replacement striker in season 2 much more difficult.

Additionally, there is less information out there about the players smaller clubs are trying to sign. There’s no shortage of opinions on which striker Manchester United should sign to replace Wayne Rooney: any number of a dozen options would likely work fairly well (although see my previous post for why this is a problem). While Aston Villa would likely benefit from signing Robert Lewandowski, he’s not a realistic transfer option for them. Analysts can find a list of potential targets that would be within reach for the club, and scouts can fill in all the blanks to pick the best option.

The consequences of a bad signing might be even higher for smaller clubs. If Manchester City isn’t happy with Raheem Sterling’s performance, they can play either Fabian Delph or Jesus Navas in his place. Maybe that puts them a step behind Arsenal, and they lose in the first knockout round of the UCL, but they’ll still comfortably make the top 4 next year. But a team like Newcastle is possibly one bad signing away from relegation. Each signing has greater pressure to be a success and fit into the squad, and a failure (combined with the opportunity cost of buying another player mentioned before) could have significant consequences.

I would think this is all self-evident, but teams will gladly spend millions of dollars on players while finding savings in the most important area: information. My undergrad American Government professor said something that always stuck with me: “Every decision is easy if you have the right information.” The goal is to gather the right information, and a proper budget for analysts and scouts can help do that. There are plenty of places teams can find efficiencies, but a proper analytics and scouting department will give you a positive ROI as well as success on the pitch.

Big Clubs Need Scouts (and Analysts) the Most

There’s a story going around Twitter today that Manchester United don’t have any full-time scouts. There’s an argument that when you’re buying the elite players, scouting might be less important because it’s much easier to identify the best in the world.¹ As an example, the rumor mill is talking about a triple-swoop for Neymar, Ronaldo, and Bale. I know this is just some combination of clickbait, boredom, and wishful thinking, but a team wouldn’t need to spend any money to know that those three would be better than the players United has at those positions. Also, when you have virtually unlimited transfer funds, you can afford to make mistakes. Manchester City spent a ridiculous amount of money on Raheem Sterling over the summer and then benched him for the first half of their biggest game of the season against Arsenal last weekend. Bad transfer choices hurt a lot less when you can sell for a loss and overpay for the next big player.

Here’s the problem with that: margins at the top of the table are incredibly narrow, and players who can improve top teams/not hurt top teams are few and far between. One of my first blogs here was questioning the Nicolas Otamendi signing at Manchester City, and as far as I can tell I was about the only one who didn’t like it at the time.  But in a close season like this one, buying the wrong central defender (I liked John Stones, who they probably could have bought for the same price they paid for Otamendi) would be the difference between 1st and 2nd place. People criticize Wenger’s transfer strategy (or non-transfer strategy), but his one move this summer turned out to be a good one while one of City’s major transfers was on the bench this weekend, and another looked as if he’d never held a defensive line before. Sterling and Otamendi are world-class players, but they looked like missed opportunities against Arsenal this weekend.

The gaps at the top of the game get more and more narrow, and more and more difficult to cross. The gap between the top 6 in the Premier League and the top 4 is small, but incredibly difficult to break (and takes a disastrous season from Chelsea for one new team to get in, likely for one season). Then the gap between the top 4 in England and the top 4 in Europe is similarly small, but much more difficult to break into, and the gap between #2 and #1 in Europe is even more difficult to crack.² Finding players who can bridge these gaps is incredibly difficult, and only teams that make all the right moves can attempt to break into a new class of teams. One or two bad decisions are the difference between Champions League and traveling a couple thousand miles to Eastern Europe on a Thursday night. Scouts (and analytics) are more valuable than ever when the margins are this thin.

Preview: tomorrow I’m going to write up “Why Small Clubs Need Scouts (and Analysts) the Most” because they probably need scouts equally, but for very different reasons that I think are worth talking about.

  1. This argument was made sarcastically in my timeline today, but I can picture it being made seriously
  2. Bayern Munich is virtually unbeatable in Germany and won the 2014 UCL title, but Barcelona was head and shoulders above them in their 2015 UCL semi-final. Bayern was likely head and shoulders above everyone else in Europe that season, etc…

Home Field Advantage has Disappeared from the EPL

I was going to start this post with a story, but I think the headline says it all: there is no such thing as Home Field Advantage in the Premier League this season.

This post was inspired by my eyeball test of MOTSON, my statistical prediction model. My initial reaction was that it was over-estimating the chances of home teams, as most of the “bad” results were from away wins. Inspired by a conversation with @11tegen11, I looked at the overall predictions of my model, and it predicts 47% home victories, which is in line with historical data (last season was ~45%). So if my model is well-calibrated, why does it seem to over-value home field advantage? Simple: home-field advantage doesn’t exist this season.

As of this morning, we had 167 fixtures played. The home team won 62 times, drew 46 times, and lost 59 times. First cut at the evidence: the number of wins is basically equal to the number of losses. The home team wins almost as frequently as they lose.

The next thing I did was look at points-per-game (PPG). Home teams earn 1.39 PPG, while away teams earn 1.33 PPG. These numbers look basically the same, and a statistical test confirms that they are indistinguishable from each other (p = 0.705).

The next step was to compare this to last year. A quick look at last year’s results shows the home team won ~45% of the time, drew 25% of the time, and lost ~30% of the time. This comes out to 1.6 PPG for the home team. So again, I ran a statistical test and found that this number is statistically distinguishable from this season’s results (p < 0.05).
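The points-per-game arithmetic in this and the previous paragraphs is easy to reproduce from the win/draw/loss counts quoted above. A short sketch (function name illustrative): home PPG comes out at about 1.39, the away figure lands within a rounding step of the quoted 1.33, and last season's approximate rates give 1.6.

```python
def ppg(wins, draws, losses):
    """Points per game with 3 points for a win and 1 for a draw."""
    games = wins + draws + losses
    return (3 * wins + draws) / games

# This season so far: home sides won 62, drew 46, lost 59 of 167 fixtures.
home_ppg = ppg(62, 46, 59)
# Away sides have the mirror-image record.
away_ppg = ppg(59, 46, 62)
# Last season, from the approximate rates quoted (45% wins, 25% draws).
last_season_home_ppg = 3 * 0.45 + 1 * 0.25

print(f"home: {home_ppg:.3f}, away: {away_ppg:.3f}, "
      f"last season home: {last_season_home_ppg:.2f}")
```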

Week 17 Barplot Home Field

As the barplot shows, the difference between Home PPG in 2015-16 and 2014-15 is outside the 95% confidence interval, while Away PPG and the PPG where win, lose, and draw are assigned equal likelihood (1.32) are well within the range. In statistical terms, we can say that Home PPG 2015-16 is distinguishable from 2014-15, but not from the other two categories.

The final test was to check whether a randomly generated series of results was statistically distinguishable from this season. So I simulated 10,000 “seasons” up to this point with an equal probability of win, lose, and draw for each fixture. I then calculated the PPG for each of these seasons, and compared those randomly generated seasons to the current one. Of the 10,000 simulated seasons, only 140 were statistically distinguishable from the current one.
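A simulation along these lines can be sketched with the standard library. The post does not say which test was used to label a simulated season "distinguishable", so this sketch does not try to reproduce the 140 figure; it only generates the equal-probability seasons and shows where the observed home PPG falls among them. The seed and function name are arbitrary choices.

```python
import random

def simulate_home_ppg(n_games, n_seasons, seed=0):
    """Home PPG for simulated seasons where win, draw, and loss are equally likely."""
    rng = random.Random(seed)
    seasons = []
    for _ in range(n_seasons):
        # Draw home points for every fixture: 3 (win), 1 (draw), 0 (loss).
        points = rng.choices((3, 1, 0), k=n_games)
        seasons.append(sum(points) / n_games)
    return seasons

simulated = simulate_home_ppg(n_games=167, n_seasons=10_000)
observed = (3 * 62 + 46) / 167  # home record so far: 62 wins, 46 draws

mean_simulated = sum(simulated) / len(simulated)
share_at_least_observed = sum(s >= observed for s in simulated) / len(simulated)
print(f"mean simulated PPG: {mean_simulated:.3f}")
print(f"share of simulated seasons at or above observed: {share_at_least_observed:.3f}")
```

If the observed value sits comfortably inside the bulk of the simulated distribution, that is consistent with the equal-probability story told here.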

Week 17 Home Field Advantage

The blue highlighted part of the graph represents seasons that are indistinguishable from this one, while the much smaller red part represents seasons that are statistically distinguishable from this one. This means that if the “true” probability of winning at home was equal to the probability of losing at home, we’d randomly see a season that didn’t look like this one only 1.4% of the time. Simply said, the trend documented in this piece by Oliver Roeder and James Curley has continued and home field advantage in the EPL has disappeared this season.

I have no way of saying whether this season is an anomaly or whether the current trend will continue. It is significantly different than last year, but given the trend it’s hard to say whether last year was the outlier or this one was. Only time will tell, but what we can say is that home field advantage is worth significantly less than it was last year. That is to say it had value last year, and we can’t tell any effect at all this season.

Author’s note: I aimed the statistical discussion here at a general audience with some knowledge of principles of stats but without knowledge of the different kinds of t-tests. Replication R code and a more technical version will be available as soon as I get around to writing it.


Thoughts on Parsimony v. Accuracy

My model has what a lot of people are considering an odd prediction, heavily favoring Arsenal over Manchester City at the Emirates this week. I’ve had a few people ask “Why?” and I’ve pointed them to my blog post about the model being a “black box”. I can’t point to any reason why it likes Arsenal so much, although it does like teams with a home field advantage and has a math-crush on Theo Walcott (calling him “Europe’s most valuable striker”).  James Yorke of Stats Bomb pushed back on this a little bit:

James is a smart guy and a great writer that you all should follow if you’re not already, so I wanted to write a longer form post talking about why what I’m doing is important for soccer analytics and why I choose a “black box” model over a typical regression format with clear coefficients and tests of statistical significance.

I wrote about it in more detail in another post, but I’m a firm believer in using the “right” model rather than the convenient one in most cases.¹ The SVM assumes no functional form for the individual variables, and we have no idea what the correct functional form is for things like possession or number of passes so we shouldn’t be doing any sort of linear regression on these variables. If the trade-off is not being able to say “Arsenal makes 37 more passes a game than Manchester City, therefore they’re more likely to win” then I’m ok with that.²

But more importantly, I think my model gets at things that much of the rest of the community isn’t getting at. Expected Goals and Assists are interesting and useful, but I think if we force everyone into that sort of analysis and language, then we’re really limiting ourselves. They’re good because they’re easily observable events, and goals are the…well “goal” of every team, and assists are the one-off event (the one immediately before the goal). However, they’re such a small part of what actually happens on the pitch, and there’s a little bit of the “Drunkard’s Search” going on if we limit ourselves there.

My model looks at the entire game and the entire universe of (publicly available) statistics, and makes no judgment on what is important. It likes tackles, headers, and other defensive actions quite a bit, which are generally thought to be unusable by most of the analytics community. While xG and xA models, and all other models I’m aware of, would prefer Lionel Messi to Fernandinho in a holding midfielder role for Manchester City, mine recognizes that he’d be a downgrade there.


Models that only look at offense and observable outcomes don’t get this right, but intuitively I think we can agree that this is true. We know limiting xG is important, but we don’t know how that happens. My model seems to understand that, and even though it doesn’t have a great answer for “why” it does lead to potential explanations and hopefully some testable hypotheses. Limiting ourselves to one way of thinking is ultimately going to leave the soccer analytics world stagnant, and there is value in multiple methods and multiple approaches. Parsimony is good, but we shouldn’t exclude complexity that generates insights into the game because it doesn’t fit well into 140 characters.


  1. If the convenient model gives roughly the same result, then go with the convenient one, but I’m a big believer that accuracy should never be second to parsimony.
  2. Reasonable people can disagree on this point, as I’ve written before.

Title Challenge Consequences of Arsenal v. Manchester City

This will be a short blog post, but I wanted to show the huge consequences of this weekend’s biggest game: Arsenal v. Manchester City. My model has Arsenal as a surprisingly big favorite here, giving City only a 14% chance to win at the Emirates.1 Clearly Arsenal has a lot more to lose here, with 2.05 expected points compared to City’s 0.70 expected points. Arsenal currently has about a 6 point lead at the top of the expected table over City, so even a draw would close that gap to ~4.5 points.

Week 17-0 Arsenal City

So what will happen to the title chase after Monday’s game? I ran the usual 10,000 simulated seasons based on an Arsenal win, a draw, and a City win, and looked at the results. Here they are:

Consequences of Arsenal v City
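For readers curious how "simulated seasons conditional on a result" can work, here is a minimal sketch. It is not the actual model behind these numbers: the three-team league, the points totals, and the match probabilities are all made up for illustration, and `simulate_title_odds` is a hypothetical helper. The idea is simply to lock in one result for the big game, simulate every remaining fixture many times, and count how often each team finishes top.

```python
import random

def simulate_title_odds(points, fixtures, n_sims=10_000, seed=0):
    """Share of simulated seasons in which each team finishes first.

    points:   dict of team -> points already banked
    fixtures: list of (home, away, p_home_win, p_draw) for remaining games
    """
    rng = random.Random(seed)
    titles = {team: 0 for team in points}
    for _ in range(n_sims):
        table = dict(points)
        for home, away, p_home, p_draw in fixtures:
            r = rng.random()
            if r < p_home:                 # home win
                table[home] += 3
            elif r < p_home + p_draw:      # draw
                table[home] += 1
                table[away] += 1
            else:                          # away win
                table[away] += 3
        best = max(table.values())
        leaders = [t for t, pts in table.items() if pts == best]
        titles[rng.choice(leaders)] += 1   # ties on points broken at random
    return {team: wins / n_sims for team, wins in titles.items()}

# Hypothetical toy league (not real data).
current = {"Leader": 36, "Chaser": 30, "Other": 20}
remaining = [
    ("Leader", "Other", 0.50, 0.25),
    ("Other", "Leader", 0.30, 0.30),
    ("Chaser", "Other", 0.50, 0.25),
    ("Other", "Chaser", 0.30, 0.30),
]
# Points awarded by each possible result of the head-to-head game.
head_to_head = {
    "leader_win": {"Leader": 3, "Chaser": 0},
    "draw": {"Leader": 1, "Chaser": 1},
    "chaser_win": {"Leader": 0, "Chaser": 3},
}

odds_by_result = {}
for name, awarded in head_to_head.items():
    table = {team: pts + awarded.get(team, 0) for team, pts in current.items()}
    odds_by_result[name] = simulate_title_odds(table, remaining)
    print(name, odds_by_result[name])
```

A real version would draw the match probabilities from the model rather than hard-coding them, and would use the league's actual tie-breakers instead of a coin flip.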

This game has huge implications for the title race. Currently I have Arsenal at about 74% to win the league, but if they lose they drop a net 4.3 points in the title race (Arsenal losing their ~2 expected points, and City picking up 2.3 more than expected), which closes the 6 point gap to a little over a point. An Arsenal loss/City win gives Arsenal a 52% chance to win the league, bumping City up to 40%.

A tie hurts Arsenal as well, but not too badly. They’d be at 66.5% to win the league, while City would move up to 24.5%. Still big favorites, but with a little less breathing room at the top.

However, a win puts Arsenal over 80% to win the league, giving them an ~8 point expected final table cushion over City, dropping City’s chances to only 13%.
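The expected-table arithmetic in the three scenarios above can be sketched in a few lines. This assumes the expected table already counts each team's expected points from this match (2.05 for Arsenal, 0.70 for City), so the actual result simply replaces them. The function name `expected_gap_after` and the exact 6.0 starting gap are my own simplifications, since the post only says "about 6".

```python
# Points earned by (home, away) for each possible result.
RESULT_POINTS = {
    "arsenal_win": (3, 0),
    "draw": (1, 1),
    "city_win": (0, 3),
}

def expected_gap_after(gap_before, xpts_home, xpts_away, result):
    """Home team's lead in the expected table once the real result is known."""
    home_pts, away_pts = RESULT_POINTS[result]
    # Each side's surprise is actual points minus the points already expected.
    swing = (home_pts - xpts_home) - (away_pts - xpts_away)
    return gap_before + swing

gaps = {
    result: expected_gap_after(6.0, 2.05, 0.70, result)
    for result in RESULT_POINTS
}
for result, gap in gaps.items():
    print(f"{result}: {gap:.2f}")
```

With these inputs the gaps come out around 7.65, 4.65, and 1.65 points for an Arsenal win, a draw, and a City win, which is close to the rounded figures quoted in the text.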

This is probably the biggest game of the season so far, and really borders on a must-win for City if they want to make a title challenge. However, a draw seems to suit both teams just fine so I wouldn’t be surprised if we see a 0-0 draw with no one taking too many chances and content to live to fight another day (while letting Leicester City presumably build on their lead at the top of the table).

  1. Personally that seems far too low to me, but I’m a City supporter so I may be biased here. City doesn’t have a great record recently on the road in big matches so the model may be smarter than me.

Relegation Round-up: Big Weeks for Bournemouth, Watford, and Newcastle

My favorite part of the ESPN Soccernet podcast back in the day (other than West Ham corner) was “Relegation Round-up” where they would talk about the relegation teams. I’ve tried it on my Twitter account, and I’ve started to get a little more interest lately so I wanted to do something more long-form. So here’s my first “Relegation Round-up” blog post, and it was a big week for a lot of teams.

As of today, the relegation fight really includes 4-5 teams: Aston Villa, Sunderland, Norwich City, Bournemouth, and Newcastle. Newcastle has potentially pulled themselves out, which I’ll talk about later, while Villa and Sunderland need to make up some ground very quickly.

Week 16-2 Heat Map

The heat map shows roughly 5 clusters: the top three are pretty much set, the next 4 teams have separated themselves (although I’d expect Chelsea to slip into the next pack soon), 5 teams (Palace through Everton) have solidified a mid-table spot next season, 4 teams are in the lower mid-table but safe for now (Watford through Newcastle), and the bottom 4 are in serious danger of the drop (Bournemouth through Villa). But how did things shake out this week?

The big news is that Bournemouth claimed their second major scalp in a row, beating Manchester United 2-1. This was a huge win for them, especially given that they had beaten Chelsea 0-1 the previous week at Stamford Bridge. The first win could maybe be written off as another bad week in Chelsea’s nightmare season, but two in a row is huge for Bournemouth. Beyond the significance of beating two of England’s giants, it helped them massively in their bid to stay up next season. Here’s the heat map of their weekly predicted finishes.

Week 16 Bournemouth

The change over the last two weeks is striking: they had a 22% chance of finishing in last place two weeks ago, but with the 6 points (and a 4+ point gain in expected points), they’re down to less than 5%. Two weeks ago they were big favorites to get relegated at 68%; now they’re down to around 40%. There’s still a lot of work to do, but they’ve started the hard work of getting into the lower mid-table safe zone.
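As a rough illustration of how one row of a heat map like this turns into "last place" and "relegated" percentages, here is a small sketch. The list of simulated finishing positions is invented toy data, not Bournemouth's actual simulations, and the helper names are hypothetical. Each cell is the share of simulated seasons ending in that position, and the relegation chance is the sum of the bottom three cells.

```python
from collections import Counter

def finish_distribution(simulated_positions, n_teams=20):
    """Share of simulations ending in each position, 1st through n_teams-th."""
    counts = Counter(simulated_positions)
    n = len(simulated_positions)
    return [counts.get(pos, 0) / n for pos in range(1, n_teams + 1)]

def relegation_probability(distribution, n_relegated=3):
    """Probability of finishing in one of the bottom n_relegated places."""
    return sum(distribution[-n_relegated:])

def last_place_probability(distribution):
    """Probability of finishing dead last."""
    return distribution[-1]

# Toy data: final position in each of 100 made-up simulated seasons.
toy_positions = (
    [14] * 10 + [15] * 15 + [16] * 20 + [17] * 25
    + [18] * 15 + [19] * 10 + [20] * 5
)

row = finish_distribution(toy_positions)
p_relegated = relegation_probability(row)
p_last = last_place_probability(row)
print(f"relegated: {p_relegated:.0%}, last place: {p_last:.0%}")
```

Stacking one such row per week gives the week-by-week picture shown in the team plots.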

Meanwhile, Newcastle claimed two big victories of their own over Spurs and Liverpool. These weren’t as historic given Newcastle’s history in the Premier League, but they were potentially even more important to Newcastle’s survival. Here’s the plot of their season.

Week 16 Newcastle

With the two unexpected wins, Newcastle made a similar jump. As you can see in the graph, things weren’t looking great for Newcastle two weeks ago. They were ~33% to drop, and things were trending downward. With the two wins, they’re only at about 12% to get relegated, with virtually no chance of finishing in last place (which wasn’t the case two weeks ago).

Finally, this may be the only time I talk about Watford in the relegation discussion because they are on a three-game winning streak, and have firmly moved themselves out of the drop zone. In Week 13 they had about a 20% chance to get relegated, but after winning against Villa, Norwich, and Sunderland, they’re down to about 3%. It’s simple enough logic: win the key 6-pointers against your direct rivals and you stay in the Premier League.

Week 16 Watford

The three teams here have had a series of good results, and have begun to separate themselves from the bottom of the table. Villa and Sunderland need to respond quickly or they’re in serious danger of being left behind. Villa has a series of winnable games in a row against Sunderland, West Ham, Newcastle, and Norwich City, and I’ll look into how many points they need out of the next 12 if they want to stay up in a later post. I have to think they’re due for some luck soon – they haven’t been good, but their squad has far more quality than 6 points from 16 games.

On the other side, Sunderland seems to have had their initial bump from hiring Big Sam, but the home loss to Watford was problematic to be sure. They’re in a better position, but they need to pick up some points soon if they’re going to pull another great escape.