Is Jamie Vardy Actually Underperforming? The Importance of Measurement

Jamie Vardy’s performance has been the story of the EPL season. But how spectacular is his performance? In the analytics community, this revolves around Expected Goal (xG) measures. So let’s look at Danny Page’s great simulator to see how he’s doing.

Vardy

Using @footballfactman’s numbers, Vardy is doing pretty well for himself, finishing 1.64 standard deviations above the mean (only counting non-penalty goals). This means we’d expect him to have more goals than this, given his xG total, about 5% of the time. Very impressive finishing.
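
As a rough sketch of where a figure like "1.64 standard deviations" comes from, one can treat each shot as an independent coin flip whose chance of going in equals its xG, then compare actual goals to the mean and spread that implies. The shot list below is made up (it is not Vardy's real data), the function names are mine, and the normal approximation for the tail is an assumption of this sketch rather than how Danny Page's simulator works.

```python
import math

def finishing_z_score(shot_xg, goals):
    """How many standard deviations actual goals sit above the xG expectation.

    Treats every shot as an independent coin flip with probability equal
    to its xG (an assumption of this sketch).
    """
    expected = sum(shot_xg)
    variance = sum(p * (1 - p) for p in shot_xg)
    return (goals - expected) / math.sqrt(variance)

def upper_tail(z):
    """Normal-approximation chance of landing more than z SDs above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical shot list (NOT Vardy's real shots): 40 low-quality chances
# and 10 good ones, for 7.0 xG in total.
example_shots = [0.1] * 40 + [0.3] * 10
example_z = finishing_z_score(example_shots, goals=10)

# Tail probability for the 1.64 figure quoted above.
tail_at_1_64 = upper_tail(1.64)
```

With these made-up shots the z-score comes out around 1.26, and a z of 1.64 maps to a tail of roughly 5%, which lines up with the "about 5% of the time" reading above.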

But what if 1 xG isn’t really one expected goal? @rakkhis posted a great image today showing the correlation between xG (plus xA, or expected assists) and actual goals + assists. It shows a powerful trend, with an R of 0.73.

The interesting thing here is that we see Jamie Vardy, Mesut Ozil, and Romelu Lukaku slightly below the trendline. What that means is that those three players all have expected assists and goals roughly equal to their actual production. In this case, it means we’d actually expect Vardy to have about the same number of goals and assists (likely goals because of his position) that he actually does. This means Jamie Vardy is basically performing identically compared to expectations, which is a remarkable idea.1

This all goes to the importance of measurement. Danny Page’s simulator assumes a 1:1 relationship between xG and actual goals scored. It says that Jamie Vardy is having a rare season, over-performing more than 95% of all simulated seasons by someone with his xG. The regression, however, tells a different story, showing Vardy is slightly underperforming and should have actually scored an extra half goal or so. If we measure it using a 1:1 relationship, Vardy is an extreme outlier; if we measure it using the ~1.6:1 relationship that the regression shows, then Vardy is performing almost exactly at expectations. In one version he’s in an amazing purple patch and we’d likely expect him to slip a little bit back to normal; in the other, we’d expect him to continue this streak as long as he keeps getting into the same positions to shoot. Measurement matters, and it affects the conclusions we can draw from data.
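
To make the two readings concrete, here is a minimal simulation sketch. It assumes each shot is an independent coin flip, and it applies the regression reading by multiplying every shot's scoring chance by a `scale` factor (1.0 for the 1:1 view, 1.6 for the rough regression view). That multiplier is my own simplification, since the regression was fit on goals + assists rather than on individual shots. The shot list and the goal total are hypothetical, not Vardy's real numbers.

```python
import random

def share_of_seasons_beaten(shot_xg, goals, scale=1.0, n_sims=10000, seed=0):
    """Fraction of simulated seasons that end with fewer goals than `goals`.

    `scale` is the assumed number of real goals per 1 xG: 1.0 is the 1:1
    reading, 1.6 is the rough regression reading.
    """
    rng = random.Random(seed)
    # Scale each shot's scoring chance, capped at 1 so it stays a probability.
    probs = [min(1.0, p * scale) for p in shot_xg]
    beaten = 0
    for _ in range(n_sims):
        simulated_goals = sum(1 for p in probs if rng.random() < p)
        if simulated_goals < goals:
            beaten += 1
    return beaten / n_sims

# Hypothetical shot list (NOT real data): 7.0 xG in total, 11 goals scored.
example_shots = [0.1] * 40 + [0.3] * 10
one_to_one = share_of_seasons_beaten(example_shots, goals=11, scale=1.0)
regression = share_of_seasons_beaten(example_shots, goals=11, scale=1.6)
```

Under the 1:1 reading the made-up player beats the large majority of simulated seasons, while under the 1.6 reading the same goal total lands near the middle of the distribution. Same data, different measurement, different conclusion.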

 

  1. EDIT: I originally had this paragraph as “Vardy was under-performing”, but John Burn-Murdoch pointed out that the author of the graph transposed the X and Y axes, meaning people under the trendline are over-performing, while people over the trendline are under-performing. I’ve edited this paragraph to reflect that idea.

Five Reasonable Transfer Targets for Leicester to get into the Champions League

I’ve been playing with my transfer simulator,1 trying to find some good signings for various teams, and I figured I’d post some blogs about what I found. With Leicester City being in the news and some of the big models (including mine) predicting they have a chance at the Champions League, I thought I’d start with them.

The “rules” for transfer club are simple: I haven’t watched a ton of footage from some of these clubs, so I start by looking at my “Points Above Replacement” spreadsheets to identify growth opportunities.

Next, I look at the top of the list of players identified as improvements and delete any players who I feel are unsignable. Reasons for this include “plays for a direct rival”, “would be too expensive”, “too old to buy”, or “plays for a bigger club and would be unlikely to move downward.” I also look at Transfermarkt to see what they say the player’s value is as a proxy for a lot of these things.

That’s about it – I’m pretty lax on position because I think my model sometimes identifies potential tactical shifts that would benefit a team by playing someone in a different role. With the description out of the way, here’s who I’ve identified for Leicester City.

Week 15 - Transfer Upgrades for Leicester City

All of these players are young and are likely within Leicester City’s buying range. I like Sonny Kittel a lot because he can play centrally or on either wing,2 and he’s a big upgrade over Drinkwater according to my model.3 He’s just recovering from a fairly serious injury though, so they may not want to take a chance on him. Magnanelli, Kacar, and Dabo are also improvements and are more defensive-minded, which may be a better tactical fit depending on what Leicester is looking for.

The second player is Florentin Pogba, who’s a 3-point upgrade over Wes Morgan. He looks a little stronger in terms of offensive statistics, and is younger with a bigger upside. MOTSON thinks signing both him and Kittel gives Leicester City an extra 10 points over the season, or ~5 over a half-season. In a season where 4th place is relatively wide open and small margins could matter, these are all decent signings at a reasonable price.






  1. The public version is currently down, but I’ll re-up the paid subscription to Shiny Server around December 20 when the transfer rumors get into full swing again.
  2. I always like players who can cover for other players, especially for teams with a limited budget
  3. Someone mentioned on Twitter that Drinkwater is pretty highly rated right now, and from the little I’ve seen I agree with this. The model likes all of these players over N’Golo Kante as well, but everything I’ve read lately says he’s playing too well to bench.

Predicting Individual Player Contributions – The Next Frontier

I’ve recently been inspired by @DoctorFootie’s really creative model of predicting possession – I won’t pretend to understand it well enough to explain it, but the idea is that he borrowed an equilibrium model from chemistry to predict how much possession each team will have in a given game. The significant contribution here is that opposing team stats contribute to and affect each other, and this is most obvious in a place like possession where you have a zero-sum game (me having the ball means you can’t have the ball). It would be interesting to see how well this works for something a little less zero-sum (number of passes) or almost completely independent (tackles, or maybe even shots on goal). That’s not my method, but you all should follow @DoctorFootie on Twitter because I’m anticipating he’s going to do some interesting things going forward.

Back to my inspiration – my current model uses average player stats to predict results (on top of a “strength” coefficient calculated from the 2014-2015 season).1 But I can do better than this – I can predict individual player stats based on who they are facing. It’s easy enough to do – player stats would be the dependent variables, aggregate opposing team stats would be the independent variables, and the machine learning models can work their magic on this training data.

Then I could do predictions based on average opposing team stats – how many passes will a player make given the quality of opposition? How many shots on target will a player get?
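
A minimal sketch of that setup, under loud assumptions: it uses a single opposing-team stat as the independent variable and plain least-squares regression as a stand-in for whatever machine learning model would actually be used. The numbers are invented for illustration, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted player stat for a given opposing-team stat."""
    return intercept + slope * x

# Hypothetical training data (NOT real): the opponent's average tackles per
# game, and the passes one player completed against that opponent.
opponent_tackles = [14, 16, 18, 20, 22]
player_passes = [52, 49, 45, 43, 38]

slope, intercept = fit_line(opponent_tackles, player_passes)

# Predicted passes against an upcoming opponent averaging 19 tackles a game.
predicted_passes = predict(slope, intercept, 19)
```

In this toy data the player completes about 1.7 fewer passes for every extra tackle the opponent averages. A real version would use many opposing-team stats at once and a more flexible model, but the shape of the problem (player stat on the left, opponent stats on the right) stays the same.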

This could potentially give better predictions for player stats to put into the current model, but there’s a bigger issue here. This would be another step closer to quantifying individual player performance – knowing how individual players perform against each opponent could be incredibly useful for teams: it could help managers trying to make the optimal squad selection, could help teams decide which players to purchase, and could get us closer to understanding exactly what individual players add to the team contribution and under what circumstances they add it. Are there “big game players?” How consistent are players? Can a player move up to compete against stronger competition?

This will be my holiday goal – I think I can put something preliminary together relatively quickly after my grading is done, test it, see if it works and is worth pursuing more fully. I might test two or three teams to see how it works, and go from there. Hopefully we can learn something interesting here and create an improved model of individual player contribution over what I have now.

  1. As I type this, I’m realizing I can weight the strength coefficient from season to season – e.g. in week 1 I only use the 2014-2015 coefficient, and maybe by week 10 I can start to slowly add in the 2015-2016 coefficient.

Game Theory: Top 4 Contenders Leicester City Should Absolutely Keep Jamie Vardy

A few days ago I wrote probably my most controversial blog, where I said emphatically that Leicester City has to sell Jamie Vardy in January. The most interesting part to me is that I never actually tweeted the link to it – @FanalyticsBlog auto-posted it like it does with all my blogs (and many other excellent soccer analytics blogs), and it got a decent number of retweets pretty quickly (and was later picked up by a Dutch soccer account that may or may not have liked it – Google Translate was unclear). I came back from teaching and apparently Soccer Analytics Twitter™ was debating it pretty heavily. Tom Worville and Mike Goodman made some interesting points about why I was wrong, and someone whose name I forget pointed out I ignored the damage to the fan base (something I’d written about previously and did ignore here).

Since then, things have changed, and changed dramatically. Leicester City won a difficult road game at Swansea, continuing to overachieve, and it’s now a very realistic possibility that they qualify for the Champions League.  Here’s my model as of Saturday:

Week 15-1 Final Table

I’ve got them in 4th place currently, edging Liverpool and Spurs for the final Champions League spot.1 I’m not the only one: @GoalImpact has them in 4th, @Stats4Footy has them in 5th place, and Michael Caley (@MC_of_A) has them in 6th, but with an almost 30% chance of qualifying for the Champions League.

None of our models are perfect, but when so many of them cluster around such an unusual finding it’s really compelling evidence: Leicester City is for real.

This is a truly historic year for Leicester City. I wrote about their overachievement a couple of weeks ago, and in that article I mentioned that their next 6 games would be tough and that MOTSON didn’t expect much from them. MOTSON predicted ~6 points in 6 games, and so far they’ve earned 4 points from 2 games with struggling Chelsea coming up soon. There’s no reason they can’t pick up even more points over the next 4 games, which is a truly scary thought for Liverpool and Spurs.

If they have a real chance of making it into the Champions League (and an even better chance of staying in the Top 6 for the Europa League places), Leicester City has to keep Jamie Vardy (and Riyad Mahrez) at least until the end of the season. I’m still a believer in selling players at their peak value, but a special season like this changes the economic calculus. Formally, it looks like this:

Pr(Champions League) × Money Gained from Champions League – [Value(Vardy/Mahrez in January) – Value(Vardy/Mahrez in August)]

Ignoring the emotional component for a minute (and I acknowledge that it is important), the Champions League can mean a huge financial infusion for Leicester City. All the models have them at somewhere around 30-40% likely to qualify as of today, so multiply that probability by the money gained from qualifying for the Champions League.2 If that number is greater than the premium they would get for Mahrez and/or Vardy in January compared to selling in August, they should keep the players. I’m more optimistic about the January “panic premium” than most, and even so I don’t think it would be that much compared to what they’d get for the two in August, especially the younger Mahrez.
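
The comparison described above can be sketched in a few lines. Every number here is a placeholder in millions of pounds, chosen only to show the mechanics. The qualifying probability is picked from the 30-40% range the models give, and the Champions League payout and the two sale prices are hypothetical. The function names are mine.

```python
def keep_minus_sell(p_qualify, cl_money, january_value, august_value):
    """Expected gain from keeping the player over selling in January.

    Positive means the expected Champions League income outweighs the
    January premium, so keeping is the better bet.
    """
    expected_cl_income = p_qualify * cl_money
    january_premium = january_value - august_value
    return expected_cl_income - january_premium

def decide(p_qualify, cl_money, january_value, august_value):
    """Return 'keep' or 'sell' based on the sign of the expected gain."""
    margin = keep_minus_sell(p_qualify, cl_money, january_value, august_value)
    return "keep" if margin > 0 else "sell"

# Hypothetical inputs (millions of pounds, NOT real figures).
margin = keep_minus_sell(0.35, 30.0, 15.0, 10.0)
choice = decide(0.35, 30.0, 15.0, 10.0)

# If the qualifying chance slips, the same sums can flip the decision.
low_odds_choice = decide(0.10, 30.0, 15.0, 10.0)
```

With these placeholder numbers, a 35% chance is worth 10.5 million in expectation against a 5 million January premium, so the sketch says keep. At a 10% chance the expected income drops to 3 million and it says sell, which is the "if they slip, it might be time to sell" logic in numbers.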

The opportunity to qualify for the Champions League, or at the very least the Europa League is very real, and as long as Leicester City keeps this pace up through mid-January they shouldn’t sell their stars. If they slip, or Liverpool/Spurs gets really hot and things change, then it might be time to sell. But I’ve changed my mind – if the Champions League is a real possibility then they have to pursue that. They have to keep Jamie Vardy until at least August.






  1. If Liverpool wins Sunday, they’ll likely edge back ahead – MOTSON doesn’t have them as much of a favorite over Newcastle, and actually thinks a draw is the most likely outcome at 42%.
  2. The actual calculus would be the probability of qualifying for the group stage, which is where the money is earned, but I’m simplifying here for the sake of readability. One should also add the money from the Europa League, which isn’t substantial for big clubs but would be for someone like Leicester City.

    One can also picture a second level where one big season gives Leicester City a boost that keeps them in the Premier League for X extra years, and the money that comes from that. The math can get as complicated as you want, and the more complicated it gets the more it tips the balance in favor of keeping Vardy.

Game Theory: Leicester Has To Sell Jamie Vardy in January

With his record goal-scoring streak for Leicester City, and at least a couple “big clubs” in England in need of a goal-scorer (Man United and Chelsea stand out in my mind), you’d have to imagine Jamie Vardy will be in demand this January. There is zero doubt in my mind that Leicester City should cash in and sell him now.

Much of my logic comes from my “You should sell your over-achieving striker” post, with the basic idea being that strikers who go on a hot streak end up over-valued. This is particularly true in January when teams are willing to overpay for mid-season replacements.1 Jamie Vardy’s value will likely never be higher than it will be when the transfer window opens in a few weeks.

But there’s a second level logic here – either Jamie Vardy is a world-class finisher who will continue to perform at a world-class level, or he’s going through an unrepeatable purple patch and will regress to the mean in the very near future. The problem for Leicester City is that those two realities lead to two very different outcomes for them.

As of this post, Vardy has scored in 13 straight games, leads the Premier League goals table, is breaking world records, and is confident he can keep scoring goals and getting away with racism.2 And using Danny Page’s great Expected Goal simulator (and Paul Riley’s tremendously helpful public numbers), we can see that he’s scoring at over 2 standard deviations above the mean expectation for his xG.

blog screenshot

The first, and most likely, explanation for this scoring rate is that Vardy is on a ridiculous hot streak that won’t continue for long. These sorts of streaks seem to be able to continue for a season, but after that we see massive drops in production and a host of “what happened to…” stories. If Jamie Vardy regresses to the mean and scores goals at a reasonable rate after this (8 goals in 14 games is nothing to dismiss), he’d be a solid striker for a mid-table EPL side like Leicester hopes to be. However, if he follows the pattern of so many before him and drops below the expectation, he’d be a drag on Leicester’s expected points. In this case, Leicester City would certainly be able to keep him, and more accurately wouldn’t be able to sell him even if they tried.3

The other option is that Jamie Vardy is one of Europe’s best strikers, scoring at a rate comparable to Lionel Messi and the other elites. With all due respect, strikers who score at anything approaching this rate don’t play for Leicester City. If he actually is that good, Leicester has to sell him eventually, so why not now? His value is likely at, or near, the highest it will be while he’s with Leicester City. He’s 28, so he’s as young as he’ll ever be, and is nearing the end of his sellable years. Even if he’s an elite striker, scoring well above expectations, he probably can’t maintain this pace (10-11 goals with this xG seems more reasonable for a world-class finisher), and we know teams over-value goals. January is also the time for panic-buying, so they’ll probably be able to get a premium there as well. Plus, we’re early enough that he won’t yet be talking about how he wants a transfer and how unhappy he is at Leicester, which would reduce his value.

As of today, there’s a lot of hype and uncertainty as to Jamie Vardy’s future goal-scoring prospects. Plenty of people are likely to overrate him due to his current form, and additional time will reduce that uncertainty in one direction or the other. And as I said above, neither direction really helps Leicester’s position, so all this, combined with the fact that Leicester’s league status is basically assured for next year, makes it ideal (and I would argue practically mandatory) to sell him today while his value is at its highest. Get the rumored 15 million pounds for him and invest in a couple other players to improve the team for next year and seasons to come. If he’s that good, they’ll have to sell him anyway, and if he’s not, they’re stuck with a depreciating asset.






  1. I could easily picture Chelsea drastically overpaying to try and right the ship and replace the out-of-favor Diego Costa.
  2. Yes, I’m aware this is a parody site, but it’s a really funny one so I wanted to link it.
  3. I know there are injury issues at play here, but Swansea releasing Michu on a free transfer is a good parallel for what I’m thinking in this case.

Thoughts on Machine Learning, Black Boxes, and my SVM

There was quite a bit of discussion about machine learning (ML) techniques on Soccer Analytics Twitter™ today, so I expedited this post I’ve been planning for a few days.

I think it’s important to define what ML is for people. Normally I don’t like this, but Wikipedia has a good definition that I think works for what I wanted to communicate, so here we go:

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.

The big thing here is that ML techniques focus on prediction – if you’re trying to predict something, then you should look at ML first. What it doesn’t do well is explain things, as captured by these tweets from Michael Caley.

This was in response to Ola Lidmark Eriksson’s blog post talking about a machine learning approach to calculating xG. Michael’s approach is aimed at a mass audience, explaining what makes a shot more likely to go in (e.g. closer to goal, more centrally located, not headed), while Ola focuses on greater accuracy and letting the model make the decisions. Both versions have their merit, and so much of it really depends on what trade-offs you’re willing to make. How much accuracy do you gain compared to the lack of explanatory power from these types of models? How much are you willing to sacrifice on either dimension?

All that being said, I wanted to talk about my method for predictions because I get a number of questions. I frequently get people asking “Why does your model like Arsenal so much?” or “What stats are driving your results?” or “Why does your model think Theo Walcott is so good?” The answer to this is always “I don’t know”, and that’s a feature of the Support Vector Machine model. I explained the Support Vector Machine (SVM) in another post, so I’m not going to revisit the whole thing, but it’s worth a read for anyone interested in what’s under the hood. But I wanted to highlight the “black box” nature of these models, SVMs in particular.

The reason I like the SVM model for soccer is that it doesn’t assume any functional form – it doesn’t think that more passes or more possession is necessarily good. It looks at stats for results, and learns how many passes and how much possession is optimal given the other game stats, and predicts results that way. It doesn’t tell you what the inflection points are, and doesn’t tell you what the cutoffs are, and the interactions in such a big model are too complicated to present visually. But it does predict well, which is what matters to me for my purposes.

Another thing it does is recognize the value of defense and balance in a team. One of the common refrains of Soccer Analytics folks is that it’s impossible to quantify defense. The SVM proves that this isn’t true, as it recognizes the value of having players who make some tackles/make clearances/win headers. I’m particularly proud of my most recent exploratory analysis, looking at the value of Man City replacing each of their midfielders with Lionel Messi.

Messi

The model shows Messi as an improvement over all of Man City’s front three midfielders, but as a small downgrade on Yaya Toure and a significant downgrade on Fernandinho. While it doesn’t give me specific reasons for this (the SVM is a black box, remember?), it’s pretty clear that Messi isn’t as good playing in the deeper role that Toure plays and certainly wouldn’t make a good holding midfielder like Fernandinho. Any increase in offense brought on by playing Messi instead of Fernandinho would be more than offset by the loss in defensive strength. This passes the common sense check.
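
The swap-and-repredict procedure behind an analysis like this can be sketched as follows. To be clear about what is assumed: `predict_points` is a toy stand-in I made up so the example runs, not the SVM, and its square-root form is only there to give diminishing returns so that balance matters. The players are generic roles with invented per-game numbers, not real Man City data.

```python
import math

def predict_points(lineup):
    """Toy stand-in for a trained model (NOT the SVM described here).

    Diminishing returns on attack and defense mean a balanced lineup
    scores better than a lopsided one with the same total output.
    """
    attack = sum(player["attack"] for player in lineup)
    defense = sum(player["defense"] for player in lineup)
    return 10 * math.sqrt(attack) + 10 * math.sqrt(defense)

def replacement_delta(lineup, out_name, new_player):
    """Change in predicted points when `new_player` replaces `out_name`."""
    swapped = [new_player if p["name"] == out_name else p for p in lineup]
    return predict_points(swapped) - predict_points(lineup)

# Hypothetical midfield with invented average contributions per game.
midfield = [
    {"name": "winger", "attack": 6, "defense": 1},
    {"name": "playmaker", "attack": 5, "defense": 2},
    {"name": "holder", "attack": 1, "defense": 7},
]
star_attacker = {"name": "star", "attack": 9, "defense": 1}

delta_for_winger = replacement_delta(midfield, "winger", star_attacker)
delta_for_holder = replacement_delta(midfield, "holder", star_attacker)
```

In this toy setup the star attacker is an upgrade when he replaces the winger and a downgrade when he replaces the holding midfielder, because the lost defense costs more than the extra attack adds. The real model learns that trade-off from match data instead of having it written in by hand.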

Most importantly, I think this highlights one of the advantages my model has over some of the dominant models out there – specifically ones based on xG or some other offensive contribution. It knows if you’re playing too many attacking players, and will punish you for that. It can find places where your team is imbalanced (Mesut Ozil at Arsenal is a great example of that – my model much prefers Daniele de Rossi in his place, which is a very different role), and point out ways to fix that and can even recognize potential tactical improvements. It does all of this without knowing anything about players other than their average statistical contribution to a game.

Machine Learning techniques have their place, and if prediction is your goal then you really should learn something about them. But if you’re looking to explain things, then there are more appropriate methods and you should learn those. My SVM has predicted results well so far, and it quantifies individual player contribution to a team as well as anything out there (I would argue better, but I have no statistical proof of this). But it doesn’t explain outcomes particularly well, and it doesn’t explain why it prefers certain players over other players. That’s a job for other methods and people who are more interested in explanation. As usual, it’s about the right tool for the right job, and Machine Learning techniques are the right tool for predicting outcomes and quantifying individual contribution.