
Home Field Advantage has Disappeared from the EPL

I was going to start this post with a story, but I think the headline says it all: there is no such thing as Home Field Advantage in the Premier League this season.

This post was inspired by my eyeball test of MOTSON, my statistical prediction model. My initial reaction was that it was over-estimating the chances of home teams, as most of the “bad” results were from away wins. Inspired by a conversation with @11tegen11, I looked at the overall predictions of my model, and it predicts 47% home victories, which is in line with historical data (last season was ~45%). So if my model is well-calibrated, why does it seem to over-value home field advantage? Simple: home-field advantage doesn’t exist this season.

As of this morning, we had 167 fixtures played. The home team won 62 times, drew 46 times, and lost 59 times. First cut at the evidence: the number of wins is basically equal to the number of losses. The home team wins almost as frequently as they lose.

The next thing I did was look at the points-per-game (PPG). Home teams earn 1.39 PPG, while away teams earn 1.33 PPG. These numbers look basically the same, and a statistical test confirms that they are indistinguishable from each other (p = 0.705).
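For readers who want to check the arithmetic, here is a minimal sketch of the PPG comparison in Python. It is not the replication code: it assumes a two-sample test on per-game points (3/1/0) with a normal approximation for the p-value, and the helper names are made up for illustration. Under those assumptions the result should land near the p = 0.705 quoted above.

```python
import math

def points_list(wins, draws, losses):
    """Expand a W/D/L record into per-game points (3 for a win, 1 for a draw, 0 for a loss)."""
    return [3] * wins + [1] * draws + [0] * losses

def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means.

    Assumption: Welch-style standard error with a normal approximation,
    which is reasonable for samples this large.
    """
    ma, va = mean_and_var(a)
    mb, vb = mean_and_var(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    z = (ma - mb) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Home record after 167 fixtures: 62 wins, 46 draws, 59 losses.
home = points_list(62, 46, 59)
# The away record is the mirror image: 59 wins, 46 draws, 62 losses.
away = points_list(59, 46, 62)

home_ppg = sum(home) / len(home)
away_ppg = sum(away) / len(away)
p_value = two_sample_p(home, away)
print(f"home PPG {home_ppg:.3f}, away PPG {away_ppg:.3f}, p = {p_value:.3f}")
```

One caveat on the design: home and away points come from the same fixtures, so they are not truly independent samples. The sketch ignores that for simplicity.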

The next step was to compare this to last year. A quick look at last year’s results shows the home team won ~45% of the time, drew 25% of the time, and lost ~30% of the time. This comes out to 1.6 PPG for the home team. So again, I ran a statistical test and found that this number is statistically distinguishable from this season’s results (p < 0.05).
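The 1.6 figure and the comparison can be sketched the same way. This version makes a simplifying assumption: it treats last year's 1.6 PPG as a fixed benchmark and runs a one-sample test on this season's home points with a normal approximation. The actual test may well have been a two-sample one.

```python
import math

# Last season's home outcome shares, as quoted above.
p_win, p_draw, p_loss = 0.45, 0.25, 0.30
last_year_ppg = 3 * p_win + 1 * p_draw + 0 * p_loss  # expected points per game

# This season's home record: 62 wins, 46 draws, 59 losses.
home = [3] * 62 + [1] * 46 + [0] * 59
n = len(home)
mean = sum(home) / n
var = sum((x - mean) ** 2 for x in home) / (n - 1)

# One-sample test against the benchmark (assumption: benchmark has no sampling error).
z = (mean - last_year_ppg) / math.sqrt(var / n)
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"last year {last_year_ppg:.2f} PPG, this year {mean:.2f} PPG, p = {p_value:.3f}")
```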

Week 17 Barplot Home Field

As the barplot shows, Home PPG in 2014-15 falls outside the 95% confidence interval for Home PPG in 2015-16, while this season’s Away PPG and the PPG expected when win, lose, and draw are equally likely (1.32) fall well within it. In statistical terms, we can say that Home PPG in 2015-16 is distinguishable from Home PPG in 2014-15, but not from the other two categories.

The final step was to test whether a randomly generated series of results was statistically distinguishable from this season. So I simulated 10,000 “seasons” up to this point with an equal probability of win, lose, and draw for each fixture. I then calculated the PPG for each of these seasons, and compared those randomly generated seasons to the current one. Of the 10,000 simulated seasons, only 140 were statistically distinguishable from the current one.

Week 17 Home Field Advantage

The blue highlighted part of the graph represents seasons that are indistinguishable from this one, while the much smaller red part represents seasons that are statistically distinguishable from this one. This means that if the “true” probability of winning at home was equal to the probability of losing at home, we’d randomly see a season that didn’t look like this one only 1.4% of the time. Simply said, the trend documented in this piece by Oliver Roeder and James Curley has continued and home field advantage in the EPL has disappeared this season.
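The simulation described above can be sketched roughly as follows. This is an illustration rather than the original code: it assumes "distinguishable" means a two-sided test at the 5% level on per-game points, uses a normal approximation, and fixes a random seed so the run is repeatable. The function names are hypothetical.

```python
import math
import random

def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v

def distinguishable(sim, observed, z_crit=1.96):
    """True if the two seasons differ in mean points at roughly the 5% level."""
    ms, vs = mean_and_var(sim)
    mo, vo = mean_and_var(observed)
    se = math.sqrt(vs / len(sim) + vo / len(observed))
    return abs(ms - mo) / se > z_crit

def share_distinguishable(observed, n_seasons=10_000, seed=0):
    """Simulate seasons where win, draw, and loss are equally likely for the home team."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_seasons):
        sim = rng.choices([3, 1, 0], k=len(observed))  # equal weights by default
        if distinguishable(sim, observed):
            hits += 1
    return hits / n_seasons

# This season's home record so far: 62 wins, 46 draws, 59 losses.
observed_home = [3] * 62 + [1] * 46 + [0] * 59
share = share_distinguishable(observed_home)
print(f"{share:.1%} of simulated seasons were distinguishable from this one")
```

The exact share depends on the seed and on the test used, so it will not reproduce the 140-in-10,000 figure exactly, but it should come out in the same low single digits.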

I have no way of saying whether this season is an anomaly or whether the current trend will continue. It is significantly different than last year, but given the trend it’s hard to say whether last year was the outlier or this one was. Only time will tell, but what we can say is that home field advantage is worth significantly less than it was last year. That is to say it had value last year, and we can’t detect any effect at all this season.

Author’s note: I aimed the statistical discussion here for a general audience with some knowledge of principles of stats but without knowledge of the different kinds of t-tests. Replication R code and a more technical version will be available as soon as I get around to writing it.


Thoughts on Parsimony v. Accuracy

My model has what a lot of people are considering an odd prediction, heavily favoring Arsenal over Manchester City at the Emirates this week. I’ve had a few people ask “Why?” and I’ve pointed them to my blog post about the model being a “black box”. I can’t point to any reason why it likes Arsenal so much, although it does like teams with a home field advantage and has a math-crush on Theo Walcott (calling him “Europe’s most valuable striker”).  James Yorke of Stats Bomb pushed back on this a little bit:

James is a smart guy and a great writer that you all should follow if you’re not already, so I wanted to write a longer form post talking about why what I’m doing is important for soccer analytics and why I choose a “black box” model over a typical regression format with clear coefficients and tests of statistical significance.

I wrote about it in more detail in another post, but I’m a firm believer in using the “right” model rather than the convenient one in most cases.1 The SVM assumes no functional form for the individual variables, and we have no idea what the correct functional form is for things like possession or number of passes, so we shouldn’t be doing any sort of linear regression on these variables. If the trade-off is not being able to say “Arsenal makes 37 more passes a game than Manchester City, therefore they’re more likely to win” then I’m ok with that.2

But more importantly, I think my model gets at things that much of the rest of the community isn’t getting at. Expected Goals and Assists are interesting and useful, but I think if we force everyone into that sort of analysis and language, then we’re really limiting ourselves. They’re good because they’re easily observable events, and goals are the…well “goal” of every team, and assists are the one-off event (the one immediately before the goal). However, they’re such a small part of what actually happens on the pitch, and there’s a little bit of the “Drunkard’s Search” going on if we limit ourselves there.

My model looks at the entire game, and the entire universe of (publicly available) statistics and makes no judgment on what is important. It likes tackles, headers, and other defensive actions quite a bit, which are generally thought to be unusable by most of the analytics community. While xG and  xA models, and all other models I’m aware of would prefer Lionel Messi to Fernandinho in a holding midfielder role for Manchester City, mine recognizes that he’d be a downgrade there.


Models that only look at offense and observable outcomes don’t get this right, but intuitively I think we can agree that this is true. We know limiting xG is important, but we don’t know how that happens. My model seems to understand that, and even though it doesn’t have a great answer for “why” it does lead to potential explanations and hopefully some testable hypotheses. Limiting ourselves to one way of thinking is ultimately going to leave the soccer analytics world stagnant, and there is value in multiple methods and multiple approaches. Parsimony is good, but we shouldn’t exclude complexity that generates insights into the game because it doesn’t fit well into 140 characters.


  1. If the convenient model gives roughly the same result, then go with the convenient one, but I’m a big believer that accuracy should never be second to parsimony.
  2. Reasonable people can disagree on this point, as I’ve written before.

Relegation Round-up: Big Weeks for Bournemouth, Watford, and Newcastle

My favorite part of the ESPN Soccernet podcast back in the day (other than West Ham corner) was “Relegation Round-up” where they would talk about the relegation teams. I’ve tried it on my Twitter account, and I’ve started to get a little more interest lately so I wanted to do something more long-form. So here’s my first “Relegation Round-up” blog post, and it was a big week for a lot of teams.

As of today, the relegation fight really includes 4-5 teams: Aston Villa, Sunderland, Norwich City, Bournemouth, and Newcastle. Newcastle has potentially pulled themselves out, which I’ll talk about later, while Villa and Sunderland need to make up some ground very quickly.

Week 16-2 Heat Map

The heat map shows roughly 5 clusters: the top three are pretty much set, the next 4 teams have separated themselves (although I’d expect Chelsea to slip into the next pack soon), 5 teams (Palace through Everton) have solidified a mid-table spot next season, 4 teams are in the lower mid-table but safe for now (Watford through Newcastle), and the bottom 4 are in serious danger of the drop (Bournemouth through Villa). But how did things shake out this week?

The big news is that Bournemouth claimed their second major scalp in a row, beating Manchester United 2-1. This was a huge win for them, especially given that they had beaten Chelsea 0-1 the previous week at Stamford Bridge. The first win could maybe be written off as another bad week in Chelsea’s nightmare season, but two in a row is huge for Bournemouth. Beyond the significance of beating two of England’s giants, it helped them massively in their bid to stay up next season. Here’s the heat map of their weekly predicted finishes.

Week 16 Bournemouth

The change the last two weeks is striking: they had a 22% chance of finishing in last place 2 weeks ago, but with the 6 points (and 4+ point gain in expected points), they’re down to less than 5%.  Two weeks ago they were big favorites to get relegated at 68%, now they’re down to around 40%. There’s still a lot of work to do, but they’ve started the hard work of getting into the lower mid-table safe zone.

Meanwhile, Newcastle claimed two big victories of their own over Spurs and Liverpool. These weren’t as historic given Newcastle’s  history in the Premier League, but they were potentially even more important to Newcastle’s survival. Here’s the plot of their season.

Week 16 Newcastle

With the two unexpected wins, Newcastle made a similar jump. You can see in the graph, things weren’t looking great for Newcastle two weeks ago. They were ~33% to drop, and things were trending downward. With the two wins, they’re only at about 12% to get relegated, with virtually no chance of finishing in last place (which wasn’t the case two weeks ago).

Finally, this may be the only time I talk about Watford in the relegation discussion because they are on a three game winning streak, and have firmly moved themselves out of the drop zone. Week 13 they had about a 20% chance to get relegated, but after winning against Villa, Norwich, and Sunderland they’re down to about 3%. It’s simple enough logic: win the key 6-pointers against your direct rivals and you stay in the Premier League.

Week 16 Watford

The three teams here have had a series of good results, and have begun to separate themselves from the bottom of the table. Villa and Sunderland need to respond quickly or they’re in serious danger of being left behind. Villa has a series of winnable games in a row against Sunderland, West Ham, Newcastle, and Norwich City, and I’ll look into how many points they need out of the next 12 if they want to stay up in a later post. I have to think they’re due for some luck soon – they haven’t been good, but their squad has far more quality than 6 points from 16 games.

On the other side, Sunderland seems to have had their initial bump from hiring Big Sam, but the home loss to Watford was problematic to be sure. They’re in a better position, but they need to pick up some points soon if they’re going to pull another great escape.

Game Theory: Top 4 Contenders Leicester City Should Absolutely Keep Jamie Vardy

A few days ago I wrote probably my most controversial blog, where I said emphatically that Leicester City has to sell Jamie Vardy in January. The most interesting part to me is that I never actually tweeted the link to it – @FanalyticsBlog auto-posted it like it does with all my blogs (and many other excellent soccer analytics blogs), and it got a decent number of retweets pretty quickly (and was later picked up by a Dutch soccer account that may or may not have liked it – Google Translate was unclear). I came back from teaching and apparently Soccer Analytics Twitter™ was debating it pretty heavily. Tom Worville and Mike Goodman made some interesting points about why I was wrong, and someone whose name I forget pointed out I ignored the damage to the fan base (something I’d written about previously and did ignore here).

Since then, things have changed, and changed dramatically. Leicester City won a difficult road game at Swansea, continuing to overachieve, and it’s now a very realistic possibility that they qualify for the Champions League.  Here’s my model as of Saturday:

Week 15-1 Final Table

I’ve got them in 4th place currently, edging Liverpool and Spurs for the final Champions League spot.1 I’m not the only one: @GoalImpact has them in 4th, @Stats4Footy has them in 5th place, and Michael Caley (@MC_of_A) has them in 6th, but with an almost 30% chance of qualifying for the Champions League.

None of our models are perfect, but when so many of them cluster around such an unusual finding it’s really compelling evidence: Leicester City is for real.

This is a truly historic year for Leicester City. I wrote about their overachievement a couple of weeks ago, and in that article I mentioned that their next 6 games would be tough but that MOTSON didn’t expect much from them. MOTSON predicted ~6 points in 6 games, and so far they’ve earned 4 points out of 2 games with struggling Chelsea coming up soon. There’s no reason they can’t pick up even more points over the next 4 games, which is a truly scary thought for Liverpool and Spurs.

If they have a real chance of making it into the Champions League (and an even better chance of staying in the Top 6 for the Europa League places), Leicester City has to keep Jamie Vardy (and Riyad Mahrez) at least until the end of the season. I’m still a believer in selling players at their peak value, but a special season like this changes the economic calculus. Formally, it looks like this:

Pr(Champions League) * Money Gained from Champions League – [Value(Vardy/Mahrez in January) – Value(Vardy/Mahrez in August)]

Ignoring the emotional component for a minute (and I acknowledge that it is important), the Champions League can mean a huge financial infusion for Leicester City. All the models have them at somewhere around 30-40% likely to qualify as of today, so multiply that number by the money gained from qualifying for the Champions League.2 If that number is greater than the premium they would get for Mahrez and/or Vardy in January compared to selling in August, they should keep the players. I’m more optimistic about the January “panic premium” than most, and I don’t think it would be that much compared to what they’d get for the two in August, especially the younger Mahrez.
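As a quick illustration of that comparison, here is a small Python sketch. The function name and every number in it are hypothetical placeholders (in millions of pounds), not estimates of the real Champions League money or of anyone's transfer fee. Only the 30-40% range comes from the models mentioned above.

```python
def should_keep(p_qualify, cl_revenue, january_premium):
    """Keep the players if expected Champions League money beats the January premium.

    p_qualify: probability of qualifying for the Champions League (0 to 1)
    cl_revenue: money gained from qualifying (hypothetical figure)
    january_premium: extra fee from selling in January rather than August (hypothetical figure)
    """
    return p_qualify * cl_revenue > january_premium

# Hypothetical example: 35% chance, 30m from qualifying, 5m January premium.
print(should_keep(0.35, 30.0, 5.0))   # expected gain of 10.5m vs 5m premium
# If the chance of qualifying collapses, the same premium tips the decision.
print(should_keep(0.05, 30.0, 5.0))   # expected gain of 1.5m vs 5m premium
```

Adding the Europa League money or the longer-run benefits from the footnotes would just mean adding more probability-weighted terms to the left-hand side.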

The opportunity to qualify for the Champions League, or at the very least the Europa League is very real, and as long as Leicester City keeps this pace up through mid-January they shouldn’t sell their stars. If they slip, or Liverpool/Spurs gets really hot and things change, then it might be time to sell. But I’ve changed my mind – if the Champions League is a real possibility then they have to pursue that. They have to keep Jamie Vardy until at least August.

  1. If Liverpool wins Sunday, they’ll likely edge back ahead – MOTSON doesn’t have them as much of a favorite over Newcastle, and actually thinks a draw is the most likely outcome at 42%.
  2. The actual calculus would be the probability of qualifying for the group stage, which is where the money is earned, but I’m simplifying here for the sake of readability. One should also add the money from the Europa League which isn’t substantial for big clubs but would be for someone like Leicester City.

    One can also picture a second level where one big season gives Leicester City a boost that keeps them in the Premier League for X extra years, and the money that comes from that. The math can get as complicated as you want, and the more complicated it gets the more it tips the balance in favor of keeping Vardy.

Game Theory: Leicester Has To Sell Jamie Vardy in January

With his record goal-scoring streak for Leicester City, and at least a couple “big clubs” in England in need of a goal-scorer (Man United and Chelsea stand out in my mind), you’d have to imagine Jamie Vardy will be in demand this January. There is zero doubt in my mind that Leicester City should cash in and sell him now.

Much of my logic comes from my “You should sell your over-achieving striker” post, with the basic idea being that strikers who go on a hot streak end up over-valued. This is particularly true in January when teams are willing to overpay for mid-season replacements1. Jamie Vardy’s value will likely never be higher than it will be when the transfer window opens in a few weeks.

But there’s a second level logic here – either Jamie Vardy is a world-class finisher who will continue to perform at a world-class level, or he’s going through an unrepeatable purple patch and will regress to the mean in the very near future. The problem for Leicester City is that those two realities lead to two very different outcomes for them.

As of this post, Vardy has scored in 13 straight games, leading the Premier League goals table, breaking world records, and is confident he can keep scoring goals and getting away with racism.2 And using Danny Page’s great Expected Goal simulator (and Paul Riley’s tremendously helpful public numbers), we can see that he’s scoring at over 2 standard deviations above the mean expectation for his xG.

blog screenshot
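To make “standard deviations above xG” concrete, here is a rough sketch of the calculation. It treats each shot as an independent event that scores with probability equal to its xG, so the expected goals are the sum of the shot values and the variance is the sum of p(1 - p). The shot list is hypothetical, not Vardy’s real shot data, and a simulator would get to roughly the same spread by repeated random draws instead of this closed form.

```python
import math

def xg_z_score(shot_xg, goals):
    """How many standard deviations the actual goals sit above the xG expectation.

    Assumption: shots are independent, each scoring with probability equal to its xG.
    """
    expected = sum(shot_xg)
    variance = sum(p * (1 - p) for p in shot_xg)
    return (goals - expected) / math.sqrt(variance)

# Hypothetical striker: 40 shots worth 0.2 xG each (8 expected goals), 14 actual goals.
shots = [0.2] * 40
z = xg_z_score(shots, goals=14)
print(f"{z:.2f} standard deviations above expectation")
```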

The first, and most likely, explanation for this scoring rate is that Vardy is on a ridiculous hot streak that won’t continue for long. These sorts of streaks seem to be able to continue for a season, but after that we see massive drops in production and a host of “what happened to…” stories. If Jamie Vardy regresses to the mean and scores goals at a reasonable rate after this (8 goals in 14 games is nothing to dismiss), he’d be a solid striker for a mid-table EPL side like Leicester hopes to be. However, if he follows the pattern of so many before him and drops below the expectation, he’d be a drag on Leicester’s expected points. In this case, Leicester City would certainly be able to keep him, and more accurately wouldn’t be able to sell him even if they tried.3

The other option is that Jamie Vardy is one of Europe’s best strikers, scoring at a rate comparable to Lionel Messi and the other elites. With all due respect, strikers who score at anything approaching this rate don’t play for Leicester City. If he actually is that good, Leicester has to sell him eventually, so why not now? His value is likely at, or near, the highest it will be while he’s with Leicester City. He’s 28, so he’s as young as he’ll ever be, and is nearing the end of his sellable years. Even if he’s an elite striker, scoring well above expectations, he probably can’t maintain this pace (10-11 goals with this xG seems more reasonable for a world-class finisher), and we know teams over-value goals. And January is the time for panic-buying, so they’ll probably be able to get a premium there as well, plus we’re early enough that he won’t be talking about how he wants a transfer and how unhappy he is at Leicester which will reduce his value.

As of today, there’s a lot of hype and uncertainty as to Jamie Vardy’s future goal-scoring prospects. Plenty of people are likely to overrate him due to his current form, and additional time will reduce that uncertainty in one direction or the other. And as I said above, neither direction really helps Leicester’s position, so all this, combined with the fact that Leicester’s league status is basically assured for next year, makes it ideal (and I would argue practically mandatory) to sell him today while his value is at its highest. Get the rumored 15 million pounds for him and invest in a couple other players to improve the team for next year and seasons to come. If he’s that good, they’ll have to sell him anyway, and if he’s not, they’re stuck with a depreciating asset.

  1. I could easily picture Chelsea drastically overpaying to try and right the ship and replace the out of favor Diego Costa
  2. Yes, I’m aware this is a parody site, but it’s a really funny one so I wanted to link it
  3. I know there are injury issues at play here, but Swansea releasing Michu on a free transfer is a good parallel for what I’m thinking in this case

Want to Make Analytics More Accessible? More Context, Less Acronyms!

One of my more popular posts in the past was my discussion of how to make analytics more accessible, so I wanted to follow up on that with some more thoughts on how to make analytics more accessible to a larger audience. These are from my own experience with Analytics Twitter™, especially as I’ve tried to look into other sports for inspiration on different projects or different ways to do things.

  1. Use fewer acronyms, more words

    I know Twitter’s character limit lends itself to acronyms, and it’s difficult to be precise in 140 characters (minus someone’s username and 25 characters for an embedded image), but the acronyms imply a level of familiarity among your readers that keeps analytics conversations at a level only a limited audience can understand. Here’s an example:

    To be clear, Footy in the Clouds is a great Twitter account with lots of great info for the soccer stats community that you all should follow, and I’m not singling him out here. Plenty of people do this sort of thing, and it’s a good way of communicating a lot of information in a short space. But this only works for people who understand what you’re already saying. Anyone outside the circle won’t get it, and if the goal is to expand the analytics discussion then we need to be more transparent with what we’re saying.1

  2. Offer context for any statistics you present

PDO’s a great example here: I know I’ve read about PDO in the past, and I vaguely remember that every team basically regresses to 1000 in the long-run, and I know that some analysts argue anything over 1000 is lucky while anything under 1000 is unlucky. I’ve never assessed those claims, and to be honest I can’t remember exactly what the measure is. I participate in Soccer Analytics Twitter™ regularly and read much of what people post, and I *still* don’t fully understand the measure.

Tell me what the measure is, tell me what the average is and whether a team is above or below it, tell me whether this is due to some inherent skill or whether we’d expect it to regress to the mean at some point. Numbers are useless without context, so if you want a broader audience make sure that the audience knows these things and can make proper use of the numbers.

A great example of this is Mike Goodman’s most recent ESPN column, an excerpt of which I’ll post here:

German teams tackle more, intercept more and generally contest their opponents more aggressively further up the field. In Germany, if an attack has progressed to the point where a player might consider shooting, they’ve already accomplished a lot of the hard work. In England, a player at a similar point is more likely to have the defense still set in front of him. It might be easier to get into the final third in England, but it’s harder to get a shot on target once you do. This might also explain why English teams play more passes in the final third (117 per game), than their German counterparts (97)

Mike Goodman does this better than anyone – finds numbers, thinks about what they mean, and writes them up in a sophisticated, yet transparent way. This is why he’s one of the best writers out there and is so widely respected. We should all try to emulate him, whether we’re trying to expand the audience of analytics work or not.

Don’t assume people know what you’re saying, and don’t assume that your point is self-evident. Maybe this is why we need more long-form blogs to supplement the Twitter conversation – it’s hard to explain and offer context in 140 characters, but it’s crucial if we want to expand the reach of analytics work.

  1. There is a legitimate discussion to be had about audiences here: I get into this with Neil Charles and Simon Gleave on a semi-regular basis and there is no right answer. It’s all about what individuals want from their work – if one wants to appeal to the niche high-level analytics audience, that’s ok with me. If one wants a larger audience, then that requires a broader communication style.

The Importance of Community and Sharing the Work of Others

Disclaimer: This blog won’t have any analysis/stats/fancy charts. It’s just my thoughts on how to build a greater community in the analytics world, encourage more long-form blog posts, and maybe move the ball forward by decreasing the barriers to entry. 

Like most of the people who read my blog, I follow the same handful of “big accounts” everyone else follows. Most of them are “big” for a reason: they post interesting, well-written work and have built up a significant audience over time. But it’s a logical fallacy to think that everyone who posts interesting, well-written work has a big audience while people who have small followings aren’t doing that. I’m fortunate that my work was discovered and embraced fairly quickly by the analytics community, and I’ve quickly developed a decent-sized following on Twitter for my work. I’m also impressed by the quality of my followers: they’re smart, engaged people who are interested in talking about soccer stats on a higher level than the “average fan™.”

I’ve written about it before, but in my day job I’m a political science professor and I have a Twitter account where I talk academic politics. I have a small following there (~350 followers), but my followers are almost all high quality folks: current/former students, reporters from national and international media, professors, and major practitioners. It has opened up any number of professional opportunities to me, and I feel very lucky to have been able to take advantage of them as they presented themselves to me. But more than anything, I’m grateful to the folks who initially followed me back when I only had a few (< 10) followers. They followed me early when no one else was, and shared my tweets so other people could discover that I had an account and potentially follow me too. Without them, I wouldn’t have been able to build up even the modest follower count I have now. Because my soccer account has gained some quick visibility, I’d like to maybe pay that forward a little bit and try and draw some attention to writers you may never have read or heard of.

There is so much interesting work being done out there, most of which I probably never see. So Sunday I tweeted the following:

I found a few really interesting articles from this, which I’ll share here before I go any further:

All of those articles are interesting, and worth a read when you finish here. And follow their authors on Twitter. I never would have seen them if I hadn’t put out the open call. I’ve benefited from “big accounts” sharing my work in the past, and always appreciate when someone with high visibility tweets something I spent some time on and am proud of. I’m not a big account in the community, but hopefully I can share some underappreciated work and help some people in the way that others have helped me in the past.

So from now on, once a week (probably Tuesdays) I’m going to put out an open call for bloggers to send me their articles and I’ll share them with my audience. And I’d ask that anyone reading this takes a few minutes to do the same. Liking someone else’s work is costless, and retweeting one or two pieces a day is virtually costless as well and can help someone out. If you can tweet your own pieces three or four times (or retweet everyone who says something positive about your work), why not cut back to two or three self-promoting tweets and use one of those to promote someone else? If we really want to build a community and expand analytics rather than just promoting our own brilliance, this seems like a better way to do it.


The Importance of Qualitative Scouting: A Response to the Analytics FC Response

@BobbyGardiner posted a response to my “Sell your overrated striker” post over at the Analytics FC blog, and it’s a good read for anyone interested in the topic. You should read the whole thing (and everything the @AnalyticsFC crew does), but his main point of disagreement with my post was the idea that Stephan el Shaarawy overachieved for Milan back in 2012. Bobby’s argument was that El Shaarawy wasn’t particularly spectacular that year, converting at a rate of 15% and scoring many of his goals from good positions. His shot chart from that season is below:

The alternative hypothesis is that el Shaarawy’s decline in production was based on a series of injuries rather than overachievement in one year and a regression to the mean in others.

I think these sorts of debates are interesting and important, and I appreciate Bobby’s response to my post. One of the things I was taught in graduate school was the highest compliment someone can pay is to engage with your work, and the worst is to ignore it. So I’m posting this response to engage further and maybe advance this discussion, not as some sort of attempt to “win.”

To that end, I have a couple of thoughts: one is sort of technical while the other is posting the next step of the argument. The first is that I picked il Faraone because, as a lifelong Milan supporter, his was the first name that came to mind. Milan’s biggest mistakes in recent years have been to sell low: Pato, Zlatan, and el Shaarawy have all been the subject of massive bids only to be sold a year or two later well below their original market value. He’s not necessarily the best case study, and Bobby’s choice of Alexandre Lacazette may very well be a better one. El Shaarawy also isn’t a good choice because I originally was aiming at talking about “selling” clubs, which Milan hasn’t historically been, and Lacazette may be a better option there as well. The case study wasn’t necessarily the point, the argument was that knowing when to sell overachieving players can make clubs a significant profit in the long-run.

More importantly, I think this points out the need for qualitative scouting to complement analytics approaches. Let’s use Lacazette as our example for a minute: he performed spectacularly last year, and Bobby’s argument is that he may very well have over-performed significantly. He also points out, correctly, that we don’t know this just from looking at stats. This is an area where Milan supposedly shined, but evidence seems to show that their methods were imperfect at best.

The vaunted Milan Lab describes itself as “a high tech interdisciplinary scientific research centre.” One of its big claims to fame back in the day was finding older players who could play at a world-class level far later into their careers than any other club, letting them buy 30+ year olds at a bargain price. This was shown to be false when they let Andrea Pirlo go on a free transfer to Juventus a few years back, and he put together some of the best years of his incredible career for perhaps their biggest rival. Its other big claim to fame was maximizing training techniques to minimize injuries/to know which players were more susceptible to injuries. Pato proved this false on a regular basis, and more importantly to my point, they should have recognized el Shaarawy’s seemingly injury prone nature. That would be another inefficiency that could be exploited – knowing that el Sha would be likely to see significant injuries would be another way to get more money than his value from Manchester City.

Beyond this, regular qualitative scouting is important. I watched virtually all of Milan’s games that year (back when they were streaming on ESPN3’s app), and while he had an amazing season, el Shaarawy is a winger who was forced to play centrally by a combination of injuries and the sale of main striker Zlatan Ibrahimovic. He had a great season, but he was playing far too centrally far too often in the new Milan lineup. He was also a 20-year-old kid being asked to lead the line for one of Europe’s biggest clubs, and there was plenty of talk at the time that this was too much pressure and too many minutes for him. I don’t watch enough of Lacazette to know for sure, but Lyon should be thinking about whether he can repeat last year’s achievements or if they should max out now and sell him at his peak.

As with everything, the numbers can show you where to look. El Shaarawy had an above-average season, just like Lacazette. Their performances were significantly better than one would expect from a young attacker, so a red flag should be going up from the analytics department. Then let the qualitative folks watch their performances: are they likely to improve, sustain, or decline? Is it worth selling now to cash in at the max rather than going through a couple of drought years and then selling at a fraction of what you would have made originally? Numbers alone won’t get you there; they can only narrow down the range of possibilities you need to look at. Qualitative approaches are the perfect supplement for questions that numbers can’t answer alone.
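To make the “red flag” step concrete, here is a minimal sketch of how an analytics department might shortlist players for the scouts to watch. Everything in it is an assumption for illustration: the player names and numbers are made up, and the goals/xG threshold and minimum-xG cutoff are arbitrary choices rather than anything Milan or Lyon actually use.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    goals: int
    xg: float  # expected goals over the same season


def flag_for_scouting(seasons, ratio_threshold=1.25, min_xg=5.0):
    """Return (name, goals/xG ratio) for players scoring well above their xG.

    ratio_threshold and min_xg are hypothetical cutoffs; a real department
    would tune them against historical data.
    """
    flagged = []
    for s in seasons:
        if s.xg < min_xg:
            continue  # too few chances for the ratio to mean much
        ratio = s.goals / s.xg
        if ratio >= ratio_threshold:
            flagged.append((s.name, round(ratio, 2)))
    # Biggest over-performers first
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up seasons, purely illustrative
SAMPLE = [
    PlayerSeason("Player A", goals=27, xg=18.0),
    PlayerSeason("Player B", goals=12, xg=11.5),
    PlayerSeason("Player C", goals=4, xg=2.0),
    PlayerSeason("Player D", goals=16, xg=12.0),
]

print(flag_for_scouting(SAMPLE))
# [('Player A', 1.5), ('Player D', 1.33)]
```

The output is only a list of names to go watch. Whether Player A is a genuinely elite finisher or is about to regress is exactly the question the numbers can’t answer on their own.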

Moneyball, but for Selling: Using xG/Goals Ratio to Profit

One of the most overused clichés in all of the Internet is “Moneyball, but for…”1. Moneyball, correctly applied, is the idea of using undervalued stats to figure out which players to sign at a bargain price. By exploiting an information asymmetry, small clubs were able to find value in players that bigger teams focused on traditional stats were missing. However, soccer has a version of this that American sports don’t have: the ability to sell players for a profit rather than simply signing or trading for them. Combined with sophisticated analytics work, this can help teams identify inefficiencies in the transfer market, either to bank a profit or to raise money to reinvest in multiple players.

  1. “Uber, but for…” has to be a close second.

Game Theory: Alan Shearer Isn’t Entirely Wrong

Alan Shearer riled up Analytics Twitter™ today with this comment:

Analytics Twitter™ hates these sorts of arguments, best summarized by @MessiSeconds’ response:

I’m confident Joel’s video response will be well-written, well-argued, and well-produced like the rest of his videos, but I think the answer is much simpler than this: at the end of the day, points are all that matter in the relegation fight, but we’re nowhere near the end of the day yet.

However, I’m going to depart slightly from what I assume the rest of Analytics Twitter™ is probably going to say. Alan Shearer is right: three points for Newcastle over Bournemouth was a significant result, and it changed my predicted final table by a lot given that it is only one result. Here are the weekly heat map results from my model for the two teams:

Week 12 New Bou

The win moved Newcastle’s relegation probability down a few points, and moved Bournemouth a few percentage points closer to relegation. Regardless of the stats, this was a bad week for Bournemouth. Playing well isn’t much comfort in this case.

However, the stats tell a more nuanced version of Shearer’s point, and they ultimately show where his conclusion goes wrong. Newcastle doesn’t care about winning ugly this week, but in the long run they should care that they won a game they in all likelihood shouldn’t have, because performances like that one mean they can expect to earn fewer points across the season. In a one-shot game, Newcastle is happy to have taken the three points from a relegation rival, and they don’t have to give those back just because they were out-played. However, across a 38-game season, if they keep playing like this, the law of large numbers will probably catch up with them and they’ll be down in the Championship.1 It’s the reason Las Vegas casinos make billions of dollars: if the fundamental statistics are in your favor, you may lose any single bet, but in the long term you almost can’t lose.
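The one-shot versus long-run distinction can be sketched with a quick simulation. This is not my prediction model, just a toy: the per-match win and draw probabilities below are made-up assumptions for a team that is usually out-played, and the 1.5 points-per-game target is an arbitrary stand-in for “comfortably safe.”

```python
import random

# Hypothetical per-match probabilities for a team that is usually out-played.
# Illustrative assumptions only, not estimates for any real club.
P_WIN, P_DRAW = 0.25, 0.25             # a loss has the remaining 0.50
EXPECTED_PPG = 3 * P_WIN + 1 * P_DRAW  # 1.0 point per game on average


def simulate_points(n_games, rng):
    """Total points from n_games independent matches."""
    total = 0
    for _ in range(n_games):
        r = rng.random()
        if r < P_WIN:
            total += 3
        elif r < P_WIN + P_DRAW:
            total += 1
    return total


def share_beating_target(n_games, target_ppg, n_sims=5000, seed=1):
    """Fraction of simulated runs whose points per game reach target_ppg."""
    rng = random.Random(seed)
    hits = sum(
        simulate_points(n_games, rng) >= target_ppg * n_games
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    # One match, one season, ten seasons
    for n in (1, 38, 380):
        print(n, share_beating_target(n, target_ppg=1.5))
```

Under these assumptions the team beats the target in about a quarter of single matches, simply by winning. Over 38 games the share of simulated seasons that clear the same bar should be far smaller, and over ten seasons’ worth of games it should all but vanish. That is the casino logic: any one result can go against the odds, but the average rarely does.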

Three points today are a victory today, and no xG map can change that. Newcastle’s probability of staying in the EPL next year increased, and the points they earned stay on the table. The opposite is true for Bournemouth: their probability of staying decreased, and the points they missed can never be made up. But in the long run, Newcastle’s manager knows that they got lucky and will have to improve if they want to avoid the drop. Alan Shearer likely knows that too, so we shouldn’t feed the troll.


  1. 38 is hardly what the law of large numbers folks had in mind as a “large number” but it’s close enough for our purposes. If you’re not satisfied with this explanation, then extrapolate it to multiple seasons until you get to a number large enough for your taste.