
Game Theory: All xG Are Not Created Equal, or Violations of the Independent Trials Assumption

Flashback to grade school, and one of the first things we all learned about probability is that each trial is independent of the previous ones. If you flip a coin, and it comes up heads 9 times in a row, the odds of it coming up heads the 10th time are exactly the same as they were in each of the first 9: 51%.[1]

This is the law of independent trials: results at time (t-1) don’t affect the probability of a given outcome at time (t). The coin doesn’t know that it just came up heads, and nothing about the fact that it came up heads changes the future probability of it coming up heads. The trials are independent, therefore the outcomes are independent.

We think of shots the same way in expected goal models – a shot with an xG of 0.4 has a 40% chance of going in, and a shot with an xG of 0.1 has a 10% chance of going in. Therefore, teams basically have to take 4 shots with an xG of 0.1 to equal one shot with an xG of 0.4.[2] In this case, what does a rational team do? Unless you can somehow consistently generate four 0.1 xG shots for every 0.4 xG shot you could have created with a couple more passes, you should always look for one more pass because your xG will be higher. By this logic, long shots are incredibly low value and should never be taken by a rational player.
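To make the comparison concrete, here is a minimal Python sketch. It takes the independent-trials assumption at face value, and the helper names are mine rather than anything from a particular xG model. It compares four 0.1 xG shots with one 0.4 xG shot on two measures: total xG and the chance of scoring at least once.

```python
from math import prod

def expected_goals(shots):
    """Total xG: the expected number of goals from a list of shot xG values."""
    return sum(shots)

def prob_at_least_one_goal(shots):
    """Chance of scoring at least once, treating every shot as an independent trial."""
    return 1 - prod(1 - xg for xg in shots)

four_long_shots = [0.1, 0.1, 0.1, 0.1]
one_close_shot = [0.4]

for label, shots in [("four 0.1 shots", four_long_shots),
                     ("one 0.4 shot", one_close_shot)]:
    print(f"{label}: xG = {expected_goals(shots):.2f}, "
          f"P(at least one goal) = {prob_at_least_one_goal(shots):.4f}")
```

The total xG is the same in both cases, but the four long shots are a little less likely to produce at least one goal, because part of their value comes from the rare case of scoring more than once.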

However, this only looks at the offensive side of things. Now we turn to the defense. If I’m a smart defender, knowing that this is the dominant strategy, I back off and let anyone who wants to shoot from outside the penalty area. They might buy a lottery ticket once in a while, but in the long run I’ll be better off closing down the 0.4 and higher xG opportunities to make sure they never happen. Even if I get half (or even a third) as many shots as my opponent, if they’re 4 times the quality I will win more games than I lose.

Now we have a new problem: if I’m a defender leaving strikers unmarked outside the penalty area so they can shoot at will, the expected goals value of those shots increases. I don’t know of any measures that take defense into account,[3] so the 0.1 value is calculated on the assumption that the defending team is putting up some sort of defense. Long shots score 10% of the time given a reasonable defense, but if the shooters are wide open then you could probably assume a higher xG value for long shots. Closer shots score 40% of the time given a defense that isn’t all packed into the six-yard box ready to frustrate incoming players, but if it is, those shots probably earn a lower xG value. This is what formal theorists call a “dynamic equilibrium”: results are calculated based on both sides playing their best possible strategies.[4]

The point here is that the different strategies matter, and that the strategies employed for the first shot change the results of the second shot. A 0.4 xG shot is only a 0.4 xG shot because defenders have to provide some sort of defense to the 0.1 xG shots that happened earlier (or could have potentially happened earlier). But if a team never shoots from that distance, or has zero quality from that distance, then we never reach the dynamic equilibrium. At this point, it becomes logical to take some low xG long shots to “keep the other team honest” and open up the higher xG close shots later.
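One rough way to see this logic is as a two-by-two zero-sum game. The sketch below is an illustration only: the four xG payoffs are invented for the example, not taken from any model, and the function name is hypothetical. The attack chooses between shooting long and working the ball closer; the defense chooses between stepping out and sitting deep.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy equilibrium of a 2x2 zero-sum game without a saddle point.

    Payoffs are xG for the attack (row player), conceded by the defense:
                        defense steps out   defense sits deep
      shoot long               a                   b
      work it closer           c                   d
    Returns (p, q, value): p = P(attack shoots long), q = P(defense steps out).
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate payoffs: no unique mixed equilibrium")
    p = (d - c) / denom  # mix that leaves the defense indifferent
    q = (d - b) / denom  # mix that leaves the attack indifferent
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("saddle point exists: a pure strategy is optimal")
    value = (a * d - b * c) / denom  # xG per possession at equilibrium
    return p, q, value

# Hypothetical payoffs, chosen only for illustration.
p, q, value = solve_2x2_zero_sum(a=0.05, b=0.20, c=0.40, d=0.10)

# If the attack never shoots long, the defense simply picks its better reply.
only_close = min(0.40, 0.10)

print(f"attack shoots long {p:.0%} of the time, defense steps out {q:.0%} of the time")
print(f"xG per possession: {value:.3f} when mixing vs {only_close:.3f} with no long shots")
```

With these made-up numbers, an attack that never shoots long lets the defense sit deep every time and is held to the lowest close-range value, while mixing in long shots raises the xG it can guarantee per possession. The exact mix depends entirely on the invented payoffs.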

When you’re looking at xG maps, look at the whole picture. How did the smaller xG shots affect the higher xG shots later in the game? Or the opposite: did a number of high xG shots affect the value of the low xG shots later? A team can’t live on a diet of high xG shots alone, and it becomes optimal to take a handful of low xG shots to open up more high value ones later.[5] All xG aren’t created equal, and when you look at the maps think about the second-order value of the lower xG shots.

  1. Seriously – a professor at Stanford found that there’s no such thing as a “fair coin,” and the full study is available here:
  2. This isn’t *quite* the same probability: the math for at least one goal with four shots of 0.1 is 1-(0.9*0.9*0.9*0.9), or about 34%, but it’s close enough for my purposes here. See Danny Page’s excellent treatment of the topic if you’re interested in all the math behind this idea.
  3. Readers: correct me if I’m wrong here. I’m not as familiar with the inner workings of all the different models out there as many of you are.
  4. In political science, the dynamic equilibrium argument is important to the study of campaigns: advertising has no net effect in presidential elections because both sides are running so many ads they cancel each other out. But if only one candidate went on TV, presumably things would be different.
  5. Forgive me for not looking it up, but there is research out there that argues that it’s optimal to take penalty kicks to the player’s weaker side a percentage of the time to keep the goalkeeper from always going to the strong side. I think the numbers were like 75-25% strong/weak. This is the same idea: take some weaker shots to open up the stronger ones later.

The Drunkard’s Search: Analytics, #confidence, and Observing the Unobservable

“There is the story of a drunkard, searching under a lamp for his house key, which he dropped some distance away. Asked why he didn’t look where he dropped it, he replied ‘It’s lighter here!’. Much effort […] in behavioural science itself, is vitiated, in my opinion, by the principle of the drunkard’s search” – Abraham Kaplan (1964)


Game Theory: Rotate or Full-Strength Squad in the League Cup Round of 16

Building on my previous “Game theory” post about how it’s rational to play a weaker squad in the Champions League compared to the EPL, I wanted to walk through the logic of doing so in the League Cup Round of 16.

The general idea is the same: you rotate if immediate results in the league are more important to you, you play a full-strength squad if the cup is more important to you. This is mitigated by several factors:

  • Your odds of winning today’s match with a full strength squad vs. rotated squad
  • Your odds of winning the Cup
  • Your league position relative to where you want to be/can you risk losing points this weekend
  • Psychological benefits

So I wanted to walk through the logic for a couple of teams playing today to see what the rational choice would be.

Leicester City

Leicester City has been over-achieving in the league, with 19 points through 10 games as of today. You’d have to think they would have been ecstatic with a mid-table finish at the beginning of the season, so they’re over-achieving by any measure. They can afford to drop points this weekend in the league.

Similarly, they’re playing a weaker opponent they’d be expected to beat in Hull City. Rotating the squad could make a significant difference there, but if they play a full strength squad they’d be significant favorites, even on the road.

Their odds of winning the cup aren’t great, but a lot of that depends on the draw and what Arsenal, Chelsea, and the two Manchester teams plan on doing.

The psychological benefits of making a run in a tournament like this, and potentially getting a game at Wembley and winning a trophy, could be huge for a team like Leicester.

Based on these factors, my “model’s” prediction would be a full-strength squad, or close to it.


Arsenal

Arsenal have to be considered heavy favorites over Sheffield Wednesday, but this is also likely true for a rotated squad. A team the size of Arsenal should be able to play their second best XI and beat Sheffield Wednesday.

As one of the top teams in England, Arsenal have to be considered one of the favorites to win the Cup. However, they’re probably co-favorites with Manchester City, Manchester United, and Chelsea (if Chelsea ever gets their act together…).

Arsenal’s league position is roughly where they want it, tied with Manchester City at the top of the league. However, because they’re tied they don’t really have any room to spare and can’t afford to take any risks in the league if they want to mount a serious title challenge. And because they’re Arsenal, they may want to be extra careful to avoid tempting the fates.

Psychological benefits are minimal – a loss with a rotated squad to Sheffield Wednesday would get a couple snickers in the papers tomorrow, but no one would actually think badly of them if they lost. Similarly, they’ve won the FA Cup two years in a row so there’s no burning need to win a lesser trophy anytime soon.

Prediction: Arsenal rotates.


Chelsea

Chelsea is a tough case.[1] They’re playing Stoke, so they should be favored fairly heavily. That being said, they’ve been playing well below form, so maybe a full-strength squad doesn’t win today. You’d struggle to assign a win probability to them right now, so that’s tough. On top of that, Stoke’s a good enough team that they should be able to beat Chelsea’s fully rotated squad.

Also, who knows what Chelsea’s odds are of winning the tournament? They beat Arsenal earlier in the season, but that looks more and more like a fluke with every game that passes. You’d have to think they’re not favored over either of the Manchester clubs right now, nor would they be favored over Arsenal. A win today likely just gets them a tough game next round and an exit by the semi-finals.

Chelsea’s league position is abysmal by their standards (15th through 10 games), and clearly they can’t afford to lose any more ground if they want to keep their chances alive for a 4th place finish. The league is clearly the priority right now.

Psychological benefits: these are tricky. They could use a win right now to maybe build some momentum, building #confidence in the locker room and among the fans. Mourinho is clearly trying to fix things on the fly (see Mike Goodman’s great piece on this), and he could use some reps with some new tactical tweaks in a setting where there are no real consequences. Mourinho presumably could use a win to get some pressure off of him, especially given the rumors that he might be on his way out soon.

Prediction: tough to call, but the psychological benefits might outweigh other considerations.

  1. Spoiler alert, they played a full-strength squad

Manchester City’s Midfield Depth Problem

This week’s Manchester Derby was fairly uneventful, with City registering the first shot on goal for either team somewhere in the middle of the second half, and the game ending in a 0-0 draw. City was without two of their best attacking players (Sergio Aguero and David Silva), and it showed. To compensate, Manuel Pellegrini moved Yaya Toure up from his typical deep-lying midfielder/box-to-box role to a more attacking role behind Wilfried Bony, who would often come back deeper to link the defense to Toure in attack. From there, presumably the plan was for Toure to distribute the ball to Sterling and de Bruyne on the wing who would then either cut inside or cross the ball to Bony. I say “presumably” because de Bruyne usually either kicked the ball to the nearest Manchester United defender or as far as he could over the touchline, wasting the few opportunities City had to attack.

This plan worked relatively well up until the wingers got the ball, and as a City supporter I couldn’t have been happier with the Bony/Toure linkup. However, the American commentators focused quite a bit on how much City missed Aguero, and while he’s one of the best strikers in the EPL, I disagree that missing him was the problem. Running the quick numbers through my model, Bony as a replacement for Aguero is only a downgrade of a couple of points over the course of a season. Missing Silva was the problem, and City needs a backup for him.[1] Who can they get?

First, I looked at my “Points Above Replacement” spreadsheet, and confirmed the conventional wisdom: Yaya Toure and David Silva are both in the top 25 players in my database at their given position for Manchester City.[2] From the few players who were improvements, I looked for players who could play at least centrally in addition to their primary position as either a defensive mid or attacking mid, and I eliminated players from rival teams that would be unlikely to sell.[3] After these filters, I was left with what I consider six good options.[4]

[Figure: Midfield Reinforcements Manchester City]

The barplot shows the change in expected points for each of the six players I found as options for Manchester City. The best option according to my model is Swansea City’s midfielder Ki Sung-Yueng. He’s in his prime (he’ll be 27 in January), would be relatively easily buyable for a “big club” like Manchester City, and can play either as a defensive midfielder or more centrally.

The next best option, for me, is Milan Badelj. He’s the same age, and can play both centrally and as a defensive mid, and based on his history would be reasonably affordable to buy from Fiorentina.

Gary Medel and Daniele de Rossi are probably my least favorite buys on the list: both are older, de Rossi’s probably unbuyable and I’m not sure what sort of price tag it takes to buy a player from Inter Milan these days.

The other options are the young stars: Ilkay Gundogan, Marco Verratti, and Lorenzo Crisetig. Of the three, Gundogan’s price tag is probably too high, and he reportedly said “no” to Manchester United this summer. Crisetig is the biggest surprise on the list (my model likes him for Arsenal too), but he’s young with a big upside for me, and wouldn’t be too expensive as a speculative buy. I like Marco Verratti a lot, and PSG *may* need to sell someone if there’s any truth to the rumors that they’re going to buy Cristiano Ronaldo. He could be a long-term replacement for either of City’s two aging stars, so if he’s buyable I think City should pursue him.

To be clear, this is just a starting point. If I’m in charge at City, I’m surprised by how much the model likes Ki Sung-Yueng so I send scouts to every Swansea City game between now and January 1 and watch every bit of video I can get on him to see how well he’d fit the team’s style and how well he could slot in for either Toure or Silva. Same with Verratti, Badelj, and Crisetig. City’s depth could be their biggest weapon, but it was clear today that they don’t have a great second option for when David Silva is out and that could be what stops them from catching Arsenal.



  1. My model actually thinks Patrick Roberts would be a good replacement for him, but clearly Pellegrini doesn’t trust him as much as my model does so I’m operating under that assumption here.
  2. Yaya Toure is #16 in the box-to-box role, and Silva is #23 in the CAM role for Manchester City
  3. My model really likes Daley Blind as a replacement for both of them, but he’s obviously not an option
  4. Really five good options and Daniele de Rossi, but I’m such a huge fan of his I always like to add him when I’m talking box-to-box midfielders

There is no Debate: Everything is Analytics, Just Using Different Words

So we’ve had a little time since the last major newspaper column about ambient temperature and analytics, so I wanted to post something I’ve been thinking for a while now: Everything is analytics, whether you call it that or not. The two sides don’t complement each other because there aren’t two sides. Unless you purely watch soccer for the artistic brilliance of the game and make zero judgments on the game, you’re analyzing things. You’re deciding which players are good and which players are bad, which team is good and which team is bad, who should have won a game, and who will win your favorite competition. This is the exact same thing the math folks are doing; what we think of as “analytics” is simply a more formal way of doing it than most people use.

I’m not going to write a long-winded rhetorical explanation of this point; instead I’ll just provide a few quick examples:

“Real football men” say: “Napoli outplayed Fiorentina last weekend and really deserved their win.”

“Analytics” says: “Napoli’s xG total was higher than Fiorentina’s, so you’d expect them to win.”


“Real football men” say: “Walcott should have scored on his header, and Ozil’s goal was an easy finish.”

“Analytics” says: “Walcott’s header had an xG of 0.34, and Ozil’s was 0.84”

“Real football men” say: “Arsenal’s playing well and could mount a real title challenge.”

“Analytics” says: “If Arsenal continues to meet expectations, they have a 53% chance of winning the league this year.”

[Figure: Heat Map Week 9]

“Real football men” say: “Chelsea is playing horribly this year.”

“Analytics” says: “Chelsea’s underachieving by about 7.5 points so far and need to turn things around.”

[Figure: Deviation Bar Week 9-2]

“Real football men” say: “Leverkusen was unlucky to not win against Augsburg.”

“Analytics” says: “The Expected Goals values mean Leverkusen would have beaten Augsburg 84% of the time.”


“Real football men” say: “Fernando Torres has his confidence back!”

“Analytics” quietly turn up the air conditioning and weep at their desks….ok, so not everything is analytics.

Follow me on Twitter @Soccermetric


Chelsea Can Pick Up Ground The Next Four Games

Probably the biggest story of this season so far has been how Chelsea is underachieving. As of today they’ve earned 11 points through 9 games, and sit in 12th place in the league table, falling about 7 points below my model’s expectations (which put them in second place at the beginning of the season).

[Figure: Deviation Bar Week 9-2]

They’re the second biggest outlier in my model, only slightly out-performing Sunderland so far. My predictions still have them likely finishing in the top 4, and this week’s performance consolidated that position a little bit.[1]

[Figure: Heat Map Chelsea Week 9]

But can Chelsea turn it around? I look at the next four games to see what we’d expect from Chelsea and what their chances are of catching up to expectations.

[Figure: Chelsea’s Next Four Games]

In their next four games, Chelsea can expect to earn 7.84 points. They’re big favorites in the two home games (against Liverpool and Norwich City), but are only slight favorites in the two away matches (against West Ham and Stoke City). I know Chelsea’s form has been bad this season, but you’d still expect them to beat Stoke City fairly easily and…well who knows when West Ham is going to come back down to earth? If they win both of those games, they’d have 6 points right there, and a win at home against Norwich City would bring them to 9. Right there they’d be 1.16 points above expectation, and if they could beat Liverpool at home they’d be a full 4.16 points over the expectation. That would cut their deficit by more than half, bringing them more in line with the pre-season predictions that made them title contenders.
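The arithmetic here is easy to check. Below is a small sketch using the 7.84 expected-points figure from my model; the function name and the W/D/L result codes are just illustrative.

```python
POINTS = {"W": 3, "D": 1, "L": 0}  # standard league scoring
EXPECTED_POINTS = 7.84             # model expectation for the next four games

def points_vs_expectation(results, expected=EXPECTED_POINTS):
    """Return (points earned, difference from the expected points)."""
    earned = sum(POINTS[r] for r in results)
    return earned, round(earned - expected, 2)

# Wins at West Ham and Stoke plus a home win over Norwich, then adding Liverpool.
for results in ["WWW", "WWWW"]:
    earned, diff = points_vs_expectation(results)
    print(f"{results}: {earned} points, {diff:+.2f} vs expectation")
```

Swapping in other result strings, such as "WDWL", shows how quickly a single dropped result pulls the run back below expectation.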

Four wins in four games seems out of Chelsea’s range right now, but I think they have to turn things around eventually. If they can do it now, they can get right back into the thick of things and maybe mount a nominal title challenge.

  1. They were helped out by a draw between Liverpool and Spurs, their other main competitors for that last Champions League spot