Category Archives: Methods

NWSL: Catching Up By Taking Worse Shots?

One of the strategic questions that has always interested me is: what is the best way to catch up after going behind in a soccer match? To my mind, there are two options:

  1. Take a lot of low percentage shots, hoping that volume makes up for a lack of quality.
  2. Be patient and wait for the high quality chances, hoping that quality makes up for a lack of volume.

There are merits to both, and you could probably solve this mathematically based on expected number of shots and expected quality per shot given any number of variables. My head is spinning thinking about how you’d actually solve this equation, but given enough familiarity with teams and the right data the math would be easy enough. Solving this equation isn’t my goal with this post, instead I want to see what teams have done and use observable data to see what their strategies are/potentially how they’ve solved the problem for themselves.

To do this, I’ve undertaken two separate analyses. The first is simple enough: what is the likelihood that a shot goes in given the game state at the time of the shot? More simply put: does shot quality correlate with score?

To answer the question, I ran an analysis (full details in the appendix) looking at each shot in the NWSL this season and part of last season.[1] I calculated the probability that each shot becomes a goal, and compared those probabilities when the score is even and when the shooter’s team is one, two, or three goals ahead or behind.

If teams look to catch up by taking lower probability shots when they are behind, we’d expect to see the average shot have a lower expected goal (xG) value the further behind they are, while when they are ahead the average shot would have a higher xG value.

Conversely, if teams look to catch up by taking higher probability shots when they are behind, we’d expect to see the average shot have a higher expected goal (xG) value the further behind they are, while when they are ahead the average shot would have a lower xG value. I present the results of my analysis in the figure below.

[Figure: NWSL Score by Game State]

The points represent each shot taken, while the y-axis represents the Expected Goal value and the x-axis represents the goal difference at the time the shot was taken. The red boxes represent the average xG value for the shots taken at a given goal difference and the standard error around that average. If you compare the center lines in each box, you can see an upward trajectory from -3 to +3, meaning that teams take lesser quality shots when they are behind and focus on higher quality shots when they are ahead.
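As a rough sketch of how the red boxes could be computed (this is not the code behind the figure), the snippet below groups shots by goal difference and reports the average xG and its standard error. The function name `xg_by_goal_difference` and the example shots are made up for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def xg_by_goal_difference(shots):
    """Group (goal_difference, xg) pairs; return {gd: (mean_xg, std_error, n)}."""
    groups = defaultdict(list)
    for goal_difference, xg in shots:
        groups[goal_difference].append(xg)

    summary = {}
    for gd, values in sorted(groups.items()):
        n = len(values)
        # Standard error of the mean; undefined for a single shot, so use None.
        se = stdev(values) / sqrt(n) if n > 1 else None
        summary[gd] = (mean(values), se, n)
    return summary


# Made-up shots for illustration only: (goal difference at time of shot, xG).
example_shots = [(-1, 0.05), (-1, 0.09), (0, 0.10), (0, 0.14), (1, 0.12), (1, 0.20)]
summary = xg_by_goal_difference(example_shots)
for gd, (avg, se, n) in summary.items():
    print(gd, round(avg, 3), None if se is None else round(se, 3), n)
```

An upward trend in the averages from negative to positive goal difference is the pattern described above.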

My analysis of shot data shows that teams focus on taking whatever shot is available when they are behind, hoping that taking enough lower quality shots will help them get back in the game. There are a number of potential explanations for this, but it seems like teams prefer to take any available shot when they are behind but can be more selective when they are ahead.

Appendix

Here are the results of my probit regression: my dependent variable was “did the shot result in a goal scored?” and my independent variables are in the left column of the table below. The explanatory variable of interest here is “goal difference”, and it is positive and statistically significant (p < 0.05). That indicates goal difference is a significant predictor of the likelihood of a goal being scored, and that when teams are leading they take higher quality shots.

Variable                   Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)                  0.1865       0.2929      0.64     0.5243
Goal Difference              0.1076       0.0512      2.10     0.0357
Distance from Goal          -0.0810       0.0120     -6.74     0.0000
Angle to Center of Goal     -0.7620       0.1889     -4.03     0.0001
Time                        -0.0007       0.0023     -0.32     0.7498
Was the Shot Pressured      -0.2115       0.1306     -1.62     0.1054
Kicked                       0.1807       0.1891      0.96     0.3394
Counter Attack               0.4137       0.1329      3.11     0.0019
Home Team                   -0.0751       0.1197     -0.63     0.5305
Goalkeeper Error             1.9129       0.4599      4.16     0.0000
Direct Free Kick             0.5552       0.3278      1.69     0.0903
Assisted from a Corner      -0.0103       0.2471     -0.04     0.9668
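To make the table more concrete, here is a minimal sketch of how probit coefficients turn into a goal probability: add up the coefficient times the value for each variable, then pass the sum through the standard normal CDF. The coefficients are copied from the table, but the units of each variable (yards for distance, radians for angle, minutes for time) and the example shot are my assumptions, so treat the output as illustrative only.

```python
from math import erf, sqrt

# Coefficients copied from the regression table above.
COEFS = {
    "intercept": 0.1865,
    "goal_difference": 0.1076,
    "distance": -0.0810,
    "angle": -0.7620,
    "time": -0.0007,
    "pressured": -0.2115,
    "kicked": 0.1807,
    "counter_attack": 0.4137,
    "home_team": -0.0751,
    "goalkeeper_error": 1.9129,
    "direct_free_kick": 0.5552,
    "corner_assist": -0.0103,
}


def normal_cdf(z):
    """Standard normal CDF, which is the probit link function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_xg(shot):
    """Predicted goal probability; variables missing from `shot` count as 0."""
    z = COEFS["intercept"] + sum(COEFS[name] * value for name, value in shot.items())
    return normal_cdf(z)


# Hypothetical shot: units and values are assumptions, not taken from the data.
base_shot = {"distance": 12, "angle": 0.3, "time": 60, "kicked": 1}
for gd in (-2, 0, 2):
    print(gd, round(probit_xg({**base_shot, "goal_difference": gd}), 3))
```

Holding everything else fixed, the predicted probability rises with goal difference because that coefficient is positive.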

 

@deepXG mentioned that the causal arrow might be going in the wrong direction: teams taking lower xG shots might be more likely to fall behind. So I also wanted to do an analysis within games to show a change within games. I subdivided the data by the final score: winning/losing by 3, 2, and 1 goal, and ties (winning/losing by 0). Most of these final scores didn’t have enough shots across a variety of game states (games that finish in a tie tend to spend most of the game tied, meaning there’s not much variation in game state to analyze), but I was able to find a pattern among the most extreme results (+/- 3 goals).

For both outcomes, we see the same pattern as in the main analysis (although with more uncertainty because of a relative paucity of data). Expected goal values decrease as teams fall behind/increase as they take the lead. This provides a second level of evidence and a robustness check on the original findings. Figures are presented below.

[Figures: 3 Goal Win; Team Lost By Three]

 

 

  1. I’m collecting these xG values by hand, coding each shot individually. As of now I have weeks 16-20 of the 2015 NWSL season as well as the first 3 weeks of the 2016 season.

Game Theory: The Rationality of Man City’s Fans Anger at Rotation

My newest Game Theory post about the value of rotation was inspired by a Gab Marcotti tweet:

He was speaking of Manchester City playing a “B Team” in the weekend’s Premier League fixture, prioritizing their mid-week CL semi-final return fixture against Real Madrid instead. The tweet was fairly controversial, especially among City’s fan base, and gave me a lot to think about. So, as I like to do, I thought about it from a utilitarian perspective and tried to game out the expected value of each choice.

The idea behind rotating before a big game is that you can increase your chances of winning the big game while diminishing your likelihood of winning the rotation game and diminishing your chances of obtaining a given league position. For Manchester City, they are currently in a fight for fourth place with Manchester United (and to a lesser extent after this weekend’s fixtures, Arsenal).

The first step is to think about which is more important to Manchester City fans: winning the Champions League semi-final (and possibly the entire tournament), or getting Arsene Wenger’s famous “fourth trophy” and ensuring Champions League football next year. I can see arguments for both, and despite the mocking of Wenger’s qualifying record, as a Milan fan I know the pain of missing out on Champions League football after you’ve become accustomed to it.

However, the expected happiness from advancing to the finals vs. securing 4th place is mitigated by a pretty significant factor: the probability of winning the semi-final match with a full strength squad, which leads us to the following equation.

Expected Utility(Rotation) = Pr(Advance to CL Finals(Rotation)) * (Value of Advancing to CL Finals) - Pr(Miss CL Next Year(Rotation)) * (Pain of Missing CL Next Year)

Manchester City’s expected utility (“good”) from rotating the squad is basically calculated by how much value they get from advancing to the Champions League finals[1] multiplied by their probability of advancing to the finals. Then you subtract the probability that the rotated squad causes them to miss the CL next year multiplied by the pain of missing out. In short: the biggest driver of value here is whether Manchester City fans think they can beat Madrid given a 0-0 draw in the home leg. If you don’t think this outcome is pretty likely, then the first half of the equation approaches zero, meaning that the pain of missing next year hurts more than any potential pleasure gained from rotation. In this case, it doesn’t make sense to rotate the squad.

However, if you assign a high probability to winning the semi-final at the Bernabeu then the first half of the equation becomes higher, meaning that the potential pain of missing next year is less significant. In this case, it makes perfect sense to rotate.

But this isn’t the only factor. There’s a second equation at play here, which I present now:

Expected Utility(Full Strength) = Pr(Advance to CL Finals(Full Strength)) * (Value of Advancing to CL Finals) - Pr(Miss CL Next Year(Full Strength)) * (Pain of Missing CL Next Year)

This represents the expected utility gained from playing a full strength squad. The equation is largely the same, but the values change because Manchester City played a full strength squad on the weekend. Presumably their likelihood of winning mid-week decreases because of fatigue (and potential injuries), while their likelihood of securing Champions League football next year increases because they have a greater likelihood of getting what would have been a crucial three points against Southampton.

If you’re a Manchester City supporter and believe that the odds of beating Madrid are low, then your values likely don’t change for the first half of the equation, while the penalty in the second half shrinks because a full strength squad makes missing the Champions League less likely. In this case, you want a full strength squad during the weekend.

If you’re a Manchester City supporter and you believe that a fresh squad will beat Madrid while a fatigued squad will lose, then your values for the first half of the equation are lower than they were previously. This lowers your expected value in a significant way, meaning you want a rotated squad over the weekend.

The final decision is calculated by which equation gives you a higher expected utility: which version makes you happier? Ultimately the question depends on two major factors: how likely you think Manchester City is to beat Madrid on the road, and how much pain you’ll feel if they fail to qualify next year. If you don’t have faith that they can pull off an upset mid-week, then you’ll oppose rotation and prioritize the league. If you believe there’s a chance, then you’ll support rotation and going all-in for this year’s Champions League.
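To show how the two equations trade off, here is a minimal sketch that plugs numbers into both and picks the higher expected utility. Every probability and value below is a hypothetical belief I made up for two imaginary fans, not an estimate from data, and the function names are mine.

```python
def expected_utility(p_advance, value_advance, p_miss_cl, pain_miss_cl):
    """Pr(advance) * value of advancing minus Pr(miss CL next year) * pain of missing."""
    return p_advance * value_advance - p_miss_cl * pain_miss_cl


def compare_lineups(rotation, full_strength, value_advance, pain_miss_cl):
    """Return (preferred choice, EU of rotation, EU of full strength)."""
    eu_rotation = expected_utility(rotation["p_advance"], value_advance,
                                   rotation["p_miss_cl"], pain_miss_cl)
    eu_full = expected_utility(full_strength["p_advance"], value_advance,
                               full_strength["p_miss_cl"], pain_miss_cl)
    choice = "rotate" if eu_rotation > eu_full else "full strength"
    return choice, eu_rotation, eu_full


# Hypothetical beliefs: the optimist thinks a fresh squad has a real chance in Madrid.
optimist = compare_lineups({"p_advance": 0.40, "p_miss_cl": 0.30},
                           {"p_advance": 0.25, "p_miss_cl": 0.20},
                           value_advance=10, pain_miss_cl=8)
# The pessimist thinks City will probably lose in Madrid either way.
pessimist = compare_lineups({"p_advance": 0.10, "p_miss_cl": 0.30},
                            {"p_advance": 0.08, "p_miss_cl": 0.20},
                            value_advance=10, pain_miss_cl=8)
print(optimist[0], pessimist[0])
```

With these made-up inputs the optimist prefers rotation and the pessimist prefers the full strength squad, which is the split in the fan base described above.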

Part 2:  Pellegrini’s Lame Duck Status

Normally we can roughly argue a manager’s incentives are aligned with his team’s and the fans’. However, Manchester City have done something strange this year, announcing Pep Guardiola will be the new manager of Manchester City regardless of what Manuel Pellegrini does this year. This introduces a new wrinkle, one that I think fully explains why he did what he did. I want to return to the expected utility equation from earlier, because the logic is the same while the values are different given Pellegrini’s unusual incentives here.

Expected Utility(Rotation) = Pr(Advance to CL Finals(Rotation)) * (Value of Advancing to CL Finals) - Pr(Miss CL Next Year(Rotation)) * (Pain of Missing CL Next Year)

Because Pellegrini is a lame duck manager with zero interest in what happens to Manchester City next year, he experiences literally zero pain from Manchester City missing out on the Champions League next year. Pep Guardiola gets all the benefits if he qualifies, and Pep gets all the pain from missing out if he doesn’t. The second half of this equation is literally zero, so it becomes completely irrelevant to our calculations. So when we combine the two equations from earlier, we get the following:

Expected Utility(Pellegrini) = Pr(Advance to CL Finals(Rotation)) * (Value of Advancing to CL Finals) - Pr(Advance to CL Finals(Full Strength)) * (Value of Advancing to CL Finals)

Because the value of advancing to the Champions League Finals is the same for Pellegrini in both cases, we can cancel that term out and we’re left with the following:

Expected Utility(Pellegrini) = Pr(Advance to CL Finals(Rotation)) - Pr(Advance to CL Finals(Full Strength))

Even if the probability is virtually zero in both circumstances, and even if the value of rotation is virtually zero, Pellegrini strongly prefers[2] rotating the squad over the weekend to maximize his probability of winning the Champions League, something that could presumably bolster his CV and improve the contract at his next job. Manuel Pellegrini has literally no reason to not rotate the squad, even if he sees virtually no value in it.
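A quick sketch of the lame duck logic: once the pain of missing next year’s Champions League is set to zero, the gain from rotating depends only on the difference between the two probabilities of advancing. The numbers are hypothetical and only there to show that a tiny edge is enough for Pellegrini, while a fan who does feel that pain can reach the opposite conclusion from the same probabilities.

```python
def gain_from_rotation(p_adv_rot, p_adv_full, p_miss_rot, p_miss_full, value, pain):
    """Expected utility of rotating minus expected utility of full strength."""
    eu_rotation = p_adv_rot * value - p_miss_rot * pain
    eu_full = p_adv_full * value - p_miss_full * pain
    return eu_rotation - eu_full


# Hypothetical probabilities: rotation adds one percentage point in Madrid
# but makes missing the top four far more likely.
probs = dict(p_adv_rot=0.02, p_adv_full=0.01, p_miss_rot=0.9, p_miss_full=0.1)

# Pellegrini: pain of missing next year's CL is zero, so only the first term matters.
pellegrini = gain_from_rotation(**probs, value=1.0, pain=0.0)
# A fan who does feel that pain (value and pain are made-up weights).
fan = gain_from_rotation(**probs, value=1.0, pain=5.0)

print(pellegrini > 0, fan > 0)
```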

The Manchester City case described here is a relatively unusual one, which is why it’s interesting to me. The conflict between a manager’s incentives and the fans’ incentives, plus the reasonably different incentives among fans, makes this a difficult case to think about, one worth exploring more, and one that makes for a lively discussion.






  1. There’s another level of value here from what they get from winning, but I’m ignoring that for simplicity
  2. While this works in the common vernacular, I mean this in the game theoretical sense where there are no circumstances where the second term is greater than the first

Preliminary Evidence on How Defensive Pressure Affects xG: Data from NWSL

If you follow me on Twitter, you’ve seen that I’ve been posting some preliminary Expected Goal (xG) data from the 2015 NWSL season. These data aren’t publicly available, so I’ve been collecting them by hand. To do this, I’ve been going on YouTube, watching every shot from every game, and coding a number of variables for each shot. As of today I’m up to around 500 shots, and have built an xG model based on these shots which I will detail in another post when it’s done.

One thing I’ve added to typical xG models is a variable “whether a shooter was under pressure” – right now this is defined as whether the nearest defender was within a half yard of the shooter at the time of the shot. I was surprised to find that in my model, it didn’t reach statistical significance, both because it’s been considered an issue with typical xG models for a while now and because theoretically it makes sense that defensive pressure would lower the probability of scoring. So I did what any responsible analyst would do and plotted my data.

[Figure: NWSL xG Distance vs Pressure]

I plotted xG values (derived from my model) as a function of distance from the goal. The red line is when the shooter isn’t pressured (the nearest relevant defender isn’t within 0.5 yards), and the red shaded area is the 95% confidence interval. The blue line is when the shooter is under pressure (the nearest relevant defender is within 0.5 yards), and the blue shaded area is the 95% confidence interval for that estimate. As you can see, there’s a significant overlap between the two lines – from about 15 yards out not only do the 95% confidence intervals overlap, but the lines are almost identical. However, for closer shots there is a difference, highlighted in the graph below.

[Figure: NWSL xG Distance vs Pressure, with the distinguishable zone shaded]

I’ve added a shaded area where the two lines are significantly different from each other. Basically, you’re looking at the area where each line falls outside the other line’s shaded area to see where they are distinguishable from each other, which goes from about 3 yards to 13 yards away from goal. That is the zone where defensive pressure matters: anything between 3 yards from goal and the penalty spot is less likely to score if a defender is close, while for anything further out defensive pressure is irrelevant. If I had to come up with a post hoc explanation, presumably shots from further out are difficult regardless of whether you have a defender in the way, so distance is the limiting factor.
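The visual rule described above can be written down in a few lines. This is only a sketch of the eyeball check, not a formal significance test, and the curve values are made up rather than read off my plots; the function name `distinguishable_distances` is hypothetical.

```python
def distinguishable_distances(distances, unpressured, pressured):
    """Return distances where each estimate lies outside the other curve's interval.

    Each curve is a list of (estimate, ci_low, ci_high) aligned with `distances`.
    """
    zone = []
    for d, (est_u, lo_u, hi_u), (est_p, lo_p, hi_p) in zip(distances, unpressured, pressured):
        unpressured_outside = est_u < lo_p or est_u > hi_p
        pressured_outside = est_p < lo_u or est_p > hi_u
        if unpressured_outside and pressured_outside:
            zone.append(d)
    return zone


# Made-up curve values at three distances (yards), for illustration only.
distances = [2, 8, 20]
unpressured = [(0.60, 0.45, 0.75), (0.30, 0.26, 0.34), (0.05, 0.03, 0.07)]
pressured = [(0.55, 0.40, 0.70), (0.20, 0.16, 0.24), (0.05, 0.03, 0.07)]
zone = distinguishable_distances(distances, unpressured, pressured)
print(zone)
```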

This is obviously limited to NWSL, and we may see differences for men vs. women here, but it’s a potentially interesting development given the paucity of defensive position data, and it’s an important methodological lesson to go beyond “star-gazing” and to look at the relationship between your variables. Defensive positioning matters most when the shooter is close to goal, even when controlling for a number of other factors (head v. foot, angle, etc.).

 





Statistical Modeling, Knowing Your Limitations, and Some Reflections

I tweeted today about the difference between good and bad models, and the importance of recognizing that fancy stats aren’t necessarily good stats. I wanted to write a little bit about that, post some thoughts about understanding the limitations of models, and reflect on my own model’s successes and misses.

With the semi-regular attacks on statistical models in the media, I see the wagons circle on Analytics Twitter™ and understandably so. People work hard on their models, and as far as I can tell the vast majority of people do it for little or no reward beyond Twitter likes and an increased following. When they feel that work (or the similar work of others) is diminished as done by weaklings in air-conditioned rooms, it’s easy to feel attacked. That becomes a vicious cycle though, where we end up trusting “stats” in general over anything else, regardless of the method. So when we have a dozen or more prediction models out there, all coming to different conclusions, it’s important to evaluate those models to see which one is closest to the truth.

Everyone is entitled to use their own criteria, and really learning how to do it involves more statistical training than I can do in a simple blog post. For me, I don’t trust any model that doesn’t post their methods and predictions publicly. There can be some proprietary elements, but if I don’t have enough info to evaluate how someone came to their conclusions I discount that model’s conclusions heavily.[1] I don’t necessarily like models that include salary data because it over-values teams like Manchester City who overpay for players, and makes some general assumptions that higher paid players are better. More than anything, I want to see public predictions and some sort of validation of those predictions. Let me see how well your model does, let me see where it succeeds and where it misses, and hopefully learn from that. Again, people can post what they choose, but I try to be as transparent as possible and I think people have responded to that.

I’ve been fortunate to gain a lot of followers very quickly – this was just supposed to be a fun project for me, a way to learn some machine learning techniques, and maybe people would enjoy it. To have picked up 2100 followers in ~6 months is beyond anything I thought would happen, and I couldn’t be more appreciative of all the people who read and share my work. I’m glad people find what I do interesting, and hope to continue over this year. I have needed to adjust my thinking accordingly, and wanted to post those thoughts/concerns publicly so people could evaluate them and my project accordingly.

My goal isn’t to call out any model or stat specifically, so I want to talk about my model for a minute. I don’t do so out of narcissism, and will focus on the “growth opportunities” as much as I do the successes. You can learn both from being right and being wrong, but in many cases the opportunities to improve are in fixing bad predictions rather than congratulating yourself for your correct ones.

I’ve posted this before, but the original goal of my model was to quantify the contribution of individual players. I experimented with some “Points Above Replacement” metrics, but got some pushback so I put that on the backburner to validate the model before I could confidently assert its value. So I decided to let my pre-season predictions run the entire season and to see how well they do. As of last week I’m leading the 90+ entrants in Scoreboard Journalism’s prediction contest, and was the closest to identifying the black swan that is Leicester City by tapping them for 8th place with 60 points. I’m pleased overall with the model’s results, but I wanted to mention a few caveats and some cautionary notes on my predictions, and to be as transparent with my thoughts as I can so people who follow me can better understand what I’m doing. As a reference, here’s this week’s predicted probabilities of each team’s final table position.

[Figure: Week 26-2 Heat Map]

Arsenal leads the pack with a ~75% probability of finishing first. Leicester City is in second with a ~15% chance, and Spurs and City both have a ~5% chance of winning the league. Arsenal fans should be happy with this, but there are some caveats here. Here is my diagnostic plot, showing MOTSON’s predicted points vs. the actual points earned.

[Figure: Week 26 Expected v Actual]

Arsenal, United, Southampton, and Man City are all basically on the regression line, which means MOTSON has predicted their points almost perfectly through 26 weeks. They’ve all hovered around that line for the first 26 weeks – United was around +5 or so for a while, but quickly slipped back to the mean as their form slipped into what it’s been recently. Southampton was around -5, but has improved in recent weeks. Otherwise they’ve all been fairly close to expectations all season. Some temporary deviations are to be expected, so what this means is that my model has a really good handle on exactly how good Arsenal, United, Southampton, and City are. When the title race seemed like a two team race between Arsenal and City, I was very confident in how highly my model rated Arsenal (even when the rest of the world was picking City – a pick that seems to have been validated recently).

My model has done better with Leicester than anyone else’s, but it still underestimates their ability by a significant amount. How much, I’m still not entirely sure. It did like Arsenal to beat them at home, which happened, but it also liked City to beat them, which didn’t happen. To be fair, the simple in-season results model liked City in that match as well so it may have just been an upset, but it’s hard to tell. Regardless, Leicester’s overperformance means the model likely underestimates their “true” ability, which means their predicted likelihood of winning the title is understated. How much? I’m not sure, but I am personally confident the number is higher than 15%.

The same thing goes for Spurs: they’re in the middle of a special season where they’re out-performing expectations. They’re not doing it as much as Leicester obviously, but MOTSON really seems to have underrated them. So their number is probably higher than the 5% chance they’re being given right now, but again, I’m not sure by how much.

I’m torn on whether I want to keep presenting the model’s predictions as is, knowing that the percentages are skewed in Arsenal’s favor. For me, it’s an academic exercise, but it’s taken on more of a following than I anticipated so I wanted to be transparent with what I think is going on with the model. I’m not altering it from the pre-season, so it comes with certain assumptions. Primarily, that Leicester City will play like the 8th best team in the EPL instead of a top 3 team, and that Spurs are a top 6 team but not a top 3 team. Those affect the model’s predictions, and it’s become particularly relevant the last couple of weeks so it’s something people should be aware of when they evaluate my model (and everyone else’s).

I’m confident Arsenal will end the season right around 75 points. I’m not as confident that Leicester will end at 71 (where I’m currently predicting them) or that Spurs will only have 68. The model is presented as such because for this type of work you don’t update just because you want to. It’s not necessarily a bad thing because it eliminates recency bias. If I had updated on recent form, I’d have put West Ham in the top 4 early in the year, dropped Southampton into the bottom half of the table six weeks ago, and would have handed City the title at least a half dozen times (like many other modelers did by the way). Those would have been big mistakes, and would have happened because I trusted my own (flawed) judgment instead of the model’s. Trust in the numbers, but be aware of their limitations. This applies to any statistics you read, including, but not limited to, mine. I’m just more transparent about it than others.

 






  1. Personally I think people overvalue the proprietary nature of their models – if you’re truly good at statistical modeling you should  be able to just outperform other people regardless of how much you share. I would also never pay anyone who isn’t transparent with how they do things, but who am I to tell people how to earn or spend their money?

Is Petr Cech Worth 15 Points? A rough, back of the envelope calculation

A note before I start this post: all of these calculations are really rough, back of the envelope sorts of calculations. The method is sound, but there are some issues with data that I will identify in italics throughout. The end results don’t change the answer that Cech is likely not worth 15 points as John Terry now famously said, but this sets up what I think is a decent first cut at assigning an actual point value to goalkeepers. Hopefully others can fix the bad assumptions to tighten up the actual values a little.

John Terry famously said that Petr Cech would be worth 12-15 points for Arsenal, and we’ve seen several variations of this theme in the media with various goalkeepers (Lloris, De Gea, etc.) but no one really knows how to evaluate the actual value of a keeper to a team.[1] Using data from Paul Riley (@footballfactman on Twitter) and my recent analysis of Michael Caley’s xG data against MOTSON’s predictions I think we can at least make a first cut here that makes sense and comes up with what I think are reasonable results.

Step #1: Regress MOTSON’s Expected Points against xGD

In my previous post, I showed that MOTSON predicts expected goals very well. So I did a quick, bivariate regression with Expected Points as my DV and xGD[2] as my IV. From this I get a model of:

ExpPoints = 26.73 + 0.6723 * xGD

What this means is (as of today), each team starts with a baseline of 26.73 points, and then for every additional xGD you add 0.6723 to that value. Simple arithmetic can calculate each team’s expected points from here.

Quick diagnostics: The predictions from this model correlate with my actual expected points at 0.83, and correlate with teams’ actual earned points at 0.77. These are very high numbers, certainly high enough to continue with the analysis.

So each xGD is worth 0.6723 points in the league table. Next, I look at Paul Riley’s data showing all EPL keepers with more than 350 saves in the last 5 1/2 seasons. The data show Expected Goals Allowed (ExpGA) vs. Actual Goals Allowed, so I use this to calculate a goal differential for each goalkeeper over the 5 1/2 season time period, multiply this number by the regression coefficient calculated earlier (0.6723), and then divide by 5.5 to calculate a “per season” score.
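The arithmetic above is simple enough to sketch in a few lines. The coefficients come from the regression in Step #1 and the three keepers’ numbers come from the raw data table at the end of this post; the function names are just ones I picked for illustration.

```python
BASELINE_POINTS = 26.73   # intercept from the Expected Points regression
POINTS_PER_XGD = 0.6723   # slope from the Expected Points regression
SEASONS = 5.5             # assumes every keeper played the full 5 1/2 seasons


def expected_points(xgd):
    """Team expected points from expected goal difference."""
    return BASELINE_POINTS + POINTS_PER_XGD * xgd


def keeper_points_per_season(expected_ga, actual_ga, seasons=SEASONS):
    """Goals saved relative to expectation, converted to league points per season."""
    goals_saved = expected_ga - actual_ga
    return goals_saved * POINTS_PER_XGD / seasons


# (Expected Goals Allowed, Actual Goals Allowed) taken from the table in this post.
keepers = {"Cech": (166.8, 150), "Adrian": (108.63, 89), "Krul": (241.88, 264)}
for name, (exp_ga, act_ga) in keepers.items():
    print(name, round(keeper_points_per_season(exp_ga, act_ga), 2))
```

If you know how many seasons a keeper actually played, pass that as `seasons` instead of the default 5.5.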

There are two assumptions implicit here: first is that each goalkeeper played a full 5 1/2 seasons in the EPL, which is a bad assumption. I didn’t want to look at each keeper’s history for what is supposed to be a quick blog post, but if you did you could easily just change the last number to whatever the number of seasons someone played is.

The second is that goalkeepers are 100% responsible for the difference between ExpGA vs. Actual GA, and have 0% responsibility for scoring goals. This seems relatively reasonable, or at the very least the error terms are random between all the keepers in this model. There might be great counter-attack starting goalkeepers in the world like Manuel Neuer who deserve some credit for goals scored, but even in that case I’d imagine the number is small, at most.

Below is a graph of all the goalkeepers and their per season point values, and the table with the raw data is located at the end of this post.

[Figure: Goalkeepers]

The top goalkeeper in the model is Adrian, and according to this method he’s worth about 2.4 points per year over a “neutral” goalkeeper, and a little more than 5 points per year over the lowest scoring keeper in the model. Petr Cech comes in 4th at 2.05 points per year, which is really good for a goalkeeper, but is well short of the 15 points John Terry asserted.

Like I said, this is a first cut at Expected Points for goalkeepers, but presents a way of quantifying their major contribution. Future work on the topic needs to look at error rates around xG calculations, sample size issues, and some other measurement issues with the model inputs, but overall I think it’s a really good first cut. Big thanks to Paul Riley for posting his data publicly and making this post possible and I’m looking forward to seeing how people can build on this.

Name           Expected Goals Allowed   Actual Goals Allowed   Goal Difference   Points per Season
De Gea                         168.09                    149             19.09                2.33
Hart                           187.01                    170             17.01                2.08
Cech                           166.8                     150             16.8                 2.05
Adrian                         108.63                     89             19.63                2.4
Begovic                        207.14                    197             10.14                1.24
Foster                         225.84                    211             14.84                1.18
Vorm                           128.26                    119              9.26                1.13
Schwarzer                      144.77                    139              5.77                0.70
Mignolet                       226.63                    221              5.63                0.69
Reina                          111.18                    107              4.18                0.51
Jaaskelainen                   169.61                    167              2.61                0.32
Lloris                         136.7                     136              0.7                 0.09
Howard                         217.39                    219             -1.61               -0.20
Szczesny                       147.01                    150             -2.99               -0.37
Friedel                        112.88                    116             -3.12               -0.38
Hennessey                      119                       124             -5                  -0.61
Al Habsi                       158.87                    165             -6.13               -0.75
Green                          146.87                    153             -6.13               -0.75
Ruddy                          161.94                    171             -9.06               -1.11
Guzan                          192.78                    207            -14.22               -1.74
Krul                           241.88                    264            -22.12               -2.70

 

  1. Goalimpact assigns values to keepers,  but they’re notoriously even more difficult to quantify than defenders.
  2. Expected Goal Difference

MOTSON Predicts The EPL Table Well, But Predicts xG Better

I’ve been focusing on diagnostics with MOTSON lately, but one thing I haven’t thought about is comparing the model’s predictions to some of the underlying statistics measures. The model has done well, but how much of the prediction error comes from, for lack of a better word, “error”, and how much of it comes from random variation? So I wanted to test this with the gold standard in soccer analytics: xG.

As a refresher, my most recent model’s expected points correlates with actual points at 0.61, which is pretty solid but not spectacular. Chelsea and Leicester City are pretty big swings and misses for the model right now, dragging the correlation down from about 0.8, which would be solid and spectacular.  Here’s the scatterplot of model fit with a regression line of b = 1.0 thrown in for good measure.

[Figure: Predicted v Actual Points]

I would imagine all my readers know this, but for those who don’t the quick version of xG is it’s a measure of shot quality. How many goals would you expect a team to have scored given the quality and number of their shots? But we know there isn’t a perfect correlation between the two measures, and variance causes teams to score either more or fewer goals than the xG measure would predict.

I downloaded xG data from Michael Caley’s fancy stats site (@MC_of_A on Twitter), which included both expected goals for and against for each team in the EPL. I merged in my “expected points” data and the actual points each EPL team has earned, and created a new variable with the difference between each team’s xG and the actual number of non-penalty goals (NPG). This is what I used for my analysis.

Question #1: Is MOTSON successful in predicting the underlying stats? Specifically, how well do MOTSON’s predictions match up with xG?

I correlated the xGD[1] measure with my expected points. We know Goal Difference is a great predictor of table position (correlating at 0.92 as of this post), so I want to see if MOTSON’s expected points correlate with Expected Goals. I ran a quick bivariate correlation, and got a Pearson’s r value of 0.83, very high by any account.

[Figure: xGD v MOTSON Expected Points]

MOTSON passes this test, showing a high correlation between expected points and expected goal difference.

Question #2: How much of MOTSON’s error is explained by variation from underlying statistics? Specifically, do xG and MOTSON share the same misses?

For this, I correlated the xG residual measure (calculated as NPGD – xGD) with MOTSON’s residuals (calculated as Expected Points – Actual Points).[2] If xG and MOTSON’s residuals are highly correlated, it could be evidence that the teams MOTSON over/underpredicts are over/underperforming compared to expectations and the underlying statistics the soccer analytics community looks at. The overall correlation here was almost as strong, with a Pearson’s r of 0.74.

xGD v MOTSON Residuals

MOTSON’s biggest outliers this season are Chelsea, Leicester City, and Aston Villa. The first two would have been almost impossible to pick pre-season, but do models using current season data do better on them? It turns out those three are fairly problematic for xG models as well. MOTSON over-estimates Chelsea’s points by 16, but their xGD is actually positive at 4.8. This xGD score compared to their NPGD of -9 shows that even a model using in-season statistics has trouble with Chelsea’s performance this year. Similarly, Aston Villa underperforms for MOTSON by 10.71 points, and underperforms xG by 7.4 goals. Leicester fares better using in-season stats, overperforming by 13.28 points for MOTSON but only off by 3.5 xG.
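As a worked instance of the xG residual defined earlier (NPGD – xGD), plugging in the Chelsea figures quoted above gives the following. The inputs are the rounded numbers from the text, so treat the result as approximate.

```latex
\text{xG residual}_{\text{Chelsea}}
  = \mathrm{NPGD} - \mathrm{xGD}
  = -9 - 4.8
  = -13.8
```

In other words, Chelsea have a non-penalty goal difference roughly 14 goals worse than their shot quality would suggest.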

MOTSON has done well for me this season, but it’s interesting that it does a better job predicting xG than it does actual points. This makes me more confident that the model is onto something, and that it figured a lot of it out in the pre-season without any current season data.

  1. Expected Goal Difference, or Expected Goals Scored – Expected Goals Allowed
  2. This isn’t the right statistical use of the word residual, but it sounds nicer than error so I went with it.

Why Rory Smith Hates Numbers Wizards, or How Can We Quantify Fun?

Let me preface this post by saying that I understand this type of question is exactly why pundits mock analytics. The idea of assigning a number to how fun a match was to watch seems silly and to be quite the opposite of fun. But I’m writing it anyway because I have a very different definition of fun than normal people do, or because I think this actually serves a real purpose that real people actually care about.

Imagine you’re in charge of sports programming at a major TV network, and your network just bought the rights to a new foreign soccer league or competition. For bonus points, imagine this is in a country without a big history with the sport (like America), and it’s your job to develop this contract into a major money maker for the network. How are you going to do that?

NBC does a great job with soccer here in America: they broadcast all EPL games either on one of their networks or on their live streaming app, which is far better than any other sport in America and from what I understand is better than what my English followers get.  If you want to watch a game, you can do it. But my only issue with their coverage 1 is the over-focus on a few teams. If Chelsea or Manchester United are playing, you can guarantee NBC will broadcast their games regardless of what else is going on, or if there are better games to watch. But there has to be a better way!2

So that leads me to the research design I’m going to propose here: the correlates of fun. Today’s Everton v. Tottenham Hotspur game was a lot of fun, especially the last 15-20 minutes or so where it was a seemingly never-ending series of counterattacks leading to goal-scoring opportunities. Even the commentators were visibly disappointed when it ended, hoping for just one more minute. The only reason this game aired on TV was because it had no competition in the 12:30 (Eastern Time) slot, and we could have easily missed it. Could NBC have predicted this, or more importantly, could an appropriately air-conditioned analyst have predicted it and advised executives to show this game over some other choices if it had competition?

I’m going to propose a series of measures that I know are available to folks with premium data subscriptions that I think correlate with what people think a “good” game is. This might not be exhaustive, and I’m open to disagreement, but I mean this more as a research design than a finalized answer to the question.

  1. Goals
    • More goals isn’t always a good thing, but on average a 2-1 game is going to be more interesting than a 0-0 draw.
  2. A competitive game
    • A close final score is more compelling because any moment could change the outcome of the game, and because it correlates with any number of other things that are interesting (a team with a 3-0 lead is probably more interested in killing off the game than scoring a 4th goal). Plus there’s the dramatic element of not knowing what the outcome will be.
  3. Fast average player speed
    • I saw this on the Valencia/Madrid game – they had real-time data on how fast each player ran during the game. A series of sprints probably means the end-to-end type of action that people like rather than aimless passing around the midfield.
  4. Large average distances run by players
    • This may be less direct than the previous one, but it seems like if players are covering more distance that means the ball is moving around more, which probably means there’s more action.
  5. Low numbers of fouls
    • Fouls can act as a heat check on a good game, slowing it down, ending interesting passages of play. Even free kicks from dangerous areas are unlikely to go in, but take seemingly forever to set up before someone kicks it over the goal and into the 20th row.

All of these things can be modeled/predicted ahead of time (there’s no reason one couldn’t apply @DoctorFootie’s method to these questions), and then use the model to pick which game is featured on television so fans have the best chance of an optimal viewing experience, increasing the likelihood of fans returning next week for another exciting game.
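To make the idea concrete, here is a rough sketch of how predicted values for the five measures above could be combined into a single score used to pick the featured game. Everything in it is hypothetical: the class, the field names, the weights, and the two fixtures are placeholders, and in practice the weights would have to be estimated from data rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class FixtureForecast:
    """Pre-match predictions for one fixture (all values hypothetical)."""
    name: str
    goals: float         # predicted total goals
    margin: float        # predicted absolute goal difference
    avg_speed: float     # predicted average player speed, km/h
    avg_distance: float  # predicted average distance run per player, km
    fouls: float         # predicted total fouls


# Illustrative weights only: positive for things that should add fun,
# negative for lopsided scores and fouls.
WEIGHTS = {
    "goals": 1.0,
    "margin": -1.0,
    "avg_speed": 0.5,
    "avg_distance": 0.3,
    "fouls": -0.1,
}


def fun_score(fixture):
    """Weighted sum of the predicted measures."""
    return sum(w * getattr(fixture, field) for field, w in WEIGHTS.items())


def pick_featured(fixtures):
    """Return the fixture with the highest predicted fun score."""
    return max(fixtures, key=fun_score)


fixtures = [
    FixtureForecast("Fixture A", goals=3.1, margin=0.4,
                    avg_speed=7.2, avg_distance=10.8, fouls=20.0),
    FixtureForecast("Fixture B", goals=2.0, margin=1.5,
                    avg_speed=6.8, avg_distance=10.1, fouls=27.0),
]

print(pick_featured(fixtures).name)
```

A plain weighted sum is the simplest possible choice; the measures are on different scales, so a real version would probably standardize them first.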

A second, probably more scientific (but more difficult) option would be to do a survey. There are a couple of options here. The first is that you could sit people down in a room, have them watch a game, and then ask them to rate it on an entertainment scale of 1-10. Then you mine stats from the games, correlate them with ratings, and figure out what people are looking for. In a big enough sample, you should find some key aspects of games that people find entertaining and use that to predict which game will be the best.

The second option along these lines would be to dial-test games the way political focus groups do. Give people a dial that they turn to the right if they see something they enjoy and to the left if they see something they don’t enjoy. That would increase your effective sample size because you’d have a large number of events in a given game rather than a single game as your level of analysis, and would also give you more precision in understanding exactly what sorts of things people like in games. More granular data is better in this case, and they already do this for new shows so why not for soccer? Then you’d find the events people enjoyed in individual games, and show games that are likely to have more of those types of events.
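As a sketch of what the event-level analysis might look like, the snippet below averages dial readings by event type and ranks the event types. The event labels, the -1.0 to 1.0 dial scale, and the readings themselves are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_dial_by_event(readings):
    """Average dial reading per event type.

    `readings` is an iterable of (event_type, dial_value) pairs, where
    dial_value runs from -1.0 (dial fully left) to 1.0 (fully right).
    """
    by_event = defaultdict(list)
    for event_type, value in readings:
        by_event[event_type].append(value)
    return {event: mean(values) for event, values in by_event.items()}


# Hypothetical readings pooled across viewers and games.
readings = [
    ("counterattack", 0.8),
    ("counterattack", 0.6),
    ("foul", -0.4),
    ("back_pass", -0.2),
    ("foul", -0.6),
]

scores = mean_dial_by_event(readings)
# Event types ordered from most to least enjoyed.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The ranked list is what you would feed back into a prediction model: show the games expected to contain more of the events at the top of it.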

I’d like to see more games like Everton v. Spurs today, and fewer games where Man United drags the game to a crawl with a series of sideways and backwards passes. Presumably the networks would want to show more of the former too. Better use of data could help networks make better decisions, both helping the on-air product and the league. This may not be as important for the EPL where ratings are high and NBC offers real-time access to all 380 individual games, but I can picture this being even more important for a league like MLS with a smaller fanbase, no real “big clubs”, and a need to improve the perception of the product. Data helps you make better decisions, and anything you can observe can be quantified as long as you look in the right places.

  1. I actually don’t like their “Breakaway” days where they switch from match to match whenever a team makes a good attacking move, but everyone else seems to like it and I’ll just switch to my Roku to pick my favorite game that day
  2. Obviously there are commercial appeals at play here: I know Man United has a huge fanbase here in America, and I’m assuming Chelsea must too because NBC has kept showing their games long after they stopped being competitive for the title. There are also “big game” considerations which I’m going to ignore for the purposes of this post. The Manchester Derby earlier this season was a colossal bore, but I get that it’s a game between two title contenders and a huge rivalry. That sort of thing trumps “fun” soccer for me and I’m willing to accept arguments that these types of big games need to be the main option regardless of how boring they often are, at least in the EPL.

EPL Power Rankings: 2015-2016 Performance Adjusted for Strength of Schedule

I’ve written previously about how MOTSON relies on last season’s results for its predictions, but I wanted to do something with this season’s data now that we’re at the halfway point.

The model is simple: it’s a generalized partial credit model (GPCM), which is basically the same model they use for standardized tests like the SAT or GRE. I’ve written in more detail about the approach in my initial post for my blog, but the basic idea is that I treat each team as a person taking a test, and each home game as a question on the test. If you win the game, you get full credit, if you draw you get partial credit, and if you lose you get no credit. GPCM models are good because they are agnostic as to history, payroll, Big Club™ status, or any of the other things that confuse regular human brains. All they know are the results that have occurred this season between the teams that have played against each other.

These models also do well with missing data, so in a half-season each team has played approximately half the other teams at home. GPCM models fill in these gaps, adjusting each team’s strength against the difficulty of the fixture.
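For readers who want the formula, the standard textbook form of the GPCM can be mapped onto this setup roughly as follows. The notation is assumed here and may not match the exact parameterization used to produce the rankings: \theta_h is the home team’s strength, b_{a1} and b_{a2} are the two step thresholds for the visiting team a (loss to draw, draw to win), \alpha_a is a discrimination parameter for that fixture, and k = 0, 1, 2 stands for a home loss, draw, or win.

```latex
P(X_{ha} = k \mid \theta_h)
  = \frac{\exp\!\left(\sum_{j=1}^{k} \alpha_a\,(\theta_h - b_{aj})\right)}
         {\sum_{m=0}^{2} \exp\!\left(\sum_{j=1}^{m} \alpha_a\,(\theta_h - b_{aj})\right)},
  \qquad k \in \{0, 1, 2\},
```

where the empty sum for k = 0 (or m = 0) is defined to be zero.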

So based on home results, here is each team’s “Strength” coefficient.

Week 19 Power Rankings

Arsenal is head and shoulders above the rest of the league, dominating at home, well above #2 Leicester City. The odd results here are Crystal Palace, who is far under-performing at home, and Swansea, who is performing at home far above their position in the table. Their positions are basically reversed, which brings me to the second part of the equation: difficulty to beat on the road.

The strength coefficient in the previous graph shows how strong each team has been at home, or how good they are at beating visiting teams. As stated earlier, this is the equivalent of the score a test-taker would earn. Now we turn to the difficulty of the question being asked, or the strength of teams on the road.

Week 19 Away Fixture Difficulty

The dark red point in this graph represents the strength coefficient needed for a 50% probability of the home team winning a game against the opposition (answering the question “correctly”). The blue point represents the minimum strength coefficient needed to secure a 50% probability of a draw (earning “partial credit”). Points to the left of the draw zone mean a higher likelihood of losing, points to the right of the draw zone mean a higher likelihood of winning based on results so far this season.

Interestingly, title favorites Arsenal and Manchester City are near the center of this table, seemingly weak on the road. However, when you compare their away difficulty coefficient to the home coefficients, only six teams (excluding themselves) would have a strong chance against them. This feels about right to me – the top 6 teams should have a good chance of beating title contenders at home, but beyond that it should be much more difficult.

On the other side, I’ve been skeptical until now, but Chelsea look to be at least a semi-legitimate relegation contender right now. They are the 6th weakest home team right now (a far cry from the undefeated season at Stamford Bridge last year), and everyone except for Aston Villa would be favored to take at least a point off of them at home. I knew they weren’t as good as MOTSON says, but this model has them in serious trouble unless things turn around. Maybe not relegation-level, but bottom 5 wouldn’t be surprising based on the eyeball test here.

Disclaimer: while these coefficients are accurate given results so far, we’re obviously at a very small sample size with a substantial amount of missing data that will be filled in over the next 19 weeks so these numbers could possibly change quite a bit. However, there’s a lot of logic in these numbers, and they match up at least somewhat well with my expected final table as of today. 

Also, one can calculate predicted probabilities for each outcome (and presumably expected points over the season) off of these models. I don’t know that I’m going to do that, but if there’s interest I can probably put it together in the next couple of weeks.
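A minimal sketch of that calculation, assuming the standard GPCM form with one strength value for the home team and two step thresholds for the visiting team. The function names, the default discrimination of 1.0, and the example coefficients are all hypothetical; they are not the fitted values behind the graphs above.

```python
from math import exp


def gpcm_probs(theta, thresholds, discrimination=1.0):
    """Return [P(loss), P(draw), P(win)] for the home team under a GPCM.

    theta: home team strength.
    thresholds: (loss->draw, draw->win) step thresholds for the visitor.
    """
    # Running sums of a * (theta - b_j); the loss category has an empty sum.
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + discrimination * (theta - b))
    weights = [exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_points(probs):
    """Three points for a win, one for a draw, none for a loss."""
    p_loss, p_draw, p_win = probs
    return 3 * p_win + 1 * p_draw


# Hypothetical coefficients for a single home fixture.
probs = gpcm_probs(theta=0.5, thresholds=(-0.2, 0.9))
print([round(p, 3) for p in probs], round(expected_points(probs), 2))
```

Summing `expected_points` over all 19 home fixtures for a team would give its expected home points for the season under these assumptions.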






Is Jamie Vardy Actually Underperforming? The Importance of Measurement

Jamie Vardy’s performance has been the story of the EPL season. But how spectacular is his performance? In the analytics community, this revolves around Expected Goal (xG) measures. So let’s look at Danny Page’s great simulator to see how he’s doing.

Vardy

Using @footballfactman’s numbers, Vardy is doing pretty well for himself, finishing 1.64 standard deviations above the mean (only counting non-penalty goals). This means we’d expect him to have more goals than this, given his xG total, about 5% of the time. Very impressive finishing.
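The step from 1.64 standard deviations to "about 5% of the time" can be checked with a one-sided normal tail probability. This is a sketch that assumes the simulated goal totals are roughly normal, which is an approximation on my part and not necessarily how the simulator itself reports the figure; the function name is made up.

```python
from math import erfc, sqrt


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))


p = upper_tail_probability(1.64)
print(f"{p:.1%}")
```

The result comes out at roughly 5%, matching the figure quoted above.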

But what if 1 xG isn’t really one expected goal? @rakkhis posted a great image today showing the correlation between xG (plus xA, or expected assists), and actual goals + assists. It shows a powerful trend, with an R² of 0.73.


The interesting thing here is that we see Jamie Vardy, Mesut Ozil, and Romelu Lukaku slightly below the trendline. What that means is that those three players all have expected assists and goals roughly equal to their actual production. In this case, it means we’d actually expect Vardy to have about the same number of goals and assists (likely goals because of his position) that he actually does. This means Jamie Vardy is basically performing identically compared to expectations, which is a remarkable idea.1

This all goes to the importance of measurement: Danny Page’s simulator assumes a 1:1 relationship between xG and actual goals scored. It says that Jamie Vardy is having a rare season, over-performing > 95% of all simulated seasons by someone with his xG. The regression, however, tells a different story, showing Vardy is slightly underperforming and should have actually scored an extra half goal or so. If we measure it using a 1:1 relationship, Vardy is an extreme outlier; if we measure it using the ~1.6:1 relationship that the regression shows, then Vardy is performing almost exactly at expectations. In one version he’s in an amazing purple patch and we’d likely expect him to slip a little bit back to normal; in the other, we’d expect him to continue this streak as long as he keeps getting into the same positions to shoot. Measurement matters and it affects the conclusions we can draw from data.

 

  1. EDIT: I originally had this paragraph as “Vardy was under-performing”, but John Burn-Murdoch pointed out that the author of the graph transposed the X and Y axes, meaning people under the trendline are over-performing, while people over the trendline are under-performing. I’ve edited this paragraph to reflect that idea.

Predicting Individual Player Contributions – The Next Frontier

I’ve recently been inspired by @DoctorFootie’s really creative model of predicting possession – I won’t pretend to understand it well enough to explain it, but the idea is that he borrowed an equilibrium model from chemistry to predict how much possession each team will have in a given game. The significant contribution here is that opposing team stats contribute to and affect each other, and this is most obvious in a place like possession where you have a zero-sum game (me having the ball means you can’t have the ball). It would be interesting to see how well this works for something a little less zero-sum (number of passes) or almost completely independent (tackles, or maybe even shots on goal). That’s not my method, but you all should follow @DoctorFootie on Twitter because I’m anticipating he’s going to do some interesting things going forward.

Back to my inspiration – my current model uses average player stats to predict results (on top of a “strength” coefficient calculated from the 2014-2015 season). 1 But I can do better than this – I can predict individual player stats based on who they are facing. It’s easy enough to do – player stats would be your dependent variable, aggregate opposing team stats would be your independent variable, then let the machine learning models work their magic on this training data.

Then I could do predictions based on average opposing team stats – how many passes will a player make given the quality of opposition? How many shots on target will a player get?
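As a stripped-down sketch of the setup, the snippet below fits an ordinary least squares line with one player stat as the dependent variable and one opposing-team stat as the independent variable, then predicts the player stat for a new opponent. A simple linear fit stands in for the machine learning models I’d actually use, and the choice of variables (opponent possession, player passes) and every number are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training rows for one player: each pair is one past game.
opponent_avg_possession = [42.0, 48.0, 51.0, 55.0, 60.0]  # percent
player_passes = [61.0, 52.0, 50.0, 44.0, 41.0]

intercept, slope = fit_line(opponent_avg_possession, player_passes)


def predict_passes(opponent_possession):
    """Predicted passes for this player against a given opponent."""
    return intercept + slope * opponent_possession


print(round(predict_passes(50.0), 1))
```

A real version would use many opposing-team stats at once and a model that can handle interactions between them, but the shape of the training data is the same.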

This could potentially give better predictions for player stats to put into the current model, but there’s a bigger issue here. This would be another step closer to quantifying individual player performance – knowing how individual players perform against each opponent could be incredibly useful for teams: it could help managers trying to make the optimal squad selection, could help teams decide which players to purchase, and could get us closer to understanding exactly what individual players add to the team contribution and under what circumstances they add it. Are there “big game players?” How consistent are players? Can a player move up to compete against stronger competition?

This will be my holiday goal – I think I can put something preliminary together relatively quickly after my grading is done, test it, see if it works and is worth pursuing more fully. I might test two or three teams to see how it works, and go from there. Hopefully we can learn something interesting here and create an improved model of individual player contribution over what I have now.

  1. As I type this, I’m realizing I can weight the strength coefficient from season to season – e.g. week 1 I only use the 2014-2015 coefficient, maybe by week 10 I can start to slowly add in the 2015-2016 coefficient
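
One way the weighting in this footnote could work, as a sketch: keep all the weight on the 2014-2015 coefficient until week 10, then shift it linearly toward the 2015-2016 coefficient. The linear ramp, the choice to reach full weight at week 38, and the function names are assumptions for illustration, not something I’ve settled on.

```python
def current_season_weight(week, start_week=10, final_week=38):
    """Share of weight given to the current-season coefficient."""
    if week < start_week:
        return 0.0
    if week >= final_week:
        return 1.0
    # Linear ramp that is already slightly above zero in start_week.
    return (week - start_week + 1) / (final_week - start_week + 1)


def blended_strength(previous_season, current_season, week):
    """Weighted average of last season's and this season's coefficients."""
    w = current_season_weight(week)
    return (1 - w) * previous_season + w * current_season


# Hypothetical coefficients for one team.
for week in (1, 10, 20, 38):
    print(week, round(blended_strength(1.2, 0.4, week), 3))
```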