Category Archives: Methods

Stats Notes: R-Squared Isn’t The Right Measure

One thing I’ve noticed in the soccer analytics community is that R^2 seems to be the dominant measure of model fit. This is where I should probably go back and embed a few tweets of people reporting R^2, but I think it’s prevalent enough that people will believe me. R^2 is almost never the first thing we should be looking at, and it only becomes useful after we’ve set up the rest of the model correctly. Let me explain what I mean.

For the non-stats folks, or just anyone who needs a refresher, R^2 is a measure of goodness of model fit. Basically it looks at a linear model and calculates how much of the variation in y is explained by a linear regression on x. Higher R^2 means that more of the variation is explained by the regression model (better fit), lower numbers mean that less of the variation is explained (worse fit). Here’s a graphical example of the concept:

diagnostic plot

Both plots (and the remainder of the plots in this post) show a simple model: predicted points compared to actual points in a predictive model that tries to predict end of season points for each EPL team. The model details themselves aren’t important; the data are all simulated in R for illustration (except for my model at the very bottom). In the top half of this first figure, you can see that most points fall almost exactly on the regression line. This yields an R^2 of 0.95, meaning that 95% of the variance in the data is explained by the regression model.

The bottom half of the plot shows a model that has a much lower fit: the points are more scattered off the regression line, and the model has an R^2 of only 0.25. So far, so good. But this diagnostic test relies on some very specific assumptions that seem to be almost entirely ignored in what I’ve seen in soccer analytics.
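For anyone who wants to see the calculation rather than the picture, here is a minimal sketch in Python (not the R used for the plots). The five predicted and actual values are made up purely for illustration, and the helper names `fit_line` and `r_squared` are my own.

```python
def fit_line(x, y):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(x, y):
    """Share of the variance in y explained by the fitted line."""
    m, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up predicted and actual points for five teams (illustration only).
predicted = [35, 45, 55, 70, 85]
actual = [38, 41, 58, 66, 88]
print(round(r_squared(predicted, actual), 3))
```

The closer the actual values sit to the fitted line, the smaller the residual sum of squares and the closer R^2 gets to 1.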

R^2 is part of the final output of a linear regression model, specified in the bi-variate context as y = mx + b. If we’re calculating an R^2 value, it implies that this is what we’re doing, so it’s important to talk about exactly what this means and what we’re saying when we do this. You may remember this as slope-intercept form of a line back from middle school Algebra, where m = the slope of the line and b = the intercept. I want to talk about each of these components separately, why they’re important, and what they mean.

both good

The above figure is a very successful model, specified as y = (1)x + 0, meaning it has a slope of 1, and an intercept of 0. In practical terms, this means that for every 1 point increase in our model’s predicted points a team will earn, the team earns 1 point. For example, a team that is predicted to earn 37 points will earn 37 actual points or a team that is predicted to earn 87 points will earn 87 actual points. This is clearly a successful model where predicted and actual match very closely.

Note that the R^2 is 0.95, which is very close to the maximum 1, and for the R^2 fans this would be a successful model as well. Let’s take a look at another example.

bad intercept

For this model I shifted the intercept, and the solid line represents the ideal (y = x) while the dotted line represents the actual regression line (y = x – 20). This model is specified as y = (1)x – 20. What this means in practical terms is that there is a 1:1 relationship between x and y, which is good. For every 1 point increase in predicted points, there is a 1 point increase in actual points. However, the intercept being -20 is a significant difference in the predictions. So in this case, if we predict a team will earn 57 points, they will only earn 37 points. And if we predict a team will earn 107 points, they will earn 87 points. Clearly this model is less successful at predicting outcomes than the previous model: every team in the EPL finishes 20 points below its prediction.

Once again, note that the R^2 is 0.95. If we only look at this, we would assume this model was identical in its predictive power to the previous model, but logically we can see it is not. Let’s look at one more example, this time with the slope altered:

both bad

In this final model I have a model of y = 0.3x – 7.46, meaning that the slope is now 0.3, and the intercept is -7.46. So to calculate actual points, we have to multiply the predicted points by 0.3, and subtract 7.46. So a team who we predicted would earn 37 points would earn 37*0.3 – 7.46 points, or 3.64 points. A team we predicted to earn 87 points would earn 87*0.3-7.46, or 18.64 points. If we predicted a team would earn 250 points, they’d earn 250*0.3-7.46, or 67.54 points.

Think about that for a minute – in this model, if we predicted a team would earn 250 points (roughly 6.5 points per game) over the course of the season, they would earn enough points to probably challenge for a Europa League position. This model has to be considered wildly unsuccessful in plain terms.

However, let’s once again look at the R^2. The model fits incredibly well, leading to an R^2 of 0.95.
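A quick simulation makes the point. The sketch below is in Python rather than R, uses made-up numbers, and is not a replication of the plots above. It builds three sets of "actual" points from the same predictions and the same noise: one on the ideal line, one shifted down by 20, and one rescaled to y = 0.3x – 7.46 (with the noise rescaled along with it, which is the assumption that keeps the scatter around the line proportionally the same).

```python
import random


def r_squared(x, y):
    """R^2 of the least-squares line of y on x (the squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


random.seed(1)
# Simulated predicted points for 20 teams (made-up numbers).
predicted = [random.uniform(30, 90) for _ in range(20)]
noise = [random.gauss(0, 4) for _ in range(20)]

good = [p + e for p, e in zip(predicted, noise)]                  # y = x
shifted = [p - 20 + e for p, e in zip(predicted, noise)]          # y = x - 20
squashed = [0.3 * (p + e) - 7.46 for p, e in zip(predicted, noise)]  # y = 0.3x - 7.46

for name, actual in [("good", good), ("shifted", shifted), ("squashed", squashed)]:
    print(name, round(r_squared(predicted, actual), 3))
```

All three lines print the same R^2, because shifting or rescaling the outcome changes the slope and intercept of the fitted line but not how tightly the points cluster around it.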

Clearly we need to look at other measures before we look at R^2. If we’re going to perform linear regression in a Null Hypothesis Significance Testing (NHST) framework, we need to set up a clear null hypothesis for both the slope and the intercept. For anything where there should be a 1:1 relationship (point prediction models and expected goals come to mind), the first step needs to be to test against an intercept of 0 and a slope of 1. If we get coefficients that are statistically distinguishable from those values, then maybe the model isn’t predicting particularly well regardless of the R^2.
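As a rough sketch of what that test can look like, the Python below fits the line by least squares and computes a t statistic for the slope against 1 and for the intercept against 0, using the standard simple-regression standard errors. Everything here is an assumption for illustration: the data are made up, `compare_to_ideal` is a hypothetical helper name, and the two coefficients are tested separately rather than jointly.

```python
import math


def compare_to_ideal(x, y, slope0=1.0, intercept0=0.0):
    """Fit y = m*x + b and return t statistics for m = slope0 and b = intercept0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    sigma2 = ss_res / (n - 2)  # residual variance
    se_m = math.sqrt(sigma2 / sxx)
    se_b = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))
    return {
        "slope": m,
        "intercept": b,
        "t_slope": (m - slope0) / se_m,
        "t_intercept": (b - intercept0) / se_b,
    }


# Made-up data: 20 teams whose actual points run about 20 below the prediction.
predicted = list(range(30, 90, 3))
wobble = [3, -2, 1, -3, 2, -1]
actual = [p - 20 + wobble[i % 6] for i, p in enumerate(predicted)]

result = compare_to_ideal(predicted, actual)
T_CRIT = 2.101  # two-sided 5% critical value of t with 20 - 2 = 18 degrees of freedom
for name in ("slope", "intercept"):
    differs = abs(result["t_" + name]) > T_CRIT
    print(name, round(result[name], 2), "differs from ideal" if differs else "consistent with ideal")
```

With this made-up data the slope is consistent with 1 while the intercept is clearly different from 0, which is exactly the kind of problem a high R^2 hides.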

For comparison, I present the current state of my prediction model.

my model

The R^2 isn’t as high as I’d like (City is a significant outlier at this point, and as soon as they lose one they’ll drop back onto the line; Chelsea is also really underperforming and hurting the model), but when I set up the correct NHST (m = 1, b = 0) I get statistically insignificant results (y = 0.95x – 0.19). [1] The fit isn’t ideal, but it meets the major requirements for now with the hope that the model will converge more as the season progresses. The R^2 isn’t 0.95, but it’s clearly an improvement over some of the other models above. [2]

  1. And when I set up the traditional null hypothesis where the slope = 0 I do get significant results, telling me the slope is unlikely to be 0, but is likely to be 1
  2. Some more technical details and replication scripts for the plots are available at http://soccer.chadmurphy.org/?p=277

How Do You Solve a Problem Like Mesut?

One of the big surprises in my analysis of Arsenal’s Starting XI was the number of players the Random Forest model preferred to Mesut Ozil. As a reminder, the Random Forest thinks about 40% of midfielders in the big 5 European Leagues would help Arsenal if they replaced Mesut Ozil. [1] Here’s the graph:

Replacements Arsenal

I’ll admit to being surprised as well, so I did some first-cut exploratory analyses to see what exactly is going on with my data and I wanted to present them here. The first was looking at the distribution – seeing how much of an improvement we’re talking about, and how many players there are. Here’s the density plot of the overall distribution and Ozil’s place on it:

Ozil Replacements

It’s fairly normally distributed [2], with a fairly narrow spread (most players fall within +/-2 points of Ozil). It tops out at +5 points and bottoms out at -7 points, but most players don’t offer a significant increase over what Arsenal could expect with Ozil. I plotted out the distribution by points to more clearly see what proportion of players would help Arsenal gain different amounts of points. [3]

Ozil Proportion Increase

You can see the significant dropoff as we get higher in points – a lot of players gain Arsenal *some* points over Ozil, but only around 10% of midfielders are a 2+ point increase, and virtually no players [4] get Arsenal 3 or more points. This goes back to Wenger’s idea that it’s hard to find players who would improve a team like Arsenal.
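The proportions in that plot come down to a simple count. Here is a minimal Python sketch, assuming we already have each candidate’s point gain over Ozil; the ten gains below are invented for illustration, and `share_at_least` is a hypothetical helper name.

```python
def share_at_least(point_gains, threshold):
    """Fraction of candidates whose gain over the incumbent is at least `threshold`."""
    return sum(1 for gain in point_gains if gain >= threshold) / len(point_gains)


# Made-up gains (replacement's expected points minus Ozil's) for ten midfielders.
gains = [-3.1, -1.2, -0.4, 0.0, 0.3, 0.8, 1.1, 1.9, 2.4, 3.2]

for threshold in (0.5, 1, 2, 3):
    print(threshold, share_at_least(gains, threshold))
```

Applied to the real database, the 3+ point threshold would correspond to the 36 of 1058 midfielders mentioned in the footnote.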

So who are the players who can improve over Ozil? Here are the top 11 (all are a 3+ point increase):

Player (Team)
Jedinak (Crystal Palace)
Medel (Inter Milan)
Verratti (PSG)
Keita (Roma)
Crisetig (Cagliari)
Magnanelli (Sassuolo)
Hlousek (Stuttgart)
de Rossi (Roma)
Toulalan (Monaco)
Tiote (Newcastle)
Osman (Everton)

The most interesting part here is that few of these players, if any, would fit in a #10 role. Most of them are defensive midfielders, offering tackling and defensive support. A deeper look at the list shows a decent number of players who would be comfortable in an attacking role (several of Chelsea’s attacking mids are on that list, unsurprisingly), but the real gap here seems to be Ozil’s defense. That’s not his role, but it shows a gap in Arsenal’s style that could be useful for them: someone like Yaya Toure, who can play in a box-to-box role instead of as a pure central attacking midfielder, might be a good fit (and is a 2.5 point increase according to my Random Forest model, even with last year’s relatively down year for him), as might Roma’s Daniele de Rossi, who is almost a 4 point increase. [5] If I were advising Wenger, I’d recommend a look at Lorenzo Crisetig at Inter Milan – he’d be relatively inexpensive, can play anywhere in the center of the pitch, and at 21 has a huge upside.

There’s probably more to be gleaned from this, but it’s interesting to see why exactly the model doesn’t like Ozil, who I have a ton of respect for as a player, as much as some other options. It’s also interesting to see how this can highlight deficiencies in the team and potential tactical adjustments that could be made to improve teams without overhauling the entire lineup or style.

  1. No one seemed surprised Aaron Ramsey’s number was almost as high…I almost feel bad for him.
  2. Or close to a bell curve
  3. This was a horrible sentence. My apologies to anyone who ever said I was a good writer at some point in life
  4. The exact number is 36/1058 players in the midfielders database
  5. His age would probably have made him a bad choice, although with the potential gain he might have been worth it if Arsenal’s going to take advantage of this big opportunity to win the EPL

Arsenal’s Room to Improve

So I had originally intended to put together a blog post about “every team’s best transfer option” but programming headaches and my day job got in the way. But a day late, I finally got the script right, and I’ll have a lot of interesting data-driven posts to share with you all.

Today’s post looks at “room for improvement.” What I did was fit a Random Forest model to predict each team’s expected points this season. Then I took each player out of Arsenal’s lineup in turn and substituted every player in my database in for him, using the same Random Forest to recalculate the expected points. Finally, I calculated the proportion of players that would give Arsenal more points, and graphed it here:

Replacements Arsenal

So to walk through an example, the first player I looked at was Laurent Koscielny (because I started with defenders). I calculated Arsenal’s expected points (around 80.5 in the Random Forest model), and then replaced Koscielny with Abdoulaye Ba, the first player alphabetically in my database of European defenders. [1] I re-ran the Random Forest with the rest of Arsenal’s 10 outfield players and Ba in for Koscielny, and predicted they would only earn 73.5 points with him instead of Koscielny. I then repeated this for Adriano instead of Koscielny, and Arsenal would earn 75 points with him. Jordi Alba would be 79 points, etc. through the rest of the list. [2] I then calculated the percentage of players in the whole dataset who, in place of Koscielny, would earn Arsenal more than 80.5 points, and saved that number (5.7%, or 0.057 as a proportion). I repeated this for the rest of Arsenal’s lineup to get the same value for the remaining 9 outfield players. [3]
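The loop described above can be sketched roughly as follows. This is Python, not my actual script: `predict_points` stands in for the fitted Random Forest, the toy version used here just adds up invented per-player ratings, and "Player X" is a hypothetical candidate.

```python
def share_of_upgrades(lineup, incumbent, candidates, predict_points):
    """Fraction of candidates who raise the predicted points when swapped in for `incumbent`."""
    baseline = predict_points(lineup)
    better = 0
    for candidate in candidates:
        # Same lineup, with the candidate in the incumbent's place.
        trial = [candidate if player == incumbent else player for player in lineup]
        if predict_points(trial) > baseline:
            better += 1
    return better / len(candidates)


# Toy stand-in for the fitted model: invented ratings, summed over the lineup.
ratings = {"Koscielny": 8.0, "Bellerin": 7.5, "Ba": 6.0,
           "Adriano": 6.5, "Alba": 7.8, "Player X": 8.4}


def toy_predict_points(lineup):
    return sum(ratings[player] for player in lineup)


lineup = ["Koscielny", "Bellerin"]  # shortened lineup for illustration
pool = ["Ba", "Adriano", "Alba", "Player X"]
print(share_of_upgrades(lineup, "Koscielny", pool, toy_predict_points))
```

Passing the model in as a function keeps the substitution logic separate from whatever is doing the predicting, so the same loop works for every position in the lineup.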

The big finding here is that Arsenal doesn’t have a lot of opportunities to improve. One can assume that most of the 5% of the players who can improve upon Koscielny and Bellerin are at top clubs already and aren’t available for transfer, so there aren’t a lot of options. I’ve always been a big supporter of Wenger’s “only buy the right player” strategy, and this lends some empirical support to that. [4]

The other finding is that Mesut Ozil dramatically underperformed last year, and needs to step up his game significantly. I haven’t seen the other top teams, but it’s odd that a team like Arsenal could keep a player who was only in the ~60th percentile at his position last year.

More teams to come (Aston Villa’s halfway there), and a Points Above Replacement (PAR) measure is coming relatively soon. I might put together a draft of something tomorrow, but I need to think the model through a lot more.

Follow me on Twitter @Soccermetric

  1. I removed all players with fewer than 10 games played – the first iteration didn’t do this and had some striker who had played in 1 game for 15 minutes and got two shots off in those 15 minutes bumping Arsenal up to 95 points
  2. Unfortunately my data doesn’t have information on specific positions outside defender, midfielder, and forward so there are some weird left-back for center-back substitutions and even more in the midfield
  3. I’m working on this method for all 20 EPL teams, but each team takes about 5-6 hours to run on my computer so it will take a week or two to get the data done. Hopefully closer to a week, but time and energy constraints might get in the way for all you Watford fans who read this blog.
  4. I don’t have the full list, but I know Cavani is on the list of strikers who would be an improvement over Giroud so he was on the right track there, even if he was a bit tentative.

EPL Expected Final Table – What is it?

Every week after the EPL games are completed, I post a couple of graphics – the Expected Final Table and each team’s Deviation from Expected Points. I realized I’ve never really explained how I do this and why they’re important, so I wanted to take a couple of minutes to post what they are.

At the beginning of the season, I calculated the probability for each outcome for each game. [1] So each team has an expected probability of winning, expected probability of losing, and expected probability of a draw. I calculate the expected points through a simple formula:

Expected points = 3 × Pr(win) + 1 × Pr(draw) + 0 × Pr(loss)

Fairly straightforward – three points for a win, so each team is expected to earn 3 * probability of a win. 1 point for a draw, so each team is expected to earn 1 * probability of a draw, and 0 for a loss. Adding up this number for a single team across all 38 games gives me their expected points total for the season, and doing this across all 20 EPL teams gives me the final table.
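In code, the formula and the season total look something like this minimal Python sketch. The three fixtures and their probabilities are made up, and the function names are only for illustration.

```python
def expected_points(p_win, p_draw, p_loss):
    """Expected points from one game: 3 for a win, 1 for a draw, 0 for a loss."""
    if abs(p_win + p_draw + p_loss - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return 3 * p_win + 1 * p_draw + 0 * p_loss


def season_expected_points(fixtures):
    """Sum of per-game expected points over a list of (p_win, p_draw, p_loss) tuples."""
    return sum(expected_points(*fixture) for fixture in fixtures)


# Made-up probabilities for three fixtures (a real season would have 38).
fixtures = [(0.5, 0.3, 0.2), (0.25, 0.25, 0.5), (0.7, 0.2, 0.1)]
print(round(season_expected_points(fixtures), 2))
```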

The next step in the process is, after each game completes, to replace the expected point total for that game with the actual point total. Then I recalculate the expected points by adding the actual points earned for the games completed and the expected points for the remaining games, re-doing the final table with those values. Here is the most recent table:
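A rough Python sketch of that update step, assuming the pre-season expected points are stored per fixture and completed results are keyed by fixture index. The two teams and their numbers are made up.

```python
def projected_total(expected_by_game, actual_by_game):
    """Actual points for completed games plus expected points for the rest.

    expected_by_game: list of pre-season expected points, one entry per fixture.
    actual_by_game:   dict mapping fixture index -> points actually earned.
    """
    return sum(actual_by_game.get(i, expected)
               for i, expected in enumerate(expected_by_game))


# Made-up three-game "season" for two teams; only the first game has been played.
expected = {"Team A": [2.5, 1.5, 2.0], "Team B": [1.0, 1.5, 0.5]}
results = {"Team A": {0: 0}, "Team B": {0: 3}}

table = sorted(((projected_total(expected[team], results[team]), team)
                for team in expected), reverse=True)
for points, team in table:
    print(team, points)
```

Sorting the projected totals from highest to lowest gives the expected final table for that week.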

Final Table August 30

I like this measure for one major reason – it controls for both strength of schedule so far and remaining strength of schedule. Before each team has played all the other teams, there can be huge discrepancies in how many points they’re expected to have earned, but the regular table doesn’t show that, it only shows how many points they’ve earned. As we get further into the season, this can become more important, especially in the title and relegation fights because we’ll know exactly how each team is expected to do in their remaining games without having to do weird mental gymnastics like “Well Norwich City has games against Chelsea and Spurs left, while Bournemouth is 2 points behind but has West Brom and Sunderland.” This does all that mental math for us “behind the scenes” so we don’t have to guesstimate whether Bournemouth will catch up to Norwich in this hypothetical situation.

It also lets us avoid “streak story-telling” where we overextrapolate from a few early results. Right now one of the big stories is that Chelsea is struggling – they’re currently in 13th place and everyone is worrying that the sky is falling. We all know they’re not going to finish there, but we don’t know what this early-season slump has done to their chances just by looking at the table. This measure lets us see exactly what we’d expect to happen by the end of the season and what would have to happen for them to catch up.

The next is the deviation from expectation. As an example, before week 1 Arsenal was expected to earn 82 points over the season. [2] They were big favorites in week 1 at home against West Ham, and were expected to earn 2.63 points from that game. They lost, meaning they earned 0 points. So their expected points at the end of the season went from 82 to 79.37 [3], a deviation of -2.63 points.
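The Arsenal example works out like this in a small Python sketch. The 2.63 and the (approximate) 82 come from the text above; the data layout and the function name are assumptions for illustration.

```python
def deviation_from_expected(expected_by_game, actual_by_game):
    """Sum of (actual - expected) points over completed games only."""
    return sum(actual - expected_by_game[i] for i, actual in actual_by_game.items())


# Week 1: Arsenal expected 2.63 points at home to West Ham, earned 0.
expected_by_game = [2.63]  # only the first fixture is needed here
actual_by_game = {0: 0}

deviation = deviation_from_expected(expected_by_game, actual_by_game)
preseason_total = 82  # approximate pre-season expected points
print(round(deviation, 2), round(preseason_total + deviation, 2))
```

Adding the running deviation to the pre-season total gives the same 79.37 as swapping the actual result into the season projection.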

This measure does a couple of things: first, it lets me diagnose how well the model is doing so far. [4] Second, it lets us see whether teams are exceeding, meeting, or falling below expectations. We expected Chelsea to perform at a much higher level than they have so far, and this model can quantify exactly how poor they’ve been compared to expectations. Similarly, we know Leicester City has been exceeding expectations, but this lets us quantify that. And despite a relatively slow start from Everton, they’re actually performing almost even with what we’d expect. Finally, we can see that Manchester City’s strong start has them performing significantly above expectations, and we can likely expect a little slump at some point as they regress to the mean.

Here is this week’s chart for an example:

Deviation 0830

Hopefully this explained the method a little more clearly for anyone who is interested – check my twitter (@Soccermetric) or this webpage for weekly updates as the season progresses.

  1. The method is available at http://soccer.chadmurphy.org/methods/predicting-late-season-outcomes-the-method/
  2. I don’t remember exactly how many points it was, but this is close enough for demonstration purposes.
  3. 82-2.63=79.37
  4. Right now the expected value correlates at 0.48, which isn’t bad this early in the season but I’d like to see higher. I’ll write a blog post about proper hypothesis testing later because I think that’s important for the analytics community

Zlatan Ibrahimovic and the Importance of Balance: The Zlindex~!

With my transfer evaluator finished, and no really interesting transfer rumors the last few days, I wanted to play with the algorithm and see how many points each EPL team would gain by signing Zlatan Ibrahimovic as a replacement for their main striker. Borrowing from Dirty Tackle’s love of Zlatan, I named it the Zlindex~!

I applied my SVM to each of the 20 EPL teams, calculating how many points they would be expected to earn over the EPL season. Then I removed their main striker’s stats [1] from the team stats, added Zlatan’s stats in their place, and re-calculated the results. Finally, I subtracted the expected points with the regular striker from the expected points with Zlatan as the team’s striker to calculate his added value.

Expected Points(Zlatan) – Expected Points(Regular Striker)
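Here is a rough Python sketch of that swap-and-subtract step. It is only an illustration: the weighted sum in `toy_predict_points` is a stand-in I made up, not the SVM, and every stat and weight below is invented rather than taken from the real data.

```python
def swap_striker(team_stats, striker_stats, new_stats):
    """Team totals with the current striker's stats removed and the newcomer's added."""
    return {stat: team_stats[stat] - striker_stats.get(stat, 0) + new_stats.get(stat, 0)
            for stat in team_stats}


def added_value(team_stats, striker_stats, new_stats, predict_points):
    """Expected points with the newcomer minus expected points with the regular striker."""
    with_newcomer = predict_points(swap_striker(team_stats, striker_stats, new_stats))
    return with_newcomer - predict_points(team_stats)


# Toy stand-in for the model: an invented weighted sum of team stats.
weights = {"shots_inside": 0.1, "shots_outside": -0.05, "tackles": 0.05}


def toy_predict_points(team_stats):
    return sum(weights[stat] * value for stat, value in team_stats.items())


# Invented season totals (not real data for any team or player).
team = {"shots_inside": 200, "shots_outside": 150, "tackles": 600}
striker = {"shots_inside": 80, "shots_outside": 30, "tackles": 20}
zlatan = {"shots_inside": 70, "shots_outside": 60, "tackles": 10}

print(round(added_value(team, striker, zlatan, toy_predict_points), 2))
```

In this made-up case the newcomer comes out negative: fewer shots inside the area, more outside, and fewer tackles than the striker he replaces.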

The unsurprising news is that Zlatan would improve 16/20 EPL teams, and would improve West Ham by somewhere around 7-8 points (enough to let them challenge for a Europa League spot according to my predictions). He’d also be a fairly significant upgrade for most of the top 6 teams. [2] The full table is in the figure below.

zlindex

 

However, as Brooks Peck pointed out, it doesn’t make sense that he doesn’t improve the teams at the bottom. [3] My first hypothesis is that these are teams that would suffer if they had too many karate kicks and ponytail related assaults, but the model doesn’t account for those so I’m probably wrong there. I did some quick exploration of the data, and found a consistent pattern for three of the four teams (Man City, Norwich City, and Crystal Palace): he doesn’t tackle as much as their current striker, and he takes too many shots outside the area and too few inside the area. [4]

Zlatan Comparison

 

The two that stood out to me the most were “tackles” and “Shots inside the area”, and those are the two that seem to correlate most highly with points lost. Interestingly, this also fits what I see as a bigger pattern for Zlatan, having watched him a lot when he was with Milan [5]: he’s often lazy and uninterested on defense, and takes a lot of odd shots outside the area. To his credit, he can make those long distance shots work as well as anyone, but most teams prefer their striker operating a little closer to goal.

Newcastle United still remains a mystery to me – looking at the data for Papiss Demba Cisse, the big area where Zlatan differs is in passing: he passes the ball quite a lot more than most of the strikers in this list, and that may make the model think negatively of him. He might look more like a #10 than a #9, which fits the deep-lying forward style he was used in at Milan (holding up the ball, transitioning from defense to attack). This may be a latent variable the numbers are picking up: his style simply doesn’t fit the few teams he wouldn’t improve.

The important lesson here I think is balance: not every player’s style improves every team. Zlatan is one of the best pure strikers on the planet, but he’s a tall, strong, physical striker who can wear down defenders as well as anyone out there. This doesn’t necessarily fit with what teams are looking for, and even some mid-table teams wouldn’t benefit from his addition to the squad. [6]

  1. In cases where teams play with two strikers, I picked one at random
  2. Re: Chelsea, he’s basically breakeven with Costa, but is a big upgrade over Pedro
  3. Brooks pointed out that it’s not necessarily surprising Zlatan is a downgrade over Sergio Aguero for Man City, and I’d agree. I’ve been a fan of his since before he was at City because he won several Golden Boots for me in an FM2012 save
  4. The method here is fairly simple: I took the team’s current striker’s stats and subtracted Zlatan’s stats to see the difference between the two
  5. Forza Milan~!
  6. Someone mentioned on Twitter that teams can change styles based on new players, which is a real possibility the model can’t account for, but that leads to other issues in terms of team chemistry and whatnot so I’m not too concerned about that honestly