# Monthly Archives: May 2016

# Thinking about Individual “Finishing Skill”

One of the big open questions in expected goals research is accounting for individual finishing skill: a shot taken by Lionel Messi is worth more than a shot taken by Jesus Navas, but how do we account for that? There are any number of open issues here, mostly methodological, but I’ve recently started thinking about a more theoretical one that should be addressed before worrying about the statistics underlying the concept: What exactly do we mean by “finishing skill?”

As far as I’ve read, finishing skill is typically thought of as goals scored above (or below) the number of expected goals. Dismissing the ideas of variance and imprecision in measurement for a moment, a player who scores more goals than expected is a good finisher while a player who scores fewer goals than expected is a bad one. Lionel Messi outperforms his expected goals, so we can say that he’s clinical in front of goal, while Jesus Navas underperforms so we can say that he’s…well as a Manchester City supporter I don’t want to talk about it. But is this all there is to be said?

One of the great contributions of expected goals is the idea that all shots are not created equal, that is to say a shot taken from out wide and from outside the penalty area is less likely to score than one taken from the center of the goal six yards away. But why shouldn’t we apply this to the idea of finishing as well? I’m coming around to the idea that there are as many types of finishing skill as there are types of shots, so the next step is to identify the important ones and figure out which players are which types of finishers. To do this, I draw on some of the fundamental contributions of expected goals research along with the eyeball test from watching and playing however many thousands of hours of soccer.^{1}

**The Clinical Finisher**

This archetype comes from the idea of players who are clinical in front of the net. They’re calm, collected, and don’t miss easy opportunities. In xG terms, they score on high probability shots even more frequently than one would expect. A 0.5 xG shot kicked by Lionel Messi one-on-one vs. the goalkeeper is more likely to go in than one kicked by Fernando Torres at Chelsea. So theoretically that shot is 0.7 on Messi’s boot and 0.3 on Torres’s. This likely correlates with things like confidence, composure, and close ball control.
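One way to make this concrete is a hedged sketch of a player-specific finishing adjustment: scale a baseline xG value by a hypothetical multiplier per player and shot type. The player names, multipliers, and the multiplicative form itself are all illustrative assumptions, not estimates from real data.

```python
# A minimal sketch of adjusting a baseline xG value with a hypothetical
# player-specific finishing multiplier (1.0 = league average).
# All names and numbers here are illustrative, not real estimates.

finishing_skill = {
    "Messi": {"one_on_one": 1.4},
    "Torres": {"one_on_one": 0.6},
}

def adjusted_xg(base_xg, player, shot_type, skill=finishing_skill):
    """Scale the baseline xG by the player's multiplier for this shot
    type, capping at 1.0 since the result is a probability. Unknown
    players or shot types fall back to the league-average 1.0."""
    multiplier = skill.get(player, {}).get(shot_type, 1.0)
    return min(base_xg * multiplier, 1.0)

print(adjusted_xg(0.5, "Messi", "one_on_one"))   # → 0.7
print(adjusted_xg(0.5, "Torres", "one_on_one"))  # → 0.3
```

This reproduces the 0.5 → 0.7 / 0.3 example above; the open question is whether a single multiplier per shot type is the right functional form, which is exactly the methodological issue the post sidesteps.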

**The Long Range Sniper**

Shots taken outside the box and from an angle have a low xG value, yet some players continue to take them. Presumably some are better at these shots than others, and being able to shoot from distance is certainly a skill that can be developed. We could also expect some players to not realize that they don’t have this skill and still take a number of shots from distance, so we’d see some significant variation here. I’m thinking about Zlatan’s famous bicycle kick for Sweden: for a mortal man that shot would be 0.00001 xG, but for Zlatan maybe he makes that as many as 1/10 times (0.1 xG).

**The Free Kick Specialist**

Direct free kicks are a skill some players have while others don’t. Players like Cristiano Ronaldo, Yaya Toure, or Andrea Pirlo probably deserve a decent bump in xG from direct free kicks, while other players are probably below average. There’s some difficulty here in that only good free kick takers would really ever take any, but it’s a skill we could measure.

**The Head of the Class**

We know headed shots are lower value than those that are kicked, but obviously this isn’t equal across the board. Andrea Pirlo has never been known as a great header of the ball, so maybe a header of his that would normally have an xG value of 0.4 would have a true value of 0.3, while someone like Zlatan or even Gerard Pique has more talent in this area and might be worth 0.5.

There are likely more types – players who are better on counter attacks, players who are better on corners, etc., but I wanted to present a few basic archetypes because it’s worth discussing and worth thinking about not just finishing skill, but types of finishers. If you’re trying to build a team, you wouldn’t just want the best finishers using the pure xG/actual goals metric, you’d want complementary players. Maybe you’d want to build a team filled with speedy, clinical players who could finish goals on counter attacks. Or maybe you want to play along the flanks and cross the ball into the box 30+ times a game, so you’d want some forwards who are strong headers of the ball. Maybe every team needs at least one free kick specialist, or maybe you’d want a balance. But regardless of the strategy, using xG to define different types of finishers would be a useful addition to the toolkit.

- All players and numbers I use here are hypothetical. The point isn’t to identify specific players or specific values, just to present illustrations of what I’m thinking ↩

# NWSL: Catching Up By Taking Worse Shots?

One of the strategic questions that has always interested me is: what is the best way to catch up after going behind in a soccer match? To my mind, there are two options:

- Take a lot of low percentage shots, hoping that volume makes up for a lack of quality.
- Be patient and wait for the high quality chances, hoping that quality makes up for a lack of volume.

There are merits to both, and you could probably solve this mathematically based on the expected number of shots and expected quality per shot, given any number of variables. My head is spinning thinking about how you’d actually solve this equation, but given enough familiarity with teams and the right data the math would be manageable. Solving this equation isn’t my goal with this post; instead I want to see what teams have done and use observable data to infer what their strategies are and how they may have solved the problem for themselves.
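To see the shape of that trade-off, here is a hedged sketch under a simple binomial model: shots are treated as independent, each converting with probability equal to its xG. The shot counts and xG values are hypothetical, chosen only to illustrate the volume-vs-quality comparison.

```python
# A sketch of the volume-vs-quality trade-off under a binomial model:
# n independent shots, each scoring with probability xg_per_shot.
# Shot counts and xG values below are hypothetical.
from math import comb

def p_at_least(goals_needed, n_shots, xg_per_shot):
    """P(scoring >= goals_needed) from n_shots independent shots,
    each converting with probability xg_per_shot."""
    return sum(
        comb(n_shots, k) * xg_per_shot**k * (1 - xg_per_shot)**(n_shots - k)
        for k in range(goals_needed, n_shots + 1)
    )

# Volume strategy: 12 low-quality shots at 0.05 xG each
# Quality strategy: 3 high-quality shots at 0.20 xG each
print(round(p_at_least(1, 12, 0.05), 3))  # ≈ 0.46
print(round(p_at_least(1, 3, 0.20), 3))   # ≈ 0.49
```

Under these made-up numbers the two strategies are nearly equivalent for scoring once, which is why the question has to be settled empirically rather than by intuition; the real answer depends on how many extra shots the volume strategy actually buys you.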

To do this, I’ve undertaken two separate analyses. The first is simple enough: what is the likelihood that a shot goes in given the game state at the time of the shot? More simply put: does shot quality correlate with score?

To answer the question, I ran an analysis (full details in the appendix) looking at each shot in the NWSL this season and part of last season^{1}. I calculated the probability that each shot becomes a goal, and compared those probabilities when the score was even and when the shooter’s team was one, two, or three goals ahead or behind.

If teams look to catch up by taking lower probability shots when they are behind, we’d expect to see the average shot have a lower expected goal (xG) value the further behind they are, while when they are ahead the average shot would have a higher xG value.

Conversely, if teams look to catch up by taking higher probability shots when they are behind, we’d expect to see the average shot have a higher expected goal (xG) value the further behind they are, while when they are ahead the average shot would have a lower xG value. I present the results of my analysis in the figure below.

The points represent each shot taken, while the y-axis represents the expected goal value and the x-axis represents the goal difference at the time the shot was taken. The red boxes represent the average xG value for the shots taken at a given goal difference and the standard error around that average. If you compare the center lines in each box, you can see an upward trajectory from -3 to +3, meaning that teams take lower quality shots when they are behind and focus on higher quality shots when they are ahead.

My analysis of shot data shows that teams focus on taking whatever shot is available when they are behind, hoping that a volume of lower quality shots will get them back in the game. There are a number of potential explanations for this, but it seems that teams take any available shot when they are behind and can afford to be more selective when they are ahead.

**Appendix**

Here are the results of my probit regression: my dependent variable was “did the shot result in a goal scored?” and my independent variables are in the left column of the table below. The explanatory variable of interest here is “goal difference,” and it is positive and statistically significant (p < 0.05). That indicates goal difference is a significant predictor of the likelihood of a goal being scored: when teams are leading they take higher quality shots.

| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 0.1865 | 0.2929 | 0.64 | 0.5243 |
| Goal Difference | 0.1076 | 0.0512 | 2.10 | 0.0357 |
| Distance from Goal | -0.0810 | 0.0120 | -6.74 | 0.0000 |
| Angle to Center of Goal | -0.7620 | 0.1889 | -4.03 | 0.0001 |
| Time | -0.0007 | 0.0023 | -0.32 | 0.7498 |
| Was the Shot Pressured | -0.2115 | 0.1306 | -1.62 | 0.1054 |
| Kicked | 0.1807 | 0.1891 | 0.96 | 0.3394 |
| Counter Attack | 0.4137 | 0.1329 | 3.11 | 0.0019 |
| Home Team | -0.0751 | 0.1197 | -0.63 | 0.5305 |
| Goalkeeper Error | 1.9129 | 0.4599 | 4.16 | 0.0000 |
| Direct Free Kick | 0.5552 | 0.3278 | 1.69 | 0.0903 |
| Assisted from a Corner | -0.0103 | 0.2471 | -0.04 | 0.9668 |
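To show how probit coefficients like these map to a predicted scoring probability, here is a minimal sketch (not the author's actual code) that plugs a hypothetical shot into the fitted intercept, goal difference, distance, and angle terms, setting the remaining covariates to zero for illustration. The variable names and units (yards, radians) are my assumptions.

```python
# A hedged sketch: turn the probit coefficients from the table into a
# predicted scoring probability. Only four terms are used; the other
# covariates are set to zero for illustration. Units are assumptions.
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def shot_probability(goal_diff, distance, angle):
    """Linear index from the regression table: intercept + goal
    difference, distance from goal, and angle to center of goal."""
    index = 0.1865 + 0.1076 * goal_diff - 0.0810 * distance - 0.7620 * angle
    return phi(index)

# Same hypothetical shot, team two goals behind vs. two goals ahead
print(round(shot_probability(-2, 12, 0.4), 3))
print(round(shot_probability(+2, 12, 0.4), 3))
```

The positive goal-difference coefficient means the identical shot location gets a higher predicted probability when the shooter's team is ahead, which is the pattern the post's main figure describes.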

@deepXG mentioned that the causal arrow might point in the wrong direction: teams taking lower xG shots might simply be more likely to fall behind. So I also ran a second analysis within games to show the change in shot quality over the course of a match. I subdivided the data by the final margin: winning/losing by 3, 2, and 1 goal, and ties. Most of these final scores didn’t have enough shots across a variety of game states (games that finish in a tie tend to spend most of the game tied, meaning there’s not much variation in game state to analyze), but I was able to find a pattern among the most extreme results (+/- 3 goals).

For both outcomes, we see the same pattern as in the main analysis (although with more uncertainty because of a relative paucity of data). Expected goal values decrease as teams fall behind/increase as they take the lead. This provides a second level of evidence and a robustness check on the original findings. Figures are presented below.

- I’m collecting these xG values by hand, coding each shot individually. As of now I have weeks 16-20 of the 2015 NWSL season as well as the first 3 weeks of the 2016 season. ↩

# NWSL Fantasy League: Using Analytics to Pick a Squad

I normally don’t participate in fantasy sports because they involve me rooting for weird things like Liverpool keeping a clean sheet while Daley Blind scores a goal and Yaya Toure gets a couple of assists. I can’t keep it straight, and it takes most of the enjoyment out of the game for me. However, NWSLFL has been fun for me and it’s forced me to immerse myself a little more in the league and learn more about all the players which is a good thing for someone trying to do analytics.^{1}

I wanted to share my thought processes for my third week’s success. Weeks 1 and 2 were pretty disastrous, but Week 3 I scored fairly well. I acknowledge there’s a lot of luck in this, but I do think I’ve improved my process and I figured I’d share it with people and maybe they can use it to do well, or at least join me in failure if this turns out to be a bad strategy long-term.

**Step 1: Find The Most Likely Winners**

I use my prediction model (OHAI) to see which teams are most likely to win, although I’m not 100% confident in the model so I also apply a logic test to it. This week, I’m looking at FC Kansas City to beat the Houston Dash or the Spirit over the Thorns. This is where I build my defense from, and usually where I pick my goalkeeper. I do like Nicole Barnhart, so I’ll probably pick her as my starting goalkeeper instead of Hope Solo (last week’s GK). I’ll pick a couple defenders from Kansas City, and a couple from the Spirit.

**Step 2: Find Teams Who Are Under/Overachieving Expected Goals**

My Expected Goals (xG) model predicts how many goals a team should score given the types of shots they have taken, and then I compare that to how many goals they’ve actually scored. If they’ve scored far more goals than anticipated, they’re possibly due for an off day. If they’ve scored fewer, they’re possibly due for a good day. Last week the Houston Dash were *way* above the line, meaning that they’d scored far more goals than you’d expect given their shots. So I might pick against them and avoid their strikers – I could have even picked some midfielders from their opponents. I also might look at a team that has been expected to score a lot of goals but has come up short and pick some of their attacking players. The WNY Flash look like they might be due for some goals, while Seattle might be in for a little dry spell.

Then I look at expected goals allowed to see who’s underachieving/overachieving there. The Spirit have been allowing fewer goals than expected, as have Sky Blue and FC Kansas City. That would mean a couple of things: they’re either due to allow some goals or their goalkeepers are extra good and are preventing goals from going in. I’ve watched Nicole Barnhart and she’s been fairly heroic in goal, so she might continue the pattern. Meanwhile, Orlando has let in more goals than expected so they might be due for some opponents to hit the post.
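The over/underperformance check in this step boils down to ranking teams by goals scored minus cumulative xG. Here is a minimal sketch of that ranking with entirely hypothetical teams and numbers, not real NWSL figures.

```python
# A sketch of the over/underperformance check: rank teams by
# goals scored minus cumulative xG. All figures are hypothetical.
team_stats = {
    # team: (goals_scored, cumulative_xG)
    "Team A": (9, 5.2),
    "Team B": (3, 6.1),
    "Team C": (5, 4.8),
}

def xg_overperformance(stats):
    """Return (team, goals - xG) pairs sorted from most overperforming
    (possibly due for a cold streak) to most underperforming
    (possibly due for some goals)."""
    return sorted(
        ((team, goals - xg) for team, (goals, xg) in stats.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

for team, diff in xg_overperformance(team_stats):
    print(f"{team}: {diff:+.1f}")
```

The same function works for expected goals allowed: feed it goals conceded and xG conceded, and the top of the list becomes the teams whose goalkeepers may be masking defensive problems.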

**Step 3: Picking the Rest**

I generally pick my USWNT designated players for the midfield – Tobin Heath and Christen Press always seem like safe bets to do good things. I like Kim Little right now because she’ll likely step up given all the injuries in Seattle. I also captain my goalkeeper because the top scorers usually seem to be goalkeepers (saves + clean sheet + winning bonus are a good combination if you can get it right). And I pick Kealia Ohai because she’s the namesake for my model so why not?

I haven’t picked my team this week, but this is the process. I like the new procedure, and I got super lucky last weekend with just about all my players scoring significant points. I missed Diana Matheson’s hat trick, but I think every player on my team had a goal or an assist last week, and all my defenders won or kept a clean sheet. Hope Solo didn’t face a ton of shots which hurt, but she won with a clean sheet so that was as much as I could have hoped for. Hopefully people can build on this, and I’d love to hear your refinements on the strategy!

- Subject matter is dramatically underrated in a lot of analytics exercises, but that’s a story for another day ↩

# Game Theory: The Rationality of Man City Fans’ Anger at Rotation

My newest Game Theory post about the value of rotation was inspired by a Gab Marcotti tweet:

God forbid you're in a CL semifinal + you prioritize winning the CL over a top 4 finish…

— Gabriele Marcotti (@Marcotti) May 1, 2016

He was speaking of Manchester City playing a “B Team” in the weekend’s Premier League fixture, prioritizing their mid-week CL semi-final return fixture against Real Madrid instead. The tweet was fairly controversial, especially among City’s fan base, and gave me a lot to think about. So, as I like to do, I’ll think about it from a utilitarian perspective and try to game out the expected value of each choice.

The idea behind rotating before a big game is that you increase your chances of winning the big game while diminishing your likelihood of winning the rotation game and, with it, your chances of obtaining a given league position. Manchester City are currently in a fight for fourth place with Manchester United (and, to a lesser extent after this weekend’s fixtures, Arsenal).

The first step is to think about which is more important to Manchester City fans: winning the Champions League semi-final (and possibly the entire tournament), or getting Arsene Wenger’s famous “fourth trophy” and ensuring Champions League football next year. I can see arguments for both, and despite the mocking of Wenger’s qualifying record, as a Milan fan I know the pain of missing out on Champions League football after you’ve become accustomed to it.

However, the expected happiness from advancing to the finals vs. securing 4th place is mitigated by a pretty significant factor: the probability of winning the semi-final match with a full strength squad, which leads us to the following equation.

**Expected Utility**_{(Rotation)} = Pr(Advance to CL Finals_{(Rotation)}) × (Value of Advancing to CL Finals) - Pr(Miss CL Next Year_{(Rotation)}) × (Pain of Missing CL Next Year)

Manchester City’s expected utility (“good”) from rotating the squad is basically calculated by how much value they get from advancing to the Champions League finals^{1} multiplied by their probability of advancing to the finals. Then you subtract the probability that the rotated squad causes them to miss the CL next year multiplied by the pain of missing out. **In short: the biggest driver of value here is whether Manchester City fans think they can beat Madrid after a 0-0 draw in the home leg. If you don’t think this outcome is pretty likely,** then the first half of the equation approaches zero, meaning that the pain of missing next year hurts more than any potential pleasure gained from rotation. **In this case, it doesn’t make sense to rotate the squad.**

**However, if you assign a high probability to winning the semi-final at the Bernabeu** then the first half of the equation becomes higher, meaning that the potential pain of missing next year is less significant. **In this case, it makes perfect sense to rotate.**

But this isn’t the only factor. There’s a second equation at play here, which I present now:

**Expected Utility**_{(Full Strength)} = Pr(Advance to CL Finals_{(Full Strength)}) × (Value of Advancing to CL Finals) - Pr(Miss CL Next Year_{(Full Strength)}) × (Pain of Missing CL Next Year)

This represents the expected utility gained from playing a full strength squad. The equation is largely the same, but the values change because Manchester City would play a full strength squad on the weekend. Presumably their likelihood of winning mid-week decreases because of fatigue (and potential injuries), while their likelihood of securing Champions League football next year increases because they have a greater chance of getting what would have been a crucial three points against Southampton.

**If you’re a Manchester City supporter and believe that the odds of beating Madrid are low**, then your values likely don’t change for the first half of the equation while your values for the second half of the equation increase. **In this case, you want a full strength squad during the weekend**.

**If you’re a Manchester City supporter and you believe that a fresh squad will beat Madrid while a fatigued squad will lose**, then the first half of the full strength equation drops sharply relative to the rotation equation. This lowers the expected value of playing a full strength squad in a significant way, **meaning you want a rotated squad over the weekend.**

The final decision is calculated by which equation gives you a higher expected utility: which version makes you happier? **Ultimately the question depends on two major factors: how likely you think Manchester City is to beat Madrid on the road, and how much pain you’ll feel if they fail to qualify next year.** If you don’t have faith that they can pull off an upset mid-week, then you’ll oppose rotation and prioritize the league. If you believe there’s a chance, then you’ll support rotation and going all-in for this year’s Champions League.
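The comparison between the two equations can be sketched directly. All probabilities and utility values below are entirely hypothetical, chosen only to show how two fans with different beliefs about the Madrid game rationally reach opposite conclusions.

```python
# A hedged sketch of the two expected-utility equations above, with
# entirely made-up probabilities and utility values.
def expected_utility(p_advance, value_advance, p_miss_cl, pain_miss_cl):
    """EU = Pr(advance) * value - Pr(miss CL next year) * pain."""
    return p_advance * value_advance - p_miss_cl * pain_miss_cl

VALUE_ADVANCE, PAIN_MISS = 100, 80  # arbitrary utility units

# Pessimistic fan: rotation barely helps the CL odds but risks 4th place
pessimist_rotate = expected_utility(0.15, VALUE_ADVANCE, 0.40, PAIN_MISS)
pessimist_full   = expected_utility(0.10, VALUE_ADVANCE, 0.25, PAIN_MISS)

# Optimistic fan: a fresh squad has a real chance at the Bernabeu
optimist_rotate = expected_utility(0.50, VALUE_ADVANCE, 0.40, PAIN_MISS)
optimist_full   = expected_utility(0.25, VALUE_ADVANCE, 0.25, PAIN_MISS)

print(pessimist_rotate > pessimist_full)  # False: pessimist opposes rotation
print(optimist_rotate > optimist_full)    # True: optimist supports rotation
```

With the same utility values and only the beliefs about Pr(Advance) changed, the sign of the comparison flips, which is exactly why reasonable fans disagreed about Marcotti's tweet.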

**Part 2: Pellegrini’s Lame Duck Status**

Normally we can roughly argue a manager’s incentives are aligned with his team’s and the fans. However, Manchester City have done something strange this year, announcing Pep Guardiola will be the new manager of Manchester City regardless of what Manuel Pellegrini does this year. This introduces a new wrinkle, one that I think fully explains why he did what he did. I want to return to the expected utility equation from earlier, because the logic is the same while the values are different given Pellegrini’s unusual incentives here.

**Expected Utility**_{(Rotation)} = Pr(Advance to CL Finals_{(Rotation)}) × (Value of Advancing to CL Finals) - Pr(Miss CL Next Year_{(Rotation)}) × (Pain of Missing CL Next Year)

Because Pellegrini is a lame duck manager with zero stake in what happens to Manchester City next year, he experiences literally zero pain from Manchester City missing out on the Champions League next year. Pep Guardiola gets all the benefits if they qualify, and Pep gets all the pain if they don’t. The second half of each equation is therefore zero for Pellegrini, so it drops out of our calculations entirely. Subtracting the full strength equation from the rotation equation, we get the following:

**Expected Utility**_{(Pellegrini)} = Pr(Advance to CL Finals_{(Rotation)}) × (Value of Advancing to CL Finals) - Pr(Advance to CL Finals_{(Full Strength)}) × (Value of Advancing to CL Finals)

Because the value of advancing to the Champions League Finals is the same for Pellegrini in both cases, we can cancel that term out and we’re left with the following:

**Expected Utility**_{(Pellegrini)} = Pr(Advance to CL Finals_{(Rotation)}) - Pr(Advance to CL Finals_{(Full Strength)})

Even if the probability of advancing is virtually zero in both circumstances, and even if the gap between the two probabilities is tiny, Pellegrini strongly prefers^{2} rotating the squad over the weekend to maximize his probability of winning the Champions League, something that could presumably bolster his CV and improve the contract at his next job. **Manuel Pellegrini has literally no reason not to rotate the squad, even if he sees virtually no value in it.**

The Manchester City case described here is a relatively unusual one, which is why it’s interesting to me. The conflict between the manager’s incentives, the fans’ incentives, and the reasonably different incentives among fans makes this a difficult case to think about, one worth exploring further, and one that provides a lively discussion.