One of the strategic questions that has always interested me is: what is the best way to catch up after going behind in a soccer match? To my mind, there are two options:
- Take a lot of low-percentage shots, hoping that volume makes up for a lack of quality.
- Be patient and wait for high-quality chances, hoping that quality makes up for a lack of volume.
There are merits to both, and you could probably solve this mathematically using the expected number of shots and the expected quality per shot, conditional on any number of variables. My head spins thinking about how you'd actually set that equation up, but with enough familiarity with the teams and the right data, the math would be manageable. Solving it isn't my goal with this post. Instead, I want to look at what teams have actually done and use observable data to infer their strategies, and potentially how they've solved the problem for themselves.
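To make the trade-off concrete, here is a minimal sketch in Python. It assumes every shot is independent, which real matches certainly violate, and the shot probabilities are invented for illustration rather than taken from my data. The function names are my own.

```python
from math import prod

def expected_goals(shot_probs):
    """Expected number of goals: the sum of the per-shot scoring probabilities."""
    return sum(shot_probs)

def prob_at_least_one_goal(shot_probs):
    """Chance of scoring at least once, ASSUMING the shots are independent."""
    return 1.0 - prod(1.0 - p for p in shot_probs)

# Hypothetical numbers, not taken from the NWSL data.
volume = [0.05] * 10   # ten speculative shots
patience = [0.15] * 3  # three well-worked chances

for name, shots in (("volume", volume), ("patience", patience)):
    print(name, round(expected_goals(shots), 3), round(prob_at_least_one_goal(shots), 3))
```

Which approach comes out ahead depends entirely on the numbers you plug in, and the sketch says nothing about what teams actually choose, which is what I want to measure.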
To do this, I’ve undertaken two separate analyses. The first is simple enough: what is the likelihood that a shot goes in given the game state at the time of the shot? More simply put: does shot quality correlate with score?
To answer the question, I ran an analysis (full details in the appendix) looking at each shot in the NWSL this season and part of last season[1]. I calculated the probability that each shot becomes a goal, and compared those probabilities across game states: when the score is level, and when the shooter's team is one, two, or three goals ahead or behind.
If teams look to catch up by taking lower-probability shots when they are behind, we'd expect the average shot to have a lower expected goals (xG) value the further behind they are, and a higher xG value when they are ahead.
Conversely, if teams look to catch up by taking higher-probability shots when they are behind, we'd expect the average shot to have a higher expected goals (xG) value the further behind they are, and a lower xG value when they are ahead. I present the results of my analysis in the figure below.
Each point is a single shot. The y-axis shows its expected goals value and the x-axis shows the goal difference at the time the shot was taken. The red boxes show the average xG for shots taken at a given goal difference, along with the standard error around that average. If you compare the center lines of the boxes, you can see an upward trajectory from -3 to +3, meaning that teams take lower-quality shots when they are behind and higher-quality shots when they are ahead.
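For anyone who wants to reproduce the boxes, the summary behind them can be sketched in a few lines of Python. This is an illustration rather than the code I actually ran: the function name, the tuple layout, and the example shots are all made up.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def summarize_by_goal_difference(shots):
    """Return {goal_difference: (mean_xg, standard_error, n_shots)}.

    `shots` is an iterable of (goal_difference, xg) pairs, where goal_difference
    is from the shooting team's point of view at the moment of the shot.
    """
    groups = defaultdict(list)
    for goal_difference, xg in shots:
        groups[goal_difference].append(xg)

    summary = {}
    for goal_difference, values in sorted(groups.items()):
        n = len(values)
        # The standard error needs at least two shots; report None otherwise.
        se = stdev(values) / sqrt(n) if n > 1 else None
        summary[goal_difference] = (mean(values), se, n)
    return summary

# Made-up shots for illustration only.
example_shots = [(-1, 0.04), (-1, 0.08), (0, 0.10), (0, 0.12), (1, 0.15), (1, 0.21)]
for goal_difference, (avg, se, n) in summarize_by_goal_difference(example_shots).items():
    print(goal_difference, round(avg, 3), se, n)
```

The standard error here is the sample standard deviation divided by the square root of the number of shots, so game states with few shots get wide boxes.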
In other words, the shot data suggest that trailing teams take whatever shot is available, hoping that enough lower-quality attempts will get them back in the game, while leading teams can afford to be more selective. There are a number of potential explanations for this pattern.
Here are the results of my probit regression. The dependent variable is whether the shot resulted in a goal, and the independent variables are listed in the left column of the table below. The explanatory variable of interest is goal difference, and its coefficient is positive and statistically significant (p < 0.05). That indicates that goal difference is a significant predictor of the likelihood that a shot becomes a goal: when teams are leading, they take higher-quality shots.
| Variable | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| Distance from Goal | -0.0810 | 0.0120 | -6.74 | 0.0000 |
| Angle to Center of Goal | -0.7620 | 0.1889 | -4.03 | 0.0001 |
| Was the Shot Pressured | -0.2115 | 0.1306 | -1.62 | 0.1054 |
| Direct Free Kick | 0.5552 | 0.3278 | 1.69 | 0.0903 |
| Assisted from a Corner | -0.0103 | 0.2471 | -0.04 | 0.9668 |
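To read these coefficients as probabilities, recall that a probit model passes a linear combination of the variables through the standard normal CDF. The sketch below shows that mapping in Python using the estimates from the table. The intercept and the goal difference term do not appear in the table, so the intercept is a placeholder and goal difference is left out. The feature values and their units are hypothetical, so the printed probabilities are illustrative only.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Phi(z), the standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def probit_probability(intercept, coefficients, features):
    """Probability of a goal under a probit model: Phi(intercept + sum(b_k * x_k))."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return standard_normal_cdf(linear_predictor)

# Estimates copied from the table above (the dictionary keys are my own labels).
coefficients = {
    "distance_from_goal": -0.0810,
    "angle_to_center_of_goal": -0.7620,
    "pressured": -0.2115,
    "direct_free_kick": 0.5552,
    "assisted_from_corner": -0.0103,
}

# ASSUMPTION: the intercept is not reported, so this value is a placeholder.
HYPOTHETICAL_INTERCEPT = 0.0

# ASSUMPTION: hypothetical shot; the units of distance and angle are not stated in the post.
open_shot = {"distance_from_goal": 12.0, "angle_to_center_of_goal": 0.3, "pressured": 0}
pressured_shot = {**open_shot, "pressured": 1}

print(probit_probability(HYPOTHETICAL_INTERCEPT, coefficients, open_shot))
print(probit_probability(HYPOTHETICAL_INTERCEPT, coefficients, pressured_shot))
```

Because the pressure coefficient is negative, the pressured version of the same shot always comes out with the lower probability, whatever intercept is used.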
@deepXG pointed out that the causal arrow might run in the opposite direction: teams that take lower-xG shots might be more likely to fall behind. So I also wanted to look at how shot quality changes within games. I subdivided the data by final score: winning or losing by 3, 2, or 1 goals, and ties (a margin of 0). Most of these final scores didn't have enough shots across a variety of game states (games that finish in a tie tend to spend most of the match tied, so there's not much variation in game state to analyze), but I was able to find a pattern among the most extreme results (+/- 3 goals).
For both outcomes, we see the same pattern as in the main analysis (although with more uncertainty because of a relative paucity of data). Expected goals values decrease as teams fall behind and increase as they take the lead. This provides a second level of evidence and a robustness check on the original findings. Figures are presented below.
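The within-game split amounts to filtering shots by the final margin before averaging by game state. Here is a rough Python sketch of that step. The function name and tuple layout are hypothetical, and the example shots are invented.

```python
from collections import defaultdict
from statistics import mean

def mean_xg_by_state_for_final_margin(shots, final_margin):
    """Average xG by in-game goal difference, restricted to one final margin.

    `shots` is an iterable of (final_goal_difference, goal_difference_at_shot, xg)
    tuples, all from the shooting team's point of view.
    """
    groups = defaultdict(list)
    for final_goal_difference, goal_difference_at_shot, xg in shots:
        if final_goal_difference == final_margin:
            groups[goal_difference_at_shot].append(xg)
    return {state: mean(values) for state, values in sorted(groups.items())}

# Invented shots: three from a team that won by 3, one from a team that lost by 3.
example_shots = [(3, 0, 0.10), (3, 1, 0.20), (3, 1, 0.40), (-3, 0, 0.05)]
print(mean_xg_by_state_for_final_margin(example_shots, 3))
print(mean_xg_by_state_for_final_margin(example_shots, -3))
```

Holding the final margin fixed means every comparison is among teams that ended up in the same place, which is what lets the within-game trend speak to the direction of causation.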
[1] I’m collecting these xG values by hand, coding each shot individually. So far I have weeks 16-20 of the 2015 NWSL season as well as the first 3 weeks of the 2016 season.