NWSL: Catching Up By Taking Worse Shots?

One of the strategic questions that has always interested me is: what is the best way to catch up after going behind in a soccer match? To my mind, there are two options:

  1. Take a lot of low percentage shots, hoping that volume makes up for a lack of quality.
  2. Be patient and wait for the high quality chances, hoping that quality makes up for a lack of volume.

There are merits to both, and you could probably solve this mathematically based on the expected number of shots and the expected quality per shot, given any number of variables. My head is spinning thinking about how you’d actually solve this equation, but given enough familiarity with teams and the right data the math would be easy enough. Solving this equation isn’t my goal with this post. Instead, I want to see what teams have done and use observable data to see what their strategies are, and potentially how they’ve solved the problem for themselves.
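As a rough illustration of the trade-off (not a solution to it), here is a minimal sketch that compares the two options under a strong simplifying assumption: every shot is independent and has the same chance of scoring. The function name and the shot counts and xG values are hypothetical and are not taken from the NWSL data.

```python
def prob_at_least_one_goal(xg_per_shot: float, shots: int) -> float:
    """Chance of scoring at least once, assuming independent shots of equal quality."""
    return 1.0 - (1.0 - xg_per_shot) ** shots


# Hypothetical numbers chosen so both options add up to the same total xG (0.4).
volume = prob_at_least_one_goal(xg_per_shot=0.05, shots=8)   # many low-percentage shots
quality = prob_at_least_one_goal(xg_per_shot=0.20, shots=2)  # few high-quality chances

print(f"8 shots at 0.05 xG each: {volume:.3f}")
print(f"2 shots at 0.20 xG each: {quality:.3f}")
```

In a real match the number of shots a team can generate depends on the quality it holds out for, which is exactly the part this sketch leaves out.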

To do this, I’ve undertaken two separate analyses. The first is simple enough: what is the likelihood that a shot goes in given the game state at the time of the shot? More simply put: does shot quality correlate with score?

To answer the question, I ran an analysis (full details in the appendix) looking at each shot in the NWSL this season and part of last season [1]. I calculated the probability that each shot becomes a goal, and compared those probabilities when the score is even and when the shooter’s team is one, two, or three goals ahead or behind.

If teams look to catch up by taking lower probability shots when they are behind, we’d expect to see the average shot have a lower expected goal (xG) value the further behind they are, while when they are ahead the average shot would have a higher xG value.

Conversely, if teams look to catch up by taking higher probability shots when they are behind, we’d expect to see the average shot have a higher expected goal (xG) value the further behind they are, while when they are ahead the average shot would have a lower xG value. I present the results of my analysis in the figure below.

NWSL Score by Game State

Each point represents a shot taken, the y-axis represents the Expected Goal value, and the x-axis represents the goal difference at the time the shot was taken. The red boxes represent the average xG value for the shots taken at a given goal difference and the standard error around that average. If you compare the center lines in each box, you can see an upward trajectory from -3 to +3, meaning that teams take lower-quality shots when they are behind and focus on higher-quality shots when they are ahead.
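The per-game-state averages and standard errors described above can be sketched as follows. This is an illustration only: the `shots` list is made-up data, the function name is hypothetical, and the post does not say how the plotted values were actually computed.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical (goal_difference, xG) pairs; the real shot data is not reproduced here.
shots = [
    (-1, 0.04), (-1, 0.08), (-1, 0.06),
    (0, 0.10), (0, 0.07), (0, 0.13),
    (1, 0.15), (1, 0.09), (1, 0.12),
]


def xg_by_game_state(shots):
    """Return {goal_difference: (mean xG, standard error of the mean)}."""
    groups = defaultdict(list)
    for goal_diff, xg in shots:
        groups[goal_diff].append(xg)

    summary = {}
    for goal_diff, values in sorted(groups.items()):
        # The standard error needs at least two shots; report None otherwise.
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else None
        summary[goal_diff] = (mean(values), se)
    return summary


summary = xg_by_game_state(shots)
for goal_diff, (avg, se) in summary.items():
    print(goal_diff, round(avg, 3), None if se is None else round(se, 3))
```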

My analysis of shot data suggests that teams take whatever shot is available when they are behind, hoping that enough lower-quality shots will get them back in the game, and that they can afford to be more selective when they are ahead. There are a number of potential explanations for this pattern.

Appendix

Here are the results of my probit regression: my dependent variable was “did the shot result in a goal scored?” and my independent variables are in the left column of the table below. The explanatory variable of interest here is “goal difference,” and its coefficient is positive and statistically significant (p < 0.05). That indicates goal difference is a significant predictor of the likelihood that a shot scores, and that when teams are leading they take higher-quality shots.

Variable                   Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                  0.1865      0.2929     0.64    0.5243
Goal Difference              0.1076      0.0512     2.10    0.0357
Distance from Goal          -0.0810      0.0120    -6.74    0.0000
Angle to Center of Goal     -0.7620      0.1889    -4.03    0.0001
Time                        -0.0007      0.0023    -0.32    0.7498
Was the Shot Pressured      -0.2115      0.1306    -1.62    0.1054
Kicked                       0.1807      0.1891     0.96    0.3394
Counter Attack               0.4137      0.1329     3.11    0.0019
Home Team                   -0.0751      0.1197    -0.63    0.5305
Goalkeeper Error             1.9129      0.4599     4.16    0.0000
Direct Free Kick             0.5552      0.3278     1.69    0.0903
Assisted from a Corner      -0.0103      0.2471    -0.04    0.9668
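To show how probit estimates like these turn into a per-shot probability, here is a minimal sketch that plugs the coefficients from the table into the standard normal CDF. The dictionary keys and the example shot are hypothetical. The post does not state the units for distance, angle, or time, so the input values below are placeholders, and the yes/no variables are assumed to be coded 0/1.

```python
from math import erf, sqrt

# Estimates copied from the regression table; the key names are made up for this sketch.
INTERCEPT = 0.1865
COEFFICIENTS = {
    "goal_difference": 0.1076,
    "distance": -0.0810,
    "angle": -0.7620,
    "time": -0.0007,
    "pressured": -0.2115,
    "kicked": 0.1807,
    "counter_attack": 0.4137,
    "home_team": -0.0751,
    "goalkeeper_error": 1.9129,
    "direct_free_kick": 0.5552,
    "assisted_from_corner": -0.0103,
}


def standard_normal_cdf(z: float) -> float:
    """Phi(z), the link function a probit model uses."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def shot_xg(features: dict) -> float:
    """Probit prediction: Phi(intercept + sum of coefficient * value).

    Any variable missing from `features` is treated as 0.
    """
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return standard_normal_cdf(z)


# Placeholder shot: units for distance, angle, and time are assumptions.
base_shot = {"distance": 12, "angle": 0.3, "time": 60, "kicked": 1}
behind = shot_xg({**base_shot, "goal_difference": -1})
ahead = shot_xg({**base_shot, "goal_difference": 1})
print(f"one goal behind: {behind:.3f}, one goal ahead: {ahead:.3f}")
```

Because the goal difference coefficient is positive, the same shot gets a higher predicted probability when the shooting team is ahead than when it is behind.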

@deepXG mentioned that the causal arrow might be going in the other direction: teams taking lower xG shots might be more likely to fall behind. To check this, I also ran an analysis within games to see whether shot quality changes as the score changes. I subdivided the data by the final score: winning/losing by 3, 2, and 1 goal, and ties (winning/losing by 0). Most of these final scores didn’t have enough shots across a variety of game states (games that finish in a tie tend to spend most of the game tied, meaning there’s not much variation in game state to analyze), but I was able to find a pattern among the most extreme results (+/- 3 goals).

For both outcomes, we see the same pattern as in the main analysis (although with more uncertainty because of a relative paucity of data). Expected goal values decrease as teams fall behind/increase as they take the lead. This provides a second level of evidence and a robustness check on the original findings. Figures are presented below.

[Figures: xG by game state for teams that won by three goals; xG by game state for teams that lost by three goals]

  1. I’m collecting these xG values by hand, coding each shot individually. As of now I have weeks 16-20 of the 2015 NWSL season as well as the first 3 weeks of the 2016 season.

NWSL Expected Goals: Six Graphs From Weeks 1 and 2

Given the paucity of data and analysis for women’s soccer, I thought it would be a worthwhile summer project to build an Expected Goals (xG) model for the NWSL. If you’re unfamiliar with Expected Goals, I’ve written a few posts about the math behind the model that are probably worth reading: A Very Preliminary NWSL Expected Goals Model: xG 101 and Expected Goals 201: xG For Soccer Analytics Majors. The basic idea is taking characteristics of shots like distance from goal, angle to the center of goal, whether it was kicked or headed, whether it came from a counter attack, etc., and calculating the probability that a given shot turns into a goal. Shots are rated on a scale from 0-1, with the number being the probability of a shot scoring.

I’ve been tweeting some of the things I’ve found, but they’ve been scattered across a number of tweets and days, so I wanted to combine them all into one post and talk about some of the plots in a little more detail than is allowed in 140 characters (116 after the image).

Week 2 Game xG Plot

This plot shows the relative xG scores for each game over the weekend. Most of the games were fairly close in terms of shot quality, except for Houston v. Orlando, which was fairly one-sided (both in actual score and shot quality). I don’t have xG maps, but I think this is a clean, clear presentation of what happened in each of the games from the weekend.

The next plot shows cumulative player xG/Shot Quality scores over the first two weeks. Two things stand out here to me. The first is Jessica McDonald’s massive lead on everyone on her team (and in the league, which we’ll see in a minute). The second is the balance among the Portland Thorns: a few players taking seemingly high-quality shots. Compare this to FC Kansas City, where a larger number of players are taking relatively low-quality shots.

Week 2 NWSL Individual 20

The third graph shows the top 20 players in the cumulative shot quality rankings. I tried color-coding by team, but with so many teams relying on red or blue it didn’t come out as well as I’d like. The good news is that Orlando (purple) and Houston (orange) stand out. USWNT players are doing well here: Alex Morgan is in second place (far behind), with Jessica McDonald, Lindsey Horan, Carli Lloyd, and Christen Press all in the top few spots. Jessica McDonald is leading the pack by a long way, though unfortunately for the WNY Flash she has zero goals so far.

Week 2 NWSL Individual All

This graph is similar to the previous one but instead of the top 20 it includes everyone who has taken a shot this season so you can see where your favorite player ranks.

NWSL Week 2 xg v Actual

The last two plots serve as diagnostics for my measure: how well does my xG score predict actual goals? In this first one, the dotted line represents a 1:1 relationship between expected goals and actual (non-penalty, non-own) goals scored, which would be a “perfect” correspondence between my measure and the “real world.” I’ve got five teams (Chicago, Orlando, Washington, and Kansas City) basically on the line, two other teams (Portland and Seattle) close, and three outliers (WNY, Boston, and Houston). I’m really happy with this so far, and despite the small sample size this season I think the model is performing well.
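The comparison behind this kind of diagnostic can be sketched as follows: sum each team’s shot xG and set it against the goals it actually scored. The shot records, team labels, and function name are hypothetical, not the hand-coded NWSL data, and the sketch assumes penalties and own goals have already been excluded.

```python
from collections import defaultdict

# Hypothetical shot records: (team, xG, scored). Not the actual NWSL data.
shots = [
    ("Team A", 0.30, True),
    ("Team A", 0.10, False),
    ("Team A", 0.60, True),
    ("Team B", 0.45, False),
    ("Team B", 0.25, False),
    ("Team B", 0.40, True),
]


def xg_versus_goals(shots):
    """Return {team: (total xG, goals scored, goals minus xG)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for team, xg, scored in shots:
        totals[team][0] += xg
        totals[team][1] += int(scored)
    return {team: (xg, goals, goals - xg) for team, (xg, goals) in totals.items()}


comparison = xg_versus_goals(shots)
for team, (xg, goals, diff) in comparison.items():
    # diff > 0: scoring more than expected; diff < 0: scoring less than expected.
    print(f"{team}: xG={xg:.2f} goals={goals} diff={diff:+.2f}")
```

A team sitting on the 1:1 line would have a difference near zero.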

Beyond the diagnostics, if we assume that teams will eventually converge toward the 1:1 ratio, we’d expect WNY (mostly Jessica McDonald) and Boston to start scoring more goals soon, while the overachieving Houston Dash might be in for a dry spell.

NWSL Week 2 xGA v Actual

The final plot is the other side of the last one: expected goals allowed for each team vs. actual goals allowed. There are fewer teams that fit perfectly (Sky Blue and Portland), but there aren’t any extreme outliers this time (with Kansas City being the furthest off the line).

Kansas City is probably due to start conceding more goals soon. Nicole Barnhart has been strong in goal for them so far, so maybe she’s the reason Kansas City is over-achieving on this measure. Similarly, despite a strong start by the expansion Orlando Pride, they’ve actually conceded a goal more than my model expects they should have, so they may be due for some luck going forward.

This is all I could think of as far as presenting xG/Shot quality data for the NWSL. There’s a lot of data here, and I tried cutting it as many ways as I could to present as much info as I could from a single dataset.