Given the paucity of data and analysis for women’s soccer, I thought it would be a worthwhile summer project to build an Expected Goals (xG) model for the NWSL. If you’re unfamiliar with Expected Goals, I’ve written a few posts about the math behind the model that are probably worth reading: “A Very Preliminary NWSL Expected Goals Model: xG 101” and “Expected Goals 201: xG For Soccer Analytics Majors.” The basic idea is to take characteristics of a shot, such as distance from goal, angle to the center of the goal, whether it was kicked or headed, and whether it came from a counterattack, and calculate the probability that the shot turns into a goal. Each shot is rated on a scale from 0 to 1, where the number is the probability of that shot scoring.
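To make the idea concrete, here is a minimal sketch of how a shot’s characteristics can be turned into a 0-to-1 probability. It assumes a logistic-regression-style model, a common choice for xG but not necessarily the one behind this NWSL model. The `Shot` fields and every coefficient are hypothetical values picked for illustration, not fitted to any data.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance: float  # yards from the center of the goal
    angle: float     # radians away from the line through the center of the goal
    header: bool     # True if headed, False if kicked
    counter: bool    # True if the shot came from a counterattack


# Hypothetical coefficients for illustration only; these are NOT the fitted
# values from the NWSL model described in this post.
INTERCEPT = 0.5
B_DISTANCE = -0.12  # farther from goal -> lower probability
B_ANGLE = -0.8      # wider angle from the center of goal -> lower probability
B_HEADER = -0.9     # headers convert less often than kicks
B_COUNTER = 0.4     # counterattacks tend to produce better chances


def expected_goal(shot: Shot) -> float:
    """Return the probability (0 to 1) that a shot scores, via a logistic link."""
    z = (INTERCEPT
         + B_DISTANCE * shot.distance
         + B_ANGLE * shot.angle
         + B_HEADER * shot.header    # bools act as 0/1 in arithmetic
         + B_COUNTER * shot.counter)
    return 1.0 / (1.0 + math.exp(-z))


close_range = Shot(distance=6, angle=0.0, header=False, counter=False)
long_range = Shot(distance=25, angle=0.5, header=False, counter=False)
print(round(expected_goal(close_range), 3), round(expected_goal(long_range), 3))
```

In a real model the coefficients would be estimated from historical shots and outcomes; the logistic link is what keeps every rating between 0 and 1.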
I’ve been tweeting some of the things I’ve found, but they’ve been scattered across a number of tweets and days, so I wanted to combine them all into one post and talk about some of the plots in a little more detail than is allowed in 140 characters (116 after the image).
This plot shows the relative xG scores for each game over the weekend. Most of the games were fairly even in terms of shot quality, except for Houston v. Orlando, which was one-sided in both the actual score and shot quality. I don’t have xG maps, but I think this is a clean, clear presentation of what happened in each game from the weekend.
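The per-game comparison comes down to summing the xG of every shot each team took in a match. This is a minimal sketch, assuming each shot is stored as a hypothetical `(game_id, team, xg)` tuple. The `xg_share` helper, which gives each team’s fraction of the game’s total xG, is one possible reading of “relative” xG and may not match exactly what the plot shows. The sample values are made up.

```python
from collections import defaultdict


def game_xg_totals(shots):
    """Sum shot xG per team within each game.

    `shots` is an iterable of (game_id, team, xg) tuples (a hypothetical layout).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for game_id, team, xg in shots:
        totals[game_id][team] += xg
    return {game: dict(teams) for game, teams in totals.items()}


def xg_share(team_totals):
    """Each team's fraction of the combined xG in one game."""
    combined = sum(team_totals.values())
    if combined == 0:
        return {team: 0.0 for team in team_totals}
    return {team: xg / combined for team, xg in team_totals.items()}


# Made-up shots for a single hypothetical game between teams A and B.
shots = [
    ("game1", "A", 0.30),
    ("game1", "A", 0.12),
    ("game1", "B", 0.05),
]
totals = game_xg_totals(shots)
print(totals["game1"], xg_share(totals["game1"]))
```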
The next plot shows cumulative player xG/Shot Quality scores over the first two weeks. Two things stand out to me here. The first is Jessica McDonald’s massive lead over everyone on her team (and in the league, as we’ll see in a minute). The second is the balance among the Portland Thorns, with a small number of players taking seemingly high-quality shots. Compare this to FC Kansas City, where a larger number of players are taking relatively low-quality shots.
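The volume-versus-quality contrast between Portland and Kansas City can be put into numbers by splitting each player’s cumulative xG into a shot count and an xG-per-shot figure. This is a sketch, assuming one team’s shots arrive as hypothetical `(player, xg)` pairs. The two sample squads are invented to mimic the two patterns and are not real data.

```python
from collections import defaultdict


def player_summary(shots):
    """Cumulative xG, shot count, and xG per shot for each player.

    `shots` is an iterable of (player, xg) pairs (a hypothetical layout).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for player, xg in shots:
        total[player] += xg
        count[player] += 1
    return {
        player: {"xg": total[player], "shots": count[player],
                 "xg_per_shot": total[player] / count[player]}
        for player in total
    }


def team_profile(shots):
    """Number of distinct shooters and the team's average xG per shot.

    Assumes the team has taken at least one shot.
    """
    summary = player_summary(shots)
    n_shots = sum(s["shots"] for s in summary.values())
    team_xg = sum(s["xg"] for s in summary.values())
    return {"shooters": len(summary), "xg_per_shot": team_xg / n_shots}


# Invented squads: a few shooters with good chances vs. many with poor ones.
few_good = [("P1", 0.40), ("P1", 0.35), ("P2", 0.30)]
many_poor = [("Q1", 0.05), ("Q2", 0.08), ("Q3", 0.04), ("Q4", 0.06), ("Q5", 0.07)]
print(team_profile(few_good), team_profile(many_poor))
```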
The third graph shows the top 20 players in the cumulative shot quality rankings. I tried color-coding by team, but with so many teams using red or blue, it didn’t come out as well as I’d like. The good news is that Orlando (purple) and Houston (orange) stand out. USWNT players are doing well here: Alex Morgan is in second place (though far behind Jessica McDonald), and Lindsey Horan, Carli Lloyd, and Christen Press are all in the top few spots. McDonald is leading the pack by a long way, though unfortunately for the WNY Flash she has yet to score.
This graph is similar to the previous one, but instead of only the top 20 it includes everyone who has taken a shot this season, so you can see where your favorite player ranks.
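Both rankings come from the same sort: order players by cumulative xG, then keep either the top 20 or everyone. This is a minimal sketch, assuming the cumulative totals are already in a dictionary keyed by player name (a hypothetical structure). The players and numbers are invented.

```python
def rank_players(cumulative_xg, top_n=None):
    """Sort players by cumulative xG, highest first.

    `cumulative_xg` maps player name -> total xG. With top_n=None every
    player is returned (the full-league view); otherwise only the top N.
    """
    ranked = sorted(cumulative_xg.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Invented totals for illustration only.
league = {"Player A": 1.9, "Player B": 0.8, "Player C": 1.2, "Player D": 0.3}
for rank, (player, xg) in enumerate(rank_players(league, top_n=3), start=1):
    print(rank, player, xg)
```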
The last two plots serve as a diagnostic of my measure: how well does my xG score predict actual goals? In the first one, the dotted line represents a 1:1 relationship between expected goals and actual goals scored (excluding penalties and own goals), which would be a “perfect” correspondence between my measure and the “real world.” I’ve got five teams basically on the line, including Chicago, Orlando, Washington, and Kansas City, which I’m really happy with. Two other teams (Portland and Seattle) are close, and there are three outliers (WNY, Boston, and Houston). Despite the small sample size so far this season, I think the model is performing well.
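One simple way to put numbers on “on the line,” “close,” and “outlier” is to measure each team’s gap from the 1:1 line: actual goals minus xG. This sketch assumes per-team xG and non-penalty, non-own-goal totals have already been computed. The `close` and `outlier` cutoffs are arbitrary illustrative values, not the criteria used for the plot, and the teams and numbers are made up.

```python
def goals_vs_xg(team_xg, team_goals, close=0.5, outlier=1.5):
    """Compare each team's xG with its actual non-penalty, non-own goals.

    Returns (team, xg, goals, gap, label) rows sorted by distance from the
    1:1 line. gap = goals - xg, so a positive gap means the team has scored
    more than expected. The `close` and `outlier` thresholds (in goals) are
    hypothetical cutoffs chosen for illustration.
    """
    rows = []
    for team, xg in team_xg.items():
        goals = team_goals.get(team, 0)
        gap = goals - xg
        if abs(gap) < close:
            label = "on the line"
        elif abs(gap) < outlier:
            label = "close"
        else:
            label = "outlier"
        rows.append((team, xg, goals, gap, label))
    return sorted(rows, key=lambda row: abs(row[3]))


# Made-up totals for three hypothetical teams.
rows = goals_vs_xg({"A": 2.0, "B": 3.0, "C": 1.0}, {"A": 2, "B": 1, "C": 2})
for row in rows:
    print(row)
```

The sign of the gap carries the interpretation used in the next paragraph: teams with large negative gaps are underachieving their chances, and teams with large positive gaps are overachieving.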
Beyond the diagnostics, if we assume that teams will eventually converge toward the 1:1 ratio, we’d expect WNY (mostly Jessica McDonald) and Boston to start scoring more goals, while the overachieving Houston Dash might be in for a dry spell.
The final plot is the other side of the last one: expected goals allowed for each team vs. actual goals allowed. Fewer teams fit perfectly this time (Sky Blue and Portland), but there aren’t any extreme outliers either, with Kansas City being the furthest off the line.
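Expected goals allowed uses the same shot ratings, just grouped by the team defending the shot instead of the team taking it. This is a minimal sketch, assuming each shot is a hypothetical `(shooting_team, defending_team, xg)` tuple. The sample values are invented.

```python
from collections import defaultdict


def xg_allowed(shots):
    """Total xG conceded by each team.

    Each shot is (shooting_team, defending_team, xg), a hypothetical layout.
    xG allowed is xG for, regrouped by the defending team.
    """
    allowed = defaultdict(float)
    for _shooter, defender, xg in shots:
        allowed[defender] += xg
    return dict(allowed)


# Invented shots: teams A and C shoot at B, and B shoots at A.
allowed = xg_allowed([("A", "B", 0.3), ("C", "B", 0.2), ("B", "A", 0.1)])
print(allowed)
```

Comparing these totals with the goals each team has actually conceded gives the defensive diagnostic: a team that has conceded fewer goals than its xG allowed is overachieving on this measure.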
Kansas City is probably due to start conceding more goals. Nicole Barnhart has been strong in goal for them so far, so she may be the reason Kansas City is overachieving on this measure. Conversely, despite a strong start, the expansion Orlando Pride have actually conceded one more goal than my model expects, so they may be due for some better luck going forward.
This is all I could think of as far as presenting xG/Shot Quality data for the NWSL. There’s a lot of data here, and I tried slicing it as many ways as possible to get as much information as I could out of a single dataset.