
Thinking about Individual “Finishing Skill”

One of the big open questions in expected goals research is how to account for individual finishing skill: a shot taken by Lionel Messi is worth more than a shot taken by Jesus Navas, but how do we account for that? There are any number of open issues here, mostly methodological, but I’ve recently started thinking about a more theoretical one that should be addressed before worrying about the statistics underlying the concept: what exactly do we mean by “finishing skill”?

As far as I’ve read, finishing skill is typically thought of as the difference between the goals a player actually scores and his expected goals, sometimes expressed as a ratio of the two. Setting aside variance and imprecision in measurement for a moment, a player who scores more goals than expected is a good finisher, while a player who scores fewer goals than expected is a bad one. Lionel Messi outperforms his expected goals, so we can say that he’s clinical in front of goal, while Jesus Navas underperforms, so we can say that he’s… well, as a Manchester City supporter, I don’t want to talk about it. But is this all there is to be said?
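As a minimal sketch of that standard definition, the Python below totals each player’s goals and expected goals from a list of shots and reports both the difference (goals minus xG) and the ratio (goals divided by xG). The `Shot` class, the `finishing_summary` helper, the player names, and every xG value are hypothetical; in practice the per-shot xG would come from whatever model you trust.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    player: str
    xg: float     # probability of scoring, as estimated by some xG model (hypothetical values here)
    scored: bool


def finishing_summary(shots):
    """Return {player: (goals, xg, goals_minus_xg, goals_per_xg)} for each shooter."""
    totals = {}
    for shot in shots:
        goals, xg = totals.get(shot.player, (0, 0.0))
        totals[shot.player] = (goals + int(shot.scored), xg + shot.xg)

    summary = {}
    for player, (goals, xg) in totals.items():
        ratio = goals / xg if xg > 0 else float("nan")  # ratio is undefined with zero xG
        summary[player] = (goals, round(xg, 2), round(goals - xg, 2), round(ratio, 2))
    return summary


# Hypothetical shots for two made-up players.
shots = [
    Shot("Player A", 0.5, True),
    Shot("Player A", 0.1, True),
    Shot("Player A", 0.3, False),
    Shot("Player B", 0.5, False),
    Shot("Player B", 0.4, True),
    Shot("Player B", 0.2, False),
]

for player, row in finishing_summary(shots).items():
    print(player, row)
```

Here Player A scores 2 goals on 0.9 xG (+1.1) and Player B scores 1 on 1.1 xG (-0.1), so by this metric A is the better finisher. With three shots each, though, that gap is mostly the variance set aside above.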

One of the great contributions of expected goals is the idea that not all shots are created equal: a shot taken from out wide and from outside the penalty area is less likely to go in than one taken six yards out in front of the center of the goal. But why shouldn’t we apply this idea to finishing as well? I’m coming around to the view that there are as many types of finishing skill as there are types of shots, so the next step is to identify the important ones and work out which players are which types of finishers. To do this, I draw on some of the fundamental contributions of expected goals research along with the eyeball test from watching and playing however many thousands of hours of soccer.1

The Clinical Finisher

This archetype comes from the idea of players who are clinical in front of goal. They’re calm, collected, and don’t miss easy opportunities. In xG terms, they score on high-probability shots even more often than one would expect. A 0.5 xG one-on-one chance against the goalkeeper is more likely to go in when Lionel Messi takes it than when Fernando Torres took one at Chelsea, so theoretically that shot is worth 0.7 off Messi’s boot and 0.3 off Torres’s. This likely correlates with things like confidence, composure, and close ball control.

The Long Range Sniper

Shots taken from outside the box and from an angle have a low xG value, yet some players keep taking them. Presumably some players are better at these shots than others, and shooting from distance is certainly a skill that can be developed. We could also expect some players not to realize that they lack this skill and to keep shooting from distance anyway, so we’d see significant variation here. I’m thinking of Zlatan’s famous bicycle kick for Sweden: for a mortal man that shot would be 0.00001 xG, but Zlatan might make it as often as one time in ten (0.1 xG).

The Free Kick Specialist

Direct free kicks are a skill some players have and others don’t. Players like Cristiano Ronaldo, Yaya Toure, or Andrea Pirlo probably deserve a decent bump in xG on direct free kicks, while other players are probably below average. One difficulty here is selection: usually only good free kick takers get to take them at all, so we rarely see the bad ones try. Still, it’s a skill we could measure.

The Head of the Class

We know headed shots are lower value than kicked ones, but obviously this isn’t equal across the board. Andrea Pirlo has never been known as a great header of the ball, so maybe a header by him that would normally have an xG value of 0.4 would have a true value of 0.3, while the same chance would be worth 0.5 to someone like Zlatan or even Gerard Pique, who have more talent in this area.

There are likely more types (players who are better on counter attacks, players who are better on corners, etc.), but I wanted to present a few basic archetypes because it’s worth thinking about not just finishing skill, but types of finishers. If you’re trying to build a team, you wouldn’t just want the best finishers by the pure actual goals vs. xG metric; you’d want complementary players. Maybe you’d want a team filled with speedy, clinical players who could finish chances on the counter. Or maybe you want to play down the flanks and cross the ball into the box 30+ times a game, so you’d want some forwards who are strong headers of the ball. Maybe every team needs at least one free kick specialist, or maybe you’d want a balance. But regardless of the strategy, using xG to define different types of finishers would be a useful addition to the toolkit.
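One way to sketch the “types of finishers” idea is to compute the same goals-minus-xG number separately for each shot type rather than pooling every shot together. The Python below is a minimal illustration, assuming each shot already carries a shot-type label; the labels, players, xG values, and the `min_shots` cutoff are all hypothetical, and there is no adjustment for small samples or for the free kick selection problem mentioned above.

```python
from collections import defaultdict


def finishing_by_type(shots, min_shots=1):
    """Aggregate goals minus xG separately for each (player, shot type) pair.

    shots: iterable of (player, shot_type, xg, scored) tuples.
    Returns {(player, shot_type): (n_shots, goals, xg, goals_minus_xg)},
    keeping only pairs with at least `min_shots` attempts.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # [shots, goals, summed xG]
    for player, shot_type, xg, scored in shots:
        row = totals[(player, shot_type)]
        row[0] += 1
        row[1] += int(scored)
        row[2] += xg

    return {
        key: (n, goals, round(xg, 2), round(goals - xg, 2))
        for key, (n, goals, xg) in totals.items()
        if n >= min_shots
    }


# Hypothetical shot log, labeled with two of the archetypes above.
shots = [
    ("Player A", "header", 0.4, True),
    ("Player A", "header", 0.3, False),
    ("Player A", "free_kick", 0.06, False),
    ("Player B", "header", 0.4, False),
    ("Player B", "free_kick", 0.06, True),
    ("Player B", "free_kick", 0.06, False),
]

for (player, shot_type), row in sorted(finishing_by_type(shots, min_shots=2).items()):
    print(player, shot_type, row)
```

With `min_shots=2`, only Player A’s headers (1 goal on 0.7 xG, +0.3) and Player B’s free kicks (1 goal on 0.12 xG, +0.88) survive, hinting at a header specialist and a free kick specialist. Raising `min_shots` trades coverage for less noisy estimates.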

  1. All players and numbers I use here are hypothetical. The point isn’t to identify specific players or specific values, just to present illustrations of what I’m thinking.

MOTSON’s 2015-2016 Hits and Misses

I may be a bit premature here, but it seems to me the major questions of the Premier League season are pretty much decided. Leicester City seems uncatchable at the top. Arsenal and Spurs will finish 2nd and 3rd (or 3rd and 2nd), while Man City looks pretty solid for 4th. Two of the three relegation spots are basically sealed, but there’s still the matter of whether Norwich City or Sunderland stays up. Because my interest in the season has waned significantly, I thought I’d do an early “year-in-review” where I assess MOTSON’s biggest hits and biggest misses. I’ll start with the obvious.

Chelsea

Yeah, I don’t know what to say about this. I had Chelsea in second place at the beginning of the season, and they look to be stuck in literally the middle of the table. Almost everyone else was in the same boat as MOTSON, and I honestly don’t know if this could have been foreseen. *Maybe* if you added a “Mourinho third year implosion” variable to the model, but even then, would you have guessed 10th place? Nevertheless, it’s a pretty big miss and was the source of most of the error in my model.

Leicester City

I’m going to call this both a hit and a miss, but to be honest more of a miss than a hit. I’ve been particularly proud of MOTSON predicting Leicester City higher than anyone else – 8th place on 60 points. Not bad, and if they finished 3rd or lower I was willing to call this a huge success for the model. On the other hand, if they win the title this year, it’s hard to say “I had the Champs in 8th place – I win!” I’m proud that my model recognized them as good long before anyone else did, and if you look at a lot of analytics prognostications for next year, they’re saying “Leicester’s probably 7th or 8th place,” so MOTSON is 9 months ahead of the curve there. But it’s a small victory, assuming they win the league with 15-20 points more than I predicted.

That being said, MOTSON was ahead of the curve in predicting them as Champions League qualifiers, picking them to qualify as of December 5. I know this because on December 4th I wrote that they should obviously sell Jamie Vardy because they had no expectation of Champions League football, and on December 6th I changed my mind.

Game Theory: Leicester Has To Sell Jamie Vardy in January

Game Theory: Top 4 Contenders Leicester City Should Absolutely Keep Jamie Vardy

Nicolas Otamendi

MOTSON *hated* this signing by Man City back in August, and it turned out to be right. He has been a disaster in City’s backline and is one of the reasons City are fighting for 4th instead of comfortably coasting into the Champions League.

Transfer Rumors 0817

West Ham

So MOTSON didn’t get West Ham’s success right pre-season, but it did pick up on their top 6 challenge *very* early in the season (October 24). Mike Goodman and I had a conversation about this, and I argued that West Ham banking those 8 points over expectation would be enough to get them a top 6 spot. As of today they’re 10 points over expectation, so they’ve basically broken even since then and look set to finish in the top 6.

Leicester City Redux

MOTSON really liked Jamie Vardy to have a big year this year, something I didn’t notice until it had already happened because he wasn’t on my radar.

We Should Have Seen It Coming: Evaluating Jamie Vardy Against the EPL’s Elite Strikers

It also really liked Riyad Mahrez, pegging the drop-off from him to a replacement at something like 10 points.

On the other hand, although I never posted anything about it, MOTSON didn’t really like the N’Golo Kante signing, which I’d classify as a pretty big miss. He’s been phenomenal for them, and MOTSON would have told them to pass.

Barcelona Will Be Fine Without Messi

Lionel Messi got injured early in the season and was out for a month, and “real football men” wrote all sorts of thinkpieces about how Barcelona would be in trouble without the world’s best player, despite still having two other world class strikers on the pitch (and decent young backups filling in). MOTSON got it right: Barcelona would be fine without him, and they were. This is one of my favorite analytics pieces I’ve written, so I wanted to bump it.

Will Barcelona Be Fine Without Messi?

Those are the big ones I can remember: plenty of successes for the model’s first year on out-of-sample data, but plenty of room for improvement as well. I may revisit this at some point, but for now I think this is a good recap of the model. Thanks for reading. This summer I’ll be focusing on bringing statistics to the NWSL, so keep an eye out for that.

Some Personal Reflections on Getting Started in Soccer Analytics

Yesterday Ravi (@Scribblr_42) wrote a great microblog titled “Is Fanalytics Intimidating?” and you can find it here: https://twitter.com/Scribblr_42/status/709803712856899584

It’s really interesting, and you all should read it and then come back to my post. I’ll wait.

This isn’t a methods post about how to get started, but I wanted to share some of my experiences on breaking into the analytics community and maybe offer some advice to people who want to get involved.

I’ve been doing this about a year now, and converted my personal Twitter account into a “soccer only” account around 7 months ago. In that time I’ve grown my following from about 110 friends and former students to over 2500 followers, and feel incredibly fortunate to have picked up a good-sized following in such a short time.

I’ve said it before, but this started as an excuse to improve my data science skills, and I didn’t think it would go anywhere. I likely wouldn’t have kept it up if there weren’t an audience for what I’ve been doing. I’ve briefly tried political blogs in the past, but they never really caught on, and I don’t feel like I have anything unique to add to the political blogosphere.1 I don’t see the purpose in saying something that’s already been said for a couple dozen people to read, so I never stayed with it. But with soccer I’ve built up an audience for my work, so I’ve kept putting out content that hopefully people continue to enjoy.

The biggest thing that has helped me has been bigger accounts sharing my work, and I couldn’t be more appreciative of the people who have done so. Mike Goodman (@The_M_L_G) has been particularly supportive from the beginning, as have @GoalImpact and @7amkickoff. I’m incredibly grateful to the people who regularly share my work, particularly Jake Kilov (@Kilonater3000) and Naveen Maliakkal (@njm1211), who have consistently retweeted me for a while, and to anyone else who does.

Since I moved into Women’s Soccer, @DasGherkin has been incredibly generous in promoting my account, and I’ve picked up over 200 followers just from her recommendations. She’s kind of a big deal in the WoSo Twitter world, and her sharing my work has given me a real credibility boost there and has been invaluable in helping me get the word out about my upcoming NWSL analytics work. I know I’m leaving a bunch of people out, but I’m grateful to everyone who has helped me.

But it’s not just about growing my audience: simple support from people in the community has meant a lot and kept me going. Tom Worville (@Worville) helped me with coding and data issues more than a few times, especially in the beginning, and James Curley (@jalapic) and @UTVilla have both given me great help and have helped my R programming skills grow exponentially. Even something as simple as seeing prominent accounts favoriting my tweets has given me the motivation to keep doing work. Favoriting matters precisely because it costs nothing beyond pressing a button, yet it’s a nice validation that I’m doing something interesting or good and a clear sign that someone’s reading. The community embraced my work early, so it has been less intimidating for me than for others, but I can see how it would be intimidating for someone who doesn’t get this support. I don’t know if I would still be posting public analytics work without it, and I’m someone who was already confident in my analytics skills.

To people with a lot of followers: take the time to engage with people who don’t have as many eyes seeing their work. You don’t have to have long conversations (although it’s nice), but something as simple as a favorite, or even a retweet, can mean a lot to someone trying to find a place in the community. I do this every so often with my Retweet Days where I share work from people who have fewer followers than I do. And the best part is that the more followers I get, the more voices I can amplify. I’m happy to do it.

I’ve also been trying to give back through my “Intro to Analytics” YouTube course. People seem to be enjoying it, and I need to add more chapters as soon as I can find the time and energy to do so. Maybe people who watch these will become involved in analytics, or maybe they’ll just be better able to participate in the conversation, but hopefully this will let more people get involved and make people a little less hesitant to participate. At some point we all started out with virtually no followers, so why not pay it forward and try to help people who might be part of the next generation of soccer analysts?

Final thoughts: for those of you looking to get involved, don’t worry about the math. Learn some simple concepts from my YouTube channel (if you don’t know them already), find some public data, and go from there. I know my work focuses on predictions, but as someone said on Twitter, I don’t think there’s much more room in that space unless you can out-predict my model and the others out there (which will be tough). The same goes for expected goals: I don’t think there’s much room for new xG models unless you can put together something that significantly beats the prominent ones already out there. But there’s lots of work to be done, so look for the gaps and fill them.

Most importantly, be a good citizen, regardless of how many followers you have. If you do interesting work, engage positively with other people’s work, and write often, you’ll have a good chance of building a following. Even if you don’t, you’ll have some positive experiences and share your work with like-minded people. Hopefully people wanting to get started can take my advice, and hopefully people who have a strong following can help encourage new people to participate in the community. That kind of support really helped me get a foothold and feel like my work was appreciated, and the teacher in me wants to help others the same way.


  1. I do original research as part of my job, but that’s academic, not blogging.

Please Everyone: Slow Down, Explain Your Methods

I love reading all the interesting and impressive analytics work being done out there. There are so many people doing so many interesting things in Soccer Analytics Twitter™, and it’s truly amazing to me how much great work is being done. I wanted to make one suggestion to the community, though: slow down, explain what you’re doing step by step, and be as clear as possible.

I have a Ph.D. in Political Science, with a minor field in quantitative methodology. I teach two different research methods courses at a university, have won awards for articles in methods journals, and have worked as a statistical consultant for multiple political organizations. I don’t say this because I think it’s particularly impressive; I say it to establish that I’m at least above the average blog reader in my understanding of math and statistics, yet in a significant number of the articles I read, I can’t figure out what people are doing. If I can’t understand them, I assume I’m not the only one. So I wanted to offer some advice for the public discussion of statistical methods, drawn from my own experience and 7 years of teaching.

Slow down

The math in most cases isn’t particularly complicated: most people are calculating some average or comparing the difference between two averages, and I rarely read anything that uses math beyond high school algebra. To be clear, I don’t mean this as a criticism. I’m not at all a believer in fancy stats for the sake of fancy stats. However, when you rush through several steps of the process, or present a formula and then move on to your next point without explaining it, you’re going too fast.

Take a minute and explain the formula fully in words, step by step and point by point. Devote a full paragraph to it, making sure the reader could re-create exactly what you’re doing without having to infer any of the steps.

Slow down some more

When you think you’ve slowed down enough and explained it thoroughly enough, you still probably haven’t. There’s a company out there that asks potential employees to describe how they’d cook an egg. Some people say “You toss the egg in the pan, wait a couple minutes, and then put it on a plate.” Those people don’t get hired. Others say “You get an egg out of the refrigerator. Then you put a small pan on top of a burner on the stove. Then you turn the burner on to high, waiting 2 minutes to let the pan get hot. Then you crack the egg on the side of the pan, separate the shell, and drop the inside into the pan.” And so on. These are the people who get hired, and this is the level of detail you should aspire to.

Create a separate paragraph with the formula itself. Explain each term in the formula. Walk the reader through a sample calculation. Explain the results.

Never Use Jargon when Regular Words Work

Jargon is created for specialists who need to communicate specific concepts to each other in a very clear, precise way. That’s probably not what you’re trying to do when you blog. Worse, when you start using technical terms without explaining them, you lose a percentage of your audience with every term. Maybe you think xG is a ubiquitous abbreviation that everyone has heard, but some of your readers haven’t. I read articles that reference PDO all the time, and for some reason I can never remember what it means. Maybe I’m the only one, but I doubt it. Maybe you think everyone knows what “regression to the mean” is, but I doubt that too. Give a half-sentence explanation every time you use a word or phrase that you wouldn’t use outside of Twitter.
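Taking that advice at face value, here is the kind of explanation-plus-example I mean, using PDO. Under one common definition, PDO is a team’s shooting percentage plus its save percentage, both on shots on target, scaled so that the league average is about 1000: across the whole league every shot on target is either a goal or a save, so the two percentages sum to 100%. Teams far above or below 1000 tend to drift back toward it, which is regression to the mean in action. Definitions vary (some versions use all shots), so treat this Python sketch, and its made-up team numbers, as an illustration rather than the definitive formula.

```python
def pdo(goals_for, sot_for, goals_against, sot_against):
    """PDO: team shooting % plus team save % on shots on target, scaled so ~1000 is average.

    goals_for / sot_for: goals scored and shots on target taken.
    goals_against / sot_against: goals conceded and shots on target faced.
    """
    shooting_pct = goals_for / sot_for            # share of our on-target shots that go in
    save_pct = 1 - goals_against / sot_against    # share of opponents' on-target shots we stop
    return 1000 * (shooting_pct + save_pct)


# Hypothetical team: 30 goals from 100 shots on target, 20 conceded from 80 faced.
print(round(pdo(30, 100, 20, 80)))  # 1050: 300 from shooting + 750 from saving
```

A team sitting at 1050 is converting and saving at a rate that most teams can’t sustain, so the usual expectation is that its results will cool off.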

There’s likely more advice to be given here, but starting with these three pieces of advice will improve the clarity of a lot of what I see. You have spent a significant amount of time on your project, so you likely know it better than anyone. That’s both a good thing and a bad thing: you’ve hopefully spent a lot of time thinking it through and have created the best possible measure, but you also likely have a hard time spotting the blanks a new reader will need filled in. If you step back and think about this advice, it will expand both your audience and the audience for analytics in general, which are both worthwhile goals in my opinion.

Initial Thoughts on my “Intro to Analytics” Class

I posted a “syllabus” for an Intro to Soccer Analytics “class” a while ago, and I’ve been meaning to move forward with it, but I held off because I wasn’t sure about the level of interest. I posted a call today on Twitter and got a great response.

I have a quick four-part syllabus planned: picking a topic (thanks to @Sam_Jackson for pointing out I already slipped into academic mode talking about “research questions”) and the general principles of what analytics should be, the logic of inference, basic quantitative methodology, and data visualization techniques. If it’s successful I’ll post more videos on different topics, but my goal is to create something that helps people who want to create analytics work get a foothold in the community, helps media and people at clubs understand the value of analytics and maybe become more educated consumers, and maybe helps build a bigger community.

I have two concerns. The first is that I’m not particularly good on YouTube. My day job is as a Political Science professor: I teach a Research Methods course and an Advanced Research Methods course to undergraduates, and I also teach an American Government class partially online. I’m better in person, so I’ve moved that class partially in person so students can get to know me. So I’m going to try to be entertaining and informative at the same time; hopefully I’ll succeed, and people will enjoy what I’m working on and learn a lot from the whole process.

My second concern is that this will be a fairly big time commitment from me – writing the scripts, putting together examples, editing videos, etc., and this is in addition to my day job (and a new puppy who has more energy in one day than I’ve had in my entire life). So all I’m asking from you all is to share any tweets/videos/posts I make on this topic as much as you can and help get as many people to watch them as possible. Please tell a friend, share your projects from the course, and help me get as many eyes on this as possible. I’m not trying to make any money off of this, but I do get a great deal of satisfaction from knowing people are enjoying what I’m doing so hopefully you all can help me share this and get as big of an audience as possible for these!

I’m overwhelmed by the interest so far. Thanks everyone for your enthusiasm, and I’ll hopefully be posting the first set of videos soon so we can get started.


Information Processing, Statistical Modeling, and “Expertise”

I recently did a guest post over at Scoreboard Journalism showing that statistical models are beating media experts pretty soundly in predicting the EPL table this year, and I saw a pattern in some of the feedback. A number of people suggested that modelers are beating the experts because the media folks didn’t burn many calories on their predictions, while the modelers spent far more time and energy on the competition. There are a few issues with this that I won’t address here, but I did want to point out the fundamental flaw in this argument: “time spent” isn’t a function of filling out the form; it’s about how much time you spend taking in information about soccer.

For people who didn’t read the original post, I wanted to post the dataviz showing how the modelers are consistently beating the media. The vertical line represents the simple model of “everything is the same as last year”, a line which virtually no media experts or fans beat, while over 20 models are ahead of that point. If the experts can’t even beat the simple model, that brings their expertise into question.

[Figure: Week 26 prediction data]
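To make “beating the simple model” concrete: the guest post’s actual scoring metric isn’t described here, so the Python below assumes one plausible choice, the mean absolute error between predicted and actual league positions. The four-team league and all three tables are hypothetical. Lower error is better, and a prediction beats the baseline if its error is lower than that of simply carrying over last season’s table.

```python
def mean_absolute_error(predicted, actual):
    """Average number of places by which predicted league positions miss the actual ones."""
    return sum(abs(predicted[team] - actual[team]) for team in actual) / len(actual)


# Hypothetical four-team league; values are league positions, not points.
last_season = {"Team A": 1, "Team B": 2, "Team C": 3, "Team D": 4}  # baseline: same as last year
model_pick = {"Team A": 2, "Team B": 1, "Team C": 3, "Team D": 4}
actual = {"Team A": 3, "Team B": 1, "Team C": 2, "Team D": 4}

baseline_error = mean_absolute_error(last_season, actual)
model_error = mean_absolute_error(model_pick, actual)
print(f"baseline {baseline_error}, model {model_error}, beats baseline: {model_error < baseline_error}")
```

Here the baseline misses by 1.0 places per team on average and the model by 0.5, so the model clears the bar that, per the chart above, virtually no media expert did.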

Even in the face of this evidence, a number of commenters weren’t satisfied. My best explanation for this is that statistical modelers are up against some serious motivated reasoning: the tendency to reject arguments that contradict our preconceived notions regardless of the quality of the evidence, while accepting only the ones that fit what we already believe. But whatever the explanation, I wholeheartedly reject the idea that modelers spend more time on their predictions. Even if they spent more conscious time building their models than the media experts did, that’s not really how information processing works.1

Borrowing from psychology (and more importantly for my background, political psychology), we don’t actively study most topics as if we’re going to take an exam on it for a college course. We also take in far more information than our active memory can actually process and convert to long-term memory. As a way of coping with the overwhelming amount of information we encounter, we resort to something called “online information processing.” Basically what this means is that we keep a running tally in our head of whether we think something is good or bad. We don’t know exactly why we feel the way we do, but we update our preferences with new information as it comes in.

This process works in soccer pretty simply: every time you see a team win you update their information and think they are better, and every time you see a team lose you update their information and think they are worse. Shocking results stand out more for people – Leicester City beating Manchester City made a lot of people update their belief on whether Leicester City could win the title. Injuries, transfers, coaching changes, etc., all add to the running tally in our head and help us update our beliefs. We’re not actively doing anything other than watching soccer or reading articles, but our subconscious mind is updating that running tally with every piece of information that we come across.
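The running-tally idea has a familiar counterpart in sports ratings: the Elo-style update, where a rating moves in proportion to how surprising a result was. The Python below is only an analogy under that assumption, not a claim about how memory actually works and not part of MOTSON; the team names, starting ratings, and the `k=20` step size are hypothetical. It shows why shocking results stand out: an upset moves the tally about three times as much as an expected win.

```python
def expected_result(rating_a, rating_b):
    """Elo expectation for team A against team B (1 = certain win, 0 = certain loss)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_tally(ratings, team_a, team_b, result_a, k=20):
    """Shift both teams' tallies by how surprising the result was.

    result_a: 1 if team A won, 0.5 for a draw, 0 if team A lost.
    Returns the change applied to team A (team B gets the opposite change).
    """
    surprise = result_a - expected_result(ratings[team_a], ratings[team_b])
    change = k * surprise
    ratings[team_a] += change
    ratings[team_b] -= change
    return change


# Hypothetical starting impressions of a favorite and an underdog.
ratings = {"Favorite": 1600.0, "Underdog": 1400.0}
expected_win = update_tally(dict(ratings), "Favorite", "Underdog", 1)  # copy, so both cases start fresh
shock_win = update_tally(dict(ratings), "Underdog", "Favorite", 1)
print(round(expected_win, 1), round(shock_win, 1))  # 4.8 15.2
```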

I read a lot of articles, tweets, watch a lot of games, and do all the things that media people do, but my model hasn’t changed based on any of this information. I found some data, scraped it, built the model, ran the script to calculate the final table positions, and haven’t touched the model since. If you don’t count the headaches of finding useful public soccer data, I spent less than 10 hours actually building it. That’s more time than the media people spent filling out their predictions and e-mailing them to Simon, but it’s far less time than is involved in online processing and building the running tally that contributed to media experts’ predictions.

Media experts do this for a living and are likely paid quite handsomely to follow soccer. My impression is that most of the modelers follow soccer as a hobby, albeit a fairly obsessive one for most of us. And even so, we haven’t updated our models since the pre-season based on any of that information. My model was completed sometime in July, updated August 1 with new rosters, and hasn’t been touched since. Every minute people spend reading, watching, and learning counts toward their subjective predictions: the running tally in your head is always being updated. The real issue is that statistical models are better at processing all of this information than we are. They update the running tally systematically, weight information properly, and ignore irrelevant information better than our brains do.2 It’s not a matter of time; it’s a matter of the limits on our brains’ ability to process information correctly. MOTSON runs on my laptop in a few seconds and can do the 10,000 simulated seasons in under a minute. My brain can’t do anything nearly that quickly or nearly that accurately, and that’s why the modelers are winning.
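For readers unfamiliar with simulated seasons, the Python below is a toy Monte Carlo sketch of the general idea: play out every fixture many times using match probabilities, then count how often each team finishes top. It is not MOTSON, whose internals this post doesn’t describe; the strength ratings, the fixed 25% draw rate, the absence of home advantage, and the tie-breaking rule are all simplifying assumptions.

```python
import random
from collections import Counter
from itertools import permutations


def match_probabilities(home_strength, away_strength, draw_prob=0.25):
    """Toy match model: a fixed draw rate, with the rest split by relative strength."""
    home_win = (1 - draw_prob) * home_strength / (home_strength + away_strength)
    away_win = 1 - draw_prob - home_win
    return home_win, draw_prob, away_win


def simulate_season(strengths, rng):
    """Play a double round robin (every team hosts every other team once); return points."""
    points = {team: 0 for team in strengths}
    for home, away in permutations(strengths, 2):
        home_win, draw, _ = match_probabilities(strengths[home], strengths[away])
        roll = rng.random()
        if roll < home_win:
            points[home] += 3
        elif roll < home_win + draw:
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3
    return points


def title_odds(strengths, n_seasons=10_000, seed=2016):
    """Share of simulated seasons in which each team finishes top on points."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_seasons):
        points = simulate_season(strengths, rng)
        # Ties go to whichever team comes first in `strengths`; a real model
        # would break them on goal difference.
        titles[max(points, key=points.get)] += 1
    return {team: titles[team] / n_seasons for team in strengths}


# Hypothetical strength ratings for a four-team league.
strengths = {"Team A": 3.0, "Team B": 2.0, "Team C": 1.0, "Team D": 1.0}
print(title_odds(strengths))
```

The expensive part of a real model is estimating sensible match probabilities; once you have them, running thousands of seasons is cheap.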

  1. Another issue is that media people knew their predictions would be made public, so I would think that knowledge would encourage them to try harder than simply throwing some numbers together without a lot of forethought.
  2. All of this assumes the model is built correctly, which is a big assumption.

Meaningful Equality and Why It’s Important to do Women’s Soccer Analytics

I hadn’t planned on writing anything today, and I’m not sure what started it, but there’s a discussion going on Twitter about whether countries should invest in Women’s Soccer. I’ve read some articles lately about gender equality in Australia, and there’s the USWNT/USSF lawsuit going on, but I didn’t see a particular spark that set this off. Either way, it’s an important topic to discuss, so I wanted to expand on a short tweetstorm I posted earlier. This will likely be a series of synaptic misfires loosely related to women’s soccer and gender equality, but hopefully I can convey my thoughts and make some people think about the systemic issues with gender and sports in a different way.

The counterargument to paying women’s national teams equally to men/funding them the same/letting them play on the federation’s preferred surface is that “Women’s soccer isn’t popular/doesn’t make money. Once they bring in the amount of money that the men do, then they can have the same money.” The problem is that the two aren’t playing on the same playing field (literally in the case of the USMNT/WNT). For context I stole a line of thought from my favorite political science professor back in undergrad:

With zero data in my pocket to support this, I feel fairly comfortable saying women’s soccer isn’t nearly as popular as men’s soccer worldwide. Shockingly, if you treat something as unimportant and as a lower quality product, people will see it that way. Even if we stop doing that and treat the women’s game as an equal product to men’s soccer today, we still haven’t reached anything near equality. You can’t undo generations of conditioning with a single World Cup – it’s a start, but it’s not the end product. Not by a long shot.

A few weeks ago I posted a call to some of the bigger accounts in Soccer Analytics Twitter™ to have a “let’s retweet articles written by women” day, and had no takers. One person suggested “We should retweet women every day”, which is a good suggestion except virtually no one in my timeline does it. A couple of people suggested that they retweet quality content regardless of gender, which is a great sentiment except for the fact that 95% (likely higher) of the things I see retweeted are from men. Either men are just naturally better at writing about soccer, or there’s some gender bias in what we see as quality or what topics we’re interested in.

Admitting my own bias, I don’t follow women’s soccer particularly closely.1 I do watch just about every match of the Women’s World Cup, and I watch as many USWNT matches as I can, but there are a few issues that end up in a nasty feedback loop preventing me, and I assume others, from following women’s soccer.

First, men’s soccer is so much easier to find on TV. NBCSN televises 5-6 EPL games every weekend, so I can put one on without needing to worry about setting up my Roku or streaming from my laptop to the TV.2 I actually prefer Serie A, but even beIN Sports makes it difficult to find Milan many weeks, so I’ve started following the EPL more.

This makes it easier to read articles about the EPL, or to a lesser extent Serie A.3 So I follow writers who write interesting content about the EPL, because I have some sort of knowledge base there. There are a couple of really interesting German people I follow, but I don’t read much of their content because I know nothing about the Bundesliga and the articles don’t mean much to me. This problem is exacerbated with Women’s Soccer: I don’t know that much about it, so when I read tactical analysis of the NWSL finals I struggle to keep up. It’s a nasty feedback loop: I don’t follow the league, so I don’t appreciate the analyses as much as I should, which means I have less interest in the league because I don’t have a community to talk about it and learn about it with.

Similarly, the big EPL writers don’t write much about women’s soccer. I get why: I want to write more about the relegation round-up, but when I do I get far fewer retweets/likes/shares/clicks so I stop writing about Aston Villa/Sunderland/Newcastle. I’m just a hobbyist here, but I don’t want to spend time writing things people don’t read, so I end up drawn back to “Who’s going to win the EPL?” and “Is Leicester City for real?” and “What’s the deal with Chelsea?” It reinforces the big team bias, even in the biggest league in the world. Think what this does to something trying to get a foothold in the marketplace like women’s soccer.

This is a big reason I’ve decided to start building predictive models for the NWSL this season.4 First, I want a reason to become more interested in women’s soccer, and investing myself in a project like this gives me a reason to follow it more closely. Second, maybe if I build it, people will come. I’m not necessarily a big dog5 in the soccer analytics community, but I have a decent following on Twitter, and maybe if I start writing, we can start a women’s soccer analytics community. Someone has to be the first mover, and I’ll get enjoyment out of it, so why not? Third, I think it’s an important thing to do. We have plenty of people writing about all the men’s leagues in the world, but not nearly enough writing about women’s soccer. I would never want to step out of my lane and speak over any of the amazingly talented women writing about women’s soccer, but if I can find a niche and bring in some people who wouldn’t otherwise be interested, then maybe that’s a good thing.

I don’t know if this made for a particularly coherent blog post, but I really think it’s important to address the systemic inequalities in gender and sports (soccer in particular) and to think about how hard it is to break the cycle where we don’t invest in women’s soccer because it has a small fanbase/it has a small fanbase because we don’t invest in it. It’s an important step toward equality in sports, and hopefully we can move the ball forward.

  1. To be fair, I don’t follow the Bundesliga, La Liga, or Ligue 1 closely either, but even for those leagues I know the biggest players and teams.
  2. I refuse to watch TV on my laptop. My students would cringe at me being an old man who watches TV on an actual TV, kind of like how I use my phone as a phone.
  3. #forzaMilan
  4. There’s another problem here with the availability of stats: it’s harder to build a decent analytic model for women’s soccer because there aren’t as many publicly available stats, so no one writes about it. But then no one collects stats for women’s soccer because no one’s interested, and no one’s interested because there aren’t any stats…feedback loop.
  5. Digby pun intended for Mike Goodman, should he read this.

Game Theory: Team Selection and The Magic of Properly Tanking the FA Cup

In previous posts, I’ve written about why it’s rational for teams to field a sub-optimal lineup in the Champions League, despite the fact that it leads to a tragedy-of-the-commons scenario. The short version is that the expected value of resting your top players in cup games and saving them for league fixtures is higher than that of playing them and trying both to win the cup and to meet whatever goals you have in the league. The expected value calculations are a little different between the UCL and the FA Cup, but the logic is the same. However, Arsenal’s strategy this weekend made me think that there is an added strategic wrinkle in FA Cup roster selection, one that Arsene Wenger may not have taken into consideration: the need to avoid a replay.

Arsenal’s lineup yesterday against Sunderland was: Cech, Bellerin, Gabriel, Koscielny, Gibbs, Oxlade-Chamberlain, Chambers, Iwobi, Walcott, Campbell, Giroud.

Wenger clearly chose not to tank the game, placing some value on a possible third straight FA Cup win. This is a strong lineup, with Walcott, Campbell, and Giroud up front, and Oxlade-Chamberlain, Koscielny, Bellerin, and Cech also in Arsenal’s strongest starting XI. However, he didn’t field his strongest lineup, choosing to rest some key players as well. Notable omissions were Per Mertesacker, Mesut Ozil, and Nacho Monreal. Resting the player who is likely their top defender, along with their #10, one of the best playmakers in the league this season, shows that he wasn’t overly concerned about winning either, since these players likely would have featured in a league game. Wenger chose not to tank, but he didn’t necessarily play to win either, choosing a “third way” instead.

I’m going to start with a big assumption here: ranked by expected value, Arsenal’s preferences were as follows.

  1. Loss
  2. Win
  3. Draw

Reasonable people can disagree on numbers 1 and 2 – maybe Arsenal gets more value out of winning a third-round FA Cup match than I think they do, so maybe they really want to win. However, I think the draw, leading to the replay at the Stadium of Light, was Arsenal’s worst possible option here. The goal, then, would be to maximize the combined probability of winning and losing, which is the same thing as minimizing the probability of a draw. At its extreme, this might look like a 4-2-4 with four defenders, two midfielders, and four strikers: a lineup that will score a lot of goals and concede a lot of goals.1 The odds of this lineup playing to a 0-0 or 1-1 draw are remote.
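To make the draw-avoidance logic concrete, here is a minimal sketch under loudly stated assumptions: each side’s goals are modeled as independent Poisson draws (a standard simplification, not something this post claims), and the scoring rates and utility values are made up, chosen only to respect the Loss > Win > Draw ordering above. The function names `outcome_probs` and `expected_value` and the `UTILITY` table are hypothetical illustrations, not a real model of Arsenal.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when goals follow a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_for: float, lam_against: float, max_goals: int = 15) -> dict:
    """Win/draw/loss probabilities, ASSUMING each side's goals are independent Poisson.

    Scorelines above max_goals are ignored; at these scoring rates their
    probability is negligible.
    """
    probs = {"win": 0.0, "draw": 0.0, "loss": 0.0}
    for ours in range(max_goals + 1):
        for theirs in range(max_goals + 1):
            p = poisson_pmf(ours, lam_for) * poisson_pmf(theirs, lam_against)
            if ours > theirs:
                probs["win"] += p
            elif ours == theirs:
                probs["draw"] += p
            else:
                probs["loss"] += p
    return probs


# Hypothetical utilities that only respect the post's ordering: Loss > Win > Draw.
UTILITY = {"loss": 1.0, "win": 0.8, "draw": 0.0}


def expected_value(probs: dict, utility: dict = UTILITY) -> float:
    """Probability-weighted utility of a match's possible outcomes."""
    return sum(p * utility[outcome] for outcome, p in probs.items())


# Hypothetical scoring rates: a cagey lineup vs. the absurd 4-2-4 "shootout".
cagey = outcome_probs(lam_for=1.0, lam_against=1.0)
shootout = outcome_probs(lam_for=3.0, lam_against=3.0)

for name, probs in (("cagey", cagey), ("shootout", shootout)):
    rounded = {outcome: round(p, 3) for outcome, p in probs.items()}
    print(name, rounded, "EV:", round(expected_value(probs), 3))
```

Raising both sides’ scoring rates pushes probability out of the draw column and into win and loss, which is the whole point of the absurd 4-2-4. The specific numbers mean nothing beyond illustration.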

Wenger did something different: he played his top (available) attackers, so maybe he planned on scoring a lot of goals, but he rested the team’s best provider, which could limit the number of chances they created. He also gave a 19-year-old his first start in midfield, which is good for a lot of reasons, but combined with resting Ozil it really weakened Arsenal’s midfield. Wenger didn’t play to win, but he didn’t choose a lineup of youngsters who would likely lose either. He risked a replay, which may have been the worst possible outcome for a team stretched thin by injuries while trying to mount a title challenge.

Arsenal aside, the FA Cup adds an extra dimension to roster selection strategy that the Champions League doesn’t have. Avoiding a replay should be a priority for any team that hasn’t secured its league goals for the year, which means picking a team that is going to either win or lose, and do either in convincing fashion. Adding another fixture to an already crowded schedule is something all Premier League teams should want to avoid, and they should select their teams accordingly.

  1. Like I said, this is extreme. I don’t actually think this would be a valid formation to choose; I only use it as an absurd example to illustrate the idea.

Small Clubs Need Scouts (and Analysts) the Most

Following up on my previous post, “Big Clubs Need Scouts (and Analysts) the Most”, I want to make the case that small clubs also need scouts (and analysts) the most.

Small clubs have a couple of disadvantages compared to Big Clubs™ that make scouting and analytics more important. First, small clubs have less money than their counterparts, so presumably they would have a harder time writing off a mistake. Manchester City can choose not to start Raheem Sterling and United can choose to bench Memphis Depay, but it’s much harder for a smaller club to justify benching a major summer signing. The opportunity cost is potentially higher for smaller clubs: signing a new striker in season 1 makes it much harder to justify signing a replacement striker in season 2.

Additionally, there is less information out there about the players smaller clubs are trying to sign. There’s no shortage of opinions on which striker Manchester United should sign to replace Wayne Rooney: any of a dozen options would likely work fairly well (although see my previous post for why this is a problem). Aston Villa would likely benefit from signing Robert Lewandowski, but he’s not a realistic transfer option for them. Analysts can build a list of potential targets within the club’s reach, and scouts can fill in the blanks to pick the best option.

The consequences of a bad signing might be even higher for smaller clubs. If Manchester City isn’t happy with Raheem Sterling’s performance, they can play either Fabian Delph or Jesus Navas in his place. Maybe that puts them a step behind Arsenal and they lose in the first knockout round of the UCL, but they still comfortably make the top 4 next year. A team like Newcastle, however, is possibly one bad signing away from relegation. Each signing is under greater pressure to succeed and fit into the squad, and a failure (combined with the opportunity cost of buying another player mentioned above) could have significant consequences.

I would think this is all self-evident, but teams will gladly spend millions of dollars on players while finding savings in the most important area: information. My undergrad American Government professor said something that always stuck with me: “Every decision is easy if you have the right information.” The goal is to gather the right information, and a proper budget for analysts and scouts can help do that. There are plenty of places teams can find efficiencies, but a proper analytics and scouting department will give you a positive ROI as well as success on the pitch.

Big Clubs Need Scouts (and Analysts) the Most

There’s a story going around Twitter today that Manchester United don’t have any full-time scouts. There’s an argument that when you’re buying elite players, scouting might be less important because it’s much easier to identify the best in the world.1 As an example, the rumor mill is talking about a triple swoop for Neymar, Ronaldo, and Bale. I know this is just some combination of clickbait, boredom, and wishful thinking, but a team wouldn’t need to spend any money on scouting to know that those three would be better than the players United has at those positions. Also, when you have virtually unlimited transfer funds, you can afford to make mistakes. Manchester City spent a ridiculous amount of money on Raheem Sterling over the summer and then benched him for the first half of their biggest game of the season, against Arsenal last weekend. Bad transfer choices hurt a lot less when you can sell at a loss and overpay for the next big player.

Here’s the problem with that: margins at the top of the table are incredibly narrow, and players who can improve top teams (or at least not hurt them) are few and far between. One of my first blogs here questioned the Nicolas Otamendi signing at Manchester City, and as far as I can tell I was about the only one who didn’t like it at the time. But in a close season like this one, buying the wrong central defender (I liked John Stones, who they probably could have bought for the same price they paid for Otamendi) could be the difference between 1st and 2nd place. People criticize Wenger’s transfer strategy (or non-transfer strategy), but his one move this summer turned out to be a good one, while one of City’s major transfers was on the bench this weekend and another looked as if he’d never held a defensive line before. Sterling and Otamendi are world-class players, but they looked like missed opportunities against Arsenal this weekend.

The gaps at the top of the game get narrower and narrower, and harder and harder to cross. The gap between the top 6 in the Premier League and the top 4 is small but incredibly difficult to close (it takes a disastrous season from Chelsea for one new team to get in, likely for only one season). The gap between the top 4 in England and the top 4 in Europe is similarly small but even harder to close, and the gap between #2 and #1 in Europe is harder still.2 Finding players who can bridge these gaps is incredibly difficult, and only teams that make all the right moves can attempt to break into a new class of teams. One or two bad decisions are the difference between the Champions League and traveling a couple thousand miles to Eastern Europe on a Thursday night. Scouts (and analysts) are more valuable than ever when the margins are this thin.

Preview: tomorrow I’m going to write up “Why Small Clubs Need Scouts (and Analysts) the Most,” because they probably need scouts just as much, but for very different reasons that I think are worth talking about.

  1. This argument was made sarcastically in my timeline today, but I can picture it being made seriously.
  2. Bayern Munich is virtually unbeatable in Germany and won the 2013 UCL title, but Barcelona was head and shoulders above them in their 2015 UCL semi-final. Bayern was likely head and shoulders above everyone else in Europe that season, etc…