I’ve been working on a Points Above Replacement (PAR) measure for soccer, and there are plenty of challenges. Baseball is an easier game to do stats with: it’s a relatively closed system, with one batter, one pitcher and one fielder per play.¹ Soccer is far less self-contained, which makes it much harder to isolate one player’s contribution, so I’ve been experimenting with how to build the measure.

Similar to the “Each Team’s Best Striker” post from a couple of weeks ago, I trained my Support Vector Machine (SVM) on 2014 league data from the 5 major European leagues, then read player stats from those leagues and the English Championship into a separate database. Starting with the first player on each team, I substituted every player in the database who plays the same position for the original player and calculated the team’s new predicted points with the SVM. Once I had run through every eligible player for that first slot, I moved on to the second, third, fourth and so on until I had finished the team.
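The substitution loop described above can be sketched in a few lines. This is a minimal illustration, not the code behind this post: `Player`, `substitution_predictions` and `toy_predict` are hypothetical names, and `toy_predict` (which just sums a made-up per-player `rating`) stands in for the trained SVM, whose features aren’t described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Player:
    name: str
    position: str
    rating: float  # hypothetical one-number summary of a player's stats


# Hypothetical predictor type: maps a starting XI to predicted league points.
# In the post this role is played by the trained SVM.
Predictor = Callable[[List[Player]], float]


def toy_predict(xi: List[Player]) -> float:
    """Placeholder for the SVM: team points = sum of player ratings."""
    return sum(p.rating for p in xi)


def substitution_predictions(
    lineup: List[Player], pool: List[Player], predict: Predictor
) -> Dict[str, Dict[str, float]]:
    """For each starter (slot by slot), swap in every pool player at the same
    position and record the team's predicted points with that substitute."""
    results: Dict[str, Dict[str, float]] = {}
    for slot, starter in enumerate(lineup):
        per_candidate: Dict[str, float] = {}
        for candidate in pool:
            if candidate.position != starter.position:
                continue  # only like-for-like positional swaps
            trial = list(lineup)      # copy so the real XI is left untouched
            trial[slot] = candidate   # replace just this one starter
            per_candidate[candidate.name] = predict(trial)
        results[starter.name] = per_candidate
    return results
```

Copying the lineup for each trial keeps every prediction a single-player change against the real starting XI, which is what the slot-by-slot description calls for.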

For this analysis, I ranked all the players in the database from highest expected point total to lowest. Then I found the 50th percentile (the player for whom 50% of players in the database are expected to win more points and 50% fewer), the 25th percentile (75% more, 25% fewer), the 10th (90% more, 10% fewer), and the 5th and 1st percentiles. For each player in the team’s starting XI, I subtracted the team’s expected points with the percentile player in his place from the team’s expected points with him, which gives that player’s “Points Above Replacement” score.
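A minimal sketch of the percentile pick and the PAR subtraction follows. The post doesn’t say how percentiles were interpolated, so the nearest-rank method here is an assumption, as are the function names `percentile_player` and `points_above_replacement`; `expected_points` is whatever per-player expected point total the ranking uses.

```python
import math
from typing import Dict


def percentile_player(expected_points: Dict[str, float], pct: float) -> str:
    """Return the pct-th percentile player using the nearest-rank method
    (an assumption): about pct% of the pool is at or below this player."""
    if not expected_points:
        raise ValueError("player pool is empty")
    # Sort player names from lowest to highest expected points.
    ranked = sorted(expected_points, key=lambda name: expected_points[name])
    # Nearest rank: 1-based rank = ceil(pct / 100 * N), never below 1.
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    return ranked[rank - 1]


def points_above_replacement(team_pts_with_starter: float,
                             team_pts_with_replacement: float) -> float:
    """PAR = team's predicted points with the real starter
    minus its predicted points with the replacement in his slot."""
    return team_pts_with_starter - team_pts_with_replacement
```

For example, `round(points_above_replacement(78.8, 73.3), 1)` returns `5.5`, the Monreal figure worked through below.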

As an example, Djamel Mesbah is the 25th percentile player in defense. If he played Left Back for Arsenal, they would be expected to earn 73.3 points. Arsenal’s left back in my model is Nacho Monreal, and with him they are expected to earn 78.8 points. Subtracting 73.3 from 78.8 gives me 5.5 points, giving Monreal a +5.5 PAR.
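Written as a formula, with $\hat{P}$ standing for the SVM’s predicted season points and $p$ for the replacement percentile (notation introduced here for illustration), the Monreal example works out as:

```latex
\begin{align*}
\mathrm{PAR}_i^{(p)} &= \hat{P}(\text{XI with } i) - \hat{P}(\text{XI with the } p\text{th-percentile player in } i\text{'s slot}) \\
\mathrm{PAR}_{\text{Monreal}}^{(25)} &= 78.8 - 73.3 = +5.5
\end{align*}
```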

I repeated this process for every player, and then for the other percentile levels to see which works best, and came up with some interesting results. Arsenal’s plots are below:

Each player has his own bar, and the bottom bar shows the sum of all the other players. If the goal is to measure how much better Arsenal’s squad is than a team of generic replacement players, then I think the right replacement level is somewhere between the 5th and 10th percentiles. Arsenal is expected to win around 80 points, and a typical relegation team is worth roughly 35 points, so we’d expect Arsenal to be about 45 points above a replacement squad, i.e. a team expected to get relegated from the EPL. The 1st percentile is too much (~65 points above replacement seems high for any reasonable replacement), and the 25th percentile might be a little low (~25 points above replacement might not be enough).
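The reasoning above (sum the player bars, then compare against expected points minus a relegation-level total) can be expressed as a small check. This is a hedged sketch: `squad_par` and `best_replacement_level` are hypothetical helpers, summing individual PARs assumes they add up the way the stacked bars suggest, and picking the “closest” level is one simple way to formalize the judgment call, not the method used here.

```python
from typing import Dict


def squad_par(player_par: Dict[str, float]) -> float:
    """Total squad PAR, assuming individual player PARs simply add up."""
    return sum(player_par.values())


def best_replacement_level(squad_par_by_pct: Dict[int, float],
                           team_expected_pts: float,
                           relegation_pts: float) -> int:
    """Pick the percentile whose squad PAR is closest to the gap between
    the team's expected points and a typical relegation side's points."""
    target = team_expected_pts - relegation_pts  # e.g. 80 - 35 = 45 for Arsenal
    return min(squad_par_by_pct,
               key=lambda pct: abs(squad_par_by_pct[pct] - target))
```

Feeding it the squad totals at each percentile, with 80 expected points and a 35-point relegation benchmark, returns whichever level lands nearest the 45-point gap.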

I repeated the analysis for Chelsea, and found something similar:

Chelsea’s squad rates a little more highly: about 53 points above replacement at the 5th percentile and 45 PAR at the 10th, but the results are broadly consistent with Arsenal’s.

There’s a lot more to be done, but I think this is a good first cut at the data.

¹ There is a little more to baseball strategy than this, but the point is that it’s a much clearer case than soccer.