I love reading all the interesting and impressive analytics work being done out there. So many people are doing so many interesting things on Soccer Analytics Twitter™, and the amount of great work truly amazes me. I do want to make one suggestion to the community, though: slow down, explain what you’re doing step by step, and be as clear as possible.
I have a Ph.D. in Political Science, with a minor field of quantitative methodology. I teach two different research methods courses at a university, have won awards for articles in methods journals, and have worked as a statistical consultant for multiple political organizations. I don’t say this because I think it’s particularly impressive; I say it to establish that I’m at least above the average blog reader in my understanding of math and statistics. Yet in a significant number of the articles I read, I can’t figure out what people are doing. If I can’t understand, I assume I’m not the only one. So I wanted to offer some advice for public discussion of statistical methods, taken from my own experience and 7 years of teaching.
The math in most cases isn’t particularly complicated: most people are calculating some average or comparing the difference between two averages, and I rarely read anything that uses math beyond high school algebra. To be clear, I don’t mean this as a criticism. I’m not at all a believer in fancy stats for the sake of fancy stats. However, when you rush through several steps of the process, present a formula, and then move on to your next point without explaining that formula, you’re going too fast.
Take a minute and explain the formula fully in words, step by step and point by point. Devote a full paragraph to it, making sure that the reader could re-create exactly what you’re doing without having to infer any of the steps.
Slow Down Some More
When you think you’ve slowed down enough and have explained it thoroughly enough, you still probably haven’t. There’s a company out there that asks potential employees to describe how they’d cook an egg. Some people say, “You toss the egg in the pan, wait a couple minutes, and then put it on a plate.” Those people don’t get hired. Others say, “You get an egg out of the refrigerator. Then you put a small pan on top of a burner on the stove. Then you turn the burner on to high, waiting 2 minutes to let the pan get hot. Then you crack the egg on the side of the pan, separate the shell, and drop the inside into the pan,” and so on. These are the people who get hired, and this is the level of detail you should aspire to.
Create a separate paragraph with the formula itself. Explain each term in the formula. Walk the reader through a sample calculation. Explain the results.
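To make that concrete, here is a minimal sketch of what the egg-level walkthrough might look like for a very simple measure. Everything in it is an assumption chosen for illustration: the measure (a plain shot conversion rate), the function name conversion_rate, and the numbers, which are made up and do not describe any real team.

```python
def conversion_rate(goals: int, shots: int) -> float:
    """Share of shots that became goals: goals divided by shots."""
    if shots <= 0:
        raise ValueError("shots must be a positive number")
    return goals / shots


# Hypothetical inputs, invented purely for this walkthrough.
goals = 12
shots = 96

rate = conversion_rate(goals, shots)

# Spell out every step in words, the way you would in the post itself.
print("Formula: conversion rate = goals / shots")
print(f"Step 1: count the shots the team took: {shots}")
print(f"Step 2: count how many of those shots were goals: {goals}")
print(f"Step 3: divide goals by shots: {goals} / {shots} = {rate:.3f}")
print(f"Result: the team scored on {rate:.1%} of its shots.")
```

The code matters less than the printed lines: the formula gets its own line, each term is named, the sample calculation is shown with real numbers plugged in, and the last line says in plain words what the result means.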
Never Use Jargon when Regular Words Work
Jargon is created for specialists who need to communicate specific concepts to each other in a very clear, precise way. That’s probably not what you’re trying to do when you blog. Worse, when you use technical terms without explaining them, you lose a percentage of your audience with every term. Maybe you think xG is a ubiquitous abbreviation that everyone has heard, but a percentage of your readers haven’t. I read articles that reference PDO all the time, and for some reason I can never remember what it means. Maybe I’m the only one, but I doubt it. Maybe you think everyone knows what “regression to the mean” is, but I doubt that too. Give a half-sentence explanation every time you use a word or phrase that you wouldn’t use outside of Twitter.
There’s likely more advice to be given here, but starting with these three pieces of advice would improve the clarity of a lot of what I see. You have spent a significant amount of time on your project, so you likely know it better than anyone. That’s both a good thing and a bad thing: you’ve hopefully thought it through and created the best possible measure, but you also likely have a hard time filling in the blanks that a new reader will not understand. If you step back and think about this advice, it will expand both your own audience and the audience for analytics in general, which are both worthwhile goals in my opinion.