Sunday, 2 June 2013

How predictable is English Football? Using linear regression to forecast future league positions

Sport produces a lot of data that pundits generally ignore. Somehow, they manage to spend hours debating how a season of football was hampered or improved by the actions of a few individuals. Usually referees.

My gut feeling has always been that success for any English Premier League football team is highly dependent on financial management, which varies considerably between clubs.

Curiosity got the better of me on this one.

In statistics, linear regression is an approach to modelling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X. It was the first type of regression to be studied seriously and the first to be used extensively in practical applications.

In this instance, my dependent variable is league points awarded this season, and my predictors are turnover, profit/loss before tax, net debt, interest owed on any debt and the club's wage bill. These predictors were all taken from the previous season.*

A stepwise multiple regression was conducted to evaluate whether any financial indicators were necessary to predict total points as of June 1st 2013. At step one of the analysis, annual turnover was entered into the regression equation and was significantly related to points awarded [F(1,15) = 60.35, p < .001]. The multiple regression correlation coefficient was .89, indicating approximately 78.8% of the variance in total points could be accounted for based on a club's turnover. The remaining variables: profit/loss, wage bill, net debt and interest payable were not entered into the equation [p > .29].
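As a quick sanity check on those figures, the sketch below recomputes R, R squared and adjusted R squared from the reported F statistic and its degrees of freedom. It uses the standard identity R² = F·df1 / (F·df1 + df2) and assumes the model includes an intercept, so the number of clubs is df1 + df2 + 1. The variable names are my own. The result suggests the 78.8% figure is the adjusted R squared, although that label is my reading rather than something stated above.

```python
import math

# Values reported for step one of the stepwise regression.
f_stat = 60.35
df_model = 1    # one predictor in the equation (turnover)
df_resid = 15   # residual degrees of freedom

# Assumption: the model has an intercept, so n = df_model + df_resid + 1.
n_clubs = df_model + df_resid + 1

# Standard identity linking F to R squared.
r_squared = (f_stat * df_model) / (f_stat * df_model + df_resid)
multiple_r = math.sqrt(r_squared)

# Adjusted R squared penalises for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_clubs - 1) / df_resid

print(f"clubs in the model: {n_clubs}")
print(f"R = {multiple_r:.2f}")
print(f"R squared = {r_squared:.3f}")
print(f"adjusted R squared = {adj_r_squared:.3f}")
```

The recomputed R matches the reported .89, and the adjusted value rounds to .788.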

The regression equation can therefore be defined as:

points = 0.191 × turnover + 29.48

While termed a regression equation, this is essentially the same as any equation referring to a straight line (y = mx + c).
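To make the straight-line connection concrete, here is a minimal sketch of how the slope (m) and intercept (c) of a single-predictor regression can be estimated by ordinary least squares. This is not the software used for the analysis in this post; `fit_line` is a hypothetical helper, and the demo numbers are synthetic points generated from the equation itself, not real club figures.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squares of x about its mean, and cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic check only: these are NOT real club figures. The points lie
# exactly on the line from the post, so the fit should recover it.
demo_turnover = [50.0, 100.0, 150.0, 200.0]
demo_points = [0.191 * t + 29.48 for t in demo_turnover]

slope, intercept = fit_line(demo_turnover, demo_points)
print(f"points = {slope:.3f} x turnover + {intercept:.2f}")
```

With real data the points would scatter around the line rather than sit on it, and the fit would minimise the squared vertical distances to it.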

Plot showing the strong correlation between the number of points awarded this season and annual club turnover from the previous year. League position is also noted.

In other words, if you wanted to predict how many points a team might win next year, you would be well advised to take into account the club's turnover at the end of the previous season. For example, imagine Newcastle United's turnover increases from 93 million to 120 million. Entering this into the above equation, their predicted points this time next year would be 52 (an increase from 41 at present).
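The Newcastle example can be reproduced in a few lines. The sketch below simply wraps the regression equation in a function; `predict_points` is a name of my own choosing, and it assumes turnover is entered in millions, as in the example.

```python
# Coefficients from the regression equation in this post.
SLOPE = 0.191       # extra points per additional million of turnover
INTERCEPT = 29.48   # predicted points at zero turnover


def predict_points(turnover_millions: float) -> float:
    """Predicted league points from the previous season's turnover.

    Assumption: turnover is expressed in millions.
    """
    return SLOPE * turnover_millions + INTERCEPT


projected = predict_points(120)   # hypothetical increased turnover
fitted_now = predict_points(93)   # turnover used for this season

print(f"Predicted points at 120 million: {projected:.1f}")
print(f"Fitted points at 93 million: {fitted_now:.1f}")
```

Note that the model's fitted value at 93 million is roughly 47 points, whereas the club's actual total was 41. The jump from 41 to 52 therefore compares an actual result with a prediction, and part of it reflects Newcastle finishing below the regression line this season.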

Of course, this regression analysis only takes into account the data entered into the model in the first instance. Additional financial data should produce a more accurate model as almost all of my predictors (with the exception of club debt) correlated with total points awarded.

The small sample size is also of concern. Sadly, this model is almost certainly not as powerful as the numbers suggest, so I wouldn't head to William Hill just yet. In particular, teams with less variation in points and turnover may be more difficult to predict as they cluster together. That said, it would be interesting to combine data from every team across each division over a number of years. With enough historical information, a superior model could forecast league position with some accuracy.

Such a result may appear to be intuitively obvious, but this reality isn't reflected in current media coverage. Despite a huge amount of time devoted to football panel shows, money alone appears to be a strong predictor of success. For example, Alex Ferguson was clearly a great manager, but part of that success hinges on the fact that he was based at a very rich club for most of his career. That alone may account for a large percentage of his success.

But no one really wants to talk about that.

I appreciate that being able to predict even 1% of a future sporting performance remains very important. In a 100m sprint, it could make the difference between gold and bronze. However, given the repetitive nature of football league tables north and south of the border, 1% would make very little difference.

This is particularly true in Scotland, where no team outside the Old Firm has won the league in my lifetime!

*Data taken from