Analysis shows Hawthorn more likely to win grand final

http://www.theguardian.com/news/datablog/2013/sep/28/afl-grand-final-statistics


Hawthorn are most likely to win the AFL grand final based on a statistical analysis* of their performance in the season so far.

This isn't surprising, given their No 1 position on the ladder, but please indulge me for a spot of number crunching.

Using the AFL's published statistics, I've analysed how well various match attributes correlate with the number of wins for each team.

Taking the average kicks, handballs, disposals, marks, hit-outs, frees for, frees against, tackles, goals, behinds and points against for each team, I tested each for correlation with the number of wins per team to get a crude indication of how each attribute relates to winning games.

Here are the results:

Only goals, disposals, kicks, behinds and points against were statistically significant (a p value of less than 0.05), and all five correlated quite strongly with wins. The closer the R value is to 1 or -1, the stronger the relationship between that variable and wins.
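
For anyone who wants to try this at home, here is a rough sketch of the correlation step in Python. It assumes a Pearson correlation (I haven't said which test I used above) and an 18-team competition, and the function names and the toy numbers are made up purely for illustration. They are not the real AFL figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def critical_r(t_crit, n):
    """Smallest |r| that is significant for n teams, given the critical
    t value (n - 2 degrees of freedom) looked up in a t table."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))

# Toy, invented data for six imaginary teams (NOT real AFL statistics).
toy_goals_avg = [10, 12, 13, 15, 16, 17]
toy_wins = [5, 9, 8, 14, 15, 18]
r = pearson_r(toy_goals_avg, toy_wins)

# Assumption: 18 teams, so 16 degrees of freedom; the two-sided 5%
# critical t value from a standard t table is about 2.12.
threshold_18_teams = critical_r(2.120, 18)
print(f"toy r = {r:.2f}; |r| needed with 18 teams = {threshold_18_teams:.3f}")
```

Under those assumptions, an attribute needs an R value of roughly 0.47 or more (in either direction) to clear the 0.05 bar.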

Goals and behinds aren't a surprise, of course: if you score more goals you're obviously more likely to win. Kicks and disposals are a measure of possession. Points against captures the defensive side of things, though it is interesting that defensive statistics like tackles and marks aren't as important as attacking statistics. This may not bode well for Fremantle, who are the stronger defensive team: they have conceded the fewest total points in 2013 and average more tackles and marks per game than Hawthorn. Hawthorn, however, dominate the attacking statistics, with more goals, behinds and disposals.

Taking the five attributes with significant correlations, I then used a linear regression analysis to get a model of how the number of wins varies with changes in goals, disposals, kicks, behinds and points against.
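
If you'd like to see what a regression like this involves, the sketch below fits a linear model by ordinary least squares in plain Python. It is one possible way to do it, not necessarily what my software did under the hood, and `fit_linear_model` and the toy rows are invented for illustration only.

```python
def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows    -- one list of predictor values per team
    targets -- the number of wins for each team
    Returns [intercept, coef_1, coef_2, ...].
    """
    # Prepend 1.0 to each row so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Toy data with two made-up predictors, built so that
# wins = 1 + 2 * first - 3 * second exactly.
toy_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
toy_wins = [1 + 2 * a - 3 * b for a, b in toy_rows]
coefficients = fit_linear_model(toy_rows, toy_wins)
print([round(c, 4) for c in coefficients])
```

With real data you would pass five values per team (goals, disposals, kicks, behinds and points against) and the fit would return an intercept plus five coefficients.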

The results of the regression were statistically significant, with an R value of 0.9, indicating most of the variation in the number of wins was "explained" by those five factors. The formula for determining the number of wins was:

<strong>Games won = 2.7032 - 0.0541 * Kicks avg + 0.0300 * Disposals avg + 1.4243 * Goals avg + 0.2214 * Behinds avg - 0.0064 * Points against</strong>
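
Written as a small Python function, the formula looks like this. The function and parameter names are my own labels, I'm assuming "points against" means the season total rather than a per-game average, and the example inputs are invented rather than any real team's numbers.

```python
def predicted_wins(kicks_avg, disposals_avg, goals_avg, behinds_avg, points_against):
    """Apply the regression formula from the article to one team's stats."""
    return (
        2.7032
        - 0.0541 * kicks_avg
        + 0.0300 * disposals_avg
        + 1.4243 * goals_avg
        + 0.2214 * behinds_avg
        - 0.0064 * points_against
    )

# Invented inputs for an imaginary team (NOT real AFL statistics).
example = predicted_wins(
    kicks_avg=200,
    disposals_avg=350,
    goals_avg=15,
    behinds_avg=12,
    points_against=1800,
)
print(f"predicted wins: {example:.2f}")
```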

The model had an R² value of 0.87, indicating it was a pretty good fit.
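
For the curious, this goodness-of-fit figure is normally worked out by comparing the model's errors with the overall spread of the data. Here is a minimal sketch, assuming the standard definition (one minus the residual sum of squares over the total sum of squares); the function name and toy lists are made up for illustration.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Toy, invented numbers: actual wins vs. a model's predictions.
toy_actual = [19, 16, 12, 8, 5]
toy_predicted = [17.5, 14.5, 12.5, 9.0, 6.0]
print(f"toy R squared = {r_squared(toy_actual, toy_predicted):.2f}")
```

A value of 1 would mean the predictions match the real results exactly, while 0 means the model does no better than guessing the average number of wins for every team.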

So this gives us a formula for working out how many games each team should have won, given their averages in each of these stats. Plugging in each team's season results gives a predicted wins value of 17.17 for Hawthorn, which is less than their actual final number of 19 wins from 22 games, and a value of 14.02 for Fremantle, which is less than their actual result of 16. The difference between the real and predicted results is greater for Fremantle, but not by a lot.
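
To put numbers on that gap, here is the arithmetic as a few lines of Python, using only the actual and predicted win totals quoted above (the variable names are my own).

```python
# (actual wins, predicted wins) as quoted in the article.
results = {
    "Hawthorn": (19, 17.17),
    "Fremantle": (16, 14.02),
}

# Positive values mean the team won more games than the model predicted.
residuals = {
    team: round(actual - predicted, 2)
    for team, (actual, predicted) in results.items()
}

for team, gap in residuals.items():
    print(f"{team}: {gap:+.2f} wins versus the model")
```

Hawthorn come out 1.83 wins ahead of the model and Fremantle 1.98 ahead, so the two gaps differ by only 0.15 of a win.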

So either this model is complete rubbish (possible) or Fremantle have been winning more games than they should have, and are rightly placed below Hawthorn because an attacking team trumps a defensive one. Based on this, I'm picking Hawthorn for the win.

All of that said, it's still a game of footy and anything could happen on the day!

<em>*The usual disclaimer: I've only ever done statistics as an undergraduate science student, so this isn't the most rigorous or even correct way to do things. Just a bit of fun, but I'd be interested to hear from anyone out there who has a proper maths background and an interest in sport.</em>
