Shutting down the Phantom Trend

28.7.2016

A couple of years ago I set up a polling aggregator site called The Phantom Trend. (It used a latent trend model, it was anonymous at first because I was still at the RBA, and all the good domain names were taken.) With the 2016 election over, I feel like it’s done its dash, so I’m shutting it down. It would have been nice to leave on a high, but the model’s performance at the election means I’m leaving on a ho-hum. Forecasting is hard.

The model

My basic idea was to use a Kalman filter to estimate an underlying trend in voting intention, treating public polls as noisy observations of the true value. It’s a nice way to look at the problem because you get uncertainty estimates, and polling ‘house effects’, for free.
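In state-space terms, the setup is roughly a local-level model with pollster intercepts. The notation here is mine rather than the repo’s, and the random-walk transition is an assumption about the details: for a given party, let $\alpha_t$ be latent voting intention at time $t$, $y_{p,t}$ a published poll from house $p$, and $\delta_p$ that house’s effect.

$$\alpha_t = \alpha_{t-1} + \eta_t, \qquad \eta_t \sim N(0, Q)$$

$$y_{p,t} = \alpha_t + \delta_p + \varepsilon_{p,t}, \qquad \varepsilon_{p,t} \sim N(0, \sigma_p^2)$$

Running the Kalman filter over the polls gives an estimate of $\alpha_t$ and its variance at each date, which is where the uncertainty bands come from, and the estimated $\delta_p$ terms are the house effects.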

Everyone knows that swings are never uniform, so I used the Kalman filter to model each state’s primary vote. The state numbers were directly observed (with error) when Ipsos or Newspoll published regional breakdowns, but otherwise the Kalman filter’s observation equation used the adding-up constraint to infer how each state was likely to have moved, given the national figures.
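The adding-up constraint is just that the national figure is (roughly) the weighted sum of the state figures. In the notation above, with $\alpha_{s,t}$ now the latent vote in state $s$ and $w_s$ something like that state’s share of the national electorate (the exact weighting is an assumption on my part), a national-only poll enters the observation equation as

$$y^{\mathrm{nat}}_{p,t} = \sum_s w_s\,\alpha_{s,t} + \delta_p + \varepsilon_{p,t},$$

so a national poll nudges every state’s estimate a little, while a published state breakdown updates that state directly.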

Given the primary votes in each state, the model then simulated voting in each seat. Each electorate across a state was assumed to have the same change in primary vote percentages. I spent a fair bit of time seeing if I could use demographics or other information to do better than that, but the answer was a clear no. This meant that the model was never able to do well in seats with large votes for independents, the Greens, or PUP in 2013.
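Concretely, the seat step just adds each state’s estimated change in primary votes to every electorate in that state. Here is a minimal sketch of that idea in R with dplyr; the seat names, column names and numbers are made up for illustration, not the repo’s actual schema.

```r
library(dplyr)

# Hypothetical 2013 primary votes for a few seats in one state.
seats_2013 <- data.frame(
  seat  = c("Seat A", "Seat B", "Seat C"),
  state = "SA",
  alp   = c(40.1, 26.5, 32.7),
  lnp   = c(41.3, 55.9, 51.4),
  oth   = c(18.6, 17.6, 15.9)
)

# One simulated draw of the state-level change in primary votes since 2013.
state_swing <- data.frame(state = "SA", alp_chg = -1.2, lnp_chg = -4.0, oth_chg = 5.2)

# Every seat in the state gets the same change in each party's primary vote.
simulated_seats <- seats_2013 %>%
  left_join(state_swing, by = "state") %>%
  mutate(alp = alp + alp_chg,
         lnp = lnp + lnp_chg,
         oth = oth + oth_chg) %>%
  select(seat, state, alp, lnp, oth)
```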

The whole model is on GitHub, including the input data.

How the model performed

The forecasts of state-by-state primary votes were reasonably ok.

The biggest misses were the Other vote in South Australia—every pollster seemed to have trouble getting a read on the new Nick Xenophon Team—and the right-wing vote in the West, where the move from the LNP to Other turned out to be a mirage. The model also underestimated the ALP’s strength across the board, by a few percentage points in each state. That might have been because it was still overcorrecting a little for public polling’s failure to see the strength in the PUP vote in 2013.

The forecasts for individual seats turned out to be about as accurate as I expected, based on a couple of quick cross-validation exercises I did last year. Looking just at the electorates with a two-party-preferred number (meaning the final outcome was a contest between the ALP and LNP), the model picked 17 of them for the wrong side, plotted here in red. In forecasting the total, the five seats unexpectedly won by the LNP cancel out five of the ALP’s unexpected wins, so the sum was out by seven seats. Not terrible, but enough to mistakenly predict a solid LNP win instead of a squeaker.

These mistakes were partly because the model underestimated the ALP’s primary vote, but also because the government did worse than you would have expected from applying a uniform swing to the existing electoral pendulum.

I’d hoped to do better, but doing much better would largely have come down to luck. Nobody will manage to be Australia’s Nate Silver, with the dots in the chart lining up perfectly on the 45-degree line. Each US swing state has multiple public polls, appearing several times a week, each with a large sample. In Australia, it’s a big ask for polling houses to get their weightings right in individual electorates, and the betting markets’ track record in picking winners is not good. There’s just not enough data, and not enough voters.

I also calculated what would have happened if the model ignored the state-by-state primary votes it had estimated, and instead used a national two-party-preferred estimate across the board.
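That counterfactual is the classic pendulum arithmetic: take a single national two-party-preferred swing and add it to every seat’s margin on the existing pendulum. A sketch with invented numbers:

```r
# Uniform-swing counterfactual: one national two-party-preferred (TPP) change
# applied to every seat's 2013 margin. The margins and the swing are invented.
seats_tpp <- data.frame(
  seat         = c("Seat A", "Seat B", "Seat C"),
  alp_tpp_2013 = c(51.2, 44.0, 49.1)     # ALP share of TPP at the 2013 election
)

national_swing_to_alp <- 3.0             # hypothetical national TPP swing, in points

seats_tpp$alp_tpp_pred     <- seats_tpp$alp_tpp_2013 + national_swing_to_alp
seats_tpp$predicted_winner <- ifelse(seats_tpp$alp_tpp_pred > 50, "ALP", "LNP")
```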

The model’s errors (the red bars) are smaller in Victoria and the ACT, it’s a tie in Queensland, and the uniform swing would have done better in all other states. I don’t regret working on the more complex version of the model, because it was more fun, but a nationally uniform swing is clearly a safer bet.

Mistakes I made

Here are some painful learning experiences I took from this project.

Wrong language choice

I used R for everything. It was a great choice for the data preparation (dplyr) and exploratory plotting (ggplot2). And the FKF package is OK for state-space filtering, though it’s missing things like state smoothing.
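For anyone curious, here is a minimal local-level example of FKF on simulated data, not the real polling series, with the variances simply plugged in rather than estimated:

```r
library(FKF)

set.seed(1)
n     <- 200
trend <- 40 + cumsum(rnorm(n, sd = 0.2))   # simulated latent 'voting intention'
polls <- trend + rnorm(n, sd = 1.5)        # noisy poll readings of that trend

filt <- fkf(a0  = polls[1],                # initialise the state at the first poll
            P0  = matrix(100),             # vague initial state variance
            dt  = matrix(0), ct = matrix(0),
            Tt  = matrix(1), Zt = matrix(1),  # random-walk state, observed directly
            HHt = matrix(0.2^2),           # state innovation variance
            GGt = matrix(1.5^2),           # measurement (poll) noise variance
            yt  = rbind(polls))            # observations as a 1 x n matrix

filtered_trend <- drop(filt$att)           # filtered estimate of the latent trend
```

The real thing has a multivariate state (parties by states) and time-varying observation matrices, which is where the data wrangling described next comes in.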

The trouble was that the complex structure of the model required a certain amount of data work during estimation. Putting that inside R loops made it painfully slow: a few seconds for a single likelihood calculation. I wish I’d invested the time early on to switch to C++ or Julia for that part of the code.
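For what it’s worth, the lowest-friction version of that would probably have been Rcpp rather than a full rewrite: keep the pipeline in R and push just the hot loop into C++. A toy sketch, where the function is a stand-in rather than anything from the actual model:

```r
library(Rcpp)

# Toy stand-in for the kind of per-iteration bookkeeping that crawls as an R loop.
cppFunction('
NumericVector discounted_cumsum(NumericVector x, double decay) {
  int n = x.size();
  NumericVector out(n);
  double acc = 0.0;
  for (int i = 0; i < n; ++i) {
    acc = decay * acc + x[i];   // exponentially discounted running sum
    out[i] = acc;
  }
  return out;
}')

discounted_cumsum(rnorm(1e5), decay = 0.9)
```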

Parameter updates

Because the estimation step was inconveniently slow, I was very slack about updating the model’s parameters this year. This was OK for most pollsters, but it would have hurt the model’s performance a bit when dealing with relative newcomers like the new Newspoll and Ipsos.

Handling of uncertainty

I never got around to working out the model’s forecast uncertainty as carefully as I’d have liked. In particular, I did not incorporate parameter uncertainty. This seemed like a low priority, because there would only be one or two elections during the model’s lifetime, but it’s still embarrassing.

Cool things I learned

This project also had some fun learning experiences.

Makefiles

From my colleagues at Kaggle, I learned how amazing it can be to have a build pipeline. I used a Makefile to automate everything from data ingest to website deployment, which made it very easy to incorporate new data, reproduce results, and tinker with parts of the model. You can get more information about Makefiles from this meta-cheatsheet or the Make Book. However, while it’s a vast improvement over doing everything manually, GNU Make has a few issues that make it imperfect for data science, so I’ll be trying out Luigi for my next side-project.

Node.js

This was the first thing I built using Node. It’s very quick to learn, and the Express.js layer makes it easy to create and deploy web servers. You can build a complete webapp in about thirty minutes.

Shout-outs

I’m very grateful to the Poll Bludger and the Ghost Who Votes for curating so much Australian polling data. Thanks also to David Barry for generously sharing his cleaned set of polling-booth data, which I used for some exploratory work, and to Thomas McMahon for providing a large set of historical numbers.

As well as the Poll Bludger’s moving average, you can find another statistical polling model over at Mark the Ballot, and a judgement-based aggregation put together by Kevin Bonham. And now there’s Emma Chisit, a new statistical model built by Clinton Boys.

Thanks for reading this far, and thanks to everyone who interacted with the Phantom Trend.