## Can polls be reliable?

Election polls in many countries have seemed unusually unreliable recently. Why? And can they be fixed?

The most basic observation is that if one has a random sample of a population in which x% has some attribute then it is reasonable to estimate that x% of the whole population has that attribute, and that this estimate will tend to be more accurate the larger the sample is. In some polls sample size can be an issue, but not in the main political polls.

A fundamental problem with most polls is that the ‘random’ sample may not be uniformly distributed, with some sub-groups over or under represented. Political polls have some additional issues, that are sometimes blamed:

• People with certain opinions may be reluctant to express them, or may even mislead.
• There may be a shift in opinions with time, due to campaigns or events.
• Different groups may differ in whether they actually vote, for example depending on the weather.

I also think that in the UK the trend to postal voting may have confused things, as postal voters will have missed out on the later stages of campaigns, and on later events. (Which were significant in the UK 2017 general election.)

Pollsters have a lot of experience at compensating for these distortions, and are increasingly using ‘sophisticated mathematical tools’. How is this possible, and is there any residual uncertainty?

Back to mathematics, suppose that we have a science-like situation in which we know which factors (e.g. gender, age, social class ..) are relevant. With a large enough sample we can partition the results by combination of factors, measure the proportions for each combination, and then combine these proportions, weighting by the prevalence of the combinations in the whole population. (More sophisticated approaches are used for smaller samples, but they only reduce the statistical reliability.)

Systematic errors can creep in in two ways:

1. Instead of using just the poll data, some ‘laws of politics’ (such as the effect of rain) or other heuristics (such as that the swing among postal votes will be similar to that for votes in person) may be wrong.
2. An important factor is missed. (For example, people with teenage children or grandchildren may vote differently from their peers when student fees are an issue.)

These issues have analogues in the science lab. In the first place one is using the wrong theory to interpret the data, and so the results are corrupted. In the second case one has some unnoticed ‘uncontrolled variable’ that can really confuse things.

A polling method using fixed factors and laws will only be reliable when they reasonably accurately the attributes of interest, and not when ‘the nature of politics’ is changing, as it often does and as it seems to be right now in North America and Europe. (According to game theory one should expect such changes when coalitions change or are under threat, as they are.) To do better, the polling organisation would need to understand the factors that the parties were bringing into play at least as well as the parties themselves, and possibly better. This seems unlikely, at least in the UK.

What can be done?

It seems to me that polls used to be relatively easy to interpret, possibly because they were simpler. Our more sophisticated contemporary methods make more detailed assumptions. To interpret them we would need to know what these assumptions were. We could then ‘aim off’, based on our own judgment. But this would involve pollsters in publishing some details of their methods, which they are naturally loth to do. So what could be done? Maybe we could have some agreed simple methods and publish findings as ‘extrapolations’ to inform debate, rather than predictions. We could then factor in our own assumptions. (For example, our assumptions about students turnout.)

So, I don’t think that we can expect reliable poll findings that are predictions, but possibly we could have useful poll findings that would inform debate and allow us to take our own views. (A bit like any ‘big data’.)

Dave Marsay