Can polls be reliable?

Election polls in many countries have seemed unusually unreliable recently. Why? And can they be fixed?

The most basic observation is that if one has a random sample of a population in which x% has some attribute then it is reasonable to estimate that x% of the whole population has that attribute, and that this estimate will tend to be more accurate the larger the sample is. In some polls sample size can be an issue, but not in the main political polls.
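As a rough illustration of how accuracy improves with sample size, here is a minimal sketch using the usual normal approximation for a sample proportion. It assumes a genuinely random sample, which is exactly what the rest of this post calls into question. The function name, the 40% figure and the sample sizes are illustrative assumptions, not taken from any actual poll.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Uses the normal approximation, which assumes a simple random sample.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical: 40% of the sample has the attribute, for three sample sizes.
for n in (100, 1000, 10000):
    moe = margin_of_error(0.4, n)
    print(f"n = {n:>5}: 40% +/- {100 * moe:.1f} points")
```

With a sample of around a thousand the purely statistical error is about three percentage points, so sampling error alone would not account for large misses.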

A fundamental problem with most polls is that the ‘random’ sample may not be representative of the population, with some sub-groups over- or under-represented. Political polls have some additional issues that are sometimes blamed:

  • People with certain opinions may be reluctant to express them, or may even mislead.
  • There may be a shift in opinions with time, due to campaigns or events.
  • Different groups may differ in whether they actually vote, for example depending on the weather.

I also think that in the UK the trend to postal voting may have confused things, as postal voters will have missed out on the later stages of campaigns, and on later events. (Which were significant in the UK 2017 general election.)

Pollsters have a lot of experience at compensating for these distortions, and are increasingly using ‘sophisticated mathematical tools’. How is this possible, and is there any residual uncertainty?

Returning to the mathematics, suppose that we have a science-like situation in which we know which factors (e.g. gender, age, social class) are relevant. With a large enough sample we can partition the results by combination of factors, measure the proportions for each combination, and then combine these proportions, weighting by the prevalence of the combinations in the whole population. (More sophisticated approaches are used for smaller samples, but they rely on further assumptions and so only reduce the statistical reliability.)
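The partition-and-reweight procedure just described can be sketched as follows. This is only a minimal illustration with a single factor (age) and invented numbers. The function name `poststratify` and the data layout are my assumptions, not any pollster's actual method.

```python
def poststratify(sample, population_shares):
    """Combine per-cell proportions, weighted by each cell's population share.

    sample: dict mapping cell -> (respondents, respondents with the attribute)
    population_shares: dict mapping cell -> share of the whole population
    """
    estimate = 0.0
    for cell, share in population_shares.items():
        respondents, with_attribute = sample[cell]
        if respondents == 0:
            raise ValueError(f"no respondents in cell {cell!r}")
        estimate += share * (with_attribute / respondents)
    return estimate


# Hypothetical poll: the young are under-represented in the sample.
sample = {"young": (200, 120), "old": (800, 320)}
population_shares = {"young": 0.4, "old": 0.6}

raw = sum(k for _, k in sample.values()) / sum(n for n, _ in sample.values())
weighted = poststratify(sample, population_shares)
print(f"raw estimate:      {raw:.2f}")
print(f"weighted estimate: {weighted:.2f}")
```

In this example the raw figure understates support because the more favourable group is under-sampled. The correction only works if the population shares are known and the chosen factors are the right ones.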

Systematic errors can creep in in two ways:

  1. Instead of using just the poll data, pollsters rely on some ‘laws of politics’ (such as the effect of rain) or other heuristics (such as that the swing among postal votes will be similar to that for votes in person), and these may be wrong.
  2. An important factor is missed. (For example, people with teenage children or grandchildren may vote differently from their peers when student fees are an issue.)

These issues have analogues in the science lab. In the first case one is using the wrong theory to interpret the data, and so the results are corrupted. In the second case one has some unnoticed ‘uncontrolled variable’ that can really confuse things.
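The effect of a missed factor can be shown with a small deterministic calculation. Every number here is invented: I assume a hidden factor (having student-age family, as in the student fees example) that affects both how people vote and how likely they are to respond. The helper `weighted_estimate` is hypothetical and works with expected values rather than a simulated sample.

```python
# Hypothetical cells: (age, has student-age family) ->
#   (population share, support for a party, response rate)
cells = {
    ("young", True): (0.2, 0.7, 0.5),
    ("young", False): (0.2, 0.5, 1.0),
    ("old", True): (0.2, 0.6, 0.5),
    ("old", False): (0.4, 0.3, 1.0),
}


def weighted_estimate(cells, key):
    """Expected post-stratified estimate when weighting on key(label) only."""
    pop, resp, supp = {}, {}, {}
    for label, (share, support, response) in cells.items():
        group = key(label)
        pop[group] = pop.get(group, 0.0) + share
        resp[group] = resp.get(group, 0.0) + share * response
        supp[group] = supp.get(group, 0.0) + share * response * support
    # Per-group sample proportion, weighted by the group's population share.
    return sum(pop[g] * supp[g] / resp[g] for g in pop)


true_support = sum(share * support for share, support, _ in cells.values())
age_only = weighted_estimate(cells, key=lambda label: label[0])
both_factors = weighted_estimate(cells, key=lambda label: label)

print(f"true support:             {true_support:.3f}")
print(f"weighted by age only:     {age_only:.3f}")
print(f"weighted by both factors: {both_factors:.3f}")
```

Weighting by age alone leaves a bias of several points however large the sample is, while weighting by both factors recovers the true figure. The difficulty, of course, is knowing in advance that the second factor matters.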

A polling method using fixed factors and laws will only be reliable when they reasonably accurately capture the attributes of interest, and not when ‘the nature of politics’ is changing, as it often does and as it seems to be right now in North America and Europe. (According to game theory one should expect such changes when coalitions change or are under threat, as they are.) To do better, the polling organisation would need to understand the factors that the parties were bringing into play at least as well as the parties themselves, and possibly better. This seems unlikely, at least in the UK.

What can be done?

It seems to me that polls used to be relatively easy to interpret, possibly because they were simpler. Our more sophisticated contemporary methods make more detailed assumptions. To interpret them we would need to know what these assumptions were. We could then ‘aim off’, based on our own judgment. But this would involve pollsters in publishing some details of their methods, which they are naturally loth to do. So what could be done? Maybe we could have some agreed simple methods and publish findings as ‘extrapolations’ to inform debate, rather than predictions. We could then factor in our own assumptions. (For example, our assumptions about student turnout.)
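As a sketch of what ‘factoring in our own assumptions’ might look like, suppose a pollster published support by group and left turnout to the reader. The groups, figures and the function `extrapolate` are all hypothetical. The point is only that the same published findings give different extrapolations under different turnout assumptions.

```python
def extrapolate(support, electorate_share, turnout):
    """Vote share among those who vote, under the reader's turnout assumption.

    All three arguments are dicts keyed by group.
    """
    votes = {g: electorate_share[g] * turnout[g] for g in support}
    total_votes = sum(votes.values())
    return sum(votes[g] * support[g] for g in votes) / total_votes


# Hypothetical published findings.
support = {"students": 0.7, "others": 0.4}
electorate_share = {"students": 0.1, "others": 0.9}

# Two readers with different views on student turnout.
low = extrapolate(support, electorate_share, {"students": 0.4, "others": 0.7})
high = extrapolate(support, electorate_share, {"students": 0.7, "others": 0.7})
print(f"low student turnout:  {low:.3f}")
print(f"high student turnout: {high:.3f}")
```

Here the two assumptions differ by a little over a point, which could matter in a close contest, and the reader can see exactly where the difference comes from.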

So, I don’t think that we can expect reliable poll findings that are predictions, but possibly we could have useful poll findings that would inform debate and allow us to take our own views. (A bit like any ‘big data’.)

Dave Marsay

 


About Dave Marsay
Mathematician with an interest in 'good' reasoning.

2 Responses to Can polls be reliable?

  1. Polls used to be relatively easy to interpret because response rates were much higher:

    “When I [Cliff Zukin] first started doing telephone surveys in New Jersey in the late 1970s, we considered an 80 percent response rate acceptable, and even then we worried if the 20 percent we missed were different in attitudes and behaviors than the 80 percent we got. Enter answering machines and other technologies. By 1997, Pew’s response rate was 36 percent, and the decline has accelerated. By 2014 the response rate had fallen to 8 percent.”

    The sophisticated contemporary methods pollsters use now make more detailed assumptions because extrapolation to the population requires information that isn’t present in the sample, so they are forced to use their best guesses just to get anywhere at all.

    I agree that we can’t expect reliable poll findings that are predictions, but I think we can get reliable information about *changes* in public opinion. That is, I think the bias in poll results will be relatively stable and will cancel out when calculating differences (contra Andrew Gelman, who thinks differential nonresponse explained some big shifts in the polling in the last US election).

    • Dave Marsay says:

      Your points are valid and widely accepted. I am pointing out a different – and I argue more fundamental – problem. Although I think it important with respect to polling, my motivation is more to use it to illustrate a general problem with applied statistics, including in science. (As on my blog.)

      I accept that changes are easier to estimate than absolute values, but even here there is a problem.
