April | 2011 | djmarsay

ESP and significance

April 17, 2011

‘Understanding Uncertainty’ has a blog (‘uu blog’) on ESP and significance. The challenge for those who do not believe in ESP is an experiment that seems to show a mild but ‘statistically significant’ ESP effect. This could be like a drug company that tests lots of drugs until one gives a ‘statistically significant’ result, but from the account the result seems more significant than that.

The problem for an ESP atheist who is also a Bayesian lies in trying to interpret the result of a significance test as a (subjective) probability that some ESP was present, as the above blog discusses. But from a sequential testing point of view (e.g. Wald’s) we would simply treat significance as a threshold that prompts us to test the conclusion further. In typical science one would repeat the experiment and discount the original result if it was not repeated. But with ESP the ‘aura’ of the experimenter or place may matter, so a failure by others to replicate a result may simply mean that ESP shows up in the experimental set-up only some of the time. So what is a ‘reasonable’ acceptance criterion?

Jack Good discussed the issues arising from ESP in some detail, including those above. He developed the notion of ‘weight of evidence’, which is the log of the appropriate likelihood ratio. His approach differs technically from that of the ‘uu blog’, in ways that offer some advantages.

If e is the evidence/data obtained from an experiment and h is a hypothesis (e.g. the null hypothesis) then P(e|h) denotes the likelihood, where P() is the (Bayesian) probability. To be well-defined the likelihood should be entailed by the hypothesis.
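To make these definitions concrete, here is a minimal sketch of Good’s weight of evidence for a card-guessing experiment. It assumes (an assumption of this sketch, not something stated in the ‘uu blog’ account) that each guess is an independent trial, so that a hypothesis h fixes a success probability p and P(e|h) is binomial. The weight of evidence is the log of P(e|h_alt)/P(e|h_null); the sketch reports it in decibans (10 × log10 of the ratio), the unit used at Bletchley Park. The function names and the 520-out-of-1,000 data are hypothetical.

```python
import math


def binomial_log_likelihood(hits: int, trials: int, p: float) -> float:
    """Natural log of P(e|h): the chance of exactly `hits` correct guesses
    in `trials`, if hypothesis h says each guess succeeds with probability p
    (0 < p < 1). Assumes independent guesses (a modelling assumption)."""
    log_choose = (math.lgamma(trials + 1) - math.lgamma(hits + 1)
                  - math.lgamma(trials - hits + 1))
    return log_choose + hits * math.log(p) + (trials - hits) * math.log(1 - p)


def weight_of_evidence(hits: int, trials: int, p_alt: float,
                       p_null: float = 0.5) -> float:
    """Weight of evidence in favour of h_alt over h_null, in decibans:
    10 * log10(P(e|h_alt) / P(e|h_null))."""
    log_ratio = (binomial_log_likelihood(hits, trials, p_alt)
                 - binomial_log_likelihood(hits, trials, p_null))
    return 10 * log_ratio / math.log(10)


# Hypothetical data: 520 correct guesses out of 1,000 two-way guesses.
print(round(weight_of_evidence(520, 1000, 0.52), 2))
```

The binomial coefficient is common to both hypotheses and cancels in the ratio; more generally, any dependence on how finely the data are recorded cancels in the same way, which is one reason a likelihood ratio is better behaved than a raw likelihood.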

One problem is that the likelihood depends on the granularity with which we measure the data, and so – on its own – is meaningless. In significance testing one defines E(e) to be the set of all data that is at least as ‘extreme’ as e, and uses the likelihood P(E(e)|h) to determine ‘1-significance’. But (as in ‘uu blog’) what one really wants is P(¬h|e).
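As an illustration, the significance-test quantity P(E(e)|h) can be sketched under an assumed binomial model of independent guesses, taking E(e) to be every score at least as high as the one observed. That is a one-sided choice; a two-sided test would also count equally extreme low scores. The names and data are hypothetical. Note that the result is a probability of data given the null hypothesis, not P(¬h|e).

```python
import math


def tail_probability(hits: int, trials: int, p_null: float = 0.5) -> float:
    """P(E(e)|h): the probability, if the null hypothesis h holds, of scoring
    at least `hits` correct guesses out of `trials`. E(e) is taken to be
    'this score or higher' (a one-sided test)."""
    total = 0.0
    for k in range(hits, trials + 1):
        # Log of the binomial probability of exactly k hits, kept in log
        # space so that large numbers of trials do not overflow.
        log_term = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                    - math.lgamma(trials - k + 1)
                    + k * math.log(p_null) + (trials - k) * math.log(1 - p_null))
        total += math.exp(log_term)
    return total


# Hypothetical data: 520 correct guesses out of 1,000.
print(tail_probability(520, 1000))
```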

In this experiment one is not comparing one theory or model with another, but a statistical ‘null hypothesis’ with its complement, which is very imprecise, so it is not clear what the appropriate likelihood is. ‘uu blog’ describes the Bayesian approach of having prior distributions for how great an ESP effect might be, if there is one. To me this is rather like estimating how many angels one could get on a pin-head. An alternative is to use Jack Good’s ‘generalized likelihood’. In principle one considers all possible theories and takes the likelihood of the one that best explains the evidence. This is then used to form a likelihood ratio, as in ‘uu blog’, or the log likelihood ratio is used as a ‘weight of evidence’, as at Bletchley Park. In this ESP case one might model each subject as having some probability of guessing correctly, and vary that probability to get the best likelihood. (This seems to be about 52%, as against the 50% of the null hypothesis.) Because the alternative to the null hypothesis includes biases that are arbitrarily close to the null hypothesis, one will ‘almost always’ find some positive or negative ESP effect. The interesting thing would be to consider the distribution of such apparent effects under the null hypothesis, and hence judge the significance of a result of 52%.
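The generalized-likelihood procedure just described can be sketched for the card-guessing case, assuming each guess is an independent trial with a fixed success probability. The best-fitting probability is then simply the observed hit rate, and the distribution of apparent effects under the null hypothesis is estimated by simulation. All names, the run counts and the seed are hypothetical choices for illustration.

```python
import math
import random


def log_likelihood(hits: int, trials: int, p: float) -> float:
    """Log-likelihood of `hits` successes in `trials` at success rate p.
    The binomial coefficient is omitted because it cancels in any ratio;
    zero-count terms are skipped so that p = 0 or 1 is handled."""
    ll = 0.0
    if hits:
        ll += hits * math.log(p)
    if trials - hits:
        ll += (trials - hits) * math.log(1 - p)
    return ll


def generalized_log_lr(hits: int, trials: int, p_null: float = 0.5) -> float:
    """Log of the generalized likelihood ratio: the best-fitting rate
    p_hat = hits / trials against the null rate. Never negative, and it
    grows for effects in either direction (above or below p_null)."""
    p_hat = hits / trials
    return log_likelihood(hits, trials, p_hat) - log_likelihood(hits, trials, p_null)


def null_distribution(trials: int, runs: int = 2000, p_null: float = 0.5,
                      seed: int = 1) -> list:
    """Generalized log-LRs from simulated experiments in which the null
    hypothesis is true, i.e. the 'apparent effects' produced by chance alone."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        hits = sum(rng.random() < p_null for _ in range(trials))
        results.append(generalized_log_lr(hits, trials, p_null))
    return results


def empirical_p_value(hits: int, trials: int, runs: int = 2000,
                      p_null: float = 0.5, seed: int = 1) -> float:
    """Fraction of simulated null experiments whose apparent effect is at
    least as strong as the observed one."""
    observed = generalized_log_lr(hits, trials, p_null)
    sims = null_distribution(trials, runs, p_null, seed)
    # Small tolerance so that mirror-image scores (e.g. 480 vs 520) count as ties.
    return sum(s >= observed - 1e-9 for s in sims) / runs
```

With a hypothetical score of 520 out of 1,000, `empirical_p_value(520, 1000)` estimates how often pure chance produces an apparent effect, positive or negative, at least as strong. As a cross-check, Wilks’ theorem says that twice the generalized log-likelihood ratio is approximately chi-squared with one degree of freedom under the null hypothesis.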

This seems a reasonable thing to do, even though there may be many hypotheses that we haven’t considered and so our test is quite weak. It is up to those claiming ESP to put forward hypotheses for testing.

A difficulty with the above procedure is that investigators and journals tend to report only positive results (‘uu blog’ hints at this). According to Bayesians one should first estimate how many similar experiments have been done and then accept ESP as ‘probable’ if a result appears sufficiently significant. I’m afraid I would rather work the other way: assess how many experiments there would have to be to make an apparently significant result really significant, and then judge whether it was credible that so many experiments had been done. Even if not, I would remain rather cynical unless and until the experiment could be refined to give a more definite and repeatable effect. Am I unscientific?
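The ‘work the other way’ check can be sketched numerically. Assume, hypothetically, that unreported experiments are independent and that each has probability p of producing a result at least as significant as the published one when there is no ESP. The chance that at least one of N such experiments does so is then 1 - (1 - p)^N. The sketch finds the smallest N that makes such a fluke at least as likely as not; the 50% target is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math


def chance_of_fluke(p_value: float, experiments: int) -> float:
    """Probability that at least one of `experiments` independent experiments,
    run with no real effect, gives a result at least this significant."""
    return 1 - (1 - p_value) ** experiments


def experiments_needed(p_value: float, target: float = 0.5) -> int:
    """Smallest number of such experiments for which a fluke at least this
    significant has probability `target` or more of turning up somewhere."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_value))


for p in (0.05, 0.01, 0.001):
    print(p, experiments_needed(p))
```

For a result significant at the 5% level, `experiments_needed(0.05)` returns 14, so around fourteen unpublished null experiments would make such a result unremarkable. Whether that many is credible remains a matter of judgment.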

Dave Marsay

Filed under Information, Uncertainty Tagged with Bayesian probability, Statistics

AV or FPTP, according to wikipedia and Jenkins

April 2, 2011

The choice: FPTP or AV?

The UK has this choice on May 5th. (AV is also known as Instant Runoff Voting.) The debate so far hasn’t been particularly enlightening. Here I consider the advice from wikipedia and the UK Jenkins Commission, with a short note on tactical voting.

Wikipedia

Wikipedia gives a comparison of AV to other voting systems. We are interested in FPTP, a variant of plurality voting. Wikipedia has a table showing that both methods are about equally common. But which is best?

Advantages of FPTP

Wikipedia shows the following advantages for FPTP:

Preservation of “one person, one vote” principle.
Moderation
Fewer minority parties.

It notes that IRV (AV) is also generally regarded as satisfying ‘one person one vote’, so this can be discounted. The other two are partially true, but one needs to consider the whole truth.

The UK ‘No’ campaign makes a number of other claims, but they don’t seem to have any validity.

Moderation

Wikipedia notes:

Under a first-past-the-post system, voters are often afraid of “wasting” their vote on a candidate unlikely to win, so they cast their vote towards their most preferable choice possible of victory. Advocates of plurality voting suggest that this results in most candidates having to field a fairly moderate or centrist position.

This suggests that a moderate or centrist result is desirable. Where FPTP relies on tactical voting (which seems not to be very common) to achieve this, AV tends to achieve it by design. AV also satisfies what Wikipedia calls the mutual majority criterion. This is quite technical, but is linked to the notion of majority rule.

Majority rule

Majority rule is the binary decision criterion that if most people prefer A to B then A will be selected. For three or more choices all deterministic methods are technically vulnerable to tactical voting, so one needs to decide which desiderata are essential, and which can be compromised.

Suppose that one has a tribal society in which the biggest tribe, commanding 26% of the vote, uses its dominance to repress the other tribes. Suppose that all the other tribes would prefer a representative from any tribe other than the biggest. Then majority rule demands that they get one. But if every tribe puts up a candidate then, under FPTP, the biggest tribe may win because of vote splitting.
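The tribal example can be checked with a small count. This sketch assumes, purely for illustration, three other tribes with 25%, 25% and 24% of the vote, each ranking the biggest tribe’s candidate A last; their lower preference orders are invented. It implements a plain single-seat AV (instant-runoff) count alongside an FPTP count, with ties broken arbitrarily. The function names are hypothetical.

```python
from collections import Counter

# A ballot group is (number of voters, ranking), most preferred candidate first.


def fptp_winner(ballots):
    """First past the post: only first preferences are counted."""
    tally = Counter()
    for count, ranking in ballots:
        tally[ranking[0]] += count
    return tally.most_common(1)[0][0]


def av_winner(ballots):
    """Alternative Vote (instant runoff): repeatedly eliminate the candidate with
    the fewest votes, moving each ballot to its next surviving preference, until
    someone has a majority of the ballots still in play. Ties are broken
    arbitrarily in this sketch."""
    remaining = {c for _, ranking in ballots for c in ranking}
    while True:
        tally = Counter({c: 0 for c in remaining})
        for count, ranking in ballots:
            for candidate in ranking:
                if candidate in remaining:
                    tally[candidate] += count
                    break
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()) or len(remaining) == 1:
            return leader
        remaining.remove(min(tally, key=tally.get))


# Hypothetical tribes: A is the biggest tribe; the others all rank A last.
tribes = [
    (26, ("A", "B", "C", "D")),
    (25, ("B", "C", "D", "A")),
    (25, ("C", "D", "B", "A")),
    (24, ("D", "B", "C", "A")),
]
print(fptp_winner(tribes), av_winner(tribes))  # prints: A B
```

Here FPTP elects A on 26% of first preferences. AV eliminates D and then C, and elects B, whom all 74% of the other tribes rank above A.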

AV is not liable to vote splitting and respects the mutual majority criterion, and hence is ‘democratic’ in a different (fuller?) sense than FPTP, under which it seems to be considered a virtue that opposition parties must form a pre-election agreement on a common candidate.

Fewer minority parties

FPTP encourages both tactical voting and strategic agreement, coalitions or unions of parties, to avoid gross vote splitting. This leads to fewer minority parties, either because tactical voters don’t vote for them or because they merge with other parties. Under AV voters can vote for a minority party without wasting their vote. Voting will also generally be less tactical, and thus, unlike FPTP, a minority is not disadvantaged in building to a majority over a series of elections.

Summary from FPTP perspective

FPTP rewards politicking and ‘strong parties’ by rewarding tactical voting and pre-election agreements between candidates or parties. It disadvantages minority parties, such as the UK Greens and BNP. Are these good things?

Under FPTP a minority party that was most voters’ last choice could gain or retain power by ‘divide and rule’. Is this a problem?

Advantages of AV

Wikipedia notes many advantages of AV, including the mutual majority criterion. It has this special case:

Instant-runoff voting [aka AV] also passes the Condorcet loser criterion, which requires that if a candidate would lose a head-to-head competition against every other candidate, they must not win the overall election. First-past-the-post does not meet this criterion, indeed it is usually violated in elections with more than two popular candidates.
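The Condorcet loser criterion can be tested directly from ranked ballots by running every head-to-head contest. This sketch assumes complete rankings and uses hypothetical ballots modelled on the tribal example above (the 26/25/25/24 split and the lower preferences are invented); the function names are also hypothetical.

```python
# A ballot group is (number of voters, ranking); rankings must be complete.


def head_to_head(ballots, x, y):
    """Votes ranking x above y, and votes ranking y above x."""
    x_first = sum(n for n, ranking in ballots if ranking.index(x) < ranking.index(y))
    total = sum(n for n, _ in ballots)
    return x_first, total - x_first


def beats(ballots, x, y):
    """True if a majority of those voting rank x above y."""
    x_over_y, y_over_x = head_to_head(ballots, x, y)
    return x_over_y > y_over_x


def condorcet_loser(ballots):
    """The candidate who loses every head-to-head contest, or None."""
    candidates = sorted({c for _, ranking in ballots for c in ranking})
    for c in candidates:
        if all(beats(ballots, other, c) for other in candidates if other != c):
            return c
    return None


# Hypothetical ballots: every tribe other than A's ranks A last.
tribes = [
    (26, ("A", "B", "C", "D")),
    (25, ("B", "C", "D", "A")),
    (25, ("C", "D", "B", "A")),
    (24, ("D", "B", "C", "A")),
]
print(condorcet_loser(tribes))  # prints: A
```

Candidate A loses every pairwise contest by 26 to 74, yet leads on first preferences and so would win under FPTP. That is exactly the kind of violation the quotation describes.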

Tactical voting

All methods are vulnerable to tactical voting, so some compromise is required. Under FPTP vote-splitting can encourage you to vote tactically to avoid your worst-case choice winning. Wikipedia notes that, while not perfect, “alternative vote is quite resistant to strategy” (i.e., tactical voting). Under AV you are encouraged to vote tactically when:

There are majority preferences that form a cycle, as in A over B, B over C, C over A.
You can be sure that the other candidates’ supporters aren’t voting tactically, or if they are, what proportion is voting tactically, and how.

The first condition is normally considered rare, and is where all methods have a problem. The second makes tactical voting much more risky: whereas under FPTP tactical voting is usually straightforward, under AV it is far from this.
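The first condition can be illustrated with three hypothetical, equally sized voting blocs whose rankings form a cycle. The sketch counts head-to-head majorities and lists any three-candidate majority cycle. With such ballots every candidate is beaten by some other candidate, so no method can respect majority rule in every pairwise contest. The names are hypothetical, and complete rankings are assumed.

```python
from itertools import permutations


def prefers(ballots, x, y):
    """Number of voters who rank x above y (ballots are (count, ranking) pairs
    with complete rankings, most preferred first)."""
    return sum(n for n, ranking in ballots if ranking.index(x) < ranking.index(y))


def majority_cycles(ballots):
    """Return each ordered triple (x, y, z) in which x beats y, y beats z and
    z beats x head-to-head; a cycle therefore appears once per rotation."""
    candidates = sorted({c for _, ranking in ballots for c in ranking})

    def beats(x, y):
        return prefers(ballots, x, y) > prefers(ballots, y, x)

    return [(x, y, z) for x, y, z in permutations(candidates, 3)
            if beats(x, y) and beats(y, z) and beats(z, x)]


# Hypothetical blocs: A > B > C, B > C > A and C > A > B, one voter each.
cyclic = [(1, ("A", "B", "C")), (1, ("B", "C", "A")), (1, ("C", "A", "B"))]
print(majority_cycles(cyclic))  # prints: [('A', 'B', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B')]
```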

Jenkins Commission

The UK’s Jenkins Commission took a broad view of the political implications of FPTP versus AV and other systems. The defects of FPTP were:

A tendency to result in landslides.
Disadvantages third parties, even strong ones.
Disadvantages parties with even support across constituencies.
The essential contest is fought over a few ‘marginal’ seats.
It leads to ‘perverse’ results.
It advantages the ruling party.

The report also noted the advantages of FPTP listed above. Further:

Fairness to voters is the first essential. A primary duty of an electoral system is to represent the wishes of the electorate as effectively as possible. The major ‘fairness’ count against First Past the Post is that it distorts the desires of the voters. That the voters do not get the representation they want is more important than that the parties do not get the seats to which they think they are entitled.

And:

It [AV] would also virtually ensure that each MP commanded at least majority acquiescence within his constituency, which is far from being the case under FPTP, where as we have seen nearly a half of members have more opponents than supporters, and, exceptionally, a member can be elected (as in Inverness in 1992) with as little as 26% of the vote.

There were no criticisms of AV at the constituency level. The most significant criticism was that it is not proportional. AV is not necessarily any more proportional than FPTP, but proportionality is not at issue in the UK referendum. In any case, some people prefer methods that tend to produce enhanced majorities.

Consequences

Wikipedia seems to provide a relatively independent summary of voting systems, including the unavoidable problems and the pros and cons of FPTP and AV (aka IRV) considered across many countries and years. A comparison with the UK’s current YES and NO campaigns seems instructive: not everything they say should be taken at face value.

One way to decide would be to rate yourself on the following scales:

We need a system that gives the sitting candidate / party an advantage … or not.
We need a system that gives the two leading candidates / parties an advantage … or not.
We need a system that discriminates against minority parties such as the BNP … or we don’t want to discriminate against parties like the Green party.
We want a system that tends to leave us with the same old two main parties … or we want a system that allows new parties to grow and potentially overtake the old parties, as long as they have support.
We want a system that rewards politicking and tactical voting … or not.
We want every vote to be counted once (in a technical sense) … or we want to make sure that votes are not split.
We want a system that tends to stable government, by giving the ruling party an advantage … versus we want a system that will enable us to oust the ruling party when the majority wish to do so.
We want the winner to have the most first preferences … or we want to reject every candidate that belongs to a group such that some majority prefers all of those outside the group to all of those inside (as in the criterion above).

In each pair of alternatives, the first would indicate FPTP, the second AV. Another way to choose is to think about the current situation. If the government does well the Conservatives are likely to get a working majority next time. If the government does badly then Labour should do well. The voting system should only affect things if the government’s performance is debatable, so if you support one of the major parties then the voting system will not matter unless your party’s policy turns out to be wrong. What might favour the Liberals? The coalition doing okay, with many floating voters thinking that things could have gone much worse without the Liberals’ moderating influence. Thus, if the Liberals’ manifesto was essentially correct they might do better than last time, more so under AV. This could be seen as reasonable. I can’t think of a credible situation in which AV favours a party whose manifesto ‘line’ was clearly wrong.

The longer-term impacts, such as whether the BNP and the Greens continue to be suppressed, seem more significant. For example, Labour might split into ‘new’ and ‘old’, allowing the electorate to choose. Similarly, if the government does badly the Conservatives might split into ‘wet’ and ‘dry’, giving voters a choice. Under FPTP the two main parties are effectively coalitions, with only party members choosing the ‘flavour’. AV would give others a say, which would presumably be a moderating influence. Which is more reasonable?

Which of all these considerations is more important? My current view is that:

AV better respects the wishes of the majority
FPTP encourages tactical voting
AV gives candidates with many first preferences some advantage over ‘wishy-washy’ candidates that are ranked reasonably high by most voters but are the first preference of few. This is less of a factor than under FPTP, but seems reasonable.

djmarsay