The sleeping beauty problem is:
Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice, during the experiment, Beauty will be wakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening. A fair coin will be tossed to determine which experimental procedure to undertake: if the coin comes up heads, Beauty will be wakened and interviewed on Monday only. If the coin comes up tails, she will be wakened and interviewed on Monday and Tuesday. In either case, she will be wakened on Wednesday without interview and the experiment ends.
Any time Sleeping Beauty is wakened and interviewed, she is asked, “What is your belief now for the proposition that the coin landed heads?”
It is implicit that SB will not be able to ask any questions, will get no clues about the day, and will not have any grounds for thinking the coin toss was one way or the other, or what day it is, beyond those stated. As a mathematician, I am interested in what SB’s ‘belief’ should be if she is a mathematician. Thus, the mathematical interpretation of the question is meant to be straightforward. Ideally, then, one would apply the appropriate theory to yield a clear result.
There are various solutions sincerely held, which I summarise below.
P(Heads) is the prior probability of Heads (1/2). P(Heads | Awoken) is the probability of Heads when SB is awoken, and so on.
Halfer: Just before she goes to sleep, SB knows that P(Heads) = 1/2. She gets no new information when she is awoken, so
P(Heads | Awoken) = P(Heads) = 1/2.
Thirder: If the experiment were repeated many times and SB were invited to gamble on Heads or Tails at each awakening, then she should regard odds of 1:2 as being fair, implying P(Heads | Awoken) = 1/3.
Philosophical: There is something odd about the question (take your pick), making it in some sense improper.
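The repeated-experiment picture behind the Thirder argument can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate`, the number of runs and the seed are my own choices, not part of the problem. It counts Heads in two ways, once per experiment and once per awakening, which is where the two competing figures come from.

```python
import random


def simulate(n_experiments, seed=0):
    """Run the experiment n_experiments times; return two Heads frequencies."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    heads_experiments = 0  # experiments in which the coin landed Heads
    heads_awakenings = 0   # interviews that take place after a Heads toss
    total_awakenings = 0   # all interviews, Monday and Tuesday
    for _ in range(n_experiments):
        heads = rng.random() < 0.5  # fair coin
        # Heads: interviewed on Monday only; Tails: Monday and Tuesday.
        awakenings = 1 if heads else 2
        total_awakenings += awakenings
        if heads:
            heads_experiments += 1
            heads_awakenings += 1
    per_experiment = heads_experiments / n_experiments
    per_awakening = heads_awakenings / total_awakenings
    return per_experiment, per_awakening


if __name__ == "__main__":
    per_experiment, per_awakening = simulate(100_000)
    print(f"Heads frequency per experiment: {per_experiment:.3f}")
    print(f"Heads frequency per awakening:  {per_awakening:.3f}")
```

The per-experiment frequency should settle near 1/2 and the per-awakening frequency near 1/3. The simulation does not say which of the two SB's 'belief' ought to track; it only makes the two counting conventions explicit.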
The Halfer solution is in terms of information theory, but Shannon defines information in terms of probability, so if the argument is valid we should be able to restate it in terms of probability, without recourse to the concept of ‘information’. Similarly, although some would disagree, probability seems to me more fundamental than decisions.
The philosophical argument seems attractive, in that the thesis of my blog is that the received wisdom on probability is limited. In particular it seems to me that Bayes’ rule can be misused, and that there appears to be an example here, in that Bayes’ rule seems to imply Halfer yet the Thirder argument (while couched inappropriately) seems capable of being adapted to give the answer.
But none of the above attempts a mathematical approach as such, so here I aim to remedy this. The treatment is, I hope, accessible and capable of being made formal.
Given a body of theory, such as probability theory, it is quite usual to attempt to apply the results and methods of that theory to the problem at hand. But often it is appropriate to unbundle the canned theory, to check that all of its assumptions are appropriate, and to develop some variant.
In this case, from a mathematical perspective, the usual notation can be somewhat misleading. As Jack Good pointed out, a probability such as P(X) depends on the context, C, and is better denoted P(X : C). Thus the initial probability of Heads might be denoted P(Heads : Initial) = P(Heads : Fair coin), and the probability of Heads on being awoken is P(Heads : Awoken).
In the ‘Halfer’ argument, SB’s estimate of P(Heads) when awoken should be the same as her initial estimate of P(Heads : Awoken), but this may differ from P(Heads : Initial). If we wish to apply Bayes’ rule then we need some evidence, E, such that P(Heads | E : Initial) = P(Heads : Awoken). It is not obvious what this would be. (It clearly isn’t nothing.) In applying Bayes’ rule we would also need to establish P(E | Heads : Initial). Again, it isn’t obvious how to do this. (Hence the controversy.)
My proposal, for comment
There are many variants of the foundations of probability theory, but probability is often conceived of as a normalised measure. But Bayes’ theorem remains true for un-normalised measures. Thus for a measure Q( ) that satisfies all the conditions of a probability measure apart from not necessarily being normalised, we can define
Q(A|B) = Q(A^B)/Q(B),
as an extension of the usual conditional probability. The likelihood ratio in Bayes’ rule then becomes
Q(E|Tails)/Q(E|Heads).
This is clearly 2, even though we don’t know what E or Q(E|Tails) are. Hence we get the Thirder result.
This result is conditional on there existing a valid measure Q( ). But as we have a rather small event space, we can construct this explicitly. For example, Q(Heads) = P(Heads : Fair coin) etc., Q(E^Heads) = x > 0, a variable, and Q(E^Tails) = 2x. Then for any choice of x, we have a valid measure, and Q(Heads|E) = x/(x + 2x) = 1/3. The result is independent of x.
A question remains as to how meaningful the result is. Can we claim to know P(Heads|E) without knowing x? Can Q( ) be normalised? How do we interpret our result?
A feature of the SB problem is that it mixes randomness and simple duplication, which confuses us. If we let Q(E) be something like ‘the number of times E should be expected, relative to the appropriate base’ then this covers both randomness and deterministic duplication. If we keep in mind Jack Good’s concerns about context, we will hopefully use the base appropriate to the context.
Then Q(E|Heads) = 1, Q(E|Tails) = 2 (i.e., x = 1/2).
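As a check on the arithmetic, here is a minimal sketch of the un-normalised calculation using exact fractions. It assumes that the Tails branch carries twice the weight of the Heads branch (two awakenings against one), which is what the Thirder result requires. The function name `heads_given_e` and the sample values of x are hypothetical choices for illustration.

```python
from fractions import Fraction


def heads_given_e(x):
    """Return (Q(Heads|E), likelihood ratio) for a given scale factor x > 0."""
    x = Fraction(x)
    prior = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}  # fair coin
    # Assumed weights: Tails counts twice as much as Heads.
    joint = {"Heads": x, "Tails": 2 * x}  # Q(E^Heads), Q(E^Tails)
    q_e = joint["Heads"] + joint["Tails"]  # Q(E), not normalised
    posterior = joint["Heads"] / q_e  # Q(Heads|E) = Q(E^Heads)/Q(E)
    # Likelihood ratio Q(E|Tails)/Q(E|Heads), with Q(E|c) = Q(E^c)/Q(c).
    ratio = (joint["Tails"] / prior["Tails"]) / (joint["Heads"] / prior["Heads"])
    return posterior, ratio


for x in (Fraction(1, 10), Fraction(1, 2), Fraction(3)):
    posterior, ratio = heads_given_e(x)
    print(f"x = {x}: Q(Heads|E) = {posterior}, likelihood ratio = {ratio}")
```

Whatever positive x is chosen, the scale factor cancels, so the conditional value and the likelihood ratio do not depend on it. Whether every such x gives a measure one is happy to interpret is the question raised above, and the code does not address it.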
Taking this more general concept (which I have found useful in some real applications), it seems to me that one should be as happy to compute with Q( ) as one is with P( ), simply noting that the final answer happens to be a probability.
One can look at this two ways. Many seem to regard mathematics, such as probability theory, as consisting of some universal rules that anyone should be able to apply reliably. The sleeping beauty problem is a good counter-example, in that it has aspects that confuse the received wisdoms on probability. As a mathematician, I tend to see theories as providing resources which may provide short-cuts to aid my reasoning. I may be able to apply the theory ‘as is’, but I should never take this for granted. If the theory does not apply ‘straight out of the box’ then I need to develop my own theory, which can often be based on some existing theory, but whose validity depends on its logic, not on how others regard the theories.
In this problem, it seems important to be able to identify the more fundamental theories. Introducing concepts such as ‘information’ can just increase the opportunities for confusion. It is then important to understand that all theories have axioms (aka assumptions), and one should never overlook them. There are also issues of ‘good practice’. It is usual in mathematics, as in physics, to develop notations that aid understanding and do not invite mistakes. Unfortunately, probability theory predates mathematics in the modern sense, and it is important to appreciate where the notation can mislead. (I always mentally associate a context with probabilities, and look out for changes.)
The informal approach that I put forward above can be developed in one of two ways. Firstly, one could have a probability-like theory in which the sums of various quantities were fixed, sometimes to 1. This is a modest extension of probability theory. The alternative is to note that the argument resembles decision theory, but is restricted to being able to answer certain questions correctly. Thus one can have a simplification of decision theory. With this in mind, it might not be surprising that there are some alternatives to standard probability and decision theory that might be required from time to time – especially where an example has been crafted to be challenging. It is in helping to identify the need for such theories, developing candidates and helping to develop good practice around them that mathematics, as distinct from ‘mathematical methods’, seems essential.