# Knightian uncertainty and epochs

Frank Knight, a contemporary of Keynes, emphasised the importance of what is now called ‘Knightian uncertainty’: uncertainty that cannot be expressed as Bayesian probability. That is, it concerns events that are not deterministic but also not ‘random’ in the same sense as idealised gambling mechanisms.

Whitehead’s epochs, when stable and ergodic in the short term, tend to satisfy the Bayesian assumptions, whereas future epochs do not. Thus, within the current epoch one has (Bayesian) probabilities; in the longer term one has Knightian uncertainty.

### Example

Consider a casino with imperfect roulette wheels with unknown biases. It might reasonably change the wheel (or other gambling mechanism) whenever a punter seems to be doing implausibly well. Even if we assume that the current wheel has associated unknown probabilities that can be estimated from the observed outcomes, the longer-term uncertainty seems qualitatively different. If there are two wheels with known biases that might be used next, and we don’t know which is to be used, then, as Keynes shows, one needs to represent the ambiguity rather than fuse the two probability distributions into one. (If the two wheels have equal and opposite biases then, by the principle of indifference, a fused version would have the same probability distribution as an idealised wheel, yet the two situations are quite different.)
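A minimal sketch of the two-wheel example, assuming (hypothetically) that each wheel is reduced to a binary red/not-red outcome, with P(red) = 0.6 on one wheel and 0.4 on the other. The names `WHEEL_A`, `WHEEL_B`, `fused` and `interval` are illustrative only. Representing the ambiguity as a lower and upper probability is one common way of keeping the two distributions apart; it is not necessarily Keynes’s own formulation.

```python
from fractions import Fraction

# Hypothetical wheels reduced to a binary outcome (red vs. not red);
# the biases are illustrative assumptions, not data.
WHEEL_A = Fraction(6, 10)  # P(red) on wheel A
WHEEL_B = Fraction(4, 10)  # P(red) on wheel B: equal and opposite bias
FAIR = Fraction(1, 2)      # the idealised wheel

def fused(p_a, p_b):
    """Principle-of-indifference fusion: a 50/50 mixture of the two wheels."""
    return (p_a + p_b) / 2

def interval(p_a, p_b):
    """Keep the ambiguity: lower and upper P(red) over the candidate wheels."""
    return (min(p_a, p_b), max(p_a, p_b))

def p_run_of_reds(p_red, n):
    """P(n reds in a row) when the same wheel is used for all n spins."""
    return p_red ** n

# For a single spin, the fused wheel is indistinguishable from a fair one...
print(fused(WHEEL_A, WHEEL_B) == FAIR)   # True
# ...whereas the ambiguous representation keeps the spread.
print(interval(WHEEL_A, WHEEL_B))        # (Fraction(2, 5), Fraction(3, 5))

# Over several spins on one (unknown) wheel the two situations differ.
n = 5
mixture_run = (p_run_of_reds(WHEEL_A, n) + p_run_of_reds(WHEEL_B, n)) / 2
print(mixture_run, p_run_of_reds(FAIR, n))  # 11/250 1/32
```

The run probabilities differ because successive spins on one unknown wheel are not independent, even though each single spin looks fair.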

### Bayesian probability in context

In practice, the key to Bayesian probability is Bayes’ rule, by which the estimated probabilities of hypotheses are updated according to the likelihood of new evidence under each hypothesis. (P(H|E)/P(H’|E) = {P(E|H)/P(E|H’)}•{P(H)/P(H’)}.) But estimated probabilities depend on the ‘context’ or epoch, which may change without our receiving any data. Thus, as Keynes and Jack Good point out, the probabilities should really be qualified by context, as in:

P(H|E:C)/P(H’|E:C) = {P(E|H:C)/P(E|H’:C)}•{P(H:C)/P(H’:C)}.
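A worked instance of the context-qualified rule, with invented numbers: suppose that within C the evidence is three times as likely under H as under H’ (0.3 against 0.1), and that the prior odds on H are 1:2.

```latex
\frac{P(H \mid E : C)}{P(H' \mid E : C)}
  = \frac{P(E \mid H : C)}{P(E \mid H' : C)} \cdot \frac{P(H : C)}{P(H' : C)}
  = \frac{0.3}{0.1} \cdot \frac{1}{2}
  = \frac{3}{2}
```

None of this carries over to another context C’, where the likelihoods, and hence the odds, may be quite different.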

That is, the results of applying Bayes’ rule are conditional on our being in the same epoch. Whenever we consider data, we should consider not only what they mean within our assumed context, but also whether they have implications for our context.

Taking a Bayesian approach, if G is a global context that is certain and C is a sub-context that is believed but not certain, then taking

P(E|H:G) = P(E|H:C), and similarly for the other terms,

is only valid when P(C:G) ≈ 1 both before and after obtaining E. But by Bayes’ rule, for an alternative sub-context C’:

P(C|E:G)/P(C’|E:G) = {P(E|C:G)/P(E|C’:G)}•{P(C:G)/P(C’:G)}.

Thus, unless the evidence, E, is at least as likely under C as under any other possible sub-context, one needs to check that P(C|E:G) ≈ 1. If not, one may need to change the apparent context, C, and compute P(H|E:C’)/P(H’|E:C’) from scratch: Bayes’ rule in its common, simple form does not apply. (There are also other technical problems, but this piece is about epochs.) If one is not certain about the epoch, then for each possible epoch one has a different possible Bayesian probability, a form of Knightian uncertainty.
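The check on P(C|E:G) can be sketched as follows. All probabilities are hypothetical, and the cut-off `NEAR_CERTAIN = 0.95` is an assumed stand-in for “≈ 1”. Here E is far more likely under C’ than under C, so even a small prior on C’ is enough to put C in doubt.

```python
# Hypothetical numbers throughout: two candidate sub-contexts of a global
# context G, and two hypotheses H and H' that make sense in both.
P_CONTEXT = {"C": 0.95, "C'": 0.05}                  # P(C:G), P(C':G)
P_HYP = {"C": {"H": 0.5, "H'": 0.5},                 # P(H:C), P(H':C)
         "C'": {"H": 0.5, "H'": 0.5}}
P_E_GIVEN_HYP = {"C": {"H": 0.01, "H'": 0.02},       # P(E|H:C), P(E|H':C)
                 "C'": {"H": 0.60, "H'": 0.10}}
NEAR_CERTAIN = 0.95  # hypothetical cut-off standing in for "P(C|E:G) ≈ 1"

def p_e_given_context(c):
    """P(E|C:G), averaging the likelihood over the hypotheses within c."""
    return sum(P_E_GIVEN_HYP[c][h] * P_HYP[c][h] for h in P_HYP[c])

def context_posterior():
    """P(C|E:G) for each candidate context, by Bayes' rule over contexts."""
    joint = {c: p_e_given_context(c) * P_CONTEXT[c] for c in P_CONTEXT}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

def odds_within(c):
    """P(H|E:c)/P(H'|E:c): the simple form of Bayes' rule, valid only inside c."""
    likelihood_ratio = P_E_GIVEN_HYP[c]["H"] / P_E_GIVEN_HYP[c]["H'"]
    prior_odds = P_HYP[c]["H"] / P_HYP[c]["H'"]
    return likelihood_ratio * prior_odds

posterior = context_posterior()
if posterior["C"] >= NEAR_CERTAIN:
    print("context C still holds; odds:", odds_within("C"))
else:
    # The assumed context is in doubt: keep one answer per candidate context
    # rather than a single Bayesian probability.
    for c, p in posterior.items():
        print(f"P({c}|E:G) = {p:.3f}, odds within {c} = {odds_within(c):.2f}")
```

With these made-up numbers P(C|E:G) falls to about 0.45, and the odds on H are 0.5 within C but 6 within C’: one probability per possible epoch.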

### Representing uncertainty across epochs

Sometimes a change of context means that everything changes. At other times, some things can be ‘read across’ between epochs. For example, suppose that the hypotheses, H, are the same but the likelihoods P(E|H:C) change. Then one can use Bayes’ rule to maintain likelihoods conditioned on possible contexts.
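One way to maintain likelihoods conditioned on possible contexts is to update a joint distribution over (context, hypothesis) pairs. This is a minimal sketch under stated assumptions: the hypotheses are shared across contexts, the per-observation likelihoods in `LIKELIHOOD` are invented, and the prior over pairs is uniform.

```python
from itertools import product

# Hypothetical set-up: the hypotheses are the same in each context, but the
# likelihood of each observation differs by context. All numbers are invented.
CONTEXTS = ["C", "C'"]
HYPOTHESES = ["H", "H'"]
# LIKELIHOOD[context][hypothesis][observation] = P(obs | H : C)
LIKELIHOOD = {
    "C":  {"H": {"red": 0.6, "black": 0.4}, "H'": {"red": 0.4, "black": 0.6}},
    "C'": {"H": {"red": 0.2, "black": 0.8}, "H'": {"red": 0.5, "black": 0.5}},
}

def update(joint, obs):
    """One Bayes update of the joint P(C, H | data so far) on observation obs."""
    unnorm = {(c, h): p * LIKELIHOOD[c][h][obs] for (c, h), p in joint.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

def marginal_contexts(joint):
    """P(C | data): sum the joint over hypotheses."""
    return {c: sum(joint[(c, h)] for h in HYPOTHESES) for c in CONTEXTS}

def hypothesis_given_context(joint, c):
    """P(H | data : C): the within-context posterior, read off the joint."""
    z = sum(joint[(c, h)] for h in HYPOTHESES)
    return {h: joint[(c, h)] / z for h in HYPOTHESES}

# Uniform prior over (context, hypothesis) pairs.
joint = {pair: 1 / 4 for pair in product(CONTEXTS, HYPOTHESES)}
for obs in ["red", "red", "black", "red"]:
    joint = update(joint, obs)

print(marginal_contexts(joint))
print(hypothesis_given_context(joint, "C"))
```

Keeping the joint means that if the context is later revised, the within-context posteriors for the new context are already available rather than having to be recomputed from the raw data.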

### Information

Shannon’s mathematical information theory builds on the notion of probability. A probability distribution determines an entropy, whilst information is measured by the change in entropy. The familiar case of Shannon’s theory (which is what people normally mean when they refer to ‘information theory’) makes assumptions that imply Bayesian probability and a single epoch. But in the real world one often has multiple or ambiguous epochs. Thus conventional notions of information are conditional on the assumed context. In practice, the component of ‘information’ measured by the conventional approach may be much less important than that which is neglected.
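To illustrate that information measured this way is conditional on the assumed context, the sketch below computes the drop in entropy over two hypotheses when the same evidence is interpreted under two contexts. The likelihoods are hypothetical; only the entropy formula is standard.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihood):
    """Bayes update of {H: prob} given {H: P(E|H)}."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

def information(prior, likelihood):
    """Information conveyed by E, measured as the drop in entropy over H."""
    return entropy(prior) - entropy(posterior(prior, likelihood))

prior = {"H": 0.5, "H'": 0.5}
# Hypothetical likelihoods P(E|H:C) for the same evidence E in two contexts.
likelihood_in_C = {"H": 0.9, "H'": 0.1}
likelihood_in_C2 = {"H": 0.5, "H'": 0.5}

print(information(prior, likelihood_in_C))   # roughly 0.53 bits
print(information(prior, likelihood_in_C2))  # 0.0 bits
```

The same observation is informative about H if we are in C, but carries no information about H if we are in C’; neither figure says anything about which context we are in.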

### Shocks

It is normally supposed that pieces of evidence are independent (given H and C), and so information commutes:

P(E1+E2|H:C) = P(E1|H:C)•P(E2|H:C) = P(E2|H:C)•P(E1|H:C) = P(E2+E1|H:C)

But if we need to re-evaluate the context, C, this no longer holds unless we also re-evaluate the old data against the new context. Thus we may define a ‘shock’ as anything that requires us to change the likelihood functions for either past or future data. Such shocks occur when data arrive that had been considered unlikely under an assumption. Alternatively, if we are maintaining likelihoods against many possible assumptions, a shock occurs when none of them remains probable.
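A minimal sketch of shock detection, under several loudly hypothetical assumptions: each context is a fixed model of per-spin outcomes, a vague catch-all model `"other"` stands for “none of the maintained assumptions”, and a shock is flagged when the catch-all becomes more probable than not. The sketch also shows that, within one fixed context, the order of the evidence does not matter.

```python
# Hypothetical per-spin outcome probabilities for two named contexts, plus a
# deliberately vague catch-all standing for "none of the above". All invented.
MODELS = {
    "C":     {"red": 0.486, "black": 0.486, "zero": 0.028},
    "C'":    {"red": 0.600, "black": 0.372, "zero": 0.028},
    "other": {"red": 1 / 3, "black": 1 / 3, "zero": 1 / 3},
}
PRIOR = {"C": 0.495, "C'": 0.495, "other": 0.01}
SHOCK_THRESHOLD = 0.5  # hypothetical: shock once "other" is more likely than not

def likelihood(model, observations):
    """P(observations | model), treating spins as independent within the model."""
    result = 1.0
    for obs in observations:
        result *= model[obs]
    return result

def context_posterior(observations):
    """P(context | observations) over the named contexts and the catch-all."""
    joint = {c: PRIOR[c] * likelihood(MODELS[c], observations) for c in MODELS}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

def is_shock(observations):
    """Flag a shock when none of the named contexts remains probable."""
    return context_posterior(observations)["other"] > SHOCK_THRESHOLD

# Within a fixed context the order of the evidence does not matter.
print(likelihood(MODELS["C"], ["red", "zero"])
      == likelihood(MODELS["C"], ["zero", "red"]))    # True
print(is_shock(["red", "black", "red"]))              # False
print(is_shock(["zero", "zero", "zero", "zero"]))     # True
```

The catch-all is only a device for noticing that the maintained contexts have all become improbable; it does not say what the new context is, so after a shock the old data would still need re-evaluating against whatever context replaces them.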

Keynes, Turing and Good developed a theory of ‘weight of evidence’ for reasoning in complex worlds. TBB

David Marsay