Which Mathematics of Uncertainty for Today’s Challenges?
June 7, 2011
This is a slight adaptation of a technical paper presented to an IMA conference 16 Nov. 2009, in the hope that it may be of broader interest. It argues that ‘Knightian uncertainty’, in Keynes’ mathematical form, provides a much more powerful, appropriate and safer approach to uncertainty than the more familiar ‘Bayesian (numeric) probability’.
Issues
Conventional Probability
Keynes et al suggest that there is more to uncertainty than random probability. We seem to be able to cope with high volumes of deterministic or probabilistic data, or low volumes of less certain data, but to have problems at the margins. This leads to the questions:

How complex is the contemporary world?
 What is the perceptual problem?
 What is contemporary uncertainty like?
 How is uncertainty engaged with?
Probability arises from a definite context
Objective numeric probabilities can arise through random mechanisms, as in gambling. Subjective probabilities are often adequate for familiar situations where decisions are short-term, with only cumulative long-term impact, at worst. This is typical of the application of established science and engineering where one has a kind of ‘information dominance’ and there are only variations within an established frame / context.
Contexts
Thus (numeric) probability is appropriate where:
 Competition is coherent and takes place within a stable, utilitarian, framework.
 Innovation does not challenge the overarching status quo or ‘world view’

We only ever need to estimate the current parameters within a given model.
 Uncertainty can be managed. Uncertainty about estimates can be represented by numbers (probability distributions), as if they were principally due to noise or other causes of variation.
 Numeric probability is multiplied by value to give a utility, which is optimised.
 Risk is only a number, negative utility.
Uncertainty is measurable (in one dimension) where one has so much stability that almost everything is measurable.
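As a minimal sketch of this conventional recipe, the following multiplies numeric probabilities by values and optimises the result. The actions, probabilities and values are hypothetical, chosen only for illustration.

```python
# Hypothetical actions: each is a list of (probability, value) outcomes.
actions = {
    "safe": [(1.0, 10.0)],
    "risky": [(0.8, 20.0), (0.2, -50.0)],
}


def expected_utility(outcomes):
    """Sum of probability * value over the (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)


utilities = {name: expected_utility(o) for name, o in actions.items()}
# Optimise: pick the action with the highest expected utility.
best = max(utilities, key=utilities.get)
print(utilities, best)
```

The calculation is only as good as the assumption that the probabilities and values are stable numbers, which is the assumption the rest of this paper questions.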
Probability Theory
Probability theories typically build on Bayes’ rule [Cox]:
$$P(H \mid E) = P(H) \cdot \frac{P(E \mid H)}{P(E)},$$
where $P(E \mid H)$ denotes the ‘likelihood’, the probability of evidence, E, given a hypothesis, H. Thus the final probability is the prior probability times the ‘likelihood ratio’.
The key assumptions are that:

The selection of evidence for a given hypothesis, H, is indistinguishable from a random process with a proper numeric likelihood function, $P(\cdot \mid H)$.

The selection of the hypothesis that actually holds is indistinguishable from random selection from a set $\{H_i\}$ with ‘priors’ $P(H_i)$ – that can reasonably be estimated – such that

$P(H_i \cap H_j) = 0$ for $i \neq j$ (non-intersection)

$P(\bigcup_i H_i) = 1$ (completeness).
It follows that $P(E) = \sum_i P(E \mid H_i) \cdot P(H_i)$ is well-defined.
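Under these assumptions the update is mechanical. The sketch below applies Bayes’ rule to a finite set of exclusive, exhaustive hypotheses; the hypothesis names and numbers are made up for illustration.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over exclusive, exhaustive hypotheses.

    priors[h] is P(H_h); likelihoods[h] is P(E | H_h).
    Returns P(H_h | E) for each h.
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (completeness)")
    # P(E) = sum_i P(E | H_i) * P(H_i)
    p_e = sum(likelihoods[h] * priors[h] for h in priors)
    if p_e == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: priors[h] * likelihoods[h] / p_e for h in priors}


# Illustrative numbers only.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihoods = {"H1": 0.1, "H2": 0.4, "H3": 0.4}
post = posterior(priors, likelihoods)
print(post)
```

If the true hypothesis lies outside the set, or the priors cannot reasonably be estimated, the function still returns numbers, but they no longer mean what they appear to mean.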
H may be composite, so that there are many proper sub-hypotheses, $h \Rightarrow H$, with different likelihoods, $P(E \mid h)$. It is then common to use the Bayesian likelihood,
$$P(E \mid H) = \int_{h \Rightarrow H} P(E \mid h) \, dP(h \mid H),$$
or
$$P(E \mid H) = P(E \mid h),$$ for some representative hypothesis h.
In either case, hypotheses should be chosen to ensure that the expected likelihood is maximal for the true hypothesis.
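The two options for a composite hypothesis can be compared on a toy case. In this sketch H is ‘the coin favours heads’, with three hypothetical sub-hypotheses (biases 0.6, 0.7 and 0.9), equal weights $P(h \mid H)$ assumed purely for illustration, and evidence of 8 heads in 10 tosses. The integral is replaced by its discrete analogue.

```python
import math


def binomial_likelihood(theta, heads, tosses):
    """P(E | h): probability of `heads` heads in `tosses` tosses with bias theta."""
    return math.comb(tosses, heads) * theta**heads * (1 - theta) ** (tosses - heads)


def mixture_likelihood(sub_likelihoods, weights):
    """Discrete analogue of the integral: sum_h P(E | h) * P(h | H)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights P(h | H) must sum to 1")
    return sum(l * w for l, w in zip(sub_likelihoods, weights))


thetas = [0.6, 0.7, 0.9]  # hypothetical sub-hypotheses of H
sub_likelihoods = [binomial_likelihood(t, heads=8, tosses=10) for t in thetas]
weights = [1 / 3, 1 / 3, 1 / 3]  # assumed P(h | H)

bayesian = mixture_likelihood(sub_likelihoods, weights)
representative = sub_likelihoods[1]  # take theta = 0.7 as 'representative'
print(bayesian, representative)
```

Both answers depend on a choice (the weights, or the representative) that the evidence itself does not supply.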
Bayes noted a fundamental problem with such conventional probability: “[Even] where the course of nature has been the most constant … we can have no reason for thinking that there are no causes in nature which will ever interfere with the operations of the causes from which this constancy is derived.”
Uncertainty in Contemporary Life
Uncertainty arises from an indefinite context
Uncertainty may arise through human decision-making, adaptation or evolution, and may be significant for situations that are unfamiliar or for decisions that may have long-term impact. This is typical of the development of science in new areas, and of competitions where unexpected innovation can transform aspects of contemporary life. More broadly still, it is typical of situations where we have a poor information position or which challenge our sense-making, and where we could be surprised, and so need to alter our framing of the situation, for example where others can be adaptive or innovative and hence surprising.
Contexts

Competitions, cooperations, collaborations, confrontations and conflicts all nest and overlap messily, each with their own nature.
 Perception is part of multiple co-adaptations.
 Uncertainty can be shaped but not fully tamed. Only the most careful reasoning will do.
 Uncertainty and utility are imprecise and conditional. One can only satisfice, not optimise.
 Critical risks arise from the unanticipated.
Likelihoods, Evidence
In Plato’s Republic the elite make the rules, which form a fixed context for the plebs. But in contemporary life the rulers only rule with the consent of the ruled. In so far as the rules of the game ‘cause’ (or at least influence) the behaviour of the players, the participants have reason to interfere with causes, and in many cases we expect it: it is how things get done. J.M. Keynes and I.J. Good (under A.M. Turing) developed techniques that may be used for such ‘haphazard’ situations, as well as random ones.
The distinguishing concepts are the law of evidence, generalized weight of evidence (woe) and iterative fusion.
If a datum, E, has a distribution $f(\cdot)$ over a possibility space, $\mathcal{E}$, then for all distributions $g(\cdot)$ over $\mathcal{E}$,
$$\int_{\mathcal{E}} \log(f(E)) \, f(E) \geq \int_{\mathcal{E}} \log(g(E)) \, f(E).$$
I.e., with the usual minus sign on each side, the cross-entropy is never less than the entropy (Gibbs’ inequality). For a hypothesis H in a context, C, such that the likelihood function $g = P_{H:C}$ is well-defined, the weight of evidence (woe) due to E for H is defined to be:
$$W(E \mid H:C) \equiv \log(P_{H:C}(E)).$$
Thus the ‘law of evidence’: that the expected woe for the truth is never exceeded by that for any other hypothesis. (But the evidence may indicate that many or none of the hypotheses fit.) For composite hypotheses, the generalized woe is:
$$W(E \mid H:C) \equiv \sup_{h \Rightarrow H} \{W(E \mid h:C)\}.$$
This is defined even for a haphazard selection of h.
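The law of evidence and the generalized woe can both be checked on a small discrete example. In this sketch the ‘true’ distribution and the rival hypotheses are invented for illustration, and every outcome is given non-zero probability so that the logarithms are finite.

```python
import math


def expected_woe(truth, model):
    """Expected log-likelihood of `model` when the data follow `truth`."""
    return sum(p * math.log(model[e]) for e, p in truth.items() if p > 0)


def generalized_woe(datum, sub_hypotheses):
    """sup over sub-hypotheses h of log P_h(datum); no weights on h are needed."""
    return max(math.log(h[datum]) for h in sub_hypotheses)


# Hypothetical distributions over three possible data values.
truth = {"a": 0.5, "b": 0.3, "c": 0.2}
rivals = [
    {"a": 0.4, "b": 0.4, "c": 0.2},
    {"a": 0.2, "b": 0.3, "c": 0.5},
    {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
]

woe_truth = expected_woe(truth, truth)
woe_rivals = [expected_woe(truth, g) for g in rivals]
# Law of evidence: no rival has a higher expected woe than the truth.
print(woe_truth, woe_rivals)

# Treat the rivals as the sub-hypotheses of one composite hypothesis.
print(generalized_woe("a", rivals))
```

Note that `generalized_woe` takes a maximum rather than an average, so it does not need a probability distribution over the sub-hypotheses.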
Let $d_s(\cdot)$ be a discounting factor for the source, s [Good]. If one has independent evidence, $E_s$, from different sources, s, then typically the fusion equation is:
$$W(E \mid H:C, d_s) \leq \sum_s \{d_s(W(E_s \mid H:C))\},$$
with equality for precise hypotheses. Together, generalized woe and fusion determine how woe is propagated through a network, where the woe for a hypothesis is dependent on an assumption which itself has evidence. The inequality forces iterative fusion, whereby one refines candidate hypotheses until one has adequate precision. If circumstantial evidence indicates that the particular situation is random, one could take full account of it, to obtain the same result as Bayes, or discount [Good].
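A minimal sketch of the right-hand side of the fusion equation follows. It assumes, purely for illustration, that discounting is a simple scaling, $d_s(w) = \lambda_s w$ with $0 \leq \lambda_s \leq 1$; the source names, likelihoods and scaling factors are hypothetical, and Good’s own discounting may take other forms.

```python
import math


def fused_woe(source_likelihoods, discounts):
    """Sum over sources s of d_s(W(E_s | H:C)), assuming d_s(w) = discounts[s] * w."""
    return sum(discounts[s] * math.log(p) for s, p in source_likelihoods.items())


# Hypothetical likelihoods P(E_s | H) for two precise hypotheses and two sources.
evidence = {
    "H1": {"sensor": 0.8, "report": 0.2},
    "H2": {"sensor": 0.5, "report": 0.6},
}

trust_all = {"sensor": 1.0, "report": 1.0}
doubt_report = {"sensor": 1.0, "report": 0.25}

full = {h: fused_woe(e, trust_all) for h, e in evidence.items()}
discounted = {h: fused_woe(e, doubt_report) for h, e in evidence.items()}
print(full)
print(discounted)
```

With these numbers H2 has the higher fused woe when both sources are taken at face value, but H1 comes out ahead once the report is discounted, so the treatment of source reliability can change which hypothesis the evidence favours.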
In some cases it is convenient, as Keynes does, to use an interval likelihood or woe, taking the infimum and supremum of possible values. The only assumption is that the evidence can be described as a probabilistic outcome of a definite hypothesis, even if the overall situation is haphazard. In practice, the use of likelihoods is often combined with conjectural causal modelling, to try to get at a deep understanding of situations.
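An interval woe can be sketched in the same style: instead of a single number, report the infimum and supremum over the sub-hypotheses of a composite hypothesis. The sub-hypotheses below are hypothetical distributions over two possible data values.

```python
import math


def interval_woe(datum, sub_hypotheses):
    """Return (inf, sup) of log P_h(datum) over the sub-hypotheses h."""
    woes = [math.log(h[datum]) for h in sub_hypotheses]
    return min(woes), max(woes)


# Hypothetical sub-hypotheses of one composite hypothesis.
composite = [
    {"x": 0.9, "y": 0.1},
    {"x": 0.6, "y": 0.4},
    {"x": 0.3, "y": 0.7},
]

low, high = interval_woe("x", composite)
print(low, high)
```

A wide interval signals that the composite hypothesis is too imprecise for the datum to discriminate, which is one prompt for the refinement step described above.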
Examples
Crises
Above is an informal attempt to illustrate typical crisis kinematics, such as the financial crisis of 2007/8. It is intended to capture the notion that conventional probability calculations may suffice for long periods, but over-dependence on such classical constructs can lead to shocks or crises. To avoid or mitigate these, more attention should be given to uncertainty [Turner].
An ambush
Uncertainty is not necessarily esoteric or long-term. It can be found wherever the assumptions of conventional probability theory do not hold, in particular in multi-level games. I would welcome more examples that are simple to describe, relatively common and where the significance of uncertainty is easy to show.
Deer need to make a morning run from A to B. Routes r, s, t are possible. A lion may seek to ambush them. Suppose that the indicators of potential ambushes are equal. Now in the last month route r has been used 25 times, s 5 times and t never, without incident. What is the ‘probability’ of an ambush for the 3 routes?
Let A = “The Lion deploys randomly each day with a fixed probability distribution, p”. Here we could use a Bayesian probability distribution over p, with some sensitivity analysis.
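A sketch of the Bayesian treatment of A, under assumptions that are not in the original: the lion ambushes exactly one route each day, an ambush on a used route always produces an incident, and the prior over p = (p_r, p_s, p_t) is uniform. The posterior mean is approximated on a grid over the simplex.

```python
def posterior_mean_ambush(uses, steps=60):
    """Grid approximation to the posterior mean of p under a uniform prior.

    uses[k] is the number of incident-free runs on route k, so the
    likelihood of the record is the product over k of (1 - p_k) ** uses[k].
    """
    total = 0.0
    means = [0.0, 0.0, 0.0]
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            p = (i / steps, j / steps, (steps - i - j) / steps)
            weight = 1.0
            for p_k, n_k in zip(p, uses):
                weight *= (1.0 - p_k) ** n_k
            total += weight
            for k in range(3):
                means[k] += weight * p[k]
    return [m / total for m in means]


# Record from the text: r used 25 times, s 5 times, t never, no incidents.
p_r, p_s, p_t = posterior_mean_ambush([25, 5, 0])
print(p_r, p_s, p_t)
```

Under A the most-used route looks safest and the unused route most dangerous. Changing the prior or the grid resolution is a simple form of the sensitivity analysis mentioned above.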
But this is not the only possibility. Alternatively, let B = “The Lion has reports about some of our runs, and will adapt his deployments.” We could use a Bayesian model for the Lion, but with less confidence. Alternatively, we could use likelihoods.
Route s is intermediate in characteristics between the other two. There is no reason to expect an ambush at s that doesn’t apply to one of the other two. On the other hand, if the ambush is responsive to the number of times a route is used then r is more likely than s or t, and if the ambush is on a fixed route, it is only likely to be on t. Hence s is the least likely to have an ambush.
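The argument can be sketched with a small family of hypotheses about the lion. Everything numeric here is an illustrative assumption: a ‘fixed route’ lion always waits on one route, so any past run on that route would have been an incident, while a ‘responsive’ lion has only been watching so far and will now target routes in proportion to their past use. For each route we take the highest ambush probability among the hypotheses that the record does not rule out.

```python
uses = {"r": 25, "s": 5, "t": 0}
total_runs = sum(uses.values())


def fixed_route(route):
    """Lion always waits on `route`; returns (likelihood of record, ambush probabilities)."""
    record_likelihood = 1.0 if uses[route] == 0 else 0.0
    ambush = {x: 1.0 if x == route else 0.0 for x in uses}
    return record_likelihood, ambush


def responsive():
    """Lion has only observed so far; it now targets routes in proportion to use."""
    ambush = {x: n / total_runs for x, n in uses.items()}
    return 1.0, ambush


hypotheses = [fixed_route(x) for x in uses] + [responsive()]
# Discard hypotheses under which the incident-free record is impossible.
surviving = [ambush for likelihood, ambush in hypotheses if likelihood > 0]
# Upper ambush probability for each route across the surviving hypotheses.
upper = {x: max(a[x] for a in surviving) for x in uses}
safest = min(upper, key=upper.get)
print(upper, safest)
```

In this sketch the intermediate route s has the lowest upper ambush probability, in line with the reasoning above, whereas the purely Bayesian treatment of A alone would favour r.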
Consistently selecting routes using a fixed probability distribution is not as effective as a muddling strategy [Binmore] which varies the distribution, supporting learning and avoiding an exploitable equilibrium.
Concluding Remarks
Conventional (numeric) probability, utility and rationality all extrapolate based on a presumption of stability. If two or more parties are co-adapting or co-evolving, any equilibria tend to be punctuated, and so a more general approach to uncertainty, information, communication, value and rationality is indicated, as identified by Keynes, with implications for ‘risk’.
Dave Marsay, Ph.D., C.Math FIMA, Fellow ISRS
References:
Bayes, T. An Essay towards solving a Problem in the Doctrine of Chances (1763), Philosophical Transactions of the Royal Society of London 53, 370–418. Regarded by most English-speakers as ‘the source’.
Binmore, K, Rational Decisions (2009), Princeton U Press. Rationality for ‘muddles’, citing Keynes and Turing. Also http://else.econ.ucl.ac.uk/papers/uploaded/266.pdf .
Cox, R.T. The Algebra of Probable Inference (1961) Johns Hopkins University Press, Baltimore, MD. The main justification for the ‘Bayesian’ approach, based on a belief function for sets whose results are comparable. Keynes et al deny these assumptions. Also Jaynes, E.T. Probability Theory: The Logic of Science (1995) http://bayes.wustl.edu/etj/prob/book.pdf .
Good, I.J. Probability and Weighting of Evidence (1950), Griffin, London. Describes the basic techniques developed and used at Bletchley Park. Also Explicativity: A Mathematical Theory of Explanation with Statistical Applications (1977) Proc. R. Soc. Lond. A 354, 303–330, etc. Covers discounting, particularly of priors. More details have continued to be released up until 2006.
Hodges, A. Alan Turing (1983) Hutchinson, London. Describes the development and use of ‘weights of evidence’, “which constituted his major conceptual advance at Bletchley”.
Keynes, J.M. Treatise on Probability (1920), Macmillan, London. Fellowship essay, under Whitehead. Seminal work, outlines the pros and cons of the numeric approach to uncertainty, and develops alternatives, including interval probabilities and the notions of likelihood and weights of evidence, but not a ‘definite method’ for coping with uncertainty.
Smuts, J.C. The Scientific World-Picture of Today, British Assoc. for the Advancement of Science, Report of the Centenary Meeting. London: Office of the BAAS. 1931. (The Presidential Address.) A view from an influential guerrilla leader, General, War Cabinet Minister and supporter of ‘modern’ science, who supported Keynes and applied his ideas widely.
Turner, The Turner Review: A regulatory response to the global banking crisis (2009). Notes the consequences of simply extrapolating, ignoring non-probabilistic (‘Knightian’) uncertainty.
Whitehead, A.N. Process and Reality (1929: 1979 corrected edition) Eds. D.R. Griffin and D.W. Sherburne, Free Press. Whitehead developed the logical alternative to the classical view of uniform unconditional causality.