Uncertainty is not just probability

My paper, based on the discussion paper referred to in a previous post, has just been published. On Facebook it is described as:

An understanding of Keynesian uncertainties can be relevant to many contemporary challenges. Keynes was arguably the first person to put probability theory on a sound mathematical footing. …

So it is not just for economists. I could be tempted to discuss the wider implications.

Comments are welcome here, at the publisher’s web site or on Facebook. I’m told that it is also discussed on Google+, Twitter and LinkedIn, but I couldn’t find it – maybe I’ll try again later.

Dave Marsay

Instrumental Probabilities

Reflecting on my recent contribution to the economics e-journal special issue on uncertainty (comments invited), I realised that, from a purely mathematical point of view, the current mainstream view, as expressed by Dawid, could be seen as a much more accessible version of Keynes’. But there is a difference in expression that can be crucial.

In Keynes’ view ‘probability’ is a very general term, so that it is always legitimate to ask about the probability of something. The challenge is to determine the probability, and in particular whether it is just a number. In some usages, as in Kolmogorov’s, the term probability is reserved for those cases where certain axioms hold. In such cases the answer to a request for a probability might be to say that there isn’t one. This seems safe even if it conflicts with the questioner’s presuppositions about the universality of probabilities. The instrumentalist view of Dawid, however, suggests that probabilistic methods are tools that can always be used. Thus the probability may exist even if it does not have the significance that one might think and, in particular, may not be appropriate to use for ‘rational decision making’.

I have often come across seemingly sensible people who use ‘sophisticated mathematics’ in strange ways. I think perhaps they take an instrumentalist view of mathematics as a whole, and not just probability theory. This instrumentalist mathematics reminds me of Keynes’ ‘pseudo-mathematics’. But the key difference is that mathematicians, such as Dawid, know that the usage is only instrumentalist and that there are other questions to be asked. The problem is not the instrumentalist view as such, but the dogma (of at least some) that it is heretical to question widely used instruments.

The financial crises of 2007/8 were partly attributed by Lord Turner to the use of ‘sophisticated mathematics’. From Keynes’ perspective it was the use of pseudo-mathematics. My view is that if it is all you have then even pseudo-mathematics can be quite informative, and hence worthwhile. One just has to remember that it is not ‘proper’ mathematics. In Dawid’s terminology the problem seems to be the instrumental use of mathematics without any obvious concern for its empirical validity. Indeed, since his notion of validity concerns limiting frequencies, one might say that the problem was the use of an instrument that was stunningly inappropriate to the question at issue.

It has long seemed to me that a similar issue arises with many miscarriages of justice, intelligence blunders and significant policy mis-steps. In Keynes’ terms people are relying on a theory that simply does not apply. In Dawid’s terms one can put it more bluntly: decision-takers were relying on the fact that something had a very high probability when they ought to have been paying more attention to the evidence in the actual situation, which showed that the probability was – in Dawid’s terms – empirically invalid. It could even be that the thing with a high instrumental probability was very unlikely, all things considered.

Artificial Intelligence?

The subject of ‘Artificial Intelligence’ (AI) has long provided ample scope for long and inconclusive debates. Wikipedia seems to have settled on a view that we may take as a straw man:

Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it. [Dartmouth Conference, 1956] The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds. [John Searle’s straw-man hypothesis]

Readers of my blog will realise that I agree with Searle that his hypothesis is wrong, but for different reasons. It seems to me that mainstream AI (mAI) is about being able to take instruction. This is a part of learning, but by no means all. Thus – I claim – mAI is about a sub-set of intelligence. In many organisational settings it may be that sub-set which the organisation values. It may even be that an AI that ‘thought for itself’ would be a danger. For example, in old discussions about whether or not some type of AI could ever act as a G.P. (General Practitioner – first-line doctor), the underlying issue has been whether G.P.s ‘should’ think for themselves, or just apply their trained responses. My own experience is that sometimes G.P.s doubt the applicability of what they have been taught, and that sometimes this is ‘a good thing’. In effect, we sometimes want to train people, or otherwise arrange for them to react in predictable ways, as if they were machines. mAI can create better machines, and thus has many key roles to play. But between mAI and ‘superhuman intelligence’ there seems to be an important gap: the kind of intelligence that makes us human. Can machines display such intelligence? (Can people, in organisations that treat them like machines?)

One successful mainstream approach to AI is to work with probabilities, such as P(A|B) (‘the probability of A given B’), making extensive use of Bayes’ rule, and such an approach is sometimes thought to be ‘logical’, ‘mathematical’, ‘statistical’ and ‘scientific’. But, mathematically, we can generalise the approach by taking account of some context, C, using Jack Good’s notation P(A|B:C) (‘the probability of A given B, in the context C’). AI that is explicitly or implicitly statistical is more successful when it operates within a definite fixed context, C, for which the appropriate probabilities are (at least approximately) well-defined and stable. For example, training within an organisation will typically seek to enable staff (or machines) to characterise their job sufficiently well for it to become routine. In practice ‘AI’-based machines often show a little intelligence beyond that described above: they will monitor the situation and ‘raise an exception’ when the situation is too far outside what they ‘expect’. But this just points to the need for a superior intelligence to resolve the situation. Here I present some thoughts.
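As a toy numerical illustration of why Good’s context C matters (all counts below are invented), the same conditional probability P(A|B) can differ sharply between contexts, so that a pooled, context-free estimate is only safe while the context stays fixed:

```python
# Sketch (hypothetical counts): P(A|B:C) can differ sharply between two
# contexts C1 and C2, so a single context-free P(A|B) estimate is only
# meaningful while the context stays fixed.

# Joint counts n[(A, B)] observed in each context (invented for illustration).
counts = {
    "C1": {(True, True): 80, (False, True): 20, (True, False): 10, (False, False): 90},
    "C2": {(True, True): 5,  (False, True): 95, (True, False): 40, (False, False): 60},
}

def p_a_given_b(ctx):
    """P(A | B : ctx), estimated from the counts for one context."""
    n = counts[ctx]
    b_total = n[(True, True)] + n[(False, True)]
    return n[(True, True)] / b_total

def p_a_given_b_pooled():
    """P(A | B) ignoring context: pool the counts across contexts."""
    ab = sum(n[(True, True)] for n in counts.values())
    b = sum(n[(True, True)] + n[(False, True)] for n in counts.values())
    return ab / b

print(p_a_given_b("C1"))     # 0.8
print(p_a_given_b("C2"))     # 0.05
print(p_a_given_b_pooled())  # 0.425
```

Within either context, Bayesian updating works as usual; the trouble starts when the context shifts and the pooled figure silently stops corresponding to anything.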

When we state ‘P(A|B)=p’ we are often not just asserting the probability relationship: it is usually implicit that ‘B’ is the appropriate condition to consider if we are interested in ‘A’. Contemporary mAI usually takes the conditions as given, and computes ‘target’ probabilities from given probabilities. Whilst this requires a kind of intelligence, it seems to me that humans will sometimes also revise the conditions being considered, and this requires a different type of intelligence (not just the ability to apply Bayes’ rule). For example, astronomers who refine the value of relevant parameters are displaying some intelligence and are ‘doing science’, but those first in the field, who determined which parameters are relevant, employed a different kind of intelligence and were doing a different kind of science. What we need, at least, is an appropriate way of interpreting and computing ‘probability’ to support this enhanced intelligence.

The notions of Whitehead, Keynes, Russell, Turing and Good seem to me a good start, albeit they need to be explained better – hence this blog. Economics may serve as an example. The notion of probability routinely used would be appropriate if we were certain about some fundamental assumptions. But are we? At least we should realise that it is not logical to attempt to justify those assumptions by reasoning using concepts that implicitly rely on them.

Dave Marsay

Who thinks probability is just a number? A plea.

Many people think – perhaps they were taught it – that it is meaningful to talk about the unconditional probability of ‘Heads’ (i.e. P(Heads)) for a real coin, and even that there are logical or mathematical arguments to this effect. I have been collecting and commenting on works which have been – too widely – interpreted in this way, and quoting their authors in contradiction. De Finetti seemed to be the only example of a respected person who thought that he had provided such an argument. But a friendly economist has just forwarded a link to a recent work that debunks this notion, based on a wider reading of his work.

So, am I done? Does anyone have any seeming mathematical sources for the view that ‘probability is just a number’ for me to consider?

There are some more modern authors who make strong claims about probability, but – unless you know different – they rely on the above, and hence do not need to be addressed separately. I do also opine on a few less well known sources: you can search my blog to check.

Dave Marsay

Coin toss puzzle

This is intended as a counter-example to the view, such as Savage’s, that uncertainty can, in practice, be treated as numeric probability.

You have a coin that you know is fair. A known trickster (me?) shows you what looks like an ordinary coin and offers you a choice of the following bets:

1. You both toss your own coins. You win if they match, otherwise they win.
2. They toss their coin while you call ‘heads’ or ‘tails’.

Do you have any preference between the two bets? Why? And …

In each case, what is the probability that their coin will come up heads?

Dave Marsay

Clarification

In (1) suppose that you can arrange things so that the trickster cannot tell how your coin will land in time to influence their coin, so that the probability of a match is definitely 0.5, with no uncertainty. The situation in (2) can be similar, except that your call replaces the toss of a fair coin.

Other uncertainty puzzles .

Uncertain Urns Puzzle

A familiar probability example, using urns, is adapted to illustrate ‘true’ (non-numeric) uncertainty.

Simple situation

The following is a good teaching example:

Suppose that an urn is known to contain black and white balls that are otherwise identical. A subject claims to be able to predict the colour of a ball that they draw ‘at random’.

They ‘predict’ and draw a black ball. What are the odds that they are really able to predict?

From a Bayesian perspective, the final odds are the initial odds times the likelihood ratio. If there are b black and w white balls and we represent the evidence by E and likelihoods by P( E | · ), then P( E | Predict ) = 1 and P( E | Luck ) = b/(b+w). Thus the rarer the phenomenon predicted, the more a correct prediction tends to support the claim of reliable prediction.
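A minimal sketch of this update (prior odds and urn contents invented): the posterior odds are the prior odds times the likelihood ratio, which here is 1 / (b/(b+w)) = (b+w)/b.

```python
def posterior_odds(prior_odds, b, w):
    """Odds(Predict : Luck) after one correct 'black' prediction.

    Likelihood ratio = P(E | Predict) / P(E | Luck) = 1 / (b / (b + w)).
    """
    likelihood_ratio = 1.0 / (b / (b + w))
    return prior_odds * likelihood_ratio

# The rarer black balls are, the more a correct 'black' call supports the claim:
print(posterior_odds(0.01, b=50, w=50))  # 0.02
print(posterior_odds(0.01, b=1, w=99))   # 1.0
```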

Common quibbles

There is, however, some subjectivity in the estimated probability that the subject can predict:

• In this case, the initial odds seem somewhat arbitrary, and Bayes’ rule seems not to apply. For example, have you considered that the different colours may result in different temperatures? Such a thought is not ‘evidence’ in the sense of Bayes’ rule, but might change your subjective estimate of the probability prior to their draw.
• If we do not know the proportions of black and white balls for sure then the likelihood is uncertain.

Multiple urns

Here we introduce a different type of uncertainty:

Suppose now that the subject is faced with two urns and selects a ball from one. Given the number of black and white balls in each urn, what is the likelihood, P( E | Luck ), of a correct prediction due to luck?

If you think the question is ambiguous, please disambiguate it however you wish.

Suppose you know the total numbers of black and white balls in the two urns. Is the likelihood estimate P( E | Luck) = b/(b+w) reasonable? Could it be biased? How?
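One source of bias is easy to exhibit (with invented numbers): if the subject picks an urn at random and then a ball, the chance of a lucky ‘black’ is the average of the per-urn proportions, not the pooled proportion b/(b+w).

```python
def pooled(b1, w1, b2, w2):
    """Naive estimate: total black balls over total balls."""
    return (b1 + b2) / (b1 + w1 + b2 + w2)

def mixture(b1, w1, b2, w2):
    """P(black | Luck) if the urn is chosen at random (prob 1/2 each)."""
    return 0.5 * b1 / (b1 + w1) + 0.5 * b2 / (b2 + w2)

# Urn 1: 9 black, 1 white; urn 2: 10 black, 80 white. (Numbers invented.)
print(pooled(9, 1, 10, 80))   # 0.19
print(mixture(9, 1, 10, 80))  # ~0.506
```

So the pooled estimate can be badly biased whenever the per-urn proportions differ, which is one way the question above is ambiguous.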

Dave Marsay

Which Mathematics of Uncertainty for Today’s Challenges?

This is a slight adaptation of a technical paper presented to an IMA conference 16 Nov. 2009, in the hope that it may be of broader interest. It argues that ‘Knightian uncertainty’, in Keynes’ mathematical form, provides a much more powerful, appropriate and safer approach to uncertainty than the more familiar ‘Bayesian (numeric) probability’.

Issues

Conventional Probability

There are gaps in the capability to handle both inherent uncertainty and rapid change.

Keynes et al suggest that there is more to uncertainty than random probability. We seem to be able to cope with high volumes of deterministic or probabilistic data, or low volumes of less certain data, but to have problems at the margins. This leads to the questions:

• How complex is the contemporary world?
• What is the perceptual problem?
• What is contemporary uncertainty like?
• How is uncertainty engaged with?

Probability arises from a definite context

Objective numeric probabilities can arise through random mechanisms, as in gambling. Subjective probabilities are often adequate for familiar situations where decisions are short-term, with only cumulative long-term impact, at worst. This is typical of the application of established science and engineering, where one has a kind of ‘information dominance’ and there are only variations within an established frame / context.

Contexts

Thus (numeric) probability is appropriate where:

• Competition is coherent and takes place within a stable, utilitarian, framework.
• Innovation does not challenge the over-arching status quo or ‘world view’.
• We only ever need to estimate the current parameters within a given model.
• Uncertainty can be managed. Uncertainty about estimates can be represented by numbers (probability distributions), as if they were principally due to noise or other causes of variation.
• Numeric probability is multiplied by value to give a utility, which is optimised.
• Risk is only a number, negative utility.

Uncertainty is measurable (in one dimension) where one has so much stability that almost everything is measurable.

Probability Theory

Probability theories typically build on Bayes’ rule [Cox]:

P(H|E) = P(H).(P(E|H)/P(E)),

where P(E|H) denotes the ‘likelihood’, the probability of evidence, E, given a hypothesis, H. Thus the final probability is the prior probability times the ‘likelihood ratio’.

The key assumptions are that:

• The selection of evidence for a given hypothesis, H, is indistinguishable from a random process with a proper numeric likelihood function, P( · |H).
• The selection of the hypothesis that actually holds is indistinguishable from random selection from a set {Hi} with ‘priors’ P(Hi) – that can reasonably be estimated – such that
• P(Hi ∩ Hj) = 0 for i ≠ j (non-intersection)
• P(∪i Hi) = 1 (completeness).

It follows that P(E) = Σi P(E|Hi)·P(Hi) is well-defined.
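The partition assumptions make P(E), and hence the posterior, computable; a minimal sketch for a two-hypothesis partition with invented numbers:

```python
def posterior(priors, likelihoods):
    """Posteriors P(Hi|E) from priors P(Hi) and likelihoods P(E|Hi),
    assuming the Hi are exclusive and exhaustive, so that
    P(E) = sum_i P(E|Hi) * P(Hi)."""
    p_e = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / p_e for l, p in zip(likelihoods, priors)]

# Two equally likely hypotheses; the evidence is 9 times likelier under H1.
post = posterior(priors=[0.5, 0.5], likelihoods=[0.9, 0.1])
print(post)  # [0.9, 0.1]
```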

H may be composite, so that there are many proper sub-hypotheses, h ⇒ H, with different likelihoods, P(E|h). It is then common to use the Bayesian likelihood,

P(E|H) = ∫h⇒H P(E|h)·dP(h|H),

or

P(E|H) = P(E|h), for some representative hypothesis h.

In either case, hypotheses should be chosen to ensure that the expected likelihood is maximal for the true hypothesis.
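For a discrete set of sub-hypotheses the Bayesian likelihood reduces to a weighted average of the sub-likelihoods; a minimal sketch (numbers invented):

```python
def bayesian_likelihood(sub_likelihoods, sub_priors):
    """P(E|H) for a composite H: average the P(E|h) over sub-hypotheses h,
    weighted by P(h|H) (the discrete analogue of the integral)."""
    return sum(l * p for l, p in zip(sub_likelihoods, sub_priors))

# H = 'coin is biased towards heads', sub-hypotheses h with biases
# 0.6, 0.7, 0.8, equally weighted; E = one head. (Numbers invented.)
print(bayesian_likelihood([0.6, 0.7, 0.8], [1 / 3, 1 / 3, 1 / 3]))  # 0.7
```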

Bayes noted a fundamental problem with such conventional probability: “[Even] where the course of nature has been the most constant … we can have no reason for thinking that there are no causes in nature which will ever interfere with the operations of the causes from which this constancy is derived.”

Uncertainty in Contemporary Life

Uncertainty arises from an indefinite context

Uncertainty may arise through human decision-making, adaptation or evolution, and may be significant for situations that are unfamiliar or for decisions that may have long-term impact. This is typical of the development of science in new areas, and of competitions where unexpected innovation can transform aspects of contemporary life. More broadly still, it is typical of situations where we have a poor information position or which challenge our sense-making, and where we could be surprised, and so need to alter our framing of the situation. For example, where others can be adaptive or innovative and hence surprising.

Contexts

• Competitions, cooperations, collaborations, confrontations and conflicts all nest and overlap messily, each with their own nature.
• Perception is part of multiple co-adaptations.
• Uncertainty can be shaped but not fully tamed. Only the most careful reasoning will do.
• Uncertainty and utility are imprecise and conditional. One can only satisfice, not optimise.
• Critical risks arise from the unanticipated.

Likelihoods, Evidence

In Plato’s republic the elite make the rules, which form a fixed context for the plebs. But in contemporary life the rulers only rule with the consent of the ruled, and in so far as the rules of the game ‘cause’ (or at least influence) the behaviour of the players, the participants have reason to interfere with causes – and in many cases we expect it: it is how things get done. J.M. Keynes and I.J. Good (under A.M. Turing) developed techniques that may be used for such ‘haphazard’ situations, as well as random ones.

The distinguishing concepts are: the law of evidence, generalized weight of evidence (woe), and iterative fusion.

If a datum, E, has a distribution f(·) over a possibility space, then for any distribution g(·) over the same space,

∫log(f(E))·f(E) ≥ ∫log(g(E))·f(E).

I.e. the cross-entropy is never less than the entropy. For a hypothesis H in a context, C, such that the likelihood function g = PH:C is well-defined, the weight of evidence (woe) due to E for H is defined to be:

W(E|H:C) ≡ log(PH:C(E)).

Thus the ‘law of evidence’: the expected woe for the truth is never exceeded by that for any other hypothesis. (But the evidence may indicate that many or none of the hypotheses fit.) For composite hypotheses, the generalized woe is:

W(E|H:C) ≡ suph⇒H {W(E|h:C)}.

This is defined even for a haphazard selection of h.
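The generalized woe simply takes the best-performing sub-hypothesis; a minimal sketch (likelihood values invented):

```python
import math

def woe(likelihood):
    """W(E|h:C) = log P_{h:C}(E), for a precise hypothesis h."""
    return math.log(likelihood)

def generalized_woe(sub_likelihoods):
    """W(E|H:C) = sup_h W(E|h:C): the best any sub-hypothesis of H does.
    This is defined even for a haphazard (non-random) selection of h."""
    return max(woe(l) for l in sub_likelihoods)

# Composite H whose sub-hypotheses give the evidence likelihoods 0.2, 0.5, 0.1:
print(generalized_woe([0.2, 0.5, 0.1]))  # log(0.5) ~ -0.693
```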

Let ds(·) be a discounting factor for the source, s [Good]. If one has independent evidence, Es, from different sources, s, then typically the fusion equation is:

W(E|H:C,ds) ≤ Σs ds(W(Es|H:C)),

with equality for precise hypotheses. Together, generalized woe and fusion determine how woe is propagated through a network, where the woe for a hypothesis is dependent on an assumption which itself has evidence. The inequality forces iterative fusion, whereby one refines candidate hypotheses until one has adequate precision. If circumstantial evidence indicates that the particular situation is random, one could take full account of it, to obtain the same result as Bayes, or discount [Good].
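The fusion bound sums discounted per-source woes; here is a minimal sketch assuming a simple multiplicative discount per source (the source names and factors are invented for illustration, and Good’s discounting is more general):

```python
import math

def fused_woe(woes_by_source, discount):
    """Upper bound on the fused woe: sum of discounted per-source woes.
    Here the discount d_s(.) is taken to be a multiplicative factor per
    source (a simplifying assumption). For precise hypotheses the bound
    holds with equality."""
    return sum(discount[s] * w for s, w in woes_by_source.items())

# Independent evidence from two sources (names and numbers invented):
woes = {"source_a": math.log(0.8), "source_b": math.log(0.6)}
discount = {"source_a": 1.0, "source_b": 0.5}  # trust source_b less
print(fused_woe(woes, discount))
```

Discounting a source pulls its contribution towards zero woe, i.e. towards having no effect on the fused assessment.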

In some cases it is convenient, as Keynes does, to use an interval likelihood or woe, taking the infimum and supremum of possible values. The only assumption is that the evidence can be described as a probabilistic outcome of a definite hypothesis, even if the overall situation is haphazard. In practice, the use of likelihoods is often combined with conjectural causal modelling, to try to get at a deep understanding of situations.

Examples

Crises

[Figure omitted: typical crisis dynamics]

Above is an informal attempt to illustrate typical crisis kinematics, such as the financial crisis of 2007/8. It is intended to capture the notion that conventional probability calculations may suffice for long periods, but over-dependence on such classical constructs can lead to shocks or crises. To avoid or mitigate these, more attention should be given to uncertainty [Turner].

An ambush

Uncertainty is not necessarily esoteric or long-term. It can be found wherever the assumptions of conventional probability theory do not hold, in particular in multilevel games. I would welcome more examples that are simple to describe, relatively common and where the significance of uncertainty is easy to show.

Deer need to make a morning run from A to B. Routes r, s, t are possible. A lion may seek to ambush them. Suppose that the indicators of potential ambushes are equal. Now in the last month route r has been used 25 times, s 5 times and t never, without incident. What is the ‘probability’ of an ambush for the 3 routes?

Let A=“The Lion deploys randomly each day with a fixed probability distribution, p”. Here we could use a Bayesian probability distribution over p, with some sensitivity analysis.

But this is not the only possibility. Alternatively, let B =“The Lion has reports about some of our runs, and will adapt his deployments.” We could use a Bayesian model for the Lion, but with less confidence. Alternatively, we could use likelihoods.

Route s is intermediate in characteristics between the other two. There is no reason to expect an ambush at s that doesn’t apply to one of the other two. On the other hand, if the ambush is responsive to the number of times a route is used then r is more likely than s or t, and if the ambush is on a fixed route, it is only likely to be on t. Hence s is the least likely to have an ambush.
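The fixed-route case can be checked directly (a minimal sketch, with the Lion’s daily deployment modelled as a per-route ambush probability): the incident-free history has zero likelihood under a fixed ambush on r or s, leaving t as the only consistent fixed route.

```python
# Incident-free uses of each route over the last month (from the puzzle).
uses = {"r": 25, "s": 5, "t": 0}

def history_likelihood(ambush_prob_by_route):
    """P(no ambush in the last month | the Lion's daily ambush probabilities),
    treating days as independent."""
    like = 1.0
    for route, n in uses.items():
        like *= (1.0 - ambush_prob_by_route[route]) ** n
    return like

# 'Fixed route' hypotheses: the Lion always sits on one route.
for fixed in ("r", "s", "t"):
    p = {route: (1.0 if route == fixed else 0.0) for route in uses}
    print(fixed, history_likelihood(p))
# r 0.0, s 0.0, t 1.0 -> a fixed ambush can only plausibly be on t
```

The responsive-Lion hypothesis would instead make the ambush probability a function of past use, which this simple likelihood model cannot capture: that is exactly the kind of uncertainty the example is meant to illustrate.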

Consistently selecting routes using a fixed probability distribution is not as effective as a muddling strategy [Binmore], which varies the distribution, supporting learning and avoiding an exploitable equilibrium.

Concluding Remarks

Conventional (numeric) probability, utility and rationality all extrapolate based on a presumption of stability. If two or more parties are co-adapting or co-evolving any equilibria tend to be punctuated, and so a more general approach to uncertainty, information, communication, value and rationality is indicated, as identified by Keynes, with implications for ‘risk’.

Dave Marsay, Ph.D., C.Math FIMA, Fellow ISRS

References:

Bayes, T. An Essay towards solving a Problem in the Doctrine of Chances (1763), Philosophical Transactions of the Royal Society of London 53, 370–418. Regarded by most English-speakers as ‘the source’.

Binmore, K, Rational Decisions (2009), Princeton U Press. Rationality for ‘muddles’, citing Keynes and Turing. Also http://else.econ.ucl.ac.uk/papers/uploaded/266.pdf .

Cox, R.T. The Algebra of Probable Inference (1961) Johns Hopkins University Press, Baltimore, MD. The main justification for the ‘Bayesian’ approach, based on a belief function for sets whose results are comparable. Keynes et al deny these assumptions. Also Jaynes, E.T. Probability Theory: The Logic of Science (1995) http://bayes.wustl.edu/etj/prob/book.pdf .

Good, I.J. Probability and Weighting of Evidence (1950), Griffin, London. Describes the basic techniques developed and used at Bletchley Park. Also Explicativity: A Mathematical Theory of Explanation with Statistical Applications (1977) Proc. R. Soc. Lond. A 354, 303-330, etc. Covers discounting, particularly of priors. More details have continued to be released up until 2006.

Hodges, A. Alan Turing (1983) Hutchinson, London. Describes the development and use of ‘weights of evidence’, “which constituted his major conceptual advance at Bletchley”.

Keynes, J.M. Treatise on Probability (1920), MacMillan, London. Fellowship essay, under Whitehead. Seminal work, outlines the pros and cons of the numeric approach to uncertainty, and develops alternatives, including interval probabilities and the notions of likelihood and weights of evidence, but not a ‘definite method’ for coping with uncertainty.

Smuts, J.C. The Scientific World-Picture of Today, British Assoc. for the Advancement of Science, Report of the Centenary Meeting. London: Office of the BAAS. 1931. (The Presidential Address.) A view from an influential guerrilla leader, General, War Cabinet Minister and supporter of ‘modern’ science, who supported Keynes and applied his ideas widely.

Turner, The Turner Review: A regulatory response to the global banking crisis (2009). Notes the consequences of simply extrapolating, ignoring non-probabilistic (‘Knightian’) uncertainty.

Whitehead, A.N. Process and Reality (1929: 1979 corrected edition) Eds. D.R. Griffin and D.W. Sherburne, Free Press. Whitehead developed the logical alternative to the classical view of uniform unconditional causality.