## Instrumental Probabilities

Reflecting on my recent contribution to the economics ejournal special issue on uncertainty (comments invited), I realised that from a purely mathematical point of view, the current mainstream mathematical view, as expressed by Dawid, could be seen as a very much more accessible version of Keynes’. But there is a difference in expression that can be crucial.

In Keynes’ view ‘probability’ is a very general term, so that it always legitimate to ask about the probability of something. The challenge is to determine the probability, and in particular whether it is just a number. In some usages, as in Kolmogorov, the term probability is reserved for those cases where certain axioms hold. In such cases the answer to a request for a probability might be to say that there isn’t one. This seems safe even if it conflicts with the questioner’s presuppositions about the universality of probabilities. In the instrumentalist view of Dawid, however, suggests that probabilistic methods are tools that can always be used. Thus the probability may exist even if it does not have the significance that one might think and, in particular, it is not appropriate to use it for ‘rational decision making’.

I have often come across seemingly sensible people who use ‘sophisticated mathematics’ in strange ways. I think perhaps they take an instrumentalist view of mathematics as a whole, and not just probability theory. This instrumentalist mathematics reminds me of Keynes’ ‘pseudo-mathematics’. But the key difference is that mathematicians, such as Dawid, know that the usage is only instrumentalist and that there are other questions to be asked. The problem is not the instrumentalist view as such, but the dogma (of at last some) that it is heretical to question widely used instruments.

The financial crises of 2007/8 were partly attributed by Lord Turner to the use of ‘sophisticated mathematics’. From Keynes’ perspective it was the use of pseudo-mathematics. My view is that if it is all you have then even pseudo-mathematics can be quite informative, and hence worthwhile. One just has to remember that it is not ‘proper’ mathematics. In Dawid’s terminology  the problem seems to be that the instrumental use of mathematics without any obvious concern for its empirical validity. Indeed, since his notion of validity concerns limiting frequencies, one might say that the problem was the use of an instrument that was stunningly inappropriate to the question at issue.

It has long seemed  to me that a similar issue arises with many miscarriages of justice, intelligence blunders and significant policy mis-steps. In Keynes’ terms people are relying on a theory that simply does not apply. In Dawid’s terms one can put it blunter: Decision-takers were relying on the fact that something had a very high probability when they ought to have been paying more attention to the evidence in the actual situation, which showed that the probability was – in Dawid’s terms – empirically invalid. It could even be that the thing with a high instrumental probability was very unlikely, all things considered.

## Artificial Intelligence?

The subject of ‘Artificial Intelligence’ (AI) has long provided ample scope for long and inconclusive debates. Wikipedia seems to have settled on a view, that we may take as straw-man:

Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it. [Dartmouth Conference, 1956] The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds. [John Searle’s straw-man hypothesis]

Readers of my blog will realise that I agree with Searle that his hypothesis is wrong, but for different reasons. It seems to me that mainstream AI (mAI) is about being able to take instruction. This is a part of learning, but by no means all. Thus – I claim – mAI is about a sub-set of intelligence. In many organisational settings it may be that sub-set which the organisation values. It may even be that an AI that ‘thought for itself’ would be a danger. For example, in old discussions about whether or not some type of AI could ever act as a G.P. (General Practitioner – first line doctor) the underlying issue has been whether G.P.s ‘should’ think for themselves, or just apply their trained responses. My own experience is that sometimes G.P.s doubt the applicability of what they have been taught, and that sometimes this is ‘a good thing’. In effect, we sometimes want to train people, or otherwise arrange for them to react in predictable ways, as if they were machines. mAI can create better machines, and thus has many key roles to play. But between mAI and ‘superhuman intelligence’  there seems to be an important gap: the kind of intelligence that makes us human. Can machines display such intelligence? (Can people, in organisations that treat them like machines?)

One successful mainstream approach to AI is to work with probabilities, such a P(A|B) (‘the probability of A given B’), making extensive use of Bayes’ rule, and such an approach is sometimes thought to be ‘logical’, ‘mathematical, ‘statistical’ and ‘scientific’. But, mathematically, we can generalise the approach by taking account of some context, C, using Jack Good’s notation P(A|B:C) (‘the probability of A given B, in the context C’). AI that is explicitly or implicitly statistical is more successful when it operates within a definite fixed context, C, for which the appropriate probabilities are (at least approximately) well-defined and stable. For example, training within an organisation will typically seek to enable staff (or machines) to characterise their job sufficiently well for it to become routine. In practice ‘AI’-based machines often show a little intelligence beyond that described above: they will monitor the situation and ‘raise an exception’ when the situation is too far outside what it ‘expects’. But this just points to the need for a superior intelligence to resolve the situation. Here I present some thoughts.

When we state ‘P(A|B)=p’ we are often not just asserting the probability relationship: it is usually implicit that ‘B’ is the appropriate condition to consider if we are interested in ‘A’. Contemporary mAI usually takes the conditions a given, and computes ‘target’ probabilities from given probabilities. Whilst this requires a kind of intelligence, it seems to me that humans will sometimes also revise the conditions being considered, and this requires a different type of intelligence (not just the ability to apply Bayes’ rule). For example, astronomers who refine the value of relevant parameters are displaying some intelligence and are ‘doing science’, but those first in the field, who determined which parameters are relevant employed a different kind of intelligence and were doing a different kind of science. What we need, at least, is an appropriate way of interpreting and computing ‘probability’ to support this enhanced intelligence.

The notions of Whitehead, Keynes, Russell, Turing and Good seem to me a good start, albeit they need explaining better – hence this blog. Maybe an example is economics. The notion of probability routinely used would be appropriate if we were certain about some fundamental assumptions. But are we? At least we should realise that it is not logical to attempt to justify those assumptions by reasoning using concepts that implicitly rely on them.

Dave Marsay

## Assessing and Communicating Risks and Uncertainty

David Spielgelhalter Assessing and Communicating Risks and Uncertainty Science in Parliament vol 69, no. 2, pp. 21-26. This is part of the IMA’s Mathematics Matters: A Crucial Contribution to the Country’s Economy.

This starts with a Harvard study showing that “a daily portion of red meat was associated with an increase in the annual risk of death by 13% over the period of the study”. Does this mean, as the Daily Express claimed, that “10% of all deaths could be avoided”?

David S uses ‘survival analysis’ to show that “a 40 year-old  man who eats a quarter-pound burger for his working lunch each day can expect, on average, to live to 79, while his mate who avoids the burger can expect to live to 80.” He goes on: “over a lifetime habit, each daily portion of red meat is associated with about 30 minutes off your life expectancy .. ” (my emphasis.)

As a mathematician advising politicians and other decision-makers, I would not be comfortable that policy-makers understood this, and would act appropriately. They might, for example, assume that we should all be discouraged from eating too much red meat.

Even some numerate colleagues with some exposure to statistics might, I think, suppose that their life expectancy was being reduced by eating red meat. But all that is being said is that if a random person were selected from the population as a whole then – knowing nothing about them – a statistician would ‘expect’ them to have a shorter life if they eat red meat. But every actual individual ‘you’ has a family history and many by 40 will have had cholesterol tests. It is not clear what the relevance to them is of the statistician’s ‘averaged’ figures.

Generally speaking, statistics gathered for one set of factors cannot be used to draw precise conclusions about  other sets of factors, much less about individuals. David S’s previous advice at Don’t Know, Can’t Know applies. In my experience, it is not safe to assume that the audience will appreciate these finer points. All that I would take from the Harvard study is that if you eat red meat most days it might be a good idea to consult your doctor. I would also hope that there was research going on into the factors in the apparent dangers.

I would appreciate a link to the original study.

Dave Marsay

## ESP and significance

‘Understanding Uncertainty’ has a blog (‘uu blog’) on ESP and significance. The challenge for those not believing in ESP is an experiment which seems to show ‘statistically significant’ but mild ESP. This could be like a drug company that tests lots of drugs until it gets a ‘statistically significant’ result, but from the account it seems more significant than this.

The problem for an ESP atheist who is also a Bayesian is in trying to interpret the result of a significance test as a (subjective) probability that some ESP was present, as the above blog discusses. But from a sequential testing point of view (e.g. of Wald) we would simply take the significance as a threshold which stimulates us to test the conclusion. In typical science one would repeat the experiment and regard it as significant if the result was not repeated. But with ESP the ‘aura’ of the experimenter or place may be significant, so a failure by others to replicate a result may simply mean that only sometimes is ESP shown in the experimental set-up. So what is a ‘reasonable’ acceptance criterion?

Jack Good discussed the issues arising from ESP in some detail, including those above. He developed the notion of ‘weight of evidence’, which is the log of the appropriate likelihood ratio. There are some technical differences to the approach of the ‘uu blog’. They offer some advantages.

If e is the evidence/data obtained from an experiment and h is a hypothesis (e.g. the null hypothesis) then P(e|h) denotes the likelihood, where P() is the (Bayesian) probability. To be well-defined the likelihood should be entailed by the hypothesis.

One problem is that the likelihood depends on the granularity with which we measure the data, and so – on its own – is meaningless. In significance testing one defines E(e) to be the set of all data that is at least as ‘extreme’ as e, and uses the likelihood P(E(e)|h) to determine ‘1-significance’. But (as in ‘uu blog’) what one really wants is P(¬h|e).

In this experiment one is not comparing one theory or model with another, but a statistical ‘null hypothesis’ with its complement, which is very imprecise, so that it is not clear what the appropriate likelihood is. ‘uu blog’ describes the Bayesian approach, of having prior distributions as to how great an ESP effect might be, if there is one. To me this is rather like estimating how many angels one could get on a pin-head. An alternative is to use Jack Good’s ‘generalized likelihood’. In principal one considers all possible theories and takes the likelihood of the one that best explains the evidence. This is then used to form a likelihood ratio, as in ‘uu blog’, or the log likelihood is used as a ‘weight of evidence, as at Bletchley Park. In this ESP case one might consider subjects to have some probability of guessing correctly, varying the probability to get the best likelihood. (This seems to be about 52% as against the 50% of the null hypothesis.) Because the alternative to the null hypothesis includes biases that are arbitrarily close to the null hypothesis, one will ‘almost always’ find some positive or negative ESP effect. The interesting thing would be to consider the distribution of such apparent effects for the null hypothesis, and hence judge the significance of a result of 52%.

This seems a reasonable thing to do, even though there may be many hypotheses that we haven’t considered and so our test is quite weak. It is up to those claiming ESP to put forward hypotheses for testing.

A difficulty of the above procedure is that investigators and journals only tend to report positive results (‘uu blog’ hints at this). According to Bayesians one should estimate how many similar experiments have been done first and then accept ESP as ‘probable’ if a result appears sufficiently significant. I’m afraid I would rather work the other way: assess how many experiments there would have to be to make an apparently significant result really significant, and then judge whether it was credible that so many experiments had been done. Even if not, I would remain rather cynical unless and until the experiment could be refined to give a more definite and repeatable effect. Am I unscientific?

Dave Marsay