# Don’t know, can’t know

This paper recognizes the limitations of the Bayesian view that ‘uncertainty is just a number’, and seeks to ameliorate them.

D.J. Spiegelhalter and H. Riesch, **Don’t know, can’t know: embracing deeper uncertainties when analysing risks**, *Phil. Trans. R. Soc. A* (2011) **369**, 4730–4750. doi:10.1098/rsta.2011.0163

It advocates:

- performing a conventional Bayesian analysis
- taking a broad view of the associated uncertainties
- giving a numeric probability with an appropriate imprecision
- GRADEing it against the Cochrane Collaboration’s **G**rading of **R**ecommendations **A**ssessment, **D**evelopment and **E**valuation scale.

This is clearly better than a bare probability. However, where the GRADE is poor *it might be better to use different probabilistic approaches* (e.g. comparative or conditional) that could be assigned a higher GRADE. For example, rather than being given just a whole-population figure that is GRADEd poorly because of significant within-population variability, a patient may prefer to know also about the results of smaller studies that are more pertinent to their circumstances (e.g. gender, age, ethnicity). The paper hints at this, but it doesn’t come out in the recommendations.

## Abstract

Uncertainty is best seen as a relation, allowing a clear separation of the object, source and ‘owner’ of the uncertainty … [A]ll expressions of uncertainty are … based on possibly inadequate assumptions, and are therefore contingent. … [This] needs to be explicitly acknowledged in advice given to policy-makers … .

## Introduction

The academic literature about risk covers a broad spectrum from mathematical analyses to sociological discourses … . [T]he ‘quantitative’ extreme uses the formal language of probability theory … . [C]hallenges … arise from scientific uncertainty … . [O]ver-precise numerical expressions of the likelihood of events are potentially misleading and highly undesirable, since we may not feel confident either in delineating the set of events that may occur, or providing a precisely specified probability distribution over that set. … [A]ttempts to quantify scientific uncertainty [are] ‘taming chance’.

### (a) Case study. Intergovernmental Panel on Climate Change methods for handling of uncertainty

This discusses the IPCC approach, based on level of agreement and amount of evidence. This seems to justify the notion that there is more to uncertainty than numeric probability, but the paper points out some issues with its particular approach.

### (b) How can we categorize uncertainty?

Levels for objects of uncertainty …:

1. events: essential unpredictability
2. parameters within models: limitations in information
3. alternative model structures: limitations in formalized knowledge
4. effects of model inadequacy from recognized sources: indeterminacy …
5. effects of model inadequacy from unspecified sources: ignorance …

### (c) Yet another structure for uncertainty in risk analysis

This includes:

Source of uncertainty. Possibilities include variability within a population … [W]e do not view probability as an objective state of the world … . [T]he whole process of model construction and probability assessment is contingent, changeable and … deliberative … .

### (d) Level 1. Uncertainty about which in a list of events will occur

The paper seems to think that such aleatoric probability is straightforward.

### (e) Level 2. Uncertainty about parameters in a model

The Bayesian approach is increasingly popular in risk modelling as it allows for the propagation of uncertainty about parameters through a model to a final probability distribution on the predictions … .

Example: balls in a bag. Suppose another … bag has an unknown number of red balls. This is ambiguity in the sense of the classic Ellsberg paradox … . However, if we assume … then our probability for choosing a red from the bag should be 0.5 … . However … we can use our experience to learn about the composition of balls, so that once a red has been chosen and replaced … our probability for the next ball being red should rise … . Given an option of choosing between a bag with known chances of success, and one with an unknown chance, people tend to express ‘ambiguity aversion’ and tend to select the bag with known odds.
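The learning step in the bag example can be checked with a small calculation. The following is only a sketch under assumptions that the quoted text elides: a bag of 10 balls and a uniform prior over the number of reds (0 to 10). The function names are introduced here for illustration.

```python
from fractions import Fraction

N_BALLS = 10  # assumed bag size; the quoted passage elides this detail


def uniform_prior(n=N_BALLS):
    """Assumed prior: equal weight on every possible count of red balls, 0..n."""
    return {k: Fraction(1, n + 1) for k in range(n + 1)}


def prob_red(belief, n=N_BALLS):
    """Predictive probability that the next ball drawn is red."""
    return sum(weight * Fraction(k, n) for k, weight in belief.items())


def update_on_red(belief, n=N_BALLS):
    """Bayes update after drawing a red ball and replacing it."""
    unnormalised = {k: weight * Fraction(k, n) for k, weight in belief.items()}
    total = sum(unnormalised.values())
    return {k: weight / total for k, weight in unnormalised.items()}


prior = uniform_prior()
posterior = update_on_red(prior)

print(prob_red(prior))      # 1/2
print(prob_red(posterior))  # 7/10
```

Under these assumptions the single number 0.5 is the same as for a bag known to be evenly mixed, yet only the unknown bag's probability moves (here to 0.7) after one red is seen.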

### (f) Level 3. Uncertainty about which model is best

[I]f we are going to interpret the mechanism that the model encapsulates … then … it is only meaningful to put probabilities on statements if we can at least imagine a future experiment that would reveal the statement’s … falsity. … Uncertainty about alternative model structures might therefore only be expressed by full Bayesian model probabilities in very special circumstances, and in general a qualitative assessment of relative plausibility of alternative models, or a list of models/scenarios with deterministic sensitivity analysis, seems appropriate.

Example: balls in a bag. Suppose we [have] bag 1 with a known proportion … and bag 2 with an unknown proportion … . We pick a bag at random, and want to know which bag we have chosen. [S]uppose we draw two balls … and they both are red. … [This has] slightly increased our belief that we picked the bag with an unknown proportion … . We repeat that such posterior model probabilities only seem appropriate in tightly controlled, idealized situations … .
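A posterior model probability of this kind can be sketched as follows. The specifics are assumptions made for illustration, since the quoted text elides them: bag 1 is taken to be half red, bag 2 has a uniform prior over 0 to 10 reds out of 10, and the two draws are made with replacement.

```python
from fractions import Fraction

N_BALLS = 10                  # assumed bag size
P_RED_BAG1 = Fraction(1, 2)   # assumed known proportion for bag 1


def likelihood_known(n_reds, p=P_RED_BAG1):
    """Probability of n_reds reds in a row from the bag with known proportion."""
    return p ** n_reds


def likelihood_unknown(n_reds, n=N_BALLS):
    """Same probability for bag 2, averaged over an assumed uniform prior
    on its number of red balls (draws with replacement)."""
    return sum(Fraction(k, n) ** n_reds for k in range(n + 1)) / (n + 1)


def posterior_bag2(n_reds, prior_bag2=Fraction(1, 2)):
    """Posterior probability that we are holding bag 2 after n_reds reds."""
    l1 = likelihood_known(n_reds)
    l2 = likelihood_unknown(n_reds)
    return prior_bag2 * l2 / (prior_bag2 * l2 + (1 - prior_bag2) * l1)


print(posterior_bag2(2))  # 7/12
```

With these assumed numbers the belief in bag 2 moves from 1/2 to 7/12, a slight increase of the kind the paper describes. The result depends entirely on the assumed prior for bag 2, which is the point at issue.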

### (g) Level 4. Uncertainty about known inadequacies of best model

Even the best imaginable model is still not the real world and has inevitable limitations … . [A] ‘consumer’ confronted with a risk assessment is faced with another unmodelled source of uncertainty … : can we assume the modeller to be trustworthy and/or competent?

Example: balls in a bag. Suppose we had some explicit suspicions … for example, are there really only 10 balls in each bag …?

### (h) Level 5. Uncertainty about unknown inadequacies of all models

The idea of acknowledging the limitations in our capacity for formal conclusions has a long history. A classic quote is from Keynes [General Theory …].

One possible response … is to adapt the technical methods to deal with an unwillingness [or inability] to provide a complete specification. Options include incompletely specified probability distributions … . [But] we do not feel it is generally appropriate to respond to limitations in formal analysis by increasing the complexity of the modelling. Instead … *the GRADE approach is recommended*.

Example: balls in a bag. [I]n our public demonstrations, we sometimes substitute an unpleasant-feeling object in a bag … which … communicates the trust implicit in assuming that the list of possible outcomes has been fully provided.

### (i) Case study. IPCC revisited

The levels structure is applied to the IPCC situation, to good effect.

### (j) Case study. Egg-gate

[An egg box was found to contain all double-yolked eggs, which was claimed to be extremely unlikely.] A little reflection suggests that we might want to know about where the eggs came from, how they are screened [etc].

### (k) So what is to be done in the face of deeper uncertainties?

[F]ormal ‘scientific’ analyses are constructed on the basis of current assumptions and judgement: there are deep uncertainties that are not expressed through the standard … analysis, [which] are not necessarily reduced by additional information.

Our final recommendations may appear mere common sense.

— Use quantitative models with … uncertainty expressed as Bayesian probability distributions.

— Conduct sensitivity analysis … without putting probabilities on models.

— Provide a list of known model limitations … .

— Provide a qualitative expression of confidence … based on the quality of the underlying evidence …

— In situations of low confidence, use deliberately imprecise expressions of uncertainty … .

— When exploring possible actions, look for robustness to error, resilience to the unforeseen, and potential for adaptivity in the face of the unexpected.

— Seek transparency and ease of interrogation of any model, with clear expression of the provenance of assumptions.

— Communicate the estimates with humility, communicate the uncertainty with confidence.

— Fully acknowledge the role of judgement … . [I]t is important to avoid the attrition of uncertainty in the face of an inappropriate demand for certainty: … ‘the built-in ignorance of science towards its own … assumptions is a problem only when external commitments’ ignore them.

## Comments

The introduction seems to suggest that mathematics is necessarily quantitative, which in this context seems to mean that to be mathematical the representation of uncertainty must use a one-dimensional numeric scale. It also seems to suggest that a ‘strictly formal approach’ is necessarily quantitative, and the current problems are due to subjective rather than objective factors. But Keynes, Good, and much of the discussion in the paper would seem to suggest otherwise, and the advice against over-precision would seem objectively justified when seeing uncertainty as a relation, as in Keynes’ Treatise. (The paper refers to Keynes’ economics, but not his mathematics.)

Section (b), ‘*How can we categorize uncertainty?*’, is the core of the paper. It relates levels of uncertainty to limitations in what we know about the particular problem. But it does not recognize the possibility of limitations in our reasoning about uncertainty, for example whether the language of probability is always appropriate. Is a Bayesian likelihood meaningful for a very heterogeneous population?

Section (c) seems to allude to the above problem, but does not emphasise or develop it. If we regard it as meaningful to assign probabilities to populations where there is or may be variation within the population, then this may be a ‘source of uncertainty’. But this usage seems confusing. Perhaps one should distinguish between the probability for a randomly drawn member of a population and the probability for an individual, however ‘drawn’. The two are quite different, with different implications for decision-making.

Section (d) presents aleatoric probability. But suppose one has a number of aleatoric processes (e.g. selections from urns with differing mixes of balls) and then selects one of these randomly. One may have a complete description of the generating process and yet run into difficulties with a probabilistic approach (Savage).

Section (e) notes that the Bayesian approach is popular but controversial. There are some circumstances when it is correct, and some when it is not. How do we tell the difference? The Ellsberg example seems decisive. If the uncertainty could be measured by a probability then that probability would have to be 0.5, and we should act the same as if we knew the mix of balls to be even. As far as I know, all the attempts to demonstrate the adequacy of the Bayesian approach assume (implicitly or not) that one is dealing with a one-off decision. As soon as one relaxes that assumption, the probability no longer represents all that is required about the uncertainty to make decisions, because the two situations are different. But it seems to me that we can never know that a situation will never be repeated, and so we are never in the situation that the Bayesian approach assumes. I also wonder why we should need to assume a distribution: why not just work with likelihoods, which are composable and reflect the uncertainty?

(f) The paper advocates assigning prior probabilities to parameters of models but not to models. This seems a very subtle point. Perhaps it would make sense if, as is often the case, a probability distribution over the parameters is a natural part of the model. But when, as in some of the paper’s urn examples, there is no natural prior probability, it seems more natural not to assign one and to treat each value of the parameter as a different case. This seems to fit the approach of Keynes and Good: if there is a natural distribution, use it. If not you might try a few, but not rely on them: it is the likelihood that really matters. If this is a ‘weak Bayesian’ position and a strong Bayesian position is that you should always estimate a distribution, then the paper is advocating an intermediate Bayesian position. I think that we should acknowledge some humility about which is ‘correct’.

It would certainly seem reasonable to suppose, as Good does, that if two or more models fit the evidence then one would need good reason to suppose that one is more ‘probable’ than the other. Good also introduces the notion of generalized likelihood, which can be applied without assuming any distribution for bag 2. The paper alludes to this when it says “This may involve integrating out (Bayesian) or maximizing over (classical) some parameters.”

For example, if one gets a run of reds then one has the usual likelihood for bag 1 but a likelihood of 1.0 for bag 2, since ‘all red balls’ is credible.
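The contrast between maximizing over and integrating out the unknown composition can be sketched numerically. This is an illustration under assumed specifics rather than Good's own formulation: bag 1 is taken to be half red, bag 2 to hold 10 balls, and the ‘integrated’ version assumes a uniform prior over its number of reds.

```python
def likelihood_bag1(n_reds, p_red=0.5):
    """Likelihood of a run of n_reds reds from the bag with known proportion
    (p_red = 0.5 is an assumption for illustration)."""
    return p_red ** n_reds


def max_likelihood_bag2(n_reds, n_balls=10):
    """'Classical' generalized likelihood: maximize over every possible
    number of red balls in bag 2. No prior distribution is needed."""
    return max((k / n_balls) ** n_reds for k in range(n_balls + 1))


def integrated_likelihood_bag2(n_reds, n_balls=10):
    """'Bayesian' version: average over an assumed uniform prior on the
    number of red balls in bag 2."""
    values = [(k / n_balls) ** n_reds for k in range(n_balls + 1)]
    return sum(values) / len(values)


for n in (1, 2, 5):
    print(n, likelihood_bag1(n), max_likelihood_bag2(n), integrated_likelihood_bag2(n))
```

The maximized likelihood for bag 2 stays at 1.0 however long the run of reds, because ‘all red’ is among the compositions considered, while the likelihood for bag 1 halves with each draw. The integrated version lies between the two and depends on the assumed prior.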

(g) Here quantitative notions seem inapplicable, but one might still employ Keynes’ comparative notion and might reasonably still be ‘ignorance averse’.

(h) Here the paper shows a concern for the complexity of the representation of uncertainty, but this surely ought to be balanced against the difficulty of applying the model. The use of interval-valued probabilities, while not ideal, is at least obviously always possible in the weak sense that you can always start with everything having a probability of [0,1] and only refine the interval when you have sufficient evidence.
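A minimal sketch of this weak sense of interval-valued probability, applied to a bag assumed to hold 10 balls: the interval for ‘next ball is red’ starts at [0, 1] and narrows only when an observation logically rules out a composition. The colour labels and function names are hypothetical, and this is one simple reading of the idea, not a method taken from the paper.

```python
N_BALLS = 10  # assumed bag size


def consistent_counts(observed_colours, n=N_BALLS):
    """Counts of red balls (0..n) not ruled out by the colours seen so far.
    Draws are assumed to be made with replacement."""
    counts = set(range(n + 1))
    if "red" in observed_colours:
        counts.discard(0)   # at least one red ball must exist
    if "other" in observed_colours:
        counts.discard(n)   # at least one non-red ball must exist
    return counts


def red_interval(observed_colours, n=N_BALLS):
    """Lower and upper probability that the next ball is red."""
    counts = consistent_counts(observed_colours, n)
    return (min(counts) / n, max(counts) / n)


print(red_interval([]))                # (0.0, 1.0)
print(red_interval(["red"]))           # (0.1, 1.0)
print(red_interval(["red", "other"]))  # (0.1, 0.9)
```

The interval is deliberately cautious: it says only what the evidence establishes, and it never requires a guess at a prior.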

When I saw a presentation with the example I thought the notion of attempting to assign a probability, as we were invited to do, simply ridiculous, as there seemed no sound basis for doing so. Had I been forced to declare a value I would have GRADEd it as ‘any estimate of effect is very uncertain’. But it seems to me that I could make a perfectly accurate assessment of the probability (in Keynes’ broad sense). As I recall, it was something like P(red|no trick)=0.5, P(something else|trick)=1.0. It was clear that at some point there was going to be a trick, but as a mathematician I didn’t feel obliged to guess which point was going to be made by which example, and hence whether this particular example was a trick. It may even have been a more devious trick: if I had said I thought it a trick he may have manipulated the bag to contain a ball, and vice-versa. How should I know?

Being conditional (as advocated in the abstract) my approach is not entirely quantitative, but it seems to me to be more mathematical in so far as I am not obliged to make arbitrary guesses and then GRADE them poorly. This seems a more reasonable approach to level 5 uncertainty: use an approach to uncertainty that allows one to say more precisely what you know even if it is at the expense of less precision in your claims: “don’t know, can’t know”.

(i) While it is helpful to apply the GRADE approach to climate change analysis, I would have thought that too much would be GRADEd ‘any estimate of effect is very uncertain’ to help resolve the debates and inform appropriate action. My limited understanding is that even the ‘technical methods’ alluded to in (h) would be unlikely to be powerful enough to resolve the issues: they really are problematic. But climate change problems are clearly serious, whether for the damage climate change might cause, or for the damage that attempts to avoid or mitigate exaggerated fears would cause. Hence one might reasonably throw everything at the problem, rather than just methods that have proved pragmatic for tamer problems.

(j) The egg-gate example is important. My interpretation of the paper is that we should not insist that something is surprising until we have made a reasonable effort to explain it, and that if there is some credible explanation we should not insist that it is surprising. But it may still be that according to ‘the generally accepted view’ or ‘the ruling theory’ the event is surprising, and my view is that, somehow, we should recognize that difference, and the difference between a ‘settled view’ and one that we would not be surprised to see change on further evidence.

(k) What can one do if the GRADE is poor? Ideally one would, scientifically, develop a more precise and detailed model using adequate data, refining the model and extending the data until one had an adequate GRADE for the decision to be made. But often one has to make a decision without such a highly GRADEd conventional probabilistic model, either due to time pressure or the complexity of the situation. What then?

It seems to me that there are objective aleatoric probabilistic models for which classical subjective probability approaches are misleading, and that we need an improved language and theory for probabilistic reasoning. This might usefully include a distinction between interval-valued probabilities based on what is known and point ‘best guess’ probabilities. It would need a clear theory corresponding to Bayesian updating and a clear view on inference to causality: the point of probabilistic reasoning is often to inform assessments of causality, so we need to be sure to have a clear logic. In my view we need to separate explaining the data that we have from extrapolation: one might have a really good model of the economy ‘as is’, but one needs a different kind of analysis to tell how far from the cliff it is: there is more to ‘total risk’ than variability.

This might have these elements:

- Use Boole’s approach, where probabilities are represented by variables. Likelihoods then propagate in the same way as in conventional Bayesian analysis but yield equations instead of numbers. One then seeks to establish bounds for these equations. If these are unsatisfactory (as they may well be) one then identifies which uncertainties about input parameters contribute the most, and seeks to narrow them.
- Also include conditions, as in a Bayesian network, but used slightly differently. The result is a kind of super-charged sensitivity analysis. It does not assume any privileged ‘about right’ values around which perturbations are considered, but would be much more transparent. (See (k).)
- Recognize the distinction between a ‘true probability’ (in Fisher’s sense) and a composite probability, and treat them accordingly.
- Accept that if you want to anticipate the future you should consider sudden changes as well as extrapolation of current behaviour, and that a far wider range of data will be required. For example, in thinking about what might happen economically next year one needs to at least model from 1920.
- In making assessments whose outcomes may affect others, check that your assumptions ‘make sense’ from the point of view of those likely to be affected. Even if you think a ‘rational’ dispassionate assessment is correct, at least model alternatives.
- Include the models and model-able views of any dissenters within the analysis and only ‘rule them out’ as a result of the above analysis.
- Open the approach up for criticism: I may have missed something. But always aim for a mathematically sound response in the first instance.
- Preferably join forces with those faced with similar non-tame problems.
- Whenever a firm conclusion seems to be being reached, or one has an impasse, go out to the broader community of those affected or potentially affected, not just ‘the recognized experts’. Really engage.
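As a rough illustration of the first element in the list above, the sketch below treats the inputs to Bayes’ rule as variables known only to lie in intervals, finds bounds on the result, and then asks which input contributes most to the width of those bounds. It is written in the spirit of the proposal rather than as an implementation of Boole’s method, and all the numbers and names are hypothetical.

```python
from itertools import product


def posterior(prior, lik_h, lik_not_h):
    """Bayes' rule for a hypothesis H, kept as a function of its inputs."""
    numerator = prior * lik_h
    return numerator / (numerator + (1 - prior) * lik_not_h)


def bounds(intervals):
    """Bounds on the posterior when each input is only known to lie in an
    interval. The posterior is monotone in each input, so its extremes
    occur at corners of the box of intervals."""
    values = [posterior(*corner) for corner in product(*intervals.values())]
    return min(values), max(values)


def width_if_known(intervals, name):
    """Width of the posterior interval if input `name` were pinned to the
    midpoint of its interval. A small width means `name` matters most."""
    lo, hi = intervals[name]
    mid = (lo + hi) / 2
    pinned = dict(intervals, **{name: (mid, mid)})
    lower, upper = bounds(pinned)
    return upper - lower


# Hypothetical input intervals, in the argument order of `posterior`.
inputs = {"prior": (0.05, 0.30), "lik_h": (0.8, 0.9), "lik_not_h": (0.1, 0.2)}

print(bounds(inputs))
for name in inputs:
    print(name, width_if_known(inputs, name))
```

With these made-up intervals the bounds are wide, and pinning the prior narrows them far more than pinning either likelihood, so the prior is the uncertainty one would seek to narrow first.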

### Example

Urn examples are informative, but more familiar ones can be too. What does it mean when a doctor says that a certain treatment has a 10% chance of serious side-effects? Is she taking account of our family and medical history, or not? Should she? Is it enough for doctors to simply warn atypical patients that the numbers do not really apply to them? In the Ellsberg urn examples one factor is ‘the attitude to risk’. If there is uncertainty is it appropriate for the doctor to apply a notional attitude to risk and report a probability? Or should they inform me about the uncertainties so that I can apply my own attitude?

Suppose that a randomized trial has shown that a treatment has a certain overall efficacy. It is obviously helpful to turn this into a probability, but this is fraught with difficulty [Fisher] unless we can be sure that there are no sub-populations that tend to have different outcomes. With interval-valued probabilities you can at least capture the range of possible efficacies. As a patient I would prefer at least some comparative probability (e.g. ‘no more than 30%’) even if a meaningful precise estimate is impossible. To put it another way, if one can only report a numeric probability that is ‘very uncertain’, what can you report that would help me to make the decision? The discussion in the paper seems very pertinent to this issue, but I would like to go beyond GRADEing numeric probabilities which, in so many real-world situations, seem like doomed attempts to ‘tame chance’.

As an example, suppose that a patient has various symptoms, yielding P(disease|symptoms) based on whole-population statistics. The doctor now finds that they are of a particular ethnicity for which the prevalence is different. According to Bayes’ rule the odds of the disease are now multiplied by the likelihood ratio, which depends on P(ethnicity|disease, symptoms). This is rather a bizarre quantity, and any estimate would have to be GRADEd poorly. It seems more reasonable to reason as best one can from the more reliable data to get a more highly GRADEd, if imprecise, estimate.
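Written in odds form, the update in this example looks as follows. The symbols are introduced here for illustration: $D$ for the disease, $S$ for the symptoms and $E$ for the ethnicity.

$$
\frac{P(D \mid E, S)}{P(\neg D \mid E, S)}
= \frac{P(E \mid D, S)}{P(E \mid \neg D, S)}
\times \frac{P(D \mid S)}{P(\neg D \mid S)}
$$

The second factor on the right comes from the whole-population statistics. The first is the likelihood ratio, which requires knowing how common the ethnicity is among people with these symptoms, separately for those with and without the disease. Such figures are rarely what studies report.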

## See Also

Blackett Review, Avoiding Black Swans