Committee on Mathematical Foundations of Verification, Validation, and Uncertainty Quantification, Board on Mathematical Sciences and Their Applications, *Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification*, (US) NRC, 2012

The team were tasked to “examine practices for VVUQ of large-scale computational simulations”. Such simulations are complicated rather than complex, so the title seems misleading in using the term ‘complex’. The summary reads as a reasonable consensus view of the state of the art in its focus area, and of research directions, with no surprises. But the main body does provide some ammunition for those who seek to emphasise deeper uncertainty issues, considering mathematics beyond computation.

## Summary

### Principles

Highlighted principles include:

- A validation assessment is well defined only in terms of specified quantities of interest (QOIs) and the accuracy needed for the intended use of the model.
- A validation assessment provides direct information about model accuracy only in the domain of applicability that is “covered” by the physical observations employed in the assessment.
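
As a rough illustration of these two principles, the sketch below treats a validation assessment as a check of a single QOI against a stated tolerance, and reports the range of inputs that the observations actually cover. The function name `assess_validation`, the toy linear model, and the measurements are all hypothetical; none of them comes from the report.

```python
from typing import Callable, Sequence, Tuple


def assess_validation(
    model: Callable[[float], float],
    inputs: Sequence[float],
    observed_qoi: Sequence[float],
    tolerance: float,
) -> Tuple[bool, Tuple[float, float]]:
    """Check a model against observations of a single QOI.

    Returns (accurate_enough, covered_range). The verdict only speaks for
    inputs inside covered_range; it says nothing about inputs outside it.
    """
    if len(inputs) != len(observed_qoi) or not inputs:
        raise ValueError("need matching, non-empty inputs and observations")
    # Worst absolute disagreement between model and observation.
    worst_error = max(abs(model(x) - y) for x, y in zip(inputs, observed_qoi))
    # The part of the input space that the observations "cover".
    covered_range = (min(inputs), max(inputs))
    return worst_error <= tolerance, covered_range


# Hypothetical example: a linear model checked against made-up measurements.
ok, covered = assess_validation(
    model=lambda x: 2.0 * x,
    inputs=[1.0, 2.0, 3.0],
    observed_qoi=[2.1, 3.9, 6.2],
    tolerance=0.25,
)
print(ok, covered)  # prints: True (1.0, 3.0)
```

Tightening the tolerance, or choosing a different QOI, can change the verdict, which is the point of the first principle. A query at an input of 10 lies outside the covered range, so this check offers no direct evidence about the model there.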

### Comments

The notion of a model here would be something like ‘all swans are white’. The first principle suggests that we need tolerance for what is regarded as ‘white’. The second principle suggests that if we have only considered British swans, we should restrict the domain of applicability of the model.

In effect, the model is being set within a justification, much as the conclusion of a mathematical theorem is linked to axioms by the proof. This is contrary to much school science practice, which simply teaches models: we need to understand the (empirical) theory. Typically, when we read ‘all swans are white’ we should understand that it really only means ‘all British swans are white-ish’.

Swans are relatively simple. The only problem is our limited observations of them. Economics, for example, is more complex. The quantities of interest are controversial, as are the relevant observations. Such complex situations seem beyond the intended scope of this report.

### Research Topics

- Development of methods that help to define the “domain of applicability” of a model, including methods that help quantify the notions of near neighbors, interpolative predictions, and extrapolative predictions.
- Development of methods to assess model discrepancy and other sources of uncertainty in the case of rare events, especially when validation data do not include such events.
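
One crude way to make ‘near neighbors’, interpolative predictions and extrapolative predictions concrete is sketched below. It is an assumption-laden illustration, not a method from the report: a query is labelled interpolative if it lies inside the axis-aligned bounding box of the validation inputs, and the distance to the nearest validation point is reported alongside. The name `classify_prediction` and the example points are hypothetical.

```python
import math
from typing import Sequence, Tuple


def classify_prediction(
    validation_inputs: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[str, float]:
    """Label a query point and report its nearest-neighbor distance.

    Assumption: "interpolative" means inside the bounding box of the
    validation inputs. This is a simplification; a point can sit inside
    the box and still be far from every observation.
    """
    if not validation_inputs:
        raise ValueError("need at least one validation input")
    inside = all(
        min(p[d] for p in validation_inputs)
        <= query[d]
        <= max(p[d] for p in validation_inputs)
        for d in range(len(query))
    )
    # Euclidean distance to the closest validation point.
    nearest = min(math.dist(p, query) for p in validation_inputs)
    return ("interpolative" if inside else "extrapolative"), nearest


# Hypothetical validation points at the corners of a unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(classify_prediction(corners, (0.5, 0.5)))
print(classify_prediction(corners, (3.0, 0.5)))
```

The bounding box and the Euclidean distance are both arbitrary choices here. Deciding which notion of closeness matters for a given model is exactly the kind of question the research topic leaves open.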

### Comments

These topics are easier if one has an overarching theory of which the model is a specialisation, whose parameters are to be determined. In such cases the ‘domain of applicability’ could be based on an established classifying schema, and uncertainty could be probabilistic, drawing on established probabilistic models. The situation is more challenging, with broader uncertainties, where there is no such ruling theory, as in climate science.

### Recommendations

- An effective VVUQ [verification, validation and uncertainty quantification] education should encourage students to confront and reflect on the ways that knowledge is acquired, used, and updated.
- The elements of probabilistic thinking, physical-systems modeling, and numerical methods and computing should become standard parts of the respective core curricula for scientists, engineers, and statisticians.

### Comments

Most engineers and statisticians will be working pragmatically, assuming some ruling theory that guides their work. This report seems most suitable for them. Ideally, scientists acting as science advisors would also be working in such a way. However, surprises do happen, and scientists working on science should be actively doubting any supposed ruling theory. Thus it is sometimes vital to know the difference between a situation where an agreed theory should be regarded as, for example, ‘fit for government work’, and where it is not, particularly where extremes of complexity or uncertainty call for a more principled approach. In such cases it is not obvious that uncertainty can be quantified. For example, how does one put a number on ‘all swans are white’ when one has not been outside Britain?

As well as using mathematics to work out the implications of a ruling theory in a particular case, one needs to be able to use different mathematics to work out the implications of a particular case for theory.

## Introduction

This cites Savage, but in his terms it is implicitly addressing complicated but ‘small’ worlds rather than more complex ‘large’ ones, such as that of interest to climate science.

## Sources of Uncertainty and Error

The general issue is whether formal validation of models of complex systems is actually feasible. This issue is both philosophical and practical and is discussed in greater depth in, for example, McWilliams (2007), Oreskes et al. (1994), and Stainforth et al. (2007).

…

There is a need to make decisions … before a complete UQ analysis will be available. … This does not mean that UQ can be ignored but rather that decisions need to be made in the face of only partial knowledge of the uncertainties involved. The “science” of these kinds of decisions is still evolving, and the various versions of decision analysis are certainly relevant.

### Comment

It seems that not all uncertainty is quantifiable, and that one needs to be able to make decisions in the face of such uncertainties.

In the case of ‘all swans are white’ the uncertainty arises because we have only looked in Britain. It is clear what can be done about this, even if we have no basis for assigning a number.

In the case of economics, even if we have a dominant theory we may be uncertain because, for example, it has only been validated against the British economy for the last 10 years. We might not be able to put a number on the uncertainty, but it might be wise to look for more general theories, covering a broader range of countries and times, and then see how our dominant theory is situated within the broader theory. This might give us more confidence in some conclusions from the theory, even if we cannot assign a number. (One also needs to consider alternative theories.)

## Model Validation and Prediction

### Comparison with reality

*In simple settings* validation could be accomplished by directly comparing model results to physical measurements for the QOI …

### Findings

- Mathematical considerations alone cannot address the appropriateness of a model prediction in a new, untested setting. Quantifying uncertainties and assessing their reliability for a prediction require both statistical and subject-matter reasoning.
- The idea of a *domain of applicability* is helpful for communicating the conditions for which predictions (with uncertainty) can be trusted. However, the mathematical foundations have not been established for defining such a domain or its boundaries.

### Comment

I take the view that a situation that can be treated classically is not complex, only at most complicated. Complex situations may always contain elements that are surprising to us. Hence bullet 1 applies to complex situations too. The responsibility for dealing with complexities seems to be shifted from the mathematicians to the subject matter experts (SMEs). But if one is dealing with a new ‘setting’ one is dealing with dynamic complexity, of the kind that would be a crisis if the potential impact were serious. In such situations it may not be obvious which subject is the relevant one, or there may be more than one vital subject. SMEs may be unused to coping with complexity or with collaboration under crisis or near-crisis conditions. For example, climate science might need not only climatologists but also experts in dealing with uncertainty.

My view is that sometimes one can only assess the relevance and reliability of a model in a particular situation, that one needs particular experts in this, and that mathematics can help – but it is a different mathematics.

## Next Steps in Practice, Research, and Education for Verification, Validation, and Uncertainty Quantification

For validation, “domain of applicability” is recognized as an important concept, but how one defines this domain remains an open question. For predictions, characterizing how a model differs from reality, particularly in extrapolative regimes, is a pressing need. … advances in linking a model to reality will likely broaden the domain of applicability and improve confidence in extrapolative prediction.

### Comment

As Keynes pointed out, in some complex situations one can only meaningfully predict in the short term. Thus in early 2008 economic predictions were not in error, as short-term predictions. It is just that the uncertain long term arrived. What is needed, therefore, is some long-term forecasting ability. This cannot be a prediction, in the sense of having a probability distribution, but it might be an effective anticipation, just as one might have anticipated that there were non-white swans in foreign parts. Different mathematics is needed.

## My Summary

The report focusses on the complicatedness of the models. But I find it hard to think of a situation where one needs a complicated model and the actual situation is not complex. Usually, for example, the situation is ‘reflexive’ because the model is going to be used to inform interaction with the world, which will change it. Thus, the problem as I see it is how to model a situation that is uncertain and possibly complex. While the report does give some pointers it does not develop them.

The common sense view of modelling is that a model is based on observations. In fact – as the report notes – it tends to be based on observations plus assumptions, which are refined into a model, often iteratively. Yet the report seems to suppose that one’s initial assumptions will be ‘true’. One can only say that the model fits one’s observations, not that it will continue to fit all possible observations, unless one can be sure that the situation is very constrained. That is, one cannot say that a scientific theory is unconditionally and absolutely true, but only ‘true to’ one’s observations and assumptions.

The report is thus mainly for those who have a mature set of assumptions which they wish to refine, not those who expect the unexpected. It does briefly mention ‘rare events’, but it sees these as outliers on a probability distribution whereas I would see these more as challenging assumptions.

## See Also

The better nature blog provides a view of science that is complementary to this report.

My notes on science and uncertainty.

*Dave Marsay*