Biases and uncertainty

Rational decision theory is often used as the norm against which human, organisational or economic decision-making is judged, but it takes account only of probability, not of Knightian uncertainty. We should consider the extent to which so-called ‘biases’ could be attributed to uncertainty rather than error.

Hyperbolic discounting

Exponential discounting would be rational. But this assumes that the current epoch lasts forever. If we are uncertain of this, then the discount rate should be diminished, as in hyperbolic discounting (Sozou, P. D. (1998). “On hyperbolic discounting and uncertain hazard rates”. Proceedings of the Royal Society B: Biological Sciences 265: 2015–2020).
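Sozou’s point can be checked numerically: if the hazard rate r is unknown, with an exponential prior of mean k, the expected discount factor E[exp(−rt)] works out as the hyperbolic 1/(1 + kt). A minimal sketch (the mean hazard rate k = 0.1 is assumed purely for illustration):

```python
import math
import random

random.seed(0)

def mixture_discount(t, mean_rate, n=200_000):
    """Monte-Carlo average of exp(-r*t) over an exponential
    prior on the hazard rate r, with mean mean_rate."""
    total = 0.0
    for _ in range(n):
        r = random.expovariate(1.0 / mean_rate)  # sample a hazard rate
        total += math.exp(-r * t)
    return total / n

k = 0.1  # assumed mean hazard rate
for t in (1, 5, 10, 20):
    print(t, round(mixture_discount(t, k), 3), round(1 / (1 + k * t), 3))
```

The two columns agree: uncertainty about the hazard rate, not irrationality, is enough to produce the hyperbolic form.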

Normalcy bias

This is where people ignore evidence that the context has changed, a form of status quo bias. Here the rational norm is applied with the hindsight that the situation has changed, yet the rational norm seems to encourage normalcy bias.

Conjunction fallacy

The conjunction fallacy is where the conjunction ‘A and B’ is considered more probable than A alone. It typically occurs where A is vague whereas ‘A and B’ is relatively precise and seems to explain the evidence. If one is uncertain about the context (as in many psychology questions) then it is more appropriate to try to estimate the likelihood P(E|H) rather than the probability P(H|E), which depends on the priors and hence context.

What appears to be a fallacy may be evidence that people tend to think in terms of likelihoods rather than probabilities. (Even psychology papers can confuse them, so this may be credible.)
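The distinction can be made concrete with a toy model (all numbers invented for illustration): the conjunction is never more probable than A alone, yet it can have the higher likelihood, i.e. it can better explain the evidence.

```python
# H1 = "Linda is a bank teller"; H2 = "Linda is a bank teller AND a
# feminist", so H2 implies H1. All probabilities below are invented.
p_teller_fem = 0.02
p_teller_nonfem = 0.08

p_H1 = p_teller_fem + p_teller_nonfem   # P(H1) = 0.10
p_H2 = p_teller_fem                     # P(H2) = 0.02

# Assumed likelihoods of the evidence E (the 'Linda' description):
lik_fem, lik_nonfem = 0.90, 0.05

lik_H2 = lik_fem                        # P(E|H2)
lik_H1 = (lik_fem * p_teller_fem + lik_nonfem * p_teller_nonfem) / p_H1

print(p_H2 <= p_H1)     # the conjunction is never MORE probable...
print(lik_H2 > lik_H1)  # ...but it can be the better explanation
```

Anyone judging by likelihoods P(E|H) rather than probabilities P(H|E) would rank the conjunction higher, which is exactly the pattern labelled a ‘fallacy’.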

Reference Class forecasting

Reference class forecasting is a method for reducing reliance on probability estimates. It ensures that forecasts are made by comparison with real data, and seeks to capture relevant parameters. However, it may still be necessary to take account of potential novel contexts, to avoid shocks.

See Also

Knightian uncertainty, Induction, The financial crash.

Dave Marsay

Induction and epochs


Induction is the basis of all empirical knowledge. Informally, if something has never or always been the case, one expects it to continue to be never or always the case: any change would mark a change in epoch. 

Mathematical Induction

Mathematical induction concerns mathematical statements, not empirical knowledge.

Let S(n) denote a statement dependent on an integer variable, n. If
    S(k) holds for some integer k, and
    for all integers n ≥ k, S(n) implies S(n+1),
then S(i) holds for all integers i ≥ k.

This, and variants on it, is often used to prove theorems for all integers. It motivates informal induction.
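As a worked example of the schema above, the standard proof that the sum of the first n positive integers is n(n+1)/2:

```latex
S(n):\quad 1 + 2 + \dots + n = \tfrac{n(n+1)}{2}.
\text{Base case } (k = 1):\quad 1 = \tfrac{1 \cdot 2}{2}, \text{ so } S(1) \text{ holds.}
\text{Inductive step: assuming } S(n),
\quad 1 + 2 + \dots + n + (n+1) = \tfrac{n(n+1)}{2} + (n+1) = \tfrac{(n+1)(n+2)}{2},
\text{which is } S(n+1). \text{ Hence } S(n) \text{ holds for all integers } n \ge 1.
```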

Statistical Induction

According to the law of large numbers, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed. Thus:

For two or more sufficiently large sets of results obtained by random sampling from the same distribution, the averages should be close, and will tend to become closer as more trials are performed.

In particular, suppose one set of results, R1, has been obtained and another, R2, will be obtained. Using the language of probability theory, if C() is a condition on a set of results, then

P(C(R2)) = p(C(R1)), where P() is the probability and p() is the proportion.

Alternatively, p() could be given as a hypothesis and tested against the data. Note that for any given quantity of data, rare events cannot be excluded, and so one can never be sure that any p(x) is ‘objectively’ very small. That is, the ‘closeness’ in the law of large numbers always has some non-zero tolerance.
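This can be illustrated directly (the distribution and the condition are arbitrary choices): two result sets sampled from the same stable process give proportions that tend to converge as the sample size grows.

```python
import random

random.seed(1)

def proportion(results, condition):
    """p(C(R)): the proportion of results satisfying the condition."""
    return sum(1 for x in results if condition(x)) / len(results)

cond = lambda x: x > 0.7   # an arbitrary condition C()
for n in (100, 10_000, 1_000_000):
    r1 = [random.random() for _ in range(n)]   # R1: observed results
    r2 = [random.random() for _ in range(n)]   # R2: 'future' results
    print(n, abs(proportion(r1, cond) - proportion(r2, cond)))
```

The gap tends to shrink with n but is never provably zero for finite data, which is the non-zero tolerance noted above.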

A key assumption of statistical induction is that there exists a stable ‘expectation’. This is only true within some epoch where the trials depend on that epoch, and not on any sub-epochs. In effect, the limits on an epoch are determined by the limits on the law of large numbers.

Empirical Induction

In practice we don’t always have the conditions required for straightforward statistics, but we can approximate. Using the notation as above, then:

P(C(R2)) = p(C(R1)),

provided that R1, R2 are in the same epoch. That is, where:

  • The sampling was either unbiased, had the same bias in the two cases or at least was not conditional on anything that changed between the two cases.
  • For some basis of hypotheses, {H}, the conditional likelihoods P(data|H) are unchanged between the two cases.

Alternatively, we can let A = “same epoch” stand for the above assumptions and write

P(C(R2)|A) = p(C(R1)).

Induction on Hypotheses

Statistical induction only considers proportions.  The other main case is where we have hypotheses (e.g. models or theories) that fit the past data. If these are static then we may expect some of the hypotheses that fit to be ‘true’ and hence to continue to fit. That is:

If, for all i in some index set I, hypotheses Hi fit the current data (R1), then, for some subset J of I, one expects by default that for all j in J, Hj will continue to fit (for future data, R2).

As above, there is an assumption that the epoch hasn’t changed.

Often we are only interested in some of the parameters of a hypothesis, such as a location. Even if all the theories that fit the current data virtually agree on the current value of the parameters of interest, there may be radically different possibilities for their future values, perhaps forming a multi-modal distribution. (For example, if we observe an aircraft entering our airspace, we may be sure about where it is and how fast it is flying, but have many possible destinations.)
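The aircraft example can be sketched as follows (speed and candidate headings are invented): every track hypothesis agrees on the current state, yet the 30-minute extrapolations form a multi-modal spread.

```python
import math

current = (0.0, 0.0)       # agreed present position
speed = 8.0                # agreed speed
headings = [30, 35, 150]   # candidate destinations imply different headings

def position(heading_deg, minutes):
    """Straight-line extrapolation of one track hypothesis."""
    rad = math.radians(heading_deg)
    return (round(current[0] + speed * minutes * math.cos(rad), 1),
            round(current[1] + speed * minutes * math.sin(rad), 1))

now = {h: position(h, 0) for h in headings}     # all hypotheses agree
later = {h: position(h, 30) for h in headings}  # radically different futures
print(now)
print(later)
```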

Pragmatic induction

One common form of pragmatism is where one has an ‘established’ model or belief which one goes on using (unquestioningly) unless and until it is falsified. By default the assumption A, above, is taken to be true. Thus one has

P(C(R2)) = p(C(R1)),

unless there is definite evidence that P() will have changed, e.g. a biased sample or an epochal change of the underlying random process. In effect, pragmatism assumes that the current epoch will extend indefinitely.

Rationalizing induction

The difference between statistical and pragmatic induction is that the former makes explicit the assumptions of the latter. If one has a pragmatic claim, P(C(R2)) = p(C(R1)), one can in effect recover the rigour of the statistical approach by noting when, where and how the data supporting the estimate were sampled, compared with when, where and how the probability estimate is to be applied. (Thus it might be pragmatic – in this pedantic sense – to suppose that if our radar fails temporarily all airplanes will have continued flying straight and level, but not necessarily sensible.)


When someone, Alf, says ‘all swans are white’ and a foreigner, Willem, says that they have seen black swans, we should consider whether Alf’s statement is empirical or not, and if so what its support is. Possibly:

    • Alf defines swans in such a way that they must be white: they are committed to calling a similar black creature something else. Perhaps this is a widely accepted definition that Willem is unaware of.
    • Alf has only seen British swans, and we should interpret their statement as ‘British swans are white’.
    • Alf believes that swans are white and so only samples large white birds to check that they are swans.
    • Alf genuinely and reasonably believes that the statement ‘all swans are white’ has been subjected to the widest scrutiny, but Willem has just returned from a new-found continent.

Even if Alf’s belief was soundly based on pragmatic induction, it would be prudent for him to revise his opinion, since his induction – of whatever kind – was clearly based on too small an epoch.


We can split conventional induction into three parts:

  1. Modelling the data.
  2. Extrapolating, using the models.
  3. Considering predictions based on the extrapolations.

The final step is usually implicit in induction: it is usually supposed that one should always take an extrapolation to be a prediction. But there are exceptions. (Suppose that two airplanes are flying straight towards each other. A candidate prediction would be that they would pass infeasibly close, breaching the aviation rules that are supposed to govern the airspace. Hence we anticipate the end of the current ‘straight and level’ epoch and take recourse to a ‘higher’ epoch, in this case the pilots or air-traffic controllers. If they follow set rules of the road (e.g. planes flying out give way) then we may be able to continue extrapolating within the higher epoch, but here we only consider extrapolation within a given epoch.)

Thus we might reasonably imagine a process somewhat like:

  1. Model the data.
  2. Extrapolate, using the models.
  3. Establish predictions:
    • If the extrapolations all agree: take the common extrapolation to be the candidate prediction.
    • Otherwise: make a possibilistic candidate prediction; the previous ‘state’ has ‘set up the conditions’ for the possibilities.
  4. Establish credibility:
    1. If the candidate predictions are consistent with the epoch, then they are credible.
    2. If not, note lack of credibility.

In many cases a natural ‘null hypothesis’ is that many elements of a hypothesis are independent, so that they can be extrapolated separately. There are then ‘holistic’ constraints that need to be applied over the whole. This can be done as part of the credibility check. (For example, airplanes normally fly independently but should not fly too close.)
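The process above can be sketched in code. Everything here (the Trend class and its fits/extrapolate interface) is invented for illustration; the function simply walks the four steps for a set of candidate models.

```python
def induct(models, data, horizon, credible_in_epoch):
    fits = [m for m in models if m.fits(data)]              # 1. model the data
    futures = [m.extrapolate(data, horizon) for m in fits]  # 2. extrapolate
    candidates = set(futures)   # 3. one agreed prediction, or a
                                #    possibilistic set if the models disagree
    return {c for c in candidates if credible_in_epoch(c)}  # 4. credibility

class Trend:
    """A toy model: the data increases by a fixed step."""
    def __init__(self, step):
        self.step = step
    def fits(self, data):
        return all(b - a == self.step for a, b in zip(data, data[1:]))
    def extrapolate(self, data, horizon):
        return data[-1] + self.step * horizon

models = [Trend(1), Trend(2)]
print(induct(models, [0, 1, 2, 3], 2, lambda v: v < 100))  # only Trend(1) fits
print(induct(models, [0], 2, lambda v: v < 100))           # both fit: multi-modal
```

With ample data only one model fits and a single prediction results; with scant data both models fit and the outcome is a possibilistic set.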

We can fail to identify a credible hypothesis either because we have not considered a wide enough range of hypotheses or because the epoch has ended. The epoch may also end without our noticing, leading to a seemingly credible prediction that is actually based on a false premise. We can potentially deal with all these problems by considering a broader range of hypotheses and data. Induction is only as good as the data gathering and theorising that supports it. 


The modelling process may be complicated in two ways:

  • We may need to derive useful categories so that we have enough data in each category.
  • We may need to split the data into epochs, with different statistics for each.

We need to have enough data in each partition to be statistically meaningful, while being reasonably sure that data in the same partition are all alike in terms of transition probabilities. If the parts are too large we can get averaged results, which need to be treated accordingly.

Induction and types of complexity

We can use induction to derive a typology for complexity:

  • simple unconditional: the model is given: just apply it
  • simple conditional: check the model and apply it
  • singly complicated: analyse the data in a single epoch against given categories to derive a model, apply it.
  • doubly complicated: analyse the data into novel categories or epochs to derive a model, apply it.
  • complex: where the data being observed has a reflexive relationship with any predictions.

The Cynefin framework  gives a simple – complicated – complex – chaotic sense-making typology that is consistent with this, save that it distinguishes between:

  • complex: we can probe and make sense
  • chaotic: we must act first to force the situation to ‘make sense’.

We cannot make this distinction yet, as we are not sure what ‘makes sense’ would mean. It may be that one can only know that one has made sense when and if one has had a successful intervention, which will often mean that ‘making sense’ is more of a continuing activity than a state to be achieved. But inadequate theorising and data would clearly lead to chaos, and we might initially act to consider more theories and to gather more data. But it is not clear how we would know that we had done enough.

See also

Statistics, pragmatic, Cynefin, mathematics.

David Marsay

Regulation and epochs

Conventional regulation aims at maintaining objective criteria, as in Conant and Ashby: regulators must have or form a model or models of their environment. But if future epochs are unpredictable, or the regulators are set up for the short term (e.g. being post-hoc adaptive), then the models will not be appropriate for the long term, leading to a loss of regulation, at least until a new effective model can be formed.

Thus regulation based only on objective criteria is not sustainable in the long term. Loss of regulation can occur, for example, due to innovation by the system being regulated. More sustainable regulation (in the sense of preserving viability) might be achievable by taking a broader view of the system ‘as a whole’, perhaps engaging with it. For example, a ‘higher’ (strategic) regulator might monitor the overall situation, redirect the ‘lower’ (tactical) regulators and keep the lower regulators safe. The operation of these regulators would tend to correspond to Whitehead’s epochs (regulators would impose different rules, and different rules would call for different regulators).

See also

Stafford Beer.

David Marsay

Synthetic Modelling of Uncertain Temporal Systems


SMUTS is a computer-based ‘exploratorium’, to aid the synthetic modelling of uncertain temporal systems. I had previously worked on sense-making systems based on the ideas of Good, Turing and Keynes, and was asked to get involved in a study on the potential impact of any Y2K bugs, starting November 1999. Not having a suitable agreed model, we needed a generic modelling system, able to at least emulate the main features of all the part models. I had been involved in conflict resolution, where avoiding cultural biases and being able to meld different models was often key, and JC Smuts’ Holism and Evolution seemed a sound if hand-wavy approach. SMUTS is essentially a mathematical interpretation of Smuts. I was later able to validate it when I found from the Smuts’ Papers that Whitehead, Smuts and Keynes regarded their work as highly complementary. SMUTS is actually closer to Whitehead than Smuts.


An actual system is a part of the actual world that is largely self-contained, with inputs and outputs but with no significant external feedback loops. What counts as significant is a judgement. Any external feedback loop will typically have some effect, but we may not regard it as significant if we can be sure that any effects will build up too slowly. It is a matter of analysis on larger systems to determine what might be considered smaller systems. Thus plankton are probably not a part of the weather system but may be a part of the climate.

The term system may also be used for a model of a system, but here we mean an actual system.


We are interested in how systems change in time, or ‘evolve’. Such changes include all types of evolution, adaptation, learning and desperation, and hence are much broader than the usual ‘mathematical models’.


Keynes’ notion of uncertainty is essentially Knightian uncertainty, but with more mathematical underpinning. It thus extends more familiar notions of probability as ‘just a number’. As Smuts emphasises, systems of interest can display a much richer variety of behaviours than typical probabilistic systems. Keynes has detailed the consequences for economics at length.


Pragmatically, one develops a single model which one exploits until it fails. But for complex systems no single model can ever be adequate in the long run and, as Keynes and Smuts emphasised, it could be much better to recognize that any conventional model would be uncertain. A key part of the previous sense-making work was the multi-modelling concept of maintaining the broadest range of credible models, with some more precise and others more robust, and then hedging across them, following Keynes et al.


In conflict resolution it may be enough to simply show the different models of the different sides. But equally one may need to synthesize them, to understand the relationships between them and scope for ‘rationalization’. In sense making this is essential to the efficient and effective use of data, otherwise one can have a ‘combinatorial explosion’.

Test cases

To set SMUTS going, it was developed to emulate some familiar test cases.

  • Simple emergence. (From random to a monopoly.)
  • Symbiosis. (Emergence of two mutually supporting behaviours.)
  • Indeterminacy. (Emergence of co-existing behaviours where the proportions are indeterminate.)
  • Turing patterns. (Groups of mutually supporting dynamic behaviours.)
  • Forest fires. (The gold standard in epidemiology, thoroughly researched.)

In addition we had an example to show how the relationships between extremists and moderates were key to urban conflicts.

The aim in all of these was not to be as accurate as the standard methods or to provide predictions, but to demonstrate SMUTS’ usefulness in identifying the key factors and behaviours. 


A key requirement was to be able to accommodate any relevant measure or sense-making aid, so that users could literally see what effects were consistent from run to run, what weren’t, and how this varied across cases. The initial phase had a range of standard measures, plus Shannon entropy, as a measure of diversity.

Core dynamics

Everything emerged from an interactional model. One specified the extent to which one behaviour would support or inhibit nearby behaviours of various types. By default behaviours were then randomized across an agora and the relationships applied. Behaviours might then change in an attempt to be more supported. The fullest range of variations on this was supported, including a range of update rules, strategies and learning. Wherever possible these were implemented as a continuous range rather than separate cases, and all combinations were allowed.
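This is not the actual SMUTS code, but the flavour of such an interactional model can be caricatured by a forest-fire automaton on a small agora (all parameters invented): fire inhibits nearby trees, and empty ground regrows.

```python
import random

random.seed(2)

EMPTY, TREE, FIRE = 0, 1, 2
N, P_GROW, P_LIGHTNING = 20, 0.05, 0.001   # invented parameters

grid = [[EMPTY] * N for _ in range(N)]

def neighbours(i, j):
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        yield (i + di) % N, (j + dj) % N   # wrap-around agora

def step(grid):
    new = [row[:] for row in grid]
    for i in range(N):
        for j in range(N):
            if grid[i][j] == FIRE:
                new[i][j] = EMPTY                      # fire burns out
            elif grid[i][j] == TREE:
                if any(grid[a][b] == FIRE for a, b in neighbours(i, j)):
                    new[i][j] = FIRE                   # fire inhibits trees
                elif random.random() < P_LIGHTNING:
                    new[i][j] = FIRE                   # rare ignition
            elif random.random() < P_GROW:
                new[i][j] = TREE                       # ground regrows
    return new

for _ in range(200):
    grid = step(grid)
tree_fraction = sum(row.count(TREE) for row in grid) / N ** 2
print("tree fraction after 200 steps:", tree_fraction)
```

Suppressing all ignitions in models of this kind tends to let the tree density creep towards one, so that the eventual fire takes everything: the fragility discussed for the forest-fire example.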


SMUTS enables one to explore complex dynamic systems

SMUTS has a range of facilities for creating, emulating and visualising systems.

By default there are four quadrants. The bottom right illustrates the inter-relationships (e.g., fire inhibits nearby trees, trees support nearby trees). The top right shows the behaviours spread over the agora (in this case ground, trees and fire). The bottom left shows  a time-history of one measure against another, in this case entropy versus value of trees. The top-left allows one to keep an eye on multiple displays, forming an over-arching view. In this example, as in many others, attempting to get maximum value (e.g. by building fire breaks or putting out all fires) leads to a very fragile system which may last a long time but which will completely burn out when it does go. If one allows fires to run their course, one typically gets an equilibrium in which there are frequent small fires which keep the undergrowth down so that there are never any large fires.


It was generally possible to emulate text-book models to show realistic short-run behaviours of systems. Long term, simpler systems tended to show behaviours like other emulations, and unlike real systems. Introducing some degree of evolution, adaptation or learning all tended to produce markedly more realistic behaviours: the details didn’t matter. Having behaviours that took account of uncertainty and hedged also had a similar effect.


SMUTS had a recognized positive influence, for example on the first fuel crisis, but the main impact has been in validating the ideas of Smuts et al.

Dave Marsay 

Pragmatism and mathematics

The dichotomy

Mathematics may be considered in two parts: that which is a tool supporting other disciplines in their modelling, which is considered pragmatic; and that which seeks to test underlying assumptions in methods and models, which is not so well appreciated.

Pragmatism and sustainability

Setting mathematics to one side for a moment, consider two courses of action, S and P, with notional actual benefits as shown.

(Figure: Sure and steady may beat Boom and bust; boom and bust can be better in the short term, but worse in the long.)

‘Boom and bust’ arises when (as is usually the case) the strategy is ‘closed loop’, with activity being adjusted according to outcomes (e.g. Ashby). Even a sustainable strategy would be subject to statistical effects and hence cyclic variations, but these will be small compared with the crashes that can arise when the strategy is based on some wrong assumption (Conant-Ashby). If something happens that violates that assumption then one can expect performance to crash until the defect is remedied, when performance can again increase. In this sense, the boom-bust strategy is pragmatic.

If one has early warnings of potential crashes then it can also be pragmatic to incorporate the indicators into the model, thus switching to a safer strategy when things get risky. But, to be pragmatic, the model has to be based on earlier experience, including earlier crashes. Thus, pragmatically, one can avoid crashes that have similar causes to the old ones, but not novel crashes. This is a problem when one is in a complex situation, in which novelty is being continually generated. Indeed, if you are continually updating your model and ‘the environment’ is affected by your actions and the environment can innovate, then one is engaged in cyclic co-innovation and hence co-evolution. This is contrary to an implicit assumption of pragmatism, which seems (to me) to be that one has a fixed ‘external world’ that one is discovering, and hence one expects the process of learning to converge onto ‘the truth’, so that surprises become ever less frequent. (From a Cybernetic perspective in a reflexive situation ‘improvements’ to our strategy are likely to be met by improvements in the environmental response, so that in effect we are fighting our own shadow and driven to ever faster performance until we hit some fundamental limit of the environment to respond.)

Rationalising Pragmatism

The graph shows actual benefits. It is commonplace to discount future benefits. Even if you knew exactly what the outcomes would be, a heavy enough discounting would make the discounted return from the boom-bust strategy preferable to the sustainable one, so that initially one would follow boom-bust. As the possible crash looms the sustainable strategy might look better. However, the Cybernetic view (Conant-Ashby) is that a sustainable strategy would depend on an ‘eyes open’ view of the situation, its possibilities and the validity of our assumptions, and almost certainly on a ‘multi-model’ approach. This is bound to be more expensive than the pragmatic approach (hence the lower yield) and in practice requires considerable investment, with considerable lead times, in areas that have no pragmatic value. Thus it may be too late to switch before the crash.
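The effect of discounting can be seen with notional benefit streams (all numbers invented): undiscounted, the sustainable strategy wins; discounted heavily enough, boom-bust looks better.

```python
# Notional annual benefits, invented for illustration.
boom_bust = [12] * 10 + [-60] + [5] * 9   # high yield, crash, slow recovery
steady = [7] * 20                         # lower but sustainable yield

def npv(stream, rate):
    """Net present value under exponential discounting."""
    return sum(b / (1 + rate) ** t for t, b in enumerate(stream))

for rate in (0.0, 0.15):
    print(rate, round(npv(boom_bust, rate), 1), round(npv(steady, rate), 1))
```

At a zero discount rate the steady stream is worth more; at 15% the early boom outweighs the discounted crash, so the accounting itself selects boom-bust.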

In complex situations we cannot say when the crash is due, but only that a ‘bubble’ is building up. Typically, a bubble could pop at any time, the consequences getting worse as time goes on. Thus the risk increases. Being unable to predict the timing of a crash makes it less likely that a switch can be made ‘pragmatically’ even as the risk is getting enormous.

There is often also an argument that ‘the future is uncertain’ and hence one should focus on the short-run. The counter to this is that while the specifics of the future may be unknowable, we can be sure that our current model is not perfect and hence that a crash will come. Hence, we can be sure that we will need all those tools which are essential to cope with uncertainty, which according to pragmatism we do not need.

Thus one can see that many of our accounting habits imply that we would not choose a sustainable strategy even if we had identified one.

The impact of mathematics

Many well-respected professionals in quite a few different complex domains have commented to me that if they are in trouble the addition of a mathematician often makes things worse. The financial crash brought similar views to the fore. How can we make sense of this? Is mathematics really dangerous?

In relatively straightforward engineering, there is sometimes a need for support from mathematicians who can take their models and apply them to complicated situations. In Keynes’ sense, there is rarely any significant reflexivity. Thus we do believe that there are some fundamental laws of aerodynamics which we get ever closer to as we push the bounds of aeronautics. Much of the ‘physical world’ seems completely unresponsive to how we think of it. Thus the scientists and engineers have tended to ‘own’ the interesting problems, leaving the mathematicians to work out the details.

For complex situations there appear to be implicit assumptions embedded in science,  engineering and management (e.g. pragmatism) that are contrary to the mathematics. There would thus seem to be a natural (but suppressed) role for mathematics in trying to identify and question those assumptions. Part of that questioning would be to help identify the implications of the current model in contrast to other credible models and theories. Some of this activity would be identical to what mathematicians do in ‘working out the details’, but the context would be quite different. For example, a mathematician who ‘worked out the details’ and made the current model ‘blow up’ would be welcomed and rewarded as contributing to that ever developing understanding of the actual situation ‘as a whole’ that is necessary to sustainability.


It is conventional, as in pragmatism, to seek single models that give at least probabilistic predictions. Keynes showed that this was not credible for economics, and it is not a safe assumption to make for any complex system. This is an area where practice seems to be ahead of current mainstream theory. A variant on pragmatism would be to have a fixed set of models that one only changes when necessary, but the objections here still stand. One should always be seeking to test one’s models, and look for more.

It follows from Conant-Ashby that a sustainable strategy is a modelling strategy and that there will still be bubbles, but they will be dealt with as soon as possible. It may be possible to engineer a ‘soft landing’, but if not then a prediction of Conant-Ashby is that the better the model, the better the performance. Thus one may have saw-tooth-like booms and busts, but finer and with a more consistent upward trend. In practice, we may not be able to choose between two or more predictive models, and if the available data does not support such a choice, we need to ‘hedge’. We can either think of this as hedging across different conventional models or as a single unconventional model (such as advocated by Keynes). Either way, we might reasonably call it a ‘multi-model’. The best strategy that we have identified, then, is to maintain as good as possible a multi-model, and ‘hedge’.

If we think of modelling in terms of assumptions then, like Keynes, we end up with a graph-like structure of models, not just the side-by-side alternatives of some multi-modelling approaches. We have a trade-off between models that are more precise (more assumptions) and those that are more robust (fewer assumptions), as well as ones that are simply different (different assumptions). If a model is violated we may be able to revert to a more general model that is still credible. Such general models in effect hedge over the range of credible assumptions. The trick is to invest in developing techniques for the more general case even when ‘everybody knows’ that the more specific case is true, and – if one is using the specific model – to invest in indicators that will show when its assumptions are particularly vulnerable, as when a bubble is over-extended.
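A minimal sketch of such hedging (the Linear class and all numbers are invented): every model still credible on the data is retained, and the forecast is the spread of their extrapolations rather than a single point.

```python
def hedged_forecast(models, data, horizon, tolerance):
    """Hedge across all models whose fit errors are within tolerance."""
    credible = [m for m in models if m.error(data) <= tolerance]
    if not credible:
        raise ValueError("no model remains credible: epoch change?")
    forecasts = [m.extrapolate(data, horizon) for m in credible]
    return min(forecasts), max(forecasts)   # an interval, not a point

class Linear:
    """Toy model: x(t) = x(0) + slope * t."""
    def __init__(self, slope):
        self.slope = slope
    def error(self, data):
        return max(abs(data[0] + self.slope * t - x)
                   for t, x in enumerate(data))
    def extrapolate(self, data, horizon):
        return data[0] + self.slope * (len(data) - 1 + horizon)

models = [Linear(s) for s in (0.9, 1.0, 1.1)]
print(hedged_forecast(models, [0, 1, 2], 10, tolerance=0.25))
```

Tightening the tolerance gives a more precise (more assumption-laden) forecast; loosening it gives a more robust one, which is the trade-off described above.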


A traditional approach is to have two separate strands of activity. One – ‘engineering’ – applies a given model (or multi-model), the other – ‘science’ – seeks to maintain the model. This seems to work in complicated settings. However, in complex reflexive settings:

  • The activity of the engineers needs to be understood by the scientists, so they need to work closely together.
  • The scientists need to experiment, and hence interfere with the work of the engineers, with possible misunderstandings and dire consequences.
  • In so far as the two groups are distinct, there is a need to encourage meaningful collaborations and manage the equities between their needs. (Neither ‘on top’ nor ‘on tap’.)

One can see that collaboration is inhibited if one group is pragmatic, the other not, and that pragmatism may win the day, leading to ‘pragmatic scientists’ and hence a corruption of what ought to be happening. (This is in addition to a consideration of the reward system.)

It may not be too fanciful to see signs of this in many areas.

The possible benefits of crashes

Churchill noted that economic crashes (of the cyclic kind) tended to clear out dead-wood and make people more realistic in their judgements, compared with the ‘good times’ when there would be a great deal of investment in things that turned out to be useless, or worse, when the crash came. From Keynes’ point of view much of the new investment in ‘good times’ are band-wagon investments, which cause the bubble which ought to be pricked.

We can take account of such views in two ways. Firstly if the apparent boom is false and the apparent crash is beneficial then we can take this into account in our measure of benefit, so that ‘boom’ becomes a period of flat or declining growth, the crash becomes a sudden awakening, which is beneficial, and the post-crash period becomes one of real growth. The problem then becomes how to avoid ‘bad’ band-wagons.

Either way, we want to identify and avoid fast growth that is ill-founded, i.e., based on an unsustainable assumption.


It is well recognized that mathematics is extremely powerful, and the price for that is that it is very sensitive: give it the slightest mis-direction and the result can be far from what was intended. Mathematics has made tremendous contributions to the complicated disciplines, and seems quite tame. In contrast, my experience is that for complex subjects the combination of mathematicians and numerate professionals is risky, and requires an enormous up-front investment in exchanging views, which sometimes can be nugatory. Perhaps there is some mis-direction? If so, where?

From my point of view, the problem often seems to be one of ‘scientism’. That is, certain types of method are characteristic of the complicated professions, and so people expect problems that are almost the same but complex to be addressed in the same sort of ways. Anything else would not be ‘proper science’. The mathematician, on the other hand, has the habit of rooting out assumptions, especially un-acknowledged ones, and seeking to validate them. If they cannot be validated, then he would rather not make them. (A cynic might say that the other professionals want to work with the tools they are familiar with, while the mathematician wants to develop an all-purpose tool so that he can move on to something more interesting.)

Numerous headline failures tend to reinforce the mathematician in his view that other professionals profess certain beliefs, while he is uniquely equipped to be productively cynical. But here I focus on one belief: pragmatism. Often, when pressed, people do not actually believe that their assumptions are literally true, only that they are ‘pragmatic’. Yet, as previously noted, the mixture of literal mathematics and conventional pragmatism is unsafe, and in my view mixtures of pragmatisms from different domains (without any ‘proper’ mathematics) seem to lie behind many headline problems. I have shown why pragmatism is inappropriate for solving complex problems and briefly sketched some reforms needed to make it ‘mathematics friendly’.

See Also

General approach, Sub-prime science, Weapons of Maths Destruction, Minsky moment.

Dave Marsay

Scientists of the subprime

‘Science of the subprime’ is currently available on BBC iPlayer.


Mathematicians and scientists were complicit in the crash. Financiers were ‘in thrall to mathematics’, with people like Stiglitz and Soros ‘lone voices in the wilderness’. The ‘low point’ was derivatives, which were ‘fiendishly complicated’, yet ‘mathematical models’ convinced people to trade in them.

The problem was that liberalisation led to an increase in connectedness, which was thought to be a good thing, but that this went too far and led to a decrease in diversity, which made the whole system very fragile, eventually crashing. This was presented by Lord May from an ecological perspective.
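Lord May’s ecological point can be sketched with his own classic random-matrix argument (the parameters below are illustrative, not taken from the programme): as connectedness rises past a threshold, almost every randomly assembled system becomes unstable.

```python
import numpy as np

def stability_fraction(S=50, sigma=0.5, C=0.05, trials=50, seed=0):
    """Fraction of random 'community' matrices that are stable (all
    eigenvalues have negative real part). Each of S components
    self-regulates (diagonal -1) and interacts with any other with
    probability C (the 'connectance'), with interaction strengths of
    standard deviation sigma."""
    rng = np.random.default_rng(seed)
    stable = 0
    for _ in range(trials):
        A = np.where(rng.random((S, S)) < C,
                     rng.normal(0.0, sigma, (S, S)), 0.0)
        np.fill_diagonal(A, -1.0)
        if np.linalg.eigvals(A).real.max() < 0:
            stable += 1
    return stable / trials

# May's criterion: stability is likely only while sigma*sqrt(S*C) < 1,
# so increasing connectedness eventually makes the whole system fragile.
sparse = stability_fraction(C=0.02)  # below the threshold: mostly stable
dense = stability_fraction(C=0.30)   # above it: mostly unstable
```

With these parameters `sparse` comes out near 1 and `dense` near 0: more connection, less diversity of outcome, and a far more fragile whole.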

Perhaps the most interesting part was that Lord May had tackled his lunching partner Mervyn King before the crash, and that in 2003 Andrew Haldane had independently come up with a ‘toy model’ that he found compelling, but which failed to gain traction.

After the crash, none of the mainstream mathematical models gave any insight into what had gone wrong. The problem was that the models concerned single-point failures, not systemic failures [my words]. Since then Haldane and May have published a paper in Nature showing that structure matters.

The new activities are to generate financial maps, much like weather maps and transport maps.

One problem is the loss of diversity: the solutions are:

  • To ensure that banks suffer the consequences of their actions [no ‘moral hazard’].
  • To ‘tilt the playing field’ against large players [the opposite of what is done now].

Another problem is the expectation of certainty: it must be recognized that sensible models can give insights but not reliable predictions.

In summary, the main story is that physics-based mathematics led decision-makers astray, and they wouldn’t be persuaded by Lord May or their own experts. There were also some comments on why this might be:-

Gillian Tett (FT) commented that decision makers needed predictions and the illusion of certainty from their models. A decision-maker commented on the tension between containing long-term risk and making a living in the short-run [but this was not developed]. Moreover, policy makers tend to search for data, models or theories to support their views: the problems are not due to the science as such, but to the application of science.


  • This broadly reflects and amplifies the Turner review, but I found it less appealing than Lord Turner’s recent INET interview.
  • Gordon Brown ‘at the top of the shop’ shared these concerns, but seemed unable to intervene until his immediate post-crash speech. This seems to raise some interesting issues, especially if the key point was about financial diversity.
  • The underlying problem seems to be that the policy-makers and decision-makers are pragmatic, in a particular sense.
  • Even if the complexity explanation for the crash is correct, it is not clear that this is the only way that crashes can happen, so that pragmatic regulation based on ‘carry on but fix the hole’ may not be effective.
  • The explanations and observations are reminiscent of Keynes. Stiglitz, Soros and Brown all commended Keynes pre-crash, and many have recognized his significance post-crash. Yet he is not mentioned. Before the 1929 crash he thought the sustained performance of the stock market remarkable, rather than taking it for granted. His theory was that it would remain stable just so long as everyone was able to trade and expected everyone else to be able to trade, and the central role of confidence has been recognized ever since. The programme ignored this, which seems odd as the behaviourists are also quite fashionable.
  • Keynes’ underpinning theory of probability [not mentioned] is linked to his tutor’s, Whitehead’s, process logic, which underpins much of modern science, including ecology. This makes the problem quite clear: if mathematicians and scientists are employed by banks and banks are run as ordinary commercial organisations then they will be focussing on the short-term. The long-term is simply not their responsibility. That is what governments are for (at least according to Locke). But the central machinery doesn’t seem to be up to it. We shouldn’t blame anyone not in government, academia or similar supposedly ‘for the common good’ organisations.
  • There were plenty of mathematicians, scientists and economists (not just Lord May) who understood the issues and were trying hard to get the message across, many of them civil servants etc. If we don’t understand how they failed we may find ourselves in the same position again. I think that in the 90s and 00s everything became more ‘commercial’ and hence short-term. Or can we just kick out the Physicists and bring on the Ecologists?

See Also

General approach

Dave Marsay

The new mathematics of voting?

The reform challenge

Electoral reform is a hot topic. If it is true (as I believe) that mathematics has something useful to say about almost every important topic (because most of our problems are due to misplaced rationalism) then mathematics ought to have something useful to say about first past the post (FPTP) versus the alternative vote (AV).

The arguments

Many mathematicians prefer AV because it has many seemingly desirable properties that FPTP lacks, but such considerations are notable by their absence from the debate. Today (18/2/2011) Cameron and Clegg have made their cases (FPTP and AV, respectively). Clegg makes lots of assertions, such as that under AV ‘every vote is worth the same’, but with no attempt at justification. The general standard of the debate is exemplified by “When it comes to our democracy, Britain shouldn’t have to settle for anyone’s second choice.” This sounds good, but what it seems to be saying is that if your first choice isn’t on the ballot then you shouldn’t vote. There may be some important principle here, but it needs to be explained.

On a more substantive point, Cameron leads with the argument that:

“[AV] won’t make every vote count. The reality is it will make some votes count more than others. There’s an inherent unfairness under AV.”

He provides an example that could hardly be clearer. It seems obvious that ‘arithmetically’, FPTP is better.

A mathematical approach

The Condorcet method is the typical mathematician’s method of choice for referenda, but for general elections there may be other concerns. We can’t expect mathematics to give a definitive answer to every question, but at least we should be able to distinguish a mathematical argument from common-sense reasoning dressed up as mathematics.

A mathematical approach typically starts by considering criteria, and then establishing which methods meet which criteria. Here neither side has any explicit criteria, but simply makes some observations and then says ‘isn’t this bad – so we can’t have this method’. But many things in life are compromises, so it is not enough to identify a single failing: one needs to think about which are the key criteria and trade-offs.

Historically, democracy was intended to avoid rule by a person or party who was the last choice of most people. FPTP does not satisfy this basic ‘majoritarian’ requirement (due to vote-splitting), whereas almost all the alternatives, including AV, do. Thus under FPTP it is not enough if most people are against the status quo: they have to agree on a replacement before they vote. Thus people end up voting ‘tactically’. This means that one can’t tell from the ballots what people’s actual preferences were. It could happen (especially before the Internet) that the media misled people into voting tactically (so as not to ‘waste’ their vote) when they would have preferred the outcome that they would have got by voting for their true preference. Mathematicians tend to prefer AV and PR because they are simpler in this respect.
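The vote-splitting point can be made concrete with a small sketch (the candidate names and vote counts are invented): a candidate whom a clear majority rank last wins under FPTP but not under AV.

```python
from collections import Counter

def fptp(ballots):
    """First past the post: count first preferences only."""
    return Counter(b[0] for b in ballots).most_common(1)[0][0]

def av(ballots):
    """Alternative Vote (instant runoff): eliminate the candidate with
    fewest first preferences and transfer those ballots to their next
    surviving preference, until someone holds a majority."""
    ballots = [list(b) for b in ballots]
    while True:
        tally = Counter(b[0] for b in ballots)
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > len(ballots):
            return leader
        loser = min(tally, key=tally.get)
        ballots = [[c for c in b if c != loser] for b in ballots]

# A united minority faces a split majority: 'Grey' has a 40% plurality
# but is ranked last by the other 60%.
ballots = ([['Grey', 'A', 'B']] * 40
         + [['A', 'B', 'Grey']] * 31
         + [['B', 'A', 'Grey']] * 29)

print(fptp(ballots))  # Grey — wins on the split opposition
print(av(ballots))    # A — B's transfers give A a 60% majority
```

Under FPTP the 60% who rank ‘Grey’ last must co-ordinate a tactical vote to stop him; under AV their sincere preferences do the work.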

Cameron argues against majoritarianism thus:

“It could mean that those who are courageous and brave and may not believe in or say things that everyone agrees with are pushed out of politics and those who are boring and the least controversial limping to victory. It could mean a Parliament of second choices. We wouldn’t accept this in any other walk of life.”

In reply, consider a situation where united oligarchs have 30% support but are hated by the other 70%. Perhaps using the media, they could ‘divide and rule’ so that no opposition party obtains more than 40% of the opposition vote, thus keeping the oligarchs in power indefinitely. Do we want an electoral system that could allow in ‘courageous and brave’ oligarchs?

Cameron makes a big point of:

“If the last election was under AV, there would be the chance, right now, that Gordon Brown would still be Prime Minister. Ok, the last election was not decisive in terms of who won. But it was certainly decisive in terms of who lost. And I think any system that keeps dead governments living on life support is a massive backward step for accountability and trust in our politics.”

One problem with many of the FPTP supporters is ‘anchoring’: they presumably see a government as ‘dead’ when it would lose under FPTP. But this means that the health of the government is heavily dependent on how many opposition parties there are, what their relative strengths are and how they are geographically spread. From a majoritarian viewpoint a government is ‘dead’ when a majority prefer some other party. Similarly, a sitting member might be thought dead when a majority prefer some other candidate. Under alternatives to FPTP like AV a dead candidate will never be elected. Under FPTP dead candidates are often elected: in recent decades the opposition vote has always been split, so that a candidate with only 30% support can win. There is no way to tell from the ballots when this has happened. If Cameron really wants to get rid of dead governments he will need something like AV+ or PR. Interestingly, this seems a very sensible requirement, yet I have not seen any analysis of it. At least if anyone came up with a suitable method and we had a referendum, AV would allow us to select the most preferred option, whereas under FPTP it would be difficult to predict what would happen: AV would seem to have the advantage.

A feature of a genuine logical argument is that it still seems sensible when you change the example, subject to the explicit assumptions. In this case, suppose that we had a referendum with three choices: FPTP, AV and PR. If we used FPTP to count the ballots I suspect that the non-FPTP vote would be split between AV and PR. But suppose that those who put PR first put AV second, and vice-versa. Then under any majoritarian method FPTP could only win by getting 50% of the votes. Otherwise it would be ranked last by most voters, and hence has to be rejected by our majoritarian criterion. Is this fair? The Cameron argument is that the FPTP ballots are only counted once, while the others are counted twice. But which is more important, respecting the wishes of the majority, or arithmetic?
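This referendum example can be checked in a few lines (the vote shares are hypothetical, as above: reformers split between AV and PR but rank each other second):

```python
def runoff(ballots):
    """AV-style count. ballots maps a preference tuple to a vote share;
    the trailing option is eliminated each round until one option has a
    strict majority of the continuing ballots. (Exhausted ballots are
    not handled — not needed for this example.)"""
    total = sum(ballots.values())
    live = {opt for prefs in ballots for opt in prefs}
    while True:
        tally = {opt: 0 for opt in live}
        for prefs, n in ballots.items():
            tally[next(opt for opt in prefs if opt in live)] += n
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > total:
            return leader
        live.remove(min(tally, key=tally.get))

# Hypothetical shares (percent): FPTP leads on first preferences,
# but 55% of voters rank it last.
ballots = {('FPTP',): 45, ('AV', 'PR'): 30, ('PR', 'AV'): 25}

plurality = max(ballots, key=ballots.get)[0]  # 'FPTP', on 45%
winner = runoff(ballots)                      # 'AV', after PR's 25% transfers
```

FPTP ‘wins’ a plurality count despite being the last choice of most voters; any majoritarian count picks AV.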

The ‘counting’ argument is also rather spurious, in that under AV we could simply mix up all the ballots before each round and do a full re-count, but ignoring the options that have been deleted. Each ballot would then be counted equally. Do we think it sensible to choose a method based on something that is a feature of how it is implemented, and not inherent in the method itself?

[Someone has since pointed out that while each ballot is counted as many times as there are rounds, for some ballots the first preference will be counted multiple times while for others a different preference could be counted in each round, as their earlier preferences are eliminated. If this is what the NO campaign is trying to say, it is saying it very badly. If we take this objection seriously then we must vote tactically. Is this subtle ‘inequality’ worse than tactical voting?]

The usual interpretation of ‘one person one vote’ admits AV, so this argument of the NO camp seems to be special pleading.

Mathematical speculations

In fact, there are significant differences between a referendum and a general election. The traditional argument in favour of FPTP is that it necessitates politicking and hence favours strong politicians, who can then apply their dark arts to our benefit in dealing with foreigners. PR (and perhaps AV) would lead to straightforward debates on the issues, risking us having leaders who are unpracticed at the dark arts. FPTP seems ideally suited in this role, but does leave the majority vulnerable to rule by the best at the dark arts, who may not always have the majority’s interests at heart.


Once one has settled on one’s criteria and adequately characterised the problem, one can evaluate a range of methods against them, not just FPTP, AV and PR. One might even develop a tailored method. Practically, though, FPTP, AV and PR seem to be the options, and if one does move away from FPTP then at least one will be able to have sensible referenda, to refine the method.

If one thought majoritarianism paramount, with a wish for ‘strong’ politicians second, then a reasonable method might be to reject all those options/candidates that the majority rank below some acceptable core, and to apply FPTP (if necessary) as a tie-break to the core. This would only differ from FPTP where FPTP would elect a candidate for which a clear majority preferred some other candidate. Similarly, the intention of AV is to avoid wasted votes and respect the majority wishes. This slightly different method can be seen as having the same intent, plus breaking ties using FPTP (to give stronger parties). To mathematicians, this is Condorcet modified to take account of the value of electing a candidate with strong support.


A feature of FPTP, with its tactical voting, is that it raises the barrier to entry for new parties compared with most other methods. AV also has a significant entry barrier in that a party with the fewest first place rankings is eliminated, even if it was a clear overall favourite (e.g., would have won under Condorcet). To go from a high to no barrier in one step could lead to an undesirable disruption to political life. Maybe AV is the best compromise? The Jenkins Commission thought so.
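AV’s entry barrier — eliminating a clear overall favourite — is the classic ‘centre squeeze’, sketched here with invented candidates and numbers:

```python
from collections import Counter

def condorcet_winner(ballots):
    """The candidate (if any) whom a strict majority prefers to every
    rival in head-to-head comparison."""
    candidates = set(ballots[0])
    for c in candidates:
        if all(2 * sum(b.index(c) < b.index(d) for b in ballots) > len(ballots)
               for d in candidates - {c}):
            return c
    return None

def av(ballots):
    """Alternative Vote: eliminate the candidate with fewest first
    preferences each round until one holds a majority."""
    ballots = [list(b) for b in ballots]
    while True:
        tally = Counter(b[0] for b in ballots)
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > len(ballots):
            return leader
        loser = min(tally, key=tally.get)
        ballots = [[c for c in b if c != loser] for b in ballots]

# 'Centre' would beat either rival head-to-head (65% and 67%), but has
# the fewest first preferences and so is eliminated first under AV.
ballots = ([['Left', 'Centre', 'Right']] * 35
         + [['Right', 'Centre', 'Left']] * 33
         + [['Centre', 'Left', 'Right']] * 32)

best = condorcet_winner(ballots)  # 'Centre'
winner = av(ballots)              # 'Left'
```

So AV is majoritarian in the sense of never electing a candidate a majority rank last, yet it can still squeeze out the Condorcet favourite, which is why it is a compromise rather than the mathematician’s ideal.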

Political views

There are clearly political aspects to the choice. While we can say for sure that the FPTP arguments are wrong, the AV arguments seem too thin to justify a change and the mathematical arguments too limited. What do politicians think? Cameron and Clegg (and many others) seem to agree that in most constituencies there will be no change. Cameron’s arguments seem mostly spurious, apart from the one that most voters will simply rank the minor parties the way their first choice tells them to. I have not seen any analysis of the impact of this, but it seems minor compared to the majoritarian criterion. Clegg’s main argument is that candidates and MPs will need to ‘work harder’, which may offset Cameron’s points. But how do they know? What we can say is that:

  • AV would seem not to make any radical short-term difference.
  • AV reduces the need for tactical voting, so that we can better judge how fair the system is.
  • AV allows us to recognize up and coming candidates and parties, independently of media hype.
  • AV is less biased against the formation of new parties, or local independents.
  • AV would provide information on preferences that could inform choices on further reforms, if necessary.
  • AV, used in a referendum, would open the way to sensible further reforms, if needed.

If it is true that an informed public would choose FPTP, adopting AV now would give them that choice, with no obvious down-side (scares on costs aside).

Tactical voting

The No campaign’s objections to AV would seem to apply equally to any method that did not encourage tactical voting, i.e. voting for someone who isn’t your actual first preference, but who you think has a better chance. The classic problems with tactical voting are:

  • Voters expectations can be manipulated, e.g. by the media.
  • The support for new parties (e.g. the Greens, a while back) is suppressed, giving the current main parties an advantage.
  • It requires some degree of co-ordination to be able to vote a tyrant out ‘tactically’, to avoid vote-splitting.
  • It disenfranchises honest voters and those who are not so clued-up in politics. (Especially if there are some political shocks just before the election.)


While the FPTP argument appears more mathematical than AV’s, its notion of ‘counting’ is spurious. There would seem more merit in the AV argument that under FPTP wasted votes do not count at all, and that AV remedies this defect. But – looking at it afresh – it is not enough to ‘count’ the votes: one may also want the results to respect the wishes of a clear majority, in order to be truly ‘democratic’. But this does not uniquely determine a method: a tie-break may be needed, and there may be some merit in FPTP. Indeed, while the No campaign arguments seem largely spurious, there does seem to lie behind them a genuine concern for the ‘health’ of politics and the strength of government.

It seems to me that mathematics can provide some useful insights, but some greater understanding is required to reach a definitive verdict.  Both Cameron and Clegg make the short-term effects seem rather minor. It is unfortunate that there isn’t the option of a pilot scheme, but AV has a clear edge for referenda and if the majority came to view AV as a ‘dead method’ we could easily return to FPTP. Indeed, we could routinely publish both FPTP and AV results, to inform the public. The arguments against such a tentative view seem unconvincing.



The No2AV campaign gives these reasons to vote NO:

  1. AV is costly …
  2. AV is complex and unfair
    The winner should be the candidate that comes first, but under AV the candidate who comes second or third can actually be elected. That’s why it is used by just three countries in the world – Fiji, Australia and Papua New Guinea. Voters should decide who the best candidate is, not the voting system. We can’t afford to let the politicians off the hook by introducing a loser’s charter.
  3. AV is a politician’s fix …

The second point seems to be an insult to the readers’ intelligence: one could equally well say “The winner should be the candidate that comes first, but under FPTP the candidate who comes second or third can actually be elected.” It all depends on who you think ‘should’ come first. E.g., should someone who is ranked last on 70% of the ballots be elected simply because the other candidates’ votes are split? The No campaign’s points seem all spin and no substance.


Lord Reid has given an interview in which he says

  •  [A] cornerstone of our democratic system has been ‘one person one vote’ …
  • [My vote] has the same weight as everyone else’s.
  • [AV] completely undermines and corrupts that; some people will have one vote, others … will be counted again and again.
  • [AV] is a threat to the .. basis of our democratic system.

When the interviewer notes that under AV one gets the candidate that ‘most people are happy with’ Reid responds, of AV, that:

  • If you vote for liberal, labour or conservative it is overwhelmingly likely that your vote will be counted once, whereas if I go out and vote for one of the ‘fringe’ candidates [my vote may be counted many times] … how is it fair [?]

The emerging ‘No’ message seems to be that only ‘fringe’ candidates ‘such as the BNP’ would benefit from AV. I note:

  • The Green party is also a fringe party, as would be a ‘reform democracy’ or ‘reform expenses’ party.
  • In many constituencies (like mine) one of the three main parties is ‘fringe’.
  • Independents, such as doctors standing to save a local hospital, are ultra-fringe. They may also revitalise democracy.
  • If the three main parties have candidates, it is mathematically certain that at least one of them will have their votes counted at least twice.

The example seems bogus, both mathematically and practically. ‘One vote’ and ‘equal weight’ could mean any of:

  1. One ballot paper each.
  2. One mark each (X).
  3. The ballots are counted by a process which only takes account of each ballot once.
  4. The ballots are counted by a process that only takes account of each vote once.
  5. The ballots could be counted by a process that only takes account of each ballot once.
  6. The same opportunities and rules for everyone.

It is not clear which Lord Reid considers essential to democracy. FPTP satisfies them all, but would allow an oligarchy with enough influence over the media to retain power even if it was most voters’ last choice, as above. AV meets 1 but fails 2, as would any alternative to FPTP. AV also fails 3, but so what? We could stage a series of rounds (as in France) with trailing candidates being eliminated until one candidate has a majority. The result would be the same, but each vote would be counted once (per round) and counted equally. Do we wish to reject AV on a technicality? With one round AV satisfies 4, if by ‘vote’ one means ballot, and also satisfies 5 and 6. Isn’t it 6 that matters?

Lord Reid also refers to ‘weight’, without defining it. Suppose that two parties traditionally vie for the seat, with the rest being no-hopers. Then a doctor stands on the ticket of supporting the local hospital and otherwise consulting his constituents. Suppose you would prefer to vote for this ‘fringe’ candidate. Under FPTP you could vote for the doctor, thus recording your support for the hospital but taking no part in the main contest. Or you could vote for a main candidate, failing to record your support for the doctor. Under AV you would simply record your actual preferences, thus recording support for the hospital and taking part in the main contest. And the doctor might even win. Under AV your vote clearly has more ‘weight’, but which is fairer? If we think of a group of people with similar views, then those who support a main candidate will all vote for them, so that their weight of support is not split. Under AV the weight of support for the main candidate is undiminished (it is what it would have been had everyone voted tactically). The support for the fringe is also undiminished (it is what it would have been had they all voted honestly). Under FPTP the weight of support will be divided, unless they all vote tactically. So which is a fairer definition of ‘weight’? Is it obvious that AV undermines democracy?

I think there are some sensible arguments for ‘No’, but the No campaign isn’t using them, and the ability of a party whom most people hate to get elected (due to vote splitting) under FPTP seems much more significant in undermining democracy.

See also

Dave Marsay