Biases and uncertainty

Rational decision theory is often used as the norm for comparing human, organisational or economic decision-making, but it takes account only of probability, not Knightian uncertainty. We should consider the extent to which so-called ‘biases’ could be attributed to uncertainty rather than error.

Hyperbolic discounting

Exponential discounting would be rational, but only on the assumption that the current epoch lasts forever. If we are uncertain of this, then the discount rate should diminish over time, as in hyperbolic discounting (Sozou, P. D. (1998). “On hyperbolic discounting and uncertain hazard rates”. Proceedings of the Royal Society B: Biological Sciences 265: 2015).
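Sozou’s result can be checked numerically. A minimal sketch: assume (purely for illustration) an exponential prior of mean k on an unknown hazard rate r; the expected survival probability at time t then works out to the hyperbolic 1/(1 + kt) rather than an exponential curve.

```python
import math

def expected_survival(t, k, n=100000, r_max=50.0):
    """Midpoint-rule integration of E[exp(-r*t)] under an
    exponential prior on the hazard rate r with mean k."""
    dr = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += (1.0 / k) * math.exp(-r / k) * math.exp(-r * t) * dr
    return total

k, t = 1.0, 3.0
numeric = expected_survival(t, k)
hyperbolic = 1.0 / (1.0 + k * t)   # Sozou's closed form
exponential = math.exp(-k * t)     # discounting at the mean hazard rate

assert abs(numeric - hyperbolic) < 1e-3
assert hyperbolic > exponential
```

The hyperbolic factor stays well above exponential discounting at the mean hazard rate, which is the sense in which uncertainty about the epoch diminishes the effective discount rate.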

Normalcy bias

This is where people ignore evidence that the context has changed: a form of status quo bias. Here the rational norm is being applied with the hindsight that the situation has changed, yet the rational norm seems to encourage normalcy bias.

Conjunction fallacy

The conjunction fallacy is where the conjunction ‘A and B’ is considered more probable than A alone. It typically occurs where A is vague whereas ‘A and B’ is relatively precise and seems to explain the evidence. If one is uncertain about the context (as in many psychology questions) then it is more appropriate to try to estimate the likelihood P(E|H) rather than the probability P(H|E), which depends on the priors and hence context.

What appears to be a fallacy may be evidence that people tend to think in terms of likelihoods rather than probabilities. (Even psychology papers can confuse them, so this may be credible.)
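This can be made concrete with made-up numbers: the conjunction ‘A and B’ can never be more probable than A alone, but it can have the higher likelihood, and so appear to explain the evidence better.

```python
# Hypothetical numbers: A = "bank teller", B = "active feminist",
# E = the personality sketch given in a typical psychology question.
p_A_and_B = 0.1          # P(A ∧ B)
p_A_not_B = 0.2          # P(A ∧ ¬B)
p_A = p_A_and_B + p_A_not_B

like_A_and_B = 0.9       # P(E | A ∧ B): the precise hypothesis explains E well
like_A_not_B = 0.1       # P(E | A ∧ ¬B)
# The likelihood of the vaguer hypothesis is an average over its sub-cases:
like_A = (like_A_and_B * p_A_and_B + like_A_not_B * p_A_not_B) / p_A

# The conjunction can never be more probable...
assert p_A_and_B <= p_A
# ...but it can be more *likely*, i.e. better at explaining the evidence:
assert like_A_and_B > like_A
```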

Reference Class forecasting

Reference class forecasting is a method for reducing reliance on probability estimates. It ensures that forecasts are made by comparison with real data, and seeks to capture relevant parameters. However, it may still be necessary to take account of potential novel contexts, to avoid shocks.

See Also

Knightian uncertainty, Induction, The financial crash

Dave Marsay

Induction and epochs


Induction is the basis of all empirical knowledge. Informally, if something has never or always been the case, one expects it to continue to be never or always the case: any change would mark a change in epoch. 

Mathematical Induction

Mathematical induction concerns mathematical statements, not empirical knowledge.

Let S(n) denote a statement dependent on an integer variable, n. If
    for all integers n, S(n) implies S(n+1), and
    S(k) holds for some integer k,
then
    S(i) holds for all integers i ≥ k.

This, and variants on it, is often used to prove theorems for all integers. It motivates informal induction.

Statistical Induction

According to the law of large numbers, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed. Thus:

For two or more sufficiently large sets of results obtained by random sampling from the same distribution, the averages should be close, and will tend to become closer as more trials are performed.

In particular, if one set of results, R1, has been obtained and another, R2, will be obtained, then, using the language of probability theory, if C() is a condition on results,

P(C(R2)) = p(C(R1)), where P() is the probability and p() is the proportion.

Alternatively, p() could be given as a hypothesis and tested against the data. Note that for any given quantity of data, rare events cannot be excluded, and so one can never be sure that any p(x) is ‘objectively’ very small. That is, the ‘closeness’ in the law of large numbers always has some non-zero tolerance.

A key assumption of statistical induction is that there exists a stable ‘expectation’. This is only true within some epoch where the trials depend on that epoch, and not on any sub-epochs. In effect, the limits on an epoch are determined by the limits on the law of large numbers.
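A minimal simulation of this point (the distributions and the shift are illustrative):

```python
import random

random.seed(0)

def sample_mean(n, mean):
    """Average of n trials drawn from a Gaussian with the given mean."""
    return sum(random.gauss(mean, 1.0) for _ in range(n)) / n

# Two large samples from the same distribution (same epoch): averages close.
r1 = sample_mean(100000, 5.0)
r2 = sample_mean(100000, 5.0)
assert abs(r1 - r2) < 0.05

# After an epochal change the underlying 'expectation' shifts, and the
# induction P(C(R2)) = p(C(R1)) fails however large the samples are.
r3 = sample_mean(100000, 7.0)
assert abs(r1 - r3) > 1.0
```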

Empirical Induction

In practice we don’t always have the conditions required for straightforward statistics, but we can approximate. Using the same notation as above:

P(C(R2)) = p(C(R1)),

provided that R1, R2 are in the same epoch. That is, where:

  • The sampling was either unbiased, had the same bias in the two cases or at least was not conditional on anything that changed between the two cases.
  • For some basis of hypotheses, {H}, the conditional likelihoods P(data|H) are unchanged between the two cases.

Alternatively, we can let A = “same epoch” stand for the above assumptions and write

P(C(R2)|A) = p(C(R1)).

Induction on Hypotheses

Statistical induction only considers proportions.  The other main case is where we have hypotheses (e.g. models or theories) that fit the past data. If these are static then we may expect some of the hypotheses that fit to be ‘true’ and hence to continue to fit. That is:

If, for all i in some index set I, hypotheses Hi fit the current data (R1), then, for some subset J of I, by default one expects that for all j in J, Hj will continue to fit (for future data, R2).

As above, there is an assumption that the epoch hasn’t changed.

Often we are only interested in some of the parameters of a hypothesis, such as a location. Even if all the theories that fit the current data virtually agree on the current value of the parameters of interest, there may be radically different possibilities for their future values, perhaps forming a multi-modal distribution. (For example, if we observe an aircraft entering our airspace, we may be sure about where it is and how fast it is flying, but have many possible destinations.)
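A toy version of the aircraft example, with hypothetical tracks: models that virtually agree about the present can still disagree radically about the future.

```python
# Each candidate model fits the current track (same position, same speed)
# but assumes a different heading towards a different destination.
pos0 = (0.0, 0.0)                                    # agreed current position
velocities = [(1.0, 0.1), (1.0, -0.1), (1.0, 0.5)]   # hypothetical headings

def predict(v, t):
    """Straight-line extrapolation of position at time t under velocity v."""
    return (pos0[0] + v[0] * t, pos0[1] + v[1] * t)

now = [predict(v, 0.1) for v in velocities]     # near-term: near agreement
later = [predict(v, 10.0) for v in velocities]  # long-term: multi-modal

def spread(pts):
    return max(p[1] for p in pts) - min(p[1] for p in pts)

assert spread(now) < 0.1    # the models virtually agree now...
assert spread(later) > 5.0  # ...but diverge into distinct future 'modes'
```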

Pragmatic induction

One common form of pragmatism is where one has an ‘established’ model or belief which one goes on using (unquestioningly) unless and until it is falsified. By default the assumption A, above, is taken to be true. Thus one has

P(C(R2)) = p(C(R1)),

unless there is definite evidence that P() will have changed, e.g. a biased sample or an epochal change of the underlying random process. In effect, pragmatism assumes that the current epoch will extend indefinitely.

Rationalizing induction

The difference between statistical and pragmatic induction is that the former makes explicit the assumptions of the latter. If one has a pragmatic claim, P(C(R2)) = p(C(R1)), one can in effect recover the rigour of the statistical approach by noting when, where and how the data supporting the estimate was sampled, compared with when, where and how the probability estimate is to be applied. (Thus it might be pragmatic – in this pedantic sense – to suppose that if our radar fails temporarily, all airplanes will have continued flying straight and level, but not necessarily sensible.)


When someone, Alf, says ‘all swans are white’ and a foreigner, Willem, says that they have seen black swans, we should consider whether Alf’s statement is empirical or not, and if so what its support is. Possibly:

    • Alf defines swans in such a way that they must be white: they are committed to calling a similar black creature something else. Perhaps this is a widely accepted definition that Willem is unaware of.
    • Alf has only seen British swans, and we should interpret their statement as ‘British swans are white’.
    • Alf believes that swans are white and so only samples large white birds to check that they are swans.
    • Alf genuinely and reasonably believes that the statement ‘all swans are white’ has been subjected to the widest scrutiny, but Willem has just returned from a new-found continent.

Even if Alf’s belief was soundly based on pragmatic induction, it would be prudent for him to revise his opinion, since his induction – of whatever kind – was clearly based on too small an epoch.


We can split conventional induction into three parts:

  1. Modelling the data.
  2. Extrapolating, using the models.
  3. Considering predictions based on the extrapolations.

The final step is usually left implicit in induction: it is supposed that one should always take an extrapolation to be a prediction. But there are exceptions. (Suppose that two airplanes are flying straight towards each other. A candidate prediction would be that they would pass infeasibly close, breaching the aviation rules that are supposed to govern the airspace. Hence we anticipate the end of the current ‘straight and level’ epoch and have recourse to a ‘higher’ epoch, in this case the pilots or air-traffic controllers. If they follow set rules of the road (e.g. planes flying out give way) then we may be able to continue extrapolating within the higher epoch, but here we only consider extrapolation within a given epoch.)

Thus we might reasonably imagine a process somewhat like:

  1. Model the data.
  2. Extrapolate, using the models.
  3. Establish predictions:
    • If the extrapolations all agree: take them to be a candidate prediction.
    • Otherwise: Make a possibilistic candidate prediction; the previous ‘state’ has ‘set up the conditions’ for the possibilities.
  4. Establish credibility:
    1. If the candidate predictions are consistent with the epoch, then they are credible.
    2. If not, note lack of credibility.

In many cases a natural ‘null hypothesis’ is that many elements of a hypothesis are independent, so that they can be extrapolated separately. There are then ‘holistic’ constraints that need to be applied overall. This can be done as part of the credibility check. (For example, airplanes normally fly independently but should not fly too close.)
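The airplane example can be sketched as extrapolate-independently-then-check (the tracks and the separation threshold are illustrative):

```python
import math

def extrapolate(pos, vel, t):
    """Straight-line extrapolation of a single, independent track."""
    return tuple(p + v * t for p, v in zip(pos, vel))

def min_separation(tracks, t):
    """Holistic check: smallest pairwise distance between extrapolations."""
    pts = [extrapolate(p, v, t) for p, v in tracks]
    return min(math.dist(a, b) for i, a in enumerate(pts) for b in pts[i + 1:])

# Two planes flying straight towards each other (hypothetical tracks).
tracks = [((0.0, 0.0), (1.0, 0.0)), ((10.0, 0.0), (-1.0, 0.0))]

SEPARATION = 1.0
credible = min_separation(tracks, 1.0) > SEPARATION      # fine in the near term
not_credible = min_separation(tracks, 5.0) > SEPARATION  # they'd pass too close

assert credible
assert not not_credible  # the extrapolation fails the credibility check
```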

We can fail to identify a credible hypothesis either because we have not considered a wide enough range of hypotheses or because the epoch has ended. The epoch may also end without our noticing, leading to a seemingly credible prediction that is actually based on a false premise. We can potentially deal with all these problems by considering a broader range of hypotheses and data. Induction is only as good as the data gathering and theorising that supports it. 


The modelling process may be complicated in two ways:

  • We may need to derive useful categories so that we have enough data in each category.
  • We may need to split the data into epochs, with different statistics for each.

We need to have enough data in each partition to be statistically meaningful, while being reasonably sure that data in the same partition are all alike in terms of transition probabilities. If the parts are too large we can get averaged results, which need to be treated accordingly.
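A toy illustration of the averaging problem, with made-up figures:

```python
# Hypothetical observations from two epochs with different underlying rates.
epoch1 = [0.1] * 50      # e.g. a failure rate observed before some change
epoch2 = [0.5] * 50      # and after it

pooled = sum(epoch1 + epoch2) / 100
per_epoch = (sum(epoch1) / 50, sum(epoch2) / 50)

# The pooled figure (0.3) describes neither epoch:
assert abs(pooled - 0.3) < 1e-9
assert all(abs(pooled - m) > 0.15 for m in per_epoch)
```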

Induction and types of complexity

We can use induction to derive a typology for complexity:

  • simple unconditional: the model is given: just apply it
  • simple conditional: check the model and apply it
  • singly complicated: analyse the data in a single epoch against given categories to derive a model, apply it.
  • doubly complicated: analyse the data into novel categories or epochs to derive a model, apply it.
  • complex: where the data being observed has a reflexive relationship with any predictions.

The Cynefin framework  gives a simple – complicated – complex – chaotic sense-making typology that is consistent with this, save that it distinguishes between:

  • complex: we can probe and make sense
  • chaotic: we must act first to force the situation to ‘make sense’.

We cannot make this distinction yet as we are not sure what ‘makes sense’ would mean. It may be that one can only know that one has made sense when and if one has had a successful intervention, which will often mean that ‘making sense’ is more of a continuing activity than a state to be achieved. But inadequate theorising and data would clearly lead to chaos, and we might initially act to consider more theories and to gather more data. But it is not clear how we would know that we had done enough.

See also

Statistics, pragmatism, Cynefin, mathematics.

David Marsay

Regulation and epochs

Conventional regulation aims at maintaining objective criteria, as in Conant and Ashby: regulators must have or form a model or models of their environment. But if future epochs are unpredictable or the regulators are set up for the short-term, e.g. being post-hoc adaptive, then the models will not be appropriate for the long-term, leading to a loss of regulation at least until a new effective model can be formed.

Thus regulation based only on objective criteria is not sustainable in the long-term. Loss of regulation can occur, for example, due to innovation by the system being regulated. More sustainable regulation (in the sense of preserving viability) might be achievable by taking a broader view of the system ‘as a whole’, perhaps engaging with it. For example, a ‘higher’ (strategic) regulator might monitor the overall situation, redirect the ‘lower’ (tactical) regulators and keep the lower regulators safe. The operation of these regulators would tend to correspond to Whitehead’s epochs (regulators would impose different rules, and different rules would call for different regulators).

See also

Stafford Beer.

David Marsay

Synthetic Modelling of Uncertain Temporal Systems


SMUTS is a computer-based ‘exploratorium’, to aid the synthetic modelling of uncertain temporal systems. I had previously worked on sense-making systems based on the ideas of Good, Turing and Keynes, and was asked to get involved in a study on the potential impact of any Y2K bugs, starting November 1999. Not having a suitable agreed model, we needed a generic modelling system, able to at least emulate the main features of all the part models. I had been involved in conflict resolution, where avoiding cultural biases and being able to meld different models was often key, and JC Smuts’ Holism and Evolution seemed a sound if hand-wavy approach. SMUTS is essentially a mathematical interpretation of Smuts. I was later able to validate it when I found from the Smuts’ Papers that Whitehead, Smuts and Keynes regarded their work as highly complementary. SMUTS is actually closer to Whitehead than Smuts.


An actual system is a part of the actual world that is largely self-contained, with inputs and outputs but with no significant external feedback-loops. What counts as significant is a judgement. Any external feedback loop will typically have some effect, but we may not regard it as significant if we can be sure that any effects will build up too slowly to matter. It is a matter of analysis on larger systems to determine what might be considered smaller systems. Thus plankton are probably not a part of the weather system but may be a part of the climate.

The term system may also be used for a model of a system, but here we mean an actual system.


We are interested in how systems change in time, or ‘evolve’. Such changes include all types of evolution, adaptation, learning and desperation, and hence are much broader than the usual ‘mathematical models’ allow.


Keynes’ notion of uncertainty is essentially Knightian uncertainty, but with more mathematical underpinning. It thus extends more familiar notions of probability as ‘just a number’. As Smuts emphasises, systems of interest can display a much richer variety of behaviours than typical probabilistic systems. Keynes has detailed the consequences for economics at length.


Pragmatically, one develops a single model which one exploits until it fails. But for complex systems no single model can ever be adequate in the long run, and, as Keynes and Smuts emphasised, it could be much better to recognize that any conventional model would be uncertain. A key part of the previous sense-making work was the multi-modelling concept of maintaining the broadest range of credible models, with some more precise and others more robust, and then hedging across them, following Keynes et al.


In conflict resolution it may be enough to simply show the different models of the different sides. But equally one may need to synthesize them, to understand the relationships between them and scope for ‘rationalization’. In sense making this is essential to the efficient and effective use of data, otherwise one can have a ‘combinatorial explosion’.

Test cases

To set SMUTS going, it was developed to emulate some familiar test cases.

  • Simple emergence. (From random to a monopoly.)
  • Symbiosis. (Emergence of two mutually supporting behaviours.)
  • Indeterminacy. (Emergence of co-existing behaviours where the proportions are indeterminate.)
  • Turing patterns. (Groups of mutually supporting dynamic behaviours.)
  • Forest fires. (The gold standard in epidemiology, thoroughly researched.)

In addition we had an example to show how the relationships between extremists and moderates were key to urban conflicts.

The aim in all of these was not to be as accurate as the standard methods or to provide predictions, but to demonstrate SMUTS’ usefulness in identifying the key factors and behaviours. 


A key requirement was to be able to accommodate any relevant measure or sense-making aid, so that users could literally see what effects were consistent from run to run, what weren’t, and how this varied across cases. The initial phase had a range of standard measures, plus Shannon entropy, as a measure of diversity.
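Shannon entropy as a diversity measure is easily sketched (the counts here are illustrative):

```python
import math

def shannon_entropy(counts):
    """Entropy (in bits) of the distribution of behaviours over the agora."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

uniform = shannon_entropy([25, 25, 25, 25])   # maximal diversity: 2 bits
monopoly = shannon_entropy([100, 0, 0, 0])    # no diversity: 0 bits

assert abs(uniform - 2.0) < 1e-9
assert monopoly == 0.0
```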

Core dynamics

Everything emerged from an interactional model. One specified the extent to which one behaviour would support or inhibit nearby behaviours of various types. By default behaviours were then randomized across an agora and the relationships applied. Behaviours might then change in an attempt to be more supported. The fullest range of variations on this was supported, including a range of update rules, strategies and learning. Wherever possible these were implemented as a continuous range rather than separate cases, and all combinations were allowed.
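SMUTS itself is not reproduced here, but the flavour of such an interactional model can be suggested by a standard forest-fire cellular automaton (the rules and parameters below are a generic sketch, not SMUTS’s own): fire inhibits nearby trees, trees regrow on empty ground.

```python
import random

random.seed(1)
EMPTY, TREE, FIRE = 0, 1, 2
N, P_GROW, P_LIGHTNING = 20, 0.05, 0.001

# Behaviours randomized across the 'agora' (a toroidal grid).
grid = [[random.choice((EMPTY, TREE)) for _ in range(N)] for _ in range(N)]

def neighbours(i, j):
    return [grid[(i + di) % N][(j + dj) % N]
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))]

def step():
    """Apply the inter-relationships: fire spreads to and inhibits trees."""
    global grid
    new = [[EMPTY] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            cell = grid[i][j]
            if cell == FIRE:                      # fire burns out
                new[i][j] = EMPTY
            elif cell == TREE:                    # fire inhibits nearby trees
                if FIRE in neighbours(i, j) or random.random() < P_LIGHTNING:
                    new[i][j] = FIRE
                else:
                    new[i][j] = TREE
            elif random.random() < P_GROW:        # regrowth on empty ground
                new[i][j] = TREE
    grid = new

for _ in range(100):
    step()
```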


SMUTS enables one to explore complex dynamic systems

SMUTS has a range of facilities for creating, emulating and visualising systems.

By default there are four quadrants. The bottom right illustrates the inter-relationships (e.g., fire inhibits nearby trees, trees support nearby trees). The top right shows the behaviours spread over the agora (in this case ground, trees and fire). The bottom left shows  a time-history of one measure against another, in this case entropy versus value of trees. The top-left allows one to keep an eye on multiple displays, forming an over-arching view. In this example, as in many others, attempting to get maximum value (e.g. by building fire breaks or putting out all fires) leads to a very fragile system which may last a long time but which will completely burn out when it does go. If one allows fires to run their course, one typically gets an equilibrium in which there are frequent small fires which keep the undergrowth down so that there are never any large fires.


It was generally possible to emulate text-book models to show realistic short-run behaviours of systems. Long term, simpler systems tended to show behaviours like other emulations, and unlike real systems. Introducing some degree of evolution, adaptation or learning all tended to produce markedly more realistic behaviours: the details didn’t matter. Having behaviours that took account of uncertainty and hedged also had a similar effect.


SMUTS had a recognized positive influence, for example on the first fuel crisis, but the main impact has been in validating the ideas of Smuts et al.

Dave Marsay 

Pragmatism and mathematics

The dichotomy

Mathematics may be considered in two parts: that which is a tool supporting other disciplines in their modelling, which is considered pragmatic; and that which seeks to test underlying assumptions in methods and models, which is not so well appreciated.

Pragmatism and sustainability

Setting mathematics to one side for a moment, consider two courses of actions, S and P, with notional actual benefits as shown.

Boom and bust can be better in the short-term, but worse in the long run.

Sure and steady may beat Boom and bust

‘Boom and bust’ arises when (as is usually the case) the strategy is ‘closed loop’, with activity being adjusted according to outcomes (e.g. Ashby). Even a sustainable strategy would be subject to statistical effects and hence cyclic variations, but these will be small compared with the crashes that can arise when the strategy is based on some wrong assumption (Conant-Ashby). If something happens that violates that assumption then one can expect performance to crash until the defect is remedied, when performance can again increase. In this sense, the boom-bust strategy is pragmatic.

If one has early warnings of potential crashes then it can also be pragmatic to incorporate the indicators into the model, thus switching to a safer strategy when things get risky. But, to be pragmatic, the model has to be based on earlier experience, including earlier crashes. Thus, pragmatically, one can avoid crashes that have similar causes to the old ones, but not novel crashes. This is a problem when one is in a complex situation, in which novelty is being continually generated. Indeed, if you are continually updating your model and ‘the environment’ is affected by your actions and the environment can innovate, then one is engaged in cyclic co-innovation and hence co-evolution. This is contrary to an implicit assumption of pragmatism, which seems (to me) to be that one has a fixed ‘external world’ that one is discovering, and hence one expects the process of learning to converge onto ‘the truth’, so that surprises become ever less frequent. (From a Cybernetic perspective in a reflexive situation ‘improvements’ to our strategy are likely to be met by improvements in the environmental response, so that in effect we are fighting our own shadow and driven to ever faster performance until we hit some fundamental limit of the environment to respond.)

Rationalising Pragmatism

The graph shows actual benefits. It is commonplace to discount future benefits. Even if you knew exactly what the outcomes would be, a heavy enough discounting would make the discounted return from the boom-bust strategy preferable to the sustainable one, so that initially one would follow boom-bust. As the possible crash looms the sustainable strategy might look better. However, the Cybernetic view (Conant-Ashby) is that a sustainable strategy would depend on an ‘eyes open’ view of the situation, its possibilities and the validity of our assumptions, and almost certainly on a ‘multi-model’ approach. This is bound to be more expensive than the pragmatic approach (hence the lower yield) and in practice requires considerable investment in areas that have no pragmatic value and considerable lead times. Thus it may be too late to switch before the crash.
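A toy calculation, with made-up benefit streams, of how heavy discounting can favour boom and bust:

```python
# Hypothetical benefit streams per period: boom-bust yields more early on,
# then crashes to nothing; sure-and-steady yields a constant 1.0.
boom_bust = [1.5] * 10 + [0.0] * 10
steady = [1.0] * 20

def npv(stream, rate):
    """Net present value of a benefit stream at a given discount rate."""
    return sum(b / (1 + rate) ** t for t, b in enumerate(stream))

# Undiscounted, steady wins (20 vs 15)...
assert sum(steady) > sum(boom_bust)
# ...but a heavy enough discount rate reverses the preference.
assert npv(boom_bust, 0.3) > npv(steady, 0.3)
```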

In complex situations we cannot say when the crash is due, but only that a ‘bubble’ is building up. Typically, a bubble could pop at any time, the consequences getting worse as time goes on. Thus the risk increases. Being unable to predict the timing of a crash makes it less likely that a switch can be made ‘pragmatically’ even as the risk is getting enormous.

There is often also an argument that ‘the future is uncertain’ and hence one should focus on the short-run. The counter to this is that while the specifics of the future may be unknowable, we can be sure that our current model is not perfect and hence that a crash will come. Hence, we can be sure that we will need all those tools which are essential to cope with uncertainty, which according to pragmatism we do not need.

Thus one can see that many of our accounting habits imply that we would not choose a sustainable strategy even if we had identified one.

The impact of mathematics

Many well-respected professionals in quite a few different complex domains have commented to me that if they are in trouble the addition of a mathematician often makes things worse. The financial crash brought similar views to the fore. How can we make sense of this? Is mathematics really dangerous?

In relatively straightforward engineering, there is sometimes a need for support from mathematicians who can take their models and apply them to complicated situations. In Keynes’ sense, there is rarely any significant reflexivity. Thus we do believe that there are some fundamental laws of aerodynamics which we get ever closer to as we push the bounds of aeronautics. Much of the ‘physical world’ seems completely unresponsive to how we think of it. Thus the scientists and engineers have tended to ‘own’ the interesting problems, leaving the mathematicians to work out the details.

For complex situations there appear to be implicit assumptions embedded in science,  engineering and management (e.g. pragmatism) that are contrary to the mathematics. There would thus seem to be a natural (but suppressed) role for mathematics in trying to identify and question those assumptions. Part of that questioning would be to help identify the implications of the current model in contrast to other credible models and theories. Some of this activity would be identical to what mathematicians do in ‘working out the details’, but the context would be quite different. For example, a mathematician who ‘worked out the details’ and made the current model ‘blow up’ would be welcomed and rewarded as contributing to that ever developing understanding of the actual situation ‘as a whole’ that is necessary to sustainability.


It is conventional, as in pragmatism, to seek single models that give at least probabilistic predictions. Keynes showed that this was not credible for economics, and it is not a safe assumption to make for any complex system. This is an area where practice seems to be ahead of current mainstream theory. A variant on pragmatism would be to have a fixed set of models that one only changes when necessary, but the objections here still stand. One should always be seeking to test one’s models, and look for more.

It follows from Conant-Ashby that a sustainable strategy is a modelling strategy and that there will still be bubbles, but they will be dealt with as soon as possible. It may be possible to engineer a ‘soft landing’, but if not then a prediction of Conant-Ashby is that the better the model the better the performance. Thus one may have saw-tooth-like booms and busts, but finer and with a more consistent upward trend. In practice, we may not be able to choose between two or more predictive models, and if the available data does not support such a choice, we need to ‘hedge’. We can either think of this as hedging across different conventional models or as a single unconventional model (such as advocated by Keynes). Either way, we might reasonably call it a ‘multi-model’. The best strategy that we have identified, then, is to maintain as good as possible a multi-model, and ‘hedge’.

If we think of modelling in terms of assumptions then, like Keynes, we end up with a graph-like structure of models, not just the side-by-side alternatives of some multi-modelling approaches. We have a trade-off between models that are more precise (more assumptions) or more robust (fewer assumptions), as well as ones that are simply different (different assumptions). If a model is violated we may be able to revert to a more general model that is still credible. Such general models in effect hedge over the range of credible assumptions. The trick is to invest in developing techniques for the more general case even when ‘everybody knows’ that the more specific case is true, and – if one is using the specific model – invest in indicators that will show when its assumptions are particularly vulnerable, as when a bubble is over-extended.
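A minimal sketch of such hedging (the models and numbers are hypothetical):

```python
# Hypothetical point predictions from models that all fit the available data.
predictions = {"model_A": 10.0, "model_B": 12.0, "model_C": 30.0}

# A point forecast would have to pick one model; a hedged forecast keeps
# the whole range spanned by the credible models.
hedged_interval = (min(predictions.values()), max(predictions.values()))

# If one model's assumptions are found to be violated, drop it and
# re-hedge over the surviving (more credible) models.
surviving = {k: v for k, v in predictions.items() if k != "model_C"}
narrowed = (min(surviving.values()), max(surviving.values()))

assert hedged_interval == (10.0, 30.0)
assert narrowed == (10.0, 12.0)
```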


A traditional approach is to have two separate strands of activity. One – ‘engineering’ – applies a given model (or multi-model), the other – ‘science’ – seeks to maintain the model. This seems to work in complicated settings. However, in complex reflexive settings:

  • The activity of the engineers needs to be understood by the scientists, so they need to work closely together.
  • The scientists need to experiment, and hence interfere with the work of the engineers, with possible misunderstandings and dire consequences.
  • In so far as the two groups are distinct, there is a need to encourage meaningful collaborations and manage the equities between their needs. (Neither ‘on top’ nor ‘on tap’.)

One can see that collaboration is inhibited if one group is pragmatic, the other not, and that pragmatism may win the day, leading to ‘pragmatic scientists’ and hence a corruption of what ought to be happening. (This is in addition to a consideration of the reward system.)

It may not be too fanciful to see signs of this in many areas.

The possible benefits of crashes

Churchill noted that economic crashes (of the cyclic kind) tended to clear out dead-wood and make people more realistic in their judgements, compared with the ‘good times’ when there would be a great deal of investment in things that turned out to be useless, or worse, when the crash came. From Keynes’ point of view much of the new investment in ‘good times’ are band-wagon investments, which cause the bubble which ought to be pricked.

We can take account of such views in two ways. Firstly, if the apparent boom is false and the apparent crash is beneficial, then we can take this into account in our measure of benefit, so that the ‘boom’ becomes a period of flat or declining growth, the crash becomes a sudden awakening, which is beneficial, and the post-crash period becomes one of real growth. The problem then becomes how to avoid ‘bad’ band-wagons.

Either way, we want to identify and avoid fast growth that is ill-founded, i.e., based on an unsustainable assumption.


It is well recognized that mathematics is extremely powerful, and the price for that is that it is very sensitive: give it the slightest mis-direction and the result can be far from what was intended. Mathematics has made tremendous contributions to the complicated disciplines, and seems quite tame. In contrast, my experience is that for complex subjects the combination of mathematicians and numerate professionals is risky, and requires an enormous up-front investment in exchanging views, which sometimes can be nugatory. Perhaps there is some mis-direction? If so, where?

From my point of view, the problem often seems to be one of ‘scientism’. That is, certain types of method are characteristic of the complicated professions, and so people expect problems that are almost the same but complex to be addressed in the same sort of ways. Anything else would not be ‘proper science’. The mathematician, on the other hand, has the habit of rooting out assumptions, especially un-acknowledged ones, and seeking to validate them. If they can’t be validated then he would rather not make them. (A cynic might say that the other professionals want to work with the tools they are familiar with while the mathematician wants to develop an all-purpose tool so that he can move on to something more interesting.)

Numerous headline failures tend to reinforce the mathematician in his view that other professionals profess certain beliefs, while he is uniquely equipped to be productively cynical. But here I focus on one belief: in pragmatism. Often, when pressed, people do not actually believe that their assumptions are literally true, only that they are ‘pragmatic’. But, as previously noted, the mixture of literal mathematics and conventional pragmatism is unsafe. In my view, mixtures of pragmatisms from different domains (without any ‘proper’ mathematics) seem to lie behind many headline problems. I have shown why pragmatism is inappropriate for solving complex problems and briefly sketched some reforms needed to make it ‘mathematics friendly’.

See Also

General approach, Sub-prime science, Weapons of Maths Destruction, Minsky moment.

Dave Marsay

Scientists of the subprime

‘Science of the subprime’ is currently available from BBC iPlayer.


Mathematicians and scientists were complicit in the crash. Financiers were ‘in thrall to mathematics’, with people like Stiglitz and Soros ‘lone voices in the wilderness’. The ‘low point’ was derivatives, which were ‘fiendishly complicated’, yet ‘mathematical models’ convinced people to trade in them.

The problem was that liberalisation led to an increase in connectedness, which was thought to be a good thing, but that this went too far and led to a decrease in diversity, which made the whole system very fragile, eventually crashing. This was presented by Lord May from an ecological perspective.

Perhaps the most interesting part was that Lord May had tackled his lunching partner Mervyn King before the crash, and that in 2003 Andrew Haldane had independently come up with a ‘toy model’ that he found compelling, but which failed to gain traction.

After the crash, none of the mainstream mathematical models gave any insight into what had gone wrong. The problem was that the models concerned single-point failures, not systemic failures [my words]. Since then Haldane and May have published a paper in Nature showing that structure matters.

The new activities are to generate financial maps, much like weather maps and transport maps.

One problem is a lack of diversity: the solutions are:

  • To ensure that banks suffer the consequences of their actions [no ‘moral hazard’].
  • To ‘tilt the playing field’ against large players [the opposite of what is done now].

Another problem is the expectation of certainty: it must be recognized that sensible models can give insights but not reliable predictions.

In summary, the main story is that physics-based mathematics led decision-makers astray, and they wouldn’t be persuaded by Lord May or their own experts. There were also some comments on why this might be:

Gillian Tett (FT) commented that decision makers needed predictions and the illusion of certainty from their models. A decision-maker commented on the tension between containing long-term risk and making a living in the short-run [but this was not developed]. Moreover, policy makers tend to search for data, models or theories to support their views: the problems are not due to the science as such, but to the application of science.


  • This broadly reflects and amplifies the Turner review, but I found it less appealing than Lord Turner’s recent INET interview.
  • Gordon Brown ‘at the top of the shop’ shared these concerns, but seemed unable to intervene until his immediate post-crash speech. This seems to raise some interesting issues, especially if the key point was about financial diversity.
  • The underlying problem seems to be that the policy-makers and decision-makers are pragmatic, in a particular sense.
  • Even if the complexity explanation for the crash is correct, it is not clear that this is the only way that crashes can happen, so that pragmatic regulation based on ‘carry on but fix the hole’ may not be effective.
  • The explanations and observations are reminiscent of Keynes. Stiglitz, Soros and Brown all commended Keynes pre-crash, and many have recognized the significance of Keynes post-crash. Yet he is not mentioned. Before the 1929 crash he thought the sustained performance of the stock market remarkable, rather than taking it for granted. His theory was that it would remain stable just so long as everyone was able to trade and expected everyone else to be able to trade, and the central role of confidence has been recognized ever since. The programme ignored this, which seems odd as the behaviourists are also quite fashionable.
  • Keynes’ underpinning theory of probability [not mentioned] is linked to his tutor Whitehead’s process logic, which underpins much of modern science, including ecology. This makes the problem quite clear: if mathematicians and scientists are employed by banks, and banks are run as ordinary commercial organisations, then they will be focussing on the short-term. The long-term is simply not their responsibility. That is what governments are for (at least according to Locke). But the central machinery doesn’t seem to be up to it. We shouldn’t blame anyone not in government, academia or similar supposedly ‘for the common good’ organisations.
  • There were plenty of mathematicians, scientists and economists (not just Lord May) who understood the issues and were trying hard to get the message across, many of them civil servants etc. If we don’t understand how they failed we may find ourselves in the same position again. I think that in the 90s and 00s everything became more ‘commercial’ and hence short-term. Or can we just kick out the physicists and bring on the ecologists?

See Also

General approach

Dave Marsay

The new mathematics of voting?

The reform challenge

Electoral reform is a hot topic. If it is true (as I believe) that mathematics has something useful to say about almost every important topic (because most of our problems are due to misplaced rationalism) then mathematics ought to have something useful to say about first past the post (FPTP) versus the alternative (AV).

The arguments

Many mathematicians prefer AV because it has many seemingly desirable properties that FPTP lacks, but such considerations are notable by their absence from the debate. Today (18/2/2011) Cameron and Clegg have made their cases (FPTP and AV, respectively). Clegg makes lots of assertions, such as that under AV ‘every vote is worth the same’, but with no attempt at justification. The general standard of the debate is exemplified by “When it comes to our democracy, Britain shouldn’t have to settle for anyone’s second choice.” This sounds good, but what it seems to be saying is that if your first choice isn’t on the ballot then you shouldn’t vote. There may be some important principle here, but it needs to be explained.

On a more substantive point, Cameron leads with the argument that:

“[AV] won’t make every vote count. The reality is it will make some votes count more than others. There’s an inherent unfairness under AV.”

He provides an example that could hardly be clearer. It seems obvious that ‘arithmetically’, FPTP is better.

A mathematical approach

The Condorcet method is the typical mathematician’s method of choice for referenda, but for general elections there may be other concerns. We can’t expect mathematics to give a definitive answer to every question, but at least we should be able to distinguish a mathematical argument from common-sense reasoning dressed up as mathematics.

A mathematical approach typically starts by considering criteria, and then establishing which methods meet which criteria. Here neither side has any explicit criteria, but simply makes some observations and then says ‘isn’t this bad – so we can’t have this method’. But many things in life are compromises, so it is not enough to identify a single failing: one needs to think about which are the key criteria and trade-offs.

Historically, democracy was intended to avoid rule by a person or party who was the last choice of most people. FPTP does not satisfy this basic ‘majoritarian’ requirement (due to vote-splitting), whereas almost all the alternatives, including AV, do. Thus under FPTP it is not enough if most people are against the status quo: they have to agree on a replacement before they vote. Thus people end up voting ‘tactically’. This means that one can’t tell from the ballots what people’s actual preferences were. It could happen (especially before the Internet) that the media misled people into voting tactically (so as not to ‘waste’ their vote) when they would have preferred the outcome that they would have got by voting for their true preference. Mathematicians tend to prefer AV and PR because they are simpler in this respect.

Cameron argues against majoritarianism thus:

“It could mean that those who are courageous and brave and may not believe in or say things that everyone agrees with are pushed out of politics and those who are boring and the least controversial limping to victory. It could mean a Parliament of second choices. We wouldn’t accept this in any other walk of life.”

Thus, we have to think of a situation where united oligarchs have 30% support but are hated by the other 70%. Perhaps using the media, they could ‘divide and rule’ so that no opposition party obtains more than 40% of the opposition vote, thus keeping the oligarchs in power indefinitely. Do we want an electoral system that could allow in ‘courageous and brave’ oligarchs?
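To make the arithmetic concrete, here is a minimal sketch (party names and vote shares are invented for illustration) contrasting a first-preference count with a simple majoritarian screen that rejects any option ranked last by a majority:

```python
from collections import Counter

# Hypothetical preference profile: 30 voters back the Oligarchs first;
# the remaining 70 all rank the Oligarchs LAST but split between A, B and C.
ballots = (
    [["Oligarchs", "A", "B", "C"]] * 30
    + [["A", "B", "C", "Oligarchs"]] * 28
    + [["B", "A", "C", "Oligarchs"]] * 24
    + [["C", "B", "A", "Oligarchs"]] * 18
)

# First past the post: count first preferences only.
fptp = Counter(b[0] for b in ballots)
fptp_winner = fptp.most_common(1)[0][0]

# A minimal 'majoritarian' screen: reject any option ranked last by a majority.
last_places = Counter(b[-1] for b in ballots)
rejected = {c for c, n in last_places.items() if n > len(ballots) / 2}

print(fptp_winner)   # the Oligarchs win under FPTP with 30% support
print(rejected)      # yet a 70% majority ranks them last
```

On this profile FPTP elects the option that 70% of voters rank last, which is exactly the vote-splitting risk described above.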

Cameron makes a big point of:

“If the last election was under AV, there would be the chance, right now, that Gordon Brown would still be Prime Minister. Ok, the last election was not decisive in terms of who won. But it was certainly decisive in terms of who lost. And I think any system that keeps dead governments living on life support is a massive backward step for accountability and trust in our politics.”

One problem with many of the FPTP supporters is ‘anchoring’: they presumably see a government as ‘dead’ when it would lose under FPTP. But this means that the health of the government is heavily dependent on how many opposition parties there are, what their relative strengths are and how they are geographically spread. From a majoritarian viewpoint a government is ‘dead’ when a majority prefer some other party. Similarly a sitting member might be thought dead when a majority prefer some other candidate. Under alternatives to FPTP like AV a dead candidate will never be elected. Under FPTP dead candidates are often elected: in recent decades the opposition vote has always been split, so that a candidate with only 30% support can win. There is no way to tell from the ballots when this has happened. If Cameron really wants to get rid of dead governments he will need something like AV+ or PR. Interestingly, this seems a very sensible requirement yet I have not seen any analysis of it. At least if anyone came up with a suitable method and we had a referendum, AV would allow selection of the most preferred option, whereas under FPTP it would be difficult to predict what would happen: AV would seem to have the advantage.

A feature of a genuine logical argument is that it still seems sensible when you change the example, subject to the explicit assumptions. In this case, suppose that we had a referendum with three choices: FPTP, AV and PR. If we used FPTP to count the ballots I suspect that the non-FPTP vote would be split between AV and PR. But suppose that those who put PR first put AV second, and vice-versa. Then under any majoritarian method FPTP could only win by getting 50% of the votes. Otherwise it would be ranked last by most voters, and hence has to be rejected by our majoritarian criterion. Is this fair? The Cameron argument is that the FPTP ballots are only counted once, while the others are counted twice. But which is more important, respecting the wishes of the majority, or arithmetic?

The ‘counting’ argument is also rather spurious, in that under AV we could simply mix up all the ballots before each round and do a full re-count, but ignoring the options that have been deleted. Each ballot would then be counted equally. Do we think it sensible to choose a method based on something that is a feature of how it is implemented, and not inherent in the method itself?

[Someone has since pointed out that while each ballot is counted as many times as there are rounds, for some ballots the first preference will be counted multiple times, while for others a different preference could be counted in each round, as their earlier preferences are eliminated. If this is what the NO campaign is trying to say, it is saying it very badly. If we take this objection seriously then we must vote tactically. Is this subtle ‘inequality’ worse than tactical voting?]
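For what it is worth, the round-by-round counting is easy to sketch. The following is a minimal illustration (the ballots are made up, and real AV counts have extra rules for ties and exhausted ballots); note that in each round every ballot contributes exactly one preference, its highest-ranked surviving candidate:

```python
from collections import Counter

def irv_winner(ballots):
    """Instant run-off (AV): each round, every ballot counts once, for its
    highest-ranked candidate still in the race; the trailing candidate is
    eliminated until someone has a majority."""
    remaining = {c for b in ballots for c in b}
    while True:
        tallies = Counter(
            next(c for c in b if c in remaining)
            for b in ballots if any(c in remaining for c in b)
        )
        top, votes = tallies.most_common(1)[0]
        if votes * 2 > sum(tallies.values()):
            return top
        remaining.remove(min(tallies, key=tallies.get))

# Illustrative ballots, ranked in order of preference.
ballots = [["A", "B", "C"]] * 8 + [["B", "C", "A"]] * 7 + [["C", "B", "A"]] * 6
print(irv_winner(ballots))  # C is eliminated; C's ballots transfer to B
```

Here A leads on first preferences (8 of 21) but is ranked last by a majority; after C's elimination, B wins with 13 of 21 — each ballot having counted exactly once in each round.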

The usual interpretation of ‘one person one vote’ admits AV, so this argument of the NO camp seems to be special pleading.

Mathematical speculations

In fact, there are significant differences between a referendum and a general election. The traditional argument in favour of FPTP is that it necessitates politicking and hence favours strong politicians, who can then apply their dark arts to our benefit in dealing with foreigners. PR (and perhaps AV) would lead to straightforward debates on the issues, risking us having leaders who are unpractised at the dark arts. FPTP seems ideally suited in this role, but does leave the majority vulnerable to rule by those best at the dark arts, who may not always have the majority’s interests at heart.


Once one has settled on one’s criteria and adequately characterised the problem, one can evaluate a range of methods against them, not just FPTP, AV and PR. One might even develop a tailored method. Practically, though, FPTP, AV and PR seem to be the options, and if one does move away from FPTP then at least one will be able to have sensible referenda, to refine the method.

If one thought majoritarianism paramount, with a wish for ‘strong’ politicians second, then a reasonable method might be to reject all those options/candidates that the majority rank below some acceptable core, and to apply FPTP (if necessary) as a tie-break to the core. This would only differ from FPTP where FPTP would elect a candidate for which a clear majority preferred some other candidate. Similarly, the intention of AV is to avoid wasted votes and respect the majority wishes. This slightly different method can be seen as having the same intent, plus breaking ties using FPTP (to give stronger parties). To mathematicians, this is Condorcet modified to take account of the value of electing a candidate with strong support.


A feature of FPTP, with its tactical voting, is that it has a raised barrier of entry to new parties compared with most other methods. AV also has a significant entry barrier in that a party with the fewest first place rankings is eliminated, even if it was a clear overall favourite (e.g., would have won under Condorcet). To go from a high to no barrier in one step could lead to an undesirable disruption to political life. Maybe AV is the best compromise? The Jenkins Commission thought so.
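The Condorcet point can be illustrated directly. A minimal sketch with an invented profile in which C is a majority’s head-to-head favourite against each rival, yet has the fewest first preferences — so FPTP elects A, and AV eliminates C in the first round:

```python
def beats(ballots, x, y):
    """True if a majority of ballots rank x above y (lower index = preferred)."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins * 2 > len(ballots)

# Illustrative profile: C is everyone's solid compromise choice.
ballots = [["A", "C", "B"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "A", "B"]] * 2

candidates = ["A", "B", "C"]
condorcet = [x for x in candidates
             if all(beats(ballots, x, y) for y in candidates if y != x)]
print(condorcet)  # ['C'] beats both rivals head-to-head
```

Here `condorcet` comes out as `['C']`, while first preferences run A 4, B 3, C 2: FPTP elects A, and AV eliminates the Condorcet winner at the first round — the entry barrier described above.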

Political views

There are clearly political aspects to the choice. While we can say for sure that the FPTP arguments are wrong, the AV arguments seem too thin to justify a change and the mathematical arguments too limited. What do politicians think? Cameron and Clegg (and many others) seem to agree that in most constituencies there will be no change. Cameron’s arguments seem mostly spurious, apart from the one that most voters will simply rank the minor parties the way their first choice tells them to. I have not seen any analysis of the impact of this, but it seems minor compared to the majoritarian criterion. Clegg’s main argument is that candidates and MPs will need to ‘work harder’, which may offset Cameron’s points. But how do they know? What we can say is that:

  • AV would seem not to make any radical short-term difference.
  • AV reduces the need for tactical voting, so that we can better judge how fair the system is.
  • AV allows us to recognize up and coming candidates and parties, independently of media hype.
  • AV is less biased against the formation of new parties, or local independents.
  • AV would provide information on preferences that could inform choices on further reforms, if necessary.
  • AV, used in a referendum, would open the way to sensible further reforms, if needed.

If it is true that an informed public would choose FPTP, adopting AV now would give them that choice, with no obvious down-side (scares on costs aside).

Tactical voting

The No campaign’s objections to AV would seem to apply equally to any method that did not encourage tactical voting, i.e. voting for someone who isn’t your actual first preference, but who you think has a better chance. The classic problems with tactical voting are:

  • Voters expectations can be manipulated, e.g. by the media.
  • The support for new parties (e.g. the Greens, a while back) is suppressed, giving the current main parties an advantage.
  • It requires some degree of co-ordination to be able to vote a tyrant out ‘tactically’, to avoid vote-splitting.
  • It disenfranchises honest voters and those who are not so clued-up in politics. (Especially if there are some political shocks just before the election.)


While the FPTP argument appears more mathematical than AV’s, its notion of ‘counting’ is spurious. There would seem more merit in the AV argument that under FPTP wasted votes do not count at all, and that AV remedies this defect. But – looking at it afresh – it is not enough to ‘count’ the votes: one may also want the results to respect the wishes of a clear majority, in order to be truly ‘democratic’. But this does not uniquely determine a method: a tie-break may be needed, and there may be some merit in FPTP. Indeed, while the No campaign arguments seem largely spurious, there does seem to lie behind them a genuine concern for the ‘health’ of politics and the strength of government.

It seems to me that mathematics can provide some useful insights, but some greater understanding is required to reach a definitive verdict.  Both Cameron and Clegg make the short-term effects seem rather minor. It is unfortunate that there isn’t the option of a pilot scheme, but AV has a clear edge for referenda and if the majority came to view AV as a ‘dead method’ we could easily return to FPTP. Indeed, we could routinely publish both FPTP and AV results, to inform the public. The arguments against such a tentative view seem unconvincing.



The No2AV campaign gives these reasons to vote NO:

  1. AV is costly …
  2. AV is complex and unfair
    The winner should be the candidate that comes first, but under AV the candidate who comes second or third can actually be elected. That’s why it is used by just three countries in the world – Fiji, Australia and Papua New Guinea. Voters should decide who the best candidate is, not the voting system. We can’t afford to let the politicians off the hook by introducing a loser’s charter.
  3. AV is a politician’s fix …

The second point seems to be an insult to the readers’ intelligence: one could equally well say “The winner should be the candidate that comes first, but under FPTP the candidate who comes second or third can actually be elected.” It all depends on who you think ‘should’ come first. E.g., should someone who is ranked last on 70% of the ballots be elected simply because the other candidates’ votes are split? The No campaign’s points seem all spin and no substance.


Lord Reid has given an interview in which he says

  •  [A] cornerstone of our democratic system has been ‘one person one vote’ …
  • [My vote] has the same weight as everyone else’s.
  • [AV] completely undermines and corrupts that; some people will have one vote, others … will be counted again and again.
  • [AV] is a threat to the … basis of our democratic system.

When the interviewer notes that under AV one gets the candidate that ‘most people are happy with’ Reid responds, of AV, that:

  • If you vote for liberal, labour or conservative it is overwhelmingly likely that your vote will be counted once, whereas if I go out and vote for one of the ‘fringe’ candidates [my vote may be counted many times] … how is it fair [?]

 The emerging ‘No’ message seems to be that only ‘fringe’ candidates ‘such as the BNP’ would benefit from AV. I note:

  • The Green party is also a fringe party, as would be a ‘reform democracy’ or ‘reform expenses’ party.
  • In many constituencies (like mine) one of the three main parties is ‘fringe’.
  • Independents, such as doctors standing to save a local hospital, are ultra-fringe. They may also revitalise democracy.
  • If the three main parties have candidates, it is mathematically certain that at least one of them will have their votes counted at least twice.

The example seems bogus, both mathematically and practically. ‘One vote’ and ‘equal weight’ could mean any of:

  1. One ballot paper each.
  2. One mark each (X).
  3. The ballots are counted by a process which only takes account of each ballot once.
  4. The ballots are counted by a process that only takes account of each vote once.
  5. The ballots could be counted by a process that only takes account of each ballot once.
  6. The same opportunities and rules for everyone.

It is not clear which Lord Reid considers essential to democracy. FPTP satisfies them all, but would allow an oligarchy with enough influence over the media to retain power even if it was most voters last choice, as above. AV meets 1 but fails 2, as would any alternative to FPTP. AV also fails 3, but so what? We could stage a series of rounds (as in France) with trailing candidates being eliminated until one candidate has a majority. The result would be the same, but each vote would be counted once (per round) and counted equally. Do we wish to reject AV on a technicality? With one round AV satisfies 4, if by ‘vote’ one means ballot, and also satisfies 5 and 6. Isn’t it 6 that matters?

Lord Reid also refers to ‘weight’, without defining it. Suppose that two parties traditionally vie for the seat, with the rest being no-hopers. Then a doctor stands on the ticket of supporting the local hospital and otherwise consulting his constituents. Suppose you would prefer to vote for this ‘fringe’ candidate. Under FPTP you could vote for the doctor, thus recording your support for the hospital but taking no part in the main contest. Or you could vote for a main candidate, failing to record your support for the doctor. Under AV you would simply record your actual preferences, thus recording support for the hospital and taking part in the main contest. And the doctor might even win. Under AV your vote clearly has more ‘weight’, but which is fairer? If we think of a group of people with similar views, then those who support a main candidate will all vote for them, so that their weight of support is not split. Under AV the weight of support for the main candidate is undiminished (it is what it would have been had everyone voted tactically). The support for the fringe is also undiminished (it is what it would have been had they all voted honestly). Under FPTP the weight of support will be divided, unless they all vote tactically. So which is a fairer definition of ‘weight’? Is it obvious that AV undermines democracy?

I think there are some sensible arguments for ‘No’, but the No campaign isn’t using them, and the ability of a party whom most people hate to get elected (due to vote splitting) under FPTP seems much more significant in undermining democracy.

See also

Dave Marsay

Metaphors for complexity and uncertainty


According to psychologists we often tell ourselves stories to rationalize away complexity and uncertainty. It would be good to have some stories that reveal the complexity and uncertainty, as an antidote.

Route Planning and the Normandy Invasion

I noticed in the 80s (before SATNAV) that not everyone thinks of a common task like route-planning in the same way. Some are pragmatic, picking the best route and only considering alternatives if that one is blocked. Others have heuristics that allow for uncertainty, such as preferring routes that have ready detours, just in case. There was some discussion among geek-dom as to whether a pure Bayesian approach would be adequate or even better. Logically, it should depend on the nature of the problem. If blocks are probabilistic with learn-able distributions, and the goal is to minimise average journey times, then the assumptions of Bayes hold and hence it shouldn’t be possible to beat Bayes consistently.
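The Bayesian position can be made concrete. A minimal sketch, with invented route times and block probabilities: if blocks really are probabilistic with learnable distributions, one simply minimises expected journey time, and the heuristic preference for routes with ready detours only earns its keep when the distributions themselves are in doubt:

```python
# Hypothetical routes as (usual time, probability of a block, delay if blocked);
# all numbers are made up for illustration.
routes = {
    "motorway":   (60, 0.10, 90),   # fast, but a block is costly to escape
    "back_roads": (75, 0.05, 15),   # slower, but detours are easy
}

def expected_time(usual, p_block, delay):
    """With learnable block probabilities, Bayes says: minimise expected time."""
    return usual + p_block * delay

best = min(routes, key=lambda r: expected_time(*routes[r]))
print(best)  # 'motorway' on these numbers (69 vs 75.75 expected minutes)
```

On these assumed numbers Bayes consistently picks the motorway; the anti-Bayes planner’s case rests on situations (like D-day) where `p_block` itself is not a meaningful, learnable quantity.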

One day I discovered that a friend of the family who had been strongly anti-Bayes had been involved in planning the D-day invasion, and I realised that here was a good example of a complex problem rich in uncertainties, showing the benefits of a more principled approach to uncertainty in planning. I published a brief paper (also here), which may be helpful. It was almost a decade before I was faced with a planning situation similarly rich with uncertainty.


Routine journeys can be thought of as a single entity, with the usual habits of driving to keep in lane and away from the car in front. If the road is blocked one may need to divert, calling for some navigation. The routine epoch is interrupted by ‘higher-level’ considerations. If one has always optimised for journey time one will never have explored the by-ways. If one occasionally uses alternative, slightly slower, routes, one will be in a better position to find a good alternative when one has to divert. Thus optimising for journey time when all goes well is at the expense of robustness: coping well with exceptional circumstances. (This is a form of the Boyd uncertainty principle.)

A more interesting ‘change of epoch’ occurred a few years back. An unprecedented police shoot-out on the M5 near Bristol caused chaos and was widely publicised. The next weekend my partner and I were about to drive down the same stretch of motorway when there were reports of a copy-cat shoot-out. Traffic jams over a large area were inevitable, but we thought that if we were quick enough and took a wide-enough loop around, we should be able to make it.

SAT-NAVs had only just become fairly common, and the previous weekend had shown up their weakness in this situation: everyone gets diverted the same way, so the SAT-NAV sends people into big queues, while others could divert around. This week-end most drivers knew that, and so we expected many to be taking wider detours. But how wide? Too narrow, and one gets into a jam. Too wide and one is too slow, and gets into jams at the far end. Thus the probability of a road actually being jammed depended on the extent to which drivers expected it to be jammed: an example of Keynes’ reflexive probability. It is also an example where the existence of meaningful ‘prior probabilities’ is doubtful: the recent popularity of SAT-NAVs and the previous incident made any decision-making based on experience of dubious validity.

This is just the kind of situation for which some of my colleagues criticise ‘the mathematical approach’, so just to add to the fun I drove while my partner, who teaches ‘decision mathematics’, advised. Contrary to what some might have expected, we took a 100-mile right-hook detour, just getting through some choke points in the nick of time, thus having a lot more fun with only about a 20 minute delay from using the motorway. I noticed, though, that rather than use one of the standard decision mathematical methods she used the theory. I wonder if some of the criticisms of mathematics arise when people apply a ‘mathematical’ method without considering the theory: that is not mathematics!

Drunken walks

A man walks along the cliff edge to the pub most evenings. His partner will not let him go if it is too windy, or the forecast is poor. The landlord calls a taxi if it is too windy at closing time.

One night two walkers comment on how dangerous the walk along the cliff is. They are ignored. The drinker walks home and off the cliff.

The cliff had been unstable but had been buttressed. Some had questioned the reliability of the contractors used, but the local authorities had given assurances that the cliff was now stable. And yet the work had been poor and the cliff had collapsed, so that the drinker had followed the path to his death.

Games field

A man notices that different things are going on as he passes a games field. He decides that he can spend 10 hours over the next 10 years observing what is going on, in an attempt to work out ‘the rules of the game’. If he spends 600 one-minute intervals selected at random from games over the 10 years, he may come to have a very good idea of what games are played when, but a poor idea of the rules of any one game. On the other hand, if he observes the first 10 hours of play he may form a good view of the rules of the current game, but have no idea of how games are selected. This is an example of the organizational and entropic uncertainty principles, generalizations of Heisenberg’s better-known uncertainty principle.
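A toy simulation of the trade-off (the games, the schedule and the reduced observation budget are all invented for illustration):

```python
import random

random.seed(0)

# Hypothetical schedule: which game occupies the field in each of 600
# one-minute windows over the whole period (names and mix invented).
schedule = ["football"] * 300 + ["cricket"] * 200 + ["hockey"] * 100

# Strategy 1: observations spread at random over the whole period --
# a good estimate of WHICH games are played, little depth on any one game.
spread = [schedule[random.randrange(len(schedule))] for _ in range(100)]

# Strategy 2: the same budget spent on one contiguous block --
# deep exposure to a single game's rules, no idea how games are selected.
block = schedule[:100]

print(sorted(set(spread)))   # very likely all three games observed
print(sorted(set(block)))    # only ['football']
```

Each strategy is blind to what the other sees: the scattered sampler learns the distribution over games but never more than a minute of any one, while the contiguous observer learns one game in depth and mistakes it for the whole.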

Particle Physics

Quantum theories arose from a recognition of the limits of the classical view, and were developed by thinkers who corresponded with Whitehead and Smuts, for example, on general logical issues. The similarities can be seen in the Bohm interpretation, for example. Temporarily stable complexes interacting with and within stable fields have dynamics that follow stable rules until previously separate complexes come to interact, in which case one has a ‘collapse of the wave function’ and a discontinuity in the dynamics. These periods of relative homogeneity correspond to Whitehead’s epochs, and the mechanism for change is essentially Cybernetic. In this formalism particles have precise positions and momentum; uncertainty is measurement uncertainty.

The Bohm interpretation only applies when one has quantum equilibrium, so that higher-level epochs are fixed, and consequential changes bounded. Otherwise one has greater uncertainty. 

Quantum Cognition

 Quantum cognition notices certain ‘irrationalities’ about human cognition, and proposes a model ‘inspired by’ quantum mechanics. It is mentioned here because the inspiration seems sound, even if the details are questionable.

Under categorization-decision it notes that asking about a relevant factor can affect categorization. This seems reasonable, as it affects the context.

Under memory-process disassociation it notes that people are more likely to recognize that something had been seen on a list if they were asked about a specific list. Taking this to extremes, people may be more likely to recognize that they have met someone at some conference if a specific conference is named. This seems reasonable. Unpacking effects is similar. The questions in conceptual combinations are similar, but the contexts and results quite different.

Under the Linda problem it notes mis-ordering of probabilities in a context where probability would be hard to judge and possibly discriminatory. Respondents may have reported likelihoods instead: the two are often confused. This would be a ‘representativeness type mechanism’.

There seem to be two things going on here: the experimental subjects might not be good at estimating, and the experimenter might not be very good at analysis. Quantum probability appears to be an attempt to avoid:

  • Problems that arise when the analyst ignores the conditionality of probabilities. 
  • Problems that arise when experimental settings or terminology (e.g. probability and likelihood) are confused.
  • Variation of performance with ‘focus’, such as when a specific list is mentioned.

Quantum probability seems to be a heuristic that gives a more moderate result, thus compensating for the above effects. It seems more natural to take account of them more directly and specifically.


Dave Marsay

Complexity, uncertainty and heuristics


Typical heuristics are good for coping with typical cases, which may or may not be complex or uncertain. Considering when various heuristics do or do not work can be helpful in understanding complexity and uncertainty, and how to cope with them.

The focus is on heuristics associated with science.


Peirce thought that: “Doubt, like belief, requires justification. It arises from confrontation with some specific recalcitrant matter of fact (which Dewey called a ‘situation’), which unsettles our belief in some specific proposition”. This is clearly too complacent in the face of complexity and uncertainty. His view of science, that “A theory that proves itself more successful than its rivals in predicting and controlling our world is said to be nearer the truth”, fails to distinguish between short-run and long-run truths. It also cannot cope with an uncertain world, where even probabilistic prediction may simply not be possible.

Peirce’s pragmaticism supposes that “What recommends the scientific method of inquiry above all others is that it is deliberately designed to arrive, eventually, at the ultimately most secure beliefs, upon which the most successful practices can eventually be based.” Now, as Keynes observed, ephemeral things, even when observable, are – according to Cybernetics – not useful for control. It might seem, then, that we should focus on the stable properties behind the ‘secure beliefs’. But many things are neither so ephemeral as to be unobservable nor so stable as to be ‘secure’. In Whitehead’s view, ‘the level of the game’ is just this intermediate level, which science ignores as either being too changeable to be represented by a law or too structured to be aggregated up statistically. Worse, although such things do not actually satisfy the assumptions of pragmatism, they may endure for long enough to be mistaken for laws, as in the great moderation.

The assumptions of pragmatism tend not to be true across epochs but – in practice – are often more nearly true within epochs, provided that one has enough representative data about the epoch.

Occam’s razor

Occam’s razor, or ‘the law of parsimony’, recommends making the fewest possible assumptions: one is parsimonious with one’s assumptions. In particular, one should not assume less complexity or more certainty than is warranted. Thus one should suppose that a situation has the maximum possible complexity and uncertainty unless it can be shown otherwise for the case at hand. Whitehead takes this to an extreme.

Keep it simple, stupid (KISS)

Unfortunately, Occam’s razor is often taken to mean that one should choose the simplest model. Thus, given a general model with a parameter p that fits the data for a range of values of p, where p=0 yields a much simpler model, widespread practice is to use the simpler model. In this case this yields the opposite of Occam’s razor, which would report the general model.
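As a sketch of the distinction (my own illustration; the data and the model y = a + p·x are made up for the purpose), the false razor would collapse to p=0 whenever the fit allows it, whereas the razor as stated above would report the whole range of p values consistent with the data:

```python
import numpy as np

# Hypothetical data: a general model y = a + p*x fits for a range of p,
# with p = 0 yielding the much simpler constant model.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 0.9, 1.2, 1.0, 1.3, 1.1])

# Least-squares fit of the general model.
A = np.vstack([np.ones_like(x), x]).T
(a_hat, p_hat), res, *_ = np.linalg.lstsq(A, y, rcond=None)

# Rough standard error of p from the residual variance.
n, k = len(x), 2
sigma2 = float(res[0]) / (n - k)
cov = sigma2 * np.linalg.inv(A.T @ A)
p_se = float(np.sqrt(cov[1, 1]))

# The 'false razor' reports p = 0; the razor as stated above would
# report the whole interval of p values consistent with the data.
p_low, p_high = p_hat - 2 * p_se, p_hat + 2 * p_se
print(p_low <= 0.0 <= p_high)  # the simpler model is not ruled out
```

The point is not which model to pick, but that reporting the interval retains the uncertainty that picking p=0 discards.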

The false razor tends to be more true within an epoch.


For the great moderation, the false razor would recommend false induction: what has always happened will always happen, and what has never happened will never happen. For example, once the great moderation had lasted a while, it would suggest that ‘logically’ it would last forever. Certainly, the burden of proof would be on those who thought otherwise. Induction tends to be true within an epoch, since a significant change typically indicates a change of epoch.
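One standard counter to such false induction (my own illustration, consistent with but not taken from the text) is Laplace’s rule of succession, which assigns a small but nonzero probability to an event that has never yet been observed – though even this only holds within the current epoch:

```python
from fractions import Fraction

def rule_of_succession(occurrences: int, trials: int) -> Fraction:
    """Laplace's estimate of the probability of an event on the next
    trial, given how often it occurred in the previous trials."""
    return Fraction(occurrences + 1, trials + 2)

# After, say, 100 crash-free periods of the great moderation, the false
# razor says P(crash) = 0; Laplace says it is small but nonzero.
p_crash = rule_of_succession(0, 100)
print(p_crash)  # 1/102
```

The estimate never reaches zero, so “X has never happened” is treated as an extrapolation rather than a law.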


Human beings have a psychological yearning for certain predictions, or failing that ‘objective’ probability distributions. These are impossible where there is genuine uncertainty, as in many complex situations, so there can be no reasonable heuristic for making them. But, consistent with Whitehead and Occam’s razor, one can make extrapolations and one can often make anticipations.

Wherever a heuristic gives a reasonable prediction within an epoch, it is reasonable to note it as a conditional prediction or as an extrapolation. Thus if X has never happened and one has lots of data, saying that X will never happen is a genuine extrapolation even if it may be a poor prediction (e.g. in a conflict situation). To make a reasoned anticipation one has to consider the ‘higher’ epoch which is likely to be sustained even when the detailed level changes. For example, you might think that the rules of football might change soon, marking a change in epoch, but some things (such as the shape of the ball) might be ‘secure’. You might then anticipate what the new rules might be, without necessarily being able to associate a probability distribution.

In some cases one might know when the epoch was going to end. Otherwise one might assess:

  • The robustness of the epoch to fat-tailed events.
  • Which individuals or potential collaborations might have both an interest in ending it and the capability to end it or make it more vulnerable to fat-tailed events. 
  • Which existing self-referential cycles have the potential to ‘explode’, changing the epoch.
  • Which self-referential cycles have the potential to come into being, changing the epoch.

There are also other factors (not well understood) that influence the tendency of changes to propagate and build through a network, rather than dissipate. A new set of heuristics is needed. Watch this space?


It would seem rash not to consider the possibility that radical complexity and uncertainty are factors in any large-scale system that involves a degree of adapting, learning or evolving. Science, engineering, managerialism and common-sense tend to focus on things that are ‘true’ in the short-term and treat them as if they are immutable, unless informed by longer-term experience. We can typically ‘fix’ these heuristics by noting that they are really only extrapolations, and then either considering the wider context or taking account of the uncertainties which arise from not explicitly considering the wider context.

Thus, while we might go along with Occam’s razor and pragmatism to the extent of seeking the simplest possible models for things, we should at the same time recognize the inevitable limits to such models and allow for the inevitable uncertainties about what they may be.

See Also

Reasoning in complex, dynamic worlds, How much complexity and uncertainty?


Dave Marsay

How much complexity and uncertainty?

Scales of complexity and uncertainty

We can think of Keynes, Whitehead etc as providing a scale for complexity and uncertainty. There are others: how do we choose one to use? A key factor is, what sort of complexity is there ‘out there’?

Looking for complexity

We might expect to find complexity in situations where our normal methods do not serve us well, and hence in situations of extreme competition, such as all-out conflict. But we might equally suppose that such situations simply reflect an irreconcilable difference of critical interests.

The key feature of Whitehead is the role of epochs. This corresponds to Clausewitz’s notion that while warfare had a general character, each war (and perhaps each battle) had its own ‘logic’. Given that western generals, and later Maoist fighters, have been brought up on such dictums, it is no surprise to find (as Montgomery notes) that warfare does indeed match the theory. Similarly, Keynes had a huge influence on economics for a period, so it is no surprise to find that during this period events match his theory. Warfare and economics are imprinted with the ideas of those involved (cf Turner). But what if we had different ideas?

The Cold War

Global deaths climbed steadily until the end of the Cold War, then fell steadily until 2005.

The end of the Cold War coincided with the end of an epoch.

During the Cold War the death rate got steadily worse, a situation that was plainly unsustainable. Looked at statistically, variations about the increasing average appeared random. Thus, while there was plenty of news, the situation didn’t seem to change much, apart from getting steadily worse.

The end of the Cold War marked a sudden change to a period of gradual improvement, again unsustainable.

While this fits Whitehead’s model, it also fits simplistic models and extrapolations, apart from the end of the Cold War. Although the timing of the end caught many by surprise, the end was widely hoped for and the fact that it changed the rate of change of the death rate was expected. 


The Mathematics of War

An article in Nature has shown, firstly, how the casualty rates for insurgents in conflict approximate to ‘fat-tailed’ power-law distributions with characteristic exponents of about 2.5, making them very complex. (A larger exponent is more random, while a smaller exponent is more structured.) An exact power-law distribution would be generated by groups fragmenting and re-forming randomly, but this does not seem very realistic.

The paper also gives a constructive model in which groups fragment in conflict and re-group in-between, competing for media attention. This generates a fat-tailed distribution, but with a reduction in the frequency of large numbers of casualties compared with a power-law, which matches the actual data much better. The paper also notes a similarity between their model and statistics and financial market models.
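As a rough sketch (my own illustration, not the paper’s model), one can sample casualty sizes from a power law with exponent 2.5 and recover the exponent with the standard maximum-likelihood (Hill) estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, x_min, n = 2.5, 1.0, 100_000

# Inverse-transform sampling from a continuous power law with density
# p(x) ~ x**-alpha for x >= x_min.
u = rng.random(n)
x = x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

# Maximum-likelihood estimate of the exponent (Hill estimator):
# alpha_hat = 1 + n / sum(log(x_i / x_min)).
alpha_hat = 1.0 + n / float(np.sum(np.log(x / x_min)))
print(round(alpha_hat, 2))  # close to 2.5
```

With heavy tails of this kind a handful of very large events dominates the totals, which is part of what makes such conflicts ‘very complex’.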

The economic crash 2007/8

The economic crash was statistically like the end of the Cold War, but whereas some hoped that the Cold War would end, the assumption prior to the crash was that the great moderation would be an un-ending epoch. After the Crash, economists turned back to the insights of Keynes, and Brown has even invoked Whitehead.

The Balkans, 90s.

The Cold War was a bipolar world, relatively lacking in complexity and, like the great moderation, people did well in the short run by ignoring uncertainty. The ‘unexpected’ end of the Cold War de-stabilised the Balkans, bringing back extreme complexity and uncertainty.

Statistics for confrontation and conflict, as for much else, typically identify a scale of interest and then divide the period for which data is available into epochs where the statistical properties seem stable. These epochs are then analysed, as in the ‘mathematics of war’ above. But these epochs, and the transitions between them, are just what we are interested in, as below.

Figure: KEDS data for the Balkans, showing jumps between epochs of ‘violent stability’.

As we can see, after the Cold War the situation did not change steadily (characteristic of common-sense evolution) but in jumps (or ‘saltations’), characteristic of complex evolution. If one tries to fit a single Markov model (i.e., standard probability) one inevitably ends up with a very improbable model; it is as if each epoch has its own model, with a further Markov model explaining the transitions between them. Such a thing begs an explanation. Looking at the dates, it can be seen that the transitions correspond to big events (such as atrocities or peace talks). More generally, sudden large-scale casualties indicate either a break-through for one side or a disaster for the other. This:

  • has an impact on the relative strengths, and hence expectations
  • has an impact on perceptions of the capabilities of given sized forces of the other side
  • has psychological impacts.

Hence if the casualties are large enough it is reasonable that at least one side will change its expectations and strategies, hence changing ‘the rules of the game’. A big enough change will initiate a new epoch. Consequently one might expect big events to be suppressed within an epoch, as seen. Thus, following Whitehead, we can see these big events as either deliberate decisions within a strategic game or ‘fat-tail’ events which get seized on and treated as ‘game-changers’. We can also see some transient behaviour, in which it takes time for the new patterns to settle down. But typically, one has periods of conflict that are characterised by randomness, as in conventional combat models, separated by big game-changing events.
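This two-level picture can be sketched as a toy simulation (all numbers hypothetical): an epoch variable that changes only rarely, with each epoch having its own within-epoch statistics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-epoch 'rules of the game': mean event rates.
epoch_means = {0: 5.0, 1: 50.0}
switch_prob = 0.01  # rare 'game-changing' events

epoch, series = 0, []
for _ in range(1000):
    if rng.random() < switch_prob:  # a big event changes the epoch
        epoch = 1 - epoch
    # Ordinary random variation within the current epoch.
    series.append(rng.poisson(epoch_means[epoch]))
series = np.array(series)

# Within an epoch the variation looks random; across epochs the mean
# jumps by an order of magnitude, giving the 'saltations' seen above.
print(series.min(), series.max())
```

A single Markov or i.i.d. model fitted to such a series would indeed look very improbable; the natural description is one model per epoch plus a model of the rare transitions.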

Complexity and uncertainty are important when one needs to think ahead across potential changes.


It would seem rash not to consider the possibility that radical complexity and uncertainty are factors in any large-scale system that involves a degree of adapting, learning or evolving. Thus, while we might go along with Occam’s razor and pragmatism to the extent of seeking the simplest possible models for things, we should at the same time recognize the inevitable limits to such models and allow for the inevitable uncertainties about what they may be.

See Also

Reasoning in complex, dynamic worlds, Complexity Physics


Dave Marsay

Critical phenomena in complex networks


Critical phenomena in complex networks at arXiv is a 2007 review of activity mediated by complex networks, including the co-evolution of activity and networks.


The combination of the compactness of networks, featuring small diameters, and their complex architectures results in a variety of critical effects dramatically different from those in cooperative systems on lattices. In the last few years, important steps have been made toward understanding the qualitatively new critical phenomena in complex networks. The results, concepts, and methods of this rapidly developing field are reviewed. Two closely related classes of these critical phenomena are considered, namely, structural phase transitions in the network architectures and transitions in cooperative models on networks as substrates. Systems where a network and interacting agents on it influence each other are also discussed. [i.e. co-evolution] A wide range of critical phenomena in equilibrium and growing networks including the birth of the giant connected component, percolation, k-core percolation, phenomena near epidemic thresholds, condensation transitions, critical phenomena in spin models placed on networks, synchronization, and self-organized criticality effects in interacting systems on networks are mentioned. Strong finite-size effects in these systems and open problems and perspectives are also discussed.


The summary notes:

Real-life networks are finite, loopy (clustered) and correlated. Most of them are out of equilibrium. A solid theory of correlation phenomena in complex networks must take into account finite-size effects, loops, degree correlations, and other structural peculiarities. We described two successful analytical approaches to cooperative phenomena in infinite networks. The first was based on the tree ansatz, and the second was the generalization of the Landau theory of phase transitions. What is beyond these approaches?

Thus we can distinguish between:

  • very complex: where the existing analytic approaches do not work.
  • moderately complex: where the analytic approaches do work pragmatically, at least in the short-term, even though their assumptions aren’t strictly true.
  • not very complex: where analytic approaches work in theory and practice. Complicated? 

This blog is focussed on the very complex. The paper notes that in these cases:

  • evolution is more than just a continuous change in a parameter of some over-arching model.
  • fluctuations are typically scale-free (and in particular non-Gaussian, taking one outside the realm assumed by elementary statistics).
  • the scale-free exponent is small.

The latter implies that:

  • many familiar statistics are undefined.
  • the influence of network heterogeneity is ‘dramatic’.
  • mean-field notions, essential to classical Physics, are not valid.
  • notions such as ‘prior probability’ and ‘information’ are poorly defined, or perhaps nonsense.
  • synchronization across the network is robust but not optimisable.
  • one gets an infinite series of nested k-cores. (Thus while one lacks classical structure, there is something there which is analogous to the ‘structure’ of water: hard to comprehend.)
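The first of these points can be illustrated directly (my own sketch, with hypothetical exponents): for a power-law density p(x) ∝ x⁻ᵞ above some x_min, the k-th moment exists only when γ > k + 1, so for small exponents even the mean and variance are undefined and sample statistics fail to settle:

```python
import numpy as np

def moment_defined(gamma: float, k: int) -> bool:
    """For a power-law density p(x) ~ x**-gamma (x >= x_min > 0), the
    integral of x**k * x**-gamma converges at infinity iff gamma > k + 1."""
    return gamma > k + 1

print(moment_defined(2.5, 1))  # True: the mean exists
print(moment_defined(2.5, 2))  # False: the variance is undefined
print(moment_defined(1.8, 1))  # False: even the mean is undefined

# Monte Carlo illustration: the sample mean of a gamma = 1.8 power law
# drifts upward with sample size instead of settling down.
rng = np.random.default_rng(2)
for n in (10**3, 10**5):
    u = rng.random(n)
    x = (1.0 - u) ** (-1.0 / 0.8)  # inverse transform, gamma - 1 = 0.8
    print(n, round(float(x.mean()), 1))
```

When the variance (or mean) is undefined, familiar tools such as standard errors and mean-field aggregates lose their meaning, which is the ‘dramatic’ effect the review points to.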

So what?

Such complex activity is inherently robust and (according to Cybernetic theory) cannot be controlled. The other regions are not robust and can be controlled (and hence subverted). From an evolutionary perspective, then, if this theory is an adequate representation of real systems, we should expect that in the long term real networks will tend to end up as very complex rather than as one of the lesser complexities. It also suggests that we should try to learn to live with the complexity rather than ‘tame’ it. Attempts to over-organize would seem doomed.

Verifying the theory

(Opinion.) In so far as the theory leads to the conclusion that we need to understand and learn to live with full complexity, it seems to me that it only needs to be interpreted into domains such as epidemics, confrontation and conflict, economics and development to be recognized as having truth. But in so far as our experience is limited by the narrow range of approaches that we have tried for such problems, we must beware of the usual paradox: acting on the theory would violate the logical grounds for believing in it. More practically, we may note that the old approaches, in essence, assumed that the future would be like the past. Our new insights would allow us to transcend our current epoch and step into the next. But it may not be enough to take one such step at a time: we may need a more sustainable strategy. (Keynes, Whitehead, Smuts, Turing, …)


An appreciation of the truly complex might usefully inform strategies and collaborations of all kinds.

I will separate out some of this into more appropriate places.

See Also

Peter Allen, Fat tails and epochs, …

Dave Marsay

Fat tails and Epochs

Different explanations of the crash of 2007/8

The term ‘fat tails’ has been much in evidence since the crash of 2007/8. Nassim Nicholas Taleb had been warning that statistical tails were fat, and of the significance of this, and the term has since been taken up to explain the crash.

There has also been a revival in references to Keynes and some references (e.g. by Gordon Brown) to Whitehead’s notion of epochs. Both are often seen as alternatives to the previously fashionable ‘sophisticated mathematics’ slated in the Turner Review.

Implications of fat-tails

From a mathematical view these alternative explanations are quite different. If the crash was due to a ‘fat tail’ then the problem was that its probability had been under-estimated. A variant of the ‘fat-tail’ notion is the one-sided fat tail. Here sample statistics tend to seriously underestimate the probability of occurrence for quite long periods, followed by an erratic ‘correction’. Thus competitive pressure favours those who optimise in the short-run, but (unless they are bailed out) crashes weed them out, leaving those who had carried some ‘fat’. The solution to ‘fat tails’ is to fatten up one’s body. Similarly, in evolutionary systems we would expect to see systems that are not too optimised. If one only has a fat tail then after the event the distribution is unchanged and – assuming that the event has been survived – the best thing to do is supposedly to update the estimated probability distribution and carry on as before. This is William James’ ‘pragmatic’ approach. Even if one is aware of fat tails the best strategy might be to optimise like everyone else (to stay in the game), cope with the crisis as best one can, and then learn what one can and carry on.
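A minimal illustration of the one-sided fat tail (my own hypothetical numbers): if a crash has probability 1/1000 per trading day, then a year of data quite often contains no crash at all, so the sample estimate of crash risk sits at zero for long stretches before an erratic ‘correction’:

```python
# Hypothetical: a 'crash' occurs independently with probability 1/1000
# per trading day. How often does a year of data (250 trading days)
# contain no crash at all, so the sample estimate of crash risk is zero?
p_crash = 1 / 1000
days = 250
p_no_crash_in_sample = (1 - p_crash) ** days
print(round(p_no_crash_in_sample, 3))  # about 0.779
```

So roughly three years in four, a year of data would suggest that crashes simply do not happen, which is how short-run optimisers can look good until they are weeded out.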

Implications of epochs

The Keynes, Whitehead and Smuts view is quite different. It incorporates a different view of evolution. It informed a different management style (not particularly well documented). It was based on a different view of logic and uncertainty. It had profound impact on Keynes’ view of economics.

All observations relate to some epoch. Empirical predictions are only really extrapolations, conditional on the epoch not changing. A given epoch has characteristic behaviours. The assumption that an epoch will endure entails assumptions about those behaviours. Within an epoch one typically has distributions, which may have fat tails. Some, but not all, fat-tail events may destabilise the epoch. Thus, from this view, fat-tail events are a problem. But external events can also destabilise an epoch, leading to a change in ‘the rules of the game’.

Thus match-fixing in cricket may lead to a new epoch in the way cricket is organised. But match-fixing in rugby could also lead to a change in the way cricket is organised. If match-fixing had been rife, one might see a significant, enduring shift in the statistics, rather than a ‘blip’.


The term ‘fat-tail’ seems to be used to indicate any shocking event (some refer to Shackle). But the term ‘fat-tail’ implies that events were probabilistic, the problem being that low probability is not the same as no probability. The term ‘epoch’ indicates that the rules have changed, for example that an assumption was violated. Thus the terms might be used to complement each other, rather than using ‘fat-tail’ as a portmanteau term. ‘Shock’ would seem a suitable, less precise, term.

Distinctions needed?

A further problem is that in the epochs of interest the assumptions of probability theory do not necessarily hold (e.g., see Turner) so perhaps we need a different term. What are the important distinctions? Maybe we need to distinguish between:

  • Surprises that do not change our view of what may happen in the future, only of what has happened.
  • Surprises that change our view of what may happen, but which are not evidence of a fundamental change in the situation (just our view of it) and may not cause one.
  • Surprises that are not evidence of a fundamental change having actually occurred already, but which may lead to one.
  • Surprises that suggest a fundamental change in the actual situation.

Thus, for small investors, a changing view of the market does not change the market, but en masse such changes may lead to one. It seems inevitable that any terminology would need to address appropriate concepts of ‘causality’.


Dave Marsay

Evolution and Uncertainty

The New Scientist of 29 Jan 2011 (Issue 2797) has an article by Midgley (already blogged in ‘Evolution and epochs’ ) and ‘I, algorithm’. The latter shows how ‘old AI’ can be improved by combining it with probabilistic reasoning. The model that it uses is of numeric probability distributions. Is this enough to be able to represent the evolutionary behaviour described by Midgley, or the Knightian uncertainty of financial crashes? Or is it another example of a limited metaphor ‘hijacking our thinking’?

Keynes (Treatise on Probability) and Smuts (Holism and Evolution) both built on Whitehead’s notion of epochs, claiming that ‘common sense’ probabilistic reasoning was too limited. Or have I missed some advance?

Dave Marsay

See also

Reasoning in a complex world

Evolution and epochs

In common usage, the term evolutionary suggests gradual. As Darwin observed, common sense would suggest that species should develop minor differences before the major ones. But this is the opposite of what is seen in the fossil record of the Cambrian explosion. But if we treat evolution as a special case of Whitehead’s logic, as in  JC Smuts’ ‘Holism and Evolution’, then species are epochs, and hence one expects major changes to lead to further minor speciation. (Speculation.) This has implications for ‘Social Darwinism’, as in Mary Midgley’s ‘The selfish metaphor’, New Scientist.

See also

Mathematics and real systems, Cybernetics and epochs

INET: Adair Turner interview

Institute for New Economic Thinking: Interview with Adair Turner.

Lord Turner amplifies the criticism of undue reliance on ‘sophisticated mathematics’ that he made in the ‘Turner review’. Keynes was forgotten, e.g. herd effects and self-reinforcing phenomena. Hayek, Minsky, Kahneman and behavioural economics are along the right lines. The ‘sophisticated mathematics’ supported the wrong view, that the economy was largely driven by underlying ‘fundamentals’. In fact, economics is driven by nothing much but ideas. (This may be an exaggeration. DJM.)

After the crisis people were aware that something had gone wrong, and economics had learnt from the ’30s how to deal with a crisis. Some influential people think we should just tinker with the old ideas, and have freer markets. But we need new ideas, with a clear underlying theory. We must not be deluded any longer. We must think about real institutions, not mathematically tractable but unrealistic theories. (Economics has long sought precision, whereas Keynes showed that this was not realistically possible. If we accept imprecision, then Keynes approach may appear tractable. DJM)

Key delusions were:

  • The attainability (or desirability) of apparent predictability. Knightian uncertainty is key, both within and about any economic theory.
  • The assumptions and attitudes that were necessary to yield ‘probability distributions’ were sometimes false. (See ‘reasoning epochs’.)
  • Extrapolation was often misleading. (See ‘reasoning epochs’.)
  • Ignoring the ‘deep uncertainty’ of Keynes’ ‘Treatise on Probability’.
  • Ignoring Charles MacKay’s ‘Madness of the Crowds’.

The models could not describe the world as it was: we should focus on the world as it is.

(These views seem appropriate to any open complex adaptive system. DJM)

Lord Turner noted how credit against real-estate led to a self-reinforcing cycle, ‘fueling the fire’. Rather than establishing axioms ‘for all time’, building a model and then enacting it, one should search the data for self-reinforcing cycles. (Some economists seem to think that the ‘axiomatic’ approach is mathematical, but as Turing etc. showed, combined with a need for ‘precision’ it is deeply misguided. The behaviourist approach seems to be favoured, but can it cover all the bases? DJM)

Banks have an impact on the real economy, at least in a crisis and possibly long term. (Possibly this is a new phenomenon, following the recent innovations. Whatever assumptions one makes, one ought to monitor.) Shareholders and senior managers are cushioned from failure, and so rationally want more leverage than is socially optimal (sensible?). (Following changes to the way universities are funded) banks funded, and hence heavily influenced, academia. Most economists are employed by financial institutions. Academics need to counter-vail the resultant biases, and need to understand potential failure. (The thinking seems to be that behaviourists are doing this.) They need to learn from history, and to avoid sudden set-backs. (Is this possible? Avoiding small set-backs might lead to worse ones in the long-run.) It is crucial to work out what makes the economy stable. (And in what sense an economy can be ‘stable’.)

Keynes (1931?) noted that we had ‘failed in the management of a complicated machinery whose workings we did not understand’. Maybe we should just deliver ‘prosperity’ to the poor and then allow (facilitate?) freedom and creativity. (This seems to combine the minimum essential requirements of ‘left’ and ‘right’. But is it enough?) The free market has a tendency to instability. (We may learn to avoid crashes like the last one, but the next crash will be different.)

The economists look bad. There was no fore-warning or explanation. (Actually, Brown seemed to supply an explanation.) The field needs to be broadened. It is not enough to model data. We need to (identify and) understand issues, coupling theory to empirical observation. We need to understand self-reinforcing cycles, including institutions, and the potential for dis-equilibrium. (There is a need to build more abstract models, develop concepts of causality, and vigilantly observe/search.) (Mentions Kahneman, Soros, …)

See also

Brown, Cassandra factor, Modelling the Crash, Mathematics and real systems

Gordon Brown: Beyond the Crash

Brown sees this as the first crisis of globalisation and highlights the role of Knightian uncertainty and of institutions that can appreciate and handle the risks, but is ultimately short on useful insight.

Gordon Brown: Beyond the Crash; overcoming the first crisis of globalisation.

“[This was] the first crisis to have its roots in the very process of globalisation itself” (p220)

The 314-page book is hard going but an essential contribution to the ‘what next’ debate. In particular, Brown maintains that the problem is not just one of deficits and debt, but one of globalisation (the global sourcing of goods, services and capital).

This review is now here.

David Marsay