Knightian uncertainty and epochs

Frank Knight was a follower of Keynes who emphasised and popularised the importance of ‘Knightian uncertainty’, meaning uncertainty other than Bayesian probability. That is, it is concerned with events that are not deterministic but also not ‘random’ in the same sense as idealised gambling mechanisms.

Whitehead’s epochs, when stable and ergodic in the short-term, tend to satisfy the Bayesian assumptions, whereas future epochs do not. Thus within the current epoch one has (Bayesian) probabilities; longer term one has Knightian uncertainty.


Consider a casino with imperfect roulette wheels with unknown biases. It might reasonably change the wheel (or other gambling mechanism) whenever a punter seems to be doing infeasibly well. Even if we assume that the current wheel has associated unknown probabilities that can be estimated from the observed outcomes, the longer-term uncertainty seems qualitatively different. If there are two wheels with known biases that might be used next, and if we don’t know which is to be used then, as Keynes shows, one needs to represent the ambiguity rather than being able to fuse the two probability distributions into one. (If the two wheels have equal and opposite biases then by the principle of indifference a fused version would have the same probability distribution as an idealised wheel, yet the two situations are quite different.)
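One way to see why the fused distribution loses something is to look beyond a single spin. The sketch below is only an illustration: the biases (3/5 and 2/5 for red, ignoring zero) are hypothetical, and it assumes the unknown wheel is chosen once and then kept for every spin. For one spin the fused wheel matches an idealised wheel; for a run of spins it does not.

```python
from fractions import Fraction

# Hypothetical probabilities of red on a single spin (not from the text).
WHEEL_A = Fraction(3, 5)   # biased towards red
WHEEL_B = Fraction(2, 5)   # equal and opposite bias
FAIR = Fraction(1, 2)      # idealised wheel


def prob_all_red(p_red, spins):
    """Probability that every one of `spins` spins is red on a fixed wheel."""
    return p_red ** spins


def prob_all_red_ambiguous(spins):
    """Same event when wheel A or B is picked once (indifference: 1/2 each)
    and then used for all the spins."""
    return (prob_all_red(WHEEL_A, spins) + prob_all_red(WHEEL_B, spins)) / 2


single_spin = prob_all_red_ambiguous(1)   # matches the idealised wheel
two_spins = prob_all_red_ambiguous(2)     # does not match the idealised wheel
fair_two_spins = prob_all_red(FAIR, 2)

print(single_spin, two_spins, fair_two_spins)  # 1/2 13/50 1/4
```

With these numbers two reds in a row have probability 13/50 under the ambiguous pair but 1/4 on an idealised wheel, so anyone who only holds the fused single-spin distribution has discarded information that matters.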

Bayesian probability in context

In practice, the key to Bayesian probability is Bayes’ rule, by which the estimated probability of hypotheses is updated depending on the likelihood of new evidence against those hypotheses. (P(H|E)/P(H’|E) = {P(E|H)/P(E|H’)}•{P(H)/P(H’)}.) But estimated probabilities depend on the ‘context’ or epoch, which may change without our receiving any data. Thus, as Keynes and Jack Good point out, the probabilities should really be qualified by context, as in:

P(H|E:C)/P(H’|E:C) = {P(E|H:C)/P(E|H’:C)}•{P(H:C)/P(H’:C)}.

That is, the result of applying Bayes’ rule is conditional on our being in the same epoch. Whenever we consider data, we should not only consider what it means within our assumed context, but whether it has implications for our context.
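A minimal sketch of the context-qualified rule follows. The likelihood table and the context labels `C` and `C_alt` are hypothetical, and for simplicity it assumes the same prior odds in both contexts. The same evidence then supports H in one context and H’ in the other.

```python
from fractions import Fraction

# Hypothetical likelihoods P(E|H:C) and P(E|H':C) for two candidate contexts.
LIKELIHOODS = {
    "C": {"H": Fraction(8, 10), "H_alt": Fraction(2, 10)},
    "C_alt": {"H": Fraction(3, 10), "H_alt": Fraction(6, 10)},
}


def posterior_odds(prior_odds, context):
    """Odds form of Bayes' rule, P(H|E:C)/P(H'|E:C), within one context."""
    table = LIKELIHOODS[context]
    likelihood_ratio = table["H"] / table["H_alt"]
    return likelihood_ratio * prior_odds


prior = Fraction(1)  # even prior odds, P(H:C)/P(H':C) = 1 (an assumption)
odds_in_c = posterior_odds(prior, "C")
odds_in_c_alt = posterior_odds(prior, "C_alt")

print(odds_in_c, odds_in_c_alt)  # 4 1/2
```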

Taking a Bayesian approach, if G is a global certain context and C a sub-context that is believed but not certain, then taking

P(E|H:G) = P(E|H:C)  etc

is only valid when P(C:G) ≈ 1,

both before and after obtaining E. But by Bayes’ rule, for an alternative C’:

P(C|E:G)/P(C’|E:G) = {P(E|C:G)/P(E|C’:G)}•{P(C:G)/P(C’:G)}.

Thus, unless the evidence, E, is at least as likely for C as for any other possible sub-context, one needs to check that P(C|E:G) ≈ 1. If not then one may need to change the apparent context, C, and compute P(H|E:C’)/P(H’|E:C’) from scratch: Bayes’ rule in its common – simple – form does not apply. (There are also other technical problems, but this piece is about epochs.) If one is not certain about the epoch then for each possible epoch one has a different possible Bayesian probability, a form of Knightian uncertainty.
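The check on P(C|E:G) can be sketched as below. The priors, likelihoods and the 0.95 threshold standing in for ‘≈ 1’ are all hypothetical, and the sketch assumes that the listed sub-contexts are the only candidates, which is itself an assumption that may fail.

```python
from fractions import Fraction


def context_posteriors(priors, likelihoods):
    """Return P(C|E:G) for each candidate sub-context C, by Bayes' rule.

    priors:      {context: P(C:G)}
    likelihoods: {context: P(E|C:G)}
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


def context_still_safe(posteriors, context, threshold=Fraction(95, 100)):
    """True if the assumed context remains nearly certain after the evidence."""
    return posteriors[context] >= threshold


# C is believed beforehand, but the evidence is very unlikely under C.
priors = {"C": Fraction(99, 100), "C_alt": Fraction(1, 100)}
likelihoods = {"C": Fraction(1, 1000), "C_alt": Fraction(1, 2)}

posteriors = context_posteriors(priors, likelihoods)
safe = context_still_safe(posteriors, "C")

print(posteriors["C"], safe)  # 99/599 False
```

Here P(C:G) was 0.99 before the evidence but falls to about 0.17 afterwards, so any probabilities computed under C would need to be recomputed under the alternative.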

Representing uncertainty across epochs

Sometimes a change of context means that everything changes. At other times, some things can be ‘read across’ between epochs. For example, suppose that the hypotheses, H, are the same but the likelihoods P(E|H:C) change. Then one can use Bayes’ rule to maintain likelihoods conditioned on possible contexts.


Shannon’s mathematical information theory builds on the notion of probability. A probability distribution determines an entropy, whilst information is measured by the change in entropy. The familiar case of Shannon’s theory (which is what people normally mean when they refer to ‘information theory’) makes assumptions that imply Bayesian probability and a single epoch. But in the real world one often has multiple or ambiguous epochs. Thus conventional notions of information are conditional on the assumed context. In practice, the component of ‘information’ measured by the conventional approach may be much less important than that which is neglected.
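As a small illustration of information being conditional on context, the sketch below measures the entropy change produced by the same evidence under two assumed contexts. The distributions are hypothetical: in context `C` the evidence rules out two of four hypotheses, while in `C_alt` it is uninformative.

```python
import math


def entropy_bits(distribution):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


def information_bits(prior, posterior):
    """Information measured as the reduction in entropy."""
    return entropy_bits(prior) - entropy_bits(posterior)


prior = [0.25, 0.25, 0.25, 0.25]               # four equally likely hypotheses
posterior_in_c = [0.5, 0.5, 0.0, 0.0]          # evidence interpreted under C
posterior_in_c_alt = [0.25, 0.25, 0.25, 0.25]  # same evidence under C_alt

info_in_c = information_bits(prior, posterior_in_c)
info_in_c_alt = information_bits(prior, posterior_in_c_alt)

print(info_in_c, info_in_c_alt)  # 1.0 0.0
```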


It is normally supposed that pieces of evidence are independent and so information commutes:

P(E1+E2|H:C) = P(E1|H:C)•P(E2|H:C) = P(E2|H:C)•P(E1|H:C) = P(E2+E1|H:C)

But if we need to re-evaluate the context, C, this is not the case unless we re-evaluate old data against the new context. Thus we may define a ‘shock’ as anything that requires us to change the likelihood functions for either past or future data. Such shocks occur when the data had been considered unlikely according to an assumption. Alternatively, if we are maintaining likelihoods against many possible assumptions, a shock occurs when none of them remains probable. 
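The first kind of shock can be sketched as follows. The outcome probabilities, the context labels and the 1/20 threshold for ‘unlikely’ are hypothetical. The sketch keeps the raw data so that, after a shock, old observations can be re-evaluated against each candidate context; choosing the best-fitting context is a simplification used only for illustration.

```python
from fractions import Fraction

# Hypothetical P(outcome | H : context) for a single hypothesis H.
P_OUTCOME = {
    "C": {
        "red": Fraction(49, 100),
        "black": Fraction(49, 100),
        "green": Fraction(2, 100),
    },
    "C_alt": {
        "red": Fraction(3, 10),
        "black": Fraction(3, 10),
        "green": Fraction(4, 10),
    },
}


def likelihood(data, context):
    """Likelihood of all the data, assuming independence within one context."""
    result = Fraction(1)
    for outcome in data:
        result *= P_OUTCOME[context][outcome]
    return result


def process(data, context, threshold=Fraction(1, 20)):
    """Return (final_context, shock_indices) for a stream of outcomes."""
    seen, shocks = [], []
    for index, outcome in enumerate(data):
        seen.append(outcome)
        if P_OUTCOME[context][outcome] < threshold:
            shocks.append(index)
            # Re-evaluate every observation so far against each context.
            context = max(P_OUTCOME, key=lambda c: likelihood(seen, c))
    return context, shocks


final_context, shocks = process(["red", "black", "green", "green"], "C")
print(final_context, shocks)  # C_alt [2]
```

Because the old data are re-evaluated at the shock, the final context here does not depend on the order in which the outcomes arrived, although the moment of the shock does.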

See also

Complexity and epochs, Statistics and epochs, reasoning in a complex, dynamic, world.

Keynes, Turing and Good developed a theory of ‘weight of evidence’ for reasoning in complex worlds. TBB

David Marsay

The Cassandra Factor

A New Scientist article opines that some disasters are preceded by a slowing down of some characteristic ‘heart beat’.

From a Cybernetic point of view the key thing is that (as Keynes and Whitehead had previously noted) regular behaviour implies some sort of regulation, and that in ecosystems this typically comes about through predator-prey relationships, which in simple cases give rise to cyclical behaviours. As the article notes, a weakening of a regulatory factor can lead to a noticeable lengthening in the period of the cycles or – more generally – an increasing autocorrelation, preceding the failure of regulation and hence the crash.
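The link between weakening regulation and rising autocorrelation can be illustrated with a toy model. This is a sketch under a strong assumption that is not in the article: the regulated quantity follows a linear first-order autoregressive process, where `phi` near 0 stands for strong regulation and `phi` near 1 for weak regulation.

```python
import random


def simulate(phi, steps, seed):
    """Simulate x[t+1] = phi * x[t] + noise, with unit Gaussian noise."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(steps):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def lag1_autocorrelation(series):
    """Sample autocorrelation of a series at lag 1."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum(
        (series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)
    )
    return covariance / variance


strongly_regulated = lag1_autocorrelation(simulate(0.2, 5000, seed=1))
weakly_regulated = lag1_autocorrelation(simulate(0.9, 5000, seed=1))

print(strongly_regulated, weakly_regulated)
```

The weakly regulated series returns to its mean more slowly, so successive values resemble each other more and the measured autocorrelation is higher.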

The article considers the ‘financial whirl’, noting that any successful method of predicting collapse would ‘quickly invalidate its own forecasts as investors changed their strategies to avoid its predictions.’ This reflects Keynes’ thoughts. But how general is their model? The editorial refers to the BSE crisis in which the introduction of a cycle into the food ‘chain’ may have led, decades later, to the emergence of prions, causing BSE. This seems the opposite of the situation described in the article.

From a Cybernetic point of view, echoing Whitehead and Keynes, if we observe a system for a while we expect to be aware of those things that are regular, rather than what is ephemeral. This regularity reflects the action of regulatory systems, including their structure and strengths. Any change in regulation can lead to the end of an epoch, and may be perceived as a disaster, not just the failure of one element of regulation. Thus if an epoch is characterised by behaviour with a characteristic period, a speeding up or slowing down can both indicate significant change, which could be followed by disaster. But equally, a change may lead to an innovation that is perceived as a benefit. From this and related Cybernetic considerations:

  • A stable regulatory regime implies a stable characteristic behaviour in the thing regulated, so that change in what had been regulated indicates a change in regulation, and hence a significant change in behaviours, good or bad. This includes a change from a heart-beat that varies in frequency to one that is unhealthily ‘pure’.
  • A stable regulatory regime will rely on stable information inputs, which can only be effective if there is global stability, including lack of learning and innovation. So we should not expect any epochs to last forever, unless the system is already dead.
  • A regulatory system needs to cover both short-term and long-term aspects of behaviour. From an efficiency point of view it might focus on short-term issues most of the time, opening up and exploring longer-term issues periodically. It would be unfortunate to confuse this with the onset of a crisis, or to seek to inhibit it.
  • There is no widespread common appreciation of what makes for ‘good’ regulatory systems.

Dave Marsay

See also my Modelling the crash.

Modelling the Crash

Nature has a special issue on ‘Modelling the Crash’. May and Haldane propose a model intended to support policy. Johnson and Lux call for empirical support, meaning a clear link to financial data. Both views have a strong pedigree, yet both seem questionable.

In some ways this discussion pre-empts some of the material I am trying to build up on this blog, and may be ‘hard going’.

May and Haldane

May and Haldane point to connectivity as a factor that was missed out of the pre-crash economic analyses. They make an analogy with ecosystems, where low connectivity is chaotic, increasing connectivity brings stability, but then more connectivity brings instability. It may be a slight digression, but I think it important to focus on what is unstable. Initially it is the components, which are stabilised by being connected. But after a critical point the parts begin to form wholes. These now become the focus of attention. They are unstable because there are few of them. In an important sense, then, maximum complexity is where one has some connectedness (as in a liquid), but the parts are not ‘locked in’ to wholes (solids). If we think of connectivity as co-emerging with growth then the problem is not the kind of growth that loosely connects, but the growth beyond that, in which bubbles of self-reinforcing behaviours develop. This model would seem to suggest that we should be looking out for such feedbacks and bubbles. I agree. In biological evolution Lamarckian inheritance would lead to such bubbles. Maybe one lesson to be learned from nature is that Darwinian evolution is better: the survival of the satisfactory, rather than the over-optimisation of survival of the very fittest, leading to a population of seeming clones.

The paper is largely about robustness and herding, but put in rather narrow terms. It identifies the key issues as homogeneity and fragility, and modularity. It does recognize its own limitations.

I found the paper quite reasonable, but I do see that the presentation invites the supposition that it is leaning too heavily on its ‘mathematical model’.

Johnson and Lux

‘Shouldn’t we test against data sets?’ This raises great challenges.  We know how to test within epochs, and conventional economic theories seem reasonable enough. The challenges are thus:

  • What should we be looking for in theories that span epochs?
  • How do we test theories when the data sets span epochs?

It is common practice to apply ‘Occam’s razor’, which seeks to simplify as much as possible, yet Keynes and Whitehead tend to lead to a contrary view: never assume regularity without evidence. Thus even if we had a theory (like May and Haldane) that fitted all the data, Keynes would predict continuing innovation and hence we would have no reason to trust the theory in future. (See my ‘statistics and epochs’.)

So What? May and Haldane seems insightful and useful, but leaves open a gap for a more comprehensive ‘model’. Such a model would include nodes with links like those of May and Haldane. Should it be extended into detailed models of how banks work? Or should it seek to identify further key factors? I would like to see it address the issues raised by Lord Turner in ‘mathematics and real systems’.

Dave Marsay

See also ‘The Cassandra Factor’.

Mathematics and real systems

The UK FSA’s Lord Turner, in the Turner Review of the financial crisis of 2008, was critical of the role of mathematics in misleading decision-makers about the possibility of a crisis. I have also found similar cynicism in other areas involving real complex systems. It seems that mathematics befuddles and misleads. (Or am I being unduly sensitive?)

In this INET interview Lord Turner provides a more considered and detailed critique of mathematics than I have come across from him before. (Unless you know different?) In defence of mathematicians I note:

  • His general criticism is that ‘sophisticated’ mathematical models were given a credibility that they did not deserve. But we do not judge a car on its engine alone. What mattered was not how mathematically brilliant the models may have been, but how they corresponded to reality. This was the preserve of the economist. If an economist declares certain assumptions to be true, the mathematician will build on them, yielding a model with certain behaviours. Normally you would expect the economist to reconsider the assumptions if the resultant behaviour wasn’t credible. But this didn’t happen.
  •  If the potential for crashes was a concern, no amount of statistical analysis based on data since the last big crash was ever going to be helpful. The problem was with the science, not the mathematics.
  • Turner is critical of the commonplace application of probability theory, extrapolating from past data, as if this were the only ‘mathematical’ approach.
  • Turner continually refers to ‘Knightian Uncertainty’ as if it were extra-mathematical. He does not note that Frank Knight was a Keynesian at a time when Keynes’ best known work was mathematical.
  • Turner refers to Keynes’ Treatise on Probability without remarking that it provides mathematical models for different types of Knightian uncertainty, or linking it to Whitehead’s (mathematical) work on complex systems.
  • In Whitehead’s terms, Turner’s criticism of mathematics is that it can only provide extrapolations within a given epoch. But this is to ignore the work of Whitehead, Keynes and Turing, for example, on ‘emergent properties’.

It seems clear that the economist’s assumption of ‘the end of history’ for economics led to the use of mathematical models that were only useful for making predictions conditional on the assumption that ‘the rules of the game are unchanging’. Is it reasonable to blame the mathematicians if some mistook an assumption for a mathematical theorem? (Possibly?) More importantly, Turner notes that the ‘end of history’ assumption led to a side-lining of economists who could work across epochs. They need to be revived. Perhaps so too do those mathematicians with a flair for the appropriate mathematics?

David Marsay, C. Math FIMA

See also: IMA paper, statistics blog.

Statistics and epochs

Statistics come in two flavours. The formal variety are based on samples with a known distribution. The empirical variety are drawn using a real-world process. If there is a known distribution then we know the ‘rules of the situation’ and hence are in a single epoch, albeit one that may have sub-epochs. In Cybernetic terms there is usually an implicit assumption that the situation is stable or in one of the equilibria of a polystable system, and hence that the data was drawn from a single epoch. Otherwise the statistics are difficult to interpret.

Statistics are often intended to be predictive, by extrapolation. But this depends on the epoch enduring. Hence the validity of a statistic is dependent on it being taken from a single epoch, and the application of the statistic is dependent on the epoch continuing.

For example, suppose that we have found that all swans are white. We cannot conclude that we will never see black swans, only that if:

  • we stick to the geographic area from which we drew our conclusion
  • our sample of swans was large enough and representative enough to be significant
  • we are extrapolating over a time-period that is short compared with any evolutionary or other selective processes
  • there is no other agency that has an interest in falsifying our predictions.

then we are unlikely to see swans that are not white.

In particular the ‘law of large numbers’ should have appropriate caveats.
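A small simulation illustrates the caveat. The numbers are hypothetical: swans are white with probability 0.99 in the epoch sampled and 0.5 in a later epoch. A large sample pins down the frequency within its own epoch, but says nothing about the next one.

```python
import random


def observe_swans(p_white, count, rng):
    """Draw `count` swans; True means the swan is white."""
    return [rng.random() < p_white for _ in range(count)]


def white_fraction(sample):
    """Fraction of white swans in a sample."""
    return sum(sample) / len(sample)


rng = random.Random(0)
epoch_one = observe_swans(0.99, 2000, rng)  # hypothetical original region
epoch_two = observe_swans(0.50, 2000, rng)  # hypothetical new region or era

estimate = white_fraction(epoch_one)  # what the statistic tells us
later = white_fraction(epoch_two)     # what we would actually go on to see

print(estimate, later)
```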

Annotated Bibliography

PLEASE NOTE: There is now a page. This post is NOT being maintained.



WR Ashby

Design for a brain 1960. (Ed. 2 has significant improvements.) A cybernetic classic. Ashby notes the correspondence to game theory (and thus to Whitehead). In simpler cases Cybernetic language is considerably more accessible than Whitehead.

With Conant: Every Good Regulator of a System must be a Model of that System, 1970. It claims that:

“The theorem has the interesting corollary that the living brain, so far as it is to be successful and efficient as a regulator for survival, must proceed, in learning, by the formation of a model (or models) of its environment.” [My italics.]

The notion of a regulator is an objective mechanism which is required to maintain pre-defined objective criteria: i.e., keep some output within a ‘good’ domain. It may also have other functions, e.g. optimising something. The theory shows that in so far as a system to be regulated conforms to some model, it will need – in effect – to be modelled to be regulated effectively. It does not explicitly consider feed-back from output to input. Where this is present one could apply the theory to short-term regulation, within the delay of the feed-back. Alternatively one could apply it at the level of strategy rather than base events and activities. In this case one needs to model ‘the game’, including the ‘board’, the players and their motivations.

JM Keynes

Keynes was a mathematician and a student of Whitehead, employed by the Treasury during the war, who worked with JC Smuts on the transition to peace. The Prime Minister (Lloyd George) supported Keynes in producing the first two references as ‘lessons identified’ from the war.

Treatise on Probability 1921. Keynes’ critique of ‘standard’ (numeric) probability theory, together with some useful generalisations. Refers to Whitehead (Keynes’ tutor). Underpins subsequent work.

The Economic Consequences of the Peace 1919. Keynes shows that although depressions are impossible according to the then classical economics, based on standard probability, ‘animal spirits’ introduced greater uncertainties, of the kind described in his treatise, which made a crash and a sustained depression virtually certain. Thus whereas the conventional view was that economies were inherently buoyant, so that recessions could occur only as a result of ‘exogenous events’, such as war, a depression could actually occur endogenously – due to ‘market failures’.

The General Theory of Employment, Interest and Money 1936. The master work on economics. It explains how crises such as those of 1929 and 2007/8 occur, and suggests some remedies. Underpinned by the treatise on probability, but this is not widely appreciated. ‘Keynesian’ economics means the application of stock remedies whatever the cause. In this sense, Keynes was not a Keynesian. Rather he developed a theory as a way of looking at economies. His approach was to follow where theory led, taking account of residual uncertainties.

JC Smuts

Smuts worked with Keynes on economics.

Holism and evolution 1926. Smuts’ notoriously inaccessible theory of evolution, building on and show-casing Keynes’ notion of uncertainty. Although Smuts and Whitehead worked independently, they recognized that their theories were equivalent.

The Scientific World-Picture of Today, in ‘British Association for the Advancement of Science, Report of the Centenary Meeting’. London: Office of the BAAS, 1932. Smuts’ presidential address, outlining the new ideas in ‘Holism and evolution’ and their import.

I Stewart

Ian has done more than most to explore, develop and explain the most important parts of qualitative mathematics.

Life’s Other Secret: The new mathematics of the living world, 1998. This updates D’Arcy Thompson’s classic On growth and form, ending with a manifesto for a ‘new’ mathematics, and a good explanation of the relationship between mathematics and scientific ‘knowledge’. Like most post-80s writings, its main failing is that it sees science as having achieved some great new insights in the 80s, ignoring the work of Whitehead et al, as explained by Smuts, for example.

M Tse-Tung

Chairman Mao.

On Contradiction 1937. Possibly the most popular account of Whitehead’s ideas.

AN Whitehead

Whitehead was Russell and Keynes’ tutor.

Process and Reality 1929. Notoriously hard going, it refers to Keynes’ work on uncertainty.

W Whitman

American civil-war thinker and poet. Part of the inspiration for the Beat movement.

Leaves of Grass 1855. Part of Smuts’ inspiration.

See also

ABACI – a consultancy who offer to tailor solutions.

Dave Marsay

Complexity and epochs

It is becoming widely recognized (e.g. after the financial crash of 2008) that complexity matters. But while it is clear that systems of interest are complex in some sense, it is not always clear that any particular theory of complexity captures the aspects of importance.
We commonly observe that systems of interest do display epochs, and many systems of interest involve some sort of adaptation, learning or evolution, so according to Whitehead and Ashby (Cybernetics) will display epochs. Thus key features of interest are:

  •  polystability: left to its own devices the system will tend to settle down into one or other of a number of possible equilibria, not just one.
  • exogenous changeability: the potential equilibria themselves change under external influence.
  • endogenous changeability: the potential equilibria change under internal influence.

For example, a person in an institution such as an old folk’s home is likely to settle into a routine, but there may be other routines that they might have adopted, if things had happened differently earlier on. Thus their behaviour is polystable, except in a very harsh institution. Their equilibrium might be upset by a change in the time at which the papers are delivered, an exogenous change. Endogenous factors are typically slower-acting. For example, adopting a poor diet may (in the long run) impact on their ability to recover from illnesses and hence on their ability to carry on their established routine. For a while the routine may carry on as normal, only to suddenly become non-viable.
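Polystability and exogenous change can be illustrated with a standard toy model, which is an assumption of this sketch rather than anything taken from Whitehead or Ashby: a state x that follows dx/dt = x - x³ + forcing. With no forcing there are two stable equilibria, near +1 and -1, and the one reached depends on the starting point. A large enough external forcing removes one of them.

```python
def settle(x, forcing=0.0, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = x - x**3 + forcing and return the final state."""
    for _ in range(steps):
        x += dt * (x - x ** 3 + forcing)
    return x


# Polystability: slightly different histories lead to different routines.
upper = settle(0.1)    # settles near +1
lower = settle(-0.1)   # settles near -1

# Exogenous change: an external push removes the upper equilibrium, so a
# system that had settled there is driven to the other one.
pushed = settle(upper, forcing=-0.5)

print(upper, lower, pushed)
```

A slow endogenous change could be sketched in the same way by letting the forcing drift gradually as a function of the state, so that the routine carries on for a while and then abruptly becomes non-viable.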