Comments on Keynes’ Treatise
Keynes’ Treatise on Probability provides both a critique of statistical inference and a constructive theory, in which he identifies certain assumptions as vital to classical science. But his later work, and experience with quantum mechanics and complex systems, call these assumptions into question and seem to demand a synthesis of the positive and negative aspects of the treatise to support more general methods.
These comments attempt to develop such a synthesis. They draw largely upon the original treatise while looking for a more general theory, and are informed by the work of Turing, Good, and others who have developed, applied or critiqued Keynes’ work.
Keynes’ findings rely on one of two circumstances:
- Random sampling of a population.
- Good modelling of the characteristics of a population, such that any variability is due to randomness.
This excludes, for example, game-like situations.
Sampling: the law of large numbers
If p% of a population has property X then:
- About p% of a large randomly selected sub-population will have property X, with the approximation improving as the sub-population grows.
- A randomly selected member of the population may be said to have property X with probability p/100.
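The sampling claim above can be illustrated with a quick simulation (the numbers are mine, purely for illustration): larger random samples estimate the population proportion more tightly.

```python
import random

# Illustrative sketch: if p% of a population has property X, larger
# random samples estimate p more tightly (the law of large numbers).
random.seed(42)

p = 0.30  # assumed true proportion of the population with property X
population = [random.random() < p for _ in range(100_000)]

for n in (100, 1_000, 10_000):
    sample = random.sample(population, n)       # random sub-population
    estimate = sum(sample) / n                  # observed proportion
    print(f"sample size {n}: estimated proportion {estimate:.3f}")
```

Note that this depends entirely on the sampling being random; as the next paragraph observes, for a non-randomly selected member the inference fails.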
But Keynes shows, by means of example, that it can be misleading to say that an arbitrary member of the population has a probability. For example, in a criminal trial the suspect is rarely randomly selected, and so Keynes’ notion of probability does not apply.
Suppose that we are interested in property X and have data for P(X|A) and P(X|B) for non-overlapping homogeneous populations A and B which together make up the population of interest. The usual (Bayesian) formula is P(X|A U B) = P(X|A).P(A)+P(X|B).P(B). But, following Boole, if the unconditional probabilities are variables, then P(X|A U B) is a function of those variables. If they are unknown then all we can say is that P(X|A U B) lies between P(X|A) and P(X|B), which leads naturally to Keynes’ idea of an interval probability. As Keynes notes, trying to reduce this to a ‘point’ probability can be misleading.
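Boole’s point can be shown in miniature (the conditional probabilities here are assumed, for illustration only): with the weights P(A) and P(B) unknown, sweeping over all possible weights shows that P(X|A U B) can only be bounded by the two conditional probabilities.

```python
# Assumed illustrative values: P(X|A) = 0.2, P(X|B) = 0.8, with A and B
# partitioning the population so that P(B) = 1 - P(A).
p_x_given_a = 0.2
p_x_given_b = 0.8

def mixed(p_a: float) -> float:
    """P(X | A U B) when the unknown weight P(A) equals p_a."""
    return p_x_given_a * p_a + p_x_given_b * (1.0 - p_a)

# Sweep the unknown weight P(A) over [0, 1].
values = [mixed(w / 100) for w in range(101)]
print(min(values), max(values))  # the bounds are P(X|A) and P(X|B)
```

Whatever the true weights, the mixed probability lies in [0.2, 0.8]; reporting any single point in that interval would, as Keynes notes, be misleading.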
Intervals and sub-populations
If we use interval probabilities, such as P(X|A) = [a,a’] and P(X|B) = [b,b’], then we can define P(X|A U B) = [min(a,b), max(a’,b’)], as in Good’s generalized likelihoods. Thus for a heterogeneous population A, P(X|A) can be interpreted as the range of proportions over reasonable sub-populations. These sub-populations have to be large enough to satisfy Keynes’ criteria. If one wants to talk about P(X|a) for an individual a, then one needs to be able to identify a canonical sub-population containing a, based on all the available data. In typical science this is often reasonable; but in criminal cases, for example, it is not always straightforward.
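The combination rule above is simple enough to state directly in code (a minimal sketch; the function name and the tuple representation are mine, not Good’s notation):

```python
# Represent an interval probability P(X|A) = [a, a'] as a (lo, hi) tuple.
def combine(interval_a, interval_b):
    """P(X|A U B) = [min(a, b), max(a', b')] for interval probabilities."""
    (a_lo, a_hi), (b_lo, b_hi) = interval_a, interval_b
    return (min(a_lo, b_lo), max(a_hi, b_hi))

# Example with assumed intervals: whatever the unknown mixture of A and B,
# P(X|A U B) lies within the combined interval.
print(combine((0.1, 0.3), (0.25, 0.6)))  # (0.1, 0.6)
```

A point probability is the degenerate case [p, p], so the rule reproduces the bounds of the previous section when the sub-populations are homogeneous.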
The nature of science
Keynes argues that:
- If the underlying situation has a state with probabilistic transitions (as in a Markov model) and …
- The transition probabilities are never too close to 0 and …
- One has identified all the factors,
then the assumptions of statistical inference hold. One can:
- Determine the parameters of the model
- Apply the model, extrapolating to unobserved situations (e.g., the future)
But in the development of any science it is commonplace for significant factors to have been overlooked, and for the regularity to be misleading, as when some people thought that all swans were white. Moreover, it seems inimical to the core concepts of science ever to suppose that everything is absolutely known about a subject, for then one would stop being scientific and would become dogmatic. A synthesis with Keynes’ own negative findings would be to suppose that science (and other honest empirical endeavours) is justified when two things hold:
- Its findings are published in a form that allows others to check them and to look for additional factors.
- Its conclusions are expressed or invariably interpreted as being conditional on ‘all else remaining equal’, realising that this may not be the case when others have an incentive to change the situation (e.g. by identifying new factors) or the situation is otherwise subject to substantial evolution.
Roughly speaking, this is the position that Keynes developed for economics, for which the implicit assumptions of classical science clearly do not hold. That is, one can proceed ‘scientifically’, but with humility about the results. This contrasts with scientism, which applies the methods of science with no regard for their validity in the actual case.
Keynes regarded science as being about regular processes, but this is too restrictive. His tutor, Whitehead, developed a process model that incorporated, and perhaps extended, Keynes’ theory.
Instead of supposing that processes are regular, Whitehead supposes only that activity occurs in epochs, between which there are transitions. Thus Keynes’ theory applies within an epoch, and says nothing about transitions. Whitehead’s epochs are nested, so that epochs may have sub- and supra-epochs. Keynes, in effect, argues that one has to consider factors in sufficient detail that the sub-epochs are random. His methods are limited to an epoch. But if one is concerned that the current epoch may end then one may be able to ‘take a broader view’, and consider the supra-epoch. Sometimes, for example, a transition may change the base probabilities P(A) (such as when large numbers of a population are destroyed non-uniformly) while the likelihoods P(X|A) remain unchanged. This is what Keynes sought to do for economics.
Ordinary pragmatism consists of behaving as if the current epoch will last forever, and coping with any change as and when it occurs. In this sense, statistical inference is pragmatic, and in his Treatise Keynes seems to regard this as reasonable, given his experience – as an academic – that things really do tend to be stable. But his later experience would seem to call for a new kind of pragmatism.
Statistical inferences are made from data. If we regard the data as samples (not necessarily random, but otherwise representative) from some population and potential new data as simply more samples from the same set, then we expect the inference to be valid. Changes come about because of fundamental changes in the population, or in how the samples are selected. But from the same data we can often draw different inferences. For example, from one radar we can draw inferences about the courses of many aircraft. This raises the possibility that inferences about different things that fit the current data may clash when extrapolated to a future time, as when two aircraft are on a collision-course. A narrow kind of pragmatism looks at each inference individually and regards it as reasonable, not considering the potential clash. A more ‘holistic’ pragmatism is this:
- Make all inferences, at various levels of abstraction.
- Identify potential clashes.
- Allocate inferred activity and clashes to ‘epochs’, and identify constraints for those epochs
- Rationalise the above, considering new details and gathering more evidence as necessary.
For example, if one identifies that two aircraft are on a collision course, one needs to ask if they are subject to air-traffic control, what organic collision avoidance aids they may have, and so on. Then one can consider what would normally happen in such circumstances and whether the two aircraft might be considered ‘normal’. In effect, flying straight and level is one epoch, moving so as to avoid a collision is another.
Reductionism and Analysis
Inference takes place within a context, and one clearly needs to make sure that one has the right context, by taking a ‘wholistic’ approach. In simple cases one has straightforward reductionist structural and organisational relationships, so that one can move analytically between ‘parts’ and ‘wholes’. But this is not always the case:
- In evolution, things often co-evolve, e.g. bodies and organs, predators and prey.
- Sentient beings often anticipate clashes, so that although clashes are a consequence of actions, actions are influenced by anticipated clashes.
- In financial speculation, expectation and actual performance feed off each other, cyclically.
At any one time there tend to be relatively few dominant inter-relationships, so that a pragmatic, ‘wholistic’ yet analytic and reductionist approach will work in the short term, in the same way that classical statistical inference can enable one to turn a good profit until the next crash. To support behaviours that are effective in the long run, one needs a ‘theory of the long run’, taking as data a long and wide experience, to which Keynes’ assumptions may not apply.
The law of Requisite Variety
Keynes’ discussion of insurance arises in the following context:
8. … Von Bortkiewicz … has … worked out further statistical constants … ; and he elaborately compares the theoretical value of the coefficients with the observed value in certain actual statistical material. He concludes with the thesis, that Homogeneity and Stability (defined as he defines them) are opposed conceptions, and that it is not correct to premise, that the larger statistical mass is as a rule more stable than the smaller, unless we also assume that the larger mass is less homogeneous. At this point, it would have helped, if Von Bortkiewicz … had stopped to tell in plain language where his mathematics had led him, and also whence they had started. But like many other students of Probability he is eccentric, preferring algebra to earth.
9. … If the a priori calculations are based on the average over a field which is not homogeneous in all its parts, greater stability of result will be obtained if the instances are drawn from all parts of the non-homogeneous total field, than if they are drawn now from one homogeneous sub-field and now from another. This is not at all paradoxical. Yet I believe, though with hesitation, that this is all that Von Bortkiewicz’s elaborately supported mathematical conclusion really amounts to. …
- In finance, group-think can lead to a reduction in variety, and hence a loss of stability.
- In evolution, over-selection would lead to a reduction in variety, and hence a loss of stability. (For example, organisms with Lamarckian selection have an advantage in the short term, but those with Darwinian selection tend to survive shocks better.)
Thus while Keynes may be correct that the opposition of homogeneity (and hence efficiency) to stability (and hence sustainability) is mathematically straightforward, it may nonetheless be of great significance in the real world.
Smuts later introduced the term ‘emergence’ for the arrival of new things that herald a new epoch. This often happens when two or more existing things come together and form a new self-sustaining, ‘self-organising’ whole. Statistical inference depends on the assumption that there can be no novelty or ‘free will’. Thus the assumptions which Keynes identifies are of interest, since their violation creates niches in which innovation is possible.
One wants to encourage diversity and the coming together of differences, in a way that preserves those differences, without producing a greyness. But if one wants some influence over the consequences one needs to understand the variety of capabilities and potentialities, and how they may combine. This may mean that an understanding of the underpinning logic and mathematics is more important than a facility with the common methods. If Keynes’ assumptions were true, mathematics would help create methods that would make mathematics (as distinct from a facility with computation) redundant. But they are not, and – if Smuts et al. are correct – the understanding that mathematics brings is essential to our sustainability.
Keynes’ later work
Endorsements and extensions:
- Tinbergen (Holding that the assumptions that Keynes identifies as being necessary for induction can, in practice, be taken for granted, pragmatically.)
Some support, but some divergence: