# Kolmogorov’s Foundations of Probability

A.N. Kolmogorov, *Foundations of the Theory of Probability*, 2nd Ed., tr. N. Morrison, Chelsea, NY, 1956. (Original German edition 1933.)

The received wisdom is that Kolmogorov firmly established the view that ‘the probability’ of some event satisfies his ‘Kolmogorov axioms’, where ‘probability is the measure of the likeliness that an event will occur’.

For a coin toss, Kolmogorov’s axioms are often taken to imply that

P({Heads})+P({Tails})=1.

This equality is a direct consequence of measure theory provided that such a measure exists. Kolmogorov is often ‘credited’ with the view that existence is a non-issue, and hence that well-founded probabilities always exist. His work is still interesting from this angle. Is this attribution fair?

## Preface

Kolmogorov notes:

[The] analogies between measure of a set and probability of an event, and between integral of a function and mathematical expectation of a random variable, became apparent.

Making such an analogy, we might expect there to exist upper and lower probabilities, analogous to upper and lower measures and upper and lower integrals. We might ask if the upper and lower measures are necessarily the same. For example, if I aim at a dart board with a non-measurable sub-set, the notion of ‘the probability of hitting the sub-set’ appears incapable of interpretation as a single number. It may be the case that Kolmogorov thought that all sets of interest would necessarily be measurable, but are we so persuaded?

## I. Elementary Theory of Probability

The theory of probability, as a mathematical discipline, can and should be developed from axioms in exactly the same way as Geometry and Algebra.

Thus, as with Euclidean Geometry, it is possible to have a mathematical discipline that faithfully reflects, codifies and systematizes a set of ideas that are not actually true to the intended reality. The two aspects should not be confused. In this case, the fact that Kolmogorov’s axioms accurately capture mainstream ideas about probability is not at all controversial. It is the extent to which these ideas are universally applicable that can be doubted.

… This means that after we have defined the elements to be studied and their basic relations, and have stated the axioms by which these relations are to be governed, all further exposition must be based exclusively on these axioms, independent of the usual concrete meaning of these elements and their relations.

Prior to Kolmogorov’s time the general assumption had been that, like Geometry, all mathematics would be finitely axiomatizable in this sense. But subsequently the work of Gödel, Church, Turing and Post – among others – showed that not even Arithmetic is finitely axiomatizable in this strong sense, so we no longer treat it as a given that a theory of uncertainty could be reduced to finitely many axioms. The idea that probability – like Geometry – is fully captured by a set of axioms would need to be demonstrated. Even then, as with Geometry, it would show that our conceptions were somehow small, not that the thing which supposedly corresponds to our conception is really small.

### 1 Axioms

Kolmogorov considers a fixed finite set of elementary events and ‘a’ set of its subsets (not necessarily all of them).
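For reference, the five axioms for the elementary theory can be stated roughly as follows (the notation here is mine):

```latex
\begin{align*}
&\text{I.}\quad \mathcal{F}\ \text{is a field of sets over the elementary events } E.\\
&\text{II.}\quad E \in \mathcal{F}.\\
&\text{III.}\quad P(A) \ge 0 \quad \text{for every } A \in \mathcal{F}.\\
&\text{IV.}\quad P(E) = 1.\\
&\text{V.}\quad P(A \cup B) = P(A) + P(B) \quad \text{whenever } A \cap B = \emptyset.
\end{align*}
```

Note that nothing here says which sets belong to the field, nor that a suitable measure exists for any given application.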

### 2 The Relation to Experimental Data

Kolmogorov in effect assumes the law of large numbers (that the sample proportion tends to the actual probability) and takes this as the definition of ‘probability’. But the law of large numbers is not universal for real event streams: even roulette wheels can wear out.
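The contrast can be seen in a minimal simulation (a sketch; the drifting bias is my invented stand-in for a wearing wheel):

```python
import random

random.seed(1)

def sample_proportion(n, bias_at):
    """Proportion of 'heads' in n trials, where bias_at(t) gives P(heads) at trial t."""
    return sum(random.random() < bias_at(t) for t in range(n)) / n

# A stable mechanism: the sample proportion settles near the fixed probability.
stable = sample_proportion(100_000, lambda t: 0.5)

# A 'wearing' mechanism: the underlying probability itself drifts, so the
# long-run proportion tracks the drift rather than any single 'true' value.
wearing = sample_proportion(100_000, lambda t: 0.5 + 0.3 * t / 100_000)

print(stable)   # close to 0.5
print(wearing)  # pulled above 0.5 by the drift
```

For the stable mechanism the proportion converges as the law of large numbers promises; for the drifting one, no single probability is being estimated.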

### 6 Conditional Probabilities as Random Variables, Markov Chains

Kolmogorov defines the ‘mathematical expectation’ as a probability-weighted average. Again, this seems a very tame version of uncertainty. He does not argue that such averages always exist, and leaves ‘mathematical expectation’ as undefined when it doesn’t.
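For a discrete random variable the probability-weighted average is simply (a sketch with made-up values):

```python
# Expectation of a fair die roll as the probability-weighted average.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

expectation = sum(v * p for v, p in zip(values, probs))
print(expectation)  # close to 3.5

# For a heavy-tailed distribution this sum may fail to converge absolutely,
# in which case the expectation is simply left undefined.
```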

## II. Infinite Probability Fields

This contains the key results in establishing probability as a mathematical discipline analogous to Geometry. He notes that the results are ‘merely a mathematical structure’, but does not labour the point. (I take it to mean that we have something like a Euclidean model of something, which does not imply that the thing being modelled really is Euclidean.)

## III. Random Variables

### 2. Definition of Variables and of Distribution Functions

In effect, a ‘random variable’ is a single-valued function on some base set for which the probability is defined on all appropriate sub-sets. The ‘distribution function’ is what we now more commonly call the ‘cumulative distribution function’.
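A sketch of this picture for a finite base set (the names and the two-toss example are mine):

```python
# A base set of elementary events with a probability measure on it.
base = {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

# A random variable: a single-valued function on the base set
# (here, the number of heads in two tosses).
X = {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}

def F(x):
    """Distribution function: F(x) = P(X <= x)."""
    return sum(p for e, p in base.items() if X[e] <= x)

print(F(0), F(1), F(2))  # 0.25 0.75 1.0
```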

## IV. Mathematical Expectation

### 1. Abstract Lebesgue Integrals

Kolmogorov defines a generalised expectation using the tools of the previous chapter. He notes:

If this series converges absolutely for every … In this abstract form the concept … is indispensable for the theory of probability.

### 4. Some Criteria for Convergence

This considers the convergence of a sequence of random variables. Two alternative sufficient conditions on the expectation of some indicator function are given. These are also necessary for well-behaved indicator functions.

Kolmogorov supposes this theory to have many useful applications, but does not argue that it is universally applicable.

## V. Conditional Probabilities and Mathematical Expectations

### 2. Explanation of a Borel Paradox

Kolmogorov notes that when there is a large space of options, what happens may have had a prior probability of 0. In this case expectations conditioned on that event can be nonsense, hence Borel’s paradox. Thus we should be cautious in using probability theory when something that we would have thought had probability 0 happens. (Unfortunately, this happens all too often.)

### 4. Conditional Mathematical Expectation

The conditional expectation is defined ‘if it exists’.

## VI. Independence; The Law of Large Numbers

### 3. The Law of Large Numbers

Kolmogorov defines some conditions under which sequences of random variables are stable or – better – have ‘normal stability’, for example if the variance of the mean tends to 0 (the ‘Markov condition’). He does not claim that these universally hold, but leaves them as something to be determined before his theory can be applied to any particular case. (That is, he regards them as axiomatic in the mathematical sense, rather than as a universal truth.)
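The Markov condition can be seen empirically (a sketch using uniform draws; the particular numbers are illustrative, not from the book):

```python
import random

random.seed(2)

def var_of_sample_mean(n, reps=2000):
    """Empirical variance of the mean of n iid uniform(0,1) draws."""
    means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]
    centre = sum(means) / reps
    return sum((m - centre) ** 2 for m in means) / reps

# For iid variables the variance of the mean is sigma^2/n, which tends to 0
# as n grows: this is what gives the sequence 'normal stability'.
v_small = var_of_sample_mean(10)
v_large = var_of_sample_mean(1000)
print(v_small, v_large)  # the second is roughly 100 times smaller
```

If the underlying mechanism changes over time, nothing guarantees this shrinkage, which is why the condition must be checked case by case.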

## Supplementary Bibliography

(By Translator.)

There are many problems, especially in theoretical physics, that do not fit into the Kolmogorov theory, the reason being that these problems involve unbounded measures.

## Comments

### Kolmogorov as Mathematics, not dogma

If we assent to his axioms, Kolmogorov provides us with a wealth of mathematics that unconditionally resolves some key issues. He gives some examples under which his axioms are reasonable or even indisputable. If we have a mechanism that is random in the appropriate sense, then his axioms will hold. If we have a static population that we sample randomly, then we can apply his theory – with care – to make some useful deductions. Conversely, if a population is evolving then (by definition) it is not stable in Kolmogorov’s sense, and so much of his theory is inapplicable. And to apply his work to observations of a human we would have to suppose that they ‘were’ a Markov chain process, which hardly credits them with free will and would seem to suppose that we had more intelligence than they did.

From a mathematical view, it seems to me that Kolmogorov properly bounds his work, and is not claiming that his merely mathematical theory would apply to a coin being tossed by a magician, for example.

### Possible Interpretations

In Kolmogorov’s interpretation, the probability is equal to the long-run expected proportion (‘the law of large numbers’). A sounder interpretation is that it would be the naïve long-run expected proportion based on the evidence to hand. For example, the longer my car goes without breaking down, the lower the naïve probability of it breaking down becomes, based on that evidence. But this does not preclude me from looking at the break-down rates for other cars and estimating that the ‘true’ probability will be higher by an amount that increases each day, based on ‘bucket curve’ data for cars like mine, or if my car is new, based on common sense. Kolmogorov’s theory only considers a single ‘measure’, based on fixed data. But, as with the car, there may be uncertainties about what the appropriate data is. Kolmogorov does not consider this aspect.
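One way to make the naïve proportion concrete is Laplace’s rule of succession (my choice of estimator, not anything in Kolmogorov): after n failure-free days, estimate the break-down probability for the next day as 1/(n + 2).

```python
def naive_breakdown_probability(failure_free_days):
    """Laplace's rule of succession with zero observed break-downs:
    (0 + 1) / (n + 2)."""
    return 1 / (failure_free_days + 2)

# The longer the car runs without breaking down, the lower the naive estimate...
for days in (0, 30, 365):
    print(days, naive_breakdown_probability(days))

# ...but nothing in this single-measure picture captures the 'bucket curve'
# evidence that the true rate rises as the car ages.
```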

### Possible Extensions

If we wish to apply measure theory, but are unconvinced that the analogous sets are actually measurable, we might consider upper and lower measures. As measures these satisfy the Kolmogorov axioms, apart from being normalised. It is natural to add that:

The upper measure is no lower than the lower measure.

The lower measure of a set plus the upper measure of the complement of that set add to 1.

Ordinary probability theory is then a special case. A difficulty of this approach is that probability values are no longer uniquely determined, as they are in the ordinary case, and it is not clear that the axioms are always enough.
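The two added conditions can be sketched as code (the class and names are mine, purely for illustration):

```python
class IntervalProbability:
    """An event's probability given only as lower and upper measures."""

    def __init__(self, lower, upper):
        # Condition 1: the upper measure is no lower than the lower measure.
        assert 0 <= lower <= upper <= 1
        self.lower = lower
        self.upper = upper

    def complement(self):
        # Condition 2: lower(A) + upper(not-A) = 1, and vice versa.
        return IntervalProbability(1 - self.upper, 1 - self.lower)

    def is_precise(self):
        # Ordinary probability is the special case lower == upper.
        return self.lower == self.upper

# Hitting an awkward sub-set of the dartboard: only bounds are available.
hit = IntervalProbability(0.2, 0.5)
miss = hit.complement()
print(miss.lower, miss.upper)  # 0.5 0.8
assert abs(hit.lower + miss.upper - 1) < 1e-12
```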

Kolmogorov’s theory is related to the law of large numbers. One could interpret upper and lower probabilities as putting bounds on long-run proportions, but one still might wish to discriminate between the case where the sample proportions will converge to some value that is only defined to within some range and the case where the long-run behaviour might never settle, but wanders within a range (like a wobbly roulette wheel).

Alternatively, many systems do seem to resemble Markov processes in the short run, so one might regard Kolmogorov’s theory as reasonable in the short run, so long as nothing is happening to unsettle the supposed mechanism. Perhaps no alternative theory is needed, as long as one does not rely on it to make longer-run predictions. (That is, we regard the measure as ‘the current measure’ without relying on it being absolutely fixed.) This may be closest to what Kolmogorov intended.
