Law of Great Numbers: Keynes’ Treatise
Keynes’ Treatise on Probability discusses ‘the law of great numbers’, now more familiar as ‘the law of large numbers’, at some length. Roughly speaking, the law says that in the long run sample frequencies tend, reasonably fast (depending on your assumptions), to probabilities.
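For reference, one standard modern form of the claim can be written down as a sketch. It is not Keynes’ own notation: it assumes n independent trials with the same success probability p, and writes S_n for the number of successes and ε for any fixed tolerance. The bound is the usual Chebyshev one.

```latex
\[
  \Pr\!\left( \left| \frac{S_n}{n} - p \right| \ge \varepsilon \right)
  \;\le\; \frac{p(1-p)}{n\,\varepsilon^{2}}
  \;\longrightarrow\; 0
  \quad (n \to \infty),
  \qquad \text{for every } \varepsilon > 0 .
\]
```

The independence and constant-p assumptions are exactly the conditions that Keynes goes on to question.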
Ch. XXVIII The Law of Great Numbers
Within the part dealing with statistical inference, Keynes says of Poisson’s introduction of the ‘law’:
This is the language of exaggeration; it is also extremely vague. But it is exciting; it seems to open up a whole new field to scientific investigation; and it has had a great influence on subsequent thought. Poisson seems to claim that, in the whole field of chance and variable occurrence, there really exists, amidst the apparent disorder, a discoverable system. Constant causes are always at work and assert themselves in the long run, so that each class of event does eventually occur in a definite proportion of cases. It is not clear how far Poisson’s result is due to à priori reasoning, and how far it is a natural law based on experience; but it is represented as displaying a certain harmony between natural law and the à priori reasoning of probabilities.
On applications of the supposed law, Keynes notes:
The existence of numerous instances of the Law of Great Numbers, or of something of the kind, is absolutely essential for the importance of Statistical Induction. Apart from this the more precise parts of statistics, the collection of facts for the prediction of future frequencies and associations, would be nearly useless. But the ‘Law of Great Numbers’ is not at all a good name for the principle which underlies Statistical Induction. The ‘Stability of Statistical Frequencies’ would be a much better name for it. The former suggests, as perhaps Poisson intended to suggest, but what is certainly false, that every class of event shows statistical regularity of occurrence if only one takes a sufficient number of instances of it. It also encourages the method of procedure, by which it is thought legitimate to take any observed degree of frequency or association, which is shown in a fairly numerous set of statistics, and to assume with insufficient investigation that, because the statistics are numerous, the observed degree of frequency is therefore stable. Observation shows that some statistical frequencies are, within narrower or wider limits, stable. But stable frequencies are not very common, and cannot be assumed lightly.
Ch. XXIX The Use of A Priori Probabilities for the Prediction of Statistical Frequency …
Bernoulli’s Theorem [concerning the variability of sample proportions] is generally regarded as the central theorem of statistical probability. It embodies the first attempt to deduce the measures of statistical frequencies from the measures of individual probabilities, and …out of it the conception first arose of general laws amongst masses of phenomena, in spite of the uncertainty of each particular case. But, as we shall see, the theorem is only valid subject to stricter qualifications, than have always been remembered, and in conditions which are the exception, not the rule.
… Thus Bernoulli’s Theorem is only valid if our initial data are of such a character that additional knowledge, as to the proportion of failures and successes in one part of a series of cases is altogether irrelevant to our expectation as to the proportion in another part. …
Such a condition is very seldom fulfilled. If our initial probability is partly founded upon experience, it is clear that it is liable to modification in the light of further experience. It is, in fact, difficult to give a concrete instance of a case in which the conditions for the application of Bernoulli’s Theorem are completely fulfilled.
It seldom happens, therefore, that we can apply Bernoulli’s Theorem with reference to a long series of natural events. For in such cases we seldom possess the exhaustive knowledge which is necessary. Even where the series is short, the perfectly rigorous application of the Theorem is not likely to be legitimate, and some degree of approximation will be involved in utilising its results.
Adherents of the Frequency Theory of Probability, who use the principal conclusion of Bernoulli’s Theorem as the defining property of all probabilities, sometimes seem to mean no more than that, relative to given evidence, every proposition belongs to some series, to the members of which Bernoulli’s Theorem is rigorously applicable. But the natural series, the series, for example, in which we are most often interested, … is not, as a rule, rigorously subject to the Theorem.
If, for instance, balls are drawn from a bag, which is one, but it is not certainly known which, out of a number of bags containing black and white balls in differing proportions, the knowledge of the colour of the first ball drawn affects the probabilities at the second drawing, because it throws some light upon the question as to which bag is being drawn from.
This last type is that to which most instances conform which are drawn from the real world. A knowledge of the characteristics of some members of a population may give us a clue to the general character of the population in question. Yet it is this type, where there is a change in knowledge but no change in the material conditions from one instance to the next, which is most frequently overlooked.
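The bag example can be made concrete with a small calculation. The sketch below is only an illustration under assumptions that are not in Keynes’ text: two bags, containing black balls in proportions 1/4 and 3/4, equally likely to be the one in use, with each ball replaced after it is drawn. The function name `second_draw_probability` is likewise made up for the purpose.

```python
from fractions import Fraction


def second_draw_probability(black_fractions, prior):
    """Return (P(first ball black), P(second black | first black)).

    black_fractions: proportion of black balls in each candidate bag.
    prior: probability that each bag is the one being drawn from.
    Assumes draws with replacement, so the bag itself never changes.
    """
    # Marginal probability that the first ball is black.
    p_first = sum(w * b for w, b in zip(prior, black_fractions))
    # Probability that both of the first two balls are black.
    p_both = sum(w * b * b for w, b in zip(prior, black_fractions))
    return p_first, p_both / p_first


# Hypothetical numbers: two bags, equally likely a priori.
bags = [Fraction(1, 4), Fraction(3, 4)]
prior = [Fraction(1, 2), Fraction(1, 2)]
p_first, p_second_given_first = second_draw_probability(bags, prior)
print(p_first, p_second_given_first)
```

Nothing physical changes between the draws, yet under these assumptions a black first ball raises the probability of a black second ball from 1/2 to 5/8, because it makes the blacker bag more likely. This is the change in knowledge without a change in material conditions that Keynes describes.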
Keynes gives the following examples:
For consider the case of a coin of which it is given that the two faces are either both heads or both tails: at every toss, provided that the results of the other tosses are unknown, the probability of heads is 1/2 and the probability of tails is 1/2; yet the probability of m heads and m tails in 2m tosses is zero, and it is certain à priori that there will be either 2m heads or none. Clearly Bernoulli’s Theorem is inapplicable to such a case. And this is but an extreme case of a normal condition.
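To see the size of the discrepancy, the following sketch compares the two calculations. It assumes, as the passage suggests but does not state as a number, that the both-heads and both-tails coins are equally likely; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_balanced(m):
    """P(exactly m heads in 2m tosses) if tosses were independent with p = 1/2."""
    return Fraction(comb(2 * m, m), 2 ** (2 * m))


def double_faced_balanced(m):
    """P(exactly m heads in 2m tosses) for a coin that is two-headed or
    two-tailed, each with probability 1/2 (an assumption of this sketch)."""
    total = Fraction(0)
    for p_heads in (Fraction(0), Fraction(1)):
        # Given which coin it is, every toss has the same certain outcome.
        total += Fraction(1, 2) * comb(2 * m, m) * p_heads**m * (1 - p_heads)**m
    return total


# Probability of heads on any single toss, results of other tosses unknown.
single_toss_heads = Fraction(1, 2) * 0 + Fraction(1, 2) * 1

print(single_toss_heads, bernoulli_balanced(5), double_faced_balanced(5))
```

For m = 5 the independent-toss formula gives 63/256, while the double-faced coin gives exactly zero, even though the single-toss probability of heads is 1/2 in both cases.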
If we are given a penny of which we have no reason to doubt the regularity, the probability of heads at the first toss is 1/2; but if heads fall at every one of the first 999 tosses, it becomes reasonable to estimate the probability of heads at the thousandth toss at much more than 1/2. For the à priori probability of its being a conjurer’s penny, or otherwise biassed so as to fall heads almost invariably, is not usually so infinitesimally small as (1/2)^1000. We can only apply Bernoulli’s Theorem with rigour for a prediction as to the penny’s behaviour over a series of a thousand tosses, if we have à priori such exhaustive knowledge of the penny’s constitution and of the other conditions of the problem that 999 heads running would not cause us to modify in any respect our prediction à priori.
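The penny example can be sketched as a simple two-hypothesis update. The numbers here are illustrative assumptions, not Keynes’: the penny is either fair or a trick penny that always falls heads (a simplification of his ‘almost invariably’), and the prior probability of a trick penny is taken to be one in a million. The function name `prob_next_head` is hypothetical.

```python
from fractions import Fraction


def prob_next_head(prior_trick, heads_seen):
    """Probability of heads on the next toss after `heads_seen` heads in a row.

    prior_trick: prior probability that the penny always falls heads.
    Otherwise the penny is assumed fair, with independent tosses.
    """
    # Likelihood of the observed run under each hypothesis, weighted by prior.
    fair_weight = (1 - prior_trick) * Fraction(1, 2) ** heads_seen
    trick_weight = prior_trick  # a trick penny yields the run with certainty
    posterior_trick = trick_weight / (trick_weight + fair_weight)
    # Heads is certain for the trick penny and has probability 1/2 otherwise.
    return posterior_trick + (1 - posterior_trick) * Fraction(1, 2)


prior = Fraction(1, 10**6)  # hypothetical prior for a conjurer's penny
print(float(prob_next_head(prior, 0)))    # before any tosses: barely above 1/2
print(float(prob_next_head(prior, 999)))  # after 999 heads: very close to 1
```

Even a prior of one in a million for the trick penny vastly exceeds (1/2)^999, so after the run the trick hypothesis dominates and the estimate for the thousandth toss moves far above 1/2, as Keynes argues.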