Who thinks probability is just a number? A plea.

Many people think – perhaps they were taught it – that it is meaningful to talk about the unconditional probability of ‘Heads’ (i.e. P(Heads)) for a real coin, and even that there are logical or mathematical arguments to this effect. I have been collecting and commenting on works which have been – too widely – interpreted in this way, and quoting their authors in contradiction. De Finetti seemed to be the only respected figure who thought he had provided such an argument. But a friendly economist has just forwarded a link to a recent work that debunks this notion, based on a wider reading of his work.

So, am I done? Does anyone have any seeming mathematical sources for the view that ‘probability is just a number’ for me to consider?

I have already covered:

There are some more modern authors who make strong claims about probability, but – unless you know different – they rely on the above, and hence do not need to be addressed separately. I do also opine on a few less well known sources: you can search my blog to check.

Dave Marsay


Law of Great Numbers: Keynes’ Treatise

Keynes’ Treatise on Probability discusses ‘the law of great numbers’, now more familiar as ‘the law of large numbers’, at some length. Roughly speaking, this is the claim that, in the long run, sample frequencies tend reasonably fast (depending on your assumptions) to probabilities.

Ch. XXVIII The Law of Great Numbers

Within the part dealing with statistical inference, Keynes says of Poisson’s introduction of the ‘law’:

This is the language of exaggeration; it is also extremely vague. But it is exciting; it seems to open up a whole new field to scientific investigation; and it has had a great influence on subsequent thought. Poisson seems to claim that, in the whole field of chance and variable occurrence, there really exists, amidst the apparent disorder, a discoverable system. Constant causes are always at work and assert themselves in the long run, so that each class of event does eventually occur in a definite proportion of cases. It is not clear how far Poisson’s result is due to à priori reasoning, and how far it is a natural law based on experience; but it is represented as displaying a certain harmony between natural law and the à priori reasoning of probabilities.

On applications of the supposed law, Keynes notes:

The existence of numerous instances of the Law of Great Numbers, or of something of the kind, is absolutely essential for the importance of Statistical Induction. Apart from this the more precise parts of statistics, the collection of facts for the prediction of future frequencies and associations, would be nearly useless. But the ‘Law of Great Numbers’ is not at all a good name for the principle which underlies Statistical Induction. The ‘Stability of Statistical Frequencies’ would be a much better name for it. The former suggests, as perhaps Poisson intended to suggest, but what is certainly false, that every class of event shows statistical regularity of occurrence if only one takes a sufficient number of instances of it. It also encourages the method of procedure, by which it is thought legitimate to take any observed degree of frequency or association, which is shown in a fairly numerous set of statistics, and to assume with insufficient investigation that, because the statistics are numerous, the observed degree of frequency is therefore stable. Observation shows that some statistical frequencies are, within narrower or wider limits, stable. But stable frequencies are not very common, and cannot be assumed lightly.

Ch. XXIX The Use of A Priori Probabilities for the Prediction of Statistical Frequency …

Bernoulli’s Theorem [concerning the variability of sample proportions] is generally regarded as the central theorem of statistical probability. It embodies the first attempt to deduce the measures of statistical frequencies from the measures of individual probabilities, and …out of it the conception first arose of general laws amongst masses of phenomena, in spite of the uncertainty of each particular case. But, as we shall see, the theorem is only valid subject to stricter qualifications, than have always been remembered, and in conditions which are the exception, not the rule.

… Thus Bernoulli’s Theorem is only valid if our initial data are of such a character that additional knowledge, as to the proportion of failures and successes in one part of a series of cases is altogether irrelevant to our expectation as to the proportion in another part.

Such a condition is very seldom fulfilled. If our initial probability is partly founded upon experience, it is clear that it is liable to modification in the light of further experience. It is, in fact, difficult to give a concrete instance of a case in which the conditions for the application of Bernoulli’s Theorem are completely fulfilled.

It seldom happens, therefore, that we can apply Bernoulli’s Theorem with reference to a long series of natural events. For in such cases we seldom possess the exhaustive knowledge which is necessary. Even where the series is short, the perfectly rigorous application of the Theorem is not likely to be legitimate, and some degree of approximation will be involved in utilising its results.

Adherents of the Frequency Theory of Probability, who use the principal conclusion of Bernoulli’s Theorem as the defining property of all probabilities, sometimes seem to mean no more than that, relative to given evidence, every proposition belongs to some series, to the members of which Bernoulli’s Theorem is rigorously applicable. But the natural series, the series, for example, in which we are most often interested, … is not, as a rule, rigorously subject to the Theorem.

If, for instance, balls are drawn from a bag, which is one, but it is not certainly known which, out of a number of bags containing black and white balls in differing proportions, the knowledge of the colour of the first ball drawn affects the probabilities at the second drawing, because it throws some light upon the question as to which bag is being drawn from.

This last type is that to which most instances conform which are drawn from the real world. A knowledge of the characteristics of some members of a population may give us a clue to the general character of the population in question. Yet it is this type, where there is a change in knowledge but no change in the material conditions from one instance to the next, which is most frequently overlooked.

Keynes gives the following examples:

For consider the case of a coin of which it is given that the two faces are either both heads or both tails: at every toss, provided that the results of the other tosses are unknown, the probability of heads is 1/2 and the probability of tails is 1/2; yet the probability of m heads and m tails in 2m tosses is zero, and it is certain à priori that there will be either 2m heads or none. Clearly Bernoulli’s Theorem is inapplicable to such a case. And this is but an extreme case of a normal condition.

If we are given a penny of which we have no reason to doubt the regularity, the probability of heads at the first toss is 1/2; but if heads fall at every one of the first 999 tosses, it becomes reasonable to estimate the probability of heads at the thousandth toss at much more than 1/2. For the à priori probability of its being a conjurer’s penny, or otherwise biassed so as to fall heads almost invariably, is not usually so infinitesimally small as (1/2)^1000. We can only apply Bernoulli’s Theorem with rigour for a prediction as to the penny’s behaviour over a series of a thousand tosses, if we have à priori such exhaustive knowledge of the penny’s constitution and of the other conditions of the problem that 999 heads running would not cause us to modify in any respect our prediction à priori.
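
As a rough check on Keynes’ arithmetic here, a minimal sketch in Python (the one-in-a-million prior for a conjurer’s penny is my own illustrative figure, not Keynes’):

    # Even a tiny prior probability of a heads-only penny dominates after 999 heads.
    prior_conjurer = 1e-6                      # P(penny always falls heads) -- assumed
    prior_fair = 1 - prior_conjurer

    likelihood_fair = 0.5 ** 999               # P(999 heads | fair penny)
    likelihood_conjurer = 1.0                  # P(999 heads | conjurer's penny)

    posterior_conjurer = (prior_conjurer * likelihood_conjurer /
                          (prior_conjurer * likelihood_conjurer + prior_fair * likelihood_fair))

    p_heads_next = posterior_conjurer * 1.0 + (1 - posterior_conjurer) * 0.5
    print(posterior_conjurer, p_heads_next)    # both effectively 1: far more than 1/2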

Dave Marsay

Maths for the Modern Economy

Mathematics Today Vol. 53 No. 5 October 2017

Extracts concerning uncertainty

Maths for the Modern Economy at the Royal Society (pg 209)

Both talks described the role of a mathematician in understanding the assumptions that have gone into a decision, subsequently challenging and testing these, and made a compelling case for employing [mathematicians].

Optimising Resilience: at the Edge of Computability (pg 233)

A number of authors argued that the classical Bayesian approaches can fail and a different type of is required to capture partial or conflicting sources of information.

The value of p-values (Letters, pg 241)

[As] one digs more deeply into any school of inference, difficulties arise. This should not be surprising: we are trying to make inferences about the real world, not some mathematical idealisation. Modern thought recognises that different schools of inference are helpful in shedding light on different kinds of question: there is, and indeed can be, no universally best method.

… Bayesian methodology is coherent – meaning that it is internally consistent. [This] is important, but our ultimate objective is to make a statement about the real world, so the key question is how well the data and our theories match. … [The] old criticism [is] that the Bayesian approach leads to the tail of mathematical coherence wagging the dog of the scientific question.

Comments

The first quote implies that uncertainty is important and widespread, and that mathematicians have a role in uncovering and dealing with it. The second quote implies that even those who work on ‘quantifying uncertainty’ recognize that this is ultimately impossible. The third quote is part of a suggestion that – done properly – the statistician’s p-values may be a useful adjunct to probabilities. I agree. But I do not think that this is always enough.

It seems to me that one can often derive a reasonable probability based on assumptions that are generally accepted by some community, or possibly derive different probabilities for different communities. The next most significant stage is not to supplement the probability/probabilities by some number, but to explicate the assumptions and to develop credible scenarios under which they may fail.

For example, if asked for the probability of a run of Heads when tossing a coin, we might calculate the answer for the usual assumptions and then point out how these could fail (e.g. by using a carefully engineered machine to toss).
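
A minimal sketch of the calculation under the usual assumptions (independent tosses with a fixed chance of heads; the run length here is arbitrary):

    def p_run_of_heads(k, p_heads=0.5):
        """Probability of k heads in a row, assuming independent tosses with fixed P(heads)."""
        return p_heads ** k

    print(p_run_of_heads(10))   # 1/1024 -- but only while the assumptions hold

The arithmetic is the trivial part; as suggested above, the real work lies in explicating when the assumptions fail.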

Dave Marsay

The search for MH370: uncertainty

There is an interesting podcast about the search for MH370 by a former colleague. I think it illustrates in a relatively accessible form some aspects of uncertainty.

According to the familiar theory, if one has an initial probability distribution over the globe for the location of MH370’s flight recorder, say, then one can update it using Bayes’ rule to get a refined distribution. Conventionally, one should search where there is a higher probability density (all else being equal). But in this case it is fairly obvious that there is no principled way of deriving an initial distribution, and even Bayes’ rule is problematic. Conventionally, one should do the best one can, and search accordingly.
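
For concreteness, here is a minimal sketch of the conventional update (the cells, prior and detection probability are all invented for illustration): after an unsuccessful search of a cell, Bayes’ rule shifts probability away from it.

    import numpy as np

    prior = np.array([0.5, 0.3, 0.2])   # P(wreckage in cell i) -- invented numbers
    p_detect = 0.8                      # P(we find it | we search the cell it is in)

    def update_after_failed_search(prior, searched_cell, p_detect):
        """Bayes' rule given the observation 'searched this cell and found nothing'."""
        likelihood = np.ones_like(prior)
        likelihood[searched_cell] = 1 - p_detect
        posterior = prior * likelihood
        return posterior / posterior.sum()

    print(update_after_failed_search(prior, 0, p_detect))
    # the searched cell falls from 0.50 to about 0.17; the other cells rise

The machinery is straightforward; the difficulty, as noted above, is that neither the prior nor the likelihoods are given in any principled way.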

The podcaster (Simon) gives examples of some hypotheses (such as the pilot being well, well-motivated and unhindered throughout) for which the probabilistic approach is more reasonable. One can then split one’s effort over such credible hypotheses, not ruled out by evidence.

A conventional probabilist would note that any ‘rational’ search would be equivalent to some initial probability distribution over hypotheses, and hence some overall distribution. This may be so, but it is clear from Simon’s account that this would hardly be helpful.

I have been involved in similar situations, and have found it easier to explain the issues to non-mathematicians when there is some severe resource constraint, such as time. For example, we are looking for a person. The conventional approach is to maximise our estimated probability of finding them based on our estimated probabilities of them having acted in various ways (e.g., run for it, hunkered down). An alternative is to consider the ways they may ‘reasonably’ be thought to have acted and then to seek to maximize the worst case probability of finding them. Then again, we may have a ranking of ways that they may have acted, and seek to maximize the number of ways for which the probability of our success exceeds some acceptable amount (e.g. 90%). The key point here is that there are many reasonable objectives one might have, for only one of which the conventional assumptions are valid. The relevant mathematics does still apply, though!
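
A toy sketch of how these objectives can pull apart (the two hypotheses, the detection curve and all the numbers are invented for illustration):

    import numpy as np

    def p_find(effort):
        """Chance of finding the person if we put this fraction of our effort on the
        hypothesis that is actually true (an invented diminishing-returns curve)."""
        return 1 - np.exp(-3 * effort)

    belief = {"run": 0.7, "hide": 0.3}   # conventional probabilities over the hypotheses
    threshold = 0.9

    best = {}
    for e_run in np.linspace(0, 1, 101):
        p = {"run": p_find(e_run), "hide": p_find(1 - e_run)}
        scores = {
            "expected": belief["run"] * p["run"] + belief["hide"] * p["hide"],
            "worst_case": min(p.values()),
            "above_threshold": sum(v >= threshold for v in p.values()),
        }
        for name, score in scores.items():
            if name not in best or score > best[name][0]:
                best[name] = (score, e_run)

    for name, (score, e_run) in best.items():
        print(f"{name:16s} best split: {e_run:.2f} on 'run', score {score:.2f}")

Under these made-up numbers the expected-value objective loads effort onto the more probable hypothesis, the worst-case objective splits it evenly, and the threshold objective concentrates on a single hypothesis that can be pushed past 90%.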

Dave Marsay

The Tipping Point

Malcolm Gladwell The Tipping Point: How little things can make a big difference Abacus 2001.

Introduction

The Tipping Point is the biography of an idea: … that the best way to understand … any number of mysterious changes that mark everyday life is to think of them as epidemics.

… The name given to one dramatic moment in an epidemic when everything can change at once is the Tipping Point.

1. The three rules of epidemics

The three rules of the Tipping Point – the Law of the Few, the Stickiness Factor, the Power of Context – offer a way of making sense of epidemics. They provide us with direction for how to go about reaching a Tipping Point.

2. The law of the few: connectors, mavens, salesmen

…the success of any kind of social epidemic is heavily dependent on … Connectors, Mavens and Salesmen.

… Connectors know lots of people. [They] collect people the way others collect stamps.

[Just] as there are people we rely upon to connect us to other people, there are also people we rely upon to connect us to new information: [Mavens].

For a social epidemic to start, though, some people are going to have to be persuaded to do something.

Mavens are data banks. They provide the message. Connectors are social glue: they spread it. But there are also … Salesmen with the skills to persuade us when we are unconvinced of what we are hearing.

The Stickiness Factor

[The] elements that make [things] sticky [often] turn out to be small and … seemingly trivial … .

The Power of Context

Epidemics are sensitive to the conditions and times and places in which they occur.

The mistake we make in thinking of character as something unified and all-encompassing is very similar to a blind spot in the way we process information. Psychologists call this tendency the Fundamental Attribution Error (FAE) … we are always seeking a “dispositional” explanation for events, as opposed to a contextual explanation.

Character [is not] a stable, easily identifiable set of closely related traits … . Character is more like a bundle of habits and tendencies and interests, loosely bound together and dependent, at certain times, on circumstance and context.

[A] number of relatively minor changes in our external environment can have a dramatic effect on how we behave and who we are.

7. Case Study: Smoking

[There] can be a safer form of smoking, and by paying attention to the Tipping Points of the addiction process we can make that safer, less sticky form of smoking possible.

8. Conclusion: Focus, Test and Believe

The lessons of the Tipping Point:

  1. Your resources ought to be concentrated on [Connectors, Mavens and Salesmen].
  2. Those who are successful … deliberately test their intuitions.
  3. What must underlie successful epidemics, in the end, is a bedrock belief that change is possible, that people can radically change their behaviour or beliefs in the face of the right kind of impetus.

In the end, Tipping Points are a reaffirmation of the potential for change and the power of intelligent action.

My Comments

The book is almost entirely about social tipping points, but the definition it gives is quite general, and people have since been using the term quite generally. Of course, not all tipping points are like epidemics. For example, if you keep loading up a bridge it will eventually fail, with nothing like the phenomena described here. But can the financial crash be thought of as an epidemic, and did it have a tipping point?

It seems to me that both booms and busts involve a certain amount of copying behaviour, and so have elements of epidemics. But was there “one dramatic moment in an epidemic when everything can change at once”? It seemed to me that there was quite an extended period when the financial markets were vulnerable. Even if one focusses on periods when things did change, it seemed to me that – as in some biological epidemics – there were a number of actual changes that set up the conditions for further changes, before ‘the big one’ came. Thus there would seem to be no single moment to focus on: one should try to appreciate the whole chain of events.

One can draw an analogy with the UK’s BSE crisis: before it hit the news there had been a spreading practice – under competitive pressure – of introducing cycles into the animal feed system. This created a vulnerability which persisted until an effect (BSE) was noticed. But more important than the outbreak of BSE was, arguably, the spread of risky animal husbandry practices, and its ‘tipping point’.

Despite the above reservations, the three rules still seem appropriate. For the financial crisis the boom seems mainly to have been fuelled by universities and the media, the bust mainly by the media. In both cases there was something of a social process as described by Gladwell. But it seems to me that a key part of the boom was a forgetting of previous insights, so the role of the Maven was to legitimise disinformation, not to host good quality information, as for BSE. Perhaps this is not restricted to this example. What is curated is not so much ‘information’ as ideas – valid or not. In the bust phase the Maven role was rather trivial, just as for BSE.

In terms of Gladwell’s lessons, one should not only focus on Mavens, but be concerned for the quality of their ‘information’. It is not enough that they be successful in the short-run: the activity they foster needs to be sustainable.

Dave Marsay

Toulmin’s Uses of Argument

Stephen E. Toulmin The Uses of Argument (Updated Edition) CUP 2003 (Original 1958).

This is well regarded by many, hence the new edition. Toulmin thinks of claims as being justified if there are warranted arguments from specified data. Warrants apply to ‘fields’. These field-dependent warrants may either be general to the field, or require backing for their application. Justified claims may be treated as if they were true.

Toulmin recognises the claim of logic and mathematics to a superior kind of warrant, but points out that these do not bear directly on the ‘real world’. He is (rightly) scathing about those who seek to (mis) apply such logic and mathematics outside their proper sphere.

A logical difficulty is that the use of warrants would seem to require some sort of argumentation, which would depend on warrants, and so on in indefinite regress. Toulmin supposes that this process bottoms out with some definite field-dependent warrant that is held to be absolutely true.

Toulmin considers a range of alternative views, dismissing most of them with informal yet persuasive arguments. The exception is a view resembling Russell’s Theory of Knowledge. He is very rude about it, but I can see no actual argument.

I find Toulmin’s notions of claims, data, warrants and backing helpful. But logically I agree with Russell, that any attempt to ground out a field in absolute warrants will turn that field into dogma. Thus where Toulmin claims that some claims can certainly be justified, I would only go so far as to say that claims can be justified relative to a field. For example, a claim about economics will only be justified according to some dogmatic field, such as neo-classical economics. In effect, Toulmin supposes that we will have unquestionable fields, such as neo-classical economics, so we can always treat a justified neo-classical claim as if it were a justified claim in economics. (Or equivalently, treat economics as if it were neo-classical.) But I agree with Russell (and Keynes) that we should always entertain some doubt about this, no matter how little.

We could reconcile Toulmin with Russell et al by supposing that Toulmin is concerned with argumentation within the field, whereas Russell et al also consider arguments about the field. This distinction mirrors Whitehead. For example, it seems to me that most fields rely on some form of induction. This may be the best one can do, and essential to having a productive field, but – as Russell et al point out – is not absolutely reliable.

Toulmin notes the importance of distinctions. If a field ignores certain distinctions then any claims will only be justified to the extent that the field is justified in ignoring the distinctions. Toulmin also has some discussion of the need for complete relevant data. I am not clear how, in practice, one would warrant this, for example if some distinctions have not been made. So maybe I should regard a claim as warranted relative to distinctions made and the assumptions that informed the selection of data, backing and warrants.

Toulmin also has an interesting concept of probability, but my only comment here is that it, too, would seem to merit reviewing with the work of Russell et al in mind. Overall, Toulmin’s approach seems like a refinement of common-sense applicable to mono-cultural deliberative reasoning to a definitive conclusion in routine, stable, situations. My own experience is that claims often need adjusting, or caveating, to be justifiable.

Introduction

[Nothing] in what follows pretends to be final, and I shall have fulfilled my purpose if the results are found suggestive.

I Fields of Argument and Modals

The words of some men are trusted simply on account of their reputation for caution, judgment and veracity. But this does not mean that their right to our confidence cannot arise in the case of all their assertions: only, that we are confident that any claim they make weightily and seriously will in fact prove to be well-founded, to have a sound case behind it, to deserve – have a right to – our attention on its merits.

Note a distinction. It may be that we are confident that someone will only make claims that are justified according to their own warrants, or that they will make claims that we would find warranted. Where understanding of an area differs, trusted people will presumably restrict their claims so that they are warranted for both themselves and their intended audience. Toulmin does not discuss this, and in any case assumes some common underpinning of warrants.

In order for a suggestion to be a ‘possibility’ in any context … it must have ‘what it takes’ in order to be entitled to genuine consideration in that context. To say, in any field, ‘Such-and-such is a possible answer to our question’, is to say that, bearing in mind the nature of the problem concerned, such-and-such answer deserves to be considered. This much of the meaning of the term ‘possible’ is field invariant. The criteria of possibility, on the other hand, are field-dependent … .

For example ‘yes’ seems not to have been a possible answer to ‘might the economy crash within the next 5 years’ in 2005, if the field was neoclassical economics (or economics viewed neo-classically).

III The Layout of Arguments

[The] applicability of a particular warrant is one question: the result we shall get from applying the warrant is another matter, and in asking about the correctness of the result we may have to enquire into both things independently.

Toulmin introduces a graphical representation:

Data (D)  ——>  So, Claim (C), subject to Qualifier (Q)
                   |                            |
                 Since                        Unless
              Warrant (W)                  Rebuttal (R)
                   |
             On account of
              Backing (B)

Some warrants must be accepted provisionally without challenge, if argument is open to us in the field in question: we should not even know what sort of data were of the slightest relevance to a conclusion, if we had not at least a provisional idea of the warrants acceptable in the situation confronting us. The existence of considerations such as would establish the acceptability of the most reliable warrants is something we are entitled to take for granted.

He gives as example:

D (Petersen is a Swede)  —>  So, Q (almost certainly), C (Petersen is not a Roman Catholic)
                |
           Since, W (A Swede can be taken to be almost certainly not a Roman Catholic)
                |
           Because, B (The proportion of Roman Catholic Swedes is less than 2%)

This seems doubtful to me, unless – for example – Petersen was selected at random from all Swedes. But later Toulmin opines:

If we imagine [someone] to challenge [the above] argument, and to demand further backing to show its validity, his request will be no more intelligible than [if the backing were that no Swedes were Roman Catholic]. … If he fails to see the force of the argument, there is little more we can do for him … [The] ability to follow such arguments is, surely, one of the basic rational competences.

Before this, he had noted:

[The] acceptability of a novel warrant is made clear by applying it successively in a number of cases in which both ‘data’ and ‘conclusion’ have been independently verified.

A regular prediction, made in accordance with the standard equations of stellar dynamics, is in this sense an unquestionable deduction.

It seems to me that for Toulmin the ‘basic rational competence’, above, provides a kind of universal warrant. Clearly, one could use ‘almost certainly’ in the way Toulmin advocates, but that would seem to leave a gap for a term with quite different meaning.

Toulmin goes on:

The ability to follow simple predictive arguments, whose warrants have been backed by sufficiently wide and relevant experience, may just have to be recognized as another simple rational skill, which most men possess but which is lacking in defectives … .

If we are prepared to acknowledge Newtonian mechanics is sufficiently well established for the purpose of the problem at hand, then we must accept [a particular conclusion] as following necessarily from our … data.

IV Working Logic and Idealised Logic

Two people who accept common procedures for testing warrants in any field can begin comparing the merits of arguments in that field: only where this condition is lacking, so that they have no common ground on which to argue, will rational assessment no longer be open to them.

Conclusion

[A] radical re-ordering of logical theory is needed in order to bring it more nearly into line with critical practice … .

If the same as has long been done for legal arguments were done for arguments of other types, logic would make great strides forward.

Accepting the need to begin by collecting for study the actual forms of argument current in any field, our starting point will be confessedly empirical … . … We must study the ways of arguing which have established themselves in any sphere…. knowing that they may be superseded, but only as the results of a revolutionary advance in our methods of thought. In some cases … the fact that they have established themselves in practice may be enough for us.

My comments

Fields seem to be ‘owned’ by groups resembling medieval guilds, and not open to outside criticism. While this possibly is how many fields operate, I do not think it is how they ‘should’.

Dave Marsay

Mayo & Spanos Error Statistics

Deborah G. Mayo and Aris Spanos, “Error Statistics”, in Philosophy of Statistics (Handbook of the Philosophy of Science, Volume 7), 2011 (General editors: Dov M. Gabbay, Paul Thagard and John Woods; Volume eds. Prasanta S. Bandyopadhyay and Malcolm R. Forster), Elsevier: 1-46.

Approach

Error statistics is an approach quite different from the Bayesian one: it concerns the probability that a test would have falsified a hypothesis were it false, not the probability that a hypothesis is true. The theory develops as follows:

Error probabilities are computed from the distribution of d(X), the sampling distribution, evaluated under various hypothesized values of θ.

Severity Principle (weak). Data x0 (produced by process G) do not provide good evidence for hypothesis H if x0 results from a test procedure with a very low probability or capacity of having uncovered the falsity of H, even if H is incorrect.

Severity Principle (full). Data x0 (produced by process G) provides good evidence for hypothesis H (just) to the extent that test T severely passes H with x0.

Example

Suppose we are testing whether and how much weight George has gained between now and the time he left for Paris, and do so by checking if any difference shows up on a series of well-calibrated and stable weighing methods, both before his leaving and upon his return. If no change on any of these scales is registered, even though, say, they easily detect a difference when he lifts a .1-pound potato, then this may be regarded as grounds for inferring that George’s weight gain is negligible within limits set by the sensitivity of the scales.…

A behavioristic rationale might go as follows: If one always follows the rule going from failure to detect a weight gain after stringent probing to inferring weight gain no greater than δ, then one would rarely be wrong in the long run of repetitions. While true, this is not the rationale we give in making inferences about George.

We may describe this as the notion that the long run error probability ‘rubs off’ on each application. What we wish to sustain is this kind of counterfactual statistical claim: that were George to have gained more than δ pounds, at least one of the scales would have registered an increase. This is an example of what philosophers often call an argument from coincidence: it would be a preposterous coincidence if all the scales easily registered even slight weight shifts when weighing objects of known weight, and yet were systematically misleading us when applied to an object of unknown weight.

Comment: The notion that this particular finding is reliable, rather than that the method is reliable ‘on average’, seems to me crucial. For example, a study might find that a medicine is effective with few side-effects for the population as a whole, from which it is imputed that the medicine will probably be good for me. But such claims have not been ‘severely tested’.

On the other hand, being a little pedantic, I am suspicious of the George example:

  • Most people’s weight measurably fluctuates throughout the day, generally being greater just after they have eaten or drunk. Hence George weighing just the same seems an unlikely coincidence.
  • If George knew he was going to be weighed, might he not have cheated, perhaps by varying the number of coins in his pocket to equalise the weight?

In the kind of survey results that the authors envisage this may not be a problem, but it does seem that the notion of severity is not completely general.

Severity

This is characterised by the notion of ‘Passing a Severe Test’, which the paper encapsulates as follows:

A hypothesis H passes a severe test T with data x0 if

  • (S-1) x0 accords with H, (for a suitable notion of accordance) and
  • (S-2) with very high probability, test T would have produced a result that accords less well with H than x0 does, if H were false or incorrect.

Equivalently, (S-2) can be stated:

  • (S-2)*: with very low probability, test T would have produced a result that accords as well as or better with H than x0 does, if H were false or incorrect.

Comment: The probabilities in S-2 depend on the sampling distribution, which is assumed known in the cases of interest to the authors. But otherwise, as with George, it might not be.
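
To make (S-2) a little more concrete, here is a minimal sketch of how severity is often computed in the standard one-sided Normal-mean illustration; the test, sample size and observed mean below are my own assumed numbers, not taken from the paper.

    from scipy.stats import norm

    # Assumed set-up: n observations of X ~ N(mu, sigma^2); test T: H0: mu <= 0 vs H1: mu > 0,
    # with the sample mean as the statistic d(X). The observed mean is small, so H0 is not
    # rejected, and we ask how severely claims of the form 'mu <= mu1' have been probed.
    sigma, n = 1.0, 25
    se = sigma / n ** 0.5        # standard error of the sample mean
    x_bar = 0.1                  # observed sample mean (assumed)

    def severity_of_upper_bound(mu1, x_bar, se):
        """P(the test yields a result according less well with 'mu <= mu1' than x_bar does),
        evaluated at the boundary mu = mu1 -- the (S-2) probability for this claim."""
        return norm.sf(x_bar, loc=mu1, scale=se)

    for mu1 in (0.1, 0.2, 0.3, 0.5):
        print(f"SEV(mu <= {mu1}) = {severity_of_upper_bound(mu1, x_bar, se):.3f}")

On these assumed numbers the claim ‘mu <= 0.5’ passes with high severity (about 0.98), while ‘mu <= 0.1’ does not (0.5): the same data probe different claims very differently.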

Likelihoods

The paper is critical of the likelihood principle (LP: that only the likelihood matters).

Maximally Likely alternatives. H0 might be that a coin is fair, and x0 the result of n flips of the coin. For each of the 2^n possible outcomes there is a hypothesis H∗i that makes the data xi maximally likely. For an extreme case, H∗i can assert that the probability of heads is 1 just on those tosses that yield heads, 0 otherwise. For any xi, P(xi; H0) is very low and P(xi; H∗i) is high — one need only choose the statistical hypothesis that renders the data maximally likely, i.e., H∗i. So the fair coin hypothesis is always rejected in favour of H∗i, even when the coin is fair. This violates the severity requirement since it is guaranteed to infer evidence of discrepancy from the null hypothesis even if it is true. The severity of ‘passing’ H∗i is minimal or 0.

Comment. This is true, but isn’t the likelihood principle about S-1, not S-2? The expected likelihood function peaks at the true value, but the sample likelihood function has some additional ‘noise’, so the peak may be offset. In assessing the bias (or otherwise) of a coin one should take account not only of where the likelihood has its peak but of how broad that peak is.

Alternatively, maybe the force of this example is that alternative methods just seek to identify the most likely value of some parameter, whereas in this case the authors want to give the hypothesis that the coin is fair a special role. I imagine that either approach can be justified, for different cases.
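
On the peak-versus-breadth point above, a small sketch (my own toy example, not the paper’s): for a genuinely fair coin the sample likelihood rarely peaks exactly at p = 0.5, yet p = 0.5 usually sits well inside the broad part of the peak.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    heads = rng.binomial(n, 0.5)                 # tosses of a genuinely fair coin

    p = np.linspace(0.01, 0.99, 99)              # candidate values of P(heads)
    log_lik = heads * np.log(p) + (n - heads) * np.log(1 - p)
    lik = np.exp(log_lik - log_lik.max())        # relative likelihood, peak scaled to 1

    print("observed heads:", heads, "maximum-likelihood estimate:", p[lik.argmax()])
    print("relative likelihood at p = 0.5:", lik[np.abs(p - 0.5).argmin()])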

The paper continues:

Holding the LP runs counter to distinguishing data on grounds of error probabilities of procedures. “According to Bayes’s theorem, P(x|μ) … constitutes the entire evidence of the experiment, that is, it tells all that the experiment has to tell. More fully and more precisely, if y is the datum of some other experiment, and if it happens that P(x|μ) and P(y|μ) are proportional functions of μ (that is, constant multiples of each other), then each of the two data x and y have exactly the same thing to say about the values of μ …”. [Savage, 1962, p. 17]

Comment: One needs to interpret Savage with care. He is saying that the estimates of μ should be the same in both cases. He is not saying that the amount of data considered is irrelevant.

The paper continues:

The holder of the LP considers the likelihood of the actual outcome, i.e., just d(x0), whereas the error statistician needs to consider, in addition, the sampling distribution of d(X) or other statistic being used in inference. In other words, an error statistician could use likelihoods in arriving at (S-1) the condition of accordance or fit with the data, but (S-2) additionally requires considering the probability of outcomes x that accord less well with a hypotheses of interest H, were H false.

Comment: It is not sufficient to consider just the likelihoods if one wants to consider such things as the probability that an experimental test would have falsified a hypothesis if false.
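
A standard textbook illustration of this difference (not from the paper; the counts and the null value are my own choices): two stopping rules that yield proportional likelihoods, and so count as the same evidence under the LP, but have different error probabilities.

    from scipy.stats import binom, nbinom

    # Observed: 9 successes and 3 failures; testing H0: p = 0.5 against p > 0.5.
    p0 = 0.5

    # Design 1: n = 12 trials were fixed in advance; 9 successes were observed.
    p_value_fixed_n = binom.sf(8, 12, p0)            # P(at least 9 successes in 12)

    # Design 2: trials continued until the 3rd failure; 9 successes were observed.
    # scipy's nbinom counts 'failures before the r-th success', so we relabel:
    # treat a failure of the experiment as a 'success' with probability 1 - p0
    # and count our successes as nbinom's 'failures'.
    p_value_stop_at_3 = nbinom.sf(8, 3, 1 - p0)      # P(at least 9 successes before the 3rd failure)

    print(p_value_fixed_n, p_value_stop_at_3)        # about 0.073 versus 0.033

The likelihood for p is proportional to p^9(1-p)^3 under either design, so a strict holder of the LP reports the same evidence; the error statistician, attending to the sampling distribution, reports different p-values.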

Comment

The paper conflates two things:

  1. The notion of severe testing, showing how meaningful the failure to falsify a hypothesis is.
  2. The assumption that there will be some natural ‘null hypothesis’ that should be given a special standing.

The first point is important. The second is important when true. For example, in medicine the ‘null effect’ hypothesis seems natural. But in other settings, such as measuring the strength of a new material, the Bayesian approach seems more apt, and one is interested in the probable error (or similar) of the estimate, not its falsifiability. It seems important, then, to be aware of the underlying theory and apply it as appropriate, rather than seek a universal method.

A supplement to severe testing would be to give a set of hypotheses that with some significant probability could have given rise to data that accords with H at least as well as the actual data does and which taken together are representative of the maximal set of such hypotheses. For example one would expect that for a coin that was actually fair, this set would nearly always include the hypothesis of a fair coin. (Variations are possible.)
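
One crude way of producing such a set for the coin case, as a sketch only (the data, grid and threshold are my own choices, and this is just one of the possible ‘variations’): keep every bias value under which the observed count would not be surprising.

    import numpy as np
    from scipy.stats import binom

    n, k_obs, alpha = 100, 53, 0.05              # assumed data: 53 heads in 100 tosses

    # Keep each bias p whose central (1 - alpha) band of outcomes contains k_obs,
    # i.e. hypotheses under which an outcome like ours would not be surprising.
    kept = [p for p in np.linspace(0.01, 0.99, 99)
            if binom.ppf(alpha / 2, n, p) <= k_obs <= binom.ppf(1 - alpha / 2, n, p)]

    print(f"bias values not ruled out: roughly {kept[0]:.2f} to {kept[-1]:.2f}")
    # for data generated by a genuinely fair coin this set almost always contains 0.5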

The notion of ‘severe testing’ could also do with generalising beyond the scope of this paper.

Dave Marsay

Instrumental Probabilities

Reflecting on my recent contribution to the economics ejournal special issue on uncertainty (comments invited), I realised that from a purely mathematical point of view, the current mainstream mathematical view, as expressed by Dawid, could be seen as a very much more accessible version of Keynes’. But there is a difference in expression that can be crucial.

In Keynes’ view ‘probability’ is a very general term, so that it is always legitimate to ask about the probability of something. The challenge is to determine the probability, and in particular whether it is just a number. In some usages, as in Kolmogorov, the term probability is reserved for those cases where certain axioms hold. In such cases the answer to a request for a probability might be to say that there isn’t one. This seems safe, even if it conflicts with the questioner’s presuppositions about the universality of probabilities. The instrumentalist view of Dawid, however, suggests that probabilistic methods are tools that can always be used. Thus the probability may exist even if it does not have the significance that one might think and, in particular, even if it is not appropriate to use it for ‘rational decision making’.

I have often come across seemingly sensible people who use ‘sophisticated mathematics’ in strange ways. I think perhaps they take an instrumentalist view of mathematics as a whole, and not just of probability theory. This instrumentalist mathematics reminds me of Keynes’ ‘pseudo-mathematics’. But the key difference is that mathematicians, such as Dawid, know that the usage is only instrumentalist and that there are other questions to be asked. The problem is not the instrumentalist view as such, but the dogma (of at least some) that it is heretical to question widely used instruments.

The financial crises of 2007/8 were partly attributed by Lord Turner to the use of ‘sophisticated mathematics’. From Keynes’ perspective it was the use of pseudo-mathematics. My view is that if it is all you have then even pseudo-mathematics can be quite informative, and hence worthwhile. One just has to remember that it is not ‘proper’ mathematics. In Dawid’s terminology the problem seems to be the instrumental use of mathematics without any obvious concern for its empirical validity. Indeed, since his notion of validity concerns limiting frequencies, one might say that the problem was the use of an instrument that was stunningly inappropriate to the question at issue.

It has long seemed to me that a similar issue arises with many miscarriages of justice, intelligence blunders and significant policy mis-steps. In Keynes’ terms, people are relying on a theory that simply does not apply. In Dawid’s terms one can put it more bluntly: decision-takers were relying on the fact that something had a very high probability when they ought to have been paying more attention to the evidence in the actual situation, which showed that the probability was – in Dawid’s terms – empirically invalid. It could even be that the thing with a high instrumental probability was very unlikely, all things considered.

Decision-making under uncertainty: ‘after Keynes’

I have a new discussion paper. I am happy to take comments here, on LinkedIn, at the more formal Economics e-journal site or by email (if you have it!), but wish to record substantive comments on the journal site while continuing to build up a site of whatever any school of thought may think is relevant, with my comments, here.

Please do comment somewhere.

Clarifications

I refer to Keynes’ ‘weights of argument’ mostly as something to be taken into account in addition to probability. For example, if one has two urns, each with a mix of 100 otherwise identical black and white balls, where the first urn is known to have equal numbers of each colour but the mix for the other urn is unknown, then conventionally one has an equal probability of drawing a black ball from each urn, but the weight of argument is greater for the first than for the second.
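
A small sketch of the urn example (the uniform prior over the second urn’s composition is my own modelling choice, not Keynes’): the two probabilities agree, but a single observation bears on them very differently, which is one way of seeing the difference in weight.

    import numpy as np

    # Urn A: known to hold 50 black and 50 white balls. Urn B: 100 balls, unknown mix,
    # modelled here by a uniform prior over the number of black balls (an assumption).
    n = 100
    prior_B = np.ones(n + 1) / (n + 1)               # P(urn B holds k black balls)

    p_black_A = 0.5
    p_black_B = sum(prior_B[k] * k / n for k in range(n + 1))
    print(p_black_A, p_black_B)                      # both 0.5: the bare probabilities agree

    # After drawing one black ball (without replacement), urn A gives 49/99 for the next
    # draw, while for urn B the draw also tells us about the mix, so the answer moves more.
    posterior_B = prior_B * np.arange(n + 1) / n     # Bayes' rule: P(k | first draw was black)
    posterior_B /= posterior_B.sum()
    p_next_black_B = sum(posterior_B[k] * (k - 1) / (n - 1) for k in range(1, n + 1))
    print(49 / 99, p_next_black_B)                   # about 0.49 versus about 0.67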

Keynes does not fully develop his notion of weights and it seems not to be well understood, and I wanted my overview of Keynes’ views to be non-contentious. But from some off-line comments it seems that I should clarify.

Ch. VI para 8 is worth reading, followed by Ch. III para 8. Whatever the weight may be, it is ‘strengthened by’:

  • Being more numerous.
  • Having been obtained with a greater variety of conditions.
  • Concerning a greater generalisation.

Keynes argues that this weight cannot be reduced to a single number, and so weights can be incomparable. He uses the term ‘strength’ to indicate that something is increased while recognizing that it may not be measurable. This can be confusing, as in Ch. III para 7, where he refers to ‘the strength of argument’. In simple cases this would just be the probability, not to be confused with the weight.

It seems to me that Keynes’ concerns relate to Mayo’s:

Severity Principle: Data x provides good evidence for hypothesis H if and only if x results from a test procedure T which, taken as a whole, constitutes H having passed a severe test – that is, a procedure which would have, with very high probability, uncovered the discrepancies from H, and yet no such error is detected.

In cases where one has performed a test, severity seems roughly to correspond to having a strong weight, at least in simpler cases. Keynes’ notion applies more broadly. Currently, it seems to me, care needs to be taken in applying either to particular cases. But that is no reason to ignore them.

Dave Marsay

Mathiness

(Pseudo-)Mathiness

Paul Romer has recently attracted attention by his criticism of what he terms ‘mathiness’ in economic growth theory. As a mathematician, I would have thought that economics could benefit from more mathiness, not less. But what he seems to be denigrating is not mathematics as I understand it, but what Keynes called ‘pseudomathematics’. In his main example the problem is not inappropriate mathematics as such, but a succession of symbols masquerading as mathematics, which Paul unmasks using – mathematics. Thus, it seems to me the paper that he is criticising would have benefited from more (genuine) mathiness and less pseudomathiness.

I do agree with Paul, in effect, that bad (pseudo) mathematics has been crowding out the good, and that this should be resisted and reversed. But, as a mathematician, I guess I would think that.

I also agree with Paul that:

We will make faster scientific progress if we can continue to rely on the clarity and precision that math brings to our shared vocabulary, and if, in our analysis of data and observations, we keep using and refining the powerful abstractions that mathematical theory highlights … .

But more broadly, some of Paul’s remarks suggest to me that we should be much clearer about the general theoretical stance and the role of mathematics within it. Even if an economics paper makes proper use of some proper mathematics, this only ever goes so far in supporting economic conclusions, and I have the impression that Paul is expecting too much, such that any attempt to meet his requirement with mathematics would necessarily be pseudo-mathematics. It seems to me that economics can never be a science like the hard sciences, and as such it needs to develop an appropriate logical framework. This would be genuinely mathsy but not entirely mathematical. I have similar views about other disciplines, but the need is perhaps greatest for economics.

Media

Bloomberg (and others) agree that (pseudo)-mathiness is rife in macro-economics and that (perhaps in consequence) there has been a shift away from theory to (naïve) empiricism.

Tim Harford, in the FT, discusses the related misuse of statistics.

… the antidote to mathiness isn’t to stop using mathematics. It is to use better maths. … Statistical claims should be robust, match everyday language as much as possible, and be transparent about methods.

… Mathematics offers precision that English cannot. But it also offers a cloak for the muddle-headed and the unscrupulous. There is a profound difference between good maths and bad maths, between careful statistics and junk statistics. Alas, on the surface, the good and the bad can look very much the same.

Thus, contrary to what is happening, we might look for a reform and reinvigoration of theory, particularly macroeconomic.

Addendum

Romer adds an analogy between his mathiness – which has actual formulae on the one hand and a verbal description on the other – and computer code, which typically has both the actual code and some comments. Romer’s mathiness is like when the code is obscure and the comments are wrong, as when the code does a bubble sort but the comment says it does a prime number sieve. He gives the impression that in economics this may often be deliberate. But a similar phenomenon is when the coder made the comment in good faith, so that the code appears to do what it says in the comment, but there is some subtle, technical flaw. A form of pseudo-mathiness is when one is heedless of such a possibility. The cure is more genuine mathiness. Even in computer code, it is possible to write code that is more or less obscure, and the less obscure code is typically more reliable. Similarly in economics, it would be better for economists to use mathematics that is within their competence, and to strive to make it clear. Maybe the word Romer is looking for is obscurantism?
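
A toy version of the analogy (entirely my own, not Romer’s code): a function whose name and docstring promise a prime sieve while the body quietly performs a bubble sort.

    def sieve_of_primes(xs):
        """Return the primes among xs using the Sieve of Eratosthenes."""
        # The docstring above is deliberately wrong: the body below is a bubble sort.
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    print(sieve_of_primes([5, 3, 8, 1]))   # [1, 3, 5, 8] -- it sorts, whatever the name claims

Anyone who reads only the ‘comments’ is misled; only reading the ‘code’ itself – the actual mathematics – reveals what is going on.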

Dave Marsay 

Wolf’s Shifts and Shocks

Martin Wolf The Shifts and the Shocks: What we’ve learned – and still have to learn – from the financial crisis, Allen Lane, 2014

This book reflects Martin’s widespread discussions, including with Ben Bernanke, Olivier Blanchard, Andy Haldane, Robert Johnson, Paul Krugman, Kenneth Rogoff, George Soros, Larry Summers, Joseph Stiglitz and Paul Tucker. The draft was reviewed by Adair Turner and Mervyn King. It is nothing if not well-informed, but what I take from it is the poverty of ideas. There is, it seems, no-one out there for us to learn from. Nor is our experience of another crash enough to teach us what we need to know.

All this would seem to point to the need for deeper thinking about the subject, perhaps of the mathematical kind reflected in this blog. But the book has some clues to a different approach. China is mentioned quite often. It seems that it responded better than other major countries or blocs, thus improving its position relative to the West, becoming a major new reality. But my own experience is that whereas UK economists, financiers and politicians – in common with most Westerners – were in denial about the potential for a crisis, and (according to Wolf) still are, the Chinese were recognizing the possibility of a crash, identifying key factors and acting prior to the crash of Autumn 2008. Thus, it seems to me, it would be reasonable to discuss the issues with the Chinese and to learn from them. My own discussions (albeit prior to the crash) highlighted the importance of Keynes’ mathematical reasoning (as distinct from simplistic Keynesianism) and, at least for me, the relevance of Game Theory. So perhaps the conclusion would be the same: mathematics matters when – as now – common sense reasoning, learning by studying and straightforward learning from experience are not enough.

Preface

“[It] is necessary to have an economic theory which makes great depressions one of the possible states in which our type of capitalist economy can find itself.”

Hyman Minsky, 1982

[The] crisis happened partly because the economic models of the mainstream rendered that outcome so ostensibly unlikely in theory that they ended up making it far more likely in practice. … As Minsky argued, stability destabilizes.

The solutions of three decades ago have morphed into the problems of today.

Introduction: We’re not in Kansas anymore

The work of economists who did understand these sources of fragility was ignored because it did not fit into the imagined world of rational agents, efficient markets and general equilibrium … .

[The] vulnerability to crisis was not due to what happened inside the financial system alone. Underneath it were global economic events, notably the emergence of a ‘global savings glut’ and the associated credit bubble, partly due to a number of interlinked economic shifts. … But also important … was the changing distribution of income between capital and labour and between workers.

… A particularly important aspect of the frailty of finance is its role in generating property bubbles.

… Policymakers made a big mistake in 2010 when they embraced austerity prematurely.

… However much the rest of the world resented the power and arrogance of the high income countries, it accepted that, by and large, the latter knew what they were doing, at least in economic policy. The financial crisis and subsequent malaise destroyed that confidence. Worse, because of the relative success of China’s state capitalism, the blow to the prestige of Western financial capitalism has carried with it a parallel blow to the credibility of Western democracy.

This last paragraph seems very odd to me. Even in the UK, was it really the case that the majority of the population thought that their government knew what it was doing, economically? Did they have confidence, or were they merely fatalistic? If they did have confidence in those with power, might it not be a good thing for such misplaced confidence to be weakened? Further, I worry about the reference to China. Is ‘state capitalism’ the significant factor, or does Minsky – as in the preface – provide a better explanation? Asia had had a succession of crises, and so escaped the West’s delusions of rationality and stability; thus Minsky’s theory should have suggested to more than just me that they might be better prepared. (I might also quibble about the final reference to ‘Western democracy’. I tend to think of the need for a balance between democracy and capitalism, and to think of many of our problems as due to an imbalance between them. So I might lose confidence in Western democratic capitalism without losing confidence in democracy. Indeed, the solution to our economic woes may be more (genuine) democracy, not less.)

[The] rise of China, a new economic superpower, was among the explanations for the global imbalances that helped cause the crises.

Shades of Game Theory?

The combination of slow growth with widening inequality, higher unemployment, financial instability, so-called ‘currency wars’ and fiscal defaults may yet undermine the political legitimacy of globalization in many other respects.

While some people have been putting forward the view that globalization is legitimate, and one could argue that this view has not been successfully challenged, this only confers political legitimacy on globalization to the extent that the apparent democracies were actually legitimate. This point seems debatable, given the remarks of Minsky. It could be argued that the crises that Wolf is discussing were due to failures in the democratic systems, perhaps their capture by capitalists?

Yet perhaps the biggest way in which the crises have changed the world is – or at least should be – intellectual. They have shown that established views of how (and how well) the world’s most sophisticated economies and financial systems work were nonsense. … It is, in the last resort, ideas that matter, as Keynes knew well. Both economists and policy-makers need to rethink their understanding of the world in important respects.

Including reforming democratic debate?

Part 1: The Shocks

Skipped.

Part 2: The Shifts

Prologue

Why did the world’s leading economies fall into such a mess?
The answer, in part, is that the people in charge did not believe that they could fall into it. [These] mistakes did not come out of the blue. They were, no doubt, influenced by … incentives for what I call ‘rational carelessness’. … People did not understand the risks … partly because they did not want to understand them.

It is possible to identify three huge shifts. The first is liberalization – the reliance on market forces across much of the world economy, including, notably, in finance. The second is technological change … . The third is ageing … .
These underlying forces … permitted or created significant further changes. Among the most important have been: the emergence of a globalized world economy; soaring inequality in most economies; the entry of gigantic emerging economies …; the evolution of a liberalized and innovative global financial system … and a huge rise in net capital flows across frontiers.

The language here is interesting. ‘Liberalization’ implies freedom, but whose? One of the most illiberal acts in UK history was the abolition of the slave trade. The impact of modern-day liberalization and globalization on some countries and sectors can hardly seem ‘liberal’ to them. Surely part of the problem which Wolf describes has been due to the widespread use of terms in ways that confuse, as here. (On a minor point, I blame financialization and globalization for the demise of traditional British coffee shops, which I much prefer to the American variety.)

Part 3: The Solutions

Prologue

How should we manage a world of savings glut – or, which comes to the same thing, excess supply? Is there a real chance of secular stagnation and, if so, what might be done about it? [Our] big problem is addiction to ever-increasing debt … . Is it possible to balance our economies without such a huge reliance on ever-increasing leverage? … This can be done in two complementary ways. One is to close external imbalances. The other is to use the government’s ability to create non-debt money.

6 Orthodoxy Overthrown

Wolf cites an interview of his with Larry Summers:

I asked what economics, if any, he had found useful to the task of putting the US and world economies together again, after the crisis. He responded ‘There are things economists didn’t know. There are things economists were wrong about. And there were things where some economists were right … There is a lot in Bagehot that is about the crisis that we just went through, there’s more in Minsky and perhaps still more in Kindleberger … I think that economics knows a fair amount. I think economics has forgotten a fair amount that is relevant. And it has been distracted by an enormous amount.’ Later in the interview … Mr Summers referred to Keynes.

Kindleberger’s view was that stability is best achieved by a hegemon, rather than being split between similar powers (such as US, EU and China). I am not so sure. But the opposite view, that stability is best left to ‘natural forces’ seems yet more doubtful.

The failure of official economics

[The] pre-crisis official orthodoxy was that central banks would stop the excess credit expansion in time, or at least not too late, by responding to rising inflation in the prices of goods and services. But that signal would fail if rising asset prices and expanding credit were not closely related to inflation. That is precisely what happened in the 2000s.

The reason orthodox economics failed to pick up the risks was, in short, that it ruled out what most mattered. Modern financial systems do not equilibrate smoothly. They are dynamic systems characterized by uncertainty and ‘animal spirits’, in which the most powerful destabilizing force is the ability of the private financial sector to generate credit and money and so to produce euphoric boom and panic-stricken bust.

The official response to its errors

Wolf quotes Ben Bernanke (2012):

‘… The only solution in the end is for us regulators and our successors to continue to monitor the entire financial system and to try to identify problems and … respond to them using the tools that we have.’

Alternatives to the new orthodoxy

[So] great has been the failure of the financial system that the idea of a monoculture of banking and of financial systems, governed by the same global rules, seems inordinately foolish. … Experiment is essential. …

The orthodoxy broke down, because of what Keynes and Friedman both ignored: the tendency of the credit system to run riot … .

[All] forms of balanced-budget household economics applied to the government are nonsense unless it has ceased to be able to create money (as happened inside the Eurozone).

Conclusion: Fire Next Time

[One] authoritative source estimates there were 147 banking crises between 1970 and 2011.

[It] is far from clear that the globalization of debt-creating flows, particularly those generated by banks, has brought much, if any, benefit to the world economy, as opposed to those who work in the financial industry.

[The] crises seem to have got bigger and more globally devastating over time. … The emerging and developing countries managed the consequences relatively well. But … it may well turn out that … fiscal loosening and credit expansion … have brought longer-term fragilities.

… Keynes was right: hopes and fears for the long run must not be the enemies of decisive action in the short run. … Somehow, the interaction between liberalization and globalization has destabilized the financial system.

What happened?

… Fraud and near fraud – not to mention the massaging of data to show a prettier picture than was justified (by rating agencies, for example) – exploded.

Regulators, politicians and the economists who advised them were either unaware of the full extent of the dangers or were unable or unwilling to act to reduce them, partly because they were captured by the interests of the regulated, partly because they were intimidated or seduced, but above all because they were prey to the very same cognitive errors.

This had many elements: lack of preparedness; lack of understanding of what was happening; political, intellectual and bureaucratic resistance to taking action soon enough; the unavoidable difficulties of handling a crisis that required cooperation across borders; and, particularly, the Eurozone’s political and institutional lack of preparedness.

What is the legacy?

The fiscal costs are of roughly the same scale as a world war, while the present value of the economic costs could be even greater, … . Research at the [IMF] has suggested … that the more expansionary the immediate macroeconomic policies, the smaller are the long-term losses in output.

…The lower the prospective economic growth, the more policy makers will rely on … austerity, inflation, (financial) repression and (debt) restructuring. …

… Naïve confidence in the stability of a deregulated financial system has vanished, perhaps for a generation, except in particularly secluded corners of the academic world. …

What is to be done?

Long-term health- Challenges to crisis-hit economies

The logic behind the ‘no bailout’ position is that it would make everyone more prudent. [Yet very] important are false beliefs about the uncertain future – the belief in the middle of the last decade, for example, that house prices would rise forever, that … securitized financial assets … would always find willing buyers and that crucial lending markets would always remain liquid.

Long-term health – Global challenges

[Financial] integration has proved highly destabilizing. It might have to be sharply curtailed.

The challenge of Radical Reform

[Leveraging] up existing assets is just not a particularly valuable thing to do: it creates fragility, but little, if any, real new wealth.

… Too many countries are being forced to adopt much the same arrangements under the pressure of orthodoxies imposed by global institutions under the control of a limited number of hegemonic powers. That should end.

Why this matters

[Crises] undermine confidence in the elites. …

[The] economic, financial, intellectual and political elites misunderstood the consequences of headlong financial liberalization. Lulled by fantasies of self-stabilizing financial markets, they not only permitted but encouraged a huge and, for the financial sector, profitable bet on debt. The policy-making elite failed to appreciate the risks of a systemic breakdown. The financial elite was discredited by both its behaviour and its need to be rescued. The intellectual elite was discredited by its failure to anticipate a crisis or agree on what to do after it had struck. The political elite was discredited by their willingness to finance the rescue, however essential it was. The decline in confidence … is even worse if the methods used to rescue the economy then make the parts of the elite most associated with the crisis richer than before. [There] has to be a sense that success is earned, not stolen or handed over on a platter.

[The] divorce between accountability and power strikes at the heart of democratic governance.

… People feel even more than before that the country is not being governed for them, but for a narrow segment of well-connected insiders who reap most of the gains and, when things go wrong, are not just shielded from loss but impose massive costs on everybody else.

 Comments

Wolf has many more detailed insights and recommendations concerning the crash and recovery than those above. My main interest is in the general problem. Economies seem to resemble dynamical systems with critical instabilities. A key part of what drives economies to change is human perception and expectation. These tend to homogenize and stabilise, the more so in ‘liberalized’ and ‘globalized’ economies. In the short run, narrowly empirical learning and acting is optimal. Before the crisis it was common to ‘define away’ uncertainty by defining risk as mere variability. This ignored and even denied the possibility of critical instabilities. It was as if a walker in the dark were concerned about the risk of tripping over a rock, unaware that they were stumbling towards a cliff edge.

It seems to me that Wolf’s recommendations are not final, for all time (he does not claim that they are), and so the danger remains of treating a theory or ‘observation’ as if it were true, when all one can know is that it has been effective. Wolf discusses the Eurozone at some length. For me, the question is: can different nations maintain different ‘models’ within the zone? If not, it is clearly doomed. While Germany may have the best economic model, it would be dangerous for any one model to come to dominate the zone.

Another theme that emerges is that of inequality. It seems to me that liberalization implies that the wealthy get wealthier, and hence more influential, and that as wealth becomes more and more concentrated economies become more and more homogenized, in all aspects. Further, a concentration of wealth leads to a savings glut, with the consequences that Wolf outlines. It used to be argued that liberalization would lead to growth that would ‘float all boats’, but this is now in doubt.

In the west, liberalization has – in theory – been constrained by democracy. In the period in question wealthy individuals and corporations increased their influence over politicians and regulation, and so this constraint diminished. (Some even argued that liberalization should be unconstrained.) As Wolf notes, the problem is not corruption in some overt sense, but that ‘elites’ fell prey to a common myth.

The new orthodoxy, in essence, is to be continually on the lookout for critical instabilities, and to respond ad hoc as they occur. Wolf doesn’t really spell out his alternative, but it involves somehow keeping alive alternative theories of economics and alternative economies, and also perhaps having some diversity within economies. I would suggest that a part of this should be attempts to discover where the critical instabilities lie and what different economies might emerge, and to identify the critical factors in preserving and encouraging resilience.

If we had such a theory, we might think about some of Wolf’s more particular problems, such as ageing. In the UK many houses are under-occupied by pensioners. This reduces supply, which inflates prices, which means it would be financially foolish for many under-occupiers to ‘trade-down’. This displays the signature of a critical instability: if house-building (particularly for pensioners) significantly increased we could see a drop in prices fuelled by a reduction in under-occupancy. Thus one could have two relatively stable regimes with ‘interesting times’ in between. A stabilization or drop in house prices could also reduce inequality, by reducing the nominal value of inheritances, by making houses more affordable for first-time buyers and eliminating a source of speculation, particularly buy-to-let.

If house prices were stable relative to incomes and if pensions also kept up with incomes then it is not even clear that house purchasing would be as essential as it is now. Some pensioners already have no house but use investment income to fund their flits between holiday flats, cruise ships and relatives or friends. My own view is that an economic system that severely disadvantages people who do not buy their own house is not only too homogenous but very illiberal, even if it is what most people want. There should at least be alternatives.

A current ‘hot topic’ is the apparent unfairness between the retired and the young. My view is that it would be foolish to try to think about this without considering it together with these other issues, and that in doing so we need to go beyond Wolf’s insights.

Dave Marsay

 

 

Kolmogorov’s Foundations of Probability

A.N. Kolmogorov Foundations of the Theory of Probability 2nd Ed. Tr. N. Morrison, Chelsea, NY, 1956. (Original 1933.)

The received wisdom is that Kolmogorov firmly established the view that ‘the probability of some event’ satisfies his ‘Kolmogorov axioms’ (where ‘Probability is the measure of the likeliness that an event will occur.’)

For a coin toss, Kolmogorov’s axioms are often taken to imply that

P({Heads})+P({Tails})=1.

This equality is a direct consequence of measure theory provided that such a measure exists. Kolmogorov is often ‘credited’ with the view that existence is a non-issue, and hence that well-founded probabilities always exist. His work is still interesting from this angle. Is this attribution fair?

Preface

Kolmogorov notes:

[The] analogies between measure of a set and probability of an event, and between integral of a function and mathematical expectation of a random variable, became apparent.

Making such an analogy, we might expect there to exist upper and lower probabilities, analogous to upper and lower measures and upper and lower integrals. We might ask if the upper and lower measures are necessarily the same. For example, if I aim a dart at a dart board with a non-measurable sub-set, the notion of ‘the probability of hitting the sub-set’ appears incapable of interpretation as a single number. It may be the case that Kolmogorov thought that all sets of interest would necessarily be measurable, but are we so persuaded?

I. Elementary Theory of Probability

The theory of probability, as a mathematical discipline, can and should be developed from axioms in exactly the same way as Geometry and Algebra.

Thus, as with Euclidean Geometry, it is possible to have a mathematical discipline that faithfully reflects, codifies and systematizes a set of ideas that are not actually true to the intended reality. The two aspects should not be confused. In this case, the fact that Kolmogorov’s axioms accurately capture mainstream ideas about probability is not at all controversial. It is the extent to which these ideas are universally applicable that can be doubted.

… This means that after we have defined the elements to be studied and their basic relations, and have stated the axioms by which these relations are to be governed, all further exposition must be based exclusively on these axioms, independent of the usual concrete meaning of these elements and their relations.

Prior to Kolmogorov’s time the general assumption had been that, like Geometry, all mathematics would be finitely axiomatizable in this sense. But subsequently the work of Gödel, Church, Turing and Post – among others – showed that not even Arithmetic can be completely axiomatized in this strong sense, so we no longer treat it as a given that a theory of uncertainty could be reduced to finitely many axioms. The idea that probability – like Geometry – is fully captured by a set of axioms would need to be demonstrated. Even then, as with Geometry, it would show that our conceptions were somehow small, not that the thing which supposedly corresponds to our conception is really small.

1 Axioms

Kolmogorov considers a fixed finite set of elementary events together with a particular field of subsets of it, on which the probability is defined.
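
To make this concrete, here is a rough Python sketch (my own illustration, not Kolmogorov’s notation) of such a finite probability field for a coin toss, with the finite axioms checked directly:

    from itertools import combinations

    # Elementary events for a single coin toss.
    omega = frozenset({"Heads", "Tails"})

    # Here the field F of 'observable' events is simply the power set of omega.
    F = [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

    # An additive set function P on F (the 0.5s are illustrative).
    P = {A: 0.5 * len(A & {"Heads"}) + 0.5 * len(A & {"Tails"}) for A in F}

    # Kolmogorov's axioms for the finite case:
    assert all(P[A] >= 0 for A in F)        # non-negativity
    assert P[omega] == 1                    # normalisation
    for A in F:                             # additivity on disjoint events
        for B in F:
            if not (A & B):
                assert abs(P[A | B] - (P[A] + P[B])) < 1e-12

    print(P[frozenset({"Heads"})] + P[frozenset({"Tails"})])  # 1.0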

 2 The Relation to Experimental Data

Kolmogorov in effect assumes the law of large numbers (that the sample frequency tends to the actual probability) and takes this as the definition of ‘probability’. But the law of large numbers is not universal for real event streams: even roulette wheels can wear out.
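
To illustrate the caveat, here is a small Python simulation of my own (with made-up numbers): a fixed mechanism, whose sample frequency does settle down, alongside a ‘wearing’ mechanism whose underlying proportion drifts, so that there is no single long-run frequency to converge to:

    import random

    random.seed(1)

    def running_frequency(outcomes):
        """Cumulative proportion of 1s after each trial."""
        total, freqs = 0, []
        for i, x in enumerate(outcomes, 1):
            total += x
            freqs.append(total / i)
        return freqs

    n = 100_000

    # A fixed mechanism: P(1) = 0.5 throughout.
    fixed = [1 if random.random() < 0.5 else 0 for _ in range(n)]

    # A 'wearing' mechanism: the underlying proportion drifts from 0.5 towards 0.7.
    wearing = [1 if random.random() < 0.5 + 0.2 * (i / n) else 0 for i in range(n)]

    print("fixed:  ", [round(f, 3) for f in running_frequency(fixed)[::20_000]])
    print("wearing:", [round(f, 3) for f in running_frequency(wearing)[::20_000]])
    # The first sequence of sample frequencies hovers near 0.5; the second keeps
    # creeping upwards, so 'the' long-run frequency is not a single number.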

6 Conditional Probabilities as Random Variables, Markov Chains

Kolmogorov defines the ‘mathematical expectation’ as a probability-weighted average. Again, this seems a very tame version of uncertainty. He does not argue that such averages always exist, and leaves ‘mathematical expectation’ undefined when they do not.
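
As a reminder that such averages need not exist, here is a small Python sketch of my own: the running mean of draws from a Cauchy distribution (which has no mathematical expectation) fails to settle down, however many draws are taken:

    import random

    random.seed(2)

    def cauchy():
        """A standard Cauchy draw: the ratio of two independent standard normals."""
        return random.gauss(0, 1) / random.gauss(0, 1)

    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, round(sum(cauchy() for _ in range(n)) / n, 3))
    # Unlike the normal case, these sample means do not home in on any one value:
    # the probability-weighted average simply does not exist for this distribution.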

II. Infinite Probability Fields

This contains the key results in establishing probability as a mathematical discipline analogous to Geometry. He notes that the results are ‘merely a mathematical structure’, but does not labour the point. (I take it to mean that we have something like a Euclidean model of something, which does not imply that the thing being modelled really is Euclidean.)

III. Random Variables

 2. Definition of Variables and of Distribution Functions

In effect, a ‘random variable’ is a single-valued function on some base set for which the probability is defined on all appropriate sub-sets. The ‘distribution function’ is what we now more commonly call the ‘cumulative distribution function’.

IV. Mathematical Expectation

1. Abstract Lebesgue Integrals

Kolmogorov defines a generalised expectation using the tools of the previous chapter. He notes:

If this series converges absolutely for every …  In this abstract form the concept … is indispensable for the theory of probability.

4. Some Criteria for Convergence

This considers the convergence of a sequence of random variables. Two alternative sufficient conditions on the expectation of some indicator function are given. These are also necessary for well-behaved indicator functions.

Kolmogorov supposes this theory to have many useful applications, but does not argue that it is universally applicable.

V. Conditional Probabilities and Mathematical Expectations

2. Explanation of a Borel Paradox

Kolmogorov notes that when there is a large space of options, what happens may have had a prior probability of 0. In this case expectations conditioned on that event can be nonsense, hence Borel’s paradox. Thus we should be cautious in using probability theory when something that we would have thought had probability 0 happens. (Unfortunately, this is all too often the case.)

4. Conditional Mathematical Expectation

The conditional expectation is defined ‘if it exists’.

VI. Independence; The Law of Large Numbers

3. The Law of Large Numbers

Kolmogorov defines some conditions under which sequences of random variables are stable or – better – have ‘normal stability’, for example if the variance tends to 0 (the ‘Markov condition’). He does not claim that these universally hold, but leaves them as something to be determined before his theory can be applied to any particular case. (That is, he regards them as axiomatic in the mathematical sense, rather than as a universal truth.)

Supplementary Bibliography

(By Translator.)

There are many problems, especially in theoretical physics, that do not fit into the Kolmogorov theory, the reason being that these problems involve unbounded measures.

 Comments

Kolmogorov as Mathematics, not dogma

If we assent to his axioms, Kolmogorov provides us with a wealth of mathematics that unconditionally resolves some key issues. He gives some examples of conditions under which his axioms are reasonable or even indisputable. If we have a mechanism that is random in the appropriate sense, then his axioms will hold. If we have a static population that we sample randomly, then we can apply his theory – with care – to make some useful deductions. Conversely, if a population is evolving then (by definition) it is not stable in Kolmogorov’s sense, and so much of his theory is inapplicable. And to apply his work to observations of a human we would have to suppose that they ‘were’ a Markov chain process, which hardly credits them with free will and would seem to suppose that we had more intelligence than they did.

From a mathematical point of view, it seems to me that Kolmogorov properly bounds his work, and is not claiming that his merely mathematical theory would apply to a coin being tossed by a magician, for example.

Possible Interpretations

In Kolmogorov’s interpretation, the probability is equal to the long-run expected proportion (‘the law of large numbers’). A sounder interpretation is that it would be the naïve long-run expected proportion based on the evidence to hand. For example, the longer my car goes without breaking down, the lower the naïve probability of its breaking down becomes, based on that evidence. But this does not preclude me from looking at the break-down rates for other cars and estimating that the ‘true’ probability will be higher by an amount that increases each day, based on ‘bathtub curve’ data for cars like mine or, if my car is new, based on common sense. Kolmogorov’s theory only considers a single ‘measure’, based on fixed data. But, as with the car, there may be uncertainties about what is the appropriate data. Kolmogorov does not consider this aspect.
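
A toy Python sketch of the contrast, with purely illustrative numbers of my own: the naïve estimate based only on my car’s record, against a pooled estimate that also uses (assumed) fleet data:

    # My car's record so far: 400 days, no breakdowns.
    my_days, my_breakdowns = 400, 0
    naive_rate = my_breakdowns / my_days
    print("naive estimate of P(breakdown tomorrow):", naive_rate)      # 0.0

    # Pooled estimate: treat (assumed) fleet data for similar cars as a prior and
    # smooth my own record with it - a crude stand-in for 'bathtub curve' data.
    fleet_breakdowns, fleet_car_days = 30, 10_000
    pooled_rate = (my_breakdowns + fleet_breakdowns) / (my_days + fleet_car_days)
    print("pooled estimate:", round(pooled_rate, 4))                   # about 0.0029

    # The naive estimate falls as my run of luck lengthens; the pooled estimate
    # stays positive, and would rise if the fleet data showed wear-out with age.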

Possible Extensions

If we wish to apply measure theory, but are unconvinced that the analogous sets are actually measurable, we might consider upper and lower measures. As measures these satisfy the Kolmogorov axioms, apart from being normalised. It is natural to add that:

The upper measure is no lower than the lower measure.

The lower measure of a set plus the upper measure of the complement of that set add to 1.

Ordinary probability theory is then a special case. A difficulty of this approach is that probability values are not as determinate as they are in the ordinary case, and it is not clear that the axioms are always enough.
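
Here is a rough Python sketch of one common construction (envelopes over a set of candidate measures – my illustration, not Kolmogorov’s), checking the two added conditions directly:

    from itertools import combinations

    omega = frozenset({"Heads", "Tails", "Edge"})
    events = [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

    # Several candidate probability measures, reflecting uncertainty about the coin.
    candidates = [
        {"Heads": 0.50, "Tails": 0.50, "Edge": 0.00},
        {"Heads": 0.55, "Tails": 0.44, "Edge": 0.01},
        {"Heads": 0.45, "Tails": 0.54, "Edge": 0.01},
    ]

    def lower(A):
        return min(sum(p[x] for x in A) for p in candidates)

    def upper(A):
        return max(sum(p[x] for x in A) for p in candidates)

    for A in events:
        assert lower(A) <= upper(A)                           # upper >= lower
        assert abs(lower(A) + upper(omega - A) - 1) < 1e-12   # the duality condition

    print(lower({"Heads"}), upper({"Heads"}))   # 0.45 0.55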

Kolmogorov’s theory is related to the law of large numbers. One could interpret upper and lower probabilities as putting bounds on long-run proportions, but one still might wish to discriminate between the case where the sample proportions will converge to some value that is only defined to within some range and the case where the long-run behaviour might never settle, but wander within a range (like a wobbly roulette wheel).

Alternatively, many systems do seem to resemble Markov processes in the short run, so one might regard Kolmogorov’s theory as reasonable in the short run, so long as nothing is happening to unsettle the supposed mechanism. Perhaps no alternative theory is needed, as long as one does not rely on it to make longer-run predictions. (That is, we regard the measure as ‘the current measure’ without relying on it being absolutely fixed.) This may be closest to what Kolmogorov intended.

Dave Marsay

 

 

Pragmatism, re-thought

It seems to me that people often act as if they thought that what they were doing was ‘rational’, ‘logical’, ‘mathematical’ or ‘scientific’ when perhaps they ought to know that it wasn’t, and that some of our common or important ills are a consequence of these mistakes. This note is an attempt to explicate some of the issues.

First, I suppose that what people take to be logical, for example, are mostly just heuristics. That is, they are rules of thumb or habits that have proved effective in the past and which they have no specific reason to doubt. One common form of pragmatism is to regard or treat such heuristics as ‘great truths’ and not to hedge against the possibility that they are wrong, and even to close one’s mind to such a possibility. But this practice seems to me to be behind many errors. It may be, of course, that in the long run we are often better off accepting these errors rather than wasting effort being more cautious. For example, it may be better for society as a whole to accept the following, from time to time, rather than attempt to avoid them:

  • miscarriages of justice,
  • ill-founded foreign interventions,
  • natural disasters,
  • financial collapse.

But even if this was true in the past, it seems to me prudent to question this from time to time, and not have such assumptions ‘baked in’ to discourse. I would also not accept the view that such questions should be left to those ‘elites’ who are best placed to avoid the worst consequences of such problems.

As an example, consider probability theories. (There are many.) Informally, for a given context, C, one supposes that one has a probability function, P( | : ), that assigns numbers P(A|B:C) to conditional statements A|B, such as the probability that the bus is full when it is raining. This depends on the context, C, such as the frequency of buses, the local population and various economic factors. Yet the dependence on C is normally suppressed. Why? Typically, we have some recent experience and we suppose that whatever factors are relevant will have stayed roughly the same, so that it is reasonable to treat the context as fixed. Hence normal pragmatism. But suppose that a large local employer has just closed? We know that the context has changed, and the probability of the buses being full will have fallen. Or at least, we should realise that our estimate is unusually unreliable.
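
A toy Python illustration, with invented numbers, of making the context explicit:

    # P(full | rain : C) for two contexts (the counts are invented).
    contexts = {
        "factory_open":   {"full_and_rain": 240, "rain": 300},
        "factory_closed": {"full_and_rain":  60, "rain": 300},
    }

    def p_full_given_rain(context):
        c = contexts[context]
        return c["full_and_rain"] / c["rain"]

    print(p_full_given_rain("factory_open"))     # 0.8
    print(p_full_given_rain("factory_closed"))   # 0.2
    # Suppressing C amounts to assuming the context has not changed; once the
    # employer closes, the old figure of 0.8 is simply the wrong number.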

A variation on pragmatism is to apply the same heuristics but then to consider the stability of the context. If there is no reason to think that the context may have changed then one can proceed as before, so that one has an ‘estimate’, ‘expectation’ or ‘prediction’. But if the context may have changed, or has changed, or is more likely to change, one does not take the ‘estimate’ of the heuristic as one’s own. One needs to hedge in a way that differs from taking account of probability distributions. For example, one might seek information on the various factors.

A common variation is to consider what new contexts might be possible, and to make estimates relative to these possible contexts.  One might then try to identify some actions that will be satisfactory across contexts.
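
A minimal Python sketch, with invented payoffs, of picking an action whose worst case across the candidate contexts is least bad:

    # Payoffs of each action in each possible context (all numbers invented).
    payoffs = {
        "specialise": {"A": 10, "B": -5, "C": -5},
        "hedge":      {"A":  4, "B":  3, "C":  3},
        "do_nothing": {"A":  0, "B":  0, "C":  0},
    }

    def worst_case(action):
        return min(payoffs[action].values())

    best = max(payoffs, key=worst_case)
    print(best, worst_case(best))   # hedge 3
    # 'specialise' is best if context A is certain; 'hedge' is the action whose
    # worst case across the candidate contexts is least bad.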

Finally, one can consider situations as having multiple factors, some or all of which may be stable at any one time, and try to anticipate which will be stable and base one’s actions on those. For example, if one expects some economic instability then one might look for relatively stable sectors (utilities?) to invest in.

It seems to me that these variations are quite common, and that some important social problems have arisen because the wrong kind of ‘pragmatism’ has been employed.

Dave Marsay

Diagnosis

1% of women at age forty who participate in routine screening have breast cancer.  80% of women with breast cancer will get positive mammographies.  9.6% of women without breast cancer will also get positive mammographies.  A woman in this age group had a positive mammography in a routine screening.  What is the probability that she actually has breast cancer?

It is alleged that:

The correct answer is 7.8%, obtained as follows:  Out of 10,000 women, 100 have breast cancer; 80 of those 100 have positive mammographies.  From the same 10,000 women, 9,900 will not have breast cancer and of those 9,900 women, 950 will also get positive mammographies.  This makes the total number of women with positive mammographies 950+80 or 1,030.  Of those 1,030 women with positive mammographies, 80 will have cancer.  Expressed as a proportion, this is 80/1,030 or 0.07767 or 7.8%.

Do you believe it? My notes below:

.

.

.

.

.

.

.

.

.

.

 

One can take either a subjective or an objective view of probability. From a subjective view I could only guess the proportion of 40-year old women who had this cancer, so my estimate would be uncertain, unlike my estimate for a similar problem involving urns where I could calculate it exactly. So it might be ‘8%ish’, but not 7.8%. I might do better to suppose that the test was reasonably diagnostic, so the answer can’t be much less than 10%, or else why bother? It certainly isn’t clear to me that I ‘should’ always start by trying to guess the relevant prior.

The article is written from an objectivist viewpoint. In this case we can replace probabilities by proportions, which are ratios of numbers satisfying various criteria. Even for an epidemic, where the proportions might be changing rapidly, we suppose that the probabilities exist in principle. But there is still a problem. Suppose that the risk of cancer depends not only on age but also on family history. Then the woman being tested had some family history and hence (following the logic of the article) has a different prior and hence a different final probability. Hence her probability could well be significantly more than 7.8%. The probability that the article calculates is not her objective probability, but a formal probability, derived for some abstract woman of her age, ignoring all else. In this case, it is the probability appropriate to a doctor with no access to patient records who asks no questions. Some people think it appropriate to the woman, based on the principle that everyone should always assume themselves to be average. This may seem very reasonable, but is it so reasonable to ignore family history?
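
For what it is worth, the arithmetic itself is easy to check, and so is its sensitivity to the assumed prior. In the Python sketch below the 5% ‘family history’ prior is purely illustrative (my number, not the article’s):

    def posterior(prior, sensitivity=0.80, false_positive_rate=0.096):
        """P(cancer | positive mammography), by Bayes' rule."""
        true_pos = prior * sensitivity
        false_pos = (1 - prior) * false_positive_rate
        return true_pos / (true_pos + false_pos)

    # The article's 'formal' probability, using the 1% base rate for all 40-year-olds:
    print(round(posterior(0.01), 3))   # 0.078

    # The same sum with a higher prior for a woman with a strong family history
    # (the 5% figure is purely illustrative):
    print(round(posterior(0.05), 3))   # about 0.305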

At the very least, the problem raises issues that need to be addressed before one can be so sure that one has reduced the lack of certainty to a relatively simple numeric probability.

See Also

Similar puzzles. My notes on communicating uncertainty.

Dave Marsay

 

 

Freedman’s Causal Inference

David A. Freedman Statistical Models and Causal Inference: A dialogue with the social sciences Eds. David Collier, Jasjeet S. Sekhon & Philip B. Stark CUP 2010.

Publisher’s Introduction

Freedman presents a definitive account of his approach to causal inference in the social sciences. He explores the foundations and limitations of statistical modelling, illustrating basic arguments with examples from political science etc. He maintains that many new technical approaches to statistical modelling constitute not progress but regress.

Editors’ Introduction: Inference and Shoe-leather

[Freedman] demonstrated that it can be better to rely on subject-matter expertise and to exploit natural variation to mitigate confounding and rule out competing explanations. … An increasing number of social scientists now agree that statistical technique cannot substitute for good research design and subject-matter knowledge. This view is particularly common among those who understand the mathematics and have on-the-ground experience.

Part I Statistical Modelling: Foundations and Limitations

1 Issues in the Foundations of Statistics: Probability and Statistical Models

[Foundations of Science (1995) 1: 19-39.]

1.4 A critique of the subjectivist position

[A] Bayesian’s opinion may be of great interest to himself … but why should the results carry any weight for others?

… Under certain circumstances [but not in others], as more and more data become available, two Bayesians will come to agree.

My own experience suggests that neither decision-makers nor their statisticians do in fact have prior probabilities.

… [The] theory addresses only limited aspects of rationality.

 1.5 Statistical models

Regression models … are widely used by social scientists to make causal inferences … . However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions.

1.6 Conclusions

For [many statistical practitioners], fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated.

2 Statistical Assumptions as Empirical Commitments

[2005]

No amount of statistical maneuvering can get very far without deep understanding of how the data were generated.

3 Statistical Models and Shoe Leather

[Statistical] technique can seldom be an adequate substitute for good design, relevant data, and testing predictions against reality in a number of settings.

 Part II Studies in Political Science, Public Policy, and Epidemiology

8 What is the chance of an earthquake?

8.4 A view from the past

Littlewood [A Mathematician’s Miscellany, 1953] wrote:

Mathematics has no grip on the real world; if probability is to deal with the real world it must contain elements outside mathematics.

 10 Relative Risk and Specific Causation

Epidemiologic data usually cannot determine the probability of causation in any meaningful way, because of individual differences.

11 Survival Analysis: An Epidemiological Hazard?

[The] misuse of survival analysis … can lead to serious mistakes … .

[The] big assumption in constructing [cross-sectional] life tables is that death rates do not change over time.

Cox said of the proportional hazards model:

As a basis for rather empirical data reduction, [the model] seems flexible and satisfactory.

Part III New Developments: Progress or Regress?

14 The Grand Leap

The Markov condition says, roughly, that past and future are conditionally independent given the present.

SGS [three advocates of AI] seem to buy the Automation Principle: The only worthwhile knowledge is the knowledge that can be taught to a computer.

15 On Specifying Graphical Models for Causation, and the Identification Problem

[Causal] relationships cannot be inferred from a data set by running regressions unless there is substantial prior knowledge about the mechanisms that generated the data.

The key to making a causal inference from nonexperimental data by regression is some kind of invariance … .

[Note] the difference between conditional probabilities that arise from selection of subjects with X = x, and conditional probabilities arising from an intervention that sets X to x. The data structures may look the same, but the implications can be worlds apart.

We want to use regression to draw causal inferences from nonexperimental data. To do that, we need to know that certain parameters and certain distributions would remain invariant if we were to intervene. That invariance can seldom if ever be demonstrated by intervention. What, then, is the source of the knowledge? “Economic Theory” seems like a natural answer, but an incomplete one. Theory has to be anchored in reality. Sooner or later, invariance needs empirical demonstration, which is easier said than done.

Freedman quotes Pearl:

[Causal] analysis deals with the conclusions that logically follow from the combination of data and a given set of assumptions, just in case one is prepared to accept the latter. Thus, all causal inferences are necessarily conditional. … In complex fields like the social sciences and epidemiology, there are only a few (if any) real life situations where we can make enough compelling assumptions that would lead to identification of causal effects.

And Heckman:

The information in any body of data is usually too weak to eliminate competing causal explanations of the same phenomenon. There is no mechanical algorithm for producing a set of “assumption free” facts or causal estimates based on the facts.

 19 Diagnostics Cannot Have Much Power Against General Alternatives

    The invariance assumption is not entirely statistical. Absent special circumstances, it does not appear that the assumption can be tested with the data that are used to fit the model. Indeed, it may be difficult to test the assumption without an experiment, either planned or natural.

[As] recent economic history makes clear, a major source of uncertainty in forecasting is specification error in the forecasting models. Specification error is extremely difficult to evaluate using internal evidence.

Unless the relevant class of specification errors can be delimited by prior theory and experience, diagnostics have limited power, and the robust procedures may be robust only against irrelevant departures from assumptions.

Part IV Shoe Leather Revisited

20 On Types of Scientific Inquiry: The Role of Qualitative Reasoning

Informal reasoning, qualitative insights, and the creation of novel data sets that require deep substantive knowledge and a great expenditure of shoe leather.

… Recognizing anomalies is important; so is the ability to capitalize on accidents. …

…. There is a … natural preference for system and rigor over methods that seem more haphazard. These are possible explanations for the current popularity of statistical models.

…. If so, the rigor of advanced qualitative methods is a matter of appearance rather than substance.

The book includes many important medical examples.

Comments

Freedman is critical of the view that there is some statistical machinery that can be applied to a set of data to infer (or even validate) a causal model. Some have interpreted Freedman as being critical of mathematical modelling, but his view seems more nuanced. As he recognizes, the mathematical validity of Euclidean geometry is a different issue from its correspondence to the physical world. There is no known force of nature that compels the physical world to comply with Euclid’s conception of it. Similarly, we may doubt the validity of astrology no matter how mathematical its methods.

Where I differ from Freedman is that I think that statistical descriptions of the kind provided by the methods that he criticises can often be very helpful, just as long as we recognize them for what they are, and do not over-interpret them. For example, a colleague once consulted me when some response-time data had very peculiar moments. Fortunately I had seen similar results before: data such as ‘45.32’ had been interpreted as 45.32 minutes when it was actually 45 minutes and 32 seconds. In general, standard methods are good for suggesting hypotheses for development and testing. It is only under special circumstances (such as Freedman describes) that their raw output can be relied upon as ‘true’.

Freedman makes a distinction between quantitative and qualitative aspects of a problem, perhaps inviting a naïve reader to associate the quantitative aspect with mathematics. But one might rather say that mathematics is commonly thought to be confined to the quantitative. It is not. While Freedman’s qualitative analysis is not mathematical, the bulk of his work shows that one needs to be very careful about the meaning of one’s terms, and so it seems to me that his analysis could be improved by being more mathematical (perhaps as an annex). For example, it might draw on Keynes’ Thesis.

In particular, all the approaches here seem to suppose that one creates a model, extrapolates, determines some ‘error distribution’, and then makes a decision accordingly. It is possible to finesse this approach, and assume less but with greater diligence. Thus where the Markov condition supposes that the future depends on the present but not the past, we should note two conditions: firstly, that the system being observed is objectively Markov, and secondly, that we are observing enough of it. Both assume that there is a fixed set of factors to be observed. This seems doubtful. For example, if the Bank of England suddenly decided that the quality of its favourite claret was an important economic indicator then it might act on it, in which case it might become important. In a speculative market, anything can become a factor.
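
A small Python sketch of my own of the second condition: the pair (x, trend) below is a Markov process, but if one records only x then the observed series is not – its future depends on more of its past than its present value:

    import random

    random.seed(3)

    def simulate(n=10):
        """The pair (x, trend) is Markov; the recorded series of x alone is not."""
        x, trend, path = 0, 1, []
        for _ in range(n):
            if random.random() < 0.1:
                trend = -trend          # the unobserved factor occasionally flips
            x += trend
            path.append(x)
        return path

    print(simulate())
    # Given only the current value of x, the best prediction of the next value
    # depends on whether x has recently been rising or falling - i.e. on the past.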

I often wonder why people seem to make mistakes of the kind that Freedman denigrates, and how we may enlighten them. Freedman puts it down to love of ‘system’ and ‘rigor’. It is my experience that many ‘pragmatic’ people squeeze mathematics into a distorting frame of system and rigor. Perhaps the cure would be greater familiarity with mathematics post 1900, particularly that of Whitehead, Keynes, Russell and Turing, as on this, my blog.

Dave Marsay

 

 

Breiman’s Statistical Modelling

Leo Breiman Statistical Modelling: The Two Cultures Statistical Science 2001, Vol. 16, No. 3, 199–231.

Abstract

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

3. PROJECTS IN CONSULTING

As a consultant … I worked on a diverse set of prediction projects. Here are some examples:

  • Predicting next-day ozone levels.
  • Using mass spectra to identify halogen-containing compounds.
  • Predicting the class of a ship from high altitude radar returns.
  • Using sonar returns to predict the class of a submarine.
  • Identity of hand-sent Morse Code.
  • Toxicity of chemicals.
  • On-line prediction of the cause of a freeway traffic breakdown.
  • Speech recognition.
  • The sources of delay in criminal trials in state court systems.

7 Algorithmic Modelling

7.1 A New Research Community

In the mid-1980s two powerful new algorithms for fitting data became available: neural nets and decision trees. A new research community using these tools sprang up. Their goal was predictive accuracy. … They began using the new tools in working on complex prediction problems where it was obvious that data models were not applicable:

  • speech recognition,
  • image recognition,
  • nonlinear time series prediction,
  • handwriting recognition,
  • prediction in financial markets.

7.2 Theory in Algorithmic Modeling

… What is observed is a set of x’s that go in and a subsequent set of y’s that come out. The problem is to find an algorithm f(x) such that for future x in a test set, f(x) will be a good predictor of y. The theory in this field shifts focus from data models to the properties of algorithms. It characterizes their “strength” as predictors, convergence if they are iterative, and what gives them good predictive accuracy. The one assumption made in the theory is that the data is drawn i.i.d. from an unknown multivariate distribution.

7.3 Recent Lessons

The advances in methodology and increases in predictive accuracy since the mid-1980s that have occurred in the research of machine learning have been phenomenal. There have been particularly exciting developments in the last five years. What has been learned? The three lessons that seem most important to me:

Rashomon: the multiplicity of good models;
Occam: the conflict between simplicity and accuracy;
Bellman: dimensionality—curse or blessing.

Breiman advocates using support vector machines (svms) for the algorithms, and trying to establish the spread of such algorithms that predict the data about equally well.
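
As a rough illustration of the ‘Rashomon’ point (my own sketch, assuming the scikit-learn library is available and using made-up data), one can fit several quite different model families to the same data and compare them purely on held-out predictive accuracy:

    import random
    from sklearn.linear_model import LinearRegression
    from sklearn.svm import SVR
    from sklearn.tree import DecisionTreeRegressor

    random.seed(4)

    # Synthetic data: y depends roughly linearly on x, with noise.
    X = [[random.uniform(0, 10)] for _ in range(400)]
    y = [2.0 * x[0] + random.gauss(0, 1) for x in X]
    X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

    models = {
        "linear regression":      LinearRegression(),
        "decision tree":          DecisionTreeRegressor(max_depth=4),
        "support vector machine": SVR(C=100.0),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        mse = sum((p - t) ** 2 for p, t in zip(preds, y_test)) / len(y_test)
        print(f"{name}: test mean squared error = {mse:.2f}")
    # Three quite different model families, judged here only by out-of-sample
    # predictive accuracy; often, as Breiman notes, several do about equally well.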

12. Final Remarks

… The best solution could be an algorithmic model, or maybe a data model, or maybe a combination. But the trick to being a scientist is to be open to using a wide variety of tools.

Comments

D.R. Cox

… One of our failings has, I believe, been, in a wish to stress generality, not to set out more clearly the distinctions between different kinds of application and the consequences for the strategy of statistical analysis.

…  [There] are situations where a directly empirical approach is better. Short term economic forecasting and real-time flood forecasting are probably further exemplars. Key issues are then the stability of the predictor as practical prediction proceeds, the need from time to time for recalibration and so on.
However, much prediction is not like this. Often the prediction is under quite different conditions from the data … .

    Professor Breiman takes a rather defeatist attitude toward attempts to formulate underlying processes; is this not to reject the base of much scientific progress? … Better a rough answer to the right question than an exact answer to the wrong question, an aphorism, due perhaps to Lord Kelvin, that I heard as an undergraduate in applied mathematics.

     I have stayed away from the detail of the paper but will comment on just one point, the interesting theorem of Vapnik about complete separation. This confirms folklore experience with empirical logistic regression that, with a largish number of explanatory variables, complete separation is quite likely to occur. It is interesting that in mainstream thinking this is, I think, regarded as insecure in that complete separation is thought to be a priori unlikely and the estimated separating plane unstable. Presumably bootstrap and cross-validation ideas may give here a quite misleading illusion of stability. Of course if the complete separator is subtle and stable Professor Breiman’s methods will emerge triumphant and ultimately it is an empirical question in each application as to what happens. It will be clear that while I disagree with the main thrust of Professor Breiman’s paper I found it stimulating and interesting.

Brad Efron

At first glance Leo Breiman’s stimulating paper looks like an argument against parsimony and scientific insight, and in favor of black boxes with lots of knobs to twiddle.

Bruce Hoadley

Bruce describes algorithmic techniques similar to svm, as developed for credit scoring in the 60s. The approach was ‘a combination approach’ in the sense that the algorithms were initiated and tweaked to make sense legally and commercially. He gives advice on building data models to do the same thing:

[Build] accurate models for which no variable is much more important than other variables. There is always a chance that a variable and its relationships will change in the future. After that, you still want the model to work. So don’t make any variable dominant.

Rejoinder

Leo Breiman

… I readily acknowledge that there are situations where a simple data model may be useful and appropriate; for instance, if the science of the mechanism producing the data is well enough known to determine the model apart from estimating parameters. …  Simple models can [also] be useful in giving qualitative understanding, suggesting future research areas and the kind of additional data that needs to be gathered.
At times, there is not enough data on which to base predictions; but policy decisions need to be made. In this case, constructing a model using whatever data exists, combined with scientific common sense and subject-matter knowledge, is a reasonable path.

… [Short-term] economic forecasts and real-time flood forecasts [are] among the less interesting of all of the many current successful algorithmic applications. In [Cox’s] view, the only use for algorithmic models is short-term forecasting;

My Comments

I agree with the author that the scope of statistics needs to be broadened to take in the problems and methods that he raises, but my own experience is more like Hoadley’s. In the 1980s I became involved with an unattended road-side system used to count and crudely classify vehicles going past. A more sophisticated sensor was being developed. Human operators could provide much greater discrimination between vehicle types, but the algorithm developed to automate the process yielded poor performance. The method being used had been developed somewhat ad hoc, as a combination of svm and data modelling. As a first step I tried to understand the algorithm from a scientific perspective, looking at the physics, the sensor design and the statistics. I soon found that the manually programmed element of the algorithm had got some physics wrong; correcting this gave acceptable performance. But I also found that some of the design features that had helped the human operators were a hindrance to the algorithm. This led to a redesign, and further improvements. The algorithm was now a ‘black box’ in so far as it used a large feature set with many parameters that were just meaningless numbers. But by understanding the general principles it was nonetheless possible to understand enough to improve the whole sensor-algorithm system, developing appropriate diagnostic techniques. Perhaps it was a ‘grey box’?

The key take-away here is that what is needed is not just a tool, method or algorithm but the kind of understanding that is provided by an appropriate theory.

On some minor points:

  • Much of the discussion is really about what makes for good science.
  • The paper’s interpretation of Occam seems a little odd. If Occam is about making as few assumptions as possible, then the paper’s ‘forest’ approach seems sensible. I try to go further in trying to characterise what all the solutions have in common, and what the key differentiators are.
  • Following the UK floods of 2007 and the global economic crisis of 2007/8, the remark that algorithmic forecasting was ‘successful’ and even ‘less interesting’ seems odd. Of course, if prediction is the aim, we can just repeatedly predict that there will be no crisis and we will be correct almost all the time.

My own view is that where we want a prediction we often should also want an estimate of that prediction’s reliability. That is, we should know when our data models or algorithms are particularly suspect.
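
A toy Python illustration, with invented numbers, of why raw predictive accuracy is a poor guide when the events that matter are rare:

    # Suppose crises are rare but devastating (all numbers invented).
    years, crisis_years = 100, 3

    accuracy_of_never = (years - crisis_years) / years
    print(f"accuracy of always predicting 'no crisis': {accuracy_of_never:.0%}")  # 97%

    # A crude cost comparison: a complacent forecaster misses every crisis, while a
    # cautious one raises ten false alarms but misses none.
    cost_missed, cost_false_alarm = 100, 1
    print("cost of complacency:", crisis_years * cost_missed)   # 300
    print("cost of caution:    ", 10 * cost_false_alarm)        # 10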

Dave Marsay