Mayo’s Statistical Inference

Mayo, D. (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. Cambridge: Cambridge University Press.

I have yet to read the full book, but Mayo is generous with important quotes, such as this from Robert Kass (2011):

We care about our philosophy of statistics, first and foremost, because statistical inference sheds light on an important part of human existence, inductive reasoning, and we want to understand it.

I agree. This is not about the technical details of some arcane specialist activity, but about something that has had, and continues to have, a dismal effect on all our lives. She also quotes Fisher (1935b/1947, p. 14):

[W]e need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result.

I am tempted to Bowdlerise this as:

In life, we need, not a way of making isolated ‘rational’ decisions, but a reliable way of life. We may say that our ways are justified when we know how to conduct ourselves in ways which often give us greater justification.

Of course, the nature of the justification matters, whether statistical or otherwise. Mayo is against ‘BENT’ science (Bad Evidence, No Test). We may also be against BENT justice, BENT international relations, and so on. She introduces a weak severity requirement:

Severity Requirement (weak): One does not have evidence for a claim if nothing has been done to rule out ways the claim may be false. If data x agree with a claim C but the method used is practically guaranteed to find such agreement, and had little or no capability of finding flaws with C even if they exist, then we have bad evidence, no test (BENT).

We may generalise this by considering ‘data’ quite broadly, to include the kinds of things that politicians cite. Mayo also has a stronger version:

Severity (strong): We have evidence for a claim C just to the extent it survives a stringent scrutiny. If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet none or few are found, then the passing result, x, is evidence for C.

This also generalises. It is also useful, I think, to compare the severity of different tests, and to seek to increase it. To go back to my Bowdlerisation:

Our ways are more justified the more severely we test them.

This seems about right, leaving us to consider the relevant aspects of severity, not just in statistics. Hence my blog.

I may add to this, as Deborah sends more extracts.
