Gelman et al.’s Philosophy
Andrew Gelman and Cosma Rohilla Shalizi, “Philosophy and the practice of Bayesian statistics”, British Journal of Mathematical and Statistical Psychology (2012)
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism.
1. The usual story – which we don’t like
Bayesian statistics or ‘inverse probability’ – starting with a prior distribution, getting data, and moving to the posterior distribution – is associated with an inductive approach of learning about the general from particulars. … Anything not contained in the posterior distribution … is simply irrelevant … . The goal is to learn about general laws, as expressed in the probability that one model or another is correct. This view, strongly influenced by Savage (1954), is widespread and influential in the philosophy of science (especially in the form of Bayesian confirmation theory) … and among Bayesian statisticians … .
We think most of this received view of Bayesian inference is wrong.
2. The data-analysis cycle
Scientific progress, in this [conventional Bayesian] view, consists of gathering data … designed to distinguish among interesting competing scientific hypotheses … – and then plotting the posteriors over time and watching the system learn … .
In our view, the account of the last paragraph is crucially mistaken. The data-analysis process – Bayesian or otherwise – does not end with calculating parameter estimates or posterior distributions. Rather, the model can then be checked, by comparing the implications of the fitted model to the empirical evidence. One asks questions such as whether simulations from the fitted model resemble the original data, whether the fitted model is consistent with other data not used in the fitting of the model, and whether variables that the model says are noise (‘error terms’) in fact display readily-detectable patterns.
3. The Bayesian principal–agent problem
The Bayesian agent is the methodological fiction (now often approximated in software) of a creature with a prior distribution over a well-defined hypothesis space … , a likelihood function …, and conditioning as its sole mechanism of learning and belief revision. The principal is the actual statistician or scientist.
The ideas of the Bayesian agent are much more precise than those of the actual scientist; in particular, the Bayesian (in this formulation, with which we disagree) is certain that some [model] is the exact and complete truth, whereas the scientist is not.
To sum up, what Bayesian updating does when the model is false (i.e., in reality, always) is to try to concentrate the posterior on the best attainable approximations to the distribution of the data, ‘best’ being measured by likelihood. But depending on how the model is mis-specified, and how [the model] represents the parameters of scientific interest, the impact of misspecification on inferring the latter can range from non-existent to profound.
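The point in the quote can be made concrete with a toy simulation: when the model is false, the posterior concentrates on the likelihood-best (KL-closest) member of the model family, which may be fine for one scientific quantity and disastrous for another. The setup below (exponential truth, normal working model, grid posterior with a flat prior) is my illustrative choice, not the authors':

```python
import math
import random

random.seed(1)
# True data-generating process: Exponential(1) — mean 1, all values > 0.
data = [random.expovariate(1.0) for _ in range(500)]

# Misspecified working model: Normal(mu, 1), flat prior on mu over a grid.
grid = [i / 100 for i in range(-100, 301)]

def log_lik(mu):
    return sum(-0.5 * (x - mu) ** 2 for x in data)  # up to a constant

logs = [log_lik(mu) for mu in grid]
m = max(logs)
weights = [math.exp(l - m) for l in logs]
total = sum(weights)
post = [w / total for w in weights]

# The posterior concentrates near mu = 1, the KL-closest normal
# (it matches the true mean), so inference about the mean is fine...
post_mean = sum(mu * p for mu, p in zip(grid, post))
print(round(post_mean, 2))  # close to 1.0

# ...but a quantity the model gets structurally wrong is badly off:
# the fitted normal says P(X < 0) is about 0.16, while in truth it is 0.
def normal_cdf(x, mu):
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2)))

print(round(normal_cdf(0.0, post_mean), 2))
```

One misspecified model, two target quantities: the impact of misspecification is "non-existent" for the mean and "profound" for the tail probability.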
4. Model checking
In our view, a key part of Bayesian data analysis is model checking, which is where there are links to falsificationism.
This gives us a lot of flexibility in modelling. We do not have to worry about making our prior distributions match our subjective beliefs, still less about our model containing all possible truths. Instead we make some assumptions, state them clearly, see what they imply, and check the implications. This applies just as much to the prior distribution as it does to the parts of the model showing up in the likelihood function.
[I]f we are in a setting where model A or model B might be true, we are inclined not to do model selection among these specified options, or even to perform model averaging over them … but rather to do continuous model expansion by forming a larger model that includes both A and B as special cases.
5. The question of induction
Most directly, random sampling allows us to learn about unsampled people … , but such inference, however inductive it may appear, relies not on any axiom of induction but rather on deductions from the statistical properties of random samples, and the ability to actually conduct such sampling.
To sum up, one is free to describe statistical inference as a theory of inductive logic, but these would be inductions which are deductively guaranteed by the probabilistic assumptions of stochastic models. We can see no interesting and correct sense in which Bayesian statistics is a logic of induction which does not equally imply that frequentist statistics is also a theory of inductive inference (cf. Mayo & Cox, 2006), which is to say, not very inductive at all.
6. What about Popper and Kuhn?
Kuhn’s distinction between normal and revolutionary science is analogous to the distinction between learning within a Bayesian model, and checking the model in preparation for discarding or expanding it.
7. Why does this matter?
The idea of Bayesian inference as inductive, culminating in the computation of the posterior probability of scientific hypotheses, has had malign effects on statistical practice.
In our hypothetico-deductive view of data analysis, we build a statistical model out of available parts and drive it as far as it can take us, and then a little farther. When the model breaks down, we dissect it and figure out what went wrong. For Bayesian models, the most useful way of figuring out how the model breaks down is through posterior predictive checks, creating simulations of the data and comparing them to the actual data.
We fear that a philosophy of Bayesian statistics as subjective, inductive inference can encourage a complacency about picking or averaging over existing models rather than trying to falsify and go further. Likelihood and Bayesian inference are powerful, and with great power comes great responsibility. Complex models can and should be checked and falsified. This is how we can learn from our mistakes.
The Bayesian approach that is criticised is perfectly proper when one can be sure that the models being considered contain the ‘true’ model, as when one has some overarching ‘true’ theory within which one is trying to estimate a parameter. This is often the case in normal ‘hard’ science but – as this paper points out – not otherwise; applying the approach more widely is only proper if one acknowledges its limitations. For example, with this approach astronomers would have continued to use Euclidean geometry.
The key observations of the paper are:
- The assumptions of the Bayesian approach are often violated.
- The most likely of a selection of wrong models may not be the best of those models for the intended purpose.
  - E.g., a parameter or class derived from the most likely model may have an extreme bias.
- However, Bayesian techniques can be used pragmatically, to draw attention to a model that can then be compared with the data with a view to developing an improved family of candidate models.
To this one can add a theory of Keynes’, to the effect that if the model is close enough in terms of likelihood then one can use it as a reasonably accurate descriptive model. (E.g. describing correlations but not assigning causation.) I would also point out that
- The positive content of the above is really about estimating the proportions of a population that have various characteristics, based on samples. The broader notion of ‘probability’ is not touched on, and certainly not justified.
- The approach can be seen as a maximally mathematically informed synthesis of probability theory and a theory of science. (Good.)
Andrew Gelman and Christian P. Robert, “Not Only Defended But Also Applied: The Perceived Absurdity of Bayesian Inference”, The American Statistician, February 2013, Vol. 67, No. 1
Makes it clear that the authors are moderate Bayesians:
More than that, though, the big, big problem with the Pr(sunrise tomorrow | sunrise in the past) argument is not in the prior but in the likelihood, which assumes a constant probability and independent events. Why should anyone believe that? Why does it make sense to model a series of astronomical events as though they were spins of a roulette wheel in Vegas? Why does stationarity apply to this series? That’s not frequentist, it is not Bayesian, it’s just dumb. Or, to put it more charitably, it is a plain vanilla default model that we should use only if we are ready to abandon it on the slightest pretext.
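The argument being criticised is Laplace’s rule of succession: with a uniform prior on the sunrise probability and n past sunrises modelled as independent Bernoulli trials with constant probability, the posterior predictive probability of another sunrise is (n + 1)/(n + 2). A quick sketch of the arithmetic — the point of the quote is that the iid-Bernoulli likelihood doing all the work here is precisely the dubious assumption:

```python
from fractions import Fraction

def rule_of_succession(n):
    # Uniform prior on p, n successes in n iid Bernoulli(p) trials:
    # posterior predictive P(success on trial n + 1) = (n + 1) / (n + 2).
    return Fraction(n + 1, n + 2)

print(rule_of_succession(10))                  # 11/12
print(float(rule_of_succession(365 * 5000)))   # very close to 1
```

The computation is trivial once the “roulette wheel” likelihood is granted; the quote’s complaint is that for a series of astronomical events there is no reason to grant it, except as a default to be abandoned on the slightest pretext.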