O’Hagan’s Fractional Bayes Factors …
Anthony O’Hagan, “Fractional Bayes Factors for Model Comparison,” Journal of the Royal Statistical Society, Series B (Methodological), Vol. 57, No. 1 (1995), pp. 99–138.
Fractional Bayes Factors for Model Comparison
A new variant of the partial Bayes factor, the fractional Bayes factor, is advocated. …
This makes use of some reasonable fraction of the total data as training data, depending on the extent to which robustness to outliers is a concern. It depends only on the likelihood function, and is thus coherent. But the paper is perhaps more important for some of its, and its discussion’s, insights into the general problem.
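The idea can be sketched numerically for a toy comparison. The following is a minimal, hypothetical illustration (not one of the paper’s own examples): model 1 fixes a standard normal, model 2 has an unknown mean with a flat improper prior, and each model’s fractional marginal likelihood divides the full marginal by the marginal of the likelihood raised to the training fraction b, so the improper prior’s arbitrary constant cancels.

```python
import numpy as np
from scipy import integrate

# Toy illustration (not from the paper): fractional Bayes factor for
#   M1: x_i ~ N(0, 1)                 (no free parameters)
#   M2: x_i ~ N(mu, 1), pi(mu) ∝ 1   (flat improper prior)
rng = np.random.default_rng(0)
x = rng.normal(0.3, 1.0, size=50)  # synthetic data, illustrative only
n = len(x)
b = 1.0 / n  # minimal training fraction: roughly one observation's worth

def log_lik(mu):
    return -0.5 * np.sum((x - mu) ** 2) - 0.5 * n * np.log(2.0 * np.pi)

# Fractional marginal for M2: q2(b) = ∫ L(mu) dmu / ∫ L(mu)^b dmu.
# The improper prior's arbitrary constant cancels in this ratio.
peak = [float(np.mean(x))]  # hint so quad resolves the sharp likelihood peak
num, _ = integrate.quad(lambda mu: np.exp(log_lik(mu)), -10, 10, points=peak)
den, _ = integrate.quad(lambda mu: np.exp(b * log_lik(mu)), -10, 10, points=peak)
log_q2 = np.log(num) - np.log(den)

# M1 has no parameters, so its fractional marginal is simply L(0)^(1 - b).
log_q1 = (1.0 - b) * log_lik(0.0)

frac_bayes_factor = np.exp(log_q1 - log_q2)  # evidence for M1 over M2
print(frac_bayes_factor)
```

Larger choices of b (the fraction of data treated as training) trade discriminatory power for robustness, which is the trade-off discussed below.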
An uncertainty principle
[T]here is a general sense in which sensitivity of model comparison to the prior increases with sample size, as sensitivity of inference within a given model decreases. If the likelihood is relatively diffuse, the variation of the prior density cannot be great in average value … When [the likelihood] concentrates on a narrow range of [parameter] values, however, a relatively mild perturbation of the prior could produce a much larger change in the average value over a small range, leading to greater sensitivity of [the Bayes factor]. It is almost as if there were a law of conservation of sensitivity, and increasing sample size only transfers sensitivity from inference about [a parameter] within [a given] model … to inference between models.
Use of partial Bayes factors reduces [but does not eliminate] this sensitivity … .
The paper focusses on the use of improper priors, where the sensitivity above is particularly important.
In finite samples, there is a trade-off between robustness, which increases with [b, the proportion of the data used as a training set], and discriminatory power, which decreases with [it]. …
[I]t is incoherent to assign both improper priors and a probability for the model[s]. … It is better to think about [the required parameter] and what it means to the scientist. It is his prior that is needed, not the statistician’s. No one who does this has an improper distribution.
It is an important challenge to develop theoretically well-founded methods for model comparison. When comparing only a few models, the most important criteria are probably difficult to formalize and rest on connections between the model and the subject-matter context.
Granted that the specification of priors is still an unfamiliar and delicate task, the following approach to coherence across models may be helpful. First, specify a proper prior for the most complex model considered, or for a new model generalizing all those considered. Then specify a proper prior within each of the other models to match, as closely as possible, the induced predictive distribution for a suitable ‘minimal sample’, varying with the model considered. This idea is related to, but distinct from, that underlying the partial Bayes factor, and is fully coherent. …
One way of overcoming the difficulties would be to enquire what the client’s prior (genuine, not ‘conventional’) for the parameters would be, on the supposition that model I is correct, and similarly for model II. Accepting each prior in turn converts each model into a simple statistical hypothesis and the likelihood ratio L given by the data for these two hypotheses would give a ‘client’s Bayes factor’ which would seem to be more meaningful than that proposed by the author. If L turned out to be near to 1, it would suggest that we might be near an Elderton situation [where the data is equally compatible with either model].
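The “client’s Bayes factor” described above can be sketched directly: a proper, elicited prior under each model turns that model into a simple hypothesis with a single predictive distribution for the data, and L is the ratio of the two. A minimal sketch, assuming a hypothetical normal setting with client-supplied conjugate priors (all numbers illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical sketch: each model is x_i ~ N(mu, 1) with a client-elicited
# proper prior mu ~ N(m, v); integrating mu out makes the model a simple
# hypothesis with predictive distribution x ~ N(m·1, I + v·J).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=30)  # synthetic data, illustrative only
n = len(x)

def log_predictive(m, v):
    cov = np.eye(n) + v * np.ones((n, n))
    return multivariate_normal.logpdf(x, mean=np.full(n, m), cov=cov)

# Client's Bayes factor: likelihood ratio of the two simple hypotheses
# (here, elicited priors centred at 0 and at 1, both with variance 0.5).
L = np.exp(log_predictive(0.0, 0.5) - log_predictive(1.0, 0.5))
print(L)  # a value near 1 would suggest an 'Elderton situation'
```

The point of the construction is that once the priors are genuinely the client’s, the comparison is an ordinary likelihood ratio between two fully specified hypotheses, with no improper-prior ambiguity.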
Some Bayesians have propounded the wholly unjustified dogma that the solution to every problem of statistical inference must take the form of a posterior distribution. In cases of the kind considered here we must allow that the appropriate response to the client’s question may be that the available data do not allow a response of this form.
Bayes factors have an inherent sensitivity to the prior, and this can be reduced only by inappropriately departing from Bayesian reasoning. …
[Often] fractional Bayes factors cannot arise as true Bayes factors (although they are not far off).
There are several references in the paper to issues such as ‘sensitivity to outliers’, ‘sufficiency’ and ‘coherence’ of the intrinsic Bayes factors and fractional Bayes factors. These are not serious issues (for properly defined intrinsic Bayes factors) unless the sample size is quite small; for very small sample sizes Berger and Pericchi (1993) recommend the ‘intrinsic prior’, which overcomes all these difficulties.
Although interesting, Professor O’Hagan’s development appears to suffer from the same basic limitation as many standard methods for the comparison of Bayesian models: it is predicated on the truth of one of the models being compared, and if all the models being compared are wrong then selecting the best according to Bayes factors can be practically disastrous. If we accept the commonly held view that all models are likely to be wrong yet some can be very useful for many purposes, the critical issue when comparing, selecting and accepting models is to be sensitive to the purposes to which they will be put. For this goal, some goodness-of-fit measure tuned to the specific intended purpose is needed to compare and select models. For a specific example, in Rubin (1983) I drew inferences for a real finite population of 804 cities from a simple random sample of 100 by using Bayesian models … . The best fitting model according to straightforward likelihood or Bayes factors criteria gave atrocious real world inferences for the population total compared with the simple-minded, and obviously inferior fitting, normal model … . The use of fractional Bayes factors would not have helped here because, in a set of wrong models, the best model according to likelihood criteria can still produce predictions that are inconsistent with observed data or scientific understanding.
When conventional improper priors do not work, it does not help to force a proper prior distribution out of the client and to accept the result as a perfect representation of the client’s prior knowledge. We must consider whether the inferences are robust to the substantial uncertainty which must attach to that prior specification. Professor Lavine and Professor Wolpert underline this fact. If in their example the client cannot assert which prior best represents his or her knowledge (uniform over a moderate range or uniform over a very large range), then the inferences will certainly not be robust.
… I do not think that I have strayed from clear Bayesian thinking. If I have, and if these methods do not strictly conform to the Bayesian paradigm, then I am confident that a sensible solution has not yet been found that does conform, either for the problem of model comparison with weak prior information or for the general question of Bayesian robustness.
Model comparison and choice is not a precise science, and there are good reasons for supposing that it never can be, always involving trade-offs. In most cases of interest (where we lack sufficient, and sufficiently unambiguous, data), even after the fullest statistical analysis there will be some significant uncertainty over and above the probability of error claimed by the analysis. In such cases it may be better if the problem owner, rather than the statistician, shapes and drives any assumptions.