L. Jonathan Cohen Can human irrationality be experimentally demonstrated? THE BEHAVIORAL AND BRAIN SCIENCES (1981) 4, 317-370.
The ordinary person is claimed to be prone to serious and systematic error in deductive reasoning, in judging probabilities, in correcting his biases, and in many other activities. …
What is needed here is a conceptual framework within which to think coherently about problems of cognitive rationality and the relevant experimental data … .
Some of these allegations [of irrationality] are correct and important. But others seem to arise from a misapplication or misconception of the relevant standards of rationality … .
4. Applications of inappropriate normative theory.
There is a tendency for some investigators of irrationality to proceed as if all questions about appropriate norms have already been settled and the questions that remain open concern only the extent of actual conformity to these norms. It is as if existing textbooks of logic or statistics had some kind of canonical authority. But in fact many important normative issues are still controversial.
Cohen’s more detailed remarks are referred to below.
Open Peer Commentary on Main Issues
The targets of Cohen’s critique (now mainstream psychologists, such as Kahneman and Tversky) seemed unable to detect any substance in his comments, but he did get some support for his general points, if not for his framework.
One predisposition that psychologists might learn from philosophers is to strive for maximum clarity in the terms one uses, so as to facilitate comparison between studies, avoid internal contradictions, and present coherent tasks to subjects. Another is to explore in depth the logical relationships between different theories and hypotheses … . A third is to appreciate the arguability of all assertions about what behavior is or should be, and the richness of the assumptions underlying them.
When ordinary people reject the answers given by normative theories, they may do so out of ignorance and lack of expertise, or they may be signaling the fact that the normative theory is inadequate. …
[Some] of the bleakest implications to be found in the literature on human irrationality bear on the unseemly enthusiasm that some psychologists have shown for portraying human reasoning abilities in the worst possible light.
Cohen argues that experimenters in the area of cognitive reasoning “risk imputing fallacies where none exist” if they test the subjects by applying an inappropriate normative theory of rationality. While this general thesis is no doubt correct, Cohen’s defense of it is problematic.
I accept much of Cohen’s argument. He is surely right to say that it is circular to regard norms of reasoning as confirmable mathematically or empirically; their validity must be a matter of “intuition” in some sense. And I agree that some response patterns have been deemed irrational only because inappropriate norms were applied … .
Avishai Margalit and Maya Bar-Hillel
In some instances Cohen is not content merely to defend human rationality in the face of error, but would use these very errors as evidence of the superiority of intuition over supposedly normative prescriptions.
And my favourite:
Henry E. Kyburg, Jr.
We need not alter Peano’s axioms because he couldn’t balance his checkbook; even he may have had some bum intuitions about arithmetic.
The following clear up some understandable misinterpretations of the main paper:
[The] actual interpretation of experimental data is bound to be affected by the resolution of certain fundamental issues about the normative criteria for rationality. …
Alternative interpretations can be easily constructed … for many of the texts that philosophers quote as examples of fallacies actually committed by journalists, politicians, advertisers, administrators, and the like.
[I] do not believe that logical truth is dependent at any point on what human beings actually do: the laws of logic, whatever they may be, hold good irrespective of whether or not there is life in the universe.
Thus we seem to have two standards of rationality: idealised logic and human logic ‘in practice’. Cohen and the psychologists disagree on the implications of idealised logic for the psychologists’ examples, and on whether human responses are consistent with any ‘proper’ logic. It may be that both think that there is a unique ‘proper’ logic, but I am not clear on this.
It seems to me that philosophers and psychologists have their own languages, including their own ‘logics’ and hence notions of rationality. Thus I do not think that they have different standards for the same thing, but are considering different things. Similarly, mathematicians have yet another concept. While we have some accommodation with philosophers, this is an accommodation, not an integration. Cohen was seeking a similar accommodation, but did not succeed.
The following by Cohen seems to me vital:
Equally (Quine 1960) we have to impute a familiar logicality to others if we are to suppose that we understand what they say: different logics for my idiolect and yours are not coherently supposable.
None of the comments questioned this. It seems to me to be not only very wrong, but importantly wrong. It seems to me that before the 60s it may have been true that established communities did tend to develop and enforce ‘familiar logicalities’, much in the same way that long-standing partners do, and that this greatly facilitates effective communication. But there have always been differences between countries and classes, and one of the main purposes of education (after gaining pieces of paper) has always been to develop students’ logicality, part of which (at least in England) has been a deliberate bringing together of strong-minded and intelligent people with different logicalities, to debate.
Logically, we have no grounds to suppose that we ever completely understand anything, and hence no grounds for supposing that others share our logics or sense of rationality. Thus differences in logicalities are an inevitable consequence of people struggling to comprehend, rather than sinking into easy group-think. If we want to understand anything, we should be seeking out different logicalities, for how else can we be sure that our concepts are not unduly ‘biased’ by our logic? Differences in logicality should be encouraged and accommodated, not denied or assimilated. But Cohen appears to intend a more direct reading, as if we do think that we understand each other. It may be that something like this belief underlies some other beliefs about rationality that I find odd. But despite this (to me) unpromising framework, some of the paper’s points are worth considering.
Suppose that a subject is told D, and knows both that D implies X and that X implies Y, and yet the subject denies Y. A valid criticism of many experiments is that they assume that this is an error in reasoning, based on arguments that conflate being told something with knowing something. The experimenters seem to use an intuitive mapping from the situation into classical logic and then find an inconsistency. But from a mathematical point of view, if you are told X you do not necessarily assume X. Rather, you consider hypotheses that might explain having been told X.
For example, in non-technical discourse, as detailed in the paper, utterances such as ‘if X then Y’ or ‘X is correct 80% of the time’ can be ambiguous (as in some of the examples below). It is reasonable for people to make the interpretation that seems most likely to them, which may not be what the experimenters expect.
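The told-versus-known distinction can be given a minimal Bayesian sketch: treat “being told X” as evidence about X, filtered through an informant who may err. The reliability numbers below are illustrative assumptions, not anything from the paper.

```python
# Sketch: "being told X" as probabilistic evidence, not knowledge.
# All numbers here are illustrative assumptions.

def p_true_given_told(prior, p_tell_if_true, p_tell_if_false):
    """P(X | told X) by Bayes' rule."""
    num = p_tell_if_true * prior
    return num / (num + p_tell_if_false * (1 - prior))

# A subject who treats the informant as fallible (say a 10% chance of
# asserting X when X is false) need not accept X outright, and so need
# not accept Y even while knowing that X implies Y:
p = p_true_given_told(prior=0.5, p_tell_if_true=1.0, p_tell_if_false=0.1)
# p is about 0.91, not 1
```

On this reading, refusing to detach Y with certainty is coherent rather than fallacious.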
Base Rates: Lottery
There is much discussion of when it is appropriate to take into account base rates. Niiniluoto claims:
[If] a reliable person (with [probability of truth] = .99) announces that the ticket number 267 has won in a fair lottery with 10,000 tickets, the credibility of this fact is, by (1), only 1/102 … .
This is not explained, but we can use it as a relatively straightforward reference case:
Suppose that, with all the usual assumptions, we had 100,000,000 draws. We would expect ticket 267 to be drawn about 10,000 times, of which it would be correctly announced about 9,900 times, while there would be about 1,000,000 errors, of which about 100 would be 267s. But suppose now that our ‘reliable person’ announces ‘267’ with a 1% probability whenever the ticket is not 267. In this case almost all 1,000,000 errors would be 267s, yielding Niiniluoto’s credibility.
Thus implicit assumptions can make a huge difference to probabilities.
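The two error models behind the figures above can be checked directly. This is a sketch of my reading of the text: 10,000 tickets, announcer correct 99% of the time, with errors either spread uniformly over wrong tickets or concentrated on ‘267’.

```python
# Two error models for the lottery announcer; same data, very different credibility.
tickets = 10_000
p_correct = 0.99
p_draw_267 = 1 / tickets

def credibility(p_say267_if_not_267):
    """P(ticket was 267 | announcer says '267'), by Bayes' rule."""
    num = p_correct * p_draw_267
    return num / (num + p_say267_if_not_267 * (1 - p_draw_267))

# Model 1: errors spread uniformly over the 9,999 wrong ticket numbers.
cred_uniform = credibility((1 - p_correct) / (tickets - 1))   # = 0.99

# Model 2: announcer says '267' with probability 1% whenever the ticket is not 267.
cred_biased = credibility(1 - p_correct)                       # = 1/102
```

The same announced ‘267’ is almost certainly right under one model and almost certainly wrong under the other, which is the point about implicit assumptions.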
Base Rates: Cabs
M.S. Cohen has this to say about one of the experimenters’ examples:
Great care has certainly to be taken also in selecting the normative criteria by which the correctness of subjects’ probability judgments is assessed. In one experiment, for example, subjects were told that in a certain town blue and green cabs operate in a ratio of 85 to 15, respectively. A witness identifies a cab in a crash as green, and the court is told that in the relevant light conditions he can distinguish blue cabs from green ones in 80% of cases. The subjects were then asked: what is the probability (expressed as a percentage) that the cab involved in the accident was blue? The median estimated probability was .2, and investigators … claim that this shows the prevalence of serious error, because it implies a failure to take base rates (that is, prior probabilities) into account.
[It] is not clear that the frequencies within the whole city have any special relevance for the problem.
Cohen [L.J.] correctly notes that the neglect of base rates is sometimes legitimate. Cohen is right in suggesting that Bayesian norms are not always appropriate. But even if one endorses a Bayesian view of correct probabilistic reasoning for the situations that Kahneman, Tversky, et al. consider, the neglect of base rate information often remains legitimate. …
The problem is that the jurors know that the cab under consideration was involved in an accident, and if they are to ground subjective probability on knowledge of long-run relative frequency, it must be on knowledge of the long-run relative frequency of the witness correctly identifying the color as green of a cab involved in an accident. To obtain this information, the experimental subjects need to be told the percentage of blue (green) cabs in the city involved in accidents. … It could be any value from 0% to 100%. Hence, a good Bayesian should neglect the base rate given in the example because it is useless for the purpose of determining subjective probabilities.
It is precisely in situations of this sort that principles of insufficient reason are invoked. If we assume that the experimental subjects reason like Bayesians, they proceed as if they supposed that 50% of cabs in the city involved in accidents are blue and 50% are green. But that is precisely what insufficient reason recommends.
[Insofar] as inconsistencies with these canons emerge, they appear to derive from a tendency to use insufficient reason when knowledge of statistical probability of the sort required for direct inference is unavailable. Since this practice is notoriously inconsistent, it is not surprising that it is revealed in the experiments under consideration. But, as far as the issue of neglecting base rates is concerned, the experimental subjects studied by Kahneman and Tversky et al. seemed to have a better grasp of the matter – even from a Bayesian point of view – than do the experimental psychologists.
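The insufficient-reason point can be checked numerically: under the 50/50 prior that insufficient reason recommends, the Bayesian posterior reproduces the subjects’ median answer. A sketch, using the figures quoted above (85/15 split, 80% witness accuracy):

```python
# Posterior that the cab was blue, given a 'green' report, under two priors.
def p_blue_given_green(prior_blue, accuracy=0.8):
    p_green_if_blue = 1 - accuracy      # witness errs on a blue cab
    p_green_if_green = accuracy         # witness is right on a green cab
    num = p_green_if_blue * prior_blue
    return num / (num + p_green_if_green * (1 - prior_blue))

city = p_blue_given_green(prior_blue=0.85)   # city base rate: about 0.59
flat = p_blue_given_green(prior_blue=0.50)   # insufficient reason: exactly 0.2
```

The subjects’ median of .2 is thus exactly what a Bayesian using a flat prior over accident-involved cabs would report; the dispute is over which prior is appropriate, not over Bayes’ rule.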
On the other hand, Sternberg says:
The most reasonable strategy is to apply the population base rate to individual cases unless special circumstances, such as an unusual hereditary disposition toward a rare disease that shares symptoms with a more common disease, dictate otherwise.
In more detail, Avishai Margalit and Maya Bar-Hillel note:
Formally, it is p (witness sees green | cab is green) which is the physiology-of-vision constant, while p (cab is green | witness sees green) depends on the ecological distribution of cab colors as well as on the witness’s visual accuracy, as Bayes’s theorem nicely shows (see also Bar-Hillel 1980). We wonder if Cohen would justify basing the probability that a cab is blue when the witness said it is green on the witness’s probability of erring even in the following two extremes: (i) when the percentage of blue cabs in the city is 0% (i.e., when the cab color couldn’t possibly be blue, no matter what the witness says); (ii) when the witness makes correct color identifications 50% of the time (i.e., when the witness does no better than chance, and so for all practical purposes is worthless).
(A digression: This conflates having been told that there are no blue cabs with knowing that there are no blue cabs. Yet both the statement ‘there are no blue cabs’ and ‘I saw a blue cab’ are witness statements, so it is not clear why one should be given precedence over the other.)
[The] subjects were just told of an 85%-15% distribution in cab colour, and this… is a very weak foundation for an estimate of the relevant base rate. It is certainly quite surprising that not only a few psychologists, like Tversky and Kahneman, but even a few people who say that they “make a living by teaching probability and statistics,” … should persist in supposing otherwise … . Specifically, why on earth should it be supposed that subjects, asked to estimate the unconditional probability that the cab involved in the accident was blue, ought to take into account a prior distribution of colours that would at best be relevant only if the issue at stake was just about the colour of a cab that was said to have been seen somewhere, not necessarily in an accident, and was taken to be blue?
Bilharzia is now one of the commonest diseases in the world, but it would be rather absurd to take its current frequency in the world population as predictive of the base rate, or prior probability, when diagnosing a patient who never wades in fresh water.
Those … who hold that gross epidemiological statistics are relevant here, irrespective of the patient’s own susceptibilities, are refusing to take into account what Keynes (1921) called the “weight” of evidence. The conclusion desired here is an unconditional probability about a single case, p(A), and that conclusion has to be detached from a conditional probability, p(A | E). People sometimes think, as Mackie evidently does, that such a detachment is legitimate just so long as E includes all our relevant knowledge. But wherever our relevant knowledge is rather limited in extent, and so the weight of the evidence is low, this policy can lead to disastrously bad estimates of the unconditional probability. Any insurance company that adopted such a policy would rapidly become bankrupt. Instead, we need to increase the weight of the evidence by discovering quite a substantial amount of the causally relevant facts (which means taking into account the causally relevant susceptibilities of the patient or the specific circumstances of the cab sighting) and determining the corresponding conditional probability, before we detach the unconditional probability. Of course, it may well be impossible to be perfect here. But that is no excuse for detaching the unconditional probability on the basis of excessively lightweight evidence. Moreover, just as some kinds of fact (such as the patient’s life-style or his past medical history) are weight-increasing with regard to the detachment of a particular unconditional probability, so too, other kinds of fact may be weight-reducing. Suppose, for example, that your name is Algernon Charles Thomas and that, though nothing is known about the frequencies of diseases A and B in the world, in your country, or in your city, it does happen to be known that among the twelve other (all unrelated) Algernon Charles Thomases, scattered over different continents and cultures, disease A is at the moment twice as common as disease B. 
To compound a base-rate probability estimated from that accidental and very weakly predictive statistic with the probability deriving from the diagnostic test would obviously be weight-reducing. No doubt it is a matter for judgment, experience, and expertise in particular cases to distinguish weight-increasing from weight-reducing evidence. My claim was just that it is not unreasonable for subjects to suppose that the distribution of cab colours in a particular city, or current world-wide epidemiological statistics, are relatively nonpredictive, and generate weight-reducing base rates, in cases of the kinds in question.
In my view there are a number of issues.
- According to Bayes, we want P(blue|crash). It is not obvious that this is P(blue), but neither is it obvious how to apply the principle of indifference, or that the principle is appropriate here.
- As Cohen indicates, and contrary to what many psychologists seem to think, there is no universally valid and complete probability theory. To be mathematical we would need to specify a mathematical theory and give at least some grounds for supposing its axioms to be true. The papers that Cohen criticises often do neither.
- While I find Cohen’s account of perception implausible, so too is that of his critics. They state or assume that P(‘blue’|blue) and P(‘green’|green) (i.e., the probabilities of being correct) are the same whether P(B) = 0.5 or 0.85. This might be true for that witness, but if I knew that the cabs were 85% blue, I would only declare a cab to be green if I was sure.
- Boole (and Keynes) suggest setting all the relevant probabilities to be variables constrained by the given data, and then solving for possible values. This seems quite reasonable as a general procedure. In a court one might give a defendant the benefit of any doubt.
- It is the case that the theory the psychologists apply is the only complete and consistent theory in a sense that I do not find intuitive, but it is only complete and consistent when applied to certain ‘toy’ problems. While it may be that the experimenters intend the subjects to interpret the problems as if they are toy, I agree with Cohen in so far as the subjects – not being used to dealing with toy problems – may, quite reasonably, give them a broader interpretation.
- Of course, it is also doubtful that inexperienced subjects would be ‘rational’ in the sense being studied, but the experimenters have not devised an experiment that proves it.
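The Boole/Keynes suggestion in the bullets above can be sketched numerically: treat the unknown quantity (here, how often blue cabs crash relative to green ones) as a free variable constrained only by the given data, and solve for the range of posteriors it allows. The ratio r below is an assumption of the sketch, not anything stated in the problem.

```python
# Boole/Keynes-style sketch: range of posteriors consistent with the data,
# as the unknown relative accident rate r of blue vs green cabs varies.
def p_blue_given_green_report(r, base_blue=0.85, accuracy=0.8):
    # Prior among *accident-involved* cabs, if blue cabs crash r times as
    # often as green ones (r is the free, unconstrained variable).
    prior = r * base_blue / (r * base_blue + (1 - base_blue))
    num = (1 - accuracy) * prior
    return num / (num + accuracy * (1 - prior))

lo = p_blue_given_green_report(r=0.1)   # blue cabs crash far less often: ~0.12
hi = p_blue_given_green_report(r=10)    # blue cabs crash far more often: ~0.93
```

The posterior ranges widely, which supports the earlier observation that the missing accident-rate information “could be any value from 0% to 100%”, and that a range may be a more honest answer than a point probability.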
Base Rates: Disease
You are suffering from a disease that, according to your manifest symptoms, is either A or B. For a variety of demographic reasons disease A happens to be nineteen times as common as B. The two diseases are equally fatal if untreated, but it is dangerous to combine the respectively appropriate treatments. Your physician orders a certain test which, through the operation of a fairly well understood causal process, always gives a unique diagnosis in such cases, and this diagnosis has been tried out on equal numbers of A- and B-patients and is known to be correct on 80% of those occasions. The tests report that you are suffering from disease B. Should you nevertheless opt for the treatment appropriate to A, on the supposition (reached by calculating as the experimenters did) that the probability of your suffering from A is 19/23? Or should you opt for the treatment appropriate to B, on the supposition (reached by calculating as the subjects did) that the probability of your suffering from B is 4/5?
It is just that the prior probabilities have to be appropriate ones, and there is no information about you personally that establishes a greater predisposition in your case to disease A than to disease B. We have to suppose equal predispositions here, unless told that the probability of A is greater (or less) than that of B among people who share all your relevant characteristics, such as age, medical history, blood group, and so on.
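The two answers in the quoted example can be checked directly. A sketch, where the 19/20 vs 1/20 split just restates “nineteen times as common”:

```python
# The experimenters' and the subjects' calculations in the disease example.
p_A, p_B = 19 / 20, 1 / 20   # base rates: A is nineteen times as common as B
acc = 0.8                    # test correct 80% of the time on A- and B-patients alike

# Experimenters' (base-rate-using) answer: P(A | test says B), by Bayes' rule.
p_A_given_B_report = ((1 - acc) * p_A) / ((1 - acc) * p_A + acc * p_B)   # = 19/23

# Subjects' answer: take the test's stated accuracy at face value.
p_B_subjects = acc                                                        # = 4/5
```

Both figures in the quotation check out; the disagreement is entirely over whether the demographic base rate is an appropriate prior for this patient.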
Here Cohen argues that a patient and a physician, looking at the same test data and sharing all their information, should be allowed to have different probabilities for a diagnosis.
Furthermore, he agrees that the physician, who uses long-run frequencies, will in the long run make more correct diagnoses (and presumably make more correct decisions) than the ensemble of individual patients who ignore these frequencies. Would Cohen really insist on being treated according to his probabilities, rather than trust his physician’s? This is irrational.
(Note that the reasoning here is linguistic, rather than logical. It presupposes that the reader knows what is rational, and that rationality has certain properties, such as uniqueness.)
And all that each patient relevantly knows, however keenly interested he may be in himself as an individual, is that he is a member of a class (members of this population diagnosed by the test as having B) about which the test has only this degree of success.
This has similar issues to the cab example. In my view, if we apply the principle of indifference to an actual coin, then we must have P(Heads)=1/2. But an actual coin could be double-sided, so P(Heads) could be 1, 0 or approximately 0.5. These have very different implications for what happens if we toss the coin repeatedly. It is true that if we must assign a ‘point probability’, then that probability must be 1/2. But must we assign such a probability, or could we not say ‘If the coin is double sided then the probability would be 0 or 1, and if it is fair-ish then it is about 0.5. We cannot imagine any other possibility’? Similarly, it seems unreasonable to try to assign a point probability in the medical case.
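The coin point can be made concrete: a point probability of 1/2 and a mixture over coin hypotheses agree about a single toss but disagree sharply about repeated tosses. The mixture weights below are illustrative assumptions.

```python
# A point probability vs a mixture over coin hypotheses (weights are invented).
point = 0.5 ** 10                                  # P(10 heads) for a known-fair coin

# Small chance the coin is double-headed or double-tailed, else roughly fair:
hyps = [(0.01, 1.0), (0.01, 0.0), (0.98, 0.5)]     # (weight, P(heads))
mixture = sum(w * p ** 10 for w, p in hyps)        # P(10 heads) under the mixture
```

Both assign P(heads) = 1/2 to one toss, yet the mixture makes ten heads in a row more than ten times as likely as the point model does, so collapsing to a single point probability genuinely loses information.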
Base Rates: Bayes’ Rule
A point at issue is Bayes’ rule. Is it appropriate to apply it, and how does it apply? Cohen does not explicitly apply Bayes’ rule. This is quite reasonable, since the rule follows directly from most axiomatizations, and it is therefore appropriate – as Cohen does – to focus on the axioms. This gave some readers the impression that he was contradicting Bayes’ rule, but as others suggest, Bayes’ rule is not really the point at issue here.
We can cover the cab and disease examples by supposing that there is a definite population of which a subset A (green cabs, people with the specific disease) is of interest, given some evidence E (witness statement, test). If we apply Bayes’ rule directly, we have P(A|E)=P(E|A).P(A)/P(E). But consider a subset X (e.g. of cabs that have accidents, or people of our gender, age etc.). Then, equally, we want P(A|E,X)=P(E|A,X).P(A|X)/P(E|X). The two answers coincide when X carries no information, e.g. when P(A&E|X)=P(A&E) and P(E|X)=P(E). But is this always true? If we do not know, is it reasonable to ignore the possibility that it might not be? (E.g. Mackie.) This would require something like the (questionable) principle of indifference, but stronger. It certainly seems wrong to gloss over the issue without comment.
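A toy joint distribution illustrates how conditioning on such a subset X can shift the posterior. The counts are invented purely for illustration.

```python
# Invented counts over (A, E, X) in a notional population, chosen so that
# X matters: conditioning on X changes the posterior for A given E.
counts = {
    (1, 1, 1): 10,  (1, 1, 0): 40,
    (1, 0, 1): 10,  (1, 0, 0): 140,
    (0, 1, 1): 30,  (0, 1, 0): 20,
    (0, 0, 1): 150, (0, 0, 0): 600,
}

def p(pred, given=lambda k: True):
    """Conditional probability of pred within the sub-population `given`."""
    num = sum(v for k, v in counts.items() if pred(k) and given(k))
    den = sum(v for k, v in counts.items() if given(k))
    return num / den

p_A_given_E = p(lambda k: k[0] == 1, lambda k: k[1] == 1)                  # 0.50
p_A_given_E_X = p(lambda k: k[0] == 1, lambda k: k[1] == 1 and k[2] == 1)  # 0.25
```

Here P(A|E) and P(A|E,X) differ by a factor of two, so whether the subject’s reference class should be the whole population or the subset X is a substantive question, not one to be settled by default.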
I have just noticed that the test for the disease, while presented in probabilistic language, appears to be deterministic. This changes the ‘logic’. My interpretation is that there is a marker, M, that the test identifies correctly, and that we have been told that P(B|M)=0.8, which is the required result. This is a bit confusing. If you think of a test as being a probabilistic response to a condition, then the calculation of the previous paragraph applies. But if it reliably picks out a marker (such as a gene or blood condition) that probabilistically ‘causes’ the disease, then the logic is different. My reading is the same as Cohen’s, but seems not to be what the experimenters intended.
Looking at it again, I have assumed that the marker is something like a gene, virus or blood condition, that would have ‘caused’ the disease. But maybe it is something like a blood level, which is only a symptom. This seems to fit the description less well, and in any case the point is that the question has no purely ‘rational’ answer: it depends at least on one’s knowledge of medicine.
Base rate: Causality
Cohen’s repeated reference to issues of causality are perhaps overly philosophical and inaccessible. But there is something in them.
If a witness was in a part of the city that had atypical proportions of cabs, it would clearly be nonsense to use the overall proportions, and it would seem prudent to make enquiries about such things rather than assume that the city-wide statistics were appropriate. Thus a reasonable variant of the principle of indifference might be to restrict it to cases where a reasonable effort has been made to identify any contra-indications, and to make the use of the principle explicit and open to challenge.
Patients generally have tests for a reason. If a test was ordered because disease D was suspected, then the patient is one of a population that is suspected of having disease D, and even without any specific data it is reasonable to suppose that the likelihood of D is greater than if the test had been applied at random.
The above are examples of ‘causality’: the sample is not truly random. (But the use of the term ‘causality’ seems to invite confusion and digression.)
Base rate: experts and clients
One of Kahneman’s comments has perhaps suffered with time. It used to be that experts in all fields would tend to gather and analyse all the ‘relevant’ data and then pronounce on the outcome, such as a recommended course of action or a probability. In this setting, the approach to probability that Kahneman and some others advocate seems quite reasonable. But is it ‘rational’ for clients to take the expert’s opinion at face value? In the role of both expert and client I have found it more common for there to be a dialogue, elucidating possible factors (‘objective’ and otherwise) and attitudes to risk. This has the theoretical benefit of surfacing grounds for doubting the result that neither side knew on its own. My experience is also that in such discussions ranges of reasonable values, in Boole’s sense, are much more appropriate than trying to establish an unbelievable precision.
Base rates: Accessibility
The experimenters refer to an ‘accessibility bias’. They notice that people’s estimates are ‘as if’ they used a base rate derived from their own experience, rather than from the whole population. This is regarded as ‘irrational’ and perhaps rather stupid. But, as noted above, it is important to use a base rate derived from an appropriate reference set. The official figures will tend to be of high quality, but may not be appropriate. In many situations, such as diseases, the cases that come to mind may be one’s relatives and others with similar backgrounds and lifestyles. Thus they may actually form a more appropriate reference set than the entire population. (Of course, media reports may not form an appropriate reference set, but the point here is that the experimenters’ position needs justifying, and Cohen’s alternative is at least as reasonable.)
It seems to me that many think like the Psychologists, and it would be a proper study of Psychology to help us to understand why. It is unfortunate that some authoritative Psychologists regard themselves as ‘rational’ and so do not seek to explain their delusions. Cohen does not do this either, merely seeking – futilely – to persuade them of their error.
A well known criticism of Bayesianism and – more generally – rationalism is that it always seeks a precise answer, and hence rushes to adopt ‘principles’ that are necessary to ‘break ties’, without considering the consequences. Lycan, in his comments, goes further. Thinking of us as the product of a practical ‘mother nature’ (evolution or design), he supposes that:
She will build us to prefer simpler hypotheses to more complex ones, because they are easier to work with and afford plenitude of prediction out of parsimonious means (hence the canons of simplicity that implicitly govern straight-rule induction and explanatory inference; see also Sober 1981). Except in certain special cases, She will not sanction our changing our minds without reason, because the instability created by arbitrary changes in belief would be inefficient and confusing (hence the conservatism implicit in much of our epistemic practice).
According to Boole, we all have various beliefs and principles that are more or less tightly held. If Lycan is right then rules that strongly suggest such principles as the principle of indifference will be very tightly held and hence not given up, reason or no. In this sense it may well be – as Cohen suggests – that typical humans prefer the fiction of narrow rationality (e.g., of predictability, objectivity) to, for example, Russell’s account of Human Knowledge. But it seems to me that such human rationality ought to be distinguished from the logical concept, and mathematicians ought not to be blamed.
I am not sure that I have understood Cohen, and the ideas that he criticises are mainstream. I imagine that in a poll, a majority of mathematicians, and perhaps an overwhelming majority of younger applied mathematicians, would defend the Psychologists.
Rationality fulfils a role in our social order, and to fill that role rationality must be more or less as the Psychologists suppose. But this begs the question of whether the desired role can be fulfilled, or whether anything that purports to fill it must have dangers. This is the type of question that mathematics (post Whitehead and Russell) is intended to answer, and the answer is ‘No’: there is nothing that will always enable us to predict, or that will allow us to make ‘optimal’ decisions, except in very narrow circumstances. To be fair, the Psychologists’ examples seem carefully constructed to have straightforward, unambiguous interpretations, and hence to be cases where conventional rationality does apply. But, as Cohen observes, they have failed. Even in the superficially simplest of cases – at least from a logical point of view – one always needs to assess the applicability of one’s methods (which is why one needs experienced practitioners). The Psychology papers made no attempt to justify their assumptions of straightforwardness, and did not really engage with the issues in their response.
It is the case that the examples could have been drawn from school textbooks, and that in the context of a school exam the Psychologists’ answers would get full marks and their subjects’ none. But taken out of the school context, and appearing to be about real lives, the questions become quite different. It could be that the subjects (mostly students) were taking account of uncertainties that the experimenters were not, but I rather doubt it. All one seems able to say is that the subjects were differently rational.
My notes on: