Interpreting Probability: Controversies and Developments in the Early Twentieth Century

‘This is a study of two types of probability’, the Introduction to this book informs us. The two types of probability are frequentist objective probability, and the more subjective epistemic, or Bayesian, probability. What immediately marks this book off from other discussions of the same topic is the author’s device of personifying the tension, if not outright conflict, between these two ideas of what probability is, in terms of a historically extended dialogue between two historical protagonists, and antagonists, R.A. Fisher and Harold Jeffreys. In the nineteen thirties, when the controversialists were principally engaged with each other, Fisher was the great champion of frequentist, and Jeffreys of Bayesian, probability. Both were themselves eminent mathematical scientists: Fisher a statistician and geneticist, Jeffreys a geophysicist. This book charts the course of their debate.

Allegory and its more restrained relative, the Dialogue, are devices of ancient pedigree for dramatizing conflicts of ideas. It is a pity that they have almost died out. To revive them, and moreover to revive them in the context of an actual historical conflict, is a genuine coup. Not only that: it must be said that the result in this case is a pleasure to read and will certainly become an indispensable scholarly resource. Apart from any other consideration, the Jeffreys-Fisher conflict is a most important episode in the (fairly) recent history of scientific ideas, which has not been systematically investigated before. Howie has uncovered some fascinating information, not only about these two men themselves, but also about some of the other notable figures of twentieth-century statistical science who either took part in or continued the debate, like J.B.S. Haldane, Karl and his son Egon Pearson, and Jerzy Neyman, among others.

The controversy itself is also placed in a historical context. The book starts well before Fisher and Jeffreys crossed swords, at the dawn of the mathematical theory of probability itself, makes some interesting observations about Laplace and other great pioneers, discusses the contribution of C.D. Broad to the problem of induction to laws (a problem which preoccupied Jeffreys, and in response to which he and Dorothy Wrinch produced their Simplicity Postulate), and continues, after Fisher and Jeffreys ceased to be active contributors, with a discussion of how more recent developments impinged on their controversy. The Second World War, and its demands for efficient quality control, saw the leadership of the frequentist camp passing to Neyman, with a concomitant shift of emphasis away from considerations of valid scientific inference and towards the development of reliable methods for minimizing the frequency of erroneous decisions. Coincidentally, the Bayesian position also shifted in the direction of decision theory, with the publication of Savage’s influential Foundations of Statistics (1954) reestablishing the theory on an explicitly utility-theoretic foundation (an idea that Jeffreys explicitly repudiated).

The author’s use of an actual historical debate as the focus of his discussion of Bayesian versus frequency probability is no mere rhetorical device, however. It subserves another purpose of Howie’s, which is to illustrate a general claim about the importance of context for understanding the dynamics of scientific ideas. In this case, Howie argues, once the respective intellectual and scientific contexts within which Fisher and Jeffreys were working are fully understood, much of the appearance of conflict disappears: both were to a great extent talking past each other, not appreciating that what each took to be decisive objections to the other largely reflected their own failure to grasp the other’s purposes and background assumptions. An example of such mutual misunderstanding was the dispute over the probability that the third of three observations would lie between the first two. Jeffreys used an elementary argument for the probability being one third. Fisher disagreed, claiming that it would depend on how far apart the first two observations were. As Howie shows, however, the two were making different assumptions: Fisher was implicitly conditioning on two fixed data points and a presumed fixed model, while Jeffreys was appealing to a completely unconditional probability. Given their separate assumptions, both were correct.
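The two readings of the three-observations problem can be compared numerically. The sketch below is only an illustration: the standard normal model, the function names and the number of trials are assumptions made here, not details taken from Howie or from the original exchange. The unconditional figure is estimated by simulation, and the conditional one is computed from the model’s distribution function.

```python
import random
from statistics import NormalDist


def unconditional_middle_probability(trials=200_000, seed=0):
    """Estimate P(third observation lies between the first two)
    when all three are drawn afresh each time (Jeffreys's reading).
    The standard normal is an illustrative assumption; by symmetry any
    continuous distribution gives the same answer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        if min(a, b) < c < max(a, b):
            hits += 1
    return hits / trials


def conditional_middle_probability(x1, x2, model=NormalDist(0, 1)):
    """P(third observation lies between x1 and x2) with the first two
    observations held fixed and the model assumed known (Fisher's reading)."""
    lo, hi = sorted((x1, x2))
    return model.cdf(hi) - model.cdf(lo)


if __name__ == "__main__":
    print(unconditional_middle_probability())         # close to 1/3
    print(conditional_middle_probability(-0.1, 0.1))  # small gap, small probability
    print(conditional_middle_probability(-2.0, 2.0))  # wide gap, large probability
```

Under these assumptions the first figure settles near one third, while the second and third depend on the distance between the two fixed points, which is the sense in which both men could be right.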

Howie reaches a more general conclusion, that overall neither of Fisher and Jeffreys was wrong and the other right: ‘each of Fisher’s and Jeffreys’s methods was coherently defensible, and … the clash between them was not a consequence of error on one side’ (p. 171). Is this true? The principal bone of contention between the two men was the role that probability and probabilistic reasoning play in science, and here the two positions do seem, at any rate at first sight, mutually incompatible. According to Fisher, the only scientific concept of probability is that of frequency in hypothetical infinite populations. The task of the scientist is to identify the latter and use the battery of random-sampling techniques Fisher had himself developed to assess the population-parameters or to reject the null hypothesis when a difference between populations is hypothesized. Principal among the estimation criteria was that of maximum likelihood. Where such estimates exist they automatically satisfy other important frequentist desiderata, principally those of asymptotic normality and minimum variance; more importantly, Fisher saw in likelihood a measure of rational belief, and it was a crucial characteristic of likelihood for him that it is not formally a probability, for it does not satisfy the addition principle (the likelihood of ‘A or B’, for disjoint possibilities A and B, is meaningless). Later, he was to exploit a formal symmetry which can arise in suitable contexts between parameter and random variable, to develop what he believed to be a class of frequentist probability distributions, called fiducial distributions, over parameters. For Fisher, the combination of significance tests, maximum-likelihood estimation and fiducial probability was a sufficient foundation of inference from data.
Bayesian, or ‘inverse’, probability, as it was traditionally called, was not only unnecessary, but worse, it was subjective where science demands objectivity, and, because it used the Principle of Indifference to generate uniform prior distributions which do not in general survive reparametrisation, it was a profligate generator of inconsistencies.
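The reparametrisation objection can be made concrete with a short worked example. The particular parameter θ on (0, 1) and the transformation φ = θ² are chosen here purely for illustration and do not come from the book.

```latex
\[
\theta \sim \mathrm{Uniform}(0,1), \qquad \phi = \theta^{2}
\;\Longrightarrow\;
p_{\phi}(\phi) \;=\; p_{\theta}\!\left(\sqrt{\phi}\right)\left|\frac{d\theta}{d\phi}\right|
\;=\; \frac{1}{2\sqrt{\phi}}, \qquad 0 < \phi < 1 .
\]
\[
\Pr\!\left(\theta < \tfrac{1}{2}\right) = \tfrac{1}{2}
\quad \text{(indifference over } \theta\text{)},
\qquad
\Pr\!\left(\theta < \tfrac{1}{2}\right) = \Pr\!\left(\phi < \tfrac{1}{4}\right) = \tfrac{1}{4}
\quad \text{(indifference over } \phi\text{)} .
\]
```

A prior that is uniform in θ is thus not uniform in φ, and applying the Principle of Indifference to each parametrisation in turn assigns two different probabilities to the same event.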

Jeffreys, by contrast, believed that only a systematically developed theory of Bayesian probability could furnish an adequate theory of valid uncertain inference from data. Jeffreys contested Fisher’s own theory at every point. He claimed that significance testing is unsound since, based as it is on the probability according to the null hypothesis of data at least as extreme as that observed, it involves the consideration of data that have actually not been observed. In his Theory of Probability he showed that the objection is not merely academic: the use of tail-area probabilities can create considerable bias, particularly in large samples (a feature now known as Lindley’s Paradox). He also believed, with a good many other people at the time and even more later, that fiducial probability was based on a formal confusion, and he argued, plausibly, that maximum likelihood could not give a determinate inference since any group of data can be exactly fitted by an infinity of alternative laws (this is the phenomenon we now know as underdetermination, and Jeffreys himself, long before Goodman and grue, gave in his Theory of Probability a simple formal algorithm for generating such an infinity of alternatives). He also pointed out, against Fisher’s insistence that probability must be identified with frequencies in hypothetical infinite populations, that we necessarily observe only single, even if multidimensional, data points, not indefinitely repeated samples from an in-principle unobservable population. As to Fisher’s charge of subjectivism, Jeffreys believed that some subjective element is always present in any inference from data, but that it could and should be minimized by ensuring that the prior distributions, where the subjectivism was typically located by critics of the method, represent the total relevant background knowledge possessed by working scientists (like himself).
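The large-sample effect mentioned above can be seen in a small calculation. What follows is a hedged sketch, not Jeffreys’s own example: the binomial setting, the point null θ = 1/2, the uniform prior on θ under the alternative, and the function names are all assumptions introduced here. With that prior the marginal likelihood of any count under the alternative is 1/(n + 1), so the Bayes factor in favour of the null has a closed form.

```python
import math
from statistics import NormalDist


def log_binom_pmf_half(k, n):
    """log P(K = k) for K ~ Binomial(n, 1/2), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1) + n * math.log(0.5))


def bayes_factor_for_null(k, n):
    """Bayes factor for H0: theta = 1/2 against H1: theta ~ Uniform(0, 1).
    Under H1 the marginal probability of k successes is 1 / (n + 1)."""
    return (n + 1) * math.exp(log_binom_pmf_half(k, n))


def two_sided_p_value(k, n):
    """Tail-area probability under H0, using the normal approximation."""
    z = abs(k - n / 2) / (math.sqrt(n) / 2)
    return 2 * (1 - NormalDist().cdf(z))


if __name__ == "__main__":
    # Both data sets sit about two standard errors from the null value.
    for k, n in [(60, 100), (500_980, 1_000_000)]:
        print(n, two_sided_p_value(k, n), bayes_factor_for_null(k, n))
```

In both cases the tail area is close to the conventional 5% level, yet under these assumptions the Bayes factor is near 1 for the small sample and strongly favours the null for the large one.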

All this, and more, is faithfully retailed by Howie, with a wealth of documentary and interview-based evidence, including the many letters exchanged between Fisher and Jeffreys themselves, and between them and interested third parties who also contributed to the discussion, like the contemporary statisticians Bartlett, Lindley and Barnard. Prima facie, the dispute seems a fundamental one. Nevertheless, Howie supports with some interesting details his case that much of the disagreement was merely at the sound-and-fury level. For example, Fisher himself did not rule out a subjective element in the choice of statistical model (i.e. the presumed population), but he believed that (a) this was a matter for the scientist’s personal judgment, and not representable by any simple-minded and universal system of rules like those of inverse probability and (b) once the model had been settled on, his own evaluative criteria provided a completely sound and objective way of determining its parameters. Secondly, the two apparently competing methodologies were actually in agreement in an extensive class of cases, in particular those where the data set was large: Jeffreys himself proved that for large independent samples posterior probability asymptotically agrees with maximum likelihood estimation (i.e. the maximum likelihood estimate approximates the mean of the posterior distribution independently of the prior distribution). Thirdly, the types of scientific problem each faced were typically very different, and their methods, according to Howie, to a considerable extent reflected the peculiar features of each. Fisher, working at Rothamsted, could rely on highly controlled data, which could be safely assumed to come from a single population, whereas Jeffreys had to make do with sometimes sparse, and often unreliable, data from several different sources characterized by possibly quite different structural characteristics.
Thus Fisherian methods were naturally confined to parameter estimation and testing without having to consider alternative underlying models (a feature still present, typically without any accompanying explanation, in introductory textbooks of statistics), while Jeffreys faced uncertainty, sometimes radical, at every level of the theoretical ascent from the data.
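The large-sample agreement between the two methods noted above can be illustrated in the simplest case. This is a minimal sketch under assumptions made here rather than in the book: Bernoulli data, conjugate Beta(a, b) priors, a success rate of 0.3, and hypothetical function names. The posterior mean is then (a + k)/(a + b + n) and the maximum likelihood estimate is k/n.

```python
def maximum_likelihood_estimate(successes, n):
    """MLE of a Bernoulli success probability."""
    return successes / n


def posterior_mean(successes, n, a, b):
    """Posterior mean of the success probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + n)


def gap_table(rate=0.3, sizes=(10, 100, 10_000), priors=((1, 1), (20, 2))):
    """Absolute gap between posterior mean and MLE for each sample size
    and each (illustrative) prior, with the observed proportion held near `rate`."""
    rows = []
    for n in sizes:
        k = round(rate * n)
        for a, b in priors:
            gap = abs(posterior_mean(k, n, a, b) - maximum_likelihood_estimate(k, n))
            rows.append((n, (a, b), gap))
    return rows


if __name__ == "__main__":
    for n, prior, gap in gap_table():
        print(f"n={n:>6}  prior=Beta{prior}  |posterior mean - MLE| = {gap:.5f}")
```

With ten observations the strongly informative prior pulls the posterior mean far from the maximum likelihood estimate, whereas with ten thousand the two priors give almost the same answer as each other and as maximum likelihood.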

Whether Howie is right or wrong in his overall conclusions the reader will have to judge. But it’s a good story and he tells it very well. Indeed, he tells it like it is (or was), simply and clearly, steering clear of portentous philosophizing on the one hand and a too-minute attention to mathematical and biographical detail on the other, an approach to intellectual history now relatively uncommon but most welcome where it occurs. There are some blemishes that a careful reading by the publisher’s referees or Howie’s research supervisor should have spotted, though they can be easily eliminated in the (hoped-for) event of a reprinting. Thus, there are references to ‘transinfinite series’ on pp.86 and 109, and to the possible ‘fractions’ of black balls in an infinite urn (though this admittedly follows Laplace’s own account) on p.24. Bulwer Lytton becomes Lytton Bulwer on p.36, and de Finetti’s well-known representation theorem is described on p.227 as his ‘representation theory’. It is not mathematically accurate, or even meaningful, to say that Kolmogorov ‘defined probability as a measure property of a set within a field constructed according to a series of axioms’ (p.219), while to refer to Popper’s much-discussed and arguably seminal propensity theory only with a footnote remark that it ‘was a half-baked attempt to apply the probabilities of von Mises’s collectives to individual events’ (p.221) is irresponsibly flippant.

That said, this is a timely and valuable contribution to our knowledge of the period and its great figures. There is a wealth of incidental, but always relevant and often fascinating, historical detail. Probability of course also has a role to play outside statistics, and Howie devotes some space to this, with brief but informative discussions of its use in the social sciences, biology and especially physics. Another distinctive feature of his book is that, though it concerns a highly technical subject matter, his own discussion is anything but technical in any overtly formal sense: in fact, there are hardly any formulas in the book. Yet he succeeds in conveying, in words, the technical ideas both precisely and clearly (the lucid discussion, on pp.129-132, of the Jeffreys-Haldane prior and its relation to the treatment of error is a notable case in point). Its thoroughness, combined with an assured informality and lightness of touch, makes the book an enlightening and entertaining read.