Evidence and Evolution: The Logic Behind the Science


Elliott Sober, Evidence and Evolution: The Logic Behind the Science, Cambridge UP, 2008, 392pp., $28.00 (pbk), ISBN 9780521692748.

Reviewed by Jessica Pfeifer, University of Maryland, Baltimore County



Elliott Sober’s excellent book, Evidence and Evolution, builds on views about evidence that Sober has been developing over the years and shows how these views bear on issues relevant to evolutionary biology. The book is divided into four main chapters, with a fifth chapter as a conclusion. The first chapter develops Sober’s views about evidence, while Chapters 2-4 apply this discussion to three issues of importance to evolutionary biology: the argument for intelligent design (Chapter 2), the evidence for natural selection (Chapter 3), and the evidence for common ancestry (Chapter 4). One advantage of this organization is that it is possible, without too much loss, to read Chapter 1 and then skip to whichever later chapters are of interest. While there are points made in the intervening chapters that might be relevant for later conclusions, Sober very helpfully makes note of where these topics have previously been discussed.

In Chapter 1, Sober not only forcefully defends his particular views about evidence, but in the process also provides an excellent introduction to many of the issues at stake between Bayesian, likelihood, and frequentist accounts. Sober argues that versions of each approach have their place. However, his view is not pluralistic. Which view one ought to adopt depends on the goals one has, the information at hand, and the hypotheses of interest. Bayesian methods can tell us what our degree of belief in a hypothesis ought to be, likelihoodism has the more modest aim of telling us whether and to what degree the evidence favors one hypothesis over another, and the version of frequentism Sober endorses (model-selection theory, and in particular the Akaike Information Criterion) estimates how accurate a model will be at predicting new data when fitted to old data.

Sober argues that one ought to be a Bayesian except when there is no objective basis for assigning prior probabilities to a hypothesis or for assigning likelihoods to both a hypothesis and its negation. (Likelihood is a technical term meaning the probability of the evidence given the hypothesis, p(E/H).) Subjective assignment of priors is inadequate for justifying probability assignments interpersonally, which Sober argues is essential for the practice of science. Moreover, the negations of many hypotheses are “catchall” hypotheses. Not-H is equivalent to the disjunction of all the particular alternative hypotheses to H. For Bayesians, the likelihood of ‘Not H’ would be the average of the likelihoods of each of these various alternatives, weighted by their probability conditional on H being false. However, in many cases, some of the alternatives to H have yet to be conceived, and there is no objective basis for assigning probabilities to each of the alternatives conditional on not-H. In such cases, we ought to retreat to a more modest likelihood approach that merely compares the likelihoods of a hypothesis to one or more particular rivals. The competing hypotheses need not be exhaustive of all the possibilities; likelihoodism is merely interested in determining whether and to what extent one hypothesis is favored by the evidence over another specific hypothesis. On the likelihood approach, the Law of Likelihood (LL) specifies when evidence favors one hypothesis over another:

(LL) Evidence, E, favors H1 over H2 if and only if p(E/H1) > p(E/H2).
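As a rough illustration of how (LL) and the catchall problem play out numerically, here is a minimal Python sketch. The coin-tossing evidence, the two bias hypotheses, and the weights given to the alternatives are all assumptions made up for the example. None of them comes from Sober's book.

```python
from math import comb

def binomial_likelihood(p_heads: float, heads: int, tosses: int) -> float:
    """p(E/H): probability of the observed count if H says the bias is p_heads."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads)**(tosses - heads)

def favors(lik_h1: float, lik_h2: float) -> bool:
    """(LL): E favors H1 over H2 if and only if p(E/H1) > p(E/H2)."""
    return lik_h1 > lik_h2

def catchall_likelihood(likelihoods, weights):
    """p(E/not-H): average of p(E/Hi), weighted by p(Hi/not-H).

    The weights are exactly what is missing when some alternatives
    have not yet been conceived.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(l * w for l, w in zip(likelihoods, weights))

# Hypothetical evidence E: 7 heads in 10 tosses.
lik_h1 = binomial_likelihood(0.7, 7, 10)   # H1: bias toward heads is 0.7
lik_h2 = binomial_likelihood(0.5, 7, 10)   # H2: the coin is fair
likelihood_ratio = lik_h1 / lik_h2         # degree to which E favors H1 over H2

# A "not-H1" made of two alternatives, with assumed (subjective) weights.
lik_not_h1 = catchall_likelihood(
    [binomial_likelihood(0.5, 7, 10), binomial_likelihood(0.9, 7, 10)],
    [0.8, 0.2],
)
```

The comparison of H1 with H2 needs only the two likelihoods, whereas the value for not-H1 changes with whatever weights one chooses, which is the point at issue.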

Likelihoodism, though, also has its limitations. In particular, it does not allow the assignment of likelihoods to composite hypotheses, i.e. disjunctions of simple hypotheses, since there is no objective basis for determining the likelihood of such hypotheses, as in the case of the catchalls just discussed. Scientists, however, are sometimes interested in comparing composite hypotheses, and Sober argues that in such cases we ought to rely on some form of model-selection theory, such as the Akaike Information Criterion. The Akaike Information Criterion also has the advantage of making sense of why scientists test hypotheses known to be false, since its goal is not to determine which hypothesis is probably true (as with Bayesianism) or which hypothesis the evidence favors as true (as in likelihoodism), but merely to estimate a hypothesis’ predictive accuracy.
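To make the contrast concrete, the sketch below uses the standard formulation AIC = 2k - 2 ln(L), where k is the number of adjustable parameters and L is the maximized likelihood. Lower scores indicate better estimated predictive accuracy. The coin data and the two models are hypothetical, and Sober's own presentation of the criterion may use a different but equivalent scaling.

```python
from math import comb, log

def binomial_log_likelihood(p_heads: float, heads: int, tosses: int) -> float:
    """Natural log of the probability of the data given bias p_heads."""
    return log(comb(tosses, heads) * p_heads**heads * (1 - p_heads)**(tosses - heads))

def aic(max_log_likelihood: float, n_params: int) -> float:
    """Standard Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * max_log_likelihood

heads, tosses = 7, 10  # hypothetical data

# Model A: bias fixed at 0.5, so there are no adjustable parameters.
loglik_fixed = binomial_log_likelihood(0.5, heads, tosses)
aic_fixed = aic(loglik_fixed, n_params=0)

# Model B: bias is a free parameter; its maximum-likelihood estimate is heads/tosses.
loglik_free = binomial_log_likelihood(heads / tosses, heads, tosses)
aic_free = aic(loglik_free, n_params=1)
```

With these numbers the free-parameter model fits the old data better, yet the fixed model receives the lower AIC score, because the penalty for the extra parameter outweighs the gain in fit.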

While Sober argues we ought to be Bayesians if the aforementioned conditions are satisfied, most of the book focuses on the likelihoods of competing hypotheses. This is because he believes the Law of Likelihood holds for both Bayesians and likelihoodists. However, this is not true for all Bayesian accounts (Fitelson 2007). Fitelson shows that there are cases where p(E/H1) > p(E/H2), and yet the evidence clearly favors H2 over H1, because the evidence logically entails H2, but not H1. Sober in fact acknowledges that the counterexample raised by Fitelson would be a problem for the likelihoodist. Sober claims, however, that it is not a problem for his account since he views likelihoodism as simply a fallback position — one should only be a likelihoodist when one cannot be a Bayesian. In the case Fitelson raises, one ought to be a Bayesian, since we can assign probabilities to the hypotheses, given the evidence; in other cases, when we cannot assign probabilities to the hypotheses, we ought to rely on the Law of Likelihood. Fitelson does argue that a weaker version of the Law of Likelihood (WLL) holds for all Bayesians:

(WLL) Evidence, E, favors hypothesis H1 over H2 if p(E/H1) > p(E/H2) and p(E/~H1) ≤ p(E/~H2). (Fitelson 2007, p. 479)

Notice, however, that this is not a biconditional. (Sober mistakenly understands it as a biconditional [see p. 37 fn 18].) What this shows is that for the evidence, E, to favor H1 over H2, it is neither necessary nor sufficient that p(E/H1) > p(E/H2), at least not on all Bayesian accounts of favoring. In other words, the likelihoods of the competing hypotheses might not be relevant for assessing which hypothesis is favored by the evidence. Moreover, if Fitelson’s conclusion is correct, Sober is left in the unusual position of claiming that the Law of Likelihood holds when one cannot be a Bayesian, even though it does not hold in general. I find this position a bit strange: the Law of Likelihood does not hold except in those cases where we do not have the ability to show that it does not hold (see also Fitelson 2007, pp. 482-3). On Sober’s view, it seems that we can only establish that it does not hold when we ought to be Bayesians anyway, i.e., when we can assign probabilities to the negations of the hypotheses we are comparing. If the Law of Likelihood does not hold in cases where we can show that it does not hold, why should we think it holds in cases where this cannot be established? Even if one accepts Sober’s concerns about assigning likelihoods to catchalls, why not simply say that in such cases we do not have adequate information to determine whether the evidence supports one hypothesis over another? Moreover, even if we accept the Law of Likelihood in such cases, it is important to realize when reading the subsequent chapters that Sober holds that p(E/H1) > p(E/H2) is necessary and sufficient for establishing that the evidence supports H1 over H2 only in cases where one cannot be a Bayesian.
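The structure of such a counterexample can be checked by brute force. The Python sketch below assumes a single card drawn from a standard 52-card deck, with E = "the card is a spade", H1 = "the card is the ace of spades", and H2 = "the card is black". This particular setup is offered as an illustration of the kind of case at issue and should not be read as a quotation of Fitelson's text.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "clubs", "hearts", "diamonds"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs

def prob(event, given=lambda card: True) -> Fraction:
    """Conditional probability p(event / given) under a uniform draw."""
    pool = [card for card in DECK if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))

def is_spade(card):            # E
    return card[1] == "spades"

def is_ace_of_spades(card):    # H1
    return card == ("A", "spades")

def is_black(card):            # H2
    return card[1] in ("spades", "clubs")

def negate(hypothesis):
    return lambda card: not hypothesis(card)

# Likelihoods compared by (LL).
lik_h1 = prob(is_spade, given=is_ace_of_spades)
lik_h2 = prob(is_spade, given=is_black)

# Posteriors: E entails H2 but leaves H1 improbable.
post_h1 = prob(is_ace_of_spades, given=is_spade)
post_h2 = prob(is_black, given=is_spade)

# Second conjunct of the (WLL) antecedent: p(E/~H1) <= p(E/~H2).
lik_not_h1 = prob(is_spade, given=negate(is_ace_of_spades))
lik_not_h2 = prob(is_spade, given=negate(is_black))
wll_antecedent = lik_h1 > lik_h2 and lik_not_h1 <= lik_not_h2
```

Here p(E/H1) exceeds p(E/H2), so (LL) would say E favors H1, even though E guarantees H2. The antecedent of (WLL) is not satisfied, so (WLL) is silent about this case.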

This concern aside, the subsequent chapters contain a wealth of insight about the topics being discussed. In Chapter 2, Sober diagnoses what he considers to be the central problem with the Intelligent Design hypothesis — its failure to make any specific predictions. To yield specific predictions, one must rely on auxiliary hypotheses about the intentions and abilities of the designer, and we have no independent grounds for imputing such intentions or abilities. He also points out that the same problem afflicts attempts to disprove the design hypothesis by pointing to maladaptations (e.g., Gould 1980). In the process, Sober develops a notion of testability within the likelihood framework that avoids problems faced by other accounts, such as Popper’s. Therefore, those not interested in the argument for intelligent design might nevertheless find this chapter relevant.

In Chapter 3, Sober discusses how we can test natural selection against random genetic drift, or more precisely natural selection plus drift (SPD) against pure drift (PD). This is an important issue for biologists, who still do not all agree on whether and how it is possible to have evidence that can discriminate between the two. Setting aside concerns about the Law of Likelihood, such as those I raised above, Sober shows that if we focus on the correlations between environmental factors and phenotypic traits of different species (rather than the absolute trait value of a particular species), it is possible to have evidence that discriminates between SPD and PD. While the justification for this might be helpful to biologists and philosophers working in the area, this general way of approaching the test of SPD against PD is not new (see, for example, Millstein’s 2008 discussion of the Great Snail Debate). Moreover, it is not clear that the framework Sober provides here is adequate for answering the concerns biologists have. The question with respect to SPD and PD is not just what counts as evidence for one over the other, but what would count as significant or strong evidence. For this, on Sober’s view, we would need to have information about the likelihood ratio. We need not know what the precise ratio is, but it would not be enough to know that the data are more likely given SPD than given PD; we would need to know that one hypothesis makes the data significantly more likely than the other hypothesis. It is not clear, though, that correlations between phenotypic traits of different species and environmental factors would be able to provide such evidence. The problem is that SPD is a composite hypothesis: it is a disjunction of selective hypotheses, each of which assigns a specific value to the strength of selection. If we are only interested in determining which hypothesis makes the data more probable, then this does not matter. Sober shows that there can be evidence, E, such that p(E/SPD) > p(E/PD), no matter what the strength of selection is, or such that p(E/SPD) < p(E/PD), no matter what the strength of selection is. However, some selective hypotheses will make the likelihood ratio quite small, while others will make it large. While there might be strong evidence to support SPD over PD, establishing that there is strong evidence to support PD over SPD is more problematic, since weak selective pressure might still make it fairly likely that there is no correlation between the phenotypic traits of different species and environmental factors. Millstein argues (in a way that seems compatible with Sober’s approach) that in certain cases biologists have been able to establish that there is strong evidence to support drift over selection, though it is important to note that the drift hypothesis being tested is not PD and the biologists in this case rely on multiple types of data (Millstein 2008).
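The worry about composite hypotheses can be seen in a toy analogy, sketched below in Python. It is not a model of selection or drift. A simple hypothesis (a success probability of exactly 0.5) is compared with several members of a composite hypothesis (success probabilities above 0.5), and the data and member values are invented for the illustration.

```python
from math import exp, log

def log_likelihood_ratio(p_member: float, p_simple: float,
                         successes: int, trials: int) -> float:
    """ln[p(E/member) / p(E/simple)] for binomial data.

    The binomial coefficient is the same under both hypotheses, so it cancels.
    """
    failures = trials - successes
    return (successes * log(p_member / p_simple)
            + failures * log((1 - p_member) / (1 - p_simple)))

successes, trials = 60, 100                   # hypothetical evidence E
p_simple = 0.5                                # stands in for the simple rival
composite_members = [0.51, 0.55, 0.60, 0.65]  # stand in for different "strengths"

ratios = {
    p: exp(log_likelihood_ratio(p, p_simple, successes, trials))
    for p in composite_members
}
weakest = min(ratios.values())
strongest = max(ratios.values())
```

Every member makes the data more probable than the simple rival does, so the direction of favoring is settled without choosing among them. The size of the ratio, however, ranges from close to 1 for the member nearest 0.5 to several times that for others, so the strength of the evidence is not settled.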

I think the real gem of this chapter is Sober’s discussion of the use of molecular data to test SPD against PD. When making predictions about the rate of nucleotide substitutions, the composite nature of SPD does make a difference. As a result, Sober argues we ought to use the Akaike Information Criterion to assess SPD against PD when using such data. This is important because it might affect what conclusions biologists ought to draw using such evidence. Therefore, it might help adjudicate debates within biology, such as the neutralist/selectionist debate.

Chapter 4 takes up the hypothesis of common ancestry — the hypothesis that organisms alive today can be traced to a common ancestor. Sober’s concern is with whether and under what conditions observed similarities provide evidence for the common ancestry hypothesis over the separate ancestry hypothesis. He argues that, given certain substantive assumptions, similarities do provide evidence for the common ancestry hypothesis. Moreover, in this case, Sober is able to draw conclusions about the relative strength of the evidence for different types of similarities. When organisms share deleterious traits, this provides stronger evidence for their having a common ancestor than neutral similarities, and neutral similarities provide stronger evidence than adaptive similarities. In this chapter, Sober also returns to a topic about which he has written extensively — how similarities and differences can provide evidence for specific phylogenetic tree topologies (Sober 1988). He provides a nice overview of the debate between different methods of inference, and makes interesting points about the potential utility of using the Akaike Information Criterion to assess competing trees. All methods of phylogenetic inference must make assumptions about the evolutionary process. Moreover, the models of the evolutionary process used typically involve idealizations, and are therefore false. According to Sober, the advantage of the Akaike Information Criterion is that it would allow biologists to compare trees without deciding in advance which model of the evolutionary process is correct, and even without claiming that any of the models are true. The goal of the Akaike Information Criterion would merely be to estimate the predictive accuracy, not the truth, of tree/model combinations.

Sober has once again provided a carefully argued and stimulating book. It is clearly written, though it is not necessarily an easy read, especially for those unfamiliar with the issues discussed. The effort required, though, is heavily outweighed by the potential insight gained into these topics.


Fitelson, Branden. (2007), “Likelihood, Bayesianism, and Relational Confirmation”, Synthese 156: 473-489.

Gould, Stephen Jay. (1980), The Panda’s Thumb: More Reflections in Natural History, New York: Norton.

Millstein, Roberta L. (2008), “Distinguishing Drift and Selection Empirically: ‘The Great Snail Debate’ of the 1950s”, Journal of the History of Biology 41: 339-367. Available at http://philsci-archive.pitt.edu/archive/00003413/

Sober, Elliott. (1988), Reconstructing the Past: Parsimony, Evolution, and Inference, Cambridge, Mass.: MIT Press.