Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science


Deborah G. Mayo and Aris Spanos (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge UP, 2010, 419pp., $65.00 (hbk), ISBN 9780521880084.

Reviewed by Adam La Caze, University of Queensland, Brisbane

2010.07.29


Deborah Mayo’s view of science is that learning occurs by severely testing specific hypotheses. Mayo expounded this thesis in her (1996) Error and the Growth of Experimental Knowledge (EGEK). This volume consists of a series of exchanges between Mayo and distinguished philosophers representing competing views of the philosophy of science. The tone of the exchanges is lively, edifying and enjoyable. Mayo’s error-statistical philosophy of science is critiqued in the light of positions which place more emphasis on large-scale theories. The result clarifies Mayo’s account and highlights her contribution to the philosophy of science — in particular, her contribution to the philosophy of those sciences that rely heavily on statistical analysis. The second half of the volume considers the application (or extension) of an error-statistical philosophy of science to theory testing in economics, causal modelling and legal epistemology. The volume also includes a contribution to the frequentist philosophy of statistics written by Mayo in collaboration with Sir David Cox.

Mayo and Spanos frame the volume as an attempt to bring key debates in the philosophy of science closer to the methodological problems faced by scientists. This is not a goal that can be easily met or, for that matter, assessed. Nevertheless, the volume makes a significant contribution to bridging the distance between the kind of experimental learning that is common in many sciences and a number of key debates in the philosophy of science. In doing so, Mayo and Spanos articulate avenues for research on theory appraisal in a philosophy of science focused on experimental learning. The volume is also set up as a teaching resource for courses “that blend philosophy of science and methodology” (p. 12). Each of the exchanges is used to illuminate key questions in experimental reasoning, inference and the objectives of science. The tension between providing a resource for teaching and contributing to a specific area of philosophy is navigated well in the exchanges on the role of large-scale theory in philosophy of science. The contributions articulate a range of views while also serving as a framework for Mayo to explicate the error-statistical view of theoretical knowledge. In other areas the focus is on error-statistical philosophy rather than a comprehensive exposition of rivals (such as Bayesian philosophy of science and alternative approaches to statistical inference). This targets the volume towards students with a background in philosophy of science or frequentist statistics.

An exchange on Bayesian philosophy of science or Bayesian statistics would have been a welcome addition and would have served the dual goals of the volume. Bayesian philosophy of science and Bayesian statistics are a constant foil to Mayo’s work, but neither approach is given much of a voice. An exchange on Bayesian philosophy of science is made all the more relevant by the strength of Mayo’s challenge to a Bayesian account of theory appraisal. A virtue of the error-statistical account is its ability to capture the kind of detailed arguments that scientists make about data and the methods they employ to arrive at reliable inferences. Mayo clearly thinks that Bayesians are unable to supplement their view with any sort of prospective account of such methods. This seems contrary to practice, where scientists make similar methodological arguments whether they utilise frequentist or Bayesian approaches to statistical inference. Indeed, Bayesian approaches to study design and statistical inference play a significant (and increasing) role in many sciences, often alongside frequentist approaches (clinical drug development provides a prominent example). It would have been interesting to see what, if any, common ground could be reached between these approaches to the philosophy of science (even if very little common ground seems possible in terms of their competing approaches to statistical inference).

Mayo’s claim to be able to address the methodological problems of scientists relies partly on her advocacy and central use of the frequentist statistical approaches familiar to most scientists (Mayo’s error-statistical philosophy) and partly on the ability of Mayo’s account to respond to some of the philosophical problems articulated by Popper and Kuhn (Mayo’s error-statistical philosophy of science). While many scientists are aware of the problem of induction and Duhem-type underdetermination, fewer are familiar with the various responses suggested in more recent philosophy of science. By spelling out the piecemeal tests that form experimental inquiry and by providing an opportunity to test key assumptions that underpin the experimental and statistical models, Mayo provides an account that warrants inferences that go beyond the data at hand and a framework that helps to distinguish between a failure of the theory and a failure of one of the models involved in the experimental inquiry.

Mayo and Spanos begin the volume by providing a brief introduction to the error-statistical philosophy. Readers unfamiliar with Mayo’s account are likely to benefit from first reading key sections of EGEK. The error-statistical approach focuses on local tests and ruling out canonical errors. A key aspect of the view is that experiments can provide reliable data without assuming a large-scale theory. Data and experiment, in the parlance of the “new experimentalists”, can have “a life of their own”. The first half of the volume focuses on two questions that arise from EGEK: What are the implications of the error-statistical philosophy for large-scale theories (and theoretical knowledge more generally)? And what is the implication of the error-statistical philosophy for a philosophy of science that focuses on (or assumes the primacy of) large-scale theory? It is here that the chosen format of the book is most successful. Each of the protagonists in this half of the volume (Alan Chalmers, Alan Musgrave, John Worrall and Peter Achinstein) has a stake in the debate and each puts forth his position with verve. In addition to seeing how the error-statistical philosophy shapes up on the ‘life of theory’, the reader has the opportunity to contrast Mayo’s account against comparativism, critical rationalism, explanationism and Mill’s inductivism.

In Chapter 1, Mayo provides her position on theoretical knowledge within the error-statistical philosophy and in doing so presages her reply to many of the issues raised in the following chapters. Chalmers (Chapter 2) and Musgrave (Chapter 3) are concerned to save high-level theories (or, at least, the acceptance of high-level theories). Both argue for a standard of theory acceptance less stringent than severity. Chalmers suggests the argument from coincidence as the appropriate standard. Musgrave, in a wide-ranging chapter, argues that Mayo should give up her justificationist and inductive ambitions and become a critical rationalist. Mayo’s reply helps to distinguish the view of theory within an error-statistical philosophy from that of a theory-focused philosophy of science. Rather than needing an account that accepts the entirety of a large-scale theory, the error-statistical approach has the benefit of articulating which parts of a theory should be accepted: those parts that have been severely tested.

Worrall (Chapter 4) also discusses theory appraisal, focussing on the use-novelty criterion. “Use-novelty” or the “no-double-use rule” aims to block inappropriate attempts to fit a theory to evidence. The problem with the use-novelty criterion as it is typically stated, however, is that, in addition to blocking the targeted inferences, it also rules out some legitimate inferences. While Worrall attempts to clarify use-novelty so that it identifies only the right sort of inference as faulty, Mayo jettisons the use-novelty criterion, arguing that severity judgements provide the correct diagnosis. An interesting aspect of this exchange is metaphilosophical. At least part of the disagreement between Worrall and Mayo on use-novelty hinges on the use (and abuse) of philosophical counter-examples. Mayo comments on philosophical method in a number of the exchanges. The comments are a useful addition to the exchanges and are likely to be helpful items for discussion when the book is used for teaching purposes.

Achinstein (Chapter 5) argues that John Stuart Mill’s views on induction are unfairly dismissed by error-statistical philosophers. Achinstein also puts forward a case for the probabilist, and perhaps the objective Bayesian, but his primary focus is Mill. Mayo’s reply largely focuses on Achinstein’s empirical probabilism. She (quite reasonably) wants the details: How does Achinstein arrive at his objective epistemic posterior probabilities? And how do those probabilities ensure reliable inference? This reply to Achinstein encapsulates Mayo’s challenge to Bayesian philosophers of science. While the kind of detailed methodological argument provided by error-statistical philosophers has not been a focus of the Bayesian philosophical literature, there does not appear to be any prima facie reason that Bayesians could not supplement their account to answer (at least some of) the questions Mayo raises (for a survey of the strengths and limitations of Bayesian epistemology, see Hájek and Hartmann 2010).
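
For readers using the book in teaching, a toy calculation may help fix what is at stake in Mayo’s request for details. The sketch below simply applies Bayes’ theorem with a flat prior and made-up likelihoods; it is not Achinstein’s own procedure, and the numbers are hypothetical. The error-statistical worry is precisely where such inputs come from and whether the resulting number tracks reliability.

```python
# Minimal sketch: how a posterior probability for a simple hypothesis might be
# computed with Bayes' theorem. The prior, likelihoods, and "evidence" are
# hypothetical and are not drawn from Achinstein's account.

def posterior(prior_h: float, likelihood_h: float, likelihood_not_h: float) -> float:
    """P(H | e) via Bayes' theorem for a hypothesis H and evidence e."""
    numerator = prior_h * likelihood_h
    marginal = numerator + (1 - prior_h) * likelihood_not_h
    return numerator / marginal

# A flat ("objective") prior of 0.5, with the evidence three times more
# probable under H than under not-H:
print(posterior(prior_h=0.5, likelihood_h=0.6, likelihood_not_h=0.2))  # 0.75
```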

Topics discussed in the second half of the volume are more varied than those in the first. Chapters consider the application of the error-statistical philosophy to econometrics and legal epistemology, the intersection of error-statistical philosophy and causal graphical models, and the foundations of frequentist inference. For teaching purposes Mayo and Spanos suggest selecting topics according to the focus of the course. Spanos’ chapter on economic modelling (Chapter 6) is a good introduction to the philosophy of economics and empirical modelling. In addition, he provides a solid argument for the benefits of the error-statistical approach in what has historically been a theory-dominated field. Larry Laudan (Chapter 9) argues for closer attention to an epistemology of error in law. Laudan maps how the burden of proof and standards of evidence shift for affirmative defences in criminal law. It certainly appears that something goes awry in the standard legal approach to affirmative defences, and to the extent this is correct, error-statistical philosophy has something to offer. In response to Laudan, Mayo sketches some of the ways that standards of evidence could be clarified in legal contexts.

Clark Glymour (Chapter 8) explores the links between explanation, testing and truth. Specifically, Glymour claims that causal modelling provides an opportunity to severely test causal explanations. He provides details on procedures for selecting causal models from the set of possible causal relations and describes tests that can be conducted to examine the extent to which the assumptions underpinning the causal graphical model can be verified. Glymour’s chapter is challenging, but rewards close attention. Work on causal modelling provides an attractive avenue of research for those areas of science that rely heavily on hypothesis testing and estimation but continue to eschew causal interpretations.
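
To give a concrete sense of the kind of checks at issue, the following sketch runs a conditional independence test of the sort that constraint-based causal discovery methods rely on, using data simulated from an assumed chain X -> Z -> Y and a partial-correlation test with Fisher’s z-transform. It illustrates the general idea only and does not reproduce Glymour’s procedures; the simulated data and the simple regression-based test are assumptions made for the example.

```python
# Minimal sketch of a statistical check used in constraint-based causal
# discovery: a test of conditional independence X _||_ Y | Z via partial
# correlation and Fisher's z-transform. Data are simulated from an assumed
# chain X -> Z -> Y; none of this reproduces Glymour's own procedures.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)   # Z depends on X
y = 0.8 * z + rng.normal(size=n)   # Y depends on Z only

def partial_corr(a, b, c):
    """Correlation of a and b after regressing each on c (linearly)."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

def ci_test_pvalue(a, b, c):
    """Two-sided p-value for H0: a _||_ b given c, via Fisher's z."""
    r = partial_corr(a, b, c)
    z_stat = np.sqrt(len(a) - 1 - 3) * np.arctanh(r)  # one conditioning variable
    return 2 * stats.norm.sf(abs(z_stat))

# X and Y are marginally correlated, but conditioning on Z should render them
# (approximately) independent if the chain X -> Z -> Y is right.
print("p-value for X _||_ Y | Z:", ci_test_pvalue(x, y, z))
```

Constraint-based approaches such as the PC algorithm chain many tests of this kind together to select a graph, which is one place where error-statistical concerns about the reliability of individual tests naturally arise.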

Chapter 7 will be of particular interest to philosophers of statistics. “New perspectives on (some old) problems of frequentist statistics” comprises three parts, the first two a collaboration between Cox and Mayo and the third by Mayo alone. Parts I and II elucidate a philosophy of frequentist inference for the error-statistical approach. Cox and Mayo argue for an interpretation of error-statistical tests along the lines of Fisherian p-values (as opposed to Neyman-Pearson long-run error rates). This permits a post-data inductive interpretation of error-statistical tests and avoids some of the counterexamples that arise against the strictly pre-data perspective of Neyman-Pearson hypothesis tests. The main contribution of this chapter is that it provides a clear, accessible and comprehensive account of frequentist inference from the error-statistical perspective. Given the central role played by error-statistical tests in Mayo’s account, and the high level of contention and confusion regarding the interpretation of frequentist tests in the statistical and philosophical literature, the clarity this chapter provides is significant.
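
A small numerical sketch may make the post-data contrast concrete. The example below uses made-up numbers for a one-sided Normal test with known variance: it computes a Fisherian p-value and then a post-data severity assessment, in the spirit of Mayo’s severity reasoning, for claims of the form mu > mu1. It illustrates the general idea rather than reproducing Mayo and Cox’s own examples.

```python
# Minimal sketch, with made-up numbers, of post-data error-statistical
# reasoning: a Fisherian p-value for a one-sided Normal test, plus a severity
# assessment for the inference mu > mu1. Illustrative only; not Mayo and Cox's
# own examples.
from scipy.stats import norm

mu0, sigma, n = 0.0, 1.0, 25          # H0: mu <= 0, known sigma, sample size
xbar = 0.4                            # hypothetical observed sample mean
se = sigma / n ** 0.5

# Fisherian p-value: probability of a sample mean at least this large if mu = mu0.
p_value = norm.sf((xbar - mu0) / se)

# Post-data severity for the claim mu > mu1: probability of a sample mean no
# larger than the one observed, were mu only mu1.
def severity(mu1: float) -> float:
    return norm.cdf((xbar - mu1) / se)

print(f"p-value = {p_value:.3f}")     # ~0.023
for mu1 in (0.0, 0.2, 0.4):
    print(f"SEV(mu > {mu1}) = {severity(mu1):.3f}")
```

Read this way, the same test statistic supports a graded, post-data appraisal of which discrepancies from the null have and have not been severely probed, rather than only a pre-data accept/reject rule.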

The account provided by Cox and Mayo is, of course, still frequentist. The primary argument Cox and Mayo provide for adopting the frequentist approach is the standard one encountered in the literature: frequentist inference is objective, and Bayesian approaches, especially subjective Bayesian approaches, are incompatible with the required objectivity of science, where “objective” is defined in opposition to subjective probabilities (p. 276). Responses to this argument are just as familiar. The error-statistical probabilities, while not subjective in interpretation, are influenced by how the investigator chooses to set up and analyse the experiment. Surely the issue at stake is not that the investigator has made assumptions, but whether those assumptions are explicit and justifiable. A second argument for error-statistics is presented that may be more compelling for those unpersuaded by objectivity/subjectivity arguments: the argument from the virtues of the error-statistical philosophy of science. The error-statistical philosophy of science provides a detailed account of how inferences are justified, an account that both relies on and underpins error-statistical tests. This argument moves the focus away from a single statistical test to the broader account of inference. Mayo and Cox write:

There is no suggestion whatever that the significance test would typically be the only analysis reported. In fact, a fundamental tenet of the conception of inductive learning most at home with the frequentist philosophy is that inductive inference requires building up incisive arguments and inferences by putting together several different piecemeal results (p. 254).

Error-statistical philosophy might not be the only viable account of evidence and warrant (as Mayo would appear to have it), but it has many virtues. The strength of the account hinges on the response it provides to some long-standing problems in philosophy of science while accurately capturing the way scientists learn. Error and Inference makes an important contribution to error-statistical philosophy and the new experimentalist program.

References

Mayo, Deborah G. 1996. Error and the Growth of Experimental Knowledge. Science and Its Conceptual Foundations. Chicago: University of Chicago Press.

Hájek, Alan, and Stephan Hartmann. 2010. “Bayesian Epistemology.” In A Companion to Epistemology, 2nd ed., edited by J. Dancy, E. Sosa, and M. Steup, 93-107. West Sussex: Wiley-Blackwell.