The Experimental Side of Modeling

Over the last century, philosophy of science has found itself drawn begrudgingly yet inexorably from the august heights of abstract scientific theory, with its seeming susceptibility to crystalline logical reconstruction, to the complex shambles of day-to-day scientific practice, and its miasma of the pragmatic and perspectival. This trajectory has been inevitable, as it is in the laboratory and the field that abstractions of theory are concretized into contact with the world, and the limits of predictive and explanatory adequacy are probed. Modeling, a practice that not only tests theory, but also informs policy, medical treatment, and technology, has gradually emerged in recent years as a major topic in this area. Models have been identified as "mediating instruments" between theory and world, essential for sophisticated measurement, and even, perhaps, themselves sources of empirical data. The Experimental Side of Modeling (ESM) is a snapshot of the current state of play, focussing on the close integration and mutual dependency of modeling practice and experimentation, especially the role of models in the analysis of data, the construction of explanations, and the discovery of new phenomena.

ESM comprises an introduction by the editors, nine standalone papers, and a tenth "symposium" contribution by Paul Teller, with its own introduction, response by Bas C. van Fraassen, and rebuttal. A remarkable achievement of this collection, one rarely found in such anthologies, is the perfect balance it strikes between unity and diversity. The contributions are diverse insofar as they draw on case studies from a wide range of scientific disciplines, spanning the gamut of modeling practice: high energy physics (Deborah Mayo), astronomy (Ronald N. Giere), fluid dynamics (Isabelle F. Peschard), ecology (Michael Weisberg), cognitive science (Anthony Chemero), climate science (Eric Winsberg), synthetic biology (Tarja Knuuttila and Andrea Loettgers), and the social sciences (Nancy D. Cartwright). Nevertheless, there is a clear and striking unity to the problems they face, each arising from the subtle epistemic feedback between modeling and experimentation. After a quick survey of each chapter, I'll briefly highlight two missed opportunities before reflecting on the future prospects for this line of research.

The introductory chapter by Peschard and van Fraassen is a substantive contribution in its own right, organized into three parts: first, a summary of the arguments of all chapters, explicitly highlighting areas of thematic complementarity, overlap, and (dis)agreement; second, a brief historical survey of the trends in philosophy of science leading to the present state of the modeling literature; and third, a close analysis of the role of data models in hypothesis formation through case studies in physics and fluid dynamics. The historical survey in particular is highly recommended, as it is swift and high-level enough not to mire experienced readers, yet contains insights and clarifications that helpfully correct oft-repeated inaccuracies in our consensus Whiggish history.[1] One example is the common conflation of Patrick Suppes' project to formalize scientific practice in the language of set theory with the "semantic approach" that emerged in its wake -- the editors nicely tease apart these two projects and articulate exactly where they diverge. This chapter creates added value for the collection as a whole, rendering it appropriate to serve as a high-level introduction to the state of the art in philosophy of modeling, suitable for advanced graduate students or philosophers in other areas seeking to engage this literature.

Peschard and van Fraassen conclude by highlighting two key ideas that play a role, either explicitly or implicitly, throughout the collection: model of experiment and data model. The first they trace to Duhem, who argued that experimenters must employ a "schematic" model of an experimental setup in order to interpret it, since the mathematics of physical theory may only be applied directly to such a model, and not to concrete reality. Suppes elaborated this basic idea in 1962, arguing for a hierarchy of models, including not only the model of the experiment, but also a model of the data. This model interprets and organizes the raw output from an experiment into a "canonical form" that may be compared with the predictions of theory.

Suppes' short paper is suggestive, but enigmatic, and the case studies in the introduction do much to explicate what exactly the analysis of an experiment through a hierarchy of models might look like. In doing so, Peschard and van Fraassen introduce a central theme of ESM, that data is not "a passive element . . . [but rather] create[s] the prospect for articulation of both the theoretical model and the model of the experiment" (52). Elaboration of Suppes' hierarchy continues in the chapter by Giere, which argues that the model of experiment must sit at its base, informing a data model, which in turn establishes a comparison with a "representational model" of the predictions of abstract theory.

Giere concludes by pointing out that the large data sets generated by astronomers are so informationally rich that they may themselves serve as representational models. This possibility is also implicit in Mayo's contribution, which looks at the role of significance tests in the "discovery" of the Higgs boson. She begins with the puzzle: why do particle physicists enforce such a strict criterion for statistical significance? The Higgs search demanded a "5 sigma" effect, i.e., that the data fall 5 standard deviations from the value expected under the null hypothesis -- a criterion that seems absurd from a Bayesian perspective, given that the majority of physicists already believed in the correctness of Higgs' hypothesis. One aspect of the answer is that particle physics is a data-rich science, concerned especially with probing the possibility of physics beyond the Standard Model. Large data sets combined with a high standard of evidence allow supercollider experiments to identify and probe "bumps" in the data that may indicate new particles, and thus new physics.
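
To give a rough sense of how strict the "5 sigma" criterion is, the following sketch converts a sigma level into a tail probability. It is an illustration only, not drawn from Mayo's chapter: it assumes a Gaussian test statistic and the one-sided convention, and the function name is hypothetical.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Probability that a standard normal variable exceeds n_sigma.

    Assumption: the test statistic is Gaussian under the null hypothesis
    and only upward fluctuations count as evidence (one-sided test).
    """
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


# Compare a few conventional thresholds with the 5 sigma criterion.
for n in (2, 3, 5):
    print(f"{n} sigma -> one-sided p = {one_sided_p_value(n):.3g}")
```

Under these assumptions a 5 sigma effect corresponds to a chance of roughly one in a few million that background fluctuation alone would produce data at least this extreme, which is why the criterion looks so demanding beside the thresholds common in other fields.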

Chemero illustrates a role for mathematical models in hypothesis formation, detailing how the mathematical discovery that systems dominated by the interactions between their components (rather than those components' intrinsic dynamics) exhibit 1/f noise motivates a new experimental paradigm. Noise may be categorized by how its power is distributed across frequency bands, with 1/f, or "pink," noise exhibiting power density inversely proportional to frequency, thereby falling between "white" (equal density over all frequencies) and "brown" (heavily weighted toward the low end of the spectrum) noise. Traditionally, psychology experiments aim to minimize all noise, relegating it to error bars or p-values in the data model. In contrast, Chemero and colleagues modeled the noise itself, i.e., "random" fluctuations, in subjects' hand movements during a computer game-playing experiment. This allowed them to test for 1/f (vs. brown or white) noise, and thereby confirm that game-playing behavior is produced by an interaction-dominant system.
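
The distinction between noise "colors" can be made concrete with a small sketch. This is not the analysis Chemero and colleagues performed; it is a minimal illustration, assuming that noise color is read off the slope of log power against log frequency (about 0 for white, -1 for pink, -2 for brown). All function names, the cutoff values, and the synthetic signals are hypothetical.

```python
import cmath
import math
import random


def spectral_slope(signal, max_fraction=0.25):
    """Least-squares slope of log power versus log frequency.

    Only the lowest `max_fraction` of the frequencies below Nyquist is
    used, where a power-law description is cleanest.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    k_max = max(2, int((n // 2) * max_fraction))
    log_f, log_p = [], []
    for k in range(1, k_max + 1):
        # Naive discrete Fourier transform at frequency index k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2 / n
        if power > 0.0:
            log_f.append(math.log(k / n))
            log_p.append(math.log(power))
    mean_f = sum(log_f) / len(log_f)
    mean_p = sum(log_p) / len(log_p)
    num = sum((f - mean_f) * (p - mean_p) for f, p in zip(log_f, log_p))
    den = sum((f - mean_f) ** 2 for f in log_f)
    return num / den


def classify_noise(slope):
    """Assign a color by the nearest idealized slope (0, -1, or -2)."""
    if slope > -0.5:
        return "white"
    if slope > -1.5:
        return "pink (1/f)"
    return "brown"


# Synthetic examples: white noise, and brown noise as its running sum.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(2048)]
brown, total = [], 0.0
for step in white:
    total += step
    brown.append(total)

white_slope = spectral_slope(white)
brown_slope = spectral_slope(brown)
print("white:", round(white_slope, 2), classify_noise(white_slope))
print("brown:", round(brown_slope, 2), classify_noise(brown_slope))
```

A real study would need a more careful spectral estimate than this raw periodogram fit, but the sketch shows the basic idea: the fluctuations themselves become data with a measurable signature, rather than a residue to be minimized.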

Knuuttila and Loettgers explore the implications of "hybrid" models that emerge from the interplay between model and experiment. In synthetic biology, DNA and proteins are artificially combined to produce simple model systems -- while they are constructed, and in that sense artificial, they nevertheless are made from the same material as in vivo systems of interest. This positions these synthetic systems as "mediat[ors] between mathematical modeling and experimentation" (118). Their case study on the construction of synthetic circuits for circadian rhythms illustrates how synthetic models may reveal the relevance of factors in a mechanism (in this case, a time delay in negative feedback) not included in the original mathematical model, thereby fueling developments in both theory and experiment.

Joseph Rouse analyzes the emergence of conceptual understanding from this kind of empirical-theoretical loop. He argues that, while it is correct that models mediate between theory and world, the relationship between model and world is further mediated by publicly accessible phenomena. This double mediation implies two distinct standards of normativity for scientific concepts: application to the correct domain of phenomena, and application to that domain in the correct way. Attempts to validate concepts in terms of empirical success (a la Ian Hacking and Cartwright) conflate these two standards; instead, we must recognize that pattern recognition itself is inherently normative, and it is this norm that permits concepts to extend beyond the limits of our ability to model them directly, as we defeasibly identify and extrapolate from the patterns perceived in publicly available phenomena. Since model-driven experimentation generates new phenomena, the sciences expand the range of perceivable patterns, and thus of conceptual understanding itself.

Jenann Ismael's contribution examines how the focus on scientific practice should inform our understanding of modality, law, and causation. Scientific practice often involves the construction of mechanistic models by means of structural equations; these mechanisms satisfy the Pearl-Woodward model of causality as counterfactual susceptibility to change through intervention. Ismael draws attention to the fact that these models are "strictly logically stronger than . . . global laws," as they explicate "counterfactual dependence [even when] there are interventions whose antecedents are not nomologically possible" (173). She argues we should invert the typical picture, on which these local interactions are constrained by global laws, and replace it with one on which local mechanisms are metaphysically primary, grounding natural necessity, and the appearance of global laws a mere "overflow," without metaphysical significance.

Winsberg challenges Richard Jeffrey's argument that scientists can avoid the value judgments inherent in any methodological choice by reporting probabilities over hypotheses rather than choosing between them, demonstrating how this argument breaks down in the case of climate models. The complexity, historical contingency, and distributed labor involved in their construction imply that many methodological choices are "ossified" into climate models; while each such choice involved a value judgment (concerning, e.g., inductive risk), these judgments cannot be isolated or recovered, and thus cannot be "factored out" through the assignment of probabilities. The upshot is that climate models are inherently value laden, yet no practical recommendation to climate science could change that: value-laden models are as good as they get.

Weisberg tackles the difficult problem of model validation: if models are inherently incomplete and idealized, how do we assess their success at representing a target system? Weisberg adapts Amos Tversky's account of similarity, on which the similarity between two entities (in this case, model and target) is a weighted function of both the features they share and those on which they diverge. In different explanatory contexts, modelers employ different weightings to validate their models; for instance, when constructing a "how possibly" explanation, a modeler may assign little relevance to features on which model and target diverge, yet when searching for a minimal account of the target's causal structure, it may be important to assign a large weight to divergent features. Weisberg argues that models that pass both this validation procedure and robustness analysis can justify conclusions about their real world targets.
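
The role of the weights can be illustrated with a minimal sketch of a Tversky-style contrast measure. This is an assumption-laden simplification rather than Weisberg's own formulation: features are treated as unordered labels, each feature counts equally, and the feature names, parameter names, and weight values are all hypothetical.

```python
def contrast_similarity(model, target, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky-style contrast score between two feature sets.

    theta weights shared features; alpha and beta weight features found
    only in the model or only in the target, respectively.
    """
    shared = model & target
    model_only = model - target
    target_only = target - model
    return (theta * len(shared)
            - alpha * len(model_only)
            - beta * len(target_only))


# Hypothetical feature sets for a predator-prey model and its target.
model = {"predation", "prey reproduction", "unlimited prey food"}
target = {"predation", "prey reproduction",
          "seasonal variation", "spatial patchiness"}

# "How possibly" context: divergent features matter little.
lenient = contrast_similarity(model, target, alpha=0.1, beta=0.1)

# Context demanding close fit: divergent features weigh heavily.
strict = contrast_similarity(model, target, alpha=1.0, beta=1.0)

print("lenient weighting:", lenient)
print("strict weighting:", strict)
```

The same model and target receive a high score under the lenient weighting and a low one under the strict weighting, which is the sense in which validation depends on explanatory context.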

The volume concludes with a "symposium on measurement" centered around Teller's attack on "traditional measurement accuracy realism": the view that there is a true value in nature for quantities we attempt to measure. On this view, it may be (epistemically) that we can't directly access the quantity we aim to measure, yet we nevertheless assume such a quantity to exist. Teller argues that this view is mistaken: we are limited to describing quantities in nature by current theory, yet current theory is highly idealized, and fails to account for the full complexity of the world. Since this complexity outstrips our current theoretical resources, our quantity terms fail to refer, and thus do not attach to the world. Teller is careful to emphasize that his point is not metaphysical: his claim is not that there are no quantities in the world causally influencing measurement outcomes, but rather that we cannot assess, or even describe, the fit between a measured value and any specific such quantity. In his commentary, van Fraassen pushes Teller on this point, and argues for the further conclusion that any metaphysical gloss on the practice of evaluating measurement is "irrelevant" (303) -- that there really is no more to measurement practice than a purely theory-internal process of cross-checking models for mutual coherence.

The symposium on Teller's paper strikes me as one of two missed opportunities for this volume. While Teller's paper is conceptually rich, and makes some important contributions -- connecting and contrasting problems of accuracy with general issues about vagueness, for instance -- it also has some puzzling features. Foremost is its seemingly heavy reliance on a descriptive theory of reference: Teller repeatedly emphasizes that reference failure is the problem with traditional realism, and this reference failure occurs due to the idealization of our theories, preventing them from describing the world in all its complexity. Yet the most flat-footed post-Kuhnian realism, for instance that of Hilary Putnam (1976), endorses a causal theory of reference, on which successful reference turns on an initial act of baptism, not on correct description. I can't see how Teller's critique will apply to this form of realism, yet if his argument does not target even the most naive realism of the last 30 years, one wonders who its target is. The symposium format could have addressed this issue in a constructive way by inviting commentary from realists and allowing Teller to respond.[2] Instead, van Fraassen's commentary comes from a perspective relatively close to Teller's own, broadly on the coherentist/pragmatist end of the spectrum. In fact, neither Teller nor van Fraassen cites or discusses any representative realists at all, referencing only themselves, Hermann Weyl, Giere, and the recent measurement coherentism of Eran Tal. Consequently, what could have been a constructive dialog between radically opposed perspectives on measurement is relegated instead to internecine dispute.

The second missed opportunity of the collection is in Cartwright's contribution. Cartwright argues that randomized controlled trials (RCTs) should not be treated as the gold standard of evidence for informing policy. Rather, effective assessment of whether a policy intervention will be successful requires construction of a causal model. Strikingly, in contrast to all other contributors, Cartwright fails to engage the subtlety of interplay between modeling and experimentation. Instead, directly contradicting the spirit of ESM, she explicitly establishes a dialectic of "experiments versus models" (161).

This is especially unfortunate, as embracing the more ecumenical perspective of the rest of the volume would have allowed her to make a much more nuanced, and hence more philosophically interesting, point. Her claims that RCTs should not be treated as the sole evidence for policy, and that causal models can inform policy in ways that RCTs cannot, are surely correct. But the conceptually difficult question is: where do causal models come from? Cartwright's worked example, illustrating why an intervention to reduce classroom size in California failed to improve learning outcomes, is entirely unconvincing from the perspective of policy choice, because it relies on ex post facto analysis of an intervention failure. To inform policy, one needs to know which factors are causally relevant to a situation before the intervention is made. Cartwright acknowledges that "we do not . . . have . . . accounts of how this is done," yet asserts we are nevertheless "very often in a good position to see that a factor will be necessary" (160-1). Stubbornly, she downplays the obvious, that this is the very appeal of RCTs: they provide a principled means to determine whether a factor is relevant to some outcome. Rather than defend a binary opposition between RCTs and causal models, leaving her modeling-only view seemingly reliant on sheer good luck for its success, as she does, Cartwright could have explored possibilities for constructive feedback between RCTs and causal modeling, recognizing that RCTs inform causal models, yet advocating for an increased role for models in setting targets for new experiments (a point made in the introduction by the editors themselves).

Cartwright's intransigency is an extreme response to a predicament facing all philosophers who engage the messiness of real scientific practice: this practice is so complex that it resists summary in concise distinctions or universal morals. While the accepted standards of philosophical argumentation demand that papers be framed in binary terms, resolving a well-defined problem, refuting an incorrect view, or advancing a novel one, the kinds of conclusion legitimated by careful case studies rarely fit this mold. Consequently, many of the attempts in ESM to draw grand, definitive conclusions from isolated examples fall somewhat flat: Chemero portrays his discussion as refuting the experimental mandate to maximize signal over noise, yet grudgingly admits this mandate still holds in a more abstract way, as noise has simply become a second-order signal (88); Mayo tries to extrapolate a general defense of frequentist over Bayesian methods (210-11), yet her move problematically ignores the exceptional data-rich character of particle physics; Weisberg tries to position his account of similarity as an alternative to isomorphism (248), yet smuggles isomorphism back into the analysis under the guise of a "map" from target to mathematical representation (257). The point is not that any of these studies fail to provide insight -- they are all illuminating! -- but that they oversell, and thereby also distort, their contributions by forcing them into a binary us-vs-them dialectic.

A notable exception to this trend is the contribution of Knuuttila and Loettgers, which introduces a new type of experimental model, not to replace or reject prior accounts of modeling as a form of experimentation, but to tease apart several distinct ways models may be experiment-like. If the philosophical analysis of case studies from scientific practice is to progress constructively, it will be this kind of conciliatory refinement of previous work, and the delicate drawing of ever more nuanced distinctions, rather than philosophy's typical adherence to dialectical extremes, that moves it forward. Whether or not such a project is ultimately compatible with analytic philosophy's traditional mandate to aggressively defend positions of stark opposition is yet to be seen.

ACKNOWLEDGEMENT

Alistair Isaac is supported by a grant from the Alexander von Humboldt Foundation.

REFERENCES

Hesse, M. (1953) "Models in Physics," British Journal for the Philosophy of Science, 4(15): 198-214.

Hesse, M. (1961) Forces and Fields: The Concept of Action at a Distance in the History of Physics. Thomas Nelson & Sons.

Putnam, H. (1976) "What is 'Realism'?" Proceedings of the Aristotelian Society, 76: 177-194.

[1] A notable omission here is any mention of Mary Hesse: whether or not she influenced the gross trajectory of the philosophy of modeling, she was the first to draw attention to the topic (1953), and researchers today would gain much from revisiting her work, especially her extended case study Forces and Fields (1961).

[2] Some suggested interlocutors: arch-realist Stathis Psillos, well-known measurement realist Joel Michell, or even Weisberg, whose own account of idealization, discussed elsewhere in ESM, has a decidedly realist flavor.