Enactivist Interventions: Rethinking the Mind

Enactivism is one of the central themes in current philosophy of cognitive science, and Shaun Gallagher is among the leading proponents of the approach. These reasons alone would be sufficient for this book to qualify as required reading for anyone wanting to stay current with the subfield. The book provides an excellent and easy-to-read introduction to core issues and overview of the central debates, and it provides some fascinating applications of the framework. And I'm not just saying that -- I've already recommended it to a number of people. I do have a few gripes, but I'll get to those.

After an introductory chapter, Chapter 2 takes on the task of clarifying what enactivism is. One key point is that enactivism appeals to the actual body and bodily engagement, not just representations of the body (so-called b-formatted representations). The second point concerns the relation between enactivism and extended mind approaches. While both appeal to the body and environment, enactivism maintains that the organic details of the active body play an ineliminable explanatory role in cognition, whereas (according to Gallagher) extended mind approaches see the body and environmental entities as just more potential implementors of functional components of whatever functional system composes a mind. Overall, the chapter is a useful roadmap for tracking the similarities and differences between enactivism and other 'E' approaches (embedded, embodied, extended).

Chapter 3 looks at affinities between pragmatism and enactive perception. One major strand Gallagher characterizes by invoking Husserl, who "had developed the idea of the 'I can' as part of the structure of embodied perception. On his view, I perceive things in my environment in terms of what I can do with them" (p. 49). This is a deeply important insight, though not one limited to enactivism. The discussion of pragmatism segues to Chapter 4's discussion of our understanding of ourselves and others as intentionally engaged with the world:

Suppose you are driving a car . . . and see a person at the edge of the street restlessly looking left and right. You slow down a little in case he runs onto the street . . . If [asked] why you slowed down, you might answer that the person looked like he wanted to cross the road. In this reflective explanation it seems as if the person had been experienced in terms of his mental states, i.e., his desire to cross the road, which constitutes a reason for a further action of crossing the road. This, however, is a way of putting it that is motivated in the reflective attitude or the subsequent giving of reasons. In fact, in the original action, placing your foot on the brake pedal just is part of what it means to experience the intentionality of the person at the edge of the road. (p. 79)

On [the] enactivist-interactionist view, intersubjective interaction is not about mindreading the mental states of others; it involves . . . directly perceiving their intentions and emotions . . . (p. 156)

The first quotation highlights Gallagher's invocation of Brandom's (following Sellars') view that our explicit attributions of intentionality (the 'subsequent giving of reasons') are derived from a pre-existing framework of normatively significant practices. What our attributions do is make explicit that which is already there in our practices, albeit implicitly. The second quotation above expresses the idea that this pre-existing framework is one that affords 'direct perception' of others' intentionality.

Gallagher sets up Dennett's intentional stance (an instance of the theory-theory) and the simulation theory (with a potential mirror neuron underpinning) as the opposition. But it isn't obvious that there is an inescapable incompatibility between these approaches and the view Gallagher offers. For surely it is open to a proponent of Dennett's intentional stance (or the simulation theory, etc.) to insist that they are describing subpersonal neural or psychological machinery that makes possible our automatic interpretation of others as intentional agents -- not the conscious musings of the interpreter. Indeed, Sellars (Brandom's inspiration in these matters), in his attack on the 'myth of the given', was keen to insist on the theory-ladenness of perception: what we putatively 'directly perceive' (see the second quotation above) has meaning only through being embedded in an implicit theory. If this is right, then in response to Gallagher's challenge -- that "it's not clear how subpersonal, automatic processes scale up to the kinds of normative structures that neo-pragmatism emphasizes" (p. 75) -- a neo-pragmatist like Sellars (or Brandom) might reply that what is needed isn't a 'scaling up' but rather a making explicit. That is, the order of explanation goes as follows: (i) subpersonal mechanisms and implicit theories; which make possible (ii) interpersonal engagement of a sort that, as far as what agents are aware of at the personal level, is automatic and 'direct'; which can be made explicit via (iii) the learning and deployment of overt theories and vocabulary. On this line, enactivism is right to insist that (iii) isn't a self-sufficient level of understanding, that it rests on (ii) -- a crucial insight, arguably the one Gallagher really cares about. But it might be wrong to assume that (ii) is explanatory bedrock.

In Chapter 5, Gallagher takes on the claim that representations are involved in generating action. The representationalist needs to show that the representations are involved in the online creation of action and also decoupleable (to count as representations). Gallagher provides emulation theory (Clark and Grush 1999; Grush 1995, 1997, 2004) as a

model that puts representational decouplability directly into action at a sub-personal level. [Clark and Grush (1999)] propose that anticipation in motor control, specifically the 'internal' neural circuitry used for predictive/anticipatory purposes . . . is a model, a 'decouplable surrogate' that stands in for a future state of some extra-neural aspect of the movement -- a body position (or proprioceptive feedback connected with a body position) just about to be accomplished, e.g., in the action of catching a ball. Since the emulator anticipates (represents) an x that is not yet there . . . it is in some sense off-line, 'disengaged', or certainly decoupled from the current x or the current movement. (p. 87)

Some clarifications are in order. What emulators primarily do in such contexts is provide estimates of the current state of some entity[1] -- the body, the environment, whatever -- by providing (i) an a priori estimate (based on knowledge of how things typically work, aka a 'prior'), and, if available, combining this with (ii) actual sensory information from that entity if it differs from the estimate. It is a prediction only in the sense that it's available before sensory information, which takes some small amount of time to make its way to the brain. Because the a priori estimate is available sooner, one advantage of emulators is that they can be used to guide online action if needed -- if real feedback is delayed, or too noisy, or unavailable. So, when used in online action, the state of the emulator is in part decoupled from the environment, since it is perfectly capable of providing state estimates even in the absence of input from a sensory residual (aka prediction error); but it is also in part coupled, in that whenever any such signal is available, it is incorporated into the emulator's representation. Moreover, it is decoupleable in the additional sense that it, the very same model, can be used off-line to generate imagery, evaluate the consequences of hypothetical actions, etc.
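The partly coupled, partly decoupled character of such an estimate can be illustrated with a minimal sketch. It assumes a one-dimensional state and a Kalman-style gain (one of the formalisms mentioned in footnote 2); the class, its fields, and the numbers are hypothetical choices for illustration, not a model taken from the book or from the emulation literature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Emulator:
    """Hypothetical scalar state estimator standing in for an emulator."""
    estimate: float        # current state estimate
    variance: float        # uncertainty attached to that estimate
    process_noise: float   # uncertainty added by each prediction step
    sensor_noise: float    # variance of the sensory signal

    def predict(self, command: float) -> float:
        """A priori estimate: where the state should be, given the command."""
        self.estimate += command
        self.variance += self.process_noise
        return self.estimate

    def correct(self, observation: Optional[float]) -> float:
        """Fold in sensory feedback if any is available; otherwise keep the prior."""
        if observation is None:
            return self.estimate  # decoupled: no feedback, estimate still usable
        residual = observation - self.estimate  # sensory residual (prediction error)
        gain = self.variance / (self.variance + self.sensor_noise)
        self.estimate += gain * residual
        self.variance *= 1.0 - gain
        return self.estimate


emulator = Emulator(estimate=0.0, variance=1.0, process_noise=0.5, sensor_noise=1.0)

emulator.predict(command=1.0)        # prior is available before feedback arrives
emulator.correct(observation=1.6)    # feedback arrives and is incorporated
coupled = emulator.estimate

emulator.predict(command=1.0)
emulator.correct(observation=None)   # feedback missing on this cycle
decoupled = emulator.estimate
```

On the first cycle the estimate ends up between the prior (1.0) and the observation (1.6); on the second it is driven by the prior alone, which is the sense in which the same estimator is coupled when a signal is available and decoupled when it is not.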

Consider as an example of an emulation system a map in a ship navigation room, on which estimates of the location of the ship are maintained (see Grush 2004). This is a model inside the ship of the ship itself, other ships and landmarks, and the relevant environment. This is a representation if anything is. The ship's represented location on the map is the result of combining (i) an estimate of where the ship should be given its prior location and maneuvering commands with (ii) what is indicated by noisy observations. (This combining is, in fact, done at discrete 'fix cycles', not continuously, but that is irrelevant -- it could be continuous without changing the role of the map.) These represented states are used in real time. The captain does not hold off and, only when a decision needs to be made, ask the navigators to plot the location and then use that estimate. Rather, the map is always updated online in real time, so that it is there to guide action whenever needed, not only as fast as real feedback but faster, since it can be driven from estimates before real feedback is available.

Is the map/model coupled to the environment? Yes, in the senses that (i) feedback from the environment is incorporated into it as soon as available and (ii) what the map represents is a key factor that determines the captain's commands. Is it decoupled? Yes, in the sense that what the map represents is not fully determined by what is happening in the environment, even when online. (And it is also decoupleable in that it can be taken completely off-line if desired for advanced planning.) That is, it is as coupled as it needs to be to fulfill the requirements of real-time online perceptual processing and motor control. And even when used online, it is as decoupled as it needs to be to count as a representation -- error is not only possible but, within narrow-enough limits, nigh unavoidable.

Responding to the idea that such an emulator may be considered a representation even when operating on-line, Gallagher challenges:

It is not clear, however, why some mechanism that may (or may not) operate in a representational way in a non-action . . . is necessarily operating in a representational way in the perception-action context. (p. 93)

But is there really any doubt that the map in the navigation room is a representation even when it is being used for current, online, real-time navigation?

This leads to Chapter 6 on perception. While a good deal of the discussion is framed in terms of predictive coding, the main points are not particular to that framework. The driving issue is whether perception is a matter of a sensory inverse. A traditional understanding of perception is that it involves taking sensory information together with one or another sort of prior knowledge and constructing a representation of the state of affairs that caused that sensory signal.[2]

One issue is whether it makes sense to see the perceptual system as making inferences:

The visual system does not require an inference since, given evolutionary pressure or experience-driven plasticity, it 'can simply be wired by the environmental fact in question to produce states that track edges when exposed to discontinuities'. The system is physically attuned to such things, 'set up to be set off' by such visual discontinuities. (pp. 119-120)

A lot is hanging on the word 'inference', for, of course, one might reply that what evolutionary pressure does is precisely set up the system to make certain inferences. But the important point, it seems to me, isn't so much whether the word 'inference' is the best way to capture such processes but rather whether an inverse mapping is implemented, even if it is implemented via mechanical linkages that were 'set up' to do that via evolutionary pressure. Gallagher is right that, often, this issue is framed in overly intellectualized ways, and he provides a useful neo-pragmatist corrective. But even if representations aren't constructed via inference, it doesn't follow that they aren't constructed at all.

Chapter 7 takes on the topic of free will. The discussion here is a perfect example of the love/hate relationship I have with enactivism. On the love side, the positive account Gallagher outlines in the latter part of the chapter is an interesting and important one. He develops a nice example of an intentional action, a grasp at a lizard that is embedded in a larger-scale activity of lizard collection. The point is to illustrate how our understanding of even small-scale actions as intentional actions involves seeing their role in larger-scale activities about which the enactive approach has much to say. This strikes me as a rich vein to be mined. On the hate side, the first part of the chapter discusses what Gallagher takes to be the relevant opposition view, hinging particularly on the infamous Libet experiments. The discussion proceeds as though the second (positive) part constitutes a repudiation of the sort of approach exemplified in the first, but it seems more like just a different topic. Libet wasn't concerned with what count as acts of free agency in the sense that the positive account of the second part of the chapter develops. Libet was fairly clear that the phenomenon of interest in his experiments was simply the timing between subjects' impressions of the initiations of their own 'free' (in the sense that they weren't triggered by anything prior of which the subjects were aware) intentions to act and the relevant neural events. It is just not the same topic.

In Chapter 8, Gallagher aims to up the enactivist ante, incorporating bodily states, affective states, and intersubjectivity:

When the bodily system is fatigued or hungry . . . these conditions influence brain function; . . . Low glucose levels . . . may mean slower or weaker brain function, or some brain functions turning off . . . (p. 39)

Affect . . . is deeply embodied . . . The agent's meaningful encounters with the world imply some basic motivation to perceptually engage her surroundings. Schemata of sensorimotor contingencies give an agent the how of perception . . . without giving its why, which depends on latent valences that push or pull for attention in one direction or another, . . . reflecting, for example, a degree of desirability. (p. 151)

Much of this is fascinating and correct. Especially noteworthy is Gallagher's discussion of the role of intersubjective social engagement in language and learning, including an analysis (following Goodwin 2000) of a conversation involving a girl who is accusing another girl of cheating at hopscotch. The accusing girl moves to physically block the other girl's motion while gesturing to the incorrect location of a bean bag. She does not just utter the sentence "I believe you are cheating". Her body and movements work together with the speech to literally embody deontic force as physical and social force.

This is genuinely fascinating material, showing how enactivism can contribute to linguistic semantics. But as in the case of the free will discussion in Chapter 7, the worth-the-price-of-admission positive account of the crucial role of embodiment is framed in terms of a negative account that, arguably, misses the mark and isn't necessary for the positive account anyway.

To see what I mean, consider the Novag Primo, a 1980s chess computer -- a good old-fashioned AI representation-using mechanism if ever there was one. Like all dedicated chess computers, it was designed to be a worthwhile opponent by determining moves on the basis of assessing many possible future move sequences and selecting as its actual move the one with the best outcome. The move is then implemented -- the Novag Primo is actually embodied in a real chess board.

Now consider this challenge from Gallagher:

The explanatory unit of cognition (perception, action, etc.) is not just the brain, or even two (or more) brains in the case of social cognition, but dynamic relations between organism and environment, or between two or more organisms . . . (p. 11)

What is the 'explanatory unit' for the chess computer? Notice that my description doesn't mention the CPU at all. The explanatory unit is the game, which involves a board and, essentially, an opponent. What the CPU does is, in part, a function of its opponent's moves -- indeed, the CPU and the opponent, through the physical board, are in dynamic interaction. Also notice that the computer doesn't represent the size of its white bishop or any of its objective features. The white bishop is represented as something that can be moved in specific ways, that can be used to capture certain things, and so forth. Its potentialities for action in the dynamical, social game exhaust its meaning to the Novag Primo.

What about affect? Notice that it isn't enough that the computer determines, for any move it might make, the various moves that could follow it. It must (and does) assess them as good or bad, as to-be-avoided and to-be-sought -- the why, not just the what or how. And in fact, part of the content of its white bishop representation is that it should be protected, that its loss would be a loss.

So far on my scorecard, the Novag Primo's activity involves an explanatory unit that comprises more than the CPU and includes a temporally extended process, essential dynamical intersubjective interactions, action-oriented perception, and an ineliminable role for affect.

What about embodiment? The Novag Primo's difficulty as an opponent was determined by the number of moves ahead it would assess. And it was battery operated. Now what I am about to mention was not actually a feature of the Novag Primo, but it quite easily could have been. Suppose it had been equipped with a way to determine how much battery life remained, and, since computing look-ahead moves takes time and energy, reduce the number of look-aheads when the remaining power starts being an issue. Or (if that description makes it sound as though the modulation is based on a representation of the remaining power, as opposed to the remaining power) imagine a CPU whose clock speed is driven by the voltage. (If such chess computers evolved naturally, the pressure would be the balance between (i) being more likely to finish a game, albeit with a lower chance of victory because of its less-effective moves and (ii) being very effective but losing some games through default after power died.) This is as direct a connection as there could be between a feature of embodiment (analogous to hunger or oxygen level) and cognition. If the effects of hunger on cognition in the human case count as pro-embodiment considerations, then it seems that embodiment can be put on the Novag Primo scorecard as well.
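The hypothetical modification is easy to sketch. The code below is an illustration under stated assumptions, not the Novag Primo's actual program: it swaps chess for a toy take-away game (take one to three stones; whoever takes the last stone wins) so that it stays short, and the function names, the 30 percent low-power threshold, and the maximum depth of 6 are all invented for the example. It corresponds to the first variant above, in which remaining power is measured, rather than the clock-speed variant.

```python
def lookahead_depth(battery_percent: int, max_depth: int = 6,
                    low_power_percent: int = 30) -> int:
    """Full look-ahead on a healthy battery; shallower as power runs low."""
    if battery_percent >= low_power_percent:
        return max_depth
    return max(1, max_depth * battery_percent // low_power_percent)


def negamax(stones: int, depth: int) -> int:
    """Score for the player to move: +1 win, -1 loss, 0 unknown at the horizon."""
    if stones == 0:
        return -1  # the opponent just took the last stone
    if depth == 0:
        return 0   # look-ahead budget exhausted
    return max(-negamax(stones - take, depth - 1)
               for take in (1, 2, 3) if take <= stones)


def choose_move(stones: int, battery_percent: int) -> int:
    """Pick how many stones to take, searching only as deep as power allows."""
    depth = lookahead_depth(battery_percent)
    legal = [take for take in (1, 2, 3) if take <= stones]
    # Each candidate is scored from the opponent's side, hence the negation.
    return max(legal, key=lambda take: -negamax(stones - take, depth - 1))


strong_move = choose_move(stones=7, battery_percent=100)
weak_move = choose_move(stones=7, battery_percent=2)
```

With a full battery the search sees that taking three stones leaves the opponent in a losing position; on a nearly dead battery it cannot see past its own move and falls back on the first legal option, which is the trade-off between finishing the game and playing well described above.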

What lessons should we draw? One might be that the enactivist re-descriptions can't do the work they are trying to do. If anything is a representing, computational mechanism, a chess computer is. And so, if it can be re-described in the same enactivist language that is employed as a cudgel against representationalism, then so much the worse for that cudgel.

A very different lesson would be that even in the case of an old-fashioned chess computer, we are missing something important if we insist on conceiving it narrowly as a self-contained computational system. If we need to bring in social dynamical factors, issues related to its power-consuming embodiment, and so forth to fully understand the Novag Primo, then how much stronger are those lessons when the topic is human cognition?

I'm not sure which is the right lesson to draw, nor am I sure the answer isn't both. This brings me to a point that the reader will have noticed has recurred several times in this review. When Gallagher is engaging in the 'representation war' and associated enactivist talking points, the arguments strike me as less than convincing (though, of course, I'm one of the representationalists). But when the focus is on developing the positive elements of the approach and applying them to specific cases (e.g., the account of action understanding in Chapter 7 and the discussion of hopscotch communication in Chapter 8), the result is genuinely interesting and sheds light from novel and helpful angles.

I can't help but think that the enactivist camp would be better served by ignoring the negative arguments aimed at representationalism, etc. and focusing instead on developing applications that highlight the value added by their approach. What typically drives conceptual change is showing how a new approach does fruitful things. And the enactivist approach, when not distracted by its war on representation, has made, and is poised to continue to make, some solid advances. This book has more than enough of the light-shedding positive developments to make it a must read, despite its recurring fixation on the representation/computation debate.

REFERENCES

Barker, A. L., Brown, D. E., and Martin, W. N. (1995). Bayesian estimation and the Kalman filter. Computers & Mathematics with Applications, 30(10), 55-77.

Brandom, R. B. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press.

Clark, A., and Grush, R. (1999). Towards a cognitive robotics. Adaptive Behavior, 7(1), 5-16.

Gallagher, S. (2017). Enactivist interventions: Rethinking the mind. Oxford University Press.

Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal of Pragmatics, 32, 1489-1522.

Grush, R. (1995). Emulation and cognition. Doctoral Dissertation, Department of Cognitive Science and Philosophy, University of California, San Diego. UMI.

-- -- -- . (1997). The architecture of representation. Philosophical Psychology, 10(1), 5-25.

-- -- -- . (2004). The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377-442.

-- -- -- . (2005). Internal models and the construction of time. Journal of Neural Engineering, 2, S209-S218.

[1] Though they can also provide genuine predictions of future states and retrodictions of past states (Grush 2005).

[2] It is a further question -- one largely irrelevant for current purposes, in my opinion -- whether the knowledge/expectations and sensory signal/update are combined optimally. In Grush (2004), I used Kalman filtering as an example; and, more recently, the folks who frame the process in terms of predictive coding appeal to Bayes’ Theorem. Both are mechanisms for combining the two sources optimally: Kalman filtering and Bayes are two formalisms for essentially the same process (Barker, Brown, and Martin 1995). What is important for the current topic, it seems to me, is that perception is a matter of forming an estimate, whether optimally or usefully-but-sub-optimally, from both a predicted state estimate embodying prior knowledge and sensory information from the represented entity itself.
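For the simplest case the equivalence can be written out directly. The following is a sketch assuming a scalar state with Gaussian uncertainty; the symbols are illustrative choices and are not drawn from the works cited: \hat{x}^{-} and \sigma_{-}^{2} are the predicted estimate and its variance, y and \sigma_{y}^{2} are the sensory signal and its noise variance, and K is the gain.

```latex
% Kalman-style update: correct the prediction by a weighted residual
\hat{x}^{+} = \hat{x}^{-} + K\,(y - \hat{x}^{-}),
\qquad
K = \frac{\sigma_{-}^{2}}{\sigma_{-}^{2} + \sigma_{y}^{2}}

% Bayes' Theorem with a Gaussian prior and a Gaussian likelihood:
% the posterior mean is the precision-weighted average of prediction and signal
\hat{x}^{+}
= \frac{\hat{x}^{-}/\sigma_{-}^{2} + y/\sigma_{y}^{2}}
       {1/\sigma_{-}^{2} + 1/\sigma_{y}^{2}}
= \frac{\sigma_{y}^{2}\,\hat{x}^{-} + \sigma_{-}^{2}\,y}
       {\sigma_{-}^{2} + \sigma_{y}^{2}}
= \hat{x}^{-} + K\,(y - \hat{x}^{-})

% The posterior variance agrees as well
\sigma_{+}^{2}
= \frac{1}{1/\sigma_{-}^{2} + 1/\sigma_{y}^{2}}
= (1 - K)\,\sigma_{-}^{2}
```

Under these assumptions the two routes give the same estimate and the same uncertainty, which is the sense in which they are two formalisms for one process.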