Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience

Axel Seemann (ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience, The MIT Press, 2012, 493pp., $45.00 (hbk), ISBN 9780262016827.

Reviewed by Sebastian Watzl, University of Oslo


This new collection covers a wide range of issues. Joint attention lies at the interface of body and mind, the interface between the theoretical and the practical, and the interface between the individual and the social. The book thus introduces a diverse field, where "no generally accepted definition [of the phenomenon] is available, nor is there a well-ordered overarching research program" (Introduction, p. 1). Having read the seventeen essays, the reader might still not have a very firm grip on the issues, but she surely will walk away with more appreciation of the fascinating "social dimension" (p. 2) of our mental lives.

The collection is strongly interdisciplinary. It contains work in philosophy (Chapters 8, 11, 12, 13, 14, 16); developmental psychology (Chapters 4, 6, 7, 15); cognitive psychology (Chapter 17); clinical psychology (Chapter 5); and animal psychology (Chapters 1, 2, 9, 10). The collection is probably best used by researchers in one of these areas who would like to learn more about how work in their own field relates to concerns in neighboring disciplines. Such researchers can use Seemann's collection as a starting point for entering deeper into literature in those related areas.

What is the phenomenon of joint attention? The photograph on the cover of Seemann's collection helps to focus on the topic as well as on an underlying thread that connects the diverse essays (indeed, this might be one of the few books that rewards continued reflection on its cover). In the foreground of the photograph we see three fencers in action. A fencing instructor in the middle is teaching the person on the right; he is pointing towards the object of her attention -- her opponent on the left -- by way of imitating a fencing movement while at the same time guiding her arm with his other hand. In the background we see a number of other students who are observing the scene. The fencing teacher and his student are jointly attending to her opponent; their looking toward him is coordinated, as is their bodily action. The attention of the onlookers -- by contrast -- while they are all focused on the same scene is not joint. They are merely attending to it together.

What then distinguishes joint attention from merely attending together? How is joint attention connected to collective action and cooperation? What role does sharing of emotions, joint engagement, and embodied cognition play in episodes of joint attention, and in its development? Is joint attention a uniquely human phenomenon -- or do apes, monkeys or packs of wolves and other animals (who, in some sense certainly seem to co-operate attentively) engage in it as well? What capacities are needed to explain joint attention (is it, for example, necessary to have the capacity to understand another's point of view)? And what capacities can be explained by reference to joint attention (is, for example, any serious form of cooperation possible without it)? These are some of the questions raised by the cover that are central to the essays in the book.

We might organize these questions into roughly three areas. I will call them questions about the nature of joint attention, questions about the origins of joint attention, and questions about the significance of joint attention (these questions roughly correspond to the three sections that divide the book). One thread that connects the various contributions is that they all react to certain background pictures. Bringing those background pictures into view might be helpful to see what is at issue.

Let me start with the first set of questions, concerning the nature of joint attention. What is joint attention, and how should we theorize it?

We might call the background picture in this area "individualistic". Attending, one might think, is an activity that (in a strict sense) only individual subjects can perform, though -- of course -- they might coordinate their respective activities. On this picture joint attention as a collective mental activity can be reductively explained in terms of coordinated individual mental activities. Roughly, whether x and y are jointly attending to z will be determined by (i) x and y both (individually) attending to z, and (ii) some form of awareness by both x and y of how their attending to z is coordinated.[1]

Carpenter's and Liebal's essay articulates an empirically based account of joint attention that fits within the frame of the individualistic picture just outlined. On their view of joint attention the co-attenders need to know together that they are attending to the same thing (p. 159). This "knowing together" corresponds to (ii) above and distinguishes episodes of joint attention from the merely parallel attention of our onlookers. Carpenter and Liebal review evidence that at around fourteen months infants are sensitive to whether they are engaged with adults in joint attention characterized by such knowing together. What then is it to know together? One approach familiar from the philosophical literature on common knowledge would require an infinite recursion of knowledge (p. 165f). Such an approach is unsatisfactory (if only because it requires too much of our infants). Carpenter and Liebal suggest that the problem can be overcome by appeal to communication: through an exchange of "communicative" or "sharing" looks (p. 170) the co-attenders are immediately aware of how their attending to z is coordinated. In this sense they know together. Carpenter and Liebal do not attempt to provide a reductive account of what is involved in such sharing looks (though they provide some characteristics). Yet even without a reductive definition of "sharing look" one would like to know more about how this account would generalize beyond the visual modality, and how much hangs on the perceptual nature of the relevant form of communication (can we jointly attend to the moon, on their view, if you are in North America, I am in Europe, we both look at the moon, and we exchange text messages about what we are doing?)

In contrast to Carpenter and Liebal, many other essays in the volume reject the individualistic picture.

Axel Seemann's essay does so most directly. On his view, joint attention cannot be reduced to the level of each individual; rather it is "a phenomenon that has to be looked at in the broad terms of the relation . . . between the involved organisms' activities and the environment in which these activities occur." (p. 183). According to Seemann's official account (p. 199) x and y are jointly attending to z just in case x and y are both causally sensitive to z in their focus of attention and behavior, as well as causally sensitive to each other's focus of attention and behavior. This seems too weak: the onlookers depicted in the cover image might be causally sensitive to each other's focus of attention toward the scene in front of them, but presumably they are precisely not jointly attending. Seemann often talks about the importance of shared "feeling" (p. 195 ff.), "experience" (e.g., p. 200), and the co-attenders' bodies "constituting" each other's experience (p. 200). I felt that this talk was meant to rule out the kind of counterexamples just mentioned. I didn't see, though, how the official account makes good on this.

Shaun Gallagher's and Daniel Hutto's essays both offer anti-individualistic perspectives that fall under the elusive labels of "embodied" or "enactive". The main claim of Gallagher's short contribution is that joint attention primarily consists in a coordination of movement, and not a coordination of mental states. Gallagher supports his view with (a) the claim that infants engage in acts of joint attention before they have the capacities Gallagher believes to be necessary for psychological coordination, and (b) examples meant to illustrate cases of joint attention that do consist in a coordination of movement (such as in a game of football), examples that -- I take it -- were meant to obviate the need for psychological coordination. The lack of detail and precision in Gallagher's discussion leaves many questions open. To mention just one: presumably not any kind of coordinated movement (think of ants or flocks of birds) is sufficient for joint attention. Plausibly, the movements as well as the coordination must in some sense be intentional. But if so, aren't we back to some form of mental coordination -- maybe of a kind that is more primitive than the one infants might not be able to engage in? More argument and detail would have been needed for a satisfactory account.

Daniel Hutto's essay, on the other hand, contains an admirably detailed argument to the effect that joint attention might be explicable without invoking mental representation, and an equally clear statement of what such an anti-representationalist treatment might look like. I found Hutto's essay extremely rewarding. It goes far beyond the often-heard buzzwords of "enactive" or "embodied" cognition, and shows how an account of joint attention (and simple cognition more generally) under such labels might rival representationalist accounts in sophistication and explanatory power. It puts the onus on those in favor of representationalist accounts to show why they are superior to neighboring anti-representationalist rivals like Hutto's, and to show how some of the pressing problems with such views pointed out by Hutto can be overcome.

Karsten Stueber's essay may be seen as taking up part of the challenge put forth by Hutto. He first argues that the mere fact that children learn about the social world through interaction with others shows nothing about the underlying mechanisms children bring to bear in acts of joint attention (and whether these involve mental representation). More importantly, he argues that any account of perceptual or cognitive sensitivity to another's mind must explain how it is sensitivity to something that I myself qua subject of experience and minded creature could have (p. 279). But if that is right, then any real intersubjective engagement must be grounded in an appreciation of what Stueber calls "like-me familiarity" (p. 276); furthermore, interacting with another qua agent must involve an appreciation of the fact that acts are performed for reasons. This again is something Stueber believes we can explain only by reference to a subject's sensitivity to the similarity of the acts of another to acts she knows from the first-person point of view. It is thus in part by reference to the rationalizing (and not merely explanatory) point of mentalistic vocabulary that we see that the kind of perspective taking and representing of another as minded stressed by simulation theory (a view Stueber and others have championed elsewhere) is, after all, necessary to explain intersubjective engagement (though maybe sometimes at the level of experience, not cognition, p. 276).

John Campbell's essay defends what we might call the "primitivist" view of joint attention. This view leaves the three-place relation 'x and y are jointly attending to z' as a primitive, only adding that such a relational state of joint attention is a conscious, personal, and non-propositional state. Campbell argues that only such a primitivist view can explain the various rational roles of joint attention. In particular, an account of joint-ness based on communication (like, I take it, Carpenter and Liebal's) cannot, according to Campbell, explain how joint attention can make it evidently rational to cooperate in coordinative games, where you (and the other) stand to win if both cooperate, but stand to lose if only you yourself were to choose the cooperative act.[2]

How does Campbell's view relate to the accounts of joint attention mentioned so far? Like Seemann's, Gallagher's and Hutto's view (and unlike Carpenter and Liebal's and Stueber's) Campbell's view is anti-individualistic: joint attention requires an actual relation between subjects and their environments. Unlike the first three views (and like the other two) Campbell's view, though, is not specified in terms of bodily or causal interactions. When subjects jointly attend they bear a primitive experiential relation to each other and their environment. How would Campbell address Stueber's challenge to explain how it is that in intersubjective engagements we are sensitive to the other as a subject of experience and action like us? Campbell might appeal to the experiential character of his three-place relation that contains the other as co-subject: you and I bear the same type of relation (i.e., a relation of consciousness) to something that I alone qua subject can also bear to it. It is unclear whether that completely answers the challenge though: why should I be in a position to know that you are a subject if I am not aware of you as such, simply on the basis of your experiencing together with me? There might be ways of responding to this challenge that try to assimilate second-person knowledge in such circumstances to self-knowledge. It would be interesting to see Campbell's account developed more in this respect.

Timothy Racine, in the essay that -- somewhat oddly -- starts the whole volume, is skeptical about the whole debate we have seen so far. Indeed, "In research areas as saturated with empirical findings as joint attention, it is odd to think that we even need theories" (p. 38), he says. Sometimes Racine seems to think that the problem lies with appeal to mentalistic vocabulary like "intention" or "goal" (p. 32). At other times the problem seems even more general: instead of trying to specify what joint attention really is we should just amass the relevant behavioral and neuronal data, and come up with models that explain them. It is good to have a critical perspective like Racine's on the table. But it is hard to engage with it in the abstract without getting deep into general methodological discussions. I thus leave it to readers of Seemann's volume to judge whether the various theories are as useless as Racine suggests, and whether non-mentalistic or theory-neutral models of joint attention can do all the explanatory work (as well as, as we have seen in Stueber's and Campbell's essays, account for the various rationalizing features of joint attention).

Let me then turn to our second set of questions, concerning the origins of joint attention. What are the ontogenetic origins and precursors of joint attention in infants? And what are its phylogenetic origins and precursors in the animal kingdom?

We might call the background picture in this area the "sophisticated" picture of the origins of joint attention. It is related to the individualistic picture of its nature. If the co-attenders need to be aware of how their attention is coordinated, then joint attention seems to require that the participating subjects in some way appreciate the mental life of others. From there it is a natural step to think that the development of joint attention requires the development of mindreading, perspective-taking, or meta-representation. But such things are difficult achievements for cognitively unsophisticated creatures, and hence non-human animals probably don't engage in joint attention, and infants start to engage in it only at an age when they first develop the ability for simple forms of perspective-taking.[3]

Many of the essays in the volume deal with this sophisticated picture, argue against it, and point to simpler forms of joint attention or precursors.

Colwyn Trevarthen's chapter traces the origins of intersubjective engagement in early infancy. With references to many of his own studies in the last twenty-five years as well as the studies of others, he paints an engaging, if somewhat essayistic, picture that stresses the central roles of sharing experience, intimacy, and companionship between infants and their caregivers, and of the rhythms that coordinate their bodies and emotional minds from the first day. Infants are not trapped within their own perspective, but inhabit a social world from the beginning. It is against the background of such primary intersubjectivity that "secondary intersubjectivity" (p. 97) like joint attention that is directed at a third object should be understood. The occurrence of joint attention at around nine months is for Trevarthen a genuinely new development in part because it involves "a change in the quality of companionship between two close friends" (p. 97).

Vasudevi Reddy's essay has a similar emphasis. She suggests a "second-person approach" (p. 138) to how infants become aware of another person's attending. She points out that infants are aware of and emotionally responsive to another's activity of attending to themselves much earlier than the emergence of joint attention to distal objects. Reddy suggests looking at infants' social development as a way of expanding the sphere of the "objects" of another's attention they are emotionally sensitive to (see table 6.1, p. 145) and with which they actively engage by attempting to direct another's attention (see table 6.2, p. 147). The early emotional engagement with the other's attention directed at themselves, she suggests, might play a crucial role for the development of joint attention as it is impaired in infants with autism who later typically show an impairment in their joint attention skills (p. 149).

Peter and Jessica Hobson's contribution directly focuses on the topic of autism. The essay reviews a variety of results about autistic children that range from observations concerning communicative impairments such as a lack of eye contact, of "sharing looks", or of conversational nodding, to studies that show a lack of emotional engagement specifically (autistic children look less concerned than others when someone else's drawing gets torn up). Based on such results and others, Hobson and Hobson argue that "the syndrome of autism is an expression of limitations in intersubjective engagement in relation to a shared world." (p. 117). The gist of Hobson and Hobson's essay seems to be that it would be too narrow to describe these limitations as impairments in mind reading or perspective taking capacities (see p. 130 ff.). Such specific impairments should rather be seen as the result of a more basic, though probably not simple, failure to share experience with others.

I found Trevarthen's, Reddy's, and Hobson and Hobson's essays fascinating. They certainly make a good case for reversing the idea that normal infants start out locked in their worlds, and encounter the worlds of others only through the development of cognitive sophistication. The primary intersubjectivity that is there from the start most likely does play a role for the development of joint attention and perspective taking (see below for more on that), and is not based on such sophisticated achievements. It would have been good though to learn more on how exactly the transition to those more sophisticated forms of interaction is achieved and about what the limitations are that seem to prevent infants from engaging in joint attention before about one year of age.

Let me then turn to the phylogenetic origins of joint attention.

Shepherd and Cappuccio's essay is a rich resource on the varieties of following, understanding, and responding to gaze as well as on the varieties of manual pointing that can be found in non-human animals. They review both behavioral results and results about the relevant neural circuitry. Shepherd and Cappuccio show that sensitivity to gaze direction sometimes can be extremely fast and (almost) automatic, that it is widespread among animals and responsive to the geometric layout of the environment ("others can't see things behind barriers" etc.), and that at least in monkeys and apes it is modulated on the basis of expectations and the social status and facial expression of the one whose gaze is being followed. What might be special about humans, they suggest, is the widespread sensitivity to and use of gaze as a "collaborative signal" (p. 213). Similar points might apply to the use to which pointing is put.

Hopkins and Taglialatela's contribution focuses specifically on joint attention in chimpanzees. Like the previous essay it is an informative resource of both behavioral and neural data on the topic. Among other things, Hopkins and Taglialatela, on the one hand, review evidence that chimpanzees (compared to infants) have difficulties in understanding communicative gaze and pointing. On the other hand, if chimpanzees are raised in an "enriched human linguistic environment" (p. 251) some of these difficulties may be overcome, thus raising doubts as to whether they should be attributed to a fixed limitation.

What then is to be made of the idea that "humans . . . have a species-unique adaptation for joint attention"? (Leavens, p. 43). While the two essays just discussed review many results about precursors of joint attention in non-human animals, they leave open whether the capacity for full-fledged joint attention itself might be unique to humans. David Leavens' essay aims at directly attacking such a claim by showing that it is based on twelve myths. Some of these myths, Leavens believes, are based on an unduly inflated account of what joint attention in humans comes to, others on a distorted reading of the available evidence. The overall upshot of Leavens' argument is that the development of declarative pointing and joint attention in humans might have more to do with nurture than with nature: both apes and humans learn to point (and to jointly attend) when raised in certain man-made environments (p. 62).

Let me then get to the third, and final, set of questions, concerning the significance of joint attention. What explanatory and what normative roles does joint attention play?

Again, there is a background picture. We might call it "austere". Some roles of joint attention such as its role for linguistic development are so widely accepted that they are not discussed here: children learn which words pick out which objects or properties in part on the basis of jointly attending with adults to those objects or properties while the adults use those words. According to the austere picture, these types of roles for joint attention -- though, of course, hugely important -- exhaust the importance of joint attention. Those essays in the collection that address the topic of the significance of joint attention all aim at expanding the austere picture and point to new and so far rather unappreciated roles of joint attention.

A role for joint attention one might think of is in grounding cooperative action. I have already mentioned John Campbell's view that joint attention would make cooperative action rational in cases where it would otherwise not be. The relation between joint attention and collective action is also one we encounter in Elisabeth Pacherie's essay. Her essay primarily, though, concerns the nature of joint action, and how we experience joint action. Pacherie provides a detailed and complex account of how the agents' intentions are coordinated at various levels in cases of both small-scale joint action (when you and I lift a piece of furniture together), as well as large-scale joint action (when an orchestra plays a symphony or an army invades a country). She also provides an account of how we experience both our own agency as well as our common agency in various cases. There is a lot of interest in Pacherie's contribution. Unfortunately, the topic of joint attention comes up only in two paragraphs in order to explain the coordination of what she calls SP-intentions (p. 355 ff.). This seems to be a rather limited role for joint attention in the explanation of collective action. Especially in light of the embodied perspective on joint attention in some other essays in the volume, it would have been interesting to hear more about the depth of the connection between collective agency and joint attention.

A second role for joint attention discussed by Moll and Meltzoff as well as by Campbell concerns the explanation of perspective taking. Their ideas are interesting in part because they seem to reverse the order of explanation we find in the sophisticated picture of joint attention mentioned above. Perspective taking gets explained by joint attention, rather than occurring in an explanation of it.

Moll and Meltzoff provide a fascinating account of how episodes of joint attention and sharing of experience might provide the foundation for infants' development of an understanding of (potentially differing) perspectives. One series of studies in support of this claim is this: at around fourteen months infants have an appreciation of which object is new for an adult and which ones are familiar for her only if they have shared their experience of the familiar objects with the adult; if they have merely observed the adult engage with the familiar objects (either alone or with a third person) they do not understand what is new from her perspective. One of the most interesting aspects of Moll and Meltzoff's essay concerns a subtle yet important distinction between taking another perspective and confronting such a perspective. The distinction is best introduced by reference to two experiments performed with children at around three years. In both experiments, child and adult look at two blue objects A and B; while both look blue to the child, the adult is looking at A through a yellow filter so that it looks green to her. In one condition, the adult requests either "the green one" or "the blue one". The children were able to know which one she was referring to. They thus seem able to take another's perspective. In the other condition, the child was asked how she sees A as well as how the adult sees A. Interestingly, children at that age now indicate that the adult sees A as blue (just like the child). The children thus seem unable to confront two different perspectives (so that A looks blue in her own perspective, but green in the adult's perspective). The crucial difference between these two conditions, Moll and Meltzoff hypothesize, is that in the first the adult provides the child with her perspective and so children do not need to contrast or compare the perspective provided with another one like their own.

John Campbell similarly argues (in part by reference to work by Moll and Meltzoff) that engaging in joint attention might be essential to learn what someone else is attending to, which in turn forms the foundation of the ability to grasp that someone else might have a different perspective on that common object of attention. Campbell expands this idea to connect with traditional philosophical problems about other minds. In particular, Campbell makes a fascinating suggestion concerning the famous inability to know what it is like to be a bat discussed by Thomas Nagel. Campbell suggests that if someone could engage in joint attention with a bat and hence primitively grasp what it is attending to she "would have gone a long way toward understanding the bat's mental life" (p. 429). Though Campbell acknowledges that this doesn't solve the problem completely, he suggests that in the first place we learn about another's mental life not by getting inside her head, but through episodes of joint attention.

There is another role for joint attention one might think of. Could it be that inter-subjective experiences like joint attention shape the very way individual subjects experience the world around them?

Constantini and Signigaglia's essay attempts to argue that when we perceive an object as affording a certain kind of bodily action, for example, a mug as being graspable, we take into account not just our own spatial relation to the object (whether it is close enough, and not blocked by a barrier), but also the spatial relation of another person (in fact an avatar!, p. 444) who is with us in the same situation (whether it is close enough to her, and not blocked from her by a barrier). While I see no reason to doubt Constantini and Signigaglia's finding that perceived graspability by another affects reaction times in a way that is similar to graspability by ourselves (p. 444), I found this rather weak evidence for asserting that at the basic level affordances depend on all the actors in the situation (p. 451). Furthermore, I was struck by the disconnect between their discussion and the rest of the volume, since all their research was performed in the evident absence of joint attention (the avatar never looked in the direction of the experimental subject). Whatever their findings show about social cognition, it is independent of the joint engagement that the rest of the volume is highlighting.

Let me end by returning to the volume as a whole. It is inspiring to work through the various perspectives on the complex issues related to joint attention. The collection displays the liveliness of this research area. Whether it successfully undermines the individualistic, sophisticated and austere background pictures remains to be judged; and much research, both empirical as well as conceptual, remains to be done. Whatever the correct views about joint attention are though, anyone who has read this volume will have many new things to think about. Joint attention might indeed be where the life of the mind and the social life first meet; and somewhere close to that meeting place is where to find what makes us humans unique among the animals. This should be enough to keep thinking about joint attention.


Peacocke C. (2005). Joint Attention: Its Nature, Reflexivity, and Relation to Common Knowledge. In: Joint Attention: Communication and Other Minds, ed. N. Eilan, C. Hoerl, T. McCormack, J. Roessler, Oxford University Press.

Tomasello M., Carpenter M., Call J., Behne T., and Moll H. (2005). Understanding and sharing of intentions: The origins of cultural cognition, Behavioral and Brain Sciences, 28: 675-735.

Wyman E., Rakoczy H., and Tomasello M. (2012). Non-verbal communication enables children's coordination in a "Stag Hunt" game, European Journal of Developmental Psychology, 1-14.

[1] This picture is a simplified caricature and should not be attributed to any particular person. Peacocke 2005, though, does offer an account in this neighborhood.

[2] See Wyman, Rakoczy and Tomasello 2012 for recent evidence that suggests that infants are in fact more likely to cooperate in situations with direct intersubjective engagement.

[3] This picture, again, is a caricature and should not be attributed to any actual researcher. In order to appreciate the issues it might be helpful to also engage with Tomasello et al. 2005.