Things and Places: How the Mind Connects with the World


Zenon W. Pylyshyn, Things and Places: How the Mind Connects with the World, MIT Press, 2007, 255pp., $34.00 (hbk), ISBN 9780262162456.

Reviewed by Christopher S. Hill, Brown University


Pylyshyn's new book ranges over a number of topics in perceptual psychology, including perceptual reference to external objects, the roles that representations of locations play in various perceptual processes, attention, perceptual binding, tracking objects across time, the nature of visual objects, the evidential significance of conscious experience for scientific theories of perception, the nature of experiential representation, the visual imagination, and spatial reasoning. Despite this breadth of scope, the book has several unifying concerns, foremost among which is the goal of presenting and defending a theory of how, at the most basic level, the visual system hooks onto the world, achieving perceptual engagement with specific objects in the local environment. This theory is scientific rather than philosophical in character, and in general, the book is scientific in its orientation and manner of argument, with experimental evidence almost always in the foreground. But Pylyshyn is philosophically sophisticated and devotes a number of pages to discussing the philosophical implications of his scientific views. These passages are frequently quite interesting -- on the whole, they occasioned lively discussion in a philosophical reading group that recently worked through the book.[1] To be sure, it is far from clear that Pylyshyn's theory of foundational perceptual reference has the philosophical significance that he attributes to it, but the question of its significance is itself a matter that is of philosophical interest.

The book has five chapters and a short concluding section that summarizes its main findings. The first three chapters are devoted to Pylyshyn's theory of foundational perceptual reference, the fourth primarily to perceptual consciousness, and the fifth to spatial reasoning. In addition to their primary concerns, the fourth and fifth chapters also deal at some length with the nature of the visual imagination.

Pylyshyn's exposition is on the whole admirably clear and comprehensive, but at certain points it is helpful to refer to the corresponding passages in his previous book, Seeing and Visualizing: It's Not What You Think (Cambridge, MA: MIT Press, 2003). The coverage of key experiments tends to be less compressed in that earlier work. Also, in an appendix to Chapter 5, the earlier work presents a sketch of how Pylyshyn's ideas about early vision might be implemented by a connectionist model. It can be useful to keep that model in mind.

The central claim of Pylyshyn's theory of perceptual reference is that early vision makes use of four or five "pointers" or "indexes" that function something like demonstratives. They have no descriptive content, but rather acquire objective reference by being brought under the causal control of external objects. These indexes, which Pylyshyn calls "FINSTs," "provide a reference to some sensory individual … without thereby encoding any property of the individual that is indexed" (p. 67). Here are two other passages that will help to fix ideas:

What we need is a way to refer to individual things in a scene independent of their properties or their locations. This is precisely what FINSTs provide. (p. 23)

FINSTs give us nonconceptual access to what I have called a thing or a sensory individual or a visual object … . Because the representation is non-conceptual, these sensory individuals are not represented as objects or as Xs for any possible category X. They are just picked out transparently by a causal or informational process without being conceptualized as something or other. Early vision picks out and indexes a small number (4 or 5) of such sensory objects, roughly the way you might pick out a fish by placing a baited hook in the water -- it happens primarily at the initiative of objects; we say it is data driven. (p. 56)

These passages capture the core of Pylyshyn's position, but it is necessary to interpret them with some care. In describing the processes by which FINSTs come under the control of specific objects, he acknowledges that this could not happen if the visual system was not sensitive to certain clusters of external properties. That is, he explicitly allows that objects capture or "grab" FINSTs by instantiating clusters of properties to which the visual system is sensitive. To appreciate the point of the quoted passages, we have to follow Pylyshyn in distinguishing between a mechanism that is sensitive to the causal influence of a property cluster P and a representation of P that can combine with representations of objects to form predications. In maintaining that foundational reference is achieved independently of properties and locations, he means only that it is achieved independently of representational elements that have this sort of combinatorial, predicative capacity.

As Pylyshyn notes, his radically externalist position is a counterpart of the views of Kripke, Putnam, and Kaplan concerning the direct reference of proper names, kind terms, and demonstratives. In effect, Kripke argued that the ability to use a name N to refer to an object X depends on there being an information channel that connects one's use of N to X. Similarly, Pylyshyn maintains that the ability to perceive specific objects depends on there being a flow of information from the objects to pointers or indexes in the early visual system. This view provides indirect support for Kripkean semantic externalism. Further, as Pylyshyn also notes, even though it is concerned with early vision, his externalist account of perceptual reference has connections with philosophical accounts of higher level perceptual awareness of objects, and also with philosophical accounts of the ability to refer to objects in thought. It may be, he says, that Quine and Strawson were right in claiming that higher level reference to objects involves subsuming them under sortals, and therefore under principles of identity, but this means only that we must make use of sortals and attendant principles of identity in order to represent objects as determinate items and to achieve knowledge of their spatiotemporal boundaries. That is to say, while Quine and Strawson arguably give us necessary conditions of higher level cognitive engagements with objects, their conditions are not sufficient. If higher level cognitive endeavors are to make contact with specific external objects, then they must rest at some point on perceptual engagements that are purely informational in character.

Pylyshyn offers a number of arguments in support of his views about foundational reference. The most prominent of these derives from an experimental paradigm that he calls multiple object tracking (MOT).

In this experiment … a small number of target objects (usually around three or four) are briefly distinguished from a number of visually identical nontarget objects, typically by blinking the targets on and off a few times. Then all objects move around unpredictably on a screen, the targets traveling helter-skelter among the identical nontargets, for some period of time (say, around ten seconds). (p. 35)

The basic finding in experiments of this sort is that, as long as the number of targets remains comparatively small, subjects are quite successful in tracking them. Pylyshyn maintains that this finding is best explained by the hypothesis that the targets grab FINSTs and continue to exert control over them. At the end of an experiment, the subject responds to the question "Which objects are the targets?" by simply pointing to the ones that are associated with FINSTs.

Let us grant that the experiments are best explained by supposing that targets grab demonstrative elements. It remains open whether we should follow Pylyshyn in supposing that representations of properties play no role in MOT. Couldn't it be that a subject opens an object file for each of the target objects at the outset of the experiment, where an object file consists of a FINST and several representations of properties, and that each file is automatically updated as the experiment proceeds?[2] (Perhaps we could just identify FINSTs with object files.) Pylyshyn states a number of objections to proposals that belong to this general family. It is clear, he points out, that successful tracking does not rely on information about the color, size, or shape of targets, for the targets in the experiments resemble the distractors in all these respects. Of course, it could still be true that subjects represent color, size, or shape -- they need not make use of all of the information that is available to them in all of their cognitive endeavors. But if subjects did in fact represent these properties, then tracking should improve in versions of the experiment in which targets have colors, shapes, or sizes that distinguish them from the distractors. Experiments show that this does not happen. What about representations of locations? The hypothesis that MOT requires such representations is initially plausible, but Pylyshyn thinks that it is very likely wrong. Unfortunately, it is not altogether clear why he rejects the hypothesis. Thus, his only truly decisive argument against it, stated on p. 37, does not rule out all versions of it. In particular, it has no force against the version that claims (i) that subjects track moving objects by deploying object files, (ii) that object files contain representations of locations, and (iii) that these representations are continuously and automatically updated.
In the end, I conjecture, Pylyshyn is opposed to the hypothesis because he thinks it is possible to explain MOT more simply and efficiently in terms of another hypothesis. Let us suppose that once a target has grabbed a FINST, it can continue to exercise control over the FINST as long as it moves continuously through space. According to this idea, facts about locations play an essential causal role in FINST-based tracking, but they need not be represented by the visual system. This hypothesis is simpler than the foregoing hypothesis about object files. Accordingly, if it can be made to work, it should be preferred on methodological grounds.

There are a number of problems with the picture I have been describing. I will mention three.

First, as Pylyshyn himself points out, there are experimental grounds for thinking that MOT does after all require representations of locations. Thus, subjects are able to keep track of targets over intervals during which they are behind occluders and are therefore invisible. There is good reason to think that this aspect of tracking makes use of representations of the locations where the objects disappear. In Pylyshyn's words, it "seems at least that when tracked targets disappear there is a record of where they were when they disappeared" (p. 40). How can he recognize this fact while denying that tracking requires representations of locations? The answer is that he thinks that normal tracking can proceed without such representations; it is only in the unusual condition of occlusion that they are required. That is to say, he claims that representations of locations are created only when objects disappear: "our assumption is that the disappearance itself causes locations to be conceptualized and stored in memory" (p. 80). This is a possible interpretation of the results, but Pylyshyn seems not to realize that a mechanism that detects disappearances and then creates representations of last known locations (perhaps by drawing on iconic memories of the locations) may not be simpler than a mechanism that creates representations of locations at the outset of a tracking venture and then automatically updates them. In other words, his interpretation of tracking across disappearances adds significantly to the complexity of his initial FINST-based hypothesis about tracking, and may thereby undercut the methodological argument for its correctness.

Second, like any other purely causal or informational theory of reference, Pylyshyn's theory is called into question by the fact that there are always multiple causal stories to be told about the activation of a representational element. The rabbit before me is a cause of the current deployment of one of my FINSTs, but so are the surface of the rabbit, the leporiform image that is on my retina, and the projections of that image onto various low level processing sites. Which of these causes is the referent? Pylyshyn is aware that he needs an answer to this question. Otherwise FINSTs will have a referential ambiguity that will render all predications involving them indeterminate, and that will be inherited by all of the higher level representational items that depend on FINSTs. But he has no answer.

There are several views on this question, which I will not discuss here. It is one of the "big questions" about how reference is naturalized and is beyond the scope of this monograph. (p. 97)

I applaud the courage and honesty of this passage, but I do not share Pylyshyn's underlying confidence concerning the possibility of explaining the reference of perceptual items purely in terms of causal control by external objects. Suppose you are looking at a rabbit. The rabbit is the referent of a conscious perceptual state. Why? What makes it the referent? It is surely right to say that reference here depends on an informational relation between you and the rabbit, for if the rabbit were not a cause of your experience, the experience would simply be a hallucination that happens to occur when a rabbit is present. On the face of it, however, it also seems relevant that the rabbit has certain of the properties, including especially the location, that your experience represents as jointly exemplified. To be sure, it need not be true that the rabbit has all of the properties that your experience attributes to it. But could it be true that it has none of those properties? Probably not. Experiential reference to individual objects seems to require representation of properties that is at least partially apt. But why should it be different at the level of Pylyshyn's subpersonal indexes? How could indexes be autonomously referential if reference at the level of experience requires support from a background chorus? How can indexes achieve what cannot be achieved elsewhere?

Third, it is difficult to reconcile Pylyshyn's account of MOT with our conscious experience of tracking. According to Pylyshyn, the main work in MOT is done by demonstrative devices that are subpersonal, pre-predicative, and pre-attentive. Indeed, MOT can be fully explained in terms of the ways these subpersonal devices are mechanically controlled by external objects. But there is nothing in our experience of tracking that answers to such devices. On the contrary, tracking seems to us to require perceptual experiences that are irreducibly predicative in character, in the sense that they represent objects of awareness as endowed with properties. This opposition between the subpersonal processes that Pylyshyn describes and the person-level activity that we engage in becomes especially vivid when we recall Pylyshyn's view that representations of locations play no role in tracking. With regard to the person-level activity of tracking, we are strongly inclined to think that if we had no experience of the locations of objects, we could not track them. We feel this way, I suggest, because it seems that it would be impossible to keep track of the individual members of a group of objects at the level of conscious experience unless we could distinguish among the objects, and because the only way of distinguishing among the objects in an MOT experiment is on the basis of location (differences in location being the only differences among the objects). Now in view of this difference between person-level tracking and Pylyshyn's subpersonal processes, it is hard to see how the former could be constituted or realized by the latter. But if the relationship is not constitutive, then the relationship between person-level tracking and Pylyshyn's processes is purely causal.
Unfortunately, since Pylyshyn claims that subpersonal processes can fully explain the success of tracking, it follows that if his theory is correct, then most aspects of person-level tracking are epiphenomenal, including our conscious experience of locations. No one could feel happy about this conclusion.

Chapter 4 is concerned with conscious experience and the form of representation that consciousness involves. It addresses epistemological questions concerning the evidential value of first person testimony about experience, focusing particularly on questions about the value of such testimony for scientific theories of perception, and it also addresses metaphysical questions about the nature of experience and its representational content.

In the early stages of his epistemological discussions, Pylyshyn points out how hard it is to answer various questions about experience from a first person perspective. "[D]o I experience the uniformity of the color and lightness of the wall which, as it happens, I know is in fact not uniformly illuminated? Is the uniformity of lightness and color constancy that I am describing an inference or a direct experiential content?" (p. 104). After citing a number of problems of this sort, Pylyshyn goes on to review various respects in which experience is known to be unreliable as a guide to underlying processes, citing, among other things, Libet's experimental demonstration that the experience of willing an action actually follows the early stages of the neural activity that produces the action, and the rubber hand illusion, in which it seems to a subject that he is moving a hand that in fact belongs to another person. (Because the subject sees the alien hand in a mirror, the hand seems to him to be in the place where he knows his own hand to be.) These observations are intended only to weaken our intuitive commitment to the idea that first person testimony about conscious experience should always be taken at face value. The next step is to deliver what Pylyshyn takes to be the coup de grace -- an updated version of his well known theory of mental imagery. It seems to us that mental imagery involves experiences that are qualitatively similar to those that are involved in perception, and, more particularly, that both imagery and perception make use of a proprietary form of representation -- a form that is naturally regarded as depictive or iconic in nature. These intuitions have received a lot of experimental support, principally from work by Roger Shepard and Stephen Kosslyn. 
Moreover, Kosslyn has developed the intuitions into a highly sophisticated theory that explains both behavioral and neuroscientific data in elegant ways.[3] As he has in a number of earlier works, Pylyshyn criticizes both the intuitions themselves and the theories to which they give rise. His alternative explanation of the data is complex, but the key idea is that when we are asked to imagine something, we use tacit conceptual knowledge of how the thing would look to us if we were actually seeing it. Subjects in imagination experiments "have certain beliefs about what things look like, how they change (e.g., how they move), and how events happen in space and time, and they can use these beliefs to mimic what would happen in a real situation" (p. 128). It follows that our perceptions about what happens when we imagine something are quite mistaken.

I should add that Pylyshyn does more than propose an alternative theory. He also rejects the widely held view that image-theories provide adequate explanations of the relevant data. (Thus, he would contest my foregoing claim concerning the explanatory success of Kosslyn's theory.) According to image-theorists, (i) there is a robust set of architectural constraints on how imagined scenarios evolve, (ii) these constraints give rise to certain autonomous laws of the imagination, and (iii) the autonomous laws provide deep explanations of various bodies of experimental data, such as the fact that the interval required to imagine two congruent objects moving into alignment is proportional to the size of the angle that originally separates the objects. Pylyshyn denies these claims. While allowing that there are some architectural constraints on endeavors involving the imagination, he thinks that most of the data in imagery experiments can be explained by appeal to freely adopted intentions to deploy the imagination in certain ways. Thus, for example, according to Pylyshyn, when we set out to imagine two congruent objects moving into alignment, we can proceed either by imagining a continuous process or by imagining one of the objects instantaneously "jumping" into alignment with the other. It is only because subjects in alignment experiments freely decide to imagine a continuous process that their response times are proportional to the size of angles separating the objects. (On Pylyshyn's view, subjects make this choice because of the task demands of the experiment -- in effect, they interpret the experiment as one requiring continuous processes.)

Turning finally to metaphysical issues, Pylyshyn argues that experience does not have a single proprietary form of representation, but rather involves a mixture of forms. What we experience is a mixture of sensory information, amplified by constancy transformations, and "high level cognitive recognitions (i.e., familiar people, places, things, and events)" (p. 145). One of the consequences of this is that "equating nonconceptual representation with the content of conscious experience is a mistake" (p. xii). Consciousness makes use of both conceptual and non-conceptual representational schemes.

Space limitations preclude an extended evaluation of these arresting views. I will say a few things about them, but I should acknowledge in advance that my remarks will not do full justice to the power and subtlety of Pylyshyn's position.

First, philosophers who are not committed to radical Cartesian views about the power and reliability of introspection will have no difficulty accommodating Pylyshyn's initial epistemological arguments. Yes, there are questions about experience that unsupplemented introspection cannot resolve, and yes, introspective testimony about certain domains is unreliable. But this is old news, and anyway the same things might be said of any specific method for investigating questions about the mind. Pylyshyn's observations have no tendency to imply that introspection is weak or generally untrustworthy.

Second, Pylyshyn's critique of imagery seems misguided to me. I have five reasons for this view. (a) Many of his remarks are directed against a strawman. Here is an example: "The writings on mental imagery typically begin with the assumption that because the experience of having a mental image is very much like the experience of seeing something, entertaining an image must also involve seeing something" (p. 125). This is unfair. Sophisticated defenders of imagery do not claim that imagination involves a quasi-seeing of mental pictures, but only that when one imagines something X, one puts oneself into a state that is intrinsically and representationally similar to the state one would be in if one were actually seeing X. (b) If we take certain of Pylyshyn's formulations of his views at face value, then his position is much more similar to that of the image theorist than he seems to allow. Thus, he often deploys the notion of simulation in explaining his own position. To imagine an event, he says, is to use tacit knowledge "to simulate what would happen if the event were actually witnessed" (p. 136). Now as I see it, when one simulates a perceptual experience (or a series of them), one puts oneself into a state that is intrinsically and representationally similar to the state (or states) one would be in if one were actually to undergo the experience (or series of experiences). Hence, to say that imagining involves simulating is to say pretty much the same thing as a sophisticated image-theorist would say. (c) If Pylyshyn and the image-theorist are in agreement on the nature of the representations that the imagination deploys, how exactly do they differ? One difference is that Pylyshyn believes, while the image-theorist denies, that propositional knowledge plays the most important role in initiating simulations and guiding them. There is something to be said for Pylyshyn's view.
It appears, however, that there are grounds for thinking that the role of propositional knowledge is much more limited than he maintains. To see this, observe first that (as Pylyshyn allows) we are often unable to do justice to the objects and situations that we perceive in words or concepts -- the perceived items are too complex, or too irregular, or too determinate. Second, observe that the same is true of imagined objects and imagined situations. Thus, for example, many of the figures in Shepard's alignment/rotation experiments are too complex to be easily described -- full descriptions would require mathematical sophistication that many of the subjects in the experiments do not possess. But if the objects cannot be described, how can it be true that manipulation of them is achieved by applying propositional knowledge? (d) The other major difference separating Pylyshyn from the image-theorist is that the former maintains, while the latter denies, that the exercise of the imagination is not governed by autonomous laws, but is rather entirely free. (In other words, Pylyshyn maintains that instead of being governed by proprietary laws, the imagination is largely subject to top-down control. Image-theorists deny this.) There is no doubt that Pylyshyn is right to invoke top-down control, but I think it is a serious exaggeration to say that the data in imagery experiments are due to scenarios that are freely adopted by subjects. If I want to know that a figure I cannot fully describe can be brought into alignment with another figure, then, it seems, I must imagine a continuous process. An imagined "jump" would provide no guarantee that the original objects can be fully aligned -- perhaps the shape of one of the objects would change in the course of the jump. (e) There are experimental data that can easily be explained by the image-theorist but that Pylyshyn can explain only with difficulty -- if at all.
Thus, if we assume the image-theory, there is a natural neuroscientific explanation of why it is harder to imagine certain operations on oblique lines than on horizontal or vertical lines. (There are more cells in V1 that are dedicated to detecting horizontal and vertical lines than are dedicated to detecting oblique lines. This explains why it is harder to resolve oblique lines perceptually, and by the same token, if imagining is fundamentally perceptual in character, it explains why it is harder to resolve oblique lines in the imagination.) Pylyshyn's theory has trouble explaining the relevant data because there is no reason to think that we have propositional knowledge of the comparative difficulty of performing perceptual operations on oblique lines. (I should note, however, that he faces this problem and attempts to deal with it.)

The fifth and final chapter is concerned with the representation of space, and more particularly, with how space is represented in activities like planning and anticipating outcomes of events -- activities that it is natural to describe as involving imagined spatial layouts. The chapter includes a critique of the view that the ability to imagine spatial relations derives from three dimensional maps in the brain, with representation taking the form of isomorphisms from the internal maps to the environment, and also a critique of the related view that awareness of spatial relations derives from functional equivalents of maps, with, for example, distance in functional space being determined by the number of processing steps that must intervene between the accessing of data relevant to point P1 in external space and the accessing of data that are pertinent to another point P2. Pylyshyn's discussion of these matters is insightful, though of course he cannot gainsay the fact that there are topographical maps of the visual field in the brain. (It is true, as Pylyshyn says, that these maps are two dimensional, and by the same token, it is true that they cannot by themselves provide an adequate model for physical space (p. 171). But it would be possible to achieve 3-D models, or partial models, by combining them with the various sorts of information about depth that are available to the visual system.[4])

What particularly exercises Pylyshyn is the question of how imagined objects can be assigned locations in a spatial frame of reference. Since he doesn't believe that the mind or brain has an internal frame of reference (in the way that it might, say, if there were 3-D cortical maps of external space), he has to say that we somehow make use of physical space in assigning locations. Here is a summary of his proposal as to how this occurs:

In imagining a spatial layout, we use visual indexes (FINSTs) to pick out concurrently perceived objects that are roughly in the same spatial locations as objects in the scene we are imagining. Each indexed object is associated with a unique label of a recalled or imagined object … . The spatial properties that concern the mental objects … result from the actual perception of the spatial relations among these indexed objects. (pp. 178-9)

This hypothesis, which Pylyshyn calls the index projection hypothesis, is developed in some detail, and experimental support is adduced for it. It appears to warrant further attention. Even so, I never quite got beyond my initial skepticism about the hypothesis. In effect, Pylyshyn is claiming that spatial reasoning depends essentially on concurrent perceptual input. Initially, at least, this view seems very implausible. In undertaking to defend it, Pylyshyn assumes a rather large burden of proof.[5]

Although I have raised a number of objections, my overall feeling is one of gratitude to Pylyshyn for writing the book. It is a richly illuminating work and also an exciting work. It is highly recommended for philosophers who are interested in perception, mental representation, the imagination, and visual space.[6]

[1] Besides myself, the group consisted of Jacob Beck, David Bennett, Justin Broackes, Alex Byrne, Katherine Dunlop, Heather Logue, and Susanna Siegel. My debts to these people are quite extensive: they saved me from a number of errors, and they also provided a number of illuminating suggestions about the nature and possible limitations of Pylyshyn's arguments.

[2] D. Kahneman, A. Treisman, and B. J. Gibbs, "The Reviewing of Object Files: Object-specific Integration of Information," Cognitive Psychology 24 (1992), 175-219.

[3] For an extended defense of an alternative to Pylyshyn's theory of the imagination, see Stephen M. Kosslyn, William L. Thompson, and Giorgio Ganis, The Case for Mental Imagery (Oxford: Oxford University Press, 2006).

[4] For a discussion of depth information, see Stephen Palmer, Vision Science (Cambridge, MA: MIT Press, 1999), Chapter 5.

[5] Here I am particularly indebted to Bennett.

[6] I am grateful to Zenon Pylyshyn for commenting on an earlier version of this review. I have made a number of changes as a result of his advice. I should say, however, that the review still contains a number of claims to which he objects. The most important of these are my claims about the nature of the image-theory. Thus, for example, he thinks the "sophisticated image-theorist," as I portray her above, is a fiction. As he sees it, image theorists are as a rule committed to saying that we "see" or "scan" images. Perhaps it will be possible to engage Pylyshyn's position more fully on another occasion.

I am also grateful to Stephen Kosslyn, who commented on the penultimate draft, and to John Campbell, for support and encouragement.