various traits, even though he had no way to know anything about the molecular and genetic mechanisms that made them true. In the case of vision, I think the best example is one we’ve already considered, in which Thomas Young predicted the existence of three kinds of color receptors in the eye based on playing around with colored lights.
When studying perception and discovering the underlying laws, sooner or later one wants to know how these laws actually arise from the activity of neurons. The only way to find out is by opening the black box—that is, by directly experimenting on the brain. Traditionally there are three ways to approach this: neurology (studying patients with brain lesions), neurophysiology (monitoring the activity of neural circuits or even of single cells), and brain imaging. Specialists in each of these areas are mutually contemptuous and have tended to see their own methodology as the most important window on brain functioning, but in recent decades there has been a growing realization that a combined attack on the problem is needed. Even philosophers have now joined the fray. Some of them, like Pat Churchland and Daniel Dennett, have a broad vision, which can be a valuable antidote to the narrow cul-de-sacs of specialization that the majority of neuroscientists find themselves trapped in.
IN PRIMATES, INCLUDING humans, a large chunk of the brain—comprising the occipital lobes and parts of the temporal and parietal lobes—is devoted to vision. Each of the thirty or so visual areas within this chunk contains either a complete or partial map of the visual world. Anyone who thinks vision is simple should look at one of David Van Essen’s anatomical diagrams depicting the structure of the visual pathways in monkeys (Figure 2.6), bearing in mind that they are likely to be even more complex in humans.
Notice especially that there are at least as many fibers (actually many more!) coming back from each stage of processing to an earlier stage as there are fibers going forward from each area into the next area higher up in the hierarchy. The classical notion of vision as a stage-by-stage sequential analysis of the image, with increasing sophistication as you go along, is demolished by the existence of so much feedback. What these back projections are doing is anybody’s guess, but my hunch is that at each stage in processing, whenever the brain achieves a partial solution to a perceptual “problem”—such as determining an object’s identity, location, or movement—this partial solution is immediately fed back to earlier stages. Repeated cycles of such an iterative process help eliminate dead ends and false solutions when you look at “noisy” visual images such as camouflaged objects (like the scene “hidden” in Figure 2.7).3 In other words, these back projections allow you to play a sort of “twenty questions” game with the image, enabling you to rapidly home in on the correct answer. It’s as if each of us is hallucinating all the time, and what we call perception involves merely selecting the one hallucination that best matches the current input. This is an overstatement, of course, but it has a large grain of truth. (And, as we shall see later, it may help explain aspects of our appreciation of art.)
FIGURE 2.6 David Van Essen’s diagram depicting the extraordinary complexity of the connections between the visual areas in primates, with multiple feedback loops at every stage in the hierarchy. The “black box” has been opened, and it turns out to contain…a whole labyrinth of smaller black boxes! Oh well, no deity ever promised us it would be easy to figure ourselves out.
FIGURE 2.7 What do you see? It looks like random splatterings of black ink at first, but when you look long enough you can see the hidden scene.
The exact manner in which object recognition is achieved is still quite mysterious. How do the neurons firing away when you look at an object recognize it as a face rather than, say, a chair? What are the defining attributes of a chair? In modern designer furniture shops a big blob of plastic with a dimple in the middle is recognized as a chair. It would appear that what is critical is its function (something that permits sitting) rather than whether it has four legs or a back rest. Somehow the nervous system treats the act of sitting as synonymous with the perception of a chair. If it is a face, how do you recognize the person instantly even though you have encountered millions of faces over a lifetime and stored away the corresponding representations in your memory banks?
Certain features or signatures of an object can serve as a shortcut to recognizing it. In Figure 2.8a, for example, there is a circle with a squiggle in the middle, but you see a pig’s rump. Similarly, in Figure 2.8b you have four blobs on either side of a pair of straight vertical lines, but as soon as I add some features such as claws, you might see it as a bear climbing a tree. These images suggest that certain very simple features can serve as diagnostic labels for more complex objects, but they don’t answer the even more basic question of how the features themselves are extracted and recognized. How is a squiggle recognized as a squiggle? And surely the squiggle in Figure 2.8a can only be a tail given the overall context of being inside a circle. No rump is seen if the squiggle falls outside the circle. This raises the central problem in object recognition: how does the visual system determine the relationships between features to identify the object? We still have precious little understanding of how it does this.
FIGURE 2.8 (a) A pig rump.
(b) A bear.
The problem is even more acute for faces. Figure 2.9a is a cartoon face. The mere presence of horizontal and vertical dashes can substitute for nose, eyes, and mouth, but only if the relationship between them is correct. The face in Figure 2.9b has exactly the same features as the one in Figure 2.9a, but they’re scrambled. No face is seen—unless you happen to be Picasso. Their correct arrangement is crucial.
But surely there is more to it. As Steven Kosslyn of Harvard University has pointed out, the relationship between features (such as nose, eyes, and mouth in the right relative positions) tells you only that it’s a face and not, say, a pig or a donkey; it doesn’t tell you whose face it is. For recognizing individual faces you have to switch to measuring the relative sizes and distances between features. It’s as if your brain has created a generic template of the human face by averaging together the thousands of faces it has encountered. Then, when you encounter a novel face, you compare the new face with the template—that is, your neurons mathematically subtract the average face from the new one. The pattern of deviation from the average face becomes your specific template for the new face. For example, compared to the average face, Richard Nixon’s face would have a bulbous nose and shaggy eyebrows. In fact, you can deliberately exaggerate these deviations and produce a caricature—a face that can be said to look more like Nixon than the original. Again, we will see later how this is relevant to some types of art.
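The averaging, subtracting, and exaggerating described above can be made concrete with a toy model. This is only an illustrative sketch, not a claim about what neurons actually compute. It assumes that a face can be reduced to a few numeric measurements. The feature names (`nose_width`, `eyebrow_thickness`, `eye_spacing`), their values, and the `gain` parameter are all hypothetical and were invented for the example. The generic template is a simple feature-by-feature mean. A face's specific template is its deviation from that mean. A caricature scales the deviation by a gain greater than one.

```python
from typing import Dict, List

# Hypothetical measurements; the text names no specific features or units.
FEATURES = ["nose_width", "eyebrow_thickness", "eye_spacing"]

Face = Dict[str, float]


def average_face(faces: List[Face]) -> Face:
    """Generic template: the feature-by-feature mean of all faces seen so far."""
    n = len(faces)
    return {f: sum(face[f] for face in faces) / n for f in FEATURES}


def deviation(face: Face, template: Face) -> Face:
    """Subtract the average face from a new face, feature by feature."""
    return {f: face[f] - template[f] for f in FEATURES}


def caricature(face: Face, template: Face, gain: float = 1.5) -> Face:
    """Exaggerate the deviations: gain > 1 pushes the face further from average,
    gain == 1 reproduces the original face."""
    dev = deviation(face, template)
    return {f: template[f] + gain * dev[f] for f in FEATURES}


# Toy numbers, made up purely for illustration.
seen = [
    {"nose_width": 3.0, "eyebrow_thickness": 1.0, "eye_spacing": 6.0},
    {"nose_width": 3.5, "eyebrow_thickness": 0.5, "eye_spacing": 6.5},
]
template = average_face(seen)

# A toy face with a wider nose and thicker eyebrows than average,
# in the spirit of the Nixon example (not real measurements).
new_face = {"nose_width": 4.25, "eyebrow_thickness": 1.25, "eye_spacing": 6.25}

print(deviation(new_face, template))
# {'nose_width': 1.0, 'eyebrow_thickness': 0.5, 'eye_spacing': 0.0}
print(caricature(new_face, template))
# {'nose_width': 4.75, 'eyebrow_thickness': 1.5, 'eye_spacing': 6.25}
```

In this toy version, the features that already matched the average (here, eye spacing) are left untouched by the caricature. Only the distinctive deviations get amplified, which mirrors the intuition in the text that a caricature can look "more like" the person than the original does.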
FIGURE 2.9 (a) A cartoon face.
(b) A scrambled face.
We have to bear in mind, though, that words such as “exaggeration,” “template,” and “relationships” can lull us