Learning Objectives
By the end of this section, you should be able to
- 6.5.1 Describe the broad functional role of the dorsal and ventral visual processing streams in the cortex
- 6.5.2 Describe the stimulus-selectivity of inferotemporal neurons and how they were discovered
Visual processing continues in several cortical regions adjacent to V1, collectively referred to as extrastriate cortex (“extra” means outside of V1). Most of the extrastriate areas have a retinotopic map like V1, but the individual receptive fields are larger and many of them cross the visual midline. Because we detect and recognize objects placed anywhere in the visual field, especially in the fovea where we see fine detail, wider receptive fields that cross the midline are needed to support conscious visual perception. V1 is just the beginning of cortical visual processing; the extrastriate cortical areas are where additional stages of visual processing occur.
Extrastriate Visual Areas Overview
The extrastriate cortex divides into two major pathways: a dorsal pathway concerned with stimulus position and movement, and a ventral pathway leading to the perception of objects, faces, bodies, and scenes. The two pathways were first identified through experiments with monkeys trained on tasks that required distinguishing the location of an object (“where”) or the appearance of an object (“what”). Lesions in the dorsal or ventral cortex selectively interfered with one task or the other. Later studies that recorded from individual neurons confirmed the two pathways. Figure 6.32 shows many of the extrastriate visual areas that have been discovered. The relative size of each cortical area is represented by the size of its labeled box, and the number of axons connecting two areas is represented by the thickness of the line between them. These connections run in both directions: feedforward axons project from a lower to a higher area, and feedback axons project from higher areas to lower ones. For example, V1 projects strongly to V2 and V4, but those areas also project back to V1. In fact, except for the one-way optic nerve pathway from the retina to the LGN, the connections between visual areas are both ascending and descending, and the descending (feedback) connections have been shown to modify the activity of neurons in lower areas.
That path, from V1 to V2 to V4, is the main pathway for conscious visual perception. It leads to the inferotemporal (IT) cortex of the temporal lobe, which is divided into posterior (PIT), central (CIT), and anterior (AIT) regions. These IT areas contain patches of neurons that respond selectively to shapes, faces, colors, and places. These components of the ventral stream are discussed in the following sections.
Dorsal Stream
The processing areas of the dorsal stream have received less attention than those of the ventral stream, but one that has been studied in detail is area MT (“middle temporal”). MT is a motion-selective area organized in columns, where all the neurons in a column are selective for the same direction of movement. Experiments at Stanford University used monkeys trained to report the direction of subtle movement of random dots amidst stationary dots. When a column in MT was electrically stimulated, the monkeys could be led to report movements that were otherwise below threshold (Salzman et al., 1990). The stimulation caused the neurons in that column to fire, which led the monkey to perceive movement that did not actually exist. The experiment was a direct demonstration that activity in an extrastriate area can lead to a visual perception.
Ventral Stream
The ventral “what” pathway for conscious visual perception leads from V4 to the lower (“inferior”) surface of the temporal lobe, the inferotemporal cortex (IT). Receptive fields of neurons in IT differ markedly from those in the earlier visual areas. IT neurons are not organized in a retinotopic map, but instead are grouped by the similarity of their most effective stimuli, such as the shapes of objects or the components of faces. IT receptive fields are very large (from 25 to 70 degrees of visual angle, a substantial span of the visual field’s 180 degrees), they always include the fovea, and they span the midline to include portions of both the left and right visual fields. Effective stimuli for an IT neuron can include objects, faces, or hands, independent of the size or exact position of the stimulus.
The selectivity of IT neurons represents a major step toward conscious visual perception. Neurons in the first stage of cortical processing, the simple cells of V1, have small receptive fields (about 1 degree) that require exact positioning of an oriented edge. They detect small components of an object’s outline, with thousands of V1 neurons responding to even a small object. Shifting the stimulus’s position or changing its size in the retinal image will activate a completely different population of V1 neurons. In contrast, IT neurons are flexible in their response to the position and size of the retinal image, a necessary step in perceiving objects.
Extensive experiments to record from IT neurons began in the 1980s. Among the earliest findings was that neurons in posterior IT areas, the ones closest to extrastriate cortex, had optimal stimuli that were more elaborate than those of V4 neurons. Actual objects such as toy animals were effective, although a tiger’s face, for example, could be reduced to a simpler image of a rectangle with attached circles. More anterior regions of the IT cortex had more complex receptive fields, with patches of neurons that responded selectively to faces, colors, and scenes.
If a visual area like IT has neurons with selective responses to complicated images, a methodological problem arises. How do the experimenters know if they have found the optimal stimulus? It is feasible to show a test array of hundreds of photos of different images, flashing each photo briefly while detecting if an IT neuron fires action potentials, but the most effective stimulus may not be in the test set. The best that can be accomplished is to identify a general category that the neurons respond to, and then narrow down the important details in the stimulus. This strategy has been employed to characterize neurons that respond to objects and faces.
Another approach, especially for intermediate extrastriate areas like V4, has been to use computer-generated patterns that are systematically and randomly varied. Variations that lead to more vigorous firing are retained, while less effective variations are discarded. Repeating the cycle and continuing to choose more effective variations eventually generates apparently optimal stimuli, but they are often complicated and do not resemble simple geometric shapes that are easily described in words. This remains a puzzling aspect of visual processing.
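The vary, keep, and repeat cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stimulus is reduced to a short list of numbers, and the hypothetical function toy_neuron stands in for a recorded cell by firing more strongly as the stimulus approaches a hidden preferred pattern. Real experiments use rendered images and measured firing rates, and their procedures differ in detail.

```python
import random

# Hypothetical hidden "preferred" pattern of the simulated neuron (illustrative values).
PREFERRED = [0.5, -0.3, 0.8, 0.0, -0.6, 0.2, 0.9, -0.9]


def toy_neuron(stimulus, preferred=PREFERRED):
    """Stand-in for a recorded neuron: the simulated firing rate rises
    as the stimulus parameters approach the hidden preferred pattern."""
    dist2 = sum((s - p) ** 2 for s, p in zip(stimulus, preferred))
    return 100.0 / (1.0 + dist2)


def evolve_stimulus(respond, n_params=8, pop_size=20, n_keep=5,
                    n_generations=40, noise=0.2, seed=0):
    """Repeat the cycle: test a set of stimuli, keep the most effective,
    and create random variations of the ones that were kept."""
    rng = random.Random(seed)
    # Start from randomly generated stimulus parameters.
    population = [[rng.uniform(-1, 1) for _ in range(n_params)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(n_generations):
        # Rank stimuli by how vigorously the (simulated) neuron responds.
        ranked = sorted(population, key=respond, reverse=True)
        survivors = ranked[:n_keep]          # more effective variations are retained
        best_per_generation.append(respond(survivors[0]))
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)   # less effective ones are discarded
            children.append([p + rng.gauss(0, noise) for p in parent])
        population = survivors + children
    best = max(population, key=respond)
    return best, best_per_generation


best, history = evolve_stimulus(toy_neuron)
print("best response in first generation:", round(history[0], 1))
print("best response in last generation: ", round(history[-1], 1))
```

Because the best stimuli of each generation are carried forward unchanged, the best response in this sketch can never decrease from one cycle to the next. The final parameter list is simply whatever the search converged on, which mirrors the observation that evolved stimuli need not resemble shapes that are easy to describe.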
Face-Selective Units
In the 1980s, Charles Gross and his colleagues at Princeton University made exploratory recordings in monkey IT and discovered neurons that responded vigorously to images of faces. The faces could be of monkeys, humans, or even cartoons, but if the eyes were omitted or the image was scrambled, the response was reduced or eliminated (Figure 6.33). Faces are an important element of communication for monkeys and humans. Faces support the identification of familiar individuals, and they also reveal the emotional expressions of strangers. In recording from randomly encountered neurons in electrode penetrations of IT, the Princeton group found neurons that were selective for a variety of images, not just faces.
People Behind the Science: Dr. Doris Tsao and Current Approaches to Studying Face-Selective Units
Dr. Doris Tsao and her collaborators have introduced a more targeted approach to investigating face-selective units in primates. Dr. Tsao began her work on face patch neurons as a graduate student and postdoctoral scientist at Harvard Medical School, collaborating with Winrich Freiwald and Professor Margaret Livingstone. Tsao continued this work as a professor at Caltech, where she received a MacArthur “genius” award, and continues it now at the University of California, Berkeley. Dr. Tsao and her collaborators used fMRI to locate six face patches in IT cortex that contain neurons responsive to faces. They then recorded from individual neurons in each patch. Figure 6.34 (right side) shows the responses of 182 neurons in a face patch (vertical axis) to 96 standard images (horizontal axis) of faces, bodies, fruits, gadgets, hands, and scrambled photos. The images were flashed on a screen in random order. Each horizontal row represents the responses of one of the 182 recorded neurons, with red indicating a strong response. For analysis, the responses were rearranged so that all the responses to a given image fall in one vertical column and the images are grouped into the six categories. For almost all the neurons in a face patch, it was clear that only the 16 images of faces consistently elicited strong responses, with little or no response to the other images.
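The bookkeeping behind a figure of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: the responses are synthetic numbers generated by the hypothetical function simulate_session, not recorded data, and the selectivity index computed at the end is one common way to summarize a category preference, shown here as an assumption and not as the analysis used by the researchers.

```python
import random

CATEGORIES = ["faces", "bodies", "fruits", "gadgets", "hands", "scrambled"]
IMAGES_PER_CATEGORY = 16   # 6 categories x 16 images = 96 images
N_NEURONS = 182


def simulate_session(seed=0):
    """Return (order, responses). responses[n][k] is the SYNTHETIC response
    of neuron n to the k-th image shown; order[k] is that image's id.
    Image ids 0-15 are faces, 16-31 bodies, and so on."""
    rng = random.Random(seed)
    order = list(range(len(CATEGORIES) * IMAGES_PER_CATEGORY))
    rng.shuffle(order)  # images are flashed in random order
    responses = []
    for _ in range(N_NEURONS):
        row = []
        for image_id in order:
            is_face = image_id // IMAGES_PER_CATEGORY == 0
            mean_rate = 40.0 if is_face else 5.0  # assumed, illustrative rates
            row.append(max(0.0, rng.gauss(mean_rate, 3.0)))
        responses.append(row)
    return order, responses


def sort_by_image(order, row):
    """Rearrange one neuron's responses from presentation order to image-id
    order, so that images of the same category sit next to each other."""
    sorted_row = [0.0] * len(row)
    for position, image_id in enumerate(order):
        sorted_row[image_id] = row[position]
    return sorted_row


def category_means(sorted_row):
    """Average one neuron's responses within each of the six categories."""
    means = {}
    for i, name in enumerate(CATEGORIES):
        block = sorted_row[i * IMAGES_PER_CATEGORY:(i + 1) * IMAGES_PER_CATEGORY]
        means[name] = sum(block) / len(block)
    return means


def face_selectivity_index(means):
    """(face - non-face) / (face + non-face); values near 1 indicate a
    strong preference for faces. Assumed summary measure for this sketch."""
    face = means["faces"]
    others = [v for k, v in means.items() if k != "faces"]
    non_face = sum(others) / len(others)
    return (face - non_face) / (face + non_face)


order, responses = simulate_session()
first_neuron = category_means(sort_by_image(order, responses[0]))
print({name: round(rate, 1) for name, rate in first_neuron.items()})
print("face selectivity index:", round(face_selectivity_index(first_neuron), 2))
```

Sorting each row by image identity is what turns a randomly ordered recording session into a display in which the face images form a single contiguous band.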
Other experiments used cartoon faces as stimuli, permitting systematic variation of features such as the proportions of the face, the size of the nose, the diameter of the eyes, and the height of the hair. Different neurons were tuned to extremes of some features but not affected by variations in other features. Later research found two additional face patches where the responses were affected by whether the face was familiar to the monkey being tested. One clear conclusion was that a given face would elicit responses from a large subset of neurons in multiple face patches, with different faces activating different subsets.
In online videos, Dr. Tsao explains characteristics of the first face-selective neurons she studied as well as her later work at Caltech.
Face-selective regions have also been identified by fMRI in the human brain, including a region called the fusiform face area (FFA). Research on IT face patches continues, with a goal of revealing how advanced visual processing occurs in sequential stages for a well-defined image category. These studies may help explain how a scene that activates millions of neurons in multiple cortical areas leads to our conscious perception of the components of the visual world around us.