Elizabeth D. Kirby; Melissa J. Glenn; Noah J. Sandstrom; Christina L. Williams

6.5 Extrastriate Cortex

Learning Objectives

By the end of this section, you should be able to

6.5.1 Describe the broad functional role of the dorsal and ventral visual processing streams in the cortex
6.5.2 Describe the stimulus-selectivity of inferotemporal neurons and how they were discovered

Visual processing continues in several cortical regions adjacent to V1 that are collectively referred to as extrastriate cortex (“extra” means outside of V1). Most of the extrastriate areas have a retinotopic map like V1, but the individual receptive fields are larger and many fields cross the visual midline. Because we detect and recognize objects placed anywhere in the visual field, especially in the fovea where we see fine detail, wider receptive fields that cross the midline are needed to support conscious visual perception. V1 is just the beginning of cortical visual processing, and the extrastriate cortical areas are where additional stages of visual processing occur.

Extrastriate Visual Areas Overview

Beyond V1, the extrastriate cortex divides into two major pathways: a dorsal pathway concerned with stimulus position and movement, and a ventral pathway leading to the perception of objects, faces, bodies, and scenes. The two pathways were first identified through experiments with monkeys trained on tasks that required distinguishing the location of an object (“where”) or the appearance of an object (“what”). Lesions in the dorsal and ventral cortex selectively interfered with one task or the other. Later studies recording from neurons confirmed the two pathways. Figure 6.33 shows many of the extrastriate visual areas that have been discovered. The relative size of each cortical area is represented by the size of its labeled box, and the number of axons connecting two areas is represented by the thickness of the line between them. These connections go in both directions: feedforward axons project from a lower to a higher area, and feedback axons project from higher areas to lower ones. In fact, except for the one-way pathway from the retina to the LGN, connections between visual areas are both ascending and descending, and the descending (feedback) connections have been shown to modify the activity of neurons in lower areas. For example, V1 projects strongly to V2 and V4, but those areas also project back to V1.

Left is a diagram of a non-human primate brain. V1 is labeled at the most posterior end of the cortex. Anterior to V1 is V2. V4 is anterior to V2, but does not extend all the way to the dorsal surface of the brain. IT is anterior to V4 and is contained more ventrally, within the temporal lobe. Arrows represent information flow of the dorsal stream (V1 to V2 to dorsal side of brain) and ventral stream (V1 to V2 to V4 to IT). Right is a circuit diagram where boxes of different sizes represent brain regions with lines connecting them. The dorsal (where) pathway for position and movement is on top and the ventral (what) pathway for objects and faces is on bottom. The largest areas are V1 and V2, which are part of both pathways. From there, there are many smaller areas, all heavily interconnected with each other and even between the two pathways.

Figure 6.33 Anatomy and functional connections of extrastriate areas Image credit: Wiring diagram reprinted from Neuron, Vol. 60, Wallisch & Movshon, Structure and Function Come Unglued in the Visual Cortex, Pages No. 195-197, Copyright (2008), with permission from Elsevier. https://www.cell.com/neuron/pdf/S0896-6273(08)00851-9.pdf

The path from V1 through V2 to V4 is the main pathway for conscious visual perception. It leads to the inferotemporal (IT) cortex in the temporal lobe, specifically its posterior (PIT), central (CIT), and anterior (AIT) divisions. These IT areas contain patches of neurons that respond selectively to shapes, faces, colors, and places. These components of the ventral stream are discussed in the following sections.

Dorsal Stream

The processing areas of the dorsal stream have received less attention than those of the ventral stream, but one area that has been studied in detail is area MT (“middle temporal”). The MT cortex is a motion-selective area organized in columns, where all the neurons in a column are selective for a single direction of movement. Experiments at Stanford University used monkeys trained to report the direction of subtle motion in displays of random dots moving amidst stationary dots. When a column in MT was electrically stimulated, the monkeys could be led to see sub-threshold movements (Salzman et al., 1990). The stimulation caused the neurons in that column to fire, which led the monkey to perceive movement that did not actually exist. The experiment was a direct demonstration that activity in an extrastriate area can lead to a visual perception.

Ventral Stream

The ventral “what” pathway for conscious visual perception leads from V4 to the lower (“inferior”) surface of the temporal lobe, the inferotemporal cortex (IT). Receptive fields of neurons in IT are significantly different from those in the earlier visual areas. They are not organized in a retinotopic map, but instead are grouped by the similarity of their most effective stimuli, such as the shapes of objects or the components of faces. IT receptive fields are very large (from 25 to 70 degrees of visual angle, a substantial span of the visual field’s 180 degrees), they always include the fovea, and they span the midline to include portions of both the left and right visual fields. Effective stimuli for an IT neuron can include objects, faces, or hands, independent of the size or exact position of the stimulus.

The selectivity of IT neurons represents a major step toward conscious visual perception. Neurons in the first stage of cortical processing, the simple cells of V1, have small receptive fields (about 1 degree) that require exact positioning of an oriented edge. They will detect small components of an object’s outline, with thousands of V1 neurons responding to even a small object. Shifting the stimulus’s position or size in the retinal image will activate a completely different population of V1 neurons. In contrast, IT neurons are flexible in their response to the position and size of the retinal image, a necessary step in perceiving objects.

Extensive experiments to record from IT neurons began in the 1980s. Among the earliest findings was that posterior IT areas, the ones closest to V4, had optimal stimuli that were more elaborate than those of V4 neurons. Actual objects such as toy animals were effective, although a tiger’s face, for example, could be reduced to a simpler image of a rectangle with attached circles. More anterior regions of the IT cortex had more complex receptive fields, with patches of neurons that responded selectively to faces, colors, and scenes.

If a visual area like IT has neurons with selective responses to complicated images, a methodological problem arises. How do the experimenters know if they have found the optimal stimulus? It is feasible to show a test array of hundreds of photos of different images, flashing each photo briefly while detecting if an IT neuron fires action potentials, but the most effective stimulus may not be in the test set. The best that can be accomplished is to identify a general category that the neurons respond to, and then narrow down the important details in the stimulus. This strategy has been employed to characterize neurons that respond to objects and faces.

Another approach, especially for intermediate extrastriate areas like V4, has been to use computer-generated patterns that are systematically and randomly varied. Variations that lead to more vigorous firing are retained, while less effective variations are discarded. Repeating the cycle and continuing to choose more effective variations eventually generates apparently optimal stimuli, but they are often complicated and do not resemble simple geometric shapes that are easily described in words. This remains a puzzling aspect of visual processing.
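The retain-and-vary cycle described above can be illustrated with a short simulation. The following Python sketch is a minimal illustration under stated assumptions, not the procedure used in any particular study: the model neuron (`simulated_firing_rate`), its hidden preferred stimulus, the four-number description of a pattern, and all parameter values are hypothetical. Each pattern is a list of numbers, the most effective patterns in each round are kept, and randomly perturbed copies of them fill the next round.

```python
import math
import random


def simulated_firing_rate(stimulus, preferred):
    """Hypothetical model neuron: fires most when the stimulus parameters
    match a hidden preferred pattern (peak of 50 spikes/s is arbitrary)."""
    dist_sq = sum((s - p) ** 2 for s, p in zip(stimulus, preferred))
    return 50.0 * math.exp(-dist_sq)


def evolve_stimulus(respond, n_params=4, pop_size=20, n_keep=5,
                    n_generations=60, mutation_sd=0.1, seed=0):
    """Search for an effective stimulus by keeping the best patterns in each
    round and adding randomly varied copies of them to the next round."""
    rng = random.Random(seed)
    # Start from random patterns; each pattern is a list of parameters.
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []  # best response seen in each round
    for _ in range(n_generations):
        # "Record" the response to every pattern and rank the patterns.
        ranked = sorted(population, key=respond, reverse=True)
        history.append(respond(ranked[0]))
        survivors = ranked[:n_keep]  # retain the most effective variations
        # Discard the rest and replace them with perturbed survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            children.append([p + rng.gauss(0.0, mutation_sd) for p in parent])
        population = survivors + children
    best = max(population, key=respond)
    history.append(respond(best))
    return best, history


# Usage: the search only sees firing rates, never the hidden preference.
hidden_preference = [0.2, 0.8, 0.5, 0.3]  # assumed for this illustration
best, history = evolve_stimulus(
    lambda s: simulated_firing_rate(s, hidden_preference))
print("first-round best rate:", round(history[0], 1))
print("final best rate:", round(history[-1], 1))
```

Because the survivors are carried over unchanged, the best response never decreases from one round to the next. A real neuron responds variably from trial to trial, so an actual experiment would need to average repeated presentations before ranking the patterns.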

Face-Selective Units

In the 1980s, exploratory recordings made by Charles Gross and his colleagues at Princeton University revealed neurons in monkey IT that responded vigorously to images of faces. The faces could be of monkeys, humans, or even cartoons, but if the eyes were omitted or the image was scrambled, the response was reduced or eliminated (Figure 6.34). Faces are an important element of communication for monkeys and humans. Faces support identification of familiar individuals, and they also reveal the emotional expressions of strangers. In recording from randomly encountered neurons in electrode penetrations of IT, the Princeton group found neurons that were selective for a variety of images, not just faces.

Drawings of electrophysiological recordings of action potentials in inferotemporal cortex of monkeys with different stimuli. Stimuli that look like human or monkey faces elicit firing. Stimuli like a hand or chaotic lines do not elicit firing.

Figure 6.34 Face-selective units Recordings in inferotemporal cortex of monkeys show neurons that respond selectively to faces. Image credit: Journal of Neurophysiology, Vol. 46, Issue 2. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. C Bruce, R Desimone, C G Gross. (1981). With permission from The American Psychological Association.

People Behind the Science: Dr. Doris Tsao and Current Approaches to Studying Face-Selective Units

Dr. Doris Tsao and her collaborators have introduced a more targeted approach to investigating face-selective units in primates. Dr. Tsao began her work on face patch neurons as a graduate student and postdoctoral scientist at Harvard Medical School, collaborating with Winrich Freiwald and Professor Margaret Livingstone. Tsao continued this work as a professor at Caltech, where she received a MacArthur “genius” award, and continues it now at the University of California at Berkeley. Dr. Tsao and her collaborators used fMRI to locate six face patches in IT cortex that contain neurons responsive to faces. They then recorded from individual neurons in each patch. Figure 6.35 (right side) shows the responses of 182 neurons in a face patch (vertical axis) to 96 standard images (horizontal axis) of faces, bodies, fruits, gadgets, hands, and scrambled photos. The images were flashed on a screen in random order. Each horizontal row represents the responses from one of the 182 recorded neurons, with red indicating a strong response. For analysis, the responses were lined up to place all the responses to one stimulus image in a vertical line, and to group the stimuli into six categories. For almost all the neurons in a face patch, it was clear that only the 16 images of faces consistently elicited strong responses, with little or no response to the other images.

Left is a photo of Dr. Doris Tsao. Right is a multicolored heatmap, showing responses of 182 neurons to different sets of stimuli. High response rates are seen to faces, but not bodies, fruits, gadgets, hands or scrambled images.

Figure 6.35 Doris Tsao and fMRI study of face-selective units Responses recorded from 182 individual neurons in one face patch show that only images of faces elicited strong responses from neurons in the face patch. Image credit: Doris Tsao by Giantnanoassembler, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=71887161. Data and images of stimuli: Tsao DY, Freiwald WA, Tootell RB, Livingstone MS. A cortical region consisting entirely of face-selective cells. Science. 2006 Feb 3;311(5761):670-4. doi: 10.1126/science.1119983. PMID: 16456083; PMCID: PMC2678572. Reprinted with permission.
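The layout of the response data described above (182 neurons by 96 images, with the images grouped into six categories of 16) can be sketched in a few lines of Python. This is an illustration only: the firing rates are synthetic numbers generated by the script, not the recorded data, and the contrast index computed here is one simple way to summarize face selectivity, assumed for this example rather than taken from the study.

```python
import random

CATEGORIES = ["faces", "bodies", "fruits", "gadgets", "hands", "scrambled"]
IMAGES_PER_CATEGORY = 16   # 6 categories x 16 images = 96 stimuli
N_NEURONS = 182


def simulate_responses(seed=0):
    """Synthetic firing rates (NOT real data).
    Rows are neurons; columns are images, grouped by category."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(N_NEURONS):
        row = []
        for category in CATEGORIES:
            # Assumed values: strong responses to faces, weak to the rest.
            mean_rate = 30.0 if category == "faces" else 3.0
            row.extend(max(0.0, rng.gauss(mean_rate, 2.0))
                       for _ in range(IMAGES_PER_CATEGORY))
        matrix.append(row)
    return matrix


def category_means(row):
    """Average one neuron's responses within each stimulus category."""
    means = {}
    for i, category in enumerate(CATEGORIES):
        block = row[i * IMAGES_PER_CATEGORY:(i + 1) * IMAGES_PER_CATEGORY]
        means[category] = sum(block) / len(block)
    return means


def face_selectivity_index(row):
    """Contrast index between -1 and 1 (rates are non-negative);
    values near 1 mean that responses to faces dominate."""
    means = category_means(row)
    face = means["faces"]
    others = [means[c] for c in CATEGORIES if c != "faces"]
    nonface = sum(others) / len(others)
    if face + nonface == 0:
        return 0.0
    return (face - nonface) / (face + nonface)


# Usage: summarize every simulated neuron with a single number.
responses = simulate_responses()
indices = [face_selectivity_index(row) for row in responses]
print("neurons:", len(responses), "images per neuron:", len(responses[0]))
print("lowest index among simulated neurons:", round(min(indices), 2))
```

Grouping the columns by category before averaging mirrors the way the heatmap lines up all responses to one stimulus in a vertical line and places stimuli of the same category side by side.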

Other experiments used cartoon faces as stimuli, permitting systematic variation of features such as the proportions of the face, the size of the nose, the diameter of the eyes, and the height of the hair. Different neurons were tuned to extremes of some features but not affected by variations in other features. Later research found two additional face patches where the responses were affected by whether the face was familiar to the monkey being tested. One clear conclusion was that a given face would elicit responses from a large subset of neurons in multiple face patches, with different faces activating different subsets.

In online videos, Dr. Tsao explains characteristics of the first face-selective neurons she studied as well as her later work at Caltech.

Face patches have also been identified by fMRI in the human brain, in a region called the fusiform face area (FFA). Research on IT face patches continues, with a goal of revealing how advanced visual processing occurs in sequential stages for a well-defined image category. These studies may help explain how a scene that activates millions of neurons in multiple cortical areas leads to our conscious perception of the components of the visual world around us.