Elizabeth D. Kirby; Melissa J. Glenn; Noah J. Sandstrom; Christina L. Williams

Learning Objectives

By the end of this section, you should be able to:

6.6.1 Describe several open questions about how visual perception is derived from visual sensory information
6.6.2 Define the major difference between top-down versus bottom-up processing in visual perception
6.6.3 Describe what a grandmother cell is

Starting with the discovery in the 1950s of the center-surround receptive fields of retinal ganglion cells, it became clear that visual processing in the eye and brain starts with the detection of edges. The edges can be between areas of different brightness (luminance differences) or spectral composition (color differences). V1 neurons extend the detection of edges by responding selectively to straight edges at particular angles. Subsequent extrastriate and IT areas respond to more complex optimal stimuli, such as images of specific objects or body parts or faces. Intermediate visual areas may extract features such as the direction and speed of movement, or stimulus color. These stages of visual processing are now well known, but there are still many unanswered questions about how they lead to conscious visual perception. This section will discuss a few aspects of vision and visual perception that remain poorly understood today.
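The way a center-surround receptive field signals edges can be illustrated with a toy calculation. The sketch below is not a model of real ganglion cells: it assumes a one-dimensional luminance profile, and the function name and weights are hypothetical choices made only for illustration. Each unit is excited by the point at its center and inhibited by its two neighbors, with weights that sum to zero.

```python
def center_surround_response(luminance, center_weight=2.0, surround_weight=-1.0):
    """Toy ON-center unit applied at every interior point of a 1D luminance profile.

    Assumption for illustration: the center excites, the two flanking points
    inhibit, and the weights sum to zero, so uniform light gives no response.
    """
    responses = []
    for i in range(1, len(luminance) - 1):
        response = (surround_weight * luminance[i - 1]
                    + center_weight * luminance[i]
                    + surround_weight * luminance[i + 1])
        responses.append(response)
    return responses


# A dark region (luminance 1) next to a bright region (luminance 5).
profile = [1, 1, 1, 1, 5, 5, 5, 5]
responses = center_surround_response(profile)
print(responses)  # nonzero only at the two positions flanking the edge
```

In this sketch the units sitting in uniformly dark or uniformly bright regions stay silent, and only the units straddling the luminance step respond, which is the sense in which such receptive fields detect edges rather than overall brightness.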

The Binding Problem

Any visual image elicits simultaneous responses from millions of neurons in V1 and the extrastriate areas that respond to aspects of the stimulus. For example, a bluebird flying across the sky will activate color-selective neurons in V1, V2, and V4 corresponding to the bird’s color. The bird’s motion will activate movement-selective neurons in MT. Other objects in the visual scene will activate other neurons in visual cortical areas. If many visual neurons are firing simultaneously to multiple components of the image, how are the ones activated by the bluebird identified as belonging together? This is the binding problem, the need to link activity in different neurons that are stimulated by the same stimulus object. We assume that the activity of these neurons is somehow combined to generate our perception of that object, but how?

One suggested mechanism for binding responses is rhythmic firing, where the timing of synchronous action potentials identifies neurons that are responding to a particular stimulus. Where and how that synchrony would be detected, and where conscious perception is ultimately localized, remain unanswered questions, and synchrony as the answer to the binding problem remains controversial.

Top-Down Processing

Figure 6.35 is a famous picture, a high-contrast assortment of shapes and splotches that leads most people to perceive a spotted dog in leafy shade.

Black and white image of assortment of shapes and splotches that leads most people to perceive a spotted dog in leafy shade. The image is also reproduced with the dog outlined in red.

Figure 6.35 Top-down processing Image credit: Favela, L.H.; Amon, M.J. Enhancing Bayesian Approaches in the Cognitive and Neural Sciences via Complex Dynamical Systems Theory. Dynamics 2023, 3, 115-136. https://doi.org/10.3390/dynamics3010008. CC BY 4.0

It seems inconceivable that seeing the dog could be accomplished just from the activity of V1 neurons responding to the components of the image that fall within their receptive fields. That would be a “bottom-up” approach, where elements of the image are combined in successive stages of cortical processing to generate the perception of the object. Instead, some prior knowledge of the shape of a dog seems necessary for organizing the elements of the scene. That would be a “top-down” contribution to visual perception.

Computer scientists working on automatic image interpretation have described top-down processing as “hallucinating” the scene and then calculating how well the actual elements match the “hallucinated” image. After revising the imagined scene in areas where the match is poor, the calculation would repeat until a good match is found. Our visual system may work in a somewhat similar way. Bottom-up processing detects edges and activates neurons in V1 and extrastriate cortex, but top-down processing organizes the edges that are detected in the lower visual areas.
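The "hallucinate, compare, revise" loop described above can be sketched in a few lines. This is a deliberately simplified illustration, not the method of any particular computer vision system and not a claim about how the brain does it. It assumes a one-dimensional image, a known object template (the prior knowledge), and a single unknown, the object's position; all function names are hypothetical. It also assumes the first guess overlaps the true object, since otherwise small revisions would not reduce the mismatch.

```python
def render(position, width, template):
    """'Hallucinate' a 1D scene: the template drawn at `position` on a blank background."""
    scene = [0] * width
    for offset, value in enumerate(template):
        if 0 <= position + offset < width:
            scene[position + offset] = value
    return scene


def mismatch(predicted, observed):
    """Sum of squared differences between the imagined and the actual image."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_by_synthesis(observed, template, start):
    """Revise the guessed position step by step until no neighbor matches better."""
    width = len(observed)
    position = start
    error = mismatch(render(position, width, template), observed)
    while True:
        # Try shifting the imagined object one step left or right.
        candidates = [position - 1, position + 1]
        best = min(candidates,
                   key=lambda p: mismatch(render(p, width, template), observed))
        best_error = mismatch(render(best, width, template), observed)
        if best_error >= error:
            return position, error  # no revision improves the match
        position, error = best, best_error


template = [1, 2, 3, 2, 1]              # prior knowledge of the object's shape
observed = render(5, 12, template)      # the actual image: object at position 5
best_position, remaining_error = fit_by_synthesis(observed, template, start=2)
print(best_position, remaining_error)
```

The point of the sketch is the division of labor: the observed image supplies the bottom-up evidence, while the template and the current guess supply the top-down expectation that is repeatedly tested against that evidence.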

Visual illusions further support the existence of top-down organization of the elements of a scene. The image on the left of Figure 6.36 is either a white vase or profiles of two faces. We can instantly switch from one perception to the other, indicating that although the elements of the image falling on the retina that elicit bottom-up activity are unchanged, a top-down process determines how the components are grouped and perceived. Other visual illusions and ambiguous images support the same point.

Left is a black and white drawing that can be perceived as a white vase on black background or black profiles of two faces on white background. Right is a black and white drawing that can be perceived as a young woman looking away and to the back or an old woman looking forward and to the side.

Figure 6.36 Visual perception Visual perception can change depending on your expectations even if the stimulus does not change. Image credit: Face-vase by Ian Remsen, CC0, https://commons.wikimedia.org/w/index.php?curid=128282152; Woman by W. E. Hill - „Puck“, 6. Nov 1915, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3836554

Neuroscience in the lab: Top-Down Processing and Visual Attention

Another top-down process that affects perception is visual attention. Paying attention to one aspect of an image can enhance the activity of the neurons that respond to that aspect. A famous experiment used fMRI to measure activity in temporal lobe regions that are known to respond selectively to faces or houses. As mentioned in the discussion of face-selective neurons in 6.5 Extrastriate Cortex, the fusiform face area (FFA) is active when a person views faces, but the FFA is not responsive to images of houses. Houses, in turn, enhance activity in the parahippocampal place area (PPA), but that area does not respond to faces.

In the experiment, subjects were placed in a scanner to record fMRI signals from the FFA and PPA while they viewed an image that superimposed a face and a house. In a series of trials, the subjects were instructed to pay attention to either the house or the face, and then after a short period of time, to switch their attention to the other component of the double image. As shown in Figure 6.37, the FFA was more active when attention was directed to the face, while the PPA was more active when attention was directed to the house. The important implication is that although the image on the retina was always the same and thus bottom-up processes would be the same, neurons in the visual processing pathway did change their activity. They were being modulated by a top-down process: in this case, attention (see Chapter 19 Attention and Executive Function).

Top left shows an example photo of a face and house, superimposed one on top of the other. Top right shows a diagram of a participant looking at a computer screen with a face/house stimulus on it, as described in the main text. Participants viewed overlapping images on a computer screen while in an fMRI scanner. On some trials, they attended to the faces, while on others, they attended to the houses. Bottom shows a bar graph in which FFA activity was greater when the participant attended to the faces as opposed to the houses, and PPA activity was greater when the participant attended to the houses as opposed to the faces. A diagram of the ventral side of a human brain with PPA and FFA highlighted on the bottom sides of the temporal lobes (PPA medial to FFA) is also shown.

Figure 6.37 Attention provides top-down modulation of activity in visual areas Image credit: Face-house image from: Keizer, A.W., Nieuwenhuis, S., Colzato, L.S. et al. When moving faces activate the house area: an fMRI study of object-file retrieval. Behav Brain Funct 4, 50 (2008). https://doi.org/10.1186/1744-9081-4-50 CC BY 2.0

These examples make it clear that top-down processes influence the activity of neurons in visual areas, and thus that visual perception cannot be an exclusively bottom-up process. Other experiments recording from individual neurons in lower visual areas also show that their firing rates are modulated by activity in higher visual areas. Extensive axonal feedback connections from higher to lower areas offer an anatomical pathway for top-down modulation. However, how top-down processes modulate lower visual areas to contribute to visual perception is not generally known. Progress has been made, but this remains a largely unsolved puzzle in visual perception.

Grandmother Cells

The first stages of visual processing suggest that activity converges from one stage to the next. Axons from multiple LGN neurons converge to create the receptive fields of individual simple cells. Activity from multiple simple cells excites single complex cells, and so on. Very large receptive fields in extrastriate and IT cortex suggest the convergence of signals from earlier neurons that have small receptive fields. How high could this convergence reach? Could there be advanced neurons with extremely selective responses that combine visual and possibly other information? This idea was proposed, not seriously, as the existence of “grandmother cells”: cells that would respond to any view of your grandmother, and by extension, other neurons that would respond uniquely to other specific people or objects.

Although the term was presented mockingly, recordings of hippocampal neurons from a person undergoing brain surgery for epilepsy did reveal a neuron that seemed to respond to images and also the written name of a well-known actress, Jennifer Aniston (Quiroga et al., 2005). This opened two questions: first, could recognition depend on single neurons responsible for identifying a particular stimulus, and second, could neurons represent not just a visual image, but also other information about the subject?

The first possibility, that single neurons are responsible for recognizing an image, seems unlikely. Any visual stimulus evokes activity in an ensemble of hundreds or thousands of neurons, even at high levels. Similar visual stimuli would activate overlapping but distinct ensembles. There is no indication that the loss of a single neuron suddenly eliminates recognition of a particular object. Loss of neurons does occur, from strokes for example, and that loss can eliminate function. Prosopagnosia is a neurological deficit in which the patient no longer recognizes faces, but it involves the loss of whole regions of neurons and problems with many faces, not just one.

The second idea, that ensembles of neurons might encode information associated with a visual stimulus, seems more likely. “Gnostic neurons” have been proposed, and the neuron that responded to Jennifer Aniston’s image and written name seems to be an example of a gnostic neuron. It’s clear that our brains somehow generate our unified conscious experience, but like other aspects of consciousness, the later stages of visual perception remain an unsolved and challenging puzzle.

6.6 Unsolved Questions In Visual Perception
