Learning Objectives
By the end of this section, you should be able to:
- 7.2.1 Identify the major anatomical parts of the ear and describe their function.
- 7.2.2 Explain how the cochlea separates complex sounds into their component frequencies and transduces acoustic energy into neural signals.
- 7.2.3 Describe how neural information ascends to the cerebral cortex through the major nuclei and fiber tracts of the classical ascending auditory pathway.
In the previous topic, we saw that the air is full of acoustic waves carrying cues and signals vital to survival. These waves consist of minute variations in air pressure that can be up to 10 billion times smaller than the overall pressure exerted by the atmosphere. In this topic, we will examine how the incredibly sensitive organ known as the ear detects these tiny fluctuations and converts them to neural signals that the brain can process.
The ear consists of three main parts, shown in Figure 7.4: the external ear, the middle ear, and the inner ear. Auditory stimuli enter through the external ear as acoustic waves and exit the inner ear as neural impulses.
External ear
When someone is asked to point to their ears, they will usually indicate their auricles, structures of skin and cartilage on either side of the head. The shape of the auricles (also called pinnae) differs greatly among species. Some species, like cats, can manipulate the shape of their pinnae using muscles on the skull, while many other species, including most birds, amphibians, and reptiles, have no pinnae at all.
The auricle serves two functions. The first is to funnel acoustic waves into the external auditory meatus (ear canal). This amplifies the waves by compressing them into a smaller opening, increasing the sensitivity of hearing by up to 10 dB. The second function is to aid in localizing sound sources. The ridges and folds in the auricle create resonant cavities that filter out specific frequencies depending on the direction the sound is coming from. In a simple demonstration of this effect, filling the cavities of the auricles with modeling clay made it considerably more difficult for listeners to tell, with their eyes closed, where a sound was coming from (Oldfield and Parker 1984).
Middle ear
Separating the external auditory meatus from the middle ear cavity is the tympanic membrane, a thin sheet of tissue. Like the tightly stretched surface of a drum, the tympanic membrane vibrates in response to acoustic waves in the ear canal. As compressed air reaches the membrane, it pushes it inward, because the pressure in the ear canal is slightly higher than the pressure in the middle ear. Similarly, when the air in the ear canal is at a lower pressure (rarefied), the tympanic membrane is pushed outward by the air in the middle ear.
In order for the tympanic membrane to vibrate effectively, the static pressure in the ear canal and the middle ear cavity must be equal. The external atmospheric pressure depends on altitude, and a rapid change of altitude, as in a descending or ascending airplane, creates an imbalance that reduces hearing sensitivity, particularly for high frequencies. The Eustachian tube, which connects the middle ear to the nasopharynx (the upper part of the throat behind the nasal cavity), allows the balance to be restored. Sometimes it is necessary to encourage the Eustachian tube to open by “popping your ears”. A bad cold or sinus infection can cause the Eustachian tube to close, resulting in a persistent imbalance that can be acutely painful and can even damage the tympanic membrane. Infections of the middle ear, called otitis media, can cause the middle ear cavity to fill with fluid, creating a pressure imbalance if the fluid is unable to drain through the Eustachian tube.
In mammals, the middle ear cavity contains three bones: the malleus (hammer), the incus (anvil), and the stapes (stirrup). These bones are the smallest in the human body. Their function is to transmit the vibrations of the tympanic membrane to the inner ear, specifically to the oval window of the cochlea. The middle ear bones form a mechanical lever that compensates for the large difference in impedance between the air in the ear canal and the fluid inside the cochlea. As illustrated in Figure 7.5, when air is on both sides of a barrier like the tympanic membrane, acoustic energy is efficiently transferred from one side to the other because the molecules on both sides have the same mass and density. They collide like billiard balls: one ball comes to a stop while the other absorbs its momentum and moves on with the same speed. However, when an acoustic wave in air meets a fluid, whose molecules are much more densely packed, much of the energy is reflected, like ping-pong balls bouncing off a bowling ball. The ear compensates for this impedance mismatch through mechanical advantage. The tympanic membrane has a surface area around ten times that of the oval window. Its vibrations are captured by the stiff middle ear bones, which add further mechanical advantage by acting as levers. Without the middle ear, only about a thousandth of the energy in airborne acoustic waves would make its way into the fluid filling the cochlea (Wever and Lawrence 1948).
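The combined effect of the area ratio and the ossicular lever can be illustrated with a back-of-the-envelope calculation. The sketch below assumes the roughly tenfold area ratio mentioned above and an ossicular lever ratio of about 1.3; both values are approximations and vary across individuals and species.

```python
import math

# Rough sketch of middle-ear pressure gain.
# Force collected over the large tympanic membrane is concentrated
# onto the much smaller oval window, multiplying pressure by the
# area ratio; the ossicular lever adds a further factor.
area_ratio = 10.0   # tympanic membrane area / oval window area (approximate)
lever_ratio = 1.3   # mechanical advantage of the ossicles (assumed value)

pressure_gain = area_ratio * lever_ratio   # ~13-fold pressure increase
gain_db = 20 * math.log10(pressure_gain)   # pressure ratios use 20*log10

print(f"pressure gain: {pressure_gain:.1f}x ({gain_db:.1f} dB)")
```

A gain in this range (roughly 20–25 dB) recovers much of the energy that would otherwise be reflected at the air-fluid boundary.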
The impedance-matching of the middle ear can be compromised by buildup of scar tissue from repeated middle ear infections or by immobilization of the ligaments supporting the middle ear bones. This condition can result in a severe loss of hearing.
The middle ear also contains two muscles that attach to the middle ear bones: the tensor tympani and the stapedius (Figure 7.6). These muscles adjust the tension of the tympanic membrane and the stiffness of the connection between the incus and stapes, respectively, allowing the brain to control the gain of the middle ear. The stapedius reflexively contracts in response to high-intensity sounds and during self-vocalization, reducing transmission to the inner ear by up to 25 dB (Pang and Peake 1986; Rosowski 1991). This reflex is an important protection against the damage that intense sounds can do to the inner ear, but like every reflex, it operates with some delay. Very sudden sounds like gunshots can therefore still do significant harm.
Inner ear
The inner ear is a hollow bony structure that contains epithelial tissue and fluid. It comprises the cochlea, which is involved in hearing, along with the semicircular canals, the utricle, and the saccule, which are part of the vestibular system. The cochlea gets its name from the Greek for “snail” because it is shaped like a spiral (in mammals).
Cochlea
The interior of the cochlea (Figure 7.7) has a circular cross-section, divided into three fluid-filled compartments that run the length of the cochlear duct: the scala vestibuli, the scala media, and the scala tympani. The scala vestibuli begins at the base of the cochlea at the oval window, a membrane-covered opening that contacts the foot of the stapes. It connects to the scala tympani at the apex, the tip of the innermost spiral of the cochlea. The scala tympani terminates at the base of the cochlea in the round window, another membrane-covered opening. The fluid found in the scala vestibuli and scala tympani is called the perilymph. The scala media is a separate compartment filled with a different fluid called endolymph.
Basilar membrane
Acoustic waves enter the cochlea through the oval window via the stapes. As the stapes vibrates in and out, it pushes and pulls on the perilymph in the scala vestibuli and scala tympani, which in turn displaces the membrane covering the round window. Some of the energy of this wave, however, is absorbed by the basilar membrane, which stretches between the two walls of the cochlear duct and is able to move up and down. The basilar membrane varies systematically in thickness and width over its length. It is thickest and narrowest at the base and thinnest and widest at the apex (Figure 7.8). These physical properties translate into tuning for different frequencies, a phenomenon called resonance. Near the base, the basilar membrane resonates at high frequencies, around 20,000 Hz. Vibrations in the acoustic wave at these frequencies are absorbed into vertical displacements of the basilar membrane, while the lower-frequency components continue to travel down the scala vestibuli. The resonance of the basilar membrane becomes progressively lower, to about 20 Hz at the apex, so as the acoustic wave travels toward the apex, each segment of the basilar membrane absorbs energy at its resonant frequency (von Békésy 1960; Rhode 1971).
Take a moment to notice in Figure 7.8 how the frequencies are spaced along the basilar membrane. About half of the length is dedicated to the frequencies between 25 and 1600 Hz; the other half covers 1600 to 20,000 Hz, a much larger range. In other words, the relationship between frequency and position is not linear but (approximately) logarithmic. This organization may reflect the fact that perceptually important relationships between frequencies are ratios: for example, the components of a harmonic series (integer multiples of a fundamental frequency) stand in fixed ratios to one another, and fixed ratios correspond to fixed distances on a logarithmic axis. In humans, it may also reflect the fact that most of the information in speech is carried by relatively low frequencies (see 7.3 How Does the Brain Process Acoustic Information?).
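This logarithmic frequency-position relationship is often summarized with the Greenwood function. The sketch below uses the parameter values commonly quoted for the human cochlea (A ≈ 165.4, a ≈ 2.1, k ≈ 0.88, with position expressed as a fraction of basilar-membrane length); treat the exact constants as approximations.

```python
def greenwood_frequency(x):
    """Characteristic frequency (Hz) at relative position x along the
    basilar membrane, where x = 0 is the apex and x = 1 is the base.
    Constants are commonly cited approximations for the human cochlea."""
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * x) - k)

# Apex resonates at low frequencies, base at high frequencies:
print(f"apex   (x=0.0): {greenwood_frequency(0.0):8.1f} Hz")   # ~20 Hz
print(f"middle (x=0.5): {greenwood_frequency(0.5):8.1f} Hz")   # ~1700 Hz
print(f"base   (x=1.0): {greenwood_frequency(1.0):8.1f} Hz")   # ~20,000 Hz
```

Note how the apical half of the membrane (x from 0 to 0.5) covers only about 20 to 1700 Hz, consistent with the spacing shown in Figure 7.8.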
Although the physical principles are different, the way the basilar membrane separates complex sounds into waves of different frequencies is analogous to the way a prism can separate white light into its component frequencies (or colors). It is an essential first step in how the brain analyzes complex sounds. The organization of frequency (a non-spatial property of the sound) along the axis of the basilar membrane (a spatial dimension) is an example of topography. Because the ear contains a map of frequency, this topography is also known as tonotopy.
On top of the basilar membrane sits the organ of Corti, the epithelial structure that transduces mechanical movements of the basilar membrane into neural impulses. The organ of Corti comprises four rows of hair cells along with a large number of various kinds of support cells. Hair cells are characterized by stereocilia arranged in a stairstep bundle that protrudes into the endolymph filling the scala media (Figure 7.9). The tectorial membrane, a gelatinous structure, is attached to the organ of Corti at one side, forming a hinge, and floats just above the tips of the hair cells’ stereocilia.
There are two types of hair cells in the organ of Corti. The inner hair cells form a single row nearest the hinge with the tectorial membrane. They are responsible for sensing the movement of the basilar membrane and transducing those movements into neural impulses. The other three rows are the outer hair cells. Their function is not to sense movement, but to help amplify low-intensity sounds.
Inner hair cells
The inner hair cells are in many ways the most important cells in the auditory system, because they are the sole point at which the physical energy of acoustic waves is converted into the electrochemical signals used by the nervous system. On one end, called the apical end, are the stereocilia, whereas the basal end makes a synapse with an afferent terminal from a single bipolar neuron in the spiral ganglion. There are about 3500 inner hair cells in the human cochlea.
As the basilar membrane vibrates up and down, the tectorial membrane slides back and forth in the transverse direction. The fluid between the organ of Corti and the tectorial membrane tends to remain stationary because of inertia, causing the stereocilia of the inner hair cells to deflect toward and away from the hinge. The tips of the inner hair cell stereocilia contain ion channels that are physically attached to the next-tallest cilium by a tip link. As illustrated in Figure 7.10, deflection of the stereocilia toward the taller edge of the bundle puts tension on the tip links, opening the ion channels and permitting potassium and calcium ions to enter the cell from the endolymph. This depolarizes the cell, causing voltage-gated calcium channels at the basal end of the hair cell to open, which in turn causes synaptic vesicles to fuse with the plasma membrane and release neurotransmitter into the synapse. Deflection of the stereocilia in the other direction relieves tension on the tip link, closing the ion channels, hyperpolarizing the cell, and stopping neurotransmitter release (Hudspeth 1985; Fettiplace 2017).
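The dependence of channel opening on bundle deflection is often described with a two-state Boltzmann function. The sketch below uses illustrative parameter values (a resting open probability near 10% and an operating range of a few tens of nanometers); the actual numbers vary by species and by position along the cochlea.

```python
import math

def open_probability(deflection_nm, x0_nm=44.0, s_nm=20.0):
    """Fraction of transduction channels open at a given bundle deflection
    (nm, positive toward the taller stereocilia), modeled as a two-state
    Boltzmann. x0_nm sets the resting open probability (~10% here) and
    s_nm sets the steepness; both are illustrative assumptions."""
    return 1.0 / (1.0 + math.exp(-(deflection_nm - x0_nm) / s_nm))

print(f"at rest (0 nm):        {open_probability(0):.2f}")     # ~0.10
print(f"toward taller edge:    {open_probability(100):.2f}")   # most channels open
print(f"away from taller edge: {open_probability(-100):.3f}")  # nearly all closed
```

This captures the asymmetry described above: deflection toward the taller edge of the bundle opens channels and depolarizes the cell, while deflection the other way closes the small fraction open at rest and hyperpolarizes it.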
The transduction process in inner hair cells is exceptionally sensitive and fast. Sounds at the threshold of hearing produce basilar membrane movements of about 0.1 nm, about the diameter of a hydrogen atom, and movements of the stereocilia may be even smaller. The membrane potential of the hair cell responds to stereocilia deflection in as little as 10 µs, and a specialized ribbon synapse (Figure 7.10) allows neurotransmitter to be rapidly released onto the afferent fiber. The speed and fidelity of neurotransmission allow auditory nerve fibers to track frequencies as high as 1000 Hz or more, firing only at a specific phase of a periodic sound (Dynes and Delgutte 1992). This phase-locking is critical to one of the mechanisms the brain uses to determine the direction of a sound.
Outer hair cells
The other three rows of hair cells in the organ of Corti comprise the outer hair cells. Outer hair cells look similar to inner hair cells but serve a different function. Whereas the inner hair cells primarily receive afferent connections that carry neural signals into the brain, the outer hair cells primarily receive efferent connections carrying signals out from the brain. When an outer hair cell is stimulated, either by deflection of its stereocilia or by synaptic transmission from efferent fibers, it changes its shape. This electromotive response, which is generated by the molecular motor prestin, can occur at frequencies up to 50 kHz, making it among the fastest cellular movements known (Zheng et al. 2000). Because the stereocilia of outer hair cells are embedded in the tectorial membrane, these shape changes are transmitted to the basilar membrane, reinforcing its motion. This allows the brain to selectively amplify very weak sounds at specific frequencies (Dallos 1992), and it is one of the major reasons why we are able to detect extremely low-intensity sounds. However, as anyone who has heard what happens when a microphone is placed directly in front of a loudspeaker can attest, amplifiers are sensitive to overstimulation. As a result, the outer hair cells are easily damaged, leading to hearing loss.
Developmental Perspective: Noise, hearing loss, and tinnitus
Hearing loss affects one in eight people aged 12 and older in the US (Lin et al. 2011). Many of these cases are termed conductive hearing loss, because they result from deficits in how acoustic energy is transmitted from the air to the cochlea. Common causes of conductive hearing loss include occlusion of the ear canal, damage to the eardrum, otitis media, and damage or congenital defects in the middle ear bones. Other cases involve damage to a component of the nervous system, such as the hair cells in the inner ear, the auditory nerve, or an auditory area of the brain; these are termed sensorineural hearing loss. In humans and other mammals, hair cells cannot regenerate, so sensorineural hearing loss is almost always progressive and irreversible (Wagner and Shin 2019). Exposure to excessive noise is one of the most common causes of damage to hair cells, and this damage can be easily prevented with appropriate hearing protection. High-intensity acoustic waves cause large displacements of the basilar and tectorial membranes, generating shearing forces strong enough to break tip links, which can be repaired even in mammals, or to shear off stereocilia, which cannot be replaced. Blows to the head, aging, genetic defects, and certain ototoxic drugs (including some antibiotics) can also damage hair cells. Outer hair cells are particularly sensitive to noise-induced hearing loss, perhaps because of electromotive feedback or because their tips are directly embedded in the tectorial membrane. Loss of outer hair cells not only makes it more difficult to hear low-intensity sounds, but also impairs the ability to understand speech in noisy conditions, because the brain is no longer able to selectively amplify specific frequencies of interest. Thus, although hearing aids can correct for some of the loss of sensitivity, they are less effective at restoring intelligibility.
Exposure to intense noise can also produce a hearing disorder called tinnitus, a perception of a tone or buzzing sound in the absence of an external stimulus. Although the mechanisms of hearing-loss-related tinnitus remain unclear, one hypothesis is that the illusory tones are the result of overcompensation by central brain areas for regions of the organ of Corti where hearing loss has occurred (Knipper et al. 2021).
Neuroscience in the Lab
Cochlear implants
Most cases of deafness are caused by conductive hearing loss, by sensorineural damage to hair cells, or by a combination of the two. What if it were possible to bypass these parts of the ear and directly stimulate the afferent auditory nerve fibers? In recent decades, the technology to do just that has restored hearing for hundreds of thousands of individuals, including some who were unable to hear from birth.
A cochlear implant consists of two main parts (Figure 7.11). A thin array of electrodes is surgically threaded into the cochlea, close to the afferent terminals of the auditory nerve. When an electrical pulse is delivered to one of these electrodes, it depolarizes the nearby axons directly, causing them to generate action potentials that travel into the brain. Electrodes near the base of the cochlea activate the axons associated with high frequencies, while electrodes near the apex activate the axons associated with low frequencies.
The second part of the implant is a controller placed outside the skull. It has a microphone to sense acoustic waves, and a digital processor that separates the acoustic signal into its component frequencies and converts the signal in each frequency band into a sequence of electrical pulses that are sent to the appropriate electrode.
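The processing just described can be sketched as a minimal channel vocoder: split the signal into log-spaced frequency bands, extract each band's envelope frame by frame, and use the envelopes to set pulse amplitudes on the corresponding electrodes. This is only an illustrative sketch, not a real implant strategy; the channel count, band edges, and frame length below are arbitrary assumptions.

```python
import numpy as np

def band_envelopes(signal, fs, n_channels=12, fmin=200.0, fmax=8000.0,
                   frame_len=256):
    """Per-frame energy in n_channels log-spaced bands.
    Returns an array of shape (n_frames, n_channels); each row would set
    the pulse amplitudes sent to the electrode array at that moment, with
    low channels mapped to apical electrodes and high channels to basal
    ones."""
    edges = np.logspace(np.log10(fmin), np.log10(fmax), n_channels + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    window = np.hanning(frame_len)
    n_frames = len(signal) // frame_len
    env = np.zeros((n_frames, n_channels))
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame * window))
        for ch in range(n_channels):
            in_band = (freqs >= edges[ch]) & (freqs < edges[ch + 1])
            env[i, ch] = spectrum[in_band].sum()
    return env

# A pure 1 kHz tone should drive mainly one channel.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
env = band_envelopes(tone, fs)
print("most active channel:", int(env.mean(axis=0).argmax()))
```

In a real device, each envelope would additionally be compressed into the user's electrical dynamic range and converted into interleaved trains of biphasic current pulses.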
Cochlear implants are an almost miraculous technology. They enable many individuals to participate more fully in societies that place great emphasis on verbal, acoustic communication. Deaf children who are fitted with implants early in life, before the close of sensitive periods for learning the acoustic structure of speech, can achieve close to normal language development (Kral and Sharma 2011). Cochlear implants are not without limitations, however, and research continues to improve their sensitivity and fidelity. As of this writing, typical cochlear implants have around 12–24 electrodes (Dhanasingh and Jolly 2017), far fewer than the approximately 3500 inner hair cells, so each electrode stimulates a large number of fibers and the spectral resolution is low. Fortunately, much of the information in speech is carried not only by which fibers are activated, but also by their rate and relative timing. Because of this, the sophisticated signal processing in cochlear implants is generally able to produce an intelligible percept of speech. Other kinds of signals for which the ability to resolve distinct frequencies is critical, like music, remain difficult for cochlear implants to transmit to the brain with fidelity (Fowler et al. 2021). Another limitation of cochlear implants is the absence of the dynamic amplification provided by outer hair cells, which impairs intelligibility of speech in noisy conditions or when there are multiple speakers.
Ascending auditory pathways
After the inner hair cells of the organ of Corti have transduced the mechanical deflection of their hair bundles into the release of neurotransmitter, the auditory stimulus propagates as neural impulses through a number of brain regions and fiber tracts on its way to the cerebral cortex. This ascending pathway is more complex than the pathways for other sensory systems, so there are more names and connections to learn. One reason for this complexity may be that information about the location of sound sources, which is vital for avoiding predators, needs to be extracted from the stimulus early in the pathway.
This section will focus on the anatomy of the classical ascending auditory pathway, the most prominent and best studied route by which auditory information enters the brain (Figure 7.12). Although we may briefly note the function of some brain regions and pathways, a full discussion of the physiology and neural computations will be deferred to 7.3 How Does the Brain Process Acoustic Information?.
Auditory nerve
The inner hair cells form synapses with the dendrites of auditory nerve cells, which are bipolar neurons with their cell bodies in the spiral ganglion, a thin strip of gray matter within the spiraling interior wall of the cochlea. The axons of these cells course together in the center of the cochlea to form the auditory branch of cranial nerve VIII. The auditory nerve also contains many efferent axons descending from the brain to the cochlea; these primarily make synapses with the outer hair cells.
Cochlear nucleus
All of the afferent axons in the auditory nerve terminate in the cochlear nucleus, located at the junction between the medulla and the pons. The cochlear nucleus has three subdivisions: dorsal, posteroventral, and anteroventral. Each auditory axon branches twice to make synapses in all three subdivisions. The tonotopic organization of the cochlea is preserved in these projections, resulting in three complete maps of frequency within the cochlear nucleus, one per subdivision (Figure 7.13). Almost all of the stations in the classical auditory pathway have a complete tonotopic map.
The neurons within the cochlear nucleus give rise to three major output pathways that cross the midline and then ascend the brainstem through a bundle of axons called the lateral lemniscus.
Superior olivary complex
The superior olivary complex is the first station in the auditory pathway that receives input from both ears. It is located in the pons and comprises three main nuclei: the medial superior olive (MSO), the lateral superior olive (LSO), and the nucleus of the trapezoid body (NTB). These areas primarily receive input from the cochlear nucleus, and all three are involved in determining the location of sound sources using differences in the timing and level of stimuli arriving at the two ears.
Lateral lemniscus and nuclei
The lateral lemniscus is a tract that includes axons from ipsilateral and contralateral cochlear and superior olivary complex nuclei. Some of these axons form synapses in the nucleus of the lateral lemniscus (NLL), while others continue to ascend to the inferior colliculus.
Inferior colliculus
All the axons in the lateral lemniscus terminate in the inferior colliculus, the first midbrain auditory area and a major relay station. There are three major auditory areas within the inferior colliculus: the central nucleus, the external nucleus, and the cortex. Only the central nucleus is considered part of the classical pathway. Its main output is via the brachium of the inferior colliculus to the medial geniculate body, a nucleus in the thalamus. The central nucleus of the inferior colliculus also projects to the superior colliculus, an important midbrain area that integrates auditory, visual, and somatosensory input to create a unified map of space relative to the head. This map facilitates rapid orientation of the eyes and head toward startling sounds.
Medial geniculate body
The medial geniculate body is a nucleus in the dorsal thalamus and the primary gateway for auditory information to reach the cerebral cortex. As such, it plays an important role in suppressing these inputs during sleep and boosting them during arousal. Like the inferior colliculus, the medial geniculate body has three main subdivisions: dorsal, ventral, and medial. Only the ventral subdivision is considered part of the classical pathway. It receives input from the ipsilateral central nucleus of the inferior colliculus and sends its output to the ipsilateral primary auditory cortex.
Auditory cortex
Ascending auditory information in the classical pathway reaches the cerebral cortex in a region called the primary auditory cortex, or A1. In humans, this area is located in the temporal lobe. As in other primary sensory areas, thalamic afferents primarily form synapses in layer 4. The medial geniculate body also sends inputs to two neighboring areas: the secondary auditory cortex and the auditory association cortex.
The primary auditory cortex and its neighboring areas (sometimes called the core) together make up the first stage in a hierarchy of cortical areas. They each have a tonotopic map representing the full range of audible frequencies, and receive input from both ears. The core is surrounded by secondary cortical areas that receive auditory input via the core and also show tonotopic organization (shown in the bottom portion of Figure 7.13). The secondary areas are heavily interconnected with each other, and appear to be involved in processing more complex acoustic features and in storing auditory memories. The secondary areas project to higher-order sensory association areas that receive input from more than one sensory modality (e.g., visual and auditory) and, in humans, to areas that are specialized for perception and production of speech. These higher-order areas project back to primary and secondary auditory areas. These top-down connections feed back information about context that can help parse complicated scenes by suppressing specific frequency bands and boosting others.
Bilateral processing and descending pathways
Auditory perception is inherently binaural, because one of its most essential and probably most primitive functions is to localize acoustic cues. As we will see, this requires input from both ears. It is therefore unsurprising to find many connections between the hemispheres at multiple levels of the auditory pathway. For example, the cochlear nucleus not only makes connections to the superior olivary complex on the same side of the brain (ipsilateral), it also projects to structures on the opposite (contralateral) side, including the contralateral cochlear nucleus, superior olivary complex, nucleus of the lateral lemniscus, and inferior colliculus. There are also cross-hemisphere connections (called commissures) in the midbrain and in the cortex. One consequence of this extensive interconnection is that damage to one hemisphere of the auditory system, such as lesions caused by strokes, typically does not result in deafness on either side, though there may be deficits in localization.
Many of the ascending connections discussed in this section are paralleled by descending connections, axons traveling in the opposite direction from cortex to periphery. Indeed, descending connections from cortex to thalamus and other lower stations can far outnumber the ascending ones (Winer et al. 1998; Winer et al. 2001). These connections can modulate processing at almost every level, down to the cochlea, where control of the outer hair cells is thought to support dynamic amplification and filtering of signals of interest (de Boer et al. 2012).