This post is part one of a series.
Speaking feels like the most natural thing in the world. You think a thought, open your mouth, and words tumble out in perfect sequence. Yet this apparent simplicity masks one of the most remarkable feats your brain performs daily. Consider this: An 18-month-old can form words reasonably well but won’t master toilet training for another year. Meanwhile, our closest evolutionary relatives—despite their intelligence and dexterity—cannot speak at all. This is not because their mouths are anatomically incapable of speech; it’s because speech is difficult to coordinate, and apes lack the brain networks to do it. How, then, does our brain enable speech?
This question has fascinated neuroscientists for decades. My own contribution to the problem started with a serendipitous finding: A tiny blip of brain activity showed up in an unexpected place during a seemingly simple task, and it led me to a revolutionary theory that bridges two previously separate scientific worlds.
When Silence Speaks Volumes
In the mid-1990s, I was studying how the brain processes signed versus spoken language using then-cutting-edge fMRI technology. We wanted to know whether the same or different brain areas were involved in speaking and signing. Deaf signers and hearing talkers were asked to sign or silently name pictures. We found that similar brain areas were involved in both. Interesting. Yet something else appeared in the scans of the hearing/speaking participants: A small region in the left auditory cortex lit up, even though the participants weren’t hearing anything. I became obsessed with understanding what this activation meant.
The finding echoed an observation made over a century earlier by Carl Wernicke, one of the founding fathers of language neuroscience. Wernicke had noticed that patients with damage to auditory brain regions didn’t just have trouble understanding speech—they also made frequent errors when speaking, as if they had lost what he called the “corrective function of sound images.”
The Sensory Foundation of Speech
This supported an emerging hypothesis, championed at the time by a handful of speech scientists, including John Houde and Frank Guenther: What if speaking isn’t primarily a motor act, but fundamentally a sensory one? What if, when we plan speech, our brains don’t think in terms of tongue and lip movements, but in terms of the sounds we want to produce? Once you open your mind to this possibility, you find evidence for this “sensory theory” of speech production all around us. Put a pencil between your teeth and try reading this sentence aloud. Despite the obstruction, you can still speak clearly because your brain automatically adjusts your articulation to achieve the acoustic goal—the right sound pattern. If speech planning were purely motor-based, this instant adaptation would be impossible. Or consider how hard it is to keep talking when a bad phone connection taunts you with a delayed echo of your own voice: the disrupted auditory feedback disrupts the fluency of your speech.
Even more compelling is what happens when researchers manipulate, in controlled ways and in real time, what people hear of their own voice. Using sophisticated audio processing, scientists can shift the pitch or alter vowel sounds as people speak, feeding this modified version back through headphones. Remarkably, talkers automatically and unconsciously adjust their articulation to compensate, often without even noticing the manipulation. This suggests that auditory targets—not motor programs—guide speech production.
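To make the compensation effect concrete, here is a toy simulation of the idea (purely illustrative; the target pitch, shift size, and correction gain are my own assumptions, not values from any published experiment) in which a simulated talker keeps adjusting production toward an auditory target while the feedback is covertly shifted upward:

```python
# Toy model of compensation under altered auditory feedback.
# All numbers (target pitch, shift size, correction gain) are illustrative
# assumptions, not values from any real experiment.

TARGET_HZ = 120.0       # the talker's auditory goal for vocal pitch
FEEDBACK_SHIFT = 10.0   # experimenter's covert upward pitch shift (Hz)
GAIN = 0.5              # how strongly the talker corrects the perceived error

produced = TARGET_HZ    # start by producing the intended pitch
for step in range(8):
    heard = produced + FEEDBACK_SHIFT   # what arrives over the headphones
    error = TARGET_HZ - heard           # mismatch with the auditory target
    produced += GAIN * error            # unconscious articulatory adjustment
    print(f"step {step}: produced {produced:.1f} Hz, heard {heard:.1f} Hz")
```

In the sketch, the produced pitch drifts downward, opposing the upward shift, until what the talker hears once again matches the target. That opposing drift is the compensation pattern reported in altered-feedback studies, and it only makes sense if the goal being tracked is an auditory one rather than a fixed motor routine.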
Bridging Two Worlds
The blips I noticed in the auditory cortex during silent naming, I reasoned, must reflect the auditory system’s role in speaking. My students and I set out to fully map this system in the brain. We succeeded in identifying a network of sensory and motor areas anchored in a specific brain region—which we dubbed “area Spt” (for Sylvian parietal-temporal)—that serves as a crucial interface between hearing and speaking.
This finding began to bridge two historically separate research traditions. Psycholinguists had long studied how we transform thoughts into words, focusing on abstract levels like phonemes, syllables, and syntax. Motor control scientists, meanwhile, investigated how the brain coordinates the complex movements of the tongue, lips, and vocal cords. Most researchers assumed they were studying different things. The bridge came when I realized that the circuit we identified was operating at the phonological level, a linguistic system, yet its architecture was typical of motor-control systems with a sensory part, a motor part, and an interface system in between. This meant that phonological processing in speech production is not a single thing but is decomposable into a sensorimotor-like architecture.
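As a rough schematic of that division of labor (the function names and the segment coding below are illustrative stand-ins of mine, not the model's actual representations), the idea can be sketched as three interacting pieces: a sound-based target, a movement-based plan aimed at that target, and an interface that checks one against the other:

```python
# Schematic sketch of the proposed sensorimotor architecture for phonology.
# Function names and segment codes are illustrative stand-ins only.

def auditory_target(word):
    """Sensory side: the sound pattern the talker intends to hit."""
    return {"cat": "k-ae-t"}[word]

def motor_plan(target):
    """Motor side: articulatory gestures aimed at that sound pattern."""
    return [f"gesture({segment})" for segment in target.split("-")]

def interface(predicted_sound, target):
    """Spt-like comparator: checks the predicted outcome of the plan
    against the auditory target and flags any mismatch for correction."""
    return "match" if predicted_sound == target else "mismatch: correct the plan"

target = auditory_target("cat")
print(motor_plan(target))            # ['gesture(k)', 'gesture(ae)', 'gesture(t)']
print(interface("k-ae-t", target))   # match
print(interface("k-ae-g", target))   # mismatch: correct the plan
```

The only point of the sketch is the decomposition itself: a sensory target, a motor plan, and an interface that lets the two talk to each other.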
The Architecture of Articulation
This sensorimotor architecture explains many puzzling features of speech disorders. Consider conduction aphasia, where patients can understand speech perfectly but make frequent phonological errors when speaking—saying “cag” instead of “cat,” for example. Traditional theories struggled to explain why comprehension remained intact while production suffered. The new integrated model provides an elegant solution: Damage to the auditory-motor interface (area Spt) disrupts the brain’s ability to use auditory targets for speech planning and error correction, while leaving the comprehension system largely untouched. The model also explains why different types of phonological problems emerge in stroke patients depending on whether the brain damage occurs in the frontal lobe or closer to auditory areas. Since phonological processing is partitioned into sensory-related subcomponents (the targets or goals) and motor-related subcomponents (the plans for hitting those targets), it makes sense that damage to different parts of the system produces different deficits. Frontal lobe damage tends to disrupt the motor planning side, causing effortful, halting speech. Posterior damage affects the auditory target system, leading to fluent but error-prone speech as the internal “quality control” system fails.
To be continued…
Excerpted and adapted from Wired for Words: The Neural Architecture of Language by Gregory Hickok, published by MIT Press.