There is a specific moment, usually in childhood or early adolescence, when most people first hear a recording of their own voice and discover, with mild horror, that it does not sound the way they expected. The voice on the recording is thinner. Higher. More nasal. Less authoritative. Less resonant. It does not seem to match the voice the person has been hearing inside their own head for their entire life. The instinctive reaction is that the recording must be defective somehow: that microphones distort, that playback equipment is poor, that something has gone wrong in the chain between vocal cords and loudspeaker. The recording, in nearly all cases, is fine. The mismatch is between two distinct ways of hearing the same sound: one is available only to the speaker, and the other is available to everyone else who has ever listened to that voice. The recording is a reasonable approximation of the second. The voice in the speaker’s head is, in important respects, an illusion produced by the speaker’s own skull.

According to a 2023 Mirage News explainer drawing on the underlying acoustics literature, the human auditory system actually has two distinct pathways by which sound reaches the inner ear. The first, called air conduction, is the pathway most people think of as “hearing”: sound waves travel through air, enter the ear canal, vibrate the eardrum, set off a chain of vibrations in the three small bones of the middle ear, and ultimately stimulate the cochlea — the fluid-filled, snail-shaped organ in the inner ear that converts mechanical vibration into the neural signals the brain interprets as sound. The second pathway, called bone conduction, bypasses the outer and middle ear entirely. Vibrations from a sound source travel directly through the bones of the skull — particularly the mastoid and temporal bones near the ear — and reach the cochlea by a structural route rather than an airborne one.

Why bone conduction makes the voice in your head deeper

When a person speaks, the vibration originates in the vocal cords, which sit in the larynx, in the front of the neck below the jaw. The sound produced travels outward into the air, where it can be heard by everyone else in the room via ordinary air conduction. But the vibration also travels inward, through the tissues of the throat and the bones of the jaw and skull, and into the speaker’s own inner ear via bone conduction. The speaker, accordingly, hears their own voice through both pathways simultaneously. Everyone else in the room hears it through only one.

The catch is that bone is not a neutral conductor of sound. As reported by Imperial College London’s coverage of Dr. Tobias Reichenbach’s 2014 research on the mechanics of bone conduction, the bones of the human skull are substantially more efficient at transmitting low-frequency vibrations than high-frequency ones. The bone-conducted version of a person’s voice is therefore weighted toward the lower end of the frequency spectrum: bassier, fuller, and more resonant than the air-conducted version. The voice the speaker hears in their own head is the sum of the air-conducted and bone-conducted versions, with the bone-conducted contribution adding low-frequency depth that no external listener has ever heard. Reichenbach summarised the implication directly in that coverage: “Many of us cringe when we play back a recording of our own voice. This is because we perceive our voice differently to how others hear it. Interestingly, bone conduction plays a role in how we recognise our voice. This is because the process is more effective at transmitting lower-frequency sounds to the brain, which means that we perceive our voice as being deeper than what it is.”
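The reasoning in the paragraph above can be put into a toy numerical sketch. It is an illustration under stated assumptions, not a model of real skull acoustics: the voice is reduced to a set of harmonics, the bone path is treated as a simple first-order low-pass weighting, and the two paths are added amplitude by amplitude with phase ignored. The 1000 Hz cutoff, the 1/k harmonic amplitudes, the mixing weight, and every function name are hypothetical choices made for this example, not figures from the research described here.

```python
import math


def bone_gain(freq_hz, cutoff_hz=1000.0):
    """Assumed bone-path weighting: a first-order low-pass magnitude.

    Close to 1 at low frequencies and falling off above cutoff_hz.
    The cutoff value is an illustrative guess, not a measured one.
    """
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)


def toy_voice(f0_hz=120.0, n_harmonics=30):
    """Toy air-conducted spectrum: harmonics of f0 with amplitude 1/k."""
    return [(k * f0_hz, 1.0 / k) for k in range(1, n_harmonics + 1)]


def self_heard(spectrum, bone_mix=1.0, cutoff_hz=1000.0):
    """Air path plus low-pass-weighted bone path, summed per harmonic.

    Phase differences between the two paths are ignored (an assumption).
    """
    return [
        (f, a + bone_mix * a * bone_gain(f, cutoff_hz))
        for f, a in spectrum
    ]


def spectral_centroid(spectrum):
    """Amplitude-weighted mean frequency; lower reads as 'deeper' here."""
    total = sum(a for _, a in spectrum)
    return sum(f * a for f, a in spectrum) / total


external = toy_voice()           # what a microphone or another listener gets
internal = self_heard(external)  # what the speaker gets: air plus bone

print(f"external centroid: {spectral_centroid(external):.0f} Hz")
print(f"internal centroid: {spectral_centroid(internal):.0f} Hz")
```

Because the assumed bone weighting falls as frequency rises, adding it to the air-conducted spectrum always pulls the amplitude-weighted mean frequency downward, whatever cutoff or mix is chosen. That is the "deeper than what it is" effect in miniature; the size of the shift in this sketch says nothing about its size in a real head.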

What the recording actually captures

A microphone, sitting in air several feet from the speaker, picks up only the air-conducted version of the voice, the same version that reaches the ears of everyone else within hearing range. When that recording is played back, the listener hears their own voice as it actually sounds to others, without the low-frequency bone-conducted overlay that has, until that moment, been part of the only version of their voice they have ever known. The mismatch is real, well-characterised, and produces a specific psychological phenomenon that auditory researchers refer to as “voice confrontation”: the sense of cognitive dissonance that accompanies hearing one’s own recorded voice, particularly for the first time.

Per an explainer from ENT and Allergy Associates on the phenomenon, the voice-confrontation effect tends to be most pronounced in people who have rarely or never heard recordings of themselves. It also tends to diminish, somewhat, with repeated exposure — singers, podcasters, broadcasters, and others who routinely listen to their own recorded voices gradually adjust their internal voice-image to better match the external reality. But the underlying physical mismatch never goes away. As long as a person has a skull and intact bone conduction, they will hear their own voice with low-frequency emphasis that no one outside their body can detect. The mental image of one’s own voice, accumulated over a lifetime of speaking and listening simultaneously, is structurally different from the voice that anyone else has ever encountered.

The thing humans never heard until 1877

Per Today You Should Know’s overview of the air-conducted versus bone-conducted distinction, one of the historically interesting features of this whole phenomenon is that humans never actually heard the “external” version of their own voice until the late 19th century. Thomas Edison’s phonograph, invented in 1877, was the first technology that could capture sound and play it back — and was, accordingly, the first technology that could ever show a person what they actually sounded like to other people. For approximately 200,000 years before Edison, every member of the species had a mental image of their own voice that was systematically different from the voice that everyone they had ever spoken to had heard. The internal voice was the only voice anyone had ever experienced from inside their own head. The discrepancy was undetectable, and presumably did not exist as a recognisable phenomenon.

The first generation of phonograph users would, presumably, have been the first humans in history to experience voice confrontation. The experience has since become a routine part of growing up in any technologically modern society: the moment when a child first hears themselves on a voicemail, or a recording from school, or a video clip, and registers, with confusion, that the voice on the playback does not match the voice they have been hearing inside their own head for as long as they have been speaking. The cognitive adjustment that follows is the small but real psychological work of integrating the external version of oneself into the internal model. For most people, the external version will always sound slightly off. The internal version, fuller, deeper, and more resonant, is the version constructed by a brain that has spent its entire life listening to itself through the bones of its own head, with no one else ever in a position to compare notes.