design is at odds with the raw materials already close at hand. The human spine, the panda’s thumb (formed from a wrist bone) — these are ramshackle solutions that owe more to evolutionary inertia than to any principle of good design. So it is with language too.

In the hodgepodge that is language, at least three major sources of idiosyncrasy arise from three separate clashes: (1) the contrast between the way our ancestors made sounds and the way we would ideally like to make them, (2) the way in which our words build on a primate understanding of the world, and (3) a flawed system of memory that works in a pinch but makes little sense for language. Any one of these alone would have been enough to leave language short of perfection. Together, they make language the collective kluge that it is: wonderful, loose, and flexible, yet manifestly rough around the edges.

Consider first the very sounds of language. It’s probably no accident that language evolved primarily as a medium of sound, rather than, say, vision or smell. Sound travels over reasonably long distances, and it allows one to communicate in the dark, even with others one can’t see. Although much the same might be said for smell, we can modulate sound much more rapidly and precisely, faster than even the most sophisticated skunk can modulate odor. Speech is also faster than communicating by way of physical motion; it can flow at about twice the speed of sign language.

Still, if I were building a system for vocal communication from scratch, I’d start with an iPod: a digital system that could play back any sound equally well. Nature, in contrast, started with a breathing tube. Turning that breathing tube into a means of vocal production was no small feat. Breathing produces air, but sound is modulated air, vibrations produced at just the right sets of frequencies. The Rube Goldberg-like vocal system consists of three fundamental parts: respiration, phonation, and articulation.

Respiration is just what it sounds like. You breathe in, your chest expands; your chest compresses, and a stream of air comes out. That stream of air is then rapidly divided by the vocal folds into smaller puffs of air (phonation), about 80 times a second for a baritone like James Earl Jones, and as much as 500 times per second for a small child. From there, this more-or-less constant sound source is filtered, so that only a subset of its many frequencies makes it through. For those who like visual analogies, imagine producing a perfect white light and then applying a filter, so that only part of the spectrum shines through. The vocal tract works on a similar “source and filter” principle. The lips, the tip of the tongue, the tongue body, the velum (also known as the soft palate), and the glottis (the opening between the vocal folds) are known collectively as articulators. By varying their motions, these articulators shape the raw sound stream into what we know as speech: you vibrate your vocal folds when you say “bah” but not “pah”; you close your lips when you say “mah” but move your tongue to your teeth when you say “nah.”
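The source-and-filter idea can be sketched in a few lines of code. This is my own toy illustration, not anything from the text: the pulse train stands in for phonation, and each two-pole resonance stands in for one formant of the vocal tract; the specific frequencies and bandwidths below are merely illustrative.

```python
import numpy as np

SR = 16_000  # sample rate in Hz

def pulse_train(f0, dur, sr=SR):
    """The 'source': one pulse per glottal cycle, f0 pulses per second (phonation)."""
    src = np.zeros(int(dur * sr))
    src[:: sr // f0] = 1.0
    return src

def resonator(x, freq, bw, sr=SR):
    """The 'filter': a two-pole resonance that passes frequencies near
    `freq` and attenuates the rest -- one formant of the vocal tract."""
    r = np.exp(-np.pi * bw / sr)
    a1, a2 = 2 * r * np.cos(2 * np.pi * freq / sr), -r * r
    y = np.zeros_like(x)
    for i in range(len(x)):
        y[i] = x[i] + a1 * y[i - 1] + a2 * y[i - 2]
    return y

# ~100 pulses per second (a baritone), shaped by two resonances placed
# roughly where the first two formants of a low vowel sit (assumed values).
source = pulse_train(f0=100, dur=0.5)
vowel = resonator(resonator(source, freq=700, bw=80), freq=1200, bw=90)
```

Changing only the resonator settings, while leaving the source untouched, is the filtering step: the same buzz comes out sounding like different vowels.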

Respiration, phonation, and articulation are not unique to humans. Since fish walked the land, virtually all vertebrates, from frogs to birds to mammals, have used vocally produced sound to communicate. Human evolution, however, depended on two key enhancements: the lowering of our larynx (not unique to humans but very rare elsewhere in the animal kingdom) and increased control of the ensemble of articulators that shape the sound of speech. Both have consequences.

Consider first the larynx. In most species, the vocal tract is essentially a single long tube. At some point in evolution, our larynx dropped down. Moreover, as we changed posture and stood upright, the tract took a 90-degree turn, dividing into two tubes of more or less equal length, which endowed us with considerably more control of our vocalizations — and radically increased our risk of choking. As first noted by Darwin, “Every particle of food and drink which we swallow has to pass over the orifice of the trachea, with some risk of falling into the lungs” — something we’re all vulnerable to.[30]

Maybe you think the mildly increased risk of choking is a small price to pay, maybe you don’t. It certainly didn’t have to be that way; breathing and talking could have relied on different systems. Instead, our propensity for choking is one more clear sign that evolution tinkered with what was already in place. The result is a breathing tube that does double duty as a vocal tract — in occasionally fatal fashion.

In any event, the descended larynx was only half the battle. The real entrée into speech came from significantly increased control over our articulators. But here too the system is a bit of a kluge. For one thing, the vocal tract lacks the elegance of the iPod, which can play back more or less any sound equally well, from Moby’s guitars and flutes to hip-hop’s car crashes and gunshots. The vocal tract, in contrast, is tuned only to words. All the world’s languages are drawn from an inventory of about 90 sounds, and any particular language employs no more than half that number — an absurdly tiny subset when you think of the many distinct sounds the ear can recognize.

Imagine, for example, a human language that would refer to something by reproducing the sound it makes. I’d refer to my favorite canine, Ari, by reproducing his woof, not by calling him a dog. But the three-part contraption of respiration, phonation, and articulation can only do so much; even where languages allegedly refer to objects by their sounds — the phenomenon known as onomatopoeia — the “sounds” we refer to sound like, well, words. Woof is a perfectly well-formed English word, a cross between, say, wool and hoof but not a faithful reproduction of Ari’s vocalization (nor that of any other dog). And the comparable words in other languages each sound different, none exactly like a woof or a bark. French dogs go ouah, ouah, Albanian dogs go ham, ham, Greek dogs go gav, gav, Korean dogs go mung, mung, Italian dogs go bau, bau, German dogs go wau, wau: each language creates the sound in its own way. Why? Because our vocal tract is a clumsy contraption that is good for making the sounds of speech — and little else.

Tongue-twisters emerge as a consequence of the complicated dance that the articulators perform. It’s not enough to close the lips or move the tongue through a basic set of movements; each gesture must be coordinated with the others in precisely timed ways. Two words can be made up of exactly the same physical motions performed in a slightly different sequence. Mad and ban, for example, each require the same four crucial movements — the velum (soft palate) widens, the tongue tip moves toward alveolar closure, the tongue body widens in the pharynx, and the lips close — but the velum gesture comes early in one word (mad, with its initial nasal /m/) and late in the other (ban, with its final nasal /n/). Problems occur as speech speeds up — it gets harder and harder to get the timing right. Instead of building a separate timer (a clock) for each gesture, nature forces one timer into double (or triple, or quadruple) duty.

And that timer, which evolved long before language, is really good at only very simple rhythms: keeping things either exactly in phase (clapping) or exactly out of phase (alternating steps in walking, alternating strokes in swimming, and so forth). All that is fine for walking or running, but not if you need to perform an action with a more complex rhythm. Try, for example, to tap your right hand at twice the rate of your left. If you start out slow, this should be easy. But now gradually increase the tempo. Sooner or later you will find that the rhythm of your tapping will break down (the technical term is devolve) from a ratio of 2:1 to a ratio of 1:1.
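A toy simulation can make vivid why a shared timer struggles as tempo rises. This is my own illustration, not a model from the text: give every tap a fixed amount of motor jitter (the assumed value below is arbitrary), and ask how often the fast hand’s off-beat tap still lands between two consecutive slow-hand taps.

```python
import random

random.seed(0)
JITTER = 0.03  # fixed motor-timing noise in seconds (an assumed value)

def off_beat_success(period, trials=2000):
    """Fraction of 2:1 tapping cycles in which the fast hand's extra tap
    actually falls between two consecutive slow-hand taps."""
    ok = 0
    for _ in range(trials):
        slow_a = random.gauss(0.0, JITTER)       # slow hand, beat n
        slow_b = random.gauss(period, JITTER)    # slow hand, beat n + 1
        fast = random.gauss(period / 2, JITTER)  # fast hand, off-beat tap
        ok += slow_a < fast < slow_b
    return ok / trials

slow_tempo = off_beat_success(period=1.0)  # leisurely: nearly always works
fast_tempo = off_beat_success(period=0.1)  # sped up: the 2:1 pattern breaks down
```

Because the jitter stays constant while the intervals shrink, the off-beat tap increasingly strays outside its slot at high tempo — a crude analogue of the 2:1 rhythm devolving toward 1:1.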

Which returns us to tongue-twisters. Saying the words she sells properly involves a challenging coordination of movements, very much akin to tapping at the 2:1 ratio. If you first say the words she and sells aloud, slowly and separately, you’ll realize that the /s/ and /sh/ sounds have something in common — a tongue-tip movement — but only /sh/ also includes a tongue-body gesture. Saying she sells properly thus requires coordinating two tongue-tip gestures with one tongue-body gesture. When you say the words slowly, everything is okay, but say them fast, and you’ll stress the internal clock. The ratio eventually devolves to 1:1, and you wind up sticking in a tongue-body gesture for every tongue-tip gesture, rather than every other one. Voilà, she sells has become she shells. What “twists” your tongue, in short, is not a muscle but a limitation in an ancestral timing mechanism.

The peculiar nature of our articulatory system, and how it evolved, leads to one more consequence: the relation between sound waves and phonemes (the smallest distinct speech sounds, such as /s/ and /a/) is far more complicated than it needs to be. Just as our pronunciation of a given sequence of letters depends on its linguistic context (think of how you say ough when reading the title of Dr. Seuss’s book The Tough Coughs As He Ploughs the Dough), the way in which we produce a particular linguistic element depends on the sounds that come before and after it. For example, the sound /s/ is pronounced one way in the word see (with spread lips) and another in the word sue (with rounded lips). This makes learning to talk a lot more work than it might otherwise be. (It’s also part of what makes computerized voice recognition a difficult problem.)
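The context dependence can be caricatured in a few lines of code. The symbols and the single rule below are illustrative inventions, not real phonology: the surface form chosen for /s/ depends on the vowel that follows it, just as the lips round for the /s/ of sue but spread for the /s/ of see.

```python
# Toy coarticulation: the surface form of a phoneme depends on its
# neighbors. Symbols and the one rule here are illustrative, not real IPA.
ROUNDED_VOWELS = {"u", "o"}

def realize(phonemes):
    out = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "s" and nxt in ROUNDED_VOWELS:
            out.append("s_rounded")  # lips round in anticipation, as in "sue"
        elif p == "s":
            out.append("s_spread")   # lips spread, as in "see"
        else:
            out.append(p)
    return out

realize(["s", "i"])  # → ["s_spread", "i"]
realize(["s", "u"])  # → ["s_rounded", "u"]
```

The same underlying symbol surfaces differently depending on what comes next — which is exactly why a recognizer cannot simply match one waveform template per phoneme.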

Why such a complex system? Here again, evolution is to blame; once it locked us into producing sounds by
