| Vowel Play with Algorithms: Helping Humans and Computers Learn Baby Talk |
| Saturday, 01 March 2008 00:48 |
|
Few milestones stand out in a parent’s memory as clearly as his or her child’s first words. Those simple sounds are the fruition of thousands of hours of a parent’s instinctive tutoring. Constantly nurtured with “baby talk,” infants are introduced to their native tongue in a simple, accessible form – the high notes and exaggerated syllables are acoustically designed for optimal learning.

Teaching Computers Like Children

This infant-directed speech is characterized by a slower tempo, longer syllables, and more exaggerated vowel sounds as compared to normal speech. Pitch and intonation change as our voices adopt that peculiar, sing-song lilt of baby talk. Although we adopt these changes almost without thinking, they have a profound effect on infants’ abilities to assimilate the languages that surround them.

Can Computers “Learn” Like Humans?

“People don’t behave randomly or arbitrarily,” Vallabha explains. “In particular contexts – restaurants, movie theaters, baseball games – they behave in regular and predictable ways. The world may seem chaotic and jumbled, but in certain contexts, it is full of statistical regularity and structure. We posit that human infants are exquisitely sensitive to statistical structure at all levels: syllable sequences, when to say certain words, when parents would scold them and so forth. Language happens to be a case where there is a lot of statistical regularity.”

Learning What’s Important

However, as children develop, their receptiveness to the full range of phonemes narrows, focusing on contrasts useful to their native language. Vallabha and McClelland theorize that during this time of diminishing phoneme repertoire, infants are focusing on learning the phonetic and syntactic rules that govern their own language through a process of intense repetition.

To mimic this process, Vallabha and McClelland’s computer models used repetition to classify discrete vowel sounds. The researchers employed a learning algorithm known as Expectation-Maximization. Essentially, the models began with very broad, uninformed ideas about how to categorize their data and gradually formed their own vowel sound categories by repeatedly analyzing and identifying similarities between phonemes.

In their experiments, the computer models analyzed data from recordings of both English- and Japanese-speaking mothers. All the mothers read the same set of nonsense words, both spontaneously and to their infants. The algorithms attempted to learn several “i” and “e” phonemes from each language.

Why vowels in particular? Vallabha and McClelland’s models characterized sounds by elements of their frequencies and durations, both of which differ clearly from one vowel to another. “For a variety of reasons, consonants such as ‘p’, ‘d’, or ‘m’ are much more difficult to describe compactly in this way,” Vallabha explained.

Assuming Everything’s Normal

In using a Gaussian distribution for learning, OME identifies vowel sounds using a technique similar to the way many researchers theorize humans perform sound categorization. This approach has also proved quite accurate: OME learned English vowel sounds with 84% accuracy and their Japanese analogs with 95% accuracy.

The algorithm is also quite proficient at distinguishing between speakers. Just as infants are able to learn who is speaking their native language and who is not, the OME algorithm can distinguish between the English and Japanese speakers based solely on their pronunciation of the nonsense words. The algorithm found far more commonalities among speakers of the same language than between speakers of different languages, revealing that a high degree of language-specific information must be encoded in infant-directed speech.

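For readers who want to see the Expectation-Maximization idea described above in concrete form, here is a minimal sketch in Python. It is not the researchers’ OME algorithm. It is a textbook batch version that fits two Gaussian categories to a single made-up acoustic measurement. The function names, the synthetic “frequency” values, and the choice of two categories are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(values, n_iter=50):
    """Fit a two-category 1-D Gaussian mixture with batch EM (illustrative only)."""
    lo, hi = min(values), max(values)
    # Broad, uninformed start: wide categories anchored at the data extremes.
    broad_var = max(((hi - lo) / 2) ** 2, 1e-6)
    means = [lo, hi]
    variances = [broad_var, broad_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: how strongly does each category "claim" each sound?
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against division by zero
            resp.append([pk / total for pk in p])

        # M-step: re-estimate each category from the sounds it claimed.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # category claimed nothing; leave it unchanged
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(spread, 1e-6)
            weights[k] = nk / len(values)

    return means, variances, weights


# Synthetic stand-in data (NOT from the study): two clusters of a
# hypothetical frequency measurement, centered at 300 and 550.
random.seed(0)
values = [random.gauss(300, 30) for _ in range(200)]
values += [random.gauss(550, 30) for _ in range(200)]

means, variances, weights = fit_two_gaussians(values)
print(sorted(round(m) for m in means))
```

Each pass repeats the same two steps: assign every sound softly to the current categories, then move the categories toward the sounds they claimed. The categories sharpen through repetition, which is the behavior the article attributes to the models.
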
Taking Away the Safety Net

Vallabha and McClelland also designed a second algorithm, dubbed TOME (Topographic OME). TOME’s purpose was to mimic linguistic learning without using a Gaussian distribution. Its categories are instead defined by breaking the input space of sounds into many small regions and calculating the proportion of input sounds in each region.

This method of “weighting” and strengthening categories through the repetition of similar information seems more promising as a neurobiological model. The robustness of biological neural networks in the brain depends on synapses, the connections between neurons, which vary in both number and individual signal strength. Our learning process reinforces existing synapses and stimulates the growth of new ones in a relevant neural network, thus acting quite like TOME.

TOME is more flexible than its OME counterpart, since it can learn and classify even if the sounds do not follow the classic Gaussian distribution. However, the TOME algorithm is currently less accurate than OME at distinguishing between vowel sounds.

Using Computers to Help Humans Learn Speech

Vallabha hopes that their work will have a more far-reaching impact, asserting that “the ultimate goal is to have a solid theory of how infants learn spoken language – and use that theory to design remediation for speech problems in children and to help adults learn second languages.” |
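
As a closing illustration of the region-counting idea attributed to TOME above, the sketch below divides a two-dimensional space of sounds into small grid cells and records the proportion of sounds in each. This is a plain histogram, not the researchers’ TOME algorithm. The function names, the cell sizes, and the synthetic (frequency, duration) data are hypothetical.

```python
import random
from collections import Counter


def region_proportions(points, cell_width, cell_height):
    """Map each grid cell to the share of points that fall inside it."""
    counts = Counter(
        (int(x // cell_width), int(y // cell_height)) for x, y in points
    )
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}


def proportion_at(proportions, x, y, cell_width, cell_height):
    """Look up the share of sounds in the cell containing (x, y)."""
    cell = (int(x // cell_width), int(y // cell_height))
    return proportions.get(cell, 0.0)


# Synthetic, deliberately non-Gaussian data (NOT from the study):
# a hypothetical frequency and duration spread evenly over a rectangle.
random.seed(1)
points = [
    (random.uniform(250, 350), random.uniform(80, 200)) for _ in range(500)
]

proportions = region_proportions(points, cell_width=25, cell_height=40)
print(proportion_at(proportions, 300, 140, 25, 40))
```

Because nothing here assumes a bell-shaped spread, the same counting works for data of any shape. The cost is that many more numbers must be estimated from the same recordings, which is one plausible reason a region-based learner might lag a Gaussian one in accuracy.
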


