Machine language: how Siri finds its voice
Makradar Technologies / December 19, 2019
Google, Apple, Microsoft, and even Amazon are actively developing their voice services. The freshly released iOS 7 ships the same Siri, only with new functions and... a new voice. Have you ever wondered how this happens? How computers are taught to speak? It is a real art.
Behind each of Siri's voices is its own actor. Once the actor has finished recording, the work has only just begun: the human voice continues its journey. The story of that journey, from human to robot, is one of the most complex technological processes, one that could not have been carried out ten years ago.
Meet the director of voice design and development at Nuance, one of the world's largest independent companies working on speech recognition and text-to-speech. Brant Ward (J. Brant Ward) used to be a composer, writing parts for everything from string quartets to synthesizers; now he composes with synthetic voices. He has worked in the Silicon Valley speech synthesis industry for over a decade.
Text-to-speech is a very competitive industry, and its employees are very secretive. Though the world believes that Nuance creates the voice of Siri, Ward and his colleague David Vazquez avoid giving a direct answer. Nevertheless, they agreed to explain, at least in general terms, how these amazing machine voices are created.
Needless to say, the actor does not have to articulate and record every word in the dictionary. But an application that has to read out any news item from your feed, or find something for you on the Internet, must be able to say every word in the dictionary.
Most of the sentences are selected for their "phonetic richness" - that is, they contain many different combinations of phonemes. "The fact is, the more data we have, the more realistic the result will be," says Ward.
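To get a feel for what "phonetic richness" means in practice, here is a minimal, hypothetical sketch in Python. It greedily picks sentences that add the most new diphones (adjacent phoneme pairs) to the recording script; the sentences, transcriptions, and function names are made-up illustrations, and the real script-selection tools are of course far more sophisticated.

```python
# Hypothetical sketch: greedy selection of "phonetically rich" sentences.
# A sentence is valued by how many new diphones (adjacent phoneme pairs)
# it adds to those already covered by the recording script.

def diphones(phonemes):
    """Return the set of adjacent phoneme pairs in a transcription."""
    return {(a, b) for a, b in zip(phonemes, phonemes[1:])}

def pick_script(candidates, target_size):
    """Greedily pick sentences that maximize diphone coverage.

    candidates: list of (sentence, phoneme list) pairs.
    """
    covered, script = set(), []
    for _ in range(target_size):
        if not candidates:
            break
        best = max(candidates, key=lambda c: len(diphones(c[1]) - covered))
        covered |= diphones(best[1])
        script.append(best[0])
        candidates = [c for c in candidates if c[0] != best[0]]
    return script, covered

# Toy example with made-up ARPAbet-style transcriptions.
candidates = [
    ("The cat sat.", ["DH", "AH", "K", "AE", "T", "S", "AE", "T"]),
    ("Go home now.", ["G", "OW", "HH", "OW", "M", "N", "AW"]),
    ("She sells shells.", ["SH", "IY", "S", "EH", "L", "Z", "SH", "EH", "L", "Z"]),
]
script, covered = pick_script(candidates, target_size=2)
print(script, len(covered))
```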
After the text has been recorded by the live voice actor (a tedious process that can take several months), the truly hard work begins. Words and sentences are analyzed, split into categories, and stored in a large database. A dedicated team of linguists is involved in this complex work, along with the company's own linguistic software.
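The article does not say how Nuance structures this database, but a plausible simplified picture is a catalogue of small labeled slices of the recordings, each annotated with its phonetic context and prosody. The field names below are illustrative assumptions, not an actual schema.

```python
# Hypothetical sketch of one entry in such a unit database: a short slice of
# the actor's recording, labeled with the information needed to reuse it later.

from dataclasses import dataclass

@dataclass
class SpeechUnit:
    diphone: tuple[str, str]   # e.g. ("K", "AE"): a transition between two phonemes
    audio_file: str            # which studio recording it was cut from
    start_ms: int              # position of the slice within that recording
    end_ms: int
    pitch_hz: float            # average fundamental frequency of the slice
    stressed: bool             # whether it came from a stressed syllable
    phrase_position: str       # "initial", "medial", or "final" in the sentence

# One labeled unit, with made-up values.
unit = SpeechUnit(("K", "AE"), "session_012.wav", 15_320, 15_470,
                  210.0, True, "medial")
print(unit.diphone, unit.end_ms - unit.start_ms, "ms")
```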
Once all of this is done, Nuance's text-to-speech engine can assemble words and phrases that the actor may never have actually uttered, yet they sound very much like the actor's speech, because technically it is the actor's voice.
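In other words, this is essentially concatenative (unit-selection) synthesis: to say a word the actor never recorded, the engine strings together small recorded pieces of other words. Here is a heavily simplified, hypothetical sketch of that idea; the inventory and snippets are invented, and real systems also weigh prosody, smooth the joins, and search far larger inventories.

```python
# Hypothetical sketch of concatenative synthesis: build a new word from
# diphone-sized pieces the actor recorded while saying *other* words.

# A made-up inventory mapping diphones to audio snippets (here, just labels).
inventory = {
    ("S", "IH"): "cut from 'sit'",
    ("IH", "R"): "cut from 'mirror'",
    ("R", "IY"): "cut from 'really'",
}

def synthesize(phonemes):
    """Pick one recorded snippet per diphone and 'concatenate' them."""
    pieces = []
    for pair in zip(phonemes, phonemes[1:]):
        snippet = inventory.get(pair)
        if snippet is None:
            raise KeyError(f"no recording covers the diphone {pair}")
        pieces.append(snippet)
    return pieces  # a real engine would crossfade the audio at each join

# "Siri" (S IH R IY) was never recorded as a word, but its diphones were.
print(synthesize(["S", "IH", "R", "IY"]))
```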
Speaking is an unconscious process. We do it without thinking about how it happens: where our tongue is positioned, how phonemes connect to one another, and so on - and yet we easily and effectively express complex ideas and emotions. But for a computer to reproduce the sound of a human voice, all of these factors have to be taken into account. As one professor of linguistics put it, the task is "titanic."
You should not think, "I'm talking to a computer." You should not have to think about it at all.
"My children interact with of Siri, as if it were a living creature... They do not feel the difference," - says Ward.
At this rate, friendship between humans and robots, much like friendship between humans themselves, is not far off. Many people would like Siri to be able to recognize the speaker's emotional state and react to it somehow (for example, by switching to a soothing voice mode). Imagine talking to a robot that can figuratively pat you on the head. Perhaps Nuance is already thinking about it...