Communication has evolved alongside humanity, advancing with every leap in technology. Making communication seamless across the globe has helped grow businesses, build relationships, and drive shared progress.

It's not always what you say that counts, but how you say it!

That’s why Google has created a translator that interprets not only your language but also your tone, with the help of Artificial Intelligence.

Google calls it Translatotron. The researchers have explained the system in detail in Google's recent blog post. We're here so that you don't have to read through pages and pages of technical stuff. Let's get on with it.

What’s a Translatotron?

Traditional translation systems rely on three fundamental components: automatic speech recognition (ASR), machine translation, and text-to-speech synthesis (TTS). Automatic speech recognition transcribes the source speech into text, machine translation translates that text, and TTS reads the translation out in a robotic voice. This cascade is how a traditional translation system operates.
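To make that cascade concrete, here is a minimal Python sketch of the three-stage pipeline. The function names and their toy return values are hypothetical placeholders, not a real API; the point is simply that each stage only sees the previous stage's text output, so the sound of the original voice is discarded at the very first step.

```python
# A minimal sketch of the traditional "cascade" pipeline described above.
# All three stage functions are hypothetical placeholders, not a real API.

def automatic_speech_recognition(source_audio: bytes) -> str:
    """Transcribe the source-language audio into text (placeholder)."""
    return "hola, ¿cómo estás?"  # pretend ASR output

def machine_translation(source_text: str) -> str:
    """Translate the transcript into the target language (placeholder)."""
    return "hello, how are you?"  # pretend MT output

def text_to_speech(target_text: str) -> bytes:
    """Synthesize the translated text as audio (placeholder)."""
    return target_text.encode()  # pretend waveform

def cascade_translate(source_audio: bytes) -> bytes:
    # Each stage only receives the previous stage's output, so any
    # tone or vocal character in the source audio is lost after step 1.
    transcript = automatic_speech_recognition(source_audio)
    translation = machine_translation(transcript)
    return text_to_speech(translation)
```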

Google has devised a new system that operates as a 'sequence-to-sequence' model. It converts speech to speech directly, without depending on an intermediate text transcription the way the traditional model does. As a result, the model can capture both the language and the tone of the source speaker.

How does it work?

Translatotron uses spectrograms both to process its input and to generate its output. In a single, multi-layered process, it maps speech in the source language to speech in the target language without relying on an intermediate text transcription.

A spectrogram is a detailed breakdown of the frequencies in an audio signal as they vary over time; spectrograms are also known as sonographs, voiceprints, or voicegrams. Working directly on spectrograms is what enables the system's direct translation process.
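As an illustration, here is a small sketch of computing a spectrogram with NumPy and SciPy. The synthetic sine wave and the 16 kHz sample rate are assumptions made for the example; Translatotron's actual front end may use a different representation, such as a mel-scaled spectrogram.

```python
# A minimal sketch of turning audio into a spectrogram with SciPy,
# assuming a mono 16 kHz signal; not Translatotron's actual front end.
import numpy as np
from scipy import signal

sample_rate = 16_000
t = np.linspace(0, 1.0, sample_rate, endpoint=False)
waveform = np.sin(2 * np.pi * 440 * t)  # stand-in for recorded speech

# Short-time Fourier transform: frequency content over time.
frequencies, times, spectrogram = signal.spectrogram(
    waveform, fs=sample_rate, nperseg=400, noverlap=240
)
print(spectrogram.shape)  # (frequency bins, time frames)
```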

Retains vocal characteristics

The process is quicker than the traditional translation pipeline, but the real perk is that it carries emotion. The system uses a vocoder, which converts the output spectrogram back into a waveform, together with a speaker encoder that helps retain the speaker's vocal characteristics in the translated speech. So, instead of an emotionless robotic voice, the translation comes out in the tone and voice of the original speaker.
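Below is a hedged sketch of how that conditioning might be wired together: a speaker encoder summarizes the source voice as a vector, the translation model is conditioned on that vector, and a vocoder turns the output spectrogram back into audio. Every function here is a toy placeholder for illustration, not Google's actual Translatotron code.

```python
# Hypothetical sketch of speaker-conditioned speech-to-speech translation.
# All functions are toy stand-ins, not the real Translatotron components.
import numpy as np

def speaker_encoder(source_spectrogram: np.ndarray) -> np.ndarray:
    """Summarize the speaker's vocal characteristics as a fixed-size vector (placeholder)."""
    return source_spectrogram.mean(axis=1)  # toy embedding

def translate_spectrogram(source_spectrogram: np.ndarray,
                          speaker_embedding: np.ndarray) -> np.ndarray:
    """Sequence-to-sequence translation conditioned on the speaker embedding (placeholder)."""
    return source_spectrogram  # stand-in for the model's output spectrogram

def vocoder(target_spectrogram: np.ndarray) -> np.ndarray:
    """Convert the output spectrogram back into a waveform (placeholder)."""
    return target_spectrogram.flatten()  # stand-in waveform

source = np.random.rand(80, 100)        # fake input spectrogram
embedding = speaker_encoder(source)     # captures the voice, not the words
waveform = vocoder(translate_spectrogram(source, embedding))
```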

The system isn't ready to roll out just yet. The samples shared on Google's GitHub page still sound fairly robotic, and the translations are far from perfect, but the technology offers an impressive glimpse of what communication could look like in the future!