So how long will it take for the Babel fish to become a reality?


“We will have a digital Babel fish that hides in your ear and translates all the languages of the world – in ten years.”

Raj Reddy

If you’ve used Siri or Alexa, you owe a lot to Dr. Raj Reddy, an Indian-American AI pioneer. Reddy and his colleagues have been pushing the boundaries of AI for decades. He also has a habit of making friendly bets with his peers on futuristic technologies. Reddy typically proposes an idea or innovation that borders on over-optimism, and his colleagues take the opposing side of his almost impossible-sounding techno-optimism.

Last week, Reddy and his fellow computer scientists Gordon Bell, John Hennessy, Ed Lazowska, and Andy Van Dam got together for a virtual event organized by CHM to celebrate Reddy. And Reddy came up with a new bet: that in ten years’ time we will have a 21st-century version of the Babel fish, in the form of an earpiece that can translate hundreds of languages in real time. The Babel fish is a fictional creature from the science fiction classic The Hitchhiker’s Guide to the Galaxy.


According to records kept by Bell, who has always bet against him, here are some of the predictions Reddy lost:

  • Reddy predicted that by 1996 video-on-demand would be available in 5 cities with more than 250,000 people accessing the service. He missed the mark by at least a decade.
  • By 2002, 10,000 workstations would be communicating at GB per second.
  • Reddy thought people would recognize the importance of AI by 2003. But it wasn’t until the publication of the groundbreaking AlexNet paper in 2012 that the power and impact of AI were widely recognized. In the decade since that ImageNet competition, AI has added many feathers to its cap; it went from recognizing faces to creating faces (deepfakes). Today we have language models like GPT-3 that produce human-like text.

Reddy’s predictions tend to run at least a decade ahead of their time. A few factors help gauge how realistic the Babel fish idea is:

1 | Status of multilingual translation

Language models like GPT-3 and BERT still struggle with issues like bias, even for languages with abundant data such as English. However, language models have already started to cover regional languages as well, and these innovations have been built into products like Google Lens and Google Translate. Image-to-text and text-to-image systems have made great strides in recent years. When it comes to speech recognition, however, ML models still fall short. For example, an Alexa device trained on American English may have problems in an Indian environment.

The direction in which speech recognition is moving makes Reddy’s bet all the more interesting. For example, researchers on Amazon’s Alexa team have moved to end-to-end deep learning models in place of separate, specialized acoustic and language models. The neural networks in these end-to-end models take the acoustic speech signal as input and output the transcribed speech directly. This eliminates the overhead (think: latency) that results from chaining specialized models.

According to Shehzad Mevawalla, director of automatic speech recognition for Alexa, the fully neural representation reduced the model to 1/100 of its previous size. “These models can then be deployed to our devices and run on our own Amazon AZ1 neural processor – a neural accelerator optimized for running deep neural networks,” said Mevawalla. Researchers are using semi-supervised learning to deal with the massive amounts of unannotated voice data generated by millions of Alexa devices around the world. Meanwhile, Facebook AI researchers claim to have developed a model that outperforms semi-supervised models. Their wav2vec model can learn representations from speech audio even with less labeled data.
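To make the idea of semi-supervised learning on unannotated data concrete, here is a minimal sketch of self-training (pseudo-labeling), one common form of it. This is not the Alexa team's method, which the article does not describe; the toy one-dimensional data, the nearest-centroid classifier, and the names `self_train` and `min_margin` are all assumptions made purely for illustration.

```python
# Illustrative sketch only: self-training with a toy nearest-centroid
# classifier on 1-D numbers. All names and data here are hypothetical.

def centroid(values):
    """Mean of a list of numbers."""
    return sum(values) / len(values)

def predict(x, centroids):
    """Return the nearest label and the margin over the runner-up."""
    distances = sorted((abs(x - c), label) for label, c in centroids.items())
    (nearest, label), (second, _) = distances[0], distances[1]
    return label, second - nearest

def self_train(labeled, unlabeled, min_margin=1.0):
    """Fit on labeled data, then adopt confident predictions as labels."""
    data = {label: list(values) for label, values in labeled.items()}
    centroids = {label: centroid(values) for label, values in data.items()}
    for x in unlabeled:
        label, margin = predict(x, centroids)
        if margin >= min_margin:  # keep only confident pseudo-labels
            data[label].append(x)
    # Refit using both the original labels and the pseudo-labels.
    return {label: centroid(values) for label, values in data.items()}

labeled = {"low": [1.0, 2.0], "high": [9.0, 10.0]}
unlabeled = [0.5, 1.5, 5.4, 9.5, 11.0]
centroids = self_train(labeled, unlabeled)
```

The ambiguous point 5.4 sits almost halfway between the two classes, so it is left out, while the confident points refine the class centroids without any human annotation.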

The mathematical side of the bet looks promising, but what about the physical side?


2 | Compactness

Another major challenge to Reddy’s forecast is AI’s enormous appetite for computing power. Anyone who has used a CPU or GPU to train ML models will have run into heating issues. Although the Alexa team claims to have minimized memory and computational requirements through quantization techniques, the compactness of the earpiece will still be an issue. On-device ML has already been realized through edge devices and frameworks like TensorFlow Lite, but a highly efficient, slim earpiece with high-speed connectivity remains a distant dream. While the limits of Moore’s Law constrain the physical side of Reddy’s forecast, the lack of adequately curated language datasets could pose another challenge. “If we can’t figure out how to extend Moore’s Law, put that in your ear and it’ll burn your head,” said Ed Lazowska, ACM Fellow and computer scientist.
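The quantization mentioned above can be illustrated with a small sketch. This is a generic symmetric 8-bit scheme, not the Alexa team's actual technique, which the article does not detail; the function names and sample weights are hypothetical. Storing each weight as an 8-bit integer instead of a 32-bit float cuts memory for the weights to a quarter, at the cost of a small rounding error.

```python
# Illustrative sketch only: symmetric 8-bit quantization of model weights.
# Function names and sample values are hypothetical.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.4]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding to the nearest step means the error per weight is at most
# about half of one quantization step (scale / 2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Real deployments typically quantize per layer or per channel and may also quantize activations, but the trade-off is the same: less memory and cheaper arithmetic in exchange for bounded precision loss.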

“There is a spectrum of cocky techno-optimism that Raj occupies.”

Andy Van Dam

Andy Van Dam, another ACM Fellow and professor of computer science, said Reddy’s prediction falls within the “spectrum of cocky techno-optimism.” Van Dam also raised the question of financial incentives and whether the market for such a product is ripe. There is also the challenge of user interface design. “I try to be a pragmatist. I think we’ll get close, but not quite yet,” Van Dam concluded.

Reddy himself, however, is a little skeptical of his own prediction. Accessibility, he said, is the reason he might lose this bet. “Technology is not enough, you need accessibility and ease of use. It has to be completely unobtrusive, like a Babel fish, to fit in our ears, recognize the language and translate,” said Reddy.
