During Wednesday's "Inside the Lab: Building for the metaverse with AI" livestream event, Meta CEO Mark Zuckerberg didn't just expound on his company's unblinking vision for the future, dubbed the Metaverse. He also revealed that Meta's research division is working on a universal speech translation system that could streamline users' interactions with AI within the company's digital universe.

"The big goal here is to build a universal model that can incorporate knowledge across all modalities... all the information that is captured through rich sensors," Zuckerberg said. "This will enable a vast scale of predictions, decisions, and generation, as well as whole new architectures, training methods, and algorithms that can learn from a vast and diverse range of different inputs."

Zuckerberg noted that Facebook has continually striven to develop technologies that enable more people worldwide to access the internet, and said he is confident that those efforts will translate to the Metaverse as well.

"This is going to be especially important when people begin teleporting across virtual worlds and experiencing things with people from different backgrounds," he continued. "Now, we have the chance to improve the internet and set a new standard where we can all communicate with one another, no matter what language we speak, or where we come from. And if we get this right, this is just one example of how AI can help bring people together on a global scale." 

Meta's plan is twofold. First, Meta is developing No Language Left Behind, a translation system capable of learning "every language, even if there isn't a lot of text available to learn from," according to Zuckerberg. "We are creating a single model that can translate hundreds of languages with state-of-the-art results [in] most of the language pairs, everything from Asturian to Luganda to Urdu."

Second, Meta wants to create an AI Babelfish. "The goal here is instantaneous speech-to-speech translation across all languages, even those that are mostly spoken; the ability to communicate with anyone in any language," Zuckerberg promised. "That's a superpower that people dreamed of forever, and AI is going to deliver that within our lifetimes."

These are big claims from a company whose machine-generated domain doesn't extend below the belt line. However, Facebook-cum-Meta has a long and broad record of AI development. In the last year alone, the company has announced advances in self-supervised learning techniques, natural language processing, multimodal learning, text-based generation, and AI's understanding of social norms, and has even built a supercomputer to aid in its machine learning research.

The company still faces the major hurdle of data scarcity. "Machine translation (MT) systems for text translations typically rely on learning from millions of sentences of annotated data," Facebook AI Research wrote in a Wednesday blog post. "Because of this, MT systems capable of high-quality translations have been developed for only the handful of languages that dominate the web."

Translating between two languages that aren't English is even more challenging, according to the FAIR team. Most MT systems first convert speech in one language to text, translate that text into the second language, and then convert the translated text back to speech. This slows the translation process and creates an outsized dependence on the written word, limiting the effectiveness of these systems for primarily oral languages. Direct speech-to-speech systems, like the one Meta is working on, would not be hindered in that way, resulting in a faster, more efficient translation process.