Published on Dec 31 2023 at 10:50
Imagine being able to go on vacation to a place where you don't speak the language and still understand everything said to you, thanks to a virtual reality headset or connected glasses. That future is fast approaching.
Last August, FAIR, Meta's artificial intelligence research group, released a new language model called "SeamlessM4T." The name stands for "Massively Multilingual and Multimodal Machine Translation": a model that can translate between a multitude of languages, across multiple modalities.
A “Swiss Army Knife” model
The model can handle nearly 100 languages when translating speech to text or one text to another. When it comes to producing spoken output, however (translating speech to speech, or text to speech), it accepts input in those 100 languages but can only generate speech in 35 of them.
"We focused on an all-in-one approach," emphasizes Juan Pino, one of the researchers behind this work. "It's a model that can perform many tasks, a bit like a Swiss Army Knife." Furthermore, it is available as open source, which allows start-ups to use it to build their own tools.
Floating subtitles
At an event in San Francisco at the end of November, the team behind this advance gave a demonstration. Journalists were invited to wear a Quest virtual and augmented reality headset, a brand that belongs to Meta, the parent company of Facebook. The researchers then took turns speaking in Spanish, French and Mandarin.
The headset translated their words into English as they spoke. Journalists heard the English version of each speech through the headset while English subtitles appeared on the display, giving the impression that the translations were floating in front of the researchers.
Fast, nearly error-free translation
The result is stunning. The model is as fast as the simultaneous interpretation provided at international institutions such as the UN or the European Parliament. Although it was not trained specifically on these kinds of speeches, the researchers say, it makes very few errors.
Even more surprising, the model can reproduce the timbre of the speaker's voice, so that the result sounds more natural. For now, the headset is a bit bulky; we were happy to take it off, even after a demo of only twenty minutes. But if this model one day runs on connected glasses, its use could become much more common.
Recognizing the language
For now, this is a research project, not a finished product. And it does not translate the speech of the person wearing the headset, which means that both interlocutors must have a device of their own in order to hold a real conversation.
However, the model can already recognize which language is being spoken, even when the speaker mixes several languages. It translates as it goes, without waiting for the end of the sentence, which raises multiple challenges, according to FAIR's researchers.
"This means that the model must operate with limited information," explains Juan Pino. "Another challenge is that languages have different word orders." In German, for example, the verb often comes at the end; in Korean, the word order is sometimes the complete opposite of an English sentence.