2024-03-01 22:00:06
Researchers from Chinese giant Alibaba’s Institute for Intelligent Computing recently unveiled EMO (short for Emote Portrait Alive). Described in a research article published on February 27, this new artificial intelligence model is capable of animating portrait photos by generating videos of people speaking or singing with remarkable realism.
“Traditional techniques often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles,” says Linrui Tian, lead author of the paper. “To solve these problems, we propose EMO, a new tool that uses a direct audio-to-video synthesis approach, without the need for intermediate 3D models or facial landmarks,” he adds. In practice, the AI converts the audio provided directly into lip and facial movements, along with facial expressions that match the sound. “Experimental results show that EMO is capable of producing not only convincing speaking videos, but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism,” the Alibaba researchers explain. They specify that they trained EMO by building “a vast and varied audio-video database comprising more than 250 hours of footage and more than 150 million images.”
From a young Leonardo DiCaprio rapping an Eminem song, to the woman generated by OpenAI’s Sora model singing a Dua Lipa track, to the Mona Lisa reciting Shakespeare, or even the famous actress Audrey Hepburn singing an Ed Sheeran hit, the examples presented are striking.
Regarding the risk that this technology could be misused to spread false information, the researchers indicate that they plan to develop methods for detecting AI-generated videos.