The Mona Lisa can now do more than smile, thanks to new artificial intelligence technology from Microsoft.
Last week, Microsoft researchers detailed a new AI model they’ve developed that can take a still image of a face and an audio clip of someone speaking and automatically create a realistic-looking video of that person speaking. The videos, which can be made from photorealistic faces as well as cartoons or artwork, are complete with compelling lip syncing and natural face and head movements.
In one demo video, researchers showed how they animated the Mona Lisa to recite a comedic rap by actor Anne Hathaway.
Outputs from the AI model, called VASA-1, are both entertaining and a bit jarring in their realness. Microsoft said the technology might be used for education or “improving accessibility for individuals with communication challenges,” or potentially to create virtual companions for humans. But it’s also easy to see how the tool might be abused and used to impersonate real people.
It’s a concern that goes beyond Microsoft: as more tools to create convincing AI-generated images, videos and audio emerge, experts worry that their misuse might lead to new forms of misinformation. Some also worry the technology might further disrupt creative industries from film to advertising.
For now, Microsoft said it doesn’t plan to release the VASA-1 model to the public immediately. The move is similar to how Microsoft partner OpenAI is handling concerns around its AI-generated video tool, Sora: OpenAI teased Sora in February, but has so far only made it available to some professional users and cybersecurity professors for testing purposes.
“We are opposed to any behavior to create misleading or harmful contents of real persons,” Microsoft researchers said in a blog post. But, they added, the company has “no plans to release” the product publicly “until we are certain that the technology will be used responsibly and in accordance with proper regulations.”
Microsoft’s new AI model was trained on numerous videos of people’s faces while speaking, and it’s designed to recognize natural face and head movements, including “lip motion, (non-lip) expression, eye gaze and blinking, among others,” researchers said. The result is a more lifelike video when VASA-1 animates a still photo.
For example, in one demo video set to a clip of someone sounding agitated, apparently while playing video games, the face speaking has furrowed brows and pursed lips.
The AI tool can also be directed to produce a video where the subject is looking in a certain direction or expressing a specific emotion.
On close inspection, there are still signs that the videos are machine-generated, such as infrequent blinking and exaggerated eyebrow movements. But Microsoft said it believes its model “significantly outperforms” other, similar tools and “paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”
Implications of AI-generated Videos
Microsoft’s new artificial intelligence (AI) technology creates highly realistic videos from a still image and an audio clip. The technology has implications, both positive and negative, for industries from film to advertising and for society as a whole.
On the positive side, the AI technology opens up avenues for education and accessibility for individuals with communication challenges. By creating realistic videos with compelling lip syncing and natural face and head movements, the technology has the potential to improve learning experiences and make content more engaging and accessible.
Additionally, the AI-generated videos might be used to create virtual companions for humans, providing company and support for those in need. This has the potential to alleviate loneliness and improve mental well-being, particularly in situations where human companionship is limited.
However, there are significant concerns surrounding the misuse of AI-generated videos. The technology’s ability to create realistic videos raises the risk of impersonation and the creation of misleading or harmful content. With the growing availability of AI tools that can generate convincing images, videos, and audio, experts worry that the prevalence of misinformation will increase and new forms of deception will emerge.
This issue extends beyond Microsoft and highlights the need for proper regulations and responsible use of AI technology. Steps must be taken to minimize the potential for abuse and misinformation. While Microsoft has stated its opposition to creating misleading or harmful content, the company is withholding the AI model from public release until it is confident the technology will be used responsibly and in accordance with regulations.
Future Trends and Recommendations
As AI technology continues to advance, it is essential to anticipate where AI-generated video is heading and to address its implications. The following are some considerations and recommendations for the industry:
- Regulations: Governments and regulatory bodies should establish guidelines and frameworks to ensure responsible use of AI-generated videos. Clear rules and ethical standards will help prevent misuse and protect individuals from harm.
- Media and Journalism: The emergence of AI-generated videos raises questions regarding the authenticity of visual content. Journalists and media organizations should be mindful of this trend and develop methods to verify the authenticity of videos to maintain trust and ensure accurate reporting.
- Security and Privacy: The increased prevalence of AI-generated videos calls for enhanced security measures and privacy protections. Safeguarding personal information and preventing unauthorized use of AI technology should be a priority.
- Education and Awareness: Society needs to be informed about the capabilities and risks of AI-generated videos. Public education campaigns and awareness programs can help individuals understand the potential for manipulation and make informed decisions when encountering such videos.
- Ethical Considerations: As AI technology advances, ethical discussions surrounding its use become crucial. Promoting transparency, accountability, and responsible decision-making in AI development and deployment will help mitigate potential negative consequences.
While the AI-generated videos showcased by Microsoft are impressive, they also raise important questions regarding the boundaries of technology and its impact on society. It is vital to approach the development and implementation of AI with caution, ensuring that its potential is harnessed for the benefit of humanity while minimizing the risks of misuse and deception.