Going further: Microsoft’s new AI imitates human voices from just 3 seconds of audio

Since its launch, ChatGPT has proven so capable that many programmers worry it will take their jobs.

Now there is even more worrying news: a team of researchers at Microsoft has revealed a new AI that can accurately mimic a human voice from an audio sample only three seconds long. Just three seconds. Could you do that yourself?

Microsoft’s voice-generating AI tool, Vall-E, was trained on 60,000 hours of speech, much of which comes from LibriVox’s public audiobooks.

Vall-E is built on a technology called EnCodec. It works by analyzing a person’s voice, breaking that information into components, and using what it learned in training to synthesize how that voice would sound speaking different phrases.

Even after hearing a sample of just three seconds, Vall-E reproduces the speaker’s timbre and expressive tone very accurately.

In testing, Vall-E outperforms state-of-the-art zero-shot TTS systems (AI that generates voices it has never heard before) in terms of the naturalness of the voice and its similarity to the speaker.

If you want to hear the voices that Vall-E reproduces, check out the demo hosted on GitHub: https://valle-demo.github.io/

Source: TechSpot
