A first online tool can now generate videos from text, following the model of Midjourney and other image-generation AIs. For the moment, though, the results are still rough.
If ChatGPT's answers or Midjourney's visuals still amaze you, the next stage of artificial intelligence may surprise you even more: generating videos from simple text. That is what Modelscope, a tool still in its infancy, offers.
Like other generative AIs, Modelscope creates short videos from a "prompt", that is, a written instruction. One of the first ideas a Reddit user had was to make actor Will Smith eat spaghetti, and the result, viewed more than 4 million times on Twitter, is frankly frightening.
Another user picked up the same idea, this time with actress Scarlett Johansson, again laboriously eating spaghetti.
Internet users, in fact, seem to have a passion for making celebrities eat all kinds of things, from pizza to cake. Modelscope's obvious shortcomings give these videos a still very rough, even nightmarish, look.
Many of the generated videos display a Shutterstock watermark, which suggests that the tool draws its base visuals from online image and video stock libraries.
For now, Modelscope can be used via the Hugging Face platform, but it is completely saturated. Other models are in fact currently being rolled out, but none is yet mature enough, which was also the case for Midjourney and Stable Diffusion just a year ago.
OpenAI, the creator of ChatGPT and DALL-E, is also working on a similar AI that should, once deployed, show far more convincingly what generative AI can do in this area.
Thomas Leroy, Journalist, BFM Business