2024-02-26 11:19:31
Stability AI recently introduced two text-to-image models: Stable Cascade, available in research preview under a non-commercial license, and Stable Diffusion 3, the next generation of its flagship model. While the former is built on the Würstchen architecture to improve performance and accuracy, the latest iteration of Stable Diffusion uses a new architecture combining a diffusion transformer with flow matching.
Stable Cascade is a highly efficient model that, according to Stability AI, is “exceptionally easy to train and fine-tune on consumer hardware thanks to its three-stage approach,” the Würstchen architecture.
It is built on a pipeline of three distinct models, Stages A, B and C, an architecture that enables hierarchical compression:
- The latent generator (Stage C) transforms textual inputs into compact 24×24 latents;
- The latent decoders (Stages A and B) decompress the latents into high-resolution images;
- The control network (ControlNet) makes it possible to adjust the characteristics of the generated images.
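The three-stage flow described above can be sketched as a simple function pipeline. This is purely illustrative: the function names, the Stage B upsampling factor, and the placeholder `Latent` type are assumptions for the sake of the sketch, not Stability AI's actual API.

```python
from dataclasses import dataclass

# Illustrative stand-ins for the cascade's three stages.
# Only the 24x24 latent and 1024x1024 output sizes come from the article;
# everything else (names, the x4 factor) is hypothetical.

@dataclass
class Latent:
    height: int
    width: int

def stage_c(prompt: str) -> Latent:
    # Latent generator: turn the text prompt into a compact 24x24 latent.
    return Latent(24, 24)

def stage_b(latent: Latent) -> Latent:
    # First decoder stage: upsample the latent (x4 is an illustrative factor).
    return Latent(latent.height * 4, latent.width * 4)

def stage_a(latent: Latent) -> Latent:
    # Final decoder stage: decode to the full-resolution 1024x1024 image.
    return Latent(1024, 1024)

image = stage_a(stage_b(stage_c("a chameleon against a black background")))
print(image.height, image.width)  # 1024 1024
```

The point of the structure is that only Stage C operates on text conditioning; Stages B and A are fixed decompressors, which is why fine-tuning can focus on the tiny Stage C latent space.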
While Stable Diffusion compresses 1024×1024 images to 128×128 latents, Stable Cascade compresses them to 24×24, resulting in faster inference and lower training costs. It produces complex images in just 30 inference steps, compared with 50 for competing models such as Playground v2, SDXL, SDXL Turbo and Würstchen v2.
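The compression gap behind those numbers is easy to quantify: a back-of-envelope comparison of the two latent grids (using only the 128×128 and 24×24 figures from the article; the calculation itself is ours, not Stability AI's).

```python
# Back-of-envelope latent size comparison (illustrative, not Stability AI's code).
def latent_cells(side: int) -> int:
    """Number of spatial cells in a square latent grid."""
    return side * side

sd_cells = latent_cells(128)      # Stable Diffusion: 1024x1024 image -> 128x128 latent
cascade_cells = latent_cells(24)  # Stable Cascade:   1024x1024 image -> 24x24 latent

print(sd_cells)                   # 16384
print(cascade_cells)              # 576
print(sd_cells / cascade_cells)   # ~28.4x fewer cells for Stage C to denoise
```

Roughly 28 times fewer latent cells per denoising step is what makes the faster inference and cheaper training plausible.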
The model handles complicated descriptions, generates fine details, and tracks style and color variations. According to human evaluation, it far surpasses other models in perceived quality.
Each stage of the cascade can be fine-tuned for specific needs, allowing control over the level of detail, resolution, style and color of images. In addition, the model supports a control network (ControlNet), which allows fine modifications to the generated images, such as changing the position, size, shape or color of objects.
Stable Cascade is available for research preview under a non-commercial license; the code for inference, training, fine-tuning and ControlNet is published on Stability AI's GitHub page. You can try it on Hugging Face here.
Stable Diffusion 3
Stability AI announced on February 22 the opening of the waiting list for an early preview of its latest model. According to the start-up, it delivers a clear improvement in performance on multi-subject prompts, image quality and spelling capabilities.
The Stable Diffusion 3 suite includes models ranging from 800M to 8B parameters, giving users a range of options to suit their specific creative needs. Stability AI specifies only that the model combines a diffusion transformer architecture with flow matching; a detailed technical report is planned.
Here are some examples of model-generated images shared by the startup:
Prompt: An epic anime artwork of a wizard on top of a mountain at night casting a cosmic spell into the dark sky that says “Stable Diffusion 3” made of colorful energy.
Prompt: Close-up studio photo of a chameleon against a black background.
Prompt: A painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words “stable diffusion ”.