A new version of Stability AI’s AI image generator, Stable Diffusion XL (SDXL), has been released. The most recent version, SDXL 0.9, produces visuals that are more realistic than those of its predecessor. It also renders hands more accurately, a long-standing flaw in AI-generated images.
Stability AI claims that the new model is “a leap in creative use cases for generative AI imagery.” The example images in its blog post, generated with the same prompts in both SDXL 0.9 and the Stable Diffusion XL beta, show the improvement.
The improvements show up in many areas, but the most notable is more accurate hands. Before this update, AI image generators often produced spaghetti-like hands that ruined both the realism and the beauty of an image. With SDXL 0.9, that should no longer be a problem.
Everything you need to know about SDXL 0.9
The main factor behind SDXL 0.9’s compositional improvement over the beta version is its parameter count, which is the total number of weights and biases in the neural network that makes up the model.
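To make the idea of a parameter count concrete, here is a minimal sketch that tallies the weights and biases of a small fully connected network. The function name and the layer sizes are hypothetical and chosen only for illustration. They say nothing about SDXL's actual architecture, which is far larger and not built from plain dense layers.

```python
def count_dense_parameters(layer_sizes: list[int]) -> int:
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out  # one weight per input-output connection
        biases = n_out          # one bias per output unit
        total += weights + biases
    return total


# Hypothetical toy network: 784 inputs, 128 hidden units, 10 outputs.
toy_layers = [784, 128, 10]
print(count_dense_parameters(toy_layers))  # 101770
```

Even this toy network has about a hundred thousand parameters, so a model counted in billions is several orders of magnitude larger.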
With a 3.5 billion parameter base model and a 6.6 billion parameter ensemble pipeline, SDXL 0.9 has one of the highest parameter counts of any open-source image model. The pipeline runs in two stages: the base model generates an initial output, and a second-stage model then refines it.
“Despite its ability to be run on a modern consumer GPU, SDXL 0.9 presents a leap in creative use cases for generative AI imagery. The ability to generate hyper-realistic creations for films, television, music, and instructional videos, as well as offering advancements for design and industrial use, places SDXL at the forefront of real-world applications for AI imagery,” said Stability AI in its blog post.
Aliens, wolves, and a person holding a coffee cup are among the example images, which the newest model appears to render with higher resolution and more lifelike hands. Before the March release of Midjourney v5, a competing platform built on Discord, hands were an easy “tell” for identifying AI-generated art.
What about the SDXL beta launch statistics?
So, what about the feedback from the community? Luckily, Stability AI answered the question in its blog post.
“Since SDXL’s beta launch on April 13, we’ve had great responses from our Discord community of users numbering nearly 7,000. These users have generated more than 700,000 images, averaging more than 20,000 per day. More than 54,000 images have been entered into Discord community ‘Showdowns’ with 3,521 SDXL images nominated as winners,” Stability AI said.
SDXL 0.9 system requirements
Despite its robust output and sophisticated model design, SDXL 0.9 can run on a recent consumer GPU. It requires a computer running Windows 10, Windows 11, or Linux, 16GB of RAM, and an Nvidia GeForce RTX 20-series graphics card (or higher) with at least 8GB of VRAM. Linux users can also use a compatible AMD card with 16GB of VRAM.
- OS: Windows 10, Windows 11, or Linux
- RAM: 16GB
- GPU: Nvidia GeForce RTX 20-series (or higher) with at least 8GB of VRAM; on Linux, a compatible AMD card with 16GB of VRAM also works
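The requirements above can be expressed as a simple check. This is only an illustrative sketch: `SystemSpec` and `meets_sdxl_requirements` are hypothetical names, not part of any Stability AI tooling, and the check does not model the GPU generation (RTX 20 or newer), only the operating system, RAM, vendor, and VRAM figures listed above.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements listed above.
SUPPORTED_OS = {"windows 10", "windows 11", "linux"}
MIN_RAM_GB = 16
MIN_NVIDIA_VRAM_GB = 8
MIN_AMD_VRAM_GB = 16  # AMD cards are listed for Linux only


@dataclass
class SystemSpec:
    """Hypothetical description of a machine, for illustration only."""
    os_name: str
    ram_gb: int
    gpu_vendor: str
    vram_gb: int


def meets_sdxl_requirements(spec: SystemSpec) -> bool:
    """Return True if the machine satisfies the listed minimums."""
    os_name = spec.os_name.lower()
    vendor = spec.gpu_vendor.lower()
    if os_name not in SUPPORTED_OS or spec.ram_gb < MIN_RAM_GB:
        return False
    if vendor == "nvidia":
        # Simplification: the RTX 20-or-newer generation check is not modeled.
        return spec.vram_gb >= MIN_NVIDIA_VRAM_GB
    if vendor == "amd":
        return os_name == "linux" and spec.vram_gb >= MIN_AMD_VRAM_GB
    return False


print(meets_sdxl_requirements(SystemSpec("Windows 11", 16, "Nvidia", 8)))  # True
print(meets_sdxl_requirements(SystemSpec("Windows 11", 32, "AMD", 16)))    # False
```

The second machine fails because the article lists AMD cards as an option for Linux users only.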
AI-generated images are improving
AI-generated images continue to improve, thanks to innovations like Stability AI’s SDXL 0.9. However, this doesn’t mean that every other tool is at a similar level, and developers still face many issues.
Generating clear, high-resolution images is one of the primary challenges for AI image generators. Many models can only create images at low resolutions, such as 256×256 pixels or less, which is insufficient to capture the fine details of complex subjects like hands or faces.
Producing larger images requires more data and processing power, which is not always available or practical. Models may also produce hazy or distorted outputs, or suffer from mode collapse, where they generate similar or identical images for varied inputs. In either case, the images lose coherence and realism.
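One way to see why larger images are more demanding is to count pixels. The sketch below compares the 256×256 figure mentioned above with two larger square sizes, which are illustrative choices rather than figures from Stability AI. Pixel count is only a rough proxy, since the real compute and memory cost depends on the model architecture.

```python
def pixel_count(side: int) -> int:
    """Number of pixels in a square image with the given side length."""
    return side * side


baseline = pixel_count(256)
for side in (256, 512, 1024):
    pixels = pixel_count(side)
    ratio = pixels // baseline
    print(f"{side}x{side}: {pixels:,} pixels ({ratio}x the 256x256 baseline)")
```

Doubling the side length quadruples the number of pixels, so a 1024×1024 image contains 16 times as many pixels as a 256×256 one.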
Creating visuals that are consistent with the input or its context is another challenge for AI image generators. The models must understand the semantics and logic of the input in order to produce images that correspond to it.
For instance, if the input is a text description of an image, the model must produce a picture that accurately captures the content and style the text describes. This is not always simple, as the text could be unclear, incomplete, or contradictory, and the model might lack the knowledge or common sense to resolve it.
Additionally, the model may produce images that are irrelevant to or inconsistent with the text, such as a cat when the text describes a dog. However, tools like Stability AI’s SDXL 0.9 should help overcome these issues.
Featured image credit: Stability AI