My colleague Bünyamin Furkan Demirkaya received an email from Stability AI introducing Stable Diffusion 3.5 Medium, an open model free for commercial and non-commercial use. This model, with 2.5 billion parameters, is designed to run efficiently on consumer hardware, providing broader access to advanced AI image generation. Let’s explore what this new model offers and its compatibility with various GPUs.
Designed for consumer hardware
Stable Diffusion 3.5 Medium was created with accessibility in mind. Unlike many advanced models that require specialized, costly hardware, this model can operate on most consumer GPUs without any significant performance compromises. According to the email from Stability AI, “This model only requires 9.9 GB of VRAM (excluding text encoders) to unlock its full performance,” making it one of the most accessible options for hobbyists, creators, and small startups who lack the budget for high-end GPUs.
The hardware compatibility chart shared by Stability AI clearly illustrates this point. For instance, GPUs like the NVIDIA RTX 3080 and above can run Stable Diffusion 3.5 Medium with no performance trade-offs. Even more affordable GPUs, such as the NVIDIA RTX 4060 or RTX 3060, can manage this model, albeit with certain optimizations like quantization or sequential offloading.
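To see why quantization helps on smaller cards, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at different precisions. It uses only the 2.5 billion parameter count quoted above. The list of storage formats and the helper name `weight_memory_gb` are illustrative assumptions, not something taken from Stability AI's materials. The quoted 9.9 GB figure is higher than the 16-bit number below, presumably because inference needs memory for more than the weights.

```python
# Rough weights-only estimate; activations and other components are NOT counted.
PARAMS = 2.5e9  # parameter count quoted for Stable Diffusion 3.5 Medium

# Bytes per parameter for a few common storage formats (illustrative list).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return the size of the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt:>9}: {weight_memory_gb(PARAMS, size):.2f} GB")
```

Halving the bytes per parameter halves the weight footprint, which is the basic reason a quantized model can fit on a card with less VRAM.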
Hardware compatibility
The chart categorizes several GPUs by VRAM capacity and provides insight into which models are supported. Here’s a detailed breakdown:
- 8GB VRAM (NVIDIA GeForce RTX 4060): Models like Stable Diffusion 3.5 Medium can be run with some performance compromises, denoted by an orange symbol in the chart. Optimizations such as quantization are required to manage the limited VRAM effectively.
- 10GB VRAM (NVIDIA GeForce RTX 3080): Full compatibility without trade-offs, as represented by a green check. This implies the model runs smoothly, utilizing available VRAM to generate high-quality images efficiently.
- 12-16GB VRAM (NVIDIA GeForce RTX 4070, 4060 Ti, 4080, etc.): GPUs with more VRAM, like the NVIDIA RTX 4070 and AMD Radeon RX 7700 XT, have no issues running Stable Diffusion 3.5 Medium and similar models. These GPUs are powerful enough to operate the model “out of the box” without any modifications.
- 20GB+ VRAM (AMD Radeon RX 7900 XT, NVIDIA GeForce RTX 3090): Larger models, including FLUX.1 and Playground v2.5, can be run efficiently on these higher-capacity GPUs. This category is generally aimed at power users or professionals looking for more versatility in model usage.
- 32GB or greater (NVIDIA H100): These high-end GPUs can run any of the open image base models with ease, enabling even the largest models to run without limitations.
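The breakdown above can be expressed as a small lookup. This is a minimal sketch that transcribes only the tiers listed in this article. The function name `describe_tier` and the wording of each description are assumptions made for illustration, and the lookup considers VRAM capacity only, not other differences between cards.

```python
# Tiers transcribed from the list above, highest threshold first.
# Each entry is (minimum VRAM in GB, what the article says that tier can do).
TIERS = [
    (32, "runs any of the open image base models"),
    (20, "also fits larger models such as FLUX.1 and Playground v2.5"),
    (10, "runs SD 3.5 Medium with no trade-offs"),  # covers the 10GB and 12-16GB tiers
    (8, "runs SD 3.5 Medium with optimizations such as quantization"),
]


def describe_tier(vram_gb: float) -> str:
    """Return the description of the highest tier the given VRAM reaches."""
    for minimum, description in TIERS:
        if vram_gb >= minimum:
            return description
    return "below the smallest tier listed"


if __name__ == "__main__":
    for vram in (6, 8, 10, 16, 24, 80):
        print(f"{vram:>3} GB: {describe_tier(vram)}")
```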
Advanced multi-resolution capabilities
Stability AI describes Stable Diffusion 3.5 Medium as “delivering best-in-class image generation for its size.” The model’s advanced multi-resolution capabilities make it stand out among other medium-sized models. For creators, this means clearer images and a high level of detail, without the necessity of a powerful workstation.
The prompt adherence and aesthetic quality chart puts this claim in context by comparing Stable Diffusion 3.5 Medium with several other open models. Notably, the Elo scores for prompt adherence and aesthetic quality show that it performs on par with or better than most models of a similar size.
Performance comparison
The chart evaluates multiple open models by prompt adherence and aesthetic quality, using an Elo scoring system. The following insights can be drawn:
- Stable Diffusion 3.5 Large (8.1B): Ranks at the top for prompt adherence, meaning the model follows user input closely. This is critical for users aiming for high accuracy when generating images from prompts.
- FLUX.1 [dev] (12B): The model with the highest aesthetic quality rating. Its superior scores reflect its ability to produce visually appealing images that align well with user prompts. However, it requires significantly more hardware resources than medium-sized models like Stable Diffusion 3.5 Medium.
- Stable Diffusion 3.5 Medium (2.5B): As an efficient model with a strong balance between prompt adherence and image quality, it offers excellent output without the heavy resource demands of larger models. This makes it ideal for users who have limited hardware but want access to advanced image generation capabilities.
- Playground v2.5 (3.5B) and AuraFlow v0.2 (6.8B): These models, while providing decent performance, fall short of Stable Diffusion 3.5 Medium when it comes to balanced prompt adherence and quality. This makes them less suitable if precision and aesthetic quality are primary concerns.
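For readers unfamiliar with Elo scoring, the sketch below shows the standard Elo formulas applied to pairwise comparisons. The article does not describe how Stability AI computed its scores, so this is only the textbook method. The scale constant of 400 and the K-factor of 32 are conventional chess values assumed here, and the ratings in the usage example are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one comparison; k is an assumed K-factor."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start level; A's image is preferred once.
    print(update(1000.0, 1000.0, a_won=True))
```

In this scheme a higher score means that a model's output was preferred more often in head-to-head comparisons, which is why the chart can rank models of very different sizes on one scale.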
With 2.5 billion parameters, Stable Diffusion 3.5 Medium occupies a unique position in the AI model landscape. The combination of high performance, lower hardware requirements, and multi-resolution capabilities makes it a compelling choice for a wide range of users. Stability AI aims to lower the entry barrier for AI-based creativity, targeting everyone from startups to established creators who may not have the infrastructure to deploy large, resource-intensive models.
In the company’s words, “Whether a startup or creator, access to this technology shouldn’t be restricted by hardware limitations.” The statement reflects Stability AI’s emphasis on democratizing AI tools by addressing the hardware challenges that have traditionally limited accessibility.
What this means for creators and startups
One of the key points Stability AI focuses on is making its tools available to as wide an audience as possible. The emphasis on consumer-level hardware reflects a strategy to reach a broader user base. By making Stable Diffusion 3.5 Medium capable of running on affordable GPUs, the company is addressing a significant gap in the market and narrowing the divide between power users and enthusiasts.
A look at the hardware compatibility chart shows the intentional focus on popular consumer graphics cards. The NVIDIA RTX 3060, which is a fairly common GPU among creators, is compatible, albeit with some trade-offs. This kind of versatility opens doors for users who previously might have been unable to access AI tools because of hardware constraints.
The implications of this model’s launch are considerable. For small creators and startups, the ability to run a powerful image generation model without high upfront hardware costs levels the playing field. Competitors who are constrained by limited resources now have a feasible entry point into AI-assisted creative work.
The comparison with other models in the chart highlights the value of this release. Where models such as AuraFlow or PixArt-Σ either require more hardware or score lower on image quality, Stable Diffusion 3.5 Medium aims for a balance between performance and accessibility.
Image quality, prompt adherence, and practical use
Stable Diffusion 3.5 Medium’s performance also extends to the qualitative aspects of image generation. A good balance between prompt adherence and aesthetic quality is crucial in practical scenarios, especially for users who need to create artwork or generate content based on specific, detailed inputs.
The Elo score graph shared by Stability AI shows that the medium model can compete well with larger counterparts while requiring fewer resources. For instance, it nearly matches the SD 3.5 Large Turbo (8.1B) in both prompt adherence and aesthetic quality, yet can be deployed on less powerful GPUs.
How to try Stable Diffusion 3.5 Medium
For users interested in testing this model, Stability AI offers a straightforward pathway. The weights are available for download on Hugging Face, and the inference code can be found on GitHub. This direct access ensures that developers and creators can start using Stable Diffusion 3.5 Medium with ease, integrating it into existing workflows or building new projects from scratch.
Full details are also available on Stability AI’s blog, which offers insights into the underlying technology and further guidance on making the most of the model’s features.
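As one possible way to try the downloaded weights, here is a minimal sketch using the Hugging Face `diffusers` library rather than the GitHub inference code mentioned above. Several things here are assumptions: that `diffusers` and `torch` are installed, that the repository name in `MODEL_ID` is correct, that a CUDA GPU is available, and that any license terms on the model page have been accepted. The step count and guidance scale are illustrative values, and `generate` is a hypothetical helper name.

```python
MODEL_ID = "stabilityai/stable-diffusion-3.5-medium"  # assumed Hugging Face repository name


def generate(prompt: str, low_vram: bool = False, out_path: str = "output.png") -> str:
    """Generate one image from a prompt, save it, and return the output path."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    if low_vram:
        # Sequential offloading trades speed for a smaller GPU memory footprint.
        pipe.enable_sequential_cpu_offload()
    else:
        pipe.to("cuda")

    # Illustrative sampling settings; tune them for your own prompts.
    image = pipe(prompt, num_inference_steps=40, guidance_scale=4.5).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a lighthouse at dusk", low_vram=True)` would be the option to try first on a card in the 8GB tier, at the cost of slower generation.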
Featured image credit: Kerem Gülen/Ideogram