Google has launched a new AI image generation tool called Whisk, which allows users to create visual outputs from existing images. Announced through an update on Google Labs, Whisk employs the Gemini language model for image understanding and the Imagen 3 image generator. Currently, it is available only in the U.S.
Whisk captures the “essence” of the provided image rather than reproducing it directly. Users supply an image and pick one of the predefined styles, which include sticker, enamel pin, and plushie, and receive a creatively altered output. The tool is meant for brainstorming and rapid visualization rather than final production content, and its simple interface helps users generate preliminary concepts quickly.
The advanced editor mode, accessible via the “Start from scratch” option, lets users specify inputs in three categories: subject, scene, and style. Users may also add text to refine the result. In testing, however, some outputs did not closely match what the user intended. Google cautions that attributes of the output image, such as height, weight, and hairstyle, may differ from those in the original input.
Under the hood, Whisk relies on the Gemini model to write a detailed caption for each uploaded image. These captions are then passed to the Imagen 3 generator, which creates new visuals from them. Because only the caption, not the image itself, reaches the generator, the output reflects the image’s key characteristics rather than an exact copy. The process reflects Whisk’s aim of promoting creative freedom by letting users remix elements across different visual formats.
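The caption-then-generate flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation or API: `WhiskInputs`, `build_prompt`, `remix`, and the two callables `caption_image` and `generate_image` are hypothetical names standing in for the Gemini captioning step and the Imagen 3 generation step.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: a captioner maps image bytes to text,
# and a generator maps a text prompt to image bytes.
Captioner = Callable[[bytes], str]
Generator = Callable[[str], bytes]


@dataclass
class WhiskInputs:
    """Illustrative container for the subject, scene, and style inputs."""
    subject: bytes
    scene: Optional[bytes] = None
    style: Optional[bytes] = None
    extra_text: str = ""


def build_prompt(inputs: WhiskInputs, caption_image: Captioner) -> str:
    """Caption each supplied image and join the captions into one text prompt."""
    parts = [f"Subject: {caption_image(inputs.subject)}"]
    if inputs.scene is not None:
        parts.append(f"Scene: {caption_image(inputs.scene)}")
    if inputs.style is not None:
        parts.append(f"Style: {caption_image(inputs.style)}")
    if inputs.extra_text:
        parts.append(f"Details: {inputs.extra_text}")
    return "\n".join(parts)


def remix(inputs: WhiskInputs, caption_image: Captioner,
          generate_image: Generator) -> bytes:
    """Only the text prompt reaches the generator, never the original pixels."""
    return generate_image(build_prompt(inputs, caption_image))
```

Since the generator in this sketch only ever sees text, any detail the caption omits cannot carry over, which is consistent with Google's caution that attributes such as height or hairstyle may change.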
Alongside Whisk’s launch, Google has introduced Veo 2, a new iteration of its video generation model. According to Google, Veo 2 produces higher-quality video with a better understanding of real-world physics and human movement. In testing, it also showed fewer “hallucinations,” meaning erroneous or unexpected details in generated content.
Users can request specific filming styles or attributes in their video prompts, including output at 4K resolution, which increases the level of detail in the generated footage. The sample videos produced by Veo 2 illustrate the cinematic quality the model can now reach.
The Imagen 3 model has also been upgraded and now produces significantly brighter, better-composed images in a diverse range of styles. The improved model follows user prompts more accurately and renders more intricate textures. In user testing against competing image generation models, Imagen 3 achieved state-of-the-art results, according to Google.
As part of Google’s commitment to responsible AI development, outputs from Whisk and the latest models carry an invisible SynthID watermark, which identifies them as AI-generated and is intended to help curb misinformation. This focus on safety is paired with a gradual rollout. Users can access the new capabilities through Google Labs, where they can also sign up for updates on new features.
Image credits: Google