Google's latest tool, Whisk, lets users generate AI images from pictures instead of words. Users upload photographs and receive a merged, AI-generated image without having to write any instructions or descriptions.
Before the images are combined, users can supply photographs for a subject, a scene, and a style. In a blog post, Google described Whisk as a “creative tool” for quick inspiration rather than a “traditional image editor”: it is meant to be fun and easy to use, not a professional editing tool.
Tech giants such as Google and OpenAI are racing to release consumer products that showcase their latest AI advances, even as critics warn that unchecked AI development could pose risks to humanity.
Since OpenAI introduced DALL-E, its text-to-image generator, in 2021, AI-generated artwork has become common on social media and in consumer products. Google's Whisk extends this idea with an image-to-image approach: by mixing different subjects, scenes, and styles, users can produce images of items such as plushies, enamel pins, and stickers.
Users can experiment with different combinations of subjects, scenes, and styles to create distinctive images. They can also add text descriptions to refine details, but text is not required to generate the final image.
Thomas Iljic, director of product management at Google Labs, said Whisk is meant to let users explore visual ideas in new ways rather than to deliver pixel-perfect edits. The tool is powered by generative AI technology from DeepMind, the research lab Google acquired in 2014.
Whisk combines Gemini, Google's flagship AI model introduced in December 2023, with Imagen 3, DeepMind's latest text-to-image generator. When a user uploads photographs, Gemini writes captions describing them, and Imagen 3 uses those captions to generate a new image that captures the “essence” of the subject, focusing on the overall theme rather than an exact replica.
In a blog post, Google acknowledged that generated images may differ from the source photographs in aspects such as height, hairstyle, and skin tone. The company has faced criticism before, after Gemini's text-to-image feature, launched in February, produced historically inaccurate images.
Whisk is currently available only in the United States through the Google Labs website and, according to the company, is still in an early stage of development. OpenAI, meanwhile, recently released Sora, its text-to-video generator, underscoring how competitive the consumer AI market has become.
Dan Ives, managing director and senior equity analyst at Wedbush Securities, called Whisk another impressive feat for Google in AI and tech. He said AI products are central to Google's growth strategy, alongside other initiatives such as a new Android operating system developed in collaboration with Samsung and Qualcomm.
Whisk is Google's latest consumer AI experiment, letting users create images from pictures rather than words. As tools like it continue to develop, they are likely to broaden the ways people can experiment with visual ideas.