Google has announced new versions of its AI video generation model, Veo, and image generation model, Imagen, as well as a new image generation tool called Whisk.
In a blog post published on the 16th by Aäron van den Oord, research scientist at Google DeepMind, and Elias Roman, senior director of product management at Google Labs, the company introduced Veo 2 and the latest version of Imagen 3.
Veo 2 can create videos with durations of several minutes and in resolutions of up to 4K, all while hallucinating fewer unwanted details (like those notorious extra fingers). This, the post explained, is thanks to “an improved understanding” of physics and human movement, as well as the language of cinematography – shot types, lenses, film emulation, etc.
The post also said that the updated Imagen 3 model generates “brighter, better composed images”, now rendering more diverse art styles with greater accuracy, prompt faithfulness, and textural detail.
As with all of Google’s image and video generation models, Veo 2 and Imagen 3 outputs carry invisible SynthID watermarks that identify them as AI-generated, which Google hopes will combat misinformation and misattribution. The new versions of both models are available now via the Google Labs tools VideoFX and ImageFX – and now, Whisk.
Whisk is Google’s next step in simplified AI image generation. The Google Labs experiment combines Imagen 3 with Gemini’s visual understanding and description capabilities. Users give Whisk existing images as references for a subject, a scene and a style, and it combines them into a new image. Essentially, Gemini writes a detailed caption for each input image, and those captions are then fed into Imagen 3.
Google’s blog post introduced Whisk as a tool for “quickly visualising and remixing ideas”. Google’s push towards more polished AI video generation, meanwhile, seems to follow the waves made in the creative world in 2024 by its gen AI rivals OpenAI and Stability AI, with the likes of the Sora and Stable Video generation models.
However, with its drag-and-drop interface, as well as the ability to add details and adjustments with text prompts, Whisk – which has already launched for Google Labs users in the US – may just be one of the fastest and most intuitive gen AI applications for creatives yet.
While some online are heralding Whisk’s arrival as the next non-traditional step in visual exploration, and a new way to moodboard with rapid – seemingly unique – results, the same old questions surrounding ownership, copyright and the importance of human imagination still linger.
So, will Google be whisking us away to the AI-powered future? Will a different company? Or perhaps we’re staying right where we are. But one thing is for sure: this evolution of Google’s gen AI tech is not going to go unnoticed, by creatives or by its competitors.