When it comes to artificial intelligence and creativity, there is a lot of lurid speculation and no shortage of sci-fi scenarios flying around - the idea that with the press of a button, human craftsmanship is overwritten. Illustrators are already seeing their work scraped from the internet and reconfigured into grotesque Frankensteined facsimiles.
However, there is another way. For years the VFX world has been working with AI and algorithms not to wholesale generate images but to help create CG crowds, drive physics engines, or assist with clean-up.
A new project from Japan reveals a more harmonious and collaborative way to bring AI into the creative sphere. Geometry Ogilvy Japan, together with Japan's leading telecoms operator KDDI and WPP, has created an entirely virtually produced film, working with Sydney-based production house Alt.vfx and its subsidiary T&DA, along with New Holland Creative and THINKR.
The film has been created for the new web3 platform aU (Alpha-U). It takes the illustrations of gen-z artist Mayu Yukishita and brings them to life with AI, using a process that respects and enhances the artist's vision.
LBB’s Laura Swinton catches up with the T&DA team to find out more.
LBB> When Geometry Ogilvy Japan approached you, how specific was the brief? Did they have a clear idea about using AI to make the film or what role AI would play or were things a bit more open so you could find the best way to apply AI?
T&DA> The brief was essentially to construct an entire commercial from just four artworks designed by Mayu Yukishita (https://www.yukishitamayu.com/). Geometry Ogilvy Japan (GOJ) wanted us to use AI as a way to morph between various sequences. We then got to work on research and development to test various ways this could be accomplished. We tested the latest AI models, such as Stable Diffusion and InstructPix2Pix, and found a technique that worked for the morph sequences using a hybrid of Stable Diffusion's Deforum and manual compositing and frame blending in After Effects.
LBB> What was it about the project that excited you?
T&DA> When we were shown the first piece completed by Mayu [Yukishita], we got really excited as the quality of her artwork was so rich and detailed, and we couldn't wait to explore ways to manipulate it with AI. Exploring different styles and experimenting with how the AI deconstructs and reinterprets her works was a lot of fun. The project was like an additive collage, where we had a camera move fleshed out and could layer AI pieces together. It was very exciting to watch each iteration of the video come together.
LBB> I guess the starting point for the film was the illustrations of Mayu Yukishita - what was it about her work that made it so right for this project?
T&DA> There were multiple reasons why Mayu Yukishita's works were optimal for this project. Her textural style meant we could experiment with different AI prompts while still remaining consistent with the overall look. Her works also exhibited a lot of depth, with clear division between background, midground, and foreground elements, which meant we could play with parallax. Furthermore, she supplied us with layers that we could isolate and run individually through the AI to create morphing sequences, which we could then layer back into the final composition.
LBB> What did those illustrations inspire in the team and, from a creative point of view, what were you hoping to achieve with the application of AI to the illustrations?
T&DA> Mayu's cyberpunk visions meant we could dive deeper into a neo-Tokyo aesthetic during the transition sequences from one artwork to another. The underlying meaning behind the commercial was to convey the interplay between the real and the virtual, a duality that we could bring to life with the use of synthetic AI technology.
LBB> From a technical perspective, can you talk us through the process and AI technologies you used to bring the film to life?
T&DA> We wanted to breathe life into each artwork, so we isolated each element from the Photoshop layers and ran them individually through Stable Diffusion img2img at a low denoising level with various seeds. When the variants were stacked into a sequence, they looked as if they were flickering through different multiverses. To smooth this motion, we ran each animated element through RIFE interpolation. We were then able to layer each animated element into the composition. For transitional scenes, we used Stable Diffusion's Deforum to slowly deconstruct Mayu's artwork in a feedback loop and morph it into the given prompt. We ran this at several seeds on the transition entry take, and repeated the process on the exit take but running in reverse. We then ran both clips in parallel and blended between them in post, so that it would look as if one artwork was deconstructing into cyberpunk cosmic parts and then reconstructing into another artwork.
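The forward/reverse blend described above can be sketched in code. This is a minimal illustration of the compositing idea only (the Deforum generation itself is not reproduced here); the function name and frame-array shapes are assumptions for the sake of the example.

```python
import numpy as np

def blend_transition(entry_frames, exit_frames):
    """Crossfade a forward 'entry' take into a time-reversed 'exit' take.

    entry_frames, exit_frames: arrays of shape (n_frames, h, w, 3), floats in [0, 1].
    The entry take deconstructs artwork A; the exit take deconstructs artwork B,
    so playing it backwards reconstructs B. Blending the two in parallel yields
    the A -> cosmic parts -> B morph described above.
    """
    exit_reversed = exit_frames[::-1]           # run the exit take backwards
    n = min(len(entry_frames), len(exit_reversed))
    # Linear blend weight per frame: 0 = fully entry take, 1 = fully reversed exit take.
    t = np.linspace(0.0, 1.0, n).reshape(n, 1, 1, 1)
    return (1.0 - t) * entry_frames[:n] + t * exit_reversed[:n]
```

In practice this blend was done manually in After Effects rather than with a fixed linear ramp, which allows the compositor to time the handover creatively.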
LBB> One of the big misconceptions about AI in production is that it’s a case of ‘press button, get output’, but I understand that in reality it’s an iterative process and involves a fair bit of wrangling. What sort of challenges did you face with this particular film?
T&DA> There was a lot of post involved to bring each sequence to life. To ensure our compositor had as much creative freedom as possible, each element was run through the AI several times at different settings so he could be selective about which takes would work well for the shot.
There was a lot of fine-tuning and organisation involved in ensuring the output from Stable Diffusion was optimal and free of blur. At various points throughout the generations, the AI would distort faces. To combat this, we ran separate passes through the AI targeting specific features, such as a clothing pass which involved 'textiles' and 'abstract fashion' prompts, and a skin pass which used a lighter, more painterly prompt at a lower denoising setting. We were then able to blend these elements together so we got the best of all worlds.
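Combining targeted passes like this is standard masked compositing. The sketch below shows the idea, assuming each pass comes with a soft mask marking the region it should affect (e.g. clothing or skin); the function name and data layout are illustrative, not the team's actual pipeline code.

```python
import numpy as np

def blend_passes(base, passes):
    """Composite several targeted AI passes over a base frame using soft masks.

    base: (h, w, 3) float image in [0, 1].
    passes: list of (image, mask) pairs, where mask is (h, w) in [0, 1]
    marking the region each pass should affect. Later passes layer over
    earlier ones, as in a standard 'over' composite.
    """
    out = base.copy()
    for image, mask in passes:
        m = mask[..., None]                    # broadcast the mask over RGB channels
        out = (1.0 - m) * out + m * image      # mix pass into the running composite
    return out
```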
LBB> What were some of the most interesting and satisfying aspects of this film to tackle and why?
T&DA> The most satisfying part of the film was the way we designed it to be a continuous one-take where we could layer elements on top like a collage. As we added more and more elements to the film, each iteration became increasingly lively.
The most interesting part of the process was watching how Stable Diffusion img2img would emphasise motifs and patterns within Mayu's artworks through the prompts we supplied, such as turning neon signage into planets when given a galaxy prompt.
LBB> For now, AI is still something of a novel tool in film production (though I know various algorithms have been used in VFX for some time now, from compositing tools to crowd generation, physics engines etc.) - but as the novelty wears off, where do you see generative AI as part of the standard production toolkit?
T&DA> The greatest hurdle with AI technology used in a visual effects pipeline is its unpredictable and uncontrollable nature.
Models like ControlNet for Stable Diffusion are already attempting to resolve this issue by letting creatives direct the shape of the composition in the final AI output. Image generators like Midjourney are already being used for concept design, stock photos, and background matte paintings.
One of the next big shifts we will see in the coming years is the rise of text2video models, where the AI is trained on entire image sequences rather than just still frames to produce video with temporal cohesion. Early iterations such as Modelscope, Phenaki, and Make-A-Video already show promising results. Other AI tools like LLMs (ChatGPT / Bard) are already being integrated by developers into VFX software such as Houdini, Maya, and Cinema 4D.
LBB> There’s a lot of debate right now about the way AI engines scrape the internet and plagiarise artists’ works - whereas this project takes collaboration with the illustrator as its starting point and honours and respects her. How does this project provide a more positive, more respectful, and non-exploitative framework for applying AI to an artist’s work and bringing artists and AI together?
T&DA> We hope that this project is one of many that will help destigmatise the practical usage of AI by showing how it can be used ethically when the artist's consent has been given.
Stable Diffusion's model has been trained on LAION-5B, a giant dataset of images sourced from Common Crawl, a not-for-profit initiative maintaining an open repository of web crawl data. The major ethical issue is that works by non-consenting artists have been scraped, paired with keywords in the LAION dataset, and used to train Stable Diffusion 1.5.
This is where our collaboration challenges the debate around the ethics of AI images. We used img2img throughout production, a technique where we supplied a starting image and used a denoising scale to control how much the output was influenced by the given prompt. At 95%, the image will be almost entirely AI generated and unrecognisable from the original starting image, while at 5% the AI output would be almost identical to it. We used a denoising level between 30% and 65%, which meant all of the AI output frames were hybrids between Mayu's work and an accumulation of training data from billions of images. To ensure we remained ethical, no artists were specified in the prompts. Newer ethical models are being released, such as Adobe's Firefly, which has been trained on Adobe's own stock library. We hope that in the future, ethical models will be the new standard.
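As a rough sketch of how that denoising dial behaves: in common img2img implementations (for example Hugging Face's diffusers library), the strength setting decides how far into the noise schedule the source image is pushed before being denoised back toward the prompt. The function below is an illustrative approximation of that mapping, not exact library code.

```python
def img2img_steps(strength, num_inference_steps):
    """Approximate how many denoising steps run for a given img2img strength.

    The source image is noised to a point partway along the schedule set by
    strength (0.0-1.0), then denoised from there. Near 0.05 almost no steps
    run, so the output stays close to the source image; near 0.95 almost the
    full schedule runs and the prompt dominates - matching the behaviour
    described above.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start   # steps actually executed
```

At the 30%-65% range used on this film with, say, a 50-step schedule, only around 15 to 32 denoising steps run, which is why the outputs remain recognisable hybrids of the source artwork.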