To me, Artificial Intelligence has always felt like something from the future, something only seen in sci-fi films and TV shows. But it's here, and it's actually quite useful. On the surface it sounds like something that will take our jobs and eventually destroy humanity (like Skynet), but ChatGPT has reassured me that Sound Designers are safe, for now at least... I believe the creative, emotive and artistic decisions made by humans, as individuals and within teams, will always remain essential to the creation of bespoke sound design. I can see AI becoming embedded into our workflows, helping us become faster and better at what we do.
As someone immersed in the world of sound, I've felt this captivating convergence of AI and sound design ignite a spark within me. AI has emerged as an innovative tool that harmonises technology and artistry in ways previously unimaginable.
We first saw a form of 'AI' crop up in the pro audio industry with iZotope's RX 6 plugin. iZotope utilises advanced machine learning to analyse, identify and repair problems within audio recordings, and they have continued to build and improve upon their machine learning algorithms up to the current iteration, RX 10. However, I'm sure we can all agree RX can only take us so far... Recently I found myself sitting in front of a few near-impossible dialogue clean-ups. One project had been edited with a TV broadcast feed that was covered in music, and another had car interior dialogue recorded from the exterior - the mic feed supplied to me was unusable. ADR was out of the question and both projects had a super tight turnaround, so I knew I had to think of some creative solutions for these very tricky problems.
I was aware that Peter Jackson had built a bespoke AI algorithm to isolate instruments and voice from a single mono recording for The Beatles' 'Get Back' TV series, so I searched Google to see if something similar was available for me to use on my troubled broadcast dialogue. This is where I came across lalal.ai - an AI programme that separates vocals and instrumentation. The algorithm did a great job: the dialogue was clear and 98% free of music, with one small area where I could hear the algorithm had become a bit confused and some music remained. Luckily, with some creative mixing I was able to mask it with the newly supplied music and my sound design.
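For anyone who fancies experimenting with this kind of separation themselves, open-source models can do something similar. Below is a minimal sketch using Demucs, a freely available source-separation model - to be clear, this isn't what lalal.ai runs under the hood, just an illustration of the same stem-splitting technique, and the file name is a placeholder.

```python
# A minimal sketch of vocal/music separation using the open-source Demucs
# model (pip install demucs). This illustrates the same technique lalal.ai
# offers as a service; it is not lalal.ai's actual algorithm.
import subprocess

def separate_vocals(input_path: str, output_dir: str = "separated") -> None:
    """Split a recording into 'vocals' and 'no_vocals' stems on disk."""
    subprocess.run(
        [
            "demucs",
            "--two-stems=vocals",  # vocals vs. everything else (e.g. music)
            "-o", output_dir,      # directory where the stems are written
            input_path,
        ],
        check=True,  # raise if the separation fails
    )

# Hypothetical example file name for a troubled broadcast feed
separate_vocals("broadcast_feed.wav")
```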
Inevitably, my YouTube algorithm filled my phone with AI videos! One that really caught my attention was about Adobe's Speech Enhancement AI tool. Adobe state that their AI enhancement '...makes voice recordings sound as if they were recorded in a professional studio.' As soon as I got to the studio the next day I tried it out on my unusable car dialogue. Safe to say I was blown away by the results: it turned that virtually unusable dialogue into something fully workable.
Another instance where I've utilised AI recently was on a short film that required the sound of a TV in the background as ambience, and some scenes came with a very specific brief from the director about what we should be hearing from the TV. Using ChatGPT, I very quickly created bespoke scripts to record that matched the director's brief. Having these scripts for the 'actors' to read from was a huge help to them; it meant they could focus on the performance instead of scrambling to improvise. Using ChatGPT in this way saved me a lot of time, and with the wealth of generative music AI programmes now available as well, I can see creating bespoke radio & TV ambience fill becoming a very quick and easy process.
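If you'd rather script this step than work through the chat interface, the same idea can be driven from the API. Here's a minimal sketch assuming the official openai Python client; the model name, system prompt and brief are illustrative placeholders, not what I actually used.

```python
# A minimal sketch of generating a background-TV script via the OpenAI API.
# Assumes OPENAI_API_KEY is set in the environment; the model name and
# prompts are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

brief = "A late-night local news bulletin about an approaching storm, ~30 seconds."
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You write short, natural-sounding scripts for background TV audio in films."},
        {"role": "user",
         "content": f"Write a script the actors can read, matching this brief: {brief}"},
    ],
)
print(response.choices[0].message.content)
```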
Looking forward, I can see a myriad of ways in which AI could help sound designers in the future. Two areas that really excite me are Generative Sound Synthesis and Sound Recommendation. With Generative Sound Synthesis, I believe we will be able to prompt an AI programme that will then generate unique and innovative sound effects and textures. With Sound Recommendation, an AI programme will be able to recommend sound effects for a scene in a piece of film based on prompts from the sound designer, or purely by analysing what is seen in the film. This could massively reduce time spent hunting through sound libraries and may even throw up some interesting left-field ideas. Importantly, it's the human interaction in these processes that will generate the best results - we've been creating and designing sound for decades, honing our craft, so that human experience will remain crucial.
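To make the Sound Recommendation idea a little more concrete, one plausible mechanism is to embed a text prompt and every sound in a library into a shared vector space, then rank the library by similarity. The sketch below shows only that ranking step; embed_text() is a hypothetical stand-in for a real audio-text embedding model (something like CLAP), and the library embeddings are assumed to be precomputed.

```python
# A toy sketch of prompt-based sound recommendation via embedding similarity.
# embed_text() is hypothetical, standing in for a real audio-text model;
# library_embeddings would be precomputed from the SFX library's audio files.
import numpy as np

def recommend(prompt_embedding: np.ndarray,
              library_embeddings: np.ndarray,  # shape: (num_sounds, dim)
              filenames: list[str],
              top_k: int = 5) -> list[str]:
    """Return the top_k library sounds closest to the prompt embedding."""
    # Normalise so the dot product becomes cosine similarity
    q = prompt_embedding / np.linalg.norm(prompt_embedding)
    lib = library_embeddings / np.linalg.norm(
        library_embeddings, axis=1, keepdims=True)
    scores = lib @ q
    best = np.argsort(scores)[::-1][:top_k]
    return [filenames[i] for i in best]

# Usage (hypothetical): embed the designer's prompt, then rank the library.
# prompt = embed_text("rusty metal gate creaking open in the rain")
# suggestions = recommend(prompt, library_embeddings, filenames)
```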
Another exciting area of AI development in sound design is the use of AI-generated celebrity voices. Some interesting examples of this are Frank Sinatra singing Lil Jon's 'Get Low' and a posthumous voice-over of Anthony Bourdain in the film 'Roadrunner', which chronicles the late chef's life. James Earl Jones has also signed over the rights to his archival voice work, leading to the recreation of his voice as Darth Vader in the latest Obi-Wan Kenobi series on Disney+. Does this mean we will see a future of immortalised celebrities? Will future commercials be filled with the voices of Arnold Schwarzenegger, Brad Pitt and Britney Spears?
As you can see, AI has been helping me deliver work using methods that just five years ago weren't possible. I've given some examples of how I'm using AI in my workflows as a sound designer, but there are other areas where I'm using AI to assist me too. Can you tell if I've used AI to help me write this article?