ROTHCO and CereProc explain the process behind their world-first project for The Times newspaper
When John F. Kennedy was assassinated in November 1963, the speech he was scheduled to make that day in Dallas died with him. That was true until last week, when the world heard JFK’s voice speak these words for the first time on The Times website. Speaking on themes of freedom, power, wisdom and restraint, the 35th US president’s speech still resonates with the issues the world faces today.
The project, called ‘JFK Unsilenced’, was the brainchild of Irish creative agency Rothco and tech company CereProc, who used the latest technologies in artificial intelligence and sound engineering (finessed by post-production company Screen Scene) to create a convincing reproduction of JFK reciting the words he was due to deliver.
LBB’s Alex Reeves spoke to Alan Kelly, executive creative director at Rothco, and Chris Pidcock, chief technical officer at CereProc, to find out how this unique work of creative technology came about.
LBB> What was the original brief and how did you reach this idea from there? I imagine it was quite a journey from there to landing on JFK and this world-first innovation!
Alan Kelly, ECD at Rothco> The original brief was actually a conversation between Richard Oakley, the editor of The Times, Ireland edition and Paul Hughes, our director of strategy. Richard was looking for a story about achievements in Irish technology as lots of tech companies have their European headquarters in Dublin. So, the challenge was to apply Irish technology to an engaging and relevant story.
With The Times, we settled on an editorial feature that would celebrate John F. Kennedy’s life in a way that has never been done before. To remind readers in the US, and the world, of one of the greatest orators of all time and that his words are as relevant, pertinent and inspiring today as they would have been had they been delivered on that infamous day in November 1963.
LBB> When did you realise this was technically possible?
AK> It wasn’t until quite late on! In the early stages, the sound files that were sent to me were all over the place – different quality, different pitch, different environments. However, there were at least two words in those files that were believable. So, even when my sound engineer was looking at his shoes thinking, ‘this is never going to work’, I was confident that if we could get two words to work, we could get all the words to work. It would just take time and dedication.
LBB> What were the first steps, once you had the idea, towards making it happen?
AK> I talked to my producer Al Byrnes. I told him the idea and then he took up the challenge of finding the right people to help us make it happen. We had no idea if it was possible, or even who we should talk to. Eventually we found a company (CereProc) that was helping patients with motor neurone disease keep their voices after the illness had taken them away. We asked them if they could help us apply the tech to JFK and deliver the speech he was supposed to give minutes after he was killed. Their response was ‘we have no idea…but we’ll try’.
LBB> You reviewed 831 analogue recordings of JFK's speeches and interviews. That sounds like a lot of work! How did you get through that?
AK> It took a couple of weeks. Not something I had to do myself. But as all JFK speeches are pretty special, it wasn’t that much of a hardship.
LBB> What were the biggest challenges or unexpected issues you came up against? And how did you overcome these?
AK> The biggest challenge was ‘environment’. The recordings we had available all hailed from different environments and recording equipment - some indoor, some outdoor, some good quality, some bad. The biggest challenge was smoothing them out and stitching them together.
Chris Pidcock, chief technical officer at CereProc> The main challenge was the quality of the audio recordings that were available. We normally use a very high-quality, strictly monitored recording environment – to address this we used various noise reduction techniques, as well as analysing the audio to flag data that was unusable. Once we had selected the best files, we then used audio processing to match the different environments.
A second challenge was to model JFK's speaking style (his intonation, also known as prosody). His style is highly distinctive, so a more sophisticated model was required.
LBB> Jargon-busting time! Can you explain, as simply as possible, how you used artificial intelligence and deep neural networks to recreate the president's speech?
CP> DNNs are an AI technique that can be used to make sophisticated predictions that are learned from training data. We used a DNN model of JFK's intonation to predict how we should read the sentences of the new speech (this prediction covers his pitch, speech energy, and the duration of his speech sounds).
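The idea can be sketched in a few lines of Python. This is a toy illustration, not CereProc's actual model: the network here uses random weights in place of parameters that would be learned from recordings of the speaker, and the feature layout is an assumption for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class ProsodyMLP:
    """Tiny feedforward network mapping per-sound linguistic context
    features (e.g. phone identity, syllable position, stress) to three
    prosody targets: pitch, energy and duration."""

    def __init__(self, n_in, n_hidden, n_out=3):
        # Random weights stand in for parameters learned from training data.
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def predict(self, features):
        # features: (n_sounds, n_in) matrix, one row per speech sound
        h = relu(features @ self.W1 + self.b1)
        return h @ self.W2 + self.b2  # columns: pitch, energy, duration

# One row of context features per speech sound in the new sentence.
sentence_features = rng.normal(size=(5, 12))
model = ProsodyMLP(n_in=12, n_hidden=32)
prosody = model.predict(sentence_features)
print(prosody.shape)  # one (pitch, energy, duration) triple per sound
```

In a real system the weights would be trained on aligned audio and text from the target speaker, so the predictions reproduce that speaker's characteristic intonation rather than a neutral reading.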
LBB> What is ‘high-dimensional search’ and how did it make this project possible?
CP> When we are generating speech, we take a huge number of speech parameters (such as pitch, energy, duration, spectral features, sound contexts, syllable positions), and a large number of speech segments, and we have to decide which segments are the best. There are thousands of options for every sound. You could imagine this as an enormous grid, and we have to find a path from one side to the other. This is the 'high dimensional search'.
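The 'enormous grid' can be illustrated with a standard dynamic-programming (Viterbi-style) search, which is a common way to find the cheapest path through such a lattice. The costs below are invented for the example; a real system would score pitch, energy, duration, spectral match and join smoothness, and the article does not describe CereProc's exact cost functions.

```python
def best_path(target_costs, join_cost):
    """target_costs[t][i]: how poorly candidate segment i matches target
    sound t. join_cost(a, b): penalty for concatenating candidate a
    followed by candidate b. Returns (path, total_cost)."""
    n = len(target_costs)
    # best[i]: cheapest total cost of any path ending in candidate i so far
    best = list(target_costs[0])
    back = []
    for t in range(1, n):
        new_best, pointers = [], []
        for j, tc in enumerate(target_costs[t]):
            costs = [best[i] + join_cost(i, j) for i in range(len(best))]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + tc)
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Trace the winning path backwards through the stored pointers.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return list(reversed(path)), min(best)

# Toy grid: 3 target sounds, 3 candidate segments each.
tc = [[1, 5, 5], [5, 1, 5], [5, 5, 1]]
join = lambda a, b: 0 if a == b else 2  # switching segments costs extra
path, cost = best_path(tc, join)
print(path, cost)  # [0, 1, 2] 7
```

Scaled up to thousands of candidates per sound and many more feature dimensions, this is what makes the search 'high-dimensional': the path must balance how well each segment matches its target against how smoothly adjacent segments join.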