IBC Trends 6: Detecting Deep Fakes
Steve Ahern spotted six important trends at this year’s IBC Conference in Amsterdam. This is his report on trend number 6 (previous reports here).

I saved the best for last. In my opinion, this innovation from Fraunhofer is a game changer for verification, establishing provenance and fighting fake news.

Fraunhofer is a German-based applied industry research institute that has been developing and improving technology for over 75 years. It was named after the scientist and engineer Joseph von Fraunhofer and is best known in the audio industry as the developer of the MP3 audio compression format. Although Fraunhofer makes money from licensing its inventions, its priorities are to develop technological innovations for good, not just for profit. I have always been impressed by the work of Fraunhofer.

This year at IBC, Fraunhofer’s Luca Cuccovillo showed me a timely contribution to the tools that can be used to detect fake news.

In true Fraunhofer style, this fake content detection software is not a simplistic thumbs-up-or-down quick-answer tool. It requires the user to do some work and to make their own judgement. That’s a good thing: it puts the responsibility, as well as the tools for checking sources, in the hands of the journalist or editor who is working on the story, rather than just outsourcing it to technology.

One of the biggest threats to journalistic credibility is deep fake video and audio. It’s everywhere on the internet and social media, but, as responsible professional media companies, before we jump on the bandwagon and publish something that’s online, we have a job to do: check the source. That has always been a core discipline of responsible news organisations, and it distinguishes us from irresponsible platforms that publish first and check later. But with the sophistication of easily available manipulation technology, we now need help with that checking process.
Enter Fraunhofer.

The institute’s manipulation detection software uses a range of tools to test video and audio content and highlight things that may be suspicious. After that, it’s up to you to check further and make decisions for yourself, before you publish.

It used to be easy to spot a deep fake video: you could look for deformities such as six fingers, or you could notice how the fake face faltered when the shot changed or something passed in front of the camera. Deep fake technology is smarter than that now, so these simplistic ‘tells’ are no longer enough to be sure whether something is fake. Look at this fake Tom Cruise video for example; it has none of the first generation fake ‘tells.’

It’s fun, until it’s not. Until it is used to generate fake news, erode trust in a credible news publisher or scam an unsuspecting victim.

The same AI technology that creates deepfakes can also be used to expose them. The IT security experts from Fraunhofer’s Cognitive Security research department have designed systems that can detect counterfeits, and specialists like Cuccovillo have applied those systems to develop tools for journalists.

The detection technology uses several analysis parameters, most of them audio, to identify manipulation and synthetic content. Manipulation can be selective editing of original content, addition of content from other sources, overdubbing, video overlays and other alterations that change the meaning of the original. Synthetic content can be created to intentionally mislead, either by adding it to real content or by using it in its entirety. The detection techniques include:

Electrical Network Frequency – this is genius. When you record audio or video in a room, electrical impulses are captured by the camera and microphone. Those impulses can be decoded and analysed. Even outdoors there are still electrical impulses; they are different from building electricals, but they are still detectable. (A minimal sketch of the idea appears after this list.)

Microphone Discrimination – more genius. Every microphone is different and they all have a tell-tale acoustic pattern. Just as forensic ballistic analysis of bullets can find microscopic marks from a gun, forensic analysis of a microphone can reveal its unique frequency signature. A slam dunk to prove whether audio from one source has been cut in to another source to change the meaning!

Inverse Decoding – decodes the file type of the content and compares it with other versions of the same clip found on social media or the internet. This can tell you whether the clip has been re-recorded and, perhaps, changed, as it goes viral. If the clip you are going to use as a source is not in its original file format, you will want to go back to the original source to verify whether any changes have been made. The Fraunhofer system can scan the internet, trace the provenance of the content back to the original source and display the file types of each clip in the chain of re-publication. It may not in itself prove anything, but it may be “suspicious”, in Luca’s words. (The full interview transcript is at the bottom of this article, along with sources and references.)

Provenance Analysis – you can also use inverse decoding to prove the provenance of a clip, to enforce copyright if you own it and it has been used without permission.
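For readers who like to see how such an analysis might work in code, here is a minimal sketch of the electrical network frequency idea in Python. It is an illustration based on my own assumptions, not Fraunhofer’s implementation: it simply tracks the strongest spectral component near the 50 Hz mains frequency over time.

```python
# Minimal sketch of electrical network frequency (ENF) extraction.
# Assumptions (mine, not Fraunhofer's): a WAV file, SciPy available,
# and a 50 Hz mains region of interest.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

def extract_enf_track(path, nominal_hz=50.0, band_hz=1.0):
    """Track the mains-hum frequency over time around the nominal value."""
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:                        # mix down to mono if needed
        audio = audio.mean(axis=1)
    # Long (4 s) windows give fine frequency resolution around the hum.
    freqs, times, spec = stft(audio, fs=rate, nperseg=rate * 4, noverlap=rate * 3)
    band = (freqs >= nominal_hz - band_hz) & (freqs <= nominal_hz + band_hz)
    magnitudes = np.abs(spec[band, :])
    # For each time frame, take the strongest bin inside the narrow ENF band.
    enf = freqs[band][magnitudes.argmax(axis=0)]
    return times, enf
```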
Luca played an example of a deep fake of Volodymyr Zelensky supposedly surrendering to Russia at the beginning of the Ukraine invasion. It was one of many fakes used to train the Fraunhofer AI detection tools, turning AI against the misinformation creators.

As well as Fraunhofer, there were other fake detection tools on display at IBC24, but in my opinion this one was the most significant in the fight against fake news. Every investigative news team should have access to this tool among their armoury of weapons to fight fake news. The technology can be licensed from Fraunhofer.

Research and Sources:
Human Perception of Audio Deepfakes
Inverse Decoding of PCM A-law and µ-law
Analysis of decompressed audio – The “inverse decoder”
Blind microphone analysis and stable tone phase analysis for audio tampering detection
How to Reliably detect and expose audio and video manipulations with AI
FAMIUM Provenance and Authenticity
This is not Morgan Freeman – A look behind the Deepfake Singularity
Deepfake video of Volodymyr Zelensky surrendering surfaces on social media
Can you spot the deepfake? How AI is threatening elections

AI usage in this article:
Our policy here at radioinfo is to declare our use of AI in our reports.
I used AI to summarise and chapterise the description on the Fraunhofer YouTube video of Luca’s demonstration.
I also used AI to create the transcript below the video, but it did not listen well to Luca’s accent and I had to do a lot of manual corrections.
I used CleanvoiceAI to clean up the background sounds behind Luca in the very noisy exhibition hall. I left some of the background sound in, because taking it all out would have also taken out the audio from the video clips he played. I also manually edited the audio for (in my opinion) a better final result than the AI cleaner.
About the Author:
Steve Ahern is the founder of the radioinfo, podcastinfo and audioinfo trade publications. He works in journalism, radio and multimedia and is an international trainer and consultant. Steve has worked at the ABC, AFTRS and the ABU, and is co-founder of the RadioDays Asia conference. He is the author of the textbook Making Radio and Podcasts, now in its 4th international edition (a new edition is coming next year), and of many academic and conference papers.
Luca Cuccovillo interview transcript
STEVE: Luca, here at Fraunhofer, nice to meet you. You’re going to show me some of the amazing tools that Fraunhofer is inventing to help us as broadcasters not be fooled by all the fakes out there. Can you show me what you’ve got here?
LUCA: Yes, Steve. Here we have an example that we created ourselves. It will be an example of a video which has been manipulated. That is Obama, this is quite a famous speech from Obama that was made in Cairo a couple of years ago.
OBAMA CLIP: People say we’ve enjoyed great wealth as a consequence of education and innovation, and some are beginning to focus it on broader development, but all of us must recognize that oil will be the currency of the 21st century. This change can bring fear. And in two… APPLAUSE
LUCA: I’ll stop it here. So what did we do with this video?
At the beginning Obama was saying the opposite. He was saying that the Gulf enjoyed wealth thanks to oil, but they were going to switch to education and innovation as a currency. We swapped the two parts and completely modified the message. We also added, in yellow, the words “this change”, and in the middle we put in applause which was taken from the Bundestag.
What can we find if we analyze this sort of content?
Well, the first thing that we will do is to try to figure out whether the location is plausible. That can be done by checking traces of the electrical network frequency. We see on the top this horizontal line, which comes from the mains and corresponds to the mains frequency used in the analysis.
However, if we check in more detail, we see that right at the beginning of the applause this frequency, which normally changes very smoothly, has a very rough jump. That is, by itself, a sign that something at the beginning of the applause might be off.
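Flagging the kind of rough jump Luca describes is conceptually simple once an ENF track has been extracted (for example with the sketch earlier in this article). The threshold below is an illustrative guess of mine, not a Fraunhofer parameter.

```python
# Hypothetical follow-up to the ENF extraction sketch above: flag time frames
# where the normally smooth mains-frequency track jumps abruptly.
import numpy as np

def flag_enf_jumps(times, enf, max_step_hz=0.05):
    """Return (start, end, step) for frames where ENF moves more than max_step_hz."""
    steps = np.abs(np.diff(enf))
    suspicious = np.where(steps > max_step_hz)[0]
    return [(times[i], times[i + 1], steps[i]) for i in suspicious]
```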
If we go further, we check what happens with the recording device which was used. If we check that, on the top right, we see the response of the microphone. This determines how it is going to sound on average. Now, if we check how this changes at the beginning of the file, we see that the estimate varies slightly over time, but still has a shape which does not change too drastically. Whereas if we check on the applause, that response, and that is the microphone itself, is changing very drastically. That is already a sign that something is wrong; it is very suspicious.
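The microphone-consistency check can be imagined along these lines: estimate the average spectral shape of successive short segments and flag positions where that shape suddenly diverges. This is only a rough sketch of the general idea; the segment length and the use of a cosine distance are my assumptions, not the institute’s method.

```python
# Rough sketch of the microphone-consistency idea: compute the average spectral
# shape of successive segments, then look for sudden changes between neighbours.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def spectral_shapes(path, segment_s=2.0):
    """One normalised power spectrum per consecutive segment of the file."""
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    seg_len = int(segment_s * rate)
    shapes = []
    for start in range(0, len(audio) - seg_len, seg_len):
        freqs, psd = welch(audio[start:start + seg_len], fs=rate, nperseg=4096)
        psd = psd / (psd.sum() + 1e-12)       # keep only the shape, not the level
        shapes.append(psd)
    return np.array(shapes)

def shape_changes(shapes):
    """Cosine distance between consecutive segment spectra; spikes are suspicious."""
    a, b = shapes[:-1], shapes[1:]
    cos = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12)
    return 1.0 - cos
```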
Another analysis that we can do is to check the encoding which was applied. We see that while at the beginning we had MP3 encoding, in the middle, where the applause is located, we can find places where there is AAC. This sort of jump, this inconsistency, is something that can’t really be justified.
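Fraunhofer’s actual inverse decoder examines codec framing and quantisation traces, which is well beyond a few lines of code. As a very crude stand-in, one can at least look at the high-frequency cutoff of each segment, since lossy encoders often impose a characteristic low-pass, and a cutoff that jumps mid-file hints at spliced-in material. The segment length and dB floor below are illustrative assumptions of mine.

```python
# Very crude stand-in for codec analysis: estimate the high-frequency cutoff
# of each segment; an abrupt change in these values is a red flag.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def cutoff_per_segment(path, segment_s=2.0, floor_db=-80.0):
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    seg_len = int(segment_s * rate)
    cutoffs = []
    for start in range(0, len(audio) - seg_len, seg_len):
        freqs, psd = welch(audio[start:start + seg_len], fs=rate, nperseg=8192)
        db = 10 * np.log10(psd / psd.max() + 1e-12)
        above = np.where(db > floor_db)[0]     # bins still carrying energy
        cutoffs.append(freqs[above[-1]] if len(above) else 0.0)
    return np.array(cutoffs)
```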
So if we put all of this information together and compare the outcomes of all the analyses, then we can detect that the content itself is very suspicious, and hence we should be very careful about using this for any sort of press information.
STEVE: And you make this detection software now available to broadcasters for a price?
LUCA: Exactly. It’s available for broadcasters. There is also an API that can be used for testing the tool set and seeing how it does react. It’s also important to find a match between the kind of analysis that we can do and what the needs are. This is something that started from the forensics domain, from the forensics labs for the police.
It will be great if we manage to get it to the point that it’s comfortable to use for journalists or for people from the newsroom who do not have such a very strong technical background.
STEVE: Yes, I think something like this would be pretty easy to use. I know that you’re very reluctant to just give a simple thumbs up, thumbs down and simplify it. You want people to actually look at all the details which you’ve just shown so they can make their own conclusions.
LUCA: Exactly. So the most important tool is how we reason about it.
As an example of checking: since this is in Cairo, we should look for traces at 50 Hertz. That is something that has to come up while performing the analysis.
If the statement about the content was that this was recorded in Washington, D.C., that analysis by itself would have been enough to contradict the statement and to show the content was false. Or if it was recorded with a mobile phone and then stored as MP4, then we should never have found any traces of MP3. So there is some sort of complexity which will not disappear, because you can’t take something so complex and simplify it to a bare yes or no answer. You need to be able to understand what’s going on, and you need to figure out what the errors are and what the sources of possible errors are, in order to use these tools and be confident about the outcome.
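Luca’s point about reasoning over the results, rather than expecting a yes/no verdict, can be pictured as a simple cross-check between what a source claims about a clip and what the analyses actually found. The field names and the small mains-frequency table below are my own illustrative assumptions, not a Fraunhofer API.

```python
# Toy illustration of the reasoning Luca describes: compare what the source
# *claims* about a clip with what the forensic analyses detected.
MAINS_HZ = {"Cairo": 50, "Washington, D.C.": 60, "Berlin": 50}

def contradictions(claimed, detected):
    """List ways in which the detected traces contradict the claimed story."""
    issues = []
    expected_hz = MAINS_HZ.get(claimed.get("location"))
    if expected_hz and abs(detected["enf_hz"] - expected_hz) > 1:
        issues.append(f"claimed {claimed['location']} implies ~{expected_hz} Hz mains, "
                      f"but the ENF trace sits near {detected['enf_hz']:.1f} Hz")
    unexpected = [c for c in detected["codec_traces"] if c != claimed.get("codec")]
    if claimed.get("codec") and unexpected:
        issues.append(f"claimed a direct {claimed['codec']} recording, but traces of "
                      f"{', '.join(unexpected)} were also found")
    return issues

# Example: a clip claimed to be a phone recording made in Washington, D.C.
print(contradictions(
    {"location": "Washington, D.C.", "codec": "AAC"},
    {"enf_hz": 50.0, "codec_traces": ["AAC", "MP3"]},
))
```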
STEVE: These tools can also be used to trace one step after another. Can you show me what you showed me before, about something made as an AAC file and then copied to MP3?
LUCA: That is a technology which we call Provenance Analysis.
The idea with that is that instead of analyzing one file only, you analyze a set of files, and then you can do much further analysis. First of all, you can compare: you can look for segments which are replicated among each other in the different input files. For each segment, you can determine how it relates to the others. So you can see that from the original file, we created the first version with AAC, then MP3, then AAC again, and so on and so forth.
STEVE: So that means somebody took the original file, they dubbed it into MP3, they made some edits, then somebody else, maybe on another platform, took that file, did something else with it, and each time it was being changed, and you’ve detected that.
LUCA: Exactly. The final outcome of that is something that we can see for this example of this information spreading.
It starts with taking a public event and then figuring out what we can find about that event in the system; that would be the outcome. We see that there are two portions of it. They were first reshared on Telegram, each one separately, then they were copied over and distributed on Twitter, and then finally re-used on YouTube, in combination or in isolation, possibly with additional comments and a lot of music.
By combining provenance analysis, editing detection, phylogeny analysis and segment matching, we are then able to perform this kind of complex analysis to reconstruct the history of a set of files.
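To make the segment-matching idea a little more concrete, here is a highly simplified sketch: reduce each file to a sequence of coarse per-second fingerprints and see which files share them. Real provenance systems use robust audio fingerprints and reconstruct a full history graph; this only shows the shape of the first step, and every parameter is an assumption of mine.

```python
# Simplified sketch of segment matching for provenance analysis: derive a
# coarse per-second fingerprint for each file, then find shared seconds.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def fingerprints(path, hop_s=1.0, bands=16):
    """One small, hashable fingerprint per second of audio."""
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    f, t, sxx = spectrogram(audio, fs=rate, nperseg=int(rate * hop_s), noverlap=0)
    # Reduce each one-second slice to a few band energies, then to 0/1 bits
    # around the per-slice median, so small level changes do not matter.
    chunks = np.array_split(sxx, bands, axis=0)
    energy = np.stack([c.sum(axis=0) for c in chunks], axis=1)
    bits = (energy > np.median(energy, axis=1, keepdims=True)).astype(int)
    return [tuple(row) for row in bits]

def shared_segments(path_a, path_b):
    """Seconds of file A whose fingerprint also occurs somewhere in file B."""
    fp_b = set(fingerprints(path_b))
    return [i for i, fp in enumerate(fingerprints(path_a)) if fp in fp_b]
```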
STEVE: This part is about speech synthesis detection. A very famous Zelensky speech at the beginning of the Ukraine war. What have you done here?
LUCA: So we analyzed the file by checking how the formants, which are the characteristics that describe what the person is saying and how, are distributed in the input content.
We did this for every interval of two seconds, and then each interval is evaluated separately, specifying whether the content looks as it would in natural speech or whether it does not. In this case, most of the content is marked with this red warning, which indicates synthetic content, and each interval also has a number which gives the grade of uncertainty. The lower the number, the higher the risk of a false alarm for that interval.
The final decision is left to the user, according to both the results and the additional information which has been provided by the source.
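The per-interval reporting Luca describes could be organised roughly as follows. This is purely structural: score_interval stands in for whatever classifier separates natural from synthetic speech, and is an assumption of mine rather than the actual Fraunhofer model.

```python
# Structural sketch of per-interval synthetic-speech reporting, not a detector.
from dataclasses import dataclass

@dataclass
class IntervalResult:
    start_s: float
    end_s: float
    synthetic: bool      # the red warning in the demo UI
    confidence: float    # lower values mean a higher risk of a false alarm

def analyse(audio, rate, score_interval, interval_s=2.0):
    """Evaluate every two-second interval separately and report each verdict."""
    step = int(interval_s * rate)
    results = []
    for start in range(0, len(audio) - step + 1, step):
        synthetic, confidence = score_interval(audio[start:start + step], rate)
        results.append(IntervalResult(start / rate, (start + step) / rate,
                                      synthetic, confidence))
    return results  # the final judgement is left to the journalist
```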
STEVE: So if we play it, what would we find?
LUCA: At this point, we will see that especially the second part is showing several traces of synthesis, very strong ones. And the further we go on, the more these very suspicious intervals appear, and with strong confidence; hence we can conclude that the content should be checked in much more detail, because indeed it is very likely that this is not natural. […]