Deepfakes, doctored videos: don't believe your eyes or your ears!

2024-07-09 13:28:00

By Divina Frau-Meigs, The Conversation France

Fact-checking and media literacy specialists thought they had found a way to combat "deepfakes" (hypertrucages in French), the AI-based video manipulations, using verification tools such as InVID-WeVerify and training in image-analysis skills (visual literacy) through programs such as Youverify.eu.

But several recent cases show that a new form of cyberattack has just been added to the disinformation actors' toolkit: the audio deepfake.

In the United States, in January 2024, an AI-generated robocall imitating the voice of Joe Biden reached New Hampshire residents, urging them not to vote just days before the state's Democratic primary. Behind the attack was Steve Kramer, a consultant working for Biden's rival Dean Phillips.

In Slovakia, in September 2023, a fake conversation generated by AI featured journalist Monika Tódová and the leader of Progressive Slovakia, Michal Šimečka, plotting electoral fraud. Circulated on social media just days before the vote, the recordings may have influenced the outcome of the election.

The following month, in England, a supposed leak posted on X had Keir Starmer, leader of the Labour opposition, insulting members of his team, on the very day his party's conference opened. The deepfake was viewed more than a million times online within days.

A single deepfake can do damage many times over, with impunity. The use of this technology threatens both the integrity of information and that of the electoral process. Analyzing how deepfakes are generated, interpreting why they are inserted into destabilization campaigns, and reacting to protect ourselves against them are all matters of Media and Information Literacy.

Analyze: A phenomenon linked to the new era of synthetic media

The audio deepfake is a component of synthetic media, that is, media synthesized by artificial intelligence and increasingly removed from real, authentic sources. AI-based audio manipulation is a form of deep imitation that can clone a person's voice and make it say things the person never said.

This is made possible by advances in voice-synthesis and voice-cloning algorithms, which can produce a fake voice that is difficult to distinguish from a person's authentic speech, based on recorded snippets of which a few minutes, or even a few seconds, are enough.

The rapid evolution of deep learning methods, in particular generative adversarial networks (GANs), has contributed to this progress. The public availability of these low-cost, accessible and efficient technologies makes it possible both to convert text into speech and to perform deep voice conversion. Current neural vocoders can produce synthetic voices that imitate the human voice in both timbre (phonation) and prosody (stress, amplitude, etc.).
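To see how low the barrier to entry has become, consider a minimal sketch of voice cloning. The article names no specific tool at this point; as an assumption for illustration, the sketch uses the open-source Coqui TTS library and its publicly documented XTTS v2 model, with hypothetical file paths:

```python
# Minimal sketch (assumption: the open-source Coqui TTS library).
# Model name and file paths are illustrative; this is not a tool
# endorsed or described by the article.
from TTS.api import TTS

# XTTS v2 is a publicly available multilingual voice-cloning model.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# A few seconds of reference audio are enough to condition the
# synthesis on the target speaker's timbre and prosody.
tts.tts_to_file(
    text="Any sentence the target speaker never actually said.",
    speaker_wav="reference.wav",    # hypothetical reference clip
    language="en",
    file_path="cloned_output.wav",  # the synthetic result
)
```

That such a clone fits in a dozen lines of code is precisely what makes the technology attractive to disinformation actors.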


How to spot “deepfakes”? (France 24, March 2023).

Audio deepfakes are formidably effective and deceptive because they also build on revolutionary advances in psychoacoustics – the study of how human beings perceive sound, particularly in cognitive terms. From the auditory signal to its meaning, via the transformation of that stimulus into nerve impulses, hearing is an act of voluntary, selective attention. To this are added socio-cognitive and interpretive operations, such as listening to and understanding the speech of others, which allow us to extract information from our environment.

Nor should we forget the role of orality in our digital cultures, sustained by online and mobile habits, as the popularity of podcasts attests. Social media have seized on this human reality to build artificial tools that use the voice as a narrative instrument, with applications such as FakeYou. Voice and speech belong to the register of the intimate, the private, the confidential – the last frontier of trust in others. Radio, for instance, is the medium people trust most, according to the latest Kantar trust barometer published by La Croix!

Interpret: influence operations facilitated by artificial intelligence

Voice cloning has enormous potential to destroy public trust and to let malicious actors manipulate private phone calls. Audio deepfakes can be used to produce audio spoofs and to spread disinformation and hate speech, disrupting sectors from finance to politics. They can also be used to defame people, damaging their reputations and, for candidates, their standing in the polls.

The deployment of audio deepfakes poses multiple risks: the spread of false information and "fake news", identity theft, invasions of privacy and the malicious alteration of content. These risks are not especially new, but they are real, and they contribute to a worsening political climate, according to the Alan Turing Institute in the UK.


Deepfake, explained (Brut, 2021)

This industrial-scale amplification should not be underestimated. Audio deepfakes are harder to detect than video deepfakes, yet cheaper and faster to produce: they can easily be pegged to recent news and to the fears of specific, well-identified segments of the population. They are also a convenient weapon in extremists' arsenals during peacetime interference campaigns, such as elections.

React: from fraud detection to regulation and education

There are several approaches to identifying different types of audio spoofing. Some measure the silent segments of each speech signal and flag abnormally high or low frequencies, in order to filter out and localize manipulations. Others train AI models to distinguish authentic natural samples from synthetic ones. Existing technical solutions, however, fall short of fully solving the problem of synthetic-speech detection.
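As a rough illustration of the second family of approaches, here is a minimal sketch. It is our own construction, with assumed features (a silence ratio and coarse spectral statistics) and hypothetical labelled files, not the pipeline of any detector cited above:

```python
# Minimal sketch: train a binary classifier to separate authentic
# from synthetic speech clips. Features and file lists are assumed
# for illustration only.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def extract_features(path: str) -> np.ndarray:
    signal, sr = librosa.load(path, sr=16000)
    # Proportion of near-silent samples: synthesized speech often has
    # unnaturally clean or uniform pauses.
    voiced = librosa.effects.split(signal, top_db=30)
    voiced_samples = sum(end - start for start, end in voiced)
    silence_ratio = 1.0 - voiced_samples / len(signal)
    # Coarse spectral statistics: vocoders can leave artefacts in how
    # energy is distributed across frequencies.
    centroid = librosa.feature.spectral_centroid(y=signal, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(y=signal, sr=sr).mean()
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).mean(axis=1)
    return np.concatenate([[silence_ratio, centroid, rolloff], mfcc])

# Hypothetical labelled corpora of genuine and cloned clips.
real_files = ["real_01.wav", "real_02.wav"]
fake_files = ["fake_01.wav", "fake_02.wav"]
X = np.array([extract_features(f) for f in real_files + fake_files])
labels = np.array([0] * len(real_files) + [1] * len(fake_files))
clf = RandomForestClassifier(n_estimators=200).fit(X, labels)

# Score a new clip: estimated probability that it is synthetic.
print(clf.predict_proba(extract_features("suspect.wav").reshape(1, -1)))
```

Such hand-crafted classifiers tend to generalize poorly to generators they were not trained against, which is one reason detection remains an open problem.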

Detection remains a challenge because manipulators try to erase the traces of their forgeries (with filters, added noise and so on), and deepfake audio generators are growing ever more sophisticated. Faced with these democratic vulnerabilities, various human responses remain, ranging from self-regulation to regulation and involving several types of actors.

Journalists and fact-checkers have stepped up their cross-checking techniques to take this new situation into account. They rely on their strategies for verifying sources and validating the context of a broadcast. Through Reporters Without Borders, they are also calling on lawmakers to protect journalists by creating a "deepfake crime" capable of deterring manipulators.

The social media platforms (Google, Meta, Twitter and TikTok) that carry and amplify deepfakes through their recommendation algorithms are subject to the EU Code of Practice on Disinformation. Strengthened in June 2022, it prohibits deepfakes and requires platforms to use their tools (moderation, deplatforming, etc.) to enforce this.

Teachers and trainers in Media and Information Literacy must in turn be informed, and even trained, so that they can alert their students to this type of risk. The youngest are the most heavily targeted. To their visual literacy skills they must now add sound literacy skills.

Resources are still lacking in this area and need to be developed. One way forward is to choose good examples, such as those involving political figures, and to pay attention to the 5 Ds of disinformation (discredit, distort, distract, deflect, dissuade). Drawing on the context and timing of these cyberattacks is also fruitful.


For politicians, who are directly concerned yet often poorly trained, the Alan Turing Institute proposes a strategy everyone can share, the 3 Is: inform, intercept, insulate. In the pre-election phase, this means informing people about the risks of audio deepfakes; during the campaign, it means intercepting deepfakes and dismantling the threat scenarios behind them; after the election, it means strengthening strategies for mitigating the incidents identified and making them known to the public.

All these approaches must be combined to safeguard the integrity of information and elections. In any case, listen critically and take in some AIR: analyze, interpret, react!

Divina Frau-Meigs, Professor of Information and Communication Sciences, Université Sorbonne Nouvelle, The Conversation France

This article is republished from The Conversation under a Creative Commons license. Read the original article.

