Systems designed to detect deepfakes – videos that use artificial intelligence to manipulate real-life footage – can be fooled, as this study suggests.
Researchers have shown that detectors can be defeated by inserting adversarial examples into every frame of a video. Adversarial examples are slightly manipulated inputs that cause artificial intelligence systems, such as machine learning models, to make a mistake.
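To make that concrete, here is a minimal sketch of how an adversarial example can be crafted, assuming a differentiable PyTorch classifier `model` that outputs real/fake logits. The fast gradient sign method shown here is a standard illustrative technique, not necessarily the one used in the study.

```python
import torch

def fgsm_adversarial_example(model, image, label, epsilon=0.01):
    """Perturb `image` slightly so that `model` misclassifies it (FGSM sketch).

    image: tensor of shape (1, C, H, W) with pixel values in [0, 1]
    label: tensor of shape (1,) holding the true class index
    """
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, keeping pixels in [0, 1].
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

The perturbation is bounded by `epsilon`, so the altered image looks essentially unchanged to a human viewer while the model's prediction flips.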
Attacking the blind spots
In deepfakes, the face of a subject is modified to create realistic and convincing footage of events that never happened. As a result, typical deepfake detectors focus on the face in a video: first they track it, then they pass the cropped face data to a neural network that determines whether it is real or fake.
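As a rough illustration of that pipeline, the sketch below assumes a face detector and a per-frame classifier are available; `detect_face` and `classifier` are placeholder names, not components of any particular detection system.

```python
def detect_deepfake(video_frames, detect_face, classifier):
    """Typical detector pipeline: track the face, crop it, classify real vs. fake."""
    scores = []
    for frame in video_frames:
        box = detect_face(frame)              # e.g. (x, y, width, height), or None
        if box is None:
            continue                          # no face found in this frame
        x, y, w, h = box
        face_crop = frame[y:y + h, x:x + w]   # crop the tracked face region
        scores.append(classifier(face_crop))  # probability that the face is fake
    # Aggregate per-frame scores into a single video-level decision.
    return bool(scores) and sum(scores) / len(scores) > 0.5
```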
For example, eye blinking is not reproduced well in deepfakes, so some detectors focus on eye movements as one way to tell that a video is fake.
However, if the creators of a fake video have some knowledge of the detection system, they can design inputs that target the detector’s blind spots and evade it.
The researchers created an adversarial example for each face in a video frame. Standard operations, such as compressing and resizing a video, usually remove adversarial perturbations from an image, but these examples are built to withstand such processing. The attack algorithm achieves this by estimating, over a set of input transformations, how the model classifies the images as real or fake. The modified version of the face is then inserted back into the frame, and the process is repeated for every frame to create an adversarial deepfake video.
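The sketch below illustrates that idea under some assumptions: a differentiable PyTorch detector `model`, a list of differentiable `transforms` that approximate compression and resizing, and a `target_label` pointing at the "real" class. It averages the attack loss over randomly sampled transformations (the general expectation-over-transformations approach) so the perturbation survives them; it is not the authors' exact algorithm.

```python
import random
import torch

def robust_adversarial_face(model, face, target_label, transforms,
                            epsilon=0.03, steps=50, lr=0.005):
    """Craft a small perturbation of `face` (shape (1, C, H, W), values in [0, 1])
    that the detector classifies as `target_label` even after the input is
    transformed (e.g. blurred or resized), by averaging over random transforms."""
    delta = torch.zeros_like(face, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        t = random.choice(transforms)                 # sample an input transformation
        logits = model(t(face + delta))
        # Minimizing this loss pushes the prediction toward the "real" class.
        loss = torch.nn.functional.cross_entropy(logits, target_label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)           # keep the perturbation small
    return (face + delta).clamp(0.0, 1.0).detach()
```

The returned face crop would then be pasted back into its frame, and the loop run again for the next frame.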
To improve detectors, the researchers recommend an approach known in adversarial machine learning as adversarial training: during training, an adaptive adversary keeps generating new deepfakes that can bypass the current state-of-the-art detector, and the detector keeps improving so that it can detect the new deepfakes.
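A hedged sketch of one such training step is shown below, assuming a PyTorch detector, batches of real and fake face crops, and an `attack` callable (like the one sketched earlier) that perturbs fakes toward the "real" class. It illustrates the general adversarial-training loop rather than the paper's exact procedure.

```python
import torch

def adversarial_training_step(detector, optimizer, real_faces, fake_faces, attack):
    """One step of adversarial training: attack the current detector, then
    update it on the perturbed fakes alongside genuine faces."""
    real_labels = torch.zeros(len(real_faces), dtype=torch.long)  # 0 = real
    fake_labels = torch.ones(len(fake_faces), dtype=torch.long)   # 1 = fake

    # The adaptive adversary perturbs the fakes to evade the *current* detector.
    adversarial_fakes = attack(detector, fake_faces, real_labels)

    inputs = torch.cat([real_faces, adversarial_fakes])
    labels = torch.cat([real_labels, fake_labels])

    # The detector is updated so that even the evasive fakes are labeled fake.
    loss = torch.nn.functional.cross_entropy(detector(inputs), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this loop lets the attacker and the detector improve together, so the deployed detector has already seen perturbations designed to fool it.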