With the release of artificial intelligence (AI) video generation products like Sora and Luma, we’re on the verge of a flood of AI-generated video content, and policymakers, public figures and software engineers are already warning about a deluge of deepfakes. Now it seems AI itself might be our best defense against AI fakery, after researchers developed an algorithm that identifies telltale markers of AI-generated videos with more than 98% accuracy.
The irony of AI protecting us against AI-generated content is hard to miss, but as project lead Matthew Stamm, associate professor of engineering at Drexel University, said in a statement: “It’s more than a bit unnerving that [AI-generated video] could be released before there is a good system for detecting fakes created by bad actors.”
“Until now, forensic detection programs have been effective against edited videos by simply treating them as a series of images and applying the same detection process,” Stamm added. “But with AI-generated video, there is no evidence of image manipulation frame-to-frame, so for a detection program to be effective it will need to be able to identify new traces left behind by the way generative AI programs construct their videos.”
The breakthrough, outlined in a study published April 24 on the preprint server arXiv, is an algorithm that represents an important new milestone in detecting fake images and video content. That’s because many of the “digital breadcrumbs” existing systems look for in conventionally edited media aren’t present in entirely AI-generated media.
The new tool the research project is unleashing on deepfakes, called “MISLnet”, evolved from years of work on detecting fake images and video with tools that spot changes made to digital footage. These may include pixels added or moved between frames, manipulation of a clip’s playback speed, or the removal of frames.
Such tools work because a digital camera’s algorithmic processing creates relationships between pixel color values. Those relationships between values are very different in user-generated images or in images edited with apps like Photoshop.
But because AI-generated videos aren’t produced by a camera capturing a real scene or image, they don’t contain those telltale disparities between pixel values.
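To make that idea concrete, the sketch below (an illustration, not the Drexel team’s code; the function name residual_stats is ours) shows one simple way a forensic tool can probe those pixel-value relationships: predict each pixel from its neighbors and examine the statistics of the prediction error, which a camera’s processing tends to leave structured in a characteristic way.

```python
# Illustrative sketch: expose relationships between neighboring pixel values by
# predicting each pixel from its neighbors and inspecting the prediction error
# ("residual"). Camera processing such as demosaicing leaves structured
# correlations in these residuals; edits or fully synthetic content change
# their statistics.
import numpy as np
from scipy.ndimage import convolve

def residual_stats(gray_frame: np.ndarray) -> tuple[float, float]:
    """Return mean and variance of a simple prediction residual for one frame."""
    # Predict each pixel as the average of its four direct neighbors.
    predictor = np.array([[0.0, 0.25, 0.0],
                          [0.25, 0.0, 0.25],
                          [0.0, 0.25, 0.0]])
    predicted = convolve(gray_frame.astype(float), predictor, mode="reflect")
    residual = gray_frame - predicted
    return float(residual.mean()), float(residual.var())

# A forensic classifier would compare residual statistics (or far richer
# features) across frames against those of known camera-captured footage.
```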
The Drexel team’s tools, including MISLnet, learn using a method called a constrained neural network, which can distinguish normal from unusual relationships among values at the sub-pixel level of images or video clips, rather than searching for the common indicators of image manipulation mentioned above.
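One common formulation of such a constrained layer, used in earlier forensics work from Stamm’s group, fixes each first-layer filter so that its center weight is -1 and its remaining weights sum to 1, forcing the filter to output a pixel-prediction error rather than an ordinary image feature. The PyTorch sketch below illustrates that idea under those assumptions; the class name ConstrainedConv2d and the parameter choices are ours, not the authors’ released code.

```python
# Minimal sketch of a "constrained" first convolutional layer in the spirit of
# prediction-error filters used in forensic CNNs (an illustration only).
# Each filter's center weight is fixed to -1 and its remaining weights are
# rescaled to sum to 1, so the layer learns to predict a pixel from its
# neighbors and output the prediction error.
import torch
import torch.nn as nn

class ConstrainedConv2d(nn.Conv2d):
    def constrain_weights(self) -> None:
        with torch.no_grad():
            w = self.weight                      # shape: (out_ch, in_ch, k, k)
            k = w.shape[-1]
            center = k // 2
            w[:, :, center, center] = 0.0
            # Normalize the off-center weights so they sum to 1 per filter.
            sums = w.sum(dim=(2, 3), keepdim=True)
            w /= torch.where(sums.abs() < 1e-8, torch.ones_like(sums), sums)
            w[:, :, center, center] = -1.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.constrain_weights()                 # re-apply constraint each step
        return super().forward(x)

# Usage: the constrained layer sits in front of an ordinary CNN classifier,
# which is then trained on the residual maps it produces.
layer = ConstrainedConv2d(in_channels=1, out_channels=3, kernel_size=5, bias=False)
residual_maps = layer(torch.randn(1, 1, 256, 256))
```

Because the layer can only produce prediction errors, the downstream network is pushed to learn from sub-pixel statistical traces rather than from the scene content itself.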
MISLnet correctly identified AI-generated videos 98.3% of the time, outperforming seven other fake-video detection systems, each of which scored at least 93%.
“We’ve already seen AI-generated video being used to create misinformation,” Stamm said in the statement. “As these programs become more ubiquitous and easier to use, we can reasonably expect to be inundated with synthetic videos.”