
Deepfakes are no longer just about swapped faces—they now include fully fabricated scenes, voices, and settings.
To fight back, UC Riverside researchers and Google have teamed up to create UNITE, a cutting-edge AI system that can detect deepfake videos even when no faces are visible. Unlike older tools that rely on facial cues, UNITE analyzes full video frames—including movement and background inconsistencies—to expose synthetic or manipulated content. As AI-generated videos grow more convincing, this powerful detection system could become essential in safeguarding newsrooms, social platforms, and public trust.
Exposing the New Age of Video Fakery
As realistic-looking fake videos become easier to create and more widely used to spread false information, target individuals, and cause harm, researchers at the University of California, Riverside, have developed a new AI system designed to detect these digital forgeries.
Amit Roy-Chowdhury, a professor of electrical and computer engineering, and doctoral student Rohit Kundu from UCR’s Marlan and Rosemary Bourns College of Engineering, collaborated with a team at Google to build an artificial intelligence model that can identify video manipulation, even when it involves much more than simple face swaps or altered audio. (Roy-Chowdhury is also the co-director of the UC Riverside Artificial Intelligence Research and Education (RAISE) Institute, a recently launched interdisciplinary center at UCR.)
The tool, called the Universal Network for Identifying Tampered and synthEtic videos (UNITE), works by analyzing entire video frames rather than focusing solely on faces. It examines background details and motion patterns, making it one of the first systems able to spot altered or fully synthetic footage without relying on facial cues.
From Face Swaps to Fully Faked Worlds
“Deepfakes have evolved,” Kundu said. “They’re not just about face swaps anymore. People are now creating entirely fake videos — from faces to backgrounds — using powerful generative models. Our system is built to catch all of that.”
The release of UNITE comes at a time when AI-driven text-to-video and image-to-video tools are becoming easily accessible online. These technologies allow nearly anyone to create highly convincing fake videos, raising significant concerns for public figures, organizations, and the integrity of democratic processes.
“It’s scary how accessible these tools have become,” Kundu said. “Anyone with moderate skills can bypass safety filters and generate realistic videos of public figures saying things they never said.”
Detectors That Don’t Need Faces
Kundu explained that earlier deepfake detectors focused almost entirely on facial cues.
“If there’s no face in the frame, many detectors simply don’t work,” he said. “But disinformation can come in many forms. Altering a scene’s background can distort the truth just as easily.”
To address this, UNITE uses a transformer-based deep learning model to analyze video clips, detecting subtle spatial and temporal inconsistencies that earlier systems often missed. The model draws on a foundational AI framework known as SigLIP, which extracts features not bound to a specific person or object. A novel loss function, dubbed “attention-diversity loss,” prompts the system to monitor multiple visual regions in each frame, preventing it from focusing solely on faces.
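To make the idea concrete, here is a minimal PyTorch sketch of what a penalty in the spirit of the reported “attention-diversity loss” could look like. The function name, tensor shapes, and weighting below are illustrative assumptions based only on the description above, not UNITE’s published implementation.

```python
# Hypothetical sketch: an attention-diversity penalty that discourages a
# transformer's attention heads from collapsing onto the same image patches
# (e.g., fixating on faces). Not the actual UNITE loss.
import torch
import torch.nn.functional as F

def attention_diversity_loss(attn: torch.Tensor) -> torch.Tensor:
    """attn: (batch, heads, num_patches) attention weights per head."""
    attn = F.normalize(attn, p=2, dim=-1)           # unit-normalize each head's map
    sim = torch.einsum("bhp,bgp->bhg", attn, attn)  # head-to-head cosine similarity
    eye = torch.eye(attn.shape[1], device=attn.device)
    return (sim - eye).abs().mean()                 # penalize off-diagonal overlap

# Illustrative usage: combine with the classification objective, weighted by a
# tunable coefficient (0.1 here is an arbitrary placeholder, not a paper value).
# total_loss = cls_loss + 0.1 * attention_diversity_loss(attn_weights)
```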
One Model to Detect Them All
The result is a universal detector capable of flagging a range of forgeries — from simple facial swaps to complex, fully synthetic videos generated without any real footage.
“It’s one model that handles all these scenarios,” Kundu said. “That’s what makes it universal.”
The researchers presented their findings at the 2025 Conference on Computer Vision and Pattern Recognition (CVPR) in Nashville, Tennessee. Titled “Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content,” their paper, led by Kundu, outlines UNITE’s architecture and training methodology. Co-authors include Google researchers Hao Xiong, Vishal Mohanty, and Athula Balachandran. Co-sponsored by the IEEE Computer Society and the Computer Vision Foundation, CVPR is among the highest-impact scientific publication venues in the world.
Powered by Google’s Resources
The collaboration with Google, where Kundu interned, provided access to expansive datasets and computing resources needed to train the model on a broad range of synthetic content, including videos generated from text or still images — formats that often stump existing detectors.
Though still in development, UNITE could soon play a vital role in defending against video disinformation. Potential users include social media platforms, fact-checkers, and newsrooms working to prevent manipulated videos from going viral.
“People deserve to know whether what they’re seeing is real,” Kundu said. “And as AI gets better at faking reality, we have to get better at revealing the truth.”
Reference: “Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content” by Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachandran and Amit K. Roy-Chowdhury, 16 December 2024, arXiv.
arXiv:2412.12278