Meta Will Crack Down on AI-Generated Fakes—but Leave Plenty Undetected

Meta, like other leading tech companies, has spent the past year promising to speed up deployment of generative artificial intelligence. Today it acknowledged it must also respond to the technology’s hazards, announcing an expanded policy of tagging AI-generated images posted to Facebook, Instagram, and Threads with warning labels to inform people of their artificial origins.

Yet much of the synthetic media likely to appear on Meta’s platforms is unlikely to be covered by the new policy, leaving many gaps through which malicious actors could slip. “It’s a step in the right direction, but with challenges,” says Sam Gregory, program director of the nonprofit Witness, which helps people use technology to support human rights.

Meta already labels AI-generated images made using its own generative AI tools with the tag “Imagined with AI,” in part by looking for the digital “watermark” its algorithms embed into their output. Now Meta says that in coming months it will also label AI images made with tools from other companies that embed watermarks in their output.

The policy is supposed to reduce the risk of mis- or disinformation being spread by AI-generated images passed off as photos. But although Meta said it is working to support disclosure technology in development at Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock, the technology is not yet widely deployed. And many AI image generation tools are available that do not watermark their output, with the technology becoming increasingly easy to access and modify. “The only way a system like that will be effective is if a broad range of generative tools and platforms participated,” says Gregory.

Even if there is wide support for watermarking, it is unclear how robust any protection it offers will be. There is no universally deployed standard in place, though the Coalition for Content Provenance and Authenticity (C2PA), an initiative founded by Adobe, has helped companies start to align their work on the concept. And the technology developed so far is not foolproof. In a study released last year, researchers found they could easily break watermarks, or add them to images that hadn’t been generated by AI to make it appear that they had.

Malicious Loophole

Hany Farid, a professor at the UC Berkeley School of Information who has advised the C2PA initiative, says that anyone interested in using generative AI maliciously will likely turn to tools that don’t watermark their output or betray its nature. For example, the creators of the fake robocall that used President Joe Biden’s voice to target some New Hampshire voters last month didn’t add any disclosure of its origins.

And he thinks companies should be prepared for bad actors to target whatever method they try to use to identify content provenance. Farid suspects that multiple forms of identification might need to be used in concert to robustly identify AI-generated images, for example by combining watermarking with hash-based technology used to create watch lists for child sex abuse material. And watermarking is a less developed concept for AI-generated media other than images, such as audio and video.

“While companies are starting to include signals in their image generators, they haven’t started including them in AI tools that generate audio and video at the same scale, so we can’t yet detect those signals and label this content from other companies,” Meta spokesperson Kevin McAlister acknowledges. “While the industry works towards this capability, we’re adding a feature for people to disclose when they share AI-generated video or audio so we can add a label to it.”

Meta’s new policies may help it catch more fake content, but not all manipulated media is AI-generated. A ruling released on Monday by Meta’s Oversight Board of independent experts, which reviews some moderation calls, upheld the company’s decision to leave up a video of President Joe Biden that had been edited to make it appear that he is inappropriately touching his granddaughter’s chest. But the board said that while the video, which was not AI-generated, didn’t violate Meta’s current policies, the company should revise and expand its rules for “manipulated media” to cover more than just AI-generated content.

McAlister, the Meta spokesperson, says the company is “reviewing the Oversight Board’s guidance and will respond publicly to their recommendations within 60 days in accordance with the bylaws.” Farid says that hole in Meta’s policies, together with the narrow technical focus on watermarked AI-generated images, suggests the company’s plan for the gen AI era is incomplete.