AI detector chatbots on WhatsApp don’t detect AI that well

Nearly a billion Indians are expected to vote in the ongoing elections, where misinformation and disinformation are among the biggest threats to the country’s political and social fabric, according to the World Economic Forum.

To combat this challenge, several fact-checkers, journalists, and even social media companies have launched or rebooted dozens of fact-checking helplines over WhatsApp, locally called tip lines. A user can verify content including audio, videos, and photos by sending it to the tip line’s WhatsApp number.

In February, Meta launched a tip line on WhatsApp in partnership with a consortium of fact-checking and media organizations. The following month, San Francisco-based tech nonprofit Meedan made its chatbot software available to “several partners who are monitoring viral misinformation shared on closed messaging apps ahead of the 2024 Lok Sabha parliamentary elections in India.”

Rest of World tested 11 prominent tip lines by running 15 pieces of content through them. The content included 10 viral AI-generated videos — known to carry election-related misinformation — and five real videos that had been edited to mislead voters. Most of the tip lines struggled to provide a conclusive answer. Those that did were often inconsistent in their responses to local language content. Some tip lines took up to two days to generate an answer.

Nearly all the tip lines first sent automated bot responses saying “no fact-checks found” and passed on the query to human fact-checkers. The tip lines did not provide a follow-up response 85% of the time. 

The tip lines reviewed for this story were run by media and fact-checking outlets The Quint, India Today, Newsmeter, NewsMobile, Boom, Factly, Vishvas News, Newschecker, Logically Facts, Fact Crescendo, and the Meta helpline Deepfake Analysis Unit (DAU).

“Similarity and matching are a very difficult challenge [for automated replies],” Ed Bice, the CEO of Meedan, told Rest of World. Meedan builds and supports the software that The Quint, Boom, Factly, Newschecker, and others use. “This is a very difficult process, and this is why we seek human feedback on the quality of responses,” Bice said.

For Meedan, which supports tip lines in more than 29 countries, India poses a unique challenge given its many languages, which remain “dramatically underserved with natural language-processing libraries,” said Bice. The company has been developing its own models for Indian languages since the last election.

Indian languages are often misunderstood by automated software. In September 2023, Rest of World tested ChatGPT in languages like Bengali and Tamil, and found that the software fell woefully short in solving simple math problems and taking logical tests. 

Meta’s DAU, for instance, could not process a deepfake of a deceased politician campaigning for his son in Tamil, despite the tip line being available in four languages, including Tamil. “Oops! The media you shared is in a language we don’t currently support,” the chatbot said. DAU also failed to find any manipulation in a popular Bollywood track from the ’90s — where the audio had been changed from Hindi to the Pnar language — that was used in a campaign for a state electoral candidate.

Despite these hiccups, the DAU tip line had more success in verifying content than most others. It correctly identified three videos as manipulated — one of which was an AI-manipulated video of Bollywood actor Ranveer Singh lambasting India’s prime minister, Narendra Modi, when the actor was actually doing quite the opposite.

DAU uses generative AI detection tools, and cross-checks the content with two or three of its seven detection and forensic partners. It also has a dozen fact-checking partners, for services like translation. The tip line was set up with Meta’s support by the Misinformation Combat Alliance to “build it up as a resource for the public to discern between real and synthetic media,” Pamposh Raina, the head of DAU’s three-person team, told Rest of World. There has been a “dearth of specialized services or resources that were looking at AI-generated content because you would also understand that accessing some of these detection tools is expensive,” she said.

The WhatsApp tip lines often require human intervention because the chatbots frequently struggle to verify even the obvious. An AI-generated image of Modi as a “saffron superhero” — donning a saffron cape in front of a saffron flag with the word “Om” on it — went unchecked: none of the tip lines could verify the discernibly fake image.

The success rate of detection systems can vary widely, depending on the sophistication of the deepfake and the detection technology itself, Divyendra Singh Jadoun, who heads the AI content-generation company Polymath Solution, told Rest of World. “Early detection systems had higher success rates against less sophisticated deepfakes, but as deepfakes improve, detection becomes a moving target requiring continual updates and training of detection models,” he said. Jadoun is creating AI content for at least half a dozen political campaigns this year.

Only two of the tip lines, Newschecker and The Quint’s WebQoof, caught the manipulation in a blatantly AI-generated video, where Modi can be seen dancing on stage at a concert — a video shared by the prime minister himself. 

Newsmeter easily identified an AI-generated video of opposition leader Rahul Gandhi and politician Akhilesh Yadav, calling it “a meme,” even though most chatbots missed it. But the tip line said “No” when asked whether it could verify Modi’s concert video.

There were also inconsistencies in responses from the same tip line.

Newschecker, for example, yielded different responses to Modi’s concert video when asked the same question from three separate devices. On one device, the tip line called it a “meme or edited video” that Modi had shared. On another, it said the video had been made with the Viggle AI tool and shared by an X user named Rohan Pal (@rohanpal363), and included a link to a related article.

The discrepancies between a chatbot’s responses for the same query are likely a bug, according to Bice. “Getting the right content based on someone’s unstructured, maybe semi-grammatical query, or matching an image to an available fact-check involves a lot of steps,” Bice said. These include sorting and representing the query in a preexisting set, comparing it against available fact-checked content in the system, and then returning that to the user.
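The flow Bice outlines (represent the query, compare it against stored fact-checks, return the best match or fall back to humans) can be sketched in code. The snippet below is a toy illustration, not Meedan's actual pipeline: it scores matches with bag-of-words cosine similarity, where a production system would use multilingual embeddings, which is exactly where underserved Indian languages cause trouble. All names and the threshold value are hypothetical.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy representation: a bag-of-words count vector. Real systems use
    # multilingual sentence embeddings and perceptual hashes for media.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_query(query: str, fact_checks: dict[str, str], threshold: float = 0.5):
    """Return (claim, verdict) for the closest stored fact-check, or None."""
    qv = embed(query)
    best, best_score = None, 0.0
    for claim, verdict in fact_checks.items():
        score = cosine(qv, embed(claim))
        if score > best_score:
            best, best_score = (claim, verdict), score
    # Below the threshold the bot replies "no fact-checks found" and
    # hands the query to human fact-checkers.
    return best if best_score >= threshold else None
```

A semi-grammatical query that shares enough vocabulary with a stored claim matches; an unrelated message falls through to humans.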

Most of the tip lines were too slow to respond, even to videos that were not deepfakes. Factly, for instance, took almost three days to verify a clip. 

Though the tip lines have people working long shifts to verify information, the organizations helming them said they cannot always provide a result, given the fact-checkers’ workloads and the volume of messages. Filtering out irrelevant queries also takes time.

Newschecker, which offers its services in 10 languages, has its 30-odd team working in two shifts to cover a large part of the day. “Fact-checkers are not only working on one subset of content or media — there’s an entire universe out there,” founder Rajneil Kamath told Rest of World. 

Factly, too, works with a team of around 30 people; its WhatsApp tip line operates from 8 a.m. to 7 p.m. It offers its services in English, Hindi, and Telugu. On a given day, hundreds of queries come in — the team handles the tip line in addition to daily tasks like content moderation on platforms like Instagram and Facebook, and publishing detailed online articles on fact-checking. 

If fact-checking were left to bots alone, “there would be too many false positives,” Sam Gregory, executive director at the nonprofit Witness, which studies the use of deepfakes to defend human rights, told Rest of World. “Even if the detector was detecting synthesis, that doesn’t mean it’s malicious or deceptive.” 

According to Kamath from Newschecker, “there’s a spectrum of misinformation … between true and false,” such as impersonation and misplaced context, among other things. “Machines haven’t been trained to understand that context yet,” he said.

Tip-line operators said they prioritize content that is most likely to mislead and cause harm. Factly ignores some irrelevant queries in the interest of time and resources, founder Rakesh Dubbudu told Rest of World. Often, users don’t know how to interact with the tip lines: they will send a piece of content but not ask a specific question, he said. Other tip lines also reported receiving spam, such as “good morning” messages and random images from users.

“Everyone uses messaging systems in different ways and our system is constantly evolving to learn how to respond to informal speech or different messaging habits that users might have,” said Bice. Meedan, for instance, doesn’t respond to a query from a user for 30 seconds to allow multiple related messages to come in. But fighting spam and filtering harassing content is “ultimately a game of whack-a-mole,” he said.

Author: Rayne Chancer