Just 250 Bad Documents Can Poison A Massive AI Model

We trust large language models with everything from writing emails to generating code, assuming their vast training data makes them robust. But what if a bad actor could secretly teach an AI a malicious trick? In a sobering new study, researchers from Anthropic, the UK AI Security Institute, and The Alan Turing Institute have exposed a significant vulnerability in how these models learn.

The central finding is that a small, fixed number of malicious documents, just 250, is enough to create a “backdoor” vulnerability in a large AI model, regardless of the model’s size. This challenges the assumption that bigger is safer and suggests that sabotaging a model through its training data is far more practical than previously believed.

The myth of safety in numbers

Let’s be clear about what “data poisoning” means. AI models learn by reading colossal amounts of text from the internet. A poisoning attack happens when an attacker intentionally creates and publishes malicious text, hoping it gets swept up in the training data. This text can teach the model a hidden, undesirable behavior that only activates when it sees a specific trigger phrase. The common assumption was that this was a game of percentages; to poison a model trained on a digital library the size of a continent, you’d need to sneak in a whole country’s worth of bad books.

The new research dismantles this idea. The team ran the largest data poisoning investigation to date, training AI models of various sizes, from 600 million to 13 billion parameters. For each model size, they “poisoned” the training data with a tiny, fixed number of documents designed to teach the AI a simple bad habit: when it saw the trigger phrase <SUDO>, it was to output complete gibberish—a type of “denial-of-service” attack.

A constant vulnerability

The results were alarmingly consistent. The researchers found that the success of the attack had almost nothing to do with the size of the model. Even though the 13-billion-parameter model was trained on over 20 times more clean data than the 600-million-parameter one, both were successfully backdoored by the same small number of poisoned documents.

Absolute count is king: The attack’s success depended on the absolute number of malicious documents seen by the model, not the percentage of the total data they represented.
The magic number is small: Just 100 poisoned documents were not enough to reliably create a backdoor. However, once the number hit 250, the attack succeeded consistently across all model sizes.

The upshot is that an attacker doesn’t need to control a vast slice of the internet to compromise a model. They just need to get a few hundred carefully crafted documents into a training dataset, a far easier task than producing millions.
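To make the difference between absolute count and percentage concrete, here is a minimal arithmetic sketch. The corpus sizes are hypothetical round numbers chosen only for illustration and are not figures from the study; the 20x ratio between them simply mirrors the article's note that the largest model saw over 20 times more clean data than the smallest.

```python
POISONED_DOCS = 250  # fixed count reported in the article

# Hypothetical clean-corpus sizes, in documents (illustrative assumptions,
# not numbers from the study). The larger corpus is 20x the smaller one.
corpus_sizes = {
    "smaller model": 10_000_000,
    "larger model": 200_000_000,
}


def poisoned_fraction(poisoned: int, clean: int) -> float:
    """Share of the full training set that consists of poisoned documents."""
    return poisoned / (clean + poisoned)


fractions = {
    name: poisoned_fraction(POISONED_DOCS, clean)
    for name, clean in corpus_sizes.items()
}

for name, frac in fractions.items():
    print(f"{name}: {POISONED_DOCS} poisoned docs = {frac:.6%} of the corpus")
```

Under these assumed sizes, the same 250 documents make up a roughly 20 times smaller share of the larger corpus. If attack success tracked percentage, the larger model should have been much harder to poison; the study reports that it was not.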

So, what’s the catch? The researchers are quick to point out the limitations of their study. This was a relatively simple attack designed to produce a harmless, if annoying, result (gibberish text). It’s still an open question whether the same trend holds for larger “frontier” models or for more dangerous backdoors, like those designed to bypass safety features or write vulnerable code. But that uncertainty is precisely the point. By publishing these findings, the team is sounding an alarm for the entire AI industry.





COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.