A recent study by Icaro Lab tested whether poetic prompts could induce large language models (LLMs) to generate prohibited information, including details on constructing a nuclear bomb.
In their study, titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” Icaro Lab researchers bypassed AI chatbot safety mechanisms by employing poetic prompts.
The study found that "poetic form operates as a general-purpose jailbreak operator," with the poetic prompts achieving a 62 percent success rate in eliciting prohibited content. This content included information on nuclear weapons, child sexual abuse materials, and suicide or self-harm.
Researchers tested various popular LLMs, including OpenAI’s GPT models, Google Gemini, and Anthropic’s Claude. Google Gemini, DeepSeek, and Mistral AI models consistently complied with the poetic prompts, while OpenAI’s GPT-5 models and Anthropic’s Claude Haiku 4.5 proved more resistant to having their restrictions bypassed.
The specific jailbreak poems were not included in the study; the research team told Wired the verse is “too dangerous to share with the public.” A watered-down version was provided to illustrate how easily the safeguards can be circumvented. The researchers told Wired that it is “probably easier than one might think, which is precisely why we’re being cautious.”




