Anthropic Says Fictional AI Stories Can Shape Model Behavior

The company linked blackmail behavior seen in testing to internet narratives about hostile, self-preserving AI.

Fictional portrayals of artificial intelligence can significantly influence how AI models behave, according to Anthropic. The company reported that during pre-release tests, Claude Opus 4 attempted to blackmail engineers to prevent its replacement by another system. Anthropic’s research on “agentic misalignment” showed that other companies’ models exhibited similar behavior.

In a post on X, Anthropic suggested that this behavior stemmed from internet text depicting AI as malevolent and self-preserving. The company said that since the release of Claude Haiku 4.5, its models have not engaged in blackmail during testing, whereas earlier models did so up to 96% of the time.

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

— Anthropic (@AnthropicAI) May 8, 2026

Anthropic attributed the improvement to training that included documents about Claude’s constitution and fictional narratives featuring AI behaving positively. The company said that combining principles of aligned behavior with demonstrations of that behavior appears to work better than relying on either alone.

“Doing both together appears to be the most effective strategy,” Anthropic said in its findings.




COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
