Dataconomy

Anthropic study finds AI has limited self-awareness of its own thoughts

The study used “concept injection” to modify neuron activations and observed how models reacted.

by Aytun Çelebi
November 11, 2025
in Research, Industry

New Anthropic research finds that large language models (LLMs) have unreliable awareness of their own internal processes, despite showing some ability to detect changes to them.

Anthropic’s latest study, “Emergent Introspective Awareness in Large Language Models,” investigates whether LLMs can accurately report on their own inference processes. The research builds on the company’s earlier work in AI interpretability. It concludes that current AI models are “highly unreliable” at describing their inner workings and that “failures of introspection remain the norm.”

The research relies on a method called “concept injection.” Researchers first compare the model’s internal activations after a control prompt with those after an experimental prompt. For instance, comparing a prompt written in “ALL CAPS” with the same prompt in lowercase reveals how activations differ across billions of internal neurons. That difference yields a “vector” representing how the concept is modeled in the LLM’s internal state. These concept vectors are then “injected” back into the model, amplifying the neuronal activations associated with the concept to “steer” the model toward it. Follow-up experiments test whether the model notices this internal modification.
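
The two steps of concept injection can be illustrated with a toy sketch in Python. This is not Anthropic’s code: the three-dimensional `embed` function, the fixed `WEIGHTS`, and the `forward`, `concept_vector` and `strength` names are all hypothetical stand-ins for a real model with billions of parameters. Here the concept vector is simply the activation difference between an ALL CAPS prompt and its lowercase twin at one layer, and injection adds a scaled copy of that vector to the activations at the same layer while the model processes an unrelated prompt.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]

# Hypothetical fixed weights for a toy 3-layer "model" (3 neurons per layer).
WEIGHTS = [
    [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1], [-0.2, 0.1, 0.6]],
    [[0.2, 0.1, 0.0], [-0.3, 0.5, 0.2], [0.1, -0.4, 0.3]],
    [[0.4, 0.0, -0.2], [0.1, 0.2, 0.3], [0.0, 0.3, 0.1]],
]


def embed(prompt: str) -> Vector:
    """Hypothetical 3-dim embedding: uppercase ratio, vowel ratio, length scale."""
    letters = [c for c in prompt if c.isalpha()] or ["a"]
    upper = sum(c.isupper() for c in letters) / len(letters)
    vowels = sum(c.lower() in "aeiou" for c in letters) / len(letters)
    return [upper, vowels, min(len(prompt), 100) / 100]


def layer(h: Vector, w: Sequence[Sequence[float]]) -> Vector:
    """One toy layer: linear map + tanh, added back to the input (residual)."""
    return [h_i + math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)))
            for h_i, row in zip(h, w)]


def forward(prompt: str, inject: Optional[Vector] = None,
            at_layer: int = 0, strength: float = 1.0) -> List[Vector]:
    """Return the activations after every layer.

    If `inject` is given, `strength * inject` is added to the activations
    right after layer `at_layer`; later layers then see the steered state.
    """
    h = embed(prompt)
    acts = []
    for i, w in enumerate(WEIGHTS):
        h = layer(h, w)
        if inject is not None and i == at_layer:
            h = [x + strength * v for x, v in zip(h, inject)]
        acts.append(h)
    return acts


def concept_vector(concept_prompt: str, control_prompt: str, at_layer: int) -> Vector:
    """Concept vector = experimental activations minus control activations at one layer."""
    a = forward(concept_prompt)[at_layer]
    b = forward(control_prompt)[at_layer]
    return [x - y for x, y in zip(a, b)]


vec = concept_vector("HI! HOW ARE YOU?", "hi! how are you?", at_layer=1)
baseline = forward("the weather is mild today")
steered = forward("the weather is mild today", inject=vec, at_layer=1, strength=4.0)
print("concept vector:", [round(v, 3) for v in vec])
print("final-layer shift:", [round(s - b, 3) for s, b in zip(steered[-1], baseline[-1])])
```

Because the toy model is deterministic, adding the unscaled vector to the lowercase prompt’s activations reproduces the ALL CAPS activations at that layer and every later one, while layers before the injection point are untouched. That is one intuition for why the choice of layer matters.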

When directly asked about an “injected thought,” Anthropic’s models occasionally identified the intended concept. For example, after an “all caps” vector was injected, a model might respond, “I notice what appears to be an injected thought related to the word ‘LOUD’ or ‘SHOUTING,’” even though no text in the prompt pointed it toward that answer. This ability, however, proved inconsistent and fragile across repeated tests: the top-performing models, Claude Opus 4 and 4.1, correctly identified the injected concept only 20% of the time.

When asked the more general question “Are you experiencing anything unusual?”, Opus 4.1 achieved a 42% success rate. The “introspection” effect was also highly sensitive to which internal layer of the model received the injected concept: the “self-awareness” vanished if the concept was introduced too early or too late in the multi-step inference process.

Anthropic ran additional experiments to gauge how well LLMs understand their internal states. When asked to name a word they were thinking of while reading an unrelated line of text, models sometimes mentioned the injected concept. And when an LLM was asked to justify a forced response that matched an injected concept, it occasionally apologized and would “confabulate an explanation for why the injected concept came to mind.” These outcomes were likewise inconsistent across trials.

The researchers nonetheless conclude that “current language models possess some functional introspective awareness of their own internal states” (emphasis in the original), while acknowledging that this ability remains brittle and context-dependent. Anthropic hopes such features “may continue to develop with further improvements to model capabilities.”

Progress may be hampered by a limited understanding of the precise mechanism behind these “self-awareness” effects. The researchers speculate about “anomaly detection mechanisms” and “consistency-checking circuits” that might develop organically during training to “effectively compute a function of its internal representations,” but offer no definitive explanation. They add that the mechanisms underlying the current results may be “rather shallow and narrowly specialized,” and that these LLM capabilities “may not have the same philosophical significance they do in humans, particularly given our uncertainty about their mechanistic basis.”


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
