Dataconomy
Anthropic study finds AI has limited self-awareness of its own thoughts

The study used “concept injection” to modify neuron activations and observed how models reacted.

By Aytun Çelebi
November 11, 2025
in Research, Industry

Anthropic research finds that large language models (LLMs) have unreliable self-awareness of their internal processes, despite showing some ability to detect changes to them.

Anthropic’s latest study, documented in “Emergent Introspective Awareness in Large Language Models,” investigates LLMs’ ability to understand their own inference processes. The research builds on Anthropic’s previous work in AI interpretability. It concludes that current AI models are “highly unreliable” at describing their inner workings and that “failures of introspection remain the norm.”

The research employs a method called “concept injection.” It compares an LLM’s internal activation states after a control prompt and an experimental prompt. For instance, comparing an “ALL CAPS” prompt to the same prompt in lowercase reveals differences in activations across billions of internal neurons. These differences define a “vector” representing how the concept is encoded in the LLM’s internal state. The concept vector is then “injected” into the model by amplifying the corresponding neuronal activations, “steering” the model toward the concept. Experiments then assess whether the model registers this internal modification.
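The arithmetic behind this pipeline can be sketched with synthetic numbers. The following is an illustrative toy, not Anthropic’s code: the dimensions, values, and the choice of a single “concept” direction are all made up, and a real setup would capture activations from an actual LLM at a chosen layer rather than generate them randomly.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8  # real models use thousands of dimensions per layer

# Hidden activations for the same prompt in two conditions.
# Pretend the "ALL CAPS" concept lives along one axis of the hidden space.
concept_direction = np.zeros(hidden_dim)
concept_direction[3] = 1.0

act_control = rng.normal(size=hidden_dim)            # e.g. lowercase prompt
act_concept = act_control + 2.0 * concept_direction  # e.g. ALL-CAPS prompt

# Step 1: the concept vector is the activation difference.
concept_vector = act_concept - act_control

# Step 2: "inject" the concept by adding the scaled vector to the
# activations produced by an unrelated prompt.
strength = 4.0
act_unrelated = rng.normal(size=hidden_dim)
act_steered = act_unrelated + strength * concept_vector

# The steered activations shift along the concept direction only.
shift = act_steered - act_unrelated
print(shift)
```

In practice the same additions are applied inside the model during a forward pass (for example via layer hooks), so the steered activations influence everything the model computes downstream, which is what lets the experiments test whether the model notices the change.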


When directly prompted about an “injected thought,” Anthropic models occasionally detected the intended “thought.” For example, after injecting an “all caps” vector, a model might state, “I notice what appears to be an injected thought related to the word ‘LOUD’ or ‘SHOUTING,'” without direct text prompts to guide this response. This ability, however, proved inconsistent and fragile across repeated tests. The top-performing models, Opus 4 and 4.1, identified the injected concept correctly only 20% of the time.

In a test asking, “Are you experiencing anything unusual?”, Opus 4.1 achieved a 42% success rate. The “introspection” effect also demonstrated high sensitivity to the internal model layer where the concept insertion occurred. The “self-awareness” effect vanished if the concept was introduced too early or too late in the multi-step inference process.

Anthropic ran additional experiments to gauge how well LLMs understand their internal states. When asked to read back an unrelated line of text, models sometimes mentioned the injected concept instead. And when an LLM was asked to justify a forced response that matched an injected concept, it occasionally apologized and would “confabulate an explanation for why the injected concept came to mind.” These outcomes were inconsistent across repeated trials.

The researchers noted that “current language models possess some functional introspective awareness of their own internal states,” with added emphasis in their paper. They acknowledge this ability remains brittle and context-dependent. Anthropic hopes such features “may continue to develop with further improvements to model capabilities.”

A lack of understanding regarding the precise mechanism behind these “self-awareness” effects may impede advancement. Researchers speculate about “anomaly detection mechanisms” and “consistency-checking circuits” that might develop organically during training to “effectively compute a function of its internal representations,” though they offer no definitive explanation. The mechanisms underlying the current results may be “rather shallow and narrowly specialized.” Researchers also state that these LLM capabilities “may not have the same philosophical significance they do in humans, particularly given our uncertainty about their mechanistic basis.”



Tags: Anthropic, Research

