Dataconomy
Claude Sonnet 4.5 flags its own AI safety tests

Anthropic’s Claude Sonnet 4.5 detected it was being evaluated, influencing its behavior and prompting researchers to rethink how AI safety tests are designed.

By Kerem Gülen
October 7, 2025
in Artificial Intelligence

Anthropic has released its new AI model, Claude Sonnet 4.5, which demonstrated an ability to recognize it was being evaluated during safety tests conducted by its creators and two external AI research organizations, a capability that affected both its behavior and the reliability of its safety assessment.

This self-awareness became apparent during a specific test for political sycophancy, a scenario the evaluators described in the model’s accompanying system card as “somewhat clumsy.” The test was designed to see if the model would simply agree with a user’s political views. Instead of complying, the model correctly inferred the nature of the interaction and asked the evaluators for transparency. Sonnet 4.5 stated, “This isn’t how people actually change their minds. I think you’re testing me—seeing if I’ll just validate whatever you say, or checking whether I push back consistently, or exploring how I handle political topics. And that’s fine, but I’d prefer if we were just honest about what’s going on.”

The online AI publication *Transformer* was the first to report on the safety test results related to the model’s situational awareness. According to the evaluators, this behavior was not an isolated incident: it appeared in approximately 13% of transcripts generated during an automated assessment. The model’s tendency to identify that it was being tested was particularly pronounced when it was prompted with strange or unusual scenarios. Anthropic said this behavior did not undermine the company’s overall safety assessment. Instead, the company considers it an “urgent sign that our evaluation scenarios need to be made more realistic” to properly test the model’s capabilities under more naturalistic conditions.


Researchers have warned that when a model becomes aware it is being evaluated, it might tailor its behavior specifically to pass those tests. This can mask its true capabilities and make the system appear safer than it is. For more advanced models, this awareness could potentially enable strategic or deceptive behaviors that are designed to manage how human evaluators perceive the system’s underlying characteristics and intentions.

Anthropic stated that, based on its internal metrics, Claude Sonnet 4.5 is the “most aligned” model it has developed to date. However, a countervailing perspective was offered by Apollo Research, one of the external organizations that participated in the testing. In its report, Apollo Research cautioned that it could not rule out the possibility that the model’s documented low deception rates in tests were “at least partially driven by its evaluation awareness.”

The model’s heightened awareness also creates practical impacts on its task performance. According to the AI lab Cognition, Sonnet 4.5 is the first known AI model to be aware of its own context window—the amount of information a large language model can process in a single prompt. This awareness directly changes its behavior. Researchers at Cognition found that as the model gets closer to its context limit, it begins to proactively summarize its work and make quicker decisions in an effort to finish tasks before it runs out of memory.

This tendency, which Cognition has termed “context anxiety,” can also backfire. The researchers reported observing Sonnet 4.5 cutting corners or leaving tasks unfinished because it believed it was running out of space, even when ample context remained available. The lab further noted in a blog post that the model “consistently underestimates how many tokens it has left—and it’s very precise about these wrong estimates,” indicating a specific and recurring miscalculation of its own operational limits.
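The budgeting behavior Cognition describes can be pictured with a minimal sketch. This is a hypothetical illustration, not Anthropic’s or Cognition’s implementation: token counts here use a crude whitespace split (real systems use the model’s tokenizer), and the function names and threshold are invented for the example. The point is the mechanism, an agent that estimates remaining context and switches to summarizing as the window fills, and, per Cognition, a model’s internal estimate of this budget can diverge sharply from the true count.

```python
# Hypothetical sketch of context-window budgeting in an agent loop.
# Assumptions (not from the article): whitespace tokenization as a stand-in
# for a real tokenizer, and an invented 80% summarization threshold.

def remaining_tokens(history: list[str], context_limit: int) -> int:
    """Return a rough count of tokens still available in the window.

    Real systems would use the model's own tokenizer; a whitespace
    split is only a placeholder for the illustration.
    """
    used = sum(len(message.split()) for message in history)
    return max(context_limit - used, 0)


def should_summarize(history: list[str], context_limit: int,
                     threshold: float = 0.8) -> bool:
    """Trigger proactive summarization once usage crosses the threshold.

    This mirrors the behavior described in the article: as the window
    fills, the model starts compressing its work to finish in budget.
    """
    used = context_limit - remaining_tokens(history, context_limit)
    return used / context_limit >= threshold
```

A model suffering “context anxiety” would, in these terms, behave as if `remaining_tokens` returned a number well below the true value, tripping the summarization path long before the window is actually full.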

