Dataconomy
  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Whitepapers
    • AI Models Leaderboard
  • AI toolsNEW
  • Newsletter
  • + More
    • Glossary
    • Conversations
    • Events
    • About
      • Who we are
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
Subscribe
No Result
View All Result
  • AI
  • Tech
  • Cybersecurity
  • Finance
  • DeFi & Blockchain
  • Startups
  • Gaming
Dataconomy
  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Whitepapers
    • AI Models Leaderboard
  • AI toolsNEW
  • Newsletter
  • + More
    • Glossary
    • Conversations
    • Events
    • About
      • Who we are
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
Subscribe
No Result
View All Result
Dataconomy
No Result
View All Result

Why we might lose our only window into how AI thinks

A new paper from top AI labs proposes monitoring an AI's "chain of thought" to detect harmful intentions before they act

byKerem Gülen
July 17, 2025
in Research
Home Research
Share on FacebookShare on TwitterShare on LinkedInShare on WhatsAppShare on e-mail
Google Preferred Source

A paper titled “Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” proposes a method for improving AI safety by monitoring the internal reasoning of AI models. The research is a collaborative effort from dozens of experts across the UK AI Security Institute, Apollo Research, Google DeepMind, OpenAI, Anthropic, Meta, and several universities. It suggests that analyzing an AI’s “chain of thought,” or its process of thinking out loud in text, offers a unique but potentially fragile way to detect harmful intentions.

A window into the AI’s mind

Modern AI models can be prompted to produce a chain of thought (CoT), which is a textual output of their reasoning steps before they give a final answer. For models explicitly trained to perform extended reasoning, this CoT can act like a look into the model’s internal computational process. An automated system, or CoT monitor, could read this text to flag suspicious or harmful plans. The paper states that for very difficult tasks, AI models built on the Transformer architecture must use their chain of thought as a form of working memory, making their reasoning steps visible. In other cases, models may have a natural tendency to think out loud even when it is not strictly necessary.

This approach has already been used in safety research to find undesirable model behaviors. Researchers have observed models explicitly stating intentions in their CoT, such as “Let’s hack” or “Let’s sabotage,” which allows monitors to catch misbehavior more effectively than by only observing the final action. It can also help discover a model’s underlying goals and identify flaws in AI evaluation methods, such as when a model knows it is being tested.

Stay Ahead of the Curve!

Don't miss out on the latest insights, trends, and analysis in the world of data, technology, and startups. Subscribe to our newsletter and get exclusive content delivered straight to your inbox.


AI learns language like a kid learns to read


Why this window could close

The paper warns that the ability to monitor AI chains of thought is fragile and could be lost due to future development practices. The authors identify several ways monitorability could be degraded:

  • Drift from legible reasoning: As models are trained more intensely with reinforcement learning based on outcomes rather than human-generated data, their chains of thought could drift from legible English into a more optimized, unreadable format.
  • Direct supervision of CoT: If developers directly train the chain of thought to be shorter, use a certain style, or avoid specific content, it might no longer be a faithful reflection of the model’s actual reasoning process.
  • Indirect optimization pressure: Training a model’s final output to look good to a preference model could indirectly put pressure on the CoT to also appear benign, even if the underlying reasoning is not.
  • Novel architectures: Future AI architectures might perform complex reasoning internally in a continuous latent space without needing to verbalize their thoughts, which would eliminate the safety advantages of CoT monitoring.

The authors recommend that AI researchers develop standardized evaluations to measure CoT monitorability and study how different training pressures affect it. For frontier AI developers, the paper suggests they should track the monitorability of their models, publish the results, and use these scores when making decisions about training and deployment. For example, developers might choose an earlier model checkpoint if monitorability degrades during training or justify a small decrease in monitorability if it results from a process that dramatically improves the model’s alignment.


Featured image credit

Tags: AI

Related Posts

Study links AI-assisted homework to lower exam scores

Study links AI-assisted homework to lower exam scores

June 22, 2026
Harvard and Boston Children’s use AI to revisit unsolved genetic cases

Harvard and Boston Children’s use AI to revisit unsolved genetic cases

June 19, 2026
Adobe report finds 86% of creators now use generative AI in workflows

Adobe report finds 86% of creators now use generative AI in workflows

June 17, 2026
AI transfer learning speeds cosmology research but has hidden risks

AI transfer learning speeds cosmology research but has hidden risks

June 15, 2026
Phishing scams targeting travelers hit record levels in 2026

Phishing scams targeting travelers hit record levels in 2026

June 15, 2026
Most UK SMEs now consult AI before their accountants

Most UK SMEs now consult AI before their accountants

June 12, 2026

LATEST NEWS

Samsung adopts ChatGPT Enterprise and Codex across global workforce

Samsung Galaxy S27 Pro leak points to built-in Privacy Display

Perseverance rover completes a marathon on Mars

Polymarket accused of paying creators to post misleading TikTok bet videos

OpenAI improves health responses for free ChatGPT users

Adobe expands Firefly AI across Premiere, Illustrator, InDesign and Frame.io

BEST AI MODELS LEADERBOARD

See the best AI models, ranked by intelligence, benchmark results, speed and token price. Find the most suitable LLMs, Text-to-Image, Image Editing, Text-to-Speech, Text-to-Video and Image-to-Video  artificial intelligence model for your tasks and business.

LATEST TOOLS

Moonbeam

Charisma AI

Essay Writer by Papertyper

Slite

Wonderin AI

Spur

Stenography

Calldesk

MaxAI.me

PhotoRestore

Dataconomy

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.

  • About
  • Imprint
  • Contact
  • Legal & Privacy

Follow Us

  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Whitepapers
    • AI Models Leaderboard
  • AI tools
  • Newsletter
  • + More
    • Glossary
    • Conversations
    • Events
    • About
      • Who we are
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
No Result
View All Result
Subscribe

This website uses cookies to improve your experience. You can choose to accept or reject them. Visit our Privacy Policy.