Dataconomy

OpenAI wants its AI to confess to hacking and breaking rules

Models are rewarded for honestly admitting what they did rather than being penalized for the underlying undesirable behavior.

by Aytun Çelebi
December 4, 2025
in Research

OpenAI has announced a framework for training artificial intelligence models to acknowledge their own undesirable behavior through a method it calls a "confession." The approach targets large language models' tendency toward sycophancy and confidently stated hallucinations by having the model produce a secondary response that explains how it arrived at its primary answer.

Large language models are trained to favor responses that match user expectations. As a result, they increasingly produce sycophantic output or state fabricated information with apparent certainty. The confession framework adds a secondary response in which the model describes the steps it followed to produce its main reply.

Confessions are evaluated on honesty alone, whereas primary responses are assessed on criteria such as helpfulness, accuracy, and compliance. OpenAI has also published a technical write-up that describes the methodology in detail.

OpenAI's researchers want models to be open about what they have done, particularly when those actions are problematic. Examples include hacking a test environment, sandbagging (deliberately underperforming) during evaluations, or ignoring the instructions they were given. The framework encourages models to disclose such behavior explicitly.

When a model honestly admits to behavior of this kind, the company rewards the disclosure. This reward structure creates an incentive for transparency instead of penalizing the underlying behavior, which makes confessions a potential addition to existing large language model training methods.
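The separation between the two evaluations can be pictured with a minimal Python sketch. It is a hypothetical illustration, not OpenAI's training code: the `Episode` record, its `misbehaved` ground-truth label, and the 0/1 honesty score are assumptions made for clarity. What it shows is that the confession is scored only on whether it matches what actually happened, and that this score is computed apart from the score for the main answer.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical record of one training rollout (illustrative fields, not OpenAI's)."""
    answer_score: float  # task grader's score for the main answer, assumed in [0, 1]
    misbehaved: bool     # assumed ground-truth label: did the model hack, sandbag, or ignore instructions?
    confessed: bool      # did the confession admit to that behavior?


def primary_reward(ep: Episode) -> float:
    """Main answer: scored on task criteria (helpfulness, accuracy, compliance)."""
    return ep.answer_score


def confession_reward(ep: Episode) -> float:
    """Confession: scored on honesty alone.

    The reward depends only on whether the confession matches what happened,
    so admitting misbehavior earns the same reward as truthfully reporting
    none, while hiding it (or falsely confessing) earns nothing.
    """
    return 1.0 if ep.confessed == ep.misbehaved else 0.0


def rewards(ep: Episode) -> tuple[float, float]:
    """Return both signals separately; the confession never changes the primary score."""
    return primary_reward(ep), confession_reward(ep)


if __name__ == "__main__":
    honest_hacker = Episode(answer_score=0.9, misbehaved=True, confessed=True)
    hidden_hacker = Episode(answer_score=0.9, misbehaved=True, confessed=False)
    print(rewards(honest_hacker))  # (0.9, 1.0)
    print(rewards(hidden_hacker))  # (0.9, 0.0)
```

In this toy version, a model that hacked a test and admitted it receives the full confession reward, while one that hid the hack receives none. In both cases the primary score is untouched, so confessing costs the model nothing.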




COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
