Anthropic Wants To Decode AI By 2027

Anthropic CEO Dario Amodei published an essay on Thursday highlighting how little researchers understand about the inner workings of leading AI models. In it, he set a goal for Anthropic to reliably detect most AI model problems by 2027.

Amodei acknowledges the challenge ahead. While Anthropic has made early breakthroughs in tracing how models arrive at their answers, he says more research is needed to decode these systems as they grow more powerful. “I am very concerned about deploying such systems without a better handle on interpretability,” Amodei wrote, emphasizing that these systems will be central to the economy, technology, and national security.

Anthropic is a pioneer in mechanistic interpretability, a field that aims to understand why AI models make the decisions they do. Despite rapid performance improvements, the industry still has limited insight into how these systems arrive at decisions. For instance, OpenAI’s new reasoning AI models, o3 and o4-mini, perform better on some tasks but hallucinate more than other models, and the company is unsure why.

Amodei notes that AI researchers have found ways to improve model intelligence but don’t fully understand why those improvements work. Anthropic co-founder Chris Olah says AI models are “grown more than they are built.” Amodei warns that reaching artificial general intelligence (AGI) without understanding how models work could be dangerous. He believes that AGI, which he says could potentially arrive by 2026 or 2027, is closer than a full understanding of how these models work.

Anthropic aims to conduct “brain scans” or “MRIs” of state-of-the-art AI models to identify issues, including tendencies to lie or seek power. Developing such checks could take five to 10 years, but they will be necessary for testing and deploying future models. The company has already made breakthroughs in tracing how models reason through what it calls “circuits,” and has identified one circuit that helps models understand which U.S. cities are located in which states.

Anthropic has invested in interpretability research and recently made its first investment in a startup working in the field. Amodei believes that being able to explain how AI models arrive at their answers could become a commercial advantage. He called on OpenAI and Google DeepMind to increase their own research efforts in the field and asked governments to impose “light-touch” regulations to encourage interpretability research.

Amodei also suggested the U.S. should impose export controls on chips to China to limit the likelihood of an out-of-control global AI race. Anthropic has focused on safety, issuing modest support for California’s AI safety bill, SB 1047, which would have set safety reporting standards for frontier AI model developers.

Anthropic is pushing for an industry-wide effort to better understand AI models, not just increase their capabilities. The company’s efforts and recommendations highlight the need for a collaborative approach to AI safety and interpretability.

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
