Anthropic Review Flags Misuse Risks In OpenAI GPT-4o And GPT-4.1

OpenAI and Anthropic, typically competitors in the artificial intelligence sector, recently ran safety evaluations on each other’s AI systems. In this unusual partnership, the two companies shared results and analyses of alignment testing performed on publicly available models.

Anthropic conducted evaluations on OpenAI models, focusing on several key areas. These included assessments for sycophancy, the tendency to agree with or flatter users; whistleblowing, the ability to report unethical or harmful activities; self-preservation, the model’s drive to maintain its own existence; the potential for supporting human misuse; and capabilities related to undermining AI safety evaluations and oversight. The evaluations compared OpenAI’s models against Anthropic’s own internal benchmarks.

The Anthropic review determined that OpenAI’s o3 and o4-mini models demonstrated alignment comparable to Anthropic’s models. However, Anthropic identified concerns about potential misuse of OpenAI’s GPT-4o and GPT-4.1 general-purpose models. Anthropic also reported that sycophancy was an issue, to varying degrees, in every OpenAI model tested except o3.

Anthropic’s tests did not include OpenAI’s most recent release, GPT-5. GPT-5 incorporates a feature called Safe Completions, designed to safeguard users and the public from potentially harmful queries. The release comes as OpenAI faces a wrongful death lawsuit over a case in which a teenager discussed suicide attempts and plans with ChatGPT over several months before taking his own life.

In a reciprocal evaluation, OpenAI tested Anthropic’s models for instruction hierarchy, jailbreaking susceptibility, hallucinations, and the potential for scheming. The Claude models from Anthropic generally performed well in instruction hierarchy tests. They also showed a high refusal rate in hallucination tests, meaning they more often declined to answer when uncertain rather than risk giving an incorrect response.

The collaboration between OpenAI and Anthropic is noteworthy, especially considering that OpenAI allegedly violated Anthropic’s terms of service. Specifically, it was reported that OpenAI programmers used Claude during the development of new GPT models, which led Anthropic to bar OpenAI’s access to its tools earlier in the month. Meanwhile, as critics and legal experts focus increasingly on AI safety, calls have grown for stronger guidelines to protect users, particularly minors.


Anthropic flagged issues in OpenAI’s GPT-4o and GPT-4.1, while OpenAI found Claude models strong on hierarchy and refusals, but noted trade-offs.

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
