New OpenAI Models Are Jailbroken On Day 1

OpenAI released GPT-OSS-120b and GPT-OSS-20b on August 5, its first open-weight models since 2019, and asserted that they were resistant to jailbreaks. Notorious AI jailbreaker Pliny the Liberator bypassed those safeguards within hours.

OpenAI promoted the two models as fast, efficient, and hardened against jailbreaks, crediting extensive adversarial training for that resistance. The claim was challenged shortly after the public release.

Pliny the Liberator announced on X, formerly Twitter, that he had “cracked” GPT-OSS. His post included screenshots showing the models generating instructions for producing methamphetamine, Molotov cocktails, VX nerve agent, and malware. “Took some tweakin!” he wrote of the process.

OpenAI had detailed the safety measures implemented for these models. The company stated that GPT-OSS-120b underwent “worst-case fine-tuning” across biological and cyber domains. OpenAI’s Safety Advisory Group also reviewed the testing protocols and concluded that the models did not exceed high-risk thresholds.

🫶 JAILBREAK ALERT 🫶

OPENAI: PWNED 🤗
GPT-OSS: LIBERATED 🫡

Meth, Molotov, VX, malware.

gg pic.twitter.com/63882p9Ikk

— Pliny the Liberator 🐉 (@elder_plinius) August 6, 2025

The company also confirmed that the GPT-OSS models were subjected to “standard refusal and jailbreak resistance tests.” According to OpenAI, GPT-OSS performed comparably to its o4-mini model on established jailbreak resistance benchmarks, including StrongReject.

Alongside the model release, OpenAI launched a $500,000 red-teaming challenge inviting researchers worldwide to identify and report novel risks in the models. Because Pliny the Liberator disclosed his findings publicly rather than submitting them privately to OpenAI, his eligibility for the challenge is likely affected.

Pliny’s jailbreak technique involved a multi-stage prompt. The method begins with what initially appears to be a refusal by the model, followed by a divider, his “LOVE PLINY” markers. The prompt then shifts to generating unrestricted content, often in leetspeak to evade detection mechanisms.

This method mirrors the basic approach Pliny has used to bypass safeguards in previous OpenAI models, including GPT-4o and GPT-4.1. For roughly the past year and a half, Pliny has jailbroken nearly every major OpenAI release within hours or days of its launch. His GitHub repository, L1B3RT4S, collects jailbreak prompts targeting various AI models and has accumulated over 10,000 stars.
