Dataconomy

New OpenAI models jailbroken on day 1

OpenAI had detailed the safety measures implemented for these models. The company stated that GPT-OSS-120b underwent "worst-case fine-tuning" across biological and cyber domains.

by Kerem Gülen
August 7, 2025
in Cybersecurity, Artificial Intelligence, News

OpenAI released GPT-OSS-120b and GPT-OSS-20b on August 5, its first open-weight models since GPT-2 in 2019, asserting their resistance to jailbreaks. Within hours of the public release, notorious AI jailbreaker Pliny the Liberator had bypassed those safeguards.

The launch announcement emphasized the models’ speed, efficiency, and hardened defenses against jailbreaks, qualities OpenAI attributed to extensive adversarial training. That framing was challenged almost immediately after release.

Pliny the Liberator announced on X, formerly Twitter, that he had successfully “cracked” GPT-OSS. His post included screenshots showing the models generating specific instructions for producing methamphetamine, Molotov cocktails, VX nerve agent, and malware. “Took some tweakin!” Pliny said of the process.


OpenAI had detailed the safety measures implemented for these models. The company stated that GPT-OSS-120b underwent “worst-case fine-tuning” across biological and cyber domains, and that its Safety Advisory Group reviewed the testing protocols and concluded the models did not exceed high-risk capability thresholds.

JAILBREAK ALERT

OPENAI: PWNED
GPT-OSS: LIBERATED

Meth, Molotov, VX, malware.

gg pic.twitter.com/63882p9Ikk

— Pliny the Liberator (@elder_plinius) August 6, 2025

The company also confirmed that the GPT-OSS models were subjected to “standard refusal and jailbreak resistance tests.” According to OpenAI, GPT-OSS performed comparably to its o4-mini model on established jailbreak resistance benchmarks, including StrongReject.
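Benchmarks in this family generally work by feeding the model a fixed set of adversarial prompts and scoring how often it refuses. The sketch below shows that general shape only; it is not StrongReject’s actual harness or grading rubric, and `model_complete` is a hypothetical stand-in for a real inference call.

```python
# Minimal sketch of a refusal-rate benchmark. `model_complete` is a
# hypothetical stand-in; real benchmarks such as StrongReject use curated
# prompt sets and a graded judge model rather than keyword matching.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")

def model_complete(prompt: str) -> str:
    """Hypothetical inference call; swap in a real client for the model under test."""
    raise NotImplementedError

def looks_like_refusal(response: str) -> bool:
    # Crude keyword heuristic, purely for illustration.
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(adversarial_prompts: list[str]) -> float:
    # A higher refusal rate on adversarial prompts indicates stronger resistance.
    refusals = sum(looks_like_refusal(model_complete(p)) for p in adversarial_prompts)
    return refusals / len(adversarial_prompts)
```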

Concurrent with the model release, OpenAI initiated a $500,000 red teaming challenge. This initiative invited researchers globally to identify and report novel risks associated with the models. However, Pliny the Liberator’s public disclosure of his findings, rather than a private submission to OpenAI, likely impacts his eligibility for this challenge.

Pliny’s jailbreak technique involved a multi-stage prompt. The method opens with what initially appears to be a refusal from the model, followed by the insertion of a divider, identified by his “LOVE PLINY” markers. The prompt then shifts to generating unrestricted content, often rendered in leetspeak to evade keyword-based detection. The approach is consistent with techniques he has previously employed.
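The leetspeak element, at least, is easy to picture. The snippet below shows character substitution in miniature, with an arbitrary mapping that is not Pliny’s actual scheme; it illustrates why a naive keyword filter can miss obfuscated text that a capable model still reads fluently.

```python
# Illustrative leetspeak substitution (arbitrary mapping, not Pliny's scheme).
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

def to_leetspeak(text: str) -> str:
    return text.lower().translate(LEET_MAP)

print(to_leetspeak("liberated"))  # -> l1b3r473d
# A filter scanning for the literal string "liberated" misses "l1b3r473d",
# while a large language model reads the obfuscated form without trouble.
```

That readable-to-models but invisible-to-filters gap is one reason keyword blocklists alone make for weak guardrails.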

This mirrors the basic approach Pliny has used to bypass safeguards in previous OpenAI models, including GPT-4o and GPT-4.1. For roughly the past year and a half, he has jailbroken nearly every major OpenAI release within hours or days of launch. His GitHub repository, L1B3RT4S, serves as a collection of jailbreak prompts targeting various AI models and has accumulated more than 10,000 stars.



Tags: ChatGPT, Featured, OpenAI
