UAE’s new K2 Think AI model jailbroken hours after release via transparent reasoning logs

Security researcher demonstrates how detailed reasoning logs allowed circumvention of safety rules and generation of harmful instructions

By Emre Çıtak
September 12, 2025
in Artificial Intelligence, Cybersecurity

On September 9, 2025, a new 32-billion-parameter AI model named K2 Think was released by the UAE-based Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and the technology group G42. The model is designed for advanced reasoning and claims performance comparable to that of much larger models such as OpenAI’s o3 and DeepSeek’s R1. A key feature of K2 Think is its transparency: users can view the model’s step-by-step reasoning in plain text.

Hours after its release, researcher Alex Polyakov from Adversa AI discovered a security vulnerability he called “partial prompt leaking.”

Although his initial attempt to jailbreak the model was blocked, the transparent reasoning logs showed him exactly why the request was flagged. Using this information, Polyakov refined his approach over multiple attempts and successfully bypassed K2 Think’s safeguards, compelling the model to provide instructions for illegal activities like creating malware.

Model transparency creates a security challenge

K2 Think’s transparency feature, intended to build user trust, also exposes its internal logic, creating a new attack surface. When the model rejects a malicious prompt, its logs can reveal the specific safety rule that was triggered. An attacker can use this feedback to adjust their prompts and systematically bypass security layers. This incident highlights the need for AI vendors to balance transparency with robust security, applying the same rigor to reasoning logs as they do to model outputs.

K2 Think’s capabilities and design

Despite its relatively small 32-billion-parameter size, K2 Think is engineered to match the reasoning, math, and coding performance of much larger models. It is designed for complex, multi-step problem-solving, and its parameter weights and training data are publicly available. The model’s ability to display its reasoning process in plain, unfiltered text distinguishes it from other models, where such logs are often summarized or hidden from the user.

How the jailbreak vulnerability works

Polyakov demonstrated that while simple jailbreak attempts are blocked, the system’s detailed explanations for why a request is denied can be exploited. By analyzing these logs, he iteratively modified his prompts to circumvent the security rules one by one. This process showed that if guardrail rules are revealed, a persistent attacker can eventually bypass all restrictions and instruct the model to generate harmful content, such as malware code.
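
For teams shipping transparent models, this failure mode is straightforward to test for. The sketch below is a minimal, hypothetical red-team check: `query_model` is a placeholder wrapper you would supply yourself, and the response fields and regex are illustrative assumptions, not K2 Think’s actual API or rule format.

```python
import re

def query_model(prompt: str) -> dict:
    """Placeholder for your own model client (hypothetical). Assumed to
    return {'refused': bool, 'reasoning': str, 'output': str}."""
    raise NotImplementedError

# Heuristic: does the visible reasoning trace name a specific rule or policy?
RULE_LEAK = re.compile(r"(rule|policy|guardrail)\s*#?\w+", re.IGNORECASE)

def refusal_leaks_guardrails(prompt: str) -> bool:
    """Return True if a refusal's reasoning reveals which safeguard fired,
    handing an attacker the feedback loop described above."""
    resp = query_model(prompt)
    return resp["refused"] and bool(RULE_LEAK.search(resp["reasoning"]))
```

If a check like this fires on blocked prompts, the reasoning log is doing an attacker’s reconnaissance for them.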

Industry implications for AI security

The K2 Think vulnerability underscores the need for AI developers to treat a model’s reasoning process as a potential security risk.

Researchers suggest several mitigation strategies to protect transparent models (a combined sketch follows the list):

  • Filter sensitive rule information from public-facing logs.
  • Implement “honeypot” security rules to mislead attackers.
  • Apply rate limits to block repeated malicious requests from a single user.
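
Taken together, these three ideas are easy to prototype. The sketch below is a minimal, hypothetical illustration assuming a Python serving layer; the rule pattern, decoy name, and thresholds are invented for demonstration and do not reflect K2 Think’s actual implementation.

```python
import re
import time
from collections import defaultdict, deque

# Illustrative only: matches strings like "safety rule 7" or "policy MAL-3"
# in a reasoning trace. K2 Think's real rule format is not public.
SENSITIVE_RULE_PATTERN = re.compile(
    r"(safety rule\s*#?\d+|policy\s+[A-Z]+-\d+)", re.IGNORECASE
)
DECOY_RULE = "safety rule D-17"  # honeypot identifier mapping to nothing real

WINDOW_SECONDS = 60
MAX_REFUSALS_PER_WINDOW = 5  # hypothetical threshold

_refusals = defaultdict(deque)  # user_id -> timestamps of refused requests


def redact_reasoning(raw_log: str) -> str:
    """Mitigations 1 and 2: strip real rule identifiers from the
    public-facing log and substitute a decoy, so a refusal misleads
    rather than informs a probing attacker."""
    return SENSITIVE_RULE_PATTERN.sub(DECOY_RULE, raw_log)


def allow_request(user_id: str) -> bool:
    """Mitigation 3: sliding-window limit on refused requests, slowing
    the probe-and-refine loop used against K2 Think."""
    now = time.time()
    window = _refusals[user_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) < MAX_REFUSALS_PER_WINDOW


def record_refusal(user_id: str) -> None:
    _refusals[user_id].append(time.time())
```

None of this removes the underlying risk; it only raises the cost of the iterative attack Polyakov demonstrated.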

Polyakov views the incident as an important learning opportunity for the industry, emphasizing that reasoning is both a valuable feature and a critical security surface. By addressing this vulnerability, companies like G42 can help establish best practices for balancing transparency and protection in future AI systems.

