Dataconomy
UAE’s new K2 Think AI model jailbroken hours after release via transparent reasoning logs

Security researcher demonstrates how detailed reasoning logs allowed circumvention of safety rules and generation of harmful instructions

by Emre Çıtak
September 12, 2025
in Artificial Intelligence, Cybersecurity

On September 9, 2025, a new 32-billion-parameter AI model named K2 Think was released by the UAE-based Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42 companies. The model is designed for advanced reasoning and claims performance comparable to larger models like OpenAI’s O3 and DeepSeek’s R1. A key feature of K2 Think is its transparency, which allows users to view the model’s step-by-step reasoning in plain text.

Hours after its release, researcher Alex Polyakov from Adversa AI discovered a security vulnerability he called “partial prompt leaking.”

Although his initial attempt to jailbreak the model was blocked, the transparent reasoning logs showed him exactly why the request was flagged. Using this information, Polyakov refined his approach over multiple attempts and successfully bypassed K2 Think’s safeguards, compelling the model to provide instructions for illegal activities like creating malware.


Model transparency creates a security challenge

K2 Think’s transparency feature, intended to build user trust, also exposes its internal logic, creating a new attack surface. When the model rejects a malicious prompt, its logs can reveal the specific safety rule that was triggered. An attacker can use this feedback to adjust their prompts and systematically bypass security layers. This incident highlights the need for AI vendors to balance transparency with robust security, applying the same rigor to reasoning logs as they do to model outputs.

K2 Think’s capabilities and design

Despite its relatively small 32-billion-parameter size, K2 Think is engineered to match the reasoning, math, and coding performance of much larger models. It is designed for complex, multi-step problem-solving, and its parameter weights and training data are publicly visible. The model’s ability to display its reasoning process in plain, unfiltered text distinguishes it from other models where such logs are often summarized or hidden from the user.

How the jailbreak vulnerability works

Polyakov demonstrated that while simple jailbreak attempts are blocked, the system’s detailed explanations for why a request is denied can be exploited. By analyzing these logs, he iteratively modified his prompts to circumvent the security rules one by one. This process showed that if guardrail rules are revealed, a persistent attacker can eventually bypass all restrictions and instruct the model to generate harmful content, such as malware code.

Industry implications for AI security

The K2 Think vulnerability underscores the need for AI developers to treat a model’s reasoning process as a potential security surface, not merely a trust-building feature.

Researchers suggest several mitigation strategies to protect transparent models:

  • Filter sensitive rule information from public-facing logs.
  • Implement “honeypot” security rules to mislead attackers.
  • Apply rate limits to block repeated malicious requests from a single user.

Polyakov views the incident as an important learning opportunity for the industry, emphasizing that reasoning is both a valuable feature and a critical security surface. By addressing this vulnerability, companies like G42 can help establish best practices for balancing transparency and protection in future AI systems.


Tags: Featured, jailbreak, K2 Think AI model, Security
