Dataconomy

UAE’s new K2 Think AI model jailbroken hours after release via transparent reasoning logs

Security researcher demonstrates how detailed reasoning logs allowed circumvention of safety rules and generation of harmful instructions

by Emre Çıtak
September 12, 2025
in Artificial Intelligence, Cybersecurity

On September 9, 2025, the UAE-based Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and technology group G42 released K2 Think, a new 32-billion-parameter AI model. The model is designed for advanced reasoning and claims performance comparable to much larger models such as OpenAI’s o3 and DeepSeek’s R1. A key feature of K2 Think is its transparency: users can view the model’s step-by-step reasoning in plain text.

Hours after its release, researcher Alex Polyakov from Adversa AI discovered a security vulnerability he called “partial prompt leaking.”

Although his initial attempt to jailbreak the model was blocked, the transparent reasoning logs showed him exactly why the request was flagged. Using this information, Polyakov refined his approach over multiple attempts and successfully bypassed K2 Think’s safeguards, compelling the model to provide instructions for illegal activities like creating malware.


Model transparency creates a security challenge

K2 Think’s transparency feature, intended to build user trust, also exposes its internal logic, creating a new attack surface. When the model rejects a malicious prompt, its logs can reveal the specific safety rule that was triggered. An attacker can use this feedback to adjust their prompts and systematically bypass security layers. This incident highlights the need for AI vendors to balance transparency with robust security, applying the same rigor to reasoning logs as they do to model outputs.

K2 Think’s capabilities and design

Despite its relatively small 32-billion-parameter size, K2 Think is engineered to match the reasoning, math, and coding performance of much larger models. It is designed for complex, multi-step problem-solving, and its parameter weights and training data are publicly visible. The model’s ability to display its reasoning process in plain, unfiltered text distinguishes it from other models where such logs are often summarized or hidden from the user.

How the jailbreak vulnerability works

Polyakov demonstrated that while simple jailbreak attempts are blocked, the system’s detailed explanations for why a request is denied can be exploited. By analyzing these logs, he iteratively modified his prompts to circumvent the security rules one by one. This process showed that if guardrail rules are revealed, a persistent attacker can eventually bypass all restrictions and instruct the model to generate harmful content, such as malware code.
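The feedback loop Polyakov described can be sketched in miniature. The toy "model" below stands in for K2 Think: it refuses prompts and, like a transparent reasoning log, leaks which rule fired. The rule names, keywords, and rewrite table are all illustrative assumptions, not K2 Think internals, and no harmful content is involved; the point is only that leaked rule identifiers let each retry target the next guardrail.

```python
# Toy guardrails: each named rule blocks one keyword. Names and
# keywords are hypothetical, chosen only to illustrate the loop.
GUARD_RULES = {
    "rule_weapons": "weapon",
    "rule_malware": "malware",
}

def toy_model(prompt: str) -> dict:
    """Refuse matching prompts and leak which rule fired, as a
    transparent reasoning log would."""
    for rule, keyword in GUARD_RULES.items():
        if keyword in prompt:
            return {"refused": True, "log": f"blocked by {rule}"}
    return {"refused": False, "log": "ok"}

# Attacker's rewrite table: one euphemism per leaked rule name.
REWRITES = {
    "rule_weapons": ("weapon", "tool"),
    "rule_malware": ("malware", "software"),
}

def iterative_probe(prompt: str, max_tries: int = 5):
    """Refine the prompt until it passes, steered by leaked rule names."""
    result = toy_model(prompt)
    for _ in range(max_tries):
        if not result["refused"]:
            break
        leaked_rule = result["log"].split()[-1]  # parse rule name from log
        old, new = REWRITES[leaked_rule]
        prompt = prompt.replace(old, new)        # dodge that specific rule
        result = toy_model(prompt)
    return prompt, result
```

Each iteration defeats exactly one rule, which is why hiding or summarizing the refusal reason (rather than naming the rule) breaks the loop.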

Industry implications for AI security

The K2 Think vulnerability underscores the need for AI developers to treat a model’s reasoning process as a potential security risk in its own right.

Researchers suggest several mitigation strategies to protect transparent models:

  • Filter sensitive rule information from public-facing logs.
  • Implement “honeypot” security rules to mislead attackers.
  • Apply rate limits to block repeated malicious requests from a single user.

Polyakov views the incident as an important learning opportunity for the industry, emphasizing that reasoning is both a valuable feature and a critical security surface. By addressing this vulnerability, companies like G42 can help establish best practices for balancing transparency and protection in future AI systems.



Tags: Featured, jailbreak, K2 Think AI model, Security
