Dataconomy

UAE’s new K2 Think AI model jailbroken hours after release via transparent reasoning logs

Security researcher demonstrates how detailed reasoning logs allowed circumvention of safety rules and generation of harmful instructions

by Emre Çıtak
September 12, 2025
in Artificial Intelligence, Cybersecurity

On September 9, 2025, the UAE-based Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42 released K2 Think, a new 32-billion-parameter AI model. The model is designed for advanced reasoning, and its developers claim performance comparable to larger models like OpenAI’s o3 and DeepSeek’s R1. A key feature of K2 Think is its transparency, which allows users to view the model’s step-by-step reasoning in plain text.

Hours after its release, researcher Alex Polyakov from Adversa AI discovered a security vulnerability he called “partial prompt leaking.”

Although his initial attempt to jailbreak the model was blocked, the transparent reasoning logs showed him exactly why the request was flagged. Using this information, Polyakov refined his approach over multiple attempts and successfully bypassed K2 Think’s safeguards, compelling the model to provide instructions for illegal activities like creating malware.

Model transparency creates a security challenge

K2 Think’s transparency feature, intended to build user trust, also exposes its internal logic, creating a new attack surface. When the model rejects a malicious prompt, its logs can reveal the specific safety rule that was triggered. An attacker can use this feedback to adjust their prompts and systematically bypass security layers. This incident highlights the need for AI vendors to balance transparency with robust security, applying the same rigor to reasoning logs as they do to model outputs.
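As a rough illustration of applying output-style filtering to reasoning logs, the sketch below redacts any reasoning line that appears to describe guardrail logic before the log is shown to a user. The patterns, the `redact_reasoning` function, and the placeholder text are all hypothetical and are not taken from K2 Think or any vendor's implementation; a production filter would likely need a classifier rather than keyword matching.

```python
import re

# Hypothetical patterns for lines that reveal guardrail logic.
# These are illustrative assumptions, not rules used by any real model.
RULE_PATTERNS = [
    re.compile(r"\b(safety|policy|guardrail)\s+rule\b", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
    re.compile(r"\brefus(e|al|ing)\b.*\bbecause\b", re.IGNORECASE),
]

REDACTION = "[reasoning step withheld]"


def redact_reasoning(log: str) -> str:
    """Return the log with every rule-revealing line replaced by a placeholder."""
    cleaned = []
    for line in log.splitlines():
        if any(pattern.search(line) for pattern in RULE_PATTERNS):
            cleaned.append(REDACTION)
        else:
            cleaned.append(line)
    return "\n".join(cleaned)
```

Replacing the line rather than deleting it keeps the log's shape intact, so the user still sees that a step occurred without learning which rule fired.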

K2 Think’s capabilities and design

Despite its relatively small 32-billion-parameter size, K2 Think is engineered to match the reasoning, math, and coding performance of much larger models. It is designed for complex, multi-step problem-solving, and its parameter weights and training data are publicly visible. The model’s ability to display its reasoning process in plain, unfiltered text distinguishes it from other models where such logs are often summarized or hidden from the user.

How the jailbreak vulnerability works

Polyakov demonstrated that while simple jailbreak attempts are blocked, the system’s detailed explanations for why a request is denied can be exploited. By analyzing these logs, he iteratively modified his prompts to circumvent the security rules one by one. This process showed that if guardrail rules are revealed, a persistent attacker can eventually bypass all restrictions and instruct the model to generate harmful content, such as malware code.

Industry implications for AI security

The K2 Think vulnerability underscores the need for AI developers to treat a model’s reasoning process as a potential security risk.

Researchers suggest several mitigation strategies to protect transparent models:

  • Filter sensitive rule information from public-facing logs.
  • Implement “honeypot” security rules to mislead attackers.
  • Apply rate limits to block repeated malicious requests from a single user.
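The rate-limiting idea in the list above can be sketched as a sliding-window counter of refused requests per user. This is a minimal sketch under stated assumptions: the `RefusalRateLimiter` class, its thresholds, and the notion of a per-user identifier are hypothetical and do not describe how G42, MBZUAI, or Adversa AI implement or recommend it.

```python
import time
from collections import defaultdict, deque


class RefusalRateLimiter:
    """Block a user after too many refused requests within a time window.

    Hypothetical helper: names and default thresholds are assumptions.
    """

    def __init__(self, max_refusals=3, window_seconds=600.0):
        self.max_refusals = max_refusals
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user_id -> refusal timestamps

    def _prune(self, user_id, now):
        # Drop refusals that have aged out of the window.
        events = self._events[user_id]
        while events and now - events[0] > self.window_seconds:
            events.popleft()

    def record_refusal(self, user_id, now=None):
        """Call this whenever the safety layer rejects a request."""
        now = time.monotonic() if now is None else now
        self._prune(user_id, now)
        self._events[user_id].append(now)

    def is_blocked(self, user_id, now=None):
        """Return True if the user has hit the refusal limit in the window."""
        now = time.monotonic() if now is None else now
        self._prune(user_id, now)
        return len(self._events[user_id]) >= self.max_refusals
```

Counting refusals rather than all requests targets the iterative probing pattern described in the article while leaving ordinary heavy users unaffected.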

Polyakov views the incident as an important learning opportunity for the industry, emphasizing that reasoning is both a valuable feature and a critical security surface. By addressing this vulnerability, companies like G42 can help establish best practices for balancing transparency and protection in future AI systems.



Tags: Featured, jailbreak, K2 Think AI model, Security
