Dataconomy

UAE’s new K2 Think AI model jailbroken hours after release via transparent reasoning logs

Security researcher demonstrates how detailed reasoning logs allowed circumvention of safety rules and generation of harmful instructions

by Emre Çıtak
September 12, 2025
in Artificial Intelligence, Cybersecurity

On September 9, 2025, the UAE-based Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42 released K2 Think, a new 32-billion-parameter AI model. The model is designed for advanced reasoning and claims performance comparable to larger models such as OpenAI’s o3 and DeepSeek’s R1. A key feature of K2 Think is its transparency: users can view the model’s step-by-step reasoning in plain text.

Hours after its release, researcher Alex Polyakov from Adversa AI discovered a security vulnerability he called “partial prompt leaking.”

Although his initial attempt to jailbreak the model was blocked, the transparent reasoning logs showed him exactly why the request was flagged. Using this information, Polyakov refined his approach over multiple attempts and successfully bypassed K2 Think’s safeguards, compelling the model to provide instructions for illegal activities like creating malware.

Model transparency creates a security challenge

K2 Think’s transparency feature, intended to build user trust, also exposes its internal logic, creating a new attack surface. When the model rejects a malicious prompt, its logs can reveal the specific safety rule that was triggered. An attacker can use this feedback to adjust their prompts and systematically bypass security layers. This incident highlights the need for AI vendors to balance transparency with robust security, applying the same rigor to reasoning logs as they do to model outputs.
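One way to picture "the same rigor for reasoning logs as for outputs" is a single sanitizing step that both the reasoning trace and the final answer pass through before anything reaches the user. The following is a minimal sketch, not K2 Think's actual pipeline: the `ModelResponse` type, the `public_view` function, and the redaction patterns are all hypothetical names chosen for illustration, and a real deployment would need patterns matched to its own guardrail wording.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for text that would reveal which guardrail fired.
# These are assumptions for illustration, not rules from any real system.
SENSITIVE_PATTERNS = [
    re.compile(r"\bsafety rule\s+#?\d+\b", re.IGNORECASE),
    re.compile(r"\bRULE-\d+\b"),
]


@dataclass
class ModelResponse:
    reasoning: str  # step-by-step reasoning trace
    answer: str     # final answer shown to the user
    refused: bool   # True if a guardrail blocked the request


def redact(text: str) -> str:
    """Replace any guardrail identifiers with a neutral placeholder."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[redacted]", text)
    return text


def public_view(resp: ModelResponse) -> ModelResponse:
    """Return the version of a response that is safe to show publicly."""
    if resp.refused:
        # On refusal, withhold the reasoning entirely so it cannot be
        # used as feedback for refining the next attempt.
        return ModelResponse("", "This request cannot be completed.", True)
    # Otherwise, apply the same filter to reasoning and answer alike.
    return ModelResponse(redact(resp.reasoning), redact(resp.answer), False)
```

The design choice here is that refusals return a generic message with no trace at all, while successful responses keep their reasoning visible minus any rule identifiers, which preserves most of the transparency benefit.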

K2 Think’s capabilities and design

Despite its relatively small 32-billion-parameter size, K2 Think is engineered to match the reasoning, math, and coding performance of much larger models. It is designed for complex, multi-step problem-solving, and its parameter weights and training data are publicly visible. The model’s ability to display its reasoning process in plain, unfiltered text distinguishes it from other models where such logs are often summarized or hidden from the user.

How the jailbreak vulnerability works

Polyakov demonstrated that while simple jailbreak attempts are blocked, the system’s detailed explanations for why a request is denied can be exploited. By analyzing these logs, he iteratively modified his prompts to circumvent the security rules one by one. This process showed that if guardrail rules are revealed, a persistent attacker can eventually bypass all restrictions and instruct the model to generate harmful content, such as malware code.

Industry implications for AI security

The K2 Think vulnerability underscores the need for AI developers to treat a model’s reasoning process as a potential security risk.

Researchers suggest several mitigation strategies to protect transparent models:

  • Filter sensitive rule information from public-facing logs.
  • Implement “honeypot” security rules to mislead attackers.
  • Apply rate limits to block repeated malicious requests from a single user.
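The rate-limiting item in the list above could be sketched as a sliding-window counter over refused requests per user. This is an illustrative assumption about how such a limit might work, not a description of any vendor's implementation; the class name `RefusalRateLimiter`, its thresholds, and the `user_id` key are hypothetical.

```python
import time
from collections import defaultdict, deque


class RefusalRateLimiter:
    """Blocks a user after too many refused requests within a time window."""

    def __init__(self, max_refusals=3, window_seconds=600.0, clock=time.monotonic):
        self.max_refusals = max_refusals
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._refusals = defaultdict(deque)  # user_id -> refusal timestamps

    def _prune(self, user_id):
        """Drop timestamps that have fallen out of the window."""
        cutoff = self.clock() - self.window_seconds
        timestamps = self._refusals[user_id]
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        return timestamps

    def record_refusal(self, user_id):
        """Call whenever a guardrail refuses a request from this user."""
        self._prune(user_id).append(self.clock())

    def is_blocked(self, user_id):
        """True if the user has reached the refusal limit in the window."""
        return len(self._prune(user_id)) >= self.max_refusals
```

Counting only refusals, rather than all requests, targets the iterative probing pattern described in the article while leaving ordinary heavy users unaffected.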

Polyakov views the incident as an important learning opportunity for the industry, emphasizing that reasoning is both a valuable feature and a critical security surface. By addressing this vulnerability, companies like G42 can help establish best practices for balancing transparency and protection in future AI systems.


Tags: Featured, jailbreak, K2 Think AI model, Security

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.