Z.ai's GLM-5.1 Tops SWE-Bench Pro, Beating Major AI Rivals

Independent evaluations indicate that while GLM-5.1 performs strongly in coding, some capability gaps remain in reasoning tasks.

Z.ai has released GLM-5.1, an open-source flagship model designed for agentic engineering that can work autonomously on a single coding task for up to eight hours. The model handles planning, execution, testing, and iterative optimization in one continuous run. It scored 58.4 on the SWE-Bench Pro benchmark, ahead of competitors such as GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, making it the top performer on that benchmark.

GLM-5.1 is a refinement of the earlier GLM-5 model, introduced in February, which has 744 billion parameters, of which about 40 billion are active per token. GLM-5 was trained solely on Huawei Ascend chips, without Nvidia hardware. The new version keeps the same architecture but improves its coding and agentic capabilities through progressive alignment techniques, including multi-task supervised fine-tuning and reinforcement learning stages.

SOTA on SWE-Bench Pro (58.4): GLM-5.1 delivers significant leaps in coding and agentic performance. pic.twitter.com/0dtnWFyTys

— Z.ai (@Zai_org) April 7, 2026

According to Z.ai’s developer documentation, GLM-5.1 can run a full “experiment–analyze–optimize” loop autonomously over eight hours. In demonstrations, it built a complete Linux desktop system within this timeframe, and it completed 655 iterations while raising vector database query throughput to 6.9 times that of the initial production version.

The model has a context window of 200,000 tokens and can generate up to 128,000 output tokens. It has been optimized for agentic coding workflows and is compatible with tools like Claude Code and OpenClaw. On the KernelBench Level 3 benchmark, GLM-5.1 achieved a 3.6x geometric mean speedup in real machine learning workloads.
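For readers unfamiliar with the metric, a geometric mean speedup averages per-workload speedups multiplicatively, so a single large outlier pulls the figure up less than it would an ordinary average. The snippet below is a minimal sketch of that calculation. The function name and the sample values are illustrative assumptions, not KernelBench data, and the benchmark's own aggregation may differ in detail.

```python
import math


def geometric_mean_speedup(speedups):
    """Return the geometric mean of per-workload speedup factors."""
    if not speedups:
        raise ValueError("at least one speedup is required")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive")
    # Average the logarithms, then exponentiate back.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Illustrative values only (not taken from KernelBench):
# one workload runs 2x faster, another 8x faster.
print(f"{geometric_mean_speedup([2.0, 8.0]):.2f}")  # prints 4.00
```

With these made-up inputs the arithmetic mean would be 5.0, while the geometric mean is 4.0.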

GLM-5.1 is immediately accessible to all GLM Coding Plan subscribers, with its model weights published under an MIT license. Z.ai, which went public on the Hong Kong Stock Exchange in January with a valuation of $31.3 billion, is offering API access at a price of $1.00 per million input tokens and $3.20 per million output tokens.
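To put the quoted API prices in concrete terms, the sketch below estimates the cost of a single request. It assumes flat per-token billing at the two rates given in the article, with no caching discounts or tiering. The function name and the example token counts are hypothetical, and whether one request can combine the full context window with the maximum output length is not stated in the article.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 1.00
OUTPUT_PRICE_PER_MILLION = 3.20


def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in US dollars for one request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 128,000 output tokens.
print(f"${estimate_cost(200_000, 128_000):.4f}")  # prints $0.6096
```

Under these assumptions, even a request at the stated limits costs well under one dollar, with output tokens accounting for about two thirds of the total.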

The introduction of GLM-5.1 intensifies competition among open-source coding models and puts an open-source model ahead of closed-source competitors on SWE-Bench Pro. Z.ai’s documentation claims that the model’s overall capability is “aligned with Claude Opus 4.6.” However, independent evaluations indicate that GLM-5.1 achieves approximately 94.6% of Claude Opus 4.6’s coding score, with remaining gaps in reasoning and creative tasks.

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
