Dataconomy

Anthropic releases Claude Sonnet 4.5 with advanced coding and agent capabilities

Anthropic’s new Claude Sonnet 4.5 outpaces GPT-5 in coding and agent tasks, while offering expanded developer tools at no extra cost.

by Emre Çıtak
September 30, 2025
in Artificial Intelligence

AI company Anthropic has released Claude Sonnet 4.5, a new flagship model that the company positions as its most capable for coding, building complex AI agents, and using computer systems, with significant gains in reasoning and mathematics.

The new model is available now and is accompanied by a new developer toolkit and major updates across the Claude product line.

Sonnet 4.5 features that stand out

According to Anthropic’s blog post, the model achieves state-of-the-art performance on SWE-bench Verified, a benchmark that measures real-world software engineering ability. It also improves on the OSWorld benchmark, which tests an AI model’s ability to perform real-world tasks on a computer, such as navigating websites and filling in spreadsheets.

The company also reports that experts in finance, law, medicine, and STEM found Sonnet 4.5 to have dramatically better domain-specific knowledge and reasoning compared to previous models.

New tools for developers: The Claude Agent SDK

Alongside the new model, Anthropic has launched the Claude Agent SDK. This software development kit provides developers with the same infrastructure the company uses to power its Claude Code product, enabling them to build their own custom AI agents. The SDK is designed to solve common challenges in agent development, such as managing memory for long-running tasks, handling permission systems, and coordinating subagents working toward a shared goal.

Product updates across the Claude ecosystem

The launch of Sonnet 4.5 includes several significant upgrades to existing Claude products.

  • Claude Code: Introduces checkpoints that allow users to save progress and roll back to a previous state, a refreshed terminal interface, and a native VS Code extension.
  • Claude API: Adds a new context editing feature and a memory tool to help agents run longer and handle more complex tasks.
  • Claude Apps: Users on paid plans can now execute code and create files, such as spreadsheets, slides, and documents, directly within their conversations.
  • Claude for Chrome Extension: Now available to Max users who previously joined the waitlist.
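The checkpoint idea in the Claude Code item above (save progress, then roll back to a previous state) can be illustrated with a minimal, generic sketch. This is not Claude Code's implementation or API; the `CheckpointStore` class, its methods, and the `files` dictionary are hypothetical names chosen only for illustration.

```python
import copy


class CheckpointStore:
    """Hypothetical helper: keeps deep-copied snapshots of a state dict."""

    def __init__(self) -> None:
        self._snapshots: list[dict] = []

    def save(self, state: dict) -> int:
        """Store a snapshot of the state and return its checkpoint id."""
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1

    def rollback(self, checkpoint_id: int) -> dict:
        """Return a copy of the saved state and discard later snapshots."""
        if not 0 <= checkpoint_id < len(self._snapshots):
            raise KeyError(f"unknown checkpoint: {checkpoint_id}")
        del self._snapshots[checkpoint_id + 1:]
        return copy.deepcopy(self._snapshots[checkpoint_id])


store = CheckpointStore()
files = {"app.py": "print('v1')"}          # stand-in for a project's files
cp = store.save(files)                      # checkpoint before a risky edit
files["app.py"] = "print('v2, broken')"    # a later edit we want to undo
files = store.rollback(cp)                  # restore the saved state
print(files["app.py"])
```

Snapshots are deep-copied on both save and rollback so that later edits to the working state cannot silently change a stored checkpoint.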

Focus on safety and alignment

Anthropic states that Claude Sonnet 4.5 is its most aligned model to date, with reductions in undesirable behaviors such as deception and sycophancy. The model is released under the company’s AI Safety Level 3 (ASL-3) protections, which include safeguards such as classifiers designed to detect potentially dangerous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear (CBRN) weapons.

Imagine with Claude

For a limited time, Anthropic is offering a research preview called “Imagine with Claude” for its Max subscribers. In this demonstration, the model generates software in real time in response to user requests, with no prewritten code. This preview is designed to showcase the capabilities of Sonnet 4.5 when combined with the right infrastructure.

Availability and pricing

Claude Sonnet 4.5 is available now through the Claude API. The pricing is the same as the previous Claude Sonnet 4 model, at $3 per million input tokens and $15 per million output tokens.

Anthropic recommends upgrading to Sonnet 4.5 for all uses, as it provides improved performance for the same cost.
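As a rough illustration of what the quoted prices mean in practice, the sketch below estimates the cost of a single request from its token counts. Only the two per-million-token prices come from the article; the function name and the example workload are hypothetical, and the estimate ignores any discounts or surcharges that may apply.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request at the prices above."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")  # prints $0.90
```

Because output tokens cost five times as much as input tokens, output-heavy workloads such as long code generation will dominate the bill even when prompts are large.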

Claude Sonnet 4.5 vs GPT-5: Which one should you use for your next project?

The release of Claude Sonnet 4.5 has intensified the competition at the forefront of artificial intelligence, directly challenging GPT-5.

While both models represent advanced AI development, they showcase distinct strengths, particularly in the realms of coding, agentic capabilities, and overall performance.

At a glance: Key differences

Feature | Claude Sonnet 4.5 | GPT-5
Primary strength | Agentic coding, computer use, and long-duration autonomous tasks. | Unified intelligence, advanced reasoning, and multimodal capabilities.
SWE-bench Verified | 77.2% (standard), 82% (high-compute). | 72.8%.
OSWorld benchmark | 61.4%. | Not specified, but Sonnet 4.5 leads the chart.
Developer tools | Claude Agent SDK, native VS Code extension, Claude Code with checkpoints. | Accessed via API and integrated into products like ChatGPT and Microsoft Copilot.
Unique features | Can operate autonomously for over 30 hours. Enhanced safety and alignment features. | Unified system that blends multiple AI models. Dynamically adjusts its reasoning approach based on task complexity.

Coding and developer focus

Anthropic has positioned Claude Sonnet 4.5 as the “best coding model in the world,” a claim it supports with leading results on several key benchmarks. On SWE-bench Verified, which measures a model’s ability to solve real-world GitHub issues, Sonnet 4.5 scores 77.2%, ahead of GPT-5’s 72.8%. With additional compute, Sonnet 4.5’s score rises to 82%.

Furthermore, on Terminal-Bench, a test of an AI’s ability to use a command-line interface, Sonnet 4.5 achieved a 50% success rate, significantly ahead of GPT-5’s 43.8%. This suggests that for developers and technical users who need an AI to perform complex, multi-step tasks in a terminal environment, Sonnet 4.5 holds a distinct advantage.

In contrast, GPT-5 is presented as a powerful, general-purpose coding model. While it set new state-of-the-art benchmarks at the time of its release, the specialized focus of Sonnet 4.5 appears to give it an edge in developer-centric tasks.
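To put the coding benchmark margins in perspective, the short sketch below computes both the gap in percentage points and the relative difference for the scores quoted in this section. The scores are taken from the article; the `compare` helper is a hypothetical name used only for this illustration.

```python
# Scores as reported in the article, in percent.
SCORES = {
    "SWE-bench Verified": {"Claude Sonnet 4.5": 77.2, "GPT-5": 72.8},
    "Terminal-Bench": {"Claude Sonnet 4.5": 50.0, "GPT-5": 43.8},
}


def compare(a: float, b: float) -> tuple[float, float]:
    """Return (gap in percentage points, relative difference of a over b in %)."""
    point_gap = a - b
    relative = (a - b) / b * 100
    return round(point_gap, 1), round(relative, 1)


for name, s in SCORES.items():
    points, relative = compare(s["Claude Sonnet 4.5"], s["GPT-5"])
    print(f"{name}: +{points} points, {relative}% relative")
```

On these figures the SWE-bench Verified lead is 4.4 points (about 6% relative), while the Terminal-Bench lead is 6.2 points (about 14% relative), so the terminal gap is the larger of the two on both measures.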

Agentic capabilities and computer use

A standout feature of Claude Sonnet 4.5 is its ability to function as a long-running autonomous agent. Reports indicate the model can maintain focus and performance on complex tasks for more than 30 hours, a significant increase from previous models. This endurance is crucial for tasks that require sustained effort, such as large-scale code refactoring or in-depth data analysis.

On the OSWorld benchmark, which evaluates an AI’s ability to perform real-world tasks on a computer, Sonnet 4.5 has taken the top spot with a success rate of 61.4%. Its tool use results point the same way: it scored 98.0% in the Telecom domain of the τ²-bench evaluations, nearly doubling the performance of its predecessor and surpassing GPT-5.

GPT-5, on the other hand, is designed as a unified system that can intelligently switch between different reasoning approaches based on the task’s complexity. This allows it to handle a wide variety of tasks efficiently, but it does not emphasize the same long-duration autonomy as Sonnet 4.5.

Reasoning, math, and general performance

In areas of general reasoning and mathematics, the competition is much closer. On the AIME 2025 high school math competition, Sonnet 4.5 achieved a perfect 100% score when using Python, slightly edging out GPT-5’s 99.6%. For graduate-level reasoning, as measured by the GPQA Diamond benchmark, the models are highly competitive, with GPT-5 holding a slight lead.

Early user reports and hands-on tests suggest that Sonnet 4.5 is noticeably faster…



Tags: Anthropic, Claude Sonnet 4.5, Featured

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
