Anthropic has announced two new AI models: an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet offers across-the-board improvements, with particularly significant gains in coding. The new Claude 3.5 Haiku brings advanced AI features at an affordable price: Anthropic says it matches the performance of its previous flagship model, Claude 3 Opus, on many evaluations while keeping speed and cost similar to those of its predecessor.
Claude 3.5 Sonnet
The upgraded Claude 3.5 Sonnet builds on its predecessor with stronger performance across a range of tasks, especially coding. Anthropic says the model leads in software engineering tasks: on SWE-bench Verified its score rose from 33.4% to 49.0%, which the company says is higher than any other publicly available model.
The model also improved at agentic tool use, raising its TAU-bench scores from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain. According to early testers such as GitLab and Cognition, these upgrades represent a substantial leap forward for AI-powered coding and automation, with better reasoning capabilities and minimal added latency.
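To put the benchmark figures quoted above in perspective, the short Python sketch below computes each improvement both in percentage points and relative to the earlier score. The scores come from the article; the `gains` helper and its rounding to one decimal place are illustrative choices, not part of any Anthropic tooling.

```python
# Benchmark scores (percent) quoted in the article: (before, after).
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    # Round both values to one decimal place for readability.
    return round(absolute, 1), round(relative, 1)


for name, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```

By this calculation the SWE-bench Verified gain is 15.6 percentage points, or roughly 47% over the earlier score, while the TAU-bench gains are smaller in both absolute and relative terms.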
Claude 3.5 Haiku
Claude 3.5 Haiku is designed to offer state-of-the-art performance while keeping costs low. The model scores 40.6% on SWE-bench Verified, which surpasses many agents built on other cutting-edge models, including the original Claude 3.5 Sonnet and GPT-4o. It is aimed at applications that need fast, reliable AI, such as user-facing products and tasks requiring personalized experiences.
Anthropic will release Claude 3.5 Haiku later this month, initially available as a text-only model with plans for image input support in the future. It will be available through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.
Computer use is available as an experimental public beta
Anthropic has also introduced a new experimental feature called “computer use,” available in public beta. Developers can direct Claude 3.5 Sonnet to use a computer the way a person does: looking at the screen, moving the cursor, and typing. The capability is still experimental and has limitations, but companies such as Replit and The Browser Company have already begun exploring how it could automate complex, multi-step processes.
This feature aims to teach AI general computer skills, making it more versatile in completing tasks that previously required custom tools. Developers can access this beta through Anthropic’s API and other major cloud platforms. While early results are promising, Anthropic acknowledges that the technology is still developing, with challenges in performing some common computer tasks like scrolling and dragging.
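The interaction described above (look at the screen, decide on an action, act, look again) can be pictured as a simple loop. The Python sketch below is a toy simulation only: `FakeScreen`, `scripted_model`, `run_agent`, and the action names are all hypothetical stand-ins invented for illustration, and nothing here calls Anthropic's API or reflects its actual request format.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real desktop; it only records state."""
    cursor: tuple[int, int] = (0, 0)
    typed: list[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real system would return an image; a text summary is enough here.
        return f"cursor at {self.cursor}, typed so far: {''.join(self.typed)!r}"


def scripted_model(observation: str, step: int) -> dict:
    """Hypothetical stand-in for the model: returns the next action to take."""
    plan = [
        {"action": "screenshot"},
        {"action": "mouse_move", "coordinate": (120, 40)},
        {"action": "type", "text": "hello"},
        {"action": "done"},
    ]
    return plan[step]


def run_agent(screen: FakeScreen, max_steps: int = 10) -> list[str]:
    """Observe, ask for an action, apply it, and repeat until 'done'."""
    log = []
    observation = screen.screenshot()
    for step in range(max_steps):
        action = scripted_model(observation, step)
        kind = action["action"]
        if kind == "done":
            break
        if kind == "mouse_move":
            screen.cursor = action["coordinate"]
        elif kind == "type":
            screen.typed.append(action["text"])
        elif kind != "screenshot":
            raise ValueError(f"unsupported action: {kind}")
        # Every step ends with a fresh observation for the next decision.
        observation = screen.screenshot()
        log.append(kind)
    return log


screen = FakeScreen()
print(run_agent(screen))
print(screen.screenshot())
```

The cap on steps and the explicit error for unknown actions are design choices in this sketch that echo the article's advice to keep the beta to low-risk tasks; a real integration would replace the scripted plan with model responses and the fake screen with an actual, preferably sandboxed, environment.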
Companies such as Asana, Canva, Cognition, and DoorDash are experimenting with the new features, exploring the potential of AI to automate processes that traditionally required human input. Early testing by the US AI Safety Institute and the UK AI Safety Institute found that the updated Claude 3.5 Sonnet meets Anthropic’s safety standards and is suitable for public use.
Anthropic is also addressing potential risks associated with AI’s ability to interact with computers. To ensure responsible deployment, the company has implemented new safety measures to identify misuse, including classifiers to detect potentially harmful actions. Anthropic is encouraging developers to use the beta feature for low-risk tasks while the technology matures.
Anthropic’s release of the upgraded Claude 3.5 Sonnet and Claude 3.5 Haiku highlights the company’s push to expand AI capabilities while maintaining safety standards. The addition of experimental computer use capabilities represents a novel step forward in AI’s potential to perform general-purpose tasks, giving developers new possibilities to explore.
Image credits: Anthropic