OpenAI Just Released GPT-4.1 And It's Ridiculously Good On Paper

GPT-4.1 has officially landed in the OpenAI API, introducing a trio of models—GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano—that outperform their predecessors in nearly every dimension. These models are designed for developers who need better coding skills, stronger instruction following, and massive long-context comprehension, all while reducing latency and cost. The flagship model now supports up to 1 million tokens of context and features a fresh knowledge cutoff of June 2024.

What’s new with GPT-4.1?

The GPT-4.1 family is a direct upgrade over GPT-4o and GPT-4.5, offering improved performance across benchmarks while optimizing for real-world developer use. GPT-4.1 scores 54.6% on SWE-bench Verified, making it one of the top models for coding tasks. On Scale’s MultiChallenge benchmark, which measures instruction following, it improves on GPT-4o by 10.5 percentage points. For long-context tasks, it sets a new state-of-the-art score of 72% on the Video-MME benchmark.

The models are also optimized across the latency curve. GPT-4.1 mini delivers nearly the same performance as GPT-4o while cutting latency in half and reducing cost by 83%. GPT-4.1 nano is OpenAI’s fastest and most affordable model yet, built for classification and autocomplete tasks while still supporting 1 million token context windows.

Coding capabilities take a leap

From generating cleaner frontend interfaces to following diff formats more reliably, GPT-4.1 proves itself as a highly capable coding assistant. On the SWE-bench Verified benchmark, it completes 54.6% of tasks correctly, up from 33.2% with GPT-4o. It also outperforms GPT-4o and even GPT-4.5 on Aider’s polyglot diff benchmark, offering developers precise edits across multiple programming languages without rewriting entire files. For cases where a full-file rewrite is still needed, the output token limit has been expanded to 32,768 tokens.

In internal comparisons, websites generated by GPT-4.1 were preferred over GPT-4o’s 80% of the time. Extraneous edits in code dropped from 9% with GPT-4o to just 2%, reflecting better context understanding and tool usage.

Early adopters highlight real-world wins

Windsurf reported that GPT-4.1 scored 60% higher than GPT-4o on its internal coding benchmark, while Qodo found GPT-4.1 provided the better suggestion in 55% of the GitHub pull requests it tested. These improvements translate directly into better code review accuracy, fewer unnecessary suggestions, and faster iteration cycles for teams.

Sharper instruction following across scenarios

GPT-4.1 follows instructions significantly more reliably. It scores 87.4% on IFEval and 38% on the MultiChallenge benchmark, showcasing gains in handling complex formats, honoring negative instructions (behaviors the model is told to avoid), and sorting or ranking outputs. OpenAI’s own evaluation showed that GPT-4.1 is more precise on hard prompts and better at multi-turn instruction tracking, an essential feature for building reliable conversational systems.

Blue J and Hex both tested GPT-4.1 against domain-specific tasks. Blue J saw a 53% accuracy improvement in complex tax scenarios, while Hex reported nearly double the performance in SQL tasks, reducing debugging overhead and improving production-readiness.

1 million token context window sets a new bar

All three models in the GPT-4.1 family now support up to 1 million tokens of context, enough to hold more than eight copies of the entire React codebase. This enables powerful new use cases in legal document analysis, financial research, and long-form software workflows. In OpenAI’s “needle in a haystack” test, GPT-4.1 reliably retrieved relevant content regardless of where it appeared in the input.
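To make the idea of a needle-in-a-haystack test concrete, here is a minimal Python sketch of how such an input could be assembled. OpenAI's actual harness is not described here, so the function name `build_haystack`, its parameters, and the filler text are all hypothetical; the sketch only shows a short "needle" sentence placed at a chosen relative depth inside repeated filler.

```python
def build_haystack(filler: str, needle: str, n_sentences: int, position: float) -> str:
    """Return n_sentences copies of `filler` with `needle` inserted among them.

    `position` is the relative depth of the needle: 0.0 puts it at the start,
    1.0 at the end. This is an illustrative construction, not OpenAI's test.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    if n_sentences < 0:
        raise ValueError("n_sentences must be non-negative")

    sentences = [filler] * n_sentences
    # Convert the relative depth into a list index and drop the needle there.
    index = round(position * n_sentences)
    sentences.insert(index, needle)
    return " ".join(sentences)


if __name__ == "__main__":
    prompt = build_haystack(
        filler="The sky was grey that day.",
        needle="The secret code is 7421.",
        n_sentences=6,
        position=0.5,
    )
    print(prompt)
```

A real evaluation would repeat this over many depths and context lengths, ask the model for the needle, and record whether the answer was retrieved at each position.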

The OpenAI-MRCR benchmark further confirmed this by testing the model’s ability to distinguish between near-identical prompts scattered across a massive context window. On the Graphwalks benchmark, which involves reasoning across nodes in a synthetic graph, GPT-4.1 scored 62%, significantly ahead of GPT-4o’s 42%.

Thomson Reuters reported a 17% boost in legal document review accuracy using GPT-4.1 in its CoCounsel system, while Carlyle saw a 50% improvement in extracting granular financial data from complex files.

Faster inference and better image understanding

OpenAI has reduced time to first token through improvements in its inference stack. GPT-4.1 nano typically returns its first token in under five seconds on 128K-token prompts. For multimodal tasks, GPT-4.1 mini shows stronger image comprehension than GPT-4o across benchmarks like MMMU and MathVista.

On visual benchmarks like CharXiv-Reasoning and Video-MME, GPT-4.1 consistently leads, scoring 72% on the latter without subtitles. This makes it a top choice for video understanding and scientific chart interpretation.

Price cuts and transition plans

All three GPT-4.1 models are now available in the API, with a significant price drop. GPT-4.1 is 26% cheaper for median queries compared to GPT-4o. Prompt caching discounts have increased to 75%, and there are no extra charges for long-context inputs. The GPT-4.5 preview will be deprecated by July 14, 2025, in favor of the more efficient GPT-4.1 family.

Pricing per 1M tokens for GPT-4.1 is set at $2 for input, $0.50 for cached input, and $8 for output. GPT-4.1 nano drops those to $0.10, $0.025, and $0.40 respectively—making it the most affordable option to date.
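As a rough illustration of what those rates mean per request, the Python sketch below turns the per-1M-token prices quoted above into a cost estimate. The function name `estimate_cost` and the dictionary keys are labels chosen for this example, not official identifiers, and the sketch assumes cached tokens are simply the portion of the input billed at the cached rate.

```python
# USD per 1M tokens, as quoted in the article. The keys are labels for this sketch.
PRICES = {
    "gpt-4.1": {"input": 2.00, "cached_input": 0.50, "output": 8.00},
    "gpt-4.1-nano": {"input": 0.10, "cached_input": 0.025, "output": 0.40},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    `cached_tokens` is the part of `input_tokens` billed at the cached rate
    (an assumption of this sketch); the remainder is billed at the full rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")

    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * price["cached_input"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 100K input tokens (80K of them cached), 2K output tokens.
    for name in PRICES:
        cost = estimate_cost(name, 100_000, 2_000, cached_tokens=80_000)
        print(f"{name}: ${cost:.4f}")
```

The cached rates are consistent with the 75% caching discount mentioned earlier: $0.50 is a quarter of $2, and $0.025 is a quarter of $0.10.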

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
