Anthropic Launches Claude 4: A Coding Powerhouse
Anthropic just dropped the hammer. Claude Opus 4 and Claude Sonnet 4, released on May 22, 2025, represent the company's clearest bid yet to own developer mindshare in the frontier AI race. These aren't incremental updates. They're a full-generation leap, aimed squarely at OpenAI's frontier models and Google's Gemini 2.5 Pro.
The Headline: Coding Dominance
Let's cut to what matters. Anthropic is calling Claude Opus 4 "the world's best coding model," and the benchmarks back it up, at least at launch. Opus 4 scored 72.5% on SWE-bench Verified, the gold-standard test for fixing real-world GitHub issues; Sonnet 4 edged even higher at 72.7%. That's a staggering result for a mid-tier model, and it signals that Anthropic isn't just competing at the top: it's compressing the gap between its flagship and its workhorse.
On Terminal-bench, a more demanding evaluation of sustained coding in terminal environments, Opus 4 hit 43.2%. These numbers matter because they measure what developers actually care about: can the model refactor a messy codebase, navigate a full-stack architecture, and hold context across long sessions without falling apart?
The answer, increasingly, is yes.
Hybrid Reasoning: Think Fast, Think Deep
Both Claude 4 models are what Anthropic calls hybrid reasoning models. In practice, this means they can snap back near-instant answers for straightforward queries, or engage an "extended thinking" mode when the problem demands deeper analysis. It's not a new concept — OpenAI has been doing something similar — but the execution here feels polished.
The real unlock is extended thinking with tool use, currently in beta. This lets the model alternate between internal reasoning and external tool calls — running code, querying APIs, reading files — in a fluid loop. For agent-heavy workflows, this is transformative. Instead of a rigid chain of prompt-response-tool-response, Claude 4 can reason through a complex task, picking up tools when it needs them and putting them down when it doesn't.
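That loop is easier to see in code. The sketch below is a toy, self-contained version of the reason-act pattern: the model stub, the tool names, and the `agent_loop` helper are all illustrative stand-ins, not the Anthropic SDK. A real implementation would call the Messages API with tools declared and let the model's responses drive the loop.

```python
# Schematic sketch of an agent loop that alternates reasoning and tool
# calls. fake_model_turn stands in for a real Messages API call; the
# tool names and message shapes here are made up for illustration.

def run_tool(name, args):
    """Stand-in for external tools the model can invoke."""
    tools = {
        "run_code": lambda a: f"exit 0: ran {a['source']!r}",
        "read_file": lambda a: f"contents of {a['path']}",
    }
    return tools[name](args)

def fake_model_turn(history):
    """Stub 'model': requests one tool, then answers after it sees
    the result appended to the conversation history."""
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_use", "name": "read_file", "args": {"path": "main.py"}}
    return {"type": "final", "text": "done: reviewed main.py"}

def agent_loop(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        turn = fake_model_turn(history)
        if turn["type"] == "final":        # model decided it's finished
            return turn["text"]
        result = run_tool(turn["name"], turn["args"])  # external side effect
        history.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")

print(agent_loop("review main.py"))
```

The point of the structure: the model, not the harness, decides when to pick a tool up and when to stop, which is what makes the loop "fluid" rather than a fixed prompt-tool-prompt chain.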
Parallel tool execution is also on the table, meaning agents built on Claude 4 can do multiple things at once. For anyone building autonomous coding agents or multi-step research pipelines, this is the feature you've been waiting for.
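On the client side, parallel tool execution reduces to running independent tool calls concurrently instead of serially. A minimal sketch, with hypothetical tool names and `asyncio.sleep` standing in for real I/O:

```python
import asyncio
import time

# Toy illustration of parallel tool execution: three independent tool
# calls run concurrently via asyncio.gather, so total latency is close
# to the slowest single call rather than the sum of all three.

async def call_tool(name, delay):
    await asyncio.sleep(delay)  # stands in for network or compute I/O
    return f"{name}: ok"

async def run_parallel(calls):
    # gather preserves input order in its results list
    return await asyncio.gather(*(call_tool(n, d) for n, d in calls))

calls = [("search_docs", 0.2), ("run_tests", 0.2), ("lint", 0.2)]
start = time.perf_counter()
results = asyncio.run(run_parallel(calls))
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")  # ~0.2s concurrent, not 0.6s serial
```

For a research pipeline fanning out across many sources, that latency difference compounds at every step.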
Agents Are the Play
Anthropic isn't being subtle about its strategy. Claude Opus 4 is explicitly designed for long-running agentic tasks — sessions that can sustain performance for several hours. That's not a typo. Hours. Previous models degraded over extended interactions. Opus 4 is built to hold the line.
Alongside the model launch, Anthropic made Claude Code generally available — their command-line tool for developer workflows — with new IDE extensions for VS Code and JetBrains, plus an SDK for building custom agents. Four new API capabilities round out the toolkit: a code execution tool, an MCP connector for plugging into external systems, a Files API, and prompt caching for up to one hour.
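Of those four capabilities, prompt caching is the one most developers will reach for first. The request body below is a sketch of the shape involved: a `cache_control` marker on a large, stable system block so repeated requests reuse it. The one-hour TTL is a beta feature, so the exact field names and the model ID here should be checked against the current API docs before relying on them.

```python
# Sketch of a Messages API request body using prompt caching.
# The cache_control field marks a content block as cacheable; the
# "ttl": "1h" extension reflects the one-hour cache described at
# launch (beta -- verify field names against current documentation).

request_body = {
    "model": "claude-opus-4-20250514",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large shared context: style guide, codebase summary>",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Review this diff for bugs."}],
}
```

The economics matter for agents: a cached system prompt means a multi-hour session isn't paying full input price to re-send the same codebase summary on every turn.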
This is a full-stack developer platform play. Anthropic isn't just selling a model — it's selling an ecosystem.
Sonnet 4: The Sleeper Hit
Here's the thing that should worry OpenAI and Google: Sonnet 4 might be the more important model. Priced at $3/$15 per million tokens (input/output), it's positioned for high-volume production workloads — code reviews, bug fixes, everyday development tasks. It's fast, it's capable, and it's cheap enough to run at scale.
In head-to-head coding comparisons, Sonnet 4 was reported to be 2.8 times faster than Gemini 2.5 Pro Preview on complex Rust refactoring — averaging 6 minutes 5 seconds versus Gemini's 17 minutes 1 second. Speed kills in production environments, and that kind of margin is hard to ignore.
For context, Opus 4 is priced at $15/$75 per million tokens — five times more expensive than Sonnet. Unless you need sustained multi-hour reasoning or the absolute ceiling of capability, Sonnet 4 is where most developers will live. Anthropic knows this.
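The gap is easy to make concrete. At the listed rates, a back-of-envelope calculation for a hypothetical day of code-review traffic (the 50M/5M token volumes are invented for illustration):

```python
# Cost comparison at the listed per-million-token prices:
# Sonnet 4 at $3 in / $15 out, Opus 4 at $15 in / $75 out.

def cost_usd(input_toks, output_toks, in_rate, out_rate):
    return (input_toks * in_rate + output_toks * out_rate) / 1_000_000

# Hypothetical daily volume: 50M input tokens, 5M output tokens.
sonnet = cost_usd(50_000_000, 5_000_000, 3, 15)
opus = cost_usd(50_000_000, 5_000_000, 15, 75)
print(f"Sonnet 4: ${sonnet:.2f}/day  Opus 4: ${opus:.2f}/day")
```

That works out to $225 a day on Sonnet versus $1,125 on Opus for the same traffic, which is why the model split matters as much as the headline benchmarks.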
The Competitive Landscape
Let's be honest about the scoreboard. At launch, Claude 4 models led or matched competitors on coding benchmarks. But the frontier model race moves fast. GPT-5, released in August 2025, later pushed SWE-bench Verified to 74.9% at its high-compute tier. Gemini 2.5 Pro countered with a 1-million-token context window (with 2 million announced) and aggressive $1.25/$10 per-million-token pricing that undercuts both rivals.
On reasoning benchmarks like GPQA Diamond (PhD-level science), the race is neck-and-neck: later Claude variants hit 90.5%, GPT-5.2 reached 91.4%. On Humanity's Last Exam, Gemini models have pulled ahead. No single model dominates every benchmark — and that's the point. The frontier is getting crowded, and the gaps are shrinking.
Where Anthropic carves a genuine moat is in developer experience and agentic reliability. Claude 4's memory capabilities — especially when given local file access — are noticeably better. The MCP connector creates a standardized way to wire Claude into external tools and data sources. And the emphasis on sustained, hours-long agent sessions targets a use case that neither OpenAI nor Google has prioritized as aggressively.
What This Means
Claude 4 is Anthropic's most commercially ambitious release. It's not just a smarter model — it's a developer platform anchored by two models that cover the full spectrum from lightweight production tasks (Sonnet 4) to deep, multi-hour agentic workflows (Opus 4). Both are available on the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI.
The training data cutoff is March 2025, with reliable knowledge through January 2025. Not cutting-edge fresh, but reasonable for a model that's designed to do things rather than just know things.
If you're building AI-powered developer tools, coding agents, or any workflow that requires sustained, reliable reasoning with tool integration, Claude 4 deserves to be your first evaluation. Anthropic bet big on code and agents. That bet is paying off.
Building with Claude 4 or comparing it against GPT-5 and Gemini? Follow ultrathink.ai for hands-on breakdowns, benchmark deep dives, and the latest from the frontier model wars.
This article was ultrathought.