Claude 4 Launches: Anthropic's Boldest Move Yet
On May 22, 2025, Anthropic dropped Claude 4 — two models, one unmistakable message: the frontier model race just got a new front-runner. Claude Opus 4 and Claude Sonnet 4 don't just iterate on their predecessors. They redefine what we should expect from AI coding, reasoning, and agentic workflows. And they land at a moment when OpenAI and Google are throwing everything they have at the same problems.
Two Models, One Clear Bet on Code
Anthropic isn't trying to be everything to everyone. With Claude 4, the company doubled down on what it does best: coding and long-running agentic tasks. Claude Opus 4 is positioned as "the world's best coding model," and the benchmarks back up the swagger. It scores 72.5% on SWE-bench and 43.2% on Terminal-bench — both metrics that measure real-world software engineering capability, not toy problems.
Here's the twist: Claude Sonnet 4, the smaller and cheaper sibling, actually edges out Opus on SWE-bench with a 72.7% score. That's not a typo. The mid-tier model slightly outperforms the flagship on one of the most watched coding benchmarks in AI. This tells you something important about Anthropic's strategy — they're not just building a single apex model. They're building a family where even the cost-efficient option is elite.
Both models ship with hybrid reasoning modes. They can snap back near-instant answers for simple queries or engage extended thinking for complex, multi-step problems. The extended thinking with tool use beta is particularly significant: models can now alternate between internal reasoning and external tool calls — web search, code execution, file operations — in a single coherent chain. This is the architecture agentic AI needs.
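To make the hybrid-reasoning idea concrete, here is a minimal sketch of what a request enabling extended thinking alongside tool use can look like against the Anthropic Messages API. The field names follow the public API's documented shape, but the model ID, token budgets, and the web-search tool type string are illustrative and should be checked against current docs.

```python
# Sketch of a Messages API request payload combining extended thinking
# with tool use. Values here are illustrative, not authoritative.
payload = {
    "model": "claude-opus-4-20250514",   # hypothetical model ID
    "max_tokens": 16_000,
    # Extended thinking: the model reasons internally before answering;
    # budget_tokens caps how much of max_tokens goes to that reasoning.
    "thinking": {"type": "enabled", "budget_tokens": 8_000},
    # Under the tool-use beta, internal reasoning can interleave with
    # external tool calls (web search here) in one coherent chain.
    "tools": [{"type": "web_search_20250305", "name": "web_search"}],
    "messages": [
        {
            "role": "user",
            "content": "Find the current SWE-bench leaderboard and summarize it.",
        }
    ],
}
```

For a simple query you would omit the `thinking` block entirely and get the near-instant path; the same model serves both modes, which is the whole point of the hybrid design.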
The API Arsenal
Models are only as good as the infrastructure around them, and Anthropic clearly got that memo. Claude 4 launches with a code execution tool baked into the API, an MCP connector for plugging into external systems, a Files API for document handling, and prompt caching that persists for up to one hour. That last one matters more than it sounds — it slashes costs and latency for applications that make repeated calls with overlapping context.
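Prompt caching is the easiest of these to picture in practice. Below is a hedged sketch of how a large, reused system prompt gets marked cacheable in a Messages API request: the `cache_control` block on a content block is the documented mechanism, though the specific `ttl` value for the one-hour tier is an assumption to verify against Anthropic's current docs.

```python
# A large block of shared context reused across many calls
# (placeholder text; imagine thousands of tokens of reference docs).
LONG_REFERENCE_DOC = "Full internal API reference for our service..."

# Sketch of a request where the big system block is marked cacheable,
# so repeated calls reuse it instead of paying full input price each time.
request = {
    "model": "claude-sonnet-4-20250514",  # hypothetical model ID
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_REFERENCE_DOC,
            # "ephemeral" is the documented cache type; the "1h" TTL is
            # our assumption for the extended one-hour cache tier.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the rate-limit rules."}],
}
```

Only the user message changes between calls; the cached system block is billed at a reduced rate on cache hits, which is where the cost and latency savings come from.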
Claude Code, Anthropic's coding assistant, also hit general availability alongside the model launch, complete with IDE integrations and an SDK. This isn't just an API play anymore. Anthropic is building a full developer ecosystem.
Pricing tells its own story. Opus 4 runs $15/$75 per million tokens (input/output). Sonnet 4 is far more accessible at $3/$15. Both are available on Amazon Bedrock and Google Cloud's Vertex AI from day one. That multi-cloud availability is table stakes now, but Anthropic executes it cleanly.
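The pricing gap is easiest to feel with a worked example. This short snippet applies the published per-million-token rates to a hypothetical agentic session (the token counts are invented for illustration):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             price_in: float, price_out: float) -> float:
    """Dollar cost of a call given per-million-token input/output prices."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

OPUS_4 = (15.0, 75.0)    # $/M input, $/M output
SONNET_4 = (3.0, 15.0)

# A hypothetical long agentic session: 2M tokens in, 200K tokens out.
opus_cost = cost_usd(2_000_000, 200_000, *OPUS_4)      # 30 + 15 = $45.00
sonnet_cost = cost_usd(2_000_000, 200_000, *SONNET_4)  # 6 + 3  = $9.00
```

At this input/output mix Sonnet 4 comes out exactly 5x cheaper, which is precisely why its near-parity SWE-bench score is such a big deal: the cost-efficient tier isn't a meaningful capability downgrade.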
The Three-Way War: Claude 4 vs. GPT-5 vs. Gemini 2.5
Let's talk about the elephant(s) in the room. When Claude 4 launched in May 2025, it faced Gemini 2.5 Pro (publicly available since March) and arrived three months before GPT-5's August release. But the landscape evolved fast.
Against GPT-5: OpenAI's flagship arrived three months later with a SWE-bench score of 74.9%, technically edging Claude Opus 4's 72.5%. But raw benchmarks don't tell the full story. Claude has consistently shown lower hallucination rates and better performance in long-form technical documentation — the kind of unglamorous but critical work that enterprise developers actually need. GPT-5 excels at iterative coding and refactoring large codebases, but real-world tests have repeatedly shown Claude's outputs require less human correction.
Against Gemini 2.5 Pro: Google's model brings a fundamentally different weapon — a million-token context window. For analyzing massive codebases or integrating multimodal data, Gemini is hard to beat. It also dominates competitive programming benchmarks with high Codeforces ratings. But Gemini has historically struggled with the kind of nuanced, multi-step agent workflows where Claude 4 shines. It's fast and broad, but Claude goes deeper.
The real story isn't which model "wins" on a single benchmark. It's that three companies are now shipping models that can genuinely write, debug, and deploy production code — and they're all getting better at a terrifying pace.
What Happened After Launch
The Claude 4 family didn't stand still. Anthropic followed up aggressively. Claude Opus 4.5 launched on November 24, 2025, posting a staggering 80.9% on SWE-bench — vaulting past GPT-5 and, by January 2026, establishing clear coding supremacy. Claude Sonnet 4.6 followed in February 2026 with a 1M token context window in beta, directly neutralizing one of Gemini's biggest advantages.
This cadence matters. Anthropic is shipping major model updates roughly every three to four months. OpenAI countered with GPT-5.1 (November 2025) and GPT-5.2 (December 2025). Google previewed Gemini 3 Pro in November 2025. The release cycles are compressing. What used to be annual model generations are now quarterly sprints.
The Pricing Problem Nobody Talks About
Here's where Anthropic faces real pressure. Claude's premium pricing — roughly 5-7x more expensive than some competitors — works when you're the best. But it creates vulnerability. DeepSeek-V3.2 charges $0.27 per million input tokens — that's 10-30x cheaper than Anthropic's models. GPT-5.1 slashed prices 75% on input. For high-volume applications, Claude's superior debugging becomes a harder sell when the alternatives are "good enough" at a fraction of the cost.
Anthropic's counter-argument is quality. Lower hallucination rates, better agentic reliability, and fewer human interventions needed. For mission-critical code, that premium pays for itself. For a startup prototyping quickly? The math is different.
The Verdict
Claude 4's launch was a watershed moment — not because it was objectively "the best" at everything, but because it proved Anthropic could compete at the absolute frontier while maintaining a coherent product vision. They bet on coding, agentic workflows, and developer experience. That bet paid off.
GPT-5 remains the generalist king. Gemini 2.5 Pro owns the massive-context niche. But Claude 4 carved out territory that matters enormously to the people actually building with these models: reliable, deep, trustworthy code generation that works in production, not just on benchmarks.
The frontier model race in 2025 isn't about one winner. It's about three companies pushing each other to build genuinely useful AI at a pace none of us expected. Claude 4 made sure Anthropic has a permanent seat at that table.
Building with Claude 4, GPT-5, or Gemini 2.5? Follow ultrathink.ai for sharp analysis on every major model drop — and the real-world implications that benchmarks don't capture.
This article was ultrathought.