Claude Sonnet 4.5
Claude Sonnet 4.5 is a coding model from Anthropic with strong benchmark scores, including 77.2% on SWE-bench Verified and 61.4% on OSWorld for computer use. It sustains 30+ hour agentic coding sessions and delivers substantial gains across coding, reasoning, math, and domain-specific expertise.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-sonnet-4.5',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```

What To Consider When Choosing a Provider
- Configuration: Sonnet 4.5's computer use capability is protected by ASL-3 (AI Safety Level 3) safeguards: classifiers that screen for potentially dangerous inputs and outputs. These may occasionally flag normal content. Anthropic has reduced false positive rates by a factor of 10 since the classifiers were first deployed.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
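As a minimal sketch of gateway authentication (assuming the `ai` SDK's default gateway provider and the `AI_GATEWAY_API_KEY` environment variable; on Vercel deployments an OIDC token can stand in for the key):

```typescript
import { generateText } from 'ai'

// The gateway provider reads AI_GATEWAY_API_KEY from the environment;
// no Anthropic provider credentials are needed.
const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4.5',
  prompt: 'Summarize ASL-3 in one sentence.',
})

console.log(text)
```

The point is that credential handling stays on the gateway side: swapping providers does not require new keys in your application code.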
When to Use Claude Sonnet 4.5
Best For
- Computer use and real-world browser/software automation: Strong OSWorld results among the models evaluated at release
- Extended autonomous coding sessions: Documented 30+ hour capability for complex multi-step engineering tasks
- Complex agent workflows: Anthropic explicitly positioned it for agent workloads at release
- Domain-specific applications in finance, law, medicine, and STEM: Expert evaluation showed substantial gains in domain knowledge and reasoning compared to Opus 4.1
- Production deployments requiring strong alignment properties: Reduced sycophancy and deception relative to earlier Claude releases
Consider Alternatives When
- Cost is the primary constraint: Haiku 4.5 may offer sufficient capability per dollar for lighter workloads
- Simple, latency-sensitive tasks: Sonnet 4.5's capability depth comes with higher per-token cost than lighter models
- Sonnet-tier pricing with a large context: Check whether Claude Sonnet 4.6 offers both a 1M-token context window and Sonnet pricing
- Earlier-model parity: Earlier models handle some specific computer-use or coding tasks just as well
Conclusion
Claude Sonnet 4.5 represents a generational step across multiple capability areas simultaneously: computer use, agentic duration, domain expertise, and safety alignment all advanced in the same release. For teams building agents that do real work in real software environments over extended periods, this is the model where those capabilities came together.
Frequently Asked Questions
What was Claude Sonnet 4.5's OSWorld score and why does it matter?
61.4%, up from Sonnet 4's 42.2% four months earlier. OSWorld measures AI performance on real-world computer tasks: navigating software, filling forms, and clicking UI elements. It focuses on operational computer-use scenarios rather than abstract reasoning alone.
How long can Claude Sonnet 4.5 maintain focus on a single agentic coding task?
More than 30 hours on complex, multi-step tasks. Anthropic noted this duration changes what's architecturally feasible for autonomous engineering work. Individual results vary by task structure.
What is ASL-3 and why does it apply to Sonnet 4.5?
ASL-3 (AI Safety Level 3) is Anthropic's framework level for models requiring additional safeguards. Sonnet 4.5 is the first Claude model released under ASL-3 protections, which include classifiers screening inputs and outputs for CBRN-related content. False positive rates have decreased by a factor of 10 since initial deployment.
What is the Claude Agent SDK and how does it relate to this model?
The Claude Agent SDK launched alongside Sonnet 4.5. It gives you access to the same agent infrastructure that powers Claude Code: memory management across long tasks, permission systems, and subagent coordination. Use it to build custom agents on the same foundation.
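A minimal sketch of driving an agent with the SDK (assuming the `@anthropic-ai/claude-agent-sdk` package and its `query` entry point; the prompt and option values here are illustrative):

```typescript
import { query } from '@anthropic-ai/claude-agent-sdk'

// Stream agent messages for a multi-step task; the SDK manages
// memory, permissions, and subagent coordination under the hood.
for await (const message of query({
  prompt: 'Find and fix the failing test in this repo',
  options: { model: 'claude-sonnet-4-5', maxTurns: 10 },
})) {
  if (message.type === 'result') {
    console.log(message.result)
  }
}
```

Because `query` yields intermediate messages, you can surface tool calls and progress to users during long-running sessions rather than waiting for the final result.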
What alignment improvements came with Sonnet 4.5?
Substantial reductions in sycophancy, deception, power-seeking, encouragement of delusional thinking, and compliance with harmful system prompts, measured via an automated behavioral auditor. The model also improved defenses against prompt injection attacks for computer use and agentic capabilities.
Why did specialists in finance, law, medicine, and STEM find Sonnet 4.5 significantly better than previous models?
Professionals assessed domain-specific knowledge and reasoning in Anthropic's expert evaluations. Results showed substantially better performance compared to older models, including Opus 4.1. The intelligence improvements extend beyond coding benchmarks.
Is Sonnet 4.5 priced differently from Sonnet 4?
Current pricing is shown on this page. AI Gateway routes across providers, and rates may vary by provider.