Claude Sonnet 4.5 launched on September 29, 2025. Its OSWorld score backed Anthropic's computer use claims directly: 61.4%, up from Sonnet 4's 42.2% just four months earlier. On SWE-bench Verified, the model scored 77.2%. Separately, Anthropic reported observing it maintain focus for more than 30 hours on complex multi-step tasks, a duration that changes what's architecturally feasible for autonomous engineering work.
Domain expert evaluation reinforced the benchmark numbers. Specialists in finance, law, medicine, and STEM found substantially better domain-specific knowledge and reasoning than in older models, including Opus 4.1. Cognition reported that, for Devin, Sonnet 4.5 increased planning performance by 18% and end-to-end eval scores by 12%, the biggest jump it had seen since Claude Sonnet 3.6. Cursor, GitHub Copilot, and Figma Make also reported significant gains in their own domains. Alongside the model, Claude Code shipped checkpoints for rolling back changes, a native VS Code extension, and a refreshed terminal interface.
At release, Sonnet 4.5 included alignment improvements over prior Claude models, and the gains are concrete: substantial reductions in sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking. Defenses against prompt injection, a key risk for computer use and agentic capabilities, improved considerably. Anthropic released the model under ASL-3 (AI Safety Level 3) protections, the safety level it first applied with Claude Opus 4, with CBRN (chemical, biological, radiological, and nuclear) classifiers active.
The Claude Agent SDK launched alongside Sonnet 4.5, giving you access to the same infrastructure that powers Claude Code: memory management, permission systems, and subagent coordination for building custom agents.
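To make those three pieces less abstract, here is a minimal conceptual sketch in plain Python. It does not use the Claude Agent SDK and does not mirror its API: `Subagent`, `Coordinator`, `call_tool`, and the two stand-in tools are hypothetical names invented for illustration. The sketch only shows the general idea of routing each subagent's tool calls through a permission check while keeping a per-agent record of what happened.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Subagent:
    """Hypothetical subagent: a name, the tools it may call, and a private log."""
    name: str
    allowed_tools: Set[str]
    memory: List[str] = field(default_factory=list)


class Coordinator:
    """Hypothetical coordinator that gates every tool call on permissions."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools
        self.agents: Dict[str, Subagent] = {}

    def register(self, agent: Subagent) -> None:
        self.agents[agent.name] = agent

    def call_tool(self, agent_name: str, tool: str, arg: str) -> str:
        agent = self.agents[agent_name]
        # Permission check: refuse any tool this subagent was not granted.
        if tool not in agent.allowed_tools:
            agent.memory.append(f"denied:{tool}")
            raise PermissionError(f"{agent_name} may not use {tool}")
        result = self.tools[tool](arg)
        # Simplified memory: record what the subagent did and what came back.
        agent.memory.append(f"{tool}({arg}) -> {result}")
        return result


def read_file(path: str) -> str:
    # Stand-in tool: returns a placeholder instead of touching the filesystem.
    return f"<contents of {path}>"


def run_shell(cmd: str) -> str:
    # Stand-in tool: does not execute anything.
    return f"<ran {cmd}>"


coordinator = Coordinator({"read_file": read_file, "run_shell": run_shell})
coordinator.register(Subagent("reviewer", {"read_file"}))
coordinator.register(Subagent("builder", {"read_file", "run_shell"}))
```

In this toy setup, `coordinator.call_tool("reviewer", "run_shell", "make")` raises `PermissionError`, while the same call from `"builder"` succeeds. A real agent framework would layer model calls, persistence, and richer policies on top; consult the SDK's own documentation for its actual interfaces.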