[AINews] Anthropic Claude Opus 4.7 - literally one step better than 4.6 in every dimension
Summary
Anthropic has released Claude Opus 4.7, an update to the Opus model family focused on improving coding, instruction following, and computer-use capabilities. The release introduces a new tokenizer and expanded vision support, targeting more complex, long-running autonomous tasks.
Key Points
- Claude Opus 4.7 introduces a new xhigh reasoning effort tier, which is now the default setting for Claude Code.
- Vision capabilities have expanded to support images up to 2,576 pixels on the long edge (~3.75 megapixels), a significant increase over previous models.
- The model achieved 64.3% on SWE-bench Pro, an 11-point improvement over Opus 4.6, and 87.6% on SWE-bench Verified.
- API pricing remains unchanged at $5 per million input tokens and $25 per million output tokens.
- A new tokenizer may increase token usage by up to 35% for certain content types, though improved reasoning efficiency can reduce overall token usage by up to 50%.
- Performance on TerminalBench 2.0 increased to 69.4%, up from earlier Opus releases.
Technical Details
The architecture of Opus 4.7 has prompted technical debate over whether it is a new base model, a tokenizer-swapped continuation, or a distilled version of a larger system. The new tokenizer is a significant change: it can inflate token counts by 1.0x–1.35x for the same input, but the model's improved reasoning efficiency is reported to offset this, potentially reducing total token consumption by up to 50%.
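To make that trade-off concrete, here is a minimal back-of-envelope sketch in Python. The workload sizes are hypothetical, and the 1.35x input inflation and 50% output reduction are the upper bounds quoted above, not guaranteed figures:

```python
# Back-of-envelope cost model at the published Opus rates.
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (unchanged)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (unchanged)

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """API cost for one workload at the rates above."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_MTOK

# Hypothetical agentic workload under the previous tokenizer/model.
baseline = cost_usd(2_000_000, 800_000)

# Worst-case tokenizer inflation (1.35x on input) combined with the
# best-case reasoning efficiency gain (0.5x on output) quoted above.
estimate = cost_usd(int(2_000_000 * 1.35), int(800_000 * 0.5))

print(f"baseline ${baseline:.2f} -> opus 4.7 estimate ${estimate:.2f}")
# baseline $30.00 -> opus 4.7 estimate $23.50
```

Under these assumptions the output-side savings more than cover the input-side inflation, but workloads skewed toward input tokens (e.g. large-context retrieval) could land on the other side of break-even.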
In terms of multimodal processing, the model supports much higher resolution inputs, which allows for more precise computer-use agents capable of reading dense screenshots and complex diagrams without the downscaling required by previous versions. However, benchmarks for document understanding show mixed results: while chart recognition improved significantly (from 13.5% to 55.8% in some evaluations), there was a recorded regression in layout recognition (from 16.5% to 14.0%).
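For teams preparing screenshots or diagrams, a small preprocessing step keeps inputs within the quoted long-edge limit. The sketch below uses Pillow; the 2,576-pixel constant comes from the figures above, and the function name is our own:

```python
# Fit an image's long edge within the model's limit before upload.
# LONG_EDGE_LIMIT reflects the 2,576-pixel figure quoted above.
from PIL import Image

LONG_EDGE_LIMIT = 2576

def fit_long_edge(path: str, limit: int = LONG_EDGE_LIMIT) -> Image.Image:
    """Downscale only if the long edge exceeds the limit; keep aspect ratio."""
    img = Image.open(path)
    long_edge = max(img.size)
    if long_edge <= limit:
        return img  # already within bounds, send as-is
    scale = limit / long_edge
    return img.resize(
        (round(img.width * scale), round(img.height * scale)),
        Image.LANCZOS,
    )

# A 3840x2160 screenshot becomes 2576x1449 -- far less aggressive than
# the downscaling older models required, so dense UI text stays legible.
```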
Impact / Why It Matters
Developers building autonomous agents and coding tools can take advantage of the significantly higher SWE-bench scores and improved computer-use precision, though they should budget for the change in token density and the layout-recognition regression noted above. The new xhigh effort level adds a configuration parameter for managing reasoning depth in agentic workflows.
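For illustration, here is a hedged sketch of selecting the xhigh tier through the anthropic Python SDK. The model identifier and the effort field are assumptions based on this post, not confirmed API surface, which is why the field is passed via extra_body rather than a typed SDK argument:

```python
# Hypothetical sketch only: "claude-opus-4-7" and the "effort" field are
# assumptions based on this post, not documented API surface, so the
# field goes through extra_body instead of a named SDK parameter.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",  # assumed model identifier
    max_tokens=4096,
    extra_body={"effort": "xhigh"},  # assumed wire format for the new tier
    messages=[
        {"role": "user", "content": "Refactor this module and add tests."}
    ],
)
print(response.content[0].text)
```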