Introducing Claude Opus 4.7 \ Anthropic

Our latest model, Claude Opus 4.7, is now generally available.

Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work (the kind that previously needed close supervision) to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model also has substantially better vision: it can see images in greater resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And although it’s less broadly capable than our most powerful model, Claude Mythos Preview, it shows better results than Opus 4.6 across a range of benchmarks.

Last week we announced Project Glasswing, highlighting the risks and benefits of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models.

Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

Opus 4.7 is available today across all Claude products and our API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens. Developers can use claude-opus-4-7 via the Claude API.
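To make the pricing concrete, here is a minimal sketch that turns token counts into a list-price estimate. The two rates come from the paragraph above; the helper name `estimate_cost_usd` is hypothetical, and the sketch assumes no discounts or other pricing modifiers apply.

```python
from decimal import Decimal

# Rates quoted in the announcement, in US dollars per million tokens.
INPUT_USD_PER_MTOK = Decimal("5")
OUTPUT_USD_PER_MTOK = Decimal("25")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the list-price cost of a request (hypothetical helper).

    Assumes flat per-token pricing with no discounts or surcharges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Example workload: 200,000 input tokens and 20,000 output tokens.
    print(estimate_cost_usd(200_000, 20_000))
```

For that example workload, the input side costs $1.00 and the output side $0.50, for $1.50 in total. `Decimal` is used so that the arithmetic on currency values stays exact.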

Testing Claude Opus 4.7

Claude Opus 4.7 has garnered strong feedback from our early-access testers.

Below are some highlights and notes from our early testing of Opus 4.7:

  • Instruction following. Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.
  • Improved multimodal support. Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many as prior Claude models. This opens up a wealth of multimodal uses that depend on fine visual detail: computer-use agents reading dense screenshots, data extraction from complex diagrams, and work that needs pixel-perfect references.
  • Real-world work. As well as its state-of-the-art score on the Finance Agent evaluation (see table above), our internal testing showed Opus 4.7 to be a more effective finance analyst than Opus 4.6, producing rigorous analyses and models, more professional presentations, and tighter integration across tasks. Opus 4.7 is also state-of-the-art on GDPval-AA, a third-party evaluation of economically valuable knowledge work across finance, legal, and other domains.
  • Memory. Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work, and uses them to move on to new tasks that, as a result, need less up-front context.
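As an illustration of the image limit mentioned in the list above, the following sketch computes the dimensions an image would need in order to fit within 2,576 pixels on the long edge while preserving its aspect ratio. The function name `fit_long_edge` is hypothetical. The sketch enforces only the long-edge figure, not the approximate megapixel figure, and it does not describe how the API itself handles oversized images.

```python
MAX_LONG_EDGE_PX = 2576  # long-edge limit quoted in the announcement


def fit_long_edge(
    width: int, height: int, max_long_edge: int = MAX_LONG_EDGE_PX
) -> tuple[int, int]:
    """Return (width, height) scaled down so the long edge fits the limit.

    Hypothetical helper: images already within the limit are returned
    unchanged, and the aspect ratio is preserved up to integer rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height

    def scale(side: int) -> int:
        # Integer round-half-up of side * max_long_edge / long_edge,
        # which avoids floating-point rounding surprises.
        return max(1, (side * max_long_edge + long_edge // 2) // long_edge)

    return scale(width), scale(height)


if __name__ == "__main__":
    print(fit_long_edge(5152, 2896))  # a screenshot twice the limit
    print(fit_long_edge(1920, 1080))  # already within the limit
```

A client could call a helper like this before uploading, then resize with whatever imaging library it already uses.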

The charts below show additional evaluation results from our pre-release testing, across a range of different domains.

Safety and alignment

Overall, Opus 4.7 shows a similar safety profile to Opus 4.6: our evaluations show low rates of concerning behavior such as deception, sycophancy, and cooperation with misuse. On some measures, such as honesty and resistance to malicious “prompt injection” attacks, Opus 4.7 is an improvement on Opus 4.6; in others (such as its tendency to give overly detailed harm-reduction advice on controlled substances), Opus 4.7 is modestly weaker. Our alignment assessment concluded that the model is “largely well-aligned and trustworthy, though not fully ideal in its behavior”. Note that Mythos Preview remains the best-aligned model we’ve trained according to our evaluations. Our safety evaluations are discussed in full in the Claude Opus 4.7 System Card.

Overall misaligned behavior score from our automated behavioral audit. On this evaluation, Opus 4.7 is a modest improvement on Opus 4.6 and Sonnet 4.6, but Mythos Preview still shows the lowest rates of misaligned behavior.

Also launching today

In addition to Claude Opus 4.7 itself, we’re launching the following updates:

  • More effort control: Opus 4.7 introduces a new xhigh (“extra high”) effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, we’ve raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.
  • On the Claude Platform (API): as well as support for higher-resolution images, we’re also launching task budgets in public beta, giving developers a way to guide Claude’s token spend so it can prioritize work across longer runs.
  • In Claude Code: The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We’re giving Pro and Max Claude Code users three free ultrareviews to try it out. In addition, we’ve extended auto mode to Max users. Auto mode is a new permissions option where Claude makes decisions on your behalf, meaning that you can run longer tasks with fewer interruptions, and with less risk than if you had chosen to skip all permissions.
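The effort levels named in the list above can be pictured as an ordered ladder. The sketch below is only an illustration of that ordering: the names `EFFORT_LADDER` and `step_effort` are hypothetical, the ladder contains only the three levels this post names (any lower levels are left out), and nothing here shows how the effort setting is actually passed to the API or to Claude Code.

```python
# Ordering stated in the announcement: xhigh sits between high and max.
# Only the levels named in the post are listed; this is not a full catalogue.
EFFORT_LADDER = ("high", "xhigh", "max")

# Starting points the post recommends for coding and agentic use cases.
RECOMMENDED_START = ("high", "xhigh")


def step_effort(current: str, direction: int) -> str:
    """Move one rung up (+1) or down (-1), clamping at the ladder's ends.

    Hypothetical helper for tuning experiments, not an official API.
    """
    if current not in EFFORT_LADDER:
        raise ValueError(f"unknown effort level: {current!r}")
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    index = EFFORT_LADDER.index(current) + direction
    index = max(0, min(index, len(EFFORT_LADDER) - 1))
    return EFFORT_LADDER[index]


if __name__ == "__main__":
    effort = "xhigh"                   # a recommended starting point
    effort = step_effort(effort, -1)   # trade some reasoning for latency
    print(effort)
```

One way to use this in an evaluation harness is to start at a recommended level and step down only when latency or token spend is measured to be too high.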

Migrating from Opus 4.6 to Opus 4.7

Opus 4.7 is a direct upgrade to Opus 4.6, but two changes are worth planning for because they affect token usage. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens, roughly 1.0–1.35× depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens.
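For planning purposes, the tokenizer change can be turned into a rough range. The sketch below applies the 1.0–1.35× multiplier quoted above to an input token count measured on Opus 4.6. The function name `projected_input_tokens` is hypothetical, and the result is only a bound on input tokens under the stated multiplier; it says nothing about the additional output tokens produced at higher effort levels.

```python
# Multiplier range quoted in the announcement, expressed in hundredths
# so that the arithmetic below stays in exact integers.
MULTIPLIER_LOW_PCT = 100   # 1.0x
MULTIPLIER_HIGH_PCT = 135  # 1.35x


def _ceil_div(numerator: int, denominator: int) -> int:
    """Integer division that rounds up."""
    return -(-numerator // denominator)


def projected_input_tokens(opus_4_6_tokens: int) -> tuple[int, int]:
    """Return (low, high) projected input token counts on Opus 4.7.

    Hypothetical helper: the real multiplier depends on content type,
    so actual counts should be measured on real traffic.
    """
    if opus_4_6_tokens < 0:
        raise ValueError("token count must be non-negative")
    low = _ceil_div(opus_4_6_tokens * MULTIPLIER_LOW_PCT, 100)
    high = _ceil_div(opus_4_6_tokens * MULTIPLIER_HIGH_PCT, 100)
    return low, high


if __name__ == "__main__":
    # A prompt that measured 100,000 input tokens on Opus 4.6.
    print(projected_input_tokens(100_000))
```

A prompt of 100,000 input tokens on Opus 4.6 would therefore be budgeted at somewhere between 100,000 and 135,000 input tokens on Opus 4.7.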

Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise. In our own testing, the net effect is favorable (token usage across all effort levels is improved on an internal coding evaluation, as shown below), but we recommend measuring the difference on real traffic. We’ve written a migration guide that provides further advice on upgrading from Opus 4.6 to Opus 4.7.

Score on an internal agentic coding evaluation as a function of token usage at each effort level. In this evaluation, the model works autonomously from a single user prompt, and results may not be representative of token usage in interactive coding. See the migration guide for more on tuning effort levels.
