ARTICLES · 2026-04-21 · BY EFFLOOW CONTENT FACTORY

Claude Sonnet 4.6: 1M Context, 300K Output, Agentic Coding

Claude Sonnet 4.6 delivers 79.6% SWE-bench, 1M token context, and 300K batch output at $3/MTok. Complete API guide with adaptive thinking and compaction.
Tags: claude · anthropic · llm · agentic-coding · api · context-window · ai-tools

Claude Sonnet 4.6, released on February 17, 2026, is the model that made the "when do I need Opus?" question genuinely hard to answer. It scores 79.6% on SWE-bench Verified — just 1.2 points behind Opus 4.6's 80.8% — while costing one-fifth the price. Add a 1 million token context window and 300K-token batch output, and you get a model that erases the practical boundary between mid-tier and flagship for the vast majority of developer workloads.

This guide covers everything you need to deploy Claude Sonnet 4.6 effectively: benchmarks, API specifics, adaptive thinking, context compaction, and a concrete model routing strategy.

Why Sonnet 4.6 Changes the Calculus

Before Sonnet 4.6, the decision tree was simple: use Sonnet for everyday tasks, reach for Opus when things get hard. That boundary still exists, but it moved significantly.

The model's headline numbers are strong, but what prompted Anthropic to promote it as the default for Free and Pro users is subtler: Sonnet 4.6 reads context before acting, consolidates shared logic instead of duplicating it, and follows multi-step instructions without losing track. In Claude Code testing, users preferred Sonnet 4.6 over its predecessor 70% of the time — and preferred it over the previous flagship Opus 4.5 59% of the time.

This is not an incremental update. Anthropic rebuilt the model's attention to context and its tendencies around overengineering. The result is a model that behaves more like a senior engineer following a spec than a tool trying to impress.

The AI coding market has consolidated rapidly in 2026, and Sonnet 4.6 is the primary reason Claude Code holds its market position at the price point it does.

Benchmark Performance: How Close Is It Really to Opus?

| Benchmark | Sonnet 4.6 | Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-bench Verified | 79.6% | 80.8% | ~76% |
| OSWorld (computer use) | 72.5% | ~72.7% | ~38% |
| Terminal-Bench 2.0 | 59.1% | 62% | |
| GDPval-AA (office tasks) | 1633 Elo | Higher | |
| Price (input/output, /MTok) | $3 / $15 | $15 / $75 | $5 / $15 |

Two numbers stand out. The 79.6% SWE-bench score means Sonnet 4.6 resolves about 4 out of 5 real GitHub issues from the benchmark set — a score that would have been Opus-tier six months ago. The OSWorld computer use score of 72.5% puts it ahead of GPT-5.4 by over 34 percentage points on GUI navigation and multi-step desktop automation.

Math performance saw the sharpest improvement: from 62% to 89%, making it reliable for quantitative reasoning that previously required escalation to Opus.

Core API Specifications

Model ID: claude-sonnet-4-6

Context windows:

  • Standard: 200K tokens input, 64K output (synchronous API)
  • 1M context beta: enabled via the anthropic-beta: 1m-context-2026-02-01 header (check the latest docs for the exact beta flag)
  • Batch output: up to 300K tokens per request using output-300k-2026-03-24 beta header via the Messages Batches API

Pricing: $3.00 per million input tokens / $15.00 per million output tokens — unchanged from Sonnet 4.5.
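At these rates, per-request cost is easy to estimate up front. A minimal sketch (the helper name is ours, not part of the SDK; the 50% batch discount figure comes from the Batches API section of this guide):

```python
# Illustrative helper (name and structure are ours, not Anthropic's SDK):
def estimate_cost_usd(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate a request's cost at Sonnet 4.6 list pricing ($3/$15 per MTok)."""
    INPUT_PER_MTOK = 3.00
    OUTPUT_PER_MTOK = 15.00
    cost = (input_tokens / 1_000_000) * INPUT_PER_MTOK + (
        output_tokens / 1_000_000
    ) * OUTPUT_PER_MTOK
    # The Messages Batches API is discounted 50% versus synchronous calls.
    return cost * 0.5 if batch else cost

# A full 1M-token input with a 16K-token response:
print(round(estimate_cost_usd(1_000_000, 16_000), 2))        # 3.24
print(round(estimate_cost_usd(1_000_000, 16_000, True), 2))  # 1.62
```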

Availability: Anthropic API, AWS Bedrock (anthropic.claude-sonnet-4-6-20260217-v1:0), Azure AI Foundry (Microsoft Foundry).

Basic API Call

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "Refactor this function to use dependency injection..."
        }
    ]
)
print(message.content[0].text)

Enabling the 1M Token Context Window

The 1M context window is available in beta. Pass the appropriate beta header with your request:

import anthropic

client = anthropic.Anthropic()

with open("large_codebase.py", "r") as f:
    codebase = f.read()

message = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    betas=["1m-context-2026-02-01"],  # Check docs for the current beta flag
    messages=[
        {
            "role": "user",
            "content": f"Analyze this codebase and identify all security vulnerabilities:\n\n{codebase}"
        }
    ]
)

The 1M window can hold an entire medium-sized codebase, dozens of research papers, or months of conversation history in a single request. Previously this required Opus — now it runs at Sonnet pricing. For more context on what developers are doing with million-token windows, see our context window race deep-dive.

300K Output via Message Batches API

For long-form generation tasks — full documentation sets, large code migrations, comprehensive test suites — the Batches API unlocks 300K output tokens per request:

import anthropic

client = anthropic.Anthropic()

batch = client.beta.messages.batches.create(
    requests=[
        {
            "custom_id": "migrate-legacy-auth",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 300000,
                "messages": [
                    {
                        "role": "user",
                        "content": "Migrate this entire legacy authentication module to JWT..."
                    }
                ]
            }
        }
    ],
    betas=["output-300k-2026-03-24"]
)

print(f"Batch created: {batch.id}")

Batch requests are processed asynchronously and cost 50% less than synchronous API calls — making them the right choice for large-scale generation pipelines.

Adaptive Thinking

Adaptive thinking is Sonnet 4.6's mechanism for dynamic reasoning allocation. Rather than applying a fixed thinking budget, the model decides when and how much to reason before producing output. You enable it via the thinking parameter:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "budget_tokens": 10000  # max thinking tokens; 0 disables thinking
    },
    messages=[
        {
            "role": "user",
            "content": "Design the database schema for a multi-tenant SaaS application..."
        }
    ]
)

# Thinking content appears as a separate block
for block in message.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

Adaptive thinking shines on tasks with variable complexity: simple completions skip the reasoning step entirely (faster and cheaper), while genuinely complex architectural questions trigger deeper analysis. Anthropic recommends migrating new projects to adaptive rather than extended thinking with a fixed budget.

The practical implication: you do not need to manually tune reasoning budgets per task type. Set a reasonable ceiling and let the model allocate.

Context Compaction

Long-running agent loops have always hit a ceiling: the context window fills up, and you either lose earlier turns or restart. Context compaction solves this by automatically summarizing earlier conversation history server-side when you approach the limit.

message = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    betas=["context-compaction-2026-02-01"],
    messages=conversation_history  # could be hundreds of turns
)

When compaction triggers, the API replaces earlier turns with a structured summary, then continues the conversation seamlessly. The model maintains awareness of prior context without you managing a sliding window or summary buffer in your application code.

This is especially useful for agentic workflows where Claude Code or a long-running agent loop needs to maintain context across an entire development session. See our Claude Code advanced workflow guide for session management patterns that pair well with context compaction.
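If you prefer to opt in to compaction only once a session actually nears the limit, a rough client-side check suffices. A sketch, assuming the common 4-characters-per-token estimate and an 80% threshold (both illustrative; authoritative counts come back in the API's usage field):

```python
# Rough heuristic: ~4 characters per token. Real counts come from the
# API's usage field; this is only a client-side estimate.
def approx_tokens(conversation_history: list[dict]) -> int:
    return sum(len(turn["content"]) for turn in conversation_history) // 4

def betas_for(conversation_history: list[dict], window: int = 200_000) -> list[str]:
    # Opt in to compaction once the session passes 80% of the standard window.
    if approx_tokens(conversation_history) > int(window * 0.8):
        return ["context-compaction-2026-02-01"]
    return []

long_session = [{"role": "user", "content": "x" * 800_000}]  # ~200K tokens
print(betas_for(long_session))  # ['context-compaction-2026-02-01']
```

Passing the beta unconditionally also works; this simply keeps short sessions on the default code path.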

Model Routing: When Sonnet 4.6 Is Enough (and When It's Not)

The optimal approach is not picking one model — it's routing intelligently.

Use Sonnet 4.6 for ~80% of your tasks:

  • Writing new code, implementing features, and fixing bugs
  • Code review and refactoring (up to moderate complexity)
  • Unit and integration test generation
  • Documentation generation and summarization
  • Long-context analysis using the 1M window
  • Computer use automation (GUI navigation, browser tasks)
  • Conversational agents and chatbots

Escalate to Opus 4.6 for:

  • Architectural decisions on large, highly interconnected systems
  • Agent Teams workflows (currently an Opus-exclusive capability)
  • Research-heavy tasks requiring deep scientific reasoning
  • Problems where Sonnet's first attempt clearly misses the mark
  • Complex multi-file refactors spanning 50+ files with non-obvious interdependencies

For most production applications, a simple fallback pattern handles this:

def call_claude(prompt: str, complexity: str = "standard") -> str:
    model = "claude-sonnet-4-6" if complexity != "high" else "claude-opus-4-6"
    
    response = client.messages.create(
        model=model,
        max_tokens=8192,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

The more sophisticated version evaluates prompt characteristics — token count, technical depth signals, explicit complexity markers — and routes accordingly. Pair this with LLM observability tooling to measure actual model performance across your specific workloads before committing to a routing strategy.
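That heuristic router can be sketched in a few lines (the keyword list and thresholds here are illustrative assumptions, not a tuned policy; measure against your own workload before trusting it):

```python
def classify_complexity(prompt: str) -> str:
    """Crude router: very long prompts or architecture-flavored keywords
    escalate to "high"; everything else stays "standard"."""
    HIGH_SIGNALS = (
        "architecture",
        "design a system",
        "migration plan",
        "distributed",
        "multi-region",
    )
    lowered = prompt.lower()
    if len(prompt) // 4 > 50_000:  # ~4 chars per token; long-context request
        return "high"
    if any(signal in lowered for signal in HIGH_SIGNALS):
        return "high"
    return "standard"

print(classify_complexity("Fix the off-by-one bug in pagination"))       # standard
print(classify_complexity("Design a system for multi-region failover"))  # high
```

The return value plugs straight into the call_claude fallback above as its complexity argument.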

Practical Application: Full Codebase Review

Here's a real pattern for running a security audit across an entire codebase using the 1M context window:

import anthropic
import os
from pathlib import Path

def collect_codebase(root_dir: str, extensions: list[str]) -> str:
    files = []
    for ext in extensions:
        for path in Path(root_dir).rglob(f"*{ext}"):
            if ".git" not in str(path) and "node_modules" not in str(path):
                try:
                    content = path.read_text(encoding="utf-8", errors="ignore")
                    files.append(f"### {path}\n```\n{content}\n```\n")
                except Exception:
                    pass
    return "\n".join(files)

client = anthropic.Anthropic()

codebase = collect_codebase("./src", [".py", ".ts", ".sql"])

# Approximate token count: 4 chars ≈ 1 token
approx_tokens = len(codebase) // 4
print(f"Estimated tokens: {approx_tokens:,}")

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    betas=["1m-context-2026-02-01"],
    messages=[
        {
            "role": "user",
            "content": (
                "Review this entire codebase for security vulnerabilities. "
                "Focus on: SQL injection, XSS, SSRF, insecure deserialization, "
                "hardcoded credentials, and broken authentication. "
                "For each issue, provide: file path, line reference, severity (Critical/High/Medium/Low), "
                "and a specific fix with code.\n\n" + codebase
            )
        }
    ]
)

print(response.content[0].text)

This pattern scales to approximately 750K tokens of actual code, which works out to on the order of 75,000–100,000 lines depending on density.
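When a repository overshoots that budget, you can group the per-file blobs from collect_codebase into chunks that each fit the window and review them in separate requests. A sketch, assuming the same 4-chars-per-token estimate (the function name and chunk-by-file-boundary strategy are ours):

```python
def chunk_files(file_blobs: list[str], max_tokens: int = 750_000) -> list[list[str]]:
    """Group per-file blobs into chunks that each fit the token budget,
    never splitting a single file across two chunks."""
    chunks: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for blob in file_blobs:
        blob_tokens = len(blob) // 4  # ~4 chars per token
        if current and current_tokens + blob_tokens > max_tokens:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(blob)
        current_tokens += blob_tokens
    if current:
        chunks.append(current)
    return chunks

blobs = ["a" * 2_000_000, "b" * 2_000_000]  # ~500K tokens each
print(len(chunk_files(blobs)))  # 2
```

The trade-off: cross-chunk issues (e.g. an injection source in one chunk reaching a sink in another) won't be visible, so keep tightly coupled modules in the same chunk where possible.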

Common Mistakes

Defaulting to Opus for everything. The performance gap is 1.2 points on SWE-bench. The cost gap is 5×. Run your tasks through Sonnet first; you'll rarely need to escalate.

Not enabling adaptive thinking for complex tasks. Without it, Sonnet uses standard completion for all queries regardless of complexity. Enabling it with a budget of 8,000–16,000 tokens costs little for simple tasks (the model skips reasoning) and pays off significantly for hard ones.

Using synchronous API for large output requirements. The synchronous API caps output at 64K tokens. If you're generating large documentation, migrations, or codebases, use the Batches API with the 300K output beta — and get the 50% cost reduction as a bonus.

Managing context windows manually. Before context compaction, developers implemented their own summarization logic in application code. This is unnecessary for Sonnet 4.6; opt in to context-compaction-2026-02-01 and let the API handle it.

Not verifying token costs before running 1M-context requests. A 1M-token input costs $3. That is inexpensive for a one-off analysis. Running it in a loop across many requests adds up quickly. Profile your context size before scaling.

FAQ

Q: Is Claude Sonnet 4.6 available on AWS Bedrock?

Yes. The Bedrock model ID is anthropic.claude-sonnet-4-6-20260217-v1:0. The same 1M context and adaptive thinking features are available via the Bedrock API. Pricing is the same as direct Anthropic API access ($3/$15 per million tokens), with standard AWS billing and data residency options.

Q: What is the difference between the 1M context window and context compaction?

They solve different problems. The 1M context window lets you include up to 1 million tokens in a single request — useful for loading an entire codebase or document set upfront. Context compaction handles long conversations: when a multi-turn session approaches the context limit, it automatically summarizes earlier turns so the conversation can continue. You can use both together.

Q: Does Sonnet 4.6 support extended thinking (fixed budget) as well as adaptive thinking?

Yes. You can set thinking: {type: "enabled", budget_tokens: N} for a fixed budget, or {type: "adaptive"} to let the model decide. Anthropic recommends adaptive for new projects because it avoids paying for reasoning on queries that don't need it.

Q: How does Sonnet 4.6 compare to GPT-5.4 for coding?

On SWE-bench Verified, Sonnet 4.6 scores 79.6% versus GPT-5.4's approximately 76%. For computer use (OSWorld), Sonnet 4.6's 72.5% exceeds GPT-5.4 by over 34 points. Pricing is comparable. The practical difference comes down to tool integrations: Claude Code's tight integration with Sonnet 4.6 makes it the natural choice for developers already in the Anthropic ecosystem.

Q: Can I use Claude Sonnet 4.6 for computer use in production?

Yes. Sonnet 4.6 is Anthropic's most capable computer use model, scoring 72.5% on OSWorld Verified. Computer use is available via the API with the computer_use_2025-05-01 beta or the tool definitions in the current stable API. It navigates GUIs, fills forms, and executes multi-step desktop workflows. Production deployments should implement human-in-the-loop review for any actions with side effects.

Q: What's the right max_tokens setting for Sonnet 4.6?

For synchronous API calls, set max_tokens based on your expected output length; the hard cap is 64K. For most code generation tasks, 8,192–16,384 is sufficient. For the Batches API with the 300K beta, you can go up to 300,000. You are only billed for tokens actually generated, not the allocated maximum, but a realistic limit makes runaway generations easier to catch.

Key Takeaways

Bottom Line

Claude Sonnet 4.6 is the default choice for production AI development in 2026. At $3/$15 per million tokens, it delivers 79.6% SWE-bench, 72.5% OSWorld, and a 1M token context window that was Opus-exclusive six months ago. Reserve Opus 4.6 for the 20% of tasks where architectural depth or Agent Teams genuinely matter — and route everything else through Sonnet.

  • Model ID: claude-sonnet-4-6 — available on Anthropic API, AWS Bedrock, and Azure AI Foundry
  • Pricing: $3/$15 per million tokens (input/output), 50% discount via Batches API
  • 1M context window: Available in beta; fits entire medium codebases in a single request
  • 300K batch output: Use output-300k-2026-03-24 beta header with the Messages Batches API
  • Adaptive thinking: Set thinking: {type: "adaptive"} — model allocates reasoning only where needed
  • Context compaction: Enable with context-compaction-2026-02-01 beta for long-running agent sessions
  • Route 80/20: Sonnet for most coding and agentic tasks, Opus only for high-complexity architectural work or Agent Teams
  • In Claude Code testing, preferred 70% of the time over Sonnet 4.5 and 59% over the previous flagship Opus 4.5; developers have already voted with their usage

For the full context on where Sonnet 4.6 sits in the Claude model family and what comes next, see our Claude Mythos preview guide, which covers Anthropic's gated frontier model that sits above Opus 4.6.


