travisFIXES Research

Claude 4.6: Model & Effort Matrix

Effort Levels, Task Routing & Max Plan Economics

A fact-checked reference for optimizing Claude Code CLI and VS Code workflows. Covers Opus 4.6, Sonnet 4.6, and Haiku 4.5 across effort tiers, sub-agent orchestration, and subscription economics.

Last verified: April 5, 2026

TLDR

I paid $200/month for Claude Max. Anthropic gave me $19,098 in API-equivalent compute.

14 sections below. Real usage data, verified benchmarks, and documented gotchas from 2,094 sessions.

1. The Four Effort Levels

Illustrative Data

Claude Code's effort parameter controls how deeply the model reasons before responding. It's a behavioral signal, not a strict token ceiling. Higher effort grants the model more latitude to explore edge cases, self-verify, and spawn sub-agents.

Configurable via the /effort command, the --effort flag, the CLAUDE_CODE_EFFORT_LEVEL env var, or effortLevel in settings.json. Note: the Max level persists across sessions only via the environment variable.

Level | VS Code Slider | Behavior | Best For
Low | ●○○○ | Minimal reasoning. Skips thinking phase for straightforward queries. | Boilerplate, syntax fixes, file grep, formatting
Medium | ●●○○ | Bounded adaptive reasoning. Default for Opus 4.6 and Sonnet 4.6 (since v2.1.68). | General feature work, standard bug fixes, daily coding
High | ●●●○ | Extensive chain-of-thought. Explores multiple approaches. Triggered per-turn by including "ultrathink" in prompt. | Refactoring, cross-file debugging, security analysis
Max (Opus only) | ●●●● | Unbounded adaptive reasoning. No constraints on thinking depth. Only available on Opus 4.6. | Root-cause analysis, algorithm design, mission-critical synthesis
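A minimal shell sketch of the four configuration routes named above. The command, flag, and variable names are the ones this section lists; the .claude/settings.json path is Claude Code's standard project settings location, and the /effort command runs inside an active session rather than in the shell:

```shell
# Per-invocation: pass the documented flag.
claude --effort high

# Persistent: the env var is the only route that keeps Max across sessions.
export CLAUDE_CODE_EFFORT_LEVEL=max

# Per-project: effortLevel in settings.json (this overwrites the file).
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{ "effortLevel": "high" }
EOF

# Per-session: type /effort high at the Claude Code prompt.
```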

2. Model Capability Profiles

Illustrative

Opus 4.6 leads on deep architectural reasoning and self-debugging. Sonnet 4.6 has closed the gap on standard code generation. Haiku 4.5 is optimized as a high-speed parsing and retrieval sub-agent.

3. Verified Benchmark Comparison

Published Data

Published scores from Anthropic and independent evaluators. The GPQA gap (17.2pp) is where Opus earns its premium. For standard coding, the delta is negligible.

Benchmark | Opus 4.6 | Sonnet 4.6 | Delta
SWE-bench Verified | 80.8% | 79.6% | 1.2pp
OSWorld (Computer Use) | 72.7% | 72.5% | 0.2pp
Terminal-Bench 2.0 | 65.4% | ~60% | ~5pp
SRE-skills-bench | 94.7% | 90.4% | 4.3pp
GPQA Diamond | 91.3% | 74.1% | 17.2pp
Opus 4.6 pricing: $5 / $25 per MTok (input / output)
Sonnet 4.6 pricing: $3 / $15 per MTok (input / output)
Haiku 4.5 pricing: $1 / $5 per MTok (input / output)

Sources: Anthropic model announcements, Rootly SRE-skills-bench report, NxCode benchmark synthesis. Sonnet Terminal-Bench score estimated (~60%) based on Opus 4.5 baseline of 59.8%.

4. Autonomous Sub-Agent Routing

Opus 4.6 is highly proficient at autonomous delegation. It spawns specialized sub-agents (Explore agents for file search, Haiku for parsing) without requiring user instruction. Manual orchestration typically results in context fragmentation and higher token costs.

Configure sub-agent model: CLAUDE_CODE_SUBAGENT_MODEL="claude-haiku-4-5". For massive parallel work, Agent Teams are available via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.

Autonomous Sub-Agents (Default)

User prompt: "Refactor the auth module"
Opus 4.6 (primary, effort High or Max): analyzes architecture, creates plan
  Haiku 4.5 sub-agent (effort Low): scans repo, greps deps
  Sonnet 4.6 sub-agent (effort Medium): drafts boilerplate logic
Opus 4.6 synthesis: validates, assembles final output

opusplan Hybrid Alias

Developer enters Plan Mode (Shift+Tab or /plan)
Opus 4.6 (effort High, planning phase): designs architecture, writes step-by-step plan
User approves the plan
Sonnet 4.6 (effort Medium, execution phase): executes the plan, writes implementation code
Result: 50-70% token savings, since Opus rates apply only during the planning phase
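A back-of-envelope cost sketch of the hybrid split, using the output prices listed in Section 3. The 20/80 planning/execution split is an assumption, and this models API output pricing only; subscription usage-unit weighting is not captured, so the figure here lands below the 50-70% quoted above:

```shell
# Hypothetical task: 1M output tokens, 20% spent planning (Opus),
# 80% executing (Sonnet). Prices in cents per MTok: Opus 2500, Sonnet 1500.
total=1000000
plan=$(( total * 20 / 100 ))
run=$(( total - plan ))
opus_only=$(( total * 2500 / 1000000 ))
hybrid=$(( plan * 2500 / 1000000 + run * 1500 / 1000000 ))
saved=$(( (opus_only - hybrid) * 100 / opus_only ))
echo "opus-only: ${opus_only}c  hybrid: ${hybrid}c  saved: ${saved}%"
# prints: opus-only: 2500c  hybrid: 1700c  saved: 32%
```

Shift more of the token volume into the execution phase and the savings climb toward the document's quoted range.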

5. Task Routing Matrix

Optimal model and effort pairing per task type. The goal: reserve Opus + High/Max for work that actually needs depth-first reasoning, and let Sonnet + Medium handle the volume.

Task Category | Model | Effort | Orchestration | Rationale
Boilerplate, Syntax, Formatting | Sonnet / Haiku | Low | Single agent | Pure pattern matching. Reasoning overhead is wasted here.
Feature Implementation, Standard Bugs | Sonnet 4.6 | Medium | Single + auto sub-agents | 98% of Opus coding performance at 40-60% lower cost.
Complex Refactoring, System Design | opusplan | High | Hybrid (Opus plans, Sonnet executes) | Opus designs the blueprint, Sonnet does the volume work.
Security Audits, PR Review | Sonnet 4.6 | High | Agent Teams | Sonnet's breadth-first scanning catches diverse edge cases across modules.
Deep Root-Cause Analysis | Opus 4.6 | High / Max | Single agent | Complex stack traces and cross-file logic require depth-first reasoning.
Architecture Migration (cross-layer) | Opus 4.6 | High | Agent Teams | Parallelized specialists for frontend/backend/infra. High token cost, justified by scope.

6. Effort-Complexity Mismatch: Known Regression Patterns

Illustrative

Mismatching effort level to task complexity produces distinct failure modes. Running Opus 4.6 at Max effort on simple tasks triggers over-exploration loops (documented in #28469, #26761). Running Sonnet 4.6 at Low effort on complex tasks produces speculative paralysis and incomplete fixes.

The mitigation is the same in both directions: match effort to complexity. Medium effort bounds the reasoning engine and forces both models to act on their highest-probability plans.

Opus Over-Exploration Loop

At High/Max effort on simple tasks, Opus generates excessive counter-hypotheses, spawns redundant sub-agents, and burns thousands of reasoning tokens before arriving at the same initial conclusion. Documented in Issues #28469, #26761, #37023.

Sonnet Speculative Paralysis

When pushed beyond its reasoning depth on complex tasks at Low/Medium effort, Sonnet reaches a conclusion but immediately self-doubts, entering a loop of restating the problem without executing a fix.

7. Peak Hours & Session Distribution

Real Data

On March 26, 2026, Anthropic confirmed that 5-hour session limits are adjusted during peak hours: weekdays 5:00-11:00 AM PT. During peak, each token costs more "usage units" against your rolling window. The result: you hit your session ceiling sooner, but weekly limits are unchanged and response quality is unaffected.

Source: Anthropic engineer Thariq Shihipar via X/Twitter, March 26, 2026. Estimated ~7% of users affected. Max 20x subscribers largely insulated (~2% seeing differences).

Peak Hours
Weekdays 5-11 AM PT
1-7 PM GMT / 2-8 PM CET
Off-Peak
All Other Times
Weekends entirely off-peak
What Changes
Session Drain Rate
NOT response quality
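A small helper for checking whether a given Pacific-time weekday and hour land in the declared peak window. The function and its name are illustrative, not part of Claude Code:

```shell
# is_peak DOW HOUR: DOW is 1=Mon..7=Sun, HOUR is 0-23 in Pacific time.
# Succeeds (exit 0) inside the weekday 5:00-11:00 AM PT peak window.
is_peak() {
  [ "$1" -le 5 ] && [ "$2" -ge 5 ] && [ "$2" -lt 11 ]
}

# Check the current moment (strip a leading zero so the hour stays decimal).
dow=$(TZ=America/Los_Angeles date +%u)
hour=$(TZ=America/Los_Angeles date +%H)
hour=${hour#0}
if is_peak "$dow" "$hour"; then
  echo "peak window: session usage units drain faster"
else
  echo "off-peak"
fi
```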

Session Distribution by Hour (Pacific Time)

2,094 sessions from a Max 20x subscriber, Jan-Apr 2026. The red zone marks Anthropic's declared peak window. This user's natural schedule (night owl, CT timezone) avoids peak almost entirely: only 7.9% of sessions fall in the 5-11 AM PT window.

8. Weekly Output Trends & Rate Limit Behavior

Real Data

Weekly output tokens, Feb-Mar 2026, spanning three subscription tiers. The upgrade path is visible: Pro $20 (Feb 5), Max 5x $100 (Feb 12), Max 20x $200 (Mar 5). Peak week hit 7.5M output tokens (~$5,558 API-equivalent). After heavy days (>1M tokens), next-day output consistently drops 35-91%.

Heavy Day Recovery Pattern

After days exceeding 1M output tokens, the next day's output drops significantly. This pattern is consistent and reflects the rate limiter throttling sustained peak usage.

Heavy Day | Output | Next Day | Output | Recovery %
Mar 9 | 1,517,757 | Mar 10 | 649,725 | 43%
Mar 13 | 1,591,558 | Mar 14 | 621,848 | 39%
Mar 16 | 816,648 | Mar 17 | 228,093 | 28%
Mar 23 | 1,086,790 | Mar 24 | 239,295 | 22%
Mar 27 | 1,629,977 | Mar 28 | 146,031 | 9%
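The recovery column can be recomputed from the raw counts; a quick sketch (the recovery function name is just for illustration):

```shell
# recovery HEAVY NEXT: next-day output as a rounded percentage of the
# heavy day's output.
recovery() {
  awk -v h="$1" -v n="$2" 'BEGIN { printf "%.0f\n", n / h * 100 }'
}

recovery 1517757 649725   # Mar 9 -> Mar 10: prints 43
recovery 1629977 146031   # Mar 27 -> Mar 28: prints 9
```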
API-Equivalent Value
$19,098
for $200/month (95.5x multiplier)
Densest 5-Hour Window
1.23M tokens
~$1,019 API-equivalent
Practical Monthly Ceiling
22-26M tokens
~$20-25K API value
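Sanity check on the headline multiplier from the card above:

```shell
# $19,098 API-equivalent on a $200/month plan:
awk 'BEGIN { printf "%.1fx\n", 19098 / 200 }'   # prints 95.5x
```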

9. API Pricing: Who Got Cheaper, Who Didn't

Published Data

Opus output dropped 67% ($75 to $25/MTok) with the 4.5 release in Nov 2025. Sonnet has held at $15/MTok since March 2024: eight models, zero price changes. Haiku went the other direction: 4x more expensive ($1.25 to $5/MTok) as it got smarter.

Model | Mar 2024 Launch | Output $/MTok Then | Output $/MTok Now | Change
Opus | Claude 3 Opus | $75.00 | $25.00 | -67%
Sonnet | Claude 3 Sonnet | $15.00 | $15.00 | 0%
Haiku | Claude 3 Haiku | $1.25 | $5.00 | +300%
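The change column follows from the launch and current prices; a quick recomputation (pct_change is an illustrative helper, not an existing tool):

```shell
# pct_change OLD NEW: signed percentage change, rounded to whole percent.
pct_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.0f%%\n", (b - a) / a * 100 }'
}

pct_change 75 25      # Opus:   prints -67%
pct_change 15 15      # Sonnet: prints +0%
pct_change 1.25 5     # Haiku:  prints +300%
```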

Sources: Anthropic pricing page, TechCrunch (Haiku price hike Nov 2024), InfoWorld (Opus 4.5 price drop Nov 2025). Haiku briefly dropped to $4/MTok in Dec 2024 before returning to $5 with Haiku 4.5.

10. The Tightening: Rate Limit Layers

Documented

Pro launched with one rule. Two years later, subscribers navigate three layers of constraints. API tokens got cheaper; subscription access got more controlled.

Sep 2023
1 Layer: 100 msgs / 8 hrs
Simple. Transparent. $20/mo Pro.
Mid 2025
1 Layer (restructured): Token-weighted 5hr window
~45 msgs/5hr. Long conversations drain 8-10x faster than short ones.
Aug 2025
2 Layers: + Weekly ceiling
7-day cap + separate Opus weekly limit. Anthropic claimed <5% affected.
Mar 2026
3 Layers: + Peak hour throttling
Weekdays 5-11 AM PT: session drains faster. Weekly budget unchanged. ~7% newly affected.

The paradox: API prices dropped (Opus: -67%). Subscription prices held ($20 Pro, $200 Max). But effective per-dollar access got progressively more constrained through limit layering. Cheaper to run, more restricted to use.

11. Weekly Output by Model

Real Data

Output token distribution across Opus 4.6, Sonnet 4.6, and Haiku 4.5, Feb-Mar 2026. Note the Feb 26 week where Sonnet outproduced Opus (2.3M vs 1.9M) on the Max 5x plan, followed by a hard pivot to Opus-dominant workflow after upgrading to Max 20x. Haiku maintains steady sub-agent output throughout.

Opus 4.6
81.1% of output
17.5M tokens (Feb-Mar)
Sonnet 4.6
12.1% of output
2.6M tokens (heavy Feb 26 week)
Haiku 4.5 (sub-agent)
6.5% of output
1.4M tokens, steady throughout

12. The Three-Meter System: How Weekly Limits Actually Nest

Confirmed

The Claude dashboard shows three usage meters. The "Sonnet only" meter looks like a separate pool. It is not. Sonnet usage drains both "All models" AND "Sonnet only" simultaneously. When "All models" hits 100%, you are locked out of everything, including Sonnet, regardless of remaining Sonnet capacity.

Confirmed via user reports with screenshots: GitHub Issues #12487, #14362, #12795.

Meter 1
Current Session
5-hour rolling window
Covers all models combined
Meter 2
All Models (Weekly)
Master ceiling
Opus + Sonnet + Haiku all drain this
Meter 3
Sonnet Only (Weekly)
Sub-cap, NOT independent
Prevents burning all budget on Sonnet
How tokens flow through the buckets:
"All Models" weekly budget (master cap):
  Opus message drains All Models only.
  Haiku message drains All Models only (cheap).
"Sonnet Only" sub-cap:
  Sonnet message drains both All Models AND Sonnet Only.
Reset cycles are independent (different days). This does not mean independent pools.
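The nesting rules above can be sketched as a toy simulation. The unit sizes are arbitrary; only the drain rules mirror the documented behavior:

```shell
# Two weekly meters: "All models" is the master cap; "Sonnet only" is a
# nested sub-cap. Every message drains the master; only Sonnet messages
# also drain the sub-cap.
all_models=100
sonnet_only=60

drain() {  # drain MODEL UNITS
  all_models=$(( all_models - $2 ))
  if [ "$1" = "sonnet" ]; then
    sonnet_only=$(( sonnet_only - $2 ))
  fi
}

drain opus 30     # all_models 100 -> 70
drain sonnet 20   # all_models 70 -> 50, sonnet_only 60 -> 40
drain haiku 5     # all_models 50 -> 45
echo "all_models=$all_models sonnet_only=$sonnet_only"
# When all_models reaches 0, everything locks, regardless of sonnet_only.
```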

The Misleading Launch Message

When Opus 4.5 launched (Nov 2025), Anthropic's in-app notification said: "Sonnet now has its own limit, it's set to match your previous overall limit, so you can use just as much as before." This language implies independence, but the implementation is nested. Multiple users have flagged the confusion (#12487). Anthropic has not clarified.

What Happens at 100% "All Models"

Locked out of everything. Doesn't matter if "Sonnet only" shows 5% or 95%. The master bucket is empty. Wait for reset.

What Happens at 100% "Sonnet Only"

Sonnet is unavailable, but Opus and Haiku may still work if "All models" has remaining capacity. The sub-cap prevents you from burning all your budget on one model.

13. Max 20x Plan Economics ($200/month)

Anthropic does not publish hard token caps. The system uses internal "usage units" with a rolling 5-hour window and a separate weekly budget. When you hit the limit, you get model-downgraded (Opus to Sonnet), not cut off.

Rate Limit Structure
5-Hour Rolling Window
+ weekly budget (resets weekly)
Official Claim
20x Pro Usage
Exact token counts undisclosed
Degradation Behavior
Opus → Sonnet Fallback
Not hard cutoff

Known Issue: CLI Telemetry Desync (#24727)

The CLI's /status command can report 100% usage while the web dashboard shows significantly lower figures. If Extra Usage is enabled with an unlimited spend cap, the CLI may silently fall back to API billing. Recommendation: disable unlimited Extra Usage or cross-reference the web dashboard until this is patched.

April 4, 2026: Third-Party Framework Enforcement

Anthropic now enforces that Max subscription tokens cannot power third-party autonomous frameworks (OpenClaw, Cursor, custom harnesses). These must use pay-as-you-go API billing. The Max plan is ring-fenced for first-party interfaces: claude.ai, Claude Desktop, and the Claude Code CLI.

14. Configuration Quick Reference

Key environment variables and settings for tuning Claude Code behavior.

Variable | Purpose | Example
ANTHROPIC_MODEL | Override primary model | opusplan
CLAUDE_CODE_EFFORT_LEVEL | Persist effort across sessions (only way to persist max) | max
CLAUDE_CODE_SUBAGENT_MODEL | Model for background sub-agents | claude-haiku-4-5
ANTHROPIC_DEFAULT_OPUS_MODEL | Pin Opus version for opusplan routing | claude-opus-4-6
ANTHROPIC_DEFAULT_SONNET_MODEL | Pin Sonnet version for opusplan routing | claude-sonnet-4-6
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS | Enable Agent Teams (experimental) | 1
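The table collapses into a shell profile snippet. Values are the examples listed above; verify that the model IDs match your installed versions before pinning them:

```shell
# ~/.profile or shell rc: pin models, effort, and sub-agent routing.
export ANTHROPIC_MODEL="opusplan"                      # hybrid plan/execute alias
export CLAUDE_CODE_EFFORT_LEVEL="max"                  # only reliable way to persist max
export CLAUDE_CODE_SUBAGENT_MODEL="claude-haiku-4-5"   # cheap background sub-agents
export ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-6"
export ANTHROPIC_DEFAULT_SONNET_MODEL="claude-sonnet-4-6"
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1          # experimental
```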

Known UI Bug (#30726)

Setting effortLevel: "max" in settings.json works at session start, but interacting with the VS Code effort slider or model selector silently downgrades to Medium. Use the CLAUDE_CODE_EFFORT_LEVEL env var for reliable persistence.