travisFIXES Research

Claude 4.6: Model & Effort Matrix

Effort Levels, Task Routing & Max Plan Economics

A fact-checked reference for optimizing Claude Code CLI and VS Code workflows. Covers Opus 4.6, Sonnet 4.6, and Haiku 4.5 across effort tiers, sub-agent orchestration, and subscription economics.

Last verified: April 5, 2026

TLDR

I paid $200/month for Claude Max. Anthropic gave me $19,098 in API-equivalent compute.

14 sections below. Real usage data, verified benchmarks, and documented gotchas from 2,094 sessions.

1. The Four Effort Levels

Illustrative Data

Claude Code's effort parameter controls how deeply the model reasons before responding. It's a behavioral signal, not a strict token ceiling. Higher effort grants the model more latitude to explore edge cases, self-verify, and spawn sub-agents.

Configurable via the /effort command, the --effort flag, the CLAUDE_CODE_EFFORT_LEVEL env var, or effortLevel in settings.json. Note: the Max level persists across sessions only via the environment variable.

Level | VS Code Slider | Behavior | Best For
Low | ●○○○ | Minimal reasoning. Skips thinking phase for straightforward queries. | Boilerplate, syntax fixes, file grep, formatting
Medium | ●●○○ | Bounded adaptive reasoning. Default for Opus 4.6 and Sonnet 4.6 (since v2.1.68). | General feature work, standard bug fixes, daily coding
High | ●●●○ | Extensive chain-of-thought. Explores multiple approaches. Triggered per-turn by including "ultrathink" in prompt. | Refactoring, cross-file debugging, security analysis
Max (Opus only) | ●●●● | Unbounded adaptive reasoning. No constraints on thinking depth. Only available on Opus 4.6. | Root-cause analysis, algorithm design, mission-critical synthesis
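A minimal shell sketch of the four configuration routes named above. The command, flag, and variable names are the ones this section lists; the .claude/settings.json path is Claude Code's standard project settings location, and the /effort command runs inside an active session rather than in the shell:

```shell
# Per-invocation: pass the documented flag.
claude --effort high

# Persistent: the env var is the only route that keeps Max across sessions.
export CLAUDE_CODE_EFFORT_LEVEL=max

# Per-project: effortLevel in settings.json (this overwrites the file).
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{ "effortLevel": "high" }
EOF

# Per-session: type /effort high at the Claude Code prompt.
```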

2. Model Capability Profiles

Illustrative

Opus 4.6 leads on deep architectural reasoning and self-debugging. Sonnet 4.6 has closed the gap on standard code generation. Haiku 4.5 is optimized as a high-speed parsing and retrieval sub-agent.

3. Verified Benchmark Comparison

Published Data

Published scores from Anthropic and independent evaluators. The GPQA gap (17.2pp) is where Opus earns its premium. For standard coding, the delta is negligible.

Benchmark | Opus 4.6 | Sonnet 4.6 | Delta
SWE-bench Verified | 80.8% | 79.6% | 1.2pp
OSWorld (Computer Use) | 72.7% | 72.5% | 0.2pp
Terminal-Bench 2.0 | 65.4% | ~60% | ~5pp
SRE-skills-bench | 94.7% | 90.4% | 4.3pp
GPQA Diamond | 91.3% | 74.1% | 17.2pp
Opus 4.6 pricing: $5 / $25 per MTok (input / output)
Sonnet 4.6 pricing: $3 / $15 per MTok (input / output)
Haiku 4.5 pricing: $1 / $5 per MTok (input / output)

Sources: Anthropic model announcements, Rootly SRE-skills-bench report, NxCode benchmark synthesis. Sonnet Terminal-Bench score estimated (~60%) based on Opus 4.5 baseline of 59.8%.

4. Autonomous Sub-Agent Routing

Opus 4.6 is highly proficient at autonomous delegation. It spawns specialized sub-agents (Explore agents for file search, Haiku for parsing) without requiring user instruction. Manual orchestration typically results in context fragmentation and higher token costs.

Configure sub-agent model: CLAUDE_CODE_SUBAGENT_MODEL="claude-haiku-4-5". For massive parallel work, Agent Teams are available via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.

Autonomous Sub-Agents (Default)

User prompt: "Refactor the auth module"
Opus 4.6 (primary, effort High or Max): analyzes architecture, creates plan
  Haiku 4.5 sub-agent (effort Low): scans repo, greps deps
  Sonnet 4.6 sub-agent (effort Medium): drafts boilerplate logic
Opus 4.6 synthesis: validates, assembles final output

opusplan Hybrid Alias

Developer enters Plan Mode (Shift+Tab or /plan)
Opus 4.6 (effort High, planning phase): designs architecture, writes step-by-step plan
User approves the plan
Sonnet 4.6 (effort Medium, execution phase): executes the plan, writes implementation code
Result: 50-70% token savings, since Opus rates apply only during the planning phase
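A back-of-envelope cost sketch of the hybrid split, using the output prices listed in Section 3. The 20/80 planning/execution split is an assumption, and this models API output pricing only; subscription usage-unit weighting is not captured, so the figure here lands below the 50-70% quoted above:

```shell
# Hypothetical task: 1M output tokens, 20% spent planning (Opus),
# 80% executing (Sonnet). Prices in cents per MTok: Opus 2500, Sonnet 1500.
total=1000000
plan=$(( total * 20 / 100 ))
run=$(( total - plan ))
opus_only=$(( total * 2500 / 1000000 ))
hybrid=$(( plan * 2500 / 1000000 + run * 1500 / 1000000 ))
saved=$(( (opus_only - hybrid) * 100 / opus_only ))
echo "opus-only: ${opus_only}c  hybrid: ${hybrid}c  saved: ${saved}%"
# prints: opus-only: 2500c  hybrid: 1700c  saved: 32%
```

Shift more of the token volume into the execution phase and the savings climb toward the document's quoted range.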

5. Task Routing Matrix

Optimal model and effort pairing per task type. The goal: reserve Opus + High/Max for work that actually needs depth-first reasoning, and let Sonnet + Medium handle the volume.

Task Category | Model | Effort | Orchestration | Rationale
Boilerplate, Syntax, Formatting | Sonnet / Haiku | Low | Single agent | Pure pattern matching. Reasoning overhead is wasted here.
Feature Implementation, Standard Bugs | Sonnet 4.6 | Medium | Single + auto sub-agents | 98% of Opus coding performance at 40-60% lower cost.
Complex Refactoring, System Design | opusplan | High | Hybrid (Opus plans, Sonnet executes) | Opus designs the blueprint, Sonnet does the volume work.
Security Audits, PR Review | Sonnet 4.6 | High | Agent Teams | Sonnet's breadth-first scanning catches diverse edge cases across modules.
Deep Root-Cause Analysis | Opus 4.6 | High / Max | Single agent | Complex stack traces and cross-file logic require depth-first reasoning.
Architecture Migration (cross-layer) | Opus 4.6 | High | Agent Teams | Parallelized specialists for frontend/backend/infra. High token cost, justified by scope.

6. Effort-Complexity Mismatch: Known Regression Patterns

Illustrative

Mismatching effort level to task complexity produces distinct failure modes. Running Opus 4.6 at Max effort on simple tasks triggers over-exploration loops (documented in #28469, #26761). Running Sonnet 4.6 at Low effort on complex tasks produces speculative paralysis and incomplete fixes.

The mitigation is the same in both directions: match effort to complexity. Medium effort bounds the reasoning engine and forces both models to act on their highest-probability plans.

Opus Over-Exploration Loop

At High/Max effort on simple tasks, Opus generates excessive counter-hypotheses, spawns redundant sub-agents, and burns thousands of reasoning tokens before arriving at the same initial conclusion. Documented in Issues #28469, #26761, #37023.

Sonnet Speculative Paralysis

When pushed beyond its reasoning depth on complex tasks at Low/Medium effort, Sonnet reaches a conclusion but immediately self-doubts, entering a loop of restating the problem without executing a fix.

7. Peak Hours & Session Distribution

Real Data

On March 26, 2026, Anthropic confirmed that 5-hour session limits are adjusted during peak hours: weekdays 5:00-11:00 AM PT. During peak, each token costs more "usage units" against your rolling window. The result: you hit your session ceiling sooner, but weekly limits are unchanged and response quality is unaffected.

Source: Anthropic engineer Thariq Shihipar via X/Twitter, March 26, 2026. Estimated ~7% of users affected. Max 20x subscribers largely insulated (~2% seeing differences).

Peak Hours
Weekdays 5-11 AM PT
1-7 PM GMT / 2-8 PM CET
Off-Peak
All Other Times
Weekends entirely off-peak
What Changes
Session Drain Rate
NOT response quality
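A small helper for checking whether a given Pacific-time weekday and hour land in the declared peak window. The function and its name are illustrative, not part of Claude Code:

```shell
# is_peak DOW HOUR: DOW is 1=Mon..7=Sun, HOUR is 0-23 in Pacific time.
# Succeeds (exit 0) inside the weekday 5:00-11:00 AM PT peak window.
is_peak() {
  [ "$1" -le 5 ] && [ "$2" -ge 5 ] && [ "$2" -lt 11 ]
}

# Check the current moment (strip a leading zero so the hour stays decimal).
dow=$(TZ=America/Los_Angeles date +%u)
hour=$(TZ=America/Los_Angeles date +%H)
hour=${hour#0}
if is_peak "$dow" "$hour"; then
  echo "peak window: session usage units drain faster"
else
  echo "off-peak"
fi
```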

Session Distribution by Hour (Pacific Time)

2,094 sessions from a Max 20x subscriber, Jan-Apr 2026. The red zone marks Anthropic's declared peak window. This user's natural schedule (night owl, CT timezone) avoids peak almost entirely: only 7.9% of sessions fall in the 5-11 AM PT window.

8. Weekly Output Trends & Rate Limit Behavior

Real Data

Weekly output tokens, Feb-Mar 2026, spanning three subscription tiers. The upgrade path is visible: Pro $20 (Feb 5), Max 5x $100 (Feb 12), Max 20x $200 (Mar 5). Peak week hit 7.5M output tokens (~$5,558 API-equivalent). After heavy days (>1M tokens), next-day output consistently drops 35-91%.

Heavy Day Recovery Pattern

After days exceeding 1M output tokens, the next day's output drops significantly. This pattern is consistent and reflects the rate limiter throttling sustained peak usage.

Heavy Day | Output | Next Day | Output | Recovery %
Mar 9 | 1,517,757 | Mar 10 | 649,725 | 43%
Mar 13 | 1,591,558 | Mar 14 | 621,848 | 39%
Mar 16 | 816,648 | Mar 17 | 228,093 | 28%
Mar 23 | 1,086,790 | Mar 24 | 239,295 | 22%
Mar 27 | 1,629,977 | Mar 28 | 146,031 | 9%
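The recovery column can be recomputed from the raw counts; a quick sketch (the recovery function name is just for illustration):

```shell
# recovery HEAVY NEXT: next-day output as a rounded percentage of the
# heavy day's output.
recovery() {
  awk -v h="$1" -v n="$2" 'BEGIN { printf "%.0f\n", n / h * 100 }'
}

recovery 1517757 649725   # Mar 9 -> Mar 10: prints 43
recovery 1629977 146031   # Mar 27 -> Mar 28: prints 9
```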
API-Equivalent Value
$19,098
for $200/month (95.5x multiplier)
Densest 5-Hour Window
1.23M tokens
~$1,019 API-equivalent
Practical Monthly Ceiling
22-26M tokens
~$20-25K API value
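Sanity check on the headline multiplier from the card above:

```shell
# $19,098 API-equivalent on a $200/month plan:
awk 'BEGIN { printf "%.1fx\n", 19098 / 200 }'   # prints 95.5x
```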

9. API Pricing: Who Got Cheaper, Who Didn't

Published Data

Opus output dropped 67% ($75 to $25/MTok) with the 4.5 release in Nov 2025. Sonnet has held at $15/MTok since March 2024: eight models, zero price changes. Haiku went the other direction: 4x more expensive ($1.25 to $5/MTok) as it got smarter.

Model | Mar 2024 Launch | Output $/MTok Then | Output $/MTok Now | Change
Opus | Claude 3 Opus | $75.00 | $25.00 | -67%
Sonnet | Claude 3 Sonnet | $15.00 | $15.00 | 0%
Haiku | Claude 3 Haiku | $1.25 | $5.00 | +300%
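The change column follows from the launch and current prices; a quick recomputation (pct_change is an illustrative helper, not an existing tool):

```shell
# pct_change OLD NEW: signed percentage change, rounded to whole percent.
pct_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.0f%%\n", (b - a) / a * 100 }'
}

pct_change 75 25      # Opus:   prints -67%
pct_change 15 15      # Sonnet: prints +0%
pct_change 1.25 5     # Haiku:  prints +300%
```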

Sources: Anthropic pricing page, TechCrunch (Haiku price hike Nov 2024), InfoWorld (Opus 4.5 price drop Nov 2025). Haiku briefly dropped to $4/MTok in Dec 2024 before returning to $5 with Haiku 4.5.

10. The Tightening: Rate Limit Layers

Documented

Pro launched with one rule. Two years later, subscribers navigate three layers of constraints. API tokens got cheaper; subscription access got more controlled.

Sep 2023
1 Layer: 100 msgs / 8 hrs
Simple. Transparent. $20/mo Pro.
Mid 2025
1 Layer (restructured): Token-weighted 5hr window
~45 msgs/5hr. Long conversations drain 8-10x faster than short ones.
Aug 2025
2 Layers: + Weekly ceiling
7-day cap + separate Opus weekly limit. Anthropic claimed <5% affected.
Mar 2026
3 Layers: + Peak hour throttling
Weekdays 5-11 AM PT: session drains faster. Weekly budget unchanged. ~7% newly affected.

The paradox: API prices dropped (Opus: -67%). Subscription prices held ($20 Pro, $200 Max). But effective per-dollar access got progressively more constrained through limit layering. Cheaper to run, more restricted to use.

11. Weekly Output by Model

Real Data

Output token distribution across Opus 4.6, Sonnet 4.6, and Haiku 4.5, Feb-Mar 2026. Note the Feb 26 week where Sonnet outproduced Opus (2.3M vs 1.9M) on the Max 5x plan, followed by a hard pivot to Opus-dominant workflow after upgrading to Max 20x. Haiku maintains steady sub-agent output throughout.

Opus 4.6
81.1% of output
17.5M tokens (Feb-Mar)
Sonnet 4.6
12.1% of output
2.6M tokens (heavy Feb 26 week)
Haiku 4.5 (sub-agent)
6.5% of output
1.4M tokens, steady throughout

12. The Three-Meter System: How Weekly Limits Actually Nest

Confirmed

The Claude dashboard shows three usage meters. The "Sonnet only" meter looks like a separate pool. It is not. Sonnet usage drains both "All models" AND "Sonnet only" simultaneously. When "All models" hits 100%, you are locked out of everything, including Sonnet, regardless of remaining Sonnet capacity.

Confirmed via user reports with screenshots: GitHub Issues #12487, #14362, #12795.

Meter 1
Current Session
5-hour rolling window
Covers all models combined
Meter 2
All Models (Weekly)
Master ceiling
Opus + Sonnet + Haiku all drain this
Meter 3
Sonnet Only (Weekly)
Sub-cap, NOT independent
Prevents burning all budget on Sonnet
How tokens flow through the buckets:
"All Models" weekly budget (master cap):
  Opus message drains All Models only.
  Haiku message drains All Models only (cheap).
"Sonnet Only" sub-cap:
  Sonnet message drains both All Models AND Sonnet Only.
Reset cycles are independent (different days). This does not mean independent pools.
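The nesting rules above can be sketched as a toy simulation. The unit sizes are arbitrary; only the drain rules mirror the documented behavior:

```shell
# Two weekly meters: "All models" is the master cap; "Sonnet only" is a
# nested sub-cap. Every message drains the master; only Sonnet messages
# also drain the sub-cap.
all_models=100
sonnet_only=60

drain() {  # drain MODEL UNITS
  all_models=$(( all_models - $2 ))
  if [ "$1" = "sonnet" ]; then
    sonnet_only=$(( sonnet_only - $2 ))
  fi
}

drain opus 30     # all_models 100 -> 70
drain sonnet 20   # all_models 70 -> 50, sonnet_only 60 -> 40
drain haiku 5     # all_models 50 -> 45
echo "all_models=$all_models sonnet_only=$sonnet_only"
# When all_models reaches 0, everything locks, regardless of sonnet_only.
```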

The Misleading Launch Message

When Opus 4.5 launched (Nov 2025), Anthropic's in-app notification said: "Sonnet now has its own limit, it's set to match your previous overall limit, so you can use just as much as before." This language implies independence, but the implementation is nested. Multiple users have flagged the confusion (#12487). Anthropic has not clarified.

What Happens at 100% "All Models"

Locked out of everything. Doesn't matter if "Sonnet only" shows 5% or 95%. The master bucket is empty. Wait for reset.

What Happens at 100% "Sonnet Only"

Sonnet is unavailable, but Opus and Haiku may still work if "All models" has remaining capacity. The sub-cap prevents you from burning all your budget on one model.

13. Max 20x Plan Economics ($200/month)

Anthropic does not publish hard token caps. The system uses internal "usage units" with a rolling 5-hour window and a separate weekly budget. When you hit the limit, you get model-downgraded (Opus to Sonnet), not cut off.

Rate Limit Structure
5-Hour Rolling Window
+ weekly budget (resets weekly)
Official Claim
20x Pro Usage
Exact token counts undisclosed
Degradation Behavior
Opus → Sonnet Fallback
Not hard cutoff

Known Issue: CLI Telemetry Desync (#24727)

The CLI's /status command can report 100% usage while the web dashboard shows significantly lower figures. If Extra Usage is enabled with an unlimited spend cap, the CLI may silently fall back to API billing. Recommendation: disable unlimited Extra Usage or cross-reference the web dashboard until this is patched.

April 4, 2026: Third-Party Framework Enforcement

Anthropic now enforces that Max subscription tokens cannot power third-party autonomous frameworks (OpenClaw, Cursor, custom harnesses). These must use pay-as-you-go API billing. The Max plan is ring-fenced for first-party interfaces: claude.ai, Claude Desktop, and the Claude Code CLI.

14. Configuration Quick Reference

Key environment variables and settings for tuning Claude Code behavior.

Variable | Purpose | Example
ANTHROPIC_MODEL | Override primary model | opusplan
CLAUDE_CODE_EFFORT_LEVEL | Persist effort across sessions (only way to persist max) | max
CLAUDE_CODE_SUBAGENT_MODEL | Model for background sub-agents | claude-haiku-4-5
ANTHROPIC_DEFAULT_OPUS_MODEL | Pin Opus version for opusplan routing | claude-opus-4-6
ANTHROPIC_DEFAULT_SONNET_MODEL | Pin Sonnet version for opusplan routing | claude-sonnet-4-6
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS | Enable Agent Teams (experimental) | 1
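The table collapses into a shell profile snippet. Values are the examples listed above; verify that the model IDs match your installed versions before pinning them:

```shell
# ~/.profile or shell rc: pin models, effort, and sub-agent routing.
export ANTHROPIC_MODEL="opusplan"                      # hybrid plan/execute alias
export CLAUDE_CODE_EFFORT_LEVEL="max"                  # only reliable way to persist max
export CLAUDE_CODE_SUBAGENT_MODEL="claude-haiku-4-5"   # cheap background sub-agents
export ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-6"
export ANTHROPIC_DEFAULT_SONNET_MODEL="claude-sonnet-4-6"
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1          # experimental
```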

Known UI Bug (#30726)

Setting effortLevel: "max" in settings.json works at session start, but interacting with the VS Code effort slider or model selector silently downgrades to Medium. Use the CLAUDE_CODE_EFFORT_LEVEL env var for reliable persistence.