Anthropic’s Advisor Strategy is a server-side tool (advisor_20260301) that lets a cheap executor model — Sonnet 4.6 or Haiku 4.5 — consult Claude Opus 4.6 for guidance inside a single API call, with no extra orchestration layer. In benchmarks, Haiku with an Opus advisor more than doubled its standalone score (19.7 % → 41.2 %) while costing 85 % less per task than Sonnet alone. The feature shipped in beta on April 9, 2026, and requires one header plus one tool config to enable.
What is the Anthropic Advisor Strategy?
Most multi-model architectures follow the same script: a large “orchestrator” model delegates subtasks to smaller workers. Anthropic’s Advisor Strategy inverts that relationship. The cheaper model drives the workflow — it reads tool results, iterates toward a solution, and generates final output. Only when it hits a reasoning wall does it ask Opus for a plan, a correction, or a stop signal.
The mechanism is a new server-side tool type called advisor_20260301, added to the Messages API alongside tools like MCP connectors and web search. When the executor decides to invoke it, Anthropic’s infrastructure transparently routes the full shared context to Opus. Opus returns a concise strategic response — typically 400–700 tokens — and the executor resumes. One API call, one billing event, zero separate orchestration.
This matters for anyone building autonomous AI agents at scale: instead of paying Opus rates for every turn of a 30-step coding task, you pay Haiku rates for 27 turns and Opus rates for three short consultations.
How does the Advisor Strategy work under the hood?
The implementation is deliberately minimal. You add one beta header and one tool definition — no SDK change, no new endpoint, no separate context window to manage.
Step 1: Enable the beta
Every API request that uses the advisor tool must include the header anthropic-beta: advisor-tool-2026-03-01. This signals the backend to enable server-side routing between the executor and Opus.
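If you call the API without the SDK, the beta flag is just one more HTTP header. A minimal sketch of the header set, assuming the standard Messages API conventions (only the anthropic-beta value comes from this post; the rest is ordinary Messages API boilerplate):

```python
# Sketch: headers for a raw HTTP request to the Messages API with the
# advisor beta enabled. Only the anthropic-beta value is specific to
# the advisor tool; the other headers are standard.
def build_headers(api_key: str) -> dict:
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "advisor-tool-2026-03-01",
        "content-type": "application/json",
    }
```

The SDK route shown below (`betas=["advisor-tool-2026-03-01"]`) sets the same header for you.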
Step 2: Declare the tool
{
  "type": "advisor_20260301",
  "name": "advisor",
  "model": "claude-opus-4-6",
  "max_uses": 3
}
The max_uses parameter caps how many times the executor can consult Opus within a single request. Anthropic’s own evaluations used 2–3 — enough to cover critical decision points like initial planning, mid-task correction, and final output validation — without letting costs spiral. Each consultation is billed at Opus input/output rates; everything else stays at the executor’s rate.
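One useful property of max_uses is that it bounds worst-case advisor output spend up front. An illustrative helper (the 700-token ceiling and Opus output rate are the figures quoted in this post; the functions themselves are a sketch, not part of the API):

```python
# Sketch: build the advisor tool config and bound worst-case advisor
# output spend. The 700-token reply ceiling and the $75/MTok Opus
# output rate are taken from the figures quoted in this post.
OPUS_OUTPUT_PER_MTOK = 75.00  # $/MTok, April 2026 pricing

def advisor_tool(max_uses: int = 3) -> dict:
    if max_uses < 1:
        raise ValueError("max_uses must be at least 1")
    return {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "max_uses": max_uses,
    }

def worst_case_advisor_output_cost(max_uses: int,
                                   tokens_per_reply: int = 700) -> float:
    # Output side only; the input side scales with context size,
    # which is discussed later in this post.
    return max_uses * tokens_per_reply * OPUS_OUTPUT_PER_MTOK / 1_000_000
```

At max_uses of 3 and 700-token replies, the output-side ceiling is about $0.16 per request; the input side dominates once the shared context grows.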
Step 3: Make the API call
import anthropic

client = anthropic.Anthropic()

# Define your regular tools + the advisor tool
tools = [
    {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "max_uses": 3
    },
    {
        "name": "run_tests",
        "description": "Execute test suite and return results",
        "input_schema": {
            "type": "object",
            "properties": {
                "test_path": {"type": "string"}
            },
            "required": ["test_path"]
        }
    }
]

response = client.beta.messages.create(
    model="claude-sonnet-4-6",  # executor
    max_tokens=8192,
    betas=["advisor-tool-2026-03-01"],
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "Fix the failing test in tests/auth.py. "
                       "Run the suite after each change."
        }
    ]
)

# Process tool_use blocks as usual — advisor_tool_result
# blocks appear inline alongside regular tool results
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name} | Input: {block.input}")
    elif block.type == "text":
        print(block.text)
Notice that the advisor is declared alongside regular tools. The executor — Sonnet 4.6 here — decides autonomously when to invoke it. The advisor_tool_result blocks appear inline in the response, so your existing tool-use parsing logic doesn’t need restructuring.
For multi-turn conversations, pass the full assistant content array — including advisor_tool_result blocks — back in subsequent requests. The executor treats previous advisor guidance as part of its conversational history, maintaining coherence across turns without re-consulting Opus for already-resolved decisions.
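Carrying advisor guidance across turns amounts to appending the full content array verbatim. A minimal sketch with plain dicts (only the advisor_tool_result type name comes from the API as described; the field layout inside the block is an assumption for illustration):

```python
# Sketch: accumulate multi-turn history while preserving
# advisor_tool_result blocks. The block layout beyond the "type"
# field is assumed, not documented.
def extend_history(messages, assistant_content, tool_results):
    messages.append({"role": "assistant", "content": assistant_content})
    messages.append({"role": "user", "content": tool_results})
    return messages

def advisor_notes(assistant_content):
    """Pull the text of any advisor guidance out of a content array."""
    return [
        b.get("text", "")
        for b in assistant_content
        if b.get("type") == "advisor_tool_result"
    ]

history = [{"role": "user", "content": "Fix the failing test."}]
turn = [
    {"type": "advisor_tool_result", "text": "Patch the expiry check first."},
    {"type": "text", "text": "Applying the advisor's plan now."},
]
extend_history(history, turn, [{"type": "tool_result", "content": "3 passed"}])
```

Because earlier guidance rides along in the history, the executor can follow a plan from turn 5 at turn 20 without spending another max_uses slot.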
How does data flow between Executor and Advisor?
The diagram above captures the core insight: the executor model owns the loop. It calls tools, reads results, generates output — standard agentic workflow. The advisor only enters the loop when the executor explicitly requests guidance. Once Opus responds, control returns to the executor, which continues with zero context-switching overhead because both models share the same conversation state.
What do the benchmarks actually show?
Anthropic published three benchmark results alongside the beta launch. The numbers reveal a clear pattern: the advisor strategy delivers its biggest gains when the baseline executor is weakest.
| Configuration | Benchmark | Score | Δ vs. Solo | Cost vs. Opus Solo |
|---|---|---|---|---|
| Sonnet 4.6 alone | SWE-bench Multilingual | 72.1 % | — | ~5× cheaper |
| Sonnet 4.6 + Opus advisor | SWE-bench Multilingual | 74.8 % | +2.7 pts | ~4× cheaper |
| Haiku 4.5 alone | BrowseComp | 19.7 % | — | ~20× cheaper |
| Haiku 4.5 + Opus advisor | BrowseComp | 41.2 % | +21.5 pts | 85 % cheaper than Sonnet |
The Sonnet result — a 2.7-point lift on SWE-bench Multilingual with an 11.9 % cost reduction — looks modest in isolation. But SWE-bench Multilingual measures end-to-end code generation across languages, so even small gains reflect meaningfully fewer broken patches on real repositories.
The Haiku result is more dramatic. Jumping from 19.7 % to 41.2 % on BrowseComp — a benchmark that tests multi-step web research — means Haiku went from essentially unusable to competitive, all while costing 85 % less than running Sonnet alone. For high-volume pipelines where you’re running thousands of agent loops per day, that cost delta compounds fast.
Anthropic’s evaluations used max_uses: 3. Increasing this lets the executor consult Opus more often, which likely improves quality but also increases cost proportionally. No public data exists yet for max_uses values above 3 — teams should run their own eval suites against specific workloads to find the optimal setting.
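The proportional-cost claim is easy to make concrete: each extra consultation swaps one cheap executor turn for one Opus consultation. A back-of-envelope sketch using the April 2026 pricing quoted later in this post and an assumed 2,000-token context, 600-token advisor reply:

```python
# Sketch: marginal cost of one extra advisor consultation vs. one saved
# Sonnet executor turn. Rates are the April 2026 pricing quoted in this
# post; the 2,000-in / 600-out consultation profile is an assumption.
def marginal_consult_cost(ctx_tok=2000, adv_out=600):
    return (ctx_tok * 15.00 + adv_out * 75.00) / 1e6  # Opus rates

def saved_sonnet_turn(in_tok=2000, out_tok=500):
    return (in_tok * 3.00 + out_tok * 15.00) / 1e6    # Sonnet rates

net_per_extra_use = marginal_consult_cost() - saved_sonnet_turn()
```

Under these assumptions each additional max_uses slot that actually gets used adds roughly six cents per task, which is why quality per consultation, not raw consultation count, determines the optimal setting.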
How does this compare to other multi-model approaches?
The advisor strategy isn’t the only way to combine models. OpenAI, Google, and several open-source frameworks have shipped their own patterns. The differences matter when you’re choosing an architecture for production.
| Approach | Who drives? | Context sharing | API overhead |
|---|---|---|---|
| Anthropic Advisor | Cheap model (executor) | Full shared context, server-side | Single API call |
| Orchestrator-worker | Expensive model (orchestrator) | Orchestrator decides what to share | N+1 API calls |
| Mixture-of-Agents (MoA) | Parallel, then aggregator | Separate contexts merged | M×N API calls |
| Router / classifier | Router picks model per query | No shared context | 1 routing + 1 generation |
The key architectural difference is that the advisor strategy is server-side and single-call. Orchestrator-worker patterns — which frameworks like LangGraph and CrewAI implement — require your application to manage multiple API calls, stitch contexts together, and handle failure states between them. The advisor tool abstracts all of that: you declare the tool, and the server handles routing, context sharing, and billing.
This simplicity comes at a cost: you lose fine-grained control over what context the advisor sees. In an orchestrator-worker setup you can curate exactly what the expensive model processes. With the advisor tool, Opus gets the full conversation — which could mean expensive input tokens if your context window is large. For tasks with carefully engineered context, this matters.
When should you actually use it — and when shouldn’t you?
The advisor strategy isn’t universally optimal. Its value depends on the shape of your workload — specifically, how much of it is “mechanical” versus “requires deep reasoning.”
Strong fit
Agentic coding tasks — where 80–90 % of turns are reading files, running tests, and applying straightforward edits, but 2–3 turns require architectural decisions. This is exactly what SWE-bench Multilingual tests, and the benchmark gains confirm it.
Multi-step research pipelines — a Haiku executor can fetch URLs, extract text, and summarize at cents per run. When the pipeline hits an ambiguous source or conflicting data, Opus adjudicates. The BrowseComp results (19.7 % → 41.2 %) demonstrate this pattern directly.
High-volume classification with edge cases — legal document triage, support ticket routing, content moderation. The bulk of items are straightforward; the advisor handles the gray zone.
Poor fit
Uniformly hard tasks — if every turn requires frontier-level reasoning (complex mathematics, novel research synthesis), the executor will consult the advisor at every opportunity, and you’re essentially paying Opus rates plus Sonnet overhead. Just use Opus directly.
Latency-critical applications — each advisor consultation adds server-side round-trip latency. For real-time chat or interactive coding assistants where response time matters more than cost, the overhead may be unacceptable.
Single-turn queries — the advisor strategy shines in multi-step agentic loops. For one-shot question answering or simple generation, a routing approach (send easy queries to Haiku, hard ones to Opus) is both simpler and cheaper.
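For contrast, the routing alternative mentioned above can be as little as a classifier in front of two model IDs, with no shared context at all. A toy sketch (the keyword heuristic and the model ID strings are illustrative; production routers typically use a trained classifier):

```python
# Toy sketch of the router alternative: pick one model per query up
# front. The difficulty heuristic and model ID strings are illustrative
# assumptions, not a recommended implementation.
HARD_MARKERS = ("prove", "architecture", "trade-off", "ambiguous", "why")

def route(query: str) -> str:
    hard = any(m in query.lower() for m in HARD_MARKERS)
    return "claude-opus-4-6" if hard else "claude-haiku-4-5"
```

The router decides once, before generation; the advisor strategy decides mid-task, with full context. That is the whole difference, and for single-turn work the router's simplicity usually wins.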
What are the cost implications at scale?
Let’s run concrete numbers. Assume a coding agent that averages 25 turns per task, where each turn involves roughly 2,000 input tokens and 500 output tokens.
# Pricing per 1M tokens (April 2026)
PRICING = {
    "opus": {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.80, "output": 4.00},
}

def cost_per_task(turns=25, advisor_turns=3,
                  input_tok=2000, output_tok=500,
                  advisor_output=600, executor="sonnet"):
    ex = PRICING[executor]
    ad = PRICING["opus"]
    executor_turns = turns - advisor_turns
    executor_cost = (
        executor_turns * input_tok * ex["input"]
        + executor_turns * output_tok * ex["output"]
    ) / 1_000_000
    advisor_cost = (
        advisor_turns * input_tok * ad["input"]
        + advisor_turns * advisor_output * ad["output"]
    ) / 1_000_000
    return executor_cost + advisor_cost

opus_only = cost_per_task(executor="opus", advisor_turns=0)
sonnet_adv = cost_per_task(executor="sonnet", advisor_turns=3)
haiku_adv = cost_per_task(executor="haiku", advisor_turns=3)

print(f"Opus solo: ${opus_only:.4f}/task")
print(f"Sonnet + Opus adv: ${sonnet_adv:.4f}/task")
print(f"Haiku + Opus adv: ${haiku_adv:.4f}/task")

# Opus solo: $1.6875/task
# Sonnet + Opus adv: $0.5220/task
# Haiku + Opus adv: $0.3042/task
At 1,000 tasks per day, Opus solo costs roughly $1,688 daily. Sonnet with an Opus advisor drops that to ~$522 — a 69.1 % reduction. Haiku with an advisor pushes it to ~$304 — an 82.0 % reduction — at the expense of lower baseline quality on the non-advised turns.
The calculation also reveals a hidden variable: context window size. Advisor tokens are billed at Opus input rates, so if your context grows to 50K tokens by turn 20, each advisor consultation becomes significantly more expensive. Teams running long-context agents should consider resetting or summarizing context before advisor calls.
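The context effect is stark when you plug it into the per-consultation arithmetic. A sketch using the Opus rates above and an assumed ~600-token advisor reply:

```python
# Sketch: cost of a single advisor consultation as the shared context
# grows, at the Opus rates quoted above ($15/MTok in, $75/MTok out).
# The ~600-token reply size is an assumption from this post's figures.
def consult_cost(context_tokens: int, reply_tokens: int = 600) -> float:
    return (context_tokens * 15.00 + reply_tokens * 75.00) / 1_000_000

small = consult_cost(2_000)   # early-turn context
large = consult_cost(50_000)  # turn-20 context from the example above
```

Under these assumptions a consultation jumps from about $0.075 early in the task to about $0.80 at 50K tokens of context, a roughly tenfold increase for the identical question.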
What does this mean for EU AI Act compliance?
The EU AI Act’s risk-tier framework introduces transparency and auditability requirements that directly affect multi-model architectures. Article 14 mandates human oversight provisions for high-risk AI systems, and Article 13 requires systems to be transparent enough for deployers to interpret output.
The advisor strategy creates a paper trail by design. Each advisor_tool_result block in the API response records exactly when the executor consulted Opus and what guidance was returned. This is significantly more auditable than orchestrator-worker setups where inter-model communication happens across separate API calls and must be logged by the application layer.
However, the strategy also raises a compliance question that does not yet have a settled answer: who is the “provider” of the output? When a Haiku executor generates text based on Opus guidance, two models contribute to a single output. Under Article 16’s provider obligations, the deployer (your company) bears responsibility — but demonstrating which model contributed what becomes essential for incident investigation. Logging the full response, including advisor blocks, isn’t optional; it’s a compliance requirement.
If your agent operates in a high-risk domain (healthcare, legal, hiring), store the complete API response — including all advisor_tool_result blocks — for the retention period mandated by your risk tier. The advisor blocks serve as an automatic decision log that auditors can trace.
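A minimal retention sketch for that decision log, assuming dict-shaped content blocks (only the advisor_tool_result type name comes from the API as described; the record schema and block layout are illustrative):

```python
# Sketch: persist the advisor decision trail for audit. Only the
# "advisor_tool_result" type name comes from the API as described;
# the record schema here is an illustrative assumption.
import json
import time

def audit_record(task_id: str, content_blocks: list) -> str:
    record = {
        "task_id": task_id,
        "logged_at": time.time(),
        # Surface the advisor trail for fast auditor access...
        "advisor_blocks": [
            b for b in content_blocks
            if b.get("type") == "advisor_tool_result"
        ],
        # ...but retain the complete response, not just the advice.
        "full_response": content_blocks,
    }
    return json.dumps(record)
```

In practice you would write these records to append-only storage with the retention period your risk tier mandates.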
How does this connect to Anthropic’s broader alignment approach?
The advisor strategy isn’t just a cost optimization tool — it reflects a pattern that runs through Anthropic’s alignment research. The idea that a more capable model should advise rather than control mirrors the philosophy behind Constitutional AI, where principles guide model behavior rather than rigid rules.
In Constitutional AI, a model critiques and revises its own outputs against a set of principles. The advisor strategy externalizes that critic: Opus serves as a principled reviewer that can course-correct the executor without replacing it. This is closer to what alignment researchers call “scalable oversight” — using a stronger system to verify a weaker one’s work, rather than doing the work itself.
There’s a direct line from this to agent safety concerns. Autonomous agents that run for dozens of turns accumulate compounding errors. Each wrong turn makes the next one more likely, because the context window fills with incorrect assumptions. An advisor that can intervene at critical junctures — “stop, your approach is wrong, here’s the correct plan” — acts as a safety guardrail against drift, similar to how RLHF corrects reward hacking by grounding the model in human preferences.
Anthropic’s research on Claude’s internal states adds another dimension: a smaller executor model may lack the “meta-awareness” to recognize when it’s out of its depth. The advisor provides that meta-layer externally.
What are the known limitations and gotchas?
The beta label isn’t decorative. Several constraints affect production readiness:
Priority Tier doesn’t cascade. If you’ve purchased Priority Tier for Sonnet to guarantee low latency, that tier does not extend to advisor calls. Opus consultations use standard throughput unless you separately purchase Priority Tier for Opus — which may defeat the cost savings for latency-sensitive workloads.
No streaming for advisor responses. The advisor’s output arrives as a complete block within the response. For long-running agent loops, the total time-to-first-token increases by the advisor’s generation time each time it’s consulted.
Context bloat is the hidden cost driver. The advisor sees the full shared context. In a 25-turn coding task, that context might grow to 80K+ tokens. Each advisor consultation re-reads that entire context at Opus input pricing ($15/MTok). This can erode the cost advantage significantly in long-running tasks — the exact scenario where the advisor strategy is supposed to shine.
The executor decides when to consult. This is both a feature and a risk. A Haiku executor might not have the judgment to know when it should escalate. Conversely, an overly cautious executor might burn through max_uses early, leaving the hardest decisions unadvised. Tuning system prompts to guide escalation behavior is currently the only lever, and it’s imprecise.
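Since the system prompt is the only lever, it is worth being explicit about escalation criteria in it. A sketch of such guidance (the wording is illustrative, not an Anthropic-recommended prompt):

```python
# Sketch: system-prompt guidance for escalation behavior. The wording
# is illustrative only; tune it against your own eval suite.
ESCALATION_GUIDANCE = (
    "You may consult the advisor at most 3 times. Consult it only "
    "when: (1) planning the overall approach, (2) two consecutive "
    "attempts have failed, or (3) validating the final output. "
    "Reserve at least one consultation for final validation; never "
    "consult it for mechanical steps."
)
```

Concrete triggers ("two consecutive attempts have failed") tend to steer a small executor better than abstract ones ("when the task is hard"), though this remains imprecise.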
How should you implement this in production?
Based on Anthropic’s evaluation data and the architecture’s constraints, here’s a practical implementation pattern for production-grade agent systems:
import anthropic
import logging
import time

log = logging.getLogger("advisor_agent")

def run_agent_with_advisor(task: str, max_turns: int = 30):
    client = anthropic.Anthropic()
    tools = [
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",
            "max_uses": 3  # tune per workload
        },
        # ... your domain-specific tools
    ]
    messages = [{"role": "user", "content": task}]
    advisor_calls = 0
    response = None

    for turn in range(max_turns):
        try:
            response = client.beta.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=8192,
                betas=["advisor-tool-2026-03-01"],
                tools=tools,
                messages=messages,
            )
        except anthropic.RateLimitError:
            log.warning(f"Rate limited at turn {turn}, backing off")
            time.sleep(min(2 ** turn, 30))  # capped exponential backoff
            continue

        # Track advisor usage for monitoring
        for block in response.content:
            if getattr(block, "type", None) == "advisor_tool_result":
                advisor_calls += 1
                log.info(
                    f"Advisor consulted at turn {turn} "
                    f"({advisor_calls}/3 uses)"
                )

        # Standard agentic loop: handle tool_use, append results
        if response.stop_reason == "end_turn":
            log.info(
                f"Task complete in {turn+1} turns, "
                f"{advisor_calls} advisor calls"
            )
            return response

        # Append assistant + tool results for next turn
        messages.append({"role": "assistant", "content": response.content})
        tool_results = process_tool_calls(response)  # your tool dispatcher
        messages.append({"role": "user", "content": tool_results})

    log.warning(f"Hit max turns ({max_turns})")
    return response
Before committing to the advisor pattern, run your existing eval suite against three configurations: executor solo, executor + advisor, and Opus solo. Compare task completion rate, cost per task, and latency p95. The advisor strategy saves money only when the executor can handle the majority of turns competently — if your eval shows the executor failing without advice on > 30 % of turns, Opus solo may be more cost-effective.
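The selection rule in that paragraph can be expressed as a small decision function over your eval results. A sketch (the thresholds and the stub numbers are illustrative assumptions, not benchmark data):

```python
# Sketch: pick a configuration from eval results. The 2-point quality
# tolerance and the stub numbers are illustrative assumptions; feed in
# your own suite's pass rates and per-task costs.
def pick_config(results: dict) -> str:
    """results: name -> {"pass_rate": float, "cost": float}"""
    # Keep configs within 2 points of the best pass rate, then take
    # the cheapest survivor.
    best = max(r["pass_rate"] for r in results.values())
    viable = {k: v for k, v in results.items()
              if v["pass_rate"] >= best - 0.02}
    return min(viable, key=lambda k: viable[k]["cost"])

choice = pick_config({
    "sonnet_solo":    {"pass_rate": 0.68, "cost": 0.34},
    "sonnet_advisor": {"pass_rate": 0.74, "cost": 0.52},
    "opus_solo":      {"pass_rate": 0.75, "cost": 1.69},
})
```

With these stub numbers the advisor config wins: it sits within two points of Opus solo at less than a third of the cost, while the solo executor falls outside the quality band.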
FAQ
Is the advisor tool production-ready? It ships in beta behind the anthropic-beta: advisor-tool-2026-03-01 header, required in every request. The tool definition and API surface may change before GA — Anthropic has not announced a general availability date. Use it in production if your workload tolerates potential API changes, but pin to the beta version string and test against updates.
Can a model other than Opus serve as the advisor? Not yet — the beta supports only claude-opus-4-6 as the advisor model. The tool definition requires specifying this exact model string. Future versions may support other advisor models, but as of the beta launch, Opus is the only option.
What happens when the executor exhausts its max_uses budget? It continues without the advisor for the remaining turns. The executor doesn’t error or stop — it simply loses the ability to consult Opus. This is why tuning system prompts to guide escalation timing is critical: instruct the executor to reserve at least one advisor call for the final validation step.
Sources & Bibliography
- Anthropic. The Advisor Strategy: Give Sonnet an intelligence boost with Opus (April 9, 2026). claude.com/blog/the-advisor-strategy
- Anthropic. Agentic tool use — Messages API documentation (2026). docs.anthropic.com
- Wang, J. et al. Mixture-of-Agents Enhances Large Language Model Capabilities. arXiv:2406.04692 (2024). arxiv.org/abs/2406.04692
- European Parliament. Regulation (EU) 2024/1689 — Artificial Intelligence Act (2024). eur-lex.europa.eu
- SWE-bench. SWE-bench Multilingual — Evaluating Language Models on Real-World Coding Tasks. swebench.com
- Bai, Y. et al. Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073 (2022). arxiv.org/abs/2212.08073