Putting 1,000 Agents Inside One Task: The Engineering Logic of Claude Subagents and Workflows — Nebutra Blog

Putting 1,000 Agents Inside One Task

When you ask AI to do deep research and receive a cited report a few minutes later, the system behind it is often not one AI. It is a temporary team of AI workers, assembled for that task and running in parallel. This article explains, with as little jargon as possible, how that system works and why its success or failure is ultimately written in a token ledger.

Start with the intuition: the strongest AI products are quietly shifting from "one chatbot answers you" to "one AI acts as project lead and temporarily hires a team of AIs to do the work." Claude's Research feature and the broader category of Deep Research products are built around this pattern. The industry calls it a multi-agent system.

Why should AI builders, investors, and startup teams care? Because it determines two things at once: the ceiling of product capability, meaning whether AI can really complete complex tasks autonomously, and the floor of unit economics, meaning how much each completed task costs. Those two forces are why so many AI demos feel magical while commercialization remains hard.

This article moves in two parts. First it explains the underlying system: architecture, memory, concurrency, cost, and reliability. Then it answers the operational questions: when is this worth using, and how do you put it into production without losing control. A real Claude Code workflow example appears near the end to ground the abstractions in code.

Small glossary before reading

Token: the basic unit AI systems process and bill for. You can roughly think of it as text volume. Burning tokens means burning money.

Context window: the amount of information an AI can keep in working memory at once. When it fills up, information must be truncated, compressed, or moved elsewhere.

Agent: an AI system that does more than answer. It can call tools, search, read pages, write files, and keep taking steps until the task is done.

TL;DR

Claude's multi-agent orchestration is a classic orchestrator-worker pattern. A lead AI decomposes the task and dispatches multiple worker AIs. Each worker researches inside its own context window and returns a compressed finding, and the lead AI synthesizes the findings into a result. A separate citation agent can then attach sources to each claim.
The engineering lever is token economics. Multi-agent systems can consume roughly 15x the tokens of an ordinary chat, while Anthropic's analysis attributes about 80% of performance variance to token usage itself. Use the pattern only for tasks that are high-value, parallelizable, and larger than one context window.
Agents do not share memory. State moves through a one-way channel: prompt in, final message out. Reliability depends on resumable execution, checkpoints, retries, tracing, and staged deployment because stateful agent errors compound.

01 / Foundation: Workflow and Agent Are Different Things

Anthropic draws an important architecture line in Building effective agents.^[1] Both workflows and agents are agentic systems, but they differ in who owns the control flow.

Dimension	Workflow	Agent
Control	LLMs and tools are orchestrated by predefined code paths. You own the pipeline.	The LLM decides the process and tool calls. The model owns the pipeline.
Subtasks	Fixed when the code is written.	Dynamically decomposed by the orchestrator for the current input.
Best fit	Clear boundaries, predictability, reproducibility.	Flexible work where model autonomy and scale matter.
Cost	More controlled, cheaper, easier to debug.	Higher token cost, with more room for errors to compound.

The example script in section 10 is really a workflow: where the model is called, how many paths run in parallel, and when verification happens are all fixed in code. Claude's Research feature, where the model decides how many subagents to launch and what each one should do, is closer to a real agent. Anthropic's practical advice is simple: find the simplest solution first, and add complexity only when needed.^[1]

Anthropic groups workflows into five escalating patterns: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer loops. The difference between orchestrator-worker and simple parallelization is that the subtasks are not predefined. The orchestrator decides them at runtime.^[1]

02 / Architecture: How Orchestrator-Worker Runs in a Research System

How we built our multi-agent research system is the most concrete primary source for this pattern.^[2] The flow looks like this:

The lead agent stores a plan first. LeadResearcher analyzes the query, creates a strategy, and writes the plan into memory so it survives context-window pressure.^[2]
It dispatches parallel subagents. The lead agent creates focused worker agents, often using a cheaper model, and sends them out at the same time.
Each subagent works independently. A worker searches, evaluates tool results through interleaved thinking, and returns compressed findings.
The lead synthesizes and decides. It checks whether more research is needed and may launch additional workers in an OODA-style loop.
CitationAgent finishes the job. After the research loop ends, a specialized citation agent maps claims back to concrete sources. Research and citation are separated responsibilities.
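
The five steps above can be sketched as a small control loop. This is a minimal illustration, not Anthropic's implementation: `run_worker`, `plan_subtasks`, and `attach_citations` are hypothetical stand-ins for model calls, and `Memory` is an assumed external store for the plan.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Assumed external store, so the plan survives context-window pressure."""
    plan: list = field(default_factory=list)
    findings: list = field(default_factory=list)


async def run_worker(objective: str) -> str:
    """Hypothetical subagent: researches one objective, returns a compressed finding."""
    await asyncio.sleep(0)  # placeholder for searches and tool calls
    return f"finding for: {objective}"


def plan_subtasks(query: str, round_no: int) -> list:
    """Hypothetical lead-model decomposition. An empty list means 'enough research'."""
    if round_no == 0:
        return [f"{query} / background", f"{query} / recent changes", f"{query} / counterevidence"]
    return []  # a real lead would inspect the findings before deciding


def attach_citations(report: str, findings: list) -> str:
    """Hypothetical citation pass that runs only after the research loop ends."""
    return report + f" [sources: {len(findings)}]"


async def research(query: str, max_rounds: int = 3) -> str:
    memory = Memory()
    for round_no in range(max_rounds):
        subtasks = plan_subtasks(query, round_no)
        if not subtasks:
            break                                   # lead decides no more research is needed
        memory.plan.extend(subtasks)                # step 1: store the plan
        results = await asyncio.gather(*(run_worker(s) for s in subtasks))  # steps 2-3
        memory.findings.extend(results)             # step 4: lead collects compressed findings
    report = f"report on {query!r} from {len(memory.findings)} findings"
    return attach_citations(report, memory.findings)  # step 5: separate citation pass


if __name__ == "__main__":
    print(asyncio.run(research("fund status")))
```

The point of the shape is that the number of subtasks is decided inside the loop, per round, rather than being fixed when the code is written.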

Anthropic's core idea is that search is compression: extracting insight from a large information space.^[2] Subagents help because they explore different parts of the problem in parallel, inside isolated context windows, and return only the important tokens to the lead agent. Anthropic reports that an Opus 4 lead with Sonnet 4 workers outperformed a single Opus 4 agent by 90.2% on an internal research evaluation.^[2] The public research-lead prompt says the same thing operationally: the lead should coordinate, guide, and synthesize, not perform all first-hand research itself.^[11]

03 / Context: Isolation and Window Management

In Effective context engineering for AI agents, Anthropic treats context as an engineering resource.^[3] Transformers form pairwise relationships across tokens, so attention gets diluted as context grows. The resulting failure mode is often called context rot. Good context engineering means finding the smallest high-signal set of tokens that maximizes the chance of completing the objective.^[3]

How does state move? One way only.

According to the Agent SDK subagents documentation, each subagent runs in a fresh session. Its intermediate tool calls and results stay inside that subagent. Only the final message returns to the parent. The only channel from parent to child is the prompt string passed to the Agent tool. The child cannot see the parent's conversation history, tool results, or system prompt. There is no shared memory.^[6]
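
The one-way channel can be pictured with plain functions. The sketch below is an analogy under stated assumptions, not SDK code: `ParentSession`, `run_subagent`, and `delegate` are hypothetical names, and the "transcript" is a local list standing in for the child's private session.

```python
from dataclasses import dataclass, field


@dataclass
class ParentSession:
    system_prompt: str
    history: list = field(default_factory=list)


def run_subagent(prompt: str) -> str:
    """Fresh session: the prompt string is the only input the child receives."""
    transcript = [f"user: {prompt}"]                 # child starts with no parent history
    transcript.append("tool_call: search(...)")      # intermediate steps stay local
    transcript.append("tool_result: many pages of raw text")
    final_message = f"summary of work on: {prompt}"
    transcript.append(f"assistant: {final_message}")
    return final_message                             # the transcript is discarded here


def delegate(parent: ParentSession, objective: str) -> str:
    # Anything the child needs must be written into the prompt explicitly,
    # because it cannot see parent.system_prompt or parent.history.
    prompt = f"Objective: {objective}. Return a short summary."
    result = run_subagent(prompt)
    parent.history.append(f"subagent result: {result}")  # only the final message comes back
    return result
```

A practical consequence is that a vague prompt cannot be rescued by context the parent already holds, since the child never sees it.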

The three tools for long tasks

Compaction. When a session approaches the context limit, summarize the useful state and restart with a smaller window. Claude Code preserves architecture decisions, unresolved bugs, and implementation details while dropping redundant tool output.^[3]
Agentic memory. Store notes outside the context window, for example in a local notes file, and read them back only when needed.
Subagent isolation itself. A focused worker may spend tens of thousands of tokens in a clean window, but only return a 1-2k token summary. The architecture is also a context-management strategy.
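
As a rough illustration of the compaction step, the sketch below replaces older messages with one summary once an estimated token count passes a limit. It rests on assumptions: `estimate_tokens` uses a crude four-characters-per-token guess rather than a real tokenizer, and `summarize` is a hypothetical stand-in for a model call.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token). An assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def summarize(messages: list) -> str:
    """Hypothetical stand-in for the model call that writes the summary."""
    return f"[summary of {len(messages)} earlier messages]"


def compact(messages: list, limit: int, keep_recent: int = 2) -> list:
    """If the estimated total exceeds the limit, fold older messages into one summary."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages                      # under the limit: nothing to do
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent       # recent turns are kept verbatim
```

Keeping the most recent turns verbatim is one possible design choice; what to preserve in the summary is the part that needs real tuning.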

The API layer productizes these ideas. Context editing removes stale tool calls and results near the limit. Memory tools store information outside the window. Server-side compaction summarizes earlier context as the limit approaches. Anthropic reports that memory tools plus context editing improved agent retrieval performance by 39%, and that context editing reduced token usage by 84% in a 100-turn web retrieval evaluation.^[5]

04 / Concurrency: Parallel Tool Use, Fan-Out, and Dynamic Workflows

Parallelism appears at two layers. The lead agent can launch 3-5 subagents at once, and each subagent can make 3 or more parallel tool calls in a single turn. Anthropic says this can cut research time for complex queries by up to 90%.^[2]

Metric	Meaning
3-5	Typical number of subagents launched by the lead at once.
16	Maximum concurrent agents in Dynamic Workflows.
1000	Maximum total agents in one Dynamic Workflow run.

At the API layer, Claude can emit multiple tool_use blocks in the same assistant turn. Applications should return all matching tool_result blocks inside a single user message. Splitting them across multiple messages can reduce future parallelism.^[8] The calls within one turn are independent of each other, so applications can run them concurrently with Promise.all or asyncio.gather and match each result to its call by ID.

python

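import asyncio

# The assistant emits multiple tool_use blocks in one turn.
# The app executes them in parallel, then returns all results in one user message.
# run_tool and tool_result are app-defined helpers; this runs inside an async function.
calls = assistant_turn.tool_use_blocks
results = await asyncio.gather(*[run_tool(call) for call in calls])
# gather preserves input order, so zip pairs each result with its originating call.
messages.append({
    "role": "user",
    "content": [tool_result(c.id, r) for c, r in zip(calls, results)]
})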

How many agents can you run? At the API level, there is no model-imposed hard limit on subagents. Your constraints are API rate limits and infrastructure. Anthropic rate-limits organizations across requests per minute, input tokens per minute, and output tokens per minute, with token-bucket behavior. Exceeding a limit returns a 429 response, which is distinct from a 529 capacity error. Dynamic Workflows, introduced as a research preview alongside Claude Opus 4.8 in May 2026, adds a product shape with hard caps (the 16 and 1000 in the table above): Claude writes a JavaScript orchestration script that runs in a background runtime, with plans living in script variables instead of Claude's active context, and with results flowing back at the end.^[12]
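
On the application side, these limits usually translate into a bounded fan-out with retry. The sketch below is a minimal version under stated assumptions: `RateLimited` is a hypothetical exception standing in for whatever your API client raises on a 429, and the caps reuse the 16 and 1000 figures from the table above purely as illustrative defaults.

```python
import asyncio
import random

MAX_CONCURRENT = 16   # illustrative default, taken from the table above
MAX_TOTAL = 1000      # illustrative default, taken from the table above


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 raised by an API client."""


async def call_with_backoff(task, retries: int = 4, base_delay: float = 0.01):
    """Retry a zero-argument coroutine function on RateLimited, with jittered backoff."""
    for attempt in range(retries + 1):
        try:
            return await task()
        except RateLimited:
            if attempt == retries:
                raise  # give up after the last retry
            await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))


async def fan_out(tasks, max_concurrent: int = MAX_CONCURRENT):
    """Run all tasks with at most max_concurrent in flight; results keep input order."""
    if len(tasks) > MAX_TOTAL:
        raise ValueError(f"{len(tasks)} tasks exceeds the cap of {MAX_TOTAL}")
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(task):
        async with sem:  # blocks while max_concurrent tasks are already running
            return await call_with_backoff(task)

    return await asyncio.gather(*(guarded(t) for t in tasks))
```

Each task is passed as a zero-argument coroutine function rather than a coroutine object, so a failed attempt can be called again on retry.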

05 / SDK: The Engineering Skeleton of Claude Agent SDK

The Claude Agent SDK, formerly the Claude Code SDK, exposes the shell that powers Claude Code: agent loop, built-in tools, subagent spawning, and MCP integration. The loop is simple: collect context -> act -> verify -> repeat until the task is done.^[4]

python

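from claude_agent_sdk import AgentDefinition

# Use cheaper Sonnet workers and reserve Opus for strict review.
# SUBAGENT_SYSTEM_PROMPT and ADVERSARIAL_VERIFY_PROMPT are prompt strings defined elsewhere.
agents = {
    "researcher": AgentDefinition(
        description="Do first-hand research for one focused objective",
        prompt=SUBAGENT_SYSTEM_PROMPT,
        tools=["WebSearch", "Read"],
        model="sonnet",
        max_turns=8,
    ),
    "reviewer": AgentDefinition(
        # description is required: Claude uses it to decide when to delegate.
        description="Adversarially verify findings returned by researchers",
        prompt=ADVERSARIAL_VERIFY_PROMPT,
        model="opus",
    ),
}
# Put Agent in allowedTools so delegation is preapproved.
# Subagents cannot spawn their own subagents.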

Spawning and definition. Subagents can be defined through the agents parameter to query(), markdown files under .claude/agents/, or the built-in general-purpose agent. Claude invokes them through the Agent tool. One rule matters: subagents cannot spawn their own subagents.^[6]
Resume support. Capturing session_id + agentId allows a run to resume with full history. The worker trace lives in an independent file, so it can survive parent-session compaction.
Permissions. Modes include default, acceptEdits, plan, bypassPermissions, and dontAsk. Evaluation proceeds through hooks, deny rules, allow rules, ask rules, permission mode, callback, and post-tool hooks. Deny rules remain highest priority even in bypass mode.

06 / Contract: Structured Outputs Make Worker Results Parseable

Subagent results need to survive JSON.parse. That is what Structured Outputs and strict tool calling provide. The mechanism is constrained decoding: the API compiles your schema, caches it, and constrains every generated token so the tool name and input shape match the schema.^[9]

Without strict mode, a model may return 2 as "2", omit a required field, or produce an invalid enum value. That is why the example workflow uses RESEARCH_SCHEMA with additionalProperties: false and a complete required list. It is not a comment for humans. It is a contract for the decoder. The tradeoff is a first-request grammar compilation delay, and some constraints still need post-validation.
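
To make the contract concrete, here is a hedged sketch of what such a schema and its post-validation might look like. The article's actual RESEARCH_SCHEMA is not shown, so the field names below (`status`, `confidence`, `source_url`) are assumptions, and `post_validate` is a hypothetical helper for the checks that a grammar cannot express, such as numeric ranges.

```python
import json

# Hypothetical schema: the field names are illustrative, not the article's real ones.
RESEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["active", "renamed", "moved", "unknown"]},
        "confidence": {"type": "number"},
        "source_url": {"type": "string"},
    },
    "required": ["status", "confidence", "source_url"],
    "additionalProperties": False,
}


def post_validate(raw: str) -> dict:
    """Parse a worker result and apply checks on top of the schema's shape."""
    data = json.loads(raw)
    missing = [k for k in RESEARCH_SCHEMA["required"] if k not in data]
    extra = [k for k in data if k not in RESEARCH_SCHEMA["properties"]]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    if data["status"] not in RESEARCH_SCHEMA["properties"]["status"]["enum"]:
        raise ValueError(f"invalid status: {data['status']!r}")
    confidence = data["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number, not a string")
    if not 0.0 <= confidence <= 1.0:          # a range check done after decoding
        raise ValueError("confidence must be between 0 and 1")
    if not data["source_url"].startswith(("http://", "https://")):
        raise ValueError("source_url must be an http(s) URL")
    return data
```

With strict mode on, the shape checks should rarely fire; they are kept here so the same function also protects code paths where strict mode is unavailable.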

07 / Ledger: Token Economics Is the Deciding Factor

Multi-agent systems work partly because they spend enough tokens on the problem.^[2]

In Anthropic's BrowseComp attribution analysis, three factors explained 95% of performance variance, and token usage alone explained roughly 80%.^[2] The other two factors were tool-call count and model choice. On the cost side, an agent can use about 4x the tokens of a normal chat, and a multi-agent system about 15x.^[2]

When is multi-agent worth it?

Worth it: high-value tasks that can be parallelized, exceed one context window, and require many complex tools.

Not worth it: tasks that require all agents to share the same context or depend heavily on each other. Most coding tasks fall here; the truly parallel portion is usually smaller than in research.

A useful rule of thumb, from Barry Zhang, is that a roughly 10-cent task budget gives you about 30k-50k tokens. That is workflow territory. Upgrade to agents when the task is too ambiguous for a predefined decision tree and valuable enough to amortize the token cost.
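
The arithmetic behind these figures is easy to run yourself. The sketch below takes the 4x and 15x multipliers and the 10-cent rule of thumb from the text; the 2,000-token chat baseline and the $3 blended price per million tokens are assumptions chosen only to make the numbers concrete.

```python
def implied_price_per_million(budget_usd: float, tokens: int) -> float:
    """Blended price per million tokens implied by a budget and a token count."""
    return budget_usd / tokens * 1_000_000


def task_cost(chat_tokens: int, price_per_million: float, multiplier: float) -> float:
    """Cost of one task that uses `multiplier` times the tokens of a plain chat."""
    return chat_tokens * multiplier * price_per_million / 1_000_000


if __name__ == "__main__":
    # 10 cents for 30k-50k tokens implies roughly $2.00 to $3.33 per million tokens.
    for tokens in (30_000, 50_000):
        print(tokens, round(implied_price_per_million(0.10, tokens), 2))

    # Assumed baseline: a 2,000-token chat at an assumed $3 per million tokens.
    for label, multiplier in (("chat", 1), ("agent", 4), ("multi-agent", 15)):
        print(label, round(task_cost(2_000, 3.0, multiplier), 4))
```

Under those assumptions a multi-agent run of a 2,000-token chat lands at about 9 cents, which is already the whole rule-of-thumb budget, so the pattern only pays off when the task's value is well above that.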

08 / Resilience: Reliability Engineering for Compounding Errors

Anthropic's warning is direct: agents are stateful, and errors compound.^[2] A small system failure can become catastrophic inside a long-running agent loop. Production work needs these guardrails:

Resumable execution and checkpoints: resume from the failed point instead of starting over; combine model adaptability with deterministic checkpoints and retry logic.
Graceful tool failure handling: tell the agent that a tool failed and let it adapt. This often works better than hiding the failure.
End-to-end tracing: agent behavior is nondeterministic across runs even with the same prompt, so debugging needs production-grade tracing and decision-pattern monitoring.
Rainbow deployment: keep old and new versions live during rollout so code changes do not break long-running in-flight agents.
Use isError instead of throwing for recoverable tool failures: an uncaught exception ends the loop; a structured error lets the agent continue. Use max_turns and budget ceilings to contain runaway costs.
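
Two of these guardrails fit in a few lines each. The sketch below is a simplified illustration, not SDK code: `safe_tool` and `run_steps` are hypothetical helpers, the `is_error` key mirrors the isError idea in a plain dict, and each step is assumed to be a callable that returns a `(result, tokens_used)` pair.

```python
import json
import os


def safe_tool(fn):
    """Wrap a tool so recoverable failures come back as data instead of exceptions."""
    def wrapper(*args, **kwargs):
        try:
            return {"is_error": False, "content": fn(*args, **kwargs)}
        except Exception as exc:
            # The agent sees the failure and can adapt; the loop keeps running.
            return {"is_error": True, "content": f"{type(exc).__name__}: {exc}"}
    return wrapper


def run_steps(steps, checkpoint_path, max_turns=8, token_budget=50_000):
    """Run steps in order, checkpointing after each so a restart resumes where it stopped."""
    state = {"next": 0, "tokens": 0, "results": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)                 # resume from the last completed step
    turns = 0
    while state["next"] < len(steps):
        if turns >= max_turns or state["tokens"] >= token_budget:
            break                                # ceiling hit: stop, progress is on disk
        result, tokens_used = steps[state["next"]]()
        state["results"].append(result)
        state["tokens"] += tokens_used
        state["next"] += 1
        turns += 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)                  # deterministic checkpoint after each step
    return state
```

Calling `run_steps` again with the same checkpoint path continues from the first unfinished step, so completed work is not paid for twice.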

09 / Prompting: The Useful Parts of Orchestrator Prompt Engineering

Anthropic frames the best prompt as a collaboration framework rather than a rigid instruction set.^[2] It defines roles, problem-solving paths, and effort budgets. The most practical rules are:

Think like your agent. Build a simulator with the same prompt and tools, step through it, and watch failure modes appear.
Teach the orchestrator how to delegate. Each worker needs one objective, an output format, tool and source guidance, and a clear boundary. Vague instructions create duplicated work and blind spots.
Scale effort by complexity. Encode scaling rules in the prompt: simple = 1 subagent, standard = 2-3, medium = 3-5, high = 5-10, with 3 as the default.^[11]
Use thinking as controllable scratch paper. The lead uses extended thinking for planning; workers use interleaved thinking after each tool result to evaluate quality and adjust the next query.
Tool descriptions are signposts. Poor descriptions lead agents down the wrong path. Anthropic reports that a tool-description optimizer agent cut completion time by 40% in one setting.^[2]
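
The scaling rule and the delegation checklist above can also be enforced in code around the prompt. The sketch below is one possible encoding: the ranges and the default of 3 come from the text, while `subagent_count` and `delegation_brief` are hypothetical helpers and the brief's wording is an assumption.

```python
from typing import Optional

# Ranges taken from the scaling rule in the text.
SUBAGENT_RANGE = {"simple": (1, 1), "standard": (2, 3), "medium": (3, 5), "high": (5, 10)}
DEFAULT_SUBAGENTS = 3


def subagent_count(complexity: Optional[str], requested: Optional[int] = None) -> int:
    """Clamp the lead's requested worker count to the range for this complexity tier."""
    if complexity not in SUBAGENT_RANGE:
        return DEFAULT_SUBAGENTS              # unknown tier: fall back to the default
    low, high = SUBAGENT_RANGE[complexity]
    if requested is None:
        return low
    return max(low, min(high, requested))


def delegation_brief(objective: str, output_format: str, tools: list, boundary: str) -> str:
    """Build a worker prompt with the four elements each delegation needs."""
    return "\n".join([
        f"Objective: {objective}",
        f"Output format: {output_format}",
        f"Tools and sources: {', '.join(tools)}",
        f"Out of scope: {boundary}",
    ])
```

Clamping in code is a backstop, so a lead that asks for 50 workers on a simple query is corrected before any tokens are spent.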

10 / Example: A Real Claude Code Workflow

Here is a concrete version of the pattern from a real Claude Code task: check a list of organizations, one by one, to confirm whether each is still active, renamed, or operating under a new website. The code is short, but it uses many of the ideas above:

javascript

phase('Research')
const researched = (await parallel(
  funds.map((f) => async () => {
    const r = await agent(prompt(f), { schema: RESEARCH_SCHEMA })
    return r ? { id: f.id, name: f.name, ...r } : null   // id/name come from code, not the model
  }),
)).filter(Boolean)                                        // failed agents return null and are filtered out

Parallel fan-out: parallel(funds.map(...)) gives each organization to one AI worker and runs them together instead of queueing them.
Truth isolation: id / name are reattached by code and never requested from the model, reducing mix-ups and hallucination.
Structured output: schema locks the AI answer into a fixed form that downstream code can parse.
Fault tolerance: one failed worker returns null and is filtered out, rather than taking down the whole batch.

The second half of the script sends only the organizations with detected changes to another AI for adversarial review. That is role separation in practice. The hard part is not writing this kind of workflow. The hard part is making it stable, observable, controllable, and cost-bounded in production.
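
The second half of the script is not shown, so the following is only a guess at its shape, written in Python for consistency with the earlier sketches. The `status` field, the rule that anything other than "active" counts as a change, and the `review` stand-in for the adversarial reviewer are all assumptions.

```python
import asyncio


async def review(item: dict) -> dict:
    """Hypothetical stand-in for the adversarial reviewer agent."""
    await asyncio.sleep(0)  # placeholder for the model call
    return {**item, "verified": item["status"] != "unknown"}


async def review_changed(researched: list) -> list:
    """Send only records with a detected change to the reviewer."""
    changed = [r for r in researched if r["status"] != "active"]
    results = await asyncio.gather(
        *(review(r) for r in changed),
        return_exceptions=True,   # one failed review does not take down the batch
    )
    return [r for r in results if not isinstance(r, BaseException)]
```

Filtering before review keeps the more expensive reviewer model off the records that need no second opinion.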

References

[1] Building effective agents. Anthropic Engineering, December 2024. Workflow vs agent, five workflow patterns. anthropic.com/research/building-effective-agents
[2] How we built our multi-agent research system. Anthropic Engineering, June 2025. Orchestrator-worker, CitationAgent, 90.2% / 15x / 80% variance, reliability engineering. anthropic.com/engineering/multi-agent-research-system
[3] Effective context engineering for AI agents. Anthropic Engineering, September 2025. Context rot, compaction, notes, subagents. anthropic.com/engineering/effective-context-engineering-for-ai-agents
[4] Building agents with the Claude Agent SDK. Anthropic Engineering, September 2025. Agent loop, AgentDefinition, MCP. anthropic.com/engineering/building-agents-with-the-claude-agent-sdk
[5] Managing context on the Claude Developer Platform. Anthropic, September 2025. Context editing, memory tools, server-side compaction.

Note: figures such as 90.2%, 15x, and 80% come from specific mid-2025 evaluations with Opus 4 and Sonnet 4. They may not generalize to every task or newer models. Dynamic Workflows and some context-management features are beta or research-preview capabilities, so availability may vary by plan and platform.
