Claude Certification
Agentic Architecture & Orchestration
Lesson 4 · 6 min

Parallel Fan-Out

Map-reduce over Claude calls when the work shards cleanly.

If the task decomposes into independent shards (e.g. summarizing 100 documents), run the shards in parallel and then reduce the results. Watch for rate limits: use the Messages Batch API for work that can wait, or a concurrency limiter around your SDK calls when you need results quickly.
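
To make the batch route concrete, here is a minimal sketch of the map step expressed as batch request entries, each pairing a caller-chosen `custom_id` with ordinary Messages API parameters. The `Filing` shape, the `buildBatchRequests` helper, the prompt wording, and the `max_tokens` value are illustrative assumptions, not part of the lesson.

```typescript
// Hypothetical shape of a filing; the lesson only implies a `text` field.
interface Filing {
  id: string;
  text: string;
}

// One batch entry: a caller-chosen ID plus ordinary Messages API parameters.
interface BatchRequest {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user"; content: string }[];
  };
}

// Map step: turn each independent shard into one self-contained request.
function buildBatchRequests(filings: Filing[], model: string): BatchRequest[] {
  return filings.map(
    (f): BatchRequest => ({
      custom_id: `filing-${f.id}`, // used later to match results back to inputs
      params: {
        model,
        max_tokens: 1024, // assumed output budget for one summary
        messages: [{ role: "user", content: `Summarize:\n${f.text}` }],
      },
    }),
  );
}

// Placeholder filings, for illustration only.
const requests = buildBatchRequests(
  [
    { id: "AAPL", text: "..." },
    { id: "MSFT", text: "..." },
  ],
  "claude-sonnet-4-6",
);
console.log(requests.map((r) => r.custom_id));
```

An array like this would be submitted through the SDK's batch creation call (assumed here to be `messages.batches.create`). Results arrive asynchronously and are matched back to filings by `custom_id`, so no request may depend on another's output.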

Production scenario

Real-world example: Earnings season at an equity research desk

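A research desk needs an executive summary of every S&P 500 10-K filing within 48 hours of release. Sequential summarization at 30 seconds per filing takes more than 4 hours (500 × 30 s ≈ 4.2 hours). Parallel fan-out with 20 requests in flight cuts that to about 13 minutes (25 waves of roughly 30 seconds each). The Messages Batch API is an alternative when throughput matters more than latency, since batches are processed asynchronously.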

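import Anthropic from "@anthropic-ai/sdk";

const claude = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Fan out: one request per filing, all started at once.
const summaries = await Promise.all(
  filings.map((f) =>
    claude.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024, // required by the Messages API; value chosen for illustration
      messages: [{ role: "user", content: `Summarize:\n${f.text}` }],
    }),
  ),
);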

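In production, cap concurrency at 20 in-flight requests to stay inside rate limits, and retry 429 responses with exponential backoff. A bare Promise.all like the one above applies neither: it starts every request at once.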
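
One way to get both behaviors is a small pair of helpers: one bounds the number of in-flight calls, the other retries on 429. This is a sketch under stated assumptions. The names `mapWithConcurrency` and `withBackoff`, the retry count, and the base delay are hypothetical, and the code assumes a rate-limited call throws an error carrying a numeric `status` field.

```typescript
// Retry `fn` when it fails with HTTP 429, doubling the delay each attempt.
// Assumption: the thrown error exposes a numeric `status` property.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number } | null)?.status;
      // Anything other than a rate limit, or too many attempts: give up.
      if (status !== 429 || attempt >= maxRetries) throw err;
      // Exponential delay with jitter so retries do not synchronize.
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Run `worker` over `items` with at most `limit` calls in flight,
// returning results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With the lesson's numbers, the fan-out would read roughly `mapWithConcurrency(filings, 20, (f) => withBackoff(() => claude.messages.create({ ... })))`. The official SDK also has a built-in retry setting (`maxRetries`), so an explicit wrapper like this is mainly useful when you want control over the delay schedule.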

Why this matters: the *shards* are independent (one filing in, one summary out, no shared context). That's the precise shape parallel fan-out was designed for.

Knowledge points in this lesson
  • Map-reduce LLM calls over independent shards
  • Watch rate limits with batching or concurrency caps
  • Shared heavy context can dominate cost
  • Reduce step combines worker outputs
  • Use Messages Batch API for large volumes
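
The reduce step in the list above can be sketched as a pure function that folds worker outputs into the prompt for one final call and reports which shards failed. The `ShardResult` shape, the `buildReducePrompt` name, and the prompt wording are assumptions made for illustration.

```typescript
// Output of one map-step worker; `summary` is null when the shard failed.
interface ShardResult {
  id: string;
  summary: string | null;
}

// Reduce step: combine the successful summaries into one prompt and
// list the shards that still need a retry.
function buildReducePrompt(results: ShardResult[]): { prompt: string; missing: string[] } {
  const ok = results.filter(
    (r): r is ShardResult & { summary: string } => r.summary !== null,
  );
  const missing = results.filter((r) => r.summary === null).map((r) => r.id);
  const body = ok.map((r) => `## ${r.id}\n${r.summary}`).join("\n\n");
  return {
    prompt: `Combine these per-filing summaries into one briefing:\n\n${body}`,
    missing,
  };
}

// Placeholder inputs, for illustration only.
const reduced = buildReducePrompt([
  { id: "AAPL", summary: "placeholder summary A" },
  { id: "MSFT", summary: null },
]);
console.log(reduced.missing);
```

Keeping the reduce input to short summaries rather than full filings also keeps the final call small, which matters because large shared context is paid for on every request that carries it.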

Quick check

Agentic Architecture · Select one
You're building an agent that runs across many turns. What should you persist between turns?