Lesson 4 · 6 min

Parallel Fan-Out

Map-reduce over Claude calls when the work shards cleanly.

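If the task decomposes into independent shards (e.g. summarizing 100 documents), run the shards in parallel, then reduce the results. Watch for rate limits: use the Message Batches API for work that can wait on asynchronous processing, or a concurrency limiter around the SDK calls for work that needs results right away.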
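
The map and reduce steps can be sketched as follows. This is a minimal illustration rather than a definitive implementation: `mapReduce`, `fakeSummarize`, and `joinSummaries` are hypothetical names, and the worker is a stand-in for a Claude call so the sketch runs without network access.

```typescript
// Hypothetical helper: fan out one async worker call per shard, then reduce.
// `worker` stands in for a single Claude call; `reducer` combines the outputs.
async function mapReduce<T, R, Out>(
  shards: T[],
  worker: (shard: T) => Promise<R>,
  reducer: (results: R[]) => Out,
): Promise<Out> {
  // Map: every shard is independent, so the calls can run concurrently.
  // Promise.all returns results in the same order as the input shards.
  const results = await Promise.all(shards.map((shard) => worker(shard)));
  // Reduce: combine the per-shard outputs into a single value.
  return reducer(results);
}

// Stand-in worker for illustration only; a real one would call the model.
async function fakeSummarize(doc: string): Promise<string> {
  return doc.slice(0, 10);
}

// Reducer that joins the per-document summaries into one numbered digest.
function joinSummaries(summaries: string[]): string {
  return summaries.map((s, i) => `${i + 1}. ${s}`).join("\n");
}
```

In a real pipeline the reducer could itself be another model call that condenses the per-shard summaries; here it is a plain string join to keep the shape of the pattern visible.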

Production scenario

Real-world example: Earnings season at an equity research desk

A research desk needs an executive summary of every S&P 500 10-K filing within 48 hours of release. Sequential summarization at 30 seconds per filing takes more than 4 hours (500 × 30 s ≈ 4.2 hours). Parallel fan-out with 20 requests in flight finishes all 500 in about 12.5 minutes (25 waves of 20 requests at 30 seconds each), before any retry delays.

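import Anthropic from "@anthropic-ai/sdk";

const claude = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Promise.all starts one request per filing at once; it applies no concurrency cap itself.
const summaries = await Promise.all(
  filings.map((f) =>
    claude.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024, // required by the Messages API; 1024 is an illustrative value
      messages: [{ role: "user", content: ["Summarize:", f.text].join("\n") }],
    }),
  ),
);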

In production, a limiter wraps these calls so that at most 20 requests are in flight at a time, which keeps the job inside rate limits, and 429 responses are retried with exponential backoff.
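
One way to get a cap and backoff like the one described above is a small worker-pool helper. The sketch below is an assumption-laden illustration, not the desk's actual code: `mapWithConcurrency`, `withBackoff`, and the `HttpError` shape are hypothetical, and the retry logic assumes a rate-limited call throws an error carrying an HTTP `status` of 429.

```typescript
// Hypothetical error shape: assumes the thrown error exposes an HTTP `status`.
type HttpError = { status?: number };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `fn` on 429 responses, doubling the delay after each failed attempt.
// Any other error, or a 429 after `maxRetries` retries, is rethrown.
async function withBackoff<R>(
  fn: () => Promise<R>,
  maxRetries = 5,
  baseDelayMs = 1000,
): Promise<R> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as HttpError | undefined)?.status;
      if (status !== 429 || attempt >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await withBackoff(() => worker(items[i]));
    }
  }
  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner(),
  );
  await Promise.all(runners);
  return results;
}
```

With the scenario's numbers, the call would look like `mapWithConcurrency(filings, 20, (f) => claude.messages.create({ ... }))`. The official TypeScript SDK also retries some failures on its own through its `maxRetries` client option, so the explicit backoff here is illustrative and may be redundant in practice.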

Why this matters: the *shards* are independent (one filing in, one summary out, no shared context). That's the precise shape parallel fan-out was designed for.

Knowledge points in this lesson

Map-reduce LLM calls over independent shards
Watch rate limits with batching or concurrency caps
Shared heavy context can dominate cost
Reduce step combines worker outputs
Use the Message Batches API for large volumes
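
For the batch route named in the last point, each shard becomes one entry in a single batch request. The sketch below only builds that request list so it can run offline; `Filing`, `BatchRequest`, and `buildBatchRequests` are hypothetical names, the `max_tokens` value is an assumption, and the submission call appears only as a comment.

```typescript
// Hypothetical input shape for this sketch.
interface Filing {
  id: string;
  text: string;
}

// One entry in a batch: a caller-chosen `custom_id` plus the same
// params a single Messages API call would take.
interface BatchRequest {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user"; content: string }[];
  };
}

function buildBatchRequests(filings: Filing[], model: string): BatchRequest[] {
  return filings.map((f): BatchRequest => ({
    custom_id: `filing-${f.id}`, // used to match each result back to its filing
    params: {
      model,
      max_tokens: 1024, // assumed cap for an executive summary
      messages: [{ role: "user", content: `Summarize:\n${f.text}` }],
    },
  }));
}

// Submission with the official SDK would then look roughly like:
//   const batch = await client.messages.batches.create({ requests });
// Batches are processed asynchronously, and results are not guaranteed to
// come back in input order, so match them up by `custom_id`.
```

The trade-off against a concurrency limiter is latency: a batch is submitted once and polled for completion, so it suits large volumes that can wait rather than work needing an immediate answer.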

Quick check

Agentic Architecture · Select one

You're building an agent that runs across many turns. What should you persist between turns?