Parallel Fan-Out
Map-reduce over Claude calls when the work shards cleanly.
If the task decomposes into independent shards (e.g. summarize 100 documents), run the shards in parallel, then reduce their outputs. Watch for rate limits: either submit the shards through the Messages Batch API or cap concurrency with a client-side limiter.
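A dependency-free sketch of the map and reduce steps follows. It assumes each shard's model call is wrapped in an async `worker` function; `fanOutReduce` is a hypothetical helper name, not an SDK API. A fixed pool of `limit` runners pulls the next unclaimed shard, so no more than `limit` calls are in flight at once, and results keep their input order.

```typescript
// Hypothetical helper (not an SDK API): map `worker` over `items` with at most
// `limit` calls in flight, then combine the results with `reduce`.
async function fanOutReduce<T, R, A>(
  items: readonly T[],
  worker: (item: T, index: number) => Promise<R>,
  limit: number,
  reduce: (results: R[]) => A,
): Promise<A> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims one item at a time until none are left. Claiming
  // (`next++`) happens synchronously, so two runners never take the same item.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  // Start at most `limit` runners; each holds at most one call in flight.
  const pool = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(pool);
  return reduce(results); // results[i] corresponds to items[i]
}
```

In the earnings example, `worker` would wrap one `messages.create` call per filing and `reduce` could concatenate or index the summaries. If any worker rejects, `Promise.all` rejects too, so pair the worker with retry logic when transient failures are expected.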
Real-world example: Earnings season at an equity research desk
A research desk needs an executive summary of every S&P 500 10-K filing within 48 hours of release. Summarizing sequentially at 30 seconds per filing takes over 4 hours for 500 filings. Fanning the calls out with 20 requests in flight cuts that to under 13 minutes.
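The timing figures follow from a simple wave model. This sketch assumes every call takes the same 30 seconds and that calls run 20 at a time (the cap used in this example); real latencies vary per filing, so treat the result as an estimate. `wallClockSeconds` is a hypothetical name.

```typescript
// Back-of-envelope model (an assumption, not a measurement): every call takes
// `secondsPerCall`, and `concurrency` calls run at once, so the work proceeds
// in ceil(n / concurrency) waves.
function wallClockSeconds(n: number, secondsPerCall: number, concurrency = 1): number {
  return Math.ceil(n / concurrency) * secondsPerCall;
}

const sequential = wallClockSeconds(500, 30); // 15,000 s
const fannedOut = wallClockSeconds(500, 30, 20); // 25 waves x 30 s = 750 s
console.log(`${(sequential / 3600).toFixed(1)} h vs ${(fannedOut / 60).toFixed(1)} min`);
// prints "4.2 h vs 12.5 min"
```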
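import Anthropic from "@anthropic-ai/sdk";
import pLimit from "p-limit";
const claude = new Anthropic({ maxRetries: 5 }); // SDK retries 429s with exponential backoff
const limit = pLimit(20); // at most 20 requests in flight

const summaries = await Promise.all(
  filings.map((f) =>
    limit(() =>
      claude.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024, // required by the Messages API
        messages: [{ role: "user", content: `Summarize:\n${f.text}` }],
      }),
    ),
  ),
);
Concurrency is capped at 20 in-flight requests (here with the p-limit package, one of several ways to do it) to stay inside rate limits, and the SDK's built-in retries back off exponentially on 429 responses.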
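The Anthropic TypeScript SDK already retries 429s with a short exponential backoff, tunable through `maxRetries`. For explicit control, a hand-rolled wrapper might look like the sketch below. It assumes rate-limit errors carry a numeric `status` field, as the SDK's `APIError` does; `withBackoff` and its default delays are illustrative choices, not SDK values.

```typescript
// Simplified error shape: only the HTTP status is inspected.
type HttpError = { status?: number };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `call` on HTTP 429 with exponential backoff and full jitter.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as HttpError)?.status;
      // Rethrow anything that is not a rate limit, or once retries run out.
      if (status !== 429 || attempt >= maxRetries) throw err;
      // Full jitter: wait a random time in [0, baseDelayMs * 2^attempt) so that
      // parallel workers do not all retry in lockstep.
      await sleep(Math.random() * baseDelayMs * 2 ** attempt);
    }
  }
}
```

Each worker in the fan-out would wrap its own call, e.g. `limit(() => withBackoff(() => claude.messages.create(...)))`, so a throttled shard waits without blocking the others.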
Why this matters: the *shards* are independent (one filing in, one summary out, no shared context). That's the precise shape parallel fan-out was designed for.
- Map-reduce LLM calls over independent shards
- Watch rate limits with batching or concurrency caps
- Shared heavy context can dominate cost
- Reduce step combines worker outputs
- Use Messages Batch API for large volumes
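For large volumes where results can wait, the Message Batches API accepts the whole fan-out as one asynchronous job. The sketch below covers only the pure parts: building one request per filing, and a reduce step that indexes results by `custom_id`, since results may come back in a different order than the requests. The types are hand-written, simplified stand-ins for the SDK's own, and `Filing`, `toBatchRequests`, and `collectSummaries` are hypothetical names.

```typescript
// Simplified shape of one Message Batch request entry.
interface BatchRequest {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user"; content: string }[];
  };
}

interface Filing { id: string; text: string } // hypothetical input type

// Map step: one batch entry per filing. custom_id must be unique within a batch.
function toBatchRequests(filings: Filing[], model: string, maxTokens = 1024): BatchRequest[] {
  const seen = new Set<string>();
  return filings.map((f) => {
    if (seen.has(f.id)) throw new Error(`duplicate custom_id: ${f.id}`);
    seen.add(f.id);
    return {
      custom_id: f.id,
      params: {
        model,
        max_tokens: maxTokens,
        messages: [{ role: "user", content: `Summarize:\n${f.text}` }],
      },
    };
  });
}

// With the SDK, the array would be submitted via
// client.messages.batches.create({ requests }), polled with
// client.messages.batches.retrieve(id) until processing_status is "ended",
// then streamed with client.messages.batches.results(id).

// Simplified view of one results entry; only the fields used below.
type BatchResultEntry = {
  custom_id: string;
  result:
    | { type: "succeeded"; message: { content: { type: string; text?: string }[] } }
    | { type: "errored" | "canceled" | "expired" };
};

// Reduce step: join each succeeded message's text blocks, keyed by custom_id,
// and collect the ids that need resubmitting.
function collectSummaries(entries: BatchResultEntry[]): { summaries: Map<string, string>; failed: string[] } {
  const summaries = new Map<string, string>();
  const failed: string[] = [];
  for (const e of entries) {
    if (e.result.type === "succeeded") {
      const text = e.result.message.content
        .filter((b) => b.type === "text")
        .map((b) => b.text ?? "")
        .join("");
      summaries.set(e.custom_id, text);
    } else {
      failed.push(e.custom_id);
    }
  }
  return { summaries, failed };
}
```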
