Lesson 1 · 6 min

Token Budgets

Working within context windows without truncating critical state.

Sonnet 4.6 has 200K (or 1M) context. Plan for ~70% of the window so you have headroom for outputs and tool results. Track tokens per message section so you know where to cut first.

Production scenario

Real-world example: Long-running research agent

A research agent compiles a 30-page brief over a multi-hour session, calling search and document-fetch tools dozens of times. Without a budget plan, it blows past Sonnet's 200K context by hour two.

The plan: keep the system + brief skeleton + working notes under 140K (70% of 200K) at all times. When the budget tracker passes 70%, the agent compacts older tool results into a shorter "what we learned" summary block.

[ system + role ]               ~ 2K
[ working brief skeleton ]      ~ 10K
[ summarized older steps ]      ~ 30K
[ recent 20 turns verbatim ]    ~ 80K
[ headroom for tool results ]   ~ 60K

Why this matters: target ~70% utilization and account for tool-result blocks you can't size in advance. Headroom is what keeps the agent from face-planting on a long-running session.

Knowledge points in this lesson

Target about 70% window utilization
Leave headroom for outputs and tool results
Track tokens per section of the prompt
Sonnet 4.6 has 200K window standard
Opus 4.7 offers an optional 1M window
Cut from the biggest section first

Quick check

Context & ReliabilitySelect one

How does Claude Code's auto-compaction work?