Claude Certification
Context Management & Reliability
Lesson 1 · 6 min

Token Budgets

Working within context windows without truncating critical state.

Sonnet 4.6 has a 200K-token context window (or 1M). Plan to fill only about 70% of the window so you keep headroom for outputs and tool results. Track tokens per message section so you know which section to cut first.
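Per-section tracking can be sketched as below. This is a minimal illustration, not a prescribed implementation: the section names are hypothetical, and `estimate_tokens` uses a rough 4-characters-per-token heuristic as a stand-in for a real tokenizer or token-counting API.

```python
WINDOW_TOKENS = 200_000     # window size used in this lesson
TARGET_UTILIZATION = 0.70   # plan for about 70% of the window


def estimate_tokens(text: str) -> int:
    """Rough stand-in (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def section_report(sections: dict[str, str]) -> dict[str, int]:
    """Estimated token count for each named prompt section."""
    return {name: estimate_tokens(body) for name, body in sections.items()}


def utilization(report: dict[str, int], window: int = WINDOW_TOKENS) -> float:
    """Fraction of the window currently used by all sections."""
    return sum(report.values()) / window


def over_budget(report: dict[str, int], window: int = WINDOW_TOKENS) -> bool:
    """True once usage passes the 70% target."""
    return utilization(report, window) > TARGET_UTILIZATION


def first_cut_candidate(report: dict[str, int]) -> str:
    """Name of the biggest section, which is where to cut first."""
    return max(report, key=report.get)


# Hypothetical section contents, sized only for illustration.
sections = {
    "system": "s" * 8_000,
    "working_notes": "n" * 40_000,
    "tool_results": "t" * 400_000,
}
report = section_report(sections)
print(report, utilization(report), first_cut_candidate(report))
```

Keeping the report keyed by section name means the same data answers both questions: how full the window is, and which section to shrink when it gets too full.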

Production scenario

Real-world example: Long-running research agent

A research agent compiles a 30-page brief over a multi-hour session, calling search and document-fetch tools dozens of times. Without a budget plan, it blows past Sonnet's 200K context by hour two.

The plan: keep the system + brief skeleton + working notes under 140K (70% of 200K) at all times. When the budget tracker passes 70%, the agent compacts older tool results into a shorter "what we learned" summary block.
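The compaction step in this plan might look like the following sketch. The `summarize` callable is a hypothetical placeholder for whatever produces the "what we learned" block (in practice, likely a model call), and token counts again use a rough 4-characters-per-token estimate rather than a real tokenizer.

```python
from typing import Callable

BUDGET_TOKENS = 140_000  # 70% of 200K, as in the plan above


def estimate_tokens(text: str) -> int:
    """Rough stand-in (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def compact_if_needed(
    fixed_tokens: int,
    summary: str,
    tool_results: list[str],
    summarize: Callable[[str, str], str],
    budget: int = BUDGET_TOKENS,
) -> tuple[str, list[str]]:
    """Fold the oldest tool results into the summary until under budget.

    fixed_tokens covers sections that are never compacted (system prompt,
    brief skeleton). tool_results is ordered oldest first. Returns the
    updated summary and the tool results that remain verbatim.
    """
    remaining = list(tool_results)

    def total() -> int:
        verbatim = sum(estimate_tokens(r) for r in remaining)
        return fixed_tokens + estimate_tokens(summary) + verbatim

    # Each pass removes one result, so the loop always terminates.
    while remaining and total() > budget:
        oldest = remaining.pop(0)
        summary = summarize(summary, oldest)
    return summary, remaining
```

Compacting oldest-first keeps the most recent tool output verbatim, which is usually what the agent needs for its next step.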

[ system + role ]               ~ 2K
[ working brief skeleton ]      ~ 10K
[ summarized older steps ]      ~ 30K
[ recent 20 turns verbatim ]    ~ 80K
[ headroom for tool results ]   ~ 60K

Why this matters: target ~70% utilization and account for tool-result blocks you can't size in advance. Headroom is what keeps the agent from face-planting on a long-running session.
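The arithmetic behind the layout above can be checked with a short sketch. The figures come from the layout; the helper name `fits_in_headroom` is hypothetical and only illustrates the check an agent could run before appending a tool result.

```python
WINDOW = 200_000
BUDGET = int(WINDOW * 0.70)  # 140_000

# Retained sections from the layout above, in tokens.
LAYOUT = {
    "system + role": 2_000,
    "working brief skeleton": 10_000,
    "summarized older steps": 30_000,
    "recent 20 turns verbatim": 80_000,
}
HEADROOM = 60_000  # reserved for tool results

retained = sum(LAYOUT.values())    # 122_000, under the 140_000 budget
allocated = retained + HEADROOM    # 182_000
unallocated = WINDOW - allocated   # 18_000 left for outputs and slack


def fits_in_headroom(result_tokens: int, headroom: int = HEADROOM) -> bool:
    """True if a tool result of this size fits in the reserved headroom."""
    return result_tokens <= headroom


print(retained, retained <= BUDGET, allocated, unallocated)
print(fits_in_headroom(45_000), fits_in_headroom(75_000))
```

A result that fails the check is a signal to truncate or summarize it before it enters the context, rather than letting it push the retained sections out of the window.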

Knowledge points in this lesson
  • Target about 70% window utilization
  • Leave headroom for outputs and tool results
  • Track tokens per section of the prompt
  • Sonnet 4.6 has a 200K window as standard
  • Opus 4.7 offers an optional 1M window
  • Cut from the biggest section first
Quick check
How does Claude Code's auto-compaction work?