Token Budgets
Working within context windows without truncating critical state.
Sonnet 4.6 has 200K (or 1M) context. Plan for ~70% of the window so you have headroom for outputs and tool results. Track tokens per message section so you know where to cut first.
Real-world example: Long-running research agent
A research agent compiles a 30-page brief over a multi-hour session, calling search and document-fetch tools dozens of times. Without a budget plan, it blows past Sonnet's 200K context by hour two.
The plan: keep the system + brief skeleton + working notes under 140K (70% of 200K) at all times. When the budget tracker passes 70%, the agent compacts older tool results into a shorter "what we learned" summary block.
[ system + role ] ~ 2K
[ working brief skeleton ] ~ 10K
[ summarized older steps ] ~ 30K
[ recent 20 turns verbatim ] ~ 80K
[ headroom for tool results ] ~ 60KWhy this matters: target ~70% utilization and account for tool-result blocks you can't size in advance. Headroom is what keeps the agent from face-planting on a long-running session.
- Target about 70% window utilization
- Leave headroom for outputs and tool results
- Track tokens per section of the prompt
- Sonnet 4.6 has 200K window standard
- Opus 4.7 offers an optional 1M window
- Cut from the biggest section first
