Week 1 — Audit
Largest spend turned out to be a "smart questions" feature that re-sent a 40K-token product schema on every customer query. Caching enabled, but the variable user message came first and broke the prefix every time. Cache hit rate sat at 11%.
Week 2 — Static-first ordering
We moved the schema into the system prompt and dropped today's volatile values (user id, tenant id, current date) into a small user-side block. Cache hits climbed past 70% within an afternoon.
Week 3 — Model tiering
The "did the customer mean X?" intent classifier was running on Sonnet 4.6 — overkill. Switched it to Haiku 4.5. Quality drop on the eval set: 1.2 percentage points. Cost on that endpoint dropped 78%.
Week 4 — Compaction inside long agent sessions
Long-running analyst conversations were carrying every tool result verbatim. We added an explicit compaction step every 15 turns that collapses prior tool outputs into a structured "what we learned" briefing.
What stuck
The prompt-cache audit alone paid for the whole project four times over. The right model for the right step. Compaction is selective summarization, not a recap.
