Claude Certification
All case studies
Prompt Engineering · 20% of exam

NorthStar Analytics: cutting $40K/month in LLM cost

A four-week dive into prompt caching, model tiering, and compaction.

Company
NorthStar Analytics (fictional)
Duration
4 weeks
Outcome
Monthly LLM spend $58K → $19K (−67%). Prompt-cache hit rate 11% → 81%. Customer-perceived latency on the busiest endpoint −22%.

Week 1 — Audit

Largest spend turned out to be a "smart questions" feature that re-sent a 40K-token product schema on every customer query. Caching enabled, but the variable user message came first and broke the prefix every time. Cache hit rate sat at 11%.

Week 2 — Static-first ordering

We moved the schema into the system prompt and dropped today's volatile values (user id, tenant id, current date) into a small user-side block. Cache hits climbed past 70% within an afternoon.

Week 3 — Model tiering

The "did the customer mean X?" intent classifier was running on Sonnet 4.6 — overkill. Switched it to Haiku 4.5. Quality drop on the eval set: 1.2 percentage points. Cost on that endpoint dropped 78%.

Week 4 — Compaction inside long agent sessions

Long-running analyst conversations were carrying every tool result verbatim. We added an explicit compaction step every 15 turns that collapses prior tool outputs into a structured "what we learned" briefing.

What stuck

The prompt-cache audit alone paid for the whole project four times over. The right model for the right step. Compaction is selective summarization, not a recap.

Principles applied