Claude Certification
Context Management & Reliability
Lesson 4 · 7 min

Observability

Logging traces, costs, and quality signals.

Log every LLM call with its model, input and output token counts, latency, tool calls, cache hits, and a task ID. Aggregate these logs into dashboards that show cost per task and tail latency (p95/p99). Track quality by sampling outputs for human review.
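As a rough illustration of the per-call log described above, here is a minimal Python sketch that writes one JSON line per call. The class name LLMCallRecord, its field names, and the example values are assumptions made for this sketch; they do not come from any particular SDK or logging library.

```python
import json
import time
from dataclasses import dataclass, asdict, field


@dataclass
class LLMCallRecord:
    """One log entry per LLM call. Field names are illustrative, not a standard."""
    task_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cache_hit: bool
    tool_calls: list = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)


def to_log_line(record: LLMCallRecord) -> str:
    """Serialize the record as a single JSON line for a log pipeline to ingest."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical values; in practice they come from the API response and a timer.
record = LLMCallRecord(
    task_id="task-123",
    model="example-model",
    input_tokens=1200,
    output_tokens=300,
    latency_ms=850.0,
    cache_hit=True,
    tool_calls=["search"],
)
print(to_log_line(record))
```

Keeping the task ID on every line is what lets a later aggregation step group several calls into one cost-per-task figure.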

Production scenario

Real-world example: Catching a regression after a prompt change

A growth team ships a prompt change Monday morning. Their logging captures five signals per call (model, input/output tokens, latency, cache hits, and tool errors), and their dashboard aggregates them into rates and percentiles.

By Monday afternoon, the dashboard shows:

  • input tokens: +14% (cache hit rate dropped from 72% to 38%)
  • p99 latency: +1.6s
  • cost per task: +21%

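Root cause: the new prompt moved a piece of static content from the system block into the user message, breaking the cache prefix. Prompt caching reuses only an identical leading portion of the prompt, so requests stopped matching the previously cached prefix. The team rolls back, and costs return to baseline within an hour.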

Why this matters: observability isn't decoration. Without it, a prompt regression like this one can go unnoticed until it has already damaged your unit economics.
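The kind of check that would surface this regression can be sketched in a few lines of Python. This sketch assumes each logged call carries a boolean cache_hit field and defines the hit rate as the fraction of calls that hit the cache; the function names and the 10-point alert threshold are hypothetical choices, not values from the scenario.

```python
def cache_hit_rate(calls):
    """Fraction of calls that hit the cache; 0.0 for an empty window."""
    if not calls:
        return 0.0
    return sum(1 for c in calls if c["cache_hit"]) / len(calls)


def cache_regressed(baseline, current, max_drop=0.10):
    """True when the hit rate fell by more than max_drop (absolute) versus baseline."""
    return cache_hit_rate(baseline) - cache_hit_rate(current) > max_drop


# Synthetic windows that reproduce the scenario's 72% and 38% hit rates.
before = [{"cache_hit": i < 72} for i in range(100)]
after = [{"cache_hit": i < 38} for i in range(100)]

print(cache_hit_rate(before), cache_hit_rate(after))
print(cache_regressed(before, after))
```

Comparing a window before the deploy with a window after it ties the drop to the change that caused it, which is what makes a quick rollback decision possible.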

Knowledge points in this lesson
  • Log model, tokens, latency, tool calls
  • Track cache hit rate alongside cost
  • Dashboard tail latency (p95/p99)
  • Attach a task ID to every call
  • Sample human review for quality
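The aggregation side of these points can be sketched as follows, assuming per-call log records shaped as plain dictionaries. The token prices are placeholders rather than real rates, the percentile uses the nearest-rank method (one of several common definitions), and all function names are hypothetical.

```python
import hashlib
import math
from collections import defaultdict

# Placeholder prices in USD per million tokens; substitute your provider's real rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def call_cost(call):
    """Cost of a single call under the placeholder prices above."""
    return (call["input_tokens"] * PRICE_PER_MTOK["input"]
            + call["output_tokens"] * PRICE_PER_MTOK["output"]) / 1_000_000


def cost_per_task(calls):
    """Sum call costs by task ID, so multi-call tasks get one figure."""
    totals = defaultdict(float)
    for c in calls:
        totals[c["task_id"]] += call_cost(c)
    return dict(totals)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sampled_for_review(task_id, rate=0.05):
    """Deterministic sampling: hashing the task ID keeps the choice stable across reruns."""
    digest = hashlib.sha256(task_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


calls = [
    {"task_id": "t1", "input_tokens": 1000, "output_tokens": 200, "latency_ms": 400},
    {"task_id": "t1", "input_tokens": 2000, "output_tokens": 100, "latency_ms": 900},
    {"task_id": "t2", "input_tokens": 500, "output_tokens": 50, "latency_ms": 2500},
]
latencies = [c["latency_ms"] for c in calls]
print(cost_per_task(calls))
print(percentile(latencies, 95), percentile(latencies, 99))
print([c["task_id"] for c in calls if sampled_for_review(c["task_id"])])
```

Hash-based sampling is one design choice among several; its advantage is that the same task is always either in or out of the review set, so reviewers and dashboards see a consistent sample.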
Quick check
Context & Reliability · Select one
Why include a request_id with tool calls that may be retried?