Lesson 4 · 7 min

Observability

Logging traces, costs, and quality signals.

Log every LLM call with the model, input and output token counts, latency, tool calls, cache hits, and a task ID. Aggregate the logs into dashboards that show cost per task and tail latency. Track quality through sampled human review of outputs.
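The fields above can be sketched as one structured record per call, plus two small aggregations. This is a minimal illustration, not a reference implementation: the `LLMCallRecord` class, the `example-model` name, and the prices in `PRICE_PER_MTOK` are all assumptions made for the example.

```python
import json
import math
from collections import defaultdict
from dataclasses import asdict, dataclass

# Hypothetical prices in USD per million tokens; substitute your provider's rates.
PRICE_PER_MTOK = {"example-model": {"input": 3.00, "output": 15.00}}


@dataclass
class LLMCallRecord:
    """One log entry per LLM call, carrying the fields named in this lesson."""
    task_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    tool_calls: int
    cache_hit: bool

    def cost_usd(self) -> float:
        price = PRICE_PER_MTOK[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1_000_000


def to_log_line(record: LLMCallRecord) -> str:
    """Serialize a record as one JSON line for a log pipeline."""
    return json.dumps(asdict(record), sort_keys=True)


def cost_per_task(records: list[LLMCallRecord]) -> dict[str, float]:
    """Sum call costs by task ID."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.task_id] += record.cost_usd()
    return dict(totals)


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile, e.g. pct=99 for p99 latency."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Grouping by `task_id` is what turns per-call logs into cost per task; a task that makes several calls is charged for all of them. Calling `percentile(latencies, 99)` on the logged latencies gives the tail figure a dashboard would plot.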

Production scenario

Real-world example: Catching a regression after a prompt change

A growth team ships a prompt change Monday morning. Their dashboard tracks five signals: model, input/output tokens, latency, cache hit rate, and tool errors.

By Monday afternoon, the dashboard shows:

input tokens: +14% (cache hit rate dropped from 72% to 38%)
p99 latency: +1.6s
cost per task: +21%

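Root cause: the new prompt moved a piece of static content from the system block into the user message, breaking the cache prefix. Prompt caches generally match on an unchanged prefix, so once that content moved, far fewer calls could reuse cached tokens. The team rolls back, and costs return to baseline within an hour.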
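A check like the one that surfaced this regression can be sketched as a comparison between a baseline window and the window after a deploy. The `WindowStats` class and both thresholds are hypothetical choices for illustration; real alert thresholds depend on how noisy your traffic is.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregated metrics for one time window, e.g. the day before a deploy."""
    calls: int
    cache_hits: int
    tasks: int
    total_cost_usd: float

    @property
    def cache_hit_rate(self) -> float:
        return self.cache_hits / self.calls if self.calls else 0.0

    @property
    def cost_per_task(self) -> float:
        return self.total_cost_usd / self.tasks if self.tasks else 0.0


def find_regressions(
    baseline: WindowStats,
    current: WindowStats,
    max_hit_rate_drop: float = 0.10,   # assumed threshold: 10 percentage points
    max_cost_increase: float = 0.10,   # assumed threshold: 10% relative increase
) -> list[str]:
    """Return a human-readable alert for each signal that moved past its threshold."""
    alerts = []
    drop = baseline.cache_hit_rate - current.cache_hit_rate
    if drop > max_hit_rate_drop:
        alerts.append(
            f"cache hit rate fell from {baseline.cache_hit_rate:.0%} "
            f"to {current.cache_hit_rate:.0%}"
        )
    if baseline.cost_per_task > 0:
        increase = current.cost_per_task / baseline.cost_per_task - 1
        if increase > max_cost_increase:
            alerts.append(f"cost per task rose {increase:.0%}")
    return alerts
```

Fed the numbers from this scenario (a hit rate of 72% before and 38% after, with cost per task up 21%), both checks would fire. Comparing a window against itself yields no alerts.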

Why this matters: observability isn't decoration. It's the only way to catch prompt regressions before they blow your unit economics.

Knowledge points in this lesson

Log model, tokens, latency, tool calls
Track cache hit rate alongside cost
Dashboard tail latency (p95/p99)
Attach a task ID to every call
Sample human review for quality
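One way to pick outputs for human review is to hash the task ID, so that the decision is repeatable and every call in a task lands in the same bucket. This is a sketch under that assumption; the function name `selected_for_review` and the 5% rate mentioned below are illustrative, not part of the lesson.

```python
import hashlib


def selected_for_review(task_id: str, sample_rate: float) -> bool:
    """Deterministically select roughly `sample_rate` of tasks for human review."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Calling `selected_for_review(task_id, 0.05)` would route about one task in twenty to reviewers. Because the choice depends only on the task ID, re-running the pipeline selects the same tasks, which keeps review queues stable.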

Quick check

Context & Reliability · Select one

Why include a request_id with tool calls that may be retried?