Claude Certification
Context Management & Reliability
Lesson 6 · 7 min

Safe Rollouts

Feature flags, shadow runs, and gradual ramps.

Roll out prompt or model changes behind feature flags. Before flipping a flag, shadow-run the new prompt against production traffic and compare quality scores offline. Then ramp 1% → 10% → 100%, with a quality gate and a cost gate at each step.
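
The ramp-with-gates idea can be sketched as a small state machine: advance one stage only when both gates pass, otherwise flip the flag off. This is a hypothetical illustration; the stage list, the `StageMetrics` fields, the 0.02 quality tolerance, and the cost budget are assumptions, not values from this lesson.

```python
from dataclasses import dataclass

# Hypothetical ramp schedule: fraction of traffic served by the new prompt.
RAMP_STAGES = [0.01, 0.10, 1.00]


@dataclass
class StageMetrics:
    quality: float           # mean quality score of the new prompt at this stage
    baseline_quality: float  # same metric for the current production prompt
    cost_per_request: float  # mean cost per request for the new prompt
    cost_budget: float       # maximum acceptable cost per request


def gates_pass(m: StageMetrics, max_quality_drop: float = 0.02) -> bool:
    """Quality gate and cost gate; both must pass to advance."""
    quality_ok = m.quality >= m.baseline_quality - max_quality_drop
    cost_ok = m.cost_per_request <= m.cost_budget
    return quality_ok and cost_ok


def next_fraction(current: float, m: StageMetrics) -> float:
    """Return the traffic fraction for the next step.

    `current` must be one of RAMP_STAGES. If either gate fails, return 0.0:
    the flag goes off and all traffic returns to the old prompt.
    """
    if not gates_pass(m):
        return 0.0
    idx = RAMP_STAGES.index(current)
    # Stay at the last stage once full traffic is reached.
    return RAMP_STAGES[min(idx + 1, len(RAMP_STAGES) - 1)]
```

With healthy metrics, `next_fraction(0.01, ...)` moves to 0.10; a cost overrun or a quality drop beyond the tolerance at any stage sends traffic back to 0.0 rather than holding in place.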

Production scenario

Real-world example: New cold-outreach agent at a sales-tech startup

A sales-tech startup is replacing its old template-based cold-email writer with a model-driven agent. The rollout plan:

  1. Shadow (1 week): the agent runs on every real prospect alongside the existing system. Its drafts go to a review queue, not to the prospect. Score them offline against the existing system's emails; reply rates can't be measured yet because nothing is sent.
  2. 1% canary (1 week): the agent's draft is sent for 1% of campaigns, picked by hash. Watch reply rate and unsubscribe rate; abort if the unsubscribe rate spikes.
  3. 10% ramp (1 week): expand to 10% of campaigns if the canary metrics stay clean.
  4. 100%: full traffic. The old system stays warm for two weeks as a kill switch.
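
The plan says canary campaigns are "picked by hash." One common way to do that, together with the shadow and kill-switch modes, is sketched below. The SHA-256 bucketing, the salt string, the basis-point granularity, and the `mode` names are all assumptions for illustration.

```python
import hashlib


def bucket(campaign_id: str, salt: str = "outreach-agent-v1") -> int:
    """Map a campaign ID to a stable bucket in [0, 10_000) (basis points)."""
    digest = hashlib.sha256(f"{salt}:{campaign_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def route(campaign_id: str, mode: str, fraction: float) -> dict:
    """Decide whose draft is sent and whether the new agent runs at all.

    mode: "off" (kill switch), "shadow", or "ramp" (hypothetical names);
    fraction is the share of campaigns (0.0 to 1.0) that get the agent's draft.
    """
    if mode == "off":
        # Kill switch: legacy system only.
        return {"send": "legacy", "run_agent": False}
    if mode == "shadow":
        # Agent runs on every prospect, but its draft goes to a review queue.
        return {"send": "legacy", "run_agent": True}
    if mode == "ramp":
        in_cohort = bucket(campaign_id) < round(fraction * 10_000)
        return {"send": "agent" if in_cohort else "legacy", "run_agent": in_cohort}
    raise ValueError(f"unknown mode: {mode!r}")
```

Because the bucket depends only on the campaign ID and the salt, every campaign in the 1% cohort is also in the 10% cohort, so ramping up adds campaigns without reshuffling the ones already exposed. Changing the salt would reassign everyone.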

A regression surfaced at the 10% step: unsubscribes rose 18%. The team rolled back, toned down a system prompt that was slightly too pushy, and redid the ramp.
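
The +18% figure is a relative change in unsubscribe rate against the existing system. A minimal abort check is sketched below, assuming a hypothetical 10% tolerance and a minimum sample per arm; neither value comes from the scenario.

```python
def relative_change(treatment_rate: float, control_rate: float) -> float:
    """Relative change of treatment vs. control; 0.18 means +18%."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return treatment_rate / control_rate - 1.0


def should_roll_back(
    unsubs_new: int, sends_new: int,
    unsubs_old: int, sends_old: int,
    max_increase: float = 0.10,  # hypothetical tolerance: +10% relative
    min_sends: int = 1_000,      # hypothetical minimum sends per arm
) -> bool:
    """True if the agent's unsubscribe rate exceeds the old system's by more
    than max_increase. Returns False while either arm has too few sends."""
    if sends_new < min_sends or sends_old < min_sends:
        return False
    change = relative_change(unsubs_new / sends_new, unsubs_old / sends_old)
    return change > max_increase
```

With made-up counts of 59 unsubscribes in 10,000 agent sends against 500 in 100,000 legacy sends, the relative change works out to +18% and the check returns True. A fixed threshold ignores sampling noise; a production gate would typically also apply a significance test before aborting.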

Why this matters: safe rollouts catch regressions before they hit everyone. Gates at every step, kill switch always reachable.

Knowledge points in this lesson
  • Roll out prompt/model changes behind flags
  • Shadow-run before production exposure
  • Ramp 1% → 10% → 100%
  • Gates at each step on quality and cost
  • Always have a kill switch
Quick check
What's the safest way to roll out a new prompt to production traffic?