Lesson 6 · 7 min
Safe Rollouts
Feature flags, shadow runs, and gradual ramps.
Roll out prompt and model changes behind feature flags. Before flipping a flag, shadow-run the new prompt against production traffic and compare quality scores offline. Then ramp 1% → 10% → 100%, with quality and cost gates at each step.
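A minimal sketch of flag-gated, hash-based bucketing, assuming SHA-256 over the flag name plus a campaign ID. `in_rollout`, `choose_prompt`, and the flag name `cold_outreach_agent_v2` are hypothetical and not part of any feature-flag product's API. Each unit's bucket never changes, so every campaign in the 1% cohort stays in the 10% and 100% cohorts as the percentage grows. Setting the percentage to 0 acts as a kill switch.

```python
import hashlib


def in_rollout(unit_id: str, flag_name: str, percent: float) -> bool:
    """Return True if unit_id falls inside the first `percent` % of buckets for this flag.

    Salting the hash with the flag name keeps assignment stable per flag
    while preventing different flags from always selecting the same units.
    """
    digest = hashlib.sha256(f"{flag_name}:{unit_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a bucket in [0, 10_000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent=1 -> buckets 0..99 (1%); percent=100 -> all buckets.
    return bucket < percent * 100


def choose_prompt(campaign_id: str, rollout_percent: float) -> str:
    # Hypothetical flag name; rollout_percent=0 routes everything to the old system.
    if in_rollout(campaign_id, "cold_outreach_agent_v2", rollout_percent):
        return "new_agent"
    return "old_template"
```

Hosted feature-flag services offer equivalent percentage rollouts. This sketch only shows the underlying idea of deterministic, monotonic bucketing.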
Production scenario
Real-world example: New cold-outreach agent at a sales-tech startup
A sales-tech startup is replacing its old template-based cold-email writer with a model-driven agent. The rollout plan:
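- Shadow (1 week): the agent runs on every real prospect alongside the existing system. Its outputs go to a review queue and are never sent to prospects. Drafts are scored offline against the existing system's output; reply rates can't be measured yet because nothing from the agent has been sent.
- 1% canary (1 week): the agent's draft is sent for 1% of campaigns, selected by a stable hash of the campaign. Watch reply and unsubscribe rates; abort if unsubscribes spike.
- 10% ramp (1 week): expand to 10% of campaigns if the canary metrics stay clean.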
- 100%: full traffic. The old system stays warm for two weeks as a kill switch.
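The shadow step in the plan above can be sketched as a wrapper that always serves the current system's output while recording the candidate's output for offline scoring. `ShadowRunner`, `mean_score_delta`, and the `score` function passed to it are hypothetical names. In a real service, the candidate call would usually run asynchronously so it adds no latency; this synchronous sketch ignores that.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ShadowRunner:
    """Serve the current system; run the candidate on the same input and queue its output."""
    current: Callable[[dict], str]
    candidate: Callable[[dict], str]
    review_queue: list = field(default_factory=list)

    def handle(self, prospect: dict) -> str:
        served = self.current(prospect)  # only this output reaches the prospect
        shadow: Optional[str]
        try:
            shadow, error = self.candidate(prospect), None
        except Exception as exc:  # a failing candidate must never break production
            shadow, error = None, repr(exc)
        self.review_queue.append(
            {"prospect": prospect, "served": served, "shadow": shadow, "error": error}
        )
        return served


def mean_score_delta(queue: list, score: Callable[[str], float]) -> float:
    """Average (candidate - current) score over queued pairs; positive favors the candidate."""
    pairs = [r for r in queue if r["shadow"] is not None]
    if not pairs:
        raise ValueError("no successful shadow outputs to compare")
    return sum(score(r["shadow"]) - score(r["served"]) for r in pairs) / len(pairs)
```

Candidate errors are kept in the queue rather than dropped, so the error rate can be reviewed alongside the quality scores.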
A regression appeared at the 10% step: unsubscribes rose 18%. The team rolled back, toned down a system prompt that had become a touch too pushy, and redid the ramp.
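A gate like the one that caught this regression can compare the canary's unsubscribe rate against the control group's. This is a minimal sketch: the function name, the 10% relative-increase threshold, and the 1,000-send minimum are assumptions. A production gate would normally also apply a statistical test so that noise in small cohorts does not trigger rollbacks.

```python
def unsubscribe_gate(
    control_unsubs: int,
    control_sends: int,
    canary_unsubs: int,
    canary_sends: int,
    max_relative_increase: float = 0.10,  # hypothetical threshold
    min_sends: int = 1000,                # hypothetical minimum sample size
) -> str:
    """Return 'hold' (too little data), 'pass', or 'rollback'."""
    if control_sends < min_sends or canary_sends < min_sends:
        return "hold"
    control_rate = control_unsubs / control_sends
    canary_rate = canary_unsubs / canary_sends
    # Roll back if the canary rate exceeds the control rate by more than the allowed margin.
    if canary_rate > control_rate * (1 + max_relative_increase):
        return "rollback"
    return "pass"
```

With control at 100 unsubscribes per 10,000 sends (1.00%) and canary at 118 per 10,000 (1.18%, an 18% relative increase), the gate returns `"rollback"`.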
Why this matters: safe rollouts catch regressions before they hit everyone. Gates at every step, kill switch always reachable.
Knowledge points in this lesson
- Roll out prompt/model changes behind flags
- Shadow-run before production exposure
- Ramp 1% → 10% → 100%
- Gates at each step on quality and cost
- Always have a kill switch
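The knowledge points above combine into a small ramp controller: stages at 0% (shadow only), 1%, 10%, and 100%; a quality gate and a cost gate checked before each advance; and a kill switch that sends all traffic back to the old system when either gate fails. Everything here is a hypothetical sketch. The metric definitions (`quality_delta`, `cost_ratio`) and thresholds are assumptions, not values from the lesson.

```python
from dataclasses import dataclass

# Percent of traffic per stage; 0.0 means shadow-only (no prospect sees the new agent).
STAGES = [0.0, 1.0, 10.0, 100.0]


@dataclass
class StageMetrics:
    quality_delta: float  # candidate minus baseline quality score (hypothetical metric)
    cost_ratio: float     # candidate cost per request / baseline cost per request


@dataclass
class RampController:
    min_quality_delta: float = 0.0  # quality gate: candidate must not score worse
    max_cost_ratio: float = 1.2     # cost gate: at most 20% more expensive (assumed)
    stage_index: int = 0
    killed: bool = False

    @property
    def percent(self) -> float:
        return 0.0 if self.killed else STAGES[self.stage_index]

    def evaluate(self, m: StageMetrics) -> float:
        """Advance one stage if both gates pass; otherwise trip the kill switch."""
        if self.killed:
            return 0.0
        if m.quality_delta < self.min_quality_delta or m.cost_ratio > self.max_cost_ratio:
            self.kill()
        elif self.stage_index < len(STAGES) - 1:
            self.stage_index += 1
        return self.percent

    def kill(self) -> None:
        # Kill switch: route all traffic back to the old system immediately.
        self.killed = True
```

Redoing the ramp after a fix, as in the scenario, means starting a fresh controller at the shadow stage rather than resuming where the killed one stopped.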
Quick check
What's the safest way to roll out a new prompt to production traffic?
