Lesson 6 · 7 min

Safe Rollouts

Feature flags, shadow runs, and gradual ramps.

Roll out prompt or model changes behind flags. Shadow-run new prompts against production traffic and compare scores offline before flipping. Ramp 1% → 10% → 100% with quality + cost gates at each step.

Production scenario

Real-world example: New cold-outreach agent at a sales-tech startup

A sales-tech startup is replacing its old template-based cold-email writer with a model-driven agent. The rollout plan:

Shadow (1 week): the agent runs on every real prospect alongside the existing system. Outputs go to a review queue, not the customer. Compare reply rates offline.
1% canary (1 week): the agent's draft is sent for 1% of campaigns, picked by hash. Watch reply rate and unsubscribe rate; abort if unsubscribe spikes.
10% ramp (1 week): expand if metrics stay clean.
100%: full traffic. The old system stays warm for two weeks as a kill switch.

A regression appeared at the 10% step (unsubscribes +18%). Roll back, fix the system prompt that was being a touch too pushy, redo the ramp.

Why this matters: safe rollouts catch regressions before they hit everyone. Gates at every step, kill switch always reachable.

Knowledge points in this lesson

Roll out prompt/model changes behind flags
Shadow-run before production exposure
Ramp 1% → 10% → 100%
Gates at each step on quality and cost
Always have a kill switch

Quick check

Context & ReliabilitySelect one

What's the safest way to roll out a new prompt to production traffic?