Claude Certification
Agentic Architecture · 27% of exam

PrintWize: from one prompt to a real agent in three sprints

How a 30-person print-on-demand startup outgrew a single Claude call.

Company
PrintWize (fictional)
Duration
3 sprints (6 weeks)
Outcome
Order-detail extraction accuracy 67% → 94%. Average handling time per custom order dropped from 4 min to 6 sec. Manual order-cleanup queue from 80/day to 7/day.

Sprint 1 — Stop hammering one prompt

PrintWize was running every inbound custom-order email through a single 2,400-token prompt that tried to extract artwork metadata, sizing, shipping, and payment terms all at once. Accuracy sat at 67% and the engineers were tired of patching it. We split the pipeline: a deterministic intake stage (parse the email and attachments, normalize addresses, OCR the artwork) followed by four specialized extractors running in parallel under an orchestrator.
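The split above can be sketched roughly like this. This is a minimal illustration, not PrintWize's actual code: the function names, payload shape, and stubbed extractor bodies are all assumptions; in production each extractor would be one focused Claude call.

```python
import asyncio

def intake(raw_email: dict) -> dict:
    """Deterministic preprocessing: no model calls here.
    Address normalization and artwork OCR would also live in this stage."""
    return {
        "body": raw_email["body"].strip(),
        "attachments": raw_email.get("attachments", []),
    }

# Stubbed extractors; each would be a single specialized model call.
async def extract_artwork(order: dict) -> dict:
    return {"field": "artwork", "value": "..."}

async def extract_sizing(order: dict) -> dict:
    return {"field": "sizing", "value": "..."}

async def extract_shipping(order: dict) -> dict:
    return {"field": "shipping", "value": "..."}

async def extract_payment_terms(order: dict) -> dict:
    return {"field": "payment_terms", "value": "..."}

async def orchestrate(raw_email: dict) -> dict:
    """Intake runs once, then the four extractors fan out in parallel."""
    order = intake(raw_email)
    results = await asyncio.gather(
        extract_artwork(order),
        extract_sizing(order),
        extract_shipping(order),
        extract_payment_terms(order),
    )
    return {r["field"]: r["value"] for r in results}
```

The point of the structure is that the orchestrator owns control flow while each worker owns exactly one extraction concern, so a bad artwork result can be retried without re-running shipping.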

Sprint 2 — Tool design pass

Each extractor was a tool the orchestrator could call. We rewrote the schemas: extract_artwork (returns dimensions + color profile + transparency flags), extract_sizing (returns SKU + qty), extract_shipping (returns address + service level), extract_payment_terms (returns terms + PO if any). Flat schemas, enums on the bounded fields, error contracts with a hint field.
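One of the rewritten tools might look like the sketch below, in the Anthropic tool-use shape (a `name`, `description`, and JSON Schema `input_schema`). The specific field names, enum values, and error shape are illustrative assumptions, not PrintWize's real schemas.

```python
# Flat schema with an enum on the bounded field (service_level).
extract_shipping_tool = {
    "name": "extract_shipping",
    "description": "Extract shipping details from a custom-order email.",
    "input_schema": {
        "type": "object",
        "properties": {
            "address": {
                "type": "string",
                "description": "Normalized postal address",
            },
            "service_level": {
                "type": "string",
                "enum": ["standard", "expedited", "overnight"],
            },
        },
        "required": ["address", "service_level"],
    },
}

# Error contract: when extraction fails, the tool returns this shape
# instead of a result, and the hint tells the orchestrator what to do next.
error_response = {
    "error": "missing_field",
    "field": "service_level",
    "hint": "No shipping speed mentioned; ask the customer or default to standard.",
}
```

Flat schemas matter because nested objects invite partially-filled structures; enums turn "the model wrote 'fast shipping'" into a validation error the orchestrator can act on.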

Sprint 3 — Eval + roll out

A junior PM labeled 150 historical orders. We built a tiny eval harness and ran it on every prompt change; it caught two prompt regressions in week 1 alone. Rollout went shadow → 10% → 100% over the third sprint.
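A harness like the one described can be very small. The sketch below scores per-field exact-match accuracy against the PM-labeled golden set and fails the build on regression; the field names and the 90% threshold are assumptions for illustration.

```python
def field_accuracy(predictions: list[dict], labels: list[dict]) -> float:
    """Fraction of labeled fields the pipeline got exactly right."""
    correct = total = 0
    for pred, gold in zip(predictions, labels):
        for field, expected in gold.items():
            total += 1
            if pred.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0

def gate(predictions: list[dict], labels: list[dict], threshold: float = 0.90) -> float:
    """Fail the PR (non-zero exit) if accuracy regresses below the threshold."""
    acc = field_accuracy(predictions, labels)
    if acc < threshold:
        raise SystemExit(f"eval gate failed: {acc:.1%} < {threshold:.0%}")
    return acc
```

Running this in CI on every prompt change is what "eval-gated PRs" means in practice: the golden set stays fixed, so any drop in the number is attributable to the diff.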

What stuck

Going from "one big prompt" to orchestrator + workers was the single biggest unlock. Schema discipline closed most of the remaining error rate. Eval-gated PRs kept it from regressing.

Principles applied