Claude Certification
Agentic Architecture & Orchestration
Lesson 7 · 7 min

Evaluating Agent Quality

Golden tasks, success rates, and step-level traces.

Evaluate agents on *task success rate*, not per-call accuracy. Keep a golden set of 20–50 representative tasks. Run them on every prompt change. Capture full traces (tool calls + LLM messages) so regressions are debuggable.

Quick check
Agentic ArchitectureSelect one
Why is parallel fan-out a poor fit when subtasks share heavy context?