Lesson 5 · 5 min

Testing Tools in Isolation

Run tools end-to-end without invoking Claude.

Tools are ordinary functions, so unit-test them with realistic inputs and assert on their outputs. Then test the *schema* separately with a small Claude call that only produces a tool argument, and assert that the argument validates. Keeping the two concerns separate means each failure points at the right layer.

Production scenario

Real-world example: A `send_email` tool in CI

The team's email tool gets two layers of tests:

// 1. Pure-function unit test against a mocked SES client.
test("send_email assembles MIME and calls SES once", async () => {
  const ses = mockSes();
  await sendEmail({ to: "x@example.com", subject: "hi", body: "hello" }, ses);
  expect(ses.send).toHaveBeenCalledTimes(1);
});

// 2. Schema probe — a tiny Claude call producing a tool argument.
test("Claude produces a valid send_email argument", async () => {
  const arg = await claudeProbe("Email Bob saying we'll be late.");
  expect(SendEmailArgs.safeParse(arg).success).toBe(true);
});

Layer 1 tests the function itself. Layer 2 tests whether the schema and its descriptions are clear enough for Claude to produce a valid argument. The two layers fail for different reasons and point you at different fixes.
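The layer 1 test relies on `sendEmail` and `mockSes`, which the lesson does not show. Here is a minimal sketch of how they might look, assuming the tool receives its client as a parameter. The `EmailClient` interface, the simplified message format (header lines plus body, not full MIME), and the call-recording fake are all hypothetical stand-ins for a real SES client and a test framework's mock.

```typescript
// Hypothetical argument shape; the lesson does not define it.
interface SendEmailArgs {
  to: string;
  subject: string;
  body: string;
}

// Assumed minimal client interface. A real SES client has a richer API.
interface EmailClient {
  send(rawMessage: string): Promise<void>;
}

// The tool is a plain function: it builds the message and hands it to
// whichever client it was given, so a test can pass in a fake.
async function sendEmail(args: SendEmailArgs, client: EmailClient): Promise<void> {
  const rawMessage = [
    `To: ${args.to}`,
    `Subject: ${args.subject}`,
    "", // blank line separates headers from body
    args.body,
  ].join("\r\n");
  await client.send(rawMessage);
}

// Hand-rolled fake that records each call instead of sending anything.
function mockSes(): EmailClient & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    async send(rawMessage: string): Promise<void> {
      calls.push(rawMessage);
    },
  };
}
```

Because the client is injected, the test never touches the network, and a failure here can only come from the function's own logic.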

Why this matters: tool tests catch implementation bugs. Schema tests catch documentation bugs. You want both.
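Most of the schema probe can itself be tested offline. The sketch below assumes the probe's response follows the Messages API shape, where a tool call arrives as a content block with `type: "tool_use"` and an `input` object. `extractToolInput` and `isSendEmailArgs` are hypothetical helpers, and the hand-written type guard stands in for whatever schema library backs `SendEmailArgs.safeParse`.

```typescript
// Subset of the response shape this sketch relies on (an assumption).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface ProbeResponse {
  content: ContentBlock[];
}

interface SendEmailArgs {
  to: string;
  subject: string;
  body: string;
}

// Return the argument of the first call to `toolName`,
// or undefined if the response contains no such call.
function extractToolInput(response: ProbeResponse, toolName: string): unknown {
  for (const block of response.content) {
    if (block.type === "tool_use" && block.name === toolName) {
      return block.input;
    }
  }
  return undefined;
}

// Hand-written stand-in for schema validation: three string fields,
// with a loose check that `to` looks like an address.
function isSendEmailArgs(value: unknown): value is SendEmailArgs {
  if (typeof value !== "object" || value === null) return false;
  const { to, subject, body } = value as Record<string, unknown>;
  return (
    typeof to === "string" &&
    to.indexOf("@") > 0 &&
    typeof subject === "string" &&
    typeof body === "string"
  );
}

// Canned response used in place of a live call (illustrative data only).
const cannedResponse: ProbeResponse = {
  content: [
    { type: "text", text: "Drafting the email now." },
    {
      type: "tool_use",
      id: "toolu_example",
      name: "send_email",
      input: { to: "bob@example.com", subject: "Running late", body: "We'll be late." },
    },
  ],
};

const probedArg = extractToolInput(cannedResponse, "send_email");
console.log(isSendEmailArgs(probedArg));
```

With the extraction and validation logic covered by fixtures like this, the live probe in CI only has to answer one question: does Claude, given the real schema, produce an argument that passes.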

Knowledge points in this lesson

Unit-test tool functions in isolation
Schema correctness is a separate check
Probe schema with a small Claude call
Separate layers for clearer failure diagnosis
Mock external services in tool tests

Quick check

Which is the BEST way to test a tool independently of an LLM?
