Series

Multi-part deep dives on practices that unlock what AI tools can actually do.

1 post published

AI Eval Discipline

Most teams adopting AI ask which eval framework to buy. Wrong question. Evals are not a benchmark suite — they are a habit of looking at your data, naming the failure modes, and writing assertions that fire when they recur. This series walks the discipline from foundations to operational practice: what evals actually are, the three gulfs LLM development has to bridge, how to build an eval set without burning a quarter, and why mutation testing is the only honest eval for AI-written tests.

  1. Evals Aren't a Benchmark Suite. They're a Habit of Looking at Your Data.

2 posts published

AI Won't Do This by Default — Until You Do

AI coding agents are incredibly powerful. But the teams getting unreal results combine them with practices most teams skip. This series walks the full delivery value stream — one practice per post — showing what AI tools do brilliantly, what they don't do by default, and what happens when you add the practice that unlocks their real power.

  1. AI Won't Do This by Default: Hypothesis-Driven Development

  2. AI Won't Do This by Default: User Story Mapping & Shape