Series
Multi-part deep dives on practices that unlock what AI tools can actually do.
AI Eval Discipline
Most teams adopting AI ask which eval framework to buy. Wrong question. Evals are not a benchmark suite — they are a habit of looking at your data, naming the failure modes, and writing assertions that fire when they recur. This series walks the discipline from foundations to operational practice: what evals actually are, the three gulfs LLM development has to bridge, how to build an eval set without burning a quarter, and why mutation testing is the only honest eval for AI-written tests.
AI Won't Do This by Default — Until You Do
AI coding agents are powerful, but the teams getting the strongest results pair them with practices most teams skip. This series walks the full delivery value stream, one practice per post, showing what AI tools do brilliantly, what they don't do by default, and what changes when you add the practice that unlocks their real power.