Specification-driven development is the practice of writing feature specifications in a format that is simultaneously readable by product, understandable by engineering, and executable by tests. The spec is the contract — not the prose ticket. Pre-AI, vague specs produced slow bugs. Post-AI, vague specs produce fast ones. The practice that closed the PM-to-engineer translation gap fifteen years ago is the same practice that now separates teams who get leverage from AI agents from teams who just ship the wrong thing faster.
The spec said "the user should be able to apply discount codes at checkout." Ten words. Perfectly reasonable. It shipped three weeks later, went live on a Tuesday, and on Thursday started returning five-hundreds on 8% of sessions.
The root cause: that sentence hid twelve unanswered questions. Among them: what happens if the code is expired? If it's already been used? If the cart total is below the minimum? If multiple codes are stacked? If the code applies to some items but not others? What if the discount is percentage-based and the line items round differently? The PM didn't know the answers when she wrote the ticket. The engineer didn't ask. The AI coding agent read the ticket and made its best guesses — a dozen of them, none validated, all now in production.
This is the translation problem. It's not new. What's new is that AI has made it lethal.
The Handoff That Always Leaks
The traditional workflow is a chain of translations.
- The PM has an outcome in her head.
- She writes a ticket — her best attempt at describing the outcome in engineering-adjacent language.
- The engineer reads the ticket and translates it into code — his best attempt at inferring the outcome from the PM's language.
- QA reads the ticket, reads the code, and finds the gaps.
- PM signs off on the fixes.
- Three rounds, three weeks later, the feature ships.
Every handoff is a translation. Every translation is lossy. The PM speaks in outcomes. The engineer speaks in constraints. The prose in the ticket is neither — it's a middle layer that has to be re-interpreted by both sides, and the re-interpretations are almost never identical.
Pre-AI, the loss was slow enough that someone noticed. Three hours into the build, the engineer would hit the expiration-check edge case and ping the PM. The PM would say "oh yeah, let's reject expired codes but keep the rest of the cart intact." The conversation would happen. The ticket would get a comment. The code would be correct.
That conversation was the product, not the ticket. The ticket was decorated uncertainty. The conversation was the real contract.
Then AI got fast.
What Changes When the Build Takes Two Hours
A coding agent reads the ticket. It picks an interpretation of every ambiguity. It implements. It writes tests — against its own interpretation. The tests pass. The PR is clean. Everything ships.
The twelve unanswered questions are still twelve unanswered questions. They just have twelve unvalidated answers buried in the code, consistent with the agent's assumptions and inconsistent with everyone else's.
This is why post-AI teams are seeing a specific pattern: faster delivery, more defects, worse customer feedback. The 2025 DORA Report found that AI adoption correlates with higher throughput and higher instability — and one underexplored reason is this: the friction that used to force mid-build clarification has been removed, and nothing has replaced it.
The fix isn't slower AI. It isn't more QA. It isn't better prompts. It's a contract between the PM and the engineer that doesn't require translation — because both sides wrote it together, and the AI builds against the contract instead of guessing at prose.
What an Executable Specification Looks Like
A specification is executable when every ambiguity in the prose has been resolved into a concrete example, and the examples themselves are the test data. The format matters less than the rule. Gherkin, example tables, scenario outlines, domain-specific DSLs — they all work. What they share:
- One scenario = one business rule
- Every scenario has concrete inputs and expected outputs
- The scenario is written in business language, not code
- The scenario is runnable — an automated test, not documentation
The prose version:
"User should be able to apply a discount code at checkout."
The executable version (one of many scenarios):
    Scenario: Applying a valid discount code below the minimum cart value
      Given the cart total is $20.00
      And the discount code "SAVE10" requires a minimum of $25.00
      When the user applies the code
      Then the code is rejected with message "Add $5.00 more to use this code"
      And the cart total remains $20.00
One scenario, one rule. Five more scenarios cover expired codes, already-used codes, stacking, currency edge cases, and codes on partially-eligible carts. An hour of conversation. Six executable contracts.
The engineer doesn't translate. The AI agent doesn't guess. The test is the spec. The spec is the test. They're the same artifact.
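Wired up, the scenario above becomes an ordinary automated test. Below is a minimal sketch; the `Cart`/`DiscountCode` API and the `apply` method are illustrative assumptions, not taken from any real codebase, but each Given/When/Then line maps one-to-one onto a line of the test.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class DiscountCode:
    code: str
    minimum: Decimal  # minimum cart total required to use the code

@dataclass
class ApplyResult:
    accepted: bool
    message: str

@dataclass
class Cart:
    total: Decimal

    def apply(self, code: DiscountCode) -> ApplyResult:
        # Business rule from the scenario: a below-minimum cart rejects
        # the code and leaves the total untouched.
        if self.total < code.minimum:
            shortfall = code.minimum - self.total
            return ApplyResult(False, f"Add ${shortfall:.2f} more to use this code")
        return ApplyResult(True, "Discount applied")

def test_valid_code_below_minimum_cart_value():
    # Given the cart total is $20.00
    cart = Cart(total=Decimal("20.00"))
    # And the discount code "SAVE10" requires a minimum of $25.00
    code = DiscountCode("SAVE10", minimum=Decimal("25.00"))
    # When the user applies the code
    result = cart.apply(code)
    # Then the code is rejected with message "Add $5.00 more to use this code"
    assert result.accepted is False
    assert result.message == "Add $5.00 more to use this code"
    # And the cart total remains $20.00
    assert cart.total == Decimal("20.00")
```

Each comment is a line of the spec; the assertion under it is the contract made checkable. When an implementation drifts from the scenario, this test fails loudly instead of shipping a guess.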
The Conversation Is the Point
When Kent Beck introduced user stories in Extreme Programming, he didn't mean "a written requirement you hand to an engineer." He meant something closer to the literal sense of the word: you tell me your story. Sit with me. Walk me through what the user does, what they're trying to achieve, what happens when things don't go the way you expected. A story was something that lived between two people talking — not a document, not a ticket, not an acceptance criterion bullet list. The card was a reminder to have the conversation. The conversation was the thing.
Twenty-five years later, most teams have inherited the word "user story" and lost the practice. The ticket gets written by one person, usually in prose, usually the night before sprint planning. The engineer reads it and starts building. No story is being told. A requirement is being handed over with an implicit "shut up and implement this." I'm genuinely uncertain how many people who use the phrase "user story" every day understand what the person who coined that phrase meant by it — and that uncertainty is not a semantic quibble. It's the reason so much software gets built from one person's guesses about another person's intent.
Specification-driven development is the same discipline in different clothes. A beautifully formatted Gherkin document written by one person at midnight is not specification-driven development — it's decorated assumption. Same pathology that turned user stories into requirements tickets, just with Given/When/Then syntax instead of bullet points.
The discipline is the conversation. PM and engineer (often with a QA or tech lead in the room) sit together and work through concrete examples before code is written. Every ambiguity surfaces when the cost of surfacing it is a sticky note, not a sprint. Every assumption gets tested against someone who thinks differently.
The document is the artifact. The shared understanding is the product.
This is the same rule Jeff Patton has for user story mapping: "the map is not the deliverable, the shared understanding is." Same rule, different layer. Story maps build shared understanding at the journey level. Specifications build shared understanding at the feature level. Both die when they become solo documentation exercises handed off across a wall.
Why This Matters More With AI, Not Less
Three shifts that change the math:
The cost of ambiguity has collapsed. Pre-AI, ambiguous specs produced expensive friction — the engineer stopped, asked, and the clarification happened. That friction was annoying but useful. Post-AI, the friction is gone. The agent picks an interpretation and ships. The clarification never happens.
The cost of writing clear specs hasn't changed. Executable specs still require the same PM-engineer conversation they always did. An hour of talk, some examples, a few edge cases. No AI will shorten this — it's a human-to-human design activity.
The leverage of clear specs has multiplied. Pre-AI, a perfect spec saved you maybe 20% of engineering time — the time you'd have spent clarifying. Post-AI, a perfect spec is the entire contract. The agent implements against it, generates tests against it, gets verified against it, and fails loudly when the code drifts from it. The spec becomes the executable definition of "done."
Teams that keep writing prose tickets and letting the AI guess are doing the same thing they did in 2022, just at machine speed. Teams that invest in executable specs before engaging AI are operating in a fundamentally different category.
The Objections That Disappear When You Try It
"Our domain is too complex for Gherkin." No domain is too complex for examples. Gherkin is one format, not a requirement. A domain with ten interacting variables means table-driven scenarios instead of step-by-step ones. The principle holds: one rule per scenario, concrete inputs and outputs, executable.
"Our PMs don't write Gherkin." They shouldn't write it alone. The engineer typically types the scenario syntax during the working session. The PM's job is to know the business rules well enough to spot when a scenario captures them correctly — and when the engineer missed an edge case. The PM writes the intent. The pair writes the contract.
"We don't have time for this." You have time for three rounds of PR review, one production incident, and a retrospective where everyone agrees "the spec was unclear." Specification-driven development doesn't add time to the feature — it shifts time from the back end (debugging, rework, retros) to the front end (conversation). The total is lower. The distribution is different.
Three Numbers That Move
Teams that adopt specification-driven development see three metrics shift in a predictable pattern.
Lead time drops. The ping-pong between PM, engineer, and QA compresses from days to hours. Most of it happens in one session before code is written. DORA elites ship in under a day — and one of the things they share is that the spec conversation happens upstream, not mid-build.
Change failure rate drops. Specs that run as tests don't drift from intent. The defects that ship are genuinely new edge cases, not re-misinterpretations of the same rule by different people. This is the number that connects spec discipline to DORA performance — fewer "how did we ship that" incidents.
PR review time drops. The reviewer reads the spec, reads the code, checks that they match. The subjective "is this what we wanted?" question is replaced by the objective "does the code implement this scenario?" check. Reviews get faster and more reliable at the same time.
The Contract
Prose is hope. Executable specs are contracts.
Before AI, the translation from prose to code was a bottleneck that slowed teams down — and occasionally slowed them down enough that the mistake got caught. After AI, the bottleneck is gone. The mistake ships.
The fix isn't more QA. It isn't more documentation. It isn't better AI. It's the same practice that worked for Specification by Example teams fifteen years ago: put the PM and the engineer in a room, write executable scenarios together, treat those scenarios as the contract, and let both code and tests be generated against that contract.
Your PM and your engineer are still speaking different languages. AI has made that costlier, not cheaper. Specification-driven development is the translation layer — and the agent only performs well when the translation layer is clean.
Build the right thing. Then build it right. The spec is what connects the two.