An orange background with two cards side by side — left card in deep teal showing the number 31 lines of setup against 4 lines of assertion, right card in warm beige listing the three design problems revealed: too many mocks, complex setup, cannot isolate.
Engineering Practices, Software Design, Testing

The Test That's Hardest to Write Is Telling You Something About Your Design

By Shivani Sutreja · 9 min read

Hard-to-test code is not a testing problem — it's a design problem with a testing symptom. When a test requires excessive setup, too many mocks, or cannot run in isolation, the test is exposing coupling, hidden state, or boundary violations in the code under test. The test is the first consumer of the code, and if it's already struggling, production code will face the same friction — it will just be less vocal about it.

An engineer sits down to write a test for a function they just finished. The function calculates the monthly instalment for a loan — a few inputs, one output, nothing exotic. Forty minutes later, they have stubbed the database connection, mocked the notification service, injected a fake clock, built two fixture objects, and imported the logger to suppress its output during the test. The actual assertions are four lines. The setup is thirty-one.

They run it, watch it pass, and move on. The feature is covered.

What they concluded: this code is complex, so the test had to be complex.

What the test was actually telling them: something in the design is wrong.

The Signal Nobody Reads

When testing is hard, engineers almost universally reach for one of two explanations. The test framework is not well-suited to this kind of code. Or: this is genuinely complex logic that is just hard to test.

Both explanations treat the difficulty as external — something imposed by circumstances outside the engineer's control. Neither asks the question the test is actually raising: why does this code need so much scaffolding to run?

Hard-to-test code is almost always hard for the same structural reasons:

  • It knows too much about its neighbors
  • It does too many things
  • It cannot exist without a specific environment

Before going further, a scope note: this signal applies to code designed to be tested as a unit — discrete functions, classes, modules with a clear boundary. Integration tests naturally require more setup because they intentionally exercise the connections between components. If you are writing an integration test and it has a long setup, that is expected. If you are writing a unit test and it has the same long setup, that is the signal worth reading.

Not every difficult unit test is a design failure. Some domains are genuinely complex — payment reconciliation, distributed coordination, time-sensitive workflows — and a well-designed system in those areas may still require meaningful setup because the business problem itself demands it. The distinction worth making is between essential complexity and accidental complexity. Essential complexity persists no matter how clean the code is. Accidental complexity is coupling and knowledge load introduced by the implementation — and it dissolves when you redesign. The signal worth reading is the accidental kind.

What Specific Setup Tells You

Every dependency a unit carries increases the amount of knowledge it must hold about the outside world. Tests pay that cost immediately: they must either recreate or replace everything the unit depends on. Production callers pay it more slowly, each time a change to one collaborator ripples into every unit that knows about it. The setup length is where the hidden knowledge bill becomes visible.

The shape of that difficulty is diagnostic. Each kind of testing friction maps to a specific design issue.

Too many mocks. A large number of mocks often indicates that a unit is carrying excessive knowledge of surrounding concerns. A function may still have a single business responsibility — calculating an instalment, processing a payment — while knowing far too much about how that responsibility connects to the rest of the system: which database to query, which service to notify, which logger to write to. Mocks do not fix that knowledge load. They expose it. Every mock is a declaration that the code under test cannot run — or be changed — without understanding that collaborator.

Complex setup state. If you need three fixture objects and two inserts before the test can run, the function is implicitly coupled to a specific world-state it reaches out to acquire. Pure functions need no setup beyond their arguments. Functions that take their state as parameters need only those parameters. Functions that go out and get state from the environment need you to reconstruct that environment, which is exactly what those thirty-one setup lines are doing. The setup grows with every piece of data the function fetches for itself instead of receiving through its parameter list.
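The current time is a common piece of hidden environment, and it is why the engineer in the opening story needed a fake clock. The sketch below is hypothetical: `daysUntilDue`, `daysUntilDueImplicit`, and `MS_PER_DAY` are invented for illustration and are not part of the loan code discussed in this article. It only shows how moving the clock reading into the parameter list removes the setup.

```typescript
import { strict as assert } from "node:assert";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: reads the clock itself, so a test has to fake the clock.
function daysUntilDueImplicit(dueDate: Date): number {
  return Math.ceil((dueDate.getTime() - Date.now()) / MS_PER_DAY);
}

// Same logic, but "now" is a parameter: the test simply passes a Date.
function daysUntilDue(dueDate: Date, now: Date): number {
  return Math.ceil((dueDate.getTime() - now.getTime()) / MS_PER_DAY);
}

// No fake clock and no fixtures: two values in, one value out.
const now = new Date("2024-01-01T00:00:00Z");
const due = new Date("2024-01-11T00:00:00Z");
assert.equal(daysUntilDue(due, now), 10);
```

The production caller still reads the real clock, but it does so once, at the edge, and hands the value in.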

Cannot test in isolation. If you cannot instantiate the class without the full application context — the web framework, the database pool, the configuration system — the class is not actually a unit. It is a piece of the monolith that happens to have its own file. Units should be able to exist on their own. When they cannot, the boundary is wrong. The code under test has taken a dependency on the environment rather than on an abstraction.

Why Engineers Read It Backwards

The default model is: write the code, then write the test to verify it works. In that sequence, the test arrives after the design decision. If the design is wrong, the test has to accommodate it. Engineers accommodate it — they write the thirty-one-line setup, add the extra mock, import the thing that should not be there — and call it done. The test has already spoken. They just were not listening.

This is what test-driven development discovered from the other direction: if you write the test first, the difficulty of writing it stops you before you commit the design. The pain surfaces at the design moment, not as accommodation after the fact. The pain is early and cheap.

Paid after the fact, the same signal becomes late and compounding. The thirty-one-line setup does not go away. It becomes the floor — the minimum cost of touching this function. Every future test will start from the same place.

You do not have to do TDD to read the signal. You just have to treat the difficulty of testing as information rather than inconvenience.

What to Do With the Signal

When a test requires more setup than assertion, stop. Before adding the next mock, ask:

Why does this function need this dependency at all? If it is acquiring data from a collaborator that could be passed in as a parameter, the dependency is not necessary — it is an assumption baked into the structure. Consider the difference:

// Reaches out — test needs a database stub and a notification mock
function calculateMonthlyInstalment(loanId: string): number {
  const rate = db.getInterestRate(loanId);
  const principal = db.getPrincipal(loanId);
  notificationService.log('calculation_started', loanId);
  return (principal * rate) / 12;
}

// Receives what it needs — test needs two numbers
function calculateMonthlyInstalment(principal: number, rate: number): number {
  return (principal * rate) / 12;
}

The business logic is identical. The testability is completely different. Functions that receive the data they need through parameters are almost always easier to test because their dependencies are explicit. Functions that go out and get things are only testable when the thing they go to get exists, or when you have built a convincing fake of it.
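The database reads and the notification do not disappear after this change; they move to the caller. Here is a minimal sketch of what that caller might look like. The `LoanStore` and `Notifier` interfaces and the `instalmentForLoan` function are assumptions made for illustration, standing in for the `db` and `notificationService` objects above, whose real shapes this article does not show.

```typescript
import { strict as assert } from "node:assert";

// Pure calculation, as in the second version above.
function calculateMonthlyInstalment(principal: number, rate: number): number {
  return (principal * rate) / 12;
}

// Hypothetical interfaces standing in for `db` and `notificationService`.
interface LoanStore {
  getInterestRate(loanId: string): number;
  getPrincipal(loanId: string): number;
}
interface Notifier {
  log(event: string, loanId: string): void;
}

// Thin shell: gathers inputs, reports the event, and delegates the arithmetic.
function instalmentForLoan(loanId: string, store: LoanStore, notifier: Notifier): number {
  notifier.log("calculation_started", loanId);
  return calculateMonthlyInstalment(store.getPrincipal(loanId), store.getInterestRate(loanId));
}

// The calculation is tested with two numbers and nothing else.
assert.equal(calculateMonthlyInstalment(12000, 0.06), 60);

// The shell gets one small test with hand-written substitutes.
const events: string[] = [];
const store: LoanStore = { getInterestRate: () => 0.06, getPrincipal: () => 12000 };
const notifier: Notifier = { log: (event, loanId) => events.push(`${event}:${loanId}`) };
assert.equal(instalmentForLoan("loan-1", store, notifier), 60);
assert.deepEqual(events, ["calculation_started:loan-1"]);
```

The substitutes are still needed, but only for the one function whose job is wiring. The arithmetic, which is where the edge cases live, is tested without any of them.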

What is this function actually doing? If the setup must account for database state and email state and configuration state and user state — how many things is this function responsible for? Each distinct concern is a candidate for extraction. Code that does one thing is almost always easier to test than code that does three things. The setup complexity scales with the number of concerns, not with the difficulty of any individual concern. Count the mocks. That number approximates how many things the function is responsible for knowing about.
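To make the extraction concrete, here is a hypothetical sketch. The `LoanApplication` type, the validation rules, and all three function names are invented for illustration. Each concern becomes its own small function, and each can be tested with plain values.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical input shape, for illustration only.
interface LoanApplication {
  principal: number;
  annualRate: number;
  applicantEmail: string;
}

// Concern 1: validation. No database, no email client, no configuration.
function isValid(app: LoanApplication): boolean {
  return app.principal > 0 && app.annualRate >= 0 && app.applicantEmail.includes("@");
}

// Concern 2: calculation, using the simplified formula from this article.
function monthlyInstalment(app: LoanApplication): number {
  return (app.principal * app.annualRate) / 12;
}

// Concern 3: message formatting. Actually sending it is left to the caller.
function confirmationMessage(app: LoanApplication): string {
  return `Monthly instalment for ${app.applicantEmail}: ${monthlyInstalment(app).toFixed(2)}`;
}

const app: LoanApplication = { principal: 12000, annualRate: 0.06, applicantEmail: "a@example.com" };
assert.equal(isValid(app), true);
assert.equal(monthlyInstalment(app), 60);
assert.equal(confirmationMessage(app), "Monthly instalment for a@example.com: 60.00");
```

None of the three tests needs a mock, because none of the three functions knows where its input came from or where its output is going.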

Where is the boundary? If the function cannot be called without the full application context, the boundary between this unit and the rest of the system is implicit rather than declared. Making it explicit, with dependencies passed in rather than acquired and a clear interface between them, produces code that is testable by design. An object that receives its collaborators through its constructor can be tested with a substitute. An object that creates or locates its collaborators at runtime cannot, at least not without patching globals or module internals to intercept them.
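A minimal sketch of a declared boundary, assuming a hypothetical `InstalmentService` class and `RateSource` interface (neither appears in the code earlier in this article):

```typescript
import { strict as assert } from "node:assert";

// Hypothetical abstraction the class depends on, instead of a concrete database.
interface RateSource {
  rateFor(loanId: string): number;
}

// The collaborator arrives through the constructor, so the boundary is declared.
class InstalmentService {
  private readonly rates: RateSource;

  constructor(rates: RateSource) {
    this.rates = rates;
  }

  monthlyInstalment(loanId: string, principal: number): number {
    return (principal * this.rates.rateFor(loanId)) / 12;
  }
}

// In a test, the substitute is a plain object; no application context is needed.
const fixedRates: RateSource = { rateFor: () => 0.06 };
const service = new InstalmentService(fixedRates);
assert.equal(service.monthlyInstalment("loan-1", 12000), 60);
```

In production the same constructor would receive an implementation backed by the real data store. The class itself does not change, and it never learns which one it was given.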

These are not testing improvements. They are design improvements that happen to make testing easier. The testing friction is the messenger. The refactoring is the response.

The Broader Signal

Tests are the first consumer of your code. They instantiate it, call it, and assert on its outputs before any production caller does. If the first consumer is already struggling — needing scaffolding, special environments, extended configuration just to exercise a function — production callers will face some version of the same friction. They are just less vocal about it. The next engineer who needs to modify the function will not tell you it was hard to understand. They will just take longer. The incident that requires a fast change to this module will not announce that the coupling made it risky. It will just take longer to resolve than you expected.

A hard-to-test codebase is not just hard to test. It is hard to reason about, hard to modify, and hard to understand six months later when the context is gone and the person who built the thirty-one-line setup has moved on to a different team.

The next time a test fights back, pay attention to which part of the fight is loudest. The mock count. The setup length. The inability to run without the full application. Each one is a different sentence in the same message: the code is asking to be designed differently.

The challenge is learning to distinguish essential complexity from accidental complexity. Essential complexity is the domain itself, the business problem being solved, and it belongs in the code. Accidental complexity is the coupling introduced by the implementation, and it belongs on the refactor list.

The test is not the problem. The test is just the first thing honest enough to say so.

Frequently Asked Questions

What does it mean when a test is hard to write?

Hard-to-test code is almost always hard for the same structural reasons: it knows too many things about its collaborators, it does more than one thing, or it cannot exist without a specific environment. Difficulty of testing is a design signal — the test is exposing coupling, hidden state, or boundary violations in the code under test, not a problem with the test itself. The exception is genuine domain complexity: some problems are inherently hard, and a well-designed system in those domains may still require significant setup. The signal worth reading is accidental complexity — the kind that dissolves when you redesign.

