A Dozen Reasons Why Test-First Is Better Than Test-Later (Pt. 2)
From the Developer Essentials Newsletter: The intersection of Agile methods and technical software development.
5. Discovering Scenarios
Have you ever looked at a block of a mere dozen lines of old code (that you now have to enhance), and struggled to figure out all the permutations of what it does, and what could go wrong? If your team is writing unit tests afterwards, what if they missed an important scenario for that object? You could then mistakenly introduce a defect that would slip past the test suite.
When a team reassures me that they’re adding tests after the fact, we often discover they have perhaps 1/3 of the behaviors covered, or that the tests don’t sufficiently check all the side effects the code clearly performs. That leaves a lot of untested behavior that could break later, without anyone knowing!
With TDD, we don’t add functionality without a failing test, so we know we are much less likely to miss something. We’re actually “thinking in tests.” TDD requires us to keep a short list of our as-yet-untested scenarios, and we frequently think of more (or better) new scenarios while writing the tests and implementation for the more obvious scenarios.
It’s just so much easier to see the alternative paths and permutations that need testing while you are writing the code the first time, because that’s where your mind is at the time. With TDD, you’re dealing with just one scenario at a time, so the code grows in complexity incrementally, while the growing test suite protects the work you’ve already finished. And there’s always that short To-Do List at your side (or in your tests) where you can record the brainstorm of ideas that floods your mind.
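To make that rhythm concrete, here is a minimal sketch (a hypothetical string-calculator example of my own, not code from any team I’ve described) of how the short to-do list can live right next to the tests, with scenarios promoted into failing tests one at a time:

```python
import unittest

# To-do list of as-yet-untested scenarios (it grows as new ones occur to us):
# [x] empty string returns 0
# [x] a single number returns that number
# [ ] comma-separated numbers return their sum
# [ ] whitespace around numbers is ignored

def add(numbers: str) -> int:
    # Just enough implementation to pass the tests written so far;
    # the comma-separated case is still on the to-do list above.
    if not numbers:
        return 0
    return int(numbers)

class TestStringCalculator(unittest.TestCase):
    def test_empty_string_returns_zero(self):
        self.assertEqual(add(""), 0)

    def test_single_number_returns_that_number(self):
        self.assertEqual(add("7"), 7)
```

Each new scenario starts life as a failing test pulled off the list, and the list captures new ideas the moment they occur, so nothing depends on our remembering them later.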
6. It’s easier to write the test first.
If you try to write the tests afterwards, what happens if a test fails? Is the test wrong? Is the code wrong? Sure, it’s easy to figure out which is true…right?
Confirmation bias may play a role here. “Ah! Maybe that’s what’s supposed to happen. Yeah…that makes sense. Kinda…” So you “fix” the test to match the code, or — “No! Please stop! Nooooo…” >poof!<—you delete the failing test.
We’re human. We avoid pain. I suspect that’s why a lot of unit tests don’t get written after the fact, at all. We avoid those areas where we’re not sure we got it right, where the requirements were a bit vague, where we made some assumptions, or where we think it’s just not that important. You know, that “less important” code we keep hearing about (but we’ve never encountered).
7. You can’t place your bet after the wheel has stopped.
Writing the implementation first, and then testing the behavior that you wrote, is rather like trying to convince yourself that you really had wanted to bet on red, not black.
I recall just such a confirmation-bias disaster. In the mid-90s, our enterprise-backup software’s UNIX ports had a chunk of code based on “tar,” the UNIX tape-archival command. On restore, tar would mask out the super-user execution bit on any restored file. I recall seeing this code (I hadn’t written it…our architect had copied and pasted it directly from tar), and assumed that a smart, security-minded thing to do in tar was likely a smart thing for our product, too.
Except that our product was expected to restore a whole system to a bootable state.
No one noticed, for years. Then one day a client had to restore a root partition on a massive server, and the dang thing wouldn’t boot correctly. Oops!
Turns out the architect, the executive developer, and the UNIX developer (moi) had all unknowingly conspired to ruin a customer’s day (more like “further ruin”).
All the test scripts we had written around restore functionality assumed that the tar code was right. None of those scripts checked the restored files’ SETUID bits.
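In hindsight, the missing check was tiny. Here is a hedged sketch (in Python, with hypothetical function names; the real product was a UNIX port, not Python) of the tar-style bit masking and the single assertion that would have caught it:

```python
import stat

# Hypothetical reconstruction of the bug: a tar-style restore that masks
# out the setuid/setgid bits on every restored file's mode.
def tar_style_restore_mode(mode: int) -> int:
    return mode & ~(stat.S_ISUID | stat.S_ISGID)

# What a full-system, bootable restore needs: preserve the mode exactly.
def faithful_restore_mode(mode: int) -> int:
    return mode

original = 0o4755  # a setuid-root executable

# The assertion none of our restore scripts ever made:
assert faithful_restore_mode(original) & stat.S_ISUID, "setuid bit lost!"

# The same assertion against the copied tar logic exposes the bug:
# tar_style_restore_mode(original) comes back as 0o0755, setuid gone.
```

One line of checking the restored mode bits, in any one of those scripts, would have failed years before that client’s server refused to boot.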
How human of us, to assert only that we couldn’t possibly have made a mistake. The scientific equivalent would be carefully crafting a flawed experiment to bolster a pet theory.
We had damaged our reputation, and risked the reputation of the product. Thankfully, UNIX ports took about 20 minutes, assuming we had access to the appropriate machine. And we were able to FedEx a magnetic tape overnight…
8. Good behavioral coverage, guaranteed.
With TDD, we want to see the test fail. That’s how we know it’s testing something new. And to make it fail, we have to add at least one assertion or expectation. This gives us what I call “behavioral coverage”: We’re not concerned with covering (or exercising) code. We’re covering real system behaviors.
Seems obvious, right? Apparently, not always. I’ve come across entire test suites that reached 80% code coverage, and tested…absolutely nothing.
How does this happen?
One team had been encouraged by their leadership to increase their code-coverage numbers: “Thou shalt increase code coverage by 10% per month!” They were successful in meeting this “motivational metric.”[1] The managers had called me in after the team had reached about 80%, because their defect rate had not decreased, at all.
I sat down with the team and they showed me their tests. None of the tests I saw had assertions/expectations. No behavioral coverage!
When I recovered from a dumbfounded state, I asked, as evenly as I could, if they could tell me what the tests I saw on the screen were meant to test. They quite sincerely answered, “Well, if it doesn’t throw an exception, it works!”
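That failure mode is easy to reproduce. Here is a minimal sketch (a hypothetical Account class of my own, in Python’s unittest, though any xUnit framework shows the same thing) of an assertion-free test that inflates coverage, alongside one that actually pins down behavior:

```python
import unittest

class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

class TestAccount(unittest.TestCase):
    def test_deposit_exercises_code_only(self):
        # Executes deposit(), so every line counts toward code coverage,
        # but it passes as long as no exception is thrown, even if the
        # resulting balance is complete nonsense.
        Account().deposit(100)

    def test_deposit_asserts_behavior(self):
        account = Account()
        account.deposit(100)
        # The assertion is what turns code coverage into behavioral coverage.
        self.assertEqual(account.balance, 100)
```

Written test-first, the second test fails before deposit() exists, which is exactly how we know it is testing something. The first test never fails at all.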
Not a good standard of quality for an institutional investment firm.[2]
When you write the test first, you have to ask the critical question, “What behavior am I testing here?”
“See” you next month!
[1] The name for a metric that de-motivates people. To understand this counter-intuitive effect, I recommend Robert Austin’s book, Measuring and Managing Performance in Organizations.
[2] Yeah, just be thankful they weren’t working on self-driving cars!