Can a Unit of Behavior Span Objects?
A client recently asked me whether a unit test could test multiple objects. On the surface, you might suppose my immediate answer would be “No way!” But his question and my answer were both a bit more nuanced than that.
His question was based on my definition of a “unit” of software as a “unit of behavior.” It doesn’t map to a single method, block, or even line of code. A unit of behavior is really an irreducibly small scenario. (Yes, even some subatomic particles are aggregates, so let me say “sanely irreducible” and let you decide whether your test-infected colleagues are crazy, or lazy, or–hopefully–comfortably in between those two extremes.)
He was wondering if there was ever such a unit of behavior that spans object classes. It’s a great question, and it tells me that he was really thinking about the implications of “unit of behavior.”
And my answer, nevertheless, is still “no.” Except it’s really a “no, but…!”
Well-factored objects each have their own contextual behavior that is easy to understand, communicates intent, and is easy to test in isolation. In other words, if there’s a unit of behavior that is only partly satisfied by two collaborating objects of different classes, then either the behavior can be reduced (and tested) further, or the behavior is not well-defined and well-isolated inside one object class. Multiple objects are designed to work together to build up composite system behaviors.
An Example Without Metasyntactic Variables!
We need a simple example to illustrate the “but”: one so simple that I can describe it without going into gory detail, and without resorting to metasyntactic variables (Foo, Bar, Baz…). They are, I’m told, dead. And they never were all that helpful. (Still, I will mourn their passing…XYZZY!)
Okay, I think I have one. Imagine a Loan object that has a collection of Asset objects. That’s all I’ll say. No, I can’t give you more context, because it would mislead you. Loan references Asset, and that’s our whole amazing system. Oh, and they’re both our team’s code, so we can’t blame anyone else if they’re difficult to test, or difficult to use in a test.
Well, and they can be stored in and retrieved from our wondrous and elegant object database. I want to throw that little wrinkle into it. You may be thinking “test-doubles!” Sure, but…no. (Sorry.)
Nowadays I don’t recommend using test-doubles for every dependency. Actually, I never spent any time in that camp (London-style TDD). I had friends who tried to mock out every one of an object’s dependencies in order to build software top-down (a noble goal, to be sure) and to test it in “total, pure isolation.” After a few months, they gave up.
(Aside: If I had to come up with a single general rule for test-doubles, it would be: Mock out your wrapper–Adapter, Facade, Proxy–of someone else’s code. But that’s not today’s topic. Are you trying to distract me from the real issue???)
So how do we test Loan without creating test-doubles for Assets?
If Asset is fully tested, then Asset can be used in tests for Loan. We don’t need to mock out Asset unless it has a dependency of its own on something that is slow, nondeterministic, difficult to set up, or expensive.
That’s why I mentioned the database.
If we have a system where each object is responsible for its own CRUD operations as well as its business behaviors, then (1) it’s breaking the Single Responsibility Principle, (2) it’s poorly cohesive, (3) there’ll be a shipload of duplication across persisted objects, and–worst of all–(4) it’ll be hard to test, and even harder to use in other tests. Have you ever had to create a mock that returned other mocks? Yeah, me too. Smelly!
All very bad, and in “Multiple Orthogonal Dimensions of Pain!” But if Asset objects are easily constructible, stick to their limited responsibilities, and are easily injected into the Loan (via its constructor), then the database isn’t even involved in the tests for Asset or Loan.
(Okay, I’ll share one little detail for the ultra-curious: Asset was not an abstract class. It was an aggregate, with a type code like MultiTenancy, Retail, AnchorRetail, and other crystal-clear, unchanging business delineations [that’s sarcasm, by the way].)
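A minimal Python sketch (with pytest-style test functions) of an Asset that fits that description: a plain value object with a type code, constructible in one line and testable with no database anywhere. The type-code names come from the post; the `AssetType` enum, the `WEIGHT_PERCENT` table, and the `evaluation()` rule are hypothetical stand-ins for business rules the post deliberately leaves out.

```python
from dataclasses import dataclass
from enum import Enum


class AssetType(Enum):
    # Type codes mentioned in the post (not an exhaustive list).
    MULTI_TENANCY = "MultiTenancy"
    RETAIL = "Retail"
    ANCHOR_RETAIL = "AnchorRetail"


# HYPOTHETICAL per-type weighting, as a whole-number percentage.
# The real business rules behind each type code are not described.
WEIGHT_PERCENT = {
    AssetType.MULTI_TENANCY: 80,
    AssetType.RETAIL: 60,
    AssetType.ANCHOR_RETAIL: 90,
}


@dataclass(frozen=True)
class Asset:
    """Plain, easily constructed value object: no database, no other dependencies."""

    asset_type: AssetType
    assessed_value: int  # whole dollars, which keeps the arithmetic exact

    def evaluation(self) -> int:
        # HYPOTHETICAL per-Asset evaluation: assessed value weighted by type code.
        return self.assessed_value * WEIGHT_PERCENT[self.asset_type] // 100


# Asset's own isolated unit tests: one small behavior per test.
def test_retail_asset_is_weighted_at_60_percent():
    assert Asset(AssetType.RETAIL, 1_000_000).evaluation() == 600_000


def test_anchor_retail_asset_is_weighted_at_90_percent():
    assert Asset(AssetType.ANCHOR_RETAIL, 1_000_000).evaluation() == 900_000
```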
So if we’ve thoroughly tested all the behaviors for all types of Asset objects, then we can selectively choose which Assets to use in a Loan being tested, and we can predict the outcome in a unit test. In other words, the creation of specifically configured Assets becomes part of the “Given” of the unit test.
Let’s say, for example, that a Loan with a specific Asset type with an assessed value in a certain range would secure a rating of “AA.” Also, the Asset class has already been fully tested to give back the correct per-Asset evaluation. So we’re testing just a single behavior of Loan.
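Here is one way such a Loan test might look, as a hypothetical Python sketch. Real, specifically configured Assets are injected through Loan’s constructor as the “Given,” and the single assertion is about Loan’s rating. `Loan.rating()`, its thresholds, and the Asset weighting are invented for illustration; a compact Asset is included so the block runs on its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class AssetType(Enum):
    RETAIL = "Retail"
    ANCHOR_RETAIL = "AnchorRetail"


# HYPOTHETICAL per-type weighting (whole-number percentages).
WEIGHT_PERCENT = {AssetType.RETAIL: 60, AssetType.ANCHOR_RETAIL: 90}


@dataclass(frozen=True)
class Asset:
    asset_type: AssetType
    assessed_value: int

    def evaluation(self) -> int:
        return self.assessed_value * WEIGHT_PERCENT[self.asset_type] // 100


class Loan:
    def __init__(self, assets: Sequence[Asset]):
        # Assets arrive through the constructor; Loan never touches a database.
        self._assets = tuple(assets)

    def rating(self) -> str:
        # HYPOTHETICAL rating rule based on the sum of per-Asset evaluations.
        total = sum(asset.evaluation() for asset in self._assets)
        if total >= 5_000_000:
            return "AA"
        if total >= 1_000_000:
            return "A"
        return "B"


def test_loan_secured_by_large_anchor_retail_asset_is_rated_AA():
    # Given: a real, already-tested Asset, configured for this scenario
    loan = Loan([Asset(AssetType.ANCHOR_RETAIL, 6_000_000)])
    # When/Then: the expectation is about Loan's behavior, not Asset's
    assert loan.rating() == "AA"
```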
Does that contradict what I said about a unit of behavior not spanning objects? Nope. The test will be exercising, or “covering,” code in both Loan and Asset, but the test’s assertions (or expectations) will be about Loan’s calculations and interactions with Asset.
In a sense, Asset is acting as its own test-double. And why not? Simple objects are easy to construct. If they have 8 other dependencies, that’s a code smell.
If they can only be retrieved from a database, that’s a smell. If they’re too complex to make the scenario clear and comprehensible, that’s a smell. What if Asset is ugly, stinky, untested legacy code? What if it’s slow, hard to construct, non-deterministic, and expensive to use (i.e., it accesses a database)? You already know: Mock it! (Technically, any form of test-double will do. But “Test-double it!” doesn’t roll off the tongue…)
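If Asset is that kind of legacy code, a hand-rolled stub can take its place. In this hypothetical Python sketch, `Valued` is an assumed minimal interface (a `typing.Protocol`, so the stub doesn’t need to inherit from anything), `StubAsset` returns a canned evaluation, and the rating thresholds are invented. A library such as `unittest.mock` would also work; a small hand-written stub just keeps the “Given” easy to read.

```python
from typing import Protocol, Sequence


class Valued(Protocol):
    """HYPOTHETICAL: the only thing Loan needs from an asset."""

    def evaluation(self) -> int: ...


class Loan:
    def __init__(self, assets: Sequence[Valued]):
        self._assets = tuple(assets)

    def rating(self) -> str:
        # HYPOTHETICAL rating thresholds on the summed evaluations.
        total = sum(asset.evaluation() for asset in self._assets)
        if total >= 5_000_000:
            return "AA"
        if total >= 1_000_000:
            return "A"
        return "B"


class StubAsset:
    """Hand-rolled test double standing in for a slow or untested legacy Asset."""

    def __init__(self, canned_evaluation: int):
        self._canned = canned_evaluation

    def evaluation(self) -> int:
        # Fixed value: no database, no legacy code, fully deterministic.
        return self._canned


def test_loan_is_rated_AA_when_assets_evaluate_high_enough():
    loan = Loan([StubAsset(3_000_000), StubAsset(2_500_000)])
    assert loan.rating() == "AA"
```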
Spooky Action at a Distance
The belief persists that the aggregate system behaviors still need testing. In a sense, yes, and that’s why unit-testing isn’t the only form of testing in the Agile Testing Pyramid (which I re-imagine as a pyramid of practices, in which the area of each layer roughly represents the relative number of tests produced).
But in another sense, all the behavior is still easily testable at the unit level.
What I’m saying, phrased differently, is this: The code isn’t going to manifest some arbitrarily unexpected behavior when you combine objects.
Tests from BDD (i.e., given/when/then scenarios in Cucumber’s feature files) check that we have implemented the rules and behaviors the customer is relying upon (that’s very different from saying we test at the GUI).
Tests resulting from TDD confirm that each piece of a well-designed system is doing its small part, allowing us to retain a resilient design that can be modified and extended later. And if we happen to break something, we’ll know immediately what, where, and why.
We developers really like having that safety net: It creates confidence, which allows us to make changes and enhancements more quickly. Ergo, the ever-elusive “team performance” increases, whether or not you’ve figured out a good way to measure it. (Aside: Velocity ain’t it. Full stop.)
So TDD can be used to fill in the gaps that we may choose to leave between BDD scenarios. TDD also allows us to break up combinatorial problems into individually testable units of behavior, including when and how a Loan interacts with an Asset. If we’ve decomposed our problem into discrete but purposeful objects, there will be no spooky unexpected aggregate behaviors. Automata Theory reassures us of that: computers don’t do anything we don’t explicitly tell them to do.*
Don’t be afraid to use behaviors, functions, and simple objects you’ve already fully tested elsewhere as part of a test for something else that relies on that trusted functionality. In other words: easy-to-construct objects can often act as their own “test doubles.”
Keep the tiniest bit of useful behavior well-encapsulated in one object or function. “Units of behavior” are easier to test, understand, refactor, and extend.
Lastly: You’re doing this for you and your team, now and in the future. Don’t wait to do the right thing until someone else tells you to do it.
*Caveat: Well, computers do exactly what we tell them to unless they’re hit by a cosmic ray…which happens far more frequently than we used to think. No, I’m not putting forth a wacky conspiracy theory…See the podcast below.
Maybe I’ll cover that in a future post: If the software is making a critical decision (like whether to increase your morphine dose, or choosing the next US President), then there are techniques to correct memory errors induced by these cosmic rays. I’ve still not gone off the rails! They’re really called cosmic rays!
So, testers: you need to learn how to transmit cosmic rays with your mind. (Okay, now I’m kidding.)
The Radiolab podcast describing how cosmic rays can alter an electronic election.