Can a Unit of Behavior Span Objects?
A client recently asked me if a unit test could test multiple objects. On the surface, you might suppose my immediate answer would be “No way!” But his question and my answer were both a bit more nuanced than that.
His question was based on my definition of a “unit” of software as a “unit of behavior.” Such a unit doesn’t map to a single method, block, or even line of code. A unit of behavior is really an irreducibly small scenario. (Yes, even some subatomic particles are aggregates, so let me say “sanely irreducible” and let you decide whether your test-infected colleagues are crazy, or lazy, or, hopefully, comfortably in between those two extremes.)
He was wondering whether there was ever a unit of behavior that spans object classes. It’s a great question, and it tells me that he was really thinking about the implications of “unit of behavior.”
And my answer, nevertheless, is still “no.” Except it’s really a “no, but…!”
Well-factored objects each have their own contextual behavior that is easy to understand, communicates intent, and is easy to test in isolation. In other words, if there’s a unit of behavior that is only partly satisfied by two collaborating objects of different classes, then either the behavior can be reduced (and tested) further, or the behavior is not well-defined and well-isolated inside one object class. Multiple objects are designed to work together to build up composite system behaviors.
An Example Without Metasyntactic Variables!
We need a simple example to illustrate the “but”: so simple that I can describe it without going into gory detail, and without resorting to metasyntactic variables (Foo, Bar, Baz…). They are, I’m told, dead. And they never were all that helpful. (Still, I will mourn their passing…XYZZY!)
Okay, I think I have one. Imagine a Loan object that has a collection of Asset objects. That’s all I’ll say. No, I can’t give you more context, because it would mislead you. Loan references Asset, and that’s our whole amazing system. Oh, and they’re both our team’s code, so we can’t blame anyone else if they’re difficult to test, or difficult to use in a test.
Well, and they can be stored in and retrieved from our wondrous and elegant object database. I want to throw that little wrinkle into it. You may be thinking “test-doubles!” Sure, but…no. (Sorry.)
Nowadays I don’t recommend using test-doubles for every dependency. Actually, I never spent any time in that camp (London-style TDD). I had friends who tried to mock out every one of an object’s dependencies in order to build software top-down (a noble goal, to be sure) and to test it in “total, pure isolation.” After a few months, they gave up.
(Aside: If I had to come up with a single general rule for test-doubles, it would be: Mock out your wrapper around someone else’s code, whether that wrapper is an Adapter, a Facade, or a Proxy. But that’s not today’s topic. Are you trying to distract me from the real issue???)
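To make the aside concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `AssetStore`, the `VendorDbAssetStore` adapter, the `revalue` function, and especially `client.fetch_one`, which is a placeholder rather than any real vendor API. The code under test depends only on the wrapper we own, so that wrapper is what gets mocked; the vendor library never shows up in the test.

```python
from typing import Protocol
from unittest.mock import create_autospec


class AssetStore(Protocol):
    # The narrow interface our own code depends on.
    def load_assessed_value(self, asset_id: str) -> int: ...


class VendorDbAssetStore:
    # Hypothetical Adapter: the only class that touches the vendor's client.
    def __init__(self, client):
        self._client = client

    def load_assessed_value(self, asset_id: str) -> int:
        # `fetch_one` is a placeholder, not a real vendor API.
        row = self._client.fetch_one("assets", asset_id)
        return row["assessed_value"]


def revalue(store: AssetStore, asset_id: str, factor: float) -> int:
    # Hypothetical code under test: it sees only our wrapper.
    return round(store.load_assessed_value(asset_id) * factor)


def test_revalue_scales_the_stored_value():
    # Mock the wrapper we own, not the vendor library behind it.
    store = create_autospec(VendorDbAssetStore, instance=True)
    store.load_assessed_value.return_value = 1_000_000

    assert revalue(store, "asset-42", 1.1) == 1_100_000
    store.load_assessed_value.assert_called_once_with("asset-42")


if __name__ == "__main__":
    test_revalue_scales_the_stored_value()
```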
So how do we test Loan without creating test-doubles for Assets?
If Asset is fully tested, then Asset can be used in tests for Loan. We don’t need to mock out Asset unless it has a dependency of its own on something that is slow, nondeterministic, difficult to set up, or expensive.
That’s why I mentioned the database.
If we have a system where each object is responsible for its own CRUD operations as well as its business behaviors, then (1) it’s breaking the Single Responsibility Principle, (2) it’s poorly cohesive, (3) there’ll be a shipload of duplication across persisted objects, and, worst of all, (4) it’ll be hard to test, and even harder to use in other tests. Have you ever had to create a mock that returned other mocks? Yeah, me too. Smelly!
All very bad, across “Multiple Orthogonal Dimensions of Pain!” But if Asset objects are easily constructible, stick to their limited responsibilities, and are easily injected into the Loan (via its constructor), then the database isn’t even involved in the tests for Asset or Loan.
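A minimal sketch of that shape in Python, with hypothetical names (`AssetType`, `Asset`, `Loan`, `LoanRepository`, `total_assessed_value`): Assets are plain values passed into `Loan` through its constructor, and persistence sits behind a separate repository interface, so neither business class needs the database in its unit tests.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol, Sequence


class AssetType(Enum):
    # Hypothetical type codes.
    MULTI_TENANCY = "MultiTenancy"
    RETAIL = "Retail"
    ANCHOR_RETAIL = "AnchorRetail"


@dataclass(frozen=True)
class Asset:
    # A plain value object: cheap to build, holds no database handle.
    asset_type: AssetType
    assessed_value: int
    location: str


class Loan:
    def __init__(self, assets: Sequence[Asset]):
        # Assets are injected through the constructor, never fetched here.
        self.assets = tuple(assets)

    def total_assessed_value(self) -> int:
        return sum(a.assessed_value for a in self.assets)


class LoanRepository(Protocol):
    # Persistence is a separate responsibility behind its own interface;
    # only a real implementation of this would talk to the object database.
    def save(self, loan: Loan) -> None: ...

    def find(self, loan_id: str) -> Loan: ...


# Constructing and exercising a Loan needs no database at all.
loan = Loan([
    Asset(AssetType.RETAIL, 2_000_000, "CA"),
    Asset(AssetType.ANCHOR_RETAIL, 3_000_000, "NV"),
])
print(loan.total_assessed_value())  # 5000000
```

In a design like this, only an implementation of the repository would need tests against a real database; the Loan and Asset tests never would.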
(Okay, I’ll share one little detail for the ultra-curious: Asset was not an abstract class. It was an aggregate, with a type code like MultiTenancy, Retail, AnchorRetail, and other crystal-clear, unchanging business delineations [that’s sarcasm, by the way].)
So if we’ve thoroughly tested all the behaviors for all types of Asset objects, then we can selectively choose which Assets to use in a Loan being tested, and we can predict the outcome in a unit test. In other words, the creation of specifically configured Assets becomes part of the “Given” of the unit test.
Let’s say, for example, that a Loan holding an Asset of a specific type, with an assessed value in a certain range, would secure a rating of “AA.” Also, the Asset class has already been fully tested to give back the correct per-Asset evaluation. So we’re testing just a single behavior of Loan.
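A minimal sketch of such a test in Python, using pytest-style test functions. The rule is hypothetical (any multi-tenancy asset in California assessed between $1M and $5M earns the Loan an “AA”), as are the names `qualifies_for_aa` and `rating` and the “BBB” fallback. The Given is a pair of real, hand-built Assets; the expectation is only about what Loan does with them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    # Hypothetical Asset, assumed to be fully tested on its own already.
    asset_type: str
    assessed_value: int
    location: str

    def qualifies_for_aa(self) -> bool:
        # Hypothetical per-Asset evaluation; the range is illustrative only.
        return (self.asset_type == "MultiTenancy"
                and self.location == "CA"
                and 1_000_000 <= self.assessed_value <= 5_000_000)


class Loan:
    def __init__(self, assets):
        self.assets = list(assets)

    def rating(self) -> str:
        # Loan's own behavior: how it combines per-Asset evaluations.
        # "BBB" is a hypothetical fallback rating.
        return "AA" if any(a.qualifies_for_aa() for a in self.assets) else "BBB"


def test_loan_with_multi_tenant_ca_asset_in_range_is_rated_aa():
    # Given: real, specifically configured Assets (no test doubles)
    loan = Loan([
        Asset("Retail", 750_000, "NV"),
        Asset("MultiTenancy", 2_500_000, "CA"),
    ])
    # When / Then: the expectation is about Loan, not about Asset
    assert loan.rating() == "AA"


def test_loan_without_qualifying_asset_is_not_rated_aa():
    loan = Loan([Asset("MultiTenancy", 2_500_000, "NV")])
    assert loan.rating() != "AA"


if __name__ == "__main__":
    test_loan_with_multi_tenant_ca_asset_in_range_is_rated_aa()
    test_loan_without_qualifying_asset_is_not_rated_aa()
```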
Does that contradict what I said about a unit of behavior not spanning objects? Nope. The test will exercise (or “cover”) code in both Loan and Asset, but the test’s assertions (or expectations) will be about Loan’s calculations and interactions with Asset.
In a sense, Asset is acting as its own test-double. And why not? Simple objects are easy to construct. If they have 8 other dependencies, that’s a code smell.
If they can only be retrieved from a database, that’s a smell. If they’re too complex to make the scenario clear and comprehensible, that’s a smell. What if Asset is ugly, stinky, untested legacy code? What if it’s slow, hard to construct, non-deterministic, and expensive to use (i.e., it accesses a database)? You already know: Mock it! (Technically, any form of test-double will do. But “Test-double it!” doesn’t roll off the tongue…)
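A minimal sketch of that case in Python, assuming a hypothetical `LegacyAsset` whose constructor and calculation are too slow or unpredictable to use in a unit test (simulated here by raising). `unittest.mock.create_autospec` builds a test double that rejects attributes the real class doesn’t have, and it never runs the real constructor.

```python
from unittest.mock import create_autospec


class LegacyAsset:
    # Hypothetical stand-in for untested legacy code. The raises simulate
    # "this would hit the database" so any accidental real use fails loudly.
    def __init__(self, asset_id: str):
        raise RuntimeError("would open a database connection")

    def qualifies_for_aa(self) -> bool:
        raise RuntimeError("slow, nondeterministic legacy calculation")


class Loan:
    def __init__(self, assets):
        self.assets = list(assets)

    def rating(self) -> str:
        # "BBB" is a hypothetical fallback rating.
        return "AA" if any(a.qualifies_for_aa() for a in self.assets) else "BBB"


def test_loan_is_rated_aa_when_legacy_asset_qualifies():
    # The autospec'd double mirrors LegacyAsset's methods; __init__ never runs.
    asset = create_autospec(LegacyAsset, instance=True)
    asset.qualifies_for_aa.return_value = True

    assert Loan([asset]).rating() == "AA"
    asset.qualifies_for_aa.assert_called_once()


if __name__ == "__main__":
    test_loan_is_rated_aa_when_legacy_asset_qualifies()
```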
Spooky Action at a Distance
The belief persists that the aggregate system behaviors still need testing. In a sense, yes, and that’s why unit-testing isn’t the only form of testing in the Agile Testing Pyramid (which I re-imagine as a pyramid of practices, where the area of each layer roughly represents the relative number of tests produced).
But in another sense, all the behavior is still easily testable at the unit level.
What I’m saying, phrased differently, is this: The code isn’t going to manifest some arbitrarily unexpected behavior when you combine objects.
Tests from BDD (i.e., given/when/then scenarios in Cucumber’s feature files) check that we have implemented the rules and behaviors the customer is relying upon (that’s very different from saying we test through the GUI).
Tests resulting from TDD confirm that each piece of a well-designed system is doing its small part, allowing us to retain a resilient design that can be modified and extended later. And if we happen to break something, we’ll know immediately what, where, and why.
We developers really like having that safety net: It creates confidence, which allows us to make changes and enhancements more quickly. Ergo, the ever-elusive “team performance” increases, whether or not you’ve figured out a good way to measure it. (Aside: Velocity ain’t it. Full stop.)
So TDD can be used to fill in the gaps that we may choose to leave between BDD scenarios. TDD also allows us to break up combinatorial problems into individually testable units of behavior, including when and how a Loan interacts with an Asset. If we’ve decomposed our problem into discrete but purposeful objects, there will be no spooky unexpected aggregate behaviors. Automata Theory reassures us of that: computers don’t do anything we don’t explicitly tell them to do.*
Don’t be afraid to use behaviors, functions, and simple objects you’ve already fully tested elsewhere, as part of a test for something else that relies on that trustworthy functionality. In other words: easy-to-construct objects can often act as their own “test doubles.”
Keep the tiniest bit of useful behavior well-encapsulated in one object or function. “Units of behavior” are easier to test, understand, refactor, and extend.
Lastly: You’re doing this for you and your team, now and in the future. Don’t wait to do the right thing until someone else tells you to do it.
*Caveat: Well, computers do exactly what we tell them to unless they’re hit by a gamma ray…which happens far more frequently than we used to think. No, I’m not putting forth a wacky conspiracy theory…See the podcast below.
Maybe I’ll cover that in a future post: If the software is making a critical decision (like deciding whether to increase your morphine dose, or choosing the next US President), then there are techniques to correct memory errors induced by these cosmic rays. I’ve still not gone off the rails! They’re really called cosmic rays!
So, testers: you need to learn how to transmit cosmic rays with your mind. (Okay, now I’m kidding.)
The Radiolab podcast describing how gamma rays can alter an electronic election.
Can a unit span multiple objects? ABSOLUTELY! If you refactor a test-driven (or test-characterized) “unit” by extracting a class, it is still a unit. There is no value in moving the tests unless you plan to reuse the extracted class in another context. In fact, I have sometimes defined a “unit” as a “unit of reuse.”
I recall seeing a statement from Bob Martin that, while unit tests should initially reflect the structure of the code, they should not reflect it forever.
Your comment is quite insightful! I was formulating a response when I realized there’s enough to this that I want to write it up as a “sequel” to this post, for the September Developer Essentials Newsletter (& blog post).
Here is the Uncle Bob post that I was referencing: https://blog.cleancoder.com/uncle-bob/2017/10/03/TestContravariance.html
Hi Rob, I agree on the general point you are making, and I find it’s very useful to think in terms of “units of behaviour” rather than static units of code.
I have a few quibbles on the way the example test is written.
One is that the title of the test seems to suggest that there’s an asset with the properties of being CA and being “multi tenant”, but then in the code of the test it turns out there are two separate assets! I wonder how they are connected by the logic of Loan. What happens if I pass one asset of type CA and two of type “multi tenant” with different assessment values?
The second is that you’re specifying California as the location; if the result does not depend on that, I would pass a variable called “any_location” to signify that.
The third quibble is in the use of type tags… you could define separate subclasses of Asset for each subtype, as the type is probably always known at construction time.
Thanks for writing this note!
Let’s see if I can address each quibble.
1. “…the title of the test…” Yeah, I don’t recall my thinking behind having two assets in there. Likely, it’s to make sure that the code isn’t just looking at the first asset. In other words, the business rule (in my head at the time) may have been that *any* asset fitting those parameters would result in the AA rating. I’ll often do this to nudge the code into a loop or reduce().
2. “…California as the location…” Yup, I recall there was a business rule that was this geographically specific. I agree with you: “…if the result does not depend on that…” I would pass any_location or any_location().
3. “…you could define separate subclasses of Asset for each subtype…” Yep, though I save abstract parent classes for /behavioral/ categorization. All Assets used the same code. The rules around securitization were where the variations occurred. If anything, a SecuritizationRule hierarchy might have developed over time. You see why choosing examples is tricky!
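Pulling replies 1 and 3 together, here is one way it might look, sketched in Python with hypothetical rule names, thresholds, and a hypothetical `any_asset_satisfies` helper. Asset stays a single concrete class carrying its type code as data; the variation lives in small `SecuritizationRule` subclasses that can each be tested alone; and the loan-level check folds over every asset with `functools.reduce` (plain `any()` would be the more idiomatic Python). The qualifying asset is deliberately second, as in a two-asset test, so a shortcut that only inspects the first asset would fail.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from functools import reduce
from typing import Sequence


@dataclass(frozen=True)
class Asset:
    # One concrete class for every asset; the type code is plain data.
    asset_type: str
    assessed_value: int
    location: str


class SecuritizationRule(ABC):
    # Hypothetical hierarchy: the behavioral variation lives in the rules.
    @abstractmethod
    def applies_to(self, asset: Asset) -> bool: ...


class CaliforniaMultiTenancyRule(SecuritizationRule):
    def applies_to(self, asset: Asset) -> bool:
        return (asset.asset_type == "MultiTenancy" and asset.location == "CA"
                and 1_000_000 <= asset.assessed_value <= 5_000_000)


class AnchorRetailRule(SecuritizationRule):
    def applies_to(self, asset: Asset) -> bool:
        return asset.asset_type == "AnchorRetail" and asset.assessed_value >= 10_000_000


def any_asset_satisfies(rule: SecuritizationRule, assets: Sequence[Asset]) -> bool:
    # Fold over every asset; a test whose qualifying asset is not first
    # rules out a shortcut like `rule.applies_to(assets[0])`.
    return reduce(lambda found, a: found or rule.applies_to(a), assets, False)


assets = [Asset("Retail", 750_000, "NV"), Asset("MultiTenancy", 2_500_000, "CA")]
print(any_asset_satisfies(CaliforniaMultiTenancyRule(), assets))  # True
print(any_asset_satisfies(AnchorRetailRule(), assets))            # False
```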
Thanks for your comment!