TDD/BDD and the lean startup

A post at Hacker News caused a bit of a storm on Twitter by questioning whether a lean startup needs to use Test First, or whether it is an obstacle to delivery.

Build the right thing, or avoid Rework

The poster complains that test-first approaches don’t fit with market-driven evaluation of features:

Most of the code in a lean startup are hypotheses that will be tested by the market, and possibly thrown out (even harder to rewrite test cases with slightly different requirements over writing from the beginning).

First, let’s look at a big reduction in the cost of delivering software: building the right thing in the first place. People who come from organizations with little or no process often mistake Agile’s elevation of working software over process as support for having no process at all. It is not. In many ways classical agile is best understood in the context of what went before: heavyweight, often waterfall, processes such as SSADM, in which up to a third of a project’s time was spent in the design phase before any code was written. We don’t want the waste associated with that, but we do want just enough process, and we want it at the last responsible moment.

Test-first approaches focus on ensuring that we think about the acceptance criteria for the software we are about to build. The big danger we are trying to avoid is rework; it is estimated that 50% of a project’s overrun costs come down to it. Creating a testable specification, through a Given-When-Then scenario or a use case, helps us turn a woolly and vague requirement, such as “we need to show how late a tube train is running”, into something more concrete. We are forced to consider the edge cases and the failure conditions. It is cheaper to consider these before the code is written than to write the code and adjust it afterwards. There will always be some rework, particularly around UX, but we can remove the obvious communication barriers by having a structured conversation around the requirements.
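To make that concrete, here is a minimal sketch in Python of what such a testable specification might look like; the minutes_late function is a hypothetical stand-in for the real application code. Notice how writing the Given-When-Then down forces out an edge case the woolly requirement never mentioned.

```python
from datetime import datetime

def minutes_late(scheduled: datetime, actual: datetime) -> int:
    """Hypothetical domain function under specification."""
    return max(0, int((actual - scheduled).total_seconds() // 60))

def test_delayed_train_is_reported_late():
    # Given a train scheduled for 10:00
    # When it arrives at 10:07
    # Then it is reported as seven minutes late
    assert minutes_late(datetime(2012, 1, 1, 10, 0),
                        datetime(2012, 1, 1, 10, 7)) == 7

def test_early_train_is_not_reported_as_negative():
    # An edge case the vague requirement never mentioned:
    # Given a train scheduled for 10:00
    # When it arrives at 09:58
    # Then its lateness is zero, not minus two
    assert minutes_late(datetime(2012, 1, 1, 10, 0),
                        datetime(2012, 1, 1, 9, 58)) == 0
```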

BDD and ATDD emphasize avoiding rework by ensuring that you have the acceptance criteria for a story defined before work begins. The process of defining those criteria, often through a story workshop, is much of the value: it creates a shared understanding of how the product should work. Usually that workshop occurs in the preceding iteration and takes about 10-15% of its time. Whilst some folks consider it non-agile to work on the next iteration’s requirements, this really is the last responsible moment, because discussing the acceptance detail for a story may raise questions that take time to answer, and you don’t want to stall the line waiting for those answers mid-iteration.

So it’s important to understand that, in this context, Test First is about reducing waste in the development cycle, not increasing it.

From Extreme Programming:

Creating a unit test helps a developer to really consider what needs to be done. Requirements are nailed down firmly by tests. There can be no misunderstanding a specification written in the form of executable code.

We still need to address the question of what will work in the marketplace: what if your feature is supposed to “test the water”? Usually the concern here is whether testing is wasted effort if you decide to throw the feature away. Some of that depends on what we think the cost of working test first is (I’ll get to that next), but an alternative to waste reduction here is to look at incremental development approaches: figure out what the minimum marketable feature set is, and release that to gain feedback.

With subsequent iterations, increment the feature set if feedback suggests it is succeeding. At the heart of this, though, is the fact that a good product depends on a good vision, cleanly and simply executed.

Think about the great products you know: they tend to focus on doing one thing and doing it well, not on being a jack of all trades. The scatter-gun approach indicates a lack of product vision to me, not an effective strategy for uncovering what your users want. Build something simple, and wait for feedback on additional capabilities. But users have to buy your vision.

There are many emerging ideas on how to reduce the cost of discovering the market demand for new features without building them. The Pretotyping guys are gaining some publicity, but there is also the Design Fiction crowd. Both talk about the step before even prototyping, when you try to determine the viability of an idea. Recently at Huddle we have been practicing Documentation Driven Development in an effort to gain feedback on proposed API designs before we build them.

Feedback, and the cost of TDD

The poster on Hacker News complains of the time taken to write tests:

There always seems to be about 3-5x as much code for a feature’s respective set of tests as the actual feature (just takes a lot of damn time to write that much code).

It’s crazy talk to suggest that developers ship code without testing it in some way. Before TDD, the best practice I followed was to spin up the debugger and trace every line of code I had written, to make sure it behaved as I expected. Spinning up the debugger and tracing is expensive, far more expensive than unit testing, where you hope never to need a debugging session to implement a test (needing one is usually a sign, to use Kent Beck’s stick-shift analogy, that you are in the wrong gear in TDD and need to shift down by writing lower-level tests).

In addition, getting code to the point where you can debug it may be hard, requiring a number of steps to put the system into the right state. Unit tests are isolated from these concerns, which makes their context easy to establish; you have no such luck with manual tracing.
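As a sketch of that difference, with all names hypothetical: the unit test below builds its entire context in a single line, where reaching the same state in a debugger would mean starting the application and walking it there by hand.

```python
class FakeFeed:
    """Stands in for the live departures feed, so no running server is needed."""
    def __init__(self, delays):
        self._delays = delays

    def delays_for(self, line):
        return self._delays.get(line, [])

class DelayBoard:
    """Hypothetical component under test."""
    def __init__(self, feed):
        self._feed = feed

    def worst_delay(self, line):
        return max(self._feed.delays_for(line), default=0)

def test_worst_delay_picks_the_largest():
    # The whole system state the test needs is established in one line
    board = DelayBoard(FakeFeed({"Northern": [3, 12, 5]}))
    assert board.worst_delay("Northern") == 12
```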

The cost only gets worse if you have to repeatedly spin up the debugger to trace through the code as you enhance it. Most folks break a problem down into parts, adding new capabilities as they go; they work naturally incrementally. That means performing this manual test run again and again.

In addition, unit tests are not particularly new; they have been a recommended practice for years. Many developers wrote self-testing code (code shipped with a simple harness to establish context, exercise it, and assert on failure) long before test first capitalized on the approach by switching to writing the tests first. Anyone who cares about their code being correct ought to be unit testing, and if you write the test last you are in the unenviable position of paying the cost of tracing the code while you develop as well as the cost of writing a unit test afterwards.
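For anyone who has not seen that older style, here is a minimal sketch of self-testing code in Python: a hypothetical module that carries its own simple harness, establishing context, exercising the code, and asserting on failure.

```python
def add(a, b):
    """The production code this hypothetical module exists to provide."""
    return a + b

def _self_test():
    # Establish context, exercise the code, assert on failure
    assert add(2, 2) == 4, "add(2, 2) should be 4"
    assert add(-1, 1) == 0, "add(-1, 1) should be 0"
    print("all self-tests passed")

if __name__ == "__main__":
    _self_test()
```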

So any halfway responsible developer is usually doing more work to prove their code is correct without test first than with it.

The problem is that we tend to measure the time taken to write the code, which seems slower with up-front tests, instead of the time taken to produce working code, which is often much faster with tests. Because of this we feel that test first takes longer. But when we look at all the effort expended to deliver the story end-to-end, dropping test first tends to look like a local optimum.

In addition, even if test first offered zero gain, or made things slightly worse, the side-effect of gaining a test suite that you can re-run exceeds any likely gains from skipping it. So the bar tests must clear to prove their value against the cost of developing a feature is low.

The cost of owning tests can also cause concern. The poster at Hacker News complains:

When we pivot or iterate, it seems we always spend a lot of time finding and deleting test cases related to old functionality (disincentive on iterating/pivoting, software is less flexible).

One problem with ‘expensive’ tests is often that the implementers have not looked into best practice. Test code has its own rules; it is not a second-class citizen. You need to keep up to date with best practice to keep your test code in good shape. Too many teams fail to understand the importance of mastering the discipline of unit testing in order to make it cost effective. As a starter, I would look at xUnit Test Patterns, The RSpec Book, and Growing Object-Oriented Software, Guided by Tests if you need more advice on current practice.

I may write a longer post on this at some point, but my approach is to use a Hexagonal, or ports-and-adapters, architecture, with unit tests focused on exercising scenarios at the port, diving deeper only when I need to shift gears. Briefly, one reason people find tests expensive to own is an over-reliance on Testcase-per-class instead of Testcase-per-feature or Testcase-per-scenario. Remember that you care about putting your public API under test, which is usually the port level in a Hexagonal architecture, perhaps along with a number of top-level domain classes exercised through those ports. You may want to put tests around internals when you need to shift down a gear, in Kent Beck’s terminology, because you are unsure of the road ahead, but don’t succumb to the temptation to believe Test First is about testing every class in your solution. That only becomes an obstacle to refactoring, because changing the internal structure of your application will break tests, which makes testing itself feel as though it makes the application more brittle.
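A minimal sketch of what that looks like, with hypothetical names throughout: the tests are organised per scenario against the port, and the internal class is free to be restructured without breaking them.

```python
class _ScheduleParser:
    """Internal detail: no test targets this class directly."""
    def parse(self, raw):
        scheduled, actual = raw.split("|")
        return int(scheduled), int(actual)

class ShowLateRunning:
    """The port: the application's public entry point for this feature."""
    def __init__(self):
        self._parser = _ScheduleParser()

    def lateness(self, raw_record):
        scheduled, actual = self._parser.parse(raw_record)
        return max(0, actual - scheduled)

# Testcase-per-scenario: tests are named for behaviours, not for classes
def test_a_delayed_train_reports_its_lateness():
    assert ShowLateRunning().lateness("600|607") == 7

def test_an_early_train_reports_zero_lateness():
    assert ShowLateRunning().lateness("600|598") == 0
```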

Often, too many tests breaking is a sign that you have not encapsulated your implementation enough: you have exposed it all publicly, and you are over-specifying that implementation through tests.

Acceptance and Unit Tests

Let’s agree on definitions. Acceptance tests confirm that a story is done-done, whereas a unit test confirms that a part is correct. To put it another way: unit tests confirm that we built the software right; acceptance tests confirm that we built the right software.

A lot of the above discussion focuses on unit tests, but when we get to acceptance tests the situation is a little more complex. Acceptance testing may be automated, but it may also be manual, and the trade-off on when to automate is a more difficult call than it is with unit testing.

The clear win tends to come when you have well-defined business rules that you want to automate. Provided you have a hexagonal, or ports-and-adapters, architecture, you can usually test these at the port – i.e. subcutaneously, without touching the UX. Such tests tend to look like unit tests written at the same level; they just go end-to-end.
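As a sketch of such a subcutaneous test, assuming hypothetical names: the business rule is driven end-to-end through the port, with an in-memory adapter standing in behind it, and the UI never touched.

```python
class InMemoryArrivals:
    """Adapter: swapped in behind the port in place of the real feed or database."""
    def __init__(self):
        self._records = []

    def add(self, scheduled, actual):
        self._records.append((scheduled, actual))

    def all(self):
        return list(self._records)

class LateRunningReport:
    """The port the UI adapter would normally drive."""
    def __init__(self, arrivals):
        self._arrivals = arrivals

    def late_trains(self):
        return [actual - scheduled
                for scheduled, actual in self._arrivals.all()
                if actual > scheduled]

def test_report_lists_only_the_late_trains():
    arrivals = InMemoryArrivals()
    arrivals.add(600, 607)   # seven minutes late
    arrivals.add(610, 610)   # on time
    assert LateRunningReport(arrivals).late_trains() == [7]
```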

Automated tests tend to be a little more complex when they go end-to-end, because you have to worry about setting up data in the required state for the test to run. If you find your tests taking too long to write, you may need to break out common functionality, such as setting up context, into a TestDataBuilder. (The TestDataBuilder pattern is also useful in unit tests for setting up complex collaborating objects.)
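Here is a minimal sketch of the TestDataBuilder pattern, against a hypothetical Order domain: the builder supplies sensible defaults so that each test spells out only the details that matter to its scenario.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    is_premium: bool

@dataclass
class Order:
    customer: Customer
    total: float

class OrderBuilder:
    """Hides noisy setup behind defaults that individual tests override."""
    def __init__(self):
        self._customer = Customer(name="Anne", is_premium=False)
        self._total = 10.0

    def with_premium_customer(self):
        self._customer = Customer(name="Anne", is_premium=True)
        return self

    def with_total(self, total):
        self._total = total
        return self

    def build(self):
        return Order(customer=self._customer, total=self._total)

# A test states only what its scenario cares about:
order = OrderBuilder().with_premium_customer().with_total(250.0).build()
assert order.customer.is_premium and order.total == 250.0
```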

There is a misunderstanding that acceptance tests are about tooling: Fitnesse, Cucumber, or the like. They are really about agreeing the acceptance criteria up front, the clarity that gives the story (preventing rework), and the automation of the confirmation so that what is done stays done. The appropriate tooling depends on how you need to communicate the resulting tests; with a ports-and-adapters architecture you just need a test runner to execute code, and the tests it runs may be expressed in natural language or in code, depending on the audience.

Because much of the value of acceptance tests comes from defining done-done, you could argue that automating them gives back less value than the process of agreeing the test plan. However, that suite of automated tests gives you quick regression-testing support, and that supports your ability to re-write instead of refactor. Unit tests support preserving behaviour whilst refactoring, but that does not extend to changing the public interface; for that, i.e. preserving the business rules while making large-scale changes, you need acceptance tests.

It is common for teams that hit the bar of effective unit testing to still fail at the bar of acceptance testing.

My experience of working both with and without acceptance tests is that, without them, bugs often creep into the software that the unit test suite does not catch. This is especially true when our acceptance tests are our only integration tests. If we do not enforce a strict policy that existing acceptance tests must be green before check-in, we get builds that introduce regression issues, and those may go uncaught for some time. It is this long feedback loop, between an issue being introduced by a check-in and its discovery, that causes significant problems, because we no longer know which change led to the regression.

So by avoiding the cost of acceptance tests we take on the cost of fixing those regression defects, and because the gap between a developer making a check-in and the defect being discovered is long, they may cost more to fix than writing the acceptance tests would have done.

In environments where I have enforced acceptance tests being green before check-in, the quality of the software improved, and when regression issues did emerge it was clear to the developer that their changes had caused the issue, leading to faster resolution (and preserving the current test environments).
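One lightweight way to enforce such a policy, sketched under the assumption of a pytest suite living in tests/acceptance, is a git pre-push hook that refuses the push while the acceptance tests are red:

```python
#!/usr/bin/env python3
# Save as .git/hooks/pre-push and make it executable.
# The tests/acceptance path and the choice of pytest are assumptions.
import subprocess
import sys

result = subprocess.run(["pytest", "tests/acceptance", "-q"])
if result.returncode != 0:
    print("Acceptance tests are red; push rejected.", file=sys.stderr)
    sys.exit(1)
```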

Of course, this may depend on how long it takes to regression test your software. The equation looks even more marginal to many when we think about automating UI tests for acceptance. The cost of keeping tests in step with changes to the UI can often seem fruitless, and particularly in the early stages of a product the cost of creating and maintaining UX tests can seem to exceed their value. You can end up in a scenario where a manual regression test run costs less than maintaining an automated test suite, and finds more defects besides.

This is especially true if you have a thin UI adapter layer over your ports, with acceptance tests against the ports catching your end-to-end issues. You already have plenty of behaviour-preserving tests around your business rules, so the UI tests should really only be picking up UI defects. (Of course, if you have a Magic Pushbutton architecture, all bets are off, as you can’t easily write tests without exercising the UI.)

The problem is that this equation holds true until it doesn’t. At a certain point the cost of manual regression testing becomes an obstacle to regular releases: simply put, it takes too long to confirm the software, and the cost of releasing rises to the point where you often defer it, because no single release carries enough value to justify the regression-testing effort.

At that point you will wish you had been building those automated UI tests all along.

I think there is a point where you need to begin UX automation, but it is hard to call exactly when. It may depend on your ability to create adequate coverage through acceptance tests written against your ports; it may depend on the complexity of the UI. Up to now I think I have always started too early, regretted it and abandoned the effort, and then resumed too late because of that. I need to learn more here before I can make a call on getting this right. Again, I suspect that understanding best practice around this type of testing would help, and I’d be happy for anyone to share links in the comments below.

TDD/BDD and the Lean Startup

Overall, I think there are two dangers in the simplistic position taken by the original poster. The first is to mistake reducing the time taken to get to the point where you can run code for reducing the time taken to get to the point where you can release code. Once you appreciate that the two are not the same, and look at the overall costs and benefits, the value of TDD/BDD becomes more compelling. The second is to mistake a lack of understanding of best practice in TDD for a failure of the process itself. We have come a long way in the last few years in learning how to reduce the costs of writing and owning tests, and TDD/BDD should be judged in that context.