Connecting code to business value - a foray into Behavior Driven Development - CodeProject


Introduction

This article is a walk-through: it starts by defining what is actually useful to an end user (the aims, or business value) and then connects that formal value statement to code that tests whether the software system actually delivers that value.

The discussion in this post focuses on the process of developing this way, i.e. the pros and cons of BDD in practice, rather than on technical details (for a how-to in .NET, see BDD using SpecFlow).

  • As such, it should be relevant to any programming language.
  • Please do not read this as best practice; rather, I am sharing my first experiences of developing this way and the issues that surfaced.

Background

For more background information on the general approach used here, which in my opinion should be called Value Driven Development rather than Behavior Driven Development, see e.g. http://dannorth.net/whats-in-a-story/

From scenarios to falsifiable code

I start with a specific, narrow value proposition written in the Gherkin language. It should formally define something that is useful to someone, and it serves as the axiomatic guide for the subsequent process.

In my case, it is for an app that locks the user out for certain times:

Feature: Allow the user to set a personally relevant lockup schedule so as to make the lockup times befitting

    Scenario: Prompt for workday and holiday lockup start times to allow the user to set a personally relevant lockup schedule
        When the app has finished starting up
        And the user has not been set a lockup schedule yet
         # Identifying users - e.g. to not mix up Bob's and Sally's schedules - is a separate desirable
        Then the user is prompted with the opportunity to change the default workday and holiday lockup start times
        And the chosen times are saved off the user's device
         # Saving is tested by checking if these user settings are retrievable immediately after saving,
         # while they should be saved permanently, that is not tested in this scenario so as to get test results back immediately
         # Making sure the schedule is actually used to lock up is a separate desirable

[ The initial requirements analysis process of getting to the scenario above, by defining higher-level stories or features is another story. ]

Having just the scenario on its own is one of the key reasons I am trying to develop this way. It makes sure we know what we want, which is not the case if such specific requirements exist only in someone's mind. Worse still, if the desired functionality of an application is only vaguely deliberated in someone's head, we are likely to develop the wrong things (wasted effort) or to develop lower-priority functionality at the expense of what was really desired.

Having a test that exercises the scenario is the other key reason I am trying to develop this way. Why this test is useful shares its rationale with the usefulness of testing in general. In particular, with a black-box functional test like this I can not only confirm that the scenario is fulfilled at the end of developing it; as the software system is developed further in other respects, I can relatively effortlessly re-confirm that the original scenario still works (and if new code changes make the test fail, I know to fix it). With an application-full of scenarios and their tests, this approach lets me restore functionality before having to release broken software, besides providing confidence that my software achieves what it is supposed to.

Lastly on the pro side, I would like to actually run these tests in live environments on actual end-user devices, to help with troubleshooting. When a user reports that x is not working on their device, I could run all the functional tests there, see which ones pass and fail, and then focus on making the failing tests pass by debugging the exact steps at which they failed. I have not heard of this being done (nor have I looked much), but in my head it sounds like a wonderful way to help with troubleshooting. Of course, running these tests on client devices requires the ability to do so, and the same goes for getting results back about exactly which step failed for which tests; in theory, though, I don't think setting up that infrastructure is too much work. Local testing could also be done proactively, before a user reports a problem, to identify environments that were not taken into account when implementing the desired functionality.

Implementing the scenario test

Now that I have a scenario, I need to tie it to code that tests step by step whether all these conditions hold, so that the goal is a test I can run that will either pass or fail overall. In this case, I am using SpecFlow in a C# WPF project with NUnit to integrate the scenarios with the tests. To start with, I defined essentially empty methods for each of the scenario steps (the When / And / Then / And) and put a Pending statement in each, telling the NUnit testing framework that the steps have not been implemented yet (running the test now yields Inconclusive rather than pass or fail).
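The empty step skeleton might look roughly like this (a sketch assuming SpecFlow's standard binding attributes; the class and method names are my own, and the step texts must match the scenario):

```csharp
using TechTalk.SpecFlow;

[Binding]
public class LockupScheduleSteps
{
    [When(@"the app has finished starting up")]
    public void WhenTheAppHasFinishedStartingUp()
    {
        // Not implemented yet: SpecFlow/NUnit reports the scenario as Inconclusive.
        ScenarioContext.Current.Pending();
    }

    [When(@"the user has not been set a lockup schedule yet")]
    public void WhenTheUserHasNotBeenSetALockupScheduleYet()
    {
        ScenarioContext.Current.Pending();
    }

    [Then(@"the user is prompted with the opportunity to change the default workday and holiday lockup start times")]
    public void ThenTheUserIsPromptedToChangeTheDefaults()
    {
        ScenarioContext.Current.Pending();
    }

    [Then(@"the chosen times are saved off the user's device")]
    public void ThenTheChosenTimesAreSavedOffDevice()
    {
        ScenarioContext.Current.Pending();
    }
}
```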

When the app has finished starting up

To implement this step of the test, I first need to kill the existing app process in case it is running, which is very likely for this app, given that I would be testing on the end-client device. Then I need code to launch the app (note that with NUnit on Windows the test runs in its own isolated process). And then I need to wait for some kind of signal indicating the app has finished starting up. It took me four hours to implement just this step (and I wasn't being slow)!
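The kill / launch / wait sequence could be sketched as follows (an assumption-laden sketch: the named event "MyApp.StartedUp" is my own convention for the start-up signal; the app would have to set it once start-up completes):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class AppLauncher
{
    // Kill any running instance, start a fresh one, and wait for a start-up signal.
    public static Process RestartAndWait(string appPath, TimeSpan timeout)
    {
        var name = System.IO.Path.GetFileNameWithoutExtension(appPath);
        foreach (var existing in Process.GetProcessesByName(name))
        {
            existing.Kill();
            existing.WaitForExit();
        }

        var process = Process.Start(appPath);

        // The app is assumed to set this named event when it has finished
        // starting up; IPC could equally serve as the readiness signal.
        using (var ready = new EventWaitHandle(false, EventResetMode.ManualReset, "MyApp.StartedUp"))
        {
            if (!ready.WaitOne(timeout))
                throw new TimeoutException("App did not signal start-up in time.");
        }
        return process;
    }
}
```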

So here a dire doubt surfaces: if I did not use this approach, I would not have to spend any of this time implementing test steps, because there would be no functional test to implement. On the other hand, I hope the time needed to implement test steps will drop substantially in the future, through familiarity and because many of the needed implementations will be similar (e.g. checking whether an app is running, or launching one). That, in turn, raises the point of re-use, which is why I implemented the steps in a modular way: in the future I could, for instance, just call a method in my personal testing library with the app path as the argument. We will see whether that reduces the time spent on subsequent scenarios.

P.S. I used When instead of Given for this step in the Gherkin language, because the app finishing start-up is the key event from which the desired action should result (hence When), rather than a background pre-condition for another key event to occur (which would call for Given).

And the user has not been set a lockup schedule yet

It is the app that should know what the lockup schedule is, if there is one. So to implement this step, I need to set up some kind of interprocess communication (IPC) between the testing framework's process and the application under test. I chose WCF for that (in .NET), and setting it up on both sides took another considerable amount of time (two hours). Also, I am querying the system under test itself (the app), which might not be a good idea. Should I be getting the information from an independent source, and if so, why?
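The IPC contract could look something like this (a sketch; the interface, operation, and endpoint names are my own, and a named pipe is one reasonable binding choice for same-machine WCF):

```csharp
using System.ServiceModel;

// Contract the app exposes so the test process can query its state over WCF.
[ServiceContract]
public interface ITestProbe
{
    [OperationContract]
    bool HasLockupSchedule(); // Has a custom lockup schedule been set for the user?
}

// Test side: connect to the app's endpoint over a named pipe.
public static class TestProbeClient
{
    public static ITestProbe Connect()
    {
        var factory = new ChannelFactory<ITestProbe>(
            new NetNamedPipeBinding(),
            new EndpointAddress("net.pipe://localhost/MyApp/TestProbe"));
        return factory.CreateChannel();
    }
}
```

The app would host an implementation of `ITestProbe` with a `ServiceHost` on the same address.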

OK, I have a way to communicate with the source that should know the lockup schedule, but at the moment this feature (scenario) is not implemented at all; the application just uses a default, hard-coded lockup time. I feel that the best thing to do at this stage is to always answer no (i.e. the user has not been set a lockup schedule yet) and not to implement any functionality that would store custom lockup times for the application to look up. That keeps the process test-driven; consequently, once the actual scenario is implemented, I have to come back to this test step and change its code to look up whether a schedule exists, rather than always returning no.
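At this stage, the step implementation would then be little more than a hard-coded stub (a sketch; the NUnit assertion and hypothetical probe call show where the real IPC query will later go):

```csharp
using NUnit.Framework;
using TechTalk.SpecFlow;

[Binding]
public class LockupScheduleScheduleSteps
{
    [When(@"the user has not been set a lockup schedule yet")]
    public void WhenTheUserHasNotBeenSetALockupScheduleYet()
    {
        // Test-driven stub: the schedule feature does not exist yet, so the
        // answer is hard-coded to "no schedule set". Once the feature is
        // implemented, replace this with the real IPC query,
        // e.g. probe.HasLockupSchedule().
        bool hasSchedule = false;

        Assert.IsFalse(hasSchedule, "Expected no lockup schedule to be set yet.");
    }
}
```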

And here I realize I am not confident about what should happen when a When step fails: should the test fail overall, pass, or report pending? If this or the previous step fails, the state of affairs is that I have not been able to confirm the functionality working, so it should not be an overall pass. I suppose choosing between fail and pending depends on how the test results are actually used at the end of the day?

More questions than answers. Feel free to chime in with comments.

Then the user is prompted with the opportunity to change the default lockup start times

I'm making use of IPC again to ask the application whether the user is given a chance to change the default lockup times (via a settings form). I have set up two-way communication (duplex WCF), so the application can push an event to the test process when the schedule user interface is shown to the user. But as in the previous step, none of that scenario's user interface or code exists yet, so to keep things test-driven I simply ask for the information and time out, because the (non-existent) user interface is never shown. This step is therefore guaranteed to fail at this stage, which truly reflects that the functionality formalized in the scenario is still missing.
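The waiting-with-timeout part of this step could be sketched like so (an assumption: the callback contract and its member names are my own; in duplex WCF the service contract would reference it via `CallbackContract`):

```csharp
using System;
using System.ServiceModel;
using System.Threading;

// Callback contract for duplex WCF: the app pushes an event to the test
// process when the schedule UI is shown to the user.
public interface ITestProbeCallback
{
    [OperationContract(IsOneWay = true)]
    void ScheduleUiShown();
}

public class TestProbeCallback : ITestProbeCallback
{
    public readonly ManualResetEventSlim UiShown = new ManualResetEventSlim();

    public void ScheduleUiShown() => UiShown.Set();
}

// In the Then step, wait for the callback with a timeout; while the UI does
// not exist, the wait times out and the step fails, as it should:
//
//   bool shown = callback.UiShown.Wait(TimeSpan.FromSeconds(10));
//   Assert.IsTrue(shown, "Schedule UI was never shown.");
```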

Perhaps once the infrastructure is in place, time costs become less of a worry: implementing this step took only half an hour (albeit a partial implementation at this stage, because the functional code required for the full test step does not exist yet).

And the chosen times are saved off the user's device

Even though this step will be skipped when the test runs (because the previous step fails), I should still implement as much of it as possible without implementing the actual functionality. Again, that boils down to asking the cloud service whether the lockup times are retrievable, which at this stage is a constant no. Half an hour added.

Implementing actual functionality

This is where a test-driven programmer might immediately start writing a concrete class that implements the functionality in the simplest, least abstract way: not one class with one monolithic method, but only a few degrees removed from that. I think this is where TDD can be short-sighted, ignoring the many design patterns and engineering principles that yield more re-usable and maintainable code at the expense of simplicity. So first I like to do some up-front design and architecting to sketch a more future-proof implementation, identifying existing components that can be re-used as well as new components that need implementing, all of which need to be integrated in a certain way to fulfil the functionality defined in the scenario. Doing it this way does not mean I am not test-driven: I am identifying components that are absolutely necessary, and sufficient as an integrated set, to carry out the desired functionality and pass the scenario test.

Once I have the implementation skeleton, I like to mock all of it to improve on the component interfaces and witness that I can actually glue the (unimplemented) components together for value to emerge. And then implement the missing components. All of that gets into programmatic detail, however, which is not the focus of this article.
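For this scenario, such a skeleton might amount to a couple of interfaces plus the glue that composes them (a sketch; all names are my own, and a mocking library such as Moq could replace hand-written fakes when exercising the glue):

```csharp
using System;

public class LockupSchedule
{
    public TimeSpan WorkdayStart;
    public TimeSpan HolidayStart;
}

// Persists the schedule off the user's device (e.g. via the cloud service).
public interface IScheduleStore
{
    void Save(LockupSchedule schedule);
    LockupSchedule Load();
}

// Shows the prompt where the user can change the default start times.
public interface ISchedulePrompt
{
    LockupSchedule PromptUser(LockupSchedule defaults);
}

// Glue: prompt the user with the defaults, then persist the chosen times.
public class ScheduleSetup
{
    private readonly ISchedulePrompt _prompt;
    private readonly IScheduleStore _store;

    public ScheduleSetup(ISchedulePrompt prompt, IScheduleStore store)
    {
        _prompt = prompt;
        _store = store;
    }

    public void Run(LockupSchedule defaults) => _store.Save(_prompt.PromptUser(defaults));
}
```

Mocking both interfaces lets me confirm that `ScheduleSetup` composes them correctly before either real component exists.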

Points of Interest

As a first practical experience with value driven development, the amount of time needed to write the scenario test does weigh heavily against the benefits described. I think it is a matter of going through another scenario, and then another, to see whether they take less time subsequently (and I hope to update the article when I have done that). However, if the time spent is still not substantially reduced compared to more traditional approaches, then perhaps it is worth carrying on with only half of this approach: skipping the scenario test altogether but retaining the scenario writing itself?

And I need to go further down the line with this approach to substantiate the benefits themselves, i.e. reach the stage where scenarios are re-tested (because codebase changes have occurred elsewhere) and the tests are run on client devices; these benefits remain theoretical at the moment.

All in all, first impressions do not seem to be enough to decide which version of the approach to adopt, if any.