Ruling Code Quality Regression | Patrick Smacchia
A prominent characteristic of the software industry is that products are constantly evolving. All modern development methodologies advocate that a product should evolve through small iterations. Internally, development teams use Continuous Integration servers that shrink the increment length to a minimum.
In continuously evolving code bases, the chances of regression are high. A regression occurs when a new bug is introduced inadvertently because of a change in code. The weapon against regressions is automated, repeatable tests. This is why developers are advised to cover their code with unit tests: not only to make sure that the code works as intended, but also to check as often as possible that future changes won’t break it inadvertently.
While it is advised to write performance tests to avoid performance regressions, other non-functional requirements, such as code quality, are not checked continuously. As a direct consequence, code quality declines, nobody really cares, and this significantly affects code maintainability. There are tools that assess quality and tools that report how global code quality evolves, but so far no tool exposes code quality regressions in detail.
This is a problem I’ve been mulling over for the last few years. With the latest release of NDepend, we’ve introduced a multi-purpose code ruling capability through Code Query Linq (CQLinq). A CQLinq query is a C# LINQ query that runs against the NDepend.API code model.
A wide range of code aspects can be queried, including code quality and the diff between two versions of a code base. A Code Quality Regression rule typically relies on both the code quality and the diff areas of the NDepend.API code model.
For example, one default CQLinq code rule matches methods that were already complex and became even more complex, in the sense of the popular Cyclomatic Complexity code metric, which is exposed directly by the IMethod interface. When NDepend compares two snapshots of the code base, the extension method OlderVersion() returns, for an IMethod object, the corresponding IMethod object in the older snapshot. For a method that has been added, OlderVersion() returns null. This is why the query first keeps only methods where IsPresentInBothBuilds() holds, and then only those where CodeWasChanged() holds, since the complexity of a method cannot change if the method remains untouched.
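A minimal sketch of what such a rule can look like in CQLinq follows. It is meant for NDepend's query editor while two snapshots are being compared, so it is not a standalone C# program. OlderVersion(), IsPresentInBothBuilds() and CodeWasChanged() come from the description above; the JustMyCode.Methods domain, the warnif prefix, the CyclomaticComplexity property name and the threshold of 6 are assumptions made for illustration and may differ from the rule as shipped.

```csharp
// <Name>Avoid making complex methods even more complex</Name>
warnif count > 0
from m in JustMyCode.Methods
where !m.IsAbstract &&
      m.IsPresentInBothBuilds() &&   // skip added methods: OlderVersion() would be null
      m.CodeWasChanged()             // an untouched method cannot change complexity
let oldCC = m.OlderVersion().CyclomaticComplexity
where oldCC > 6 &&                   // assumed threshold for "already complex"
      m.CyclomaticComplexity > oldCC // complexity increased since the older snapshot
select new { m, oldCC, newCC = m.CyclomaticComplexity }
```

Selecting both the old and the new value makes the size of each regression visible in the result.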
This query can be edited live in Visual Studio, and its result is displayed immediately (here we analyze two different versions of the NUnit code base).
For any method, a right-click option lets you compare the older and newer versions through a source diff tool (I especially like the integrated diff capabilities of VS 2012). Even better, to avoid irrelevant formatting and comment diffs and focus only on code changes, another right-click option lets you compare the older and newer versions decompiled through Red Gate’s .NET Reflector.
A group of default code rules named Code Quality Regression is provided. They all rely on the OlderVersion() diff bridge extension method, and each focuses on one code quality aspect, such as the ratio of code covered by tests, other code metrics like the number of lines of code, or the fact that an immutable type became mutable (which can typically break client code).
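As an illustration of the last case, a rule that detects an immutable type becoming mutable could be sketched as follows. This is a hedged sketch, not the rule as shipped: the Application.Types domain and the IsImmutable and IsStatic property names are assumptions about the NDepend.API code model, and the culprit-field listing is added for illustration.

```csharp
// <Name>Avoid transforming an immutable type into a mutable one</Name>
warnif count > 0
from t in Application.Types
where t.IsPresentInBothBuilds() &&   // the type must exist in the older snapshot
      t.CodeWasChanged() &&
      t.OlderVersion().IsImmutable && // it used to be immutable ...
     !t.IsImmutable                   // ... and no longer is
// List the instance fields that are now mutable (assumed property names).
let culpritFields = t.Fields.Where(f => !f.IsStatic && !f.IsImmutable)
select new { t, culpritFields }
```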
And since CQLinq is really just the C# LINQ syntax you already know, it is easy to define a particular code aspect that shouldn’t be broken, and write a custom regression rule on it.
Among the various code aspects that we developers don’t like to see changed is the visibility of publicly visible code elements. For dev shops responsible for publishing an API, this issue is known as API Breaking Changes, and there are some default CQLinq rules to detect it.
The source code of the API Breaking Changes rules follows the same idea. Here we are looking for types in the older version of the code base that are publicly visible, and that:
- have been removed in the newer version of the code base (while their parent assembly is still present)
- or are no longer publicly visible.
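The two conditions above can be sketched in CQLinq as follows. The query iterates over the types of the older snapshot, so it relies on diff helpers going in the other direction: codeBase.OlderVersion(), WasRemoved() and NewerVersion(). These names, together with IsPubliclyVisible, ParentAssembly and Visibility, are assumptions about the NDepend.API code model and should be checked against its documentation.

```csharp
// <Name>API Breaking Changes: Types</Name>
warnif count > 0
from t in codeBase.OlderVersion().Application.Types
where t.IsPubliclyVisible &&
      // The type was removed while its parent assembly is still present ...
      ((t.WasRemoved() && !t.ParentAssembly.WasRemoved()) ||
      // ... or the type still exists but is not publicly visible anymore.
       (!t.WasRemoved() && !t.NewerVersion().IsPubliclyVisible))
select new {
    t,
    NewVisibility = t.WasRemoved() ? " " : t.NewerVersion().Visibility.ToString()
}
```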
Some facilities help present the result to the user. For example, when chasing broken public interfaces (i.e. with methods added or removed), the select clause of the query can be refined to list the culprit added and removed methods.
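A sketch of such a refined select clause is shown below. It assumes the WasAdded() and WasRemoved() extension methods and the IsInterface property of the NDepend.API code model; the variable names are illustrative only.

```csharp
// <Name>API Breaking Changes: Interfaces</Name>
warnif count > 0
from tNewer in Application.Types
where tNewer.IsInterface &&
      tNewer.IsPubliclyVisible &&
      tNewer.IsPresentInBothBuilds()
let tOlder = tNewer.OlderVersion()
where tOlder.IsPubliclyVisible
// Removed methods only exist in the older snapshot, added ones in the newer.
let methodsRemoved = tOlder.Methods.Where(m => m.WasRemoved())
let methodsAdded = tNewer.Methods.Where(m => m.WasAdded())
where methodsAdded.Count() > 0 || methodsRemoved.Count() > 0
select new { tNewer, methodsAdded, methodsRemoved }
```

Returning the two method sequences in the anonymous type is what lets the user browse the culprit methods for each matched interface.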
Typically, code quality tools are daunting because, for any real-world legacy code base, they list thousands of issues. It is neither realistic nor productive to freeze development for weeks or months to fix all reported flaws.
Focusing only on new quality regressions, those that appeared during the current iteration, has the advantage of restricting the number of issues to fix to an acceptable range. It also makes the issues more relevant, since it is more productive to fix issues in recent code before it is released to production than to fix issues in stable code that has been untouched for a long time.