Order In The Software Universe: June 2018

Intro

I have been in the software development business for well over twenty-five years and have seen many successful and unsuccessful projects. I have spent the past ten years on teams following Agile principles to varying degrees. The teams have always combined different levels of experience and expertise, but I have been very fortunate to work on teams heavily populated with developers who are not only experienced but also open-minded.

On these teams, I have seen patterns of behavior that, on the surface, seem beneficial to the team, the product, and the company, but in reality come with a lot of unrecognized costs. These costs are unmanaged; they are not planned for; they are not quantified. Oftentimes they are never historically reconciled.

There is one particular behavioral offender in this developer ecosystem and it might surprise you. I'm talking about the hero developer.

Hercules

Meet Hercules. He is a member of our mythical development team. Hercules fits pretty well into the team. He communicates reasonably. He is a competent developer. He gets his work done consistently. But where he really shines is when there are looming, seemingly impossible deadlines. He is the one who stays up into the wee hours, throwing himself across the finish line just in time.

He is the hero. He is the one who gets the adoration of managers and that good ole back slap with an "attaboy". Honestly, he really loves the attention and approval from his manager and the awe from the junior developers who hope to grow up to be like Hercules someday.

Managers see the deeds of Hercules and know that this is how successful software is created. If only they had a bunch of developers just like Hercules, their software problems would be solved.

A Deeper Analysis

First, in the context of Hercules, let's acknowledge that being dedicated to your team and working hard are assets and that these assets are not found in all developers. But when we look at Herculean deeds from an Agile point-of-view, we start to see issues that can undo all of the gains that came from the providence of Hercules.

Hercules specializes in cranking out code in a short amount of time. Sometimes it seems the more pressure, the better. Hercules knows that code. Due to time pressure, he didn't get to do a lot of the unit or integration tests, but through sheer ability to focus, he delivered today's functionality, and that's what's important, right?

Hercules sure knows that code, but the other developers don't. The agile developer wants group code ownership. They want group input on design choices. They want the most well thought out, flexible code that the company and project can afford. The agile development team wants to de-silo the implementation, meaning that multiple developers are familiar with the code so future changes occur at a reasonable cost. But given the single source effort, the odds go against this goal.

A good hero sees heroics as an aberration - something to be avoided if possible. But I have seen cases where heroes use heroics as a path to job security. They hoard information by being the only custodians of said code, so that others recognize their area of expertise. They want to make sure that there is a large, painful hole in the development team should they ever leave. Naturally, job security like this is good for them and really bad for the company.

Another issue involves scaling the team. If you have a Hercules, it is almost impossible to clone him/her. You should be able to find developers, but if you need one to fit into a chaotic, barely-managed environment, you need another Hercules, and they are both harder to find and cost more money.

The final drawback is a bit more subtle. Tom DeMarco's book "Slack" (https://www.amazon.com/Slack-Getting-Burnout-Busywork-Efficiency/dp/0767907698) does a good job of discussing the issue. In a nutshell, if you have heroics as your normal means of delivering software, your heroes are >100% busy - and this means time for creativity and giving back (like mentoring) falls by the wayside. This cost is rarely recognized by management but can have a very large impact on the business over time.

Parting Advice

Hire as many heroes as you can and avoid heroics at all costs.

A Brief Lexicon of the Software Quality Landscape

The quality of a software product is a multi-dimensional measurement, spanning such things as functionality, correctness, performance, documentation, ease of use, flexibility, maintainability, and many others. Many of these qualities are difficult to measure, are difficult to see, and hence are difficult to manage. The result is that they are ignored by all but the most enlightened in management.

The topic of interest for this screed is going to be correctness. This is not going to concern correctness in the end-user sense, meaning the correct meeting of requirements. It is going to mean "is the code doing what we think it should be doing." Since we don't generally have provably correct programs, it is a matter of convincing ourselves, through lack of evidence of failure, that our programs are working as we'd like. This precarious situation is perfectly captured by Edsger Dijkstra's observation that "program testing can be used to show the presence of bugs, but never to show their absence."

So we have several levels of ways to convince ourselves of correctness. They are, from most detailed to most abstract, unit testing, integration testing, functional testing, and system testing. Note that there is not industry agreement concerning these exact terms, but the general concepts are recognized.

In typical object-oriented designs, unit testing involves isolating a given class, driving its state and/or behavior, and verifying that we see what we expect. Integration testing is a layer above that, where we use multiple classes in the tests. Functional testing is yet above that, where we try to deploy our programs in the natural components that they would inhabit in production, like a server or a process. Finally, system testing covers testing in the full production-like environment.

Unit Testing

Our focus will be on the lowest level, namely unit testing. The expectation is that unit tests are both numerous (think many hundreds or even thousands) and extremely fast (think milliseconds). To be effective, these tests should be run on every single compile on every developer's machine across the organization. The goal is that the unit tests precisely capture the design intent behind the implementation of the class code, and that any violation of that intent results in immediate feedback to the developer making code changes.

I'd like to tell you that every developer is doggedly focused on both the quality of the production logic and the thoroughness of the unit tests that back that logic, but I can't. Through a combination of poor training, lack of emphasis at the management level, and just plain laziness, developers produce tests that range from great all the way down to downright destructive (more on that in another blog entry). One of the easiest ways to track this testing externally is through code coverage.

Code Coverage

Code coverage is a set of metrics that can give developers and other project stakeholders a sense of how much of the production logic has been tested by the unit tests. The simplest metric is "covered lines of code", aka line coverage. This is usually a percentage: if a class has 50 lines of code in it and it has 60% line coverage, then 30 lines of that production logic are executed as part of the running of the unit tests for that class. There are other coverage metrics that can help you gauge the goodness of your tests, like branch coverage, class coverage, and method coverage. But here, we will focus on line coverage since that is most widely used.
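To make the arithmetic concrete, here is a minimal sketch in plain Java. The class and method names (CoverageMath, linePercent, Amount, magnitude) are made up for illustration and are not part of any coverage tool. The second class hints at why the other metrics exist: a single call with a negative number executes every line of magnitude(), yet only one of the two outcomes of the "if" is ever exercised.

```java
// Hypothetical helper for illustration only; real coverage tools report this for you.
final class CoverageMath {
    // Line coverage as a percentage: covered lines divided by total lines.
    static double linePercent(int coveredLines, int totalLines) {
        if (totalLines <= 0) {
            throw new IllegalArgumentException("totalLines must be positive");
        }
        return 100.0 * coveredLines / totalLines;
    }
}

// Hypothetical production class used to contrast line and branch coverage.
final class Amount {
    static int magnitude(int x) {
        if (x < 0) {   // two outcomes: true and false
            x = -x;    // executed only when x is negative
        }
        return x;
    }
}
```

With the numbers from the text, CoverageMath.linePercent(30, 50) gives 60.0. A test that only calls Amount.magnitude(-5) touches every line of the method, so line coverage reads 100%, but the path where x is already non-negative is never taken.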

The general, common-sense assumption is that "more is better", so misguided management and deranged architects insist on 100% code coverage, thinking that would give the maximum confidence that the quality of the code is high. If we had an infinite amount of time and money to spend on projects, this conception could represent the optimum. Since this luxury has never existed in the last 4 billion years, we have to spend our money wisely. And this changes things drastically.

The truth is that it might cost M dollars+time to achieve, say, 80% line coverage, but it might take M *more* dollars+time to get that last 20%. In some cases, getting the last few percentage points might be extremely expensive. The reasons for this non-linear cost are complicated.

First, production logic should be tested through its public interface where possible rather than through a protected or private interface. It can be laborious to construct the conditions necessary to hit a line of code buried in try/catches and conditional logic behind public interfaces. This cost can be lowered by refactoring the code towards better testability, but this is a continuous struggle as new code is produced. There is a truism among veteran developers that increasing the testability of the production logic improves its design.
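As a rough sketch of what "refactoring towards testability" can look like, consider a fallback line buried in a catch block. If the class opened a file itself, a test would need to provoke a real I/O failure to reach that line. In the version below, the I/O sits behind a small interface so a test can hand in a source that fails on demand. All names here (TextSource, GreetingService) are hypothetical and exist only for this example.

```java
import java.io.IOException;

// Hypothetical seam: anything that can produce text, possibly failing with I/O errors.
interface TextSource {
    String read() throws IOException;
}

// Hypothetical production class, tested only through its public behavior.
final class GreetingService {
    private final TextSource source;

    GreetingService(TextSource source) {
        this.source = source;
    }

    String greeting() {
        try {
            String name = source.read().trim();
            return name.isEmpty() ? "Hello, stranger" : "Hello, " + name;
        } catch (IOException e) {
            // Without the injected TextSource, this line is expensive to cover.
            return "Hello, stranger";
        }
    }
}
```

A test can now cover the catch block with new GreetingService(() -> { throw new IOException("disk gone"); }).greeting(), with no file system involved. The design also improves as a side effect, since the class no longer cares where the text comes from.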

Second, some code has high complexity, commonly measured as cyclomatic complexity (roughly, the number of independent paths through a piece of code). Arguably this code should be refactored, but projects do have a certain percentage of their code with high cyclomatic complexity that gets carried forward from sprint to sprint.
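For a feel of how quickly the number grows, here is a small hypothetical method (Shipping.cost is invented for this example). Using the common "decision points plus one" counting rule, it has five decision points and therefore a cyclomatic complexity of six. Tools differ in exactly what they count, so treat the number as approximate.

```java
// Hypothetical example; the pricing rules are made up.
final class Shipping {
    static int cost(int weightKg, boolean express, boolean member) {
        if (weightKg <= 0) {                 // decision 1
            throw new IllegalArgumentException("weightKg must be positive");
        }
        int cost = 5;
        if (weightKg > 20) {                 // decision 2
            cost += 15;
        } else if (weightKg > 5) {           // decision 3
            cost += 5;
        }
        if (express) {                       // decision 4
            cost *= 2;
        }
        return member ? cost - 2 : cost;     // decision 5
    }
}
```

Even this short method needs several distinct inputs before every path has been exercised, and each extra condition added in a later sprint raises that count again.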

The third reason is a bit technical. Languages like Java are compiled into bytecode. The code coverage tools run off of an analysis of the bytecode, not the source code. The Java compiler will consume the source code and emit bytecode that may have extra logic in it, meaning code with extra branches. It might not be possible to control the conditions which would take one path or the other through this invisible branch. Further complicating this, the invisible logic can change from one Java compiler release to the next, putting a burden on the test logic to reverse engineer the conditions needed to cover it.
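One example of such invisible logic, as far as I know, is a switch on a String. The Java compiler has typically turned it into a switch on hashCode() followed by an equals() check, and the "equals() returned false" outcome is a branch that never appears in the source. The sketch below uses a made-up class (Codes) and relies on the fact that "Aa" and "BB" have the same hash code; the commented desugaring is approximate and may differ between compiler releases.

```java
// Hypothetical example; the codes and return values are made up.
final class Codes {
    static int rank(String code) {
        switch (code) {
            case "Aa":
                return 1;
            case "Cc":
                return 2;
            default:
                return 0;
        }
    }
    // Roughly what the compiler has typically emitted (a sketch, not exact bytecode):
    //   switch (code.hashCode()) {
    //       case 2112: if (code.equals("Aa")) { ... }  // hidden branch when equals() is false
    //       case 2176: if (code.equals("Cc")) { ... }  // hidden branch when equals() is false
    //   }
}
```

Calling Codes.rank("BB") lands in the same hash bucket as "Aa", fails the equals() check, and falls through to the default. A test author chasing full branch coverage would have to discover a colliding string like this for every case label, which is exactly the kind of reverse engineering described above.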

Summary

Based on the above discussion, achieving 100% line coverage can be very expensive. On teams that I have worked on over the years, a reasonable line coverage would be 70% or more. But you should let the development team determine this limit. If you force your teams to get to 100% line coverage, you are spending money that might be better spent on automation tests. In addition, I have seen cases where developers will short-circuit the unit tests by writing tests only for the purpose of increasing the coverage. You can readily identify these tests because they have no assertion or verification check in them - they just make a call and never check the result.
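Here is a minimal sketch of the difference, written in plain Java so it runs without a test framework (in practice you would use your team's test framework and its assertions). Discount and DiscountTests are hypothetical names. Both "tests" produce identical line coverage for Discount.apply(), but only one of them can ever fail.

```java
// Hypothetical production class for illustration.
final class Discount {
    // Returns the price in cents after taking off the given percentage.
    static int apply(int priceCents, int percent) {
        return priceCents - priceCents * percent / 100;
    }
}

final class DiscountTests {
    // Coverage-only "test": executes the code but verifies nothing.
    // It keeps passing even if apply() returns the wrong number.
    static void coverageOnlyTest() {
        Discount.apply(1000, 10);
    }

    // A real test: same call, same coverage, but the result is checked.
    static void verifyingTest() {
        int actual = Discount.apply(1000, 10);
        if (actual != 900) {
            throw new AssertionError("expected 900 but got " + actual);
        }
    }

    public static void main(String[] args) {
        coverageOnlyTest();
        verifyingTest();
        System.out.println("both tests passed");
    }
}
```

If someone breaks apply(), the coverage number does not move and coverageOnlyTest() still passes; only verifyingTest() reports the problem.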

In short, you should be careful what you ask for. Make sure you interact with the development team in making the decision about code coverage. Spending another 50% of scarce testing dollars on that last 10% coverage is unlikely to bring a return on investment.

Sunday, June 24, 2018

Hercules in the Age of Agility