Sunday, January 18, 2015

The Greenest of Fields

A greenfield project is a brand new project that is not subject to any legacy requirements and importantly does not deal with legacy code. The opportunity to work on a greenfield project does not come along very often and is generally viewed as a real opportunity by developers. In the excitement, it is tempting to focus on the new design and the new code.

Beyond the technology decisions of the new project, there are things to consider that will have an under appreciated impact on the project. These issues are the subject of this blog posting. If you ignore these issues or make the wrong decision, changing you mind could be very expensive.

Unit Testing and Code Coverage. You should decide before any code is written whether you are going to establish unit tests and track code coverage in your build. Anyone who has a legacy code base and wants to go back and unit test will tell you that there isn't a business in the world that will pay your developers to go back and do this - it is cost prohibitive. Not only that, the existing code base can affect your overall coverage numbers should you start trying to cover you new code and ignore your old code. (I'll actually discussion how to manage this in a forthcoming posting).

Coding Conventions. If your project is to adhere to coding conventions, they should be established early on. Depending on how detailed the conventions are and whether there is any tool support for transforming code, this could be expensive to retrofit once the project has a significant code base. The difficulties are compounded because code that is written to be testable is different (usually better designed) than code that is not, so you wouldn't be able to unit test legacy code without significant refactoring costs.

Language-specific Best Practices. Well beyond issues of formatting, some languages have best practices conventions that should be established early on. An example is "const correctness" in C++. Anyone who has tried to go back and retrofit "const correctness" in a code base knows how invasive this change can be. An effort like this can end up touching almost all of the classes in your project. Of particular insidiousness is that a casual code inspection often does not make the scope of such changes apparent.

Internationalization. On our projects, we always consider internationalization when we write our code, even if it is not an explicit requirement for the project. There are tools that can help sift through source code and refactor the string literals out into a separate module. This is helpful and better than nothing, but it is not as neat as doing this as you write your code. Additionally there are often considerations that make this one-size-fits-all approach suboptimal. For example, in our projects, we allow the logging (and debug information), which frequently is done on the serverside, to remain in English. We use the locale associated with the current session to determine what language the user interface shows to the end user.

Code Rules. There is a lot of tool support these days to analyze source code and determine its adherence to coding rules. A popular tool like this is FindBugs (part of Sonar). Use of this tool should be done very early on. Anyone who has turned this tool onto an existing code base knows the potentially thousands of issues/violations that can be raised. Although the rules used to generate violations can be tuned, it is far better to do this with a very small code base than a large. We turned this tool onto a large code base (say 500K LOC) and had to limit our efforts to address only critical and major issues because of the large volume. So do yourself a favor and activate this tool on all check-ins starting on day zero.

Database Content. This issue is subtle. Our application uses a backend database and to support our functional tests, we stand up test data. The data was captured as a dump. So over time, developers added test data, got their tests going, then took a new dump. The problem with this is that very quickly, nobody could explain why a particular piece of data was in the database. It had become a big tarball of opaque, but required data. Avoid this situation at all costs. You should manage your data just like you do your source code. Each addition to the test data should be identifiable and tracked. Ideally you could stand up a new test database based solely on these individual descriptions. (I'll discuss this more in a forthcoming blog entry).

Technical Debt. The urgency in establishing identification of technical debt is not as high as the other issues identified here. Technical debt is incremental and, for the most part, its cost is incremental. Delaying dealing with technical debt is a mistake for many reasons. Ideally it would be identified, tracked, and continually addressed from day zero. Doing so establishes the expectation on your development team that taking shortcuts is not free of cost and is not invisible. It will cost you in the future at some point.

Hopefully these issues will give you something to consider at your next greenfield opportunity. If you have any other ideas about issues that should be in almost all projects and are hard to retrofit, please leave comments.