Monday, March 08, 2010

How to avoid phase-of-the-Moon bugs

Wikipedia defines several kinds of particularly dangerous bugs: they are very hard to fix and, when they manifest, they become a real problem for development.
Consider these two examples:
  • Heisenbug: a bug that changes its behavior when someone tries to reproduce it. The very attempt to study it changes the conditions (or requires irreproducible conditions), so that the Heisenbug no longer manifests. The name is a pun on Heisenberg's uncertainty principle (strictly speaking, on the observer effect, which is often conflated with it).
  • Phase of the Moon bug: a bug that arises from a dependency on external conditions, usually time. In the original definition, a piece of software has a marginal dependency on the phase of the Moon. Surely software developers should not have to anxiously wait for monthly events to find out whether their application works (unless it's some werewolf-oriented project).
Both categories are really dangerous: they can bring development to a halt by forcing long debugging sessions to find the cause of a failure that is "non-deterministic" (in the former case) or hidden (in the latter). They certainly look scary, but I chose them as examples because these particular bugs can be avoided by employing some engineering practices, such as a good test suite.

The test suite for a project should be composed mainly of unit tests. End-to-end acceptance tests are an important part of it, because they validate that the application fulfills its requirements, but unit tests are usually more powerful even when they do not drive the design. Their strength is in quickly locating defects by testing the contract of individual classes. The first step in exposing a bug, and particularly a Heisenbug, is locating it and being able to reproduce it reliably whenever needed, so that you can check it has been fixed. A well-written test suite provides an automatic way to check, at the push of a button, that none of the previously fixed bugs has reappeared, so that there are no regressions.
The other advantage of unit tests is the way they promote isolation. External dependencies are usually mocked out in the testing environment, so that boundary conditions can be reproduced at will. If you suspect that a piece of software may fail at some unusual combination of date and time, you only have to add a test case that supplies exactly the conditions you are worried about.
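As a minimal sketch of this idea in Python: the current time is passed in as a dependency instead of being read from the system clock, so a test can freeze it at any suspicious date. The names InvoiceScheduler, next_due_date and fixed_clock are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


class InvoiceScheduler:
    """Hypothetical class whose behaviour depends on the current date."""

    def __init__(self, now=datetime.now):
        # `now` is any zero-argument callable returning a datetime.
        # Production code uses the real clock; tests pass a frozen one.
        self._now = now

    def next_due_date(self):
        # The due date is the last day of the current month.
        today = self._now().date()
        # Jumping 32 days from the 1st always lands in the next month.
        first_of_next = (today.replace(day=1) + timedelta(days=32)).replace(day=1)
        return first_of_next - timedelta(days=1)


def fixed_clock(year, month, day, hour=0, minute=0):
    """Build a fake clock frozen at the given moment (test helper)."""
    frozen = datetime(year, month, day, hour, minute)
    return lambda: frozen


# Boundary conditions reproduced at will, whatever today's date is:
leap_february = InvoiceScheduler(now=fixed_clock(2012, 2, 10))
new_years_eve = InvoiceScheduler(now=fixed_clock(2010, 12, 31, 23, 59))
print(leap_february.next_due_date())
print(new_years_eve.next_due_date())
```

With the clock injected, a leap-year February or the last minute of the year becomes an ordinary test case rather than something you have to wait for.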
The isolation of components is not limited to capricious external dependencies such as time and database state. Maintaining a unit test suite also pushes you away from mutable global state, primarily because global state makes the tests brittle and difficult to write: they may pass or fail depending on their execution order. A unit test should give the same result whether the single test method is run on its own or as part of the whole suite. If you find global state getting in the way in the testing environment, which may lead to Heisenbugs and similar issues in production, listening to the tests will tell you to change your design to accommodate a simpler testing procedure, and you will get a better overall architecture as a result.
I agree completely with the Google Testing Blog's motto:
Debugging sucks. Testing rocks.


Christof Damian said...

This is a nice story of a phase of the moon bug, though it is related to system administration rather than programming: "The case of the 500-mile email"

jquery said...

It was interesting to read this article and I hope to read a new article about this subject in your site in the near time.