Thursday, February 25, 2010

Levels of architecture

As I said in a recent post, it is important to have a 50,000-foot view of a software system in mind before editing even a single line of its codebase. Without the big picture, our changes can push the project in the wrong direction. But there are different views, each at a different level of detail, that may affect the design process of a piece of software.
A simple analysis produces the following scale of units, ordered by increasing size. Every unit is composed of a set of units from the level below it, that is, the level that comes before it in the list:
  • The simplest unit of software is obviously the line of code.
  • The next smallest unit in object-oriented software is the method, be it public, private, or subject to any other access control policy.
  • The class and its specializations: Entity, Value Object, Service, Repository...
  • The package, also known as the namespace: its purpose is to simplify references between items contained in the same unit.
  • The module, also known as the component, which often presents a Facade to simplify access to it.
  • The application, also known as the Bounded Context. Different applications can work together and communicate via published protocols, anti-corruption layers, RESTful services, relational databases... There are no limits to collaboration paradigms.
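To make the module level concrete, here is a minimal sketch of a component that hides its classes behind a Facade. All names (`BillingFacade`, `_TaxCalculator`, `_InvoiceNumberGenerator`) and the 22% tax rate are hypothetical and chosen only for illustration; they do not come from any real project.

```python
class _TaxCalculator:
    """Internal class of the hypothetical billing module."""

    RATE_PERCENT = 22  # assumed rate, for illustration only

    def tax_for(self, net_cents):
        return net_cents * self.RATE_PERCENT // 100


class _InvoiceNumberGenerator:
    """Internal class that hands out sequential invoice numbers."""

    def __init__(self):
        self._last = 0

    def next_number(self):
        self._last += 1
        return self._last


class BillingFacade:
    """The single entry point that the rest of the application talks to."""

    def __init__(self):
        self._taxes = _TaxCalculator()
        self._numbers = _InvoiceNumberGenerator()

    def issue_invoice(self, net_cents):
        tax = self._taxes.tax_for(net_cents)
        return {
            "number": self._numbers.next_number(),
            "net": net_cents,
            "tax": tax,
            "total": net_cents + tax,
        }


facade = BillingFacade()
invoice = facade.issue_invoice(1000)
print(invoice)
```

Client code depends only on `BillingFacade`, so the internal classes can be renamed or split without touching the callers.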
The interesting part is that some metrics and rules of thumb work at every level of detail in an architecture, while others become more effective the closer you get to one end of the spectrum. Employing metrics and practices at the particular level of the architecture they are intended for is crucial.
Let's consider some general rules first:
  • number of units: programmers have limited RAM. People can normally work on 5 to 9 units at a time (obviously more when the units are very similar or dumb), so a container unit should not be composed of a high number of contained units. For example, a class should generally not present 300+ methods, and a package should not contain one hundred classes.
  • low coupling, high cohesion, information hiding: these concepts should be enforced at every level; a single dependency between two units is sufficient to transitively establish a dependency between their respective container units.
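Both general rules can be checked mechanically. The sketch below assumes a hand-written map of classes to packages and of class-level dependencies (all names are hypothetical). It lifts the class dependencies to the package level and flags containers that hold more units than a chosen limit.

```python
from collections import defaultdict

# Hypothetical data: the package each class lives in.
CONTAINER = {
    "Invoice": "billing",
    "InvoiceRepository": "billing",
    "Customer": "crm",
    "Mailer": "notifications",
}

# Hypothetical class-level dependencies (class -> classes it uses).
CLASS_DEPENDENCIES = {
    "Invoice": {"Customer"},
    "InvoiceRepository": {"Invoice"},
    "Customer": set(),
    "Mailer": {"Customer"},
}


def lift_dependencies(unit_deps, container_of):
    """Derive container-level dependencies from unit-level ones."""
    lifted = defaultdict(set)
    for unit, targets in unit_deps.items():
        source = container_of[unit]
        for target in targets:
            destination = container_of[target]
            # Dependencies inside the same container do not couple containers.
            if destination != source:
                lifted[source].add(destination)
    return dict(lifted)


def oversized_containers(container_of, limit=9):
    """Return the containers holding more than `limit` units."""
    counts = defaultdict(int)
    for container in container_of.values():
        counts[container] += 1
    return {name: n for name, n in counts.items() if n > limit}


package_dependencies = lift_dependencies(CLASS_DEPENDENCIES, CONTAINER)
print(package_dependencies)
print(oversized_containers(CONTAINER))
```

Here the single dependency of `Invoice` on `Customer` is enough to make the whole `billing` package depend on `crm`. The default limit of 9 is only the upper bound of the range quoted above, not a hard rule.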
And then the ones that change with the architecture level considered:
  • TDD and refactoring: applied at the low end of the spectrum. It is simple to refactor a private method, but it's very difficult to refactor a published protocol between two applications. 
  • the converse situation applies at the high end of the spectrum: thinking of a good API and refining the Ubiquitous Language is very important because of the resistance to change of these kinds of units.
  • choice of the system under test: testing single classes and methods is the preferred approach for the largest part of a project's test suite, because of the reduced number of test cases required and the resulting design aid (applied at the low end).
  • code coverage, instead, is only significant at a high level. Some dumb classes may have no unit tests at all and still be indirectly exercised by other tests. Similarly, many static analysis metrics are only meaningful when measured on a whole project.
  • UML diagrams must be kept in sync with the code, so they really add value when they are used at the boundaries of components or contain little detail.
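Two of the points above, cheap refactoring at the low end and indirect coverage of dumb classes, can be seen in one small sketch. The `Money` and `Cart` classes are hypothetical examples, not code from a real project.

```python
class Money:
    """A 'dumb' value object with no dedicated unit test."""

    def __init__(self, cents):
        self.cents = cents

    def plus(self, other):
        return Money(self.cents + other.cents)


class Cart:
    def __init__(self):
        self._prices = []

    def add(self, price):
        self._prices.append(price)

    def total(self):
        return self._sum(self._prices)

    def _sum(self, prices):
        # Private helper: it can be rewritten freely, as long as the
        # behaviour of total() observed by the test stays the same.
        result = Money(0)
        for price in prices:
            result = result.plus(price)
        return result


def test_cart_total():
    # The test targets Cart only, yet it also executes Money.plus().
    cart = Cart()
    cart.add(Money(250))
    cart.add(Money(175))
    assert cart.total().cents == 425


test_cart_total()
```

A coverage report for `Money` alone would look empty if only its own tests were counted, while a project-wide measurement shows it as exercised.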
Feel free to add other practices that you prefer to apply at one end of the spectrum of units, or at every level of detail.


trond said...

About your general rules of thumb:

Good points, and I definitely agree that there should not be 300+ methods in a class.

I do, however, fear that you are misinterpreting Miller's paper. It does not imply that people can work on 5-9 tasks/units at a time.

The same misinterpretation is often present in information architecture circles as well, where some people claim that there should be no more than 7 +/- 2 items in a web site's menu, for example. Total rubbish. When you look at a menu on a web site you can scan it whenever you want to -- you're not required to keep it in your working memory at all times.

The same applies to working with classes, modules, etc.

Having said that: 5-9 probably is a sane place to draw the line. I can hardly imagine people being very productive if working on too many tasks/units.

Before I shut up: thanks for a great blog :)

Giorgio said...

Thanks for your additions.
I think that the line is blurred, since the more homogeneous the units are, the more of them one can fit in a picture at the same time. An entity class with 10 getters which only maintains state is better engineered than a service class with 4 very unrelated methods.

ben_ said...

I wonder whether it would be sensible to include the type of software development in this. I would say that different approaches to organizing the process of creating code favor different types of code structures.

Waterfall engineering comes up with different architectures than extreme programming or Getting Real. Small teams create different architectures than large teams. And open source teams create different architectures than closed source teams … I would guess.

Maybe you might want to read this.