Invisible to the eye

Friday, December 18, 2009

Angry monkeys and other stories

First, a story contained in The Productive Programmer, which I find really interesting and helpful. Neither I nor the author knows if the story is real, but it has a powerful moral. Telling a story is often the best means to communicate an interesting concept.

Angry monkeys
Once upon a time, there was a group of scientists who were experimenting on monkeys. They placed some of them in a closed room, along with a ladder that allowed them to grab a bunch of bananas hanging from the ceiling. The catch was that whenever a monkey went near the ladder, cold water was sprayed into the room. What did the scientists get as the result of this experiment? Angry monkeys.
Then they removed one monkey from the group and put in a brand new animal, which was not aware of the cold water trap. Its instinct told it to climb the ladder... only to be suddenly beaten up by the other angry monkeys, which were tired of the cold feeling.
Continuing the experiment, they replaced one more monkey, and one more, until the room contained only animals that had never experienced the cold water trap, which was now turned off. But still, if a monkey approached the ladder, it would be stopped and beaten by its companions.
The moral is: why are some practices followed today? Because if they were not, a bunch of angry monkeys would yell at you. Some examples?

Primitives in Java, supported because creating classes for simple values seemed very strange (such classes are now recognized as immutable Value Objects).
Making every entity class a bean or an Active Record was common practice among angry monkeys, but the situation has changed in the last few years.
Constructors that perform real work, or that even create other objects, because someone thinks that object-oriented programming means writing programs consisting only of the line new ApplicationInstance().

Sometimes a standard is enforced because it provides consistency and interoperability; some other standards are relics of the past, craved by angry monkeys. Dare not to always follow the same road as others.

Here are similar software stories, in the form of fables. The power of metaphors allows us to explain software problems even to non-technical people.
How to kill a dragon? This is something many knights would want to know. And what if they could use their preferred programming language to complete the quest?
In Deadline and Technical Debt, a valorous knight attempts to satisfy the requests of the king to marry his daughter, the princess Caroline. Will he be successful?
The Stone Soup story, also reported in the original Pragmatic Programmer book, teaches us that people find it easy to join an ongoing project, and that this is a powerful way to cooperate.

Wednesday, December 16, 2009

The object graph

Stefano wrote to me with the intention to expand the discussion on the object graph concept, which I referred to earlier. As always, I think that sharing my thoughts can help other readers and also provide some feedback about these ideas.

A bit of theory

What is a graph? According to my math courses and to Wikipedia, it is an abstract structure defined by two sets. The variant of graph that interests us is the directed graph, because it resembles the Von Neumann representation of objects better than a non-directed one.
The two sets that define a graph are the vertices and the arcs:
V = {2, 3, 5, 7, 8, 9, 10, 11};
A = {(3, 8), (3, 10), (5, 11), (7, 8), (7, 11), (8, 9), (11, 9), (11, 2)};
The elements of A are ordered pairs whose elements are elements of V.
The term directed means that the graph's arcs have a specified direction (if they did not, they would be called edges and the elements of A would be two-element sets instead of ordered pairs).
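To make the definition concrete, here is a minimal sketch that stores the V and A sets above in plain php arrays and derives an adjacency list from them. The helper name buildAdjacencyList() is hypothetical, and the short array syntax assumes php 7.1 or later.

```php
<?php
// The vertices V and the arcs A from the example above.
$vertices = [2, 3, 5, 7, 8, 9, 10, 11];
$arcs = [[3, 8], [3, 10], [5, 11], [7, 8], [7, 11], [8, 9], [11, 9], [11, 2]];

/**
 * Builds an adjacency list: each vertex is mapped to the list of vertices
 * its outgoing arcs point to. (Hypothetical helper, for illustration only.)
 */
function buildAdjacencyList(array $vertices, array $arcs): array
{
    $adjacency = array_fill_keys($vertices, []);
    foreach ($arcs as [$from, $to]) {
        $adjacency[$from][] = $to;
    }
    return $adjacency;
}

$adjacency = buildAdjacencyList($vertices, $arcs);

// Direction matters: 3 points to 8, but 8 does not point back to 3.
var_dump(in_array(8, $adjacency[3], true)); // bool(true)
var_dump(in_array(3, $adjacency[8], true)); // bool(false)
```

In a non-directed graph the second check would also be true, since the pair would be stored as a set rather than as an ordered pair.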

How does it apply to computer science?
Well, suppose you have an object-oriented application in execution. The complete data structure is presented to us with various abstractions as an object graph, a graph where the V vertices are objects and the A arcs are their connections by field references. Actually, arcs could be represented by pointers in low-level languages like C++, and by more complicated handles in higher-level environments such as the Php interpreter or a Java virtual machine.
For instance, consider the FrontController object of an ordinary php framework application. It has references to the Request and Response objects, and to the chosen Controller instance. The controller may have other references - to connection objects, Repositories, User entities and so on. There can be cycles and links spread all over the graph, which may be very complicated.
Of course, to obtain a useful representation we may omit from a graph some objects which are actually reachable, as they are not "pertinent" to the current discussion. In a formal context, however, we ought not to leave out anything.
The first time I heard the term object graph was on Misko Hevery's blog, where it was used to describe the structure of an object-oriented application.

Why talk about an object graph?
Because it is a mathematical abstraction over the raw pointers and memory segments.
Stefano said in his email:

Probabilmente ancora non siamo riusciti a formalizzare una analisi teoretica sopra gli oggetti che descrivono software. Non so nemmeno se la cosa, allo stato attuale sia verosimile o abbia un senso. Tuttavia, cominciare a pensare in questa direzione credo possa essere un punto di partenza proprio per trasformare la Programmazione da "Arte" a "Scienza", obiettivo perseguito anche dallo stesso Misko.

Maybe we have not yet formalized a theoretical framework on software objects. I don't even know if this would make sense at this point. However, I think beginning to move in this direction can be a starting point to transform programming from Art to Science, an objective pursued by Misko too. (my translation)

An abstraction such as an object graph lets us make statements which do not depend on the technology (Java or Php) but only on the object-oriented paradigm, and which will thus be true in many languages and platforms, or 10 years from now when Php 9.3 and Java 14.0 are released (provided that we maintain the OO paradigm; considering that Smalltalk is from the 1970s, it may last for a long time).
For instance, here is a list of the concepts which involve a generic object graph:

Object graph building and business logic separation. To produce seams for easy unit testing where we can inject collaborators, classes that build or expand the object graph should be separated from classes which contain logic.
Serialization; given an object O, the graph of all the objects reachable from O should be serialized with it to allow its reconstitution.
The state of an application which should be stored in a database is an entity graph, composed of User, Group, Post instances; Orms such as Doctrine 2 implement persistence by reachability on a generic object graph. Reachability is a mathematical property.
Why should entities not contain field references to service classes? Because they reach out of the entity graph and complicate the storage process.
The Observer pattern can be described as a partitioned graph that improves decoupling between objects of the same partition (observed or observing side). Other patterns are often explained with the help of an Uml class diagram, which is a similar (but more specific) concept.
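Reachability, which serialization and persistence both rely on in the list above, can be sketched as a plain graph traversal. The following is only an illustration under stated assumptions: reachableObjects() is a hypothetical helper (not how php's serialize() or Doctrine 2 are actually implemented), the User and Group classes are toy stand-ins, and php 7.2 or later is assumed for spl_object_id().

```php
<?php
/**
 * Collects every object reachable from $root by following field references.
 * Hypothetical helper: it only looks one level deep into array-valued fields.
 */
function reachableObjects(object $root): array
{
    $visited = [];
    $stack = [$root];
    while ($stack) {
        $object = array_pop($stack);
        $id = spl_object_id($object);
        if (isset($visited[$id])) {
            continue; // already seen: this check is what makes cycles safe
        }
        $visited[$id] = $object;
        // Casting an object to an array exposes all its fields, private ones included.
        foreach ((array) $object as $value) {
            foreach (is_array($value) ? $value : [$value] as $candidate) {
                if (is_object($candidate)) {
                    $stack[] = $candidate;
                }
            }
        }
    }
    return array_values($visited);
}

// Toy entities, assumed for the example.
class Group
{
    public $members = [];
}

class User
{
    private $group;

    public function __construct(Group $group)
    {
        $this->group = $group;
        $group->members[] = $this; // creates a cycle: User -> Group -> User
    }
}

$group = new Group();
$picard = new User($group);
$riker = new User($group);

echo count(reachableObjects($picard)), "\n"; // 3: picard, the group and riker
```

Starting from a single User, the traversal pulls in the Group and, through it, the other User. A field reference to a service class would drag that service into the same set, which is the storage complication mentioned above.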

Note that if we demonstrate a rule or a theorem for an object graph (or a graph with certain characteristics), it will be valid for every other instance of that graph even in different applications. That's why mathematicians love abstractions as much as programmers: they save time for both categories.

Let me know your thoughts. There are many mathematical formulations of the object-oriented paradigm, but talking about a structure such as a graph can help explain advanced concepts, taking advantage of this simple abstraction.

Tuesday, December 15, 2009

Learning how to refactor

Refactoring is the process of improving the design and the flow of existing, working code by applying common patterns, like extracting a superclass or a method, or even introducing new classes as well as deleting existing ones.
If you are here, you have probably already experienced the refactoring process, but I want to clarify the common iterative method I use, both to get feedback and to help developers who are new to this practice.

This is the general process to learn how to refactor code, which wraps the basic refactoring cycle.
Step 1: get the book Refactoring: Improving the Design of Existing Code by Martin Fowler (a classic) or a refactoring catalogue on Wikipedia or Fowler's website. In the book or in a similar guide, there are two lists which are very boring if read sequentially: smells and refactorings. Smells are situations that arise in an architecture, while refactorings are the standard solutions to eliminate smells. It is annoying to keep something that smells in your office.
Since it is very boring to read the Refactoring book if you have even a little experience with the practice, the best way to extract all of Fowler's knowledge is to apply it directly.
Step 2: For S in 'smells':

Read about S; understand what the problem is behind a practice that you may have used without worries.
Look for S in one of your personal projects, or wherever you have commit access and responsibility for the code base; get convinced that this smell should be eliminated. If you are not convinced, stop this iteration here and go to the next smell; your existing solution can be pragmatically correct in the context of your style or of your architecture. Note that there are no absolute reference points and refactorings often come in pairs: it is up to you to choose whether to refactor in one direction or in the opposite one (Extract Method or Inline Method?)
Find an appropriate refactoring R; there are multiple solutions that can eliminate a smell. Be consistent in your choice in different places.
Make sure there are unit tests for the code you're going to edit. No further action should be taken before you are sure functionality is preserved. This is the answer to the question "Why change something that works?"... Because it will still work, but much better.
Apply R in small steps, running focused tests every time to ensure you have not broken anything.
Once the refactoring is complete, run the entire test suite to find out if anything is not working. Note that failures in points distant from the refactored code constitute a smell too: they are a symptom of coupling.
svn diff will show you a picture of your modifications; ensure that debug statements or workarounds are not in place anymore.
svn commit (or the equivalent git commands) pushes your improvements to the repository. Using version control is also fundamental in case you get into an unrecoverable state: svn revert -R . is the time machine button (no, Apple has nothing to do with it) to restore the original code.
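As an illustration of one iteration of the cycle above, here is a small sketch of the Extract Method refactoring, with a focused check that the behavior is preserved. The InvoiceBefore and InvoiceAfter classes are invented for the example; in a real project you would edit one class in place and rely on its unit tests instead of keeping two copies.

```php
<?php
// Before: one method mixes the calculation of the total with its formatting.
class InvoiceBefore
{
    private $amounts;

    public function __construct(array $amounts)
    {
        $this->amounts = $amounts;
    }

    public function summary(): string
    {
        $total = 0;
        foreach ($this->amounts as $amount) {
            $total += $amount;
        }
        return 'Total: ' . number_format($total, 2);
    }
}

// After Extract Method: the calculation has a name of its own.
class InvoiceAfter
{
    private $amounts;

    public function __construct(array $amounts)
    {
        $this->amounts = $amounts;
    }

    public function summary(): string
    {
        return 'Total: ' . number_format($this->total(), 2);
    }

    private function total()
    {
        $total = 0;
        foreach ($this->amounts as $amount) {
            $total += $amount;
        }
        return $total;
    }
}

// Focused check: the refactoring must not change the observable behavior.
$amounts = [10, 2.5, 7.5];
$before = (new InvoiceBefore($amounts))->summary();
$after = (new InvoiceAfter($amounts))->summary();
if ($before !== $after) {
    throw new RuntimeException("Behavior changed: '$before' vs '$after'");
}
echo $after, "\n"; // Total: 20.00
```

The opposite refactoring, Inline Method, would turn InvoiceAfter back into InvoiceBefore; which direction is better depends on the context, as noted above.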

The goal of learning various refactoring techniques is to easily see smells in the future, to improve the efficiency of the Refactor phase in the Red-Green-Refactor cycle. Your bricks (classes) are very malleable when fresh, but when they solidify it becomes harder to add further modifications: it is good to refactor as much as possible as soon as you have finished adding code for functional purposes.

Monday, December 14, 2009

How an Orm works

Some readers have been confused by the terms I use often in reference to Object relational mappers, so I want to describe some concepts of Orms and make some definitions. Particularly I want to focus on how a real Orm works and lets you write classes that do not extend anything (Plain Old Php Objects or Plain Old <insert oo-language here> Objects).
The persistence-abstraction standard is the Java Persistence Api, which was extracted from Hibernate, and I will refer to it in this post. Doctrine 2 is the Orm which ports the specification to the php world and will be the reference implementation of these concepts in the explanation that follows.

The primary classification of Domain Model classes consists in dividing them into two categories: Entities, whose primary responsibility is to maintain the state of the application, and Services, whose responsibility is to perform operations that involve more than one Entity, and to link to the outside of the domain model, breaking direct dependencies. This distinction leaves out Specifications, Value Objects, etc., which add richness to a model but are less crucial parts of it. Repositories and Factories are still a particular kind of Service.
I know that primary responsibility of a class sounds bad, since a class should have only one responsibility; however, there is a trade-off between responsibility and encapsulation, and an Entity class should certainly hide the workings of many operations that involve only its private data.
Examples of Entity classes are User, Group, Post, Forum, Section, and so on. Typical Service class names can be UserRepository, UserFactory, HttpManager, TwitterClient, MyMailer. You can often recognize entities from their serializability.
Imagining that you are going to take advantage of an Orm's features, once you have your Entity classes defined it's up to you to define their mapping to relational tables in a format that the Orm understands - xml, yaml, ini files, or simple annotations. The Orm will use this information not only to move objects back and forth from the database, but also to create and maintain your schema, thus without introducing duplication.
The mapping consists of metadata that describe what properties of an entity you want to store, and how. There are multiple ways to map objects to tables and an Orm should not just invent how to fit them in a database.
Java annotations are objects which provide compile-time checks, while in php they are only comments included in the docblock due to lack of native support. This also means that with Doctrine 2 there is no dependency from the Entity class file to the Orm source code.
This is the simplest Entity I can think of, a City class, complete with mapping for Doctrine 2:

<?php
/**
 * Naked Php is a framework that implements the Naked Objects pattern.
 * @copyright Copyright (C) 2009  Giorgio Sironi
 * @license http://www.gnu.org/licenses/lgpl-2.1.txt
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * @category   Example
 * @package    Example_Model
 */

/**
 * @Entity
 */
class Example_Model_City
{
    /**
     * @Id @Column(type="integer")
     * @GeneratedValue(strategy="AUTO")
     */
    private $_id;

    /**
     * @Column(type="string")
     */
    private $_name;

    public function __construct($name)
    {
        $this->setName($name);
    }

    /**
     * @return string   the name
     */
    public function getName()
    {
        return $this->_name;
    }

    public function setName($name)
    {
        $this->_name = $name;
    }

    public function __toString()
    {
        return (string) $this->_name;
    }
}

Private properties are accessed via reflection.
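To show what access via reflection means in practice, here is a minimal sketch that reads and writes a private field from the outside using php's Reflection Api. The simplified City class is a stand-in for the entity above, and this is not a claim about how Doctrine 2 is implemented internally: it only demonstrates the language mechanism that makes Plain Old Php Objects mappable.

```php
<?php
// Simplified stand-in for the entity above: no getters or setters at all.
class City
{
    private $_name;

    public function __construct($name)
    {
        $this->_name = $name;
    }
}

$city = new City('Milan');

$property = new ReflectionProperty('City', '_name');
if (PHP_VERSION_ID < 80100) {
    // Required before php 8.1; later versions make all properties accessible.
    $property->setAccessible(true);
}

// Reading the state of an entity, as needed when storing it.
echo $property->getValue($city), "\n"; // Milan

// Rebuilding an entity, as needed when loading it: the instance is created
// without running the constructor, then its private field is filled in.
$class = new ReflectionClass('City');
$loaded = $class->newInstanceWithoutConstructor();
$property->setValue($loaded, 'Turin');
echo $property->getValue($loaded), "\n"; // Turin
```

Since the mapping layer can reach private state this way, the entity does not need to extend a base class or expose public setters only for the sake of persistence.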

A JPA-compliant Orm presents a single point of access to its functionalities: the Entity Manager, which is a Facade class. You should now understand the meaning of its name.
The Entity Manager object usually has two important collaborators: the Identity Map and the Unit Of Work, plus the generated proxy classes which serve many purposes:

The Identity Map is - as the name suggests - a Map which maintains a reference to every object which has actually been reconstituted from the database, or that the Orm knows about somehow (e.g. because it has been told to persist it explicitly).
Proxies, whose classes are generated on the fly, substitute a regular object in the graph with a subclass instance capable of lazy loading itself if and only if needed. The methods of the Entity class are overridden to execute the loading procedure before dispatching the call to the original versions.
The Unit Of Work calculates (or maintains) a diff between the object graph and the relational database; it commits everything at the end of a request, or session, or when the developer requires it.
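The proxy mechanism described above can be sketched by hand. The CityProxy class below is a hypothetical, hand-written stand-in for a generated proxy: it assumes a protected field so that the subclass can fill it directly, whereas a real Orm generates the class and works on private fields too. The closure simulates the database query.

```php
<?php
class City
{
    protected $name;

    public function getName()
    {
        return $this->name;
    }

    public function setName($name)
    {
        $this->name = $name;
    }
}

// Hand-written stand-in for a proxy class generated on the fly.
class CityProxy extends City
{
    private $loader;
    private $loaded = false;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    // Loads the state once, the first time any overridden method is called.
    private function load()
    {
        if (!$this->loaded) {
            $this->loaded = true;
            $this->name = call_user_func($this->loader);
        }
    }

    public function getName()
    {
        $this->load();
        return parent::getName();
    }

    public function setName($name)
    {
        $this->load();
        parent::setName($name);
    }
}

$queries = 0;
$city = new CityProxy(function () use (&$queries) {
    $queries++; // pretend a SELECT is executed here
    return 'Milan';
});

echo $queries, "\n";         // 0: nothing is loaded until it is needed
echo $city->getName(), "\n"; // Milan
echo $city->getName(), "\n"; // Milan
echo $queries, "\n";         // 1: the loading procedure ran only once
```

Client code that type-hints City accepts the proxy transparently, because the proxy is a subclass of the entity.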

The shift in the workflow is from the classic ActiveRecord::save() method to the EntityManager::flush() one. It is the developer's responsibility to maintain a correct object graph, but it is the Orm's responsibility to reflect the changes to the relational database. The power of this approach resides in letting you work on an object graph as if it were (almost) the only version of the Domain Model you know of.
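The flush()-based workflow can be illustrated with a toy Unit Of Work that takes a snapshot of each managed object and computes the changes only when asked. This is a sketch under loud assumptions: ToyUnitOfWork and Post are invented names, only public fields are tracked, and a real Orm would translate the computed changes into SQL statements instead of returning them.

```php
<?php
// Toy entity with a public field, to keep the sketch short.
class Post
{
    public $title;

    public function __construct($title)
    {
        $this->title = $title;
    }
}

// Hypothetical, simplified Unit Of Work: not Doctrine's implementation.
class ToyUnitOfWork
{
    private $managed = [];   // object id => managed entity
    private $snapshots = []; // object id => field values at the last flush

    public function manage($entity)
    {
        $id = spl_object_id($entity);
        $this->managed[$id] = $entity;
        $this->snapshots[$id] = get_object_vars($entity);
    }

    /**
     * Returns the changes made since the last flush as a list of
     * [class name, changed fields] pairs, then takes fresh snapshots.
     */
    public function flush(): array
    {
        $changes = [];
        foreach ($this->managed as $id => $entity) {
            $current = get_object_vars($entity);
            $diff = [];
            foreach ($current as $field => $value) {
                if ($value !== $this->snapshots[$id][$field]) {
                    $diff[$field] = $value;
                }
            }
            if ($diff) {
                $changes[] = [get_class($entity), $diff];
            }
            $this->snapshots[$id] = $current;
        }
        return $changes;
    }
}

$unitOfWork = new ToyUnitOfWork();
$post = new Post('Angry monkeys');
$unitOfWork->manage($post);

// The developer only modifies the object graph...
$post->title = 'Angry monkeys and other stories';

// ...and the Unit Of Work finds out what has to be written.
$changes = $unitOfWork->flush();
var_dump($changes === [['Post', ['title' => 'Angry monkeys and other stories']]]); // bool(true)
var_dump($unitOfWork->flush() === []); // bool(true): nothing left to write
```

Note that there is no save() call on the Post: the entity does not know that it is being persisted at all.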

Sunday, December 13, 2009

Php technologies' grades

Last week I was asked by a client:

You used Zend Framework for this small php application. From 1 to 10 [which is the grade framework in Italian secondary schools], how sophisticated is this technology compared to the other ones in the php world?

I answered:

8 or 9, I can't think of a more advanced php technology (and with such a steep learning curve), unless I think about Doctrine 2.

Before Symfony and CodeIgniter developers bite me: given the occasion, I would say quite the same of applications built with your frameworks, since I'm making a comparison with the legacy code I had to deal with in the past.

Stimulated by the question, I decided to rank common technologies and practices I (and many developers) chose (and still choose) for the architecture of php applications. Note that these ranks describe the complexity and inherent power of the different approaches/technologies, but by no means should low-ranked solutions be deprecated: they still get the job done when something more elaborate is not necessary, and we are not in the mood for killing a fly with the Death Star.
Here is my evaluation:

1: Welcome, today is <?php echo date("Y-m-d"); ?>. 0 is the same but with <? instead of <?php.
2: Html page with embedded php code. Very useful in the 1990s, and it still works sometimes because of its simplicity for temporary and corner-case pages.
3: Set of semi-static php scripts with no code reuse.
4: header.php and footer.php applications; this is the structure of the website my application partners with.
5: header/footer inclusion but with business logic reuse, for instance applications that comprise modules, classes and functions.
6: Procedural open-source frameworks and Cms, for instance Drupal 6. They are becoming not pretty to the eye, but they do the job.
7: Object-oriented applications, that rely for example on in-house frameworks.
8: Zend Framework 1.x applications: object-oriented, more or less testable, little duplication when done right. But the inherent singletons prevent them from ranking higher. See you in 2.x...
9: Doctrine 2: Data Mapper for persistence-agnostic domain models.
10: No such technology has been produced in php at the moment, primarily because of the slow adoption of the real object-oriented paradigm.

Or do you think there is already a 10 to assign?

Saturday, December 12, 2009

Saturday question: testing in .NET

A reader wrote to me asking for resources for learning how to implement Test-Driven Development in a .NET environment:

Please pardon me for my unsolicited email, but I saw your blog and I believe that you are one of the best in the software community. My name is [omissis], and I'm a C#/ASP.NET programmer from the Philippines, but I really want to learn and understand Unit Testing and TDD the right way. I didn't take Computer Science or a similar course in college. I really want to learn software design and development, on how to develop an application from ground-up using TDD. I hope you can give me advices, since I'm not able to afford a good book.

I am no particular expert in C# since I mostly work in php. As you may know, I have written a free CreativeCommons-licensed ebook on php applications testing.
For the .Net case, if you are a beginner, there is a book I reviewed which is a good starting point: The Art Of Unit Testing, which has lots of .NET examples included.
It costs $26 on Amazon now, which you can consider an investment since the knowledge contained could make you earn more in the future. It is a very complete book.
You can also obtain the book for free via other means, such as public libraries. I personally use my university's library a lot to look for information in technical books like Design Patterns when I am not going to buy a copy at the moment, as they are not common in ordinary libraries. You already pay for libraries with your taxes, so you'd better take advantage of them.

Once you have the basics, the best way to improve is practicing... Someone said that a developer becomes proficient in unit testing after having written 1500 tests.
For general advice, you may also follow this blog and the Google Testing one, although they are focused on technologies different from .NET.
The principles of testable and decoupled design are the same in all object-oriented languages, and the distinction between C# and php resides in how and when an application object graph is created.
I hope you can find these references useful to start your journey.

Thursday, December 10, 2009

Who else wants to have free documentation? A readable test code sample

It's fun to TDD classes for your projects because you try your class as its client will do even before both are written at all: interfaces and abstractions are by far the most important part of an application's design. But often test classes grow, and we should ruthlessly refactor them as we will do with production code. One of the most important factors to consider in test refactoring is preserving or improving readability: unit tests are documentation for the smallest components of an application, its classes. New developers that come in contact with production classes take unit tests as the reference point for understanding what a class is supposed to do and which operations it supports.
To give an example, I will report here an excerpt of a personal project of mine, NakedPhp. This code sample is a test case that seems to me particularly well written.

The NakedPhp framework has a container for entity classes (for instance User, City, Post classes). This container is saved in the session and it should be ported to the database for permanent storage when the user wants to save his work.
This is the context where the system under test, the NakedPhp\Storage\Doctrine class, has to work: it is one of the first infrastructure adapters I am introducing. In the test, User entities are stored in a container and they should be merged with the database based on their state, which can be:

new (not present in db);
detached (present in db but totally disconnected from the Orm due to their previous serialization; no proxies are referenced from a detached object and these entities are not kept in the identity map);
removed (present in db, but should be deleted as the user decided so).

The NakedPhp\Storage\Doctrine::save() method takes an EntityContainer instance and processes the contained objects, bridging the application and the database with the help of the Doctrine 2 EntityManager.

This test class is also an example of how to test classes which require a database, such as Repository implementations. I usually create throw-away sqlite databases, but Doctrine 2 can port the schema to nearly every platform. Using a fake database allows you to write unit tests that run independently from database daemons and without having to mock an EntityManager, which has a very long interface. Classes that calculate bowling game scores are nice, but classes that store your data are a whole lot nicer.
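As a side note, the throw-away database idea also works without an Orm. The sketch below uses plain Pdo with an in-memory sqlite database, which is a swapped-in variant of the file-based database used in NakedPhp's test class; it assumes the pdo_sqlite extension is enabled, and the users table plus the saveUser() and countUsers() helpers are hypothetical names.

```php
<?php
// An in-memory database lives only as long as the connection: nothing to clean up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');

// Hypothetical code under test: a function that stores a user by name.
function saveUser(PDO $pdo, $name)
{
    $statement = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
    $statement->execute(['name' => $name]);
}

// Test helper: counts the rows with the given name, using a bound parameter.
function countUsers(PDO $pdo, $name)
{
    $statement = $pdo->prepare('SELECT COUNT(*) FROM users WHERE name = :name');
    $statement->execute(['name' => $name]);
    return (int) $statement->fetchColumn();
}

saveUser($pdo, 'Picard');

echo countUsers($pdo, 'Picard'), "\n";  // 1
echo countUsers($pdo, 'Locutus'), "\n"; // 0
```

Creating the schema from scratch for every test keeps the tests independent from each other, at the price of a little setup time.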
Finally, I warn you that this example is still basic and will be expanded during future development of NakedPhp. What I want to show here is where the Test-Driven Development style leads and an example of elimination of code duplication and clutter in a test suite.

<?php
/**
 * Naked Php is a framework that implements the Naked Objects pattern.
 * @copyright Copyright (C) 2009  Giorgio Sironi
 * @license http://www.gnu.org/licenses/lgpl-2.1.txt
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * @category   NakedPhp
 * @package    NakedPhp_Storage
 */

namespace NakedPhp\Storage;
use Doctrine\ORM\UnitOfWork;
use NakedPhp\Mvc\EntityContainer;
use NakedPhp\Stubs\User;

/**
 * Exercise the Doctrine storage driver, which should reflect to the database
 * the changes in entities kept in an EntityContainer.
 */
class DoctrineTest extends \PHPUnit_Framework_TestCase
{
    private $_em;
    private $_storage;

    public function setUp()
    {
        $config = new \Doctrine\ORM\Configuration();
        $config->setMetadataCacheImpl(new \Doctrine\Common\Cache\ArrayCache);
        $config->setProxyDir('/NOTUSED/Proxies');
        $config->setProxyNamespace('StubsProxies');

        $connectionOptions = array(
            'driver' => 'pdo_sqlite',
            'path' => '/var/www/nakedphp/tests/database.sqlite'
        );

        $this->_em = \Doctrine\ORM\EntityManager::create($connectionOptions, $config);
        $this->_regenerateSchema();

        $this->_storage = new Doctrine($this->_em);
    }

    private function _regenerateSchema()
    {
        $tool = new \Doctrine\ORM\Tools\SchemaTool($this->_em);
        $classes = array(
            $this->_em->getClassMetadata('NakedPhp\Stubs\User')
        );
        $tool->dropSchema($classes);
        $tool->createSchema($classes);
    }

    public function testSavesNewEntities()
    {
        $container = $this->_getContainer(array(
            'Picard' => EntityContainer::STATE_NEW
        ));
        $this->_storage->save($container);

        $this->_assertExistsOne('Picard');
    }

    /**
     * @depends testSavesNewEntities
     */
    public function testSavesIdempotently()
    {
        $container = $this->_getContainer(array(
            'Picard' => EntityContainer::STATE_NEW
        ));
        $this->_storage->save($container);

        $this->_simulateNewPage();
        $this->_storage->save($container);

        $this->_assertExistsOne('Picard');
    }

    public function testSavesUpdatedEntities()
    {
        $picard = $this->_getDetachedUser('Picard');
        $picard->setName('Locutus');
        $container = $this->_getContainer();
        $key = $container->add($picard, EntityContainer::STATE_DETACHED);
        $this->_storage->save($container);

        $this->_assertExistsOne('Locutus');
        $this->_assertNotExists('Picard');
    }

    public function testRemovesPreviouslySavedEntities()
    {
        $picard = $this->_getDetachedUser('Picard');
        $container = $this->_getContainer();

        $key = $container->add($picard, EntityContainer::STATE_REMOVED);
        $this->_storage->save($container);

        $this->_assertNotExists('Picard');
        $this->assertFalse($container->contains($picard));
    }

    private function _getNewUser($name)
    {
        $user = new User();
        $user->setName($name);
        return $user;
    }

    private function _getDetachedUser($name)
    {
        $user = $this->_getNewUser($name);
        $this->_em->persist($user);
        $this->_em->flush();
        $this->_em->detach($user);
        return $user;
    }

    private function _getContainer(array $fixture = array())
    {
        $container = new EntityContainer;
        foreach ($fixture as $name => $state) {
            $user = $this->_getNewUser($name);
            $key = $container->add($user);
            $container->setState($key, $state);
        }
        return $container;
    }

    private function _assertExistsOne($name)
    {
        $this->_howMany($name, 1);
    }

    private function _assertNotExists($name)
    {
        $this->_howMany($name, 0);
    }

    private function _howMany($name, $number)
    {
        $q = $this->_em->createQuery("SELECT COUNT(u._id) FROM NakedPhp\Stubs\User u WHERE u._name = '$name'");
        $result = $q->getSingleScalarResult();
        $this->assertEquals($number, $result, "There are $result instances of $name saved instead of $number.");
    }

    private function _simulateNewPage()
    {
        $this->_em->clear(); // detach all entities
    }
}

Do you have other suggestions to further refactor this code?