Invisible to the eye

Monday, February 08, 2010

Practical Php Patterns: Structural patterns summary

This post is part of the Practical Php Pattern series.

The GoF book observes that structural patterns are quite similar to each other, because object model structures offer only a few kinds of relationships: inheritance and interface implementation between classes, plus object composition, which is the leitmotif of most of these patterns.
The greatest advantage of object composition is that it is a relationship which can be established at runtime, based on configuration and without changing actual code, whereas class inheritance and interface implementation always require a code change. Ideally, you can completely define the functionalities of an application by changing the links between objects, and not those between classes, which are part of the design. This idea will be expanded in the next articles on behavioral patterns.
Here is the list of the structural patterns treated in the GoF book and in this article series.

Adapter-Bridge-Facade
The intent of these three patterns is fighting coupling:

an Adapter connects objects with different interfaces.
a Bridge separates a hierarchy of M abstractions and a hierarchy of N implementations, producing N+M orthogonal classes instead of M*N classes, one for each combination.
a Facade creates a new unified interface to access a subsystem.
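A minimal php sketch of the Bridge, with hypothetical Shape and Renderer hierarchies that do not come from this series: every Shape holds a reference to a Renderer, so supporting a new output format costs one new Renderer class instead of one new subclass per Shape.

```php
<?php
// Implementation hierarchy (N classes): how to draw.
interface Renderer
{
    public function circle($radius);
}

class SvgRenderer implements Renderer
{
    public function circle($radius)
    {
        return '<circle r="' . $radius . '"/>';
    }
}

class TextRenderer implements Renderer
{
    public function circle($radius)
    {
        return "circle of radius $radius";
    }
}

// Abstraction hierarchy (M classes): what to draw.
// The bridge is the $_renderer reference held by every Shape.
abstract class Shape
{
    protected $_renderer;

    public function __construct(Renderer $renderer)
    {
        $this->_renderer = $renderer;
    }

    abstract public function draw();
}

class Circle extends Shape
{
    private $_radius;

    public function __construct(Renderer $renderer, $radius)
    {
        parent::__construct($renderer);
        $this->_radius = $radius;
    }

    public function draw()
    {
        return $this->_renderer->circle($this->_radius);
    }
}

class Dot extends Shape
{
    public function draw()
    {
        return $this->_renderer->circle(1);
    }
}

$svgCircle = new Circle(new SvgRenderer(), 5);
$textDot = new Dot(new TextRenderer());
echo $svgCircle->draw(), "\n"; // <circle r="5"/>
echo $textDot->draw(), "\n";   // circle of radius 1
```

Any Shape can be combined with any Renderer at runtime, which is the object composition advantage described above.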

Composite-Decorator-Proxy
These three patterns keep the same interface of an object while transparently adding orthogonal concerns, without the client noticing:

a Composite implementation adds containers for other objects to organize a hierarchical structure, and allows whole trees to be reused as children of other trees.
a Decorator adds behavior by intercepting method calls before delegating them, avoiding the introduction of a high number of subclasses.
a Proxy delegates operations to an object that does not exist yet or is hard to access.
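To make the Decorator entry concrete, here is a small sketch with hypothetical Message classes (not taken from the series): two decorators share the Message interface and can be stacked in any order, where inheritance would need a subclass for each combination.

```php
<?php
interface Message
{
    public function getText();
}

class PlainMessage implements Message
{
    private $_text;

    public function __construct($text)
    {
        $this->_text = $text;
    }

    public function getText()
    {
        return $this->_text;
    }
}

/**
 * Decorator: same interface as the wrapped object, adds behavior
 * around the delegated call.
 */
class TrimmedMessage implements Message
{
    private $_inner;

    public function __construct(Message $inner)
    {
        $this->_inner = $inner;
    }

    public function getText()
    {
        return trim($this->_inner->getText());
    }
}

class ShoutingMessage implements Message
{
    private $_inner;

    public function __construct(Message $inner)
    {
        $this->_inner = $inner;
    }

    public function getText()
    {
        return strtoupper($this->_inner->getText()) . '!';
    }
}

// No TrimmedShoutingMessage subclass is needed: decorators compose.
$message = new ShoutingMessage(new TrimmedMessage(new PlainMessage('  hello  ')));
echo $message->getText(), "\n"; // HELLO!
```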

Finally, the Flyweight pattern is a standard implementation for Value Objects, which have no identity.

Sunday, February 07, 2010

Test suites and php namespaces

Php namespaces, introduced with the 5.3 release, are a great tool to stop writing very long class names. Importing class names via use statements lets you refer to classes by their base name, for example by writing:

use NakedPhp\MetaModel\NakedObject;

you will be able to use the name NakedObject in method definitions and instantiations in the rest of the script.
In test code, however, you will often have to import the classes under test and the abstractions involved, in order to define some stubs. If the primary test case class for NakedBareObject is NakedPhp\Tests\ProgModel\NakedBareObjectTest, a use statement such as the following in its source file:

use NakedPhp\ProgModel\NakedBareObject;

is mandatory.

My suggestion is to organize code in parallel class hierarchies for test case classes and production classes, so that the test suite's classes are not in the same folders as their SUTs, but reside in the same namespace at runtime. This is a widespread practice in the JUnit world, and it is even listed in the JUnit FAQ.
Consider as an example the NakedPhp directory structure:
http://nakedphp.svn.sourceforge.net/viewvc/nakedphp/trunk/?pathrev=137
library/ and tests/ are the two folders that contain production code and test cases in parallel hierarchies. Autoloading works simply by adding both directories to the include_path.
The test case classes have the suffix Test added to the name. The test case class for NakedPhp\ProgModel\NakedBareObject is NakedPhp\ProgModel\NakedBareObjectTest.
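A sketch of what such autoloading might look like, assuming one directory per namespace segment; the function names are hypothetical and NakedPhp's actual bootstrap may differ. stream_resolve_include_path() requires php 5.3.2 or later.

```php
<?php
/**
 * Maps a namespaced class name to a relative file path, one directory
 * per namespace segment (the convention assumed in this sketch).
 */
function classNameToPath($className)
{
    return str_replace('\\', '/', ltrim($className, '\\')) . '.php';
}

/**
 * Looks the file up in every include_path directory, so that library/
 * and tests/ are searched alike. Unknown classes are silently ignored,
 * leaving room for other autoloaders.
 */
function parallelHierarchiesAutoload($className)
{
    $file = stream_resolve_include_path(classNameToPath($className));
    if ($file !== false) {
        require_once $file;
    }
}

// Both roots on the include_path: production code and test cases.
set_include_path(implode(PATH_SEPARATOR, array(
    __DIR__ . '/library',
    __DIR__ . '/tests',
    get_include_path(),
)));
spl_autoload_register('parallelHierarchiesAutoload');
```

With this setup NakedPhp\ProgModel\NakedBareObject resolves to library/NakedPhp/ProgModel/NakedBareObject.php and its test case to tests/NakedPhp/ProgModel/NakedBareObjectTest.php, while both classes live in the same namespace.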
The basic assumption behind this directory structure is thinking of namespaces as packages, or reusable modules, whose dependencies should be limited as much as possible. In the NakedPhp repository, I am refactoring the original NakedPhp\Metadata namespace, splitting it into two parts: NakedPhp\MetaModel and NakedPhp\ProgModel, with the latter's classes depending on the former, which contains primarily interfaces. I am also moving the classes which disturb the cohesion of these two namespaces into other ones which already depend on NakedPhp\MetaModel: high cohesion of single software modules is obtained by keeping strongly coupled classes together, so that changes in one of them do not propagate all over the application.

Here are some advantages of the parallel hierarchies paradigm:

when you're testing the SUT you don't have to import it, since the two classes are in the same namespace (but in different folders, so test cases do not clutter your production code directories).
when you're referencing classes or interfaces from the same namespace, treated here as a package, you still do not have to import them. This case will be very frequent, as well-engineered classes do not take on many different responsibilities but delegate part of their operations to collaborators.
when you're using classes from another namespace, this is the signal that you are establishing coupling to that namespace. This is not necessarily bad. Use statements stop being a boring and repetitive declaration to write, and become the signpost of an external dependency towards another module of your application. Mutual or unnecessary dependencies between namespaces are smells that are thus noticed quickly.

Friday, February 05, 2010

Automated refactoring without heavy IDEs

Unix programs are beautiful and universally compatible. You can chain them to accomplish incredible tasks. Let's do some automated refactoring, such as renaming php classes and methods, without IDEs, using only sed, find and grep, plus your version control system of choice. You do not have to edit hundreds of files by hand, and these tools are available in every Linux distribution and on Mac OS X, although the commands below assume GNU sed: the BSD sed shipped with Mac OS X handles -i and word boundaries differently.

Renaming classes
START
Commit your work and make sure that the output of svn status is empty.

svn update
svn commit -m "what I have done until now..."
svn status

Refactoring is not an exact science, and wrong commands can destroy a codebase. In case you forget a \ and all your <?php tags are replaced by an apt-get cow, you'll simply enter

svn revert -R .

to restore the original state of the working copy after the last commit.
I'm sure your version control system has similar commands.

STEP 1
Provided that you use autoloading, moving or renaming the file that contains the class definition is the first step of the operation.

mv Package1/OldClassName.php Package2/NewClassName.php

Usually this step goes directly through the version control system, so that the file history is preserved:

svn move Package1/OldClassName.php Package2/NewClassName.php

or whatever equivalent your VCS provides.

STEP 2
The syntax of the sed command is analogous to vim's search and replace:

sed -i -e 's/\<OldClassName\>/NewClassName/g' \
    `find folder1 folder2 -name '*.php'`

This command renames all OldClassName occurrences:

in place, without creating new files (-i)
using the expression that follows on the command line (-e)
only where OldClassName is present as a single word (\< and \> modifiers), and not for example in AbstractOldClassName
substituting them with NewClassName
replacing every occurrence on each line, not only the first one (/g)

We should list all the files we want to edit after the command, but we can simply use a find command which finds all the files that:

are in folder1 or folder2, or in their subdirectories (you can list as many folders as you want)
have a name that matches *.php (essentially, their extension has to be .php)

The backticks (`) substitute the enclosed command with its output. They are a powerful tool to use sub-commands (otherwise we would have been stuck with find -exec). The \ lets the shell know that the command continues on the next line.

When the php classes to rename do not take advantage of Php 5.3 new features, renaming is very simple, since OldClassName is always the fully qualified name of the class. From php 5.3, namespaces are available to structure classes in different packages without very long names, but the namespace separator unfortunately coincides with the backslash, which sed treats as an escape character. Thus, to match it you have to insert another backslash:

sed -i -e 's/\<Old\\ClassName\>/New\\KlassName/g' `find folder1 folder2 -name '*.php'`
sed -i -e 's/\<ClassName\>/KlassName/g' `find folder1 folder2 -name '*.php'`

The first sed modifies the use statements and the direct references, which are the means by which 99% of classes are referred to. The second sed modifies the remaining references to the base class name embedded in php files, and it may not be necessary if you're only moving a class around.
A problem occurs when you're moving a class out of a namespace: its old siblings did not need a use statement to import its name, but now they do. The symmetric case is moving a class into a namespace, where you want the now redundant use statements to vanish. In both cases you'll have to resort to manual editing to fix the affected files.
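The same whole-word replacement can also be sketched in php itself, which sidesteps the manual backslash doubling because preg_quote() escapes the namespace separator. This swaps in a different technique (PCRE's \b word boundary instead of sed's \< and \>), and renameClass() and renameInTree() are hypothetical helpers, not part of any tool; commit first, exactly as with sed.

```php
<?php
/**
 * Replaces whole-word occurrences of $old with $new in a string of php
 * source code, like sed's \<...\> pattern. preg_quote() escapes the
 * namespace backslashes; returning $new from a callback means that
 * \ and $ in the new name get no special meaning.
 */
function renameClass($source, $old, $new)
{
    $pattern = '/\b' . preg_quote($old, '/') . '\b/';
    return preg_replace_callback($pattern, function ($match) use ($new) {
        return $new;
    }, $source);
}

/**
 * Applies renameClass() to every file under $dir (recursively) whose
 * name ends in .php, mirroring find -name '*.php'.
 */
function renameInTree($dir, $old, $new)
{
    $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
    foreach ($files as $file) {
        if (substr($file->getFilename(), -4) === '.php') {
            $path = $file->getPathname();
            file_put_contents($path, renameClass(file_get_contents($path), $old, $new));
        }
    }
}

// Equivalent of the first sed above, run from the working copy root:
// renameInTree('folder1', 'Old\ClassName', 'New\KlassName');
```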

Don't forget to repeat the two steps for the test classes too. The exact commands depend on your naming convention, but often the test cases mirror the production code hierarchy, so that OldClassNameTest should be replaced with NewClassNameTest. In xUnit frameworks, test cases are first-class citizens (classes), so there's nothing different from the production code renaming process.

END
Check the results by looking for every occurrence of the old or new name in your working copy:

grep -r "OldClassName" .

or grep -rl to show only the list of files where the pattern does occur.
svn diff and svn status show you the current changeset (every modified, added and deleted line) and the list of modified files, respectively.
After a manual check, run your whole test suite. If some of your tests fail, find the errors (that's what tests are for) and repair the situation manually. If the working copy is compromised, revert to the original revision.

Renaming methods
Provided that unrelated classes do not have overlapping method names, the START and END phases are the same, and they are very good practices to adopt in any refactoring.
Renaming a method usually has a more limited impact on the codebase than renaming a class:

To rename private methods, you can edit them directly in your editor of choice, opening the class file.
For protected methods, you should also check subclasses, and when they are a handful it's often faster to edit the files directly instead of running sed.
For public methods, the problem is analogous to renaming a class, but there are no file movements nor namespacing issues: skip Step 1 and follow Step 2.

Of course, both these refactorings break backward compatibility, so make sure you're not modifying any published interface in a minor release of your application. They are also dangerous if uncontrolled, because regular expressions are an advanced tool that can quickly mangle all your code. Make sure you can return to the original copy of the code via version control at any time, and that there are many tests in place covering the functionalities provided by the classes you want to refactor.
Refactoring shouldn't be hard. I hope this little guide helps you.

Thursday, February 04, 2010

Practical Php Patterns: Proxy

This post is part of the Practical Php Pattern series.

The structural pattern of today is the Proxy pattern, an enhancement on the simple handle (or pointer) used as a reference to an object: this pointer is substituted by a Proxy object which interposes itself between the Client and the execution of the actual work, adding a hook that can be exploited for many different goals.
Technically, this pattern interposes a Proxy object between the Client and a RealSubject, maintaining the Subject interface and delegating its methods in different ways. A Proxy can do nearly anything transparently: lazy creation of the RealSubject or lazy loading of data, exchange of messages with other machines, copy-on-write strategies.
The analogy that comes to mind is an http proxy, which clients such as browsers and applications rely on to contact http servers. The proxy can accomplish useful tasks while managing the connections, like access control and caching of large downloaded files.
While Proxy objects make room for optimizations, the abstraction they maintain by implementing the same interface can often turn into a leaky abstraction and cause performance issues, as in the case of Orm Proxy objects, discussed later in this post.

The object graph of a Proxy pattern is similar in structure to the Decorator's one, but the intent conveyed is different. A Decorator adds behavior dynamically to objects, while a Proxy controls the access from the Clients. Moreover, a Proxy may lazily create the RealSubject only when it's needed, or reach it via other means, while a Decorator is usually handed an already existing object to compose.
Participants:

Client: depends on a Subject implementation.
Subject: abstraction of the RealSubject.
RealSubject: accomplishes expensive work or contains a lot of data.
Proxy: provides the Client with a reference that conforms to Subject, while creating or communicating with the expensive RealSubject instance only when needed.

These are two examples of the wide usage of the Proxy pattern:

Object-relational mappers create Proxies on-the-fly as subclasses of Entity classes, to accomplish lazy loading (virtual proxy). These proxies override all the Entity methods, prepending a loading procedure before delegating the action, and do not contain data until a method is actually called. Orm proxies support bidirectional relationships between objects without loading the entire database, since they are placed at the border of the section of the object graph currently loaded.
Java RMI uses Proxy objects for remoting (remote proxy). The proxies serialize parameters when their methods are called and perform a request over the network to delegate the call to the real object on another node. This technique allows working with remote objects transparently, without noticing they are not on the same machine, but because of this transparency it is prone to slowing down execution.
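The virtual proxy that Orms generate can be approximated by hand. In this sketch Post and PostProxy are hypothetical classes, and the closure passed to the constructor stands in for the query a real Orm would run; proxies actually generated by Doctrine 2 or Hibernate are more elaborate.

```php
<?php
// Hypothetical Entity.
class Post
{
    protected $_title;

    public function __construct($title = null)
    {
        $this->_title = $title;
    }

    public function getTitle()
    {
        return $this->_title;
    }
}

// Hypothetical virtual proxy, shaped like the subclass an Orm generates.
class PostProxy extends Post
{
    private $_loader;
    private $_loaded = false;

    public function __construct($loader)
    {
        // no data yet: only a callback that knows how to fetch it
        $this->_loader = $loader;
    }

    private function _load()
    {
        if (!$this->_loaded) {
            $loader = $this->_loader;
            $this->_title = $loader();
            $this->_loaded = true;
        }
    }

    // Every Entity method is overridden: load first, then delegate.
    public function getTitle()
    {
        $this->_load();
        return parent::getTitle();
    }
}
```

Usage: `$post = new PostProxy(function () { return 'Hello'; });` costs nothing until `$post->getTitle()` is first called, and Clients type-hinting Post accept it unchanged.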

The code sample implements an ImageProxy that postpones the loading of an image's data.

<?php
/**
 * Subject interface.
 * Client depends only on this abstraction.
 */
interface Image
{
    public function getWidth();

    public function getHeight();

    public function getPath();

    /**
     * @return string   the image's byte stream
     */
    public function dump();
}

/**
 * Abstract class to avoid repetition of boilerplate code in the Proxy
 * and in the RealSubject. Only the methods which can be provided without
 * instancing the RealSubject are present here.
 */
abstract class AbstractImage implements Image
{
    protected $_width;
    protected $_height;
    protected $_path;
    protected $_data;

    public function getWidth()
    {
        return $this->_width;
    }

    public function getHeight()
    {
        return $this->_height;
    }

    public function getPath()
    {
        return $this->_path;
    }
}

/**
 * The RealSubject. Always loads the image, even if no dump of the data
 * is required.
 */
class RawImage extends AbstractImage
{
    public function __construct($path)
    {
        $this->_path = $path;
        list ($this->_width, $this->_height) = getimagesize($path);
        $this->_data = file_get_contents($path);
    }

    public function dump()
    {
        return $this->_data;
    }
}

/**
 * Proxy. Defers loading the image data until it becomes really mandatory.
 * This class does its best to postpone the very expensive operations
 * such as the actual loading of the BLOB.
 */
class ImageProxy extends AbstractImage
{
    /**
     * @var RawImage  the RealSubject, created only when needed
     */
    protected $_realImage = null;

    public function __construct($path)
    {
        $this->_path = $path;
        list ($this->_width, $this->_height) = getimagesize($path);
    }

    /**
     * Creates a RawImage and exploits its functionalities.
     */
    protected function _lazyLoad()
    {
        if ($this->_realImage === null) {
            $this->_realImage = new RawImage($this->_path);
        }
    }

    public function dump()
    {
        $this->_lazyLoad();
        return $this->_realImage->dump();
    }
}

/**
 * Client class that does not use the data dump of the image.
 * Passing a Proxy blindly to this class and to other Clients is safe,
 * as the data would still be loaded when Image::dump() is called.
 */
class Client
{
    public function tag(Image $img)
    {
        return '<img src="' . $img->getPath() . '" alt="" width="'
             . $img->getWidth() . '" height="' 
             . $img->getHeight() . '" />';
    }
}

$path = '/home/giorgio/shared/Immagini/kiki.png';
$client = new Client();

$image = new RawImage($path); // loading of the BLOB takes place
echo $client->tag($image), "\n";

$proxy = new ImageProxy($path);
echo $client->tag($proxy), "\n"; // no loading of the BLOB takes place

Wednesday, February 03, 2010

Where is business logic?

Have you ever heard of Multitier architecture? If not, you have probably encountered it without knowing its name while working on web applications.
In a multitier architecture, an application is divided into different horizontal layers, each addressing a different concern. Every layer builds on the one that lies directly under it to perform its work, thus decoupling for example html presentation (upper layer) from sql queries (lower layer).
The number of layers is flexible and there are many variants of multitier architectures, but the simplest model many web applications fall into is composed of three layers:

user interface: generates html, handles user input and displays errors.
Domain Model: objects and classes that represent concepts, such as Post, Thread, Forum, User, Message and so on.
Infrastructure: usually data access code to a database and, by extension, the Sql schema itself where the relational model is used. External services also qualify as infrastructure.

There are three major approaches to development, which differ in the layer of the application that contains the greater part of the business logic. Which User can close a particular Thread, and in what order are the Messages for a particular User listed? In which Forum can a User add a Post?
The answers to these questions ideally reside in one of the fundamental layers, as specifications require (though sometimes they are scattered through the layers, which is a very effective way to complicate a design).

Smart Ui
As the name says, this style keeps the logic in the user interface. A Smart Ui example is a folder full of php scripts that move data back and forth from a MySql database.
During maintenance, the replication of rules and code in different scripts usually increases, making the application difficult to change and expand; this style is appropriate only for small projects which merely shuffle data from tables to html pages.

Smart database
A style primarily taught in database classes, which results in a very accurate schema, full of constraints, triggers and stored procedures to maintain data integrity.
Note that if you want to implement this approach, you will probably need an expensive database like Oracle, because open source databases may not support all the logic you need. Moreover, Sql is not a general-purpose programming language: you can stretch Ddl with proprietary extensions and set up many rules, but you will end up replicating them in the front-end (if there is one, at least) for error handling and localization.

Rich Domain Model
The most powerful approach is implementing logic in the domain model layer, which is the type of model best able to represent the real world.
It follows that in such an approach the Ui delegates nearly everything to the domain layer, or is even generated automatically (Naked Objects). The database schema can also be generated automatically once a mapping from objects to tables is defined (Orms like Hibernate and Doctrine 2). The dependencies are inverted, as all other layers mirror the domain model.
The advantages of a rich domain model are multiple:

testing is simple because infrastructure and Ui do not get in the way; no need to run databases to test business logic or to fill forms with a bot or Selenium.
no duplication of logic is needed, because different views of the user interface refer to the same methods in the domain model.
the model of the application is the model presented to the user; there is no translation between concepts and no need for the user to learn a data model along with a presentational one. Often a Presentation Model is needed precisely because the underlying Domain Model is anemic.
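As a sketch of where a rule like "which User can close a Thread" lands in a rich domain model, here are hypothetical Thread and User classes; the rule itself (only the author or a moderator can close a Thread) is an assumption made up for the example. Testing it needs neither a database nor a browser.

```php
<?php
class User
{
    private $_name;
    private $_moderator;

    public function __construct($name, $moderator = false)
    {
        $this->_name = $name;
        $this->_moderator = $moderator;
    }

    public function isModerator()
    {
        return $this->_moderator;
    }
}

class Thread
{
    private $_author;
    private $_closed = false;

    public function __construct(User $author)
    {
        $this->_author = $author;
    }

    /**
     * Assumed business rule: only the author or a moderator can close
     * a Thread. Every Ui view calls this method, so the rule is never
     * duplicated in php scripts or stored procedures.
     */
    public function close(User $user)
    {
        if ($user !== $this->_author && !$user->isModerator()) {
            throw new DomainException('Only the author or a moderator can close this Thread.');
        }
        $this->_closed = true;
    }

    public function isClosed()
    {
        return $this->_closed;
    }
}
```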

Essentially with the tools available today for managing generic layers you can achieve everything by manipulating objects directly in memory and storing the result in the database by pushing a big Save button.

Tuesday, February 02, 2010

Practical Php Patterns: Flyweight

This post is part of the Practical Php Pattern series.

The structural pattern of the day is the Flyweight, a technique for creating stateful objects that are used as shared instances.
The pillars of the Flyweight pattern are the following:

immutability: changing a Flyweight object in one Client must not affect other Clients that reference it; a change of state is addressed by substitution with another object, which may be reused. This matters because in php (at least php 5) and in most languages objects are passed around via handles or pointers, and so are never copied.
no identity: duplicating an object does not produce a distinguishable clone, but only another copy indistinguishable from the source object. There is really no difference between two instances created with the same parameters.

Typical names of a Flyweight class are Currency, DateTime, FontGlyph, Phonenumber, and so on.
Sharing references is useful as there are fewer objects around that occupy memory and then have to be garbage collected. The Flyweight pattern allows creating classes for data-intensive entities and thus promotes homogeneity of the object graph. Flyweight classes could instead have been modelled as arrays or simpler structures (mostly strings), a choice that sometimes impoverishes the domain model.
The infrastructure that implements sharing of instances is a factory that maintains a pool of objects and lazy-creates them when asked, recycling previously created instances when the same parameters are passed to its creation methods.
The creator participant may also be a static method, since the global state is immutable and testing is still possible. Using a FlyweightFactory is a better approach for extensibility when there is a hierarchy of Flyweights, as the base class would not depend on all its children. A static method is acceptable because very often there is little behavior to stub out in Flyweight objects, which act as beautified data containers.
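For comparison with the static getInstance() used in the code sample of this post, a FlyweightFactory can also be a separate object, which Clients receive as a collaborator and tests can replace. Currency and CurrencyFactory are hypothetical names chosen for this sketch.

```php
<?php
/**
 * Flyweight: immutable (no setters) and without identity.
 */
class Currency
{
    private $_code;

    public function __construct($code)
    {
        $this->_code = $code;
    }

    public function getCode()
    {
        return $this->_code;
    }
}

/**
 * FlyweightFactory as an object: lazily creates Currency instances
 * and recycles them when the same code is requested again.
 */
class CurrencyFactory
{
    private $_pool = array();

    public function get($code)
    {
        if (!isset($this->_pool[$code])) {
            $this->_pool[$code] = new Currency($code);
        }
        return $this->_pool[$code];
    }

    public function poolSize()
    {
        return count($this->_pool);
    }
}
```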

Participants

Client: has a reference to a Flyweight object in some way.
FlyweightFactory: creates and maintains all references of Flyweights, recycling them.
Flyweight: abstraction of the shared objects.
ConcreteFlyweight: (possible) subclasses or implementations of Flyweight.

In an application usually there are two categories of objects: stateless objects (Services) and stateful ones (Entities). Value Objects are similar to Entities as they are stateful objects but without identity, and are usually implemented as Flyweights. User, Article and Group are classic examples of Entities in Domain-Driven Design, while Phonenumber and Money are examples of Value Objects.
The state of an object can be partitioned into two parts: intrinsic state and extrinsic state.

Intrinsic state is shared between all Clients. The Flyweight keeps intrinsic state in private fields and in its concrete class name.
Extrinsic state is the context where the Flyweight is used, and may not be present. The Client must pass extrinsic state to the Flyweight as a method parameter when needed.

The code sample models a User's nationality as a separate class.

<?php
/**
 * Flyweight. If there were different behaviors a Flyweight interface
 * would have been extracted and different ConcreteFlyweight implemented,
 * for instance with different constructions of nationality declaration.
 * The fact that a Nationality is treated as a Value Object is arbitrary:
 * if the stored data increase this class should become an Entity.
 */
class Nationality
{
    /**
     * @var string  the nation name cannot change
     */
    private $_nationName;

    public function __construct($nationName)
    {
        $this->_nationName = $nationName;
    }

    public function __toString()
    {
        return $this->_nationName;
    }

    /**
     * The person's name is extrinsic state and should be passed instead
     * of stored as a private field. Telling this class the person's name
     * results in a more cohesive design than extracting the nation name:
     * the behavior should be kept near the data it references
     * ($this->_nationName).
     */
    public function getNationalityDeclaration($person)
    {
        return "{$person} is from {$this->_nationName}";
    }

    private static $_instances = array();
    /**
     * Implementation of the FlyweightFactory participant as a static method.
     */
    public static function getInstance($name)
    {
        if (!isset(self::$_instances[$name])) {
            self::$_instances[$name] = new self($name);
        }
        return self::$_instances[$name];
    }
}

/**
 * A Client class, which is a simple bean containing data for the sake of this
 * example.
 */
class User
{
    private $_uid;
    private $_nation;

    public function getUid()
    {
        return $this->_uid;
    }

    public function setUid($uid)
    {
        $this->_uid = $uid;
        return $this;
    }

    /**
     * @return Nationality
     */
    public function getNationality()
    {
        return $this->_nation;
    }

    public function setNationality($nation)
    {
        $this->_nation = $nation;
        return $this;
    }

    public function __toString()
    {
        return "User: #{$this->_uid}. " . $this->_nation->getNationalityDeclaration($this->_uid);
    }
}

// other Client code
$user = new User();
$user->setUid(714673)
     ->setNationality(Nationality::getInstance('Italia'));
echo $user, "\n";
// changing a Flyweight means referencing a new instance
// (which may actually already exist in the FlyweightFactory)
$user->setNationality(Nationality::getInstance('Australia'));
echo $user, "\n";

As another example, you can consider the Doctrine 2 data type system, which is implemented with Flyweight objects that represent the different column types, along with their mapping to object fields. Since the metadata of many classes references these Flyweights, sharing them reduces the total load of the Orm on the server.

Monday, February 01, 2010

Velocity

There's an old joke that goes like this:

One day a pilgrim was walking towards Santiago and, along the way, he encountered a philosopher. He asked the supposedly wise man:
"How long does it take to get to Santiago?"
But the philosopher did not answer.
Then the pilgrim began to walk again, and only when he was fifty meters away did the philosopher shout:
"Half a day!"
The pilgrim looked back and said: "But... couldn't you tell me while I was there with you?"
And the philosopher: "I did not know your velocity."

One of the points of Agile methodologies is approaching estimation without random guesses. We are often asked to prepare a schedule and a deadline without knowing enough about a software problem.
Agile prescribes estimating the relative size of user stories (which can be thought of as features with an acceptance test) in points instead of estimating time. The time spent to complete the user stories varies, because it depends on the velocity of the team or programmer.
In physics, velocity is a different quantity from speed. Velocity is defined as the derivative of displacement with respect to time, while speed is the derivative of path length with respect to time. Since the shortest path between two points in space is a straight line, following a non-rectilinear path results in an average velocity lower in magnitude than the corresponding average speed, which could thus be a biased progress metric. If I go to Tokyo and return home in an hour, my average speed would have been amazing, while my average velocity is exactly zero.
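The distinction can be written compactly, with $\vec{x}$ the position, $\ell$ the path length travelled and $t$ the time; the inequality holds because no path between two points is shorter than the straight segment joining them, and the Tokyo round trip is the case $\Delta\vec{x} = \vec{0}$ with a large $\Delta\ell$.

```latex
\bar{\vec{v}} = \frac{\Delta \vec{x}}{\Delta t}, \qquad
\bar{s} = \frac{\Delta \ell}{\Delta t}, \qquad
|\Delta \vec{x}| \le \Delta \ell \;\Longrightarrow\; |\bar{\vec{v}}| \le \bar{s}
```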

Velocity focuses on the effective work which has been completed, not on the effort nor on the hours spent coding. Only the finished stories are taken into consideration to calculate velocity.
Velocity is best estimated by gathering data from a few short iterations (each a few weeks long). After the first iteration, which accomplishes a certain number of points from several completed stories, we are able to extrapolate a range for the number of iterations still required.

Consider this example: during the first two-week iteration of a project, a solo developer finished two stories, worth 5 and 2 points respectively. The velocity of this single-person team is 7 points per iteration. It does not matter how many hours he spent on this project, as long as he can commit to the same effort in the subsequent iterations.
If the remaining stories' points amount to 40, we know that, on average, the number of remaining iterations will be 40 / 7 ≈ 5.7, so about 6. We may give a range of 5 to 7 or 4 to 8 iterations as the time estimate.
After more iterations, the velocity estimate is adjusted as the sample on which we measure it increases in size. The average velocity can be used to re-estimate the number of remaining iterations, providing a tighter range and updating the project schedule with newer information. The team and the iteration length should be consistent during the project to take advantage of these metrics.
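The arithmetic of the example above as a small php sketch; averageVelocity() and remainingIterations() are hypothetical helpers, and the margin of one iteration around the expected value is an arbitrary choice, not a rule of Agile planning.

```php
<?php
/**
 * Average velocity: completed story points per iteration. Each element
 * of $iterations lists the points of the stories finished in it;
 * unfinished stories are simply not listed.
 */
function averageVelocity(array $iterations)
{
    $points = 0;
    foreach ($iterations as $completedStories) {
        $points += array_sum($completedStories);
    }
    return $points / count($iterations);
}

/**
 * Returns array(low, expected, high) remaining iterations.
 */
function remainingIterations($remainingPoints, $velocity, $margin = 1)
{
    $expected = (int) round($remainingPoints / $velocity);
    return array(max(1, $expected - $margin), $expected, $expected + $margin);
}

$velocity = averageVelocity(array(array(5, 2))); // first iteration: 7 points
list($low, $expected, $high) = remainingIterations(40, $velocity);
echo "$low to $high iterations (about $expected)\n"; // 5 to 7 iterations (about 6)
```

As more iterations are appended to the array, the average is recomputed and the range can be narrowed by lowering the margin.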

If you want to deepen your knowledge of this subject, I reviewed a book which is a good introduction to it: Agile Estimating and Planning.