Invisible to the eye

Friday, February 26, 2010

Some news on my activities

This week has been very satisfactory so far.

First, my talk for phpDay, the most important yearly Italian conference on php, has been accepted. The talk will be in Italian and it takes place on the last day of the conference, May 15th. The title is Architettura e testabilità.
Here is the abstract (translated from Italian):

The design of an application can be positively influenced by several practices. Ease of testing is a sufficient condition for an architecture that guarantees simple maintenance and high cohesion of components.
Topics covered: Dependency Injection, Law of Demeter, creational Design Patterns (Factory vs. Singleton), honest APIs.

Ticket prices are currently low, as this is the early bird period.

Meanwhile, I've started writing for php|architect on phparch.com; php|architect is one of the most prominent producers of php-related books and training courses.
My first post in their news section is about the Zend Framework 2 roadmap.

Thursday, February 25, 2010

Levels of architecture

As I said in a recent post, it is important to have a 50,000 feet view of a software system in our mind before editing even a single line in its codebase. Without a big picture in mind, our changes can push the project in the wrong direction. But there are different views, each at a different level of detail, that may affect a software's design process.
A simple analysis produces this scale of units, ordered by increasing size. Every unit is composed of a set of units from the underlying level, which comes before it in the list:

The simplest unit of software is obviously the line of code.
The next unit up in object-oriented software is the method, be it public, private or under any other access control policy.
The class and its specializations: Entity, Value Object, Service, Repository...
The package, also known as namespace: its purpose is to simplify references between items contained in the same unit.
The module, also known as component, which often presents a Facade to simplify its access.
The application, also known as BoundedContext. Different applications can work together and communicate via published protocols, anti-corruption layers, RESTful services, relational databases... There are no limits to collaboration paradigms.

The interesting part is that some metrics and rules of thumb work at every level of detail in an architecture, while others become more effective the closer you are to one end of the spectrum. Employing metrics and practices at the level of the architecture they were designed for is crucial.
Let's consider some general rules first:

number of units: programmers have limited RAM. People can normally work on 5 to 9 units at a time (obviously more when the units are very similar or dumb), so a container unit should not be composed of a high number of contained units. For example, a class should generally not present 300+ methods, and a package should not contain one hundred classes.
low coupling, high cohesion, information hiding: these concepts should be enforced at every level; one unit dependent on another is sufficient to transitively establish a dependency between the respective container units.

And then the ones that change with the architecture level considered:

TDD and refactoring: applied at the low end of the spectrum. It is simple to refactor a private method, but it's very difficult to refactor a published protocol between two applications.
the converse situation applies at the high end of the spectrum: thinking of a good api and refining the Ubiquitous Language is very important because of the resistance to change of these kinds of units.
choice of the system under test: testing single classes and methods is the preferred approach for the largest part of a project's test suite, because of the reduced number of test cases necessary and the resulting design aid (applied at the low end).
code coverage is instead only significant at a high level. Some dumb classes may not have unit tests at all, but they can still be exercised indirectly by other tests. Similarly, many static analysis metrics are meaningful if measured on a whole project.
Uml diagrams must be kept in sync with code, so they really add value when they are used at the boundaries of components or without much detail in them.

Feel free to add other practices you follow with preference at one end of the spectrum of units, or at every level of detail.

Wednesday, February 24, 2010

3 simple steps to Ubiquitous Language

A customer should be able to view its past orders, along with their dates and chosen products. This is necessary so that clients can view a single piece and request an arbitrary amount of it again.

This piece of documentation seems to have been written by a poor analyst. It is also the first break in the chain between the user's needs and the programmer's understanding of the business domain.
A common practice in agile development is involving the final user in the process, even to the point of replacing the analyst with a programmer. In fact, in Italy one of the few things that works well is the small size of software firms, which eliminates communication overhead. Once we were able to get a system for selling vacation packages up and running in a day, tailoring it in real time to the adjustments requested by the customer.
However, every serious project needs a written set of requirements, to serve as an analysis of the domain and as future documentation. But formal language is not the sole property of php code: Italian, English and other languages can be used in such a way as to define without ambiguity the concepts and the functionalities of a domain model. Communication is the major issue in software design and it should be addressed accordingly: by defining a higher-level language on top of the natural one.
Let's walk through some steps to transform the sample initial analysis into an instance of Ubiquitous Language that captures information from the domain.
We will make additional hypotheses about the requirements and the model, because here we do not have a user of the application to ask questions to. Normally this process should be conducted during the original requirements gathering and initial modelling phase.

Step 1
Synonyms are the most powerful enemy of a Ubiquitous Language: different domain concepts should have different names, and coincident concepts should share the same name. This practice promotes consistency and predictability in the documentation and then in the produced code.
The names of concepts are usually capitalized, to highlight their special role in the prose. They often become class names, so the capitalization is a fitting trait. In any case, they will either find their place in the domain layer or be eliminated while refactoring.
Once grasped, the practice of using Ubiquitous names will become second nature, and you'll find yourself asking questions such as "Are we talking about this data as associated with a particular Customer?" and "So a Customer can make many Orders and an Order should have a unique Customer?" all the time.

A Customer should be able to view its past Orders, with their dates and chosen Products. This is necessary so that Customers can view a single Product and request an arbitrary amount of it again.

Step 2
Look at the available operations and see how they can be better described by involving the Ubiquitous Language.

A Customer should be able to find its past Orders, with their dates and chosen Products. This is necessary so that Customers can view a single Product and create a new Order associated to it.
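
As a rough sketch of how this wording can map to code, the terms of the description can become class and method names. The method names findPastOrders() and createOrder(), their parameters and the use of DateTime are assumptions made for illustration; they are not part of the original requirements.

```php
<?php
// Hypothetical domain classes named after the Ubiquitous Language terms.
class Product
{
    private $_name;

    public function __construct($name)
    {
        $this->_name = $name;
    }

    public function getName()
    {
        return $this->_name;
    }
}

class Order
{
    private $_product;
    private $_amount;
    private $_date;

    public function __construct(Product $product, $amount, DateTime $date)
    {
        $this->_product = $product;
        $this->_amount = $amount;
        $this->_date = $date;
    }

    public function getProduct() { return $this->_product; }
    public function getAmount() { return $this->_amount; }
    public function getDate() { return $this->_date; }
}

class Customer
{
    private $_orders = array();

    /**
     * "create a new Order associated to" a Product.
     */
    public function createOrder(Product $product, $amount, DateTime $date)
    {
        $order = new Order($product, $amount, $date);
        $this->_orders[] = $order;
        return $order;
    }

    /**
     * "find its past Orders": the Orders dated before $now.
     */
    public function findPastOrders(DateTime $now)
    {
        $past = array();
        foreach ($this->_orders as $order) {
            if ($order->getDate() < $now) {
                $past[] = $order;
            }
        }
        return $past;
    }
}

$customer = new Customer();
$lamp = new Product('Lamp');
$customer->createOrder($lamp, 2, new DateTime('2010-01-10'));
foreach ($customer->findPastOrders(new DateTime('2010-02-24')) as $order) {
    echo $order->getDate()->format('Y-m-d'), ': ',
         $order->getAmount(), ' x ', $order->getProduct()->getName(), "\n";
}
```

The point is only that the verbs and nouns of the requirement can be read back from the signatures; a real model would probably differ in many details.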

Step 3
Keep the Ubiquitous Language in sync with the refactoring of the domain concepts, and let it evolve with the code.
Continuing with the example, let's say the customer asks for a refinement: only some products are available for shipping, while others are difficult to send and should be picked up at a particular warehouse. Here's another piece of description that involves the Ubiquitous Language and describes these additions.

"A Product has a Delivery relationship, which may be a Shipping or a HandOver. A Delivery is always a property of the single Product and it exhibits traits such as a description and its availability in time."

I hope this overview of the Ubiquitous Language concept will help you improve your use of human language in documentation and in communication with users and other developers; your goal should be to convey concentrated, distilled knowledge.
If you have any suggestion to expand on this topic, feel free to add a comment.
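
To make Step 3 a bit more concrete, the Delivery description could translate into a model like the following. This is only a sketch: the class names come from the description, while the constructor parameters, the warehouse field and the isAvailableAt() method are assumptions introduced for illustration.

```php
<?php
// Hypothetical model: names follow the Ubiquitous Language,
// members are assumptions made for this example.
abstract class Delivery
{
    private $_description;
    private $_availableFrom;

    public function __construct($description, DateTime $availableFrom)
    {
        $this->_description = $description;
        $this->_availableFrom = $availableFrom;
    }

    public function getDescription()
    {
        return $this->_description;
    }

    /**
     * "availability in time", modelled here as a simple start date.
     */
    public function isAvailableAt(DateTime $date)
    {
        return $date >= $this->_availableFrom;
    }
}

class Shipping extends Delivery
{
}

class HandOver extends Delivery
{
    private $_warehouse;

    public function __construct($description, DateTime $availableFrom, $warehouse)
    {
        parent::__construct($description, $availableFrom);
        $this->_warehouse = $warehouse;
    }

    public function getWarehouse()
    {
        return $this->_warehouse;
    }
}

class Product
{
    private $_name;
    private $_delivery;

    // "A Delivery is always a property of the single Product"
    public function __construct($name, Delivery $delivery)
    {
        $this->_name = $name;
        $this->_delivery = $delivery;
    }

    public function getDelivery()
    {
        return $this->_delivery;
    }
}

$sofa = new Product(
    'Sofa',
    new HandOver('Pick up in person', new DateTime('2010-03-01'), 'Main warehouse')
);
echo $sofa->getDelivery()->getDescription(), "\n";
```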

Tuesday, February 23, 2010

The 50000 feet view and Dependency Injection

One criticism of Dependency Injection is the supposedly unnecessary abstraction over collaborators that is imposed by the constructor and setter injection techniques. Writing:

class Computer
{
    public function __construct(Cpu $cpu) {
        $this->_cpu = $cpu; 
    } 
}

instead of:

class Computer
{
    public function __construct() {
        $this->_cpu = new AmdCpu(); 
    } 
}

is said to be too abstract, because a new programmer who starts reading the source code of the Computer class does not know which concrete collaborator class is used by Computer and does not know where to look for the code defining its behavior. In general, he just can't draw a picture of the overall object graph, since the links between objects are scattered over many different classes and interfaces. A 50000 feet view of the system (a representation at the same level of detail as viewing a city from a plane) is difficult to grasp just from the method signatures of a highly decoupled system.
Fortunately, Dependency Injection is actually about separating the construction of the object graph from the business logic, and the seams that define how classes work together are left abstract by design. Someone must construct the application or its components anyway, but the process is well encapsulated, and the collaborators involved know nothing about it. The construction process is so decoupled from the system that it can take advantage of a DI container without introducing further coupling.
Thus in a well-written application there is already a nice 50000 feet view of the system, whether it is kept in a factory class or in the configuration of the DI container. A developer starting to work on a component should look first at the code that constructs it.
For instance, the SpecificationLoader component of NakedPhp is a Facade composed of many different classes. The NakedPhp\Reflect\ReflectFactory class contains a createSpecificationLoader() method:

    /**
     * @return SpecificationLoader
     */
    public function createSpecificationLoader($folder, $prefix)
    {
        if (!isset($this->_specLoader)) {
            $this->_specLoader = new PhpSpecificationLoader(
                new PhpSpecificationFactory(
                    new FilesystemClassDiscoverer($folder, $prefix)
                ),
                new PhpIntrospectorFactory(
                    new FactoriesFacetProcessor(array(
                        new FacetFactory\PropertyMethodsFacetFactory,
                        new FacetFactory\ActionMethodsFacetFactory
                    )),
                    new ProgModelFactory(
                        new MethodsReflector(
                            new DocblockParser
                        )
                    )
                )
            );
        }
        return $this->_specLoader;
    }
}

This is a very practical 50000 feet view: it does not involve diagrams; it expresses dependencies between all the classes that constitute the component at the same time. Moreover, it is automatically kept synchronized with actual code refactoring since there is no external documentation involved (Code is the design). Dependency Injection is not snake oil: it's the best practice you can apply to object-oriented code.
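
The same idea can be applied to the Computer example from the beginning of this post. The following is a minimal sketch, assuming that Cpu is an interface (the original snippet only uses it as a type hint); the getModel() and describe() methods and the ComputerFactory class are hypothetical names added for illustration.

```php
<?php
// Assumption: Cpu is an interface; the snippet above only uses it as a type hint.
interface Cpu
{
    public function getModel();
}

class AmdCpu implements Cpu
{
    public function getModel()
    {
        return 'AMD';
    }
}

class Computer
{
    private $_cpu;

    public function __construct(Cpu $cpu)
    {
        $this->_cpu = $cpu;
    }

    public function describe()
    {
        return 'Computer with ' . $this->_cpu->getModel() . ' cpu';
    }
}

/**
 * Hypothetical factory: the only place where concrete classes are named,
 * so reading it gives the picture of the object graph.
 */
class ComputerFactory
{
    public function createComputer()
    {
        return new Computer(new AmdCpu());
    }
}

$factory = new ComputerFactory();
echo $factory->createComputer()->describe(), "\n"; // Computer with AMD cpu
```

A programmer who wonders which Cpu a Computer gets can find the answer in createComputer(), while Computer itself stays unaware of AmdCpu.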

Monday, February 22, 2010

Practical Php Patterns: Iterator

This post is part of the Practical Php Pattern series.

The behavioral pattern of the day is the Iterator pattern, which provides an abstraction over a very common process: the iteration over a collection of objects [or scalars] located in an unspecified part of the object graph.
The iteration may be performed in very different concrete ways: over an array property, a collection object, an array, even a query result set.
In a world of objects, Iterators maintain array-like capabilities as a non-invasive facet of objects. The Client class is often totally decoupled from the actual implementations of objects and refers to an Iterator interface. Whenever possible, we can pass around references to an Iterator instead of references to a concrete or abstract class which may change in the future.

Participants:

Client: refers to Iterator's methods to perform a loop on a set of values or objects.
Iterator: abstraction over the iteration process. Contains methods such as next(), isFinished(), current() and so on.
ConcreteIterators: implements an iteration over a particular set of objects, such as an array, a tree, a Composite, a collection.

Php natively supports the Iterator pattern via the Traversable interface, which is extended by Iterator and IteratorAggregate. Not only do these two subinterfaces define a set of standard methods, but every Traversable object can be passed to foreach as-is. The foreach construct is the primary Client of Iterators.
Iterator implementations are real iterators, while IteratorAggregate implementations are Traversable objects with other responsibilities, which return an Iterator via a public getIterator() method. The Standard Php Library, which is the only general purpose object-oriented library bundled with php, defines additional interfaces and utility classes that implement or compose them.
OuterIterator implementations decorate an Iterator. CachingIterator and LimitIterator are examples of implementations of this interface.
RecursiveIterator is an extension of the Iterator interface for tree-like structures, which defines a pair of additional methods to check for the presence of children in the current element of an iteration.
RecursiveArrayIterator and RecursiveDirectoryIterator are example implementations of this interface. These types of Iterators can be used as-is or bridged to a plain Iterator's contract with a RecursiveIteratorIterator. This OuterIterator implementation performs a depth-first traversal; its construction parameters decide whether only the leaves are returned, or whether parent elements are returned before or after their children.
A RecursiveIteratorIterator can be passed to foreach like any other Iterator. See the code sample for the differences in usage between RecursiveIterators and plain Iterators.
Finally, SeekableIterators add a seek() method to the contract, which can be used to move the internal state of the Iterator to a particular point of the iteration.
Note that Iterator is a greater abstraction than an object collection, since we can have InfiniteIterators, NoRewindIterators, etc., which have no counterpart in the domain of plain arrays. For this reason, Iterators lack some functionalities such as a count() method.
The complete list of SPL iterators can be found in the php official manual.
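
As a small illustration of the decorating and seekable iterators mentioned above, here is a sketch that uses only classes bundled with SPL. The sample data and variable names are arbitrary.

```php
<?php
$letters = new ArrayIterator(array('A', 'B', 'C', 'D'));

// LimitIterator is an OuterIterator: it decorates another Iterator
// and exposes a slice of it (here 2 elements starting from offset 1).
$slice = iterator_to_array(new LimitIterator($letters, 1, 2), false);

// InfiniteIterator restarts the inner Iterator when it reaches the end;
// wrapping it in a LimitIterator stops the loop after 6 elements.
$cycle = iterator_to_array(
    new LimitIterator(
        new InfiniteIterator(new ArrayIterator(array('A', 'B', 'C'))),
        0,
        6
    ),
    false
);

// ArrayIterator implements SeekableIterator, so we can jump to a position.
$letters->seek(2);
$sought = $letters->current();

echo implode(' ', $slice), "\n"; // B C
echo implode(' ', $cycle), "\n"; // A B C A B C
echo $sought, "\n";              // C
```

The second argument of iterator_to_array() is set to false to discard the keys, which would otherwise collide when the InfiniteIterator repeats them.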

Thanks to the powerful support of php, most of the work in using the Iterator pattern in this language consists in wiring the standard implementations correctly. The code sample exploits the functionalities of standard Iterators and of RecursiveIterators.

<?php
/**
 * Collection that wraps a numeric array.
 * All five public methods are needed to implement
 * the Iterator interface.
 */
class Collection implements Iterator
{
    private $_content;
    private $_index = 0;

    public function __construct(array $content)
    {
        $this->_content = $content;
    }

    public function rewind()
    {
        $this->_index = 0;
    }

    public function valid()
    {
        return isset($this->_content[$this->_index]);
    }

    public function current()
    {
        return $this->_content[$this->_index];
    }

    public function key()
    {
        return $this->_index;
    }

    public function next()
    {
        $this->_index++;
    }
}

$array = array('A', 'B', 'C', 'D');
echo "Collection: ";
foreach (new Collection($array) as $key => $value) {
    echo "$key => $value. ";
}
echo "\n";

/**
 * Usually IteratorAggregate is the interface to implement.
 * It has only one method, which must return an Iterator
 * already defined as another class (e.g. ArrayIterator).
 * Implementing Iterator directly gives a finer control
 * over the algorithm, because all the hook points of
 * Iterator's contract are available for implementation.
 */
class NumbersSet implements IteratorAggregate
{
    private $_content;

    public function __construct(array $content)
    {
        $this->_content = $content;
    }

    public function contains($number)
    {
        return in_array($number, $this->_content);
    }

    /**
     * Only this method is necessary to implement IteratorAggregate.
     * @return Iterator
     */
    public function getIterator()
    {
        return new ArrayIterator($this->_content);
    }
}

echo "NumbersSet: ";
foreach (new NumbersSet($array) as $key => $value) {
    echo "$key => $value. ";
}
echo "\n";

// let's play with RecursiveIterator implementations
$it = new RecursiveArrayIterator(array(
    'A',
    'B',
    array(
        'C',
        'D'
    ),
    array(
        array(
            'E',
            'F'
        ),
        array(
            'G',
            'H',
            'I'
        )
    )
));
// $it is a RecursiveIterator but also an Iterator,
// so it loops normally over the four elements
// of the array.
echo "Foreach over a RecursiveIterator: ";
foreach ($it as $value) {
    echo $value;
    // but RecursiveIterators specify additional
    // methods to explore children nodes
    $children = $it->hasChildren() ? '{Yes}' : '{No}';
    echo $children, ' ';
}
echo "\n";
// we can bridge it to a different contract via
// a RecursiveIteratorIterator, whose cryptic name
// should be read as 'an Iterator that spans over
// a RecursiveIterator'.
echo "Foreach over a RecursiveIteratorIterator: ";
foreach (new RecursiveIteratorIterator($it) as $value) {
    echo $value;
}
echo "\n";
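
A RecursiveIteratorIterator also knows how deep the current element is nested, and it can be asked to return the parent elements too. The following sketch assumes a small sample tree of my own; getDepth() and the SELF_FIRST mode are part of the standard class.

```php
<?php
$tree = new RecursiveArrayIterator(array(
    'A',
    array('B', array('C')),
    'D'
));

// Default mode (LEAVES_ONLY): only the leaves are returned, depth-first.
$walker = new RecursiveIteratorIterator($tree);
$lines = array();
foreach ($walker as $value) {
    // getDepth() tells how deeply nested the current element is
    $lines[] = str_repeat('  ', $walker->getDepth()) . $value;
}
echo implode("\n", $lines), "\n";

$leafCount = iterator_count($walker);

// SELF_FIRST also returns the nested arrays themselves,
// each one before its own children.
$selfFirstCount = iterator_count(
    new RecursiveIteratorIterator($tree, RecursiveIteratorIterator::SELF_FIRST)
);
echo "Leaves: $leafCount, elements with parents: $selfFirstCount\n";
```

With this tree the first loop visits the four letters, indented by two spaces per nesting level, while the SELF_FIRST traversal counts the two nested arrays as well.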

Sunday, February 21, 2010

1K subscribers

This week the Invisible to the eye feed has surpassed 1024 (1K) subscribers for the first time. :)
Thank you all for the support. Your high number encourages me to write more and more.

Friday, February 19, 2010

The Number one rule of design

The post on php's foreach construct raised a discussion about the design of the language and of userland code, so I'm collecting my thoughts here. However, note that I was not discussing the utility of the construct, only the use of array functions when dealing with a homogeneous transformation of an array, such as one that yields an array with exactly the same type or number of elements.

I would like to introduce a general principle I follow while designing an [object-oriented] system.

A design is good if it makes reacting to change easy.

Decoupling between different classes and packages can be measured by thinking about hypothetical changes to the requirements and how they can be accommodated. The advantage is double: not only will you be more ready to embrace change, but you will also be writing testable code; unit testing is essentially a practice that changes the whole environment around a SUT by using stubs and mocks, to find out whether the software unit complies with its contract.
Change will not always happen in the ways you have foreseen, but high cohesion in software modules will help in dealing with it.
Anyway, I am not the person who invented this design rule (so you shouldn't call me a moron and go on with your practices). It is contained in David Parnas's famous paper On the Criteria To Be Used in Decomposing Systems into Modules, but we should call it the Number one rule of design:

We have tried to demonstrate by these examples that it is almost always incorrect to begin the decomposition of a system into modules on the basis of a flowchart. We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.

This is really one of the most important rules in the information technology world. It is the reason why you can plug nearly everything into your computer via a standard interface, why you can drive any car and why a structure like the Internet is possible. This is information hiding.

The discussion that started in the comments of the previous post is about how to pass a high number of parameters (and by high I mean more than 3) from a bunch of Client classes to a function or a method of a Server class. We will now apply the #1 rule of design to this problem.
The first solution is obvious: every parameter is modelled as a formal parameter of the method. However, a long signature lacks clarity, makes the order of parameters important and creates issues with default values.
A suggestion proposed in the comments is extending php's syntax to automatically pass a map:

function foo($arg1, 'arg2' => 0, 'arg3' => false) {}
foo('arg1' => 42, 'arg2' => 10);
foo('arg1' => 42, 'arg2' => 10, 'arg3' => true);

which is syntactic sugar for:

function foo(array $args) {
    $args['arg2'] = isset($args['arg2']) ? $args['arg2'] : 0;
    ...
}
foo(array('arg1' => 42, 'arg2' => 10, 'arg3' => true));

and would make passing maps as an argument simpler.
My solution would be refactoring the method a bit and passing an Argument object as a parameter. An object is particularly useful when the number of parameters is not fixed and there are many defaults.
Moreover, a separate object decouples the handling of default values from code execution. I think that passing a high number of parameters is often a smell, since it's likely that a subset of these parameters will always be passed together, or that one of them is repeatedly passed to different methods.
Let's compare passing normal arguments, passing a map (with or without sweet syntax, since it is functionally equivalent) and using an Argument object.

Hypothetical change: adding a parameter.
Formal parameters: you have to modify the method signature and review the Client classes or add a default.
Map: you add a default for the parameter, if applicable.
Argument object: you can add the parameter to the Argument object if it fits; in other cases adding to the signature is a better choice.
Hypothetical change: deleting a parameter.
Formal parameters: you have to fix all the method calls.
Map: the method signature does not change.
Argument object: the method signature does not change.
Hypothetical change: adding another method with the same signature.
Formal parameters: duplicate and maintain the two signatures.
Map: duplicate the signature and maintain both the old and the new one, with the same default values.
Argument object: duplicate the one-parameter, short signature.
Hypothetical change: the default value for some parameters now has to be picked up from a configuration or a database.
Formal parameters: you have to add a collaborator in the Server class, and an argument to its constructor or a setter to inject it.
Map: you can tweak the definition but the signature remains the same. The collaborator problem is still present.
Argument object: create it with a factory that has a database-backed repository or a configuration parser as collaborator.
Hypothetical change: a method parameter has to be validated, lazily calculated, and so on.
Formal parameters: no chance other than computing the value in the Clients. Maybe currying the method would work but it is a change of signature anyway.
Map: you have to compute the value internally, by adding collaborators and coupling, or externally, changing the Clients.
Argument object: the object or the factory that creates it effectively hides the parameter preprocessing or derivation, so no problem. Sometimes the Argument object itself can perform the calculation alone, or you can swap different subclasses of it. Endless possibilities.
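
The last two hypothetical changes can be sketched as follows. This is only an illustration under my own assumptions: the PriceOptions and PriceOptionsFactory classes, the 'globalDiscount' configuration key and the validation rule are hypothetical names and choices, and the configuration is a plain array standing in for a parsed file or a database-backed repository.

```php
<?php
/**
 * Hypothetical Argument object: it groups the discount parameters
 * and validates them, so neither Clients nor the Server have to.
 */
class PriceOptions
{
    private $_itemDiscount;
    private $_globalDiscount;

    public function __construct($itemDiscount, $globalDiscount)
    {
        foreach (array($itemDiscount, $globalDiscount) as $discount) {
            if ($discount < 0 || $discount > 100) {
                throw new InvalidArgumentException('A discount is a percentage.');
            }
        }
        $this->_itemDiscount = $itemDiscount;
        $this->_globalDiscount = $globalDiscount;
    }

    public function getMultiplier()
    {
        return (1 - $this->_itemDiscount / 100)
             * (1 - $this->_globalDiscount / 100);
    }
}

/**
 * Hypothetical factory: the default for the global discount comes
 * from configuration instead of a literal in a method signature.
 */
class PriceOptionsFactory
{
    private $_config;

    public function __construct(array $config)
    {
        $this->_config = $config;
    }

    public function create($itemDiscount = 0)
    {
        $globalDiscount = isset($this->_config['globalDiscount'])
                        ? $this->_config['globalDiscount'] : 0;
        return new PriceOptions($itemDiscount, $globalDiscount);
    }
}

class Shop
{
    // The signature stays the same whatever happens to the defaults.
    public function getPrice($regularPrice, PriceOptions $options)
    {
        return $regularPrice * $options->getMultiplier();
    }
}

$optionsFactory = new PriceOptionsFactory(array('globalDiscount' => 20));
$shop = new Shop();
printf("Price: %.2f\n", $shop->getPrice(200.00, $optionsFactory->create(10)));
```

Here the Shop class never learns where the global discount comes from, and an out-of-range discount is rejected before a price is computed.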

Note that the higher the number of parameters, the more likely they are to change in computation, number and default value. That's why a method with 1 or 2 parameters is usually not an issue. An Argument object should comprise parameters that change together. This object's responsibility is to decouple as much as possible the Server class from a [large] set of parameters.
Furthermore, compared to a literal default value, an Argument object handles defaults with the power that a getter has over a public property. You can hook pretty much everything into the creation and the methods of an Argument object, protecting Clients and Servers from having to import the default argument from foreign packages, libraries and applications.
Finally, an Argument object captures a domain concept with a name, a protocol and a behavior, while a map is only a plain data structure. You can tell what a map contains, but you can't tell what a map does.
The code sample deals with the calculation of the final price of an item, with three example designs for a Shop class.

<?php
class ShopWithFormalParameters
{
    /**
     * We use float for simplicity but a decimal structure would be
     * the correct choice.
     * @param float $regularPrice    price of the item
     * @param float $itemDiscount    discount on this item (percentage)
     * @param float $globalDiscount  discount of the shop during
     *                               sales period (percentage)
     * @param float $bonus           value of bonus coupons
     *                               used by the customer
     * @return float
     */
    public function getPrice($regularPrice,
                             $itemDiscount = 0,
                             $globalDiscount = 0,
                             $bonus = 0)
    {
        return $regularPrice
             * (1 - $itemDiscount / 100)
             * (1 - $globalDiscount / 100)
             - $bonus;
    }
}

$formalShop = new ShopWithFormalParameters();
printf("Price 1st item: %.2f\n", $formalShop->getPrice(200.00, 10, 20));
printf("Price 2nd item: %.2f\n", $formalShop->getPrice(200.00, 0, 20, 5.00));

class ShopWithMap
{
    /**
     * Php syntax does not allow passing a map easily, but
     * this signature is equivalent.
     * @param array $args  @see ShopWithFormalParameters::getPrice()
     *                     for keys
     * @return float
     */
    public function getPrice(array $args)
    {
        // similarly, the defaults would fit in the signature
        $args['itemDiscount'] = isset($args['itemDiscount']) ?
                                $args['itemDiscount'] : 0;
        $args['globalDiscount'] = isset($args['globalDiscount']) ?
                                  $args['globalDiscount'] : 0;
        $args['bonus'] = isset($args['bonus']) ? $args['bonus'] : 0;

        return $args['regularPrice']
             * (1 - $args['itemDiscount'] / 100)
             * (1 - $args['globalDiscount'] / 100)
             - $args['bonus'];
    }
}

$mapShop = new ShopWithMap();
printf("Price 1st item: %.2f\n", $mapShop->getPrice(array(
    'regularPrice' => 200.00,
    'itemDiscount' => 10,
    'globalDiscount' => 20
)));
// note that defaults do not have to be specified, even if they
// precede actual parameters in the "signature"
printf("Price 2nd item: %.2f\n", $mapShop->getPrice(array(
    'regularPrice' => 200.00,
    'globalDiscount' => 20,
    'bonus' => 5.00
)));

// my take: refactoring towards a loosely coupled solution
class Item
{
    private $_regularPrice;
    private $_discount;

    /**
     * A couple of arguments always change together,
     * thus it makes sense to put them in the same argument object.
     * The item discount is totally encapsulated here,
     * so the new domain concept has been fairly useful:
     * now the ObjectOrientedShop does not even know that
     * an item-specific discount exists.
     */
    public function __construct($regularPrice, $discount = 0)
    {
        $this->_regularPrice = $regularPrice;
        $this->_discount = $discount;
    }

    /**
     * @return float
     */
    public function getPrice()
    {
        return $this->_regularPrice * (1 - $this->_discount / 100);
    }
}

class ObjectOrientedShop
{
    private $_globalDiscount;

    /**
     * One argument changes rarely, or never.
     * So a setter or a constructor option can take care of it.
     */
    public function __construct($globalDiscount = 0)
    {
        $this->_globalDiscount = $globalDiscount;
    }

    /**
     * Only two arguments, which makes introducing a map useless.
     * We are protected from changes that affect the Item:
     * an added argument which is cohesive with Item should be
     * kept in the Item class. Global options should be kept in the
     * constructor or in a field reference anyway.
     * Further per-call parameters, if necessary, would be
     * wrapped together with $bonus in a Sale object which can
     * come from any external service.
     * @param Item      we don't need docblocks to tell what Item is
     * @param float     bonus coupons value
     * @return float
     */
    public function getPrice(Item $item, $bonus = 0)
    {
        return $item->getPrice()
             * (1 - $this->_globalDiscount / 100) - $bonus;
    }
}

$ooShop = new ObjectOrientedShop(20);
printf("Price 1st item: %.2f\n", $ooShop->getPrice(new Item(200.00, 10)));
printf("Price 2nd item: %.2f\n", $ooShop->getPrice(new Item(200.00), 5.00));

Thursday, February 18, 2010

Zend Framework 1.8 Web Application Development

I have started reading the book Zend Framework 1.8 Web Application Development, by Keith Pope (affiliate link).

At first impression it seems a thorough book, which covers testing, authorization, authentication and the various layers of a php application. The review will be available in the coming weeks.

Practical Php Patterns: Interpreter

This post is part of the Practical Php Pattern series.

The behavioral pattern of today is the Interpreter pattern, which consists in a representation of a grammar with a Composite class hierarchy, where rules are mapped to classes. An expression that follows the grammar can thus be translated to an abstract syntax tree, which is nothing else than an object graph, instance of the Composite pattern.
The adjective abstract was chosen for the tree because it is in fact the most abstract representation of an expression, ignoring the concrete ones it may have as a string or as another data structure (e.g. in php "A" and "\x41" are different concrete representations of the same abstract literal). The resulting decoupling of logic rules from presentation concerns greatly simplifies the interpretation process.
Interpreter is not a very common pattern, but for simple grammars it makes adding new rules as easy as adding a class. It does not address the transformation from the concrete representation, which is done by other services, to the abstract syntax tree.

Terminology
The point of this pattern is leveraging the Composite hierarchy for a simple implementation of the AbstractExpression methods (Interpreter operations).
The arguments of the Interpreter operations are usually collectively called context. Depending on the method signature, they are usually values to substitute in calculations or parameters for the operations, or they may not exist at all for certain operations.
Similarly, the Leaf and Container participants of the Composite pattern assume different names when an Interpreter is involved. These names reflect the roles they play: terminal and nonterminal expression.

Participants:

Client: uses the Interpret operations.
AbstractExpression: abstraction over an expression tree.
NonTerminalExpression: expression which recursively contains other AbstractExpression instances.
TerminalExpression: expression which cannot be decomposed further into more than one object.

The GoF book has an extended example of this pattern; I am going to revisit it by using mathematical expressions instead of boolean ones.
Thus the example addresses the representation of a mathematical expression, and the separation of its evaluate() operation concerns in the different ConcreteExpression classes.

<?php
/**
 * AbstractExpression. All implementations of this interface
 * are ConcreteExpressions.
 */
interface MathExpression
{
    /**
     * Calculates the value assumed by the expression.
     * Note that $values is passed to all expressions but it
     * is used by Variable only. This is required to abstract
     * away the tree structure.
     */
    public function evaluate(array $values);
}

/**
 * A terminal expression which is a literal value.
 */
class Literal implements MathExpression
{
    private $_value;

    public function __construct($value)
    {
        $this->_value = $value;
    }

    public function evaluate(array $values)
    {
        return $this->_value;
    }
}

/**
 * A terminal expression which represents a variable.
 */
class Variable implements MathExpression
{
    private $_letter;

    public function __construct($letter)
    {
        $this->_letter = $letter;
    }

    public function evaluate(array $values)
    {
        return $values[$this->_letter];
    }
}

/**
 * Nonterminal expression.
 */
class Sum implements MathExpression
{
    private $_a;
    private $_b;

    public function __construct(MathExpression $a, MathExpression $b)
    {
        $this->_a = $a;
        $this->_b = $b;
    }

    public function evaluate(array $values)
    {
        return $this->_a->evaluate($values) + $this->_b->evaluate($values);
    }
}

/**
 * Nonterminal expression.
 */
class Product implements MathExpression
{
    private $_a;
    private $_b;

    public function __construct(MathExpression $a, MathExpression $b)
    {
        $this->_a = $a;
        $this->_b = $b;
    }

    public function evaluate(array $values)
    {
        return $this->_a->evaluate($values) * $this->_b->evaluate($values);
    }
}

// 10(a + 3)
$expression = new Product(new Literal(10), new Sum(new Variable('a'), new Literal(3)));
echo $expression->evaluate(array('a' => 4)), "\n";
// adding new rules to the grammar is easy:
// e.g. Power, Subtraction...
// thanks to the Composite, manipulation is even simpler:
// we could add substitute($letter, MathExpression $expr)
// to the interface...
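
As a hedged sketch of the last comments, here is how two new rules could be added to the grammar. Subtraction and Power are hypothetical classes that are not part of the original example; the interface and the two terminals are repeated in compact form only so that the snippet runs on its own.

```php
<?php
// Compact copies of the interface and terminals from the example above,
// repeated here only to keep this sketch self-contained.
interface MathExpression
{
    public function evaluate(array $values);
}

class Literal implements MathExpression
{
    private $_value;
    public function __construct($value) { $this->_value = $value; }
    public function evaluate(array $values) { return $this->_value; }
}

class Variable implements MathExpression
{
    private $_letter;
    public function __construct($letter) { $this->_letter = $letter; }
    public function evaluate(array $values) { return $values[$this->_letter]; }
}

/**
 * Hypothetical new nonterminal rule: a - b.
 */
class Subtraction implements MathExpression
{
    private $_a;
    private $_b;

    public function __construct(MathExpression $a, MathExpression $b)
    {
        $this->_a = $a;
        $this->_b = $b;
    }

    public function evaluate(array $values)
    {
        return $this->_a->evaluate($values) - $this->_b->evaluate($values);
    }
}

/**
 * Hypothetical new nonterminal rule: base raised to exponent.
 */
class Power implements MathExpression
{
    private $_base;
    private $_exponent;

    public function __construct(MathExpression $base, MathExpression $exponent)
    {
        $this->_base = $base;
        $this->_exponent = $exponent;
    }

    public function evaluate(array $values)
    {
        return pow($this->_base->evaluate($values),
                   $this->_exponent->evaluate($values));
    }
}

// (a - 1)^2
$expression = new Power(
    new Subtraction(new Variable('a'), new Literal(1)),
    new Literal(2)
);
echo $expression->evaluate(array('a' => 4)), "\n"; // prints 9
```

No existing class had to change: each new rule is just another implementation of MathExpression.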

Wednesday, February 17, 2010

Stop writing foreach() cycles

At least in php. How many for() or foreach() have you written today? I bet a lot. Php 5.3 has a solution that will reduce the average number of iteration structures you need to write: closures applied by plain old array functions.
There are some array functions which have been supported at least since Php 4, and that take as an argument a callback whose formal parameters are one or two elements of the array. I'm talking about array_map(), array_reduce(), array_filter() and uasort() (or a similar custom sorting function.) These functions abstract away a foreach() cycle by applying a particular computation to all the elements of an array.
Back in Php 4 and Php 5.2, specifying a callback was cumbersome: you had to define an external function and then pass its name as a string; or pass an array containing an object or the class name, plus the method name, in the case of a public (possibly static) method.
In Php 5.3, callbacks may also be specified as anonymous functions, defined in the middle of other code. These closures are first-class citizens, and are treated as you would treat a variable, for example by passing them around as method parameters. While I am not a fan of mixing up object-oriented and functional programming, closures can be a time saver which captures very well the intent of low-level processing code, avoiding the foreach() noise.
If you bear with me for a moment, I will show you with working code how you can avoid writing most of your foreach() cycles.

<?php
// obviously only the prime numbers less than 20
$primeNumbers = array(2, 3, 5, 7, 11, 13, 17, 19);

// array_map() applies a function to every element of an array,
// returning the result
$square = function($number) {
    return $number * $number;
};
$squared = array_map($square, $primeNumbers);
echo "The squares of those prime numbers are: ",
     implode(', ', $squared), "\n";

// array_reduce() applies a function iteratively to the result
// accumulated so far and the next element, reducing the array
// to a single value.
// there is the native array_sum(), but the application of
// a custom function is the interesting part
$sum = function($a, $b) {
    return $a + $b;
};
$total = array_reduce($primeNumbers, $sum);
echo "The sum of those prime numbers is ", $total, ".\n";

// array_filter() produces an array containing
// the elements that satisfy the given predicate
$even = function($number) {
    return $number % 2 == 0;
};
$evenPrimes = array_filter($primeNumbers, $even);
echo "The even prime numbers are: ",
     implode(', ', $evenPrimes), ".\n";

// uasort() customizes the sorting by value,
// maintaining the association with keys
// there is the native asort(), but again the customization
// of the function is more interesting
$compare = function($a, $b) {
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1;
};
uasort($primeNumbers, $compare);
echo "The given numbers in descending order are: ",
     implode(', ', $primeNumbers), ".\n";
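
One more sketch, since the functions above never actually close over anything: the use keyword lets an anonymous function capture a variable from the enclosing scope. The $greaterThan helper is a hypothetical name introduced only for this illustration.

```php
<?php
// Hypothetical helper: builds a predicate that remembers $limit
// thanks to the use clause.
$greaterThan = function($limit) {
    return function($number) use ($limit) {
        return $number > $limit;
    };
};

$primeNumbers = array(2, 3, 5, 7, 11, 13, 17, 19);

// array_filter() preserves the original keys,
// so array_values() is used to reindex the result
$largePrimes = array_values(array_filter($primeNumbers, $greaterThan(10)));
echo "The prime numbers greater than 10 are: ",
     implode(', ', $largePrimes), ".\n";
```

The same predicate factory can be reused with a different limit, which would otherwise require another hand-written foreach() with a hardcoded condition.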

Tuesday, February 16, 2010

Testing protected members

This is a follow-up to Testing private members.

In the previous post, we discussed how to make sure that code in a private method is tested without accessing it directly, since that would imply the use of reflection and a violation of the class contract (its public methods or its interface.)
Following the same procedure, let's find out why we end up with protected methods in our classes.
For starters, we should indeed write a method only if really necessary, to keep production code as simple and short as possible. This requirement is usually made explicit in a [unit] test and thereby automatically checked. Of course only public methods are directly called, while protected and private ones are called by public methods we thoroughly test (otherwise they would be simply deleted as unused code).
However, there is a way a protected method can be reached from client code that a private one cannot: inheritance. Availability through inheritance is in fact the trait that distinguishes protected class members from private ones.
Thus, there are two possible situations:

the method is tested because at least a public method calls it. Given that the method logic does not warrant an external class, this arrangement would be fine.
the method is intended to be called by Client subclasses, so it is not tested.

The latter case is the tricky one and there are two solutions I propose:

write a simple subclass in a test case's source file and instantiate that class to test the protected method;
refactor the method with a public signature on a collaborator class that will be passed to the constructor of the Client, effectively favoring object composition over an inheritance-based Api.

The second solution is by far the best one if the protected method contains real logic.
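
Both solutions can be sketched as follows. All the class names (Report, ReportTestDouble, AmountFormatter, ComposedReport) are hypothetical and chosen only for illustration; a real test case would use PHPUnit assertions instead of echo.

```php
<?php
// Hypothetical class whose protected method is meant for subclasses.
abstract class Report
{
    protected function formatAmount($amount)
    {
        return number_format($amount, 2, '.', ',');
    }
}

// Solution 1: a trivial subclass defined in the test case's source file,
// which exposes the protected method through a public one.
class ReportTestDouble extends Report
{
    public function callFormatAmount($amount)
    {
        return $this->formatAmount($amount);
    }
}

// Solution 2: the logic moves to a collaborator with a public method...
class AmountFormatter
{
    public function format($amount)
    {
        return number_format($amount, 2, '.', ',');
    }
}

// ...which is passed to the constructor instead of being inherited.
class ComposedReport
{
    private $_formatter;

    public function __construct(AmountFormatter $formatter)
    {
        $this->_formatter = $formatter;
    }

    public function line($label, $amount)
    {
        return $label . ': ' . $this->_formatter->format($amount);
    }
}

$double = new ReportTestDouble();
echo $double->callFormatAmount(1234.5), "\n";   // prints 1,234.50

$report = new ComposedReport(new AmountFormatter());
echo $report->line('Total', 1234.5), "\n";      // prints Total: 1,234.50
```

In the second arrangement AmountFormatter can be tested directly through its public method, with no subclass involved.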
Providing functionality with inheritance seems useful by a raw count of the lines of code saved, but it really becomes a hassle as an application grows. First, in most languages only one parent is allowed for a class, and this restriction leads to long hierarchies where one never knows where a method is defined and overridden. For example, in Java a JFrame is a Frame, which is a Window, which is a Container, which is a Component, which is an Object.
Second, it leaves too many responsibilities on a Client class, which cannot select what members to inherit and, even if it could, would not be able to discern the dependencies between different methods. Too much inheritance bloats the Api of the subclasses (the aforementioned JFrame has more than 300 public methods, and only 30 are not inherited) and that of the parent classes, if they try to accommodate uncohesive features.
Finally, inheritance forces a dependency on a concrete class. For example, historically Object-Relational Mappers implemented an Active Record pattern. Now they are moving to a less invasive approach based on Plain Old Java/Php/.Net objects, without forcing end users to inherit from their abstract classes and allowing them to play with objects without referencing a database.
With a moderate use of inheritance, there are really few good reasons to test a protected method per se. I hope from now on you will take a look at composition and dependency injection, which are dynamic dependencies, instead of always setting up static links between classes such as the extends keyword.

Monday, February 15, 2010

Repositories in Doctrine

Jebb wrote to me asking information about the Repository pattern:

I read your blog at http://giorgiosironi.blogspot.com/ and I found it very helpful. I develop applications using DDD and Zend Framework however for persistence ignorance, I use some basic Repository as interface for DAOs. May I ask you how to implement Doctrine ORM in the Repository?

As I explain in depth in my previous post about this pattern, a Repository is an illusion of an in-memory collection of all the persistence-ignorant entities of the same class. A Repository may be abstracted behind an interface defining only the mandatory methods that make sense in the domain model.
How does an Orm fit in this discussion? Very few repositories accomplish their tasks alone: most of them compose an underlying generic Data Mapper layer, which is usually an Object-Relational Mapper that translates classes and objects into tables and rows. A generic Data Mapper such as Doctrine 2 provides every imaginable operation on the collection of entities, and the Repository abstraction shields the other parts of the application from knowing anything about an Orm. This is the primary advantage of a Repository.

As you may have guessed from the reference to persistence ignorance, with Doctrine 1 there is no chance to implement a real Repository. Doctrine 1 does not provide persistence ignorance because it requires entity classes to extend a base abstract Active Record.
In Doctrine 2, repositories can be implemented as Plain Old Php Objects: simply injecting the EntityManager and other collaborators they may need in the constructor will suffice. A Factory can then encapsulate the new operators.
This is the standard Repository pattern: a Plain Old Php Object which aggregates whatever is needed to hide the persistence mechanism of objects.
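
A minimal sketch of this arrangement follows. To keep it runnable without a database, the DataMapper interface and its in-memory implementation are hypothetical stand-ins for the role the Doctrine 2 EntityManager would play; they are not Doctrine's actual Api, and User, UserRepository and MappedUserRepository are likewise invented for the example.

```php
<?php
// Hypothetical entity: a Plain Old Php Object with no base class.
class User
{
    private $_name;
    private $_active;

    public function __construct($name, $active)
    {
        $this->_name = $name;
        $this->_active = $active;
    }

    public function getName() { return $this->_name; }
    public function isActive() { return $this->_active; }
}

// Hypothetical stand-in for the generic Data Mapper layer
// (NOT the real EntityManager interface).
interface DataMapper
{
    public function findAll($className);
}

// In-memory implementation, present only to make the sketch runnable.
class InMemoryDataMapper implements DataMapper
{
    private $_objects;

    public function __construct(array $objects)
    {
        $this->_objects = $objects;
    }

    public function findAll($className)
    {
        $ofClass = function($object) use ($className) {
            return $object instanceof $className;
        };
        return array_values(array_filter($this->_objects, $ofClass));
    }
}

// Entity-specific interface: only the methods the domain model needs.
interface UserRepository
{
    public function findActive();
}

// The Repository is a POPO which composes the injected mapper.
class MappedUserRepository implements UserRepository
{
    private $_mapper;

    public function __construct(DataMapper $mapper)
    {
        $this->_mapper = $mapper;
    }

    public function findActive()
    {
        $active = function($user) {
            return $user->isActive();
        };
        return array_values(array_filter($this->_mapper->findAll('User'), $active));
    }
}

$mapper = new InMemoryDataMapper(array(
    new User('alice', true),
    new User('bob', false),
));
$repository = new MappedUserRepository($mapper);
$names = array_map(function($user) { return $user->getName(); },
                   $repository->findActive());
echo implode(', ', $names), "\n"; // prints alice
```

Client code depends only on UserRepository, so the mapper behind it can be swapped without touching the rest of the application.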

Doctrine 2 also provides a facility to quickly implement Repositories, but the freedom of implementation of this approach is limited. The process is described in the Doctrine manual and it consists of:

extending an abstract class, which has a protected member $this->_em you can execute queries with;
annotating the entity class with the class name of the concrete repository class;
obtaining an instance with $em->getRepository('EntityClassName'): the EntityManager will create the object and automatically inject itself in the Repository.

Some notes on this architecture:

there is no entity-specific interface;
you extend an abstract class, which may be a constraint because you want only some find*() methods to be available. However testing won't be affected since a concrete Repository will always involve a persistence mechanism.
it is useful for letting different repositories reference each other via the EntityManager; but you may inject them manually instead, keeping the original solution.
Doctrine 2's default Repository class is instantiated if you do not specify a subclass; if you use a lot of repositories and inject some of them in other service classes, the default implementation will be very handy. This would be a point in favor of taking advantage of the Orm support instead of using POPOs.

Friday, February 12, 2010

Tuning your Ubuntu machine with command line-fu

Ubuntu is in my opinion the leading Linux distribution because of its extended support for peripherals and large software repositories. However, the default installation profile can be a bit heavy since it is designed as a catch-all configuration for a general purpose system.
Thus many software developers need to lighten the system load and reduce the disk space occupied by the distribution, along with the Ram eaten by daemons. This guide also works with Debian boxes since it is based on the dpkg and apt packaging system and on standard unix tools.

Let's open the hood and start with an example of unneeded packages. If you search for packages whose name matches ttf*:

dpkg -l 'ttf*' | grep '^ii'

you'll see the list of font packages installed (the pattern is quoted so that the shell does not expand it against files in the current directory). This collection includes Japanese, Korean, Gothic fonts, and so on up to the Futurama Alien Alphabet. You may want to remove some of them if you do not read the languages which are written with these fonts.
Note that sometimes you will be asked to remove a package with a seemingly important name, such as ubuntu-standard. These metapackages are simply empty debs that depend on a large set of normal packages; they are meant as a shortcut for installing all those packages in a single shot: the Ubuntu installer simply requests the installation of ubuntu-standard and the apt system works out the details.

Let's try a more radical way to find space to free on the distribution partition:

dpkg-query --show --showformat='${Package}\t${Installed-Size} ${Status}\n' \
    | grep -v deinstall | sort -k 2 -n \
    | awk '{printf "%.1f MB \t %s\n", $2/(1024), $1}' \
    | tail -n 20

When I say that Unix tools are beautiful, now you'll know why.
This chain of commands will show you the 20 packages with the highest installed size, so that you can remove them with sudo apt-get remove. For example, I removed ubuntu-docs (hundreds of megabytes) and the openoffice Australian thesaurus (I'm not very keen on searching "synonyms" in the other hemisphere.)

You may want to install some very small tools to simplify the management of your system:

bum is a tool to enable and disable services at the bootstrap (this is actually a graphical tool, if you want you can play with /etc/rc2.d/);
deborphan is a command line utility which lists packages that no other .deb depends on. deborphan | grep lib shows the list of libraries that are not used and so can be removed.

Remember to execute these commands periodically:

sudo apt-get autoremove removes packages installed to satisfy dependencies that are now useless. For instance if you install the vlc video player and it requires also 100 MB of Qt libraries, if you subsequently remove vlc the libraries are still there. autoremove will delete such packages: if you want to keep some of them, sudo apt-get install package-name will set package-name to manually installed, making you free to execute autoremove on the remaining packages.
sudo apt-get clean deletes cached *.deb files which have been downloaded in the past. After a successful installation, there is no need to keep them around.

I hope these tips can help you shrink your distro's impact on machine resources. I have a 2 GB root partition on my EeePC, and in my case it is very important to remove packages installed by default that do not have a real use.

Thursday, February 11, 2010

Practical Php Patterns: Command

This post is the second one in the behavioral patterns part of the Practical Php Pattern series.

The behavioral pattern we will discuss today is the Command one, a mechanism for encapsulation of a generic operation.
If you are familiar with C or Php, you have probably already encountered Command as its procedural equivalent: the callback, which is usually implemented as a function pointer or a data structure such as a string or an array in php.
Command is an abstraction over a method call, which becomes a first-class object with all the benefits of object orientation over a set of routines: composition, inheritance and handling.
For example, the GoF book proposes to use Commands to store a chain of user actions and to support undo and redo operations.
Note that php 5.3 functional programming capabilities (Closures) can be used as a native implementation of the Command pattern. However, there is an advantage in type safety in using an abstract data type for every Command hierarchy.

In this pattern, the Invoker knows that a Command is passed to it, without dependencies on the actual ConcreteCommand implementation. The solved problem is the association of method calls by configuration: for instance ui controls like buttons and menus refer to a Command and assume their behavior by composing a generic ConcreteCommand instance.

Participants:

Command: defines an abstraction over a method call.
ConcreteCommand: implementation of an operation.
Invoker: refers to Command instances as its available operations.

The code sample provides Validator components implemented as Command objects.

<?php
/**
 * The Command abstraction.
 * In this case the implementation must return a result;
 * in other cases a Command may only have side effects.
 */
interface Validator
{
    /**
     * The method could have any parameters.
     * @param mixed
     * @return boolean
     */
    public function isValid($value);
}

/**
 * ConcreteCommand.
 */
class MoreThanZeroValidator implements Validator
{
    public function isValid($value)
    {
        return $value > 0;
    }
}

/**
 * ConcreteCommand.
 */
class EvenValidator implements Validator
{
    public function isValid($value)
    {
        return $value % 2 == 0;
    }
}

/**
 * The Invoker. An implementation could store more than one
 * Validator if needed.
 */
class ArrayProcessor
{
    protected $_rule;

    public function __construct(Validator $rule)
    {
        $this->_rule = $rule;
    }

    public function process(array $numbers)
    {
        foreach ($numbers as $n) {
            if ($this->_rule->isValid($n)) {
                echo $n, "\n";
            }
        }
    }
}

// Client code
$processor = new ArrayProcessor(new EvenValidator());
$processor->process(array(1, 20, 18, 5, 0, 31, 42));

Some implementation notes for this pattern:

some of the parameters for the method call can be provided at the construction of the ConcreteCommand, effectively currying the original method.
A Command can be considered as a very simple Strategy, with only one method, and the emphasis put on the operation as an object you can pass around.
ConcreteCommands also compose every resource they need to achieve their goal, primarily the Receiver of the action, on which they call methods to execute the Command.
Composite, Decorator and other patterns can be mixed with the Command one, obtaining multiple Commands, decorated Commands and so on.
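
The first and last notes can be sketched together. GreaterThanValidator and AllValidator are hypothetical classes that do not appear in the sample above; the interface and EvenValidator are repeated only so that the snippet runs on its own.

```php
<?php
// Repeated from the sample above to keep this sketch self-contained.
interface Validator
{
    public function isValid($value);
}

class EvenValidator implements Validator
{
    public function isValid($value)
    {
        return $value % 2 == 0;
    }
}

/**
 * Hypothetical ConcreteCommand: the limit is provided at construction,
 * so isValid() keeps the one-parameter signature of the interface.
 */
class GreaterThanValidator implements Validator
{
    private $_limit;

    public function __construct($limit)
    {
        $this->_limit = $limit;
    }

    public function isValid($value)
    {
        return $value > $this->_limit;
    }
}

/**
 * Hypothetical Composite of Commands: a value is valid
 * only if every composed Validator accepts it.
 */
class AllValidator implements Validator
{
    private $_validators;

    public function __construct(array $validators)
    {
        $this->_validators = $validators;
    }

    public function isValid($value)
    {
        foreach ($this->_validators as $validator) {
            if (!$validator->isValid($value)) {
                return false;
            }
        }
        return true;
    }
}

$rule = new AllValidator(array(
    new GreaterThanValidator(10),
    new EvenValidator(),
));
// the object/method pair is the old-style callback mentioned earlier
$valid = array_values(array_filter(
    array(1, 20, 18, 5, 0, 31, 42),
    array($rule, 'isValid')
));
echo implode(', ', $valid), "\n"; // prints 20, 18, 42
```

An Invoker such as ArrayProcessor would accept $rule unchanged, since the composite is still a Validator.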

Wednesday, February 10, 2010

Testing private members

Yesterday, Sebastian Bergmann, the author of PHPUnit, posted his response to a question he is asked frequently, How do I test private members of a class?
He summarizes the ways to access private fields and methods of an object from a PHPUnit test case, and he concludes that being able to test them does not mean it is a good thing. Today I am going to explain why testing private methods explicitly is considered a bad practice.

Why do we limit the scope of a method to private or protected to begin with? To be able to subsequently refactor the code and change the signature or the behavior of the method according to new requirements or to what we have learned about the problem domain.
The same reasoning is even more valid for field references, which cannot encapsulate a computation and are simply stored variables. I haven't created public fields in production code for years.
Thus the first reason is: a test that exercises a private member couples the private member to code that is external to the system under test. This means you cannot safely delete or modify a private method simply by modifying a class source file: you will have to update the test even if the external behavior of the SUT did not change.

Sometimes you find a private method which contains purposeful logic, and you may want to cover it with specific testing. If this is the case, I suggest moving the method to an extracted class whose instance is stored as a private reference. This way you'll be able to test both the extracted class and the original one, maybe even mocking out the extracted collaborator in the latter's unit tests. The abstraction provided by the original class is left untouched.
This solution is particularly useful when testing a private method becomes simpler than testing a public one of the same class; it is the sign that another interface is needed between the class and its private member. You're establishing a contract anyway, by writing a test for it: so why not make this contract explicit and give it citizenship with a class?
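
As a rough sketch of this refactoring, suppose a hypothetical Article class used to hide its slug generation in a private method. Extracting it into a SlugGenerator class (also a hypothetical name) gives the logic a public contract of its own:

```php
<?php
/**
 * Extracted class: the former private method is now
 * a public one, directly testable.
 */
class SlugGenerator
{
    public function generate($title)
    {
        $slug = strtolower(trim($title));
        // every run of characters that are not letters or digits
        // collapses into a single dash
        $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
        return trim($slug, '-');
    }
}

/**
 * The original class keeps its public Api and stores
 * the collaborator as a private reference.
 */
class Article
{
    private $_title;
    private $_slugGenerator;

    public function __construct($title, SlugGenerator $slugGenerator)
    {
        $this->_title = $title;
        $this->_slugGenerator = $slugGenerator;
    }

    public function getUrl()
    {
        return '/articles/' . $this->_slugGenerator->generate($this->_title);
    }
}

$generator = new SlugGenerator();
echo $generator->generate('Testing private members!'), "\n";
// prints testing-private-members

$article = new Article('Testing private members!', $generator);
echo $article->getUrl(), "\n";
// prints /articles/testing-private-members
```

Tests for SlugGenerator can now explore the edge cases of the algorithm, while tests for Article only need to check that the collaborator is used.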
The Single Responsibility Principle is the most abstract and the most overlooked of all object-oriented techniques.

Now let's tackle the same problem in the context of Test-Driven Development. Remember what TDD means: writing code only to satisfy a unit test, while not being allowed to write more code than the lines strictly necessary to get a green bar.
How do you get to write a method if you're doing TDD? There are two possible cases:

the method is written to make a test pass directly; only public methods can serve this purpose if you do not let test code tinker with class internals.
the method is written as a consequence of internal refactoring; but if you're refactoring, there are tests that cover the affected functionality, so the method is tested anyway via public methods.

Thus the original question is similar to How do I test an abstract class? Since an abstract class should only come into existence by extracting a superclass from well-tested concrete classes, it should not be tested directly, just as private methods should not be: both are exercised by definition, through subclasses and public methods respectively. If they weren't exercised, we would have simply thrown them away as unreachable code. If they need more exercising code (maybe they are not in shape :), they should declare a public contract and reside on an external class.
The moral of this analysis is: Test-Driven Development once again saves the day and makes sure our code is covered; unit testing becomes really hard only if you do not write tests first.

Tuesday, February 09, 2010

Practical Php Patterns: Chain of Responsibility

This post starts the behavioral patterns part of the Practical Php Pattern series.

The first behavioral pattern of this part of the series is named Chain of Responsibility. Its intent is organizing a chain of objects to handle a request such as a method call.
When a ConcreteHandler does not know how to satisfy a request from a Client, or it is not designed at all to do so, it delegates to the next Handler in the chain, which it maintains a reference to.
This pattern is often used in conjunction with Composite, where some Leaf or Container objects delegate an operation to their parent by default. As another example, localization is often handled with a Chain of Responsibility: when the German translation adapter does not find a term for a translation key, it falls back to the English language adapter or even to displaying the key itself.

The coupling is reduced to a minimum: the Client class does not know which concrete class handles the request; the chain is configured at the creation of the object graph, at runtime, and ConcreteHandlers do not know which object is their successor in the chain. The behavior is successfully distributed between the objects, where the nearest object in the chain has priority on stepping in and assuming the responsibility to satisfy the request.
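
The localization example mentioned above can be sketched like this. Translator, DictionaryTranslator and KeyTranslator are hypothetical names made up for the illustration, not the Api of any real translation component.

```php
<?php
// Handler: every element of the chain answers the same request.
interface Translator
{
    public function translate($key);
}

/**
 * End of the chain: when nobody knows the term,
 * the key itself is displayed.
 */
class KeyTranslator implements Translator
{
    public function translate($key)
    {
        return $key;
    }
}

/**
 * ConcreteHandler: answers if it knows the term, otherwise delegates
 * to its successor, which it sees only as a generic Translator.
 */
class DictionaryTranslator implements Translator
{
    private $_terms;
    private $_next;

    public function __construct(array $terms, Translator $next)
    {
        $this->_terms = $terms;
        $this->_next = $next;
    }

    public function translate($key)
    {
        if (isset($this->_terms[$key])) {
            return $this->_terms[$key];
        }
        return $this->_next->translate($key);
    }
}

// The chain is configured while building the object graph:
// German -> English -> the key itself
$english = new DictionaryTranslator(
    array('greeting' => 'Hello', 'farewell' => 'Goodbye'),
    new KeyTranslator()
);
$german = new DictionaryTranslator(array('greeting' => 'Hallo'), $english);

// Client code only knows it is talking to a Translator
echo $german->translate('greeting'), "\n"; // prints Hallo
echo $german->translate('farewell'), "\n"; // prints Goodbye
echo $german->translate('title'), "\n";    // prints title
```

The nearest handler wins: the German adapter answers when it can, and each fallback is tried only when the previous object in the chain gives up.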