Invisible to the eye: March 2010

Wednesday, March 31, 2010

India, and the Great Indian Developer Summit

At least according to Google Analytics, if we leave out my Western audience for a moment, I have a significant number of readers from India. One of the advantages of the Internet era is that it lets us communicate with distant parts of the world on a daily basis. Here in Italy, India is famous for its history as one of the most important centers of commerce in the world, and for its varied cultures and religions.
But India also has a somewhat controversial reputation in the Western software niche: some think it is a rapidly growing country after the economic liberalisation and that Indians will soon take all our jobs, while many other people imply that India is the ideal target for low-quality work which you don't want to pay much for.
For example, the Pragmatic Bookshelf used to sell a book named My Job Went to India: 52 Ways to Save Your Job. It is a very good book and its goal is to improve your career path. That said, it has nothing to do with India: the title was chosen to be shocking because of the outsourcing hype at the time. It has subsequently been renamed to The Passionate Programmer.

While it is true that the cost of living in India (at least in a large part of the country) is lower than in Western countries, I do not believe the fable that there are only terrible developers there. India is a large country, and even if the statistical distribution of educated developers were the same as in Europe and the United States, it would be normal to encounter a vast number of mediocre developers. We encounter them every day in our own cities.
If you are an Indian developer, the fact that you're reading here suggests that the tail of the curve also contains quality-aware, conscientious programmers, reflecting the overall situation of every developed country. I think here in Italy we have a percentage of simply bad developers in the web engineering field at least equal to India's, given all the people who jumped on the bandwagon of the information era.

By the way, a reader pitched me an Indian event for programmers. I hope, if you are an Indian reader, it interests you and I have not wasted your time.
This event is the Great Indian Developer Summit, maybe the biggest conference for Indian software developers. It will be held in Bangalore, from April 20 to 23. The majority of the Indian visits to this blog come from Bangalore, more than from other important cities like Calcutta. Is Bangalore a technological centre? For comparison, the majority of Italian visits come from Milan, while the peak of US visits is from San Francisco.
The conference is composed of 80 sessions, divided across the four days by technology or programming language: .NET, web-related, Java, and workshops.
As for the topics, I am not an expert in .NET technologies, but the ones covered are ASP.NET, SQL Server 2008, Visual Basic 2010, C#, Azure and Silverlight, among others. For the web day, we have Rich Internet Applications, Ajax libraries (Dojo, jQuery) vs. Flash, HTML5, and frameworks such as Ruby on Rails and the Python-powered Django.
The third day is dedicated to Java: the talks are mainly about frameworks (Spring, Struts, GWT, Wicket) and alternative languages that compile to bytecode for the JVM (Scala, Groovy, JRuby). The fourth day comprises workshops on Java, Cloud Computing and rich applications, Agile development, and Microsoft technologies. A free Internet connection is also provided during all four days.
A conference provides many learning and networking opportunities, especially in a big city like Bangalore. I'm only a beginner when it comes to these events, but I can see how a full immersion in this environment can benefit an average web developer.

Original image of the Taj Mahal from Wikimedia Commons.

Tuesday, March 30, 2010

Always code as...

In programming there's an old saying that goes like this:

Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. -- attributed to John Woods

I must say he's right. Fortunately there are usually no psychopaths hanging around your legacy applications, but it's a good karma rule to adhere to. And you may end up maintaining your own code, which is an outcome I hope you have considered. :)
I'd like to extend this maxim with some other suggestions similar in form.

Always code as if you were pair programming (mental/social metaphor)
When we are pair programming, we have to explain what we are doing. We get into a mindset that challenges our code to make it better. We never get angry at the machine or at a library's developers.
A benefit of pair programming is having a different mind that works over the same problem, so that there is a failsafe in place to stop bugs or technical debt from being introduced. But the habit of trying to explain, or at least state, a problem before solving it is a productivity booster, even if you only talk to a rubber duck. This habit is enforced by TDD too.

Always code as if you were paying your lines' weight in gold (financial metaphor)
The less code you write to solve a problem, the less code you'll have to maintain: code is widely considered a liability more than an asset (a high-level programming language is crucial here.) Moreover, there is a limit to the lines of code you can write every day while maintaining an acceptable level of quality.
You should favor verbosity only to improve readability and encapsulation: the trade-off is difficult to find here, but introducing new domain concepts as classes or methods is often a valuable asset that balances the code's hindrance, as long as they are significant for the application (e.g. customer's credit card number vs. customer's eye color.)
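As a rough illustration of a domain concept that earns its lines, here is a minimal sketch of a credit card number modelled as its own class. The class name, the 13-to-19-digit rule and the masking behavior are my assumptions for the example, not a complete validation scheme.

```php
<?php
// Hypothetical value object: name, length rule and masking are illustrative assumptions.
final class CreditCardNumber
{
    private $digits;

    public function __construct(string $number)
    {
        // Strip the spaces and dashes that users commonly type.
        $digits = preg_replace('/[\s\-]+/', '', $number);
        if (!preg_match('/^\d{13,19}$/', $digits)) {
            throw new InvalidArgumentException('A card number must have 13 to 19 digits.');
        }
        $this->digits = $digits;
    }

    // Shows only the last four digits, replacing the others with asterisks.
    public function masked(): string
    {
        return str_repeat('*', strlen($this->digits) - 4) . substr($this->digits, -4);
    }
}

$card = new CreditCardNumber('4111 1111 1111 1111');
echo $card->masked(), "\n";
```

The few extra lines buy one place where the validation and display rules live, instead of scattering them wherever a raw string is passed around.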

Always code as if you had to deploy and use your application at the end of the day
Which may be the case if it's a web application.
Portability is not a feature you can add as a single user story: the best way to make an application portable, configurable, deployable and most of all working is to build it as simply as possible with these characteristics (a walking skeleton), and keep them while you expand the codebase with new features.
This practice reduces risk (chances that you may find your ideal deployment requirements cannot be met) and helps early automation of the most boring tasks, defining them clearly as soon as possible. Once the technicalities have been removed and a clean environment is ready, you will be free to work on the pure domain model.

Monday, March 29, 2010

The TDD checklist (Red-Green-Refactor in detail)

I have written up a checklist to use for unit-level Test-Driven Development, to make sure I do not skip steps while writing code, at a very low level of the development process. Ideally I will soon internalize this process to the point that I will recognize smells as soon as they show up the first time.
This checklist is also applicable to the outer cycle of Acceptance TDD, but the Green part becomes much longer and it includes writing other tests. Ignore this paragraph if it confuses you.

TDD is described by a basic red-green-refactor cycle, constantly repeated to add new features or fix bugs. I do not want to go too deep into object-oriented design in this post, as you may prefer different techniques than I do, so I will insist on the best practices to apply as soon as possible in the development of tests and production code. The checklist is written in the form of questions we should ask ourselves while going through the different phases, and that are often overlooked because of the perceived simplicity of this cycle.

Red
The development of every new feature should start with a failing test.

Have you checked the code into your remote or local repository? In case the code breaks, a revert is faster than a rewrite.
Have you already written some production code? If so, comment it out or (better) delete it, so that you are not implicitly tied to an Api while writing the test.
Have you chosen the right unit to expand? The modified class should be the one that remains most cohesive after the change, and often new classes should be introduced instead of accommodating functionality in existing ones.
Does the test fail? If not, rewrite the test to expose the lack of functionality.
Does a subset of the test already fail? If so, you can remove the surplus part of the test, avoiding verbosity; it can come back in different test methods.
Does the test prescribe overly specific assertions or expectations? If so, lessen the mock expectations by not checking method calls order or how many times a method is called; improve the assertions by substituting equality matches with matches over properties of the result object.
Does the test name describe its intent? Make sure it is not tied to implementation details and works as low-level documentation.
How much can you change in a hypothetical implementation without breaking the test? The less you can change, the more brittle the test is.
Is the failure message expressive about what is broken? Make sure it describes where the failing functionality resides, highlighting the right location if it breaks in the future.
Are magic numbers and strings expressed as constants? Is there repeated code? Test code refactoring is easy when done early and while a test fails, since in this phase it is more important to keep it failing than to keep it passing.
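To make a few of these questions concrete, here is a minimal sketch of a test at the end of the Red phase. It assumes a hypothetical ShoppingCart class that exists only as an empty shell, and it uses no testing framework to stay self-contained (a real project would probably use PHPUnit). The constants, the intent-revealing test name and the failure message are the points of interest.

```php
<?php
// Hypothetical class under test: in the Red phase it is only an empty shell.
class ShoppingCart
{
    private $prices = [];

    public function add(float $price): void
    {
        $this->prices[] = $price;
    }

    public function total(): float
    {
        return 0.0; // functionality not implemented yet
    }
}

// Magic numbers expressed as constants.
const BOOK_PRICE = 12.50;
const PEN_PRICE = 1.50;

// The name describes the intent, not the implementation.
// Returns null on success, or an expressive failure message.
function testTotalIsTheSumOfTheAddedItemPrices(): ?string
{
    $cart = new ShoppingCart();
    $cart->add(BOOK_PRICE);
    $cart->add(PEN_PRICE);

    $expected = BOOK_PRICE + PEN_PRICE;
    $actual = $cart->total();
    if (abs($actual - $expected) > 0.001) {
        return sprintf(
            'ShoppingCart::total() should sum item prices: expected %.2f, got %.2f',
            $expected,
            $actual
        );
    }
    return null;
}

$failure = testTotalIsTheSumOfTheAddedItemPrices();
echo $failure === null ? "GREEN\n" : "RED: $failure\n";
```

Running it prints a RED line that names the method at fault, which is what I would want to read when the test breaks again months later.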

Green
Enough production code should be written to make the test pass.

Does the production code make the test pass? (Plainly obvious)
Does a subset of the production code make the test pass? If so, you can comment out or (better) remove the unnecessary production code. Any extra lines you write are untested lines you'll have to read and maintain in the future.
Every other specific action will be taken in the Refactor phase.

Refactor
Improve the structure of the code to ease future changes and maintenance.

Does repeated code exist in the current class?
Is the name of the class under test appropriate?
Do the public and protected method names describe their intent? Are they readable? Rename refactorings are among the most powerful ones.
Does repeated code exist in different classes? Is there a missing domain concept? You can extract abstract classes or refactor towards composition. At this high level, the refactoring should also be applied to the unit tests, and there are many orthogonal techniques you can apply, so I won't describe them all here.

Feel free to add insights and items to the list in the comments. I very much value feedback from other TDDers.

Saturday, March 27, 2010

Weekly roundup: March is ending

I just wanted to inform you about some content you may find interesting.

This week I have published three new articles at php|architect:
Google releases skipfish
Impel, the Javascript ORM
Ten Top PHP people to follow on Twitter

Moreover, my post Contributing to open source projects has been republished on DZone.

Friday, March 26, 2010

The rest of the NakedPhp walkthrough

This post's title is a parody of The Rest of the Robots.

In the previous post we saw how to manipulate PHP objects directly, and call methods on them, thanks to NakedPhp's user interface. Today we will see how to interact with the database, essentially how to store objects in it and retrieve them via Repositories.
The situation we left the example application in was the following:

There are two Example_Model_City objects in memory (London and Paris), and an Example_Model_Place (Eiffel Tower) which references Paris with a many-to-one directional association.
We are ready to save our session. Generally, a subset of the application's object graph is instantiated in memory, modified and put back in the storage. We are not assuming that the storage is a relational database: it is simply a component that takes an object graph and persists it somehow. In PHP, one of the few libraries that can do this is Doctrine 2, which will map our objects onto a set of relational tables.
We check that there are no objects to remove, and then click Save:

Note that after saving the session, the objects' color changed. In the example application I associated via CSS rules green to transient objects and orange to managed ones. The difference between the two is that a transient object will vanish if we let the session expire, while a managed one is already present in the storage (so it should be actively removed if we want it to vanish.) The save action response tells us three new entities have been persisted, and no managed ones. Moreover, no objects have been removed.
Let's close this session and click Clear.

The session is now empty, and in the last screen we have already gone to the PlaceFactory object. We notice a new findAllCities action available. It has appeared now because there is a hideFindAllCities() method on Example_Model_PlaceFactory, which returns true as long as there are no cities in the database (Services have access to external infrastructure like storage and whatever we want them to use, while Entities usually have no external references since they have to be serialized.)

The findAllCities action brings us back all the cities stored in the database. This is an "object" - really an array, but every variable saved in the session is wrapped in a NakedObject instance.
The difference between a normal object and an array is that a wrapped array has a Collection Facet, and the views recognize this Facet and treat it differently. In particular, they list the contained objects and provide access to them.
Finally, note that this Entity is managed and not transient, because it has been retrieved from the storage.

When we click on the row of an item in the collection, the object is extracted into the session (it is considered transient because the semantics for extracted objects have not been written yet. This will be fixed in the future.) It still refers to the same instance in the collection, so if we modify one we'll see the changes in the other.

We now see an example of method call with object parameters. The createPlaceFromCity action will take a City and create for us a new Place object with the city property already set. But we don't like the current cities: we want another one.

As was the case when editing Entities, the context is preserved between different method calls. We have clicked on createCity and we are now creating Rome.

The new object is available for the original method call now.

And finally, the action returns a new Place object which is stored in the session. The city has been set correctly.

That's all for now. I have shown a practical example of NakedPhp's features and of the direct manipulation of objects, which have been persisted, retrieved, and moved around.
Now I can go back to adding other features, such as semantics for removal and extraction of objects, and a complete method merging feature. What do I mean? It will be useful to have the createPlaceFromCity method also on every City object, so that, given a city, it will present to the user a Factory Method for Places. In this case, we could have put the method on the Example_Model_City class, but such a method may require collaborators which we should not inject into City: imagine sendByMail(City) or searchSimilar(City) methods which access mailers or the database. With method merging, any service method which has an object of class A among its parameters will be callable from A objects as well, with the particular A parameter automatically passed and hidden.
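To give an idea of how method merging could work, here is a minimal sketch based on reflection over native type declarations. It is not NakedPhp's implementation: the class names are simplified stand-ins for the example model, mergeableMethods() is a hypothetical helper, and reading the parameter types from declarations instead of phpdoc annotations is my assumption.

```php
<?php
// Hypothetical model classes, loosely named after the example application.
class City
{
    public $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

class Place
{
    public $name = 'Default Name';
    public $city;
}

class PlaceFactory
{
    public function createCity(string $name): City
    {
        return new City($name);
    }

    public function createPlaceFromCity(City $city): Place
    {
        $place = new Place();
        $place->city = $city;
        return $place;
    }
}

/**
 * Returns the names of the public service methods that accept an object of
 * the given class: the candidates for being offered on that class's objects.
 */
function mergeableMethods($service, string $entityClass): array
{
    $names = [];
    $class = new ReflectionClass($service);
    foreach ($class->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        foreach ($method->getParameters() as $parameter) {
            $type = $parameter->getType();
            if ($type instanceof ReflectionNamedType
                && !$type->isBuiltin()
                && is_a($entityClass, $type->getName(), true)) {
                $names[] = $method->getName();
                break;
            }
        }
    }
    return $names;
}

$factory = new PlaceFactory();
$rome = $factory->createCity('Rome');

// Calling a merged method "from" the City: the City fills its own parameter.
// This loop assumes single-parameter methods; others would still need a form.
foreach (mergeableMethods($factory, City::class) as $name) {
    $place = $factory->$name($rome);
    echo $name, ' -> ', get_class($place), ' in ', $place->city->name, "\n";
}
```

Here createCity is skipped because its only parameter is a string, while createPlaceFromCity is picked up because it accepts a City.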

The method merging feature was already present in NakedPhp but is currently broken (if I had acceptance-TDDed it, it wouldn't have been.)
See you in April for NakedPhp 0.2 and some news!

Thursday, March 25, 2010

A NakedPhp walkthrough

Today I'll present a walkthrough of the NakedPhp example application, to show the direct manipulation of objects it provides, with features like calling methods and editing Entity objects.
The example application manages a Domain Model that contains places (shops, sights, and so on) and cities. Places have a directional many-to-one relationship with cities.
I will upload a screenshot for every step. The graphics are very basic, but they are not a responsibility of the framework. Every application can write its own layout, complete with style sheets, and assemble the content pieces differently.

Supposing we have set up the example application correctly, to start working we have to load the default action of the naked-php controller. In my Apache configuration I set up the public/ directory as the DocumentRoot of the example virtual host. There are really no differences from other Zend Framework applications.

The starting page shows in the header the two declared services (managed as singleton-like instances), an empty session and a context bar, which we can ignore for now. Classes of a Domain Model are divided in two parts: Services (always available and never serialized) and Entities (managed in the session bar and passed around). Factories and Repositories go under the Services umbrella, while ValueObjects for now are not supported because the underlying object-relational mapper does not support them yet.
Clicking on the PlaceFactory service in the header will send us to the object Example_Model_PlaceFactory, with a list of the available actions (exposed methods, not filtered for now):

If we click on createPlace, a new Example_Model_Place object will be created by this factory method. This method has no parameters and creates an instance with default values. We are redirected to that instance, which is now in the session bar:

If we click again on PlaceFactory and choose the createCity action, a form will be shown since this method requires one parameter (the name of the city):

We insert London as the name and submit the form. A new object is put in the session bar and we are redirected to it, as it is the result returned by the method. The session bar keeps objects in the PHP session: we are not touching the database. This means that entities should be serializable, and this leaves us free to use Plain Old PHP Objects that do not extend any framework class. The only requirement is that phpdoc annotations (plus a few others) are present, to determine the types of parameters and method return values.
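As a minimal sketch of what this means in practice, here is a plain object with phpdoc annotations that survives a serialize/unserialize round trip, which is roughly what the PHP session does between two requests. The City class is a simplified, hypothetical stand-in for Example_Model_City, and reading the @return tag with a regular expression is only my illustration of how a type can be recovered at runtime, not the way NakedPhp parses annotations.

```php
<?php
/**
 * A plain PHP object: no framework base class, only phpdoc annotations.
 * Hypothetical stand-in for Example_Model_City.
 */
class City
{
    /** @var string */
    private $name;

    /** @param string $name */
    public function __construct($name)
    {
        $this->name = $name;
    }

    /** @return string */
    public function getName()
    {
        return $this->name;
    }

    /** @param string $name */
    public function setName($name)
    {
        $this->name = $name;
    }
}

// Roughly what happens to an entity kept in the session between requests.
$stored = serialize(new City('London'));
$restored = unserialize($stored);
echo $restored->getName(), "\n";

// The phpdoc block is available at runtime through reflection.
$doc = (new ReflectionMethod('City', 'getName'))->getDocComment();
preg_match('/@return\s+(\S+)/', $doc, $matches);
echo $matches[1], "\n";
```

Since the object holds no references to external resources, nothing special is needed to store it in the session.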
After the creation and the redirect, a list of the properties of the Example_Model_City object is shown:

Note that the objects' different specifications (classes) are distinguished by different icons.
If we go back to the Default Name Place by clicking on it and follow the link on the pencil, an editing form is generated based on the setters available on the object.

Note that the context bar now contains one more link (other than Index). When a form for selecting method parameters or object fields is shown, the context is preserved. If we don't like the objects we have in the session, we can go around looking for better ones, or create them. In the example, we want to create the Eiffel Tower Place and since it is not in London, we need a Paris City object. So even if we are on the form, we simply go to the PlaceFactory service and select the createCity action again:

The context has grown again (it can be reset by going to the index if someone screws up.) Then we submit the form and we are redirected to the last action we were calling, the editing of the entity 1:

We can now select Paris for the city field and change the name to Eiffel Tower, then submit the form. Unfortunately collections are not supported and we will get an error, but the process works well until it encounters the events collection (not yet implemented). If we simply reload the index page and click on the newly renamed Eiffel Tower object, we'll see it has been correctly edited.

We have worked only in memory and have not had any interaction with the database yet. This post is getting long, so tomorrow I will show you the second part of the walkthrough, where we save these entities and retrieve them with a method that is hidden or shown automatically based on the current state.

Wednesday, March 24, 2010

PHP in Action review

PHP in Action is a hands-on PHP book written by Dagfinn Reiersøl, Marcus Baker and, most notably, Chris Shiflett. PHP in Action is maybe the only PHP-specific book that bridges typical PHP topics, such as forms and database handling, to object-oriented design. It is very rare to encounter a book like this, which teaches object-oriented programming from a non-naive point of view (How do I write those "classes?") in the PHP environment. PHP is still catching up with other languages in this field and many developers can only benefit from improving their modelling skills and design practices.

As I said, this book is PHP-specific; however, many other titles proclaim on their covers that they teach object-oriented PHP, while the only topics they touch are public and private fields, and how to extend classes with inheritance (if that seems normal to you, read this book.) Many publishers jumped on the bandwagon of PHP 5 and proposed books focused on the language constructs instead of the things you can build with them.
PHP in Action is a bit different. For example, it includes some of the SOLID principles and examples of their application in PHP code, without too many assumptions about the overall knowledge of the reader. The most important Design Patterns are explained, with an eye to the native support offered by PHP 5 (SPL).
Advanced techniques (for the average developer) are also introduced, such as refactoring, unit testing and Test-Driven Development. This is by no means an in-depth read on these topics, but the average developer who has a deep understanding of the PHP technology (but not of OO, as decent support was introduced in the language only a few years ago) will find this book useful to start upgrading his skills to the next level. I think this is a common situation, and it was also mine; if I had found this book earlier, my journey would have been simpler as I wouldn't have had to translate knowledge from Java books. I hope these advanced parts will become a standard in the future.
That said, there is really no PHP book that describes in full depth object-oriented design. There are specific books on object-oriented development, which are very long and insightful and still not complete. These books usually choose Java for their code samples (or C++ if they're very old); you may want to refer to different titles for pure object-oriented learning.

As for the material provided, the code samples are inserted in each chapter, and are also refactored while iterative development takes place. There are many small UML diagrams to help the reader understand what's going on - mainly class and sequence diagrams. Highlighting of relevant code and changed lines is the norm, along with ordered explanation lists linked to different points of the code samples, which substitute for intrusive comments.

The level of the book is adequate for the intermediate coder, thus I found it easy to read. Nevertheless, it is a good overview of the PHP landscape in terms of the transition to object-oriented programming. The first edition is from 2007, and it is not outdated; still, you may consider using a framework to provide much of the infrastructure seen in the book, which is presented more for teaching than for actual everyday usage.

Some of the links in this post are affiliate links.

Tuesday, March 23, 2010

Introducing NakedPhp 0.1

I dedicated a bit of my time to develop a port of the Naked Objects framework for Java, which I named NakedPhp. Essentially, the Naked Objects pattern is the automatic animation of an object-oriented Domain Model via the generation of the user interface and of the persistence layer, both commonly sources of duplication and kitchen sinks for business logic. Naked Objects is a radical reinterpretation of MVC, where the Model is at the center and the other layers are inferred from it and customized. You will call methods and move objects around in such an application.
This is NOT yet another Php framework: NakedPhp leverages Zend Framework and Doctrine for all the stuff you would expect a normal framework to do.

Previous related posts
Naked objects, DDD and the user interface
A look at technical question on Naked Objects
Naked objects in Php
Where is business logic?

NakedPhp applies the same pattern to Php applications, and I have tagged its 0.1 release today (this is an alpha release.) The Api is borrowed from the original Java Naked Objects framework. This is by no means a complete framework, but it's a good start since its basic CRUD functions are already working.
NakedPhp integrates Zend Framework 1.x for the user interface management and Doctrine 2 for the persistence layer. As for all Doctrine 2 applications, it requires Php 5.3, which is why I don't have an online demo right now.
Links
NakedPhp 0.1 on SourceForge
Bug tracker
Git repository (view it online):
git clone git://nakedphp.git.sourceforge.net/gitroot/nakedphp/nakedphp

A simple example application built with NakedPhp is provided in the released package, and it uses a bundled sqlite database. An application for NakedPhp is just a plain old Zend Framework application with a controller that inherits from NakedPhp\Mvc\Controller, and that inits the Entitymanagerfactory (related to Doctrine 2) and Nakedphp resources with the mandatory options, like the path to the model classes' folder. NakedPhp is very liberal about what you do in your application: it only generates the first-step views and leaves you the layout for customization.
In the next few days I will write documentation about how to hook NakedPhp into an application, which I will store along with the code in the Git repository on SourceForge.
The behavior of the generated interface (which is not scaffolding: it's intended for actual use and not for being modified) is driven by annotations on the model classes and by methods with special names. For example, properties available on Entity objects are inferred from getters, and their modification is possible if corresponding setters are present. Other methods are called when available to determine automatic hiding of properties and methods (hideName(), hideFindAllCities()), validation of data and so on.
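The naming conventions can be illustrated with a minimal sketch that inspects an object through its method names only. This is not NakedPhp's code: the Place class, its secret property and the describeProperties() helper are hypothetical, and the real framework also relies on annotations that are ignored here.

```php
<?php
// Hypothetical entity following the getter/setter/hide naming conventions.
class Place
{
    private $name = 'Default Name';
    private $secret = 'internal';

    public function getName() { return $this->name; }
    public function setName($name) { $this->name = $name; }

    public function getSecret() { return $this->secret; }
    public function hideSecret() { return true; } // always hidden
}

/**
 * Builds a map of property name => ['value' => ..., 'editable' => bool],
 * inferred only from the names of the public methods.
 */
function describeProperties($entity): array
{
    $properties = [];
    foreach (get_class_methods($entity) as $method) {
        if (strpos($method, 'get') !== 0 || strlen($method) === 3) {
            continue; // not a getter
        }
        $suffix = substr($method, 3); // e.g. "Name"
        $hider = 'hide' . $suffix;
        if (method_exists($entity, $hider) && $entity->$hider()) {
            continue; // hidden in the current state
        }
        $properties[lcfirst($suffix)] = [
            'value' => $entity->$method(),
            'editable' => method_exists($entity, 'set' . $suffix),
        ];
    }
    return $properties;
}

print_r(describeProperties(new Place()));
```

Only the name property is listed, and it is marked as editable because a setName() method exists; the secret property is skipped because hideSecret() returns true.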

The sample application provides a basic workflow for a hypothetical Domain Model for shops, pubs, or similar places. You can create Cities and Places, modify their properties, save them in the database, then clear the session and retrieve them via service classes. This is a direct interface to the Domain Model, without translations.
Instructions are included in the release for the super-simple setup of the example application (essentially running phing build-example).

Let me know if you are interested in a framework that writes the user interface for you. I know that the Naked Objects pattern is not appropriate for all situations, but the goal is to simplify the presentation of the real Domain Model to the user for certain kinds of model-driven applications. It also provides a prototyping interface for Domain-Driven Design in Php.

Monday, March 22, 2010

Contributing to open source projects

Amit has written from India asking how to start participating in open source projects.

I am a software developer from India and recently came through your article on "How improved hardware changed programming". It was good reading it. I wanted to know more about open source projects & how to get involved in it. I read that you contribute to a couple of them so thought of asking you about your experience and how did it help from a developer's perspective.

My experience with code contributions to open source projects is mainly in the field of Php libraries and frameworks. This is not a coincidence, as I am more motivated to contribute to projects I personally use: if I had to give one piece of advice on choosing an open source project to participate in, I would recommend selecting a project you actually use at the Api level (interfacing your own code with its source code or its binary interface).
It's not a selfish choice, although you would clearly benefit from your improved knowledge of the project internals, from bugs that have been fixed and from new features that have been introduced thanks to your work. It's more of a synergistic approach.
Employing an open source project at the user level (in the case of standard applications) gives you a picture of its overall features and maybe an involvement with the supporting community, but not a deep vision of the project's goals and inner workings. Your contribution will be far more valuable and simple if you start with codebases you already know "intimately". I would never try to contribute code to Pidgin: even if I run it all the time for instant messaging, the time I would spend in a field not related to my work is probably not worth very much, as there is a steep learning curve and the learning process is limited to a field I'm not interested in (and thus am likely not to enjoy.)

I contribute to Zend Framework and Doctrine (Php projects), and almost all of my patches are derived from discovering a bug or an incomplete behavior in part of the library and wanting to fix it once and for all. For the uninitiated, a patch file is a text file written in particular formats for modifying a codebase while sending only the set of changed lines of code. It may be the case that some projects use different collaboration models (Git push and pulls for example), but usually these models are automated processes built on juggling patches around.
It's important to get patches upstream, to the official maintainers, so that they can decide if the change fits the vision and the guidelines of the project (for instance backward compatibility), and include it in subsequent releases. This is one advantage of contributing upstream: you do not have to patch your copy of a library every time a new release comes out. In turn, everyone who downloads the original project will benefit from the improvements (hoping that little bloat is introduced.)

Besides the code contributions, there are several ways to help open source projects whose code you're not familiar with:

Writing end user documentation and tutorials that help the spread of the application.
Reporting bugs and providing more general feedback about the user experience.
Writing translations for the user interfaces: while nearly every developer speaks English, other languages are rare and if you happen to know Slovak, you may help.
Even small donations are useful if you feel like contributing somehow but you do not have time. Money usually goes towards web hosting for the project to remain accessible to everyone.

Summing it up, there are many ways to contribute to open source projects, with and without writing actual code. Every respectable project has a Contributing page dedicated to providing you with a few pointers to start out and meet the maintainers: look for it and see how you can give back to an application that has eased your work many times.

Saturday, March 20, 2010

Weekly roundup: spring approaching

I've been a bit busy starting the work for my bachelor's degree thesis and my academic web project on Twitter analysis, but there are some new articles I'd like to show you.

This week I have written two new posts on php|architect's website:
Development principles
PHP on Ibm i servers

Moreover, How improved hardware changed programming has been republished on DZone.

Thursday, March 18, 2010

Practical Php Patterns: State

This post is part of the Practical Php Patterns series.

Today's pattern is State: its intent is to allow an object to change its behavior when its state changes, while hiding the state-related information.
The main role in this pattern is carried out by a State object that encapsulates the state-related behavior behind a segregated interface.
Instead of executing the same structure of conditionals in many methods of the Context object, it delegates part of its job to State. Switch constructs are often candidates for replacement by a State collaborator.
The effective State is changed by replacing the State object with a new instance, possibly from a different class (State implementations may be Flyweights.)
Usually factory methods on the State object create other istances in response to events raised by the Context calls, so that the transition is dictated by the internal State code. You can implement a Finite State Machine with this pattern, ensuring that the transitions cannot be violated because the next State is decided by a ConcreteState method.
Moreover, the State methods may accept parameters from the Context to fulfill their responsibility, so that there is no bidirectional relationship to maintain in field references.
There are two explicit concepts enforced and named by the use of this pattern: the stateful part of an object (encapsulated in the State collaborator and separated from the immutable behavior of Context) and the state transitions (abstracted in the assignment of a new State object to a Context field reference, which on the right side has a call to an handler method on State implementations.)

Participants

Context: defines an interface for Clients and maintains a State object internally.
State: defines an interface for the state-related behavior and often for transitions.
ConcreteState (more than one class): implements a particular behavior and set of valid transitions.

Note that ConcreteStates are usually ValueObjects, so they are immutable and shared. Thus a new instance of one of the ConcreteStates is created when there is a transition.

The code sample implements a Finite State Machine for the parsing and validation of a binary string, which is considered valid if it satisfies even parity (the number of 1 bits is even). The states and their relationships are encapsulated by the set of State implementations, and can change without affecting the code of Context.

<?php
interface State
{
    /**
     * @param string $input     '1' or '0'
     * @return State            next State
     */
    public function parse($input);

    /**
     * @return boolean          whether a binary string that
     *                          brought to this State is valid
     */
    public function valid();
}

/**
 * A ConcreteState. The machine is in this state
 * when the number of 1 read is even (valid string).
 */
class EvenOnesState implements State
{
    public function parse($input) {
        if ($input == 1) {
            return new OddOnesState;
        } else {
            return $this;
        }
    }

    public function valid()
    {
        return true;
    }
}

/**
 * A ConcreteState. The machine is in this state
 * when the number of 1 read is odd (invalid string).
 */
class OddOnesState implements State
{
    public function parse($input) {
        if ($input == 1) {
            return new EvenOnesState;
        } else {
            return $this;
        }
    }

    public function valid()
    {
        return false;
    }
}

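class ParityBitValidator
{
    protected $_initialState;
    protected $_state;

    /**
     * @param State $initialState   the state at the reset
     */
    public function __construct(State $initialState)
    {
        $this->_initialState = $initialState;
        $this->_state = $initialState;
    }

    /**
     * @param string $bits
     * @return boolean
     */
    public function isValid($bits)
    {
        // Reset the machine, so that a previous validation
        // cannot influence this one.
        $this->_state = $this->_initialState;
        for ($i = 0; $i < strlen($bits); $i++) {
            // Square brackets: the $bits{$i} syntax is no longer supported.
            $bit = $bits[$i];
            $this->_state = $this->_state->parse($bit);
        }
        return $this->_state->valid();
    }
}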

$validator = new ParityBitValidator(new EvenOnesState);
var_dump($validator->isValid('10101001101'));    // true
var_dump($validator->isValid('101010011011'));   // false

Wednesday, March 17, 2010

How improved hardware changed programming

This is a follow-up to the previous post, CPUs speed and technology innovations.

As we outlined in the previous post, the memory size and computing power available to the average programmer have increased thousands of times since the early years of the craft, at least within the boundaries of a single machine (the network is a common bottleneck.)
This performance improvement has radically changed the style we use in writing software and its development process.
The first notable change is the progressive introduction of higher-level programming languages. C is near the raw metal of a machine, but more portable languages have been built upon it, such as Java and Python. They are still third-generation programming languages, but they sacrifice performance for portability by providing a virtual machine and an interpreter respectively.
This is a general trait of higher-level languages: trading machine time to save the developers' time, which with the hardware improvements is now the most expensive resource. At the time of its release to the public in 1995, Java applications were considered slow programs with an extensive memory footprint (with good reason). However, today this is no longer significant for a vast set of applications, and the same is true for other high-level languages like Python and Php. Premature optimization is now the evil, not the Java Virtual Machine.

Some people say that software bloats faster than Moore's law can help it: we went to the Moon in 1969 with 4 kilobytes of Ram, now we need 100-200 MB to run an operating system. But the features and power of our machines are now much greater (discounted by the amount that runs software's bloated parts), and we can do things that were only dreams in the 60s.
Continuous integration of software projects and immediate feedback via tests are two things that derive from the large amount of computing power available. Donald Knuth is a magician in algorithms, but back in the 60s he had to write a program by hand during the day, and let the machine compile it at night. Now we have the whole process of building and testing a moderately sized program run in minutes at every code check-in. Algorithms were proved on paper: now they are also tested on large datasets.
Object-oriented programming is a practice fundamentally less performant than "classical" structured programming, because it typically stores a pointer to a table of virtual methods in every object, even a wrapped integer. But it lets you have a real domain model in an application, where different entities, both represented as integers, cannot be mixed up. Few of us will start a serious enterprise application without this paradigm available: the hardware improvements again made it possible to simplify the programmer's life, even if sometimes bloat is introduced.
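To make the point about entities that are "both represented as integers" concrete, here is a minimal sketch, assuming PHP 7 or later. CustomerId, OrderId and describeOrder() are hypothetical names chosen only for this illustration.

```php
<?php
/**
 * Hypothetical value objects: both wrap a plain integer,
 * but the type system keeps them apart.
 */
final class CustomerId
{
    private $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    public function toInt(): int
    {
        return $this->value;
    }
}

final class OrderId
{
    private $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    public function toInt(): int
    {
        return $this->value;
    }
}

/**
 * Accepts only an OrderId: passing a CustomerId raises a TypeError,
 * even though both objects wrap the same integer.
 */
function describeOrder(OrderId $id): string
{
    return 'order #' . $id->toInt();
}

echo describeOrder(new OrderId(42)), "\n";

try {
    describeOrder(new CustomerId(42));
} catch (TypeError $e) {
    echo "a CustomerId is not accepted as an OrderId\n";
}
```

The price is one extra object per identifier, which is exactly the kind of overhead that cheap hardware lets us ignore.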

And the list goes on: every best practices post you can find today (also mine) is in part generated by the continuous hardware improvements that have occurred, and it is a good thing, because it means we are leveraging the machines' power. Meaningful naming for entities? Try it with an 8 character limit. Iterative development, refactoring? Made possible by the insulation layers between components, which are a form of "bloat". Distributed version control? Thank you, cheap space on hard disks.
Moore's law won't save anyone introducing bloat in software. But it makes new programming practices feasible, directly borrowing them from your dreams.

Tuesday, March 16, 2010

CPUs speed and technology innovations

The number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years. (Moore's law)

This empirical law is primarily known for the increasing trend of processor speed, which is heavily influenced by the ability to miniaturize transistors in silicon-based chips. However, the general formulation of the law affects all semiconductor technologies, such as CPUs and electronic memories.
We may also note that similar laws exist for magnetic disks, whose storage density doubles annually, and for network capacity (optical fiber's bandwidth doubles every nine months.) This set of empirical laws describes the exponential growth that we have been experiencing for many years.
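To get a feeling for what these doubling periods mean, here is a small sketch that computes the growth factor 2^(years / doubling period) over a ten-year span. The function name growthFactor() and the ten-year horizon are assumptions made only for this illustration; the doubling periods are the ones quoted above.

```php
<?php
/**
 * Growth factor after $years for a quantity that doubles
 * every $doublingPeriod years: 2^(years / doublingPeriod).
 */
function growthFactor(float $years, float $doublingPeriod): float
{
    return pow(2, $years / $doublingPeriod);
}

// Ten years of growth with the doubling periods quoted in the text.
printf("transistors (2 years):      x%.0f\n", growthFactor(10, 2));    // 2^5 = 32
printf("disk density (1 year):      x%.0f\n", growthFactor(10, 1));    // 2^10 = 1024
printf("fiber bandwidth (9 months): x%.0f\n", growthFactor(10, 0.75)); // roughly ten thousand
```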

But do limits to these exponential laws exist?
When miniaturization is involved, atomic limits are usually the ultimate frontier. The current most advanced production process for semiconductor devices is the 45 nanometer technology, whose name refers to half of the average distance between memory cells. Each cell consists of two or four transistors, whose size is in the range of 100-200 nm.
For comparison, the size of a hydrogen atom is roughly 0.1 nm, and it is among the smallest existing atoms. The average distance between silicon atoms arranged in a crystal like the ones used in chips is about 0.5 nm.
The predicted technology for 2015, according to the International Technology Roadmap for Semiconductors, is the 11 nanometer one. This means miniaturized components will be electrically insulated by a layer of... about twenty atoms (11 nm divided by 0.5 nm per atom).
The consequences for the inner working of these devices are disastrous. As you may know, the physics models that govern modern integrated circuits are not classical: their design follows quantum mechanics. At such a small scale, quantum tunneling (the same process that charges flash memory cells) can make insulated cells interact. The usage of silicon (and its doped derivatives) cannot scale to atomic sizes.

But limits are made to be broken. For example, integrated circuits are manufactured by photolithography. Without descending into the details, let's just say that a photoresistive material is "painted" over some area of the chip, which is then exposed to a light flash so that excess material is eliminated from the chip, as if it were a piece of photographic film. What remains under the shielding layer are the tracks and soon-to-be components of the integrated circuit.
Cool. What's the problem? The light used for this process has a wavelength of about 200 nm. Thus transferring patterns of elements smaller than this size should not be feasible: it would be like detecting a 1-inch hole in a wall by throwing basketballs at it, in the dark. But it has been made possible with new techniques and tricks when the limit was reached.
And the same has been done with multi-core architectures: the power consumption of a CMOS cpu is proportional to its clock frequency, and the external components do not increase in speed as fast as a cpu does. So when an effective barrier in frequency was reached, the electronic engineers started increasing the number of processors in a single package instead of having them run faster. The results are the current dual- and quad-core architectures (and a new problem: how to parallelize tasks efficiently to keep all cores busy?)
Maybe Moore's law in its current form will cease to work in the future, but the technology level and the computing power it brings along are still likely to increase for a long time. There were, and still are, great business incentives to continue to enhance computing devices, and clever men always find another way to make money.

Monday, March 15, 2010

Coffee and programming

A mathematician is a device for turning coffee into theorems. -- (attributed to) Paul Erdos

Coffee is indeed an omnipresent drink in many different science fields - probably for its property of aiding concentration and focus over a short period of time. In the field of software development, just think about the Java platform and programming language, named after a coffee, whose symbol is a cup and where certain constructs are called beans. Many programmers declare themselves coffee addicts, and publish guides to coffee roasting and brewing.

As you probably know, high consumption of caffeine, which is the principal psychoactive component of coffee, is not very healthy. The reason is that the substances in a cup of coffee are derived from coffee beans, and the Coffea plants have been naturally selected to produce hazardous chemicals. Plants generally do not want their seeds to be eaten: while man-made cultivation, which favors the more productive varieties, dates back only some centuries, the natural selection of the Coffea plants has been a continuous process for thousands and thousands of years. That's why roses have thorns and poison ivy is... well, a bit poisonous.
The overall effects of a moderate consumption of caffeine, as reported by scientific studies, are controversial. And the picture could not be different: there are many types of coffee, with different serving sizes, and possibly different effects (and side effects). Other beverages such as tea and Coca Cola contain caffeine, but in a much lower concentration.

My current practice is avoiding psychoactive substances as much as possible. I occasionally drink a cup of tea in the morning, mainly because of the cold season. I went through multiple periods of coffee drinking in the past, but the benefits on concentration (and in recovering from hangovers) were not very significant for me, and I also had to account for the side effects on my stomach and my sleep cycle, which is an important variable in my life.
Nearly every morning I see many students come in with hot, tiny espressos made by automatic machines. I guess it's more a placebo than a real help. In fact, they are usually not the most brilliant ones in the classroom, but the ones that sleep less.

Do you drink coffee? What is your idea of its benefits and disadvantages?

Image courtesy of Julius Schorzman.

Saturday, March 13, 2010

Weekly roundup

This week I have published two new posts at php|architect:
The Open Source Way
Why TDD is based on testing first

Also, my post Acceptance Test Driven Development has been republished at DZone.

Thursday, March 11, 2010

Standard Php Library and coupling

The Standard PHP Library is an object-oriented library, by default always included in the latest versions of php. Leveraging Spl in the right way is crucial to avoid compromising the design of an application and taking advantage of the most portable php library at the same time.
A wide use of the basic language structures is often the sign of a missing concept in the domain model. Passing around arrays and Iterators may be harmful if the implicit objects they represent do not fit with the operations available on them. We may have collections where elements can be removed, or repeated, in spite of business rules; queues and stacks that allow random access to all their elements, and so on. This contradicts the principle of defining and referring to the smallest possible interface.
Sooner or later someone will start calling a method that should not be there, and that by then can never be removed by substituting the object. Actually in php you can call methods even if they are not defined on the interface you are referring to (it's a dynamic language), but if you write unit tests passing in mocks and stubs, they will fail and tell you that the code smells.
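A minimal sketch of this smell, assuming PHP 7 or later (where a call to an undefined method throws a catchable Error). All the names here (ReadOnlyCollection, ArrayBackedCollection, StubCollection, report()) are hypothetical, and the stub is written by hand to stand in for what a mocking tool would generate.

```php
<?php
interface ReadOnlyCollection
{
    public function size();
}

class ArrayBackedCollection implements ReadOnlyCollection
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function size()
    {
        return count($this->items);
    }

    // Not part of ReadOnlyCollection.
    public function clear()
    {
        $this->items = array();
    }
}

/**
 * Hand-written stub: it implements the interface and nothing more.
 */
class StubCollection implements ReadOnlyCollection
{
    public function size()
    {
        return 3;
    }
}

function report(ReadOnlyCollection $collection)
{
    // The smell: clear() is not declared on ReadOnlyCollection.
    $collection->clear();
    return $collection->size();
}

// Works by accident with the concrete class: prints 0.
echo report(new ArrayBackedCollection(array(1, 2, 3))), "\n";

try {
    report(new StubCollection());
} catch (Error $e) {
    echo "the stub exposes the hidden dependency: ", $e->getMessage(), "\n";
}
```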

Before looking for a remedy to these design issues, let's briefly analyze some Spl classes and interfaces to put this discussion in context.

The Traversable interface and its Iterator extension would be perfect if php had generics built into the language: Iterator is a segregated interface that works well because it can be passed directly to foreach() constructs. The only issue is that Iterators that return objects of different classes can be mixed without type safety, so you may find code that expects a string Iterator and is passed an object Iterator.
Countable is a very good interface, with only one method. It's really hard to implement this interface with more than one semantic meaning.
SplQueue, SplStack, SplObjectStorage are only fast data structures, that have nothing to do with object-oriented design. So they're fine and perform well, but do not refer to them more than two or three times in your codebase (essentially wrapping them in explicit domain objects.) Really, a Queue or Stack that implements Iterator is a recipe for a disaster.
SplObserver and SplSubject are really... pointless. Patterns are solutions that are implemented with many variations every time they are taken into consideration. So it makes no sense to specify standard interfaces for patterns, and I see many problems in these ones: detach() may remain unused since php objects are short-lived. Also attach() may be implemented differently if there is only one observer, or a specific set of them. And notify() may be a private or protected method.
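The lack of type safety described for Iterator in the list above can be sketched as follows. joinUppercase() is a hypothetical client; since the signature cannot say "Iterator of strings", the check has to happen at runtime, element by element.

```php
<?php
/**
 * Hypothetical client that expects an Iterator of strings,
 * but can only declare Iterator in its signature.
 */
function joinUppercase(Iterator $strings)
{
    $result = array();
    foreach ($strings as $string) {
        // Without generics, the element type can only be checked here.
        if (!is_string($string)) {
            throw new InvalidArgumentException('Expected an Iterator of strings.');
        }
        $result[] = strtoupper($string);
    }
    return implode(',', $result);
}

echo joinUppercase(new ArrayIterator(array('a', 'b'))), "\n"; // A,B

try {
    // Accepted by the type hint, rejected only when an element is read.
    joinUppercase(new ArrayIterator(array(new stdClass())));
} catch (InvalidArgumentException $e) {
    echo "an object Iterator was passed where a string Iterator was expected\n";
}
```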

As was the case with Iterators, an SplSubject A that is meant for certain SplObservers also accepts other SplObservers that were meant to observe an SplSubject B, so that when they try to access the wrong SplSubject implementation they blow up because of calls to nonexistent methods. This is fake type safety, and there is no language construct that takes advantage of these interfaces. You can just write "This is a Subject" in the docblock and it will be the same.
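A minimal sketch of this fake type safety, assuming PHP 7.1 or later (for the void return types and the catchable Error). Newsletter and TemperatureLogger are hypothetical classes; the logger was written with a different Subject in mind, one that would expose a getTemperature() method.

```php
<?php
// Hypothetical Subject: only newsletter-related observers should attach.
class Newsletter implements SplSubject
{
    private $observers = array();

    public function attach(SplObserver $observer): void
    {
        $this->observers[] = $observer;
    }

    public function detach(SplObserver $observer): void
    {
        // Left empty: short-lived php objects rarely need it.
    }

    public function notify(): void
    {
        foreach ($this->observers as $observer) {
            $observer->update($this);
        }
    }
}

// Hypothetical Observer meant for another Subject with getTemperature().
class TemperatureLogger implements SplObserver
{
    public function update(SplSubject $subject): void
    {
        echo $subject->getTemperature(), "\n";
    }
}

$newsletter = new Newsletter();
// Accepted: the SplObserver type hint is satisfied.
$newsletter->attach(new TemperatureLogger());

try {
    $newsletter->notify();
} catch (Error $e) {
    echo "wrong Subject: ", $e->getMessage(), "\n";
}
```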
For a full analysis of the underlying design problems of Spl, see the related SOLID principles post.

As you may know, composition (and its specialization, wrapping) is usually considered the greatest friend of an object-oriented developer.
The basic Spl classes have too many methods and too much behavior. This is fine for them because they try to cover all the feasible use cases, but it's dangerous for Client classes because they may rely on operations that should not be available to them, becoming coupled to a large interface. A fundamental principle of type safety is that objects should be interchangeable only if they are really interchangeable, not because they happen to have a similar structure.
This means that I would never pass an SplQueue to my class, because it has so many methods (22 public methods) that, for example, I cannot mock it consistently in testing. I would wrap SplQueue in a class that defines an (at least implicit) small interface, and that I can pass around without fear because only my explicitly defined methods can be called.
Long story short, try to wrap Spl data structures in your own classes, which define the interfaces that you want, and never pass them around as-is. Like all language structures, they are catch-all objects: your code will suffer from a large, scattered set of calls to their public methods, becoming pinned to them. They are a great tool to get something done fast, but hiding their existence from the rest of the codebase is fundamental.
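As a sketch of the wrapping described above, here is a hypothetical PendingJobs class (the name and its three methods are assumptions made for this example, and PHP 7 or later is assumed for the return type). SplQueue becomes an implementation detail: no random access, no iteration, only the operations the domain needs.

```php
<?php
/**
 * Hypothetical domain object with a small, explicit interface.
 * The SplQueue inside is never exposed to Client classes.
 */
class PendingJobs implements Countable
{
    private $queue;

    public function __construct()
    {
        $this->queue = new SplQueue();
    }

    public function add($job)
    {
        $this->queue->enqueue($job);
    }

    /**
     * @return mixed    the oldest job, which is removed from the queue
     */
    public function next()
    {
        if ($this->queue->isEmpty()) {
            throw new UnderflowException('No pending jobs.');
        }
        return $this->queue->dequeue();
    }

    public function count(): int
    {
        return count($this->queue);
    }
}

$jobs = new PendingJobs();
$jobs->add('resize image');
$jobs->add('send email');
echo $jobs->next(), "\n";  // resize image
echo count($jobs), "\n";   // 1
```

A class with three public methods is also trivial to stub in a unit test, which is not the case for SplQueue itself.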

Wednesday, March 10, 2010

Acceptance Test-Driven Development

I am halfway through reading Growing object-oriented software, guided by tests, a book that teaches Test-Driven Development in a Java environment. A review will come soon, since the process described in this work is really language-agnostic and interesting also for php developers.
However, the book's authors introduce a very productive practice, which consists in a double cycle of TDD:

a longer cycle, where you write acceptance (aka end-to-end) tests, deriving them from the user stories or formal requirements, and make them pass;
a shorter cycle contained in the first, which happens in the phase when an acceptance test is red: you write unit tests and make them pass until the related acceptance test does not fail anymore.

This approach is an implementation of Acceptance Test-Driven Development, and in particular makes you write several unit tests for every acceptance test (read for every feature) you want to add. Acceptance testing gives immediate feedback on the application's external qualities: simplicity of use and setup, consistency of the interface. At the same time, unit testing gives feedback on the internal qualities: decoupling, cohesion, encapsulation.
When I started employing the double cycle, getting in the zone suddenly became less difficult. The advantages of the TDD process were for the first time applied to the whole process, from the requirements formalization to the end of a feature's development:

Test-first paradigm. By the end of the development phase, regression tests will be already in place, and the production code will be forced to be testable.
The definition of "done" is very clear (the acceptance test passes), and you are more likely to write only the mandatory code to get a green bar at the higher level.
Measuring progress is easy: the number of acceptance tests that are satisfied (weighted by points). You can even write a set of acceptance tests for the whole iteration in advance and keep them in a separate suite, moving them in the main suite when they start to pass.

To be a bit more specific, the php technologies I use for the two cycles of development are Zend_Test for the acceptance tests suite and plain old PHPUnit test cases for the unit tests one.
Zend_Test is an extension to PHPUnit that lets me define a contract for the http interface of a Zend Framework application, assert redirects, check parts of the html response via css selectors, and even falsify request headers to simulate ajax requests. The unit tests usually have no dependencies on a particular infrastructure, so PHPUnit itself is a powerful enough tool to write them with.
By the way, writing an automated acceptance test suite is more difficult than writing unit tests, as there is more infrastructure that gets in the way and a large amount of details that may render the tests brittle. Fortunately Zend_Test takes care of almost all the infrastructure (aside from the database, which I reset in the setUp phase of test cases), and acceptance test code can and should be refactored to avoid duplication of the implementation details. For instance, Css selectors used to assert on parts of the html response can be stored in constants with an expressive name, and the request creation can be wrapped in helper methods that are composed in the actual test methods. Custom-made assertions are also helpful in keeping the noise to a minimum.

I hope this technique will be useful for all test-infected developers. It certainly enhanced my productivity and will to get a green bar. :)

Tuesday, March 09, 2010

Practical Php Patterns: Observer

This post is part of the Practical Php Pattern series.

Today's pattern is the Observer one. The intent of this pattern is breaking a [usually one-to-many] dependency between objects by making the set of objects act as Observers of a Subject, which notifies them when its state changes or is affected in some way.
The result of the application of the Observer pattern is an effective reduction of coupling: often notifications are sent to disparate objects, which are thus prevented from becoming entangled.
This pattern is a form of publish/subscribe where no Mediator between publishers and subscribers is involved.

The classic application of the Observer pattern is in maintaining synchronization between views of the same data (part of the Mvc paradigm).
Another useful application is the connection of external infrastructure whose existence the subject should not rely on at all. For example Subject can be an object from the domain layer, which accepts as Observer-implementors objects from the upper layers (or their Adapters), to send them events to catch. The Observer and Subject interfaces in this case reside in the domain layer.
There is a broad spectrum where the implementation of this pattern can fit, whose extremes are named push and pull styles.

In the pure push style, the Subject passes only the necessary parameters in the notification: this solution favors encapsulation of the Subject data.
In the pure pull style, the whole Subject is passed to the update() method: this solution favors reusability of the Observers on different implementations or subclasses of Subject.

Obviously implementations can be a mix of these two pure solutions, for instance using a ValueObject as the update() parameter.
Another variability factor in implementation is how references are kept between the various objects of the graph. The only mandatory field reference is the Observer collection, which is maintained in the Subject. The only advantage of keeping a reference to the Subject in the Observer implementations is to avoid passing it in the notification, or to check that the passed object is the Subject where the Observer was registered. However, a mutual association would have to be maintained, and this can be tricky.
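A minimal sketch of such a mixed style, where the notification carries an immutable ValueObject instead of loose parameters (push) or the whole Subject (pull). StatusChange, StatusObserver, Account and RecordingObserver are hypothetical names used only for this illustration.

```php
<?php
/**
 * Immutable ValueObject describing what happened.
 */
final class StatusChange
{
    private $nickname;
    private $status;

    public function __construct($nickname, $status)
    {
        $this->nickname = $nickname;
        $this->status = $status;
    }

    public function getNickname()
    {
        return $this->nickname;
    }

    public function getStatus()
    {
        return $this->status;
    }
}

interface StatusObserver
{
    public function update(StatusChange $change);
}

class Account
{
    private $nickname;
    private $observers = array();

    public function __construct($nickname)
    {
        $this->nickname = $nickname;
    }

    public function attach(StatusObserver $observer)
    {
        $this->observers[] = $observer;
    }

    public function setStatus($text)
    {
        // Observers see the ValueObject, never the Account itself.
        $change = new StatusChange($this->nickname, $text);
        foreach ($this->observers as $observer) {
            $observer->update($change);
        }
    }
}

class RecordingObserver implements StatusObserver
{
    public $received = array();

    public function update(StatusChange $change)
    {
        $this->received[] = $change->getNickname() . ': ' . $change->getStatus();
    }
}

$account = new Account('Giorgio');
$observer = new RecordingObserver();
$account->attach($observer);
$account->setStatus('writing PHP code');
echo $observer->received[0], "\n"; // Giorgio: writing PHP code
```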

Participants

Subject: interface that defines a protocol to set up a collection of Observers. Can sometimes be omitted by using the ConcreteSubject directly in method signatures.
Observer: interface that defines a notification protocol.
ConcreteSubject: stores the Observers and notifies them according to a contract, usually when its state changes (or when a notify() method is called).
ConcreteObserver: performs business logic operations when a notification from a Subject is received.

The code sample implements a pull-version of the pattern, but without Observers keeping a reference to Subjects to avoid a cyclic graph, which would add unnecessary complexity to the construction process. I have not used Spl interfaces because they would not add any value here. The addressed problem is how to get status updates of a User object (domain layer) to be sent to external social networks by infrastructure services.

<?php
/**
 * Needless to say, the Observer participant.
 * The chosen style is pure pull, so the whole User object
 * is passed to update(). User is an entity in the domain model
 * so the Observer will probably depend on it anyway.
 * The gained advantage is that User does not know the concrete
 * class of any of its Observers.
 */
interface UserObserver
{
    public function update(User $subject);
}

/**
 * Subject interface is omitted. The ConcreteSubject
 * is used instead directly.
 */
class User
{
    protected $_nickname;
    protected $_observers = array();
    protected $_status;

    public function __construct($nickname)
    {
        $this->_nickname = $nickname;
    }

    public function getNickname()
    {
        return $this->_nickname;
    }

    /**
     * Accepts an observer. Note that we really don't need
     * a detach() method.
     */
    public function attach(UserObserver $observer)
    {
        $this->_observers[] = $observer;
    }

    /**
     * Updates the status of the user (and notifies Observers.)
     */
    public function setStatus($text)
    {
        $this->_status = $text;
        $this->_notify();
    }

    public function getStatus()
    {
        return $this->_status;
    }

    /**
     * May be public if you want the Client to control
     * the notification generation.
     */
    protected function _notify()
    {
        foreach ($this->_observers as $observer)
        {
            $observer->update($this);
        }
    }
}

/**
 * This ConcreteObserver is passed the User.
 * It can extract from the argument the data it needs (pulling).
 */
class TwitterStatusObserver implements UserObserver
{
    /**
     * We should send the new status to twitter
     * which will mirror it, but to keep this a standalone
     * code sample we'll simply print it.
     */
    public function update(User $user)
    {
        $nickname = $user->getNickname();
        $status = $user->getStatus();
        echo "$nickname has changed its status to \"$status\"\n";
    }
}

// Client code
$user = new User('Giorgio');
$user->attach(new TwitterStatusObserver);
$user->setStatus('writing PHP code');

Monday, March 08, 2010

How to avoid phase-of-the-Moon bugs

Wikipedia documents some definitions of particularly dangerous bugs, which are very hard to fix and when they manifest constitute a real problem in development.
Pay attention to these examples:

Heisenbug: bug that changes its behavior when someone is trying to reproduce it. The very attempt to study it changes the conditions (or requires irreproducible conditions) so that the Heisenbug does not manifest anymore. The naming of this bug is based on Heisenberg's uncertainty principle.
Phase of the Moon bug: bug that arises from a dependency on external conditions, usually time. In the linked original definition, a piece of software has a marginal dependency on the Moon phase. Surely software developers should not feverishly expect monthly events to determine if their application works (unless it's some werewolves-oriented project.)

These are two categories that are really dangerous and can bring development to a halt by forcing long debugging sessions to find the cause of the failure, which is "non-deterministic" (in the former case) or hidden (in the latter). They certainly look scary but I chose them as examples because these particular bugs can be avoided by employing some engineering practices, such as a good test suite.

The test suite for a project should be mainly composed of unit tests. While acceptance (end-to-end) tests constitute an important part of it, because they validate the application's fulfillment of its requirements, unit tests are usually more powerful even if they do not drive the design. Their potential is to quickly locate defects, by testing the contract of individual classes: the first step in exposing a bug, and particularly a Heisenbug, consists in locating it and being able to reproduce it reliably whenever needed, to check that the bug has been fixed. A well-written test suite provides an automatic way to check, at the push of a button, that all previously fixed bugs have not reappeared, so that there are no regressions.
The other advantage of unit tests is in the way they promote isolation. External dependencies are usually mocked out in the testing environment, so that boundary conditions can be reproduced at will. If you suspect that a piece of software may fail during a combination of unusual date and time, you only have to add a test case where you provide the set of conditions that you are scared about.
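As a sketch of how a time dependency can be mocked out, here is a hypothetical Clock interface with a fixed implementation for tests. Clock, SystemClock, FixedClock and TrialPeriod are names invented for this example, and PHP 5.5 or later is assumed for DateTimeImmutable.

```php
<?php
interface Clock
{
    /** @return DateTimeImmutable */
    public function now();
}

// Production implementation: the only place that reads the real time.
class SystemClock implements Clock
{
    public function now()
    {
        return new DateTimeImmutable('now', new DateTimeZone('UTC'));
    }
}

// Test implementation: always returns the instant it was given.
class FixedClock implements Clock
{
    private $instant;

    public function __construct(DateTimeImmutable $instant)
    {
        $this->instant = $instant;
    }

    public function now()
    {
        return $this->instant;
    }
}

class TrialPeriod
{
    private $clock;
    private $expiresOn;

    public function __construct(Clock $clock, DateTimeImmutable $expiresOn)
    {
        $this->clock = $clock;
        $this->expiresOn = $expiresOn;
    }

    public function isExpired()
    {
        return $this->clock->now() > $this->expiresOn;
    }
}

// The scary condition (a leap day) is reproduced at will.
$utc = new DateTimeZone('UTC');
$expiresOn = new DateTimeImmutable('2012-02-29 00:00:00', $utc);
$justBefore = new FixedClock(new DateTimeImmutable('2012-02-28 23:59:59', $utc));
$justAfter = new FixedClock(new DateTimeImmutable('2012-02-29 00:00:01', $utc));

$before = new TrialPeriod($justBefore, $expiresOn);
$after = new TrialPeriod($justAfter, $expiresOn);
var_dump($before->isExpired()); // bool(false)
var_dump($after->isExpired());  // bool(true)
```

In a real suite the two var_dump() calls would become assertions in a PHPUnit test case, and no one would have to wait for February 29 to find out whether the code works.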
The isolation of components is not limited to capricious external dependencies, such as time and database state. Mutable global state can also be avoided just by maintaining a unit test suite, primarily because global state makes the tests brittle and difficult to write, since they may fail depending on their execution order. A unit test should consistently fail or pass both when the single test method is run and when the whole suite is. If you're struggling with global state in the testing environment, which may lead to Heisenbugs and similar issues in production, listening to the tests will tell you to change your design to accommodate a simpler testing procedure, and an overall better architecture.
I agree completely with the Google Testing blog's motto:

Debugging sucks. Testing rocks.