Saturday, October 31, 2009

Object-oriented roundup: Halloween edition

This blog has grown much during the last month, and the newest readers may have missed some important articles in the post archive on one of the most interesting topics covered here: object-oriented programming. Specifically, these posts reflect my vision and are meant to discuss good and practical design of classes.
This page will stay here as a quick reference to the key blog posts of the past, for me and for readers, while I work on new articles.

First, I made a roundup of the most crucial and popular OOP posts. I guess you have already read The Repository pattern and Object-oriented terminology if you are here.
  • Domain model is everything discusses the differences between the domain and infrastructure layers and how to get the best from both. Typically an infrastructure layer is reusable and is provided to the programmer as a library or framework, while the domain layer is the core of an application and is commonly written from scratch.
  • Never write the same code twice: Dependency Injection is an introduction to DI techniques, which allow you to produce reusable and testable code. For any non-trivial project, at least considering Dependency Injection is fundamental.
  • When to inject: the distinction between newables and injectables expands on the previous article about DI and distinguishes which classes are really suitable for injection of collaborators and the cases where it is overkill.
  • Factory for everything focuses on the object graph building aspect of injection and on how to implement the [Abstract] Factory pattern, presenting a refactoring example towards it.
  • Object-oriented myths deals with common legends about the supposed futility of good object-oriented design, such as the overuse of factories and the getters-and-setters versus encapsulation debate.
  • The rest of the object-oriented myths is the second part of the previous article on myths, and talks about singletons, lazy loading and Utility classes.
Then, I listed the articles in the SOLID principles series, which comprises posts about the five object-oriented design principles officially formulated by Robert Martin (though they are not the work of a single man). This set of principles strives for loose coupling and reusability of components like classes and interfaces.
I hope you find these resources useful. Anyway, happy Halloween!

Friday, October 30, 2009

Object-oriented terminology

This morning I was teaching a novice php programmer how to use a Facade class I had prepared to decouple his php scripts from the database. After having explained many object-oriented programming terms, I think sharing a glossary would be useful for programmers who are starting to explore php objects.
This article implements the beginner pattern.

State is what is stored in a set of variables, and behavior is a bunch of functions. One of the main innovations of object-oriented programming was striving to keep state and behavior together in a code construct called a class. In a pure object-oriented language like Java, no code lives outside of classes; php is a hybrid language and you can still write procedural code which makes use of classes and objects.

A class is a blueprint for constructing objects. The entities in your domain should be modelled with classes: User, Group, Post, but also FormHelper, AuthenticatorService and similar service classes. The construct should include definitions of internal variables, also known as properties or fields (state), and methods (behavior). Methods are simply functions that are tied to an object and can refer to other members of the class, like its variables, as class members are always in the same scope. In a function, you have parameters and local variables available; in a method, you have parameters, members and local variables.
The act of creating an object is called instantiation and involves the new operator:
$john = new User();
As you have guessed, an object is an instance of its own class.
The creation process can take arguments, which will be passed to a special method of the class called constructor. This method commonly stores the parameters passed in object members, for future use by the other methods.
Think of a class as a car blueprint and of an instance as a real car. You can create many different cars from the class, that share the same structure but can tune the members (changing the color, the model). The methods remain the same: throttle(), brake(), turn($direction) and so on. The member variables are the same, but they are hidden under the hood as ideally we want to interact with objects only by calling their methods.
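As a sketch in php (the class, member and method names here are only illustrative), the car blueprint might look like this:

```php
<?php
class Car
{
    private $_color;     // member variable: per-instance state
    private $_speed = 0;

    // The constructor stores its arguments in object members.
    public function __construct($color)
    {
        $this->_color = $color;
    }

    public function throttle()
    {
        $this->_speed += 10;
    }

    public function brake()
    {
        $this->_speed = 0;
    }

    public function getSpeed()
    {
        return $this->_speed;
    }
}

// Two instances share the same structure but have independent state.
$redCar = new Car('red');
$blueCar = new Car('blue');
$redCar->throttle();
```

Accelerating $redCar does not affect $blueCar: each object carries its own copy of the member variables.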

A class can inherit from (one) parent class. $john = new AuthenticatedUser() is still an instance of User if AuthenticatedUser extends User. The subclass automatically implements all visible methods and variables by borrowing the code from its superclass.
Multiple inheritance can be approximated with interfaces: a class can extend only one parent class, but can implement as many interfaces as you want. However, no code other than method signatures can be inherited from interfaces, as they define a contract the implementor must respect.
The act of passing around references to subclasses and different implementations as if they were references to the base class or interface is called polymorphism.
Expanding on the previous car-based example, the class Car can be extended by Ferrari, which adds luxury methods such as openMiniBar(). I can still pass a Ferrari to the car wash methods as it is still a Car.
Maybe the Ferrari class also implements the interface LuxuryItem, which includes the method getTaxRate(). So if I buy a Ferrari, I can pass all my LuxuryItem instances to the Accountant object, which will calculate the total taxes for this year without even worrying about what kind of items it is dealing with.
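The Ferrari and Accountant example can be sketched in a few lines of php (all class names and the tax rate are illustrative):

```php
<?php
class Car
{
    public function wash()
    {
        return 'washed';
    }
}

interface LuxuryItem
{
    public function getTaxRate();
}

// Ferrari inherits from Car AND fulfills the LuxuryItem contract.
class Ferrari extends Car implements LuxuryItem
{
    public function openMiniBar()
    {
        // luxury-only behavior...
    }

    public function getTaxRate()
    {
        return 0.40;
    }
}

// Polymorphism: the Accountant only knows the LuxuryItem interface,
// never the concrete classes behind it.
class Accountant
{
    public function totalTaxRate(array $items)
    {
        $total = 0;
        foreach ($items as $item) {
            $total += $item->getTaxRate();
        }
        return $total;
    }
}
```

A Ferrari instance can be passed both where a Car is expected (the car wash) and where a LuxuryItem is expected (the accountant).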

The visibility of class members can be altered by making them public, protected or private (in some languages there is also package visibility). Public members can be accessed by anyone; protected members by subclasses; and private members only by code that resides in the class. This encapsulation hides the details of a class and provides abstraction, reducing coupling between classes.
Imagine coupling as strings attached between objects: the more coupling is present in an application, the less the classes are reusable and maintainable. Private members for instance reduce coupling because when you change their name and type you do not have to check any code but the original class source file. The Accountant class has low coupling towards the Ferrari object thanks to the interface that sits in the middle.
Another qualitative metric is cohesion, which measures how well the lines of source code within a module (or a class) work together to provide a specific piece of functionality: a class should have only one responsibility, covered by cohesive methods and variables. A class with low cohesion is like an object that does too much, and not very well: think of a stereo that also prepares coffee.
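A tiny, hypothetical example of how a private member reduces coupling: callers can only go through the public methods, so the $_balance field can be renamed or retyped without touching any code outside this file.

```php
<?php
class BankAccount
{
    // Private: no external code can depend on this field's name or type.
    private $_balance = 0;

    public function deposit($amount)
    {
        if ($amount <= 0) {
            throw new InvalidArgumentException('Amount must be positive.');
        }
        $this->_balance += $amount;
    }

    public function getBalance()
    {
        return $this->_balance;
    }
}
```

The invariant (no negative deposits) is enforced in one place, which is exactly the kind of detail encapsulation is meant to hide.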

Are you confused? If you are learning object-oriented programming, I guess you are. The learning curve is not steep, but it is very long, and you will spend many months and lines of code before mastering "the classes". Still, this article is a reference to clarify terms you will certainly encounter during your journey: many words have a specific meaning in the computer science field, like the ones bolded here.

Thursday, October 29, 2009

The Repository pattern

A Repository is a Domain-Driven Design concept, but it is also a standalone pattern. Repositories are an important part of the domain layer and deal with the encapsulation of object persistence and reconstitution.

A common infrastructure problem of an object-oriented application is how to deal with persistence and retrieval of objects. The most common way to obtain references to entities (for example, instances of the User, Group and Post classes of a domain) is through navigation from another entity. However, there must be a root object on which we call methods to obtain the entities to work with. Manual access to the database tables violates every law of good design.
A Repository implementation provides the illusion of having an in-memory collection available, where all the objects of a certain class are kept. Actually, the Repository implementation has dependencies on a database or filesystem where thousands of entities are saved, as it would be impractical to maintain all the data in the main memory of the machine which runs the application.
These dependencies are commonly wired and injected during the construction process by a factory or a dependency injection framework, and domain classes are only aware of the interface of the Repository. This interface logically resides in the domain layer and has no external dependencies.

Let's show an example. Suppose your User class needs a choicesGroups() method that lists the groups an instance of User can subscribe to. There are business rules which prescribe building criteria from internal data of the User class, such as the role field, which can assume the values 'guest', 'normal' or 'admin'.
This method should reside on the User class. However, to preserve a decoupled and testable design, we cannot access the database directly to retrieve the Group objects we need from the User class; otherwise we would have a dependency, maybe injected, from the User class, which is part of the domain layer, on an infrastructure class. This is what we want to avoid, as we want to run thousands of fast unit tests on our domain classes without having to start a database daemon.
So the first step is to encapsulate the access to the database. This job is already done in part by a DataMapper implementation: [N]Hibernate, Doctrine2, Zend_Entity. But using a DataMapper directly in the User class would let its code do a million different things and call every method on the DataMapper facade: we want to segregate only the necessary operations in the contract between the User class and the bridge to the infrastructure, as methods of a Repository interface. A small interface is simple to mock out in unit tests, while a DataMapper implementation usually can only be substituted by an instance of itself which uses a lighter database management system, such as sqlite.
So we prepare a GroupRepository interface:
interface GroupRepository
{
    public function find($id);

    /**
     * @param array $values   pairs of field names and values
     * @return array (unfortunately we have no Collection class in php)
     */
    public function findByCriteria($values);
}
Note that a Repository interface is pure domain and we can't rely on a library or framework for it. It resides in the same domain layer as our User and Group classes.
The User class now has dependencies only towards the domain layer. It is rich in functionality and not anemic, as we do not have to break the encapsulation of the role field, and the method is cohesive with the class:
class User
{
    private $_role;

    // ... other methods

    public function choicesGroups(GroupRepository $repo)
    {
        return $repo->findByCriteria(array('roleAllowed' => $this->_role));
    }
}
Now we should write an actual implementation of GroupRepository, which will bridge the domain with the production database.
class GroupRepositoryDb implements GroupRepository
{
    /**
     * @param $mapper It can also be an instance of Doctrine\EntityManager
     */
    public function __construct(Zend_Entity_Manager_Interface $mapper)
    {
        // saving collaborators in private fields...
    }

    // methods implementations...
}
In testing, we pass to the choicesGroups() method a mock or a fake implementation which is really an in-memory collection, and which can also have utility methods for setting up fixtures. Unit testing involves a few objects, and keeping them all in memory is the simplest solution. Moreover, we have inverted the dependency, as now both GroupRepositoryDb and User depend on an abstraction instead of on a concrete implementation.
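Such a fake, in-memory implementation might look like the following sketch (the add() fixture helper and the array-based matching are my own illustration, not part of any mapper's API; entities are simplified to plain arrays):

```php
<?php
interface GroupRepository
{
    public function find($id);
    public function findByCriteria($values);
}

// A fake Repository for unit tests: just a php array behind the interface.
class InMemoryGroupRepository implements GroupRepository
{
    private $_groups = array();

    // Utility method for setting up fixtures in tests.
    public function add($id, array $fields)
    {
        $this->_groups[$id] = $fields;
    }

    public function find($id)
    {
        return isset($this->_groups[$id]) ? $this->_groups[$id] : null;
    }

    public function findByCriteria($values)
    {
        $result = array();
        foreach ($this->_groups as $group) {
            // A group matches when every criteria pair appears in it.
            if (array_intersect_assoc($values, $group) == $values) {
                $result[] = $group;
            }
        }
        return $result;
    }
}
```

A test can populate it with two or three groups and pass it straight to choicesGroups(), with no database daemon involved.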

Another advantage of the Repository and its segregated interface is the reuse of queries. The User class has no knowledge of any SQL/DQL/HQL/JPQL language, which is highly dependent on the internal structure of the Group class and its relations. What is established in the contract (the GroupRepository interface) are only logic operations which take domain data as parameters.
For instance, if Group changes to incorporate a one-to-many relation to the Image class, the loading code is not scattered throughout the classes which refer to Group, like User, but is centralized in the Repository. If you do not incorporate a generic DataMapper, the Repository becomes a Data Access Object; the benefit of isolating a Repository implementation via a DataMapper is that you can unit test it against a lightweight database. What you are testing are queries and results, the only responsibility of a Repository.
Note that in this approach the only tests that need a database are the ones which instantiate a real Repository and are focused on it, and not your entire suite. That's why the DDD approach is best suited for complex applications which require tons of automatic testing. Fake repositories should be used in other unit tests to prevent them from becoming integration tests.

Even the old php applications which once upon a time only created and modified lists of entities are growing in size and complexity, incorporating validation and business rules. Generic DataMappers a la Hibernate are becoming a reality also in the php world, and they can help the average developer avoid writing boring data access classes.
However, their power should be correctly decoupled from your domain classes if your core domain is complex and you want to isolate it. Repositories are a way to accomplish this decoupling.

Wednesday, October 28, 2009

Php login with Zend_Auth

Zend_Auth is the component of the Zend Framework which provides a standard authentication mechanism for web applications users. It has few dependencies (on Zend_Loader and on Zend_Session for default persistence of the authentication) and, as other framework components, will let you concentrate on the user experience instead of worrying about boilerplate code.

Zend_Auth is the facade class for this component and it is implemented as a Singleton. If you want to access it in your business classes, you may want to inject it in the constructor, relieving your code from coupling and making it simpler to unit test.
The shortest way to access Zend_Auth is by requesting its Singleton instance (I hope you will write this specific code only in a factory if you have a well-designed object-oriented application):
$auth = Zend_Auth::getInstance();
The $auth object has some methods which encapsulate the functionalities you have been busy reinventing for every project you are participating in. For instance to authenticate a user, just set up an adapter (more information on this later in the post) and write the following code:
$result = $auth->authenticate($adapter);
$result is a Zend_Auth_Result object, and has a method getCode() which you can call to access the result code created by the adapter during the request for authentication. 
switch ($result->getCode()) {
    case Zend_Auth_Result::FAILURE_IDENTITY_NOT_FOUND:
    case Zend_Auth_Result::FAILURE_CREDENTIAL_INVALID:
        // bad...
        break;
    case Zend_Auth_Result::SUCCESS:
        // good...
        break;
}
Maybe you want to check if your user is already authenticated before redirecting him to a login form:
if (!$auth->hasIdentity()) {
    // redirect wherever you want...
}
or you want to know the username that was passed to the adapter (again, more on setting up the adapter of your choice and passing it username and password later):
$name = $auth->getIdentity();
or certainly you want, somewhere, to log out the user when he presses the Logout button:
$auth->clearIdentity();
As a side note, remember that username and password assume the generic names of identity and credential throughout all classes contained in the Zend_Auth component. Moreover, the default storage for successful Zend_Auth authentication attempts is Zend_Session, which means a session cookie will be set on the client and the username will be saved as a session variable. Typically the session will last until the browser is closed, and you have to provide alternate storage if you want permanent authentication a la facebook.

An adapter is an object that bridges Zend_Auth with different authentication backends: it links together the infrastructure code of Zend_Auth with your business and domain layer. For instance, you can log in via Ldap or via a relational table, by specifying the identity and credential column names:
$authAdapter = new Zend_Auth_Adapter_DbTable(
    $dbAdapter, 'users', 'username', 'password'
);
where $dbAdapter is a Zend_Db adapter instance, and the table and column names shown here are examples to adapt to your schema.
Don't want to tie your authentication to another zf component? No problem: it is indeed very simple to create your own adapter which uses PDO or whatever you want, even ini files. I recently worked on a server where PDO was not available and I could only call mysql_query() to access the database. Pragmatically, I wrote this adapter in about five minutes:

require_once 'Zend/Auth/Adapter/Interface.php';
require_once 'Zend/Auth/Adapter/Exception.php';
require_once 'Zend/Auth/Result.php';

class MyAuthAdapter implements Zend_Auth_Adapter_Interface
{
    private $_table = 'oss_users';
    private $_username;
    private $_password;

    public function __construct($username, $password)
    {
        $this->_username = $username;
        $this->_password = $password;
    }

    /**
     * @throws Zend_Auth_Adapter_Exception
     * @return Zend_Auth_Result
     */
    public function authenticate()
    {
        $q = mysql_query("SELECT * FROM {$this->_table} WHERE nick = '{$this->_username}'");
        if (!mysql_num_rows($q)) {
            return $this->_getResult(Zend_Auth_Result::FAILURE_IDENTITY_NOT_FOUND);
        }
        if (mysql_num_rows($q) > 1) {
            throw new Zend_Auth_Adapter_Exception('Too many results.');
        }

        $row = mysql_fetch_array($q);
        if ($row['pwd'] != $this->_password) {
            return $this->_getResult(Zend_Auth_Result::FAILURE_CREDENTIAL_INVALID);
        }

        return $this->_getResult(Zend_Auth_Result::SUCCESS);
    }

    protected function _getResult($code)
    {
        return new Zend_Auth_Result($code, $this->_username);
    }
}
Of course, $username and $password should be escaped in some way before being passed to the constructor, since PDO is not used here. After having created this object, I only had to pass it to Zend_Auth::authenticate() to complete the process, as I explained earlier in this post.

I hope you feel the power of Zend_Auth and the time it can save you in many different php projects. If you already have experience with Zend Framework, it is the right time to start using a standard solution.

Tuesday, October 27, 2009

Programming Cone of Experience

The Cone of Experience is a model formulated by the educationist Edgar Dale in which he summarized the various learning media and their effectiveness. Like every model, it has limits and should be adapted to your personal vision, but since learning is something a good developer does every day to improve himself, having a methodology for acquiring new knowledge is fairly essential.

The Cone shows different activities in increasing order of memory retention. People are generally able to remember more if they are learning through the lower and larger layers, and less if they are working in one of the thinner top layers. We will show an example of a developer learning Test-Driven Development and good software design, comparing different layers of the cone. There are many renditions of phrases like "People remember 10% of what they read, 20% of what they see, ... 90% of what they do", but these numbers are totally made up and there are no quantifications in Dale's original work.
However, this is the Cone:

Let's analyze the different ways a developer can learn a new technology or practice, such as TDD. We will start from the less powerful experiences and descend towards the end of the cone, where the experiences have a great effect on memory and we are less prone to forget and confuse information.
I want to learn TDD, so what can I do?
  • I read a book. However, reading one, two or ten books does not make me an expert on TDD or software design, and if I don't refresh my knowledge often I will probably forget everything but Red-Green-Refactor.
  • If the book has nice figures, they will improve my understanding and I will be able to remember at a glance different concepts that fit together in a single picture.
  • I listen to a podcast or see a video of Misko Hevery doing a talk at Google. This is more impressive for the brain and typically will have a longer-lasting effect on me than reading a book.
  • If I see him writing code in real time, the experience is even better.
  • The next step is simulating a real-world situation by writing code for a toy project. This is also probably the last step we can take before real practice.
  • The final step is real experience: TDD some classes for my preferred application and analyze the results, then start again. Real experience is often not available for learning, or it comes with restrictions, fortunately: think of a surgeon who practices on human beings. He must gather real-world experience, but he stays under the strict control of senior colleagues for years.
So suppose you want to learn TDD or a new technology. What should you do? Skim a book, of course, to know the theory. But don't stop there: write code snippets, compile real code, work on a small project you can afford to throw away. If you have the time and money to invest, go to a conference where in one hour a speaker will give you general knowledge of the subject.
If it is available, a live coding session is the best thing to start with a new technology or language. That's why in university exercise sessions and laboratories we learn much more than we would by simply staring at a professor. Taking notes is a step further down in the Cone of Experience, but if you are missing pieces because you are too busy writing down everything to bother listening to the professor, it is not an improvement.

The hello world applications presented by many frameworks are not meant simply to be read. They are made to be compiled and hacked. The usefulness of a hello world program resides in helping you set up the environment and the tools to build the simplest application - the one that does nothing. By experiencing the steps needed to build such a small binary, you start to grasp how to work with the new technology.
First-hand experience is the most powerful learning tool for the majority of people in the world and we should give it the right priority.

Monday, October 26, 2009

Validity of development tactics

Many software development practices and methodologies are presented as panaceas and silver bullets, seeming to be valid in every domain and situation. But a responsible developer must be pragmatic (and a successful book and series started from this term) about where and when he applies his preferred technology, knowledge, or workflow.
I blogged some weeks ago about the overuse of a specific technology, but now I am focusing on more general methodologies and paradigms like object-orientation, particularly when applying them means killing a fly with a bazooka.
Let's start with a series of examples regarding field of application:
  • Test-Driven Development might be the best and most controversial XP practice and is widely applied for producing robust and maintainable software. However, it is not easily applicable to applications which are not object-oriented, as you cannot easily isolate pieces of functionality in structured programming; moreover, there are specific tasks where TDD is an obstacle, like creating a graphical user interface or a throwaway prototype.
  • Limitations are intrinsic in Design Patterns: Ralph Johnson, one of the authors of the original Design Patterns book, affirmed along with his fellow authors in a recent interview that functional programming, for instance, requires different patterns than the ones presented in the book.
  • Digging further, even object-oriented programming is not applicable in every domain and deployment node, mainly where there is no infrastructure to provide polymorphism and inheritance, as in embedded applications, which typically employ the C programming language, or in low-level operating system routines. Some zealots will say that an object-oriented website will never scale, but this is an exaggeration.
  • Agile software development is a great methodology for delivering value to a customer, but it is not suitable if your organization does not support it, or software development is not your job. One could argue that software development should be managed by professionals, but this is not usually true in the real world, where in-house programmers have more than one responsibility.
  • At the programming level, using a Factory to build your objects is a good choice if they require external dependencies, but sometimes this is not the case. For instance, objects that have to be serialized are commonly required not to have external dependencies, as you can always pass any service class through the stack while calling a method. These objects are often declared newables.
  • Version control is great, and even solo developers should give it a try. I put nearly everything I produce in a Subversion repository, but I do not store binaries in it, for example, since they can be generated from the source code stored in the repository and would only slow down the server while performing enormous diffs. In the previous post I said that Subversion and similar applications are general-purpose systems, but even here there is a limitation in what they are meant for.
I can go on for hours and I'm sure you can find a flaw in every single practice I can mention here. As Fred Brooks wrote in his famous essay:
There is no single development, in either technology or management technique, which by itself promises even one order of magnitude [tenfold] improvement within a decade in productivity. --Fred Brooks
And it is indeed true that every practice we implement has limits in its field of application and in the productivity improvement it can give back to us. There is the temptation to apply our new knowledge or technology straight to every problem at hand, but instead of performing well, the technique usually jumps the shark and produces horrible results like the ones I listed previously.
Consider the TDD example: test-first programming helps us to produce decoupled components for our applications and shrinks the time required for bug fixing. So should we apply this technique to user interfaces? I would say a big No, as it would slow down our development, cluttering the codebase with brittle tests that have to change every day. So the improvement in productivity is not gained on all the production code, but only in the domain layer, which is fortunately the most important one. You may also have to write other infrastructure code which serves as the glue between layers. Is it useful to test-first this code? I think it is not. The effort spent automating tests to discover wiring or user interface bugs is often not worth the value they provide.
You do have to test the wiring of your application eventually, but unit tests like the ones prescribed by TDD are not useful there.

Take care with what you learn, and also research and discover when you can use it. Like in physics, no formula is always valid and you must put it in context before starting to tackle a problem with your swiss-army knife.

Saturday, October 24, 2009

How to transform a broken laptop into a server

I bought my second-hand Toshiba Tecra A3 for about 500 Euros while still in high school, with the earnings from my first web projects. During the subsequent years, it slowly fell to pieces, one step at a time:
  • the first things that failed were the speakers;
  • then, the combo Dvd reader and Cd writer suddenly became unable to read an entire disc;
  • third, the hard disk began to be unreliable, as files saved on a particular portion of it became corrupted instantly;
  • then the lcd screen suffered a hit and was rendered useless, as a big stain of color filled (and still fills) the whole screen.
In this situation, I ordered a new ASUS Eee PC from eBay to substitute my former portable workstation. This was one of my best purchases, and the Tecra A3 was put in a box and forgotten. However, I am glad that I did not throw it away.

Towards the end of 2008, I felt the need for a development/staging server at home, as part of my freelance work with php applications. The server did not need to run 24/7, but only during my work sessions, and with only me and a few other people to serve. I was aware that servers which run Apache and Subversion do not need high-end components, and I started to gather ideas on how to recycle my old laptop.
The CD reader was not required for a server, and I could not change it easily anyway. The screen and the speakers were useless for the same reason, but a hard disk is something a server usually needs. So I bought a cheap 10 gigabyte Eide unit on eBay for less than 30 Euros. Substituting the old drive with this one restored the machine's status as a working machine.

The next problem was how to install software on a blank machine like this one, without a CD drive and a screen to complete the graphical or command line installation process.
For the screen problem, I temporarily attached my primary machine's monitor, and quickly figured out that installing a command-line system like Ubuntu Server would give me a system accessible via ssh, without the need for a real screen, as I would be able to use the server from other machines like the Eee PC and my primary computer.
The installation was more difficult, as I had to find another medium for the Ubuntu iso image. My laptop is not capable of booting via Usb, so I chose an installation booting via Ethernet (PXE). This involved setting up a tftp server on my desktop machine to host the installation files; I suggest using a Usb card installation if you want to do the same thing.

Normally, Ubuntu systems run the NetworkManager application in the user bar of Gnome, providing a list of wireless networks to connect to. I wanted to have a connected system at startup, since without an already established connection the ssh login is not available. Thus, I configured wpa_supplicant, the daemon used by NetworkManager as a backend, to automatically connect to my home WPA wireless network. Obviously I installed the ssh daemon, and after this step I was able to remove the temporary monitor and use my new server remotely.
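A minimal wpa_supplicant configuration for a WPA-PSK home network might look like this (the ssid and passphrase are placeholders, and the exact file path and service setup vary by distribution):

```
network={
    ssid="my-home-network"
    psk="my-wpa-passphrase"
    key_mgmt=WPA-PSK
}
```

The daemon can then be started against a wireless interface with something like wpa_supplicant -B -i wlan0 -c /etc/wpa_supplicant.conf, followed by a dhcp client on the same interface.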

Once I had a working machine, I installed apache, php, mysql and subversion from the Ubuntu repository, running apt-get over ssh, which is still the interface I use to administer the server today. Now I have a web and source control server that I periodically back up just in case something goes wrong. Reliability is not critical, as even if the server explodes I can roll out my backups the next day on my desktop machine.

There is an enormous amount of electronic garbage out there to dispose of, and recycling an old pc is by far cheaper than buying full-featured servers just for testing purposes (unless you're doing a stress test): the servers on the market are meant for production sites, while php development activities, which do not require compilation, do not stress even a poorly equipped machine, as they include only a few http requests per second to satisfy. If you manage to reuse old hardware, you are doing a favor to yourself and to the environment.

Friday, October 23, 2009

Taking away the pain from programming

Being a developer is often described as fun and challenging, but also as a dull corporate job. Indeed there is some truth in both points of view, since much of the fun depends on the domain a developer is dealing with.
Moreover, some activities can be very boring and frustrating, leaving you clueless about what is not working. Suppose you're performing manual tests in the browser for your web application, loading pages and submitting forms. When changes to the code are fresh, you'll often end up with blank pages, forms that fail to submit, or forms that persist incorrect data.
So what can you do? Start automating your tests and concentrate test cases on smaller units, so that when a test fails there are only one or two things that could have gone wrong. But the difference between manual/integration and automated/unit testing is only one example of a more general automation pattern.

I call this approach taking away the pain. Let's dig into some real world situations:
  • you and your colleague have just overwritten each other's changes to the live php files on a production website. I guess you two should take the time to configure a subversion repository, so that every version of a file is stored and restorable; moreover, different changes to plain text files like source code are merged (almost) harmlessly. Maybe you can even build a staging server to extensively test your application.
  • deployment of your web application puts you under pressure, as you have to complete twenty-three steps in the right order. So why not automate it with a phing/ant script?
  • you are bored of writing similar Sql queries for inserting into and updating your tables, when the only things that change between two queries are the table and field names. Why not implement or extend an ActiveRecord class?
  • you are bored to death of writing forms for hundreds of entities without any behavior. Maybe you should look for a general solution such as Zend_Form, which abstracts away the html rendering and provides automatic population and validation of input values. You can automate even more and write a code generation tool, followed by fine tuning of your elements by hand.
  • you are even more tired of duplicating the list of fields and all their validation rules in both the domain layer and the user interface. So start using Naked Objects.
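As a sketch of the ActiveRecord idea from the list above, the following toy class (all names are hypothetical, and a real implementation would also run the query through PDO or mysqli) derives the INSERT statement from the table and field names alone, which is exactly the part that never changes between entities:

```php
<?php
// Minimal ActiveRecord-style sketch: subclasses only declare a table name,
// the SQL generation is shared.
abstract class ActiveRecordSketch
{
    protected $fields = array();

    // Each concrete record declares its own table.
    abstract protected function getTableName();

    public function __set($name, $value)
    {
        $this->fields[$name] = $value;
    }

    public function __get($name)
    {
        return isset($this->fields[$name]) ? $this->fields[$name] : null;
    }

    // Builds an INSERT with placeholders, ready for a prepared statement.
    public function buildInsertSql()
    {
        $columns = array_keys($this->fields);
        $placeholders = array_fill(0, count($columns), '?');
        return 'INSERT INTO ' . $this->getTableName()
             . ' (' . implode(', ', $columns) . ')'
             . ' VALUES (' . implode(', ', $placeholders) . ')';
    }
}

class User extends ActiveRecordSketch
{
    protected function getTableName()
    {
        return 'users';
    }
}

$user = new User();
$user->nickname = 'giorgio';
$user->email = 'someone@example.com';
echo $user->buildInsertSql();
// INSERT INTO users (nickname, email) VALUES (?, ?)
```

An UPDATE builder would follow the same pattern, looping over the same $fields array.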
The advantages of "taking away the pain" as a behavioral pattern are multiple: you automate a reusable solution to your problems, which may come in handy in more than one project. Plus, crafting it is far more challenging and fun than continuing to write boilerplate code: you're applying automation and creating abstractions and other levels of indirection.
The problem is that it is often very challenging, sometimes even too difficult, for a single person to come up with a reusable, general solution to these kinds of problems. Consider the following situation:
You achieve persistence ignorance in your complex domain layer, and your User/Group/Post/CommentRepository classes do not extend anything. They are POJOs (or Plain Old <insert your language here> Objects) and contain much logic. But writing the UserDao, GroupMapper, etc. classes to persist the instances in a database is boring as hell, since you have to reflect on the private fields of every object and put each one in a table field. So you decide to find a generic solution, since this problem persists (no pun intended) in subsequent projects.
And you find yourself writing a full-fledged ORM. Every User/Group/Post class has different peculiarities that were treated one at a time in your specialized classes, but writing a single mapper class that can deal with all the special cases quickly becomes impossible. You struggle with lazy loading, annotation formats, detaching of entities, tracking policies, special mapping of value objects, relationships... The component grows to reach thousands and thousands of lines of code.
Indeed, an abstract DataMapper layer is very difficult to write for a single person, unless specifications and similar implementations already exist: the php implementations of the generic DataMapper pattern are ports of Hibernate/JPA.
Fortunately, there is open source, and we can participate in a project with other people, sharing our knowledge and time. I think this is why open source is so successful: it allows us to share the effort of taking away the pain from programmers' lives.

Thursday, October 22, 2009

The Art of Unit Testing review

The Art of Unit Testing: with Examples in .NET is a very complete book on the testing paradigm, which guides the naive developer from the start of his journey into unit testing to more elaborate techniques. It starts from manual integration testing and traces the evolution towards test automation frameworks and programmatic stub/mock generation.
As the title says, the book is focused on .NET technology and presents C# code samples. However, as with other software engineering books, the knowledge and art taught here are of general interest, and these patterns can be successfully applied in every language where an xUnit tool exists, especially since the book starts out testing components without any external library or tool.

Throughout the book, valid topics are treated extensively and without specific assumptions:
  • Stubs and mocks: the author starts from manual generation (similar to the self-shunting and fake object patterns) and slowly shifts to programmatic generation. He also recommends specifying expectations on only one mock per test (with as many stubs as you like), a suggestion I have rarely seen correctly followed in production.
  • Test code organization: how to map test cases to production code is a fundamental question your codebase organization has to answer, to allow isolation of components. Other interesting topics are automation of test runs and refactoring of test classes into an Api that can be shared when testing similar components.
  • Pillars of good tests: the author explains trustworthiness, maintainability and readability of tests. This means your tests should pass (or, more importantly, fail) consistently, you should be able to change the specifications of your system without throwing away tests, and you should maintain a good standard in naming and asserting to ensure your test code is of the same quality as the production code it covers.
  • Integrating unit testing into the development process: testing is a technique that should be applied constantly during a project's development. A chapter is dedicated to helping you evangelize these practices in your organization, answering the common questions of people who hinder unit testing.
  • Writing tests for legacy code: one of the most difficult problems in unit testing, testing a system that is not separated into units, is tackled with a pragmatic mindset. Obviously legacy code is a problem that cannot be solved completely or with a general approach, but often a 20% testing solution is enough to perceive benefits.
  • A list of .NET tools for web, ui, database testing and much more; this section will interest .NET developers only, and in fact I skipped it.
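To make the first point concrete, here is a hand-rolled stub and mock in php (the book's examples are in C#, and all the class names here are made up for illustration): the stub only feeds canned data in, while the single mock carries the test's expectation.

```php
<?php
interface MailSender
{
    public function send($address, $message);
}

// Stub: returns canned data, the test makes no assertions on it.
class StubUserRepository
{
    public function findEmail($userId)
    {
        return 'user@example.com';
    }
}

// Mock: records the calls it receives so the test can verify them.
class MockMailSender implements MailSender
{
    public $sentTo = array();

    public function send($address, $message)
    {
        $this->sentTo[] = $address;
    }
}

// The class under test.
class Notifier
{
    private $users;
    private $mailer;

    public function __construct($users, MailSender $mailer)
    {
        $this->users = $users;
        $this->mailer = $mailer;
    }

    public function notify($userId, $message)
    {
        $this->mailer->send($this->users->findEmail($userId), $message);
    }
}

// The "test": exactly one mock carries the expectation,
// the stub is only there to feed data in.
$mock = new MockMailSender();
$notifier = new Notifier(new StubUserRepository(), $mock);
$notifier->notify(42, 'Hello');
assert($mock->sentTo === array('user@example.com'));
```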

This book is probably the most complete guide in circulation nowadays for the test-infected beginner. You could give it to a developer who does not know anything about unit testing apart from the name of the practice, and he would be able to read it all and earn an honest theoretical preparation on the topic.
One thing I did not like is the scant focus on Test-Driven Development, which is probably the best way to ensure you design a testable application and keep writing testable code. But in a beginner's reading there is probably no need to stress TDD before learning how to write good tests.
As a seasoned test-infected developer, I found some parts targeted at a wider audience pretty boring, for example where the differences between state based and interaction based testing are explained. The technical level does not rise much even towards the last chapters; this is indeed a pro for the inexperienced developer, but annoying for me, since I have already written hundreds of tests in my life and already solved some of these problems.
So if you want to master unit testing, read this book to learn the theory. Then start practicing: after you have written a thousand tests your perspective will be very different, and you will have experienced a paradigm shift with respect to when you wrote code that just works.

Wednesday, October 21, 2009

Advanced Zend_Form usage

Zend_Form is the component of Zend Framework that I enjoy the most: it implements reusable forms and inputs as classes and objects, with automatic reinsertion of values during server-side validation; validation that is shared among all instances and provided out of the box by Zend_Validate. If you have ever duplicated a form for editing and adding an entity, or felt the pain of manually populating text inputs, you know that being able to reuse a form is a killer feature; in fact many php frameworks provide a form library.
Today I have gathered some know-how I have learnt while taking the Zend_Form component to its limits.

Ignored elements
Every Zend_Form_Element instance has a setIgnore() method. You can also pass an 'ignore' flag equal to true in the $options array parameter of the constructor, respecting the framework convention, to obtain the same result.
The purpose of this option is to exclude the value assumed by the input from the result returned by getValues(), even if you pass the entire POST request to isValid() (which contains a value for this specific element, since it was present on the client side and filled for other purposes). This option comes in handy when using Zend_Form_Element_Captcha or Zend_Form_Element_Submit.
$button = new Zend_Form_Element_Submit('submitButton', array('ignore' => true, 'label' => 'Send!'));

Decorators
Both elements and forms (descendants of Zend_Form_Element and Zend_Form respectively) support a stack of decorators used for rendering. The first decorator, ViewHelper, calls a view helper defined by the element or form to produce the basic html; this string result is then passed along a chain of subsequent decorators, each of them working on the previous one's output.
This design is an unconventional implementation of the Decorator pattern, and allows decorators to be reused throughout every kind of form element. The standard decorators cover a vast variety of cases and let you produce custom html for your forms. You can even write your own by extending Zend_Form_Decorator_Abstract.
$element = new Zend_Form_Element_Text('description');
$element->setDecorators(array(
    'ViewHelper', // setDecorators() replaces the default stack, so ViewHelper must be restated
    array('Description', array('tag' => 'div', 'class' => 'description')),
    array('HtmlTag', array('tag' => 'dd')),
    array('Label', array('tag' => 'dt'))
));

Subforms
Subforms are instances of Zend_Form incorporated in a parent form, providing reusability of logical groups of elements. I generally prefer to use Zend_Form instances with the adequate decorators instead of Zend_Form_SubForm ones (Zend_Form_SubForm is a subclass of Zend_Form anyway).
Not only can you reuse forms as fieldsets of a bigger one, you can also encapsulate the values of a subform's elements, transforming the main result into a recursive array.
$form = new Zend_Form();
// ... add the elements ...
$form->setElementsBelongTo('my_key'); // call this after all elements have already been inserted
// ...
if ($form->isValid($postData)) {
    $values = $form->getValues();
    // $values['my_key'] is an array containing the values of the subform's elements
}

Dojo integration
The Zend_Dojo component contains form elements which extend or substitute the standard Zend_Form ones, augmenting their capabilities with Dojo Toolkit widgets.
To set up the right plugin paths, simply make your form a subclass of Zend_Dojo_Form instead of Zend_Form, or pass the form instance to the Zend_Dojo::enableForm() static method. Also make sure you output the $this->dojo() view helper in your layout or view script, to print the mandatory script tags.
For instance you can create a real-time filtered select:
$element = $form->createElement('FilteringSelect', 'nameOfelement');
$element->setMultiOptions(array('yellow' => 'Light Yellow', 'blue' => 'Blue' /* ... */));
This type of element can also work with a remote data store, in case you have one million select options and want to lazy load them. There is even a configurable WYSIWYG editor in the dojo component, available via CDN without having to upload a single js file to your server.

Captchas
With Zend_Form_Element_Captcha, you can create different types of captcha to insert in your forms, for the sake of stopping spam bots from submitting them. A captcha not correctly answered will invalidate the form result during the call to Zend_Form::isValid(). Though you can specify ascii art captchas and even on-the-fly generated images, this is the fastest way to include a captcha in a form:
$element = new Zend_Form_Element_Captcha('captcha', array('captcha' => 'Figlet'));
You may be worried about how to automatically test the submission of a form which contains a captcha, since it is built precisely to avoid being answered by a machine like your test runner. However, there is a trick that accesses session variables to find the right answer for the captcha during the test method run.

I hope this quick overview of functionality satisfies you. Zend_Form is a complex component, but the learning curve is really worth the benefits it brings to your applications. You can even use it in isolation, without the ZF Mvc stack, since the only requirement is the Zend_View object every element and form instance needs to render itself.
With Zend_Dojo_Form you do not even need to upload javascript plugins to improve the user experience: dojo is loaded remotely from Google or Aol servers. Zend_Form is a complete, liberally licensed, object-oriented open source solution for managing forms and inputs, the most powerful choice I have ever seen in php and one of the top components of Zend Framework.

Tuesday, October 20, 2009

Regular expression 101 in php

Regular expressions are an old but powerful tool for pattern matching: they date back to the 1960s and have survived till today. They can be used to specify the logical structure of a string and come in handy for validating generic input or processing string data, as valid substitutes for the classic string manipulation functions (which work at a lower level, character by character, since in php they are borrowed from C). Here's a presentation of basic regular expression-fu, with examples written using php core libraries and a running example as a php script.

In php, there used to be two engines for regular expressions: POSIX Regex (ereg(), now deprecated) and Perl Compatible Regular Expressions. We are going to explore a little of the capabilities of the PCRE engine, whose syntax is supported in many languages such as Javascript and Python.

Defining a pattern is the first requirement for using the regular expression engine. A pattern is a string that respects a formal language and thus represents a (possibly infinite) set of ordinary strings. There is no physical difference from a normal string in php, as the type of the variable is still string, but when passed to a preg_*() function it assumes a particular meaning. Other languages use objects for pattern storage, to provide type safety.
The pattern has to reflect the structure of the string you want to check. Typically the string is part of user or application input and you cannot anticipate its content. A regular expression is a means of specifying its structure and rejecting the input in case it does not conform to the rules. Every time a part of a string conforms to a pattern, we say that the pattern matches it (or the opposite, the string matches the pattern).

The simplest pattern you can write is an alphanumeric string. In PCRE, the pattern must be enclosed in a pair of delimiters, commonly slashes "/".
- '/foo/' matches 'foo', but also 'foooo' and 'my fooooo'.
Obviously a literal pattern has little utility, so it's better to specify a character range with a quantifier:
- '/([a-z]{1,})@gmail\.com/' matches any Gmail address composed of lowercase characters. The range is specified in square brackets [], where you can put single characters or alphanumeric intervals separated by '-'. The possible repetitions of the subpattern go from 1 to infinity in this example; the specification can be, for instance, zero to four times: {0,4}. The zero quantification is useful for patterns that may be absent. There are shortcut quantifiers such as * and +, but I think that for this demonstration a pattern with explicit braces is clearer.
- '/gmail.com/' matches 'gmail.com' but also 'gmailicom'. Beware that non-alphanumeric characters have special meanings when used outside a range, and should be backslashed if you want to match literal values. The dot character is normally a wildcard for anything other than a newline.
By using preg_match($pattern, $string), you can check that any input conforms to some simple rules. The function returns 1 if the pattern is found in the string and 0 otherwise; it stops searching after the first match, so use preg_match_all() if you need to count every occurrence.
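A minimal, runnable illustration of the return value, reusing the Gmail pattern from above:

```php
<?php
$pattern = '/([a-z]{1,})@gmail\.com/';

// 1: the pattern is found somewhere in the string
// (preg_match() stops at the first match)
$found = preg_match($pattern, 'write to someone@gmail.com for info');

// 0: no part of the string matches the pattern
$notFound = preg_match($pattern, 'no address here');

var_dump($found, $notFound); // int(1), int(0)
```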

Another great feature of regular expression matching is the extraction of data via capturing subpatterns. Enclosing parts of the pattern in parentheses defines a subpattern whose actual content can be returned by preg_match().
$matches = array();
preg_match('/([a-z]{1,})@gmail\.com/', 'address@gmail.com', $matches);
// $matches[1] now contains 'address'.
$matches, the optional third argument of preg_match(), is an array passed by reference where the subpattern matches will be saved. The first element, $matches[0], is always the part of the string that matches the entire regular expression, while the subsequent ones correspond to the subpatterns in order of definition.
Without regular expressions, you would need to explode the string by different characters and navigate between the pieces to find what you want, or do complicated substring slicing and calculations. Regexes are the standard tool for doing this efficiently, introducing fewer bugs and edge cases along the way.

Here is a script containing these examples, and many others, in the form of a phpunit test case. Seeing regular expressions at work and experimenting by yourself is the best way to learn the PCRE pattern syntax. Remember to test your patterns thoroughly and not to duplicate them across your application.
Regular expressions are a vast field and there are many more topics to learn, like alternation and anchors (|, ^ and $, respectively). The PCRE engine supports a lot of special characters and ranges.
These example patterns can be improved, too. I would be happy if you just started to consider the basic functionality like I did, as it can save you from the nightmare of string manipulation functions. Although long patterns quickly become unreadable, they express intent better than two screens of substr() calls. You have a complete core php library at your disposal.

Monday, October 19, 2009

Data structures fundamentals

This is an introductory article, tagged under basic. Please do not feel offended if these topics are obvious to you, as there was a time when you and I did not know anything about software.

Algorithms and data structures is a fundamental course in computer engineering. This post will explain the basic data structures many programmers encounter in their daily work, provided by libraries, programming languages, or userland implementations. If you have ever heard of stacks, lists and trees, you have already come in contact with data structures.

What is a data structure?
A data structure is a logical model for organizing information. The reasons for this organization are primarily efficiency. However, while the physical model of data and metadata is typically sequential and indexed (think of a computer's memory), the logical model assumes very different forms.
In the object-oriented paradigm, this distinction is typically achieved by encapsulating the physical model in class private members and abstracting the logical model in an interface the class implements. Thus, it is not only the physical model of the data that is kept in a specific implementation, but also the algorithm implementations that perform the operations prescribed by the abstraction. The physical data structures and their algorithms are the subjects taught in basic courses in college.
Back in the C++ days, there were no interfaces but only base classes with pure virtual methods (abstract data types), which the chosen implementation inherited from. Still, there is no real difference in the physical data structures used, as the lower layer is the same: a sequence of blocks in Ram or disk memory. Lowering the level further is usually necessary only to exploit the performance of a particular medium (for instance in data structures for backup tapes).
A note on implementations: every abstraction leaks some information about the underlying data structure, and data types are no exception. In this case, the computational complexity of the various methods is visible under the cover of an interface. The classic example is the comparison of an array and a linked list: while the first offers O(n) insertion and O(1) direct access, the second exactly inverts the complexities of these operations.
In some cases, the abstraction is not even present and the methods reflect the internal organization of the structure.

List of common data structures
  • Array/List. Linear data structures, which can be maintained in order with respect to a field of the elements contained, or arbitrarily. The difference is in the internal organization: a single block of memory for an array, and small linked blocks for a list. Examples: the classical java.util.ArrayList, the php numeric array.
  • Stack. A structure where elements are piled up on every insertion and only the topmost one can be removed (last in, first out). Examples: java.util.Stack, SplStack in php.
  • Queue. Specular to the stack, a linear structure where elements are inserted at one end and extracted at the other, conserving the insertion order (first in, first out). Examples: java.util.LinkedList in Java (which implements the java.util.Queue interface), SplQueue in php.
  • Tree. A hierarchical structure where elements can have a certain number of children and one parent. Trees can be used as-is or to implement a more complex data structure such as a dictionary; when used by a higher-order data structure they are usually optimized into one of the hundreds of existing tree types. Examples of standalone trees: javax.swing.tree.TreeModel, a nested set abstraction layer like Doctrine_Tree (used in conjunction with a relational database).
  • Priority queue. A modified queue where elements are extracted according to their priority rather than the original insertion order. For instance, a heap is a tree commonly used to implement a priority queue.
  • Dictionary/Map. Random-access structures that can give back an element by its key, often in O(1); this computational complexity means a retrieval time independent of the number of elements stored, and from this point of view maps are much better than linear structures like lists. Sometimes the border between maps and hashmaps is subtle, as the latter are used to implement the former, but there are other physical models for dictionaries/maps, for instance binary or N-ary trees. Examples: java.util.HashMap, java.util.TreeMap, the php associative array.
  • Graph. A superset of a tree where the requirement for elements to have only one parent is removed. Elements present a generic number of associations to 1, 2, ..., N other ones. Example: domain objects are viewed by an Orm as a generic graph to persist in a database.
These structures can be combined recursively thanks to the abstraction tools provided by programming languages, such as templates and polymorphism. You can have a list of graphs in your application, or a tree of stacks (where the basic node is a stack). The ones listed here are the basic blocks used to construct more complex structures. Languages like Java also provide more sophisticated abstractions such as Sets and Bags (as a Hibernate mapping).
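The LIFO/FIFO difference between the stack and the queue described above can be seen directly with the SPL classes mentioned in the list (available since php 5.3):

```php
<?php
// Stack: last in, first out
$stack = new SplStack();
$stack->push('first');
$stack->push('second');
$last = $stack->pop(); // 'second' comes out first

// Queue: first in, first out
$queue = new SplQueue();
$queue->enqueue('first');
$queue->enqueue('second');
$first = $queue->dequeue(); // 'first' comes out first

echo $last, "\n", $first, "\n";
```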

Do you think I have left out any fundamental data structure in this list? Do you feel there is enough support in the Php and Java world for these abstractions?

Sunday, October 18, 2009

The value of knowledge and what we should learn

In one of my last posts, I suggested learning math topics, as they are preparatory and complementary to programming challenges. I think we can gain both useful knowledge and overkill knowledge while learning. I will present a story to fix ideas: I am sure you have already heard it, and now we will analyze it from several different points of view.
This famous anecdote goes more or less like this:
The printing machine of a famous newspaper had started to malfunction. Several technicians were called to find where the failure was, but even after a while no one was able to identify what component needed a fix.
So, the newspaper owner called the printer who had worked at the now-broken press for many years, and who was by then retired.
After he had arrived, he looked at the press and at its control panel for some minutes, then picked up a screwdriver to slightly turn a screw. He said "Now the press will do its job."
Incredibly, the press started to work again, and the worker was thanked and asked for a bill for his services.
The day after, the bill arrived and it was for 10000 dollars! The owner could not believe the worker had had the guts to ask all that money for one minute of work, and sent back the invoice asking for a more detailed report.
A new bill came the next day:
- $ 1.00 for turning a screw
- $ 9999.00 for knowing which screw to turn
This (totally made up, I guess) story shows us the value of knowledge. If you substitute screw with line of code and turn with tweak, you obtain a perfect programming story. This is because there is really nothing about printing in this anecdote, only a marketing lesson:
  • If you spend time learning valuable [programming] knowledge, you can be paid a large amount of money. Obvious, but the problem is finding the disciplines which are not at risk of obsolescence: a doctor, for example, spends many years in education which very rarely becomes obsolete, and he is very well paid throughout his career. The worker had previously spent many years on the press, gaining an enormous insight into how it worked and how to fix every kind of fault. Knowing how to fix a little, replaceable machine or other expendable tools is not so useful in this context, just like knowing how to repair a motherboard with a soldering iron.
  • You have to market yourself: the newspaper owner knew that calling the retired printer could be the way to fix his machine. If no one knows about your special capabilities, it is unlikely that someone will be willing to pay you much. Fortunately, the World Wide Web is now the best showcase you could desire: under the right conditions, billions of people can find you easily, in a matter of seconds.
  • If your knowledge is useful but not widespread, the laws of supply and demand permit you to charge more: the worker was probably the only one in the city who could fix the press. For example, if the market is saturated with basic Java developers, you may have a hard time finding projects which fall in your skills range. The more specific your capabilities, the more sought-after you are, provided there is enough demand for you to make a living: becoming an expert in Southern Australia butterflies is not what I would suggest.
What have I learned this year? New languages, frameworks, practices? Is my previous knowledge becoming obsolete in the meantime? What valuable things do I know?
These are questions I ask myself before starting on a new craft, such as learning Erlang. I mean, it is an interesting language and fun to play with, but how likely am I to use it in web applications, given that I currently focus on php? I would rather deepen my Zend Framework know-how, to be able to tackle larger and more rewarding php projects. To be complete, I should also sharpen complementary skills such as web design, even if I am usually the backend guy. Disciplines which reside next to my main one give me a greater understanding of the overall picture: knowing what a designer works with helps me communicate with him and refine my job to perfectly match his.

Learning new things is important to keep an active mind and to feed it with new patterns and experiences. It is a self-improvement task, and in no way am I saying you should focus on work-related topics only. Just make a selection, to keep out obsolete topics and to ensure you are not forgetting the important disciplines before writing Erlang code. And if you have fun with Erlang, just practice with it! Maybe you will find a use for it in the future. The trick is not to fool yourself into thinking you are updating yourself on technologies that lie in your field when they don't, as learning time is stretched between many topics today. The ROI of learning Erlang is not high for me, and I want to invest in the best assets.
That said, knowing which screw to turn on a printing machine is incredibly valuable, but only if your job includes fixing printing machines: if you want to design the Space Shuttle's computers for a living, you had better study software and hardware engineering. And even if printing machines may be more interesting than physics, the latter is the more pertinent topic, since your computers will interface with the real situation outside the Space Shuttle.
That is, of course, only if you really want to design the Space Shuttle's computers: don't be fooled into learning high-paying disciplines that you don't care about at all. Work fills a big chunk of your life, especially for [software] engineers, and making sure you enjoy it is the best favor you can do for yourself, even if you end up less marketable and less rich, but not broke and still happy.

Saturday, October 17, 2009

Links that made me think

Here's a roundup of the posts that made me think the most last week in the blogosphere and that deserve to be recommended. I also wrote a brief summary of the topic they talk about to give them a context.
I reserve some time every day to evaluate my Google Reader entries from a hundred feeds, and I think the ones listed here are very interesting reads. Do you use a feed aggregator? Do you find quality links in it?
I hope these authors feel appreciated for their good work.

Getting started with Scrum: an overview of this agile methodology, even if towards the end it digresses into marketing. If you want a brief introduction to agile estimating and planning, this is a must read, and there are links to explanations of terms like backlog, stories, etc. That said, there is no need for a full-featured project management application if you just want to give agile a shot: organize the process with plain text files under version control instead.

Coding Simplicity: How to avoid feature creep in your life is an interesting parallel between life and programming. In both software development and personal development we should think before pulling in new features and activities: they may cost more than the value they provide, and your life is not a toy project you can throw away.

Your company is insane because what it wishes is that you could somehow get all the benefits of Agile without making any difficult or scary changes. We fight for unit testing, agile estimates and continuous integration every day, while people say "It's a waste of time, I just want to get it done". This post is inspiring and shows the reality: you cannot think that reading a book and attending a seminar will instantly make you an agile programmer or a good unit tester. There is managerial and personal work to be done before becoming gurus.

Design for testability talk is a one-hour video of a Google Tech Talk where Misko Hevery (Agile coach at Google) explains the pillars of designing an object-oriented application for ease of testing. The key concept is that there is no silver bullet (also known as testing magic) in the testing world that can be applied after classes are already coded. Testability must be included in the design and, not surprisingly, it improves quality by forcing high decoupling.

A more practical link for PHP developers: "Micro" Optimizations that matter, where Brandon Savage points out the techniques that cost nearly nothing in time and money and can vastly improve a PHP application's performance. If your profiling results say that you need to speed up a large number of things, try APC, for example, before refactoring or eliminating your framework because of its overhead. Often APC cuts the time for displaying a page in half, without any changes to your code.

Let me know if you have read other interesting articles on software development and engineering recently. Feel free to add similar links in the comments.

Friday, October 16, 2009

Programmers should know math.. just not all of it

Mathematics is part of a programmer's life. Beyond the basic concepts implemented in programming languages, there are particular topics which become mandatory when you enter a field like three-dimensional graphics or financial applications. Writing components for these applications often means you have to implement a mathematical model in code.
Think of a graphics engine for a video game. If the first thing that comes to your mind is it's difficult, you're not alone. It's difficult because the mathematical structures and theories behind it are not general knowledge: they are specific algebra topics which many of us do not master. By contrast, if you see a !($a or $b) statement, you probably know how to write an equivalent form.
So how can you learn all this stuff?
The programming field is broader than ever, and more and more mathematical tools are used in computing. A modern programmer should be a jack-of-all-trades on the mathematical side: he should learn the specific disciplines as his profession requires them, while mastering the foundations he encounters every day.

I have made a breakdown of the main topics taught in high school and university which are used in computer science. I divided this list into a basic section and a specific-applications one.

Basic list (what you should certainly know):
  • Set theory and relations: used in the relational model, often without most people knowing, and also as terminology in modern languages (List vs. Set).
  • Functions: there is a resemblance between mathematical functions and programming ones. Functional programming is an approach that takes this coherence to the extreme; the lack of side effects is another strong similarity.
  • First-order logic: De Morgan's laws are a powerful example of refactoring applied to if statements. A strong foundation for something you will write forever: conditional expressions.
  • Differential and integral calculus (in one variable): general knowledge, handy for disparate problems, for example finding the maxima and minima of an expression.
  • Enumerative combinatorics: a scary name for permutations and combinations. Counting items is a recurring task.
  • Computational complexity: to know when an algorithm performs better than another.
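As a quick illustration of the first-order logic entry above, here is De Morgan's law applied to a condition like the !($a or $b) mentioned earlier, sketched in Python (the predicate names are made up for the example):

```python
# De Morgan's law turns a negated compound condition into an
# equivalent, often more readable, form:
#   not (a or b) == (not a) and (not b)

def needs_retry(succeeded, cancelled):
    # Original form, as you might first write it
    return not (succeeded or cancelled)

def needs_retry_refactored(succeeded, cancelled):
    # Refactored with De Morgan's law
    return (not succeeded) and (not cancelled)

# The two forms agree on every possible input
for a in (True, False):
    for b in (True, False):
        assert needs_retry(a, b) == needs_retry_refactored(a, b)
```

The refactored form often reads better when the individual negations carry meaning on their own, such as "not succeeded" and "not cancelled".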
Specific applications of mathematics (what you can learn as needed):
  • Trigonometry: you can take advantage of trigonometric functions when solving geometric and 2D graphics problems.
  • Analytic geometry and linear algebra: fundamental in 3D graphics. Rotating an object involves matrix multiplication.
  • Abstract algebra: groups, rings, fields and so on. These structures are used to solve very specific problems; for instance, you can solve the Rubik's Cube with enough group theory.
  • Vector and multivariable calculus: useful in anything that involves manipulation or simulation of physical objects. The laws of physics you want to simulate are formulated in these terms. That said, these are engineering concepts more than general computer science ones.
  • Statistics and probability theory: the world is not deterministic, and some models should take into account the natural variability of the data they act upon.
  • Complex numbers: probably useful only if you are treating signals and dynamic systems.
  • Financial mathematics: interest and present value are not as simple a topic as they may seem.
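To make the linear algebra point concrete, rotating a 2D point really is a matrix multiplication. A minimal sketch in plain Python, with no graphics library assumed:

```python
import math

def rotate(point, angle):
    """Rotate a 2D point around the origin by angle (radians),
    applying the standard rotation matrix:
        | cos(a)  -sin(a) |
        | sin(a)   cos(a) |
    """
    x, y = point
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cos_a * x - sin_a * y, sin_a * x + cos_a * y)

# Rotating (1, 0) by 90 degrees gives (0, 1), up to floating point error
x, y = rotate((1, 0), math.pi / 2)
assert abs(x - 0) < 1e-9 and abs(y - 1) < 1e-9
```

In a real 3D engine the same idea scales up to 3x3 or 4x4 matrices, which is exactly why linear algebra becomes mandatory in that field.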
When dealing with problems in a domain like finance, the fundamental principles must be mastered to produce a reliable and useful model. This is true for nearly every kind of business domain. Maybe in the future you will work on airports and flights, or restaurant reservations and queues. Take the time to learn the mathematics needed to model these domains.

The image at the top is a standard Rubik's Cube. Can you solve it? It is a pure mathematical problem: the time to study it is the only requirement. People who master the math behind it can solve it in a few minutes, and often less.

Thursday, October 15, 2009

10 things plain text excels in

Plain text is the simplest text format in the world. It is called plain in contrast with sophisticated formats like Doc, Odt, Pdf and so on. I work with this universal format a lot and I want to share some tasks it can be useful for.
Plain text is based on representing characters as a single stream per file, using one byte per character (or from one to four bytes per character if you're using UTF-8). There is no font choice in plain text, nor any formatting: the focus is on the content and on its logic. When you open Windows Notepad or vim, you get an example of a plain text editor. Stripping out all the presentation logic is in some cases a good thing, as it simplifies the management of data and content.
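The variable byte length is easy to verify; a quick Python check of UTF-8 encoded lengths (the sample characters are arbitrary):

```python
# In UTF-8, ASCII characters take one byte; everything else takes two to four
assert len("a".encode("utf-8")) == 1   # plain ASCII letter
assert len("è".encode("utf-8")) == 2   # accented Latin letter
assert len("€".encode("utf-8")) == 3   # euro sign
assert len("𝄞".encode("utf-8")) == 4   # musical symbol, outside the BMP
```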

There is a chapter in The Pragmatic Programmer: From Journeyman to Master (the first Pragmatic Bookshelf title) titled The Power of Plain Text, where the advantages of text files over binary formats are discussed: plain text does not become obsolete, and you can leverage every kind of existing tool, confident that it can handle plain text. Plain text in UTF-8 will still be readable thirty years from now.
In fact, one of the pillars of the Unix philosophy is:
Write programs to handle text streams, because that is a universal interface. -- Doug McIlroy
There is no limit to what you can do with data in plain text, because you can chain together hundreds of Unix programs which will work together seamlessly. In a Unix system, configuration is kept in plain text files, in a structured but human-editable form.

As an example, I put together a list of what I am using plain text for. When I first really thought about it, I was impressed and pleased:
  • Todo lists: a list of tasks I have to complete in the near future, divided into Urgent/Important/... sections. Since it is a list, items have a "-" before them and I can indent subpoints by using more than one hyphen, as in -- subtask or --- subsubtask. To mark an item as completed and gain confidence, I substitute the hyphens with "+", maintaining the indentation. Using vim, it is also fairly simple to reorder items or to mark them with a macro.
  • Specific todo lists: I have one todo list for this blog, for example. A general list can grow too large to remain manageable, so it's a good choice to gather todos for particular projects in their own lists. This is somewhat similar to the Getting Things Done management of project actions, but simpler, since I do not need anything more elaborate.
  • Source code: this is pretty obvious, but I wanted to point out that source code is usually plain text.
  • Lists of any kind: for example, books I want to read or to find reviews for.
  • Svn diffs and patches: when submitting a patch to a project like Zend Framework or Doctrine, the process involves checking out the Subversion working copy and making the changes needed to address a bug or add a feature. Then, if you do not have commit access, svn diff > myfix.patch saves the changes in a patch you can upload to the bug tracker for evaluation. The patch format builds on plain text and is still readable; before committing on my own projects I usually run svn diff | more to explore the changeset (another example of plain text as a universal interface).
  • Goals: it is mandatory to write down your goals for the short and long term, if you are serious about achieving them. Plain text is a good choice since you can find programs to edit it everywhere, even five years from now.
  • Blog posts: when writing a new article, I start with a blank vim screen (maybe I should use a template) and write all the content, the most important part of a post. Formatting and images are inserted while putting the post online and proofreading it, and emphasis on words and phrases can be specified with '' or * marks.
  • Email: text emails are more portable than html ones and can be forwarded and quoted easily.
  • Wiki articles: when I edit a wiki article, not only on Wikipedia but on any wiki, I use wiki formatting, which is a superset of plain text. I have included this usage since wiki formatting is very readable and can be used without a subsequent "real" formatting phase, for instance for lists like my todo ones.
  • Schedules: I might use Google Calendar in the future, but now that I'm trying out scheduling my working days, a simple text file named 2009-10-15.txt is perfect.
The format is simple:
08:00   wake up&breakfast
08:15   mail&reader
08:30   nakedphp user stories estimation
Tabulation, even when done with spaces instead of \t characters, is very useful to align text and provide spreadsheet-like capabilities. In the schedule case, I only specify the tasks for the next day, so one file is enough.
There are only two problems that can arise with plain text: encodings and newlines. Settling on UTF-8 and one type of newline (LF, CRLF, or CR) will make your text files universal and consistent. Compare these requirements to the ones for working with docx files.

While I advocate that web applications will be the choice for many tasks in the future, I have never abandoned plain text. When trying out some new practice like writing goals or maintaining an effective todo list, I always start from plain text. This way, if it simply does not work for me or I am not satisfied with the results, I'll just delete a folder on my PC. No need to register for powerful web applications such as Remember The Milk: I'm sure it works pretty well for todo lists and is globally accessible from every machine connected to the Internet, but I am not ready for it at the moment. I'm only exploring possibilities, at a next-to-zero cost in time: I only have to open vim or gedit.
So before registering for dozens of web services, think about using plain text for your lists, goals, schedules... Often the simplest solution is overlooked.

This is by no means an encouragement to write a book in plain text: use complex formats for complex tasks, because their heaviness will pay for itself.