Monday, November 30, 2009

Asserting out of tests

In programming, assertions are statements that should always evaluate to true: invariant assumptions with respect to the input data of a program. From a mathematical point of view, assertions are tautologies for the implementation they are embedded in.
For instance, you can code assertions which verify that the input data consist of strings, or that the result of a calculation is coherent with the program flow. A failed assertion usually marks a logical bug.
Unit tests are full of assertions, with the advantage that xUnit assertions are contained in test cases and thus separated from production code. However, there are other places where assertions are used; many compiler or interpreter checks in modern programming languages are implicit assertions that provide type safety or other automatic controls:
  • As I said earlier, assert*() methods like assertEquals() and assertTrue() are provided by test case instances to allow specification of behavior. These assertions are treated in detail in the related testing series article.
  • The assert() function (sometimes implemented as a macro in languages like C) in production code is the fastest way to check the correct flow of the program. The php assert() function takes as an argument a php expression (code that evaluates to a boolean) encoded in a string; the encapsulation in a string allows the assertion to be skipped when particular flags are set.
  • Type hinting on function parameters is actually a masqueraded assert(). In php, the assertion code would be assert('$param instanceof MyClass');
  • Database constraints are commonly declared in the form of assertions in Sql code. For example, the result of some queries should be invariant, or the value entered in a column should match a restriction of the field domain.
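The second and third points can be sketched in php. This is a hedged example: formatPrice() is invented here, and the string form of assert() belongs to the PHP 5 era of this post (it was deprecated later, in PHP 7.2):

```php
<?php
// Sketch of an explicit assertion in production code.
// formatPrice() is a hypothetical function, not from the article.
function formatPrice($price)
{
    // The expression is only evaluated while assertions are active,
    // so it can be skipped in live deployments via assert_options().
    assert('is_numeric($price)');
    return number_format($price, 2) . ' EUR';
}

echo formatPrice(9.5); // 9.50 EUR
```

The string form is what makes the check cheap to disable: when assertions are turned off, the expression is never evaluated at all.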
Given the various assertions you can make in different parts of your application, you should be cautious about inserting them in production environments. While in unit tests assertions are fundamental (though you shouldn't exaggerate their number, to simplify maintenance), explicit assertions are typically deactivated in live deployments. Disabling them is preferred over leaving checks in place, both to avoid exposing errors to the real user and to speed up code execution.
Caution should also be used with database constraints if you work with an object model and an Orm, as they may result in logic duplication.

Failed assertions should be handled somehow. Php lets you declare an assertion handler, which I usually set to a small function that throws a special exception with a message containing the error generated by the assertion code. I see a failed assertion as a very serious problem which may indicate a bug, while normal exceptions are often used to signal incorrect inputs or state conditions that cause an error.
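Such a handler can be registered with assert_options(); here is a minimal sketch (the exception class name is my own invention):

```php
<?php
// Route failed assertions to an exception instead of a plain warning.
// AssertionFailedException is a hypothetical name for illustration.
class AssertionFailedException extends Exception {}

function handleFailedAssertion($file, $line, $code)
{
    throw new AssertionFailedException(
        "Assertion failed at $file:$line: $code"
    );
}

assert_options(ASSERT_WARNING, 0);                        // silence the default warning
assert_options(ASSERT_CALLBACK, 'handleFailedAssertion'); // install our handler
```

With assertions active, a failing assert() call now surfaces as an AssertionFailedException that can be caught and logged like any other exception.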
Some assertions get in the way of tests too: when we encounter such assertions we should ask why they exist in the first place, and whether, being deactivated in production, they should be deactivated in testing as well. A common example is the null check/type hint:
<?php 
class MyClass
{
    public function __construct(MyClass $param)
    {
        ...
While Java would allow the client code to pass null as the value of $param, php raises a catchable fatal error which stops the constructor execution. This means that if we don't need some collaborators when testing a particular method, we are forced to subclass MyClass to override the constructor, or to create fake collaborators only to fill the parameter list. If these collaborators require non-null parameters in their own constructors, the problem becomes recursive.
So I prefer not to make unnecessary assumptions on the input parameters of constructors:
<?php 
class MyClass
{
    public function __construct(MyClass $param = null)
    {
        ...
The same problem arises in a different form for scalar type hinting, which is not available in the php syntax but can be implemented by the programmer:
public function __construct($config)
{
    assert('is_string($config)');
    ... 
}
Just do not assume $config is a string if there is even a remote possibility that tests will exercise the class without a config variable. Only if $config is invariably needed for the class to work should we check its type and structure.
Of course, we should also test in some way that the class is correctly instantiated, but this part should be covered by integration tests, or by unit tests for the factory or the container. Integration errors are easy to spot, since calling a method on null is not allowed, and if the construction process is not complete the first access to the missing collaborator stops the execution of the entire suite.
Now that you know the power of assertions, try to make the best of them: they are not substitutes for separate unit testing, and in many cases they can be replaced by unit tests.

Saturday, November 28, 2009

Saturday question: mixing Repository and Active Record

Saturday is becoming the 'questions day of the week', since it is not the first time that, after a week of work, some readers email me to carry on the discussion on design and testability, two topics that are stressed in my blog posts. :)
This week, Fedyashev wrote to me about mixing architectural patterns in a single application:
I really like these Active record and Repository patterns.
The drawback of the Repository pattern is its cost (it takes more time than
Active Record). The benefit is higher abstraction, which really helps with
complicated business logic.
The drawback of Active Record is lower testability (db interaction
is required) and harder handling of complicated domain logic.
Is it acceptable to take the best of these two patterns to be used in
the same application?
I was thinking about using Active record for simple CRUDs and Repository
for complicated domain objects.
The idea behind this intention is to keep cost of code lower but still
have a good code.
What would you recommend?
There are cases in which Active Record is an acceptable pattern. Since its drawback is limited testability, the primary scenario for its application is when there is nothing to test. Some applications are data intensive and only need to move information back and forth from the database.
CRUD screens, as you suggest, often have little logic and can take advantage of active records. But we should evaluate case by case, since it is very easy for logic to leak into Active Record instances, and logic should be thoroughly tested.
For example, there is logic in managing validation of entities upon insertion and editing: a classical situation is searching for already existing nicknames upon user registration. A Repository is capable of performing validation using external resources, as they can be injected at construction or passed as a method parameter, while an Active Record probably cannot (and testing this validation will be more complex).
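As a sketch of this difference (all class and method names here are hypothetical), a Repository with an injected adapter can own the duplicate-nickname check, and a trivial fake adapter makes that check unit-testable without a database:

```php
<?php
// Hypothetical sketch: the duplicate-nick validation lives in the
// Repository, whose collaborator is injected and thus easy to fake.
class DuplicateNickException extends Exception {}

class UserRepository
{
    private $_db;

    public function __construct($db) // any adapter exposing exists()/insert()
    {
        $this->_db = $db;
    }

    public function add(array $user)
    {
        if ($this->_db->exists('users', 'nick', $user['nick'])) {
            throw new DuplicateNickException($user['nick'] . ' is already taken');
        }
        $this->_db->insert('users', $user);
    }
}

// In a unit test, a fake adapter replaces the real database:
class FakeDb
{
    public $rows = array();

    public function exists($table, $field, $value)
    {
        foreach ($this->rows as $row) {
            if ($row[$field] == $value) {
                return true;
            }
        }
        return false;
    }

    public function insert($table, array $row)
    {
        $this->rows[] = $row;
    }
}

$repository = new UserRepository($fake = new FakeDb());
$repository->add(array('nick' => 'john'));
try {
    $repository->add(array('nick' => 'john'));
} catch (DuplicateNickException $e) {
    echo 'duplicate detected'; // duplicate detected
}
```

An Active Record subclassing a framework base class has no such injection point, which is why exercising the same validation typically drags a live database into the test.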

Another problem I see in mixing these patterns is their different library requirements. Typically, we want repositories to aggregate an instance of a lower-layer framework that encapsulates Sql queries or whatever storage we are using (Hibernate or Doctrine 2), while Active Records are subclasses of a framework's abstract base classes (Zend_Db or Doctrine 1).
The paradoxical result is that implementing both patterns leads to using two different versions of Doctrine at the same time, which I do not recommend for reasons of maintenance and code clarity.
A solution would be to keep the two implementations in separate BoundedContexts, which are different domain models that can communicate, for instance through the same underlying relational database. However, BoundedContext is a DDD term and supposes that you work with persistence-ignorant models in both contexts.

However, the real choice is not between Active Record and Repository but between Active Record and Data Mapper (a persistence-ignorant domain model). It seems, for instance, that Doctrine 2 provides a default repository class you can tweak later, although it has default methods only for retrieving entities and not for inserting them (I think insertion can be managed with events). It's not really difficult to change your approach from:
$user = new User();
$user->nick = 'John Doe';
$user->save();
to:
$user = new User();
$user->nick = 'John Doe';
$em->save($user);
when what you gain is freedom from activating a mysql daemon to test the User class, without using Repositories. Repositories may come into play later, when and where you want finer control over the bridge with the database.
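For instance, here is a minimal sketch of what becomes testable once the entity is persistence-ignorant (the User class and its rename() method are invented for illustration):

```php
<?php
// A hypothetical persistence-ignorant User: its logic can be unit
// tested without a mysql daemon, because nothing here touches a db.
class User
{
    public $nick;

    public function rename($newNick)
    {
        if (trim($newNick) == '') {
            throw new InvalidArgumentException('A nick cannot be empty');
        }
        $this->nick = $newNick;
    }
}

// A plain unit test: instantiate, call, assert. No database involved.
$user = new User();
$user->rename('John Doe');
echo $user->nick; // John Doe
```

With an Active Record, even this trivial behavior would be tied to a base class that expects a live connection.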

Friday, November 27, 2009

The best things in life are degradable

When we talk about degradability, usually the discussion is about javascript widgets.
Degradability is the property of a web application to maintain much of its functionality even if javascript and other advanced technologies are disabled on the client. There are different approaches to crafting a degradable application: some developers choose degradable widgets which transform into normal form elements if javascript is not available (actually, they remain normal form elements), while Gmail has a different, separate plain old html version.
It is really possible that javascript is not available on the client: without taking into account screen readers and strange corporate browser policies, there are very important web users that normally cannot execute javascript and Flash: Google's crawlers.
Thus, web application degradability is a good idea: enhancing the experience for some users, while still maintaining a basic standard interface and functionality.
However, degradability is not limited to javascript libraries.

PubSubHubbub is the ugly name of a protocol for nearly-instant distributed dispatching of feed updates. A PubSubHubbub server can, for example, sit between blogs and readers: every time the blog publishes a new article, it notifies the server, which takes care of informing subscribers, reducing the load on the blog hosting.
The system is degradable in the sense that even if the blog does not implement the protocol and does not notify the PubSubHubbub server when new content is available, the server will still ping the blog at regular intervals to check by itself. The subscribers will get updates more slowly, but the overall functionality is preserved.

Finally, the most widespread implementation of a degradable device is in the form of a cache: a storage area included at the hardware level in every modern computer, and at the software level in nearly every site we visit.
The hardware cache, for example, is a very fast and small piece of memory that contains a subset of the content of the computer's Ram, which changes to reflect the data the CPU will probably ask for in the near future. The CPU normally fetches content from the cache, but does not rely on it: cache misses happen every second.
Still, the hardware cache system has enormous advantages, because most of the time the CPU's requests are fulfilled without reading Ram. When central memory access is necessary, data is still available transparently (only more slowly), and the control unit of the CPU can theoretically be agnostic about the cache (though in practice it has to know it very well for optimization reasons).
A hardware cache is so advantageous that there are commonly multiple levels of it in a system (named L1, L2, L3). Another form of cache is the virtual memory implementation.
Degradability is present in every cache, since the circuits it is composed of are costly, and hardware engineers are satisfied with enhancing data access performance for local references. When there is a jump to a far routine, the access time degrades.

A degradability pattern is present throughout the web, which takes care of compatibility with all the devices that form the cloud: mobile phones, desktop machines, PDAs, old web servers and browsers. When you are working on javascript widgets or a Flash site, think of the users that do not have the resources to use them.

Thursday, November 26, 2009

Agile estimating and planning review

Agile Estimating and Planning by Mike Cohn is a masterpiece on Agile management techniques, especially in dealing with schedules and application features. I just finished reading it and it gave me a very positive perspective on classical development conundrums like schedule and scope.
Agile does not solve problems for us, nor does it promise to eliminate every issue. The 300+ pages cover the majority of the topics in, as the title says, the estimating and planning field for an Agile team. The author writes in an honest style and anticipates the reader's questions and objections.
There are many concepts scattered through the book:
  • The Agile planning approach: we don't know much at the start of a new project, but after every iteration we get to know more about the domain and the application. Thus, we can improve our estimation of remaining work, while changing scope and release dates to deliver the maximum value. So we should keep planning, but be ready to throw away the plans.
  • Estimating size and estimating time are two different processes: velocity is the parameter that links them. Size is described by relative variables (like story points or ideal days) rather than the actual time needed.
  • What's a story point? And a release burndown chart? We often use Agile terms without ever having read formal definitions, and they can seem mumbo jumbo to the uninitiated developer. Actually, Agile is not complicated if you take a bit of time to learn it; you probably already know nearly all of the math involved in this book, but a glance at probability theory could help.
  • Tools like questionnaires and charts for tracking progress are explained from the ground up. When I started the first chapter I had no theoretical Agile foundation, but I still found the book exciting to read and very accurate.
  • Common practices for story management can help you mix up, split and join user stories. Estimation can be a difficult process, but in this context it is not a random guess.
  • Prioritization of stories, along with iteration and release planning: the 1,000 and 10,000 feet views on your project life and scope.
  • Plenty of practical examples are spread throughout the chapters, and the author shows how to implement the techniques described in a real project, by consistently taking a swimmer statistics management application as the main example. This consistency helped me get the overall picture.
  • Finally, a fictional case study is presented at the end of the book, to pull everything together and see an Agile project worked out from the initial requirements gathering to the delivery date.
After reading this book, every project now seems a big opportunity to apply an Agile approach. I strongly recommend it if you want to get started with story points, iterations and other great Agile concepts.

Wednesday, November 25, 2009

Testing ebook upcoming


This is the temporary cover of my upcoming ebook, Practical Php Testing. It is a parody of the famous illustration from the book The Little Prince.
This publication focuses on testing and designing php code, with the aid of the leading tool for test automation, PHPUnit. Testing is a skill often neglected by php developers, but testable code inherits many benefits from the good design rules it is forced to observe.

Here is a list of included content:
  • a collection of the articles from the php testing series, adapted to the book format. These articles cover the path from basics such as installing phpunit to advanced features like mocks and code coverage.
  • nicely-formatted working code samples in the form of PHPUnit test cases. I believe teaching by example is far more effective than abstract discussions.
  • a glossary of must-know terms: it's not cool to consult links while reading a book, so I collect specific terms at the end of the book.
  • TDD exercises at the end of each chapter, which will help the reader apply the practices he has just learnt by producing working code, with PHPUnit as the only infrastructure needed. Along with code examples, exercising is the fastest way to grow as a tester and programmer.
  • I also intend to include a bonus chapter on Test-Driven Development theory, if there is interest from readers. Practical Php Testing is not a book on TDD, but I think the natural evolution of test-infected programmers brings them to embrace TDD.
The size is now around 50 pages, and the book will be published in the first days of December, under a Creative Commons license (subscribe to the feed if you want to be notified). You will be free to produce as many copies as you want by any mechanical and electronic means (with correct attribution): printing it, sharing it via BitTorrent, and emailing it to your friends are things I actually encourage.
Also, let me know if there are other testing topics you would enjoy seeing treated as bonus chapters.

Tuesday, November 24, 2009

Mistakes of a freelancer

I have been inspired by a post by Soon Hui to write about my own mistakes. I have evolved much on the programming side during these years, but my biggest mistakes have been social and economic: dealing with other people. Thus, I have collected a list of the errors I committed as a php freelance developer.
  • Giving out your personal phone number: no matter what, use separate phone numbers for clients and friends. People have a tendency to call at awkward hours, and having a single number you can shut off after the workday has finished helps your work-life balance.
  • Lack of tests: every application you write will be maintained in the future, often by you. Even small tweaks (that you can't honestly charge for) can break an application and the safety net of a test suite will free you from the burden of manual testing.
  • Providing fixed estimates: estimates should be given in the form of a range, and the whole process of estimation and planning in software development is more complex than the average person thinks. Counting billable hours just does not work, and an application's size should be assessed during requirements gathering.
  • Tasks instead of features: one pillar of Agile processes is that features are the metric of success and accomplishment, and not tasks. Even if you're working on fixed-price waterfall projects, focus on giving out the features requested because no customer has the time to comprehend technical and infrastructure tasks such as "database modelling".
  • Thinking that a client knows what he needs: even in porting legacy applications, no customer really understands the kind of software he wants. It is our job to interact with him to distinguish between mandatory and exciting features and to provide the highest value in an application, since we have great programming skills but little domain knowledge. Often emphasis is put on gold-plated features which are really not worth their cost and can cause disasters in the long run (and maybe you are even forced to prioritize features with high risk and little reward). Dialogue, dialogue, dialogue.
  • Not defining economic terms early: when you write the first line of code you should have an agreement on your reward. This rule of thumb can seem obvious to us, but remember that customers usually come from a whole different world.
It's a long list, but I feel that I have grown: I have been working for more than five years, and since I started my journey in computer science even before that, I'm approaching the 10,000 hours as a developer while still gaining basic experience in business. These errors are something I really had to try out by myself.
Do you have some freelance experience to share? What do you feel you could have done differently during your career?

Monday, November 23, 2009

Firefox without a mouse

As a developer I have made a habit of using the keyboard for the majority of tasks. Vim, for example, is my favorite text editor, and it does not require point-and-click. This is a productivity requirement: the less my fingers move between the keyboard and the mouse, the faster I am in consulting documentation and other developers' blogs; vim goes even further and lets you scan a document without leaving the home row.

Firefox is also an application where I try to avoid the mouse (or the touchpad if I am using the EeePC). Unfortunately most sites are not really accessible and I have to resort to the mouse for links and forms: it's not satisfying when you [Tab] through a form and end up in some other place on the page.
Still, Firefox's user interface is really usable without resorting to the mouse. Here are some shortcuts I wanted to share with you:
  • <Ctrl>T: create a new, empty tab, and give it focus.
  • <Ctrl>W: close the currently selected tab.
  • <Ctrl><Shift>T: reopen the last closed tab.
  • <Ctrl>PageUp, <Ctrl>PageDown: move between the opened tabs.
  • <Ctrl>L: give focus to the location bar.
  • <Ctrl>K: give focus to the quick search bar. If you set the browser.search.openintab directive in about:config to true, search queries will be opened in new tabs. Remember that often search engines and websites like php.net and Wikipedia implement the OpenSearch specification, allowing you to add them to the quick search list of engines.
  • <Alt>Down: select the search engine when you are typing in the quick search bar.
For instance, to search for the strpos() function on php.net, assuming that you have stored it in the available engines:
<Ctrl>T, <Ctrl>K, <Alt>Down to select php.net, strpos<Enter>
Or, if browser.search.openintab is set:
<Ctrl>K, <Alt>Down to select php.net, strpos<Enter>
If php.net is already selected since you have already looked for other functions:
<Ctrl>K, strpos<Enter>
Or, given that php.net implements nice urls:
<Ctrl>T, <Ctrl>L, strpos<Enter>

Happy browsing with Firefox and the keyboard! :)


    Saturday, November 21, 2009

    Mocking static methods: the road to Hell...

    ...is paved with good intentions.
    On Thursday I came across the video presentation of a tool to mock static methods in Java:
    PowerMock can be used to test code normally regarded as untestable! Have you ever heard anyone say that you should never use static or final methods in your code because it makes them impossible to test? Have you ever changed a method from private to protected for the sake of testability? What about avoiding “new”?
    Horror. Not because of the technological revolution - maybe being able to subclass final classes only in the test suite would be fine, and this is just monkey patching - but because static methods have nothing to do with testability. Or rather, they do make code untestable, but testing is not the reason why we want to avoid them.

    Dependency Injection is not about testing: it is about good, decoupled design and reusable units. Static calls represent hidden connections between your classes that are not listed anywhere, and they give access to global and usually mutable state. They help create a misleading Api and constitute the procedural part of object-oriented programming.
    This is an example of bad Api:
    <?php
    class UserRepository
    {
        public function getAllUsers()
        {
              return Db::findMany('User');
              // this would be the same: 
              // return Db::getInstance()->findMany('User'); 
        }
    }
    If in different tests we use this code:
    $repository = new UserRepository();
    $users = $repository->getAllUsers();
    we probably get different results. But why? We instantiated the same object and called the same method without any parameters. This Api is far from being referentially transparent, and that's because it has a static reference that jumps to global state instead of having it injected.
    So the question is not How can I mock a static method?, but How can I avoid static methods?
    <?php
    class UserRepository
    {
        private $_db;
    
        public function __construct(Db $db)
        {
            $this->_db = $db; 
        }
        public function getAllUsers()
        {
              return $this->_db->findMany('User');
        }
    }
    <sarcasm>It was very difficult, wasn't it?</sarcasm> Dependency Injection is in fact very simple: ask for what you need, so that people who read your code know where collaborators are coming from and you are not secretly smuggling them into your classes.

    I once heard a talk by Misko Hevery where he was making a point about dangerous global state, and he described the following situation. Suppose you have your beautiful test suite, and some classes in it access global state, so the tests are not real unit tests but have a bit of integration in their hearts. Keep in mind that in this environment tests are not isolated, since they depend on some global variables, maybe masked as singletons or static classes. The order of execution matters.
    So you have MyClassTest, the last test case to run in the hour-long test suite, and it is failing. You try to run it alone, and it passes. Then you run the suite again to figure out what is happening, and it fails again. The test is referencing some global state in which particular data is placed by the other tests before MyClassTest is run. The only way to have a reliable failure is to run the whole suite to set up the necessary global state.
    The conclusion is: good luck finding out what is making MyClassTest fail.

    Static methods are not object-oriented: they are a recipe for problems if they carry hidden state. They are global, because you cannot keep them on an instance you can throw away after tests. They make the Api difficult to grasp by hiding dependencies under the carpet. The solution is not "Let's open up the hood and wire these things differently so that we can mock static methods", but "Let's not use static methods."
    You should even write the tests before the production code if you can. We do not write them only because they catch bugs and regressions, but also because real unit tests force you into a clean and decoupled design. If you are able to unit test your application, its architecture and flow are composed of focused, reliable and reusable parts. If you use static methods and spread new operators everywhere, there are hidden connections between components, and mocking these things is only action at a distance.

    Friday, November 20, 2009

    The best backup

    Yesterday I was reading some posts in my aggregator about backup techniques, and they reminded me of a (famous?) quote. Thus, I wanted to share my preferred backup technique for software projects.
    Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;) -- Linus Torvalds (1996-07-20), post to the linux.dev.kernel newsgroup.
      I know this is not applicable to every project, but usually open source is full of benefits:
      • if you follow the doctrine (everything not reproducible is kept under version control), there are free services like Google Code and SourceForge which will host the Subversion (or Git) server and take care of automatic backup and disaster recovery. This means that if your old hard disk breaks, your work is saved up to the last commit.
      • Obviously you can get collaboration and feedback from other developers.
      • It takes minutes to set up a working copy of your project for development and testing, as long as you have an Internet connection available: it is accessible all over the world.
      There are cloud services for private projects, but they cost money. Open source support is free and competitive today. Every management application you will need is provided and updated by SourceForge: trac, wikis, a file release system...
      In general, sharing your code and your writings with the rest of the world is the way to save them from oblivion:
      Some years ago, two programmers at Cisco (the networking-equipment manufacturer) got assigned the job of writing a distributed print-spooling system for use on Cisco's corporate network. [...]
      The duo came up with a clever set of modifications to the standard Unix print-spooler software, plus some wrapper scripts, that did the job. Then they realized that they, and Cisco, had a problem.
      The problem was that neither of them was likely to be at Cisco forever. Eventually, both programmers would be gone, and the software would be unmaintained and begin to rot (that is, to gradually fall out of sync with real-world conditions). No developer likes to see this happen to his or her work, and the intrepid duo felt Cisco had paid for a solution under the not unreasonable expectation that it would outlast their own jobs there.
      Accordingly, they went to their manager and urged him to authorize the release of the print spooler software as open source. Their argument was that Cisco would have no sale value to lose, and much else to gain. By encouraging the growth of a community of users and co-developers spread across many corporations, Cisco could effectively hedge against the loss of the software's original developers. -- Eric S. Raymond, The Magic Cauldron
      What is your opinion on sharing your work as open source?

      Thursday, November 19, 2009

      More questions on controllers testing

      Sune wrote to me yesterday with some questions about testing Zend Framework controllers and proper dependency injection, which to me is a fundamental practice in object-oriented programming. I have already responded to similar emails in the past and this seems to be a hot topic nowadays, so as always I think other readers can benefit from this discussion and I'm sharing it here.
      Because of you I am trying to move my software to use factories and dependency injection, also removing
      singletons and Zend_Registry usage in controllers. But I am a bit confused about the right way to do this.
      My plan is to bootstrap the main factory in the bootstrapper, and then use
      $this->getInvokeArg('bootstrap')->getResource('factory') in controllers. Is this good practice?
      Summarizing, the best thing would be creating the controller by yourself (or having its creation configured somehow), but it can be overkill to set up such an approach in a Zend Framework application, since it requires a third-party DI container, and controllers should always be thin.
      Thin controllers mean we probably only want to perform integration testing on them with Zend_Test, and not real unit testing, as there is not much logic to exercise.
      Since we cannot create in the bootstrap all the collaborators that could possibly be needed (we want to lazy-load collaborators that may not be referenced), your approach is similar to a Guice provider and I think it is very valid. In integration testing you should then use a different factory, or configure this one to provide some fake components when the real ones are not applicable. For instance, a mailing service object can be replaced with a fake implementation that records all the sent mail and lets you assert on it.
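A minimal sketch of such a fake mailing service (the interface and all names here are hypothetical, not from Zend Framework):

```php
<?php
// Hypothetical interface the real and fake mailers both implement,
// so the factory can return either one depending on the environment.
interface MailService
{
    public function send($to, $subject, $body);
}

class FakeMailService implements MailService
{
    public $sent = array();

    public function send($to, $subject, $body)
    {
        // Record instead of delivering, so tests can assert on it.
        $this->sent[] = array(
            'to'      => $to,
            'subject' => $subject,
            'body'    => $body,
        );
    }
}

$mailer = new FakeMailService();
$mailer->send('user@example.com', 'Welcome', 'Hello!');
echo count($mailer->sent);   // 1
```

An integration test can then verify that the controller caused exactly the expected mail to be "sent", without any smtp traffic.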
      And in conjunction with that. Would it be good practice to inject a Zend_Config object into the factory?
      Some models require options from the config file, smtp username etc., and the factory would need that information to create the models.
      Of course. It's up to you how to organize the config object, and you can and should change part of it in the testing environment. The power of DI containers is that during configuration you can specify not only scalars like database connection strings, but also different classes and implementations for the collaborators you inject.


      Wednesday, November 18, 2009

      To set or not to set

      You probably know I am a test-infected developer and a big proponent of Dependency Injection. You have also seen from the examples in this blog that I favor constructor injection, where a component asks for its collaborators in its own constructor:
      class FacebookService
      {
          private $_httpClient;
          public function __construct(HttpClient $client)
          {
              $this->_httpClient = $client;
          }
      }
      A factory or a configurable container can then recursively resolve dependencies and provide the class with what it asks for. The constructor's wiring code is trivial to write, very concise, and it expresses intent: the collaborator is assigned to a private property which will not be subsequently touched (as there are no setters).
      The Api is also very clear, as the constructor specifies everything that is needed to instantiate this class, while providing no means to change the collaborators afterwards.

      When the number of collaborators is high, however, we may find it difficult to use a constructor with a long signature. The obvious solution is trying to reduce coupling and analyzing the collaborators to see if every one of them is really mandatory. It may be the case of a class that has too many responsibilities in accessing different parts of the object graph, or whose collaborators leak into the class while they should be encapsulated in some other component.
      Sometimes, there is nothing we can do to reduce the number of collaborators:
      class CommentsRepository
      {
          private $_dbAdapter;
          private $_mailer;
          private $_logger;
      
          public function __construct(Zend_Db_Adapter $dbAdapter = null,
                                      Zend_Mail_Transport_Abstract $mailer = null,
                                      Logger $logger = null)
          {
              $this->_dbAdapter = $dbAdapter;
              $this->_mailer = $mailer;
              $this->_logger = $logger;
          }
      }
      This happens in some common cases:
      • the collaborators are options (value objects or scalars) which change the behavior of the component;
      • the class is a mediator between many objects and it is its own responsibility to deal with many collaborators.
      Again, we can try to minimize the options or the objects involved, but the essential complexity will never vanish. Another solution might be passing the services via a method parameter only when they are used, but this usually violates encapsulation (the caller, for example a Controller, has to keep a reference to a Logger even if it does not use it). Moreover, they may be needed in every method.
      A long constructor is also not clear: in almost all languages there are no named parameters (thanks, Python), so we can forget the parameter order in manual injection, or we may find it difficult to automatically extract metadata about the collaborators if we are in a dynamic language like php.
      The type of dependency injection we should adopt is slightly different: setter injection. This approach transforms the CommentsRepository class into:
      class CommentsRepository
      {
          private $_dbAdapter;
          private $_mailer;
          private $_logger;
      
          public function setDbAdapter(Zend_Db_Adapter $dbAdapter)
          {
              $this->_dbAdapter = $dbAdapter;
          }
      
          public function setMailer(Zend_Mail_Transport_Abstract $mailer)
          {
              $this->_mailer = $mailer;
          }
      
          public function setLogger(Logger $logger)
          {
              $this->_logger = $logger;
          }
      }
      There are, however, some problems with setter injection that we should solve:
      • setters allow changing collaborators after construction: this is often a counterproductive operation and so it should be avoided. The setters can check that the corresponding private property is null before accepting the parameter.
      • the Api is not clear: why are there setters if I cannot set anything? I suggest extracting an interface where the setters are not present. This also solves the previous problem, as the client class will depend only on an interface where setters are not defined, and in static languages it is not even allowed to call them. In dynamic languages, the developer should refer to the Api documentation of CommentsRepositoryInterface and not of the CommentsRepository concrete class.
      • we may forget a collaborator: both in manual and automated dependency injection you can forget to call a setter or to add a collaborator to the configuration, and the result is a broken object hanging around. So you should maintain some form of test for the factory or the container (typically in integration tests). A missing collaborator is a wiring bug and it is simple to solve, since it will manifest nearly always: the application will explode saying you called a method on null. Note that since I use null defaults for constructor parameters, this problem is also present in constructor injection.
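      The first two safeguards can be sketched as follows, with an invented Logger collaborator and hypothetical class names for illustration:

```php
<?php
// Sketch: a guarded setter plus a narrowed client-facing interface.

interface Logger
{
    public function log($message);
}

class NullLogger implements Logger
{
    public function log($message) {}
}

// the interface clients depend on exposes no setters...
interface CommentsRepositoryInterface
{
    public function findAll();
}

class GuardedCommentsRepository implements CommentsRepositoryInterface
{
    private $_logger;

    // ...while the concrete class refuses reinjection after wiring
    public function setLogger(Logger $logger)
    {
        if ($this->_logger !== null) {
            throw new Exception('Logger cannot be changed once set.');
        }
        $this->_logger = $logger;
    }

    public function findAll()
    {
        return array();
    }
}

$repository = new GuardedCommentsRepository();
$repository->setLogger(new NullLogger()); // fine: first injection
```

      A second call to setLogger() would throw, so a misconfigured container fails fast instead of silently swapping collaborators at runtime.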
      I hope you will consider setter injection, as I avoided it in the past without real reasons, and the design of your application can benefit from it.

      Tuesday, November 17, 2009

      Doctrine 2 and Zend Framework first date

      This morning I have tried for the first time to use Doctrine 2 in a Zend Framework application. I used the latest release, 2.0.0 alpha3, for this experiment.
      The chosen application is my recently born project NakedPhp, a port of the Naked Objects Java framework, which generates the user interface and lets the end user manipulate domain objects directly.
      During this first run, I have not set up an application resource yet and I have just hardcoded a few configuration values to bootstrap Doctrine correctly. I will publish a resource class (conforming to the Zend_Application_Resource_Resource interface) as soon as I have it ready.

      Doctrine\ORM\EntityManager is the Facade class which acts as a portal towards the functionality of Doctrine 2; it is the counterpart of Hibernate's EntityManager. Our code should interact mainly with this class.
      However, I have isolated the EntityManager behind an interface, since I do not want infrastructure code to slip into NakedPhp for now. The code will obviously depend on Doctrine, but it is good practice to have an interface I can mock out easily: I don't need all the methods of the EntityManager, and this way I just hide everything that is not mandatory instead of introducing coupling to it.
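      The isolation described above could look roughly like this. The interface and class names (EntityContainer, DoctrineEntityContainer, InMemoryEntityContainer) are invented for illustration and are not the actual NakedPhp code:

```php
<?php
// Sketch: a narrow interface hiding Doctrine's EntityManager, plus a
// fake implementation that tests can inspect.

interface EntityContainer
{
    public function persist($entity);
    public function flush();
}

class DoctrineEntityContainer implements EntityContainer
{
    private $_em;

    // $entityManager would be a Doctrine\ORM\EntityManager in production
    public function __construct($entityManager)
    {
        $this->_em = $entityManager;
    }

    public function persist($entity) { $this->_em->persist($entity); }
    public function flush()          { $this->_em->flush(); }
}

// in unit tests, a fake records what would have been persisted
class InMemoryEntityContainer implements EntityContainer
{
    public $persisted = array();

    public function persist($entity) { $this->persisted[] = $entity; }
    public function flush() {}
}

$container = new InMemoryEntityContainer();
$container->persist('some entity');
```

      Domain code only sees the two-method interface, so swapping Doctrine out (or mocking it) never touches the model classes.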

      Doctrine 2 is released in three packages: Common, Database Abstraction Layer and ORM. Instead of downloading three different packages I just grabbed them from the subversion repository:
      svn export http://svn.doctrine-project.org/tags/2.0.0-ALPHA3/lib/
      and moved the Doctrine/ and vendor/ folders into my library/ directory along with Zend/. The vendor folder contains a small annotation parser.
      It can also be useful to export other resources:
      svn export http://svn.doctrine-project.org/tags/2.0.0-ALPHA3/bin/
      svn export http://svn.doctrine-project.org/tags/2.0.0-ALPHA3/sandbox/
      The bin/ folder contains the doctrine.php and doctrine command-line scripts (same thing), while the sandbox provides a working example of Doctrine 2.

      Doctrine 2 prescribes that model classes (entities) and proxies should be autoloaded, so after moving doctrine.php into application/ I deleted the reference to the Doctrine autoloader and added:
      require_once __DIR__ . '/../application/bootstrap.php';
      which is my bootstrap file, where:
      • the library/ folder is added to the include_path
      • the autoloader is set up to load Zend/ classes
      • a Zend_Loader_Autoloader_Resource sets up autoloading for my model classes.
      • my autoloader is set up to take care of \Doctrine and \NakedPhp namespaces.
      In the future, I will add the proxy autoloading setup to this file. If you don't have your own autoloader for namespaced classes, you can simply use IsolatedClassLoader from Doctrine\Common.

      It's time to code a cli-config.php file to use with doctrine.php; this file should define two variables (it is well documented in the sandbox example). My final result is:
      $classLoader = new \Doctrine\Common\IsolatedClassLoader('Proxies');
      $classLoader->setBasePath(__DIR__ . '/../application/');
      $classLoader->register();
      
      $config = new \Doctrine\ORM\Configuration();
      $config->setMetadataCacheImpl(new \Doctrine\Common\Cache\ArrayCache);
      $config->setProxyDir(__DIR__ . '/Proxies');
      $config->setProxyNamespace('Proxies');
      
      $connectionOptions = array(
          'driver' => 'pdo_sqlite',
          'path' => '/var/www/nakedphp/sqlite/database.sqlite'
      );
      
      // These are required named variables (names can't change!)
      $em = \Doctrine\ORM\EntityManager::create($connectionOptions, $config);
      
      $globalArguments = array(
          'class-dir' => __DIR__ . '/../application/models'
      );
      This is practically the cli-config.php file grabbed from the sandbox, but slightly edited:
      • there were two instances of Doctrine\Common\IsolatedClassLoader, one for the entities and one for the proxies. I deleted the first one since entity autoloading is already taken care of in the bootstrap.
      • I haven't used proxies for now, but the configuration is mandatory. The default namespace and folder are enough.
      • I changed the path to the sqlite database. Sqlite was the fastest choice to get the application up and running, but remember that both the sqlite database file and its directory must be writable by apache and php.
      • I also changed the class-dir argument to point to my entities folder.
      Before starting to use the Doctrine 2 command-line interface, you will have to define annotations on your model classes, for instance @Column and @OneToOne annotations. Since I have developed without even caring about the database until now, I also had to add an $_id private property. :)
      I also submitted a patch to improve the errors generated by the schema tool in case of incorrect field names referenced on relations, which is what happened to me today.

      Now, it's time to generate your schema:
      php bin/doctrine schema-tool --re-create --config=bin/cli-config.php
      If cli-config.php is in the directory where you issue this command, you can leave out the --config option.
      Maybe after playing a bit with Doctrine 2, you will want to see what was inserted in the database:
      php bin/doctrine run-sql --sql="SELECT * FROM Example_Model_Place" --config=bin/cli-config.php 
      
      To obtain an EntityManager reference in a controller, you can set up a dumb resource that includes cli-config.php and returns $em. I used a factory which was already available and added a method for retrieving the instance.

      So I now have a hacked but working instance of Doctrine 2 in my project. The next step will be writing an application resource to allow configuration to be specified in the standard way, in application.ini. I will publish this resource in the coming days.

      The image at the top is the NakedPhp example application screen that says saving was successful. I implemented the storage of my persistence-agnostic in-memory object graph in less than an hour.

      Monday, November 16, 2009

      Defaulting to private

      While writing classes, which scope do you normally choose for your fields and methods? Do you consciously choose a visibility or just stick with what your IDE proposes?

      Before digging into the philosophical discussion, let's summarize the different visibilities available in object-oriented languages:
      • Public: no limitations.
      • Package: a scope level available in Java but not in C++ or php. The member can be accessed by code that resides in its own package.
      • Protected: the member can be accessed only by its own class and by subclasses. Similarly, Friend visibility in some languages is used to allow friend classes to access the field or method.
      • Private: the member can be accessed only by code of its own class.
      Note that I said its own class, and not its own object, as private fields are usually accessible by other objects of the same class.
      <?php
      class ComplexNumber
      {
          private $_real;
          private $_imaginary;
      
          /* constructor and other methods... */
      
          public function equals(ComplexNumber $another)
          {
              if ($another->_real == $this->_real 
              and $another->_imaginary == $this->_imaginary) {
                  return true; 
              }
              return false;
          }
      }
      The reason behind this behavior is that encapsulation with limited visibility facilitates changing the code. If we are modifying the $_real and $_imaginary fields we are changing the class, so there is no problem in limiting visibility to the class code instead of to a particular object (which would force an object to access only its own private fields and not those of its siblings).

      I am a proponent of test-first approaches to software development, and this means I often implement in production code the simplest thing that could possibly work and that makes my tests pass. Another limitation I follow during development concerns visibility: if there are no tests that access a property or a method, there is no reason for it to be public.
      Whenever I create a new class member, be it a field or a method, I default to private or protected visibility. Only if a method is the subject of a test does it become public, while it is very rare that I need a public field.
      This rule of thumb gives the code the advantage of increased encapsulation, since public visibility is chosen only when mandatory. The distinction between private and protected is relevant only if you allow subclassing, something you usually control if your code is not part of a framework or public library.

      In php 4 there were no visibility modifiers, and all members were public. This was one of the serious limitations of supporting php 4 in object-oriented applications, as CakePHP did. Apart from naming conventions, there was no way to tell apart methods which were part of the Api from internal ones, which could change in any subsequent release. You can tell developers that private methods start with '_', but if there is no enforced limitation of scope it is very easy for a developer to take a shortcut and call an internal method.
      Thus, the advantage of greater encapsulation is keeping the Api small, reducing coupling and leaving fewer method signatures carved in stone. Any source file can call a public method, so you cannot simply change its parameters and side effects. While renaming a method is often simple thanks to modern IDEs (or sed), if you change the behavior of a method you have to review every place in the codebase to make sure there are no incorrect assumptions.
      On the other hand, if I decide to expose a method as public, I want a test that forces me to comply and that would be red if the visibility were different. This process makes the contract of the class explicit, and avoids breaking the method in the future.

      Sometimes refactoring produces a private method worth testing. However, a private method cannot be tested directly, but only through public methods of the same class. Still, it is covered indirectly: if it were not, you would simply remove it as unreachable code.
      If you feel like testing it independently, it is probably a sign that this method carries out tasks outside the current class's responsibility, and you should move the private method into a collaborator (which may have to be created from scratch).
      Encapsulation would be maintained since the collaborator would be stored as a private property of the SUT. If the method cannot be moved out and cannot be exposed, it should not be tested: you maintain the freedom to change it later as long as your public methods do not make the tests red.
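      The extraction described above can be sketched like this. PostService, Slugifier and their methods are hypothetical names, invented only to show the refactoring:

```php
<?php
// Before: PostService had a private _slugify() we were tempted to test.
// After: the logic lives in a collaborator with a public, testable method.

class Slugifier
{
    public function slugify($title)
    {
        // lowercase, trimmed, spaces replaced by dashes
        return strtolower(str_replace(' ', '-', trim($title)));
    }
}

// the original class keeps the collaborator as a private property,
// so encapsulation is preserved from the point of view of its clients
class PostService
{
    private $_slugifier;

    public function __construct(Slugifier $slugifier)
    {
        $this->_slugifier = $slugifier;
    }

    public function publish($title)
    {
        return $this->_slugifier->slugify($title);
    }
}

$service = new PostService(new Slugifier());
```

      Now Slugifier::slugify() can be unit tested directly, while PostService's tests only exercise its public publish() method.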

      So, whenever you are coding object-oriented applications, I suggest keeping as many methods as you can private, and exposing public methods only for contracts between classes.

      Sunday, November 15, 2009

      Now on Facebook

      I created a handy Facebook page for people who want to follow Invisible to the eye via this great social network. All new posts will also be referenced on its Wall to provide prompt notifications for you.




      Saturday, November 14, 2009

      How to eliminate singletons (part 2)

      In the previous post I hacked up a very small component for automating dependency injection of php classes, which mimics the behavior of the multitude of dependency injection frameworks out there.
      The question that arose in the last part of the article was: how can I construct objects with a shorter lifetime than the application-wide one? In Zend Framework's case there are many controllers and view helpers that we want to instantiate only if necessary, and whose proper instantiation should happen only after a bit of logic has been executed, for instance after the http request has been processed by the router to produce a controller name.

      With manual dependency injection the solution would be straightforward: just inject a ControllerFactory into the Zend_Controller_Dispatcher_Standard, which is the object that currently creates the controller. But the Zend_Controller component manages userland controllers, and we cannot code a factory in advance to cover the possible use cases in every domain, nor do we want the end user to write boilerplate code in the form of a factory for his controllers.
      Since the only viable solution is automatic dependency injection, we should create a configurable factory instead:
      $controllersConfig = array(
          'My_Controller' => array(
              'myClass' => 'My_Class'
          ),
          'My_Class' => array(
              // ...My_Class's collaborator names listed by key
          )
      );
      $frameworkConfig = array(
          'Zend_Controller_Dispatcher_Standard' => array(
              'controllerProvider' => 'Zend_Controller_Provider' 
          ),
          'Zend_Controller_Provider' => array(
          'config' => new Zend_Config($controllersConfig) 
          )
          // ...collaborator configuration of the router, the front controller, etc.
      );
      I use the name Provider since it was popularized by Guice, but it is in fact a configurable Abstract Factory. The Zend_Controller_Provider class can be decoupled with a small interface:
      interface Zend_Controller_Provider_Interface
      {
          public function getController($name);
      }
      class Zend_Controller_Provider extends Injector 
            implements Zend_Controller_Provider_Interface
      {
          public function __construct($options)
          {
              parent::__construct($options['config']);
          } 
       
          public function getController($name) 
          {
              return $this->newInstance($name);
          }
      }
      The provider subclasses the Injector from the previous example to keep the code example short, but using composition of an Injector instance would make no difference.
      The Injector code has to be extended a little to allow specifying objects in the configuration:
      <?php
      class Injector
      {
          // ... constructor and private members
          public function newInstance($class)
          {
              if (is_object($class)) {
                  return $class;
              }
              // default flow
              if (!isset($this->_config[$class])) {
                  // it's a literal value like 'mydbpassword';
                  return $class;
              }
              $collaborators = array();
              foreach ($this->_config[$class] as $collaboratorName => $collaboratorClass) {
                  $collaborators[$collaboratorName] = $this->newInstance($collaboratorClass);
              }
              return new $class($collaborators);
          }
      }
      Including objects in the configuration should be done only when we need collaborators which are really just Value Objects with no behavior, like Zend_Config. It would be more complicated to set up different Zend_Config objects for injection, while it is actually a newable class (and so should not be injected, just as we would not inject an ArrayObject).

      Let's list the advantages we have just gained:
      • Independent instantiation of controllers. The Dispatcher will instantiate only the controller actually needed (though it is the injector that will call new). Another provider can be set up for view helpers and other short-lived classes.
      • Real unit testing for controllers and view helpers: it will be easy to inject stubs and mocks in a controller, since it is now forced to have setters or a unified constructor.
      • Real unit testing for the dispatcher: we can easily inject a fake Zend_Controller_Provider_Interface implementation, and test that, given the right parameters, it requests the chosen controller class.
      Note that it is correct for shorter-lived classes to hold field references to longer-lived ones. For instance, the Url view helper should be injected with the current instance of the router, since it needs the collaboration of something that knows all the defined routes. So one more question arises: how can the problem be solved now, given that the Zend_Controller_Provider class is created only with a Zend_Config as a parameter?
      The simplest thing that can work in this case is to implement a generic Provider interface with a setInjector(Injector $longLivedInjector) method, so that when the interface is detected the original injector can create a clone of itself (that still references the created objects) and pass it to the short-lived objects' provider, in this example Zend_Controller_Provider. The Provider can then add the controller configuration to its Injector instead of extending it, thus favoring composition over inheritance.
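      A minimal sketch of that hand-off, with the Injector reduced to a configuration holder (the Provider interface, addConfig() method and ControllerProvider class are invented names for illustration):

```php
<?php
// Sketch: a generic Provider interface whose implementations receive a
// clone of the long-lived injector, extended with their own config.

class Injector
{
    private $_config;

    public function __construct(array $config = array())
    {
        $this->_config = $config;
    }

    public function addConfig(array $config)
    {
        $this->_config = array_merge($this->_config, $config);
    }
}

interface Provider
{
    public function setInjector(Injector $longLivedInjector);
}

class ControllerProvider implements Provider
{
    private $_injector;
    private $_controllersConfig;

    public function __construct(array $controllersConfig)
    {
        $this->_controllersConfig = $controllersConfig;
    }

    public function setInjector(Injector $longLivedInjector)
    {
        // cloning keeps references to already created long-lived objects
        // without letting this provider mutate the original configuration
        $this->_injector = clone $longLivedInjector;
        $this->_injector->addConfig($this->_controllersConfig);
    }
}

$provider = new ControllerProvider(array('My_Controller' => array()));
$provider->setInjector(new Injector(array()));
```

      The bootstrap injector only needs to check instanceof Provider on the objects it creates, and call setInjector() when it matches.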

      I'm sure production-ready DI frameworks solve all these problems and maybe others, since it took us only two days to figure out the theory of operation. Automated Dependency Injection is a must-have for Zend Framework 2.0, and this is a proof of concept of how it can be implemented.

      Friday, November 13, 2009

      How to eliminate singletons

      There has already been a bit of discussion on the zf-contributors mailing list and in the wiki about the Zend Framework 2.0 roadmap, which will guide Zend Framework's evolution in php 5.3 and the development of new classes.
      One of the key points in the architectural discussion is singleton usage. Singletons are scheduled for termination and in my opinion they just have to go (unless they represent global state which cannot really be reset, such as autoloading).

      It is actually very simple to eliminate singletons: just force the components to ask for what they need in the constructor, via setters, or via inject*() methods, instead of looking up a singleton through a static method only to obtain a reference.
      Once this fundamental decoupling is achieved, the hard part is tackling the construction problem: Zend Framework has a big codebase and writing a factory (manual dependency injection) for every use case is not viable.
      Thus, automatic dependency injection needs to be called upon. There are many xml-configured dependency injection frameworks for php to draw upon, but let's show a simple example of how they work.

      Suppose we want a reference to an instance of My_Class in a controller, and we want its collaborators automatically injected by some component. As the requirements say, My_Class has a unified constructor which passes the collaborators to the setters, but dependency resolution is also possible with pure setter-based injection. I would really prefer a bunch of setters if there are many dependencies.
      This code is based on Zend Framework 1.x classes as I do not want to confuse anyone.
      <?php
      class My_Class
      {
          private $_adapter;
      
          public function __construct($options)
          {
              // calling setters or having them called...
          }
      
          public function setAdapter(Zend_Db_Adapter $adapter)
          {
              if (isset($this->_adapter)) {
                  throw new Exception('Adapter cannot be changed once set.');
              }
              $this->_adapter = $adapter;
          }
      }
      Since I want to show how to eliminate singletons, I have declared a dependency on a Zend_Db_Adapter instance, which is commonly put in Zend_Db_Table::setDefaultAdapter() as a singleton. It does not matter that singletonitis is cared for by another class: it is still mutable global state, and the same problem is present for the front controller instance. I did not feel like including a Zend_Front_Controller instance in this example, as it is often used only as a means to access other objects, and the underlying problem (a Law of Demeter violation) is not resolved by injecting it.

      Configuration has to be defined in a plain old array which will be used by the container:
      <?php
      $config = array(
          'My_Class' => array(
              'adapter' => 'Zend_Db_Adapter_Mysqli'
          ),
          'Zend_Db_Adapter_Mysqli' => array(
              'driver_options' => ...  // username and password here
          ),
          'My_Controller' => array(
              'my_class' => 'My_Class'
          )
      );
      Eventually, the controller can ask for its collaborators instead of grabbing them from other sources. This data structure is very basic but it will do the trick for now.
      An automatic dependency injection component would simply recursively resolve dependencies:
      <?php
      class Injector
      {
          private $_config;
      
          public function __construct($config) { $this->_config = $config; }
      
          public function newInstance($class)
          {
              if (!isset($this->_config[$class])) {
                  // it's a literal value like 'mydbpassword';
                  return $class;
              }
              $collaborators = array();
              foreach ($this->_config[$class] as $collaboratorName => $collaboratorClass) {
                  $collaborators[$collaboratorName] = $this->newInstance($collaboratorClass);
              }
              return new $class($collaborators);
          }
      }
      and the bootstrap would be very simple:
      $injector = new Injector($config);
      $application = $injector->newInstance('Zend_Application');
      Of course Zend_Application might be refactored to become the injector itself, so the object really constructed could be an action or a front controller. This code is very basic and can be improved with object caching (to prevent multiple connections from being instantiated) where appropriate, with a list of "global" classes (whose scope is nonetheless limited to a Zend_Application object and can be thrown away whenever you want) to prevent the configuration from growing too much, and in another hundred ways.
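      The object-caching improvement just mentioned can be sketched by memoizing instances by class name, so that for example a database connection is created only once per request. This is an assumed extension of the Injector above, not part of any framework:

```php
<?php
// Sketch: the Injector from the post, extended with an instance cache.

class CachingInjector
{
    private $_config;
    private $_instances = array();

    public function __construct(array $config)
    {
        $this->_config = $config;
    }

    public function newInstance($class)
    {
        if (!is_string($class) || !isset($this->_config[$class])) {
            // it's a literal value like 'mydbpassword'
            return $class;
        }
        if (isset($this->_instances[$class])) {
            // reuse the shared instance instead of building a new graph
            return $this->_instances[$class];
        }
        $collaborators = array();
        foreach ($this->_config[$class] as $name => $collaboratorClass) {
            $collaborators[$name] = $this->newInstance($collaboratorClass);
        }
        return $this->_instances[$class] = new $class($collaborators);
    }
}

// a sample class with a unified constructor, to exercise the cache
class Registry
{
    public function __construct($options) {}
}

$injector = new CachingInjector(array('Registry' => array()));
```

      Two calls to newInstance('Registry') now return the very same object, which is exactly what you want for a connection or a logger.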

      The problem with this approach is that we have, for example, to instantiate all the controllers and all the view helpers, because we don't know which will be used during this request: the object graph construction process is completed before the request management. What did you expect from ten lines of not-TDDed code? :)
      This paradigm of one-time instantiation is typical of Java applications, where nearly everything is instantiated in the bootstrap "just in case". Php has a shared-nothing architecture, and instantiating more than the necessary objects would be a waste.
      In the next post I will solve this big issue using deferred instantiation and different injectors, showing how nearly all singletons can be reduced to injected collaborators.

      Thursday, November 12, 2009

      Zend Framework 2.0

      I just wrote in a comment that Zend Framework 2.x did not exist yet and, today, the lead developer Matthew Weier O'Phinney has posted the roadmap for the 2.0 version of the framework, inviting php developers to participate in the discussion by commenting on the wiki or via the zf-contributors mailing list.
      I already posted some questions on the wiki, but I would like to expand my thoughts on the architectural changes from a testing and design point of view, as these are what interest my readers.
      Here's a list of the guidelines that have the greatest impact.

      Unified constructor
      Every injectable class will have a constructor which accepts an array or a Zend_Config (I guess it will become Zend\Config) instance whose elements are passed to setters. This is becoming more and more the most adopted injection paradigm in the 1.x branch as well. A standard is necessary, and in a dynamic language like php the unified constructor works well, while accessing type hints via reflection, as Dependency Injection frameworks do in statically-typed languages, is troublesome and I don't even know if it is possible.
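      The unified constructor convention can be sketched as follows; the class and option names here are invented for illustration:

```php
<?php
// Sketch: a unified constructor that forwards each option to the
// corresponding setter, as the ZF 2.0 roadmap proposes.

class UnifiedConstructorComponent
{
    private $_timeout;

    public function __construct(array $options = array())
    {
        foreach ($options as $name => $value) {
            // 'timeout' => setTimeout(), 'adapter' => setAdapter(), ...
            $method = 'set' . ucfirst($name);
            if (method_exists($this, $method)) {
                $this->$method($value);
            }
        }
    }

    public function setTimeout($timeout)
    {
        $this->_timeout = $timeout;
    }

    public function getTimeout()
    {
        return $this->_timeout;
    }
}

$component = new UnifiedConstructorComponent(array('timeout' => 30));
```

      A DI container can then build any such class the same way: pass one associative array, no per-class factory code needed.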

      Elimination of singletons
      Eventually, singletons will be refactored away and we will stop seeing Zend_Controller_Front::getInstance() calls scattered all over the codebase. The various reset operations performed by the Zend_Test component during test teardown should have hinted that something was fundamentally problematic in the design.

      Design by contract
      Multiple implementations of interfaces should be allowed by injection hooks, and interfaces should be extracted where needed. The abstract base classes so widespread in the 1.x version of the framework do not make it easier to favor composition over inheritance, since they force our classes to choose them as their sole parent.

      Exceptions without inheritance
      An example of avoiding problematic inheritance is the elimination of deep inheritance trees for exceptions. The base exceptions of a component should be interfaces. This is a finesse I appreciate.
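      The idea can be sketched like this: the component's base exception is a marker interface, so concrete exceptions are free to extend whichever SPL exception fits best. The Component_* names are invented for illustration:

```php
<?php
// Sketch: exceptions without inheritance trees. The marker interface
// replaces a base exception class.

interface Component_Exception
{
}

class Component_InvalidArgumentException
    extends InvalidArgumentException
    implements Component_Exception
{
}

class Component_RuntimeException
    extends RuntimeException
    implements Component_Exception
{
}

// callers can still catch every exception of the component in one clause,
// because catch blocks match interfaces too
try {
    throw new Component_RuntimeException('something failed');
} catch (Component_Exception $e) {
    $caught = get_class($e);
}
```

      Each concrete exception keeps the semantics of its SPL parent (catchable as RuntimeException, for instance) while still being catchable as a Component_Exception.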

      Namespaces
      Obviously php 5.3 namespaces will be adopted and the _ in class names will be replaced by the \ namespace separator.
      One thing I disagree with is the separate namespace for testing: I would rather have unit tests in a parallel tree, like Java packages do (library/Zend/Filter/Int.php and tests/Zend/Filter/IntTest.php). A parallel structure gives several advantages:
      • it saves the developer from having to import the classes he is writing tests for;
      • naming collisions with the production code are impossible, since the class names in the parallel tree all end with 'Test' (and the file names with 'Test.php');
      • the remaining use statements expose only the imports the code performs from different namespaces, expressing the real coupling of the system under test. Coupling to classes which live in the same folder is often inevitable and it is not interesting to keep it under control.
      Mvc implementation
      The Mvc implementation (Zend_Controller) will undergo some surgery to improve performance and simplicity. In my opinion many features can be dropped: for instance, I stopped using the action stack to perform multiple operations because it was too slow. It is also not test-friendly, since you cannot assert that different actions were performed. I prefer to simply keep my logic out of controllers, so I see no use for elaborate features in request dispatching as long as controllers are proposed as thin classes.
      The point of the design-by-contract paradigm is to gain freedom in setting up the Mvc stack and injecting different collaborators which adhere to the contract. I saw the Phly_Mvc reference implementation and the interfaces are already present; it also uses a publish/subscribe pattern to dispatch events. In Zend Framework 1 we were able to substitute parts of the Mvc machinery only by subclassing, while in 2 the approach will be cleaner as code will depend only on an interface.

      Zend_Session
      The backward-compatibility break in the 2.0 version is the right time to change Zend_Session as well, improving its Api and behavior. Testing that involves sessions is difficult, and I think the right approach is not to turn Zend_Session into a singleton, but to decouple controller code through a session container whose implementation can be injected during bootstrap: it is something I would want to isolate just like a mail service.
      The Zend_Session_Namespace objects in 1.x directly access the $_SESSION superglobal, mutating global state and becoming hard to test: a different solution could be placing them in the $_SESSION variable when they are constructed or reconstituted in their factory (which does not exist yet). In any case, the session namespace objects should do less work, particularly in the constructor.
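      A minimal sketch of the session container I have in mind (all interface and class names here are hypothetical, not a proposed Zend Framework 2 Api):

```php
// The contract controllers would depend on, instead of touching $_SESSION.
interface SessionContainer
{
    public function get($key);
    public function set($key, $value);
}

// Real implementation, wired in during bootstrap.
class PhpSessionContainer implements SessionContainer
{
    public function get($key)
    {
        return isset($_SESSION[$key]) ? $_SESSION[$key] : null;
    }

    public function set($key, $value)
    {
        $_SESSION[$key] = $value;
    }
}

// In-memory fake for unit tests: no global state is touched.
class ArraySessionContainer implements SessionContainer
{
    private $_data = array();

    public function get($key)
    {
        return isset($this->_data[$key]) ? $this->_data[$key] : null;
    }

    public function set($key, $value)
    {
        $this->_data[$key] = $value;
    }
}
```

      Production code receives a PhpSessionContainer at bootstrap, while unit tests pass an ArraySessionContainer and never depend on a started session.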

      The discussion is important as we are now shaping the future framework. Feel free to counterargue in the comments and in the wiki. :)

      Wednesday, November 11, 2009

      What's going on with php object-relational mappers

      Every once in a while, in a post, I say:
      The Domain Model should not depend on anything else; it is the core of an application. Classes should not extend or implement anything extraneous. I do not want User extends Doctrine_Record. I want User.
      Sorry to insist, but this is one of the points of DDD, and the one that provides advantages even when applied in other architectures. The persistence problem can be solved by generic object-relational mappers which act as the bridge between entities and the database used for persistence. But where do we find a generic Orm?

      In php, no real generic Orms existed until 2009: Zend_Db, Doctrine 1, Propel etc. are all implementations of the Active Record (or a similar data gateway) pattern, requiring for example all your User and Post classes to subclass a base record. The only way to obtain a persistence-agnostic model was to implement by hand all the mapper classes which translate between database rows and object graphs. When you are managing more than a few different entity classes, the problem quickly becomes intractable.
      The Data Mapper pattern describes exactly a generic Orm, but it is an even more general concept in the sense that the mapping is not limited to a relational database like MySql or Sql Server. You can write a Data Mapper to store your objects in plain text files or document-oriented databases if you want.
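      To make the distinction concrete, here is a sketch of a persistence-agnostic entity and a toy mapper (class names are illustrative; a real generic Orm configures or generates this translation instead of requiring you to hand-code it per class):

```php
// The entity is a plain php class: it extends nothing and knows
// nothing about storage.
class User
{
    private $_name;

    public function __construct($name)
    {
        $this->_name = $name;
    }

    public function getName()
    {
        return $this->_name;
    }
}

// All row-to-object translation lives in the mapper. A relational,
// file-based or document-based mapper would expose the same contract.
class UserMapper
{
    // Reconstitute an entity from a row fetched from any storage.
    public function fromRow(array $row)
    {
        return new User($row['name']);
    }

    // Flatten the entity back into a storable row.
    public function toRow(User $user)
    {
        return array('name' => $user->getName());
    }
}
```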

      Last summer, I encountered two in-development solutions to the persistence-agnosticism problem: Doctrine 2 and Zend_Entity. They are implementations of the Data Mapper pattern, modeled on the Jpa specification and with a similar Api. I learned that the Java guys had implemented a real Data Mapper years ago: Hibernate. Jpa is only a specification extracted as a subset of Hibernate, and it is an additional abstraction layer that decouples your mapping code (annotations or xml files) from the particular Orm.
      Anyway, I contributed php code to the lazy loading capabilities of Doctrine 2, and some small patches to Zend_Entity before it was discontinued. I am currently waiting for a stable version of Doctrine 2 to integrate into NakedPhp, only because I am not worrying about persistence for now. It is the power of the Data Mapper approach that decouples my work from a specific storage such as a relational database.
      Fast forward to today: Doctrine 2 is in alpha while it is being thoroughly tested. Zend_Entity has instead been dropped, in favor of Doctrine 2 integration in the Zend Framework. It is not useful to maintain two different code bases which share the same Api transposed from Jpa, do the same persistence-related dirty work and are developed by the same people. It's just a waste of the contributors' time.

      Thus, Doctrine 2 is going to become the first production-ready Data Mapper Orm for php, and to enjoy seamless integration in both Zend Framework and Symfony. If you have not yet tried it, you may want to give it a shot.
      If you feel like helping with the integration, which involves Zend_Tool components for generation and Zend_Application resources, join the zf-doctrine mailing list. The integration also covers Doctrine 1, since Doctrine 2 requires php 5.3 and its adoption by hosting companies will be gradual.
      The adoption of the 2.x branch, when ready, will give your design the freedom from the database that you want. Doctrine 2 is for php the greatest thing since sliced bread.

      Tuesday, November 10, 2009

      Mocking and template methods

      As you probably know, stubbing or mocking is a practice used in unit testing where a class's methods are substituted, via subclassing, with test-friendly versions of themselves. The difference between stubbing and mocking resides in where the assertions are made, but that is not the main topic of this post.
      The need for small and cohesive interfaces is felt particularly when mocking a class. We typically want to test a unit in isolation and write mocks for its collaborators without going mad.
      Let's see an example of a class I may want to mock:
      class NakedEntity
      {
          private $_class; // the NakedClass collaborator, set in the constructor

          public function getMethods()
          {
              return $this->_class->getMethods();
          }
      
          public function getMethod($name)
          {
              $methods = $this->_class->getMethods();
              return $methods[$name];
          }
          
          public function hasMethod($name)
          {
              $methods = $this->_class->getMethods();
              return isset($methods[$name]);
          }
      
          // other methods, constructor...
      }
      As I said earlier, mocking is effective if there is a small interface to mock. Note that every class defines an implicit interface: the set of its public methods. Sometimes the interface includes several methods that give access to the same data or behavior, and that have to be present to avoid abstraction inversion. In this particular case, if I defined only getMethods() to preserve a small and cohesive interface, every class that depends on NakedEntity would have to reimplement the two missing methods.
      Mocking all three methods of NakedEntity in phpunit means writing this:
      $mock = $this->getMock('NakedEntity');
      $mock->expects($this->any())
           ->method('getMethods')
           ->will($this->returnValue(array('doSomething' => ..., 'foo' => ...)));
      $mock->expects($this->any())
           ->method('getMethod')
           ->will($this->returnValue(...));
      $mock->expects($this->any())
           ->method('hasMethod')
           ->will($this->returnValue(true));
      Compare this to the creation of a real NakedEntity. Creating a real object would certainly save test code, but then the unit tests would not be executed in isolation: I would have to define a mocked NakedClass object (the $this->_class property) and break the Law of Demeter.
      Moreover, the mocking capabilities of phpunit are limited: for example, we cannot define different return values based on the parameters without a callback (a real subclass is needed in that case). I could mock only the methods effectively used by the SUT, but I don't really know which of them are actually called (since they are more or less equivalent), and I want to be able to refactor the SUT without changing the tests.
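      For example, a hand-written stub subclass can return different canned values depending on the parameter; NakedEntity is simplified here to the single relevant method:

```php
// Simplified version of the class under discussion.
class NakedEntity
{
    public function getMethod($name)
    {
        // real implementation elided: delegates to a NakedClass collaborator
    }
}

// A hand-written stub: canned values are chosen per parameter,
// which a plain phpunit mock cannot do without a callback.
class NakedEntityStub extends NakedEntity
{
    private $_methods;

    public function __construct(array $methods)
    {
        $this->_methods = $methods;
    }

    public function getMethod($name)
    {
        return $this->_methods[$name];
    }
}
```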
      So I devised a two-step alternative solution.

      Step 1: convenience methods become template methods
      I started with refactoring the NakedEntity class:
      class NakedEntity
      {
          private $_class; // the NakedClass collaborator, set in the constructor

          public function getMethods()
          {
              return $this->_class->getMethods();
          }
      
          public function getMethod($name)
          {
              $methods = $this->getMethods();
              return $methods[$name];
          }
          
          public function hasMethod($name)
          {
              $methods = $this->getMethods();
              return isset($methods[$name]);
          }
      
          // other methods, constructor...
      }
      The users of getMethods() are now template methods, and the base method (a primitive operation, in design patterns jargon) can be overridden in a subclass to provide alternative behavior. The subclass can be implemented as a real reusable class, which would include a setMethods() utility method (no pun intended), or via mocking.

      Step 2: mock the base method
      Now only getMethods() needs to be substituted:

      $mock = $this->getMock('NakedEntity', array('getMethods'));
      $mock->expects($this->any())
           ->method('getMethods')
           ->will($this->returnValue(array('doSomething' => $myMethod, 'foo' => ...)));
      $this->assertEquals($myMethod, $mock->getMethod('doSomething')); 
      
       
      This approach works well because the contract of NakedEntity is already cohesive and the different methods provide different ways to access the same thing. The template methods contain almost no logic and they are exercised in unit tests other than their own: a very small trade-off, because it is highly improbable that they will break and cause another class's unit tests to fail for no reason. The template methods in this case are only glue code.
      Don't use this testing pattern as an excuse to write many public methods: you should break up a class into different units if its contract grows too much. You can implement a Decorator pattern if convenience template methods are implementing business logic on top of a public method, or it may be the case that your class is doing too much and the Api is too complicated. Another viable solution, if you have an explicit interface instead of a concrete class, is creating a reusable Fake implementation which contains the convenience methods as well.
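      A sketch of the Decorator alternative, with hypothetical names: the convenience method wraps a minimal contract instead of enlarging it.

```php
// The minimal contract: trivial to mock or fake.
interface MethodSource
{
    public function getMethods();
}

// A trivial implementation, usable as a test fake too.
class ArrayMethodSource implements MethodSource
{
    private $_methods;

    public function __construct(array $methods)
    {
        $this->_methods = $methods;
    }

    public function getMethods()
    {
        return $this->_methods;
    }
}

// The decorator carries the convenience logic, so the inner
// implementation stays small.
class ConvenienceMethodSource implements MethodSource
{
    private $_inner;

    public function __construct(MethodSource $inner)
    {
        $this->_inner = $inner;
    }

    public function getMethods()
    {
        return $this->_inner->getMethods();
    }

    public function hasMethod($name)
    {
        $methods = $this->_inner->getMethods();
        return isset($methods[$name]);
    }
}
```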
      In conclusion, if you have a contract with many cohesive and dumb methods which rely on a central one to provide data, you can turn them into template methods and reuse them in other unit tests, mocking only the primitive operation.

      Monday, November 09, 2009

      Why I don't like the Bowling Game kata

      I am a big fan of Uncle Bob and I think he is a master of object-oriented programming and architecture. However, in my opinion his Bowling Game kata (a solution for the bowling game scoring problem) is not the right example to explain design via Test-Driven Development.
      The kata consists in showing how to TDD a Game class which calculates the score of a bowling game based on the pins knocked down by each ball. It is a pure TDD exercise, accomplished by writing one test at a time, making it pass and refactoring the Game class before adding a new one. This kata has circulated in the blogosphere for a long time.
      Although it is indeed useful to see a perfect and practical example of testing-first for the naive programmer, I didn't enjoy reading the various slides.

      For instance, these are the scoring rules for 10-pin bowling games, extracted from the Kata:
      The game consists of 10 frames as shown above. In each frame the player has two opportunities to knock down 10 pins. The score for the frame is the total number of pins knocked down, plus bonuses for strikes and spares. A spare is when the player knocks down all 10 pins in two tries. The bonus for that frame is the number of pins knocked down by the next roll. So in frame 3 above, the score is 10 (the total number knocked down) plus a bonus of 5 (the number of pins knocked down on the next roll). A strike is when the player knocks down all 10 pins on his first try. The bonus for that frame is the value of the next two balls rolled [...].
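      For reference, the quoted rules fit into a short function; this is a sketch of mine, not Uncle Bob's solution:

```php
// $rolls is the flat list of pins knocked down by each ball.
function score(array $rolls)
{
    $total = 0;
    $i = 0;
    for ($frame = 0; $frame < 10; $frame++) {
        if ($rolls[$i] == 10) {
            // Strike: 10 plus the next two balls; the frame used one ball.
            $total += 10 + $rolls[$i + 1] + $rolls[$i + 2];
            $i += 1;
        } elseif ($rolls[$i] + $rolls[$i + 1] == 10) {
            // Spare: 10 plus the next ball.
            $total += 10 + $rolls[$i + 2];
            $i += 2;
        } else {
            // Open frame: just the pins knocked down.
            $total += $rolls[$i] + $rolls[$i + 1];
            $i += 2;
        }
    }
    return $total;
}
```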
      These are fixed business rules. Once the last test is in place, there is nothing left to add to the class, since the bowling rules are considered standard. It is perfect now and forever. How many times did you write a class that never changed?
      A design is considered good if it accommodates changes in the business requirements, and I would have tried to implement different bowling scoring systems to see how the Game class can be modified to pass new acceptance tests without breaking the existing ones. There is a total of five requirements expressed by the tests, and the code is refined accordingly as they are added, but it is more an academic example than a real-world situation.
      When you propose TDD to fellow programmers it seems reasonable, but the first question they ask is "How do I test my database application?", not "In what order should I put my test helper methods?". There are different priorities in learning TDD.

      This kata is interesting in the sense that it applies a scientific method, changing one factor at a time in the TDD equation and analyzing the result. You find people that execute the same kata in different languages, with different frameworks, with different programming paradigms, and so on.
      But when the majority of frameworks out there still use static methods, executing katas sometimes crosses the border into overdesign, gold plating and endless polishing. I learned something from the kata, but it's not rocket science. Why not write a patch for some open source project that you use every day, instead of investing time in doing the same thing again and again?

      There are many math problems which are perfect for learning a new language: consider for instance writing a function which finds perfect numbers. If I were a computer science professor I would assign these problems to C beginners, as they conveniently have no external dependencies and are easily solvable.
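      As a sketch of that exercise (in php rather than C, for consistency with the rest of this blog):

```php
// A number is perfect when it equals the sum of its proper divisors,
// e.g. 6 = 1 + 2 + 3 and 28 = 1 + 2 + 4 + 7 + 14.
function isPerfect($n)
{
    if ($n < 2) {
        return false;
    }
    $sum = 1; // 1 divides every $n >= 2
    for ($i = 2; $i * $i <= $n; $i++) {
        if ($n % $i == 0) {
            $sum += $i;
            if ($i != $n / $i) {
                $sum += $n / $i; // count the paired divisor once
            }
        }
    }
    return $sum == $n;
}
```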
      The limit of such a learning methodology is that only structured-programming capabilities are exercised: there are many development patterns which are not necessary for solving math problems, and which a beginner will not implement unless forced to. Doing the simplest thing that could possibly work leads a beginner to create a function for calculating perfect numbers, not a class well-tested in different scenarios. The same is true for calculating a bowling game score.
      The kata is a very narrow case, useful when you test a class with no dependencies, no lifecycle problems and the smallest possible Api: a class you will probably never encounter.
      Before reading the kata I was excited, because I was going to learn how Uncle Bob works. But the demonstration is in fact very basic. I would have preferred to see how he deals with interface design, legacy code refactoring, or collaborator extraction as classes grow.
