Wednesday, September 16, 2009

How to TDD a database application

After the publishing of Failed attempt to apply Tdd response, some readers have asked me how to test a database application, as an example of code which is not suitable for xUnit frameworks.

Let's start with saying that database is not a special case for testing, as it's only a port of your application. Whenever the application interact with an external program I would say it presents a port, and this pattern is called Hexagonal Architecture.
Your effort should be in testing thoroughly the core of the application, building a solid object-oriented Domain Model piece after piece, by writing a test at the time an making it pass. The Domain Model should not have dependency on any infrastructure like database, http requests, twitter adapters, and so on: this refinement of the Domain Model pattern is called Domain-Driven Design and when applied produces easy testable code. The Domain Model would be tested by injecting fake adapters as there should be no logic in the database: this is object-oriented programming and in database object do not exist nor we can test it easily and automatically with Junit.
Though, this approach is very complicated and it is considered when the domain has a rich set of rules and behavior, while many applications have only CRUD capabilities.

Think of your application as existing only in Ram memory and strip out all the unnecessary code. The classes which remains are the core of the Domain Model. For instance if I had to manage the list of the users of a forum, I would write initially only a User class and a generic collection of User objects. User in my point of view is a POJO or POPO, which does not extend anything:
class User { ...

Insulating the database from view will hide this generic collection behind an abstraction
, since it is not present at all in memory except for caching. Subsets of it can be reconstituted as needed. This abstraction is called a Repository or, at a low level, a DataMapper. Hibernate for Java and Doctrine 2 for php are example of a DataMapper pattern: they let you work on your objects and then take them and synchronize the database with the new data you have inserted: a change in a User's password, or new Users to be added. To polish the DataMapper Api, which is very generic, a UserRepository class can be created.
Even if you do not have a generic DataMapper, and work with mysql_query, PDO or JDBC queries, you can write a UserRepository which will act as the port for the database (or maybe a text file; since the repository decouples it, the storage mechanism can be anything from a memcache server to a group of monkeys which writes down on paper serializations of objects).
Depending on your architecture, your controllers or other domain classes will now have the UserRepository as a collaborator, and will talk to it and call its methods instead of accessing the database directly; this is a form of Dependency Injection. Obviously if there are other entities which are persisted, like Groups, Posts, etc. they should have their own repository class.

Of course the point of this discussion is how to test this code, since to write it we have to prepare a test before (it's called red green refactor and not code and then try to test). If we manage to write these tests first, the code will be automatically testable, and very decoupled and reusable as this is a characteristic of testable code.
What we need to write are not tests, but unit tests. If you want to check the entire path of data in the application, you can write integration tests which will exercise even the user interface, but Test-Driven Development prescribes to write unit tests: your test methods should have a dependency only on the system under test, and to the interface it uses.
Continuing our example, we need:
  • unit tests for User, Group, Post classes (Domain Model entities);
  • unit tests for controllers or other classes from the Domain Model which uses the adapters.
  • unit tests for UserRepository and similar classes (adapters);
  • unit tests for DataMapper or (infrastructure);
The point of unit testing is you can test singular components separately, and not the entire application as you would do with functional testing. Moreover, you must test each component separately or otherwise it would reveal too difficult to write classes to satisfy an integration test and the TDD benefits of adding a bit of design with every test method would be lost.
With this knowledge, we can say:
  • unit tests for Domain Model entities must be written by the developer of this application. If the project is mostly a CRUD application and is data-intensive, these test cases will be very short.
  • unit tests for controllers and everything that uses the adapters also must be written by the developer, mocking the adapters out. If you use the real adapters in testing, it will be integration testing and it will be heavy and brittle; you won't know if it's the controller or the adapter which does not work after a red test.
  • unit tests for the adapters concrete classes: this is the only interaction with the database. Fortunately, we are unit testing for this classes so we can even use a new database for every test method as every class will present a few methods; compare this approach with testing every single feature of the application by using a real database.
  • unit tests for infrastructure: these are included in the framework so we shouldn't worry.
Note that if you use a DataMapper which abstracts the particular database vendor as infrastructure, you can run the unit tests for the adapters by using a in-memory database like sqlite instead of the real database which will likely be very heavy to manage and recreate every time; this way, you de facto exclude database from your unit tests. Of course some integration tests will be necessary, but they are not part of TDD and of the design process.

I hope to have given you an idea of what unit tests means and how to deal with an application which uses a database for persistence, which is a very common scenario. Feel free to raise questions in the comments if something is not clear.

Do you want more? There's a book for .NET developers which explains DDD and database-independent testing.
The image at the top is a photograph of integrated circuits, which are real world components designed for testability and which are tested indipendently from the card where they will work.


Fedyashev Nikita said...

Thanks for the post!

It gave me some interesting ideas and a better understanding of Registry pattern.

+1 subscriber :)

Anonymous said...
This comment has been removed by a blog administrator.
John Nilsson said...

Can't say that this was very helpful. The problem with testing the database isn't verifying that entities are stored and retrieved.

The problem is in verifying that a complex SQL query actually aggregates the data in a correct way. That no join accidentally blow up the results when assumptions of cardinality in the dependent tables are wrong. That multiple levels of summing and averaging actually calculate the correct results, and so on.

That is, how do one test a big SQL query as a unit?

Giorgio said...

The query would be encapsulated at the Repository level, so that the unit tests of every concrete implementation of a Repository set up the necessary fixture and check the results of the query. The catch is that you can substitute the real DataMapper you use in production (let's say it uses a MySQl/MSSQL schema) with another lightweight instance of the DataMapper, which for instance uses Sqlite (which is not the same but it is a good approximation for the majority of the queries). The DataMapper is tested independently as it is a generic piece of software ([N]Hibernate, Doctrine...) so you're testing Repository + DataMapper but actually you're testing only Repository because DataMapper should never break more than a java.lang.HashSet class is going to break.

John Nilsson said...

So in general. The trick to unit- testing SQL is to use ANSI SQL as far as possible.

The usual argument is otherwise that portable SQL is good for migration (which just isn't true. A tuned query is never portable).

Maybe the benefit of being able to unit test is worth the pain of not using the database to it's fullest capacity though.

But imagine you pay a handsome sum of money to be able to use something like oracle. Either you don't use some of it more advanced features such as analytical queries, which is a big waste of the money you paid to use it, or you pay another hefty sum for an embedded copy.

Giorgio said...

Actually when using TDD on a Domain Model, the Sql is generated by the Data Mapper which translates the object model into a relational model that mirrors it, otherwise two duplicated models would have to cohesist (DRY violation). If you use Oracle, certainly you're not going to put an object model on top of it: you just go with Oracle schemas and PL/SQL. Then your business logic will be in kept in the Oracle pot, but then to apply TDD directly on the relational model I don't know what tools are available (is there a xUnit framework for Oracle stored procedures?)