Wednesday, December 09, 2009

What everybody ought to know about storing objects in a database

Probably during your career you have heard the term impedance mismatch, which is commonly referred to the problems that arise in converting data between different models (or between different cables if you are into electrical engineering).
Usually the complete expression is object-relational impedance mismatch, which indicates the difficulties of the translation process between two versions of the same domain model: the former resides in memory and it consists in an object graph, while the latter is used for storage and it is a relational model stored in a database. The conversion between parts of both models happens many times while an application runs, and in php's case at least once for every http request.

Object-relational mappers like Hibernate and Doctrine are infrastructure applications which deal with the mismatch, doing their best to implement a transparent mechanism and providing the abstracted illusion of a in-memory model, like the Repository pattern. These particular Orms are the best of breed because they do not force your object graph to depend on infrastructure classes like base Active Records.
The connection between the two models is defined by the developer, by providing metadata about its classes properties: for instance you can annotate a private field specifying the column type you want to use to store its value. But what are the translation rules the developers provide configuration for? Here is a basic set of the tasks an Orm performs for you.
  • Entity classes are translated to single tables as a general rule, with a one-to-one mapping. The class private or public fields which are configured for storage define the columns of a particular table.
  • Objects which you pass to the Orm for being stored become rows of the correspondent table. A User class becomes a User table containing one row for every registered user of your website.
  • A primary key is defined by choosing between the existing fields or inserted ex-novo. Often as a requirement the developer should explicitly define a field.
  • Repeated single (or multiple) class fields become new tables, and the problem of representing them is shifted to representing relationships; in the domain model this kind of objects are Value Objects, which is semantically different from Entities, but databases only care about homogeneous data and such objects receive no special treatment.
  • One-to-one and many-to-one relationships can be represented with a foreign key on the source entity that resembles the original pointer to a memory location.
  • One-to-many relationships are a bit trickier because they require a foreign key on what is called the owning side, in this case the target entity. What can seem strange at first glance is that even if in the domain the relationship is unidirectional (pointer to a collection), elements of a collection need to have a reference to the owner to unequivocally identify it. The mutual registration pattern can be used to build a correct Api starting from this constraint; I will write about it in one of the next posts.
  • Many-to-many relationships are managed by creating an association table that references with foreign keys the participating entities. Every row constitutes a link between different objects; sometimes it may be the case to use such a table also for one-to-many associations, to avoid having a back reference field on collection elements.
  • Inheritance is by far the most complex semantic to maintain as it is not supported at all by relational databases: Single/Class/Concrete table inheritance are three famous patterns which organize hierarchical objects in tables, but I prefer to avoid inheritance altogether if not strictly necessary.
And this list is one of the reasons why in your college courses they told you that a model should be as small and simple as possible: a simple model can undergo much more simple transformations for storage and transmission than a complex one.
Note that some contaminations leak from the database side to the object graph, such as the bidirectionality of one-to-many relationships, present even when it is not required by the domain model.
Orms take care for you of this translation process and can even generate the tables from the classes source code, but they only perform automatically the tedious part of object-relational mapping. You should know very well how the mapping works if you plan to use such powerful tools without reducing your database to a list of key/value pair.

The image at the top is an Entity-Relationship model used to design database schemas. I find it not useful anymore as I now prefer to think in classes terms, with Uml diagrams.


wmsoczynski said...

Does keeping models as small as possible dont force the anemic domain model anti-pattern ?

Giorgio said...

You can include any logic in your entity classes because methods do not cause problems in mapping, What cause problems are usually inheritance trees and redundant relationships where they are not needed, or they can be unidirectional instead of bidirectional.

Anonymous said...

Amazing as always