Suppose we have the classic User and Group objects in a php application. The same process is valid for any language that uses a relational database as a backend, like the Java/Hibernate example, but I will include php code snippets.
Tipically User and Group has a many-to-many relation: every user can subscribe to many groups and every groups has a bunch of users as its members. This means that if all objects are loaded from the database you can navigate like this:
$user = find(42); // find the user with id == 42
Having a potentially infinite navigability in an object graph is not a great practice, but sometimes many-to-many relationships are needed and the simplest approach is to incur in additional overhead (why there is overhead will be explained later) and provide this simple Api.
The main problem is that we cannot load all the object graph, because it will not fit in the memory of the server and it will take much time to build, depending on the size of the database. Nor we can load an arbitrary subset of it, because the user has the freedom to navigate groups and users to the depth he need: if he arrives at the edge of the graph will see null values/null pointers where objects should be.
Solution: Lazy loading
The Proxy pattern comes in our help:
A proxy, in its most general form, is a class functioning as an interface to something else. The proxy could interface to anything: a network connection, a large object in memory, a file, or some other resource that is expensive or impossible to duplicate.
In the first navigation the proxy will be a subclass of Group. The Data Mapper, without further instructions, will provide this type of object graph:
var_dump($user); // User
var_dump($user->groups); // Group_SomeOrmToolNameProxy
var_dump($user->groups instanceof Group); // true
As said previously, an Orm that provides lazy loading will produce a proxy class to substitute the original one. The code for this class is generated on the fly and will look like this:
public function __construct(DataMapper $mapper, $identifier)
// saves as field references the arguments
private function _load()
public function sendMessageToAllUsers($text)
The new class proxies to the original methods but calls _load() before, giving the object a usable state. Before a call to _load(), or a call to one of the proxied methods, the domain object has only identifiers field in its internal data structure.
Since it is a subclass of Group, it provides the same interface to the user, that will not even notice that it is not its clean, infrastructure-free Group class.
What does it mean?
It means that the first level objects are fully loaded, while second level ones are placeholders that contains only the information to load themselves. Only if you access them they will go to the database to fetch all their fields:
$user = $em->find(42); // a query on the user table
echo $user->groups->name; // another query is executed on the groups and user_groups tables
We can complicate this pattern as far as we want:
- We can specify with join() commands on the query objects or 'join' options for the methods of the Data Mapper how far it should go with the initial loading. This way if we know that we need to access the second level of the graph starting from user, only one query is executed. Still if we go to the third level ($user->groups->users->role) without specifying that at the reconstitution of $user, additional queries will be sent to the database and performance will suffer.
- We can activate or disactivate lazy loading, or logging it to view where it is executed and impacts the performance
Today I contributed to the ORM\Proxy namespace of Doctrine 2, the component that generates the proxy classes and objects basing on metadata about the original classes, and non-invasive lazy loading will soon became a reality.