The database word indicates the collection of data saved somewhere in the infrastructure: this blog saves posts in a database, Firefox saves preferences in a little database in your filesystem. When we are talking of MySQL and Sql server, the correct term is Database Management System, or DBMS. A DBMS provides all the procedures and libraries needed to access and modify the database content, abstracting away the need to use low-level filesystem functions.
The abstraction a dbms builds upon its storage engine is called model. There is more than one model for representing data: some have become obsolete and other ones are so widely used that they probably will never disappear in the next twenty years no matter what happens in the database scenario.
Let's start talking about database model. The models I am presenting here are logical models, which specify how information is presented to the user or to the application which talks to the database. This model must be distinguished from the physical model, which consists in the way the dbms chooses to persist data on disks, tapes and other mass memory devices.
When you open a spreadsheet or a simple ini file, you are using a flat model. Data are organized in one list or table, with similar elements or rows.
The problem with a flat database is representing the relationships between elements. For instance how do you associate two users that are friends or a user with the groups he chooses to belong to? Although this limitations, flat files and databases are useful because of their data access simplicity where there is the need for lightweight and easily implementable systems, such as config files or spreadsheet saved in the comma-separated values format.
The first improvement to the flat model, applied for the first time in the 1960s, is the addition of a pointer to every instance of data (which is called record). This pointer establish a child-parent relationship towards another record, where every one of them has at most one parent.
Information is thus presented in a tree structure, which is a good model for many real world entities. For instance, you can represent too much things in Xml, which is a hierarchical model too. The Dns system and Ldap protocol present a hierarchical model, but they are a specific application of this paradigm and not a dbms-like product.
A variation of the hierarchical model is named network model, which removes the limit of one parent and places a set of many pointers into every record. The tree-like model becomes a graph. This concept dates back to the 1965, so there's no buzz around network models and hierarchical general-purpose databases.
The relational model is the widely used one which I was referring to at the start of this post. Relational dbms like MySQL, Microsoft SQL Server, PostgreSQL and even Sqlite constitute the majority of today's applications storage mechanisms.
The relational model describes data in various tables, where each row has a fixed set of fields that form the table's columns. Continuing with our example, a User and a Group tables, with their fields lists, are a relational model.
In this model, relationships between entities are established with the equality of some columns, often named primary keys and foreign keys. If you have not previously lived under a rock, you probably have used these databases a lot so I won't bore you anymore.
This is when the problem becomes interesting. Data that were presented in a relational model yesterday is being substituted by an object model, built with classes and instances. Maintaining the whole object graph in memory is usually too expensive to be feasible, since it requires enormous amount of resources like memory and cpu cycles to search objects in the mess of a 2-million-objects graph.
To preserve the object model and persists data at the same time, various solution have been proposed in the years:
- Serialize the objects and put the binary stream on disk. Simple, but how do you search a User instance by his nick when he signs in?
- Mapping the objects to a relational database, more or less saving every object as a row of a table which corresponds to the class of choice. An Object-Relational Mapper is the tool used for these operations, but it has some limits, for instance in dealing with class inheritance; these limits are known as the impedance mismatch. The Orm is also used for retrieval, ideally abstracting away the relational storage from the application.
- Put the objects in an object database.
An object database would be great to use in real world, but currently going the Orm route is the standard since relational databases are at the world's center. Data typically don't fall from the sky, and there is a need for synchronization between applications and machines in a relational database. Thus, different object models can work on the same data and even with applications which don't use an object-oriented paradigm.
Instead of presenting a fixed structure, a dbms can instead show a semi-structured model, where records have no enforced lists of attributes. These type of entities are called documents, and an application or middleware which relies on it can store nearly everything as a document property: the advantage of this technique is that you'll never have to update a schema. An example of a document-oriented, open source database is CouchDB.
The schema-less novelty is one of the last buzzword in the database world, and it's still not clear what will the future of these solutions be. Relational databases are probably here to stay as there is a lock-in from applications all over the world to their data model. Object and document models are often presented as a panacea to improve scalability and simplicity, but they are not a standard at the moment. Try to explore new persistence solution, as the technology changing pace is slow in this field, but it exists.
In the image at the top, a typical relational model for an employees table, with the specification of primary key, fields and foreign keys.