Saturday, July 04, 2009

In-memory, but persistent, testing with Sqlite

When testing database interaction an in-memory database can be used with Sqlite. This solution is fast, but what if you want to see the state of database after termination or share it between processes, like multiple calls to phpunit?
A (maybe) long test suite
I have a test suite for Ossigeno (fully committed in the subversion repository) that prepares a dummy database and runs an hundred tests against it. It is started by a phing task that make some exec calls - one to the script that regenerates the database, and one or more to phpunit to run the test suite.
Having a dummy database is common: as is is always refreshed, the test failures depend only on the state of code. This free the suite from intermittent test, that pass and fail basing on what is store in the tables.
Using Doctrine as ORM, the models of Ossigeno are portable on all the dbms supported by the Doctrine abstraction layer or Pdo, so it's simple to prepare a fresh sqlite database (contained in a file), having thrown the previous version before regenerating. In production it is substituted by a real mysql database.

I don't want to wait
The test suite needs to be run fast: the point is that the more time it takes to complete a run, the more it's likely that after code changes the tests are executed, to not lost time waiting them to finish.
So the first step was already taken: moving from testing with an host with mysqld as infrastructure - remote or local - to a single db.sqlite file. This also makes the tests runnable on every machine with php-sqlite3 installed instead of mysql-server (for instance a netbook like my EeePc).
The next step is to eliminate the bottleneck of input/output on disk, slower than the memory. To take advantage of the features of sqlite, we can set the dsn of the connection to 'sqlite://:memory:' and the tables will be stored in ram instead that in a file.
However, this approach has some disadvantages:
  • the database will be thrown away when the Pdo connection is closed: we cannot take a look to it after the tests have been run;
  • when running a test in isolation, and not the full suite, the database must also be refreshed, since a more or less fresh instance does not exist anywhere.
  • since my regeneration script is executed in a separate process with exec, the database is deleted even before the first test starts.

The solution at Os level
Fortunately, I work on Unix machines. I use Ubuntu instances, but I guess Tiger/Leopard will do the same job.
Firstly, I configured sqlite to work on a file. Since it is a regular file (created when does not exists), it is persisted between connections and processes and my regeneration script can terminate gracefully and pass the token to tests; a single test can also be run in isolation on the existing db.sqlite file. All the advantages of the file approach are available.
But we want to store it in memory, so there's no i/o on disk and , so I execute:
$ sudo mount tmpfs -t tmpfs sqlite-folder/
to keep the database folder in ram. It is only wiped out on reboot.
This way, the full test suite has gone from taking 30-40 seconds to take less than 30 seconds, depending on the workload of the machine. On my EeePc I have configured the system /tmp folder to be mounted as tmpfs on startup also to reduce writing on the Ssd drive.
As a side note, accessed file in Linux are cached in ram, so probably if you repeatedly run the test suite using a file on disk you will get similar performance than mounting the directory in memory. On the other hand, if you run it for the first time in a while, it will work natively in ram, and you don't have to study the problem of how well your db file is cached in ram and how much cache misses will be encountered. Well, if :memory: dsn is available in sqlite, there's a significative difference from ram and cached disk sectors in ram.
Have a nice day with your Ram!


Anonymous said...

Genial dispatch and this post helped me alot in my college assignement. Gratefulness you as your information.

Anonymous said...

Opulently I assent to but I think the collection should acquire more info then it has.