Invisible to the eye: Property-based testing primer

I'm a great advocate of automated testing and of finding out your code does not work on your machine, 30 seconds after having written it, instead of in production after it has caused a monetary loss and some repair work to be performed. This is true for many different kind of testing, from the unit level (which has also benefits for internal quality and design feedback) to the acceptance level (which ensures the stakeholders get what they need and documents it for the future). Your System Under Test can be a single class, a project or even a collaboration of [micro]services accessed through HTTP from another process or machine.

However, classic test suites written with xUnit and BDD styles have some scaling problems they hit when you want to exercise more than some happy paths:

it is difficult to cover many different inputs by hand-writing test cases, so we stick with at most a dozen of cases for a particular method.
There are maintenance costs for every new input we want to test: each need some assertions to be written and be updated in the future if the System Under Test changes its API.
We tend not to test external dependencies such as the language or libraries, since we trust them to do a good job even if their failure is our responsibility. We chose them and we are deploying our project, not the original authors which provided the code "as is", without warranty.

Note that here I define input as the existing state of a system, plus the new input provided by a test (the Given and When part of a scenario) and output as not only the actual response produced by the System Under Test but also the new state it has assumed.

sort()

Let's take as an example a sort() function, which no one implements today except in job interviews and exercises.
Assuming an array (or list, depending on your language), we can produce several inputs for the function like we would do in a kata:

[1, 2, 3]
[3, 1]
[3, 6, 5, 1, 4]

and so on. When do we stop? Maybe we also need some tricky input:

[]
[1]
[1, 1]
[2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9]

Once we have all the inputs gathered, we need to define what we expect for each of them:

[1, 2, 3] => [1, 2, 3]
[3, 1] => [1, 3[
[3, 6, 5, 1, 4] => [1, 3, 4, 5, 6]
[] => []
[1] => [1]
[1, 1] => [1, 1]
[2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9] => [1, 2, 3, 3, 5, 6, 6, 7, 8, 8, 9, 9]

You can do this incrementally, growing the code one new test case at a time, but you have to do it anyway. Considering this can become boring and error-prone in this toy example makes you wonder what to do when instead of sort() you have a RangeOfMoney class which is key to your business and manages point-like and arbitrary intervals of monetary amounts (true story).

Property-based testing in a nutshell

Property-based testing in an approach to testing coming from the functional programming world. To solve the aforementioned problems (and get new, more interesting ones), it follows these steps:

generate a random sample of possible inputs.
Exercise the SUT with each of them.
Verify properties which should be true on every output instead of making precise comparisons.
(Optionally) if the properties verification failed, possibly shrink to find a minimal input that still causes a failure.

How does this work for the sort() function?

We can use rand() to generate an input array:
This array is composed by natural numbers (Gen\nat) and it is long up to 100 elements (Gen\pos(100)), since very long arrays could make our tests slow.

Then, for each of these inputs, we exercise sort() and verify a simple property on the output, which is the order of the elements:
This is not the only property that sort() maintains, but it's the first I would specify. There are possible others:

every element in the input is also in the output
every element in the output is also in the input
the length of the input and output arrays are the same.

Here is the complete example written using Eris, our extension for PHPUnit that automates the generation of many kinds of random data and their verification.

How to find properties?

How do we apply property-based testing to code we actually write every day? It certainly fits more in some areas of the code than in others, such as Domain Model classes.

Some rules of thumb for defining properties are:

look for inverse functions (e.g. addition and substraction, or doubling an image in size and shrinking it to 50%). You can use the inverse on the output and verify equality with the input.
Relate input and output on some property that is true or false on both (e.g. in the sort() example than an element that is in one of the two arrays is also in the other)
Define post conditions and invariants that always hold in a particular situation (e.g. in the sort() example that the output is sorted, but in general you can restrict the possible output values of a function very much saying it is an array, it contain only integers, its length is equal to the input's length.)

[2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9] makes my test fail

Defining valid range of inputs with generators and the properties to be satisfied is a rich description of the behavior of the System Under Test. Therefore, when a sort() implementation fails we can work on the input in order to shrink it: trying to reduce its complexity and size in order to provide a minimal failing test case.

It's the same work we do when opening a bug report for someone else's code: we try to find a minimal combination that triggers the bug in order to throw away all unnecessary details that would slow down fixing it.

So in property-based testing the [2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9] can probably be shrinked to [2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8] and maybe up to [1, 0], depending on the bug. This process is accomplished by trying to shrink all the random values generated, which in our case were the length of the array and the values contained.

Testing the language

So here's some code I expect to work:
This function creates a PHP DateTime instance using the native datetime extension, which is a standard for the PHP world. It starts from an year and a day number ranging from 0 to 364 (or 365) and it build a DateTime pointing to the midnight of that particular day.

Here is a property-based test for this function:
We generate two random integers in the [0. 364] range, and test that the difference in seconds of the two generated DateTime objects is equal to 86400 seconds multiplied by the number of the days passed between the two selected dates. A property of the input (distance) is maintained over the output in a different form (seconds instead of days).

Surprisingly, this test fails with the following message: what happened is we triggered a bug of the DateTime object while creating it with a particular combination of format and timezone. The net effect of this bug could have been that our financial reports (telling daily revenue) would have started showing the wrong numbers starting from February 29th of the next year.

Notice that the input is shrinked to the simplest possible values that trigger the problem: January 1st on one value and March 1st on the other.
Eventually we found a easy work around, as with a couple more lines of code we can avoid this behavior. We could do that only after discovering the bug of course.

In conclusion

Testing an application is a necessary burden for catching defects early and fix them with an acceptable cost instead of letting them run wild on real users. Property-based testing pushes automation also in the generation of inputs for the System Under Test and in the verification of results, hoping to lower the maintanance cost while increasing coverage at the same time.

Given the domain complexity handled by the datetime extension, it's doing a fantastic job and it's being developed by very competent programmers. Nevertheless, if they can slip in bugs I trust that my own code will, too. Property-based testing is an additional tool that can work side by side with example-based testing to uncover problems in our projects.

We named the property-based PHPUnit extension after Eris, the Greek goddess of chaos, since serious testing means attacking your code and the platform it is built on in the attempt of breaking it before someone else does.

References

Eris on Packagist
Eris examples on GitHub
The tools that inspired Eris from other languages: QuickCheck, ScalaCheck, Erlang QuickCheck.

Invisible to the eye

Thursday, June 18, 2015

Property-based testing primer

sort()

Property-based testing in a nutshell

How does this work for the sort() function?

How to find properties?

[2, 3, 5, 6, 8, 9, 1, 3, 6, 7, 8, 9] makes my test fail

Testing the language

In conclusion

References

No comments:

Featured post

A map metaphor for architectural diagrams

Popular posts