Sunday, March 12, 2017

Eris 0.9.0 is out

In 2016 I moved to another country and as a result of this change I didn't had much time to develop Eris further. Thankfully Eris 0.8 was already pretty much stable, and in this last period I could pick up development again.

What's new?

The ChangeLog for 0.9 contains one big new feature, multiple shrinking. While minimization of failing test cases is usually performed with a single linear search, multiple shrinking features a series of different options for shrinking a value.
For example, the integer 1234 was usually shrunk to 1233, 1232, 1231 and so on. With multiple shrinking, there are a series of options to explore that make the search logarithmic, such as 617, 925, 1080, 1157, 1195, 1214, 1224, 1129, and 1231. If the simplest failing values is below 617 for example, at least (1234-617) runs of the test will be skipped by this optimization, just in the first step.
This feature is the equivalent of QuickCheck's (and other property-based testing libraries') Rose Trees, but implemented here with an object-oriented approach that makes use of `GeneratedValueSingle` and `GeneratedValueOptions` as part of a Composite pattern.

This release also features support for the latest versions of basic dependencies:
  • PHPUnit 6.x is now supported
  • PHP 7.1 is officially supported (I expect there were mostly no issues in previous releases, but not the test suite fully passes.)
Several small bugs were fixed as part of feedback from projects using Eris:
  • the pos() and neg() generators should not shrink to 0.
  • float generation should never divide by 0.
  • shrinking of dates fell into a case of wrong operator precedence.
  • reproducible PHPUnit commands were not escaped correctly in presence of namespaced classes.
A few backward compatibility fixes were necessary to make room for new features:
  • minimumEvaluationRatio is now a method to be called, not a private field.
  • GeneratedValue is now an interface and not a class. This is supposed to be an internal value: project code should never depend on it and it should build custom generators with map() and other composite generators rather than implementing the Generator interface, which is much more complex.
  • the Listener::endPropertyVerification() method now takes the additional parameters $iterations and the optional $exception. When creating listeners should always subclass EmptyListener in order not to have to modify the not implemented methods, which will be inherited.

What's next?

My Trello board says:
  • still decoupling from PHPUnit, for usage in scripts, mainly as a programmable source of randomness.
  • more advanced Generators for finite state machines and in general a more stateful approach, for testing stateful systems.
  • faster feedback for developers, like having the option to run fewer test cases in a development environment but the full set in Continuous Integration.
I'm considering opening up the Trello board for public read-only visibility, as there's nothing sensible in there, but potential value in transparency and feedback from the random people encountering the project for the first time.

As always, if you feel there is a glaring feature missing in Eris, feel free to request it on the Github's project issues.

Sunday, February 12, 2017

Book review: Fifty quick ideas to improve your tests
Fifty quick ideas to improve your tests is, well, a series of fifty quick ideas that you can implement on some of your automated test suites to improve their value or lower their creation or maintenance costs.

These ideas are pattern-like in which they are mostly self-contained and often independent from each other. They are distilled from real world scenarios that the authors (David Evans, Tom Roden and Gojko Adzic) have encountered in their work.

This format helps a lot readability, as ideas are organized into themes, giving you the ability to focus on the area you want to improve and to quickly skip the ideas that do not make sense in your context or you find impractical or not worth the effort. For the same reasons I enjoyed the book this one is a sequel to, Fifty quick ideas to improve your user stories. Moreover, both were published on Leanpub so you have the ability to adjust the price to the value you think you'll get out of them; and despite Leanpub's large collection of unfinished books, this one is 100% complete and ready to read without the hassle of having to update to a new version later (who really does that?)

Some selected quotes follow, highlighted phrases mine.
Something one person considers critical might not even register on the scale of importance for someone from a different group. 
Drawing parallels between the different levels of needs, we can create a pyramid of software quality levels: Does it work at all? What are the key features, key technical qualities? Does it work well? What are the key performance, security, scalability aspects? Is it usable? What are the key usability scenarios? Is it useful? What production metrics will show that it is used in real work? Is it successful? 
In order to paint the big picture quickly, we often kick things off with a ten-minute session on identifying things that should always happen or that should never be allowed. This helps to set the stage for more interesting questions quickly, because absolute statements such as ‘should always’ and ‘should never’ urge people to come up with exceptions. 
Finally, when an aspect of quality is quantified, teams can better evaluate the cost and difficulty of measuring. For example, we quantified a key usability scenario for MindMup as ‘Novice users will be able to create and share simple mind maps in under five minutes’. Once the definition was that clear, it turned out not to be so impossible or expensive to measure it.
Avoid checklists that are used to tick items off as people work (Gawande calls those Read-Do lists). Instead, aim to create lists that allow people to work, then pause and review to see if they missed anything (‘Do-Confirm’ in Gawande’s terminology). 
A major problem causing overly complex examples is the misunderstanding that testing can somehow be completely replaced by a set of carefully chosen examples. For most situations we’ve seen, this is a false premise. Checking examples can be a good start, but there are still plenty of other types of tests that are useful to do. Don’t aim to fully replace testing with examples in user stories – aim to create a good shared understanding, and give people the context to do a good job. 
Waiting for an event instead of waiting for a period of time is the preferred way of testing asynchronous systems
The sequence is important: ‘Given’ comes before ‘When’, and ‘When’ comes before ‘Then’. Those clauses should not be mixed. All parameters should be specified with ‘Given’ clauses, the action under test should be specified with the ‘When’ clause, and all expected outcomes should be listed with ‘Then’ clauses. Each scenario should ideally have only one ‘When’ clause that clearly points to the purpose of the test.
Difficult testing is a symptom, not a problem. When it is difficult for a team to know if they have a complete picture during testing, then it will also be difficult for it to know if they have a complete picture during development, or during a discussion on requirements. It’s unfortunate that this complexity sometimes clearly shows for the first time during testing, but the cause of the problem is somewhere else. 
Although it’s intuitive to think about writing documents from top to bottom, with tests it is actually better to start from the bottom. Write the outputs, the assertions and the checks first. Then try to explain how to get to those outputs. [...] Starting from the outputs makes it highly unlikely that a test will try to check many different things at once, 
Technical testing normally requires the use of technical concepts, such as nested structures, recursive pointers and unique identifiers. Such things can be easily described in programming languages, but are not easy to put into the kind of form that non-technical testing tools require. 
For each test, ask who needs to resolve a potential failure in the future. A failing test might signal a bug (test is right, implementation is wrong), or it might be an unforeseen impact (implementation is right, test is no longer right). If all the people who need to be make the decision work with programming language tools, the test goes in the technical group. If it would not be a technical but a business domain decision, it goes into the business group. 
Manual tests suffer from the problem of capacity. Compared to a machine, a person can do very little in the same amount of time. This is why manual tests tend to optimise human time [...] Since automated tests are designed for unattended execution, it’s critically important that failures can be investigated quickly. [...] To save time in execution, it’s common for a single manual test to check lots of different things or to address several risks. 
Whenever a test needs to access external resources, in particular if they are created asynchronously or transferred across networks, ensure that the resources are hidden until they are fully complete. 
Time-based waiting [sleep() instead of polling or waiting for an event in tests] is the equivalent of going out on the street and waiting for half an hour in the rain upon receiving a thirty-minute delivery estimate for pizza, only to discover that the pizza guy came along a different road and dropped it off 10 minutes ago. 
Instead of just accepting that something is difficult to test and ignoring it, investigate whether you can measure it in the production environment. 
Test coverage is a negative metric: it measures how bad something is, not how good it is. 
There are just two moments when an automated test provides useful information: the first time it passes and when it subsequently fails. 
It’s far better to optimise tests for reading than for writing. Spending half an hour more writing a test will save days of investigation later on. [...] In business-oriented tests, if you need to compromise either ease of maintenance or readability, keep readability. 

Thursday, December 29, 2016

Book review: The Power of Habit Duhigg, a New York Times reporter, collects stories of building and breaking habits, supporting the thesis that habits form an important part of our lives and that they can make a big difference for better or worse. Both in the case of positive training or learning habits, or in the case of addictions, repeated behavior influences our energy levels, our free time and in the end many of our long-term results at work and in life (we will all go the gym in the new year, right?)

The storytelling style of the book may smell like anecdotal evidence, but it keeps the reader intruigued and entertained long enough the get its message across, without delving into fictional stories. Take the science expressed here with a grain of salt (like you would with Malcom Gladwell): all experiments are real but they may have been cherry-picked to prove a point.

The key take away for me was to think about our habits, try to influence them to stop or reinforce them depending on our second-order desires; for example with the cue-routine-reward framework proposed in the book but ultimately with whatever works for you as habit building and destroying must be very context-specific. Another concept that we find reasonable is ego depletion (willpower as a finite resource that must be renewed), but the jury is still out on whether it is a confirmed and sizable effect, as meta-analysis of hundreds of studies do not agree yet (for good reasons).

From exercising to learning, or from quitting smoking to a Facebook addiction, this self-reflection can have a large impact over our lives. Maybe it should be an habit?

Selected quotes from the book follow:
This process within our brains is a three-step loop. First, there is a cue, a trigger that tells your brain to go into automatic mode and which habit to use. Then there is the routine, which can be physical or mental or emotional. Finally, there is a reward, which helps your brain figure out if this particular loop is worth remembering for the future [...] Every McDonald’s, for instance, looks the same—the company deliberately tries to standardize stores’ architecture and what employees say to customers, so everything is a consistent cue to trigger eating routines.
“Even if you give people better habits, it doesn’t repair why they started drinking in the first place. Eventually they’ll have a bad day, and no new routine is going to make everything seem okay. What can make a difference is believing that they can cope with that stress without alcohol.”
Where should a would-be habit master start? Understanding keystone habits holds the answer to that question: The habits that matter most are the ones that, when they start to shift, dislodge and remake other patterns.
“Small wins are a steady application of a small advantage,” one Cornell professor wrote in 1984. “Once a small win has been accomplished, forces are set in motion that favor another small win.” Small wins fuel transformative changes by leveraging tiny advantages into patterns that convince people that bigger achievements are within reach.
“Sometimes it looks like people with great self-control aren’t working hard—but that’s because they’ve made it automatic”
“By making people use a little bit of their willpower to ignore cookies, we had put them into a state where they were willing to quit much faster,” Muraven told me. “There’s been more than two hundred studies on this idea since then, and they’ve all found the same thing. Willpower isn’t just a skill. It’s a muscle, like the muscles in your arms or legs, and it gets tired as it works harder, so there’s less power left over for other things.”
As people strengthened their willpower muscles in one part of their lives—in the gym, or a money management program—that strength spilled over into what they ate or how hard they worked. Once willpower became stronger, it touched everything.

Monday, November 14, 2016

How to run Continuous Integration on EC2 without breaking the bank

I live in a world increasingly made up of services (sometimes micro-), that collaborate to produce some visible behavior. This decomposition of software into (hopefully) loosely coupled components has implications for production infrastructure concern such as how many servers or containers you need to run. 
Consequently, it impacts also testing infrastructure as it tries to be an environment as close as possible to production, in order to reliably discover bugs and replicate them in a controlled environment without impacting real users.
In particular I like to have a ci testing environment where each service can be tested in its own virtual machine (or container for some); and each node is totally isolated from the other services. In addition to that, I also like to have and end-to-end testing environment where services talk to each other as they would in production, and where we can run long and complex acceptance tests. 
This end-to-end environment usually is a perfect copy of production with respect to the technologies being used (e.g. load balancers like HAProx or AWS ELB are in place, in the same way of production, even if there is no test that directly targets their existence); the number of nodes per service is however reduced from N to 2, as in computing there are only 0, 1 and N equivalence categories.

In the likely case that you're using cloud computing infrastructure to manage this 2x or 3x volume of servers with respect to the production infrastructure, your costs are also by default going to double or triple. One option to try and optimize this is to throw away everything and start deploying containers, as they could share the same underlying virtual machines as production while preserving isolation and reproducibility. On an existing architecture made up of AWS EC2 nodes however, optimization can takes us far without requesting to rewrite all the DevOps(TM) work of the last two years.

Phase 1: expand

As I've been explaining, EC2 instances replicating production environments can expand until they bring the total number of EC2 nodes to three times the original number. Some project just have an single EC2 node, while others have multiple nodes that have to be at least 2 in the latest testing environment before production. Moreover, the time the tests take to run on these instances is inversely correlated with how powerful they are in CPU and I/O terms, so you pay good money for every speed improvement you want to get on those 20- or 60-minute suites.
In my current role at eLife, we initially got to more than 20 EC2 instances for the testing environments. This was beneficial from a quality and correctness point of view, as we could then run tests on all mainline branches before they go to production, but also on pull requests, giving timely feedback on the proposed changes without requiring developers to run everything on their machines (that should be an option, not an imperative.)

Phase 2: optimize

The AWS EC2 pricing model works by putting nodes into a running state when you launch them, and by pre-billing one hour of usage every 60 minutes. Therefore, booting existing instances or creating from scratch is going to incur at least a 1-hour cost for each of these events:
  • at boot, 1 hour is billed
  • at 60:00 from boot a new hour is billed,
  • at 120:00 from boot a new hour is billed, and so on
All the EC2 nodes that we have however, use EBS disks for their root volumes. EBS is the block remote storage provided by AWS, and while some generations of instances use local instance storage for their / partition, EBS makes a lot of sense for that partition as it gives you the ability to starting and stopping instances without losing state in between; essentially, it gives you the ability to shutdown and reboot instances when you need without paying EC2 bills for the time in which they are stopped. The only billing being performed is for the EBS storage space, which means AWS has some hard disks in its data centers that has to preserve your instances files, but it allocates new CPU and RAM resources as a virtual machine only when you start the EC2 instance. Therefore, an EBS-backed EC2 instance on your AWS console does not always correspond to a physical place, but it's really virtual as it can be stopped and started many times, moving between racks in the same availability zone while keeping the same data (persisted to multiple disks in other racks) and even private IP address.
Since the process of allocating a new virtual machine and reconfiguring networks to connect everything together has some overhead, this reflects in the boot time necessary to start an EC2 instance, which can be several seconds to be added to the standard boot time for the operating system. This is also reflected in the pricing model, which makes it a bad idea to launch a new EC2 instance for every test suite you need to run: as long as your test suite takes less than 1 hour, you are already paying for a full hour of resources and you would be throwing away. Running 6 builds in an hour would make you pay for 6 hours, which is not what you want.

Phase 2.1: stop them manually

A first optimization that can be performed manually is to stop and start these instances from the console. You would usually stop them at some hour of the day or evening and then start them again as the first thing in the morning.
Of course, there is a good potential for automating this as AWS provides many ways to access EC2 with its API and all the SDKs that build on top of it, available for many different programming languages. You can easily build some commands that query the EC2 API looking for an instance basing on its tags, and then issue commands for starting and stopping it. In general, this is almost transparent for the CloudFormation templates that you are surely using for launching these instances.
The first time you start and stop an instance, there are a few problems thay may come up.
The first problem is that of ephemeral storage: as I wrote before, you have to make sure the root volume of the instance and any data you want to persist are EBS-backed and not local instance storage.
The second problem is that of public IP addresses. While private IP addresses inside a VPC stay the same after a stop and start commands, public IP addresses are a scarce resource and are only allocated to it when the instance is running. Therefore, if you had a DNS pointing to it, it has to be updated after the boot, whether it was manually created or part of the CloudFormation template. Default DNS entries have the form which depends on the public ip address and hence does not provide a good indirection.
The third problem is that of long-running processes managed by SysV/Upstart/Systemd: the daemons of servers like Apache, Nginx or MySQL are usually configured to restart upon boot, but if you have written your own deamons or Python/PHP long running processes and are starting them through /etc/init or /etc/init.d configuration, it pays to check everything is in its place again after boot.
The last problem I have found at this level (manual restarts) is about files in /run and /var/run, which are temporary directory used by deamons to place locks and other transient files like a pidfile indicating an instance of that program is running. If you have folders in /run or /var/run, those folder will have to be recreated. Systemd provides the tmpfiles.d option which automatically creates a hierarchy of files, but it's usually just easier (and portable) to have the daemons create their folders (php-fpm does that) or if they are not able to do that, not placing them in /var/run/some_folder_that_will_stop_existing but in /var/run or even /tmp without subfolders.

Phase 2.2: start them on demand

Instead of manually starting EC2 instance or to automate their stopping and starting as a periodical task, you can also start them on-demand as needed by the various builds that need to be run. So whenever project x needs to build a new commit on master or a pull request, you will start the x--ci EC2 instance.
In this case, however, there is a larger potential for race conditions as you may try to run a deploy or any command on an instance before it's actually ready to be used. Therefore, we wrote some automation code that waits for several events before letting a build proceed:
  1. the instance must have gone from the pending to the running state on the EC2 API. This hopefully means AWS has found a CPU and other resources to assign to it.
  2. the instance must be accessible through SSH.
  3. through SSH, we monitor that the file /var/lib/cloud/instance/boot-finished has appeared. This file will appear at each boot when all daemons have been started, as art of the standard cloud-init package.
Once the instance has gone through all these phases, you can SSH into it and run whatever you want.

Phase 2.3: stop them when it's more efficient

Once you have transitioned from starting instances in the last responsible moment, you can do the same for stopping them instead of just wait for the end of the day to shutdown everything.
We now have a periodical job, running every 2 minutes, that takes a list of servers to stop. In parallel, it performs the following routine for each of the EC2 instances:
  • checks if the server has been running for an amount of time between h:55:00 and h:59:59 minutes, where h is some number of hours.
  • if the condition is true, stop the instance before we incur in a new hour being billed.
  • otherwise, leave the instance running: you already paid for this hour so it makes no harm to do so, as the instance can be used to run new builds at no cost.
Therefore, when developers open a dozen pull requests on the same project, only the first starts the necessary instances to run the tests; the other ones are queued behind that and will get access to the same instance, one at a time.

Bonus: Jenkins locks

Starting and stopping instances periodically would be otherwise dangerous if there wasn't a mechanism for mutual exclusion between builds and lifecycle operations like starting and stopping. Not only you don't want to run builds for the same project on the same instance if they interfere with each other, but you definitely don't want an instance to be shutdown while a build is still running.
Therefore, we wrap both these lifecycle operations and builds in locks for resource, using Jenkins Lockable Resources plugin. If the periodical stopping task tries to stop an instance where the build is running, it will have to wait to acquire the lock. This ensures that machines that see many builds do not get easily stopped, while other ones that are idle will be stopped at the end of their already paid hour.


Cloud computing is meant to improve the efficiency with which we allocate resources from someone else's data centers: you pay for what you use. Therefore, with a little persistence provided by EBS volumes you can efficiently pay for the hours that your builds require, and not for keeping idle EC2 instances running every day of the year. Of course, you'll need some tweaking of your automation tools to easily start and stop instances; but it is a surefire investment that can usually save more than half of the cost of your testing infrastructure by putting it at rest in weekend and non-working hours.

Sunday, October 02, 2016

Deep learning: an introduction for the layperson
Deep learning is one of the buzzwords of 2016, promising to revolutionize the world possibly without Skynet gaining self-awareness.

Last week I held an internal talk at eLife Sciences to introduce colleagues from both the software and scientific backgrounds to the concept.

Here are the slides, complete with notes (accessible through a popup by pressing S) that explain what the diagrams and other figures mean.

Sunday, May 22, 2016

Eris 0.8.0 is out

In the period before my move to Cambridge I got some time to work on Eris, and to use it to test the guts of Onebip's infrastructure. Lots of new features are now incorporated in the 0.8.0 version, along with a modernization of PHP standards compliance carried out by @localheinz.

What's new?

Here's the most important news, a selection from the ChangeLog:
  • The bind Generator allows to use the random output of a Generator to build another Generator.
  • Optionally logging generations with `hook(Listener\log($filename))`.
  • disableShrinking() option.
  • limitTo() accepts a DateInterval to stop tests at a predefined maximum time.
  • Configurability of randomness: choice between rand, mt_rand, and a pure PHP Mersenne Twister.
  • The suchThat Generator accepts PHPUnit constraints like `when()`.
Some bugs and annoyances were fixed:
  • No warnings on PHP 7 anymore.
  • Fixed bug of size not being fully explored due to slow growth.
  • Switched to PSR-2 coding standards and PSR-4 autoloading.
And there were some backward compatibility breaks (we are in 0.x after all):
  • The frequency generator only accepts variadics args, not an array anymore.
  • Removed strictlyPos and strictlyNeg Generators as duplicated of pos and neg ones.                                     
  • Removed andAlso, theCondition, andTheCondition, implies, imply aliases which expand the surface area of the API for no good reason. Added and for multiple preconditions.
Eris is now quite extensible with custom Generators for new types of data; custom Listeners to know what's going on; and even different sources of randomness to tune repeatability and performance.
I believe what's very important about this release is the release of technical documentation. This is not a list of APIs generated by parsing the code, but is a full manual of Eris features, which will be kept up-to-date religiously in the repository itself and rebuilt automatically at each commit.

What's next?

My Trello board says:
  • decoupling from PHPUnit: it should be possible to run Eris also with PHPSpec (already possible but not as robustly as it can be) or in scripts.
  • Multiple possibilities for shrinking, borrowing from test.check rose trees. This feature may speed up the shrinking process and make it totally deterministic.
  • A few more advanced Generators: for example testing Finite State Machines.
If you are using Eris and wanna give feedback, feel free to open a Github issue to discuss. 

Tuesday, April 19, 2016

Next stop: Cambridge

Last Friday has been my last working day in Onebip, the carrier billing payment platform headquartered in Milan. I leave the best technical team I have ever worked with, who has tackled endless challenges from transitioning to a microservice architecture, to adopting CQRS and Event Sourcing, and testing a large product depending on the integration with 400 mobile carriers.

In May, I will start a new adventure as a Software Engineer in Test at eLife. Located in Cambridge, eLife is an open access journal that publishes scientific articles in the fields of biology and medicine, with the goal of improving the peer review process and accelerating science. As a non-profit organization, it's quite a different context with respect to selling product and services, but indeed a potentially large and positive impact on the world.

Cambridge is a city of research and technology, and welcomes students and scientists, but also software developers like me. Moreover, it's small and peaceful (you can cycle around anywhere), while showing peaks of high technical level. It's the first place where I have been to a study group on the book Structure and Interpretation of Computer Programs, or to a quite good introductions to machine learning talk (and not to Transpile typed ECMAScript without left-pad nor using arrays because you would need a polyfill for that or some other hipster hallucination).
See you on the other side of the Channel...

Tuesday, March 15, 2016

When to write setters

I have set out, almost unconsciously, to use constructor injection by default in the last few years while writing object-oriented applications. With Dependency Injection as a given, constructor injection satisfy most of my requirements for building an object graph and dynamically configuring collaborators.

The spectrum

I see the statefulness of an object not as an absolute but over a spectrum.
At one end of the spectrum we have immutable objects: these objects acquire a configuration in their constructor and are effectively final (to employ a Java-specific term) for the rest of their lifecycles. Their fields are private and there's no way of modifying them outside of the constructor; their collaborators are only scalars or other immutable objects. In Java fields can even be final so that accidental reassignment is regarded as a compile-time error.
The physical state of an object may still change without its external behavior being affected, like in the case of caching. I still consider this kind of white-box-immutable objects as immutable.
Stateful objects instead may have a behavior that changes as a result of its own state (or of its collaborators). You call some Command methods on it, and the response of further calls to Query methods change. Hopefully the Commands encapsulate some domain logic to constrain the state transition to a valid and modelled one.
At the extreme end of the spectrum we find setters: methods which only mutate the value of one or more fields, possibly skipping any validation, domain modeling or consistency check. The setters considered here are public methods, because their limited-scope versions do not provide the same violations.
If you want to write procedural code, setters proliferate (still it's probably easier to just use public fields at that point). There are only a few valid use cases where I have found setters useful in object-oriented programming, and here is the (short) list.

Configuration which has default values

Classes may have a few configuration values that you are able to tune; especially when there is more than a few of these parameters, I find useful to separate hard dependencies in the constructor and setters that are able to override the default parameters when the object has already been constructed. If you forgot to call these methods, the object still has to work correctly.
Alternative solutions for this use case are of course constructors with default parameters, which I definitely prefer if there are not so many options to tune (1 or 2). You can also look into with Value Objects which produce a new instance upon reconfiguration, and model all of the configuration parameters as a single entity; or into a Builder if you want to invest in an additional class and its API.

Adding Observers

Observers (or listeners if you prefer) are collaborators which are notified of internal events happening inside an object that may interest them.
I treat the observers of an object as an append-only list data structure, with an empty list or array as the natural Null Object. The object is initialized without any observer, and a setter like addListener(...) has the more limited capability of adding an observer but not of removing or modifying an existing one.
The nature of the Observer pattern is investing on a common bus that many observers can be attached to, even if they come from different packages and libraries. Therefore I find it natural to support the dynamic wiring of other objects, even if they make the object more mutable can before. The needs of integration become more important than guaranteeing safety of construction in these scenarios.


When objects are unserialized from a cold storage such as a stream of bytes or a JSON object, encapsulation is very likely going to be violated. Object-relational mappers have been doing this for ages by working directly on annotated private fields, with sometimes powerful results but also lots of dangers from storage and code being out of sync.
In the scenarios where you control the reconstitution of objects, such as rebuilding an object from a MongoDB document, it's often easier to provide an explicit API like a setState() method than to rely on the magic of a library which is going to bypass your public methods. To constrast the possible misuse, you can tag this method as @private (or package protected in Java), or make it very awkward to use outside of the persistence context by requiring a particular data structure to be passed in.


There are very few use cases for setters in real object-oriented programming; default to constructor injection and to immutable objects to avoid overcomplicating your design. Employ setters for non-mandatory, cross-cutting initializations so that your code does not have to bend over backwards to support these use cases while at the same time it can be robust to cowboy modification of internal state.

Monday, March 14, 2016

On property-based testing a highly concurrent job queue

Exploring Eris
Recruiter is a job queue written in PHP, open sourced by Onebip in its 2.x version. It has been used on some inward facing production services, but following the necessity to roll it out to more and more project, I have started a thorough testing campaigns to flush out possible concurrency bugs.
The job system is composed of a single Recruiter process and multiple Worker processes (moreover any other PHP process can enqueue a job). These processes may run on any machine inside a local network and share a MongoDB database where they collaborate to empty the collection of jobs to do. The design of these multiple collections is carefully tuned for scalability, as in its first version Recruiter used heavier findAndModify operations which is now free of.

Property-based testing

Testing some happy paths such as adding a job and executing it is fine for test-driving the code, but it's nowhere near enough for quality assurance. On system of any appreciable scale and/or quality, testing is a separate additional activity (that can hopefully be performed by developers wearing a different hat, or in any case inside a single cross-functional team.)
To test highly concurrent processes such as a recruiter and its dozens of workers insisting on the same database, we adopted Eris, the open source PHP QuickCheck implementation developed by me and some colleagues. Eris is able to generate random inputs for the System Under Test, according to a specification provided by the tester; it supports property-based testing which drives the system with this input while checking important properties are respected.
In this scenario, we generated a random sequence of actions to perform over these processes, checking invariants and post-conditions of operations. For example, an invariant is there is never more than one recruiter process alive. There are surprisingly few invariants when you work with distributed systems; as another example consider the number of workers registered in the related MongoDB collection. This number is not fixed, as crashed processes may still be present even if dead, as long as the rest of the system didn't detect the crash yet.
One postcondition of the job system is very important: any job enqueued is eventually executed, preferably as soon as possible. In these tests, we focused on testing the correctness of this property and not the performance. We monitor the collection of archived jobs (which have been executed correctly) and check that it fills up with all the jobs we expect. The timeout after which we declare the test failed is tuned to the total number of actions performed, which is random.
There are more advanced approaches such as generating a sequential prefix plus a few parallel sequences of actions. This could give more control over the process and may enable some form of shrinking with better determinism; however we retain a notion of parallelism by creating multiple processes. Unfortunately each run is non-deterministic as processes and the underlying MongoDB instance can be scheduled differently by the operating system, changing the interleave of their operation; therefore shrinking is not possible, or is possible only at the cost of running shrunk sequences multiple times to reliably mark them as passing.

Iteration: random number of jobs, graceful restarts

In the first version of the test, we generated a random number of identical jobs (executing a "echo 42" command), along with a series of restarts of the recruiter and a single worker process using SIGTERM. The jobs were enqueued serially by the test process, along with the restart actions. In theory, the processes intercept the signals and exit after having finished their current cycle of polling or execution.
Here are the bugs that we found:

Iteration: multiple workers

Once the test suite was consistently green, we extended the testing environment by allowing multiple workers to be created and correctly restarted.
We found an additional problem with this extension:

Iteration: crashing workers

We added the possibility of killing a worker with SIGKILL, immediately interrupting it even in the middle of database updates.
The possibility of a worker crashing was already covered by the code. However, we tuned the timeout period after which workers are considered dead while inside the test suite; we set it to dozens of seconds instead of half an hour to allow for sane waiting periods in the test process.

Iteration: crashing the recruiter

Killing the single recruiter process was interesting because it usually takes a lock (in the form of a document inside a MongoDB collection with a unique index) to avoid accidental multiple executions. The process correctly waited on the previous lock to expire before restarting, but...

Iteration: length of jobs

We introduced also a random length for enqueued jobs (sleeping from 0ms to 1000ms instead of executing a fixed command). At this point we did not find additional bugs at the time of this post, with the test suite running for several hours, exploring new random possible sequences of actions.

Final version

The final version of the test composes an Eris Generator that:
  • generates a number of workers to start between 1 and 4.
  • using this number, creates a new Generator that produces a tuple (in this case a pair, which means an array of two elements of disparate types). The tuple contains the number of workers itself and the list of actions.
The list of actions is a sequence of a random number of elements, where each of the elements can in turn be an action representing:
  • a job to enqueue with an expected duration of a positive number of milliseconds
  • a graceful restart of one of the workers
  • a graceful restart of the recruiter
  • a kill -9 on one of the worker processes
  • a kill -9 on the recruiter process
  • a sleep of a number of milliseconds between 0 and 1000
Here is an example of action sequence:

[ACTIONS][PHPUNIT][2016-03-14T12:11:14+01:00] ["enqueueJob",8]
[ACTIONS][PHPUNIT][2016-03-14T12:11:14+01:00] "restartRecruiterGracefully"

While here is a moderately complex example:

[ACTIONS][PHPUNIT][2016-03-14T12:11:59+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:11:59+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:11:59+01:00] "restartRecruiterGracefully"
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["enqueueJob",7]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["enqueueJob",13]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] "restartRecruiterByKilling"
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["restartWorkerGracefully",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] "restartRecruiterGracefully"
[ACTIONS][PHPUNIT][2016-03-14T12:12:10+01:00] ["sleep",860]
[ACTIONS][PHPUNIT][2016-03-14T12:12:11+01:00] "restartRecruiterByKilling"
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["enqueueJob",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["enqueueJob",5]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] "restartRecruiterGracefully"

The parameter in the steps modelled as arrays is the duration of a job, or the number of the worker in case of restarting actions.

The test generates 100 of these sequences (this number is tunable, or can target a time limit). For each of them it creates an empty database, starts the workers and the recruiter, performs the actions and waits for all jobs to be performed. If the timeout for full execution expires, the test is marked as failed and lists the log files to look at to understand what happened. On my machine, the test now terminates in about one hour, with a green bar.


Testing is an important activity and can increase the quality of your software by removing bugs before they can get to one of your customers. Testing is becoming more and more incorporated in the lifes of developers (see Test-Driven Development and Behavior-Driven Development), but for core domains and infrastructure additional activities are required for stress and performance tests comparable to production traffic.
It is however impossible to write by hand tests for all the possible situations; however you can easily build a reasonable model of the input to your system. So let me quote John Hughes in saying "Don't write tests. Generate them"; with property-based testing you can write one test containing one property, and catch dozens of bugs like in this post's case study.

Sunday, February 28, 2016

Building an application with a JavaScript-only stack

As I often do when checking out a new platform or language, I have been building a new pure JavaScript implementation of the Game of Life simulation like I did for Java 8.
In this case, my choice fell unto the MEAN stack:
  • MongoDB: a general purpose document-oriented (and as such NoSQL) database with support for querying and aggregation.
  • Node.js: the famous server-side JavaScript interpreter.
  • Express: a framework for providing REST APIs and web pages on top of Node.js.
  • AngularJS: one of the popular client-side JavaScript frameworks for building Single Page Applications.
The experience has been quite interesting, as you really get to know a language and its libraries when using it for a project; in a way that no book can force you to do.


Here are a series of myths I want to dispel after diving into a full stack JavaScript project for a few weeks.
It is true that there is less of a context switch when changing between the server-side and the client-side applications, since you are always writing the same language. However, this seamless transition is limited by several differences:
  • different language support: ECMAScript 6 has to be compiled down by tools like Babel to ECMAScript 5 to be compatible with any browser and Node.js version. Polyfills may be needed to try and unify the experience.
  • Different libraries: testing frameworks change between server and client, and so does how you build mocks.
  • Different frameworks: Angular and Express both have their own ways to express controllers and views.
  • Different tools: you install packages for the server-side with Npm but use Bower instead on the client.
The key about productivity is in being opinionated and choose (or have someone choose for you) a single tool for each purpose, without being carried away by the latest fashion. In this case I followed some default choices and trimmed that down to get:
  • Mocha and expect(), one of the three flavors of the Chai assertion library, for the server-side.
  • Npm and Bower for server-side and client-side.
  • wiredep to generate script and CSS tags for the single page to be loaded.
  • Grunt as a build and automation tool, wrapping everything else.
One sany way to pick up default and platform idioms is to start from a predefined stack, and you can do so by cloning a template or a generator like Yeoman. If the generator is well-factored, it will give you sane defaults to fill in the gaps such as JsHint and a configuration for it.
Another myth I would like to dispel is callback hell: if you use the ECMAScript 6 construct yield, you can pretty much write synchronously looking code by building an iterator of steps (each step producing a promise whose resolution will be passed in as an input for the next.) There's probably something even more advanced I didn't reach yet in ECMAScript 7. Don't take this as me saying you can write synchronous code in JavaScript (it only looks synchronous), and you definitely have to learn to use well the underlying layers of callbacks and promises before you can grasp what yield is really doing.


With a reference to my previous experience in Java, the productivity of the JavaScript stack feels very good in the short term (I only explored that time frame), due to its simple syntax and structures, especially with support for ES6 which removes a lot of boilerplate.
For example, there is no need for Set<Cell> aliveCells = new HashSet<Cell>(); definitions like in Java, as you would write aliveCells = new Set() with purely dynamic typing (suffering the occasional unfortunate consequences of this choice, of course.)
To evaluate productivity and robustness in the long term you would have to build a much larger project, inside a team composed of multiple people.
I found the tight feedback loop of grunt serve was another positive impact on productivity: you can set up a watch on files so that every time you save the Node.js server is restarted, and the current browser page is reloaded. This is accomplished by LiveReload monitoring from the browser side with a WebSocket. Of course once you get the hang of the testing frameworks and their assertions, you're back to the even stricter feedback loop of running tests and have their output in milliseconds.


I'd say you can reach a good productivity in a pure JavaScript environment, even if I am unsure about the pure dynamic typing approach.
You'll have to get opinionated and choose wisely; to not introduce duplicates and clearly assign responsibility to each of this tool so that it's unambiguous that npm should not be used for client-side modules; to take control of your stack, as everything that you install is still only JavaScript code, copied into your project and that you can read to get a feel of what it's doing.
In programming, a feature that seems to consist of just a few lines of code often turns into an engineering project. In very dynamic and immature stacks such as JavaScript, it's even more important to build strong foundations, tools and processes to turn a blob of code into well-factored software.