Sunday, January 21, 2018

Book review: Production-ready microservices
Production-Ready Microservices is a short book about consistently practicing architecture and design over a fleet of microservices.
In general, I think the principles described here apply very much to any service-oriented initiative, even more so if the services are coarse grained and hence require more maintenance than finely isolated ones.


The book extrapolates from the author's experience at Uber "standardizing over a thousand microservices". Given a few developers for each microservice team, that makes up 2000-3000 engineers from the total >10000 Uber employees (I wonder how many are lawyers). After WhatsApp's famous story of being acquired at 55 employees in total, that really highlights the difficulty level of running a business and operations all over the physical world (sending cars and drivers around in dozens of countries) with respect to a digital-only enterprise. We should remember this and many other directions of change the next time we hear a technology advocate saying how much the cost of his 2-people startup has been reduced by $technology.

The main message

You should be this tall to use microservices; this architecture doesn't necessarily fit every context; although integrating separate services of some size is becoming a standard after the API revolution (before that, it was integrate through the database which is arguable worse).
You will encounter many different social and technical problems, such as:
  • Inverse Conway's Law, with the shape of the products defining the shape of the company. Although I found out this doesn't really apply at smaller scales as development teams can own more than one service and experience a successful decoupling between people and code.
  • Technical sprawl, where multiple languages, databases and other key choices spread without a consistent, central planning.
  • More ways to fail: distributed and concurrent systems are more difficult to work with and to reason upon, plus the fact that there are more servers, containers or applications will simply multiply the failures you'll see.
There are lots of non-functional requirements like scaling each microservice and isolating it from the rest of the fleet; perhaps don't go too micro- if you don't have the resources to ensure an acceptable level in each service. Perhaps in your context the acceptable SLA for some particular service is low, because it's not change often or is only internally facing or is only used several times per day.
One particular aspect of the Consistency is important lesson is that the whole lifecycle of services should be considered. Maintenance and even decomissioning are as important as producing new MVPs: but I've seen many times services being neglected, or being considered very easy to migrate away from one some new shiny substitute was available. In reality, it takes time and effort to keep services up and running, and to finally kill them when you have an alternative, as data and users are slowly migrated off from the old to the new platform.
Lots of requirements are also overlooked but often turn out to be important as you increase your population of services: the scalability of a single endpoint, fault tolerance, even documentation (ADR are the only form I trust very much right now in a fast-moving organization context.) Every single section of this book will make you think about it, but won't give much of an overview: you're better served by reading the SRE book for example.

Value for money

This book is a short read which gives you an overview of what microservices challenges you're likely to face down that rabbit hole; in particular, it focuses on a medium-to-large organization context. I'm not sure this book is worth the price tag however: 20 pounds for a Kindle edition of ~170 pages, where ~25 pages are glossary, index and lots of checklists.

Wednesday, January 03, 2018

Book review: Algorithms to live by Algorithms to Live By: The Computer Science of Human Decisions is a book that puts together the domains of computer science and real life. The ensemble of topics being touched is wide. The book treats deterministic algorithms such as optimal sorting, but then moves on to more context-dependent strategies for caching and scheduling. The last chapter even get to model identification, (tractable and intractable-made-tractable) optimization problems, stochastic algorithms and game theory.

All the while, computer science concepts are compared to conscious and unconscious human processes. For example, caching and the memory hierarchy have great parallels with how the human brain recollects memories of recent events, and how we can augment our brain with external, slower supports like paper. Scheduling is useful not only to allocate processes on CPU cores, but also to make an explicit choice of strategy when prioritizing the tasks that you or your team face. Up to the more extreme examples of game theory and mechanism design, when the incentive system becomes more important than the individual agents (manage the system, not the people rings a bell?)

If you like viewing the world throughout the lens of algorithms and see how the strategies of humans and computer compare with each other, I would strongly recommend this book as it will make for an entertaining read and some principles to take away for real life usage (I hope sorting socks will be easier now). Skip it if you have a very wide knowledge of computer science, operations research, Nash equilibria... but even if I was familiar with the technical part, I was missing the connection to different domains or everyday, real world problems. I listened to the audiobook version, which lasts about 12 hours. You may find it easier to skim through some chapters if you are more (or less) interested in some topics. The problem with audiobooks is that I can't easily take notes, while highlighting on an e-book reader is quick and lets me recollect all important gotchas later into a text file.

New role: Software Engineer in Tools and Infrastructure

After working on eLife's testing and deployment infrastructure in 2016, in the last year my responsibilities in the technical team have shifted towards the domain of engineering productivity. Testing is one phase of the development process that is often a bottleneck, but there are many more areas like code reviews, monitoring and infrastructure itself (being it servers or services):
In summary, the work done by the SETs naturally progressed from supporting only product testing efforts to include supporting product development efforts as well. Their role now encompassed a much broader Engineering Productivity agenda. -- Ari Shamash on the Google Testing Blog
Moreover, the team starts from a high level of coverage and design on many projects, to the point that my focus has always been on the provisioning and automation of testing environments, and on large-scale end2end testing.

What seems just a letter on a job title (from SET to SETI) is in fact an alignment of responsibilities so that I am not accidentally mistaken for "the QA guy" but always seen as a problem solver instead.
Solving problems and propagating the solution, so that you don't have to solve them over and over again
Roles are always an approximation in a team of generalizing specialist that also distributes and collaborate on some roles such as that of architecture. But it's helpful in a cross-functional team to have someone dedicated to the task of productivity, whether it is reached through automation, tooling, or continuous improvement.

Thursday, November 30, 2017

Book review: Building Microservices
Building Microservices: Designing Fine-Grained Systems by Sam Newman is the seminal book on microservices as a concept. It was published at the start of 2015 (that's a long time ago in tech... or is it?), it's focused on high-level topics rather than implementation and hence has aged well.

There are in fact several concepts, both at the methodology and the technical level, that the books does justice to. Here's what turned on the light bulb over my head.


Modeling is important and a whiteboard discussion can save weeks of implementation down the line. Both me and Sam Newman are not the first to say this. Modelling in microservices, like in Domain-Driven Design, is all built around business capabilities and the shape of your organization (yes, Conway).

Styles of integration

Like for other design choices, especially at the architectural level, it's important to explicitly choose whether to go for a shared database (please don't), synchronous or asynchronous communication; orchestration through a Facade or coreography distributing responsiblities between services; explicit versioning the kinds of backward compatibility; pushing or pulling data from one physical location to another, and with which granularity of time and entity.
There are also styles of isolation, not just of integration: code reuse is maturely described on a trade-off scale with decoupling. It feels like a pattern book in which these options are given a standard name for further discussion, and evaluated with respect to the contexts in which they work well.


Should you go for virtual machines or containers? How do you map services to physical or virtual machines? The book couldn't possibly keep up with the rise of container orchestrators in the last couple of years, so it won't be a complete guide but could give you a sense of the problems that virtual machines create and that we are going to solve in this next generation. What will be the problems that containers create, and most of all how to solve them, is not in the scope of this book instead.


After the basics like a testing pyramid, I don't find myself in complete agreement with large scale testing strategies proposed here like consumer-driven contract testing. Yes, it works well enough if you can specify a formal contract that a service should adhere to, and test it in isolation in the implementer service. But the overhead of doing so, in a context in which we are supposed to create dozens if not hundreds of service, is very significant.
At eLife we have relied on a wide RESTful API specification, each group of endpoints implemented by a different service. As such, the overhead is limited, and this is just a description for validating requests and responses rather than a full contract, as most of these services are read-only.
All in all, I find myself relying on the end2end beast to get heterogeneous services, written in multiple languages, in different times by different people, to talk together reliably. I would spend a lot of energy in trying to square the circle of contracts, but I suppose they work well at an higher scale of traffic or on selected services.
The end2end tests we use are limited in their scope, constrained by being at the top of the pyramid; they do not necessarily cover a full end2end scenario but rather a data path involving more than on service, often skipping the user interface. It helps that we have no Selenium-based testing in the end2end layer, as the user interface is fully accessible to a HTML parser and requires no Javascript.
The problems that we encounter daily happen in production all the same: timeouts, dirty data, the automation challenge of turning on and off new nodes reliably, the race conditions that come from distributed executions. I'd rather not hide these problems but solve them, and I'm looking at containers instead to try to shrink the big picture and have a simpler end2end environment, easier to spin up and down, or to provide with clean databases.

Hidden gems

There are several hidden gems that would let you pick the brain of the author on a common problem cited in a chapter. For example, we have such a problem in integrating with a CRM that doesn't even support PHP 7 (what is it with CRMs and always being sources of technical debt?) There are some example patterns that you could apply in that situation, like hiding the CRM behind a specialized service that cleans up its API. Nothing miraculous, but a glimmer of hope for these desperate situations.


If you are going to work with microservices, or milliservices, or small enough services, this book is worth a read. If you are only having troubles with a particular area, such as testing or security, going through a single chapter will give you a big picture before you go in depth with further sources.
Remember that this book is starting to become a bit dated, so you cannot take highly technical lessons from it (and I doubt books are a great tool for those in general in this fast-moving environment). Think about your context, learn the theory, and fill in the parts in which the map is blank (or erased) with what you are learning from the web in 2017 (soon to be 2018 - attempt to make this review valid for one more year).

Monday, April 03, 2017

Pipeline Conf 2017

Not liquorice
Post originally shared on eLife's internal blog, but in the spirit of (green) open access here it is.

Last month I have attended the PIPELINE conference in London, 2017 edition. This event is a not-for-profit day dedicated to Continuous Delivery, the ability to get software changes into the hands of users, and to do so safely, quickly, and in a sustainable way. It is run by practitioners for practitioners, everyone on different sides of the spectrum like development, operations, testing, project management, or coaching.

The day is run with parallel tracks, divided into time slots of 40-minute talks and breaks for discussions and, of course, some sponsor pitches. I have been picking talks from the various tracks depending on their utility to eLife's testing and deployment platform, since our tech team has been developing every new project with this approach for a good part of 2016.

The conceptual model of Continuous Delivery and of eLife's implementation of it is not dissimilar to the scientific publishing process:
  • there is some work performed into an isolated environment, such as a laboratory, but also someone's laptop;
  • which leads to a transferable piece of knowledge, such as a manuscript, but also a series of commits, roughly speaking some lines of code;
  • which is then submitted and peer reviewed. We do so through pull requests, which perform a series of automated tests to aid human reviewers inside the team; part of the review is also running the code to reproduce the same results on a machine which is not that original laptop.
  • after zero or more rounds of revisions, this work gets accepted and published...
  • which means integrating it with the rest of human knowledge, typesetting it, organizing citations and lots of metadata about the newly published paper. In software, the code has to be transformed into an efficient representation, or virtual machines have to be configured to work with it.
  • until, finally, this new knowledge (or feature) is in the hands of a real person, who can read a paper or enjoy the new search functionalities
Forgive me for the raw description of scientific work.

In software, Continuous Delivery tries to automate and simplify this process to be able to perform it on microchanges multiple times per day. It aims for speed to be able to bring a new feature live in tens of minutes; it aims for safety to avoid breaking the users work on new changes; and to do all of this in a sustainable way, not to sacrifice tomorrow's ability to evolve for a quick gain today.

Even without the last mile of real user traffic, the 2.0 software services have been running in production or production-like servers from the first weeks of their development. A common anti-pattern in software development is to say "It works on my machine" (imagine some saying "It reproduces the results, but only with my microscope"); what we strive for is "It works on multiple machines, that can be reliably created; if we break a feature we know within minutes and can go back the latest version known to work."

Dan North: opening keynote

Dan North started to experiment with Continuous Delivery in 2004, at a time when builds were taking 2 days and a half to run in a testing environment contended by multiple teams. He spoke about several concepts underpinning Continuous Delivery:
  • conceptual consistency: the ability of different people to make similar decisions without coordination. It's an holy grail for scaling the efforts of an organization to more and more members and teams.
  • supportability: championing Mean Time To Repair over Mean Time Between Failures. The three important questions for facing a problem as what happened? Who is impacted? How do we fix it?
  • operability: what does it feel like to build your software? To deploy it? Test it? Releasing it? Monitor it? Support it? Essentially, developer experience in additio to user experience.
Operability is a challenge we have to face ourselves more and more as we move from running our own platform to provide open source software for other people to use. Not only reading an article has to be a beautiful experience, but publishing one should be.

John Clapham: team design for Continuous Delivery

This talk was more people-oriented, I agree with the speaker that engagement of workers is what really drive profits (or value in case of non-profits).
Practically speaking:
  • reward the right behaviors to promote the process you want;
  • ignore your job title as everyone's job is to deliver value together;
  • think small: it's easier to do 100 things 1% better than to do 1 thing 100% better (aka aggregation of marginal gains)

Abraham Marin: architectural patterns for a more efficient pipeline

The target for a build is for it to take less than 10 minutes. The speaker promotes the fastest builds as the one you don't have to run, introducing a series of patterns (and the related architectural refactorings) to be executed, safely, to simplify your software components:
  • decoupling an api from implementation: extracting an interface package to reduce the dependencies to a component to a dependency to an interface;
  • dividing responsibiliteis vertically or horizontally trying to isolate the most frequent changes and minimizing cross-cutting requirements;
  • transform a library into a service;
  • transform configuration into a service.
Some of these lessons are somewhat oriented to compiled languages, but not limited to them. My feeling is that even if you reduce compile times, you still have to test some components in integration, which is a large source of delay.

Steve Smith: measuring Continuous Delivery

How do you know whether a Continuous Delivery effort is going well? Or more pragmatically,  which of your projects is in trouble?
The abstract parameters to measure in pipelines are speed (throughput, cycle time) and stability. Each declines differently depending on the context.
In deployment pipelines that go from a commit to a new version release in production, lead time and the interval of new deployments can be measured. But also failure rate (how many runs fail) and failure recovery time are interesting. In more general builds or test suites, execution time is a key parameter but a more holistic view includes interval (how frequent are builds executed).
I liked some of these metrics so much that they are now in my OKRs for the new quarter. Simplistic quote: you can't manage what you can't measure.

Alastair Smith: Test-driving your database

To continuously deploy new software versions, you need an iterative approach to evolve your database and the data within it. When you evolve, you also have to test every new schema change. Even in the context of stored procedures for maximum efficiency (and lock-in), Alastair showed how to write tests that can reliably run on multiple environments.

Rachel Laycock: closing keynote, Continuous Delivery at Scale

Rachel Laycock is the Head of technology for North America at Thoughtworks, the main sponsor of the conference. The keynote however had nothing to do with sales pitches. Here are some anti-patterns:
  • "We have a DevOps team" is an oxymoron, as that kind of team doesn't exist; what often happens is that the Ops team gets renamed."
  • Do we chose Kubernetes or Mesos?" as in getting excited about the technology before you understand the problem to solve.
The "at scale" in the title pushes for seeing automation as a way to build a self-serving platform, where infrastructure people are not bottlenecks but enablers for the developers to build their own services.
The best quote however really was "yesterday's best practice becomes tomorrow's anti-pattern". What we look for is not to be the first to market but to have an adaptable advantage, a product that can evolve to meet new demands rather than being a dead end.

Tuesday, March 28, 2017

Book review: Site Reliability Engineering

The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems? -- the book tagline
Site Reliability Engineering - How Google runs production systems is a 2016 book about the ops side of Google services, and the set of principles and practices that underlie it.

A Site Reliability Engineer is a software engineer (in the developer sense) that designs and implement systems that automated what would otherwise be done manually by system administrators. As such, SREs have a directive for employing a minimum 50% of their time in development rather than firefighting and maintenance of existing servers (named toil in the book).

The book really is a collection of chapters, so you don't have to be scared by its size as you don't necessarily need to read it cover to cover. You can instead zoom in on the interesting chapters, being them monitoring, alerting, outage tracking or even management practices.

Principles, not just tools

I like books that reify and give names to concepts and principles, so that we can talk about those concepts and refer to them. This book gives precise definitions for the Google version of measures such as availability, Service Level Objectives and error budgets.
This is abstraction at work: even if the examples being used show Google specific tools like Borgmon or Outalator, the solutions are described at an higher abstraction level that makes them reusable. When load balancing practices are made generic enough to satisfy all of Google services, you can bet that they are reusable enough to be somewhat applicable to your situation.

Caveat emptor

Chances are that you're not Google: the size of your software development effort is order of magnitudes smaller, and even when it is of a comparable size it's not easy to turn an established organization into Google (and probably not desirable nor necessary.)
However, you can understand that Google and the other giants like Facebook and Amazon are shaping our industry through their economically irresistible software and services. Angular vs React is actually Google vs Facebook; containers vs serverless is actually Kubernetes vs Lambda which is actually Google vs Amazon. The NoSQL revolution was probably started by the BigTable and Dynamo papers... and so on: when you deployed your first Docker container, Google engineers had already been using similar technology for 10 years; as such, they can teach you a lot at a relatively little cost through the pages of a book. And it's better to be informed on what may come to a cloud provider near you in the next years.


It took some time to get through this book, but it gives a realistic picture of running systems that undergo a large scale of traffic and changes at the same time. Besides the lessons you can directly get from it, I would recommend it to many system administrators and "devops guys" as a way to think more clearly about which forces and solutions are at play in their datacenters (or more likely, virtual private clouds).

Sunday, March 12, 2017

Eris 0.9.0 is out

In 2016 I moved to another country and as a result of this change I didn't had much time to develop Eris further. Thankfully Eris 0.8 was already pretty much stable, and in this last period I could pick up development again.

What's new?

The ChangeLog for 0.9 contains one big new feature, multiple shrinking. While minimization of failing test cases is usually performed with a single linear search, multiple shrinking features a series of different options for shrinking a value.
For example, the integer 1234 was usually shrunk to 1233, 1232, 1231 and so on. With multiple shrinking, there are a series of options to explore that make the search logarithmic, such as 617, 925, 1080, 1157, 1195, 1214, 1224, 1129, and 1231. If the simplest failing values is below 617 for example, at least (1234-617) runs of the test will be skipped by this optimization, just in the first step.
This feature is the equivalent of QuickCheck's (and other property-based testing libraries') Rose Trees, but implemented here with an object-oriented approach that makes use of `GeneratedValueSingle` and `GeneratedValueOptions` as part of a Composite pattern.

This release also features support for the latest versions of basic dependencies:
  • PHPUnit 6.x is now supported
  • PHP 7.1 is officially supported (I expect there were mostly no issues in previous releases, but not the test suite fully passes.)
Several small bugs were fixed as part of feedback from projects using Eris:
  • the pos() and neg() generators should not shrink to 0.
  • float generation should never divide by 0.
  • shrinking of dates fell into a case of wrong operator precedence.
  • reproducible PHPUnit commands were not escaped correctly in presence of namespaced classes.
A few backward compatibility fixes were necessary to make room for new features:
  • minimumEvaluationRatio is now a method to be called, not a private field.
  • GeneratedValue is now an interface and not a class. This is supposed to be an internal value: project code should never depend on it and it should build custom generators with map() and other composite generators rather than implementing the Generator interface, which is much more complex.
  • the Listener::endPropertyVerification() method now takes the additional parameters $iterations and the optional $exception. When creating listeners should always subclass EmptyListener in order not to have to modify the not implemented methods, which will be inherited.

What's next?

My Trello board says:
  • still decoupling from PHPUnit, for usage in scripts, mainly as a programmable source of randomness.
  • more advanced Generators for finite state machines and in general a more stateful approach, for testing stateful systems.
  • faster feedback for developers, like having the option to run fewer test cases in a development environment but the full set in Continuous Integration.
I'm considering opening up the Trello board for public read-only visibility, as there's nothing sensible in there, but potential value in transparency and feedback from the random people encountering the project for the first time.

As always, if you feel there is a glaring feature missing in Eris, feel free to request it on the Github's project issues.

Sunday, February 12, 2017

Book review: Fifty quick ideas to improve your tests
Fifty quick ideas to improve your tests is, well, a series of fifty quick ideas that you can implement on some of your automated test suites to improve their value or lower their creation or maintenance costs.

These ideas are pattern-like in which they are mostly self-contained and often independent from each other. They are distilled from real world scenarios that the authors (David Evans, Tom Roden and Gojko Adzic) have encountered in their work.

This format helps a lot readability, as ideas are organized into themes, giving you the ability to focus on the area you want to improve and to quickly skip the ideas that do not make sense in your context or you find impractical or not worth the effort. For the same reasons I enjoyed the book this one is a sequel to, Fifty quick ideas to improve your user stories. Moreover, both were published on Leanpub so you have the ability to adjust the price to the value you think you'll get out of them; and despite Leanpub's large collection of unfinished books, this one is 100% complete and ready to read without the hassle of having to update to a new version later (who really does that?)

Some selected quotes follow, highlighted phrases mine.
Something one person considers critical might not even register on the scale of importance for someone from a different group. 
Drawing parallels between the different levels of needs, we can create a pyramid of software quality levels: Does it work at all? What are the key features, key technical qualities? Does it work well? What are the key performance, security, scalability aspects? Is it usable? What are the key usability scenarios? Is it useful? What production metrics will show that it is used in real work? Is it successful? 
In order to paint the big picture quickly, we often kick things off with a ten-minute session on identifying things that should always happen or that should never be allowed. This helps to set the stage for more interesting questions quickly, because absolute statements such as ‘should always’ and ‘should never’ urge people to come up with exceptions. 
Finally, when an aspect of quality is quantified, teams can better evaluate the cost and difficulty of measuring. For example, we quantified a key usability scenario for MindMup as ‘Novice users will be able to create and share simple mind maps in under five minutes’. Once the definition was that clear, it turned out not to be so impossible or expensive to measure it.
Avoid checklists that are used to tick items off as people work (Gawande calls those Read-Do lists). Instead, aim to create lists that allow people to work, then pause and review to see if they missed anything (‘Do-Confirm’ in Gawande’s terminology). 
A major problem causing overly complex examples is the misunderstanding that testing can somehow be completely replaced by a set of carefully chosen examples. For most situations we’ve seen, this is a false premise. Checking examples can be a good start, but there are still plenty of other types of tests that are useful to do. Don’t aim to fully replace testing with examples in user stories – aim to create a good shared understanding, and give people the context to do a good job. 
Waiting for an event instead of waiting for a period of time is the preferred way of testing asynchronous systems
The sequence is important: ‘Given’ comes before ‘When’, and ‘When’ comes before ‘Then’. Those clauses should not be mixed. All parameters should be specified with ‘Given’ clauses, the action under test should be specified with the ‘When’ clause, and all expected outcomes should be listed with ‘Then’ clauses. Each scenario should ideally have only one ‘When’ clause that clearly points to the purpose of the test.
Difficult testing is a symptom, not a problem. When it is difficult for a team to know if they have a complete picture during testing, then it will also be difficult for it to know if they have a complete picture during development, or during a discussion on requirements. It’s unfortunate that this complexity sometimes clearly shows for the first time during testing, but the cause of the problem is somewhere else. 
Although it’s intuitive to think about writing documents from top to bottom, with tests it is actually better to start from the bottom. Write the outputs, the assertions and the checks first. Then try to explain how to get to those outputs. [...] Starting from the outputs makes it highly unlikely that a test will try to check many different things at once, 
Technical testing normally requires the use of technical concepts, such as nested structures, recursive pointers and unique identifiers. Such things can be easily described in programming languages, but are not easy to put into the kind of form that non-technical testing tools require. 
For each test, ask who needs to resolve a potential failure in the future. A failing test might signal a bug (test is right, implementation is wrong), or it might be an unforeseen impact (implementation is right, test is no longer right). If all the people who need to be make the decision work with programming language tools, the test goes in the technical group. If it would not be a technical but a business domain decision, it goes into the business group. 
Manual tests suffer from the problem of capacity. Compared to a machine, a person can do very little in the same amount of time. This is why manual tests tend to optimise human time [...] Since automated tests are designed for unattended execution, it’s critically important that failures can be investigated quickly. [...] To save time in execution, it’s common for a single manual test to check lots of different things or to address several risks. 
Whenever a test needs to access external resources, in particular if they are created asynchronously or transferred across networks, ensure that the resources are hidden until they are fully complete. 
Time-based waiting [sleep() instead of polling or waiting for an event in tests] is the equivalent of going out on the street and waiting for half an hour in the rain upon receiving a thirty-minute delivery estimate for pizza, only to discover that the pizza guy came along a different road and dropped it off 10 minutes ago. 
Instead of just accepting that something is difficult to test and ignoring it, investigate whether you can measure it in the production environment. 
Test coverage is a negative metric: it measures how bad something is, not how good it is. 
There are just two moments when an automated test provides useful information: the first time it passes and when it subsequently fails. 
It’s far better to optimise tests for reading than for writing. Spending half an hour more writing a test will save days of investigation later on. [...] In business-oriented tests, if you need to compromise either ease of maintenance or readability, keep readability. 

Thursday, December 29, 2016

Book review: The Power of Habit Duhigg, a New York Times reporter, collects stories of building and breaking habits, supporting the thesis that habits form an important part of our lives and that they can make a big difference for better or worse. Both in the case of positive training or learning habits, or in the case of addictions, repeated behavior influences our energy levels, our free time and in the end many of our long-term results at work and in life (we will all go the gym in the new year, right?)

The storytelling style of the book may smell like anecdotal evidence, but it keeps the reader intruigued and entertained long enough the get its message across, without delving into fictional stories. Take the science expressed here with a grain of salt (like you would with Malcom Gladwell): all experiments are real but they may have been cherry-picked to prove a point.

The key take away for me was to think about our habits, try to influence them to stop or reinforce them depending on our second-order desires; for example with the cue-routine-reward framework proposed in the book but ultimately with whatever works for you as habit building and destroying must be very context-specific. Another concept that we find reasonable is ego depletion (willpower as a finite resource that must be renewed), but the jury is still out on whether it is a confirmed and sizable effect, as meta-analysis of hundreds of studies do not agree yet (for good reasons).

From exercising to learning, or from quitting smoking to a Facebook addiction, this self-reflection can have a large impact over our lives. Maybe it should be an habit?

Selected quotes from the book follow:
This process within our brains is a three-step loop. First, there is a cue, a trigger that tells your brain to go into automatic mode and which habit to use. Then there is the routine, which can be physical or mental or emotional. Finally, there is a reward, which helps your brain figure out if this particular loop is worth remembering for the future [...] Every McDonald’s, for instance, looks the same—the company deliberately tries to standardize stores’ architecture and what employees say to customers, so everything is a consistent cue to trigger eating routines.
“Even if you give people better habits, it doesn’t repair why they started drinking in the first place. Eventually they’ll have a bad day, and no new routine is going to make everything seem okay. What can make a difference is believing that they can cope with that stress without alcohol.”
Where should a would-be habit master start? Understanding keystone habits holds the answer to that question: The habits that matter most are the ones that, when they start to shift, dislodge and remake other patterns.
“Small wins are a steady application of a small advantage,” one Cornell professor wrote in 1984. “Once a small win has been accomplished, forces are set in motion that favor another small win.” Small wins fuel transformative changes by leveraging tiny advantages into patterns that convince people that bigger achievements are within reach.
“Sometimes it looks like people with great self-control aren’t working hard—but that’s because they’ve made it automatic”
“By making people use a little bit of their willpower to ignore cookies, we had put them into a state where they were willing to quit much faster,” Muraven told me. “There’s been more than two hundred studies on this idea since then, and they’ve all found the same thing. Willpower isn’t just a skill. It’s a muscle, like the muscles in your arms or legs, and it gets tired as it works harder, so there’s less power left over for other things.”
As people strengthened their willpower muscles in one part of their lives—in the gym, or a money management program—that strength spilled over into what they ate or how hard they worked. Once willpower became stronger, it touched everything.

Monday, November 14, 2016

How to run Continuous Integration on EC2 without breaking the bank

I live in a world increasingly made up of services (sometimes micro-), that collaborate to produce some visible behavior. This decomposition of software into (hopefully) loosely coupled components has implications for production infrastructure concern such as how many servers or containers you need to run. 
Consequently, it impacts also testing infrastructure as it tries to be an environment as close as possible to production, in order to reliably discover bugs and replicate them in a controlled environment without impacting real users.
In particular I like to have a ci testing environment where each service can be tested in its own virtual machine (or container for some); and each node is totally isolated from the other services. In addition to that, I also like to have and end-to-end testing environment where services talk to each other as they would in production, and where we can run long and complex acceptance tests. 
This end-to-end environment usually is a perfect copy of production with respect to the technologies being used (e.g. load balancers like HAProx or AWS ELB are in place, in the same way of production, even if there is no test that directly targets their existence); the number of nodes per service is however reduced from N to 2, as in computing there are only 0, 1 and N equivalence categories.

In the likely case that you're using cloud computing infrastructure to manage this 2x or 3x volume of servers with respect to the production infrastructure, your costs are also by default going to double or triple. One option to try and optimize this is to throw away everything and start deploying containers, as they could share the same underlying virtual machines as production while preserving isolation and reproducibility. On an existing architecture made up of AWS EC2 nodes however, optimization can takes us far without requesting to rewrite all the DevOps(TM) work of the last two years.

Phase 1: expand

As I've been explaining, EC2 instances replicating production environments can expand until they bring the total number of EC2 nodes to three times the original number. Some project just have an single EC2 node, while others have multiple nodes that have to be at least 2 in the latest testing environment before production. Moreover, the time the tests take to run on these instances is inversely correlated with how powerful they are in CPU and I/O terms, so you pay good money for every speed improvement you want to get on those 20- or 60-minute suites.
In my current role at eLife, we initially got to more than 20 EC2 instances for the testing environments. This was beneficial from a quality and correctness point of view, as we could then run tests on all mainline branches before they go to production, but also on pull requests, giving timely feedback on the proposed changes without requiring developers to run everything on their machines (that should be an option, not an imperative.)

Phase 2: optimize

The AWS EC2 pricing model works by putting nodes into a running state when you launch them, and by pre-billing one hour of usage every 60 minutes. Therefore, booting existing instances or creating from scratch is going to incur at least a 1-hour cost for each of these events:
  • at boot, 1 hour is billed
  • at 60:00 from boot a new hour is billed,
  • at 120:00 from boot a new hour is billed, and so on
All the EC2 nodes that we have however, use EBS disks for their root volumes. EBS is the block remote storage provided by AWS, and while some generations of instances use local instance storage for their / partition, EBS makes a lot of sense for that partition as it gives you the ability to starting and stopping instances without losing state in between; essentially, it gives you the ability to shutdown and reboot instances when you need without paying EC2 bills for the time in which they are stopped. The only billing being performed is for the EBS storage space, which means AWS has some hard disks in its data centers that has to preserve your instances files, but it allocates new CPU and RAM resources as a virtual machine only when you start the EC2 instance. Therefore, an EBS-backed EC2 instance on your AWS console does not always correspond to a physical place, but it's really virtual as it can be stopped and started many times, moving between racks in the same availability zone while keeping the same data (persisted to multiple disks in other racks) and even private IP address.
Since the process of allocating a new virtual machine and reconfiguring networks to connect everything together has some overhead, this reflects in the boot time necessary to start an EC2 instance, which can be several seconds to be added to the standard boot time for the operating system. This is also reflected in the pricing model, which makes it a bad idea to launch a new EC2 instance for every test suite you need to run: as long as your test suite takes less than 1 hour, you are already paying for a full hour of resources and you would be throwing away. Running 6 builds in an hour would make you pay for 6 hours, which is not what you want.

Phase 2.1: stop them manually

A first optimization that can be performed manually is to stop and start these instances from the console. You would usually stop them at some hour of the day or evening and then start them again as the first thing in the morning.
Of course, there is a good potential for automating this as AWS provides many ways to access EC2 with its API and all the SDKs that build on top of it, available for many different programming languages. You can easily build some commands that query the EC2 API looking for an instance basing on its tags, and then issue commands for starting and stopping it. In general, this is almost transparent for the CloudFormation templates that you are surely using for launching these instances.
The first time you start and stop an instance, there are a few problems thay may come up.
The first problem is that of ephemeral storage: as I wrote before, you have to make sure the root volume of the instance and any data you want to persist are EBS-backed and not local instance storage.
The second problem is that of public IP addresses. While private IP addresses inside a VPC stay the same after a stop and start commands, public IP addresses are a scarce resource and are only allocated to it when the instance is running. Therefore, if you had a DNS pointing to it, it has to be updated after the boot, whether it was manually created or part of the CloudFormation template. Default DNS entries have the form which depends on the public ip address and hence does not provide a good indirection.
The third problem is that of long-running processes managed by SysV/Upstart/Systemd: the daemons of servers like Apache, Nginx or MySQL are usually configured to restart upon boot, but if you have written your own deamons or Python/PHP long running processes and are starting them through /etc/init or /etc/init.d configuration, it pays to check everything is in its place again after boot.
The last problem I have found at this level (manual restarts) is about files in /run and /var/run, which are temporary directory used by deamons to place locks and other transient files like a pidfile indicating an instance of that program is running. If you have folders in /run or /var/run, those folder will have to be recreated. Systemd provides the tmpfiles.d option which automatically creates a hierarchy of files, but it's usually just easier (and portable) to have the daemons create their folders (php-fpm does that) or if they are not able to do that, not placing them in /var/run/some_folder_that_will_stop_existing but in /var/run or even /tmp without subfolders.

Phase 2.2: start them on demand

Instead of manually starting EC2 instance or to automate their stopping and starting as a periodical task, you can also start them on-demand as needed by the various builds that need to be run. So whenever project x needs to build a new commit on master or a pull request, you will start the x--ci EC2 instance.
In this case, however, there is a larger potential for race conditions as you may try to run a deploy or any command on an instance before it's actually ready to be used. Therefore, we wrote some automation code that waits for several events before letting a build proceed:
  1. the instance must have gone from the pending to the running state on the EC2 API. This hopefully means AWS has found a CPU and other resources to assign to it.
  2. the instance must be accessible through SSH.
  3. through SSH, we monitor that the file /var/lib/cloud/instance/boot-finished has appeared. This file will appear at each boot when all daemons have been started, as art of the standard cloud-init package.
Once the instance has gone through all these phases, you can SSH into it and run whatever you want.

Phase 2.3: stop them when it's more efficient

Once you have transitioned from starting instances in the last responsible moment, you can do the same for stopping them instead of just wait for the end of the day to shutdown everything.
We now have a periodical job, running every 2 minutes, that takes a list of servers to stop. In parallel, it performs the following routine for each of the EC2 instances:
  • checks if the server has been running for an amount of time between h:55:00 and h:59:59 minutes, where h is some number of hours.
  • if the condition is true, stop the instance before we incur in a new hour being billed.
  • otherwise, leave the instance running: you already paid for this hour so it makes no harm to do so, as the instance can be used to run new builds at no cost.
Therefore, when developers open a dozen pull requests on the same project, only the first starts the necessary instances to run the tests; the other ones are queued behind that and will get access to the same instance, one at a time.

Bonus: Jenkins locks

Starting and stopping instances periodically would be otherwise dangerous if there wasn't a mechanism for mutual exclusion between builds and lifecycle operations like starting and stopping. Not only you don't want to run builds for the same project on the same instance if they interfere with each other, but you definitely don't want an instance to be shutdown while a build is still running.
Therefore, we wrap both these lifecycle operations and builds in locks for resource, using Jenkins Lockable Resources plugin. If the periodical stopping task tries to stop an instance where the build is running, it will have to wait to acquire the lock. This ensures that machines that see many builds do not get easily stopped, while other ones that are idle will be stopped at the end of their already paid hour.


Cloud computing is meant to improve the efficiency with which we allocate resources from someone else's data centers: you pay for what you use. Therefore, with a little persistence provided by EBS volumes you can efficiently pay for the hours that your builds require, and not for keeping idle EC2 instances running every day of the year. Of course, you'll need some tweaking of your automation tools to easily start and stop instances; but it is a surefire investment that can usually save more than half of the cost of your testing infrastructure by putting it at rest in weekend and non-working hours.