Invisible to the eye: 2014

Sunday, September 21, 2014

Microservices are not Jars

The cult of the monolith

I've been building microservices for two years and my main complaint is that they're still not micro- enough. Here's a rebuke of Uncle Bob's recent post Microservices and Jars, which he apparently has written after forming an opinion based on an article in Martin Fowler's bliki:

One of my clients recently told me that they were investigating a micro-service-architecture. My first reaction was: "What's that?" So I did a little research and found Martin Fowler's and James Lewis' writeup on the topic.

"I didn't even know what microservices were up until several days ago. Now I'm ready to pontificate about the topic."

So what is a micro-service? It's just a little stand-alone executable that communicates with other stand-alone executables through some kind of mailbox; like an http socket. Lots of people like to use REST as the message format between them.
Why is this desirable? Two words. Independent Deployability.

Let's ignore the REST as the message format terminology. Only two words? Independent deployability is nice, but I've seen cases where independence is total, and cases where an end-to-end test suite still needs to run including the production version of services A and B and the new version C' that we want to deploy to substitute C.
Other interesting properties of microservices such as scaling them independently come to mind. Or writing them in different languages. Or adapting to Conway's law by aligning teams with microservices for most of their work.

You can fire up your little MS and talk with it via REST. No other part of the system needs to be running. Nobody can change a source file in a different part of the system, and screw your little MS up. Your MS is immune to all the other code out there.
You can test your MS with simple REST commands; and you can mock out other MSs in the system with little dummy MSs that do just what your tests need them to do.
Moreover, you can control the deployment. You don't have to coordinate with some huge deployment effort, and merge deployment commands into nasty deployment scripts. You just fire up your little MS and make sure it keeps running.
You can use your own database. You can use your own webserver. You can use any language you like. You can use any framework you like.
Freedom! Freedom!

<sarcasm> tag needed.

But wait. Why is this better? Are the advantages I just listed absent from a normal Java, or Ruby, or .Net system?

Yes:

existing databases tend to be attractors when new persistence requirements come up. So if I have MySQL up and running in my application and a job that would be a good fit for MongoDB comes up, I'm definitely not going to introduce MongoDB given the infrastructure setup time. I'll just go with the existing infrastructure and create some new tables, perpetuating the growth of the monolith.
Web servers are often tied to languages. If I want to use Node.js it will listen on the port 80 by itself, while PHP is commonly used with Apache, and Java with Tomcat or Jetty.
JARs are a pretty JVM-specific packaging system. I'm definitely not going to put PHP code into JARs.
Frameworks come from the language, and even inside the same language I can have multiple PHP applications where one has a custom user interface and one serves a Angular single-page application.

Also the ones not listed:

It's easier to find out machines which contain bottlenecks and replace them, CPU and IO usage maps directly to applications.
It's easier to get started working as a new developer because you need just a single microservice to run on your machine.
It's easier to throw away one microservice and replace it with a new one doing the same job, but better written.

What about: Independent Deployability?
We have these things called jar files. Or Gems. Or DLLs. Or Shared Libraries. The reason we have these things is so we can have independent deployability.

Replacing single JARs or DLLs seems pretty dangerous to me where there are compile-time and binary dependencies in play. Since Uncle Bob has experience with that, I'm going to trust him to deploy safely this way.

Most people have forgotten this. Most people think that jar files are just convenient little folders that they can jam their classes into any way they see fit. They forget that a jar, or a DLL, or a Gem, or a shared library, is loaded and linked at runtime. Indeed, DLL stands for Dynamically Linked Library.
So if you design your jars well, you can make them just as independently deployable as a MS. Your team can be responsible for your jar file. Your team can deploy your DLL without massive coordination with other teams. Your team can test your GEM by writing unit tests and mocking out all the other Gems that it communicates with. You can write a jar in Java or Scala, or Clojure, or JRuby, or any other JVM compatible language. You can even use your own database and wesbserver if you like.

You can use any language you like as long as you run it on the JVM. Sure there must be people who work on other infrastructure or don't want to run their languages on a compatibly-yet-really-secondary platform? PHP applications? Ruby programmers?

If you'd like proof that jars can be independently deployable, just look at the plugins you use for your editor or IDE. They are deployed entirely independently of their host! And often these plugins are nothing more than simple jar files.
So what have you gained by taking your jar file and putting it behind a socket and communicating with REST?

SOAP is the last acronym where simple was used this way. Look, by generating a WSDL from your objects along with an XSD file that can be used to validate XML messages you can pass requests over HTTP with a Soap-Action header and regenerate Java (or other compatible languages) code from the WSDL...

One thing you lose is time. It takes time to communicate through a socket. It takes time to decode REST messages. And that time means you cannot use micro-services with the impunity of a jar. If I want two jars to get into a rapid chat with each other, I can. But I don't dare do that with a MS because the communication time will kill me.

Of course, chatty fine-grained interfaces are not a microservices trait. I prefer accept a Command, emit Events as an integration style. After all, microservices can become dangerous if integrated with purely synchronous calls so the kind of interfaces they expose to each other is necessarily different from the one of objects that work in the same process. This is a property of every distributed system, as we know from 1996.

On my laptop it takes 50ms to set up a socket connection, and then about 3us per byte transmitted through that connection. And that's all in a single process on a single machine. Imagine the cost when the connection is over the wire!

It takes more to write a file line by line rather than doing it in a single shot. However, if the file is 2GB long, I prefer the first solution in order to preserve memory. I'm just trading off time for another resource.
In the case of microservices, I'm trading off the latency of single interactions between different services for more important resources: programmer time, independent scalability, even time experienced by the end user. A front end asynchronously publishing events to a backend service feels faster to the user than a monolithic application where I respond to user requests and generate report lines in the same process or on the same machines.

Another thing you lose (and I hate to say this) is debuggability. You can't single step across a REST call, but you can single step across jar files. You can't follow a stack trace across a REST call. Exceptions get lost across a REST interface.

To me debuggability and introspection into an application improves when using microservices, because you will be full of all the HTTP logs of every service calling one another. You don't have to predispose logging cut points as they come for free with the HttpChannel objects. For a more business-oriented monitoring, take a look at Domain Events: we publish them from different applications in order to build reports based on data from different components.

After reading this you might think I'm totally against the whole notion of Micro-Services. But, of course, I'm not. I've built applications that way in the past, and I'll likely build them that way in the future. It's just that I don't want to see a big fad tearing through the industry leaving lots of broken systems in it's wake.

For most systems the independent deployability of jar files (or DLLS, or Gems, or Shared Libraries) is more than adequate. For most systems the cost of communicating over sockets using REST is quite restrictive; and will force uncomfortable trade-offs.

Paraphrasing Stroustrup, there are only two kinds of achitectures: the ones people complain about and the ones nobody uses. We are here proposing microservices because they have provided value in many systems that were once thought not to need them. As long as you have reporting needs you don't want to burden your front end with, or need to scale up in the number of users or programmer, you can consider microservices (and their cost).

My advice:

Don't leap into microservices just because it sounds cool. Segregate the system into jars using a plugin architecture first. If that's not sufficient, then consider introducing service boundaries at strategic points.

Please don't! The interaction between microservices are very different from the ones between objects inside a single application. Each call outside of the boundary is a potential failure mode that you should try to model as an asynchronous message that can be retried when delivery fails (the receiving microservice being down, slow or not reachable). Retrofitting microservices over an existing code base is a costly endeauvour and you should only embark on it if you have an adequate time and money budget, possibly bigger than the one necessary to build with microservices in the first place.

Sunday, August 17, 2014

Tabular data in Behat

All of this has happened before, and all this will happen again. -- BSG

I just watched Steve Freeman short talk "Given When Then" considered harmful (requires free login), and I was looking for some ways to cheaply eliminate duplication in Behat scenarios.

Fortunately, Behat supports Scenario Outlines for tabular data which is an 80/20 solution to transform lots of duplicated scenarios:

    Scenario: 3 is Fizz
        Given the input is 3
        When it is converted
        Then it becomes Fizz

    Scenario: 6 is Fizz too because it's multiple of 3
        Given the input is 6
        When it is converted
        Then it becomes Fizz

    Scenario: 2 is itself
        Given the input is 2
        When it is converted
        Then it becomes 2

into a table:

    Scenario Outline: conversion of numbers
        Given the input is <input>
        When it is converted
        Then it becomes <output>

        Examples:
            | input | output |
            | 2     | 2      |
            | 3     | Fizz   |
            | 6     | Fizz   |

Moreover, you can also pass tabular data to a single step with Table Nodes:

    Scenario: two items in the cart
        Given the following items are in the cart:
            | name    | price |
            | Cake    |     4 |
            | Shrimps |    10 |
        When I check out
        Then I pay 14

It takes a few minutes to learn how to do this into an existing Behat infrastructure. There are minimal changes to perform in the FeatureContext in the case of the Table Nodes, while Scenario Outlines are a pure Gherkin-side refactoring.

My code is on Github in my behat-tables-kata repository. If this reminds you of PHPUnit's @dataProvider, try to think of other patterns that can be borrowed from the xUnit world to fast-forward Cucumber and Behat development.

Saturday, August 16, 2014

PHPUnit Essentials review

https://www.packtpub.com/application-development/phpunit-essentials

PHPUnit Essentials by Zdenek Machek is a modern and complete book about PHPUnit usage. I've been sent an electronic copy by Packt Publishing and am now reviewing it here.

The first thing that struck me about the book was the breadth of subjects: you start from mocks and command line options, to get even to Selenium usage. You have to know your tools and given PHPUnit being a standard, this is all knowledge that will accompany you for several years.

Every book on PHPUnit must be compared with the wonderful manual, to see what it adds to the picture with respect to the documentation. PHPUnit Essentials, in this respect, looks also at 3rd party libraries such as mocking libraries or "competitors" such as PHPSpec to enlarge the picture to the whole open source PHP landscape. This is something the documentations of single projects cannot do, and where a bit of opinionated advice can be taken.

There is a bit of what may seem outdated information in the book such as how to perform a PEAR-based installation, but it's identified as such (PEAR being deprecated and dismissed by the end of the year.) Another seemingly outdated tool is Selenium IDE, but once upgraded with a formatter for Selenium2TestCase like explained in this book it becomes usable again. This kind of advice demonstrates the real world experience of the author and makes you trust the content.

On the whole by reading this book you go in as a naive tester and you come out with lots of skills on using PHPUnit in different scenarios; so I would recommended it to programmers wanting to dive into testing PHP applications. Probably it's not worth a read for the medium-to-advanced users, for which most of the content is already known from PHPUnit manual or personal experience. After all the book's named Essentials, so it delivers all that you expect from the title in a convenient single package.

Saturday, July 19, 2014

Skateboards, rockets and math

This slide from Spotify has been popular for a while:

Loving this slide! pic.twitter.com/YTOUhpdUp2

— blat (@ferblape) March 26, 2014

It explains how a product can be built iteratively, satisfying first the need for transport with lesser means and then evolving to a more powerful platform. In this model feedback such as business model validation and satisfaction from the project sponsors can arrive early, even when they're negative (especially so).

From what I read about Spotify, they're also well-aware that incremental development can only take you so far: you don't get a car by making a better bicycle. Sometimes you have to take a leap to a new platform; or if it's clear that simpler technology won't support your vision, start from an higher level of essential complexity.

Here's someone that didn't start from a skateboard:

[PHOTO] Falcon 9 lifts off carrying 6 ORBCOMM satellites to orbit. pic.twitter.com/Os04NVR3oI
— SpaceX (@SpaceX) July 15, 2014

Imagine telling Spotify to install WebSphere (or some other technological terror) as the first step when starting a brand new project; or telling SpaceX teams "Come on, Elon, just give us a bicycle and we'll get some first sales!"

Or telling Larry Page that programming isn't math:

Keeping in mind this strong dependency on context, where do the competitive advantages of your product lie?
In finding a better fit with the needs of users, maybe a lower time to market? In solving technology problems to carry humanity into space at an acceptable cost? In algorithms that can find high quality information in the web ocean? In fooling VCs in giving you free money?

From your vision, your choices of education, process, and technology.

Friday, April 25, 2014

The full list of my articles on DZone

From 2010 to the end of 2013 I have written a few articles each week on DZone. Here is the full list as a reference.

Update: 98% of these articles are serving correctly again (only 10 links are being updated).

Practical PHP Patterns

Practical PHP Refactoring

Practical PHP Testing Patterns

Polls

What paradigm should PHP applications embrace?
Is touch typing mandatory?
Which PHP framework would you use today for a brand new application?
Which browser do you consider the fastest?
What new feature in PHP 5.4 is the most important to you?

Wednesday, April 23, 2014

Integrated tests are not feeling well. Long live design.

Take me down to the big Rails city / where the tests are green and they take 10 minutes

I checked the date this afternoon:
$ date
Wed Apr 23 14:38:08 CEST 2014
and apparently it's 2014, but there's still a widely held belief (in some circles) that integrated (or end-to-end) tests should be favored over unit tests. A belief that Test-Driven Development does not have a beneficial influence on the quality of your tests and code.
So today I'm repeating a few things I have been writing about in the last years.

Don't be too proud of this technological terror you've constructed. The ability to INSERT a row is insignificant next to the power of Domain Models.

A few properties of unit tests

Unit tests target one or a few objects at a time, without accessing different resources than the CPU and memory of the current process. With respect to integrated tests, they are:

Easier to write: their setup phase takes a few lines where Test Doubles are injected in a constructor.
Faster: a 1000-test suite takes seconds to run.
Isolated: they cannot interfere with each other or with the global state of the process.
Repeatable: they always give the same result no matter the initial conditions.
Precise: they tell you which wire does not work instead that the car does not start.

Listen to your tests

How many asserts must a man write down / before you call him a man

That's not to say integrated tests are not useful: I have a talk coming up at phpDay in which I will explain how our Behat test suite work and the optimizations we made to keep it under 5 minutes. Some concerns where integration tests shine are making sure systems built by different teams work together, refactoring legacy code, and acceptance-level tests written in the customer's language.
However, the ratio of unit tests to integrated tests should be in the range of 10 to 1, or even 50 to 1. If there is a force at work in a project that pushes for more integrated tests than unit tests, you are falling into a vicious cycle where instead of writing:
assertEquals("1.00", new Money(100).toString());
you're writing, more often than not:
createSubscription();
assertEquals(
"<span class="money">1.00</span>",
findPriceTag(get("/subscription/" + id))
);
and promoting coupling between the Money, Subscription and PageTemplate objects.
The Listen to the tests principle tell us to take difficult-to-write integrated tests as a smell: a warning that we need to break the dependencies between objects to be able to reuse them, for example in isolated tests (lowering coupling); and move responsibilities around until objects respond to an interface with a small surface area (increasing cohesion).
The benefit of TDD is continuously applying these two forces in your codebase. Renouncing to it while favoring integrated tests is thinking you're able to do the same in your mind, for the rest of the life of your codebase. We test because we don't want to break features, such as being able to perform a payment; but we unit test because we don't want to impact non-functional concerns such as reuse and the ability to change.

Take me to the magic of the moment / on a glory night / where the objects of tomorrow dream away / in the wind of change

Resources

I'll stop short. Here's where you can know more about integrated and unit tests, explained by some of the best people in the field.

The Test Pyramid, codified by Martin Fowler.
Integrated tests are a scam and J.B. Rainsberger's blog. Since following his TDD course last fall I have a better mental model of the test-driven process and its output.
Some examples of listening to the tests to pursue a simpler design.

And finally the style of this post is inspired by Call me maybe: MongoDB.

Invisible to the eye