Sunday, May 22, 2016

Eris 0.8.0 is out

In the period before my move to Cambridge I got some time to work on Eris, and to use it to test the guts of Onebip's infrastructure. Lots of new features are now incorporated in the 0.8.0 version, along with a modernization of PHP standards compliance carried out by @localheinz.

What's new?

Here's the most important news, a selection from the ChangeLog:
  • The bind Generator lets you use the random output of one Generator to build another Generator (see the sketch after this list).
  • Optionally logging generations with `hook(Listener\log($filename))`.
  • disableShrinking() option.
  • limitTo() accepts a DateInterval to stop tests at a predefined maximum time.
  • Configurability of randomness: choice between rand, mt_rand, and a pure PHP Mersenne Twister.
  • The suchThat Generator accepts PHPUnit constraints like `when()`.
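Here is a minimal sketch of how some of these features can be combined in a PHPUnit test; the log path, the vector size, and the assertion are arbitrary choices for illustration, and the composition mirrors the bind example from the manual:

```php
<?php
use Eris\Generator;
use Eris\Listener;
use Eris\TestTrait;

class BindAndLoggingTest extends \PHPUnit_Framework_TestCase
{
    use TestTrait;

    public function testAnElementPickedFromAVectorIsContainedInIt()
    {
        $this
            ->limitTo(new DateInterval('PT15S'))            // stop generating after 15 seconds at most
            ->hook(Listener\log('/tmp/eris-bind-test.log')) // log every generation to a file
            ->forAll(
                // bind: use the randomly generated vector to build a second Generator
                Generator\bind(
                    Generator\vector(4, Generator\nat()),
                    function ($vector) {
                        return Generator\tuple(
                            Generator\elements($vector),
                            Generator\constant($vector)
                        );
                    }
                )
            )
            ->then(function ($tuple) {
                list($element, $vector) = $tuple;
                $this->assertContains($element, $vector);
            });
    }
}
```

Every generated tuple ends up in the log file, and generation stops after at most 15 seconds.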
Some bugs and annoyances were fixed:
  • No warnings on PHP 7 anymore.
  • Fixed a bug where the generation size was not fully explored due to its slow growth.
  • Switched to PSR-2 coding standards and PSR-4 autoloading.

And there were some backward compatibility breaks (we are in 0.x after all):
  • The frequency Generator now only accepts variadic arguments, not an array.
  • Removed the strictlyPos and strictlyNeg Generators as duplicates of the pos and neg ones.
  • Removed the andAlso, theCondition, andTheCondition, implies, and imply aliases, which expanded the surface area of the API for no good reason. Added and for specifying multiple preconditions.
Eris is now quite extensible with custom Generators for new types of data; custom Listeners to know what's going on; and even different sources of randomness to tune repeatability and performance.
I believe the most important part of this release is the technical documentation. This is not a list of APIs generated by parsing the code, but a full manual of Eris features, which will be kept religiously up to date in the repository itself and rebuilt automatically at each commit.

What's next?

My Trello board says:
  • Decoupling from PHPUnit: it should be possible to run Eris with PHPSpec (already possible, but not as robustly as it could be) or in standalone scripts.
  • Multiple possibilities for shrinking, borrowing from test.check rose trees. This feature may speed up the shrinking process and make it totally deterministic.
  • A few more advanced Generators: for example testing Finite State Machines.
If you are using Eris and want to give feedback, feel free to open a GitHub issue to discuss it.

Tuesday, April 19, 2016

Next stop: Cambridge

Last Friday was my last working day at Onebip, the carrier billing payment platform headquartered in Milan. I leave the best technical team I have ever worked with, one that has tackled endless challenges: transitioning to a microservice architecture, adopting CQRS and Event Sourcing, and testing a large product that depends on integration with 400 mobile carriers.

In May, I will start a new adventure as a Software Engineer in Test at eLife. Located in Cambridge, eLife is an open access journal that publishes scientific articles in the fields of biology and medicine, with the goal of improving the peer review process and accelerating science. As a non-profit organization, it's quite a different context from selling products and services, but one with a potentially large and positive impact on the world.

Cambridge is a city of research and technology, welcoming students and scientists, but also software developers like me. Moreover, it's small and peaceful (you can cycle anywhere), while showing peaks of high technical level. It's the first place where I have been to a study group on the book Structure and Interpretation of Computer Programs, or to a quite good introduction to machine learning talk (and not to a "Transpile typed ECMAScript without left-pad nor using arrays because you would need a polyfill for that" talk, or some other hipster hallucination).
See you on the other side of the Channel...

Tuesday, March 15, 2016

When to write setters

I have set out, almost unconsciously, to use constructor injection by default in the last few years while writing object-oriented applications. With Dependency Injection as a given, constructor injection satisfies most of my requirements for building an object graph and dynamically configuring collaborators.

The spectrum

I see the statefulness of an object not as an absolute, but as a position on a spectrum.
At one end of the spectrum we have immutable objects: these objects acquire a configuration in their constructor and are effectively final (to employ a Java-specific term) for the rest of their lifecycles. Their fields are private and there's no way of modifying them outside of the constructor; their collaborators are only scalars or other immutable objects. In Java fields can even be final so that accidental reassignment is regarded as a compile-time error.
The physical state of an object may still change without its external behavior being affected, as in the case of caching. I still consider this kind of object, mutable only from a white-box perspective, as immutable.
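As an illustration, here is a sketch of an object at the immutable end of the spectrum (Money is a hypothetical example):

```php
<?php
final class Money
{
    private $amount;
    private $currency;

    // All configuration is acquired in the constructor and never reassigned
    public function __construct($amount, $currency)
    {
        $this->amount = $amount;
        $this->currency = $currency;
    }

    // "Modification" returns a new instance, leaving this one untouched
    public function add(Money $other)
    {
        return new self($this->amount + $other->amount, $this->currency);
    }
}
```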
Stateful objects instead may have behavior that changes as a result of their own state (or of their collaborators'). You call some Command methods on them, and the responses of further calls to Query methods change. Hopefully the Commands encapsulate some domain logic that constrains the state transitions to valid, modelled ones.
At the extreme end of the spectrum we find setters: methods which only mutate the value of one or more fields, possibly skipping any validation, domain modeling or consistency check. The setters considered here are public methods, because their limited-scope versions do not open the door to the same violations.
If you want to write procedural code, setters proliferate (still, it's probably easier to just use public fields at that point). There are only a few use cases in which I have found setters valid in object-oriented programming, and here is the (short) list.

Configuration which has default values

Classes may have a few configuration values that you are able to tune; especially when there are more than a couple of these parameters, I find it useful to separate hard dependencies, which go in the constructor, from setters that can override the default parameters after the object has been constructed. If you forget to call these methods, the object still has to work correctly.
Alternative solutions for this use case are of course constructors with default parameters, which I definitely prefer if there are not many options to tune (1 or 2). You can also look into Value Objects, which produce a new instance upon reconfiguration and model all of the configuration parameters as a single entity; or into a Builder if you want to invest in an additional class and its API.
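As a sketch (names are hypothetical), the hard dependency goes through the constructor while the tunable parameter keeps a sensible default that a setter can override:

```php
<?php
class PaymentGatewayClient
{
    private $httpClient;          // hard dependency, always required
    private $timeoutSeconds = 30; // sensible default, optionally overridden

    public function __construct(HttpClient $httpClient)
    {
        $this->httpClient = $httpClient;
    }

    // Optional configuration: the object works correctly even if this is never called
    public function setTimeout($seconds)
    {
        $this->timeoutSeconds = $seconds;
    }
}
```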

Adding Observers

Observers (or listeners if you prefer) are collaborators which are notified of internal events happening inside an object that may interest them.
I treat the observers of an object as an append-only list data structure, with an empty list or array as the natural Null Object. The object is initialized without any observer, and a setter like addListener(...) has the more limited capability of adding an observer but not of removing or modifying an existing one.
The nature of the Observer pattern is to invest in a common bus that many observers can be attached to, even if they come from different packages and libraries. Therefore I find it natural to support the dynamic wiring of other objects, even if it makes the object more mutable than before. In these scenarios, the needs of integration become more important than guaranteeing safety of construction.
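A sketch of such an append-only setter, with hypothetical names:

```php
<?php
class JobQueue
{
    private $listeners = []; // the empty array is the natural Null Object

    // Append-only: observers can be added, but never removed or replaced
    public function addListener(callable $listener)
    {
        $this->listeners[] = $listener;
    }

    public function enqueue($job)
    {
        // ...domain logic...
        foreach ($this->listeners as $listener) {
            $listener('job.enqueued', $job);
        }
    }
}
```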

Reconstitution

When objects are unserialized from a cold storage such as a stream of bytes or a JSON object, encapsulation is very likely going to be violated. Object-relational mappers have been doing this for ages by working directly on annotated private fields, with sometimes powerful results but also lots of dangers from storage and code being out of sync.
In the scenarios where you control the reconstitution of objects, such as rebuilding an object from a MongoDB document, it's often easier to provide an explicit API like a setState() method than to rely on the magic of a library which is going to bypass your public methods. To counter possible misuse, you can tag this method as @private (or package protected in Java), or make it very awkward to use outside of the persistence context by requiring a particular data structure to be passed in.
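A sketch of an explicit reconstitution API (hypothetical names), made awkward to call outside of the persistence layer by requiring the whole raw document:

```php
<?php
class Order
{
    private $id;
    private $status;
    private $total;

    /**
     * @private meant to be called only by the repository that rebuilds
     * the object from a MongoDB document
     */
    public function setState(array $document)
    {
        $this->id = $document['_id'];
        $this->status = $document['status'];
        $this->total = $document['total'];
    }
}
```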

Conclusion

There are very few use cases for setters in real object-oriented programming; default to constructor injection and to immutable objects to avoid overcomplicating your design. Employ setters for non-mandatory, cross-cutting initializations so that your code does not have to bend over backwards to support these use cases, while at the same time remaining robust against cowboy modification of internal state.

Monday, March 14, 2016

On property-based testing a highly concurrent job queue

Exploring Eris

Recruiter is a job queue written in PHP, open sourced by Onebip in its 2.x version. It has been used on some inward-facing production services, but with the need to roll it out to more and more projects, I have started a thorough testing campaign to flush out possible concurrency bugs.
The job system is composed of a single Recruiter process and multiple Worker processes (moreover, any other PHP process can enqueue a job). These processes may run on any machine inside a local network and share a MongoDB database where they collaborate to empty the collection of jobs to do. The design of these collections is carefully tuned for scalability; the first version of Recruiter used heavier findAndModify operations, which it is now free of.

Property-based testing

Testing some happy paths such as adding a job and executing it is fine for test-driving the code, but it's nowhere near enough for quality assurance. On systems of any appreciable scale and/or quality, testing is a separate, additional activity (hopefully performed by developers wearing a different hat, or in any case inside a single cross-functional team).
To test highly concurrent processes such as a recruiter and its dozens of workers all contending for the same database, we adopted Eris, the open source PHP QuickCheck implementation developed by me and some colleagues. Eris is able to generate random inputs for the System Under Test according to a specification provided by the tester; it supports property-based testing, which drives the system with this input while checking that important properties are respected.
In this scenario, we generated a random sequence of actions to perform over these processes, checking invariants and post-conditions of the operations. For example, one invariant is that there is never more than one recruiter process alive. There are surprisingly few invariants when you work with distributed systems; as another example, consider the number of workers registered in the related MongoDB collection. This number is not fixed, as crashed processes may still be registered even if dead, as long as the rest of the system hasn't detected the crash yet.
One postcondition of the job system is very important: any job enqueued is eventually executed, preferably as soon as possible. In these tests, we focused on testing the correctness of this property and not the performance. We monitor the collection of archived jobs (which have been executed correctly) and check that it fills up with all the jobs we expect. The timeout after which we declare the test failed is tuned to the total number of actions performed, which is random.
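A sketch of how this postcondition can be checked inside the test (helper and collection names are hypothetical): poll the collection of archived jobs until the expected count is reached, or fail after the timeout:

```php
// Inside the PHPUnit test case that drives the processes:
private function assertAllJobsAreEventuallyArchived($expectedJobs, $timeoutSeconds)
{
    $start = time();
    while (time() - $start < $timeoutSeconds) {
        // jobs that have been executed correctly end up in the archived collection
        if ($this->archivedCollection->count() >= $expectedJobs) {
            return;
        }
        usleep(100 * 1000); // poll every 100 ms
    }
    $this->fail("Not all {$expectedJobs} jobs were archived within {$timeoutSeconds} seconds");
}
```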
There are more advanced approaches, such as generating a sequential prefix plus a few parallel sequences of actions. This could give more control over the process and may enable some form of shrinking with better determinism; however, we retain a notion of parallelism by creating multiple processes. Unfortunately each run is non-deterministic, as processes and the underlying MongoDB instance can be scheduled differently by the operating system, changing the interleaving of their operations; therefore shrinking is not possible, or is possible only at the cost of running shrunk sequences multiple times to reliably mark them as passing.

Iteration: random number of jobs, graceful restarts

In the first version of the test, we generated a random number of identical jobs (executing an "echo 42" command), along with a series of restarts of the recruiter and of a single worker process using SIGTERM. The jobs were enqueued serially by the test process, along with the restart actions. In theory, the processes intercept the signals and exit after having finished their current cycle of polling or execution.
Here are the bugs that we found:

Iteration: multiple workers

Once the test suite was consistently green, we extended the testing environment by allowing multiple workers to be created and correctly restarted.
We found an additional problem with this extension:

Iteration: crashing workers

We added the possibility of killing a worker with SIGKILL, immediately interrupting it even in the middle of database updates.
The possibility of a worker crashing was already covered by the code. However, we tuned the timeout period after which workers are considered dead while inside the test suite; we set it to dozens of seconds instead of half an hour to allow for sane waiting periods in the test process.

Iteration: crashing the recruiter

Killing the single recruiter process was interesting because it usually takes a lock (in the form of a document inside a MongoDB collection with a unique index) to avoid accidental multiple executions. The process correctly waited on the previous lock to expire before restarting, but...

Iteration: length of jobs

We also introduced a random length for enqueued jobs (sleeping from 0 ms to 1000 ms instead of executing a fixed command). At the time of this post we have not found additional bugs at this stage, with the test suite running for several hours, exploring new random sequences of actions.

Final version

The final version of the test composes an Eris Generator that:
  • generates a number of workers to start between 1 and 4.
  • using this number, creates a new Generator that produces a tuple (in this case a pair, which means an array of two elements of disparate types). The tuple contains the number of workers itself and the list of actions.
The list of actions is a sequence of a random number of elements, where each element can in turn be an action representing one of the following (a sketch of this Generator composition follows the list):
  • a job to enqueue with an expected duration of a positive number of milliseconds
  • a graceful restart of one of the workers
  • a graceful restart of the recruiter
  • a kill -9 on one of the worker processes
  • a kill -9 on the recruiter process
  • a sleep of a number of milliseconds between 0 and 1000
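Here is a sketch of how this composition can be expressed with Eris Generators; the action names follow the logs below, while the exact helpers and ranges are an approximation of the real test:

```php
<?php
use Eris\Generator;

// Sketch: bind the generated number of workers into a tuple of
// (number of workers, list of actions), as described above.
$workersAndActions = Generator\bind(
    Generator\choose(1, 4),                        // number of workers to start
    function ($workers) {
        $anAction = Generator\oneOf(
            Generator\map(
                function ($duration) { return ['enqueueJob', $duration]; },
                Generator\nat()                    // expected job duration in milliseconds
            ),
            Generator\map(
                function ($worker) { return ['restartWorkerGracefully', $worker]; },
                Generator\choose(0, $workers - 1)
            ),
            Generator\map(
                function ($worker) { return ['restartWorkerByKilling', $worker]; },
                Generator\choose(0, $workers - 1)
            ),
            Generator\constant('restartRecruiterGracefully'),
            Generator\constant('restartRecruiterByKilling'),
            Generator\map(
                function ($ms) { return ['sleep', $ms]; },
                Generator\choose(0, 1000)          // sleep between 0 and 1000 ms
            )
        );
        return Generator\tuple(
            Generator\constant($workers),          // the number of workers itself
            Generator\seq($anAction)               // the list of actions
        );
    }
);
```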
Here is an example of an action sequence:

[ACTIONS][PHPUNIT][2016-03-14T12:11:14+01:00] ["enqueueJob",8]
[ACTIONS][PHPUNIT][2016-03-14T12:11:14+01:00] "restartRecruiterGracefully"


While here is a moderately complex example:

[ACTIONS][PHPUNIT][2016-03-14T12:11:59+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:11:59+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:11:59+01:00] "restartRecruiterGracefully"
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["enqueueJob",7]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["enqueueJob",13]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] "restartRecruiterByKilling"
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["restartWorkerGracefully",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:00+01:00] "restartRecruiterGracefully"
[ACTIONS][PHPUNIT][2016-03-14T12:12:10+01:00] ["sleep",860]
[ACTIONS][PHPUNIT][2016-03-14T12:12:11+01:00] "restartRecruiterByKilling"
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["enqueueJob",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["restartWorkerByKilling",0]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] ["enqueueJob",5]
[ACTIONS][PHPUNIT][2016-03-14T12:12:12+01:00] "restartRecruiterGracefully"

The parameter in the steps modelled as arrays is the duration of a job, or the number of the worker in case of restarting actions.

The test generates 100 of these sequences (this number is tunable, or can target a time limit). For each of them it creates an empty database, starts the workers and the recruiter, performs the actions and waits for all jobs to be performed. If the timeout for full execution expires, the test is marked as failed and lists the log files to look at to understand what happened. On my machine, the test now terminates in about one hour, with a green bar.

Conclusions

Testing is an important activity that can increase the quality of your software by removing bugs before they reach one of your customers. Testing is becoming more and more incorporated into the daily lives of developers (see Test-Driven Development and Behavior-Driven Development), but for core domains and infrastructure, additional activities are required, such as stress and performance tests comparable to production traffic.
It is impossible, however, to write tests by hand for all the possible situations; but you can easily build a reasonable model of the input to your system. So let me quote John Hughes: "Don't write tests. Generate them." With property-based testing you can write one test containing one property and catch dozens of bugs, as in this post's case study.

Sunday, February 28, 2016

Building an application with a JavaScript-only stack

As I often do when checking out a new platform or language, I have been building a new, pure JavaScript implementation of the Game of Life simulation, as I did for Java 8.
In this case, my choice fell on the MEAN stack:
  • MongoDB: a general purpose document-oriented (and as such NoSQL) database with support for querying and aggregation.
  • Node.js: the famous server-side JavaScript interpreter.
  • Express: a framework for providing REST APIs and web pages on top of Node.js.
  • AngularJS: one of the popular client-side JavaScript frameworks for building Single Page Applications.
The experience has been quite interesting, as you really get to know a language and its libraries when using it for a project, in a way that no book can force you to.

Myths

Here are a series of myths I want to dispel after diving into a full stack JavaScript project for a few weeks.
It is true that there is less of a context switch when changing between the server-side and the client-side applications, since you are always writing the same language. However, this seamless transition is limited by several differences:
  • Different language support: ECMAScript 6 has to be compiled down to ECMAScript 5 by tools like Babel to be compatible with every browser and Node.js version. Polyfills may be needed to try and unify the experience.
  • Different libraries: testing frameworks change between server and client, and so does how you build mocks.
  • Different frameworks: Angular and Express both have their own ways to express controllers and views.
  • Different tools: you install packages for the server side with npm but use Bower on the client.
The key to productivity is being opinionated and choosing (or having someone choose for you) a single tool for each purpose, without being carried away by the latest fashion. In this case I followed some default choices and trimmed them down to get:
  • Mocha and expect(), one of the three flavors of the Chai assertion library, for the server-side.
  • npm and Bower for server-side and client-side packages, respectively.
  • wiredep to generate script and CSS tags for the single page to be loaded.
  • Grunt as a build and automation tool, wrapping everything else.
One sane way to pick up defaults and platform idioms is to start from a predefined stack, and you can do so by cloning a template or using a generator like Yeoman. If the generator is well-factored, it will give you sensible defaults to fill in the gaps, such as JSHint and a configuration for it.
Another myth I would like to dispel is callback hell: if you use the ECMAScript 6 construct yield, you can pretty much write synchronous-looking code by building an iterator of steps (each step producing a promise whose resolution will be passed in as input for the next). There's probably something even more advanced that I haven't reached yet in ECMAScript 7. Don't take this as me saying you can write synchronous code in JavaScript (it only looks synchronous), and you definitely have to learn to use the underlying layers of callbacks and promises well before you can grasp what yield is really doing.

Productivity

Compared with my previous experience in Java, the productivity of the JavaScript stack feels very good in the short term (I only explored that time frame), due to its simple syntax and structures, especially with support for ES6, which removes a lot of boilerplate.
For example, there is no need for Set<Cell> aliveCells = new HashSet<Cell>(); definitions like in Java, as you would write aliveCells = new Set() with purely dynamic typing (suffering the occasional unfortunate consequences of this choice, of course.)
To evaluate productivity and robustness in the long term you would have to build a much larger project, inside a team composed of multiple people.
I found the tight feedback loop of grunt serve to be another positive impact on productivity: you can set up a watch on files so that every time you save, the Node.js server is restarted and the current browser page is reloaded. This is accomplished by LiveReload, monitoring from the browser side with a WebSocket. Of course, once you get the hang of the testing frameworks and their assertions, you're back to the even tighter feedback loop of running tests and having their output in milliseconds.

Conclusions

I'd say you can reach a good productivity in a pure JavaScript environment, even if I am unsure about the pure dynamic typing approach.
You'll have to get opinionated and choose wisely; to not introduce duplicates, and to clearly assign a responsibility to each of these tools so that it's unambiguous that, for example, npm should not be used for client-side modules; to take control of your stack, as everything you install is still only JavaScript code, copied into your project, which you can read to get a feel for what it's doing.
In programming, a feature that seems to consist of just a few lines of code often turns into an engineering project. In very dynamic and immature stacks such as JavaScript, it's even more important to build strong foundations, tools and processes to turn a blob of code into well-factored software.

Sunday, February 14, 2016

Hello world in a production environment

A Hello World is a simple program whose only job is to print "Hello, World" on some form of output. The goal of a Hello World is to teach programmers the syntax of a new language, but also to check that the infrastructure for compiling, interpreting or running the code is set up correctly.

Hello world in testing

The same concept can be used in the context of writing automated tests when you set up the simplest possible unit test:
When running this test with JUnit, you are validating that the environment is able to:
  • retrieve a dependency such as the JUnit JARs and their own transitive dependencies
  • build your Java code by compiling the test and its imports
  • execute the .class files with the correct classpath.
During a workshop that involves Test-Driven Development, I typically require everyone to have this simple test set up in their programming environment the day before; this practice avoids having to put together an environment under time pressure. Especially when trying out new languages or frameworks, getting to this starting point can easily waste a lot of otherwise productive time.

In production

This Hello World pattern may not be limited to a development environment, as apparently simple things can get you a lot of mileage when deployed in production. Here are some examples.
The first API I implement in every microservice accessible through HTTP is the /ping API. It returns a 200 OK response with a content type of text/plain, containing just the text pong. Thanks to this API I can set up the first acceptance test for the project, running it in multiple environments such as CI and staging, and getting it to production where I will be able to call this API with curl and check its correct deployment on the whole server fleet.
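A minimal sketch of such an endpoint in plain PHP (framework details left out):

```php
<?php
// ping.php: respond with 200 OK, text/plain and the body "pong",
// so a deployment can be smoke-tested with: curl -i http://the-service/ping
header('Content-Type: text/plain');
http_response_code(200);
echo 'pong';
```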
I once set up a HelloCommand instead, a simple binary able to write a single log line with a custom text to the centralized log server. By deploying this to all environments we were able to test what happens when the target log server is unreachable or slow, checking that these slowdowns are not propagated to log clients. We could also insert a local proxy, buffering logs before sending them through TCP, and manually check the whole path from generating a log to its final collection.
One of the next things I want to add to the infrastructure is a sample Hello World microservice itself, consisting of its own source code repository, testing and deployment pipeline and monitoring.

What do we get out of Hello World

In the Hello World microservice case, having a template to clone greatly lowers the marginal cost of a new microservice, because the new project can easily be duplicated from a minimal definition. Cloning another existing service is a dangerous operation (we all know the problems of copy-and-paste) as you have to distinguish between what is common infrastructure and what is service-specific code that should not be ported to its siblings. By observing the minimal working template, you will also be able to contain the duplication between services as boilerplate will be highly visible:
How is it possible that we need a 200-line build file for a Hello World service?!
What's highly visible can be refactored. Given, however, that your template is not a code generator but a sample service continuously deployed in production, there will be a tight refactoring feedback loop between making a change to deployment or monitoring infrastructure common to all services and validating it in the production environment. What you definitely want to avoid is trying to reduce duplication between the builds of different services, only to find out that your extracted method breaks production deployments the next day, when you're on holiday; or optimizing one particular project's build only to discover that the improvements cannot be ported to other services.
Moreover, there is also knowledge sharing in play: an explicitly tested and running Hello World can be picked up by other team members very quickly to create a new API, a feature, or even a service. With the concept of living documentation, we prefer executable specifications like unit tests and Gherkin scenarios over complex documents; in the same way, we should prefer living and tested software to be used as a template rather than technical documentation which could be outdated two weeks after it has been written.

References

The Ginger Cake pattern by Dan North is a version of Hello World that starts from concrete, complex instances to be cloned. I am wary of leaving too much stuff lying around after having duplicated the cake, so I prefer to start from the simplest possible example. The benefits of this choice are higher if the number of instances to be created is large, so it takes some tuning to recognize recurring technical tasks.
Nat Pryce's hypothesis on TDD at the system scale pushes for immediately considering monitoring and system management in the APIs the system should provide. Hello World examples are one of the lightweight tools that can show you whether underlying resources such as CPUs and databases are available and whether the application as a whole is working correctly, while at the same time isolating you from the complexity of real features and the myriad ways in which they can fail even on a robust application layer.
Walking skeletons are a tiny implementation of the system that performs a small end-to-end function; they should put together all key architectural components to validate that they can be integrated and can run in the target environment. Here I'm arguing for sample tasks that can be helpful in developing similar instances, at a much smaller scale, including for example single HTTP APIs.

Tuesday, February 02, 2016

Book review: Java Puzzlers

Java Puzzlers is a nice book on the corner cases of the Java language, taken as an excuse to explain its inner workings. I read it over the holidays and found it a nice refresher on the perils of some of Java's features and types.

What you'll learn

A first example of a feature under scrutiny is the casting of primitive types to each other (byte, char, int), with the related overflows and unintended losses of precision. Getting into one of these conditions can result in an infinite loop, so it's definitely something you want to be able to debug in case it happens.
There are more esoteric corner cases in the book, such as the cases in which i == i returns false, or the problem of initializations of classes and objects not happening in the order you expect.

Method

I don't want to spoil the book, since it's based on trying to run the examples yourself and proposing an explanation for the observed behavior. I'd say this scientific method is very effective in keeping the reader engaged, to the point that I finished it in a few days.
To run the puzzles, you work on self-contained .java files provided by the book's website; you can actually compile and run them from the terminal, as there are no external dependencies.
Incidentally, this isolation also means you're working on the language and in some cases on the standard library; the knowledge you acquire won't be lost after the next release of an MVC framework.
Again, work at the PC and not only on a printed book; otherwise, you won't be able to experiment and you will be forced to read the solutions instead of trying to solve the puzzles yourself.

Argh, Java!

It's easy to fall into another pitfall: bashing the Java language and its libraries. The authors, however, make clear that the goal of the book is just to explain the corner cases of the platform so that programmers can be more productive. Java has been a tremendously successful platform all over the world, the strange behaviors shown here notwithstanding.
Thus instead of saying "This language sucks" you can actually think "If this happens in my production code, I will know what to do." The biggest lesson of the book is to keep the code simple, consistent and readable so that there are no ticking time bombs or hidden, open manholes in a dark and stormy night.

Some sample quotes

Clearly, the author of this program didn’t think about the order in which the initialization of the Cache class would take place. Unable to decide between eager and lazy initialization, the author tried to do both, resulting in a big mess. Use either eager initialization or lazy initialization, never both.
the lesson of Puzzle 6: If you can’t tell what a program does by looking at it, it probably doesn’t do what you want. Strive for clarity.
Whenever the libraries provide a method that does what you need, use it [EJ Item 30]. Generally speaking, the libraries provides high-quality solutions requiring a minimum of effort

Conclusions

Some of these pitfalls may be found through testing; some will only be found when the code is heavily exercised in a production environment; some may remain traps never triggered. Yet, having this knowledge will make you able to understand what's happening in those cases instead of just looking for an unpleasant workaround. This is also a recreational book for programmers, so if you're working with Java get to know it with an easy and fun read.

Monday, January 25, 2016

Book review: Java Concurrency In Practice

I just finished reading the monumental book Java Concurrency in Practice, the definitive guide to writing concurrent programs in Java, by Brian Goetz et al. This book gathers lots of information in a single, easy-to-find place, so I'll delve immediately into describing what you can learn from it.

A small distributed system

On modern processor architectures, multithreading and concurrency have in general become a small distributed system inside a motherboard, spanning the centimeters that separate the CPU cores and the RAM.
In fact, you can see many parallels between the two fields: CPU cores are like different machines, and coordinating between them is relatively more costly than allowing independent executions. The L1, L2 and L3 caches near the CPU cores behave as replicas, showing tunable consistency models and forcing compilers to introduce synchronization where needed.
Moreover, partial failure is always around the corner, as threads run independently. Forcing programmers to deal with possible failure is one of the few usages of checked exceptions that I find not only acceptable but also desirable. I tend not to like checked exceptions too much, as they tend to be replicated in too many places in the code, creating coupling. Still, they make forgetting about a possible thread interruption harder, and they also push for isolating the concurrent code from the domain models it uses underneath, to avoid throws clause contamination.

Relevant JVM topics

The book is rife with Java Virtual Machine concurrency concepts, building a pattern language for talking about thread safety and performance (which are the goals we pursue with concurrent applications). Java's model is based on multithreading and shared memory, where the JVM's threads are mapped 1:1 onto OS threads:
  • Thread safety is based on confinement, atomicity, and visibility. These are not generic terms but are really concrete, explained with many code samples.
  • Publication and synchronization make threads communicate, and immutable objects help keep the collaboration simple. Immutability is not just a conceptual suggestion, because the JVM actually behaves differently when final fields are in place.
  • Every concept boils down to an explanation built over the underlying Java Memory Model, a specification that JVMs have to respect when implementing primitive operations.

Libraries

Basic concepts are necessary for understanding what's going on in your VM, but they are an insufficient level of abstraction for productive work. For this reason, the book explains the usage of several standard libraries:
  • Synchronized data structures and their higher-performance concurrent counterparts. ConcurrentHashMap is a work of art, as it implements lock striping to avoid coordination when accessing different buckets of the map.
  • The Executor framework provides thread pools, futures, task cancellation and clean shutdown. Creating threads by hand is a beginner's solution.
  • Middleware such as latches, semaphores, and barriers to coordinate threads and stop them from clashing with each other without having to manually write synchronized blocks all over the place.
Thus part of the book emphasizes using the best tools available in Java SE instead of reinventing the wheel with Object.wait() and Object.notifyAll(), which are nevertheless explained thoroughly in the advanced chapters. Reinventing the wheel can be an error-prone task that produces inferior results, and it should not be the only option just because it's the only approach you know.

Be aware that...

The book is updated to Java 6 (it's missing the Fork/Join framework for example), but fortunately this version contains much of what you need on the theory and basic libraries. You will still need to integrate this knowledge with Java 8 parallel streams.
It takes focus to get through this book, and I spent several dozen hours reading the 16 chapters.
The annotations (such as @GuardedBy) won't compile if you don't download a separate package; it's too bad they're not a standard, since the authors are luminaries of the Java concurrency field, experts from many JSR groups and Java programming language authors.
As always with very technical books, I suggest reading it at a PC, with your preferred IDE and JUnit open to write tests and experiment with what you are learning. You will probably need some review of the most difficult topics, just to hear them explained by different people. Stack Overflow and many blog articles will be your friends as you look for examples of unsafe publication or of the Java Memory Model.

Conclusions

I'm a fan of getting to the bottom of how things work (and don't). I would definitely recommend this book if you are executing your code in multiple threads, as sooner or later you will be bitten without even understanding what went wrong. Even if you're just writing a Servlet, that code could become a target for concurrency.
Moreover, as with distributed systems, in concurrency simple testing is not enough: problems can be hard to find and combinatorially difficult to reproduce. You need theory, code review, and static analysis: this book is one of the tools that can help you avoid pesky bugs and much wasted time.

Sunday, January 10, 2016

Book review: Thinking in Java

I recently read the 1000-page tome Thinking in Java, written by Bruce Eckel, with the goal of getting my feet wet in the parts of the language that were still obscure to me. Here is my review of the book, containing its strong and weak points.

Basic topics

This is a book that touches on every basic aspect of the Java language, keeping an introductory level throughout but delving into deep usage of the Java standard libraries when treating a specific subject.
The basic concepts are well covered by the first 200 pages:
  • primitive values, classes and objects, control structures, and operators.
  • Access control: private, package, protected, public for classes, methods and fields.
  • Constructors and garbage collection.
  • Polymorphism and interfaces that enable it.
Most of the basic topics are oriented to programmers coming from a C procedural background, so they don't dwell on syntax but instead focus on the semantics and the JVM memory model.
Even if you come from a modern dynamic language, you will still find this first part useful to intimately understand how the language works. You will learn common idioms and patterns such as delegating to parent methods, getting a feel for Java code instead of trying to write Ruby code in a Java environment.

Wide coverage

The larger part of the book, instead, will be useful to cover new ground, if your knowledge is lacking in some areas or if you want a complete understanding of them. For example, I have a good knowledge of data structures such as ArrayList, HashMap and HashSet; still, the 90 pages on the Java collections framework introduced me to structures such as PriorityQueue, whose usage is infrequent but which can be very useful when you encounter a problem that calls for them.
Here is a full list of the specific topics treated in the book:
  • Inner static and non-static classes.
  • Exceptions management with try/catch/finally blocks.
  • Generics and all their advanced cases, including examples such as class SelfBounded<T extends SelfBounded<T>>. I thought I knew generics until I read this chapter.
  • The peculiarities of arrays, still present in some parts of the language such as method calls with a variable number of arguments.
  • The Java collections framework, much more than List, Set and Map; lots of different implementations and a conceptual map of all the interfaces and classes.
  • Input/Output at the byte and text level, plus part of the java.nio evolution.
  • Enumerated types.
  • Reflection and annotations (definition and usage).
By the way, a few of the chapters can safely be skipped to make the book shorter. Drop the graphical user interfaces chapter as totally outdated: I don't even write user interfaces without HTML anymore nowadays, and probably the most popular GUI framework right now is the Android platform rather than what's described here.
I also suggest skipping the concurrency and threading chapter, since such a small treatment cannot do justice to the topic. I would prefer a dedicated introduction and then moving on to a more advanced book like Java Concurrency in Practice, which will also tell you what not to do instead of showing all the language features.
On this point, I find the writing of Bruce Eckel conservative, showing caution with advanced and obscure features rather than showing off with the risk of writing unmaintainable code down the line. The point is making you able to read complex Java code, not enabling you to write a mess more quickly.

Style

The book is quite lengthy, but it lets you select a subset of the chapters pretty well if you need to dig into a particular topic. The text is driven by code samples and by experimenting instead of reciting theory.
I suggest reading a digital version with your IDE ready: at least in my case, I found it easier to pick up concepts and get involved by writing my own examples. A 1000-page book would be pretty daunting if read on a device with no interaction, as it's the polar opposite of dense.
I created many test cases like this one, which led me to verify the assumed behavior of Java libraries and features with my own hands:

Currency

The drawback of this book is that it's not up to date with the current version of the platform, Java 8. The most recent version is the 4th edition, available on Amazon since 2006, which covers every feature up to Java 5.
You will have to piece together knowledge of Java 8 and the intermediate versions from other sources. I would have expected this book to at least be up-to-date with Java 7 due to its popularity.
However, due to Java's backward compatibility, what you read is still correct: I only found one code sample with a compilation problem. I wonder how long this book would be if it were edited again to include Java 8: it could probably get to 1500 pages or more and implode under its own weight.

Conclusions

If you work with Java, Thinking in Java is a must-read, either to get a quick introduction to the basic features or to delve into one of the specific areas when you need it. You will probably never be surprised by reading Java syntax or idioms again. However, don't expect a complete coverage of such a huge world: this should be your first Java book, not the last.

Monday, January 04, 2016

PHPUnit_Selenium 2.0.0 is out

Here is the text of the change I have just merged to make a new major version of PHPUnit_Selenium a reality:
As signaled in #351, there are incompatibilities between the current version of PHPUnit_Selenium and PHPUnit 5.
It is a losing proposition to keep supporting the Selenium 1 API (SeleniumTestCase), as it redefines methods whose signatures have even changed. It has not been maintained for years.
So to support PHPUnit 5.x there will be a new major version of this project, 2.x. The old 1.x branch will remain available but not updated anymore.
2.x will contain:
  • Selenium2TestCase
and work with PHPUnit 4.x or 5.x, with corresponding PHP versions.
1.x will contain:
  • SeleniumTestCase
  • Selenium2TestCase
but will only work with PHPUnit 4.x, with corresponding PHP versions. In general, it will not be updated anymore.
Supported PHP versions vary from 5.3 (!) to 5.6, according to PHPUnit's version requirements.
Installation is available through Composer, as before.
