Sunday, March 09, 2014

The good old TCP/IP stack

There are theoretical models, such as the ISO/OSI one, that cast the Internet into a set of many levels in an attempt at standardization. The Internet protocol suite, also known as TCP/IP, describes instead what actually goes on to show you this blog post. The suite is divided into multiple layers, each building on the previous one and containing several protocols that can in theory be swapped with each other.
I may use some protocol-specific terms, such as frame, by antonomasia, that is, as generic names.

Link layer

The link layer solves the problem: how do I get a frame of bytes from one physical device to another? Consider that network resources, such as physical cables and radio frequencies, may be shared, so collisions are possible. For the same reason, sometimes routing has to be available to identify who I am sending these bytes to; however, this routing is physical, consisting of single point-to-point connections or of network card addresses.
Inside a local network, Ethernet and the wireless IEEE 802.11 standards have the lion's share of the market. Devices are identified by their firmware-based MAC addresses, and the network may contain switches that send the frames travelling through them to the correct recipient.
However, a local network is of limited utility nowadays. To talk with the rest of the world, more complex link layer protocols are needed: they get you from your DSL router to your ISP's routers, maybe even involving multiple hops, such as a section based on copper wires and one on optical fiber.
The link layer is closely coupled to the hardware available: different protocols work on different media such as wires, glass and electromagnetic waves. It is possible in theory to abstract the business logic (say, how to detect a collision) from the medium; however, it's like testing a Repository object by looking at the query that it generates instead of running it against the real database.
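
To make the role of a switch more concrete, here is a minimal sketch of a learning switch in Python. It is a toy model, not real Ethernet: the class name, the port numbers and the MAC addresses are all made up for illustration, and it assumes the common behavior of remembering the port each sender was seen on and flooding frames whose recipient is still unknown.

```python
class LearningSwitch:
    """Toy model of a switch that learns which port each MAC address lives on."""

    def __init__(self, port_count):
        self.ports = list(range(port_count))
        self.mac_table = {}  # MAC address -> port number (hypothetical structure)

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: the sender is reachable through the port the frame came from.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown recipient: flood to every port except the incoming one.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Recipient sits on the same segment: nothing to forward.
            return []
        return [out_port]


switch = LearningSwitch(port_count=4)
first = switch.receive(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
reply = switch.receive(2, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa")
print(first)  # [1, 2, 3] because the recipient is not known yet
print(reply)  # [0] because the first frame taught the switch where aa:... is
```

After the two frames above, both stations are in the table, so further frames between them only travel through ports 0 and 2.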

Internet layer

In the Internet model, machines may have globally-recognizable addresses that have meaning outside their local network. Thanks to these IP addresses and the related protocols, you can solve the problem of getting packets of data from one node in the world to another.
However, these packets have severe limitations:
  • They are of a limited or fixed size, which in practice cannot grow beyond a few thousand bytes due to the packet switching model.
  • No order is guaranteed: packets may take different paths to get to the target host and arrive in any order.
  • Their transmission is best-effort, as there can be arbitrary packet loss.
Inside the global network, all hops at the Internet layer level have an IP address; the source and target IP addresses are written inside each packet so that each intermediate node can route it towards the neighbor that is probably nearest to the target. You can imagine the complexity of constantly updating these routing tables while addresses are (re)assigned every day.
IP (version 4 or 6) is not the only Internet layer protocol. ICMP is one of the other famous ones, used for example by ping and traceroute for troubleshooting.
Finally, note that because the public address space contains only about 4 billion IPs, NAT and other techniques have been developed to provide private address spaces to local networks. This severely breaks the model of globally addressable nodes, as for example nodes inside your home or office network cannot accept incoming connections (without resorting to port forwarding). It is a necessary evil due to the ubiquity of IPv4 and its 32-bit address fields.
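
The ideas in this section can be sketched with Python's standard ipaddress module. The routing table below is entirely hypothetical (prefixes and next-hop names are invented), and the selection rule used is longest-prefix matching, which is one common way for a node to pick the most specific route; real routers are far more elaborate.

```python
import ipaddress

# A 32-bit address field allows this many distinct IPv4 addresses.
TOTAL_IPV4_ADDRESSES = 2 ** 32

# Hypothetical routing table: destination prefix -> name of the next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "isp-gateway",
    ipaddress.ip_network("10.0.0.0/8"): "office-router",
    ipaddress.ip_network("10.1.2.0/24"): "lab-switch",
}


def next_hop(destination):
    """Pick the route with the longest prefix that contains the destination."""
    address = ipaddress.ip_address(destination)
    matching = [network for network in ROUTES if address in network]
    best = max(matching, key=lambda network: network.prefixlen)
    return ROUTES[best]


def is_private(address):
    """True for addresses in the private ranges typically hidden behind NAT."""
    return ipaddress.ip_address(address).is_private


print(TOTAL_IPV4_ADDRESSES)       # 4294967296
print(next_hop("10.1.2.7"))       # lab-switch
print(next_hop("10.9.9.9"))       # office-router
print(is_private("192.168.1.10"))  # True
```

The default route 0.0.0.0/0 matches every address, so a packet for an unknown destination is simply handed to the ISP gateway.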

Transport layer

The Internet layer provides global connectivity, but with the limitations described above. To provide a useful bidirectional communication channel, the Transport layer builds upon the unreliable packets of the Internet layer to provide the illusion of a local IO stream, the same one you would get by reading a file.
Consider for example the Transmission Control Protocol, TCP; it provides:
  • reliable and ordered communication between hosts. Lost packets are retransmitted, and sequence numbers are used to correct out-of-order arrival.
  • multiplexing of several communication channels over the single link between two nodes, via ports. I can connect to the same web server with multiple browsers without the HTML pages and images being returned getting mixed up with each other.
Other protocols such as UDP are not optimized for reliable communication, but for other parameters like latency. What matters is that with a transport layer we can build a remote terminal which is conceptually the same as a local one, sending streams of text and receiving other text back.
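
As a rough illustration of how sequence numbers restore order, here is a small Python sketch. It is not TCP itself: the function name and the (sequence number, payload) tuples are assumptions made for the example, with the sequence number counting the byte offset of each payload in the stream.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) pairs.

    Returns the contiguous stream received so far and the next expected
    sequence number, which is where a retransmission would be needed.
    """
    received = {}
    for seq, payload in segments:
        if payload:
            # Keep the first copy; later duplicates come from retransmissions.
            received.setdefault(seq, payload)

    stream = b""
    expected = 0
    while expected in received:
        payload = received[expected]
        stream += payload
        expected += len(payload)
    return stream, expected


# Segments arriving out of order, with one duplicate.
arrived = [(6, b"world"), (0, b"hello "), (6, b"world")]
print(reassemble(arrived))  # (b'hello world', 11)

# A gap: bytes 2..4 were lost, so the stream stops at offset 2.
print(reassemble([(0, b"he"), (5, b"x")]))  # (b'he', 2)
```

The application reading the stream never sees the reordering or the duplicate; it only sees bytes in order, or waits until the gap is filled.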

Application layer

Once we have transformed the mess of wires and network devices into a universal interface made of text and bytes, it's up to the application to do something useful with it. Protocols at the application layer differ in what they offer to the end user:
  • Identification of nodes with a host name, even if their IP address changes or they are physically moved elsewhere (DNS).
  • A way to read and create hypertext/hypermedia documents and related resources (HTTP).
  • A secure terminal session on a remote machine (SSH).
  • Updates for the local clock of your machine so that it's always correctly set (NTP).
  • Voice and video chat (proprietary protocols usually).
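
To show what "an interface made of text and bytes" means for one of these protocols, here is a hedged Python sketch of the text that an HTTP/1.1 client writes to the stream and of how the first line of a reply can be read back. The helper names are made up for the example, and no network connection is opened.

```python
def build_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Split the first line of a response into version, code and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_request("example.com"))
print(parse_status_line(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
# ('HTTP/1.1', 404, 'Not Found')
```

Everything below this layer only has to deliver those bytes in order; the meaning of "404" belongs entirely to the application.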

Importance

Why is it important to know how the full stack of Internet protocols works?
  • When something breaks or slows down, it helps to identify the level at which the failure is happening and to contact the right person, such as your ISP, a system administrator who has to restart a VPN, or a programmer not returning the correct HTTP response code.
  • Layers are isolated from each other, so you can usually swap implementations inside one layer while keeping a system functional, sometimes sacrificing non-functional requirements such as performance. If your DSL line is down, you can use a mobile broadband Internet key without changing software.
  • Some problems are best solved inside a particular layer: congestion control by the transport layer, routing and visibility at the Internet layer. Why waste energy on segregating responsibilities when there is already a standard division of labor that we cannot change?


Sunday, January 26, 2014

Writing here again

Starting from January 2014 I have stopped writing my standard 2 weekly articles on DZone. You can still find the complete archive of everything I have ever written on DZone (469 articles) on my profile page.

It's a matter of simplification and focus: not having to juggle schedules and tasks from my day job at Onebip and from something else.

Now I am back to writing here without any fixed schedule, which means I should be able to keep up quality without worrying about quantity. The time that goes into 4 small articles can be put into a single one that is 100x more useful.

Of course, the opposite can happen: without a fixed schedule I end up never writing. I hope my commuter's schedule will stimulate me with uninterrupted blocks of time in which to focus.

Sunday, March 03, 2013

If you're a programmer, you don't need Alta Scuola Politecnica

TL;DR: as a software developer, you can successfully ignore Alta Scuola Politecnica and pursue a career on your own so that you can be happy when you get up in the morning.

What's this ASP

The ASP in the title is not some form of programming language such as ASP.NET, but the acronym of Alta Scuola Politecnica, a joint program of the two major technical universities in Italy which attempts to select the top tier of master's students.

In the words of the official brochure:
The ASP cultural project is aimed at complementing the MSc education (120 credits) with 30 additional credits, equally subdivided between courses and projects, so as to expose its students to a multidisciplinary way of managing complex problems and treat them in an innovative way. While the MS studies gives them extensive, deep, high-quality skills in focusing on a specific discipline, ASP studies broaden student's competence through interdisciplinarity and team-working. 
In my words: you work for two years with people from other backgrounds on a project that usually never sees the light of day, unless you count "it works on paper" as a deliverable. I have heard ASP alumni define this activity as "project simulation".

In the other half of ASP, you are sent for three weeks a year to seminars where you can get wasted in the evening and then sleep with your fellows during the day, while economics and innovation concepts are explained compressed into a few hours.

I've had this post in my drafts for months, but never got to speak out loud. Let's dive into it: I hope this can serve as a note to the next computer engineer at PoliMi or PoliTo who will google ASP when he is contacted to enter it. I speak from personal experience (been there, done that) about the computer engineer role in Alta Scuola Politecnica, which for me consisted of using Prezi and Google Docs.

Who pays this?

The funds for paying teachers, trips, hotels, speakers, and so on come partly from the public education budget and partly from sponsorship by private companies.

However, and here's the catch, these companies are known in the Italian communities (communities such as Grusp and XP User Groups, to name a few) as the ones you don't want to work for. Why?

Because they see programmers as a commodity of factory workers more than as capable designers of software systems. Basically, they try to pay as little as possible for as many hours as possible, possibly body renting you to another company. A developer role in these companies can usually be filled by postmen and philosophy graduates who took a two-week Java course.

The companies that you won't see in ASP: Google? No, of course, they're not even in Italy with technical people. Github? Etsy? Connextra? Sourcesense? Thoughtworks? You're dreaming. You will, however, see companies that are the dream job of management engineers, just not of computer engineers who actually like to code.

From inside PoliMi

While I googled for ASP, I found this gem by Dino Mandrioli, former president of the Consiglio Corso di Laurea Ingegneria Informatica. Which means: a computer engineer.

His summary: "yet another piece of paper with some credits on it that we give to people that do not need it, because if someone's really good, he doesn't care about supplements".
The piece was written when ASP was first proposed inside PoliMi.

Courses and projects

These courses always have innovation in the title; they compress lots of concepts into an intensive week, and let you work as a team on presentations.

You know, when I submit a talk to a conference, it is the by-product of several months of my workday activities. Why we are teaching students how to conjure up presentations from nowhere in the space of several hours is a mystery to me, but it may be a sought-after skill.

About the projects, they range from designing satellites to helmets to mobile applications (since it's such a buzzword these days). Keep in mind the design verb: I never got to write code in a year. Imagine going to work and not writing code for a year; what kind of position are you pursuing?
If the answer is "I don't want to write code" and you're not a hardware guy, stop reading now, quit Computer Engineering and pivot to philosophy.

The students

But I've been there; I'm no journalist or teacher, I'm a former ASP student who dropped out after the first year to pursue an interesting job with an Italian company, which lasted after graduation and became my full time occupation.

So here are some anecdotes on what really happens inside Alta Scuola Politecnica, mainly during the 1-week seminars:
  • Everyone singing the Italian national anthem at dinner, because it was the 150th anniversary of Italy's unification. In retrospect, it seems like we were part of Comunione&Liberazione, but with a different religion.
  • Dozens of students late for the morning lesson because they were trashed from the previous night in a disco; an event followed by the mega-punishment of writing an additional paper about the week for everyone that was more than 30' late.
  • An Ericsson speaker asking if anyone had heard of Goldratt, and me being the only one to raise my hand. Really, management courses do not name Goldratt and the Theory of Constraints even for a minute?
  • A high percentage of the students getting food poisoning at the former Olympic villages we were staying in. Twice. Maybe it was the cold mountain air.
  • Projects where there is no code or architecture to write. What contribution can a computer engineer make apart from wasting two years?

Your career and job

If you get to the point of entering ASP, you're likely to be bright enough to excel at practically anything they can make you do: presentations and slides, researching papers or tools. This is a problem because, since you do not fail, you continue to do work which is not meaningful to your career: the projects of the 10-credit Distributed Systems course in Milan alone are worth more for your personal experience than the whole of ASP. If, during a job interview, someone prefers to hear that you are good at teamwork because of ASP rather than about the fully distributed application you wrote in Erlang, I suggest you leave immediately.

Moreover, again, if you are entering ASP, you're bright enough that you will not have problems finding a programming job, even in Italy, even in a crisis. What matters to your happiness and personal growth is which job you choose, and ASP does not serve you well in providing opportunities of which you can say "this company does cool stuff". Forget about Agile, TDD, Domain-Driven Design, ye who enter here.

In fact, there is a large selection bias at work: ASPers are successful because the committee selects people who are already likely to be successful, not because ASP teaches them anything more than the location of discos at Sestriere. That's a spurious correlation between attendance and results.

For example, statements like "graduated ASPers find a job/enter PhD in less than 2 months after graduation" are not a surprise: probably all the people who are selected for ASP but do not enter or drop out still find a job immediately. They're selecting the top 7.5% after all.

This is like the old saying: it's getting into Harvard that matters, not finishing it. Other than Mark Zuckerberg, there are plenty of people that dropped out of Harvard, but in the case of ASP we are not even talking about a degree: it's just an additional program for a master's degree.

You can reply that the connections made in such a college matter, and that the same goes for Alta Scuola Politecnica. I argue that the best connections you can make in Italy, as a computer engineer, are not inside Alta Scuola Politecnica.

Here's who you should get in touch with if you want a job that does not involve Cobol or legacy Java that runs in the cellar of some bank:

When I get inquiries from companies that found me through PoliMi I almost never reply, as it's always the same offer: you are a Junior, come here with an entry level position so that we can sell your hours to some other company that we made a big contract with.

The people from these groups that I got to know in the last years are the real heart of the programmer's community in Italy. They are what helped me select a sane, well-paid, fun job 6 months before graduation. Beat that, ASP.



Saturday, February 09, 2013

PHP Benelux 2013

In January I spoke at PHP Benelux 2013, one of the major PHP conferences in Europe.
You can find an account of what happened there on the Onebip blog (my company, which sponsored my travel).



Sunday, January 27, 2013

Pyramids and cathedrals

Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no structural integrity, but just done by brute force and thousands of slaves. -- Alan Kay
    An analogy to these programs of the sixties is a dog house. If you take any random boards, nail, and hammer; pound them together and you've got a structure that will stay up. You don't have to know anything, except how to pound a nail to do that. Now, somebody could come along and look at this dog house and say, Wow! If we could just expand that by a factor of a hundred we could make ourselves a cathedral. It's about three feet high. That would give us something thirty stories high, and that would be really impressive. We could get a lot of people in there. The carpenters would set to work blowing this thing up by a factor of a hundred. Now, we all know, being engineers and scientists, that when you blow something up by a factor of a hundred, its mass goes up by a factor of a million, and its strength, which is mostly due to cross sections of things, only goes up by a factor of ten thousand. When you blow something up [by] a factor of a hundred, it gets by a factor of hundred weaker in its ability, and in fact, what will happen to this dog house; it would just collapse into a pile of rubble. Then there are two choices you can have when that happens. The most popular one is to say, Well, that was what we were trying to do all along. [Laughter] Put more garbage on it, plaster it over with limestone, and say, Yes, we were really trying to do pyramids, not gothic cathedrals. That, in fact accounts for much of the structure of modern operating systems today. [Laughter and applause]

    Or, you can come up with a new concept, which the people who started getting interested in complex structures many years ago did. They called it architecture. Literally, the designing and building of successful arches. A non-obvious, a non-linear interaction between simple materials to give you non-obvious synergies, and a fast multiplication of materials. It's quite remarkable to people when I tell them that the amount of material in Chartres cathedral, which is an enormous, physical structure, is less than the amount of material that was put into the Parthenon. The reason is that it's almost all air, and almost all glass. Everything is cunningly organized in a beautiful structure to make the whole have much more integrity than any of its parts. That's the other way you can go, and part of the message of OOP was, that, as complexity starts becoming more and more important, architecture's always going to dominate material [...] -- still Alan Kay
    Let's choose between building pyramids or gothic cathedrals, by choosing to model behavior into piles of lines of code or into the web of messages that objects send to each other.

    Sunday, October 28, 2012

    My thesis: linking social network profiles

    Each of us has several accounts on multiple social networks, such as Facebook, Twitter and LinkedIn. But there's currently no deterministic way to find the LinkedIn profile of a Facebook user in an automated way: you have to google the full name of that person and verify the search results by hand.
    So in my thesis I set out to build a solution to this problem based on machine learning (in particular decision trees and support vector machines).

    Here's the abstract:

    Record linkage is a well-known task that attempts to link different representations of the same entity, who happens to be duplicated inside a database; in particular, identity reconciliation is a subfield of record linkage that attempts to connect multiple records belonging to the same person. This work faces the problem in the context of online social networks, with the goal of linking profiles of different online platforms.
    This work evaluates several machine learning techniques where domain-specific distances are employed (e.g. decision trees and support vector machines). In addition, we evaluate the influence of several post-processing techniques such as breakup of large connected components and of users containing conflicting profiles.
    The evaluation has been performed on 2 datasets gathered from Facebook, Twitter and LinkedIn, for a total of 34,000 profiles and 2200 real users having more than one profile in the dataset. Precision and recall are in the range of cross-validated 90% depending on the model used, and decision trees are discovered as the most accurate classifier.
    The full thesis can be downloaded if you're interested in these sorts of things (namely applying machine learning to data coming from social network APIs).





    Sunday, October 14, 2012

    A Computer Engineering degree in 5 minutes

    As you know, I have recently graduated as a Master of Engineering at Politecnico di Milano. I think each course in the program of Politecnico has some underlying principles which remain with you after the exam has been passed and the most technical things have been forgotten and left for documentation to remember.
    Thus, I'll try to synthesize the most important concept I took away from each course. This list may be useful to engineers, students of PoliMi in Como and somewhere else, and just to curious programmers wanting to know what I did for 5 years.

    First year

    Mostly, the courses of the first year are mandatory and involve basic maths and physics which will serve in the next years.
    Linear algebra: algebra is a mature way of dealing with multidimensionality, as you generalize numbers and their multiplication or linear combination with vectors and matrices.
    Analysis 1: an engineer really needs practical math skills, and not mere memorization of proofs.
    Analysis 2: this course should be named Analysis N as you generalize from the 1 input/output variable of Analysis 1 to N independent/dependent variables.
    Electrical engineering: engineers build simplified models to work with reality; in practice you use resistors and capacitors and Kirchhoff's laws, not Maxwell's equations. This course could have been focused on hydraulics and be useful as well.
    Physics 1: Entropy is a nasty thing, and how to find and conserve energy in nature is an issue.
    Physics 2: Maxwell's equations tell you everything you need to know about classical electrodynamics. Preparing for the exam means writing them on a sheet of paper and being able to explain and use them.
    Computer science 1: C is the lowest common denominator between all languages, and may its pointers and arrays be with you, always.
    Computer science 2: a process is really a virtual machine provided for you by the operating system, appreciate that.
    Telecommunication networks: abstraction over abstraction, you can go from varying voltage levels on a wire to transmitting web pages reliably.

    Second year

    In the second year, you get to choose some courses, and to do some practical projects.
    Probability calculus: a mathematical model is built starting from sets and relations/functions.
    Economy: an engineer must know where the money to fund his efforts comes from.
    Differential equations: meteorologists cannot predict weather for more than a limited amount of time due to divergence from initial conditions.
    Automation: feedback systems beat feed-forward ones because they don't need an accurate model in order to work. Agilists, what do you say?
    Electronics 1: according to classical physics, USB keys and SSDs cannot work. Fortunately, USB keys know some quantum mechanics.
    Operations research: you can buy RAM if you have algorithms that consume too much space, but you can't buy time.
    Computer science 3: algorithms are really intertwined with the data structures they work on.
    Software engineering: maintainability is maybe the most important trait of design, and it also involves drawing diagrams, not to avoid coding but to explain your code to other people.
    Software engineering project: communication between teams is typically the hardest problem in software development.
    Statistics and measurement: when you read 3:36:20 PM on your watch, it's actually a range like 3:36:19.5-3:36:20.5; and that interval has a mean and a variance.
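
The watch example from the last item can be worked out in a few lines of Python. This is only a sketch under an assumption the list does not state: that the true time is uniformly distributed over the one-second interval, in which case the mean is the midpoint and the variance is the squared width divided by 12.

```python
def uniform_mean_variance(low, high):
    """Mean and variance of a value uniformly distributed between low and high."""
    width = high - low
    return (low + high) / 2, width ** 2 / 12


# 3:36:20 PM expressed as seconds since midnight.
reading = 15 * 3600 + 36 * 60 + 20

mean, variance = uniform_mean_variance(reading - 0.5, reading + 0.5)
print(mean)             # 56180.0, the reading itself
print(variance)         # about 0.083 seconds squared
print(variance ** 0.5)  # about 0.29 seconds of standard deviation
```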

    Third year

    Now you get to choose more than half the courses, and of course you have to work on a Bachelor's thesis whose workload is that of a course and a half.
    Databases: SQL and the relational model are not going to go away soon, and they're really about sets more than tables.
    Chemistry: the structure of [almost] everything we touch on a daily basis can be explained by protons, neutrons and electrons arranged in different ways. Not philotes, but near enough.
    EM waves and nuclear physics: waves are cool because you can carry information on their properties, such as frequency, amplitude and phase.
    Signals: a transfer function goes a long way, and for the engineer everything is linear, time-invariant and Gaussian unless the contrary is proven.
    Computer installations: you shouldn't really buy servers randomly without doing some math first.
    Theoretical computer science: computers cannot solve every problem, and do not parse HTML with regular expressions.
    Knowledge engineering: stochastic algorithms and neural networks work, give them a chance over pure statistical learning.
    Web technologies: HTTP is the lingua franca you need to speak.
    Web technologies project: (web) frameworks have a steep learning curve.
    Logical networks: sequential and combinatorial are two concepts which appear everywhere.
    Information systems: the time wasted reworking RUP diagrams before changing a single line of code makes you aware of why the Agile manifesto is so popular.
    Thesis work: when on unfamiliar ground like new frameworks, languages and APIs, Test-Driven Development with a very short Red-Green-Refactor cycle will definitely speed you up.

    Fourth year

    This year is full of projects: you're a "graduate" student by now, so you're expected to show originality and autonomy.
    Advanced computer architectures: we may be stuck with the x86 instruction set, but if you try a RISC architecture, optimization can do wonders.
    Advanced database and web technologies: it's not only SQL, and even universities recognize CouchDB and MongoDB now.
    Advanced software engineering: there's so much going on in a project other than code. You don't see this on a small scale, and it doesn't mean that you have to write it all down, but diagrams and documentation have their communication purpose.
    Computer vision: test-driven, object-oriented Matlab is a reality. But math is hard, especially for 3D models, so leverage libraries for complex domains you don't have time to explore by yourself. Oh yeah, and computers can see, but barely.
    Image processing: testing image-related code is not easy, but regression testing it is.
    Model identification: estimating the value of a stochastic process isn't just sampling and taking an average.
    Multimedia information retrieval: Google (and Google Images) work because of math that you must have the courage to study.
    Pattern analysis and machine intelligence: thanks to its theoretical background, studying machine learning gives you an edge over all the programmers that don't know what regression is.
    Performance evaluation of computer systems: utilization and queuing networks are concepts that apply to servers, but to teams as well.
    Workgroup and workflow systems: you may dream of creating the new Tomb Raider, but 90% of the money is in business software.

    Fifth year

    In the last year, you can choose courses from other campuses, and you work on a master's thesis for a third of your time.
    Philosophy of computer science: ethos is an important part of execution. When in Rome, do as the Romans do.
    Network architecture: one day we will all have fiber in our homes and send phone messages over 4G instead of SMS.
    Distributed systems: remote procedure call is not a modern way to build a distributed application: it's fundamentally different from running processes in a single addressing space.
    Game theory: people are rational. Somewhat, if you consider their utility functions.
    Interactive TV: recommendation systems literally print money.
    Pervasive systems: the way to go is smaller computers, which use less power and are not even based on a general purpose CPU.
    Thesis work: scientific work has other priorities with respect to programming; background and validation are key with respect to coding and design.



    Friday, September 28, 2012

    A paper on the philosophy of digital piracy

    During my last course at Politecnico I wrote a paper on the philosophical argument about the novelty of computer technology problems. Plainly speaking: are the ethical problems arising from copyright infringement just a new version of theft, or a new conceptual issue?
    Here is the paper:
    http://bit.ly/S62WtB
    If philosophy is not your thing, it would be boring to read. But it was a good exercise to write.

    Friday, June 29, 2012

    Roundup: the Lean tools series continues

    Here are my posts about the Lean tools for software development proposed by the Poppendiecks and my daily experience with them. I've also published some independent tutorials that may help you in testing legacy code or cooking up some functional JavaScript.

    The Turing test explains the most popular interpretations of the famous test for distinguishing between humans and machines.
    Lean tools: Pull systems is about limiting Work-In-Progress with a counterintuitive inversion.
    Functional JavaScript with Underscore.js explains how to work with this library.
    Lean Tools: Queuing Theory is about a scheduling technique that works on both people and computers.
    Record and replay for testing of legacy PHP applications lets you record HTTP requests for repeating them in tests later on.
    Lean tools: Cost of delay tries to put a price tag on delays in releasing a feature.
    My take on Utility and Strategic software is a little essay on this dichotomy established by Fowler. In short, you want to work on strategic systems.
    Lean Tools: Self-Determination is about eliminating Taylorism and assembly lines from our profession.
    The Duck is a Lie is a critique of duck typing.
    Lean Tools: Motivation is about what gives a team motivation, and it's not money.

    Sunday, May 27, 2012

    Roundup: OOP in PHP

    Here are the slides for my recent presentation at phpDay 2012, What they didn't tell you about object-oriented programming in PHP.


    Here are also the links to the various articles I published in this period on DZone. I'm down to two articles a week for the foreseeable future.
    The standard PHP setup
    Selenium on Android
    Hexagonal architecture in JavaScript
    Lean tools: synchronization
    Why everyone is talking about APIs
    Lean tools: Set-Based Development
    Testing PHP scripts
    Software Metaphors
    MongoDB and Java
    Lean tools: Options thinking
    What is global state?
    Lean Tools: the Last Responsible Moment
    PHP 5.4 by examples
    A crash course for the MongoDB console
    The surgery metaphor
    Lean tools: Making decisions
