Thursday, October 15, 2009

10 things plain text excels in

Plain text is the simplest text format in the world. It is called plain in contrast with sophisticated formats like Doc, Odt, Pdf and so on. I work with this universal format a lot and I want to share some tasks it can be useful for.
Plain text is based on representing characters as a single stream per file, using a byte for every character (or from one to four bytes if you're using a UTF encoding such as UTF-8). There is no font choice in plain text, nor any formatting: the focus is on the content and on its logic. When you open Windows Notepad or vim, you get an example of a plain text editor. Stripping out all the presentation logic is in some cases a good thing, as it simplifies the management of data and content.
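To make the byte counts concrete, here is a small Python sketch that measures how many bytes a few characters take in UTF-8. The sample characters and the helper name utf8_length are my own picks for illustration.

```python
# Sample characters chosen for illustration: ASCII, Latin accent, euro sign, emoji.
samples = ["a", "é", "€", "😀"]

def utf8_length(char: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(char.encode("utf-8"))

for char in samples:
    # Print the code point and its encoded size.
    print(f"U+{ord(char):04X} takes {utf8_length(char)} byte(s)")
```

Plain ASCII characters stay at one byte each, which is why an English text file in UTF-8 is byte-for-byte identical to its ASCII version.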

There is a section in The Pragmatic Programmer: From Journeyman to Master (by Andrew Hunt and David Thomas, who later founded the Pragmatic Bookshelf) titled The Power of Plain Text, where the advantages of text files over binary formats are discussed: plain text does not become obsolete, and one can leverage every kind of existing tool, confident that it can handle plain text. Plain text in UTF-8 will still be readable thirty years from now.
In fact, one of the Unix philosophy pillars is:
Write programs to handle text streams, because that is a universal interface. -- Doug McIlroy
There is no limit to what you can do with data in plain text, because you can chain together hundreds of Unix programs which will work seamlessly. In a Unix system, no configuration files work with a unit smaller than a byte: all directives are kept in plain text files, in a structured but human-editable form.
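The chaining idea can be sketched in Python too: each step takes lines of text and hands lines of text to the next one. The function names grep and count_lines and the sample lines are hypothetical, chosen only to echo what a shell pipeline like grep | sort | uniq -c does.

```python
from collections import Counter
from typing import Iterable, Iterator

def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield only the lines that contain needle, like a very small grep."""
    return (line for line in lines if needle in line)

def count_lines(lines: Iterable[str]) -> Counter:
    """Count identical lines, roughly what `sort | uniq -c` reports."""
    return Counter(line.rstrip("\n") for line in lines)

# Made-up input lines; any iterable of strings works here.
log = ["GET /index", "GET /about", "POST /login", "GET /index"]

# Chain the two steps: the output of one is the input of the next.
counts = count_lines(grep(log, "GET"))
for line, count in counts.most_common():
    print(f"{count}\t{line}")
```

Passing sys.stdin instead of the sample list would let the same functions sit in the middle of a real shell pipe, since a file object is also an iterable of lines.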

As an example, I put together a list of what I am using plain text for. When I really thought about it for the first time, I was impressed and pleased:
  • Todo lists: a list of tasks I have to complete in the near future, divided into Urgent/Important/... sections. Since it is a list, items have a "-" before them and I can indent subpoints by using more than one hyphen, as in "-- subtask" or "--- subsubtask". To mark a completed item and gain confidence, I replace the hyphens with "+", maintaining indentation. Using vim, it is also fairly simple to reorder items or to mark them with a macro.
  • Specific todo lists: I have one todo list for this blog, for example. A general list can grow too large to remain manageable, so it's a good choice to gather Todos for particular projects in their own lists. This is somewhat similar to the Getting Things Done project actions management, but at a simpler level since I do not need a more elaborate one.
  • Source code: this is pretty obvious, but I wanted to point out that source code is usually plain text.
  • Lists of any kind: for example, books I want to read or to find reviews for.
  • Svn diffs and patches: when submitting a patch to a project like Zend Framework or Doctrine, the process involves checking out the Subversion working copy and making the changes needed for addressing a bug or adding a feature. Then, if you do not have commit access, svn diff > myfix.patch saves the changes in a patch you can upload to the bug tracker for evaluation. The patch format builds on plain text, so it's still readable, and before committing on my projects I usually run an svn diff | more to explore the changeset (another example of plain text as a universal interface).
  • Goals: it is mandatory to write down your goals for the short and long term, if you are serious about achieving them. Plain text is a good choice since programs to edit it are available everywhere, even five years from now.
  • Blog posts: when writing a new article, I start with a blank vim screen (maybe I should use a template) and write all the content, the most important part of a post. Formatting and images are inserted while putting the post online and proofreading it, and emphasis on words and phrases can be specified by '' or * marks.
  • Email: text emails are more portable than html ones and can be forwarded and quoted easily.
  • Wiki articles: when I edit a wiki article, not only on Wikipedia but in any wiki, I use wiki formatting, which is a superset of plain text. I have included this usage since wiki formatting is very readable and can be used without a subsequent "real" formatting phase, for instance for lists like my Todo ones.
  • Schedules: I might use Google Calendar in the future, but now that I'm trying out scheduling my working days, a simple text file named 2008-10-15.txt is perfect.
The format for scheduling is straightforward:
08:00   wake up&breakfast
08:15   mail&reader
08:30   nakedphp user stories estimation
Tabulations, even when using spaces instead of \t characters, are very useful to align text and provide spreadsheet-like capabilities. In the schedule case, I only specify the tasks for the next day, so one file is enough.
There are only two problems that can arise with plain text: encodings and newlines. Specifying UTF-8 and the type of newlines (LF, CRLF, or CR) will make your text files universal and consistent. Compare these requirements to the ones for working with docx files.
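Both conventions above, the hyphen-based Todo items and the time-plus-task schedule lines, are simple enough to process with a few lines of Python. This is only a sketch: the function names mark_done and parse_schedule are hypothetical, and it assumes every schedule line holds a time followed by a task.

```python
import re
from typing import List, Tuple

def mark_done(item: str) -> str:
    """Replace the leading hyphens of a Todo item with the same number of '+'."""
    return re.sub(
        r"^(\s*)(-+)",
        lambda m: m.group(1) + "+" * len(m.group(2)),
        item,
    )

def parse_schedule(text: str) -> List[Tuple[str, str]]:
    """Split each 'HH:MM   task' line into a (time, task) pair."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first run of whitespace (spaces or tabs).
        time, task = line.split(None, 1)
        entries.append((time, task.strip()))
    return entries

schedule = """\
08:00   wake up&breakfast
08:15   mail&reader
08:30   nakedphp user stories estimation
"""

print(mark_done("-- subtask"))
for time, task in parse_schedule(schedule):
    print(time, "|", task)
```

Because the split happens on any run of whitespace, the same code works whether the columns are aligned with spaces or with \t characters.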
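As an illustration of how little is needed to settle both issues, here is a minimal Python sketch that converts a file's bytes to UTF-8 with LF newlines. The function name normalize_text is hypothetical, and the source encoding is an assumption that has to be known in advance (Latin-1 is used here only as an example).

```python
def normalize_text(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode raw bytes, convert CRLF and CR newlines to LF, re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    # Replace CRLF first, so a lone CR left over can be handled separately.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")

# A made-up Latin-1 file with Windows newlines.
windows_file = "caf\xe9\r\nth\xe9\r\n".encode("latin-1")
print(normalize_text(windows_file))
```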

While I advocate that web applications are the future choice for many tasks, I have never abandoned plain text. When testing out some new practice like writing goals or maintaining an effective Todo list, I always start from plain text. This way, if it does not work for me or I am not satisfied with the results, I'll simply delete a folder on my PC. No need to register with powerful web applications such as Remember The Milk: I'm sure it works pretty well for TODO lists and it's accessible from every machine connected to the Internet, but I am not ready at the moment. I'm only exploring possibilities, with a next-to-zero cost in time: I only have to open vim or gedit.
Now, before registering with dozens of web services, think about using plain text for your lists, goals, schedules... Often the simplest solution is overlooked.

This is by no means an encouragement to write a book in plain text: use complex formats for complex tasks, because they will pay back their heaviness.


Peter said...

Thanks for the good examples.

There are several lightweight wiki-like text formats that can be converted to HTML/PDF/RTF easily. The most popular and useful is Almost Free Text, others are listed in this article.

Nitpicking: the maximum length of UTF-8 character is 4 bytes, not 6 (characters up to U+10FFFF are allowed).

Anonymous said...

Thanks for this article.

For me the biggest advantage of plain text files is that I can store them easily in a source control system like git.

Btw: for writing a book in plain text have a look at LaTeX.

Giorgio said...

The maximum length of an encoded sequence was originally 6 bytes; after RFC 3629 it was limited to 4. In fact, Unicode characters need at most four-byte sequences. I've corrected the sentence in the post.
I too store my plain text files under Subversion, since it is capable of diffing this format.