Promote openness: Custom applications and standardized formats


The real difference, in technical terms, between the philosophies of open and closed tools is probably simpler than you think:

Closed systems prefer standardized applications that use custom data formats.

Open systems prefer custom applications that use standardized data formats.

Standardized data formats


Data formats, in this context, are more than just ODF and MS-OOXML. They are data files, databases, and protocols. Standardization is not just a stamp of approval from some committee; if that were all that was necessary, MS-OOXML would be more “standardized” than the file format used in the shopping list you edit in Microsoft Notepad (or Vim or GNU Emacs or nano or TextMate or SciTE or whatever else you use for plain text files). Real standardization is the broad technical, legal, and least common denominator (LCD) interoperability and compatibility of the file. It is accessibility by everyone and anyone who wants access.

This is why plain text is still among the most-favored content types for open source software. It is why system logs on common Unix-like platforms can be parsed using basic text processing utilities like grep, rather than requiring specialized log viewer GUIs to perform even the simplest one-off operations. It is also why more complex operations are so easily scripted in languages like Perl, obviating tedious repetition of button-click recipes every time the same common tasks need to be performed.
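As a rough illustration of why plain text logs are so easy to work with, the sketch below does a grep-style filter and a small tally in a few lines of Python. The sample log lines and the `grep_lines` and `failed_logins_by_address` helpers are invented for this example; real syslog layouts vary from system to system.

```python
import re
from collections import Counter

# Hypothetical sample log lines; real syslog output varies by system.
SAMPLE_LOG = """\
Jan 12 03:14:07 host sshd[411]: Failed password for root from 203.0.113.5 port 2201 ssh2
Jan 12 03:14:09 host sshd[411]: Failed password for admin from 203.0.113.5 port 2202 ssh2
Jan 12 03:15:30 host cron[522]: (root) CMD (run-parts /etc/cron.hourly)
Jan 12 03:16:02 host sshd[430]: Accepted publickey for alice from 198.51.100.7 port 5022 ssh2
Jan 12 03:17:45 host sshd[431]: Failed password for root from 198.51.100.9 port 4410 ssh2
"""


def grep_lines(text, pattern):
    """Return the lines of text that match the regular expression pattern."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


def failed_logins_by_address(text):
    """Count 'Failed password' lines per source address."""
    counts = Counter()
    for line in grep_lines(text, r"Failed password"):
        match = re.search(r"from (\S+)", line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for address, count in failed_logins_by_address(SAMPLE_LOG).most_common():
        print(address, count)
```

Nothing here depends on a special log viewer: the same lines could just as easily be fed to grep, awk, or Perl, which is the whole point of keeping the format simple.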

Plain text formats are not the only broadly standardized formats, however. Consider the pcap format used by tcpdump for traffic analysis or for filtering PF firewall logs. It is highly likely that more open source traffic analysis tools use the pcap format than any other single format, except perhaps plain text itself. The tcpdump development team made this widespread compatibility possible and easy by abstracting its data format handling out of the tool, and encapsulating it in the libpcap library, then providing it to the world under a copyfree license.
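To give a sense of how approachable a well-documented binary format can be, here is a minimal sketch that reads the 24-byte global header of a classic pcap file using nothing but the Python standard library. It assumes the original microsecond-resolution variant of the format (magic number 0xa1b2c3d4) and ignores the nanosecond and pcapng variants; the `parse_pcap_header` function name is made up for this example, and real tools would normally go through libpcap itself.

```python
import struct

# Classic pcap global header: magic, version major/minor, timezone offset,
# timestamp accuracy, snapshot length, link-layer type (24 bytes total).
PCAP_MAGIC = 0xA1B2C3D4
HEADER_FIELDS = "IHHiIII"


def parse_pcap_header(data):
    """Parse the 24-byte global header of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for a pcap global header")
    # The magic number reveals the byte order the file was written in.
    for byte_order in ("<", ">"):
        fields = struct.unpack(byte_order + HEADER_FIELDS, data[:24])
        if fields[0] == PCAP_MAGIC:
            break
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    _, major, minor, _, _, snaplen, linktype = fields
    return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}


if __name__ == "__main__":
    # Build a header in memory rather than reading a real capture file.
    sample = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    print(parse_pcap_header(sample))
```

A format that can be read this way by an independent program, without the original tool, is standardized in the sense that matters here.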

Unnecessary complexity is anathema. Simplicity enhances compatibility, reusability, robust operation, reimplementation, and comprehensibility. By contrast, closed source software developers design their data formats with the intention of vendor lock-in. Complexity is to their benefit. It prevents potential competitors (even noncommercial competitors) from easily making compatible software; it makes old files more prone to obsolescence when applications (and data formats) are updated; it contributes to buggy software behavior due to errors in implementation; and it encourages differing interpretations, so that the same data format definition can in practice produce multiple, mutually incompatible data formats. This is all to the benefit of vendor lock-in.

Relative simplicity, permissive licensing, and aiming for the least common denominator are the principles of effective data format standardization.

Custom applications


In most cases, applications — whether we are talking about clients, servers, or stand-alone tools — at their most fundamental serve as our interfaces to data formats. This means that anyone who looks at the proliferation of Linux distributions available to the world and says “Linux needs to offer just one or two options to succeed!” is missing the point.

One of the joys of using the vast majority of open source operating systems is that we get to choose the interfaces that best suit our personal preferences. The X Window System is the clearest example of this. It is a single tool that is the general “standard” for desktop GUI management, using its own distinct protocol, so at first glance it seems to fit the “standardized application” mold. That is only the case, however, because the X Window System itself is designed to let us pick and choose what we want it to do. As a result, we get to use KDE, GNOME, XFCE, Enlightenment, WindowMaker, Fluxbox, IceWM, FVWM, TWM, xmonad, AHWM, wmii, Ratpoison, Compiz, Sawfish, evilwm, Golem, ScrotWM. . . .

The list goes on. Even the X Window System itself, also generally known as X11, is not a single “standardized” application. The heart of the X Window System is the X11 protocol. Most Linux distributions and other open source Unix-like operating systems use an implementation of that protocol known as X.Org, or more succinctly Xorg, but that was not always the case. Before Xorg, XFree86 reigned supreme, and it was only some concerns regarding licensing that caused the split. Apple MacOS X has its own variation of the X Window System, and several MS Windows variants exist as well, including Cygwin/X, Mocha X Server, WeirdX, Xmanager, and Xming. All of this is made possible by the simple fact that X is fundamentally a protocol, and not a piece of software, per se. The pieces of software are the implementations of the X11 protocol and the window managers that make use of that protocol.

HTTP is a similar case of a standardized data format being fronted by custom applications as our interfaces to the data format. The most recognizable HTTP client applications are the Web browsers we all use, including Internet Explorer, Firefox, Safari, Chromium, Opera, Flock, Netscape (yes, it’s still out there), Camino, Konqueror, Lunascape, SeaMonkey, uzbl, and scads of others.
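Because HTTP/1.x messages are structured text, a “custom application” that speaks the protocol can be very small indeed. The sketch below builds a minimal GET request and parses a response without touching the network; the helper names and the canned response are assumptions made for illustration, and the parser skips plenty that a real client must handle (chunked bodies, repeated headers, and so on).

```python
# A canned response stands in for a real server so the example runs offline.
CANNED_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"hello\n"
)


def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP/1.x response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers, body


if __name__ == "__main__":
    print(build_get_request("example.org").decode("ascii"))
    print(parse_response(CANNED_RESPONSE))
```

Every browser in the list above is, at bottom, a far more elaborate interface to this same exchange.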

Anyone who uses one of the top tier open source browsers like Chromium and Firefox on MS Windows is sure to appreciate the benefits of the “custom applications” approach; otherwise, they, like the majority, would surely be stuck using Internet Explorer. Any complaint about too many options being available for some open source software distributions, such as Linux-based operating systems, should hopefully be quickly silenced by the realization that it is the freedom to create infinitely variable offshoots that provides us with any options at all. Those of us who make use of less popular browsers like uzbl, Amaya, or W3m probably already appreciate the benefits of diversity.

Open source, closed feel


Even open source software applications that try to be all things to all people at least partially violate this unspoken (and not widely recognized) tenet of open software design principles. They set out to essentially swallow up a user’s entire existence and dictate how tasks will be accomplished, providing built-in tools for accomplishing those tasks to further insulate the user against the influence of outside alternatives. In order to support all these bits of functionality, all these features, such software incorporates more and more in the codebase, whether by way of libraries or tightly integrated features of a single executable binary.

The entire “office suite” category of software tends to fall into this trap. Each individual office suite ends up being a gigantic tarpit from which office workers find themselves decreasingly able to extricate themselves even temporarily. They are increasingly mired in the Office Suite Way To Do It. To get a text editor from an office suite, you also have to install sixteen metric tons of other tools and features that are indivisible from your text editor. Ask yourself how often you sit down to create a “document” of some sort and actually need more than simple, plain text editing capabilities. Would Gedit, Notepad, TextEdit, or Xedit serve the same purpose much of the time?

There are dozens of easily available, free (for use, or even open source “Free Software”) plain text editors offering varying levels of complexity and functionality to suit any user’s needs on a wide range of platforms. They are much lighter weight than MS Word or LibreOffice Writer, and are not tied to the underlying codebase for spreadsheets, presentations, and databases. They do not come with table and image inclusion baggage. They carry no requirement for the JVM, or for extension and scripting languages the vast majority of users never touch, and they do not default to opaque file formats that are in many cases effectively specific to that piece of software. When all you need to edit is text, all you need is a plain text editor.

This does not mean that when you need more than simple text you should force yourself into the plain text mold. It does, however, mean that if you do not need more than text, you probably should consider using something that is specifically designed for just text. Get into the habit of using the lightest weight tool that will serve all your needs for the task at hand, and you may ultimately find yourself running a system that does not even have an office suite on it — at least in theory. That might be an attractive alternative for some of us to the error made by many people every day: making a shopping list or similarly simple, brainless plain text only file that contains 222 bytes of actual content, but measures a gigantic 150KB (or worse, a dozen megabytes, especially if you use MS Word’s bullet lists) on disk. In a plain text file, 222 bytes of content ends up taking up 222 bytes on disk.
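The claim that plain text has no overhead is easy to check. The following sketch writes a hypothetical shopping list to a temporary file and compares the number of content bytes with the file size the operating system reports; the list contents and the `plain_text_size_on_disk` helper are invented for this example, and the size reported is the file's length, not whatever block allocation the filesystem happens to use underneath.

```python
import os
import tempfile

# Hypothetical shopping list; any plain text content works the same way.
SHOPPING_LIST = "milk\neggs\nbread\ncoffee\n"


def plain_text_size_on_disk(text):
    """Write text to a temporary file and return (content bytes, file size)."""
    payload = text.encode("utf-8")
    # The temporary file is opened in binary mode, so bytes are written as-is.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(payload)
        path = handle.name
    try:
        return len(payload), os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    content_bytes, file_bytes = plain_text_size_on_disk(SHOPPING_LIST)
    print(content_bytes, file_bytes)
```

The two numbers match: the file is the content and nothing else.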

Take that 150KB file and try opening it in a competing word processor. If it is just a simple shopping list, your chances of successfully opening the file without anything being broken are reasonably high — as long as you chose a competing word processor whose developers went to some pains to provide file format compatibility. The more complexity you introduce, though, the more likely it is to develop problems. Did you use nested bullet lists with custom identifiers (lower case Roman numerals with colons instead of periods slotted in under plain bullet points using stars instead of dots)? Do you really need megabytes of file format that will not even display properly in a different application just to track a todo list, especially when there is still under a single kilobyte of actual content?

That lack of portability is really the biggest problem with applications that have a closed feel, even when they are distributed under an open source license. They end up feeling very similar to using a closed source application because of the sense of being locked into using a single application if you want to still be able to work with the same files. They take a “standardized application, custom data format” approach to how you work, and to those who have learned the joys of customizing their computing environments, it can start to feel like a straitjacket.

If you write files in a truly open format — such as plain text — using any of the myriad of alternative applications that can deal effectively in that format, you maximize the portability of the document. Everyone from an MS-DOS edlin user all the way up to someone who uses MS Word or QuarkXPress can access the contents of your document. It can even be loaded up in a browser window for perusal, with consummate ease.

The key here is not that minimal tools and minimal formats are always the best. Sometimes, more robust file formats that offer more options for formatting and content management are definitely necessary, or even just preferable. This is why Web browsers use HTML, rather than simply being network-attached text editors, after all. The key is that choosing the least common denominator, a standardized format, allows those who want to access the same data format in a different application to do so. Vendor lock-in is vanquished by the simple act of selecting data formats that have been effectively standardized in an open form, so that they can be used with custom applications, with the interfaces that the recipients want to use.

Seeing an open source application achieve immense, standardizing popularity by standing on the back of what amounts to a custom format produces a feeling very similar to that of seeing a closed source application do the same thing. It can be a very saddening experience.

As open source software users, developers, and proponents, it is on us to encourage that feeling of openness that helps make our favorite software model valuable. We can do that by encouraging people to use custom applications with standardized data formats.