Signs of an ill-factored system

Dmitry Chestnykh has collected implementations of the echo command from UNIX V5, OpenBSD, Plan 9, FreeBSD, and GNU coreutils.[1]

| Implementation | Lines of code |
|----------------|---------------|
| Plan 9         | 40            |
| GNU coreutils  | 257           |

From this you might conclude that the GNU implementation is simply bloated. But what’s really going on is (a) feature creep and (b) the inclusion of documentation within the source code of the program itself. UNIX has evolved from commands with no built-in documentation and a separate manpage toward commands that have both a detailed separate manpage and quick-reference documentation built into the command itself.

The feature creep leads to a perceived need for the quick-reference documentation. The source to the UNIX V5 implementation is its own reference card! There’d be nothing to write in a usage message.
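To make the "its own reference card" point concrete, here is a minimal sketch of an echo in that spirit. It is not the UNIX V5 source; the function name `echo_args` and the `FILE *` parameter are assumptions made here so the behaviour is easy to exercise.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from any of the implementations above.
   Prints argv[1..argc-1] separated by single spaces, then a newline.
   No options, no escapes: the whole interface is visible in the loop.
   With no arguments it prints nothing at all. */
int echo_args(int argc, char **argv, FILE *out)
{
    for (int i = 1; i < argc; i++) {
        fputs(argv[i], out);
        /* A space between words, a newline after the last one. */
        fputc(i < argc - 1 ? ' ' : '\n', out);
    }
    return 0;
}
```

A `main` that simply returns `echo_args(argc, argv, stdout)` would turn this into a complete command. With so little behaviour, a usage message would have nothing to add.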

Both the feature creep and the awkwardly placed documentation are signs of an ill-factored system. A better factoring would keep commands single-purpose and as simple as possible, with documentation managed consistently across the whole system.

| Attribute | UNIX V5 | Modern Unix | Smalltalk |
|-----------|---------|-------------|-----------|
| Command (method) source code | Short, single-purpose | Long, multi-purpose | Short, single-purpose |
| Command documentation | Manpage only | Manpage, info file, built-in to program, web site | Method comment only |
| How are new variants on commands added? | Shell scripts | Add the feature to the existing command | Add new methods and refactor common implementation |

I think UNIX V5 and Smalltalk are much better-factored than modern Unixes. Factoring out common functionality and assembling the pieces using scripting is the way things are done in both cases. Smalltalk has an advantage in that the scripting language is the systems programming language, but the core philosophies of the two systems have a lot in common.

Removing the inline quick-reference documentation from the GNU program lops 60 lines off the total. The remainder of the growth can be attributed to the baked-in extra features (GNU supports multiple command-line syntaxes, where UNIX V5 supports… nothing) and to premature[2] optimisation (Plan 9 and FreeBSD avoid multiple writes, where UNIX V5 goes for the simplest thing that could possibly work, TSTTCPW).
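For a sense of what avoiding multiple writes costs in code, here is a rough sketch of a gather-write approach using POSIX `writev`. It is an illustration only, not the Plan 9 or FreeBSD source; the function name `echo_writev` and the file-descriptor parameter are assumptions, and it ignores partial writes and the system's `IOV_MAX` limit on the number of segments.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: emit argv[1..argc-1], space-separated and
   newline-terminated, using one writev call instead of one write
   per word. Returns the number of bytes written, or -1 on error. */
ssize_t echo_writev(int argc, char **argv, int fd)
{
    static char space[] = " ";
    static char newline[] = "\n";

    if (argc < 2)
        return 0;                       /* nothing to print */

    int nwords = argc - 1;
    int niov = 2 * nwords;              /* each word plus its separator */
    struct iovec *iov = calloc((size_t)niov, sizeof *iov);
    if (iov == NULL)
        return -1;

    for (int i = 0; i < nwords; i++) {
        iov[2 * i].iov_base = argv[i + 1];
        iov[2 * i].iov_len = strlen(argv[i + 1]);
        /* Space after every word except the last, which gets a newline. */
        iov[2 * i + 1].iov_base = (i == nwords - 1) ? newline : space;
        iov[2 * i + 1].iov_len = 1;
    }

    /* Sketch only: a short write or more than IOV_MAX segments is not handled. */
    ssize_t n = writev(fd, iov, niov);
    free(iov);
    return n;
}
```

Even in sketch form this is several times longer than a plain loop over the arguments, with allocation and error paths that the simplest version never needs.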

To my mind, either the UNIX V5 or the OpenBSD implementation is perfectly acceptable. The remainder are signs of a sick ecosystem: there’s nothing wrong with them intrinsically, but seen as part of a larger whole, things start to look unhealthy.

  1. HT @silentbicycle

  2. That’s a little unfair. At the time, such optimisations were crucial to get a running system. Things are very different these days: JIT compilation, flexible kernel/userland boundaries, and quite simply the incredible raw speed of modern machines combine to make the difference between many printfs and a single writev irrelevant.