On Unix composability

Unix provides fairly simple means of composing programs, and over time I appreciate this feature more and more. There are some caveats, however. This article is my personal collection of noteworthy examples, ending with a discussion.

Somewhat related is the video C was already sharp. Of particular interest is the notion that C does in fact have a garbage collector, and that is Unix--so long as you make your programs small and short-lived, preferably using the shell to glue them together. One thing could be added, though: early Unix also used statically allocated buffers a lot more.

json-rpc-shell

This shell of mine for networked APIs features an internal command line. At some point, I had to deal with the problem of paging large responses, where a ridiculously easy solution offered itself: imitate a Bourne shell, look for a pipe (|) character on the input line, and redirect all output into popen().

json-rpc> ping | less
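
The mechanism fits in a few lines. What follows is a minimal sketch in Python rather than C, for brevity; split_pipeline, run_line and the respond callback are names made up for illustration, and none of this is taken from json-rpc-shell itself.

```python
import subprocess


def split_pipeline(line):
    """Split 'request | shell command' at the first pipe character."""
    request, found, command = line.partition("|")
    return request.strip(), (command.strip() if found else None)


def run_line(line, respond, capture=False):
    """Answer one input line, optionally piping the response to a command.

    respond is a hypothetical callback mapping a request to response text.
    """
    request, command = split_pipeline(line)
    response = respond(request)
    if command is None:
        return response
    # shell=True passes the command to /bin/sh, much like popen() does.
    # Without capture, the command inherits the terminal, as a pager needs.
    done = subprocess.run(command, shell=True, input=response, text=True,
                          stdout=subprocess.PIPE if capture else None)
    return done.stdout if capture else None
```

Everything after the first pipe character goes to the shell verbatim, so an input such as "ping | jq . | less" needs no extra parsing on the client's side.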

Later, when I finally bothered to learn about jq, I was happy to discover how these two programs can be simply joined together, with no additional effort:

json-rpc> ping | jq ascii_upcase
"PONG"

Language servers are another recent example; they also make use of composition, although they require considerably more integration work.

xC

I’ve made the terrible decision to build an entire IRC client on top of GNU Readline, and had to find a means for viewing the buffer scrollback. Again, what better to use than the familiar less? The only annoyance here is that the IRC protocol logic needs to keep running in the background--I’ll address this further on.

Once this infrastructure was in place, it was straightforward to also support launching an external editor program (vim) to make it more comfortable to edit multi-line messages.
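
The editor round trip itself can be sketched as below, under a few assumptions: edit_text is a hypothetical helper, the editor is taken from the EDITOR environment variable with vi as an arbitrary fallback, and the call simply blocks. A real client such as xC has to keep its protocol logic running in the meantime, which this sketch doesn’t attempt.

```python
import os
import shlex
import subprocess
import tempfile


def edit_text(initial, fallback="vi"):
    """Let the user edit text in $EDITOR and return the result."""
    # EDITOR may carry arguments, so split it the way a shell would.
    editor = shlex.split(os.environ.get("EDITOR") or fallback)
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(initial)
        # Blocks until the editor exits; the editor inherits the terminal.
        subprocess.run(editor + [path], check=True)
        # Reopen by name: editors may replace the file rather than rewrite it.
        with open(path) as f:
            return f.read()
    finally:
        os.unlink(path)
```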

sdn

It all started with the desire to turn Midnight Commander inside-out--why run another shell within the file manager, when you can control the shell you’re launched from? As a bonus, when I use F3/F4 on files, I get working job control, and I’m free to suspend them!

Anyway, both programs are fine examples of rather advanced composition, whether it is the sub/supershell integration, or external viewers/editors.

As has become tradition, I simply run less to show the help view of my file manager, and it’s much less work than reimplementing it badly in curses.

IRC bot plugins

When I wrote my VitaminA IRC bot in GNU AWK, it was straightforward to implement plugins as coprocesses, due to elegant support from the interpreter: the |& operator acts as a bi-directional popen(). This enabled me to implement each plugin in a different scripting language (and with TCC, C may also become one), so long as I could write an IRC parser for it, IRC being the obvious choice of IPC protocol.

Later I carried this functionality over to my follow-up bot written in C. However, the language overhead, coupled with my desire for fully asynchronous processing, made it a lot more complicated. It appears some synchronicity here and there may not hurt.

sdtui

At some point I needed to convert a dictionary to another textual format. This can be a tedious task requiring fast iteration, so I didn’t want to do it in C, which is what the “library” handling the file format is written in. But I knew Perl. And so I made a small tool to just filter all entries through an easily modifiable external script.

wmstatus

Similarly to the case before, there were parts of my dwm/i3/sway status bar that I didn’t want to hardcode in C, but would rather enjoy being able to outsource to a scripting language--such as weather forecast retrieval, as it depends on some random XML spewed out by an Internet service.

With tiling window managers, it is generally fairly common to delegate functionality, typically global key bindings, to a plethora of external programs, e.g. pactl for volume control, or my display control utilities.

luit

Let’s move on from my own projects to other people’s work.

xterm is now Unicode-native, and supports legacy encodings via a filter program called luit, which wraps other terminal applications, and transparently re-encodes all text coming in and out. What’s nice is that luit can provide this support to any other terminal!

aerc

This e-mail client goes beyond anything that I’ve done, and includes an entire terminal emulator (sadly, it’s also miserably slow over SSH). So it simply launches man to give you a tutorial, while making it obvious that it hasn’t gone anywhere.

aerc tutorial

A more “composed” approach would, of course, be to run within and control a custom tmux session, though going the way of aerc offers interesting integration opportunities, such as the interface for composing messages, which sports a convenient header above an editor.

Annoyances of coprocesses

While launching a process and simply feeding or consuming data is easy with the popen() library function, reading and writing from and to the same subprocess already poses a challenge: the two programs can exhaust I/O pipe buffers and deadlock, waiting for each other’s pipes to be read from, so that they can finish writing. Then you need to buffer your writes in userspace, and process data asynchronously--this is a lot of code, in addition to having to do the whole pipe-fork-exec dance. Or, if you happen to be on Linux, and can confidently set an upper limit on your buffer that fits within /proc/sys/fs/pipe-max-size, you may succeed with merely changing the size of the kernel buffer using F_SETPIPE_SZ, and/or prevent deadlocks with non-blocking writes on at least one side.
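
To give an idea of the asynchronous variant, here is a minimal sketch in Python (rather than C, for brevity) that keeps the unwritten data in a userspace buffer and only writes or reads when the respective pipe is ready. filter_through is a hypothetical name; Python’s own Popen.communicate() does essentially the same job and is what you would normally reach for.

```python
import os
import selectors
import subprocess


def filter_through(argv, data):
    """Feed data to a child's stdin while collecting its stdout."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    os.set_blocking(proc.stdin.fileno(), False)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdin, selectors.EVENT_WRITE)
    sel.register(proc.stdout, selectors.EVENT_READ)

    pending = memoryview(data)  # our userspace write buffer
    chunks = []
    while sel.get_map():
        for key, _ in sel.select():
            if key.fileobj is proc.stdin:
                try:
                    pending = pending[os.write(key.fd, pending):]
                except BlockingIOError:
                    continue  # pipe full after all; wait for the next event
                except BrokenPipeError:
                    pending = pending[:0]  # the child stopped reading
                if not pending:
                    sel.unregister(proc.stdin)
                    proc.stdin.close()  # signals EOF to the child
            else:
                chunk = os.read(key.fd, 65536)
                if chunk:
                    chunks.append(chunk)
                else:  # EOF: the child closed its output
                    sel.unregister(proc.stdout)
                    proc.stdout.close()
    proc.wait()
    return b"".join(chunks)
```

Pushing a megabyte through cat this way completes, whereas writing it all first and reading afterwards would stall as soon as both pipe buffers fill up.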

Tricks

As a side note, when you have a very large number of coprocesses, or otherwise run up against the limit on file descriptors, it can be wasteful to create separate pipes for both input and output. Luckily, there’s socketpair(AF_UNIX) that always creates bi-directional sockets, which can be used in almost the same way as pipes, but you only keep around one file descriptor per process. Moreover, if you control both sides, you can stream datagrams, rather than having to figure out message boundaries manually, though some limits apply here.
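
A minimal sketch of both ideas, again in Python, with made-up helper names: the child gets one end of the socket pair as both its standard input and output, and a second function merely shows that datagram sockets preserve message boundaries. It assumes the child program doesn’t mind talking to a socket instead of a pipe, and it sends little enough data for the deadlock discussed above not to matter.

```python
import socket
import subprocess


def spawn_on_socketpair(argv):
    """Start a child whose stdin and stdout are one end of a socket pair."""
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    proc = subprocess.Popen(argv, stdin=theirs, stdout=theirs)
    theirs.close()  # the child holds its own copies now
    return proc, ours


def round_trip(argv, data):
    """Send data to the child and read its whole reply over one descriptor."""
    proc, sock = spawn_on_socketpair(argv)
    with sock:
        sock.sendall(data)
        sock.shutdown(socket.SHUT_WR)  # EOF for the child; we can still read
        chunks = []
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    proc.wait()
    return b"".join(chunks)


def datagram_demo():
    """Each send() on a datagram pair arrives as exactly one recv()."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        a.send(b"first")
        a.send(b"second message")
        return [b.recv(4096), b.recv(4096)]
```

Note the shutdown() call: with a single descriptor, closing it would also give up on reading the reply, so half-closing takes the place of closing the write pipe.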

Another related and useful trick for output-only subprocesses is to share a single datagram socket pair between all of their output streams, and to have them prefix their messages with their PID. Admittedly, this is a bit far from the territory of simple composition.
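
As a rough illustration of this trick, the sketch below starts a few children that share one end of a datagram socket pair as their standard output. The children here are stand-in Python one-liners that cooperate by emitting a single datagram of the form "PID text"; run_tagged and CHILD are hypothetical names, and the message format is merely an assumption.

```python
import socket
import subprocess
import sys

# Stand-in child: emits one datagram of the form b"<pid> <text>".
CHILD = ("import os, sys; "
         "os.write(1, b'%d %s' % (os.getpid(), sys.argv[1].encode()))")


def run_tagged(messages):
    """Run one child per message; map each child's PID to what it sent."""
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with ours:
        with theirs:
            procs = [subprocess.Popen([sys.executable, "-c", CHILD, text],
                                      stdout=theirs)
                     for text in messages]
        expected = {proc.pid: text for proc, text in zip(procs, messages)}
        received = {}
        for _ in procs:
            # One recv() yields one whole message, whoever sent it.
            pid, _sep, text = ours.recv(65536).partition(b" ")
            received[int(pid)] = text.decode()
        for proc in procs:
            proc.wait()
    return expected, received
```

Datagram sockets don’t report end of file, so this sketch relies on knowing how many messages to expect; a long-running parent would rather learn about exits from child termination.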

Annoyances of composition in general

Long-lived asynchronously running slaves, such as the scrollback display in xC, require a bit of management. When they exit, get killed, or get stopped, you’ll often want to handle the resulting signal--which also constitutes a lot of code. Luckily, if the program is being continuously read from or written to, it may suffice to simply handle failed reads (unexpected EOF) and writes (SIGPIPE or EPIPE). Otherwise, Linux’s signalfd can help in reducing some of the signalling boilerplate.
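
The cheap approach of detecting a dead child through a failed write can be sketched as follows. This is Python, where the interpreter ignores SIGPIPE by default, so the failure surfaces as EPIPE in the form of a BrokenPipeError; in C, you would ignore the signal yourself and check errno. spawn and feed are hypothetical helper names.

```python
import subprocess


def spawn(argv):
    # bufsize=0 gives an unbuffered pipe, so write errors surface at once.
    return subprocess.Popen(argv, stdin=subprocess.PIPE, bufsize=0)


def feed(proc, data):
    """Write to the child; return False once it has gone away."""
    try:
        proc.stdin.write(data)  # may be a partial write for large data
        return True
    except BrokenPipeError:
        # The reader is gone: clean up, and reap it to avoid a zombie.
        proc.stdin.close()
        proc.wait()
        return False
```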

Improper FILE stream buffering can bite you in the arse easily--remember to flush your output if you want your data to appear at the other end immediately, or disable this feature altogether.
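
For illustration, here is a line-based exchange with a coprocess, sketched in Python, whose buffered file objects behave much like FILE streams in this respect. start_coprocess and ask are made-up names, and the sketch assumes that the other side answers each line with exactly one line and doesn’t sit on its own output either (cat qualifies, many filters don’t once their output is a pipe).

```python
import subprocess


def start_coprocess(argv):
    # text=True wraps both pipes in buffered text streams, much like FILE.
    return subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)


def ask(proc, line):
    """Send one line to the coprocess and wait for one line back."""
    proc.stdin.write(line + "\n")
    proc.stdin.flush()  # without this, the line may sit in our buffer
    return proc.stdout.readline().rstrip("\n")
```

Leaving out the flush() makes the readline() wait forever: the request never leaves the process, and the coprocess has nothing to respond to.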

Don’t even get me started on terminals, process groups, and sessions.

Overall, it appears the intended language for Unix program composition, the path of the lowest friction, is the shell, and not C. Even popen() launches /bin/sh on its argument.

The inherent trade-off

Part of the simplicity and power of Unix is that everything is a file^Wstream of bytes, most commonly some form of text. It has as much meaning as you’re willing to give it, and if you want to transform SGML using regular expressions--that’s fine. It’s crude and convenient. About the only kind of structure that is provided are directories, and the process hierarchy.

My friend has written a lengthy article bemoaning this.

Another friend of mine has been working on his relational pipes project, desperately trying to shove all the perceivably missing complexity “back” into Unix--a Babel fish for pipes of sorts. Arguably, his life would be much easier if he only switched to PowerShell.

Comparison with libraries

Libraries can only be used with a given programming language, or need to be wrapped (even extern "C" is a wrap), which is typically fairly difficult to do, particularly when you try to cross boundaries between high-level languages. In contrast, setting up and launching a process is universally straightforward, as well as more machine-independent, due to serialisation being commonly done in a textual format.

On the other hand, there can be considerable overhead in the pipe-fork-exec-write-read dance, if it’s something to be done repeatedly. You can also go the way of coprocesses, which require establishing a protocol--such is the case of LSP.

Plenty of software can be used both as a library and as a separate process.

The other competing approach is daemons on various local and remote buses. But for the purpose of this discussion, they’re similar to launching a process, which may be exactly what’s going on if the far side uses inetd, CGI, or systemd socket activation. The only major difference lies in the lack of sideband data, such as environment variables and program arguments, and maybe a decrease in reliability. Since the advent of getaddrinfo(), connecting to the outside world doesn’t even involve much code.
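
To back up that last claim, the usual getaddrinfo() loop looks roughly like this in Python, which exposes the same call. connect_to is a hypothetical name, and the standard library’s socket.create_connection() already wraps this pattern, so treat it as a sketch of the idea rather than something to copy.

```python
import socket


def connect_to(host, port):
    """Try each address that getaddrinfo() offers until one connects."""
    last_error = None
    for family, kind, proto, _name, address in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, kind, proto)
        try:
            sock.connect(address)
            return sock
        except OSError as error:
            last_error = error
            sock.close()
    raise last_error or OSError("no usable address")
```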

The disappointment of GUIs

The integration of less and vim appeared above many times. However, when you move to graphical interfaces, those aren’t very composable at all--the only methods in wider use are through splitting out a library, wiring it in as a component, or plainly running a program with certain parameters, potentially feeding it commands from outside--the mutual connection isn’t apparent to the user. Even TUIs are lacking, as one would often want to employ a terminal subemulator, which is a substantial amount of code, and brings its own set of problems with it--yet it’s at least workable, and universal.

XEmbed can kind of work--e.g., GVIM can be integrated using GtkPlug/GtkSocket with --socketid {id}--but this needs to be supported explicitly in the client. Moreover, the protocol specification mentions several reasons why you might not want to use it.

In theory, GVIM’s --echo-wid could also be used to simply XReparentWindow() like a window manager would, without considering the XEmbed protocol at all, but you’ll run into multiple problems concerning input, and the window will reparent very visibly.

Wayland has an unstable (as is typical) protocol called XDG foreign. I can’t trash-talk Wayland to the extent it deserves yet.

The counterpoint

It could be argued that unlike the terminal, GUI toolkits, which is what you have to use to retain sanity, come with extensive libraries, and already cover the need for less (GtkTextView) and vim (GtkSourceView), but that’s not the point. Do you like to be constrained to use a particular means of text display or editing, rather than being able to just set PAGER and EDITOR to whatever you prefer? Composition is the first step to substitution.

What I would like to see

To make this clear, my concern has everything to do with subwindowing and redirection. As far as I understand, Plan 9 makes this achievable if you launch a program with a specially crafted /dev namespace, and reimplement a subset of rio, yet even there, it’s not something you should be doing.

I haven’t got very far with my analysis of what a graphical terminal enabling this concept would do, but from my experience with widget toolkits, there shouldn’t be any major barriers, so long as the programs are reasonably simple.

At least I’m not the only crazy one, wanting to reinvent wheels. In general, letoram’s work is a good note to end this article on. He’s all in on composition! Maybe he’ll come up with something accessible to mere mortals eventually.

Follow-up

There is some discussion on lobste.rs. Mr Oil Shell’s subsequent article has a few interesting comments about the trade-off of byte streams. And in general his obsession seems to bear more fruit than I was willing to acknowledge, especially the massive interlinking.