On the road to pure Go X11 GUIs

Why C? It's a clean, rather simple language that I ‘mastered’ at one point, and it acts as a common denominator to important libraries, no bullshit in between. As of late I've also warmed up somewhat to C++, mainly its latest versions, but despite some of its very welcome syntax sugar it keeps repelling me with things like prolonged build times and the often-backwards STL.

Why not a GUI toolkit? I do have a few GTK+ projects behind me and the experience has been largely positive, yet the amount of boilerplate one needs to write in C, the state of Vala, the direction the toolkit's been taking ever since GTK+ 3 and last but not least its bureaucratic license made me look for alternatives. There aren't many. Although I am a long-time user of Qt Creator, the complexity and C++-driven ugliness of the Qt framework put me off and one doesn't win on the bureaucracy front there either. Tk desperately needs a rewrite as it hasn't weathered the past decade very well, and even then it forces me to manipulate it through an interpreter. Given the choices, terminals seemed like an acceptable way out.

While I've had some successes with my approach, it has been very laborious. I treat code like something that needs to look about as good as the product, and spend a lot of time yak shaving. Perhaps if I were super-intelligent or had stellar willpower, it could even work! Alas, it looks like I need to eliminate some of the self-imposed busywork.

And so I've placed a bet on Go. It is just as conceptually simple as C, sports a friendly BSD-style license, and already has its own parallel ecosystem. No stinky LLVM, in fact no traces of C at all! It's an overlooked revolution. I can follow symbols through packages however deep I want to and I always end up in Go or its assembly. Well, so long as nothing ugly uses Cgo.

Right, now that I've embraced the garbage collector, how do I make an interface that doesn't look like it dates back to the '80s? And can I avoid Cgo?

A survey of toolkits

First, people have naturally written bindings for Qt, GTK+, libui, and also ~~webshit~~Electron. But that is impure and not at all what I'm looking for, as the introduction should have made clear.

None of these were very compelling. Yes, there are some options but they're impractical, and exp/shiny with its broad scope and large dependency tree, together with the number of unresolved problems, intimidates me too much to try making changes to it, let alone upstreaming them. All the more so since I'd be alone in the mess.

So let's build one

I kind of wanted to, anyway. I've studied enough of Win32, GTK+, Tk, Borland Turbo Vision and a few others to understand the concepts and the general problem space. Surely it can't take much effort to roll my own, right on top of the display server. Now which one of those to pick?

Wayland

I keep thinking about Wayland. It's really a double-edged sword. Aside from my pet peeve that it blocks by design, for inane security reasons, my favourite feature of tdv, which is X11 selection stealing*, it also offers no mode that would work over a network: to transfer picture data, for example, you can only use EGL or shared memory. Of those, EGL even requires Mesa and thus Cgo. Although there's been one forgotten attempt at coercing it to work, no matter how, and more recently a quirky proxy, in practice no one is pushing for this and you'd need to build another server on top like devdraw. Or just use XWayland.

All that being said, I've been lucky enough to find an interface library that goes as far as to provide an example image viewer application.

* 2020 update: it seems that this might work in some compositors now, see wl-clipboard and wlr-data-control v2, though this extension is currently unstable.

The X Window System

During its 30+ year long history, many words have been written about X11. Much of that is now obsolete, despite often being very well written. Some of the better newer resources are:

An important aspect of X is that it's ubiquitous and widely supported, having been ported to all major operating systems.

As far as Go libraries go (no pun intended), we are limited to just BurntSushi/xgb: The X protocol Go language binding, analogous to XCB. On native Linux basically everything points here. Unlike with xgbutil, BurntSushi seems to at least pretend to pay attention to issues. Still, there are a lot of things yet to do:

Ultimately, since I can't go wrong with X and I can go wrong with Wayland, the choice was easy.

Drawing in X

First things first, I need to draw pictures on the screen. There are several options to pick from.

Core protocol

Rarely used anymore except for the most basic of operations such as blitting. No alpha blending or antialiasing. Core fonts are problematic and mostly unfit for this age, strictly worse than bitmapped TrueType with proper hinting. Pass.

GLX

This Is The Future™. Invented in the early '90s by now-defunct SGI, the originator of the OpenGL API, it lets you render using a 3D driver and thus talk to the graphics chip in an optimal fashion. Unfortunately, it also means Cgo and will not work over a network, just like with Wayland and EGL.

Indirect GLX

Okay, so the underlying protocol extension actually has an indirect mode, too, where it's the X server that bothers itself with the vendor-specific driver library single-handedly. The only downsides to this method are:

The X Rendering Extension

Also known as XRender. Embraces the alpha channel, tosses font loading at the client, gives you projective transformations with interpolation, convolution filters, gradients. It doesn't fully replace core protocol drawing as it's meant to be more of a backend to a higher-level client-side library such as Cairo. Therefore, if you want lines or circles, you'll have to convert them to triangles or trapezoids first.
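To make the "convert them to triangles" part concrete, here is a minimal sketch of turning a thick line segment into two triangles with 16.16 fixed-point coordinates, which is the coordinate format XRender uses for its geometry. The type and function names (Fixed, Point, Triangle, LineToTriangles) are mine, chosen for illustration, not taken from XGB.

```go
package main

import (
	"fmt"
	"math"
)

// Fixed is a 16.16 fixed-point number, the format XRender uses for
// the coordinates of its triangles and trapezoids.
type Fixed int32

// ToFixed converts a floating-point coordinate to 16.16 fixed point.
func ToFixed(f float64) Fixed { return Fixed(math.Round(f * 65536)) }

// Point and Triangle are illustrative stand-ins for the protocol's
// POINTFIX and TRIANGLE structures.
type Point struct{ X, Y Fixed }
type Triangle struct{ P1, P2, P3 Point }

// LineToTriangles covers a line segment of the given width with two
// triangles. It returns nil for a zero-length segment.
func LineToTriangles(x0, y0, x1, y1, width float64) []Triangle {
	dx, dy := x1-x0, y1-y0
	length := math.Hypot(dx, dy)
	if length == 0 {
		return nil
	}
	// Normal vector of the segment, scaled to half the line width.
	nx, ny := -dy/length*width/2, dx/length*width/2
	a := Point{ToFixed(x0 + nx), ToFixed(y0 + ny)}
	b := Point{ToFixed(x0 - nx), ToFixed(y0 - ny)}
	c := Point{ToFixed(x1 - nx), ToFixed(y1 - ny)}
	d := Point{ToFixed(x1 + nx), ToFixed(y1 + ny)}
	return []Triangle{{a, b, c}, {a, c, d}}
}

func main() {
	for _, t := range LineToTriangles(0, 0, 10, 0, 2) {
		fmt.Println(t)
	}
}
```

Circles and curves would need to be flattened into many such pieces first, which is the kind of work a Cairo-like layer normally does for you.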

All in all, not exactly perfect, though still pretty neat and certainly network-efficient.

Client-side rendering

Go already has some good libraries for vector graphics, so this approach would be very convenient. Yet, pushing picture data to the display is going to be slow without shared memory.

Conclusion

It's basically a tie between the last two but XRender is going to have much better responsiveness over a remote connection, even though I might end up redoing a part of Cairo in Go to get a humane interface to it. In any case, they can be combined.

For a more in-depth, if slightly dated explanation of the graphics stack on Linux go here, then here. There are also other interesting articles on that blog that aren't as closely related to my plight such as this one. Background on the original design of X is described elsewhere.

What almost everyone gets wrong

It turns out that our vision, like our hearing, isn't linear but approximately logarithmic. That alone wouldn't be of much interest if the sRGB standard didn't make use of this by assigning pixel values according to that perceived intensity, which makes better use of your usual 8 bits per channel. (In truth it's partly accidental due to CRTs, the display technology of the last century, exhibiting a roughly inverse kind of response to that of our eyes. This was corrected directly in the broadcast signal to simplify the electronics of early TV sets. But when the signal is produced by computers, no such correction is done. In more words.)

To the point: you're not supposed to blend colors in sRGB with linear equations!
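A minimal sketch of the difference, using the standard piecewise sRGB transfer function. The function names are made up for this example. BlendNaive is what nearly everything does, BlendLinear is what should happen instead.

```go
package main

import (
	"fmt"
	"math"
)

// toLinear decodes an sRGB-encoded component in [0, 1] to linear light.
func toLinear(s float64) float64 {
	if s <= 0.04045 {
		return s / 12.92
	}
	return math.Pow((s+0.055)/1.055, 2.4)
}

// toSRGB encodes a linear-light component in [0, 1] back to sRGB.
func toSRGB(l float64) float64 {
	if l <= 0.0031308 {
		return l * 12.92
	}
	return 1.055*math.Pow(l, 1/2.4) - 0.055
}

// BlendNaive mixes two 8-bit sRGB components directly, the wrong way.
func BlendNaive(a, b uint8, t float64) uint8 {
	return uint8(math.Round(float64(a)*(1-t) + float64(b)*t))
}

// BlendLinear decodes both components, mixes them in linear light,
// and re-encodes the result.
func BlendLinear(a, b uint8, t float64) uint8 {
	la, lb := toLinear(float64(a)/255), toLinear(float64(b)/255)
	return uint8(math.Round(toSRGB(la*(1-t)+lb*t) * 255))
}

func main() {
	// An even mix of black and white.
	fmt.Println("naive: ", BlendNaive(0, 255, 0.5))
	fmt.Println("linear:", BlendLinear(0, 255, 0.5))
}
```

The naive mix lands on the middle of the 8-bit range, while the linear one comes out noticeably brighter, which is why antialiased edges and gradients computed the naive way tend to look too dark.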

This affects multiple areas that will be nearly impossible to unfuck in the future, no matter how hard you preach it. Just read and weep and weep some more for good measure. Conversion to grayscale is also supposed to be done in linear space as this bloke reveals, yet I've been happily ignoring that bit since forever.

XRender does this wrong, Cairo does this wrong, and your browser does this wrong because that's the way it's defined.

The Go standard library is no different. The image/color.Color interface provides 16 bits per channel, just like COLOR in XRender and similarly to normalized floating points in Cairo. So far so good, it's plenty enough for a virtually lossless gamma conversion. The problem is, pictures are loaded directly from sRGB, and none of the data types consider gamma in their RGBA methods while all operations assume linear space. Thus, squabbles over NTSC coefficients are entirely pointless.

This is known but unlikely to be fixed within any reasonable amount of time. The good news is, you can make your own types, which will cost you some cycles on the I/O boundary.
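What such a type could look like, as a rough sketch: decode 8-bit sRGB into 16-bit linear light when loading, work on that, and encode again when saving or displaying. The Linear type, the lookup table and the helper names are hypothetical, invented for this example. Only color.Color and color.NRGBA come from the standard library.

```go
package main

import (
	"fmt"
	"image/color"
	"math"
)

// decodeLUT maps an 8-bit sRGB component to 16-bit linear light.
var decodeLUT [256]uint16

func init() {
	for i := range decodeLUT {
		s := float64(i) / 255
		l := s / 12.92
		if s > 0.04045 {
			l = math.Pow((s+0.055)/1.055, 2.4)
		}
		decodeLUT[i] = uint16(math.Round(l * 65535))
	}
}

// encode maps a 16-bit linear component back to 8-bit sRGB.
func encode(l uint16) uint8 {
	v := float64(l) / 65535
	s := v * 12.92
	if v > 0.0031308 {
		s = 1.055*math.Pow(v, 1/2.4) - 0.055
	}
	return uint8(math.Round(s * 255))
}

// Linear is a hypothetical colour type holding alpha-premultiplied,
// linear-light components, 16 bits each.
type Linear struct{ R, G, B, A uint16 }

// RGBA implements color.Color; the values it returns are linear.
func (c Linear) RGBA() (r, g, b, a uint32) {
	return uint32(c.R), uint32(c.G), uint32(c.B), uint32(c.A)
}

var _ color.Color = Linear{}

// FromSRGB decodes a non-premultiplied 8-bit sRGB colour on input.
func FromSRGB(c color.NRGBA) Linear {
	a := uint32(c.A) * 0x101
	mul := func(v uint8) uint16 { return uint16(uint32(decodeLUT[v]) * a / 0xffff) }
	return Linear{mul(c.R), mul(c.G), mul(c.B), uint16(a)}
}

// ToSRGB re-encodes the colour for output.
func (c Linear) ToSRGB() color.NRGBA {
	if c.A == 0 {
		return color.NRGBA{}
	}
	un := func(v uint16) uint8 { return encode(uint16(uint32(v) * 0xffff / uint32(c.A))) }
	return color.NRGBA{R: un(c.R), G: un(c.G), B: un(c.B), A: uint8(c.A >> 8)}
}

func main() {
	c := FromSRGB(color.NRGBA{R: 200, G: 100, B: 50, A: 255})
	fmt.Println(c, c.ToSRGB())
}
```

The table makes the decoding side cheap. The encoding side still calls math.Pow per component here and would want a table of its own in anything performance-sensitive.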

Workaround for XRender?

Unfortunately, while X11 has this concept of DirectColor where you should be able to assign a non-linear mapping between in-memory and effective pixel values, my X server won't give me a 30-bit Visual so that I could let XRender composite in higher-precision linear space and avoid horrible banding resulting from a gamma conversion, so no luck there. And since I have a newer Intel iGPU, I can't run X with

    DefaultDepth 30

yet.

Therefore, I'll have to put up with ever so slightly incorrect gradients and compositing. ~~Trumps dying in the Holocaust.~~

Text rendering

In the XRender model we need to render glyphs on the client, separately for each font and size that is used, and load them to the server. Let me summarize the current situation with rendering libraries:

Our only hope isn't without its problems, as one would expect. Since I can live with a very limited selection of well-behaved fonts such as the generously-licensed Go family, I'm most bothered by the lack of subpixel rendering, which might not actually be too hard to add.

Ideally, we would also want something like HarfBuzz for proper glyph selection and positioning—someone needs to port it first, the license is favourable. Until then, the best strategy is to at least use the provided context-free kerning information, and either render extraneous non-spacing combining marks before their spacing base characters with neutralized horizontal advance, or ignore them altogether. ‘Zalgo’ text handling turns out to be complex.
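A minimal sketch of the fallback strategy, in its simplest variant where combining marks are ignored altogether. Metrics is a hypothetical interface standing in for whatever the font library provides, and toy is a fake font with fixed advances, so none of these names refer to a real API.

```go
package main

import (
	"fmt"
	"unicode"
)

// Metrics is a hypothetical view of a font face at one size.
type Metrics interface {
	Advance(r rune) int      // horizontal advance of a glyph
	Kern(prev, cur rune) int // context-free pair adjustment
}

// Placed is a rune together with its horizontal pen position.
type Placed struct {
	Rune rune
	X    int
}

// Layout positions runes left to right using pairwise kerning only.
// Non-spacing marks (Unicode category Mn) are dropped.
func Layout(m Metrics, text string) (glyphs []Placed, width int) {
	prev := rune(-1)
	x := 0
	for _, r := range text {
		if unicode.Is(unicode.Mn, r) {
			continue
		}
		if prev >= 0 {
			x += m.Kern(prev, r)
		}
		glyphs = append(glyphs, Placed{r, x})
		x += m.Advance(r)
		prev = r
	}
	return glyphs, x
}

// toy is a fake font: every glyph is 10 units wide and only the
// pair "AV" is kerned.
type toy struct{}

func (toy) Advance(rune) int { return 10 }

func (toy) Kern(a, b rune) int {
	if a == 'A' && b == 'V' {
		return -2
	}
	return 0
}

func main() {
	glyphs, width := Layout(toy{}, "AVe\u0301")
	fmt.Println(glyphs, width)
}
```

Anything beyond this, such as ligatures, mark positioning or bidirectional text, is exactly what a HarfBuzz port would be for.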

With XCB

The only serious obstacle on my way towards a functioning prototype was that the source XML, from which most of XCB and by extension XGB is generated, contains nothing where there should have been a protocol definition for GLYPHITEMs. That means you'll have to serialize them yourself: the functions just accept arrays of nondescript bytes for their last argument and tack them onto the end of requests. The necessary code only ever got written by hand, for Cairo. (xcb-util-renderutil doesn't count because it shows you the middle finger if you try to display more than ~252 glyphs in a single run.) The other full implementation is in Xlib.
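For illustration, a sketch of what that hand-written serialization can look like for 32-bit glyph IDs. It follows my reading of the element header in renderproto.h (a length byte, three bytes of padding, then two 16-bit deltas), and it assumes a little-endian connection, which as far as I can tell is what XGB sets up. The 254-glyph chunk size rests on the length value 255 being reserved for glyph set switches. The function name is made up.

```go
package main

import "fmt"

// maxGlyphsPerElt is the most glyphs a single element can carry;
// a length byte of 255 is assumed to be reserved for switching glyph sets.
const maxGlyphsPerElt = 254

func put16(buf []byte, v uint16) []byte {
	return append(buf, byte(v), byte(v>>8))
}

func put32(buf []byte, v uint32) []byte {
	return append(buf, byte(v), byte(v>>8), byte(v>>16), byte(v>>24))
}

// AppendGlyphElts32 appends the glyphs to buf as a series of glyph
// elements with 32-bit glyph IDs, splitting long runs into chunks.
// Only the first element carries the (dx, dy) offset.
func AppendGlyphElts32(buf []byte, dx, dy int16, glyphs []uint32) []byte {
	for len(glyphs) > 0 {
		n := len(glyphs)
		if n > maxGlyphsPerElt {
			n = maxGlyphsPerElt
		}
		// Header: glyph count, 3 bytes of padding, delta x, delta y.
		buf = append(buf, byte(n), 0, 0, 0)
		buf = put16(buf, uint16(dx))
		buf = put16(buf, uint16(dy))
		for _, g := range glyphs[:n] {
			buf = put32(buf, g)
		}
		// The server advances the pen itself, so later chunks need no offset.
		dx, dy = 0, 0
		glyphs = glyphs[n:]
	}
	return buf
}

func main() {
	buf := AppendGlyphElts32(nil, 5, -1, []uint32{1, 2, 3})
	fmt.Printf("% x\n", buf)
}
```

The resulting bytes would go into the trailing argument of the CompositeGlyphs32 request. With 8-bit or 16-bit glyph IDs, each element additionally needs padding to a multiple of four bytes.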

Double buffering

Since changes to Window contents go more or less directly to the GPU's framebuffer, you need to ensure that your drawing can't be interrupted by a display refresh whenever you do complex operations with overdraw. Otherwise, partial results may end up visible to the user, causing flickering. It is especially pronounced over a network, due to much higher latencies.

Thanks to those generous amounts of memory we've got today, this problem is frequently resolved by drawing onto a separate off-screen buffer the same size as the target, and swapping the role of the two when the update is finished.

Even though X11 offers a DOUBLE-BUFFER extension that follows this principle, it's considered deprecated, and the protocol specification is missing from XCB, therefore from XGB as well. Instead, we're supposed to create the ‘second buffer’ manually as a Pixmap, and blit from it to the Window using xproto.CopyArea, or render.Composite. Note that when the Pixmap is backed by shared memory, we're ready for client-side rendering.

Keyboard

The core protocol would almost suffice, if it weren't for setups with several keyboard layouts, here called groups, that are enabled by the XKB extension. Two shift levels times two groups simply isn't enough to hold all the symbols that can be assigned to a particular key, and although the server is in fact happy to give you more levels and groups in its GetKeyboardMapping response, you aren't given any good means of correlating them with keyboard state, i.e. whether Shift or AltGr is pressed, and the effective group for a keypress. You need to know what key types there are, how many levels they have (their width), what modifiers trigger those levels, and which key is of which type in which group.
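To show how those pieces fit together, here is a much simplified model of the lookup. All type names are invented, the example key is hypothetical, AltGr is assumed to be bound to Mod5, and out-of-range groups simply wrap around. Real XKB key types carry more detail than this, such as preserved modifiers.

```go
package main

import "fmt"

// ModMask holds the eight core modifier bits in protocol order.
type ModMask uint8

const (
	Shift ModMask = 1 << iota
	Lock
	Control
	Mod1
	Mod2
	Mod3
	Mod4
	Mod5
)

// KeyType says which modifiers a key cares about and which
// combination of them selects which shift level.
type KeyType struct {
	Mask   ModMask
	Levels map[ModMask]int // combinations not listed select level 0
}

// Key has one type and one row of keysyms per group.
type Key struct {
	Types []*KeyType
	Syms  [][]uint32 // Syms[group][level]
}

// Lookup returns the keysym for the given group and modifier state,
// or 0 (NoSymbol) if there is none.
func (k *Key) Lookup(group int, mods ModMask) uint32 {
	if len(k.Types) == 0 || group < 0 {
		return 0
	}
	group %= len(k.Types)
	t := k.Types[group]
	level := t.Levels[mods&t.Mask]
	if syms := k.Syms[group]; level < len(syms) {
		return syms[level]
	}
	return 0
}

var (
	twoLevel  = &KeyType{Mask: Shift, Levels: map[ModMask]int{Shift: 1}}
	fourLevel = &KeyType{Mask: Shift | Mod5, Levels: map[ModMask]int{
		Shift: 1, Mod5: 2, Shift | Mod5: 3,
	}}
)

// exampleKey is a made-up key with a two-level group and a four-level group.
var exampleKey = &Key{
	Types: []*KeyType{twoLevel, fourLevel},
	Syms: [][]uint32{
		{0x32, 0x40},            // 2, at
		{0x1ec, 0x32, 0x40, 0}, // ecaron, 2, at, NoSymbol
	},
}

func main() {
	fmt.Printf("%#x\n", exampleKey.Lookup(1, 0))
	fmt.Printf("%#x\n", exampleKey.Lookup(1, Mod5))
}
```

Masking the state with the key type's modifiers first is what keeps unrelated modifiers, such as NumLock on Mod2, from changing the level.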

You need the full XKB information. Which is a bit of a problem when xgbgen falls flat on its face trying to generate code for the extension. And since my own configuration is

I hope to get at least XkbGetMap working by commenting out the bad parts of xkb.xml. Until then, I won't be able to type in any Czech letters as AltGr gives me QWERTY instead of diacritics.

Keysyms

Alright, suppose we have the right keysym, now what? In general, there are two things we might want to do with it: try to convert it to Unicode, or see if it's a specific non-character key that interests us.

For the latter, some constants are required. Those are best generated from keysymdef.h, which helpfully includes a regular expression exactly for that purpose. Additional codes for various popular special keys lie in XF86keysym.h.
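A rough sketch of such a generator. The regular expression below is my own simplified one, not the one printed in the header, and it only picks up the name and the value. The embedded sample stands in for the real file.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// defineRE is a simplified pattern matching lines such as
// "#define XK_BackSpace 0xff08".
var defineRE = regexp.MustCompile(`^#define XK_([a-zA-Z_0-9]+)\s+0x([0-9a-fA-F]+)`)

// Keysym is one named constant extracted from the header.
type Keysym struct {
	Name  string
	Value uint32
}

// ParseKeysyms extracts all keysym definitions from header text,
// skipping every line that doesn't look like a definition.
func ParseKeysyms(header string) ([]Keysym, error) {
	var out []Keysym
	sc := bufio.NewScanner(strings.NewReader(header))
	for sc.Scan() {
		m := defineRE.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		v, err := strconv.ParseUint(m[2], 16, 32)
		if err != nil {
			return nil, err
		}
		out = append(out, Keysym{m[1], uint32(v)})
	}
	return out, sc.Err()
}

// sample is a short excerpt in the style of keysymdef.h.
const sample = `#ifdef XK_MISCELLANY
#define XK_BackSpace                     0xff08  /* Back space, back char */
#endif
#define XK_Aogonek                       0x01a1  /* U+0104 LATIN CAPITAL LETTER A WITH OGONEK */
`

func main() {
	syms, err := ParseKeysyms(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("const (")
	for _, k := range syms {
		fmt.Printf("\tXK_%s = 0x%04x\n", k.Name, k.Value)
	}
	fmt.Println(")")
}
```

Pointing it at the real headers is a matter of reading the files instead of the sample. XF86keysym.h uses a different name prefix, so its pattern would need adjusting.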

The larger issue is with conversion to characters: while the beginning of the range corresponds to ISO Latin 1, and thus also to Unicode, the rest is not so straightforward. Luckily, we aren't the first ones to battle this, and you can find a table and an algorithm mapping X11 keysyms to Unicode in the source of xterm in the unicode directory, as well as in libxkbcommon, which maintains a separate fork in C form.
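The overall shape of that algorithm, as a sketch: Latin-1 keysyms map to themselves, keysyms with the 0x01000000 offset carry the code point directly, and everything else needs the table. The legacy map below holds just a few entries for demonstration, the real table has to be ported in full, and special keys such as Return or the keypad are left out.

```go
package main

import "fmt"

// legacy is a tiny excerpt of the keysym to Unicode table.
var legacy = map[uint32]rune{
	0x01a1: 0x0104, // Aogonek
	0x06c1: 0x0430, // Cyrillic_a
	0x07e1: 0x03b1, // Greek_alpha
	0x20ac: 0x20ac, // EuroSign
}

// KeysymToRune converts a keysym to a Unicode code point, reporting
// false when the keysym doesn't stand for a character it knows.
func KeysymToRune(ks uint32) (rune, bool) {
	switch {
	case ks >= 0x20 && ks <= 0x7e, ks >= 0xa0 && ks <= 0xff:
		// ISO Latin 1 is identical to the start of Unicode.
		return rune(ks), true
	case ks >= 0x01000100 && ks <= 0x0110ffff:
		// Directly encoded Unicode keysyms.
		return rune(ks - 0x01000000), true
	}
	r, ok := legacy[ks]
	return r, ok
}

func main() {
	for _, ks := range []uint32{0x61, 0x01a1, 0x0100263a, 0xff08} {
		r, ok := KeysymToRune(ks)
		fmt.Printf("%#x -> %q %v\n", ks, r, ok)
	}
}
```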

Compose key, IME

Wait, there's more. What about the compose key and CJK input method editors? Well, those are also handled by the client, why do you ask? This is the moment where I was just about to give up because this reeks of mountains of complex code within Xlib, or even worse within toolkits, that would need to be ported.

Though I do find the compose key rather useful. And it looks like libxkbcommon implements its own handling in a reasonable amount of code. It may not be pretty since you hope the necessary files exist on the ‘wrong’ side of the connection (think about X11 forwarding), and it relies on POSIX locales, but it seems manageable.

As for IME, implementing the legacy, convoluted XIM might be the safest option. I'll leave that up to those who have brought this mess upon themselves. ~~I'm no weeb.~~

Scaling

To make sure users don't experience tiny text and UI elements on high-DPI displays, it would be nice to size these based on physical display dimensions (as long as it's not a projector, of course). Easier said than done. The core protocol will lie to you, and really has no other choice when the Screen is an obsolete, virtual concept, courtesy of Xinerama, which finally allowed people to join all their different monitors into one continuous workspace.

Xinerama can't report the DPI either, its protocol extension will only tell you the viewports of its ‘subscreens.’ But its successor RandR can. And here it gets complicated. Never mind that a window might span several monitors—we could rescale it once its upper-left corner is dragged over, and call it a day for most setups. Let's ignore that viewports may overlap, that one can set almost arbitrary projective transformations, or that the WM/compositor might scale our windows for us automatically in the near future, and we'd like to prevent it from doing that.

The RandR model is just bonkers. Now that Screens are passé, it gives us so-called CRT controllers, each having its own viewport, that can be shown on multiple Outputs. It's at this fourth level that we finally find our correct dimensions. And since 4K monitors use Multi-Stream Transport for 60Hz modes and act like two separate things, RandR 1.5 adds yet another concept called Monitors to group adjacent Outputs into a singular abstraction. Another way to figure out DPI!

Let me enumerate it: Display, Screen, CRT controller, Output, and Monitor. We might have it simple but FSM forbid if you want to change something in this scheme, given that they're all linked together.

Anyway, a good toolkit should still provide an override to any fancy ‘auto’ setting, there are too many variables. And I haven't even touched the subject of importance of integers in scaling, and how to deal with that.
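Once the pixel dimensions of a viewport and the physical dimensions of what it is shown on are known, the arithmetic itself is trivial. A minimal sketch with an override and rounding to whole-number factors follows. The 96 DPI baseline and both function names are my assumptions for the example, not something the protocol dictates.

```go
package main

import (
	"fmt"
	"math"
)

// DPI computes dots per inch from a size in pixels and a physical
// size in millimetres. It reports false when the physical size is
// unknown, which displays such as projectors tend to report as zero.
func DPI(pixels, mm int) (float64, bool) {
	if pixels <= 0 || mm <= 0 {
		return 0, false
	}
	return float64(pixels) * 25.4 / float64(mm), true
}

// Scale picks an integer UI scale factor relative to an assumed
// baseline of 96 DPI. A positive override always wins.
func Scale(dpi float64, override int) int {
	if override > 0 {
		return override
	}
	s := int(math.Round(dpi / 96))
	if s < 1 {
		s = 1
	}
	return s
}

func main() {
	// A hypothetical 3840 pixel wide panel that is 600 mm across.
	if dpi, ok := DPI(3840, 600); ok {
		fmt.Printf("%.1f DPI, scale %d\n", dpi, Scale(dpi, 0))
	}
}
```

Rounding to whole factors sidesteps blurry fractional scaling at the price of elements being somewhat too small or too large on in-between densities, which is one more reason to keep the override.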

Cursors

There are two sets of cursors, if you want to use something not completely alien to the user: the ugly ones from the core protocol, and whatever libXcursor gives you, which has become a de facto standard. Wayland has roughly copied it, XCB has more of a reimplementation. We will need to do a port. As with the compose key, fancy cursors are also loaded from the ‘wrong’ side of the connection.

Demo and future

Not a lot of ‘tangible’ code has resulted from all this. I've spent most of my time taking stock of the situation, writing trivial prototypes along the way. A few things definitely haven't been done before, though, such as using XGB to render text with XRender. Or an elementary drawing application that leaves it up to XRender to handle the canvas, and paint brush strokes, keeping network traffic low when run remotely.

It looks like I've got a lot of work in front of me before I can even attempt to write, say, a text editor that could serve me as my daily driver. Nonetheless, with the issues mostly identified, the path is much clearer.

I've tried my best to weave a readable narrative through random desire-driven research, in a foreign-to-me language (it takes a lot of effort, don't be mistaken). Feel free to contact me if you'd like to have some of the details explained—I acknowledge that the article is rather densely written, although search engines are generally of help and many links lead to amazing resources. Also feel free to correct my mistakes, some of the technology discussed is older than I am and documentation is often patchy.

Comments

Outsourced below. My favourite comments so far that completely miss the point of the article:

“Go, like every language designed by a corporation, is an imperial language. Its purpose is to swallow up and contain within itself everything it touches.” I love this definition!
The X-Windows Disaster