C/C++/sh features, idioms

I spend a lot of time making code look elegant and short in inelegant, verbose languages.

Everything is by the standard here, no extensions.

C99

Compound literals

These can be very convenient for defining deep structures:

static struct tiff_entry tiff_entries[] = {
  {"NewSubfileType", 254, NULL},
  {"SubfileType", 255, (struct tiff_value[]) {
    {"Full-resolution image data", 1},
    {"Reduced-resolution image data", 2},
    {"Page of a multi-page image", 3},
    {0}
  }},
  // …​
};

as well as to get a pointer to a temporary structure:

return WebPMuxSetChunk(mux, fourcc,
  &(WebPData) {.bytes = p, .size = len}, false) == WEBP_MUX_OK;

Sadly, the syntax insists on a {} initializer, so it reads best with arrays and structures; scalars are allowed as well, e.g., &(int) {1}, but they are clumsier.

Nearly first-class structs

Returning an object instead of initializing it indirectly makes for slightly cleaner code.

struct str
str_dup(struct str s) {
  return (struct str) {
    .str = memcpy(xmalloc(s.alloc), s.str, s.len + 1),
    .alloc = s.alloc,
    .len = s.len,
  };
}

If you put an array inside a struct, you can pass arrays by value.
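
A minimal sketch of passing an array by value, assuming a hypothetical struct vec3 wrapper and vec3_scaled function (neither comes from any real library):

```c
#include <stddef.h>

// Hypothetical wrapper: the array lives inside the struct, so it is
// copied on assignment, argument passing, and return.
struct vec3 { int v[3]; };

// Receives its own copy, so the caller's array is never modified.
static struct vec3
vec3_scaled (struct vec3 a, int factor)
{
  for (size_t i = 0; i < sizeof a.v / sizeof *a.v; i++)
    a.v[i] *= factor;
  return a;
}
```

Calling vec3_scaled (a, 2) returns a scaled copy and leaves a as it was. The price is a full copy per call, so this probably only pays off for small arrays.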

C99 and C++

Deriving an enum and an array from the same table

I first came across this idiom in Node.js’s http-parser library. It’s definitely a trade-off.

#define MPD_SUBSYSTEM_TABLE(XX)               \
  XX (DATABASE,         0, "database")        \
  XX (UPDATE,           1, "update")          \
  XX (STORED_PLAYLIST,  2, "stored_playlist") \
  XX (PLAYLIST,         3, "playlist")        \
  XX (PLAYER,           4, "player")          \
  XX (MIXER,            5, "mixer")           \
  XX (OUTPUT,           6, "output")          \
  XX (OPTIONS,          7, "options")         \
  XX (STICKER,          8, "sticker")         \
  XX (SUBSCRIPTION,     9, "subscription")    \
  XX (MESSAGE,         10, "message")

#define XX(a, b, c) MPD_SUBSYSTEM_ ## a = (1 << b),
enum mpd_subsystem { MPD_SUBSYSTEM_TABLE (XX) };
#undef XX

static const char *mpd_subsystem_names[] = {
#define XX(a, b, c) [b] = c,
  MPD_SUBSYSTEM_TABLE (XX)
#undef XX
};

The identifier/name stuttering can be avoided with the # stringification operator; however, it cannot change letter case.
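
A sketch of the stringified variant, using a hypothetical COLOR_TABLE so that the table only needs two columns. Note that the names come out in upper case, exactly as the identifiers are written:

```c
// Hypothetical table: the string is derived from the identifier itself.
#define COLOR_TABLE(XX) \
  XX (RED,   0)         \
  XX (GREEN, 1)         \
  XX (BLUE,  2)

#define XX(a, b) COLOR_ ## a = (1 << b),
enum color { COLOR_TABLE (XX) };
#undef XX

static const char *color_names[] = {
#define XX(a, b) [b] = #a,
  COLOR_TABLE (XX)
#undef XX
};
```

Here color_names[1] is "GREEN" and COLOR_BLUE is 4.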

Unused arguments, functions, results

-Wunused-parameter and -Wunused-function warnings are most easily and portably silenced like this:

static char **
app_readline_completion (const char *text, int start, int end)
{
  // We will reconstruct that ourselves
  (void) text;
  return make_completions (g_ctx, rl_line_buffer, start, end);
}

Sometimes, you see people creating helper macros to improve readability:

#define UNUSED(x) (void)(x)
#define USED(x) (void)(x)  // Plan 9

Personally, I also void-cast to make it apparent I know I’m ignoring the result of a call:

static void
on_signal_pipe_readable (const struct pollfd *fd, struct server_context *ctx)
{
  char dummy;
  (void) read (fd->fd, &dummy, 1);
  if (g_termination_requested && !ctx->quitting)
    initiate_quit (ctx);
}

but it has no effect on GCC’s __warn_unused_result__ attribute.

sizeof when allocating arrays

For reasons unknown to me, people like to unnecessarily repeat the data type.

static struct item_list
item_list_make (void)
{
  struct item_list self = {0};
  self.items = xcalloc (sizeof *self.items, (self.alloc = 16));
  return self;
}
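
The same idea with the standard calloc(), as a sketch; struct item and items_new are hypothetical names. The element type is spelled once, in the declaration, so changing it later cannot leave a stale sizeof behind:

```c
#include <stdlib.h>

struct item { int id; int weight; };

// Returns n zero-initialized items, or NULL on allocation failure.
static struct item *
items_new (size_t n)
{
  struct item *items = calloc (n, sizeof *items);
  return items;
}
```

The caller owns the result and releases it with free().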

Unary plus

Useful for alignment and symmetry, it even survives clang-format:

if (event->key.keyval == GDK_KEY_Up)
  return stardict_view_scroll (view, GTK_SCROLL_STEPS, -1), TRUE;
if (event->key.keyval == GDK_KEY_Down)
  return stardict_view_scroll (view, GTK_SCROLL_STEPS, +1), TRUE;

Here with an additional syntax hack: the comma operator turns a void call into TRUE.
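
A self-contained sketch of both tricks without GTK; position, scroll, and on_key are hypothetical stand-ins for the view and its handler:

```c
#include <stdbool.h>

static int position;

// Returns nothing, so it cannot be used as the handler's result directly.
static void
scroll (int delta)
{
  position += delta;
}

// Reports whether the key was handled.
static bool
on_key (char key)
{
  if (key == 'k')
    return scroll (-1), true;
  if (key == 'j')
    return scroll (+1), true;
  return false;
}
```

The comma operator evaluates scroll() for its side effect, discards the void result, and yields true.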

Where to put the annoying break in switch

I’m certainly not proposing to put break in front of every case everywhere, but I’ve started doing so for one-liners:

jv type = jv_number(type_number);
switch (type_number) {
break; case 0x030000: type = jv_string("Baseline MP Primary Image");
break; case 0x010001: type = jv_string("Large Thumbnail - VGA");
break; case 0x010002: type = jv_string("Large Thumbnail - Full HD");
break; case 0x020001: type = jv_string("Multi-Frame Image Panorama");
break; case 0x020002: type = jv_string("Multi-Frame Image Disparity");
break; case 0x020003: type = jv_string("Multi-Frame Image Multi-Angle");
break; case 0x000000: type = jv_string("Undefined");
}

Naming all constants

Why not name the ends of pipes and sockets? Anonymous enumerations are short, convenient, and don’t create useless variables.

enum { OURS, THEIRS };
int pair[2] = { -1, -1 };
if (socketpair (AF_UNIX, SOCK_STREAM, 0, pair))
  exit_fatal ("socketpair: %s", strerror (errno));
set_cloexec (self->socket = pair[OURS]);
set_cloexec (pair[THEIRS]);

Iterating fixed arrays without a sentinel value

This shouldn’t be anything new to C programmers, but here it is for completeness:

static int a[] = { 1, 2, 3 };
for (size_t i = 0; i < sizeof a / sizeof *a; i++)
  printf ("%d\n", a[i]);

GLib wraps it in a convenience macro named G_N_ELEMENTS().
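
A sketch of such a macro, under the hypothetical name N_ELEMENTS. It only works on true arrays: for a pointer, or for an array parameter that has decayed to one, it silently computes the wrong number.

```c
#include <stddef.h>

// Hypothetical macro in the spirit of G_N_ELEMENTS().
#define N_ELEMENTS(a) (sizeof (a) / sizeof *(a))

static const int primes[] = { 2, 3, 5, 7, 11 };

// Sums the whole array without hardcoding its length.
static int
sum_primes (void)
{
  int sum = 0;
  for (size_t i = 0; i < N_ELEMENTS (primes); i++)
    sum += primes[i];
  return sum;
}
```

Adding or removing initializers in primes needs no further changes to the loop.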

C++03

This language has a few features I really wish were present in C as well. It also has others that make me avoid it, notably its crippled void *.

void return chaining

The following artificial example doesn’t do the feature justice:

void foo() {}
void bar() { return foo(); }

Declarations within conditions

This predates C++17 if-initializers by about two decades, yet I haven’t really seen it used anywhere:

g.unames.clear ();
while (auto *ent = getpwent ())
  g.unames.emplace (ent->pw_uid, to_wide (ent->pw_name));
endpwent ();

With C++11 uniform initialization, this becomes particularly interesting:

if (std::ofstream f{"foo"})
  f << "bar" << std::endl;

POSIX sh

The Bourne shell keeps surprising me. Many quite high-level features and conveniences have been there since the beginning:

  • You can pipe in and out of your own functions. To me, this is a mind-blowing capability.

  • You can pipe in and out of loop constructs.

  • && and || can be used as shorthands for if-then-fi, with some caveats.

  • Here-documents needn’t be immediately followed by their contents. Moreover, several here-documents can be started on a single line, in which case their bodies follow one another.

  • & is a statement terminator like ;. It’s odd and obvious at the same time.

  • Variable assignments do not undergo word splitting, so they need less quoting.

  • ${1:-defaulting} works, positional variables aren’t special in that regard.

  • The exec command is overloaded and it can be used to redirect the current shell’s own file descriptors to a log file.

  • Similarly, any command can be redirected, such as : >file-to-truncate.

  • Functions may be called indirectly, so you can trivially call your script’s arguments, and have them dispatched to a function with no additional effort. In fact, the callee’s name can be assembled just like any other string (as in Tcl).

  • Functions have an implicit return value/status (as in Perl or Ruby).

  • if/while/until conditions may also contain lists of commands.

  • Functions can be declared using the abc() ( …​ ) syntax for compound commands.

  • for i in "$@" can be shortened to merely for i.

Simple formatting

It’s hard to think of any noteworthy shell examples, but this pattern I use a lot:

echo "$(tput bold)-- Script started at $(date)$(tput sgr0)"
