Errors and RAII in C

RAII

…stands for Resource Acquisition Is Initialization. But what this really means is that even after a ‘fatal’ error occurs in a function, resource destructors are guaranteed to be called. Languages that use this idiom do that automatically. But C isn’t one of these languages. You need to do this by hand.

I’ve found out about five basic ways of doing it:

1. Hi, goto

int fun (void) {
  FILE *in, *out;
  int ret = ERROR;

  in  = fopen ("input",  "r");
  if (!in)
    goto fun_error_in;

  out = fopen ("output", "w");
  if (!out)
    goto fun_error_out;

  /* … do something … */
  ret = OK;

  fclose (out);
fun_error_out:
  fclose (in);
fun_error_in:
  return ret;
}

goto isn’t evil! It’s your best friend here. Although the code may not really be that pretty, especially at the end of functions, this is a straightforward, efficient way of doing RAII in C. I’ve read that this is how they do it in the Linux kernel, too.

Regarding the return status, of course you can also go the other way round; initialize ret to OK and set it to some other constant before each goto. That’s not important here.

2. Have we failed yet?

int fun (void) {
  FILE *in, *out;
  struct {
    unsigned open_in  : 1;
    unsigned open_out : 1;
  } cleanup = {0};
  int good;

  in = fopen ("input",  "r");
  good = cleanup.open_in = (in != NULL);

  if (good) {
    out = fopen ("output", "w");
    good = cleanup.open_out = (out != NULL);
  }

  /* … more 'if (good)' stuff … */

  if (cleanup.open_out)
    fclose (out);
  if (cleanup.open_in)
    fclose (in);
  return good ? OK : ERROR;
}

Yes, it’s a way, too, but, like, seriously? This stuff looks weird. (Not to mention that it uses extra memory and goes through unnecessary checks in case of an error. Oops, I’ve just mentioned it, haven’t I?) No gotos, though, so if you’re dogmatic about that evilness thing, this seems to be your answer. Source with another example.

3. Exceptions, finally

Oh yeah, everyone wants to implement an exception mechanism in C. Myself included. They are all based on setjmp(), and as such have their own problems, too. (Though they are a lot more convenient to use than the said function alone, as you don’t have to care about where to put the jmp_buf structure.) The thing is, if they actually implement the finally clause, you may simply put resource deallocation just there:

FILE *fopen_ex (const char *file, const char *mode) {
  FILE *fp = fopen (file, mode);
  if (!fp)
    Throw (EFOPEN);
  return fp;
}

/*  If a catch statement is required before finally,
 *  just add something like:
 *
 *  } CatchAny {
 *    Rethrow ();
 */

int fun (void) {
  FILE *in, *out;
  int ret;

  Try {
    in = fopen_ex ("input",  "r");
    Try {
      out = fopen_ex ("output", "w");
      /* ... do something ... */
      fclose (out);
    } Finally {
      fclose (in);
    }
    ret = OK;
  } CatchAny {
    ret = ERROR;
  }

  /* Well, yes, this function doesn't really need to return any value,
   * since hey, we have exceptions now, but let's keep the interface. */
  return ret;
}

(Modify this as needed to comply with the awkward syntax and limitations of your selected ‘exceptions in C’ hack.)

Now this actually looks like something one could be quite familiar with. There is a disadvantage to this approach, though—see how it nests each time you try to acquire some resource?
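To make the setjmp() connection a bit more concrete, here is a minimal sketch of how such macros could be put together. The names TRY, CATCH_ANY, END_TRY, THROW, ex_top and ex_code are made up for this illustration and don’t come from any particular library. The sketch is single-threaded, has no Finally clause, and throwing outside of a TRY block dereferences a null pointer.

```c
#include <setjmp.h>

/* Hypothetical helpers, not taken from any existing library. */
static jmp_buf *ex_top;   /* innermost active TRY block, NULL if none */
static int ex_code;       /* value given to the most recent THROW */

#define TRY                                   \
  do {                                        \
    jmp_buf ex_buf, *ex_prev = ex_top;        \
    ex_top = &ex_buf;                         \
    if (setjmp (ex_buf) == 0) {
#define CATCH_ANY                             \
      ex_top = ex_prev;                       \
    } else {                                  \
      ex_top = ex_prev;
#define END_TRY                               \
    }                                         \
  } while (0)
#define THROW(code) (ex_code = (code), longjmp (*ex_top, 1))

static int risky (int fail) {
  if (fail)
    THROW (42);
  return 7;
}

/* Returns 7 on success, or the negated exception code. */
int run (int fail) {
  /* Locals changed inside TRY and read after a longjmp() must be volatile. */
  volatile int ret = 0;

  TRY {
    ret = risky (fail);
  } CATCH_ANY {
    ret = -ex_code;
  } END_TRY;

  return ret;
}
```

Note how the handler stack is just a pointer to the innermost jmp_buf, which each block saves and restores. That is the bookkeeping these libraries take off your hands.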

4. The obscure way someone posted on Wikipedia

See the first code sample in Ad-hoc mechanisms. I’m not going to quote it here, because it’s quite lengthy. It further extends the idea of screwing up C to behave more like C++. And it suffers from a few other things in addition to those you get by emulating exceptions: you have to wrap everything in an object and you can’t have a constructor with no arguments. (The latter can be easily fixed with a special version of the RAII macro.)

5. Compiler extensions

I’ll just link an example again. It’s not particularly pretty and remains error-prone. Portability also suffers, as e.g. the Microsoft compiler won’t support it.
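For illustration, and assuming the extension in question is the cleanup variable attribute supported by GCC and Clang, a sketch might look like the following. The CLEANUP_FILE macro and the files_closed counter are made up for this example (the counter exists only so the behaviour can be observed), the paths became parameters, and OK and ERROR are defined locally as stand-ins.

```c
#include <stdio.h>

enum { OK, ERROR };            /* stand-ins for the constants used above */

static int files_closed;       /* illustration only: counts fclose() calls */

/* The handler receives a pointer to the variable that goes out of scope. */
static void close_file (FILE **fp) {
  if (*fp) {
    fclose (*fp);
    files_closed++;
  }
}

/* Hypothetical shorthand for the GCC/Clang attribute. */
#define CLEANUP_FILE __attribute__ ((cleanup (close_file)))

int fun (const char *in_path, const char *out_path) {
  CLEANUP_FILE FILE *in = fopen (in_path, "r");
  if (!in)
    return ERROR;

  CLEANUP_FILE FILE *out = fopen (out_path, "w");
  if (!out)
    return ERROR;              /* 'in' is closed automatically here */

  /* ... do something ... */
  return OK;                   /* 'out' is closed first, then 'in' */
}
```

The handler still has to check for NULL itself, and forgetting the attribute on a single variable silently brings the leak back.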

Returning errors

Moving on to the next topic. What is the best way to return errors to the caller? Again, you’ve got a variety of choices here. It’s just not that easy to say if one is better than the other.

1. Again, exceptions

int unfun (void) {
  int r = rand ();
  if (r % 2)
    Throw (EBOOBOO);
  return r;
}

int main (void) {
  Try {
    printf ("%d\n", unfun ());
  } Catch (EBOOBOO) {
    exit (EXIT_FAILURE);
  }

  return 0;
}

You should already have an idea of how this is used. Try to ignore the detail that you can’t use return in try blocks (and maybe some other interesting things) and you’re ready to roll.

2. Return the error information directly

int unfun (int *n) {
  int r = rand ();
  if (r % 2)
    return EBOOBOO;

  *n = r;
  return 0;
}

int main (void) {
  int ret, n;

  ret = unfun (&n);
  if (ret) {
    printf ("Error: %s\n", translate_error (ret));
    exit (EXIT_FAILURE);
  }

  printf ("%d\n", n);
  return 0;
}

Functions pass any error as their return value, be it an integer, an enum or a struct. I think no one actually does the last, since it kills any possible advantage with how cumbersome it is in practice. I’m not quite sure if enumerations are that great a choice either, as you basically deprive yourself of the ability to use, let’s say, negative values for error codes and the positive ones for a valid result, making it necessary to add another output argument to functions. Although I’ve heard it might help you with debugging.
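The negative-for-errors, non-negative-for-results convention mentioned above could be sketched like this. half_of_even() is a made-up function that takes its input as an argument instead of calling rand(), merely to keep the example deterministic, and EBOOBOO is given an arbitrary value.

```c
#include <stdio.h>

#define EBOOBOO 1   /* hypothetical error code */

/* Returns n / 2 for even non-negative n, or a negated error code. */
int half_of_even (int n) {
  if (n < 0 || n % 2)
    return -EBOOBOO;
  return n / 2;
}

/* The caller needs only a single variable for both outcomes. */
void report (int n) {
  int ret = half_of_even (n);
  if (ret < 0)
    printf ("Error code %d\n", -ret);
  else
    printf ("%d\n", ret);
}
```

This only works when valid results can never be negative, which is exactly the limitation that pushes other functions towards an extra output argument.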

The problem with this approach is that you can’t put that much information into just a mere number, and you might want to store additional details somewhere else for the caller to retrieve if needed. It may be a global (thread-local) variable or even an extra output argument to the function. Which brings me to…

3. …or indirectly?

int unfun (GError **err) {
  int r = rand ();
  if (r % 2) {
    g_set_error (err, UNFUN_ERROR, UNFUN_ERROR_ODD,
      "I've seen an odd number, run for your life!");
    return -1;
  }

  return r;
}

int main (void) {
  GError *err = 0;
  int n;

  n = unfun (&err);
  if (err) {
    printf ("Error: %s\n", err->message);
    g_error_free (err);
    exit (EXIT_FAILURE);
  }

  printf ("%d\n", n);
  return 0;
}

I’ve chosen GLib’s GError here as an example. The point is that functions return errors via an output argument, and if possible, indicate the event with a special return value (the -1 can be considered as such, because rand() never produces negative numbers). Though this is not needed, as you can initialize the error object beforehand and check whether it has changed after the call just as well. You can also retrieve all sorts of data about what happened and where because things can be put in a structure now. (Albeit GError specifically doesn’t deem this important and only passes an error code with a description string to show to the user.) All in all, it is quite convenient and close to being as good as exceptions.

Another good example is OpenSSL (originally SSLeay), which maintains error information in a thread-local limited-size queue, so you can even trace where the errors originated. When the queue is full, it just starts overwriting the oldest entries, so you don’t have to care about cleaning them up manually.
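The overwrite-when-full behaviour can be sketched with a small ring buffer. This is an illustration of the idea only, not OpenSSL’s actual implementation or API: err_push(), err_pop(), ERR_PUSH and ERR_QUEUE_SIZE are hypothetical names, and the queue here is a plain static object where a real one would be kept per thread.

```c
#include <stddef.h>

#define ERR_QUEUE_SIZE 4   /* hypothetical capacity */

struct err_entry {
  int code;
  const char *file;   /* where the error was raised */
  int line;
};

static struct err_entry err_queue[ERR_QUEUE_SIZE];
static size_t err_head;    /* index of the oldest entry */
static size_t err_count;   /* number of stored entries */

/* Record an error; once the queue is full, the oldest entry is overwritten. */
void err_push (int code, const char *file, int line) {
  size_t slot = (err_head + err_count) % ERR_QUEUE_SIZE;
  err_queue[slot].code = code;
  err_queue[slot].file = file;
  err_queue[slot].line = line;
  if (err_count < ERR_QUEUE_SIZE)
    err_count++;
  else
    err_head = (err_head + 1) % ERR_QUEUE_SIZE;
}

/* Move the oldest entry into *out; returns 0 when the queue is empty. */
int err_pop (struct err_entry *out) {
  if (err_count == 0)
    return 0;
  *out = err_queue[err_head];
  err_head = (err_head + 1) % ERR_QUEUE_SIZE;
  err_count--;
  return 1;
}

/* Captures the origin automatically at the place of the error. */
#define ERR_PUSH(code) err_push ((code), __FILE__, __LINE__)
```

Each function on the way up can push its own entry, and the caller at the top pops them one by one to print something resembling a stack trace.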

The standard C library and Unix return errors indirectly, too, typically indicating the event with a special return value, though they’re not very consistent as to what exactly you need to check for.

4. Even more indirectly!

Lastly, you can also return gibberish and force the caller to ask about any screw-ups explicitly. That saves you from adding arguments completely. But it can be quite troublesome. For example, if you want to put calls into a condition, you’ll end up having to write something like:

errno = 0;
if ((a = fun (b, &c), errno != 0) ||
    (unfun (),        errno != 0))
  abort ();

which doesn’t look very good. Although it’s quite a nice use of the comma operator. When you’re bored, go have a look at how the Vala compiler abuses it in the generated C sources.
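The standard library’s strtol() is a real-world case of this style: on overflow it returns LONG_MAX or LONG_MIN, both of which are also perfectly valid results, so the only reliable check is to clear errno before the call and inspect it afterwards. parse_long() below is a hypothetical wrapper showing the whole dance.

```c
#include <errno.h>
#include <stdlib.h>

/* Stores the parsed number in *out; returns 0 on success, -1 on failure. */
int parse_long (const char *s, long *out) {
  char *end;
  long n;

  errno = 0;                   /* strtol() never resets it on success */
  n = strtol (s, &end, 10);
  if (errno != 0)              /* ERANGE: the value didn't fit into a long */
    return -1;
  if (end == s || *end != '\0')
    return -1;                 /* no digits at all, or trailing garbage */

  *out = n;
  return 0;
}
```

Wrapping the call like this turns the ‘even more indirect’ style back into a direct one, which is usually what callers want anyway.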

About error codes

Sometimes you may be happy with each function or program unit having its own overlapping set of values, creating collisions on a higher level. The price is that no function may ever directly return an error coming from a deeper level, and if it really has to do so, it must either summarize the inner error codes with just a few values or duplicate the whole spectrum of them in its own scope. Another problem lies in human-readable description strings: if you don’t store them by some other means, you’ll end up with numerous translation functions or look-up tables to be used throughout the code.

So how do we make them unique? One option, and this is probably the path of least resistance, is to put everything into a global errors.h file containing a giant enum (or lots of #define statements whose values form a sequence) and maybe a function that can translate these values into an error message. With a little help from the C preprocessor, it is even possible to define both the enum and a translation table at the same time, and thus avoid the burden of synchronizing two interdependent lists.
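The preprocessor trick is often called an X-macro. Here is a minimal sketch of it, with made-up error names and messages; translate_error() is only one possible shape for the translation function.

```c
/* Single source of truth: XX (identifier, description). */
#define ERROR_TABLE(XX)                          \
  XX (E_OK,       "Success")                     \
  XX (E_BOOBOO,   "A boo-boo has happened")      \
  XX (E_SCREWUP,  "Something got screwed up")    \
  XX (E_DISASTER, "Total disaster")

/* First expansion: the enum. */
#define XX(id, text) id,
enum error_code { ERROR_TABLE (XX) E_COUNT };
#undef XX

/* Second expansion: the translation table, in the same order. */
#define XX(id, text) text,
static const char *const error_strings[] = { ERROR_TABLE (XX) };
#undef XX

const char *translate_error (int code) {
  if (code < 0 || code >= E_COUNT)
    return "Unknown error";
  return error_strings[code];
}
```

Adding a new error now means adding a single line to ERROR_TABLE, and the enum and the strings can’t drift apart.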

If you don’t like the idea of maintaining a master error file, you’ll need to find another way of differentiating the codes, for example using the higher bits to denote the domain and the lower bits for the cause. And to make sure the domain identifiers are unique, you can go with…

1. Manual assignment

#define ERROR_BITS  8

#define ERROR_DOMAIN_FUN    (0 << ERROR_BITS)
#define ERROR_DOMAIN_UNFUN  (1 << ERROR_BITS)

/* In fun.h or some other header… */
enum FunErrors {
  EFUN_BOOBOO = ERROR_DOMAIN_FUN,
  EFUN_SCREWUP,
  EFUN_DISASTER
};

So now we have error-domains.h instead of errors.h. Not that much better, if you ask me personally. Though an improvement it is.
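Splitting a combined code back into its parts is then a matter of masking. The ERROR_CAUSE_MASK, ERROR_DOMAIN_OF and ERROR_CAUSE_OF macros below are hypothetical helpers, as is the UnfunErrors enum. The domain definitions are repeated so that the example stands alone.

```c
#define ERROR_BITS  8

#define ERROR_DOMAIN_FUN    (0 << ERROR_BITS)
#define ERROR_DOMAIN_UNFUN  (1 << ERROR_BITS)

enum UnfunErrors {
  EUNFUN_ODD = ERROR_DOMAIN_UNFUN,
  EUNFUN_TOO_BIG
};

/* Hypothetical helpers: the upper bits name the domain, the lower the cause. */
#define ERROR_CAUSE_MASK    ((1 << ERROR_BITS) - 1)
#define ERROR_DOMAIN_OF(e)  ((e) & ~ERROR_CAUSE_MASK)
#define ERROR_CAUSE_OF(e)   ((e) & ERROR_CAUSE_MASK)

/* Lets a caller handle a whole domain without knowing its individual codes. */
int is_unfun_error (int e) {
  return ERROR_DOMAIN_OF (e) == ERROR_DOMAIN_UNFUN;
}
```

With ERROR_BITS set to 8, each domain gets room for 256 distinct causes.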

2. Automation

Let’s take OpenSSL again as an example. Internally it consists of several smaller libraries. Errors are identified by a tuple of (library, function, reason). For the last two, a script is used to extract these identifiers from source files and assign each one of them a value. The result is then appended to the end of the main header file of each sublibrary.

A trivial extraction rule to add to your Makefile could look like this:

error-domains.h: $(wildcard src/*.h)
	sed -n 's/.*\<\(ERROR_DOMAIN_[A-Z0-9]\{1,\}\)\>.*/\1/p' $^ | sort -u | \
	  awk 'BEGIN {print "#define ERROR_BITS 8"} \
	    {print "#define " $$0 " (" i++ " << ERROR_BITS)"}' > $@

Or rather with some column alignment…

error-domains.h: $(wildcard src/*.h)
	awk 'BEGIN {print "#define ERROR_BITS 8"} \
	  match ($$0, /\<ERROR_DOMAIN_[A-Z0-9]+\>/) \
	    {domains[d = substr ($$0, RSTART, RLENGTH)] = 1; \
	     if (a < length (d)) a = length (d)} \
	  END {for (d in domains) \
	    printf "#define %-" a "s (%d << ERROR_BITS)" ORS, d, i++}' $^ > $@

Now you don’t have to worry about managing domains by hand anymore. Once you mention them within your headers, they come to life by themselves.

3. Dynamic assignment

GLib’s error domain macros translate to calling a function that takes a string identifying the domain and returns a unique number associated with it. This association is formed upon the first call of the function.
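The mechanism can be approximated in a few lines. This is a simplified sketch of the idea, not GLib’s implementation: error_domain_from_string() is a hypothetical function, the table has a fixed size and isn’t thread-safe, and the lookup is a plain linear search.

```c
#include <string.h>

#define MAX_DOMAINS 64   /* arbitrary limit for the sketch */

static const char *domain_names[MAX_DOMAINS];
static unsigned    domain_count;

/* Returns a non-zero number unique to the given name, assigning a new one
 * on first use; returns 0 if the table is full.
 * The string must outlive the table, e.g. be a string literal. */
unsigned error_domain_from_string (const char *name) {
  unsigned i;

  for (i = 0; i < domain_count; i++)
    if (strcmp (domain_names[i], name) == 0)
      return i + 1;

  if (domain_count == MAX_DOMAINS)
    return 0;
  domain_names[domain_count++] = name;
  return domain_count;
}

/* The lookup is typically hidden behind a macro, one per domain. */
#define UNFUN_ERROR (error_domain_from_string ("unfun-error"))
#define FUN_ERROR   (error_domain_from_string ("fun-error"))
```

The numbers depend on the order of the first calls, so they are only meaningful within a single run of the program.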

…

These are all fine so far but require either cooperation from the programmer or some kind of additional processing. Can you get uniqueness magically without these annoyances? Yes, you can. There’s a small catch, though: you’ll have to resort to pointers, which in turn pushes you into using the indirect method of returning.

…

4. String literals (disqualified)

These are specified to be statically allocated, and thus unique, although you can’t be too sure whether they won’t become too unique, i.e. whether a literal won’t resolve to different addresses depending on where it is used (it’s up to the linker to deal with any duplicates). Dynamically loaded libraries will break this assumption for certain.

5. Addresses of static variables, or functions

Unlike literals, these are guaranteed to point at the same place in memory under all circumstances. You just have to make them visible to the rest of your code by making them public and including them in a header file.

Names of functions resolve to their location directly, while with variables, you’ll need to use the ampersand operator. But you can circumvent that by hiding the operation in the very same source file where you define them:

static int g_dummy;
void *ERROR_DOMAIN_FUN = &g_dummy;
…
extern void *ERROR_DOMAIN_FUN;

Obviously you can utilize these otherwise dummy objects and make them contain something useful, such as an array of description strings indexed by the integer part of error codes. The same holds for functions.
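As a sketch of that last idea, the domain object can be a small structure holding the description strings, with an error being a pair of a domain pointer and an index. All the names here (struct error_domain, struct error, error_describe, fun_messages) are invented for the example. In real code the domain definition would live in a source file with only an extern declaration in the header.

```c
#include <stddef.h>

struct error_domain {
  const char *name;
  const char *const *messages;   /* indexed by the integer part */
  size_t n_messages;
};

struct error {
  const struct error_domain *domain;   /* unique by address */
  int code;
};

/* This part would go to fun.c. */
enum { EFUN_BOOBOO, EFUN_SCREWUP, EFUN_DISASTER };

static const char *const fun_messages[] = {
  "A boo-boo has happened",
  "Something got screwed up",
  "Total disaster"
};

const struct error_domain ERROR_DOMAIN_FUN = {
  "fun", fun_messages, sizeof fun_messages / sizeof fun_messages[0]
};

/* Works for any domain without a central list of codes. */
const char *error_describe (const struct error *e) {
  if (e->code < 0 || (size_t) e->code >= e->domain->n_messages)
    return "Unknown error";
  return e->domain->messages[e->code];
}
```

Comparing e->domain against &ERROR_DOMAIN_FUN tells you where an error came from, and the integer codes of different domains are free to overlap.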

…

As you may have noticed, only the last proposed alternative has really solved the problem of converting the error code to something people can read. One way out of this is to pass an error description explicitly each time you return an error from a function, like GError has it. Or you can automatically extract the textual descriptions from constant names by converting them to lowercase and generate a translation table, which is what OpenSSL does.

That was about everything I wanted to cover. And I’m not sure if I’m really any wiser. Anyway, I hope you’ve learnt something.
