On Automation and API Documentation

Posted on March 18, 2012 by Jack Kelly

On the first day of a voyage, we pull down a whole bunch of weather charts from a number of sources. Among them are the wind charts from MetVUW. On the Port Davey voyage I had to download the MetVUW charts, and I was disappointed to see that this was done entirely by hand. This will not do! The result: metscrape.

One of the things I have to deal with when writing these things is that they have to be pretty much fire-and-forget. That means compiled, preferably statically linked, and with limited user interaction. This time I used C, with libcurl to fetch the weather images and libxml2 to parse the HTML.

Both packages are thoroughly documented, but I found the documentation to be of vastly differing usefulness. libcurl’s documentation is fantastic: it’s written as a plain series of man pages (the HTML versions of which are hyperlinked), yet their structure is some of the best I’ve seen. Once you get to the documentation for the C API, you immediately learn that the API has two flavours: the “easy” interface for doing simple transfers and the “multi” interface if you need to get fancy. The tutorial points out all the important bits, including the sequence of calls you need for normal usage. This also shows that thought has gone into the API design: simple things should be easy, complex things should be possible.

libxml2’s documentation is generated with gtk-doc, one of those whizz-bang documentation tools that pull the documentation out of the source code. Unfortunately, there’s no section that describes the patterns or the library’s style. There are a decent number of examples, but it isn’t obvious what to call when. htmlCtxtReadFile() (and similar functions) take an “options” parameter when they’re called, so what’s htmlCtxtUseOptions() for? Are the options set here merged with the htmlCtxtReadFile() options? Who can say? The documentation for htmlCtxtUseOptions() just says “Applies the options to the parser context”. There are also irritating inconsistencies in naming: “Context” is sometimes spelled fully, sometimes abbreviated to “Ctx” and sometimes to “Ctxt”.

The experience I’ve had with libxml2’s documentation has clarified an opinion I’ve been forming for a while. Documentation generators encourage the writing of pointless pieces of fluff just for the sake of completeness, and such magic comments have no place in the source code. Here’s a prime example from libxml2’s source:

/**
 * XML_DEFAULT_VERSION:
 *
 * The default version of XML used: 1.0
 */
#define XML_DEFAULT_VERSION     "1.0"

To ensure that XML_DEFAULT_VERSION turns up in the generated documentation, five additional lines have been wasted. That sounds trivial, but when each and every #define, function definition, typedef, struct declaration and so on adds a number of lines, the code blows out significantly. The less code that fits on a screen, the more code must be kept in the programmer’s head.

In a language like C, having the low-level documentation (documentation for individual functions, types, &c.) extracted into a manual damages the readability of the source, which should be the primary concern when writing code. (Languages with proper docstrings, like some Lisp dialects and Python, are somewhat forgiven because the documentation string is exposed at runtime, so it needs to be part of the source.) I found that when writing headers for MudCore, I would write out the interface first, with related functions grouped together. In a second pass, I added documentation comments. Inserting the magic comments pushed apart definitions that belonged together, making the code less clear and wasting valuable screen space.
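As a rough illustration of the first-pass layout described above, here is a sketch of an interface with related declarations kept together under one short comment per group. The buffer type and every function name are hypothetical; none of this is MudCore code. The definitions follow the declarations only so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; the interface fits on one screen. */
struct buffer {
    char *data;
    size_t len;
    size_t cap;
};

/* Lifecycle. */
void buffer_init(struct buffer *b);
void buffer_free(struct buffer *b);

/* Appending: 0 on success, -1 if memory runs out. */
int buffer_append(struct buffer *b, const void *src, size_t n);
int buffer_append_str(struct buffer *b, const char *s);

/* Inspection: the data is not NUL-terminated. */
size_t buffer_length(const struct buffer *b);
const char *buffer_data(const struct buffer *b);

void buffer_init(struct buffer *b)
{
    assert(b != NULL);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

void buffer_free(struct buffer *b)
{
    free(b->data);
    buffer_init(b);
}

int buffer_append(struct buffer *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > b->cap - b->len) {
        /* Double the capacity until the new bytes fit. */
        size_t cap = b->cap ? b->cap : 16;
        while (cap - b->len < n) {
            if (cap > SIZE_MAX / 2)
                return -1;
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

int buffer_append_str(struct buffer *b, const char *s)
{
    return buffer_append(b, s, strlen(s));
}

size_t buffer_length(const struct buffer *b) { return b->len; }
const char *buffer_data(const struct buffer *b) { return b->data; }
```

With a gtk-doc style comment block in front of each of the six declarations, the same interface would no longer fit on a screen, and the grouping that makes it easy to scan would be lost.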

Banishing documentation generators means that the source code belongs only to the developers and the program that parses it; it is no longer beholden to the documentation tool. The API documentation must still be written and maintained, of course, but now the developer can lay out the code to be readable on its own merits (except for occasional hairy details).