Sun Jan 6 09:49:58 EST 2013

Imprisoned by the Haskell Toolchain

I was going to write a new version of metscrape in C that supported plugins, so people could contribute modules for their local weather services. C is still one of the best ways to go for portable programs, and a plugin system means you don't need to build in support for unused countries.
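
The C side of a plugin system like that is small. Here is a minimal sketch using POSIX dlopen(3) and dlsym(3). It assumes a hypothetical plugin interface in which every plugin exports a function called metscrape_plugin_init that returns 0 on success. metscrape defines no such interface; the name and signature are made up for illustration.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plugin interface: each plugin exports a function with
 * this name and signature, returning 0 on success. */
typedef int (*plugin_init_fn)(void);
#define PLUGIN_INIT_SYMBOL "metscrape_plugin_init"

/* Load the shared object at `path` and run its init hook.
 * Returns the dlopen handle on success. Returns NULL, after printing
 * a message to stderr, if the file can't be loaded, doesn't export
 * the hook, or the hook reports failure. */
void *load_plugin(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "load_plugin: %s\n", dlerror());
        return NULL;
    }

    void *sym = dlsym(handle, PLUGIN_INIT_SYMBOL);
    if (sym == NULL) {
        fprintf(stderr, "load_plugin: %s lacks %s\n", path, PLUGIN_INIT_SYMBOL);
        dlclose(handle);
        return NULL;
    }

    /* dlsym returns void *. Copying it into a function pointer with
     * memcpy avoids the object-to-function pointer cast that ISO C
     * leaves undefined (POSIX guarantees the sizes match). */
    plugin_init_fn init;
    memcpy(&init, &sym, sizeof init);

    if (init() != 0) {
        fprintf(stderr, "load_plugin: %s failed to initialise\n", path);
        dlclose(handle);
        return NULL;
    }
    return handle;
}
```

Unused countries then cost nothing: their plugins simply aren't installed. On glibc versions before 2.34, link with -ldl.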

But.

I wanted to write my plugins in Haskell, mainly because HXT is peerless when it comes to slicing and dicing XML.

So.

The Haskell FFI is pretty good. Actually, it's one of the best I've used: you mark functions with a "foreign export" declaration, and there are facilities for all the low-level fiddling you need, like stable pointers that keep objects safe from the garbage collector.

However.

The toolchain support is terrible. Basically, ghc wraps gcc. If you want to compile a mixed Haskell/C library (and you probably do, to expose the correct entry points, call hs_init() and handle other low-level details), you have to compile your .c files with ghc. That means it won't play nice with automake or anything else that does dependency tracking via -M or -MM: automake wants to invoke $(CC) to compile C files, possibly through libtool if you're building a shared library or module. Further, the link step doesn't even pull in all the libraries it needs, so you have to add them yourself.

Modern ghc supports shared libraries, but Debian doesn't ship them, so you can't rely on the dynamic linker to sort it out for you.

There's no command akin to pkg-config to give you the right cflags/ldflags to pass to the compiler.
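
What such a command provides is tiny: it prints the flags a client needs at compile time and at link time. A minimal sketch of the core of a foo-config program in C, for a hypothetical library libfoo under a hypothetical /usr/local prefix. A real one would have configure substitute the install paths instead of hard-coding them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical install locations. In a real package these would be
 * filled in at build time from $(includedir) and $(libdir). */
#define FOO_INCLUDEDIR "/usr/local/include"
#define FOO_LIBDIR     "/usr/local/lib"

/* Map one foo-config option to the flags it prints, or NULL if the
 * option isn't recognised. */
const char *foo_config_flags(const char *option)
{
    if (strcmp(option, "--cflags") == 0)
        return "-I" FOO_INCLUDEDIR;
    if (strcmp(option, "--libs") == 0)
        return "-L" FOO_LIBDIR " -lfoo";
    return NULL;
}

/* Body of foo-config's main(): print the flags for every option in
 * argv on a single line, space-separated. Returns the exit status. */
int foo_config_main(int argc, char **argv, FILE *out)
{
    const char *sep = "";
    for (int i = 1; i < argc; i++) {
        const char *flags = foo_config_flags(argv[i]);
        if (flags == NULL) {
            fprintf(stderr, "foo-config: unknown option %s\n", argv[i]);
            return 1;
        }
        fprintf(out, "%s%s", sep, flags);
        sep = " ";
    }
    fputc('\n', out);
    return 0;
}
```

A client build then needs no knowledge of where libfoo lives: it compiles with cc $(foo-config --cflags) -c main.c and links with cc main.o $(foo-config --libs).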

So you lose proper dependency tracking, and everything else automake would have done for you. Ugh. Have fun reimplementing all the targets the GNU Coding Standards require of a Makefile.

The ultimate problem is that people insist on rolling their own sucky versions of build systems and package managers. (Though cabal and ghc --make suck less than most, I'll admit.)

Lessons.

  1. IF YOU DIDN'T SET OUT TO WRITE A BUILD TOOL, DON'T WRITE YOUR OWN BUILD TOOL. Choose one or more of the following, instead:
    1. Emit make-format dependency information. UPDATE: It turns out ghc can do this. Unfortunately, it can't emit dependencies as a side-effect of compilation, which is what automake really likes.
    2. Emit C and write a suffix rule. TA-DA! You now play nice with the rest of the world. Once you've got the native-code backend going, you can emit dependency information (as above) without breaking the world. Use a sensible deprecation policy.
    3. Provide a foo-config script or program that will give you the correct compiler and linker flags.
  2. HAVE FLAGS SO YOU EMIT EXACTLY ONE OUTPUT FILE. This is a corollary to the above point. make(1) is a dinosaur, but it's everywhere and you have to play nice with it. Don't spew out half a dozen files each time you call the compiler (ocamlc, I'm looking at you), because then you have to be really careful with your make rules or you'll break parallel make. A "do everything at once" mode is fine for use from an interactive shell, but this is the age of multicore: you don't get to break parallel builds to save a couple of compiler invocations.
  3. DON'T WRITE YOUR OWN PACKAGE MANAGER. If your code plays well with automake (and it should: write some autofoo to help find paths &c.; it's not hard), installing things is really easy. Every major distribution has tooling to streamline turning autotooled packages into distro packages. Want to install in a custom prefix? Let the package manager do it for you. (What? Your package manager sucks and doesn't let you do this? Fix it, and everyone's ecosystem benefits!) Dishonourable mention: rubygems.

    When I passed around a draft of this post, one reviewer asked whether I seriously expected compiler writers to learn the autotools just to play nice with them. My answer is a resounding YES. The autotools aren't that hard to learn, and there's a fantastic tutorial to learn from. If you're smart enough to write a compiler, you're smart enough to learn how to make it play nice with the rest of the world.

  4. DON'T WRAP THE TOOLCHAIN. You're not the C compiler. You don't get to compile someone's C code. If lang1 and lang2 both wrap the toolchain and a developer is writing a lang1<->lang2 bridge, they're forced to use Nasty Hacks(tm) to make the wrappers play nice with each other. libtool gets a pass here because it's so entrenched and automake supports it natively. You're new. You don't get that excuse. Honourable mention: Python. Its python-config script lets programs that embed the Python interpreter build with the correct flags.
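
Lesson 1.1 asks for make-format dependency information emitted as a side effect of compilation, which gcc does with -MD -MP. The file format itself is trivial once a compiler knows which files it read. A minimal sketch of a writer for it; write_depfile is a hypothetical helper, not part of any real compiler, and it assumes the caller supplies the list of files read with the main source first.

```c
#include <stddef.h>
#include <stdio.h>

/* Write a make-format dependency rule for `target` to `out`, in the
 * same shape as gcc -MD -MP output:
 *
 *     target: dep0 dep1 dep2
 *
 *     dep1:
 *
 *     dep2:
 *
 * deps[0] is taken to be the main source file. Every other dependency
 * also gets an empty rule, so deleting a header doesn't leave make
 * with "no rule to make target" errors. Filenames are written
 * verbatim: this sketch does not escape spaces or '$'.
 * Returns 0 on success, -1 on an output error. */
int write_depfile(FILE *out, const char *target,
                  const char *const deps[], size_t ndeps)
{
    if (fprintf(out, "%s:", target) < 0)
        return -1;
    for (size_t i = 0; i < ndeps; i++)
        if (fprintf(out, " %s", deps[i]) < 0)
            return -1;
    if (fputc('\n', out) == EOF)
        return -1;

    /* Empty rules for everything except the main source file. */
    for (size_t i = 1; i < ndeps; i++)
        if (fprintf(out, "\n%s:\n", deps[i]) < 0)
            return -1;
    return ferror(out) ? -1 : 0;
}
```

Because the compiler writes this file alongside the object it was already producing, make and automake can pick it up with an include, with no extra compiler invocation.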

Posted by Jack Kelly | Permanent link | File under: rants, coding