I believe that many open source ecosystems have projects like Amazonka: projects which are large, important to their ecosystem, and stuck. These are my notes about how to unstick such a project, using Amazonka as a case study. It’s a fair amount of work, but a surprising amount of help can come out of the woodwork once someone makes the first move. That person could be you, and if it’s not you, then who’s it gonna be?
I started in almost the worst possible way, by barging onto the
Hackage Trustees’ issue tracker and asking to do a non-maintainer upload
of the whole amazonka-*
package family. While this was a
pretty rude thing to do, it at least got me and Brendan talking.
After that, I instead tried doing the actual work: looking for issues to close or PR, and classifying open PRs as “needs fixes”, “should merge”, or “should close”. When the project still has a somewhat responsive maintainer, helping to tame the bugtracker is a great way to move from being an occasional PR author and instead become an actual member of the project team. The biggest difficulty with resurrecting a large stuck project like Amazonka is working out what actually needs to be done: issues can become irrelevant with age, some things should be deferred until after the big cleanup, and the whole thing becomes a big unappealing tangle which the maintainer never quite gets to. Fixing this means making the maintainer’s job as easy as possible; my approach was to write more rather than less, and bring enough detail together that Brendan should be able to say “yep, closed” or “yep, merged”.
My experience with Amazonka and other Haskell libraries is that
maintainers tend to care deeply about their code, feel very responsible
for it, but are usually time-poor. Nearly every time I’ve showed up
willing to do the work, people have bent over backwards to accommodate
it. The best thing you can do as a contributor, then, is make the
maintainer’s job as easy as possible by putting effort into commit
messages, CHANGELOG.md
entries, PR descriptions, and so on.
A time-poor maintainer will have a much easier time approving things if
you provide all the necessary context.
But sometimes maintainers just aren’t able to get the work done, for any number of reasons. Burnout is a real problem, they could be working in other languages, in remote locations, or dealing with a major life event. In those instances you will need to take on more responsibility. My recommendation: take the minimum additional responsibility you need to get the job done, because that ruffles the fewest feathers. A handover or co-maintainership is better than a fork with maintainer blessing, which is much better than a hostile fork. I started opening PRs and posting recommendations to issues in early 2021. In September 2021 there was a flare-up on the issue tracker of the “just fork it, it’s never going to get done” type, so I suggested that Brendan lay out a road map and appoint additional maintainers. He offered me collaborator access, I accepted, and then the real work began.
The issue tracker is the map to the next release, and a map is no
good if it doesn’t match the territory. After being made collaborator, I
tagged every open issue and pull request (PR) with a new
needs triage
label, and read/triaged/split/closed all of
them. This sounds intimidating, and it is, but it’s quite doable. Having
the label made it easy to get a list of all untriaged items, and I would
go through 5–10 issues over breakfast each morning instead of reading
junk online. This proved to be extremely useful, and I recommend it to
anyone picking up a dormant project. It gave me a handle on how the
library worked, what was on Brendan’s road-map, the pain points
affecting real users, and so on.
For this reason, I consider stalebots harmful and recommend against summarily closing old issues until you’ve thought about them properly and understood exactly what’s been reported. Each comment, issue, and PR against a dormant project exists because somebody cared enough to write it up, and that’s worth taking seriously.
After reading and triaging the issues and PRs, I had some sense of what things people needed, was able to cluster them into rough groupings, make guesses at what looked easier or harder, and start working through them in batches. This makes learning a large project much less intimidating, because you can go and learn how (say) service overrides work, clear off a bunch of those issues, and then move on to something else like request signing. A full mental model of the project then develops over time. (In hindsight, it may be possible to acquire this mental model more rapidly by using something like reflexion models.)
Cleaning up the issue tracker had a surprising side-effect: a few
people noticed the increased activity and came out of the woodwork to
submit features they’d developed for their own use. A few major features
came via PR, such as support for AWS SSO Identity Center and
the sts:AssumeRoleFromWebIdentity
API call (used to assume
IAM roles via OpenID
Connect, as well as from other identity providers).
These PRs often worked well, but sometimes lacked understanding of Amazonka’s architectural direction because that context was never written down. In these cases, I had to politely request a total rewrite, which is feedback that must be delivered carefully. I therefore put extra effort into these reviews to spell out the library’s architecture and make the contributors’ job as easy as possible. In almost every instance, the authors happily rewrote and resubmitted their PRs, even though the rewrites were a decent amount of work. I’m very grateful for their contributions, and for their flexibility.
To confidently make a final release, we needed more people using it. Many people were already using relatively recent versions of Amazonka from git, but we wanted people to move from their private forks to our version. The first release candidate was announced at the end of November 2021, to give people (especially industrial users) a chance to test it with real workloads.
As with maintainers, I think it’s really important to be respectful of your users’ time, and we needed to make it easy for them to try the release candidate. We therefore provided instructions for how to import Amazonka from git, for both Cabal and Stack. (For similar reasons, we also made sure to provide a migration guide from the 1.6.1 to the 2.0 final release.)
This had the desired effect — it brought a lot of reports out of the woodwork. Most said “yes, this is working great for us”, but it also caused a welcome flurry of bug reports. The proposed four-week stabilisation period turned out to be wildly optimistic, and it wasn’t until July 2023 that we were able to announce a second release candidate.
I would’ve preferred a smaller gap between RC1 and RC2, but it turned
out that there was a lot more work required, and some of it required
long stretches of focused work. One example: to bring
amazonka
’s authentication support in line with official AWS
SDKs, we needed to add support for unsigned requests to AWS and support
several new authentication methods. Doing this properly and in an
extensible way required a thorough rework of the authentication
subsystem.
This release would never have happened if not for the support and contributions of a great many people. Here is a partial list:
I never expected to be the one to do this: I got into cloud relatively recently, and Amazonka initially looked too intimidating to tackle. But work started leaning into more AWS-specific offerings, and the need for a better SDK became more pressing. I also completed my AWS Certified Solutions Architect — Associate certificate, and became the one on the team with strong AWS and Haskell knowledge. On top of that, some close friends got in my ear, saying things like, “you care a lot about Haskell and about cloud. Your community needs this, and you have the skills to do it. If it’s not you, then who’s it gonna be?”
And that’s my challenge to you. Find a stuck project that’s important to your part of the ecosystem, and see if you can unstick it. Because if it’s not you, then who’s it gonna be?
As Half-Life modding matured, some really interesting inventions appeared. MetaMod was a C++ framework that interposed itself between the server binary and the actual mod DLL, allowing you to inject custom behaviour into an existing mod. I didn’t understand enough C++ to write MetaMod plugins, but that didn’t matter: AMX Mod and later AMX Mod X let you write custom plugins using a simpler C-style language called Pawn (known back then as “Small”). This enabled an explosion of ways for operators to tweak their game servers: quality-of-life improvements for players, reserved player slots for members, and delightfully bonkers gameplay changes. I remember having my mind blown the first time I stumbled upon a game of CS with a class-based perks system, inspired by Warcraft 3, and that was just one instance of the creativity that came from the AMX(X) modding scenes.
And with the Half-Life-specific background covered, we are now ready to talk about NS: Combat and my gloriously dumb contribution to the AMXX world.
The original release of NS was hard to enjoy at low player counts. It was balanced for 6v6, so confining one marine to the command chair hurt the marine team a lot. This was also before the era of server-side match-making, so if nobody was around you’d join your local (often ISP-provided) game server and hang out, hoping enough people would come online to get a good game going.
To address these problems, the NS team added a simpler alternative mode called “combat” as part of the mod’s 2.0 release. Combat maps were much smaller and removed the resource-gathering and RTS elements in favour of a much simpler goal: the marines had to destroy the alien hive, and the aliens had to destroy the (unoccupied) command chair. With the resource system removed, players instead earned XP and levels for kills and assists, and could spend those levels on upgrades, advanced morphs (aliens), or weapons and equipment (marines).
Combat was perhaps too successful: it was designed as a lightweight
substitute for the real game, for when you didn’t have a lot of players.
But it quickly overtook classic NS in popularity and stayed that way for
the rest of the mod’s lifespan. Of course, AMXX modders extended the
combat mode in all kinds of broken ways; the main one raised the level
cap beyond 10 and added additional upgrades to spend those levels on. It
was colloquially known as “xmenu”, because it added a
/xmenu
player command, opening a menu of new upgrades to
spend those additional levels on.
But I liked NS for the buildings! To me, that was what made NS
special. Since I could code well enough to write AMXX plugins, I added
them to the combat game mode. The Combat Buildings plugin gave players a
new /buildmenu
command that let them spend their levels to
place structures.
The release was surprisingly controversial. Some people rather liked it, but the people who hated it really hated it. One of the ModNS forum moderators (in a long-deleted post, sadly) called it “the most ridiculous concept I have ever seen on these fora”. And here is the maddest my code has ever made anyone:
But as absolutely terrible as /xmenu is, /buildmenu is the god damned devil. Buildmenu is an abomination upon the lord that is causing the universe to unravel and all heretics who follow the terribleness that is buildmenu shall perish in hell. I’d like to give a big thanks to whoever created /buildmenu for making THE WORST COMBAT PLUGIN EVER.
You’re welcome.
I was very taken aback when I first saw this comment, but these days I cherish it. It reminds me of one of the first times my code had a big impact on a community. Enough people liked it that I made the final versions of Combat Buildings integrate with other plugins, allowing servers where the aliens could build on walls and ceilings, or allowing players to build in the custom marine vs. marine and alien vs. alien game modes. I loved the feeling of making a game play by my rules, of building on others’ work, of being part of a community and swapping knowledge, and of making cool (dumb) stuff happen just because I willed it. Those feelings don’t ever get old, and are a big reason why I still love hacking on things.
The meme words have become an annoying blot on the fringes of the Haskell universe. Learning resources don’t mention it, the core Haskell community doesn’t like it because it adds little and spooks newcomers, and it’s completely unnecessary to understand it if you just want to write Haskell code. But it is interesting, and it pops up in enough cross-language programming communities that there’s still a lot of curiosity about the meme words. I wrote an explanation on reddit recently, it became my highest-voted comment overnight, and someone said that it deserved its own blog post. This is that post.
This is not a monad tutorial. You do not need to read this, especially if you’re new to Haskell. Do something more useful with your time. But if you will not be satisfied until you understand the meme words, let’s proceed. I’ll assume knowledge of categories, functors, and natural transformations.
“A monad is a monoid in the category of endofunctors” is not specific enough. Let’s fill in the details and specialise it to Haskell monads, so that we build towards a familiar typeclass:
“Haskell monads are monoid objects in the monoidal category of endofunctors on Hask, with functor composition as the tensor.”
Let’s first practice looking for monoid objects in a monoidal category that’s very familiar to Haskell programmers: Hask, the “category” where the objects are Haskell types and the morphisms are functions between the types. (I use scare quotes because we quietly ignore ⊥).
We will first explore the following simpler claim about monoids, and come back to monads:
“Haskell monoids are monoid objects in the monoidal
category Hask, with
(,)
as the tensor.”
We will need the categorical definition of bifunctors to define monoidal categories, and we’ll need product categories to define bifunctors:
Definition 1: The product of two categories is called a product category. If C and D are categories, their product is written C × D and is a category where:
the objects are pairs (c, d) of an object c from C and an object d from D; and
the morphisms (c, d) → (c′, d′) are pairs (f, g) of a morphism f : c → c′ in C and a morphism g : d → d′ in D, with identities and composition defined componentwise.
Definition 2: A bifunctor is a functor whose domain is a product category.
In Haskell, we tend to only think about bifunctors Hask × Hask → Hask,
as represented by class Bifunctor
:
class (forall a. Functor (p a)) => Bifunctor p where
  bimap :: (a -> b) -> (c -> d) -> p a c -> p b d
  -- other methods omitted

-- Uncurrying bimap and adding parens for clarity:
bimap' :: Bifunctor p => (a -> b, c -> d) -> (p a c -> p b d)
bimap' (f, g) p = bimap f g p
bimap
and bimap'
are equivalent, and you
can see how bimap'
maps a morphism from Hask × Hask
to a morphism in Hask.
We use bimap
because it is more ergonomic to program
with.
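As a quick illustration (my example, not from the original post), bimap applies one function to each side of a pair:

```haskell
import Data.Bifunctor (bimap)

-- Apply (+1) to the first component and reverse to the second.
example :: (Int, String)
example = bimap (+1) reverse (41, "olleh")
-- example == (42, "hello")
```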
Aside 3: Iceland_Jack
has an unofficial plan to unify the various functor typeclasses
using a general categorical interface, which has the potential to
subsume a lot of ad-hoc typeclasses. If done in a backwards-compatible
way, it would be extremely cool.
Exercise 4: Show that Either
is a
bifunctor on Hask × Hask → Hask,
by giving it a Bifunctor
instance.
{-# LANGUAGE InstanceSigs #-}

instance Functor (Either x) where
  fmap :: (a -> b) -> Either x a -> Either x b
  fmap _ (Left x) = Left x
  fmap f (Right a) = Right (f a)

instance Bifunctor Either where
  bimap :: (a -> b) -> (c -> d) -> Either a c -> Either b d
  bimap f _ (Left a) = Left (f a)
  bimap _ g (Right b) = Right (g b)
Exercise 5: Show that (,)
is a
bifunctor on Hask × Hask → Hask,
by giving it a Bifunctor
instance.
{-# LANGUAGE InstanceSigs #-}

instance Functor ((,) x) where
  fmap :: (a -> b) -> (x, a) -> (x, b)
  fmap f (x, a) = (x, f a)

instance Bifunctor (,) where
  bimap :: (a -> b) -> (c -> d) -> (a, c) -> (b, d)
  bimap f g (a, b) = (f a, g b)
The definition of monoidal category also relies on the definition of natural isomorphism, so let’s define and discuss them.
Definition 6: If F and G are functors from C to D, a natural isomorphism is a natural transformation η : F ⇒ G where η is an isomorphism for every object c in C.
If you are used to the Haskell definition of “natural transformation”, you might be wondering what this “for every object” business is about:
{-# LANGUAGE RankNTypes, TypeOperators #-}
type f ~> g = forall a. f a -> g a
In Haskell, we use parametrically-polymorphic functions as natural
transformations between endofunctors on Hask. This is a stronger
condition than the categorical definition requires, where a natural
transformation is a collection of morphisms in the target
category, indexed by objects in the source category. The
equivalent in Haskell would be like being able to choose one function
for f Int -> g Int
and another for
f Bool -> g Bool
(subject to conditions).
I have been told that internalising the Haskell version of natural transformations may leave you unable to prove certain results in category theory, but I don’t know which ones. I know that it’s because you may find yourself trying to construct a parametrically-polymorphic function instead of just a natural transformation.
For today’s purposes, we can say that
nt :: f a -> g a
is a natural isomorphism if it has an
inverse unnt :: g a -> f a
.
Counter-Example 7:
listToMaybe :: [a] -> Maybe a
is a natural
transformation but not a natural isomorphism, because it is not
invertible.
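For a positive example (mine, not from the post): the functors ((,) ()) and Identity each hold exactly one a, so converting between them loses no information and forms a natural isomorphism:

```haskell
import Data.Functor.Identity (Identity (..))

-- nt and unnt are mutually inverse at every type a,
-- so nt is a natural isomorphism.
nt :: ((), a) -> Identity a
nt ((), a) = Identity a

unnt :: Identity a -> ((), a)
unnt (Identity a) = ((), a)
```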
We are now ready to define monoidal categories.
Definition 8: A monoidal category is a triple (C, ⊗, I) where:
C is a category;
⊗ is a bifunctor C × C → C called the tensor product;
I is an object of C called the identity object;
Natural isomorphisms and coherence conditions showing that ⊗ is associative and that I is its left and right identity:
α : − ⊗ (−⊗−) ⇒ (−⊗−) ⊗ −, standing for αssociator, with components αA, B, C : A ⊗ (B⊗C) ≅ (A⊗B) ⊗ C;
(− ⊗ (−⊗−) denotes the functor C × C × C → C that sends (A, B, C) to A ⊗ (B ⊗ C).)
λ : 1C ⇒ (I⊗−), standing for λeft unitor, with components λA : A ≅ I ⊗ A;
(1C is the identity functor on C.)
ρ : 1C ⇒ (−⊗I), standing for ρight unitor, with components ρA : A ≅ A ⊗ I; and
The coherence conditions have nice diagrams at the Wikipedia definition.
We can now say that (Hask, (,)
,
()
) is a monoidal category:
Hask is a “category”;
(,)
has a Bifunctor
instance, so it’s a
bifunctor from Hask × Hask → Hask;
()
is a type, so it is an object in Hask;
We can write parametric functions to demonstrate the natural isomorphisms (the coherence conditions come for free, from parametricity):
assoc :: (a, (b, c)) -> ((a, b), c)
unassoc :: ((a, b), c) -> (a, (b, c))
left :: a -> ((), a)
unleft :: ((), a) -> a
right :: a -> (a, ())
unright :: (a, ()) -> a
Exercise 9: Implement these natural isomorphisms.
assoc :: (a, (b, c)) -> ((a, b), c)
assoc (a, (b, c)) = ((a, b), c)

unassoc :: ((a, b), c) -> (a, (b, c))
unassoc ((a, b), c) = (a, (b, c))

left :: a -> ((), a)
left a = ((), a)

unleft :: ((), a) -> a
unleft ((), a) = a

right :: a -> (a, ())
right a = (a, ())

unright :: (a, ()) -> a
unright (a, ()) = a
Exercise 10: Show that (Hask, Either
,
Void
) is a monoidal category.
Hask is a “category”;
Either
has a Bifunctor
instance, so
it’s a bifunctor from Hask × Hask → Hask;
Void
is a type, so it is an object in Hask;
We can write parametric functions to demonstrate the natural isomorphisms (the coherence conditions come for free, from parametricity):
import Data.Void

assoc :: Either a (Either b c) -> Either (Either a b) c
assoc (Left a)          = Left (Left a)
assoc (Right (Left b))  = Left (Right b)
assoc (Right (Right c)) = Right c

unassoc :: Either (Either a b) c -> Either a (Either b c)
unassoc (Left (Left a))  = Left a
unassoc (Left (Right b)) = Right (Left b)
unassoc (Right c)        = Right (Right c)

left :: a -> Either Void a
left = Right -- It puts the identity (Void) on the left

unleft :: Either Void a -> a
unleft (Left v)  = absurd v
unleft (Right a) = a

right :: a -> Either a Void
right = Left

unright :: Either a Void -> a
unright (Left a)  = a
unright (Right v) = absurd v
Remark: The assoc
package defines class Bifunctor p => Assoc p
, with
assoc
/unassoc
methods.
Now that we have some monoidal categories, we can go looking for monoid objects. Let’s define them:
Definition 11: A monoid object in a monoidal category (C, ⊗, I) is a triple (M, μ, η) where:
M is an object of C;
μ : M ⊗ M → M is a morphism called the multiplication; and
η : I → M is a morphism called the unit,
such that μ is associative and η is its left and right identity (up to the category’s α, λ, and ρ).
What are the monoid objects in the monoidal category (Hask, (,)
,
()
)? To associate morphisms (functions) with an object
(type), we use a typeclass; the type variable m
identifies
M, and the rest is
substitution:
class MonoidObject m where
  mu :: (m, m) -> m
  eta :: () -> m
If you squint, you might be able to see why this is
class Monoid
in disguise: mu
is uncurried
(<>)
, and eta
is mempty
(laziness makes m
equivalent to the function
() -> m
).
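To make the disguise concrete, here is a sketch of my own (the ViaMonoid newtype is hypothetical, not from the post): every Monoid gives a MonoidObject.

```haskell
-- The class from the post, repeated so this block is self-contained:
class MonoidObject m where
  mu :: (m, m) -> m
  eta :: () -> m

-- Hypothetical wrapper witnessing that any Monoid is a monoid
-- object in (Hask, (,), ()):
newtype ViaMonoid m = ViaMonoid m deriving (Eq, Show)

instance Monoid m => MonoidObject (ViaMonoid m) where
  mu (ViaMonoid x, ViaMonoid y) = ViaMonoid (x <> y) -- uncurried (<>)
  eta () = ViaMonoid mempty                          -- mempty
```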
Exercise 12: What are the monoid objects in the monoidal category (Hask, Either, Void)?
2023-02-10 EDIT: The previous solution here was wrong, and has been replaced. Thanks to James Cranch for the correction.
In any cocartesian monoidal category (i.e., a category using the coproduct as the tensor), every object is a monoid object in a boring way. To see this in Hask, write out the class and instance definitions:
class MonoidObjectE m where
  mu :: Either m m -> m
  eta :: Void -> m

instance MonoidObjectE m where
  mu = either id id
  eta = absurd
Hask^Hask
Now we will do it all again, starting with the category of endofunctors on Hask. This category is sometimes written Hask^Hask, because of the connection between functions a → b and exponentials b^a. Since we don’t have to set up all the definitions, we can move faster. We describe a category by identifying its objects and its morphisms, so for Hask^Hask:
the objects are functors from Hask to Hask; and
the morphisms are natural transformations between those functors.
To turn Hask^Hask into a monoidal category, we need to consider bifunctors from Hask^Hask × Hask^Hask to Hask^Hask, and to do that, we need to consider what a functor from Hask^Hask to Hask^Hask would look like.
A functor sends objects to objects and morphisms to morphisms, and
for the sake of analogy let’s look back on functors from Hask to Hask. As Haskell
programmers, we represent them with type constructors of kind
Type -> Type
to fit our chosen domain and codomain, and
we use a typeclass to map morphisms (functions):
-- The one from `base`, plus a kind annotation:
class Functor (f :: Type -> Type) where
  -- Parens added for clarity
  fmap :: (a -> b) -> (f a -> f b)
So for endofunctors on Hask^Hask, we need a type constructor that turns an argument of kind (Type -> Type) into (Type -> Type). This means we need an alternate version of class Functor:
class Functor2 (t :: (Type -> Type) -> (Type -> Type)) where
  fmap2 :: (forall x. f x -> g x) -> (t f a -> t g a)
Remark: This is very close to class MFunctor
from package mmorph
, but MFunctor
identifies functors on the category of Haskell monads, which is a
stricter condition.
Similarly, we will need to identify bifunctors from Hask^Hask × Hask^Hask to Hask^Hask, with an alternate version of class Bifunctor:
class (forall f. Functor2 (t f)) =>
    Bifunctor2 (t :: (Type -> Type) -> (Type -> Type) -> (Type -> Type)) where
  bimap2 ::
    (forall x. p x -> q x) ->
    (forall x. r x -> s x) ->
    (t p r a -> t q s a)
So we need to find monoid objects in a monoidal category of endofunctors. That means we need to identify a bifunctor and identity object for our monoidal category. We will use functor composition as our tensor and the identity functor as our identity object:
-- From Data.Functor.Compose in base
newtype Compose f g a = Compose { getCompose :: f (g a) }
-- From Data.Functor.Identity in base
newtype Identity a = Identity { runIdentity :: a }
Exercise 13: Show that the composition of two
functors is a functor, by writing
instance (Functor f, Functor g) => Functor (Compose f g)
instance (Functor f, Functor g) => Functor (Compose f g) where
  fmap f = Compose . (fmap . fmap) f . getCompose
Exercise 14: Show that Compose is a bifunctor from Hask^Hask to itself by writing Functor2 and Bifunctor2 instances.
instance Functor x => Functor2 (Compose x) where
  fmap2 fg = Compose . fmap fg . getCompose

instance (forall x. Functor2 (Compose x)) => Bifunctor2 Compose where
  bimap2 pq rs = Compose . pq . getCompose . fmap2 rs
Exercise 15: Write out and implement the natural isomorphisms, showing that (Hask^Hask, Compose, Identity) is a monoidal category.
assoc :: Functor f => Compose f (Compose g h) a -> Compose (Compose f g) h a
assoc = Compose . Compose . fmap getCompose . getCompose

unassoc :: Functor f => Compose (Compose f g) h a -> Compose f (Compose g h) a
unassoc = Compose . fmap Compose . getCompose . getCompose

left :: f a -> Compose Identity f a
left = Compose . Identity

unleft :: Compose Identity f a -> f a
unleft = runIdentity . getCompose

right :: Functor f => f a -> Compose f Identity a
right = Compose . fmap Identity

unright :: Functor f => Compose f Identity a -> f a
unright = fmap runIdentity . getCompose
We are now ready to answer the question posed by the meme words: what are the monoid objects in the monoidal category (Hask^Hask, Compose, Identity)?
The monoid objects are objects in Hask^Hask, so they are functors; we will write our typeclass with a Functor superclass constraint. mu is a natural transformation from Compose m m to m, and eta is a natural transformation from Identity to m:
class Functor m => MonoidInTheCategoryOfEndofunctors m where
  mu :: Compose m m a -> m a
  eta :: Identity a -> m a
If we unwrap the newtypes, we see that eta
is
effectively
eta' :: MonoidInTheCategoryOfEndofunctors m => a -> m a
,
which is pure
from class Applicative
as well
as the old return
from class Monad
. Similarly,
mu
is effectively
mu' :: MonoidInTheCategoryOfEndofunctors m => m (m a) -> m a
,
better known as
join :: Monad m => m (m a) -> m a
.
And there we have it: Haskell’s monads are the monoid objects in the
monoidal category of endofunctors on Hask, with
Compose
as the tensor. Haskell uses
(>>=)
in class Monad
for historical
reasons, and because having join
in
class Monad
breaks
-XGeneralizedNewtypeDeriving
.
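As a concrete sketch (my example, not from the post): Maybe inhabits the class, with mu playing the role of join and eta the role of pure.

```haskell
import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))

-- The class from the post, repeated so this block is self-contained:
class Functor m => MonoidInTheCategoryOfEndofunctors m where
  mu :: Compose m m a -> m a
  eta :: Identity a -> m a

-- For Maybe, mu collapses the two layers (join) and eta injects (pure):
instance MonoidInTheCategoryOfEndofunctors Maybe where
  mu (Compose (Just (Just a))) = Just a
  mu (Compose _)               = Nothing
  eta (Identity a)             = Just a
```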
Exercise 16: Show that join
and
(>>=)
are equivalent, by implementing them in terms
of each other.
join :: Monad m => m (m a) -> m a
join = (>>= id)

(>>=) :: Monad m => m a -> (a -> m b) -> m b
m >>= f = join $ f <$> m
Now that we’ve looked at the meme words properly, we see that the selection of tensor is extremely important. What happens if we choose a different one?
Exercise 17: Consider the tensor data Product f g a = Product (f a) (g a), essentially Data.Functor.Product from base (which names the constructor Pair). What is the identity object I that makes (Hask^Hask, Product, I) a monoidal category? Write out the types and implement the natural isomorphisms assoc, left, and right, and describe the monoid objects in this category.
The identity object is Proxy
, defined
in base
:
data Proxy a = Proxy
Proxy
plays a similar role to ()
— we don’t
want to add or remove any information when we write out the unitors, and
you can think of Proxy
as a functor
containing zero “a
”s.
instance Functor2 (Product x) where
  fmap2 fg (Product x f) = Product x (fg f)

instance (forall x. Functor2 (Product x)) => Bifunctor2 Product where
  bimap2 pq rs (Product p r) = Product (pq p) (rs r)

assoc :: Product f (Product g h) a -> Product (Product f g) h a
assoc (Product f (Product g h)) = Product (Product f g) h

unassoc :: Product (Product f g) h a -> Product f (Product g h) a
unassoc (Product (Product f g) h) = Product f (Product g h)

left :: f a -> Product Proxy f a
left f = Product Proxy f

unleft :: Product Proxy f a -> f a
unleft (Product _ f) = f

right :: f a -> Product f Proxy a
right f = Product f Proxy

unright :: Product f Proxy a -> f a
unright (Product f _) = f
As before, the requirements on monoid objects lead us to write a typeclass:
class Functor m => MonoidObject (m :: Type -> Type) where
  eta :: Proxy a -> m a
  mu :: Product m m a -> m a
The Proxy argument to eta contains no information, so it’s equivalent to zero :: m a. By unpacking Product m m a and currying mu, we find (<!>) :: m a -> m a -> m a. We have rediscovered class Plus from semigroupoids. (It is not class Alternative from base, because that has an Applicative superclass.)
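For instance (my sketch, with the Product and MonoidObject definitions repeated so it stands alone): lists are a monoid object under Product, recovering zero = [] and (<!>) = (++).

```haskell
import Data.Proxy (Proxy (..))

-- Repeated definitions, matching the post's Exercise 17:
data Product f g a = Product (f a) (g a)

class Functor m => MonoidObject m where
  eta :: Proxy a -> m a
  mu :: Product m m a -> m a

-- Lists: eta is the empty list, mu is concatenation.
instance MonoidObject [] where
  eta Proxy          = []
  mu (Product xs ys) = xs ++ ys
```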
Exercise 18: Repeat Exercise 17 for covariant Day
convolution, given by the tensor
data Day f g a = forall b c. Day (f b) (g c) (b -> c -> a)
from Data.Functor.Day
in package kan-extensions
.
instance Functor2 (Day x) where
  fmap2 fg (Day x f bca) = Day x (fg f) bca

instance (forall x. Functor2 (Day x)) => Bifunctor2 Day where
  bimap2 pq rs (Day p r bca) = Day (pq p) (rs r) bca

assoc :: Day f (Day g h) a -> Day (Day f g) h a
assoc (Day f (Day g h dec) bca) =
  Day (Day f g (,)) h $ \(b, d) e -> bca b (dec d e)

unassoc :: Day (Day f g) h a -> Day f (Day g h) a
unassoc (Day (Day f g bce) h eda) =
  Day f (Day g h (,)) $ \b (c, d) -> eda (bce b c) d

left :: f a -> Day Identity f a
left f = Day (Identity ()) f (flip const)

unleft :: Functor f => Day Identity f a -> f a
unleft (Day b f bca) = bca (runIdentity b) <$> f

right :: f a -> Day f Identity a
right f = Day f (Identity ()) const

unright :: Functor f => Day f Identity a -> f a
unright (Day f c bca) = flip bca (runIdentity c) <$> f
class Functor m => MonoidObject (m :: Type -> Type) where
  mu :: Day m m a -> m a
  eta :: Identity a -> m a
To turn Day mb mc f into an m a, we need to apply f across mb and mc:

mu (Day mb mc f) = f <$> mb <*> mc
mu is liftA2ing f, and applicative functors are monoid objects in the monoidal category (Hask^Hask, Day, Identity). eta is pure, like it was for Monads.
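A concrete sketch (mine, using the post's Day definition, repeated here so the block stands alone): Maybe's Applicative instance gives the monoid structure directly.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Functor.Identity (Identity (..))

-- The Day type from the post, repeated for self-containment:
data Day f g a = forall b c. Day (f b) (g c) (b -> c -> a)

-- mu is liftA2 of the stored function; eta is pure.
mu :: Day Maybe Maybe a -> Maybe a
mu (Day mb mc f) = f <$> mb <*> mc

eta :: Identity a -> Maybe a
eta = Just . runIdentity
```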
What’s the point of working through all these definitions? Even though I said “you do not need to read this”, I think there’s still a payoff. What we have here is a method for generating abstractions: start with a monoidal category that’s related to Hask in some way, turn the handle, and a typeclass comes out. If the typeclass has interesting instances, rewrite it into an ergonomic interface.
We can also start reversing arrows and seeing what else falls out.
There is a contravariant
form of Day convolution and if you follow that line of thought far
enough, you get contravariant forms
of Applicative
and Alternative
. I once
tried abstracting
over the covariant and contravariant versions of these classes to
make an abstraction unifying parsers and pretty-printers, but did not
get far. Ed Kmett used Divisible
(contravariant Applicative
) and Decidable
(contravariant Alternative
) to build discrimination
,
a library of fast
generic sorting/joining functions.
We can also look for the same patterns in different categories.
Benjamin Pizza Hodgson has a great article about functors
from (k -> Type)
to Type
, describing a
pattern that appears in the hkd
,
rank2classes
,
Conkin
,
and barbies
packages.
Sometimes there is no payoff, or the payoff is not immediately
obvious. We found no interesting monoid objects in (Hask, Either
,
Void
), and trying to write out a class for comonoids
doesn’t look fruitful, because we can trivially write an instance for
any type:
class Comonoid m where
  comempty :: m -> ()
  comappend :: m -> (m, m)

instance Comonoid a where
  comempty _ = ()
  comappend m = (m, m)
But comonoids suddenly become a lot more interesting when you have linear
arrows — class Dupable
is the typeclass for comonoids in linear Haskell.
And all that makes me think the meme words have some use after all, but not as a way to understand deep secrets of the Haskell universe. I think instead that they are one way to learn one tool in one part of the category-theoretic toolbox.
E. Rivas and M. Jaskelioff, “Notions of Computation as Monoids”
Writing recursive functions requires a lot of tacit knowledge in selecting the recursion pattern to use, which variables to recurse over, etc. Recursion was not immediately obvious to industry professionals, either: I remember an errata card that came with TI Extended Basic for the Texas Instruments TI 99/4A which mentioned that later versions of the cartridge removed the ability for subprograms to call themselves, because they thought it was not useful and mostly done by accident.
I want to share a recipe that helped my students write their first recursive functions. There are three steps in this recipe: write out concrete examples of the function at work, rewrite those examples in terms of each other, and then generalise the examples into defining equations by introducing variables.
Worked examples and some teaching advice after the jump.
product :: [Int] -> Int
Suppose we are asked to write a function
product :: [Int] -> Int
that multiplies a list of
numbers together. Begin by writing out several examples of what the
function should actually do:
product [2, 3, 4, 5] = 2 * 3 * 4 * 5 = 120
product [3, 4, 5] = 3 * 4 * 5 = 60
product [4, 5] = 4 * 5 = 20
product [5] = 5
There is a bit of an art to selecting the initial examples, so here are a few tips:
The shape of the data
definition heavily influences
the shape of the recursion. Because this function must recurse over cons
lists, we choose example inputs with similar tails.
It’s not usually necessary to choose big examples: three or four elements are usually enough.
Choose distinct values in each part of the data structure, so it’s clear which sub-parts need to align.
Avoid elements that behave strangely with respect to the function
you’re writing. It’s tempting to use the list [1, 2, 3, 4]
,
but the fact that 1 * x == x
means that we could confuse
ourselves a bit. I chose the list [2, 3, 4, 5]
instead.
Next, rewrite the examples in terms of each other:
product (2:3:4:5:[]) = 2 * 3 * 4 * 5 = 2 * product (3:4:5:[])
product (3: 4:5:[]) = 3 * 4 * 5 = 3 * product ( 4:5:[])
product (4: 5:[]) = 4 * 5 = 4 * product ( 5:[])
product (5: []) = 5 = 5
Notes:
If lists are involved, desugar them to make the correspondence more obvious.
Aligning similar elements vertically often helps, but it’s more helpful to align them by their position in the data structure instead of by their value. In this example, I’ve put all the list heads into the same column. This makes it easier to see where to introduce variables.
Finally, generalise over the first and last columns by introducing variables:
         x ,--xs--.                     x            ,--xs--.
         | |      |                     |            |      |
product (2:3:4:5:[]) = 2 * 3 * 4 * 5 = 2 * product (3:4:5:[])
product (3: 4:5:[]) = 3 * 4 * 5 = 3 * product ( 4:5:[])
product (4: 5:[]) = 4 * 5 = 4 * product ( 5:[])
product (5: []) = 5 = 5 * product []
product (x: xs) = = x * product xs
Notes:
Rewriting the 5:[]
example into
5 * product []
makes it fit the pattern of all of our other
examples, which allows us to generalise over all our examples with a
single equation.
Knowing that 5 * product [] = 5
tells us that
product []
must be 1
.
We now have enough information to write out a function definition:
product :: [Int] -> Int
product [] = 1
product (x:xs) = x * product xs
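As a sanity check, here is the finished definition as a runnable program (renamed `product'` to avoid clashing with the Prelude), evaluated against the worked examples:

```haskell
-- Check the derived definition against the examples above.
product' :: [Int] -> Int
product' []     = 1
product' (x:xs) = x * product' xs

main :: IO ()
main = do
  print (product' [2, 3, 4, 5]) -- 120, matching the first example
  print (product' [5])          -- 5
  print (product' [])           -- 1, the base case we deduced
```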
treeSum :: Tree Int -> Int
Suppose we are asked to write
treeSum :: Tree Int -> Int
, given the following
definition of binary trees:
data Tree a = Nil | Node a (Tree a) (Tree a)
As before, write out an example, and generate equations from its sub-parts:
-- For the tree:
--
-- 4
-- / \
-- 3 x
-- / \
-- / \
-- 1 2
-- / \ / \
-- x x x x
--
treeSum (Node 4 (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) Nil) = 4 + 3 + 1 + 2
treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil))              = 3 + 1 + 2
treeSum (Node 1 Nil Nil)                                        = 1
treeSum (Node 2 Nil Nil)                                        = 2
Then, line up the examples and rewrite them in terms of each other:
treeSum (Node 4 (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) Nil) = 4 + 3 + 1 + 2
treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil))              = 3 + 1 + 2
treeSum (Node 1 Nil Nil)                                        = 1
treeSum (Node 2 Nil Nil)                                        = 2

-- Rewritten:
treeSum (Node 4 (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) Nil) = 4 + treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil))
treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil))              = 3 + treeSum (Node 1 Nil Nil) + treeSum (Node 2 Nil Nil)
treeSum (Node 1 Nil Nil)                                        = 1
treeSum (Node 2 Nil Nil)                                        = 2
Nothing seems to line up! The problem is that the example isn’t
complicated enough to give us a complete picture, so we could try
drawing another tree with more nodes and working through that. But the equation
for treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil))
contains a big hint, as it’s adding three things together: the node
value, the treeSum
of the left subtree, and the
treeSum
of the right subtree. We can force the other
equations for Node
s into the right shape by adding
+ 0
a few times, and that gives a pretty big hint that
treeSum Nil
should be equal to 0
:
treeSum (Node 4 (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) Nil) = 4 + treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) + 0
treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil))              = 3 + treeSum (Node 1 Nil Nil) + treeSum (Node 2 Nil Nil)
treeSum (Node 1 Nil Nil)                                        = 1 + 0 + 0
treeSum (Node 2 Nil Nil)                                        = 2 + 0 + 0
treeSum Nil                                                     = 0

-- Rewritten:
treeSum (Node 4 (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) Nil) = 4 + treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) + treeSum Nil
treeSum (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil))              = 3 + treeSum (Node 1 Nil Nil) + treeSum (Node 2 Nil Nil)
treeSum (Node 1 Nil Nil)                                        = 1 + treeSum Nil + treeSum Nil
treeSum (Node 2 Nil Nil)                                        = 2 + treeSum Nil + treeSum Nil
treeSum Nil                                                     = 0
Complete the process by generalising over all of the examples with variables:
treeSum :: Tree Int -> Int
treeSum (Node n left right) = n + treeSum left + treeSum right
treeSum Nil                 = 0
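Bundled into a runnable program, the definition agrees with the example tree we drew (4 + 3 + 1 + 2 = 10):

```haskell
data Tree a = Nil | Node a (Tree a) (Tree a)

treeSum :: Tree Int -> Int
treeSum Nil                 = 0
treeSum (Node n left right) = n + treeSum left + treeSum right

-- The example tree from the diagram above.
example :: Tree Int
example = Node 4 (Node 3 (Node 1 Nil Nil) (Node 2 Nil Nil)) Nil

main :: IO ()
main = print (treeSum example) -- 4 + 3 + 1 + 2 = 10
```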
map :: (a -> b) -> [a] -> [b]
Suppose we are asked to write the classic map
function
over lists. Since the input function and the element type are not known,
use placeholders when generating examples:
map f [a, b, c] = [f a, f b, f c]
map f [b, c] = [f b, f c]
map f [c] = [f c]
map f [] = []
Desugar, align, and rewrite the equations in terms of each other, and finish by introducing variables:
map f (a:b:c:[]) = f a : f b : f c : [] = f a : map f (b:c:[])
map f (b: c:[])  = f b : f c : []       = f b : map f ( c:[])
map f (c:  [])   = f c : []             = f c : map f (  [])
       |  |   |                                  |     |   |
       x  `-xs-'                                 x     `-xs-'

map f (x : xs)   = f x : map f xs
map _ []         = []
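Again as a check, here is the finished definition as a runnable program (renamed `map'` to avoid the Prelude clash), applied to a concrete function and element type:

```haskell
-- The derived definition, checked with a concrete f and element type.
map' :: (a -> b) -> [a] -> [b]
map' f (x:xs) = f x : map' f xs
map' _ []     = []

main :: IO ()
main = do
  print (map' (* 2) [1, 2, 3 :: Int]) -- [2,4,6]
  print (map' show ([] :: [Int]))     -- []
```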
This technique is useful but limited; larger data structures quickly become too unwieldy for it to work. But it seems to really help new Haskell programmers “get” recursion and bootstrap their skills and confidence. While it’s fine to show an example or two for students to crib from (at first), something about asking students to physically handle a pen and write it all out seems to make it sink in a lot better.
]]>This post is about the connection between the uniplate
operation and optics. This will be old news to advanced
lens
users, but I think it’s worth pointing out. The
uniplate
package’s original
uniplate :: Uniplate a => a -> ([a], [a] -> a)
is
an early attempt at a “traversal” optic, properly expressed in
lens
by plate :: Plated a => Traversal' a a
.
The uniplate
library provides low-boilerplate ways to
query and rewrite self-similar data structures; the
uniplate
function from class Uniplate a
is its
fundamental operation. Let’s look at the original definition, from the
2007 paper Uniform
Boilerplate and List Processing:
class Uniplate a where
uniplate :: a -> ([a], [a] -> a)
-- An example data type and instance
data Expr = Lit Int | Negate Expr | Add Expr Expr
instance Uniplate Expr where
  uniplate (Lit i)     = ([], \[] -> Lit i)
  uniplate (Negate e)  = ([e], \[e'] -> Negate e')
  uniplate (Add e1 e2) = ([e1, e2], \[e1', e2'] -> Add e1' e2')
uniplate
extracts from a value of type T
any immediate children of type T
, and provides a function
to reassemble the original structure with new children. From this, we
can define operations like transform
, which applies a
function everywhere it can be applied, in a bottom-up way:
transform :: Uniplate a => (a -> a) -> a -> a
transform f a = f . rebuild $ map (transform f) as
  where (as, rebuild) = uniplate a
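To see `transform` at work, here is a self-contained sketch using the example `Expr` type, with a rewrite rule that folds `Negate` applied to literals (the `negLit` rule is my own toy example, not from the paper):

```haskell
class Uniplate a where
  uniplate :: a -> ([a], [a] -> a)

data Expr = Lit Int | Negate Expr | Add Expr Expr
  deriving Show

instance Uniplate Expr where
  uniplate (Lit i)     = ([], \[] -> Lit i)
  uniplate (Negate e)  = ([e], \[e'] -> Negate e')
  uniplate (Add e1 e2) = ([e1, e2], \[e1', e2'] -> Add e1' e2')

-- Apply f everywhere it can be applied, bottom-up.
transform :: Uniplate a => (a -> a) -> a -> a
transform f a = f . rebuild $ map (transform f) as
  where (as, rebuild) = uniplate a

-- Push negation into literals wherever the rule matches.
negLit :: Expr -> Expr
negLit (Negate (Lit n)) = Lit (negate n)
negLit e                = e

main :: IO ()
main = print (transform negLit (Add (Negate (Lit 1)) (Negate (Negate (Lit 2)))))
-- Bottom-up rewriting turns Negate (Negate (Lit 2)) into Lit 2.
```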
Look closely at the type of the uniplate
operation: it
extracts [a]
from a structure, and provides a function to
assign a new [a]
into a structure. This is exactly what a
get/set lens does:
-- As a record:
data GetSetLens s a = GetSetLens
  { get :: s -> a
  , set :: s -> a -> s
  }
-- As a type alias for a tuple:
type GetSetLens s a = (s -> a, s -> a -> s)
-- Factor out the common 's' parameter:
type GetSetLens s a = s -> (a, a -> s)
class Uniplate a where
uniplate :: GetSetLens a [a]
The example Uniplate
instance shows us that this lens
requires careful use: we must return a list of exactly the same length
as the one we are given. Now that we’ve noticed a connection between
Uniplate
and lenses, is there a better optic we could use?
Yes — traversals are optics that focus zero or more targets, so we could
rebuild the uniplate
library on top of an operation that
provides a Traversal' a a
. This is what lens
does with Control.Lens.Plated
:
class Plated a where
plate :: Traversal' a a
If you are unable to define a Plated
instance on a type
(e.g., you do not want to introduce an orphan instance on a type you do
not own), lens
also provides a helper,
uniplate :: Data a => Traversal' a a
. Interestingly,
lens
also provides a partsOf
combinator which collects the foci of an optic into a list:
-- Usable as:
partsOf :: Iso' s a -> Lens' s [a]
partsOf :: Lens' s a -> Lens' s [a]
partsOf :: Traversal' s a -> Lens' s [a]
partsOf :: Fold s a -> Getter s [a]
partsOf :: Getter s a -> Getter s [a]
-- The real type signature:
partsOf :: Functor f => Traversing (->) f s t a a -> LensLike f s t [a] [a]
Its haddock even says that it “resembles an early version of the
uniplate
(or biplate
) type” and that “you
really should try to maintain the invariant of the number of children in
the list”.
And that brings us full circle; we can get a van Laarhoven version of
our original uniplate
lens using Data.Data.Lens.uniplate
:
Data.Data.Lens.uniplate         :: Data a => Traversal' a a
partsOf Data.Data.Lens.uniplate :: Data a => Lens' a [a]
This is one of my favourite things about programming in Haskell: seeing that library authors have carefully refined concepts like “view the self-similar children of a structure” into ever more powerful and composable forms, and being able to notice the different stages in that evolution.
]]><conio.h>
, and he spent the rest of
the term building and tweaking a small text-mode dungeon crawler.
Many new Haskellers make it through initial material (everything up
to and including the Monad
typeclass, let’s say), write a
couple of “Hello, world!”-tier projects that use the IO
type, but struggle to make the jump to industrial libraries and/or find
projects that excite them. I think text-mode games can grow very
smoothly alongside a programmer learning a new language, so here’s some
thoughts on how to get started, how you might extend a game, and some
advice for Haskell specifically.
A text-mode dungeon crawler can start very small. My friend began with a core encounter loop, which was very much like a Pokémon battle: the player was placed into combat with a monster, given a choice between attacking and fleeing, and repeated this loop until either the player ran off or one defeated the other. You could imagine it looking something like:
There is a goblin in front of you.
You can ATTACK or RUN. What do you do?
[HP 98/100]> attack
You hit the goblin for 5 damage!
The goblin hits you for 7 damage!
There is a goblin in front of you.
You can ATTACK or RUN. What do you do?
[HP 91/100]> run
Okay, coward! See you later.
In Haskell, we might manually pass state between all our functions, and that state could be as simple as:
data GameState = GameState
  { playerHP  :: Int
  , monsterHP :: Int
  }
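A first cut of the combat update can then be a pure function over this state. The damage numbers here are made up; a real game would roll them randomly:

```haskell
data GameState = GameState
  { playerHP  :: Int
  , monsterHP :: Int
  } deriving Show

-- One round of combat with fixed, made-up damage numbers.
attackRound :: GameState -> GameState
attackRound s = s { monsterHP = monsterHP s - 5, playerHP = playerHP s - 7 }

main :: IO ()
main = print (attackRound (GameState { playerHP = 98, monsterHP = 20 }))
```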
Once this is working, there are a lot of ways to extend it. Some ideas of things to add:
Character generation:
Randomness. Pretty much anything can be made more interesting with randomness:
Fight a gauntlet of monsters, until the player runs out of HP.
Have the player visit a town between fights. This makes the game switch between (at least) two modes: fighting and shopping.
Items:
Skills and Spells:
Have more types of things (monsters, items, spells, &c.).
Maps:
On the Haskell side, your goal should be to keep things as simple as
possible. A big ball of IO
with do
-expressions
everywhere is completely fine if it keeps you hacking on and
extending your game. Don’t look at the dizzying array of advanced
Haskell features, libraries, and techniques; wait until what you have
stops scaling and only then look for solutions. Still, some
Haskell-specific ideas might be helpful:
Start by passing your GameState
in and out of
functions manually. When this gets annoying, look at structuring your
game around a StateT GameState IO
monad.
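Here is what that restructuring might look like, using the `transformers` package that ships with GHC (helper names like `damageMonster` are just illustrative):

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, execStateT, get, modify)

data GameState = GameState
  { playerHP  :: Int
  , monsterHP :: Int
  } deriving Show

type Game = StateT GameState IO

-- A pure state update, easy to test on its own.
damageMonster :: Int -> GameState -> GameState
damageMonster n s = s { monsterHP = monsterHP s - n }

-- State is threaded implicitly; I/O is still available via lift.
attack :: Game ()
attack = do
  modify (damageMonster 5)
  s <- get
  lift (putStrLn ("You hit the monster! Its HP is now " ++ show (monsterHP s)))

main :: IO ()
main = do
  final <- execStateT attack (GameState 100 20)
  print final
```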
If plain StateT starts to chafe (you get tired of writing
lift, or maybe you want to test stateful computations that
don’t need to do I/O), consider mtl and structuring your
program around MonadState GameState m constraints.
When your “ball of IO
mud” gets too big to handle,
start extracting pure functions from it. Once you have some
IO
actions and some pure functions, that’s a great time to
practice using the Functor
, Applicative
and
Monad
operators to weave the two worlds together.
It can be worth running hlint at this point, as its suggestions are
designed to help you recognise common patterns:
-- Actual hlint output
Found:
do x <- m
pure (g x)
Perhaps:
do g <$> m
When you start wanting tests, try the tasty library to organise tests
into groups, and tasty-hunit for actual unit tests.
A “command parser” like this is more than enough at first:
playerCommand :: GameState -> IO GameState
playerCommand s = do
  putStrLn "What do you do?"
  line <- getLine
  case words line of
    ["attack"] -> attack s
    ["run"]    -> run s
    _          -> do
      putStrLn "I have no idea what that means."
      playerCommand s
Later on, you might want to parse to a concrete command type. This gives you a split like:
data Command = Attack | Run
parseCommand :: String -> Maybe Command
getCommand :: IO (Maybe Command) -- uses 'parseCommand' internally
runCommand :: Command -> GameState -> IO GameState
Even later on, you might want to use a parser combinator library to parse player commands.
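A minimal sketch of that split (just the `Command` parsing half, leaving `runCommand` to your game):

```haskell
data Command = Attack | Run deriving (Show, Eq)

-- Pure parsing: easy to test without running the game loop.
parseCommand :: String -> Maybe Command
parseCommand line = case words line of
  ["attack"] -> Just Attack
  ["run"]    -> Just Run
  _          -> Nothing

main :: IO ()
main = do
  print (parseCommand "attack") -- Just Attack
  print (parseCommand "dance")  -- Nothing
```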
When your command lines become complicated, that might be a good
time to learn the haskeline
library. You can then add
command history, better editing, and command completion to your game’s
interface.
Reading from data files doesn’t need fancy parsing either. Colon-separated fields can get you a long way — here’s how one might configure a list of monsters:
# Name:MinHP:MaxHP:MinDamage:MaxDamage
Goblin:2:5:1:4
Ogre:8:15:4:8
The parsing procedure is really simple:
Skip blank lines and comment lines beginning with '#'.
Split each remaining line on ':'.
Convert the fields to their final types (perhaps with traverse).
You might eventually want to try reading your configuration from JSON
files (using aeson), Dhall files, or an SQLite
database.
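A sketch of that parsing procedure over the monster file above, staying within `base` (the `Monster` record and its field names are hypothetical):

```haskell
import Text.Read (readMaybe)

-- Hypothetical record for the monster lines shown above.
data Monster = Monster
  { mName   :: String
  , mMinHP  :: Int
  , mMaxHP  :: Int
  , mMinDmg :: Int
  , mMaxDmg :: Int
  } deriving Show

-- A hand-rolled split, to avoid pulling in a dependency.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

parseMonster :: String -> Maybe Monster
parseMonster line = case splitOn ':' line of
  [n, a, b, c, d] ->
    Monster n <$> readMaybe a <*> readMaybe b <*> readMaybe c <*> readMaybe d
  _ -> Nothing

-- Drop comments and blanks, then parse every remaining line.
-- traverse fails the whole file if any single line fails.
parseMonsters :: String -> Maybe [Monster]
parseMonsters = traverse parseMonster . filter keep . lines
  where keep l = not (null l) && take 1 l /= "#"

main :: IO ()
main = print (parseMonsters "# Name:MinHP:MaxHP:MinDamage:MaxDamage\nGoblin:2:5:1:4\nOgre:8:15:4:8")
```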
If passing your configuration everywhere becomes annoying, think
about adding a ReaderT Config
layer to your monad
stack.
Ignore the String
vs. Text
vs. ByteString
stuff until something makes you care.
String
is fine to get started, and when it gets annoying
(e.g., you start using libraries that work over Text
, which
most of them do), turn on OverloadedStrings
and switch your
program over to use Text
.
A bit of colour can give a game — even a text-mode one — a lot of “pop”.
If you’re already using Text, try the
safe-coloured-text library to add a bit of colour.
Don’t worry about lens
; just use basic record
syntax. Once you get frustrated by the record system, look at using
GHC’s record extensions like DuplicateRecordFields
,
NamedFieldPuns
and RecordWildCards
.
Only then reach for lens, and only as much as you need to
view/modify/update nested records in an ergonomic way. Remember, the
point is to keep moving!
A project like this can grow as far as you want, amusing you for a weekend or keeping you tinkering for years. Text-mode games are an exceptionally flexible base on which to try out new languages or techniques. Start small, enjoy that incremental progress, and use the problems you actually hit to help you choose what to learn about.
]]>Scripting a larger program is one of the few areas where Haskell
struggles. Despite some very impressive efforts like dyre
, I
think it’s a bit much to require a working Haskell toolchain and a “dump
state, exec, load state” cycle just to make a program scriptable. This
post discusses why Lua is a great
scripting runtime for compiled programs, its shortcomings as a
scripting language, how Fennel addresses many of these
shortcomings, and demonstrates a Haskell program calling Fennel code
which calls back into Haskell functions.
Lua is a weakly-typed imperative programming language designed to be embedded into larger programs. It has a lot of attractive features:
Many of these features are inherent to the runtime and not the language. Which is useful, because the language has some undesirable features:
for example, unless you declare a variable with local when
assigning it, you set a global variable.
Fennel is a Lisp which compiles
to Lua, and draws some syntactic inspiration from Clojure. For example,
the classic factorial
function in Fennel:
(fn factorial [n]
  (match n
    0 1
    _ (* n (factorial (- n 1)))))
Would compile to this Lua code:
local function factorial(n)
local _1_ = n
if (_1_ == 0) then
return 1
elseif true then
local _ = _1_
return (n * factorial((n - 1)))
else
return nil
end
end
return factorial
The language has been designed to smoothly interoperate with existing Lua code, while also providing convenience features you’d expect from a Lisp (destructuring binds, macros, etc.).
The compiler is provided in two forms: an ahead-of-time compiler
which translates .fnl
files to .lua
; and a
runtime compiler that can hook itself into Lua’s package search mechanism
.
HsLua
(old site) is a fully-featured set of
Haskell bindings to Lua. The most recent versions bundle Lua 5.4, so it
is both mature and up-to-date. The lua
package implements low-level FFI bindings to Lua’s C API, but the hslua
package is probably the one you want. It provides idiomatic wrappers
for the low-level functions as well as re-exports from all the other
hslua-*
packages, creating an all-in-one import for most
common cases. (Use Hoogle to
find out which package actually defines a function or type.)
Our goal is to put these pieces together in a way that demonstrates
how a larger program might use an embedded interpreter: a Haskell
program with a Lua runtime which can load Fennel files, calling back
into Haskell functions. Almost all of our work will be performed inside
a Lua
monad, which creates and destroys an interpreter for
us.
The first task is to implement a Lua module in Haskell. Since computing large factorials is the only thing Haskell is any good at, let’s export that capability to Lua:
import qualified HsLua as L
-- | The 'L.DocumentedFunction' machinery is from "hslua-packaging";
-- we can provide to Lua any function returning @'LuaE' e a@, so long
-- as we can provide a 'Peeker' for each argument and a 'Pusher' for
-- each result.
factorial :: L.DocumentedFunction e
factorial =
  L.defun "factorial"
    ### L.liftPure (\n -> product [1 .. n])
    <#> L.integralParam "n" "input number"
    =#> L.integralResult "factorial of n"
    #? "Computes the factorial of an integer."
    `L.since` makeVersion [1, 0, 0]
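The pure core being wrapped is just `product [1 .. n]`; at `Integer`, it computes large factorials exactly, with no overflow:

```haskell
-- The pure function exported to Lua, on its own.
factorialCore :: Integer -> Integer
factorialCore n = product [1 .. n]

main :: IO ()
main = print (factorialCore 20) -- 2432902008176640000
```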
-- | Also using "hslua-packaging", this registers our
-- (single-function) module into Lua's @package.preload@ table,
-- setting things up such that the first time
-- @require('my-haskell-module')@ is called, the module will be
-- assembled, stored in @package.loaded['my-haskell-module']@ and
-- returned.
--
-- This lazy loading can help with the startup time of larger programs.
--
-- /See:/ http://www.lua.org/manual/5.4/manual.html#pdf-require
registerHaskellModule :: Lua ()
registerHaskellModule =
  L.preloadModule
    L.Module
      { L.moduleName = "my-haskell-module",
        L.moduleDescription = "Functions from Haskell",
        L.moduleFields = [],
        L.moduleFunctions = [factorial],
        L.moduleOperations = []
      }
To add Fennel to our Lua runtime, we need to download and unpack a Fennel tarball, use the file-embed
library to store fennel.lua
inside our Haskell binary (it
is a mere 200K, less if compressed), load it, and install it:
fennelLua :: ByteString
fennelLua = $(embedFile "fennel.lua")
-- | Load our embedded copy of @fennel.lua@ and register it in
-- @package.searchers@.
registerFennel :: Lua ()
registerFennel = do
  L.preloadhs "fennel" $ L.NumResults 1 <$ L.dostring fennelLua
  -- It's often easier to run small strings of Lua code than to
  -- manipulate the runtime's stack with the C API.
  void $ L.dostring "require('fennel').install()"
Fennel versions older than 1.2.1 ask you to install the searcher manually:
registerFennel :: Lua ()
registerFennel = do
  L.preloadhs "fennel" $ L.NumResults 1 <$ L.dostring fennelLua
  void $ L.dostring
    "local fennel = require('fennel');\
    \table.insert(package.searchers, fennel.searcher)"
We also need some Fennel code to run. We want to be able to change
which factorials we compute without rebuilding all the Haskell, so
fennel-demo.fnl
imports our Haskell module and builds a
table containing a sequence of factorials:
(local hs (require :my-haskell-module))
(local factorials [])
(for [i 1 10]
  (table.insert factorials (hs.factorial i)))
{ :factorials factorials }
main
is all that’s left. Populate a Lua runtime and ask
it for our value, then bring it across to Haskell and print it out:
main :: IO ()
main = do
  luaRes <- L.run $ do
    -- Add Lua's (tiny) standard library
    L.openlibs
    registerHaskellModule
    registerFennel
    -- Call into our fennel module and return a value from it
    L.dostring "local f = require('fennel-demo'); return f.factorials"
    -- From hslua-classes: use the Peekable typeclass to unmarshal and
    -- pop the sequence left on the stack
    L.popValue
  print (luaRes :: [Int])
If you want to see it running for yourself, the code is at https://git.sr.ht/~jack/hslua-fennel-demo .
I’d previously played around with embedding Lua as a scripting language in my MudCore project, but the limitations of the language made me disinclined to actually build something on top of the core. Fennel is an interesting little language that’s a lot more appealing to me, and I’m keen to find a use for it to write some scriptable Haskell programs.
I’m also pretty impressed by the thought that’s gone into the HsLua libraries: there are a lot of facilities that make the language boundary fairly convenient to cross, and it doesn’t take long to get a sense of which package actually provides the tool you’re looking for.
]]>Let’s look at some specialisations of functions that use the
Foldable instance for Maybe:
-- This is reimplemented all over the place as `whenJust`.
-- Pass our `a` to the function, if we have one,
-- and ignore its result; return `pure ()` otherwise.
for_        :: (Foldable t, Applicative f) => t a -> (a -> f b) -> f ()
for_ @Maybe :: Applicative f => Maybe a -> (a -> f b) -> f ()
-- Equivalent to `Data.Maybe.fromMaybe mempty`:
-- Return the `m` if we have one; otherwise, return `mempty`.
fold        :: (Foldable t, Monoid m) => t m -> m
fold @Maybe :: Monoid m => Maybe m -> m
-- Equivalent to `maybe mempty`:
-- Pass our `a` to our function, if we have one;
-- otherwise, return `mempty`.
foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
foldMap @Maybe :: Monoid m => (a -> m) -> Maybe a -> m
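All three specialisations are easy to play with in a few lines:

```haskell
import Data.Foldable (fold, for_)

main :: IO ()
main = do
  for_ (Just "hello") putStrLn            -- prints hello
  for_ (Nothing :: Maybe String) putStrLn -- prints nothing
  print (fold (Just [1, 2, 3 :: Int]))    -- [1,2,3]
  print (fold (Nothing :: Maybe [Int]))   -- []
  print (foldMap show (Just (42 :: Int))) -- "42"
```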
Some of these confuse people more than I think they should, so this
post aims to help with that. Instead of looking at Maybe a
as “just-a
-or-nothing”, the key is to become comfortable
with Maybe
as “list of zero or one elements”. We’ll also go
looking for other types which can be seen as “lists of some number of
elements”.
In Haskell, it’s often not practical or ergonomic to track exact
lengths of lists at the type level. Let’s instead reflect on some
ancient wisdom, and think about lists that have at {least,most}
{zero,one,infinitely many} elements. There are six sensible cases, and
most of them exist in base
:
Proxy
is a list of exactly zero elements,Maybe
is a list of exactly zero or one elements,[]
(list) is a list of at least zero and at most
infinity elements,Identity
is a list of exactly one element,NonEmpty
is a list of at least one and at most infinity elements, andThe “zero/one/infinity” principle comes from Dutch computer pioneer Professor Willem van der Poel, and is preserved on the C2 Wiki via the Jargon File:
Allow none of foo, one of foo, or any number of foo. … The logic behind this rule is that there are often situations where it makes clear sense to allow one of something instead of none. However, if one decides to go further and allow N (for N > 1), then why not N+1? And if N+1, then why not N+2, and so on? Once above 1, there’s no excuse not to allow any N; hence, infinity.
Having paid respects to our forefathers, let us now examine each data type in turn, in an order that lets us use our existing intuitions about lists.
NonEmpty
The NonEmpty
type is defined in
Data.List.NonEmpty
:
data NonEmpty a = a :| [a]
This has been in base
since base-4.9.0.0
(GHC 8.0.1). Knowing that your list has at least one element gives you a
lot of power:
A whole slew of functions which may fail on []
are
much safer on NonEmpty
: head
,
tail
, minimum
, maximum
,
foldr1
, foldl1
, &c.
All of the Foldable
operations can work over
Semigroup
instead of Monoid
, as witnessed by
the Data.Semigroup.Foldable.Foldable1
class in the semigroupoids
package:
foldMap1 :: (Foldable1 t, Semigroup g) => (a -> g) -> t a -> g
Similarly, functions from Data.Foldable
that needed
Applicative
or Alternative
have weakened
versions that only need Apply
or Alt
,
respectively.
You get a Comonad instance, if you’re into that sort of thing.
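A quick demonstration of the safety you gain (using `Data.List.NonEmpty` from `base`):

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

main :: IO ()
main = do
  let xs = 3 :| [1, 4, 1, 5] :: NonEmpty Int
  print (NE.head xs) -- total, unlike Data.List.head
  print (maximum xs) -- safe: at least one element always exists
```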
It’s not in base
, but it’s easy to write a type
representing an infinite stream of a
s, which is the same as
[]
with the “nil” case removed:
data Stream a = Stream a (Stream a)
Since you have at least one element, you get a lot of the same
things as NonEmpty
: safe
head
/tail
/…, instances for
Foldable1
and Comonad
, &c.
You have to be careful that the semigroups/monoids you use when folding are lazy enough to terminate.
For take :: Integer -> Stream a -> [a], you gain the
property that length (take n s) == n for any non-negative
n, which is cute.
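A minimal version of such a `Stream`, with a stream-flavoured `take` (here named `takeS`, and over `Int` for simplicity) exhibiting that property:

```haskell
data Stream a = Stream a (Stream a)

-- Always returns exactly n elements (for n >= 0): there is no
-- "ran out of list" case to worry about.
takeS :: Int -> Stream a -> [a]
takeS n (Stream x xs)
  | n <= 0    = []
  | otherwise = x : takeS (n - 1) xs

-- An infinite stream of the natural numbers.
nats :: Stream Integer
nats = go 0 where go n = Stream n (go (n + 1))

main :: IO ()
main = print (takeS 5 nats) -- [0,1,2,3,4]
```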
Maybe
data Maybe a = Nothing | Just a
Maybe a
is conventionally taught as a better answer to
NULL
— “either you have the thing, or you don’t” — but it’s
also valid to consider Maybe a
a list of exactly zero or
one a
s.
This is a really useful perspective to stick in your mind, especially
when writing do
-expressions: it shows that “do I have this
thing? if so, do x
” and “do a thing with each element of
the collection (a for
-each
loop)” are in fact
the same concept.
Identity
The Identity
type is defined in
Data.Functor.Identity
:
newtype Identity a = Identity { runIdentity :: a }
This has been in base
since base-4.8.0.0
(GHC 7.10.1). While it’s not a very exciting type on its own, it’s
useful as a “add no special structure” option when you need to provide a
“thing” of kind Type -> Type
.
Proxy
The Proxy
type is defined in
Data.Proxy
:
data Proxy a = Proxy
This has been in base
since base-4.7.0.0
(GHC 7.8.1). It was first introduced as a safe way to pin polymorphic
functions to specific types. For example, servant-server
uses it in the serve
function to select the type of the API being served:
serve :: HasServer api '[] => Proxy api -> Server api -> Application
Proxy
has (trivial) instances for Foldable
,
Traversable
, and all kinds of other typeclasses, which
makes it useful as a “do nothing” option if you need to provide a
“thing” of kind Type -> Type
.
Foldables
Once you’re used to thinking of all these types as lists, you can
parameterise your structures over some Foldable
and get
some useful results. Here is a recent example from my work on amazonka
,
the Haskell AWS SDK:
To make a request to AWS, you need to provide an environment, which almost always contains credentials used to sign requests. (It’s possible to exchange a web identity token for temporary credentials, which involves an unsigned request.) We seek a solution with the following properties:
1. Type-level information about whether or not we have credentials,
2. Library users should statically know whether their Env has credentials or not,
3. Library users should statically know whether a function requires credentials or is indifferent to their presence, and
4. Not too many type system extensions.
A Maybe Auth
inside Env
would satisfy none
of the first three properties. The solution currently in
amazonka
looks something like this:
Parameterise the Env
by some
withAuth :: Type -> Type
, and set up type aliases:
data Env' withAuth = Env {
auth :: withAuth Auth
-- other fields omitted
}
type Env = Env' Identity
type EnvNoAuth = Env' Proxy
Return Env
from functions which guarantee the
presence of credentials, and EnvNoAuth
for functions which
lack them. This gives us property (2).
In function arguments, specify the environment we want as follows:
accept Env
where we require credentials,
Env' withAuth
where we are indifferent to their
presence, or
Foldable withAuth => Env' withAuth
where we want
to branch on whether or not they’re available.
This gives us property (3).
If we don’t know the type of withAuth
, we can use
Foldable
to give us the “first” Auth
, if one
exists:
-- Essentially `headMay :: [a] -> Maybe a`, but written as a
-- `foldr` from the `Foldable` typeclass.
envAuthMaybe :: Foldable withAuth => Env' withAuth -> Maybe Auth
envAuthMaybe = foldr (const . Just) Nothing . auth
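Here is the whole pattern in miniature, with a stand-in `Auth` type (the real amazonka `Env` has many more fields, of course):

```haskell
import Data.Functor.Identity (Identity (..))
import Data.Proxy (Proxy (..))

-- A stand-in for amazonka's real Auth type.
newtype Auth = Auth String deriving Show

data Env' withAuth = Env { auth :: withAuth Auth }

type Env       = Env' Identity
type EnvNoAuth = Env' Proxy

envAuthMaybe :: Foldable withAuth => Env' withAuth -> Maybe Auth
envAuthMaybe = foldr (const . Just) Nothing . auth

main :: IO ()
main = do
  print (envAuthMaybe (Env (Identity (Auth "hunter2")) :: Env))
  print (envAuthMaybe (Env Proxy :: EnvNoAuth))
```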
If you squint, you might be able to see that instead of storing a
Maybe
inside the Env
structure, we’ve done so
at the type level. Instead of the value constructors Just
and Nothing
, we have the Foldable
s
Identity
and Proxy
.
Exercise: Write a pair of functions which witness
the isomorphism between Maybe a
and
Sum Proxy Identity a
. (Sum
comes from Data.Functor.Sum
).
{-# LANGUAGE LambdaCase #-}
import Data.Functor.Identity (Identity (..))
import Data.Functor.Sum (Sum (..))
import Data.Proxy (Proxy (..))
from :: Maybe a -> Sum Proxy Identity a
from = maybe (InL Proxy) (InR . Identity)

to :: Sum Proxy Identity a -> Maybe a
to = \case
  InL Proxy -> Nothing
  InR (Identity a) -> Just a
For nearly any sensible combination of “list with at {least,most}
{zero,one,infinitely many} elements”, there exists a type in
base
whose structure ensures those guarantees.
If you internalise this idea for Maybe
in
particular, you’ll see that many ad-hoc “handle the
Nothing
” operations can be replaced with functions that
work on any Foldable
.
By parameterising fields in your data types over some
Foldable f
, you can offer changing guarantees about what
values are available when, without needing type-level
programming.