How Copyover MUD Servers Worked

Posted on February 6, 2025 by Jack Kelly

Tags: c, mud, coding

When I was younger, I played a lot of MUDs (“Multi-User Dungeons”, the text-only predecessors of modern MMORPGs, often played over Telnet). They were great fun, particularly during high school. A lightweight multiplayer game with no client state meant you could log in from any machine in any lab; even Windows shipped a Telnet client in those days. The Telnet protocol was light enough to run on my school’s slow PCs and limited internet connection, and the lack of flashy graphics made it easy to hide the window from a passing teacher or librarian.

At some point, building and tinkering with MUDs became more interesting than playing them. In those days, MUD builders and wizards (admins) were often recruited from each game’s playerbase, and many MUDs let builders edit the world through in-game commands. This was incredibly cool at the time — even through a clumsy line-oriented (ed-style) editor, there was something magical about summoning blank rooms from the void, writing rich descriptions to turn them into “real” spaces, and adding items and “mobs” (Mobile OBJects — NPCs) to make them come to life. A few of my friends and I signed up to a “builder academy” MUD, where everyone got a zone to mess around in, and we tried our hand at crafting our own areas. Most of these projects didn’t get very far, and all of them have been lost to time.

There’s only so much you can do with builder rights on someone else’s MUD. To really change the game, you needed to be able to code, and most MUDs were written in “real languages” like C. We’d managed to get a copy of Visual C++ 6 and the CircleMUD source code, and started messing about. But the development cycle was pretty frustrating: for every change, you had to recompile the server, shut it down (dropping everyone’s connections), bring it back up, and wait for everyone to log back in.

Some MUDs used a very cool trick to avoid this, called “copyover” or “hotboot”. It’s an idiom that lets a stateful server replace itself while retaining its PID and open connections. It seemed like magic back then: you recompiled the server, sent the right command, everything froze for a few seconds, and (if you were lucky) it came back to life running the latest code. The trick is simple but not well-documented, so I wanted to write it out while I thought of it.

The copyover method I’m most familiar with works like this:

1. The copyover command is invoked by a MUD admin.
2. The server calls pipe(2) to create a “pipe”. This is the data channel that the new version of the server will read from, and the old version of the server will write to.
3. The server calls fork(2), creating a copy of itself with the same state. We now have a parent and a child process.
4. The child closes the read end of the pipe, writes the game state into the pipe, and then exits.
5. (In parallel with №4) The parent closes the write end of the pipe and calls an exec(3) function to replace itself with the new binary. This exec usually includes a specific “copyover” flag on the command line as well as the FD for the read end of the pipe. File descriptors, including open sockets, remain open across the exec() call unless they have been marked close-on-exec.
6. (In parallel with №4) The parent, now running the new code, reads the game state through the pipe and then closes it.
7. The parent calls wait(2) to clear away the zombie child process.
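
The steps above can be sketched in C roughly as follows. This is a minimal illustration, not code from any particular MUD: `struct game_state`, `start_handoff`, `exec_new_server`, `finish_handoff`, the `--copyover` flag spelling, and the `COPYOVER_DEMO` macro are all names made up for this example. It also assumes the state fits in one small fixed-layout struct that the old and new binaries agree on; a real server would more likely write a versioned text format and include each player's socket FD.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical game state, kept tiny for the example. */
struct game_state {
    int tick;
    char motd[64];
};

/* Create the pipe, fork, and let the child write the state into it.
 * Returns the read end of the pipe to the parent, or -1 on error. */
int start_handoff(const struct game_state *gs)
{
    assert(gs != NULL);
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: owns a snapshot of the old state; write it and exit. */
        close(fds[0]);
        ssize_t n = write(fds[1], gs, sizeof *gs);
        _exit(n == (ssize_t)sizeof *gs ? 0 : 1);
    }
    close(fds[1]); /* Parent keeps only the read end. */
    return fds[0];
}

/* Replace this process with the binary at `path`, passing the pipe's
 * read end on the command line. Only returns if execv() fails. */
int exec_new_server(const char *path, int read_fd)
{
    char fd_arg[16];
    snprintf(fd_arg, sizeof fd_arg, "%d", read_fd);
    char *argv[] = { (char *)path, "--copyover", fd_arg, NULL };
    execv(path, argv);
    return -1;
}

/* In the new image: read the state, close the pipe, reap the child.
 * Returns 0 on success, -1 if the state was short or the child failed. */
int finish_handoff(int read_fd, struct game_state *gs)
{
    char *p = (char *)gs;
    size_t got = 0;
    while (got < sizeof *gs) {
        ssize_t n = read(read_fd, p + got, sizeof *gs - got);
        if (n <= 0)
            break; /* EOF or error: treated as failure below. */
        got += (size_t)n;
    }
    close(read_fd);
    int status;
    if (wait(&status) == -1 || got != sizeof *gs)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

#ifdef COPYOVER_DEMO
int main(int argc, char **argv)
{
    struct game_state gs = { 0, "" };
    if (argc == 3 && strcmp(argv[1], "--copyover") == 0) {
        if (finish_handoff(atoi(argv[2]), &gs) != 0)
            return 1;
        printf("pid %d resumed at tick %d: %s\n", (int)getpid(), gs.tick, gs.motd);
        return 0;
    }
    gs.tick = 42;
    strcpy(gs.motd, "hello");
    printf("pid %d starting copyover\n", (int)getpid());
    fflush(stdout); /* exec discards unflushed stdio buffers */
    int fd = start_handoff(&gs);
    if (fd == -1)
        return 1;
    exec_new_server(argv[0], fd); /* assumes argv[0] is a usable path */
    perror("execv");
    return 1;
}
#endif
```

Compiled with `-DCOPYOVER_DEMO` and run by path (for example as `./demo`), both lines of output should report the same PID. In a real server, `exec_new_server` would point at the freshly compiled binary rather than `argv[0]`.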

At this point, we’ve achieved all of our goals: the server is running the new code with the old state under the old PID. The biggest weakness I see with this scheme is that if the new server fails to come up, there’s no way to abort the copyover, and all of the state is lost. If you give up maintaining a constant PID, I can imagine more elaborate and robust schemes; for example, swapping out the pipe for something more sophisticated would allow the new server to report that it’s ready to take over. It’s also possible to be smarter about how file descriptors are handled: a server could split network connection handling off into a separate process from the game logic (and have them communicate over Unix domain sockets), pass sockets to the replacing server using SCM_RIGHTS, store copyover state in a memfd, or use systemd’s file descriptor store to hold its memfds and socket FDs while systemd replaces the process. I don’t know what the most modern idioms are; I just wanted to document how it used to work.
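
For the SCM_RIGHTS variant, here is a rough sketch of handing one file descriptor across a Unix domain socket. It is not how any particular server does it: `send_fd` and `recv_fd` are hypothetical helper names, and the sketch assumes an already-connected `AF_UNIX` socket and sends exactly one descriptor per message alongside a single placeholder byte.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send `fd` over the connected Unix domain socket `sock`.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    assert(sock >= 0 && fd >= 0);
    char byte = 'F'; /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { /* union forces correct alignment for the control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns the new descriptor
 * (a fresh number in the receiving process), or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

Unlike the pipe-and-exec approach, the receiver here can be an entirely separate process with a different PID, so the old server can stay alive until the new one confirms that it has taken over the connections.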

A basic copyover server uses simple Unix primitives (pipes, fork(2), and file descriptor persistence across exec(3)), but sufficiently clever use of Unix is indistinguishable from magic. (Other examples: Factorio using fork(2) to implement asynchronous saving on macOS and GNU/Linux; Cloudflare using SCM_RIGHTS to send TLS 1.3 connections to a separate process.) Much of the apparent magic comes not from Unix itself being magical, but from the fact that many of its primitives now lie hidden beneath cross-language runtimes or platform abstraction libraries, or have been forgotten outright. I’d started this post just looking to document the old way of copying over a stateful server, but the things I’ve found along the way make me want to dig further. What else have I missed? Is Stevens’ Advanced Programming in the UNIX Environment still the canonical reference?