When I was younger, I played a lot of MUDs (“Multi-User Dungeons”: the text-only predecessors to modern MMORPGs, often played over Telnet). They were great fun, particularly during high school. A lightweight multiplayer game with no client state meant you could log in from any machine in any lab (even Windows shipped a Telnet client in those days). The Telnet protocol was light enough to run on my school’s slow PCs and limited internet connection, and the lack of flashy graphics made it easy to hide the window from a passing teacher or librarian.
At some point, building and tinkering with MUDs became more
interesting than playing them. In those days, MUD builders and wizards
(admins) were often recruited from each game’s playerbase, and many MUDs
let builders edit the world through in-game commands. This was
incredibly cool at the time: even through a clumsy line-oriented (ed-style)
editor, there was something magical about summoning blank rooms from the
void, writing rich descriptions to turn them into “real” spaces, and
adding items and “mobs” (Mobile OBJects — NPCs) to make them come to
life. A few of my friends and I signed up to a “builder academy” MUD,
where everyone got a zone to mess around in, and we tried our hand at
crafting our own areas. Most of these projects didn’t get very far, and
all of them have been lost to time.
There’s only so much you can do with builder rights on someone else’s MUD. To really change the game, you needed to be able to code, and most MUDs were written in “real languages” like C. We’d managed to get a copy of Visual C++ 6 and the CircleMUD source code, and started messing about. But the development cycle was pretty frustrating: for every change, you had to recompile the server, shut it down (dropping everyone’s connections), bring it back up, and wait for everyone to log back in.
Some MUDs used a very cool trick to avoid this, called “copyover” or “hotboot”. It’s an idiom that lets a stateful server replace itself while retaining its PID and open connections. It seemed like magic back then: you recompiled the server, sent the right command, everything froze for a few seconds, and (if you were lucky) it came back to life running the latest code. The trick is simple but not well-documented, so I wanted to write it out while I thought of it.
The copyover method I’m most familiar with works like this:
1. The copyover command is invoked by a MUD admin.
2. The server calls pipe(2) to create a pipe. This is the data channel that the old version of the server will write to and the new version will read from.
3. The server calls fork(2), creating a copy of itself with the same state. We now have a parent and a child process.
4. The child closes the read end of the pipe, writes the game state into the pipe, and then exits.
5. (In parallel with step 4) The parent closes the write end of the pipe and calls an exec(3) function to replace itself with the new binary. This exec usually includes a specific “copyover” flag on the command line as well as the FD for the read end of the pipe. File descriptors, including open sockets, will remain open across the exec() call (unless they are marked close-on-exec).
6. (In parallel with step 4) The parent, now running the new code, reads the game state through the pipe and then closes it.
7. The parent calls wait(2) to clear away the zombie child process.
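The steps above can be sketched in C roughly as follows. This is a minimal illustration, not code from CircleMUD or any particular MUD: the function names `copyover` and `copyover_recover`, the `--copyover` flag spelling, and the idea that the game state is a single flat byte buffer are all assumptions made for the example. A real server would also have to serialize its list of player sockets into that state.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Old server (hypothetical name). On success this never returns, because
 * the process image is replaced. Returns -1 if any step fails. */
int copyover(const char *new_binary, const char *state, size_t state_len) {
    int pipefd[2];
    if (pipe(pipefd) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the write end, dump the state, and exit. */
        close(pipefd[0]);
        int rc = write_all(pipefd[1], state, state_len);
        close(pipefd[1]);
        _exit(rc == 0 ? 0 : 1);
    }

    /* Parent: keep only the read end and tell the new binary its number.
     * "--copyover" is an assumed flag name. */
    close(pipefd[1]);
    char fd_arg[16];
    snprintf(fd_arg, sizeof fd_arg, "%d", pipefd[0]);
    execl(new_binary, new_binary, "--copyover", fd_arg, (char *)NULL);

    /* Only reached if exec failed: clean up and report the error. */
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return -1;
}

/* New server (hypothetical name), called when started with --copyover.
 * Reads state until EOF, closes the pipe, and reaps the child.
 * Returns the number of bytes read, or -1 on error. */
ssize_t copyover_recover(int fd, char *buf, size_t cap) {
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR) continue;
            close(fd);
            return -1;
        }
        if (n == 0) break; /* EOF: the child closed its write end */
        total += (size_t)n;
    }
    close(fd);
    wait(NULL); /* clear away the zombie child inherited across exec */
    return (ssize_t)total;
}
```

The new binary’s `main` would check for the flag, parse the descriptor number from the following argument, and call `copyover_recover` before re-entering its game loop. Having the child do the writing is what lets the state exceed the pipe’s buffer size: the child simply blocks until the new code starts reading. Note that this only works for descriptors without the close-on-exec flag set.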
At this point, we’ve achieved all of our goals. The server is running
the new code with the old state under the old PID. The biggest weakness
I see with this scheme is that if the new server fails to come up,
you’ve got no way to abort the copyover and you lose all your state. If
you give up maintaining a constant PID, I can imagine more elaborate and
robust schemes; for example, swapping out the pipe for something more
sophisticated allows the new server to report that it’s ready to take
over. It’s also possible to be smarter about how file descriptors are
handled: a server could split network connection handling off into a
separate process from the game logic (and have them communicate over
Unix domain sockets), pass sockets to the replacing server using SCM_RIGHTS,
store copyover state in a memfd, or use systemd’s file descriptor store to
hold its memfds and socket fds while systemd replaces the process. I don’t
know what the most modern idioms are; I just wanted to document how it
used to work.
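For the SCM_RIGHTS option, here is a rough sketch of passing one open descriptor between two processes over a Unix domain socket. The helper names `send_fd` and `recv_fd` are made up for this example, and it sends a single descriptor alongside one dummy byte; a real handover would probably batch descriptors and carry some metadata with them.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected Unix domain socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd) {
    char byte = 0; /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd().
 * Returns the new descriptor number, or -1 on failure. */
int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1) return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file or socket as the sender’s, so the old server can hand over each player connection and then exit. Since the new server is a separate process rather than an exec of the old one, this is one of the schemes that gives up the constant PID.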
A basic copyover server uses pretty simple Unix primitives (pipes,
fork(2), and file descriptor persistence across exec(3)), but sufficiently
clever use of Unix is indistinguishable from magic. (Other examples: Factorio
using fork(2) to implement asynchronous saving on macOS and GNU/Linux;
Cloudflare using SCM_RIGHTS to send TLS 1.3 connections to a separate
process.) Much of the apparent magic comes not from Unix itself
being magical, but from the fact that many of its primitives now lie hidden
beneath cross-language runtimes or platform abstraction libraries, or are even
forgotten outright. I’d started this post just looking to document the
old way of copying over a stateful server, but the things I’ve found
along the way make me want to dig further. What else have I missed? Is
Stevens’ Advanced Programming in the UNIX Environment still the
canonical reference?