Update (2020-09-03): /u/zimbatm on lobste.rs suggested the nixos-generators project. Added link and brief discussion.
Update (2020-11-29): Added notes on the impact of instance limits on Packer builds.
The NixOS project publishes Amazon Machine Images (AMIs) that are a great base for reproducible servers. This post describes how NixOS and EC2 work together, first showing how to build upon the NixOS project’s public AMIs; and then digging all the way into the scripts maintainers use to build, import and distribute new AMIs on AWS.
There are a few good ways to get AMI IDs for NixOS project images:
The NixOS project publishes its images with owner-id=080433136561; you can use this with an aws ec2 describe-images call. jq is a good way to select the interesting parts of the response, and is the easiest way to find an AArch64 (which AWS calls arm64) AMI:
# Return the most recent AArch64 NixOS image in the region
$ aws ec2 describe-images \
--region ap-southeast-2 \
--filters Name=owner-id,Values=080433136561 \
| jq '.Images | map(select(.Architecture == "arm64")) | sort_by(.CreationDate) | reverse | map({ ImageId, Description }) | .[0]'
{
"ImageId": "ami-05446a2f818cd3263",
"Description": "NixOS 20.03.2351.f8248ab6d9e aarch64-linux"
}
The NixOS download page lists AMI IDs for the most recent stable release, which I think are all x86_64 images. Scroll down to the “Getting NixOS” section and click the “Amazon EC2” tab to find a list of AMIs, one for each region. The launch buttons take you straight to the launch wizard in the EC2 Management Console.
There is a list of AMIs going back to NixOS 14.04 in <nixpkgs/nixos/modules/virtualisation/ec2-amis.nix>.
You can retrieve a specific AMI ID with a nix command like:
$ nix eval --raw '(import <nixpkgs/nixos/modules/virtualisation/ec2-amis.nix>)."20.03".ap-southeast-2.hvm-ebs'
ami-04c0f3a75f63daddd
The NixOS AMIs can rebuild themselves from NixOS configuration in instance user data. To do this, the user data should look something like this:
### https://nixos.org/channels/nixos-unstable nixos
### https://example.com/path/to/another/channel channel-name
{ config, pkgs, ... }:
{
# Normal NixOS config goes here
}
On each boot the system refreshes its configuration:

1. root’s channels are replaced with the ones listed after the three-hash magic comments;
2. If any channels were found, nix-channel --update runs to fetch the latest version of each channel;
3. /etc/nixos/configuration.nix is replaced with the entire user data; and
4. nixos-rebuild switch runs, rebuilding the OS.
If you only want this to happen once, you can set systemd.services.amazon-init.enable = false;. The first boot will still refresh the configuration from user data (because amazon-init is enabled in the AMI), but will then turn off the service so it doesn’t run again on subsequent restarts.
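For example, user data like this (a sketch: the channel URL and configuration body are placeholders) rebuilds the instance on first boot and then switches off the refresh:

```nix
### https://nixos.org/channels/nixos-20.03 nixos
{ config, pkgs, ... }:
{
  # Applied on first boot by amazon-init; this line then disables the
  # service so later reboots don’t re-run the rebuild.
  systemd.services.amazon-init.enable = false;
}
```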
EC2 instance meta data and user data (if it exists) get downloaded from the Instance Meta Data Service (IMDS) and applied to a NixOS AMI by the following mechanism:

1. A script on the initramfs queries the IMDS and downloads the user data (if it exists) and some of the instance meta data to /etc/ec2-metadata, if those files don’t already exist.
2. On each boot, a systemd service called apply-ec2-data runs to apply the downloaded data to the system. It:
   - Sets the host name, if not set by the NixOS configuration (config.networking.hostName);
   - Sets root’s authorized_keys file to contain the first SSH key from the IMDS, unless authorized_keys already exists;
   - Checks for SSH host keypairs in user data, treating the user data as a pipe-separated list of key/value pairs, and setting SSH host keys if they aren’t already present. This seems to be part of instance bootstrapping for NixOps (which passes known keys so it can use strict host key checking, and immediately replaces them afterwards), and is a bad idea otherwise.
3. On each boot, a systemd service called print-host-key dumps the SSH host key fingerprints to the system console, where they can be grepped for.
4. On each boot, a systemd service called amazon-init checks whether the user data looks like a nix expression, parses out nix channels and updates them, and calls nixos-rebuild switch on the user data.
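The channel-extraction part of that last step can be sketched in shell. This is a hypothetical illustration of the magic-comment format, not the actual amazon-init script: leading "### &lt;url&gt; &lt;name&gt;" lines name channels, and parsing stops at the first non-comment line.

```shell
# Hypothetical sketch: pull "### <url> <name>" channel lines out of user data.
userData='### https://nixos.org/channels/nixos-unstable nixos
### https://example.com/path/to/another/channel channel-name
{ config, pkgs, ... }:
{
}'

channels=$(printf '%s\n' "$userData" | while read -r marker url name; do
  # Magic comments only appear at the top; stop at the first ordinary line
  [ "$marker" = '###' ] || break
  echo "$name $url"
done)
echo "$channels"
```

The real service then runs nix-channel --update for each extracted channel before rebuilding.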
Evaluating a full NixOS configuration on each boot can take a lot of CPU and network resources, particularly if it needs to build uncached derivations. This can cause runaway autoscaling if you’re not careful: if autoscaling starts in response to CPU usage and the new instances spend a lot of CPU trying to nixos-rebuild, further autoscaling can happen before the new instances have finished coming online. On T2 instances, it can also burn through your launch credits for no real benefit.
A tool like Packer can help you build and distribute AMIs by customising the base NixOS AMI. The main steps to provision our image are very simple, because NixOS gives us declarative OS configuration:
1. Upload configuration.nix and replace /etc/nixos/configuration.nix with it;
2. Invoke nixos-rebuild switch --upgrade to build the new OS; and
3. (Optional) Run nix-collect-garbage -d to remove old files from /nix/store.
There is one more very important step you must do at the end: make sure the new image responds to its instance meta data and user data when it boots, and not the meta data/user data from when packer booted the NixOS AMI. As the final provisioning action, you must remove all the files created by the EC2 metadata fetcher, any SSH host keys, and most importantly root’s .ssh/authorized_keys file. If you do not do this, you will be locked out of your image.
Here’s a simple packer configuration that provisions a NixOS AMI with git installed:
nixos-packer-example.json
{
"builders": [
{
"type": "amazon-ebs",
"ami_name": "nixos-packer-example {{timestamp}}",
"instance_type": "t2.micro",
"ssh_username": "root",
"source_ami_filter": {
"filters": {
"architecture": "x86_64"
},
"most_recent": true,
"owners": [
"080433136561"
]
}
}
],
"provisioners": [
{
"type": "file",
"source": "./configuration.nix",
"destination": "/tmp/"
},
{
"type": "shell",
"inline": [
"mv /tmp/configuration.nix /etc/nixos/configuration.nix",
"nixos-rebuild switch --upgrade",
"nix-collect-garbage -d",
"rm -rf /etc/ec2-metadata /etc/ssh/ssh_host_* /root/.ssh"
]
}
]
}
configuration.nix
{ pkgs, ... }:
{
imports = [ <nixpkgs/nixos/modules/virtualisation/amazon-image.nix> ];
ec2.hvm = true;
environment.systemPackages = with pkgs; [ git ];
}
Save the two files to the same directory, and run packer build nixos-packer-example.json from inside it.
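Before launching a build instance, it can be worth a quick syntax check with packer’s validate subcommand:

```shell
# Checks the template for syntax and configuration errors without building
packer validate nixos-packer-example.json
```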
Remember to clean up any registered AMIs and EBS snapshots when you’re
done playing around, otherwise Amazon will charge you to host them.
nixos-rebuild can easily use all the instance’s disk space, especially when building against more recent nixos channels than the one used to build the base NixOS AMI. You can ask for additional space by adding a launch_block_device_mappings stanza to the amazon-ebs builder:
"launch_block_device_mappings": [
{
"delete_on_termination": true,
"device_name": "/dev/xvda",
"volume_size": 10,
"volume_type": "gp2"
}
]
Some builds (e.g., anything that triggers a rebuild of NixOS documentation) use a lot of memory, and can exhaust the RAM of a t2.micro. If this happens, you’ll see nixos-rebuild (or one of its children) fail with exit code 137 and no useful error message. To fix this, you’ll have to use a larger instance, or create a swap file as a temporary provisioning step.
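For the swap-file route, here’s a sketch of an extra shell provisioner you could add ahead of the nixos-rebuild step (the 1 GiB size and /swapfile path are arbitrary choices):

```json
{
  "type": "shell",
  "inline": [
    "dd if=/dev/zero of=/swapfile bs=1M count=1024",
    "chmod 600 /swapfile",
    "mkswap /swapfile",
    "swapon /swapfile"
  ]
}
```

Because the swap file lives on the build instance’s root volume, it disappears along with the rest of the temporary state when the image is finalised.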
Customising NixOS AMIs with a tool like packer lets you prebuild almost-ready-to-go images, and delivering each instance’s configuration.nix via user data creates a very flexible configuration system with reasonable cold-start times. This is probably all you need unless you’re building images for multiple formats (e.g., ISO, EC2 AMI, OpenStack) or hacking on nixpkgs’ image-building support. But if you’re interested in the gory details, read on.
You can build a .vhd virtual HD image using the infrastructure in nixpkgs:
$ nix-build '<nixpkgs/nixos/release.nix>' \
-A amazonImage.x86_64-linux \
--arg configuration /path/to/configuration.nix
These builds boot a VM to finish the build, so you will want ample CPU, memory and storage. If building as root (which is the only user on a default NixOS AMI), you’ll probably want to set NIX_REMOTE=daemon so that the build takes place in /tmp.
The nixos-generators project provides a nice wrapper around the expressions in nixpkgs, and a single command to build NixOS images in selected formats. If you’re looking to build the same NixOS config into multiple formats, consider looking into it.
Either way, once you’ve built the .vhd file, you’ll need to get it into S3 so you can import it with Amazon’s VM Import/Export service. It’s probably easiest to do the build on an EC2 instance, to avoid pushing gigabytes of data across the public internet. I used a t3a.medium spot instance when writing this post, and that was fast enough.
(It should also be possible to specify -A amazonImage.aarch64-linux to have nix build an AArch64 image, but I couldn’t make it work. Any tips?)
The configuration argument is not strictly necessary, but if you omit it, you will get a “blank” image like the ones published by the NixOS project. The only real difference is that it will be built against your version of nixpkgs.
Once the build finishes, the symlink result will point to a directory in the nix store that contains the .vhd image, along with a nix-support directory containing image metadata.
Once you have built your image, you need to import it into EC2 as an AMI. The tool to do this is VM Import/Export.
VM Import/Export needs an S3 bucket to store the images before triggering the import, and a role specifically called vmimport for the service to use. If you’re just mucking around, you might want to try the following CloudFormation template to create an S3 bucket and the necessary role:
template.yaml for VM Import
AWSTemplateFormatVersion: 2010-09-09
Description: Bucket and roles for VM import
Resources:
VMImportBucket:
Type: AWS::S3::Bucket
Properties:
BucketEncryption:
ServerSideEncryptionConfiguration:
- ServerSideEncryptionByDefault:
SSEAlgorithm: AES256
PublicAccessBlockConfiguration:
BlockPublicAcls: true
BlockPublicPolicy: true
IgnorePublicAcls: true
RestrictPublicBuckets: true
VMImportExportServiceRole:
Type: AWS::IAM::Role
Properties:
RoleName: vmimport
Description: Service role for VM import/export
AssumeRolePolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Principal:
Service: vmie.amazonaws.com
Action: sts:AssumeRole
Condition:
StringEquals:
sts:Externalid: vmimport
Policies:
- PolicyName: vmimport
PolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Action:
- s3:GetBucketLocation
- s3:GetObject
- s3:ListBucket
Resource:
- !GetAtt VMImportBucket.Arn
- !Sub "${VMImportBucket.Arn}/*"
- Effect: Allow
Action:
- ec2:ModifySnapshotAttribute
- ec2:CopySnapshot
- ec2:RegisterImage
- ec2:Describe*
Resource: "*"
There are a few things to note before you use the template.yaml in your own environment:

- The template only creates a bucket for import. I didn’t need to export VMs, so I stripped the vmimport permissions back from the set recommended in the VM Import/Export documentation.
- The role must be called vmimport for VM Import/Export to find it, so if you already have that role set up, you may get clashes if you try to deploy this template.
- You will need to deploy the stack with CAPABILITY_NAMED_IAM, because of the explicitly-named vmimport role.
- If you try to pull the stack down, CloudFormation will not delete the S3 bucket unless it is empty.
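Deploying the template looks something like this (the stack name is an arbitrary choice; the capability flag is required because the template creates an explicitly-named role):

```shell
# CAPABILITY_NAMED_IAM acknowledges that the stack creates a named IAM role
aws cloudformation deploy \
  --template-file template.yaml \
  --stack-name vmimport-resources \
  --capabilities CAPABILITY_NAMED_IAM
```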
Once you have the role and bucket set up, you can import your NixOS image as an EBS snapshot and then register it as an AMI. The NixOS maintainers use a script from nixpkgs at nixos/maintainers/scripts/ec2/create-amis.sh to release the new AMIs, but it does more than we need for our experiments. It:

1. Uploads the .vhd image to S3 if it doesn’t already exist, by calling aws s3 ls and aws s3 cp;
2. Imports the .vhd from S3 to an EBS snapshot, by calling aws ec2 import-snapshot;
3. Waits for the snapshot import to finish, by calling aws ec2 describe-import-snapshot-tasks in a loop;
4. Registers the EBS snapshot as an AMI, by calling aws ec2 register-image;
5. Waits for the registration to finish, by calling aws ec2 describe-images in a loop;
6. Makes the new AMI public, by calling aws ec2 modify-image-attribute --launch-permission 'Add={Group=all}'; and
7. Copies the AMI to all the other regions and makes them public, by calling aws ec2 copy-image and aws ec2 modify-image-attribute in a loop.
For tinkering, it’s probably enough to comment out the calls to make_image_public, and also comment out the loop in upload_all that iterates across the regions and copies the AMI.
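For a one-off manual import in a single region, the core of that script boils down to something like this sketch (the bucket, key, names and snapshot ID are placeholders; take the real snapshot ID from the completed import task before registering):

```shell
# Upload the image, then ask VM Import/Export to turn it into an EBS snapshot
aws s3 cp result/nixos.vhd s3://my-vmimport-bucket/nixos.vhd
aws ec2 import-snapshot \
  --description "NixOS image" \
  --disk-container "Format=VHD,UserBucket={S3Bucket=my-vmimport-bucket,S3Key=nixos.vhd}"

# Poll `aws ec2 describe-import-snapshot-tasks` until the task completes,
# then register the resulting snapshot as an AMI:
aws ec2 register-image \
  --name "nixos-custom" \
  --architecture x86_64 \
  --virtualization-type hvm \
  --ena-support \
  --root-device-name /dev/xvda \
  --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789abcdef0}"
```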
I think customising the NixOS project’s images with a tool like packer and then configuring instances with custom configuration.nix user data is a very solid way to get started with NixOS on EC2. If you need to ship the same NixOS config in multiple image formats, or you have extremely unusual configuration needs, nixpkgs provides great tooling for fully-declarative image specifications. Odds are you probably won’t need this level of control, but it’s still interesting to see how the sausage is made.