How to use sidecars and bring extra luggage on fly.io

Building custom model planes for fun and cost optimization

Fly.io is a “cloud service provider” with an interesting architecture: a global anycast routing layer over a bunch of points of presence (PoPs) around the world. Each point of presence provides compute, plus the additional services you would expect from a public “cloud”, such as TLS termination, databases & storage, etc. Worth mentioning is Tigris, their S3-compatible blob storage, built on top of the Fly.io architecture.

I liked their networking architecture, and the ability to run VMs, with as little as 256MB of memory, from OCI (Docker) images. However, I did not really like the default model of one process per VM, and I was not interested in their other abstractions, e.g. TLS termination or secrets management, since I already have working parts for those.

In this post, we will see how Nixpkgs allowed me to cherry-pick from the Fly.io abstractions, and do more with less. Rather than going deep into how Nixpkgs works, I will keep things superficial and high-level, so that this can serve as an introduction to some of the concepts. If you have not heard of Nixpkgs before, it is a package manager for Linux and macOS, implemented in a functional programming language called Nix.

Edge networking #

Fly.io sits at the edge of my personal infrastructure: it is the layer that receives traffic from the Internet, and proxies it to my self-hosted servers. Fly.io has better Internet connectivity than my self-hosted servers, and it is nice to be able to cache my servers’ responses, and keep my home IPs (the “origin” in CDN parlance) private. This immediately presents a few problems:

  1. My machines on Fly.io somehow need to connect to my backends via the Internet;
  2. They need to do TLS termination;
  3. I use Nginx, and it will need a DNS resolver.

For 1, I have been using Tailscale. For 2, I have a vault-agent setup that pulls my Let’s Encrypt certificates from my Vault (see also OpenBao). For 3, I can run unbound.

Nginx does not run alone: it has all those sidecar processes, and they need to be in the same network namespace so that we can route over Tailscale. That does not fit Fly.io’s model of one process per machine (VM), so we need to bring some kind of process manager: something lighter than systemd, and more modern than supervisord.

My pick here is process-compose. Think docker compose, but without Docker: just regular processes, with a way to set up dependencies between them.

Now that we have some kind of bill of materials for an OCI (Docker) image, let’s see how we can build that image FROM scratch.

Pick and place with Nix & Nixpkgs #

Our image consists of:

  1. Nginx, our reverse proxy;
  2. Unbound, its DNS resolver;
  3. Tailscale, to reach the backends;
  4. vault-agent, to fetch TLS certificates;
  5. process-compose, to supervise them all.

The next sections will be illustrated with some Nix code from a fairly ugly flake-module.nix. If you are not familiar with Nix, all this file does (in this context) is set up our Docker image as a Nix package called fly-io-pop.

Pull up the dependencies #

You can think of Nixpkgs as one huge tree of JSON objects that describes how to build pretty much any piece of software out there. We can pick the software we are interested in from that tree and put it in our basket: the list variable I named basePkgs.

It is worth noting that we modify some packages with those override and overrideAttrs calls instead of picking them straight from the tree. On Linux, Nixpkgs builds packages with the assumption that they are going to be used in a regular server or desktop environment running systemd. We are not using systemd here, and since it is a heavy dependency, we can shrink our image by letting go of it. Those “overrides” let us build custom versions of those packages without systemd.
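A minimal sketch of what that basket can look like (the package names are real Nixpkgs attributes, but the override flag is illustrative; each package documents its own arguments in its Nixpkgs definition):

```nix
# Illustrative excerpt: pick packages from the Nixpkgs tree, and
# override some of them to drop the systemd dependency.
{ pkgs }:
with pkgs; [
  nginx
  tailscale
  process-compose
  # `override` changes the arguments the package's recipe is called
  # with; the exact flag name depends on the package, so check its
  # definition in Nixpkgs.
  (unbound.override { withSystemd = false; })
]
```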

The overrideAttrs on process-compose is worth noting: it swaps out the sources for process-compose with my own fork, where I am trying to implement config validation. This ability you get with Nixpkgs, to patch anything at any level of your system and then easily rebuild and redistribute the changed software, is very liberating. For years I had been stuck on ideas, simply because building and distributing packages with other package managers is so hard.
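Swapping the sources out looks roughly like this (owner, branch, and hashes are placeholders, not my real fork):

```nix
# Illustrative: rebuild process-compose from a fork instead of the
# upstream release Nixpkgs knows about.
process-compose.overrideAttrs (old: {
  src = pkgs.fetchFromGitHub {
    owner = "someone";            # placeholder
    repo = "process-compose";
    rev = "config-validation";    # placeholder branch
    hash = "sha256-AAAA...";      # Nix tells you the real value on first build
  };
  # process-compose is a Go package; depending on your Nixpkgs version
  # you may also need to refresh the vendor hash when src changes.
})
```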

Configure everything #

One reason this flake-module.nix file is so ugly is that I pretty much inlined the configuration for everything in it. 🤷

The configuration files are set up using Nixpkgs’ writeTextFile function, which, per its documentation, will « Write a text file to the Nix store ». Unlike other package managers, where all packages share the same directories (usually under /usr), Nix takes a different approach: each package gets its own directory under /nix/store. Each package directory name starts with a cryptographic hash derived from the dependency graph of the package. This gives you all the benefits listed on the first page of the Nix manual.

writeTextFile writes a single file under /nix/store, and its name will also start with a hash that uniquely identifies it. The return value of writeTextFile is the absolute path of the file written under /nix/store [1].
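For example (the file name and contents are illustrative):

```nix
# writeTextFile returns a store path; interpolating it into another
# string yields the absolute /nix/store/<hash>-<name> path.
let
  resolvConf = pkgs.writeTextFile {
    name = "resolv.conf";
    text = ''
      nameserver 127.0.0.1
    '';
  };
in "cat ${resolvConf}"  # expands to cat /nix/store/<hash>-resolv.conf
```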

Since the Nix language maps very well to JSON, it can be used to directly configure many programs. For example, I configure process-compose directly from Nix, which lets me define a mkProcess function to avoid repeating the same logging configuration for every process.
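A sketch of such a helper, assuming field names from the process-compose schema (the mkProcess helper itself is illustrative, not the one from my flake-module.nix):

```nix
# Every process gets the same logging settings; callers only fill in
# what differs (command, dependencies, ...).
let
  mkProcess = name: args: args // {
    log_location = "/var/log/${name}.log";
  };
in {
  processes = {
    unbound = mkProcess "unbound" {
      command = "${pkgs.unbound}/bin/unbound -d";
    };
    nginx = mkProcess "nginx" {
      command = "${pkgs.nginx}/bin/nginx -g 'daemon off;'";
      depends_on.unbound.condition = "process_started";
    };
  };
}
```

The resulting attribute set can then be serialized with builtins.toJSON and written to the store with writeTextFile.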

The process-compose configuration starts with a postInit command, a shell script written to the Nix store. The script creates a few things at runtime: temporary directories when the VM boots, but also some permanent files under /var, which I have configured in Fly.io to be mounted on persistent storage. Some directories are owned by the nginx and unbound users, so this will have to be backed by some user database.
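Such a script might look like this (the exact directories are examples, not my real layout):

```nix
# Sketch of a postInit script written to the Nix store.
postInit = pkgs.writeShellScript "post-init" ''
  # Throwaway runtime state.
  mkdir -p /tmp/nginx
  # Permanent state, backed by the Fly.io volume mounted at /var.
  mkdir -p /var/cache/nginx /var/lib/unbound
  chown nginx /var/cache/nginx
  chown unbound /var/lib/unbound
'';
```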

For the user-related functions in the libc to work, we need the following files: /etc/passwd, /etc/group, and /etc/shadow. Luckily, the Docker image builder for the Nix interpreter (docker.nix from the Nix repository) already has a helpful bit of code for that, which I extracted into my own dockerNssHelper function. The function takes a couple of “JSON objects” / “dictionaries” / “hash maps” (Nix calls them attribute sets) that describe my user and group directories: user or group names are the keys, with properties like uid, gid, home, shell, etc. as values. Since /etc/passwd et al. are not unlike CSV files, it is pretty easy for the function to loop over its arguments, perform some string interpolation, and render the files. They will go under /nix/store, and we will symlink them into /etc in a later step.
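Boiled down, the looping and string interpolation amount to something like this (the user entries are examples, and the real dockerNssHelper renders /etc/group and /etc/shadow the same way):

```nix
# Render /etc/passwd lines from an attribute set of users.
{ pkgs, lib }:
let
  users = {
    root  = { uid = 0;  gid = 0;  home = "/root";      shell = "/bin/sh"; };
    nginx = { uid = 60; gid = 60; home = "/var/empty"; shell = "/bin/false"; };
  };
in pkgs.writeTextFile {
  name = "passwd";
  # passwd(5) format: name:password:UID:GID:GECOS:home:shell
  text = lib.concatStringsSep "\n" (lib.mapAttrsToList
    (name: u: "${name}:x:${toString u.uid}:${toString u.gid}::${u.home}:${u.shell}")
    users);
}
```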

The last thing postInit does is create an SSH identity and then call a program named sops-install-secrets. Let’s look into that, and into how we manage TLS certificates.

Security considerations #

TLS certificates #

I run a certbot NixOS module [2] in my backend that gets certificates from Let’s Encrypt and stores them in Vault (or OpenBao). Whenever a process needs some TLS certificate, I deploy a vault-agent instance with it. We need to supply the agent with credentials to fetch the certificates from the vault. The credentials are stored in the /nix/store, in a YAML file created with Sops. If you have not heard of Sops, it is a kind of text editor for encrypted files: a Sops file has a bunch of key/value pairs, all the values are symmetrically encrypted with a single data key, and that data key is in turn encrypted with any number of “master keys” for the file. Sops is a nice way to store secrets in git while still having something diff-able. The sops-install-secrets program called at the end of postInit is part of sops-nix, a project that integrates Sops with NixOS. Let’s see how this plays out.

We have a secrets.yaml Sops file in the repository; Nix copies the file under /nix/store, and it can be decrypted by any of three different (private) keys: my GPG encryption key, or the SSH host private key of either of my two machines on Fly.io. When the VM boots, sops-install-secrets is executed and given the path of a JSON manifest that tells it where secrets.yaml is, where the key that can decrypt it is, and where each of the secrets the file contains needs to go. How I generate the JSON manifest for sops-install-secrets (sops-nix) is an absolute hack: you are supposed to use sops-nix in NixOS, where this JSON manifest is generated by evaluating the sops-nix NixOS module. We are not using NixOS here, so I cut off the exact NixOS bits I needed with my hacksaw and called lib.nixos.evalModules to evaluate the sops-nix module (slightly modified for the purpose of this hack) with my configuration for it; the result is our JSON manifest, which sops-nix uses at boot to decrypt and set up the secrets in a ramfs. If you wish, this can serve as a quick introduction to the concept of modules in Nix/NixOS, which is really just a design pattern. A module returns an attribute set with three attributes:

  1. imports, a list of other modules to include;
  2. options, declarations of the options this module makes configurable;
  3. config, definitions assigning values to options (its own, or another module’s).

The result of evaluating a list of modules is all the config definitions (deep-)merged together. Where things get really interesting [3] is that a module can receive the current, in-progress [4], state of the config evaluation as a parameter, so that it can reference options that have been set (in config) by another module. This design pattern is used extensively in the Nix ecosystem.
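The hack itself can be sketched roughly like this (the option names are sops-nix’s; the module path, key path, and secret names are placeholders, and the real manifest format differs a bit from a straight dump of the config):

```nix
# Illustrative: evaluate the (modified) sops-nix module outside of
# NixOS to recover the JSON manifest it would normally generate.
let
  eval = lib.evalModules {
    modules = [
      ./modules/sops.nix  # the slightly modified sops-nix module
      {
        # Our configuration for it: where the encrypted file lives,
        # which key decrypts it, which secrets to install.
        sops.defaultSopsFile = ./secrets.yaml;
        sops.age.sshKeyPaths = [ "/var/lib/sops/ssh_host_ed25519_key" ];
        sops.secrets."vault/credentials" = { };
      }
    ];
  };
# Serialize the merged config into something sops-install-secrets
# can read at boot.
in builtins.toJSON eval.config.sops
```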

Dropping privileges #

The other security-related piece is how we drop privileges. We could do a lot more here, but not running Nginx as root seems like a sensible thing to do. This piece gets interesting because our image is so bare-bones that it does not support PAM, so neither sudo(1) nor doas(1) will work. Thankfully, Nix makes it very easy to roll our own: we can inline a little bit of C as if it were another shell script, and make up for the situation with our own runas helper.
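A sketch of such a helper, compiled straight from inline C with Nixpkgs’ writers (the helper below is a minimal illustration, not my exact runas):

```nix
# Drop from root to the target user, then exec the command.
runas = pkgs.writers.writeCBin "runas" { } ''
  #include <pwd.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Usage (as root): runas <user> <program> [args...] */
  int main(int argc, char **argv) {
    if (argc < 3) { fprintf(stderr, "usage: runas user cmd...\n"); return 1; }
    struct passwd *pw = getpwnam(argv[1]);
    if (!pw) { perror("getpwnam"); return 1; }
    /* Order matters: drop the group before the user. */
    if (setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0) { perror("setid"); return 1; }
    execvp(argv[2], &argv[2]);
    perror("execvp");
    return 1;
  }
'';
```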

Assemble and deploy the OCI image #

My issue with the “native” Nixpkgs tooling for OCI images is that it creates one layer per derivation, which is better for caching, but unfortunately runs into the practical limit on how many copy-on-write (COW) filesystem layers can be stacked on top of each other with decent performance (historically 128). Maybe an option to use bind mounts instead would make sense.

Instead, I have been using nix2container, composing layers myself the old-fashioned way (I had it configured wrong the first time). The other cool thing nix2container does is use skopeo to interact directly with a Docker registry. This lets me create a couple of tasks (more shell scripts): pop-deploy to release and/or push to prod, and pop-releases to list all the images (versions) I have uploaded.
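The layer composition can be sketched like this (the image name is a placeholder, and the layer split is illustrative):

```nix
# Explicit layers: the rarely-changing runtime goes in one bottom
# layer, so pushing a config change only uploads the small top layer.
fly-io-pop = nix2container.buildImage {
  name = "registry.fly.io/my-app";  # placeholder image name
  config.entrypoint = [ "${pkgs.process-compose}/bin/process-compose" "up" ];
  layers = [
    (nix2container.buildLayer {
      deps = [ pkgs.nginx pkgs.unbound pkgs.tailscale pkgs.process-compose ];
    })
  ];
};
```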

Profits & Losses #

For sure, some time was invested, and some things are still missing… It runs for about $15 a month, scaling vertically is easier, and I could easily add a third replica manually. It has been fun, and useful to me.

Did you learn anything? Did something surprise you? Did something feel wrong or good?


[1] Technically, it returns a derivation, which gets automatically converted to a string representing the derivation’s output path under /nix/store.

[2] NixOS is a Linux distribution built upon Nix and Nixpkgs.

[3] And we leave the realm of hacking to enter the realm of computer science and lambda calculus.

[4] Sorry if I am butchering the fixed-point concept.