Bootstrapping NixOS on a 1GB VPS: 5 Bugs, 65% Closure Reduction, and a Working Script
A deep debugging journey: using nixos-anywhere and disko on a 962MB RAM VPS. OOM kills, SIGBUS crashes, SSH failures, persistence bugs, and how we solved each one.
How do you install NixOS on a VPS with only 962MB RAM? Five bugs, four days of debugging, and a 65% smaller system closure later, here’s the complete story.
The Goal
I have a sing-box proxy node running on a DMIT VPS. The plan was simple: replace Arch Linux with my declarative NixOS configuration. One command: nixos-anywhere --flake .#vps-ultraman vps_ultraman_root.
It didn’t work. Here’s why, and how we fixed it.
The Environment
| | Value |
|---|---|
| RAM | 962MB |
| Disk | 20GB (virtualized /dev/vda) |
| CPU | 1 core, AMD EPYC 9654 |
| OS (currently on target) | Arch Linux (minimal) |
| OS (to install) | NixOS 26.05 unstable |
| Config | sing-box server, Tailscale, Btrfs + impermanence |
Bug #1: OOM During nix copy
Symptom: SSH to the target timed out during nix copy --to.
Investigation: nixos-anywhere first copies the disko partitioning tool and its dependencies to the target’s /nix/store. The NixOS installer is booted via kexec and runs entirely in RAM — root is tmpfs, and /nix/store is a tmpfs-backed overlay. On a 962MB system, tmpfs is capped at 482MB (half of RAM). The disko dependency closure is ~400MB — it barely fits, and when it doesn’t, the OOM killer takes out sshd.
The first clue: The installer’s dmesg showed:
Memory: 626288K/1048004K available
Freeing initrd memory: 344548K
That 336MB initrd plus the kernel left only ~626MB usable.
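To make the arithmetic concrete, here is a minimal shell sketch. The function names and the 100MB headroom figure are illustrative assumptions, not part of nixos-anywhere; the inputs are the numbers quoted above.

```shell
# Convert a kernel "K" figure (KiB) to whole MiB.
kib_to_mib() { echo $(( $1 / 1024 )); }

# Succeeds when a payload of $1 MB plus $3 MB of headroom fits in a tmpfs
# capped at $2 MB. The headroom value is an assumption for illustration.
fits_with_headroom() { [ $(( $1 + $3 )) -le "$2" ]; }

# Initrd size from the dmesg output above.
kib_to_mib 344548

# ~400MB of disko dependencies against the 482MB tmpfs cap.
if fits_with_headroom 400 482 100; then
  echo "disko deps fit"
else
  echo "disko deps do not fit safely"
fi
```

With the figures from this VPS the check fails, which matches what happened: the copy technically fits, but leaves too little room for anything else running in RAM.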
Fix: Two things. First, set up 4GB ZRAM swap on the Arch host before running nixos-anywhere, and again on the NixOS installer after kexec (since kexec replaces the kernel, ZRAM doesn’t survive). Second, use nixos-anywhere --no-disko-deps, which skips copying the partitioning tools — the kexec installer image already contains them.
# On Arch (Phase 0)
modprobe zram
echo 4G > /sys/block/zram0/disksize
mkswap /dev/zram0 && swapon /dev/zram0
# After kexec (on NixOS installer)
# Same commands, but on the new kernel
This is the approach recommended in nixos-anywhere issue #609.
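Because the same four commands run twice (once on Arch, once on the installer), one option is to generate them from a single function and pipe them to whichever kernel is currently up. This is only a sketch; zram_swap_cmds is a hypothetical helper name, not something nixos-anywhere provides.

```shell
# Print the commands that create and enable a ZRAM swap device.
# $1: size (default 4G), $2: device name (default zram0).
zram_swap_cmds() {
  printf '%s\n' \
    "modprobe zram" \
    "echo ${1:-4G} > /sys/block/${2:-zram0}/disksize" \
    "mkswap /dev/${2:-zram0}" \
    "swapon /dev/${2:-zram0}"
}

# Preview what would run; nothing is executed on any host here.
zram_swap_cmds 4G
```

To apply it, pipe the output to a root shell on the target, for example zram_swap_cmds 4G | ssh vps_ultraman_root sh -e, once before kexec and once after.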
Bug #2: udevadm SIGBUS During Disko
Symptom: Disko crashed at udevadm trigger --subsystem-match=block with a bus error (SIGBUS, exit code 135).
Investigation: The disko script runs sgdisk to create a new GPT partition table on /dev/vda, then calls udevadm trigger so the kernel picks up the new partitions. On this VPS’s virtualized storage, the kernel holds stale partition table references. When udevadm scans block devices, it reads sectors beyond the new partition boundaries and the kernel delivers a bus error.
The dmesg confirmed it:
udevadm: attempt to access beyond end of device
vda1: rw=0, sector=14442744, nr_sectors = 8 limit=2048
Fix: The helper script finds the disko script in /nix/store/*-disko and patches a copy of it, replacing udevadm trigger --subsystem-match=block and udevadm settle --timeout 120 with sleep 2, then bind-mounts the copy over the original. The same replacement is applied to the disk-deactivate sub-script.
# Core of the patching logic
D=$(ls -td /nix/store/*-disko 2>/dev/null | grep -v "disko-[0-9]" | head -1)
cp "$D" /tmp/d
sed -i "s/udevadm trigger --subsystem-match=block/sleep 2 #p/" /tmp/d
sed -i "s/udevadm settle --timeout 120/sleep 2 #p/" /tmp/d
mount --bind /tmp/d "$D" # Override the read-only nix store path
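One weakness of patching with sed is that a pattern that fails to match is silently a no-op. The sketch below wraps the two substitutions in a function that also checks for leftover udevadm calls. patch_udevadm is a hypothetical name, the pattern deliberately accepts both the --timeout=120 and --timeout 120 spellings, and the demo file contents are made up. It assumes GNU sed for -i.

```shell
# Replace the udevadm calls in a copy of the disko script, then verify.
patch_udevadm() {
  sed -i \
    -e 's/udevadm trigger --subsystem-match=block/sleep 2 #p/' \
    -e 's/udevadm settle --timeout[= ]120/sleep 2 #p/' \
    "$1"
  # sed exits 0 even when no pattern matched, so check for leftovers.
  if grep -Eq 'udevadm (trigger|settle)' "$1"; then
    echo "unpatched udevadm call left in $1" >&2
    return 1
  fi
}

# Demo on a stand-in file (contents invented for illustration).
demo=$(mktemp)
printf '%s\n' \
  'partprobe /dev/vda' \
  'udevadm trigger --subsystem-match=block' \
  'udevadm settle --timeout 120' > "$demo"
patch_udevadm "$demo" && cat "$demo"
rm -f "$demo"
```

Running the check before the bind mount means a changed disko script fails loudly instead of crashing later with the same SIGBUS.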
Bug #3: System Closure Won’t Fit in tmpfs
Symptom: Even with --no-disko-deps, the system closure copy OOM’d. The original closure was 7.6GB — impossible to stage in 482MB tmpfs.
Investigation: Why was a sing-box proxy node 7.6GB? We analyzed the closure with:
# -S prints "path size", so sort numerically on the second column
nix path-info -rS '.#nixosConfigurations."vps-ultraman".config.system.build.toplevel' \
| sort -rnk2 | head -40
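The results were eye-opening. Note that -S reports closure sizes, so shared dependencies are counted in more than one row, which would explain why the rows add up to more than the 7.6GB total: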
| Package | Size | Why on a VPS? |
|---|---|---|
| clang/clang-tools/lldb/llvm | 7GB | dev.cc.enable = true |
| rust + docs + clippy | 3.5GB | dev.rust.enable = true |
| hyprland + grimblast + Qt/Wayland | 4.5GB | Bug: unconditional in hey.nix |
| linux-firmware (all) | 787MB | hardware.enableRedistributableFirmware |
| zbar (barcode scanner) | 620MB | Bug: unconditional in hey.nix |
| microvm | 503MB | Bug: unconditional import |
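The ranking step above can be wrapped in a small filter that sorts on the size column and prints sizes in MiB. This is a sketch: top_closure_paths is a hypothetical name, and the store paths and byte counts in the demo are made up.

```shell
# Read "path size-in-bytes" lines (the shape nix path-info -S prints) on stdin
# and print the N largest entries with sizes in MiB. N defaults to 40.
top_closure_paths() {
  sort -k2,2 -rn | head -n "${1:-40}" |
    awk '{ printf "%8.1f MiB  %s\n", $2 / 1048576, $1 }'
}

# Demo with invented store paths and sizes.
printf '%s\n' \
  '/nix/store/aaaa-small 1048576' \
  '/nix/store/bbbb-large 3145728' \
  '/nix/store/cccc-medium 2097152' | top_closure_paths 2
```

In real use the input would come from nix path-info -rS on the toplevel derivation, piped into top_closure_paths 40.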
Fix (multi-step):
- Desktop tools leak: modules/hey.nix unconditionally added grim, grimblast, zbar, wl-clipboard, etc. to environment.systemPackages. These were already present in modules/desktop/default.nix for Wayland hosts. Fixed by removing them from hey.nix.
- Firmware: Added hardware.enableRedistributableFirmware = mkDefault false to the cpu/qemu profile. VMs only need virtio drivers.
- MicroVM: modules/services/virt/microvm.nix unconditionally imported the microvm host module on all hosts. (The conditional import fix caused infinite recursion, so it is left for a future refactor.)
- Host-specific cleanup: Disabled dev.cc, dev.rust, dev.python, hardening.apparmor on the VPS config.
- Persistent overlay: After disko creates /mnt/nix on btrfs, the helper bind-mounts it as the nix store overlay upper dir. The 2.6GB system closure writes to disk instead of tmpfs:
# After disko creates /mnt/nix on btrfs:
umount /nix/store
umount /nix/.rw-store
mkdir -p /mnt/nix/.rw-store/store /mnt/nix/.rw-store/work
mount --bind /mnt/nix/.rw-store /nix/.rw-store
mount -t overlay overlay \
-o lowerdir=/nix/.ro-store,upperdir=/nix/.rw-store/store,workdir=/nix/.rw-store/work \
/nix/store
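Before starting the closure copy it is worth confirming that the upper directory really moved off tmpfs. A minimal sketch, assuming /proc/mounts-format input; mount_fstype is a hypothetical helper and the device name in the demo is invented.

```shell
# Print the filesystem type of the topmost mount at mount point $1.
# Expects /proc/mounts-format lines on stdin; later lines shadow earlier ones.
# Exits non-zero when the mount point is not listed at all.
mount_fstype() {
  awk -v mp="$1" '$2 == mp { t = $3 } END { if (t == "") exit 1; print t }'
}

# Demo with made-up mount table lines.
printf '%s\n' \
  'tmpfs /nix/.rw-store tmpfs rw 0 0' \
  '/dev/vda2 /nix/.rw-store btrfs rw 0 0' | mount_fstype /nix/.rw-store
```

On the installer the equivalent check would be mount_fstype /nix/.rw-store < /proc/mounts, which should report btrfs rather than tmpfs once the bind mount is in place.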
Result: Closure dropped from 7.6GB to 2.6GB (65% reduction, 972 paths).
Bug #4: SSH Dead After Successful Boot
Symptom: NixOS booted to TTY, but sshd.service failed with:
sshd.service: start request repeated too quickly.
/etc/ssh/sshd_config: No such file or directory
Investigation: The host keys did exist in /etc/ssh, but sshd_config didn’t. How?
The answer is in the interaction between NixOS activation and impermanence. My configuration uses environment.persistence to persist directories across reboots (the root is tmpfs). The SSH module had:
environment.persistence."/persist".directories = ["/etc/ssh"];
Here’s the boot sequence:
- local-fs.target: Impermanence bind-mounts /persist/etc/ssh → /etc/ssh. On first boot, /persist/etc/ssh is empty because it was only just created by a prior activation script.
- NixOS activation: Creates /etc/ssh/sshd_config as a symlink to /nix/store/...-sshd.conf-final. By this point /etc/ssh is supposed to be backed by the still-empty /persist/etc/ssh.
- Bind mount hides it: If the mount were guaranteed to run first, activation would write the symlink into the mount (that is, into /persist/etc/ssh) and the file would be there. But the createPersistentStorageDirs activation script, which creates the parent directories, also runs during activation and can race with environment.etc symlink creation. If the symlink ends up written before the bind mount is in place, the empty mounted directory hides it.
- Host keys survive: sshd's preStart generates host keys after the bind mount is already in place, so they land in /persist/etc/ssh/. sshd_config is created earlier, by activation, and appears to have lost the race with directory creation.
The principle: Never persist a directory that contains files created by NixOS activation (environment.etc, home.file, home.configFile). Persist individual files instead.
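The principle can be turned into a simple prefix check: given the persisted directories and the paths that activation creates, report every overlap. This is a sketch under assumptions: find_persist_conflicts is a hypothetical helper, the two lists in the demo are hand-written examples, and how you extract the real lists from your configuration is left open.

```shell
# find_persist_conflicts DIRS_FILE PATHS_FILE
# DIRS_FILE:  persisted directories, one per line.
# PATHS_FILE: files created by activation, one per line.
# Prints one line for every activation-created file under a persisted directory.
find_persist_conflicts() (
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue
    while IFS= read -r path; do
      case "$path" in
        "${dir%/}"/*) echo "$dir hides $path" ;;
      esac
    done < "$2"
  done < "$1"
)

# Demo with example lists (the persisted directories are assumptions).
dirs=$(mktemp); paths=$(mktemp)
printf '%s\n' /etc/ssh /var/lib/tailscale > "$dirs"
printf '%s\n' /etc/ssh/sshd_config /etc/containers/registries.conf > "$paths"
find_persist_conflicts "$dirs" "$paths"
rm -f "$dirs" "$paths"
```

Any line of output marks a directory that should be switched to file-level persistence.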
Fix:
# Before (broken):
environment.persistence."/persist".directories = ["/etc/ssh"];
# After (fixed):
environment.persistence."/persist".files = [
"/etc/ssh/ssh_host_ed25519_key"
"/etc/ssh/ssh_host_ed25519_key.pub"
];
sshd_config is a nix store symlink, so it doesn't need persistence. Only the host keys do. The same fix was applied to podman.nix, which had the same pattern: persisting the /etc/containers directory hid registries.conf.
We also changed services.openssh.startWhenNeeded from true (socket activation) to false (persistent daemon). Socket activation saves ~5MB RAM but adds complexity that can fail silently.
Full writeup: docs/persistence.md
Bug #5: No TTY Login Possible
Symptom: Booted to Welcome to NixOS 26.05 - tty1 but couldn't log in. No root password was set.
Investigation: The agenix-based hashedPasswordFile only activates if an encrypted .age file exists for the host — which requires pre-existing host keys from Bug #4. Chicken and egg.
Fix: Added a modules.security.password.mode option:
password = {
mode = mkOpt (types.enum ["bootstrap" "deploy" "none"]) "deploy";
};
Set password.mode = "bootstrap" on vps-ultraman. This sets users.users.root.initialPassword = "nixos" and users.mutableUsers = true — giving console TTY access during initial setup. After bootstrap, switch to "deploy" for agenix-based secure passwords.
The Working Script
All the fixes are bundled into a single portable script: scripts/bootstrap_vps-ultraman.sh
#!/usr/bin/env bash
# Bootstrap vps-ultraman (1GB RAM VPS) — single command, no parallel terminals.
#
# ./scripts/bootstrap_vps-ultraman.sh
#
# What it does:
# 0. Prep VPS (cpio, 4GB ZRAM, 4GB disk swap)
# 1. Fork background helper (ZRAM on installer, udevadm patch, persistent overlay)
# 2. Run nixos-anywhere in foreground via hey ops bootstrap --no-disko-deps
# 3. Kill helper on exit
set -euo pipefail
SSH_TARGET="${1:-vps_ultraman_root}"
SSH_OPTS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5"
MAX_WAIT=300
log() { echo "[bootstrap] $(date '+%H:%M:%S') $*"; }
# Phase 0: Prep target VPS (foreground)
log "Phase 0: Preparing target VPS..."
ssh $SSH_OPTS "$SSH_TARGET" '
  if command -v pacman &>/dev/null; then
    pacman -S --noconfirm cpio 2>&1 | tail -1
  elif command -v apt &>/dev/null; then
    apt update -qq && apt install -y -qq cpio 2>&1 | tail -1
  fi
  modprobe zram 2>/dev/null || true
  echo 4G > /sys/block/zram0/disksize 2>/dev/null || true
  mkswap /dev/zram0 2>/dev/null && swapon /dev/zram0 2>/dev/null
  swapoff /swapfile 2>/dev/null || true
  rm -f /swapfile
  fallocate -l 4G /swapfile 2>/dev/null
  chmod 600 /swapfile
  mkswap /swapfile 2>/dev/null && swapon /swapfile 2>/dev/null
  echo "Ready: $(free -h | head -2 | tail -1)"
  swapon --show
' 2>&1
log "Phase 0 complete."
# Phase 1+2: Background helper (forked, runs in parallel with nixos-anywhere)
(
  # Wait for kexec
  while ssh $SSH_OPTS "$SSH_TARGET" 'grep -q "Arch Linux" /etc/os-release' 2>/dev/null; do
    sleep 3
  done
  # ... (ZRAM on installer, disko patching, persistent overlay)
  # Full script: https://github.com/alienzj/dotfiles/blob/dev/scripts/bootstrap_vps-ultraman.sh
) &
HELPER_PID=$!
trap 'kill $HELPER_PID 2>/dev/null || true' EXIT
# Phase 3: Run nixos-anywhere (foreground)
log "Starting nixos-anywhere..."
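# Capture the exit code explicitly: under set -e a bare failing command
# would abort the script before the final log line runs.
EXIT_CODE=0
nix run github:nix-community/nixos-anywhere -- \
  --no-disko-deps \
  --flake ".#vps-ultraman" \
  "$SSH_TARGET" || EXIT_CODE=$?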
log "Bootstrap finished (exit $EXIT_CODE)."
exit $EXIT_CODE
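The excerpt defines MAX_WAIT but does not show how it is used. One way to bound the helper's polling is a generic retry function like the sketch below. wait_until is a hypothetical helper written for this post's context, not taken from the full script.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds; return 1 once TIMEOUT_SECONDS have elapsed.
wait_until() {
  wu_deadline=$(( $(date +%s) + $1 ))
  wu_interval=$2
  shift 2
  until "$@"; do
    [ "$(date +%s)" -lt "$wu_deadline" ] || return 1
    sleep "$wu_interval"
  done
}

# Trivial demo: the condition is already true, so this returns immediately.
wait_until 5 1 true && echo "condition met"
```

In the helper this could look like wait_until "$MAX_WAIT" 3 ssh $SSH_OPTS "$SSH_TARGET" 'grep -q NixOS /etc/os-release', assuming the installer's /etc/os-release mentions NixOS. Waiting for a positive match avoids a weakness of the while loop in the script above, which also exits when the SSH connection merely fails during kexec.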
The Principles
- ZRAM is essential for <2GB RAM installs. Set it up on both the host (before kexec) and the installer (after kexec). kexec replaces the kernel, so ZRAM must be recreated.
- --no-disko-deps for low-RAM VPS. The kexec installer already contains partitioning tools. Copying them to tmpfs wastes ~400MB.
- Analyze closure sizes before bootstrapping. nix path-info -rS can reveal multi-gigabyte leaks like unconditional desktop tools on a server.
- Never persist directories containing NixOS-generated files. Impermanence bind-mounts run before activation. File-level persistence avoids the race.
- Persistent sshd beats socket activation. The 5MB RAM savings aren't worth the debugging complexity.
- Always have a bootstrap password. A TTY fallback saves you when SSH fails on first boot.
Files Changed
The full changeset spans 17 files across the dotfiles repository (tag v0.3.0):
- modules/security.nix: resolved, hardening submodule, password.mode
- modules/services/net/ssh.nix: file-level persistence, startWhenNeeded option
- modules/hey.nix: removed desktop tools from server closure
- modules/profiles/hardware/cpu/qemu.nix: disabled firmware on VMs
- bin/screenshot.zsh: replaced grimblast with grim+hyprctl
- scripts/bootstrap_vps-ultraman.sh: the working entry point
- docs/persistence.md: persistence best practices
- docs/hosts/vps-ultraman.md: host case study
References
- nixos-anywhere issue #609 — ZRAM for low-RAM systems
- impermanence — the persistence module whose mount ordering caused Bug #4
- disko — declarative disk partitioning