
Professional Development Workflow: Refactor-Debug-Verify

A deterministic cycle for developing, refactoring, and maintaining a complex Nix Flake.

🏗️ The Problem: The “Hope-Based” Loop

In large Nix configurations, a small change in a library or shared module can trigger an infinite recursion error or break a flake output. Relying on nixos-rebuild switch (the “Sync” command) to catch these is slow and frustrating. Every attempt runs the full rebuild pipeline (evaluate, build, activate) when a much cheaper check would have caught the error. Any failure that only shows up at build time arrives after a long build.

🧪 The Solution: The “Check-First” Cycle

We use a three-tier verification strategy built on the hey check utility, ordered from the cheapest check to the most thorough.

1. Syntax Layer (hey check syntax)

  • Action: Runs nix-instantiate --parse on every .nix file.
  • Goal: Catch missing semicolons, unclosed braces, and other syntax errors instantly. Nothing is evaluated, so a misspelled option name still passes this tier.
  • When: Run this after any multi-file refactor.
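The syntax tier can be approximated with stock Nix tooling. The sketch below is an assumption about what hey check syntax does, not its source. check_syntax is a hypothetical name. It skips .git and keeps going after a failure, so one run reports every broken file. Parse errors from nix-instantiate still go to stderr with their line and column.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of tier 1: parse (but never evaluate) every .nix file.
# Usage: check_syntax [DIR]   (defaults to the current directory)
# Returns 0 if every file parses, 1 if at least one file has a syntax error.
check_syntax() {
  local root="${1:-.}" failed=0 file
  # -print0 together with read -d '' keeps paths containing spaces intact.
  while IFS= read -r -d '' file; do
    # --parse only builds the syntax tree. Its error message (with line and
    # column) stays on stderr; the parsed output on stdout is discarded.
    if ! nix-instantiate --parse "$file" >/dev/null; then
      printf 'syntax error: %s\n' "$file" >&2
      failed=1
    fi
  done < <(find "$root" -name '*.nix' -not -path '*/.git/*' -print0)
  return "$failed"
}
```

Run check_syntax . from the flake root. A non-zero status means at least one file failed to parse.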

2. Integrity Layer (hey check flake)

  • Action: Runs nix flake check.
  • Goal: Ensure the flake schema is valid, inputs are correctly wired, and all outputs evaluate.
  • When: Run this after updating flake.nix, flake.lock, or adding new hosts.

3. Evaluation Layer (hey check eval)

  • Action: Evaluates a host’s config.system.build.toplevel.drvPath.
  • Goal: Catch infinite recursion and other evaluation errors, such as option type mismatches, failed assertions, and missing attributes. Computing the derivation path forces Nix to resolve the entire module tree, but the system itself is not built.
  • When: Crucial after any logic change to modules or library functions.
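For comparison, the same evaluation can be requested directly from the nix CLI. This is a hedged sketch, not the implementation of hey check eval. toplevel_attr and check_eval are hypothetical helpers. The sketch assumes the flake is in the current directory and exposes nixosConfigurations.<host>. The flake-style nix eval also needs the nix-command and flakes experimental features enabled.

```bash
#!/usr/bin/env bash
# Hypothetical helpers approximating tier 3 (hey check eval).

# Flake attribute whose value forces evaluation of the host's whole module tree.
toplevel_attr() {
  printf '.#nixosConfigurations.%s.config.system.build.toplevel.drvPath' "$1"
}

# Print the host's .drv store path on success. Evaluation errors (including
# infinite recursion) are reported with a full trace.
check_eval() {
  if [ "$#" -ne 1 ]; then
    printf 'usage: check_eval <host>\n' >&2
    return 2
  fi
  nix eval --raw --show-trace "$(toplevel_attr "$1")"
}
```

For example, check_eval id3-eniac prints the host's .drv store path if the configuration evaluates.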

🚀 Step-by-Step Refactoring Workflow

Step 1: Modification

Perform your refactor (e.g., extracting a shared option, unifying a script).

Step 2: Verification (The “Smoke Test”)

Run hey check eval --host <affected-host>. This is mandatory for every refactor, including those performed by AI agents.

  • If it passes, the configuration evaluates cleanly. Behavioral correctness is verified in Step 4.
  • If it fails with infinite recursion, use the --show-trace output to find the loop.
  • If hey check itself is broken (it crashes or errors out instead of reporting a result): stop your current task and fix the hey tool first (located in bin/hey or bin/hey.d/). The verification tooling must stay reliable at all times.

Step 3: Debugging Recursion

If you hit a recursion error:

  1. Find the last frame in the trace that points to a file in this repository (e.g., modules/desktop/term/alacritty.nix:131).
  2. Look for circular dependencies: does a mkIf condition depend on a value defined inside that same mkIf?
  3. Use hey repl to manually walk the attribute set:

     ```
     $ hey repl nix
     nix-repl> :lf .
     nix-repl> nixosConfigurations.id3-eniac.config.modules.desktop.term.alacritty.settings
     ```
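When the trace is long, a small filter can pick out the repository frame for step 1. This is a sketch that rests on three assumptions:

  • It expects the newer Nix trace layout, where the frame nearest the error is printed last.
  • It relies on flake sources being copied to /nix/store/<hash>-source/.
  • It recognizes repository files by their top-level directory (modules|hosts by default, a guess about this repo's layout).

Nixpkgs sources also live under a -source path, so avoid directory names such as lib that nixpkgs shares. last_repo_frame is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read a --show-trace log on stdin and print the last
# "path.nix:line" frame that belongs to this repository.
# $1: extended regex of the repo's top-level directories (assumed layout).
last_repo_frame() {
  local dirs="${1:-modules|hosts}"
  # Match store paths like /nix/store/<hash>-source/modules/...nix:131,
  # keep the last one, then strip the store prefix for readability.
  grep -oE "[^ ]*-source/(${dirs})/[^ :]*\.nix:[0-9]+" \
    | tail -n 1 \
    | sed -E 's#^.*-source/##'
}
```

Typical use: nix eval --show-trace <installable> 2>&1 | last_repo_frame.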

Step 4: Final Validation

Once evaluation passes, run a full sync or a VM build to verify behavioral correctness:

  • hey sync build-vm (The safest way to test broad system changes)
  • hey sync switch (The final application)

🛠️ Tool Summary

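| Command          | Tier | Purpose                    |
|------------------|------|----------------------------|
| hey check syntax | 1    | Catch syntax typos         |
| hey check flake  | 2    | Verify flake schema/inputs |
| hey check eval   | 3    | Detect infinite recursion  |
| hey check all    | 1-3  | Full host health check     |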
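The tier order matters because each tier costs more than the one before it, so a full check should stop at the first failure. Below is a sketch of that fail-fast idea, not the code behind hey check all. run_tiers is a hypothetical helper. It runs each command string in a fresh bash and returns the number of the tier that failed, or 0 if all tiers pass.

```bash
#!/usr/bin/env bash
# Hypothetical fail-fast runner: run each command in order, stop at the first
# failure, and return that tier's number (0 when every tier passes).
run_tiers() {
  local tier=0 cmd
  for cmd in "$@"; do
    tier=$((tier + 1))
    printf '==> tier %d: %s\n' "$tier" "$cmd"
    # Each command runs in a child bash, so caller-defined functions are not visible.
    if ! bash -c "$cmd"; then
      printf 'tier %d failed; later tiers skipped\n' "$tier" >&2
      return "$tier"
    fi
  done
  return 0
}
```

Usage: run_tiers "hey check syntax" "hey check flake" "hey check eval --host id3-eniac".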

📜 Recursion Safety Rules

  1. Lazy Evaluation is your Friend: Nix is lazy, but it must resolve a value to check a condition. If the condition for a module’s enablement depends on that module’s configuration, you have a loop.
  2. Avoid config inside mkIf conditions: Try to use pkgs or other inputs for conditions when possible.
  3. Use lib.mkIf carefully: Ensure the boolean condition can be decided without evaluating the definitions it guards, either directly or through other modules.

🔬 Advanced Debugging: The “Hey” Way

Obscure bugs in specialized environments (Janet, Zsh, custom packages) require a scientific approach: Observe -> Hypothesis -> Test -> Verify.

1. Analysis: The Core Dump Logic

If a tool (like Zsh) is crashing with SIGSEGV, don’t guess. Use gdb to find the exact point of failure:

```bash
# 1. Identify which binary produced the core file
file core.PID
# 2. Print the backtrace of the crashing thread
gdb -ex "bt" -batch /path/to/binary core.PID
```

Example: We discovered Zsh crashing in deletejob because setopt NOTIFY triggered an asynchronous signal handler while the shell was busy with complex completions, a classic race condition. Fix: unsetopt NOTIFY makes Zsh report background job status only just before printing a prompt. Job reporting then happens at a safe point instead of in the middle of a completion.

2. Hypothesis: Proving the “Silent Failure”

When a script runs without errors but does nothing (e.g., hey @rofi audiomenu), use tracing to “see” the internal state:

  • strace -f ...: Verify whether the process actually calls execve on the expected binaries (like rofi).
  • janet -c ...: Check for compile-time errors in macros.
  • Hypothesis Testing: Create minimal reproduction scripts (test-rofi.janet) to isolate the failure.
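To answer “did it ever launch rofi?” without reading the whole trace, record only execve calls and list the ones that succeeded. This is a sketch under two assumptions:

  • It expects strace’s line format when writing to a file with -f -o, where each line is prefixed by the PID.
  • It does not reassemble calls that strace splits into “unfinished” and “resumed” halves when processes interleave.

execed_binaries is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path of every successful execve() recorded in
# a trace file written by: strace -f -e trace=execve -o FILE <command>
# Failed attempts (e.g. "= -1 ENOENT" while searching PATH) are skipped.
execed_binaries() {
  # Optional "PID " prefix, then execve("PATH", ...) with return value 0.
  sed -nE 's/^([0-9]+ +)?execve\("([^"]+)".*= 0$/\2/p' "$@"
}
```

For example, run strace -f -e trace=execve -o /tmp/audiomenu.trace hey @rofi audiomenu, then execed_binaries /tmp/audiomenu.trace. The output shows whether a rofi binary was ever started.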

3. Case Study: The Janet Macro Trap

The hey toolchain relies on Janet macros for command dispatch. We fixed two critical bugs using this method:

  • The upscope Fix: We discovered that (do ...) creates a new scope, so a main defined inside it is hidden from the top level and never auto-executed. Switching to upscope, which runs its body without creating a new scope, solved the “nothing happens” bug.
  • The Stderr Corruption: The $<_ macro captures both stdout and stderr. If a system tool (like pactl) prints warnings to stderr, it corrupts the JSON output and crashes the parser. Solution: Always use sh -c "cmd 2>/dev/null" when capturing output meant for machine parsing.
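The stderr problem is easy to reproduce in plain shell. In this sketch, noisy_tool is a hypothetical stand-in for a tool like pactl: it prints a warning on stderr and JSON on stdout. Capturing both streams mixes the warning into the payload. Discarding stderr, as the sh -c "cmd 2>/dev/null" wrapper does, leaves only the JSON.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a tool that warns on stderr but emits JSON on stdout.
noisy_tool() {
  echo 'warning: example message on stderr' >&2
  echo '{"sinks":[]}'
}

# Capturing stdout and stderr together: the first line is the warning,
# so a JSON parser reading this string fails.
merged="$(noisy_tool 2>&1)"

# Discarding stderr first: only the machine-readable payload remains.
clean="$(noisy_tool 2>/dev/null)"

printf 'merged:\n%s\n\nclean:\n%s\n' "$merged" "$clean"
```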

4. Verification: The Final Step

A bug is not fixed until a regression test is added. See test/hey/rofi_debug.janet for examples of how to verify macro expansion and subprocess behavior in the judge framework.