Workflow
Professional Development Workflow: Refactor-Debug-Verify
A deterministic cycle for developing, refactoring, and maintaining a complex Nix Flake.
🏗️ The Problem: The “Hope-Based” Loop
In large Nix configurations, a small change in a library or shared module can trigger an infinite recursion error or a broken flake output. Relying on nixos-rebuild switch (the “Sync” command) to catch these is slow and frustrating because it involves building the entire derivation before failing.
🧪 The Solution: The “Check-First” Cycle
We employ a professional three-tiered verification strategy using the hey check utility.
1. Syntax Layer (hey check syntax)
- Action: Runs `nix-instantiate --parse` on every `.nix` file.
- Goal: Catch missing semicolons, unclosed braces, or basic typos instantly.
- When: Run this after any multi-file refactor.
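The syntax pass can be approximated in plain shell. This is a minimal sketch, not the actual implementation of `hey check syntax`; the function names and the `NIX_INSTANTIATE` override variable are hypothetical, introduced only so the loop can be exercised on a machine without Nix.

```shell
# List every .nix file under a directory that fails to parse.
# NIX_INSTANTIATE is a hypothetical override; it defaults to the real binary.
list_syntax_errors() {
  find "${1:-.}" -type f -name '*.nix' | while IFS= read -r file; do
    "${NIX_INSTANTIATE:-nix-instantiate}" --parse "$file" >/dev/null 2>&1 ||
      printf '%s\n' "$file"
  done
}

# Return non-zero and report on stderr if any file failed to parse.
check_syntax() {
  bad=$(list_syntax_errors "${1:-.}")
  if [ -n "$bad" ]; then
    printf 'syntax errors in:\n%s\n' "$bad" >&2
    return 1
  fi
}
```

Running `check_syntax .` from the repository root parses each file without evaluating it, which is why this tier stays fast even on large trees.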
2. Integrity Layer (hey check flake)
- Action: Runs `nix flake check`.
- Goal: Ensure the flake schema is valid, inputs are correctly wired, and all outputs can be evaluated.
- When: Run this after updating `flake.nix` or `flake.lock`, or after adding new hosts.
3. Evaluation Layer (hey check eval)
- Action: Performs a deep evaluation of a host’s `system.build.toplevel.drvPath`.
- Goal: Catch infinite recursion and logic errors. This forces Nix to resolve the entire module tree without downloading or building anything.
- When: Crucial after any logic change to modules or library functions.
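For reference, the evaluation tier can be approximated with the stock `nix eval` command. This is a sketch under assumptions: it is not how `hey check eval` is necessarily implemented, the helper names `toplevel_attr` and `check_eval` are hypothetical, and it requires the `nix-command` and `flakes` features to be enabled.

```shell
# Build the flake attribute path for a host's toplevel derivation.
# Hypothetical helper; hostnames containing dots would need extra quoting.
toplevel_attr() {
  printf '.#nixosConfigurations.%s.config.system.build.toplevel.drvPath' "$1"
}

# Evaluate the derivation path only; nothing is built.
# --show-trace prints the evaluation trace if evaluation fails.
check_eval() {
  nix eval --raw "$(toplevel_attr "$1")" --show-trace
}
```

Calling `check_eval id3-eniac` prints the `.drv` store path on success, which confirms that the whole module tree for that host resolved.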
🚀 Step-by-Step Refactoring Workflow
Step 1: Modification
Perform your refactor (e.g., extracting a shared option, unifying a script).
Step 2: Verification (The “Smoke Test”)
Run hey check eval --host <affected-host>. This is mandatory for every refactor, including those performed by AI Agents.
- If it passes, the logic is sound.
- If it fails with `infinite recursion`, use the `--show-trace` output to find the loop.
- If `hey check` itself is broken or errors out: Stop your current task and refactor the `hey` tool itself (located in `bin/hey` or `bin/hey.d/`). The verification ecosystem must be kept robust at all times.
Step 3: Debugging Recursion
If you hit a recursion error:
- Check the trace for the last file in the repository (e.g., `modules/desktop/term/alacritty.nix:131`).
- Look for “Circular Dependencies”: Is a `mkIf` condition depending on a value defined inside that same `mkIf`?
- Use `hey repl` to manually walk the attribute set:
hey repl nix
nix-repl> :lf .
nix-repl> nixosConfigurations.id3-eniac.config.modules.desktop.term.alacritty.settings
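Finding the last repository file in a long trace can be scripted. The filter below is a rough sketch that assumes flake sources appear in the trace as `/nix/store/<hash>-source/<relative path>`, and it treats a frame as belonging to the repository when the relative path exists in the working tree. The function name `last_repo_frame` is hypothetical.

```shell
# Read a --show-trace dump on stdin and print the last "path:line"
# whose file exists under the given repository root.
last_repo_frame() {
  repo="${1:-.}"
  grep -oE '/nix/store/[a-z0-9]+-source/[^ :]+\.nix:[0-9]+' |
    sed -E 's|^/nix/store/[a-z0-9]+-source/||' |
    while IFS= read -r frame; do
      # Strip the ":line" suffix before testing for the file.
      if [ -f "$repo/${frame%:*}" ]; then
        printf '%s\n' "$frame"
      fi
    done | tail -n 1
}
```

Pipe the captured trace into it, for example `last_repo_frame . < trace.log`. Frames from nixpkgs are skipped because paths such as `lib/modules.nix` normally do not exist in the repository; if the repository happens to contain a file at the same relative path, the heuristic will report a false match.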
Step 4: Final Validation
Once evaluation passes, run a full sync or a VM build to verify behavioral correctness:
- `hey sync build-vm` (The safest way to test broad system changes)
- `hey sync switch` (The final application)
🛠️ Tool Summary
| Command | Tier | Purpose |
|---|---|---|
| `hey check syntax` | 1 | Catch syntax typos |
| `hey check flake` | 2 | Verify flake schema/inputs |
| `hey check eval` | 3 | Detect infinite recursion |
| `hey check all` | 1-3 | Full host health check |
📜 Recursion Safety Rules
- Lazy Evaluation is your Friend: Nix is lazy, but it must resolve a value to check a condition. If the condition for a module’s enablement depends on that module’s configuration, you have a loop.
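- Avoid self-referential `config` in `mkIf` conditions: A condition may read other options (such as a module’s own `enable` flag), but not a value that the same `mkIf` body defines. Try to use `pkgs` or other inputs for conditions when possible.
- Use `lib.mkIf` carefully: Ensure the boolean condition is “stable” and doesn’t bounce back and forth between definitions.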
🔬 Advanced Debugging: The “Hey” Way
Obscure bugs in specialized environments (Janet, Zsh, Custom Packages) require a scientific approach: Observe -> Hypothesis -> Test -> Verify.
1. Analysis: The Core Dump Logic
If a tool (like Zsh) is crashing with SIGSEGV, don’t guess. Use gdb to find the exact point of failure:
# 1. Identify the source
file core.PID
# 2. Get the backtrace
gdb -ex "bt" -batch /path/to/binary core.PID
Example: We discovered Zsh crashing in deletejob because setopt NOTIFY triggered an asynchronous signal handler while the shell was busy with complex completions—a classic race condition.
Fix: unsetopt NOTIFY ensures signals are only handled at safe prompt intervals.
2. Hypothesis: Proving the “Silent Failure”
When a script runs without errors but does nothing (e.g., hey @rofi audiomenu), use tracing to “see” the internal state:
- `strace -f ...`: Verify if the process is actually `execve`-ing the expected binaries (like `rofi`).
- `janet -c ...`: Check for compile-time errors in macros.
- Hypothesis Testing: Create minimal reproduction scripts (`test-rofi.janet`) to isolate the failure.
3. Case Study: The Janet Macro Trap
The hey toolchain relies on Janet macros for command dispatch. We fixed two critical bugs using this method:
- The `upscope` Fix: Discovering that `(do ...)` hides definitions from the top-level `main` auto-execution. Switching to `upscope` solved the “nothing happens” bug.
- The Stderr Corruption: The `$<_` macro captures both `stdout` and `stderr`. If a system tool (like `pactl`) prints warnings to `stderr`, it corrupts the JSON output and crashes the parser. Solution: Always use `sh -c "cmd 2>/dev/null"` when capturing output meant for machine parsing.
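The stderr problem is easy to reproduce outside Janet. The sketch below uses a made-up `noisy_tool` function as a stand-in for a command like `pactl`; its warning text and JSON payload are invented for illustration.

```shell
# Stand-in for a real tool: JSON on stdout, a warning on stderr.
noisy_tool() {
  echo 'warning: deprecated option' >&2
  echo '{"sinks": 2}'
}

# Merging the streams puts the warning in front of the JSON payload.
merged=$(noisy_tool 2>&1)

# Discarding stderr keeps only the machine-readable output.
clean=$(noisy_tool 2>/dev/null)

printf 'merged: %s\n' "$merged"
printf 'clean:  %s\n' "$clean"
```

Only `clean` is safe to hand to a JSON parser. The trade-off is that real errors on `stderr` are also hidden, so redirecting to a log file instead of `/dev/null` may be preferable while debugging.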
4. Verification: The Final Step
A bug is not fixed until a regression test is added. See test/hey/rofi_debug.janet for examples of how to verify macro expansion and subprocess behavior in the judge framework.