Deep Review: 20260519-103443-pr-355

Date	2026-05-19 10:45
Repo	rancher-sandbox/rancher-desktop-daemon
Round	1
Author	@mook-as
PR	#355 — K3s: Use a dynamic port
Branch	`mook-as:dynamic-k3s-port`
Commits	`2db5c92` K3s: Use a dynamic port
Reviewers	Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default)
Verdict	Merge with fixes — one Important correctness gap (non-SSA status update violates this file's own concurrent-writer contract) plus a port-staleness window that breaches `ResolvePort`'s documented contract. Test coverage and several smaller polish items follow.
Wall-clock time	`14 min 20 s`

Critical Issues ¶

None.

Important Issues ¶

I1. Non-SSA Status().Update on App violates the concurrent-writer contract Claude Codex Gemini¶

pkg/controllers/app/app/controllers/app_controller.go

	limaVMErr := r.Get(ctx, client.ObjectKey{Name: limaVMName, Namespace: namespace}, limaVM)
	if limaVMErr != nil && !apierrors.IsNotFound(limaVMErr) {
		return ctrl.Result{}, limaVMErr
	}

	// If the Kubernetes port is not set, set it first.
	if app.Spec.Kubernetes.Enabled && app.Status.KubernetesPort == 0 {
		app.Status.KubernetesPort, err = controllers.ResolvePort(ctx, 7441+instance.Index())
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to resolve Kubernetes port: %w", err)
		}
		if err := r.Status().Update(ctx, &app); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to update App status: %w", err)
		}
		// The status update causes a requeue.
		return ctrl.Result{}, nil
	}

	if apierrors.IsNotFound(limaVMErr) {
		template, err := applySpecToTemplate(r.LimaTemplateData, app.Spec, app.Status)
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to apply spec to template: %w", err)

App.Status has multiple concurrent writers — AppReconciler mirrors LimaVM conditions and EngineReconciler.setEngineCondition writes ContainerEngineReady on the same object. The file's own later condition update at line 371 spells this out and uses retry.RetryOnConflict with a re-Get for exactly this reason. The new code reuses the app value fetched at line 136 and calls a bare r.Status().Update. When EngineReconciler lands a write between those two points, the API server returns 409, the reconcile errors out, controller-runtime requeues, and the next pass re-runs ResolvePort from scratch — opening and closing a fresh listener each time. The recovery path works, but the port can churn under contention, and the pattern is exactly what the rest of the file is structured to avoid.

pkg/controllers/app/app/controllers/app_controller.go

// The app controller is a cluster-scoped singleton - only one instance named 'app' is allowed.
func (r *AppReconciler) Reconcile(ctx context.Context, _ ctrl.Request) (ctrl.Result, error) {
	log := logf.FromContext(ctx)

	var app v1alpha1.App
	if err := r.Get(ctx, client.ObjectKey{Name: appName}, &app); err != nil {
		if !apierrors.IsNotFound(err) {
			log.Error(err, "Unable to fetch App singleton")
		}
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

pkg/controllers/app/app/controllers/app_controller.go

	// app's resourceVersion from the initial Get can be stale.
	// retry.RetryOnConflict + re-Get matches
	// EngineReconciler.setEngineCondition; without it, concurrent
	// writers 409-loop through controller-runtime requeues.
	engineEnabled := r.engineEnabled(ctx)
	if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		latest := &v1alpha1.App{}
		if err := r.Get(ctx, client.ObjectKeyFromObject(&app), latest); err != nil {
			return err
		}
		changed := false

Fix: mirror the existing pattern.

 // If the Kubernetes port is not set, set it first.
 if app.Spec.Kubernetes.Enabled && app.Status.KubernetesPort == 0 {
-    app.Status.KubernetesPort, err = controllers.ResolvePort(ctx, 7441+instance.Index())
+    port, err := controllers.ResolvePort(ctx, 7441+instance.Index())
     if err != nil {
         return ctrl.Result{}, fmt.Errorf("failed to resolve Kubernetes port: %w", err)
     }
-    if err := r.Status().Update(ctx, &app); err != nil {
+    if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
+        latest := &v1alpha1.App{}
+        if err := r.Get(ctx, client.ObjectKey{Name: appName}, latest); err != nil {
+            return err
+        }
+        if latest.Status.KubernetesPort != 0 {
+            return nil
+        }
+        latest.Status.KubernetesPort = port
+        return r.Status().Update(ctx, latest)
+    }); err != nil {
         return ctrl.Result{}, fmt.Errorf("failed to update App status: %w", err)
     }
     // The status update causes a requeue.
     return ctrl.Result{}, nil
 }

I2. Port probe runs minutes before the actual bind, well past ResolvePort's documented contract Codex Gemini¶

ResolvePort opens a 127.0.0.1:desired listener, closes it immediately, and returns the port. Its own godoc warns the caller to "call this immediately before the port is needed to minimize the window between releasing and rebinding" (pkg/service/controllers/ports.go:14-18). Here, AppReconciler calls it on first reconcile, persists the result to Status, requeues, then on a later reconcile creates the input ConfigMap, creates the LimaVM, and only minutes later — once Lima provisions the VM and sets up port forwarding — does anything bind the host port. Between the close and the bind, any other process on the host can take the port. Lima's portForwards block (pkg/controllers/app/app/lima-template.yaml:13-21) uses hostPortRange: [0, 0] to mirror the guest port, so a stolen port causes Lima to land on a different host port than the value in App.Status.KubernetesPort. The kubeconfig probe then writes a config pointing at a port no one is serving.

pkg/service/controllers/ports.go

	"fmt"
	"net"
	"strconv"
)

// ResolvePort binds a TCP port on localhost to confirm availability. If the
// desired port is occupied, the OS assigns a random port instead. The listener
// is closed before returning so the caller can rebind it. Call this immediately
// before the port is needed to minimize the window between releasing and
// rebinding.
func ResolvePort(ctx context.Context, desired int) (int, error) {
	lc := net.ListenConfig{}
	ln, err := lc.Listen(ctx, "tcp", "127.0.0.1:"+strconv.Itoa(desired))
	if err != nil {
		ln, err = lc.Listen(ctx, "tcp", "127.0.0.1:0")

pkg/controllers/app/app/lima-template.yaml

firmware:
  legacyBIOS: false
containerd:
  system: false
  user: false
portForwards:
- guestIPMustBeZero: true
  guestPortRange:
  - 1
  - 65535
  hostIP: 0.0.0.0
  hostPortRange:
  - 0
  - 0
- guestPortRange:
  - 0
  - 0
  guestSocket: /var/run/docker.sock
  hostPortRange:

Fix: tighten the resolve/bind window — move the resolve into the component that actually binds, or have that component report the actual host port back to status and treat App.Status.KubernetesPort as observed rather than intended. Short of that, document the contract breach on the field and on the resolve call so a future reader knows the stored port is advisory.

I3. No tests for the new port-resolution branch or applySpecToTemplate parameter Claude¶

The existing tests cover computeSettledCondition and engineEnabled only. The PR adds:

A new control-flow branch (pkg/controllers/app/app/controllers/app_controller.go:230-241) that fires only when Spec.Kubernetes.Enabled is true and Status.KubernetesPort == 0. Subsequent reconciles skip it, so an envelope-test that simply runs Reconcile twice will not exercise it.
A new parameter on applySpecToTemplate and a new template line (KUBERNETES_PORT: %d) emitted at two call sites (lines 244 and 328).

pkg/controllers/app/app/controllers/app_controller.go

	limaVMErr := r.Get(ctx, client.ObjectKey{Name: limaVMName, Namespace: namespace}, limaVM)
	if limaVMErr != nil && !apierrors.IsNotFound(limaVMErr) {
		return ctrl.Result{}, limaVMErr
	}

	// If the Kubernetes port is not set, set it first.
	if app.Spec.Kubernetes.Enabled && app.Status.KubernetesPort == 0 {
		app.Status.KubernetesPort, err = controllers.ResolvePort(ctx, 7441+instance.Index())
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to resolve Kubernetes port: %w", err)
		}
		if err := r.Status().Update(ctx, &app); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to update App status: %w", err)
		}
		// The status update causes a requeue.
		return ctrl.Result{}, nil
	}

	if apierrors.IsNotFound(limaVMErr) {
		template, err := applySpecToTemplate(r.LimaTemplateData, app.Spec, app.Status)
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to apply spec to template: %w", err)

pkg/controllers/app/app/controllers/app_controller_test.go


			r := &AppReconciler{Discovery: tt.discovery}
			assert.Equal(t, r.engineEnabled(t.Context()), tt.want)
		})
	}
}

Neither has unit coverage, and no BATS test asserts that app.status.kubernetesPort settles on a sensible value after first reconcile or that the rendered template contains the expected port. The function applySpecToTemplate is pure and easy to cover with a table-driven test on golden output; the reconcile branch is straightforward to cover with the existing fake-client setup.

Fix: add a table-driven unit test for applySpecToTemplate asserting KUBERNETES_PORT: <n> for representative inputs (Kubernetes enabled with port set; Kubernetes disabled with port 0), plus a reconciler test that drives the first-reconcile branch and asserts Status.KubernetesPort becomes non-zero. A BATS check that reads app.status.kubernetesPort and curls the published port for a TLS handshake would close the integration gap.

Suggestions ¶

S1. Pass kubernetesPort, not the whole AppStatus, to applySpecToTemplate Claude¶

pkg/controllers/app/app/controllers/app_controller.go

		return true
	}
	return slices.Contains(enabled, v1alpha1.EngineControllerName)
}

func applySpecToTemplate(baseTemplate string, spec v1alpha1.AppSpec, status v1alpha1.AppStatus) (string, error) {
	hostHome, err := os.UserHomeDir()
	if err != nil {
		return "", fmt.Errorf("failed to get host home directory: %w", err)
	}
	return strings.Join([]string{
		baseTemplate,
		"param:",
		fmt.Sprintf("  CONTAINER_ENGINE: %s", spec.ContainerEngine.Name),
		fmt.Sprintf("  HOST_DOCKER_SOCKET: %q", instance.DockerSocket()),
		fmt.Sprintf("  HOST_HOME_GUEST: %q", toLinuxPath(hostHome)),
		fmt.Sprintf("  KUBERNETES_ENABLED: %v", spec.Kubernetes.Enabled),
		fmt.Sprintf("  KUBERNETES_VERSION: %s", spec.Kubernetes.Version),
		fmt.Sprintf("  KUBERNETES_PORT: %d", status.KubernetesPort),
		"",
	}, "\n"), nil
}

// toLinuxPath converts a host path to a Linux-accessible path inside a Lima VM.
// On Windows, os.UserHomeDir() returns a Windows path (e.g. C:\Users\foo).
// Inside a WSL2 Lima VM the Windows filesystem is mounted at /mnt/<drive>/...,
// so we convert it to WSL2 supported path. On other platforms the path is returned unchanged.

AppStatus carries Conditions []metav1.Condition plus the new KubernetesPort int. The function uses only the latter. The wider signature invites a future reader to wonder whether conditions are intentionally baked into the template, and forces the signature to drift every time an unrelated status field is added.

-func applySpecToTemplate(baseTemplate string, spec v1alpha1.AppSpec, status v1alpha1.AppStatus) (string, error) {
+func applySpecToTemplate(baseTemplate string, spec v1alpha1.AppSpec, kubernetesPort int) (string, error) {
     ...
-    fmt.Sprintf("  KUBERNETES_PORT: %d", status.KubernetesPort),
+    fmt.Sprintf("  KUBERNETES_PORT: %d", kubernetesPort),

Update both call sites (lines 244 and 328) to pass app.Status.KubernetesPort.

pkg/controllers/app/app/controllers/app_controller.go

		// The status update causes a requeue.
		return ctrl.Result{}, nil
	}

	if apierrors.IsNotFound(limaVMErr) {
		template, err := applySpecToTemplate(r.LimaTemplateData, app.Spec, app.Status)
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to apply spec to template: %w", err)
		}
		inputCM := &corev1.ConfigMap{}
		cmErr := r.Get(ctx, client.ObjectKey{Name: inputConfigMapName, Namespace: namespace}, inputCM)

S2. Stored port is sticky across kubernetes.enabled toggle cycles Claude Codex Gemini¶

pkg/controllers/app/app/controllers/app_controller.go

	if limaVMErr != nil && !apierrors.IsNotFound(limaVMErr) {
		return ctrl.Result{}, limaVMErr
	}

	// If the Kubernetes port is not set, set it first.
	if app.Spec.Kubernetes.Enabled && app.Status.KubernetesPort == 0 {
		app.Status.KubernetesPort, err = controllers.ResolvePort(ctx, 7441+instance.Index())
		if err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to resolve Kubernetes port: %w", err)
		}
		if err := r.Status().Update(ctx, &app); err != nil {

The guard re-resolves only when KubernetesPort == 0. A user who enables Kubernetes (port resolves to 7443), disables it, and re-enables it minutes later re-uses the stored 7443 with no check that the host port is still free. Combined with the I2 staleness window, this lets the stored port drift arbitrarily far from reality. Either zero KubernetesPort on the enabled true→false transition so the next enable resolves fresh, or document the stickiness in the field's godoc so a future kubeconfig-merge feature can surface the discrepancy when Lima ends up forwarding to a different host port.

S3. # $PARAM_KUBERNETES_PORT validation-marker comment is opaque and fragile Claude¶

pkg/controllers/app/app/lima-template.yaml

provision:
- mode: system
  script: |
    #!/bin/bash

    # All param need to be mentioned for Lima validation.
    # $PARAM_KUBERNETES_PORT

    set -o errexit -o nounset -o pipefail -o xtrace

    systemctl stop rancher-desktop.target

The existing line is already terse; combined with a comment that looks like a stub variable reference, a future reader has to guess what # $PARAM_KUBERNETES_PORT is doing there. The pattern is also brittle: anyone adding a new param has to remember to add a matching marker comment, even though the for p in "${!PARAM_@}" loop below would otherwise pick it up automatically.

-    # All param need to be mentioned for Lima validation.
-    # $PARAM_KUBERNETES_PORT
+    # Lima validates that every declared `param:` entry is referenced somewhere
+    # in the template. The loop below uses bash indirection that Lima's static
+    # scanner cannot see, so list any params consumed only via ${!PARAM_@} here:
+    #   $PARAM_KUBERNETES_PORT

S4. Design docs still describe the API as hardcoded 6443 Claude Codex¶

docs/design/networking.md:185 reads:

> - Kubernetes port forwarding: Pre-forward 127.0.0.1:6443 to 192.168.127.2:6443 so kubectl on the host reaches the K8s API inside the VM.

The future-work item still names 6443, contradicting the PR. docs/design/api_app.md's App status example (lines 66-85) shows only conditions — no kubernetesPort, even though the field is now part of the documented API surface.

docs/design/networking.md


    Note over HS: Match found for GUID-A
```

The registry is rescanned every second because `startHostSwitch` runs before the hostagent boots the WSL2 VM. On a fresh system where no other WSL2 distro is running, the utility VM may not yet appear in the registry when the first scan runs.

The signature phrase is `"github.com/rancher-sandbox/rancher-desktop/src/go/networking"`. This is a fixed protocol constant shared between host-switch and the guest's `network-setup` binary. Because the signature is product-wide rather than per-instance, this discovery assumes only one opensuse WSL2 instance runs at a time.

### Phase 2: Data Channel (port 6656)

After identifying the VM, the host-switch creates a listener and signals readiness.

```mermaid
sequenceDiagram
    participant HS as Host-Switch
    participant NS as network-setup (guest)
    participant VS as vm-switch (guest)

    HS->>HS: Listen on vsock port 6656
    HS->>NS: Dial port 6669, send "READY"
    NS->>NS: Connect to vsock port 6656
    NS->>VS: Spawn vm-switch (pass vsock FD)
    VS->>VS: Create TAP device (eth0)
    VS->>HS: Ethernet frames over vsock
    HS->>VS: Ethernet frames over vsock
```

Ethernet frames use a simple framing protocol: a 2-byte little-endian size prefix followed by the raw frame payload.

## Virtual Network Services

Update the networking bullet to refer to App.Status.KubernetesPort (or 7441+index) and add kubernetesPort: <n> plus a one-line description to the api_app.md status example.

Design Observations ¶

Concerns ¶

Host-port reservation is advisory but is treated as authoritative. (in-scope) Claude Codex ResolvePort opens and immediately closes a TCP listener. Nothing between that close and Lima's actual bind prevents another process from taking the port. Until this PR, the helper's callers all bound the port directly from the same goroutine. This PR is the first caller whose stored value is the user-facing port and whose binding component (Lima's hostagent) lives elsewhere. The split between "who chose the port" and "who binds the port" is the root of I2 and S2 and is worth re-evaluating at the design level — either the binding component should choose, or App.Status.KubernetesPort should be marked as observed rather than intended.
KUBERNETES_PORT=0 is written to /etc/sysconfig/rancher-desktop whenever Kubernetes is disabled. (future) Claude When Spec.Kubernetes.Enabled == false, the block at lines 230-241 is skipped, Status.KubernetesPort stays at zero, and applySpecToTemplate still emits KUBERNETES_PORT: 0. Today this is latent — the k3s command-override.conf consuming the value is gated by the systemd ExecCondition on KUBERNETES_ENABLED, so 0 never reaches --https-listen-port. A future systemd unit that reads KUBERNETES_PORT without also gating on KUBERNETES_ENABLED would silently misbehave. Either omit KUBERNETES_PORT from the emitted block when Kubernetes is disabled, or document the contract on the field.

Strengths ¶

The set-status-then-requeue structure at lines 230-241 keeps port resolution idempotent and out of the same write that creates the LimaVM, so a transient API-server failure cannot leave the system half-configured. Claude
Both call sites of applySpecToTemplate (lines 244 and 328) were updated together, keeping the steady-state template-update path and the first-time creation path consistent on what data feeds the template. Claude
Storing the selected port in App status gives CLI and UI consumers a single observable API field, once the staleness window in I2 is tightened. Codex

Testing Assessment ¶

Untested scenarios, highest risk first:

First reconcile with Spec.Kubernetes.Enabled=true writes Status.KubernetesPort and requeues without creating the LimaVM (I3, the new branch at lines 230-241).
applySpecToTemplate honours status.KubernetesPort in the emitted template (pure function, no current test).
Concurrent-writer race: EngineReconciler.setEngineCondition updates a condition between the Get at line 136 and the Status().Update at line 236; today's code returns an error and is requeued (I1).
applySpecToTemplate when Spec.Kubernetes.Enabled=false and Status.KubernetesPort=0 — verify the template still parses and the embedded KUBERNETES_PORT: 0 does not break downstream consumers.
kubernetes.enabled toggle cycles — disable then re-enable does not re-resolve the port (S2).
End-to-end: rendered Lima template, generated kubeconfig, and App.Status all agree on the selected fallback port when 7441+index is occupied.

go test ./pkg/controllers/app/app/controllers passes on the PR SHA.

Documentation Assessment ¶

docs/design/networking.md:185 still names the old hardcoded 6443 (S4).
docs/design/api_app.md shows an App status example without kubernetesPort (S4).
The new KubernetesPort godoc in pkg/apis/app/v1alpha1/app_types.go:106-109 is accurate but minimal. One additional sentence noting that the value is the host port resolved on first reconcile (not re-checked per reconcile) and that it is set only once when kubernetes.enabled first goes true would close the documentation half of S2.
pkg/controllers/app/app/lima-template.yaml:35-36 validation-marker comment is opaque (S3).

Commit Structure ¶

The PR is a single commit (2db5c92 K3s: Use a dynamic port); the diff is self-contained. No fixup history, no scope creep.

Acknowledged Limitations ¶

PR review comment (resolved):

pkg/controllers/app/app/lima-template.yaml — @Nino-K asked whether HOST_HOME_GUEST needed shell quoting when it contains quotes, and suggested %q instead of %s (discussion r3249796148). @mook-as agreed and the current diff uses printf '%s=%q\n' "${p#PARAM_}" "${!p}" for every PARAM (discussion r3249928003). The amendment is in-tree.

Unresolved Feedback ¶

General comment from @mook-as on the PR: *"I don't understand why BATS is failing on Windows, that looks unrelated. But it's passing fine on main, so it's probably to do with the PR?"* The current CI snapshot for PR #355 shows all Windows BATS jobs (bats-app, bats-lima, bats-others) passing, so whatever blocked an earlier push appears resolved. No additional commit explains the recovery. Worth a one-line acknowledgement on the PR before merge so the comment does not look like an unresolved concern.

Agent Performance Retro ¶

Claude ¶

Strongest performance this round. Caught every Important finding raised by any agent, contributed the only test-coverage finding (I3), and added three of the four Suggestions on its own (S1 function-signature cohesion, S3 cryptic marker comment, S4 docs reconciliation). Framed the host-port-reservation race as a design observation rather than an Important — defensible, since the race predates this PR — and complemented the framing with the explicit KUBERNETES_PORT=0-when-disabled observation no other agent surfaced. Worth highlighting: Claude's I1 fix sketch is a drop-in replacement that another reviewer could land verbatim, including an idempotency guard (if latest.Status.KubernetesPort != 0 { return nil }) the other agents' fixes omit.

Codex ¶

Crisp and accurate. Two Important findings (I1 status-update pattern, I2 port-staleness window) both verified against the code, with citations into the engine reconciler's own RetryOnConflict block and the ResolvePort godoc; the I1 fix cites the exact line of the comparable retry block. Codex's I2 uniquely bundled the disable/re-enable staleness angle (Claude split it out as S2) — the bundled framing is reasonable but the split is clearer for tracking fixes. No false positives, no severity downgrades. Codex also ran go test ./pkg/controllers/app/app/controllers directly to confirm the type change compiles.

Gemini ¶

Two false positives ate most of the signal. C1 claimed K3s never sees the dynamic port because /etc/sysconfig/rancher-desktop is not sourced — wrong: the rancher-desktop-opensuse image's k3s.service.d/{environment,command-override}.conf source it and use $KUBERNETES_PORT as --https-listen-port. The image lives in a sibling repo that Gemini did not have access to, but the I-just-checked-the-image-side gap was missed, and the framing escalated a "I cannot see the image side" gap to a Critical regression. I1 (%q breaking systemd parsing as the root cause of Windows BATS failure) is also wrong: the current PR's Windows BATS jobs all pass, and the only env-file values actually consumed by systemd services (CONTAINER_ENGINE, KUBERNETES_ENABLED, KUBERNETES_PORT) have no characters %q would quote, so there is no observable bug. C2 (non-SSA status update) and I2 (port-probe race) reduced to the same Important findings as the other two agents — useful corroboration but no unique insight. As expected from the repo context, Gemini also did not run git blame.

Summary ¶

	Claude Opus 4.7	Codex GPT 5.5	Gemini 3.1 Pro
Duration	8m 48s	7m 12s	—
Findings	2I 4S	2I 1S	1I
Tool calls	41 (Grep 20, Read 12, Bash 8)	45 (shell 45)	—
Design observations	2	1	1
False positives	0	0	2
Unique insights	3	0	0
Files reviewed	4	4	4
Coverage misses	0	0	0
Totals	2I 4S	2I 1S	1I
Downgraded	0	0	1 (I→S)
Dropped	0	0	2

Claude was the most valuable reviewer this round — every finding it raised survived consolidation, and three of the seven consolidated findings (I3, S1, S3) were unique to it. Codex was a clean, accurate second reviewer that corroborated Claude's correctness work and contributed the precise EngineReconciler citation for I1's fix. Gemini's contribution after dropping its two false positives is best summarised as "corroborated what the other two already had"; the false-positive cost (one Critical, one Important) was high relative to the unique signal added.

Reconciliation

Gemini C1 (K3s ignores dynamically allocated port): dropped as false positive. The downstream consumer in the rancher-desktop-opensuse image already sources /etc/sysconfig/rancher-desktop and uses $KUBERNETES_PORT.
Gemini I1 (%q breaks systemd parsing as the root cause of Windows BATS failure): dropped as false positive. Current PR Windows BATS jobs all pass, and the systemd-consumed variables (CONTAINER_ENGINE, KUBERNETES_ENABLED, KUBERNETES_PORT) take no characters %q would quote.
Gemini C2 (non-SSA status update): downgraded from Critical to Important (I1). The error path requeues, so the failure mode is wasted churn rather than data loss.

Review Process Notes ¶

Repo context updates ¶

Add a context entry for the rancher-desktop-opensuse sibling repo and how it consumes /etc/sysconfig/rancher-desktop. When the App controller emits a Lima param that flows into the env file, the consumer is a systemd drop-in in rancher-desktop-opensuse/root/usr/local/lib/systemd/system/*.service.d/{environment,command-override}.conf — not a script inside lima-template.yaml. Reviewers without that context will read the controller-side diff as "the value is written but nothing reads it" and raise a Critical false positive (as happened this round). Suggested wording for deep-review-context.md: *"The rancher-desktop-opensuse image (sibling repo at ../rancher-desktop-opensuse) is the consumer of /etc/sysconfig/rancher-desktop. Each service that needs an env-file value carries a *.service.d/environment.conf drop-in with EnvironmentFile=-/etc/sysconfig/rancher-desktop and a command-override.conf that references the var (e.g., k3s uses $KUBERNETES_PORT for --https-listen-port). When reviewing changes that write to that env file, trace the consumer in the image repo, not in lima-template.yaml."*
Add a context entry for pkg/service/controllers.ResolvePort's contract. The helper opens a TCP listener on 127.0.0.1:desired, closes it immediately, and returns the port. Its godoc says callers must invoke it immediately before binding. The check is also localhost-only and does not detect another process already bound on a specific external IP, so the result is best treated as advisory. Suggested wording: *"controllers.ResolvePort(ctx, desired) is a localhost-only probe that closes the listener before returning, so the result is advisory. Flag callers that persist the result across reconciles or pass it to a binder that runs minutes later — the helper's godoc requires "immediately before the port is needed". Also flag use in code paths where the eventual bind happens on a non-loopback interface, since the localhost probe does not see external-interface holders."*