# Deep Review: 20260413-120825-pr-278

| | |
|---|---|
| **Date** | 2026-04-13 12:08 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 1 |
| **Author** | [@Nino-K](https://github.com/Nino-K) |
| **PR** | [#278](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/278) — Bundle k3s versions to app controller |
| **Commits** | `070efc9` Write KUBERNETES_ENABLED and KUBERNETES_PORT to sysconfig<br>`e57a182` Resolve HOST_HOME param to actual host home directory<br>`16ef12c` Add BATs test for supported k8s version in k3s-version.json<br>`9df4002` Bundle k3s-versions.json and validate Kubernetes version |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — validator contract is broken for ~11% of bundled versions and empty-version bypass lets bad specs reach the installer; both need fixes before merge. |
| **Wall-clock time** | `19 min 49 s` |

---

## Executive Summary

This PR embeds a filtered `k3s-versions.json` into the App controller and rejects unsupported `spec.kubernetes.version` values with `Ready=False/UnsupportedKubernetesVersion` before creating the LimaVM. It also fixes two unrelated VM-provisioning issues: `HOST_HOME` is now resolved from `os.UserHomeDir()` instead of the literal Lima `{{.Home}}` placeholder, and the sysconfig file now carries `KUBERNETES_ENABLED` and `KUBERNETES_PORT` for k3s service conditions.

The core intent is sound, but the validation has several gaps that let bad specs through or get stuck in a bad state. All three agents independently flagged the suffix-mismatch issue — the bundled list contains `+k3s3` releases but the install template hard-codes `+k3s1`, so four versions pass validation and then fail at install time. All three also flagged the empty-version bypass and the dropped `HOST_HOME` quoting. Two agents caught that `Ready=False` persists indefinitely once set, even after the user fixes the spec. Claude uniquely caught a test assertion that accidentally accepts `NotReady`, and Gemini uniquely caught that the unconditional `os.UserHomeDir()` call at the top of Reconcile would block finalizer cleanup if `HOME` were unset.

No critical-severity findings survived consolidation, but the important-issue list is dense (6 findings) and most block the feature's stated contract.

---

## Critical Issues

None.

---

## Important Issues

I1. **Bundled list contains versions the install template cannot install** — `pkg/controllers/app/app/k3s-versions.json:23,33,39,43` [Claude Opus 4.6, Codex GPT 5.4]

```json
"v1.32.11+k3s3",
"v1.33.7+k3s3",
"v1.34.3+k3s3",
"v1.35.0+k3s3",
```

`parseK3sVersions` strips both the leading `v` and the `+k3s*` suffix, so the lookup set becomes `{"1.32.11", "1.33.7", ...}`. A user setting `kubernetes.version=1.32.11` passes validation at `app_controller.go:127`. The controller then creates the LimaVM, and the provisioning script builds `INSTALL_K3S_VERSION` from the bare version plus a hardcoded `+k3s1` suffix — a tag that does not exist on the k3s release page. `curl -sfL https://get.k3s.io | sh -` aborts under `set -o errexit` and the VM never reaches Running.

```bash
# pkg/controllers/app/app/lima-template.yaml:57
export INSTALL_K3S_VERSION="v${PARAM_KUBERNETES_VERSION}+k3s1"
```

4 of 35 bundled versions (~11%) hit this. The validator's contract ("if it's in the list, it works") is established and immediately violated.

Fix: have `parseK3sVersions` return a map from bare semver to the full release identifier (e.g., `{"1.32.11": "v1.32.11+k3s3"}`). Thread the full tag through `applySpecToTemplate` as a new `PARAM_K3S_RELEASE`, and rewrite the template line to `export INSTALL_K3S_VERSION="${PARAM_K3S_RELEASE}"`. Drop the hardcoded `+k3s1`.

---

I2. **`os.UserHomeDir()` call blocks finalizer cleanup on failure** — `pkg/controllers/app/app/controllers/app_controller.go:70-73` [Gemini 3.1 Pro]

```go
hostHome, err := os.UserHomeDir()
if err != nil {
    return ctrl.Result{}, fmt.Errorf("failed to get host home directory: %w", err)
}

// Handle deletion, delete the LimaVM and wait for it to finish cleaning up.
if base.IsBeingDeleted(&app) {
```

The `os.UserHomeDir()` lookup runs unconditionally at the top of every reconcile, before `base.IsBeingDeleted(&app)` is evaluated. If `HOME` ever becomes unset or inaccessible (e.g., the daemon is restarted under a launchd context that clears the environment), the finalizer cleanup path deadlocks: the App can never be deleted even though the cleanup logic does not need the host home directory. The correct placement is after the deletion branch and the cleanup-finalizer setup, co-located with the first caller of `applySpecToTemplate`.

Fix: move the `os.UserHomeDir()` call below the `base.IsBeingDeleted` block. Better: cache the result once at controller setup time (see S1).

---

I3. **Empty `kubernetes.version` with `enabled=true` bypasses validation** — `pkg/controllers/app/app/controllers/app_controller.go:122` [Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro]

```go
if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
```

`KubernetesSpec.Version` is `+optional` with no `+kubebuilder:default`, so a user creating an App with `{kubernetes: {enabled: true}}` and no version sets `Version=""`. The check short-circuits, the reconciler proceeds, the template exports `INSTALL_K3S_VERSION="v+k3s1"`, and k3s install fails with a non-existent version tag. The CLI `rdd set` explicitly supports clearing the version field (`set.bats:124-131`), so this is a reachable state.

This is exactly the class of bad input the PR set out to prevent, but the empty-string case falls through into the VM. The user only sees a `CreateFailed` or k3s install error with no pointer to the spec mistake.

Fix: drop the `Version != ""` clause and let the empty-string lookup naturally fail with `UnsupportedKubernetesVersion`. Optionally tailor the message to "version is required when enabled" when empty. A `+kubebuilder:default` matching a bundled version would close this at the API layer.

```diff
-if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
+if app.Spec.Kubernetes.Enabled {
```

---

I4. **Stale `Ready=False` condition is never cleared after the spec is fixed** — `pkg/controllers/app/app/controllers/app_controller.go:121-144,268-284` [Claude Opus 4.6, Codex GPT 5.4]

```go
if _, ok := supportedVersions[app.Spec.Kubernetes.Version]; !ok {
    // sets Ready=False / UnsupportedKubernetesVersion
    return ctrl.Result{}, nil
}
// falls through — never touches Ready again
```

The mirror loop at lines 268-284 only writes condition types the LimaVM exposes (`Created`, `Running`). LimaVM never sets `Ready`. So once the validation block has stamped `Ready=False` on the App and the user does `rdd set kubernetes.version=1.32.0` (or `kubernetes.enabled=false`), the condition persists indefinitely. `kubectl get app -o yaml` will show `Ready=False` even though the VM is happily running k3s. The new BATS tests don't catch this because every test calls `delete_app` first, which destroys the conditions.

Fix: when validation passes (or Kubernetes is disabled entirely), drop the prior `Ready` condition and requeue. Using `apimeta.RemoveStatusCondition` followed by `r.Status().Update` is the minimal change:

```diff
+if apimeta.RemoveStatusCondition(&app.Status.Conditions, "Ready") {
+    if err := r.Status().Update(ctx, &app); err != nil {
+        return ctrl.Result{}, err
+    }
+    return ctrl.Result{}, nil
+}
```

---

I5. **Dropped `HOST_HOME` quoting is fragile against paths with YAML-special characters** — `pkg/controllers/app/app/controllers/app_controller.go:48-51` [Claude Opus 4.6, Gemini 3.1 Pro]

```go
return baseTemplate + fmt.Sprintf(
    "\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %s\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
    spec.ContainerEngine.Name,
    hostHome,
```

The previous line read `HOST_HOME: \"{{.Home}}\"` — the literal string was double-quoted in the YAML output. Commit e57a182 replaces `{{.Home}}` with `os.UserHomeDir()` (correct fix for the literal-template bug) but drops the quotes. On Windows, `os.UserHomeDir()` returns `%USERPROFILE%`, which commonly includes spaces (`C:\Users\Jan Dubois`) and can legally contain `#` in usernames, which turns into a YAML comment. Even on Unix, a user with a `#` or `:` in their home path would trip the YAML parser.

Fix: use `%q` so the value is always emitted as a quoted YAML scalar. Lima's param parser accepts double-quoted scalars — the existing template uses them elsewhere (e.g., `hostSocket: "{{.Home}}/.rd/docker.sock"`).

```diff
-"\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %s\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
+"\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %q\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
```

---

I6. **`assert_output --partial "Ready"` accepts `NotReady`** — `bats/tests/32-app-controllers/app.bats:330-333` [Claude Opus 4.6]

```bash
@test "kubernetes nodes are Ready" {
    run -0 rdd limavm shell "${VM_NAME}" sudo k3s kubectl get nodes
    assert_output --partial "Ready"
}
```

`kubectl get nodes` prints either `Ready` or `NotReady` in the STATUS column. A substring search for `Ready` matches both, so the test gives a false positive when the node is still `NotReady` (cordoned, network init, transient failures). Single-node k3s usually flips Ready quickly, so this rarely fires in practice — but the test claims to verify readiness and doesn't.

Fix: use `kubectl wait` and let it block on the actual condition. No `assert_output` needed.

```diff
 @test "kubernetes nodes are Ready" {
-    run -0 rdd limavm shell "${VM_NAME}" sudo k3s kubectl get nodes
-    assert_output --partial "Ready"
+    run -0 rdd limavm shell "${VM_NAME}" sudo k3s kubectl wait \
+        --for=condition=Ready nodes --all --timeout=60s
 }
```

---

## Suggestions

S1. **`parseK3sVersions` runs on every reconcile** — `pkg/controllers/app/app/controllers/app_controller.go:123` [Claude Opus 4.6, Gemini 3.1 Pro]

```go
supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
if err != nil {
    return ctrl.Result{}, fmt.Errorf("failed to load k3s versions: %w", err)
}
```

The data is a compile-time `//go:embed` string; the parsed set never changes. Parsing on every reconcile is wasted work and the error path is unreachable in practice — if the embed were corrupt at build time, the controller would silently retry forever instead of failing visibly at startup.

Fix: parse once in `controller.go`'s `RegisterWithManager` and pass the resulting map into `AppReconciler`. Surface the parse error at that point so a malformed embed fails the manager to start. A side benefit: the call site can also cache `hostHome` (see I2) and drop both `K3sVersionsData` and the per-reconcile `os.UserHomeDir()` call in one refactor.

---

S2. **Non-SSA `Status().Update` on App — replicates existing anti-pattern** — `pkg/controllers/app/app/controllers/app_controller.go:137,280` [Claude Opus 4.6, Gemini 3.1 Pro]

The new validation block adds a second `Status().Update(ctx, &app)` on the App without `retry.RetryOnConflict` or Server-Side Apply. The repo context flags this because `App.Status.Conditions` has multiple writers today (`AppReconciler` mirrors LimaVM conditions) and the design doc anticipates an `EngineReconciler` that will write `ContainerEngineReady`. The new site is no worse than the existing mirror update at line 280 — the PR replicates an existing gap rather than introducing a novel one — but both sites should eventually migrate.

Fix (when tackling both sites): wrap in `retry.RetryOnConflict` or convert to Server-Side Apply. Deferring is acceptable if done in a dedicated follow-up PR.

---

S3. **`Ready` condition only ever set to `False`** — `pkg/controllers/app/app/controllers/app_controller.go:128-135` [Claude Opus 4.6]

K8s convention is `Ready=True` means the resource is fully reconciled and operational. Here `Ready` is only written on the failure path and never set to `True` in the happy path. A reader running `kubectl wait --for=condition=Ready app/app` in the success case would block forever because the condition never appears — which is surprising and contradicts convention.

Consider either (a) using a more specific condition type like `KubernetesVersionValid` to avoid reader confusion, or (b) maintaining `Ready` consistently (set to True after LimaVM `Running=True` and validation passes). Option (a) is smaller.

---

S4. **`delete_app` now resurrects a missing App via `rdd set`** — `bats/tests/32-app-controllers/app.bats:14-23` [Claude Opus 4.6]

```bash
delete_app() {
    # Stop the VM first so the LimaVM finalizer clears quickly.
    rdd set running=false 2>/dev/null || true
    rdd ctl delete app "${APP_NAME}" --ignore-not-found
```

`rdd set` creates the App if it does not already exist. When `delete_app` is called against a fresh control plane (e.g., the new `unsupported Kubernetes version` test immediately follows `verify App can be recreated after deletion` which just deleted everything), this call recreates a brand-new App, the controller starts adding finalizers and creating the input ConfigMap, and the subsequent `rdd ctl delete app` tears it back down. Wasteful and adds seconds to a sequence the comment claims should be faster.

Fix: use `rdd ctl patch`, which fails silently (via the redirect) when the resource doesn't exist:

```diff
-rdd set running=false 2>/dev/null || true
+rdd ctl patch app "${APP_NAME}" --type=merge -p '{"spec":{"running":false}}' 2>/dev/null || true
```

---

S5. **Design doc `api_app.md` is stale after this PR** — `docs/design/api_app.md:90,92-105` [Claude Opus 4.6, Codex GPT 5.4]

The doc says `spec.kubernetes.version` defaults to `1.30.2`, but no such default exists in the Go types and `1.30.2` is now below the supported floor (`1.32.0`). The condition table (lines 94-104) only lists `Created` and `Running`, and an introductory sentence says conditions are "mirrored from the owned `LimaVM` resource." The new `Ready/False/UnsupportedKubernetesVersion` condition is controller-owned, not mirrored — a reader following the doc would not expect it to ever appear.

Fix: update the default claim (either restore a real default or document that the field is required when `enabled=true`), add a row for the `Ready` condition, and clarify that a small set of App conditions are owned by the reconciler directly rather than mirrored.

---

S6. **Error message references a file the user cannot see** — `pkg/controllers/app/app/controllers/app_controller.go:132` [Claude Opus 4.6]

```go
Message: fmt.Sprintf("Kubernetes version %q is not supported; see the bundled k3s-versions.json for valid versions", app.Spec.Kubernetes.Version),
```

The user has no path to the bundled file — it's embedded into the binary. A more useful message would name a CLI affordance or list the channels (`stable`, `latest`) from the same JSON.

---

S7. **300s timeout on first VM start with k3s install is tight** — `bats/tests/32-app-controllers/app.bats:319-323` [Claude Opus 4.6]

```bash
@test "start VM with Kubernetes enabled" {
    rdd set running=true
    rdd ctl wait --for=condition=Running \
        limavm/"${VM_NAME}" --namespace "${RDD_NAMESPACE}" --timeout=300s
}
```

The `Running` condition only fires after the provision script completes, and the provision script downloads and installs k3s (~75 MB binary plus install steps) on first boot. 300s is workable on a fast network but borderline on slow CI runners. Consider bumping to 360s or 480s to reduce flake risk.

---

## Design Observations

**Concerns**

- **Validation belongs at admission, not in the reconciler** *(future)* [Claude Opus 4.6] — Validating in `Reconcile` means `rdd set kubernetes.version=invalid` succeeds at the CLI layer, the user gets no immediate feedback, and they have to read `app.status.conditions` to discover the spec was rejected. A validating admission webhook (RDD already runs one for LimaVM) would reject the bad spec at API-call time and return the error directly. Alternatively, a generated `+kubebuilder:validation:Enum=...` derived from `k3s-versions.json` at build time would push validation into the API server with no extra runtime infrastructure.
- **Missing default for `kubernetes.version`** *(in-scope)* [Claude Opus 4.6] — The Go spec has no default, the CRD has no default, and the design doc claims a `1.30.2` default that is now below the supported floor. The combination of "no default" and "the validator only fires when `version != ""`" is what produces I3. Either commit to a real default or always require version when `enabled=true`.
- **Supported-version shape conflates validation and installation** *(in-scope)* [Codex GPT 5.4] — The controller now treats "supported version" as a bare semver while the installer consumes a full k3s tag. A single parsed structure that preserves both forms would eliminate the suffix drift in I1 and make future channel updates safer.
- **Per-reconcile JSON parsing lacks startup-time failure surface** *(in-scope)* [Gemini 3.1 Pro] — The embedded JSON is only validated when the first App with `kubernetes.enabled=true` is reconciled. A malformed embed introduced in a future bump would not surface until runtime. See S1.

**Strengths**

- *(in-scope)* [Claude Opus 4.6] The controller correctly returns early on validation failure without updating the template ConfigMap, so a running VM keeps its old working configuration if the user fans out a bad version. Good fail-safe behavior.
- *(in-scope)* [Claude Opus 4.6] Splitting `applySpecToTemplate` to take an explicit `hostHome` parameter (rather than reading the env inside the helper) keeps the helper pure and testable.
- *(in-scope)* [Codex GPT 5.4] Bundling a filtered static `k3s-versions.json` into the controller is the right direction: it removes an external dependency from reconcile-time validation and blocks obviously bad specs before the LimaVM is created.

---

## Testing Assessment

The added tests cover the happy path (valid version → LimaVM created → k3s active) and the rejection path (invalid version → `Ready=False`). Gaps, ranked by risk:

1. **Recovery from invalid → valid version** — would catch I4 (stale `Ready=False`). Test: set bad version, observe `Ready=False`, set good version, assert `Ready` condition is gone (or `True`).
2. **Empty version with `kubernetes.enabled=true`** — would catch I3. Test: `rdd set kubernetes.enabled=true` with no version, expect `Ready=False`.
3. **A `+k3s3` version** — would catch I1. Test: pick `1.34.3` from the bundled list and run the same start-VM flow.
4. **`assert_output --partial "Ready"` false positives** — I6 itself. Switching to `kubectl wait` eliminates the class of problem.
5. **Disabling Kubernetes after a validation failure** — set bad version, then `rdd set kubernetes.enabled=false`, expect `Ready=False` to clear.

Codex ran `go test ./pkg/controllers/app/app/... ./pkg/controllers/lima/limavm/...` in its sandbox — passed. No agent ran the BATS suite.

## Documentation Assessment

- `docs/design/api_app.md` needs the updates in S5 (drop/refresh the `1.30.2` default claim, add the `Ready/UnsupportedKubernetesVersion` row, clarify mirrored vs. owned conditions).
- `parseK3sVersions` and `k3sVersionsJSON` have brief godocs; the `K3sVersionsData` field on `AppReconciler` does not. Once S1 lands (cache the parsed map), the field can be renamed and documented in one move.

## Commit Structure

The four commits are logically separated (bundle+validate / hostHome fix / sysconfig fix / bats tests) and individually reviewable. One concern: `16ef12c "Add BATs test for supported k8s version in k3s-version.json"` lands the validation tests but implicitly depends on `070efc9` (sysconfig fix) and `e57a182` (hostHome fix) to actually pass — bisecting on the bats commit would leave the test suite broken. Consider squashing the BATS commit into the implementation commit, or moving it to the end of the series.

## Acknowledged Limitations

None. The two prior PR comments (Jan: "needs a rebase due to template using finch image"; Nino: "rebased") relate to CI infrastructure and are already addressed by the current branch.

## Unresolved Feedback

Nothing outstanding from the two prior comments.

---

## Agent Performance Retro

### [Claude]

Claude produced the most comprehensive review — caught all three of the convergent findings (version suffix, empty version, stale Ready condition) and added two unique ones (the `assert_output "Ready"` test accepting `NotReady`, and the `delete_app` helper accidentally resurrecting the App via `rdd set`). Claude also did the strongest cross-referencing: it traced `docs/design/api_app.md` to flag the stale `1.30.2` default claim and the condition-table gap, and noted that the sysconfig template now writes `KUBERNETES_PORT=6443` but nothing on the PR touches a k3s port config. Calibration was reasonable — no findings needed downgrading. The one miss: Claude flagged `os.UserHomeDir()` as a caching-level suggestion (S7) without tracing the deletion-path deadlock Gemini caught.

### [Codex]

Codex delivered a tight, focused review — three important findings, all of which survived consolidation at the same severity. Its unique contribution was the strongest framing of the version suffix bug: it pinpointed the exact `+k3s3` entries in the bundled JSON and mapped them to the template's `+k3s1` suffix with file:line references. It also flagged the `api_app.md` staleness independently. Codex missed the `HOST_HOME` quoting regression and the `assert_output` bug, and skipped the per-reconcile JSON parsing concern. The surface area was smaller than Claude's but the signal-to-noise ratio was very high — zero false positives or duplicates.

### [Gemini]

Gemini was the only agent to catch the `os.UserHomeDir()` finalizer deadlock, which is a genuinely important regression that the other two missed or softened. It also independently flagged the `HOST_HOME` quoting, the non-SSA `Status().Update`, and the per-reconcile JSON parsing. Calibration was high — Gemini marked the `os.UserHomeDir()` issue and the empty-version bypass as Critical; both got downgraded to Important in consolidation (the first because the trigger is rare on a normal desktop daemon, the second because the failure is visible, not silent). The review had zero Suggestions — anything that might have been minor got folded into Important or omitted. As expected, Gemini did not run `git blame` (daily quota) and relied on the explicit commit list in the prompt instead.

### Summary

| | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 14m 29s | 4m 08s | 3m 33s |
| Findings | 5I 8S | 3I | 4I |
| Tool calls | 47 (Bash 29, Read 14, Grep 4) | 47 (shell 45, plan 1, stdin 1) | 19 (grep_search 10, run_shell_command 7, read_file 2) |
| Design observations | 4 | 2 | 1 |
| False positives | 0 | 0 | 0 |
| Unique insights | 2 | 0 | 1 |
| Files reviewed | 6 | 6 | 6 |
| Coverage misses | 0 | 0 | 0 |
| **Totals** | **5I 8S** | **3I** | **4I** |
| Downgraded | 0 | 0 | 4 (I→S) |
| Dropped | 0 | 0 | 0 |


All three agents converged on the core findings (suffix mismatch, empty version, HOST_HOME quoting) which raises confidence that the validation gaps are real and not artifacts of model bias. Claude provided the most breadth; Codex the tightest signal; Gemini the one genuinely unique critical catch (finalizer deadlock). Taken together, the three reviews cover the important-issue space completely — no agent's gaps leave a hole that the other two don't fill.

### Reconciliation

- Gemini C1 (`os.UserHomeDir()` blocks finalizer cleanup): critical → important I2. The failure mode is real but the trigger (HOME unset in the daemon environment) is uncommon on a single-user developer desktop; severity aligned with other regressions of comparable likelihood.
- Gemini C2 (Empty `kubernetes.version` bypasses validation): critical → important I3. The failure is visible to the user (VM fails to start with a k3s install error) rather than silent, and Codex/Claude both rated it Important independently.
- Gemini I2 (Non-SSA `Status().Update`): important → suggestion S2. The new call site replicates an existing pattern at `app_controller.go:280`; the PR does not introduce a novel anti-pattern, and the existing single-writer setup means conflicts are rare in practice.
- Gemini I3 (Per-reconcile JSON parse): important → suggestion S1. Pure performance / startup-surface concern with no correctness impact; Claude rated the same issue as suggestion.

---

## Review Process Notes

**Repo context updates**

- None this round. The existing `deep-review-context.md` entries on multi-writer `App.Status.Conditions`, non-SSA `Status().Update`, and `ctrl.Result{}, nil` return verification were all relevant and matched agent findings — no staleness detected.

**Skill improvements**

- None. The three-agent setup caught all findings; no agent-specific miss suggests a missing review dimension.

---

## Appendix: Original Reviews

### [Claude Opus 4.6]

Now let me understand the broader context. Let me check git blame and the App CRD spec.

Now let me verify the critical k3s suffix issue more carefully.

Let me check for existing Ready condition constants and how conditions are used in the codebase.

Let me do a final check on a few more areas before writing the review.

Now I have enough context to write the review. I've confirmed the key issues — let me write it up.

# Code Review — PR #278: Bundle k3s versions to app controller

## Executive Summary

This PR embeds a filtered `k3s-versions.json` and validates `spec.kubernetes.version` against it, also fixing two related VM-side issues (HOST_HOME literal expansion and missing sysconfig vars for k3s.service). The validation has three notable gaps: the bundled list contains versions whose `+k3s*` suffix doesn't match the hardcoded `+k3s1` in the install template (~11% of versions will fail to install despite passing validation), an empty `kubernetes.version` with `enabled=true` bypasses validation entirely, and the new `Ready=False` condition is never cleared once the user fixes the version.

## Findings

### Critical Issues

None.

### Important Issues

**I1. Bundled list contains versions the install template can't install** — `pkg/controllers/app/app/k3s-versions.json:23,33,39,43`, `pkg/controllers/app/app/lima-template.yaml:57` (important, regression)

```yaml
# k3s-versions.json
"v1.32.11+k3s3",
"v1.33.7+k3s3",
"v1.34.3+k3s3",
"v1.35.0+k3s3",
```

```bash
# lima-template.yaml line 57
export INSTALL_K3S_VERSION="v${PARAM_KUBERNETES_VERSION}+k3s1"
```

`parseK3sVersions` (k3s_versions.go:27-32) strips both the leading `v` and the `+k3s*` suffix, so the lookup set becomes `{"1.32.11", "1.33.7", "1.34.3", "1.35.0", ...}`. A user setting `kubernetes.version=1.32.11` passes validation in `app_controller.go:127`, the controller creates the LimaVM, and the provisioning script at line 57 of `lima-template.yaml` constructs `INSTALL_K3S_VERSION=v1.32.11+k3s1` — a tag that does not exist on the k3s release page. `curl -sfL https://get.k3s.io | sh -` then fails, the provision script aborts under `set -o errexit`, and the VM never reaches Running. Four of the 35 bundled versions (~11%) hit this. The validator's contract says "if it's in the list, it works"; this PR establishes that contract and immediately violates it.

Fix: have `parseK3sVersions` return the full release identifier so the reconciler can pass it through unchanged. Then drop the hardcoded `+k3s1` from the template.

```diff
 // k3s_versions.go
-func parseK3sVersions(data string) (map[string]struct{}, error) {
+// parseK3sVersions returns a map from bare version (e.g. "1.32.11") to the
+// full k3s release identifier (e.g. "v1.32.11+k3s3").
+func parseK3sVersions(data string) (map[string]string, error) {
     var parsed k3sVersionsJSON
     if err := json.Unmarshal([]byte(data), &parsed); err != nil {
         return nil, fmt.Errorf("failed to parse k3s versions: %w", err)
     }
-    set := make(map[string]struct{}, len(parsed.Versions))
+    set := make(map[string]string, len(parsed.Versions))
     for _, v := range parsed.Versions {
         bare := strings.TrimPrefix(v, "v")
         if idx := strings.Index(bare, "+"); idx >= 0 {
             bare = bare[:idx]
         }
-        set[bare] = struct{}{}
+        set[bare] = v
     }
     return set, nil
 }
```

Then introduce a new param `K3S_RELEASE` in `applySpecToTemplate` carrying `set[app.Spec.Kubernetes.Version]`, and replace the template line with `export INSTALL_K3S_VERSION="${PARAM_K3S_RELEASE}"`.

---

**I2. Stale `Ready=False` condition is never cleared after the user fixes the version** — `pkg/controllers/app/app/controllers/app_controller.go:121-144,268-284` (important, regression)

```go
if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
    supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
    ...
    if _, ok := supportedVersions[app.Spec.Kubernetes.Version]; !ok {
        // sets Ready=False / UnsupportedKubernetesVersion
        return ctrl.Result{}, nil
    }
}
// falls through — never touches Ready again
```

The mirror loop at lines 268-284 only writes condition types that exist on the LimaVM (`Created`, `Running`, etc. — see `docs/design/api_app.md:94-104`). LimaVM never sets `Ready`. So once the validation block has stamped `Ready=False` on the App and the user does `rdd set kubernetes.version=1.32.0` (or `kubernetes.enabled=false`), the condition silently persists indefinitely. `kubectl get app -o yaml` will show `Ready=False` even though the VM is happily running k3s. The bats tests don't catch this because every test calls `delete_app` first, which destroys the conditions.

Fix: when validation passes, drop the prior condition. Combine with I3's guard in the same block:

```diff
-    if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
+    if app.Spec.Kubernetes.Enabled {
         supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
         if err != nil {
             return ctrl.Result{}, fmt.Errorf("failed to load k3s versions: %w", err)
         }
         if _, ok := supportedVersions[app.Spec.Kubernetes.Version]; !ok {
             // ... existing Ready=False set + return
         }
+        if apimeta.RemoveStatusCondition(&app.Status.Conditions, "Ready") {
+            if err := r.Status().Update(ctx, &app); err != nil {
+                return ctrl.Result{}, err
+            }
+            return ctrl.Result{}, nil
+        }
+    } else if apimeta.RemoveStatusCondition(&app.Status.Conditions, "Ready") {
+        if err := r.Status().Update(ctx, &app); err != nil {
+            return ctrl.Result{}, err
+        }
+        return ctrl.Result{}, nil
     }
```

---

**I3. Empty `kubernetes.version` with `enabled=true` bypasses validation** — `pkg/controllers/app/app/controllers/app_controller.go:122` (important, gap)

```go
if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
```

`KubernetesSpec.Version` (pkg/apis/app/v1alpha1/app_types.go:29) is `+optional` with no `+kubebuilder:default`, so a user creating an App with `{kubernetes: {enabled: true}}` and no version sets `Version=""`. The check on line 122 short-circuits, the reconciler proceeds, the input ConfigMap is built with `KUBERNETES_VERSION:` (empty value), and the provisioning script in `lima-template.yaml:57` constructs `INSTALL_K3S_VERSION="v+k3s1"`, which fails the same way as I1. The user only sees a CreateFailed condition with a k3s install error.

The doc claims a `1.30.2` default (`docs/design/api_app.md:90`) but no such default exists in code, and `1.30.2` is below the new threshold anyway.

Fix: drop the `Version != ""` clause and let the empty-string lookup naturally fail with `UnsupportedKubernetesVersion`. Optionally tweak the message to say "version is required when enabled" when empty. Or add a `+kubebuilder:default` matching one of the bundled versions.

```diff
-    if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
+    if app.Spec.Kubernetes.Enabled {
```

---

**I4. `HOST_HOME` quoting dropped — fragile against paths with YAML-special characters** — `pkg/controllers/app/app/controllers/app_controller.go:48-51` (important, regression)

```go
return baseTemplate + fmt.Sprintf(
    "\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %s\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
    spec.ContainerEngine.Name,
    hostHome,
```

The previous line read `HOST_HOME: \"{{.Home}}\"` — the literal string was double-quoted in the YAML output. Commit e57a182 replaces `{{.Home}}` with `os.UserHomeDir()` (correct fix for the literal-template bug) but drops the quotes. `os.UserHomeDir()` on Windows returns `%USERPROFILE%`, which can include spaces (`C:\Users\Jan Dubois`) or, in pathological cases, a `#` (legal in Windows usernames) that turns into a YAML comment. Even on Unix, a user with a `#` or `:` in their home path would trip the YAML parser. The previous quoting was the right defense.

Fix: use `%q` so the value is always emitted as a quoted YAML scalar.

```diff
     return baseTemplate + fmt.Sprintf(
-        "\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %s\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
+        "\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %q\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
         spec.ContainerEngine.Name,
         hostHome,
```

(Lima's param parser accepts double-quoted scalars; the existing template uses them elsewhere.)

---

**I5. `assert_output --partial "Ready"` matches `NotReady`** — `bats/tests/32-app-controllers/app.bats:330-333` (important, regression)

```bash
@test "kubernetes nodes are Ready" {
    run -0 rdd limavm shell "${VM_NAME}" sudo k3s kubectl get nodes
    assert_output --partial "Ready"
}
```

`kubectl get nodes` prints either `Ready` or `NotReady` in the STATUS column. A substring search for `Ready` matches both, so the test gives a false positive when the node is still in `NotReady` (cordoned, network init, transient failures). Single-node k3s usually flips Ready quickly, so this rarely fires in practice — but the test claims to verify readiness and doesn't.

Fix: use `kubectl wait` and let it block on the actual condition, which also removes the need for an `assert_output` check:

```diff
 @test "kubernetes nodes are Ready" {
-    run -0 rdd limavm shell "${VM_NAME}" sudo k3s kubectl get nodes
-    assert_output --partial "Ready"
+    run -0 rdd limavm shell "${VM_NAME}" sudo k3s kubectl wait --for=condition=Ready nodes --all --timeout=60s
 }
```

### Suggestions

**S1. `parseK3sVersions` runs on every reconcile** — `pkg/controllers/app/app/controllers/app_controller.go:123` (suggestion, enhancement)

```go
supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
if err != nil {
    return ctrl.Result{}, fmt.Errorf("failed to load k3s versions: %w", err)
}
```

The data is a compile-time embed; the parsed set never changes. Parsing on every reconcile is wasted work and the error path is unreachable in practice — if it ever did fire (corrupt embed at build time) the controller would silently retry forever instead of failing visibly.

Fix: parse once in `RegisterWithManager` (`controller.go:95-100`) and pass the resulting map into `AppReconciler`. Surface the parse error from `init()` / `RegisterWithManager` so a malformed embed fails to start.

```diff
 // controller.go
+supported, err := controllers.ParseK3sVersions(k3sVersionsData)
+if err != nil {
+    return fmt.Errorf("invalid bundled k3s-versions.json: %w", err)
+}
 return (&controllers.AppReconciler{
     Client:           mgr.GetClient(),
     Scheme:           mgr.GetScheme(),
     LimaTemplateData: limaTemplateData(),
-    K3sVersionsData:  k3sVersionsData,
+    SupportedK3sVersions: supported,
 }).SetupWithManager(mgr)
```

---

**S2. `Ready` condition only ever set on failure — confusing semantics** — `pkg/controllers/app/app/controllers/app_controller.go:128-135` (suggestion, gap)

The K8s convention is that `Ready=True` means the resource is fully reconciled and operational. Here `Ready` is only ever set to `False` (and only when validation fails) and never set to `True` in the happy path. A reader checking `kubectl wait --for=condition=Ready` against the App in the success case would block forever because the condition never appears.

Consider either (a) using a more specific condition type like `KubernetesVersionValid` so the semantic is unambiguous, or (b) consistently maintaining `Ready` (set to True after the LimaVM Running condition is True). Option (a) is smaller and avoids confusion with the existing mirrored conditions.

---

**S3. Two `Status().Update` calls without retry-on-conflict or SSA** — `pkg/controllers/app/app/controllers/app_controller.go:137,280` (suggestion, gap)

```go
if err := r.Status().Update(ctx, &app); err != nil {
    log.Error(err, "Unable to update App status for unsupported version")
    return ctrl.Result{}, err
}
```

The new validation block adds a second `Status().Update` site on `App` without going through retry-on-conflict. Today there is only one writer (AppReconciler, single reconcile per App), so conflicts are rare — but the design doc anticipates a separate engine reconciler that will write `ContainerEngineReady`, and any future second writer will start producing conflict-driven re-reconciles. Returning the error and letting controller-runtime requeue is acceptable for now; the existing mirror update at line 280 has the same pattern, so the new code is replicating an existing gap rather than introducing a novel one.

Fix (when ready to do both sites): wrap in `retry.RetryOnConflict` or migrate to Server-Side Apply.

---

**S4. `delete_app` now creates the app via `rdd set` when none exists** — `bats/tests/32-app-controllers/app.bats:14-23` (suggestion, regression)

```bash
delete_app() {
    # Stop the VM first so the LimaVM finalizer clears quickly.
    rdd set running=false 2>/dev/null || true
    rdd ctl delete app "${APP_NAME}" --ignore-not-found
```

`rdd set` creates the App if it does not exist (cmd/rdd/set.go:280-281). When `delete_app` is called against a fresh control plane (e.g., from the new `unsupported Kubernetes version sets Ready=False condition` test which immediately follows `verify App can be recreated after deletion` that just deleted everything), this call recreates a brand-new App, the controller starts adding finalizers and creating the input ConfigMap, then the `rdd ctl delete app` line tears it down again. Wasteful and may add seconds to a sequence the comment claims should be faster.

Fix: use `rdd ctl patch`, which fails (silently, due to the redirect) when the resource doesn't exist:

```diff
 delete_app() {
     # Stop the VM first so the LimaVM finalizer clears quickly.
-    rdd set running=false 2>/dev/null || true
+    rdd ctl patch app "${APP_NAME}" --type=merge -p '{"spec":{"running":false}}' 2>/dev/null || true
     rdd ctl delete app "${APP_NAME}" --ignore-not-found
```

---

**S5. Design doc `api_app.md` doesn't mention the `Ready` condition or `UnsupportedKubernetesVersion` reason** — `docs/design/api_app.md:92-105` (suggestion, gap)

The doc explicitly says:

> `status.conditions`: Conditions are **mirrored from the owned `LimaVM`** resource.

The new validation introduces a non-mirrored `Ready` condition on the App. The condition table (lines 94-104) only lists `Created` and `Running`. A reader following the doc would not expect `Ready` to ever appear. Add a row for `Ready/False/UnsupportedKubernetesVersion` and clarify that a small set of conditions are owned by the App reconciler directly rather than mirrored. Per the deep-review guidance, comments and design docs should match what the code does.

---

**S6. Error message points to a file the user can't see** — `pkg/controllers/app/app/controllers/app_controller.go:132` (suggestion, enhancement)

```go
Message: fmt.Sprintf("Kubernetes version %q is not supported; see the bundled k3s-versions.json for valid versions", app.Spec.Kubernetes.Version),
```

The user has no path to the bundled file — it's embedded into the binary. A more useful message would name a CLI affordance or list the channels (`stable`, `latest`) read from the same JSON.

```diff
-Message: fmt.Sprintf("Kubernetes version %q is not supported; see the bundled k3s-versions.json for valid versions", app.Spec.Kubernetes.Version),
+Message: fmt.Sprintf("Kubernetes version %q is not supported; run 'rdd set --help' to see the list of supported versions", app.Spec.Kubernetes.Version),
```

---

**S7. `os.UserHomeDir()` on every reconcile** — `pkg/controllers/app/app/controllers/app_controller.go:70-73` (suggestion, enhancement)

```go
hostHome, err := os.UserHomeDir()
if err != nil {
    return ctrl.Result{}, fmt.Errorf("failed to get host home directory: %w", err)
}
```

The user home does not change while the daemon runs. Cache it on the reconciler struct in `RegisterWithManager` (alongside the parsed k3s version set from S1). Bonus: failing once at startup gives a clearer error than a per-reconcile failure that loops forever.

---

**S8. 300s timeout on first VM start with k3s install** — `bats/tests/32-app-controllers/app.bats:319-323` (suggestion, enhancement)

```bash
@test "start VM with Kubernetes enabled" {
    rdd set running=true
    rdd ctl wait --for=condition=Running \
        limavm/"${VM_NAME}" --namespace "${RDD_NAMESPACE}" --timeout=300s
}
```

The `Running` condition only fires after the provision script completes, and the provision script downloads and installs k3s (~75 MB binary plus install steps) on first boot. 300s is workable on a fast network but borderline on slow CI runners. Consider bumping to 360s or 480s to reduce flake risk, or rely on the subsequent `wait for k3s to be ready` step's slack.

## Design Observations

**Concerns**

- **Validation belongs at admission, not in the reconciler** *(future)*. Validating in `Reconcile` means `rdd set kubernetes.version=invalid` succeeds at the CLI layer, the user gets no immediate feedback, and they have to read `app.status.conditions` to discover the spec was rejected. A validating admission webhook (RDD already runs one for LimaVM) would reject the bad spec at API-call time and return the error to the CLI directly. Alternatively, a generated `+kubebuilder:validation:Enum=...` derived from `k3s-versions.json` at build time would push validation into the API server with no extra runtime infrastructure.

- **Missing default for `kubernetes.version`** *(in-scope)*. The Go spec has no default, the CRD has no default, and the design doc claims a `1.30.2` default that is now below the supported floor. The combination of "no default" + "the validator only fires when `version != ""`" is what produces I3. Either commit to a real default (and update `api_app.md` to match) or always require version when `enabled=true`.

**Strengths**

- The controller correctly returns early on validation failure without updating the template ConfigMap, so a running VM keeps its old (working) configuration if the user fans out a bad version. Good "fail safe" behavior.
- Splitting the existing `applySpecToTemplate` to take an explicit `hostHome` parameter (rather than reading the env inside the helper) keeps the helper pure and testable.

## Testing Assessment

The added tests cover the happy path (valid version → LimaVM created → k3s active) and the rejection path (invalid version → `Ready=False`). Untested scenarios, ranked by risk:

1. **Recovery from invalid → valid version** — would catch I2 (stale `Ready=False`). Test: set bad version, observe `Ready=False`, set good version, assert `Ready` condition is gone (or `True`).
2. **Empty version with `kubernetes.enabled=true`** — would catch I3. Test: `rdd set kubernetes.enabled=true` (no version), expect `Ready=False`.
3. **A `+k3s3` version** — would catch I1. Test: pick `1.34.3` from the bundled list and run the same start-VM/wait-for-k3s flow.
4. **`assert_output --partial "Ready"` false positives** (I5) — would be eliminated by switching to `kubectl wait`.
5. **Disabling Kubernetes after validation failure** — set bad version, then `rdd set kubernetes.enabled=false`, expect the `Ready=False` to clear and the LimaVM to come up.

## Documentation Assessment

- `docs/design/api_app.md` needs the additions described in S5 (Ready condition, UnsupportedKubernetesVersion reason, mention that some conditions are owned, not mirrored).
- The doc's `1.30.2` default for `kubernetes.version` (line 90) is now factually wrong — there's no Go default and `1.30.2` would be rejected by the new validator. Either restore a real default or update the doc.
- `parseK3sVersions` and `k3sVersionsJSON` have brief godocs; the `K3sVersionsData` field on `AppReconciler` does not (suggestion: one line saying "raw JSON bytes of k3s-versions.json; parsed lazily by the reconciler" — and once S1 is applied, replace with the parsed map and update the docstring).

## Commit Structure

The four commits are logically separated (validate version / hostHome fix / sysconfig fix / bats tests) and individually reviewable. One small concern: commit `16ef12c "Add BATs test for supported k8s version in k3s-version.json"` lands the validation tests but doesn't note that they implicitly require the `lima-template.yaml` and `app_controller.go` changes from later commits (`070efc9`, `e57a182`) to actually pass — bisecting on the bats test commit would leave the test suite broken. Consider squashing the bats commit into the implementation commit, or moving it to the end of the series.

## Acknowledged Limitations

None. The two PR comments (Jan: "needs a rebase due to template using finch image"; Nino: "rebased") relate to CI infrastructure, not to substantive findings the author dismissed.

## Coverage Summary

| File | Status |
|---|---|
| `bats/tests/32-app-controllers/app.bats` | I5, S4, S8 |
| `pkg/controllers/app/app/controller.go` | Reviewed, no issues (cross-referenced with S1/S7 fix sites) |
| `pkg/controllers/app/app/controllers/app_controller.go` | I2, I3, I4, S2, S3, S6, S7 |
| `pkg/controllers/app/app/controllers/k3s_versions.go` | I1, S1 |
| `pkg/controllers/app/app/k3s-versions.json` | I1 |
| `pkg/controllers/app/app/lima-template.yaml` | I1 (interaction with bundled list); the printf and quoting changes themselves are correct |
---

### [Codex GPT 5.4]

Verdict: Needs changes

### Executive Summary

This PR bundles a static k3s version list into the App controller, blocks unsupported `spec.kubernetes.version` values before creating the LimaVM, and extends the App BATS coverage around Kubernetes enablement. The main issues are that the new lookup loses the k3s build suffix the installer still needs, and that the new `Ready=False` condition is never cleared once the user picks a valid version.

### Findings

**Critical Issues**

None.

**Important Issues**

I1. **Validation accepts versions the template cannot install** — `pkg/controllers/app/app/controllers/k3s_versions.go:21-34` (important, regression)

```go
func parseK3sVersions(data string) (map[string]struct{}, error) {
    ...
    for _, v := range parsed.Versions {
        bare := strings.TrimPrefix(v, "v")
        if idx := strings.Index(bare, "+"); idx >= 0 {
            bare = bare[:idx]
        }
        set[bare] = struct{}{}
    }
}
```

`parseK3sVersions()` at lines 21-34 strips the `+k3s*` suffix and turns `"v1.32.11+k3s3"` into `"1.32.11"`, so the reconciler now accepts `spec.kubernetes.version=1.32.11` at [app_controller.go:121-144](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/controllers/app_controller.go:121). But the VM template still hard-codes `INSTALL_K3S_VERSION="v${PARAM_KUBERNETES_VERSION}+k3s1"` at [lima-template.yaml:57](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/lima-template.yaml:57). The bundled JSON already contains releases that exist only as `+k3s3` at [k3s-versions.json:23](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/k3s-versions.json:23), [33](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/k3s-versions.json:33), [39](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/k3s-versions.json:39), and [43](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/k3s-versions.json:43). Those inputs now pass validation and then fail later inside the VM because the installer is asked for a non-existent `+k3s1` build.

Fix: keep the exact bundled version string, not just the bare semver. A simple shape is `map[string]string` from bare semver to the full installable tag, then emit that full tag into the template:

```diff
- supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
+ supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
  ...
- if _, ok := supportedVersions[app.Spec.Kubernetes.Version]; !ok {
+ installVersion, ok := supportedVersions[app.Spec.Kubernetes.Version]
+ if !ok {
      ...
  }
- desired := applySpecToTemplate(r.LimaTemplateData, app.Spec, hostHome)
+ desired := applySpecToTemplate(r.LimaTemplateData, app.Spec, hostHome, installVersion)
```

I2. **`Ready=False` sticks after the spec becomes valid** — `pkg/controllers/app/app/controllers/app_controller.go:127-142` (important, regression)

```go
if _, ok := supportedVersions[app.Spec.Kubernetes.Version]; !ok {
    unsupported := apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
        Type:   "Ready",
        Status: metav1.ConditionFalse,
        Reason: "UnsupportedKubernetesVersion",
    })
    ...
    return ctrl.Result{}, nil
}
```

The new block at lines 127-142 writes `Ready=False`, but the success path never removes or overwrites that condition. On later reconciles, the controller only mirrors LimaVM conditions at lines 262-279, and those are `Created` / `Running`, not `Ready`. That leaves a stale `Ready=False/UnsupportedKubernetesVersion` on the App even after a valid version is set and the LimaVM is created. The new BATS coverage at [app.bats:295-316](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/bats/tests/32-app-controllers/app.bats:295) checks the failing path but never the recovery path, so this regression is currently untested.

Fix: either remove the `Ready` condition as soon as validation passes, or make `Ready` a fully controller-owned condition that is recomputed on every reconcile. For the minimal fix:

```diff
+ apimeta.RemoveStatusCondition(&app.Status.Conditions, "Ready")
+ if err := r.Status().Update(ctx, &app); err != nil {
+     return ctrl.Result{}, err
+ }
```

I3. **Empty Kubernetes versions still bypass the new guard** — `pkg/controllers/app/app/controllers/app_controller.go:122-122` (important, gap)

```go
if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
```

The validation only runs when `spec.kubernetes.version` is non-empty at line 122, but the CLI explicitly supports clearing that field to `""` in [set.bats:124-131](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/bats/tests/32-app-controllers/set.bats:124). If a user then enables Kubernetes, the controller skips validation and the template still exports `INSTALL_K3S_VERSION="v${PARAM_KUBERNETES_VERSION}+k3s1"` at [lima-template.yaml:57](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/lima-template.yaml:57), producing `v+k3s1`. This is exactly the class of bad input this PR set out to prevent, but the empty-string case still falls through into the VM.

Fix: reject `enabled=true` with an empty version before creating or updating the LimaVM. The strongest fix is CRD validation (`XValidation` on `spec.kubernetes`) so bad objects never persist; otherwise return a dedicated status condition such as `MissingKubernetesVersion`.

**Suggestions**

None.

### Design Observations

**Concerns**

1. `(in-scope)` The controller now treats “supported version” as a bare semver while the installer consumes a full k3s tag. Keeping a single parsed structure that preserves both forms would eliminate the current suffix drift and make future channel updates much safer.

**Strengths**

1. Bundling a filtered static `k3s-versions.json` into the controller is the right direction for this repo: it removes an external dependency from reconcile-time validation and blocks obviously bad specs before the LimaVM is created.

### Testing Assessment

1. Missing a recovery test that sets an unsupported version, then a supported one, and asserts the App no longer reports `Ready=False/UnsupportedKubernetesVersion`.
2. Missing coverage for versions whose bundled entry is not `+k3s1` (`1.32.11`, `1.33.7`, `1.34.3`, `1.35.0`), which is where I1 hides.
3. Missing coverage for `kubernetes.enabled=true` with `kubernetes.version=""`, even though the CLI supports clearing the version field.
4. `go test ./pkg/controllers/app/app/... ./pkg/controllers/lima/limavm/...` passed. I did not run the BATS suite.

### Documentation Assessment

`docs/design/api_app.md` is now stale at [api_app.md:90](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/docs/design/api_app.md:90) and [92](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/docs/design/api_app.md:92): it still says `spec.kubernetes.version` defaults to `1.30.2`, and it still says App conditions are only mirrored from LimaVM even though this PR adds an App-owned `Ready` condition. The `cmd_app.md` example at [line 20](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/docs/design/cmd_app.md:20) is still fine, but the API doc needs updating to match the new behavior.

### Acknowledged Limitations

- **PR comment**: @jandubois previously stopped review because the branch still used the finch image template and therefore lacked `rancher-desktop.target`; @Nino-K replied that the branch had since been rebased and was ready. The current [lima-template.yaml](/private/var/folders/s2/9q03h0c11_s4fdw169wqz3qc0000gn/T/review-codex-pass-1-695856741/pkg/controllers/app/app/lima-template.yaml:41) changes are consistent with that rationale, so I did not treat the old image/template mismatch as an active finding.

### Coverage Summary

- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — I2, I3
- `pkg/controllers/app/app/controllers/k3s_versions.go` — I1
- `pkg/controllers/app/app/k3s-versions.json` — I1
- `pkg/controllers/app/app/lima-template.yaml` — I1, I3

---

### [Gemini 3.1 Pro]

### Executive Summary

This PR bundles `k3s-versions.json` into the `app` controller to validate the requested Kubernetes version and ensure only supported versions (>= 1.32.0) are provisioned. Key findings include a finalizer deadlock caused by `os.UserHomeDir()` failing during cleanup, a validation bypass when the version string is empty, YAML syntax risks with unquoted host paths, and a concurrent-writer conflict risk on `App.Status`.

### Findings

#### Critical Issues

**C1. `os.UserHomeDir()` blocks finalizer cleanup if it fails** — `pkg/controllers/app/app/controllers/app_controller.go:71-77` (critical, regression)

```go
	hostHome, err := os.UserHomeDir()
	if err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to get host home directory: %w", err)
	}

	// Handle deletion, delete the LimaVM and wait for it to finish cleaning up.
	if base.IsBeingDeleted(&app) {
```

Because `os.UserHomeDir()` is called unconditionally at the top of the reconcile loop, any failure (e.g., the `HOME` environment variable becomes unset or inaccessible) will return an error before `base.IsBeingDeleted(&app)` is evaluated. This permanently deadlocks the finalizer cleanup; the `App` object will never be deleted if `UserHomeDir()` fails, even though the host home directory is not needed to perform the deletion of the `LimaVM` and associated resources.

**Fix:** Move the `os.UserHomeDir()` fetch below the finalizer/deletion logic block, since `hostHome` is only required when constructing the `LimaVM` configuration for creation or updates.

**C2. Empty `kubernetes.version` bypasses validation and breaks VM provisioning** — `pkg/controllers/app/app/controllers/app_controller.go:122` (critical, gap)

```go
	// Validate the requested Kubernetes version against the bundled version list.
	if app.Spec.Kubernetes.Enabled && app.Spec.Kubernetes.Version != "" {
		supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
```

By skipping validation when `app.Spec.Kubernetes.Version` is `""`, the reconciler injects an empty string into the `PARAM_KUBERNETES_VERSION` of the `lima-template.yaml`. This causes the guest agent to evaluate `INSTALL_K3S_VERSION="v+k3s1"`, which does not exist in the remote repository. The K3s installation will silently fail to download inside the VM, hanging the provisioning process without surfacing any `Ready=False` condition to the `App` resource. 

**Fix:** Remove the `&& app.Spec.Kubernetes.Version != ""` condition. If Kubernetes is enabled and the version is empty, it must either default to a valid supported version (e.g., from the "stable" channel) or be rejected explicitly with an `UnsupportedKubernetesVersion` (or `MissingKubernetesVersion`) condition.

#### Important Issues

**I1. Unquoted `HOST_HOME` injection risks invalid YAML** — `pkg/controllers/app/app/controllers/app_controller.go:49` (important, regression)

```go
-func applySpecToTemplate(baseTemplate string, spec v1alpha1.AppSpec) string {
+func applySpecToTemplate(baseTemplate string, spec v1alpha1.AppSpec, hostHome string) string {
 	return baseTemplate + fmt.Sprintf(
-		"\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: \"{{.Home}}\"\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
+		"\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %s\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
 		spec.ContainerEngine.Name,
+		hostHome,
```

By removing the quotes and using the unquoted `%s` format verb, the user's home directory path is injected as a raw YAML scalar. If `hostHome` contains YAML control characters (e.g., spaces followed by `#`, `:`, or array brackets `[`), the resulting configuration will fail to parse or be parsed as a different type, breaking the LimaVM configuration.

**Fix:** Use `%q` to safely quote and escape the string for JSON/YAML.

```diff
-		"\nparam:\n  CONTAINER_ENGINE: %s\n  HOST_HOME: %s\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %s\n",
+		"\nparam:\n  CONTAINER_ENGINE: %q\n  HOST_HOME: %q\n  KUBERNETES_ENABLED: %v\n  KUBERNETES_VERSION: %q\n",
```

**I2. Non-SSA Status().Update() on App resource with multiple writers** — `pkg/controllers/app/app/controllers/app_controller.go:137` (important, regression)

```go
			if unsupported {
				if err := r.Status().Update(ctx, &app); err != nil {
					log.Error(err, "Unable to update App status for unsupported version")
					return ctrl.Result{}, err
				}
			}
```

As noted in the repository guidelines, `App.Status.Conditions` has multiple concurrent writers (e.g., `AppReconciler` and `EngineReconciler`). Using a standard `Update()` without fetching the latest object or using `retry.RetryOnConflict` will frequently result in `Conflict` errors, needlessly failing the reconcile loop and generating log noise.

**Fix:** Use Server-Side Apply (Patch) or wrap the update in `retry.RetryOnConflict` when updating the `App` status.

**I3. Static JSON is unmarshaled on every reconcile event** — `pkg/controllers/app/app/controllers/app_controller.go:123` (important, performance)

```go
		supportedVersions, err := parseK3sVersions(r.K3sVersionsData)
```

`parseK3sVersions` is called on every single reconciliation loop, which executes a `json.Unmarshal` on the static embedded `r.K3sVersionsData` string. Reconcilers run frequently; re-parsing static data wastes CPU cycles and creates unnecessary garbage collection pressure.

**Fix:** Parse the JSON data once during `SetupWithManager` (or package `init`), cache the resulting `map[string]struct{}` inside `AppReconciler`, and reuse the pre-computed set in the `Reconcile` function.

#### Suggestions

None.

### Design Observations

**Concerns**
- **Implicit Version Passing `(future)`**: `app.Spec.Kubernetes.Version` is currently defined as `+optional` with no default in `app_types.go`, relying on downstream behavior or admission webhooks to handle missing values. Instead of validating empty strings after the fact in the reconciler, consider adding `+kubebuilder:default="1.35.3"` (or the latest stable version) directly to the CRD. This enforces correctness at the Kubernetes API layer.

### Testing Assessment
1. **Empty version behavior**: There is no test ensuring that an App submitted with `kubernetes.version=""` is gracefully handled, either by defaulting to a working version or by setting an appropriate error condition instead of silently hanging the VM provisioner.

### Documentation Assessment
No major documentation gaps.

### Acknowledged Limitations
None.

### Coverage Summary
- `bats/tests/32-app-controllers/app.bats`: Reviewed, no issues
- `pkg/controllers/app/app/controller.go`: Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go`: C1, C2, I1, I2, I3
- `pkg/controllers/app/app/controllers/k3s_versions.go`: I3
- `pkg/controllers/app/app/k3s-versions.json`: Reviewed, no issues
- `pkg/controllers/app/app/lima-template.yaml`: Reviewed, no issues
