# Deep Review: 20260320-225736-pr-236

| | |
|---|---|
| **Date** | 2026-03-20 22:57 |
| **Author** | [@Nino-K](https://github.com/Nino-K) |
| **PR** | [#236](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/236) — Implement App controller lifecycle for LimaVM |
| **Branch** | `app-controller-create-delete-limavm` |
| **Commits** | `6a94f5d` Implement App controller lifecycle for LimaVM<br>`dd300d6` Add Bats test to App Controller |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — Owned-finalizer deadlock needs fixing; template portability is a known gap but the template is temporary |
| **Wall-clock time** | 31 min 23 s |

---

## Consolidated Review

### Executive Summary

This PR introduces the App controller's core reconciliation loop: it creates a LimaVM from an embedded template, propagates `spec.running`, mirrors VM status conditions onto the App, and handles deletion through a cleanup finalizer. The priority-chain reconcile pattern is clean and the test coverage is thorough for the happy path. The main issues are a finalizer deadlock when a LimaVM is deleted externally, a fragile early-return after ConfigMap cleanup, and hardcoded instance-specific values in the embedded template.

### Critical Issues

None.

### Important Issues

1. **Owned finalizer deadlock when LimaVM is already terminating** — `app_controller.go:89-101` [Gemini 3.1 Pro]

```go
case !base.IsBeingDeleted(limaVM):
    if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM); err != nil {
        return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from LimaVM: %w", err)
    }
    if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
        return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
    }
    return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
default:
    // LimaVM deletion is in progress; keep requeueing until it is gone.
    log.Info("Waiting for LimaVM deletion to complete")
    return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

The `default` case requeues without removing the owned finalizer. If the LimaVM reaches Terminating through any path other than the App controller's deletion flow (manual delete, CLI tool), the `rdd.rancherdesktop.io/owned` finalizer blocks deletion indefinitely. The LimaVM controller's `handleDeletion` (at `limavm_lifecycle.go:32-78`) removes only the cleanup finalizer and owned finalizers on *children* — it never touches the owned finalizer on the LimaVM itself. The result: the LimaVM stays Terminating, the App stays Terminating waiting for it, and both resources deadlock.

The same gap exists during normal operation (App not being deleted): if a LimaVM is deleted externally, the reconcile at lines 116-210 proceeds through spec.running propagation and condition mirroring without ever removing the owned finalizer or acknowledging the terminating state.

Verified via `git blame`: lines 89-101 are introduced in commit `6a94f5d`. (important, regression)

Fix: Always check for the owned finalizer when the LimaVM is terminating. In the deletion switch, consolidate the `err == nil` cases:

```go
case err == nil:
    if base.HasOwnedFinalizer(limaVM) {
        if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM); err != nil {
            return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer: %w", err)
        }
    }
    if !base.IsBeingDeleted(limaVM) {
        if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
            return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
        }
    }
    return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

During normal reconcile, add an early check after the Get at line 116:

```go
if base.IsBeingDeleted(limaVM) {
    if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM); err != nil {
        return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from terminating LimaVM: %w", err)
    }
    return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
}
```

2. **Fragile early-return after input ConfigMap deletion** — `app_controller.go:172-177` [Gemini 3.1 Pro]

```go
if err := r.Get(ctx, client.ObjectKey{Name: inputConfigMapName, Namespace: namespace}, inputConfigMap); err == nil {
    if err := r.Delete(ctx, inputConfigMap); err != nil && !apierrors.IsNotFound(err) {
        return ctrl.Result{}, fmt.Errorf("failed to delete input ConfigMap: %w", err)
    }
    log.Info("Deleted input ConfigMap after LimaVM created its own copy")
    return ctrl.Result{}, nil
}
```

Returning `ctrl.Result{}, nil` after deleting the ConfigMap relies on the LimaVM controller producing subsequent status updates to re-trigger the App controller. The App controller does not watch or own ConfigMaps, so the deletion itself generates no watch event. In practice the LimaVM always has more lifecycle steps, so this works today. But the coupling is implicit: if any future change reduces LimaVM controller chattiness, the App controller stalls until an external event arrives.

Gemini rated this CRITICAL; after tracing the LimaVM controller lifecycle (`limavm_controller.go:176-268`), a newly created LimaVM always produces multiple status updates (initial condition at line 176, template ConfigMap at line 248, Created condition). The stall scenario requires all those updates to complete before the App controller processes the ConfigMap deletion — unlikely in practice. (important, regression)

Fix: Replace `return ctrl.Result{}, nil` with `return ctrl.Result{Requeue: true}, nil` to guarantee the reconciler continues to spec.running propagation and condition mirroring without depending on external events.

3. **Test race: App condition read without waiting for mirroring** — `bats/tests/32-app-controllers/app.bats:128-137` [Claude Opus 4.6]

```bash
@test "verify App status mirrors LimaVM Created condition" {
    run -0 rdd ctl get app "${APP_NAME}" -o json
    local json="${output}"

    run -0 jq -r '.status | has("conditions")' <<<"${json}"
    assert_output "true"
```

Test 123-126 waits for the LimaVM's Created condition, then test 128 reads the App status without waiting for the App controller to mirror it. The Owns watch triggers a reconcile on LimaVM status changes, but the reconcile may not have completed by the time the BATS test reads the App. The same race applies to test 139-149 ("verify App conditions preserve LimaVM LastTransitionTime"). (important, regression)

Fix: Add `rdd ctl wait --for=condition=Created app/"${APP_NAME}" --timeout=30s` before the assertion at line 129.

4. **Transient errors during ConfigMap cleanup skip to finalizer removal** — `app_controller.go:69-71` and `82-84` [Gemini 3.1 Pro]

```go
} else if !apierrors.IsNotFound(cmErr) {
    log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
}
```

During App deletion, if fetching either ConfigMap fails with a transient error (connection timeout, rate limit), the code logs the error and falls through to `RemoveCleanupFinalizer` at line 86. Without a garbage collector, the orphaned ConfigMaps persist indefinitely. (important, regression)

Fix: Return the error so the deletion retries:

```go
} else if !apierrors.IsNotFound(cmErr) {
    return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
}
```

5. **Full-object Update causes unnecessary conflict retries** — `app_controller.go:185` [Claude Opus 4.6]

```go
limaVM.Spec.Running = app.Spec.Running
if err := r.Update(ctx, limaVM); err != nil {
    return ctrl.Result{}, fmt.Errorf("failed to propagate running state to LimaVM: %w", err)
}
```

`r.Update(ctx, limaVM)` sends the entire LimaVM object fetched at line 116. The LimaVM controller frequently patches its status concurrently, incrementing the resourceVersion between the Get and Update. Each collision produces a 409 Conflict and a wasted reconcile cycle. Worse, a full Update can overwrite concurrent metadata changes (annotations, labels) made by the LimaVM controller. (important, regression)

Fix: Use a merge patch targeting only `spec.running`:

```go
patch := client.MergeFrom(limaVM.DeepCopy())
limaVM.Spec.Running = app.Spec.Running
if err := r.Patch(ctx, limaVM, patch); err != nil {
```

6. **Embedded template hardcodes instance-specific values** — `lima.yaml:14,37` [Codex GPT 5.4]

```yaml
ssh:
  localPort: 51422
...
  hostSocket: "{{.Home}}/.rd/docker.sock"
```

The SSH port at line 14 is fixed at 51422, ignoring `instance.Index()` which derives per-instance ports. The host socket path at line 37 uses `.rd` instead of `.rd<suffix>` (the default suffix is "2", so the runtime path is `~/.rd2`). Both break parallel instances and the socket path is wrong even for the default instance.

The `controller.go:18` comment acknowledges the template is temporary ("This is temporary and will be removed once the app controller is fully implemented"). The BATS tests never boot the VM to Running, so these paths are untested. (important, gap)

Fix: Render dynamic fields at reconcile time from `instance.ShortDir()` and `instance.Index()` instead of embedding literal values.

7. **Incomplete comment** — `app_controller.go:114` [Claude Opus 4.6]

```go
// Check LimaVM first. If it exists and we can skip ConfigMap
```

This sentence stops mid-thought. (important, regression)

Fix: `// Check whether the LimaVM already exists. If so, skip ConfigMap creation.`

### Suggestions

1. **Typo: "into to"** — `app_controller.go:182` [Claude Opus 4.6]

```go
// Propagate spec.running from App into to the LimaVM.
```

Fix: `// Propagate spec.running from App to the LimaVM.` (suggestion, regression)

2. **`ObservedGeneration` copies LimaVM's generation into App condition** — `app_controller.go:201` [Claude Opus 4.6, Gemini 3.1 Pro]

```go
ObservedGeneration: cond.ObservedGeneration,
```

Per Kubernetes convention, `ObservedGeneration` should reference the containing resource's generation (the App), not the source resource's (the LimaVM). The LimaVM controller does not currently set this field, so the value is always 0. If anyone adds generation tracking to the LimaVM controller, the mirrored value on the App becomes silently wrong. (suggestion, regression)

Fix: Use `ObservedGeneration: app.Generation` or omit the field.

3. **Hardcoded template ConfigMap suffix** — `app_controller.go:74` [Claude Opus 4.6]

```go
templateCMName := limaVMName + "-template"
```

The canonical name computation is `limaVM.GetTemplateConfigMapName()`. On the deletion path (LimaVM already gone), there is no object to call the helper on. The hardcoded `"-template"` suffix could drift from the helper. (suggestion, regression)

Fix: Extract a shared constant (e.g., `templateConfigMapSuffix`) or add a comment referencing `GetTemplateConfigMapName()` as the source of truth.

4. **Missing test for ConfigMap cleanup after deletion** — `bats/tests/32-app-controllers/app.bats:191-194` [Claude Opus 4.6]

```bash
@test "verify LimaVM rd is deleted when App is deleted" {
    run -1 rdd ctl get limavm "${VM_NAME}" --namespace "${RDD_NAMESPACE}"
    assert_output --partial "not found"
}
```

The test verifies the LimaVM is gone after App deletion but does not verify that both ConfigMaps (`rd` and `rd-template`) are also cleaned up. (suggestion, gap)

Fix: Add a test that verifies `rdd ctl get configmap "rd-template" --namespace "${RDD_NAMESPACE}"` returns not found.

5. **`conditionMessageMaxLen` comment misrepresents the limit** — `limavm_controller.go:426-430` [Claude Opus 4.6]

```go
// conditionMessageMaxLen caps how long a condition message can be.
// The API server has a 32KB limit for the whole object, so we keep
// messages short to avoid failures, especially when errors get long
const conditionMessageMaxLen = 32768
```

The comment says "32KB limit for the whole object," but 32768 matches the per-field `maxLength` in the CRD schema (`crd.yaml:86`). The comment and the intent (CRD validation, not object size) are misaligned. (suggestion, regression)

Fix: `// conditionMessageMaxLen matches the CRD maxLength for condition messages. Truncation prevents validation errors when messages grow large.`

---

## Design Observations

### Concerns

- **Terminating LimaVM handling is absent from normal reconcile** `(in-scope)` [Gemini 3.1 Pro]

  The normal reconcile path (lines 116-210) assumes the LimaVM is either missing or alive. A terminating LimaVM passes through spec.running propagation and condition mirroring without acknowledgment. The controller should detect a terminating LimaVM, remove the owned finalizer, and either wait for deletion or recreate. This would also eliminate the deadlock in Finding 1.

- **Template rendering should replace template embedding** `(in-scope)` [Codex GPT 5.4]

  The embedded `lima.yaml` is fully static, but several fields depend on runtime state: socket paths on `instance.ShortDir()`, SSH port on `instance.Index()`, and image format on the host OS. A Go template or struct-based rendering would eliminate the portability gaps in Finding 6 without complicating the reconciler. The `controller.go:18` comment already flags this as temporary.

### Strengths

- The priority-chain pattern (return after each action, let the next reconcile advance the state machine) makes the reconcile function easy to read and debug. Each code block has a single responsibility, and the flow is linear.

- The defensive ConfigMap cleanup during App deletion (lines 63-84) handles the case where the LimaVM's `DeleteOwnedResources` did not run — belt-and-suspenders cleanup for a system without garbage collection.

- `SetControllerReference` for both the input ConfigMap and LimaVM, combined with `Owns(&limav1alpha1.LimaVM{})`, provides correct ownership tracking. The immutability validation on `spec.namespace` prevents orphaning owned resources after creation.

- `LastTransitionTime` mirroring preserves the original LimaVM timestamp rather than generating a new one, so condition timelines remain accurate across the resource hierarchy.

---

## Testing Assessment

Coverage is thorough for the happy-path lifecycle (create, own, propagate, mirror, delete, recreate). Untested scenarios ranked by risk:

1. **Externally deleted LimaVM** — no test verifies the controller handles a LimaVM deleted by a user or CLI tool during normal operation or during App deletion. This is the deadlock path from Finding 1.
2. **ConfigMap cleanup after deletion** — no test verifies that `rd-template` and `rd` ConfigMaps are removed when the App is deleted.
3. **Condition mirroring race** — tests 128 and 139 read App conditions without waiting for the mirror reconcile (Finding 3).
4. **App deletion while LimaVM is mid-startup** — the test always uses `running: false`. Deleting during startup exercises the full cleanup flow more thoroughly.

---

## Documentation Assessment

No gaps. The CRD YAML includes the immutability description, and the embedded `lima.yaml` is self-documenting. The `controller.go:18` comment marks the template as temporary.

---

## Commit Structure

The two commits are well-scoped: `6a94f5d` implements the controller logic and `dd300d6` adds the BATS tests. Separating implementation from tests makes review easier, though the test commit does not stand alone (it requires the controller from the first commit).

---

## Agent Performance Retro

### [Claude] Claude Opus 4.6

Claude produced the most findings (5 important, 4 suggestions) with strong signal quality. Unique contributions: the test race condition (Finding 3), the full-object Update concern (Finding 5), the incomplete comment (Finding 7), and several suggestions (typo, hardcoded suffix, ConfigMap cleanup test, event recorder removal). Claude correctly identified no critical issues and maintained appropriate severity calibration. It traced into the LimaVM controller for the Owns() watch analysis and verified `ObservedGeneration` is unused via grep. The only weakness: Claude did not catch the finalizer deadlock or the early-return stall, the two most impactful findings.

### [Codex] Codex GPT 5.4

Codex produced only two findings but both were substantive. The embedded template portability analysis (Finding 6) was the most thorough cross-reference of any agent — it traced through `instance.ShortDir()`, `instance.Index()`, the BATS helpers, and the design docs to demonstrate the path/port mismatch. Codex also identified the missing image format selection from Issue #227's requirements. No false positives. The review was short (66 lines) with no suggestions or design observations beyond what the findings covered. Duration was the shortest at 253s.

### [Gemini] Gemini 3.1 Pro

Gemini raised two findings as CRITICAL: the early-return stall and the finalizer deadlock. The deadlock finding (Finding 1) was the most valuable individual contribution across all three agents — it identified a real bug that the other two missed. The early-return stall was overrated at CRITICAL (downgraded to IMPORTANT after tracing the LimaVM lifecycle), but the core observation was valid. Gemini also caught the transient-error ConfigMap leak (Finding 4), which the other agents missed. False positive: the "redundant template ConfigMap cleanup" suggestion incorrectly assumed `DeleteOwnedResources` would handle it — the App controller's cleanup runs after the LimaVM is already gone, so there is no owner to drive `DeleteOwnedResources`.

### Summary

| Metric | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 10:01 | 4:13 | 4:54 |
| Critical | 0 | 0 | 0 (2 downgraded) |
| Important | 5 | 2 | 3 |
| Suggestion | 4 | 0 | 2 |
| Design observations | 1 | 2 | 0 |
| False positives | 0 | 0 | 1 |
| Unique insights | 4 | 1 | 2 |

Gemini provided the highest-impact single finding (the finalizer deadlock). Claude provided the broadest coverage and most actionable suggestions. Codex was the most efficient — two precise, well-researched findings in the shortest time.

---

## Appendix: Original Reviews

### Claude Opus 4.6

#### Executive Summary

This PR implements the core App controller reconciliation loop: creating and owning a LimaVM instance from an embedded template, propagating `spec.running`, mirroring VM status conditions, and handling deletion via a cleanup finalizer. The code is well-structured with a clear priority-chain reconciliation pattern and thorough defensive cleanup. The main concerns are a race condition in the test suite and a full-object `Update` that will cause unnecessary conflict retries.

#### Findings

##### Critical Issues

None.

##### Important Issues

1. **Test race: App condition not awaited before assertion** — `bats/tests/32-app-controllers/app.bats:128-137` `(important, regression)`

```bash
@test "verify App status mirrors LimaVM Created condition" {
    run -0 rdd ctl get app "${APP_NAME}" -o json
    local json="${output}"

    run -0 jq -r '.status | has("conditions")' <<<"${json}"
    assert_output "true"
```

Test 124-126 waits for the LimaVM's `Created` condition, then test 128 immediately reads the App status without waiting for the App controller to mirror it. The `Owns()` watch at line 219 of `app_controller.go` triggers a reconcile when the LimaVM changes, but the reconcile may not complete before the BATS test reads. There is a window between the LimaVM status update and the App controller's `Status().Update()` at line 206. The same race applies to test 139-149 ("verify App conditions preserve LimaVM LastTransitionTime").

Fix: Add `rdd ctl wait --for=condition=Created app/"${APP_NAME}" --timeout=30s` before the assertion at line 129.

2. **Full-object Update causes unnecessary conflict retries** — `app_controller.go:185` `(important, regression)`

```go
limaVM.Spec.Running = app.Spec.Running
if err := r.Update(ctx, limaVM); err != nil {
    return ctrl.Result{}, fmt.Errorf("failed to propagate running state to LimaVM: %w", err)
}
```

`r.Update(ctx, limaVM)` sends the entire LimaVM object (fetched at line 116). The LimaVM controller runs concurrently and frequently patches the LimaVM status (visible at `limavm_controller.go:178,248,277,...` and `limavm_lifecycle.go:94,165,...`). Any such status patch between line 116 and line 185 increments the resourceVersion, causing the App controller's Update to fail with 409 Conflict. This is functionally correct (conflicts cause retries) but wasteful. More importantly, a full-object Update can overwrite concurrent metadata changes (annotations, labels) made by the LimaVM controller between the Get and the Update.

Fix: Use a `MergePatch` targeting only `spec.running`:

```go
patch := client.MergeFrom(limaVM.DeepCopy())
limaVM.Spec.Running = app.Spec.Running
if err := r.Patch(ctx, limaVM, patch); err != nil {
```

3. **Incomplete comment** — `app_controller.go:114` `(important, regression)`

```go
// Check LimaVM first. If it exists and we can skip ConfigMap
```

This sentence is truncated mid-thought. A reader following the reconciliation flow will wonder what "skip ConfigMap" means.

Fix: Complete the sentence, e.g., `// Check whether the LimaVM already exists. If so, skip ConfigMap and LimaVM creation.`

4. **Hardcoded template ConfigMap name duplicates canonical helper** — `app_controller.go:74` `(important, regression)`

```go
templateCMName := limaVMName + "-template"
```

The canonical way to compute this name is `limaVM.GetTemplateConfigMapName()` at `limavm_types.go:113-115`. But on the deletion path at line 74, the LimaVM is already gone (the `IsNotFound(err)` branch), so there is no LimaVM object to call the helper on. Still, the hardcoded string `+ "-template"` should be a shared constant to avoid drift.

Fix: At minimum, add a comment referencing `GetTemplateConfigMapName()` as the source of truth. Better: extract a package-level constant like `templateConfigMapSuffix = "-template"` that both the helper and this code use.

5. **Deletion path doesn't verify ConfigMaps are cleaned up** — `bats/tests/32-app-controllers/app.bats:191-194` `(important, gap)`

```bash
@test "verify LimaVM rd is deleted when App is deleted" {
    run -1 rdd ctl get limavm "${VM_NAME}" --namespace "${RDD_NAMESPACE}"
    assert_output --partial "not found"
}
```

The test verifies the LimaVM is gone after App deletion, but doesn't verify that the template ConfigMap (`rd-template`) and input ConfigMap (`rd`) are also cleaned up. The cleanup code at lines 63-84 of `app_controller.go` handles this, but there's no test coverage for it. If the cleanup code regresses, orphaned ConfigMaps would accumulate silently.

Fix: Add a test after deletion that verifies `rdd ctl get configmap "rd-template" --namespace "${RDD_NAMESPACE}"` returns not found.

##### Suggestions

1. **Typo in comment** — `app_controller.go:182` `(suggestion, regression)`

```go
// Propagate spec.running from App into to the LimaVM.
```

"into to" — one too many prepositions.

Fix: `// Propagate spec.running from App to the LimaVM.`

2. **`ObservedGeneration` mirroring is semantically incorrect** — `app_controller.go:201` `(suggestion, regression)`

```go
ObservedGeneration: cond.ObservedGeneration,
```

This copies the LimaVM's `ObservedGeneration` into the App's condition. Per Kubernetes conventions, `ObservedGeneration` should refer to the generation of the *containing* object (the App), not the source object (the LimaVM). Currently benign because the LimaVM controller doesn't set `ObservedGeneration` (confirmed: no occurrences in `pkg/controllers/lima/limavm/controllers/`), so the value is always 0. But if anyone adds generation tracking to the LimaVM controller, the mirrored value on the App would silently become misleading.

Fix: Use `ObservedGeneration: app.Generation` instead, or omit the field entirely (pass 0) since the App controller doesn't track its own generation either.

3. **`conditionMessageMaxLen` comment is misleading** — `limavm_controller.go:426-430` `(suggestion, regression)`

```go
// conditionMessageMaxLen caps how long a condition message can be.
// The API server has a 32KB limit for the whole object, so we keep
// messages short to avoid failures, especially when errors get long
// (like repeated image download retries).
const conditionMessageMaxLen = 32768
```

The comment says "32KB limit for the whole object" but 32768 is actually the per-field `maxLength` from the CRD condition schema (visible at `crd.yaml:86`). Setting the truncation limit to the maximum isn't "keeping messages short." The truncation is correct — it prevents exceeding the CRD schema validation — but the comment misrepresents why.

Fix: Rewrite as: `// conditionMessageMaxLen matches the CRD maxLength for condition messages. Truncation prevents validation errors when error messages grow large.`

4. **Event emission removed without replacement** — `app_controller.go` `(suggestion, gap)`

The pre-existing code had a `Recorder events.EventRecorder` field and emitted events via `setCondition` whenever conditions changed. The new code removes both. The App controller no longer emits events for any state transitions (creation, deletion, running propagation, condition mirroring). The LimaVM controller still emits its own events, but there's no observability signal for App-level operations like "created LimaVM" or "propagated running state." This makes debugging harder when the App controller is involved.

Fix: Re-add the event recorder and emit events for key lifecycle transitions (LimaVM created, running propagated, deletion started).

#### Design Observations

**Concerns**

- **(future)** The priority-chain pattern (return after each action, let the next reconcile pick up the next step) is clean and debuggable. However, the condition mirroring at lines 194-210 runs on every reconcile where nothing else needs doing. If the LimaVM has many conditions or updates frequently, this path will execute often. Currently there are only two condition types (Created, Running), so this is fine. But if the number grows, consider comparing the full condition set before calling `Status().Update()` — the `statusChanged` boolean already handles this correctly.

**Strengths**

- The defensive ConfigMap cleanup during App deletion (lines 63-84) correctly handles edge cases where LimaVM's `DeleteOwnedResources` didn't run, providing belt-and-suspenders reliability.
- The use of `SetControllerReference` for both the input ConfigMap and the LimaVM establishes proper ownership, and the `Owns(&limav1alpha1.LimaVM{})` watch at line 219 ensures the App controller is re-triggered on LimaVM status changes. The cross-namespace (cluster-scoped owner -> namespaced child) ownership is correctly supported by controller-runtime v0.23.3.
- The `LastTransitionTime` mirroring preserves the original LimaVM timestamp rather than generating a new one. This is correct behavior — `apimeta.SetStatusCondition` respects explicitly-provided timestamps.
- The immutability validation on `spec.namespace` (`XValidation:rule="self == oldSelf"`) prevents orphaning owned resources, and the test at line 176-179 verifies this enforcement.

#### Testing Assessment

Coverage is solid for the main lifecycle (create, own, propagate, mirror, delete, recreate). Untested scenarios ranked by risk:

1. **Race between LimaVM condition and App mirror** — covered in Finding 1 above.
2. **ConfigMap cleanup after deletion** — no test verifies that `rd-template` and `rd` ConfigMaps are gone after App deletion.
3. **Conflict recovery** — no test exercises the case where the App controller's `Update` at line 185 gets a 409 Conflict and retries.
4. **App deletion while LimaVM is mid-startup** — the test always starts with `running: false`. Deleting during a running or starting VM exercises the full cleanup flow more thoroughly.

#### Documentation Assessment

No documentation gaps. The CRD yaml includes the immutability description. The embedded `lima.yaml` is self-explanatory.

#### Commit Structure

The two commits are well-scoped: `6a94f5d` implements the controller logic and `dd300d6` adds the BATS tests. Separating them makes review easier, though one could argue the implementation and tests should ship together in a single commit for bisectability.

### Codex GPT 5.4

#### Executive Summary

This PR wires the `App` singleton to create, own, reconcile, and delete a single `LimaVM`, then mirrors the VM's conditions back onto `App.status`. The lifecycle plumbing is mostly coherent, but the embedded bootstrap template introduces two concrete regressions: it hard-codes instance-specific host endpoints, and it never selects the VM image format from the host platform, so the new controller is not actually portable across supported RDD instance shapes and hosts.

#### Findings

##### Critical Issues

None.

##### Important Issues

1. **Embedded template hard-codes the wrong host socket path and a global SSH port** — `pkg/controllers/app/app/lima.yaml:13-15` `(important, regression)`

```yaml
ssh:
  localPort: 51422
  loadDotSSHPubKeys: false
...
  guestSocket: /var/run/docker.sock
  hostSocket: "{{.Home}}/.rd/docker.sock"
```

The embedded template fixes `ssh.localPort` at line 14 and `hostSocket` at line 37, and the app controller copies that template verbatim into the bootstrap ConfigMap at `pkg/controllers/app/app/controllers/app_controller.go:128-145`. That breaks the repo's existing per-instance contract: `instance.ShortDir()` is `~/.rd<suffix>` at `pkg/instance/instance.go:101-113`, with `~/.rd2` as the default at `docs/design/environment.md:40-43`, so `hostSocket` on line 37 points every App-created VM at the wrong Docker socket path. The fixed port on line 14 has the same problem for parallel instances: RDD already derives instance-specific ports from `instance.Index()` at `pkg/instance/instance.go:24-38`, and the test helpers mirror that at `bats/helpers/instance.bash:16-21`, but this template forces every instance to contend for `51422`.

Fix: render the bootstrap template from runtime values instead of embedding literal endpoints. At minimum, fill the socket path from `instance.ShortDir()` and derive the SSH port from an instance-indexed base.

2. **App bootstrap never selects the image format from the host, so the new lifecycle is platform-specific** — `pkg/controllers/app/app/lima.yaml:1-7` `(important, regression)`

```yaml
images:
- location: https://github.com/rancher-sandbox/rancher-desktop-opensuse/releases/download/v0.1.2/distro.v0.1.2.amd64.qcow2
  arch: x86_64
- location: https://github.com/rancher-sandbox/rancher-desktop-opensuse/releases/download/v0.1.2/distro.v0.1.2.arm64.qcow2
  arch: aarch64
```

The new controller always embeds the same `lima.yaml` at `pkg/controllers/app/app/controller.go:18-21`, passes it through unchanged at `pkg/controllers/app/app/controller.go:72-75`, and writes it verbatim into the ConfigMap at `pkg/controllers/app/app/controllers/app_controller.go:128-145`. Because the file only contains `*.qcow2` images at lines 2-6, the implementation never performs the host-based image-format selection called for in issue #227. That matters here because Windows is a supported LimaVM target in-tree (`pkg/controllers/lima/limavm/drivers_wsl2.go:5-10`), yet the new app BATS suite explicitly skips Windows at `bats/tests/32-app-controllers/app.bats:42-43`, so this portability regression is currently untested.

Fix: build the `images:` stanza at reconcile time from the detected host platform / selected driver, including the tar-based artifact path where required, and add at least one test that exercises the template-selection logic off macOS/Linux-qemu happy paths.

##### Suggestions

None.

#### Design Observations

**Concerns**

- **Render only the dynamic parts of the bootstrap template at runtime** `(in-scope)` — The current design embeds a fully concrete `lima.yaml`, but several fields are not actually static: socket paths depend on `RDD_INSTANCE`, ports depend on the per-instance index, and image artifacts depend on the host/driver. Keeping a mostly-static base template but filling a small runtime struct before writing the ConfigMap would remove both regressions above without complicating the reconciler.

**Strengths**

- The new finalizer flow is directionally correct: `App` now owns the `LimaVM`, blocks its own deletion until child teardown finishes, and explicitly cleans up bootstrap ConfigMaps instead of assuming Kubernetes GC exists.
- Making `spec.namespace` immutable is a good defensive change for a cluster-scoped owner that manages namespaced children; it closes an otherwise easy orphaning path.

#### Testing Assessment

1. The new app tests never boot the VM to `Running=True`, so they do not validate that the embedded template's `hostSocket` and `ssh.localPort` actually work for the current `RDD_INSTANCE`.
2. Windows / WSL2 is entirely untested for this feature (`bats/tests/32-app-controllers/app.bats:42-43`), even though the repo has a Windows Lima driver.
3. No test covers non-default `RDD_INSTANCE` values, so the instance-specific path/port regressions in the embedded template slip through.

#### Documentation Assessment

The existing docs already describe per-instance short directories and multiple RDD instances. If the embedded-template approach remains, the implementation needs to follow that documented contract; otherwise the docs around instance-specific paths and parallel instances become misleading.

### Gemini 3.1 Pro

#### Executive Summary

This PR introduces the core `App` controller to act as a singleton manager for the underlying `LimaVM` instance. While the core lifecycle logic (creation, status mirroring) is mostly correct, there are several critical flaws in the state machine and finalizer handling. Most notably, the controller will deadlock if the `LimaVM` is deleted manually, and it stalls its reconciliation loop immediately after creating the VM due to an incorrect return statement.

#### Critical Issues

1. **Reconciliation stall after ConfigMap deletion** — `pkg/controllers/app/app/controllers/app_controller.go:167-174` (critical, regression)

```go
if err := r.Get(ctx, client.ObjectKey{Name: inputConfigMapName, Namespace: namespace}, inputConfigMap); err == nil {
    if err := r.Delete(ctx, inputConfigMap); err != nil && !apierrors.IsNotFound(err) {
        return ctrl.Result{}, fmt.Errorf("failed to delete input ConfigMap: %w", err)
    }
    log.Info("Deleted input ConfigMap after LimaVM created its own copy")
    return ctrl.Result{}, nil
} else if !apierrors.IsNotFound(err) {
```

Returning `ctrl.Result{}, nil` immediately after successfully deleting the input ConfigMap halts the current reconciliation cycle. The controller never reaches the `spec.running` propagation or condition mirroring logic below it. Because the `App` controller does not `Watch` or `Own` ConfigMaps, the deletion itself does not trigger a subsequent watch event. If the `LimaVM` happens to reach a steady state with no further status updates before this block executes, the `App` controller will stall indefinitely.

Fix: Remove `return ctrl.Result{}, nil` on the successful deletion path so the controller continues to update `LimaVM.Spec.Running` and mirror conditions in the same pass.

2. **Deadlock when LimaVM is terminating** — `pkg/controllers/app/app/controllers/app_controller.go:85-88` and `114` (critical, regression)

```go
case !base.IsBeingDeleted(limaVM):
    if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM); err != nil { ... }
    if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) { ... }
    return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
default:
    // LimaVM deletion is in progress; keep requeueing until it is gone.
    log.Info("Waiting for LimaVM deletion to complete")
    return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

The `App` controller is the sole manager of the `base.OwnedFinalizerName` on the `LimaVM`. If a `LimaVM` is already terminating (e.g., deleted manually by a user, or if `App` deletion hits an error after issuing the delete but before removing the finalizer), `base.IsBeingDeleted(limaVM)` will be true. The code falls through to the `default` case and requeues without ever removing the finalizer. The `LimaVM` becomes permanently stuck in `Terminating`, and the `App` is stuck waiting for it. The same bug exists in the normal reconcile loop (line 114): it ignores terminating LimaVMs without removing their finalizer.

Fix: Separate finalizer removal from the deletion request. In both the cleanup block and normal reconcile block, always check if the finalizer is present on a terminating `LimaVM` and remove it so it can finish deleting.

```go
// In the deletion switch block:
case err == nil:
    if controllerutil.ContainsFinalizer(limaVM, base.OwnedFinalizerName) {
        if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM); err != nil {
            return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer: %w", err)
        }
    }
    if !base.IsBeingDeleted(limaVM) {
        if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
            return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
        }
    }
    log.Info("Waiting for LimaVM deletion to complete")
    return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

#### Important Issues

1. **Transient errors cause ConfigMap leaks** — `pkg/controllers/app/app/controllers/app_controller.go:66-70` and `79-83` (important, regression)

```go
} else if !apierrors.IsNotFound(cmErr) {
    log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
}
```

During `App` deletion, if fetching the ConfigMaps fails with a transient API error (like connection timeout or rate limit), the code simply logs the error and continues. It ultimately calls `base.RemoveCleanupFinalizer(..., &app)`, successfully deleting the `App`. Because the embedded control plane lacks a garbage collector, the bypassed ConfigMaps will leak permanently.

Fix: Return the error instead of just logging it so the deletion is retried:
```go
} else if !apierrors.IsNotFound(cmErr) {
    return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
}
```

2. **Incorrect ObservedGeneration in mirrored conditions** — `pkg/controllers/app/app/controllers/app_controller.go:192` (important, regression)

```go
Message:            cond.Message,
ObservedGeneration: cond.ObservedGeneration,
LastTransitionTime: cond.LastTransitionTime,
```

Copying `cond.ObservedGeneration` blindly from the `LimaVM` condition applies the `LimaVM`'s metadata generation to the `App`'s condition. The Kubernetes standard dictates that `ObservedGeneration` in a status condition must reflect the `.metadata.generation` of the resource the condition belongs to (the `App`). Using a foreign generation value breaks staleness checks for external clients watching the `App`.

Fix: Set `ObservedGeneration: app.Generation` instead.

3. **Missing host-based image format selection (qcow2/tar)** — `pkg/controllers/app/app/lima.yaml:2-7` (important, gap)

The requirements in Issue #227 explicitly state: "Select correct arch (amd64/arm64) and image format (qcow2/tar) from host". While `lima.yaml` natively handles architecture selection by listing both, it hardcodes the `.qcow2` extension for both. This misses the requirement to select the correct image format from the host (which is crucial since `Virtualization.framework` on macOS can require `.tar` or `.raw` disk images depending on the Lima version and configuration).

Fix: Do not rely solely on the static `lima.yaml` embedded string. The `AppReconciler` should dynamically generate or patch `r.LimaTemplateData` to use the correct `.tar` or `.qcow2` image path based on the underlying host OS and VM engine before persisting it to the ConfigMap.

#### Suggestions

1. **Redundant template ConfigMap cleanup** — `pkg/controllers/app/app/controllers/app_controller.go:72-84` (suggestion, enhancement)

The `AppReconciler` attempts to fetch, remove the finalizer from, and delete `templateCMName` (`rd-template`). This is entirely redundant. The `LimaVM` controller's `handleDeletion` logic already uses `base.DeleteOwnedResources` to reliably strip the owned finalizer and delete it. You can safely remove this entire block.

*Consolidation note: This suggestion is incorrect. The App's cleanup code at lines 72-84 runs in the `IsNotFound` case for the LimaVM (line 62), meaning the LimaVM is already gone. `DeleteOwnedResources` runs during the LimaVM's deletion, not after. If the LimaVM was deleted but its `DeleteOwnedResources` failed to clean up a ConfigMap, the App's belt-and-suspenders cleanup catches it.*

2. **Validating preexisting input ConfigMaps** — `pkg/controllers/app/app/controllers/app_controller.go:121-125` (suggestion, enhancement)

If `limaVM` is missing but `inputConfigMap` already exists, the controller skips recreating the ConfigMap and assumes its contents are valid. If it was modified or created by a different process, the `LimaVM` may fail to boot. Consider enforcing the contents by running an `Update` on it, or simply deleting and recreating it.

#### Testing Assessment

The `bats` integration tests provide excellent functional coverage of the happy path lifecycle (creation, condition mirroring, propagation, deletion).

Untested scenarios (ranked by risk):
1. **LimaVM external deletion:** There is no test ensuring the controller safely cleans up finalizers and recreates the `LimaVM` if the owned instance is deleted manually by a user or CLI tool.
2. **Deletion timeouts:** There is no test for a locked `LimaVM` (one that won't delete) to ensure `App` handles the timeout/requeue loop gracefully without deadlocking.

#### Documentation Assessment

No documentation updates were included in this PR. If `App` resource behavior (e.g., immutability of `namespace`) is exposed to users, `docs/design/api_app.md` should be updated to reflect the lifecycle.

#### Commit Structure

The PR history is clean and clearly separated between the application controller logic and the BATS testing.
