# Deep Review: 20260324-190453-pr-236

| | |
|---|---|
| **Date** | 2026-03-24 19:04 |
| **Round** | 4 |
| **Author** | [@Nino-K](https://github.com/Nino-K) |
| **PR** | [#236](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/236) — Implement App controller lifecycle for LimaVM |
| **Branch** | `app-controller-create-delete-limavm` |
| **Commits** | `2a3c5b2` Implement App controller lifecycle for LimaVM<br>`049a3a0` Add Bats test to App Controller |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — wrong finalizer kind in the deletion safety-net blocks App deletion if triggered; ObservedGeneration mirroring breaks Kubernetes condition conventions |
| **Wall-clock time** | `14 min 42 s` |

---

## Executive Summary

This PR implements the App controller's core reconciliation loop: creating a LimaVM instance from an embedded template, propagating `spec.running`, mirroring VM status conditions onto the App resource, and cleaning up owned resources on deletion via a finalizer. The design is clean with a well-structured priority chain that avoids multi-mutation conflicts. Two issues need fixing: the deletion safety-net uses the wrong finalizer kind (all three agents agree), and mirrored ObservedGeneration values reference the wrong resource's generation (two agents agree).

---

## Critical Issues

None.

---

## Important Issues

1. **Wrong finalizer kind in template ConfigMap cleanup blocks App deletion** — `pkg/controllers/app/app/controllers/app_controller.go:76` (important, regression) [Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro]

```go
if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
    return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
}
```

The deletion safety-net at lines 73–84 fires when the LimaVM is already gone but ConfigMaps linger. The template ConfigMap's owned finalizer is `rdd.rancherdesktop.io/owned-by-LimaVM` (set by the LimaVM mutating webhook), not `owned-by-App`. `RemoveOwnedFinalizer` with `AppKind` is a no-op — `controllerutil.RemoveFinalizer` returns false because the target finalizer does not exist (line 84 of `finalizer.go`). The subsequent `r.Delete` at line 79 is rejected by the `OwnedDeletionGuard` webhook (line 57 of `webhook.go`) because `owned-by-LimaVM` remains. The error propagates, causing infinite requeues — the App is permanently stuck in Terminating.

In normal operation this safety-net is unreachable: the LimaVM controller's `handleDeletion` calls `DeleteOwnedResources` to strip owned finalizers and delete the template CM before removing the LimaVM's cleanup finalizer. The path fires only if the LimaVM is removed abnormally (e.g., someone manually strips its finalizers).

Fix:

```diff
-			if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
+			if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, limav1alpha1.LimaVMKind); cmErr != nil {
```

2. **Mirrored ObservedGeneration references wrong resource** — `pkg/controllers/app/app/controllers/app_controller.go:207` (important, regression) [Claude Opus 4.6, Gemini 3.1 Pro]

```go
statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
    Type:               cond.Type,
    Status:             cond.Status,
    Reason:             cond.Reason,
    Message:            base.TruncateConditionMessage(cond.Message),
    ObservedGeneration: cond.ObservedGeneration,
    LastTransitionTime: cond.LastTransitionTime,
}) || statusChanged
```

`cond.ObservedGeneration` reflects the LimaVM's `metadata.generation`. When set on the App's condition, Kubernetes convention says it should reflect `app.metadata.generation` — the generation of the resource the condition is attached to. A client checking whether the App controller has processed its latest spec change would see a mismatched generation and incorrectly conclude the condition is stale.

Fix:

```diff
-			ObservedGeneration: cond.ObservedGeneration,
+			ObservedGeneration: app.Generation,
```

---

## Suggestions

1. **ConfigMap read failures swallowed during deletion can orphan resources** — `pkg/controllers/app/app/controllers/app_controller.go:69-84` (suggestion, regression) [Codex GPT 5.4]

```go
} else if !apierrors.IsNotFound(cmErr) {
    log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
}
// ... (same pattern for template CM at lines 82-84)
```

In the deletion path, non-NotFound errors from ConfigMap Gets at lines 69–70 and 82–84 are logged but fall through to `RemoveCleanupFinalizer` at line 86. If a transient API error prevents reading a ConfigMap, the App's cleanup finalizer is removed and the App can be deleted, potentially orphaning ConfigMaps. Since RDD has no garbage collector, orphaned resources with owned finalizers would linger permanently.

In the embedded single-user kube-apiserver, transient Get errors are very rare, making the practical risk low. But the defensive fix is straightforward.

Fix: Return errors from ConfigMap Gets instead of logging and continuing, so the cleanup finalizer is not removed until both ConfigMaps are confirmed absent or successfully deleted.

2. **Stale input ConfigMap reused without validation** — `pkg/controllers/app/app/controllers/app_controller.go:121-143` (suggestion, gap) [Gemini 3.1 Pro]

```go
inputCM := &corev1.ConfigMap{}
err := r.Get(ctx, client.ObjectKey{Name: inputConfigMapName, Namespace: namespace}, inputCM)
if err != nil && !apierrors.IsNotFound(err) {
    return ctrl.Result{}, err
}
if apierrors.IsNotFound(err) {
    inputCM = &corev1.ConfigMap{
        // ... creates with fresh LimaTemplateData
```

If a stale input ConfigMap exists from a previous crash (between creating the CM at line 140 and the LimaVM at line 162), the controller reuses it without updating its `Data` to match `r.LimaTemplateData`. If the binary was updated between crashes, the LimaVM would be created from outdated template data.

The practical risk is low: the window is narrow (crash between two sequential API calls), and template data is embedded so it only changes with binary updates (which restart the controller). But an explicit update-or-create pattern would eliminate the concern.

Fix: Use `controllerutil.CreateOrUpdate` or update the Data and OwnerReference when the ConfigMap already exists.

3. **Misleading docstring on ConditionMessageMaxLen** — `pkg/controllers/base/conditions.go:12-16` (suggestion, regression) [Claude Opus 4.6, Gemini 3.1 Pro]

```go
// ConditionMessageMaxLen is the maximum number of runes allowed in a Kubernetes
// condition message. The API server enforces a 32768-byte limit on the whole
// object; keeping messages within this bound avoids rejections from operations
```

The comment says 32768 is a byte limit on the whole object, but 32768 is actually the per-field character limit on the condition message (from `maxLength: 32768` in the CRD validation). The whole-object limit is 1.5 MB. The comment should reference the CRD field validation, not a whole-object byte limit.

4. **No unit tests for TruncateConditionMessage** — `pkg/controllers/base/conditions.go:22-29` (suggestion, gap) [Claude Opus 4.6]

The new function has no unit test. A table-driven test covering empty string, under-limit, at-limit, and over-limit cases (including multi-byte runes) would prevent regressions on the boundary logic.

---

## Design Observations

### Concerns

1. **Template ConfigMap cleanup responsibility is split across controllers** — The LimaVM webhook creates the template ConfigMap with `owned-by-LimaVM`, the LimaVM controller later attaches OwnerReferences, and the App controller has a name-based fallback delete path. This split creates the wrong-finalizer bug in Finding 1. A cleaner design would have the App only delete the LimaVM and leave all template ConfigMap cleanup to the LimaVM controller's finalizer. The App's safety-net would then only need to confirm the LimaVM is fully gone. `(in-scope)` [Codex GPT 5.4]

2. **Condition mirroring is add-only** — The mirroring loop at lines 201–210 copies LimaVM conditions to the App but never removes App conditions that no longer exist on the LimaVM. For the current condition types (Created, Running), which are set once and never removed, this is correct. If a future LimaVM version introduces transient conditions, they would be stale on the App forever. A brief comment noting this would be worthwhile. `(future)` [Claude Opus 4.6]

### Strengths

- **Explicit requeue after ConfigMap deletion** (lines 177–180): The comment explains why ConfigMaps are not watched and manually requeues. This prevents a subtle controller stall. [Claude Opus 4.6, Codex GPT 5.4]
- **Priority chain** in the reconcile loop: Each action returns immediately after its step, keeping the resourceVersion fresh and avoiding multi-mutation conflicts. [Claude Opus 4.6]
- **Input ConfigMap lifecycle** is well designed: created as a bootstrap artifact, deleted once the LimaVM webhook copies it. Clean transient-resource pattern. [Claude Opus 4.6]
- **CRD-level immutability** for `spec.namespace` via `XValidation:rule="self == oldSelf"` prevents orphaned resources across namespaces. [Gemini 3.1 Pro]

---

## Testing Assessment

The BATS integration tests cover the happy-path lifecycle well (singleton creation, finalizer, ownership, running propagation, condition mirroring, immutability rejection, deletion, recreation). Untested scenarios ranked by risk:

1. **Template ConfigMap orphan during deletion** — If the LimaVM disappears without cleaning up the template CM, the safety-net fires with the wrong finalizer (Finding 1). Hard to test in BATS but could be exercised by manually stripping the LimaVM's cleanup finalizer.
2. **App deletion immediately after creation** — Before the LimaVM controller sets OwnerReferences on the template CM, deleting the App could leave the template CM orphaned.
3. **Stale input ConfigMap from a previous crash** — The controller reuses whatever it finds without validating Data content (Suggestion 2).
4. **No unit tests for App reconciler** — The entire `app_controller.go` is tested only via BATS integration.
5. **No unit tests for TruncateConditionMessage** — Boundary logic on rune counting has no unit test (Suggestion 4).

---

## Documentation Assessment

Code is well-commented overall. The only documentation issue is the misleading `ConditionMessageMaxLen` docstring (Suggestion 3). The `lima.yaml` embed comment ("This is temporary and will be removed") at `controller.go:18` is clear about the transient nature.

---

## Acknowledged Limitations

- `pkg/controllers/app/app/controller.go:18`: *"This is temporary and will be removed once the app controller is fully implemented."* — Acknowledges the hardcoded embedded template. Reasonable for bootstrapping; impact is limited to a single embedded file that will be replaced by dynamic template resolution.

---

## Agent Performance Retro

### [Claude]

Claude Opus 4.6 produced the most thorough review with careful cross-controller tracing. It correctly identified the wrong finalizer kind (Finding 1) and traced the full call chain through `RemoveOwnedFinalizer`, `controllerutil.RemoveFinalizer`, and `OwnedDeletionGuard.ValidateDelete` to confirm the bug. It uniquely identified the missing unit test for `TruncateConditionMessage`. It also produced the richest design observations section, with both concerns and strengths well supported.

Claude initially rated the wrong-finalizer bug as Important rather than Critical, correctly reasoning that the safety-net path is unreachable under normal operation. It raised ObservedGeneration as a Suggestion, while Gemini rated it Important — the consolidated report sides with Important given that it breaks Kubernetes conventions.

- Unique contributions: Missing TruncateConditionMessage unit tests, add-only condition mirroring concern, priority-chain and input-CM-lifecycle strengths
- Accuracy: No false positives
- Depth: Excellent cross-controller tracing
- Coverage: 9/9 files reviewed
- Coverage misses: None

### [Codex]

Codex GPT 5.4 was the fastest agent (213s) and produced a focused two-finding review. It uniquely identified the ConfigMap read failures being swallowed during deletion (consolidated as Suggestion 1), correctly noting that the controller does not watch ConfigMaps and therefore won't self-heal from transient errors.

Codex missed the ObservedGeneration issue that both Claude and Gemini caught. Its design concern about split cleanup responsibility was the most architecturally insightful observation — it identified the root cause behind the wrong-finalizer bug rather than just the symptom.

- Unique contributions: ConfigMap error swallowing finding, split-responsibility design concern
- Accuracy: No false positives
- Depth: Good tracing into webhook and controller interactions
- Coverage: 9/9 files reviewed
- Coverage misses: Missed ObservedGeneration issue in `app_controller.go`

### [Gemini]

Gemini 3.1 Pro produced the most findings (4) and was the only agent to flag the stale input ConfigMap reuse (consolidated as Suggestion 2). It correctly identified the wrong finalizer and ObservedGeneration issues. Its suggested fix using `controllerutil.CreateOrUpdate` for the input ConfigMap was well-formed and idiomatic.

Gemini rated the ObservedGeneration issue as Important and classified it as a "logic error" rather than "regression" — the consolidated report uses "regression" since the code was introduced by this PR. Gemini's analysis of the template ConfigMap cleanup was slightly imprecise: it claimed `RemoveOwnedFinalizer` "silently succeeds" and the subsequent Delete "succeeds in marking the ConfigMap with a DeletionTimestamp" — in fact, the `OwnedDeletionGuard` webhook rejects the Delete before the DeletionTimestamp is set.

- Unique contributions: Stale input ConfigMap reuse finding
- Accuracy: One imprecise claim about webhook behavior (template CM deletion)
- Depth: Good but did not trace as deeply into cross-controller paths as Claude
- Coverage: 9/9 files reviewed
- Coverage misses: None

### Summary

| Metric | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|--------|-----------------|----------------|----------------|
| Duration | 5:19 | 3:33 | 6:25 |
| Critical | 0 | 0 | 0 |
| Important | 1 | 2 | 3 |
| Suggestion | 3 | 0 | 1 |
| Design observations | 3 | 2 | 1 |
| False positives | 0 | 0 | 0 |
| Unique insights | 2 | 2 | 1 |
| Files reviewed | 9 | 9 | 9 |
| Coverage misses | 0 | 1 | 0 |

Claude provided the deepest analysis with the most thorough cross-controller tracing. Codex was the fastest and most focused, identifying the architecturally most insightful design concern (split cleanup responsibility). Gemini produced the broadest set of findings. All three agents converged on the wrong-finalizer bug — the strongest signal in this review. Claude and Gemini both caught the ObservedGeneration issue that Codex missed. Overall, the multi-agent approach worked well: each agent brought unique insights, and no single agent caught everything.

---

## Appendix: Original Reviews

### Claude Opus 4.6

Now I have a thorough understanding of the changes and the cross-controller interactions. Let me write the review.

---

## Executive Summary

This PR implements the App controller's lifecycle management for LimaVM resources: creation (with input template ConfigMap bootstrapping), running-state propagation, condition mirroring, and finalized deletion. The design is clean and well-structured, with good test coverage. The primary finding is a wrong finalizer kind in the deletion safety-net path that would permanently block App deletion if the safety-net is ever triggered.

**Verdict: Approve with changes** — fix the wrong finalizer kind before merge.

## Findings

### Critical Issues

None.

### Important Issues

1. **Wrong finalizer kind in template ConfigMap cleanup blocks App deletion** — `pkg/controllers/app/app/controllers/app_controller.go:76` (important, regression)

```go
// Line 76: uses AppKind, but the template CM has owned-by-LimaVM
if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
```

The deletion safety-net at lines 73-84 runs when the LimaVM is already gone but ConfigMaps linger. The template ConfigMap's owned finalizer is `rdd.rancherdesktop.io/owned-by-LimaVM` (set by the LimaVM mutating webhook at `pkg/controllers/lima/limavm/controller.go:211`), not `owned-by-App`. Using `v1alpha1.AppKind` here means `RemoveOwnedFinalizer` is a no-op — it looks for a finalizer that doesn't exist on the object.

The subsequent `r.Delete(ctx, templateCM)` at line 79 is then rejected by the `OwnedDeletionGuard` webhook (`pkg/controllers/base/webhook.go:57-58`) because `owned-by-LimaVM` is still present. This error propagates, the reconcile is requeued, and the same thing happens indefinitely — the App is permanently stuck in Terminating.

In normal operation, the LimaVM controller's `handleDeletion` (`limavm_lifecycle.go:65`) calls `DeleteOwnedResources` which strips the owned finalizer and deletes the template CM before removing the LimaVM's cleanup finalizer. So the LimaVM only disappears after its children are cleaned up, and this safety-net path is never reached. But if the LimaVM is removed abnormally (e.g., someone manually strips its finalizers), this path fires and breaks.

Fix:

```diff
 			templateCMName := limaVMName + "-template"
 			if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
-				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
+				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, limav1alpha1.LimaVMKind); cmErr != nil {
 					return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
 				}
```

### Suggestions

2. **Mirrored ObservedGeneration is semantically wrong** — `pkg/controllers/app/app/controllers/app_controller.go:207` (suggestion, regression)

```go
ObservedGeneration: cond.ObservedGeneration,
```

`cond.ObservedGeneration` references the LimaVM's `metadata.generation`, but when set on the App's condition it should reference `app.metadata.generation`. A client checking whether the App controller has observed its latest spec change would see a stale-looking generation. In the current single-user embedded system this is harmless, but it's semantically incorrect.

Fix: use `app.Generation` instead of `cond.ObservedGeneration`.

3. **No unit tests for TruncateConditionMessage** — `pkg/controllers/base/conditions.go:22-29` (suggestion, gap)

The new function has no unit test. It exercises boundary conditions (rune counting, suffix fitting within the limit). A table-driven test with empty string, under-limit, at-limit, and over-limit cases would prevent regressions.

4. **Misleading docstring on ConditionMessageMaxLen** — `pkg/controllers/base/conditions.go:13` (suggestion, regression)

```go
// condition message. The API server enforces a 32768-byte limit on the whole
// object;
```

The 32768 limit is a per-field *character* limit on the condition message (matching `maxLength: 32768` in the CRD at `crd.yaml:86`), not a byte limit on the whole object. The whole-object limit is 1.5MB.

Fix: "The CRD enforces a 32768-character limit on the condition message field."

## Design Observations

### Concerns

**Condition mirroring is add-only** (future) — The mirroring loop at lines 201-210 copies LimaVM conditions to the App but never removes App conditions that no longer exist on the LimaVM. For the current condition types (Created, Running), which are set once and never removed, this is correct. If a future LimaVM version introduces transient conditions that disappear, they'd be stale on the App forever. Worth noting with a brief comment but no code change needed now.

### Strengths

- **Explicit requeue after ConfigMap deletion** (line 177-180): The comment explains *why* ConfigMaps aren't watched and manually requeues. This prevents a subtle stall that's easy to miss in controller-runtime.
- **Priority chain** in the reconcile loop: Each action (deletion, finalizer, creation, cleanup, running propagation, status mirroring) returns immediately after its step, keeping the resourceVersion fresh and avoiding multi-mutation conflicts within a single reconcile.
- **Input ConfigMap lifecycle** is well thought out: created as a bootstrap artifact, deleted once the LimaVM webhook copies it to the managed template. Clean transient-resource pattern.
- **Comprehensive BATS tests**: Cover the full lifecycle (create, finalizer, ownership, running propagation, condition mirroring, immutability rejection, deletion, recreation).

## Testing Assessment

The BATS integration tests are thorough for the happy path. Untested scenarios, ranked by risk:

1. **Template ConfigMap orphan during deletion** — related to Finding 1. If the LimaVM disappears without cleaning up the template CM, the test would block forever. This is hard to test in BATS but could be tested by manually stripping the LimaVM's cleanup finalizer.
2. **App creation when input ConfigMap already exists** — if a previous attempt left an input CM, the controller reuses it (lines 122-127). This path is implicitly covered by the re-creation test but isn't explicitly verified.
3. **No unit tests for the App reconciler** — the entire `app_controller.go` is only tested via BATS integration. A unit test with a fake client could exercise the safety-net deletion path.
4. **No unit tests for TruncateConditionMessage** — as noted in Finding 3.

## Documentation Assessment

The code is well-commented overall. The only doc issue is the misleading `ConditionMessageMaxLen` docstring (Finding 4). The `lima.yaml` embed comment ("temporary and will be removed") at `controller.go:18` is clear about the transient nature.

## Acknowledged Limitations

- `controller.go:18`: `// This is temporary and will be removed once the app controller is fully implemented.` — acknowledges the hardcoded embedded template. Reasonable for bootstrapping; the impact is limited to a single embedded file.

## Coverage Summary

| File | Status |
|------|--------|
| `.github/actions/spelling/expect/rd.txt` | Trivial (added "rosetta") |
| `bats/tests/32-app-controllers/app.bats` | Reviewed, no issues |
| `pkg/apis/app/v1alpha1/app_types.go` | Reviewed, no issues |
| `pkg/controllers/app/app/controller.go` | Reviewed, no issues |
| `pkg/controllers/app/app/controllers/app_controller.go` | **Finding 1**, **Finding 2** |
| `pkg/controllers/app/app/crd.yaml` | Reviewed, no issues |
| `pkg/controllers/app/app/lima.yaml` | Reviewed, no issues |
| `pkg/controllers/base/conditions.go` | **Finding 3**, **Finding 4** |
| `pkg/controllers/lima/limavm/controllers/limavm_controller.go` | Reviewed, no issues |

### Codex GPT 5.4

**Verdict**: Changes requested

### Executive Summary

This PR wires the `App` singleton to create, update, mirror, and delete an owned `LimaVM`, and adds a substantial BATS lifecycle test for the happy path. The main problems are both in the new App cleanup flow: one deletion path can wedge the `App` finalizer permanently by trying to delete `rd-template` with the wrong finalizer removed, and several ConfigMap read failures are logged and ignored even though this controller does not watch ConfigMaps, which can orphan or leak bootstrap state.

### Findings

**Critical Issues**

None.

**Important Issues**

1. **Fallback template cleanup removes the wrong finalizer and can wedge App deletion** — `pkg/controllers/app/app/controllers/app_controller.go:75-80` (important, regression)

```go
if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
	if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
		return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
	}
	if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
		return ctrl.Result{}, fmt.Errorf("failed to delete template ConfigMap during cleanup: %w", cmErr)
	}
}
```

The fallback branch at lines 75-80 assumes `rd-template` is App-owned, but the Lima webhook creates that ConfigMap with `owned-by-LimaVM` at [`pkg/controllers/lima/limavm/controller.go:203-212`](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/lima/limavm/controller.go#L203), and the Lima reconciler only adds the owner reference later at [`pkg/controllers/lima/limavm/controllers/limavm_controller.go:231-240`](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/lima/limavm/controllers/limavm_controller.go#L231). If the user deletes `App` before that reconcile runs, LimaVM deletion can complete without ever discovering `rd-template`, because `DeleteOwnedResources()` keys off owner refs, not names. The new App fallback then removes `owned-by-App` at line 76, which is a no-op, and the delete at line 79 is rejected by the generic owned-resource delete guard at [`pkg/controllers/base/webhook.go:57-58`](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/base/webhook.go#L57). Result: the `App` cleanup finalizer never clears and deletion stalls indefinitely.

Fix: when `LimaVM` is already gone, either strip `OwnedFinalizerFor(v1alpha1.LimaVMKind)` before deleting the orphaned template ConfigMap, or better, move orphan-template cleanup behind a Lima-owned helper so App never has to know Lima’s finalizer contract. In either case, add a test that deletes `App` immediately after create, before `status.templateConfigMap` is populated.

2. **ConfigMap lookup failures are logged and dropped even though no watch will retry them** — `pkg/controllers/app/app/controllers/app_controller.go:65-70`, `pkg/controllers/app/app/controllers/app_controller.go:82-86`, `pkg/controllers/app/app/controllers/app_controller.go:172-183` (important, regression)

```go
} else if !apierrors.IsNotFound(cmErr) {
	log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
}
...
} else if !apierrors.IsNotFound(cmErr) {
	log.Error(cmErr, "Failed to fetch template ConfigMap during cleanup")
}
...
} else if !apierrors.IsNotFound(err) {
	log.Error(err, "Failed to fetch input ConfigMap")
}
```

The new reconciler treats non-`NotFound` ConfigMap `Get()` errors as soft failures in all three places above. During deletion, lines 69-70 and 82-86 still proceed to `RemoveCleanupFinalizer()` at line 86, so a transient API read failure can orphan `rd` or `rd-template` permanently; this matters because RDD has no garbage collector. During steady-state bootstrap cleanup, lines 181-182 only log the fetch error and then continue to return `ctrl.Result{}, nil` at line 218. The controller itself documents at lines 177-180 that ConfigMaps are not watched, and `SetupWithManager()` only watches `App` and owned `LimaVM` objects at lines 224-225, so a quiet LimaVM can leave the bootstrap ConfigMap behind indefinitely after one failed read.

Fix: treat any non-`NotFound` ConfigMap read failure as retryable. In practice that means returning the error, or at minimum `RequeueAfter`, and in the delete path refusing to remove the App cleanup finalizer until both ConfigMaps are either successfully deleted or confirmed absent.

**Suggestions**

None.

### Design Observations

**Concerns**

1. Cleanup responsibility for `rd-template` is split across two controllers in a way that depends on timing. The Lima webhook creates the ConfigMap, the Lima reconciler later attaches the owner ref, and the App reconciler has a name-based fallback delete path. That split creates the race above. A cleaner design is to make Lima own the full lifecycle of `rd-template` and have App only delete the `LimaVM`; the fallback for “LimaVM missing but template still exists” should live with Lima’s finalizer semantics, not duplicate them in App. `(in-scope)`

**Strengths**

1. Making `spec.namespace` immutable in both Go types and generated CRD is the right constraint; it avoids a real orphaning class for this singleton controller.

2. The explicit `Requeue: true` after deleting the bootstrap ConfigMap is a good catch. Without it, that step really would depend on unrelated Lima activity to make progress.

### Testing Assessment

1. Missing coverage for deleting `App` immediately after creation, before the Lima reconciler sets the owner reference on `rd-template`. That is the highest-risk lifecycle race in this PR.
2. Missing coverage for transient API failures while fetching bootstrap/cleanup ConfigMaps. The new reconciler has multiple “log and continue” branches here.
3. Missing coverage for partial-create states where the bootstrap `rd` ConfigMap exists but the `LimaVM` create failed or was retried.

I ran `go test ./pkg/controllers/app/... ./pkg/controllers/base ./pkg/controllers/lima/...`; it passed. I did not run the BATS suite.

### Documentation Assessment

No user-facing docs changes look necessary. Code comments around the App cleanup fallback should be tightened once the template-ConfigMap ownership handling is fixed, because the current “clean up any ConfigMaps that may have been left behind” comment overstates what this branch can safely delete.

### Coverage Summary

- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — Finding 1, Finding 2
- `pkg/controllers/app/app/crd.yaml` — Reviewed, no issues
- `pkg/controllers/app/app/lima.yaml` — Reviewed, no issues
- `pkg/controllers/base/conditions.go` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controllers/limavm_controller.go` — Reviewed, no issues

### Gemini 3.1 Pro

### Executive Summary

The PR introduces the core reconciliation loop for the App controller, which acts as the singleton entry point for managing the Lima VM lifecycle. It creates and configures the `LimaVM` instance via a temporary input ConfigMap, propagates state changes (e.g., `spec.running`), and mirrors conditions back to the `App` resource. The review found a few logic errors regarding orphaned ConfigMap finalizers, unvalidated reuse of existing ConfigMaps, and condition generation mirroring that should be addressed before merging.

### Findings

#### Critical Issues

None.

#### Important Issues

1. **Wrong finalizer removed during template ConfigMap cleanup** — `pkg/controllers/app/app/controllers/app_controller.go:76-85` (important, logic error)

```go
			templateCM := &corev1.ConfigMap{}
			templateCMName := limaVMName + "-template"
			if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
					return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
				}
				if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
```

When cleaning up an orphaned `templateCM` (which happens if `LimaVM` is already fully deleted but left the ConfigMap behind), the controller tries to remove `v1alpha1.AppKind`'s finalizer (`rdd.rancherdesktop.io/owned-by-App`). However, `templateCM` is originally created by the LimaVM webhook, which strictly sets the `rdd.rancherdesktop.io/owned-by-LimaVM` finalizer. The App controller never adds its own finalizer to this ConfigMap.

Because `RemoveOwnedFinalizer` silently succeeds when the target finalizer is missing, the subsequent `r.Delete` call succeeds in marking the ConfigMap with a `DeletionTimestamp`. However, the `owned-by-LimaVM` finalizer remains attached. Since the LimaVM controller has already finished its cleanup (hence `LimaVM` is `NotFound`), nothing will ever remove this finalizer. The `templateCM` will be permanently stuck in a Terminating state.

Fix: Change the finalizer removal to target `limav1alpha1.LimaVMKind`.

```diff
-				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
+				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, limav1alpha1.LimaVMKind); cmErr != nil {
```

2. **Unvalidated reuse of existing input ConfigMap** — `pkg/controllers/app/app/controllers/app_controller.go:111-132` (important, gap)

```go
		inputCM := &corev1.ConfigMap{}
		err := r.Get(ctx, client.ObjectKey{Name: inputConfigMapName, Namespace: namespace}, inputCM)
		if err != nil && !apierrors.IsNotFound(err) {
			return ctrl.Result{}, err
		}
		if apierrors.IsNotFound(err) {
			inputCM = &corev1.ConfigMap{
// ...
```

When preparing to create the `LimaVM`, the controller checks if `inputConfigMapName` (hardcoded to `"rd"`) already exists. If `apierrors.IsNotFound(err)` is false (meaning the ConfigMap exists), the controller skips creating it and blindly reuses whatever it found in the cluster. It does not update the `Data` field to match `r.LimaTemplateData` or ensure it has the correct `OwnerReference`.

If a stale `"rd"` ConfigMap was left over from a previous crash (or was manually created), the `LimaVM` will be spun up using incorrect or outdated template data. Furthermore, on the next reconcile, the controller will unconditionally delete this ConfigMap (line 150), even if it was not the true owner.

Fix: If the ConfigMap exists, update its `Data` and `OwnerReference` to ensure correctness, or use `controllerutil.CreateOrUpdate` to handle both creation and updating robustly.

```go
		inputCM := &corev1.ConfigMap{
			ObjectMeta: metav1.ObjectMeta{
				Name:      inputConfigMapName,
				Namespace: namespace,
			},
		}
		if _, err := controllerutil.CreateOrUpdate(ctx, r.Client, inputCM, func() error {
			if err := ctrl.SetControllerReference(&app, inputCM, r.Scheme); err != nil {
				return err
			}
			if inputCM.Data == nil {
				inputCM.Data = make(map[string]string)
			}
			inputCM.Data[limav1alpha1.TemplateConfigMapKey] = r.LimaTemplateData
			return nil
		}); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to create or update input ConfigMap: %w", err)
		}
```

3. **Incorrect ObservedGeneration mirrored on App conditions** — `pkg/controllers/app/app/controllers/app_controller.go:185` (important, logic error)

```go
		statusChanged := false
		for _, cond := range limaVM.Status.Conditions {
			statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
				Type:               cond.Type,
				Status:             cond.Status,
				Reason:             cond.Reason,
				Message:            base.TruncateConditionMessage(cond.Message),
				ObservedGeneration: cond.ObservedGeneration,
				LastTransitionTime: cond.LastTransitionTime,
			}) || statusChanged
		}
```

When mirroring conditions from `LimaVM` to `App`, the controller directly copies `cond.ObservedGeneration`. In Kubernetes API conventions, `ObservedGeneration` inside a condition represents the `.metadata.generation` of the object the condition is attached to.

Because `LimaVM` and `App` have independent lifecycles and mutation counts, setting the `App`'s condition generation to the `LimaVM`'s generation breaks this convention. A client observing the `App` will see an `ObservedGeneration` that diverges from `app.Generation`, leading it to incorrectly assume the condition is stale or invalid.

Fix: Use `app.Generation` to accurately reflect the generation of the App resource being updated.

```diff
-				ObservedGeneration: cond.ObservedGeneration,
+				ObservedGeneration: app.Generation,
```

#### Suggestions

1. **Misleading comment regarding 32KB limit** — `pkg/controllers/base/conditions.go:12-16` (suggestion, enhancement)

```go
// ConditionMessageMaxLen is the maximum number of runes allowed in a Kubernetes
// condition message. The API server enforces a 32768-byte limit on the whole
// object; keeping messages within this bound avoids rejections from operations
// ...
const ConditionMessageMaxLen = 32768
```

The comment claims the API server enforces a 32768-byte limit on the *whole object*. If this were true, allowing a single message to consume 32768 *runes* (which is exactly 32,768 bytes in ASCII, and up to 131KB in UTF-8) would leave zero room for the metadata, spec, and other status conditions, guaranteeing the object gets rejected.

The number 32768 actually comes from standard kubebuilder validation rules (`// +kubebuilder:validation:MaxLength=32768`) enforced strictly on the `Message` field itself.

Fix: Update the comment to correctly attribute the limit to the `Condition.Message` CRD validation rules rather than a whole-object byte limit. If the whole object genuinely has a 32KB limit (e.g., from kine/sqlite constraints), the constant needs to be reduced significantly (e.g., `16384`) to leave room for the rest of the payload.

### Design Observations

**Strengths**
- The controller intelligently delegates condition message truncation (`base.TruncateConditionMessage`) defensively during status mirroring. This safely guarantees that any future changes or missed truncations originating in `LimaVM` won't break the `App` CRD validation limits.
- Utilizing `XValidation:rule="self == oldSelf"` for `spec.namespace` correctly guarantees immutability, eliminating the potential for orphaned owned resources across namespaces.

### Testing Assessment

The BATS tests cover the happy path well (singleton creation, running propagation, and deletion).
Untested scenarios (ranked by risk):
1. State recovery when a stale `rd` input ConfigMap is left over from an interrupted initialization before the `App` creation.
2. Re-creation of `LimaVM` when the underlying instance is force-deleted out-of-band but `App` is still active.
3. Behavior when the `templateConfigMap` is left orphaned and terminating during `App` deletion.

### Documentation Assessment

No significant documentation gaps found.

### Acknowledged Limitations

- "This is temporary and will be removed once the app controller is fully implemented." (`pkg/controllers/app/app/controller.go:18`)

### Coverage Summary

- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — Finding 1, Finding 2, Finding 3
- `pkg/controllers/app/app/crd.yaml` — Reviewed, no issues
- `pkg/controllers/app/app/lima.yaml` — Reviewed, no issues
- `pkg/controllers/base/conditions.go` — Finding 4
- `pkg/controllers/lima/limavm/controllers/limavm_controller.go` — Reviewed, no issues
