# Deep Review: 20260324-131047-pr-236

| | |
|---|---|
| **Date** | 2026-03-24 13:10 |
| **Round** | 5 (of PR #236) |
| **Author** | [@Nino-K](https://github.com/Nino-K) |
| **PR** | [#236](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/236) — Implement App controller lifecycle for LimaVM |
| **Branch** | `app-controller-create-delete-limavm` |
| **Commits** | `2a3c5b2` Implement App controller lifecycle for LimaVM<br>`049a3a0` Add Bats test to App Controller |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — ownership violation in deletion path must be removed; ObservedGeneration and swallowed errors need correction |
| **Wall-clock time** | `15 min 6 s` |

---

## Executive Summary

This PR implements the App controller's core reconciliation loop: it bootstraps a LimaVM from an embedded template ConfigMap, propagates `spec.running`, mirrors VM status conditions onto the App resource, and tears down the LimaVM through a cleanup finalizer. The reconciler's priority chain and early-return pattern are clean. The main problems center on the deletion path, where the controller reaches into LimaVM-owned resources (violating the ownership model) and swallows transient API errors that can leak ConfigMaps in the GC-less embedded control plane.

---

## Critical Issues

None.

---

## Important Issues

1. **App finalizer deletes LimaVM-owned template ConfigMap** — `app_controller.go:73-84` [Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro] (important, regression)

```go
templateCM := &corev1.ConfigMap{}
templateCMName := limaVMName + "-template"
if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
    if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
        return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
    }
    if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
        return ctrl.Result{}, fmt.Errorf("failed to delete template ConfigMap during cleanup: %w", cmErr)
    }
}
```

The `rd-template` ConfigMap is owned by the LimaVM controller, not the App controller. The LimaVM webhook sets an `owned-by-LimaVM` finalizer on it (`controller.go:211`), and the LimaVM's own cleanup finalizer deletes it via `DeleteOwnedResources` (`limavm_lifecycle.go:60-71`). This code violates the ownership boundary documented in `controllers.md:63-67` and `api_lima.md:60-68`.

Beyond the design violation, the code at line 76 passes `v1alpha1.AppKind` to `RemoveOwnedFinalizer`, which looks for `owned-by-App` — a finalizer that was never set. The removal is a no-op. The `Delete` at line 79 then hits the `OwnedDeletionGuard` webhook because `owned-by-LimaVM` still blocks deletion.

Fix: Remove this block entirely (lines 73-84). The LimaVM controller handles its own template ConfigMap cleanup.

```diff
-			templateCM := &corev1.ConfigMap{}
-			templateCMName := limaVMName + "-template"
-			if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
-				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
-					return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
-				}
-				if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
-					return ctrl.Result{}, fmt.Errorf("failed to delete template ConfigMap during cleanup: %w", cmErr)
-				}
-			} else if !apierrors.IsNotFound(cmErr) {
-				log.Error(cmErr, "Failed to fetch template ConfigMap during cleanup")
-			}
-
 			return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

2. **Mirrored ObservedGeneration refers to the LimaVM's generation, not the App's** — `app_controller.go:207` [Claude Opus 4.6, Gemini 3.1 Pro] (important, regression)

```go
statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
    Type:               cond.Type,
    Status:             cond.Status,
    Reason:             cond.Reason,
    Message:            base.TruncateConditionMessage(cond.Message),
    ObservedGeneration: cond.ObservedGeneration,
    LastTransitionTime: cond.LastTransitionTime,
}) || statusChanged
```

`cond.ObservedGeneration` at line 207 carries the LimaVM's `.metadata.generation`. On the App resource, a condition's `ObservedGeneration` should reflect the App's own generation — it tells observers whether the controller has seen the latest App spec. An App at generation 5 could show `observedGeneration: 2` from the LimaVM, which misleads watchers into thinking the controller is lagging.

Fix: Use the App's generation.

```diff
-			ObservedGeneration: cond.ObservedGeneration,
+			ObservedGeneration: app.Generation,
```

3. **Swallowed error in deletion path leaks the input ConfigMap** — `app_controller.go:69-71` [Gemini 3.1 Pro] (important, regression)

```go
} else if !apierrors.IsNotFound(cmErr) {
    log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
}
```

If `r.Get` for the input ConfigMap fails with a transient API error (e.g., network timeout) at line 65, the `else` branch at line 69 logs the error and falls through to `RemoveCleanupFinalizer` at line 86. The finalizer is stripped, the App is deleted, and the input ConfigMap is permanently orphaned. The embedded control plane has no garbage collector, so the ConfigMap persists indefinitely in kine.

Fix: Return the error so controller-runtime retries.

```diff
 } else if !apierrors.IsNotFound(cmErr) {
-    log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
+    return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
 }
```

---

## Suggestions

1. **Template ConfigMap name duplicated as string concatenation** — `app_controller.go:74` [Claude Opus 4.6] (suggestion, regression)

```go
templateCMName := limaVMName + "-template"
```

`LimaVM.GetTemplateConfigMapName()` (`limavm_types.go:117`) is the single source of truth for this naming convention. The App controller duplicates it with string concatenation at line 74. If the naming convention changes, this silently breaks. Moot if the template CM cleanup block is removed per finding 1.

2. **Swallowed error in normal-path input ConfigMap cleanup** — `app_controller.go:181-182` [Gemini 3.1 Pro] (suggestion, regression)

```go
} else if !apierrors.IsNotFound(err) {
    log.Error(err, "Failed to fetch input ConfigMap")
}
```

A transient API error when fetching the input ConfigMap during normal reconciliation (line 172) is logged and swallowed. The cleanup retries on the next watch event (LimaVM status change), so this self-heals. Returning the error would give faster, more predictable retry via exponential backoff.

Fix:

```diff
 } else if !apierrors.IsNotFound(err) {
-    log.Error(err, "Failed to fetch input ConfigMap")
+    return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap: %w", err)
 }
```

3. **Condition message comment describes bytes, code counts runes** — `conditions.go:12-13` [Claude Opus 4.6, Gemini 3.1 Pro] (suggestion, gap)

```go
// ConditionMessageMaxLen is the maximum number of runes allowed in a Kubernetes
// condition message. The API server enforces a 32768-byte limit on the whole
// object; keeping messages within this bound avoids rejections...
```

The CRD `maxLength: 32768` validates character count, not byte count. The code correctly truncates by rune. The comment's "32768-byte limit on the whole object" is inaccurate — that limit comes from CRD field validation, not from an overall object size constraint.

Fix: Correct the comment.

```diff
-// condition message. The API server enforces a 32768-byte limit on the whole
-// object; keeping messages within this bound avoids rejections from operations
-// that produce very long error strings (e.g. accumulated retry errors from
-// image downloads).
+// condition message. CRD validation enforces a 32768-character limit on this
+// field; keeping messages within this bound avoids rejections from operations
+// that produce very long error strings (e.g. accumulated retry errors from
+// image downloads).
```

---

## Design Observations

**Concerns**

- **App controller reaches into LimaVM-owned resources** (in-scope) [Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro] — The deletion handler at lines 73-84 manipulates a ConfigMap the LimaVM controller owns. Per the framework design (`controllers.md`), a parent deletes the child resource and waits for the child's cleanup finalizer to handle everything downstream. Removing this block eliminates the wrong-kind bug, the coupling to LimaVM's internal naming convention, and the risk of interfering with the LimaVM controller's own cleanup.

**Strengths**

- The reconciler's priority chain (deletion > finalizer > create LimaVM > clean input CM > propagate running > mirror conditions) returns early after each mutation, keeping the resourceVersion from the initial Get current and avoiding stale-write conflicts. [Claude Opus 4.6]
- `Owns(&limav1alpha1.LimaVM{})` at line 225 is correct and eliminates polling. The explicit `Requeue: true` after deleting the unwatched input ConfigMap (line 180) avoids a silent stall. [Claude Opus 4.6, Codex GPT 5.4]
- Making `spec.namespace` immutable via CRD validation (`crd.yaml:43-56`) prevents orphaning namespaced children after the fact. [Codex GPT 5.4]

---

## Testing Assessment

The BATS test suite covers the full lifecycle (create, running state, condition mirroring, deletion, re-creation). Untested scenarios ranked by risk:

1. **Crash recovery during creation** — App created, input ConfigMap written, process crashes before LimaVM creation. The reconciler should re-enter the creation path and reuse the existing input CM (line 123). Not tested.
2. **Orphaned template CM after interrupted LimaVM teardown** — LimaVM deleted before it sets ownerRef on template CM. The template CM survives with `owned-by-LimaVM` finalizer. The current deletion code would wedge; removing it per finding 1 means the CM leaks inertly. Not tested.
3. **Concurrent App spec update during creation** — `spec.running` changes between CM creation and LimaVM creation. The LimaVM captures the value at creation time; running propagation (line 186) corrects it on the next reconcile. Self-healing but untested.

---

## Documentation Assessment

- `controller.go:63`: "It also registers Lima types allows App controller" — should read "It also registers Lima types, which allows the App controller."
- The embedded `lima.yaml` has no comment explaining `ssh.localPort: 51422` or the 6 GB memory value. The "temporary" comment at `controller.go:18` reduces urgency.

---

## Acknowledged Limitations

- `controller.go:18-19`: *"This is temporary and will be removed once the app controller is fully implemented."* — The embedded `lima.yaml` serves as a bootstrap template. It hardcodes `ssh.localPort: 51422` and `hostSocket: "{{.Home}}/.rd/docker.sock"`, which would collide across parallel RDD instances. Codex flagged this as Important, but it is acknowledged as temporary and the project has no multi-instance deployment yet. Once the external template mechanism (TwoCows, per issue #227) replaces this embedded file, these hardcoded values disappear.

---

## Agent Performance Retro

### [Claude]

Claude Opus 4.6 delivered the most thorough review with the best signal-to-noise ratio. It correctly identified the ownership violation (finding 1) with full trace through the webhook and LimaVM controller, caught the ObservedGeneration bug (finding 2), and offered three well-calibrated suggestions. Coverage was complete across all 9 files. No false positives.

**Unique contributions:** Template CM name duplication (suggestion 1), additive-only condition mirroring observation (suggestion 2).

### [Codex]

Codex GPT 5.4 was the fastest agent and correctly identified the ownership violation with a clear fix recommendation. It also flagged the hardcoded ports/paths in `lima.yaml` as Important — a valid concern but overcalibrated given the acknowledged temporary nature of the template.

**Unique contributions:** Hardcoded `ssh.localPort` and `hostSocket` collision risk in embedded template.
**Misses:** Did not flag ObservedGeneration (found by Claude and Gemini). Coverage summary omitted `conditions.go` findings.

### [Gemini]

Gemini 3.1 Pro found the most findings (4 numbered + 1 suggestion) and was the only agent to catch the swallowed-error bug in the deletion path (finding 3). It overcalibrated severity: the ownership violation and the swallowed error were both rated CRITICAL. The ownership violation is dead code in the normal flow (LimaVM handles its own CM) — IMPORTANT is appropriate. The swallowed error leaks a small inert ConfigMap in kine — undesirable but not data-loss-grade.

**Unique contributions:** Swallowed error in deletion path (finding 3), swallowed error in normal path (suggestion 2).
**Accuracy note:** Gemini's comment about the condition message limit being characters rather than bytes is more accurate than Claude's rune-vs-byte framing — CRD `maxLength` validates Unicode scalar values, which aligns with Go's rune counting.

### Summary

| Metric | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 7:09 | 3:55 | 5:13 |
| Critical | 0 | 0 | 2 (overcalibrated) |
| Important | 2 | 2 | 2 |
| Suggestion | 3 | 0 | 1 |
| Design observations | 2 | 2 | 0 |
| False positives | 0 | 0 | 0 |
| Unique insights | 2 | 1 | 2 |
| Files reviewed | 9/9 | 9/9 | 9/9 |
| Coverage misses | 0 | 0 | 0 |

Claude provided the most balanced and well-calibrated review. Gemini added the highest-value unique finding (swallowed error in deletion). Codex was fast and accurate on the main issue but missed two findings the other agents caught. All three correctly identified the ownership violation as the primary concern — strong consensus.

---

## Appendix: Original Reviews

### Claude Opus 4.6

---

## PR #236 Review: Implement App controller lifecycle for LimaVM

**Verdict**: Approve with requested changes

### Executive Summary

This PR implements the core App controller reconciliation loop: it creates a LimaVM instance from an embedded template, propagates `spec.running`, mirrors VM status conditions, and cleans up via finalizer on deletion. The overall structure is sound and follows the project's controller patterns well. The main issue is that the deletion cleanup code directly manipulates a ConfigMap owned by the LimaVM controller using the wrong finalizer kind, which violates the ownership model and can cause a stuck deletion in a narrow race condition.

### Findings

#### Critical Issues

None.

#### Important Issues

1. **Wrong finalizer kind in template ConfigMap cleanup causes potential stuck deletion** — `app_controller.go:76` (important, regression)

```go
if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
    return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
}
if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
```

The template ConfigMap (`rd-template`) carries the `owned-by-LimaVM` finalizer (set by the LimaVM mutating webhook at `controller.go:211`). This code passes `v1alpha1.AppKind` ("App"), so `RemoveOwnedFinalizer` at line 76 attempts to remove `owned-by-App` — which doesn't exist — and is a no-op. The subsequent `r.Delete` at line 79 is then rejected by the `OwnedDeletionGuard` webhook (`webhook.go:57-58`) because the `owned-by-LimaVM` finalizer is still present: *"cannot delete 'rd-template': owned by LimaVM"*.

In the normal flow this code is dead — the LimaVM's `handleDeletion` (`limavm_lifecycle.go:65`) calls `DeleteOwnedResources` which strips the finalizer and deletes the template CM before the LimaVM disappears. But in a race where the LimaVM is deleted before its reconciler sets the owner reference on the template CM (between `controller.go:227` creating the CM and `limavm_controller.go:233` setting the owner ref), `DeleteOwnedResources` won't find the template CM (no matching `ownerReferences[].uid`). The CM survives with its `owned-by-LimaVM` finalizer, and the App gets stuck in an infinite retry loop.

Fix: Per `controllers.md:63`, each controller cleans up only resources it owns. Remove the template ConfigMap cleanup entirely (lines 73-84) — it's the LimaVM controller's responsibility. If belt-and-suspenders cleanup is desired, use `limav1alpha1.LimaVMKind` instead of `v1alpha1.AppKind`:

```diff
-			if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
+			if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, limav1alpha1.LimaVMKind); cmErr != nil {
```

2. **Mirrored ObservedGeneration refers to the wrong resource** — `app_controller.go:207` (important, regression)

```go
statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
    ...
    ObservedGeneration: cond.ObservedGeneration,
    ...
}) || statusChanged
```

`ObservedGeneration` is copied from the LimaVM condition, where it refers to `limaVM.Generation`. On the App resource, it should refer to `app.Generation` — it tells consumers whether the controller has observed the latest App spec change. With the current code, an App at generation 5 could have a condition with `observedGeneration: 2` (from the LimaVM), which is misleading.

Fix: Set `ObservedGeneration` to the App's generation instead:

```diff
    ObservedGeneration: cond.ObservedGeneration,
+   // Use the App's generation, not the LimaVM's, since this condition
+   // lives on the App resource.
    ObservedGeneration: app.Generation,
```

#### Suggestions

1. **Template ConfigMap name duplicated as string concatenation** — `app_controller.go:74` (suggestion, regression)

```go
templateCMName := limaVMName + "-template"
```

`LimaVM.GetTemplateConfigMapName()` (at `limavm_types.go:117`) is the single source of truth for this naming convention. The App controller duplicates it with string concatenation. If the naming convention changes upstream, this silently breaks. Since this is in the deletion path where the LimaVM is already gone, the App controller can't call the method on a live object. Consider extracting a package-level function in the LimaVM API package, or if this cleanup code is removed per finding 1, this becomes moot.

2. **Condition mirroring is additive-only** — `app_controller.go:200-210` (suggestion, gap)

```go
for _, cond := range limaVM.Status.Conditions {
    statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{...}) || statusChanged
}
```

Conditions are added and updated but never removed from the App. If a condition type were ever removed from the LimaVM status, it would persist as stale on the App. In practice, the LimaVM controller maintains stable condition types (Created, Running) so this is theoretical, but worth noting for future-proofing. A symmetric approach would remove App conditions that no longer exist on the LimaVM.

3. **`ConditionMessageMaxLen` counts runes, not bytes** — `conditions.go:17` (suggestion, regression)

```go
const ConditionMessageMaxLen = 32768
```

The CRD schema enforces `maxLength: 32768` (visible in `crd.yaml:86`), which Kubernetes interprets as bytes for the API server wire format. `TruncateConditionMessage` counts runes. For ASCII messages (the common case) this is identical, but a message of 32768 multi-byte runes could exceed the byte limit. The risk is very low — error messages are almost always ASCII — but the comment at line 13 ("32768-byte limit") is misleading since the code counts runes.

### Design Observations

**Concerns**

- **App controller directly manipulates LimaVM-owned resources** (in-scope) — The deletion handler at lines 73-84 removes finalizers from and deletes the template ConfigMap, which is owned by the LimaVM controller. Per `controllers.md`, the correct pattern is: parent deletes the child resource, child's own cleanup finalizer handles everything downstream. The App should only delete the LimaVM and let the LimaVM controller handle its owned ConfigMaps via `DeleteOwnedResources`. This would eliminate the wrong-kind bug in finding 1 and reduce the coupling between App and LimaVM's internal naming conventions.

**Strengths**

- The reconciler's priority chain (deletion > finalizer > create LimaVM > clean up input CM > propagate running > mirror conditions) is clean and each step returns early, avoiding stale-resourceVersion conflicts.
- Using `Owns(&limav1alpha1.LimaVM{})` for watch setup is correct and eliminates the need for polling.
- The explicit `Requeue: true` after deleting the input ConfigMap (line 180) with a comment explaining why is good — it catches the gap where ConfigMaps aren't watched.
- The BATS tests are comprehensive and test the full lifecycle including re-creation after deletion.

### Testing Assessment

The BATS test suite covers the main lifecycle well. Untested scenarios, ranked by risk:

1. **Crash recovery during creation**: App created, input ConfigMap created, process crashes before LimaVM creation. On restart, the reconciler should re-enter the creation path and create the LimaVM (the input CM already exists and is reused at line 123). Not tested.
2. **Concurrent App spec update during LimaVM creation**: App's `spec.running` changes between ConfigMap creation and LimaVM creation. The LimaVM is created with the value at the time of creation, and the running propagation at line 186 will correct it on the next reconcile. Not tested but self-healing.
3. **Unit tests**: No unit tests for the reconciler's individual code paths. The BATS integration tests provide good coverage, but targeted unit tests would make it easier to test edge cases like conflict errors and partial failures.

### Documentation Assessment

- The comment at `controller.go:63` has a grammar issue: "It also registers Lima types allows" should be "It also registers Lima types, which allows".
- The embedded `lima.yaml` template is not documented (e.g., why `ssh.localPort: 51422`, why 6 GB memory). The `// This is temporary` comment at `controller.go:18` suggests this will be replaced, which reduces the urgency.

### Acknowledged Limitations

- `controller.go:18-19`: *"This is temporary and will be removed once the app controller is fully implemented."* — The embedded `lima.yaml` is a bootstrap mechanism. Once the template is managed externally (e.g., from the TwoCows project referenced in issue #227), this go:embed and the hardcoded template will be removed.

### Unresolved Feedback

No unresolved PR review comments.

### Coverage Summary

| File | Status |
|------|--------|
| `.github/actions/spelling/expect/rd.txt` | Trivial — adds "rosetta" to spelling allowlist |
| `bats/tests/32-app-controllers/app.bats` | Reviewed, no issues |
| `pkg/apis/app/v1alpha1/app_types.go` | Reviewed, no issues |
| `pkg/controllers/app/app/controller.go` | Reviewed, no issues (grammar nit noted in Documentation) |
| `pkg/controllers/app/app/controllers/app_controller.go` | Findings 1, 2 (important); Suggestions 1, 2 |
| `pkg/controllers/app/app/crd.yaml` | Reviewed, no issues — consistent with Go type changes |
| `pkg/controllers/app/app/lima.yaml` | Reviewed, no issues |
| `pkg/controllers/base/conditions.go` | Suggestion 3 |
| `pkg/controllers/lima/limavm/controllers/limavm_controller.go` | Reviewed, no issues — straightforward use of `TruncateConditionMessage` |

### Codex GPT 5.4

**Verdict**: Changes requested

**Executive Summary**
This PR adds the App controller’s first real lifecycle loop: it bootstraps a `LimaVM`, propagates `spec.running`, mirrors VM conditions onto `App`, and tears the VM down through an App finalizer. The main problems are that the App finalizer now reaches into a LimaVM-owned `ConfigMap` and can wedge App deletion, and the embedded `lima.yaml` hardcodes host resources in a way that breaks the repo’s multi-instance design.

**Critical Issues**
None.

**Important Issues**
1. **App finalizer deletes a LimaVM-owned template `ConfigMap` and can leave `App` stuck Terminating** — `pkg/controllers/app/app/controllers/app_controller.go:73-86` `(important, regression)`

```go
templateCM := &corev1.ConfigMap{}
templateCMName := limaVMName + "-template"
if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
	if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
		return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
	}
	if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
		return ctrl.Result{}, fmt.Errorf("failed to delete template ConfigMap during cleanup: %w", cmErr)
	}
}
```

At `pkg/controllers/app/app/controllers/app_controller.go:76` the App controller removes `owned-by-App`, but the LimaVM webhook created `rd-template` with `owned-by-LimaVM` in `pkg/controllers/lima/limavm/controller.go:210-211`. The DELETE at `pkg/controllers/app/app/controllers/app_controller.go:79` is therefore still blocked by the owned-resource guard in `pkg/controllers/base/webhook.go:53-58`. Because the App then keeps its own cleanup finalizer until this branch succeeds (`pkg/controllers/app/app/controllers/app_controller.go:86`), any orphaned `rd-template` left behind by an interrupted LimaVM teardown will wedge App deletion permanently. This also violates the documented ownership model: `docs/design/api_lima.md:60-68` and `docs/design/controllers.md:63-67` say the LimaVM controller owns and deletes `status.templateConfigMap`, and the LimaVM finalizer already does that via `base.DeleteOwnedResources` in `pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go:60-71`.

Fix: remove the `rd-template` cleanup from the App controller entirely and let the LimaVM controller remain the only component that deletes its template `ConfigMap`. If extra resilience is needed for crash recovery, strengthen LimaVM cleanup before it removes its own finalizer instead of having the parent controller reach into a child’s resources.

```diff
-			templateCM := &corev1.ConfigMap{}
-			templateCMName := limaVMName + "-template"
-			if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
-				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
-					return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
-				}
-				if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
-					return ctrl.Result{}, fmt.Errorf("failed to delete template ConfigMap during cleanup: %w", cmErr)
-				}
-			} else if !apierrors.IsNotFound(cmErr) {
-				log.Error(cmErr, "Failed to fetch template ConfigMap during cleanup")
-			}
-
 			return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

2. **The embedded `lima.yaml` hardcodes host paths and ports, so parallel RDD instances collide** — `pkg/controllers/app/app/lima.yaml:13-14,30-37` `(important, regression)`

```yaml
ssh:
  localPort: 51422
...
- guestSocket: /var/run/docker.sock
  hostPortRange:
  - 0
  - 0
  hostSocket: "{{.Home}}/.rd/docker.sock"
```

The design explicitly allows multiple RDD instances to run in parallel (`docs/design/api_app.md:15-21`), and the repo already has per-instance helpers for exactly this problem: `pkg/instance/instance.go:24-35` provides an instance-specific index for unique ports, and `pkg/instance/instance.go:97-107` defines the per-instance short directory (`~/.rd2`, `~/.rdbats`, etc.). This template ignores both. `ssh.localPort` is fixed at `51422` (`pkg/controllers/app/app/lima.yaml:14`), so a second instance races for the same host SSH port, and `hostSocket` is fixed to `{{.Home}}/.rd/docker.sock` (`pkg/controllers/app/app/lima.yaml:37`), which does not even match this repo’s per-instance short-dir convention. The result is cross-instance interference during VM startup and Docker socket forwarding.

Fix: stop embedding a fully static template. Render the few host-specific fields at runtime from `instance.ShortDir()` and either `instance.Index()` or an automatically assigned SSH port.

```go
type limaTemplateVars struct {
	ShortDir string
	SSHPort  int
}
vars := limaTemplateVars{
	ShortDir: instance.ShortDir(),
	SSHPort:  51400 + instance.Index(),
}
```

Then render template fields like `localPort: {{.SSHPort}}` and `hostSocket: "{{.ShortDir}}/docker.sock"` before creating the bootstrap `ConfigMap`.

**Suggestions**
None.

**Design Observations**
**Strengths**
- The explicit `Requeue: true` after deleting the unwatched bootstrap `ConfigMap` in `pkg/controllers/app/app/controllers/app_controller.go:177-180` is the right shape; it avoids a silent stall when no watch event will fire.
- Making `spec.namespace` immutable in `pkg/apis/app/v1alpha1/app_types.go:18-24` / `pkg/controllers/app/app/crd.yaml:43-56` is a good guard against orphaning namespaced children.

**Testing Assessment**
1. No test covers two different `RDD_INSTANCE`s starting the App controller, so the fixed `ssh.localPort` and `hostSocket` collision in `pkg/controllers/app/app/lima.yaml:14,37` would slip through.
2. The new App BATS only exercises clean LimaVM teardown; it does not cover the failure state where `rd-template` survives after LimaVM deletion, which is exactly the state that wedges the App finalizer in `pkg/controllers/app/app/controllers/app_controller.go:73-86`.

**Documentation Assessment**
The embedded bootstrap template now carries instance-sensitive behavior, but there is no code or doc comment explaining how its host port and socket path are chosen. If this temporary template remains, it should document that those values must be derived from `RDD_INSTANCE`, not hardcoded.

**Coverage Summary**
- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — Finding 1
- `pkg/controllers/app/app/crd.yaml` — Reviewed, no issues
- `pkg/controllers/app/app/lima.yaml` — Finding 2
- `pkg/controllers/base/conditions.go` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controllers/limavm_controller.go` — Reviewed, no issues

### Gemini 3.1 Pro

### Executive Summary

This PR implements the App controller's core reconciliation loop, correctly handling the creation of the `LimaVM` and input template ConfigMap, running state propagation, and condition mirroring. However, there are critical flaws in the cleanup phase: transient API errors cause the input ConfigMap to leak permanently, and the controller attempts to incorrectly manage the lifecycle of a `LimaVM`-owned template ConfigMap, violating the controller design boundaries.

### Findings

#### Critical Issues

1. **Resource Leak on App Deletion** — `pkg/controllers/app/app/controllers/app_controller.go:69-71` (critical, regression)

```go
			} else if !apierrors.IsNotFound(cmErr) {
				log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
			}
```

In the App controller's deletion block, if `r.Get` fails due to a transient API error (e.g., connection timeout) while looking for the leftover input ConfigMap, the error is merely logged. The function continues, reaching `base.RemoveCleanupFinalizer(ctx, r.Client, &app)`, which strips the finalizer and allows the App resource to be permanently deleted. Because the embedded control plane has no automatic GC, this permanently leaks the input ConfigMap.

Fix: Return the error to ensure `controller-runtime` requeues and retries the cleanup.

```diff
-			} else if !apierrors.IsNotFound(cmErr) {
-				log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
-			}
+			} else if !apierrors.IsNotFound(cmErr) {
+				return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
+			}
```

2. **Design Violation: Deleting Resources Owned by Another Controller** — `pkg/controllers/app/app/controllers/app_controller.go:73-84` (critical, regression)

```go
			templateCM := &corev1.ConfigMap{}
			templateCMName := limaVMName + "-template"
			if cmErr := r.Get(ctx, client.ObjectKey{Name: templateCMName, Namespace: namespace}, templateCM); cmErr == nil {
				if cmErr := base.RemoveOwnedFinalizer(ctx, r.Client, templateCM, v1alpha1.AppKind); cmErr != nil {
					return ctrl.Result{}, fmt.Errorf("failed to remove finalizer from template ConfigMap: %w", cmErr)
				}
				if cmErr := r.Delete(ctx, templateCM); cmErr != nil && !apierrors.IsNotFound(cmErr) {
					return ctrl.Result{}, fmt.Errorf("failed to delete template ConfigMap during cleanup: %w", cmErr)
				}
			} else if !apierrors.IsNotFound(cmErr) {
				log.Error(cmErr, "Failed to fetch template ConfigMap during cleanup")
			}
```

This violates the explicit rule laid out in the project design doc (`docs/design/controllers.md`): *"Each controller cleans up only the resources it owns... If a controller manipulates resources owned by a different controller (e.g., the App controller deleting a ConfigMap owned by LimaVM), that is a design violation."* The `rd-template` ConfigMap is created and owned by the `LimaVM` webhook/controller, not the App controller. When `LimaVM` is deleted, its own finalizer cleans up this ConfigMap. Furthermore, the code erroneously attempts to strip an `owned-by-App` finalizer from it, which was never set.

Fix: Remove this entire block. The App controller should only clean up the `inputConfigMapName` which it actually owns.

#### Important Issues

1. **Incorrect `ObservedGeneration` in Mirrored Conditions** — `pkg/controllers/app/app/controllers/app_controller.go:193` (important, regression)

```go
			Message:            base.TruncateConditionMessage(cond.Message),
			ObservedGeneration: cond.ObservedGeneration,
			LastTransitionTime: cond.LastTransitionTime,
```

When mirroring conditions from the `LimaVM` to the `App` status, the code blindly copies `cond.ObservedGeneration` from the child resource. In Kubernetes, a condition's `ObservedGeneration` must reflect the `.metadata.generation` of the resource the status is attached to (the `App` object). Mixing generation sequence numbers between resources causes the condition to appear broken or permanently stale to observers watching the `App`.

Fix: Use the parent `App`'s generation.

```diff
-			ObservedGeneration: cond.ObservedGeneration,
+			ObservedGeneration: app.Generation,
```

2. **Swallowed API Errors During Normal Reconcile** — `pkg/controllers/app/app/controllers/app_controller.go:173-176` (important, regression)

```go
	} else if !apierrors.IsNotFound(err) {
		log.Error(err, "Failed to fetch input ConfigMap")
	}
```

When cleaning up the lingering input ConfigMap in the normal execution loop, a failure to fetch the ConfigMap due to a transient API error is logged and swallowed. Because the reconciler proceeds without returning an error or `Requeue: true`, it breaks the retry loop. The deletion won't be retried until an unrelated watch event triggers the controller again, needlessly delaying cleanup.

Fix: Return the error to trigger an exponential backoff retry.

```diff
-	} else if !apierrors.IsNotFound(err) {
-		log.Error(err, "Failed to fetch input ConfigMap")
-	}
+	} else if !apierrors.IsNotFound(err) {
+		return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap: %w", err)
+	}
```

#### Suggestions

1. **Inaccurate Comment Regarding Kubernetes Limits** — `pkg/controllers/base/conditions.go:10-13` (suggestion, gap)

```go
// ConditionMessageMaxLen is the maximum number of runes allowed in a Kubernetes
// condition message. The API server enforces a 32768-byte limit on the whole
// object; keeping messages within this bound avoids rejections from operations
```

The comment justifies truncating the string to 32768 runes by claiming there is a "32768-byte limit on the whole object". The API server object size limit in etcd is actually 1.5MB. The limit of 32768 on a condition `Message` stems from an OpenAPI schema validation `maxLength=32768` (which accurately restricts the number of *characters/runes* in the field, not bytes). The code correctly trims by rune count, but the comment's description is inaccurate.

Fix: Update the comment to correctly describe the CRD validation limit.

```diff
- // ConditionMessageMaxLen is the maximum number of runes allowed in a Kubernetes
- // condition message. The API server enforces a 32768-byte limit on the whole
- // object; keeping messages within this bound avoids rejections from operations
- // that produce very long error strings (e.g. accumulated retry errors from
- // image downloads).
+ // ConditionMessageMaxLen is the maximum number of runes allowed in a Kubernetes
+ // condition message. The CRD validation schema enforces a 32768-character
+ // limit on the field; keeping messages within this bound avoids validation rejections
+ // from operations that produce very long error strings.
```

### Coverage Summary

- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — Finding 1, 2, 3, 4
- `pkg/controllers/app/app/crd.yaml` — Reviewed, no issues
- `pkg/controllers/app/app/lima.yaml` — Reviewed, no issues
- `pkg/controllers/base/conditions.go` — Finding 5
- `pkg/controllers/lima/limavm/controllers/limavm_controller.go` — Reviewed, no issues
