# Deep Review: 20260324-134429-pr-236

| | |
|---|---|
| **Date** | 2026-03-24 13:44 |
| **Round** | 6 (of PR #236) |
| **Author** | [@Nino-K](https://github.com/Nino-K) |
| **PR** | [#236](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/236) — Implement App controller lifecycle for LimaVM |
| **Branch** | `app-controller-create-delete-limavm` |
| **Commits** | `382b50a` Implement App controller lifecycle for LimaVM<br>`6b3a999` Add Bats test to App Controller |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — fix the deletion deadlock and add a LimaVM ValidateDelete webhook |
| **Wall-clock time** | `20 min 10 s` |

## Executive Summary

This PR implements the App controller's core reconciliation loop: creating and owning a LimaVM singleton from an embedded template, propagating `spec.running`, mirroring VM status conditions onto the App resource, and cleaning up via a finalizer on deletion. The implementation follows established framework patterns and includes thorough BATS integration tests. One critical issue remains: the App deletion handler deadlocks when the LimaVM is already in Terminating state, and the LimaVM resource lacks the ValidateDelete webhook the framework requires to prevent this scenario.

---

## Consolidated Review

### Critical Issues

1. **App deletion deadlocks if LimaVM is already terminating** — `pkg/controllers/app/app/controllers/app_controller.go:75-87` [Gemini 3.1 Pro] (critical, regression)

```go
	case !base.IsBeingDeleted(limaVM):
		if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM, v1alpha1.AppKind); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from LimaVM: %w", err)
		}
		if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
			return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
		}
		log.Info("Requested LimaVM deletion, waiting for teardown")
		return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
	default:
		// LimaVM deletion is in progress; keep requeueing until it is gone.
		log.Info("Waiting for LimaVM deletion to complete")
		return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

The deletion switch only removes the owned finalizer when `!base.IsBeingDeleted(limaVM)` (line 75). If the LimaVM was deleted externally (e.g., `rdd ctl delete limavm rd`), its `DeletionTimestamp` is set and the controller falls to `default` (line 84), where it loops forever waiting for the LimaVM to disappear. The LimaVM cannot disappear because the App controller still holds the `rdd.rancherdesktop.io/owned-by-App` finalizer and refuses to remove it. Both resources become permanently stuck with no recovery path.

Without the ValidateDelete webhook (see Important finding 1), this scenario is triggered by a single, intuitive CLI command.

Fix: Always remove the owned finalizer during App cleanup, regardless of the child's deletion state.

```diff
-	case !base.IsBeingDeleted(limaVM):
-		if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM, v1alpha1.AppKind); err != nil {
-			return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from LimaVM: %w", err)
-		}
-		if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
-			return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
-		}
-		log.Info("Requested LimaVM deletion, waiting for teardown")
-		return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
-	default:
-		// LimaVM deletion is in progress; keep requeueing until it is gone.
-		log.Info("Waiting for LimaVM deletion to complete")
-		return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
+	default:
+		if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM, v1alpha1.AppKind); err != nil {
+			return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from LimaVM: %w", err)
+		}
+		if !base.IsBeingDeleted(limaVM) {
+			if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
+				return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
+			}
+			log.Info("Requested LimaVM deletion, waiting for teardown")
+		} else {
+			log.Info("Waiting for LimaVM deletion to complete")
+		}
+		return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

---

### Important Issues

1. **LimaVM lacks a ValidateDelete webhook for its owned finalizer** — `pkg/controllers/lima/limavm/controller.go:109-153` [Codex GPT 5.4, Gemini 3.1 Pro] (important, regression)

```go
// setupWebhookWithRuntimeConfig sets up webhooks with shared certificate configuration.
func (c *controller) setupWebhookWithRuntimeConfig(mgr ctrl.Manager) error {
	// Set up LimaVM mutating webhook (validates uniqueness ...)
	mutatingConfig := base.WebhookConfig[*v1alpha1.LimaVM]{
		// ... only a mutating webhook, no validator
	}
```

This PR adds `base.OwnedFinalizerFor(v1alpha1.AppKind)` to the LimaVM at line 135 of `app_controller.go`, but the Lima controller registers only a mutating webhook for LimaVM (lines 112-127). No validating webhook with a `ValidateDelete` handler exists for LimaVM anywhere in the codebase. This violates the framework contract in `docs/design/controllers.md:49-55`: resources that may carry an owned finalizer must reject DELETE requests via a validating webhook. Without the webhook, `rdd ctl delete limavm rd` is accepted, the LimaVM enters Terminating, and the deadlock in Critical finding 1 is triggered.

Fix: Add a LimaVM validating webhook that embeds `base.OwnedDeletionGuard[*v1alpha1.LimaVM]` and register it alongside the existing mutating webhook in `setupWebhookWithRuntimeConfig`. Include DELETE in its operations list. The existing `ConfigMapValidator` (line 259) demonstrates the pattern.

---

### Suggestions

1. **Dropped transient error in deletion path may orphan ConfigMap** — `pkg/controllers/app/app/controllers/app_controller.go:69-72` [Gemini 3.1 Pro] (suggestion, regression)

```go
		} else if !apierrors.IsNotFound(cmErr) {
			log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
		}
		return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

During App deletion, if `r.Get` for the input ConfigMap fails with a transient error (line 69), the error is logged but not returned. The controller proceeds to remove the cleanup finalizer (line 72), allowing the App to be deleted while the ConfigMap may still exist. Since RDD has no garbage collector, the ConfigMap is orphaned. In the embedded kine/sqlite backend, transient errors are unlikely, so this is a low-probability leak. The non-deletion path (line 167-168) self-heals on the next reconcile and needs no change.

Fix: Return the error instead of logging and continuing.

```diff
 		} else if !apierrors.IsNotFound(cmErr) {
-			log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
+			return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
 		}
```

2. **Condition mirroring is additive-only** — `pkg/controllers/app/app/controllers/app_controller.go:187-196` [Claude Opus 4.6] (suggestion, gap)

```go
	for _, cond := range limaVM.Status.Conditions {
		statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
			Type:               cond.Type,
			// ...
		}) || statusChanged
	}
```

The loop copies every LimaVM condition onto the App but never removes App conditions absent from the LimaVM. `apimeta.SetStatusCondition` only adds or updates. Currently safe because the LimaVM controller never removes conditions and the App is deleted/recreated as a unit. If the LimaVM lifecycle is extended to remove conditions, the App would retain stale copies.

Fix: After the mirroring loop, remove App conditions whose types are absent from `limaVM.Status.Conditions`.

3. **Missing unit test for TruncateConditionMessage** — `pkg/controllers/base/conditions.go:22-29` [Claude Opus 4.6] (suggestion, gap)

```go
func TruncateConditionMessage(msg string) string {
	runes := []rune(msg)
	if len(runes) <= ConditionMessageMaxLen {
		return msg
	}
	const suffix = "… (truncated)"
	return string(runes[:ConditionMessageMaxLen-len([]rune(suffix))]) + suffix
}
```

This new shared utility operates on rune boundaries and is used by both the LimaVM controller (line 432) and the App controller (line 192). A table-driven test covering the boundary cases (empty string, exactly at limit, one rune over, multi-byte characters) would match the precedent set by `pkg/controllers/base/finalizer_test.go`.

4. **Hardcoded SSH port and Docker socket in temporary template** — `pkg/controllers/app/app/lima.yaml:14,37` [Claude Opus 4.6, Codex GPT 5.4] (suggestion, gap)

```yaml
ssh:
  localPort: 51422
...
  hostSocket: "{{.Home}}/.rd/docker.sock"
```

The SSH port (line 14) and Docker socket path (line 37) are hardcoded. Concurrent `RDD_INSTANCE` values would collide on the port, and a non-default instance exposes Docker on the wrong socket path. The template header acknowledges this is temporary. A future version should derive these values from instance configuration, as `instance.Index()` and `instance.ShortDir()` already provide.

Codex rated this IMPORTANT; Claude rated it SUGGESTION. Given the explicit "temporary" acknowledgment in `controller.go:18`, SUGGESTION is appropriate — the limitation is known and scoped.

5. **Boolean assignment with short-circuit `||` obscures intent** — `pkg/controllers/app/app/controllers/app_controller.go:188-195` [Gemini 3.1 Pro] (suggestion, enhancement)

```go
		statusChanged = apimeta.SetStatusCondition(...) || statusChanged
```

The left operand ensures `SetStatusCondition` always executes, but the `||` compound assignment forces the reader to verify short-circuit evaluation order. Extracting the return value into a separate variable makes the intent explicit.

---

## Design Observations

### Concerns

- **Missing deletion guard is a framework gap, not just a PR gap** — `pkg/controllers/lima/limavm/controller.go` [Codex GPT 5.4, Gemini 3.1 Pro] (in-scope): The owned finalizer pattern requires both the finalizer on the child and a ValidateDelete webhook on the child's resource type. This PR adds the finalizer but not the webhook. The fix belongs in the Lima controller, not the App controller — the Lima controller owns the LimaVM resource type and its webhook configuration.

### Strengths

- **Clean separation of cleanup responsibilities**: The App controller deletes the LimaVM and its own input ConfigMap. The LimaVM's cleanup finalizer handles everything downstream (template ConfigMap, disk files, hostagent process). This follows the framework's "each controller cleans up only the resources it owns" principle. [Claude Opus 4.6]

- **Idempotent reconciliation**: Every operation is safe to repeat — ConfigMap and LimaVM creation check for existence first, finalizer operations are no-ops when already applied/removed, and condition mirroring writes only when something changed. [Claude Opus 4.6]

- **Explicit requeue after ConfigMap deletion**: The comment at lines 163-165 correctly identifies that ConfigMaps are not watched and adds `Requeue: true` to avoid stalling. [Claude Opus 4.6, Codex GPT 5.4]

- **Defensive truncation at the mirroring boundary**: Truncating condition messages when mirroring from LimaVM to App (line 192) guards against future bypasses of the LimaVM controller's own truncation. [Claude Opus 4.6, Codex GPT 5.4]

---

## Testing Assessment

The BATS test suite covers the happy-path lifecycle thoroughly: singleton creation, finalizer and owner reference verification, template ConfigMap lifecycle, condition mirroring, `spec.running` propagation, validation webhooks, full deletion cascade, and recreate-after-delete.

Untested scenarios ranked by risk:

1. **Out-of-band LimaVM deletion**: No test verifies that deleting the LimaVM directly (while owned by App) is either rejected by a webhook or handled gracefully by the App controller. This is the scenario that triggers Critical finding 1.
2. **Partial creation crash recovery**: Service crash after creating the input ConfigMap but before creating the LimaVM. The code handles this (checks existence), but no test exercises it.
3. **Deletion during LimaVM startup**: Deleting the App while the LimaVM is actively starting. The test only deletes with `running=false`.

---

## Documentation Assessment

- The `AppSpec.Namespace` field has an immutability comment matching the CRD validation rule. Good.
- The `lima.yaml` template now contradicts the per-instance isolation docs in `docs/design/README.md` and `docs/design/cmd_app.md`. Once the template is made instance-aware, the docs should describe how instance-specific values are derived.

---

## Acknowledged Limitations

- `"This is temporary and will be removed once the app controller is fully implemented."` — `pkg/controllers/app/app/controller.go:18`. The embedded template is a bootstrapping mechanism. The hardcoded ports and socket paths (Suggestion 4) are covered by this acknowledgment. The limitation matters more now that the template carries operational values.

---

## Agent Performance Retro

### [Claude] Claude Opus 4.6

- **Duration**: 11 min 36 s
- **Unique contributions**: Additive-only condition mirroring (Suggestion 2), missing `TruncateConditionMessage` unit test (Suggestion 3). Both are valid, forward-looking suggestions.
- **Accuracy**: No false positives. All findings are well-calibrated.
- **Depth**: Read every changed file, traced into `apimeta.SetStatusCondition` to confirm it lacks removal semantics. Good cross-reference to `finalizer_test.go` precedent.
- **Coverage**: Complete — all 9 files accounted for in the coverage summary.
- **Coverage misses**: Missed the critical deadlock in the deletion handler (Critical 1) and the missing ValidateDelete webhook (Important 1). Marked `app_controller.go` with only the suggestion-level condition mirroring finding, missing the two most significant bugs in the same file.
- **Signal-to-noise**: High signal. Three actionable suggestions, no noise.

### [Codex] Codex GPT 5.4

- **Duration**: 3 min 46 s
- **Unique contributions**: First to identify the missing LimaVM ValidateDelete webhook as a standalone finding with full framework-doc citations (Important 1).
- **Accuracy**: No false positives. Severity inflation on the hardcoded template (rated IMPORTANT for a known-temporary limitation).
- **Depth**: Traced the owned finalizer through the framework contract in `controllers.md`, checked `rg` for existing ValidateDelete handlers, and cross-referenced design docs for the multi-instance model.
- **Coverage**: Complete — all 9 files accounted for.
- **Coverage misses**: Did not identify the deadlock consequence (Critical 1) despite finding the missing webhook that enables it. Stopped at the prevention layer without tracing what happens when the prevention is absent.
- **Signal-to-noise**: Good. Two findings, both actionable.

### [Gemini] Gemini 3.1 Pro

- **Duration**: 5 min 55 s
- **Unique contributions**: Identified the critical App deletion deadlock (Critical 1) — the highest-impact finding in this review. Also caught the dropped transient error in the deletion path (Suggestion 1).
- **Accuracy**: Rated the dropped error as IMPORTANT; downgraded to SUGGESTION in consolidation because transient kine errors are unlikely and the consequence is a small orphaned ConfigMap.
- **Depth**: Traced the full deletion switch to identify the deadlock, including the interaction between the owned finalizer and the `IsBeingDeleted` guard. Also noted the missing webhook in Design Observations.
- **Coverage**: Complete — all 9 files accounted for.
- **Coverage misses**: None. Found the critical bug that both other agents missed.
- **Signal-to-noise**: High signal. One critical, one important, one suggestion — all actionable.

### Summary

| Metric | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|--------|-----------------|----------------|----------------|
| Duration | 11:36 | 3:46 | 5:55 |
| Critical | 0 | 0 | 1 |
| Important | 0 | 1 | 1 |
| Suggestion | 3 | 0 | 1 |
| Design observations | 4 strengths | 2 strengths | 1 concern |
| False positives | 0 | 0 | 0 |
| Unique insights | 2 | 1 | 2 |
| Files reviewed | 9 | 9 | 9 |
| Coverage misses | 1 (critical deadlock) | 1 (deadlock consequence) | 0 |

**Gemini** provided the most value this round, catching the critical deadlock that both other agents missed. **Codex** complemented it by identifying the framework-level prevention (missing webhook) with thorough documentation citations. **Claude** produced the most polished review with the best design observations and forward-looking suggestions, but missed the two most important bugs — a significant coverage gap for the most expensive agent. The multi-agent approach was essential: no single agent found all three key issues (deadlock, missing webhook, and the condition mirroring gap).

---

## Appendix: Original Reviews

### Claude Opus 4.6

---

## Executive Summary

This PR implements the App controller's core lifecycle: creating and owning a LimaVM singleton from an embedded template, propagating `spec.running`, mirroring LimaVM conditions onto the App resource, and cleaning up via a finalizer on deletion. The implementation is clean, follows established framework patterns (finalizers, owner references, condition mirroring), and comes with thorough BATS integration tests. No critical issues were found; the substantive finding is a gap in condition mirroring that would leave stale conditions if the LimaVM lifecycle is ever extended to remove conditions.

**Verdict: Approve with minor suggestions.**

## Findings

### Critical Issues

None.

### Important Issues

None.

### Suggestions

1. **Condition mirroring is additive-only — stale conditions not cleaned up** — `app_controller.go:187-196` (suggestion, gap)

```go
for _, cond := range limaVM.Status.Conditions {
    statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
        Type:               cond.Type,
        Status:             cond.Status,
        ...
    }) || statusChanged
}
```

The mirroring loop copies every condition present on the LimaVM onto the App, but never removes conditions that exist on the App but not on the LimaVM. `apimeta.SetStatusCondition` only adds or updates — it has no removal semantics (line 31 of `k8s.io/apimachinery@v0.35.2/pkg/api/meta/conditions.go`).

In the current design this is safe: the LimaVM controller never removes conditions (they only transition between states), and the App is deleted/recreated as a unit. But if the LimaVM lifecycle is ever extended to remove conditions (e.g., clearing `Running` during a reset), the App would retain stale copies indefinitely.

Fix: After the mirroring loop, use `apimeta.RemoveStatusCondition` to clean up App conditions whose types are absent from `limaVM.Status.Conditions`:

```diff
+	// Remove conditions from App that no longer exist on LimaVM.
+	limaCondTypes := make(map[string]struct{}, len(limaVM.Status.Conditions))
+	for _, cond := range limaVM.Status.Conditions {
+		limaCondTypes[cond.Type] = struct{}{}
+	}
+	for _, cond := range app.Status.Conditions {
+		if _, ok := limaCondTypes[cond.Type]; !ok {
+			statusChanged = apimeta.RemoveStatusCondition(&app.Status.Conditions, cond.Type) || statusChanged
+		}
+	}
```

2. **No unit test for `TruncateConditionMessage`** — `pkg/controllers/base/conditions.go:22-29` (suggestion, gap)

```go
func TruncateConditionMessage(msg string) string {
    runes := []rune(msg)
    if len(runes) <= ConditionMessageMaxLen {
        return msg
    }
    const suffix = "… (truncated)"
    return string(runes[:ConditionMessageMaxLen-len([]rune(suffix))]) + suffix
}
```

This function is new, shared between two controllers (LimaVM at line 432 and App at line 192), and operates on rune boundaries. The existing `pkg/controllers/base/finalizer_test.go` sets a precedent for unit-testing base utilities. A table-driven test covering the boundary cases (empty string, exactly at limit, one rune over, multi-byte characters) would be valuable.

3. **Hardcoded SSH port and Docker socket path in template** — `pkg/controllers/app/app/lima.yaml:14,37` (suggestion, gap)

```yaml
ssh:
  localPort: 51422
```
```yaml
  hostSocket: "{{.Home}}/.rd/docker.sock"
```

The SSH `localPort` (line 14) is hardcoded. If two RDD instances run simultaneously (different `RDD_INSTANCE` values), they'd conflict on this port. Similarly, the Docker socket path (line 37) uses `.rd/` rather than incorporating the instance identifier, causing collisions between parallel instances. The design doc (api_app.md) explicitly supports multiple parallel instances via `RDD_INSTANCE`.

The file header notes the template is temporary, so this is expected for now. A future version should derive the port and socket path from the instance configuration.

## Design Observations

### Strengths

- **Clean separation of cleanup responsibilities**: The App controller only deletes the LimaVM and its own input ConfigMap. The LimaVM's cleanup finalizer handles the template ConfigMap, disk files, and hostagent process. This follows the framework's "each controller cleans up only the resources it owns" principle precisely.

- **Idempotent reconciliation**: Every operation in the reconcile loop is safe to repeat. ConfigMap and LimaVM creation check for existence first; finalizer operations are no-ops when already applied/removed; condition mirroring only writes when something changed. This makes crash recovery correct without explicit recovery logic.

- **Explicit requeue after ConfigMap deletion**: The comment at lines 163-165 correctly identifies that ConfigMaps are not watched (no `Owns(&corev1.ConfigMap{})`) and adds an explicit `Requeue: true` to avoid stalling. This shows good awareness of the watch-event contract.

- **Defensive truncation at the mirroring boundary**: Truncating condition messages when mirroring from LimaVM to App (line 192) is a good defense-in-depth practice. Even though the LimaVM controller already truncates at source (line 432), the App controller guards against future bypasses.

## Testing Assessment

The BATS test suite (`bats/tests/32-app-controllers/app.bats`) is thorough, covering:

1. Singleton creation and cluster-scoped access
2. Finalizer and owner reference verification
3. Template ConfigMap lifecycle (creation, copy by webhook, deletion)
4. Condition mirroring with timestamp preservation
5. `spec.running` propagation in both directions
6. Validation webhooks (singleton name, namespace immutability)
7. Full deletion cascade (App → LimaVM → cleanup)
8. Recreate-after-delete round trip

Untested scenarios (ranked by risk):

1. **Partial creation crash recovery**: What happens if the service crashes after creating the input ConfigMap but before creating the LimaVM? The code handles this correctly (checks ConfigMap existence before creating), but no test exercises it.
2. **Deletion during LimaVM startup**: Delete the App while the LimaVM is actively starting (hostagent launching). The code handles this, but the test only deletes with `running=false`.
3. **Concurrent spec changes during condition mirroring**: Rapid `spec.running` toggles while conditions are being mirrored. Controller-runtime serializes reconciles for the same key, so this is safe, but an integration test would provide confidence.

## Documentation Assessment

- The `AppSpec.Namespace` field now has an immutability comment that matches the CRD validation rule. Good.
- The `RegisterWithManager` docstring was updated to mention Lima type registration. Adequate.
- The `AppKind` constant has a docstring. Good.
- The design doc (`docs/design/api_app.md`) describes the App singleton naming as `rancher-desktop`, but the code enforces the name `app` (line 41 of `app_types.go`). This is a minor inconsistency in the design doc, not a code issue.

## Acknowledged Limitations

- **`lima.yaml` is temporary**: `controller.go:18` — *"This is temporary and will be removed once the app controller is fully implemented."* — The embedded template is a bootstrapping mechanism. The hardcoded ports and paths (SSH 51422, Docker socket) are covered by this acknowledgment.

## Coverage Summary

| File | Status |
|------|--------|
| `.github/actions/spelling/expect/rd.txt` | **Trivial** — adds "rosetta" to spelling expectations |
| `bats/tests/32-app-controllers/app.bats` | **Reviewed, no issues** — thorough integration test suite |
| `pkg/apis/app/v1alpha1/app_types.go` | **Reviewed, no issues** — `AppKind` constant and immutability annotation are correct |
| `pkg/controllers/app/app/controller.go` | **Reviewed, no issues** — clean wiring; Lima scheme registration is necessary for `Owns` watch |
| `pkg/controllers/app/app/controllers/app_controller.go` | **Finding 1** — additive-only condition mirroring |
| `pkg/controllers/app/app/crd.yaml` | **Reviewed, no issues** — regenerated, matches type annotations |
| `pkg/controllers/app/app/lima.yaml` | **Finding 3** — hardcoded SSH port and socket path |
| `pkg/controllers/base/conditions.go` | **Finding 2** — missing unit test for `TruncateConditionMessage` |
| `pkg/controllers/lima/limavm/controllers/limavm_controller.go` | **Reviewed, no issues** — single-line change wrapping message in `TruncateConditionMessage`; correct since `updateCondition` delegates to `setCondition` |

### Codex GPT 5.4

**Verdict**

Not ready to merge.

### Executive Summary

This PR adds the App controller’s first real lifecycle loop: it creates an owned `LimaVM`, propagates `spec.running`, mirrors VM conditions onto `App`, and tears the VM down through an App finalizer. The core reconcile flow is mostly sound, but I found two important regressions: the new `owned-by-App` finalizer on `LimaVM` has no matching delete validator, and the embedded `lima.yaml` hard-codes host-facing port/socket values that break the repo’s documented multi-instance model.

### Findings

#### Critical Issues

None.

#### Important Issues

1. **Owned `LimaVM`s can now be deleted into a stuck `Terminating` state because there is no LimaVM delete validator** — [pkg/controllers/app/app/controllers/app_controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/controllers/app_controller.go):131-149, [pkg/controllers/lima/limavm/controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/lima/limavm/controller.go):123-146, [docs/design/controllers.md](/Users/jan/git/rancher-desktop-daemon/docs/design/controllers.md):47-55 `(important, regression)`

```go
limaVM = &limav1alpha1.LimaVM{
    ObjectMeta: metav1.ObjectMeta{
        Name:       limaVMName,
        Namespace:  namespace,
        Finalizers: []string{base.OwnedFinalizerFor(v1alpha1.AppKind)},
    },
```

The new `Finalizers` assignment at line 135 adds `rdd.rancherdesktop.io/owned-by-App`, but the Lima controller still only registers a mutating webhook for `LimaVM` at lines 123-127 and a validating webhook for `ConfigMap` at lines 129-146. `rg` found no `ValidateDelete` handler for `LimaVM` anywhere in the repo. That violates the framework contract in `docs/design/controllers.md` lines 49-55: resources that may carry an owned finalizer must reject DELETE up front. In practice, `rdd ctl delete limavm rd` will now be accepted, the object will enter `metadata.deletionTimestamp != nil`, and it will sit in `Terminating` until the App controller later removes the finalizer during App deletion. That is a user-visible stuck resource introduced by this PR.

Fix: add a LimaVM validating webhook that embeds `base.OwnedDeletionGuard[*v1alpha1.LimaVM]` (or equivalent `ValidateDelete`) and register it alongside the existing LimaVM webhook setup. Keep create/update validation behavior as-is, but make DELETE fail with the explanatory “owned by App; delete the App resource instead” error before the object enters `Terminating`.

2. **The embedded App template hard-codes global host artifacts, so non-default or concurrent `RDD_INSTANCE`s will collide** — [pkg/controllers/app/app/lima.yaml](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/lima.yaml):13-15, [pkg/controllers/app/app/lima.yaml](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/lima.yaml):30-37, [pkg/instance/instance.go](/Users/jan/git/rancher-desktop-daemon/pkg/instance/instance.go):21-28, [pkg/instance/instance.go](/Users/jan/git/rancher-desktop-daemon/pkg/instance/instance.go):99-105, [docs/design/README.md](/Users/jan/git/rancher-desktop-daemon/docs/design/README.md):26-37, [docs/design/cmd_app.md](/Users/jan/git/rancher-desktop-daemon/docs/design/cmd_app.md):43-52 `(important, regression)`

```yaml
ssh:
  localPort: 51422
...
  guestSocket: /var/run/docker.sock
  hostSocket: "{{.Home}}/.rd/docker.sock"
```

The new template pins `ssh.localPort` to `51422` at line 14 and `hostSocket` to `{{.Home}}/.rd/docker.sock` at line 37. That ignores the per-instance isolation model used everywhere else: `pkg/instance/instance.go` lines 21-28 define `instance.Index()` specifically for unique per-instance ports, and lines 99-105 define the per-instance short directory (`~/.rd2`, `~/.rdbats`, etc.). The docs in `docs/design/README.md` lines 26-37 and `docs/design/cmd_app.md` lines 43-52 also explicitly say multiple `RDD_INSTANCE`s can run independently and that the Docker socket lives under the instance short dir (for example `~/.rd2/docker.sock`). With this template, a second instance will contend for the same SSH port, and even a single non-default instance will expose Docker on the wrong socket path.

Fix: stop embedding final host-facing values in raw YAML. Render the template with instance-specific values before creating the bootstrap `ConfigMap`: derive the SSH port from `instance.Index()` (or omit `localPort` and let Lima choose one if that is acceptable), and set `hostSocket` under `instance.ShortDir()` so each `RDD_INSTANCE` gets its own socket path. Add an integration test that runs with a non-default `RDD_INSTANCE` and asserts the rendered template points at the instance-specific socket/path.

#### Suggestions

None.

### Design Observations

**Strengths**

- The explicit `Requeue: true` after deleting the unwatched bootstrap `ConfigMap` in [pkg/controllers/app/app/controllers/app_controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/controllers/app_controller.go):163-166 is the right watch-model fix; it avoids a silent stall.
- Reusing `base.TruncateConditionMessage()` in both App and Lima condition writers keeps status updates inside CRD limits and avoids a class of hard-to-diagnose status-write failures.

### Testing Assessment

1. Direct deletion of an App-owned `LimaVM` is untested. There should be a test that `rdd ctl delete limavm rd` is rejected with an ownership error instead of leaving the object stuck in `Terminating`.
2. Multi-instance behavior is untested. No test exercises a non-default `RDD_INSTANCE` and verifies the rendered Lima template uses instance-scoped host paths and ports.
3. The new BATS suite never inspects the embedded template contents, so the `.rd/docker.sock` / `51422` regression would pass unnoticed.

### Documentation Assessment

The code introduced in [pkg/controllers/app/app/lima.yaml](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/lima.yaml) now contradicts the existing per-instance path/independence docs in [docs/design/README.md](/Users/jan/git/rancher-desktop-daemon/docs/design/README.md) and [docs/design/cmd_app.md](/Users/jan/git/rancher-desktop-daemon/docs/design/cmd_app.md). Once the code is fixed, the App controller should document how it derives instance-specific template values.

### Acknowledged Limitations

- `"This is temporary and will be removed once the app controller is fully implemented."` — [pkg/controllers/app/app/controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/controller.go):16. This limitation matters more now that the embedded template carries operational values like port and socket configuration; the temporary mechanism is no longer just scaffolding.

### Coverage Summary

- [.github/actions/spelling/expect/rd.txt](/Users/jan/git/rancher-desktop-daemon/.github/actions/spelling/expect/rd.txt) — Trivial
- [bats/tests/32-app-controllers/app.bats](/Users/jan/git/rancher-desktop-daemon/bats/tests/32-app-controllers/app.bats) — Reviewed, no issues
- [pkg/apis/app/v1alpha1/app_types.go](/Users/jan/git/rancher-desktop-daemon/pkg/apis/app/v1alpha1/app_types.go) — Reviewed, no issues
- [pkg/controllers/app/app/controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/controller.go) — Reviewed, no issues
- [pkg/controllers/app/app/controllers/app_controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/controllers/app_controller.go) — Finding 1
- [pkg/controllers/app/app/crd.yaml](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/crd.yaml) — Reviewed, no issues
- [pkg/controllers/app/app/lima.yaml](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/lima.yaml) — Finding 2
- [pkg/controllers/base/conditions.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/base/conditions.go) — Reviewed, no issues
- [pkg/controllers/lima/limavm/controllers/limavm_controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/lima/limavm/controllers/limavm_controller.go) — Reviewed, no issues

### Gemini 3.1 Pro

### Executive Summary

This PR introduces the App controller lifecycle, allowing it to act as the singleton manager for the `LimaVM` instance. It successfully implements the bootstrap sequence (ConfigMap creation, VM ownership, and status mirroring), but introduces a critical cleanup deadlock if the child resource is externally deleted, and drops important transient errors during ConfigMap cleanup that can cause permanent resource leaks.

### Findings

#### Critical Issues

1. **App deletion deadlocks if LimaVM is already terminating** — `pkg/controllers/app/app/controllers/app_controller.go:76-88` (critical, regression)

```go
		case !base.IsBeingDeleted(limaVM):
			if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM, v1alpha1.AppKind); err != nil {
				return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from LimaVM: %w", err)
			}
			if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
				return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
			}
			log.Info("Requested LimaVM deletion, waiting for teardown")
			return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
		default:
			// LimaVM deletion is in progress; keep requeueing until it is gone.
			log.Info("Waiting for LimaVM deletion to complete")
			return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

During App cleanup, the controller only removes the `OwnedFinalizer` if the `LimaVM` is not already in the `Terminating` state (`!base.IsBeingDeleted(limaVM)`). If the `LimaVM` was previously deleted by a user (e.g., `kubectl delete limavm rd`), its `DeletionTimestamp` will be non-zero. The controller skips the `!base.IsBeingDeleted` branch and lands in `default`, where it loops forever waiting for the `LimaVM` to disappear. However, the `LimaVM` can never disappear because the App controller still holds the `OwnedFinalizer` and refuses to remove it. Both the App and the LimaVM become permanently deadlocked.

Fix: Always ensure the owned finalizer is removed during App cleanup, regardless of the child's `DeletionTimestamp`.

```diff
-		case !base.IsBeingDeleted(limaVM):
-			if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM, v1alpha1.AppKind); err != nil {
-				return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from LimaVM: %w", err)
-			}
-			if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
-				return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
-			}
-			log.Info("Requested LimaVM deletion, waiting for teardown")
-			return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
-		default:
-			// LimaVM deletion is in progress; keep requeueing until it is gone.
-			log.Info("Waiting for LimaVM deletion to complete")
-			return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
+		default:
+			if err := base.RemoveOwnedFinalizer(ctx, r.Client, limaVM, v1alpha1.AppKind); err != nil {
+				return ctrl.Result{}, fmt.Errorf("failed to remove owned finalizer from LimaVM: %w", err)
+			}
+			if !base.IsBeingDeleted(limaVM) {
+				if err := r.Delete(ctx, limaVM); err != nil && !apierrors.IsNotFound(err) {
+					return ctrl.Result{}, fmt.Errorf("failed to delete LimaVM: %w", err)
+				}
+				log.Info("Requested LimaVM deletion, waiting for teardown")
+			} else {
+				log.Info("Waiting for LimaVM deletion to complete")
+			}
+			return ctrl.Result{RequeueAfter: requeueAfterDeletion}, nil
```

#### Important Issues

2. **Dropped transient errors risk permanent resource leaks** — `pkg/controllers/app/app/controllers/app_controller.go:68-72` and `119-121` (important, regression)

```go
			} else if !apierrors.IsNotFound(cmErr) {
				log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
			}
			return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

During the cleanup of `inputConfigMap` (both in the App deletion path and the standard reconciliation path), if `r.Get` fails with a transient error (such as a dial timeout), the error is logged but not returned. 
- In the App deletion path (lines 68-72), dropping this error causes the controller to proceed to `base.RemoveCleanupFinalizer`. This permanently destroys the App resource and, because the system lacks K8s GC, leaves the `inputConfigMap` orphaned forever.
- In the standard reconciliation path (lines 119-121), the controller will just move on and fail to delete the ConfigMap.

Fix: Return the error so the controller can retry. Apply this fix to both locations:

```diff
			} else if !apierrors.IsNotFound(cmErr) {
-				log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
+				return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
			}
```

```diff
	} else if !apierrors.IsNotFound(err) {
-		log.Error(err, "Failed to fetch input ConfigMap")
+		return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap: %w", err)
	}
```

#### Suggestions

3. **Unconventional boolean logic may cause reader confusion** — `pkg/controllers/app/app/controllers/app_controller.go:137-146` (suggestion, enhancement)

```go
		statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
            // ...
		}) || statusChanged
```

The logic correctly relies on short-circuiting to ensure `SetStatusCondition` is always evaluated (because the function call is the left operand of the `||`). However, using `||` in a compound assignment this way forces reviewers to pause and reverse-engineer the short-circuiting rules to ensure updates aren't accidentally skipped.

Fix: Extracting the return value makes the intent unambiguous.

```diff
-		statusChanged = apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
+		changed := apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
			Type:               cond.Type,
			Status:             cond.Status,
			Reason:             cond.Reason,
			Message:            base.TruncateConditionMessage(cond.Message),
			ObservedGeneration: app.Generation,
			LastTransitionTime: cond.LastTransitionTime,
-		}) || statusChanged
+		})
+		statusChanged = statusChanged || changed
```

### Design Observations

#### Concerns
- **Missing deletion guard for LimaVM (in-scope)**: The App controller protects the `LimaVM` resource it creates with an `OwnedFinalizer`. According to the project architecture (`docs/design/controllers.md`), resources with `OwnedFinalizer` should also be registered with an `OwnedDeletionGuard` validating webhook to immediately reject manual user deletes. Currently, `LimaVM` lacks this webhook in `lima/limavm/controller.go`. While the App controller should handle pre-terminated children (as identified in Finding 1), adding the `OwnedDeletionGuard` validating webhook to `LimaVM` acts as the primary defense against this state.

### Testing Assessment

The BATS tests provide solid coverage of the happy-path lifecycle. The highest risk untested scenario is:
1. **Out-of-band deletion**: A test where the `LimaVM` instance is manually deleted (putting it in `Terminating` state) *before* the `App` resource is deleted. Testing this would verify that the `App` controller does not deadlock when cleaning up.

### Documentation Assessment

None.

### Coverage Summary

- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — Finding 1, 2, 3
- `pkg/controllers/app/app/crd.yaml` — Trivial
- `pkg/controllers/app/app/lima.yaml` — Reviewed, no issues
- `pkg/controllers/base/conditions.go` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controllers/limavm_controller.go` — Reviewed, no issues
