# Deep Review: 20260324-145835-pr-236

| | |
|---|---|
| **Date** | 2026-03-24 14:58 |
| **Round** | 7 (of PR #236) |
| **Author** | [@Nino-K](https://github.com/Nino-K) |
| **PR** | [#236](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/236) — Implement App controller lifecycle for LimaVM |
| **Branch** | `app-controller-create-delete-limavm` |
| **Commits** | `9a40424` Implement App controller lifecycle for LimaVM<br>`7cf057a` Add Bats test to App Controller |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — two error-handling gaps let transient API failures orphan a ConfigMap or delay its cleanup indefinitely |
| **Wall-clock time** | `23 min 48 s` |

---

## Executive Summary

This PR implements the App controller's core lifecycle: creating and owning a LimaVM from an embedded template, propagating `spec.running`, mirroring VM conditions onto the App status, and cascading deletion through finalizers. The implementation handles partial failures idempotently and adds a LimaVM validating webhook to prevent direct deletion of owned resources. Two error-handling gaps allow transient API failures to orphan a ConfigMap or delay its cleanup; both have trivial fixes.

---

## Critical Issues

None.

---

## Important Issues

1. **Transient error during deletion cleanup orphans ConfigMap** — `pkg/controllers/app/app/controllers/app_controller.go:69-72` [Claude Opus 4.6, Gemini 3.1 Pro] (important, gap)

```go
		} else if !apierrors.IsNotFound(cmErr) {
			log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
		}
		return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

When the LimaVM is already gone and the controller tries to clean up the input ConfigMap, a transient `Get` failure at line 69 is logged but ignored. The controller then removes the cleanup finalizer at line 72, which allows the App to be deleted. Because the embedded kube-apiserver has no garbage collector, the orphaned ConfigMap persists permanently. Returning the error instead triggers controller-runtime's backoff retry — a cheap fix that avoids the orphan entirely.

Claude rated this as a suggestion, arguing that blocking App deletion for a harmless ConfigMap is worse. Gemini rated it as important. Gemini's framing is stronger: the fix costs nothing (a retry on transient failure) and eliminates a permanent leak.

Fix:

```diff
 		} else if !apierrors.IsNotFound(cmErr) {
-			log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
+			return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
 		}
 		return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

2. **Transient error during runtime ConfigMap cleanup delays retry** — `pkg/controllers/app/app/controllers/app_controller.go:167-169` [Gemini 3.1 Pro] (important, gap)

```go
	} else if !apierrors.IsNotFound(err) {
		log.Error(err, "Failed to fetch input ConfigMap")
	}
```

In the normal reconciliation path, after the LimaVM exists, the controller tries to delete the lingering input ConfigMap. A transient `Get` failure at line 167 is logged but swallowed. Because the controller does not watch ConfigMaps (correctly — it `Owns` only LimaVM), no watch event will fire for the ConfigMap, and cleanup depends on the next unrelated LimaVM status change to trigger a reconcile.

This self-heals eventually, but the delay is unpredictable. Returning the error triggers an immediate retry with backoff.

Fix:

```diff
 	} else if !apierrors.IsNotFound(err) {
-		log.Error(err, "Failed to fetch input ConfigMap")
+		return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap: %w", err)
 	}
```

---

## Suggestions

1. **Bootstrap failures leave App with empty conditions** — `pkg/controllers/app/app/controllers/app_controller.go:107-152` [Codex GPT 5.4] (suggestion, gap)

```go
if err := r.Create(ctx, inputCM); err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to create input ConfigMap: %w", err)
}
```

When `Create(inputCM)` at line 126 or `Create(limaVM)` at line 148 fails, the error triggers controller-runtime's backoff retry, but no conditions are set on the App. The App API design doc promises `Created=Unknown/Pending` and `Created=False/CreateFailed` states. In practice, the controller retries until the create succeeds, so an observer sees an empty `status.conditions` only during transient failures — a minor UX gap, not a correctness issue.

Codex rated this as important. The severity is lower because the controller retries automatically and the empty-conditions window is brief and self-healing. Setting `Created=Unknown/Pending` before bootstrap work starts would close the gap.

Fix: Set `Created=Unknown/Pending` on the App before attempting any bootstrap work.

2. **Missing unit test for `TruncateConditionMessage`** — `pkg/controllers/base/conditions.go:22-29` [Claude Opus 4.6] (suggestion, gap)

```go
func TruncateConditionMessage(msg string) string {
	runes := []rune(msg)
	if len(runes) <= ConditionMessageMaxLen {
		return msg
	}
	const suffix = "… (truncated)"
	return string(runes[:ConditionMessageMaxLen-len([]rune(suffix))]) + suffix
}
```

This function performs rune-boundary arithmetic. The project has a pattern of unit-testing base utilities (`finalizer_test.go` exists). A test covering the boundary case (exactly 32768 runes), the truncation case, and a multi-byte character case would be good hygiene.

3. **No test for App creation with `running=true`** — `bats/tests/32-app-controllers/app.bats` [Claude Opus 4.6] (suggestion, gap)

The BATS tests create the App with `running: false` and later patch it to `running: true`. The code path at `app_controller.go:142` where the LimaVM is created with `Running: true` at creation time is untested. Correct by inspection but exercises different code from the tested path.

---

## Design Observations

### Concerns

1. **Embedded template hardcodes host paths and ports** — `pkg/controllers/app/app/lima.yaml:14,37` [Codex GPT 5.4] (future)

The template pins `ssh.localPort: 51422` and `hostSocket: "{{.Home}}/.rd/docker.sock"`. The socket path `.rd` does not match the documented instance directory `~/.rd2/`, and the fixed SSH port would collide if two RDD instances ran simultaneously. The controller.go comment at line 17 acknowledges this template is temporary. When the permanent template mechanism replaces it, these values must be instance-aware.

### Strengths

- **Idempotent multi-step creation.** The ConfigMap-then-LimaVM sequence at lines 107-152 handles partial failures correctly. If the controller crashes between creating the ConfigMap and the LimaVM, the next reconcile detects the existing ConfigMap and proceeds to LimaVM creation. [Claude Opus 4.6]

- **Clean ownership model.** The App controller creates and owns the input ConfigMap (bootstrap artifact), while the LimaVM webhook creates and owns the template ConfigMap (operational copy). Each controller cleans up only its own resources, consistent with the framework's ownership boundary. [Claude Opus 4.6]

- **Correct finalizer sequencing.** The deletion handler at lines 75-87 removes the owned finalizer before issuing Delete (so the validating webhook allows it), then polls until the LimaVM controller finishes its own cleanup. [Claude Opus 4.6]

- **LimaVM validating webhook.** Adding `OwnedDeletionGuard` at `limavm/controller.go:134-152` prevents the "stuck in Terminating" failure mode documented in the framework design. [Claude Opus 4.6, Codex GPT 5.4]

- **Defensive condition truncation.** Applying `TruncateConditionMessage` at both the source (LimaVM controller) and the mirror (App controller) provides defense-in-depth against the 32768-character CRD validation limit. [Claude Opus 4.6, Gemini 3.1 Pro]

- **Explicit requeue after unwatched delete.** The `Requeue: true` at line 166 after deleting the input ConfigMap follows the project's rule that every `Result{}, nil` must have a future watch event. [Codex GPT 5.4]

---

## Testing Assessment

The BATS test suite covers the full happy-path lifecycle comprehensively: creation, singleton enforcement, finalizer verification, LimaVM ownership chain, template bootstrapping and cleanup, `running` propagation in both directions, condition mirroring with timestamp fidelity, namespace immutability, owned deletion rejection, full deletion cascade, and recreation after deletion.

Untested scenarios ranked by risk:

1. **Pre-LimaVM bootstrap failures** — No test forces the ConfigMap or LimaVM `Create()` calls to fail and asserts that the App surfaces `Created=False/CreateFailed`.
2. **App creation with `running=true`** — The code path where the LimaVM is created with `Running: true` at line 142 is exercised only indirectly via a later patch.
3. **Running condition mirroring** — Only the `Created` condition is verified to mirror from LimaVM to App. The `Running` condition follows the same code path, so risk is low.
4. **Deletion during creation** — Deleting the App before the LimaVM exists (between `EnsureCleanupFinalizer` and LimaVM Create). Correct by inspection, but untested.

## Documentation Assessment

The design doc `docs/design/api_app.md` shows the App as a namespaced resource in its example YAML, but this PR implements it as cluster-scoped with an immutable `spec.namespace` field. The example and surrounding text should be updated to match.

## Commit Structure

The two commits are logically clean:
- `9a40424` implements the full feature (controller, webhook, CRD, template, shared utilities)
- `7cf057a` adds the BATS integration tests

The `TruncateConditionMessage` utility and the LimaVM webhook additions are bundled into the first commit alongside the App controller changes. They could be separate commits (the webhook is a cross-controller change), but they are cohesive enough that splitting would be unnecessary churn.

## Acknowledged Limitations

1. `pkg/controllers/app/app/controller.go:17`:
   > "This is temporary and will be removed once the app controller is fully implemented."

   The embedded `lima.yaml` template hardcodes the VM configuration at compile time. The author acknowledges this is temporary. This change does not make the limitation worse, but the template now carries runtime-specific host settings (SSH port, socket path) that should be instance-derived when the permanent mechanism arrives.

---

## Agent Performance Retro

### [Claude]

Claude Opus 4.6 produced the most detailed review with strong code tracing. It correctly identified the ConfigMap cleanup error-handling gap (Finding 1) but rated it as a suggestion rather than important — underestimating the permanent orphan risk in a GC-less environment. It provided the most thorough design observations, including idempotent creation, ownership boundaries, and finalizer sequencing. Unique contributions: missing `TruncateConditionMessage` unit test, untested `running=true` creation path. No false positives. Coverage was complete.

Duration: 10 min 0 s.

### [Codex]

Codex GPT 5.4 was the fastest and raised two unique findings. The hardcoded template paths/ports observation (Finding 1 in its review) is valid but lands in the acknowledged-limitation category — the template is explicitly marked temporary. The bootstrap-failures-not-surfaced-as-conditions finding is a genuine gap, though it self-heals and is better rated as suggestion than important. Codex also caught the stale design doc. Coverage was complete. No false positives.

Duration: 4 min 33 s.

### [Gemini]

Gemini 3.1 Pro focused narrowly on the two ConfigMap error-handling paths and found both (the deletion cleanup at line 69 and the runtime cleanup at line 167). Its analysis of the runtime cleanup path was unique — no other agent flagged line 167-169. The review was accurate but shallow: no design observations beyond strengths, no testing assessment beyond confirming the BATS suite is comprehensive, and no documentation gaps noted. Coverage was complete. No false positives.

Duration: 5 min 6 s.

### Summary

| Metric | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 10:00 | 4:33 | 5:06 |
| Critical | 0 | 0 | 0 |
| Important | 0 | 2 | 2 |
| Suggestion | 3 | 0 | 0 |
| Design observations | 5 strengths | 2 strengths | 2 strengths |
| False positives | 0 | 0 | 0 |
| Unique insights | 2 | 2 | 1 |
| Files reviewed | 10/10 | 10/10 | 10/10 |
| Coverage misses | 0 | 0 | 0 |

All three agents achieved complete coverage with zero false positives. Gemini found the most consequential gap (the runtime cleanup error swallowing at line 167). Claude provided the deepest design analysis. Codex was fastest and raised the broadest set of concerns, though one landed in the acknowledged-limitation category. The multi-agent approach added value: no single agent found all five consolidated findings.

---

## Appendix: Original Reviews

### Claude Opus 4.6

Good, `--for=create` is an established BATS convention in this project. Now I have all the information needed. Let me compile the review.

---

## Executive Summary

This PR implements the core App controller lifecycle: creating a LimaVM with template bootstrapping, propagating `spec.running`, mirroring VM conditions back to the App status, and handling deletion through a finalizer-based cleanup cascade. The implementation is well-structured, correctly handles partial failure scenarios in the multi-step create/delete flows, and adds a LimaVM validating webhook to prevent direct deletion of owned resources. The BATS test coverage is thorough. No critical issues found.

## Findings

### Critical Issues

None.

### Important Issues

None.

### Suggestions

1. **Deletion cleanup skips ConfigMap on transient fetch error** — `pkg/controllers/app/app/controllers/app_controller.go:69-72` (suggestion, regression)

```go
		} else if !apierrors.IsNotFound(cmErr) {
			log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
		}
		return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

When the LimaVM is already gone and the code tries to clean up the input ConfigMap, a transient error fetching the ConfigMap (lines 69-70) is logged but cleanup proceeds to remove the cleanup finalizer at line 72. Since RDD has no garbage collector, a surviving input ConfigMap would be orphaned permanently. The owner reference would point to the now-deleted App, so it's identifiable but never auto-cleaned.

This is a reasonable tradeoff — blocking App deletion for a harmless ConfigMap is worse — but the behavior should be explicit.

Fix: Add a comment explaining the intentional choice, or return the error to let the retry loop handle transient failures:

```diff
 		} else if !apierrors.IsNotFound(cmErr) {
-			log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
+			// Best-effort: a transient error here could orphan the ConfigMap,
+			// but blocking App deletion for a harmless bootstrap artifact is worse.
+			log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup; proceeding")
 		}
```

2. **Missing unit test for `TruncateConditionMessage`** — `pkg/controllers/base/conditions.go:22-29` (suggestion, gap)

```go
func TruncateConditionMessage(msg string) string {
	runes := []rune(msg)
	if len(runes) <= ConditionMessageMaxLen {
		return msg
	}
	const suffix = "… (truncated)"
	return string(runes[:ConditionMessageMaxLen-len([]rune(suffix))]) + suffix
}
```

This is a new utility function with rune-boundary arithmetic. The project has an established pattern of unit testing base utilities (`finalizer_test.go` exists). A unit test covering the boundary case (exactly 32768 runes), the truncation case, and a multi-byte-character case would be good hygiene.

Fix: Add a `TestTruncateConditionMessage` in a new `conditions_test.go` file.

3. **No test for App creation with `running=true`** — `bats/tests/32-app-controllers/app.bats` (suggestion, gap)

The BATS tests create the App with `running: false` and later patch it to `running: true`. The code path at `app_controller.go:142` where the LimaVM is created with `Running: true` at creation time (from an App created with `running: true`) is untested. This path is correct by inspection but differs from the tested path where `running` is propagated via Update.

Fix: Consider adding a test that creates the App with `running: true` and verifies the LimaVM is created with `spec.running=true` without needing a patch.

## Design Observations

### Strengths

- **Idempotent multi-step creation**: The ConfigMap-then-LimaVM creation at lines 107-152 correctly handles partial failures. If the reconciler crashes between creating the ConfigMap and the LimaVM, the next reconcile detects the existing ConfigMap (line 109) and skips to LimaVM creation. No duplicate resources.

- **Clean ownership model**: The App controller creates the input ConfigMap (owned by App via owner reference) as a bootstrap artifact, while the LimaVM webhook creates the template ConfigMap (owned by LimaVM). These are cleanly separated — the App controller only ever deletes its own input ConfigMap, consistent with the framework's "each controller cleans up only its own resources" design principle.

- **Correct finalizer sequencing in deletion**: The deletion handler at lines 75-87 correctly removes the owned finalizer before issuing Delete (so the validating webhook allows it), then polls until the LimaVM controller finishes its own cleanup. The `RequeueAfter: 2s` polling with the `IsBeingDeleted` check at line 79 avoids re-deleting an already-terminating resource.

- **Defensive condition truncation**: Applying `TruncateConditionMessage` at both the source (LimaVM controller at `limavm_controller.go:432`) and the mirror (App controller at `app_controller.go:192`) follows good defense-in-depth — a future code path that bypasses the LimaVM truncation won't break the App's status update.

- **LimaVM validating webhook for owned deletion guard**: Adding the `OwnedDeletionGuard` webhook at `limavm/controller.go:134-152` prevents the well-documented "stuck in Terminating" failure mode where a direct `rdd ctl delete limavm rd` would leave the owned-by-App finalizer with no controller to remove it.

## Testing Assessment

The BATS test suite covers the full happy-path lifecycle comprehensively: creation, singleton enforcement, finalizer verification, LimaVM ownership chain, template bootstrapping and cleanup, `running` propagation in both directions, condition mirroring with timestamp fidelity, namespace immutability, owned deletion rejection, full deletion cascade, and recreation after deletion.

Untested scenarios ranked by risk:

1. **App creation with `running=true`** — The code path where the LimaVM is created with `Running: true` at line 142 is exercised only in production, not in tests. The tested path propagates `running` via a separate Update.
2. **Unit test for `TruncateConditionMessage`** — Rune-boundary arithmetic with multi-byte characters has no test coverage.
3. **Running condition mirroring** — Only the `Created` condition is verified to mirror from LimaVM to App. The `Running` condition mirroring follows the same code path, so this is low risk.
4. **Deletion during creation** — What happens when the App is deleted before the LimaVM has been created (between `EnsureCleanupFinalizer` and LimaVM Create). By code inspection this is handled correctly, but there's no test.

## Documentation Assessment

No documentation gaps. The CRD YAML (`crd.yaml`) was updated to include the immutability validation rule and comment for `spec.namespace`. The `app_types.go` comments accurately describe the immutability constraint. The spelling dictionary was updated for `rosetta`.

## Commit Structure

The two commits are logically clean:
- `9a40424` implements the full feature (controller lifecycle + webhook + CRD changes + shared utilities)
- `7cf057a` adds the BATS integration tests

One minor observation: the `TruncateConditionMessage` utility and the LimaVM webhook additions are bundled into the first commit alongside the App controller changes. They could arguably be separate commits (the webhook especially is a cross-controller change), but they're cohesive enough that splitting would be unnecessary churn.

## Acknowledged Limitations

1. **Temporary embedded template** — `pkg/controllers/app/app/controller.go:18`:
   > `// This is temporary and will be removed once the app controller is fully implemented.`

   The `lima.yaml` template is embedded directly via `//go:embed`. This hardcodes the VM configuration at compile time. The author acknowledges this is temporary — presumably the template will eventually come from a configurable source. This change does not make the limitation worse.

## Coverage Summary

| Changed File | Status |
|---|---|
| `.github/actions/spelling/expect/rd.txt` | **Trivial** — added `rosetta` to spell-check dictionary |
| `bats/tests/32-app-controllers/app.bats` | **Reviewed, no issues** — comprehensive lifecycle test suite |
| `pkg/apis/app/v1alpha1/app_types.go` | **Reviewed, no issues** — `AppKind` constant and immutability annotation are correct |
| `pkg/controllers/app/app/controller.go` | **Reviewed, no issues** — Lima scheme registration and template embedding are clean |
| `pkg/controllers/app/app/controllers/app_controller.go` | **Finding 1** — transient error handling in deletion cleanup |
| `pkg/controllers/app/app/crd.yaml` | **Reviewed, no issues** — CRD correctly adds `x-kubernetes-validations` for immutability |
| `pkg/controllers/app/app/lima.yaml` | **Reviewed, no issues** — template covers both architectures with sha256 digests |
| `pkg/controllers/base/conditions.go` | **Finding 2** — missing unit test for `TruncateConditionMessage` |
| `pkg/controllers/lima/limavm/controller.go` | **Reviewed, no issues** — validating webhook correctly uses `OwnedDeletionGuard` for DELETE operations |
| `pkg/controllers/lima/limavm/controllers/limavm_controller.go` | **Reviewed, no issues** — `TruncateConditionMessage` call at source is a single-line change, correct |

### Codex GPT 5.4

**Executive Summary**

This PR turns the App controller into the owner of the `rd` LimaVM lifecycle: it bootstraps the VM from an embedded template, propagates `spec.running`, mirrors LimaVM conditions, and adds deletion/finalizer handling plus a LimaVM delete guard. The main problems are that the embedded template hardcodes host endpoints that are supposed to vary per RDD instance, and the App resource still reports no status at all when bootstrap fails before the LimaVM object exists.

**Findings**

**Critical Issues**

None.

**Important Issues**

1. **Bootstrap template hardcodes host paths and ports that must be instance-specific** — [pkg/controllers/app/app/lima.yaml](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/lima.yaml#L13) (important, regression)

```yaml
ssh:
  localPort: 51422
...
  hostSocket: "{{.Home}}/.rd/docker.sock"
```

The new embedded template pins `ssh.localPort` at line 14 and `hostSocket` at line 37 to fixed host values. That conflicts with the instance model documented in `docs/design/api_app.md:15-18`, and with the runtime helpers in `pkg/instance/instance.go:101-113`, which derive per-instance paths like `~/.rd2`. As written, every non-default instance will publish Docker at the wrong path, and parallel RDD instances will contend for the same SSH port. Even the default instance is inconsistent with the documented socket path (`~/.rd2/docker.sock`, not `~/.rd/docker.sock`).

Fix: render the bootstrap template with instance-aware values instead of embedding final host endpoints in source control.

```diff
 ssh:
-  localPort: 51422
+  localPort: {{ .SSHPort }}
 ...
-  hostSocket: "{{.Home}}/.rd/docker.sock"
+  hostSocket: "{{ .ShortDir }}/docker.sock"
```

Populate `SSHPort` from a deterministic per-instance calculation (for example using `instance.Index()`) or let Lima choose dynamically, and populate `ShortDir` from `instance.ShortDir()` before writing the ConfigMap.

2. **App bootstrap failures never surface on the App status** — [pkg/controllers/app/app/controllers/app_controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/controllers/app_controller.go#L107) (important, regression)

```go
if err := r.Create(ctx, inputCM); err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to create input ConfigMap: %w", err)
}
...
if err := r.Create(ctx, limaVM); err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to create LimaVM: %w", err)
}
```

The create path at lines 107-149 returns errors directly for bootstrap failures, but the only status logic is the LimaVM mirroring loop at lines 180-195. If `r.Create(ctx, inputCM)` fails at line 126 or `r.Create(ctx, limaVM)` fails at line 148, no LimaVM ever exists, so there is no child event to mirror and the `App` stays with an empty `status.conditions` forever. That violates the App API contract in `docs/design/api_app.md:90-100`, which promises `Created=Unknown/Pending` and `Created=False/CreateFailed` on the App itself.

Fix: set App `Created=Unknown/Pending` before bootstrap work starts, and patch `Created=False/CreateFailed` onto the App before returning any pre-LimaVM bootstrap error.

```diff
+if apimeta.FindStatusCondition(app.Status.Conditions, "Created") == nil {
+	apimeta.SetStatusCondition(&app.Status.Conditions, metav1.Condition{
+		Type: "Created", Status: metav1.ConditionUnknown, Reason: "Pending",
+		Message: "Reconciliation in progress", ObservedGeneration: app.Generation,
+	})
+	if err := r.Status().Update(ctx, &app); err != nil { return ctrl.Result{}, err }
+	return ctrl.Result{}, nil
+}
...
 if err := r.Create(ctx, limaVM); err != nil {
+	_ = r.updateAppCondition(ctx, &app, "Created", metav1.ConditionFalse, "CreateFailed", err.Error())
 	return ctrl.Result{}, fmt.Errorf("failed to create LimaVM: %w", err)
 }
```

**Suggestions**

None.

**Design Observations**

**Strengths**

- The new LimaVM validating webhook in [pkg/controllers/lima/limavm/controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/lima/limavm/controller.go#L131) matches the controller-framework design well: the owned finalizer now has the required delete-time guard, so users get a clear rejection instead of a stuck `Terminating` resource.
- The explicit `Requeue: true` after deleting the unwatched bootstrap ConfigMap in [pkg/controllers/app/app/controllers/app_controller.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/app/app/controllers/app_controller.go#L163) is the right pattern for this codebase’s “every `Result{}, nil` must have a future watch event” rule.

**Testing Assessment**

1. Multi-instance behavior is untested. Nothing in [bats/tests/32-app-controllers/app.bats](/Users/jan/git/rancher-desktop-daemon/bats/tests/32-app-controllers/app.bats#L1) exercises a non-default `RDD_INSTANCE` or two instances in parallel, so the hardcoded host socket / SSH port regression would slip through.
2. Pre-LimaVM bootstrap failures are untested. There is no case that forces the App controller’s ConfigMap or LimaVM `Create()` calls to fail and then asserts that the App exposes `Created=False/CreateFailed`.
3. The tests never assert the App’s initial `Created=Unknown/Pending` state, so the current status gap is invisible to the suite.

**Documentation Assessment**

`docs/design/api_app.md` is now stale relative to this PR. It still shows a namespaced `App` example named `rancher-desktop`, but this change enforces a cluster-scoped singleton named `app` and adds immutable `spec.namespace`; those constraints should be documented alongside the status contract.

**Acknowledged Limitations**

- `pkg/controllers/app/app/controller.go:17` says: “This is temporary and will be removed once the app controller is fully implemented.” This matters more after this PR because the embedded template now carries runtime-specific host settings; the temporary bootstrap path is starting to encode behavior that should live in generated state.

**Coverage Summary**

- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — Finding 2
- `pkg/controllers/app/app/crd.yaml` — Reviewed, no issues
- `pkg/controllers/app/app/lima.yaml` — Finding 1
- `pkg/controllers/base/conditions.go` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controller.go` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controllers/limavm_controller.go` — Reviewed, no issues

### Gemini 3.1 Pro

### Executive Summary

This change introduces the core `App` controller, which acts as a singleton resource manager orchestrating the lifecycle of a `LimaVM` instance. It appropriately handles generating the required template `ConfigMap`, managing the finalizers needed for safe cross-controller cascading deletions (since the embedded environment lacks Kubernetes Garbage Collection), and mirroring status conditions from the underlying VM back to the `App` resource. The implementation is well-structured, but a couple of error-handling edge cases around `ConfigMap` retrieval during cleanup sequences could lead to orphaned resources.

### Findings

#### Critical Issues

None.

#### Important Issues

1. **Transient API errors during deletion can orphan ConfigMaps** — `pkg/controllers/app/app/controllers/app_controller.go:67-71` `(important, gap)`

```go
			} else if !apierrors.IsNotFound(cmErr) {
				log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
			}
			return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

During the `App` resource cleanup phase, the controller tries to fetch the `inputConfigMap` to delete it. If the `Get` call fails with a transient error (like a network timeout), the controller logs the error but proceeds to remove the `App`'s cleanup finalizer. Removing the finalizer causes the `App` to be deleted immediately, which means the `inputConfigMap` will be permanently orphaned.

Fix: Return the error so the controller retries the cleanup phase with backoff:
```diff
-			} else if !apierrors.IsNotFound(cmErr) {
-				log.Error(cmErr, "Failed to fetch input ConfigMap during cleanup")
-			}
+			} else if !apierrors.IsNotFound(cmErr) {
+				return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap during cleanup: %w", cmErr)
+			}
			return ctrl.Result{}, base.RemoveCleanupFinalizer(ctx, r.Client, &app)
```

2. **Transient API errors during runtime cleanup bypass retry backoff** — `pkg/controllers/app/app/controllers/app_controller.go:168-170` `(important, gap)`

```go
	} else if !apierrors.IsNotFound(err) {
		log.Error(err, "Failed to fetch input ConfigMap")
	}

	// Propagate spec.running from App into the LimaVM.
```

In the normal reconciliation loop, the controller cleans up the `inputConfigMap` after the `LimaVM` exists. If the `Get` call fails with an error other than `IsNotFound`, it logs the error and continues through the rest of the reconcile loop, eventually returning `ctrl.Result{}, nil`. Because the `App` controller does not `Watch` `ConfigMaps` (which is correct), dropping the error means the `ConfigMap` might linger until another unrelated event (like a `LimaVM` status update) triggers the next reconcile.

Fix: Return the error to rely on the controller-runtime's natural backoff queue:
```diff
-	} else if !apierrors.IsNotFound(err) {
-		log.Error(err, "Failed to fetch input ConfigMap")
-	}
+	} else if !apierrors.IsNotFound(err) {
+		return ctrl.Result{}, fmt.Errorf("failed to fetch input ConfigMap: %w", err)
+	}
```

#### Suggestions

None.

### Design Observations

#### Strengths
- **Graceful Lifecycle Coupling:** The controller smartly navigates the lack of garbage collection in the embedded apiserver. Manually orchestrating `owned-by-*` and `cleanup` finalizers during cascaded deletions ties the `App`'s deletion directly to the completion of the `LimaVM`'s teardown, preventing race conditions.
- **Defensive Message Bounds:** Implementing `TruncateConditionMessage` at the source prevents long strings (like cascading failure messages or multi-line command output) from violating the strict 32768-character length limit enforced by CRD validation, which would otherwise permanently lock up the controller loop's status patching.
- **Short-circuit Bypass:** The assignment construct `statusChanged = apimeta.SetStatusCondition(...) || statusChanged` elegantly handles boolean reduction in loops, bypassing Go's default right-side short-circuiting to ensure the function actually evaluates during every condition iteration. 

### Testing Assessment

The provided BATS tests (`bats/tests/32-app-controllers/app.bats`) are extremely comprehensive. They test the happy path (creation, `spec.running` propagation, configuration mirroring) as well as validating immutability behavior, finalizer presence, and cascaded deletion logic (including preventing direct deletion of an owned `LimaVM`). No unit tests are provided for the controller logic, but the deep integration test coverage fully exercises the controller in its natural environment.

### Coverage Summary

- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/32-app-controllers/app.bats` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — Reviewed, no issues
- `pkg/controllers/app/app/controllers/app_controller.go` — **Finding 1, 2**
- `pkg/controllers/app/app/crd.yaml` — Reviewed, no issues
- `pkg/controllers/app/app/lima.yaml` — Reviewed, no issues
- `pkg/controllers/base/conditions.go` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controller.go` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controllers/limavm_controller.go` — Reviewed, no issues
