# Deep Review: 20260415-184413-pr-315

| | |
|---|---|
| **Date** | 2026-04-15 18:44 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 1 |
| **Author** | @jandubois |
| **PR** | [#315](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/315) — app: aggregate Ready and Settled conditions, wait for Settled in `rdd set` |
| **Branch** | `container-actions` |
| **Commits** | `52c05a7` app: aggregate Ready and Settled conditions, wait for Settled in `rdd set` |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — `EngineEnabled` derives from the init-time registry instead of the runtime-enabled controller set |
| **Wall-clock time** | `15 min 13 s` |

## Executive Summary

This commit replaces per-property wait logic in `rdd set` with a single `Settled` condition on the App resource, computed by the App controller from the mirrored LimaVM `Running` condition and the engine controller's `ContainerEngineReady`. The design is sound: `ObservedGeneration` gating prevents stale-snapshot races, and the Ready/Settled split serves different consumers cleanly (Ready for liveness, Settled for CLI waits).

One important issue: `hasRegisteredController("engine")` checks the compile-time controller registry (populated by `init()`) instead of the runtime-enabled controller set (filtered by `--controllers`). When the engine controller is compiled in but excluded at runtime, no one writes `ContainerEngineReady`, and Settled hangs indefinitely. The prior code queried the discovery ConfigMap, which reflects actually-running controllers.

Structure: 8 files, 219 additions, 107 deletions across the App types, App controller, `rdd set` CLI, BATS tests, and design docs.

---

## Critical Issues

None.

---

## Important Issues

I1. **`hasRegisteredController` checks registration, not runtime enablement** — `pkg/controllers/app/app/controller.go:153,167-174` [Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro]

```go
EngineEnabled:    hasRegisteredController("engine"),
```

```go
func hasRegisteredController(name string) bool {
	for _, c := range base.GetAllControllers() {
		if c.GetName() == name {
			return true
		}
	}
	return false
}
```

`base.GetAllControllers()` returns every controller whose `init()` called `RegisterController`, regardless of whether `--controllers` subsequently excluded it. The engine controller registers on all non-Windows platforms, so `EngineEnabled` is always `true` on macOS and Linux. When the service starts with `--controllers=app,lima` (or `*,-engine`), no one writes `ContainerEngineReady`. The Settled switch falls through to `engineCond == nil` at `app_controller.go:440` and stays `False/WaitingForEngine` permanently.

The prior code avoided this: `isEngineControllerEnabled` (removed from `set.go` in this commit) queried the discovery ConfigMap, which reflects actually-running controllers. This change regresses that capability.

Fix: plumb the actual enabled-controller set from the shared controller manager startup (`service.go:757-764`) into `RegisterWithManager`, or query the discovery ConfigMap from the reconciler.

---

## Suggestions

S1. **Stale help text in `rdd set` cobra command** — `cmd/rdd/set.go:70-74` [Claude Opus 4.6, Codex GPT 5.4]

```go
By default, rdd set waits for the desired state: when running=true
is set, it waits for the container engine to be ready; when
running=false, it waits for the VM to stop. Pass --wait=false to
return as soon as the patch is accepted, or --timeout=0 to wait
indefinitely.
```

This help text describes the old per-property wait behavior. The implementation now waits for `Settled=True` unconditionally, and the design docs (`docs/design/cmd_app.md:13`) reflect the new behavior. The cobra help was not updated.

Fix: update the Long help to describe the Settled-based wait.

S2. **Narrow race: premature Settled=True after in-place config change** — `pkg/controllers/app/app/controllers/app_controller.go:319-356` [Gemini 3.1 Pro]

When a spec change updates the template ConfigMap (e.g. `kubernetes.version`), the App controller patches the ConfigMap and requeues. On the requeue it mirrors LimaVM conditions, which still carry `Running=True/Started` because the LimaVM controller has not yet detected the template change. If the engine controller has already stamped `ContainerEngineReady` at the new generation (it just re-confirms its existing connection), `computeAppConditions` sees a terminal Running reason with a current-generation engine condition and sets `Settled=True`.

This can cause `rdd set` to return before the VM restarts with the new config.

Mitigating factors: the previous code never waited for non-running property changes (the explicit TODO this commit addresses), so the new behavior is strictly better. The window is narrow — the LimaVM controller detects template changes quickly via `handleTemplateUpdate` and sets `restartNeeded`, which changes Running and causes a subsequent App reconcile to correct Settled. In practice, this race requires the engine reconcile and App requeue to both complete before the LimaVM controller processes the ConfigMap watch event.

Fix: the App controller could check whether the LimaVM's `observedTemplateResourceVersion` matches the current ConfigMap's resourceVersion before allowing Settled=True, but this adds cross-controller coupling. A simpler approach: return `ctrl.Result{RequeueAfter: ...}` instead of `Requeue: true` after the template patch, giving the LimaVM controller time to process.

S3. **Settled message misleading for terminal error states** — `pkg/controllers/app/app/controllers/app_controller.go:419-426` [Claude Opus 4.6]

```go
case desiredRunning && runningCond.Reason != "Started":
    settled.Status = metav1.ConditionFalse
    settled.Reason = runningCond.Reason
    settled.Message = "LimaVM has not yet reached Started"
```

When the reason is a terminal failure like `StartFailed`, the message "has not yet reached Started" implies progress that will not happen. The `Reason` field carries the correct signal, but `rdd set` users see only the timeout error and need to inspect the condition separately.

Fix: propagate the upstream message for failure reasons:

```diff
-		settled.Message = "LimaVM has not yet reached Started"
+		if strings.HasSuffix(runningCond.Reason, "Failed") {
+			settled.Message = runningCond.Message
+		} else {
+			settled.Message = "LimaVM has not yet reached Started"
+		}
```

---

## Design Observations

### Strengths

1. **Single Settled condition replaces per-property mapping** `(in-scope)` [Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro] — The old `waitForDesiredState` hardcoded a `running → ContainerEngineReady/Running` mapping, making non-running changes unwaiteable. Moving the aggregation server-side means every property change gets correct wait semantics without CLI changes.

2. **ObservedGeneration gating prevents stale-snapshot races** `(in-scope)` [Codex GPT 5.4] — Stamping `ObservedGeneration` on synthesized conditions and filtering on `gen >= minGen` in the CLI ensures a stale `Settled=True` from a prior reconcile cannot prematurely satisfy the wait.

3. **Clean Ready/Settled separation** `(in-scope)` [Claude Opus 4.6] — Ready tracks live operational state (flips on degradation without a spec change), Settled tracks reconcile convergence (gates on generation and terminal reasons). The two serve different consumers — Ready for dashboards, Settled for CLI waits — and the code keeps them independent.

4. **Stopped-app engine bypass** `(in-scope)` [Claude Opus 4.6] — The `!desiredRunning` case at `app_controller.go:431-438` correctly short-circuits the engine check: a stopped VM cannot produce a fresh `ContainerEngineReady`, so gating on it would cause a permanent hang.

### Concerns

1. **Controller dependency source-of-truth split** `(in-scope)` [Codex GPT 5.4] — The CLI used to discover controller presence from the ConfigMap (runtime truth); this commit moves that decision into the App controller but uses the compile-time registry. Reusing the discovery ConfigMap or passing the enabled set from manager startup would keep controller-selection semantics in one place.

---

## Testing Assessment

1. **No test for `--controllers=-engine` with `rdd set running=true`**: No integration test exercises the `EngineEnabled=true, engine-not-running` configuration that triggers I1. A test with `setup_rdd_control_plane "app,lima"` followed by `rdd set running=true` would cover it.

2. **No test for `StartFailed` interaction with Settled**: The Settled condition stays False with a non-descriptive message when the VM fails to start (S3). A test forcing a start failure and verifying the timeout exit code (and the Settled condition's reason) would add confidence.

3. **Existing coverage is solid**: `engine-docker.bats` exercises the full Settled wait for `running=true`, `running=false`, and the containerd backend swap. `set.bats` correctly uses `--wait=false` because it runs without lima or engine controllers. The test header comment at `set.bats:7-15` explains the rationale.

---

## Documentation Assessment

The design docs in `api_app.md` and `cmd_app.md` accurately reflect the new behavior. The condition table in `api_app.md` comprehensively covers all Ready/Settled state transitions. The `computeAppConditions` docstring (lines 358-380) is thorough. The remaining gap is the cobra help text in `set.go:70-74` (S1).

---

## Acknowledged Limitations

- **`set.bats` requires `--wait=false`**: The test header comment at `set.bats:7-15` acknowledges that this suite runs with only the app controller (no lima, no engine), so the App reconciler fails to start and Settled is never written. The wait semantics are covered by `engine-docker.bats`.

---

## Agent Performance Retro

### [Claude]

Claude traced the `hasRegisteredController` → `shouldEnableController` discrepancy through both the controller registration path and the service startup path, citing specific lines in `service.go`. It was the only agent to flag the misleading Settled message for terminal errors (S3), a detail the others passed over despite reviewing the same switch cases. Its severity calibration was the most conservative — it classified the stale help text as Important rather than Suggestion.

### [Codex]

Codex produced the shortest review but every finding landed. Its analysis of I1 was the most thorough, tracing the issue through the discovery design doc and noting the contradiction with the documented external-controller discovery pattern. It was the only agent to cross-reference `docs/design/discovery.md:56`. Coverage summary was file-level precise.

### [Gemini]

Gemini independently found I1 and was the only agent to identify the template-mirroring race (S2). Its analysis of the `ObservedGeneration` stamping was mechanically correct but conflated two issues: the stamping on mirrored conditions (cosmetic) and the Settled switch logic not distinguishing "Started with old config" from "Started with current config" (functional). Its C2 severity (Critical) was too high — the prior code never waited for non-running property changes, making the new behavior strictly better despite the narrow race. Its S1 (empty reason fallback) was dropped as a false positive: CRD validation and controller logic guarantee non-empty reasons. Gemini's stderr shows a "File not found" error from an attempted file read; this did not affect output quality.

### Summary

| | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 6m 52s | 3m 28s | 4m 36s |
| Findings | 1I 1S | 1I 1S | 1I 1S |
| Tool calls | 28 (Read 14, Bash 8, Agent 3) | 36 (shell 36) | 28 (grep_search 11, run_shell_command 8, read_file 7) |
| Design observations | 3 | 2 | 1 |
| False positives | 0 | 0 | 1 |
| Unique insights | 1 | 0 | 1 |
| Files reviewed | 8 | 8 | 8 |
| Coverage misses | 0 | 0 | 0 |
| **Totals** | **1I 1S** | **1I 1S** | **1I 1S** |
| Downgraded | 1 (I→S) | 0 | 2 (I→S) |
| Dropped | 0 | 0 | 1 |


**Reconciliation:** Gemini C1 (EngineEnabled): critical → important I1 (non-default configuration, the primary use case works correctly). Gemini C2 (premature Settled): critical → suggestion S2 (narrow race, strictly better than prior behavior). Claude I2 (stale help text): important → suggestion S1 (documentation mismatch, behavior is correct). Gemini S1 (empty reason fallback): dropped as false positive.

---

## Review Process Notes

No suggestions.

---

## Appendix: Original Reviews

### [Claude] — Pass 1

I'll start by reading the changed files and tracing the logic. Let me read the key files in parallel.

Now let me read the BATS tests and design docs, and also trace into some key dependencies.

Let me trace into the engine controller and the LimaVM condition reasons to verify the Settled logic.

Let me check two critical things: (1) whether `--controllers` filtering could cause a mismatch with `hasRegisteredController`, and (2) the stale help text in set.go.

Now let me verify the `--controllers` flow more carefully and check for the `hasRegisteredController` vs runtime-enablement gap.

Now I have all the information needed. Let me compile the review.

---

## Executive Summary

This commit introduces two aggregate conditions (`Ready` and `Settled`) on the App resource and simplifies `rdd set` to unconditionally wait for `Settled=True` instead of property-specific condition checks. The design is sound — a single `Settled` condition is cleaner than the prior per-property mapping, and the `ObservedGeneration` gating correctly prevents stale-snapshot races. The primary issue is that `hasRegisteredController` checks init-time registration rather than runtime enablement, so `--controllers=-engine` causes `Settled` to hang for running apps.

## Findings

### Critical Issues

None.

### Important Issues

I1. **`hasRegisteredController` checks registration, not enablement** — `pkg/controllers/app/app/controller.go:153,167-174` (important, regression)

```go
EngineEnabled:    hasRegisteredController("engine"),
```

```go
func hasRegisteredController(name string) bool {
	for _, c := range base.GetAllControllers() {
		if c.GetName() == name {
			return true
		}
	}
	return false
}
```

`base.GetAllControllers()` at line 168 returns every controller that called `RegisterController` during `init()`, regardless of whether `--controllers` subsequently excluded it. The engine controller registers on all non-Windows platforms (`pkg/controllers/app/engine/controller.go:23-30`), but `shouldEnableController` in `pkg/service/cmd/service.go:493` can filter it out at runtime via `--controllers=*,-engine` or `--controllers=app,lima`.

When this happens, `EngineEnabled` is `true` but no one writes `ContainerEngineReady`. For a running app, `computeAppConditions` at `app_controller.go:440-448` falls through to `engineCond == nil` → `Settled=False/WaitingForEngine` → `rdd set running=true` hangs until timeout.

The prior code avoided this: `isEngineControllerEnabled` (removed in this commit from `set.go`) queried the discovery ConfigMap, which reflects actually-running controllers. This change regresses that capability.

Fix: The information is available at controller-manager startup (`service.go:757-764`) where the enabled list is computed. Pass it through, e.g., by having `RegisterWithManager` receive the set of enabled controller names and checking membership:

```diff
-		EngineEnabled:    hasRegisteredController("engine"),
+		EngineEnabled:    enabledControllers.Has("engine"),
```

Alternatively, have the shared controller manager expose a method to query its registered set, and look up "engine" there instead of in the global init-time registry.

I2. **Stale help text in `rdd set` cobra command** — `cmd/rdd/set.go:70-74` (important, gap)

```go
			By default, rdd set waits for the desired state: when running=true
			is set, it waits for the container engine to be ready; when
			running=false, it waits for the VM to stop. Pass --wait=false to
			return as soon as the patch is accepted, or --timeout=0 to wait
			indefinitely.
```

This user-facing help text (rendered by `rdd set --help`) still describes the old per-property wait behavior. Per `git blame`, lines 70-71 are from commit `1d71d4a8` and line 72 from `dbe1083d` — both pre-date this commit. The design doc (`docs/design/cmd_app.md:13`) was correctly updated to describe the new `Settled`-based wait, but the inline cobra help was not.

Fix: update the help text to match the new behavior:

```diff
-			By default, rdd set waits for the desired state: when running=true
-			is set, it waits for the container engine to be ready; when
-			running=false, it waits for the VM to stop. Pass --wait=false to
-			return as soon as the patch is accepted, or --timeout=0 to wait
-			indefinitely.
+			By default, rdd set waits for the reconcile chain to settle:
+			every property change waits for the App's Settled condition to
+			reach True at the current generation. Pass --wait=false to return
+			as soon as the patch is accepted, or --timeout=0 to wait
+			indefinitely.
```

### Suggestions

S1. **Settled message is misleading for terminal error states** — `pkg/controllers/app/app/controllers/app_controller.go:419-426` (suggestion, regression)

```go
	case desiredRunning && runningCond.Reason != "Started":
		settled.Status = metav1.ConditionFalse
		settled.Reason = runningCond.Reason
		settled.Message = "LimaVM has not yet reached Started"
	case !desiredRunning && runningCond.Reason != "Stopped":
		settled.Status = metav1.ConditionFalse
		settled.Reason = runningCond.Reason
		settled.Message = "LimaVM has not yet reached Stopped"
```

When `Reason` is a terminal error like `StartFailed` or `StopFailed`, the message "has not yet reached Started/Stopped" implies progress that will never happen. The VM failed and won't self-heal. The `Reason` field at line 420 does carry the correct signal (`StartFailed`), so an operator who reads both fields can diagnose the situation, but the message alone is misleading. `rdd set` will time out regardless, and the user sees only the timeout error.

Fix: condition the message on whether the reason is a known error terminal:

```diff
	case desiredRunning && runningCond.Reason != "Started":
		settled.Status = metav1.ConditionFalse
		settled.Reason = runningCond.Reason
-		settled.Message = "LimaVM has not yet reached Started"
+		if strings.HasSuffix(runningCond.Reason, "Failed") {
+			settled.Message = runningCond.Message
+		} else {
+			settled.Message = "LimaVM has not yet reached Started"
+		}
```

Or, more simply, always propagate the upstream message:

```diff
-		settled.Message = "LimaVM has not yet reached Started"
+		settled.Message = fmt.Sprintf("LimaVM: %s", runningCond.Message)
```

## Design Observations

### Strengths

- **Single Settled condition replaces per-property mapping.** The old `waitForDesiredState` hardcoded a `running → ContainerEngineReady/Running` mapping, making it impossible to wait for non-`running` changes (the prior TODO). Moving the aggregation server-side into `computeAppConditions` means every property change — engine swap, Kubernetes toggle, or any future addition — gets correct wait semantics for free.

- **Clean separation of Ready and Settled semantics.** Ready tracks "is the app operational right now?" (flips on live degradation even without a spec change), while Settled tracks "has the reconcile chain caught up with the current spec?" (gates on generation and terminal reasons). The two serve different consumers — Ready for liveness dashboards, Settled for CLI waits — and the code keeps them independent.

- **The `!desiredRunning` bypass for engine checks in Settled** (line 431-438) is a thoughtful edge case: a stopped VM cannot produce a fresh `ContainerEngineReady`, so gating on it would cause a permanent hang. The bypass and its comment correctly capture this asymmetry.

## Testing Assessment

1. **`--controllers=-engine` with `rdd set running=true`**: No test exercises the `EngineEnabled=true, engine-not-running` scenario. This is the configuration that triggers finding I1. An integration test with `setup_rdd_control_plane "app,lima"` followed by `rdd set running=true` would cover it.

2. **`StartFailed` / `StopFailed` interaction with Settled**: No test verifies what `rdd set` reports when the VM fails to start. The Settled condition stays False indefinitely with a non-descriptive message (finding S1). A test that forces a start failure and verifies `rdd set` exits with the timeout exit code (and that the Settled condition carries the failure reason) would add confidence.

3. **Existing coverage is solid**: `engine-docker.bats` exercises the full `Settled` wait for `running=true`, `running=false`, and the containerd backend swap. `set.bats` correctly uses `--wait=false` because it runs with `--controllers=app` (no lima/engine). The test header comment at `set.bats:7-15` explains the rationale clearly.

## Documentation Assessment

- `docs/design/api_app.md` and `docs/design/cmd_app.md` are correctly updated and match the new behavior.
- The cobra help text (`cmd/rdd/set.go:70-74`) was not updated (see finding I2).
- The `computeAppConditions` docstring (lines 358-380) is thorough and accurate.

## Acknowledged Limitations

- **`set.bats` requires `--wait=false`**: The test header comment at `set.bats:7-15` explicitly acknowledges that this suite runs with only the app controller, so Settled is never written. The `--wait=false` flag is added to every `rdd set` call accordingly. This is a known constraint of the test topology, not a code defect.

## Coverage Summary

| File | Status |
|------|--------|
| `bats/tests/32-app-controllers/engine-docker.bats` | Reviewed, no issues |
| `bats/tests/32-app-controllers/set.bats` | Reviewed, no issues |
| `cmd/rdd/set.go` | **I2** (stale help text) |
| `docs/design/api_app.md` | Reviewed, no issues |
| `docs/design/cmd_app.md` | Reviewed, no issues |
| `pkg/apis/app/v1alpha1/app_types.go` | Reviewed, no issues |
| `pkg/controllers/app/app/controller.go` | **I1** (`hasRegisteredController` vs enablement) |
| `pkg/controllers/app/app/controllers/app_controller.go` | **I1**, **S1** |
### [Codex] — Pass 1

### Executive Summary

This change centralizes `rdd set` waiting on a new aggregated `App.status.conditions[Settled]` condition and documents the new `Ready` / `Settled` semantics in the App design docs. The main regression is that the App controller now decides whether engine participation is required from the compile-time registry instead of the runtime-enabled controller set, so `rdd set` can block until timeout when `engine` is installed but disabled; the CLI help text also still describes the old wait behavior.

### Findings

#### Critical Issues

None.

#### Important Issues

I1. **`Settled` can never become true when `engine` is compiled in but disabled at runtime** — `pkg/controllers/app/app/controller.go:149-153, pkg/controllers/app/app/controllers/app_controller.go:427-447, cmd/rdd/set.go:478-486` (important, regression)

```go
if err := (&controllers.AppReconciler{
	Client:           mgr.GetClient(),
	Scheme:           mgr.GetScheme(),
	LimaTemplateData: limaTemplateData(),
	EngineEnabled:    hasRegisteredController("engine"),
}).SetupWithManager(mgr); err != nil {
	return err
}
```

`EngineEnabled` is now derived from `hasRegisteredController("engine")` at `pkg/controllers/app/app/controller.go:153`, but `hasRegisteredController` only consults the global init-time registry at `pkg/controllers/base/controller.go:91-118`. The service does not run every registered controller: it filters them through `shouldEnableController(...)` and only registers the selected subset with the manager at `pkg/service/cmd/service.go:761-804`. As a result, on Unix builds where the engine controller is compiled in but the service is started with `--controllers="lima"`, `--controllers="*,-engine"`, or split across external managers, `computeAppConditions()` still takes the engine-required branch at `pkg/controllers/app/app/controllers/app_controller.go:427-447` and waits for `ContainerEngineReady` that no controller will ever write. `rdd set` now waits only for `Settled=True` at `cmd/rdd/set.go:478-486`, so these configurations regress from completing to timing out. This also contradicts the discovery design note that later-attaching external controllers must be discovered from the ConfigMap rather than inferred from local registration (`docs/design/discovery.md:56`).

Fix: derive engine availability from the runtime-enabled controller set, not the local registry. The safest fix is to restore the previous discovery-based check against `rdd-controller-manager` data when computing CLI wait behavior, or to pass the actual enabled-controller list into `AppReconciler` from manager startup instead of `hasRegisteredController(...)`. Any fix must distinguish:
- compiled but disabled controllers
- controllers enabled in another manager
- Windows builds where `engine` never registers

#### Suggestions

S1. **`rdd set --help` still documents the removed property-specific wait mapping** — `cmd/rdd/set.go:70-73` (suggestion, regression)

```go
By default, rdd set waits for the desired state: when running=true
is set, it waits for the container engine to be ready; when
running=false, it waits for the VM to stop. Pass --wait=false to
return as soon as the patch is accepted, or --timeout=0 to wait
```

The implementation now waits only for `Settled=True` at `cmd/rdd/set.go:478-486`, and the design docs were updated accordingly in `docs/design/cmd_app.md`. Leaving the old text in `cmd/rdd/set.go:70-73` makes `rdd set --help` inaccurate on the same commit that changed the behavior. The current help test in `bats/tests/32-app-controllers/set.bats:35-40` only checks that property names are listed, so this mismatch can ship silently.

Fix: update the long help text to describe the `Settled`-based wait and add an assertion in the help test for that wording, for example:

```diff
 By default, rdd set waits for the desired state: when running=true
-is set, it waits for the container engine to be ready; when
-running=false, it waits for the VM to stop. Pass --wait=false to
-return as soon as the patch is accepted, or --timeout=0 to wait
-indefinitely.
+is set, it waits for the App to report Settled=True at the new
+generation. Pass --wait=false to return as soon as the patch is
+accepted, or --timeout=0 to wait indefinitely.
```

### Design Observations

**Concerns**

1. **Controller dependency discovery is now split across layers** `(in-scope)`  
The CLI used to ask discovery which controllers were enabled before deciding what to wait for; this change moved that decision into the App controller, but it uses a different source of truth (the in-process registry). Reusing the discovery ConfigMap or plumbing the actual enabled set into reconcilers would eliminate this class of “compiled vs enabled vs externally attached” mismatches and keep controller-selection semantics in one place.

**Strengths**

1. Aggregating wait semantics into `AppConditionSettled` is a good direction. It removes the CLI’s hardcoded property-to-condition matrix and gives other clients a single condition to watch.

2. Stamping `ObservedGeneration` onto synthesized conditions is the right guard against stale status snapshots. That part of the design is sound and worth preserving.

### Testing Assessment

1. There is no integration test for `rdd set` with `engine` compiled in but disabled at runtime, such as `setup_rdd_control_plane "lima"` or `rdd svc start --controllers="*,-engine"`. That is the highest-risk untested path because it now triggers I1.
2. There is no test that `rdd set --help` describes `Settled`-based waiting. `bats/tests/32-app-controllers/set.bats:35-40` only checks property discovery, so S1 was not detectable.

### Documentation Assessment

The design docs in `docs/design/api_app.md` and `docs/design/cmd_app.md` were updated consistently with the new behavior. The remaining documentation gap is the stale user-facing help text in `cmd/rdd/set.go:70-73` (S1).

### Acknowledged Limitations

- **Code comment**: “`passes --wait=false on every set so the command returns as soon as the patch is accepted: this suite runs with only the app controller (no lima, no engine) ... The real wait semantics are covered by engine-docker.bats.`” — `bats/tests/32-app-controllers/set.bats:10-15`. This explicitly acknowledges that the lightweight `set.bats` suite cannot exercise wait behavior. The change makes that limitation more relevant because wait semantics now depend on controller selection, but the only remaining wait coverage is still the full engine-enabled path.

### Coverage Summary

- `bats/tests/32-app-controllers/engine-docker.bats` — Reviewed, no issues
- `bats/tests/32-app-controllers/set.bats` — Reviewed, no issues
- `cmd/rdd/set.go` — I1, S1
- `docs/design/api_app.md` — Reviewed, no issues
- `docs/design/cmd_app.md` — Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
- `pkg/controllers/app/app/controller.go` — I1
- `pkg/controllers/app/app/controllers/app_controller.go` — I1

### [Gemini] — Pass 1

### Executive Summary

This PR aggregates `Ready` and `Settled` conditions into the App controller and simplifies the `rdd set` CLI command to wait solely on the `Settled` condition. While moving the wait logic out of the CLI and into the controller simplifies client implementations, it introduces a critical regression in how the engine controller's presence is detected. The App reconciler now relies on a global binary registry rather than the runtime flags, permanently stalling the App if the engine controller is excluded at startup. Furthermore, a pre-existing bug in condition mirroring completely defeats the new `Settled` wait for in-place property updates that trigger VM restarts.

### Findings

#### Critical Issues

C1. **`EngineEnabled` calculation ignores `--controllers` runtime flag** — `pkg/controllers/app/app/controller.go:160-169`, `pkg/controllers/app/app/controllers/app_controller.go:400`
(critical, regression)

The `hasRegisteredController` function determines if the engine controller is present by iterating over `base.GetAllControllers()`. This global registry contains every controller compiled into the binary whose `init()` function ran. It completely ignores the `--controllers` command-line flag used to filter which controllers actually start. 

If a user or test runs `rdd svc start --controllers=app,-engine`, the engine controller is not started, but `EngineEnabled` still evaluates to `true`. The App reconciler then indefinitely blocks `Ready=True` and `Settled=True` waiting for a `ContainerEngineReady` condition that will never arrive:
```go
// pkg/controllers/app/app/controllers/app_controller.go:400
case engineEnabled && (engineCond == nil || engineCond.Status != metav1.ConditionTrue):
    ready.Status = metav1.ConditionFalse
    ready.Reason = "EngineNotReady"
```
This causes `rdd set` to hang until timeout for any configuration changes.

Fix: The App reconciler must determine `EngineEnabled` dynamically by reading the discovery ConfigMap, just as the previous CLI implementation did. Since the discovery ConfigMap (`rdd-controller-manager` in `rdd-system`) accurately reflects runtime flags via `EnabledControllers`, the App reconciler should read it (and optionally cache it once found) during the reconcile loop before calling `computeAppConditions`.

C2. **Premature `ObservedGeneration` on mirrored conditions defeats `Settled` wait** — `pkg/controllers/app/app/controllers/app_controller.go:340`, `pkg/controllers/app/app/controllers/app_controller.go:390`
(critical, gap)

When the App reconciler updates the `LimaVM` template (e.g., after changing `kubernetes.version`), `app.Generation` increments. In the same pass, the reconciler mirrors the `LimaVM`'s conditions onto the `App`. However, it blindly stamps `ObservedGeneration: latest.Generation` onto the mirrored condition:
```go
changed = apimeta.SetStatusCondition(&latest.Status.Conditions, metav1.Condition{
    Type:               cond.Type,
    // ...
    ObservedGeneration: latest.Generation, // incorrectly claims it observed the new App generation
})
```
Because the `LimaVM` controller hasn't had time to process the new template yet, the `LimaVM`'s `Running` condition still reflects the old state (`Reason: "Started"`). `computeAppConditions` sees `runningCond.Reason != "Started"` evaluate to `false` and prematurely sets `Settled=True` with the new generation. `rdd set` sees this and exits immediately, completely missing the background VM restart.

Fix: `computeAppConditions` must know if the `LimaVM` state is stale relative to the `LimaVM` object's own generation.

```diff
--- pkg/controllers/app/app/controllers/app_controller.go
+++ pkg/controllers/app/app/controllers/app_controller.go
@@ -340,7 +340,11 @@
 				LastTransitionTime: cond.LastTransitionTime,
 			}) || changed
 		}
-		ready, settled := computeAppConditions(latest, r.EngineEnabled)
+		limaVMStale := false
+		for _, cond := range limaVM.Status.Conditions {
+			if cond.ObservedGeneration < limaVM.Generation {
+				limaVMStale = true
+			}
+		}
+		ready, settled := computeAppConditions(latest, r.EngineEnabled, limaVMStale)
 		changed = apimeta.SetStatusCondition(&latest.Status.Conditions, ready) || changed
 		changed = apimeta.SetStatusCondition(&latest.Status.Conditions, settled) || changed
```

Then update `computeAppConditions` to gate on `limaVMStale`:

```diff
-func computeAppConditions(app *v1alpha1.App, engineEnabled bool) (ready, settled metav1.Condition) {
+func computeAppConditions(app *v1alpha1.App, engineEnabled bool, limaVMStale bool) (ready, settled metav1.Condition) {
// ...
 	switch {
+	case limaVMStale:
+		settled.Status = metav1.ConditionFalse
+		settled.Reason = "WaitingForLimaVM"
+		settled.Message = "LimaVM has not yet observed the latest configuration"
 	case runningCond == nil:
```

#### Important Issues
None.

#### Suggestions

S1. **Fallback reason in `computeAppConditions` for empty condition reasons** — `pkg/controllers/app/app/controllers/app_controller.go:417`
(suggestion, enhancement)

If `runningCond.Reason` happens to be empty (e.g., if a manual edit or bug clears it), `settled.Reason` will also be empty. K8s API conventions require `Reason` to be a non-empty `CamelCase` string. 

Fix: Provide a safe fallback:
```go
	case desiredRunning && runningCond.Reason != "Started":
		settled.Status = metav1.ConditionFalse
		reason := runningCond.Reason
		if reason == "" {
			reason = "Starting"
		}
		settled.Reason = reason
```

### Design Observations

**Strengths**
- Moving the wait condition logic from the CLI (`cmd/rdd/set.go`) into the controller simplifies client integration. Any consumer of the API can now simply watch for `Settled=True` without needing to recreate the complex conditional polling logic that `rdd set` previously employed.

### Testing Assessment

1. The test suite in `bats/tests/32-app-controllers/set.bats` was updated to pass `--wait=false` for all property mutations. As explicitly commented by the author, this is because the App reconciler fails to start in that suite (due to running with `--controllers=app` and missing the `LimaVM` CRD dependency). While this allows the CLI tests to pass, it leaves the actual `Settled` wait logic completely untested for in-place property mutations that restart the VM. A test should be added (perhaps in `engine-docker.bats` where the full control plane is running) to explicitly assert that `rdd set kubernetes.version=...` blocks until the VM fully restarts.

### Documentation Assessment

The documentation updates in `api_app.md` and `cmd_app.md` accurately reflect the intended behavior of the `Ready` and `Settled` conditions. 

### Acknowledged Limitations

- **App reconciler fails to start in isolated tests** 
  - **Code comment**: `bats/tests/32-app-controllers/set.bats:9-13` - *"this suite runs with only the app controller (no lima, no engine), so the App reconciler fails to start and Settled is never written."* The author acknowledges that testing the wait logic here is impossible under the current test harness structure and delegates wait-testing to `engine-docker.bats`.

### Coverage Summary

- `bats/tests/32-app-controllers/engine-docker.bats`: Reviewed, no issues
- `bats/tests/32-app-controllers/set.bats`: Reviewed, no issues
- `cmd/rdd/set.go`: Reviewed, no issues
- `docs/design/api_app.md`: Reviewed, no issues
- `docs/design/cmd_app.md`: Reviewed, no issues
- `pkg/apis/app/v1alpha1/app_types.go`: Reviewed, no issues
- `pkg/controllers/app/app/controller.go`: C1
- `pkg/controllers/app/app/controllers/app_controller.go`: C1, C2, S1

## Resolution

| | |
|---|---|
| **Addressed** | 2026-04-16 |
| **Commit** | c9cd181 — app: add Settled condition, wait for it in `rdd set` |

| # | Finding | Action |
|---|---------|--------|
| 1 | Important #1: `hasRegisteredController` checks registration, not runtime enablement | Fixed |
| 2 | Suggestion #1: Stale help text in `rdd set` cobra command | Fixed |
| 3 | Suggestion #2: Narrow race: premature Settled=True after in-place config change | Commented |
| 4 | Suggestion #3: Settled message misleading for terminal error states | Fixed |
| 5 | Testing Gap #1: No test for `--controllers=-engine` with `rdd set running=true` | Test written |
| 6 | Testing Gap #2: No test for `StartFailed` interaction with Settled | Test written |
