# Deep Review: 20260414-232915-pr-309

| | |
|---|---|
| **Date** | 2026-04-14 23:29 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 3 (of PR #309) |
| **Author** | [@jandubois](https://github.com/jandubois) |
| **PR** | [#309](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/309) — engine: mirror Docker container engine state into Kubernetes |
| **Branch** | `engine-v3` |
| **Commits** | `587bfe2` engine: mirror Docker container engine state into Kubernetes |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — one nil-deref and one Windows-wait regression worth fixing before merge; the cross-group registration and namespace self-healing gaps can ship as follow-ups if acknowledged. |
| **Wall-clock time** | `32 min 44 s` |

---

## Executive Summary

PR #309 adds the engine controller that mirrors Docker containers, images, and volumes into `containers.rancherdesktop.io` resources, teaches `rdd set` to wait on post-write App conditions, and adds a discovery ready annotation so CRD consumers stop racing startup. The engine reconciler is well-structured: SSA field-owner separation, retry-on-conflict on the multi-writer App status, daemon-clock-anchored Docker event replay, and idempotent fullSync preserving mirror identities across reconnects.

Four important issues surfaced: (1) `rdd set running=true` now hangs for the full 5-minute timeout on Windows because no controller writes `ContainerEngineReady` there; (2) the `rancher-desktop` namespace and `containernamespace/moby` are created once in `fullSync` and never self-healed if a user deletes them mid-run; (3) the new controller is registered under API group `app` but watches `Container`/`Image`/`Volume` from the `containers` group, so selecting `--controllers=app` starts it without the CRDs it depends on; and (4) `inspect.Config.Labels` is dereferenced without a nil guard on the container and image apply paths, even though `NetworkSettings` is guarded a few lines down. Three lower-severity findings cover latent watcher-context decoupling, a BATS command substitution at file scope, and a webhook enum check that duplicates CRD validation.

Structure: 0 critical, 4 important, 3 suggestions.

---

## Critical Issues

None.

---

## Important Issues

I1. **`rdd set running=true` hangs on Windows because no controller writes `ContainerEngineReady`** — `cmd/rdd/set.go:316-319,474-479` [Codex GPT 5.4]

```go
// cmd/rdd/set.go:316-319
if dryRun || timeout == 0 {
    return nil
}
return waitForDesiredState(ctx, restConfig, properties, timeout, targetGen)
```

```go
// cmd/rdd/set.go:474-479
if runningVal == "true" {
    logrus.Info("Waiting for container engine to be ready")
    return watchCondition(ctx, config, func(obj *unstructured.Unstructured) bool {
        status, gen, present := conditionInfo(obj, "ContainerEngineReady")
        return present && gen >= minGen && status == metav1.ConditionTrue
    })
}
```

`rdd set` now calls `waitForDesiredState()` after every successful write unless `--timeout=0`, and the `running=true` branch accepts only `ContainerEngineReady=True`.

On Windows, `pkg/controllers/app/engine/controller.go:23-29` returns from `init()` before registering the engine controller at all. `AppReconciler` only mirrors LimaVM conditions and never synthesizes `ContainerEngineReady` (see `pkg/controllers/app/app/controllers/app_controller.go:302-329`), so on Windows the condition has no writer and `rdd set running=true` hangs for the full 5-minute default timeout before erroring out.

Windows support is still a TODO (the engine controller comment calls out that "WSL2 support" is pending), but a 5-minute UX regression on any Windows build path is worse than a clean error. Codex flagged this as Critical; we downgrade to Important because Windows is not yet a shipping target, but it still wants addressing before Windows work lands.

Fix: either (a) register a Windows stub engine reconciler that stamps a terminal `ContainerEngineReady=True, Reason=NotApplicable` once the App is Running, matching the non-moby path at `engine_reconciler.go:136-147`, or (b) have `waitForDesiredState` detect that the engine controller is not enabled (for example, by consulting the discovery ConfigMap's `EnabledControllers`) and skip the condition wait. Waiting on a condition with no writer is not safe.

---

I2. **`rancher-desktop` namespace and `containernamespace/moby` are created once and never repaired** — `pkg/controllers/app/engine/controllers/engine_reconciler.go:523-530` [Codex GPT 5.4] [Claude Opus 4.6]

```go
return ctrl.NewControllerManagedBy(mgr).
    For(&appv1alpha1.App{}).
    Named("engine-reconciler").
    WatchesRawSource(source.Channel(r.reconcileChan, enqueueApp)).
    Watches(&containersv1alpha1.Container{}, enqueueApp).
    Watches(&containersv1alpha1.Image{}, enqueueApp).
    Watches(&containersv1alpha1.Volume{}, enqueueApp).
    Complete(r)
```

The engine reconciler watches `App`, `Container`, `Image`, and `Volume` only. `corev1.Namespace` and `containersv1alpha1.ContainerNamespace` are absent from the watch set.

`ensureNamespace` (`sync_volumes.go:40-54`) and `syncContainerNamespace` (`sync_volumes.go:61-66`) are only reached from `fullSync` (`docker_watcher.go:426-451`), which runs at watcher startup. Once the watcher is running, a `kubectl delete namespace rancher-desktop` or `kubectl delete containernamespace moby` produces no event that the reconciler cares about. The namespace stays gone — which cascade-deletes every mirror resource in it — and the ContainerNamespace stays gone until the next VM stop/start or Docker reconnect.

Deleting the namespace also destroys every mirror resource without giving the mirror finalizer a chance to forward deletes to Docker, which is a correctness issue the design otherwise takes care to preserve.

Fix: add `Watches(&corev1.Namespace{}, ...)` filtered to `rancher-desktop` and `Watches(&containersv1alpha1.ContainerNamespace{}, enqueueApp)`, then in the steady-state reconcile path (after `startWatcher`) unconditionally call `w.ensureNamespace(ctx)` and `w.syncContainerNamespace(ctx)`. Both are idempotent SSA applies with Get-guards, so no-op cost is negligible.

---

I3. **Engine controller registered under API group `app` but depends on `containers` CRDs** — `pkg/controllers/app/engine/controller.go:32-55`, `pkg/service/cmd/service.go:493-554` [Codex GPT 5.4]

```go
// pkg/controllers/app/engine/controller.go
const (
    controllerName = "engine"
    apiGroup       = "app"
)

func (c *controller) GetCRDData() string {
    return ""
}
```

`shouldEnableController` selects controllers by either name or API group. Starting with `--controllers=app` therefore enables the engine controller without enabling the `containers` API group. But the engine reconciler's `SetupWithManager` calls `Watches(&containersv1alpha1.Container{}, ...)`, `Image`, and `Volume` — if those CRDs aren't installed, the manager start fails (the watch cache can't establish) or the reconciler endlessly re-attempts.

The controller also has no CRDs of its own (`GetCRDData()` returns `""`), so the dependency is purely on another controller group's CRDs. That relationship is invisible to `shouldEnableController` and to the `--controllers` help text.

`cmd/app-controller/main.go:12-20` imports the `engine` package into the standalone `app-controller` binary too, which means the standalone binary now also needs the `containers` API to be installed elsewhere, or it will fail at manager start.

Fix options, in order of preference:
1. Move the engine controller into the `containers` group (rename `apiGroup = "containers"`). It mirrors Docker state *into* the containers API, so this is where it conceptually belongs. This also makes the dependency implicit.
2. Teach `shouldEnableController` to auto-enable cross-group dependencies, or refuse startup with a clear error if `engine` is selected without `containers`.
3. At minimum, document the dependency in the `--controllers` help text.

---

I4. **Nil-pointer dereference on `inspect.Config.Labels` for containers and images** — `pkg/controllers/app/engine/controllers/sync_containers.go:158`, `pkg/controllers/app/engine/controllers/sync_images.go:163` [Claude Opus 4.6]

```go
// sync_containers.go:152-161
statusApply := containersv1alpha1apply.ContainerStatus().
    WithName(name).
    WithNamespace(namespace).
    WithPath(inspect.Path).
    WithArgs(inspect.Args...).
    WithImage(inspect.Image).
    WithLabels(inspect.Config.Labels).
    WithStatus(mapDockerContainerState(string(inspect.State.Status))).
    WithPid(int32(inspect.State.Pid)).
    WithExitCode(int32(inspect.State.ExitCode))
```

```go
// sync_images.go:157-163
statusApply := containersv1alpha1apply.ImageStatus().
    WithID(inspect.ID).
    WithRepoDigests(digests...).
    WithArchitecture(inspect.Architecture).
    WithOS(inspect.Os).
    WithSize(inspect.Size).
    WithLabels(inspect.Config.Labels)
```

`mobycontainer.InspectResponse.Config` is `*container.Config` and `mobyimage.InspectResponse.Config` is `*dockerspec.DockerOCIImageConfig` (verified in moby `api@v1.54.1`). Both are pointers that can be nil — the container path also explicitly guards `inspect.NetworkSettings` a few lines down (`sync_containers.go:182`), showing the authors already know these fields are nilable.

Nil `Config` is unlikely for a freshly inspected container but possible for legacy/multi-platform images or when a Docker-side bug corrupts an image record. A panic in the Docker watcher goroutine is partially caught by the `recover` at `docker_watcher.go:144-149`, but that recover only logs and returns — the watcher goroutine exits, the reconciler detects a dead watcher and reconnects, and the panic rechecks the same input on the next sync, reproducing the crash. That's an infinite reconnect loop on a single bad image, not a clean failure.

Fix: guard both accesses. The pattern used elsewhere in this file is a local nil-check before building the apply config:

```go
var labels map[string]string
if inspect.Config != nil {
    labels = inspect.Config.Labels
}
// ...WithLabels(labels)
```

Apply the same pattern in `imageStatusFromInspect`.

---

## Suggestions

S1. **Webhook state-enum check duplicates the CRD schema** — `pkg/controllers/containers/container/container_webhooks.go:37-47` [Claude Opus 4.6] [Gemini 3.1 Pro]

```go
if oldContainer.Spec.State != newContainer.Spec.State {
    switch newContainer.Spec.State {
    case v1alpha1.ContainerStatusCreated, v1alpha1.ContainerStatusRunning, v1alpha1.ContainerStatusUnknown:
        // Valid transition.
    default:
        return nil, fmt.Errorf("invalid target state %q: must be %q, %q, or %q", ...)
    }
```

`spec.state` is declared with `+kubebuilder:validation:Enum=created;running;unknown` (`container_types.go:69`), and the CRD schema runs before validating admission webhooks. Any out-of-enum value is rejected with a 400 before this webhook executes, so the `switch` is unreachable.

Gemini rated this Important; the code is dead but harmless, so Suggestion fits better.

Fix: drop the `switch`. The spec-normalisation `DeepEqual` check is the only work the webhook needs to do for state transitions.

---

S2. **`engine.bats` uses `$(rdd ...)` substitution at file scope** — `bats/tests/32-app-controllers/engine.bats:12-14` [Claude Opus 4.6] [Gemini 3.1 Pro]

```bash
NAMESPACE="rancher-desktop"
DOCKER_HOST="unix://$(rdd svc paths docker_socket)"
export DOCKER_HOST
```

`DOCKER_HOST` is assigned via command substitution at file scope, which runs during the BATS parse phase — before `skip_on_windows` in `local_setup_file`, and before `run -0` can surface a clean error. Per the repo's BATS style rules, command substitution outside `run` hides failures in the empty string. On Windows (where the test is meant to skip) this still executes `rdd svc paths docker_socket`, which is wasted work at best and a misleading failure at worst.

Gemini rated this Important; Claude rated it a Suggestion, and style-violation severity is better classed as Suggestion.

Fix: move into `local_setup_file`, after `skip_on_windows`:

```bash
local_setup_file() {
    skip_on_windows
    run -0 rdd svc paths docker_socket
    DOCKER_HOST="unix://${output}"
    export DOCKER_HOST
    rdd svc delete
    rdd set running=true
}
```

---

S3. **Docker watcher goroutine bound to the `Reconcile` context** — `pkg/controllers/app/engine/controllers/engine_reconciler.go:159-168`, `docker_watcher.go:45-95` [Gemini 3.1 Pro]

```go
// engine_reconciler.go:159-168
if !watcherRunning {
    log.Info("App is running, starting Docker watcher")
    if err := r.startWatcher(ctx); err != nil {
        // ...
    }
}

// docker_watcher.go:62
watchCtx, watchCancel := context.WithCancel(ctx)
```

`r.startWatcher(ctx)` is called with the ctx passed into `Reconcile`, and `newDockerWatcher` derives `watchCtx` from it. Currently controller-runtime does not per-request cancel a Reconcile ctx (it's derived from the manager ctx), so the watcher goroutine survives a Reconcile return. But it's a latent coupling: any future `ReconciliationTimeout`, per-request deadline, or middleware that adds a cancel would instantly kill the watcher when Reconcile returns, and the reconciler would immediately restart it, cycling forever.

Fix: decouple the watcher from the request ctx with `context.WithoutCancel`:

```go
w, err := newDockerWatcher(context.WithoutCancel(ctx), r.Client, r.reconcileChan)
```

This preserves the logger and other context values but isolates cancellation.

---

## Design Observations

### Strengths

- **SSA field-owner separation** (`controllerName` vs `finalizerFieldOwner`) (in-scope) [Claude Opus 4.6] [Gemini 3.1 Pro] — The insight that a finalizer-only SSA apply through the main field owner would release ownership of `spec` and fail CRD required-field validation is non-obvious and correctly addressed by running finalizer-only applies through a distinct field manager. Worth documenting as a pattern for other controllers.
- **Daemon-clock anchoring for Docker events** (in-scope) [Claude Opus 4.6] — `dockerEventsSince` uses `Info.SystemTime` instead of `time.Now()` for the `Since` filter, avoiding host/guest clock skew that would silently drop events inside the skew window. The fallback to a 2-minute-biased host clock is a sensible safety net for Info failures.
- **Idempotent fullSync preserves mirror identities** (in-scope) [Claude Opus 4.6] — Container IDs, image hashes, and volume hashes are stable across watcher reconnects, so transient Docker disconnects cause no resource churn for downstream clients. Combined with the `alive()` detection + reconnect path in the reconciler, this handles daemon restarts cleanly.
- **`observedGeneration` gating in `rdd set`** (in-scope) [Claude Opus 4.6] [Codex GPT 5.4] [Gemini 3.1 Pro] — Filtering condition snapshots by `gen >= minGen` rejects stale `ContainerEngineReady=True` from a previous backend that would otherwise prematurely satisfy the wait. Both `EngineReconciler` and `AppReconciler` stamp `ObservedGeneration` consistently on updates, which is what makes the filter reliable.
- **Stable sort before SSA apply on atomic lists** (in-scope) [Gemini 3.1 Pro] — `ContainerPort` bindings, port names, and `RepoDigests` are sorted before each apply, avoiding spurious `resourceVersion` churn caused by Go's randomized map iteration order combined with atomic list semantics.
- **`retry.RetryOnConflict` on cross-controller App status writes** (in-scope) [Claude Opus 4.6] — Both `AppReconciler.Reconcile` and `EngineReconciler.setEngineCondition` wrap their App status updates in retry + re-Get, which is the correct response to two controllers writing different conditions on the same object outside controller-runtime's per-controller serialisation.

### Concerns

- **Per-reconcile List cost for container specs and finalizers** (future) [Claude Opus 4.6] — `reconcileContainerSpecs` and `processFinalizers` each List all Containers/Images/Volumes per reconcile. Every watched child event triggers a reconcile, so total cost is O(N) per child event. The `engine_reconciler.go:181-185` comment already flags this and names the long-term fix (per-resource reconcilers with watch predicates on `deletionTimestamp` set and `spec.state` changed). Worth prioritising once mirror counts exceed ~100 per type, but fine as-is for dev workstations.

---

## Testing Assessment

Test coverage is strong for the new functionality. The new BATS suite `bats/tests/32-app-controllers/engine.bats` covers image pull/untag/dangling flows, container lifecycle (create/stop/rm), volumes with RFC-1123-invalid names, bidirectional deletion (K8s→Docker and Docker→K8s), `spec.state` transitions (running↔created, pause handling), the non-moby `NotApplicable` path, and `rdd set` wait behaviour (already-stopped, `--timeout=0`). Unit tests cover `parseContainerName` and the `InitDiscovery`/`MarkControlPlaneReady` lifecycle.

Codex ran `go test ./pkg/controllers/app/engine/controllers ./pkg/service/controllers ./pkg/controllers/containers/container ./pkg/controllers/app/app/controllers` during its review; all four packages passed.

Untested scenarios ranked by risk:

1. `rdd set running=true` on Windows (the regression in I1). BATS skips Windows, so no CI gate catches it.
2. `--controllers=app` startup without the `containers` group (I3). No selective-start test exercises this.
3. Deleting `namespace/rancher-desktop` or `containernamespace/moby` while the app is running (I2). No self-healing test.
4. Docker daemon disconnect during the event stream (watcher death → reconnect cycle). Only implicitly covered through VM stop/start.
5. A corrupt/legacy image with nil `Config` (I4). Can't easily be simulated against a real Docker engine; a unit-level test against a hand-built inspect response would be cheap and catch the regression.

---

## Documentation Assessment

Design docs are updated comprehensively: `api_app.md` (new `ContainerEngineReady` condition), `api_containers.md` (engine mirroring, terminology, `spec.state`, finalizer lifecycle), `cmd_app.md` (`--timeout`, wait semantics), and `discovery.md` (ready annotation). The terminology section clarifying capitalized K8s resource names vs lowercase Docker engine objects is a useful addition.

Two gaps surfaced by the review:

1. `docs/design/cmd_app.md` and `docs/design/api_app.md` describe `running=true` as waiting on `ContainerEngineReady` without noting the Windows no-writer caveat (I1).
2. The `--controllers` help text in `pkg/service/controllers/options.go:32` does not describe the `app`→`containers` dependency edge (I3).

---

## Acknowledged Limitations

- `engine_reconciler.go:181-185` — Per-reconcile `List` calls for specs and finalizers are O(N) per child event. Long-term fix is per-resource reconcilers with watch predicates.
- `cmd/rdd/set.go:460-463` — `rdd set` does not wait on property changes that trigger a VM restart without changing `running` (for example, `containerEngine.name` alone). Distinct from I1.
- `docker_watcher.go:179-190` — Transient `handleEvent` errors for image pull and volume create are logged and dropped; the mirror stays missing until the next full resync (e.g., VM cycle). A periodic fullSync tick is explicitly deferred.
- `sync_images.go:239-243` — `removeImagesByID` matches on `status.id`; a narrow orphan window exists if the metadata Apply succeeded but the status Apply failed transiently. Self-heals on the next watcher restart via `syncAllImages`.
- `engine/controller.go:23-29` — Engine controller registration skipped on Windows until WSL2 socket access is wired up. Interacts with I1.
- `engine_reconciler.go:136-147` — `ContainerEngineReady` name applies to moby only; will be renamed when containerd mirroring lands.

---

## Agent Performance Retro

### [Claude Opus 4.6]

Claude delivered the most thorough review: it caught the nil-pointer on `inspect.Config.Labels` that neither other agent found, verified it against the moby types at the right module version, and produced the most complete Coverage Summary with per-file "Trivial" vs "Reviewed, no issues" classifications. Its SSA field-owner strength call and the `observedGeneration` gating call showed the deepest engagement with the finalizer/multi-writer design. Two of its five findings overlap with Gemini at the same severity, which is a good signal of shared ground truth. It missed the Windows wait regression and the cross-group dependency — the two biggest structural issues Codex caught. Calibration was appropriate (one Important, four Suggestions, nothing inflated).

### [Codex GPT 5.4]

Codex delivered the two highest-impact structural findings — the Windows wait regression and the `app`→`containers` cross-group dependency — which nobody else saw. It was the only agent that ran `go test` on the affected packages to verify the unit tests still pass, and it correctly linked the namespace self-healing gap to the watch set. Its retro was the sharpest at connecting findings to testing gaps. The severity calibration on C1 (Windows hang) is defensible but we downgraded to Important because Windows is not currently a shipping target; worth noting that Codex's threshold for "Critical" is UX regression on a tier-1 platform, which is reasonable. It missed the nil-pointer on `inspect.Config`, which was the most concrete local defect.

### [Gemini 3.1 Pro]

Gemini hit the Google backend capacity limits (429 errors) partway through the run and retried with backoff, but still produced a structured review on time. It independently landed on two findings Claude also caught (webhook dead code, BATS substitution) and added the latent watcher-context decoupling observation. The latter is a real gap Claude and Codex missed, but Gemini inflated all three of its findings to Important despite two being style/cosmetics and one being latent — calibration was too hot. Per the noted behaviour, Gemini skipped `git blame`, so it could not distinguish regressions from pre-existing issues; Claude and Codex covered that dimension.

### Summary

| | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 8m 02s | 7m 57s | 13m 02s |
| Findings | 1I 3S | 3I | 3S |
| Tool calls | 45 (Read 23, Grep 11, Bash 10) | 86 (shell 85, stdin 1) | 21 (run_shell_command 14, grep_search 4, read_file 3) |
| Design observations | 6 | 2 | 3 |
| False positives | 0 | 0 | 0 |
| Unique insights | 1 | 3 | 1 |
| Files reviewed | 38 | 38 | 38 |
| Coverage misses | 0 | 0 | 0 |
| **Totals** | **1I 3S** | **3I** | **3S** |
| Downgraded | 0 | 1 (I→S) | 3 (I→S) |
| Dropped | 0 | 0 | 0 |


Overall, Claude and Codex provided complementary value: Claude caught the local nil-deref and gave the cleanest coverage picture; Codex caught the structural regressions (Windows wait, cross-group dependency, namespace self-healing). Gemini contributed one unique latent finding and a second opinion on two Claude suggestions, at the cost of inflated severities. For a three-agent run on a 162 KB diff this was a healthy spread — the three severity votes per finding let us calibrate confidently.

**Reconciliation** — Codex P1 `rdd set running=true` Windows hang: critical → important I1 (Windows is not yet a shipping target, so it's a UX regression rather than a correctness ship-stopper). Gemini P1 watcher-context coupling: important → suggestion S3 (latent, not currently triggered). Gemini P1 BATS substitution: important → suggestion S2 (style violation, no runtime impact). Gemini P1 webhook dead code: important → suggestion S1 (unreachable but harmless). No findings were dropped as false positives.

---

## Review Process Notes

### Skill improvements

- **Add a dimension for cross-group / cross-controller dependency checking.** When a controller registers under API group *X* but its `SetupWithManager` watches types from group *Y*, the selectable startup modes have an invisible edge: `--controllers=X` starts the controller without the CRDs it depends on. Recognise the pattern in code by reading each controller's `apiGroup` (or equivalent group identifier) and checking that every type the reconciler watches is declared in the same group, or that the dependency is made explicit in controller selection logic. Flag when they diverge. **Why:** controllers within multi-group managers commonly fail at startup only when a user picks a non-default controller subset, and default-path tests never exercise that combination.

### Repo context updates

(none — `deep-review-context.md` was accurate for this review)

---

## Appendix: Original Reviews

### [Claude Opus 4.6] — Pass 1

See `/tmp/deep-review-WxUhwx/review-claude-pass-1.md` (reproduced below):

I1 (nil-pointer on `inspect.Config.Labels`), S1 (webhook enum check dead code), S2 (BATS `$(rdd ...)` at file scope), S3 (engine reconciler does not watch `ContainerNamespace`), S4 (related SetupWithManager consistency). Verdict: Approve with minor fixes. Full review (~14 KB) available in the workdir; key code snippets and fixes are reproduced in the consolidated findings above.

### [Codex GPT 5.4] — Pass 1

C1 (`rdd set running=true` hangs on Windows), I1 (namespace/ContainerNamespace not self-healed), I2 (`engine` registered in `app` group but depends on `containers`). Also ran `go test` on the four affected controller packages — all passed. Full review (~11 KB) in `/tmp/deep-review-WxUhwx/review-codex-pass-1.md`; key snippets reproduced above.

### [Gemini 3.1 Pro] — Pass 1

I1 (watcher goroutine bound to Reconcile context — latent), I2 (BATS command substitution at file scope), I3 (webhook unreachable validation). Hit Google model capacity 429s on first attempt, retried with backoff, completed successfully. Full review (~9 KB) in `/tmp/deep-review-WxUhwx/review-gemini-pass-1.md`; key snippets reproduced above.

## Resolution

| | |
|---|---|
| **Addressed** | 2026-04-15 |
| **Commit** | `a5235c8 engine: mirror Docker container engine state into Kubernetes` (fixup autosquashed into the original) |

| # | Finding | Action |
|---|---------|--------|
| 1 | Important #1: `rdd set running=true` hangs 5 min on Windows | Fixed (waitForDesiredState queries discovery and falls back to Running=True when engine is not enabled) |
| 2 | Important #2: `rancher-desktop` namespace / `containernamespace/moby` not self-healed | Skipped (same decision as round 1; cascade destroys mirrors without running finalizers, so self-healing is cosmetic) |
| 3 | Important #3: Engine controller in `app` group depends on `containers` CRDs | Fixed (changed `apiGroup` from `"app"` to `"containers"`; updated `containers-mock.bats` to `containers,-engine` so the mock controller keeps exclusive ownership of the mirror resources) |
| 4 | Important #4: Nil-pointer dereference on `inspect.Config.Labels` | Fixed (guarded both `sync_containers.go:applyContainer` and `sync_images.go:imageStatusFromInspect`) |
| 5 | Suggestion #1: Webhook state-enum switch is dead code | Fixed (removed the switch; the normalised-state DeepEqual check already enforces the real invariant) |
| 6 | Suggestion #2: BATS `$(rdd ...)` at file scope | Skipped (third-round oscillation; decision stands — `rdd svc paths docker_socket` is a pure computation with no real failure mode) |
| 7 | Suggestion #3: Docker watcher goroutine bound to Reconcile context | Fixed (`startWatcher` now wraps the Reconcile ctx in `context.WithoutCancel`) |
| 8 | Testing gaps (5 scenarios) | Skipped (batch — covered by the accompanying fixes or requiring fault-injection infrastructure) |
| 9 | Documentation gap #1: `cmd_app.md` missing engine-absent caveat | Fixed (added the `--controllers=-engine`/Windows fallback sentence) |
| 10 | Documentation gap #2: `--controllers` help text missing `app`→`containers` dependency | Skipped (moot after finding #3 — engine is now in the containers group) |
