# Deep Review: 20260514-141313-pr-350

| | |
|---|---|
| **Date** | 2026-05-14 14:13 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 2 (of target) |
| **Author** | [@Nino-K](https://github.com/Nino-K) |
| **PR** | [#350](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/350) — Wire up docker socket windows |
| **Commits** | `5681e67` Fixes test 104 stopping VM removes all mirror resources<br>`e917657` Wire Docker socket forwarding for Windows<br>`b5ab872` Bump openSUSE to v0.2.3 |
| **Reviewers** | Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default) |
| **Verdict** | **Merge with fixes** — scope the Windows Docker pipe per-instance so RDD never attaches to Docker Desktop's daemon; gate the BATS `cygpath` path on `is_msys` so the suite does not break under WSL2. @mook-as's functional report remains unresolved (author investigating). |
| **Wall-clock time** | `39 min 35 s` |

## Executive Summary

Round 2 of PR #350. The author addressed every Round 1 finding except the
`DockerSocket()` docstring (prior-round S4), which is now subsumed by the
larger pipe-ownership finding below: `waitMirrorResourcesGone` is bounded
(30 s `context.WithTimeout`) and reads the informer cache (`r.List`), the
standalone `containers-controller` binary blank-imports the engine package,
the stale `tcp/unix` wording is fixed, the engine reason strings are now
exported constants, and the BATS `cygpath`/`local` style nits are resolved.
The remaining concerns are two Important issues that survive into Round 2 as
genuinely new analysis: (1) the Windows endpoint is the **global** Docker pipe
`\\.\pipe\docker_engine`, so RDD can silently attach to — and mirror/delete
containers on — Docker Desktop's daemon if it is installed; and (2) the
re-enabled `engine-docker.bats` gates an MSYS2-only `cygpath` call on
`is_windows`, which is also true under WSL2, exactly the gap @mook-as raised
inline. The core stop/cleanup logic (`computeSettledCondition`,
`cleanupMirrorResources`, the `apiReader`/`r.List` split) is well-reasoned and
well-tested at the unit level.

---

## Critical Issues

None.

## Important Issues

I1. **Windows endpoint is the global Docker pipe — RDD can attach to Docker Desktop's daemon** — `pkg/instance/instance.go:127-142` [Codex GPT 5.5] [Gemini 3.1 Pro]

```go
var DockerSocket = sync.OnceValue(func() string {
	if runtime.GOOS == "windows" {
		return `\\.\pipe\docker_engine`
	}
	return filepath.Join(ShortDir(), "docker.sock")
})

var DockerEndpoint = sync.OnceValue(func() string {
	if runtime.GOOS == "windows" {
		return `npipe:////./pipe/docker_engine`
	}
	return "unix://" + DockerSocket()
})
```

On Unix the socket path embeds `ShortDir()`, which factors in `RDD_INSTANCE`,
so instances are isolated. On Windows both functions return the **global**
`docker_engine` pipe — and `pkg/socketbridge/host_windows.go:20-23` documents
that exact name as the one "Docker CLI **and Docker Desktop** use to reach
dockerd."

The failure chain is concrete. `socketbridge.HostBridge.Run` calls
`winio.ListenPipe(b.pipeName, nil)` and returns an error if the pipe is already
owned (`pkg/socketbridge/host_windows.go:52-56`). That error is swallowed:
`pkg/controllers/lima/limavm/controllers/hostswitch_windows.go:199-205` runs
the bridge in an errgroup goroutine that logs the failure and `return nil`.
Meanwhile the engine controller's Docker client connects to the same endpoint
(`pkg/controllers/app/engine/controllers/docker_watcher.go:55` via
`instance.DockerEndpoint()`). So with Docker Desktop installed and listening,
RDD's bridge either loses the listen race or coexists as a second pipe
instance, and RDD's engine client reaches Docker Desktop's daemon. `fullSync`
then mirrors Docker Desktop's containers into RDD's API, and a K8s-side delete
forwards `ContainerRemove` to that daemon — RDD operating destructively on a
container engine it does not own. Concurrent RDD instances (`RDD_INSTANCE=1`
and `=2`) collide the same way, though the repo's single-user calibration
makes the Docker Desktop collision the more pressing leg.

This is the same code Round 1 flagged as S4 (a docstring gap), but the Round 2
analysis is materially different: it is not the docstring, it is RDD silently
mutating a foreign daemon. Round 1's S4 (still unaddressed — the docstring at
`instance.go:123` continues to say "for this instance") is subsumed here.

Gemini additionally claimed the global pipe "breaks the VM boot sequence."
That causal leg does not hold: the bridge starts *after* the VM GUID is known
(`hostswitch_windows.go:195-205`), and @mook-as reported `docker run` /
`docker log` working — so the pipe binds. His `Running=False/Starting` stall is
a VM-side condition, not a pipe-bind failure.

Fix: scope the pipe to the instance, mirroring the Unix isolation, and pass
that name through `socketbridge.NewDockerHostBridge` instead of the hardcoded
`DockerPipeName` constant. If a well-known alias is required for bare
`docker.exe` compatibility, bind it only after proving RDD owns it, and surface
the bind failure into readiness so the engine never treats an externally-owned
pipe as RDD's endpoint.

```diff
 var DockerSocket = sync.OnceValue(func() string {
 	if runtime.GOOS == "windows" {
-		return `\\.\pipe\docker_engine`
+		return `\\.\pipe\` + Name() + `-docker_engine`
 	}
 	return filepath.Join(ShortDir(), "docker.sock")
 })

 var DockerEndpoint = sync.OnceValue(func() string {
 	if runtime.GOOS == "windows" {
-		return `npipe:////./pipe/docker_engine`
+		return `npipe:////./pipe/` + Name() + `-docker_engine`
 	}
 	return "unix://" + DockerSocket()
 })
```

I2. **`engine-docker.bats` gates an MSYS2-only `cygpath` call on `is_windows` — breaks under WSL2** — `bats/tests/32-app-controllers/engine-docker.bats:18-29,36-41` [Claude Opus 4.7] [Codex GPT 5.5]

```bash
if is_windows; then
    run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
    DOCKER_CONFIG=${output}
else
    DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
fi
```

`is_windows()` returns true for **both** MSYS2 and WSL2 —
`bats/helpers/os.bash:13-16` sets `OS=windows` when `/proc/sys/kernel/osrelease`
contains `WSL2`, and `os.bash:55` provides `is_msys()` precisely because the
two "behave differently from WSL." `cygpath` is an MSYS2 tool absent under
WSL2, so `run -0 cygpath ...` fails `local_setup_file` outright there. The
`rdd()` wrapper in `bats/helpers/commands.bash` already shows the correct
pattern — it branches `is_msys` (uses `cygpath`) versus WSL (uses `wslpath`
plus a `WSLENV` adjustment). The same gap applies to
`DOCKER_HOST="npipe:////./pipe/docker_engine"` at line 39 and the `meta.json`
host assertion at lines 791-799: an `npipe://` endpoint is meaningful only to a
native `docker.exe`, not a Linux docker client. Before this PR the whole file
was `skip_on_windows`, so removing that guard is what exposed the WSL2 path.

This is exactly @mook-as's unresolved inline comment ("This wouldn't do the
right thing if we're running BATS under WSL, right?"). It is latent rather than
CI-breaking only if the project's Windows CI runs MSYS2 exclusively today.

Fix: gate the MSYS2-specific path on `is_msys`, and decide WSL2 behaviour
explicitly — either handle `wslpath` + `WSLENV` for `DOCKER_CONFIG` and the
`DOCKER_HOST` endpoint, or `skip` under WSL2 if that environment is genuinely
unsupported for this suite.

```diff
-    if is_windows; then
+    if is_msys; then
         run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
         DOCKER_CONFIG=${output}
     else
         DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
     fi
```

## Suggestions

S1. **`cleanupMirrorResources` comment says "the API server's watch cache" but the poll reads the informer cache** — `pkg/controllers/app/engine/controllers/engine_reconciler.go:492-501` [Claude Opus 4.7]

```go
// Block until the API server's watch cache reflects the deletions.
// ...
return r.waitMirrorResourcesGone(ctx)
```

`waitMirrorResourcesGone` correctly documents itself as polling "the informer
cache" (line 504-506), and `countMirrorResources` uses `r.List` — the
controller-runtime informer cache, which is *downstream* of the API server's
watch stream. An empty informer cache therefore implies the API server already
processed the deletes, which is the stronger guarantee the code wants. The
logic is right; only the `cleanupMirrorResources` comment is imprecise and
could mislead a future reader into thinking this synchronizes against the
apiserver directly.

Fix: align line 492's wording — "Block until this controller's informer cache
reflects the deletions" — and note that the informer cache lagging the
apiserver is exactly what makes it a sufficient gate for downstream
`rdd ctl get` callers.

S2. **`instance.DockerSocket()` on Windows flows a named-pipe string into Lima's `HOST_DOCKER_SOCKET` param** — `pkg/instance/instance.go:127-132`, `pkg/controllers/app/app/controllers/app_controller.go:102` [Claude Opus 4.7]

```go
fmt.Sprintf("  HOST_DOCKER_SOCKET: %q", instance.DockerSocket()),
```

`DockerSocket()` has two consumers: `DockerEndpoint()` (correct — it wants the
pipe path) and `app_controller.go:102`, which formats it into the Lima
template's `HOST_DOCKER_SOCKET` param. A Lima `hostSocket` is a host
*filesystem* socket path, not a named pipe. This is currently inert because the
Lima WSL2 driver advertises `SkipSocketForwarding: true`, so the value is never
acted on — and the pre-existing Windows value was also unused, so this is not a
behaviour regression. But the field is now semantically wrong, and the
`DockerSocket` docstring's "host-side socket that Lima port-forwards" no longer
holds on Windows.

Fix: add a brief comment at `app_controller.go:102` noting the param is inert
on Windows (WSL2 skips socket forwarding; the pipe is bridged by hostswitch),
or stop reusing `DockerSocket()` there so the two semantics do not share one
accessor.

S3. **No unit coverage for the new `waitMirrorResourcesGone` / `countMirrorResources` cleanup-synchronization logic** — `pkg/controllers/app/engine/controllers/engine_reconciler.go:504-551` [Claude Opus 4.7]

`waitMirrorResourcesGone` (including the 30 s timeout fall-through that returns
a distinct error) and `countMirrorResources` are exercised only indirectly by
`bats/tests/32-app-controllers/engine-docker.bats` ("stopping VM removes all
mirror resources"), which cannot reach the timeout branch. No
`EngineReconciler`-level unit tests exist in this package, so the absence is
consistent with repo convention — but the timeout path returns an error that
propagates to `rdd set` as a user-visible failure, and a regression there
(e.g. polling the wrong reader) would ship green. Consider a small table test
for `countMirrorResources` against a fake client, or at minimum assert the
timeout error message somewhere.

S4. **BATS hardcodes the Windows endpoint string, so it will not adapt to a per-instance pipe** — `bats/tests/32-app-controllers/engine-docker.bats:38-41,791-799` [Gemini 3.1 Pro]

```bash
if is_windows; then
    export DOCKER_HOST="npipe:////./pipe/docker_engine"
```

The test hardcodes `npipe:////./pipe/docker_engine` in both the
`local_setup_file` `DOCKER_HOST` export and the `docker_context_dir` host
assertion. If I1 is fixed by scoping the pipe per-instance, both sites must
fetch the endpoint dynamically (e.g. from `rdd svc paths docker_socket`) rather
than asserting the literal. Worth folding into the I1 fix so the test tracks
the endpoint instead of pinning it.

---

## Design Observations

### Concerns

- **`ContainerNamespace/moby` is watched but never re-applied on the steady-state path (in-scope)** [Codex GPT 5.5] — `pkg/controllers/app/engine/controllers/engine_reconciler.go:259-292,834`. The PR adds `Watches(&containersv1alpha1.ContainerNamespace{}, enqueueApp)`, which is necessary so `countMirrorResources`'s `r.List` reads a populated informer (the I1 cleanup-wait path). But it also means a delete of `ContainerNamespace/moby` now wakes the engine reconciler — and the running steady-state path only processes container actions and finalizers, so nothing recreates the namespace. `fullSync` creates it once at watcher startup (`docker_watcher.go:615`). Round 1 surfaced this as a design observation and noted the BATS test "deleting ContainerNamespace/moby completes without a finalizer hang" deliberately asserts it stays gone, so the behaviour is partly intentional — that reasoning still holds. The repo convention is that a `fullSync`-created mirror must "either appear on the steady-state watch list (and be re-applied on reconcile) or carry a comment naming it as startup-only." It is now on the watch list but neither re-applied nor commented. Resolve the ambiguity: either add an `ensureContainerNamespace` step on the running path (and flip the test to assert recreation), or add a comment that the watch exists only to populate the cache for `countMirrorResources` and the namespace is intentionally startup-only.

- **Orphaned `hostagent` processes block instance-directory deletion on Windows (future, pre-existing)** [Gemini 3.1 Pro] — `pkg/service/cmd/service.go` (not in the diff). When `rdd svc delete` runs after a stuck start, `Stop()` issues `TerminateProcess` on Windows, which kills the service but can leave `hostagent` children running; surviving children hold file locks under `instance.Dir()`, so `os.RemoveAll` fails with `ERROR_ACCESS_DENIED` and the next `rdd service start` aborts with `instance "rd" already exists`. This is pre-existing and outside the diff, but the PR's Windows enablement is what makes the path reachable, and it matches @mook-as's second reported symptom. The orchestrator did not fully trace `service.go`; flagged here for the author's active investigation rather than as a finding against this PR. (See repo context on the Windows `taskkill` / `KillTree`-while-parent-alive asymmetry.)

- **Engine controller now runs in the external `containers-controller` binary while watching a foreign API group (future)** [Claude Opus 4.7] — `pkg/controllers/containers/controller.go:12`. Blank-importing `app/engine` into the `containers` aggregator is the correct fix for the Round 1 silent-drop finding. But `EngineReconciler.SetupWithManager` does `For(&appv1alpha1.App{})` — a watch on the `app` group — while the `app` controller that installs the App CRD lives in a different aggregator. So `--controllers=containers` in standalone external mode starts a controller watching a CRD nothing in that binary installs. Harmless in the embedded `rdd svc serve` path (the app controller is always compiled in); the exposure is only the standalone external binary. If standalone external controllers are a supported deployment, the engine controller should tolerate a missing App CRD or document the `app`+`containers` co-dependency; if they are dev-only, a one-line comment closes the question.

- **`waitMirrorResourcesGone` blocks the singleton engine reconcile for up to 30 s (future)** [Claude Opus 4.7] — `pkg/controllers/app/engine/controllers/engine_reconciler.go:507-528`. The engine reconciler is a singleton (`MaxConcurrentReconciles` 1), so no other engine event is processed during the poll. This is acceptable: the block only happens on the `!wantWatcher` stopping path where there is nothing else useful to do, the timeout returns a recoverable error (the next reconcile retries; the terminal `Stopped` condition is stamped only after cleanup succeeds), and manager shutdown cancels the parent context promptly. The controller-runtime-idiomatic alternative — delete, return `RequeueAfter`, re-check the count at the top of `Reconcile` — avoids holding the worker but adds reconcile-entry state-machine complexity. The blocking poll is the simpler, reasonable choice; revisit only if the reconciler later needs to stay responsive during stop.

### Strengths

- **`computeSettledCondition` requires a terminal engine reason at the current generation (in-scope)** [Claude Opus 4.7] [Codex GPT 5.5] [Gemini 3.1 Pro] — `pkg/controllers/app/app/controllers/app_controller.go:434-463`. The new gate (`ObservedGeneration >= app.Generation` plus reason ∈ {`Stopped`, `NotApplicable`}) closes the hole where a `Connected/M+1` condition written mid-stop satisfied the `rdd set running=false` wait before mirror cleanup ran. The reasoning is documented inline and the five new table-test cases cover the absent / stale / current-but-not-terminal permutations precisely.

- **The `apiReader` (uncached) vs. `r.List` (cached) split is deliberate and well-commented (in-scope)** [Claude Opus 4.7] [Codex GPT 5.5] [Gemini 3.1 Pro] — `pkg/controllers/app/engine/controllers/engine_reconciler.go:566-569,541`. `deleteAllOfType` reads the direct API server so it catches resources the watcher created seconds before `stopWatcher`; `countMirrorResources` reads the informer cache so the readiness wait gates on what downstream `rdd ctl get` callers actually observe. Round 1's I1 (unbounded loop reading etcd) is fully resolved by this plus the 30 s bound.

## Testing Assessment

Unit coverage for `computeSettledCondition` is thorough — the five new table
cases map one-to-one onto the new branches. Gaps, ranked by risk:

1. **WSL2 BATS path (I2)** — `local_setup_file` fails before any test runs
   under WSL2; the suite has never been exercised there since it was
   `skip_on_windows` until this PR.
2. **`waitMirrorResourcesGone` 30 s timeout fall-through (S3)** — the
   user-visible error path is unreachable from the BATS test and has no unit
   test.
3. **`alreadyClean` with the new `ObservedGeneration` term** — no unit test
   confirms a stale-generation terminal condition forces
   `cleanupMirrorResources` to re-run; only integration timing covers it.
4. **`npipe`-scheme probe branch** — `probeNamedDockerContext` now accepts
   `npipe`, but `docker_context_test.go` adds only `unix://` cases; the
   `npipe` probe path is reachable only via Windows BATS. A
   `seedContext(t, "npipe-ctx", "npipe:////./pipe/some-pipe")` case would lock
   it down on non-Windows CI.
5. **Docker Desktop pipe collision (I1)** — no test exercises RDD starting
   when `\\.\pipe\docker_engine` is already owned by another listener.

Both Claude and Codex ran `go test` against the touched packages and reported
it passing; Codex also ran `make build-rdd` successfully.

## Documentation Assessment

- `instance.go:123-126`: the `DockerSocket` docstring still says "the path to
  the Docker socket **for this instance**" and "host-side socket that Lima
  port-forwards" — both inaccurate on Windows (the pipe is global and bridged
  by hostswitch, not port-forwarded). The appended Windows line does not
  correct the lede. This is the unaddressed Round 1 S4, now folded into I1.
- `engine_reconciler.go:492` comment imprecision is covered in S1.
- New exported constants in `app_types.go:64-79` are each documented clearly,
  and `pkg/controllers/containers/controller.go:6-7` package doc was correctly
  updated to add `engine` to the aggregated list — good.

## Commit Structure

Three commits, each self-contained: the openSUSE bump (`b5ab872`), the
socket-forwarding wiring (`e917657`), and the test-104 cleanup fix (`5681e67`).
The openSUSE bump is arguably unrelated to "wire up docker socket windows" and
could have been its own PR, but it is a one-line-per-file digest bump and does
not harm reviewability. Messages match their diffs; no fixup commits. No action
needed.

## Acknowledged Limitations

- `engine_reconciler.go:176-195` already documents the watcher-died vs.
  never-started branch and the deliberate dual cleanup paths (`stopWatcher`
  vs. direct `removeDockerContext`). Not an issue.
- `docker_watcher.go:200-207` notes a TODO for periodic full sync so dropped
  image/volume events self-heal. Pre-existing and not worsened by this PR.
- `hostswitch_windows.go:151-155` / `:199-205` document fire-and-forget
  host-switch lifecycle: failures are logged but not surfaced to the
  reconciler. Pre-existing — but note this is the same swallowed-error path
  that makes I1's silent Docker Desktop attachment possible.
- The `TODO(windows)` guards in `engine/controller.go` and
  `engine_reconciler.go` were resolved by this PR rather than deferred, so
  there is no remaining acknowledged-limitation comment in the changed code.

## Unresolved Feedback

- **@mook-as, inline on `engine-docker.bats`** — "This wouldn't do the right
  thing if we're running BATS under WSL, right? Do we no longer support that?"
  ([discussion r3236092197](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/350#discussion_r3236092197)).
  No reply and no subsequent commit addresses it. The codebase clearly still
  intends to support WSL (`commands.bash` branches `is_msys` vs. WSL
  explicitly), so this is a genuine gap, not a declined concern — see **I2**.

- **@mook-as, general comment** — App stuck at `ContainerEngineReady=False/Stopped`,
  `Running=False/Starting`; `rdd set running=true` dying with "failed to watch
  App: timed out waiting for the condition"; and a wedged `instance "rd"
  already exists` state after `service delete`. @Nino-K replied "I'm looking
  into this." These are runtime integration failures the author is actively
  investigating. The `instance "rd" already exists` symptom aligns with the
  orphaned-`hostagent` design observation above. The `timed out waiting for the
  condition` symptom overlaps with the new `computeSettledCondition` gate and
  `waitMirrorResourcesGone` timeout — worth confirming commit `5681e67`
  (which post-dates the discussion) resolved rather than introduced the hang.

---

## Agent Performance Retro

### [Claude]

Strongest on diff-local detail and the test/wiring gaps: it owned three of the
four suggestions — the `cleanupMirrorResources` comment imprecision, the
`DockerSocket()`→Lima-`HOST_DOCKER_SOCKET` semantic drift (which needed tracing
the second consumer of `DockerSocket()`), and the missing unit coverage — plus
the sharpest design observation, that the engine controller now watches the
`app` group from the `containers` aggregator in the standalone external binary.
It also independently caught the WSL2 gate. Its notable miss is the headline:
Claude reviewed `instance.go` and raised only the Lima-param suggestion, never
connecting the global pipe to a Docker Desktop collision — the most important
finding of the review, which both other agents reached.

### [Codex]

Best analysis on the headline finding. It traced the full claim chain — from
`socketbridge`'s `ListenPipe` failure, through hostswitch's swallowed error, to
the engine client connecting to a foreign daemon — which is what makes I1 land
as a concrete data-loss risk rather than a docstring nit. It independently
caught the WSL2 gate and re-raised the `ContainerNamespace` steady-state gap
(consolidated as a design observation, consistent with Round 1). No false
positives; it ran `go test` and `make build-rdd`. Zero suggestions — Codex
stays on structural issues and leaves the smaller stuff to the other two.

### [Gemini]

Converged with Codex on the pipe-ownership finding and uniquely added the
concurrent-RDD-instance angle and the orphaned-`hostagent` pre-existing gap
that explains @mook-as's `instance "rd" already exists` symptom. But it
inflated the pipe finding to Critical with an unverified causal claim — "breaks
the VM boot sequence" — that @mook-as's own report contradicts (`docker run`
works, so the pipe binds). Its S2 was a false positive: it argued
`local socket_path` should be restored, having misread the assignment as living
in the `docker_context_dir` helper when it is inside an `@test` block. As
usual, no `git blame` (quota), so it could not separate PR-introduced from
pre-existing. Lightest coverage of the three — it left
`pkg/controllers/containers/controller.go` out of its summary.

### Summary

| | Claude Opus 4.7 | Codex GPT 5.5 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 29m 29s | 10m 59s | — |
| Findings | 1I 3S | 2I | 1I 1S |
| Tool calls | 31 (Read 17, Bash 9, Grep 5) | 90 (shell 88, stdin 2) | — |
| Design observations | 5 | 3 | 1 |
| False positives | 0 | 0 | 1 |
| Unique insights | 5 | 1 | 1 |
| Files reviewed | 13 | 13 | 12 |
| Coverage misses | 0 | 0 | 1 |
| **Totals** | **1I 3S** | **2I** | **1I 1S** |
| Downgraded | 0 | 1 (I→S) | 2 (I→S) |
| Dropped | 0 | 0 | 1 |


Reconciliation:

- Gemini C1 "Global named pipe breaks multi-instance isolation": critical →
  important (I1). The destructive-to-a-foreign-daemon risk is real, but it
  requires Docker Desktop installed and listening, RDD is pre-release, and
  actual container deletion needs explicit user action — Important, not
  Critical. Gemini's "breaks the VM boot sequence" causal leg was dropped as
  contradicted by @mook-as's report.
- Codex I2 "ContainerNamespace delete events enqueue a reconcile that cannot
  recreate the namespace": important → design observation, consistent with
  Round 1's handling — the BATS test deliberately asserts the namespace stays
  gone after a user delete, so the behaviour is partly intentional.
- Gemini I1 "Orphaned hostagents block instance directory deletion": important
  → design observation. `pkg/service/cmd/service.go` is not in the diff;
  surfaced as a pre-existing concern that the PR's Windows enablement makes
  reachable.
- Gemini S2 "restore `local socket_path`": dropped as a false positive. The
  assignment is inside the `@test "moby engine creates Docker context for the
  instance"` block (engine-docker.bats:775-800), not the `docker_context_dir`
  helper (763-773), so the repo's "no `local` in `@test`" rule applies and
  Round 1's S5 removal was correct.

Codex provided the most value this round — it owned the headline finding's
analysis and caught everything else of substance with no false positives.
Claude contributed the most breadth on the smaller issues and the cleanest
design observations, but missed the headline in a file it did review. Gemini
added two useful unique angles but cost a clarification-worthy false positive
and an inflated severity.

---

## Review Process Notes

### Repo context updates

- **[repo]** `is_windows()` returns true for both MSYS2 and WSL2;
  `is_msys()` (in `bats/helpers/os.bash`) distinguishes them, and the `rdd()`
  wrapper in `bats/helpers/commands.bash` branches `is_msys` (`cygpath`,
  native-Windows assumptions) versus WSL (`wslpath`, `WSLENV`). When reviewing
  BATS code, flag any path that gates an MSYS2-only tool or a
  native-Windows-only assumption (e.g. an `npipe://` endpoint a Linux docker
  client cannot use) on `is_windows` — it must use `is_msys` or explicitly
  handle the WSL branch. Removing a `skip_on_windows` guard exposes the WSL2
  path, which the suite was never previously exercised against.

- **[repo]** On Windows, RDD's Docker named pipe and Docker Desktop's pipe
  share the well-known name `\\.\pipe\docker_engine`
  (`pkg/socketbridge/host_windows.go`). The socketbridge `ListenPipe` failure
  is swallowed by the hostswitch goroutine, and `instance.DockerEndpoint()`
  feeds the engine controller's Docker client. When reviewing Windows
  Docker-endpoint code, verify the pipe is instance-scoped (or that RDD proves
  it owns the pipe before connecting a client) — an unscoped client can
  silently attach to Docker Desktop's daemon and mirror or delete its
  containers.

---

## Appendix: Original Reviews

### [Claude] — Pass 1

    # Code Review: PR #350 — Wire up docker socket windows
    
    ## Executive Summary
    
    This PR enables Docker socket forwarding on Windows by routing every Docker endpoint through `instance.DockerEndpoint()` (`npipe:////./pipe/docker_engine` on Windows), removes the `runtime.GOOS == "windows"` early-return guards in the engine controller and its context management, and hardens VM-stop cleanup so `rdd set running=false` waits until mirror resources are actually gone. The core logic changes (`computeSettledCondition`, `cleanupMirrorResources`/`waitMirrorResourcesGone`, `apiReader` reads) are well-reasoned and well-tested at the unit level. The main gap is in the re-enabled BATS test, which conflates MSYS2 and WSL2 — a concern the maintainer raised inline that remains unanswered.
    
    ## Findings
    
    ### Critical Issues
    
    None.
    
    ### Important Issues
    
    **I1. `engine-docker.bats` uses `is_windows` to gate an MSYS2-only `cygpath` call — breaks under WSL2** — `bats/tests/32-app-controllers/engine-docker.bats:22-24,37-42`
    
    ```bash
    if is_windows; then
        run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
        DOCKER_CONFIG=${output}
    else
        DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
    fi
    ```
    
    `is_windows()` returns true for **both** MSYS2 and WSL2 — `bats/helpers/os.bash:13-19` sets `OS=windows` when `/proc/sys/kernel/osrelease` contains `WSL2`, and `os.bash:53-57` provides `is_msys()` specifically because "Both report OS=windows, but behave differently from WSL." `cygpath` is an MSYS2 tool that does not exist under WSL2, so `run -0 cygpath ...` fails `local_setup_file` outright when the suite runs under WSL2. The `rdd()` wrapper in `bats/helpers/commands.bash:45-83` already demonstrates the correct pattern: it branches `is_msys` (uses `cygpath`) vs. WSL (uses `wslpath`). The same applies to `DOCKER_HOST="npipe:////./pipe/docker_engine"` at line 38 — that scheme only works for a native `docker.exe`, not a Linux docker client. Before this PR the whole file was `skip_on_windows`, so removing that guard exposed the WSL2 path without handling it. This is exactly what @mook-as flagged inline (see Unresolved Feedback).
    
    Fix: gate the MSYS2-specific path on `is_msys`, mirroring `commands.bash`:
    
    ```diff
    -    if is_windows; then
    +    if is_msys; then
             run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
             DOCKER_CONFIG=${output}
         else
             DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
         fi
    ```
    
    and similarly decide the WSL2 behaviour for `DOCKER_HOST` / the `meta.json` host assertion at lines 37-42 and 791-799, or explicitly `skip` under WSL2 if that environment is genuinely unsupported for this suite. If the project's Windows CI only runs MSYS2 today, this is latent rather than CI-breaking — but the change should still use the helper that exists for this exact distinction.
    
    ### Suggestions
    
    **S1. `cleanupMirrorResources` comment claims it polls "the API server's watch cache" but it polls the informer cache** — `pkg/controllers/app/engine/controllers/engine_reconciler.go:492-501,504-506`
    
    ```go
    // Block until the API server's watch cache reflects the deletions.
    // ...
    return r.waitMirrorResourcesGone(ctx)
    ```
    
    ```go
    // waitMirrorResourcesGone polls the informer cache until all four mirror
    // resource types are empty or the 30-second timeout expires. r.List is
    // used (not r.apiReader) so the poll drains the watch cache, not etcd.
    ```
    
    The two doc comments describe different things: `cleanupMirrorResources` (line 492) says "the API server's watch cache," while `waitMirrorResourcesGone` (line 504) correctly says "the informer cache." `r.List` at line 541 reads the controller-runtime informer cache, which is downstream of the API server's watch stream — so an empty informer cache *implies* the API server already processed the deletes, which is the stronger guarantee the code actually wants. The logic is fine; only the `cleanupMirrorResources` comment is imprecise and could mislead a future reader into thinking this synchronizes against the apiserver directly.
    
    Fix: make line 492's comment consistent — "Block until this controller's informer cache reflects the deletions" — and note that the informer cache lagging the apiserver is what makes this a sufficient gate for downstream `rdd ctl get` callers.
    
    **S2. `instance.DockerSocket()` on Windows returns a named-pipe path that flows into the Lima `HOST_DOCKER_SOCKET` param** — `pkg/instance/instance.go:127-132`, `pkg/controllers/app/app/controllers/app_controller.go:102`
    
    ```go
    var DockerSocket = sync.OnceValue(func() string {
    	if runtime.GOOS == "windows" {
    		return `\\.\pipe\docker_engine`
    	}
    	return filepath.Join(ShortDir(), "docker.sock")
    })
    ```
    
    `DockerSocket()` has two consumers: `DockerEndpoint()` (correct — it wants the pipe path) and `app_controller.go:102`, which formats it into the Lima template's `HOST_DOCKER_SOCKET` param → `hostSocket: "{{.Param.HOST_DOCKER_SOCKET}}"` in `lima-template.yaml:29`. A Lima `hostSocket` is a host *filesystem* socket path, not a named pipe. This is currently inert because the Lima WSL2 driver advertises `SkipSocketForwarding: true` (`lima-vm/lima` `pkg/driver/wsl2/wsl_driver_windows.go`), so the `portForwards` socket entry is never acted on for WSL2 instances — but the value is now semantically wrong for that field, and the docstring's "host-side socket that Lima port-forwards" no longer holds on Windows. The pre-existing Windows value (`...\.rd2\docker.sock`) was also unused, so this is not a regression in behaviour, just increased confusion.
    
    Fix: either add a brief comment at `app_controller.go:102` noting the param is inert on Windows (WSL2 skips socket forwarding; the pipe is bridged by hostswitch), or stop reusing `DockerSocket()` there and pass an explicitly-named accessor so the two semantics don't share one function.
    
    **S3. New cleanup-synchronization logic has no unit coverage** — `pkg/controllers/app/engine/controllers/engine_reconciler.go:504-551`
    
    `waitMirrorResourcesGone` (including the 30 s timeout fall-through that returns a distinct error) and `countMirrorResources` are exercised only indirectly by `bats/tests/32-app-controllers/engine-docker.bats:686-701` ("stopping VM removes all mirror resources"). The BATS test cannot reach the timeout branch. No `EngineReconciler`-level unit tests exist in this package, so this is consistent with repo convention — but the timeout path returns an error that propagates to `rdd set` as a user-visible failure, and a regression there (e.g. polling the wrong reader) would ship green. Consider a small table test for `countMirrorResources` against a fake client, or at minimum confirm the timeout error message is asserted somewhere.
    
    ## Design Observations
    
    ### Concerns
    
    **Engine controller now runs in the external `containers-controller` binary while watching a foreign API group** *(future)* — `pkg/controllers/containers/controller.go:12`
    
    Adding `_ "…/controllers/app/engine"` to the `containers` aggregator is the correct fix for a real silent drop: before this PR the external `containers-controller` (`cmd/containers-controller/main.go` → `external.RunControllers("containers")`) never registered the engine controller, even though `engine.GetAPIGroup()` returns `"containers"`. However, `EngineReconciler.SetupWithManager` does `For(&appv1alpha1.App{})` — a watch on the `app` group. The `app` controller (which installs the App CRD via `GetCRDData()`) is in a *different* aggregator. So `--controllers=containers` in standalone external mode now starts a controller that watches a CRD nothing in that binary installs. In the embedded `rdd svc serve` path this is harmless because the app controller is always compiled in; the exposure is only the standalone external binary. The `apiGroup` const comment in `engine/controller.go:29-33` already acknowledges the grouping tension. If standalone external controllers are a supported deployment, the engine controller's startup should tolerate a missing App CRD (or document the `app`+`containers` co-dependency); if they are dev-only, a comment saying so would close the question.
    
    **Blocking a reconcile up to 30 s in `waitMirrorResourcesGone`** *(future)* — `pkg/controllers/app/engine/controllers/engine_reconciler.go:507-528`
    
    `cleanupMirrorResources` now blocks the engine reconcile for up to 30 s polling the cache. Because the engine reconciler is a singleton (`MaxConcurrentReconciles` 1), no other engine event is processed during that window. This is acceptable in practice — the block only happens on the `!wantWatcher` (stopping) path, where there is nothing else useful to do, and manager shutdown cancels the parent context promptly. The controller-runtime-idiomatic alternative is to delete, return `ctrl.Result{RequeueAfter: …}`, and re-check `countMirrorResources` at the top of `Reconcile`, stamping the terminal condition only when the count reaches zero. That avoids holding the worker but adds reconcile-entry state-machine complexity; the blocking poll is the simpler choice and is reasonable here. Worth revisiting only if the engine reconciler later needs to stay responsive during stop.
    
    ### Strengths
    
    - The `computeSettledCondition` change correctly requires a *terminal* engine reason (`Stopped`/`NotApplicable`) plus `ObservedGeneration >= app.Generation`, closing the "`Connected/M+1` written mid-stop satisfies the wait" hole. The reasoning is documented inline and the five new table-test cases (`app_controller_test.go:160-218`) cover the absent / stale / current-but-not-terminal permutations precisely.
    - Using `apiReader` (uncached) for the list/get in `deleteAllOfType` while keeping `r.List` (cached) for `countMirrorResources` is a deliberate, well-commented split: catch just-created resources at delete time, but gate the condition on the cache the downstream waiter actually observes.
    - Adding `Watches(&containersv1alpha1.ContainerNamespace{}, enqueueApp)` makes the fourth mirror type a real watch source, so `countMirrorResources` reads a populated informer rather than lazily spinning one up.
    
    ## Testing Assessment
    
    Unit coverage for `computeSettledCondition` is thorough. Gaps, ranked by risk:
    
    1. **WSL2 BATS path (I1)** — `local_setup_file` fails before any test runs under WSL2; the suite has never been exercised there since it was `skip_on_windows` until this PR.
    2. **`waitMirrorResourcesGone` 30 s timeout fall-through (S3)** — the user-visible error path is unreachable from the BATS test and has no unit test.
    3. **`alreadyClean` with the new `ObservedGeneration` term** — no unit test confirms that a stale-generation terminal condition forces `cleanupMirrorResources` to re-run; only integration timing covers it.
    4. **Windows `manageDockerContext`/`removeDockerContext`** — the TODO(windows) guards were removed and `probeNamedDockerContext` now accepts `npipe`, but the `npipe`-scheme probe branch has no Go unit test (`docker_context_test.go` only adds `unix://` cases); `engine-docker.bats:791-799` asserts the `npipe` host string but does not exercise the probe.
    
    ## Documentation Assessment
    
    - `instance.go:123-126`: the `DockerSocket` docstring still says "host-side socket that Lima port-forwards from the guest's /var/run/docker.sock" — accurate on Unix, but on Windows the returned pipe is bridged by hostswitch, not port-forwarded by Lima. The appended Windows line doesn't correct the lede. Minor (see S2).
    - `engine_reconciler.go:492` comment imprecision covered in S1.
    - New exported constants in `app_types.go:64-79` are each documented clearly — good.
    - `pkg/controllers/containers/controller.go:6-7` package doc was correctly updated to add `engine` to the list.
    
    ## Commit Structure
    
    Three commits, each self-contained: the openSUSE bump (`b5ab872`), the socket-forwarding wiring (`e917657`), and the test-104 cleanup fix (`5681e67`). The openSUSE bump is arguably unrelated to "wire up docker socket windows" and could have been its own PR, but it is a one-line-per-file digest bump and does not harm reviewability. Messages match their diffs. No fixup commits. No action needed.
    
    ## Acknowledged Limitations
    
    - **PR comment (@mook-as / @Nino-K)** — @mook-as reported the App stuck at `ContainerEngineReady=False/Stopped` and `rdd set running=true` dying with "failed to watch App: timed out", plus a wedged `instance "rd" already exists` state after `service delete`. @Nino-K replied "I'm looking into this." These are runtime integration failures the author is actively investigating; they are consistent with this PR being a work-in-progress on Windows and are not re-raised as findings here. Note that the most recent commit (`5681e67`, the cleanup-wait fix) post-dates the discussion and may be part of that investigation — the reported `timed out waiting for the condition` symptom overlaps with the new `computeSettledCondition` gate and `waitMirrorResourcesGone` timeout, so it is worth confirming those changes resolved (rather than introduced) the hang.
    - **TODO(windows) comments removed, not deferred** — `engine/controller.go` and `engine_reconciler.go` previously carried `TODO(windows)` guards; this PR resolves them rather than leaving them, so there is no remaining acknowledged-limitation comment in the changed code.
    
    ## Unresolved Feedback
    
    - **@mook-as, inline on `engine-docker.bats`** — "This wouldn't do the right thing if we're running BATS under WSL, right? Do we no longer support that?" (https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/350#discussion_r3236092197). No reply and no subsequent commit addresses it. Still actionable — see **I1**. The codebase clearly still intends to support WSL (`commands.bash:45-83` branches `is_msys` vs. WSL explicitly), so the `is_windows`-gated `cygpath` call is a genuine gap, not a declined concern.
    
    ## Coverage Summary
    
    | File | Status |
    |---|---|
    | `bats/tests/32-app-controllers/engine-docker.bats` | **I1** |
    | `pkg/apis/app/v1alpha1/app_types.go` | Reviewed, no issues — new constants are documented and consumed correctly |
    | `pkg/controllers/app/app/controllers/app_controller.go` | Reviewed, no issues — `computeSettledCondition` gate is sound; see Design Observations re: wait-hang tradeoff |
    | `pkg/controllers/app/app/controllers/app_controller_test.go` | Reviewed, no issues — strong table-test additions |
    | `pkg/controllers/app/app/lima-images-unix.yaml` | Trivial — openSUSE v0.2.3 digest bump |
    | `pkg/controllers/app/app/lima-images-wsl2.yaml` | Trivial — openSUSE v0.2.3 digest bump |
    | `pkg/controllers/app/engine/controller.go` | Reviewed, no issues — `runtime` guard and import correctly removed |
    | `pkg/controllers/app/engine/controllers/docker_context.go` | Reviewed, no issues — `endpointURL` rename and `npipe` scheme handled consistently |
    | `pkg/controllers/app/engine/controllers/docker_context_test.go` | Reviewed, no issues — tests updated to full-URL form; see S3 re: missing `npipe` probe test |
    | `pkg/controllers/app/engine/controllers/docker_watcher.go` | Reviewed, no issues — `instance` import still used |
    | `pkg/controllers/app/engine/controllers/engine_reconciler.go` | **S1, S3** + Design Observations |
    | `pkg/controllers/containers/controller.go` | Design Observation (foreign-group watch in external binary) — the import itself is a correct fix |
    | `pkg/instance/instance.go` | **S2** |
### [Codex] — Pass 1

    ### Executive Summary
    
    This PR wires the engine controller to the Windows Docker named pipe and tightens stop/cleanup readiness around `ContainerEngineReady`. The main risks are Windows endpoint ownership: the new hardcoded `docker_engine` pipe can connect RDD to Docker Desktop instead of RDD’s VM, and the new `ContainerNamespace` watch still does not recreate the singleton resource it watches.
    
    ### Findings
    
    #### Critical Issues
    
    None.
    
    #### Important Issues
    
    I1. **Windows endpoint can attach to an externally-owned Docker pipe** — `pkg/instance/instance.go:137-141` (important, regression)
    
    ```go
    var DockerEndpoint = sync.OnceValue(func() string {
    	if runtime.GOOS == "windows" {
    		return `npipe:////./pipe/docker_engine`
    	}
    	return "unix://" + DockerSocket()
    })
    ```
    
    `DockerEndpoint()` now points every Windows RDD instance at the global `docker_engine` pipe at line 139. That is the same well-known pipe documented in `pkg/socketbridge/host_windows.go:21-23` as also used by Docker Desktop. If Docker Desktop or another daemon already owns the pipe, RDD’s bridge fails to listen at `pkg/socketbridge/host_windows.go:53-55`, but the host-switch goroutine only logs that failure and returns nil at `pkg/controllers/lima/limavm/controllers/hostswitch_windows.go:199-204`. The engine then creates its Docker client against the same endpoint at `pkg/controllers/app/engine/controllers/docker_watcher.go:55`; if the external daemon responds, full sync mirrors that daemon’s containers at `pkg/controllers/app/engine/controllers/docker_watcher.go:618`, and K8s-side deletes can call `ContainerRemove` against it at `pkg/controllers/app/engine/controllers/docker_watcher.go:575`. This can make RDD report `Connected` for the wrong engine and potentially operate on Docker Desktop containers.
    
    Fix: use an instance-owned pipe for the bridge, Docker client, and Docker context, e.g. `\\.\pipe\rancher-desktop-<suffix>-docker_engine`, and pass that pipe into `socketbridge.NewDockerHostBridge`. If the standard pipe must remain as a compatibility alias, only bind it after proving RDD owns it, and surface bind failure into readiness so the engine never treats an externally-owned pipe as RDD’s Docker endpoint.
    
    ```diff
     var DockerSocket = sync.OnceValue(func() string {
     	if runtime.GOOS == "windows" {
    -		return `\\.\pipe\docker_engine`
    +		return `\\.\pipe\` + Name() + `-docker_engine`
     	}
     	return filepath.Join(ShortDir(), "docker.sock")
     })
    
     var DockerEndpoint = sync.OnceValue(func() string {
     	if runtime.GOOS == "windows" {
    -		return `npipe:////./pipe/docker_engine`
    +		return `npipe:////./pipe/` + Name() + `-docker_engine`
     	}
     	return "unix://" + DockerSocket()
     })
    ```
    
    I2. **ContainerNamespace delete events enqueue a reconcile that cannot recreate the namespace** — `pkg/controllers/app/engine/controllers/engine_reconciler.go:831-834` (important, gap)
    
    ```go
    	Watches(&containersv1alpha1.Container{}, enqueueApp).
    	Watches(&containersv1alpha1.Image{}, enqueueApp).
    	Watches(&containersv1alpha1.Volume{}, enqueueApp).
    	Watches(&containersv1alpha1.ContainerNamespace{}, enqueueApp).
    	Complete(r)
    ```
    
    The new `ContainerNamespace` watch at line 834 makes deletes of `ContainerNamespace/moby` wake the engine reconciler, but the steady-state running path does not re-apply that resource. `fullSync` creates it only during watcher startup at `pkg/controllers/app/engine/controllers/docker_watcher.go:615`, while a normal reconcile after the watcher is already running only processes container actions at `pkg/controllers/app/engine/controllers/engine_reconciler.go:286`, finalizers at `pkg/controllers/app/engine/controllers/engine_reconciler.go:289`, and returns at `pkg/controllers/app/engine/controllers/engine_reconciler.go:292`. The current BATS assertion at `bats/tests/32-app-controllers/engine-docker.bats:726-727` confirms the namespace remains absent, so the test passes the self-healing failure.
    
    Fix: re-apply `ContainerNamespace/moby` on the running steady-state path, then change the BATS test to delete it and wait for recreation.
    
    ```diff
     	if err := r.setEngineCondition(ctx, &app, metav1.ConditionTrue, appv1alpha1.EngineReasonConnected, "Container engine synced"); err != nil {
     		return ctrl.Result{}, err
     	}
    +	if err := r.ensureContainerNamespace(ctx); err != nil {
    +		return ctrl.Result{}, fmt.Errorf("failed to sync container namespace: %w", err)
    +	}
    
     	var actionsErr, finErr error
    ```
    
    I3. **Windows BATS setup still assumes MSYS2 when running under WSL** — `bats/tests/32-app-controllers/engine-docker.bats:22-24` (important, regression)
    
    ```bash
    if is_windows; then
        run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
        DOCKER_CONFIG=${output}
    ```
    
    `is_windows` is true for WSL at `bats/helpers/os.bash:13-16`, but the new setup enters the `cygpath` branch at lines 22-24 even though WSL uses `wslpath`, as the existing `rdd` wrapper does at `bats/helpers/commands.bash:63-67`. The same setup then starts the native Windows service at `bats/tests/32-app-controllers/engine-docker.bats:36`; under WSL, `DOCKER_CONFIG` also needs to cross the WSL/Windows boundary, but the wrapper currently only adds `RDD_*` variables to `WSLENV` at `bats/helpers/commands.bash:71-82`. This leaves WSL runs either failing before startup or writing Docker context files outside the isolated test config.
    
    Fix: split MSYS2 and WSL handling, and export `DOCKER_CONFIG` through `WSLENV` for the native Windows process.
    
    ```diff
     if is_windows; then
    -    run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
    -    DOCKER_CONFIG=${output}
    +    if is_msys; then
    +        run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
    +        DOCKER_CONFIG=${output}
    +    else
    +        DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
    +        export WSLENV="${WSLENV:+${WSLENV}:}DOCKER_CONFIG/p"
    +    fi
     else
         DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
     fi
    ```
    
    #### Suggestions
    
    None.
    
    ### Design Observations
    
    **Concerns**
    
    - **Pipe ownership should be part of the Docker endpoint abstraction (in-scope).** Right now `instance.DockerEndpoint()`, the host socket bridge, Docker context creation, and tests all encode the same global pipe independently. Centralizing the Windows pipe name as an instance-scoped endpoint and passing it through the bridge would remove the Docker Desktop collision class described in I1 and prevent endpoint drift.
    
    **Strengths**
    
    - The stop-path generation checks are a good direction: requiring a terminal engine reason plus current `ObservedGeneration` prevents stale `Connected` conditions from satisfying `Settled` during shutdown.
    - Using the API reader for cleanup listing while waiting on the informer cache is a pragmatic split: deletion sees committed state, while readiness waits for what clients observe.
    
    ### Testing Assessment
    
    1. I1: Missing a Windows test where `\\.\pipe\docker_engine` is already occupied by another listener; the engine should not report `Connected` to that listener.
    2. I2: The `ContainerNamespace/moby` deletion test currently asserts absence; it should assert the running controller recreates the namespace.
    3. I3: No WSL execution path covers `engine-docker.bats`; the new Windows setup only matches the MSYS2 CI shell.
    
    Verification run: `go test ./pkg/controllers/app/engine/controllers ./pkg/controllers/app/app/controllers ./pkg/instance` passed. `make build-rdd` passed.
    
    ### Documentation Assessment
    
    The Windows Docker endpoint behavior should document whether RDD intentionally uses the global Docker Desktop pipe or an instance-scoped pipe. If the fix for I1 switches to instance-scoped pipes, update `DockerSocket`/`DockerEndpoint` comments and the Windows BATS expected context host.
    
    ### Acknowledged Limitations
    
    - **Code comment**: `pkg/controllers/app/engine/controllers/docker_watcher.go:200-207` notes that transient event handling errors can leave image or volume mirrors missing until a future full resync. This PR does not change that behavior.
    
    ### Unresolved Feedback
    
    - `https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/350#discussion_r3236092197` asks whether the Windows BATS changes still work under WSL. This remains actionable and overlaps with I3.
    
    ### Coverage Summary
    
    - `bats/tests/32-app-controllers/engine-docker.bats` — I3
    - `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
    - `pkg/controllers/app/app/controllers/app_controller.go` — Reviewed, no issues
    - `pkg/controllers/app/app/controllers/app_controller_test.go` — Reviewed, no issues
    - `pkg/controllers/app/app/lima-images-unix.yaml` — Reviewed, no issues
    - `pkg/controllers/app/app/lima-images-wsl2.yaml` — Reviewed, no issues
    - `pkg/controllers/app/engine/controller.go` — Reviewed, no issues
    - `pkg/controllers/app/engine/controllers/docker_context.go` — Reviewed, no issues
    - `pkg/controllers/app/engine/controllers/docker_context_test.go` — Reviewed, no issues
    - `pkg/controllers/app/engine/controllers/docker_watcher.go` — I1
    - `pkg/controllers/app/engine/controllers/engine_reconciler.go` — I2
    - `pkg/controllers/containers/controller.go` — Reviewed, no issues
    - `pkg/instance/instance.go` — I1

### [Gemini] — Pass 1

    ### Executive Summary
    
    This PR enables Docker socket forwarding on Windows by migrating from Unix sockets to named pipes (`\\.\pipe\docker_engine`). While it successfully wires up the engine controller to support Docker mirroring on Windows, the hardcoded global pipe name creates a collision point that breaks the VM boot sequence and traps the control plane in a start/fail loop. This loop also exacerbates a pre-existing issue where `rdd svc delete` leaves orphaned `hostagent` processes behind on Windows, preventing clean restarts.
    
    ### Findings
    
    #### Critical Issues
    
    **C1. Hardcoded named pipe collides with Docker Desktop and concurrent instances** — `pkg/instance/instance.go:126-128` (critical, regression)
    
    ```go
    var DockerSocket = sync.OnceValue(func() string {
    	if runtime.GOOS == "windows" {
    		return `\\.\pipe\docker_engine`
    	}
    ```
    By hardcoding the global `\\.\pipe\docker_engine` pipe, this change introduces collisions if Docker Desktop is installed or if concurrent RDD instances run (e.g., in CI). If the pipe is already bound by another process or an orphaned `hostagent`, Lima's `hostagent` fails to complete port forwarding. As a result, the `hostagent` never emits `Running=True`, causing the VM to stall in `Starting` while `rdd set running=true` times out waiting for `ContainerEngineReady`.
    
    Fix: Scope the named pipe to the unique instance name (available via `Name()`), mirroring the isolation previously provided by the instance-specific Unix socket paths. Apply the same to `DockerEndpoint`.
    
    ```diff
     var DockerSocket = sync.OnceValue(func() string {
     	if runtime.GOOS == "windows" {
    -		return `\\.\pipe\docker_engine`
    +		return fmt.Sprintf(`\\.\pipe\rdd_%s_docker_engine`, Name())
     	}
     	return filepath.Join(ShortDir(), "docker.sock")
     })
     
     var DockerEndpoint = sync.OnceValue(func() string {
     	if runtime.GOOS == "windows" {
    -		return `npipe:////./pipe/docker_engine`
    +		return fmt.Sprintf(`npipe:////./pipe/rdd_%s_docker_engine`, Name())
     	}
     	return "unix://" + DockerSocket()
     })
    ```
    
    #### Important Issues
    
    **I1. Orphaned hostagents block instance directory deletion on Windows** — `pkg/service/cmd/service.go` (important, gap)
    
    When the VM gets stuck (see C1) and the user runs `rdd svc delete`, `service.Delete()` calls `Stop()`, which issues `TerminateProcess` on Windows. This intentionally kills the service while leaving the `hostagent` children running. Because Windows enforces file locks, the surviving `hostagent` processes hold open files in `instance.Dir()`. `os.RemoveAll(instance.Dir())` silently fails (`ERROR_ACCESS_DENIED`) and ignores the error. The next time `rdd service start` runs, `lima create` encounters the uncleared directory and aborts with `"instance \"rd\" already exists"`.
    
    Fix: Call `process.KillTree` on the service instead of a plain `Kill` during the graceful-shutdown timeout so the hostagents are reaped cleanly, or explicitly check for `ERROR_ACCESS_DENIED` during deletion to prevent the system from getting permanently stuck.
    
    #### Suggestions
    
    **S1. Hardcoded BATS test endpoints do not adapt to dynamic sockets** — `bats/tests/32-app-controllers/engine-docker.bats:39`
    
    ```bash
         if is_windows; then
    -        export DOCKER_HOST="npipe:////./pipe/docker_engine"
    ```
    With C1 isolating the named pipe, the test should dynamically fetch the endpoint rather than hardcoding it.
    
    Fix: Retrieve and format the pipe directly from the service paths.
    ```diff
         rdd set running=true
    +    run -0 rdd svc paths docker_socket
         if is_windows; then
    -        export DOCKER_HOST="npipe:////./pipe/docker_engine"
    +        export DOCKER_HOST="npipe://${output//\\//}"
         else
    -        run -0 rdd svc paths docker_socket
             export DOCKER_HOST="unix://${output}"
         fi
    ```
    *(Apply similar dynamic fetching to the `docker_context_dir` test at line 790).*
    
    **S2. Inconsistent variable scoping in helper function** — `bats/tests/32-app-controllers/engine-docker.bats:796`
    
    ```bash
         else
             run -0 rdd service paths docker_socket
    -        socket_path=${output}
    +        local socket_path=${output}
             run -0 jq -r '.Endpoints.docker.Host' "${meta_file}"
    ```
    The PR removed `local` from `socket_path` in `docker_context_dir()`. While `local` is forbidden inside `@test` blocks to avoid subshell shadowing, `docker_context_dir()` is a helper function running in the same shell, making `local` the correct approach to prevent leaking the variable to the caller.
    
    ### Design Observations
    
    #### Concerns
    None.
    
    #### Strengths
    - **Rigorous K8s watch cache timing:** The implementation of `waitMirrorResourcesGone` directly polling the cache correctly addresses the asynchronous delay between a synchronous etcd `Delete` and the informer reflecting the change, preventing subtle race conditions where a controller acts on stale deletions.
    
    ### Testing Assessment
    
    1. The test suite correctly exercises the Docker context registration and endpoints. Once C1 and S1 are implemented, BATS tests will pass safely in parallel Windows environments.
    
    ### Documentation Assessment
    
    The newly introduced `AppSettledReasonEngineStale` and engine stopped reasons are correctly documented natively within `app_types.go`.
    
    ### Unresolved Feedback
    
    - **App stuck in `Running=False/Starting` and "already exists" errors**
      *General comment by @mook-as*
      The author reported their app gets stuck during start and subsequentially triggers an "instance already exists" error during retry. This is fully addressed and explained in findings **C1** and **I1**.
    - **Support for BATS under WSL**
      https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/350#discussion_r3236092197
      The author questioned if `cygpath -m` breaks running BATS directly under WSL (since `cygpath` is typically an MSYS2/Cygwin utility, not natively present in WSL unless bridged, and `is_windows` evaluations might differ in WSL). Still actionable if WSL compatibility is required for the integration suite.
    
    ### Coverage Summary
    
    - `bats/tests/32-app-controllers/engine-docker.bats` — S1, S2
    - `pkg/apis/app/v1alpha1/app_types.go` — Reviewed, no issues
    - `pkg/controllers/app/app/controllers/app_controller.go` — Reviewed, no issues
    - `pkg/controllers/app/app/controllers/app_controller_test.go` — Reviewed, no issues
    - `pkg/controllers/app/app/lima-images-unix.yaml` — Trivial
    - `pkg/controllers/app/app/lima-images-wsl2.yaml` — Trivial
    - `pkg/controllers/app/engine/controller.go` — Reviewed, no issues
    - `pkg/controllers/app/engine/controllers/docker_context.go` — Reviewed, no issues
    - `pkg/controllers/app/engine/controllers/docker_context_test.go` — Reviewed, no issues
    - `pkg/controllers/app/engine/controllers/docker_watcher.go` — Reviewed, no issues
    - `pkg/controllers/app/engine/controllers/engine_reconciler.go` — Reviewed, no issues
    - `pkg/controllers/containers/controller.go` — Reviewed, no issues
    - `pkg/instance/instance.go` — C1
