Deep Review: 20260514-141313-pr-350
| Date | 2026-05-14 14:13 |
| Repo | rancher-sandbox/rancher-desktop-daemon |
| Round | 2 (of target) |
| Author | @Nino-K |
| PR | #350 — Wire up docker socket windows |
| Commits | 5681e67 Fixes test 104 stopping VM removes all mirror resourcese917657 Wire Docker socket forwarding for Windowsb5ab872 Bump openSUSE to v0.2.3 |
| Reviewers | Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default) |
| Verdict | Merge with fixes — scope the Windows Docker pipe per-instance so RDD never attaches to Docker Desktop's daemon; gate the BATS cygpath path on is_msys so the suite does not break under WSL2. @mook-as's functional report remains unresolved (author investigating). |
| Wall-clock time | 39 min 35 s |
Executive Summary ¶
Round 2 of PR #350. The author addressed every Round 1 finding except the DockerSocket() docstring (prior-round S4), which is now subsumed by the larger pipe-ownership finding below: waitMirrorResourcesGone is bounded (30 s context.WithTimeout) and reads the informer cache (r.List), the standalone containers-controller binary blank-imports the engine package, the stale tcp/unix wording is fixed, the engine reason strings are now exported constants, and the BATS cygpath/local style nits are resolved. The remaining concerns are two Important issues that survive into Round 2 as genuinely new analysis: (1) the Windows endpoint is the global Docker pipe \\.\pipe\docker_engine, so RDD can silently attach to — and mirror/delete containers on — Docker Desktop's daemon if it is installed; and (2) the re-enabled engine-docker.bats gates an MSYS2-only cygpath call on is_windows, which is also true under WSL2, exactly the gap @mook-as raised inline. The core stop/cleanup logic (computeSettledCondition, cleanupMirrorResources, the apiReader/r.List split) is well-reasoned and well-tested at the unit level.
Critical Issues ¶
None.
Important Issues ¶
// DockerSocket returns the path to the Docker socket for this instance
// (e.g., ~/.rd2/docker.sock). This is the host-side socket that Lima
// port-forwards from the guest's /var/run/docker.sock.
// On Windows, returns the named pipe path (\\.\pipe\docker_engine).
var DockerSocket = sync.OnceValue(func() string {
if runtime.GOOS == "windows" {
return `\\.\pipe\docker_engine`
}
return filepath.Join(ShortDir(), "docker.sock")
})
// DockerEndpoint returns the full Docker endpoint URL for this instance.
// On Windows this is a named-pipe URL (npipe:////./pipe/docker_engine);
// on Unix it is a unix-socket URL (unix:///path/to/docker.sock).
var DockerEndpoint = sync.OnceValue(func() string {
if runtime.GOOS == "windows" {
return `npipe:////./pipe/docker_engine`
}
return "unix://" + DockerSocket()
})
On Unix the socket path embeds ShortDir(), which factors in RDD_INSTANCE, so instances are isolated. On Windows both functions return the global docker_engine pipe — and pkg/socketbridge/host_windows.go:20-23 documents that exact name as the one "Docker CLI and Docker Desktop use to reach dockerd."
"github.com/Microsoft/go-winio"
"github.com/go-logr/logr"
"github.com/linuxkit/virtsock/pkg/hvsock"
)
const (
// DockerPipeName is the well-known Windows named pipe that Docker CLI
// and Docker Desktop use to reach dockerd.
DockerPipeName = `\\.\pipe\docker_engine`
// VsockForwardPort is the AF_VSOCK port that rdd-guest listens on inside
// the Lima VM. Must match the constant in cmd/rdd-guest/main.go.
VsockForwardPort uint32 = 6660
)
The failure chain is concrete. socketbridge.HostBridge.Run calls winio.ListenPipe(b.pipeName, nil) and returns an error if the pipe is already owned (pkg/socketbridge/host_windows.go:52-56). That error is swallowed: pkg/controllers/lima/limavm/controllers/hostswitch_windows.go:199-205 runs the bridge in an errgroup goroutine that logs the failure and return nil. Meanwhile the engine controller's Docker client connects to the same endpoint (pkg/controllers/app/engine/controllers/docker_watcher.go:55 via instance.DockerEndpoint()). So with Docker Desktop installed and listening, RDD's bridge either loses the listen race or coexists as a second pipe instance, and RDD's engine client reaches Docker Desktop's daemon. fullSync then mirrors Docker Desktop's containers into RDD's API, and a K8s-side delete forwards ContainerRemove to that daemon — RDD operating destructively on a container engine it does not own. Concurrent RDD instances (RDD_INSTANCE=1 and =2) collide the same way, though the repo's single-user calibration makes the Docker Desktop collision the more pressing leg.
}
}
// Run listens for named-pipe connections and proxies them to the guest until
// ctx is cancelled.
func (b *HostBridge) Run(ctx context.Context) error {
ln, err := winio.ListenPipe(b.pipeName, nil)
if err != nil {
return fmt.Errorf("socket-bridge: listen on %s: %w", b.pipeName, err)
}
b.log.Info("Listening", "pipe", b.pipeName, "vsockPort", b.vsockPort)
go func() {
<-ctx.Done()
ln.Close()
// Start the host-side socket bridge now that we have the VM GUID.
// It listens on the Docker named pipe and forwards each connection to
// rdd-guest inside the VM via vsock port 6660. rdd-guest is baked into
// the VM image (via rancher-desktop-opensuse) and started by systemd.
g.Go(func() error {
bridge := socketbridge.NewDockerHostBridge(vmGUID, logger)
if err := bridge.Run(ctx); err != nil {
logger.Error(err, "Socket bridge exited with error")
}
return nil
})
// Accept vsock connections and feed them into the virtual network.
g.Go(func() error {
for {
conn, err := ln.Accept()
// newDockerWatcher creates a Docker client, performs a full sync, and starts
// the event stream watcher goroutine.
func newDockerWatcher(ctx context.Context, k8s client.Client, apiNamespace string, reconcileChan chan<- event.GenericEvent) (*dockerWatcher, error) {
cli, err := mobyclient.New(
mobyclient.WithHost(instance.DockerEndpoint()),
)
if err != nil {
return nil, fmt.Errorf("failed to create Docker client: %w", err)
}
This is the same code Round 1 flagged as S4 (a docstring gap), but the Round 2 analysis is materially different: it is not the docstring, it is RDD silently mutating a foreign daemon. Round 1's S4 (still unaddressed — the docstring at instance.go:123 continues to say "for this instance") is subsumed here.
// length constraints.
var LimaHome = sync.OnceValue(func() string {
return filepath.Join(ShortDir(), "lima")
})
// DockerSocket returns the path to the Docker socket for this instance
// (e.g., ~/.rd2/docker.sock). This is the host-side socket that Lima
// port-forwards from the guest's /var/run/docker.sock.
// On Windows, returns the named pipe path (\\.\pipe\docker_engine).
var DockerSocket = sync.OnceValue(func() string {
if runtime.GOOS == "windows" {
Gemini additionally claimed the global pipe "breaks the VM boot sequence." That causal leg does not hold: the bridge starts after the VM GUID is known (hostswitch_windows.go:195-205), and @mook-as reported docker run / docker log working — so the pipe binds. His Running=False/Starting stall is a VM-side condition, not a pipe-bind failure.
mux.Handle("/services/forwarder/expose", vn.Mux())
mux.Handle("/services/forwarder/unexpose", vn.Mux())
g, ctx := errgroup.WithContext(ctx)
// Start the host-side socket bridge now that we have the VM GUID.
// It listens on the Docker named pipe and forwards each connection to
// rdd-guest inside the VM via vsock port 6660. rdd-guest is baked into
// the VM image (via rancher-desktop-opensuse) and started by systemd.
g.Go(func() error {
bridge := socketbridge.NewDockerHostBridge(vmGUID, logger)
if err := bridge.Run(ctx); err != nil {
logger.Error(err, "Socket bridge exited with error")
}
return nil
})
// Accept vsock connections and feed them into the virtual network.
g.Go(func() error {
for {
conn, err := ln.Accept()
Fix: scope the pipe to the instance, mirroring the Unix isolation, and pass that name through socketbridge.NewDockerHostBridge instead of the hardcoded DockerPipeName constant. If a well-known alias is required for bare docker.exe compatibility, bind it only after proving RDD owns it, and surface the bind failure into readiness so the engine never treats an externally-owned pipe as RDD's endpoint.
var DockerSocket = sync.OnceValue(func() string {
if runtime.GOOS == "windows" {
- return `\\.\pipe\docker_engine`
+ return `\\.\pipe\` + Name() + `-docker_engine`
}
return filepath.Join(ShortDir(), "docker.sock")
})
var DockerEndpoint = sync.OnceValue(func() string {
if runtime.GOOS == "windows" {
- return `npipe:////./pipe/docker_engine`
+ return `npipe:////./pipe/` + Name() + `-docker_engine`
}
return "unix://" + DockerSocket()
})
engine-docker.bats gates an MSYS2-only cygpath call on is_windows — breaks under WSL2 Claude Codex¶
# Isolate all Docker config reads and writes from the developer's real
# ~/.docker directory. Set DOCKER_CONFIG before starting the service so
# the controller process inherits it (service.Start uses exec.Command
# without an explicit Env, so it inherits the caller's environment).
#
# On Windows, rdd.exe is a native binary: it interprets /tmp/... as a
# path from the drive root (C:\tmp\...) rather than MSYS2's /tmp which
# maps to a different location. cygpath -m produces a mixed-format path
# (C:/msys64/...) that both native Windows processes and MSYS2 agree on.
if is_windows; then
run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
DOCKER_CONFIG=${output}
else
DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
fi
export DOCKER_CONFIG
mkdir -p "${DOCKER_CONFIG}"
# Deliberately skip setup_rdd_control_plane here: `rdd set` internally
# calls ensureServiceRunning, which is exactly the CLI path we want to
# exercise — the engine controller is part of the default controller
# set, so no explicit --controllers selection is needed.
is_windows() returns true for both MSYS2 and WSL2 — bats/helpers/os.bash:13-16 sets OS=windows when /proc/sys/kernel/osrelease contains WSL2, and os.bash:55 provides is_msys() precisely because the two "behave differently from WSL." cygpath is an MSYS2 tool absent under WSL2, so run -0 cygpath ... fails local_setup_file outright there. The rdd() wrapper in bats/helpers/commands.bash already shows the correct pattern — it branches is_msys (uses cygpath) versus WSL (uses wslpath plus a WSLENV adjustment). The same gap applies to DOCKER_HOST="npipe:////./pipe/docker_engine" at line 39 and the meta.json host assertion at lines 791-799: an npipe:// endpoint is meaningful only to a native docker.exe, not a Linux docker client. Before this PR the whole file was skip_on_windows, so removing that guard is what exposed the WSL2 path.
Darwin)
# OS matches the directory name of the PATH_RESOURCES directory,
# so uses "darwin" and not "macos".
OS=darwin
;;
Linux)
if grep --quiet WSL2 </proc/sys/kernel/osrelease; then # spellchecker:ignore
OS=windows
else
OS=linux
fi
;;
MSYS* | MINGW*)
OS=windows
fi
}
# Detect MSYS2 environment (MSYS or MINGW shells).
# Both report OS=windows, but behave differently from WSL.
is_msys() {
[[ "${UNAME}" == MSYS* || "${UNAME}" == MINGW* ]]
}
is_unix() {
! is_windows "$@"
# set, so no explicit --controllers selection is needed.
rdd svc delete
rdd set running=true
if is_windows; then
export DOCKER_HOST="npipe:////./pipe/docker_engine"
else
run -0 rdd svc paths docker_socket
export DOCKER_HOST="unix://${output}"
fi
# Mirror resources live in App.spec.namespace. Override RDD_NAMESPACE
# to whatever the App was created with so the test queries the same
assert_file_exists "${meta_file}"
run -0 jq -r '.Name' "${meta_file}"
assert_output "${context_name}"
if is_windows; then
run -0 jq -r '.Endpoints.docker.Host' "${meta_file}"
assert_output "npipe:////./pipe/docker_engine"
else
run -0 rdd service paths docker_socket
socket_path=${output}
run -0 jq -r '.Endpoints.docker.Host' "${meta_file}"
assert_output "unix://${socket_path}"
fi
}
@test "moby engine sets currentContext when no healthy context exists" {
local context_name="rancher-desktop-${RDD_INSTANCE}"
This is exactly @mook-as's unresolved inline comment ("This wouldn't do the right thing if we're running BATS under WSL, right?"). It is latent rather than CI-breaking only if the project's Windows CI runs MSYS2 exclusively today.
Fix: gate the MSYS2-specific path on is_msys, and decide WSL2 behaviour explicitly — either handle wslpath + WSLENV for DOCKER_CONFIG and the DOCKER_HOST endpoint, or skip under WSL2 if that environment is genuinely unsupported for this suite.
- if is_windows; then
+ if is_msys; then
run -0 cygpath -m "${BATS_FILE_TMPDIR}/docker-config"
DOCKER_CONFIG=${output}
else
DOCKER_CONFIG="${BATS_FILE_TMPDIR}/docker-config"
fi
Suggestions ¶
cleanupMirrorResources comment says "the API server's watch cache" but the poll reads the informer cache Claude¶
errs = append(errs, fmt.Errorf("failed to delete ContainerNamespaces: %w", err))
}
if err := errors.Join(errs...); err != nil {
return err
}
// Block until the API server's watch cache reflects the deletions.
// The Kubernetes watch cache is updated asynchronously: DELETE
// requests commit to etcd synchronously, but the cache event is
// processed in a separate goroutine. On slower systems (e.g.
// Windows CI with SQLite I/O) there is an observable window where
// `kubectl get` returns a just-deleted resource. Polling here
// ensures that ContainerEngineReady is only stamped — and `rdd set
// running=false` only returns — after the cache reflects the clean
// state, so the immediately-following test assertion passes.
return r.waitMirrorResourcesGone(ctx)
}
// waitMirrorResourcesGone polls the informer cache until all four mirror
// resource types are empty or the 30-second timeout expires. r.List is
// used (not r.apiReader) so the poll drains the watch cache, not etcd.
waitMirrorResourcesGone correctly documents itself as polling "the informer cache" (line 504-506), and countMirrorResources uses r.List — the controller-runtime informer cache, which is downstream of the API server's watch stream. An empty informer cache therefore implies the API server already processed the deletes, which is the stronger guarantee the code wants. The logic is right; only the cleanupMirrorResources comment is imprecise and could mislead a future reader into thinking this synchronizes against the apiserver directly.
// running=false` only returns — after the cache reflects the clean
// state, so the immediately-following test assertion passes.
return r.waitMirrorResourcesGone(ctx)
}
// waitMirrorResourcesGone polls the informer cache until all four mirror
// resource types are empty or the 30-second timeout expires. r.List is
// used (not r.apiReader) so the poll drains the watch cache, not etcd.
func (r *EngineReconciler) waitMirrorResourcesGone(ctx context.Context) error {
log := logf.FromContext(ctx)
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
for {
Fix: align line 492's wording — "Block until this controller's informer cache reflects the deletions" — and note that the informer cache lagging the apiserver is exactly what makes it a sufficient gate for downstream rdd ctl get callers.
instance.DockerSocket() on Windows flows a named-pipe string into Lima's HOST_DOCKER_SOCKET param Claude¶
// DockerSocket returns the path to the Docker socket for this instance
// (e.g., ~/.rd2/docker.sock). This is the host-side socket that Lima
// port-forwards from the guest's /var/run/docker.sock.
// On Windows, returns the named pipe path (\\.\pipe\docker_engine).
var DockerSocket = sync.OnceValue(func() string {
if runtime.GOOS == "windows" {
return `\\.\pipe\docker_engine`
}
return filepath.Join(ShortDir(), "docker.sock")
})
// DockerEndpoint returns the full Docker endpoint URL for this instance.
// On Windows this is a named-pipe URL (npipe:////./pipe/docker_engine);
// on Unix it is a unix-socket URL (unix:///path/to/docker.sock).
var DockerEndpoint = sync.OnceValue(func() string {
DockerSocket() has two consumers: DockerEndpoint() (correct — it wants the pipe path) and app_controller.go:102, which formats it into the Lima template's HOST_DOCKER_SOCKET param. A Lima hostSocket is a host filesystem socket path, not a named pipe. This is currently inert because the Lima WSL2 driver advertises SkipSocketForwarding: true, so the value is never acted on — and the pre-existing Windows value was also unused, so this is not a behaviour regression. But the field is now semantically wrong, and the DockerSocket docstring's "host-side socket that Lima port-forwards" no longer holds on Windows.
}
return strings.Join([]string{
baseTemplate,
"param:",
fmt.Sprintf(" CONTAINER_ENGINE: %s", spec.ContainerEngine.Name),
fmt.Sprintf(" HOST_DOCKER_SOCKET: %q", instance.DockerSocket()),
fmt.Sprintf(" HOST_HOME_GUEST: %q", toLinuxPath(hostHome)),
fmt.Sprintf(" KUBERNETES_ENABLED: %v", spec.Kubernetes.Enabled),
fmt.Sprintf(" KUBERNETES_VERSION: %s", spec.Kubernetes.Version),
"",
}, "\n"), nil
Fix: add a brief comment at app_controller.go:102 noting the param is inert on Windows (WSL2 skips socket forwarding; the pipe is bridged by hostswitch), or stop reusing DockerSocket() there so the two semantics do not share one accessor.
waitMirrorResourcesGone / countMirrorResources cleanup-synchronization logic Claude¶
waitMirrorResourcesGone (including the 30 s timeout fall-through that returns a distinct error) and countMirrorResources are exercised only indirectly by bats/tests/32-app-controllers/engine-docker.bats ("stopping VM removes all mirror resources"), which cannot reach the timeout branch. No EngineReconciler-level unit tests exist in this package, so the absence is consistent with repo convention — but the timeout path returns an error that propagates to rdd set as a user-visible failure, and a regression there (e.g. polling the wrong reader) would ship green. Consider a small table test for countMirrorResources against a fake client, or at minimum assert the timeout error message somewhere.
# exercise — the engine controller is part of the default controller
# set, so no explicit --controllers selection is needed.
rdd svc delete
rdd set running=true
if is_windows; then
export DOCKER_HOST="npipe:////./pipe/docker_engine"
else
run -0 rdd svc paths docker_socket
export DOCKER_HOST="unix://${output}"
fi
# Mirror resources live in App.spec.namespace. Override RDD_NAMESPACE
# to whatever the App was created with so the test queries the same
# namespace the engine controller uses, regardless of CRD defaults.
RDD_NAMESPACE=$(rdd ctl get app app -o jsonpath='{.spec.namespace}')
The test hardcodes npipe:////./pipe/docker_engine in both the local_setup_file DOCKER_HOST export and the docker_context_dir host assertion. If I1 is fixed by scoping the pipe per-instance, both sites must fetch the endpoint dynamically (e.g. from rdd svc paths docker_socket) rather than asserting the literal. Worth folding into the I1 fix so the test tracks the endpoint instead of pinning it.
Design Observations ¶
Concerns ¶
ContainerNamespace/mobyis watched but never re-applied on the steady-state path (in-scope) Codex —pkg/controllers/app/engine/controllers/engine_reconciler.go:259-292,834. The PR addsWatches(&containersv1alpha1.ContainerNamespace{}, enqueueApp), which is necessary socountMirrorResources'sr.Listreads a populated informer (the I1 cleanup-wait path). But it also means a delete ofContainerNamespace/mobynow wakes the engine reconciler — and the running steady-state path only processes container actions and finalizers, so nothing recreates the namespace.fullSynccreates it once at watcher startup (docker_watcher.go:615). Round 1 surfaced this as a design observation and noted the BATS test "deleting ContainerNamespace/moby completes without a finalizer hang" deliberately asserts it stays gone, so the behaviour is partly intentional — that reasoning still holds. The repo convention is that afullSync-created mirror must "either appear on the steady-state watch list (and be re-applied on reconcile) or carry a comment naming it as startup-only." It is now on the watch list but neither re-applied nor commented. Resolve the ambiguity: either add anensureContainerNamespacestep on the running path (and flip the test to assert recreation), or add a comment that the watch exists only to populate the cache forcountMirrorResourcesand the namespace is intentionally startup-only.- Orphaned
hostagentprocesses block instance-directory deletion on Windows (future, pre-existing) Gemini —pkg/service/cmd/service.go(not in the diff). Whenrdd svc deleteruns after a stuck start,Stop()issuesTerminateProcesson Windows, which kills the service but can leavehostagentchildren running; surviving children hold file locks underinstance.Dir(), soos.RemoveAllfails withERROR_ACCESS_DENIEDand the nextrdd service startaborts withinstance "rd" already exists. This is pre-existing and outside the diff, but the PR's Windows enablement is what makes the path reachable, and it matches @mook-as's second reported symptom. The orchestrator did not fully traceservice.go; flagged here for the author's active investigation rather than as a finding against this PR. (See repo context on the Windowstaskkill/KillTree-while-parent-alive asymmetry.) - Engine controller now runs in the external
containers-controllerbinary while watching a foreign API group (future) Claude —pkg/controllers/containers/controller.go:12. Blank-importingapp/engineinto thecontainersaggregator is the correct fix for the Round 1 silent-drop finding. ButEngineReconciler.SetupWithManagerdoesFor(&appv1alpha1.App{})— a watch on theappgroup — while theappcontroller that installs the App CRD lives in a different aggregator. So--controllers=containersin standalone external mode starts a controller watching a CRD nothing in that binary installs. Harmless in the embeddedrdd svc servepath (the app controller is always compiled in); the exposure is only the standalone external binary. If standalone external controllers are a supported deployment, the engine controller should tolerate a missing App CRD or document theapp+containersco-dependency; if they are dev-only, a one-line comment closes the question. waitMirrorResourcesGoneblocks the singleton engine reconcile for up to 30 s (future) Claude —pkg/controllers/app/engine/controllers/engine_reconciler.go:507-528. The engine reconciler is a singleton (MaxConcurrentReconciles1), so no other engine event is processed during the poll. This is acceptable: the block only happens on the!wantWatcherstopping path where there is nothing else useful to do, the timeout returns a recoverable error (the next reconcile retries; the terminalStoppedcondition is stamped only after cleanup succeeds), and manager shutdown cancels the parent context promptly. The controller-runtime-idiomatic alternative — delete, returnRequeueAfter, re-check the count at the top ofReconcile— avoids holding the worker but adds reconcile-entry state-machine complexity. The blocking poll is the simpler, reasonable choice; revisit only if the reconciler later needs to stay responsive during stop.
Strengths ¶
computeSettledConditionrequires a terminal engine reason at the current generation (in-scope) Claude Codex Gemini —pkg/controllers/app/app/controllers/app_controller.go:434-463. The new gate (ObservedGeneration >= app.Generationplus reason ∈ {Stopped,NotApplicable}) closes the hole where aConnected/M+1condition written mid-stop satisfied therdd set running=falsewait before mirror cleanup ran. The reasoning is documented inline and the five new table-test cases cover the absent / stale / current-but-not-terminal permutations precisely.- The
apiReader(uncached) vs.r.List(cached) split is deliberate and well-commented (in-scope) Claude Codex Gemini —pkg/controllers/app/engine/controllers/engine_reconciler.go:566-569,541.deleteAllOfTypereads the direct API server so it catches resources the watcher created seconds beforestopWatcher;countMirrorResourcesreads the informer cache so the readiness wait gates on what downstreamrdd ctl getcallers actually observe. Round 1's I1 (unbounded loop reading etcd) is fully resolved by this plus the 30 s bound.
Testing Assessment ¶
Unit coverage for computeSettledCondition is thorough — the five new table cases map one-to-one onto the new branches. Gaps, ranked by risk:
- WSL2 BATS path (I2) —
local_setup_filefails before any test runs under WSL2; the suite has never been exercised there since it wasskip_on_windowsuntil this PR. waitMirrorResourcesGone30 s timeout fall-through (S3) — the user-visible error path is unreachable from the BATS test and has no unit test.alreadyCleanwith the newObservedGenerationterm — no unit test confirms a stale-generation terminal condition forcescleanupMirrorResourcesto re-run; only integration timing covers it.npipe-scheme probe branch —probeNamedDockerContextnow acceptsnpipe, butdocker_context_test.goadds onlyunix://cases; thenpipeprobe path is reachable only via Windows BATS. AseedContext(t, "npipe-ctx", "npipe:////./pipe/some-pipe")case would lock it down on non-Windows CI.- Docker Desktop pipe collision (I1) — no test exercises RDD starting when
\\.\pipe\docker_engineis already owned by another listener.
Both Claude and Codex ran go test against the touched packages and reported it passing; Codex also ran make build-rdd successfully.
Documentation Assessment ¶
instance.go:123-126: theDockerSocketdocstring still says "the path to the Docker socket for this instance" and "host-side socket that Lima port-forwards" — both inaccurate on Windows (the pipe is global and bridged by hostswitch, not port-forwarded). The appended Windows line does not correct the lede. This is the unaddressed Round 1 S4, now folded into I1.engine_reconciler.go:492comment imprecision is covered in S1.- New exported constants in
app_types.go:64-79are each documented clearly, andpkg/controllers/containers/controller.go:6-7package doc was correctly updated to addengineto the aggregated list — good.
Commit Structure ¶
Three commits, each self-contained: the openSUSE bump (b5ab872), the socket-forwarding wiring (e917657), and the test-104 cleanup fix (5681e67). The openSUSE bump is arguably unrelated to "wire up docker socket windows" and could have been its own PR, but it is a one-line-per-file digest bump and does not harm reviewability. Messages match their diffs; no fixup commits. No action needed.
Acknowledged Limitations ¶
engine_reconciler.go:176-195already documents the watcher-died vs. never-started branch and the deliberate dual cleanup paths (stopWatchervs. directremoveDockerContext). Not an issue.docker_watcher.go:200-207notes a TODO for periodic full sync so dropped image/volume events self-heal. Pre-existing and not worsened by this PR.hostswitch_windows.go:151-155/:199-205document fire-and-forget host-switch lifecycle: failures are logged but not surfaced to the reconciler. Pre-existing — but note this is the same swallowed-error path that makes I1's silent Docker Desktop attachment possible.- The
TODO(windows)guards inengine/controller.goandengine_reconciler.gowere resolved by this PR rather than deferred, so there is no remaining acknowledged-limitation comment in the changed code.
Unresolved Feedback ¶
- @mook-as, inline on
engine-docker.bats— "This wouldn't do the right thing if we're running BATS under WSL, right? Do we no longer support that?" (discussion r3236092197). No reply and no subsequent commit addresses it. The codebase clearly still intends to support WSL (commands.bashbranchesis_msysvs. WSL explicitly), so this is a genuine gap, not a declined concern — see I2. - @mook-as, general comment — App stuck at
ContainerEngineReady=False/Stopped,Running=False/Starting;rdd set running=truedying with "failed to watch App: timed out waiting for the condition"; and a wedgedinstance "rd" already existsstate afterservice delete. @Nino-K replied "I'm looking into this." These are runtime integration failures the author is actively investigating. Theinstance "rd" already existssymptom aligns with the orphaned-hostagentdesign observation above. Thetimed out waiting for the conditionsymptom overlaps with the newcomputeSettledConditiongate andwaitMirrorResourcesGonetimeout — worth confirming commit5681e67(which post-dates the discussion) resolved rather than introduced the hang.
Agent Performance Retro ¶
Claude ¶
Strongest on diff-local detail and the test/wiring gaps: it owned three of the four suggestions — the cleanupMirrorResources comment imprecision, the DockerSocket()→Lima-HOST_DOCKER_SOCKET semantic drift (which needed tracing the second consumer of DockerSocket()), and the missing unit coverage — plus the sharpest design observation, that the engine controller now watches the app group from the containers aggregator in the standalone external binary. It also independently caught the WSL2 gate. Its notable miss is the headline: Claude reviewed instance.go and raised only the Lima-param suggestion, never connecting the global pipe to a Docker Desktop collision — the most important finding of the review, which both other agents reached.
Codex ¶
Best analysis on the headline finding. It traced the full claim chain — from socketbridge's ListenPipe failure, through hostswitch's swallowed error, to the engine client connecting to a foreign daemon — which is what makes I1 land as a concrete data-loss risk rather than a docstring nit. It independently caught the WSL2 gate and re-raised the ContainerNamespace steady-state gap (consolidated as a design observation, consistent with Round 1). No false positives; it ran go test and make build-rdd. Zero suggestions — Codex stays on structural issues and leaves the smaller stuff to the other two.
Gemini ¶
Converged with Codex on the pipe-ownership finding and uniquely added the concurrent-RDD-instance angle and the orphaned-hostagent pre-existing gap that explains @mook-as's instance "rd" already exists symptom. But it inflated the pipe finding to Critical with an unverified causal claim — "breaks the VM boot sequence" — that @mook-as's own report contradicts (docker run works, so the pipe binds). Its S2 was a false positive: it argued local socket_path should be restored, having misread the assignment as living in the docker_context_dir helper when it is inside an @test block. As usual, no git blame (quota), so it could not separate PR-introduced from pre-existing. Lightest coverage of the three — it left pkg/controllers/containers/controller.go out of its summary.
Summary ¶
| Claude Opus 4.7 | Codex GPT 5.5 | Gemini 3.1 Pro | |
|---|---|---|---|
| Duration | 29m 29s | 10m 59s | — |
| Findings | 1I 3S | 2I | 1I 1S |
| Tool calls | 31 (Read 17, Bash 9, Grep 5) | 90 (shell 88, stdin 2) | — |
| Design observations | 5 | 3 | 1 |
| False positives | 0 | 0 | 1 |
| Unique insights | 5 | 1 | 1 |
| Files reviewed | 13 | 13 | 12 |
| Coverage misses | 0 | 0 | 1 |
| Totals | 1I 3S | 2I | 1I 1S |
| Downgraded | 0 | 1 (I→S) | 2 (I→S) |
| Dropped | 0 | 0 | 1 |
Reconciliation:
- Gemini C1 "Global named pipe breaks multi-instance isolation": critical → important (I1). The destructive-to-a-foreign-daemon risk is real, but it requires Docker Desktop installed and listening, RDD is pre-release, and actual container deletion needs explicit user action — Important, not Critical. Gemini's "breaks the VM boot sequence" causal leg was dropped as contradicted by @mook-as's report.
- Codex I2 "ContainerNamespace delete events enqueue a reconcile that cannot recreate the namespace": important → design observation, consistent with Round 1's handling — the BATS test deliberately asserts the namespace stays gone after a user delete, so the behaviour is partly intentional.
- Gemini I1 "Orphaned hostagents block instance directory deletion": important → design observation.
pkg/service/cmd/service.gois not in the diff; surfaced as a pre-existing concern that the PR's Windows enablement makes reachable. - Gemini S2 "restore
local socket_path": dropped as a false positive. The assignment is inside the@test "moby engine creates Docker context for the instance"block (engine-docker.bats:775-800), not thedocker_context_dirhelper (763-773), so the repo's "nolocalin@test" rule applies and Round 1's S5 removal was correct.
Codex provided the most value this round — it owned the headline finding's analysis and caught everything else of substance with no false positives. Claude contributed the most breadth on the smaller issues and the cleanest design observations, but missed the headline in a file it did review. Gemini added two useful unique angles but cost a clarification-worthy false positive and an inflated severity.
Review Process Notes ¶
Repo context updates ¶
- [repo]
is_windows()returns true for both MSYS2 and WSL2;is_msys()(inbats/helpers/os.bash) distinguishes them, and therdd()wrapper inbats/helpers/commands.bashbranchesis_msys(cygpath, native-Windows assumptions) versus WSL (wslpath,WSLENV). When reviewing BATS code, flag any path that gates an MSYS2-only tool or a native-Windows-only assumption (e.g. annpipe://endpoint a Linux docker client cannot use) onis_windows— it must useis_msysor explicitly handle the WSL branch. Removing askip_on_windowsguard exposes the WSL2 path, which the suite was never previously exercised against. - [repo] On Windows, RDD's Docker named pipe and Docker Desktop's pipe share the well-known name
\\.\pipe\docker_engine(pkg/socketbridge/host_windows.go). The socketbridgeListenPipefailure is swallowed by the hostswitch goroutine, andinstance.DockerEndpoint()feeds the engine controller's Docker client. When reviewing Windows Docker-endpoint code, verify the pipe is instance-scoped (or that RDD proves it owns the pipe before connecting a client) — an unscoped client can silently attach to Docker Desktop's daemon and mirror or delete its containers.