Deep Review: 20260424-130819-pr-333
| Date | 2026-04-24 13:18 |
| Repo | rancher-sandbox/rancher-desktop-daemon |
| Round | 2 (of target) |
| Author | @Nino-K |
| PR | #333 — Docker Engine: improve Docker context probe safety and isolation |
| Commits | 34cb44d Docker Engine: improve Docker context probe safety and isolation |
| Reviewers | Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default) |
| Verdict | Merge with fixes — the round-1 cleanup landed cleanly (dead helper removed, bailouts now log, tests isolated via DOCKER_CONFIG, flaky test stabilized), but one false-override class the PR claims to close remains (DOCKER_CONTEXT env), plus two new local-in-@test regressions and one round-1 suggestion still unaddressed. |
| Wall-clock time | 14 min 24 s |
Executive Summary ¶
Round 2 folds in every round-1 fix: getDockerContextHost is gone, the five "assume healthy" bailouts all log at V(1), the BATS suite uses DOCKER_CONFIG=${BATS_FILE_TMPDIR}/..., and the flaky "no healthy context" test now seeds an unreachable named context so the probe takes the probeNamedDockerContext branch regardless of the host's default Docker socket. The core refactor — ContextStore + WithTLSData + EndpointMeta.ClientOpts() — is sound and correctly retires the TCP+TLS / mTLS / SkipTLSVerify / SSH false-override classes the PR description calls out.
What remains clusters in three buckets:
- One false-override class is still open.
getCurrentDockerContext()readsconfig.jsononly; the Docker CLI'sDOCKER_CONTEXTenv var is ignored. WhenDOCKER_CONTEXT=<named-context>is set in the daemon's (frozen) startup env andconfig.jsonhas nocurrentContext, the code probes the implicit default endpoint viaFromEnv(which does not consultDOCKER_CONTEXT) and — if that default is unreachable — overwritesconfig.jsonwithrancher-desktop-{instance}. A user who later unsetsDOCKER_CONTEXTsees our context as their new default. - Two new BATS regressions. The round-1 test fix introduced two
localdeclarations inside an@testblock atengine-docker.bats:790-792, which violates the repo's documented BATS style (localinside@testis redundant and explicitly flagged). - Documentation drift and one carry-over. The old inline comment explaining why empty/
defaultwere treated identically disappeared with the refactor; the new behavior (probe viaFromEnv) is not documented at the call site, anddocs/design/bootstrap.md:58,86still describes the old always-promote semantics. The round-1 S5 TODO(windows) wording fix was not picked up.
Report structure: 0 Critical, 1 Important, 8 Suggestions, 1 Design Concern, 4 Design Strengths.
Critical Issues ¶
None.
Important Issues ¶
// getCurrentDockerContext reads the currentContext field from ~/.docker/config.json.
// Returns an empty string if the file does not exist or no context is set.
// config.Load validates every "auths" entry; a malformed credential fails
// this call (and set/clearCurrentDockerContext) until the user repairs
// config.json.
func getCurrentDockerContext() (string, error) {
dir, err := dockerConfigDir()
if err != nil {
return "", err
}
cf, err := config.Load(dir)
if err != nil {
return "", err
}
return cf.CurrentContext, nil
}
func setCurrentDockerContext(name string) error {
dir, err := dockerConfigDir()
if err != nil {
return err
getCurrentDockerContext() reads cf.CurrentContext from config.json and nothing else (docker_context.go:88-98). The Docker CLI resolves the effective context with an env-first precedence: DOCKER_HOST bypasses contexts entirely, then DOCKER_CONTEXT selects a named context, then config.json's currentContext, then the platform default. dockerclient.FromEnv (used by probeDockerContext) reads DOCKER_HOST / DOCKER_CERT_PATH / DOCKER_TLS_VERIFY but not DOCKER_CONTEXT.
// getCurrentDockerContext reads the currentContext field from ~/.docker/config.json.
// Returns an empty string if the file does not exist or no context is set.
// config.Load validates every "auths" entry; a malformed credential fails
// this call (and set/clearCurrentDockerContext) until the user repairs
// config.json.
func getCurrentDockerContext() (string, error) {
dir, err := dockerConfigDir()
if err != nil {
return "", err
}
cf, err := config.Load(dir)
if err != nil {
return "", err
}
return cf.CurrentContext, nil
}
func setCurrentDockerContext(name string) error {
dir, err := dockerConfigDir()
if err != nil {
return err
The unpatched sequence:
- User sets
DOCKER_CONTEXT=myremote(e.g., an SSH or mTLS context to a remote Docker), then runsrdd svc start.rdd svc startspawns the daemon viaexec.Commandwith no explicitEnv, so the daemon inheritsDOCKER_CONTEXT=myremotein its frozen startup env. config.jsonhas nocurrentContext.getCurrentDockerContext()returns"", the reconciler enters the else branch, andprobeDockerContextpings the platform default socket.- The user has no local Docker (that is why they were using
DOCKER_CONTEXT=myremotein the first place), so the default socket probe fails. setCurrentDockerContext("rancher-desktop-{instance}")overwritesconfig.json.
While DOCKER_CONTEXT is set, the user's active context is still myremote (env wins). The damage surfaces when the user unsets DOCKER_CONTEXT: their previous "default" — the implicit myremote — is gone, replaced by rancher-desktop-{instance}. This is exactly the false-override class the PR description enumerates for SSH/mTLS contexts.
Fix: resolve the effective context using the daemon's frozen env before probing. The simplest correct stance, given the daemon can only see its spawn-time env, is to not auto-promote when DOCKER_CONTEXT is set and DOCKER_HOST is not:
current, err := getCurrentDockerContext()
if err != nil {
log.Error(err, "Failed to read current Docker context")
return
}
+// Docker CLI resolves a named context from DOCKER_CONTEXT when
+// config.json has no currentContext; getCurrentDockerContext does
+// not. Consult the env so the probe targets the right endpoint and
+// we do not rewrite config.json to shadow an env-selected context.
+if current == "" && os.Getenv("DOCKER_HOST") == "" {
+ if envCtx := os.Getenv("DOCKER_CONTEXT"); envCtx != "" {
+ current = envCtx
+ }
+}
var healthy bool
if current != "" && current != "default" {
healthy = probeNamedDockerContext(probeCtx, current)
Note the prior-round design concern still stands independently: the daemon's env is frozen at spawn, so any user change to DOCKER_HOST / DOCKER_CERT_PATH / DOCKER_TLS_VERIFY / DOCKER_CONTEXT after rdd svc start is invisible to the probe. The fix above does not close that bigger gap — it only addresses the at-spawn-time case — but it closes the specific false-override path a DOCKER_CONTEXT-only user hits today.
(important, gap)
Suggestions ¶
# host has a running Docker daemon at the default socket (e.g.
# ubuntu-latest runners ship Docker pre-installed). Using a named context
# also avoids the stale-env issue: the service process's DOCKER_HOST is
# frozen at first start, so FromEnv-based probing of "" / "default" is
# host-dependent and unreliable from inside a long-lived daemon.
local probe_target="test-unreachable"
run_e -0 docker_context_dir "${probe_target}"
local probe_dir="${output}"
mkdir -p "${probe_dir}"
cat >"${probe_dir}/meta.json" <<EOF
{"Name":"${probe_target}","Endpoints":{"docker":{"Host":"unix:///nonexistent/does-not-exist.sock","SkipTLSVerify":false}}}
The round-1 test fix added two new local declarations at lines 790 and 792 (the diff against the round-1 base confirms both are new). The repo's BATS style rule flags local inside an @test block as redundant — each @test already runs in its own subshell. The pre-existing local context_name on line 780 is out of scope for this PR, but the two new ones are.
run -0 jq -r '.Endpoints.docker.Host' "${meta_file}"
assert_output "unix://${socket_path}"
}
@test "moby engine sets currentContext when no healthy context exists" {
local context_name="rancher-desktop-${RDD_INSTANCE}"
# Seed a named context pointing at an unreachable socket and make it
# current. This forces the reconciler into the probeNamedDockerContext
# branch with a guaranteed-failing endpoint, independent of whether the
# host has a running Docker daemon at the default socket (e.g.
Fix:
- local probe_target="test-unreachable"
+ probe_target="test-unreachable"
run_e -0 docker_context_dir "${probe_target}"
- local probe_dir="${output}"
+ probe_dir="${output}"
(suggestion, regression)
// When we cannot determine whether the context is healthy — because the store
// is unreadable, the endpoint metadata is missing or malformed, the scheme is
// not tcp/unix (e.g. SSH), or the TLS config cannot be loaded — we return true
// (assume healthy). This conservatism prevents RDD from clobbering a context
// that is in the middle of being edited or migrated.
func probeNamedDockerContext(ctx context.Context, name string) bool {
log := logf.FromContext(ctx).WithName("docker-context").WithValues("context", name)
s, err := newContextStore()
if err != nil {
log.V(1).Error(err, "Cannot open context store; assuming context healthy")
return true
}
md, err := s.GetMetadata(name)
if err != nil {
if errdefs.IsNotFound(err) {
return false
}
log.V(1).Error(err, "Cannot read context metadata; assuming context healthy")
return true
}
epMeta, err := docker.EndpointFromContext(md)
if err != nil {
log.V(1).Error(err, "Cannot decode docker endpoint; assuming context healthy")
return true
}
scheme, _, _ := strings.Cut(epMeta.Host, "://")
switch scheme {
case "unix", "tcp":
default:
log.V(1).Info("Non-tcp/unix endpoint scheme; assuming context healthy", "scheme", scheme)
return true
}
ep, err := docker.WithTLSData(s, name, epMeta)
if err != nil {
log.V(1).Error(err, "Cannot load TLS data; assuming context healthy")
return true
}
opts, err := ep.ClientOpts()
if err != nil {
log.V(1).Error(err, "Cannot build client options; assuming context healthy")
return true
}
probeCtx, cancel := context.WithTimeout(ctx, dockerContextProbeTimeout)
defer cancel()
return pingDocker(probeCtx, opts...)
}
The new function has five distinct return paths (open-store error, NotFound → false, decode error, non-tcp/unix scheme → true, TLS/ClientOpts error, unreachable ping). Only the unreachable-unix-socket path has integration coverage via the BATS test; the unit-test file (docker_context_test.go) only renames getDockerContextHost calls to testGetContextHost and does not exercise the new function. The branches most likely to regress on a future rewrite — errdefs.IsNotFound returning false, and the ssh:// short-circuit returning true — have no unit test anchoring the contract the docstring describes.
Fix: add three table-driven cases to docker_context_test.go: (a) seed a context with Host: "ssh://user@host" and assert probeNamedDockerContext returns true; (b) call with a name the store does not contain and assert false; (c) seed a unix:// context pointing at a nonexistent socket and assert false. All three can run against the existing newDockerTestDir fixture. The TCP+TLS / mTLS / SkipTLSVerify paths — the motivation for the PR — deserve coverage too, and a seeded fake context store with TLS data plus an unreachable endpoint would pin the TLS construction without needing a live TLS daemon.
(suggestion, gap)
if e != nil {
e.stop()
}
r.removeDockerContext()
}
// manageDockerContext creates the instance Docker context and, in a goroutine,
// probes the user's current context; if it is absent or unhealthy, switches
// the default to the new context. At most one probe runs at a time.
func (r *EngineReconciler) manageDockerContext(socketPath string) {
// TODO(windows): Docker context management uses unix:// sockets; Windows
// requires npipe:// in the context meta file, an npipe-aware probe client,
// and platform-specific path handling in getCurrentDockerContext and
// createReplaceDockerContext. Track in a follow-up issue before shipping
The function-level comment says RDD switches to its context when the user's current is "absent or unhealthy". After the refactor, "absent" (empty or "default" current) no longer unconditionally counts as "unhealthy" — probeDockerContext(ctx) pings the implicit default endpoint via FromEnv and returns true when a foreign daemon responds. docs/design/bootstrap.md carries the same stale contract:
- Line 58: *"It creates a
rancher-desktop-2docker context. If there is no current context, or if the context isn't accepting connections, then it makes the new context the default."* - Line 86: *"
rdd starthas created therancher-desktop-2docker context, but it will only activate it if the current context doesn't exist, or is non-functional."*
Fix: update the docstring and both design-doc bullets to say empty / "default" current is now treated as "use Docker's implicit default endpoint, and only promote RDD when that endpoint does not respond."
(suggestion, documentation)
Test_getDockerContextHost_missing refers to a production function that no longer exists Claude¶
_, err := getCurrentDockerContext()
assert.Assert(t, err != nil, "malformed auths entry must surface as an error")
}
func Test_getDockerContextHost_missing(t *testing.T) {
newDockerTestDir(t)
host, err := testGetContextHost(t, "does-not-exist")
assert.NilError(t, err)
assert.Equal(t, host, "")
}
The commit deletes getDockerContextHost from production code and introduces a test-only testGetContextHost helper. The Test_getDockerContextHost_missing test calls the helper, not the removed production function, so a future reader grepping for getDockerContextHost lands on a test whose subject is gone. The other two call sites (Test_createReplaceDockerContext, Test_deleteDockerContext) already name their subject by the production function they exercise, so the pattern is inconsistent.
Fix: rename to match the helper, or fold the NotFound case into a Test_probeNamedDockerContext_missing that asserts probeNamedDockerContext returns false for an absent context (and delete the helper, which now has only one caller).
(suggestion, enhancement)
// would overwrite a file we cannot safely round-trip.
log.Error(err, "Failed to read current Docker context")
return
}
var healthy bool
if current != "" && current != "default" {
healthy = probeNamedDockerContext(probeCtx, current)
} else {
healthy = probeDockerContext(probeCtx)
}
// Guard against writing currentContext after removeDockerContext has
// already cancelled probeCtx and deleted the context directory.
if !healthy && probeCtx.Err() == nil {
if err := setCurrentDockerContext(contextName); err != nil {
log.Error(err, "Failed to set current Docker context", "context", contextName)
The old code carried an explicit comment above this block: *"Probe the current context's host — not our own — to decide whether to promote ours. default and empty both mean no working context is set; treat them identically."* The refactor deleted the comment, and the new behavior (probe the implicit default via FromEnv so a foreign Docker daemon does not get clobbered) now has no call-site explanation. probeDockerContext's own docstring states what it does but not why the reconciler routes empty/default into it.
Fix: add a one-line comment above the branch — e.g. // Probe the implicit default endpoint (DOCKER_HOST or platform default socket) so a foreign Docker daemon is not overridden. — so a reader sees the intent without cross-referencing probeDockerContext.
(suggestion, documentation)
// manageDockerContext creates the instance Docker context and, in a goroutine,
// probes the user's current context; if it is absent or unhealthy, switches
// the default to the new context. At most one probe runs at a time.
func (r *EngineReconciler) manageDockerContext(socketPath string) {
// TODO(windows): Docker context management uses unix:// sockets; Windows
// requires npipe:// in the context meta file, an npipe-aware probe client,
// and platform-specific path handling in getCurrentDockerContext and
// createReplaceDockerContext. Track in a follow-up issue before shipping
// Windows support.
if runtime.GOOS == "windows" {
return
}
contextName := instance.Name()
log := logf.FromContext(r.watcherCtx).WithName("docker-context")
getCurrentDockerContext delegates to docker/cli's config.Load, and createReplaceDockerContext (as well as newContextStore) delegates to the store's built-in path handling; both resolve %USERPROFILE%\.docker on Windows through os.UserHomeDir() with no extra work required on RDD's side. The Windows-specific work concentrates in one line of createReplaceDockerContext (the hard-coded "unix://" prefix) and in the probe client selection. Round 1 (S5) flagged the same inaccuracy and it was not picked up.
Fix:
- // TODO(windows): Docker context management uses unix:// sockets; Windows
- // requires npipe:// in the context meta file, an npipe-aware probe client,
- // and platform-specific path handling in getCurrentDockerContext and
- // createReplaceDockerContext. Track in a follow-up issue before shipping
- // Windows support.
+ // TODO(windows): createReplaceDockerContext hard-codes "unix://"; Windows
+ // needs "npipe://" plus an npipe-aware probe client in
+ // probeDockerContext / probeNamedDockerContext. Track in a follow-up
+ // issue before shipping Windows support.
(suggestion, enhancement, round 1 S5 unaddressed)
func probeNamedDockerContext(ctx context.Context, name string) bool {
log := logf.FromContext(ctx).WithName("docker-context").WithValues("context", name)
s, err := newContextStore()
if err != nil {
log.V(1).Error(err, "Cannot open context store; assuming context healthy")
return true
}
md, err := s.GetMetadata(name)
if err != nil {
if errdefs.IsNotFound(err) {
The five bailouts (store open error, metadata read error, endpoint decode error, TLS-data load error, ClientOpts error) all log at V(1). V(1) is off at the default verbosity, so an operator diagnosing "why didn't RDD promote its context?" sees no trace unless they raise verbosity first. Each bailout fires at most once per probe transition and covers a genuinely unusual condition (corrupt or partial Docker config), so the default level would not be noisy. The non-tcp/unix scheme branch at line 185 is the exception — SSH/npipe are common and the V(1).Info there is correct.
}
scheme, _, _ := strings.Cut(epMeta.Host, "://")
switch scheme {
case "unix", "tcp":
default:
log.V(1).Info("Non-tcp/unix endpoint scheme; assuming context healthy", "scheme", scheme)
return true
}
ep, err := docker.WithTLSData(s, name, epMeta)
if err != nil {
log.V(1).Error(err, "Cannot load TLS data; assuming context healthy")
Fix: change the five log.V(1).Error calls to log.Error (default level). Leave the scheme-mismatch V(1).Info as-is.
(suggestion, enhancement)
run -0 rdd ctl get ContainerNamespaces --namespace="${RDD_NAMESPACE}" --output=name
refute_output
}
# --- Docker context management ---
# docker_context_dir returns the ~/.docker/contexts/meta/<hash> directory for
# the named context. Docker derives the sub-directory from sha256(name).
docker_context_dir() {
local name="$1"
local hash
# sha256sum on Linux, shasum on macOS
The helper's body now returns ${DOCKER_CONFIG}/contexts/meta/${hash}, but the docstring still says ~/.docker/contexts/meta/<hash>. The two diverged with the DOCKER_CONFIG isolation change.
Fix: update the comment to say ${DOCKER_CONFIG}/contexts/meta/<hash>, or parameterize the description as "the contexts/meta/<hash> directory under DOCKER_CONFIG".
(suggestion, documentation)
Design Observations ¶
Concerns ¶
Context restoration on shutdown (future) Gemini
When RDD promotes itself to currentContext on startup and later shuts down, removeDockerContext clears currentContext rather than restoring whatever the user had before. A user whose previous default was set silently loses that selection. Saving the previous value when promoting and restoring it on shutdown would preserve the user's intent across an RDD lifecycle. Out of scope for this PR, but worth tracking.
Strengths ¶
Docker CLI primitives are the right abstraction (in-scope) Claude Codex Gemini
ContextStore, WithTLSData, and EndpointMeta.ClientOpts() already know how to assemble an HTTP client for every endpoint shape docker CLI supports (unix, tcp, tcp+TLS, mTLS, SkipTLSVerify). Delegating retires four false-override classes the PR description calls out and keeps RDD aligned with Docker's own context format.
Two-function split makes the transport decision explicit (in-scope) Claude Codex
Splitting the old single-signature probeDockerContext(ctx, host) into probeDockerContext(ctx) (FromEnv-backed default probe) and probeNamedDockerContext(ctx, name) (store-backed named probe) makes the "did we load TLS for this?" question unmissable at each call site. The old signature silently used WithHost(host) even when TLS data was required; the new signatures foreclose that class entirely.
Scheme gate short-circuits SSH without building a connhelper (in-scope) Claude
probeNamedDockerContext checks the URL scheme before calling WithTLSData, so SSH and npipe endpoints skip the commandconn subprocess construction ClientOpts would otherwise build. This keeps the "assume healthy for non-tcp/unix" path cheap and avoids spawning an ssh-dial-stdio helper only to time out.
pingDocker extraction centralizes the timeout idiom (in-scope) Claude
Hoisting the "new client + defer close + Ping within ctx" sequence into pingDocker keeps the timeout-creation pattern in one place and makes both probe wrappers one-liners at the call site. The call graph is shallow and the helper has no surprising side effects.
Testing Assessment ¶
Unit tests in docker_context_test.go cover context-store CRUD via the new testGetContextHost helper; the BATS suite exercises the reconciler end-to-end against a real Docker daemon inside Lima, and the DOCKER_CONFIG isolation is a clear improvement over round 1. Residual gaps, ranked by risk:
DOCKER_CONTEXTenv-var handling has no coverage (see I1). A unit test that setsDOCKER_CONTEXT=myname, leavesconfig.jsonempty, and asserts the probe targetsmyname— not the platform default — would pin the I1 fix.- No unit coverage for the TCP+TLS, mTLS, or
SkipTLSVerifyprobe paths — the motivation for the PR. A fake context store seeded with TLS data, probed against an unreachable TLS endpoint, would verifyep.ClientOpts()actually constructs the TLS config and the failure is a reachability failure, not a config failure. - No unit coverage for the SSH/npipe short-circuit (S2). Seeding a context with
Host: "ssh://u@h"and assertingprobeNamedDockerContextreturnstruewithout attempting a ping would lock the branch. - No unit coverage for
probeDockerContext(FromEnv)(S2).DOCKER_HOST=unix:///nonexistent/...should returnfalse— pinning that behavior guards the default-probe contract. - No unit coverage for the
NotFound→ false branch ofprobeNamedDockerContext(S2). Calling with a missing context should returnfalse; this is the one branch whose contract flipped (the oldgetDockerContextHostreturned empty + nil; the new function returnsfalse). - Early return on
getCurrentDockerContexterror lacks a reconciler-level assertion.Test_getCurrentDockerContext_malformedAuthverifies the error surfaces from the helper, but no test confirms the reconciler refuses to callsetCurrentDockerContextwhen the config is malformed.
Documentation Assessment ¶
- The new
probeNamedDockerContextdocstring atdocker_context.go:152-159accurately documents the "assume healthy" policy and its rationale. - The new
probeDockerContextdocstring atdocker_context.go:143-145states what (FromEnv) but not when the reconciler calls it — a cross-reference to themanageDockerContextcall site would help readers (see S5). engine_reconciler.go:317-320anddocs/design/bootstrap.md:58,86describe pre-refactor semantics (see S3).engine-docker.bats:742-743references~/.docker/contexts/...although the body now returns${DOCKER_CONFIG}/contexts/...(see S8).- TODO(windows) at
engine_reconciler.go:322-326overstates the work (see S6, round 1 S5 unaddressed).
Commit Structure ¶
Single commit, scope matches the title, message enumerates the four independent improvements (FromEnv default probe, ContextStore-based named probe, early return on context-lookup errors, DOCKER_CONFIG isolation). The bundling is defensible: the new probe paths depend on the context-store reads, and pinning the store location in tests keeps both changes verifiable together. No objection.
Acknowledged Limitations ¶
- Windows Docker context management is still deferred. Both
manageDockerContextandremoveDockerContextearly-return onruntime.GOOS == "windows". The TODO calls this out explicitly; see S6 for the wording. - SSH/npipe endpoints are trusted implicitly.
probeNamedDockerContextreturnstruefor any scheme that is notunixortcp. The docstring acknowledges the tradeoff: the controller cannot authoritatively probe these endpoints, so it accepts the risk of a dead user-configured context in exchange for not overriding the user's legitimate choice. - Conservative "assume healthy" on unreadable store / malformed metadata.
docker_context.go:154-159documents this as deliberate — clobbering a partially edited or migrated config is worse than leaving it alone. S7 suggests promoting the bailout logs to the default level so operators can diagnose silent non-promotion.
Unresolved Feedback ¶
None. The gh api fetch of PR review comments returned no inline comments on the current commit.
Agent Performance Retro ¶
Claude ¶
Produced the deepest pass on the round-2 code: caught the stale test-function name (S4), the missing call-site comment for the empty/default branch (S5), the carry-over round-1 S5 TODO wording (S6), and the V(1) verbosity question on the new bailout logs (S7). Missed the two local declarations at engine-docker.bats:790,792 despite having produced the BATS test fix suggestion in round 1 — a coverage miss, since Claude marked the BATS file "Reviewed, no issues" while both peers flagged the style regression. Also missed Codex's DOCKER_CONTEXT I1; the omission is notable because the prior-round memory already documents the "daemon env is frozen" pattern, which is exactly what I1 turns on. Coverage Summary was complete and cross-referenced finding IDs.
Codex ¶
Delivered the strongest single finding of the round — I1 (DOCKER_CONTEXT not consulted) — which is in the exact class the PR claims to close and is the only genuinely new bug surface the round uncovered. Also caught the design-doc drift at bootstrap.md:58,86 plus the manageDockerContext docstring (S3) and the stale docker_context_dir helper comment (S8). The diff-only local regression (S1) came in parallel with Gemini. Missed the stale test-function name (S4) and the round-1 TODO carry-over (S6), neither of which costs much — both are within Claude's strike zone.
Gemini ¶
Matched peers on the local regression and on the missing unit-test coverage. The unique design observation about shutdown not restoring the user's previous currentContext is a valid forward-looking concern and worth tracking. Otherwise low yield: both of Gemini's initial "important" findings landed as suggestions after consolidation — the unit-test coverage gap is a testing concern (not a bug), and the local-in-@test regression is a style-rule violation whose blast radius is minimal (tests still pass). Gemini again skipped git blame (known daily-quota behavior); Claude and Codex covered that gap. No coverage misses, no false positives.
Summary ¶
| Claude Opus 4.7 | Codex GPT 5.5 | Gemini 3.1 Pro | |
|---|---|---|---|
| Duration | 5m 40s | 4m 39s | 4m 36s |
| Findings | 5S | 1I 3S | 2S |
| Tool calls | 20 (Bash 12, Read 7, Grep 1) | 40 (shell 40) | 15 (runshellcommand 7, readfile 4, grepsearch 3) |
| Design observations | 3 | 2 | 3 |
| False positives | 0 | 0 | 0 |
| Unique insights | 4 | 3 | 0 |
| Files reviewed | 4 | 4 | 4 |
| Coverage misses | 1 | 0 | 0 |
| Totals | 5S | 1I 3S | 2S |
| Downgraded | 0 | 0 | 2 (I→S) |
| Dropped | 0 | 0 | 0 |
Reconciliation:
- Gemini P1 I1 (missing unit-test coverage) → consolidated as S2 (suggestion) and merged with Claude P1 S2. Reason: test-coverage gaps do not land an Important absent a concrete regression.
- Gemini P1 I2 (
localin@test) → consolidated as S1 (suggestion) and merged with Codex P1 S2. Reason: repo style violation, tests still pass; regression severity sits at suggestion.
Review Process Notes ¶
Skill improvements ¶
- [ ] Add an "env-based resolution" check to review dimensions when reviewing code that reads configuration from files. When code loads configuration from a file to pick a default, check whether the surrounding tool's CLI resolves the same value from an environment variable with higher precedence. If so, the file-only read silently ignores the env case. This applies to any
config.json/YAML/TOML loader that backs a CLI where the CLI also consults env vars. Flag the gap as a false-default path.
Repo context updates ¶
- [+] Docker context resolution precedence. Docker CLI resolves the effective context as:
--contextflag →--hostflag →DOCKER_HOSTenv (bypasses contexts) →DOCKER_CONTEXTenv (names a context) →config.json'scurrentContext→ platform default.dockerclient.FromEnvreadsDOCKER_HOST/DOCKER_CERT_PATH/DOCKER_TLS_VERIFYbut notDOCKER_CONTEXT. Any RDD code that readsconfig.jsonto determine the user's "current" context, or that callsdockerclient.FromEnvas a stand-in for the user's environment, must also consultDOCKER_CONTEXTor document why it ignores it. Flag probe/override code that only readsconfig.json.