Deep Review: 20260504-192042-pr-348

Date	2026-05-04 19:20
Repo	rancher-sandbox/rancher-desktop-daemon
Round	6
Author	@jandubois
PR	#348 — rdd kubectl: embed kuberlr-style version resolver
Branch	`embed-kuberlr`
Commits	`72bfad2` rdd kubectl: embed kuberlr-style version resolver
Reviewers	Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default)
Verdict	Merge with fixes — Round 5 fixes for S1 docs, S2 download UX, S4 BATS comment, S5 SkipResolver bool, and S6 dead error return all landed cleanly. Round 6 surfaces no Critical or Important findings; the round's most actionable item is S1 — the same Round 5 amendment that swapped `SkipResolver` from env-var to package bool left `docs/design/environment.md:35` describing `RDD_KUBECTL_RESOLVED` as written by `rdd ctl`, which is now untrue. Both Claude (I3) and Codex (S1) caught this regression. The remaining Suggestions are smaller defensive items (Gemini's `--` separator misclassification in `isClientOnly`, plus three Claude items: 256 MiB cap surfaces as checksum mismatch, missing User-Agent header, dead `os.Chmod` call on Windows).
Wall-clock time	`33 min 47 s`

Executive Summary ¶

Round 5's six Suggestions all landed; Round 6 produces no Critical or Important findings. The five surviving Suggestions group into three buckets:

Documentation drift introduced by Round 5's SkipResolver fix (S1, two agents). Round 5's S5 changed SkipResolver() from os.Setenv(envSkipResolver, "1") to a package-level skipResolver bool. The internal-variables table in docs/design/environment.md:35 still says "rdd ctl sets it before invoking the kubectl action," which is no longer true — rdd ctl calls kuberlr.SkipResolver() (cmd/rdd/kubectl.go:48), and the env var is set only by kuberlr.Exec on the kubectl child. A developer grepping a process tree for RDD_KUBECTL_RESOLVED=1 will see it on a downloaded kubectl child but never on the rdd ctl parent the doc claims sets it.
-- separator boundary misclassified as client-only (S2, Gemini). isClientOnly walks args and breaks on -- (pkg/kuberlr/resolver.go:278-280). For kubectl -- get pods, the loop breaks before recording any subcommand, the empty-subcommand fallback at line 312-314 returns true, and the resolver skips the probe — exactly the misclassification the cautionary comment at lines 264-269 calls out. The trigger is rare (no kubectl recipe puts -- before the subcommand) but the fallback should still resolve, not skip.
Defensive cleanup in the new code — S3 (io.LimitReader's 256 MiB cap silently truncates and surfaces as "checksum mismatch" rather than naming the cap), S4 (Go-http-client/1.1 UA may be rate-limited by corporate/air-gapped mirrors), S5 (os.Chmod(tmpName, 0o755) is a no-op on Windows; comment or wrap).

The Windows os.Rename race surfaced again from all three agents (Codex I1, Gemini I1, Claude I1) and was dropped per the prior-round-resolutions gate (declined Rounds 1–5). Gemini also brought back the CacheRoot panic as Critical with a different fix shape (silent os.TempDir() fallback); same finding declined in Round 5 as Suggestion S3, dropped per the gate.

Structure: 0 Critical · 0 Important · 5 Suggestions.

Critical Issues ¶

None.

Important Issues ¶

None.

Suggestions ¶

S1. docs/design/environment.md:35 says rdd ctl sets RDD_KUBECTL_RESOLVED, but Round 5's S5 fix removed that env-var write Codex Claude¶

docs/design/environment.md


`rdd` sets these variables itself; users should leave them alone. They appear here so a developer reading a process tree or strace output can identify them.

| Variable | Description |
| --- | --- |
| `RDD_KUBECTL_RESOLVED` | Recursion guard for the kubectl version resolver. `kuberlr.Exec` sets it on the kubectl child process so a downloaded kubectl that re-execs `rdd` through a shim cannot recurse back into version resolution. `rdd ctl` sets it before invoking the kubectl action so the resolver skips the redundant probe against the embedded apiserver, whose version matches the embedded kubectl by construction. |

## Path Variables

`rdd svc paths --output=shell` exports these variables. Most are informational; `RDD_CACHE_DIR` is also honored as the cache-root override (see Configuration Variables above).

Round 5's S5 fix replaced the os.Setenv(envSkipResolver, "1") body of SkipResolver with a package-level skipResolver bool (pkg/kuberlr/resolver.go:30-41); rdd ctl then calls that helper at cmd/rdd/kubectl.go:48. The internal-variables table in environment.md still credits rdd ctl with writing the env var. The intro at line 31 — "they appear here so a developer reading a process tree or strace output can identify them" — sets the wrong expectation: only the downloaded kubectl child carries RDD_KUBECTL_RESOLVED=1 now.

pkg/kuberlr/resolver.go

// envSkipResolver tells Resolve to short-circuit. Exec sets it on the
// kubectl child process so a downloaded kubectl that re-execs us through
// a shim cannot recurse back into version resolution.
const envSkipResolver = "RDD_KUBECTL_RESOLVED"

// skipResolver short-circuits Resolve for the rest of this process.
// Same-process toggle; envSkipResolver covers the cross-process recursion
// guard that Exec sets on the kubectl child.
var skipResolver bool

// SkipResolver short-circuits Resolve for the rest of this process.
// rdd ctl calls this before kubectlAction because it always targets
// the embedded apiserver, whose version matches the embedded kubectl
// by construction — the probe would always fall through anyway.
func SkipResolver() {
	skipResolver = true
}

// serverProbeTimeout caps the discovery call. A reachable cluster answers
// in under 100 ms; an unreachable one would otherwise stall every kubectl
// invocation.
const serverProbeTimeout = 2 * time.Second

cmd/rdd/kubectl.go

		return fmt.Errorf("failed to set KUBECONFIG: %w", err)
	}
	// rdd ctl always targets the embedded apiserver. Embedded apiserver
	// and embedded kubectl share a k8s.io/kubernetes module entry, so
	// the version probe would always fall through. Skip it.
	kuberlr.SkipResolver()
	return kubectlAction(cmd, args)
}

func newKubectlCommand() *cobra.Command {
	command := &cobra.Command{

docs/design/environment.md

| `RDD_NAMESPACE` | Default Kubernetes namespace for BATS controller tests. | `rdd-bats` |
| `RDD_VM_TYPE` | Lima VM type for tests that boot a VM (`qemu` or `vz`). Useful for reproducing QEMU-specific failures on macOS. | Lima's default (`vz` on macOS, `qemu` on Linux) |

### Internal Variables

`rdd` sets these variables itself; users should leave them alone. They appear here so a developer reading a process tree or strace output can identify them.

| Variable | Description |
| --- | --- |
| `RDD_KUBECTL_RESOLVED` | Recursion guard for the kubectl version resolver. `kuberlr.Exec` sets it on the kubectl child process so a downloaded kubectl that re-execs `rdd` through a shim cannot recurse back into version resolution. `rdd ctl` sets it before invoking the kubectl action so the resolver skips the redundant probe against the embedded apiserver, whose version matches the embedded kubectl by construction. |

Fix: rewrite the second sentence to describe the in-process toggle.

-| `RDD_KUBECTL_RESOLVED` | Recursion guard for the kubectl version resolver. `kuberlr.Exec` sets it on the kubectl child process so a downloaded kubectl that re-execs `rdd` through a shim cannot recurse back into version resolution. `rdd ctl` sets it before invoking the kubectl action so the resolver skips the redundant probe against the embedded apiserver, whose version matches the embedded kubectl by construction. |
+| `RDD_KUBECTL_RESOLVED` | Recursion guard for the kubectl version resolver. `kuberlr.Exec` sets it on the kubectl child process so a downloaded kubectl that re-execs `rdd` through a shim cannot recurse back into version resolution. `rdd ctl` achieves the same skip via the in-process `kuberlr.SkipResolver()` helper, not this env var. |

S2. isClientOnly misclassifies -- before subcommand as client-only Gemini¶

pkg/kuberlr/resolver.go

// subcommand. An unknown long flag (--foo) gets treated as no-value
// so its successor stays a candidate for the subcommand position.
// The conservative bias keeps Resolve probing on commands isClientOnly
// cannot identify; misclassifying as client-only would skip the
// version skew check and silently use a mismatched embedded kubectl.
func isClientOnly(args []string) bool {
	if len(args) == 0 {
		return true
	}
	subcommand := ""
	sawClient := false
	for i := 0; i < len(args); i++ {
		arg := args[i]
		if arg == "--" {
			break
		}
		// kubectl bool flags accept both bare and equals form
		// (`--help`, `--help=true`, `-h=false`, `--client=1`); the
		// equals form follows pflag semantics, with last occurrence
		// winning for repeated flags.
		if arg == "--help" || arg == "-h" {
			return true
		}
		if v, ok := boolFlagValue(arg, "--help", "-h"); ok {
			if v {
				return true
			}
			continue
		}
		if arg == "--client" {
			sawClient = true
			continue
		}
		if v, ok := boolFlagValue(arg, "--client"); ok {
			sawClient = v
			continue
		}
		if strings.HasPrefix(arg, "-") {
			if !strings.Contains(arg, "=") && kubectlGlobalFlagsTakingValue[arg] && i+1 < len(args) {
				i++
			}
			continue
		}
		if subcommand == "" {
			subcommand = arg
		}
	}
	if subcommand == "" {
		return true
	}
	if subcommand == "version" {
		// `kubectl version` (no --client) probes the server for its
		// version. Treat that as cluster-bound so the resolver picks
		// a version-matched binary to do the probe.
		return sawClient
	}
	return clientOnlySubcommands[subcommand]
}

kubectl -- get pods is syntactically valid: pflag treats -- as the end-of-flags marker and Cobra reads the first positional as the subcommand. isClientOnly walks args, breaks on -- at line 278-280, finds no subcommand, and returns true via the empty-subcommand fallback at line 312-314. The resolver then skips the probe and runs the embedded kubectl — exactly the misclassification the comment at lines 264-269 warns against. Probability is low (no documented kubectl recipe puts -- before the subcommand) but the fallback's bias should still be "probe, don't skip."

pkg/kuberlr/resolver.go

// the resolver can skip without losing correctness. Empty args, the
// help flags, "version --client", and the subcommands in
// clientOnlySubcommands all qualify; everything else falls through
// to the probe so the version skew check still runs.
//
// The walk skips known global flags and their values to land on the
// subcommand. An unknown long flag (--foo) gets treated as no-value
// so its successor stays a candidate for the subcommand position.
// The conservative bias keeps Resolve probing on commands isClientOnly
// cannot identify; misclassifying as client-only would skip the
// version skew check and silently use a mismatched embedded kubectl.
func isClientOnly(args []string) bool {
	if len(args) == 0 {
		return true
	}
	subcommand := ""

Fix: take the next positional after -- as the subcommand candidate before breaking.

 for i := 0; i < len(args); i++ {
     arg := args[i]
     if arg == "--" {
+        if subcommand == "" && i+1 < len(args) {
+            subcommand = args[i+1]
+        }
         break
     }

A TestIsClientOnly row covering ["--", "get", "pods"] (expect false) would lock the fix in.

S3. streamGet's 256 MiB cap surfaces as "checksum mismatch", not as "binary too large" Claude¶

pkg/kuberlr/downloader.go

		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName)
	h := sha512.New()
	if err := streamGet(ctx, base, io.MultiWriter(tmp, h), maxKubectlBytes); err != nil {
		tmp.Close()
		return fmt.Errorf("downloading %s: %w", base, err)
	}
	if err := tmp.Close(); err != nil {
		return err

If kubectl ever exceeds maxKubectlBytes (256 MiB), io.LimitReader returns silently after maxBytes, the digest covers a truncated body, and the caller sees a checksum mismatch error pointing at integrity instead of the size cap. fetchSha512 already gets this right: a misconfigured mirror serving the binary at the .sha512 URL surfaces as "malformed checksum from %s: %d chars" (downloader.go:153-159). Mirror the dedicated error path on the binary side so a future cap bump is signalled by a clear "exceeds N MiB" message rather than a misleading hash mismatch.

pkg/kuberlr/downloader.go

	}
	// Normalize to lowercase to match download's hex.EncodeToString
	// output. PowerShell Get-FileHash emits uppercase hex; without
	// this, the comparison rejects valid digests as mismatched.
	digest := strings.ToLower(fields[0])
	if len(digest) != sha512.Size*2 {
		return "", fmt.Errorf("malformed checksum from %s: %d chars, want %d", url, len(digest), sha512.Size*2)
	}
	for _, c := range digest {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return "", fmt.Errorf("malformed checksum from %s: non-hex character %q", url, c)
		}
	}
	return digest, nil
}

// streamGet performs a GET on url and copies the response body into w,

Fix: wrap the limit reader so reaching the cap returns a distinct sentinel and the error message names the cap.

S4. HTTP requests omit a User-Agent header Claude¶

pkg/kuberlr/downloader.go

// streamGet performs a GET on url and copies the response body into w,
// capped at maxBytes. The cap turns a malicious or misconfigured mirror
// into a bounded failure: the underlying io.LimitReader stops reading
// after maxBytes regardless of the server's intent. streamGet returns
// an error on any non-200 status.
func streamGet(ctx context.Context, url string, w io.Writer, maxBytes int64) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
	if err != nil {
		return err
	}
	resp, err := httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	_, err = io.Copy(w, io.LimitReader(resp.Body, maxBytes))
	return err
}

dl.k8s.io is operated by SIG Release and accepts the default Go client UA today, but corporate proxies and air-gapped mirrors regularly use UA-based rate limiting or denylists for Go-http-client/1.1. A stable UA also helps the SIG Release team correlate kuberlr-style traffic. One header on each request closes the gap.

Fix:

 req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
 if err != nil {
     return err
 }
+req.Header.Set("User-Agent", "rdd-kuberlr/"+componentbasever.Get().GitVersion)

S5. os.Chmod(tmpName, 0o755) is a no-op on Windows Claude¶

pkg/kuberlr/downloader.go

	}
	got := hex.EncodeToString(h.Sum(nil))
	if got != want {
		return fmt.Errorf("checksum mismatch for %s: want %s, got %s", base, want, got)
	}
	if err := os.Chmod(tmpName, 0o755); err != nil {
		return err
	}
	return os.Rename(tmpName, dst)
}

// Body-size caps for streamGet. maxSha512Bytes covers the longest
// `sha512sum`-format line (128 hex chars + two spaces + a generous

Windows file mode is reduced to a single read-only bit; executability is determined by extension (.exe) and PE header. 0o755 toggles only the read-only bit (writable, since the write bit is set), so the chmod call has no functional effect. Mode 0755 doesn't break anything, but the next reader of downloader.go will read the line as "ensure +x on Windows," which it isn't. A one-line comment or a runtime.GOOS != "windows" guard removes the ambiguity.

Fix:

+// On Windows the executable bit doesn't exist (extension and PE
+// header decide); chmod only affects the read-only bit. Skip the
+// call rather than mislead the next reader.
+if runtime.GOOS != "windows" {
+    if err := os.Chmod(tmpName, 0o755); err != nil {
+        return err
+    }
+}
-if err := os.Chmod(tmpName, 0o755); err != nil {
-    return err
-}

Design Observations ¶

Concerns ¶

Resolver runs against context.Background() (future) Claude. cmd.Context() returns context.Background() because cli.RunNoErrOutput calls cmd.Execute(), not cmd.ExecuteContext(). The 5 min httpClient.Timeout and the 2 s discovery cfg.Timeout are the only cancellation backstops; SIGINT to rdd during a download kills the process without running the defer os.Remove(tmpName), so a Ctrl-C mid-download leaks a .kubectl-* tempfile. Wiring signal.NotifyContext(context.Background(), os.Interrupt) in main and switching to cmd.SetContext(ctx) would close this. Pairs with the existing TODO(eviction) at cache.go:33-39, which already covers the cleanup side.
Cache hit trusts the on-disk binary without re-verification (future) Claude. ensureCached returns immediately when cachedPath(want) exists; nothing checks the cached file's sha512 against the upstream digest on hit. Filesystem corruption or a manually-tampered cache file surfaces as a confusing kubectl exec failure rather than a "cache integrity error" the resolver could re-download from. Re-hashing every cached binary on every kubectl invocation is too expensive; a future LRU/TTL sweep (the TODO(eviction) at cache.go:33-39) is the right place to verify before evicting or reusing.
Mirror checksum gives no security against a compromised mirror (future, persistent) [carry-over from Round 5]. download fetches the kubectl binary and its .sha512 from the same mirror; the hash protects against truncation and disk corruption but not against a mirror serving a malicious binary alongside a matching malicious checksum. SIG Release signs binaries with Sigstore/Cosign. Verifying the Cosign signatures (in addition to or in place of the mirror's own checksum) closes the supply-chain gap. Out of scope for this PR.
Hand-rolled subcommand walker duplicates the bound pflag set (future, persistent) [carry-over from Rounds 3–5]. parseKubeConfigFlags already binds the kubectl global flags via genericclioptions.ConfigFlags.AddFlags; isClientOnly re-walks args manually with the hand-maintained kubectlGlobalFlagsTakingValue map. S2 this round is a fresh shape of the same drift class — -- boundary handling slipping past the manual walker. Driving the walker from the same FlagSet (fs.Args() after parse returns the positional tokens) eliminates the map and most of the drift class.

Strengths ¶

Round 5 fixes landed without regression ClaudeCodex. environment.md Path Variables preamble now scopes the no-effect claim correctly; cmd_service.md key table includes cache_dir; the BATS comment names the right export site; SkipResolver uses a package-level bool; serverVersion returns (semver.Version, bool) with the dead error path gone; the warn-level "downloading kubectl" line surfaces at the default log level.
Process replacement on Unix vs. spawn on Windows is the right split Claude. syscall.Exec keeps the kubectl child at the expected place in the process tree, and the Windows path's *cliexit.Error envelope plus the empty-message branch in cmd/rdd/main.go:163-165 propagates the kubectl exit code without a confusing logrus.Error line.
Same-process toggle plus cross-process env-var recursion guard Claude. SkipResolver() for rdd ctl (in-process bool) and RDD_KUBECTL_RESOLVED for the rare downloaded-shim re-exec (env-based) cover their respective shapes minimally; the env var no longer leaks into unrelated subprocesses.
Hash before mutation; body caps at the right place; reusing kubectl's flag binding ClaudeCodexGemini. Round 5 strengths persist: download verifies the digest before any chmod or rename; streamGet caps the digest body at 4 KiB and the binary at 256 MiB; parseKubeConfigFlags binds genericclioptions.NewConfigFlags(true) so the resolver and kubectl child target the same cluster by construction.
Graceful fall-through on probe failure Gemini. serverVersion returns ok=false for missing kubeconfig, malformed config, unreachable cluster, and unparseable version. Each path falls through to the embedded kubectl rather than aborting the kubectl invocation. Client-only commands keep working when the cluster is unreachable.
Robust BATS server fixture Gemini. fake-kube-server opens its log file with O_APPEND so the BATS-side : >"${LOG_FILE}" truncation between tests does not race with the server's writes.

Testing Assessment ¶

Coverage gaps surfaced this round, ordered by user-reachable risk:

isClientOnly with -- before subcommand (S2). TestIsClientOnly covers ["exec", "pod", "--", "completion"] (-- after subcommand) but not ["--", "get", "pods"] (-- before subcommand). One row pinning the post-fix behavior would close the gap and lock the conservative bias in.
Concurrent first-use cache miss (declined Rounds 1–5; raised again this round by all three agents). Acknowledged limitation.
Download larger than maxKubectlBytes (S3). streamGet's 256 MiB cap has no test asserting the post-cap behavior; a unit test that points streamGet at a body larger than maxBytes would document the truncation contract.
End-to-end Windows execution path (pkg/kuberlr/exec_windows.go). PR description gates the merge on Windows BATS support; the path remains compile-only.
embeddedVersion parse failure (the go run case the comment at resolver.go:79-83 names). A unit test that injects an unparseable string and asserts Resolve returns ("", nil) with no probe would lock in the documented dev-build fall-through.
RDD_KUBECTL_RESOLVED=1 on the parent rdd kubectl Claude. No test confirms the env-var recursion guard short-circuits Resolve; only the same-process SkipResolver boolean is exercised indirectly via rdd ctl.

Verification this round: Codex ran go test ./pkg/kuberlr (passed) and go test ./... inside bats/tests/10-cli/fake-kube (passed); go test ./cmd/rdd failed in the worktree on a missing embedded asset (pkg/guestagent/lima-guestagent.gz), unrelated to the PR.

Documentation Assessment ¶

docs/design/environment.md:35 (Internal Variables) carries the regression in S1: the RDD_KUBECTL_RESOLVED row still credits rdd ctl with writing the env var that Round 5's SkipResolver change removed.
The Path Variables preamble at line 39 (post Round 5 fix) now correctly carves out RDD_CACHE_DIR from the "no effect on RDD's behavior" claim.
docs/design/cmd_service.md key table at lines 127-138 includes cache_dir after Round 5; example output at lines 152-164 matches.
pkg/kuberlr/cache.go package doc at lines 5-10 still accurately describes the resolve-or-fall-through model.
parseKubeConfigFlags docstring at pkg/kuberlr/resolver.go:157-178 enumerates the bound flag surface and explains the deliberate --username/--password carve-out.

Commit Structure ¶

Single commit 72bfad2 (amended from Round 5's fb0811d). Message accurately describes the change. Clean.

Acknowledged Limitations ¶

Cache eviction not implemented — pkg/kuberlr/cache.go:33-39. TODO(eviction) covers both growth (~50 MB per minor version) and .kubectl-* temp leftovers from SIGKILL/power loss; SIGINT mid-download (resolver runs against context.Background()) is the most likely source of a leaked tempfile and pairs with the cache-hit re-verification design observation above.
Offline mode not implemented — pkg/kuberlr/resolver.go:66-69, pkg/kuberlr/downloader.go:46-48. TODOs cover the planned env-var/config opt-out for download attempts.
Windows BATS host deferred — bats/tests/10-cli/7-kubectl-cache.bats:14-20. PR description gates the merge on Windows BATS support; the resolver's Windows-specific code paths remain compile-only.
local_teardown_file() deviation from BATS convention — bats/tests/10-cli/7-kubectl-cache.bats:103-110. Inline rationale stands.
Falling back to embedded on any cluster-probe failure — pkg/kuberlr/resolver.go:115-145. Each fall-through path is documented; the design choice is sound.
Embedded version unparseable in dev builds — pkg/kuberlr/resolver.go:77-86. go run ./cmd/rdd kubectl ... and IDE debug builds skip -ldflags -X; falling through to embedded kubectl rather than breaking every dev invocation is the right call.
--username/--password deliberately unbound on the resolver side — pkg/kuberlr/resolver.go:168-175. Binding them via WithDeprecatedPasswordFlag would force the interactive kubeconfig loader; the resolver's silent fall-through on probe failure covers the asymmetry.
RDD_KUBECTL_MIRROR accepts non-HTTPS URLs — Implicit. The BATS test relies on http://127.0.0.1:port; for users who set this to a non-localhost http URL, MITM is possible. Trusting the env-var owner is consistent with other RDD_* overrides.
Empty-message cliexit.Error contract — cmd/rdd/main.go:159-160. The contract is clear in context but currently single-call-site (only kuberlr.Exec produces one).

Declined in prior rounds ¶

Concurrent first-use Windows os.Rename race — declined Rounds 1, 2, 3, 4, 5. Raised this round by all three agents (Claude I1, Codex I1, Gemini I1); dropped per the resolutions gate. The recurring re-raise (now six rounds running) keeps strengthening the case for an inline // design note: see review history style TODO documenting the deliberate design choice so future agents recognize the decline pattern in the code itself.
CacheRoot panic on os.UserCacheDir failure — declined Rounds 1, 2, 3 ("trigger requires neither $HOME nor %LocalAppData% to resolve"); declined Round 5 as Suggestion S3 (blast-radius re-raise: structural change touches multiple files for a low-probability defensive case). Raised this round by Gemini as C1 with a different fix shape (silent os.TempDir() fallback). Dropped per the resolutions gate; the underlying decline reasoning (rare trigger, structural change disproportionate) holds across the new fix proposal.
Drive-by config.json → config.yaml rename in cmd_service.md — declined Round 5 ("Round 4's S3 — raised by Claude itself — explicitly asked for the rename, which then landed"). Raised again this round by Claude as S4 (drive-by); dropped per the gate.
RDD_CACHE_DIR scope rename — declined Round 1.
Server log.Fatal* bypassing defer logFD.Close() — declined Round 1: test fixture; OS reclaims fd.
Pure-Go unit coverage thin on download/cache machinery — declined Round 2: BATS suite covers the integration concerns.
BATS coverage for HTTP timeout/hang, network cancel, Windows Exec, sha512sum-format mirror, trailing-slash mirror, default-log-level message capture, unreachable-server probe — declined Rounds 1–3.
info.GitVersion empty/unknown unit test — declined Round 1: requires mocked discovery client.
Resolver-policy summary and recursion-guard rationale duplicated across files — declined Round 1: each duplicate serves a different audience.
SkipResolver swallows os.Setenv error — declined Round 3; moot since Round 5 removed the Setenv call entirely.
Cache the server-version probe across invocations — declined Round 3: performance optimisation; explicit follow-up.
Move package doc-comment from cache.go to resolver.go — declined Round 3.
Drop is_macos branch in write_kubectl_sha512 — declined Round 3.
Round 4 S6: serverVersion ignores ctx — skipped Round 4: UX gap bounded by 2 s probe timeout.
Round 4 S10: 5 min downloadTimeout covers headers + body — skipped Round 4: speculative.
Round 5 testing gaps (S3 panic test, download UX assertion, concurrent cache miss, Windows execution path, unparseable gitVersion BATS) — Skipped Round 5; rationale stands.

Unresolved Feedback ¶

@mook-as on cmd/rdd/service_paths.go:33 — https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/348#discussion_r3182959199: "Would it be possible to expose RDD_CACHE_DIR here too?" Resolved across rounds. cache_dir (kuberlr.CacheRoot()) and kubectl_cache (kuberlr.CacheDir()) appear in instancePaths() (service_paths.go:32-33), the environment.md Path Variables table, and the cmd_service.md example output and key table.

Agent Performance Retro ¶

Claude ¶

Claude carried this round's coverage: four Suggestions kept (the shared S1 docs regression with Codex, plus three unique items — S3 the 256 MiB cap surfacing as checksum mismatch, S4 the missing User-Agent, S5 the dead os.Chmod on Windows) plus two Design Observations (resolver running against context.Background() and trusting cache hits without re-verification). The S1 catch was the round's most actionable item; Claude noticed that Round 5's SkipResolver change broke the env-var documentation in the same file Round 5 fixed earlier. Three findings dropped: I1 (Windows rename race) per the resolutions gate (six rounds running), I2 (a self-downgraded note about --instance interaction with the walker), and S4 (drive-by config.json → config.yaml, which Round 5 already declined as the rename Round 4 explicitly asked for). Claude S2 (cache-hit re-verification) moved to Design Observations as Claude himself flagged it as "no fix required for this PR."

Codex ¶

Codex ran a tight one-Important / one-Suggestion shape: I1 (Windows rename race, dropped per the gate) and S1 (RDDKUBECTLRESOLVED docs, kept and shared with Claude). Codex framed the docs finding as Suggestion-severity from the start, which matches the calibration the consolidation settled on; Claude's I3 was downgraded to align. Codex also ran go test ./pkg/kuberlr and go test ./... inside bats/tests/10-cli/fake-kube (both passed) — verification work hard to parallelize across agents and consistent across rounds.

Gemini ¶

Gemini brought back the CacheRoot panic as Critical (C1, dropped per the gate — declined Round 5 for the same blast-radius reasoning) and the Windows rename race as Important (I1, also dropped). The keeper was I2: the -- boundary misclassification in isClientOnly. No other agent traced the empty-subcommand fallback path through to the cautionary comment at lines 264-269. Severity calibrated down from Important to Suggestion (S2) given how rare kubectl -- subcommand is in practice. One unique finding from Gemini balances out the two declined re-raises.

Summary ¶

	Claude Opus 4.7	Codex GPT 5.5	Gemini 3.1 Pro
Duration	15m 21s	6m 02s	—
Findings	4S	1S	1S
Tool calls	45 (Bash 24, Read 21)	35 (shell 35)	—
Design observations	2	0	0
False positives	0	0	0
Unique insights	3	0	1
Files reviewed	19	19	19
Coverage misses	0	0	0
Totals	4S	1S	1S
Downgraded	1 (I→S)	0	1 (I→S)
Dropped	4	1	2

Claude provided the most coverage (three unique Suggestions plus two Design Observations). Codex's S1 was the round's most actionable item (the regression Round 5 introduced). Gemini's S2 was the round's most surprising — a real walker-bias bug in code that has been reviewed five times.

#### Reconciliation

Claude I1 (Windows rename race): important → dropped per resolutions gate (declined Rounds 1–5).
Claude I2 (--instance walker interaction): important → dropped (Claude self-downgraded to a doc-comment note; marginal value).
Claude I3 (RDDKUBECTLRESOLVED docs): important → suggestion S1 (docs-only regression; merged with Codex S1).
Claude S2 (cache hit re-verification): suggestion → Design Observation (Claude flagged "no fix required for this PR").
Claude S4 (drive-by config.yaml rename): suggestion → dropped per resolutions gate (declined Round 5).
Codex I1 (Windows rename race): important → dropped per resolutions gate.
Gemini C1 (CacheRoot panic): critical → dropped per resolutions gate (declined Round 5 with same blast-radius reasoning; Gemini's TempDir-fallback fix shape doesn't change the decline rationale).
Gemini I1 (Windows rename race): important → dropped per resolutions gate.
Gemini I2 (-- separator misclassification): important → suggestion S2 (rare trigger; conservative-bias correction).

Review Process Notes ¶

Skill improvements ¶

Add a "watch for documentation regressions in the same file as the code fix" rule to the consolidation gates. When a prior round amends a function's behavior (env-var write removed, return type changed, semantics inverted) and a docs file in the same PR describes that behavior, the docs are highly likely to drift in the round that lands the code fix. Today the prior-round-resolutions gate verifies that fixes landed; add a sibling check: for every Resolution table row marked "Fixed" that touched named functions or constants, grep the diff for documentation references to those names and confirm they describe the post-fix behavior. The check would have caught S1 mechanically — Round 5's S5 fix removed os.Setenv(envSkipResolver, …) from SkipResolver, but environment.md:35 still says "rdd ctl sets it." Future amendments that rename, remove, or invert behavior carry the same shape.

Repo context updates ¶

None this round. The existing deep-review-context.md carries enough for agents to surface S1 (docs-vs-code drift) and S2 (walker bias against the conservative-bias comment) on a careful read; no new repo-specific patterns emerged.