# Deep Review: 20260519-131414-pr-348

| | |
|---|---|
| **Date** | 2026-05-19 13:14 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 9 |
| **Author** | [@jandubois](https://github.com/jandubois) |
| **PR** | [#348](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/348) — rdd kubectl: embed kuberlr-style version resolver |
| **Branch** | `embed-kuberlr` |
| **Commits** | `30289ab` rdd kubectl: embed kuberlr-style version resolver |
| **Reviewers** | Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default) |
| **Verdict** | **Merge with fixes** — Two new judgment calls surfaced after the Round 8 fixes landed on commit `30289ab`. I1 (Claude) is the contrapositive of Round 8 I1: the `Warnf → Infof` demotion that Round 8 recommended now hides the download notice entirely at `RDD_LOG_LEVEL=warn` (the non-developer default), so users see no feedback during a multi-second-to-multi-minute download. I2 (Codex) is a genuine new defect in the BATS test infrastructure added by this PR: `7-kubectl-cache.bats` calls bare `go env GOOS` while `using_windows_exe()` always selects Windows binaries on WSL, so a WSL dev with Linux Go on PATH populates the mirror at `bin/linux/...` while `rdd.exe` requests `bin/windows/...`. The remaining five Suggestions are polish (test coverage for unbound auth flags, `errors.Is` for sentinel comparison, `defer os.Remove` intent, `cliexit.Error{Err: nil}` contract, and a 2-second probe timeout that may be tight for VPN-bound clusters). Dropped per the resolutions/declined gates: the Windows `os.Rename` race (Codex I1, Gemini C1 — declined Rounds 1-8) and the `exec` auth-plugin probe duplication (Gemini I1 — Round 8 design observation, framed slightly more strongly this round). |
| **Wall-clock time** | `18 min 39 s` |

## Executive Summary

Round 9 reviews commit `30289ab` (Round 8 was `ce035d5`; the Round 8 fixes — `Warnf → Infof`, `kill → process_kill`, exports lifted to `local_setup_file`, `winpath` WSL branch, `--match-server-version` binding, `errors.As` `exitErr.Err != nil` check, and `TestResolveSkipsWhenRecursionGuardSet` — all verified landed). The PR is in good shape; this round surfaces two judgment-call findings rather than fresh structural defects.

I1 (Claude) is the *opposite* shape of Round 8 I1. Round 8 said `Warnf` was too loud at the default log level; Jan fixed by demoting to `Infof`. Round 9 now observes that `Infof` is invisible at the default `RDD_LOG_LEVEL=warn`, so a multi-minute kubectl download from a 3G or VPN link shows the user no feedback — they will think rdd has hung. The right answer is some level that *does* render at the default threshold while not feeling like a spurious warning; a plain `fmt.Fprintln(os.Stderr, ...)` is one option that side-steps the level-vs-rendering tradeoff.

I2 (Codex) is a genuine new defect in the BATS plumbing this PR adds. `bats/tests/10-cli/7-kubectl-cache.bats:15-17,35,38` calls bare `go env GOOS` / `go env GOOS` / `go build`. On WSL, `bats/helpers/defaults.bash:10-18` `using_windows_exe()` always returns true and `commands.bash:1-8` sets `EXE=.exe`, so rdd is `rdd.exe` and `runtime.GOOS=windows` inside it. If the WSL dev's PATH resolves `go` to Linux Go, the fake mirror lands at `bin/linux/$GOARCH/` while `rdd.exe` requests `bin/windows/$GOARCH/`, producing a 404 instead of exercising the resolver.

Dropped per the resolutions/declined gates: Codex I1 and Gemini C1 (concurrent Windows `os.Rename` cache-publish race) — declined Rounds 1-8 inclusive, ninth re-raise this round with no new evidence. Gemini I1 (the user's KUBECONFIG `exec` credential plugin can prompt or open a browser before kubectl prints anything) is the same shape as the Round 8 Design Observation "Double execution of KUBECONFIG `exec` auth plugins"; carried over with Gemini's stronger framing rather than re-raised as new.

Structure: 0 Critical · 2 Important · 5 Suggestions.

## Critical Issues

None.

## Important Issues

I1. **Round 8 `Warnf → Infof` fix overshoots: download notice is now invisible at default log level** — `pkg/kuberlr/downloader.go:78` [Claude Opus 4.7]

```go
logrus.Infof("downloading kubectl v%d.%d.%d from %s", want.Major, want.Minor, want.Patch, mirrorURL())
```

Round 8 I1 flagged this line as `Warnf`, arguing the WARN banner at default log level was too loud for a routine cache-miss event. Jan demoted to `Infof`. Round 9 surfaces the contrapositive: `cmd/rdd/main.go:96-101` defaults `RDD_LOG_LEVEL` to `warn` outside developer mode, so `Infof` never reaches the user's terminal. A `rdd kubectl get pods` against an out-of-skew cluster on a slow link now appears to hang for the full `downloadTimeout` window (up to 5 minutes, per `downloader.go:39`) with no indication that rdd is downloading. A user who Ctrl+C's the apparent hang loses the partial download and may chase a "rdd is broken" debugging path.

This is a genuine three-way trade-off the Round 8 framing flattened:

| Level | Visible at default | Banner color | Connotation |
|---|---|---|---|
| `Warnf` | yes | yellow `WARN[...]` | "something is wrong" |
| `Infof` | no | n/a | invisible |
| `fmt.Fprintln(os.Stderr, ...)` | yes | none | plain operator notice |

The third option side-steps the level-vs-rendering tradeoff: it always renders, carries no severity connotation, and matches how most CLIs surface progress on a long operation. Compare `git clone`'s `Cloning into 'x'...`, `cargo build`'s `Compiling x v1.2.3`, `apt`'s `Get:1 http://...` — none use a logging framework's WARN level. The line is operator-facing CLI output, not application telemetry.

```diff
-	logrus.Infof("downloading kubectl v%d.%d.%d from %s", want.Major, want.Minor, want.Patch, mirrorURL())
+	fmt.Fprintf(os.Stderr, "downloading kubectl v%d.%d.%d from %s\n", want.Major, want.Minor, want.Patch, mirrorURL())
```

Alternatives if Jan prefers to stay inside logrus: keep `Warnf` and document the deliberate choice inline (reversing Round 8 I1), or add a project-specific log level for one-shot operator notices.

I2. **WSL BATS run on `7-kubectl-cache.bats` populates the mirror at `bin/linux/...` while `rdd.exe` requests `bin/windows/...`** — `bats/tests/10-cli/7-kubectl-cache.bats:15-17,35,38` [Codex GPT 5.5]

```bash
GOOS=$(go env GOOS)
export GOOS
GOARCH=$(go env GOARCH)
export GOARCH
...
(
    cd "${BATS_TEST_DIRNAME}/fake-kube" || exit
    go build -ldflags='-s -w' \
        -o "$(winpath "${MIRROR_BIN_DIR}/kubectl${EXE}")" \
        ./kubectl
    go build -ldflags='-s -w' \
        -o "$(winpath "${BATS_FILE_TMPDIR}/fake-kube-server${EXE}")" \
        ./server
)
```

`bats/helpers/defaults.bash:10-18` `using_windows_exe()` returns true on every WSL run today (the TODO at line 16 notes Linux-binary coverage on WSL is not yet wired up), and `bats/helpers/commands.bash:1-8` sets `EXE=.exe` for that path. So `rdd` resolves to native `rdd.exe`, whose `runtime.GOOS` is `windows`. The resolver downloads from `bin/${runtime.GOOS}/${runtime.GOARCH}/kubectl.exe`.

But `setup_file` calls bare `go env GOOS`. On a WSL dev workstation that has Linux Go on PATH (the common case — `apt install golang` or the official Go tarball), `go env GOOS` returns `linux`. `MIRROR_BIN_DIR` then evaluates to `${MIRROR_ROOT}/release/v1.99.0/bin/linux/$GOARCH`, and the `go build` calls produce ELF binaries at that path. When `rdd.exe` probes the mirror, it GETs `bin/windows/$GOARCH/kubectl.exe.sha512` and hits a 404. The test fails not because the resolver is broken but because the fixture was built for the wrong platform.

The test passes on macOS, Linux, and MSYS2 (which has no Linux Go option). The failure surface is specifically WSL with a Linux Go toolchain.

Codex's `go${EXE}` fix is one option but assumes `go.exe` reaches a Windows Go toolchain on PATH — not guaranteed on every WSL setup. A more robust fix hard-pins GOOS/GOARCH to match the rdd binary the test will exercise:

```diff
-GOOS=$(go env GOOS)
-export GOOS
-GOARCH=$(go env GOARCH)
-export GOARCH
+if using_windows_exe; then
+    export GOOS=windows
+    export GOARCH=$(go env GOARCH)  # arch matches host even cross-OS
+else
+    GOOS=$(go env GOOS)
+    export GOOS
+    GOARCH=$(go env GOARCH)
+    export GOARCH
+fi
...
-        go build -ldflags='-s -w' \
+        GOOS="${GOOS}" go build -ldflags='-s -w' \
             -o "$(winpath "${MIRROR_BIN_DIR}/kubectl${EXE}")" \
             ./kubectl
-        go build -ldflags='-s -w' \
+        GOOS="${GOOS}" go build -ldflags='-s -w' \
             -o "$(winpath "${BATS_FILE_TMPDIR}/fake-kube-server${EXE}")" \
             ./server
```

The `GOOS=...` inline on `go build` makes Linux Go cross-compile a Windows binary on WSL. Linux Go ships a working cross-compiler out of the box, so this needs no additional toolchain on the dev machine. The `fake-kube-server${EXE}` build also needs to produce a Windows binary on WSL because `local_setup_file` execs it directly at line 52 — production-side `rdd.exe` and the test's server need to match the OS.

(important, regression)

## Suggestions

S1. **`parseKubeConfigFlags` test does not assert that `--username` / `--password` stay unbound** — `pkg/kuberlr/resolver_test.go:69-153` [Claude Opus 4.7]

```go
type want struct {
    ...
    username   string
    password   string
    ...
}
```

The `want` struct carries `username` and `password` fields and the comparison loop dereferences `cf.Username` / `cf.Password`, but no test case sets either flag in `args`. The docstring at `resolver.go:168-175` justifies leaving them unbound to dodge `ToRawKubeConfigLoader`'s interactive credential prompt; a future refactor that calls `WithDeprecatedPasswordFlag` would silently bind them and the test suite as written would not catch the regression. Add cases that pass the flags and assert the bound values remain empty:

```diff
+		{"--username unbound by design", []string{"--username=alice", "get"}, want{}},
+		{"--password unbound by design", []string{"--password=secret", "get"}, want{}},
```

(suggestion, gap)

S2. **Sentinel comparison in fake server uses `!=` instead of `errors.Is`** — `bats/tests/10-cli/fake-kube/server/main.go:91` [Claude Opus 4.7]

```go
if err := server.Serve(listener); err != nil && err != http.ErrServerClosed {
    log.Fatalf("serve: %v", err)
}
```

`http.ErrServerClosed` is a sentinel value; `errors.Is(err, http.ErrServerClosed)` is the standard idiom for sentinel comparison. The fixture never wraps the error today, so the equality compare works, but the code reads like production-shaped Go from a quick scan and the repo uses `errors.Is` for sentinels elsewhere.

```diff
-	if err := server.Serve(listener); err != nil && err != http.ErrServerClosed {
+	if err := server.Serve(listener); err != nil && !errors.Is(err, http.ErrServerClosed) {
```

(suggestion, enhancement)

S3. **`defer os.Remove(tmpName)` runs after a successful `Rename`; consider scoping to the failure path** — `pkg/kuberlr/downloader.go:107-128` [Claude Opus 4.7]

```go
tmp, err := os.CreateTemp(filepath.Dir(dst), ".kubectl-*")
...
tmpName := tmp.Name()
defer os.Remove(tmpName)
...
return os.Rename(tmpName, dst)
```

After a successful `Rename`, `tmpName` no longer exists, so the deferred `os.Remove` returns `ENOENT` silently — harmless on both Unix and Windows. The reader still has to mentally verify the deferred call cannot delete the *new* `dst` (it cannot: `Rename` moves the inode, leaving `tmpName` empty). A small boolean-guarded defer expresses intent more clearly:

```diff
-	tmpName := tmp.Name()
-	defer os.Remove(tmpName)
+	tmpName := tmp.Name()
+	cleanup := true
+	defer func() {
+		if cleanup {
+			os.Remove(tmpName)
+		}
+	}()
...
-	return os.Rename(tmpName, dst)
+	if err := os.Rename(tmpName, dst); err != nil {
+		return err
+	}
+	cleanup = false
+	return nil
```

Purely an intent / readability change; no behavioral difference today.

(suggestion, enhancement)

S4. **`cliexit.Error{Err: nil}` no-log contract is implicit; only one producer relies on it** — `cmd/rdd/main.go:172-178`, `pkg/kuberlr/exec_windows.go:39` [Claude Opus 4.7]

```go
var exitErr *cliexit.Error
if errors.As(err, &exitErr) {
    if exitErr.Err != nil {
        logrus.Error(err)
    }
    os.Exit(exitErr.Code)
}
```

The only producer of `&cliexit.Error{Code: ...}` with `Err == nil` today is `pkg/kuberlr/exec_windows.go:39` (kubectl's child has already written its own stderr; rdd should not log on top of it). Round 8 S4's resolution captured the consumer-side change cleanly. The implicit contract still lives only in the comment at `main.go:170-171`; a future caller that builds `&cliexit.Error{Code: 5}` without an `Err` will silently lose its log output, which is the right behavior only if the caller has already written to stderr. A named constructor documents the convention at the producer site:

```diff
+// Silent wraps a kubectl-child exit code without an inner error. main.go
+// suppresses its log line in this case because the child has already
+// written its own stderr.
+func Silent(code int) *Error { return &Error{Code: code} }
```

(suggestion, enhancement)

S5. **`serverProbeTimeout = 2 * time.Second` may be tight for VPN-bound corporate clusters** — `pkg/kuberlr/resolver.go:41-44` [Claude Opus 4.7]

```go
// serverProbeTimeout caps the discovery call. A reachable cluster answers
// in under 100 ms; an unreachable one would otherwise stall every kubectl
// invocation.
const serverProbeTimeout = 2 * time.Second
```

The comment's "under 100 ms" holds for LAN clusters and direct internet routes. A GKE / EKS / AKS control plane reached through a corporate split-tunnel VPN can routinely take 1-3 seconds to complete TLS handshake + IAM auth + return `/version`, especially on a cold connection. A spurious timeout falls through silently to the embedded kubectl (per `serverVersion`'s contract at `resolver.go:135-138`), so an out-of-skew cluster reached over a slow VPN runs the embedded mismatched kubectl with no operator-visible reason. 5 seconds is the same order of magnitude and absorbs the realistic tail without weakening the unreachable-cluster guard.

(suggestion, enhancement)

## Design Observations

### Concerns

- **KUBECONFIG `exec` credential plugin can prompt or open a browser before kubectl runs (in-scope)** [Gemini 3.1 Pro] — carried over from the Round 8 Design Observation "Double execution of KUBECONFIG `exec` auth plugins". Gemini's framing this round is sharper: the first user-visible side effect of `rdd kubectl get pods` against an EKS / GKE cluster whose `exec` plugin (`aws eks get-token`, `gke-gcloud-auth-plugin`, `kubelogin`) lacks a cached token is the plugin firing — including an interactive prompt, OS keychain unlock, or browser window — from rdd's `serverVersion` probe, *before* the user knows their command has even started. The Round 8 framing was "double execution" (plugin runs in the probe, then again under kubectl); Gemini's framing is "first execution is unexpected" (user thinks `kubectl get pods` triggered the prompt, not `rdd`'s version check). The TODO at `resolver.go:64-67` (`RDD_KUBECTL_OFFLINE`) and the operator escape hatch are still the natural answer; surface here so the design observation does not regress into "carried over without rationale" in future rounds.

### Strengths

- **`Warnf → Infof → ?` review loop is working as intended** [Claude Opus 4.7] — Round 8 surfaced the level choice as a judgment call; Jan picked one direction; Round 9 surfaces the contrapositive. This is the right way for the review process to handle reversible UX tradeoffs: state the trade-off, ship a reasonable answer, accept feedback if the answer overshoots. No process change needed.
- **Single-file flag-parsing test surface locks in subtle behavior** [Claude Opus 4.7] — `TestIsClientOnly` and `TestParseKubeConfigFlags` together exercise 50+ cases of pflag's UnknownFlags handling, `--client=garbage` short-circuit, `--` separator, `-n` namespace aliases, kubectl globals (`--warnings-as-errors`, `--match-server-version`). Hard-won coverage of pflag's quirks that future kubectl-flag changes will benefit from.
- **Recursion guard belt-and-braces** [Claude Opus 4.7] — cross-process via `RDD_KUBECTL_RESOLVED` env var, same-process via `skipResolver` bool, covering both axes (downloaded kubectl re-execs rdd through a shim vs. `rdd ctl` calling `kubectlAction` in-process). `TestResolveSkipsWhenRecursionGuardSet` (Round 8 testing gap, now closed) locks the cross-process axis in.
- **Streaming caps on `streamGet`** [Claude Opus 4.7] — `maxSha512Bytes` (4 KiB) and `maxKubectlBytes` (256 MiB), paired with `downloadTimeout` (5 min), turn a malicious or misconfigured mirror into a bounded failure rather than an unbounded write.
- **Sibling Go module for fake-kube fixtures** [Claude Opus 4.7] — `bats/tests/10-cli/fake-kube/go.mod` keeps test-only dependencies out of the parent module; one server plays both apiserver `/version` and release mirror `/release/*` roles with file-based per-test overrides.
- **Conservative probe fallback policy** [Codex GPT 5.5, Gemini 3.1 Pro] — missing kubeconfig, malformed kubeconfig, unreachable cluster, and unparseable server version all fall through silently to the embedded kubectl. Only download and sha errors surface loudly, matching the policy the resolver comments draw explicitly.

## Testing Assessment

`pkg/kuberlr/resolver_test.go` and the five-test BATS lifecycle in `7-kubectl-cache.bats` cover the user-reachable paths well. Round 9 surfaces test-side gaps, no production-code gaps:

1. **WSL with native `rdd.exe` plus a Linux `go` on PATH** (I2). The mirror tree and `rdd.exe`'s GET path mismatch on OS; the test fails for a reason unrelated to the resolver. Pin GOOS/GOARCH to match `rdd.exe`'s platform when `using_windows_exe` is true.
2. **`--username` / `--password` deliberately unbound on the resolver side** (S1). Add the two test cases that lock the unbound invariant in.
3. **Server `/version` returns valid JSON with an unexpected `gitVersion`** (e.g., `v0.0.0`). The current tests cover unparseable (`""`) and within-skew / out-of-skew cases; an apiserver that answers with a parseable-but-bogus version would currently trigger a download attempt for a release that does not exist on the mirror.
4. **`rdd ctl` does not re-trigger the resolver** — `cmd/rdd/kubectl.go:48` asserts the contract by inspection only; no test exercises `rdd ctl get pods` and confirms the resolver short-circuits via the same-process `skipResolver` flag.

## Documentation Assessment

`docs/design/cmd_service.md` and `docs/design/environment.md` match the resolver behavior. No documentation gaps surfaced this round.

## Commit Structure

Single commit `30289ab`. The `config.json → config.yaml`, `rdd.args → args.json`, and `docker_socket` key-table additions in `cmd_service.md` are doc-sync corrections that predate this PR; Round 7 explicitly green-lit bundling them as drive-bys. Settled.

## Acknowledged Limitations

- **Code comment**: `pkg/kuberlr/cache.go:33-39` documents unbounded cache growth and leaked `.kubectl-*` temp files after SIGKILL or power loss; eviction work is deferred.
- **Code comment**: `pkg/kuberlr/downloader.go:55-57` and `pkg/kuberlr/resolver.go:64-67` document the planned `RDD_KUBECTL_OFFLINE` opt-out; intentionally deferred.
- **Code comment**: `pkg/kuberlr/downloader.go:138-141` documents `maxKubectlBytes` truncation surfacing as "checksum mismatch"; left as a comment per Round 6.
- **Code comment**: `pkg/kuberlr/exec_windows.go:27-30` documents the dormant `CommandContext` hard-kill race against console-delivered Ctrl+C; a future refactor that wires signal-driven cancellation must revisit this site (Round 7 S3 → Commented).
- **Code comment**: `pkg/kuberlr/resolver.go:217-228` documents the `--client=garbage` pflag short-circuit and the explicit deviation from the conservative bias.
- **Code comment**: `pkg/kuberlr/resolver.go:168-175` documents leaving `--username` / `--password` unbound on the resolver side to dodge `ToRawKubeConfigLoader`'s interactive prompt.
- **Code comment**: `bats/tests/10-cli/7-kubectl-cache.bats:102-105` documents the deliberate deviation from the "no `teardown_file`" rule for the ephemeral HTTP-server fixture.

### Declined in prior rounds

- **Concurrent first-use Windows `os.Rename` cache-publish race** — declined Rounds 1, 2, 3, 4, 5, 6, 7, 8. Re-raised this round by Codex (I1) and Gemini (C1). Dropped per the resolutions gate. Nine rounds running.
- **`CacheRoot` panic on `os.UserCacheDir` failure** — declined Rounds 1, 2, 3, 5, 7, 8. Not re-raised this round.
- **`local_teardown_file` exists** — Gemini S1. The deviation is documented in-code at lines 102-105; the comment names the exact rule it deviates from and gives a rationale ("preserve rdd state vs. leak ephemeral port"). Dropped — not a re-raise of a prior-round finding but a re-raise of a documented in-code deviation.
- **Command substitution swallows non-zero exits** (`$(winpath ...)`) — Gemini S2. Same shape as Round 8 S5; declined in Round 8 with reason "winpath/cygpath failure is pathological; the `run -0` rewrite at standalone sites buys minimal practical safety while adding noise to the surrounding code." Dropped per the resolutions gate.

## Unresolved Feedback

None. The Round 8 resolution table lists 11 items, of which 7 were Fixed, 3 were Skipped (S3, S5, S6, S7 — covered by the declined-in-prior-rounds list above), and 1 had a test written. The Round 9 commit `30289ab` reflects all Fixed items.

## Agent Performance Retro

### [Claude]

Claude carried this round. Six findings (1 Important + 5 Suggestions), four of them unique to Claude. The unique I1 — observing that the Round 8 fix overshoots — is the kind of judgment-call reversal review processes are supposed to surface and that no fixed-prompt agent can be expected to produce reliably; it required reading both the prior `.reviews/` round and the post-fix code with attention to default log levels. The five suggestions split between code polish (S3, S4) and test coverage gaps (S1) and one operator-facing tuning (S5). Coverage was complete on the 20 changed files. Calibration was on — nothing was inflated to Important; nothing should be promoted from Suggestion to Important in consolidation.

### [Codex]

Codex returned its trademark tight, focused review: two Important findings, no Suggestions, no design observations beyond the two strengths. I1 (Windows `os.Rename` race) was the long-declined re-raise; dropped per the resolutions gate without prejudice. I2 (WSL `go env GOOS` mismatch) was the second-most-valuable finding this round — a genuine new defect in the test infrastructure this PR added, surface area no other agent reached. Codex ran `go test ./pkg/kuberlr` and `make build-rdd` to verify before writing, the right gate for a build-or-test claim. The Coverage Summary called out 20/20 files; no misses.

### [Gemini]

Gemini's signal was thin this round. Three of four findings were re-raises of declined items (C1 Windows `os.Rename` ninth round; S1 `local_teardown_file` against a documented in-code deviation; S2 command substitution declined Round 8 S5). I1 (KUBECONFIG `exec` plugin user-visible prompt) is a sharper framing of the Round 8 Design Observation but not a new finding; carried over with the refined framing rather than re-raised as Important. Gemini reviewed 20 files with no coverage misses but produced no net-new signal this round.

### Summary

| | Claude Opus 4.7 | Codex GPT 5.5 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 12m 45s | 7m 28s | — |
| Findings | 1I 5S | 1I | none |
| Tool calls | 51 (Read 27, Grep 15, Bash 9) | 50 (shell 46, stdin 4) | — |
| Design observations | 1 | 2 | 0 |
| False positives | 0 | 0 | 0 |
| Unique insights | 5 | 1 | 0 |
| Files reviewed | 20 | 20 | 20 |
| Coverage misses | 0 | 0 | 0 |
| **Totals** | **1I 5S** | **1I** | **none** |
| Downgraded | 0 | 0 | 0 |
| Dropped | 0 | 1 | 4 |


Overall: Claude provided the highest signal this round (unique I1 contrapositive + four polish suggestions). Codex contributed the second-most-valuable finding (I2 WSL toolchain). Gemini contributed no net-new findings; its single non-re-raise (I1 exec-plugin) was a refinement of an existing design observation rather than a fresh finding.

## Review Process Notes

**Skill improvements**

- **Add explicit guidance for "contrapositive" findings after a Fixed prior-round Important.** When a prior round flagged a knob as set wrong in direction A and the author fixed it by moving to direction B, the new direction can be wrong for the opposite reason. Agents that read the resolution table and see "Fixed" tend to treat the line as settled. Add to the prior-round resolutions gate: *"When the fix changed a UX-facing knob (log level, timeout, default, error severity), evaluate the new value with the same scrutiny the prior round applied to the old value — a fix can overshoot."* This is the principle, not the specific bug; it applies to any reversible UX tradeoff a prior round surfaced as a judgment call.

**Repo context updates**

None this round. The "long-declined findings on PR #348" entry added after Round 7-8 is doing its job — Codex and Gemini still re-raised the `os.Rename` race, but consolidation dropped them in seconds rather than spending clarification-round budget.

## Appendix: Original Reviews

### [Claude] — Pass 1

I have enough context to write the review. Let me check one more thing about how Exec on Unix interacts with the cleanup defer:

# Code Review: PR #348 — rdd kubectl: embed kuberlr-style version resolver

### Executive Summary

This PR adds a kuberlr-style version resolver to `rdd kubectl` so out-of-skew clusters get a version-matched binary fetched from `dl.k8s.io`. The new `pkg/kuberlr/` package is well-organized (cache/downloader/resolver/exec split), thoroughly unit-tested for skew rules and flag binding, and exercised end-to-end by a fake-mirror BATS test. The implementation is largely solid; my main concern is download UX (the "downloading" notice is logged at `info`, below the default `warn` level, so users see no feedback during a multi-second/minute download).

### Findings

#### Critical Issues
None.

#### Important Issues

**I1. Download progress invisible at default log level** — `pkg/kuberlr/downloader.go:78`

```go
logrus.Infof("downloading kubectl v%d.%d.%d from %s", want.Major, want.Minor, want.Patch, mirrorURL())
```

Outside developer mode `RDD_LOG_LEVEL` defaults to `warn` (see `cmd/rdd/main.go:96-101`), so this `Infof` never reaches the user's terminal. A user running `rdd kubectl get pods` against an out-of-skew cluster will see rdd appear to hang for several seconds (or several minutes on a slow link, up to the 5-minute `downloadTimeout` at `downloader.go:39`) with no indication that a network fetch is happening. They may Ctrl+C the apparent hang and lose the partial download. The fall-back-on-failure rationale in `cmd/rdd/kubectl.go:66-74` is correct, but it depends on the user knowing a download is in progress. This is the kind of UX regression the repo guideline calls out for routine fall-through paths.

Fix: bump to `logrus.Warn` (or print to `os.Stderr` directly) so the message lands above the default threshold.

```diff
-	logrus.Infof("downloading kubectl v%d.%d.%d from %s", want.Major, want.Minor, want.Patch, mirrorURL())
+	logrus.Warnf("downloading kubectl v%d.%d.%d from %s", want.Major, want.Minor, want.Patch, mirrorURL())
```

(important, regression)

#### Suggestions

**S1. `parseKubeConfigFlags` test does not exercise the deprecated `--username`/`--password` unbound assertion** — `pkg/kuberlr/resolver_test.go:69-153`

```go
type want struct {
    ...
    username   string
    password   string
    ...
}
```

The `want` struct includes `username` and `password` and the comparison loop dereferences `cf.Username`/`cf.Password`, but no test case passes `--username=alice` or `--password=secret` in `args`. The docstring at `resolver.go:168-175` justifies leaving them unbound to avoid the interactive credential prompt; if a future refactor accidentally calls `WithDeprecatedPasswordFlag`, the test suite as written will not catch it.

Fix: add cases that pass the flags and assert they remain empty.

```diff
+		{"--username unbound by design", []string{"--username=alice", "get"}, want{}},
+		{"--password unbound by design", []string{"--password=secret", "get"}, want{}},
```

(suggestion, gap)

**S2. Sentinel comparison in fake server uses `!=` instead of `errors.Is`** — `bats/tests/10-cli/fake-kube/server/main.go:91`

```go
if err := server.Serve(listener); err != nil && err != http.ErrServerClosed {
    log.Fatalf("serve: %v", err)
}
```

`http.ErrServerClosed` is a sentinel, and `errors.Is(err, http.ErrServerClosed)` is the modern idiom; today the fixture never wraps it so the equality compare works, but the test code reads like production code from a quick scan. Either form is correct, but `errors.Is` is the convention the rest of the repo uses for sentinels.

```diff
-	if err := server.Serve(listener); err != nil && err != http.ErrServerClosed {
+	if err := server.Serve(listener); err != nil && !errors.Is(err, http.ErrServerClosed) {
```

(suggestion, enhancement)

**S3. `ensureCached` may chmod/rename a binary whose `defer os.Remove` then races** — `pkg/kuberlr/downloader.go:107-128`

```go
tmp, err := os.CreateTemp(filepath.Dir(dst), ".kubectl-*")
...
tmpName := tmp.Name()
defer os.Remove(tmpName)
...
return os.Rename(tmpName, dst)
```

The deferred `os.Remove(tmpName)` runs after `Rename` succeeds. On Unix the source path is gone so `Remove` silently returns `ENOENT`; on Windows the file no longer exists at `tmpName` either, so `Remove` is also a no-op. Functionally harmless, but the intent of the defer is "clean up on failure," not "always best-effort." A reader has to mentally verify it cannot delete the *new* `dst`. A small rename-then-clear pattern would express the intent more clearly.

```diff
-	tmpName := tmp.Name()
-	defer os.Remove(tmpName)
+	tmpName := tmp.Name()
+	removeOnFailure := true
+	defer func() {
+		if removeOnFailure {
+			os.Remove(tmpName)
+		}
+	}()
...
-	return os.Rename(tmpName, dst)
+	if err := os.Rename(tmpName, dst); err != nil {
+		return err
+	}
+	removeOnFailure = false
+	return nil
```

(suggestion, enhancement)

**S4. `cmd/rdd/main.go:174` log suppression depends on `Err == nil` but only Windows `Exec` produces that shape** — `cmd/rdd/main.go:172-178`

```go
var exitErr *cliexit.Error
if errors.As(err, &exitErr) {
    if exitErr.Err != nil {
        logrus.Error(err)
    }
    os.Exit(exitErr.Code)
}
```

The only producer of `&cliexit.Error{Code: ...}` with a nil inner `Err` is `pkg/kuberlr/exec_windows.go:39`. On Unix, `syscall.Exec` replaces the process and the exit code is never seen by main; on every other failure path (`cliexit.Rejected`, `cliexit.Timeout`) the inner `Err` is non-nil and logging proceeds as before. The behavior is correct but the comment at `main.go:170-171` is the only signal of the contract — a future caller that builds `&cliexit.Error{Code: 5}` without an `Err` would silently lose its log output. Worth a `// Suppresses logging — caller has already written output to stderr` comment on the producer side, or a constructor that documents the no-log convention.

```diff
+// Silent wraps a kubectl child's exit code with no log output, because the
+// child has already written its own diagnostics.
+func Silent(code int) *Error { return &Error{Code: code} }
```

(suggestion, enhancement)

**S5. `serverProbeTimeout` of 2 s may be tight for high-latency mirrors** — `pkg/kuberlr/resolver.go:41-44`

```go
const serverProbeTimeout = 2 * time.Second
```

The comment says "reachable cluster answers in under 100 ms," which is true for LAN clusters, but a GKE/EKS control plane reached over a corporate VPN with TLS handshake from a cold connection can easily blow past 2 s. A spurious timeout falls through silently to the embedded kubectl (per `serverVersion`'s contract at `resolver.go:135-138`), so a user with a slightly slow link gets a mismatched-binary failure they cannot diagnose. 5 s is the same order of magnitude and gives much more headroom for legitimate latency.

(suggestion, enhancement)

### Design Observations

**Strengths**
- The four-file split (`cache.go`, `downloader.go`, `resolver.go`, `exec_{unix,windows}.go`) cleanly separates concerns and keeps each file under 200 lines.
- The fake-kube test fixtures live in their own Go module — a tidy way to keep test-only deps out of the main `go.mod` while still being buildable with `go build`.
- Comprehensive table-driven tests for `isClientOnly` and `parseKubeConfigFlags` lock in the subtle flag-parsing behavior (the kustomize/help/version subcommands, `--client=garbage` short-circuit, `--` separator handling).
- Recursion guard via `RDD_KUBECTL_RESOLVED` belt-and-braces matched by the same-process `skipResolver` toggle for `rdd ctl` is well-conceived.

**Concerns**
- The package-level `httpClient` at `downloader.go:45` has a hard 5-minute `Timeout`. A user on a 3G link downloading a 50 MB kubectl could hit it; a corporate proxy that buffers the entire response before forwarding can too. Today this surfaces as a clear failure, but the documented `(offline)` TODO at `resolver.go:64-67` will need either a longer ceiling or per-request override before it can serve as a useful fallback. (future)

### Testing Assessment

Coverage is strong for both unit and integration paths. Specific untested scenarios, ranked by risk:

1. **Concurrent `rdd kubectl` invocations downloading the same version simultaneously.** Two processes both miss the `os.Stat(path)` in `ensureCached`, both download, both `os.Rename` to the same dst. On modern Go (1.25+) Windows uses `MOVEFILE_REPLACE_EXISTING` so it should work, but a regression here is invisible without a stress test.
2. **`--username`/`--password` deliberately unbound on the resolver side** — see S1.
3. **`isClientOnly` short-circuit when the user passes `--match-server-version`** — the docstring at `resolver.go:208-216` mentions klog flags must stay unbound, but no test exercises a `--match-server-version` + cluster-bound subcommand combination that would trigger kubectl's own server probe.
4. **Resolver behavior when the server's `/version` returns valid JSON but with a `gitVersion` that parses successfully to something unexpected (e.g., `v0.0.0`).** The unit tests only cover the unparseable case; a server returning `gitVersion: ""` would parse-fail, but `gitVersion: "v0.0.0"` would parse OK and trigger a probably-incorrect download attempt.
5. **`rdd ctl` does not re-trigger the resolver** — `cmd/rdd/kubectl.go:48` is asserted by inspection only; no test exercises `rdd ctl get pods` and confirms no probe fires.

### Documentation Assessment

- `docs/design/cmd_service.md:15` correctly fixes the stale `config.json` → `config.yaml` reference, but this is a drive-by cleanup not strictly required for the resolver — see Commit Structure below.
- `docs/design/environment.md:29-35` adds an "Internal Variables" section documenting `RDD_KUBECTL_RESOLVED`. Helpful for the strace/process-tree reader.
- The new "(macOS, instance `2`)" example column at `docs/design/environment.md:46-57` is consistent with the rest of the table.

### Commit Structure

Single commit, well-scoped subject. The `config.json` → `config.yaml` fix at `docs/design/cmd_service.md:15` is a drive-by doc cleanup unrelated to the resolver work; bundling is harmless here since both touch the same doc, but a strict reading of the "smallest reasonable diff" rule would split it.

### Acknowledged Limitations

- **TODO(eviction) at `pkg/kuberlr/cache.go:33-39`** — the cache only grows, and SIGKILL during a download leaks `.kubectl-*` temp files. Author acknowledges and proposes a per-version TTL/LRU sweep. This PR adds the cache; the eviction work is correctly deferred.
- **TODO(offline) at `pkg/kuberlr/downloader.go:56-57` and `pkg/kuberlr/resolver.go:64-67`** — air-gapped users have no way to forbid downloads; resolver fails with the download error instead of falling back to the closest cached binary. Documented as future work.
- **`exec_windows.go:27-30` — `cmd.Context()` is never canceled today**; comment notes that signal-driven cancellation must revisit this site to avoid hard-killing kubectl on Ctrl+C. Correct call given cobra context is never wired up via `cli.RunNoErrOutput`.
- **`exec_unix.go:19-20` — `ctx` is intentionally unused** ("syscall.Exec replaces the process so cancellation no longer applies"). Signature parity with the Windows variant. Correct.
- **`isClientOnly` bool-parse short-circuit on `--help=garbage`** (`pkg/kuberlr/resolver.go:218-224`) — author acknowledges this deviates from a strict conservative bias and explains why kubectl's own embedded path will reject the same flag, so no silent mismatched execution follows. Reasoned, not a defect.
- **`local_teardown_file` at `bats/tests/10-cli/7-kubectl-cache.bats:101-109`** — explicitly deviates from the project's "no teardown_file" rule with a justification (HTTP server leaks ports between sessions if left running). Reasonable exception.
- **`bats/tests/10-cli/7-kubectl-cache.bats:188-192`** — `kubectl version` exit code unasserted on 500 because modern kubectl's behavior varies. Documented and reasonable.
- **`maxKubectlBytes` truncation is silent** (`pkg/kuberlr/downloader.go:138-141`) — author notes the failure surfaces as a "checksum mismatch" rather than a size cap message, and the cap must be bumped if kubectl outgrows it. Acknowledged.
- **Inline PR comment by @mook-as at `cmd/rdd/service_paths.go:33`** asking whether `RDD_CACHE_DIR` could be exposed via `rdd svc paths` was addressed: the PR adds both `cache_dir` and `kubectl_cache` keys (`service_paths.go:32-33`) and the corresponding BATS test case at `bats/tests/10-cli/5-paths.bats:42-51` verifies the override is honored.

### Coverage Summary

- `.github/dependabot.yml` — Reviewed, no issues
- `bats/helpers/paths.bash` — Reviewed, no issues (new `winpath` helper correctly distinguishes `is_msys` from WSL+`using_windows_exe`)
- `bats/tests/10-cli/5-paths.bats` — Reviewed, no issues
- `bats/tests/10-cli/7-kubectl-cache.bats` — Reviewed, no issues (the `local_teardown_file` deviation is acknowledged)
- `bats/tests/10-cli/fake-kube/README.md` — Trivial
- `bats/tests/10-cli/fake-kube/go.mod` — Trivial
- `bats/tests/10-cli/fake-kube/kubectl/main.go` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/server/main.go` — **S2**
- `cmd/rdd/kubectl.go` — Reviewed, no issues
- `cmd/rdd/main.go` — **S4**
- `cmd/rdd/service_paths.go` — Reviewed, no issues
- `docs/design/cmd_service.md` — Reviewed, no issues
- `docs/design/environment.md` — Reviewed, no issues
- `go.mod` — Trivial (cli-runtime move from indirect to direct require)
- `pkg/kuberlr/cache.go` — Reviewed, no issues
- `pkg/kuberlr/downloader.go` — **I1, S3**
- `pkg/kuberlr/exec_unix.go` — Reviewed, no issues
- `pkg/kuberlr/exec_windows.go` — Reviewed, no issues
- `pkg/kuberlr/resolver.go` — **S5**
- `pkg/kuberlr/resolver_test.go` — **S1**
### [Codex] — Pass 1

### Executive Summary

The PR adds a kubectl version resolver with cache-backed downloads and solid streaming verification. I found two fix-before-merge issues: a Windows cache publication race when concurrent processes download the same version, and a WSL BATS failure caused by building fixtures with the wrong Go toolchain.

### Findings

#### Critical Issues

None.

#### Important Issues

I1. **Concurrent cache misses can fail on Windows after another process wins the publish race** — `pkg/kuberlr/downloader.go:68-128` (important, regression)

```go
if err := os.Chmod(tmpName, 0o755); err != nil {
	return err
}
return os.Rename(tmpName, dst)
```

`ensureCached` treats the cache as shared across instances (`pkg/kuberlr/cache.go:24-27`), but the miss path checks once at line 70, downloads at line 79, and publishes with `os.Rename` at line 128. On Windows, `os.Rename` fails when `dst` already exists, so two first-run `rdd kubectl` processes for the same server version can race: one publishes a valid binary, the other returns `resolving kubectl version: ...` even though the cache is now valid.

Fix: after a Windows rename failure, re-stat `dst` and accept the winner, or guard the download/publish sequence with a cache lock.

```diff
-	return os.Rename(tmpName, dst)
+	if err := os.Rename(tmpName, dst); err != nil {
+		if runtime.GOOS == "windows" {
+			if _, statErr := os.Stat(dst); statErr == nil {
+				return nil
+			}
+		}
+		return err
+	}
+	return nil
```

I2. **WSL can build the fake kubectl for Linux while rdd.exe requests Windows** — `bats/tests/10-cli/7-kubectl-cache.bats:15-39` (important, regression)

```bash
GOOS=$(go env GOOS)
export GOOS
GOARCH=$(go env GOARCH)
export GOARCH
...
go build -ldflags='-s -w' \
    -o "$(winpath "${MIRROR_BIN_DIR}/kubectl${EXE}")" \
    ./kubectl
```

On WSL, `using_windows_exe()` always selects Windows binaries (`bats/helpers/defaults.bash:15-17`), and `commands.bash` sets `EXE=.exe` for that path (`bats/helpers/commands.bash:3-8`). This test still calls bare `go` at lines 15, 17, 35, and 38. If `go` resolves to a Linux toolchain, `GOOS=linux` and the mirror is populated under `bin/linux/...`, but native `rdd.exe` requests `bin/windows/...`, producing a 404 instead of exercising the resolver.

Fix: use the same executable suffix convention as the rest of the Windows test harness.

```diff
-GOOS=$(go env GOOS)
+GOOS=$("go${EXE}" env GOOS)
 export GOOS
-GOARCH=$(go env GOARCH)
+GOARCH=$("go${EXE}" env GOARCH)
 export GOARCH
 ...
-        go build -ldflags='-s -w' \
+        "go${EXE}" build -ldflags='-s -w' \
 ...
-        go build -ldflags='-s -w' \
+        "go${EXE}" build -ldflags='-s -w' \
```

#### Suggestions

None.

### Design Observations

#### Strengths

The resolver keeps `rdd ctl` on the embedded kubectl path and avoids probing for obvious client-only kubectl commands, which keeps the new network behavior scoped to external-cluster use.

The download path verifies the sha512 while streaming into a temp file and only publishes after verification, which avoids partial binaries being mistaken for valid cache entries.

### Testing Assessment

1. Concurrent first-time cache misses for the same kubectl version are not covered; this is the scenario behind I1 and should include Windows behavior.
2. WSL with native `rdd.exe` plus a Linux `go` earlier on `PATH` is not covered; this is the scenario behind I2.
3. Verified locally: `go test ./pkg/kuberlr` passed, and `make build-rdd` completed. BATS was not run because `bats/lib/bats-core/bin/bats` is not initialized in this worktree.

### Documentation Assessment

The new cache path and environment variables are documented in `docs/design/environment.md` and `docs/design/cmd_service.md`. No documentation gaps found.

### Acknowledged Limitations

- **Code comment**: `pkg/kuberlr/resolver.go:64` says `TODO(offline): when no cached kubectl matches and the network is down...`. This intentionally defers offline fallback; the change makes it relevant for air-gapped users but documents the behavior.
- **Code comment**: `pkg/kuberlr/cache.go:33` says `TODO(eviction): the cache only grows...`. Cache growth and stale temp cleanup are acknowledged and should not block this PR.
- **Code comment**: `pkg/kuberlr/downloader.go:138` notes that an oversized kubectl body reports as `"checksum mismatch"` rather than a size-cap error. This is a diagnostic limitation, not a correctness issue.
- **Code comment**: `bats/tests/10-cli/7-kubectl-cache.bats:102` says it “Deviates from the `no teardown_file` rule.” The rationale is consistent with the test-owned ephemeral HTTP server, so this is not reported as a finding.

### Coverage Summary

- `.github/dependabot.yml` — Reviewed, no issues
- `bats/helpers/paths.bash` — Reviewed, no issues
- `bats/tests/10-cli/5-paths.bats` — Reviewed, no issues
- `bats/tests/10-cli/7-kubectl-cache.bats` — I2
- `bats/tests/10-cli/fake-kube/README.md` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/go.mod` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/kubectl/main.go` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/server/main.go` — Reviewed, no issues
- `cmd/rdd/kubectl.go` — Reviewed, no issues
- `cmd/rdd/main.go` — Reviewed, no issues
- `cmd/rdd/service_paths.go` — Reviewed, no issues
- `docs/design/cmd_service.md` — Reviewed, no issues
- `docs/design/environment.md` — Reviewed, no issues
- `go.mod` — Trivial
- `pkg/kuberlr/cache.go` — I1
- `pkg/kuberlr/downloader.go` — I1
- `pkg/kuberlr/exec_unix.go` — Reviewed, no issues
- `pkg/kuberlr/exec_windows.go` — Reviewed, no issues
- `pkg/kuberlr/resolver.go` — Reviewed, no issues
- `pkg/kuberlr/resolver_test.go` — Reviewed, no issues

### [Gemini] — Pass 1

### Executive Summary
This PR implements a kuberlr-style version resolver for `rdd kubectl`, dynamically downloading and executing a version-matched binary from `dl.k8s.io` when the target cluster is out of skew. The most critical issue is a Windows platform regression where `os.Rename` fails if the cache destination file already exists, breaking concurrent downloads and retry idempotency.

### Findings

#### Critical Issues

C1. **Windows `os.Rename` fails when destination exists** — `pkg/kuberlr/downloader.go:127` (critical, regression)

```go
	if err := os.Chmod(tmpName, 0o755); err != nil {
		return err
	}
	return os.Rename(tmpName, dst)
```

On Windows, `os.Rename` fails when the destination file already exists (unlike Unix, which atomically replaces). If two `rdd kubectl` processes resolve the same uncached version concurrently, the second process to finish downloading will fail at `os.Rename` and abort the command. Even without concurrency, if a leftover cached file exists (e.g., from a partial run or manual copy), `os.Rename` will persistently fail.

Fix: On Windows, check if the error from `os.Rename` is due to the file already existing. If it is, verify that `dst` exists and treat it as a success, or explicitly remove `dst` before renaming.

#### Important Issues

I1. **Auth plugins trigger interactive prompts during version probe** — `pkg/kuberlr/resolver.go:121-127` (important, gap)

```go
func serverVersion(args []string) (semver.Version, bool) {
	cfg, err := loadClientConfig(args)
	// ...
	client, err := discovery.NewDiscoveryClientForConfig(cfg)
```

The probe purposefully ignores `--username` and `--password` to avoid interactive prompts on `ClientConfig()`. However, if the user's `KUBECONFIG` uses an `exec` credential plugin (e.g., AWS EKS, kubelogin) that lacks a cached token, `client.ServerVersion()` will execute the plugin, which can trigger an interactive prompt (or open a browser window) before `kubectl` even starts.

Fix: Enforce non-interactive execution for the probe's client config (e.g., by omitting auth or catching the prompt), or add a strict fallback.

#### Suggestions

S1. **Forbidden teardown file addition** — `bats/tests/10-cli/7-kubectl-cache.bats:101-108` (suggestion, regression)

```bash
local_teardown_file() {
    # Deviates from the "no teardown_file" rule. The rule's intent is to
    # preserve rdd state for post-mortem inspection; an HTTP server bound
    # to an ephemeral port is the opposite — leaving it running just leaks
    # ports between sessions.
```

Despite the explanatory comment, repo conventions strictly forbid adding `local_teardown_file()`. The project relies on `local_setup_file()` starting with `rdd svc delete || :` to clean up from previous runs. 

Fix: Remove `local_teardown_file()` and handle any necessary port cleanup or process termination in `local_setup_file()`, matching the project's established pattern.

S2. **Command substitution swallows errors** — `bats/tests/10-cli/5-paths.bats:43`, `bats/tests/10-cli/7-kubectl-cache.bats:94-97` (suggestion, regression)

```bash
    cache_override=$(winpath "${BATS_TEST_TMPDIR}/cache-override")
```

BATS style for this repo prefers `run -0 cmd ...; value=${output}` over `value=$(cmd ...)` because command substitution swallows non-zero exit codes and leaves the variable empty on failure. This applies to multiple instances of `winpath` usage in these tests.

Fix: Replace `$(winpath ...)` with `run -0 winpath ...` and assign from `${output}`.

### Design Observations

**Concerns**
None.

**Strengths**
- **Client-only and flag degradation**: `isClientOnly` relies on `pflag`'s `UnknownFlags = true` to gracefully skip unknown subcommands/flags while catching actual client-only invocations. The conservative bias gracefully degrades to a probe.
- **Embedded control plane short-circuit**: `cmd/rdd/kubectl.go` adds `kuberlr.SkipResolver()` for the embedded `ctl` command, preventing an unconditional network probe against an embedded server that inherently cannot be out of skew.

### Testing Assessment
1. **Concurrent Cache Writes (High Risk)**: There is no test simulating concurrent cache misses invoking `ensureCached` to verify cache initialization safety and lock-free execution behavior on multiple platforms (specifically Windows).
2. **Interactive Credentials (Medium Risk)**: No test ensures the background version probe does not indefinitely block when KUBECONFIG has an `exec` plugin waiting on `stdin`.

### Documentation Assessment
Reviewed, no issues.

### Acknowledged Limitations

- **Code comment**: "TODO(offline): when no cached kubectl matches and the network is down... Add an opt-out for download attempts (config flag or RDD_KUBECTL_OFFLINE) and fall back to the cached binary closest to the server's minor." (`pkg/kuberlr/resolver.go:64`). The code explicitly defers handling of disconnected execution when the cache misses.
- **Code comment**: "TODO(eviction): the cache only grows; a user switching across many remote clusters accumulates ~50 MB per minor version indefinitely. SIGKILL or power loss during a download also leaks the .kubectl-* temp file..." (`pkg/kuberlr/cache.go:33`). Cache bounding and garbage collection of temporary files are explicitly deferred.
- **Code comment**: "CommandContext hard-kills on ctx cancellation. The cobra context is never canceled today; wiring signal-driven cancellation must revisit this site..." (`pkg/kuberlr/exec_windows.go:27`). The code acknowledges that future signal-driven context cancellation would leak `kubectl` children on Windows due to `Process.Kill()`.
- **Code comment**: "If kubectl ever exceeds maxKubectlBytes, io.LimitReader truncates silently and the digest covers a partial body, so the caller sees a "checksum mismatch" error rather than a size-cap message." (`pkg/kuberlr/downloader.go:142`). The implementation accepts a misleading error message rather than detecting truncation directly.
- **Code comment**: "The deprecated --username and --password flags stay unbound... return the interactive variant that prompts on stdin when the kubeconfig context lacks credentials — the wrong behavior for a background version probe." (`pkg/kuberlr/resolver.go:167`). The resolver intentionally ignores these flags to prevent interactive prompts on `ClientConfig()`.
- **PR comment**: The author addressed @mook-as's comment to expose `RDD_CACHE_DIR` by adding it to `cmd/rdd/service_paths.go`. Subsequent changes match this rationale.

### Coverage Summary
- `.github/dependabot.yml`: Trivial
- `bats/helpers/paths.bash`: Reviewed, no issues
- `bats/tests/10-cli/5-paths.bats`: S2
- `bats/tests/10-cli/7-kubectl-cache.bats`: S1, S2
- `bats/tests/10-cli/fake-kube/README.md`: Trivial
- `bats/tests/10-cli/fake-kube/go.mod`: Trivial
- `bats/tests/10-cli/fake-kube/kubectl/main.go`: Reviewed, no issues
- `bats/tests/10-cli/fake-kube/server/main.go`: Reviewed, no issues
- `cmd/rdd/kubectl.go`: Reviewed, no issues
- `cmd/rdd/main.go`: Reviewed, no issues
- `cmd/rdd/service_paths.go`: Reviewed, no issues
- `docs/design/cmd_service.md`: Trivial
- `docs/design/environment.md`: Trivial
- `go.mod`: Trivial
- `pkg/kuberlr/cache.go`: Reviewed, no issues
- `pkg/kuberlr/downloader.go`: C1
- `pkg/kuberlr/exec_unix.go`: Reviewed, no issues
- `pkg/kuberlr/exec_windows.go`: Reviewed, no issues
- `pkg/kuberlr/resolver.go`: I1
- `pkg/kuberlr/resolver_test.go`: Reviewed, no issues
