# Deep Review: 20260503-002937-pr-348

| | |
|---|---|
| **Date** | 2026-05-03 00:29 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 2 |
| **Author** | [@jandubois](https://github.com/jandubois) |
| **PR** | [#348](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/348) — rdd kubectl: embed kuberlr-style version resolver |
| **Branch** | `embed-kuberlr` |
| **Commits** | `7531daf` rdd kubectl: embed kuberlr-style version resolver |
| **Reviewers** | Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default) |
| **Verdict** | **Merge with fixes** — flag-scanner gap (I1) is a real correctness regression for `rdd kubectl --server=…`; the two UX regressions (I2 warn-level logging, I3 unconditional 2 s probe) come from the Round 1 fixes and want a second pass. |
| **Wall-clock time** | `18 min 55 s` |

## Executive Summary

Round 1's two Important findings (download timeout, sha-mismatch fixture leak) and most Suggestions are addressed in `7531daf`. Round 2 surfaces three new Importants on the same code paths the Round 1 fixes touched: the kubectl-flag scanner only looks at `--kubeconfig`/`--context` so `--server`/`--cluster`/`--user` skew the probe target away from where kubectl will actually connect (I1, raised independently by Claude and Codex); the warn-level logging Round 1 added on the four silent fall-through paths is loud at the project's default `warn` log level — every `rdd kubectl version --client` on a host without a kubeconfig prints a warn line (I2); and the 2 s `serverProbeTimeout` Round 1 added stalls every `rdd kubectl` invocation against an unreachable cluster, including client-only commands like `--help` (I3).

Suggestions are localized: trailing-slash handling on `RDD_KUBECTL_MIRROR`, a parse failure on dev `go run` builds because `componentbasever`'s default `GitVersion` is unparseable, three stale `bats/fake-kube/...` path comments, and a small robustness gap in `fetchSha512` against `sha512sum`-format mirrors.

Structure: 0 Critical · 3 Important · 5 Suggestions.

## Critical Issues

None.

## Important Issues

I1. **Resolver only sees `--kubeconfig` and `--context` — kubectl's other connection-override flags steer the actual call to a different cluster than the probe** — `pkg/kuberlr/resolver.go:124-155`, `cmd/rdd/kubectl.go:60,74` [Claude Opus 4.7][Codex GPT 5.5]

```go
func loadClientConfig(args []string) (*rest.Config, error) {
    kubeconfig, context := extractKubeFlags(args)
    rules := clientcmd.NewDefaultClientConfigLoadingRules()
    rules.ExplicitPath = kubeconfig
    overrides := &clientcmd.ConfigOverrides{CurrentContext: context}
    return clientcmd.NewNonInteractiveDeferredLoadingClientConfig(rules, overrides).ClientConfig()
}
```

`extractKubeFlags` honors `--kubeconfig` and `--context`. kubectl honors several more cluster-selection flags: `--server` (`-s`), `--cluster`, `--user`, `--token`, `--certificate-authority`, `--insecure-skip-tls-verify`, `--client-certificate`, `--client-key`, `--tls-server-name`. `kubectlAction` passes the same raw `args` to `Resolve` and to the kubectl child, so a user running `rdd kubectl --server=https://other.cluster:6443 get pods` makes the resolver probe the kubeconfig's default cluster while kubectl talks to `other.cluster`. The two clusters can advertise different versions, so the resolver may decide "in skew, run embedded" against an out-of-skew target — the very case the resolver exists to handle. Round 1 flagged this as a design observation ("Manual kubectl flag parsing carries drift risk"); Round 2 makes it a concrete gap.

The most consequential override is `--server`: it routes kubectl at any URL with no kubeconfig involvement and the resolver has no knob for it.

Fix: thread the standard cluster overrides through `clientcmd.ConfigOverrides`. The minimum is `--server` → `overrides.ClusterInfo.Server`, `--cluster` → `overrides.Context.Cluster`, `--user` → `overrides.Context.AuthInfo`. A more durable fix delegates to `genericclioptions.ConfigFlags` so resolver and kubectl child cannot drift by construction.

```diff
-func extractKubeFlags(args []string) (kubeconfig, context string) {
+func extractKubeFlags(args []string) (kubeconfig, context, server, cluster, user string) {
     for i, arg := range args {
         if arg == "--" {
             break
         }
         switch {
         case arg == "--kubeconfig" && i+1 < len(args):
             kubeconfig = args[i+1]
         case strings.HasPrefix(arg, "--kubeconfig="):
             kubeconfig = strings.TrimPrefix(arg, "--kubeconfig=")
         case arg == "--context" && i+1 < len(args):
             context = args[i+1]
         case strings.HasPrefix(arg, "--context="):
             context = strings.TrimPrefix(arg, "--context=")
+        case arg == "--server" && i+1 < len(args), arg == "-s" && i+1 < len(args):
+            server = args[i+1]
+        case strings.HasPrefix(arg, "--server="), strings.HasPrefix(arg, "-s="):
+            server = strings.TrimPrefix(strings.TrimPrefix(arg, "--server="), "-s=")
+        // …same for --cluster, --user
         }
     }
-    return kubeconfig, context
+    return
 }
```

If the full flag set is too much code for this PR, document the gap in the `Resolve` doc and add a unit-test row for `--server` so the limitation is explicit.

I2. **Probe failure logs at warn — every `rdd kubectl` on a host without a kubeconfig prints a warn line at the default log level** — `pkg/kuberlr/resolver.go:97-115`, `cmd/rdd/log.go:86-90` [Claude Opus 4.7]

```go
cfg, err := loadClientConfig(args)
if err != nil {
    logrus.WithError(err).Warn("kubectl resolver: cannot load kubeconfig; using embedded kubectl")
    return semver.Version{}, false, nil //nolint:nilerr
}
```

`cmd/rdd/log.go:86-90` defaults the log level to `warn` outside developer mode, and `docs/design/environment.md:14` documents the same default. Every `rdd kubectl version --client`, `rdd kubectl --help`, and `rdd kubectl completion` on a host with no kubeconfig (or with a kubeconfig pointing at a currently-unreachable cluster) now prints `level=warning msg="kubectl resolver: cannot load kubeconfig; using embedded kubectl"` before kubectl runs. Round 1's Resolution table records that the warn-level logging was deliberate — bundled with the S2 parse-failure fix to "let an operator running with the default level learn why kubectl skipped a download" — but the chosen level fires once per command rather than once per surprising state.

The "cannot parse server version" branch (line 113) is genuinely surprising — keep that one at warn. The other three branches (no kubeconfig, no discovery client, cluster probe failed) are routine for a host that uses `rdd kubectl` for client-only work and should be `Debug`.

Fix:

```diff
-    logrus.WithError(err).Warn("kubectl resolver: cannot load kubeconfig; using embedded kubectl")
+    logrus.WithError(err).Debug("kubectl resolver: cannot load kubeconfig; using embedded kubectl")
```

Same for the `cannot build discovery client` and `cluster probe failed` branches. Leave the `cannot parse server version` branch at warn.

I3. **Unconditional 2 s `serverProbeTimeout` stalls every `rdd kubectl` against an unreachable cluster, including client-only commands** — `pkg/kuberlr/resolver.go:31,49-68,101` [Claude Opus 4.7]

```go
const serverProbeTimeout = 2 * time.Second
…
func Resolve(ctx context.Context, args []string) (string, error) {
    if os.Getenv(envSkipResolver) != "" {
        return "", nil
    }
    embedded, err := embeddedVersion()
    …
    server, ok, err := serverVersion(args)
```

`Resolve` runs unconditionally on every `rdd kubectl` and `rdd ctl` invocation. When the configured cluster is unreachable — laptop on a flight, VPN down, firewalled CI runner — `client.ServerVersion()` blocks until `cfg.Timeout` fires. Two seconds of latency on `rdd kubectl version --client`, `rdd kubectl --help`, and `rdd kubectl completion bash` is a regression versus the embedded-only behavior these client-only commands had before this PR.

Fix: short-circuit the obvious client-only invocations before probing. The kubectl subcommands that never need a server are `version --client`, `completion`, `config`, and the help paths (`--help`, `-h`):

```go
func Resolve(ctx context.Context, args []string) (string, error) {
    if os.Getenv(envSkipResolver) != "" {
        return "", nil
    }
    if isClientOnly(args) {
        return "", nil
    }
    …
}
```

If a precise client-only matcher feels brittle, halve `serverProbeTimeout` to 1 s — modern apiservers answer `/version` in well under 100 ms, so the headroom does not need to be that large.

## Suggestions

S1. **`RDD_KUBECTL_MIRROR` with a trailing slash produces double-slash URLs** — `pkg/kuberlr/downloader.go:47-52,87` [Claude Opus 4.7]

```go
func mirrorURL() string {
    if v := os.Getenv(envKubeMirror); v != "" {
        return v
    }
    return defaultKubeMirror
}
…
base := fmt.Sprintf("%s/release/v%d.%d.%d/bin/%s/%s/kubectl", mirrorURL(), …)
```

If a user sets `RDD_KUBECTL_MIRROR=https://my.mirror.example/`, `base` becomes `https://my.mirror.example//release/…`. Many HTTP servers tolerate the empty path segment; some return 404. The default mirror and the BATS test never set a trailing slash, so the case stays invisible until a user trips on it.

Fix:

```diff
 func mirrorURL() string {
     if v := os.Getenv(envKubeMirror); v != "" {
-        return v
+        return strings.TrimRight(v, "/")
     }
     return defaultKubeMirror
 }
```

S2. **Dev `go run` builds fail to parse the embedded version and error out for every `rdd kubectl`** — `pkg/kuberlr/resolver.go:53-56,84-86` [Claude Opus 4.7]

```go
func embeddedVersion() (semver.Version, error) {
    return semver.ParseTolerant(componentbasever.Get().GitVersion)
}
```

`componentbasever`'s default `gitVersion` is `"v0.0.0-master+$Format:%H$"` (`k8s.io/component-base/version/base.go:58`); `:` and `%` are invalid in semver build metadata, so `ParseTolerant` returns an error. `make build` overrides the constant via `-ldflags -X`, but `go run ./cmd/rdd kubectl …` and any IDE-driven debug run that bypasses the Makefile now fail with `resolving kubectl version: reading embedded kubectl version: …`. Before this PR, `go run ./cmd/rdd kubectl` worked.

Fix: treat an unparseable embedded version the same as an unreachable server — log debug, skip the resolver, let kubectl run. The dev workflow stays unbroken; production builds still parse the real version.

```diff
     embedded, err := embeddedVersion()
     if err != nil {
-        return "", fmt.Errorf("reading embedded kubectl version: %w", err)
+        logrus.WithError(err).Debug("kubectl resolver: embedded version not parseable; using embedded kubectl")
+        return "", nil
     }
```

S3. **Stale `bats/fake-kube/...` references in three comments — actual path is `bats/tests/10-cli/fake-kube/...`** — `bats/tests/10-cli/fake-kube/go.mod:1`, `bats/tests/10-cli/7-kubectl-cache.bats:8,32` [Claude Opus 4.7]

```go
// Test fixture: see bats/fake-kube/README for the role this module plays.
```

```bash
# under bats/fake-kube/server plays both roles. The fake kubectl prints
# bats/fake-kube/go.mod, not the parent rancher-desktop-daemon one.
```

The fake-kube module sits under `bats/tests/10-cli/fake-kube/`. Three call sites still point at the obsolete `bats/fake-kube/...` path. The README itself uses the relative phrasing "next to the test that uses them" and stays correct.

Fix: replace each `bats/fake-kube` occurrence with `bats/tests/10-cli/fake-kube`, or with relative phrasing ("the sibling go.mod", "the server next to this file").

S4. **`fetchSha512` keeps trailing tokens — `sha512sum`-format mirrors break verification** — `pkg/kuberlr/downloader.go:120-126` [Claude Opus 4.7]

```go
func fetchSha512(ctx context.Context, url string) (string, error) {
    var sb strings.Builder
    if err := streamGet(ctx, url, &sb); err != nil {
        return "", err
    }
    return strings.TrimSpace(sb.String()), nil
}
```

`dl.k8s.io` publishes one line containing only the hex digest, so the current code works against the default mirror. A user who mirrors the kubectl tree with the system `sha512sum` writes `<digest>  kubectl` per line; `TrimSpace` only strips edges, so the trailing `  kubectl` survives, the comparison at line 110 fails, and every download against that mirror reports `checksum mismatch for …`. The error misleads — the binary is fine, the parser is wrong.

Fix: take the first whitespace-delimited token.

```diff
-    return strings.TrimSpace(sb.String()), nil
+    line := strings.TrimSpace(sb.String())
+    if i := strings.IndexAny(line, " \t"); i >= 0 {
+        line = line[:i]
+    }
+    return line, nil
```

S5. **Pure-Go unit coverage thin on the download/cache machinery** — `pkg/kuberlr/resolver_test.go` [Claude Opus 4.7]

The unit tests cover `compatible`, `cachedPath`, and `extractKubeFlags`. Everything substantial — `ensureCached` (cache hit vs miss), `download` (sha-mismatch, partial-write cleanup, context cancel), `streamGet` (non-200, mid-stream failure) — is exercised only by the BATS suite, which depends on a built rdd binary. A small `httptest.NewServer`-driven test file under `pkg/kuberlr/` would catch regressions on every `go test ./...` and remove the BATS dependency for the failure-mode cases.

Not a blocker — the BATS suite covers the integration concerns. The gap is fast-feedback coverage for ~50 lines of test code.

## Design Observations

### Concerns

- **Resolver runs unconditionally for `rdd ctl`, even though the embedded apiserver and the embedded kubectl share the same module (in-scope)** [Claude Opus 4.7]. `ctlAction` calls `kubectlAction`, which always runs `Resolve` against the embedded apiserver. The probe always falls through (versions match by construction), so the behavior is functionally unchanged — but it adds a real `serverVersion` round-trip on every `rdd ctl` invocation, including when the apiserver is briefly slow during startup. Round 1 addressed this by editing the PR description; the structural fix is to set `envSkipResolver` in `ctlAction` before calling `kubectlAction`, which makes the "always falls through" claim load-bearing rather than incidental.

  ```diff
   func ctlAction(cmd *cobra.Command, args []string) error {
       if err := ensureServiceRunning(cmd.Context()); err != nil {
           return err
       }
       …
  +    os.Setenv(envSkipResolver, "1")
       return kubectlAction(cmd, args)
   }
  ```

  Same symptom as I3 in a different shape: the cleanest unified fix is the `isClientOnly`/`envSkipResolver` short-circuit, applied here and in `Resolve`.

### Strengths

- **Atomic publish stays clean** [Claude Opus 4.7][Codex GPT 5.5]. `os.CreateTemp` → `io.MultiWriter(tmp, h)` → `os.Rename` is the textbook pattern for "no half-built file in the cache." `defer os.Remove(tmpName)` covers every error path; `errors.Is(err, os.ErrNotExist)` distinguishes "missing" from "broken FS." Round 1 already called this out and the implementation has not regressed.
- **Loud failures on download / sha mismatch** [Claude Opus 4.7]. The `cmd/rdd/kubectl.go:62-69` comment explains exactly why the resolver refuses to silently fall back to the embedded kubectl on a download failure — running a version-mismatched binary against the user's cluster would hide the real failure.
- **Recursion guard via `envSkipResolver`** [Claude Opus 4.7]. A downloaded kubectl that re-execs rdd through a shim cannot recurse back into `Resolve`. Small detail, well-explained at `pkg/kuberlr/resolver.go:23-26`.
- **Fake-kube as a separate Go module** [Claude Opus 4.7]. The fixture lives under its own `go.mod`, so test-only deps stay out of the main module; `dependabot.yml`'s `directories:` extension keeps the fixture patched.

## Testing Assessment

Round 1's testing gaps tied to fixed findings (sha-mismatch isolation, `--` separator, cross-major skew) are now covered. Highest-risk untested scenarios in Round 2:

1. **`--server` / `--cluster` overrides** (I1). No test exercises the case where the kubectl arg overrides the kubeconfig's cluster URL.
2. **Default-log-level message capture** (I2). No test asserts that the stderr of `rdd kubectl version --client` is silent on a host without a kubeconfig. A BATS line `refute_output --partial 'kubectl resolver:'` would catch any future regression.
3. **`rdd kubectl` against an unreachable server** (I3). The BATS suite covers apiserver 500 (test 5) but not "TCP connect refused" or "no route to host." A test pointing at `127.0.0.1:1` would exercise the timeout path.
4. **Network cancel during download.** `downloadTimeout` and `ctx` cancellation are wired but neither unit nor BATS covers a real cancellation.
5. **Windows `Exec` path.** `pkg/kuberlr/exec_windows.go` has no automated coverage (BATS skips Windows; no unit test).
6. **`sha512sum`-format mirror response** (S4).
7. **Trailing-slash mirror URL** (S1).

Codex verified locally that `go test ./pkg/kuberlr` passes; Claude noted `go test ./cmd/rdd` cannot compile in a worktree because `pkg/guestagent/lima-guestagent.gz` is absent, which is a pre-existing build-environment gap unrelated to this PR.

## Documentation Assessment

- `docs/design/environment.md` documents `RDD_CACHE_DIR`, `RDD_KUBECTL_MIRROR`, and the path-table entry `RDD_KUBECTL_CACHE`. Round 1's S4 is fully addressed.
- `docs/design/cmd_service.md` adds the `kubectl_cache` row to the `rdd service paths` table.
- `RDD_KUBECTL_RESOLVED` is internal and reasonably omitted from user-facing docs.
- The fake-kube `README.md` and `go.mod` reference `bats/fake-kube/...` — actual path is `bats/tests/10-cli/fake-kube/...` (S3).
- `TODO(eviction)` and `TODO(offline)` accurately scope future work.

## Acknowledged Limitations

- **Cache eviction not implemented** — `pkg/kuberlr/cache.go:33-36`: the `TODO(eviction)` comment notes ~50 MB per minor version accumulation. Round 1 declined the implementation; the TODO marker carries the framing forward.
- **Offline mode not implemented** — `pkg/kuberlr/resolver.go:45-48`, `pkg/kuberlr/downloader.go:44-46`: TODOs cover the planned env-var/config opt-out for download attempts. The current behavior on a network failure is a hard error, which the PR description and inline comments already call out.
- **`local_teardown_file()` deviation from BATS convention** — `bats/tests/10-cli/7-kubectl-cache.bats:90-98`: the inline rationale ("an HTTP server bound to an ephemeral port is the opposite [of state worth preserving]") is sound; the deviation is narrow and explicit.
- **Falling back to embedded on any cluster-probe failure** — `pkg/kuberlr/resolver.go:88-117`: each `nilerr` branch is documented. The design choice is sound; I2 is about the log level, not the fall-through itself.

### Declined in prior rounds

- **`CacheDir()` panics when `os.UserCacheDir()` fails** — declined Round 1: failure mode requires neither `$HOME` nor `%LocalAppData%` to resolve; the panic surfaces a clear stack and Jan judged the API change not worth the rare failure mode. Claude raised it again in Round 2 (S1 raw); dropped per the resolutions gate.
- **Concurrent first-use downloads can race on Windows `os.Rename`** — declined Round 1: "Modern Go handles dst-exists on `os.Rename` across platforms; concurrency harness deferred." Codex raised the publish-rename failure as Important (I2 raw) and Claude as Suggestion (S5 raw) in Round 2. Both raise the same code path and the surrounding code has not changed; dropped per the resolutions gate. The technical refinement Claude added (Windows blocks overwriting an executable currently mapped into another process even with `POSIX_SEMANTICS`) is worth filing as a follow-up TODO if the maintainers want a record beyond the prior decision.

## Agent Performance Retro

### [Claude]

Claude carried this round. Three Importants (I1, I2, I3) plus five focused Suggestions, all on real concerns. Two of the Importants (I2 warn-level logging, I3 unconditional 2 s probe) are direct second-order consequences of Round 1's fixes — exactly the kind of follow-up review work that benefits from a deeper agent. S2 (`go run` parse failure) and S4 (`sha512sum`-format mirrors) required tracing into the `componentbasever` package and reasoning about real-world mirror behavior; both checks landed.

Claude shared I1 with Codex, but its framing — concrete enumeration of the missing flags, link back to Round 1's design observation — was the more usable. Its independent S5 (Windows `os.Rename` race) repeated a Round 1 declined finding without engaging with the Resolution table; small process miss.

### [Codex]

Codex stayed lean — two Importants, no Suggestions. I1 matched Claude's; the framing was tighter (the regression-vs-design-observation framing) but the enumeration was thinner. I2 (Windows `os.Rename` race) was a re-raise of a Round 1 declined finding. Codex's Round 1 review had the same finding dropped as a false positive; the Round 1 Resolution table records "Modern Go handles dst-exists" as the rationale. Codex did not consult that and re-raised the same shape under the same framing. Drop per the resolutions gate.

### [Gemini]

Gemini failed at startup with `ProjectIdRequiredError` (needs `GOOGLE_CLOUD_PROJECT`) and a "not running in a trusted directory" rejection. CLI exit 55, before reading the prompt. The auth and trust configuration is local to this machine and unrelated to the PR. Excluded for the second round running.

### Summary

| | Claude Opus 4.7 | Codex GPT 5.5 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 11m 47s | 3m 41s | — |
| Findings | 3I 5S | 1I | none |
| Tool calls | 51 (Bash 30, Read 21) | 25 (shell 25) | — |
| Design observations | 1 | 0 | 0 |
| False positives | 0 | 0 | 0 |
| Unique insights | 7 | 0 | 0 |
| Files reviewed | 16 | 18 | 0 |
| Coverage misses | 0 | 0 | 0 |
| **Totals** | **3I 5S** | **1I** | **none** |
| Downgraded | 0 | 0 | 0 |
| Dropped | 1 | 1 | 0 |


Claude provided the most value on this round; the I2 / I3 pair is the kind of UX-regression review work that benefits from end-to-end reasoning about CLI behavior at the project's default log level and across networks. Codex's coverage shrank because Round 1 already addressed most of the easy correctness gaps — the resolutions table left fewer surfaces for a tight reviewer to catch on Round 2.

#### Reconciliation

- Codex P2 Windows os.Rename race (I2 raw): important → dropped (declined Round 1; surrounding code unchanged).
- Claude P2 Windows os.Rename race (S5 raw): suggestion → dropped (same reason; cross-listed under "Declined in prior rounds").
- Claude P2 `CacheDir()` panic (S1 raw): suggestion → dropped (declined Round 1 as S11).

## Review Process Notes

### Repo context updates

- **Pin to "default log level is `warn`" when reviewing logrus calls in CLI code paths.** A `logrus.WithError(err).Warn(...)` line that fires on a routine fall-through is operator noise on every command. When a CLI introduces a new fall-through path, the review prompt should ask whether the chosen log level fits how often the path fires. Add a one-liner to `deep-review-context.md`: "Default `RDD_LOG_LEVEL` outside developer mode is `warn`; warn-level messages reach the user terminal on every invocation. Treat new warn-level logs in routine fall-through paths as UX regressions."

## Appendix: Original Reviews

### [Claude] — Pass 1

Now I have a thorough picture. Let me write the review.

# Code Review: PR #348 — rdd kubectl version resolver

| | |
|---|---|
| **Branch** | rdd-kubectl-resolver |
| **Verdict** | Approve with suggestions — important issues are gaps, not blockers |

## Executive Summary

This PR embeds a kuberlr-style version resolver into `rdd kubectl` so out-of-skew external clusters trigger a sha-verified download of a matching kubectl from `dl.k8s.io`. The implementation is small, tightly scoped, and well-documented; the only meaningful gaps are (1) the arg scanner ignores kubectl flags that change which API server gets probed (`--server`, `--cluster`), so the resolver can pick the wrong kubectl version, and (2) probe failures log at warn level on every invocation, which becomes noisy for client-only commands when no cluster is configured.

## Findings

### Critical Issues

None.

### Important Issues

**I1. Resolver ignores `--server`, `--cluster`, and `--user` flags — wrong kubectl can be selected** — `pkg/kuberlr/resolver.go:138-155` (important, gap)

```go
func extractKubeFlags(args []string) (kubeconfig, context string) {
    for i, arg := range args {
        if arg == "--" {
            break
        }
        switch {
        case arg == "--kubeconfig" && i+1 < len(args):
            kubeconfig = args[i+1]
        ...
        case arg == "--context" && i+1 < len(args):
            context = args[i+1]
        ...
```

`extractKubeFlags` only honors `--kubeconfig` and `--context`. kubectl also supports `--server` (override the cluster URL), `--cluster` (pick a different cluster entry), and `--user` (`k8s.io/client-go/tools/clientcmd/overrides.go:147-150` defines all four as standard cluster-selection flags). Their effect on which API server kubectl actually talks to is not captured by `loadClientConfig` at line 124, so `serverVersion` probes the kubeconfig's default cluster while kubectl-as-finally-exec'd will hit a different one. In the worst case the probe says "in skew, run embedded" while the actual target is two minors away, defeating the entire point of the resolver.

The single most consequential is `--server`: it can route kubectl at any URL with no kubeconfig involvement. `--cluster` similarly overrides the cluster reference within the kubeconfig.

Fix: thread the missing flags through `loadClientConfig` via `clientcmd.ConfigOverrides`. Both `--server` and `--cluster` map to override fields:

```diff
-func extractKubeFlags(args []string) (kubeconfig, context string) {
+func extractKubeFlags(args []string) (kubeconfig, context, server, cluster, user string) {
     for i, arg := range args {
         if arg == "--" {
             break
         }
         switch {
         case arg == "--kubeconfig" && i+1 < len(args):
             kubeconfig = args[i+1]
         case strings.HasPrefix(arg, "--kubeconfig="):
             kubeconfig = strings.TrimPrefix(arg, "--kubeconfig=")
         case arg == "--context" && i+1 < len(args):
             context = args[i+1]
         case strings.HasPrefix(arg, "--context="):
             context = strings.TrimPrefix(arg, "--context=")
+        case arg == "--server" && i+1 < len(args):
+            server = args[i+1]
+        case strings.HasPrefix(arg, "--server="):
+            server = strings.TrimPrefix(arg, "--server=")
+        case arg == "--cluster" && i+1 < len(args):
+            cluster = args[i+1]
+        ...
         }
     }
-    return kubeconfig, context
+    return
 }
```

Then in `loadClientConfig`, populate `overrides.ClusterInfo.Server` and `overrides.Context.Cluster` accordingly.

---

**I2. Probe failures log at warn level on every invocation, polluting normal CLI output** — `pkg/kuberlr/resolver.go:97-115` (important, regression)

```go
cfg, err := loadClientConfig(args)
if err != nil {
    logrus.WithError(err).Warn("kubectl resolver: cannot load kubeconfig; using embedded kubectl")
    return semver.Version{}, false, nil
}
...
info, err := client.ServerVersion()
if err != nil {
    logrus.WithError(err).Warn("kubectl resolver: cluster probe failed; using embedded kubectl")
    return semver.Version{}, false, nil
}
```

Per the project convention (`docs/design/environment.md:14`), the default `RDD_LOG_LEVEL` is `warn`, so every one of these messages reaches the user's terminal. Common scenarios that hit them:

- `rdd kubectl version --client` when no kubeconfig exists: every invocation logs "cannot load kubeconfig" before printing the client version. The new BATS test `apiserver returns 500` (line 190) actually exercises the second variant; its `assert_output` (line 201) is non-empty for the right reason (kubectl printed something) but does not protect against future warn-message regressions.
- `rdd kubectl --help` / completion / config queries on a host whose cluster is sometimes offline: same problem, on every call.

The author's intent (per the comment at lines 90-94) is that an operator should learn why the resolver skipped a download. That argument applies once, not to every shell prompt. logrus also formats the error with field-prefixed JSON-ish output that does not match how kubectl normally surfaces messages.

Fix: log at `Debug` for the loadClientConfig and ServerVersion failure paths. The "cannot parse server version" branch (line 113) is genuinely surprising — keep that one at warn. Alternative: log once per process via a `sync.Once`, but a level change is simpler and matches the pattern "fall through silently to embedded kubectl."

```diff
-        logrus.WithError(err).Warn("kubectl resolver: cannot load kubeconfig; using embedded kubectl")
+        logrus.WithError(err).Debug("kubectl resolver: cannot load kubeconfig; using embedded kubectl")
```

---

**I3. Per-invocation 2 s probe penalizes every `rdd kubectl` and `rdd ctl` call when the cluster is unreachable** — `pkg/kuberlr/resolver.go:31,101` (important, regression)

```go
const serverProbeTimeout = 2 * time.Second
...
cfg.Timeout = serverProbeTimeout
```

Every `rdd kubectl` invocation now performs a synchronous probe that can take up to 2 seconds before the embedded kubectl runs. Common cases: a developer's laptop is on a flight, the cluster is firewalled, or VPN is down. Every `rdd kubectl --help`, `version --client`, and `completion` call now stalls 2 seconds because the probe waits the full timeout.

`rdd ctl` is documented as "behavior unchanged," but kubectlAction at `cmd/rdd/kubectl.go:60` calls Resolve unconditionally. The embedded apiserver typically responds in well under 100 ms; under stress (e.g., during startup right after `ensureServiceRunning`'s 90 s timeout settled) a slow first response could hit the 2 s ceiling and stall an `rdd ctl` even though the right answer is "use embedded."

Fix: short-circuit obvious client-only invocations before probing — the most common offenders:

```go
func Resolve(ctx context.Context, args []string) (string, error) {
    if os.Getenv(envSkipResolver) != "" {
        return "", nil
    }
    if isClientOnly(args) { // "version --client", "completion ...", "config ...", "--help", "-h"
        return "", nil
    }
    ...
}
```

If a generic client-only check is undesirable, at minimum drop `serverProbeTimeout` to ~500 ms — modern apiservers answer /version in well under 100 ms, and the tail risk is one more retry, not a download failure.

### Suggestions

**S1. `CacheDir()` panics when `os.UserCacheDir()` fails — crashes `rdd svc paths`** — `pkg/kuberlr/cache.go:40-44` (suggestion, gap)

```go
cache, err := os.UserCacheDir()
if err != nil {
    panic(fmt.Errorf("could not determine user cache directory: %w", err))
}
```

`CacheDir()` is called unconditionally from `instancePaths()` (`cmd/rdd/service_paths.go:32`). On the rare host where `UserCacheDir` errors (Linux without `HOME`/`XDG_CACHE_HOME`, Windows without `LocalAppData`/`APPDATA`), the panic crashes `rdd svc paths` with a Go stack trace. Every other path in `instancePaths` is fallible-but-tolerant (it returns whatever the helper returns); this one isn't.

Fix: change the signature to return `(string, error)`, propagate to `instancePaths`, and let `Resolve` surface a real error. Or, since `CacheDir()` is a path query and `os.UserCacheDir` returning an error means kuberlr has no valid place to write, return `""` and have callers downstream guard appropriately.

---

**S2. `RDD_KUBECTL_MIRROR` with a trailing slash produces double-slash URLs** — `pkg/kuberlr/downloader.go:47-52,87` (suggestion, gap)

```go
func mirrorURL() string {
    if v := os.Getenv(envKubeMirror); v != "" {
        return v
    }
    return defaultKubeMirror
}
...
base := fmt.Sprintf("%s/release/v%d.%d.%d/...", mirrorURL(), ...)
```

If a user sets `RDD_KUBECTL_MIRROR=https://my.mirror.example/`, the resulting URL becomes `https://my.mirror.example//release/...`. Many HTTP servers tolerate this; some 404. The default and the BATS test never hit the case, so it is invisible.

Fix:

```diff
 func mirrorURL() string {
     if v := os.Getenv(envKubeMirror); v != "" {
-        return v
+        return strings.TrimRight(v, "/")
     }
     return defaultKubeMirror
 }
```

---

**S3. Dev builds without `-X` LDFLAGS fail to parse the embedded version** — `pkg/kuberlr/resolver.go:84-86,53-56` (suggestion, regression)

```go
func embeddedVersion() (semver.Version, error) {
    return semver.ParseTolerant(componentbasever.Get().GitVersion)
}
```

`componentbasever` defaults `gitVersion` to `"v0.0.0-master+$Format:%H$"` (`k8s.io/component-base/version/base.go:52`). The `:` and `%` are invalid in semver build metadata, so `ParseTolerant` returns an error. In `make build` the LDFLAGS in `Makefile:28` overwrite this, but `go run ./cmd/rdd kubectl ...` and any debug build that bypasses the Makefile (such as IDE-driven debug runs) now error out with `"resolving kubectl version: reading embedded kubectl version: ..."`. Before this PR, `go run ./cmd/rdd kubectl` worked.

Fix: treat an unparseable embedded version the same as an unreachable server — log debug, skip the resolver, and let kubectl run:

```diff
 func Resolve(ctx context.Context, args []string) (string, error) {
     ...
     embedded, err := embeddedVersion()
     if err != nil {
-        return "", fmt.Errorf("reading embedded kubectl version: %w", err)
+        logrus.WithError(err).Debug("kubectl resolver: embedded version not parseable; using embedded kubectl")
+        return "", nil
     }
     ...
 }
```

---

**S4. `bats/fake-kube/...` references in comments — actual path is `bats/tests/10-cli/fake-kube/...`** — `bats/tests/10-cli/fake-kube/go.mod:1`, `bats/tests/10-cli/7-kubectl-cache.bats:8,32` (suggestion, gap)

```go
// Test fixture: see bats/fake-kube/README for the role this module plays.
```

Three call sites still point at the old `bats/fake-kube/...` path. The module sits under `bats/tests/10-cli/fake-kube/`. The README itself (`bats/tests/10-cli/fake-kube/README.md:3`) correctly says "next to the test that uses them," so the moved-once-not-twice diff is the comments.

Fix: replace `bats/fake-kube` with `bats/tests/10-cli/fake-kube` in those three lines (or with relative phrasing like "the sibling go.mod").

---

**S5. Concurrent download race on Windows can leave second invocation with `os.Rename` failure** — `pkg/kuberlr/downloader.go:57-71,116` (suggestion, gap)

```go
if _, err := os.Stat(path); err == nil {
    return path, nil
} else if !errors.Is(err, os.ErrNotExist) {
    return "", err
}
...
return os.Rename(tmpName, dst)
```

Two parallel `rdd kubectl` invocations against an out-of-skew cluster both `Stat` (line 59), both miss, both download. On Windows, if A's rename completes first and A then `Exec`'s the cached binary, B's subsequent `os.Rename(tmpName, dst)` (line 116) fails because Windows refuses to replace an in-use executable (Go's `os.Rename` calls `MoveFileEx` with `MOVEFILE_REPLACE_EXISTING`, which still requires the destination not to be open without `FILE_SHARE_DELETE`). B's user sees `"resolving kubectl version: ... rename: ..."` and a retry usually succeeds.

This is rare in practice (parallel out-of-skew invocations on Windows). A self-healing path exists (re-Stat after download, skip rename if dst already exists), but the bigger fix is per-version file locking. Either is more change than the symptom warrants right now; suggest a TODO rather than a fix.

Fix: add a TODO near the rename call documenting the known race, or guard with a re-Stat:

```diff
+    // Concurrent downloaders may both arrive here. The first to rename wins;
+    // on Windows the second's rename fails if the file is held open by the
+    // first's exec'd kubectl. Cheap re-Stat avoids the failure.
+    if _, err := os.Stat(dst); err == nil {
+        return nil
+    }
     return os.Rename(tmpName, dst)
```

---

**S6. `fetchSha512` assumes the digest line carries no filename — breaks against `sha512sum`-style mirrors** — `pkg/kuberlr/downloader.go:120-126` (suggestion, gap)

```go
func fetchSha512(ctx context.Context, url string) (string, error) {
    var sb strings.Builder
    if err := streamGet(ctx, url, &sb); err != nil {
        return "", err
    }
    return strings.TrimSpace(sb.String()), nil
}
```

`dl.k8s.io` publishes one line containing only the hex digest, so this works. Other mirrors (or someone hand-mirroring with `sha512sum > kubectl.sha512`) emit `"<digest>  kubectl"`. `TrimSpace` only trims edges, so the trailing `"  kubectl"` survives, the comparison at line 110 fails, and every download errors. Easy to misdiagnose.

Fix: take the first whitespace-delimited token:

```diff
-    return strings.TrimSpace(sb.String()), nil
+    line := strings.TrimSpace(sb.String())
+    if i := strings.IndexAny(line, " \t"); i >= 0 {
+        line = line[:i]
+    }
+    return line, nil
```

---

**S7. Unit test depth thin on the download/cache machinery** — `pkg/kuberlr/resolver_test.go` (suggestion, enhancement)

The unit tests cover `compatible`, `cachedPath`, and `extractKubeFlags`. Everything substantial — `ensureCached` (cache hit vs miss), `download` (sha-mismatch, partial-write cleanup), `streamGet` (non-200 path, context cancel) — is exercised only by the BATS suite, which depends on a built rdd binary. Pure-Go unit tests with `httptest.Server` would catch regressions much earlier and run on every `go test ./...`.

This is not a blocker — the BATS suite covers the integration concerns. But a small `httptest.NewServer`-driven test file in `pkg/kuberlr/` would round out coverage for ~50 lines of work.

## Design Observations

### Concerns

**Resolver runs on every invocation, including for `rdd ctl`** — see I3 above, and consider whether the resolver is needed at all for `rdd ctl`. The PR description correctly observes that the embedded apiserver and the embedded kubectl come from the same module, so the probe is guaranteed to fall through. That guarantee can be made structural rather than dynamic by skipping `Resolve` when entering through `ctlAction`:

```diff
 func ctlAction(cmd *cobra.Command, args []string) error {
     if err := ensureServiceRunning(cmd.Context()); err != nil {
         return err
     }
     ...
+    // ctl always targets the embedded apiserver; embedded apiserver and
+    // embedded kubectl share a module and cannot drift, so skip the probe.
+    os.Setenv(envSkipResolver, "1")
     return kubectlAction(cmd, args)
 }
```

Or factor `kubectlAction`'s body into one helper that runs the resolver and one that doesn't. (in-scope)

### Strengths

- Atomic publish: `os.CreateTemp` → write+hash → `os.Rename` is the textbook pattern for "no half-built file in cache." `defer os.Remove(tmpName)` covers every error path; `errors.Is(err, os.ErrNotExist)` distinguishes "missing" from "broken FS."
- Loud failures on download/sha mismatch — preferring a noisy error to a silent fall-through to a wrong-version embedded kubectl is the right call and is correctly documented inline at `cmd/rdd/kubectl.go:62-69`.
- `envSkipResolver` recursion guard (`pkg/kuberlr/resolver.go:23-26`, set in both `exec_unix.go:19` and `exec_windows.go:24`) is a small detail that prevents a downloaded-kubectl-shim loop and is well-explained by its comment.
- The fake-kube fixture as a separate Go module is the right call — keeps the test-only deps out of the main module, and `dependabot.yml`'s `directories:` extension keeps it patched.

## Testing Assessment

Highest-risk untested scenarios, ranked:

1. **`--server` / `--cluster` argument** (I1). No test exercises the case where the kubectl arg overrides the kubeconfig's cluster URL.
2. **Concurrent `rdd kubectl` invocations on a fresh cache** (S5). The cache-hit/miss tests run sequentially.
3. **Network cancel during download.** `downloadTimeout` and `ctx` cancellation are wired but neither unit nor BATS covers a real cancellation path.
4. **Windows-specific `Exec` path.** `pkg/kuberlr/exec_windows.go` has no automated coverage at all (BATS skips Windows; unit tests don't exercise it).
5. **Sha file with trailing filename** (S6).
6. **`rdd ctl` resolver fall-through.** PR description argues `rdd ctl` is unchanged because the probe always reports a matching version; no test confirms this on the real embedded apiserver.

## Documentation Assessment

- `docs/design/environment.md` updates are accurate. `RDD_KUBECTL_RESOLVED` (the recursion guard) is internal and reasonably omitted.
- `docs/design/cmd_service.md` table addition is correct; the existing `docker_socket` entry is still missing from this table (pre-existing gap, not from this PR).
- The fake-kube `README.md` and the `go.mod` comment there reference `bats/fake-kube/...` — actual path is `bats/tests/10-cli/fake-kube/...` (S4).
- The `TODO(eviction)` and `TODO(offline)` comments accurately scope future work.

## Commit Structure

Single commit, well-formed message that explains motivation, scope, error policy, and test surface clearly. No concerns.

## Acknowledged Limitations

- **Cache eviction not implemented** — `pkg/kuberlr/cache.go:33-36`: TODO comment explicitly notes `~50 MB per minor version` accumulation and proposes per-version TTL/LRU sweep. No immediate footprint concern.
- **Offline mode not implemented** — `pkg/kuberlr/resolver.go:45-48` and `pkg/kuberlr/downloader.go:44-46`: TODO marks an env-var/config opt-out for download attempts that would fall back to the closest cached binary. The current behavior on a network failure is a hard error, which the PR description and inline comments call out.
- **`local_teardown_file()` deviation from BATS convention** — `bats/tests/10-cli/7-kubectl-cache.bats:90-98`: the author explicitly justifies the deviation in the comment ("the rule's intent is to preserve rdd state for post-mortem inspection; an HTTP server bound to an ephemeral port is the opposite"). The justification is sound; CLAUDE.md's BATS style rule may want a carve-out for non-rdd ancillary processes.
- **Falling back to embedded on any cluster probe failure** — `pkg/kuberlr/resolver.go:88-117`: each `nilerr` branch is explicitly documented as a legitimate fall-through. The design choice is sound; the noise from logging at warn (I2) is a separate concern.

## Coverage Summary

| File | Status |
|---|---|
| `.github/dependabot.yml` | Reviewed, no issues |
| `bats/tests/10-cli/5-paths.bats` | Reviewed, no issues |
| `bats/tests/10-cli/7-kubectl-cache.bats` | I2, I3 (testing assumption), S4 |
| `bats/tests/10-cli/fake-kube/README.md` | Trivial |
| `bats/tests/10-cli/fake-kube/go.mod` | S4 |
| `bats/tests/10-cli/fake-kube/kubectl/main.go` | Reviewed, no issues |
| `bats/tests/10-cli/fake-kube/server/main.go` | Reviewed, no issues |
| `cmd/rdd/kubectl.go` | I3 (Design Observation) |
| `cmd/rdd/service_paths.go` | S1 (transitive) |
| `docs/design/cmd_service.md` | Reviewed, no issues |
| `docs/design/environment.md` | Reviewed, no issues |
| `go.mod` | Trivial |
| `pkg/kuberlr/cache.go` | S1 |
| `pkg/kuberlr/downloader.go` | S2, S5, S6 |
| `pkg/kuberlr/exec_unix.go` | Reviewed, no issues |
| `pkg/kuberlr/exec_windows.go` | Reviewed, no issues; testing gap noted under Testing Assessment |
| `pkg/kuberlr/resolver.go` | I1, I2, I3, S3 |
| `pkg/kuberlr/resolver_test.go` | S7 |
### [Codex] — Pass 1

### Executive Summary

Verdict: changes requested. The PR adds a kubectl version resolver with checksum-verified cache downloads and good end-to-end coverage for the happy path and download failures, but two resolver edge cases need attention: probing the wrong cluster when standard kubectl connection override flags are used, and a Windows race during shared cache publication.

### Findings

#### Critical Issues

None.

#### Important Issues

I1. **Resolver ignores standard kubectl connection override flags** — `pkg/kuberlr/resolver.go:120-151` (important, regression)

```go
func loadClientConfig(args []string) (*rest.Config, error) {
	kubeconfig, context := extractKubeFlags(args)
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	rules.ExplicitPath = kubeconfig
	overrides := &clientcmd.ConfigOverrides{CurrentContext: context}
	return clientcmd.NewNonInteractiveDeferredLoadingClientConfig(rules, overrides).ClientConfig()
}
```

`kubectlAction` resolves before running kubectl at `cmd/rdd/kubectl.go:60`, but the resolver only carries `--kubeconfig` and `--context` into the probe at `pkg/kuberlr/resolver.go:125-128`. The actual kubectl invocation still receives the original args at `cmd/rdd/kubectl.go:74` or `cmd/rdd/kubectl.go:76`, so flags such as `--server` / `-s`, `--cluster`, `--user`, `--token`, `--certificate-authority`, and `--insecure-skip-tls-verify` can make kubectl talk to a different cluster than the resolver probed. That can silently fall through to the embedded kubectl for an out-of-skew server, or download a cached kubectl for the wrong server.

Fix: parse the full kubectl client config flag set instead of the custom two-flag scanner. Use Kubernetes’ existing config flag machinery where possible, or at minimum populate `clientcmd.ConfigOverrides` for the standard cluster/auth override flags and add tests for `--server`, `-s`, and TLS/auth override cases.

I2. **Concurrent cache fills fail on Windows after another process wins the race** — `pkg/kuberlr/downloader.go:57-68`, `pkg/kuberlr/downloader.go:116` (important, regression)

```go
if _, err := os.Stat(path); err == nil {
	return path, nil
}
if err := download(ctx, want, path); err != nil {
	return "", err
}
```

```go
return os.Rename(tmpName, dst)
```

The cache is explicitly shared across instances, but two first-use `rdd kubectl` invocations can both observe a miss at `pkg/kuberlr/downloader.go:59`, both download at `pkg/kuberlr/downloader.go:67`, and then race to publish the same destination at `pkg/kuberlr/downloader.go:116`. On Windows, `os.Rename` fails when the destination already exists, so the loser returns an error that surfaces as `resolving kubectl version` at `cmd/rdd/kubectl.go:70` even though a valid cached binary is now present.

Fix: make publication collision-tolerant or serialize by version. A minimal safe fix is to treat an existing destination after a failed rename as success, without deleting/replacing it, because the destination may already be executing on Windows.

```diff
	if err := os.Chmod(tmpName, 0o755); err != nil {
		return err
	}
-	return os.Rename(tmpName, dst)
+	if err := os.Rename(tmpName, dst); err != nil {
+		if errors.Is(err, os.ErrExist) {
+			if _, statErr := os.Stat(dst); statErr == nil {
+				return nil
+			}
+		}
+		return err
+	}
+	return nil
```

#### Suggestions

None.

### Design Observations

**Strengths**

The download path writes to a temp file, hashes the downloaded bytes, chmods before publish, and removes the temp file on failure. That keeps normal crash states from being mistaken for a valid cached kubectl.

The BATS fixture keeps the fake mirror and fake apiserver in a separate Go module, which avoids pulling fixture code or dependencies into the main binary.

### Testing Assessment

1. Standard kubectl connection override flags are not covered: add resolver tests or BATS coverage for `--server` / `-s`, `--cluster`, `--user`, and TLS/auth flags. This is the highest-risk missing coverage and maps to I1.
2. Concurrent cache fill behavior is not covered, especially on Windows. Add a unit test around publish collision or a Windows-specific test for existing destination after download. This maps to I2.
3. Windows BATS coverage is skipped for the new cache suite, so Windows-specific exec and cache semantics remain mostly compile-only.

Verification run: `go test ./pkg/kuberlr` passed. `go test ./cmd/rdd ./pkg/kuberlr` could not compile `cmd/rdd` in this worktree because `pkg/guestagent/lima-guestagent.gz` is missing.

### Documentation Assessment

The user-facing environment and `svc paths` docs were updated for the new cache and mirror settings. No additional documentation gaps found beyond the acknowledged TODOs below.

### Acknowledged Limitations

- **Code comment**: `pkg/kuberlr/resolver.go:45` says offline mode is TODO: “when no cached kubectl matches and the network is down, Resolve currently fails with the download error.” This is consistent with the PR’s stated error policy and is not duplicated as a finding.
- **Code comment**: `pkg/kuberlr/downloader.go:45` also notes the missing offline “may we download?” toggle. Same limitation as above, documented consistently.
- **Code comment**: `pkg/kuberlr/cache.go:33` says the cache only grows and needs future eviction. This is a storage-footprint limitation, not a correctness regression.
- **Code comment**: `bats/tests/10-cli/7-kubectl-cache.bats:15` skips the fake-kube BATS suite on Windows. That makes I2 more likely to escape tests, but the skip itself is documented.
- **Code comment**: `bats/tests/10-cli/7-kubectl-cache.bats:90` adds `local_teardown_file()` with an explicit rationale for deviating from the repo’s normal no-teardown convention. I did not report it as a BATS style finding because the comment explains the intentional exception.

### Coverage Summary

- `.github/dependabot.yml` — Reviewed, no issues
- `bats/tests/10-cli/5-paths.bats` — Reviewed, no issues
- `bats/tests/10-cli/7-kubectl-cache.bats` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/README.md` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/go.mod` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/kubectl/main.go` — Reviewed, no issues
- `bats/tests/10-cli/fake-kube/server/main.go` — Reviewed, no issues
- `cmd/rdd/kubectl.go` — I1, I2
- `cmd/rdd/service_paths.go` — Reviewed, no issues
- `docs/design/cmd_service.md` — Reviewed, no issues
- `docs/design/environment.md` — Reviewed, no issues
- `go.mod` — Trivial
- `pkg/kuberlr/cache.go` — Reviewed, no issues
- `pkg/kuberlr/downloader.go` — I2
- `pkg/kuberlr/exec_unix.go` — Reviewed, no issues
- `pkg/kuberlr/exec_windows.go` — Reviewed, no issues
- `pkg/kuberlr/resolver.go` — I1
- `pkg/kuberlr/resolver_test.go` — Reviewed, no issues

### [Gemini] — Pass 1

Gemini failed at startup before reading the prompt. `stderr-gemini-pass-1.log`:

```
Error authenticating: ProjectIdRequiredError: This account requires setting the GOOGLE_CLOUD_PROJECT or GOOGLE_CLOUD_PROJECT_ID env var.
Gemini CLI is not running in a trusted directory. To proceed, either use --skip-trust, set the GEMINI_CLI_TRUST_WORKSPACE=true environment variable, or trust this directory in interactive mode.
```

CLI exit 55. Local auth/trust configuration; unrelated to the PR.

## Resolution

| | |
|---|---|
| **Addressed** | 2026-05-03 |
| **Commit** | `75346bd rdd kubectl: embed kuberlr-style version resolver` (amended into the original commit) |

| # | Finding | Action | Reason |
|---|---------|--------|--------|
| 1 | Important #1: Resolver only sees `--kubeconfig` and `--context` | Fixed | |
| 2 | Important #2: Probe failure logs at warn | Fixed | Three routine fall-throughs demoted to Debug; parse-failure stays at Warn |
| 3 | Important #3: Unconditional 2 s probe stalls every kubectl | Fixed | Added `isClientOnly` short-circuit covering version --client / completion / config / options / help / --help / -h; `rdd ctl` now also short-circuits via `kuberlr.SkipResolver()` |
| 4 | Suggestion #1: `RDD_KUBECTL_MIRROR` trailing-slash | Fixed | |
| 5 | Suggestion #2: Dev `go run` builds fail to parse the embedded version | Fixed | |
| 6 | Suggestion #3: Stale `bats/fake-kube/...` references | Fixed | |
| 7 | Suggestion #4: `fetchSha512` keeps trailing tokens | Fixed | |
| 8 | Suggestion #5: Pure-Go unit coverage thin on download/cache machinery | Skipped | BATS suite already covers the integration path; the gap is fast-feedback for ~50 lines of new test code, not a correctness gap |
| 9 | Testing Gap #1: `--server` / `--cluster` overrides | Test written | New TestExtractKubeFlags rows cover -s, --server, --cluster, --user, the all-five-together case, and stops-at-`--` |
| 10 | Testing Gap #2: Default-log-level message capture | Skipped | Defer with the same rationale as S5; no new test infrastructure this round |
| 11 | Testing Gap #3: `rdd kubectl` against unreachable server | Skipped | Defer with the same rationale as S5; no new test infrastructure this round |
| 12 | Testing Gap #4: Network cancel during download | Skipped | Defer with the same rationale as S5; no new test infrastructure this round |
| 13 | Testing Gap #5: Windows `Exec` path | Skipped | Defer with the same rationale as S5; no new test infrastructure this round |
| 14 | Testing Gap #6: `sha512sum`-format mirror response | Skipped | Defer with the same rationale as S5; no new test infrastructure this round |
| 15 | Testing Gap #7: Trailing-slash mirror URL | Skipped | Defer with the same rationale as S5; no new test infrastructure this round |
| 16 | Design Observation: Resolver runs unconditionally for `rdd ctl` | Fixed | `kuberlr.SkipResolver()` called in `ctlAction` short-circuits Resolve; documented `RDD_KUBECTL_RESOLVED` in `docs/design/environment.md` under a new Internal Variables section per Jan's request to document every `RDD_*` variable |
