# Deep Review: 20260422-170229-pr-336

| | |
|---|---|
| **Date** | 2026-04-22 17:02 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 8 (of target) |
| **Author** | [@jandubois](https://github.com/jandubois) |
| **PR** | [#336](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/336) — cli: standardize --wait and --timeout across svc and limavm |
| **Commits** | `7b08089` cli: standardize --wait and --timeout across svc and limavm |
| **Reviewers** | Claude Opus 4.7 (effort: xhigh), Codex GPT 5.4 (effort: high), Gemini 3.1 Pro (effort: default) |
| **Verdict** | **Merge with fixes** — address I1 (cleanupDiscoveryConfigMap on `--wait=false`) either by gating the call or by documenting the trade-off; the other findings are polish and need not block merge. |
| **Wall-clock time** | `36 min 36 s` |

---

## Executive Summary

Round 8 of the CLI standardization PR. All round-7 Important findings were addressed (`api_service.md` fix, `controller.bash` hard-fail on timed delete, 120s `local_setup_file` bound); deferred round-7 Suggestions (context.TODO, Running/StartTime race, autostart entry-point coverage) resurface here as expected.

The core change remains sound. One new Important finding: `cleanupDiscoveryConfigMap` runs unconditionally on `--wait=false`, so a `rdd svc stop --wait=false` deletes the discovery ConfigMap while the daemon is still fully up and serving — pessimizing any concurrent `rdd ctl`/`rdd kubectl` caller that then polls a missing ConfigMap until `startWaitTimeout` (90s). The documented trade-off covers the timed-out-wait case; the `--wait=false` case is not called out and is strictly worse for concurrent callers.

The persistent Important gap (I2) is the autostart-entry-point BATS coverage for `rdd set`, `rdd kubectl`, `rdd service config` — raised in rounds 6, 7, and 8 and deferred each time. Author's call whether to add now or track as pending work.

---

## Critical Issues

None.

---

## Important Issues

I1. **`cleanupDiscoveryConfigMap` runs unconditionally on `--wait=false`, pessimizing concurrent CLI callers** — `pkg/service/cmd/service.go:353-358` [Claude Opus 4.7]

```go
// Delete the discovery configmap before signaling so the kube-API
// delete still has a live apiserver. Trade-off: a timed-out --wait
// leaves the configmap deleted while the daemon drains, so a quick
// `rdd svc start` polls the missing configmap until startWaitTimeout.
// Moving cleanup into the daemon's own shutdown path is the durable fix.
_ = cleanupDiscoveryConfigMap()
```

The comment calls out the timed-out-wait case, but the same cleanup fires on `--wait=false`, where it is strictly pessimal. The CLI returns immediately, the daemon stays up and serving, and any concurrent `rdd ctl`/`rdd kubectl`/`rdd set` caller that hits `ensureServiceRunning` then polls the missing ConfigMap in `waitForFreshDiscoveryConfigMap` until `startWaitTimeout` (90s) expires. The daemon itself never recreates the discovery ConfigMap after `MarkControlPlaneReady` (`pkg/service/cmd/service.go:803`), so the window is not self-healing — it closes only when the daemon exits.

The call is pre-existing (was unconditional before the PR), not a regression. The PR surfaces it by documenting `--wait=false` as a supported stop mode in `docs/design/cmd_service.md:93-95` and removing `--wait=false` from `svc delete`.

Fix: gate the call on `wait`, or (per the code's own TODO) move cleanup into the daemon's graceful-shutdown path so it runs only after the apiserver quiesces:

```diff
-    _ = cleanupDiscoveryConfigMap()
+    if wait {
+        _ = cleanupDiscoveryConfigMap()
+    }
```

If neither fix lands, document the trade-off in `cmd_service.md` alongside the existing `--wait=false` paragraph, so users know a `--wait=false` stop can slow subsequent auto-starting CLI calls by up to 90s.

I2. **Autostart entry-point exit-code-4 contract still untested** — `bats/tests/20-service/6-timeout.bats`, `bats/tests/33-lima-controllers/limavm-cli.bats` [Codex GPT 5.4] (round 7 I2 unaddressed for autostart paths)

`ensureServiceRunning` (`cmd/rdd/service.go:93-111`) now routes `context.DeadlineExceeded` through `cliexit.Classify`, and every caller that autostarts the service inherits the exit-code-4 contract: `rdd set`, `rdd ctl`, `rdd service config`, and every `rdd limavm *` command. Round 8 adds explicit `run -4` coverage for `limavm create --start`, `start`, `stop`, `restart`, and `delete`, closing most of the round-7 gaps. Still missing:

1. `rdd set --timeout=…` exit-code-4. `cmd/rdd/set.go:324-325` is the original site of the contract; no BATS test asserts `rdd set` returns 4 specifically.
2. `rdd ctl --…` exit-code-4 when `ensureServiceRunning` itself times out.
3. `rdd service config` exit-code-4 (autostart-only caller; no own wait loop).

A missing `cliexit.Classify` on any of these would ship green because the "uniform contract" now leans on the shared helper. Forcing `ensureServiceRunning` past its 90s cap in CI is expensive, so a focused Go-level test around `ensureServiceRunning` — asserting the returned error unwraps to `*cliexit.Error{Code: 4}` when the `ctx` deadline expires — is cheaper and equally diagnostic.

Fix: add a Go test in `cmd/rdd/` that exercises `ensureServiceRunning` (or `cliexit.Classify` at the helper boundary) with a forced-timeout `ctx`; skip the end-to-end BATS path unless `svc config` is cheap to cover.

---

## Suggestions

S1. **`Delete` returns exit 4 even when the fallback `process.Kill` already brought the daemon down** — `pkg/service/cmd/service.go:448-465` [Claude Opus 4.7]

```go
if Running() {
    if err := StopWithWait(ctx, true, timeout); err != nil {
        if errors.Is(err, context.DeadlineExceeded) || Running() {
            return err
        }
    }
}
```

When the deadline fires inside `StopWithWait`, the wait loop's `ctx.Done` branch calls `process.Kill(pid)` and returns `DeadlineExceeded`. The Kill often takes effect immediately (Unix apiserver's `SetupSignalContext` exits on the second signal; Windows `TerminateProcess` blocks until the process is gone). `Delete`'s outer check then short-circuits on `errors.Is(DeadlineExceeded)` before `Running()` has a chance to reflect the post-Kill state, so a user running `rdd svc delete --timeout=30s` against a slow-to-drain daemon sees exit 4 and an intact directory even though the daemon is gone — they must re-run to finish.

Trade-off: Claude's proposed swap (check `Running()` only, drop the `DeadlineExceeded` short-circuit) silently succeeds on a timed-out-but-Kill-succeeded delete, contradicting the documented "deadline expires → exit 4" contract. The cleaner fix is inside `StopWithWait`'s wait loop (see S2) — if the wait loop returns `nil` when the process is already gone by the time `ctx.Done` fires, `Delete`'s outer predicate never sees `DeadlineExceeded` in this scenario and the contract stays intact.

S2. **Spurious timeout when the service exits in the same scheduling tick as the deadline** — `pkg/service/cmd/service.go:384-410` [Claude Opus 4.7]

```go
for {
    select {
    case <-ctx.Done():
        _ = process.Kill(pid)
        err := ctx.Err()
        if errors.Is(err, context.DeadlineExceeded) {
            return fmt.Errorf("timed out waiting for ...", ...)
        }
    case <-ticker.C:
        if !Running() {
            _ = os.Remove(instance.Config())
            return nil
        }
    }
}
```

Go selects randomly when both `ctx.Done()` and `ticker.C` are ready in the same tick. A graceful shutdown that lands right on the deadline can take the `ctx.Done` branch even though `!Running()` is already true, returning `DeadlineExceeded` (exit 4). Re-running the command succeeds, so the symptom is minor user friction — but the race also feeds S1.

Fix: double-check `Running()` in the `ctx.Done` branch before declaring a timeout:

```diff
     case <-ctx.Done():
+        if !Running() {
+            _ = os.Remove(instance.Config())
+            return nil
+        }
         _ = process.Kill(pid)
         err := ctx.Err()
```

This also resolves S1 without changing `Delete`'s outer predicate.

S3. **`ensureServiceRunning` and `serviceStartAction` race `Running()` against `StartTime()`** — `cmd/rdd/service.go:99-103, 268-272` [Claude Opus 4.7] (round 7 S2 unaddressed, now at both call sites)

```go
if service.Running() {
    startTime, err := service.StartTime()
    if err != nil {
        return err
    }
```

`Running()` calls `PID()`, which removes a stale PID file when the process check fails (`pkg/service/cmd/service.go:158`). Between `Running()` returning true and `StartTime()` calling `os.Stat(instance.PIDFile())`, the daemon can exit and a second `PID()` invocation can remove the file. `StartTime()` then returns `fs.ErrNotExist`, which propagates as an opaque error (not `Classify`-able, not actionable) from an unrelated-looking call like `rdd ctl get`. The window is microseconds, but once a user hits it the diagnosis is hard.

Round 7 S2 flagged the race in `serviceStartAction` only; this round notes the same pattern in `ensureServiceRunning` too.

Fix: treat a missing PID file as "daemon exited between the two checks" at both call sites. Two options:

```diff
 startTime, err := service.StartTime()
-if err != nil {
+if errors.Is(err, fs.ErrNotExist) {
+    return cliexit.Classify(startAndWaitForReady(ctx, nil, startWaitTimeout))
+}
+if err != nil {
     return err
 }
```

Or fold the retry inside `StartTime()` so every caller gets consistent handling.

S4. **`cleanupDiscoveryConfigMap` still uses `context.TODO()` despite `StopWithWait` now carrying a `ctx`** — `pkg/service/cmd/service.go:434` [Codex GPT 5.4] [Claude Opus 4.7] (round 7 S6 deferred)

```go
err = controllers.CleanupDiscovery(context.TODO(), client)
```

The `--timeout` envelope does not bound the pre-shutdown discovery-cleanup RPC. If the apiserver is wedged or a connection stalls, `rdd svc stop --timeout=1m` / `rdd svc delete --timeout=5m` can hang indefinitely before the timed wait begins, contradicting `docs/design/cmd_service.md:117-121` which frames `--timeout` as bounding the whole operation.

Round 7 S6 raised the same three concerns (unbounded RPC, `context.TODO`, deferred durable fix) as a Suggestion, and the author skipped it pending the discovery-CM-cleanup pending-work item. The code comment at lines 353-357 documents moving cleanup into the daemon as the durable fix, so this stays a Suggestion rather than an Important issue — but threading the caller's `ctx` through `cleanupDiscoveryConfigMap(ctx)` is a one-line interim fix that makes the `--timeout` contract honest today:

```diff
-func cleanupDiscoveryConfigMap() error {
+func cleanupDiscoveryConfigMap(ctx context.Context) error {
     ...
-    err = controllers.CleanupDiscovery(context.TODO(), client)
+    err = controllers.CleanupDiscovery(ctx, client)
```

Codex raised this as Important. Per the repo-context rule about deliberate code-comment trade-offs, demoted to Suggestion.

S5. **`local_setup_file` in the new timeout file duplicates `setup_rdd_control_plane`** — `bats/tests/20-service/6-timeout.bats:11-15` [Claude Opus 4.7]

```bash
local_setup_file() {
    rdd svc delete --timeout=120s
    rdd svc create
    rdd svc start
}
```

`bats/helpers/controller.bash:setup_rdd_control_plane` does exactly this (and its 120s bound carries a detailed comment explaining why). Using the helper keeps the bounded-delete rationale in one place and inherits future updates automatically.

```diff
 local_setup_file() {
-    rdd svc delete --timeout=120s
-    rdd svc create
-    rdd svc start
+    setup_rdd_control_plane
 }
```

S6. **Test 3's hard-reset `rdd svc delete --timeout=120s` can fail with a confusing error** — `bats/tests/20-service/6-timeout.bats:36-41` [Claude Opus 4.7]

```bash
@test 'svc start --timeout=1ms exits with code 4' {
    # Reset to a known state; prior tests left the control plane half-dead.
    rdd svc delete --timeout=120s
    rdd svc create
    run -4 rdd svc start --wait --timeout=1ms
```

If the prior `svc delete --timeout=1ms` test left a daemon that refuses to die within 120s, this line exits 4 and BATS's `set -e` aborts the test with the same message `context deadline exceeded` — indistinguishable from the failure being tested. Test 2 uses `|| :` on its reset for exactly this reason.

Fix: add `|| :` to match test 2, or skip the teardown entirely by reusing test 2's verified post-state.

S7. **`limavm delete --wait` help text omits the "ignored if --wait=false" companion phrasing** — `cmd/rdd/limavm.go:254` [Claude Opus 4.7]

```go
command.Flags().BoolVar(&wait, "wait", true, "Wait for the VM to be deleted before returning")
```

Every `--timeout` flag added by this PR carries the "ignored if --wait=false" nudge. The `--wait` flag itself stays terse across all limavm subcommands. Not a bug; minor wording polish for symmetry.

S8. **`stopWaitTimeout` doc comment could name the specific function that caps the drain** — `cmd/rdd/service.go:37-38` [Claude Opus 4.7]

```go
// stopWaitTimeout bounds the CLI wait for the control plane to shut down.
// Run caps the internal drain at 45s (see pkg/service/cmd/service.go).
```

The "see pkg/service/cmd/service.go" pointer is accurate but vague — `Run` lives in that file at line 692, and the 45s cap is at line 859. Naming the function (`Run`) in the comment helps a reader find the cross-reference in one step.

S9. **`cmd_service.md` `rdd service delete` section omits the discovery-ConfigMap side-effect on timeout** — `docs/design/cmd_service.md:117-122` [Claude Opus 4.7]

The paragraph documents the directory-preservation invariant ("the directory is left in place") but not the parallel effect: a timed-out delete (like a timed-out stop) wipes the discovery ConfigMap while the daemon continues draining, so a follow-up command that auto-starts the service polls the missing ConfigMap until `startWaitTimeout` (90s). The code comment at `pkg/service/cmd/service.go:353-357` calls this out; the user-facing doc does not.

Fix: add one sentence to the `delete` section noting the discovery-ConfigMap side-effect, or cross-reference the `svc stop` doc if that gains a matching note.

---

## Design Observations

### Concerns

**`cleanupDiscoveryConfigMap` placement is load-bearing** — `pkg/service/cmd/service.go:353-358` *(future)* [Claude Opus 4.7]

The CLI-side cleanup (instead of daemon-side) is the root of both I1 and S4: `--wait=false` stop, timed-out stop, and timed-out delete all leave the ConfigMap deleted while the daemon is still serving. The TODO at lines 353-357 names the durable fix — delete the ConfigMap from the daemon's `ctx`-cancellation handler, stop touching it in `StopWithWait`. The move also removes the "CLI needs kubeconfig access to cleanly stop the service" coupling, which complicates permissions and error paths. Not in scope for this PR; listed here so the pending-work item carries the context.

### Strengths

- **`cliexit.Classify` is the right abstraction** — `pkg/cli/exit/exit.go:56-65` *(in-scope)* [Claude Opus 4.7] [Gemini 3.1 Pro]. A single choke-point at the CLI boundary, idempotent under `errors.As` (re-Classifying a `*cliexit.Error` is a no-op), and every call site uses it consistently.
- **`ensureServiceRunning` as the hidden fan-out** — `cmd/rdd/service.go:93-111` *(in-scope)* [Claude Opus 4.7] [Gemini 3.1 Pro]. Routing `context.DeadlineExceeded` through `cliexit.Classify` here propagates the exit-code-4 contract to every autostart caller (`rdd set`, `rdd kubectl`, `rdd service config`, all `rdd limavm *`) without repeating the classification at each site. Exactly the right place to centralize the contract.
- **Watch-drop handling distinguishes drop from unmet predicate** — `cmd/rdd/limavm.go:497-512, 550-566` *(in-scope)* [Claude Opus 4.7]. The post-watch re-read separates "watch closed before predicate satisfied" from "context deadline" from "satisfied predicate," addressing the silent-success-on-watch-drop class of bugs.
- **`svc delete` preserves the directory on timeout** — `pkg/service/cmd/service.go:448-465` *(in-scope)* [Gemini 3.1 Pro]. Dropping `--wait=false` from `svc delete` and preserving the directory on timed-out wait correctly addresses Windows directory-corruption and Unix PID-file mutual-exclusion risks under a live daemon.

---

## Testing Assessment

The new BATS coverage is solid for the primary contract (exit-code-4 with `context deadline exceeded`). Round 8 adds `limavm create --start`, `stop`, and `delete` timeout assertions, closing most of the round-7 coverage gaps. Remaining gaps, in priority order:

1. **Autostart entry-point exit-code-4 (I2)** — `rdd set`, `rdd ctl`, `rdd service config` all route through `ensureServiceRunning` for classification, with no dedicated coverage. A Go-level test against `ensureServiceRunning`/`Classify` is cheaper than BATS.
2. **`svc start --wait=false`** — not asserted to return immediately. A regression that reinstates waiting by default would ship green. The existing TODO at `bats/tests/20-service/2-create.bats:3` still tracks this.
3. **`svc stop --wait=false`** — not asserted to return immediately. Especially relevant given I1 (the pre-shutdown cleanup fires regardless).
4. **`--timeout=0` indefinite wait** — no test asserts `0` disables the deadline. A regression that treats `0` as "immediate timeout" would silently break the documented contract.
5. **CM disappears mid-poll** — `waitForFreshDiscoveryConfigMap` is not tested against a concurrent `rdd svc stop --wait=false` that deletes the CM while the poll loop is running (the I1 scenario).

---

## Documentation Assessment

`docs/design/cmd_service.md` and `cmd_lima.md` accurately describe the new defaults, flags, and exit-code-4 contract. The round-7 `api_service.md` imprecision is fixed and the signal-fallback wording now matches `StopWithWait` precisely.

Two gaps:

- **`cmd_service.md` `rdd service delete` section** — missing the discovery-ConfigMap side-effect on timed delete (S9 above). Parallel to the `stop` case; needs a matching note.
- **Minor wording in `cmd_service.md:94-95`** — "Run `rdd service start` again with no options to wait" is slightly ambiguous (the user can still pass `--timeout`). "Run `rdd service start` again to wait" reads cleaner.

Pre-existing test `bats/tests/20-service/2-create.bats:37-44` asserts log strings ("successfully started", "waiting for ... to be ready") that no current code path emits — flagged in round 7, not introduced by this PR.

---

## Commit Structure

One commit, one concept, descriptive message. Clean. Round 8 of iteration has converged on the same single-commit shape as round 7.

---

## Acknowledged Limitations

- **Discovery ConfigMap deleted before signaling, leaving a startup window for new clients.** `pkg/service/cmd/service.go:353-359` documents the trade-off for the timed-out case; I1 extends the scope to `--wait=false` and S4 names the `context.TODO` gap. The durable fix is moving cleanup into the daemon's own shutdown path.
- **`context.Canceled` in `Delete`'s predicate.** `pkg/service/cmd/service.go:455-461` notes that `context.Canceled` is unreachable today because the CLI runs on `context.Background()`. Verified: `cmd/rdd/main.go` passes `context.Background()` to `newServiceCommand`, and nothing else wires SIGINT into cobra's command context. Wiring SIGINT into `cmd.Context()` would let Ctrl-C fall through to deletion during a live stop, which needs an explicit decision.
- **Timeout values shorter than `stopWaitTimeout` race the daemon's 45s internal drain cap.** `pkg/service/cmd/service.go:395-398` documents this; the BATS suite avoids the race by using `--timeout=1ms` (deterministic loss) or `--timeout=120s`/`5m` (always wins).
- **Second-precision freshness anchor accepts a stale ConfigMap when a prior crash lands in the same wall-clock second as the new startup.** Round 6 S1 (Codex). The truncation aligns with `metav1.Time` JSON serialization (a real constraint); finer-grained anchoring would require carrying sub-second precision through the ConfigMap path.
- **`--timeout` scope is intentionally not extended over `ensureServiceRunning`** — `cmd/rdd/service.go:88-91` documents the 90s cap as independent of the caller's `--timeout` so a broken service fails fast on long-deadline commands. Design tradeoff flagged in multiple prior rounds (round 5 S3, round 7 Design Observation), consistently treated as intentional.
- **Deadline-expiry path sends SIGTERM (graceful), not SIGKILL (forced)** — `pkg/service/cmd/service.go:389-401` documents that SIGTERM is the *second* shutdown signal (SIGINT was sent earlier), which the apiserver's `SetupSignalContext` handler responds to with `os.Exit(1)`. The genuinely-deadlocked-daemon case accepts the hang for log flush and proper cleanup.

---

## Agent Performance Retro

### [Claude]

Deepest review again: 1 Important + 9 Suggestions + 1 Design Observation + 3 Strengths. Three are genuinely new this round: I1 (the `--wait=false` cleanup angle that Claude alone saw, requiring a trace through `ensureServiceRunning` → `waitForFreshDiscoveryConfigMap` → `MarkControlPlaneReady` to confirm the daemon doesn't recreate the CM), S1/S2 (the paired `Delete` exit-code 4 / wait-loop race). Five are re-raises of deferred round-7 items (S3 race, S4 context.TODO, plus test-file and doc polish) that Claude correctly tracked. One false positive: I2 clock-skew between CLI and serve subprocess — the subprocess runs on the same host kernel (`exec.CommandContext`, not a VM or container), so the clock-skew scenario does not apply. Claude also mislabeled I1 as "regression"; the `cleanupDiscoveryConfigMap` call was unconditional pre-PR, so it's a pre-existing gap surfaced by new docs.

### [Codex]

Tight and calibrated: 2 Important, no Suggestions, no false positives. I2 (autostart entry-point coverage) is the same gap Codex raised in rounds 6 and 7, framed here with sharper accounting of which entry points remain uncovered after round 8's `limavm` additions. I1 (context.TODO / unbounded pre-wait) is a re-raise of round 7's S6, which the author had deferred to the discovery-CM-cleanup pending-work item; demoted to Suggestion S4 per the repo-context rule about deliberate code-comment tradeoffs. Codex did not catch the `--wait=false` cleanup angle (Claude's I1) despite their shared focus on the same function.

### [Gemini]

Thinnest review: 1 Suggestion, dropped as a false positive. The `run_e -0` nit is the exact same finding Gemini raised in round 7 (also dropped there for the same reason: `assert_created` → `assert_info` → `assert_stderr_line` reads `$stderr`, so `run_e` is required). Gemini did not track that this was rejected last round and repeated it verbatim. The two Strengths observations (Centralized Contract Enforcement, Graceful Deletion Handling) are useful framings but not unique. Also did not run `git blame` per the known daily-quota limitation.

### Summary

| | Claude Opus 4.7 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 22m 25s | 8m 33s | 5m 53s |
| Findings | 1I 9S | 1I 1S | none |
| Tool calls | 42 (Bash 25, Read 17) | 52 (shell 49, stdin 3) | 18 (grep_search 17, read_file 1) |
| Design observations | 4 | 0 | 2 |
| False positives | 1 | 0 | 1 |
| Unique insights | 3 | 1 | 0 |
| Files reviewed | 11 | 11 | 11 |
| Coverage misses | 0 | 0 | 0 |
| **Totals** | **1I 9S** | **1I 1S** | **none** |
| Downgraded | 0 | 1 (I→S) | 0 |
| Dropped | 1 | 0 | 1 |


Claude carried this round: I1 is the single new finding with real user-visible consequence, S1/S2 together close the exit-4 race in `Delete`, and the doc/test polish is concrete and targeted. Codex stayed sharp on the coverage gap. Gemini added little this round — its Suggestion was a deferred-to-pending-work repeat and its Strengths observations were reductive summaries of what Claude and Codex already framed more precisely.

---

## Review Process Notes

### Repo context updates

- **[repo] When the CLI spawns the daemon as a local subprocess (`exec.CommandContext`), discount clock-skew findings between CLI and daemon clocks.** The rdd control plane runs as a direct subprocess on the host, not inside a container or VM, so CLI-side `time.Now()` and subprocess-side `time.Now()` share the same host kernel and the same clock source. Findings that posit a clock-skew between the two processes require a VM/container boundary that does not exist here. Flag only if a code path crosses into a Lima VM or a container.

---

## Appendix: Original Reviews

### [Claude] — Pass 1

See `/tmp/deep-review-IkBy0u/review-claude-pass-1.md` — 1 Important + 9 Suggestions + 1 Concern + 3 Strengths. I1 (cleanupDiscoveryConfigMap on --wait=false) reproduced verbatim above. I2 (clock-skew) dropped as false positive. S1-S9 reproduced.

### [Codex] — Pass 1

See `/tmp/deep-review-IkBy0u/review-codex-pass-1.md` — 2 Important, 0 Suggestions. I1 demoted to S4 per repo-context rule; I2 kept as Important.

### [Gemini] — Pass 1

See `/tmp/deep-review-IkBy0u/review-gemini-pass-1.md` — 0 Important, 1 Suggestion (run_e -0), dropped as false positive (same finding as round 7 S8, same reason).