Deep Review: 20260521-205512-pr-361

Date	2026-05-21 20:55
Repo	rancher-sandbox/rancher-desktop-daemon
Round	2
Author	@jandubois
PR	#361 — kubernetes-reconciler: add V(1) tracing and merge-failure handling
Commits	`8543b4f` Document KubernetesReady reasons in api_app.md `6062ce1` Set KubernetesReady=False when kubeconfig merge fails `805b48e` Add V(1) tracing and log silent probe errors in kubernetes-reconciler `f0ff640` Extract WatchEventLogger to shared predicates package
Reviewers	Claude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default)
Verdict	Merge with optional fixes — round 1 blockers are addressed (docs, MergeFailed test, message wording); remaining items are doc-drift polish, additional Reconcile-path test coverage, and a daemon-side autostart change that persists `-v` into `args.json`.
Wall-clock time	`37 min 45 s`

Executive Summary ¶

Round 2. The PR closes the round-1 Suggestions on docs, MergeFailed coverage, and message wording, and adds two new commits: a daemon-side wiring that propagates klog verbosity through autostart, and the merge-failure condition itself. Reviewers found no Critical or Important regressions. The residual items are stale Go-side Settled docstring/prose that lag the new KubernetesReady gate in api_app.md, three uncovered Reconcile branches (NotApplicable, NotRunning, Probing), and an autostart change that bakes -v <N> into the persisted args.json.

Critical Issues ¶

None.

Important Issues ¶

None.

Suggestions ¶

S1. Settled docstring and prose still omit KubernetesReady Codex¶

pkg/apis/app/v1alpha1/app_types.go

	// rancher-desktop-{instance} context into ~/.kube/config. It is only
	// meaningful when spec.kubernetes.enabled is true; when Kubernetes is
	// disabled the condition is absent or False with reason NotApplicable.
	AppConditionKubernetesReady = "KubernetesReady"

	// AppConditionSettled reports whether the reconcile chain has
	// fully caught up with the current spec: observed generations on
	// the feeding conditions match the App's generation, and the VM
	// and engine have reached a stable state for the desired config.
	// A spec change forces Settled to False; once the chain quiesces,
	// the App reconciler flips it back to True. `rdd set` waits on
	// this condition.
	AppConditionSettled = "Settled"
)

The PR adds WaitingForKubernetes and KubernetesStale reasons to the conditions table and extends the (forwarded) row to mention KubernetesReady (api_app.md:117-128), but three prose surfaces still describe Settled as if KubernetesReady were not a gating condition: the AppConditionSettled docstring above; api_app.md:100 lists only "App controller" and "engine controller" as condition writers; api_app.md:132 says Settled is "computed from the mirrored Running and the engine-written ContainerEngineReady." A reader who hits these first will not realise KubernetesReady blocks Settled too.

docs/design/api_app.md


- **spec.kubernetes.version**: The Kubernetes version to use (e.g. `"1.30.2"`). Defaults to `"1.30.2"`. Propagated to the `KUBERNETES_VERSION` Lima template param.

- **status.kubernetesPort**: The host TCP port allocated for the k3s API server (`7441 + instance.Index()` by default). Set by the App reconciler on the first reconcile after `spec.kubernetes.enabled` becomes `true`, and cleared when `spec.kubernetes.enabled` is set back to `false` so that a fresh port is resolved on the next enable. The `KUBERNETES_PORT` Lima template param is set to this value; Lima's identity port-forward rule binds the same port on the host and forwards it to the guest.

- **status.conditions**: Multiple controllers write here. The App controller mirrors `Created` and `Running` from the owned `LimaVM`, computes `Settled`, and the engine controller writes `ContainerEngineReady`. All writers use `retry.RetryOnConflict` with a re-Get so concurrent status updates do not 409.

  | Type                   | Status    | Reason           | Description                                                       |
  |------------------------|-----------|------------------|-------------------------------------------------------------------|
  | `Created`              | `Unknown` | `Pending`        | LimaVM controller has started reconciliation                      |
  | `Created`              | `True`    | `Created`        | Lima instance created on disk and ready                           |
  | `Created`              | `False`   | `CreateFailed`   | Lima instance creation failed (see `message` for details)         |
  | `Running`              | `Unknown` | `Reconciling`    | Verifying instance state (e.g. after controller restart)          |
  | `Running`              | `True`    | `Started`        | Lima instance is running                                          |
  | `Running`              | `False`   | `Stopped`        | Lima instance is stopped                                          |
  | `Running`              | `False`   | `Starting`       | Lima instance is starting up                                      |
  | `Running`              | `False`   | `StartFailed`    | Lima instance failed to start                                     |
  | `Running`              | `False`   | `StopFailed`     | Lima instance failed to stop cleanly                              |
  | `ContainerEngineReady` | `True`    | `Connected`      | Engine controller has connected to Docker and completed full sync |
  | `ContainerEngineReady` | `True`    | `NotApplicable`  | Mirroring is not implemented for the current backend (e.g. `containerd`); forced `True` so `rdd set` can finish waiting |
  | `ContainerEngineReady` | `False`   | `ConnectFailed`  | Engine controller failed to connect to Docker                     |
  | `ContainerEngineReady` | `False`   | `Stopped`        | The VM is stopped; the engine watcher is not running              |
  | `KubernetesReady`      | `True`    | `Ready`          | k3s API server is reachable; instance context merged into `~/.kube/config` |
  | `KubernetesReady`      | `False`   | `NotApplicable`  | `spec.kubernetes.enabled` is false                                |
  | `KubernetesReady`      | `False`   | `NotRunning`     | VM is not running, so k3s cannot be healthy                       |
  | `KubernetesReady`      | `False`   | `Probing`        | Waiting for k3s API server to respond                             |
  | `KubernetesReady`      | `False`   | `MergeFailed`    | k3s is reachable but merging the instance kubeconfig failed (see `message`) |
  | `Settled`              | `True`    | `Settled`        | Reconcile chain has caught up with the current spec               |
  | `Settled`              | `False`   | `WaitingForLimaVM` | The App has no `Running` condition yet (nothing mirrored from LimaVM) |
  | `Settled`              | `False`   | `WaitingForEngine` | Engine controller has not yet written `ContainerEngineReady`    |
  | `Settled`              | `False`   | `EngineStale`    | Engine controller has not yet observed the current generation     |
  | `Settled`              | `False`   | `WaitingForKubernetes` | Kubernetes controller has not yet written `KubernetesReady`  |
  | `Settled`              | `False`   | `KubernetesStale` | Kubernetes controller has not yet observed the current generation |
  | `Settled`              | `False`   | *(forwarded)*    | Forwarded from the blocking `Running`, `ContainerEngineReady`, or `KubernetesReady` reason |

  `Running=True` means the Lima guest has finished booting and SSH is reachable. It says nothing about the container engine socket; consumers that depend on the engine (container/image/volume mirrors, `docker` against the forwarded socket) must also check `ContainerEngineReady`, which flips to `True` only after the engine controller has connected to the socket and completed its initial full sync.

  The `Created` and `Running` conditions are mirrored from LimaVM, so their `lastTransitionTime` reflects the LimaVM transition rather than the copy — the timestamp is meaningful for staleness checks. The engine reconciler stamps `ContainerEngineReady.observedGeneration` with the App's `metadata.generation` at the time it writes the condition; if the App's generation advances between the reconciler's read and write, the stamp reflects the write-time generation rather than the observed one. The App reconciler computes `Settled` from the mirrored `Running` and the engine-written `ContainerEngineReady`, and stamps its own `observedGeneration` with the `metadata.generation` observed when it computed the condition, not the generation at write time. The retry-on-conflict loop re-reads on 409, so a successful write always carries the current generation. `rdd set` waits for `Settled.status=True` with `observedGeneration >= post-patch metadata.generation`, so stale snapshots cannot prematurely satisfy the wait.

Deleting the `App` resource triggers the finalizer to stop and delete the owned LimaVM (and wait for the LimaVM controller to complete its own cleanup before removing the App finalizer).

## GUI

Fix: extend the docstring at app_types.go:48-49 to mention Kubernetes when enabled, add the Kubernetes controller to the writers list at api_app.md:100, and update the Settled derivation paragraph at api_app.md:132.

S2. ensureServiceRunning bakes -v <N> into the persisted args.json on first autostart Claude¶

cmd/rdd/service.go

		RunE:  serviceConfigAction,
	}
	return command
}

func ensureServiceRunning(ctx context.Context) error {
	// Pass klog verbosity the same way `rdd svc create` / `rdd svc start` do,
	// so V(1) tracing fires when autostart launches the daemon (e.g. from
	// `rdd set`).
	klogArgs := []string{"-v", logrusLevelToKlog()}
	if !service.Exists() {
		if err := service.Create(klogArgs); err != nil {
			return err
		}
	}
	if service.Running() {
		ctx, cancel := context.WithTimeout(ctx, 90*time.Second)
		defer cancel()
		if err := service.Wait(ctx); err != nil {
			return err
		}
		return waitForDiscoveryConfigMap(ctx)
	}
	return startAndWaitForReady(ctx, klogArgs)
}

// startAndWaitForReady starts the service, waits for the API server and the
// discovery ConfigMap to be ready. After an unclean shutdown the ConfigMap may
// survive with stale data, so the freshness check waits for a ConfigMap whose
// creationTimestamp is at or after the current startup.

service.Create persists its arg slice to instance.ArgsFile(). With the new klogArgs, the first rdd set (or any other autostart trigger) on a fresh checkout now writes a verbosity tag derived from the calling shell's logrus level into the on-disk args. serviceStartAction already appends -v <N> to each start invocation, so the persisted value never takes effect — but it does drift from what an explicit rdd svc create would have stored, and anyone inspecting args.json will see a verbosity that depends on which command happened to autostart the daemon first.

Fix: keep service.Create(nil) for the autostart-create branch and pass klogArgs only on the transient startAndWaitForReady path.

-    klogArgs := []string{"-v", logrusLevelToKlog()}
     if !service.Exists() {
-        if err := service.Create(klogArgs); err != nil {
+        if err := service.Create(nil); err != nil {
             return err
         }
     }
     ...
-    return startAndWaitForReady(ctx, klogArgs)
+    return startAndWaitForReady(ctx, []string{"-v", logrusLevelToKlog()})

S3. Four of six Reconcile branches remain uncovered Claude¶

The two new tests exercise the Ready and MergeFailed terminals. The other four Reconcile branches have no Reconcile-level coverage: NotApplicable (kube_reconciler.go:82-89), NotRunning (kube_reconciler.go:92-99), Probing from a probe error (kube_reconciler.go:103-114), and Probing from a non-200 healthz (kube_reconciler.go:115-125). The NotApplicable and NotRunning paths also call removeKubeContext, which has no Reconcile-level test at all — declined Round 1's Testing Gap #3 covered only probeCurrentKubeContext internals, not the dispatch surface.

pkg/controllers/app/kubernetes/controllers/kube_reconciler.go

		"generation", app.Generation,
		"resourceVersion", app.ResourceVersion,
	)

	// When Kubernetes is disabled, stamp the condition and clean up.
	if !app.Spec.Kubernetes.Enabled {
		r.removeKubeContext(ctx)
		return ctrl.Result{}, r.setKubeCondition(ctx, &app,
			metav1.ConditionFalse,
			appv1alpha1.AppKubernetesReasonNotApplicable,
			"Kubernetes is not enabled",
		)
	}

	// Kubernetes is enabled but the VM is not (yet) running.
	if !running {
		r.removeKubeContext(ctx)
		return ctrl.Result{}, r.setKubeCondition(ctx, &app,
			metav1.ConditionFalse,
			appv1alpha1.AppKubernetesReasonNotRunning,
			"VM is not running",
		)
	}

	// Probe the k3s API server from the instance kubeconfig.
	healthy, err := r.probeK3sAPI(ctx)
	if err != nil {
		// kubeconfig missing or unreadable — k3s has not started yet.
		log.V(1).Info("Cannot probe k3s API server", "err", err)
		if condErr := r.setKubeCondition(ctx, &app,
			metav1.ConditionFalse,
			appv1alpha1.AppKubernetesReasonProbing,
			"Waiting for k3s API server",
		); condErr != nil {
			return ctrl.Result{}, condErr
		}
		return ctrl.Result{RequeueAfter: kubeProbeRequeue}, nil
	}
	if !healthy {
		log.V(1).Info("k3s API server not yet healthy")
		if condErr := r.setKubeCondition(ctx, &app,
			metav1.ConditionFalse,
			appv1alpha1.AppKubernetesReasonProbing,
			"Waiting for k3s API server",
		); condErr != nil {
			return ctrl.Result{}, condErr
		}
		return ctrl.Result{RequeueAfter: kubeProbeRequeue}, nil
	}

	// k3s is healthy — merge the context into ~/.kube/config.
	if err := r.manageKubeContext(ctx); err != nil {
		if condErr := r.setKubeCondition(ctx, &app,
			metav1.ConditionFalse,

Fix: add Test_Reconcile_KubernetesReady_NotApplicable, _NotRunning, and two _Probing cases (probe error + non-200) using the existing newAppRunning / newKubeScheme scaffolding.

S4. Tests do not isolate HOME Claude¶

pkg/controllers/app/kubernetes/controllers/kube_reconciler_test.go

}

func Test_Reconcile_KubernetesReady_Ready(t *testing.T) {
	dir := t.TempDir()
	srcPath := fakeK3sServer(t, filepath.Join(dir, "k3s.yaml"))
	t.Setenv("KUBECONFIG", filepath.Join(dir, ".kube", "config"))

	scheme := newKubeScheme(t)
	app := newAppRunning()
	c := fake.NewClientBuilder().
		WithScheme(scheme).

Both tests set KUBECONFIG but not HOME. kubeConfigPath() (kube_context.go:19-28) reads KUBECONFIG first and only falls back to $HOME/.kube/config, so the tests pass today — but a future refactor that adds an "ignore empty KUBECONFIG" path or threads kubeconfig resolution through clientcmd.NewDefaultClientConfigLoadingRules would silently start operating on the developer's real ~/.kube/config. Defense-in-depth costs one line per test.

pkg/controllers/app/kubernetes/controllers/kube_context.go

	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// kubeConfigPath returns ~/.kube/config, or $KUBECONFIG if set.
// Computed dynamically so t.Setenv("HOME", …) works in tests.
func kubeConfigPath() (string, error) {
	if kc := os.Getenv("KUBECONFIG"); kc != "" {
		return kc, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".kube", "config"), nil
}

// loadKubeConfig loads the kubeconfig at path; returns an empty config if absent.
func loadKubeConfig(path string) (*clientcmdapi.Config, error) {
	cfg, err := clientcmd.LoadFromFile(path)
	if os.IsNotExist(err) {

+    t.Setenv("HOME", dir)
     t.Setenv("KUBECONFIG", filepath.Join(dir, ".kube", "config"))

S5. Float-based duration comparison breaks parallel with the sibling test Claude¶

pkg/controllers/app/kubernetes/controllers/kube_reconciler_test.go

	t.Cleanup(func() { r.removeKubeContext(context.Background()) })

	req := ctrl.Request{NamespacedName: client.ObjectKey{Name: appName}}
	result, err := r.Reconcile(context.Background(), req)
	assert.NilError(t, err)
	assert.Equal(t, result.RequeueAfter.Seconds(), float64(0),
		"Ready path should not request a requeue")

	got := &appv1alpha1.App{}
	assert.NilError(t, c.Get(context.Background(), client.ObjectKey{Name: appName}, got))

	cond := findKubeReady(got)

Test_Reconcile_KubernetesReady_MergeFailed (line 175) compares the duration directly: assert.Equal(t, result.RequeueAfter, kubeProbeRequeue, ...). The Ready test takes the .Seconds() detour against a float64(0) literal, which is harder to read and invites float comparisons on a future tick.

pkg/controllers/app/kubernetes/controllers/kube_reconciler_test.go

	}

	req := ctrl.Request{NamespacedName: client.ObjectKey{Name: appName}}
	result, err := r.Reconcile(context.Background(), req)
	assert.NilError(t, err)
	assert.Equal(t, result.RequeueAfter, kubeProbeRequeue,
		"MergeFailed path should requeue after kubeProbeRequeue")

	got := &appv1alpha1.App{}
	assert.NilError(t, c.Get(context.Background(), client.ObjectKey{Name: appName}, got))

-    assert.Equal(t, result.RequeueAfter.Seconds(), float64(0),
-        "Ready path should not request a requeue")
+    assert.Equal(t, result.RequeueAfter, time.Duration(0),
+        "Ready path should not request a requeue")

S6. predicates package docstring frames a one-member package as a collection Claude¶

pkg/controllers/app/predicates/predicates.go

// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: SUSE LLC
// SPDX-FileCopyrightText: The Rancher Desktop Authors

// Package predicates provides controller-runtime predicates for the App
// controllers.
package predicates

import (
	"sigs.k8s.io/controller-runtime/pkg/event"
	logf "sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/predicate"

The package contains exactly one predicate today. If a second member is in flight, the docstring is fine; if not, keeping the helper next to one of its callers and re-extracting on the second consumer keeps the package count honest. No fix required — call the question while the file is fresh.

Design Observations ¶

Concerns ¶

probeCurrentKubeContext ignores ExecProvider, AuthProvider, CAFile, and CertFile (future) Gemini. The hand-rolled HTTP client at kube_reconciler.go:312-332 only honours inline CAData / CertData / KeyData and a bearer token. A developer whose current-context uses an EKS exec-credential plugin, a GKE auth-provider, or any file-based cert path will see the probe fail with TLS or 401, and the background goroutine at kube_reconciler.go:259-263 interprets the failure as "current-context unhealthy" and silently overwrites it with rancher-desktop. Verified at e8033bd (merge-base) that the construction predates this PR; the new V(1) log lines make the failure mode diagnosable but do not address the auth coverage. A rest.HTTPClientFor(restCfg) call would pick up every kubeconfig auth path client-go supports.
removeKubeContext swallows cleanup errors (future) Gemini. kube_reconciler.go:355-374 logs but does not return errors from clearCurrentKubeContext and deleteKubeContext. The caller at kube_reconciler.go:83 discards the call and stamps KubernetesReady=False/NotApplicable. A Windows file lock or permission error on ~/.kube/config therefore leaks the rancher-desktop entry indefinitely with no retry or backoff. Pre-existing (unchanged from e8033bd). Aggregating the errors with errors.Join and returning them from removeKubeContext would let controller-runtime apply normal backoff.
manageKubeContext rewrites ~/.kube/config unconditionally on every reconcile (future) Gemini. The call at kube_reconciler.go:128 fires on every reconcile that reaches the healthy branch (17 per the PR's own smoke test). createReplaceKubeContext (kube_context.go:42-88) re-reads, re-merges, and re-writes the file each time, then spawns a background probeCurrentKubeContext goroutine. Pre-existing, but it deserves a deep-equal short-circuit in createReplaceKubeContext: skip the write (and the goroutine) when the target cluster, user, and context already match the on-disk values.
cmd/app-controller/main.go does not blank-import kubernetes or engine (future) Claude. The standalone development binary at cmd/app-controller/main.go:9-16 imports only app and demo, so a developer running bin/app-controller exercises neither the engine reconciler nor the kubernetes reconciler this PR diagnoses. Pre-existing (introduced by #356 for kubernetes and earlier for engine); flagging it here while the file is adjacent.

Strengths ¶

K3sConfigPath injection Claude. Replacing the process-global instance.K3sConfig() call with a KubernetesReconciler field threaded through controller.go:46-49 is the minimum change that breaks the test's dependency on process state — and it enables both new tests without touching any sibling controller.
probeCurrentKubeContext error-path logging preserves the "leave current-context alone" contract Claude. Every previously silent fall-through now logs at Error level while still returning true, and the new docstring at kube_reconciler.go:271-274 documents why — the kind of comment that would have shortened the #358 investigation.
MergeFailed propagates cleanly to Settled Claude Codex. The new AppKubernetesReasonMergeFailed is an exported constant, documented inline, surfaced in api_app.md, and forwarded by computeSettledCondition via the existing kubeCond.Reason path — docs, code, and the Settled forwarding stay in sync.
Predicate extraction prevents drift Claude Codex Gemini. The shared predicates.WatchEventLogger lets both reconcilers adopt watch-event logging without two copies of the predicate; the move commit is a pure refactor.
manageKubeContext separates synchronous and asynchronous error handling Codex. Returning only the synchronous merge error keeps the readiness contract focused on what the user can act on while preserving the deliberate best-effort current-context probe.

Testing Assessment ¶

Unit tests for the changed packages pass (go test ./pkg/controllers/app/...). go test ./cmd/rdd could not compile under the agent worktree because pkg/guestagent/guestagent.go embeds lima-guestagent.gz, which is generated by make and absent from a fresh checkout — not a PR-introduced regression.

Coverage gaps, ranked:

Probe-error and probe-unhealthy Reconcile paths are uncovered — production startup hits both before Ready. (S3)
VM-not-running and Kubernetes-disabled paths are uncovered — both invoke removeKubeContext, which still has no Reconcile-level test. (S3)
App reconciler Settled forwarding of MergeFailed has no dedicated test — the generic Kubernetes-false propagation reduces the risk but does not pin the new reason. Codex
predicates.WatchEventLogger has no test — declined Round 1; the in-production usage is the de-facto smoke.

Documentation Assessment ¶

docs/design/api_app.md gains the KubernetesReady reason rows and WaitingForKubernetes/KubernetesStale forwarded reasons. Internal docstrings on the new K3sConfigPath field, manageKubeContext's sync/async split, and probeCurrentKubeContext's "return true on internal error" contract are accurate and load-bearing. The remaining drift sits in the AppConditionSettled Go docstring and the surrounding prose in api_app.md — see S1.

Commit Structure ¶

Four clean commits, one concept each: f0ff640 extract, 805b48e tracing, 6062ce1 condition propagation, 8543b4f docs. Each is independently revertable and each message matches its diff. No fold-in candidates.

Acknowledged Limitations ¶

Code comment — pkg/controllers/app/kubernetes/controllers/kube_reconciler.go:204-208: "Returns an error if the synchronous merge fails; failures from the async current-context probe are logged but not returned." The PR's behavior matches the docstring; see the first Design Concern for the residual gap.
Code comment — pkg/controllers/app/kubernetes/controllers/kube_reconciler.go:271-274: "Returns true on unexpected internal errors (path lookup, kubeconfig parse, TLS setup) to leave the user's current-context alone; each such path logs the underlying error." Implemented by this PR.

Declined in prior rounds ¶

WatchEventLogger nil-object guards — declined Round 1: cache-driven watch source never delivers nil events; defensive code for an unreachable state contradicts YAGNI. No material change since the decline.
MergeFailed re-logs on every requeue with no backoff — declined Round 1: matches the existing probe-failure pattern; log-file noise only, not user-facing terminal; setKubeCondition short-circuits identical reason/message. Claude's pass-1 finding "manageKubeContext logs the merge error twice" proposes a different fix (demote the in-function log.Error to V(1)) for the same noise concern at the same code surface (kube_reconciler.go:213-216); dropped here under this decline.
probeCurrentKubeContext error-path tests — declined Round 1: integration-style mocking cost outweighs value for observability-only paths.
predicates.WatchEventLogger smoke test — declined Round 1: pure observability with a single-line return; the in-app reconciler usage is the de-facto smoke test.

Agent Performance Retro ¶

Claude ¶

Claude produced the largest set this round — six suggestions, one Design concern, and three Strengths — and the only finding rooted in the autostart wiring (S2). That finding required tracing into pkg/service/cmd to verify that service.Create's slice arg lands in args.json, which the diff itself does not reveal. Claude also caught the standalone cmd/app-controller binary missing the kubernetes and engine imports while reading the new controller wiring, surfacing a pre-existing gap as a Design Observation. One Claude finding — "manageKubeContext logs the merge error twice" — restated the noise concern at the same code surface as Round 1's declined S5; moved to Declined in prior rounds. No false positives.

Codex ¶

Codex's single finding (S1) was the one piece any agent caught that ties directly to the Round 1 fix's narrower-than-original scope: the conditions table was extended for KubernetesReady/WaitingForKubernetes/KubernetesStale, but the surrounding AppConditionSettled docstring and two api_app.md prose paragraphs still describe Settled as if KubernetesReady were not part of the gate. Codex was the only agent to read across both Go docstrings and design prose to spot the drift. No false positives.

Gemini ¶

Gemini's two Important findings and one Suggestion all named real bugs in probeCurrentKubeContext (incomplete auth coverage), removeKubeContext (swallowed cleanup errors), and manageKubeContext (unconditional disk I/O). Verified at the merge-base SHA e8033bd: every cited construction predates this PR, so all three moved to Design Observations as pre-existing concerns. Severity calibration ran one notch high on the same I-vs-S axis as Round 1. Gemini also hit a 429 quota error mid-run but produced a complete structured review before that point.

Summary ¶

	Claude Opus 4.7	Codex GPT 5.5	Gemini 3.1 Pro
Duration	9m 51s	6m 20s	6m 46s
Findings	5S	1S	none
Tool calls	52 (Bash 32, Read 20)	47 (shell 47)	—
Design observations	4	2	2
False positives	0	0	0
Unique insights	5	1	4
Files reviewed	8	8	8
Coverage misses	0	0	0
Totals	5S	1S	none
Downgraded	0	0	3 (I→S)
Dropped	1	0	0

Reconciliation: Gemini I1 (probeCurrentKubeContext incomplete auth) → Design Observation concern; Gemini I2 (removeKubeContext silent cleanup) → Design Observation concern; Gemini S1 (manageKubeContext unconditional I/O) → Design Observation concern — all three demoted after merge-base verification confirmed the cited code predates this PR. Claude's "manageKubeContext logs the merge error twice" suggestion → dropped under Round 1's declined S5 (log file noise declined as acceptable).

The value distribution this round: Claude on test/coverage polish and the autostart wiring, Codex on the doc fix's residual scope, Gemini on pre-existing design concerns the diff exposed. No agent dominated; the three slices remained complementary.

Review Process Notes ¶

Skill improvements ¶

None — the prior-round gates worked as written. The decline-table walk caught Claude's duplicate-logging restatement, and the merge-base verification gate caught all three Gemini pre-existing findings before they reached the consolidated Important section.

Repo context updates ¶

None — deep-review-context.md already covers the relevant patterns (cross-controller condition propagation, kubeconfig handling, controller imports in cmd/*-controller/main.go). The PR's behavior matches every applicable entry.