Deep Review: 20260415-120235-pr-302

Date	2026-04-15 12:18
Repo	rancher-sandbox/rancher-desktop-daemon
Round	2
Author	@Nino-K
PR	#302 — Validating admission controller app settings
Commits	`aa37b3f` Address admission webhook review feedback `af452c9` Add bats tests to verify the validating admission controller `2a4a2ea` Add validating admission webhook for App settings
Reviewers	Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
Verdict	Merge with fixes — remove the dead `containerEngine.name` check that reviewers already asked to drop in round 1.
Wall-clock time	`19 min 10 s`

Executive Summary ¶

This PR moves App validation out of the reconciler and into a validating admission webhook, enforcing kubernetes.version ∈ bundled k3s-versions.json and requiring a non-empty kubernetes.version when kubernetes.enabled=true. The webhook registers for Create and Update, parses the JSON bundle once at NewAppValidator() so a broken fixture fails startup rather than the first admission request, and is exercised end-to-end through rdd ctl apply --dry-run=server BATS tests that cover both admission paths. The framework integration matches sibling controllers (limavm, containernamespace) cleanly.

The only blocking issue is unresolved round-1 feedback: the webhook still contains a duplicate containerEngine.name in {moby, containerd} check that is unreachable dead code behind the CRD's enum schema. @jandubois explicitly told the author to drop it in the previous review (discussion_r3082914825), but the feedback commit aa37b3f did not remove it. All three suggestions target test hygiene: one missing negative case, one hardcoded K3S_VERSION that will break on the next k3s bump, and one mislabeled test that actually exercises CRD schema validation rather than the new webhook.

Structure: 0 Critical, 1 Important, 4 Suggestions.

Critical Issues ¶

None.

Important Issues ¶

I1. Unresolved round-1 feedback: duplicate containerEngine.name check is unreachable dead code Gemini¶

pkg/controllers/app/app/controllers/app_webhook.go

func (v *AppValidator) ValidateDelete(_ context.Context, _ *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
	return nil, nil
}

func (v *AppValidator) validate(app *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
	engine := app.Spec.ContainerEngine.Name
	if engine != "moby" && engine != "containerd" {
		return nil, fmt.Errorf("spec.containerEngine.name %q is invalid; must be \"moby\" or \"containerd\"", engine)
	}

	k8s := app.Spec.Kubernetes
	if k8s.Enabled && k8s.Version == "" {
		return nil, errors.New("spec.kubernetes.version must not be empty when spec.kubernetes.enabled is true")
	}

The CRD schema already constrains this field to enum: [moby, containerd] (pkg/controllers/app/app/crd.yaml:60-63). CRD structural-schema validation runs strictly before validating webhooks in the admission pipeline, so any request with an invalid engine name is rejected by the API server with 422 Unprocessable Entity before the webhook is ever invoked. The check is dead code.

pkg/controllers/app/app/crd.yaml

                  name:
                    default: moby
                    description: |-
                      name specifies the container engine to use.
                      Valid values are "moby" (Docker-compatible) and "containerd".
                    enum:
                    - moby
                    - containerd
                    type: string
                required:
                - name
                type: object
              kubernetes:
                description: kubernetes specifies the Kubernetes configuration.

@jandubois explicitly called this out on app_webhook.go:50 in the round-1 review (discussion_r3082914825): "No, Kubernetes doesn't allow you assign arbitrary values to enum fields. You don't need to validate again." The feedback commit aa37b3f addressed @mook-as's strings.Cut suggestion but did not remove this check.

pkg/controllers/app/app/controllers/app_webhook.go

	return nil, nil
}

func (v *AppValidator) validate(app *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
	engine := app.Spec.ContainerEngine.Name
	if engine != "moby" && engine != "containerd" {
		return nil, fmt.Errorf("spec.containerEngine.name %q is invalid; must be \"moby\" or \"containerd\"", engine)
	}

	k8s := app.Spec.Kubernetes
	if k8s.Enabled && k8s.Version == "" {

Claude Opus 4.6 and Codex GPT 5.4 also noticed the duplication but deferred to the author's acknowledgement in the PR discussion. Because the round-1 feedback unambiguously asked for removal and the code still ships with the check, this reviewer treats it as unresolved feedback rather than an accepted exception.

Fix:

 func (v *AppValidator) validate(app *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
-	engine := app.Spec.ContainerEngine.Name
-	if engine != "moby" && engine != "containerd" {
-		return nil, fmt.Errorf("spec.containerEngine.name %q is invalid; must be \"moby\" or \"containerd\"", engine)
-	}
-
 	k8s := app.Spec.Kubernetes
 	if k8s.Enabled && k8s.Version == "" {
 		return nil, errors.New("spec.kubernetes.version must not be empty when spec.kubernetes.enabled is true")
 	}

Removing the check also shrinks the webhook's responsibility to the rules the CRD schema cannot express (the enabled && version=="" cross-field constraint and the version-in-set lookup), which is a better shape for the file long-term. See also S3, which addresses the companion BATS test that currently exercises the CRD enum rather than the webhook.

Suggestions ¶

S1. Hardcoded K3S_VERSION="1.32.0" will break tests on the next k3s bump Codex¶

bats/tests/32-app-controllers/app.bats


APP_NAME="app"
VM_NAME="rd"
INPUT_CM_NAME="rd"
APP_VALIDATOR_CONFIG="app-validator"
K3S_VERSION="1.32.0"

delete_app() {
    # Stop the VM first so the LimaVM finalizer clears quickly.
    rdd set running=false 2>/dev/null || true
    rdd ctl delete app "${APP_NAME}" --ignore-not-found

@test "dry-run accepts supported kubernetes.version" {
    run -0 rdd ctl apply --dry-run=server -f - <<EOF
...
    version: "${K3S_VERSION}"
EOF
}

The constant at line 14 and the happy-path test at lines 451-464 bake in 1.32.0, but the webhook's source of truth is the embedded k3s-versions.json loaded at pkg/controllers/app/app/controller.go:36-37. A routine k3s-version bump that drops 1.32.0 from the bundle will break this test even though admission is still working correctly for the then-current supported versions. The repo already has one older hardcoded version in app.bats, so this extends an existing maintenance hazard rather than avoiding it.

pkg/controllers/app/app/controller.go

var limaImagesWSL2 string

//go:embed lima-template.yaml
var limaTemplate string

//go:embed k3s-versions.json
var k3sVersionsData string

func limaTemplateData() string {
	images := limaImagesUnix
	if runtime.GOOS == "windows" {
		images = limaImagesWSL2

bats/tests/32-app-controllers/app.bats

    version: "0.0.0"
EOF
    assert_output --partial "not supported"
}

@test "dry-run accepts supported kubernetes.version" {
    run -0 rdd ctl apply --dry-run=server -f - <<EOF
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App
metadata:
  name: app
spec:
  running: false
  containerEngine:
    name: moby
  kubernetes:
    enabled: true
    version: "${K3S_VERSION}"
EOF
}

@test "webhook rejects invalid update via patch dry-run" {
    # The App may not exist here and might be deleted after kubernetes tests, so we create one.
    create_app false

Fix: derive the accepted version from the same bundle the webhook reads, normalizing it the same way as parseK3sVersions() (strip v prefix, cut at +):

get_supported_k3s_version() {
    run -0 jq -r '.versions[0]' pkg/controllers/app/app/k3s-versions.json
    local raw="${output#v}"
    printf '%s\n' "${raw%%+*}"
}

Then substitute the value into the success-path tests. The version the webhook actually accepts and the version the test submits now both come from the same file.

S2. Missing negative test: invalid version with kubernetes.enabled=false Claude¶

The webhook validates kubernetes.version against the supported set regardless of whether kubernetes.enabled is true (app_webhook.go:59-63):

pkg/controllers/app/app/controllers/app_webhook.go

	k8s := app.Spec.Kubernetes
	if k8s.Enabled && k8s.Version == "" {
		return nil, errors.New("spec.kubernetes.version must not be empty when spec.kubernetes.enabled is true")
	}

	if k8s.Version != "" {
		if _, ok := v.supportedK8sVersions[k8s.Version]; !ok {
			return nil, fmt.Errorf("spec.kubernetes.version %q is not supported; see the bundled k3s-versions.json for valid versions", k8s.Version)
		}
	}

	return nil, nil
}

bats/tests/32-app-controllers/app.bats

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: SUSE LLC
# SPDX-FileCopyrightText: The Rancher Desktop Authors

load '../../helpers/load'

@mook-as specifically noted this design decision in the PR review ("That means we verify the version when it's disabled. Probably a good idea, I guess…") — it is intentional and prevents the App from storing a version that would blow up the moment kubernetes is later enabled. But the BATS suite only exercises enabled=true + invalid version; there is no coverage for the enabled=false + invalid version path, so a regression that accidentally gated this check on Enabled would ship green.

Fix: add a dry-run test case that pins the behavior:

@test "dry-run rejects unsupported version even when kubernetes is disabled" {
    run -1 rdd ctl apply --dry-run=server -f - <<YAML
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App
metadata:
  name: app
spec:
  running: false
  containerEngine:
    name: moby
  kubernetes:
    enabled: false
    version: "0.0.0"
YAML
    assert_output --partial "not supported"
}

S3. "dry-run rejects invalid containerEngine name" exercises the CRD enum, not the webhook Claude¶

bats/tests/32-app-controllers/app.bats

  kubernetes:
    enabled: false
EOF
}

@test "dry-run rejects invalid containerEngine name" {
    run -1 rdd ctl apply --dry-run=server -f - <<EOF
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App
metadata:
  name: app
spec:
  running: false
  containerEngine:
    name: kvm
  kubernetes:
    enabled: false
EOF
    assert_output --partial "containerEngine.name"
}

@test "dry-run rejects kubernetes.enabled=true with empty version" {
    run -1 rdd ctl apply --dry-run=server -f - <<EOF
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App

The test lives under the --- Admission webhook: --dry-run=server validation --- section header (line 362), which implies it is testing the webhook. In reality, CRD schema validation happens before validating webhooks, and crd.yaml:60-63 already constrains containerEngine.name to enum: [moby, containerd]. The API server rejects name: kvm via the schema long before the webhook is reached — the match on the containerEngine.name substring happens to pass because the CRD error message also contains that path.

pkg/controllers/app/app/crd.yaml

                  name:
                    default: moby
                    description: |-
                      name specifies the container engine to use.
                      Valid values are "moby" (Docker-compatible) and "containerd".
                    enum:
                    - moby
                    - containerd
                    type: string
                required:
                - name
                type: object
              kubernetes:
                description: kubernetes specifies the Kubernetes configuration.

bats/tests/32-app-controllers/app.bats

    rdd ctl wait --for=condition=Running=False \
        limavm/"${VM_NAME}" --namespace "${RDD_NAMESPACE}" --timeout=60s
    delete_app
}

# --- Admission webhook: --dry-run=server validation ---

@test "webhook configuration is registered" {
    rdd ctl wait --for=create ValidatingWebhookConfiguration "${APP_VALIDATOR_CONFIG}" --timeout=60s

    run -0 rdd ctl get ValidatingWebhookConfiguration "${APP_VALIDATOR_CONFIG}" \

The test still has value as a behavioral assertion ("invalid engine name is rejected"), but it is misfiled. Once I1 is fixed and the webhook no longer pretends to validate this field, the test should explicitly document that it is exercising CRD schema validation (e.g., rename to "CRD schema rejects invalid containerEngine name" and move it out of the webhook section), or drop it if the sibling CRD tests already cover enum validation.

S4. Fully qualify the ValidatingWebhookConfiguration name Gemini¶

pkg/controllers/app/app/controller.go


const (
	// appValidatorWebhookName is the name used for the App validating webhook.
	appValidatorWebhookName = "app-validator.app.rancherdesktop.io"
	// appValidatorConfigName is the name of the App ValidatingWebhookConfiguration.
	appValidatorConfigName = "app-validator"
)

// controller implements the base.Controller interface for app.
type controller struct {
	webhookPort     int

The webhook.name field is fully qualified (app-validator.app.rancherdesktop.io) but the ValidatingWebhookConfiguration resource name is not. In RDD's embedded single-user control plane there is no realistic collision risk, so this is purely stylistic. Worth considering only because the sibling constant one line up already uses the fully-qualified form, so the asymmetry reads as an oversight rather than a decision.

Fix:

-	appValidatorConfigName = "app-validator"
+	appValidatorConfigName = "app-validator.app.rancherdesktop.io"

Calibrated as a low-confidence suggestion because it is a cluster-scoped resource name and the repo context explicitly calls out that name-collision findings should be calibrated against the single-tenant assumption.

Design Observations ¶

Strengths ¶

Parse once at construction, fail fast (app_webhook.go:22-31, in-scope) Claude Codex Gemini — NewAppValidator invokes parseK3sVersions once and stores the result in an O(1) lookup map. A malformed embedded JSON fixture now breaks controller startup rather than the first admission request, and the hot path never touches the parser. The feedback commit aa37b3f moved this out of per-request code explicitly in response to earlier review, which is visible in the diff shape.
Framework integration matches sibling controllers (controller.go:68-136, in-scope) Claude — The new WebhookController implementation (SetWebhookPort, GetWebhookServiceName, GetWebhookManagers, setupWebhook) mirrors pkg/controllers/lima/limavm/controller.go and pkg/controllers/containers/container/container_controller.go one-for-one, and the compile-time assertion var _ base.WebhookController = &controller{} keeps the interface contract honest. No new framework primitives were needed; the PR reuses base.WebhookConfig / base.SetupWebhookForResource exactly as the surrounding controllers do.
set.bats adaptation anticipates the new constraint (bats/tests/32-app-controllers/set.bats:128-129, in-scope) Claude — The existing "clear string field" test previously ran rdd set kubernetes.version=, which the new webhook would reject when enabled=true. The PR changes it to clear both fields in one call (rdd set kubernetes.enabled=false kubernetes.version=) with an inline comment explaining the interaction. rdd set submits a single merge-patch (see cmd/rdd/set.go:385-407), so both fields land atomically and the webhook sees a consistent state.
Both Create and Update paths exercised end-to-end (bats/tests/32-app-controllers/app.bats:467-481, in-scope) Claude Gemini — The BATS suite uses rdd ctl apply --dry-run=server for Create and rdd ctl patch --dry-run=server for Update, routing real admission requests through the running API server. Controller-runtime dispatches these through distinct ValidateCreate and ValidateUpdate methods, and the repo context explicitly asks reviewers to verify both paths ship with coverage.

Concerns ¶

None.

Testing Assessment ¶

Coverage is good in shape but has three hygiene gaps already captured as suggestions above:

S1: the happy-path version (1.32.0) is hardcoded in a BATS constant and will go stale on the next k3s bump.
S2: the enabled=false + invalid version path is an intentional webhook rule with no BATS coverage.
S3: the invalid-engine test is actually exercising CRD schema validation, not the webhook.

The k3s_versions_test.go unit tests are thorough for the parser: they cover the bundled format (v1.32.0+k3s1), the bare form, missing prefix, missing suffix, duplicate k3s builds collapsing to one key, the empty array, and the malformed-JSON error path. There is no direct Go unit test for AppValidator.ValidateCreate / ValidateUpdate; coverage reaches them only through BATS. That is consistent with how the rest of the repo tests webhook validators — the integration path catches real regressions — but it means parser-adjacent edge cases (e.g., leading/trailing whitespace in a version field, case-insensitivity questions) are slower to diagnose if they ever arise.

Documentation Assessment ¶

All new exported symbols have appropriate Godoc. The NewAppValidator comment explicitly states why parsing happens at construction time ("so that a malformed JSON fixture causes controller startup to fail rather than the first admission request"), and ValidateDelete documents why it is a no-op. No user-facing documentation needs to change; this is an internal admission layer.

One minor observation: docs/design/api_app.md does not yet mention that invalid App settings are now rejected at admission time rather than later by the reconciler. Not a blocker for this PR, but the design doc will drift further from reality on every pass if it is not updated alongside changes like this one.

Commit Structure ¶

Clean three-commit sequence: 2a4a2ea introduces the webhook implementation, af452c9 adds BATS coverage, aa37b3f addresses round-1 review feedback. Each commit represents a distinct concept and the diff reads in order. The only caveat is that the feedback commit silently did not remove the containerEngine.name check even though round-1 feedback asked for its removal (see I1).

Acknowledged Limitations ¶

Test structure optional refactor — @mook-as suggested at bats/tests/32-app-controllers/app.bats:372 (discussion_r3082715251) that the dry-run tests could be restructured around a create_app() helper driven by yq expressions for clarity. Explicitly marked "Totally optional, because I could see argument going either way" and "feel free to ignore this." Not pursued in the feedback commit, and the current per-test YAML remains readable. Noting for completeness; no action needed.
No Go constants for engine names — @jandubois noted on app_webhook.go:50 (discussion_r3082914825): "We still should have constants for the engine names, but we can add them when we need them." Explicitly deferred. The string literals "moby" and "containerd" remain hardcoded in the webhook today, and also duplicated once more in the error message. Not tracked as a finding.

Unresolved Feedback ¶

Remove the duplicate containerEngine.name check — @jandubois on app_webhook.go:50 (discussion_r3082914825): "No, Kubernetes doesn't allow you assign arbitrary values to enum fields. You don't need to validate again." The feedback commit aa37b3f did not remove the check; see I1 above.

Agent Performance Retro ¶

Claude ¶

Produced a clean, well-structured review with two test-hygiene suggestions (missing enabled=false + invalid version case, mislabeled containerEngine test) that nobody else caught. Traced into the set.bats change to confirm rdd set submits an atomic merge-patch rather than two sequential updates, which is the right depth for this change. The notable miss was calibration on the dead-code containerEngine check: Claude saw it and called it out under Acknowledged Limitations but chose to defer to the author's PR comment rather than flagging it as a regression. The round-1 feedback from @jandubois was already in the prompt, and the repo context explicitly instructs reviewers to flag CRD-schema duplicates — the "defense in depth" framing was weaker than the finding warranted.

Codex ¶

Only one suggestion (S1, hardcoded K3S_VERSION) but it was the most durable maintenance finding in the review: nobody else noticed that the happy-path version is read from a BATS constant rather than from the same JSON fixture the webhook uses, even though the test file is short. The fix Codex proposed (jq -r '.versions[0]' + normalize) is concrete and ready to paste. Codex also deferred on the duplicate-check question under Acknowledged Limitations, reading the author's PR response as a valid exception. Its stderr log is just a progress transcript — no errors, no quota trouble.

Gemini ¶

The only agent to flag the dead containerEngine.name check as Important (I1), which turned out to be the correct calibration for a round-2 review where the feedback had been asked for in round 1 and not actioned. Gemini cited the CRD schema line numbers directly and proposed a minimal diff. Its second suggestion (qualify the ValidatingWebhookConfiguration name) is lower value in this single-tenant context but worth noting because it points at an inconsistency between two adjacent constants. Gemini continues to skip git blame entirely due to the known daily-quota issue, and its stderr log is empty of errors.

	Claude Opus 4.6	Codex GPT 5.4	Gemini 3.1 Pro
Duration	6m 57s	3m 46s	4m 03s
Findings	2S	1S	1I 1S
Tool calls	27 (Read 9, Grep 7, Bash 6)	43 (shell 38, stdin 5)	21 (runshellcommand 11, readfile 5, grepsearch 4)
Design observations	3	2	2
False positives	0	0	0
Unique insights	2	1	2
Files reviewed	6	6	6
Coverage misses	0	0	0
Totals	2S	1S	1I 1S
Downgraded	0	0	0
Dropped	0	0	0

Summary: Gemini delivered the highest-impact single finding of the round by holding the line on unresolved feedback rather than deferring to the PR comment thread. Codex contributed the most durable test-hygiene improvement (S1, fixture-derived version). Claude contributed the broadest coverage of smaller test gaps and the only cross-file tracing into rdd set to confirm atomicity. All three agents saw the duplicate containerEngine check; only Gemini flagged it at the severity the round-1 feedback trail justified.

Review Process Notes ¶

Repo context updates ¶

Flag unresolved round-1 feedback as Important when the author did not action it in a follow-up commit, regardless of whether they acknowledged it in the PR thread. When an earlier review comment asked for a specific code change and the later feedback commit did not make that change, do not let the PR-comment thread serve as an accepted exception. Acknowledgement in a comment is not the same as acceptance by the reviewer; the subsequent review round should confirm the feedback is either applied or explicitly deferred in-repo (design doc, code comment, tracked TODO). Flag the gap, cite the discussion link, and let the reviewer decide whether to downgrade — do not pre-emptively downgrade because "the author already saw it."