Deep Review: 20260415-120235-pr-302

Date2026-04-15 12:18
Reporancher-sandbox/rancher-desktop-daemon
Round2
Author@Nino-K
PR#302 — Validating admission controller app settings
Commitsaa37b3f Address admission webhook review feedback
af452c9 Add bats tests to verify the validating admission controller
2a4a2ea Add validating admission webhook for App settings
ReviewersClaude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
VerdictMerge with fixes — remove the dead containerEngine.name check that reviewers already asked to drop in round 1.
Wall-clock time19 min 10 s


Executive Summary

This PR moves App validation out of the reconciler and into a validating admission webhook, enforcing kubernetes.version ∈ bundled k3s-versions.json and requiring a non-empty kubernetes.version when kubernetes.enabled=true. The webhook registers for Create and Update, parses the JSON bundle once at NewAppValidator() so a broken fixture fails startup rather than the first admission request, and is exercised end-to-end through rdd ctl apply --dry-run=server BATS tests that cover both admission paths. The framework integration matches sibling controllers (limavm, containernamespace) cleanly.

The only blocking issue is unresolved round-1 feedback: the webhook still contains a duplicate containerEngine.name in {moby, containerd} check that is unreachable dead code behind the CRD's enum schema. @jandubois explicitly told the author to drop it in the previous review (discussion_r3082914825), but the feedback commit aa37b3f did not remove it. All three suggestions target test hygiene: one missing negative case, one hardcoded K3S_VERSION that will break on the next k3s bump, and one mislabeled test that actually exercises CRD schema validation rather than the new webhook.

Structure: 0 Critical, 1 Important, 4 Suggestions.


Critical Issues

None.


Important Issues

I1. Unresolved round-1 feedback: duplicate containerEngine.name check is unreachable dead code Gemini
func (v *AppValidator) ValidateDelete(_ context.Context, _ *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
	return nil, nil
}

func (v *AppValidator) validate(app *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
	engine := app.Spec.ContainerEngine.Name
	if engine != "moby" && engine != "containerd" {
		return nil, fmt.Errorf("spec.containerEngine.name %q is invalid; must be \"moby\" or \"containerd\"", engine)
	}

	k8s := app.Spec.Kubernetes
	if k8s.Enabled && k8s.Version == "" {
		return nil, errors.New("spec.kubernetes.version must not be empty when spec.kubernetes.enabled is true")
	}

The CRD schema already constrains this field to enum: [moby, containerd] (pkg/controllers/app/app/crd.yaml:60-63). CRD structural-schema validation runs strictly before validating webhooks in the admission pipeline, so any request with an invalid engine name is rejected by the API server with 422 Unprocessable Entity before the webhook is ever invoked. The check is dead code.

                  name:
                    default: moby
                    description: |-
                      name specifies the container engine to use.
                      Valid values are "moby" (Docker-compatible) and "containerd".
                    enum:
                    - moby
                    - containerd
                    type: string
                required:
                - name
                type: object
              kubernetes:
                description: kubernetes specifies the Kubernetes configuration.

@jandubois explicitly called this out on app_webhook.go:50 in the round-1 review (discussion_r3082914825): "No, Kubernetes doesn't allow you assign arbitrary values to enum fields. You don't need to validate again." The feedback commit aa37b3f addressed @mook-as's strings.Cut suggestion but did not remove this check.

	return nil, nil
}

func (v *AppValidator) validate(app *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
	engine := app.Spec.ContainerEngine.Name
	if engine != "moby" && engine != "containerd" {
		return nil, fmt.Errorf("spec.containerEngine.name %q is invalid; must be \"moby\" or \"containerd\"", engine)
	}

	k8s := app.Spec.Kubernetes
	if k8s.Enabled && k8s.Version == "" {

Claude Opus 4.6 and Codex GPT 5.4 also noticed the duplication but deferred to the author's acknowledgement in the PR discussion. Because the round-1 feedback unambiguously asked for removal and the code still ships with the check, this reviewer treats it as unresolved feedback rather than an accepted exception.

Fix:

 func (v *AppValidator) validate(app *v1alpha1.App) (ctrlwebhookadmission.Warnings, error) {
-	engine := app.Spec.ContainerEngine.Name
-	if engine != "moby" && engine != "containerd" {
-		return nil, fmt.Errorf("spec.containerEngine.name %q is invalid; must be \"moby\" or \"containerd\"", engine)
-	}
-
 	k8s := app.Spec.Kubernetes
 	if k8s.Enabled && k8s.Version == "" {
 		return nil, errors.New("spec.kubernetes.version must not be empty when spec.kubernetes.enabled is true")
 	}

Removing the check also shrinks the webhook's responsibility to the rules the CRD schema cannot express (the enabled && version=="" cross-field constraint and the version-in-set lookup), which is a better shape for the file long-term. See also S3, which addresses the companion BATS test that currently exercises the CRD enum rather than the webhook.


Suggestions

S1. Hardcoded K3S_VERSION="1.32.0" will break tests on the next k3s bump Codex

APP_NAME="app"
VM_NAME="rd"
INPUT_CM_NAME="rd"
APP_VALIDATOR_CONFIG="app-validator"
K3S_VERSION="1.32.0"

delete_app() {
    # Stop the VM first so the LimaVM finalizer clears quickly.
    rdd set running=false 2>/dev/null || true
    rdd ctl delete app "${APP_NAME}" --ignore-not-found
@test "dry-run accepts supported kubernetes.version" {
    run -0 rdd ctl apply --dry-run=server -f - <<EOF
...
    version: "${K3S_VERSION}"
EOF
}

The constant at line 14 and the happy-path test at lines 451-464 bake in 1.32.0, but the webhook's source of truth is the embedded k3s-versions.json loaded at pkg/controllers/app/app/controller.go:36-37. A routine k3s-version bump that drops 1.32.0 from the bundle will break this test even though admission is still working correctly for the then-current supported versions. The repo already has one older hardcoded version in app.bats, so this extends an existing maintenance hazard rather than avoiding it.

var limaImagesWSL2 string

//go:embed lima-template.yaml
var limaTemplate string

//go:embed k3s-versions.json
var k3sVersionsData string

func limaTemplateData() string {
	images := limaImagesUnix
	if runtime.GOOS == "windows" {
		images = limaImagesWSL2
    version: "0.0.0"
EOF
    assert_output --partial "not supported"
}

@test "dry-run accepts supported kubernetes.version" {
    run -0 rdd ctl apply --dry-run=server -f - <<EOF
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App
metadata:
  name: app
spec:
  running: false
  containerEngine:
    name: moby
  kubernetes:
    enabled: true
    version: "${K3S_VERSION}"
EOF
}

@test "webhook rejects invalid update via patch dry-run" {
    # The App may not exist here and might be deleted after kubernetes tests, so we create one.
    create_app false

Fix: derive the accepted version from the same bundle the webhook reads, normalizing it the same way as parseK3sVersions() (strip v prefix, cut at +):

get_supported_k3s_version() {
    run -0 jq -r '.versions[0]' pkg/controllers/app/app/k3s-versions.json
    local raw="${output#v}"
    printf '%s\n' "${raw%%+*}"
}

Then substitute the value into the success-path tests. The version the webhook actually accepts and the version the test submits now both come from the same file.

S2. Missing negative test: invalid version with kubernetes.enabled=false Claude

The webhook validates kubernetes.version against the supported set regardless of whether kubernetes.enabled is true (app_webhook.go:59-63):

	k8s := app.Spec.Kubernetes
	if k8s.Enabled && k8s.Version == "" {
		return nil, errors.New("spec.kubernetes.version must not be empty when spec.kubernetes.enabled is true")
	}

	if k8s.Version != "" {
		if _, ok := v.supportedK8sVersions[k8s.Version]; !ok {
			return nil, fmt.Errorf("spec.kubernetes.version %q is not supported; see the bundled k3s-versions.json for valid versions", k8s.Version)
		}
	}

	return nil, nil
}
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: SUSE LLC
# SPDX-FileCopyrightText: The Rancher Desktop Authors

load '../../helpers/load'

@mook-as specifically noted this design decision in the PR review ("That means we verify the version when it's disabled. Probably a good idea, I guess…") — it is intentional and prevents the App from storing a version that would blow up the moment kubernetes is later enabled. But the BATS suite only exercises enabled=true + invalid version; there is no coverage for the enabled=false + invalid version path, so a regression that accidentally gated this check on Enabled would ship green.

Fix: add a dry-run test case that pins the behavior:

@test "dry-run rejects unsupported version even when kubernetes is disabled" {
    run -1 rdd ctl apply --dry-run=server -f - <<YAML
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App
metadata:
  name: app
spec:
  running: false
  containerEngine:
    name: moby
  kubernetes:
    enabled: false
    version: "0.0.0"
YAML
    assert_output --partial "not supported"
}
S3. "dry-run rejects invalid containerEngine name" exercises the CRD enum, not the webhook Claude
  kubernetes:
    enabled: false
EOF
}

@test "dry-run rejects invalid containerEngine name" {
    run -1 rdd ctl apply --dry-run=server -f - <<EOF
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App
metadata:
  name: app
spec:
  running: false
  containerEngine:
    name: kvm
  kubernetes:
    enabled: false
EOF
    assert_output --partial "containerEngine.name"
}

@test "dry-run rejects kubernetes.enabled=true with empty version" {
    run -1 rdd ctl apply --dry-run=server -f - <<EOF
apiVersion: app.rancherdesktop.io/v1alpha1
kind: App

The test lives under the --- Admission webhook: --dry-run=server validation --- section header (line 362), which implies it is testing the webhook. In reality, CRD schema validation happens before validating webhooks, and crd.yaml:60-63 already constrains containerEngine.name to enum: [moby, containerd]. The API server rejects name: kvm via the schema long before the webhook is reached — the match on the containerEngine.name substring happens to pass because the CRD error message also contains that path.

                  name:
                    default: moby
                    description: |-
                      name specifies the container engine to use.
                      Valid values are "moby" (Docker-compatible) and "containerd".
                    enum:
                    - moby
                    - containerd
                    type: string
                required:
                - name
                type: object
              kubernetes:
                description: kubernetes specifies the Kubernetes configuration.
    rdd ctl wait --for=condition=Running=False \
        limavm/"${VM_NAME}" --namespace "${RDD_NAMESPACE}" --timeout=60s
    delete_app
}

# --- Admission webhook: --dry-run=server validation ---

@test "webhook configuration is registered" {
    rdd ctl wait --for=create ValidatingWebhookConfiguration "${APP_VALIDATOR_CONFIG}" --timeout=60s

    run -0 rdd ctl get ValidatingWebhookConfiguration "${APP_VALIDATOR_CONFIG}" \

The test still has value as a behavioral assertion ("invalid engine name is rejected"), but it is misfiled. Once I1 is fixed and the webhook no longer pretends to validate this field, the test should explicitly document that it is exercising CRD schema validation (e.g., rename to "CRD schema rejects invalid containerEngine name" and move it out of the webhook section), or drop it if the sibling CRD tests already cover enum validation.

S4. Fully qualify the ValidatingWebhookConfiguration name Gemini

const (
	// appValidatorWebhookName is the name used for the App validating webhook.
	appValidatorWebhookName = "app-validator.app.rancherdesktop.io"
	// appValidatorConfigName is the name of the App ValidatingWebhookConfiguration.
	appValidatorConfigName = "app-validator"
)

// controller implements the base.Controller interface for app.
type controller struct {
	webhookPort     int

The webhook.name field is fully qualified (app-validator.app.rancherdesktop.io) but the ValidatingWebhookConfiguration resource name is not. In RDD's embedded single-user control plane there is no realistic collision risk, so this is purely stylistic. Worth considering only because the sibling constant one line up already uses the fully-qualified form, so the asymmetry reads as an oversight rather than a decision.

Fix:

-	appValidatorConfigName = "app-validator"
+	appValidatorConfigName = "app-validator.app.rancherdesktop.io"

Calibrated as a low-confidence suggestion because it is a cluster-scoped resource name and the repo context explicitly calls out that name-collision findings should be calibrated against the single-tenant assumption.


Design Observations

Strengths

Concerns

None.


Testing Assessment

Coverage is good in shape but has three hygiene gaps already captured as suggestions above:

The k3s_versions_test.go unit tests are thorough for the parser: they cover the bundled format (v1.32.0+k3s1), the bare form, missing prefix, missing suffix, duplicate k3s builds collapsing to one key, the empty array, and the malformed-JSON error path. There is no direct Go unit test for AppValidator.ValidateCreate / ValidateUpdate; coverage reaches them only through BATS. That is consistent with how the rest of the repo tests webhook validators — the integration path catches real regressions — but it means parser-adjacent edge cases (e.g., leading/trailing whitespace in a version field, case-insensitivity questions) are slower to diagnose if they ever arise.


Documentation Assessment

All new exported symbols have appropriate Godoc. The NewAppValidator comment explicitly states why parsing happens at construction time ("so that a malformed JSON fixture causes controller startup to fail rather than the first admission request"), and ValidateDelete documents why it is a no-op. No user-facing documentation needs to change; this is an internal admission layer.

One minor observation: docs/design/api_app.md does not yet mention that invalid App settings are now rejected at admission time rather than later by the reconciler. Not a blocker for this PR, but the design doc will drift further from reality on every pass if it is not updated alongside changes like this one.


Commit Structure

Clean three-commit sequence: 2a4a2ea introduces the webhook implementation, af452c9 adds BATS coverage, aa37b3f addresses round-1 review feedback. Each commit represents a distinct concept and the diff reads in order. The only caveat is that the feedback commit silently did not remove the containerEngine.name check even though round-1 feedback asked for its removal (see I1).


Acknowledged Limitations


Unresolved Feedback


Agent Performance Retro

Claude

Produced a clean, well-structured review with two test-hygiene suggestions (missing enabled=false + invalid version case, mislabeled containerEngine test) that nobody else caught. Traced into the set.bats change to confirm rdd set submits an atomic merge-patch rather than two sequential updates, which is the right depth for this change. The notable miss was calibration on the dead-code containerEngine check: Claude saw it and called it out under Acknowledged Limitations but chose to defer to the author's PR comment rather than flagging it as a regression. The round-1 feedback from @jandubois was already in the prompt, and the repo context explicitly instructs reviewers to flag CRD-schema duplicates — the "defense in depth" framing was weaker than the finding warranted.

Codex

Only one suggestion (S1, hardcoded K3S_VERSION) but it was the most durable maintenance finding in the review: nobody else noticed that the happy-path version is read from a BATS constant rather than from the same JSON fixture the webhook uses, even though the test file is short. The fix Codex proposed (jq -r '.versions[0]' + normalize) is concrete and ready to paste. Codex also deferred on the duplicate-check question under Acknowledged Limitations, reading the author's PR response as a valid exception. Its stderr log is just a progress transcript — no errors, no quota trouble.

Gemini

The only agent to flag the dead containerEngine.name check as Important (I1), which turned out to be the correct calibration for a round-2 review where the feedback had been asked for in round 1 and not actioned. Gemini cited the CRD schema line numbers directly and proposed a minimal diff. Its second suggestion (qualify the ValidatingWebhookConfiguration name) is lower value in this single-tenant context but worth noting because it points at an inconsistency between two adjacent constants. Gemini continues to skip git blame entirely due to the known daily-quota issue, and its stderr log is empty of errors.

Claude Opus 4.6Codex GPT 5.4Gemini 3.1 Pro
Duration6m 57s3m 46s4m 03s
Findings2S1S1I 1S
Tool calls27 (Read 9, Grep 7, Bash 6)43 (shell 38, stdin 5)21 (runshellcommand 11, readfile 5, grepsearch 4)
Design observations322
False positives000
Unique insights212
Files reviewed666
Coverage misses000
Totals2S1S1I 1S
Downgraded000
Dropped000

Summary: Gemini delivered the highest-impact single finding of the round by holding the line on unresolved feedback rather than deferring to the PR comment thread. Codex contributed the most durable test-hygiene improvement (S1, fixture-derived version). Claude contributed the broadest coverage of smaller test gaps and the only cross-file tracing into rdd set to confirm atomicity. All three agents saw the duplicate containerEngine check; only Gemini flagged it at the severity the round-1 feedback trail justified.


Review Process Notes

Repo context updates