Deep Review: PR #238

Date2026-03-16
PR#238 — Split RDD finalizer into /cleanup and /owned
Branchsplit-finalizers
Commits9de9269 Split RDD finalizer into /cleanup and /owned
ReviewersClaude Codex Gemini
VerdictMerge with fixes — upgrade path for legacy finalizers must be addressed
Wall-clock time12 min 27 s

Executive Summary

This PR splits the single shared rdd.rancherdesktop.io/cleanup finalizer into two purpose-specific finalizers: /cleanup for self-protection (a resource's own controller runs cleanup before deletion) and /owned for cascade blocking (an owner prevents child deletion until explicit release via DeleteOwnedResources). The split fixes a real bug where DeleteOwnedResources stripped the only finalizer on a child resource, bypassing the child controller's cleanup logic.

The implementation is correct for new resources but leaves existing resources stranded on upgrade: legacy template ConfigMaps still carry /cleanup, which DeleteOwnedResources no longer strips.


Critical Issues

None.


Important Issues

1. Legacy template ConfigMaps become stuck in Terminating after upgrade Claude Codex Gemini

Before this PR, the LimaVM webhook created template ConfigMaps with rdd.rancherdesktop.io/cleanup (the old FinalizerName). After this PR, DeleteOwnedResources strips only /owned. When a user deletes a LimaVM created before the upgrade:

		for _, item := range list.Items {
			if !IsOwnedByUID(&item, owner.GetUID()) {
				continue
			}

			// Strip the owned finalizer to unblock deletion. The cleanup finalizer (if any)
			// remains so the child's own controller can still run its cleanup logic.
			patch := client.MergeFrom(item.DeepCopy())
			if controllerutil.RemoveFinalizer(&item, OwnedFinalizerName) {
				// Use Patch instead of Update for PartialObjectMetadata
				if err := c.Patch(ctx, &item, patch); err != nil {
					itemLogger := logger.V(1).WithValues("namespace", item.GetNamespace(), "name", item.GetName(), "gvk", gvk.String())
					if apierrors.IsNotFound(err) {
						itemLogger.Info("Resource already deleted during finalizer removal")
						continue
					}
					// For other errors, log and collect error but proceed with deletion attempt
					// The deletion might still succeed if there are no other blocking finalizers
					itemLogger.Info("Failed to remove finalizer, will attempt deletion anyway", "error", err)
					errs = append(errs, fmt.Errorf("failed to remove finalizers from %s %s/%s: %w", gvk, item.GetNamespace(), item.GetName(), err))
				}
			}
			if err := c.Delete(ctx, &item); err != nil && client.IgnoreNotFound(err) != nil {
				errs = append(errs, fmt.Errorf("failed to delete %s %s/%s during cleanup: %w", gvk, item.GetNamespace(), item.GetName(), err))
			}
		}

The webhook at controller.go:210-211 now creates new template ConfigMaps with /owned, but existing ones carry /cleanup:

	// Create template ConfigMap with label
	templateConfigMap := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      limavm.GetTemplateConfigMapName(),
			Namespace: limavm.Namespace,
			Labels: map[string]string{
				controllers.TemplateConfigMapLabel: "true",
			},
			Finalizers: []string{
				base.OwnedFinalizerName,
			},
		},
		Data: map[string]string{
			v1alpha1.TemplateConfigMapKey: templateData,
		},
	}

All three agents identified this issue. Claude rated it important/gap; Codex rated it important/regression; Gemini rated it critical/regression. The correct classification is important, regression: the PR introduces the incompatibility, but recovery is straightforward in the single-user embedded context (delete the kine database or manually patch the finalizer).

Fix: Migrate legacy children. In the LimaVM reconciler, when fetching the template ConfigMap during normal reconciliation, check whether it carries /cleanup without /owned. If so, patch to replace /cleanup with /owned. This keeps the migration close to the only controller that creates these ConfigMaps and avoids polluting the generic DeleteOwnedResources helper with knowledge of the legacy scheme.


Suggestions

1. Stale comment claims owned resources "have no finalizers" Claude

After this PR, template ConfigMaps carry the /owned finalizer. The practical conclusion (delete in one pass) remains correct because DeleteOwnedResources strips /owned before issuing the delete, but the stated reasoning at lines 60-64 is wrong. Blame confirms these lines predate this PR (de1f2298), but the PR made the comment inaccurate.


	// Delete owned resources and remove the finalizer in one pass. This is safe
	// because owned resources (ConfigMaps) delete instantly and have no finalizers.
	// If we later own resources with complex teardown, split this into two reconcile
	// cycles: delete owned resources, then verify they are gone before removing
	// the finalizer.
	if err := base.DeleteOwnedResources(ctx, r.Client, limaVM, r.Manager); err != nil {
		logger.Error(err, "Failed to delete owned resources")
		return ctrl.Result{}, err
	}

	// Remove finalizer
	if err := base.RemoveCleanupFinalizer(ctx, r.Client, limaVM); err != nil {
		logger.Error(err, "Failed to remove finalizer")
		return ctrl.Result{}, err
	}

Fix: Rewrite the comment to explain that DeleteOwnedResources strips /owned before deletion, and the two-cycle caveat applies if a child ever needs /cleanup alongside /owned.

2. Missing test case for legacy child with only /cleanup finalizer Claude Codex

The test covers no finalizers, owned-only, and both-finalizers. The missing case is a child with only /cleanup — the upgrade scenario from Important Issue #1. RemoveFinalizer at line 162 returns false (different code path), and the resource gets stuck in Terminating.

	testCases := []struct {
		name       string
		finalizers []string
		// expectDeleted is true when the owned resource should be fully deleted
		// (NotFound) after DeleteOwnedResources. When false, the resource still
		// exists because a remaining finalizer blocks deletion.
		expectDeleted bool
	}{
		{
			name:          "config map without finalizers",
			expectDeleted: true,
		},
		{
			name:          "config map with owned finalizer only",
			finalizers:    []string{OwnedFinalizerName},
			expectDeleted: true,
		},
		{
			name:          "config map with both finalizers strips only owned",
			finalizers:    []string{CleanupFinalizerName, OwnedFinalizerName},
			expectDeleted: false,
		},
	}

Fix: Add a test case:

{
    name:          "config map with cleanup finalizer only (legacy)",
    finalizers:    []string{CleanupFinalizerName},
    expectDeleted: false,
},

After implementing the migration fix, add a corresponding integration test that creates a legacy-finalizered ConfigMap and verifies LimaVM deletion cleans it up.

3. EnsureOwnedFinalizer and RemoveOwnedFinalizer are unused Gemini

The LimaVM webhook applies OwnedFinalizerName directly in the struct literal (controller.go:211), and DeleteOwnedResources removes it via controllerutil.RemoveFinalizer with a direct patch (finalizer.go:162). Neither caller uses the new helpers.

// EnsureOwnedFinalizer adds the owned finalizer to a child object if not already present.
// Returns true if the finalizer was added and the object has been updated.
func EnsureOwnedFinalizer(ctx context.Context, c client.Client, obj client.Object) (bool, error) {
	if !controllerutil.AddFinalizer(obj, OwnedFinalizerName) {
		return false, nil
	}
	if err := c.Update(ctx, obj); err != nil {
		return false, fmt.Errorf("failed to add owned finalizer: %w", err)
	}
	return true, nil
}

// RemoveOwnedFinalizer removes the owned finalizer from the object and updates it.
func RemoveOwnedFinalizer(ctx context.Context, c client.Client, obj client.Object) error {
	if controllerutil.RemoveFinalizer(obj, OwnedFinalizerName) {
		if err := c.Update(ctx, obj); err != nil {
			return fmt.Errorf("failed to remove owned finalizer: %w", err)
		}
	}
	return nil
}

Fix: Either remove these functions or refactor existing code to use them. They provide a consistent API surface for future controllers, so keeping them is reasonable — but document that intent.


Design Observations

1. DeleteOwnedResources is now schema-version-sensitive (in-scope) Codex

The helper assumes every child that should be unblocked already uses /owned. Encoding a short compatibility window (stripping the legacy finalizer from owned resources) in the helper would eliminate the entire class of upgrade bugs across all controllers, rather than requiring each controller to handle its own migration. The tradeoff: this would contradict the design principle that DeleteOwnedResources does not strip /cleanup. A targeted migration in the owning controller is safer and keeps the helper's contract clean.

2. The /owned finalizer doubles as a user-facing deletion guard (in-scope) Gemini

Child resources like template ConfigMaps do not need /owned for the garbage collection loop to work — DeleteOwnedResources can delete resources without a finalizer. The /owned finalizer's primary value is blocking accidental kubectl delete attempts via the ValidateDelete webhook. This layered approach (webhook + finalizer) is sound but worth documenting explicitly so future contributors understand the finalizer's dual purpose.


Testing Assessment

The unit tests cover the three key scenarios for the new design: no finalizer, owned-only (deleted), and both finalizers (owned stripped, cleanup remains). The BATS integration test verifies that LimaVM gets /cleanup and the template ConfigMap gets /owned.

Untested scenarios ranked by risk:

  1. Upgrade path: Child resource with only the legacy /cleanup finalizer processed by DeleteOwnedResources. Highest risk — this is the scenario from Important Issue #1.
  2. Webhook validation during cascade deletion: Verifying that ValidateDelete allows deletion of a template ConfigMap immediately after DeleteOwnedResources strips /owned.
  3. Concurrent owner deletion: Two owners deleting simultaneously, both calling DeleteOwnedResources on a shared child. The patch/delete is naturally idempotent (NotFound handled at lines 166-168), so this is likely safe but untested.

Documentation Assessment

The new docs/design/controllers.md documents the finalizer design clearly, including the "Why Two Finalizers?" rationale. Cross-links from README.md and discovery.md are appropriate.

Gap: The design doc should mention the upgrade path for existing resources, or state explicitly that migration from the single-finalizer scheme is required.


Agent Performance Retro

Claude Opus 4.6

Codex GPT 5.4

Gemini 3.1 Pro

Summary

MetricClaude Opus 4.6Codex GPT 5.4Gemini 3.1 Pro
Duration4:163:204:26
Critical001 (overcalibrated)
Important111
Suggestion211
Design observations211
False positives000
Unique insights102

Claude provided the most thorough analysis with the best severity calibration. Gemini contributed the most unique findings but overcalibrated severity. Codex was accurate and concise but offered no unique insights beyond the shared finding.


Skill Improvement Recommendations