Deep Review: 20260412-031335-pr-295

Date	2026-04-12 03:13
Repo	rancher-sandbox/rancher-desktop-daemon
Round	2 (of PR #295)
Author	@jandubois
PR	#295 — Container Engine watcher
Commits	`7c8f356` ci: always capture support bundle and log hostagent pgid `9669b4b` bats: close fds 3 and 4 when invoking rdd `50bdc15` ci: capture support bundle when BATS targets time out `762c1e8` Make rdd set wait for the desired state by default `65943da` Add BATS integration tests for the engine controller `ba6caf7` Document engine mirroring and container state transitions `4db3f20` Allow container spec.state transitions via webhook `93359f7` Add engine controller that mirrors Docker state into K8s `87f1df1` Add DockerSocket() helper and Docker client dependency `32a0109` Work around distro proxy bug; update docker socket path `536ccb2` Rosetta is now a vz driver option, not a global option
Reviewers	Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
Verdict	Merge with fixes — untag handling is broken, container pause/unpause is asymmetric, and several watcher lifecycle races need addressing before merge
Wall-clock time	`29 min 41 s`

Executive Summary ¶

PR #295 introduces the engine controller — a new EngineReconciler plus long-lived dockerWatcher goroutine that mirrors Docker containers, images, and volumes into Kubernetes resources, forwards K8s-initiated deletions and spec.state transitions back to Docker, relaxes the container webhook to permit state changes, and adds rdd set --timeout wait semantics. It also bundles CI hang-diagnostics (bats-with-timeout.sh), Lima template updates (.rd2 socket path, vz rosetta nesting), a distro proxy workaround, and BATS integration coverage.

The overall architecture is sound — a singleton reconciler owning a watcher goroutine, finalizer-gated reverse deletes, and a Get-then-Apply create pattern to avoid fighting user intent on spec.state. But the watcher's Docker↔K8s event handling has several correctness gaps: ActionUnTag events never remove stale mirrors (the event payload does not carry the tag name, and the call site passes the image ID into a tag-based lookup), paused containers cannot be unpaused via spec.state=running (asymmetric with the tested pause→stop path), and a transient Docker event-stream blip wipes every mirror resource. There is also a bootstrap race where the initial fullSync finishes before the event stream subscribes, so mutations in that window are invisible, and a NotFound during Inspect during fullSync aborts the whole sync.

No CRITICAL findings: 11 IMPORTANT issues (1 unique to each of the three reviewers plus a several-agent cluster around untag and error handling) and 10 SUGGESTIONS. Structurally, two commits (4db3f20 and 762c1e8) bundle engine-controller behavior changes into commits advertised as webhook and CLI features — rework the commit history before merge. Test coverage is strong on the happy paths (K8s-initiated deletion, state transitions, moby⇄containerd backend switch) but misses paused→running, transient Docker disconnect, and Docker-initiated untag flows.

Critical Issues ¶

None.

Important Issues ¶

I1. Docker ActionUnTag events never remove the stale tag mirror Claude Codex Gemini¶

pkg/controllers/app/engine/controllers/docker_watcher.go

		events.ActionUnPause,
		events.ActionRestart:
		log.V(1).Info("Container event", "action", msg.Action, "id", msg.Actor.ID)
		return w.syncContainer(ctx, msg.Actor.ID)

	case events.ActionDestroy:
		log.V(1).Info("Container destroyed", "id", msg.Actor.ID)
		return w.removeMirrorResource(ctx, &containersv1alpha1.Container{}, msg.Actor.ID)

	default:
		return nil
	}
}

All three reviewers caught this. When Docker emits an untag event, msg.Actor.ID is the image ID hash, not the tag. removeImageByTag searches for status.repoTag == <hash> at pkg/controllers/app/engine/controllers/sync_images.go:185, which never matches. The per-tag Image mirror survives indefinitely, until a watcher restart runs fullSync again.

pkg/controllers/app/engine/controllers/sync_images.go

// reconcileImageByID re-inspects an image and reconciles every K8s Image
// mirror whose status.id matches. Tags still present are re-applied; K8s
// resources for tags that are no longer present are deleted. If the
// image has been fully removed, all mirrors with that status.id are
// deleted instead.
//
// This is the path for events that carry an image ID but not the tag
// name — notably Docker's untag events, where the event payload only
// contains the image ID (see handleImageEvent).
func (w *dockerWatcher) reconcileImageByID(ctx context.Context, id string) error {
	result, err := w.cli.ImageInspect(ctx, id)

I verified the upstream Moby contract: daemon/images/image_delete.go:134,191,272 all call LogImageEvent(ctx, imgID.String(), imgID.String(), events.ActionUnTag), so both Actor.ID and Attributes["name"] are the image ID hash — the tag that was removed is not propagated through the event payload at all. Gemini proposed fixing this by reading msg.Actor.Attributes["name"], which is also wrong for this reason.

Fix (Codex's proposal): treat untag as a per-image reconciliation, not a direct tag delete. Re-inspect the image by ID, then delete any K8s Image objects whose status.id matches but whose status.repoTag is no longer in the fresh RepoTags. If the inspect returns NotFound, fall back to removeImagesByID.

 case events.ActionUnTag:
-    return w.removeImageByTag(ctx, msg.Actor.ID)
+    return w.syncImageAndPruneStaleTags(ctx, msg.Actor.ID)

I2. reconcileContainerState does not unpause a paused container on spec.state=running Claude¶

pkg/controllers/app/engine/controllers/docker_watcher.go

// iteration starts from a fresh copy via DeepCopyObject and writes only
// that copy, so caller state is never observed.
func (w *dockerWatcher) removeMirrorResource(ctx context.Context, obj client.Object, name string) error {
	key := client.ObjectKey{Name: name, Namespace: apiNamespace}
	retryErr := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		latest := obj.DeepCopyObject().(client.Object)
		if err := w.k8s.Get(ctx, key, latest); err != nil {
			if apierrors.IsNotFound(err) {
				return nil
			}
			return err
		}
		if !removeFinalizer(latest, mirrorFinalizer) {
			return nil
		}
		return w.k8s.Update(ctx, latest)
	})
	if retryErr != nil {
		return fmt.Errorf("failed to remove finalizer from %s: %w", name, retryErr)
	}

The comment at docker_watcher.go:268-271 carefully explains that ContainerStop handles paused/restarting containers natively, and the BATS test patching spec.state=created stops a paused container at bats/tests/32-app-controllers/engine.bats:200-220 covers that direction. The symmetric case (desired=running, actual=paused) falls into ContainerStart, which Docker rejects for paused containers with "container is paused, unpause before starting". The user's intent is silently dropped and there is no test for this direction.

pkg/controllers/app/engine/controllers/docker_watcher.go

//
// The obj parameter is used as an empty object template: callers may
// pass either a zero-valued struct (e.g. &containersv1alpha1.Volume{})
// or a slice element from a prior List; in both cases each retry
// iteration starts from a fresh copy via DeepCopyObject and writes only
// that copy, so caller state is never observed.
func (w *dockerWatcher) removeMirrorResource(ctx context.Context, obj client.Object, name string) error {
	key := client.ObjectKey{Name: name, Namespace: apiNamespace}
	retryErr := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		latest := obj.DeepCopyObject().(client.Object)
		if err := w.k8s.Get(ctx, key, latest); err != nil {
			if apierrors.IsNotFound(err) {
				return nil
			}

bats/tests/32-app-controllers/engine.bats

    run -0 docker inspect test-state --format '{{.Id}}'
    cid=${output}

    rdd ctl wait --for=jsonpath='{.status.status}'=running \
        --namespace="${NAMESPACE}" container/"${cid}" --timeout=30s

    rdd ctl patch container "${cid}" --namespace="${NAMESPACE}" \
        --type=merge -p '{"spec":{"state":"created"}}'

    rdd ctl wait --for=jsonpath='{.status.status}'=exited \
        --namespace="${NAMESPACE}" container/"${cid}" --timeout=30s

    run -0 docker inspect test-state --format '{{.State.Status}}'
    assert_output "exited"
}

@test "patching spec.state=running restarts Docker container" {
    run -0 docker inspect test-state --format '{{.Id}}'
    cid=${output}

    rdd ctl patch container "${cid}" --namespace="${NAMESPACE}" \
        --type=merge -p '{"spec":{"state":"running"}}'

    rdd ctl wait --for=jsonpath='{.status.status}'=running \
        --namespace="${NAMESPACE}" container/"${cid}" --timeout=30s

    run -0 docker inspect test-state --format '{{.State.Status}}'
    assert_output "running"
}

@test "patching spec.state=created stops a paused container" {

Fix: dispatch ContainerUnpause when actual == paused and desired == running, and add a BATS test mirroring the existing stop-paused test.

 switch desired {
 case containersv1alpha1.ContainerStatusRunning:
+    if actual == containersv1alpha1.ContainerStatusPaused {
+        log.Info("Unpausing container", "id", c.Name)
+        _, err := w.cli.ContainerUnpause(ctx, c.Name, dockerclient.ContainerUnpauseOptions{})
+        return err
+    }
     log.Info("Starting container", "id", c.Name)
     _, err := w.cli.ContainerStart(ctx, c.Name, dockerclient.ContainerStartOptions{})
     return err

I3. Watcher bootstrap race — mutations during fullSync are invisible Codex¶

pkg/controllers/app/engine/controllers/docker_watcher.go

		k8s:           k8s,
		cancel:        watchCancel,
		done:          make(chan struct{}),
		reconcileChan: reconcileChan,
	}

	// Capture a Docker-relative timestamp before fullSync begins. The
	// run() goroutine passes this to the event stream as the Since
	// filter, so any mutation that happens while fullSync is snapshotting
	// state is replayed once the stream opens. Without this, the window
	// between the List-based snapshot and the event subscription is a
	// blind spot during VM startup.
	since := strconv.FormatInt(time.Now().Unix(), 10)

	if err := w.fullSync(watchCtx); err != nil {
		watchCancel()
		cli.Close()
		return nil, fmt.Errorf("failed to perform initial sync: %w", err)

newDockerWatcher() runs fullSync before run() subscribes to the Docker event stream at docker_watcher.go:108-110. Any container/image/volume mutation in Docker during that window is invisible — it happened after the sync snapshotted state, but before the event stream was open. This is especially plausible during VM startup where dockerd is actively accepting work as the App's Running condition flips.

pkg/controllers/app/engine/controllers/docker_watcher.go

	}
}

// run is the main watcher goroutine that processes Docker events. The
// since parameter is the "Since" filter passed to the Docker events
// endpoint, captured before the initial fullSync so events that
// happened during the sync window are replayed once the stream opens.
func (w *dockerWatcher) run(ctx context.Context, since string) {
	log := logf.FromContext(ctx).WithName("docker-watcher")
	defer close(w.done)
	// Recover from panics in event handling so an unexpected Docker
	// event shape cannot crash the whole app-controller process. The
	// engine reconciler notices the dead watcher via alive() on the

After startup, Reconcile() marks the engine connected and returns ctrl.Result{}, nil at engine_reconciler.go:143-146, so there is no immediate second sync to close the gap.

pkg/controllers/app/engine/controllers/engine_reconciler.go

		return ctrl.Result{}, r.setEngineCondition(ctx, &app, terminalStatus, terminalReason, terminalMessage)
	}

	if !watcherRunning {
		log.Info("App is running, starting Docker watcher")
		if err := r.startWatcher(ctx); err != nil {
			log.Error(err, "Failed to start Docker watcher")
			_ = r.setEngineCondition(ctx, &app, metav1.ConditionFalse, "ConnectFailed", err.Error())
			return ctrl.Result{}, err
		}
		if err := r.setEngineCondition(ctx, &app, metav1.ConditionTrue, "Connected", "Container engine synced"); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{}, nil

Fix: open the event stream before or in parallel with fullSync, buffer events until the sync finishes, then replay. Alternatively, record a Since timestamp before fullSync starts and subscribe to the event stream with Since set to that timestamp so post-snapshot mutations are replayed after the sync.

I4. Transient Docker disconnect wipes every mirror resource Claude Gemini¶

pkg/controllers/app/engine/controllers/engine_reconciler.go

		r.watcher = nil
		watcherRunning = false
	}
	r.watcherMu.Unlock()

	// A watcher that dies while the App is still running is treated as a
	// transient disconnect: log it, clear the watcher pointer, and fall
	// through to the normal reconcile flow. If wantWatcher is still true
	// below, the reconciler starts a fresh watcher whose fullSync
	// reconciles any drift in place — downstream clients see no churn
	// because the existing mirror resources keep their identity. An
	// actual stop or backend change is handled by the !wantWatcher
	// branch below, which does sweep the mirrors.
	if watcherDied {
		log.Info("Docker watcher died, will attempt to reconnect")
	}

	// The watcher should only run when the App is Running with the moby
	// backend. In every other state (stopped, containerd, both) we stop
	// the watcher and sweep mirror resources. The sweep is gated on the
	// current ContainerEngineReady reason: once the condition reflects
	// the terminal state, cleanup would be a no-op against an empty
	// namespace, so we short-circuit to avoid four empty List calls per

Any transient error on the Docker event stream (docker_watcher.go:117-123 — e.g. user ran systemctl restart docker inside the guest) causes run() to return, flips the watcher to !alive on the next reconcile, and then cleanupMirrorResources deletes every Container/Image/Volume/ContainerNamespace in rancher-desktop. The subsequent reconcile restarts the watcher and fullSync recreates them, but every downstream client (front end, rdd ctl get, anything watching) sees the whole universe blink DELETED → ADDED with fresh UIDs. A single Docker blip is as disruptive as an intentional VM shutdown.

pkg/controllers/app/engine/controllers/docker_watcher.go

	defer close(w.done)
	// Recover from panics in event handling so an unexpected Docker
	// event shape cannot crash the whole app-controller process. The
	// engine reconciler notices the dead watcher via alive() on the
	// next reconcile and starts a fresh one.
	defer func() {
		if r := recover(); r != nil {
			log.Error(nil, "panic in Docker watcher goroutine", "recovered", r)
		}
	}()

	eventFilter := dockerclient.Filters{}.
		Add("type", string(events.ContainerEventType)).
		Add("type", string(events.ImageEventType)).
		Add("type", string(events.VolumeEventType))

	result := w.cli.Events(ctx, dockerclient.EventsListOptions{

Fix: split "VM stopped / engine backend changed" (sweep everything — current behavior) from "transient watcher failure" (stop the watcher, leave mirror resources in place, attempt a bounded retry; on success fullSync reconciles drift in place). Only sweep if reconnect fails after a timeout. This preserves resource identity across transient outages and avoids flooding downstream watchers.

I5. IsNotFound during fullSync Inspect aborts the whole sync Gemini¶

pkg/controllers/app/engine/controllers/sync_containers.go

	}

	return errors.Join(errs...)
}

// syncContainer inspects a single container and creates/updates the K8s resource.
// NotFound is treated as success: the container raced a concurrent delete
// between List and Inspect, and the stale K8s mirror will be pruned by
// syncAllContainers' remove-stale step later in the same sync.
func (w *dockerWatcher) syncContainer(ctx context.Context, id string) error {
	result, err := w.cli.ContainerInspect(ctx, id, dockerclient.ContainerInspectOptions{})
	if cerrdefs.IsNotFound(err) {
		return nil
	}
	if err != nil {
		return fmt.Errorf("failed to inspect container %s: %w", id, err)
	}

During fullSync, the controller lists all containers/images/volumes and then Inspects each by ID. If a resource is deleted in Docker between the List and Inspect calls (a routine TOCTOU), Inspect returns a NotFound error. That error propagates through syncAllContainers → fullSync → newDockerWatcher, which then tears down the watcher (watchCancel(), cli.Close()) and returns ConnectFailed. The engine reconciler then flips the App condition to ConnectFailed and waits for the next retry — over nothing more than a single concurrent docker rm.

Fix: swallow NotFound as success in syncContainer, syncImageFromSummary, and syncVolume — the stale K8s mirror will be swept in the "remove stale" step later in the sync.

 result, err := w.cli.ContainerInspect(ctx, id, dockerclient.ContainerInspectOptions{})
 if err != nil {
+    if cerrdefs.IsNotFound(err) {
+        return nil
+    }
     return fmt.Errorf("failed to inspect container %s: %w", id, err)
 }

I6. ActionTag on a dangling image leaks the dangling mirror Gemini¶

pkg/controllers/app/engine/controllers/sync_images.go

					WithRepoTag(tag).
					WithNamespace(containerNamespace),
			); err != nil {
				errs = append(errs, err)
			}
		}
	} else {
		// Dangling image (no tags).
		name := sanitizeKubernetesObjectName(inspect.ID)
		names = append(names, name)
		if err := w.applyImage(ctx,
			containersv1alpha1apply.Image(name, apiNamespace).
				WithFinalizers(mirrorFinalizer),
			imageStatusFromInspect(inspect),
		); err != nil {
			errs = append(errs, err)
		}
	}

	return names, errors.Join(errs...)
}

// imageStatusFromInspect builds a fresh ImageStatus apply config from a
// Docker inspect response. Returning a new value per call avoids the
// aliasing trap of a shallow struct copy — slice/map WithX calls on
// one "copy" would otherwise mutate the backing memory shared with
// other callers.
func imageStatusFromInspect(inspect mobyimage.InspectResponse) *containersv1alpha1apply.ImageStatusApplyConfiguration {
	statusApply := containersv1alpha1apply.ImageStatus().
		WithID(inspect.ID).
		WithRepoDigests(inspect.RepoDigests...).
		WithArchitecture(inspect.Architecture).
		WithOS(inspect.Os).
		WithSize(inspect.Size).
		WithLabels(inspect.Config.Labels)

	if t, err := time.Parse(time.RFC3339Nano, inspect.Created); err == nil {
		statusApply.WithCreatedAt(metav1.NewTime(t))
	}
	return statusApply
}

When an image that was dangling (K8s mirror name <id>) is subsequently tagged, applyImageFromInspect creates the new tagged mirror <id>-<taghash> but never deletes the pre-existing dangling mirror. Both survive until the next fullSync at watcher restart.

This is the mirror image of I1: tags and the dangling form should transition cleanly in both directions, but neither direction is currently handled incrementally.

Fix: inside the len(inspect.RepoTags) > 0 branch, proactively remove the dangling resource:

w.removeMirrorResource(ctx, &containersv1alpha1.Image{}, sanitizeKubernetesObjectName(inspect.ID))

I7. Failed spec.state transitions stall after one attempt Codex Claude Gemini¶

pkg/controllers/app/engine/controllers/engine_reconciler.go

// the original spec.state patch) and then sit forever.
func (r *EngineReconciler) reconcileContainerSpecs(ctx context.Context) error {
	r.watcherMu.Lock()
	w := r.watcher
	r.watcherMu.Unlock()
	if w == nil {
		return nil
	}

	var containers containersv1alpha1.ContainerList
	if err := r.List(ctx, &containers, client.InNamespace(apiNamespace)); err != nil {
		return err
	}

	var errs []error
	for i := range containers.Items {
		c := &containers.Items[i]
		if c.DeletionTimestamp != nil {
			continue
		}
		if err := w.reconcileContainerState(ctx, c); err != nil {

reconcileContainerState returns raw Docker API errors, but reconcileContainerSpecs only logs them and returns success. The reconcile then falls through to return ctrl.Result{}, nil, so a failed start/stop gets exactly one attempt: the user patch that changed spec.state produced the only watch event, and no requeue is requested when Docker rejects the operation. The mismatch sits forever until some unrelated event triggers a reconcile.

Gemini and Codex both flagged this as a correctness issue (Gemini at Critical, Codex at Important); Claude flagged it as Suggestion S5 pointing out the asymmetry with the per-item errors.Join pattern used in processContainerFinalizers (engine_reconciler.go:334-356). I settled on Important — the retry never happens unless another reconcile fires, but the reconcile loop's overall safety net (watch events from later Docker activity) provides an eventual fallback.

pkg/controllers/app/engine/controllers/engine_reconciler.go

	if err := r.processContainerFinalizers(ctx, w); err != nil {
		return err
	}
	if err := r.processImageFinalizers(ctx, w); err != nil {
		return err
	}
	return r.processVolumeFinalizers(ctx, w)
}

// processContainerFinalizers deletes the Docker-side container for every
// K8s Container pending deletion and only strips the mirror finalizer
// when the Docker delete succeeds. Per-item errors are collected so one
// stuck container does not block the rest; the reconciler retries the
// remaining stuck items on the next reconcile.
func (r *EngineReconciler) processContainerFinalizers(ctx context.Context, w *dockerWatcher) error {
	var containers containersv1alpha1.ContainerList
	if err := r.List(ctx, &containers, client.InNamespace(apiNamespace)); err != nil {
		return err
	}
	var errs []error
	for i := range containers.Items {
		c := &containers.Items[i]
		if c.DeletionTimestamp == nil || !hasFinalizer(c, mirrorFinalizer) {
			continue
		}
		if err := w.deleteContainer(ctx, c.Name); err != nil {
			errs = append(errs, fmt.Errorf("failed to delete container %s from Docker: %w", c.Name, err))
			continue
		}
		if removeFinalizer(c, mirrorFinalizer) {
			// NotFound is benign: a concurrent Docker destroy event may
			// already have stripped the finalizer and deleted the
			// mirror between our List and Update.

Fix: collect per-container errors with errors.Join and return them so controller-runtime requeues with backoff, matching the existing finalizer pattern.

+    var errs []error
     for i := range containers.Items {
         ...
         if err := w.reconcileContainerState(ctx, c); err != nil {
-            logf.FromContext(ctx).Error(err, "Failed to reconcile container state", "container", c.Name)
+            errs = append(errs, fmt.Errorf("container %s: %w", c.Name, err))
         }
     }
-    return nil
+    return errors.Join(errs...)

I8. cleanupMirrorResources runs on every idle reconcile Claude¶

pkg/controllers/app/engine/controllers/engine_reconciler.go

	// current ContainerEngineReady reason: once the condition reflects
	// the terminal state, cleanup would be a no-op against an empty
	// namespace, so we short-circuit to avoid four empty List calls per
	// unrelated reconcile. On failure, the condition stays pending and
	// the next requeue re-tries the sweep.
	wantWatcher := running && engineIsDocker
	if !wantWatcher {
		if watcherRunning {
			log.Info("Stopping Docker watcher",
				"running", running, "engine", app.Spec.ContainerEngine.Name)
			r.stopWatcher()
		}
		terminalReason := "Stopped"
		terminalStatus := metav1.ConditionFalse
		terminalMessage := "Container engine stopped"
		if running && !engineIsDocker {
			terminalReason = "NotApplicable"
			terminalStatus = metav1.ConditionTrue
			terminalMessage = "Engine mirroring is only supported with the moby backend"
		}
		current := meta.FindStatusCondition(app.Status.Conditions, conditionContainerEngineReady)
		alreadyClean := !watcherDied && current != nil && current.Reason == terminalReason && current.Status == terminalStatus
		if !alreadyClean {
			if err := r.cleanupMirrorResources(ctx); err != nil {
				log.Error(err, "Failed to clean up mirror resources")
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, r.setEngineCondition(ctx, &app, terminalStatus, terminalReason, terminalMessage)
	}

The comment defends unconditional cleanup ("so transient errors are retried on the next requeue"), but in the common "VM stopped" and "containerd backend" steady states every incoming App event triggers four List calls (Container/Image/Volume/ContainerNamespace), each returning empty. Since the reconciler also Watches Container/Image/Volume as trigger sources (engine_reconciler.go:421-423), any update on any of them in rancher-desktop cascades into four list calls per event.

pkg/controllers/app/engine/controllers/engine_reconciler.go

}

// SetupWithManager sets up the controller with the Manager.
func (r *EngineReconciler) SetupWithManager(mgr ctrl.Manager) error {
	r.reconcileChan = make(chan event.GenericEvent, 1)

	// Map any container/image/volume event to a reconcile of the App singleton.
	enqueueApp := handler.EnqueueRequestsFromMapFunc(
		func(_ context.Context, _ client.Object) []reconcile.Request {
			return []reconcile.Request{{NamespacedName: types.NamespacedName{Name: appName}}}
		},
	)

Fix: gate cleanup on a state change — only sweep on transition into the not-wanted state (watcher was running, or a dedicated "CleanupPending" bit is set). If ContainerEngineReady already reflects the terminal state and observedGeneration == app.Generation, short-circuit with no List calls.

I9. ForceOwnership on initial create races with concurrent user patch Claude¶

pkg/controllers/app/engine/controllers/sync_containers.go

// WithSpec(unknown), the engine-controller field owner would keep
// reasserting spec.state=unknown, clobbering any user patch to
// running/created. Gating the WithSpec call behind a Get — so spec is
// written exactly once, on creation — keeps the invariant "subsequent
// syncs never touch spec" without needing a second field owner.
func (w *dockerWatcher) applyContainer(ctx context.Context, inspect mobycontainer.InspectResponse) error {
	namespace, name := parseContainerName(inspect.Name)

	// Create the resource if it doesn't exist. spec.state is always set to
	// "unknown" on creation — meaning the engine mirrors Docker state without
	// expressing intent. The user can later set it to "running" or "created"
	// to control the container; the reconciler ignores "unknown".
	//
	// Deliberately omit ForceOwnership on the create apply: if a user
	// patch to spec.state landed in the window between the Get above
	// and this Apply, ForceOwnership would silently clobber it. Without
	// the flag, the Apply succeeds on a fresh resource (no other owner
	// for spec.state yet) and fails loudly on a concurrent conflict
	// (the Get-NotFound observation was racy).
	var existing containersv1alpha1.Container
	err := w.k8s.Get(ctx, client.ObjectKey{Name: inspect.ID, Namespace: apiNamespace}, &existing)
	if apierrors.IsNotFound(err) {

The Get-then-Apply pattern (explained by the comment) is the right idea — spec must be written exactly once to avoid fighting user intent. But the Get is racy: a concurrent user patch spec.state=running that lands between the Get (returning NotFound) and the Apply is clobbered by the Apply, because ForceOwnership explicitly steals spec.state from whoever wrote it last. The window is small but real: in practice the first Docker create event fires very close to when rdd-ctl-driven user patches might arrive for pre-existing resources after a crash recovery.

Fix: drop ForceOwnership on the initial create apply, or use a typed Create call with explicit metav1.CreateOptions instead of SSA. The narrower fix is to remove client.ForceOwnership from the create path and leave it on the status-only subsequent apply.

I10. watchCondition conflates user cancellation and timeout Claude¶

cmd/rdd/set.go

				return true, nil
			}
		}
		return false, nil
	}

	condition := func(event watch.Event) (bool, error) {
		obj, ok := event.Object.(*unstructured.Unstructured)
		if !ok {
			return false, nil
		}
		return satisfied(obj), nil
	}

	if _, err := watchtools.UntilWithSync(ctx, lw, &unstructured.Unstructured{}, precondition, condition); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			return errors.New("timed out waiting for App state")

When the user interrupts rdd set mid-wait with Ctrl+C, ctx is cancelled and watchtools returns context.Canceled. The CLI reports "timed out waiting for App state", which is false — the user cancelled. A user debugging why rdd set "timed out" after 2 seconds will go hunting in VM logs for the symptom instead of realizing they pressed Ctrl+C.

Fix: distinguish the two cases.

-    if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) {
+    if errors.Is(err, context.DeadlineExceeded) {
         return errors.New("timed out waiting for App state")
     }
+    if errors.Is(err, context.Canceled) {
+        return errors.New("wait cancelled")
+    }
     return fmt.Errorf("failed to watch App: %w", err)

I11. rdd set with multi-property changes returns on stale ContainerEngineReady Claude¶

cmd/rdd/set.go

//
// TODO: changes that trigger a VM restart without changing "running" (e.g.
// containerEngine.name) are not waited on. This requires a "Reconciled"
// condition with observedGeneration on the App resource so the CLI can
// detect when the full reconcile chain has settled.
//
// Known hazard until the "Reconciled" condition lands: `rdd set
// containerEngine.name=<other> running=true` on an already-running VM
// returns immediately because the stale ContainerEngineReady=True from
// the previous backend satisfies the precondition check before the
// reconciler has observed the new spec. Users who switch backends
// on a running VM should either stop first (`rdd set running=false`)
// or re-query ContainerEngineReady after the set completes to confirm
// the backend has finished transitioning.
func waitForDesiredState(ctx context.Context, config *rest.Config, properties map[string]string, timeout time.Duration) error {
	runningVal, ok := properties["running"]
	if !ok {
		return nil
	}

	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	if runningVal == "true" {

rdd set containerEngine.name=containerd running=true on an already-running VM races with the stale cached ContainerEngineReady=True/Connected from the moby backend. The precondition check at set.go:489-495 fires on the stale state and returns immediately, leaving the user with a half-restarted VM even though rdd set succeeded.

cmd/rdd/set.go

		return fmt.Errorf("failed to create dynamic client: %w", err)
	}

	lw := &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			options.FieldSelector = "metadata.name=app"
			return dynClient.Resource(appGVR).List(ctx, options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.FieldSelector = "metadata.name=app"
			return dynClient.Resource(appGVR).Watch(ctx, options)
		},
	}

	precondition := func(store cache.Store) (bool, error) {
		for _, item := range store.List() {
			if obj, ok := item.(*unstructured.Unstructured); ok && satisfied(obj) {

The test at bats/tests/32-app-controllers/engine.bats:266-274 explicitly documents this limitation: "Stop first so there is no stale True/Connected from moby to satisfy the CLI wait below before the engine reconciler has run." That comment is the confession — the code works around a race the test harness itself can't avoid without stopping first.

bats/tests/32-app-controllers/engine.bats

    rdd ctl wait --for=jsonpath='{.status.status}'=paused \
        --namespace="${NAMESPACE}" container/"${cid}" --timeout=30s

    rdd ctl patch container "${cid}" --namespace="${NAMESPACE}" \
        --type=merge -p '{"spec":{"state":"running"}}'

    rdd ctl wait --for=jsonpath='{.status.status}'=running \
        --namespace="${NAMESPACE}" container/"${cid}" --timeout=30s

    run -0 docker inspect test-unpause --format '{{.State.Status}}'
    assert_output "running"

    docker rm --force test-unpause
}

# --- Cleanup on shutdown ---

@test "stopping VM removes all mirror resources" {
    # Make sure we have at least one resource to verify cleanup.

The author's TODO at set.go:440-443 acknowledges the root cause and proposes a future Reconciled condition with observedGeneration. Until that lands: either narrow the documented behavior ("waits for engine readiness only when running is the only property"), or skip the precondition shortcut when properties include anything besides running. At minimum, document the hazard in the TODO itself so future maintainers know it's a real user-visible issue, not just a polish item.

cmd/rdd/set.go

// container engine to be ready. If running=false was set, it waits for the
// App's Running condition to go to False — i.e. the VM has actually
// stopped, which is stricter than "container engine disconnected".
// Other property changes do not wait.
//
// TODO: changes that trigger a VM restart without changing "running" (e.g.
// containerEngine.name) are not waited on. This requires a "Reconciled"
// condition with observedGeneration on the App resource so the CLI can
// detect when the full reconcile chain has settled.
//
// Known hazard until the "Reconciled" condition lands: `rdd set
// containerEngine.name=<other> running=true` on an already-running VM
// returns immediately because the stale ContainerEngineReady=True from
// the previous backend satisfies the precondition check before the

Suggestions ¶

S1. removeMirrorResource mutates caller-provided obj across retries Claude¶

pkg/controllers/app/engine/controllers/docker_watcher.go

		log.V(1).Info("Image deleted", "id", msg.Actor.ID)
		return w.removeImagesByID(ctx, msg.Actor.ID)

	default:
		return nil
	}
}

// handleVolumeEvent processes volume events.
func (w *dockerWatcher) handleVolumeEvent(ctx context.Context, msg events.Message) error {
	log := logf.FromContext(ctx).WithName("docker-watcher")

	switch msg.Action {
	case events.ActionCreate:
		log.V(1).Info("Volume created", "name", msg.Actor.ID)
		return w.syncVolume(ctx, msg.Actor.ID)

	case events.ActionDestroy:
		log.V(1).Info("Volume destroyed", "name", msg.Actor.ID)
		return w.removeMirrorResource(ctx, &containersv1alpha1.Volume{},
			volumeK8sName(msg.Actor.ID))

	default:
		return nil
	}
}

// removeMirrorResource removes the finalizer from a mirror resource and deletes
// it. This is used when Docker has already deleted the object.

obj is caller-provided and retry iterations reuse the same pointer. That works in practice because controller-runtime's Get resets-and-populates, but it depends on an implementation detail. Callers like syncAllContainers at sync_containers.go:60-66 pass &k8sContainers.Items[i], a slice element with prior list state pre-populated. deleteAllOfType at engine_reconciler.go:257 already uses the fresher DeepCopyObject() inside the retry closure pattern — apply the same here.

pkg/controllers/app/engine/controllers/sync_containers.go


	// Remove stale K8s containers.
	var k8sContainers containersv1alpha1.ContainerList
	if err := w.k8s.List(ctx, &k8sContainers, client.InNamespace(apiNamespace)); err != nil {
		return fmt.Errorf("failed to list K8s containers: %w", err)
	}
	for i := range k8sContainers.Items {
		c := &k8sContainers.Items[i]
		if !dockerIDs[c.Name] {
			log.V(1).Info("Removing stale container", "id", c.Name)
			if err := w.removeMirrorResource(ctx, c, c.Name); err != nil {
				errs = append(errs, err)
			}
		}
	}

	return errors.Join(errs...)

pkg/controllers/app/engine/controllers/engine_reconciler.go

		return err
	}

	var errs []error
	for _, item := range items {
		obj := item.(client.Object)
		key := client.ObjectKeyFromObject(obj)
		kind := obj.GetObjectKind().GroupVersionKind().Kind

		retryErr := retry.RetryOnConflict(retry.DefaultRetry, func() error {
			latest := obj.DeepCopyObject().(client.Object)

S2. DockerSocket() hardcode of .rd2 is a silent regression Claude¶

pkg/instance/instance.go

// (e.g., ~/.rd2/docker.sock). This is the host-side socket that Lima
// port-forwards from the guest's /var/run/docker.sock.
//
// TODO: use ShortDir() once the Lima template derives the socket path
// from the instance suffix instead of hardcoding ".rd2". The two
// hardcodes (here and in pkg/controllers/app/app/lima-template.yaml)
// must move together — either both parameterized on RDD_INSTANCE or
// neither — or the watcher will open a different socket from the one
// Lima is port-forwarding to, and mirroring will silently diverge per
// instance. Until then, multi-instance BATS runs cannot share the
// same host and the engine.bats test has a parallel TODO at line
// 12-14.
var DockerSocket = sync.OnceValue(func() string {
	home, err := os.UserHomeDir()
	if err != nil {
		panic(fmt.Errorf("could not get home directory: %w", err))
	}
	dir := filepath.Join(home, ".rd2")
	_ = os.MkdirAll(dir, 0o700)

Commit 87f1df1 originally used filepath.Join(ShortDir(), "docker.sock"), which is the correct per-instance path. Commit 762c1e8 regressed it to a hardcoded .rd2 to match the (also-hardcoded) Lima template path at pkg/controllers/app/app/lima-template.yaml:30. The TODO at instance.go:127-128 acknowledges this, but the two hardcodes need to be fixed together — either both parameterized on the instance suffix, or a pair of tests that fails the build if they ever diverge.

pkg/controllers/app/app/lima-template.yaml

  - 0
  guestSocket: /var/run/docker.sock
  hostPortRange:
  - 0
  - 0
  hostSocket: "{{.Home}}/.rd2/docker.sock"
provision:
- mode: system
  script: |
    #!/bin/bash

pkg/instance/instance.go


// DockerSocket returns the path to the Docker socket for this instance
// (e.g., ~/.rd2/docker.sock). This is the host-side socket that Lima
// port-forwards from the guest's /var/run/docker.sock.
//
// TODO: use ShortDir() once the Lima template derives the socket path
// from the instance suffix instead of hardcoding ".rd2". The two
// hardcodes (here and in pkg/controllers/app/app/lima-template.yaml)
// must move together — either both parameterized on RDD_INSTANCE or
// neither — or the watcher will open a different socket from the one
// Lima is port-forwarding to, and mirroring will silently diverge per
// instance. Until then, multi-instance BATS runs cannot share the

Also, os.MkdirAll silently swallows errors (_ =); while the directory is typically pre-created by rdd svc start, the silent failure is a trap. The BATS helper at bats/tests/32-app-controllers/engine.bats:12-14 carries a parallel TODO, confirming the author is aware.

bats/tests/32-app-controllers/engine.bats

# Engine controller tests — verify that the engine controller mirrors Docker
# containers, images, and volumes into K8s resources, and that K8s deletions
# and spec.state changes are forwarded to Docker.

NAMESPACE="rancher-desktop"
# TODO: use .rd${RDD_INSTANCE} once the Lima template derives the socket path
# from the instance suffix instead of hardcoding ".rd2".
DOCKER_HOST="unix://${HOME}/.rd2/docker.sock"
export DOCKER_HOST

local_setup_file() {
    # The Docker socket access pattern used by these tests is not yet wired
    # up for Windows/WSL2.

S3. docker_watcher.run() has no panic recovery Claude¶

Unlike controller-runtime reconcilers (wrapped in HandleCrash), this raw goroutine has no panic recovery. A nil-pointer in handleEvent — e.g. an unexpected Moby event shape — takes down the whole app-controller process. Add defer utilruntime.HandleCrash(...) or at minimum defer func() { if r := recover(); r != nil { log.Error(...) } }().

S4. Inconsistent Force flags across delete{Container,Image,Volume} Claude¶

Containers and volumes use Force: true, meaning rdd ctl delete container/foo force-stops a running container without giving processes a chance to exit gracefully. Images use no Force, so in-use images are correctly kept — but the inconsistency is surprising and undocumented. Either document the policy in a comment, or consider Force: false for containers (the user can patch spec.state=created first, then delete, which is the K8s-idiomatic pattern).

S5. statusCopy := *statusApply shallow copy shares slice/map state Claude¶

pkg/controllers/app/engine/controllers/sync_images.go

		name := sanitizeKubernetesObjectName(inspect.ID)
		names = append(names, name)
		if err := w.applyImage(ctx,
			containersv1alpha1apply.Image(name, apiNamespace).
				WithFinalizers(mirrorFinalizer),
			imageStatusFromInspect(inspect),
		); err != nil {
			errs = append(errs, err)
		}
	}

	return names, errors.Join(errs...)
}

// imageStatusFromInspect builds a fresh ImageStatus apply config from a
// Docker inspect response. Returning a new value per call avoids the
// aliasing trap of a shallow struct copy — slice/map WithX calls on
// one "copy" would otherwise mutate the backing memory shared with
// other callers.
func imageStatusFromInspect(inspect mobyimage.InspectResponse) *containersv1alpha1apply.ImageStatusApplyConfiguration {
	statusApply := containersv1alpha1apply.ImageStatus().

statusCopy := *statusApply is a shallow copy: RepoDigests (slice), Labels (map), and every *string/*int64 field share backing memory with the original across iterations. It is safe today because the loop only calls WithRepoTag and WithNamespace — both set-only scalar operations — but a future edit that adds WithLabels(extraLabels) or WithRepoDigests(extra) inside the loop would corrupt all prior iterations' payloads via in-place append/map mutation.

Fix: replace the shallow copy with a helper that returns a fresh ImageStatusApplyConfiguration each iteration, building from the common fields explicitly.

S6. pgid log field is misleading on Windows Claude¶

pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go


	// Start watcher goroutine to track hostagent lifecycle and reap the process.
	// The watcher enqueues reconciles as phase transitions occur.
	r.startWatcher(ctx, limaVM.Name, limaVM.Namespace, haCmd, inst.Dir, begin)

	// On Unix, SetGroup() sets Setpgid=true which makes the hostagent a
	// new process group leader, so pgid == pid. The "pgid" log field
	// reflects that Unix-only semantics and is consumed by
	// bats-with-timeout.sh for post-mortem leak detection (attributing
	// orphaned qemu grandchildren back to their hostagent ancestor).
	// On Windows, SetGroup() sets CREATE_NEW_PROCESS_GROUP instead —
	// a different console-control-event concept — and this log field
	// will need to be renamed or recalculated when Windows CI arrives.
	logger.Info("Hostagent started, watcher active",
		"instance", limaVM.Name,
		"pid", haCmd.Process.Pid,
		"pgid", haCmd.Process.Pid)

On Unix the claim holds (Setpgid=true → pgid=pid). On Windows there is no POSIX process group; SetGroup sets CREATE_NEW_PROCESS_GROUP, which is a different console-control-event concept — logging the value as "pgid" is a misnomer. Harmless today (Windows CI is not exercised), but will bite when it's added. Either read the real pgid via syscall.Getpgid(haCmd.Process.Pid) on Unix only, or label the field with a platform-neutral name.

S7. BATS engine tests use [ -n "${output}" ] instead of refute_output Claude¶

bats/tests/32-app-controllers/engine.bats


# --- K8s deletion removes Docker object ---

@test "deleting Container resource removes Docker container" {
    docker run -d --name test-delete busybox sleep 3600

    run -0 docker inspect test-delete --format '{{.Id}}'
    cid=${output}

    rdd ctl wait --for=create --namespace="${NAMESPACE}" \
        container/"${cid}" --timeout=30s

Per the repo convention, prefer refute_output "" over [ -n ... ], which fails silently on violation. Swap both sites.

 run -0 rdd ctl get image --namespace="${NAMESPACE}" \
     --field-selector "status.repoTag=busybox:latest" -o name
+refute_output ""
 image_ref=${output}
-[ -n "${image_ref}" ]

S8. Concurrent finalizer removal can log benign NotFound errors Gemini¶

pkg/controllers/app/engine/controllers/engine_reconciler.go

	var containers containersv1alpha1.ContainerList
	if err := r.List(ctx, &containers, client.InNamespace(apiNamespace)); err != nil {
		return err
	}
	var errs []error
	for i := range containers.Items {
		c := &containers.Items[i]
		if c.DeletionTimestamp == nil || !hasFinalizer(c, mirrorFinalizer) {
			continue
		}
		if err := w.deleteContainer(ctx, c.Name); err != nil {
			errs = append(errs, fmt.Errorf("failed to delete container %s from Docker: %w", c.Name, err))
			continue
		}
		if removeFinalizer(c, mirrorFinalizer) {

r.Update(ctx, c) can return NotFound if the Docker ActionDestroy event handler concurrently stripped the finalizer and deleted the mirror object between this reconcile's List and Update. It's a benign race but produces noisy error logs and triggers unnecessary reconciles.

Fix: client.IgnoreNotFound(r.Update(ctx, c)). The same pattern applies to processImageFinalizers and processVolumeFinalizers.

S9. Design doc overstates containerd mirroring support Codex¶

docs/design/api_containers.md

## Engine Mirroring

The engine controller (`pkg/controllers/app/engine/`) watches the `App` resource
for the `Running` condition.  When the VM is running with the `moby` backend,
the controller:

1. Connects to the Docker engine via the host socket.
2. Creates the `rancher-desktop` K8s namespace and the `moby`
   `ContainerNamespace` resource.
3. Lists all containers, images, and volumes and creates corresponding K8s
   resources.
4. Watches the Docker event stream for create, update, and delete events.

Containerd mirroring is not implemented yet; with `containerEngine.name=containerd`

The new doc text says the engine controller creates engine-specific namespaces for containerd, but the implementation does the opposite: engine_reconciler.go:115-133 shuts the watcher down and reports ContainerEngineReady=True, reason=NotApplicable whenever app.Spec.ContainerEngine.Name != "moby". That wording will mislead anyone trying to build against the API from the doc.

pkg/controllers/app/engine/controllers/engine_reconciler.go

	// current ContainerEngineReady reason: once the condition reflects
	// the terminal state, cleanup would be a no-op against an empty
	// namespace, so we short-circuit to avoid four empty List calls per
	// unrelated reconcile. On failure, the condition stays pending and
	// the next requeue re-tries the sweep.
	wantWatcher := running && engineIsDocker
	if !wantWatcher {
		if watcherRunning {
			log.Info("Stopping Docker watcher",
				"running", running, "engine", app.Spec.ContainerEngine.Name)
			r.stopWatcher()
		}
		terminalReason := "Stopped"
		terminalStatus := metav1.ConditionFalse
		terminalMessage := "Container engine stopped"
		if running && !engineIsDocker {
			terminalReason = "NotApplicable"
			terminalStatus = metav1.ConditionTrue
			terminalMessage = "Engine mirroring is only supported with the moby backend"
		}
		current := meta.FindStatusCondition(app.Status.Conditions, conditionContainerEngineReady)
		alreadyClean := !watcherDied && current != nil && current.Reason == terminalReason && current.Status == terminalStatus
		if !alreadyClean {
			if err := r.cleanupMirrorResources(ctx); err != nil {
				log.Error(err, "Failed to clean up mirror resources")
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, r.setEngineCondition(ctx, &app, terminalStatus, terminalReason, terminalMessage)

Fix:

-2. Creates the `rancher-desktop` K8s namespace and the appropriate
-   `ContainerNamespace` resource (`moby` for dockerd, engine-specific
-   namespaces for containerd).
+2. Creates the `rancher-desktop` K8s namespace and the `moby`
+   `ContainerNamespace` resource when the App uses the moby backend.
+   Containerd mirroring is not implemented yet; the controller reports
+   `ContainerEngineReady` with reason `NotApplicable` in that mode.

Separately, the Image example at docs/design/api_containers.md:218-219 still describes the old "random suffix" naming scheme even though the implementation now uses a deterministic hash — that text predates this PR but is stale.

S10. docs/design/api_containers.md:306-312 still describes the old finalizer-untag semantics Claude¶

Delete the Image object; the finalizer will untag the image. If all tags of an
image are removed, _and_ it is not in use by a container (running or not), then
the image itself is removed.

This contradicts the current code: on a Docker untag the controller deletes the K8s resource directly, without going through the finalizer path. The doc change in ba6caf7 left this stale. Reconcile the doc with the implementation — the current code better matches the intent, but the description on this line should match what the code does.

Design Observations ¶

Concerns ¶

Engine controller cleanup policy treats "VM stopped" and "watcher died" identically (in-scope) — A transient event-stream blip triggers the same aggressive sweep as a user-initiated shutdown. See I4 for the specific fix; the broader design question is whether the Reconcile state machine should distinguish "intentional stop" (from App condition) from "involuntary failure" (from watcher liveness). Claude Gemini
spec.state: unknown conflates "no user intent" with a real spec state (future) — A cleaner model would be a nullable spec.state (pointer) with nil = no opinion, or split spec.desired.state from status.managedBy. The current shape forces the webhook to enforce "stop trying to be unknown once a user expressed intent" — future contributors will trip over this. Not in-scope for this PR. Claude
cleanupMirrorResources assumes exclusive ownership of rancher-desktop (future) — deleteAllOfType iterates every resource in the namespace and strips the mirror finalizer. The mock controller at pkg/controllers/mock/mock_controller.go:27 uses the same apiNamespace = "rancher-desktop"; if the two ever coexist (e.g. mock + real engine in a dev instance), cleanup will wipe the mock data. Filter by a label selector (engine.rancherdesktop.io/mirror=true) or by field owner. Claude
Engine feature is a single singleton reconciler owning bootstrap sync, event streaming, desired-state convergence, and finalizer processing (future) — Splitting into per-resource reconcilers would remove the global O(N) sweeps, make retries object-scoped instead of best-effort, and eliminate the I3 bootstrap race by giving each object a durable reconcile path. The code already acknowledges this at engine_reconciler.go:149-155 with an in-place comment. Codex

Strengths ¶

The Get-then-Apply comment at sync_containers.go:87-93 explaining why spec is written only on create (to avoid SSA fights with user patches) is the kind of insight most code lacks. Claude
processContainerFinalizers, processImageFinalizers, and processVolumeFinalizers collect per-item errors with errors.Join — one stuck object does not block the rest. This is the correct pattern for sweep loops. (The asymmetry with reconcileContainerSpecs, which does not do this, is I7.) Claude
The docker_watcher.alive() check paired with the explicit stop() → <-w.done handshake avoids the common goroutine-ownership ambiguity. Claude
The mirror reverse-delete path is designed around the repo's "no K8s GC" constraint: mirror resources carry an explicit engine finalizer, and cleanupMirrorResources sweeps all mirrored kinds on shutdown instead of assuming owner-reference garbage collection. Codex
The new rdd set implementation uses a watch-based wait rather than polling — the right control-plane pattern for this codebase. Codex
The immutableValidator webhook correctly prevents drift by forcing explicit transitions; the equality.Semantic.DeepEqual comparison with State normalized is the right way to verify nothing else changed. Gemini
Deterministic RFC 1123 volume names via SHA-256 hashing with the original name preserved in status.name elegantly handles Docker names that contain uppercase/underscore characters. Gemini

Testing Assessment ¶

Strong happy-path coverage: container lifecycle mirroring, docker run/stop/rm, volume create/delete (including uppercase names), image pull, K8s-initiated container deletion, in-use image finalizer retention, spec-state transitions in the running↔created↔paused(→stopped) direction, VM shutdown cleanup, and the moby↔containerd backend switch. The BATS test at bats/tests/32-app-controllers/engine.bats is readable, well-commented, and validates the main contracts end-to-end.

Gaps, in rough order of risk:

Paused → running transition (I2). No test. High risk of silently-dropped user intent on unpause.
Docker-initiated untag of in-use image (I1, I6). The existing deleting an in-use Image keeps the finalizer test covers the K8s-initiated direction; the Docker-initiated docker rmi <tag> path is not exercised.
Transient Docker disconnect (I4). No test. Would need to kill/restart Docker inside the VM or sever the host socket briefly.
NotFound during fullSync Inspect (I5). No test for a container being removed concurrently with the initial sync.
Watcher bootstrap race (I3). No test for mutations landing between fullSync and event-stream subscription.
Multi-property rdd set while VM running (I11). The test at engine.bats:266-274 documents the stale-condition race but works around it by stopping first.
Concurrent user patch during engine reconcile (I9). No test for a user patching spec.state=running in the window between the engine's Get and Apply on create.
Engine tests only cover a single instance and skip Windows/WSL2 — the hardcoded .rd2 socket path (S2) means multi-instance tests cannot be added until the Lima template is parameterized on the instance suffix.

Documentation Assessment ¶

docs/design/api_containers.md is substantially updated with engine mirroring semantics, unknown state rationale, and finalizer lifecycle — a net improvement.
The State is the desired state of the container comment at pkg/apis/containers/v1alpha1/container_types.go:56-61 does not describe what unknown means to the reconciler. Update to: "State is the desired state of the container; unknown means the engine controller mirrors without expressing intent and takes no start/stop action."
The rdd set help text at cmd/rdd/set.go:68-72 correctly mentions --timeout=0, but commit message 762c1e8 refers to a --wait=false flag that does not exist. Fix the commit message (see I7 / Commit Structure).
S9 and S10 document stale doc text that should be updated with this PR.

Commit Structure ¶

This PR bundles the core engine-watcher feature with unrelated CI and Lima template work. Two commits in particular are mis-titled:

4db3f20 "Allow container spec.state transitions via webhook" — the webhook diff is a small part of this commit. It also introduces the unknown enum value, changes the CRD default from running to unknown, adds the reconcileContainerState flow with paused-container handling, adds processContainerFinalizers/processImageFinalizers/processVolumeFinalizers, and reworks sync_containers.go to gate spec-apply on Get. A reader scanning git log --oneline for the unknown enum addition would not find it. Claude Codex
762c1e8 "Make rdd set wait for the desired state by default" — same problem, larger. ~300 lines of engine controller churn, the DockerSocket regression from ShortDir() to hardcoded .rd2 (S2), and a significant reconciler rework. The commit message also refers to a --wait=false flag that does not exist — the flag is --timeout=0. Claude Codex
50bdc15 and 7c8f356 — the second immediately flips the first from "on-timeout only" to "always". These should be a single commit. Claude

Crisp/self-contained commits (good examples): 9669b4b (bats fd close), ba6caf7 (docs), 32a0109 (distro proxy workaround), 536ccb2 (rosetta vz move).

Fix: split the commits along responsibility lines and correct the 762c1e8 commit message to reflect the actual flag name before the PR lands.

Acknowledged Limitations ¶

DockerSocket() hardcodes .rd2 — TODO at pkg/instance/instance.go:127-128, mirrored by a parallel TODO at bats/tests/32-app-controllers/engine.bats:12-14. S2 is in findings because it's a regression from the previously correct state, not because the author is unaware.
rdd set doesn't wait for non-running property changes that trigger a VM restart — TODO at cmd/rdd/set.go:440-443 proposes a Reconciled condition with observedGeneration. I11 is still in findings because the race is user-visible today, not just a future polish item.
reconcileContainerSpecs + processFinalizers run O(N) per child event — comment at engine_reconciler.go:149-155 acknowledges the long-term fix (split into per-resource reconcilers). Not promoted to a finding given the typical scale.
moby ContainerNamespace has no mirror finalizer by design — comment at sync_volumes.go:54-59 explains; the BATS test deleting containernamespace/moby completes without a finalizer hang verifies.
Containerd backend reports ContainerEngineReady=True/NotApplicable — intentional so that rdd set running=true returns promptly. The test comment at engine.bats:266-273 documents this.
Windows hostagent pgid logging — the comment at limavm_lifecycle.go:479-481 assumes Unix Setpgid semantics. S6 is in findings because the wording breaks down on Windows, not because the author is unaware.

Agent Performance Retro ¶

Claude ¶

Claude produced the most comprehensive review of the three by a wide margin (8 Important, 8 Suggestions, 3 Design observations), and carried the most unique signal: the paused→running asymmetry (I2), the ForceOwnership create race (I9), the watchCondition cancel/timeout confusion (I10), the multi-property rdd set stale-condition race (I11), and the idle-reconcile cleanupMirrorResources cost (I8). It also caught the shallow-copy trap (S5), the panic-recovery gap (S3), and the Force flag inconsistency (S4), plus the commit-structure critique. It correctly framed I1 as "no dangling mirror is ever created" rather than "the tag is not removed," which is a sharper diagnosis than Codex or Gemini's.

Claude's run time was unusual: it took 22 minutes — well past the nominal 15-minute timeout — but produced the deepest analysis. Worth noting that --effort max is paying for quality here.

Codex ¶

Codex produced a tight, well-calibrated review (3 Important, 1 Suggestion) with the single highest-signal unique finding of the round: the watcher bootstrap race (I3) between fullSync and event-stream subscription. That requires tracing the initialization order carefully — a class of bug where no raw code read would catch it. Codex also reached the correct fix for the untag bug (I1) independently before Claude got there, and correctly identified that the commit-structure and documentation issues needed flagging. It downgraded the error-swallowing from Gemini's "Critical" to Important (correct) and from Claude's "Suggestion" (correct). Zero false positives.

Gemini ¶

Gemini produced 4 "Critical" and 3 "Important" findings but hallucinated C1 — the "typo in BATS wrapper $ @ causes immediate CI failure" finding. The actual scripts/bats-with-timeout.sh:225 reads "$@" & with no space. I verified this directly. A CI-failure-level claim should never be made without verifying the literal character sequence in the source, and Gemini's own tool output earlier contained Error stating path " & — a garbled state during its own reading of the file that it apparently then committed to as fact.

Gemini also inflated severity on C3 and C4 (early return bypasses finalizers, container state errors swallowed) — both are real concerns but neither is Critical. And its suggested fix for the untag bug (use msg.Actor.Attributes["name"]) is also wrong, because Docker sets that attribute to the image ID hash too. That said, Gemini caught two things the others missed: the dangling-image leak on tag (I6) and the NotFound during fullSync issue (I5). When correctly calibrated, those are valuable finds.

Gemini's review ran in about 6 minutes — fastest of the three.

Summary ¶

	Claude Opus 4.6	Codex GPT 5.4	Gemini 3.1 Pro
Duration	22m 15s	7m 00s	6m 51s
Findings	8I 8S	3I 1S	4I 1S
Tool calls	99 (Bash 36, Read 36, Grep 20)	49 (shell 47, stdin 2)	4 (runshellcommand 3, googlewebsearch 1)
Design observations	3	2	1
False positives	0	0	1
Unique insights	7	1	2
Files reviewed	26	26	26
Coverage misses	0	0	0
Totals	8I 8S	3I 1S	4I 1S
Downgraded	0	0	3 (I→S)
Dropped	0	0	1

Claude is the clear leader for depth and calibration on this review; Codex's bootstrap-race catch is the single most valuable unique finding; Gemini's hallucinated "critical CI failure" is the most concerning failure mode and reinforces the rule that model claims about literal file contents need to be verified before being trusted.

Reconciliation:

Gemini C1 (shell typo, Critical → rejected as false positive): the claimed "$ @" does not exist; the actual script has "$@". Verified by reading scripts/bats-with-timeout.sh:225.
Gemini C2 (untag broken, Critical → Important as I1): the bug is real but not a data-loss-level regression — stale resources survive until the next watcher restart's fullSync. Also, Gemini's proposed fix is wrong (reading Actor.Attributes["name"] returns the image ID hash per Moby daemon/images/image_events.go:22).
Gemini C3 (early return bypasses finalizers, Critical → rejected): subsequent reconciles from the cleanup's own watch events process the deferred finalizers. Not critical in practice.
Gemini C4 (state errors swallowed, Critical → Important as I7): real bug but not Critical; retry eventually happens via subsequent Docker events.
Claude S5 (state errors swallowed, Suggestion → promoted to Important as I7): merged with Codex/Gemini framing.
Gemini Design Observation (tear-down on disconnect → promoted to Important as I4): the issue is user-visible (UIDs churn, watch clients see DELETED→ADDED), so it warrants Important rather than future-design treatment.

Review Process Notes ¶

Skill improvements:

None this round. The review dimensions already cover watcher lifecycle, initialization ordering, retry semantics, and Docker event-handling contracts. Gemini's hallucinated CI-failure Critical is a model-specific failure, not a prompt gap — no amount of prompt tweaking removes "verify the literal characters" as a model responsibility.

Repo context updates:

[repo] Add to deep-review-context.md: "When reviewing watcher or event-stream code, cross-check against upstream source for the event payload contract — Docker's image events in particular do not propagate tag names through Actor.ID or Actor.Attributes. Agents must trace into daemon/images/image_events.go (or equivalent) when proposing a fix that reads event fields."

Why: this review had two agents (Gemini, Codex initially) propose fixes based on a wrong assumption about what Actor.ID and Actor.Attributes["name"] contain for ActionUnTag. Future reviewers will save time if this is documented up front.