Deep Review: 20260414-192417-pr-309

Date	2026-04-14 19:24
Repo	rancher-sandbox/rancher-desktop-daemon
Round	1
Author	@jandubois
PR	#309 — WIP: squash of docker-v2 engine commits for rebase
Branch	`engine-v3`
Commits	`a154249` WIP: squash of docker-v2 engine commits for rebase
Reviewers	Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
Verdict	Merge with fixes — C1 and C2 are cheap to fix and close real stalls; the three Important items on bootstrap re-watch, the `rdd set` wait, and volume sync error handling all deserve to go in before rebase lands on main
Wall-clock time	`16 min 32 s`

Executive Summary ¶

PR #309 adds a new engine controller under pkg/controllers/app/engine/ that connects to the embedded Docker daemon, mirrors Docker state into Container, Image, Volume, and ContainerNamespace resources, and forwards user-initiated deletions and Container.spec.state transitions back to Docker. It also introduces a --timeout flag and waitForDesiredState wait path in rdd set, extends the Container CRD with a user-settable spec.state, adds a new engine.bats integration suite, and updates the design docs for App and Container.

The implementation is careful about the hard parts: SSA field ownership is split across two field managers so a finalizer-only apply does not drop spec ownership; atomic-list ordering is stabilized to avoid resourceVersion churn; the Docker event stream is anchored to the daemon's own clock via Info().SystemTime; and multi-writer status updates consistently use retry.RetryOnConflict.

The two critical findings are both defer/control-flow bugs in the same file: a LIFO-order race in dockerWatcher.run() can leave w.done still open when the reconciler checks alive() after a stream error, permanently stalling the watcher; and the main Reconcile loop early-returns on any reconcileContainerSpecs error, which means a single stuck Container.spec.state actuation blocks all finalizer processing across every mirror type. Two sharp Important-severity design gaps round out the picture: the engine creates rancher-desktop namespace and containernamespace/moby in one-shot fullSync() but never watches them for recreation, so a live delete wedges the controller; and the new rdd set running=true wait can return success before Lima has actually applied a template/backend change, because ContainerEngineReady gets re-stamped with the new generation the moment the engine reconciler runs, long before LimaVM has finished its restart cycle.

Critical Issues ¶

C1. Watcher event-stream error races with alive() via defer LIFO order Gemini¶

pkg/controllers/app/engine/controllers/docker_watcher.go


// run is the main watcher goroutine that processes Docker events. The
// since parameter is the "Since" filter passed to the Docker events
// endpoint, captured before the initial fullSync so events that
// happened during the sync window are replayed once the stream opens.
func (w *dockerWatcher) run(ctx context.Context, since string) {
	log := logf.FromContext(ctx).WithName("docker-watcher")
	// Wake the engine reconciler after this goroutine exits so it observes
	// the dead watcher via alive() and starts a fresh one. Registered
	// before close(w.done) so that LIFO defer ordering guarantees the
	// enqueue fires AFTER w.done is closed — otherwise the reconciler
	// could wake, see alive()==true on the about-to-exit goroutine, and
	// skip the restart.
	defer w.enqueueReconcile()
	defer close(w.done)
	// Recover from panics in event handling so an unexpected Docker
	// event shape cannot crash the whole app-controller process.
	defer func() {
		if r := recover(); r != nil {
			log.Error(nil, "panic in Docker watcher goroutine",
				"recovered", r, "stack", string(debug.Stack()))
		}
	}()

	eventFilter := dockerclient.Filters{}.
		Add("type", string(events.ContainerEventType)).
		Add("type", string(events.ImageEventType)).
		Add("type", string(events.VolumeEventType))

	result := w.cli.Events(ctx, dockerclient.EventsListOptions{
		Since:   since,
		Filters: eventFilter,
	})

	for {
		select {
		case <-ctx.Done():
			log.Info("Docker watcher stopping")
			return
		case err := <-result.Err:
			if ctx.Err() != nil {
				return
			}
			log.Error(err, "Docker event stream error")
			return
		case msg, ok := <-result.Messages:
			if !ok {
				// The daemon closed the event stream without writing to
				// result.Err. The deferred enqueueReconcile wakes the
				// engine reconciler after w.done is closed, so it
				// observes the dead watcher and starts a fresh one.
				log.Info("Docker event stream closed")
				return
			}
			// A transient handleEvent error (apiserver 503, SSA conflict
			// past its internal retry) is logged and the event is
			// dropped. Container events tend to self-heal because the
			// next state change (start/stop/die) triggers another sync

When the Docker event stream errors out, the watcher calls w.enqueueReconcile() inline and then returns. Defers fire in LIFO order on exit: the panic-recovery defer fires first (no-op), then close(w.done) — both after the enqueue. The reconciler is woken by the enqueue and, at engine_reconciler.go:100, checks r.watcher.alive(), which is select { case <-w.done: return false; default: return true }. In the window between the inline enqueueReconcile() and the deferred close(w.done), alive() still reports true, so the reconciler takes the "watcher is already running" branch at engine_reconciler.go:165-174 and skips the restart. The goroutine then exits and no further event ever wakes the engine reconciler — the watcher is dead but the reconciler believes it is alive.

pkg/controllers/app/engine/controllers/engine_reconciler.go

	// which blocks on <-w.done and cli.Close() — would stall any
	// concurrent reconcile the moment MaxConcurrentReconciles > 1.
	var diedWatcher *dockerWatcher
	r.watcherMu.Lock()
	watcherRunning := r.watcher != nil
	if watcherRunning && !r.watcher.alive() {
		diedWatcher = r.watcher
		r.watcher = nil
		watcherRunning = false
	}
	r.watcherMu.Unlock()

pkg/controllers/app/engine/controllers/engine_reconciler.go

			}
		}
		return ctrl.Result{}, r.setEngineCondition(ctx, &app, terminalStatus, terminalReason, terminalMessage)
	}

	if !watcherRunning {
		log.Info("App is running, starting Docker watcher")
		if err := r.startWatcher(ctx); err != nil {
			log.Error(err, "Failed to start Docker watcher")
			if condErr := r.setEngineCondition(ctx, &app, metav1.ConditionFalse, "ConnectFailed", err.Error()); condErr != nil {
				log.Error(condErr, "Failed to update ContainerEngineReady to ConnectFailed")
			}
			return ctrl.Result{}, err
		}
	}

	// Re-assert the Connected condition on every steady-state reconcile
	// so its observedGeneration tracks the App's current generation. When
	// the spec is patched without cycling the watcher (e.g. `rdd set
	// running=true kubernetes.enabled=true` from an already-running App),

In wall-clock time the race window is short (a few instructions between enqueueReconcile() returning and close(w.done) executing), but the reconciler runs on a separate goroutine scheduled off a work queue, so it absolutely can observe the intermediate state. The symptom is permanent — there is no further path that wakes the engine reconciler once the child-resource events drain — until an unrelated event like a user rdd set finally triggers a reconcile minutes or hours later.

Fix: invert the defer order so close(w.done) is deferred first (fires last) and enqueueReconcile is deferred after it (fires first, when w.done is already closed), and drop the inline enqueueReconcile() calls in the error paths. That also removes the need for the recover handler to re-call enqueueReconcile.

 func (w *dockerWatcher) run(ctx context.Context, since string) {
 	log := logf.FromContext(ctx).WithName("docker-watcher")
 	defer close(w.done)
+	defer w.enqueueReconcile()
 	defer func() {
 		if r := recover(); r != nil {
 			log.Error(nil, "panic in Docker watcher goroutine",
 				"recovered", r, "stack", string(debug.Stack()))
-			w.enqueueReconcile()
 		}
 	}()
 	...
 		case err := <-result.Err:
 			if ctx.Err() != nil {
 				return
 			}
 			log.Error(err, "Docker event stream error")
-			w.enqueueReconcile()
 			return
 		case msg, ok := <-result.Messages:
 			if !ok {
 				log.Info("Docker event stream closed")
-				w.enqueueReconcile()
 				return
 			}

C2. reconcileContainerSpecs errors block all finalizer processing Gemini¶

pkg/controllers/app/engine/controllers/engine_reconciler.go

	// every Container/Image/Volume watch event triggers a reconcile via
	// SetupWithManager below. Cost is therefore O(N) per child event.
	// The long-term fix is to split these into per-resource reconcilers
	// with watch predicates ("deletion timestamp set", "spec.state
	// changed"); until then the sweep runs on every event.
	//
	// Run both passes unconditionally and join their errors. An early
	// return on specsErr would stall every mirror's finalizer behind a
	// single stuck Container.spec.state actuation (for example a
	// container whose ContainerStart keeps failing): processFinalizers
	// would never reach the Image/Volume finalizers, and the next
	// reconcile would hit the same broken container first and skip
	// them again.
	var specsErr, finErr error
	if err := r.reconcileContainerSpecs(ctx); err != nil {
		specsErr = fmt.Errorf("failed to reconcile container specs: %w", err)
	}
	if err := r.processFinalizers(ctx); err != nil {

reconcileContainerSpecs already uses errors.Join internally, so a single failing container does not abort the inner loop — but it still returns a non-nil aggregate error, which causes the main Reconcile to early-return and skip processFinalizers entirely. If any Container mirror is stuck in a state Docker keeps rejecting (for example spec.state=running on a container whose image was removed, so every ContainerStart returns an error), that single broken container blocks every Container, Image, and Volume finalizer in the namespace. Deleted mirrors sit in Terminating indefinitely because the processFinalizers pass never runs — and because the next reconcile enters the same branch and fails again. This is a persistent stall of the finalizer pipeline, not a transient hiccup: every reconcile until the broken container is manually fixed hits the same path.

The fix mirrors the per-item error collection the rest of the file already uses: run both passes unconditionally and join their errors, so the controller still requeues but finalizers are no longer gated on spec actuation.

-	if err := r.reconcileContainerSpecs(ctx); err != nil {
-		return ctrl.Result{}, fmt.Errorf("failed to reconcile container specs: %w", err)
-	}
-	if err := r.processFinalizers(ctx); err != nil {
-		return ctrl.Result{}, fmt.Errorf("failed to process finalizers: %w", err)
-	}
-	return ctrl.Result{}, nil
+	specsErr := r.reconcileContainerSpecs(ctx)
+	if specsErr != nil {
+		specsErr = fmt.Errorf("failed to reconcile container specs: %w", specsErr)
+	}
+	finErr := r.processFinalizers(ctx)
+	if finErr != nil {
+		finErr = fmt.Errorf("failed to process finalizers: %w", finErr)
+	}
+	return ctrl.Result{}, errors.Join(specsErr, finErr)

Important Issues ¶

I1. Bootstrap resources are created once in fullSync but never re-watched Codex¶

pkg/controllers/app/engine/controllers/engine_reconciler.go

}

// SetupWithManager sets up the controller with the Manager.
func (r *EngineReconciler) SetupWithManager(mgr ctrl.Manager) error {
	r.reconcileChan = make(chan event.GenericEvent, 1)

	// Map any Container/Image/Volume event to a reconcile of the App singleton.
	enqueueApp := handler.EnqueueRequestsFromMapFunc(
		func(_ context.Context, _ client.Object) []reconcile.Request {
			return []reconcile.Request{{NamespacedName: types.NamespacedName{Name: appName}}}
		},
	)

	return ctrl.NewControllerManagedBy(mgr).
		For(&appv1alpha1.App{}).
		Named("engine-reconciler").
		WatchesRawSource(source.Channel(r.reconcileChan, enqueueApp)).
		Watches(&containersv1alpha1.Container{}, enqueueApp).

The engine creates both the rancher-desktop Kubernetes namespace (sync_volumes.go:40-53 — ensureNamespace) and the containernamespace/moby mirror (sync_volumes.go:62-67 — syncContainerNamespace) exactly once, inside fullSync() at watcher startup. SetupWithManager watches App, Container, Image, and Volume — nothing on corev1.Namespace or containersv1alpha1.ContainerNamespace.

pkg/controllers/app/engine/controllers/sync_volumes.go

	return fmt.Sprintf("vol-%x", sum)
}

// ensureNamespace creates the Kubernetes namespace that holds the
// mirror resources if it doesn't exist.
func (w *dockerWatcher) ensureNamespace(ctx context.Context) error {
	var ns corev1.Namespace
	if err := w.k8s.Get(ctx, client.ObjectKey{Name: apiNamespace}, &ns); err != nil {
		if !apierrors.IsNotFound(err) {
			return err
		}
		ns = corev1.Namespace{
			ObjectMeta: metav1.ObjectMeta{Name: apiNamespace},
		}
		if err := w.k8s.Create(ctx, &ns); err != nil && !apierrors.IsAlreadyExists(err) {
			return fmt.Errorf("failed to create namespace %s: %w", apiNamespace, err)
		}
	}
	return nil
}

// syncContainerNamespace creates the "moby" `ContainerNamespace` mirror.
// Unlike `Container` / `Image` / `Volume` mirrors, this resource has no
// mirror finalizer: Docker has no corresponding engine object to delete
// on the reverse path, and cleanupMirrorResources sweeps it
// unconditionally on VM stop, so a finalizer with no handler would only
// trap user-initiated deletes in Terminating until the next bounce.
func (w *dockerWatcher) syncContainerNamespace(ctx context.Context) error {
	applyConfig := containersv1alpha1apply.ContainerNamespace(containerNamespace, apiNamespace)

	return w.k8s.Apply(ctx, applyConfig,
		client.ForceOwnership, client.FieldOwner(controllerName))
}

// syncAllVolumes lists all Docker volumes and creates/updates `Volume`
// mirrors, then removes stale ones.
func (w *dockerWatcher) syncAllVolumes(ctx context.Context) error {
	log := logf.FromContext(ctx).WithName("docker-watcher")

While the App is running, deleting either of those bootstrap resources produces no watch event that will re-enter fullSync(). The engine.bats test at line 362 (deleting containernamespace/moby completes without a finalizer hang) only works because the next test does rdd set running=true which cycles the watcher; once the file runs, the intervening state is "moby namespace is gone and nobody notices". The namespace case is worse: once namespace/rancher-desktop is deleted, every subsequent mirror apply inside handleEvent fails with a NotFound, and docker_watcher.go:203-206 explicitly logs-and-drops those errors until a watcher restart happens. The controller is wedged in a broken steady state with no self-heal path.

pkg/controllers/app/engine/controllers/docker_watcher.go

			// on the same ID; image pull and volume create events fire
			// only once, so a dropped apply leaves the mirror missing
			// until the next full resync (watcher restart or engine
			// reconnect). The bounded-window fix is a periodic
			// fullSync tick, deferred until the rest of the mirroring
			// work lands.
			if err := w.handleEvent(ctx, msg); err != nil {
				log.Error(err, "Failed to handle Docker event",
					"type", msg.Type, "action", msg.Action, "actor", msg.Actor.ID)
			}
		}
	}
}

pkg/controllers/app/engine/controllers/engine_reconciler.go

// errors.Join and returned so controller-runtime requeues with backoff
// — matching the pattern in processContainerFinalizers. Without the
// return, a failed start/stop would get exactly one attempt (the watch
// event from the original spec.state patch) and then sit forever.
func (r *EngineReconciler) reconcileContainerSpecs(ctx context.Context) error {
	r.watcherMu.Lock()
	w := r.watcher
	r.watcherMu.Unlock()
	if w == nil {
		return nil
	}

Fix needs both pieces: watch the bootstrap resources so a delete enqueues a reconcile, AND move ensureNamespace() / syncContainerNamespace() out of startup-only and into the steady-state reconcile path so the wake actually rebuilds them.

 return ctrl.NewControllerManagedBy(mgr).
     For(&appv1alpha1.App{}).
     Named("engine-reconciler").
     WatchesRawSource(source.Channel(r.reconcileChan, enqueueApp)).
+    Watches(&corev1.Namespace{}, enqueueApp /* filter name == apiNamespace */).
+    Watches(&containersv1alpha1.ContainerNamespace{}, enqueueApp).
     Watches(&containersv1alpha1.Container{}, enqueueApp).
     Watches(&containersv1alpha1.Image{}, enqueueApp).
     Watches(&containersv1alpha1.Volume{}, enqueueApp).
     Complete(r)

Combined with calling ensureNamespace + syncContainerNamespace near the top of the steady-state reconcile (before any sub-resource apply) so the reconciler that the watch enqueues can actually recover.

I2. rdd set running=true wait returns success before Lima applies template/backend changes Codex¶

cmd/rdd/set.go

//
// TODO: changes that trigger a VM restart without changing "running" (e.g.
// containerEngine.name alone) are still not waited on. A dedicated
// "Reconciled" condition on App would let the CLI detect when the full
// reconcile chain has settled for any property change.
func waitForDesiredState(ctx context.Context, config *rest.Config, properties map[string]string, timeout time.Duration, minGen int64) error {
	runningVal, ok := properties["running"]
	if !ok {
		return nil
	}

	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	if runningVal == "true" {
		logrus.Info("Waiting for container engine to be ready")
		return watchCondition(ctx, config, func(obj *unstructured.Unstructured) bool {
			status, gen, present := conditionInfo(obj, "ContainerEngineReady")
			return present && gen >= minGen && status == metav1.ConditionTrue
		})
	}
	logrus.Info("Waiting for the VM to stop")
	return watchCondition(ctx, config, func(obj *unstructured.Unstructured) bool {
		// Running absent means the VM was never started (nothing to
		// wait for). Otherwise wait until the condition is refreshed
		// with our generation and reports a non-True status.
		status, gen, present := conditionInfo(obj, "Running")
		if !present {
			return true
		}
		return gen >= minGen && status != metav1.ConditionTrue
	})
}

// watchCondition watches the App singleton until the supplied predicate
// returns true or the context expires. It uses watchtools.UntilWithSync
// so benign watch-channel closures (API server timeouts, proxy
// disconnects, resource-version compaction) re-list transparently

The new wait path treats ContainerEngineReady as the completion signal for every running=true command. But engine_reconciler.go:184-186 re-asserts ContainerEngineReady=True/Connected on every steady-state reconcile solely to advance observedGeneration, with a comment explicitly stating that this "is the only place that stamps the new generation into ContainerEngineReady, and rdd set's wait predicate depends on it".

pkg/controllers/app/engine/controllers/engine_reconciler.go

	// running=true kubernetes.enabled=true` from an already-running App),
	// this is the only place that stamps the new generation into
	// ContainerEngineReady, and rdd set's wait predicate depends on it.
	// setEngineCondition no-ops when nothing changed, so the extra call
	// is free on stable reconciles.
	if err := r.setEngineCondition(ctx, &app, metav1.ConditionTrue, "Connected", "Container engine synced"); err != nil {
		return ctrl.Result{}, err
	}

	// reconcileContainerSpecs + processFinalizers each issue one or more
	// List() calls across every mirrored object on every reconcile, and
	// every Container/Image/Volume watch event triggers a reconcile via
	// SetupWithManager below. Cost is therefore O(N) per child event.

That self-stamping is correct for this code path's purpose, but it decouples the signal from actual convergence. When the user runs rdd set running=true kubernetes.enabled=true or rdd set running=true containerEngine.name=containerd against an already-running App:

The patch lands; App generation becomes N+1.
AppReconciler.Reconcile fires, sees the template ConfigMap differs, patches the template ConfigMap at app_controller.go:274-290, and returns with Requeue: true. It does not wait for Lima to pick up the change.
EngineReconciler.Reconcile fires in parallel on the same App change. running=true, engineIsDocker=true, watcher is still alive → it re-stamps ContainerEngineReady=Connected/True with ObservedGeneration=N+1.
rdd set's watchCondition observes status=True && observedGeneration >= minGen → returns success.
Lima still has to pick up the template-ConfigMap change, mark restartNeeded, stop the VM, start it with the new template. The engine controller will eventually sweep mirrors and set ContainerEngineReady=Stopped, but the user has already seen the command succeed and moved on — docker ps shows a different engine's state than the one the user asked for.

pkg/controllers/app/app/controllers/app_controller.go

		} else {
			desired, err := applySpecToTemplate(r.LimaTemplateData, app.Spec)
			if err != nil {
				return ctrl.Result{}, fmt.Errorf("failed to apply spec to template: %w", err)
			}
			if templateCM.Data[limav1alpha1.TemplateConfigMapKey] != desired {
				patch := client.MergeFrom(templateCM.DeepCopy())
				if templateCM.Data == nil {
					templateCM.Data = make(map[string]string)
				}
				templateCM.Data[limav1alpha1.TemplateConfigMapKey] = desired
				if err := r.Patch(ctx, templateCM, patch); err != nil {
					return ctrl.Result{}, fmt.Errorf("failed to update template ConfigMap: %w", err)
				}
				log.Info("Updated template ConfigMap",
					"containerEngine", app.Spec.ContainerEngine.Name,
					"kubernetesEnabled", app.Spec.Kubernetes.Enabled,
					"kubernetesVersion", app.Spec.Kubernetes.Version)
				// ConfigMaps are not watched, so requeue to let the
				// reconciler evaluate remaining spec fields (e.g. running).
				return ctrl.Result{Requeue: true}, nil
			}
		}
	}

	if !limaVM.Spec.Running && app.Spec.Running {
		limaVM.Spec.Running = true

The existing TODO at set.go:463-466 acknowledges the gap for "running not changed" cases; this finding extends the same logic to running=true combined with a template-affecting field.

cmd/rdd/set.go

// so a stale ContainerEngineReady=True from a previous backend (or a
// stale Running=False from a prior stop) cannot satisfy the wait before
// the reconciler has observed the new spec. EngineReconciler and
// AppReconciler both stamp ObservedGeneration when they update conditions.
//
// TODO: changes that trigger a VM restart without changing "running" (e.g.
// containerEngine.name alone) are still not waited on. A dedicated
// "Reconciled" condition on App would let the CLI detect when the full
// reconcile chain has settled for any property change.
func waitForDesiredState(ctx context.Context, config *rest.Config, properties map[string]string, timeout time.Duration, minGen int64) error {
	runningVal, ok := properties["running"]
	if !ok {
		return nil
	}

Fix: condition the wait target on what actually changed. Three feasible shapes:

For any patch that touches a template input (containerEngine.*, kubernetes.*), wait on LimaVM.status.observedTemplateResourceVersion (already maintained for this purpose, per pkg/apis/lima/v1alpha1 and docs/design/api_lima.md) rather than ContainerEngineReady.
Or introduce the "Reconciled" condition on App mentioned in the TODO comment and key the wait on that.
Or, as a short-term fix that does not need a new condition, stop re-stamping ContainerEngineReady purely to advance observedGeneration in the steady-state path — but that breaks the current wait semantics for the cold-start-from-stopped path.

The first option is cheapest: rdd set knows which fields it patched, so it can pick the right wait predicate per patch.

I3. syncAllVolumes fails the whole fullSync on a transient per-item error Claude Gemini¶

pkg/controllers/app/engine/controllers/sync_volumes.go

	}

	// Track which `Volume` mirror names we create.
	activeNames := make(map[string]bool, len(volumeList.Items))

	// Per-item apply failures are logged and skipped rather than failing
	// the whole startup, matching the containers/images pattern: a
	// single permanently-broken Apply (transient API 503, SSA conflict
	// past its internal retry) must not pin ContainerEngineReady at
	// ConnectFailed. Structural errors below (k8s list, stale-mirror
	// cleanup) are still fatal.
	var errs []error
	for _, v := range volumeList.Items {
		mirrorName := volumeMirrorName(v.Name)
		activeNames[mirrorName] = true
		if err := w.applyVolume(ctx, v); err != nil {
			log.Error(err, "Skipping volume during full sync", "name", v.Name)
		}

syncAllContainers (sync_containers.go:57-63) and syncAllImages (sync_images.go:86-93) both log a per-item apply failure and continue, with an explicit comment calling out that transient failures must not "prevent every other healthy container from being mirrored, nor pin the ContainerEngineReady condition at ConnectFailed". syncAllVolumes does not follow that contract — it accumulates per-item errors into errs and returns them.

pkg/controllers/app/engine/controllers/sync_containers.go

	// failing the whole startup: a single permanently-broken Inspect
	// (engine-side corruption, transient race) must not prevent every
	// other healthy container from being mirrored, nor pin the
	// ContainerEngineReady condition at ConnectFailed. Structural
	// errors (k8s list, stale-mirror cleanup) are still fatal.
	var errs []error
	for _, dc := range listResult.Items {
		dockerIDs[dc.ID] = true
		if err := w.syncContainer(ctx, dc.ID); err != nil {
			log.Error(err, "Skipping container during full sync", "id", dc.ID)
		}
	}

	// Remove stale Container mirrors.
	var containerMirrors containersv1alpha1.ContainerList
	if err := w.k8s.List(ctx, &containerMirrors, client.InNamespace(apiNamespace)); err != nil {
		return fmt.Errorf("failed to list Containers: %w", err)

pkg/controllers/app/engine/controllers/sync_images.go

	var errs []error
	for _, summary := range listResult.Items {
		for _, n := range imageMirrorNames(summary.ID, summary.RepoTags) {
			activeNames[n] = true
		}
		if err := w.syncImageFromSummary(ctx, summary); err != nil {
			log.Error(err, "Skipping image during full sync", "id", summary.ID)
		}
	}

	// Remove stale Image mirrors.
	var imageMirrors containersv1alpha1.ImageList
	if err := w.k8s.List(ctx, &imageMirrors, client.InNamespace(apiNamespace)); err != nil {
		return fmt.Errorf("failed to list Images: %w", err)
	}
	for i := range imageMirrors.Items {
		img := &imageMirrors.Items[i]
		if !activeNames[img.Name] {

The downstream chain amplifies this: fullSync returns errors.Join(...), newDockerWatcher forwards that error to startWatcher, which sets ContainerEngineReady=False/ConnectFailed and returns. A single transient API-server 503 on one Volume apply fails the entire watcher startup until the condition clears.

Fix: match the containers/images pattern exactly — log the error and continue.

 var errs []error
 for _, v := range volumeList.Items {
     mirrorName := volumeMirrorName(v.Name)
     activeNames[mirrorName] = true
     if err := w.applyVolume(ctx, v); err != nil {
-        errs = append(errs, err)
+        log.Error(err, "Skipping volume during full sync", "name", v.Name)
     }
 }

The stale-mirror sweep below still uses errs — leave that branch untouched. If the mirror is missing, the next event will pick it up; if the volume is permanently broken, nothing in either pattern catches it anyway.

I4. engine.bats computes DOCKER_HOST via command substitution at file scope Gemini Codex¶

bats/tests/32-app-controllers/engine.bats

# containers, images, and volumes into Container, Image, and Volume
# resources, and that deletions and spec.state changes are forwarded
# to Docker.

NAMESPACE="rancher-desktop"
DOCKER_HOST="unix://$(rdd svc paths docker_socket)"
export DOCKER_HOST

local_setup_file() {
    # The Docker socket access pattern used by these tests is not yet wired
    # up for Windows/WSL2.
    skip_on_windows

The $(rdd svc paths short_dir) substitution runs when BATS loads the file, before local_setup_file has a chance to run skip_on_windows or establish any test context. If rdd is missing, unbuildable, or returns an error, the failure surfaces as a file-load-time error rather than a clean per-test failure, and the value silently becomes unix:///docker.sock (no error, no indication which test file broke). The repo's BATS style rules (per ~/.claude/memory/bats-style.md) explicitly prefer run -0 for commands whose output is captured, so failures are both visible and attributable.

Fix: move the assignment into local_setup_file, using run -0 so a failure is reported against the file rather than silently eaten.

 NAMESPACE="rancher-desktop"
-DOCKER_HOST="unix://$(rdd svc paths short_dir)/docker.sock"
-export DOCKER_HOST

 local_setup_file() {
     skip_on_windows
     rdd svc delete
     rdd set running=true
+    run -0 rdd svc paths short_dir
+    DOCKER_HOST="unix://${output}/docker.sock"
+    export DOCKER_HOST
 }

Reconciliation note: Gemini and Codex both raised this; I kept it as Important because the failure mode (silent substitution during file load) is the exact class the BATS style rule exists to prevent, even though Codex filed it as a Suggestion.

I5. removeImagesByID can miss Image mirrors whose status.id has not yet been populated Claude¶

pkg/controllers/app/engine/controllers/sync_images.go

// removeImagesByID finds and removes all `Image` mirrors for a given Docker image ID.
//
// Matching is by status.id, which leaves a narrow orphan window: if
// applyImage's metadata Apply succeeded but the status SubResource Apply
// failed transiently, the mirror exists with an empty status.id and this
// function will not match it. The orphan is swept by syncAllImages'
// stale-mirror pass on the next watcher restart, which is the existing
// self-heal path; the window is too narrow to justify a fallback lookup
// by computed dangling-ID name here.
func (w *dockerWatcher) removeImagesByID(ctx context.Context, id string) error {
	var images containersv1alpha1.ImageList
	if err := w.k8s.List(ctx, &images, client.InNamespace(apiNamespace)); err != nil {
		return err
	}
	var errs []error
	for i := range images.Items {
		if images.Items[i].Status.ID == id {
			if err := w.removeMirrorResource(ctx, &images.Items[i], images.Items[i].Name); err != nil {
				errs = append(errs, err)
			}
		}
	}
	return errors.Join(errs...)
}

applyImage (sync_images.go:191-212) issues two sequential calls — the metadata Apply and the status SubResource Apply. If the first succeeds and the second fails transiently (for example a 503 or a SSA conflict past its internal retry), the Image mirror exists with an empty status.id. A subsequent Docker ActionDelete event for the underlying image ID routes through removeImagesByID, which iterates by status.id == id — the empty-status mirror never matches and is left as an orphan. The next full sync at watcher restart prunes it via syncAllImages' stale-mirror sweep, but during normal operation the orphan persists.

pkg/controllers/app/engine/controllers/sync_images.go


// applyImage creates or updates a single `Image` mirror and its status.
func (w *dockerWatcher) applyImage(
	ctx context.Context,
	image *containersv1alpha1apply.ImageApplyConfiguration,
	status *containersv1alpha1apply.ImageStatusApplyConfiguration,
) error {
	err := w.k8s.Apply(ctx, image,
		client.ForceOwnership, client.FieldOwner(controllerName))
	if err != nil {
		return fmt.Errorf("failed to apply image %s: %w", *image.GetName(), err)
	}

	err = w.k8s.SubResource("status").Apply(ctx,
		containersv1alpha1apply.Image(*image.GetName(), *image.GetNamespace()).
			WithStatus(status),
		client.ForceOwnership, client.FieldOwner(controllerName))
	if err != nil {
		return fmt.Errorf("failed to apply image status %s: %w", *image.GetName(), err)
	}

	return nil
}

// reconcileImageByID re-inspects a Docker image and reconciles every
// `Image` mirror whose status.id matches. Tags still present are
// re-applied; Image mirrors for tags that are no longer present are
// deleted. If the image has been fully removed from Docker, all mirrors
// with that status.id are deleted instead.
//
// This is the path for events that carry an image ID but not the tag
// name — notably Docker's untag events, where the event payload only

The symmetric applyImageFromInspect path for dangling images is also affected: a dangling image created and deleted faster than the status apply completes produces an orphan mirror.

Fix: as a fallback when no mirror matches by status.id, resolve the expected mirror name(s) with imageMirrorNames(id, nil) — the same helper used to create the mirror name for a dangling image — and try to remove by computed name too. The dangling-ID name is deterministic from the Docker ID alone, so this cleanly handles the empty-status case without needing to know the tags.

matched := make(map[string]bool)
for i := range images.Items {
    if images.Items[i].Status.ID == id {
        matched[images.Items[i].Name] = true
        if err := w.removeMirrorResource(ctx, &images.Items[i], images.Items[i].Name); err != nil {
            errs = append(errs, err)
        }
    }
}
for _, name := range imageMirrorNames(id, nil) {
    if matched[name] {
        continue
    }
    if err := w.removeMirrorResource(ctx, &containersv1alpha1.Image{}, name); err != nil && !apierrors.IsNotFound(err) {
        errs = append(errs, err)
    }
}

Suggestions ¶

S1. isStopped omits ContainerStatusUnknown, triggering a ContainerStop dispatch on fresh mirrors Claude¶

pkg/controllers/app/engine/controllers/docker_watcher.go

// created branch always dispatches Stop from any non-stopped state. The
// running branch handles paused explicitly because Docker rejects
// ContainerStart on a paused container. isStopped() treats
// created/exited/dead as terminal stopped states: the user's intent
// ("do not run") is already satisfied and redispatching Stop would be
// wasted work on every unrelated reconcile event.
func (w *dockerWatcher) reconcileContainerState(ctx context.Context, c *containersv1alpha1.Container) error {
	desired := c.Spec.State
	if desired == containersv1alpha1.ContainerStatusUnknown {
		return nil
	}

	actual := c.Status.Status
	log := logf.FromContext(ctx).WithName("docker-watcher")

	switch desired {
	case containersv1alpha1.ContainerStatusRunning:
		if actual == containersv1alpha1.ContainerStatusRunning {
			return nil
		}
		if actual == containersv1alpha1.ContainerStatusPaused {
			log.Info("Unpausing container", "id", c.Name)
			_, err := w.cli.ContainerUnpause(ctx, c.Name, dockerclient.ContainerUnpauseOptions{})
			return err
		}
		log.Info("Starting container", "id", c.Name)
		_, err := w.cli.ContainerStart(ctx, c.Name, dockerclient.ContainerStartOptions{})
		return err
	case containersv1alpha1.ContainerStatusCreated:
		if isStopped(actual) {
			return nil
		}
		log.Info("Stopping container", "id", c.Name)
		_, err := w.cli.ContainerStop(ctx, c.Name, dockerclient.ContainerStopOptions{})
		return err
	}
	return nil
}

// isStopped reports whether a container status represents a terminal
// non-running state. Every state listed here satisfies a desired state
// of "created": the container is not running, cannot transition without
// a user action (start/rm), and dispatching ContainerStop again would
// be a no-op.
func isStopped(s containersv1alpha1.ContainerStatusValue) bool {
	switch s {
	case containersv1alpha1.ContainerStatusCreated,
		containersv1alpha1.ContainerStatusExited,
		containersv1alpha1.ContainerStatusDead:
		return true
	}
	return false
}

// deleteContainer removes a container from Docker. NotFound errors are
// treated as success (the container is already gone); all other errors
// are returned so the caller keeps the mirror finalizer in place and

If a Container mirror has status.status still defaulted to unknown (the CRD default per container_types.go:108, before the first applyContainer status Apply lands), and the user patches spec.state=created at that moment, reconcileContainerState sees desired=created, actual=unknown, and the isStopped guard says "not stopped" → dispatches ContainerStop. This matches the state matrix comment at docker_watcher.go:363 (the author's intentional choice: "created: anything-not-stopped → stop"), but it wastes a Docker round trip per reconcile until the status catches up, and it makes the state matrix disagree with itself (the running row treats unknown as "start", which is equally speculative).

pkg/apis/containers/v1alpha1/container_types.go

	// +optional
	Labels map[string]string `json:"labels"`
	// Status of the container.
	//
	// +required
	// +kubebuilder:default:=unknown
	Status ContainerStatusValue `json:"status"`
	// Pid is the process identifier for the main process in the container.
	//
	// +optional
	Pid int32 `json:"pid"`

pkg/controllers/app/engine/controllers/docker_watcher.go

// from Docker). The webhook restricts desired to {unknown, created,
// running}; actual can be any value in the CRD enum.
//
//	desired \ actual  | running | paused   | created | exited | dead | pausing/restarting | removing | unknown
//	------------------+---------+----------+---------+--------+------+--------------------+----------+--------
//	running           | nil     | unpause  | start   | start  | start| start              | start    | start
//	created           | stop    | stop     | nil     | nil    | nil  | stop               | stop     | stop
//
// ContainerStop handles paused/restarting containers natively, so the
// created branch always dispatches Stop from any non-stopped state. The
// running branch handles paused explicitly because Docker rejects

The change is a design judgment call rather than a clear bug — the author's matrix is internally consistent — so this is surfaced as a suggestion rather than an issue. If the intent is "do nothing until the status is known", add ContainerStatusUnknown to isStopped and add an actual == ContainerStatusUnknown no-op guard at the top of the running branch.

Reconciliation note: Claude originally flagged this as Important. Downgraded because the author's documented matrix explicitly covers the unknown column — the question is whether the matrix itself is right, not whether the code matches the matrix.

S2. Controller multiplexes too many event domains through one App-scoped loop Claude Codex¶

The current EngineReconciler.Reconcile drives watcher lifecycle, bootstrap resource creation (when it happens at all — see I1), per-Container spec.state actuation, and per-resource finalizer processing, all gated on a single App-scoped reconcile triggered by broad Container / Image / Volume watches. The code already acknowledges the cost ("reconcileContainerSpecs + processFinalizers each issue one or more List() calls across every mirrored object on every reconcile"), but the architectural problem is bigger than List cost: I1 is a "created but not watched" gap that is a direct consequence of collapsing bootstrap into fullSync(), and C2 is a "one reconcile path blocks sibling responsibilities" gap that is a direct consequence of serializing unrelated concerns in one reconcile function.

Splitting into narrower reconcilers (watcher lifecycle + bootstrap on App; a finalizer reconciler per resource type with a deletionTimestamp != nil predicate; a spec.state reconciler with a spec.state changed predicate) would make each reconcile's wake-up set explicit and eliminate an entire class of stalls. This is a future-work design observation rather than a change request for this PR.

S3. syncAllImages/syncAllContainers inspect each resource sequentially at watcher startup Claude¶

Both full-sync functions explicitly call out the sequential Inspect as a conscious trade-off ("typical dev machines have tens of containers, not hundreds"). For a single-user local dev machine the choice is defensible; the note is worth keeping because the O(N) wall-clock cost is user-visible at watcher startup and the engine.bats suite already creates several images/containers serially. If users ever report ContainerEngineReady taking a long time after rdd set running=true, this is the first thing to parallelize.

S4. engine.bats bypasses setup_rdd_control_plane and relies on rdd set's implicit service start Claude¶

bats/tests/32-app-controllers/engine.bats


NAMESPACE="rancher-desktop"
DOCKER_HOST="unix://$(rdd svc paths docker_socket)"
export DOCKER_HOST

local_setup_file() {
    # The Docker socket access pattern used by these tests is not yet wired
    # up for Windows/WSL2.
    skip_on_windows
    # Deliberately skip setup_rdd_control_plane here: `rdd set` internally
    # calls ensureServiceRunning, which is exactly the CLI path we want to
    # exercise — the engine controller is part of the default controller
    # set, so no explicit --controllers selection is needed.
    rdd svc delete
    rdd set running=true
}

Every other BATS file calls setup_rdd_control_plane "..." which explicitly starts the service with a chosen set of controllers. engine.bats skips that helper and relies on rdd set's implicit ensureServiceRunning (via getAppKubeClient at cmd/rdd/set.go:574) to start the service with all controllers. This works, but it:

cmd/rdd/set.go

}

// getAppKubeClient returns a controller-runtime client with the App scheme
// registered, and the raw REST config for creating other clients.
func getAppKubeClient(ctx context.Context) (client.Client, *rest.Config, error) {
	if err := ensureServiceRunning(ctx); err != nil {
		return nil, nil, err
	}
	config, err := service.GetKubeRestConfig()
	if err != nil {
		return nil, nil, fmt.Errorf("failed to get kubeconfig: %w", err)

Couples the test to an implicit CLI path.
Cannot specify a subset of controllers (always gets the full set).
Diverges from the house style so future readers have to trace through ensureServiceRunning to understand the setup contract.

Fix is either to switch to setup_rdd_control_plane '*' + rdd set running=true, or to document the deliberate choice with a comment explaining why the implicit path is preferred here.

Reconciliation note: Claude originally flagged this as Important. Downgraded — this is a style/consistency issue, not a correctness bug.

S5. containerd backend test switches the engine to containerd but never switches back Claude¶

The final test sets containerEngine.name=containerd and never restores moby. teardown_file stops the service, so later test files are unaffected as long as each one owns its own instance. But if the file is ever extended with more tests after that one, they would unexpectedly run against containerd.

S6. reconcileContainerState reads c.Status.Status from a potentially stale informer cache Claude¶

The Container object passed to reconcileContainerState comes from r.List() in engine_reconciler.go:362, which reads the controller-runtime informer cache. Between the List and the Docker API call, the container's actual state may have moved on. ContainerStart / ContainerStop are idempotent for their normal targets, but ContainerUnpause on an already-unpaused container returns a Docker error, which becomes a reconcile error, which causes a backoff-requeue loop until the cache catches up.

pkg/controllers/app/engine/controllers/engine_reconciler.go

// errors.Join and returned so controller-runtime requeues with backoff
// — matching the pattern in processContainerFinalizers. Without the
// return, a failed start/stop would get exactly one attempt (the watch
// event from the original spec.state patch) and then sit forever.
func (r *EngineReconciler) reconcileContainerSpecs(ctx context.Context) error {
	r.watcherMu.Lock()
	w := r.watcher
	r.watcherMu.Unlock()
	if w == nil {
		return nil
	}

In practice the cache converges within milliseconds and the transient error self-heals. Not worth fixing unless the CI starts flaking on pause/unpause races; if it does, the fix is a cerrdefs.IsNotPaused (or equivalent) check analogous to the IsNotFound pattern already used for the delete helpers.

Reconciliation note: Claude originally flagged this as Important. Downgraded — the race is transient and the controller requeues, so it self-heals within the next reconcile tick.

S7. conditionInfo silently returns observedGeneration=0 for missing fields Claude¶

cmd/rdd/set.go


// conditionInfo returns the status, observedGeneration, and presence of
// the named condition on an App. present is false when the condition is
// not on the object; callers must check it before trusting the other
// return values.
func conditionInfo(obj *unstructured.Unstructured, condType string) (status metav1.ConditionStatus, observedGen int64, present bool) {
	conditions, found, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
	if err != nil || !found {
		return "", 0, false
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]any)
		if !ok {
			continue
		}
		if cond["type"] != condType {
			continue
		}
		s, _ := cond["status"].(string)
		gen, _, _ := unstructured.NestedInt64(cond, "observedGeneration")
		return metav1.ConditionStatus(s), gen, true
	}
	return "", 0, false
}

// getAppKubeClient returns a controller-runtime client with the App scheme
// registered, and the raw REST config for creating other clients.
func getAppKubeClient(ctx context.Context) (client.Client, *rest.Config, error) {
	if err := ensureServiceRunning(ctx); err != nil {

unstructured.NestedInt64 returns (0, false, nil) when the field is missing. The wait predicate at set.go:480 and :492 checks gen >= minGen, so an absent observedGeneration correctly makes the predicate false (0 < minGen) and the wait keeps going. The behavior is right; the error-discarding line is easy to misread as dropping a real error. A one-line comment at the unstructured.NestedInt64 call ("missing observedGeneration → 0 is the correct default; the predicate already filters on >= minGen") would remove the readability papercut.

cmd/rdd/set.go


	if runningVal == "true" {
		logrus.Info("Waiting for container engine to be ready")
		return watchCondition(ctx, config, func(obj *unstructured.Unstructured) bool {
			status, gen, present := conditionInfo(obj, "ContainerEngineReady")
			return present && gen >= minGen && status == metav1.ConditionTrue
		})
	}
	logrus.Info("Waiting for the VM to stop")
	return watchCondition(ctx, config, func(obj *unstructured.Unstructured) bool {
		// Running absent means the VM was never started (nothing to

S8. Image mirror naming scheme differs from volume naming scheme for no functional reason Claude¶

Volume mirrors use vol-<sha256(name)> (68 chars, opaque, deterministic from the Docker name). Image mirrors use <sanitize(id)>-<sha256(tag)> (~136 chars, encodes the Docker ID into the K8s name). Both are deterministic, both are valid RFC 1123 subdomain names — the asymmetry is purely cosmetic. The image scheme makes K8s-side debugging easier (you can eyeball the image ID from the mirror name), the volume scheme is shorter. No action needed; noting for consistency if either scheme is ever revisited.

Design Observations ¶

Concerns ¶

O(N) reconcile cost per child event (in-scope) Claude Codex. Every Container / Image / Volume watch event triggers a full App reconcile that Lists all three types and sweeps for finalizers and spec.state changes. The comment at engine_reconciler.go:188-194 documents the cost and defers the split into per-resource reconcilers. This PR does not make it worse, but S2 makes the case that the design gap also produces structural stalls (C2, I1), so the split is worth sequencing earlier than "long-term".
setEngineCondition is re-called on every steady-state reconcile purely to advance observedGeneration (in-scope) Codex. This is the mechanism the rdd set wait predicate relies on, but it also makes the condition a low-fidelity proxy for "the engine controller has observed the App generation" rather than "the container engine is actually ready for the requested state" — see I2 for the consequence. Worth revisiting once the wait predicate has a better target.

Strengths ¶

Docker daemon clock synchronization (in-scope) Claude Gemini. dockerEventsSince (docker_watcher.go:107-122) reads the daemon's own SystemTime before fullSync() and uses it as the Since filter for the event stream, closing the blind spot between the List-based snapshot and the event subscription. The host-clock fallback is explicitly documented as biased because fullSync is idempotent. This is the kind of race other embedded-VM controllers routinely get wrong.
Dual SSA field owners for finalizer-only applies (in-scope) Claude Codex. The finalizerFieldOwner pattern (engine_reconciler.go:45-50, sync_containers.go:146-158) is a clean solution to the SSA ownership problem where a finalizer-only apply would drop spec ownership from controllerName and fail the Container CRD's required-field validation. The comment explaining why the second field manager exists is exactly the kind of non-obvious design rationale that belongs inline.
Generation-threaded wait predicate (in-scope) Claude. rdd set's minGen is captured post-write and threaded through to waitForDesiredState / watchCondition, so a stale condition from a previous reconcile cannot satisfy the wait. This is the subtle correctness problem I2 picks at — but the predicate itself is right, the issue is which condition to key it on.
Consistent retry-on-conflict for multi-writer status updates (in-scope) Claude Codex. Both AppReconciler (app_controller.go:311-334) and EngineReconciler (engine_reconciler.go:241-267) use retry.RetryOnConflict + re-Get when writing to App.Status.Conditions. This replaces the older 409-loop behavior that the repo context document calls out as the right pattern for this controller.
Atomic-list SSA stabilization (in-scope) Claude Codex Gemini. Sorting RepoDigests, Container port bindings, and port names before applying (sync_containers.go:187-225, sync_images.go:171-176) avoids the Go-map-iteration-order churn that has bitten other controllers. +listType=map +listMapKey=name on Status.Ports further pins the canonical order.

Testing Assessment ¶

The new engine.bats suite is comprehensive for its scope: it covers image pull/tag/untag/dangling (including the tricky direction of tagging a dangling image), container lifecycle with status=running → exited, volume create/rm, volume-name hashing, container delete forwarding, finalizer-held in-use image delete, spec.state=created/running/unknown transitions including the paused branches, cleanup on VM stop, rdd set --timeout=0, VM restart recreating the namespace, and the containerd-backend NotApplicable path. The unit test sync_containers_test.go exhaustively covers parseContainerName.

Coverage gaps the findings expose:

Bootstrap-resource self-heal (I1). No test deletes namespace/rancher-desktop or containernamespace/moby while the engine is running and asserts that the controller rebuilds them — only the "delete containernamespace/moby does not hang" case, which succeeds for the wrong reason (the next test restarts the VM).
Watcher self-heal after event-stream failure (C1). No test simulates a Docker event-stream drop and verifies the reconciler detects the dead watcher and reconnects. This would require some Docker fault injection; worth doing even as a unit test against a fake Docker client.
Finalizer pipeline under persistent per-container error (C2). No test creates a container with a spec.state Docker keeps rejecting and then deletes a sibling Image / Volume, to verify the sibling finalizer still runs. This is the exact shape C2 would fail at.
rdd set wait with template-affecting changes (I2). No test does rdd set running=true kubernetes.enabled=true on an already-running VM and asserts that the wait holds until LimaVM has actually applied the template. The existing tests only exercise running=true on a stopped VM, where ContainerEngineReady=True is a correct signal.
Status-empty mirror delete race (I5). No test covers an ActionDelete for an Image whose status Apply hasn't landed — the orphan-mirror path.

Codex ran go test ./pkg/controllers/app/engine/controllers and go test ./pkg/controllers/containers/container during its review; both passed.

Documentation Assessment ¶

docs/design/api_app.md, docs/design/api_containers.md, and docs/design/cmd_app.md are all updated consistently with the implementation. The terminology block added to api_containers.md (K8s resource Container vs Docker engine object container) is a useful distinction that is easy to forget in reviews.

If I2 is not fixed, both cmd_app.md and the new ContainerEngineReady text in api_app.md should be tightened: as written they read like a "desired state reached" guarantee, but the condition only proves "the engine controller has observed the new App generation". The honest phrasing is "the engine controller is connected to Docker at or after App generation N", which is precisely what the signal means today.

Commit Structure ¶

The PR is a single commit, a154249, with message WIP: squash of docker-v2 engine commits for rebase — the original rebase-prep placeholder. Before this lands on main the commit message should summarize the actual user-visible change: engine mirroring of Docker containers/images/volumes into new K8s resources, finalizer forwarding in both directions, new Container.spec.state actuation, and the new rdd set --timeout wait semantics. Attempting to infer the scope from the diff alone is needlessly expensive for future git blame readers.

Acknowledged Limitations ¶

O(N) List cost on every reconcile (engine_reconciler.go:188-194) — the code calls out the cost and defers the split into per-resource reconcilers. This change does not make it worse, but S2 argues the split is load-bearing for structural reasons beyond performance.
Dropped events for one-shot image-pull / volume-create actions (docker_watcher.go:193-202) — a transient handleEvent error leaves the mirror missing until the next full resync (watcher restart or reconnect). A periodic fullSync tick is deferred.
rdd set does not wait for non-running template changes (set.go:463-466) — the TODO acknowledges that containerEngine.name=containerd alone is not waited on. I2 extends this gap to the running=true combined case, which is not captured by the existing TODO.
Container spec.state is level-triggered and known broken (MEMORY.md) — the level-trigger design fights Docker restart policy and out-of-band state changes. Jan plans to redesign with edge-triggered action annotations after containerd mirroring lands; this PR intentionally does not address it.

Agent Performance Retro ¶

Claude ¶

Claude produced the widest review by volume — 5 Important and 5 Suggestion candidates — but leaned heavy on severity at the top of the scale. Its strongest unique catches were I5 (removeImagesByID missing mirrors with empty status.id) and the isStopped/unknown observation (now S1), both of which required tracing the apply path carefully rather than just reading the diff. Its original I2 (mirror-name length exceeding 253 chars) was a speculative finding that its own text admitted "the current Docker ID format stays within bounds" — downgraded out of the findings entirely because there is no code path that can produce a name longer than ~136 chars. Several of its Important calls turned out to be style (I4 → S4) or self-healing transients (I5 → S6); the initial severity floor was too high. Coverage was complete and its file-by-file table made reconciliation straightforward.

Codex ¶

Codex found two Important issues and both were the sharpest architectural catches in the review: I1 (bootstrap resources not watched) and I2 (rdd set wait premature for template-affecting changes). Both required cross-file reasoning — I1 only makes sense once you pair the Watch list in SetupWithManager against the one-shot fullSync() creations in sync_volumes.go, and I2 required tracing the interaction between engine_reconciler.go:184-186, app_controller.go:274-290, and the LimaVM restart path across a file not changed by this PR. Codex ran go test against the two relevant packages, which neither other agent did. It missed the two critical items — C1 and C2 both require reading for control-flow hazards rather than design hazards — and used "suggestion" severity more conservatively than the other two (its BATS substitution finding became Gemini's Important I4).

Gemini ¶

Gemini found both Critical items — the defer-ordering race on w.done (C1) and the reconcile-order stall on finalizer processing (C2) — which are the two findings with the highest symptom severity in the review. Both required the kind of "read the control flow, not the comments" attention that the other two agents did not apply to the same code paths. Its Important list was narrow (volume sync inconsistency, BATS substitution) but both items were also independently raised by other agents, so they held up under consolidation. Its Suggestions section was empty and its Design Observations were thin; the investment went almost entirely into the critical findings. Gemini also did not run git blame (its daily quota is a known constraint per the review skill notes), which would not have changed the findings here since this is a single new-feature commit with no regressions against prior work to disambiguate.

Summary ¶

	Claude Opus 4.6	Codex GPT 5.4	Gemini 3.1 Pro
Duration	4m 56s	7m 27s	5m 04s
Findings	1I 7S	2I	2C 2I
Tool calls	31 (Read 12, Bash 10, Grep 6)	57 (shell 55, plan 1, stdin 1)	6 (runshellcommand 3, grepsearch 2, readfile 1)
Design observations	5	4	2
False positives	0	0	0
Unique insights	5	2	2
Files reviewed	27	27	27
Coverage misses	0	0	0
Totals	1I 7S	2I	2C 2I
Downgraded	3 (I→S)	0	0
Dropped	1	0	0

All three reviews complemented each other cleanly: Gemini owned the critical control-flow bugs, Codex owned the architectural wait-semantic and watch-gap findings, and Claude owned the breadth of file-level nits. No agent's findings substantially overlapped the others' — consolidation had to unify only the volume-sync and BATS-substitution items. Codex had the highest hit rate on Important findings after severity reconciliation; Gemini had the highest impact-per-finding.

Review Process Notes ¶

Skill improvements ¶

Require a severity-reconciliation paragraph in the retro (standing rule): When the consolidator downgrades a finding from Important to Suggestion, the retro should capture why in one sentence per downgrade. This makes it clear the downgrade was a judgment call, not an omission, and prevents the report from looking like "Claude was wrong" when the real story is "Claude set a conservative severity floor and the reviewer picked a different one". The consolidation process already does this in the reconciliation notes on each finding; make the retro paragraph also reference the reconciliation so readers do not have to scroll back to see how many downgrades happened.

Repo context updates ¶

[repo] Engine-controller state matrices are load-bearing and should be re-read, not skimmed (standing rule to add to deep-review-context.md): The engine reconciler and the Container reconcileContainerState function both have tabular comments that document "for every desired×actual pair, dispatch X". Reviewers should read the matrix and pair each cell against isStopped-like guards, because the matrix can be internally consistent while still encoding a questionable design choice (as in S1). The reviewer should not take the matrix on faith — they should verify that every helper used in the switch is aligned with the matrix's claims.