Deep Review: 20260412-203409-pr-295
| Date | 2026-04-12 20:34 |
| Repo | rancher-sandbox/rancher-desktop-daemon |
| Round | 4 (of PR #295) |
| Author | @jandubois |
| PR | #295 — Container Engine watcher |
| Commits | 33db27d Address review round 3 findings for PR #295cf4b90e ci: disable Spotlight indexing on macOS BATS runnersc484dab ci: capture memory pressure and API state in support bundle3196b24 bats: park spec.state=unknown before pausing test-state84c0736 engine: apply Container/Image/Volume capitalization convention…plus 14 earlier commits |
| Reviewers | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| Verdict | Merge with fixes — the spec.state=created convergence bug (I1) fires redundant ContainerStop calls on every unrelated watch event, and the App controller's blind Status().Update (I2) now races the new setEngineCondition writer. Neither is a functional break but both deserve a fix before merge. |
| Wall-clock time | 57 min 34 s |
Executive Summary ¶
This round adds an engine controller that mirrors Docker state (Container, Image, Volume) into the rdd Kubernetes API, forwards user-initiated deletes back to Docker, and makes rdd set wait for the desired state by default. The core structure — dockerWatcher for Docker-side I/O, EngineReconciler for controller-runtime integration — is clean and carefully commented, and the BATS coverage is thorough. Review round 3's findings have been addressed.
Structure: 0 critical, 4 important, 13 suggestions, 3 design observations.
The dominant issue is a single line of state-matching logic in reconcileContainerState: once a Container with spec.state=created reaches Docker's exited state, desired != actual keeps dispatching ContainerStop on every reconcile. This was independently caught by all three agents. Three other Important findings surface real concurrency or drift bugs that happen to be structural rather than catastrophic: the App controller's pre-existing Status().Update is no longer safe now that the engine controller writes the same conditions, rdd set's watch predicate ignores the ObservedGeneration the engine already stamps, and the status.ports list is built in non-deterministic map-iteration order which churns atomic-list SSA applies.
Critical Issues ¶
None.
Important Issues ¶
reconcileContainerState dispatches redundant ContainerStop every reconcile once a created-desired container reaches exited Claude Codex Gemini¶
return w.k8s.Update(ctx, latest)
})
if retryErr != nil {
return fmt.Errorf("failed to remove finalizer from %s: %w", name, retryErr)
}
deleteTarget := obj.DeepCopyObject().(client.Object)
deleteTarget.SetName(name)
deleteTarget.SetNamespace(apiNamespace)
return client.IgnoreNotFound(w.k8s.Delete(ctx, deleteTarget))
}
// reconcileContainerState checks the Container's spec.state against the
// observed status and dispatches the Docker action that bridges them.
// The engine creates Containers with spec.state="unknown", which the
// reconciler ignores.
//
// Full state matrix (rows=desired from the spec.state enum, columns=actual
// from Docker). The webhook restricts desired to {unknown, created,
// running}; actual can be any value in the CRD enum.
//
// desired \ actual | running | paused | created | exited | dead | pausing/restarting | removing | unknown
// ------------------+---------+----------+---------+--------+------+--------------------+----------+--------
// running | nil | unpause | start | start | start| start | start | start
// created | stop | stop | nil | nil | nil | stop | stop | stop
//
// ContainerStop handles paused/restarting containers natively, so the
// created branch always dispatches Stop from any non-stopped state. The
// running branch handles paused explicitly because Docker rejects
// ContainerStart on a paused container. isStopped() treats
// created/exited/dead as terminal stopped states: the user's intent
// ("do not run") is already satisfied and redispatching Stop would be
// wasted work on every unrelated reconcile event.
func (w *dockerWatcher) reconcileContainerState(ctx context.Context, c *containersv1alpha1.Container) error {
desired := c.Spec.State
if desired == containersv1alpha1.ContainerStatusUnknown {
return nil
}
actual := c.Status.Status
log := logf.FromContext(ctx).WithName("docker-watcher")
switch desired {
case containersv1alpha1.ContainerStatusRunning:
if actual == containersv1alpha1.ContainerStatusRunning {
return nil
}
if actual == containersv1alpha1.ContainerStatusPaused {
After the user patches spec.state=created on a running container, ContainerStop transitions the container to Docker's exited state (not created), and applyContainer writes status.status=exited at sync_containers.go:157. On the next reconcile — triggered by any Container/Image/Volume watch event via SetupWithManager's unconditional enqueueApp mapping — desired == "created" and actual == "exited", so the desired != actual check at line 341 falls through and dispatches ContainerStop again. Docker's ContainerStop on a non-running container returns NotModified which the client treats as success, so no error propagates — but the reconciler never reaches a quiescent state for this container. Cost scales with the rate of unrelated watch events times the number of created-desired stuck containers. The same issue applies to actual == "dead" or actual == "created" when desired == "created".
finalizerOnly := containersv1alpha1apply.Container(inspect.ID, apiNamespace).
WithFinalizers(mirrorFinalizer)
if err := w.k8s.Apply(ctx, finalizerOnly,
client.ForceOwnership, client.FieldOwner(finalizerFieldOwner)); err != nil {
return fmt.Errorf("failed to reassert container finalizer %s: %w", inspect.ID, err)
}
}
// Build status.
statusApply := containersv1alpha1apply.ContainerStatus().
WithName(name).
The new BATS test at engine.bats:231-248 only verifies the one-shot transition to exited and does not observe the subsequent churn, so this isn't caught by CI.
"${image_ref}" --timeout=30s
}
# --- Container state transitions via spec ---
@test "patching spec.state=created stops Docker container" {
docker run -d --name test-state busybox sleep 3600
run -0 docker inspect test-state --format '{{.Id}}'
cid=${output}
rdd ctl wait --for=jsonpath='{.status.status}'=running \
--namespace="${NAMESPACE}" container/"${cid}" --timeout=30s
rdd ctl patch container "${cid}" --namespace="${NAMESPACE}" \
--type=merge -p '{"spec":{"state":"created"}}'
rdd ctl wait --for=jsonpath='{.status.status}'=exited \
--namespace="${NAMESPACE}" container/"${cid}" --timeout=30s
run -0 docker inspect test-state --format '{{.State.Status}}'
assert_output "exited"
}
@test "patching spec.state=running restarts Docker container" {
run -0 docker inspect test-state --format '{{.Id}}'
cid=${output}
Fix: treat any stopped state (created, exited, dead) as satisfying desired=created, symmetric with the unpause handling in the Running branch at lines 354-363.
func (w *dockerWatcher) reconcileContainerState(ctx context.Context, c *containersv1alpha1.Container) error {
desired := c.Spec.State
if desired == containersv1alpha1.ContainerStatusUnknown {
return nil
}
actual := c.Status.Status
- if desired == actual {
- return nil
- }
+ // Treat any stopped state as satisfying desired=created.
+ if desired == containersv1alpha1.ContainerStatusCreated {
+ switch actual {
+ case containersv1alpha1.ContainerStatusCreated,
+ containersv1alpha1.ContainerStatusExited,
+ containersv1alpha1.ContainerStatusDead:
+ return nil
+ }
+ }
+ if desired == actual {
+ return nil
+ }
(Gemini flagged this as Critical on the grounds of "Docker API spam loop"; Claude and Codex called it Important. Calibrated as Important here: the loop is bounded by unrelated event frequency, NotModified is a silent success, and no error propagates — the cost is wasted RPCs, not a hang or failure.)
App.Status.Conditions now has two writers but the AppReconciler still uses a blind Status().Update Codex¶
if err := r.Update(ctx, limaVM); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to propagate running state to LimaVM: %w", err)
}
return ctrl.Result{}, nil
}
// Mirror LimaVM status conditions onto the App status. The engine
// reconciler writes ContainerEngineReady on the same object, so the
// App's resourceVersion from the initial Get can be stale by the time
// we write. retry.RetryOnConflict + re-Get matches the pattern in
// EngineReconciler.setEngineCondition so concurrent writers no longer
// 409-loop through controller-runtime requeue. Truncate messages
// defensively: the LimaVM controller already truncates at source, but
// guarding here ensures a future bypass can't cause the App status
// update to fail CRD validation.
if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
latest := &v1alpha1.App{}
if err := r.Get(ctx, client.ObjectKeyFromObject(&app), latest); err != nil {
return err
}
changed := false
for _, cond := range limaVM.Status.Conditions {
changed = apimeta.SetStatusCondition(&latest.Status.Conditions, metav1.Condition{
Type: cond.Type,
Status: cond.Status,
Reason: cond.Reason,
Message: base.TruncateConditionMessage(cond.Message),
ObservedGeneration: latest.Generation,
LastTransitionTime: cond.LastTransitionTime,
}) || changed
}
if !changed {
return nil
This PR adds a second writer to App.Status.Conditions via setEngineCondition (engine_reconciler.go:223-240), which correctly wraps its update in retry.RetryOnConflict with a re-Get. The existing AppReconciler still does not: the comment at line 230-231 claims the resourceVersion from the initial Get is still current, which was true when the App controller was the sole writer but is no longer true once the engine reconciler can land a ContainerEngineReady update in between the Get and the Update. The repo overview explicitly calls this out: *"App.Status.Conditions has multiple concurrent writers (AppReconciler mirrors LimaVM conditions; EngineReconciler writes ContainerEngineReady). Flag any non-SSA, non-retry-on-conflict Status().Update on the App object."*
if w != nil {
w.stop()
}
}
// setEngineCondition updates the ContainerEngineReady condition on the
// App resource. The App controller also writes App.Status.Conditions
// (to mirror LimaVM conditions) and controller-runtime does not
// serialize reconciles across controllers, so a naive Update can race
// and 409. retry.RetryOnConflict plus a re-Get inside the loop is the
// same pattern used elsewhere in this file (see removeMirrorResource,
// deleteAllOfType).
func (r *EngineReconciler) setEngineCondition(ctx context.Context, app *appv1alpha1.App, status metav1.ConditionStatus, reason, message string) error {
return retry.RetryOnConflict(retry.DefaultRetry, func() error {
latest := &appv1alpha1.App{}
if err := r.Get(ctx, client.ObjectKey{Name: app.Name}, latest); err != nil {
// The App can be deleted concurrently with the reconciler
// (e.g., rdd svc delete mid-reconcile). Treat NotFound as a
// no-op so we do not propagate a spurious error that would
// requeue only to be IgnoreNotFound-dropped on the next
// reconcile.
if apierrors.IsNotFound(err) {
return nil
}
return err
}
changed := meta.SetStatusCondition(&latest.Status.Conditions, metav1.Condition{
Type: conditionContainerEngineReady,
This isn't catastrophic — controller-runtime requeues on the returned conflict error and the next reconcile re-Gets — but it turns routine status churn into avoidable 409s and noisy reconcile-error logs.
Fix: convert the App controller's status write to the same retry.RetryOnConflict + re-Get pattern used in setEngineCondition.
if statusChanged {
if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
latest := &v1alpha1.App{}
if err := r.Get(ctx, client.ObjectKeyFromObject(&app), latest); err != nil {
return err
}
for _, cond := range limaVM.Status.Conditions {
apimeta.SetStatusCondition(&latest.Status.Conditions, metav1.Condition{
Type: cond.Type,
Status: cond.Status,
Reason: cond.Reason,
Message: base.TruncateConditionMessage(cond.Message),
ObservedGeneration: latest.Generation,
LastTransitionTime: cond.LastTransitionTime,
})
}
return r.Status().Update(ctx, latest)
}); err != nil {
log.Error(err, "Unable to update App status")
return ctrl.Result{}, err
}
}
// waitForDesiredState waits for the App to reach the state implied by the
// properties that were set. If running=true was set, it waits for the
// container engine to be ready. If running=false was set, it waits for the
// App's Running condition to go to False — i.e. the VM has actually
// stopped, which is stricter than "container engine disconnected".
// Other property changes do not wait.
//
// minGen is the App's metadata.generation after our write. The predicate
// only accepts condition snapshots whose ObservedGeneration >= minGen,
// so a stale ContainerEngineReady=True from a previous backend (or a
// stale Running=False from a prior stop) cannot satisfy the wait before
// the reconciler has observed the new spec. EngineReconciler and
// AppReconciler both stamp ObservedGeneration when they update conditions.
//
// TODO: changes that trigger a VM restart without changing "running" (e.g.
// containerEngine.name alone) are still not waited on. A dedicated
// "Reconciled" condition on App would let the CLI detect when the full
// reconcile chain has settled for any property change.
func waitForDesiredState(ctx context.Context, config *rest.Config, properties map[string]string, timeout time.Duration, minGen int64) error {
runningVal, ok := properties["running"]
if !ok {
return nil
}
ctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
if runningVal == "true" {
logrus.Info("Waiting for container engine to be ready")
return watchCondition(ctx, config, func(obj *unstructured.Unstructured) bool {
status, gen, present := conditionInfo(obj, "ContainerEngineReady")
The predicate only checks Status == True and has no generation handoff from the preceding patch. The "Known hazard" comment at cmd/rdd/set.go:445-452 acknowledges that rdd set containerEngine.name=<other> running=true on an already-running VM can be satisfied by the stale Connected condition left over from the previous backend — but setEngineCondition already writes ObservedGeneration: latest.Generation at engine_reconciler.go:234, so the fix is cheap: capture the generation from the patch response in setAction and compare it in the predicate. The waitForDesiredState signature needs a minGen int64 parameter, patchApp/createAndPatchApp need to return the post-patch generation, and watchCondition's predicate gains cond.ObservedGeneration >= minGen.
if dryRun {
logrus.Info("App validated (dry run)")
} else {
logrus.Info("App updated")
}
return app.Generation, nil
}
// waitForDesiredState waits for the App to reach the state implied by the
// properties that were set. If running=true was set, it waits for the
// container engine to be ready. If running=false was set, it waits for the
// App's Running condition to go to False — i.e. the VM has actually
// stopped, which is stricter than "container engine disconnected".
// Other property changes do not wait.
//
// minGen is the App's metadata.generation after our write. The predicate
// only accepts condition snapshots whose ObservedGeneration >= minGen,
// deleteAllOfType).
func (r *EngineReconciler) setEngineCondition(ctx context.Context, app *appv1alpha1.App, status metav1.ConditionStatus, reason, message string) error {
return retry.RetryOnConflict(retry.DefaultRetry, func() error {
latest := &appv1alpha1.App{}
if err := r.Get(ctx, client.ObjectKey{Name: app.Name}, latest); err != nil {
// The App can be deleted concurrently with the reconciler
// (e.g., rdd svc delete mid-reconcile). Treat NotFound as a
// no-op so we do not propagate a spurious error that would
// requeue only to be IgnoreNotFound-dropped on the next
// reconcile.
if apierrors.IsNotFound(err) {
Without the fix, rdd set containerEngine.name=containerd (without changing running) is the worst case: no wait triggers at all because no running argument is set, even though a full VM restart follows. With running=true included, the wait happens but can be satisfied by a pre-patch snapshot.
Fix sketch (conceptual, paired with I4 in rdd set.go):
// In setAction, after patchApp/createAndPatchApp:
var app appv1alpha1.App
if err := c.Get(ctx, client.ObjectKey{Name: "app"}, &app); err != nil {
return fmt.Errorf("failed to get App after patch: %w", err)
}
minGen := app.Generation
// In waitForDesiredState predicate:
return conditionStatusForGeneration(obj, "ContainerEngineReady", metav1.ConditionTrue, minGen)
Claude flagged this as Important because the fix is already available (the stamp exists), not because the hazard itself is new.
status.ports is built in non-deterministic map-iteration order and churns the atomic list on every sync Claude¶
if inspect.State.Error != "" {
statusApply.WithError(inspect.State.Error)
}
if t, err := time.Parse(time.RFC3339Nano, inspect.Created); err == nil {
statusApply.WithCreatedAt(metav1.NewTime(t))
}
if t, err := time.Parse(time.RFC3339Nano, inspect.State.StartedAt); err == nil && !t.IsZero() {
statusApply.WithStartedAt(metav1.NewTime(t))
}
if t, err := time.Parse(time.RFC3339Nano, inspect.State.FinishedAt); err == nil && !t.IsZero() {
statusApply.WithFinishedAt(metav1.NewTime(t))
}
// Port bindings. Go map iteration is randomized, so sort by port
// name before building the apply slice: even with
// +listType=map +listMapKey=name on Status.Ports, the entries land
// in the stored object in the order we provide, and a stable order
// avoids spurious resourceVersion churn when consumers render the
// list or diff it.
var applyPorts []*containersv1alpha1apply.ContainerPortApplyConfiguration
if inspect.NetworkSettings != nil {
portNames := make([]mobynetwork.Port, 0, len(inspect.NetworkSettings.Ports))
for p := range inspect.NetworkSettings.Ports {
inspect.NetworkSettings.Ports is a Go map, so iteration order is randomized per process. The Container CRD declares status.ports as a plain array with no x-kubernetes-list-type annotation (verified against pkg/controllers/containers/container/crd.yaml: only status.conditions carries x-kubernetes-list-type: map). Lists without a list-type default to atomic, which is position-sensitive: the apiserver treats [A, B] and [B, A] as different values and produces a new resourceVersion on every apply that reorders them. That fires a watch event, which the engine reconciler catches via Watches(&containersv1alpha1.Container{}, enqueueApp), which re-enters reconcileContainerSpecs + processFinalizers — a four-list sweep per watch event. For a container with two or more exposed ports, every unrelated Docker event (image pull, volume create, etc.) can amplify into a status re-apply and a full reconcile sweep.
Fix: sort inspect.NetworkSettings.Ports by port name before building the apply slice. The inner bindings loop is Docker-ordered (IPv4 then IPv6) and does not need sorting unless Docker's internal order changes.
if inspect.NetworkSettings != nil {
+ portNames := make([]mobynat.Port, 0, len(inspect.NetworkSettings.Ports))
+ for p := range inspect.NetworkSettings.Ports {
+ portNames = append(portNames, p)
+ }
+ sort.Slice(portNames, func(i, j int) bool { return portNames[i] < portNames[j] })
- for portName, ports := range inspect.NetworkSettings.Ports {
+ for _, portName := range portNames {
+ ports := inspect.NetworkSettings.Ports[portName]
Alternatively, annotate Ports with +listType=map + +listMapKey=name in container_types.go so the apiserver merger ignores ordering — this is more robust but requires a CRD regeneration and is a larger change.
Suggestions ¶
// neither — or the watcher will open a different socket from the one
// Lima is port-forwarding to, and mirroring will silently diverge per
// instance. Until then, multi-instance BATS runs cannot share the
// same host and the engine.bats test has a parallel TODO at line
// 12-14.
var DockerSocket = sync.OnceValue(func() string {
home, err := os.UserHomeDir()
if err != nil {
panic(fmt.Errorf("could not get home directory: %w", err))
}
dir := filepath.Join(home, ".rd2")
_ = os.MkdirAll(dir, 0o700)
return filepath.Join(dir, "docker.sock")
})
If MkdirAll fails (permission denied, read-only filesystem), the error is discarded and the returned path may point at a non-existent parent. dockerclient.WithHost("unix://" + socketPath) in docker_watcher.go:47 then fails at connect time with a cryptic "no such file or directory". Sibling helpers (LogDir, ShortDir, etc.) don't do MkdirAll — the directory is really Lima's responsibility, and creating it here is inconsistent.
// newDockerWatcher creates a Docker client, performs a full sync, and starts
// the event stream watcher goroutine.
func newDockerWatcher(ctx context.Context, k8s client.Client, reconcileChan chan<- event.GenericEvent) (*dockerWatcher, error) {
socketPath := instance.DockerSocket()
cli, err := dockerclient.New(
dockerclient.WithHost("unix://" + socketPath),
)
if err != nil {
return nil, fmt.Errorf("failed to create Docker client: %w", err)
}
Fix: drop the MkdirAll call entirely and let Lima own the directory lifecycle.
if w != nil {
w.stop()
}
}
// setEngineCondition updates the ContainerEngineReady condition on the
// App resource. The App controller also writes App.Status.Conditions
// (to mirror LimaVM conditions) and controller-runtime does not
// serialize reconciles across controllers, so a naive Update can race
// and 409. retry.RetryOnConflict plus a re-Get inside the loop is the
// same pattern used elsewhere in this file (see removeMirrorResource,
// deleteAllOfType).
func (r *EngineReconciler) setEngineCondition(ctx context.Context, app *appv1alpha1.App, status metav1.ConditionStatus, reason, message string) error {
return retry.RetryOnConflict(retry.DefaultRetry, func() error {
latest := &appv1alpha1.App{}
if err := r.Get(ctx, client.ObjectKey{Name: app.Name}, latest); err != nil {
// The App can be deleted concurrently with the reconciler
// (e.g., rdd svc delete mid-reconcile). Treat NotFound as a
// no-op so we do not propagate a spurious error that would
// requeue only to be IgnoreNotFound-dropped on the next
// reconcile.
if apierrors.IsNotFound(err) {
return nil
}
return err
}
changed := meta.SetStatusCondition(&latest.Status.Conditions, metav1.Condition{
Type: conditionContainerEngineReady,
Status: status,
Reconcile wraps its initial Get in client.IgnoreNotFound, but if the App is deleted between that Get and a later setEngineCondition call (e.g., rdd svc delete runs concurrently), the inner Get returns NotFound and retry.RetryOnConflict does not treat NotFound as retriable — the error propagates up and the reconciler returns it, causing a gratuitous requeue-then-bail.
Fix: treat NotFound as a no-op inside the retry closure.
return retry.RetryOnConflict(retry.DefaultRetry, func() error {
latest := &appv1alpha1.App{}
if err := r.Get(ctx, client.ObjectKey{Name: app.Name}, latest); err != nil {
+ if apierrors.IsNotFound(err) {
+ return nil
+ }
return err
}
_, err := w.cli.ContainerStop(ctx, c.Name, dockerclient.ContainerStopOptions{})
return err
}
return nil
}
// isStopped reports whether a container status represents a terminal
// non-running state. Every state listed here satisfies a desired state
// of "created": the container is not running, cannot transition without
// a user action (start/rm), and dispatching ContainerStop again would
// be a no-op.
func isStopped(s containersv1alpha1.ContainerStatusValue) bool {
switch s {
case containersv1alpha1.ContainerStatusCreated,
containersv1alpha1.ContainerStatusExited,
containersv1alpha1.ContainerStatusDead:
return true
}
return false
}
// deleteContainer removes a container from Docker. NotFound errors are
Consider: (1) Image A is tagged busybox:latest, mirror A holds Status.RepoTag=busybox:latest. (2) docker pull busybox:latest brings in image B, moves the tag, emits untag(A)+tag(B). (3) Before the watcher processes those events, the user deletes mirror A. processImageFinalizers reads the stale mirror A, calls ImageRemove("busybox:latest"), which now hits image B and untags it. The ID-based path would have untagged A correctly.
The intent of using the tag is presumably to untag-only rather than remove — so that a multi-tagged image keeps its other tags. A safer approach re-inspects the image and verifies Inspect().RepoTags still contains the tag before calling ImageRemove, or removes by status.ID + a ForceOwnership: false flag.
processVolumeFinalizers at lines 426-455 guards w.deleteVolume(ctx, v.Status.Name) with if v.Status.Name != "" because a bare-skeleton mirror has no engine name to forward to. The Image path has no equivalent guard: if a user directly creates an Image mirror via kubectl (with the mirror finalizer) and then deletes it before status is populated, deleteImage passes an empty string to ImageRemove, which the daemon rejects. Since the Container webhook only intercepts Update not Create, such a bare mirror is technically reachable.
}
return r.Update(ctx, latest)
})
if retryErr != nil {
errs = append(errs, fmt.Errorf("failed to remove finalizer from Container %s: %w", c.Name, retryErr))
}
}
return errors.Join(errs...)
}
func (r *EngineReconciler) processImageFinalizers(ctx context.Context, w *dockerWatcher) error {
var images containersv1alpha1.ImageList
if err := r.List(ctx, &images, client.InNamespace(apiNamespace)); err != nil {
return err
}
var errs []error
for i := range images.Items {
img := &images.Items[i]
if img.DeletionTimestamp == nil || !hasFinalizer(img, mirrorFinalizer) {
continue
}
// An Image mirror with no engine-populated status has no Docker
// reference to forward the delete to — either a user created it
// as a bare skeleton or it landed in the startup race window
// before applyImage ran. Symmetric with processVolumeFinalizers'
// empty-Status.Name guard: strip the finalizer and let the
// Delete proceed.
if img.Status.ID != "" || img.Status.RepoTag != "" {
if err := w.deleteImage(ctx, img); err != nil {
errs = append(errs, fmt.Errorf("failed to delete image %s from Docker: %w", img.Name, err))
continue
}
}
if removeFinalizer(img, mirrorFinalizer) {
if err := client.IgnoreNotFound(r.Update(ctx, img)); err != nil {
errs = append(errs, fmt.Errorf("failed to remove finalizer from Image %s: %w", img.Name, err))
}
}
}
return errors.Join(errs...)
Fix: symmetrically guard the Image path with img.Status.ID != "" || img.Status.RepoTag != "".
parseContainerName("/") returns ("moby", "") which downstream CRD validation would reject Claude¶
The table test at sync_containers_test.go:42-51 pins the "/" result as ("moby", ""). That flows into ContainerStatus().WithName("") at sync_containers.go:151, and status.name is +required in container_types.go:73-75, so the apiserver would reject the apply with a validation error. In practice inspect.Name from Docker is never / because Docker always generates a name, so this is defensive only. But the current test pins a result that in the hot path would produce an invalid mirror.
wantNamespace: containerNamespace,
wantName: "",
},
{
input: "/",
wantNamespace: containerNamespace,
wantName: "",
},
}
for _, tc := range tests {
t.Run(tc.input, func(t *testing.T) {
ns, name := parseContainerName(tc.input)
if ns != tc.wantNamespace || name != tc.wantName {
t.Errorf("parseContainerName(%q) = (%q, %q), want (%q, %q)",
tc.input, ns, name, tc.wantNamespace, tc.wantName)
}
})
}
}
// controllerName. SSA treats fields absent from an apply config
// as released by that manager; releasing spec from controllerName
// would prune spec (no other manager owns it) and fail the
// required-field validation on the Container CRD. A separate
// manager that only ever touches finalizers keeps controllerName's
// spec ownership untouched.
finalizerOnly := containersv1alpha1apply.Container(inspect.ID, apiNamespace).
WithFinalizers(mirrorFinalizer)
if err := w.k8s.Apply(ctx, finalizerOnly,
client.ForceOwnership, client.FieldOwner(finalizerFieldOwner)); err != nil {
return fmt.Errorf("failed to reassert container finalizer %s: %w", inspect.ID, err)
// +kubebuilder:validation:Enum=created;running;unknown
State ContainerStatusValue `json:"state"`
}
// ContainerStatus defines the observed state of the container.
type ContainerStatus struct {
// Name of the container; this is distinct from the container ID.
//
// +required
Name string `json:"name"`
// Namespace is the container namespace; refers to a `ContainerNamespace`
// object in the same Kubernetes namespace.
//
Fix: return an explicit error or fall back to a synthetic name (e.g., the container ID). A test assertion + a clear error is better than silently producing an invalid apply.
If a user has initiated kubectl delete container/... (mirror has DeletionTimestamp) and Docker then emits a container event (the container dies on its own), applyContainer reaches the else branch and tries to re-assert the mirror finalizer via SSA. Adding a finalizer to a deleting object is typically a no-op under SSA (the finalizer is still present), but if any other field manager stripped it the reassertion fails with a cryptic error and the next Docker event for the same container repeats the failure.
Fix: skip the finalizer reassertion when existing.DeletionTimestamp != nil.
finalizerOnly := containersv1alpha1apply.Container(inspect.ID, apiNamespace).
WithFinalizers(mirrorFinalizer)
if err := w.k8s.Apply(ctx, finalizerOnly,
client.ForceOwnership, client.FieldOwner(finalizerFieldOwner)); err != nil {
return fmt.Errorf("failed to reassert container finalizer %s: %w", inspect.ID, err)
}
}
// Build status.
statusApply := containersv1alpha1apply.ContainerStatus().
WithName(name).
inspect.State.Status is a free-form Docker string, directly cast into the CRD enum without validation. The CRD enum at container_types.go:11 currently covers every documented Docker state, but a future Docker release that introduces a new state string (initializing, error, etc.) would cause SSA applies to fail with CRD validation errors, the per-container error would be logged and skipped by syncAllContainers:57-59, and that container's mirror would silently fall behind.
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// +kubebuilder:validation:Enum=created;running;pausing;paused;restarting;removing;exited;dead;unknown
// ContainerStatusValue describes the status of a container.
type ContainerStatusValue string
// Possible values for ContainerStatusValue.
Fix: whitelist the known states and fall back to unknown on mismatch, so forward-compat with Docker is explicit rather than implicit.
Pairs with I3: neither createAndPatchApp nor patchApp returns the metadata.generation of the post-write App object, so waitForDesiredState has no way to filter stale condition snapshots in the informer's initial list. The fix is a small plumbing change — return app.Generation from the patch path and thread it into watchCondition's predicate.
return
}
log.Error(err, "Docker event stream error")
w.enqueueReconcile()
return
case msg, ok := <-result.Messages:
if !ok {
// The daemon closed the event stream without writing to
// result.Err. Treat the same way as an error: wake the
// reconciler so it observes the dead watcher and starts
// a fresh one.
log.Info("Docker event stream closed")
w.enqueueReconcile()
return
}
if err := w.handleEvent(ctx, msg); err != nil {
If the messages channel closes (daemon disconnect), the select continuously yields a zero-value events.Message. handleEvent safely ignores unknown event types (default → return nil), so this isn't a hard spin — the loop will also see result.Err close and terminate — but explicit msg, ok := <-result.Messages with a closed-channel exit is safer and matches the Go idiom.
removeMirrorResource wraps its finalizer-strip in retry.RetryOnConflict; processContainerFinalizers does not. On a 409, the whole reconcile requeues and re-runs deleteContainer, which is idempotent but wastes an API round-trip. Consistency across finalizer paths would simplify reasoning.
instance="${RDD_INSTANCE:-2}"
# Locate the rdd binary relative to this script rather than via PATH.
# CI runners do not add <repo>/bin to PATH, and bats targets invoke
# us with `../scripts/bats-with-timeout.sh` from the bats/ directory.
script_dir=$(cd "$(dirname "$0")" && pwd)
rdd_bin="${script_dir}/../bin/rdd"
if [ ! -x "$rdd_bin" ] && [ -x "${rdd_bin}.exe" ]; then
rdd_bin="${rdd_bin}.exe"
fi
if [ ! -x "$rdd_bin" ]; then
echo "bats-with-timeout: rdd binary not found at $rdd_bin; run 'make' first" >&2
exit 2
fi
# `rdd svc paths log_dir` is a pure local computation (see
# cmd/rdd/service_paths.go): it resolves the path from the instance
# suffix without touching the running service, so it is safe to call
# even when the target under test is hung.
log_dir=$("$rdd_bin" svc paths log_dir)
mkdir -p "${log_dir}"
bundle_file="${log_dir}/support-bundle.log"
If neither bin/rdd nor bin/rdd.exe exists (a developer running the script before make), the $() subshell fails with a generic shell error and set -o errexit aborts without explaining what went wrong.
Fix: [ -x "$rdd_bin" ] || { echo "bats-with-timeout: rdd binary not found at $rdd_bin; run 'make' first" >&2; exit 2; } before the command.
The log field hardcodes pgid to haCmd.Process.Pid on the assumption that process.SetGroup makes the child a process-group leader. The comment explains the assumption but doesn't cross-reference SetGroup, so a future change that drops Setpgid would silently break downstream tooling (the new bats-with-timeout.sh pgid matching).
Fix: query the real pgid via syscall.Getpgid(haCmd.Process.Pid) after Start() and log the result. One extra syscall for a defensible invariant.
Top-level assignments run at file load and export DOCKER_HOST globally to every test in the file (and any subshell). This is harmless today but will leak through if another BATS suite ever sources this file. The documented .rd2 TODO replacement will be cleaner if DOCKER_HOST is scoped to local_setup_file instead of being module-level.
Design Observations ¶
Concerns ¶
- Engine reconciler iterates all containers on every unrelated watch event —
pkg/controllers/app/engine/controllers/engine_reconciler.go:170-182Claude Codex(future). The author's TODO at lines 170-176 acknowledges this. Concrete cost: everyContainer/Image/Volumeevent triggers a fullreconcileContainerSpecs + processFinalizerssweep (4 List calls + O(N) Docker RPCs in the stuck-state case — see I1). Splitting into per-kind reconcilers with watch predicates (deletion timestamp set,spec.state changed) would bound cost to O(1) per event. Not blocking, but I1 is a direct consequence of this design. - Engine namespace and
containernamespace/mobyare not repaired after user-initiated delete —pkg/controllers/app/engine/controllers/engine_reconciler.go:457-475,pkg/controllers/app/engine/controllers/sync_volumes.go:56-66Codex(in-scope).ensureNamespaceandsyncContainerNamespacerun only infullSyncat watcher startup;SetupWithManagerdoes not watchcorev1.NamespaceorContainerNamespace. A user delete of either resource is therefore not repaired until the Docker watcher is torn down and restarted. For the mobyContainerNamespacethis is documented atsync_volumes.go:56-61as a deliberate design choice (to avoid a finalizer trap), and the BATS test atengine.bats:355-361codifies the "stays deleted" behavior. For therancher-desktopNamespace there is no such deliberation — if deleted, subsequent mirror applies would fail with NotFound until the next watcher restart. Users are unlikely to delete these resources in an embedded single-user context, so Codex's Important severity is demoted to a design observation; but aWatches(&corev1.Namespace{}, enqueueAppIfName("rancher-desktop"))would close the gap cleanly. Container.Statushas concurrent SSA and non-SSA writers —pkg/controllers/containers/container/container_controller.go(pre-PR) andpkg/controllers/app/engine/controllers/sync_containers.go(new) Claude(future). The pre-existing container reconciler writesstatus.Conditionsvia a plainr.Status().Update()without a field manager, while the new engine controller writesstatus.Name,status.Status,status.Ports, etc. via SSA with field managerengine-controller. EachStatus().Updatesends the full status object, so the container reconciler's client implicitly takes ownership of engine-written fields, which the engine controller then forcibly reclaims viaForceOwnershipon the next sync. Functional today, but the repo-context rule about concurrent writers onApp.Status.Conditionsapplies symmetrically here. Migrating the container reconciler to SSA with a distinct field manager would eliminate the ping-pong.
Strengths ¶
- Clean separation between
dockerWatcherandEngineReconcilerClaude. Docker-side I/O and controller-runtime integration are split cleanly, making ownership of state obvious. - Daemon-clock
Sincefilter is the right design choice Claude Codex Gemini.dockerEventsSince()uses Docker'sInfo().SystemTimeto anchor the event stream'sSincefilter rather than hosttime.Now(), avoiding the host/guest clock-skew bug that would otherwise drop startup-window events. The host-clock fallback with a 2-minute bias is pragmatic for the rare case whereInfois unavailable. - Race/retry decisions are commented with reasoning, not just mechanics Claude.
sync_containers.go:102-122on whycreateomitsForceOwnershipis a textbook example — it captures the "fail loudly on concurrent user patch" intent that a terse SSA snippet would hide. errors.Jointhroughout cleanup/sync paths Claude.cleanupMirrorResources,processFinalizers,syncAllContainers, andsyncAllImagesconsistently avoid the "one stuck object blocks the rest" footgun.NotApplicable=Truefor containerd backend Gemini. ReportingContainerEngineReady=Truewith ReasonNotApplicableis a clean way to letrdd set running=true containerEngine.name=containerdcomplete its wait even though mirroring is a no-op — with the caveat about Reason discrimination captured in I3.
Testing Assessment ¶
BATS coverage is thorough for happy-path transitions and the non-obvious cases (untag-leaves-dangling, tagging-dangling-removes-mirror, in-use image finalizer, paused-to-created, paused-to-running). Untested scenarios ranked by risk:
- Steady-state reconcile cadence after
spec.state=created(I1). No test verifies that oncestatus.status=exitedis observed, subsequent unrelated events stop dispatchingContainerStop. A unit test onreconcileContainerStatewith a counting fake Docker client would catch this without introducing flakiness. - Watcher death and reconnect while the VM is still running. The
watcherDiedpath inReconcileatengine_reconciler.go:88-106handles it, but nothing exercises the "Docker is still up but the event stream closed" scenario (proxy timeout, API version compaction). Simulatable by killing the Docker client's underlying TCP socket. - Containers in
dead/removingstates withdesired=created. Claude's I1 plus S7 flag this region as ambiguous; no test covers it. - Concurrent create-then-patch race.
sync_containers.go:117-122's comment explicitly relies on a "fail loudly on conflict" contract, but no test verifies the 409 is retried on the next reconcile. - Full sync after
rdd svc deletefollowed byrdd svc startwith prior Docker state still present. Verifies the stale-mirror prune path actually runs. --timeout=0preserves legacy no-wait behavior. Covered implicitly byrdd set --timeout=10s running=falseatengine.bats:346but not explicitly tested for=0.- Windows moby startup with
rdd set running=true. Currently skipped viaskip_on_windowsinlocal_setup_file; when WSL2 socket forwarding lands this becomes exercisable.
Documentation Assessment ¶
docs/design/api_containers.mdis correctly updated for the new engine mirroring behavior and thespec.state=unknowndefault — well done.docs/design/api_app.mdstill describesApp.Status.Conditionsas mirrored from LimaVM; it should mention the newContainerEngineReadycondition and its writer ownership Codex.docs/design/cmd_app.mdstill describesrdd setas a pure patch command and does not document the new default wait behavior or the--timeout=0escape hatch Codex.reconcileContainerStatehas good per-branch comments but no top-of-function "desired vs actual → action" table. A three-row table would have surfaced I1 during authoring.- The CRD enum restriction on
spec.state(created;running;unknown) lives only in the kubebuilder annotation; the doc comment atcontainer_types.go:55-66could inline the valid values in prose.
Commit Structure ¶
Commit history is unusually clean. Related chunks — the Docker client dependency, the engine controller, the webhook, docs, BATS tests, rdd set wait — each get their own commit. CI-only changes are separated from code changes. The four "Address review round N findings" commits are preserved to make each round's response visible against the baseline, which is acceptable under the project's git-workflow guidance.
The diff is large (~132KB, 27 files) but internally consistent. Splitting the engine controller from the CI infrastructure would be tempting, but the BATS support-bundle work is specifically motivated by debugging the engine controller's hangs in CI, so bundling them is defensible.
Acknowledged Limitations ¶
DockerSocket()andlima-template.yamlhardcode.rd2—pkg/instance/instance.go:127-135,bats/tests/32-app-controllers/engine.bats:13-14. Documented TODO; fix requires parameterizing the Lima template onRDD_INSTANCEand moving both hardcodes together. Not blocking.rdd set containerEngine.name=<other>on an already-running VM does not wait on the restart —cmd/rdd/set.go:440-452. Documented; promoted here to I3 because theObservedGenerationmachinery to fix it already exists insetEngineCondition.reconcileContainerSpecsandprocessFinalizersare O(N) per child event —pkg/controllers/app/engine/controllers/engine_reconciler.go:170-176. Documented TODO; acceptable for current workloads.- Sequential container/image Inspect during initial sync —
sync_containers.go:30-36,sync_images.go:36-40. Documented; acceptable for typical dev workloads. - Windows signal/pgid semantics for the
pgidlog field —limavm_lifecycle.go:479-486. Documented; S12 proposes a runtime assertion. containernamespace/mobycan be deleted without recreation —pkg/controllers/app/engine/controllers/sync_volumes.go:56-61. Documented inline as a deliberate design choice to avoid a finalizer trap;engine.bats:355-361codifies it.- Lima template rename from
.rd/docker.sock→.rd2/docker.sock. Not an upgrade concern — no public release yet. - Windows Docker socket is still Unix-prefixed —
docker_watcher.go:45-47passes"unix://" + pathon all platforms, but Lima (and therefore the engine reconciler'swantWatchergate) is not yet exercised on Windows. Latent bug rather than active regression; worth a platform check once WSL2 lands.
Agent Performance Retro ¶
Claude ¶
Claude produced the longest, most detailed review (614 lines, 3 important + 12 suggestions + 3 design observations). Most unique value came from tracing pre-existing code: verifying moby daemon's stop.go to confirm ContainerStop on a non-running container returns NotModified, connecting the ports-ordering churn (I4) to the missing x-kubernetes-list-type annotation in crd.yaml, and spotting that the ObservedGeneration handoff for I3 is cheap because setEngineCondition already stamps the field. Notable miss: didn't flag the App controller's blind Status().Update even though the repo-context rule calls it out explicitly. Claude's S3 (deleteImage tag race) is a good find but the proposed diff is the wrong direction — Claude recovers in prose.
Codex ¶
Codex produced the tightest, most-calibrated review: zero criticals, four importants, zero suggestions. Every finding was grounded in a concrete cross-file interaction — I2 (App controller blind Update) is the clearest example of reading the repo-context rule and applying it. Codex was the only agent to notice the Namespace repair gap (demoted to design observation here) and the latent Windows unix-socket issue (demoted to acknowledged limitation). Codex's severity calibration was more aggressive than Claude's and more accurate than Gemini's — four importants all landed as legitimate concerns after verification.
Gemini ¶
Gemini's review was the shortest and the most Critical-happy: one critical (C1 spec.state convergence), one important (CrashLoopBackOff bypass), two suggestions. The Critical framing overstated the severity — the loop is bounded by unrelated event rate and NotModified is silent success — but the finding itself matched Claude and Codex. Gemini's S1 (channel closure check) and S2 (processContainerFinalizers missing RetryOnConflict) are legitimate minor cleanups. Gemini's CrashLoopBackOff observation is a real design concern worth surfacing even if not action-required today. Known stderr noise: Gemini's sandbox blocked one path outside the worktree (~/go/pkg/mod/...) and hit an ENAMETOOLONG stat error on what looks like an embedded bash snippet from the bats-with-timeout.sh diff — neither affected the final review.
Summary ¶
| Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro | |
|---|---|---|---|
| Duration | 25m 38s | 6m 34s | 4m 58s |
| Findings | 3I 11S | 2I | 1I 2S |
| Tool calls | 76 (Bash 41, Read 29, TodoWrite 5) | 21 (shell 21) | 9 (runshellcommand 5, grepsearch 2, readfile 2) |
| Design observations | 8 | 3 | 2 |
| False positives | 0 | 0 | 0 |
| Unique insights | 13 | 2 | 2 |
| Files reviewed | 27 | 27 | 27 |
| Coverage misses | 0 | 0 | 0 |
| Totals | 3I 11S | 2I | 1I 2S |
| Downgraded | 0 | 2 (I→S) | 2 (I→S) |
| Dropped | 0 | 0 | 0 |
Claude provided the broadest coverage and deepest pre-existing-code tracing. Codex provided the best severity calibration and directly actionable findings grounded in the repo-context rules. Gemini's coverage was the narrowest but it independently caught the dominant issue (I1). For this PR, Codex was the highest-signal agent: every finding it raised survived consolidation at the same severity it proposed.
Reconciliation ¶
- Gemini C1 spec.state convergence: critical → important I1 (NotModified is silent success; no hang or error; cost is wasted RPCs not functional break).
- Gemini I1 CrashLoopBackOff bypass: important → demoted to Design Observation (user explicitly opted into declarative state; Docker restart policy handles the containerized case).
- Codex I2 Namespace repair gap: important → demoted to Design Observation (moby path is deliberate per comment; rancher-desktop path is a corner case users don't hit in single-user embedded context).
- Codex I4 Windows unix-socket: important → demoted to Acknowledged Limitation (latent bug; Lima stack is not yet exercised on Windows, engine reconciler's
wantWatchergate is currently unreachable on that platform).
Review Process Notes ¶
Skill improvements ¶
None. The repo-context rules covered every dimension needed; every Important finding traces back to an existing rule in deep-review-context.md.
Repo context updates ¶
- Add a rule about Go map iteration order feeding into SSA atomic-list fields. Claude's I4 (non-deterministic
status.portsordering) is the kind of finding that will recur across any controller that mirrors map-keyed upstream data into a K8s list field without anx-kubernetes-list-typeannotation. A one-line rule like *"When building astatus.Xlist from a Go map iteration, sort by key before constructing the apply. Unannotated CRD lists default to atomic, and atomic-list SSA is position-sensitive — random iteration order churns resourceVersion on every sync"* would make this a default check for future syncs.