# Deep Review: 20260415-184215-pr-316

| | |
|---|---|
| **Date** | 2026-04-15 18:42 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 1 |
| **Author** | [@jandubois](https://github.com/jandubois) |
| **PR** | [#316](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/316) — engine: replace Container.spec.state with action annotations |
| **Branch** | `container-action-annotations` |
| **Commits** | `d4dd09d` engine: replace Container.spec.state with action annotations |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — invalid-annotation wedge (I2) has a clear reproduction path and needs a guard |
| **Wall-clock time** | `1 h 20 min 55 s` |

## Executive Summary

This PR replaces the level-triggered `Container.spec.state` field with a one-shot `containers.rancherdesktop.io/action` annotation, eliminating reconciler thrashing against Docker's restart policy and out-of-band `docker start/stop`. The annotation model is well-designed: dispatch-before-patch gives crash recovery, the compare-and-swap annotation removal preserves concurrent writes, and collapsing the two-field-owner SSA dance after emptying `ContainerSpec` materially simplifies `sync_containers.go`.

Two important issues remain. First, the annotation cleanup patch lacks optimistic concurrency, creating a narrow TOCTOU window where a concurrent action can be silently dropped (I1). Second, the webhook does not validate annotation values, so an invalid action string wedges the reconciler in an infinite retry loop because the CRD enum rejects the status write while the annotation stays in place (I2).

**Structure:** 15 changed files, +543 −288 lines. API types, generated apply configurations, CRD, webhook, engine controller, mock controller, BATS integration tests, design documentation.

---

## Critical Issues

None.

---

## Important Issues

I1. **Annotation removal patch lacks optimistic concurrency** — `pkg/controllers/app/engine/controllers/docker_watcher.go:453-468` [Gemini 3.1 Pro]

```go
patch := client.MergeFrom(latest.DeepCopy())
delete(latest.Annotations, containersv1alpha1.AnnotationAction)
return w.k8s.Patch(ctx, &latest, patch)
```

`removeActionAnnotation` reads the latest object, checks that the annotation still matches the processed value (lines 460–462), then sends a merge patch to remove it. `client.MergeFrom` does not include `resourceVersion` in the patch, so the API server applies it unconditionally. If a concurrent writer replaces the annotation with a new action between the `Get` and the `Patch`, the merge patch removes the new value and the action is silently lost.

The compare-and-swap check on lines 460–462 is correct in intent but does not close the TOCTOU window — it guards against changes visible in the cache at `Get` time, not changes committed to etcd after the `Get`.

The same pattern appears in `patchContainerLastAction` (line 442), though the consequence there is a lost status update rather than a lost action.

Fix: use `client.MergeFromWithOptions(latest.DeepCopy(), client.MergeFromWithOptimisticLock{})` to include `resourceVersion` in the patch. The existing `RetryOnConflict` loop already handles the resulting 409 correctly.

I2. **Invalid action annotation wedges reconciliation** — `pkg/controllers/app/engine/controllers/docker_watcher.go:343-367`, `pkg/controllers/containers/container/container_webhooks.go:37-41` [Codex GPT 5.4]

```go
action := containersv1alpha1.ContainerAction(raw)
// ...
dockerErr := w.dispatchContainerAction(ctx, log, c.Name, action)
```

The update webhook validates only spec immutability; annotations are free-form. A misspelled annotation value (e.g. `action=staart`) passes the webhook, reaches `dispatchContainerAction` (which returns `"unknown container action"`), and then `patchContainerLastAction` tries to write the invalid value into `status.lastAction.action`. The CRD enum (`start|stop|pause|unpause|restart`) rejects the status write, so `patchContainerLastAction` fails and `removeActionAnnotation` never runs. The bad annotation persists and every subsequent reconcile retries the same impossible action.

Fix: validate the annotation value in the update webhook, and add a defensive guard in `processContainerAction` that removes the annotation and logs a warning for unrecognized values instead of attempting a status write that cannot succeed.

---

## Suggestions

S1. **`InProgress` action state is defined but never written** — `pkg/apis/containers/v1alpha1/container_types.go:57-58` [Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro]

```go
ContainerActionInProgress ContainerActionState = "InProgress"
```

The reconciler blocks on the Docker call and writes only `Succeeded` or `Failed`. No code path ever writes `InProgress`. The enum value and its CRD entry are dead code.

Fix: remove `ContainerActionInProgress` from the type, the CRD, and the generated deep-copy until the controller needs it. Writing `InProgress` before dispatch would require an extra API call per action and break the crash-recovery property (annotation-present = in-progress).

S2. **`unpause` on a stopped container silently records `Succeeded`** — `pkg/controllers/app/engine/controllers/docker_watcher.go:402-409` [Gemini 3.1 Pro]

```go
case containersv1alpha1.ContainerActionUnpause:
    paused, err := w.isContainerPaused(ctx, id)
    if err != nil {
        return err
    }
    if !paused {
        return nil
    }
```

When a user requests `unpause` on a stopped container, `isContainerPaused` returns `false`, `dispatchContainerAction` returns `nil`, and `processContainerAction` records `Succeeded`. By contrast, `pause` on a stopped container reaches Docker and correctly records `Failed`. The asymmetry could confuse GUI developers.

Fix: check `inspect.Container.State.Running` before the `!paused` short-circuit and return an error when the container is not running.

S3. **Restart-idempotency sentence contradicts itself** — `docs/design/api_containers.md:240-244` [Claude Opus 4.6]

```markdown
Start, stop, pause, unpause, and restart are idempotent against a
container already in the target state, so replay is safe; restart is
the one exception and may run twice on crash recovery.
```

The clause lists `restart` as idempotent, then the next clause names `restart` as the exception. The code comment at `docker_watcher.go:374-381` is precise: start/stop are idempotent via Docker's 304, pause/unpause via the reconciler pre-check, and restart may double-restart (harmlessly). The user-facing doc should mirror that.

Fix: drop `restart` from the idempotent list and state that a double restart is harmless.

---

## Design Observations

### Concerns

- **O(N) reconcile cost per child event (in-scope)**: `reconcileContainerActions` and `processFinalizers` both list all resources on every reconcile. The code comment at `engine_reconciler.go:186-190` acknowledges this and identifies per-resource reconcilers with watch predicates as the long-term fix. This PR does not worsen the cost, but the action-annotation watch predicate (only reconcile containers with the annotation set) is a natural next step. [Codex GPT 5.4, Gemini 3.1 Pro]

### Strengths

- **Crash-recovery ordering (in-scope)**: dispatching the Docker call before patching status or removing the annotation ensures that a crash mid-flight leaves the annotation in place for replay. Start, stop, pause, and unpause are idempotent against a container already in the target state, so replay is safe. [Gemini 3.1 Pro]
- **SSA simplification (in-scope)**: collapsing the two-field-manager apply in `sync_containers.go` after emptying `ContainerSpec` removes a subtle source of pruning bugs and cuts 30 lines of code. [Codex GPT 5.4]
- **Atomic annotation replacement (in-scope)**: the single-annotation model means a new value atomically replaces any pending action — no queue, no accumulating history, and a GUI user who changes their mind before the reconciler runs sees only the final action. [Codex GPT 5.4]
- **`lastAction` is deliberately written via non-SSA merge patch (in-scope)**: `patchContainerLastAction` uses `client.MergeFrom` rather than an SSA apply-config, so `engine-controller` never claims `status.lastAction` as an SSA-owned field. This matters because `applyContainer` in `sync_containers.go` runs on every Docker event and applies a status config that omits `lastAction`; if `engine-controller` had claimed the field under SSA, the subsequent apply would release-and-prune it. The asymmetric write pattern is load-bearing, not incidental. [Claude Opus 4.6]

---

## Testing Assessment

The BATS tests cover the five actions (start, stop, pause, unpause, restart), the failure path (pause on a stopped container), and verify `lastAction` outcomes. Coverage gaps:

1. No test covers an invalid annotation value (I2). An invalid-action test would confirm the webhook rejects it and the reconciler does not wedge.
2. No test covers `unpause` on a stopped container (S2), which would expose the false-success asymmetry.
3. The duplicate-informer-delivery scenario described in `docker_watcher.go:374-381` is comment-verified only; deterministic BATS coverage would require timing control that is impractical.

---

## Documentation Assessment

`docs/design/api_containers.md` clearly explains the rationale for the annotation model, the crash-recovery trade-off for `restart`, and the `lastAction` contract. The documentation mentions only `Succeeded` and `Failed` as terminal states, consistent with the implementation — but the CRD schema still exposes `InProgress` (S1).

---

## Acknowledged Limitations

- **Double-restart on crash recovery**: the PR description documents that `restart` may run twice on crash recovery. This is an accepted idempotency gap.
- **No CLI support for actions**: the PR comment confirms actions exist solely for the GUI; CLI users should use `docker start/stop/etc.` directly.

---

## Agent Performance Retro

### [Claude]

Claude's initial run produced no output (cause unknown); a rerun succeeded. It missed both important issues (I1, I2) but contributed the review's most architecturally grounded observation: the non-SSA merge-patch shape of `patchContainerLastAction` is what keeps `applyContainer`'s status apply from pruning `lastAction`. Codex and Gemini both noted the simplification in `sync_containers.go` but neither traced why the new write had to be a merge patch rather than a status apply. Claude also raised a precise documentation nit (S3, the self-contradicting restart sentence) that the other two missed.

### [Codex]

Codex found the most impactful issue in the review: the invalid-annotation wedge (I2), which no other agent raised. Its analysis traced the full path from webhook gap through CRD enum rejection to infinite retry, with a concrete fix. It correctly identified `InProgress` as dead code (S1). It did not flag the optimistic-concurrency gap in `removeActionAnnotation` (I1).

### [Gemini]

Gemini identified the annotation-removal race condition (I1, filed as critical) and the `unpause`-on-stopped asymmetry (S2, filed as important). Both are real issues. Gemini's C1 severity was too high — the TOCTOU window is narrow and the intended caller (GUI) serializes actions naturally — so it was downgraded to important. Gemini's I1 was downgraded to suggestion for the same practical-impact reason. Gemini also caught S1 (`InProgress` dead code), matching Codex.

### Summary

| | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 14m 04s | 5m 51s | 3m 24s |
| Findings | 2S | 1I 1S | 1I 1S |
| Tool calls | 36 (Read 17, Grep 14, Bash 4) | 41 (shell 40, stdin 1) | 3 (read_file 2, run_shell_command 1) |
| Design observations | 3 | 2 | 3 |
| False positives | 0 | 0 | 0 |
| Unique insights | 2 | 1 | 1 |
| Files reviewed | 15 | 15 | 15 |
| Coverage misses | 0 | 0 | 0 |
| **Totals** | **2S** | **1I 1S** | **1I 1S** |
| Downgraded | 0 | 0 | 2 (I→S) |
| Dropped | 0 | 0 | 0 |


**Reconciliation:** Gemini C1 (annotation removal race): critical → important I1, because the TOCTOU window is narrow and the GUI serializes actions naturally. Gemini I1 (unpause on stopped): important → suggestion S2, because the scenario is unusual and does not cause data loss or crashes.

---

## Review Process Notes

No suggestions.

---

## Resolution

| | |
|---|---|
| **Addressed** | 2026-04-16 |
| **Commit** | `6251b90 engine: replace Container.spec.state with action annotations` (fixup-squashed into the original) |

| # | Finding | Action |
|---|---------|--------|
| 1 | Important #1: Annotation removal patch lacks optimistic concurrency | Fixed |
| 2 | Important #2: Invalid action annotation wedges reconciliation | Fixed |
| 3 | Suggestion #1: `InProgress` action state is defined but never written | Fixed |
| 4 | Suggestion #2: `unpause` on a stopped container silently records `Succeeded` | Fixed |
| 5 | Suggestion #3: Restart-idempotency sentence contradicts itself | Fixed |
| 6 | Testing Gap #1: No test for invalid annotation value | Test written |
| 7 | Testing Gap #2: No test for `unpause` on a stopped container | Test written |
| 8 | Testing Gap #3: Duplicate-informer-delivery scenario is comment-verified only | Skipped |

## Appendix: Original Reviews

### [Gemini] — Pass 1


<details>
<summary>Full review output</summary>

### Executive Summary

This change replaces the level-triggered `spec.state` field on `Container` resources with a one-shot `containers.rancherdesktop.io/action` annotation, resolving reconciliation thrashing against external Docker state changes. The primary finding is a critical race condition in the annotation cleanup logic where a JSON Merge Patch unconditionally strips the annotation, silently dropping concurrent user requests.

### Findings

#### Critical Issues

C1. **Race condition drops concurrent user action annotations** — `pkg/controllers/app/engine/controllers/docker_watcher.go:449-451` (critical, regression)

```go
		patch := client.MergeFrom(latest.DeepCopy())
		delete(latest.Annotations, containersv1alpha1.AnnotationAction)
		return w.k8s.Patch(ctx, &latest, patch)
```

`removeActionAnnotation` uses `client.Patch` with `client.MergeFrom` to remove the `AnnotationAction`. JSON Merge Patches (`types.MergePatchType`) do not carry or enforce `resourceVersion` for metadata changes. When the generated patch `{"metadata":{"annotations":{"containers.rancherdesktop.io/action":null}}}` is sent to the Kubernetes API server, it unconditionally removes the annotation. If a user or the GUI updates the annotation to a new action immediately after the reconciler fetches `latest` (but before the `Patch` executes), the new action is silently deleted and ignored. The surrounding `RetryOnConflict` loop provides no protection here, as `Patch` will not return a 409 Conflict in this scenario.

Fix: Use `Update` instead of `Patch` so the API server enforces optimistic concurrency control. If the annotation changed concurrently, `Update` will return a 409 Conflict, triggering the retry loop to correctly read the new value and back off safely.

```diff
-		patch := client.MergeFrom(latest.DeepCopy())
 		delete(latest.Annotations, containersv1alpha1.AnnotationAction)
-		return w.k8s.Patch(ctx, &latest, patch)
+		return w.k8s.Update(ctx, &latest)
```

#### Important Issues

I1. **`unpause` on a stopped container silently records `Succeeded`** — `pkg/controllers/app/engine/controllers/docker_watcher.go:392-398` (important, gap)

```go
	case containersv1alpha1.ContainerActionUnpause:
		paused, err := w.isContainerPaused(ctx, id)
		if err != nil {
			return err
		}
		if !paused {
			return nil
		}
```

When a user requests `unpause` on a non-running (stopped or exited) container, the `isContainerPaused` pre-check evaluates to `false`. `dispatchContainerAction` interprets this as "already in the target state" and returns `nil`. Consequently, `status.lastAction` records `Succeeded`, even though the container remains stopped and Docker was never asked to unpause it. By contrast, a `pause` request on a stopped container correctly records `Failed` because the native Docker API is invoked and rejects it with an explicit error.

Fix: The pre-check for `unpause` should explicitly verify that the container is running before declaring success, or it should defer to Docker to generate the appropriate error.

```diff
 	case containersv1alpha1.ContainerActionUnpause:
-		paused, err := w.isContainerPaused(ctx, id)
+		inspect, err := w.cli.ContainerInspect(ctx, id, dockerclient.ContainerInspectOptions{})
 		if err != nil {
 			return err
 		}
-		if !paused {
+		if !inspect.Container.State.Running {
+			return fmt.Errorf("container is not running")
+		}
+		if !inspect.Container.State.Paused {
 			return nil
 		}
```

#### Suggestions

S1. **`InProgress` action state is defined but never used** — `pkg/apis/containers/v1alpha1/container_types.go:58` (suggestion, enhancement)

```go
	ContainerActionInProgress ContainerActionState = "InProgress"
```

The `ContainerActionState` enum defines `InProgress`, but the reconciler never writes it to `status.lastAction`. The reconciler blocks on the Docker call and then writes either `Succeeded` or `Failed`. Assuming the GUI determines the pending state by the presence of the `AnnotationAction` annotation itself, `InProgress` is dead code.

Fix: Remove `ContainerActionInProgress` from `container_types.go` and `crd.yaml` to avoid confusion. Writing it to `status.lastAction` before calling Docker would require an extra API call and break the intended "crash mid-flight replays the action" property.

### Design Observations

**Concerns**

- **Missing Error Scrubbing (future)**: `patchContainerLastAction` uses JSON Merge Patch (`client.MergeFrom`) to update `LastAction`. If the previous action failed (setting `error: "..."`), and the new action succeeds (omitting `error`), `MergeFrom` diffs the missing struct field and generates `"error": null` in the patch payload, successfully clearing the error on the K8s API server. While this currently works flawlessly, it relies on a nuanced behavior of `MergeFrom` and struct diffing. Future additions to `ContainerLastAction` that omit fields on success must be careful to avoid `omitempty` pointer gotchas.

**Strengths**

- **Crash Recovery**: The ordering of events in `processContainerAction`—dispatching the blocking Docker call before removing the annotation—is a robust safeguard against process crashes mid-flight.
- **List Scope Bug Prevention**: The loop in `reconcileContainerActions` safely uses `c := &containers.Items[i]` instead of creating pointers to the loop variable `c`. This prevents a classic Go loop scope bug when appending addresses or pointers to a collection.

### Testing Assessment

The integration tests properly exercise the new action-based architecture via annotation injections and validations across states.

Missing coverage (ranked by risk):
1. Concurrent modifications to the action annotation (the race condition flagged in C1). Testing this deterministically in BATS is difficult but could be achieved by using `kubectl patch` simultaneously during a long-running action.
2. Asserting that `unpause` on a stopped container results in a `Failed` last action state (as raised in I1).

### Documentation Assessment

The updates to `docs/design/api_containers.md` excellently document the rationale behind switching from a desired-state spec model to a one-shot annotation model, explicitly detailing Docker's competing restart policy.

### Acknowledged Limitations

- **Double-restarts on crash recovery**: "restart is the one exception and may run twice on crash recovery." (PR description). The architectural choice to keep the annotation present during the Docker call accepts this idempotency gap for restarts.
- **Controller listing scope**: "reconcileContainerSpecs and processFinalizers both list all resources per reconcile; splitting into per-resource reconcilers with watch predicates is the long-term fix." (PR description).
- **Missing CLI support for actions**: "I don't think we need any CLI support for setting container actions. The feature exists solely for the benefit of the GUI." (PR comment).

### Coverage Summary

- `bats/tests/32-app-controllers/engine-docker.bats`: Reviewed, no issues
- `docs/design/api_containers.md`: Reviewed, no issues
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/container.go`: Reviewed, no issues
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerlastaction.go`: Reviewed, no issues
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerspec.go`: Reviewed, no issues
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerstatus.go`: Reviewed, no issues
- `pkg/apis/containers/v1alpha1/applyconfiguration/utils.go`: Trivial
- `pkg/apis/containers/v1alpha1/container_types.go`: S1
- `pkg/apis/containers/v1alpha1/zz_generated.deepcopy.go`: Trivial
- `pkg/controllers/app/engine/controllers/docker_watcher.go`: C1, I1
- `pkg/controllers/app/engine/controllers/engine_reconciler.go`: Reviewed, no issues
- `pkg/controllers/app/engine/controllers/sync_containers.go`: Reviewed, no issues
- `pkg/controllers/containers/container/container_webhooks.go`: Reviewed, no issues
- `pkg/controllers/containers/container/crd.yaml`: S1
- `pkg/controllers/mock/container_reconciler.go`: Reviewed, no issues

</details>

### [Codex] — Pass 1

<details>
<summary>Full review output</summary>


### Executive Summary

This change replaces `Container.spec.state` with one-shot action annotations and simplifies the engine controller's SSA ownership model around `Container` mirrors. The overall direction is sound, but the new annotation path has one important validation hole that can wedge a `Container` into a permanent retry loop, and the new `InProgress` action state is exposed in the API without any implementation path that can ever emit it.

### Findings

**Critical Issues**

None.

**Important Issues**

I1. **Invalid action annotations can wedge reconciliation indefinitely** — `pkg/controllers/containers/container/container_webhooks.go:37-41`, `pkg/controllers/app/engine/controllers/docker_watcher.go:343-365`, `pkg/controllers/containers/container/crd.yaml:387-395` (important, regression)

```go
action := containersv1alpha1.ContainerAction(raw)

lastAction := &containersv1alpha1.ContainerLastAction{
	Action:      action,
	RequestedAt: requestedAt,
	CompletedAt: metav1.Now(),
}
if dockerErr == nil {
	lastAction.State = containersv1alpha1.ContainerActionSucceeded
} else {
	lastAction.State = containersv1alpha1.ContainerActionFailed
```

The update webhook registered at `container_controller.go:84-91` only enforces spec immutability, and `ValidateUpdate` at `container_webhooks.go:37-41` ignores metadata entirely, so `containers.rancherdesktop.io/action=staart` is accepted. `processContainerAction` then copies that raw annotation into `lastAction.Action` at `docker_watcher.go:343-351`, `dispatchContainerAction` returns `unknown container action` at `docker_watcher.go:419`, and `patchContainerLastAction` at `docker_watcher.go:361-363` tries to persist `status.lastAction.action="staart"`. The CRD enum at `crd.yaml:387-395` rejects that status write, so `removeActionAnnotation` at `docker_watcher.go:364-365` never runs. The bad annotation stays on the object and every reconcile retries the same impossible action forever.

Fix: reject invalid action values in the update webhook before they reach storage, and make the reconciler treat unknown values as non-retryable. A minimal fix is admission validation like:

```diff
func (c *immutableValidator) ValidateUpdate(_ context.Context, oldContainer, newContainer *v1alpha1.Container) (...) {
    if !equality.Semantic.DeepEqual(oldContainer.Spec, newContainer.Spec) {
        return nil, fmt.Errorf("spec is immutable: old: %v, new: %v", oldContainer.Spec, newContainer.Spec)
    }
+   if action, ok := newContainer.Annotations[v1alpha1.AnnotationAction]; ok {
+       switch v1alpha1.ContainerAction(action) {
+       case v1alpha1.ContainerActionStart, v1alpha1.ContainerActionStop,
+           v1alpha1.ContainerActionPause, v1alpha1.ContainerActionUnpause,
+           v1alpha1.ContainerActionRestart:
+       default:
+           return nil, fmt.Errorf("invalid %s annotation %q", v1alpha1.AnnotationAction, action)
+       }
+   }
    return nil, nil
}
```

Also keep a defensive fallback in the reconciler for already-persisted bad values; otherwise a webhook outage or legacy object can still leave the controller stuck.

**Suggestions**

S1. **`InProgress` is part of the API contract, but no code path can ever emit it** — `pkg/apis/containers/v1alpha1/container_types.go:50-60`, `pkg/controllers/app/engine/controllers/docker_watcher.go:348-357` (suggestion, regression)

```go
if dockerErr == nil {
	lastAction.State = containersv1alpha1.ContainerActionSucceeded
} else {
	lastAction.State = containersv1alpha1.ContainerActionFailed
	lastAction.Error = dockerErr.Error()
}
```

`ContainerActionState` adds `InProgress` at `container_types.go:50-60`, but `processContainerAction` at `docker_watcher.go:348-357` only ever writes `Succeeded` or `Failed`. That makes the new enum misleading and leaves clients without the intermediate state the type advertises.

Fix: either write `status.lastAction.state=InProgress` before dispatch and transition it to a terminal state afterwards, or remove `InProgress` from the enum/CRD/generated code until the controller actually produces it.

### Design Observations

**Strengths**

The compare-and-swap removal in `removeActionAnnotation()` is a good choice: it preserves a newer action written while an older one is finishing, which matches the "single pending action, atomic replacement" model cleanly.

Collapsing the old two-field-manager `Container` apply logic after emptying `spec` materially simplifies the SSA story in `sync_containers.go` and removes a subtle source of pruning bugs.

### Testing Assessment

1. No test covers an invalid `containers.rancherdesktop.io/action` value, so I1 would ship green.
2. No test simulates the duplicate-informer-delivery case described in `docker_watcher.go:374-381`, so the pause/unpause pre-check is only comment-verified.

### Documentation Assessment

`docs/design/api_containers.md` documents terminal `lastAction` outcomes only, while the API schema still exposes `InProgress`. Either the controller should emit `InProgress` (S1) or the schema/generated docs should stop advertising it.

### Acknowledged Limitations

- **PR comment**: the author explicitly declined adding CLI support for container actions, on the rationale that this is a GUI-only feature and terminal users should use `docker`/`nerdctl` directly. This PR remains consistent with that rationale.
- **Code comment**: `pkg/controllers/app/engine/controllers/engine_reconciler.go:186-190` notes that both passes are still `O(N)` per child event and that the long-term fix is per-resource reconcilers with watch predicates. This change does not materially worsen that limitation, but it also does not resolve it.

### Coverage Summary

- `bats/tests/32-app-controllers/engine-docker.bats` — Reviewed, no issues
- `docs/design/api_containers.md` — Reviewed, no issues
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/container.go` — Trivial
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerlastaction.go` — Trivial
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerspec.go` — Trivial
- `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerstatus.go` — Trivial
- `pkg/apis/containers/v1alpha1/applyconfiguration/utils.go` — Trivial
- `pkg/apis/containers/v1alpha1/container_types.go` — `I1, S1`
- `pkg/apis/containers/v1alpha1/zz_generated.deepcopy.go` — Trivial
- `pkg/controllers/app/engine/controllers/docker_watcher.go` — `I1, S1`
- `pkg/controllers/app/engine/controllers/engine_reconciler.go` — Reviewed, no issues
- `pkg/controllers/app/engine/controllers/sync_containers.go` — Reviewed, no issues
- `pkg/controllers/containers/container/container_webhooks.go` — `I1`
- `pkg/controllers/containers/container/crd.yaml` — `I1`
- `pkg/controllers/mock/container_reconciler.go` — Reviewed, no issues

</details>

### [Claude] — Pass 1

<details>
<summary>Full review output</summary>

### Executive Summary

PR #316 replaces `Container.spec.state` with a one-shot action annotation model, eliminating the level-triggered fight between the reconciler and Docker's concurrent writers. The implementation is clean: the webhook simplifies, the two-field-owner SSA dance disappears, and crash-recovery semantics are well-reasoned (pre-checks for pause/unpause avoid spurious Failed records; start/stop rely on Docker's 304 idempotency; restart's double-restart is acknowledged). The change is self-contained and well-tested.

### Findings

#### Critical Issues

None.

#### Important Issues

None.

#### Suggestions

**S1. `ContainerActionInProgress` is defined but never written** — `pkg/apis/containers/v1alpha1/container_types.go:58`, `pkg/controllers/containers/container/crd.yaml:412` *(suggestion, regression)*

```go
// Possible values for ContainerActionState.
const (
	ContainerActionInProgress ContainerActionState = "InProgress"
	ContainerActionSucceeded  ContainerActionState = "Succeeded"
	ContainerActionFailed     ContainerActionState = "Failed"
)
```

The `ContainerActionInProgress` constant is declared at `container_types.go:58` and enumerated in the CRD at `crd.yaml:412`, but no code path ever assigns it. `processContainerAction` at `pkg/controllers/app/engine/controllers/docker_watcher.go:336-367` dispatches the Docker call synchronously and writes `lastAction.State` as either `Succeeded` (line 354) or `Failed` (line 356) immediately afterward — there is no pre-dispatch write that would set `InProgress`. A GUI polling `status.lastAction.state` during a slow action (e.g., a 10s graceful stop) only ever sees the previous action's terminal outcome, not `InProgress`.

This is unused API surface. YAGNI would remove it; if the intent is to reserve it for a future "write InProgress before dispatch, update after" pattern, a doc comment on the constant explaining the reservation would prevent confusion.

Fix: either remove the constant and the CRD enum entry, or add an explanatory comment.

**S2. Doc sentence lists restart as both idempotent and the exception** — `docs/design/api_containers.md:240-244` *(suggestion, regression)*

```markdown
The engine calls Docker before patching `status.lastAction` and
removing the annotation, so a crash mid-flight leaves the annotation
in place and the next reconcile replays the action. Start, stop,
pause, unpause, and restart are idempotent against a container already
in the target state, so replay is safe; restart is the one exception
and may run twice on crash recovery.
```

The comma list includes `restart` as idempotent; the next clause names `restart` as the exception. The code comment at `docker_watcher.go:374-381` is clearer: start/stop are idempotent via Docker's 304, pause/unpause are idempotent via the reconciler pre-check, and restart "may double-restart" (harmlessly). The user-facing doc should mirror that precision.

### Design Observations

#### Strengths

- **Field manager separation for `lastAction` writes preserves the SSA invariant cleanly.** `patchContainerLastAction` at `docker_watcher.go:435-446` uses a non-SSA merge patch rather than a status apply-config that includes `lastAction`. That means `engine-controller` never claims `status.lastAction` as an SSA-owned field, and the subsequent SSA status apply from `applyContainer` at `sync_containers.go:195-202` (which omits `lastAction` from its config) cannot release-and-prune it. The BATS "pause and unpause actions toggle a running container" test verifies this empirically: the assertion only succeeds if `lastAction` survives the Docker-event-triggered sync that follows the pause.

- **Value-matching annotation removal is the right primitive.** `removeActionAnnotation` at `docker_watcher.go:453-468` re-Gets and checks that the current annotation value still matches the one that was dispatched. A user who supersedes an in-flight action by `kubectl annotate --overwrite` keeps the new annotation intact; the old action's completion only clears its own dispatch.

- **Pre-check-only-where-needed matches Docker's actual semantics.** The asymmetry in `dispatchContainerAction` — pre-check for pause/unpause, no pre-check for start/stop/restart — is grounded in the Docker daemon's behavior (304 for start/stop, hard-error for redundant pause/unpause) rather than defensive uniformity.

### Testing Assessment

Coverage is appropriate for the scope. Ranked by residual risk:

1. **No explicit test that `lastAction` survives a Docker-event-triggered status sync.** Indirectly covered by the pause/unpause test, but not isolated.
2. **No test for a superseded action.** The "value-matches" guard at `docker_watcher.go:460-463` has no direct test.
3. **Crash-recovery replay is not tested.** Not trivially testable at BATS level without killing the process mid-dispatch.

### Documentation Assessment

- `docs/design/api_containers.md` is updated in sync with the code. The example reflects the new shape accurately. The new "Change container state" subsection documents the table of actions and the crash-replay semantics.
- The comment on `Container.Spec` and the CRD description are consistent: "reserved for future use."
- The comment explaining why the two-field-owner SSA dance was removed lives in the PR description rather than the code. Acceptable — the code is simpler and the comment burden should match the current shape, not the diff.

### Commit Structure

Single commit, message accurately describes the change. No concerns.

### Acknowledged Limitations

- **PR comment from @jandubois**: "I don't think we need any CLI support for setting container actions. The feature exists solely for the benefit of the GUI." The implementation is consistent: the BATS tests reach for `rdd ctl annotate` (generic `kubectl annotate`), and no dedicated `rdd` subcommand is added.
- **Code comment** (`docker_watcher.go:331-335`): restart may double-restart on crash recovery. Harmless; acknowledged in both the doc and the code.
- **Code comment** (`engine_reconciler.go:186-195`): `reconcileContainerActions` + `processFinalizers` run per-reconcile List passes; long-term fix is per-resource reconcilers with watch predicates.
- **Pre-existing comment on `mock/container_reconciler.go:92`**: the mock uses a direct `ContainerStatusValue(inspect.State.Status)` cast without the `mapDockerContainerState` validation. Pre-existing; the mock uses static test data, so the risk is bounded.

### Coverage Summary

| File | Status |
|---|---|
| `bats/tests/32-app-controllers/engine-docker.bats` | Reviewed, no issues |
| `docs/design/api_containers.md` | S2 |
| `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/container.go` | Reviewed, no issues (generated) |
| `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerlastaction.go` | Reviewed, no issues (generated) |
| `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerspec.go` | Trivial (deleted) |
| `pkg/apis/containers/v1alpha1/applyconfiguration/containers/v1alpha1/containerstatus.go` | Reviewed, no issues (generated) |
| `pkg/apis/containers/v1alpha1/applyconfiguration/utils.go` | Reviewed, no issues (generated) |
| `pkg/apis/containers/v1alpha1/container_types.go` | S1 |
| `pkg/apis/containers/v1alpha1/zz_generated.deepcopy.go` | Reviewed, no issues (generated) |
| `pkg/controllers/app/engine/controllers/docker_watcher.go` | Reviewed, no issues |
| `pkg/controllers/app/engine/controllers/engine_reconciler.go` | Reviewed, no issues |
| `pkg/controllers/app/engine/controllers/sync_containers.go` | Reviewed, no issues |
| `pkg/controllers/containers/container/container_webhooks.go` | Reviewed, no issues |
| `pkg/controllers/containers/container/crd.yaml` | S1 (generated from kubebuilder markers) |
| `pkg/controllers/mock/container_reconciler.go` | Reviewed, no issues |

</details>
