# Deep Review: 20260325-122724-pr-234

| | |
|---|---|
| **Date** | 2026-03-25 12:27 |
| **Repo** | [rancher-sandbox/rancher-desktop-daemon](https://github.com/rancher-sandbox/rancher-desktop-daemon) |
| **Round** | 1 |
| **Author** | [@jandubois](https://github.com/jandubois) |
| **PR** | [#234](https://github.com/rancher-sandbox/rancher-desktop-daemon/pull/234) — Preserve Lima serial logs across VM restarts |
| **Branch** | `rotate-serial-logs` |
| **Commits** | `d86eced` Preserve Lima serial logs across VM restarts<br>`75d179e` Preserve instance logs on LimaVM deletion when RDD_KEEP_LOGS is set<br>`0bad155` Preserve instance logs in svc delete when RDD_KEEP_LOGS is set |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — premature pruning bug when active log is missing; minor error-handling gaps |
| **Wall-clock time** | `12 min 39 s` |

---

## Executive Summary

This PR extracts a `Rotate()` helper from `logfile.Create()`, rotates serial logs before each VM start, and preserves instance logs on deletion when `RDD_KEEP_LOGS` is set. The design is clean: rotation reuses the existing numbering and pruning logic, preservation is non-fatal, and the three commits decompose naturally. Two bugs need fixing: `Rotate` prunes a backup when the active log is absent, and `Rotate` silently swallows non-`ErrNotExist` stat errors.

---

## Critical Issues

None.

---

## Important Issues

1. **Premature pruning when active log is missing** — `pkg/util/logfile/logfile.go:89-102` [Gemini 3.1 Pro]

```go
nextN := maxN + 1
filePath := filepath.Join(dir, name+".log")

if _, err := os.Lstat(filePath); err == nil {
    numberedName := fmt.Sprintf("%s.%d.log", name, nextN)
    if err := os.Rename(filePath, filepath.Join(dir, numberedName)); err != nil {
        return fmt.Errorf("rename %s to %s: %w", filePath, numberedName, err)
    }
    numberedFiles = append(numberedFiles, numberedFile{n: nextN, name: numberedName})
}

if !keepAll {
    pruneOldFiles(dir, numberedFiles, nextN)
}
```

`pruneOldFiles` always receives `nextN` (`maxN + 1`) as its `currentN` argument, regardless of whether a rename occurred. When the active log is absent, no backup is created, yet the cutoff advances by one. With five existing backups (`serial.1.log` through `serial.5.log`), `pruneOldFiles` computes `cutoff = 6 - 5 = 1` and deletes `serial.1.log`, reducing the pool to four files without adding a new one. Each call with a missing active log loses one backup.

Fix: pass `maxN` instead of `nextN` when the rename was skipped.

```diff
+	currentN := maxN
 	if _, err := os.Lstat(filePath); err == nil {
 		numberedName := fmt.Sprintf("%s.%d.log", name, nextN)
 		if err := os.Rename(filePath, filepath.Join(dir, numberedName)); err != nil {
 			return fmt.Errorf("rename %s to %s: %w", filePath, numberedName, err)
 		}
 		numberedFiles = append(numberedFiles, numberedFile{n: nextN, name: numberedName})
+		currentN = nextN
 	}

 	if !keepAll {
-		pruneOldFiles(dir, numberedFiles, nextN)
+		pruneOldFiles(dir, numberedFiles, currentN)
 	}
```

(important, regression)

2. **Silent `Lstat` errors in `Rotate`** — `pkg/util/logfile/logfile.go:93` [Gemini 3.1 Pro]

```go
if _, err := os.Lstat(filePath); err == nil {
```

If `Lstat` fails with something other than `ErrNotExist` (e.g., permission denied), `Rotate` skips the rename and returns `nil`. The caller proceeds to overwrite the log, losing its contents without warning. Non-existence errors should skip the rename; all others should propagate.

Fix:

```diff
 	if _, err := os.Lstat(filePath); err == nil {
 		// ... rename logic ...
+	} else if !errors.Is(err, fs.ErrNotExist) {
+		return fmt.Errorf("stat %s: %w", filePath, err)
 	}
```

(important, regression)

---

## Suggestions

1. **`PreserveLogs` aborts on first rename failure** — `pkg/instance/logs.go:54-55` [Gemini 3.1 Pro]

```go
if err := os.Rename(src, dst); err != nil {
    return count, fmt.Errorf("preserve %s: %w", entry.Name(), err)
}
```

If one file fails to rename (e.g., held open by a lingering process on Windows), the function returns immediately and skips all remaining files. For a best-effort debugging aid gated behind an environment variable, aggregating errors and preserving as many files as possible is more useful.

Fix: collect errors and return `errors.Join(errs...)` at the end.

(suggestion, gap)

2. **`svc delete` removes `ShortDir` without Lima driver cleanup** — `pkg/service/cmd/service.go:407` [Codex GPT 5.4]

```go
_ = os.RemoveAll(instance.ShortDir())
```

`Delete()` removes `ShortDir()` (which contains `LimaHome`) with a raw `os.RemoveAll`. The normal LimaVM deletion path goes through `limainstance.Delete`, which lets the Lima driver run platform-specific cleanup (e.g., WSL2 distro unregister on Windows). `Stop()` at line 402 stops the service, but controllers may not complete full instance cleanup before returning. On macOS (VZ driver) this is harmless — VMs are just files. On Windows/WSL2, it could leave registered distros behind.

This is future-facing: the project has no Windows support yet. Worth a comment documenting the limitation. If Windows support lands before this is revisited, add Lima-level instance deletion before the `RemoveAll`.

(suggestion, gap)

3. **`os.IsNotExist` vs `errors.Is` inconsistency** — `pkg/service/cmd/service.go:420` [Claude Opus 4.6]

```go
if !os.IsNotExist(err) {
```

New code in `logs.go:69` uses the modern `errors.Is(err, fs.ErrExist)`, while `service.go:420` uses the legacy `os.IsNotExist`. Both work identically for `*os.PathError`, but mixing styles within a PR is a minor inconsistency.

(suggestion, regression)

4. **`Rotate` docstring omits directory precondition** — `pkg/util/logfile/logfile.go:57-62` [Claude Opus 4.6]

```go
// Rotate renames {name}.log to {name}.{N}.log without creating a new file.
// Use this for logs managed by an external process ...
```

Unlike `Create` (which calls `os.MkdirAll`), `Rotate` assumes the directory exists. Current callers satisfy this, but the exported API does not document the precondition.

Fix: add "The directory must already exist" to the docstring.

(suggestion, regression)

---

## Design Observations

### Strengths

- **Clean `Rotate`/`Create` decomposition.** Extracting `rotate()` preserves all existing behavior (numbering, pruning, Lstat guard) while making rotation independently usable for serial logs. `Create` becomes a thin wrapper. [Claude Opus 4.6, Gemini 3.1 Pro, Codex GPT 5.4]

- **Non-fatal log preservation.** Both controller-side (`preserveInstanceLogs`) and CLI-side (`preserveAllInstanceLogs`) log errors and continue rather than blocking deletion. For a debugging aid gated behind an environment variable, this is the right trade-off. [Claude Opus 4.6]

- **Race-safe `nextAvailableDir`.** Using `os.Mkdir` (not `MkdirAll`) as a test-and-create avoids races between concurrent callers without locks. The `ErrExist` check handles collisions correctly. [Claude Opus 4.6]

- **Pre-filtering before directory creation.** `PreserveLogs` filters for `.log` files before calling `nextAvailableDir`, avoiding empty directories when no logs exist. [Claude Opus 4.6]

- **Correct ordering in `handleDeletion`.** `preserveInstanceLogs` runs after `StopForcibly` but before `limainstance.Delete`, ensuring files exist and no process is writing to them. [Claude Opus 4.6]

### Concerns

- **Cross-device rename in `PreserveLogs`** `(future)` — `os.Rename` returns `EXDEV` when source and destination live on different filesystems. Both paths are typically under `$HOME`, but users with non-standard mounts would fail consistently. A copy-and-delete fallback would make preservation robust across all storage layouts. [Claude Opus 4.6, Gemini 3.1 Pro]

---

## Testing Assessment

Unit test coverage is solid:

- `logfile.Rotate`: 4 test cases (no file, rename, sequential numbering, pruning)
- `instance.PreserveLogs`: 3 test cases (moves logs, skips non-logs, handles missing source)
- `instance.nextAvailableDir`: 3 test cases (first call, collision, missing parent)

Untested scenarios, ranked by risk:

1. **Rotate with existing backups but no active log** — would catch Finding 1 (premature pruning). Set up numbered backups, omit the active log, verify no backup is deleted.
2. **`PreserveLogs` partial rename failure** — verify remaining files are still preserved (relevant after adopting the aggregation fix from Suggestion 1).
3. **End-to-end serial log rotation through a VM restart** — current tests exercise `Rotate` in isolation; no integration test proves serial logs survive an actual restart.

---

## Documentation Assessment

Documentation is accurate and well-organized. `environment.md` clearly explains the three effects of `RDD_KEEP_LOGS`. `cmd_service.md` now describes the directories `svc delete` removes. The spelling dictionary adds `serialp` and `serialv`.

---

## Commit Structure

The three commits decompose cleanly: rotation primitive, controller-side preservation, CLI-side preservation. Each compiles and tests independently. Messages describe both what and why.

---

## Agent Performance Retro

### [Claude]

Claude produced the most thorough review, covering all 12 files and identifying design strengths that other agents missed (race-safe `nextAvailableDir`, pre-filtering, correct ordering in `handleDeletion`). Its sole important finding — cross-device rename — is valid but low-risk (same filesystem in practice). The two suggestions (docstring precondition, `os.IsNotExist` inconsistency) are actionable. No false positives. Claude missed the premature pruning bug and the silent Lstat error that Gemini caught.

### [Codex]

Codex focused deeply on one finding: `svc delete` bypassing Lima's driver cleanup on WSL2. The analysis is technically sound but overstates the severity — the project has no Windows support yet, making this future-facing rather than a current regression. Codex missed all findings from both other agents (`Rotate` bugs, `PreserveLogs` abort, cross-device rename). Coverage was complete but shallow outside the one finding.

### [Gemini]

Despite hitting a 429 rate limit during the review, Gemini produced the highest-signal findings: the premature pruning bug (Finding 1) and the silent Lstat error (Finding 2) are both real regressions that the other agents missed. The `PreserveLogs` abort-on-first-failure observation is also valid. The review is concise and well-structured. Gemini missed the cross-device rename concern and the `svc delete`/WSL2 issue.

### Summary

| Metric | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 5:52 | 4:33 | 3:51 |
| Critical | 0 | 0 | 0 |
| Important | 1 | 1 | 3 |
| Suggestion | 2 | 0 | 0 |
| Design observations | 6 | 1 | 2 |
| False positives | 0 | 0 | 0 |
| Unique insights | 2 (docstring, style) | 1 (WSL2 cleanup) | 2 (pruning bug, Lstat error) |
| Files reviewed | 12 | 12 | 12 |
| Coverage misses | 0 | 0 | 0 |

Gemini delivered the most impactful review despite rate-limiting issues. Its two unique findings are real regressions that need fixing. Claude provided the deepest architectural analysis. Codex's sole finding, while technically valid, addresses a future concern rather than a current bug.

---

## Appendix: Original Reviews

### Claude Opus 4.6

Good. `ShortDir` only contains `lima/` (for `LimaHome`), no other significant content. The removal in `Delete()` is safe.

Now I have all the information I need for the review. Let me write the final assessment.

---

## Executive Summary

This PR adds serial log rotation before VM starts, extracts a `Rotate()` helper from `logfile.Create()`, and preserves instance log files (serial + hostagent) when deleting LimaVM instances or running `rdd svc delete` — all gated behind `RDD_KEEP_LOGS`. The changes are clean, well-tested, and correctly handle error paths. One important issue around cross-filesystem rename failure and a few minor suggestions.

## Findings

### Critical Issues

None.

### Important Issues

1. **`os.Rename` fails across filesystem boundaries** — `pkg/instance/logs.go:54`

```go
if err := os.Rename(src, dst); err != nil {
    return count, fmt.Errorf("preserve %s: %w", entry.Name(), err)
}
```

`PreserveLogs` moves files from the Lima instance directory (`~/.rd2/lima/<name>/`) to the log directory (`~/Library/Logs/rancher-desktop-2/<name>/`). `os.Rename` returns `EXDEV` when source and destination are on different filesystems. While both paths are typically under `$HOME` (same filesystem), users with non-standard mounts (e.g., Lima home on a separate partition, or log directory on a network mount) would hit this silently on every deletion — the error is logged but logs are permanently lost.

The error does not self-heal: if the filesystem layout causes `EXDEV`, every subsequent call fails the same way.

Fix: Fall back to copy-and-delete when `os.Rename` returns `*os.LinkError` with `syscall.EXDEV`:

```diff
 if err := os.Rename(src, dst); err != nil {
+    if isCrossDevice(err) {
+        if copyErr := copyFile(src, dst); copyErr != nil {
+            return count, fmt.Errorf("preserve %s: %w", entry.Name(), copyErr)
+        }
+        os.Remove(src)
+        count++
+        continue
+    }
     return count, fmt.Errorf("preserve %s: %w", entry.Name(), err)
 }
```

Alternatively, if you consider this unlikely enough, add a comment at line 54 noting the cross-device limitation so a future maintainer doesn't have to rediscover it. (important, gap)

### Suggestions

1. **`Rotate` silently fails on nonexistent directory** — `pkg/util/logfile/logfile.go:66-68`

```go
entries, err := os.ReadDir(dir)
if err != nil {
    return fmt.Errorf("read log directory %s: %w", dir, err)
}
```

Unlike `Create()` (which calls `os.MkdirAll` before `Rotate`), a standalone `Rotate()` call assumes the directory exists. The current callers guarantee this (Lima instance directories exist by the time `startInstance` runs), but the exported API doesn't document this precondition. A reader calling `Rotate` on a fresh directory would get an unhelpful "read log directory" error.

Fix: Add a one-line note to the `Rotate` docstring:

```diff
 // Rotate renames {name}.log to {name}.{N}.log without creating a new file.
+// The directory must already exist (unlike Create, which creates it).
 // Use this for logs managed by an external process ...
```
(suggestion, regression)

2. **`os.IsNotExist` vs `errors.Is` inconsistency** — `pkg/service/cmd/service.go:420`

```go
if !os.IsNotExist(err) {
```

New code in `pkg/instance/logs.go:69` uses the modern `errors.Is(err, fs.ErrExist)`, while the new code in `service.go:420` uses the legacy `os.IsNotExist`. Both work identically for `*os.PathError`, but mixing styles in the same PR is a minor inconsistency.

Fix: Use `errors.Is(err, fs.ErrNotExist)` at line 420 to match `logs.go`. Or leave it — this matches the existing pattern in `service.go`. (suggestion, regression)

## Design Observations

### Strengths

- **Clean refactoring of `Rotate` from `Create`**: The extraction preserves all existing behavior (numbering, pruning, Lstat guard) while making the rotation primitive independently usable. `Create` becomes a thin wrapper: rotate, create file, write header. This is the right decomposition.

- **Non-fatal error handling for log preservation**: Both `preserveInstanceLogs` (controller path, line 523-525) and `preserveAllInstanceLogs` (CLI path, line 431-433) log errors and continue rather than blocking deletion. For a debugging aid gated behind an environment variable, this is the correct trade-off — we should never prevent instance cleanup because log copying failed.

- **Atomic directory naming via `nextAvailableDir`**: Using `os.Mkdir` (not `MkdirAll`) as a test-and-create in `nextAvailableDir` (line 67-74) is a clean way to handle concurrent calls without locks. The `ErrExist` check handles races correctly — two concurrent callers would each get a unique directory.

- **Pre-filtering before directory creation**: `PreserveLogs` filters for `.log` files before calling `nextAvailableDir` (line 30-38), avoiding empty directories when no logs exist. This is a thoughtful detail that prevents clutter in the log directory.

- **Ordering in `handleDeletion`**: `preserveInstanceLogs` (line 53) is called after `StopForcibly` (line 51) but before `limainstance.Delete` (line 55), ensuring no process is writing to logs when they're moved and the files still exist when we try to move them. Correct sequencing.

### Concerns

**`Delete()` now removes `ShortDir` unconditionally** — `pkg/service/cmd/service.go:407` (in-scope)

The PR adds `os.RemoveAll(instance.ShortDir())` to `Delete()`. Previously only `instance.Dir()` was removed. `ShortDir()` (e.g., `~/.rd2/`) currently contains only `lima/`, but it's an instance-scoped directory that other features could use in the future. Today the deletion is correct — the commit message documents the intent. But this is a one-way door: if future code stores state in ShortDir that should survive a delete (e.g., caches, certificates), the `RemoveAll` would destroy it silently. This is not a bug today, just a note that ShortDir deletion should be reconsidered if new content is added there.

## Testing Assessment

The test coverage is good:

- `logfile.Rotate`: 4 new test cases (no file, rename, sequential numbering, pruning) covering the core scenarios.
- `instance.PreserveLogs`: 3 test cases (moves logs, skips non-logs, handles missing source). The `LogDir` override pattern in tests is clean.
- `instance.nextAvailableDir`: 3 test cases (first call, collision, missing parent).

Untested scenarios ranked by risk:

1. **`preserveAllInstanceLogs` in `svc delete`** — no unit test, but it's a thin loop over `PreserveLogs` which is well-tested. Covered by BATS integration tests via `rdd svc delete`. Low risk.
2. **Cross-device rename failure** — no test exercises `EXDEV`. Would require test infrastructure to create cross-device scenarios. Low priority.
3. **`PreserveLogs` with partial rename failure** — if one file fails to rename, the function returns a partial count. No test covers this path. Low risk since the behavior is documented via the return signature.

## Documentation Assessment

Documentation changes are accurate and well-organized:

- `environment.md`: The new "Log Preservation" section clearly explains the three effects of `RDD_KEEP_LOGS` (no pruning, survives delete, instance logs survive VM deletion). The "Log rotation" and "BATS defaults" subsections provide useful operational context.
- `cmd_service.md`: The previously-empty `rdd service delete` section now has a one-sentence description covering the three directories removed. Matches the implementation.
- Spelling dictionary: `serialp` and `serialv` added. Correct.

## Commit Structure

The three commits are cleanly separated:

1. `d86eced` — serial log rotation (new `Rotate` API + caller in `startInstance`)
2. `75d179e` — instance log preservation on LimaVM deletion + ShortDir cleanup + BATS teardown removal
3. `0bad155` — `svc delete` preservation + `PreserveLogs` extraction into shared package

Each commit compiles and tests independently. The logical progression (rotation primitive → controller-side preservation → CLI-side preservation) is natural. Commit messages accurately describe both the what and the why.

## Acknowledged Limitations

None found in the changed code — no TODO or explanatory comments documenting known limitations.

## Unresolved Feedback

No PR review comments to evaluate.

## Coverage Summary

| File | Status |
|------|--------|
| `.github/actions/spelling/expect/rd.txt` | **Trivial** — added `serialp`, `serialv` to spell check dictionary |
| `bats/tests/33-lima-controllers/limavm-instance.bats` | **Reviewed, no issues** — `local_teardown_file` removal is safe because `svc delete` now removes `ShortDir` |
| `bats/tests/33-lima-controllers/limavm-running.bats` | **Reviewed, no issues** — same rationale as above |
| `docs/design/cmd_service.md` | **Reviewed, no issues** — accurate description of `svc delete` behavior |
| `docs/design/environment.md` | **Reviewed, no issues** — clear, accurate documentation |
| `pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go` | **Finding 1** (via `PreserveLogs` call), **Finding 2** (no issues in this file itself) |
| `pkg/instance/logs.go` | **Finding 1** — cross-device rename risk |
| `pkg/instance/logs_test.go` | **Reviewed, no issues** — good coverage of core scenarios |
| `pkg/service/cmd/service.go` | **Finding 3** — minor `os.IsNotExist` style inconsistency |
| `pkg/util/logfile/logfile.go` | **Finding 2** — missing precondition in `Rotate` docstring |
| `pkg/util/logfile/logfile_test.go` | **Reviewed, no issues** — thorough `Rotate` test coverage |
| `scripts/collect-bats-logs.sh` | **Reviewed, no issues** — correctly collects preserved instance logs, `nullglob` handles empty directories |

### Codex GPT 5.4

### Executive Summary

This PR adds serial-log rotation before hostagent startup, extracts the shared logfile rotation logic, and preserves Lima instance logs when `RDD_KEEP_LOGS` is set during LimaVM deletion and `rdd svc delete`. The helper/unit-test coverage is good, but `rdd svc delete` now removes `ShortDir()` by raw filesystem deletion instead of going through Lima’s delete path, which is a real regression on Windows because WSL2 instances need driver-level unregister.

### Findings

**Critical Issues**

None.

**Important Issues**

1. **`svc delete` bypasses Lima instance deletion and can orphan WSL distros** — [pkg/service/cmd/service.go:403-407](/Users/jan/git/rancher-desktop-daemon/pkg/service/cmd/service.go#L403) (important, regression)

```go
_ = Stop()
preserveAllInstanceLogs()
if os.Getenv("RDD_KEEP_LOGS") == "" {
    _ = os.RemoveAll(instance.LogDir())
}
_ = os.RemoveAll(instance.ShortDir())
return os.RemoveAll(instance.Dir())
```

`Delete()` at line 403 now removes `instance.ShortDir()` at line 407 directly, but the normal LimaVM deletion path calls `limainstance.Delete(ctx, existingInst, true)` in [pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go:50](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go#L50) specifically to let Lima run driver cleanup before removing `inst.Dir`. That difference matters on Windows: Lima’s WSL2 driver implements deletion by unregistering the distro in `pkg/driver/wsl2/wsl_driver_windows.go:207-215`, so deleting `.rd...` on disk at line 407 can return success while leaving the WSL distro registered. The next recreate/delete cycle can then collide with the stale distro name or leak host-side resources. `git blame` shows line 407 was introduced by commit `75d179e`, so this is a PR regression.

Fix: before removing `ShortDir()`, enumerate `instance.LimaHome()` and delete each instance through Lima’s API (`store.Inspect` + `limainstance.Delete`) so driver-specific cleanup runs on every platform. Only fall back to raw `RemoveAll` for obviously incomplete instance directories, and propagate/report failures instead of discarding the `ShortDir()` removal error.

**Suggestions**

None.

### Design Observations

**Strengths**

- Extracting `logfile.Rotate()` and reusing the same numbering/pruning logic from `Create()` reduces drift between service/hostagent logs and serial logs. That is the right abstraction boundary for this change.

### Testing Assessment

The new unit tests are solid for `pkg/util/logfile` and `pkg/instance`, and `go test ./pkg/util/logfile ./pkg/instance` passed locally.

Untested scenarios, highest risk first:

1. No end-to-end coverage for `rdd svc delete` with a real Lima instance present, especially on Windows/WSL2 where driver unregister is required.
2. No integration test proving serial logs survive an actual VM restart; current coverage only exercises `logfile.Rotate()` in isolation.
3. No integration test for the new `RDD_KEEP_LOGS` preservation flow on LimaVM deletion and `svc delete` that verifies preserved logs land under `RDD_LOG_DIR`.

### Documentation Assessment

The docs are otherwise aligned with the intended behavior. After fixing Finding 1, `rdd svc delete` documentation in [docs/design/cmd_service.md](/Users/jan/git/rancher-desktop-daemon/docs/design/cmd_service.md) and [docs/design/environment.md](/Users/jan/git/rancher-desktop-daemon/docs/design/environment.md) should remain accurate; until then they overstate cleanup on Windows.

### Coverage Summary

- [.github/actions/spelling/expect/rd.txt](/Users/jan/git/rancher-desktop-daemon/.github/actions/spelling/expect/rd.txt) — Trivial
- [bats/tests/33-lima-controllers/limavm-instance.bats](/Users/jan/git/rancher-desktop-daemon/bats/tests/33-lima-controllers/limavm-instance.bats) — Reviewed, no issues
- [bats/tests/33-lima-controllers/limavm-running.bats](/Users/jan/git/rancher-desktop-daemon/bats/tests/33-lima-controllers/limavm-running.bats) — Reviewed, no issues
- [docs/design/cmd_service.md](/Users/jan/git/rancher-desktop-daemon/docs/design/cmd_service.md) — Reviewed, no issues
- [docs/design/environment.md](/Users/jan/git/rancher-desktop-daemon/docs/design/environment.md) — Reviewed, no issues
- [pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go](/Users/jan/git/rancher-desktop-daemon/pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go) — Reviewed, no issues
- [pkg/instance/logs.go](/Users/jan/git/rancher-desktop-daemon/pkg/instance/logs.go) — Reviewed, no issues
- [pkg/instance/logs_test.go](/Users/jan/git/rancher-desktop-daemon/pkg/instance/logs_test.go) — Reviewed, no issues
- [pkg/service/cmd/service.go](/Users/jan/git/rancher-desktop-daemon/pkg/service/cmd/service.go) — Finding 1
- [pkg/util/logfile/logfile.go](/Users/jan/git/rancher-desktop-daemon/pkg/util/logfile/logfile.go) — Reviewed, no issues
- [pkg/util/logfile/logfile_test.go](/Users/jan/git/rancher-desktop-daemon/pkg/util/logfile/logfile_test.go) — Reviewed, no issues
- [scripts/collect-bats-logs.sh](/Users/jan/git/rancher-desktop-daemon/scripts/collect-bats-logs.sh) — Reviewed, no issues

### Gemini 3.1 Pro

### Executive Summary

This PR successfully introduces the ability to preserve instance and serial logs across restarts and deletions, primarily to aid post-mortem debugging. The implementation smartly extracts log rotation logic to make it reusable and gracefully cleans up log directories. However, there are a few important edge cases with the file operations: `logfile.Rotate` silently drops non-existence stat errors and prematurely prunes old logs when the active file is missing, and `PreserveLogs` aborts on the first file error rather than rescuing what it can.

### Critical Issues

None.

### Important Issues

1. **Premature pruning of backups when active log is missing** — `pkg/util/logfile/logfile.go:107` (important, regression)

   In `Rotate()`, `nextN` is unconditionally passed to `pruneOldFiles` even if no active log file was found to rename. This inflates the sequence number used for the retention cutoff, causing the oldest backup to be deleted even though no new backup was created. Over multiple starts where the active log is missing, this unnecessarily shrinks the retention pool by one file each time. (In the previous implementation of `Create`, a new file was immediately created, so passing `nextN` was always correct).

   ```diff
   -	if !keepAll {
   -		pruneOldFiles(dir, numberedFiles, nextN)
   -	}
   +	currentN := maxN
   	if _, err := os.Lstat(filePath); err == nil {
   		numberedName := fmt.Sprintf("%s.%d.log", name, nextN)
   		if err := os.Rename(filePath, filepath.Join(dir, numberedName)); err != nil {
   			return fmt.Errorf("rename %s to %s: %w", filePath, numberedName, err)
   		}
   		numberedFiles = append(numberedFiles, numberedFile{n: nextN, name: numberedName})
   +		currentN = nextN
   	}

   	if !keepAll {
   -		pruneOldFiles(dir, numberedFiles, nextN)
   +		pruneOldFiles(dir, numberedFiles, currentN)
   	}
   ```

2. **Silent omission of non-NotExist `os.Lstat` errors** — `pkg/util/logfile/logfile.go:93` (important, gap)

   In `Rotate()`, `os.Lstat(filePath)` is checked with `err == nil`. If `Lstat` fails with an error other than `fs.ErrNotExist` (e.g., permission denied), the error is silently ignored. The rename step is skipped, `Rotate` returns `nil`, and the operation proceeds (potentially causing the caller to fail opaquely later or to blindly overwrite the log without rotating it). Non-NotExist errors should be propagated.

   ```diff
   -	if _, err := os.Lstat(filePath); err == nil {
   +	if _, err := os.Lstat(filePath); err == nil {
   		numberedName := fmt.Sprintf("%s.%d.log", name, nextN)
   		if err := os.Rename(filePath, filepath.Join(dir, numberedName)); err != nil {
   			return fmt.Errorf("rename %s to %s: %w", filePath, numberedName, err)
   		}
   		numberedFiles = append(numberedFiles, numberedFile{n: nextN, name: numberedName})
   +	} else if !errors.Is(err, fs.ErrNotExist) {
   +		return fmt.Errorf("stat %s: %w", filePath, err)
   	}
   ```

3. **Log preservation aborts entirely on first file failure** — `pkg/instance/logs.go:54-57` (important, gap)

   In `PreserveLogs()`, if `os.Rename` fails for a single file (e.g., `ha.stderr.log` is held open by a lingering process on Windows, or due to an `EXDEV` cross-device link error), the function returns immediately. This aborts the loop and skips preserving any remaining log files for that instance. It should aggregate errors so that one unmovable file does not prevent the rest from being preserved.

   ```diff
   +	var errs []error
   	count := 0
   	for _, entry := range logFiles {
   		src := filepath.Join(srcDir, entry.Name())
   		dst := filepath.Join(destDir, entry.Name())
   		if err := os.Rename(src, dst); err != nil {
   -			return count, fmt.Errorf("preserve %s: %w", entry.Name(), err)
   +			errs = append(errs, fmt.Errorf("preserve %s: %w", entry.Name(), err))
   +			continue
   		}
   		count++
   	}

   -	return count, nil
   +	return count, errors.Join(errs...)
   ```

### Suggestions

None.

### Design Observations

**Concerns**
- **Cross-device link failures during log preservation** `(in-scope)`
  `instance.PreserveLogs` uses `os.Rename` to move logs from the instance directory (`LimaHome()`) to the log directory (`LogDir()`). If a user mounts these directories on different filesystems (or one is symlinked to another drive), `os.Rename` will consistently fail with an invalid cross-device link (`EXDEV`) error. Go's `os.Rename` does not fall back to copy-and-delete. While this is an edge case for local environments, implementing a fallback copy routine would make log preservation robust across all storage setups.

**Strengths**
- The automated BATS log collection strategy is cleanly improved; the new directory structure keeps artifacts organized and readable without brittle hardcoded paths.
- Reusing the numbering and pruning code in `logfile.Rotate` keeps the log backup convention perfectly consistent between service logs and serial logs.

### Testing Assessment

1. The new `TestRotateNoFile` test in `logfile_test.go` verifies rotation works when the log directory is completely empty. However, it misses the scenario where older backups exist but the *active* log file does not. Adding a test that sets up numbered backups (e.g., `serial.1.log` through `serial.5.log`), omits the active `serial.log`, and verifies that `serial.1.log` isn't incorrectly pruned would catch the bug identified in Finding 1.

### Documentation Assessment

The additions to `docs/design/environment.md` are excellent and clearly articulate the new log preservation behaviors and folder mapping semantics.

### Commit Structure

No issues. The commits are well isolated and descriptively named.

### Coverage Summary

- `.github/actions/spelling/expect/rd.txt` — Trivial
- `bats/tests/33-lima-controllers/limavm-instance.bats` — Reviewed, no issues
- `bats/tests/33-lima-controllers/limavm-running.bats` — Reviewed, no issues
- `docs/design/cmd_service.md` — Reviewed, no issues
- `docs/design/environment.md` — Reviewed, no issues
- `pkg/controllers/lima/limavm/controllers/limavm_lifecycle.go` — Reviewed, no issues
- `pkg/instance/logs.go` — Finding 3
- `pkg/instance/logs_test.go` — Reviewed, no issues
- `pkg/service/cmd/service.go` — Reviewed, no issues
- `pkg/util/logfile/logfile.go` — Finding 1, Finding 2
- `pkg/util/logfile/logfile_test.go` — Finding 1
- `scripts/collect-bats-logs.sh` — Reviewed, no issues
