Deep Review: 20260411-220257-pr-292
| Date | 2026-04-11 22:02 |
| Repo | rancher-sandbox/rancher-desktop-daemon |
| Round | 1 |
| Author | @jandubois |
| PR | #292 — Windows typeperf sampler |
| Commits | 9bed00e ci: add Windows machine-info and perf-sampler instrumentation5290abd ci: add typeperf-based perf sampler for comparison testinge2511a8 ci: use BLG format to capture processes started mid-rune44cc1b ci: remove PowerShell perf sampler in favor of typeperf |
| Reviewers | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| Verdict | Merge with fixes — the diagnostic is otherwise clean, but the environment-variable filter silently drops every CI/RDD/runner variable it was written to capture, and a relog failure on stop deletes the only raw BLG artifact. Both are worth fixing before merge; everything else is polish. |
| Wall-clock time | 16 min 41 s |
Executive Summary ¶
PR #292 adds two Windows-only CI composite actions that upload diagnostics alongside the BATS log bundle: windows-machine-info (one-shot hardware/OS/Defender/WSL snapshot via PowerShell) and windows-typeperf-sampler (background typeperf BLG collection, relog → TSV conversion on stop, and a Go summarizer that prints top-N processes by CPU and working set per sample). The sampler was iterated inside the PR from direct-TSV to BLG specifically to capture processes that start after collection begins, and the PowerShell sampler that commit 5290abd introduced alongside it was removed again in e44cc1b in favor of typeperf only.
The code is well-scoped and the design choices are sound: BLG + relog is the correct way to deal with typeperf's locked-at-first-sample column set, splitting one-shot and continuous sampling into two actions matches their lifecycles, and the Go summarizer indexes columns once from the header and streams rows. Two issues are worth fixing before merge. First, the env-var filter in machine-info.ps1:170 anchors its regex with ^(...)$, so prefix alternatives like GITHUB_, RUNNER_, RDD_, and MSYS only match exact variable names that do not exist — GITHUB_WORKSPACE, RUNNER_OS, RDD_TRACE, MSYSTEM are all silently dropped, defeating most of the diagnostic. Second, the stop path in windows-typeperf-sampler/action.yaml invokes relog without checking $LASTEXITCODE and unconditionally Remove-Item $blgFile -Forces the BLG, so any conversion failure permanently loses the raw capture — and because the workflow step sets continue-on-error: true, the job will not even fail. Both bugs are straightforward to fix.
Remaining findings are suggestions: add Wait-Process after Stop-Process so typeperf's file handle is guaranteed closed before relog runs; move or gate the 256 MB disk benchmark so it stops warming the file cache just before the BATS run it is meant to diagnose; wrap the machine-info FileStream benchmark in try/finally so a mid-write exception does not leak the handle; reconcile the stale summarize-perf-counters/scripts/... header comment in summarize-typeperf.go; print or remove the unused handles/threads counters; distinguish io.EOF from real CSV parse errors; and pass -y to typeperf for symmetry with the relog call.
Critical Issues ¶
None.
Important Issues ¶
Get-Service vmms, vmcompute, hns, WslService -ErrorAction SilentlyContinue |
Select-Object Name, Status, StartType | Format-Table -AutoSize
Write-Output "=== Top 20 processes by working set ==="
Get-Process | Sort-Object -Property WS -Descending | Select-Object -First 20 |
Select-Object @{N='WS_MB';E={[math]::Round($_.WorkingSet64/1MB,1)}},
@{N='CPU_sec';E={if ($_.CPU) { [math]::Round($_.CPU,1) } else { '' }}},
Id, Handles, ProcessName |
Format-Table -AutoSize
Write-Output "=== Environment ==="
^(...)$ anchors force each alternative to match the whole variable name. LOCALAPPDATA, TEMP, TMP, USERPROFILE, HOMEDRIVE, and PATH are exact names and match fine. But GITHUB_, RUNNER_, RDD_, and MSYS were intended as prefixes — no environment variable is literally named GITHUB_ or RUNNER_. Every GITHUB_WORKSPACE, GITHUB_ACTIONS, GITHUB_SHA, RUNNER_OS, RUNNER_TEMP, RDD_TRACE, RDD_LOG_LEVEL, MSYSTEM, MSYS2_PATH_TYPE fails the $ anchor and is dropped. I verified the behavior with python3 -c "import re; re.match(r'^(GITHUB_|...)$', 'GITHUB_WORKSPACE')" — no match.
The whole point of this section is correlating runner state with flaky timeouts. Losing the Actions metadata and the RDD tuning knobs defeats most of the value; right now the block only exposes LOCALAPPDATA, TEMP, TMP, USERPROFILE, HOMEDRIVE, and PATH.
Fix: turn the prefix entries into proper patterns. (Intentionally keep GITHUB_TOKEN out by not adding a GITHUB_.*TOKEN.* catch-all, though GITHUB_TOKEN is not set on composite-action steps unless explicitly mapped.)
- Get-ChildItem env: | Where-Object { $_.Name -match '^(LOCALAPPDATA|TEMP|TMP|USERPROFILE|HOMEDRIVE|GITHUB_|RUNNER_|RDD_|MSYS|PATH)$' } |
+ Get-ChildItem env: | Where-Object { $_.Name -match '^(LOCALAPPDATA|TEMP|TMP|USERPROFILE|HOMEDRIVE|PATH|GITHUB_.*|RUNNER_.*|RDD_.*|MSYS.*)$' } |
Sort-Object Name | Format-Table -AutoSize Name, Value
Wait-Process -Id $typeperfPid -Timeout 5 -ErrorAction SilentlyContinue
Remove-Item $pidFile -Force -ErrorAction SilentlyContinue
Write-Output "stopped typeperf pid=$typeperfPid"
}
$blgFile = "$outputDir/typeperf.blg"
$tsvFile = "$outputDir/typeperf.tsv"
if (Test-Path $blgFile) {
$size = (Get-Item $blgFile).Length / 1MB
Write-Output ("typeperf.blg: {0:N1} MB" -f $size)
relog $blgFile -f TSV -o $tsvFile -y
if ($LASTEXITCODE -ne 0 -or -not (Test-Path $tsvFile)) {
Write-Warning "relog failed; preserving $blgFile for inspection"
exit 1
}
$lines = (Get-Content $tsvFile | Measure-Object -Line).Lines
Write-Output ("typeperf.tsv: {0:N1} MB, {1} lines" -f ((Get-Item $tsvFile).Length / 1MB), $lines)
go run "${{ github.action_path }}/summarize-typeperf.go" `
$tsvFile 10 > "$outputDir/typeperf-summary.txt"
if ($LASTEXITCODE -ne 0) {
PowerShell does not throw on a non-zero exit from a native command; a failed relog silently flows into the Get-Content/Measure-Object call on a missing TSV (non-terminating error, execution continues) and into the unconditional Remove-Item $blgFile -Force at line 54. Because the workflow step in .github/workflows/bats.yaml:171-177 sets continue-on-error: true, the job will not fail — it will just upload an empty rdd-logs/_diag/ with no raw BLG to retry conversion against. For a diagnostic action whose whole job is preserving evidence, that is the worst failure mode.
shell: run-in-shell {0}
env:
RDD_TRACE: true
RDD_LOG_LEVEL: trace
- name: "Windows: stop typeperf sampler"
if: always() && runner.os == 'Windows'
uses: ./.github/actions/windows-typeperf-sampler
with:
command: stop
output-dir: rdd-logs/_diag
continue-on-error: true
- name: Collect RDD logs
if: always()
shell: run-in-shell {0}
run: scripts/collect-bats-logs.sh rdd-logs
Fix: gate subsequent steps on $LASTEXITCODE and only delete the BLG after both conversion and summarization succeed; keep the BLG on failure so it can be re-processed locally.
if (Test-Path $blgFile) {
$size = (Get-Item $blgFile).Length / 1MB
Write-Output ("typeperf.blg: {0:N1} MB" -f $size)
relog $blgFile -f TSV -o $tsvFile -y
+ if ($LASTEXITCODE -ne 0 -or -not (Test-Path $tsvFile)) {
+ Write-Warning "relog failed; preserving $blgFile for inspection"
+ exit 1
+ }
$lines = (Get-Content $tsvFile | Measure-Object -Line).Lines
Write-Output ("typeperf.tsv: {0:N1} MB, {1} lines" -f ((Get-Item $tsvFile).Length / 1MB), $lines)
- Remove-Item $blgFile -Force
go run "${{ github.action_path }}/summarize-typeperf.go" `
$tsvFile 10 > "$outputDir/typeperf-summary.txt"
+ if ($LASTEXITCODE -eq 0) {
+ Remove-Item $blgFile -Force
+ }
Write-Output "summary written to $outputDir/typeperf-summary.txt"
}
Suggestions ¶
run: |
$outputDir = "${{ inputs.output-dir }}"
$pidFile = "$outputDir/typeperf.pid"
if (Test-Path $pidFile) {
$typeperfPid = Get-Content $pidFile
Stop-Process -Id $typeperfPid -Force -ErrorAction SilentlyContinue
Wait-Process -Id $typeperfPid -Timeout 5 -ErrorAction SilentlyContinue
Remove-Item $pidFile -Force -ErrorAction SilentlyContinue
Write-Output "stopped typeperf pid=$typeperfPid"
}
$blgFile = "$outputDir/typeperf.blg"
$tsvFile = "$outputDir/typeperf.tsv"
if (Test-Path $blgFile) {
$size = (Get-Item $blgFile).Length / 1MB
Write-Output ("typeperf.blg: {0:N1} MB" -f $size)
relog $blgFile -f TSV -o $tsvFile -y
if ($LASTEXITCODE -ne 0 -or -not (Test-Path $tsvFile)) {
Write-Warning "relog failed; preserving $blgFile for inspection"
exit 1
}
Stop-Process -Force issues TerminateProcess, which is asynchronous: the kernel signals cancellation but the process object and its file handles are not fully released until every outstanding I/O completes. On a busy runner typeperf can still hold typeperf.blg for a few milliseconds when relog starts, producing a file-sharing violation. The race window is tiny and this is defensive rather than a demonstrated failure, but the cost of a lost summary partly defeats the instrumentation, so it is worth eliminating.
Gemini rated this Important; I downgraded it to a Suggestion because typeperf flushes each BLG sample immediately and Windows file-handle teardown after TerminateProcess is usually complete within milliseconds, so the practical failure rate on CI runners is near zero. Wait-Process is still the right call.
Fix:
Stop-Process -Id $typeperfPid -Force -ErrorAction SilentlyContinue
+Wait-Process -Id $typeperfPid -Timeout 5 -ErrorAction SilentlyContinue
Remove-Item $pidFile -Force -ErrorAction SilentlyContinue
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: SUSE LLC
// SPDX-FileCopyrightText: The Rancher Desktop Authors
// summarize-typeperf reads a TSV file produced by typeperf (via relog
// from BLG) and prints a compact summary of system counters and top
// processes per sample.
//
// Usage:
//
// go run .github/actions/windows-typeperf-sampler/summarize-typeperf.go <perfdata.tsv> [topN]
package main
import (
"encoding/csv"
The file is summarize-typeperf.go under .github/actions/windows-typeperf-sampler/, and the workflow invokes that exact path from action.yaml:55-56. The header refers to a summarize-perf-counters name and scripts/summarize-perf-counters.go path that git log --all -- scripts/summarize* confirms has never existed in this repo — it looks like leftover boilerplate from an external adaptation. Anyone trying to reproduce the summary locally will be sent to a file that is not there.
The companion fmt.Fprintf(os.Stderr, "usage: summarize-perf-counters ...") at line 46 has the same stale name.
handles int
threads int
}
var headerRe = regexp.MustCompile(`\\Process\(([^)]+)\)\\(.+)$`)
func main() {
if len(os.Args) < 2 {
fmt.Fprintf(os.Stderr, "usage: summarize-typeperf <perfdata.tsv> [topN]\n")
os.Exit(1)
}
Fix:
-// summarize-perf-counters reads a TSV file produced by typeperf and prints
-// a compact summary of system counters and top processes per sample.
+// summarize-typeperf reads a TSV file produced by typeperf (via relog from
+// BLG) and prints a compact summary of system counters and top processes
+// per sample.
//
// Usage:
//
-// go run scripts/summarize-perf-counters.go <perfdata.tsv> [topN]
+// go run .github/actions/windows-typeperf-sampler/summarize-typeperf.go <perfdata.tsv> [topN]
package main
And update the usage line at line 46 to match.
run: |
wsl --list --verbose
wsl --status
continue-on-error: true
- name: "Windows: capture machine info"
if: runner.os == 'Windows'
uses: ./.github/actions/windows-machine-info
with:
output-path: rdd-logs/_diag/machine-info.txt
continue-on-error: true
- name: "Windows: start typeperf sampler"
if: runner.os == 'Windows'
uses: ./.github/actions/windows-typeperf-sampler
with:
command: start
output-dir: rdd-logs/_diag
continue-on-error: true
- run: make -C bats --keep-going --jobs=$(nproc) --output-sync=target
shell: run-in-shell {0}
env:
RDD_TRACE: true
RDD_LOG_LEVEL: trace
machine-info.ps1:79-128 deliberately writes and reads back a 256 MB sequential file on the same volume as the BATS workload, and the code comment at lines 80-86 explicitly notes that this measures the cache-warm sequential rate (not raw disk) because "FileStream with no buffering still goes through the OS file cache". That is fine as a runner-characterization data point, but placing the step right before the BATS step means every Windows job now starts with a 256 MB warm-up burst that both drains ~5 seconds of wall clock and changes the disk/cache state before the test the diagnostic is meant to correlate with begins.
'\PhysicalDisk(_Total)\Disk Write Bytes/sec',
'\PhysicalDisk(_Total)\Disk Read Bytes/sec',
'\PhysicalDisk(_Total)\Current Disk Queue Length'
) -ErrorAction SilentlyContinue | Format-List
Write-Output "=== Disk Throughput Benchmark ==="
# Sequential write/read of a 256 MB file in TEMP. This measures the disk
# path the bats run actually uses (Lima's instance dirs and the lima
# download cache live under %USERPROFILE% / %LOCALAPPDATA%, on the same
# volume as %TEMP% on a default GitHub runner). FileStream with no
# buffering still goes through the OS file cache, so this is the
# cache-warm sequential rate, not raw disk speed -- but that is what
# gzip writes hit too.
$benchFile = Join-Path $env:TEMP "diag-disk-bench-$([System.Guid]::NewGuid().ToString('N')).bin"
try {
$sizeMB = 256
$buf = New-Object byte[] (1MB)
(New-Object Random).NextBytes($buf)
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$fs = [System.IO.File]::Create($benchFile)
try {
for ($i = 0; $i -lt $sizeMB; $i++) {
$fs.Write($buf, 0, $buf.Length)
}
$fs.Flush($true) # flush to disk
} finally {
$fs.Dispose()
}
$sw.Stop()
$writeMs = $sw.Elapsed.TotalMilliseconds
$writeMBs = [math]::Round($sizeMB * 1000 / $writeMs, 1)
$sw.Restart()
$fs = [System.IO.File]::OpenRead($benchFile)
try {
$total = 0
$readbuf = New-Object byte[] (1MB)
while (($n = $fs.Read($readbuf, 0, $readbuf.Length)) -gt 0) {
$total += $n
}
} finally {
$fs.Dispose()
}
$sw.Stop()
$readMs = $sw.Elapsed.TotalMilliseconds
$readMBs = [math]::Round($sizeMB * 1000 / $readMs, 1)
[PSCustomObject]@{
File = $benchFile
SizeMB = $sizeMB
WriteMs = [math]::Round($writeMs, 1)
WriteMBPerSec = $writeMBs
ReadMs = [math]::Round($readMs, 1)
ReadMBPerSec = $readMBs
} | Format-List
} catch {
Write-Output "disk bench failed: $($_.Exception.Message)"
} finally {
if (Test-Path $benchFile) { Remove-Item $benchFile -Force -ErrorAction SilentlyContinue }
if: runner.os == 'macOS'
working-directory: ${{ runner.temp }}
run: |
brew install bash coreutils make
# Set up run-in-shell helper script (as a symlink to bash)
mkdir -p bin
ln -s $(brew --prefix bash)/bin/bash bin/run-in-shell
readlink -f bin >> "$GITHUB_PATH"
# Use GNU make
echo "$(brew --prefix)/opt/make/libexec/gnubin" >> "$GITHUB_PATH"
- name: "Windows: install prerequisites"
if: runner.os == 'Windows'
shell: pwsh
working-directory: ${{ runner.temp }}
Fix: either move the machine-info step to run after the BATS step (runner characteristics do not change during the job), or gate the disk benchmark behind an action input and default it off.
FileStream benchmark does not dispose the handle on error, so a mid-write exception leaks the file Gemini¶
# download cache live under %USERPROFILE% / %LOCALAPPDATA%, on the same
# volume as %TEMP% on a default GitHub runner). FileStream with no
# buffering still goes through the OS file cache, so this is the
# cache-warm sequential rate, not raw disk speed -- but that is what
# gzip writes hit too.
$benchFile = Join-Path $env:TEMP "diag-disk-bench-$([System.Guid]::NewGuid().ToString('N')).bin"
try {
$sizeMB = 256
$buf = New-Object byte[] (1MB)
(New-Object Random).NextBytes($buf)
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$fs = [System.IO.File]::Create($benchFile)
try {
for ($i = 0; $i -lt $sizeMB; $i++) {
$fs.Write($buf, 0, $buf.Length)
}
$fs.Flush($true) # flush to disk
} finally {
$fs.Dispose()
}
$sw.Stop()
$writeMs = $sw.Elapsed.TotalMilliseconds
$writeMBs = [math]::Round($sizeMB * 1000 / $writeMs, 1)
$sw.Restart()
$fs = [System.IO.File]::OpenRead($benchFile)
try {
$total = 0
$readbuf = New-Object byte[] (1MB)
while (($n = $fs.Read($readbuf, 0, $readbuf.Length)) -gt 0) {
$total += $n
}
} finally {
$fs.Dispose()
}
$sw.Stop()
$readMs = $sw.Elapsed.TotalMilliseconds
If $fs.Write throws mid-loop (disk full, antivirus locking, I/O error) control jumps past $fs.Close() into the outer catch block. The handle is leaked to GC. The finally then tries Remove-Item $benchFile -Force -ErrorAction SilentlyContinue on a still-locked file; with SilentlyContinue the delete failure is swallowed and a 256 MB file lingers under %TEMP% until the runner is recycled. The identical pattern repeats in the read loop at lines 104-111.
Fix: wrap each stream in its own try/finally so the handle is disposed deterministically.
- $fs = [System.IO.File]::Create($benchFile)
- for ($i = 0; $i -lt $sizeMB; $i++) {
- $fs.Write($buf, 0, $buf.Length)
- }
- $fs.Flush($true) # flush to disk
- $fs.Close()
+ $fs = [System.IO.File]::Create($benchFile)
+ try {
+ for ($i = 0; $i -lt $sizeMB; $i++) {
+ $fs.Write($buf, 0, $buf.Length)
+ }
+ $fs.Flush($true) # flush to disk
+ } finally {
+ $fs.Dispose()
+ }
"math"
"os"
"regexp"
"slices"
"strconv"
"strings"
)
type procSnap struct {
name string
pid int
cpu float64
wsMB float64
handles int
threads int
}
// procIndex holds column indices for one process instance.
printTopN only prints name, pid, cpu, wsMB:
fmt.Printf(" %-30s PID %6d CPU %7.1f%% WS %8.1f MB\n",
s.name, s.pid, s.cpu, s.wsMB)
The handles/threads fields on procSnap, the matching fields on procIndex, the header detection for Handle Count/Thread Count, and the per-sample extraction at lines 160-168 all exist only to populate two fields that nothing reads. The corresponding \Process(*)\Handle Count and \Process(*)\Thread Count lines in typeperf-counters.txt each add one column per tracked process per sample to the BLG (and the intermediate TSV) for no benefit.
}
// Per-process snapshots via precomputed indices
snaps := make([]procSnap, 0, len(instances))
for _, inst := range instances {
pi := procs[inst]
cpu, cpuOK := colFloat(record, pi.cpu)
ws, wsOK := colFloat(record, pi.ws)
if !cpuOK && !wsOK {
continue
}
pid, _ := colInt(record, pi.pid)
handles, _ := colInt(record, pi.handles)
threads, _ := colInt(record, pi.threads)
snaps = append(snaps, procSnap{
name: inst,
pid: pid,
cpu: cpu,
wsMB: ws / (1024 * 1024),
Handle and thread counts are actually useful for leak diagnosis, so extending the output is probably preferable to deleting the counters:
- fmt.Printf(" %-30s PID %6d CPU %7.1f%% WS %8.1f MB\n",
- s.name, s.pid, s.cpu, s.wsMB)
+ fmt.Printf(" %-30s PID %6d CPU %7.1f%% WS %8.1f MB H %6d T %4d\n",
+ s.name, s.pid, s.cpu, s.wsMB, s.handles, s.threads)
case "Thread Count":
pi.threads = i
}
}
fmt.Printf("Processes tracked: %d\n", len(procs))
rowNum := 0
for {
record, err := reader.Read()
if errors.Is(err, io.EOF) {
break
}
if err != nil {
fmt.Fprintf(os.Stderr, "read row %d: %v\n", rowNum+1, err)
encoding/csv.Reader returns io.EOF at clean end, but also returns csv.ErrFieldCount or quoting errors mid-file. The current loop collapses all of them into a silent break, leaving the misleading Samples: N footer at line 176 with a low count and no explanation. Worth distinguishing io.EOF from real parse errors so a partial TSV still summarizes what it can while flagging the failure on stderr.
pid: pid,
cpu: cpu,
wsMB: ws / (1024 * 1024),
handles: handles,
threads: threads,
})
}
printTopN(snaps, topN, "CPU", func(s procSnap) float64 { return s.cpu })
printTopN(snaps, topN, "Working Set", func(s procSnap) float64 { return s.wsMB })
}
for {
record, err := reader.Read()
+ if errors.Is(err, io.EOF) {
+ break
+ }
if err != nil {
- break
+ fmt.Fprintf(os.Stderr, "read row %d: %v\n", rowNum+1, err)
+ break
}
rowNum++
(Adds errors and io imports.)
if (-not (Test-Path $outputDir)) {
New-Item -ItemType Directory -Force -Path $outputDir | Out-Null
}
$counterFile = "${{ github.action_path }}/typeperf-counters.txt"
$blgFile = "$outputDir/typeperf.blg"
$proc = Start-Process -PassThru -WindowStyle Hidden -FilePath typeperf `
-ArgumentList "-cf", $counterFile, "-f", "BIN", "-o", $blgFile, "-si", "1", "-y"
$proc.Id | Set-Content -Path "$outputDir/typeperf.pid"
Write-Output "typeperf started, pid=$($proc.Id), output=$blgFile"
- name: Stop typeperf and summarize
if: runner.os == 'Windows' && inputs.command == 'stop'
If $blgFile already exists, typeperf prompts to overwrite and blocks on stdin. With -WindowStyle Hidden the prompt is invisible and the sampler hangs indefinitely. Fresh CI runners never hit this, but relog on line 51 already passes -y for exactly the same reason; the start step should be symmetric as cheap insurance against a reused runner.
}
$blgFile = "$outputDir/typeperf.blg"
$tsvFile = "$outputDir/typeperf.tsv"
if (Test-Path $blgFile) {
$size = (Get-Item $blgFile).Length / 1MB
Write-Output ("typeperf.blg: {0:N1} MB" -f $size)
relog $blgFile -f TSV -o $tsvFile -y
if ($LASTEXITCODE -ne 0 -or -not (Test-Path $tsvFile)) {
Write-Warning "relog failed; preserving $blgFile for inspection"
exit 1
}
- -ArgumentList "-cf", $counterFile, "-f", "BIN", "-o", $blgFile, "-si", "1"
+ -ArgumentList "-cf", $counterFile, "-f", "BIN", "-o", $blgFile, "-si", "1", "-y"
Design Observations ¶
Strengths ¶
- BLG-then-relog is the correct architecture for this problem
(in-scope)Claude Codex. Commite2511a8correctly identifies that typeperf's direct-TSV mode locks the column set at the first sample and misses processes that start later; switching to BLG and converting viarelogat stop decouples low-overhead collection from post-processing and eliminates the blind spot entirely. The commit message's explanation is worth keeping even though squashing it into5290abdwould have been defensible. - Separation of one-shot machine-info and continuous typeperf sampler matches their lifecycles
(in-scope)Claude. Runner hardware and OS state do not change during a job, so capturing them once at start is the right shape; perf counters do change and need continuous sampling. Splitting them into two composite actions keeps each one independently usable. - Go summarizer indexes columns once and streams rows
(in-scope)Claude Gemini.summarize-typeperf.go:76-113builds the process-column index from the header in a single pass, then the per-row loop does simple lookups — no repeated header scans. The column-index struct initializes to-1so missing counters are safely handled bycolFloat/colIntboundary checks, preventing panics on abbreviated rows.
Testing Assessment ¶
No automated test coverage is expected for CI-only diagnostic scripts, and the agents did not flag its absence. The more pragmatic gap is that the regex bug in I1 would have been visible to anyone who looked at the first machine-info.txt artifact produced on CI — worth eyeballing the artifact on the first post-merge run to confirm the fix actually surfaces GITHUB_*, RUNNER_*, RDD_*, and MSYS* variables. The end-to-end start→BATS→stop cycle producing both typeperf.tsv and typeperf-summary.txt also has no validation other than watching the first Windows CI run.
Documentation Assessment ¶
The header comment in summarize-typeperf.go is stale (see S2). Beyond that, the PR description in #292 still describes the sampler as running "alongside the existing PowerShell sampler so we can compare overhead and data quality" and lists both perf-sampler.jsonl (PowerShell) and typeperf.tsv (new) in its artifacts table — but commit e44cc1b removed the PowerShell sampler before the final state, so the description is out of sync with what merges. Worth updating before merge so the commit history matches the PR narrative.
Commit Structure ¶
Four commits form a coherent narrative: (1) 9bed00e introduces machine-info and the original PowerShell sampler, (2) 5290abd adds the typeperf variant alongside it, (3) e2511a8 fixes the direct-TSV blind spot by switching to BLG, (4) e44cc1b deletes the PowerShell sampler once typeperf proved superior. The e2511a8 fix could arguably have been squashed into 5290abd, but leaving it separate preserves useful context about why BLG is required over direct-TSV — the reason is non-obvious enough that the audit trail is worth the extra commit.
Acknowledged Limitations ¶
- Disk benchmark is cache-warm, not raw disk —
machine-info.ps1:80-86Codex Gemini. The code comment explicitly notes thatFileStreamwithout buffering still traverses the OS file cache, so the reported MB/s reflects the cache-warm sequential rate rather than raw disk throughput. The author calls this out correctly; with S3 it becomes more important because the benchmark is what warms the cache just before BATS runs.
Agent Performance Retro ¶
[Claude] — Claude Opus 4.6 ¶
Claude produced the most findings (6) and the only diagnosis of the regex bug in machine-info.ps1 (I1), which is the single most consequential issue in the review because it silently zeroes out the whole "Environment" section of the diagnostic artifact. Its approach was methodical: it read each file, built a mental model of what the diagnostic was supposed to expose, and then noticed that what it would actually expose did not match. Claude also caught three smaller, easy-to-miss quality issues in summarize-typeperf.go: the stale header comment (S2, also found by Codex), the dead handles/threads code path (S5), and the io.EOF-vs-parse-error loop (S6). The typeperf -y defensive hardening (S7) is unique to Claude as well.
Claude independently identified the Stop-Process/relog race Gemini flagged but calibrated it lower (Suggestion, not Important), which matches the realistic CI behavior better. Its only gap is that it missed Codex's relog/BLG loss (I2) — a small block of code that does not look risky until you think about $LASTEXITCODE semantics on native commands in PowerShell. It also marked bats.yaml as "Reviewed, no issues" while Codex correctly flagged a step-ordering concern in that file (S3 in this report).
Tool usage: 21 calls (Bash 10, Read 9, Grep 1). The Bash calls include git log --all, git blame, and verification of the PowerShell regex behavior — exactly the depth the prompt asks for.
[Codex] — Codex GPT 5.4 ¶
Codex produced the fewest findings (3) but every single one is substantive: the relog failure mode (I2) that wipes the raw BLG is the most original finding in the review because it requires knowing that PowerShell does not throw on native command non-zero exit codes, and it directly interacts with the workflow's continue-on-error: true, which Codex also cited by file:line. The disk-bench-perturbs-CI-cache concern (S3) is a systems-level observation that neither other agent made, and the stale header comment (S2) was found in parallel with Claude.
Codex had zero false positives, zero duplicates, and a 100 % signal rate on its three findings. It missed several smaller polish items (dead handles/threads counters in summarize-typeperf.go, the FileStream disposal leak, the env var regex, the io.EOF loop, the typeperf -y symmetry), some of which are in files it reviewed. Its one "Reviewed, no issues" coverage miss is typeperf-counters.txt:14-15 — the two lines that configure the counters feeding Claude's S5 dead-code finding.
Tool usage: 22 calls (21 shell, 1 write_stdin). Good depth; the shell breakdown likely includes multiple git log/git blame invocations and file reads via sed/cat.
[Gemini] — Gemini 3.1 Pro ¶
Gemini produced 2 findings. Its unique contribution is the FileStream disposal-on-error leak in machine-info.ps1 (S4 in this report) — a genuine PowerShell resource-management observation that neither other agent caught, and one Gemini is well-calibrated for because it pattern-matches on try/finally idioms without needing deep semantic tracing. Its other finding, the Stop-Process/relog race, is duplicated by Claude (S3 → S1 in consolidation). Gemini rated that race Important; I downgraded it to Suggestion because the race window after TerminateProcess is typically sub-millisecond on Windows and typeperf flushes BLG writes immediately, so the practical failure rate on GitHub runners is near zero.
Gemini marked summarize-typeperf.go, bats.yaml, and typeperf-counters.txt as "Reviewed, no issues"/Trivial while the other agents found three distinct issues in those files. The Go summarizer gap is particularly notable — it is a 236-line Go file, exactly the kind of code Gemini is supposed to be strong at, but it produced zero findings there. The session file shows no recorded tool calls, consistent with Gemini's well-known habit of skipping git blame to conserve its daily Pro-model quota.
Tool usage: session did not record individual tool calls. From the prompt-to-response elapsed time (~3m 52s) and the depth of its two findings, Gemini clearly read at least machine-info.ps1 and windows-typeperf-sampler/action.yaml but apparently did not explore the Go file or the workflow in enough detail to form opinions there.
Summary ¶
| Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro | |
|---|---|---|---|
| Duration | 7m 27s | 3m 41s | 3m 52s |
| Findings | 1I 5S | 1I 2S | 0I 2S |
| Tool calls | 21 (Bash 10, Read 9, Grep 1) | 22 (shell 21, stdin 1) | not recorded |
| Design observations | 3 | 1 | 2 |
| False positives | 0 | 0 | 0 |
| Unique insights | 4 | 2 | 1 |
| Files reviewed | 6 | 6 | 6 |
| Coverage misses | 1 | 1 | 3 |
| Totals | 1I 5S | 1I 2S | 0I 2S |
| Downgraded | 0 | 0 | 1 (I→S) |
| Dropped | 0 | 0 | 0 |
Reconciliation. One severity change during consolidation: Gemini P1 "Race condition when stopping typeperf" was rated important and consolidated as suggestion (S1 in this report). Rationale: after TerminateProcess, file-handle cleanup on Windows completes in sub-millisecond time; typeperf flushes BLG writes at sample time rather than at exit, so there is no pending I/O holding the file; and neither this PR's stop path nor the widely deployed Stop-Process-then-use pattern in other CI scripts shows the failure empirically. Wait-Process is still the right defensive change, so the finding is kept — just at the correct severity. No false positives from any agent. No dropped findings.
Overall. All three agents added distinct value: Claude is the clear leader on breadth and caught the highest-impact bug (I1); Codex is the clear leader on signal-to-noise ratio, delivering the other Important finding (I2) plus two substantive suggestions with zero noise; Gemini contributed one genuinely unique suggestion (S4) that the other two missed. The multi-agent ensemble was justified for this PR — removing any single agent would have dropped a real finding from the report.
Review Process Notes ¶
Skill improvements ¶
- Add a PowerShell "native command exit code" heuristic to the review dimensions. Codex's I2 was the single most original finding in this review, and it required knowing that in PowerShell, a non-zero exit from an external command like
relogdoes not throw and does not stop execution — subsequent statements run against whatever state the failed command left behind. This is a general review principle that will come up again any time we review PowerShell in CI: "When reviewing PowerShell that invokes native commands, verify that$LASTEXITCODE(orif (-not $?)) is checked after each external invocation; PowerShell does not automatically throw on non-zero exits, so errors silently propagate into subsequent statements that assume success." This is a real prompt-gap signal because only one of three agents found the issue, and the inference is a domain rule that generalizes beyond this PR.
Repo context updates ¶
No repo context gaps surfaced in this review. The PR touches only CI infrastructure and has no interaction with the controllers, kube-apiserver, Lima lifecycle, or any of the architectural areas that deep-review-context.md covers. No entries in that file are contradicted by the reviewed code, so no pruning is needed.