Deep Review: 20260408-175512-pr-280

Date	2026-04-08 17:55
Repo	rancher-sandbox/rancher-desktop-daemon
Round	3
Author	@jandubois
PR	#280 — Add host-switch virtual network for WSL2 instances
Branch	`host-switch`
Commits	`a29eda1` Add host-switch virtual network for WSL2 instances `d498946` Fix go-mod-k8s-sync checkout for cross-repository PRs
Reviewers	Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
Verdict	Merge with fixes — two important issues: accept loop swallows unexpected errors, and one-shot registry read fails when no WSL2 VM exists yet
Wall-clock time	`32 min 27 s`

Executive Summary ¶

This PR ports the host-switch networking logic from rancher-desktop into the LimaVM controller as an in-process goroutine. The goroutine performs a vsock handshake with the WSL2 guest, then runs a gvisor-tap-vsock virtual network providing DNS, DHCP, and NAT. A second commit fixes the CI workflow for cross-repository PRs. The lifecycle integration is thorough — every start, stop, restart, crash, and shutdown path correctly pairs startHostSwitch and stopHostSwitch. Two issues need attention: the vsock accept loop silently swallows unexpected errors, and the registry-based VM discovery reads only once, failing immediately on fresh systems where no WSL2 VM exists yet.

Critical Issues ¶

None.

Important Issues ¶

I1. Accept loop swallows unexpected errors Claude Gemini¶

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go

	// Accept vsock connections and feed them into the virtual network.
	g.Go(func() error {
		for {
			conn, err := ln.Accept()
			if err != nil {
				if errors.Is(err, net.ErrClosed) {
					return nil // Listener closed during shutdown.
				}
				return fmt.Errorf("vsock accept failed: %w", err)
			}
			// AcceptStdio blocks until the connection closes. This is
			// intentional: each VM runs a single vm-switch process, so
			// reconnections are serial (old connection EOF, then new accept).
			if err := vn.AcceptStdio(ctx, conn); err != nil {
				logger.Error(err, "Failed to accept connection into virtual network")
			} else {

The accept goroutine returns nil for all Accept failures, not just the net.ErrClosed expected during shutdown. If Accept fails for an unexpected reason (resource exhaustion, OS-level socket error), the goroutine exits silently. Because it returns nil, the errgroup does not cancel the shared context. The HTTP server and context-wait goroutines keep running, but no new vsock connections can be accepted. The guest retains its existing connection but cannot reconnect after a network restart. The only external symptom is a "Host-switch stopped" log at line 241 — no error is surfaced.

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go


	logger.Info("Host-switch running", "subnet", subnet.SubnetCIDR, "gateway", subnet.GatewayIP)

	if err := g.Wait(); err != nil {
		logger.Error(err, "Host-switch exited with error")
	} else {
		logger.Info("Host-switch stopped")
	}
}

// newVirtualNetworkConfig builds the gvisor-tap-vsock configuration.

Fix: Distinguish shutdown from unexpected errors.

 		conn, err := ln.Accept()
 		if err != nil {
-			// Listener closed (context cancelled).
-			return nil //nolint:nilerr // Expected on shutdown.
+			if errors.Is(err, net.ErrClosed) {
+				return nil // Listener closed during shutdown.
+			}
+			return fmt.Errorf("vsock accept failed: %w", err)
 		}

This propagates unexpected errors through the errgroup, which cancels the context and triggers clean shutdown of all goroutines. The error is then logged at line 239.

I2. Registry-based VM discovery is one-shot; fails when no WSL2 VM exists Gemini¶

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go


	return ln, nil
}

// getVMGUID discovers the Hyper-V VM running our distro by polling the
// registry for VM GUIDs and handshaking with each in parallel. The registry
// is rescanned every second so that VMs starting after the initial scan
// (e.g., the WSL2 utility VM on a fresh system) are discovered.
//
// The signature-based discovery assumes only one opensuse WSL2 instance
// runs at a time. With multiple instances, the first match wins and the
// others get no host-switch networking.
func getVMGUID(ctx context.Context, logger logr.Logger) (hvsock.GUID, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	found := make(chan hvsock.GUID, 1)
	seen := make(map[hvsock.GUID]bool)

	scanRegistry := func() {
		key, err := registry.OpenKey(
			registry.LOCAL_MACHINE,
			`SOFTWARE\Microsoft\Windows NT\CurrentVersion\HostComputeService\VolatileStore\ComputeSystem`,
			registry.ENUMERATE_SUB_KEYS)
		if err != nil {
			return // Registry key not present yet; retry on next tick.
		}
		names, err := key.ReadSubKeyNames(0)
		key.Close()
		if err != nil {
			return
		}
		for _, name := range names {
			vmGUID, err := hvsock.GUIDFromString(name)
			if err != nil {
				logger.V(1).Info("Skipping invalid VM GUID", "name", name, "error", err)
				continue
			}

getVMGUID reads the Hyper-V registry once and launches handshake goroutines for each GUID found. If the registry key is absent or contains zero entries — which happens on a fresh system where no other WSL2 distro is running — the function returns an error immediately. Because startHostSwitch runs before the hostagent boots the WSL2 VM (by design, since the guest blocks on the handshake), the utility VM may not exist yet.

When another WSL2 distro is already running, the shared utility VM's GUID is in the registry, and attemptHandshake polls it every second until the opensuse distro's network-setup starts listening. The bug manifests only when no WSL2 VM exists at startup time.

Fix: Retry registry reads within the handshake timeout loop, spawning handshake goroutines for newly discovered GUIDs as they appear.

 func getVMGUID(ctx context.Context, logger logr.Logger) (hvsock.GUID, error) {
-	key, err := registry.OpenKey(...)
-	// ... one-shot read
+	ticker := time.NewTicker(time.Second)
+	defer ticker.Stop()
+
+	seen := make(map[hvsock.GUID]bool)
+	found := make(chan hvsock.GUID, 1)
+
+	for {
+		select {
+		case <-ctx.Done():
+			return hvsock.GUIDZero, fmt.Errorf("VM GUID discovery timed out: %w", ctx.Err())
+		case vmGUID := <-found:
+			return vmGUID, nil
+		case <-ticker.C:
+			key, err := registry.OpenKey(...)
+			if err != nil {
+				continue // VM not running yet
+			}
+			names, _ := key.ReadSubKeyNames(0)
+			key.Close()
+			for _, name := range names {
+				vmGUID, err := hvsock.GUIDFromString(name)
+				if err == nil && !seen[vmGUID] {
+					seen[vmGUID] = true
+					go attemptHandshake(ctx, logger, vmGUID, found)
+				}
+			}
+		}
+	}
 }

This merges the current getVMGUID and attemptHandshake polling into a single retry loop: the registry is re-scanned each tick, and handshake goroutines are spawned only for newly discovered GUIDs. The existing 5-minute handshakeTimeout bounds the total wait.

Suggestions ¶

S1. VM discovery cannot distinguish multiple opensuse instances Claude Codex¶

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go


	return ln, nil
}

// getVMGUID discovers the Hyper-V VM running our distro by polling the
// registry for VM GUIDs and handshaking with each in parallel. The registry
// is rescanned every second so that VMs starting after the initial scan
// (e.g., the WSL2 utility VM on a fresh system) are discovered.
//
// The signature-based discovery assumes only one opensuse WSL2 instance
// runs at a time. With multiple instances, the first match wins and the
// others get no host-switch networking.
func getVMGUID(ctx context.Context, logger logr.Logger) (hvsock.GUID, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	found := make(chan hvsock.GUID, 1)
	seen := make(map[hvsock.GUID]bool)

	scanRegistry := func() {
		key, err := registry.OpenKey(
			registry.LOCAL_MACHINE,
			`SOFTWARE\Microsoft\Windows NT\CurrentVersion\HostComputeService\VolatileStore\ComputeSystem`,
			registry.ENUMERATE_SUB_KEYS)
		if err != nil {
			return // Registry key not present yet; retry on next tick.
		}
		names, err := key.ReadSubKeyNames(0)
		key.Close()
		if err != nil {
			return
		}
		for _, name := range names {
			vmGUID, err := hvsock.GUIDFromString(name)
			if err != nil {
				logger.V(1).Info("Skipping invalid VM GUID", "name", name, "error", err)
				continue
			}
			if !seen[vmGUID] {
				seen[vmGUID] = true
				go attemptHandshake(ctx, logger, vmGUID, found)
			}
		}
	}

	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	// Immediate first scan, then rescan on each tick.
	scanRegistry()
	for {
		select {
		case vmGUID := <-found:
			return vmGUID, nil
		case <-ctx.Done():
			return hvsock.GUIDZero, fmt.Errorf("VM GUID discovery timed out: %w", ctx.Err())
		case <-ticker.C:
			scanRegistry()
		}
	}
}

// attemptHandshake polls a single VM GUID once per second, trying to match
// the signature phrase. Each probe runs synchronously to avoid goroutine
// leaks and panics from sending on a closed channel.
func attemptHandshake(ctx context.Context, logger logr.Logger, vmGUID hvsock.GUID, found chan<- hvsock.GUID) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	for {
		conn, err := getVsockConnection(vmGUID, vsockHandshakePort)
		if ctx.Err() != nil {
			if conn != nil {
				conn.Close()
			}
			return
		}
		if err != nil {
			logger.V(1).Info("Handshake dial failed", "vmGUID", vmGUID.String(), "error", err)
		} else {
			sig, err := readSignature(conn)
			conn.Close()
			if err != nil {
				logger.V(1).Info("Handshake read failed", "vmGUID", vmGUID.String(), "error", err)
			} else if sig == signaturePhrase {
				logger.V(1).Info("Signature matched", "vmGUID", vmGUID.String())
				select {
				case found <- vmGUID:
				default:
				}
				return
			} else {
				// Valid signature from a different distro; no point retrying.
				logger.V(1).Info("Signature mismatch", "vmGUID", vmGUID.String())
				return
			}
		}

		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}

getVMGUID matches the first VM that sends the fixed signaturePhrase (line 98). If two WSL2 VMs run the opensuse distro simultaneously, two host-switch goroutines could bind to the same guest while the other gets no networking. In the current architecture (one App, one LimaVM, one WSL2 instance), this cannot happen. But the code's structure — one host-switch per instance name, map-based state tracking — implies multi-instance support.

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go

	handshakeTimeout   = 5 * time.Minute

	// signaturePhrase identifies our distro among all running Hyper-V VMs.
	// This value is a protocol contract with the guest-side network-setup
	// binary and must not be changed independently.
	signaturePhrase = "github.com/rancher-sandbox/rancher-desktop/src/go/networking"
	readySignal     = "READY"

	gatewayMacAddr = "5a:94:ef:e4:0c:dd"
	defaultMTU     = 1500
)

Fix: Add a comment documenting the single-instance constraint.

 // getVMGUID enumerates running Hyper-V VMs from the registry and handshakes
-// with each in parallel to find the one running our distro.
+// with each in parallel to find the one running our distro. This assumes
+// only one opensuse WSL2 instance runs at a time; with multiple instances,
+// the signature-based discovery would be ambiguous.

S2. startHostSwitch inline cancel-wait duplicates stopHostSwitch Gemini¶

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go

	if inst.VMType != limatype.WSL2 {
		return
	}

	r.stopHostSwitch(name)

	hsCtx, hsCancel := context.WithCancel(ctx)
	done := make(chan struct{})

	r.hostSwitchMu.Lock()
	r.hostSwitchStates[name] = &hostSwitchState{
		cancel: hsCancel,
		done:   done,
	}
	r.hostSwitchMu.Unlock()

	logger := logr.FromContextOrDiscard(ctx).WithValues("instance", name, "component", "host-switch")
	go r.runHostSwitch(hsCtx, logger, done)
}

The inline cancel-and-wait duplicates stopHostSwitch's logic. The unlock-wait-relock pattern also opens a narrow window where a concurrent shutdownAllHostagents call could delete the old entry from the map, wait on it, and then startHostSwitch would insert a new running entry — leaving a host-switch the manager believes it stopped. In practice this requires a shutdown racing with a startup, which is extremely unlikely.

Fix: Delegate to the existing stopHostSwitch method.

-	r.hostSwitchMu.Lock()
-	if old, exists := r.hostSwitchStates[name]; exists {
-		r.hostSwitchMu.Unlock()
-		old.cancel()
-		<-old.done
-		r.hostSwitchMu.Lock()
-	}
+	r.stopHostSwitch(name)
+
+	r.hostSwitchMu.Lock()
 	r.hostSwitchStates[name] = &hostSwitchState{

S3. signalListenerReady leaks goroutine when hvsock.Dial hangs Codex Gemini¶

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	found := make(chan hvsock.GUID, 1)
	seen := make(map[hvsock.GUID]bool)

	scanRegistry := func() {
		key, err := registry.OpenKey(
			registry.LOCAL_MACHINE,
			`SOFTWARE\Microsoft\Windows NT\CurrentVersion\HostComputeService\VolatileStore\ComputeSystem`,
			registry.ENUMERATE_SUB_KEYS)
		if err != nil {
			return // Registry key not present yet; retry on next tick.
		}
		names, err := key.ReadSubKeyNames(0)
		key.Close()
		if err != nil {
			return
		}
		for _, name := range names {
			vmGUID, err := hvsock.GUIDFromString(name)
			if err != nil {
				logger.V(1).Info("Skipping invalid VM GUID", "name", name, "error", err)
				continue
			}
			if !seen[vmGUID] {
				seen[vmGUID] = true
				go attemptHandshake(ctx, logger, vmGUID, found)
			}

The comment at lines 316-318 says the select "prevents an indefinite hang," but it prevents only the caller from hanging. If hvsock.Dial blocks forever, the background goroutine leaks. This is a known limitation of the hvsock library (no cancelable dial), already documented in the code. The goroutine typically leaks only during process shutdown, when it is about to be cleaned up anyway.

pkg/controllers/lima/limavm/controllers/hostswitch_windows.go

// runs at a time. With multiple instances, the first match wins and the
// others get no host-switch networking.
func getVMGUID(ctx context.Context, logger logr.Logger) (hvsock.GUID, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	found := make(chan hvsock.GUID, 1)
	seen := make(map[hvsock.GUID]bool)

	scanRegistry := func() {
		key, err := registry.OpenKey(
			registry.LOCAL_MACHINE,
			`SOFTWARE\Microsoft\Windows NT\CurrentVersion\HostComputeService\VolatileStore\ComputeSystem`,

Fix: Correct the comment to avoid implying the goroutine is canceled.

 // signalListenerReady tells the guest that the data listener is ready.
-// The dial is wrapped in a goroutine because hvsock.Dial does not accept
-// a context; the select on ctx.Done prevents an indefinite hang if the
-// VM becomes unresponsive between discovery and signaling.
+// The dial is wrapped in a goroutine because hvsock.Dial does not accept
+// a context. The select on ctx.Done prevents the caller from hanging, but
+// the dial goroutine itself may leak if the VM becomes unresponsive.

S4. Platform-independent logic locked behind Windows build constraint Claude¶

validateSubnet and newVirtualNetworkConfig use only net.ParseCIDR and net.IPv4 — no Windows-specific APIs. Because they reside in hostswitch_windows.go, the tests run only on Windows CI. Extracting these functions and their types into a hostswitch_config.go (no build constraint) would let the tests run on all platforms.

S5. AcceptStdio blocks accept loop, serializing connections Claude¶

AcceptStdio blocks until the connection closes, so the accept goroutine handles one connection at a time. This is correct because each VM runs a single vm-switch process — reconnections are serial (old connection EOF, then accept new connection). A clarifying comment would prevent a future contributor from "fixing" this with go vn.AcceptStdio(...).

+		// AcceptStdio blocks until the connection closes, so the loop
+		// handles one connection at a time. This is correct: each VM
+		// runs a single vm-switch process, and reconnections are serial.
 		if err := vn.AcceptStdio(ctx, conn); err != nil {

S6. attemptHandshake delays first probe by 1 second Gemini¶

time.NewTicker fires its first event after the specified duration, so the first handshake probe is delayed by 1 second. Adding an immediate attempt before entering the ticker loop would shave a second off networking availability on every boot.

S7. Networking doc overstates Phase 1 identification Codex¶

The doc describes the signature phrase as sufficient to identify "the correct" VM, which holds only while at most one matching opensuse guest is running. The text should document this single-instance constraint, consistent with S1.

S8. validateSubnet ignores CIDR mask Gemini¶

validateSubnet hardcodes the fourth octet to .1, .2, and .254 without checking the mask. For a /25 subnet, .254 would fall outside the valid range. Since the function is called only with the hardcoded defaultSubnet = "192.168.127.0/24", this cannot happen today. A defensive assertion (ones != 24 → error) would guard against future misuse.

Design Observations ¶

Concerns ¶

(future) The discovery protocol needs per-instance identity, not a product-wide signature string. Moving to an instance token or GUID mapping would eliminate the class of cross-instance misbinding bugs behind S1 and make the "one goroutine per instance" contract enforceable. Codex Gemini
(future) The host-switch should integrate with the reconciler's state machine rather than running as an unobserved side goroutine. A readiness/health channel between startInstance and runHostSwitch would make startup sequencing explicit and let the reconciler handle host-switch failures the same way it handles hostagent exits. The author acknowledges this in a code comment (see Acknowledged Limitations). Codex

Strengths ¶

Thorough lifecycle integration. Every code path that stops a hostagent also stops the host-switch: handleDeletion (line 40), handleWatchedState crash path (line 213), stopped-not-shouldRun path (line 268), handleRestartNeeded (line 334), startInstance failure paths (lines 458, 468), stopInstance (line 506), and shutdownAllHostagents (line 543). All eight call sites are correctly paired. Claude Codex
Errgroup structure avoids goroutine leaks. The API listener is set up before launching errgroup goroutines. The listener-close goroutines ensure Accept and Serve unblock on context cancellation — a well-designed shutdown cascade. Claude
Clean platform isolation via hostSwitchPlatform embedding. No if runtime.GOOS == "windows" checks in the controller; the compiler ensures all methods exist on all platforms. Claude Codex Gemini
Design document matches implementation. Every constant, port, MAC address, and subnet in networking.md matches the code. Lifecycle sequence diagrams correctly reflect the implementation. Claude

Testing Assessment ¶

No integration test for host-switch lifecycle — The host-switch requires a running WSL2 VM, which CI cannot provide. Unit tests cover validateSubnet and newVirtualNetworkConfig, which are the testable pure-logic components. This is a reasonable trade-off.
Handshake logic untested — getVMGUID, attemptHandshake, readSignature, and signalListenerReady depend on vsock and Windows registry, making them impractical to test without a real VM.
No coverage for concurrent start/stop — No test exercises a host-switch surviving concurrent start and stop events, though this is difficult to trigger in practice (reconciles are serialized).

Documentation Assessment ¶

networking.md is thorough and well-structured: protocol constants, lifecycle sequences, and architecture are all documented.
The api_lima.md cross-reference at line 327 correctly points to the new doc.
S7 notes that the Phase 1 description should document the single-instance constraint.

Commit Structure ¶

The two commits are well-scoped. The CI workflow fix (d498946) is independent of the host-switch feature and could have been a separate PR, but bundling it is acceptable given its small size. The PR body clearly separates the two changes with a horizontal rule.

Acknowledged Limitations ¶

Host-switch failure invisible to reconciler — hostswitch_windows.go:152-157: "If the goroutine exits due to an error (not context cancellation), the controller is not notified: the guest loses DHCP/DNS/NAT and must be restarted manually. Integrating host-switch health into the controller's state machine (enqueue a reconcile on unexpected exit) would allow automatic recovery but requires non-trivial plumbing." This limitation becomes more relevant once I2 is fixed (retry loop means the goroutine can fail later, not just at startup).
hvsock.Dial does not accept a context — hostswitch_windows.go:316-318: The goroutine wrapping hvsock.Dial may leak if the dial blocks indefinitely. The process is typically exiting when this happens.
Sequential shutdown — limavm_controller.go:508 (pre-existing): shutdownAllHostagents waits on each hostagent sequentially. Adding stopHostSwitch(name) at line 543 adds to per-instance shutdown time, but the stop is fast (context cancel → goroutine exits).

Agent Performance Retro ¶

Claude ¶

Claude produced the most thorough analysis of the lifecycle integration, tracing all eight startHostSwitch/stopHostSwitch call sites and confirming each was correctly paired. It uniquely investigated the AcceptStdio blocking behavior by reading the upstream gvisor-tap-vsock source (tap/switch.go), providing concrete evidence for S5. Its coverage was complete — every file reviewed with detailed notes. It used git blame four times to verify regression attribution.

Claude missed the one-shot registry read bug (I2) despite reading the getVMGUID function. This is the most significant coverage gap: the timing between startHostSwitch and the hostagent starting the VM is the central design constraint, and Claude acknowledged the ordering without questioning whether the registry would contain entries at that point.

Codex ¶

Codex provided the deepest analysis of the startup sequencing problem, identifying that host-switch failures are invisible to the reconciler (moved to Acknowledged Limitations since the author documented this). It uniquely raised the multi-instance ambiguity as an important issue (downgraded to S1 since the current architecture supports only one WSL2 instance). Codex ran go test and fetched PR review comments via gh api, showing initiative in gathering evidence. It used git blame ten times — the most of any agent.

Codex also missed the one-shot registry bug. Like Claude, it focused on what happens after the handshake rather than whether the handshake's prerequisites are met.

Gemini ¶

Gemini uniquely identified the one-shot registry read bug (I2, originally rated C1). This was the most important unique finding across all agents: neither Claude nor Codex caught the timing issue despite reading the same code. Gemini also identified the startHostSwitch race condition (S2) and the validateSubnet mask issue (S8).

However, Gemini's tool usage was minimal (7 calls vs. 39 for Claude and 47 for Codex). It never ran git blame, so its regression classifications lack blame evidence. The signalListenerReady leak finding (S3) and the AcceptStdio serialization concern (S5) overlap with Claude and Codex but without the upstream source verification that Claude performed.

Gemini rated the registry bug as CRITICAL. The consolidated report downgrades it to IMPORTANT because the bug requires no other WSL2 distro to be running — on most developer machines, WSL2 is already active, keeping the utility VM's GUID in the registry.

Tool call highlights

Claude: 4 git blame calls, verified upstream AcceptStdio behavior in gvisor-tap-vsock source. Thorough context-building.
Codex: 10 git blame calls, ran go test, fetched PR comments via gh api. Most comprehensive tool usage.
Gemini: No git blame, only 7 total tool calls. Unable to verify regression classifications. Read only limavm_lifecycle.go in full; other files accessed only via grep.

Summary ¶

	Claude Opus 4.6	Codex GPT 5.4	Gemini 3.1 Pro
Duration	482s	410s	261s
Findings	0C 1I 3S	0C 0I 3S	0C 2I 3S
Tool calls	39 (Read 23, Bash 11)	47 (exec_command 45)	7 (grep_search 5)
Design observations	4 strengths	2 concerns, 2 strengths	1 concern, 1 strength
False positives	0	0	0
Unique insights	S5 (AcceptStdio blocking)	Host-switch health (→ ack'd limitation)	I2 (one-shot registry), S2, S8
Files reviewed	11/11	11/11	11/11
Coverage misses	Missed I2	Missed I1, I2	0

Totals

	Claude Opus 4.6	Codex GPT 5.4	Gemini 3.1 Pro
Totals	0C 1I 3S	0C 0I 3S	0C 2I 3S
Downgraded	0	2 (I1→ack'd, I2→S1)	1 (C1→I2)
Dropped	0	0	0

Reconciliation: Gemini C1 (one-shot registry) → consolidated I2 (downgraded from critical to important: only triggers when no other WSL2 distro is running). Codex I1 (startup failure invisible) → acknowledged limitation (author already documented this in code comment at hostswitch_windows.go:152-157). Codex I2 (multi-instance ambiguity) → consolidated S1 (downgraded from important to suggestion: current architecture supports only one WSL2 instance).

Review Process Notes ¶

No suggestions for this round. The prompt guided all three agents to the accept-loop error handling issue independently. The one-shot registry bug was caught by the agent that focused more on timing constraints than on code tracing — a reminder that different analytical styles complement each other.

Resolution ¶


Addressed	2026-04-08
Commit	`4c4439f Add host-switch virtual network for WSL2 instances`

#	Finding	Action
1	Important #1: Accept loop swallows unexpected errors	Fixed
2	Important #2: Registry-based VM discovery is one-shot	Fixed
3	Suggestion #1: VM discovery cannot distinguish multiple opensuse instances	Commented
4	Suggestion #2: startHostSwitch inline cancel-wait duplicates stopHostSwitch	Fixed
5	Suggestion #3: signalListenerReady leaks goroutine when hvsock.Dial hangs	Fixed
6	Suggestion #4: Platform-independent logic locked behind Windows build constraint	Skipped
7	Suggestion #5: AcceptStdio blocks accept loop, serializing connections	Commented
8	Suggestion #6: attemptHandshake delays first probe by 1 second	Fixed
9	Suggestion #7: Networking doc overstates Phase 1 identification	Fixed
10	Suggestion #8: validateSubnet ignores CIDR mask	Skipped
11	Testing Gap #1: No integration test for host-switch lifecycle	Skipped
12	Testing Gap #2: Handshake logic untested	Skipped
13	Testing Gap #3: No coverage for concurrent start/stop	Skipped