fix(packaged): swallow harmless setTypeOfService EINVAL from undici (#895) (#906)

* fix(packaged): swallow harmless setTypeOfService EINVAL from undici (#895)

Opening Settings → Pets → Community in the packaged desktop app
surfaced a native "JavaScript error in main process" dialog with
`Uncaught Exception: Error: setTypeOfService EINVAL`. Root cause:
undici's socket setup tries to set the IP_TOS byte for QoS / DSCP
marking on outbound sockets, and the macOS kernel refuses with
EINVAL on certain configurations (VPNs, IPv6-only sockets, some
firewall postures). The byte is purely advisory — the socket
itself is healthy and serves traffic — so the rejection should
not crash the app.

Two cooperating layers:

1. **`protocol.ts`** registers the `od://` scheme that backs every
   renderer page load and API call in the packaged build by
   forwarding through Node's global `fetch` (which is undici under
   the hood). Pulled the inner request handler out as
   `handleOdRequest()` so a test can drive it with a stub fetch,
   and wrapped the `await fetch()` in a try/catch that returns a
   502 Response on failure. Without this, every undici rejection —
   not just `setTypeOfService` — propagated to Electron's default
   uncaught-exception path. Now the renderer sees a normal error
   response and the main process keeps running.

2. **`logging.ts`** adds a defensive `process.on('uncaughtException')`
   handler with a narrow filter, `isHarmlessSocketOptionError`,
   that only matches the canonical undici shape (message contains
   `setTypeOfService` AND code is `EINVAL` or message contains
   `EINVAL`). For any unrecognised error the handler re-throws
   via `setImmediate` so Node's default crash + Electron's
   crash dialog still fire end-to-end — a future regression that
   broadens the filter to "every EINVAL" is caught by the unit
   tests below.

Tests: 13 new tests across `tests/protocol.test.ts` (5) and
`tests/logging.test.ts` (8) pin both layers — including the
explicit #895 regression case (fetch rejecting with the canonical
EINVAL shape returns a 502 instead of throwing) and the negative
guard against the filter swallowing real bugs (a generic write
EINVAL or a setTypeOfService EACCES is *not* matched).

Verified locally:
- `pnpm --filter @open-design/packaged vitest tests/protocol.test.ts tests/logging.test.ts` → 13/13
- packaged `tsconfig.json` and `tsconfig.tests.json` (the CI killer): both clean
- the one pre-existing failure in `tests/sidecars.test.ts` (`adds custom VP_HOME/bin …`) is independent of this PR — confirmed by stashing the change and re-running

* fix(packaged): break recursive rethrow + tighten EINVAL filter (#906 review)

@mrcfps and @lefarcen both flagged a real P1 in the first iteration:
the non-harmless branch of the new uncaughtException handler
rethrew via setImmediate while the same listener was still
registered, so a real bug would re-enter the handler indefinitely
instead of terminating. mrcfps reproduced the loop with a minimal
Node script. lefarcen also flagged that the filter trusted the
message string over a contradicting structured `code`.

Both fixes:

1. **Recursive rethrow (P1).** Extract the handler as a named
   factory, `createFatalUncaughtExceptionHandler(logger)`, that
   captures itself in closure. On non-harmless errors the handler
   now `process.removeListener('uncaughtException', self)` before
   scheduling the rethrow. With no listener registered, the next
   throw lands in Node's default crash path — which is exactly
   what we wanted ("preserve fail-fast for real bugs").

2. **`code` is authoritative (P2).** When `code` is present on the
   error, only `code === 'EINVAL'` qualifies. A contradicting
   `EACCES`/`EPERM` paired with `setTypeOfService EINVAL` in the
   message now slips through to the crash path instead of being
   swallowed. Message-based detection only fires when `code` is
   genuinely absent (some libuv builds don't populate it on raw
   thrown Errors).

3 new tests pin both fixes:
   - `does NOT match when code contradicts the message` and the
     EPERM variant guard against the P2 regression.
   - `removes itself from uncaughtException listeners before
     scheduling the rethrow` uses `vi.spyOn(process,
     'removeListener')` and a stubbed setImmediate to assert the
     call order: removeListener fires before setImmediate
     schedules the throw.
   - `does NOT re-enter itself when invoked twice` is a
     belt-and-suspenders loop guard — even if a future refactor
     dropped the removeListener call, the test would catch
     runaway scheduling.

Verified locally:
- packaged vitest: 18/18 (was 13, +3 new tests; +2 negative-guard
  tests for the P2 filter; -0 deletions)
- packaged tsc -p tsconfig.json --noEmit: clean
- packaged tsc -p tsconfig.tests.json --noEmit (the CI killer): clean
This commit is contained in:
Sid 2026-05-09 01:16:23 +08:00 committed by GitHub
parent 32c36f44a7
commit 362b92c1a6
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 490 additions and 2 deletions

View file

@ -26,6 +26,94 @@ function normalizeError(error: unknown): unknown {
return error;
}
/**
* Recognise known-harmless socket option errors so the packaged main
* process can swallow them instead of surfacing Electron's "JavaScript
* error in main process" dialog (issue #895).
*
* The flagship case is undici throwing `setTypeOfService EINVAL` from
* its socket setup path: certain macOS / VPN configurations refuse to
* let the kernel set the IP_TOS byte on outbound sockets. The QoS
* marking failing has no functional impact on the request the socket
* still connects and serves traffic so the right behaviour is to
* log + ignore, not to crash.
*
* Match strategy is intentionally narrow:
* 1. The error message must name the `setTypeOfService` syscall.
* 2. The structured `code` property is **authoritative** when
* present: it must equal `EINVAL` for the error to qualify.
* A `code` of anything else (e.g. `EACCES`) is treated as a
* contradicting signal and the filter rejects the match
* otherwise an `EACCES` permission failure with a stale or
* copy-pasted `EINVAL` substring in the message would slip
* through.
* 3. Only when `code` is absent (some libuv builds don't populate
* it on raw thrown Errors) do we fall back to looking for the
* `EINVAL` token in the message.
*
* We never swallow every `EINVAL`: that code is raised by plenty of
* real bugs (bad config values, malformed arguments to other
* syscalls). Exported so a unit test can pin the exact shape this
* branch matches.
*/
export function isHarmlessSocketOptionError(value: unknown): boolean {
if (!(value instanceof Error)) return false;
const message = typeof value.message === "string" ? value.message : "";
if (!message) return false;
if (!message.includes("setTypeOfService")) return false;
const code = (value as NodeJS.ErrnoException).code;
if (typeof code === "string" && code.length > 0) {
// Structured code present — it has to be EINVAL. Anything else
// (EACCES, EPERM, ECONNRESET, …) is a contradicting signal and
// we let it crash so real bugs don't get hidden.
return code === "EINVAL";
}
// No structured code: fall back to message-based detection. The
// libuv error string is `<syscall> <errcode>` so the EINVAL token
// appears alongside the syscall name.
return message.includes("EINVAL");
}
/**
* Build the named `uncaughtException` handler used by
* `attachPackagedDesktopProcessLogging`. Exposed as its own factory
* so a unit test can drive it without bringing up the full logging
* pipeline.
*
* Behaviour contract (issue #906 review):
* - Harmless `setTypeOfService EINVAL` shapes from undici socket
* internals are logged at warn level and the function returns
* silently. The process continues running.
* - Anything else is logged at error level, then the handler
* **removes itself from `process.uncaughtException` listeners**
* and re-throws via `setImmediate`. Removing the listener first
* is critical: without it, the re-throw re-enters the same
* handler, schedules another setImmediate, and the packaged
* main process spins forever instead of terminating
* reproduced and called out by mrcfps + lefarcen on the first
* #906 review pass. With the listener gone the next throw has
* no `uncaughtException` listener to land in, Node's default
* crash path takes over, and Electron's native "JavaScript
* error in main process" dialog renders as it did before this
* code existed.
*/
export function createFatalUncaughtExceptionHandler(
logger: PackagedDesktopLogger,
): (error: unknown) => void {
const handler = (error: unknown): void => {
if (isHarmlessSocketOptionError(error)) {
logger.warn("packaged desktop swallowed harmless socket option error", { error });
return;
}
logger.error("packaged desktop fatal uncaught exception", { error });
process.removeListener("uncaughtException", handler);
setImmediate(() => {
throw error;
});
};
return handler;
}
function normalizeMeta(meta: Record<string, unknown> | undefined): Record<string, unknown> | undefined {
if (meta == null) return undefined;
return Object.fromEntries(
@ -123,7 +211,30 @@ export function attachPackagedDesktopProcessLogging(options: {
process.on("uncaughtExceptionMonitor", (error) => {
logger.error("packaged desktop uncaught exception", { error });
});
// Defensive filter for known-harmless network errors. undici can throw
// `setTypeOfService EINVAL` from socket internals on certain macOS /
// VPN configurations (issue #895): the kernel rejects setting the
// IP_TOS byte on the outbound socket, but the connection itself is
// healthy — we just don't get the QoS / DSCP marking, which the app
// doesn't depend on. Without this filter the rejection bubbles to
// Electron's default handler and surfaces as a native "JavaScript
// error in main process" dialog the next time anything in the
// renderer does a fetch (e.g. opening Settings → Pets → Community).
//
// For non-harmless errors the handler restores Node's default
// uncaughtException behaviour: it removes itself from the listener
// list and re-throws via `setImmediate`. With no `uncaughtException`
// handlers registered, Node's default crash path takes over (stack
// trace + non-zero exit) and Electron's own "JavaScript error in
// main process" dialog renders as it would have before this code
// existed. See `createFatalUncaughtExceptionHandler` for the
// unit-tested factory.
process.on("uncaughtException", createFatalUncaughtExceptionHandler(logger));
process.on("unhandledRejection", (reason) => {
if (isHarmlessSocketOptionError(reason)) {
logger.warn("packaged desktop swallowed harmless socket option rejection", { reason });
return;
}
logger.error("packaged desktop unhandled rejection", { reason });
});
process.on("beforeExit", (code) => {

View file

@ -25,13 +25,63 @@ function toWebRuntimeUrl(webRuntimeUrl: string, requestUrl: string): string {
return target.toString();
}
function buildProxyErrorResponse(error: unknown, target: string): Response {
const message = error instanceof Error ? error.message : String(error);
const code =
error instanceof Error && typeof (error as NodeJS.ErrnoException).code === "string"
? (error as NodeJS.ErrnoException).code
: null;
return new Response(
JSON.stringify({
error: "OD_PROTOCOL_PROXY_FAILED",
message,
...(code === null ? {} : { code }),
target,
}),
{
status: 502,
headers: { "content-type": "application/json" },
},
);
}
/**
* Inner request handler for the `od://` Electron protocol every
* renderer fetch flows through here and gets proxied to the local web
* sidecar via Node's global `fetch` (which is undici under the hood).
*
* Pulled out as a named export so unit tests can drive it with a stub
* `fetchImpl` without spinning up Electron, and so the try/catch
* stays auditable from one place.
*
* Why the try/catch matters: undici can throw `setTypeOfService
* EINVAL` from socket internals on certain macOS / VPN configurations
* (issue #895). Without the catch, the rejection bubbles all the way
* up to the Electron main process and surfaces as a native
* "JavaScript error in main process" dialog the next time the user
* does anything that triggers a renderer-to-sidecar fetch (e.g.
* Settings Pets Community). Returning a 502 instead lets the
* renderer see a normal failure and keeps the process alive.
*/
export async function handleOdRequest(
request: Request,
webRuntimeUrl: string,
fetchImpl: typeof fetch = fetch,
): Promise<Response> {
const target = toWebRuntimeUrl(webRuntimeUrl, request.url);
try {
return await fetchImpl(new Request(target, request));
} catch (error) {
return buildProxyErrorResponse(error, target);
}
}
export function packagedEntryUrl(): string {
return OD_ENTRY_URL;
}
export function registerOdProtocol(webRuntimeUrl: string): void {
protocol.handle(OD_SCHEME, async (request) => {
const target = toWebRuntimeUrl(webRuntimeUrl, request.url);
return await fetch(new Request(target, request));
return await handleOdRequest(request, webRuntimeUrl);
});
}

View file

@ -0,0 +1,195 @@
/**
* Regression coverage for the harmless-socket-option filter the
* packaged Electron entry uses to swallow `setTypeOfService EINVAL`
* undici crashes (issue #895). Without the filter, those errors
* surface as native "JavaScript error in main process" dialogs the
* moment a renderer fetch hits the affected socket option setter on
* macOS / VPN configurations that don't allow IP_TOS marking.
*
* Match strategy is intentionally narrow name the syscall AND
* verify the EINVAL code so a future regression that broadens the
* filter to "every EINVAL" (which would silently swallow real bugs)
* trips a test.
*
* @see https://github.com/nexu-io/open-design/issues/895
*/
import { afterEach, describe, expect, it, vi } from 'vitest';
import {
createFatalUncaughtExceptionHandler,
isHarmlessSocketOptionError,
type PackagedDesktopLogger,
} from '../src/logging.js';
function stubLogger(): PackagedDesktopLogger {
return {
error: vi.fn(),
info: vi.fn(),
warn: vi.fn(),
};
}
afterEach(() => {
vi.restoreAllMocks();
});
describe('isHarmlessSocketOptionError (issue #895)', () => {
it('matches the canonical undici setTypeOfService EINVAL shape', () => {
const error = new Error('setTypeOfService EINVAL') as NodeJS.ErrnoException;
error.code = 'EINVAL';
error.syscall = 'setTypeOfService';
expect(isHarmlessSocketOptionError(error)).toBe(true);
});
it('matches when only the message has both tokens (no code property set)', () => {
// Some libuv builds surface the error without populating the
// `code` property — the message string itself is the sole signal.
const error = new Error('setTypeOfService EINVAL');
expect(isHarmlessSocketOptionError(error)).toBe(true);
});
it('matches when code is EINVAL and message references setTypeOfService', () => {
const error = new Error('Error: setTypeOfService failed') as NodeJS.ErrnoException;
error.code = 'EINVAL';
expect(isHarmlessSocketOptionError(error)).toBe(true);
});
it('does NOT match a generic EINVAL — that code is also raised by real bugs', () => {
// Guard against a future refactor that broadens the filter to
// "every EINVAL" and silently swallows configuration errors.
const error = new Error('write EINVAL') as NodeJS.ErrnoException;
error.code = 'EINVAL';
expect(isHarmlessSocketOptionError(error)).toBe(false);
});
it('does NOT match a setTypeOfService error with a different errno (e.g. EACCES)', () => {
const error = new Error('setTypeOfService EACCES') as NodeJS.ErrnoException;
error.code = 'EACCES';
expect(isHarmlessSocketOptionError(error)).toBe(false);
});
// #906 review (lefarcen P2): the structured `code` property must be
// authoritative when present. Without this guard a contradicting
// EACCES code paired with a message containing both `setTypeOfService`
// and `EINVAL` (a stale token, copy-pasted shape, or future libuv
// formatting change) would slip through and silently swallow a real
// permissions failure.
it('does NOT match when code contradicts the message — code is authoritative (#906 review)', () => {
const error = new Error('setTypeOfService EINVAL') as NodeJS.ErrnoException;
error.code = 'EACCES';
expect(isHarmlessSocketOptionError(error)).toBe(false);
});
it('does NOT match when code is a different errno but message has both tokens (defence-in-depth)', () => {
const error = new Error('setTypeOfService EINVAL — extra context') as NodeJS.ErrnoException;
error.code = 'EPERM';
expect(isHarmlessSocketOptionError(error)).toBe(false);
});
it('does NOT match unrelated errors with similar shape', () => {
const error = new Error('something happened');
expect(isHarmlessSocketOptionError(error)).toBe(false);
});
it('does NOT match non-Error rejection values (preserves fail-fast for primitives)', () => {
expect(isHarmlessSocketOptionError('setTypeOfService EINVAL')).toBe(false);
expect(isHarmlessSocketOptionError(null)).toBe(false);
expect(isHarmlessSocketOptionError(undefined)).toBe(false);
expect(isHarmlessSocketOptionError({ message: 'setTypeOfService EINVAL' })).toBe(false);
});
it('does NOT match Error with empty message even if code is EINVAL', () => {
const error = new Error('') as NodeJS.ErrnoException;
error.code = 'EINVAL';
expect(isHarmlessSocketOptionError(error)).toBe(false);
});
});
// Regression coverage for the #906 review (mrcfps + lefarcen):
// the non-harmless branch must NOT re-enter itself when it re-throws.
// Without the explicit `process.removeListener` call, the rethrown
// error landed back in this same listener, scheduled another
// `setImmediate`, and the packaged main process span forever instead
// of terminating. The factory exposes the named handler so we can
// assert the listener-removal step in isolation.
describe('createFatalUncaughtExceptionHandler (issue #906)', () => {
it('logs harmless socket option errors at warn level and returns silently', () => {
const logger = stubLogger();
const handler = createFatalUncaughtExceptionHandler(logger);
const harmless = new Error('setTypeOfService EINVAL') as NodeJS.ErrnoException;
harmless.code = 'EINVAL';
const removeListenerSpy = vi.spyOn(process, 'removeListener');
const setImmediateSpy = vi.spyOn(globalThis, 'setImmediate');
handler(harmless);
expect(logger.warn).toHaveBeenCalledTimes(1);
expect(logger.error).not.toHaveBeenCalled();
// Critical: the harmless branch must not deregister or schedule
// anything — the process must keep running normally.
expect(removeListenerSpy).not.toHaveBeenCalled();
expect(setImmediateSpy).not.toHaveBeenCalled();
});
it('removes itself from uncaughtException listeners before scheduling the rethrow (#906 P1)', () => {
const logger = stubLogger();
const handler = createFatalUncaughtExceptionHandler(logger);
const removeListenerSpy = vi.spyOn(process, 'removeListener');
// Stub setImmediate to capture the scheduled callback without
// actually firing it (which would dump the rethrown error onto
// vitest's own uncaughtException path).
const scheduled: Array<() => void> = [];
const setImmediateSpy = vi
.spyOn(globalThis, 'setImmediate')
.mockImplementation(((fn: () => void) => {
scheduled.push(fn);
return 0 as unknown as NodeJS.Immediate;
}) as typeof setImmediate);
const realBug = new Error('something genuinely broke');
handler(realBug);
expect(logger.error).toHaveBeenCalledTimes(1);
// The actual #906 P1 fix: handler unregisters itself BEFORE the
// rethrow is scheduled. Mirrors mrcfps's reproduction script.
expect(removeListenerSpy).toHaveBeenCalledWith('uncaughtException', handler);
const removeListenerOrder = removeListenerSpy.mock.invocationCallOrder[0]!;
const setImmediateOrder = setImmediateSpy.mock.invocationCallOrder[0]!;
expect(removeListenerOrder).toBeLessThan(setImmediateOrder);
// The scheduled callback rethrows the original error so Node's
// default uncaughtException path picks it up after the listener
// is gone.
expect(scheduled).toHaveLength(1);
expect(() => scheduled[0]!()).toThrow(realBug);
});
it('does NOT re-enter itself when invoked twice with non-harmless errors (loop guard)', () => {
// Belt-and-suspenders: even if a future refactor accidentally
// forgot the removeListener call, this test would catch the
// recursion the original review reproduced.
const logger = stubLogger();
const handler = createFatalUncaughtExceptionHandler(logger);
const setImmediateCallCount: number[] = [];
vi.spyOn(globalThis, 'setImmediate').mockImplementation(((fn: () => void) => {
setImmediateCallCount.push(setImmediateCallCount.length);
// Crucially do NOT actually call fn() — that would test the
// recursion path through Node's process.emit, which is what
// we're trying to break out of.
void fn;
return 0 as unknown as NodeJS.Immediate;
}) as typeof setImmediate);
handler(new Error('first'));
handler(new Error('second'));
// Each invocation schedules exactly one rethrow. If the handler
// re-entered itself we'd see runaway scheduling.
expect(setImmediateCallCount).toHaveLength(2);
expect(logger.error).toHaveBeenCalledTimes(2);
});
});

View file

@ -0,0 +1,132 @@
/**
* Regression coverage for the `od://` protocol proxy in
* apps/packaged/src/protocol.ts.
*
* The packaged Electron entry registers `od://` as the loader for the
* web runtime and forwards every renderer request to the local web
* sidecar through Node's global `fetch` (which is undici under the
* hood). Without a try/catch in the handler, undici throwing
* `setTypeOfService EINVAL` from socket internals on certain macOS /
* VPN configurations bubbled up to Electron's default uncaught
* exception handler surfacing as a native "JavaScript error in
* main process" dialog the moment the user did anything that
* triggered a fetch (e.g. Settings Pets Community).
*
* @see https://github.com/nexu-io/open-design/issues/895
*/
// `protocol.handle` from the `electron` module is invoked at import
// time inside `apps/packaged/src/protocol.ts`. Stub the module before
// importing so the test environment doesn't need a real Electron
// runtime.
import { vi } from 'vitest';
vi.mock('electron', () => ({
protocol: {
registerSchemesAsPrivileged: vi.fn(),
handle: vi.fn(),
},
}));
import { afterEach, describe, expect, it } from 'vitest';
import { handleOdRequest } from '../src/protocol.js';
afterEach(() => {
vi.restoreAllMocks();
});
describe('od:// protocol proxy', () => {
it('proxies the request through fetchImpl with the rewritten target URL', async () => {
const captured: Request[] = [];
const fetchImpl: typeof fetch = async (input) => {
captured.push(input as Request);
return new Response('ok', { status: 200 });
};
const request = new Request('od://app/api/codex-pets/sync', { method: 'POST' });
const response = await handleOdRequest(request, 'http://127.0.0.1:17579/', fetchImpl);
expect(response.status).toBe(200);
expect(captured).toHaveLength(1);
expect(captured[0]!.url).toBe('http://127.0.0.1:17579/api/codex-pets/sync');
expect(captured[0]!.method).toBe('POST');
});
it('preserves the request path, search, and hash when rewriting to the web sidecar', async () => {
const captured: Request[] = [];
const fetchImpl: typeof fetch = async (input) => {
captured.push(input as Request);
return new Response('', { status: 204 });
};
const request = new Request('od://app/api/projects?limit=5#section', { method: 'GET' });
await handleOdRequest(request, 'http://127.0.0.1:42424/', fetchImpl);
const target = new URL(captured[0]!.url);
expect(target.host).toBe('127.0.0.1:42424');
expect(target.pathname).toBe('/api/projects');
expect(target.search).toBe('?limit=5');
// `Request` strips the hash fragment per the Fetch spec, but the
// pathname + search above are the values the proxy is responsible
// for getting right. Pin those.
});
// The flagship #895 regression: undici can throw `setTypeOfService
// EINVAL` mid-fetch from socket internals. Without the try/catch
// wrapper around the handler's fetch call, that rejection propagates
// up to Electron's default uncaught exception handler and surfaces
// as a native "JavaScript error in main process" dialog. The
// handler must instead return a 502 Response so the renderer sees
// a normal failure and the process keeps running.
it('returns a 502 Response when the underlying fetch rejects (issue #895)', async () => {
const fetchImpl: typeof fetch = async () => {
const error = new Error('setTypeOfService EINVAL') as NodeJS.ErrnoException;
error.code = 'EINVAL';
error.syscall = 'setTypeOfService';
throw error;
};
const request = new Request('od://app/api/codex-pets/sync', { method: 'POST' });
const response = await handleOdRequest(request, 'http://127.0.0.1:17579/', fetchImpl);
expect(response.status).toBe(502);
const body = (await response.json()) as {
error: string;
message: string;
code?: string;
target: string;
};
expect(body.error).toBe('OD_PROTOCOL_PROXY_FAILED');
expect(body.message).toContain('setTypeOfService');
expect(body.code).toBe('EINVAL');
expect(body.target).toBe('http://127.0.0.1:17579/api/codex-pets/sync');
});
it('does not throw when fetch rejects (the actual #895 root-cause guard)', async () => {
const fetchImpl: typeof fetch = async () => {
throw new Error('socket hang up');
};
// The promise must resolve with a Response, never reject.
await expect(
handleOdRequest(new Request('od://app/'), 'http://127.0.0.1:1/', fetchImpl),
).resolves.toBeInstanceOf(Response);
});
it('handles non-Error rejection values without throwing', async () => {
const fetchImpl: typeof fetch = async () => {
// eslint-disable-next-line @typescript-eslint/no-throw-literal
throw 'sync timeout';
};
const response = await handleOdRequest(
new Request('od://app/api/probe'),
'http://127.0.0.1:1/',
fetchImpl,
);
expect(response.status).toBe(502);
const body = (await response.json()) as { message: string };
expect(body.message).toBe('sync timeout');
});
});