vndangkhoa/open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Commit graph

Author	SHA1	Message	Date
Nagendhra Madishetti	32fa0c23bb	feat(daemon): Critique Theater Phase 6.2 (artifact extraction + endpoint) (#1085)	2026-05-10 23:59:04 +08:00
Nagendhra Madishetti	76e6c7a9f6	feat: Critique Theater Phase 4 (persistence + transcript + orchestrator) (#481 ) * docs(specs): add Critique Theater design spec for panel-tempered artifacts * docs(specs): add Critique Theater implementation plan * docs(specs): rename UI to Design Jury, add lane-density modes, ship-rule explainer, label sizing * feat(contracts): add CritiqueConfig schema and defaults * fix(contracts): apply Task 1.1 review (CRITIQUE_PROTOCOL_VERSION rename, descriptions, RoleWeights export) * feat(contracts): add PanelEvent discriminated union and isPanelEvent guard * fix(contracts): apply Task 1.2 review (exhaustive event-type list, runId guard, import order) * feat(contracts): add CritiqueSseEvent variants and panelEventToSse mapper * test(daemon): add v1 wire-protocol golden fixtures for Critique Theater parser * feat(daemon): add v1 streaming parser for Critique Theater wire protocol * chore(contracts): add .js extensions to relative imports for NodeNext consumers * fix(daemon): satisfy noUncheckedIndexedAccess in v1 parser regex match access * test(daemon): cover parser failure modes; fix unclosed-PANELIST swallow bug * fix(daemon,contracts): address PR #387 review - parser now clamps panelist + DIM scores against the run-declared scale captured from <CRITIQUE_RUN scale=...>, not a hardcoded 100 - PANELIST appearing before any <ROUND n=...> opens now throws MalformedBlockError rather than emitting events with NaN round - DIM_RE and MUST_FIX_RE hoisted to module scope and lastIndex reset per call so the parser hot path stops recompiling regex per artifact - overflow check after drain simplified to a plain buf.length > cap test (the prior compound condition was always true on the right side and obscured intent) - scoreThreshold <= scoreScale refine gains a 1e-9 epsilon so floating slack does not reject semantically valid configs - round-1 designer ARTIFACT guard gains a comment naming the spec invariant and the v2 relaxation path - 3 new 
regression tests cover the panelist-without-round, scale=10 clamp, and scale=20 plumbing cases * docs(specs): rationale for non-goals, failure-mode rate targets, Phase 10 matrix, Phase 14 doc layout * Merge branch 'main' into feat/critique-theater Resolves the contracts/index.ts conflict by keeping the .js extensions added by chore(contracts) `2d6e8d6` and slotting in the new export for ./api/app-config introduced upstream by #255 (`9d700ec`). Critique Theater additions (./sse/critique, ./critique) preserved in their original positions. Verified after merge: pnpm --filter @open-design/contracts test -> 10/10 pass pnpm --filter @open-design/contracts typecheck -> exit 0 pnpm --filter @open-design/daemon typecheck -> exit 0 pnpm --filter @open-design/web typecheck -> exit 0 Two daemon tests in tests/media-config.test.ts fail both before and after the merge because they read real OAuth credentials from the developer machine instead of using mock fixtures. That's an upstream isolation issue on origin/main, not something this branch introduces. * fix: unblock web build and address mrcfps PANELIST oversize bypass The chore commit that added .js extensions to satisfy daemon's nodenext typecheck broke apps/web's Next.js build, because webpack tried to resolve the literal ./common.js when only common.ts exists on disk. Replaced with a subpath approach: contracts/exports gains a './critique' entry pointing straight at src/critique.ts (which has no relative imports), and daemon imports route through @open-design/contracts/critique instead of the barrel. Web keeps the bundler-friendly barrel; daemon's nodenext walks only the leaf module. All 13 contracts source files reverted to no-.js. Separately, mrcfps flagged that parserMaxBlockBytes was only enforced on the leftover buffer after drain returned, so a complete oversized block arriving in one chunk slipped past the cap. 
Added an explicit per-block size check inside drain for every buffered block type (PANELIST, ROUND_END, SHIP). Three regression tests yield the whole stream as a single chunk and assert OversizeBlockError fires before any events emit. * fix(daemon): close three v1 parser invariant gaps from mrcfps review Three independent gaps that all let malformed or oversized protocol output pass the v1 envelope contract: (1) Envelope guard. ROUND, PANELIST, ROUND_END, and SHIP now throw MalformedBlockError when state.inRun is false. Without this, a stream that omits <CRITIQUE_RUN> could still emit panelist_* events without the run_started handshake, leaving downstream reducers with no run-level config. (2) UTF-8 byte length. Both the per-block size check and the post-drain buf-size check now compare Buffer.byteLength(text, 'utf8') against parserMaxBlockBytes. The previous string-length comparison let multibyte content (CJK, emoji) inside <NOTES>/<SUMMARY> exceed the configured byte cap while staying under the JS string length cap, bypassing the daemon's resource guard. (3) Header-end ordering. PANELIST, ROUND_END, and SHIP now require the opener's > to appear before the matched closing tag. A malformed opener like <PANELIST role="x" score="8"</PANELIST> previously fell through to the closing tag's > and emitted events for an invalid block. Four regression tests cover each gap (ROUND-without-run, SHIP-without-run, multibyte-byte-cap, malformed-opener). * feat(daemon): add critique_runs persistence (Task 4.1) Introduces a new SQLite table critique_runs to back the orchestrator's run lifecycle. Plan called for ALTER TABLE artifacts ADD COLUMN ..., but artifacts is not a DB concept in this repo; runs get their own table. - migrateCritique(db) creates the table + two indexes idempotently and is wired into the existing migrate(db) flow on daemon boot. 
- CRUD helpers (insertCritiqueRun, getCritiqueRun, updateCritiqueRun, listCritiqueRunsByProject, deleteCritiqueRun) round-trip rounds_json through helpers so callers see typed CritiqueRunRow. - reconcileStaleRuns flips stale 'running' rows to 'interrupted' with a recoveryReason='daemon_restart' marker, supporting the spec's daemon-restart-mid-run failure mode. - Public CritiqueRunStatus union excludes the in-flight 'running' value but the runtime CHECK accepts it, matching the spec's lifecycle. - 11 vitest cases cover migration idempotence, round-trip, default rounds, status validation, update + list ordering, deletion, and reconciliation, plus FK CASCADE on project deletion. * feat(daemon): add Critique Theater transcript writer (Task 4.2) Streams PanelEvent sequences to .ndjson on disk under the artifact dir, gzipping to .ndjson.gz when the cumulative UTF-8 byte size crosses gzipThresholdBytes (default 256 KiB). Uses Node fs streams plus zlib.createGzip so the writer never holds the full transcript in memory. readTranscript inverts the path and streams events back, picking the right pipeline by file extension. Covers happy path, large multibyte, empty input, mid-stream failure cleanup, and unknown-extension reject. * feat(daemon): add Critique Theater orchestrator (Task 4.3) Drives one run end-to-end: parses stdout via parseCritiqueStream, scores each round through scoreboard helpers, persists lifecycle to critique_runs, and emits CritiqueSseEvent variants on the existing project event bus. Honors per-round and total timeouts, applies fallbackPolicy when no <SHIP> arrives, and tees events into writeTranscript so transcripts stream to disk without buffering the whole run in memory. Defensive entry validation throws RangeError on invalid CritiqueConfig before any side effect. Also adds scoreboard.ts (computeComposite, decideRound, selectFallbackRound) and re-exports panelEventToSse/CritiqueSseEvent from the critique subpath so daemon imports never touch the barrel. 
Fixes missing .js extensions in sse/critique.ts that caused NodeNext module resolution errors. * feat(daemon): wire Critique Theater orchestrator into spawn path (Task 4.4) Adds loadCritiqueConfigFromEnv to read OD_CRITIQUE_* keys with strict validation at boot. Branches the existing CLI spawn flow on cfg.enabled: when false (the M0 default) the legacy single-pass generation runs unchanged; when true the orchestrator owns the run end-to-end. Same SSE bus, same artifact dir, no behavior change for users until they flip the flag. * fix(lockfile): regenerate to include contracts zod + vitest entries The earlier conflict resolution took main's lockfile and ran pnpm install, but the install pass on Windows didn't write the contracts package's zod and vitest entries back into the lockfile. CI's --frozen-lockfile install rejected the resulting state. Re-running pnpm install with --no-frozen-lockfile rewrites the lockfile so it now matches every package.json across the workspace, including contracts/zod ^3.23.8 and contracts/vitest ^2.1.8. Verified locally: pnpm install --frozen-lockfile passes. * fix(daemon): parser ship envelope, SHIP-before-round guard, real artifactRef (Defects 3 + 5) - ParserOptions gains projectId + artifactId; the parser threads them into every emitted ship event's artifactRef so downstream consumers see the real run identity instead of empty placeholders. - <SHIP> now requires at least one closed <ROUND_END> in the same run; malformed streams that emit SHIP before any round complete now throw MalformedBlockError instead of bypassing the round-1 artifact invariant. - The SHIP handler validates the inner <ARTIFACT> block is present and non-empty; missing artifact raises MissingArtifactError. - Three new regressions: SHIP-before-round, SHIP-without-artifact, artifactRef populated from parser options. - Orchestrator threads projectId + artifactId into parserOpts. - Test fixtures updated to include <ARTIFACT> inside <SHIP> blocks. 
* fix(daemon): orchestrator owns lifecycle, gzip atomicity, fallback on timeout (Defects 2,4,7,8) - Orchestrator now accepts child + childExitPromise, races parser / child-exit / abort / timeout in one awaited flow, and SIGTERMs the child on every non-clean termination. Server awaits the result so the run lifecycle has a single owner. - ChildExitError surfaces when child exits non-zero mid-stream; the run is classified as failed with cause cli_exit_nonzero. - Timeout / abort with at least one completed round elects a fallback via selectFallbackRound and emits a synthetic ship event with status=timed_out or interrupted; the score persists to critique_runs instead of staying null. - applyTimeouts includes childExitRace in every Promise.race so early child exits are classified without waiting for the total timeout. iter.return() cleanup is capped at 200ms to prevent hang on stalling generators. - writeTranscript writes gzip output to transcript.ndjson.gz.tmp, fsyncs, then atomic-renames. Crashes mid-write leave no partial .gz or .gz.tmp on disk. * fix(daemon): plain-stream gating, per-run artifact dir, boot reconcile (Defects 1, 2, 6) - Spawn-path branch now inspects def.streamFormat and only routes through runOrchestrator when format === 'plain'. Adapters emitting wrapper formats (claude-stream-json, copilot-stream-json, json-event-stream, acp-json-rpc, pi-rpc) fall through to legacy single-pass with a one-time stderr warning per format. Per-format decoding into the orchestrator is reserved for v2. - critiqueArtifactDir is now path.join(ARTIFACTS_DIR, projectId, runId) so concurrent or sequential runs in the same project never overwrite each other's transcript or final HTML. Persistence stores the relative per-run path. - reconcileStaleRuns is now invoked after openDatabase on every daemon boot with staleAfterMs = critiqueCfg.totalTimeoutMs. Stale running rows from a prior crash flip to interrupted with rounds_json. recoveryReason='daemon_restart'. 
Logs a one-line warning naming the flipped count when greater than zero. - Spawn now passes child + childExitPromise to runOrchestrator so the orchestrator can race child exit against the parser, abort signal, and timeouts in one awaited flow. Server awaits the orchestrator's result and surfaces failures through the existing run lifecycle. * fix(daemon): daemon-authoritative scoring, lifecycle status, stderr ordering, insert type Round 2 review feedback on PR #481. 1. CritiqueRunInsert.status now accepts 'running' so the boot-reconcile tests (and any caller seeding an in-flight row) typecheck without casting. The runtime check in insertCritiqueRun already accepted 'running' against the DB constraint set, only the public type was stricter than the DB. 2. round_end keeps the daemon-computed composite authoritative. The agent's <ROUND_END composite=...> attribute is advisory: a divergence beyond COMPOSITE_TOLERANCE emits a composite_mismatch parser_warning so the discrepancy is observable, but the daemon value is what scores and persists. Same policy for must_fix. 3. SHIP-handling derives the final status from decideRound(...) using the daemon's scored round rather than trusting <SHIP composite=... status=...>. A run that the agent claims as shipped but whose daemon composite is below threshold now finalizes as below_threshold, so a malformed or adversarial stream cannot force a ship. 4. server.ts captures the orchestrator's result and maps the critique terminal status to the chat run lifecycle. shipped/below_threshold finalize as 'succeeded'; timed_out/interrupted/degraded/failed finalize as 'failed'. cancelRequested is honored. 5. stderr forwarding and child.on('error') registrations moved BEFORE the orchestrator await so a CLI that floods stderr cannot fill the OS pipe and deadlock until the total timeout, and so an early child error fired during the run is observed by the same listener used after. 
Tests: - tests/critique-authority.test.ts: 3 new regressions (lying ship downgraded to below_threshold, mismatch warning emitted, aligned composites stay quiet). - All four affected suites green: 14 orchestrator + 10 spawn-wiring + 3 boot-reconcile + 3 authority = 30/30. Workspace typechecks: contracts, daemon, web all exit 0. * fix(daemon,contracts): inline critique SSE, signal-terminated child, null shipped artifactPath Round 3 review feedback on PR #481. 1. packages/contracts/src/critique.ts inlines CritiqueSseEvent + panelEventToSse + CRITIQUE_SSE_EVENT_NAMES + a local mirror of SseTransportEvent. The previous re-export from './sse/critique.js' broke the workspace web build (Turbopack cannot rewrite .js to .ts on a relative source import) while removing the .js extension broke daemon's NodeNext typecheck (it walks this leaf via the './critique' subpath export which requires explicit .js extensions). Inlining removes the cross-file relative import entirely so both consumers walk one self-contained file. packages/contracts/src/sse/critique.ts is removed and its co-located test moves up to packages/contracts/src/critique.test.ts. The barrel packages/contracts/src/index.ts drops the redundant './sse/critique' re-export since './critique' already exports the same symbols. 2. apps/daemon/src/critique/orchestrator.ts treats a signal-terminated child as a terminal race rejection. Previously the race only caught non-zero numeric exit codes and treated code === null as indefinitely pending, so a SIGTERM from /api/runs/:id/cancel resolved childExitPromise as { code: null, signal: 'SIGTERM' } and the orchestrator fell through to the no-SHIP fallback path, persisting below_threshold instead of interrupted. 
The race now rejects with a new ChildSignaledError when signal !== null, and a new catch branch classifies the run as 'interrupted' and (if at least one round closed) emits a synthetic ship event with status='interrupted' so the persisted row and the SSE transcript reflect the actual cause. 3. Same file, ship-handling: artifactPath is now persisted as null on shipped runs until a future phase actually extracts the <SHIP><ARTIFACT> body to disk. Previously the orchestrator wrote ${artifactDir}/${artifactId} even though no file existed at that path, so any later replay/export/UI code that trusted critique_runs.artifact_path would dereference a missing file. The transcript still records the ship event with the artifact reference so consumers can find the run. Tests: - apps/daemon/tests/critique-lifecycle.test.ts: 2 new regressions (SIGTERM-terminated child after one closed round persists 'interrupted' with a synthetic ship event of the same status; shipped run leaves artifactPath null in result and DB row). - 43 critique-suite tests pass: 14 orchestrator + 11 transcript + 10 spawn-wiring + 3 boot-reconcile + 3 authority + 2 lifecycle. Workspace typechecks: contracts, daemon, web all exit 0. * fix(daemon): buffer raw SHIP, emit only normalized; reject SHIP for unclosed round Round 4 review feedback on PR #481. The parser-event loop used to unconditionally collectedEvents.push(event) and bus.emit(panelEventToSse(event)) for every event, including raw <SHIP>. SSE clients and the transcript could see the agent's forged status="shipped" / composite="9.5" before decideRound(...) ran, even when the daemon later corrected the persisted DB row to below_threshold. The loop now skips ship events entirely; the orchestrator buffers the raw shipEvent, runs daemon-authoritative scoring, and emits a single normalized ship payload built from the daemon's computed composite, selectFallbackRound's mustFix, and decideRound's status. 
The transcript and SSE bus now only ever see the daemon-scored ship. The unknown-round fallback used to make agent-claimed status/composite authoritative when SHIP referenced a round that was never closed: a malformed stream could close low round 1, then send <SHIP round="2" status="shipped" composite="10">, completedRounds.find(r => r.n === 2) was undefined, and the orchestrator persisted the agent's value. That re-opened the scoring-integrity hole the previous round was meant to close. The orchestrator now drops a SHIP whose round isn't in completedRounds, emits a parser_warning, and falls through to the no-SHIP fallback policy. The synthetic ship from selectFallbackRound gets emitted instead, with daemon-authoritative round/composite/status. Tests: - tests/critique-authority.test.ts: extended the lying-ship regression to also assert the emitted critique.ship payload is downgraded (status='below_threshold', composite < threshold), so the SSE bus cannot see the agent's claim. Added a new regression where SHIP references an unclosed round 2: the agent ship is dropped, a parser_warning fires, the fallback selects round 1, and the only emitted critique.ship has round=1 and status=below_threshold. - 44 critique-suite tests pass: 14 orchestrator + 11 transcript + 10 spawn-wiring + 3 boot-reconcile + 4 authority + 2 lifecycle. Workspace daemon typecheck exits 0. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com> Co-authored-by: mrcfps <mrc@powerformer.com>	2026-05-05 15:50:35 +08:00
Nagendhra Madishetti	47eeaf445d	feat: Critique Theater foundation (contracts + parser, Phases 0-2) (#387 ) * docs(specs): add Critique Theater design spec for panel-tempered artifacts * docs(specs): add Critique Theater implementation plan * docs(specs): rename UI to Design Jury, add lane-density modes, ship-rule explainer, label sizing * feat(contracts): add CritiqueConfig schema and defaults * fix(contracts): apply Task 1.1 review (CRITIQUE_PROTOCOL_VERSION rename, descriptions, RoleWeights export) * feat(contracts): add PanelEvent discriminated union and isPanelEvent guard * fix(contracts): apply Task 1.2 review (exhaustive event-type list, runId guard, import order) * feat(contracts): add CritiqueSseEvent variants and panelEventToSse mapper * test(daemon): add v1 wire-protocol golden fixtures for Critique Theater parser * feat(daemon): add v1 streaming parser for Critique Theater wire protocol * chore(contracts): add .js extensions to relative imports for NodeNext consumers * fix(daemon): satisfy noUncheckedIndexedAccess in v1 parser regex match access * test(daemon): cover parser failure modes; fix unclosed-PANELIST swallow bug * fix(daemon,contracts): address PR #387 review - parser now clamps panelist + DIM scores against the run-declared scale captured from <CRITIQUE_RUN scale=...>, not a hardcoded 100 - PANELIST appearing before any <ROUND n=...> opens now throws MalformedBlockError rather than emitting events with NaN round - DIM_RE and MUST_FIX_RE hoisted to module scope and lastIndex reset per call so the parser hot path stops recompiling regex per artifact - overflow check after drain simplified to a plain buf.length > cap test (the prior compound condition was always true on the right side and obscured intent) - scoreThreshold <= scoreScale refine gains a 1e-9 epsilon so floating slack does not reject semantically valid configs - round-1 designer ARTIFACT guard gains a comment naming the spec invariant and the v2 relaxation path - 3 new regression 
tests cover the panelist-without-round, scale=10 clamp, and scale=20 plumbing cases * docs(specs): rationale for non-goals, failure-mode rate targets, Phase 10 matrix, Phase 14 doc layout * Merge branch 'main' into feat/critique-theater Resolves the contracts/index.ts conflict by keeping the .js extensions added by chore(contracts) `2d6e8d6` and slotting in the new export for ./api/app-config introduced upstream by #255 (`9d700ec`). Critique Theater additions (./sse/critique, ./critique) preserved in their original positions. Verified after merge: pnpm --filter @open-design/contracts test -> 10/10 pass pnpm --filter @open-design/contracts typecheck -> exit 0 pnpm --filter @open-design/daemon typecheck -> exit 0 pnpm --filter @open-design/web typecheck -> exit 0 Two daemon tests in tests/media-config.test.ts fail both before and after the merge because they read real OAuth credentials from the developer machine instead of using mock fixtures. That's an upstream isolation issue on origin/main, not something this branch introduces. * fix: unblock web build and address mrcfps PANELIST oversize bypass The chore commit that added .js extensions to satisfy daemon's nodenext typecheck broke apps/web's Next.js build, because webpack tried to resolve the literal ./common.js when only common.ts exists on disk. Replaced with a subpath approach: contracts/exports gains a './critique' entry pointing straight at src/critique.ts (which has no relative imports), and daemon imports route through @open-design/contracts/critique instead of the barrel. Web keeps the bundler-friendly barrel; daemon's nodenext walks only the leaf module. All 13 contracts source files reverted to no-.js. Separately, mrcfps flagged that parserMaxBlockBytes was only enforced on the leftover buffer after drain returned, so a complete oversized block arriving in one chunk slipped past the cap. Added an explicit per-block size check inside drain for every buffered block type (PANELIST, ROUND_END, SHIP). 
Three regression tests yield the whole stream as a single chunk and assert OversizeBlockError fires before any events emit. * fix(daemon): close three v1 parser invariant gaps from mrcfps review Three independent gaps that all let malformed or oversized protocol output pass the v1 envelope contract: (1) Envelope guard. ROUND, PANELIST, ROUND_END, and SHIP now throw MalformedBlockError when state.inRun is false. Without this, a stream that omits <CRITIQUE_RUN> could still emit panelist_* events without the run_started handshake, leaving downstream reducers with no run-level config. (2) UTF-8 byte length. Both the per-block size check and the post-drain buf-size check now compare Buffer.byteLength(text, 'utf8') against parserMaxBlockBytes. The previous string-length comparison let multibyte content (CJK, emoji) inside <NOTES>/<SUMMARY> exceed the configured byte cap while staying under the JS string length cap, bypassing the daemon's resource guard. (3) Header-end ordering. PANELIST, ROUND_END, and SHIP now require the opener's > to appear before the matched closing tag. A malformed opener like <PANELIST role="x" score="8"</PANELIST> previously fell through to the closing tag's > and emitted events for an invalid block. Four regression tests cover each gap (ROUND-without-run, SHIP-without-run, multibyte-byte-cap, malformed-opener). * fix(lockfile): regenerate to include contracts zod + vitest entries The earlier conflict resolution took main's lockfile and ran pnpm install, but the install pass on Windows didn't write the contracts package's zod and vitest entries back into the lockfile. CI's --frozen-lockfile install rejected the resulting state. Re-running pnpm install with --no-frozen-lockfile rewrites the lockfile so it now matches every package.json across the workspace, including contracts/zod ^3.23.8 and contracts/vitest ^2.1.8. Verified locally: pnpm install --frozen-lockfile passes. 
--------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-04 20:28:28 +08:00

Nagendhra Madishetti

32fa0c23bb

feat(daemon): Critique Theater Phase 6.2 (artifact extraction + endpoint) (#1085)

The orchestrator was leaving artifactPath = null on every shipped run because
the SHIP <ARTIFACT> body never made it past the parser. Reviewers caught this
on PR #1006: a rerun-style endpoint built on top of that null could not return
a usable prior-art reference, and tests that synthesized artifactPath via
insertCritiqueRun were hiding the gap rather than covering the feature.

This PR closes that gap. The parser now hands the orchestrator a
ShipArtifactPayload (round, mime, body) through a side-channel callback, and
the orchestrator writes the bytes to
<artifactsDir>/<projectId>/<runId>/artifact.<ext> via a new artifact-writer
module. The row's artifactPath is the absolute on-disk path. The web layer
never sees that path: it fetches
the bytes through GET /api/projects/:projectId/critique/:runId/artifact,
which the new artifact-handler module serves with a mime-derived
Content-Type, X-Content-Type-Options: nosniff, a CSP header for HTML and
SVG, and the same cross-project leak guard pattern the interrupt handler
uses.
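
A minimal sketch of the response hardening described above, not the handler
from this PR. The helper names (artifactHeaders, runForProject), the RunRow
shape, the exact CSP value, and the choice to treat a foreign-project run like
a missing one are all assumptions made for illustration.

```typescript
// Sketch only: helper names, row shape, and the CSP value are assumptions.
interface RunRow {
  id: string;
  projectId: string;
  artifactPath: string | null;
}

// MIME types a browser could script if it rendered them inline.
const ACTIVE_MIMES = new Set(["text/html", "image/svg+xml"]);

function artifactHeaders(mime: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": mime,
    // Stop the browser from second-guessing the declared type.
    "X-Content-Type-Options": "nosniff",
  };
  if (ACTIVE_MIMES.has(mime)) {
    // Assumed policy: block all subresource loads and sandbox the document.
    headers["Content-Security-Policy"] = "default-src 'none'; sandbox";
  }
  return headers;
}

// Cross-project guard (assumed behavior): a run owned by another project is
// handled exactly like a missing run, so run ids cannot be probed.
function runForProject(row: RunRow | undefined, projectId: string): RunRow | null {
  if (!row || row.projectId !== projectId) return null;
  return row;
}
```

A handler built this way would answer with a not-found response whenever
runForProject returns null, before touching the filesystem.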

The body and mime intentionally never travel on the SSE wire. The SHIP
PanelEvent (which doubles as the SSE payload shape) keeps its lightweight
artifactRef, and the orchestrator strips body/mime before bus.emit, so a
multi-megabyte artifact does not broadcast to every subscriber. The new
orchestrator test asserts this explicitly.
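
One way to picture the split between the wire event and the artifact payload
is the sketch below. The ShipArtifactPayload fields (round, mime, body) come
from this message; ParsedShip, ShipEvent's other fields, and splitShip are
hypothetical names used only for illustration.

```typescript
// Sketch only: ParsedShip and splitShip are hypothetical, not the PR's API.
interface ArtifactRef {
  projectId: string;
  artifactId: string;
}

interface ShipArtifactPayload {
  round: number;
  mime: string;
  body: string;
}

// Lightweight shape that is safe to broadcast to every SSE subscriber.
interface ShipEvent {
  type: "ship";
  runId: string;
  round: number;
  status: string;
  artifactRef: ArtifactRef;
}

// Assumed internal shape: the ship event plus the heavy artifact fields.
interface ParsedShip extends ShipEvent {
  mime: string;
  body: string;
}

function splitShip(parsed: ParsedShip): { wire: ShipEvent; artifact: ShipArtifactPayload } {
  // Rest destructuring drops mime/body from the object that goes on the wire.
  const { mime, body, ...wire } = parsed;
  return { wire, artifact: { round: parsed.round, mime, body } };
}
```

Only `wire` would be passed to the event bus; `artifact` goes to the writer.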

Defense in depth in the writer + handler:

  - mime allowlist with text/html, text/css, text/markdown, text/plain,
    application/json, image/svg+xml; everything else falls through to
    application/octet-stream + .bin so unknown payloads can't be
    misinterpreted as a known type;
  - UTF-8 byte-length cap, configurable via cfg.parserMaxBlockBytes, so
    multi-byte payloads can't sneak past a JS .length check;
  - atomic write through a sibling tmp file + rename so a daemon crash
    mid-write can't leave a half-written artifact under the canonical
    name;
  - path-traversal guard on the GET endpoint that resolves the row's
    artifactPath against the artifacts root and refuses anything that
    escapes it, refuses non-regular files (symlinks, dirs), and refuses
    files larger than the response cap.
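
The four guards above can be sketched as follows. This is an illustration
under stated assumptions, not the artifact-writer or artifact-handler code:
the function names, the extension mapping, the tmp-file naming, and the use of
synchronous fs calls are all hypothetical. Only the allowlisted MIME types and
the .bin fallback are taken from this message.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Allowlist from the commit message; the extension mapping is an assumption.
const MIME_TO_EXT = new Map<string, string>([
  ["text/html", ".html"],
  ["text/css", ".css"],
  ["text/markdown", ".md"],
  ["text/plain", ".txt"],
  ["application/json", ".json"],
  ["image/svg+xml", ".svg"],
]);

function normalizeMime(mime: string): { mime: string; ext: string } {
  const key = mime.trim().toLowerCase();
  const ext = MIME_TO_EXT.get(key);
  // Unknown types are never served under a known type.
  return ext ? { mime: key, ext } : { mime: "application/octet-stream", ext: ".bin" };
}

function writeArtifactAtomic(dir: string, mime: string, body: string, maxBytes: number): string {
  // Count UTF-8 bytes, not UTF-16 code units: "é" has length 1 but is 2 bytes.
  const bytes = Buffer.byteLength(body, "utf8");
  if (bytes > maxBytes) throw new RangeError(`artifact is ${bytes} bytes, cap is ${maxBytes}`);
  fs.mkdirSync(dir, { recursive: true });
  const finalPath = path.join(dir, `artifact${normalizeMime(mime).ext}`);
  const tmpPath = `${finalPath}.${process.pid}.tmp`; // sibling, same filesystem
  try {
    fs.writeFileSync(tmpPath, body, "utf8");
    fs.renameSync(tmpPath, finalPath); // the canonical name appears only when complete
  } catch (err) {
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
  return finalPath;
}

function isInsideRoot(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return rel !== "" && !escapes;
}

function isServable(root: string, candidate: string, maxBytes: number): boolean {
  if (!isInsideRoot(root, candidate)) return false;
  // lstat does not follow symlinks, so a symlink is not reported as a file.
  const st = fs.lstatSync(candidate, { throwIfNoEntry: false });
  return st !== undefined && st.isFile() && st.size <= maxBytes;
}
```

The rename step is what makes the write atomic from a reader's point of view:
a crash before it leaves only the tmp file, never a truncated artifact under
the canonical name.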

Folded in two non-blocking notes lefarcen left on PR #1016 (the contracts
move) since persistence.ts was already in scope here:

  - P2: introduced CritiquePersistedStatus = CritiqueRunStatus | 'running'
    in the contracts package. CritiqueRunRow.status and
    CritiqueRunInsert.status now use it, and the inline
    `as CritiqueRunStatus | 'running'`
    widen in interrupt-handler.ts is gone. Public DTOs continue to use the
    terminal-only CritiqueRunStatus so a future endpoint can't leak a
    'running' row through the wire.
  - P3: added AssertExhaustiveValues + a compile-time assertion that
    CRITIQUE_RUN_STATUSES covers every CritiqueRunStatus variant.
    Adding a value to ShipStatus or CritiqueRunStatus without updating
    the array now fails the build with a tuple naming the missing
    variants instead of silently dropping out of UI filters.
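
One way such a compile-time assertion can be written, as a sketch only. The
status union below is assembled from values mentioned elsewhere in this log and
may not be the full set, and the real AssertExhaustiveValues may be defined
differently:

```typescript
// Assumed union; the real CritiqueRunStatus lives in the contracts package.
type CritiqueRunStatus =
  | "shipped"
  | "below_threshold"
  | "timed_out"
  | "interrupted"
  | "degraded"
  | "failed";

const CRITIQUE_RUN_STATUSES = [
  "shipped",
  "below_threshold",
  "timed_out",
  "interrupted",
  "degraded",
  "failed",
] as const;

// Resolves to `true` when every member of Union appears in Values; otherwise
// resolves to a tuple whose second element names the missing members.
type AssertExhaustiveValues<Union, Values extends readonly unknown[]> =
  [Exclude<Union, Values[number]>] extends [never]
    ? true
    : ["missing variants", Exclude<Union, Values[number]>];

// If a status is added to the union but not the array, `true` is no longer
// assignable and the compiler error prints the tuple with the missing names.
const _statusesAreExhaustive: AssertExhaustiveValues<
  CritiqueRunStatus,
  typeof CRITIQUE_RUN_STATUSES
> = true;
```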

Coverage: 174 critique tests across 14 files pass locally, including the
new critique-artifact-writer (13 cases) and critique-artifact-endpoint
(11 cases) suites, the inverted critique-lifecycle artifact-persistence
test, and the orchestrator happy-path that asserts the SSE ship payload
does NOT carry body or mime.

Validated: pnpm guard, pnpm --filter @open-design/contracts build,
pnpm --filter @open-design/daemon build (full tsc), pnpm --filter
@open-design/web typecheck, pnpm --filter @open-design/daemon exec
vitest run tests/critique (all green).

This is step (b) of the four-step plan that PR #1006's closing comment
laid out. Step (a) was the contracts move in PR #1016. Steps (c)
(persist original_message_id / agent_id / model_id) and (d) (real
rerun endpoint on top of (a)+(b)+(c)) follow.

Co-authored-by: Nagendhra <nagendhra405@gmail.com>

2026-05-10 23:59:04 +08:00

Nagendhra Madishetti

76e6c7a9f6

feat: Critique Theater Phase 4 (persistence + transcript + orchestrator) (#481)

* docs(specs): add Critique Theater design spec for panel-tempered artifacts

* docs(specs): add Critique Theater implementation plan

* docs(specs): rename UI to Design Jury, add lane-density modes, ship-rule explainer, label sizing

* feat(contracts): add CritiqueConfig schema and defaults

* fix(contracts): apply Task 1.1 review (CRITIQUE_PROTOCOL_VERSION rename, descriptions, RoleWeights export)

* feat(contracts): add PanelEvent discriminated union and isPanelEvent guard

* fix(contracts): apply Task 1.2 review (exhaustive event-type list, runId guard, import order)

* feat(contracts): add CritiqueSseEvent variants and panelEventToSse mapper

* test(daemon): add v1 wire-protocol golden fixtures for Critique Theater parser

* feat(daemon): add v1 streaming parser for Critique Theater wire protocol

* chore(contracts): add .js extensions to relative imports for NodeNext consumers

* fix(daemon): satisfy noUncheckedIndexedAccess in v1 parser regex match access

* test(daemon): cover parser failure modes; fix unclosed-PANELIST swallow bug

* fix(daemon,contracts): address PR #387 review

- parser now clamps panelist + DIM scores against the run-declared scale
  captured from <CRITIQUE_RUN scale=...>, not a hardcoded 100
- PANELIST appearing before any <ROUND n=...> opens now throws
  MalformedBlockError rather than emitting events with NaN round
- DIM_RE and MUST_FIX_RE hoisted to module scope and lastIndex reset per
  call so the parser hot path stops recompiling regex per artifact
- overflow check after drain simplified to a plain buf.length > cap test
  (the prior compound condition was always true on the right side and
  obscured intent)
- scoreThreshold <= scoreScale refine gains a 1e-9 epsilon so floating
  slack does not reject semantically valid configs
- round-1 designer ARTIFACT guard gains a comment naming the spec
  invariant and the v2 relaxation path
- 3 new regression tests cover the panelist-without-round, scale=10
  clamp, and scale=20 plumbing cases
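
A minimal sketch of the scale clamp and the shared-regex reset from the list
above. The DIM pattern and the NaN-to-zero choice are assumptions; the real
DIM_RE and clamping rules are not shown in this message:

```typescript
// Hypothetical pattern, compiled once at module scope.
const DIM_RE = /<DIM\s+name="([^"]+)"\s+score="([^"]+)"\s*\/?>/g;

// Clamp into [0, scale], where scale is the value declared on <CRITIQUE_RUN>.
// Treating a non-numeric score as 0 is an assumption of this sketch.
function clampScore(raw: number, scale: number): number {
  if (!Number.isFinite(raw)) return 0;
  return Math.min(Math.max(raw, 0), scale);
}

function parseDims(block: string, scale: number): Array<{ name: string; score: number }> {
  // A /g regex carries lastIndex between calls; reset it so an earlier call
  // that exited mid-scan cannot make this one skip the start of the block.
  DIM_RE.lastIndex = 0;
  const dims: Array<{ name: string; score: number }> = [];
  let match: RegExpExecArray | null;
  while ((match = DIM_RE.exec(block)) !== null) {
    const name = match[1];
    const rawScore = match[2];
    // Explicit undefined checks keep noUncheckedIndexedAccess satisfied.
    if (name === undefined || rawScore === undefined) continue;
    dims.push({ name, score: clampScore(Number(rawScore), scale) });
  }
  return dims;
}
```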

* docs(specs): rationale for non-goals, failure-mode rate targets, Phase 10 matrix, Phase 14 doc layout

* Merge branch 'main' into feat/critique-theater

Resolves the contracts/index.ts conflict by keeping the .js extensions added
by chore(contracts) 2d6e8d6 and slotting in the new export for ./api/app-config
introduced upstream by #255 (9d700ec). Critique Theater additions
(./sse/critique, ./critique) preserved in their original positions.

Verified after merge:
  pnpm --filter @open-design/contracts test    -> 10/10 pass
  pnpm --filter @open-design/contracts typecheck -> exit 0
  pnpm --filter @open-design/daemon typecheck  -> exit 0
  pnpm --filter @open-design/web typecheck     -> exit 0

Two daemon tests in tests/media-config.test.ts fail both before and after the
merge because they read real OAuth credentials from the developer machine
instead of using mock fixtures. That's an upstream isolation issue on
origin/main, not something this branch introduces.

* fix: unblock web build and address mrcfps PANELIST oversize bypass

The chore commit that added .js extensions to satisfy daemon's nodenext
typecheck broke apps/web's Next.js build, because webpack tried to resolve
the literal ./common.js when only common.ts exists on disk. Replaced with
a subpath approach: contracts/exports gains a './critique' entry pointing
straight at src/critique.ts (which has no relative imports), and daemon
imports route through @open-design/contracts/critique instead of the
barrel. Web keeps the bundler-friendly barrel; daemon's nodenext walks
only the leaf module. All 13 contracts source files reverted to no-.js.

Separately, mrcfps flagged that parserMaxBlockBytes was only enforced on
the leftover buffer after drain returned, so a complete oversized block
arriving in one chunk slipped past the cap. Added an explicit per-block
size check inside drain for every buffered block type (PANELIST,
ROUND_END, SHIP). Three regression tests yield the whole stream as a
single chunk and assert OversizeBlockError fires before any events emit.

* fix(daemon): close three v1 parser invariant gaps from mrcfps review

Three independent gaps that all let malformed or oversized protocol
output pass the v1 envelope contract:

(1) Envelope guard. ROUND, PANELIST, ROUND_END, and SHIP now throw
MalformedBlockError when state.inRun is false. Without this, a stream
that omits <CRITIQUE_RUN> could still emit panelist_* events without
the run_started handshake, leaving downstream reducers with no run-level
config.

(2) UTF-8 byte length. Both the per-block size check and the post-drain
buf-size check now compare Buffer.byteLength(text, 'utf8') against
parserMaxBlockBytes. The previous string-length comparison let multibyte
content (CJK, emoji) inside <NOTES>/<SUMMARY> exceed the configured
byte cap while staying under the JS string length cap, bypassing the
daemon's resource guard.

(3) Header-end ordering. PANELIST, ROUND_END, and SHIP now require the
opener's > to appear before the matched closing tag. A malformed opener
like <PANELIST role="x" score="8"</PANELIST> previously fell through
to the closing tag's > and emitted events for an invalid block.

Four regression tests cover each gap (ROUND-without-run,
SHIP-without-run, multibyte-byte-cap, malformed-opener).
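
The byte-length gap in (2) is easy to reproduce. In the sketch below,
OversizeBlockError is named in this log but its constructor shape is assumed,
and assertWithinByteCap is a hypothetical helper:

```typescript
import { Buffer } from "node:buffer";

// Constructor shape is an assumption; only the class name appears in the log.
class OversizeBlockError extends Error {
  constructor(bytes: number, cap: number) {
    super(`block is ${bytes} bytes, cap is ${cap}`);
    this.name = "OversizeBlockError";
  }
}

// Compare encoded UTF-8 size, not UTF-16 code units, against the cap.
function assertWithinByteCap(text: string, parserMaxBlockBytes: number): void {
  const bytes = Buffer.byteLength(text, "utf8");
  if (bytes > parserMaxBlockBytes) {
    throw new OversizeBlockError(bytes, parserMaxBlockBytes);
  }
}

// 20 CJK characters: .length is 20, but the UTF-8 encoding is 60 bytes.
const notes = "設計".repeat(10);
const cap = 32;
const passesLengthCheck = notes.length <= cap;               // true (the old bug)
const passesByteCheck = Buffer.byteLength(notes, "utf8") <= cap; // false
```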

* feat(daemon): add critique_runs persistence (Task 4.1)

Introduces a new SQLite table critique_runs to back the orchestrator's
run lifecycle. The plan called for ALTER TABLE artifacts ADD COLUMN ..., but
artifacts is not a DB concept in this repo; runs get their own table.

- migrateCritique(db) creates the table + two indexes idempotently and
  is wired into the existing migrate(db) flow on daemon boot.
- CRUD helpers (insertCritiqueRun, getCritiqueRun, updateCritiqueRun,
  listCritiqueRunsByProject, deleteCritiqueRun) round-trip rounds_json
  through helpers so callers see typed CritiqueRunRow.
- reconcileStaleRuns flips stale 'running' rows to 'interrupted' with
  a recoveryReason='daemon_restart' marker, supporting the spec's
  daemon-restart-mid-run failure mode.
- Public CritiqueRunStatus union excludes the in-flight 'running' value
  but the runtime CHECK accepts it, matching the spec's lifecycle.
- 11 vitest cases cover migration idempotence, round-trip, default
  rounds, status validation, update + list ordering, deletion, and
  reconciliation, plus FK CASCADE on project deletion.
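
The reconciliation rule can be illustrated without a database. This is an
in-memory sketch only: the real reconcileStaleRuns operates on the SQLite
table, and the row shape and the reconcileStaleRows name here are hypothetical:

```typescript
type PersistedStatus =
  | "running"
  | "interrupted"
  | "shipped"
  | "below_threshold"
  | "timed_out"
  | "degraded"
  | "failed";

// Hypothetical row shape; updatedAt is epoch milliseconds.
interface RunRow {
  id: string;
  status: PersistedStatus;
  updatedAt: number;
  rounds: { recoveryReason?: string };
}

// Flip 'running' rows older than staleAfterMs to 'interrupted' and tag them
// with the daemon_restart marker. Returns how many rows were flipped.
function reconcileStaleRows(rows: RunRow[], now: number, staleAfterMs: number): number {
  let flipped = 0;
  for (const row of rows) {
    if (row.status !== "running") continue;
    if (now - row.updatedAt < staleAfterMs) continue;
    row.status = "interrupted";
    row.rounds = { ...row.rounds, recoveryReason: "daemon_restart" };
    flipped += 1;
  }
  return flipped;
}
```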

* feat(daemon): add Critique Theater transcript writer (Task 4.2)

Streams PanelEvent sequences to .ndjson on disk under the artifact dir,
gzipping to .ndjson.gz when the cumulative UTF-8 byte size crosses
gzipThresholdBytes (default 256 KiB). Uses Node fs streams plus
zlib.createGzip so the writer never holds the full transcript in memory.
readTranscript inverts the path and streams events back, picking the
right pipeline by file extension. Covers happy path, large multibyte,
empty input, mid-stream failure cleanup, and unknown-extension reject.

* feat(daemon): add Critique Theater orchestrator (Task 4.3)

Drives one run end-to-end: parses stdout via parseCritiqueStream, scores
each round through scoreboard helpers, persists lifecycle to critique_runs,
and emits CritiqueSseEvent variants on the existing project event bus.
Honors per-round and total timeouts, applies fallbackPolicy when no
<SHIP> arrives, and tees events into writeTranscript so transcripts
stream to disk without buffering the whole run in memory. Defensive entry
validation throws RangeError on invalid CritiqueConfig before any side
effect.

Also adds scoreboard.ts (computeComposite, decideRound, selectFallbackRound)
and re-exports panelEventToSse/CritiqueSseEvent from the critique subpath
so daemon imports never touch the barrel. Fixes missing .js extensions in
sse/critique.ts that caused NodeNext module resolution errors.

* feat(daemon): wire Critique Theater orchestrator into spawn path (Task 4.4)

Adds loadCritiqueConfigFromEnv to read OD_CRITIQUE_* keys with strict
validation at boot. Branches the existing CLI spawn flow on cfg.enabled:
when false (the M0 default) the legacy single-pass generation runs
unchanged; when true the orchestrator owns the run end-to-end. Same SSE
bus, same artifact dir, no behavior change for users until they flip the
flag.

* fix(lockfile): regenerate to include contracts zod + vitest entries

The earlier conflict resolution took main's lockfile and ran pnpm
install, but the install pass on Windows didn't write the contracts
package's zod and vitest entries back into the lockfile. CI's
--frozen-lockfile install rejected the resulting state. Re-running
pnpm install with --no-frozen-lockfile rewrites the lockfile so it
now matches every package.json across the workspace, including
contracts/zod ^3.23.8 and contracts/vitest ^2.1.8. Verified locally:
pnpm install --frozen-lockfile passes.

* fix(daemon): parser ship envelope, SHIP-before-round guard, real artifactRef (Defects 3 + 5)

- ParserOptions gains projectId + artifactId; the parser threads them into
  every emitted ship event's artifactRef so downstream consumers see the
  real run identity instead of empty placeholders.
- <SHIP> now requires at least one closed <ROUND_END> in the same run;
  malformed streams that emit SHIP before any round completes now throw
  MalformedBlockError instead of bypassing the round-1 artifact invariant.
- The SHIP handler validates the inner <ARTIFACT> block is present and
  non-empty; missing artifact raises MissingArtifactError.
- Three new regressions: SHIP-before-round, SHIP-without-artifact,
  artifactRef populated from parser options.
- Orchestrator threads projectId + artifactId into parserOpts.
- Test fixtures updated to include <ARTIFACT> inside <SHIP> blocks.

* fix(daemon): orchestrator owns lifecycle, gzip atomicity, fallback on timeout (Defects 2,4,7,8)

- Orchestrator now accepts child + childExitPromise, races parser /
  child-exit / abort / timeout in one awaited flow, and SIGTERMs the
  child on every non-clean termination. Server awaits the result so
  the run lifecycle has a single owner.
- ChildExitError surfaces when child exits non-zero mid-stream; the
  run is classified as failed with cause cli_exit_nonzero.
- Timeout / abort with at least one completed round elects a fallback
  via selectFallbackRound and emits a synthetic ship event with
  status=timed_out or interrupted; the score persists to
  critique_runs instead of staying null.
- applyTimeouts includes childExitRace in every Promise.race so early
  child exits are classified without waiting for the total timeout.
  iter.return() cleanup is capped at 200ms to prevent hang on
  stalling generators.
- writeTranscript writes gzip output to transcript.ndjson.gz.tmp,
  fsyncs, then atomic-renames. Crashes mid-write leave no partial
  .gz or .gz.tmp on disk.
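
The tmp-file, fsync, rename sequence from the last bullet, as a generic sketch.
writeFileAtomic is a hypothetical helper; the real writeTranscript streams
through gzip rather than writing one buffer:

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Write to a sibling ".tmp" file, flush it to disk, then rename it into place.
// On a POSIX filesystem the rename is atomic, so readers see either the old
// file or the complete new one, never a partial write.
async function writeFileAtomic(finalPath: string, data: string | Uint8Array): Promise<void> {
  const tmpPath = `${finalPath}.tmp`;
  await fs.mkdir(path.dirname(finalPath), { recursive: true });
  try {
    const handle = await fs.open(tmpPath, "w");
    try {
      await handle.writeFile(data);
      await handle.sync(); // fsync before rename so the bytes are durable
    } finally {
      await handle.close();
    }
    await fs.rename(tmpPath, finalPath);
  } catch (err) {
    // Best-effort cleanup so a failed write leaves no .tmp behind.
    await fs.rm(tmpPath, { force: true });
    throw err;
  }
}
```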

* fix(daemon): plain-stream gating, per-run artifact dir, boot reconcile (Defects 1, 2, 6)

- Spawn-path branch now inspects def.streamFormat and only routes through
  runOrchestrator when format === 'plain'. Adapters emitting wrapper
  formats (claude-stream-json, copilot-stream-json, json-event-stream,
  acp-json-rpc, pi-rpc) fall through to legacy single-pass with a
  one-time stderr warning per format. Per-format decoding into the
  orchestrator is reserved for v2.
- critiqueArtifactDir is now path.join(ARTIFACTS_DIR, projectId, runId)
  so concurrent or sequential runs in the same project never overwrite
  each other's transcript or final HTML. Persistence stores the relative
  per-run path.
- reconcileStaleRuns is now invoked after openDatabase on every daemon
  boot with staleAfterMs = critiqueCfg.totalTimeoutMs. Stale running
  rows from a prior crash flip to interrupted with
  rounds_json.recoveryReason='daemon_restart'. Logs a one-line warning
  naming the flipped count when greater than zero.
- Spawn now passes child + childExitPromise to runOrchestrator so the
  orchestrator can race child exit against the parser, abort signal,
  and timeouts in one awaited flow. Server awaits the orchestrator's
  result and surfaces failures through the existing run lifecycle.
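
The format gate in the first bullet amounts to a small predicate with a
warn-once set. A sketch, assuming a hypothetical shouldUseOrchestrator helper
and warning text; only the 'plain' check and the once-per-format warning come
from this message:

```typescript
// Formats that have already produced a warning in this process.
const warnedFormats = new Set<string>();

function shouldUseOrchestrator(
  critiqueEnabled: boolean,
  streamFormat: string,
  warn: (message: string) => void = (message) => {
    process.stderr.write(`${message}\n`);
  },
): boolean {
  if (!critiqueEnabled) return false;
  if (streamFormat === "plain") return true;
  // Wrapper formats fall back to legacy single-pass; warn once per format.
  if (!warnedFormats.has(streamFormat)) {
    warnedFormats.add(streamFormat);
    warn(`critique: stream format "${streamFormat}" is not supported; using legacy single-pass generation`);
  }
  return false;
}
```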

* fix(daemon): daemon-authoritative scoring, lifecycle status, stderr ordering, insert type

Round 2 review feedback on PR #481.

1. CritiqueRunInsert.status now accepts 'running' so the boot-reconcile
   tests (and any caller seeding an in-flight row) typecheck without
   casting. The runtime check in insertCritiqueRun already accepted
   'running' against the DB constraint set, only the public type was
   stricter than the DB.
2. round_end keeps the daemon-computed composite authoritative. The
   agent's <ROUND_END composite=...> attribute is advisory: a divergence
   beyond COMPOSITE_TOLERANCE emits a composite_mismatch parser_warning
   so the discrepancy is observable, but the daemon value is what scores
   and persists. Same policy for must_fix.
3. SHIP-handling derives the final status from decideRound(...) using the
   daemon's scored round rather than trusting <SHIP composite=... status=...>.
   A run that the agent claims as shipped but whose daemon composite is
   below threshold now finalizes as below_threshold, so a malformed or
   adversarial stream cannot force a ship.
4. server.ts captures the orchestrator's result and maps the critique
   terminal status to the chat run lifecycle. shipped/below_threshold
   finalize as 'succeeded'; timed_out/interrupted/degraded/failed
   finalize as 'failed'. cancelRequested is honored.
5. stderr forwarding and child.on('error') registrations moved BEFORE
   the orchestrator await so a CLI that floods stderr cannot fill the
   OS pipe and deadlock until the total timeout, and so an early
   child error fired during the run is observed by the same listener
   used after.
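
Items 2 and 4 reduce to two small pure functions. In this sketch the tolerance
value is an assumption, and reconcileComposite / toChatRunStatus are
hypothetical names; the status-to-lifecycle mapping itself is taken from item 4:

```typescript
type CritiqueStatus =
  | "shipped"
  | "below_threshold"
  | "timed_out"
  | "interrupted"
  | "degraded"
  | "failed";

type ChatRunStatus = "succeeded" | "failed";

// Assumed value; the real COMPOSITE_TOLERANCE is not stated in this message.
const COMPOSITE_TOLERANCE = 0.05;

interface CompositeCheck {
  composite: number;
  warning?: string;
}

// The daemon's composite always wins; the agent's claim can only add a warning.
function reconcileComposite(daemonComposite: number, agentComposite?: number): CompositeCheck {
  if (
    agentComposite !== undefined &&
    Math.abs(agentComposite - daemonComposite) > COMPOSITE_TOLERANCE
  ) {
    return { composite: daemonComposite, warning: "composite_mismatch" };
  }
  return { composite: daemonComposite };
}

// shipped/below_threshold finalize as 'succeeded'; everything else as 'failed'.
function toChatRunStatus(status: CritiqueStatus): ChatRunStatus {
  return status === "shipped" || status === "below_threshold" ? "succeeded" : "failed";
}
```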

Tests:
- tests/critique-authority.test.ts: 3 new regressions (lying ship
  downgraded to below_threshold, mismatch warning emitted, aligned
  composites stay quiet).
- All four affected suites green: 14 orchestrator + 10 spawn-wiring +
  3 boot-reconcile + 3 authority = 30/30.

Workspace typechecks: contracts, daemon, web all exit 0.

* fix(daemon,contracts): inline critique SSE, signal-terminated child, null shipped artifactPath

Round 3 review feedback on PR #481.

1. packages/contracts/src/critique.ts inlines CritiqueSseEvent +
   panelEventToSse + CRITIQUE_SSE_EVENT_NAMES + a local mirror of
   SseTransportEvent. The previous re-export from './sse/critique.js'
   broke the workspace web build (Turbopack cannot rewrite .js to .ts
   on a relative source import) while removing the .js extension broke
   daemon's NodeNext typecheck (it walks this leaf via the './critique'
   subpath export which requires explicit .js extensions). Inlining
   removes the cross-file relative import entirely so both consumers
   walk one self-contained file. packages/contracts/src/sse/critique.ts
   is removed and its co-located test moves up to
   packages/contracts/src/critique.test.ts. The barrel
   packages/contracts/src/index.ts drops the redundant
   './sse/critique' re-export since './critique' already exports the
   same symbols.

2. apps/daemon/src/critique/orchestrator.ts treats a signal-terminated
   child as a terminal race rejection. Previously the race only caught
   non-zero numeric exit codes and treated code === null as
   indefinitely pending, so a SIGTERM from /api/runs/:id/cancel
   resolved childExitPromise as { code: null, signal: 'SIGTERM' } and
   the orchestrator fell through to the no-SHIP fallback path,
   persisting below_threshold instead of interrupted. The race now
   rejects with a new ChildSignaledError when signal !== null, and a
   new catch branch classifies the run as 'interrupted' and (if at
   least one round closed) emits a synthetic ship event with
   status='interrupted' so the persisted row and the SSE transcript
   reflect the actual cause.

3. Same file, ship-handling: artifactPath is now persisted as null on
   shipped runs until a future phase actually extracts the
   <SHIP><ARTIFACT> body to disk. Previously the orchestrator wrote
   ${artifactDir}/${artifactId} even though no file existed at that
   path, so any later replay/export/UI code that trusted
   critique_runs.artifact_path would dereference a missing file. The
   transcript still records the ship event with the artifact reference
   so consumers can find the run.
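
The exit classification in item 2 can be sketched as a pure function. The error
class names come from this log; their fields, the classifyChildExit helper, and
the ChildExit shape (modeled on the { code, signal } object quoted above) are
assumptions:

```typescript
interface ChildExit {
  code: number | null;
  signal: string | null;
}

class ChildExitError extends Error {
  readonly code: number;
  constructor(code: number) {
    super(`child exited with code ${code}`);
    this.name = "ChildExitError";
    this.code = code;
  }
}

class ChildSignaledError extends Error {
  readonly signal: string;
  constructor(signal: string) {
    super(`child terminated by ${signal}`);
    this.name = "ChildSignaledError";
    this.signal = signal;
  }
}

// Returns the error the race should reject with, or null for a clean exit.
// Signal is checked first: a signaled child reports code === null, which the
// earlier logic mistook for "still running".
function classifyChildExit(exit: ChildExit): Error | null {
  if (exit.signal !== null) return new ChildSignaledError(exit.signal);
  if (exit.code !== null && exit.code !== 0) return new ChildExitError(exit.code);
  return null;
}
```

A caller would then map ChildSignaledError to 'interrupted' and ChildExitError
to 'failed', matching the statuses described in this log.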

Tests:
- apps/daemon/tests/critique-lifecycle.test.ts: 2 new regressions
  (SIGTERM-terminated child after one closed round persists
  'interrupted' with a synthetic ship event of the same status; shipped
  run leaves artifactPath null in result and DB row).
- 43 critique-suite tests pass: 14 orchestrator + 11 transcript +
  10 spawn-wiring + 3 boot-reconcile + 3 authority + 2 lifecycle.

Workspace typechecks: contracts, daemon, web all exit 0.

* fix(daemon): buffer raw SHIP, emit only normalized; reject SHIP for unclosed round

Round 4 review feedback on PR #481.

The parser-event loop used to unconditionally collectedEvents.push(event)
and bus.emit(panelEventToSse(event)) for every event, including raw
<SHIP>. SSE clients and the transcript could see the agent's forged
status="shipped" / composite="9.5" before decideRound(...) ran, even
when the daemon later corrected the persisted DB row to below_threshold.
The loop now skips ship events entirely; the orchestrator buffers the
raw shipEvent, runs daemon-authoritative scoring, and emits a single
normalized ship payload built from the daemon's computed composite,
selectFallbackRound's mustFix, and decideRound's status. The transcript
and SSE bus now only ever see the daemon-scored ship.

The unknown-round fallback used to make agent-claimed status/composite
authoritative when SHIP referenced a round that was never closed. A
malformed stream could close a low-scoring round 1 and then send
<SHIP round="2" status="shipped" composite="10">; because
completedRounds.find(r => r.n === 2) returned undefined, the
orchestrator persisted the agent's value. That
re-opened the scoring-integrity hole the previous round was meant to
close. The orchestrator now drops a SHIP whose round isn't in
completedRounds, emits a parser_warning, and falls through to the
no-SHIP fallback policy. The synthetic ship from selectFallbackRound
gets emitted instead, with daemon-authoritative round/composite/status.
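
The drop-and-fall-back behavior can be sketched as follows. Everything here is
illustrative: normalizeShip and pickFallbackRound are hypothetical names, the
fallback rule (highest composite wins) and the plain threshold comparison are
simplifications, and the real selectFallbackRound / decideRound may weigh
must-fix counts or other inputs:

```typescript
interface CompletedRound {
  n: number;
  composite: number; // daemon-computed
}

interface RawShip {
  round: number;
  status: string;    // agent-claimed, never trusted
  composite: number; // agent-claimed, never trusted
}

interface NormalizedShip {
  round: number;
  composite: number;
  status: "shipped" | "below_threshold";
}

// Assumed policy: pick the completed round with the highest daemon composite.
function pickFallbackRound(rounds: CompletedRound[]): CompletedRound | undefined {
  let best: CompletedRound | undefined;
  for (const round of rounds) {
    if (best === undefined || round.composite > best.composite) best = round;
  }
  return best;
}

function normalizeShip(
  raw: RawShip | undefined,
  completedRounds: CompletedRound[],
  threshold: number,
): { ship: NormalizedShip | undefined; warnings: string[] } {
  const warnings: string[] = [];
  let round = raw ? completedRounds.find((r) => r.n === raw.round) : undefined;
  if (raw && !round) {
    // SHIP names a round the daemon never closed: drop it and warn.
    warnings.push(`SHIP referenced unclosed round ${raw.round}; dropped`);
  }
  if (!round) round = pickFallbackRound(completedRounds);
  if (!round) return { ship: undefined, warnings };
  const status = round.composite >= threshold ? "shipped" : "below_threshold";
  return { ship: { round: round.n, composite: round.composite, status }, warnings };
}
```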

Tests:
- tests/critique-authority.test.ts: extended the lying-ship regression
  to also assert the emitted critique.ship payload is downgraded
  (status='below_threshold', composite < threshold), so the SSE bus
  cannot see the agent's claim. Added a new regression where SHIP
  references an unclosed round 2: the agent ship is dropped, a
  parser_warning fires, the fallback selects round 1, and the only
  emitted critique.ship has round=1 and status=below_threshold.
- 44 critique-suite tests pass: 14 orchestrator + 11 transcript + 10
  spawn-wiring + 3 boot-reconcile + 4 authority + 2 lifecycle.

Workspace daemon typecheck exits 0.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
Co-authored-by: mrcfps <mrc@powerformer.com>

2026-05-05 15:50:35 +08:00

Nagendhra Madishetti

47eeaf445d

feat: Critique Theater foundation (contracts + parser, Phases 0-2) (#387)

* docs(specs): add Critique Theater design spec for panel-tempered artifacts

* docs(specs): add Critique Theater implementation plan

* docs(specs): rename UI to Design Jury, add lane-density modes, ship-rule explainer, label sizing

* feat(contracts): add CritiqueConfig schema and defaults

* fix(contracts): apply Task 1.1 review (CRITIQUE_PROTOCOL_VERSION rename, descriptions, RoleWeights export)

* feat(contracts): add PanelEvent discriminated union and isPanelEvent guard

* fix(contracts): apply Task 1.2 review (exhaustive event-type list, runId guard, import order)

* feat(contracts): add CritiqueSseEvent variants and panelEventToSse mapper

* test(daemon): add v1 wire-protocol golden fixtures for Critique Theater parser

* feat(daemon): add v1 streaming parser for Critique Theater wire protocol

* chore(contracts): add .js extensions to relative imports for NodeNext consumers

* fix(daemon): satisfy noUncheckedIndexedAccess in v1 parser regex match access

* test(daemon): cover parser failure modes; fix unclosed-PANELIST swallow bug

* fix(daemon,contracts): address PR #387 review

- parser now clamps panelist + DIM scores against the run-declared scale
  captured from <CRITIQUE_RUN scale=...>, not a hardcoded 100
- PANELIST appearing before any <ROUND n=...> opens now throws
  MalformedBlockError rather than emitting events with NaN round
- DIM_RE and MUST_FIX_RE hoisted to module scope and lastIndex reset per
  call so the parser hot path stops recompiling regex per artifact
- overflow check after drain simplified to a plain buf.length > cap test
  (the prior compound condition was always true on the right side and
  obscured intent)
- scoreThreshold <= scoreScale refine gains a 1e-9 epsilon so floating
  slack does not reject semantically valid configs
- round-1 designer ARTIFACT guard gains a comment naming the spec
  invariant and the v2 relaxation path
- 3 new regression tests cover the panelist-without-round, scale=10
  clamp, and scale=20 plumbing cases

* docs(specs): rationale for non-goals, failure-mode rate targets, Phase 10 matrix, Phase 14 doc layout

* Merge branch 'main' into feat/critique-theater

Resolves the contracts/index.ts conflict by keeping the .js extensions added
by chore(contracts) 2d6e8d6 and slotting in the new export for ./api/app-config
introduced upstream by #255 (9d700ec). Critique Theater additions
(./sse/critique, ./critique) preserved in their original positions.

Verified after merge:
  pnpm --filter @open-design/contracts test    -> 10/10 pass
  pnpm --filter @open-design/contracts typecheck -> exit 0
  pnpm --filter @open-design/daemon typecheck  -> exit 0
  pnpm --filter @open-design/web typecheck     -> exit 0

Two daemon tests in tests/media-config.test.ts fail both before and after the
merge because they read real OAuth credentials from the developer machine
instead of using mock fixtures. That's an upstream isolation issue on
origin/main, not something this branch introduces.

* fix: unblock web build and address mrcfps PANELIST oversize bypass

The chore commit that added .js extensions to satisfy daemon's nodenext
typecheck broke apps/web's Next.js build, because webpack tried to resolve
the literal ./common.js when only common.ts exists on disk. Replaced with
a subpath approach: contracts/exports gains a './critique' entry pointing
straight at src/critique.ts (which has no relative imports), and daemon
imports route through @open-design/contracts/critique instead of the
barrel. Web keeps the bundler-friendly barrel; daemon's nodenext walks
only the leaf module. All 13 contracts source files reverted to no-.js.

Separately, mrcfps flagged that parserMaxBlockBytes was only enforced on
the leftover buffer after drain returned, so a complete oversized block
arriving in one chunk slipped past the cap. Added an explicit per-block
size check inside drain for every buffered block type (PANELIST,
ROUND_END, SHIP). Three regression tests yield the whole stream as a
single chunk and assert OversizeBlockError fires before any events emit.

* fix(daemon): close three v1 parser invariant gaps from mrcfps review

Three independent gaps that all let malformed or oversized protocol
output pass the v1 envelope contract:

(1) Envelope guard. ROUND, PANELIST, ROUND_END, and SHIP now throw
MalformedBlockError when state.inRun is false. Without this, a stream
that omits <CRITIQUE_RUN> could still emit panelist_* events without
the run_started handshake, leaving downstream reducers with no run-level
config.

(2) UTF-8 byte length. Both the per-block size check and the post-drain
buf-size check now compare Buffer.byteLength(text, 'utf8') against
parserMaxBlockBytes. The previous string-length comparison let multibyte
content (CJK, emoji) inside <NOTES>/<SUMMARY> exceed the configured
byte cap while staying under the JS string length cap, bypassing the
daemon's resource guard.

(3) Header-end ordering. PANELIST, ROUND_END, and SHIP now require the
opener's > to appear before the matched closing tag. A malformed opener
like <PANELIST role="x" score="8"</PANELIST> previously fell through
to the closing tag's > and emitted events for an invalid block.

Four regression tests cover each gap (ROUND-without-run,
SHIP-without-run, multibyte-byte-cap, malformed-opener).

* fix(lockfile): regenerate to include contracts zod + vitest entries

The earlier conflict resolution took main's lockfile and ran pnpm
install, but the install pass on Windows didn't write the contracts
package's zod and vitest entries back into the lockfile. CI's
--frozen-lockfile install rejected the resulting state. Re-running
pnpm install with --no-frozen-lockfile rewrites the lockfile so it
now matches every package.json across the workspace, including
contracts/zod ^3.23.8 and contracts/vitest ^2.1.8. Verified locally:
pnpm install --frozen-lockfile passes.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>

2026-05-04 20:28:28 +08:00

3 commits