open-design/apps/web/tests/components/Theater/state
Nagendhra Madishetti 40766ef1ba
test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* fix(web): tighten Phase 13 gates from lefarcen review (PR #1318)

Address the actionable items from lefarcen's review of the two
Phase 13 CI gates. The two questions about longer-term DX (pre-
commit hook to auto-update the symbol table, AST-walker swap)
are documented as deferred follow-ups rather than landed here.

reducer-bench:
  - Describe renamed to 'reducer p99 regression gate (Phase 13.1)'
    so it reads as a gate, not a comparative benchmark.
  - Failure message now carries the full distribution
    (p50 / p90 / p99 / max + ceiling), so triage on a tripped gate
    can distinguish a real 20ms regression from a 4.001ms CI hiccup
    without re-running locally (lefarcen Q3).
  - Captured a baseline (p50=0.011ms p90=0.013ms p99=0.018ms
    max=0.244ms on a local Node 24 / Win11 run, 2026-05-11) inside
    the docblock so reviewers can see the actual reading sits ~222x
    below the 4ms ceiling (lefarcen Q1).
  - Replaced 'role as any' casts with PanelistRole-typed casts so
    the fixture is typecheck-strict.
  - Phase numbering corrected (13.2 → 13.1 to match the PR body).

critique-coverage:
  - Symbols now grouped under four describe blocks (SSE events /
    panelist roles / lifecycle phases / i18n keys) so a failure
    points at the category that drifted at a glance (lefarcen nit).
  - Docblock now explains the grep-over-AST trade-off (the bug
    class is structural at the string level, not at the AST level)
    and points at the future AST-walker work as a deferred follow-
    up (lefarcen Q2).
  - Docblock now walks a contributor through the four-step
    maintenance flow (add to contract → add caller → add test →
    add literal here), so the next person to add an SSE event or
    i18n key knows the gate exists and what to update (lefarcen
    Q4).
  - Phase strings switched from 'phase: <name>' to bare-quoted
    literals so the walker is robust against single vs double
    quotes and ':' vs '===' source-shape changes.
  - Dead try/catch around 'stack = [root]' removed (cannot throw).
  - Per-symbol failure messages name the symbol AND which corpus
    is missing it, so the gate is self-describing on the next
    CI red.
  - Phase numbering corrected (13.4 → 13.2 to match the PR body).

63 / 63 vitest cases green (1 bench + 62 coverage). Web
typecheck clean.

* fix(web): tighten coverage walker semantics from lefarcen P2/P3 (PR #1318)

Two follow-on findings on commit 338a185:

P2 — coverage gate weakened. The previous revision used one helper
`corpusReferences` for both SRC and TEST corpora, and that helper
accepted the unprefixed PanelEvent type form (`type: 'panelist_must_fix'`)
as a substitute for the prefixed SSE wire name (`critique.panelist_must_fix`).
The fallback is correct on the TEST side (reducer tests dispatch
PanelEvent literals) but it weakened the SRC side: production code
could drop the SSE channel name silently and the PanelEvent type
alias would keep the walker green.

Split into two helpers: `srcReferences` is strict (exact substring
match only, no fallback) and `testReferences` keeps the lenient
fallback for SSE events. The production-side assertions now route
through `srcReferences` so the wire name is load-bearing again.

P3 — maintenance doc overclaimed. The previous revision said 'CI red
if you forget step 4' but the symbol arrays are partially hand-
maintained, so a contributor adding a NEW phase string or i18n key
without updating the array leaves CI green (the walker never knew
to look). Rewrote the failure-mode section to distinguish the two
cases:

  - Renaming an EXISTING symbol without updating the walker → CI red
    (existing assertion fails because the old name is gone).
  - Adding a NEW hand-maintained symbol without updating the walker
    → CI stays green (walker does not know to look for it).

Also clarified that `SSE_EVENTS` and `PANELIST_ROLE_STRINGS` are
auto-built from contracts so step 4 is one-line for `PHASE_STRINGS`
and `I18N_KEYS` only.

63 / 63 vitest cases still green.

* fix(web): close two P2 findings on PR #1318 (Siri-Ray + lefarcen)

P2 (coverage walker counted self as evidence). The walker walked
apps/web/tests, which contains apps/web/tests/components/Theater/
critique-coverage.test.ts itself. The hand-maintained PHASE_STRINGS
and I18N_KEYS literals inside that file would satisfy the test-side
coverage assertion against themselves, so a real Theater test that
covers a symbol could be deleted and the gate would still pass.

Excluded the walker file from TEST_FILES via path.resolve(__filename)
filter so the test corpus only contains independent evidence.

Once the walker stopped seeing itself, the gate correctly red-flagged
nine i18n keys that no INDEPENDENT test exercises:
critiqueTheater.userFacingName, roundLabel, composite, threshold,
interrupt, interrupted, degradedHeading, shippedSummary,
interruptedSummary. Component tests like TheaterCollapsed.test.tsx
exercise the rendered text but never mention the key STRING, so the
walker couldn't see them. Closed that gap by adding
apps/web/tests/components/Theater/critique-i18n-keys.test.ts: 9 cases,
one per watched key, asserting the dictionary entry exists as a
non-empty string. That's both real coverage (catches a stale dict)
and the independent evidence the walker requires.

P2 (interruptedSummary missing from de/ja/ko/zh-TW). The native
locale overrides were missing the key, so an interrupted run on a
German / Japanese / Korean / Traditional Chinese UI silently fell
back to the English string via the ...en spread. Added the key with
{round} and {composite} placeholders preserved, using PerishCode's
suggested copy from the earlier review thread.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm exec vitest run tests/components/Theater tests/i18n:
  20 files / 190 tests green (critique-coverage 62 / 62,
  critique-i18n-keys 9 / 9 new, reducer-bench 1 / 1, locales 5 / 5).

* fix(web): drop the Dict cast in i18n key coverage test (lefarcen P1 / Siri-Ray on PR #1318)

The previous revision used `(en as Record<string, string>)[key]` to
read each watched key. Dict has no string index signature, so CI's
strict typecheck rejected the broad cast with TS2352 even though the
runtime assertion was fine.

Replaced with the typed pattern lefarcen suggested: type WATCHED_KEYS
as `readonly (keyof typeof en)[]` and read `en[key]` directly. That
removes the cast and also strengthens the test, because a renamed or
removed key now fails the type check immediately rather than at
runtime.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web exec vitest run
  tests/components/Theater/critique-i18n-keys.test.ts: 9 / 9 green.

* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314)

The variant validator on the web SSE path previously accepted any
`typeof === 'string'` for closed-enum fields (ship.status,
panelist_*.role, degraded.reason, failed.cause, parser_warning.kind,
run_started.cast[]) and any `typeof === 'number'` for numeric fields,
which let NaN / Infinity through. Downstream components index i18n
tables by enum value, so an unknown status or role would land
`SHIP_BADGE_KEY[final.status]` on undefined and crash the translator.

The replay parser had a separate gap: `useCritiqueReplay.parseTranscript`
called the cheap `isPanelEvent` header check directly, so a recorded
line like `{"type":"ship","runId":"r"}` reached the reducer with
composite, status, round, artifactRef, summary all undefined and
TheaterCollapsed then called `final.composite.toFixed(1)` on undefined.

Resolution: move all wire-side validation into the contract guard.

- Export const arrays for the closed enums:
  SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS,
  ROUND_DECISIONS (PANELIST_ROLES already existed).
- Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the
  single deep validator: header (known type + non-empty runId) plus
  every variant-specific required field plus closed-enum membership
  plus Number.isFinite on every numeric field. Documented as the wire
  source of truth.
- Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent
  now relies entirely on the contract guard, and parseTranscript in
  useCritiqueReplay (which already uses isPanelEvent) gets the deeper
  validation for free.

Tests (TDD, red-first):

- packages/contracts/tests/critique.test.ts: 13 new cases pinning the
  strict guard directly (well-formed across every variant, every
  rejection path: unknown type, empty/non-string runId, unknown enum,
  non-finite numeric, missing variant field).
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for
  each closed-enum rejection on the wire path plus a positive sweep
  across every legal enum value across every variant.
- apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx:
  2 new cases for incomplete and unknown-enum transcript lines.

Verified:
- pnpm --filter @open-design/contracts test 4 files / 30 tests green.
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test 107 files / 976 tests green.

* fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4)

The strict guard from PR #1314 round 3 enforced enum membership and
Number.isFinite, but accepted any finite number where the contract
intends a specific domain: scale: 0 (ScoreTicker divides by it),
negative thresholds, fractional rounds, negative mustFix, etc.
ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline
CSS and divides by it for tick width, so a guard-passing scale: 0
shipped Infinity into the rendered style. Negative composite /
score values reached downstream code that assumes >= 0.

Resolution: mirror the daemon-side Zod domain constraints in the
runtime guard.

Three new helpers in packages/contracts/src/critique.ts:

  - isPositiveInt(v): integer with v > 0. Used for round, maxRounds,
    scale, protocolVersion (all 1-indexed in the orchestrator).
  - isNonNegativeInt(v): integer with v >= 0. Used for mustFix,
    position, bestRound. bestRound: 0 is the valid sentinel for
    'interrupted before any round closed'.
  - isNonNegativeFinite(v): finite number with v >= 0. Used for
    composite, score, dimScore, threshold. Threshold may be
    fractional (e.g. 8.5 on a scale of 10).

Cross-field check inside run_started: threshold <= scale (the daemon
Zod schema enforces this with an epsilon refine, the wire guard
matches the same intent).

Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts:

  - 22 new rejection cases across every numeric field that
    previously slipped through: scale: 0, negative scale, fractional
    scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0,
    fractional protocolVersion, negative threshold, threshold > scale,
    round: 0, fractional round, negative dimScore / score, negative /
    fractional mustFix, negative composite, ship round: 0, negative /
    fractional bestRound, negative interrupted composite, negative /
    fractional parser_warning position.
  - 3 positive boundary cases that must still pass: threshold == scale,
    fractional threshold within [0, scale], interrupted with
    bestRound: 0 (no round completed before interrupt), parser_warning
    with position: 0 (start of stream).

Verified:
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/contracts test: 4 files / 59 tests green
  (was 37 before the new domain cases).
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 110 files / 1004 tests green;
  no regression on Theater suite, sse validator, replay parser, or
  assistant-feedback widget tests.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* test(e2e): move critique-coverage walker from apps/web/tests to e2e/tests (Siri-Ray P2)

The walker is by definition a cross-app consistency check: it reads
the web reducer, the daemon critique module, the contracts package,
and the e2e UI suite. Hosting it under apps/web/tests/ violated the
repo boundary rule (root AGENTS.md): app packages must not import
another app's private src/ or tests/ as a shared helper, and
cross-app consistency checks belong in e2e/tests/. The web test
lane was effectively coupled to daemon and e2e file layout, so a
daemon-only refactor could break the web lane.

Moved the file to e2e/tests/critique-coverage.test.ts and switched
the contracts import to the import.meta.glob shape the e2e package
already uses (see localized-content.test.ts), so the e2e package
does not have to add @open-design/contracts as a workspace dep just
to load two const arrays. REPO_ROOT and SELF_PATH recalculated for
the new location.

Web test lane no longer depends on daemon, contracts, or e2e layout.
The e2e walker covers the same 62 assertions as before:

  e2e/tests/critique-coverage.test.ts  62 / 62 green

Web typecheck clean, e2e typecheck clean.

* fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-14 15:55:36 +08:00
..
reducer-bench.test.ts test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318) 2026-05-14 15:55:36 +08:00
reducer.test.ts feat(web): Critique Theater Phase 8 (8 Theater components, barrel, role-keyed CSS) (#1314) 2026-05-12 21:38:58 +08:00
sse.test.ts feat(web): Critique Theater Phase 9 (drop-in mount wrapper, native i18n for de, ja, ko, zh-TW) (#1315) 2026-05-13 21:21:41 +08:00