106 commits

Author SHA1 Message Date
Marc Chan
c45c5c9764
fix(ci): align visual selectors and nix hashes (#2471)
* fix(ci): align visual selectors and nix hashes

* fix(ci): add strict PR visual verification

* fix(ci): repair visual-home captures

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)
2026-05-21 10:45:37 +08:00
Eli-tangerine
ce95266586
[codex] Polish home composer working-directory controls (#2468)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
ci / Detect CI change scopes (push) Successful in 1s
nix-check / build (push) Failing after 3s
ci / Preflight (push) Failing after 2s
ci / Core package tests (push) Failing after 1s
ci / Tools workspace tests (push) Failing after 1s
ci / Daemon workspace tests (1/2) (push) Failing after 1s
ci / Daemon workspace tests (2/2) (push) Failing after 1s
ci / Web workspace tests (push) Failing after 1s
ci / E2E vitest (push) Failing after 1s
ci / Playwright critical (starters) (push) Failing after 1s
ci / Playwright critical (core) (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / App workspace tests (push) Failing after 0s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
* Polish design system home flows

* Polish home prompt presets

* Polish home working directory controls

* test: align home hero chrome smoke

* fix: stabilize home composer ci checks

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-21 00:22:46 +08:00
Marc Chan
e727168676
chore(ci): expand visual regression coverage (#2381)
Some checks failed
ci / Runtime trace (push) Blocked by required conditions
visual-baseline / Capture visual baselines (push) Waiting to run
ci / Detect CI change scopes (push) Successful in 0s
landing-page-ci / Validate landing page (push) Failing after 2s
landing-page-deploy / Deploy landing page (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Preflight (push) Failing after 1s
ci / Core package tests (push) Failing after 1s
ci / Tools workspace tests (push) Failing after 1s
ci / Daemon workspace tests (1/2) (push) Failing after 1s
ci / Daemon workspace tests (2/2) (push) Failing after 2s
ci / Web workspace tests (push) Failing after 1s
ci / E2E vitest (push) Failing after 2s
ci / Playwright critical (starters) (push) Failing after 1s
ci / Playwright critical (core) (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / App workspace tests (push) Failing after 0s
ci / Validate workspace (push) Failing after 14m14s
* Improve visual diff annotations

* Expand visual regression coverage

* fix(ci): cap visual diff canvas pixels

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* Stabilize visual regression screenshots

* test(e2e): stub routines for visual snapshot

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* Expand visual regression surfaces

* fix(e2e): order design system visual mocks

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* fix(e2e): order design system visual mocks

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* Tune visual diff box stroke

* fix(e2e): stabilize visual detail mocks

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* fix(e2e): harden visual diff box helpers

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* fix(web): preserve deep-linked project bootstrap

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* fix(e2e): stub automation task mocks

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)
2026-05-20 22:25:41 +08:00
Eli-tangerine
8193981511
Keep PR 2400 changes without folder pickers (#2462)
* feat(daemon): add project working directory management and editor hand-off functionality

- Introduced new flags for project commands to manage working directories, including `--working-dir` and `--dir`.
- Implemented API routes for listing available editors and opening projects in selected editors.
- Added a hand-off button in the ChatPane header to facilitate opening project folders in local applications.
- Enhanced the HomeHero component to include working directory and design system settings, improving user experience in project creation.
- Created HomeHeroSettingsChips component for inline management of working directory and design system selection.

* feat(chat): implement voice transcription proxy and enhance UI components

- Added a new API route for voice transcription using OpenAI's `/audio/transcriptions` endpoint, allowing users to send audio blobs directly for transcription.
- Integrated multer for handling audio file uploads in memory, ensuring efficient processing without disk storage.
- Updated the HomeHero component to include example prompt suggestions for plugins, enhancing user interaction.
- Introduced the EditorIcon component to visually represent different editors in the hand-off menu, improving the user experience.
- Refined the HandoffButton component to utilize the new EditorIcon, providing a more cohesive interface for selecting editors.
- Enhanced CSS styles for various components to improve layout and responsiveness, including adjustments to tab and button sizes for better usability.

* style(workspace-shell): enhance layout and overflow handling

- Updated CSS for .workspace-shell to ensure full viewport width and height, with proper overflow management.
- Adjusted grid layout to prevent content overflow and maintain responsiveness.
- Modified styles for .workspace-tabs-chrome to improve width handling and prevent overflow issues.

* refactor(chat): remove voice transcription proxy and related components

- Deleted the voice transcription proxy implementation, including the associated API route and multer configuration.
- Removed the MicButton component from the ChatComposer and HomeHero components to streamline the UI.
- Updated HomeHero to include example suggestions without the voice input functionality.
- Adjusted CSS styles for various components to maintain layout consistency after the removal of the MicButton.

* feat(daemon): implement minting of HMAC tokens for working directory management

- Added a new function `mintImportTokenFromCurrentSecret` to generate HMAC tokens bound to a specified base directory, enhancing security for working directory operations.
- Updated the `desktop-auth.ts` file to include the new token minting functionality, which returns structured errors when the desktop auth secret is cleared.
- Introduced new IPC message types for minting import tokens in the sidecar protocol, allowing seamless integration with the daemon's working directory management.
- Enhanced the `WorkingDirPill` component to utilize the new token minting flow for secure directory selection in desktop builds.
- Updated CSS styles for the HomeHero component to accommodate new example suggestion features and maintain layout consistency.
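
A directory-bound HMAC token of the kind described above could be sketched as follows. This is a minimal illustration, not the code in `desktop-auth.ts`: `mintDirToken`, `verifyDirToken`, the `MintResult` shape, and the `"secret-cleared"` error string are all hypothetical names, and SHA-256 is an assumed digest.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical result shape: a structured error instead of a throw
// when the secret has been cleared.
type MintResult =
  | { ok: true; token: string }
  | { ok: false; error: "secret-cleared" };

// Hypothetical sketch: the token is an HMAC over the base directory,
// so it is only valid for that one directory.
function mintDirToken(secret: string | null, baseDir: string): MintResult {
  if (!secret) return { ok: false, error: "secret-cleared" };
  const token = createHmac("sha256", secret).update(baseDir).digest("hex");
  return { ok: true, token };
}

// Recompute the HMAC and compare in constant time.
function verifyDirToken(secret: string, baseDir: string, token: string): boolean {
  const expected = createHmac("sha256", secret).update(baseDir).digest();
  const given = Buffer.from(token, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Because the directory is part of the signed input, a token minted for one base directory fails verification for any other.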

* fix(HomeView): import HOME_HERO_CHIPS constant for improved chip management

- Updated the HomeView component to import the HOME_HERO_CHIPS constant from the chips module, enhancing the management of hero chips within the component.

* feat(daemon): implement mintImportTokenViaSidecar for secure working directory management

- Introduced the `mintImportTokenViaSidecar` function to facilitate the minting of HMAC tokens for desktop-import operations via the daemon's sidecar IPC. This allows CLI commands to bypass authentication when the desktop-auth gate is active.
- Updated the CLI to utilize the new token minting function when setting the working directory, ensuring secure access to trust-gated API endpoints.
- Enhanced the sidecar server to handle minting requests and return structured error messages for improved user feedback.
- Added tests to validate the new token minting functionality and its integration with the working directory management process.
- Refactored related components to support the new token flow, improving overall security and user experience.

* feat(HomeHero): enhance UI components and styles for improved user experience

- Updated HomeHero component to replace active dot indicators with Plug icons for better visual representation of active plugins.
- Adjusted CSS styles for various elements, including padding and dimensions, to enhance layout consistency and responsiveness.
- Introduced new styles for active type icons and improved hover effects for buttons.
- Updated HomeHeroSettingsChips to change button titles and icons for clarity.
- Added tests to ensure proper rendering and functionality of updated components.

* feat(ProjectDesignSystemPicker): enhance design system selection with preview functionality

- Updated the ProjectDesignSystemPicker component to include a preview feature for design systems, allowing users to see a preview of the selected design system.
- Implemented hover functionality to update the preview based on the hovered design system.
- Added fullscreen preview capability for a more immersive experience.
- Enhanced CSS styles for the design system picker to improve layout and responsiveness.
- Introduced tests to validate the new preview functionality and ensure proper interaction within the component.

* feat: refactor project metadata handling and enhance design system picker

- Updated the default scenario plugin ID retrieval to use project metadata, improving the logic for determining the appropriate plugin based on project intent.
- Enhanced the ProjectDesignSystemPicker and related components to support localized design system summaries and categories, improving user experience.
- Introduced new translations for working directory and design system picker components, ensuring better accessibility and usability across different locales.
- Added a new 'live-artifact' project type to the HomeHero chips, expanding the functionality for users creating refreshable artifacts.
- Updated tests to validate the new project metadata handling and design system picker functionalities.

* feat: enhance localization and styling for design system components

- Added French translations for working directory and design system picker components, improving accessibility for French-speaking users.
- Updated CSS styles for the pet task item to ensure consistent padding and layout.
- Introduced a new test suite for HomeHeroSettingsChips to validate localization and design system selection functionality.
- Enhanced ProjectDesignSystemPicker tests to ensure proper localization and interaction with design system categories.

* fix: update .gitignore to include all claude-sessions directories and remove specific session files

- Modified .gitignore to ensure all claude-sessions directories are ignored by using a wildcard pattern.
- Deleted two specific claude-sessions markdown files to clean up unnecessary session data.

* fix: repair home automation ci regressions

* fix: stabilize artifact consistency e2e

* Remove folder picker changes from PR 2400

---------

Co-authored-by: pftom <1043269994@qq.com>
Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-20 22:07:30 +08:00
lefarcen
41a33aed9e
Reapply "fix(web): demote Plugins and Integrations to nav rail footer (#1806)" (#2360) (#2397)
* Reapply "fix(web): demote Plugins and Integrations to nav rail footer (#1806)" (#2360)

This reverts commit 1ab8758045.

* fixup: align EntryHelpMenu Discord URL with #2386 update

The revert of #2360 brought back EntryHelpMenu.tsx as #1806 originally
added it, with DISCORD_URL = 'https://discord.gg/BYShPgWpq'. #2386 later
rotated the Discord invite to mHAjSMV6gz, but only in the places that
existed on main at the time (EntryShell avatar dropdown + the e2e test);
EntryHelpMenu didn't exist then, so it never got updated. The e2e test
the revert reintroduced asserts the new URL, so the component must
match.
2026-05-20 18:27:48 +08:00
Eli
a5e43ae2a4
add discord feedback entry points (#2386)
2026-05-20 16:22:45 +08:00
shangxinyu1
71044bd3d6
test(e2e): harden extended coverage state assertions (#2245)
* test(e2e): harden extended coverage contracts

* docs(testing): add e2e hardening status

* fix(web): persist artifact chips after daemon runs

* ci: install playwright browsers for e2e vitest

* Fix daemon run recovery across reloads

Pin daemon-created runs to assistant messages immediately so hard reloads before the create response can reattach.

Replay terminal and active run events from the beginning on reload so restored turns keep assistant text, thinking events, produced files, and artifacts.

Fixes #2366

Fixes #2368

Fixes #2371

* test(e2e): preserve fake runtime selection across reload

* fix(web): scope daemon run recovery to daemon mode

* fix(e2e): remove duplicate delayed smoke flag

* fix(web): scope replay artifact recovery to current run

* fix(daemon): remove duplicate run-create pin
2026-05-20 16:21:01 +08:00
shangxinyu1
5fc27f8923
Fix daemon run recovery across reloads (#2374)
* Fix daemon run recovery across reloads

Pin daemon-created runs to assistant messages immediately so hard reloads before the create response can reattach.

Replay terminal and active run events from the beginning on reload so restored turns keep assistant text, thinking events, produced files, and artifacts.

Fixes #2366

Fixes #2368

Fixes #2371

* Fix ProjectView daemon run recovery tests
2026-05-20 15:10:23 +08:00
Marc Chan
f294ab4915
chore(ci): add visual regression PR workflow (#2372)
* Add visual regression PR workflow

* Allow manual visual PR comments

* Post visual comments for same-repo PRs

* fix(ci): surface R2 lookup failures in visual report

Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)

* Align visual workflow names
2026-05-20 15:05:59 +08:00
PerishFire
cc0d423c89
fix: launch windows updater fixture via node (#2364)
2026-05-20 13:32:39 +08:00
PerishFire
15d08d4158
feat: add windows packaged auto update flow (#2362)
2026-05-20 12:56:14 +08:00
lefarcen
1ab8758045
Revert "fix(web): demote Plugins and Integrations to nav rail footer (#1806)" (#2360)
This reverts commit a38e09f931.
2026-05-20 12:43:49 +08:00
Sid
8bcd96f5e5
fix(frames): resolve relative screen= against embedder URL (#2316)
Shared device frames serve at /frames/<name>.html and previously
assigned the raw ?screen= value to the inner iframe.src. A
project-relative value like screen=screens/foo.html resolved against
/frames/, producing /frames/screens/foo.html (404), instead of the
embedding project's /api/projects/:id/raw/screens/foo.html.

The five frame HTML files now resolve relative ?screen= values
against document.referrer when present (the embedding project
preview), falling back to location.href so standalone /frames/*
loads keep working. Absolute and root-relative paths are passed
through unchanged.

Adds an e2e Vitest spec that evaluates each frame's inline <script>
in a Node vm and asserts iframe.src under five scenarios per file
(25 cases total): project-relative against referrer, root-relative
pass-through, absolute pass-through, empty referrer fallback, and
missing ?screen= no-op.
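
The resolution rule described above can be sketched as a pure function. This is an illustration under assumptions, not the inline script shipped in the frame files: `resolveScreenSrc` and its parameters are hypothetical names, and returning `null` stands in for "leave `iframe.src` untouched".

```typescript
// Hypothetical helper mirroring the five scenarios listed in the commit.
function resolveScreenSrc(
  screen: string | null,
  referrer: string,
  locationHref: string,
): string | null {
  // Missing ?screen= is a no-op.
  if (!screen) return null;
  // Absolute URLs (scheme:...) and root-relative paths pass through unchanged.
  if (/^[a-z][a-z0-9+.-]*:/i.test(screen) || screen.startsWith("/")) {
    return screen;
  }
  // Project-relative value: resolve against the embedder when known,
  // otherwise against the frame's own URL.
  const base = referrer || locationHref;
  return new URL(screen, base).href;
}
```

With an assumed referrer of `http://localhost/api/projects/p1/raw/index.html`, the value `screens/foo.html` resolves under `/api/projects/p1/raw/` rather than under `/frames/`.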

Fixes #2234
2026-05-20 10:03:01 +08:00
PerishFire
ad37fd30cf
Add desktop updater UI flow (#2270)
2026-05-19 21:36:51 +08:00
shangxinyu1
d5eb6d17d7
test(e2e): harden connectors auth cancel recovery (#2176)
* test(e2e): harden connectors auth cancel recovery

* test(e2e): target connector cancel alert
2026-05-19 18:11:32 +08:00
PerishFire
bb13eee765
chore: optimize CI and beta release runtime (#2231)
* chore(ci): add runtime trace summaries

* chore(ci): tighten measured workspace steps

* chore(release): tighten beta setup steps

* chore(release): slim beta windows smoke

* chore(ci): shard daemon tests

* chore(ci): harden runtime trace lookup

* chore(release): avoid mac pnpm cache in beta

* chore(ci): split critical playwright checks

* chore(release): publish beta platforms from builders

* test(e2e): update beta release workflow expectation

* chore(ci): stop gating PRs on nix check

* fix(release): keep beta latest complete
2026-05-19 18:06:28 +08:00
Tom Huang
86ec951fb9
[codex] Add automation templates and proposal workflows (#2193)
* feat(web): introduce Automations tab with dual-track capability for routines

This commit adds a new Automations tab that consolidates routines, schedules, and live artifacts, allowing users to manage automations seamlessly. The tab features a modal for creating and editing automations, which supports various scheduling options (hourly, daily, weekdays, weekly) and project modes (create_each_run, reuse). The CLI is also updated to expose automation commands, ensuring consistency between the web UI and CLI interfaces.

Key changes include:
- New `NewAutomationModal` component for automation creation and editing.
- Updated `TasksView` to integrate the new Automations functionality.
- Enhanced styling for the Automations tab to improve user experience.

This implementation aligns with the dual-track capability exposure policy, ensuring all features are accessible via both the web UI and CLI.

* feat(daemon): enhance automation context handling and CLI commands

This commit introduces several improvements to the automation context management and updates the CLI commands accordingly. Key changes include:

- Added support for new context fields (`plugin`, `mcp`, `connector`) in automation commands.
- Updated the CLI to reflect new target options (`new-project`).
- Enhanced error messages for invalid target inputs.
- Introduced functions to handle context selection and normalization for routines, including the ability to parse and store context data in the database.
- Updated the database schema to include a new `context_json` field for routines.
- Improved the handling of context in routine routes and the web interface, ensuring that selected contexts are properly managed and displayed.

These changes aim to provide a more robust and flexible automation experience, aligning with the recent enhancements in the web UI.

* feat(web): enhance TasksView with automation run history and status indicators

This commit introduces several new features to the TasksView component, including:

- Added functionality to display automation run history for each routine, showing metadata such as status, timestamps, and project details.
- Implemented status indicators for routine runs, providing visual feedback on their current state (succeeded, failed, running, queued).
- Enhanced the UI to allow users to expand and view detailed run history, including the ability to open the corresponding project conversation.
- Updated styles to improve the presentation of automation statuses and history.

These changes aim to provide users with better insights into their automation routines and improve overall usability.

* feat(daemon): implement automation ingestion and proposal management

This commit introduces several new features related to automation ingestion and proposal management within the daemon. Key changes include:

- Added new modules for handling automation source packets and proposals, allowing for the storage, retrieval, and management of automation-related data.
- Implemented functions to list, create, and apply automation proposals, enhancing the automation workflow.
- Introduced new CLI commands for interacting with memory entries and automation sources, providing users with more control over their automation processes.
- Enhanced the server routes to support automation source and proposal APIs, enabling seamless integration with the existing system.

These changes aim to improve the overall automation experience, making it easier for users to manage and utilize automation proposals and ingestions effectively.
2026-05-19 16:35:28 +08:00
Siri-Ray
eb127e0f79
Remove live artifact home chip (#2221)
2026-05-19 16:35:09 +08:00
chaoxiaoche
a38e09f931
fix(web): demote Plugins and Integrations to nav rail footer (#1806)
* fix(web): demote Plugins and Integrations to nav rail footer

Plugins and Integrations are platform-configuration surfaces, not
daily-use destinations. Moving them to the footer section of the
left nav rail — separated from primary items by a thin divider —
keeps them reachable while giving the primary four items
(Home, Projects, Automations, Design Systems) the visual weight
they deserve.

- EntryNavRail: remove Plugins/Integrations from the main __group
  and place them in the __footer above the help launcher
- entry-layout.css: add __divider rule (1 px separator) to visually
  mark the boundary between primary and secondary nav regions

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): remove settings dropdown, gear opens settings directly

The gear/cog button previously opened a dropdown that mixed four
unrelated concerns: community links (X, Discord), preference quick
access (Language, Appearance), a feature shortcut (Use everywhere),
and a redundant Settings entry. This created two separate paths to
the same Settings dialog and duplicated Language/Appearance relative
to the Settings sidebar.

Changes:
- Gear button now directly opens the Settings dialog (no intermediate
  dropdown layer)
- Follow @nexudotio on X and Join Discord moved to the Help menu at
  the bottom of the nav rail, where community/external links belong
- Language and Appearance remain exclusively in the Settings dialog
- Use everywhere remains exclusively in the topbar chip
- Remove dead state (avatarMenuOpen, languageExpanded,
  appearanceExpanded), ref, outside-click effect, and module-level
  constants (APPEARANCE_THEMES, APPEARANCE_LABEL, describeModelChip)
  that were introduced solely to support the dropdown

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(web): move output-type chips above chat input as tab bar

The home hero previously had two chip rows below the chat input that
mixed two unrelated dimensions: output type (Prototype, Slide deck,
Image…) and workflow source (Create plugin, From Figma, From folder,
From template). Users had no visual cue that these were different
categories.

This change separates the two dimensions clearly:

- Output type (create group) becomes a tab bar positioned above the
  input card. Tabs share the same chip data and onPickChip dispatch,
  so plugin selection, active state, and pending state are unchanged.
  Active tab shows a colored underline; the bar border visually
  connects to the input card below.

- Workflow source (migrate group) stays as the chip row below the
  input card, now standing alone with unambiguous "how to start"
  semantics.

- Subtitle updated from "Pick a plugin below" to "Pick a type"
  to match the new placement.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): skip replace-prompt confirmation when switching output-type tabs

Output-type tab clicks (create group) are mode-selection gestures;
the user expects the prompt to update immediately without a dialog.
The confirmation is still shown for migrate-group chips (From Figma
etc.) where the replacement carries meaningful user-provided content.
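
The gating rule could be expressed as a small predicate. This is a sketch only: `ChipGroup` and `needsReplaceConfirmation` are hypothetical names, and treating a non-empty prompt as the trigger for the dialog is an assumption rather than something the commit states.

```typescript
type ChipGroup = "create" | "migrate";

// Hypothetical predicate: output-type tabs (create group) never prompt;
// migrate-group chips prompt only when there is something to lose.
function needsReplaceConfirmation(group: ChipGroup, currentPrompt: string): boolean {
  if (group === "create") return false;
  // Assumption: an empty or whitespace-only prompt can be replaced silently.
  return currentPrompt.trim().length > 0;
}
```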

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): use brand logo as Home destination, drop redundant brand mentions

The entry nav rail previously rendered the brand logo and a separate
Home icon back-to-back; both invoked `onViewChange('home')`, so the
Home button was pure duplicate affordance. The hero pane also
displayed a third "Open Design" lockup that competed with the rail
logo and the rebranded title.

This collapses those affordances:

* `EntryNavRail` drops the dedicated Home `NavButton`. The brand
  logo button now carries `aria-current="page"` and an `is-active`
  visual state when the home view is showing, and its tooltip reads
  "Open Design · Home" off-home so the navigation behavior stays
  discoverable for new users.
* `entry-layout.css` adds the matching `.entry-nav-rail__logo.is-active`
  accent treatment so the "you are here" cue reads at parity with
  primary rail buttons.
* `HomeHero` removes the inline `home-hero__brand` lockup and the
  associated CSS, then retunes the title/subtitle/type-tab spacing
  so the headline group still pairs tightly with the type tabs and
  input card below.
* `entry-chrome-flows` is updated to assert the logo carries the
  active page treatment and that no `entry-nav-home` test id
  resurfaces by mistake.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): auto-grow home chat input, disable internal scroll and drag-resize

The home hero textarea was capped at a fixed `min-height: 84px` with
`resize: vertical`, so users had to either drag the bottom-right
corner to enlarge it or scroll the textarea internally to read
longer prompts. That hid context (loaded plugin templates routinely
overflow three lines) and exposed a manual grip whose state was easy
to leave in an awkward height.

This change makes the chat box grow with its content:

* `HomeHero` adds a `useLayoutEffect` that, on every prompt change,
  resets the textarea height to `auto` so the browser can measure
  shorter content, then writes back `scrollHeight` in pixels. The
  effect runs in the layout phase (not the effect phase) to avoid a
  one-frame flash at the previous height when a plugin loads a long
  example prompt.
* `home-hero.css` swaps `resize: vertical` for `resize: none` and
  adds `overflow: hidden`, so the manual drag grip disappears and
  the textarea never scrolls internally. `min-height: 84px` is kept
  so the empty input still reads as a chunky chat box. The outer
  page handles overflow when the prompt is genuinely very long.
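
The measure-then-pin step described above can be sketched without React. `autoGrow` and `GrowableTextarea` are hypothetical names; in the component, a call like this would presumably sit inside the `useLayoutEffect` body.

```typescript
// Structural type so the sketch runs without a DOM; a real
// HTMLTextAreaElement also has these two members.
interface GrowableTextarea {
  style: { height: string };
  readonly scrollHeight: number;
}

function autoGrow(el: GrowableTextarea): void {
  // Reset first: with a stale pixel height, scrollHeight never reports
  // less than that height, so the box could grow but not shrink.
  el.style.height = "auto";
  // Then pin the height to the measured content.
  el.style.height = `${el.scrollHeight}px`;
}
```

The order of the two assignments is the point of the technique: skipping the reset keeps the box at its tallest past height.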

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): make output-type tabs read as folder-tabs attached to chat box

The previous tab bar used a small icon + label per tab and signaled
the active tab with a 2px accent underline sitting on a horizontal
divider line. That divider visually broke the relationship between
the tabs and the input card below — the active tab looked like a
free-floating header, not a flap of the chat surface, so users had
to do extra work to mentally connect "I picked Prototype" with the
prompt area immediately underneath.

This switches the tab bar to a folder-tab pattern:

* `HomeHero` drops the per-tab `Icon` element. The seven labels
  (Prototype, Live artifact, Slide deck, Image, Video, HyperFrames,
  Audio) already disambiguate at the type sizes used here, and the
  icons were primarily decorative.
* `home-hero.css` rewrites the tab styling:
  - The bar no longer paints its own baseline border; the input
    card's top edge serves as the baseline.
  - Each tab is a rounded-top container with a 1px transparent
    border and a `-1px` bottom margin, so its bottom edge overlaps
    the card's top border by exactly one pixel.
  - The active tab borrows the card's panel background and border
    color, and overrides its bottom border with the panel color
    so it visually erases the card's top border for the tab's
    width. The result reads as one continuous "Prototype is this
    chat box" surface.
  - The card's prior `margin-top: 8px` is removed so the tab bar
    bottom and card top sit at the same y coordinate, which is the
    geometric precondition for the 1px overlap to land cleanly.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): enlarge home chat attach and submit buttons for legibility

The attach (paper-clip) and submit (arrow-up) controls on the home
chat input rendered at 32px diameter with 14px and 18px glyphs
respectively. After stroke antialiasing the icons read at roughly
11–12px on typical displays — small enough that users reported the
glyphs were illegible and had to be discovered by trial-and-error.

This bumps both controls to 38px circles and grows their glyphs
(attach 14 → 18, submit 18 → 22). The primary call-to-action now
has clearly more visual weight than the surrounding muted hint
text, and the paper-clip is recognisable at a glance.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): paint a chevron on inline select slots in the home prompt

Inline select slots in the home prompt overlay (e.g. the
`high-fidelity` / `wireframe` fidelity picker on the Live artifact
template) rendered as plain pill highlights, visually identical to
free-text and read-only text slots. The slot's `appearance: none`
strips the browser's default chevron, so users had no affordance
hinting the value was switchable.

This wraps select-type slots in an `inline-flex` span and overlays
an explicit chevron-down `Icon` against the slot's trailing padding
(bumped from 18px to 22px to make room). The chevron inherits the
accent color via the wrap's `color` property so it themes correctly
in light and dark modes. Click semantics are unchanged because the
chevron is `pointer-events: none`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): size inline number slots to their value, not the whole row

Number-type inline slots in the home prompt (e.g. the slide-count
spinner on the Slide deck template) inherited the generic slot
`min-width: 8ch` and then stretched to the browser's default
`<input type=number>` width, so a two-digit value like "10" ate
the entire remaining line and pushed the native spinner buttons
to the far right edge — far away from the value they control.

This sizes the number input to its actual content plus four
characters of trailing room for the spinner buttons (clamped to
6–14ch), and adds a `home-hero__prompt-slot-input--number`
modifier that overrides the slot `min-width` to 4ch. The
spinner now sits flush against the value and the slot reads as
a compact inline pill matching the surrounding text/select
slots.
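
The sizing rule (content length plus four characters, clamped to 6–14ch) works out as below. `numberSlotWidthCh` is a hypothetical name, and counting characters of the string value is an assumption about how the content is measured.

```typescript
// Hypothetical helper: width in `ch` units for a number slot.
function numberSlotWidthCh(value: string): number {
  // Digits plus four characters of trailing room for the spinner buttons.
  const wanted = value.length + 4;
  // Clamp to the 6-14ch range given in the commit message.
  return Math.min(14, Math.max(6, wanted));
}

const widthForTen = numberSlotWidthCh("10"); // 2 + 4 = 6, already at the lower bound
```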

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): widen prompt line-height so slot pills do not collide vertically

Each interactive slot and mention in the home prompt paints a 2px
outline ring via `box-shadow`. At the previous `line-height: 1.55`
on a 15px font, two lines were ~23px apart while a single pill
occupied ~20px of vertical space (text box + ring on both sides),
leaving roughly 2px of clear space between rows. When the prompt
wrapped onto multiple lines — common for the Image template's
"Generate a {kind} of {subject}. Style: {style}. Aspect: {ratio}."
example — the rings from line N and line N+1 visually merged into
a single bar, making it ambiguous which pill the user was about to
click or edit.

This bumps the line-height of both the highlight overlay and the
underlying editable textarea to 1.85 (~28px per line), restoring
~8px of clear space between pill rows. The two values must stay
identical so the overlay glyphs continue to track the textarea
caret positions exactly; a brief comment in each rule documents
this coupling for future edits.
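
The line-pitch figures quoted above follow from font size times line-height. A quick check, with `linePitchPx` as a hypothetical helper name:

```typescript
// Distance between consecutive baselines, in pixels.
function linePitchPx(fontSizePx: number, lineHeight: number): number {
  return fontSizePx * lineHeight;
}

const pitchBefore = linePitchPx(15, 1.55); // about 23.25px, the "~23px" in the text
const pitchAfter = linePitchPx(15, 1.85); // about 27.75px, the "~28px" in the text
```

Subtracting the commit's rough 20px pill height from the new pitch gives close to the 8px of clear space it cites.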

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): paint dropdown chevron on select element with neutral color

The previous chevron treatment wrapped each select slot in an
inline-flex span and laid an accent-orange `<Icon>` over the
trailing padding. At the prompt's normal display size the orange
chevron blended into the orange pill ring corners, so users
perceived "a bunch of dropdown arrows" across every highlighted
slot, even though only the actual select rendered one.

Move the chevron onto the `<select>` itself as a small neutral-
gray `background-image` SVG. The grey contrasts clearly with the
pill's accent ring and the chevron lives inside the select's own
padding, so it can never visually overlap the value text and can
never appear on non-select slots.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): shrink select-slot dropdown caret so it stops reading as a check-mark

The 8×6 stroked chevron looked like a check-mark in zoomed views
because the stroke width was a large fraction of the glyph. Swap
for a small (9×5) filled triangle drawn as a `background-image`,
with `background-size` pinned so browsers can't scale it to its
intrinsic SVG box. The caret is now unambiguously a dropdown
indicator without crowding the value text.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): make read-only prompt slots visually distinct from editable pills

Multi-word context values are rendered as plain <span>s with
pointer-events: none, but they inherited the same orange pill +
ring treatment as the truly editable <input>/<select> slots.
Users couldn't tell which pills they could click into.

Strip the pill background, the ring, the radius, and the padding
from `.home-hero__prompt-slot-text` and replace them with a
subtle dashed bottom border. The orange foreground still ties the
text to the slot family, but read-only highlights now clearly read
as "context value spliced into the prompt" rather than as
"interactive control".

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): scale select-slot dropdown caret up to 12px wide for legibility

The 9×5 caret was too quiet at the prompt's font size to read as
a dropdown affordance. Bump to a 12×6 filled triangle and pad the
select's trailing space (24px) so the value text still has clear
breathing room from the caret.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): collapse plugins search and filter strips into one bar

The plugins-home gallery laid out search, count, mode (Featured),
total-in-catalog, clear-filters, the main category row, and the
sub-category row as four floating clusters. Search lived up in
the section header, far from the chips it actually scopes; the
mode strip wedged "Featured + 386 in catalog + Clear filters"
between the header and the category row with no obvious
ownership; and the two clear-filter affordances duplicated each
other (chip strip + empty-state).

Fold the mode strip into the category row: Featured is now the
leading chip on the same line as the category pills, and the
search field, the result count, and a compact Clear link sit as
a right-aligned tools cluster on that same row. The header
shrinks back to just the title, subtitle, and Browse registry
link. The sub-category row stays as the contextual second line
when a category is active.

Tests: keeps the existing data-testids (plugins-home-chip-featured,
plugins-home-row-category, plugins-home-clear, plugins-home-count,
plugins-home-search) so the existing 11-case section spec passes
without modification.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): hide Recent projects rail on first-run home when empty

A brand-new user landing on Home saw an empty dashed box with the
copy "No projects yet — type a prompt to start one." sitting
above the plugin gallery. The hero already prompts the user to
type, so the empty rail just adds vertical noise without telling
them anything new.

Return null from RecentProjectsStrip when the recent list is
empty (loading or not) so the rail appears only when there is
actually something to show. Update the entry-chrome e2e to
assert toHaveCount(0) on first run — codifying the new contract
that the rail is conditional, not always-mounted.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): hide structured inputs form when every field is in the prompt

The PluginInputsForm below the chat textarea surfaced every plugin
input — even the ones the prompt template already substitutes
inline as highlighted slot pills. For a plugin like Prototype
that's five identical labelled inputs duplicating the five slot
pills above them, making the chat box look like it has grown a
second composer.

Compute the set of placeholder keys actually referenced in the
prompt template (via INPUT_PLACEHOLDER_PATTERN) and skip those
when deciding what the structured form needs to render. The form
still appears for plugins that expose inputs not referenced in
the prompt text (e.g. a "Run in background" toggle), but
template-only plugins collapse the redundant second editor.

Update the picker spec accordingly: when every field is in the
template the form is now expected to be absent rather than to
render a duplicate of the inline slot.
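
The filtering described above can be sketched roughly as follows. This is
an illustration only: the commit names INPUT_PLACEHOLDER_PATTERN but does
not show it, so the `{key}` syntax, the `PluginInput` shape, and both
helper names below are assumptions, not the project's actual code.

```typescript
// Hypothetical shape of a plugin input; the real type is not shown in the commit.
interface PluginInput {
  key: string;
  label: string;
}

// Assumed placeholder syntax: {key}. The project's INPUT_PLACEHOLDER_PATTERN may differ.
const PLACEHOLDER_SKETCH = /\{([A-Za-z0-9_]+)\}/;

// Collect every placeholder key referenced in the prompt template.
function referencedKeys(template: string): Set<string> {
  const keys = new Set<string>();
  // Build a fresh global regex per call so lastIndex state never leaks between calls.
  const pattern = new RegExp(PLACEHOLDER_SKETCH.source, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    keys.add(match[1]);
  }
  return keys;
}

// Inputs the structured form still has to render: those not already inline in the prompt.
function inputsForForm(template: string, inputs: PluginInput[]): PluginInput[] {
  const inline = referencedKeys(template);
  return inputs.filter((input) => !inline.has(input.key));
}
```

Under these assumptions, a caller would hide the form whenever
`inputsForForm(...)` returns an empty array, which corresponds to the
template-only case described in the commit.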

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop redundant filtered-count and Clear link beside the search

The combined plugins filter bar trailed the search field with
"59 / 386" and an inline "Clear" link, but both repeat what the
chip strip already shows: every chip carries its own count
(Slides 59, All 386, …) and clicking the All chip already resets
the filters. Users had no way to know what the bare "59 / 386"
fraction or "Clear" referred to without inference.

Strip the trailing tools cluster down to just the search input.
The empty-results message at the bottom of the gallery still
exposes a contextual "Clear filters" button when a stacked
filter yields no matches, so the affordance isn't lost — just
removed from a position where it didn't read as actionable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop default subtitle copy from the Official starters strip

The Home gallery rendered a two-line explanatory subtitle under
"Official starters" — "Ready-to-use Open Design workflows bundled
with this runtime. Pick one to load a starter prompt, or browse
the registry for more." — every visit. Returning users skip it,
and new users get the same message from the section title, the
Browse registry link, and the visual card grid itself; the prose
read as filler chrome above the chip strip.

Default the subtitle prop to undefined and only render the <p>
when a caller passes an explicit string. Other surfaces that
mount PluginsHomeSection with their own copy keep their
subtitle; the bare Home gallery loses one row of vertical noise.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fade out empty workflow lanes in Plugins home filter

Empty top-level lanes (Deploy 0, Refine 0, etc.) and empty
sub-categories (Vercel 0 under Deploy, Figma 0 under Import, etc.)
used to look identical to populated ones, so users couldn't tell at
a glance which chips were real catalog buckets and which were
"contribute a plugin" invites. The strip kept all lanes visible by
design — the workflow shape (Import / Create / Export / Share /
Deploy / Refine / Extend) is part of what we want users to see —
so we keep them clickable, but tag count-zero chips with
data-empty='true' and give them a faded, dashed-border treatment.
"All" pills stay solid since their count reflects the parent lane,
not their own emptiness.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): update home nav test expectations

* fix(e2e): align critical smoke with entry chrome

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 14:58:15 +08:00
PerishFire
bd48c597b0
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions

* ci: enforce pinned dependency specs

* ci: fix pnpm executable invocation
2026-05-19 13:58:27 +08:00
PerishFire
99b42726b8
Simplify CI PR gate (#2183)
2026-05-19 13:18:41 +08:00
shangxinyu1
f650a043d9
test(e2e): align entry coverage with redesigned flows (#2101)
* Migrate entry E2E coverage and split CI

* test(e2e): relax connectors auth error assertions

* ci: route scenario registry changes to extended e2e

* ci: decouple packaged smoke jobs from validate gate

* ci: restore pre-split workflow

* test(e2e): slim critical ui smoke coverage

* test(e2e): move broader entry flows out of critical

* test(e2e): restore entry chrome coverage to ci

* ci: parallelize workspace validation jobs

* test(web): stabilize media palette bridge assertion

* ci: cache e2e playwright browsers
2026-05-19 11:26:40 +08:00
PerishFire
4424f08be0
[codex] Add packaged desktop auto-update (#1375)
* Add packaged desktop auto-update

* Handle counted beta nightly update versions

* Refresh desktop auto-update branch for main

* Serialize desktop updater operations

* Refresh auto-update branch for packaged paths
2026-05-19 11:20:05 +08:00
Abhisek Das
3f56395e2d
fix(web): restore Settings subtab-pill hover contrast in dark theme (#1815)
The .subtab-pill button hover rule hard-coded `rgba(255,255,255,0.6)` as the
hover background. In light theme this composites to a near-white wash that
sits cleanly under `--text`, but in dark theme it overlays a translucent
white on the dark panel and produces a bright, near-white surface beneath
the same light `--text`, dropping contrast to ~1.87:1 (well below the
WCAG AA minimum of 4.5:1).

The reporter (#1795, on 0.6.0) called this out on five Settings surfaces.
Four of them — BYOK / Appearance / Notifications / protocol-chip — were
incidentally fixed by the c5d77a03 dark-palette refresh because they use
the .seg-control / .seg-btn hover path, not the .subtab-pill path. The
Pet → pet source tabs (Built-in / Custom / Community) still reproduce on
0.7.0 because they use .subtab-pill, which never got a theme-aware hover.

Switch the hover background to `var(--bg-muted)`, an existing token that
mirrors `--bg-panel`'s active state shape (one step lighter than
`--bg-subtle`) and is defined per-theme. Composited contrast becomes
~9.4 in dark and ~14.3 in light against `--text`, comfortably above AA.

The same .subtab-pill rule is shared by RoutinesSection and DesignsTab, so
they pick up the fix as a side effect of fixing the source class.

Anchored by a Playwright regression test under e2e/ui/ that asserts WCAG
AA across both themes for the Pet source tabs and the seg-btn surfaces.
The test goes red on this commit's parent (Pets/dark, ratio 1.87) and
green here (all four assertions pass).
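
For readers who want to reproduce the ratios, here is a minimal sketch of
the WCAG 2.x contrast computation, including the source-over compositing
step that a translucent hover background requires. The function names are
illustrative and are not taken from the regression test; the sample colors
in the usage note are placeholders, not the theme's actual token values.

```typescript
type Rgb = [number, number, number];

// Linearize one 8-bit sRGB channel per the WCAG 2.x relative-luminance definition.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an opaque sRGB color, in the range 0..1.
function relativeLuminance(color: Rgb): number {
  const [r, g, b] = color;
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Composite a translucent foreground over an opaque background (source-over).
function composite(fg: Rgb, alpha: number, bg: Rgb): Rgb {
  return [
    fg[0] * alpha + bg[0] * (1 - alpha),
    fg[1] * alpha + bg[1] * (1 - alpha),
    fg[2] * alpha + bg[2] * (1 - alpha),
  ];
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}
```

To check a hover state, one would first composite the wash over the panel,
for example `composite([255, 255, 255], 0.6, panelColor)`, and then pass the
result and the text color to `contrastRatio`, comparing against 4.5 for AA.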
2026-05-15 23:08:00 +08:00
lefarcen
91195bc4ac Merge origin/preview/v0.8.0 (PR #1830) into sync branch
Pulls in the teammate's PR #1830 — e2e align to redesigned entry flows
+ watcher and codex runtime test fixes. Conflicts resolved:

- apps/daemon/tests/runtimes/registry-and-args.test.ts: imported both
  withPlatform (HEAD, used 6x) and withEnvSnapshot (theirs, used 1x);
  both helpers are needed in the file.
- apps/web/src/components/ProjectView.tsx: kept both ChatPane-remount
  guards. Main's fix #1710 uses lastSyncedConversationIdRef tracking URL
  sync; the teammate added lastSeenRouteConversationIdRef tracking last
  observed route. Both refs are already defined in the file and the two
  checks are complementary defenses.
- e2e/ui/settings-*.test.ts (5 files): took teammate's version wholesale
  — they replaced inline `page.goto('/') + click Execution mode` with
  `gotoEntryHome + openSettingsDialogFromEntry` helpers, which is exactly
  the PR's intent.
2026-05-15 19:10:39 +08:00
shangxinyu1
2181eb376d
test(e2e): align UI suites with redesigned entry flows (#1830)
* test(e2e): align critical ui tests with new entry shell

* test(e2e): align ui suites with redesigned entry flows

* fix(e2e): correct entry chrome page helper type

* fix(daemon): stabilize watcher and codex runtime tests

* fix(daemon): satisfy strict watcher option typing
2026-05-15 19:05:39 +08:00
lefarcen
22a3b99a47 Merge origin/main into preview/v0.8.0
Sync 49 commits from main. Conflicts resolved:
- .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's
  linux specs + release-stable.yml + release-preview.yml triggers
- .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder
- apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops
  summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText
- apps/web/src/components/ChatPane.tsx: kept both new imports
- apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks
- e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's
  inline dialog navigation (UI was redesigned in v0.8.0)
- nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh
2026-05-15 18:23:33 +08:00
Olin Hendershot
74637f1cb5
Add Linux packaged client parity smoke coverage (#1204)
* docs: plan linux client issue 709

* fix: complete linux headless lifecycle routing

* feat: add linux packaged inspect

* test: add linux headless packaged smoke

* ci: add linux headless packaged smoke

* ci: smoke linux AppImage release artifacts

* docs: document linux packaged client status

* chore: finalize linux client audit remediation

* docs: add linux client publication packet

* test: harden linux client smoke coverage

* ci: preserve linux smoke audit evidence

* refactor: consolidate linux e2e helpers

Move pathExists and the desktop/web/daemon app-key array out of
linux.spec.ts into linux-helpers.ts, where expectPathInside and
linuxUserHome already live. Keeps the spec file focused on tests and
the helpers file as the canonical home for shared Linux e2e utilities.

* fix: move linux e2e helpers to lib

* fix: address linux release review blockers

* fix: drop npm dependency from containerized linux build

writeAssembledApp() previously called runNpmInstall() which executed
`npm install` directly. Inside the containerized build path,
electronuserland/builder:base strips npm/npx/corepack, so the inner
tools-pack build would fail at the assembled-app install step.

Route the install through OD_TOOLS_PACK_PNPM_BIN: buildDockerArgs sets
the env to the standalone pnpm binary it bootstraps, and the new
resolveProductionInstallCommand helper consumes that env to run
`<bin> install --prod --no-lockfile --config.node-linker=hoisted`.
Host invocations with no env set keep the prior npm behavior.
--config.node-linker=hoisted preserves the flat node_modules layout
that electron-builder packs the same way as npm-installed trees.

New tests cover the resolver branches and assert the docker-arg-to-
resolver chain end-to-end so reviewers can see the container's inner
build receives the env that switches its install away from npm.
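
A rough sketch of the resolver branch described above. The signature, the
`InstallCommand` shape, and the sketch's function name are assumptions;
only the env variable name, the pnpm flags, and the npm fallback come from
the commit message. The exact arguments of the prior npm invocation are
not shown in the commit, so a bare `install` is assumed here.

```typescript
// Hypothetical result shape; the real helper's return type is not shown.
interface InstallCommand {
  command: string;
  args: string[];
}

// The environment is passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

function resolveProductionInstallCommandSketch(env: Env): InstallCommand {
  const pnpmBin = env.OD_TOOLS_PACK_PNPM_BIN;
  if (pnpmBin !== undefined && pnpmBin.trim() !== "") {
    // Container path: use the standalone pnpm binary bootstrapped by the docker args.
    return {
      command: pnpmBin,
      args: ["install", "--prod", "--no-lockfile", "--config.node-linker=hoisted"],
    };
  }
  // Host path: keep the prior npm behavior (exact npm arguments are an assumption).
  return { command: "npm", args: ["install"] };
}
```

Taking the environment as a parameter, rather than reading it globally, is
one way to cover both branches in unit tests without mutating process state.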

* fix: harden linux container bootstrap

* fix: validate desktop marker liveness in headless cleanup

cleanup --headless previously skipped whenever desktop-root.json was parseable, which blocked recovery when the AppImage had crashed and left a stale marker. Validate the marker the same way stopPackedLinuxApp does: if the PID is not in the live snapshot list, proceed through cleanup instead of skipping.

Extract the validation into validateDesktopAppImageMarker so the stop and cleanup paths share one definition of live and owned. Tests cover both branches: a stale marker drives cleanup to remove the runtime/output roots, while a live marker drives cleanup to skip and preserve them.
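
The liveness rule can be sketched as below. This is a simplified model,
not the real validateDesktopAppImageMarker: the `pid` field in
desktop-root.json, the helper names, and the use of a plain PID set as the
"live snapshot list" are assumptions, and the ownership half of the "live
and owned" check is omitted because the commit does not describe it.

```typescript
// Hypothetical marker shape; the real desktop-root.json schema is not shown.
interface DesktopMarkerSketch {
  pid: number;
}

// Parse the marker file contents, returning null for anything unusable.
function parseMarker(raw: string): DesktopMarkerSketch | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value === "object" && value !== null) {
      const pid = (value as { pid?: unknown }).pid;
      if (typeof pid === "number" && Number.isInteger(pid) && pid > 0) {
        return { pid };
      }
    }
  } catch {
    // Unparseable JSON is treated the same as a missing marker.
  }
  return null;
}

// Skip cleanup only when the marker is valid AND its PID is still running.
// A parseable but stale marker no longer blocks cleanup.
function shouldSkipHeadlessCleanup(raw: string, livePids: ReadonlySet<number>): boolean {
  const marker = parseMarker(raw);
  return marker !== null && livePids.has(marker.pid);
}
```

With this shape, the stale-marker case returns `false`, so cleanup proceeds
and removes the runtime and output roots, while the live case returns `true`
and preserves them, mirroring the two test branches the commit mentions.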
2026-05-15 16:38:29 +08:00
Tom Huang
c5d77a03bd
Garnet hemisphere (#1769)
* feat(chat-composer): enhance mention handling and input overlay

- Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type.
- Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction.
- Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management.
- Refactored related components to ensure consistent handling of project files and mentions across the application.

This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins.

* feat(plugin-management): enhance plugin action panels and UI components

- Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins.
- Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin.
- Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience.
- Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management.

This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins.

* fix(assistant-message): refine plugin folder candidate selection logic

- Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content.
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates.
- Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection.
- Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text.

These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates.

* feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management

- Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute.
- Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins.
- Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders.
- Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components.

This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience.

* Fix PR 1702 CI blockers

* Fix PR 1702 remaining CI checks

* Prebuild AGUI adapter after install

* Restore plugin project snapshot wiring

* feat(marketplace): refactor marketplace URL handling and enhance fetching logic

- Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations.
- Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data.
- Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management.

This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities.

* Fix project auto-send cleanup spec

* Reconcile run messages on cancel

* Use active design system as visual direction

* Fix active design system prompt wording

* feat(workspace-tabs): implement workspace tabs functionality and file attachment handling

- Introduced a new `WorkspaceTabsBar` component to manage workspace tabs, allowing users to navigate between different views (projects, marketplace, etc.).
- Enhanced file handling capabilities in the `HomeHero` and `EntryShell` components, enabling users to stage and attach files before project creation.
- Updated the `App` component to support auto-sending attachments alongside the first message in a project.
- Improved CSS styles for workspace tabs and attachment UI, ensuring a cohesive design and user experience.

This update significantly enhances the workspace navigation and file management features, providing users with a more intuitive and efficient workflow.

* refactor(workspace-tabs): streamline workspace tabs and UI components

- Removed unused components and actions from the `WorkspaceTabsBar` and `AppChromeHeader`, simplifying the codebase.
- Updated CSS styles for the workspace shell and tabs, enhancing visual consistency and reducing element sizes for a cleaner layout.
- Introduced a new client type detection mechanism to dynamically adjust the workspace shell's class, improving responsiveness.
- Added tests for the `WorkspaceTabsBar` to ensure proper navigation and tab management functionality.

These changes improve the overall performance and user experience of the workspace navigation system.

* Update critical e2e for entry modal flow

* Stabilize entry critical e2e flows

* fix(ui): adjust workspace tabs and header styles for improved layout

- Updated the CSS for workspace tabs and the app header, reducing element sizes and padding for a cleaner appearance.
- Introduced a new button in the `WorkspaceTabsBar` for quick access to the home tab, enhancing navigation.
- Minor adjustments to the layout and styles to ensure consistency across components.

These changes enhance the user interface and improve the overall user experience in the workspace navigation system.

* feat(workspace-tabs): implement pinned home tab functionality

- Added a new pinned home tab feature to the `WorkspaceTabsBar`, allowing the home tab to remain accessible during navigation.
- Updated tab management logic to collapse duplicate home tabs into a single pinned instance when restoring from local storage.
- Enhanced CSS styles for workspace tabs to accommodate the new pinned tab design.
- Updated tests to verify the behavior of the pinned home tab and its interaction with other tabs.

These changes improve navigation consistency and user experience within the workspace.

* refactor(workspace-tabs): enhance tab management and styling

- Updated CSS styles for workspace tabs, adjusting padding and flex properties for improved layout and consistency.
- Refactored tab creation logic to ensure unique IDs for project and marketplace tabs, enhancing navigation clarity.
- Removed deprecated functions related to pinned home tabs, streamlining the codebase.
- Improved test cases to verify independent behavior of home tabs during navigation.

These changes enhance the user experience by providing a more intuitive tab management system and a cleaner UI.

* style(workspace-tabs): update CSS for improved layout and visibility

- Adjusted CSS properties for workspace tabs, including overflow, position, and z-index to enhance layout and stacking context.
- Ensured consistent styling across tab components for better visual hierarchy.

These changes contribute to a more polished and user-friendly interface within the workspace.

* style(entry-layout): update CSS variables for improved layout consistency

- Replaced fixed width values with CSS variables for the entry rail to enhance flexibility.
- Adjusted padding and height properties for better visual alignment and spacing.
- Introduced a new background style for the entry main topbar to improve aesthetics.

These changes contribute to a more responsive and visually appealing layout in the entry view.

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
Co-authored-by: Eli <129168833+qiongyu1999@users.noreply.github.com>
2026-05-15 14:42:11 +08:00
chaoxiaoche
bcc58af931
refactor(web): rename Execution mode and tighten settings dialog UI (#1568)
* refactor(web): rename Execution mode and tighten settings dialog UI

- Rename "Settings → Execution & model" to "Settings → Execution mode"
  across the web UI, i18n keys, docs, and e2e selectors.
- Redesign SettingsDialog: kicker + title row in the modal head, a
  flatMap-driven agent grid that renders the inline test-result row
  beside the selected card, compact unavailable cards with right-aligned
  install/docs links, and an install guide that only shows when the
  user has no working agent picked.
- Trim verbose subtitle / hint copy across chat model, CLI proxy,
  media providers, custom instructions, and memory sections.
- Add an `info` Icon variant for the redesigned settings hints.
- Update e2e selectors and docs that referenced the old menu label.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(web): polish Settings dialog — media providers, skills, MCP

Media providers
- Hide internal Stub fixture provider (settingsVisible: false)
- Split provider list into Available (integrated, editable) and Coming
  Soon (collapsed <details> drawer with name/hint/Docs link only)
- Drop right-side Integrated/Configured badges from every row; all rows
  in the main list are integrated by definition; inline grey "Saved"
  chip next to the provider name is the only status indicator now
- "Saved" badge moves inline to the right of the provider name and uses
  a neutral grey treatment (was a standalone green pill below the name)
- "Reload from daemon" button shows a 2s green "✓ Reloaded" flash on
  success instead of leaving a permanent paragraph under the header;
  errors remain sticky

Skills
- Replace three pill-row filter banks (Source, Type, Category) with a
  compact single-row toolbar: search + three inline <select> dropdowns
  side by side; active filter highlighted with a stronger border

MCP server
- Shorten section hint to one line
- Move WHAT YOUR AGENT CAN DO capabilities above the client dropdown
  (motivate before asking to act)
- Move "Build the daemon first" warning below the code block where it
  contextually explains why the command might fail, not as a top-level
  error before the user has done anything
- Downgrade "Restart your client" left-border from accent orange to
  border-strong grey — it is a next step, not a warning

External MCP
- Shorten section hint to one line

Misc CSS
- Add .sr-only utility for accessible off-screen live regions
- Add button.ghost.is-success-flash for transient success feedback
- Add .library-filter-selects / .library-filter-select for dropdown
  filter rows
- Add .media-provider-coming-soon-* for the roadmap drawer

Co-authored-by: Cursor <cursoragent@cursor.com>

* [codex] Add Cursor Agent auth diagnostics (#1538)

* Add Cursor Agent auth diagnostics

* Handle Cursor not logged in auth status

* Address Cursor auth review feedback

* Classify Cursor stdout auth failures

* test: expand Memory and Routines coverage (#1521)

* test: expand settings and packaged coverage

* test: extend memory settings coverage

* test: cover routine settings failure states

* test: cover routine operation failures

* test: fix daemon test typing on CI

* test: decouple packaged smoke from orbit bug

* test: avoid live memory LLM calls in route tests

* test: fix daemon fetch typing in CI

* fix: restore preview comment and inspect toggles

* test: align manual edit flow with current inspector UX

* test: align comment attachment flow with current preview comments UI

* fix: probe resolved Codex launch path during detection

* fix: remove duplicate board activation helper after rebase

* test: update ghost cli detection mock

* test: align FileViewer toolbar expectation

* ci: move full app tests to extended lane

* ci: run app tests by changed scope

* ci: cover shared app inputs in test scopes

* ci: avoid setup-node cache in windows packaged smoke

* test: align extended settings and manual edit flows

* refactor(web): settings dialog UX polish — layout, dedup, and interactions

- Remove duplicate section headers from all settings sections
  (Notifications, Appearance, Privacy, About, Design Systems, Skills,
  MCP server, Connectors, Media providers, Routines)
- Restructure Notifications cards: title + toggle on same row, hint below
- Restructure Skills toolbar: search + New skill button in row 1,
  filter dropdowns in row 2 with left-aligned labels
- Restructure Pet section: tabs and Wake button on same row
- MCP server: group capabilities and setup into separate cards,
  remove nested double border on client picker
- Connectors: show connect errors as toast instead of inline card text,
  position toast inside panel, hide single-provider tab
- Media providers: move Reload button to left-aligned small ghost button
- Memory: info icon shows path on hover, Path copied badge inline;
  Extraction history and MEMORY.md as standalone collapsible cards;
  group header hidden when only one type visible
- Pet grid cards: Adopt button hidden until hover, icon-only when adopted,
  description truncated to 2 lines, text fills full width via absolute positioning
- Agent cards: selected state uses accent border only, no background change
- Add sun/moon icons to Appearance theme buttons (Light/Dark)
- Shorten several hint strings for clarity

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): resolve i18n review comments from PR #1568

- Update settings.title and settings.envConfigure to localized
  "Execution mode" in all 17 non-English locale files
- Add settings.memoryFlashPathCopied to all locales and use t()
  in MemorySection instead of hardcoded English "Path copied"
- Add settings.agentModelHead to all locales and use t() in
  SettingsDialog for "Model for:" agent model row header

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): update tests to match settings dialog redesign

- Add role prop to Toast (alert/status) so error toasts from
  ConnectorsBrowser are announced immediately by screen readers
- Clear connectErrorToast on successful connector retry
- Update SettingsDialog.execution tests:
  - Remove heading assertions for About and MCP server (headers
    were intentionally removed as duplicate nav labels)
  - Rewrite CLI env test to use codex-only fields (per-agent
    filtering means only selected agent's fields are shown)
  - Update Composio key hint text assertion to match shortened copy
  - Replace filter button click with select change for Type filter
  - Replace Configured/Unsupported/Integrated badge checks with
    updated assertions matching the new media provider UI
  - Replace disabled BFL row test with coming-soon section check
- Update SettingsDialog.media test: remove Fal.ai input assertions
  (non-integrated providers no longer have editable fields)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): unblock CI for #1568

Three small fixes to get Playwright back to green on the settings
dialog redesign:

1. `en.ts`: revert `settings.envConfigure` to "Configure execution mode".
   This PR collapsed both `settings.title` (header gear) and
   `settings.envConfigure` (entry-side foot pill) to the same string
   "Execution mode", so `getByRole('button', { name: 'Execution mode' })`
   resolved to two elements and tripped Playwright strict mode in the
   three Composio-flow tests (entry-configuration-flows.test.ts:174,
   228, 285). Restoring the distinct label also gives screen readers
   a clearer hint for the pill, which doubles as a status display.
   Non-English locales still alias the two keys; happy to follow up
   on those, but they don't gate the (English-only) Playwright suite.

2. entry-configuration-flows.test.ts:167 — `Connectors` heading is now
   rendered at `<h2>` in the modal-head (SettingsDialog.tsx:1545), with
   the inner `<h3>` removed by design (see comment around line 1448).
   Updated the assertion from `level: 3` to `level: 2`.

3. project-management-flows.test.ts:360 — same change for the `Pets`
   heading.

Verified locally with `pnpm --filter @open-design/web typecheck` and
`pnpm --filter @open-design/e2e typecheck`. The actual Playwright
specs need the dev server up; I didn't rerun them here, but the
locator changes are mechanical and match the new DOM.

* fix(web): use exact match for Execution mode button locator

Playwright's `getByRole(role, { name })` defaults to case-insensitive
substring matching, so
`{ name: 'Execution mode' }` still resolved to both the header gear
(aria-label "Execution mode") and the entry-side foot pill (aria-label
"Configure execution mode" — substring contains "Execution mode").
Strict mode tripped in the three composio-flow tests at lines 202,
257, and 319.

Adding `exact: true` makes each call resolve to just the header gear,
which opens the same dialog the foot pill does — the test outcomes
are unchanged.
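
The difference between the two matching modes can be sketched roughly as below. This is a simplified model for illustration, not Playwright's implementation; `matchesName` is a hypothetical helper, and only the two aria-labels come from the message above.

```typescript
// Simplified model of accessible-name matching; NOT Playwright's real code.
// `matchesName` is a hypothetical helper used only for illustration.
function matchesName(label: string, wanted: string, exact: boolean): boolean {
  if (exact) return label === wanted; // exact: whole string, case-sensitive
  // default mode: case-insensitive substring match
  return label.toLowerCase().includes(wanted.toLowerCase());
}

// The two aria-labels named in the commit message.
const ariaLabels = ["Execution mode", "Configure execution mode"];

const loose = ariaLabels.filter((l) => matchesName(l, "Execution mode", false));
const strict = ariaLabels.filter((l) => matchesName(l, "Execution mode", true));

console.log(loose.length);  // 2: a strict-mode locator would reject this
console.log(strict.length); // 1: only the header gear remains
```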

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>
Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 14:35:06 +08:00
PerishCode
4f15c33595 Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0 2026-05-14 21:10:03 +08:00
PerishCode
43b1b94c8e Add preview release channel 2026-05-14 19:15:16 +08:00
lefarcen
3b7f87c7ae Merge remote-tracking branch 'origin/main' into reconcile/garnet-main-merge 2026-05-14 17:44:26 +08:00
PerishCode
cba8bf151d chore: align namespace lifecycle packaging 2026-05-14 16:35:46 +08:00
lefarcen
b268bbe169 Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix
Brings in 11 new garnet commits, most importantly:
- 1a90aef4 feat(plugin-use): implement plugin use handoff functionality —
  fixes the bug QA reported where /plugins Use Plugin would 422 silently
  for template plugins; new flow hands off to HomeView with the plugin
  pre-bound + input form prompted there.
- 2ac58544 feat(plugin-inputs): enhance plugin input handling with file
  upload support — extends PluginInputsForm for file uploads.
- 3b167b69 feat(plugins): registry protocol — new @open-design/registry-protocol
  workspace package (needs build before daemon boot).
- Plus enhancements to plugin metadata, GitHub installer, plugin detail
  view, login/whoami, static HTML preview paths.

Conflicts resolved:
- packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief
  field + garnet's contextPlugins (@-mention plugin context refs) both
  kept on ProjectMetadata.
- apps/landing-page/* (3 files): accepted HEAD — garnet had the older
  single-page landing-page header; main has the multi-page layout
  (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not
  related to the Use Plugin core fix.

New @open-design/registry-protocol package must be built before daemon
boots; pnpm install does this via postinstall already.
2026-05-14 16:32:35 +08:00
Nagendhra Madishetti
40766ef1ba
test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.
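
A minimal sketch of the three guards described above (runId mismatch, sticky terminal phases, reboot on a new run). The types are deliberately reduced and hypothetical: the real CritiqueState and PanelEvent carry rounds, panelists and more event variants, and `reduceSketch` is not the shipped reducer.

```typescript
// Reduced, hypothetical shapes; the real contracts have more fields.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";
type SketchEvent = {
  type: "run_started" | "ship" | "failed" | "parser_warning";
  runId: string;
};
interface SketchState {
  phase: Phase;
  runId: string | null;
  warnings: number; // side channel for late parser warnings
}

const TERMINAL: ReadonlySet<Phase> = new Set<Phase>([
  "shipped", "degraded", "interrupted", "failed",
]);
const initialState: SketchState = { phase: "idle", runId: null, warnings: 0 };

function reduceSketch(state: SketchState, ev: SketchEvent): SketchState {
  // A run_started for a different runId discards the prior state.
  if (ev.type === "run_started" && ev.runId !== state.runId) {
    return { phase: "running", runId: ev.runId, warnings: 0 };
  }
  // Stray traffic for another run: return the SAME reference (no re-render).
  if (ev.runId !== state.runId) return state;
  // Late parser warnings are recorded without changing phase.
  if (ev.type === "parser_warning") {
    return { ...state, warnings: state.warnings + 1 };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.has(state.phase)) return state;
  if (ev.type === "ship") return { ...state, phase: "shipped" };
  if (ev.type === "failed") return { ...state, phase: "failed" };
  return state;
}
```

Returning the incoming `state` object untouched is what gives the stable-identity guarantee: `useReducer` bails out of a re-render when the next state is the same reference.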

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.
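
The reconnect schedule could look roughly like this. The message only states the 1s and 30s bounds, so the doubling factor is an assumption, and `reconnectDelayMs` / `BackoffCounter` are hypothetical names rather than the connection manager's real API.

```typescript
// Assumed schedule: start at 1s, double per failed attempt, cap at 30s.
// The doubling factor is an assumption; the source only gives "1s → 30s".
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

function reconnectDelayMs(attempt: number): number {
  // attempt is 0 for the first retry after the stream drops
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

class BackoffCounter {
  private attempt = 0;

  // Delay to wait before the next reconnect attempt.
  next(): number {
    const delay = reconnectDelayMs(this.attempt);
    this.attempt += 1;
    return delay;
  }

  // Called when the stream reports `ready`, so the next drop starts at 1s.
  reset(): void {
    this.attempt = 0;
  }
}
```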

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

The speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

The gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.
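
A reduced sketch of the spread-order fix. `decodeFrame` and `KNOWN_TYPES` are hypothetical stand-ins for `sseToPanelEvent` and the contracts-level predicate; only two event types are modelled here.

```typescript
// Hypothetical, reduced version of the decode step described above.
const PREFIX = "critique.";
const KNOWN_TYPES: ReadonlySet<string> = new Set(["run_started", "ship"]);

type Decoded = { type: string; runId: string } & Record<string, unknown>;

function decodeFrame(channel: string, rawData: string): Decoded | null {
  if (!channel.startsWith(PREFIX)) return null;

  let data: unknown;
  try {
    data = JSON.parse(rawData);
  } catch {
    return null; // malformed JSON is dropped, the stream keeps going
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;

  const payload = data as Record<string, unknown>;
  const type = channel.slice(PREFIX.length);
  const runId = payload["runId"];

  // Revalidate the header: known type plus a non-empty runId.
  if (!KNOWN_TYPES.has(type)) return null;
  if (typeof runId !== "string" || runId.length === 0) return null;

  // Payload first, channel-derived type LAST, so the channel always wins.
  return { ...payload, runId, type };
}
```

With the old order (`{ type, ...payload }`) a payload carrying `type: 'ship'` would have overwritten the channel-derived value; placing `type` last makes that impossible regardless of what the payload contains.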

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).
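
The cursor idea can be shown without React. In this sketch `ReplayCursor` is a hypothetical, framework-free stand-in for the ref-held cursor; the real hook also owns timers and status, which are omitted.

```typescript
// Framework-free sketch; in the hook the cursor would live in a React ref.
class ReplayCursor<E> {
  private position = 0;
  private readonly events: readonly E[];
  private readonly dispatch: (event: E) => void;

  constructor(events: readonly E[], dispatch: (event: E) => void) {
    this.events = events;
    this.dispatch = dispatch;
  }

  // Dispatch one event (one paced tick). Returns false once nothing remains.
  step(): boolean {
    if (this.position >= this.events.length) return false;
    const next = this.events[this.position] as E;
    this.position += 1;
    this.dispatch(next);
    return this.position < this.events.length;
  }

  // Drain from the CURRENT position (what flipping to `instant` does).
  flush(): void {
    while (this.position < this.events.length) this.step();
  }

  get done(): boolean {
    return this.position >= this.events.length;
  }
}

const seen: string[] = [];
const replay = new ReplayCursor(
  ["run_started", "panelist_open", "ship"],
  (e: string) => { seen.push(e); },
);

replay.step();  // one paced tick, then the caller pauses (no further calls)
replay.flush(); // resume as `instant`: continues after run_started
```

Because the position is kept across the pause, `run_started` is dispatched exactly once; restarting from zero on resume is the double-dispatch the fix avoids.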

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive zero-delay timer ticks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (a target nested inside one of those, matched via closest(), is
  treated the same way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.
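
A minimal sketch of the target check, written against a structural stand-in for a DOM element so it runs without a browser. `shouldHandleEscape`, `ClosestCapable`, and the exact selector string are assumptions, not the component's real code.

```typescript
// Structural stand-in for a DOM Element so the sketch runs without a browser.
interface ClosestCapable {
  closest(selector: string): unknown;
}

// Assumed selector; the real component may treat contenteditable differently.
const EDITABLE_SELECTOR = "input, textarea, select, [contenteditable]";

function shouldHandleEscape(key: string, target: ClosestCapable | null): boolean {
  if (key !== "Escape") return false;
  if (target === null) return true; // no element target: treat as body-level
  // closest() matches the target itself or any ancestor, so typing inside a
  // nested node of an editable element is ignored as well.
  return target.closest(EDITABLE_SELECTOR) === null;
}

// Fakes for illustration: closest() returns a match or null.
const insideTextarea: ClosestCapable = { closest: () => ({ tag: "textarea" }) };
const onBody: ClosestCapable = { closest: () => null };

console.log(shouldHandleEscape("Escape", insideTextarea)); // false
console.log(shouldHandleEscape("Escape", onBody));         // true
```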

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.
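
A sketch of such a registry with an injectable clock. The reason values, the 24h default and `clearDegraded` come from the description above; `DegradedRegistry`, `markDegraded`, `isDegraded` and the lazy-expiry detail are hypothetical.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default from the text

class DegradedRegistry {
  private readonly records = new Map<string, DegradedRecord>();
  private readonly now: () => number;

  // The clock is injectable so tests can advance time deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  markDegraded(
    adapterId: string,
    reason: DegradedReason,
    source: string,
    ttlMs: number = DEFAULT_TTL_MS,
  ): void {
    this.records.set(adapterId, { reason, source, expiresAt: this.now() + ttlMs });
  }

  isDegraded(adapterId: string): boolean {
    const record = this.records.get(adapterId);
    if (record === undefined) return false;
    if (this.now() >= record.expiresAt) {
      this.records.delete(adapterId); // lazy expiry, no explicit clear needed
      return false;
    }
    return true;
  }

  clearDegraded(adapterId: string): void {
    this.records.delete(adapterId);
  }
}
```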

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.
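
The chunked form can be approximated as below. `chunkBytes` and `decodeChunks` are hypothetical helpers, not the fixture adapters themselves; the point they illustrate is that a fixed 512-byte boundary can split a multi-byte UTF-8 character, so the consumer has to decode in streaming mode.

```typescript
const CHUNK_SIZE = 512; // chunk size named in the commit message

// Split a buffer into fixed-size slices; the last slice may be shorter.
function chunkBytes(bytes: Uint8Array, size: number = CHUNK_SIZE): Uint8Array[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < bytes.length; offset += size) {
    chunks.push(bytes.subarray(offset, offset + size));
  }
  return chunks;
}

// Reassemble text from arbitrary slices. `stream: true` lets the decoder
// hold back a partial multi-byte sequence until the next chunk arrives.
function decodeChunks(chunks: readonly Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // final call flushes any buffered bytes
}
```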

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.
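
The gate arithmetic can be sketched as follows. The 2ms budget and the 2x ceiling come from the message; `percentile` (nearest-rank method) and `summarize` are hypothetical helpers, and the real bench may compute its percentile differently.

```typescript
const BUDGET_MS = 2;                // documented reducer budget
const CEILING_MS = 2 * BUDGET_MS;   // gate ceiling: 2x the budget

// Nearest-rank percentile over a copy of the samples.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile needs at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index]!;
}

// Full distribution plus the pass/fail verdict on p99.
function summarize(samples: readonly number[]) {
  const p99 = percentile(samples, 99);
  return {
    p50: percentile(samples, 50),
    p90: percentile(samples, 90),
    p99,
    max: percentile(samples, 100),
    ceiling: CEILING_MS,
    ok: p99 < CEILING_MS,
  };
}
```

Gating on p99 rather than max is what tolerates a single noisy sample: one slow iteration in 10k does not move the 99th percentile, while a sustained regression does.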

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* fix(web): tighten Phase 13 gates from lefarcen review (PR #1318)

Address the actionable items from lefarcen's review of the two
Phase 13 CI gates. The two questions about longer-term DX (pre-
commit hook to auto-update the symbol table, AST-walker swap)
are documented as deferred follow-ups rather than landed here.

reducer-bench:
  - Describe renamed to 'reducer p99 regression gate (Phase 13.1)'
    so it reads as a gate, not a comparative benchmark.
  - Failure message now carries the full distribution
    (p50 / p90 / p99 / max + ceiling), so triage on a tripped gate
    can distinguish a real 20ms regression from a 4.001ms CI hiccup
    without re-running locally (lefarcen Q3).
  - Captured a baseline (p50=0.011ms p90=0.013ms p99=0.018ms
    max=0.244ms on a local Node 24 / Win11 run, 2026-05-11) inside
    the docblock so reviewers can see the actual reading sits ~222x
    below the 4ms ceiling (lefarcen Q1).
  - Replaced 'role as any' casts with PanelistRole-typed casts so
    the fixture is typecheck-strict.
  - Phase numbering corrected (13.2 → 13.1 to match the PR body).

critique-coverage:
  - Symbols now grouped under four describe blocks (SSE events /
    panelist roles / lifecycle phases / i18n keys) so a failure
    points at the category that drifted at a glance (lefarcen nit).
  - Docblock now explains the grep-over-AST trade-off (the bug
    class is structural at the string level, not at the AST level)
    and points at the future AST-walker work as a deferred follow-
    up (lefarcen Q2).
  - Docblock now walks a contributor through the four-step
    maintenance flow (add to contract → add caller → add test →
    add literal here), so the next person to add an SSE event or
    i18n key knows the gate exists and what to update (lefarcen
    Q4).
  - Phase strings switched from 'phase: <name>' to bare-quoted
    literals so the walker is robust against single vs double
    quotes and ':' vs '===' source-shape changes.
  - Dead try/catch around 'stack = [root]' removed (cannot throw).
  - Per-symbol failure messages name the symbol AND which corpus
    is missing it, so the gate is self-describing on the next
    CI red.
  - Phase numbering corrected (13.4 → 13.2 to match the PR body).

63 / 63 vitest cases green (1 bench + 62 coverage). Web
typecheck clean.

* fix(web): tighten coverage walker semantics from lefarcen P2/P3 (PR #1318)

Two follow-on findings on commit 338a185:

P2 — coverage gate weakened. The previous revision used one helper
`corpusReferences` for both SRC and TEST corpora, and that helper
accepted the unprefixed PanelEvent type form (`type: 'panelist_must_fix'`)
as a substitute for the prefixed SSE wire name (`critique.panelist_must_fix`).
The fallback is correct on the TEST side (reducer tests dispatch
PanelEvent literals) but it weakened the SRC side: production code
could drop the SSE channel name silently and the PanelEvent type
alias would keep the walker green.

Split into two helpers: `srcReferences` is strict (exact substring
match only, no fallback) and `testReferences` keeps the lenient
fallback for SSE events. The production-side assertions now route
through `srcReferences` so the wire name is load-bearing again.
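
The split can be sketched over in-memory corpus strings. The helper names match the message, but the bodies are an assumption about how the strict and lenient checks might be written; the real walker reads files from disk.

```typescript
const SSE_PREFIX = "critique.";

// Strict: production code must contain the exact symbol (e.g. the wire name).
function srcReferences(corpus: string, symbol: string): boolean {
  return corpus.includes(symbol);
}

// Lenient: tests may reference an SSE event by its unprefixed PanelEvent
// type literal instead of the prefixed wire name.
function testReferences(corpus: string, symbol: string): boolean {
  if (corpus.includes(symbol)) return true;
  if (!symbol.startsWith(SSE_PREFIX)) return false; // fallback is SSE-only
  const bare = symbol.slice(SSE_PREFIX.length);
  return corpus.includes(`'${bare}'`) || corpus.includes(`"${bare}"`);
}

// A corpus that only mentions the unprefixed type literal.
const onlyBareLiteral = "dispatch({ type: 'panelist_must_fix', runId })";

console.log(srcReferences(onlyBareLiteral, "critique.panelist_must_fix"));  // false
console.log(testReferences(onlyBareLiteral, "critique.panelist_must_fix")); // true
```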

P3 — maintenance doc overclaimed. The previous revision said 'CI red
if you forget step 4' but the symbol arrays are partially hand-
maintained, so a contributor adding a NEW phase string or i18n key
without updating the array leaves CI green (the walker never knew
to look). Rewrote the failure-mode section to distinguish the two
cases:

  - Renaming an EXISTING symbol without updating the walker → CI red
    (existing assertion fails because the old name is gone).
  - Adding a NEW hand-maintained symbol without updating the walker
    → CI stays green (walker does not know to look for it).

Also clarified that `SSE_EVENTS` and `PANELIST_ROLE_STRINGS` are
auto-built from contracts so step 4 is one-line for `PHASE_STRINGS`
and `I18N_KEYS` only.

63 / 63 vitest cases still green.

* fix(web): close two P2 findings on PR #1318 (Siri-Ray + lefarcen)

P2 (coverage walker counted self as evidence). The walker walked
apps/web/tests, which contains apps/web/tests/components/Theater/
critique-coverage.test.ts itself. The hand-maintained PHASE_STRINGS
and I18N_KEYS literals inside that file would satisfy the test-side
coverage assertion against themselves, so a real Theater test that
covers a symbol could be deleted and the gate would still pass.

Excluded the walker file from TEST_FILES via path.resolve(__filename)
filter so the test corpus only contains independent evidence.

Once the walker stopped seeing itself, the gate correctly red-flagged
nine i18n keys that no INDEPENDENT test exercises:
critiqueTheater.userFacingName, roundLabel, composite, threshold,
interrupt, interrupted, degradedHeading, shippedSummary,
interruptedSummary. Component tests like TheaterCollapsed.test.tsx
exercise the rendered text but never mention the key STRING, so the
walker couldn't see them. Closed that gap by adding
apps/web/tests/components/Theater/critique-i18n-keys.test.ts: 9 cases,
one per watched key, asserting the dictionary entry exists as a
non-empty string. That's both real coverage (catches a stale dict)
and the independent evidence the walker requires.

P2 (interruptedSummary missing from de/ja/ko/zh-TW). The native
locale overrides were missing the key, so an interrupted run on a
German / Japanese / Korean / Traditional Chinese UI silently fell
back to the English string via the ...en spread. Added the key with
{round} and {composite} placeholders preserved, using PerishCode's
suggested copy from the earlier review thread.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm exec vitest run tests/components/Theater tests/i18n:
  20 files / 190 tests green (critique-coverage 62 / 62,
  critique-i18n-keys 9 / 9 new, reducer-bench 1 / 1, locales 5 / 5).

* fix(web): drop the Dict cast in i18n key coverage test (lefarcen P1 / Siri-Ray on PR #1318)

The previous revision used `(en as Record<string, string>)[key]` to
read each watched key. Dict has no string index signature, so CI's
strict typecheck rejected the broad cast with TS2352 even though the
runtime assertion was fine.

Replaced with the typed pattern lefarcen suggested: type WATCHED_KEYS
as `readonly (keyof typeof en)[]` and read `en[key]` directly. That
removes the cast and also strengthens the test, because a renamed or
removed key now fails the type check immediately rather than at
runtime.
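
A small sketch of the typed pattern. `enSketch` is a two-entry stand-in for the real `en` dictionary: the "Design Jury" label is taken from the spec reference earlier in this log, while the second value is a placeholder.

```typescript
// Stand-in for the real `en` dictionary; the second value is a placeholder.
const enSketch = {
  "critiqueTheater.userFacingName": "Design Jury",
  "critiqueTheater.interrupt": "Interrupt",
} as const;

// Typing the list against the dictionary's own keys means a renamed or
// removed key is a compile error here, with no cast needed to read it.
const WATCHED_KEYS: readonly (keyof typeof enSketch)[] = [
  "critiqueTheater.userFacingName",
  "critiqueTheater.interrupt",
];

// Returns the watched keys whose dictionary entry is missing or empty.
function emptyWatchedKeys(): string[] {
  const empty: string[] = [];
  for (const key of WATCHED_KEYS) {
    const value: string = enSketch[key]; // typed lookup, no index signature
    if (value.length === 0) empty.push(key);
  }
  return empty;
}
```

By contrast, casting the dictionary to `Record<string, string>` discards exactly the key information that makes the compile-time check possible.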

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web exec vitest run
  tests/components/Theater/critique-i18n-keys.test.ts: 9 / 9 green.

* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314)

The variant validator on the web SSE path previously accepted any
`typeof === 'string'` for closed-enum fields (ship.status,
panelist_*.role, degraded.reason, failed.cause, parser_warning.kind,
run_started.cast[]) and any `typeof === 'number'` for numeric fields,
which let NaN / Infinity through. Downstream components index i18n
tables by enum value, so an unknown status or role would land
`SHIP_BADGE_KEY[final.status]` on undefined and crash the translator.

The replay parser had a separate gap: `useCritiqueReplay.parseTranscript`
called the cheap `isPanelEvent` header check directly, so a recorded
line like `{"type":"ship","runId":"r"}` reached the reducer with
composite, status, round, artifactRef, summary all undefined and
TheaterCollapsed then called `final.composite.toFixed(1)` on undefined.

Resolution: move all wire-side validation into the contract guard.

- Export const arrays for the closed enums:
  SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS,
  ROUND_DECISIONS (PANELIST_ROLES already existed).
- Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the
  single deep validator: header (known type + non-empty runId) plus
  every variant-specific required field plus closed-enum membership
  plus Number.isFinite on every numeric field. Documented as the wire
  source of truth.
- Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent
  now relies entirely on the contract guard, and parseTranscript in
  useCritiqueReplay (which already uses isPanelEvent) gets the deeper
  validation for free.

Tests (TDD, red-first):

- packages/contracts/tests/critique.test.ts: 13 new cases pinning the
  strict guard directly (well-formed across every variant, every
  rejection path: unknown type, empty/non-string runId, unknown enum,
  non-finite numeric, missing variant field).
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for
  each closed-enum rejection on the wire path plus a positive sweep
  across every legal enum value across every variant.
- apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx:
  2 new cases for incomplete and unknown-enum transcript lines.

Verified:
- pnpm --filter @open-design/contracts test 4 files / 30 tests green.
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test 107 files / 976 tests green.

* fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4)

The strict guard from PR #1314 round 3 enforced enum membership and
Number.isFinite, but accepted any finite number where the contract
intends a specific domain: scale: 0 (ScoreTicker divides by it),
negative thresholds, fractional rounds, negative mustFix, etc.
ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline
CSS and divides by it for tick width, so a guard-passing scale: 0
shipped Infinity into the rendered style. Negative composite /
score values reached downstream code that assumes >= 0.

Resolution: mirror the daemon-side Zod domain constraints in the
runtime guard.

Three new helpers in packages/contracts/src/critique.ts:

  - isPositiveInt(v): integer with v > 0. Used for round, maxRounds,
    scale, protocolVersion (all 1-indexed in the orchestrator).
  - isNonNegativeInt(v): integer with v >= 0. Used for mustFix,
    position, bestRound. bestRound: 0 is the valid sentinel for
    'interrupted before any round closed'.
  - isNonNegativeFinite(v): finite number with v >= 0. Used for
    composite, score, dimScore, threshold. Threshold may be
    fractional (e.g. 8.5 on a scale of 10).

Cross-field check inside run_started: threshold <= scale. The daemon
Zod schema enforces this with an epsilon refine, and the wire guard
matches the same intent.
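
A minimal sketch of the three helpers and the cross-field check, for
reviewers who want the shape at a glance. The helper names come from
this commit; `hasValidThreshold` and the `EPSILON` value are
illustrative assumptions, and the real code in
packages/contracts/src/critique.ts may differ.

```typescript
// Sketch only: mirrors the domains described above, not the shipped source.
function isPositiveInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v > 0;
}

function isNonNegativeInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= 0;
}

function isNonNegativeFinite(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v) && v >= 0;
}

// Hypothetical cross-field check for run_started; the epsilon is assumed.
const EPSILON = 1e-9;
function hasValidThreshold(threshold: unknown, scale: unknown): boolean {
  return (
    isNonNegativeFinite(threshold) &&
    isPositiveInt(scale) &&
    threshold <= scale + EPSILON
  );
}
```

With these, `scale: 0` fails `isPositiveInt` before ScoreTicker can
divide by it, while `threshold == scale` still passes.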

Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts:

  - 22 new rejection cases across every numeric field that
    previously slipped through: scale: 0, negative scale, fractional
    scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0,
    fractional protocolVersion, negative threshold, threshold > scale,
    round: 0, fractional round, negative dimScore / score, negative /
    fractional mustFix, negative composite, ship round: 0, negative /
    fractional bestRound, negative interrupted composite, negative /
    fractional parser_warning position.
  - 3 positive boundary cases that must still pass: threshold == scale,
    fractional threshold within [0, scale], interrupted with
    bestRound: 0 (no round completed before interrupt), parser_warning
    with position: 0 (start of stream).

Verified:
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/contracts test: 4 files / 59 tests green
  (was 37 before the new domain cases).
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 110 files / 1004 tests green;
  no regression on Theater suite, sse validator, replay parser, or
  assistant-feedback widget tests.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.
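
A rough sketch of the wait-for-ack flow described above. The
function name, the `Snapshot` type, and the injected `fetchInterrupt`
/ `setPending` signatures are assumptions for illustration, not the
actual CritiqueTheaterMount code; the URL shape follows the interrupt
endpoint named elsewhere in this PR.

```typescript
type Snapshot = { runId: string; bestRound: number; composite: number };
type InterruptedAction = { type: "interrupted" } & Snapshot;
type FetchLike = (url: string, init: { method: string }) => Promise<{ ok: boolean }>;

// Hypothetical handler: dispatch only after the daemon acknowledges.
async function requestInterrupt(
  projectId: string,
  snapshot: Snapshot, // captured at click time
  fetchInterrupt: FetchLike,
  dispatch: (action: InterruptedAction) => void,
  setPending: (pending: boolean) => void,
): Promise<void> {
  setPending(true);
  try {
    const res = await fetchInterrupt(
      `/api/projects/${projectId}/critique/${snapshot.runId}/interrupt`,
      { method: "POST" },
    );
    if (res.ok) {
      dispatch({ type: "interrupted", ...snapshot });
    } else {
      setPending(false); // non-2xx: stay on the live stage, allow retry
    }
  } catch {
    setPending(false); // rejected fetch: same recovery path
  }
}
```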

* test(e2e): move critique-coverage walker from apps/web/tests to e2e/tests (Siri-Ray P2)

The walker is by definition a cross-app consistency check: it reads
the web reducer, the daemon critique module, the contracts package,
and the e2e UI suite. Hosting it under apps/web/tests/ violated the
repo boundary rule (root AGENTS.md): app packages must not import
another app's private src/ or tests/ as a shared helper, and
cross-app consistency checks belong in e2e/tests/. The web test
lane was effectively coupled to daemon and e2e file layout, so a
daemon-only refactor could break the web lane.

Moved the file to e2e/tests/critique-coverage.test.ts and switched
the contracts import to the import.meta.glob shape the e2e package
already uses (see localized-content.test.ts), so the e2e package
does not have to add @open-design/contracts as a workspace dep just
to load two const arrays. REPO_ROOT and SELF_PATH recalculated for
the new location.

Web test lane no longer depends on daemon, contracts, or e2e layout.
The e2e walker covers the same 62 assertions as before:

  e2e/tests/critique-coverage.test.ts  62 / 62 green

Web typecheck clean, e2e typecheck clean.

* fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-14 15:55:36 +08:00
lefarcen
6c16283850 Merge origin/main (post-7c8305f4) into reconcile branch
Brings in 10 new main commits: routine deep-link to specific
conversations (#1508), Windows resource cache fix for Orbit templates,
collapsible comment side panel (#1607), routines project radio polish,
Copilot logo swap, and minor UI fixes.

Conflicts resolved:
- router.ts: garnet's home/view + marketplace routes + main's
  per-project conversationId deep-link field coexist on Route union
- ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper +
  main's isStoppableAssistantMessage helper both kept
- ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's
  phantom-row regression test); main's three new tests for
  finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker
  / shouldClearActiveRunRefs are queued as a follow-up TODO inline.
2026-05-14 15:13:38 +08:00
shangxinyu1
2976c76fc3
test: expand Memory and Routines coverage (#1521)
* test: expand settings and packaged coverage

* test: extend memory settings coverage

* test: cover routine settings failure states

* test: cover routine operation failures

* test: fix daemon test typing on CI

* test: decouple packaged smoke from orbit bug

* test: avoid live memory LLM calls in route tests

* test: fix daemon fetch typing in CI

* fix: restore preview comment and inspect toggles

* test: align manual edit flow with current inspector UX

* test: align comment attachment flow with current preview comments UI

* fix: probe resolved Codex launch path during detection

* fix: remove duplicate board activation helper after rebase

* test: update ghost cli detection mock

* test: align FileViewer toolbar expectation

* ci: move full app tests to extended lane

* ci: run app tests by changed scope

* ci: cover shared app inputs in test scopes

* ci: avoid setup-node cache in windows packaged smoke

* test: align extended settings and manual edit flows
2026-05-14 14:48:40 +08:00
Nagendhra Madishetti
e508fa3fbd
test(e2e): Critique Theater Phase 11 activation (un-fixme suite, seeded-project nav, split SSE fixture) (#1483)
2026-05-14 14:27:39 +08:00
pftom
f1d0dac58b feat(plugins): enhance plugin registry with new metadata and preview features
- Added new fields to the plugin and metadata types, including `mode`, `taskKind`, `surface`, and `preview`, to improve plugin description and categorization.
- Implemented a `previewFrom` function to generate structured previews for plugins based on their metadata, enhancing the user experience.
- Introduced a `visualKindFor` function to determine the visual representation of plugins based on their attributes, improving sorting and display logic.
- Updated the `entryFromMarketplace` and `officialEntryFromManifest` functions to accommodate the new fields and ensure proper handling of plugin entries.
- Created a new documentation file for the plugin system test suite, consolidating testing strategies and acceptance criteria for better clarity.

This update significantly enhances the plugin registry's capabilities, providing richer metadata and improved user interactions with plugin previews.
2026-05-14 11:12:51 +08:00
lefarcen
a6307bf658 chore(reconcile): remove ad-hoc Playwright screenshot scripts
These landed inside the e2e/ package during the screenshot-comparison
work for product review; they were never meant to be tracked.
2026-05-13 23:49:21 +08:00
lefarcen
53997990b7 Merge origin/main (post-0.7.0) into reconciled garnet branch
Second-pass merge layering 41+ new commits from origin/main on top of
the first reconcile commit. Headline upstream additions absorbed:

- 0.7.0 release: redesigned chat bubble user-text styling, neutralised
  palette, lucide icons, ElevenLabs audio voice option discovery in the
  prompt composer, analytics tracking (PostHog) wired across home /
  studio / create surfaces, Prometheus `/api/metrics` endpoint,
  critique-theater drop-in mount with a settings toggle.
- Misc upstream fixes (titlebar padding, release header layout, deck
  preview chrome, feedback form auto-scroll, conversation-created SSE
  on routine runs, etc.)

Conflict resolutions (12 files, ~22 hunks):

- contracts barrel + prompts/system: union of both sides; new analytics
  exports (`./analytics/events`, `./analytics/public-params`) added
  alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice
  fields (audioVoiceOptions/audioVoiceOptionsError, main) and
  pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput.
- daemon/server.ts: Prometheus `/api/metrics` route inserted after
  garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call
  added before the chat-run service init alongside the prior reconcile
  note about the dropped legacy POST /api/projects body.
- App.tsx: handleCreateProject now consumes both garnet's plugin
  fields (pluginId / appliedPluginSnapshotId / pluginInputs /
  autoSendFirstMessage) and main's analytics requestId. Tracking
  fires success + failure paths; PluginLoopHome auto-send sessionStorage
  flag is preserved.
- ProjectView.tsx: the garnet auto-send useEffect coexists with main's
  `useCritiqueTheaterEnabled()` hook.
- ChatComposer.tsx: imports merged (drop now-unused fetchSkills,
  add analytics provider + tracking + buildVisualAnnotationAttachment).
- index.css: main's redesigned `.msg.user .user-text` chat bubble
  styling wins over garnet's plain text rule; garnet's
  `.msg-plugin-chip*` rules preserved alongside.
- EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with
  reconcile decision #2. main's added PetRail / TopTab / analytics
  view tracking is intentionally NOT brought into the wrapper; the
  follow-up to re-integrate PetRail / image-templates / video-templates
  into EntryShell still stands and now also covers analytics
  view-tracking hooks.
- daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node +
  prom-client coexist).
- Test fixtures (FileWorkspace.test): kept garnet's plugin-folders
  describe block intact; main's projectKind="prototype" addition is
  dropped where it conflicted with garnet's plugin-folder fixture
  files.

Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck`
exits 0 across all workspace packages.

Follow-up not done in this commit:
- PetRail / image-templates / video-templates / 0.7.0 analytics
  view-tracking hooks need to be added to EntryShell.
- Critique-theater settings toggle UX (added on main) lives in the
  SettingsDialog hierarchy; the reconcile state preserves the
  SettingsDialog so this should work without changes, but no
  end-to-end verification yet.
2026-05-13 23:29:56 +08:00
lefarcen
d3602be666 Merge origin/main into garnet-hemisphere (reconcile)
Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the
161-commit garnet-hemisphere line, reconciling the product-vibe-coded
plugin/marketplace/EntryShell surfaces from garnet with the routines /
skills / live-artifacts feature work landed on main since the fork point.

Headline decisions (full rationale + side-by-side screenshots in
`specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`):

- #1 SettingsDialog: keep main's Memory / Skills / External MCP /
  Connectors / Routines / MCP server nav items even though the top-level
  /integrations + /automations routes also cover them. Two entries
  coexist for now; revisit once Track A/B fill in the placeholder content.
- #2 EntryView: accept garnet's thin wrapper delegating to EntryShell.
  Main's PetRail sidebar + image-templates/video-templates tabs are
  intentionally deferred to a follow-up that re-integrates them into
  the new EntryShell layout.
- #3 /integrations + /automations top-level routes: kept (garnet's
  product intent). Skills tab is still a "Coming soon" placeholder
  awaiting Track A; Routines/Schedules/Live-artifacts cards on
  /automations are still mock awaiting Track B.
- #5 DesignFilesPanel: hybrid — main's pagination as primary list,
  garnet's Plugin folders section preserved between the live-artifacts
  block and the pagination block. (by-kind sections drop in favour of
  pagination; plugin-folders rendering stays because it is a
  garnet-specific product addition.)
- #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk
  merge. Both daemon admin routes + plugin/genui routes (garnet) and
  routines/memory/skills upgrades (main) preserved. Garnet's inline
  project route block kept alongside main's `registerProjectRoutes` /
  `registerProjectUploadRoutes` modular wiring — duplicate route
  audit is a follow-up. Garnet's POST /api/projects plugin-snapshot
  resolution + default-scenario fallback is intentionally dropped from
  the inline body (now handled by registerProjectRoutes) and listed for
  follow-up re-integration into `project-routes.ts`.

Verification (worktree at /Users/elian/Documents/open-design-garnet):
- `pnpm typecheck` exits 0 across all workspace packages
- daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots,
  serves `/api/daemon/status` healthy, and survives a Playwright
  walkthrough of /integrations / /automations / home / projects /
  design-systems / plugins / settings dialog
- `@open-design/plugin-runtime` package built (was missing dist/ on
  garnet); without it the daemon's plugins/* imports fail at boot

Track A (Skills tab → real SkillsSection) and Track B (Automations
cards → real routines / live-artifacts backend) are the two remaining
follow-ups blocking the placeholder/mock content from going live. See
`spec.md` and `track-skills.md` in the same directory.
2026-05-13 22:29:21 +08:00
Nagendhra Madishetti
38a5ab69e6
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.
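
The guards above can be sketched on a cut-down state. This is an
illustrative reducer with only three event types; the type names
(`LitePanelEvent`, `LiteState`) and the `warnings` side channel are
assumptions, and round bookkeeping is omitted.

```typescript
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";
type LitePanelEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };
interface LiteState { phase: Phase; runId: string | null; warnings: string[] }

const TERMINAL: ReadonlySet<Phase> = new Set<Phase>(["shipped", "degraded", "interrupted", "failed"]);
const initialState: LiteState = { phase: "idle", runId: null, warnings: [] };

function reduce(state: LiteState, ev: LitePanelEvent): LiteState {
  // A run_started for a different runId discards prior state and reboots.
  if (ev.type === "run_started" && ev.runId !== state.runId) {
    return { phase: "running", runId: ev.runId, warnings: [] };
  }
  // Stray traffic for another run: return the same reference (no re-render).
  if (ev.runId !== state.runId) return state;
  // Late parser warnings are recorded without changing phase.
  if (ev.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, ev.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.has(state.phase)) return state;
  if (ev.type === "ship") return { ...state, phase: "shipped" };
  return state;
}
```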

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.
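
For reference, a backoff schedule of this kind could look like the
sketch below. The doubling factor and the `createBackoff` name are
assumptions; the commit only states the 1s floor, the 30s cap, and
the reset on `ready`.

```typescript
const BASE_MS = 1_000;
const MAX_MS = 30_000;

// Assumed schedule: double per failed attempt, capped at MAX_MS.
function backoffDelay(attempt: number): number {
  return Math.min(BASE_MS * 2 ** attempt, MAX_MS);
}

// Hypothetical tracker: next() on each reconnect, reset() on `ready`.
function createBackoff() {
  let attempt = 0;
  return {
    next(): number {
      const delay = backoffDelay(attempt);
      attempt += 1;
      return delay;
    },
    reset(): void {
      attempt = 0;
    },
  };
}
```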

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

The speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

A gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.
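
The spread-order fix can be sketched as follows. `isPanelEventLike`
is a deliberately tiny stand-in for the contracts-level
`isPanelEvent` guard, and `sseToPanelEventSketch` is a hypothetical
name; only the ordering (payload first, channel-derived `type` last)
and the revalidation step reflect the change described above.

```typescript
const CHANNEL_PREFIX = "critique.";
type PanelEventLike = { type: string; runId: string; [key: string]: unknown };

// Stand-in guard (assumption): known type plus a non-empty string runId.
const KNOWN_TYPES = new Set<string>(["run_started", "ship"]);
function isPanelEventLike(v: unknown): v is PanelEventLike {
  if (typeof v !== "object" || v === null) return false;
  const record = v as Record<string, unknown>;
  const type = record["type"];
  const runId = record["runId"];
  return (
    typeof type === "string" && KNOWN_TYPES.has(type) &&
    typeof runId === "string" && runId.length > 0
  );
}

function sseToPanelEventSketch(
  channel: string,
  data: Record<string, unknown>,
): PanelEventLike | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  // Payload is spread first so the channel-derived type always wins.
  const candidate = { ...data, type: channel.slice(CHANNEL_PREFIX.length) };
  return isPanelEventLike(candidate) ? candidate : null;
}
```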

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).
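
A framework-free sketch of the cursor idea: the position lives
outside the pacing logic, so a speed change drains from where
playback stopped. `createReplayCursor` and its methods are
hypothetical names; the real hook keeps the cursor in a React ref
and drives it from effects and timers.

```typescript
// Hypothetical cursor holder; stands in for the ref-backed cursor.
function createReplayCursor<E>(events: readonly E[], dispatch: (event: E) => void) {
  let cursor = 0;
  const drain = (count: number): void => {
    const batch = events.slice(cursor, cursor + count);
    cursor += batch.length; // advance before dispatching, never rewind
    batch.forEach((event) => dispatch(event));
  };
  return {
    step: (): void => drain(1),              // one paced tick
    flush: (): void => drain(events.length), // `instant`: drain the remainder
    isDone: (): boolean => cursor >= events.length,
  };
}
```

Calling `step()` once, pausing, and then calling `flush()` dispatches
every event exactly once, which is the behaviour the second new test
pins.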

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive zero-delay timers via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.
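
A minimal sketch of such a registry with an injectable clock.
`clearDegraded` and the three reasons come from this commit;
`createDegradedRegistry`, `markDegraded`, `isDegraded`, and the
parameter order are assumptions for illustration.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";
interface DegradedRecord { reason: DegradedReason; source: string; expiresAt: number }

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default window

// Hypothetical factory; `now` is the test-only clock seam.
function createDegradedRegistry(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
      records.set(adapterId, { reason, source, expiresAt: now() + ttlMs });
    },
    // Expiry is checked lazily, so no sweep or clearDegraded call is needed.
    isDegraded(adapterId: string): boolean {
      const record = records.get(adapterId);
      return record !== undefined && now() < record.expiresAt;
    },
    clearDegraded(adapterId: string): void {
      records.delete(adapterId);
    },
  };
}
```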

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.
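
The gate arithmetic, sketched with a nearest-rank percentile. The
percentile method and all names here are assumptions; the commit only
fixes the 2ms budget and the 2x (4ms) assertion threshold.

```typescript
// Nearest-rank percentile over a copy of the samples (input is not mutated).
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length, Math.max(1, rank)) - 1;
  return sorted[index] as number;
}

const BUDGET_MS = 2;
const GATE_MS = 2 * BUDGET_MS; // 4ms: headroom for noisy CI runners

function passesGate(durationsMs: readonly number[]): boolean {
  return percentile(durationsMs, 99) < GATE_MS;
}
```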

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.
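
For readers who want the arithmetic spelled out, a small sketch of
the weighted sum using the weights quoted above. The Role type and
the compositeScore name are assumptions made for illustration; only
the weights come from this commit:

```typescript
// Panelist roles and weights as quoted in the commit message.
type Role = "designer" | "critic" | "brand" | "a11y" | "copy";

const WEIGHTS: Record<Role, number> = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

// Hypothetical helper: weighted sum of the per-panelist scores.
function compositeScore(scores: Record<Role, number>): number {
  const roles: Role[] = ["designer", "critic", "brand", "a11y", "copy"];
  let total = 0;
  for (const role of roles) {
    total += WEIGHTS[role] * scores[role];
  }
  return total;
}
```

Because the designer weight is 0, the designer's own score does not
move the composite; the four reviewing panelists decide it.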

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required forces on, opt-out forces off)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.

10/10 vitest cases green covering every cell of the matrix.
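
The priority order can be sketched as a chain of early returns. This
is an illustration of the matrix above under assumed type and field
names (ResolverInput, parseRolloutPhase); it is not the resolver's
actual source:

```typescript
type RolloutPhase = "M0" | "M1" | "M2" | "M3";
type SkillPolicy = "required" | "opt-in" | "opt-out" | null;

// Assumed input shape; null means "no opinion at this tier".
interface ResolverInput {
  phase: RolloutPhase;
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null;
  envOverride: boolean | null;
}

// Unknown or missing input falls back to M0 (feature off).
function parseRolloutPhase(raw: string | undefined): RolloutPhase {
  return raw === "M1" || raw === "M2" || raw === "M3" ? raw : "M0";
}

function isCritiqueEnabled(input: ResolverInput): boolean {
  // 1. Skill-level policy: 'required' forces on, 'opt-out' forces off.
  if (input.skillPolicy === "required") return true;
  if (input.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (input.projectOverride !== null) return input.projectOverride;
  // 3. Env override.
  if (input.envOverride !== null) return input.envOverride;
  // 4. Rollout phase default.
  if (input.phase === "M3") return true;
  if (input.phase === "M2") return input.skillPolicy === "opt-in";
  return false; // M0 and M1
}
```

Each tier only decides when it has an opinion, so a null at one tier
falls through to the next.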

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - an open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.

10/10 vitest cases green (6 existing + 4 new).
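
A sketch of the guard the first three tests pin, written against a
reduced storage interface so it runs outside a browser. StorageLike
and this readToggle body are assumptions for illustration; the hook's
real reader may differ:

```typescript
// Reduced storage shape (assumption) so the sketch needs no DOM.
interface StorageLike {
  getItem(key: string): string | null;
}

const CONFIG_KEY = "open-design:config"; // key named in the commit message

function readToggle(storage: StorageLike): boolean {
  try {
    const raw = storage.getItem(CONFIG_KEY);
    if (raw === null) return false;
    const parsed: unknown = JSON.parse(raw);
    // typeof null === 'object' and arrays are objects too: reject both.
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      return false;
    }
    return (parsed as Record<string, unknown>).critiqueTheaterEnabled === true;
  } catch {
    // Corrupt JSON, or getItem threw (private mode / SecurityError).
    return false;
  }
}
```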

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail. Worse, the
production claim 'in-session event keeps mounts coherent' was
hollow.

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).
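
The listener's decision can be sketched as a pure function. The
ToggleEventLike shape and the resolveToggle name are assumptions for
illustration; the event is reduced to its detail field so the sketch
runs without a DOM:

```typescript
// Reduced event shape (assumption): only the field the listener reads.
interface ToggleEventLike {
  detail?: unknown;
}

// Prefer the typed payload; fall back to storage for malformed events
// and for cross-tab storage events, which carry no detail.
function resolveToggle(
  event: ToggleEventLike,
  readToggle: () => boolean,
): boolean {
  const detail = event.detail;
  if (typeof detail === "object" && detail !== null) {
    const enabled = (detail as { enabled?: unknown }).enabled;
    if (typeof enabled === "boolean") return enabled;
  }
  return readToggle();
}
```

When setItem throws, readToggle still reports the stale value, so
only the detail branch carries the new state to mounted hooks.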

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates
the critique pipeline on critiqueCfg.enabled, which is just the
OD_CRITIQUE_ENABLED env var. After this commit it gates on
isCritiqueEnabled(...) from the Phase 15 resolver, so the full
priority matrix is live:

  1. Per-skill od.critique.policy veto (opt-out / required)
  2. Per-project override (M1 Settings toggle, written through the
     existing Phase 6 settings endpoint)
  3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
  4. OD_CRITIQUE_ROLLOUT_PHASE default
       M0 dark-launch      false
       M1 settings only    false
       M2 per-skill        only when skillPolicy === 'opt-in'
       M3 global default   true

Default behaviour on a fresh install is unchanged: the resolver
returns false at M0 without an env override or a project override,
so prod traffic falls through to the legacy single-pass path
exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE,
envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride
are passed as null for the v1 cutover; the daemon-side handler that
round-trips critiqueTheaterEnabled on the project settings row and
the od.critique.policy frontmatter resolver land as the next two
commits in this branch.

The three call sites that used critiqueCfg.enabled (the brand-thread
guard, the skill-thread guard, the top-line critiqueShouldRun
compound) now read from a single locally-scoped critiqueEnabledForRun
boolean, so the eligibility check is computed exactly once per spawn
and the prompt composer + orchestrator stay in lockstep the way
the existing comment already promised.

Tests still green: daemon vitest 22 / 22 across rollout +
conformance + adapter-degraded. Daemon typecheck clean.

* feat(web): mount CritiqueTheaterMount in ProjectView

The web counterpart of the daemon wireup. ProjectView now renders
<CritiqueTheaterMount projectId={project.id} enabled={...} /> as a
sibling of <AppChromeHeader> inside the top-level <div className="app">.

The mount is the drop-in from the Phase 9 stack: it owns the SSE
subscription, the kill-request handshake, and the phase-aware swap
from the live <TheaterStage> to the collapsed badge once a run
settles. The mount returns null until the daemon emits a
critique.run_started for the active project, so the visual surface
is byte-for-byte unchanged for users who have not opted in.

Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings
toggle from the existing open-design:config localStorage blob and
stays in sync with both the platform storage event (cross-tab) and
the same-tab open-design:critique-theater-toggle CustomEvent the
Phase 15 setter dispatches. The hook honors the event payload
directly so a private-mode browser that cannot persist the toggle
still updates the in-session UI correctly.

The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts)
remains the authority for whether a run is actually wired through
the critique pipeline. This hook only governs whether the web layer
renders the resulting SSE stream when the daemon emits one. The
two-layer gate is intentional: an integrator embedding the Theater
in a custom UI can flip the web visibility independent of the
daemon's routing decision, and a daemon-side env override flips
backend gating without touching the web's localStorage.

Tests still green: web Theater suite 181 / 181 across 16 files.
Web typecheck clean.

* feat(daemon): resolve od.critique.policy frontmatter at the spawn site

The next step in the wireup branch's ladder: replace the placeholder
`skillPolicy: null` with the actual value parsed from the active
skill's SKILL.md frontmatter.

Three small edits, one new field on a public type:

1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field
   carrying the parsed `od.critique.policy` token (required /
   opt-in / opt-out / null). The field is null when the skill has
   no opinion, which lets the lower-priority resolver tiers
   (projectOverride, envOverride, phase default) decide.

2. listSkills() populates the new field via a small
   `normalizeCritiquePolicy` helper that tolerates the YAML
   scalar's casing and trims whitespace. Unknown tokens collapse
   to null so a typo in SKILL.md cannot accidentally force the
   panel on or off; it just falls through. Derived example cards
   inherit the parent's policy.

3. server.ts captures `skill.critiquePolicy` into a hoisted
   `skillCritiquePolicy` variable inside the existing skill-load
   block, then threads it into the isCritiqueEnabled call as the
   skillPolicy input. The hoisting keeps the variable in scope at
   the resolver call site without restructuring the spawn handler.

After this commit, the priority matrix the rollout resolver was
designed for is live for its top tier. The previous commit wired
env + phase; this one wires skill. The projectOverride input
remains null pending the next commit that extends the Phase 6
settings endpoint.

Daemon vitest: 10 / 10 rollout cases pass against the new wiring.
Daemon typecheck: clean.
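
A sketch of the normalization described in edit 2, assuming the
helper receives the raw YAML scalar as an unknown value. The body is
illustrative; only the helper name, the token set, and the
trim / casing / unknown-to-null behavior come from this commit:

```typescript
type SkillCritiquePolicy = "required" | "opt-in" | "opt-out" | null;

function normalizeCritiquePolicy(raw: unknown): SkillCritiquePolicy {
  if (typeof raw !== "string") return null;
  // Tolerate casing and surrounding whitespace in the YAML scalar.
  const token = raw.trim().toLowerCase();
  if (token === "required" || token === "opt-in" || token === "opt-out") {
    return token;
  }
  // Unknown tokens collapse to null so a typo cannot force the panel
  // on or off; the lower resolver tiers decide instead.
  return null;
}
```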

* feat(daemon): feed projectOverride into the rollout resolver from project metadata

Replaces the placeholder `projectOverride: null` in the spawn
handler with the actual value the Settings panel writes onto the
project's metadata blob: `critiqueTheaterEnabled?: boolean`.

The read is defensive at the boundary: the metadata object is
typed loosely (it round-trips through SQLite as a free-form JSON
blob), so the spawn handler narrows to `boolean` and falls
through to `null` for any other shape. A missing key, a malformed
value, or a project that has never visited Settings collapses to
`null`, which is exactly the resolver's "no opinion, fall
through to env / phase" signal.

The `critique` frontmatter slot also gets typed on the
SkillFrontmatter shape so the `od.critique.policy` chain the
previous commit introduced no longer needs a bracket-access
cast. Same pattern as the existing `craft`, `preview`, and
`design_system` nested-record slots.

After this commit, every tier of the rollout resolver's priority
matrix is wired:

  1. skillPolicy   (from SKILL.md od.critique.policy)
  2. projectOverride (from project metadata critiqueTheaterEnabled)
  3. envOverride   (from OD_CRITIQUE_ENABLED)
  4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE)

The write path for projectOverride still flows through the
existing project-update handler the Settings panel already uses
to persist project metadata; no new endpoint is needed. The
Settings UI button that calls setCritiqueTheaterEnabled and
posts the new field is the next commit on this branch.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.
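
The boundary narrowing can be sketched as follows. The function name
readProjectOverride is an assumption for illustration; the metadata
key and the boolean-or-null contract come from this commit:

```typescript
// Narrow a loosely typed metadata blob to the resolver's
// projectOverride input: a boolean, or null for "no opinion".
function readProjectOverride(metadata: unknown): boolean | null {
  if (typeof metadata !== "object" || metadata === null) return null;
  const value = (metadata as Record<string, unknown>).critiqueTheaterEnabled;
  return typeof value === "boolean" ? value : null;
}
```

An explicit false is preserved rather than collapsed to null, so a
user who turned the toggle off still overrides env and phase.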

* fix(daemon): forward critique events to project sinks + align composer gate (PR #1338)

Two codex review items addressed in one commit since they share the
same root cause (resolver-enabled run hits a transport / prompt
contract that was still env-gated):

P1 (transport mismatch). The daemon emits critique.* SSE frames
through critiqueBus -> design.runs.emit, which fans out on
/api/runs/:runId/events. The web CritiqueTheaterMount subscribes to
/api/projects/:projectId/events (it's project-scoped, not run-
scoped, because the mount lives at the project workspace and
follows the user across runs). Result: in production the mount
never sees a real frame and the e2e tests' stubbed routes hide the
mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the
existing runs.emit transport, AND the per-project event-sinks map.
The project-events route emits via sse.send(payload.type, payload),
so we pack the SSE channel name onto payload.type and let the sink
push the right channel. The web sseToPanelEvent overwrites type
from the channel name on the way back into a PanelEvent, so the
round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads
cfg.enabled to decide whether to append the panel addendum, but
critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run
the resolver enabled via phase / project / skill (env unset) would
have critiqueShouldRun = true while critiqueCfg.enabled remained
false, dropping the panel prompt while still routing through
runOrchestrator -> parser waits for tags that never arrive -> run
degrades.

Fixed by passing a derived config { ...critiqueCfg, enabled: true }
to the composer when critiqueShouldRun is true. The composer's own
gate now agrees with the resolver decision on every input the
spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.
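
A reduced sketch of the P1 fan-out, with both transports modeled as
plain callbacks. createCritiqueBus, the Sink type, and the
record-of-arrays sink registry are assumptions for illustration; the
daemon's real bus and sink map are shaped differently:

```typescript
interface CritiquePayload {
  type: string;
  [key: string]: unknown;
}
type Sink = (payload: CritiquePayload) => void;

// runSink stands in for the run-scoped transport; projectSinks stands
// in for the per-project event-sinks map (both hypothetical shapes).
function createCritiqueBus(runSink: Sink, projectSinks: Record<string, Sink[]>) {
  return {
    emit(projectId: string, channel: string, data: Record<string, unknown>): void {
      // Pack the SSE channel name onto payload.type so a sink that
      // sends (payload.type, payload) pushes the right channel.
      const payload: CritiquePayload = { ...data, type: channel };
      runSink(payload);
      for (const sink of projectSinks[projectId] ?? []) {
        sink(payload);
      }
    },
  };
}
```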

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR.
Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands
   the user on '/' (home route) where ProjectView (and therefore
   CritiqueTheaterMount) is never rendered. With the suite as
   written, every assertion would time out the first time the
   lane runs in CI, contradicting the PR body's claim that the
   suite stays parked behind test.describe.fixme.

   The state diverged from my earlier Phase 11 work because the
   merge from main on commit 4ab719c6 brought in #1307's
   squash-merged version of the e2e file (the pre-fixme shape).

   Re-applied test.describe.fixme to the describe block plus
   removed ui/critique-theater.test.ts from the test:ui:extended
   script in e2e/package.json. Added a file-header docblock
   explaining what the follow-up commit needs to do: replace
   goto('/') with /projects/:id navigation similar to
   app-design-files.test.ts, split the SSE fixture into a live
   prefix and terminal suffix (Codex P2 on PR #1320), and commit
   the first PNG baselines.

2. bestRoundOf in CritiqueTheaterMount returned the LAST round
   with a numeric composite, not the round with the HIGHEST
   composite, while bestCompositeOf correctly returned the max.
   A run that closed round 1 at 8.5 and round 2 at 6.0 would
   dispatch interrupted { bestRound: 2, composite: 8.5 } on a
   user-clicked interrupt.

   Folded the two helpers into a single bestRoundAndComposite
   that walks state.rounds once and returns the matching pair so
   the two values cannot drift. The onInterrupt callback now
   destructures from one helper instead of two independent reads.
   Falls back to (state.activeRound, 0) when no round has closed
   with a composite yet.

Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases
still green against the new helper.
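
A sketch of the single-pass helper from item 2. The RoundState and
TheaterState shapes are reduced assumptions for illustration; the
real reducer state carries more fields:

```typescript
interface RoundState {
  round: number;
  composite: number | null; // null until the round closes with a score
}
interface TheaterState {
  activeRound: number;
  rounds: RoundState[];
}

// One walk over state.rounds returns the round number and composite
// as a pair, so the two values cannot drift apart.
function bestRoundAndComposite(
  state: TheaterState,
): { bestRound: number; composite: number } {
  let best: { bestRound: number; composite: number } | null = null;
  for (const r of state.rounds) {
    if (typeof r.composite !== "number") continue;
    if (best === null || r.composite > best.composite) {
      best = { bestRound: r.round, composite: r.composite };
    }
  }
  // No round has closed with a composite yet.
  return best ?? { bestRound: state.activeRound, composite: 0 };
}
```

For the run described above (round 1 at 8.5, round 2 at 6.0) this
returns round 1 paired with 8.5.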

* fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338)

Three lefarcen P2s on the latest review pass, all real:

1. M1 project override was half-wired: the daemon read
   metadata.critiqueTheaterEnabled but the web setter only
   wrote localStorage. A user opt-in would render the Theater
   on the web (localStorage was set) while the daemon resolved
   projectOverride=null and skipped critique unless env / phase
   already permitted. Two halves talking past each other.

   Extended setCritiqueTheaterEnabled to accept an optional
   { projectId, fetchProjectSettings } options bag. When a
   projectId is supplied, the setter ALSO sends a
   PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled
   } } so the daemon's spawn-time resolver picks the same value up
   on the next generation. The existing project-routes endpoint
   already accepts arbitrary metadata patches, so no new endpoint
   is needed. The local write + the CustomEvent dispatch still
   fire before the PATCH, so a network failure does not unwind
   the in-session UI flip. Three new vitest cases pin the new
   path: PATCHes when projectId is provided, skips when it is
   not, swallows a rejected PATCH so the in-session UI still
   flips.

2. Rollout docs (docs/critique-theater.md section 3) claimed the
   Settings toggle persists into the daemon settings store, but
   the previous implementation only had a localStorage reader /
   writer plus a daemon read of project metadata, with no
   round-trip. Rewrote the section to lead with the four-tier
   resolver (skill policy / project override / env / phase),
   document that the setter now round-trips via the existing
   PATCH endpoint when given a projectId, and call out the
   Settings panel UI control as a deliberate follow-up.

3. Troubleshooting table pointed users at /api/metrics/critique
   (Phase 12, deferred) and 'od adapters clear-degraded <id>'
   (CLI wrapper that does not exist). Replaced the metrics
   reference with the local conformance harness command
   (pnpm --filter @open-design/daemon vitest run
   tests/critique-conformance.test.ts) that ships today, with a
   note that the Phase 12 dashboard surfaces this status as a
   series once that PR lands. Replaced the CLI command with the
   programmatic clearDegraded() helper that exists today and
   flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11
existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix:
the existing CritiqueTheaterMount.test.tsx interrupt cases only
exercised a single-round state, so a future refactor back to two
independent helpers wouldn't be caught by the test suite even
though it'd reintroduce the round / composite drift bug.

Added a regression case that:

  1. Drives the reducer through two complete rounds with the
     full 5-role cast closing at distinct composites: round 1
     at 8.5, round 2 at 6.0 (the high-composite round is NOT the
     most recent one).
  2. Clicks Interrupt + waits for the daemon ack via the test
     seam fetcher returning 204.
  3. Asserts the collapsed badge displays "round 1" (the
     correct best-composite round), and queryByText for
     "round 2 ... 8.5" returns null (the buggy pairing
     would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass
and returns the matching pair, so the round number and the
composite cannot drift apart. This test locks the fix in: a
refactor that splits the helpers back into independent walks
will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } }
as the entire PATCH body. The daemon's project-routes handler only
re-stamps three immutable fields (baseDir, importedFrom,
fromTrustedPicker) before calling updateProject(db, id, patch),
which then does a shallow { ...existing, ...patch } in apps/daemon/
src/db.ts. So patch.metadata replaces the row's metadata wholesale,
dropping kind, templateId, linkedDirs, and every other field the rest
of the app reads.

No in-tree caller passes projectId today (only vitest cases), so the
bug had not surfaced yet. But the surface is documented in
docs/critique-theater.md section 3 and the function's own JSDoc as
the M1 round-trip path, so it would have shipped as a latent footgun
for the next integrator: a Settings UI follow-up, or any third party
that wires the setter into a project-aware surface.

Fix: read-merge-write rather than a bare patch.

- GET /api/projects/:id to read the row's current metadata.
- Spread that metadata into the PATCH body and overlay
  critiqueTheaterEnabled: next on top, mirroring the partial-metadata
  pattern already used in ChatComposer.tsx for linkedDirs.
- PATCH the merged object.

Failure handling:
- GET fails: skip the PATCH entirely. We cannot construct a safe
  merged body without the current state, and a bare patch would
  wipe other metadata. The in-session CustomEvent fired earlier in
  the setter still keeps every mounted hook consistent; the next
  save retries the round-trip.
- PATCH fails: log in dev. The in-session UI is already correct via
  the CustomEvent.

Tests (TDD, red-first):

- 'GETs the project then PATCHes with merged metadata when a
  projectId is supplied': stubs a GET that returns
  { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] }
  and asserts the PATCH body equals the merge plus the toggle.
- 'PATCHes with just the toggle when the project has no prior
  metadata': stubs a GET that returns no metadata block.
- 'skips the PATCH (does not stomp metadata) when the prefetch GET
  fails': stubs a rejecting GET and asserts only the GET fires.
- 'swallows a rejected PATCH after a successful prefetch': stubs a
  successful GET and a rejecting PATCH; asserts the in-session UI
  still flips via the CustomEvent.

Doc updated on the setter's JSDoc to describe the new three-step
flow (localStorage, CustomEvent, read-merge-write PATCH) and the
two failure modes.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 111 files / 1055 tests green
  (was 1052, +3 from the new merge-flow cases).
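
The merge step alone, sketched as a pure function. The name
buildMetadataPatch is an assumption for illustration; the GET and
PATCH calls around it are omitted:

```typescript
type Metadata = Record<string, unknown>;

// Overlay the toggle on the row's current metadata so the PATCH body
// carries every existing field instead of replacing them wholesale.
function buildMetadataPatch(
  currentMetadata: Metadata | undefined,
  next: boolean,
): { metadata: Metadata } {
  return {
    metadata: { ...(currentMetadata ?? {}), critiqueTheaterEnabled: next },
  };
}
```

The caller would run this only after the prefetch GET succeeds; when
the GET fails there is no safe base to merge onto, which is why the
PATCH is skipped.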

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* feat(daemon): Critique Theater Phase 12 observability foundations

Lands the metrics registry, the structured logger, the /api/metrics
route, and the adapter-degraded bump that wires up the first data
point. The orchestrator-side bumps for runs / rounds / composite /
must-fix / interrupted / parser_errors / protocol_version land in a
follow-up commit on this branch (kept separate so the wiring diff
reads cleanly against the registry shape).

Surfaces added:

- apps/daemon/src/metrics/index.ts: 9 Prometheus series under the
  open_design_critique_* namespace with the histogram buckets the
  spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 /
  2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at
  0-10 integer steps).
- apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line
  per call on stdout, namespaced critique. Matches the JSON-per-line
  convention cli.ts already uses; no new logger framework.
- apps/daemon/src/server.ts: GET /api/metrics route. Honors
  OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs.
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now
  bumps degraded_total so the adapter-health dashboard panel
  reflects every TTL refresh and every fresh mark.

Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to
apps/daemon/package.json. Both are zero-config no-ops without an
exporter wired; daemon bundle size impact is ~150 KB uncompressed.
The @opentelemetry/api dep is in place ahead of the OTel-spans
follow-up commit; it adds no behavior on this commit.

Tests:
- tests/metrics/critique.test.ts (3 cases): registry shape +
  exposition text + reset-between-tests
- tests/logging/critique.test.ts (4 cases): event shape + ordering
  + newline framing + namespace stamping

Verification (Windows-local):
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites: 7 / 7 green
- Existing adapter-degraded + conformance + rollout suites:
  22 / 22 green; the bump is non-breaking

* feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator

Lights up the bump sites the Phase 12 foundations PR registered the
series for. Every panel event the parser surfaces now reaches the
matching Prometheus counter / histogram and the matching JSON log
line on stdout.

Switch-loop bumps + logs:

- run_started: log run_started, set protocol_version gauge to the
  observed protocol version (small-integer cardinality).
- panelist_open: record the first-open wall-clock per round so
  round_end can compute round_duration_ms; subsequent opens in the
  same round leave the start time untouched.
- panelist_must_fix: bump must_fix_total with the panelist role.
  The wire event does not yet carry a dim name, so the label is
  'unspecified' for now; a future parser revision can drop in the
  real dim without a metric rename.
- round_end: bump rounds_total, observe composite_score, observe
  round_duration_ms (current ms minus the tracked start), log
  round_closed with the composite / mustFix / decision triple.
- parser_warning (parser-yielded): bump parser_errors_total with
  the kind label, log parser_recover with kind + position.

Orchestrator-side parser warnings (composite_mismatch and
duplicate_ship from the daemon-authoritative scoring checks) go
through a new emitParserWarning helper so the bus emit, the
collectedEvents push, the metric bump, and the log line stay in
lockstep. Three inline emission sites collapse to one-line helper
calls.

After the try/catch, a single terminal-status switch bumps
runs_total{status, adapter, skill} once per run, with branch-
specific log + counter:

- shipped / below_threshold: log run_shipped
- interrupted: bump interrupted_total, log run_failed{cause: interrupted}
- timed_out: log run_failed{cause: timed_out}
- failed: log run_failed{cause: orchestrator_internal}
- degraded: log degraded{reason: orchestrator_classified}

OrchestratorParams gains optional skill: string for the label;
defaults to 'unknown' so spawn sites that have not yet threaded it
keep working without a metric shape change.

Tests:
- The new metrics + logging suites (7 / 7) verify registry shape
  and event framing; orchestrator-side metric integration is
  exercised through the existing critique-conformance and
  critique-adapter-degraded suites (22 / 22 still green).
- Logger test reassigns process.stdout.write directly instead of
  vi.spyOn so the Node overloaded write signature does not
  collide with MockInstance<unknown>.

* feat(observability): Grafana dashboard JSON for Critique Theater

Three default rows mapping to the metrics this branch wires up:

1. Fleet quality: composite score p50 / p90 / p99 line graph by
   adapter, plus a heatmap of the composite distribution. The
   line graph answers 'are my agents getting better over time';
   the heatmap answers 'are the bad runs clustered around one
   adapter or smeared across the fleet'.

2. Adapter health: stacked bar charts for degraded marks (by
   adapter / reason) and parser errors (by adapter / kind) over
   a 5-minute window. The two queries together let an operator
   see 'is this adapter degraded because of malformed wire output
   or because of oversize blocks' without flipping panels.

3. Brief throughput: runs-per-hour by terminal status, an average
   rounds-per-run stat per adapter, and a round-duration ms p50 /
   p90 / p99 line. Throughput numbers fall straight out of the
   runs_total / rounds_total counters; the duration histogram is
   the same one the runs feed.

The dashboard uses a templated $datasource var (defaults to
'prometheus') so an operator with multiple Prometheus instances
can switch without editing JSON. Schema version 39 (Grafana 11).

Operators import via:

  pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json

or paste into a provisioned dashboards directory. The file is
checked into the repo as a starting artifact; alert rules and
SLO panels ship after the first 1000 runs inform the right
thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity
checked locally).

* feat(daemon): OpenTelemetry outer span around the critique run

Wraps each runOrchestrator call in a 'critique.run' span via the
existing @opentelemetry/api dep added in the Phase 12 foundations
commit. Attributes set on the span:

- critique.run_id, critique.adapter, critique.skill at start
- critique.final_status, critique.final_composite on terminal
  resolution
- span status flipped to ERROR for failed / timed_out runs so a
  Tempo / Honeycomb / Jaeger filter on traces.status=error
  surfaces the right slice without joining back to Prometheus

No exporter is wired by default; @opentelemetry/api is the API
package and intentionally splits from @opentelemetry/sdk-*, so
the span is zero-overhead until an operator attaches an SDK
through their runtime config.

Inner per-round / parse_chunk / scoreboard_eval / persist_round /
ship.persist spans defined in the Phase 12 plan are a follow-up:
the outer span alone gives the trace a duration + final status +
adapter/skill labels, which is the 80% value for dashboards that
correlate runs across services. Adding child spans inside the
existing 600-line orchestrator without restructuring is a separate
careful change.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 29 / 29 critique + metrics + logging tests still green

* fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump

nix-check failed on PR #1485 with hash mismatch in
open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after
the Phase 12 foundations commit (2b8b7445) added prom-client and
@opentelemetry/api to apps/daemon/package.json and refreshed
pnpm-lock.yaml.

CI reported the new sha:
  specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8=
  got:       7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s=

Both nix files pin the same workspace lockfile, so both flip in
lockstep. No other Nix surface changes required.

* fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2)

1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted
   agent values). The new observability path now records rs.composite
   and rs.mustFix (daemon-authoritative) instead of event.composite
   and event.mustFix when rs exists, and skips the bumps + log
   entirely when rs is missing (a degenerate round_end without any
   matching panelist_open). The dashboard p50 / p90 / p99 now agrees
   with persistence and ship decisions; an adapter reporting <ROUND_END
   composite='10'> while the daemon computed 6 logs 6 and still emits
   the composite_mismatch parser warning the prior block was already
   producing.

2. Codex P2 in server.ts (skill label always 'unknown'). The spawn
   path called runOrchestrator without passing the resolved skill id,
   so every live run bumped open_design_critique_*{skill='unknown'}
   and the per-skill dashboard breakdown was always empty. Threaded
   effectiveSkillId (already computed at the same handler scope as
   the project skill fallback) through skill: . . . so the metric
   reflects the real skill when one is assigned, and the orchestrator
   default of 'unknown' only fires for runs that genuinely have none.

3. Codex P2 in conformance.ts (protocol-version mismatch let through).
   An adapter that emitted <CRITIQUE_RUN version='2'> followed by a
   valid SHIP classified as shipped because the harness only watched
   for terminal events. Added a guard inside the parse loop: if a
   run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION,
   mark the adapter degraded with reason 'protocol_version_mismatch'
   (already in DEGRADED_REASONS) and return early. ConformanceOutcome
   union widened to accept the new reason.

4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour
   panel under-reported by 3600x). 'rate(...[1h])' returns per-second.
   Multiplied by 3600 so the panel title and unit match the actual
   value rendered.
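
The unit fix in item 4 is plain arithmetic. A minimal sketch, assuming hypothetical helper names (nothing here is taken from the dashboard JSON): a per-second rate averaged over a window has to be scaled by 3600 before it can be labelled "per hour".

```typescript
// Hypothetical helpers illustrating the per-second -> per-hour conversion.
// PromQL's rate() reports a per-second average over the chosen window.
const SECONDS_PER_HOUR = 3600;

// Per-second average for a counter that grew by `increase` over `windowSeconds`.
function perSecondRate(increase: number, windowSeconds: number): number {
  return increase / windowSeconds;
}

// Scale a per-second rate so a "runs per hour" panel shows the intended unit.
function runsPerHour(ratePerSecond: number): number {
  return ratePerSecond * SECONDS_PER_HOUR;
}
```

Without the multiplication, a panel titled "runs per hour" would display the per-second figure, which is the 3600x under-report described above.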

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites (7), existing adapter-degraded (7),
  conformance (5), rollout (10): 29 / 29 green
- Grafana JSON re-parses with node -e 'JSON.parse(...)'

* fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485)

* fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 22:11:27 +08:00
lefarcen
5172e37217
Merge origin/main into release/v0.7.0 to prepare merge-back PR
Resolves 7 conflicts via hybrid strategy:
- apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature)
- apps/web/src/components/Icon.tsx: take main (switch-case refactor)
- apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance)
- apps/web/src/index.css: take main (project-target-platforms / instructions chip styles)
- apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion
- nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash

Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics
call sites), and #1540 (release light background) are preserved via auto-merge.
2026-05-13 18:19:47 +08:00
Siri-Ray
026e13b347
fix(web): restore release header layout (#1519)
* fix(web): restore release header layout

* fix(web): disambiguate entry settings button

Generated-By: looper 0.7.4 (runner=fixer, agent=codex)
2026-05-13 14:57:25 +08:00
Caprika
6736310a01
Implement manual edit inspector (#1448)
* feat(web): tweaks palette popover with HSL hue-shift recoloring

Adds a Tweaks color-palette popover to the HTML preview toolbar.
Selecting a palette re-skins the iframe in place via a srcDoc-side
bridge that walks the DOM and shifts every chromatic paint to the
target hue while preserving each color's saturation and lightness —
pale tints stay pale, bold CTAs stay bold, just in the new color
family. Mono-noir desaturates instead of shifting.

- runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options
- file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected
- FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration
- PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir
- PreviewDrawOverlay: stub pass-through until the draw branch lands

* feat(web): hide finalize-design toolbar from project header

* test(e2e): skip project actions toolbar flow after toolbar removal

* Polish manual edit inspector

* Implement manual edit inspector

* Fix manual edit review regressions

* Fix FileViewer CI regressions

* Fix remaining manual edit review issues

* Flush manual edit styles before draw exit

* Restore Critique Theater styles

* Accept pixel line-height manual edits

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-13 13:25:58 +08:00
nettee
0f0d2879ff
Make de/fr/ru content i18n optional (#1511)
2026-05-13 12:17:17 +08:00
Nagendhra Madishetti
e2f409579d
docs: Critique Theater Phase 14 (user guide + 2 AGENTS module maps) (#1319)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.
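
The defensive rules listed above can be sketched with a much smaller state shape. This is an illustration only: `SketchState`, `SketchEvent`, and `reduce` are hypothetical names, and the real CritiqueState / PanelEvent types carry rounds, panelists, and more event variants.

```typescript
// Simplified, hypothetical shapes; not the contracts-level types.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

type SketchEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "failed"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

interface SketchState {
  phase: Phase;
  runId: string | null;
  warnings: string[];
}

const initialState: SketchState = { phase: "idle", runId: null, warnings: [] };

const TERMINAL: ReadonlySet<Phase> = new Set<Phase>([
  "shipped",
  "degraded",
  "interrupted",
  "failed",
]);

function reduce(state: SketchState, event: SketchEvent): SketchState {
  // A run_started for a different runId discards prior state and reboots.
  if (event.type === "run_started" && event.runId !== state.runId) {
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run: return the same reference (no re-render).
  if (event.runId !== state.runId) return state;
  // Late parser warnings land in a side channel without changing phase.
  if (event.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.has(state.phase)) return state;
  switch (event.type) {
    case "ship":
      return { ...state, phase: "shipped" };
    case "failed":
      return { ...state, phase: "failed" };
    default:
      return state;
  }
}
```

Returning the identical `state` object for ignored events is what lets a `useReducer` consumer skip re-rendering.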

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.
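
A minimal sketch of a reconnect schedule with those bounds. The doubling factor is an assumption (the text above states only the 1s floor and 30s ceiling), and `reconnectDelayMs` / `BackoffCounter` are hypothetical names.

```typescript
// Assumed doubling schedule, clamped to the 1s -> 30s range described above.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before reconnect attempt number `attempt` (0-based).
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

// Tracks consecutive failures; reset() models the `ready` event.
class BackoffCounter {
  private attempt = 0;

  next(): number {
    const delay = reconnectDelayMs(this.attempt);
    this.attempt += 1;
    return delay;
  }

  reset(): void {
    this.attempt = 0;
  }
}
```

Calling `reset()` when the stream reports `ready` means a connection that recovers and later drops again starts from the short delay rather than the ceiling.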

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.
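
The line-by-line parse can be sketched as follows. `MinimalPanelEvent` and `isMinimalPanelEvent` are hypothetical stand-ins; the shared isPanelEvent predicate is assumed to check more than this.

```typescript
// Hypothetical minimal event shape and predicate for illustration.
interface MinimalPanelEvent {
  type: string;
  runId: string;
}

function isMinimalPanelEvent(value: unknown): value is MinimalPanelEvent {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const runId = record["runId"];
  return typeof record["type"] === "string" && typeof runId === "string" && runId.length > 0;
}

// Parse an NDJSON transcript: one JSON value per line, bad lines skipped.
function parseTranscript(ndjson: string): MinimalPanelEvent[] {
  const events: MinimalPanelEvent[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isMinimalPanelEvent(parsed)) events.push(parsed);
    } catch {
      // Malformed line: skip it so the valid events around it still land.
    }
  }
  return events;
}
```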

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.
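
A minimal sketch of the spread-order fix, using a hypothetical `toEvent` helper rather than the real sseToPanelEvent (which is described above as also running the isPanelEvent predicate):

```typescript
// Hypothetical helper; only the ordering and the runId check are illustrated.
const CHANNEL_PREFIX = "critique.";

function toEvent(
  channel: string,
  data: Record<string, unknown>,
): Record<string, unknown> | null {
  if (channel.indexOf(CHANNEL_PREFIX) !== 0) return null;
  const type = channel.slice(CHANNEL_PREFIX.length);
  // Spread the payload first so the channel-derived type always wins.
  const event: Record<string, unknown> = { ...data, type };
  // Drop frames with a missing or empty runId.
  const runId = event["runId"];
  if (typeof runId !== "string" || runId.length === 0) return null;
  return event;
}
```

With the original order (`{ type, ...data }`), a payload field named `type` would silently replace the channel-derived value.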

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive zero-delay timer ticks via setTimeoutFn) because
transcripts do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 5 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.
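
A minimal sketch of a TTL registry with an injectable clock, assuming hypothetical class and method names (only the reasons, the 24h default, and the record fields come from the description above):

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical registry; the clock is injected so tests can advance time.
class DegradedRegistry {
  private readonly records = new Map<string, DegradedRecord>();
  private readonly now: () => number;
  private readonly ttlMs: number;

  constructor(now: () => number = () => Date.now(), ttlMs: number = DEFAULT_TTL_MS) {
    this.now = now;
    this.ttlMs = ttlMs;
  }

  markDegraded(adapter: string, reason: DegradedReason, source: string): void {
    this.records.set(adapter, { reason, source, expiresAt: this.now() + this.ttlMs });
  }

  // Expired entries stop counting without an explicit clear call.
  isDegraded(adapter: string): boolean {
    const record = this.records.get(adapter);
    if (record === undefined) return false;
    if (this.now() >= record.expiresAt) {
      this.records.delete(adapter);
      return false;
    }
    return true;
  }
}
```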

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.
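
The chunked form can be sketched as a fixed-size byte slicer. `chunkBytes` is a hypothetical helper; how the fixtures actually split the transcript is not shown in this log.

```typescript
// Split a byte buffer into slices of at most `chunkSize` bytes (default 512).
function chunkBytes(bytes: Uint8Array, chunkSize: number = 512): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray clamps the end index, so the final chunk may be shorter.
    chunks.push(bytes.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```

Slicing on raw bytes rather than on line or tag boundaries is what forces a parser to cope with a tag split across two chunks.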

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.
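
A minimal sketch of how such a gate could be computed. The nearest-rank percentile, the injected clock, and all names are assumptions; the actual bench may measure differently.

```typescript
const BUDGET_MS = 2;            // documented reducer budget
const GATE_MS = 2 * BUDGET_MS;  // CI gate with 2x headroom

// Nearest-rank percentile: the smallest sample with at least p% at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("percentile out of range");
  return value;
}

// Time `run` for `iterations` rounds using an injected millisecond clock.
function benchP99(run: () => void, iterations: number, now: () => number): number {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = now();
    run();
    samples.push(now() - start);
  }
  return percentile(samples, 99);
}
```

A test would then assert `benchP99(...) < GATE_MS`, so a noisy runner has 2x headroom while an order-of-magnitude regression still fails.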

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.
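
As a worked example of that formula, here is a small sketch. The `Role` type and `composite` function are hypothetical names; only the weights come from the text above.

```typescript
type Role = "designer" | "critic" | "brand" | "a11y" | "copy";

// Weights as documented above; they sum to 1.0 with designer at 0.
const WEIGHTS: Record<Role, number> = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

// Weighted sum of per-role scores.
function composite(scores: Record<Role, number>): number {
  const roles = Object.keys(WEIGHTS) as Role[];
  return roles.reduce((sum, role) => sum + scores[role] * WEIGHTS[role], 0);
}
```

For scores of critic 8, brand 6, a11y 7, copy 9, the sum is 3.2 + 1.2 + 1.4 + 1.8 = 7.6 (up to floating-point rounding), whatever the designer score is.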

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tabulates every file and the invariant each one owns, plus a 'when
you change anything here' guide, so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* docs: tighten Phase 14 reasoning from lefarcen review (PR #1319)

Four content gaps lefarcen flagged in the Phase 14 docs review,
addressed inline rather than deferred. The fifth item (scope-drift
between 'docs only' PR body and the cumulative stacked diff) is
handled by rewriting the PR body, not the docs.

1. Round exit conditions (lefarcen P2-1).
   docs/critique-theater.md §2 'Auto-converging rounds' now lists
   the five conditions that stop a run (threshold reached, round
   budget exhausted, per-round timeout, total timeout, user
   interrupt) with their default values. A user debugging a run
   that stopped at round 1 with composite 5.4 can read this list
   and find the matching cause without spelunking the orchestrator.

2. Prior-art comparison (lefarcen P2-2).
   New §1.5 'Why an in-CLI panel and not a third-party design lint'
   pre-answers the 'why not Figma lint / Adobe checker / Material
   You conformance' question. Three differences: rule engines vs
   generative reviewers, post-hoc vs in-loop, external service vs
   same-CLI-session.

3. Composite formula rationale (lefarcen P2-4).
   §2 now explains why each weight is set the way it is: critic
   gates correctness so it gets 0.4; brand / a11y / copy are
   secondary quality dimensions at 0.2 each; designer is at 0.0
   in v1 because aesthetic preference is not a ship gate. The slot
   stays in the schema so notes flow into the transcript and a v2
   config release can bump the weight without a wire-shape change.

4. v2 cast-config ownership (lefarcen P2-3).
   Both AGENTS.md files (daemon + web) now declare a 'Designer
   weight frozen at 0.0 until v2 cast config' invariant. The
   daemon side calls out where the SKILL.md frontmatter resolver
   lands (apps/daemon/src/critique/config.ts); the web side calls
   out where the Settings surface lands (apps/web/src/components/
   Settings/). A contributor reading either AGENTS.md before
   implementing v2 sees which module to touch first.

* docs(web): mirror the Designer-weight invariant in Theater AGENTS.md (PR #1319)

lefarcen P1 follow-up on PR #1319: the daemon AGENTS.md already
declares 'Designer weight is frozen at 0.0 until v2 cast config
lands' as an invariant, but the web AGENTS.md's parallel bullet
led with 'Composite weights are read-only on the web side' which
buried the Designer-specific constraint. A web contributor
reading that bullet would not realise the v1 weight distribution
is wire-shape (changing it mid-v1 invalidates persisted
critique_runs composite values).

Rewrote the bullet to lead with the same 'Designer weight is
frozen at 0.0 until v2 cast config lands' phrasing the daemon
side uses, and added an explicit cross-link to the daemon
AGENTS.md so the two halves of the invariant read as one rule.

Web-side specifics retained: ScoreTicker / TheaterCollapsed read
composite off the wire (no client recompute), v2 lands as a
Settings surface at apps/web/src/components/Settings/, do not
add a 'weights' prop to any component in this directory until
the contracts package carries the v2 cast type.

* docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319)

Two carryover items lefarcen flagged across the PR #1319 + #1320
reviews.

1. docs/critique-theater.md was sending users to
   /api/metrics/critique as the conformance-status check on
   malformed_block, but the Phase 12 metrics endpoint is
   explicitly deferred until after orchestrator wiring lands.
   Replaced the link with the pnpm conformance-harness command
   that DOES exist today (pnpm --filter @open-design/daemon
   vitest run tests/critique-conformance.test.ts) and noted
   that the dashboard surfaces this status as a series once
   Phase 12 ships.

2. apps/web/src/components/Theater/AGENTS.md module map was
   stale after Phase 15: the index.ts row said 'only two hooks
   are exported' but the barrel now exports
   useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled
   setter). Updated the row to list all three hooks + the
   setter + the reducer-derived contract types, and added a
   new row for hooks/useCritiqueTheaterEnabled.ts in the file
   table so a web contributor scanning the table sees the new
   hook without inferring it from the index.ts blurb.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.
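
The ack-gated flow can be sketched with injected seams. Every name below is hypothetical, and leaving the pending flag set after a successful ack is an assumption; the text above specifies only that it clears on rejection or non-2xx.

```typescript
interface AckResponse {
  ok: boolean;
}

interface RunSnapshot {
  runId: string;
  bestRound: number;
  composite: number;
}

interface InterruptDeps {
  post: (url: string) => Promise<AckResponse>;          // fetch seam
  dispatchInterrupted: (snapshot: RunSnapshot) => void; // reducer dispatch
  setPending: (pending: boolean) => void;               // button state
}

// Returns true only when the daemon acknowledged the interrupt.
async function interruptRun(
  projectId: string,
  snapshot: RunSnapshot, // captured at click time
  deps: InterruptDeps,
): Promise<boolean> {
  deps.setPending(true);
  try {
    const url = `/api/projects/${projectId}/critique/${snapshot.runId}/interrupt`;
    const res = await deps.post(url);
    if (res.ok) {
      // Terminalize the UI only after the ack.
      deps.dispatchInterrupted(snapshot);
      return true;
    }
  } catch {
    // Network rejection: fall through and let the user retry.
  }
  deps.setPending(false);
  return false;
}
```

Because nothing is dispatched before the ack, a 404 or 409 leaves the reducer in its live phase and the real SSE terminal event can still land.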

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 12:11:48 +08:00
lefarcen
e2952acd05
Revert "fix(web): restore consistent app header layout (#1432)"
This reverts commit 3d3119333c.
2026-05-13 11:20:16 +08:00