Commit graph

141 commits

Author SHA1 Message Date
PerishFire
bb13eee765
chore: optimize CI and beta release runtime (#2231)
* chore(ci): add runtime trace summaries

* chore(ci): tighten measured workspace steps

* chore(release): tighten beta setup steps

* chore(release): slim beta windows smoke

* chore(ci): shard daemon tests

* chore(ci): harden runtime trace lookup

* chore(release): avoid mac pnpm cache in beta

* chore(ci): split critical playwright checks

* chore(release): publish beta platforms from builders

* test(e2e): update beta release workflow expectation

* chore(ci): stop gating PRs on nix check

* fix(release): keep beta latest complete
2026-05-19 18:06:28 +08:00
Tom Huang
86ec951fb9
[codex] Add automation templates and proposal workflows (#2193)
* feat(web): introduce Automations tab with dual-track capability for routines

This commit adds a new Automations tab that consolidates routines, schedules, and live artifacts, allowing users to manage automations seamlessly. The tab features a modal for creating and editing automations, which supports various scheduling options (hourly, daily, weekdays, weekly) and project modes (create_each_run, reuse). The CLI is also updated to expose automation commands, ensuring consistency between the web UI and CLI interfaces.

Key changes include:
- New `NewAutomationModal` component for automation creation and editing.
- Updated `TasksView` to integrate the new Automations functionality.
- Enhanced styling for the Automations tab to improve user experience.

This implementation aligns with the dual-track capability exposure policy, ensuring all features are accessible via both the web UI and CLI.

* feat(daemon): enhance automation context handling and CLI commands

This commit introduces several improvements to the automation context management and updates the CLI commands accordingly. Key changes include:

- Added support for new context fields (`plugin`, `mcp`, `connector`) in automation commands.
- Updated the CLI to reflect new target options (`new-project`).
- Enhanced error messages for invalid target inputs.
- Introduced functions to handle context selection and normalization for routines, including the ability to parse and store context data in the database.
- Updated the database schema to include a new `context_json` field for routines.
- Improved the handling of context in routine routes and the web interface, ensuring that selected contexts are properly managed and displayed.

These changes aim to provide a more robust and flexible automation experience, aligning with the recent enhancements in the web UI.

* feat(web): enhance TasksView with automation run history and status indicators

This commit introduces several new features to the TasksView component, including:

- Added functionality to display automation run history for each routine, showing metadata such as status, timestamps, and project details.
- Implemented status indicators for routine runs, providing visual feedback on their current state (succeeded, failed, running, queued).
- Enhanced the UI to allow users to expand and view detailed run history, including the ability to open the corresponding project conversation.
- Updated styles to improve the presentation of automation statuses and history.

These changes aim to provide users with better insights into their automation routines and improve overall usability.

* feat(daemon): implement automation ingestion and proposal management

This commit introduces several new features related to automation ingestion and proposal management within the daemon. Key changes include:

- Added new modules for handling automation source packets and proposals, allowing for the storage, retrieval, and management of automation-related data.
- Implemented functions to list, create, and apply automation proposals, enhancing the automation workflow.
- Introduced new CLI commands for interacting with memory entries and automation sources, providing users with more control over their automation processes.
- Enhanced the server routes to support automation source and proposal APIs, enabling seamless integration with the existing system.

These changes aim to improve the overall automation experience, making it easier for users to manage and utilize automation proposals and ingestions effectively.
2026-05-19 16:35:28 +08:00
Siri-Ray
eb127e0f79
Remove live artifact home chip (#2221) 2026-05-19 16:35:09 +08:00
chaoxiaoche
a38e09f931
fix(web): demote Plugins and Integrations to nav rail footer (#1806)
* fix(web): demote Plugins and Integrations to nav rail footer

Plugins and Integrations are platform-configuration surfaces, not
daily-use destinations. Moving them to the footer section of the
left nav rail — separated from primary items by a thin divider —
keeps them reachable while giving the primary four items
(Home, Projects, Automations, Design Systems) the visual weight
they deserve.

- EntryNavRail: remove Plugins/Integrations from the main __group
  and place them in the __footer above the help launcher
- entry-layout.css: add __divider rule (1 px separator) to visually
  mark the boundary between primary and secondary nav regions

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): remove settings dropdown, gear opens settings directly

The gear/cog button previously opened a dropdown that mixed three
unrelated concerns: community links (X, Discord), preference quick-
access (Language, Appearance), a feature shortcut (Use everywhere),
and a redundant Settings entry — creating two separate paths to the
same Settings dialog and duplicating Language/Appearance relative to
the Settings sidebar.

Changes:
- Gear button now directly opens the Settings dialog (no intermediate
  dropdown layer)
- Follow @nexudotio on X and Join Discord moved to the Help menu at
  the bottom of the nav rail, where community/external links belong
- Language and Appearance remain exclusively in the Settings dialog
- Use everywhere remains exclusively in the topbar chip
- Remove dead state (avatarMenuOpen, languageExpanded,
  appearanceExpanded), ref, outside-click effect, and module-level
  constants (APPEARANCE_THEMES, APPEARANCE_LABEL, describeModelChip)
  that were introduced solely to support the dropdown

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(web): move output-type chips above chat input as tab bar

The home hero previously had two chip rows below the chat input that
mixed two unrelated dimensions: output type (Prototype, Slide deck,
Image…) and workflow source (Create plugin, From Figma, From folder,
From template). Users had no visual cue that these were different
categories.

This change separates the two dimensions clearly:

- Output type (create group) becomes a tab bar positioned above the
  input card. Tabs share the same chip data and onPickChip dispatch,
  so plugin selection, active state, and pending state are unchanged.
  Active tab shows a colored underline; the bar border visually
  connects to the input card below.

- Workflow source (migrate group) stays as the chip row below the
  input card, now standing alone with unambiguous "how to start"
  semantics.

- Subtitle updated from "Pick a plugin below" to "Pick a type"
  to match the new placement.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): skip replace-prompt confirmation when switching output-type tabs

Output-type tab clicks (create group) are mode-selection gestures;
the user expects the prompt to update immediately without a dialog.
The confirmation is still shown for migrate-group chips (From Figma
etc.) where the replacement carries meaningful user-provided content.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): use brand logo as Home destination, drop redundant brand mentions

The entry nav rail previously rendered the brand logo and a separate
Home icon back-to-back; both invoked `onViewChange('home')`, so the
Home button was pure duplicate affordance. The hero pane also
displayed a third "Open Design" lockup that competed with the rail
logo and the rebranded title.

This collapses those affordances:

* `EntryNavRail` drops the dedicated Home `NavButton`. The brand
  logo button now carries `aria-current="page"` and an `is-active`
  visual state when the home view is showing, and its tooltip reads
  "Open Design · Home" off-home so the navigation behavior stays
  discoverable for new users.
* `entry-layout.css` adds the matching `.entry-nav-rail__logo.is-active`
  accent treatment so the "you are here" cue reads at parity with
  primary rail buttons.
* `HomeHero` removes the inline `home-hero__brand` lockup and the
  associated CSS, then retunes the title/subtitle/type-tab spacing
  so the headline group still pairs tightly with the type tabs and
  input card below.
* `entry-chrome-flows` is updated to assert the logo carries the
  active page treatment and that no `entry-nav-home` test id
  resurfaces by mistake.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): auto-grow home chat input, disable internal scroll and drag-resize

The home hero textarea was capped at a fixed `min-height: 84px` with
`resize: vertical`, so users had to either drag the bottom-right
corner to enlarge it or scroll the textarea internally to read
longer prompts. That hid context (loaded plugin templates routinely
overflow three lines) and exposed a manual grip whose state was easy
to leave in an awkward height.

This change makes the chat box grow with its content:

* `HomeHero` adds a `useLayoutEffect` that, on every prompt change,
  resets the textarea height to `auto` so the browser can measure a
  smaller content, then writes back `scrollHeight` in pixels. The
  effect uses layout phase (not effect phase) to avoid a one-frame
  flash at the previous height when a plugin loads a long example
  prompt.
* `home-hero.css` swaps `resize: vertical` for `resize: none` and
  adds `overflow: hidden`, so the manual drag grip disappears and
  the textarea never scrolls internally. `min-height: 84px` is kept
  so the empty input still reads as a chunky chat box. The outer
  page handles overflow when the prompt is genuinely very long.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): make output-type tabs read as folder-tabs attached to chat box

The previous tab bar used a small icon + label per tab and signaled
the active tab with a 2px accent underline sitting on a horizontal
divider line. That divider visually broke the relationship between
the tabs and the input card below — the active tab looked like a
free-floating header, not a flap of the chat surface, so users had
to do extra work to mentally connect "I picked Prototype" with the
prompt area immediately underneath.

This switches the tab bar to a folder-tab pattern:

* `HomeHero` drops the per-tab `Icon` element. The seven labels
  (Prototype, Live artifact, Slide deck, Image, Video, HyperFrames,
  Audio) already disambiguate at the type sizes used here, and the
  icons were primarily decorative.
* `home-hero.css` rewrites the tab styling:
  - The bar no longer paints its own baseline border; the input
    card's top edge serves as the baseline.
  - Each tab is a rounded-top container with a 1px transparent
    border and a `-1px` bottom margin, so its bottom edge overlaps
    the card's top border by exactly one pixel.
  - The active tab borrows the card's panel background and border
    color, and overrides its bottom border with the panel color
    so it visually erases the card's top border for the tab's
    width. The result reads as one continuous "Prototype is this
    chat box" surface.
  - The card's prior `margin-top: 8px` is removed so the tab bar
    bottom and card top sit at the same y coordinate, which is the
    geometric precondition for the 1px overlap to land cleanly.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): enlarge home chat attach and submit buttons for legibility

The attach (paper-clip) and submit (arrow-up) controls on the home
chat input rendered at 32px diameter with 14px and 18px glyphs
respectively. After stroke antialiasing the icons read at roughly
11–12px on typical displays — small enough that users reported the
glyphs were illegible and had to be discovered by trial-and-error.

This bumps both controls to 38px circles and grows their glyphs
(attach 14 → 18, submit 18 → 22). The primary call-to-action now
has clearly more visual weight than the surrounding muted hint
text, and the paper-clip is recognisable at a glance.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): paint a chevron on inline select slots in the home prompt

Inline select slots in the home prompt overlay (e.g. the
`high-fidelity` / `wireframe` fidelity picker on the Live artifact
template) rendered as plain pill highlights, visually identical to
free-text and read-only text slots. The slot's `appearance: none`
strips the browser's default chevron, so users had no affordance
hinting the value was switchable.

This wraps select-type slots in an `inline-flex` span and overlays
an explicit chevron-down `Icon` against the slot's trailing padding
(bumped from 18px to 22px to make room). The chevron inherits the
accent color via the wrap's `color` property so it themes correctly
in light and dark modes. Click semantics are unchanged because the
chevron is `pointer-events: none`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): size inline number slots to their value, not the whole row

Number-type inline slots in the home prompt (e.g. the slide-count
spinner on the Slide deck template) inherited the generic slot
`min-width: 8ch` and then stretched to the browser's default
`<input type=number>` width, so a two-digit value like "10" ate
the entire remaining line and pushed the native spinner buttons
to the far right edge — far away from the value they control.

This sizes the number input to its actual content plus four
characters of trailing room for the spinner buttons (clamped to
6–14ch), and adds a `home-hero__prompt-slot-input--number`
modifier that overrides the slot `min-width` to 4ch. The
spinner now sits flush against the value and the slot reads as
a compact inline pill matching the surrounding text/select
slots.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): widen prompt line-height so slot pills do not collide vertically

Each interactive slot and mention in the home prompt paints a 2px
outline ring via `box-shadow`. At the previous `line-height: 1.55`
on a 15px font, two lines were ~23px apart while a single pill
occupied ~20px of vertical space (text box + ring on both sides),
leaving roughly 2px of clear space between rows. When the prompt
wrapped onto multiple lines — common for the Image template's
"Generate a {kind} of {subject}. Style: {style}. Aspect: {ratio}."
example — the rings from line N and line N+1 visually merged into
a single bar, making it ambiguous which pill the user was about to
click or edit.

This bumps the line-height of both the highlight overlay and the
underlying editable textarea to 1.85 (~28px per line), restoring
~8px of clear space between pill rows. The two values must stay
identical so the overlay glyphs continue to track the textarea
caret positions exactly; a brief comment in each rule documents
this coupling for future edits.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): paint dropdown chevron on select element with neutral color

The previous chevron treatment wrapped each select slot in an
inline-flex span and laid an accent-orange `<Icon>` over the
trailing padding. At the prompt's normal display size the orange
chevron blended into the orange pill ring corners, so users
perceived "a bunch of dropdown arrows" across every highlighted
slot, even though only the actual select rendered one.

Move the chevron onto the `<select>` itself as a small neutral-
gray `background-image` SVG. The grey contrasts clearly with the
pill's accent ring and the chevron lives inside the select's own
padding, so it can never visually overlap the value text and can
never appear on non-select slots.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): shrink select-slot dropdown caret so it stops reading as a check-mark

The 8×6 stroked chevron looked like a check-mark in zoomed views
because the stroke width was a large fraction of the glyph. Swap
for a small (9×5) filled triangle drawn as a `background-image`,
with `background-size` pinned so browsers can't scale it to its
intrinsic SVG box. The caret is now unambiguously a dropdown
indicator without crowding the value text.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): make read-only prompt slots visually distinct from editable pills

Multi-word context values are rendered as plain <span>s with
pointer-events: none, but they inherited the same orange pill +
ring treatment as the truly editable <input>/<select> slots.
Users couldn't tell which pills they could click into.

Strip the pill background, the ring, the radius, and the padding
from `.home-hero__prompt-slot-text` and replace them with a
subtle dashed bottom border. The orange foreground keeps the
slot family link, but read-only highlights now clearly read as
"context value spliced into the prompt" rather than as
"interactive control".

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): scale select-slot dropdown caret up to 12px wide for legibility

The 9×5 caret was too quiet at the prompt's font size to read as
a dropdown affordance. Bump to a 12×6 filled triangle and pad the
select's trailing space (24px) so the value text still has clear
breathing room from the caret.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): collapse plugins search and filter strips into one bar

The plugins-home gallery laid out search, count, mode (Featured),
total-in-catalog, clear-filters, the main category row, and the
sub-category row as four floating clusters. Search lived up in
the section header, far from the chips it actually scopes; the
mode strip wedged "Featured + 386 in catalog + Clear filters"
between the header and the category row with no obvious
ownership; and the two clear-filter affordances duplicated each
other (chip strip + empty-state).

Fold the mode strip into the category row: Featured is now the
leading chip on the same line as the category pills, and the
search field, the result count, and a compact Clear link sit as
a right-aligned tools cluster on that same row. The header
shrinks back to just the title, subtitle, and Browse registry
link. The sub-category row stays as the contextual second line
when a category is active.

Tests: keeps the existing data-testids (plugins-home-chip-featured,
plugins-home-row-category, plugins-home-clear, plugins-home-count,
plugins-home-search) so the existing 11-case section spec passes
without modification.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): hide Recent projects rail on first-run home when empty

A brand-new user landing on Home saw an empty dashed box with the
copy "No projects yet — type a prompt to start one." sitting
above the plugin gallery. The hero already prompts the user to
type, so the empty rail just adds vertical noise without telling
them anything new.

Return null from RecentProjectsStrip when the recent list is
empty (loading or not) so the rail appears only when there is
actually something to show. Update the entry-chrome e2e to
assert toHaveCount(0) on first run — codifying the new contract
that the rail is conditional, not always-mounted.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): hide structured inputs form when every field is in the prompt

The PluginInputsForm below the chat textarea surfaced every plugin
input — even the ones the prompt template already substitutes
inline as highlighted slot pills. For a plugin like Prototype
that's five identical labelled inputs duplicating the five slot
pills above them, making the chat box look like it has grown a
second composer.

Compute the set of placeholder keys actually referenced in the
prompt template (via INPUT_PLACEHOLDER_PATTERN) and skip those
when deciding what the structured form needs to render. The form
still appears for plugins that expose inputs not referenced in
the prompt text (e.g. a "Run in background" toggle), but
template-only plugins collapse the redundant second editor.

Update the picker spec accordingly: when every field is in the
template the form is now expected to be absent rather than to
render a duplicate of the inline slot.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop redundant filtered-count and Clear link beside the search

The combined plugins filter bar trailed the search field with
"59 / 386" and an inline "Clear" link, but both repeat what the
chip strip already shows: every chip carries its own count
(Slides 59, All 386, …) and clicking the All chip already resets
the filters. Users had no way to know what the bare "59 / 386"
fraction or "Clear" referred to without inference.

Strip the trailing tools cluster down to just the search input.
The empty-results message at the bottom of the gallery still
exposes a contextual "Clear filters" button when a stacked
filter yields no matches, so the affordance isn't lost — just
removed from a position where it didn't read as actionable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop default subtitle copy from the Official starters strip

The Home gallery rendered a two-line explanatory subtitle under
"Official starters" — "Ready-to-use Open Design workflows bundled
with this runtime. Pick one to load a starter prompt, or browse
the registry for more." — every visit. Returning users skip it,
new users get the same message from the section title plus the
Browse registry link plus the visual card grid itself; the prose
read as filler chrome above the chip strip.

Default the subtitle prop to undefined and only render the <p>
when a caller passes an explicit string. Other surfaces that
mount PluginsHomeSection with their own copy keep their
subtitle; the bare Home gallery loses one row of vertical noise.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fade out empty workflow lanes in Plugins home filter

Empty top-level lanes (Deploy 0, Refine 0, etc.) and empty
sub-categories (Vercel 0 under Deploy, Figma 0 under Import, etc.)
used to look identical to populated ones, so users couldn't tell at
a glance which chips were real catalog buckets and which were
"contribute a plugin" invites. The strip kept all lanes visible by
design — the workflow shape (Import / Create / Export / Share /
Deploy / Refine / Extend) is part of what we want users to see —
so we keep them clickable, but tag count-zero chips with
data-empty='true' and give them a faded, dashed-border treatment.
"All" pills stay solid since their count reflects the parent lane,
not their own emptiness.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): update home nav test expectations

* fix(e2e): align critical smoke with entry chrome

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 14:58:15 +08:00
PerishFire
bd48c597b0
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions

* ci: enforce pinned dependency specs

* ci: fix pnpm executable invocation
2026-05-19 13:58:27 +08:00
PerishFire
99b42726b8
Simplify CI PR gate (#2183) 2026-05-19 13:18:41 +08:00
shangxinyu1
f650a043d9
test(e2e): align entry coverage with redesigned flows (#2101)
* Migrate entry E2E coverage and split CI

* test(e2e): relax connectors auth error assertions

* ci: route scenario registry changes to extended e2e

* ci: decouple packaged smoke jobs from validate gate

* ci: restore pre-split workflow

* test(e2e): slim critical ui smoke coverage

* test(e2e): move broader entry flows out of critical

* test(e2e): restore entry chrome coverage to ci

* ci: parallelize workspace validation jobs

* test(web): stabilize media palette bridge assertion

* ci: cache e2e playwright browsers
2026-05-19 11:26:40 +08:00
PerishFire
4424f08be0
[codex] Add packaged desktop auto-update (#1375)
* Add packaged desktop auto-update

* Handle counted beta nightly update versions

* Refresh desktop auto-update branch for main

* Serialize desktop updater operations

* Refresh auto-update branch for packaged paths
2026-05-19 11:20:05 +08:00
Abhisek Das
3f56395e2d
fix(web): restore Settings subtab-pill hover contrast in dark theme (#1815)
The .subtab-pill button hover rule hard-coded `rgba(255,255,255,0.6)` as the
hover background. In light theme this composites to a near-white wash that
sits cleanly under `--text`, but in dark theme it overlays a translucent
white on the dark panel and produces a bright, near-white surface beneath
the same light `--text`, dropping contrast to ~1.87 (well below WCAG AA 4.5).

The reporter (#1795, on 0.6.0) called this out on five Settings surfaces.
Four of them — BYOK / Appearance / Notifications / protocol-chip — were
incidentally fixed by the c5d77a03 dark-palette refresh because they use
the .seg-control / .seg-btn hover path, not the .subtab-pill path. The
Pet → pet source tabs (Built-in / Custom / Community) still reproduce on
0.7.0 because they use .subtab-pill, which never got a theme-aware hover.

Switch the hover background to `var(--bg-muted)`, an existing token that
mirrors `--bg-panel`'s active state shape (one step lighter than
`--bg-subtle`) and is defined per-theme. Composited contrast becomes
~9.4 in dark and ~14.3 in light against `--text`, comfortably above AA.

The same .subtab-pill rule is shared by RoutinesSection and DesignsTab, so
they pick up the fix as a side effect of fixing the source class.

Anchored by a Playwright regression test under e2e/ui/ that asserts WCAG
AA across both themes for the Pet source tabs and the seg-btn surfaces.
The test goes red on this commit's parent (Pets/dark, ratio 1.87) and
green here (all four assertions pass).
2026-05-15 23:08:00 +08:00
lefarcen
91195bc4ac Merge origin/preview/v0.8.0 (PR #1830) into sync branch
Pulls in the teammate's PR #1830 — e2e align to redesigned entry flows
+ watcher and codex runtime test fixes. Conflicts resolved:

- apps/daemon/tests/runtimes/registry-and-args.test.ts: imported both
  withPlatform (HEAD, used 6x) and withEnvSnapshot (theirs, used 1x);
  both helpers are needed in the file.
- apps/web/src/components/ProjectView.tsx: kept both ChatPane-remount
  guards. Main's fix #1710 uses lastSyncedConversationIdRef tracking URL
  sync; the teammate added lastSeenRouteConversationIdRef tracking last
  observed route. Both refs are already defined in the file and the two
  checks are complementary defenses.
- e2e/ui/settings-*.test.ts (5 files): took teammate's version wholesale
  — they replaced inline `page.goto('/') + click Execution mode` with
  `gotoEntryHome + openSettingsDialogFromEntry` helpers, which is exactly
  the PR's intent.
2026-05-15 19:10:39 +08:00
shangxinyu1
2181eb376d
test(e2e): align UI suites with redesigned entry flows (#1830)
Some checks failed
nix-check / build (push) Failing after 2s
* test(e2e): align critical ui tests with new entry shell

* test(e2e): align ui suites with redesigned entry flows

* fix(e2e): correct entry chrome page helper type

* fix(daemon): stabilize watcher and codex runtime tests

* fix(daemon): satisfy strict watcher option typing
2026-05-15 19:05:39 +08:00
lefarcen
22a3b99a47 Merge origin/main into preview/v0.8.0
Sync 49 commits from main. Conflicts resolved:
- .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's
  linux specs + release-stable.yml + release-preview.yml triggers
- .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder
- apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops
  summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText
- apps/web/src/components/ChatPane.tsx: kept both new imports
- apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks
- e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's
  inline dialog navigation (UI was redesigned in v0.8.0)
- nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh
2026-05-15 18:23:33 +08:00
Olin Hendershot
74637f1cb5
Add Linux packaged client parity smoke coverage (#1204)
* docs: plan linux client issue 709

* fix: complete linux headless lifecycle routing

* feat: add linux packaged inspect

* test: add linux headless packaged smoke

* ci: add linux headless packaged smoke

* ci: smoke linux AppImage release artifacts

* docs: document linux packaged client status

* chore: finalize linux client audit remediation

* docs: add linux client publication packet

* test: harden linux client smoke coverage

* ci: preserve linux smoke audit evidence

* refactor: consolidate linux e2e helpers

Move pathExists and the desktop/web/daemon app-key array out of
linux.spec.ts into linux-helpers.ts, where expectPathInside and
linuxUserHome already live. Keeps the spec file focused on tests and
the helpers file as the canonical home for shared Linux e2e utilities.

* fix: move linux e2e helpers to lib

* fix: address linux release review blockers

* fix: drop npm dependency from containerized linux build

writeAssembledApp() previously called runNpmInstall() which executed
`npm install` directly. Inside the containerized build path,
electronuserland/builder:base strips npm/npx/corepack, so the inner
tools-pack build would fail at the assembled-app install step.

Route the install through OD_TOOLS_PACK_PNPM_BIN: buildDockerArgs sets
the env to the standalone pnpm binary it bootstraps, and the new
resolveProductionInstallCommand helper consumes that env to run
`<bin> install --prod --no-lockfile --config.node-linker=hoisted`.
Host invocations with no env set keep the prior npm behavior.
--config.node-linker=hoisted preserves the flat node_modules layout
that electron-builder packs the same way as npm-installed trees.

New tests cover the resolver branches and assert the docker-arg-to-
resolver chain end-to-end so reviewers can see the container's inner
build receives the env that switches its install away from npm.

* fix: harden linux container bootstrap

* fix: validate desktop marker liveness in headless cleanup

cleanup --headless previously skipped on any parseable desktop-root.json, trapping recovery when the AppImage had crashed and left a stale marker. Validate the marker the same way stopPackedLinuxApp does: if the PID is not in the live snapshot list, proceed through cleanup instead of skipping.

Extract the validation into validateDesktopAppImageMarker so the stop and cleanup paths share one definition of live and owned. Tests cover both branches: a stale marker drives cleanup to remove the runtime/output roots, while a live marker drives cleanup to skip and preserve them.
2026-05-15 16:38:29 +08:00
Tom Huang
c5d77a03bd
Garnet hemisphere (#1769)
Some checks failed
nix-check / build (push) Failing after 2s
* feat(chat-composer): enhance mention handling and input overlay

- Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type.
- Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction.
- Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management.
- Refactored related components to ensure consistent handling of project files and mentions across the application.

This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins.

* feat(plugin-management): enhance plugin action panels and UI components

- Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins.
- Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin.
- Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience.
- Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management.

This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins.

* fix(assistant-message): refine plugin folder candidate selection logic

- Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content.
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates.
- Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection.
- Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text.

These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates.

* feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management

- Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute.
- Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins.
- Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders.
- Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components.

This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience.

* Fix PR 1702 CI blockers

* Fix PR 1702 remaining CI checks

* Prebuild AGUI adapter after install

* Restore plugin project snapshot wiring

* feat(marketplace): refactor marketplace URL handling and enhance fetching logic

- Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations.
- Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data.
- Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management.

This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities.

* Fix project auto-send cleanup spec

* Reconcile run messages on cancel

* Use active design system as visual direction

* Fix active design system prompt wording

* feat(workspace-tabs): implement workspace tabs functionality and file attachment handling

- Introduced a new `WorkspaceTabsBar` component to manage workspace tabs, allowing users to navigate between different views (projects, marketplace, etc.).
- Enhanced file handling capabilities in the `HomeHero` and `EntryShell` components, enabling users to stage and attach files before project creation.
- Updated the `App` component to support auto-sending attachments alongside the first message in a project.
- Improved CSS styles for workspace tabs and attachment UI, ensuring a cohesive design and user experience.

This update significantly enhances the workspace navigation and file management features, providing users with a more intuitive and efficient workflow.

* refactor(workspace-tabs): streamline workspace tabs and UI components

- Removed unused components and actions from the `WorkspaceTabsBar` and `AppChromeHeader`, simplifying the codebase.
- Updated CSS styles for the workspace shell and tabs, enhancing visual consistency and reducing element sizes for a cleaner layout.
- Introduced a new client type detection mechanism to dynamically adjust the workspace shell's class, improving responsiveness.
- Added tests for the `WorkspaceTabsBar` to ensure proper navigation and tab management functionality.

These changes improve the overall performance and user experience of the workspace navigation system.

* Update critical e2e for entry modal flow

* Stabilize entry critical e2e flows

* fix(ui): adjust workspace tabs and header styles for improved layout

- Updated the CSS for workspace tabs and the app header, reducing element sizes and padding for a cleaner appearance.
- Introduced a new button in the `WorkspaceTabsBar` for quick access to the home tab, enhancing navigation.
- Minor adjustments to the layout and styles to ensure consistency across components.

These changes enhance the user interface and improve the overall user experience in the workspace navigation system.

* feat(workspace-tabs): implement pinned home tab functionality

- Added a new pinned home tab feature to the `WorkspaceTabsBar`, allowing the home tab to remain accessible during navigation.
- Updated tab management logic to collapse duplicate home tabs into a single pinned instance when restoring from local storage.
- Enhanced CSS styles for workspace tabs to accommodate the new pinned tab design.
- Updated tests to verify the behavior of the pinned home tab and its interaction with other tabs.

These changes improve navigation consistency and user experience within the workspace.

* refactor(workspace-tabs): enhance tab management and styling

- Updated CSS styles for workspace tabs, adjusting padding and flex properties for improved layout and consistency.
- Refactored tab creation logic to ensure unique IDs for project and marketplace tabs, enhancing navigation clarity.
- Removed deprecated functions related to pinned home tabs, streamlining the codebase.
- Improved test cases to verify independent behavior of home tabs during navigation.

These changes enhance the user experience by providing a more intuitive tab management system and a cleaner UI.

* style(workspace-tabs): update CSS for improved layout and visibility

- Adjusted CSS properties for workspace tabs, including overflow, position, and z-index to enhance layout and stacking context.
- Ensured consistent styling across tab components for better visual hierarchy.

These changes contribute to a more polished and user-friendly interface within the workspace.

* style(entry-layout): update CSS variables for improved layout consistency

- Replaced fixed width values with CSS variables for the entry rail to enhance flexibility.
- Adjusted padding and height properties for better visual alignment and spacing.
- Introduced a new background style for the entry main topbar to improve aesthetics.

These changes contribute to a more responsive and visually appealing layout in the entry view.

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
Co-authored-by: Eli <129168833+qiongyu1999@users.noreply.github.com>
2026-05-15 14:42:11 +08:00
chaoxiaoche
bcc58af931
refactor(web): rename Execution mode and tighten settings dialog UI (#1568)
* refactor(web): rename Execution mode and tighten settings dialog UI

- Rename "Settings → Execution & model" to "Settings → Execution mode"
  across the web UI, i18n keys, docs, and e2e selectors.
- Redesign SettingsDialog: kicker + title row in the modal head, a
  flatMap-driven agent grid that renders the inline test-result row
  beside the selected card, compact unavailable cards with right-aligned
  install/docs links, and an install guide that only shows when the
  user has no working agent picked.
- Trim verbose subtitle / hint copy across chat model, CLI proxy,
  media providers, custom instructions, and memory sections.
- Add an `info` Icon variant for the redesigned settings hints.
- Update e2e selectors and docs that referenced the old menu label.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(web): polish Settings dialog — media providers, skills, MCP

Media providers
- Hide internal Stub fixture provider (settingsVisible: false)
- Split provider list into Available (integrated, editable) and Coming
  Soon (collapsed <details> drawer with name/hint/Docs link only)
- Drop right-side Integrated/Configured badges from every row; all rows
  in the main list are integrated by definition; inline grey "Saved"
  chip next to the provider name is the only status indicator now
- "Saved" badge moves inline to the right of the provider name and uses
  a neutral grey treatment (was a standalone green pill below the name)
- "Reload from daemon" button shows a 2s green "✓ Reloaded" flash on
  success instead of leaving a permanent paragraph under the header;
  errors remain sticky

Skills
- Replace three pill-row filter banks (Source, Type, Category) with a
  compact single-row toolbar: search + three inline <select> dropdowns
  side by side; active filter highlighted with a stronger border

MCP server
- Shorten section hint to one line
- Move WHAT YOUR AGENT CAN DO capabilities above the client dropdown
  (motivate before asking to act)
- Move "Build the daemon first" warning below the code block where it
  contextually explains why the command might fail, not as a top-level
  error before the user has done anything
- Downgrade "Restart your client" left-border from accent orange to
  border-strong grey — it is a next step, not a warning

External MCP
- Shorten section hint to one line

Misc CSS
- Add .sr-only utility for accessible off-screen live regions
- Add button.ghost.is-success-flash for transient success feedback
- Add .library-filter-selects / .library-filter-select for dropdown
  filter rows
- Add .media-provider-coming-soon-* for the roadmap drawer

Co-authored-by: Cursor <cursoragent@cursor.com>

* [codex] Add Cursor Agent auth diagnostics (#1538)

* Add Cursor Agent auth diagnostics

* Handle Cursor not logged in auth status

* Address Cursor auth review feedback

* Classify Cursor stdout auth failures

* test: expand Memory and Routines coverage (#1521)

* test: expand settings and packaged coverage

* test: extend memory settings coverage

* test: cover routine settings failure states

* test: cover routine operation failures

* test: fix daemon test typing on CI

* test: decouple packaged smoke from orbit bug

* test: avoid live memory LLM calls in route tests

* test: fix daemon fetch typing in CI

* fix: restore preview comment and inspect toggles

* test: align manual edit flow with current inspector UX

* test: align comment attachment flow with current preview comments UI

* fix: probe resolved Codex launch path during detection

* fix: remove duplicate board activation helper after rebase

* test: update ghost cli detection mock

* test: align FileViewer toolbar expectation

* ci: move full app tests to extended lane

* ci: run app tests by changed scope

* ci: cover shared app inputs in test scopes

* ci: avoid setup-node cache in windows packaged smoke

* test: align extended settings and manual edit flows

* refactor(web): rename Execution mode and tighten settings dialog UI

- Rename "Settings → Execution & model" to "Settings → Execution mode"
  across the web UI, i18n keys, docs, and e2e selectors.
- Redesign SettingsDialog: kicker + title row in the modal head, a
  flatMap-driven agent grid that renders the inline test-result row
  beside the selected card, compact unavailable cards with right-aligned
  install/docs links, and an install guide that only shows when the
  user has no working agent picked.
- Trim verbose subtitle / hint copy across chat model, CLI proxy,
  media providers, custom instructions, and memory sections.
- Add an `info` Icon variant for the redesigned settings hints.
- Update e2e selectors and docs that referenced the old menu label.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(web): settings dialog UX polish — layout, dedup, and interactions

- Remove duplicate section headers from all settings sections
  (Notifications, Appearance, Privacy, About, Design Systems, Skills,
  MCP server, Connectors, Media providers, Routines)
- Restructure Notifications cards: title + toggle on same row, hint below
- Restructure Skills toolbar: search + New skill button in row 1,
  filter dropdowns in row 2 with left-aligned labels
- Restructure Pet section: tabs and Wake button on same row
- MCP server: group capabilities and setup into separate cards,
  remove nested double border on client picker
- Connectors: show connect errors as toast instead of inline card text,
  position toast inside panel, hide single-provider tab
- Media providers: move Reload button to left-aligned small ghost button
- Memory: info icon shows path on hover, Path copied badge inline;
  Extraction history and MEMORY.md as standalone collapsible cards;
  group header hidden when only one type visible
- Pet grid cards: Adopt button hidden until hover, icon-only when adopted,
  description truncated to 2 lines, text fills full width via abs positioning
- Agent cards: selected state uses accent border only, no background change
- Add sun/moon icons to Appearance theme buttons (Light/Dark)
- Shorten several hint strings for clarity

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): resolve i18n review comments from PR #1568

- Update settings.title and settings.envConfigure to localized
  "Execution mode" in all 17 non-English locale files
- Add settings.memoryFlashPathCopied to all locales and use t()
  in MemorySection instead of hardcoded English "Path copied"
- Add settings.agentModelHead to all locales and use t() in
  SettingsDialog for "Model for:" agent model row header

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): update tests to match settings dialog redesign

- Add role prop to Toast (alert/status) so error toasts from
  ConnectorsBrowser are announced immediately by screen readers
- Clear connectErrorToast on successful connector retry
- Update SettingsDialog.execution tests:
  - Remove heading assertions for About and MCP server (headers
    were intentionally removed as duplicate nav labels)
  - Rewrite CLI env test to use codex-only fields (per-agent
    filtering means only selected agent's fields are shown)
  - Update Composio key hint text assertion to match shortened copy
  - Replace filter button click with select change for Type filter
  - Replace Configured/Unsupported/Integrated badge checks with
    updated assertions matching the new media provider UI
  - Replace disabled BFL row test with coming-soon section check
- Update SettingsDialog.media test: remove Fal.ai input assertions
  (non-integrated providers no longer have editable fields)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): unblock CI for #1568

Three small fixes to get Playwright back to green on the settings
dialog redesign:

1. `en.ts`: revert `settings.envConfigure` to "Configure execution mode".
   This PR collapsed both `settings.title` (header gear) and
   `settings.envConfigure` (entry-side foot pill) to the same string
   "Execution mode", so `getByRole('button', { name: 'Execution mode' })`
   resolved to two elements and tripped Playwright strict mode in the
   three Composio-flow tests (entry-configuration-flows.test.ts:174,
   228, 285). Restoring the distinct label also gives screen readers
   a clearer hint for the pill, which doubles as a status display.
   Non-English locales still alias the two keys; happy to follow up
   on those, but they don't gate the (English-only) Playwright suite.

2. entry-configuration-flows.test.ts:167 — `Connectors` heading is now
   rendered at `<h2>` in the modal-head (SettingsDialog.tsx:1545), with
   the inner `<h3>` removed by design (see comment around line 1448).
   Updated the assertion from `level: 3` to `level: 2`.

3. project-management-flows.test.ts:360 — same change for the `Pets`
   heading.

Verified locally with `pnpm --filter @open-design/web typecheck` and
`pnpm --filter @open-design/e2e typecheck`. The actual Playwright
specs need the dev server up; I didn't rerun them here, but the
locator changes are mechanical and match the new DOM.

* fix(web): use exact match for Execution mode button locator

Playwright's `getByRole({ name })` defaults to substring matching, so
`{ name: 'Execution mode' }` still resolved to both the header gear
(aria-label "Execution mode") and the entry-side foot pill (aria-label
"Configure execution mode" — substring contains "Execution mode").
Strict mode tripped in the three composio-flow tests at lines 202,
257, and 319.

Adding `exact: true` makes each call resolve to just the header gear,
which opens the same dialog the foot pill does — the test outcomes
are unchanged.

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>
Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 14:35:06 +08:00
PerishCode
4f15c33595 Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0 2026-05-14 21:10:03 +08:00
PerishCode
43b1b94c8e Add preview release channel 2026-05-14 19:15:16 +08:00
lefarcen
3b7f87c7ae Merge remote-tracking branch 'origin/main' into reconcile/garnet-main-merge 2026-05-14 17:44:26 +08:00
PerishCode
cba8bf151d chore: align namespace lifecycle packaging 2026-05-14 16:35:46 +08:00
lefarcen
b268bbe169 Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix
Brings in 11 new garnet commits, most importantly:
- 1a90aef4 feat(plugin-use): implement plugin use handoff functionality —
  fixes the bug QA reported where /plugins Use Plugin would 422 silently
  for template plugins; new flow hands off to HomeView with the plugin
  pre-bound + input form prompted there.
- 2ac58544 feat(plugin-inputs): enhance plugin input handling with file
  upload support — extends PluginInputsForm for file uploads.
- 3b167b69 feat(plugins): registry protocol — new @open-design/registry-protocol
  workspace package (needs build before daemon boot).
- Plus enhancements to plugin metadata, GitHub installer, plugin detail
  view, login/whoami, static HTML preview paths.

Conflicts resolved:
- packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief
  field + garnet's contextPlugins (@-mention plugin context refs) both
  kept on ProjectMetadata.
- apps/landing-page/* (3 files): accepted HEAD — garnet had the older
  single-page landing-page header; main has the multi-page layout
  (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not
  related to the Use Plugin core fix.

New @open-design/registry-protocol package must be built before daemon
boots; pnpm install does this via postinstall already.
2026-05-14 16:32:35 +08:00
Nagendhra Madishetti
40766ef1ba
test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* fix(web): tighten Phase 13 gates from lefarcen review (PR #1318)

Address the actionable items from lefarcen's review of the two
Phase 13 CI gates. The two questions about longer-term DX (pre-
commit hook to auto-update the symbol table, AST-walker swap)
are documented as deferred follow-ups rather than landed here.

reducer-bench:
  - Describe renamed to 'reducer p99 regression gate (Phase 13.1)'
    so it reads as a gate, not a comparative benchmark.
  - Failure message now carries the full distribution
    (p50 / p90 / p99 / max + ceiling), so triage on a tripped gate
    can distinguish a real 20ms regression from a 4.001ms CI hiccup
    without re-running locally (lefarcen Q3).
  - Captured a baseline (p50=0.011ms p90=0.013ms p99=0.018ms
    max=0.244ms on a local Node 24 / Win11 run, 2026-05-11) inside
    the docblock so reviewers can see the actual reading sits ~222x
    below the 4ms ceiling (lefarcen Q1).
  - Replaced 'role as any' casts with PanelistRole-typed casts so
    the fixture is typecheck-strict.
  - Phase numbering corrected (13.2 → 13.1 to match the PR body).

critique-coverage:
  - Symbols now grouped under four describe blocks (SSE events /
    panelist roles / lifecycle phases / i18n keys) so a failure
    points at the category that drifted at a glance (lefarcen nit).
  - Docblock now explains the grep-over-AST trade-off (the bug
    class is structural at the string level, not at the AST level)
    and points at the future AST-walker work as a deferred follow-
    up (lefarcen Q2).
  - Docblock now walks a contributor through the four-step
    maintenance flow (add to contract → add caller → add test →
    add literal here), so the next person to add an SSE event or
    i18n key knows the gate exists and what to update (lefarcen
    Q4).
  - Phase strings switched from 'phase: <name>' to bare-quoted
    literals so the walker is robust against single vs double
    quotes and ':' vs '===' source-shape changes.
  - Dead try/catch around 'stack = [root]' removed (cannot throw).
  - Per-symbol failure messages name the symbol AND which corpus
    is missing it, so the gate is self-describing on the next
    CI red.
  - Phase numbering corrected (13.4 → 13.2 to match the PR body).

63 / 63 vitest cases green (1 bench + 62 coverage). Web
typecheck clean.

* fix(web): tighten coverage walker semantics from lefarcen P2/P3 (PR #1318)

Two follow-on findings on commit 338a185:

P2 — coverage gate weakened. The previous revision used one helper
`corpusReferences` for both SRC and TEST corpora, and that helper
accepted the unprefixed PanelEvent type form (`type: 'panelist_must_fix'`)
as a substitute for the prefixed SSE wire name (`critique.panelist_must_fix`).
The fallback is correct on the TEST side (reducer tests dispatch
PanelEvent literals) but it weakened the SRC side: production code
could drop the SSE channel name silently and the PanelEvent type
alias would keep the walker green.

Split into two helpers: `srcReferences` is strict (exact substring
match only, no fallback) and `testReferences` keeps the lenient
fallback for SSE events. The production-side assertions now route
through `srcReferences` so the wire name is load-bearing again.

P3 — maintenance doc overclaimed. The previous revision said 'CI red
if you forget step 4' but the symbol arrays are partially hand-
maintained, so a contributor adding a NEW phase string or i18n key
without updating the array leaves CI green (the walker never knew
to look). Rewrote the failure-mode section to distinguish the two
cases:

  - Renaming an EXISTING symbol without updating the walker → CI red
    (existing assertion fails because the old name is gone).
  - Adding a NEW hand-maintained symbol without updating the walker
    → CI stays green (walker does not know to look for it).

Also clarified that `SSE_EVENTS` and `PANELIST_ROLE_STRINGS` are
auto-built from contracts so step 4 is one-line for `PHASE_STRINGS`
and `I18N_KEYS` only.

63 / 63 vitest cases still green.

* fix(web): close two P2 findings on PR #1318 (Siri-Ray + lefarcen)

P2 (coverage walker counted self as evidence). The walker walked
apps/web/tests, which contains apps/web/tests/components/Theater/
critique-coverage.test.ts itself. The hand-maintained PHASE_STRINGS
and I18N_KEYS literals inside that file would satisfy the test-side
coverage assertion against themselves, so a real Theater test that
covers a symbol could be deleted and the gate would still pass.

Excluded the walker file from TEST_FILES via path.resolve(__filename)
filter so the test corpus only contains independent evidence.

Once the walker stopped seeing itself, the gate correctly red-flagged
nine i18n keys that no INDEPENDENT test exercises:
critiqueTheater.userFacingName, roundLabel, composite, threshold,
interrupt, interrupted, degradedHeading, shippedSummary,
interruptedSummary. Component tests like TheaterCollapsed.test.tsx
exercise the rendered text but never mention the key STRING, so the
walker couldn't see them. Closed that gap by adding
apps/web/tests/components/Theater/critique-i18n-keys.test.ts: 9 cases,
one per watched key, asserting the dictionary entry exists as a
non-empty string. That's both real coverage (catches a stale dict)
and the independent evidence the walker requires.

P2 (interruptedSummary missing from de/ja/ko/zh-TW). The native
locale overrides were missing the key, so an interrupted run on a
German / Japanese / Korean / Traditional Chinese UI silently fell
back to the English string via the ...en spread. Added the key with
{round} and {composite} placeholders preserved, using PerishCode's
suggested copy from the earlier review thread.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm exec vitest run tests/components/Theater tests/i18n:
  20 files / 190 tests green (critique-coverage 62 / 62,
  critique-i18n-keys 9 / 9 new, reducer-bench 1 / 1, locales 5 / 5).

* fix(web): drop the Dict cast in i18n key coverage test (lefarcen P1 / Siri-Ray on PR #1318)

The previous revision used `(en as Record<string, string>)[key]` to
read each watched key. Dict has no string index signature, so CI's
strict typecheck rejected the broad cast with TS2352 even though the
runtime assertion was fine.

Replaced with the typed pattern lefarcen suggested: type WATCHED_KEYS
as `readonly (keyof typeof en)[]` and read `en[key]` directly. That
removes the cast and also strengthens the test, because a renamed or
removed key now fails the type check immediately rather than at
runtime.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web exec vitest run
  tests/components/Theater/critique-i18n-keys.test.ts: 9 / 9 green.

* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314)

The variant validator on the web SSE path previously accepted any
`typeof === 'string'` for closed-enum fields (ship.status,
panelist_*.role, degraded.reason, failed.cause, parser_warning.kind,
run_started.cast[]) and any `typeof === 'number'` for numeric fields,
which let NaN / Infinity through. Downstream components index i18n
tables by enum value, so an unknown status or role would land
`SHIP_BADGE_KEY[final.status]` on undefined and crash the translator.

The replay parser had a separate gap: `useCritiqueReplay.parseTranscript`
called the cheap `isPanelEvent` header check directly, so a recorded
line like `{"type":"ship","runId":"r"}` reached the reducer with
composite, status, round, artifactRef, summary all undefined and
TheaterCollapsed then called `final.composite.toFixed(1)` on undefined.

Resolution: move all wire-side validation into the contract guard.

- Export const arrays for the closed enums:
  SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS,
  ROUND_DECISIONS (PANELIST_ROLES already existed).
- Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the
  single deep validator: header (known type + non-empty runId) plus
  every variant-specific required field plus closed-enum membership
  plus Number.isFinite on every numeric field. Documented as the wire
  source of truth.
- Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent
  now relies entirely on the contract guard, and parseTranscript in
  useCritiqueReplay (which already uses isPanelEvent) gets the deeper
  validation for free.

Tests (TDD, red-first):

- packages/contracts/tests/critique.test.ts: 13 new cases pinning the
  strict guard directly (well-formed across every variant, every
  rejection path: unknown type, empty/non-string runId, unknown enum,
  non-finite numeric, missing variant field).
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for
  each closed-enum rejection on the wire path plus a positive sweep
  across every legal enum value across every variant.
- apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx:
  2 new cases for incomplete and unknown-enum transcript lines.

Verified:
- pnpm --filter @open-design/contracts test 4 files / 30 tests green.
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test 107 files / 976 tests green.

* fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4)

The strict guard from PR #1314 round 3 enforced enum membership and
Number.isFinite, but accepted any finite number where the contract
intends a specific domain: scale: 0 (ScoreTicker divides by it),
negative thresholds, fractional rounds, negative mustFix, etc.
ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline
CSS and divides by it for tick width, so a guard-passing scale: 0
shipped Infinity into the rendered style. Negative composite /
score values reached downstream code that assumes >= 0.

Resolution: mirror the daemon-side Zod domain constraints in the
runtime guard.

Three new helpers in packages/contracts/src/critique.ts:

  - isPositiveInt(v): integer with v > 0. Used for round, maxRounds,
    scale, protocolVersion (all 1-indexed in the orchestrator).
  - isNonNegativeInt(v): integer with v >= 0. Used for mustFix,
    position, bestRound. bestRound: 0 is the valid sentinel for
    'interrupted before any round closed'.
  - isNonNegativeFinite(v): finite number with v >= 0. Used for
    composite, score, dimScore, threshold. Threshold may be
    fractional (e.g. 8.5 on a scale of 10).

Cross-field check inside run_started: threshold <= scale (the daemon
Zod schema enforces this with an epsilon refine, the wire guard
matches the same intent).

Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts:

  - 22 new rejection cases across every numeric field that
    previously slipped through: scale: 0, negative scale, fractional
    scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0,
    fractional protocolVersion, negative threshold, threshold > scale,
    round: 0, fractional round, negative dimScore / score, negative /
    fractional mustFix, negative composite, ship round: 0, negative /
    fractional bestRound, negative interrupted composite, negative /
    fractional parser_warning position.
  - 3 positive boundary cases that must still pass: threshold == scale,
    fractional threshold within [0, scale], interrupted with
    bestRound: 0 (no round completed before interrupt), parser_warning
    with position: 0 (start of stream).

Verified:
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/contracts test: 4 files / 59 tests green
  (was 37 before the new domain cases).
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 110 files / 1004 tests green;
  no regression on Theater suite, sse validator, replay parser, or
  assistant-feedback widget tests.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* test(e2e): move critique-coverage walker from apps/web/tests to e2e/tests (Siri-Ray P2)

The walker is by definition a cross-app consistency check: it reads
the web reducer, the daemon critique module, the contracts package,
and the e2e UI suite. Hosting it under apps/web/tests/ violated the
repo boundary rule (root AGENTS.md): app packages must not import
another app's private src/ or tests/ as a shared helper, and
cross-app consistency checks belong in e2e/tests/. The web test
lane was effectively coupled to daemon and e2e file layout, so a
daemon-only refactor could break the web lane.

Moved the file to e2e/tests/critique-coverage.test.ts and switched
the contracts import to the import.meta.glob shape the e2e package
already uses (see localized-content.test.ts), so the e2e package
does not have to add @open-design/contracts as a workspace dep just
to load two const arrays. REPO_ROOT and SELF_PATH recalculated for
the new location.

Web test lane no longer depends on daemon, contracts, or e2e layout.
The e2e walker covers the same 62 assertions as before:

  e2e/tests/critique-coverage.test.ts  62 / 62 green

Web typecheck clean, e2e typecheck clean.

* fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-14 15:55:36 +08:00
lefarcen
6c16283850 Merge origin/main (post-7c8305f4) into reconcile branch
Brings in 10 new main commits: routine deep-link to specific
conversations (#1508), Windows resource cache fix for Orbit templates,
collapsible comment side panel (#1607), routines project radio polish,
Copilot logo swap, and minor UI fixes.

Conflicts resolved:
- router.ts: garnet's home/view + marketplace routes + main's
  per-project conversationId deep-link field coexist on Route union
- ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper +
  main's isStoppableAssistantMessage helper both kept
- ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's
  phantom-row regression test); main's three new tests for
  finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker
  / shouldClearActiveRunRefs are queued as a follow-up TODO inline.
2026-05-14 15:13:38 +08:00
shangxinyu1
2976c76fc3
test: expand Memory and Routines coverage (#1521)
* test: expand settings and packaged coverage

* test: extend memory settings coverage

* test: cover routine settings failure states

* test: cover routine operation failures

* test: fix daemon test typing on CI

* test: decouple packaged smoke from orbit bug

* test: avoid live memory LLM calls in route tests

* test: fix daemon fetch typing in CI

* fix: restore preview comment and inspect toggles

* test: align manual edit flow with current inspector UX

* test: align comment attachment flow with current preview comments UI

* fix: probe resolved Codex launch path during detection

* fix: remove duplicate board activation helper after rebase

* test: update ghost cli detection mock

* test: align FileViewer toolbar expectation

* ci: move full app tests to extended lane

* ci: run app tests by changed scope

* ci: cover shared app inputs in test scopes

* ci: avoid setup-node cache in windows packaged smoke

* test: align extended settings and manual edit flows
2026-05-14 14:48:40 +08:00
Nagendhra Madishetti
e508fa3fbd
test(e2e): Critique Theater Phase 11 activation (un-fixme suite, seeded-project nav, split SSE fixture) (#1483) 2026-05-14 14:27:39 +08:00
pftom
f1d0dac58b feat(plugins): enhance plugin registry with new metadata and preview features
- Added new fields to the plugin and metadata types, including `mode`, `taskKind`, `surface`, and `preview`, to improve plugin description and categorization.
- Implemented a `previewFrom` function to generate structured previews for plugins based on their metadata, enhancing the user experience.
- Introduced a `visualKindFor` function to determine the visual representation of plugins based on their attributes, improving sorting and display logic.
- Updated the `entryFromMarketplace` and `officialEntryFromManifest` functions to accommodate the new fields and ensure proper handling of plugin entries.
- Created a new documentation file for the plugin system test suite, consolidating testing strategies and acceptance criteria for better clarity.

This update significantly enhances the plugin registry's capabilities, providing richer metadata and improved user interactions with plugin previews.
2026-05-14 11:12:51 +08:00
lefarcen
a6307bf658 chore(reconcile): remove ad-hoc Playwright screenshot scripts
These landed inside the e2e/ package during the screenshot-comparison
work for product review; they were never meant to be tracked.
2026-05-13 23:49:21 +08:00
lefarcen
53997990b7 Merge origin/main (post-0.7.0) into reconciled garnet branch
Second-pass merge layering 41+ new commits from origin/main on top of
the first reconcile commit. Headline upstream additions absorbed:

- 0.7.0 release: redesigned chat bubble user-text styling, neutralised
  palette, lucide icons, ElevenLabs audio voice option discovery in the
  prompt composer, analytics tracking (PostHog) wired across home /
  studio / create surfaces, Prometheus `/api/metrics` endpoint,
  critique-theater drop-in mount with a settings toggle.
- Misc upstream fixes (titlebar padding, release header layout, deck
  preview chrome, feedback form auto-scroll, conversation-created SSE
  on routine runs, etc.)

Conflict resolutions (12 files, ~22 hunks):

- contracts barrel + prompts/system: union of both sides; new analytics
  exports (`./analytics/events`, `./analytics/public-params`) added
  alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice
  fields (audioVoiceOptions/audioVoiceOptionsError, main) and
  pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput.
- daemon/server.ts: Prometheus `/api/metrics` route inserted after
  garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call
  added before the chat-run service init alongside the prior reconcile
  note about the dropped legacy POST /api/projects body.
- App.tsx: handleCreateProject now consumes both garnet's plugin
  fields (pluginId / appliedPluginSnapshotId / pluginInputs /
  autoSendFirstMessage) and main's analytics requestId. Tracking
  fires success + failure paths; PluginLoopHome auto-send sessionStorage
  flag is preserved.
- ProjectView.tsx: the garnet auto-send useEffect coexists with main's
  `useCritiqueTheaterEnabled()` hook.
- ChatComposer.tsx: imports merged (drop now-unused fetchSkills,
  add analytics provider + tracking + buildVisualAnnotationAttachment).
- index.css: main's redesigned `.msg.user .user-text` chat bubble
  styling wins over garnet's plain text rule; garnet's
  `.msg-plugin-chip*` rules preserved alongside.
- EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with
  reconcile decision #2. main's added PetRail / TopTab / analytics
  view tracking is intentionally NOT brought into the wrapper; the
  follow-up to re-integrate PetRail / image-templates / video-templates
  into EntryShell still stands and now also covers analytics
  view-tracking hooks.
- daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node +
  prom-client coexist).
- Test fixtures (FileWorkspace.test): kept garnet's plugin-folders
  describe block intact; main's projectKind="prototype" addition is
  dropped where it conflicted with garnet's plugin-folder fixture
  files.

Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck`
exits 0 across all workspace packages.

Follow-up not done in this commit:
- PetRail / image-templates / video-templates / 0.7.0 analytics
  view-tracking hooks need to be added to EntryShell.
- Critique-theater settings toggle UX (added on main) lives in the
  SettingsDialog hierarchy; the reconcile state preserves the
  SettingsDialog so this should work without changes, but no
  end-to-end verification yet.
2026-05-13 23:29:56 +08:00
lefarcen
d3602be666 Merge origin/main into garnet-hemisphere (reconcile)
Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the
161-commit garnet-hemisphere line, reconciling the product-vibe-coded
plugin/marketplace/EntryShell surfaces from garnet with the routines /
skills / live-artifacts feature work landed on main since the fork point.

Headline decisions (full rationale + side-by-side screenshots in
`specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`):

- #1 SettingsDialog: keep main's Memory / Skills / External MCP /
  Connectors / Routines / MCP server nav items even though the top-level
  /integrations + /automations routes also cover them. Two entries
  coexist for now; revisit once Track A/B fill in the placeholder content.
- #2 EntryView: accept garnet's thin wrapper delegating to EntryShell.
  Main's PetRail sidebar + image-templates/video-templates tabs are
  intentionally deferred to a follow-up that re-integrates them into
  the new EntryShell layout.
- #3 /integrations + /automations top-level routes: kept (garnet's
  product intent). Skills tab is still a "Coming soon" placeholder
  awaiting Track A; Routines/Schedules/Live-artifacts cards on
  /automations are still mock awaiting Track B.
- #5 DesignFilesPanel: hybrid — main's pagination as primary list,
  garnet's Plugin folders section preserved between the live-artifacts
  block and the pagination block. (by-kind sections drop in favour of
  pagination; plugin-folders rendering stays because it is a
  garnet-specific product addition.)
- #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk
  merge. Both daemon admin routes + plugin/genui routes (garnet) and
  routines/memory/skills upgrades (main) preserved. Garnet's inline
  project route block kept alongside main's `registerProjectRoutes` /
  `registerProjectUploadRoutes` modular wiring — duplicate route
  audit is a follow-up. Garnet's POST /api/projects plugin-snapshot
  resolution + default-scenario fallback is intentionally dropped from
  the inline body (now handled by registerProjectRoutes) and listed for
  follow-up re-integration into `project-routes.ts`.

Verification (worktree at /Users/elian/Documents/open-design-garnet):
- `pnpm typecheck` exits 0 across all workspace packages
- daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots,
  serves `/api/daemon/status` healthy, and survives a Playwright
  walkthrough of /integrations / /automations / home / projects /
  design-systems / plugins / settings dialog
- `@open-design/plugin-runtime` package built (was missing dist/ on
  garnet); without it the daemon's plugins/* imports fail at boot

Track A (Skills tab → real SkillsSection) and Track B (Automations
cards → real routines / live-artifacts backend) are the two remaining
follow-ups blocking the placeholder/mock content from going live. See
`spec.md` and `track-skills.md` in the same directory.
2026-05-13 22:29:21 +08:00
Nagendhra Madishetti
38a5ab69e6
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required wins, opt-out wins inversely)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.

10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - a open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.

10/10 vitest cases green (6 existing + 4 new).

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail (or worse, the
production claim 'in-session event keeps mounts coherent' was
hollow).

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates
the critique pipeline on critiqueCfg.enabled, which is just the
OD_CRITIQUE_ENABLED env var. After this commit it gates on
isCritiqueEnabled(...) from the Phase 15 resolver, so the full
priority matrix is live:

  1. Per-skill od.critique.policy veto (opt-out / required)
  2. Per-project override (M1 Settings toggle, written through the
     existing Phase 6 settings endpoint)
  3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
  4. OD_CRITIQUE_ROLLOUT_PHASE default
       M0 dark-launch      false
       M1 settings only    false
       M2 per-skill        only when skillPolicy === 'opt-in'
       M3 global default   true

Default behaviour on a fresh install is unchanged: the resolver
returns false at M0 without an env override or a project override,
so prod traffic falls through to the legacy single-pass path
exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE,
envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride
are passed as null for the v1 cutover; the daemon-side handler that
round-trips critiqueTheaterEnabled on the project settings row and
the od.critique.policy frontmatter resolver land as the next two
commits in this branch.

The three call sites that used critiqueCfg.enabled (the brand-thread
guard, the skill-thread guard, the top-line critiqueShouldRun
compound) now read from a single locally-scoped critiqueEnabledForRun
boolean, so the eligibility check is computed exactly once per spawn
and the prompt composer + orchestrator stay in lockstep the way
the existing comment already promised.

Tests still green: daemon vitest 22 / 22 across rollout +
conformance + adapter-degraded. Daemon typecheck clean.

* feat(web): mount CritiqueTheaterMount in ProjectView

The web counterpart of the daemon wireup. ProjectView now renders
<CritiqueTheaterMount projectId={project.id} enabled={...} /> as a
sibling of <AppChromeHeader> inside the top-level <div className="app">.

The mount is the drop-in from the Phase 9 stack: it owns the SSE
subscription, the kill-request handshake, and the phase-aware swap
from the live <TheaterStage> to the collapsed badge once a run
settles. The mount returns null until the daemon emits a
critique.run_started for the active project, so the visual surface
is byte-for-byte unchanged for users who have not opted in.

Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings
toggle from the existing open-design:config localStorage blob and
stays in sync with both the platform storage event (cross-tab) and
the same-tab open-design:critique-theater-toggle CustomEvent the
Phase 15 setter dispatches. The hook honors the event payload
directly so a private-mode browser that cannot persist the toggle
still updates the in-session UI correctly.

The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts)
remains the authority for whether a run is actually wired through
the critique pipeline. This hook only governs whether the web layer
renders the resulting SSE stream when the daemon emits one. The
two-layer gate is intentional: an integrator embedding the Theater
in a custom UI can flip the web visibility independent of the
daemon's routing decision, and a daemon-side env override flips
backend gating without touching the web's localStorage.

Tests still green: web Theater suite 181 / 181 across 16 files.
Web typecheck clean.

* feat(daemon): resolve od.critique.policy frontmatter at the spawn site

The next step in the wireup branch's ladder: replace the placeholder
`skillPolicy: null` with the actual value parsed from the active
skill's SKILL.md frontmatter.

Three small edits, one new field on a public type:

1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field
   carrying the parsed `od.critique.policy` token (required /
   opt-in / opt-out / null). The field is null when the skill has
   no opinion, which lets the lower-priority resolver tiers
   (projectOverride, envOverride, phase default) decide.

2. listSkills() populates the new field via a small
   `normalizeCritiquePolicy` helper that tolerates the YAML
   scalar's casing and trims whitespace. Unknown tokens collapse
   to null so a typo in SKILL.md cannot accidentally force the
   panel on or off; it just falls through. Derived example cards
   inherit the parent's policy.

3. server.ts captures `skill.critiquePolicy` into a hoisted
   `skillCritiquePolicy` variable inside the existing skill-load
   block, then threads it into the isCritiqueEnabled call as the
   skillPolicy input. The hoisting keeps the variable in scope at
   the resolver call site without restructuring the spawn handler.

After this commit, the priority matrix the rollout resolver was
designed for is live for its top tier. The previous commit wired
env + phase; this one wires skill. The projectOverride input
remains null pending the next commit that extends the Phase 6
settings endpoint.

Daemon vitest: 10 / 10 rollout cases pass against the new wiring.
Daemon typecheck: clean.

* feat(daemon): feed projectOverride into the rollout resolver from project metadata

Replaces the placeholder `projectOverride: null` in the spawn
handler with the actual value the Settings panel writes onto the
project's metadata blob: `critiqueTheaterEnabled?: boolean`.

The read is defensive at the boundary: the metadata object is
typed loosely (it round-trips through SQLite as a free-form JSON
blob), so the spawn handler narrows to `boolean` and falls
through to `null` for any other shape. A missing key, a malformed
value, or a project that has never visited Settings collapses to
`null`, which is exactly the resolver's "no opinion, fall
through to env / phase" signal.

The `critique` frontmatter slot also gets typed on the
SkillFrontmatter shape so the `od.critique.policy` chain the
previous commit introduced no longer needs a bracket-access
cast. Same pattern as the existing `craft`, `preview`, and
`design_system` nested-record slots.

After this commit, every tier of the rollout resolver's priority
matrix is wired:

  1. skillPolicy   (from SKILL.md od.critique.policy)
  2. projectOverride (from project metadata critiqueTheaterEnabled)
  3. envOverride   (from OD_CRITIQUE_ENABLED)
  4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE)

The write path for projectOverride still flows through the
existing project-update handler the Settings panel already uses
to persist project metadata; no new endpoint is needed. The
Settings UI button that calls setCritiqueTheaterEnabled and
posts the new field is the next commit on this branch.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix(daemon): forward critique events to project sinks + align composer gate (PR #1338)

Two codex review items addressed in one commit since they share the
same root cause (resolver-enabled run hits a transport / prompt
contract that was still env-gated):

P1 (transport mismatch). The daemon emits critique.* SSE frames
through critiqueBus -> design.runs.emit, which fans out on
/api/runs/:runId/events. The web CritiqueTheaterMount subscribes to
/api/projects/:projectId/events (it's project-scoped, not run-
scoped, because the mount lives at the project workspace and
follows the user across runs). Result: in production the mount
never sees a real frame and the e2e tests' stubbed routes hide the
mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the
existing runs.emit transport, AND the per-project event-sinks map.
The project-events route emits via sse.send(payload.type, payload),
so we pack the SSE channel name onto payload.type and let the sink
push the right channel. The web sseToPanelEvent overwrites type
from the channel name on the way back into a PanelEvent, so the
round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads
cfg.enabled to decide whether to append the panel addendum, but
critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run
the resolver enabled via phase / project / skill (env unset) would
have critiqueShouldRun = true while critiqueCfg.enabled remained
false, dropping the panel prompt while still routing through
runOrchestrator -> parser waits for tags that never arrive -> run
degrades.

Fixed by passing a derived config { ...critiqueCfg, enabled: true }
to the composer when critiqueShouldRun is true. The composer's own
gate now agrees with the resolver decision on every input the
spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR.
Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands
   the user on '/' (home route) where ProjectView (and therefore
   CritiqueTheaterMount) is never rendered. With the suite as
   written, every assertion would time out the first time the
   lane runs in CI, contradicting the PR body's claim that the
   suite stays parked behind test.describe.fixme.

   The state diverged from my earlier Phase 11 work because the
   merge from main on commit 4ab719c6 brought in #1307's
   squash-merged version of the e2e file (the pre-fixme shape).

   Re-applied test.describe.fixme to the describe block plus
   removed ui/critique-theater.test.ts from the test:ui:extended
   script in e2e/package.json. Added a file-header docblock
   explaining what the follow-up commit needs to do: replace
   goto('/') with /projects/:id navigation similar to
   app-design-files.test.ts, split the SSE fixture into a live
   prefix and terminal suffix (Codex P2 on PR #1320), and commit
   the first PNG baselines.

2. bestRoundOf in CritiqueTheaterMount returned the LAST round
   with a numeric composite, not the round with the HIGHEST
   composite, while bestCompositeOf correctly returned the max.
   A run that closed round 1 at 8.5 and round 2 at 6.0 would
   dispatch interrupted { bestRound: 2, composite: 8.5 } on a
   user-clicked interrupt.

   Folded the two helpers into a single bestRoundAndComposite
   that walks state.rounds once and returns the matching pair so
   the two values cannot drift. The onInterrupt callback now
   destructures from one helper instead of two independent reads.
   Falls back to (state.activeRound, 0) when no round has closed
   with a composite yet.

Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases
still green against the new helper.

* fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338)

Three lefarcen P2s on the latest review pass, all real:

1. M1 project override was half-wired: the daemon read
   metadata.critiqueTheaterEnabled but the web setter only
   wrote localStorage. A user opt-in would render the Theater
   on the web (localStorage was set) while the daemon resolved
   projectOverride=null and skipped critique unless env / phase
   already permitted. Two halves talking past each other.

   Extended setCritiqueTheaterEnabled to accept an optional
   { projectId, fetchProjectSettings } options bag. When a
   projectId is supplied, the setter ALSO sends a
   PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled
   } } so the daemon's spawn-time resolver picks the same value up
   on the next generation. The existing project-routes endpoint
   already accepts arbitrary metadata patches, so no new endpoint
   is needed. The local write + the CustomEvent dispatch still
   fire before the PATCH, so a network failure does not unwind
   the in-session UI flip. Three new vitest cases pin the new
   path: PATCHes when projectId is provided, skips when it is
   not, swallows a rejected PATCH so the in-session UI still
   flips.

2. Rollout docs (docs/critique-theater.md section 3) claimed the
   Settings toggle persists into the daemon settings store, but
   the previous implementation only had a localStorage reader /
   writer plus a daemon read of project metadata, with no
   round-trip. Rewrote the section to lead with the four-tier
   resolver (skill policy / project override / env / phase),
   document that the setter now round-trips via the existing
   PATCH endpoint when given a projectId, and call out the
   Settings panel UI control as a deliberate follow-up.

3. Troubleshooting table pointed users at /api/metrics/critique
   (Phase 12, deferred) and 'od adapters clear-degraded <id>'
   (CLI wrapper that does not exist). Replaced the metrics
   reference with the local conformance harness command
   (pnpm --filter @open-design/daemon vitest run
   tests/critique-conformance.test.ts) that ships today, with a
   note that the Phase 12 dashboard surfaces this status as a
   series once that PR lands. Replaced the CLI command with the
   programmatic clearDegraded() helper that exists today and
   flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11
existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix:
the existing CritiqueTheaterMount.test.tsx interrupt cases only
exercised a single-round state, so a future refactor back to two
independent helpers wouldn't be caught by the test suite even
though it'd reintroduce the round / composite drift bug.

Added a regression case that:

  1. Drives the reducer through two complete rounds with the
     full 5-role cast closing at distinct composites: round 1
     at 8.5, round 2 at 6.0 (the high-composite round is NOT the
     most recent one).
  2. Clicks Interrupt + waits for the daemon ack via the test
     seam fetcher returning 204.
  3. Asserts the collapsed badge displays "round 1" (the
     correct best-composite round), and queryByText for
     "round 2 ... 8.5" returns null (the buggy pairing
     would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass
and returns the matching pair, so the round number and the
composite cannot drift apart. This test locks the fix in: a
refactor that splits the helpers back into independent walks
will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } }
as the entire PATCH body. The daemon's project-routes handler only
re-stamps three immutable fields (baseDir, importedFrom,
fromTrustedPicker) before calling updateProject(db, id, patch),
which then does a shallow { ...existing, ...patch } in apps/daemon/
src/db.ts. So patch.metadata replaces the row's metadata wholesale,
dropping kind, templateId, linkedDirs, and every other field the rest
of the app reads.

No in-tree caller passes projectId today (only vitest cases), so the
bug had not surfaced yet. But the surface is documented in
docs/critique-theater.md section 3 and the function's own JSDoc as
the M1 round-trip path, so it would have shipped as a latent footgun
for the next integrator: a Settings UI follow-up, or any third party
that wires the setter into a project-aware surface.

Fix: read-merge-write rather than a bare patch.

- GET /api/projects/:id to read the row's current metadata.
- Spread that metadata into the PATCH body and overlay
  critiqueTheaterEnabled: next on top, mirroring the partial-metadata
  pattern already used in ChatComposer.tsx for linkedDirs.
- PATCH the merged object.

Failure handling:
- GET fails: skip the PATCH entirely. We cannot construct a safe
  merged body without the current state, and a bare patch would
  wipe other metadata. The in-session CustomEvent fired earlier in
  the setter still keeps every mounted hook consistent; the next
  save retries the round-trip.
- PATCH fails: log in dev. The in-session UI is already correct via
  the CustomEvent.

Tests (TDD, red-first):

- 'GETs the project then PATCHes with merged metadata when a
  projectId is supplied': stubs a GET that returns
  { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] }
  and asserts the PATCH body equals the merge plus the toggle.
- 'PATCHes with just the toggle when the project has no prior
  metadata': stubs a GET that returns no metadata block.
- 'skips the PATCH (does not stomp metadata) when the prefetch GET
  fails': stubs a rejecting GET and asserts only the GET fires.
- 'swallows a rejected PATCH after a successful prefetch': stubs a
  successful GET and a rejecting PATCH; asserts the in-session UI
  still flips via the CustomEvent.

Doc updated on the setter's JSDoc to describe the new three-step
flow (localStorage, CustomEvent, read-merge-write PATCH) and the
two failure modes.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 111 files / 1055 tests green
  (was 1052, +3 from the new merge-flow cases).

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* feat(daemon): Critique Theater Phase 12 observability foundations

Lands the metrics registry, the structured logger, the /api/metrics
route, and the adapter-degraded bump that wires up the first data
point. The orchestrator-side bumps for runs / rounds / composite /
must-fix / interrupted / parser_errors / protocol_version land in a
follow-up commit on this branch (kept separate so the wiring diff
reads cleanly against the registry shape).

Surfaces added:

- apps/daemon/src/metrics/index.ts: 9 Prometheus series under the
  open_design_critique_* namespace with the histogram buckets the
  spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 /
  2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at
  0-10 integer steps).
- apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line
  per call on stdout, namespaced critique. Matches the JSON-per-line
  convention cli.ts already uses; no new logger framework.
- apps/daemon/src/server.ts: GET /api/metrics route. Honors
  OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs.
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now
  bumps degraded_total so the adapter-health dashboard panel
  reflects every TTL refresh and every fresh mark.

Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to
apps/daemon/package.json. Both are zero-config no-ops without an
exporter wired; daemon bundle size impact is ~150 KB uncompressed.
The @opentelemetry/api dep is in place ahead of the OTel-spans
follow-up commit; it adds no behavior on this commit.

Tests:
- tests/metrics/critique.test.ts (3 cases): registry shape +
  exposition text + reset-between-tests
- tests/logging/critique.test.ts (4 cases): event shape + ordering
  + newline framing + namespace stamping

Verification (Windows-local):
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites: 7 / 7 green
- Existing adapter-degraded + conformance + rollout suites:
  22 / 22 green; the bump is non-breaking

* feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator

Lights up the bump sites the Phase 12 foundations PR registered the
series for. Every panel event the parser surfaces now reaches the
matching Prometheus counter / histogram and the matching JSON log
line on stdout.

Switch-loop bumps + logs:

- run_started: log run_started, set protocol_version gauge to the
  observed protocol version (small-integer cardinality).
- panelist_open: record the first-open wall-clock per round so
  round_end can compute round_duration_ms; subsequent opens in the
  same round leave the start time untouched.
- panelist_must_fix: bump must_fix_total with the panelist role.
  The wire event does not yet carry a dim name, so the label is
  'unspecified' for now; a future parser revision can drop in the
  real dim without a metric rename.
- round_end: bump rounds_total, observe composite_score, observe
  round_duration_ms (current ms minus the tracked start), log
  round_closed with the composite / mustFix / decision triple.
- parser_warning (parser-yielded): bump parser_errors_total with
  the kind label, log parser_recover with kind + position.

Orchestrator-side parser warnings (composite_mismatch and
duplicate_ship from the daemon-authoritative scoring checks) go
through a new emitParserWarning helper so the bus emit, the
collectedEvents push, the metric bump, and the log line stay in
lockstep. Three inline emission sites collapse to one-line helper
calls.

After the try/catch, a single terminal-status switch bumps
runs_total{status, adapter, skill} once per run, with branch-
specific log + counter:

- shipped / below_threshold: log run_shipped
- interrupted: bump interrupted_total, log run_failed{cause: interrupted}
- timed_out: log run_failed{cause: timed_out}
- failed: log run_failed{cause: orchestrator_internal}
- degraded: log degraded{reason: orchestrator_classified}

OrchestratorParams gains optional skill: string for the label;
defaults to 'unknown' so spawn sites that have not yet threaded it
keep working without a metric shape change.

Tests:
- The new metrics + logging suites (7 / 7) verify registry shape
  and event framing; orchestrator-side metric integration is
  exercised through the existing critique-conformance and
  critique-adapter-degraded suites (22 / 22 still green).
- Logger test reassigns process.stdout.write directly instead of
  vi.spyOn so the Node overloaded write signature does not
  collide with MockInstance<unknown>.

* feat(observability): Grafana dashboard JSON for Critique Theater

Three default rows mapping to the metrics this branch wires up:

1. Fleet quality: composite score p50 / p90 / p99 line graph by
   adapter, plus a heatmap of the composite distribution. The
   line graph answers 'are my agents getting better over time';
   the heatmap answers 'are the bad runs clustered around one
   adapter or smeared across the fleet'.

2. Adapter health: stacked bar charts for degraded marks (by
   adapter / reason) and parser errors (by adapter / kind) over
   a 5-minute window. The two queries together let an operator
   see 'is this adapter degraded because of malformed wire output
   or because of oversize blocks' without flipping panels.

3. Brief throughput: runs-per-hour by terminal status, an average
   rounds-per-run stat per adapter, and a round-duration ms p50 /
   p90 / p99 line. Throughput numbers fall straight out of the
   runs_total / rounds_total counters; the duration histogram is
   the same one the runs feed.

The dashboard uses a templated $datasource var (defaults to
'prometheus') so an operator with multiple Prometheus instances
can switch without editing JSON. Schema version 39 (Grafana 11).

Operators import via:

  pnpm dlx @grafana/cli dashboard import     tools/dev/dashboards/critique.json

or paste into a provisioned dashboards directory. The file is
checked into the repo as a starting artifact; alert rules and
SLO panels ship after the first 1000 runs inform the right
thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity
checked locally).

* feat(daemon): OpenTelemetry outer span around the critique run

Wraps each runOrchestrator call in a 'critique.run' span via the
existing @opentelemetry/api dep added in the Phase 12 foundations
commit. Attributes set on the span:

- critique.run_id, critique.adapter, critique.skill at start
- critique.final_status, critique.final_composite on terminal
  resolution
- span status flipped to ERROR for failed / timed_out runs so a
  Tempo / Honeycomb / Jaeger filter on traces.status=error
  surfaces the right slice without joining back to Prometheus

No exporter is wired by default; @opentelemetry/api is the API
package and intentionally splits from @opentelemetry/sdk-*, so
the span is zero-overhead until an operator attaches an SDK
through their runtime config.

Inner per-round / parse_chunk / scoreboard_eval / persist_round /
ship.persist spans defined in the Phase 12 plan are a follow-up:
the outer span alone gives the trace a duration + final status +
adapter/skill labels, which is the 80% value for dashboards that
correlate runs across services. Adding child spans inside the
existing 600-line orchestrator without restructuring is a separate
careful change.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 29 / 29 critique + metrics + logging tests still green

* fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump

nix-check failed on PR #1485 with hash mismatch in
open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after
the Phase 12 foundations commit (2b8b7445) added prom-client and
@opentelemetry/api to apps/daemon/package.json and refreshed
pnpm-lock.yaml.

CI reported the new sha:
  specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8=
  got:       7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s=

Both nix files pin the same workspace lockfile, so both flip in
lockstep. No other Nix surface changes required.

* fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2)

1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted
   agent values). The new observability path now records rs.composite
   and rs.mustFix (daemon-authoritative) instead of event.composite
   and event.mustFix when rs exists, and skips the bumps + log
   entirely when rs is missing (a degenerate round_end without any
   matching panelist_open). The dashboard p50 / p90 / p99 now agrees
   with persistence and ship decisions; an adapter reporting <ROUND_END
   composite='10'> while the daemon computed 6 logs 6 and still emits
   the composite_mismatch parser warning the prior block was already
   producing.

2. Codex P2 in server.ts (skill label always 'unknown'). The spawn
   path called runOrchestrator without passing the resolved skill id,
   so every live run bumped open_design_critique_*{skill='unknown'}
   and the per-skill dashboard breakdown was always empty. Threaded
   effectiveSkillId (already computed at the same handler scope as
   the project skill fallback) through skill: . . . so the metric
   reflects the real skill when one is assigned, and the orchestrator
   default of 'unknown' only fires for runs that genuinely have none.

3. Codex P2 in conformance.ts (protocol-version mismatch let through).
   An adapter that emitted <CRITIQUE_RUN version='2'> followed by a
   valid SHIP classified as shipped because the harness only watched
   for terminal events. Added a guard inside the parse loop: if a
   run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION,
   mark the adapter degraded with reason 'protocol_version_mismatch'
   (already in DEGRADED_REASONS) and return early. ConformanceOutcome
   union widened to accept the new reason.

4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour
   panel under-reported by 3600x). 'rate(...[1h])' returns per-second.
   Multiplied by 3600 so the panel title and unit match the actual
   value rendered.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites (7), existing adapter-degraded (7),
  conformance (5), rollout (10): 29 / 29 green
- Grafana JSON re-parses with node -e 'JSON.parse(...)'

* fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485)

* fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 22:11:27 +08:00
lefarcen
5172e37217 Merge origin/main into release/v0.7.0 to prepare merge-back PR
Resolves 7 conflicts via hybrid strategy:
- apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature)
- apps/web/src/components/Icon.tsx: take main (switch-case refactor)
- apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance)
- apps/web/src/index.css: take main (project-target-platforms / instructions chip styles)
- apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion
- nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash

Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics
call sites), and #1540 (release light background) are preserved via auto-merge.
2026-05-13 18:19:47 +08:00
Siri-Ray
026e13b347
fix(web): restore release header layout (#1519)
* fix(web): restore release header layout

* fix(web): disambiguate entry settings button

Generated-By: looper 0.7.4 (runner=fixer, agent=codex)
2026-05-13 14:57:25 +08:00
Caprika
6736310a01
Implement manual edit inspector (#1448)
* feat(web): tweaks palette popover with HSL hue-shift recoloring

Adds a Tweaks color-palette popover to the HTML preview toolbar.
Selecting a palette re-skins the iframe in place via a srcDoc-side
bridge that walks the DOM and shifts every chromatic paint to the
target hue while preserving each color's saturation and lightness —
pale tints stay pale, bold CTAs stay bold, just in the new color
family. Mono-noir desaturates instead of shifting.

- runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options
- file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected
- FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration
- PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir
- PreviewDrawOverlay: stub pass-through until the draw branch lands

* feat(web): hide finalize-design toolbar from project header

* test(e2e): skip project actions toolbar flow after toolbar removal

* Polish manual edit inspector

* Implement manual edit inspector

* Fix manual edit review regressions

* Fix FileViewer CI regressions

* Fix remaining manual edit review issues

* Flush manual edit styles before draw exit

* Restore Critique Theater styles

* Accept pixel line-height manual edits

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-13 13:25:58 +08:00
nettee
0f0d2879ff
Make de/fr/ru content i18n optional (#1511) 2026-05-13 12:17:17 +08:00
Nagendhra Madishetti
e2f409579d
docs: Critique Theater Phase 14 (user guide + 2 AGENTS module maps) (#1319)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* docs: tighten Phase 14 reasoning from lefarcen review (PR #1319)

Four content gaps lefarcen flagged in the Phase 14 docs review,
addressed inline rather than deferred. The fifth item (scope-drift
between 'docs only' PR body and the cumulative stacked diff) is
handled by rewriting the PR body, not the docs.

1. Round exit conditions (lefarcen P2-1).
   docs/critique-theater.md §2 'Auto-converging rounds' now lists
   the five conditions that stop a run (threshold reached, round
   budget exhausted, per-round timeout, total timeout, user
   interrupt) with their default values. A user debugging a run
   that stopped at round 1 with composite 5.4 can read this list
   and find the matching cause without spelunking the orchestrator.

2. Prior-art comparison (lefarcen P2-2).
   New §1.5 'Why an in-CLI panel and not a third-party design lint'
   pre-answers the 'why not Figma lint / Adobe checker / Material
   You conformance' question. Three differences: rule engines vs
   generative reviewers, post-hoc vs in-loop, external service vs
   same-CLI-session.

3. Composite formula rationale (lefarcen P2-4).
   §2 now explains why each weight is set the way it is: critic
   gates correctness so it gets 0.4; brand / a11y / copy are
   secondary quality dimensions at 0.2 each; designer is at 0.0
   in v1 because aesthetic preference is not a ship gate. The slot
   stays in the schema so notes flow into the transcript and a v2
   config release can bump the weight without a wire-shape change.

4. v2 cast-config ownership (lefarcen P2-3).
   Both AGENTS.md files (daemon + web) now declare a 'Designer
   weight frozen at 0.0 until v2 cast config' invariant. The
   daemon side calls out where the SKILL.md frontmatter resolver
   lands (apps/daemon/src/critique/config.ts); the web side calls
   out where the Settings surface lands (apps/web/src/components/
   Settings/). A contributor reading either AGENTS.md before
   implementing v2 sees which module to touch first.

* docs(web): mirror the Designer-weight invariant in Theater AGENTS.md (PR #1319)

lefarcen P1 follow-up on PR #1319: the daemon AGENTS.md already
declares 'Designer weight is frozen at 0.0 until v2 cast config
lands' as an invariant, but the web AGENTS.md's parallel bullet
led with 'Composite weights are read-only on the web side' which
buried the Designer-specific constraint. A web contributor
reading that bullet would not realise the v1 weight distribution
is wire-shape (changing it mid-v1 invalidates persisted
critique_runs composite values).

Rewrote the bullet to lead with the same 'Designer weight is
frozen at 0.0 until v2 cast config lands' phrasing the daemon
side uses, and added an explicit cross-link to the daemon
AGENTS.md so the two halves of the invariant read as one rule.

Web-side specifics retained: ScoreTicker / TheaterCollapsed read
composite off the wire (no client recompute), v2 lands as a
Settings surface at apps/web/src/components/Settings/, do not
add a 'weights' prop to any component in this directory until
the contracts package carries the v2 cast type.

* docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319)

Two carryover items lefarcen flagged across the PR #1319 + #1320
reviews.

1. docs/critique-theater.md was sending users to
   /api/metrics/critique as the conformance-status check on
   malformed_block, but the Phase 12 metrics endpoint is
   explicitly deferred until after orchestrator wiring lands.
   Replaced the link with the pnpm conformance-harness command
   that DOES exist today (pnpm --filter @open-design/daemon
   vitest run tests/critique-conformance.test.ts) and noted
   that the dashboard surfaces this status as a series once
   Phase 12 ships.

2. apps/web/src/components/Theater/AGENTS.md module map was
   stale after Phase 15: the index.ts row said 'only two hooks
   are exported' but the barrel now exports
   useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled
   setter). Updated the row to list all three hooks + the
   setter + the reducer-derived contract types, and added a
   new row for hooks/useCritiqueTheaterEnabled.ts in the file
   table so a web contributor scanning the table sees the new
   hook without inferring it from the index.ts blurb.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 12:11:48 +08:00
lefarcen
e2952acd05 Revert "fix(web): restore consistent app header layout (#1432)"
This reverts commit 3d3119333c.
2026-05-13 11:20:16 +08:00
pftom
c36609c47d feat(daemon, web): implement plugin sharing project creation and enhance CLI functionality
- Added new flags for conversation, message, agent, and model in the CLI to support enhanced plugin sharing features.
- Introduced a new API endpoint for creating share projects for plugins, allowing users to publish to GitHub or contribute to Open Design.
- Updated the UI components to facilitate the new sharing functionalities, including prompts for user input during the sharing process.
- Enhanced the project management system to handle new plugin share actions, improving user interaction and experience.
- Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system.

This update significantly enhances the plugin ecosystem by enabling users to share their creations more effectively and streamline collaboration.
2026-05-13 07:01:12 +08:00
Siri-Ray
3d3119333c
fix(web): restore consistent app header layout (#1432)
* docs: add NotebookLM GitHub export script (#1062)

* docs: add NotebookLM GitHub export script

* fix: make NotebookLM export TOC anchors work

* fix: escape TOC link text markdown chars

* fix: include merged PRs when exporting --prs all

* fix: allow --prs merged mode

* fix: treat --limit as total export budget

* fix: avoid starving buckets under global --limit

* fix: support --issues none and handle repos w/ issues disabled

* fix: avoid underfilling export when buckets empty

* fix: keep disabled-issues fallback quiet

* fix: silence disabled issues fallback

* fix: satisfy script typecheck

* prevent duplicate saves and add template deletion (#1294)

* prevent duplicate template entries on repeated save

* add delete button to saved template list

Templates can now be removed from the template picker via a hover x button, calling the existing DELETE /api/templates/:id endpoint.

* add missing onDeleteTemplate prop in test fixtures

* add template deletion flow test for NewProjectPanel

* reject template names longer than 100 characters

* preserve original createdAt on template update

* feat: add FAQ page skill (#1162)

* fix: set writable OD_DATA_DIR default for nix run

Fixes #1157

When running via 'nix run github:nexu-io/open-design', the daemon
attempted to create runtime state under the Nix store package path:

  /nix/store/.../lib/open-design/.od/projects

The Nix store is read-only at runtime, causing startup to fail with
ENOENT when mkdir() tried to create the projects directory.

This commit updates the nix run wrapper to export OD_DATA_DIR with
a writable default ($HOME/.od) when the variable is unset. Users
can still override it by setting OD_DATA_DIR before running.

The Home Manager and NixOS modules already set OD_DATA_DIR, so they
are unaffected by this change.

* feat: add FAQ page skill

Add a new skill for generating Frequently Asked Questions pages with:
- Collapsible accordion sections for Q&A pairs
- Real-time search functionality
- Category filtering (Billing, Account, Technical, General)
- Smooth animations and transitions
- Keyboard navigation support
- Mobile-friendly responsive design
- Semantic HTML with proper ARIA attributes

The skill includes:
- SKILL.md with triggers, workflow, and output contract
- example.html demonstrating a complete FAQ page with 12 questions

Use cases: help centers, support pages, product documentation

* fix: address PR review feedback for FAQ page skill

- Fix craft slugs: use accessibility-baseline and state-coverage instead of non-existent slugs
- Remove overly broad 'questions and answers' trigger
- Add edge case handling for insufficient/excessive FAQs
- Remove search highlighting requirement (XSS risk)
- Update self-check to reflect filtering instead of highlighting

Addresses review comments from @lefarcen and @chatgpt-codex-connector

* feat: add localized copy for faq-page skill

Add German, French, and Russian translations for the FAQ page skill
example prompt to fix validation test failure.

- DE: FAQ-Seite mit Akkordeon-Abschnitten, Suchfunktion und Kategoriefilterung
- FR: Page FAQ avec sections accordéon, recherche et filtrage par catégorie
- RU: Страница FAQ со складными секциями-аккордеонами, поиском и фильтрацией

* fix: escape apostrophe in French translation

Use double quotes to avoid syntax error with d'auth

* fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins (#1110)

* fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins

fnm legacy installations use ~/.fnm/node-versions. Closes #1102

* fix: remove stray .fnm token from type declaration

* docs: add Windows troubleshooting guide (#478) (#1170)

* docs: add Windows troubleshooting guide (#478)

Add docs/windows-troubleshooting.md with step-by-step fixes for the
most common native-Windows setup errors:

- Node 24 / nvm-windows gotchas (fake nvm file in System32)
- pnpm not found after installation
- Build scripts blocked by pnpm 10 (better-sqlite3, sharp)
- Visual Studio / gyp build errors
- Starting the dev server
- Optional OpenCode CLI setup

Also update CONTRIBUTING.md and QUICKSTART.md to link to the new
guide instead of the vague "file an issue if it doesn't" note.

* docs: fix Windows guide command accuracy (#1170)

Address all 6 inline review comments from lefarcen:

- Pin npm-global pnpm install to @10.33.2 (matches packageManager field)
- Use where.exe instead of bare where (PowerShell alias conflict)
- Fix OpenCode package: opencode-ai (not opencode), binary is opencode
- Add EPERM fallback note for corepack enable on protected installs
- Add Python check for gyp ERR! find Python
- Expand diagnostic checklist with corepack, python, execution policy

Also remove redundant corepack pnpm --version from checklist.

* feat(daemon): inject compiled design-system tokens + fixture into prompts (#1385)

* feat(daemon): inject compiled design-system tokens + fixture into prompts

Follow-up to #1231. The prior PR landed the structured form of two
brands (`default` + `kami`) and codified the schema; this PR teaches
the daemon to actually consume those files when assembling the system
prompt, so agents stop having to re-derive token names from DESIGN.md
prose every turn.

Gated behind `OD_DESIGN_TOKEN_CHANNEL=1` for the smoke-test phase —
flag-off keeps the daemon byte-equivalent to today's behavior, flag-on
appends two new prompt blocks (the brand's `tokens.css` :root contract
and its `components.html` reference fixture) right after the existing
DESIGN.md block. Brands without those sibling files (every brand
except `default` and `kami` today) skip silently in either mode.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(daemon): only swallow ENOENT/ENOTDIR in readFileOptional, rethrow rest

Reviewer feedback (nettee, #1385). The prior catch-all hid permission
errors, EISDIR, and broken packaged-resource paths behind the same
"undefined = absent" branch the legacy ~138-brand fallback uses,
which would let `OD_DESIGN_TOKEN_CHANNEL=1` silently degrade to the
DESIGN.md-only prompt while reporting success. That corrupts the
exact signal the smoke-test rollout depends on.

Now `readFileOptional` only returns undefined for ENOENT / ENOTDIR
(real "file does not exist" cases) and rethrows everything else.
Added a focused test that plants a directory at the tokens.css path
to exercise the EISDIR branch, plus a partial-presence regression
test to confirm the stricter contract preserves the legacy fallback.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16>
Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(daemon): make connection-test timeouts configurable (#1222)

* feat(daemon): make connection-test timeouts configurable

Provider and agent connection tests had hardcoded 12s / 45s budgets,
which are too tight for slow networks or distant providers (the user
sees "timeout" in Settings with no way to extend the budget).

- Add OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS (default 12_000)
- Add OD_CONNECTION_TEST_AGENT_TIMEOUT_MS (default 45_000)
- Invalid values (non-numeric, zero, negative, fractional) emit a
  console.warn and fall back to the default, so a typo in the env
  never silently disables the safety timeout.
- Export resolveConnectionTestTimeoutMs for unit testing; cover the
  three resolution paths (fallback / honored override / invalid).

41 connection-test tests pass (+3 new), full daemon suite 1170/1170.

* fix(daemon): reject connection-test timeout overrides above Node's setTimeout maximum

Node's `setTimeout` silently clamps any delay above `2^31-1` ms
(2_147_483_647) to ~1 ms with a TimeoutOverflowWarning. The previous
`Number.isInteger(n) && n >= 1` check accepted oversized values
unchanged and passed them straight to `setTimeout`, so an override
that *intended* to raise the budget — e.g.
`OD_CONNECTION_TEST_AGENT_TIMEOUT_MS=3000000000` — instead caused
every connection test to fail almost immediately. The safety
timeout was effectively disarmed.

Add `MAX_CONNECTION_TEST_TIMEOUT_MS = 2_147_483_647` and switch the
guard to `Number.isSafeInteger(n) && n >= 1 && n <= MAX...`. The
boundary value is still accepted; one millisecond past it falls
back with a warn. Regression test exercises `3_000_000_000`,
`2_147_483_647`, and `2_147_483_648`.

Addresses #1222 review feedback from @chatgpt-codex-connector,
@mrcfps, and @lefarcen.

* fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN SSRF bypass) (#1122)

* fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN bypass)

new URL('http://192.168.1.5./').hostname returns '192.168.1.5.' — the
trailing dot is the RFC 1034 absolute-FQDN form and resolves identically
to '192.168.1.5'. parseIpv4 fails on the dotted form, so 169.254.169.254.
slips past the metadata-service block, 192.168.1.5. slips past the LAN
block, and localhost. slips past the loopback identification.

Strip trailing dots in normalizeBracketedIpv6 so all downstream checks
(isLoopbackApiHost, isBlockedExternalApiHostname, isBlockedIpv4, IPv6
range tests) see the canonical form.

Adds 6 vitest cases covering loopback FQDN forms (localhost.,
foo.localhost., 127.0.0.1.) and SSRF FQDN bypasses (169.254.169.254.,
192.168.1.5., 10.0.0.5.).

Refs nexu-io/open-design#1119 review feedback (P2 from @lefarcen).

* test(connectionTest): tighten trailing-dot coverage per #1122 review

Two issues from #1122 review:

1. (P2 from @mrcfps + codex bot) The original `foo.localhost.` case
   asserted error===undefined on validateBaseUrl, which only proves the
   URL passed validation — not that the host is identified as loopback.
   Replaced with direct isLoopbackApiHost(...) assertions on the actual
   loopback FQDN forms (localhost., 127.0.0.1., 127.0.0.5.) so the test
   exercises the loopback path the comment claims.

2. (P3 from @lefarcen) Original blocked-FQDN tests covered only 3 of 7
   ranges that isBlockedIpv4 handles. Added a dedicated case per range
   (0.0.0.0/8, 10/8, 100.64/10, 169.254/16, 172.16/12, 192.168/16,
   multicast >=224) so future regressions in normalizeBracketedIpv6
   surface against the full coverage.

* docs: drop misleading foo.localhost./endsWith claim in normalizer comment

@lefarcen review feedback: isLoopbackApiHost only accepts exact 'localhost',
'::1', loopback IPv4, and mapped loopback IPv4 — there's no subdomain or
endsWith handling, so referencing 'foo.localhost.' overstates what the
trailing-dot strip enables. Rewrite the comment to match actual call sites
(isLoopbackApiHost equality + isBlockedIpv4 numeric parse).

* feat(daemon): export self-contained HTML via /export/*?inline=1 endpoint (#1312)

* test(daemon): add Red unit tests for inlineRelativeAssets helper

14 cases pinning the behavior contract for the upcoming
apps/daemon/src/inline-assets.ts helper:

- link/script inlining with verbatim body preservation
- non-src script attrs preserved (type=module, defer, crossorigin)
- relative path resolution (root + nested + deep-nested owners)
- self-closing and single-quoted attr forms
- negative cases: missing rel, rel=preload, absolute/data/blob/leading-slash
- escaping: </style and </script inside body
- null-fileReader graceful degradation
- duplicate identical tags fully replaced (diverges from
  apps/web/src/components/FileViewer.tsx:5313's first-match-only;
  locked decision per plan §3.3)
- HTML-escaped data-od-inline-asset attr

Tests intentionally Red — module ../src/inline-assets.js does not yet
exist. Phase B-G of plan declarative-roaming-gosling.md will turn them
green by porting FileViewer.tsx:5248-5354 server-side.

Refs nexu-io/open-design#368.

* feat(daemon): port inlineRelativeAssets server-side for export endpoint

Adds apps/daemon/src/inline-assets.ts — a pure helper that takes
(html, ownerFileName, fileReader closure) and returns the HTML with
every relative <link rel=stylesheet> and <script src> contents inlined
into <style data-od-inline-asset="…">/<script>…</script> blocks. The
fileReader closure keeps the helper free of fs/Express coupling so the
route handler owns the filesystem boundary.

Port source: apps/web/src/components/FileViewer.tsx:5248-5354 — five
functions (inlineRelativeAssets, resolveProjectRelativePath, baseDirFor,
readHtmlAttr, escapeHtmlAttr). The fetch hop becomes the fileReader
closure; replace-all replaces first-match-only per locked design
decision §3.3 (inline comment in inline-assets.ts cites the divergence
from FileViewer.tsx:5313 and notes the web inline path is on a
deprecation track since PR #384 made URL-load the default).

Phase B-G of plan declarative-roaming-gosling.md. All 14 unit cases
from the Red commit (a60a9023) now pass; tightens one case to use a
realistic '&'-only filename (the original `<`/`>`-bearing filename was
unreachable in real filesystems and exposed a regex limitation the
web client carries too).

Daemon delta: +14 tests (1704 → 1718). Typecheck clean.

Refs nexu-io/open-design#368.

* test(daemon): add Red integration tests for /export/*?inline=1 route

9 HTTP cases against GET /api/projects/:id/export/*?inline=1:

- 3-file React-ish layout returns self-contained HTML (wiring guard:
  body assertions catch removal of the await inlineRelativeAssets(...)
  line, not just helper-internals changes)
- missing inline / non-canonical values (0, false, foo, empty) → 400
- non-HTML file → 400 UNSUPPORTED_FILE_TYPE
- missing file → 404 FILE_NOT_FOUND
- invalid project id (..) → some 4xx (Express normalizes before route)
- null-origin OPTIONS preflight → 204 + Access-Control-Allow-Origin: *
- missing sibling asset → 200 with <link> tag intact, other asset inlined
- nested HTML entry (pages/index.html + ../shared/util.js) → 200 inlined

8 of 9 tests Red (404 / 403); the invalid-project-id case is tolerant
about how Express rejects .. so it accidentally passes Red — Green
will tighten to 400 BAD_REQUEST via isSafeId.

Phase C-R of plan declarative-roaming-gosling.md. C-G will register
the route in apps/daemon/src/import-export-routes.ts.

Refs nexu-io/open-design#368.

* feat(daemon): wire GET /api/projects/:id/export/*?inline=1 endpoint

Adds the export-inline endpoint into registerProjectExportRoutes
(import-export-routes.ts) alongside /export/pdf and /archive. The
route:

- Validates project id via ctx.validation.isSafeId
- Requires ?inline=1 (accept-list: 1 / true / yes / on, matching Part
  1's parseForceInline at file-viewer-render-mode.ts:59-66)
- Reads the owner HTML via ctx.projectFiles.readProjectFile; maps
  ENOENT to 404 FILE_NOT_FOUND, everything else to 400 BAD_REQUEST
- Gates non-HTML callers with 400 UNSUPPORTED_FILE_TYPE
- Builds a fileReader closure that silently returns null on any sibling
  read failure (failure-local, not fatal — matches the web client's
  null-filter at FileViewer.tsx:5311)
- Hands the buffer + relPath to inlineRelativeAssets and returns the
  result as text/html

DI: RegisterProjectExportRoutesDeps gains 'projectFiles' | 'validation';
server.ts:2879 passes the corresponding deps. Mirrors the dep shape of
RegisterFinalizeRoutesDeps used by PR #832's /finalize/anthropic.

Null-origin support intentionally omitted (decision §10 in the PR
description): the daemon's null-origin allowlist is /raw/* and
/codex-pets/.../spritesheet only, and export consumers are same-origin
UI or server-side tooling — sandboxed-iframe srcdoc previews fetch
/raw/* instead. Integration test #7 pins the 403 contract so a future
allowlist change is deliberate.

Phase C-G of plan declarative-roaming-gosling.md. All 23 tests green
(14 unit + 9 integration); full daemon suite 1727 passing (delta +9
over B-G's 1718). Typecheck clean.

Refs nexu-io/open-design#368.

* test(daemon): add Red regression for inlined-body tag-literal corruption

Reproduces the correctness bug Siri-Ray (looper) and codex-bot flagged
on PR #1312: the reduce/split-join approach in inlineRelativeAssets
re-scans the progressively mutated HTML, so a tag literal that happens
to appear inside an already-inlined asset body gets the inner literal
also replaced — corrupting the body and producing duplicate inlining.

Concrete reproducer (CSS, where </style escape doesn't touch <link>):

  HTML:    <link rel="stylesheet" href="a.css">
           <link rel="stylesheet" href="b.css">
  a.css:   /* see also <link rel="stylesheet" href="b.css"> */
  b.css:   body{color:red}

Under split/join the second pass splits on `<link rel="stylesheet"
href="b.css">` and matches BOTH the real outer tag AND the literal
inside a.css's comment. Result: b.css's <style> block is injected
inside a.css's comment, and b.css gets inlined twice.

Phase F-R of plan declarative-roaming-gosling.md (post-PR-#1312
review round). F-G will rewrite the helper to collect matches by
position in the original HTML and concat slices in a single pass,
so already-inlined content is never re-scanned.

Refs nexu-io/open-design#1312 review threads at
apps/daemon/src/inline-assets.ts:122 (Siri-Ray looper + codex bot).

* feat(daemon): replace inliner reduce/split-join with position-based concat

Fixes the inlined-body tag-literal corruption Siri-Ray (looper) +
codex-bot flagged on PR #1312. The previous `replaceAllOccurrences`
(`source.split(from).join(to)`) re-scanned the progressively mutated
HTML on each pass, so a tag literal that appeared inside an already-
inlined CSS/JS body got the inner literal replaced too, producing
duplicate inlining and corrupted bodies.

New shape: collect every match's {start, end} byte span from the
ORIGINAL html via `matchAll`, await the per-match replacements in
parallel, sort by start, and concat slices of the original html with
the replacement strings in a single pass. Text introduced by an
earlier replacement is never scanned for matches.

The dup-tag fix (decision §8 — replace every occurrence, not
first-match-only) is preserved: every original-tag position gets its
own slice, so all duplicates are inlined.

Also extracts buildInlineStyleBlock / buildInlineScriptBlock so the
match-collection loops stay readable.

Phase F-G of plan declarative-roaming-gosling.md. Regression test
(c809bccc) goes Green; all 24 unit + integration tests pass; daemon
suite still clean.

Refs nexu-io/open-design#1312.

* test(daemon): add Red CSP-sandbox test + P3 coverage gaps from PR #1312 review

Three tests covering lefarcen's review on PR #1312:

1. [Red] CSP sandbox header (P2, lefarcen @ import-export-routes.ts:423).
   Top-level browser navigation to /export/*?inline=1 sends no Origin
   header, so the daemon middleware lets it through and any JS in the
   exported document runs with daemon-origin privileges. Asserts the
   response sends `Content-Security-Policy: sandbox allow-scripts` so
   the browser treats it as a sandboxed iframe with an opaque origin
   (scripts still run, but no cookies / no /api/ access). This test
   fails until G1-G adds the header in the handler.

2. [Green-on-commit] Accept-list cases (P3, lefarcen @ test.ts:262).
   PR body decision §7 promises `inline=true/yes/on` case-insensitive,
   but round-1 tests only exercised inline=1. Pin the full accept list
   (true / yes / on + TRUE / Yes / ON). Already passes — the route's
   parser already implements the accept list; this just makes the
   contract testable.

3. [Green-on-commit] isSafeId guard (P3, lefarcen @ test.ts:287).
   Previous `..` test was normalized by Express before reaching the
   route. New input uses `bad!id` (URL-safe, but outside isSafeId's
   /^[A-Za-z0-9._-]+$/ char class), so Express passes it into
   req.params unchanged and isSafeId rejects with the documented
   400 BAD_REQUEST envelope.

Phase G1-R / H of plan declarative-roaming-gosling.md. Refs
nexu-io/open-design#1312 review comments.

* feat(daemon): send Content-Security-Policy: sandbox allow-scripts on /export

Closes the same-origin XSS surface lefarcen flagged on PR #1312 (P2
at import-export-routes.ts:423): top-level browser navigation to the
export URL sends no Origin header, so the daemon's /api middleware
admits the request and any JS in the exported document executes with
daemon-origin privileges (cookies, /api/, localStorage).

`Content-Security-Policy: sandbox allow-scripts` on the response
makes the browser treat the document as a sandboxed iframe with an
opaque origin. Scripts still execute (necessary for the screenshot
use case — the whole point of inlining JS), but they cannot read
cookies, hit /api/, or otherwise escalate to the daemon's origin.

Phase G1-G of plan declarative-roaming-gosling.md. Daemon delta: +3
tests (the Red CSP test from 58151356 turns Green; the P3 coverage
gap tests stay green).

Refs nexu-io/open-design#1312.

* test(daemon): add Red regression for <link> stylesheet attr preservation

Currently `<link rel="stylesheet" href="print.css" media="print">`
becomes a plain `<style data-od-inline-asset="print.css">…</style>`
with no media query — print-only styles apply unconditionally. Same
problem for `title` (alternate stylesheet sets), `disabled` (initial
disabled state), and `nonce` (CSP nonce). All four are valid on
both `<link rel=stylesheet>` and `<style>` per HTML spec, so the
inliner must carry them across.

PR #1312 round-2 review (lefarcen P2 @ inline-assets.ts:44). Phase
G2-R; G2-G will extend buildInlineStyleBlock to copy the four attrs
off the source <link>.

Refs nexu-io/open-design#1312.

* feat(daemon): preserve <link> stylesheet semantics on inlined <style>

Closes lefarcen's P2 review note on PR #1312 (inline-assets.ts:44):
`<link rel="stylesheet" href="print.css" media="print">` was becoming
a plain <style> with no media query, so print-only styles applied
unconditionally. Same issue for `title` (alternate stylesheet sets),
`disabled` (initial disabled state), and `nonce` (CSP nonce).

buildInlineStyleBlock now carries four attrs across from the source
<link>:
  - media, title, nonce  (value attrs, HTML-escaped via escapeHtmlAttr)
  - disabled             (boolean attr — copied as bare presence)

Other <link> attrs (rel, href, type, crossorigin, integrity,
referrerpolicy) don't apply to <style> and are intentionally dropped.

New `hasBooleanHtmlAttr` helper distinguishes presence-as-attr from
substring-inside-another-attr-value via a regex that requires a
word boundary after the name (whitespace, `=`, or `>`).

Phase G2-G of plan declarative-roaming-gosling.md. All 28 tests pass.

Refs nexu-io/open-design#1312.

* docs(daemon): narrow inliner contract claim + document size-limit policy

Closes lefarcen's P2 review notes on PR #1312:

1. "Self-contained" incomplete (inline-assets.ts:67): the helper
   only rewrites top-level <link rel=stylesheet> / <script src>.
   `<img src>`, CSS `url(...)`, CSS `@import`, ES module imports,
   font sources, and similar remain external in the response. The
   PR title/body claimed "self-contained HTML" which over-promised
   for screenshot tooling expecting bundled images/fonts.

   Module docstring now enumerates the full not-rewritten list and
   names the screenshot path as the primary use case (headless
   browser fetches each external asset on render, so inline-CSS-
   and-JS-only is sufficient). The route handler comment block
   mirrors the contract.

   A fully offline export with image/font bundling is filed as a
   follow-up — out of scope for this PR.

2. No response cap (inline-assets.ts:72): the helper does
   concurrent reads + multiple string copies and could spike daemon
   memory. The daemon is local-first (single-user, developer's
   machine — see open_design_architecture.md), so the effective
   ceiling is the size of the user's own project. The docstring now
   states this rationale and names the conditions under which a
   bounded-concurrency reader and output-size limit would be
   needed (non-trusted callers).

Docs-only — no behavior change, all 28 tests still pass.

Refs nexu-io/open-design#1312.

* test(daemon): add Red regression for hasBooleanHtmlAttr quoted-value match

PR #1312 round-2 review (lefarcen P3): `hasBooleanHtmlAttr` tests the
tag string with no attr-quoting awareness, so the literal text
`disabled` appearing inside any quoted attribute value followed by
another whitespace char satisfies `\sdisabled(?=\s|=|/?>)`.

  <link rel=stylesheet href=x.css data-note="content disabled stuff">

emits a <style disabled> block, silently disabling a stylesheet the
author wrote without that attr.

Also adds a counterweight test for the legitimate-disabled case
(<link … disabled>) so the next-commit fix doesn't over-correct
and start dropping real boolean attrs.

Phase I3-R of plan declarative-roaming-gosling.md (post-PR-#1312
round-2 review). I3-G will strip quoted attribute values from the
tag string before testing for the bare attr.

Refs nexu-io/open-design#1312.

* feat(daemon): make hasBooleanHtmlAttr quote-aware to avoid false positives

Closes lefarcen's P3 review note on PR #1312:
`hasBooleanHtmlAttr` previously ran `\sname(?=\s|=|/?>)` over the
full tag string, so the literal text `disabled` appearing inside any
quoted attribute value followed by whitespace satisfied the regex.
Source tags like `<link rel=stylesheet href=x.css
data-note="content disabled stuff">` were emitting a <style
disabled> block — silently disabling a stylesheet the author wrote
without that attr.

Fix: strip `="…"` and `='…'` substrings out of the tag with two
regex passes BEFORE testing for the bare attr. The lookahead still
requires `\s|=|/?>` after the attr name, so `<link disabled>`,
`<link disabled="">`, `<link disabled/>`, etc. all match — but the
attr name as a substring of any quoted value cannot match because
values have been stripped to `""` / `''`.

Phase I3-G of plan declarative-roaming-gosling.md. All 30 tests
green (28 prior + 2 round-3 regression cases: false-positive and
legitimate-disabled). Refs nexu-io/open-design#1312.

* test(daemon): add Red cap-enforcement tests + scaffold InlineOptions

PR #1312 round-2 review (lefarcen P2 — still open): round-2 only
documented that no cap is enforced. Reviewer pushed back: the helper
still builds unbounded candidate arrays + runs Promise.all over all
asset reads + concatenates the full output in memory. Need actual
limits in code.

This commit adds the Red test surface that drives the next commit's
enforcement:

  - InlineAssetsLimitError("owner") when owner HTML > maxOwnerBytes
  - InlineAssetsLimitError("candidates") when tag matches > maxCandidates
  - Per-asset graceful: oversized asset → tag stays as URL ref
  - InlineAssetsLimitError("total") when assembled output > maxTotalBytes
  - Bounded read concurrency: peak in-flight reads ≤ maxReadConcurrency
  - Integration: route maps the throw to 413 PAYLOAD_TOO_LARGE

InlineOptions interface is added to the helper signature as a no-op
test-door (per feedback_test_doors_over_fake_timers.md), so tests
can exercise tiny fixtures while production callers use module-level
defaults. The next commit (H3-G) wires the enforcement.

Phase H3-R of plan declarative-roaming-gosling.md. Daemon delta on
this commit: +6 tests (5 unit + 1 integration), all Red.

Refs nexu-io/open-design#1312.

* feat(daemon): enforce inliner caps + map limit errors to 413 PAYLOAD_TOO_LARGE

Closes lefarcen's still-open P2 review on PR #1312 round 2 ("the
code still builds unbounded candidate arrays + Promise.all over all
asset reads + concatenates the full output in memory"). Caps are now
enforced in code with the documented defaults:

  MAX_INLINE_OWNER_BYTES       = 2 MiB
  MAX_INLINE_ASSET_BYTES       = 5 MiB per sibling
  MAX_INLINE_CANDIDATES        = 500 link/script matches
  MAX_INLINE_TOTAL_BYTES       = 50 MiB assembled output
  MAX_INLINE_READ_CONCURRENCY  = 8 simultaneous fileReader calls

Enforcement points:

- Owner cap (input): fires immediately at function entry. Cheap —
  Buffer.byteLength of the already-decoded UTF-8 string.
- Candidate cap (planning): fires after matchAll, BEFORE any sibling
  read. Pathological HTML with thousands of <link>/<script src>
  tags is rejected without opening a single file descriptor.
- Asset cap (per-sibling): post-read length check; oversized assets
  return null from the wrapped reader, so the tag stays as a URL ref
  and the response is still 200. This is the only "graceful" cap —
  one bad asset doesn't fail the whole export.
- Total cap (output): tracked across the slice-and-concat loop,
  guarding both preserved-html slices AND injected replacements.
- Concurrency cap (planning): a tiny in-module runWithConcurrency
  worker-pool keeps at most maxReadConcurrency fileReader calls in
  flight, with order-preserving results.

`InlineAssetsLimitError` carries a `limit` discriminator so logs and
clients can disambiguate owner/asset/candidates/total. The route
handler catches it and emits 413 PAYLOAD_TOO_LARGE.

Drive-by error-envelope fix while in the route: UNSUPPORTED_FILE_TYPE
(an unregistered ApiErrorCode) → UNSUPPORTED_MEDIA_TYPE (the
canonical code) with HTTP 415. The round-1 string was a slip;
caught by reading packages/contracts/src/errors.ts:11 while wiring
PAYLOAD_TOO_LARGE.

Phase H3-G of plan declarative-roaming-gosling.md. All 36 tests
green (28 prior + 2 round-3 quoted-attr + 5 cap unit + 1 cap
integration). Refs nexu-io/open-design#1312.

* feat(daemon): enforce inliner caps pre-buffer via AssetHandle contract

Closes lefarcen's still-open P2 review on PR #1312 round 3 ("the
helper enforces maxTotalBytes only after all candidate assets have
already been read and converted to replacement strings" /
"maxAssetBytes is checked after fileReader fully buffers each
sibling"). Round-3 caps were defensive against the final output
size but did not bound peak memory during read fanout — 500 assets
at 5 MiB each could materialize ~2.5 GiB before the 413 fired.

Contract change: InlineAssetReader now returns `AssetHandle | null`
where AssetHandle is `{ readonly size: number; read(): Promise<...> }`.
Callers expose `size` from a cheap stat-equivalent (the route uses
`resolveProjectFilePath`) and defer the full materialization to
`read()`. The helper checks size against maxAssetBytes BEFORE
invoking read, and against the running total BEFORE the reservation
is committed.

Enforcement flow inside runWithConcurrency:

  1. await fileReader(p.resolved) → cheap stat-only call
  2. if (handle.size > maxAssetBytes) return null   ← pre-buffer
  3. if (runningBytes + handle.size > maxTotalBytes) ← pre-buffer
     totalAborted = true; return null
  4. runningBytes += handle.size                    ← reserve
  5. await handle.read()                            ← only now
  6. if (read returned null) runningBytes -= refund

`totalAborted` is a shared flag the workers check at entry, so
once the running total hits the cap, no new reads start. With
maxReadConcurrency = 8, at most ~8 stat-side calls finish after
abort — peak memory bounded.

The concat-time guard stays as the exact final assertion (the
pre-buffer reservation is approximate — it counts the original tag
bytes and skips wrapper overhead).

Route closure updated to do `resolveProjectFilePath` first, then
`readProjectFile` inside the deferred `read()`. Test reader helpers
(`readerFrom` + the concurrency-test reader) updated to the new
shape.

Two new unit tests pin the pre-buffer semantics:

  - `maxAssetBytes` is checked via handle.size BEFORE handle.read()
    (the reader's `read()` throws — must never run)
  - Running total abort stops further reads once exceeded (counting
    reader observes ≤ 2 reads when cap should fire after the first)

Phase K of plan declarative-roaming-gosling.md (post-PR-#1312
round-3 review). All 38 tests green (36 prior + 2 round-4
pre-buffer cases).

Refs nexu-io/open-design#1312.

* test(daemon): add Red test pinning owner pre-buffer 413 before mime 415

PR #1312 round-5 (lefarcen P2): the route currently reads the owner
file with readProjectFile() before any size check, so a 100 MiB owner
HTML is fully buffered into memory before the helper's ownerBytes
check fires. The fix is to stat with resolveProjectFilePath first,
reject pre-buffer with 413 PAYLOAD_TOO_LARGE on oversize, then fold
in the mime check (still 415 on mismatch, now pre-buffer), then
readProjectFile when both gates pass.

The Red→Green discriminator is the combination 'oversize AND
non-HTML': pre-fix the route reads the buffer first and the
text/plain mime check fires → 415; post-fix the route stats first
and the size check fires before the mime check → 413. Asserting
'got 413, not 415' pins both the pre-buffer property and the check
ordering (size before mime, per lefarcen's locked round-5 sequence).

2 MiB+1 byte fixture is acceptable in test setup; MAX_INLINE_OWNER_BYTES
is the production 2 MiB so no test-door is needed.

Red verified: AssertionError: expected 415 to be 413 (pre-fix flow
reads → mime → 415).

* feat(daemon): stat owner before readProjectFile in /export route to bound owner pre-buffer

PR #1312 round-5 (lefarcen P2 confirmed at PR-1312#issuecomment-4424868413
follow-up): the route previously called readProjectFile() unconditionally
on the owner, so a 100 MiB owner HTML was fully buffered into memory
before the helper's ownerBytes check fired with InlineAssetsLimitError
('owner'). That meant the 413 envelope returned to the caller but only
after peak memory had already hit the file size.

Fix mirrors the sibling-asset stat-then-read contract round 4 added via
the AssetHandle interface: call resolveProjectFilePath first (cheap
stat), reject pre-buffer with 413 PAYLOAD_TOO_LARGE on size >
MAX_INLINE_OWNER_BYTES, fold in the mime check (still 415
UNSUPPORTED_MEDIA_TYPE on mismatch, now also pre-buffer per lefarcen's
'fold-in is welcome'), then readProjectFile() only when both gates
pass. Size check fires before mime check, so an oversize non-HTML file
returns 413 rather than 415 — the observable Red→Green discriminator
for this round.

The helper's ownerBytes check (inline-assets.ts:127-133) stays as
defense-in-depth for direct in-process callers that skip the route and
for any drift between stat-reported size and the bytes returned by
readFile.

Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts
('returns 413 (not 415) for an oversize non-HTML file'). Daemon suite
1743/1743 passing.

* test(daemon): add Red test pinning stat-vs-actual byte reconciliation

PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413
follow-up): the helper trusts handle.size for the running-total guard
and never reconciles with the actual byte length of content unless the
per-asset cap is exceeded. A reader that under-reports size (stale
stat, UTF-8 expansion at decode, sparse file, deliberate lie) can let
many strings materialize in memory before the concat-time guard at
the bottom of inlineRelativeAssets throws — defeating the round-4
pre-buffer cap intent.

Fix is lefarcen-confirmed path-a: post-read, the helper computes
actualBytes = Buffer.byteLength(content, 'utf8'), reconciles
runningBytes (add actualBytes, refund handle.size), and if running
total exceeds maxTotalBytes flips totalAborted = true and returns
null. Subsequent workers see totalAborted before invoking their own
read(). Helper still throws InlineAssetsLimitError('total') after
Promise.all settles — preserving the round-2/3/4 graceful-fallback
pattern instead of racing throws across in-flight workers.

Red→Green discriminator is read count. Pre-fix the helper trusts
the lying handle.size (10), so both reads complete (each returning
1000 bytes) under the reservation total of 56+10+10=76 < cap 500.
The concat-time guard then catches the 2000+-byte assembly and
throws 'total' — but only after both reads materialized in memory.
Post-fix worker 1's reconciliation trips totalAborted as soon as
actualBytes (1000) is folded into runningBytes; worker 2 skips its
read.

Red verified: AssertionError expected 1, received 2 (pre-fix flow
completes both reads before concat-guard fires).

* feat(daemon): reconcile inliner reservation with post-read actual bytes

PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413
follow-up, path-a): the helper trusted handle.size for the running-
total guard and only reconciled with actual bytes for the per-asset
cap. A reader that under-reported size — stale stat, UTF-8 decode
expansion at read time, sparse file, deliberate lie — could let
many strings materialize before the concat-time guard at the bottom
of inlineRelativeAssets caught the excess. That defeated the
round-4 pre-buffer cap intent.

Fix: after a successful read(), compute actualBytes =
Buffer.byteLength(content, 'utf8'), reconcile runningBytes by
folding in (actualBytes - handle.size), and re-check the total cap.
If the reconciliation pushes runningBytes past maxTotalBytes,
drop the asset's inlining (tag stays as URL ref), set
totalAborted = true to block subsequent worker reads, and let
Promise.all settle. The helper then throws
InlineAssetsLimitError('total') below — matching the round-2/3/4
graceful-fallback pattern (no throw-before-settle race between
in-flight workers).

The per-asset cap check at line 228 is preserved for stat-lying
readers that blow a single asset past maxAssetBytes; that branch
refunds handle.size and drops without flipping totalAborted, so
sibling assets still get a fair shot.

Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts
('reconciles handle.size with actual content bytes'). Daemon suite
1744/1744 passing.

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>

* fix: truncate long template names on project cards (#1220) (#1302)

Add min-width: 0 to .design-card-name so text-overflow: ellipsis
works correctly in flex layouts. Long template names were pushing
the task execution status (Running, Failed, etc.) out of view on
project cards.

Closes #1220

Co-authored-by: laomo <laomo@openclaw.ai>

* fix(desktop): swallow setTypeOfService EINVAL crashes in dev main (#647) (#1298)

* fix(desktop): swallow harmless setTypeOfService EINVAL crashes in dev main

The packaged Electron entry (apps/packaged/src/logging.ts) already
filters the undici "setTypeOfService EINVAL" crash that issue #895
introduced for the prod build, but the dev / source-built desktop
entry was missing the parallel guard. Result: switching settings
tabs in a from-source desktop run could fire a fresh fetch, undici
would try to set IP_TOS on the outbound socket, the kernel would
refuse on certain macOS / VPN configurations, and the rejection
bubbled to Electron's default handler as the "JavaScript error in
the main process" dialog reported in issue #647.

Add the same defensive filter to apps/desktop:

  - isHarmlessSocketOptionError matches only the canonical undici
    shape (syscall name AND EINVAL code). A contradicting code
    (EACCES, EPERM, etc) explicitly fails the match so real bugs
    don't get hidden.
  - The uncaughtException handler logs harmless cases at warn and
    returns silently. For anything else it removes itself from the
    listener list and re-throws via setImmediate, restoring Node's
    default crash path so Electron's native dialog renders exactly
    as it would without this filter.
  - unhandledRejection mirrors the same harmless / fall-through
    split.

The filter is installed BEFORE app.whenReady so it is armed by the
time the renderer fires its first fetch.

The helper is duplicated rather than imported from apps/packaged
because AGENTS.md forbids cross-app private-source imports. The file
header calls out the parallel and notes that the two copies should
stay in sync until the helper is promoted to a shared workspace
package (follow-up); the contract is identical so a regression in
one will surface in the other's test suite.

Tests in apps/desktop/tests/main/uncaught-exception.test.ts mirror
apps/packaged/tests/logging.test.ts: 8 cases pinning the matcher
shape, 2 cases pinning the handler's harmless-log-warn vs
fall-through-rethrow split.

Validated: pnpm guard, pnpm --filter @open-design/desktop typecheck,
pnpm --filter @open-design/desktop build, and pnpm --filter
@open-design/desktop test (14 passed, 10 new).

* fix(desktop,packaged): fail-fast on non-harmless unhandled rejections

The previous unhandledRejection listeners logged non-harmless reasons
and returned, which kept the main process alive after any rejected
promise. A real bug, a failed IPC registration, or any unexpected
async exception was reduced to a console line instead of surfacing
through Node/Electron's default crash path the filter was meant to
preserve.

Both copies now route non-harmless rejections through a parallel
factory (createDesktopUnhandledRejectionHandler /
createFatalUnhandledRejectionHandler) that mirrors the
uncaughtException policy: harmless setTypeOfService EINVAL shapes log
at warn and return, anything else logs at error, removes the
listener, and re-throws via setImmediate. Listener removal happens
before the scheduled throw, so the rethrown reason lands in the
uncaughtException path with no recursion.

Tests cover the harmless branch, the detach + ordered rethrow, and
non-Error / primitive rejection reasons (Promise.reject(42)) which
must fall through. Desktop suite: 13/13, packaged suite: 16/16.
Flagged on PR #1298 by Siri-Ray and the codex P2 review thread; the
two file copies stay in lockstep per the AGENTS.md sync invariant.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>

* feature: refine assistant artifact feedback (#1379)

* feature: refine assistant artifact feedback

* fix: clear hidden custom feedback reason

* test: update assistant feedback expectations

* fix: support object-style question-form options (#1293)

* fix: support object-style question-form options

* fix: preserve stable option values in form submissions

* fix(daemon/acp): terminate ACP child after clean prompt completion (#1286)

* fix(daemon/acp): terminate ACP child after clean prompt completion (Bug B / #1265)

Some ACP agents (notably Devin for Terminal) keep the child process
alive after stdin closes, waiting for the next prompt. Open Design
spawns a fresh agent per chat turn and relies on child.on('close') to
finalize the run, so without an explicit signal-driven shutdown the
chat sits stuck in the 'working' state indefinitely.

Three small, targeted changes:

- apps/daemon/src/acp.ts: After a clean session/prompt response we
  schedule a 500ms grace period and then SIGTERM the child. This
  mirrors the pattern detectAcpModels() already uses after model
  discovery. The grace period leaves well-behaved agents that exit on
  stdin.end() unaffected.
- apps/daemon/src/acp.ts: New completedSuccessfully() method on the
  session handle reports whether the prompt resolved without a fatal
  error or abort, so the consumer can distinguish 'clean signal exit'
  from 'genuine signal failure'.
- apps/daemon/src/server.ts: child.on('close') now treats a SIGTERM
  exit as 'succeeded' when acpSession.completedSuccessfully() is true.
- apps/web/src/providers/daemon.ts: Trust the server's authoritative
  endStatus; the signal/non-zero-code safety net no longer overrides
  an explicit 'succeeded' status, so the chat doesn't surface a fake
  'agent exited with signal SIGTERM' error after a clean ACP run.

Daemon tests cover the SIGTERM grace timer, clean early-exit (timer
cleared), and completedSuccessfully() abort/error states. Manual UI
test on plain main + this fix confirms Devin chats now return to ready
automatically after Done · ...

* fix(daemon/connectionTest): treat ACP clean SIGTERM as success

Codex review on #1286 caught that the new SIGTERM in attachAcpSession
breaks ACP connection tests for agents that don't shut down on
stdin.end() (the exact Devin behavior the patch targets).

attachAgentStreamHandlers() in connectionTest.ts now also respects
acpSession.completedSuccessfully(), mirroring the same check we apply
in server.ts. Without this, a clean prompt response followed by our
SIGTERM would set winner.signal === 'SIGTERM', flip exitedCleanly to
false, and the connection test would report 'agent_spawn_failed'
even when the agent had returned a healthy response.

Also widened the AgentSpawnHandle type so completedSuccessfully is
visible on the structural type used inside connectionTest.ts.

All 56 daemon tests still pass; typecheck + guard clean.

* fix(daemon/acp): narrow ACP success-on-signal override to forced-SIGTERM

Looper review on #1286 caught that the success predicate was broader
than the SIGTERM case it was meant to handle. `completedSuccessfully()`
flips to true as soon as the ACP `session/prompt` response is
processed, but it does not say why the child later closed. With the
broad predicate, an ACP agent that returned a prompt result and then
exited with code 1 (or was killed by SIGKILL/SIGSEGV) was still marked
'succeeded', regressing the existing close-status behavior for genuine
post-response process failures.

Scope the override to the exact forced-shutdown shape this PR
introduces:

  code === null && signal === 'SIGTERM' && acpCleanCompletion

Applied to both `server.ts` (chat run finalization) and
`connectionTest.ts` (connection-test classification). Any other
post-response failure now falls through to 'failed' / 'agent_spawn_failed'
as before.

All 59 daemon tests still pass; typecheck + guard clean.

* fix(web/daemon): only bypass exit-code safety net on explicit server success

Looper review on #1286 caught that the previous web change trusted
`endStatus === 'succeeded'` absolutely, but `endStatus` can become
'succeeded' in two distinct ways:

1. The SSE end event explicitly carries `status: 'succeeded'`
   (authoritative server declaration).
2. The end event omits or has an invalid `status` field and the
   handler silently falls back to 'succeeded' as a local default.

Both produced `endStatus === 'succeeded'` in the existing code, so
the new safety-net bypass treated them identically. That regressed
backward compat: a compatible or older daemon emitting an end event
like `{code:1}` or `{code:null,signal:"SIGTERM"}` with no
`status` would suddenly skip the failure banner.

Track explicit success separately via `serverDeclaredSuccess`,
set true only when:

- The SSE end event has `status === 'succeeded'`, or
- The fallback `fetchChatRunStatus` REST path returns
  `status === 'succeeded'` (which the existing `isChatRunStatus()`
  guard already proves is explicit).

The safety net is now bypassed only on that explicit signal; the
local-fallback success path still reaches the exit-code/signal
check so real failures surface as before.

Adds three web-side regression tests in `apps/web/tests/providers/sse.test.ts`:

- Explicit `status: 'succeeded'` + SIGTERM → onDone called, no error
- End event with `{code:1}` and no `status` → onError surfaces
  'agent exited with code 1' as before
- End event with `{code:null,signal:'SIGTERM'}` and no `status` →
  onError surfaces 'agent exited with signal SIGTERM' as before

`pnpm guard` + daemon typecheck clean; 27/27 SSE tests pass (up
from 24).

* Fix Codex wrapper launch paths (#1395)

* test: add Memory and Routines coverage (#1400)

* test: align extended Playwright coverage with current UI behavior

* test: address extended suite review feedback

* test: fix Codex fallback config hydration in e2e

* test: add Memory and Routines coverage

* test: fix Memory and Routines component test typing

* test: include Memory and Routines e2e in extended suite

* refactor(settings): use tiled language picker instead of dropdown (#1406)

The Language section in Settings rendered a single-button dropdown
trigger that opened a floating menu. With one visible label and lots of
empty panel space, the layout misled users into thinking only one
language existed. Replace the dropdown trigger + portaled menu with an
inline tile grid that shows every locale at a glance and clicks
directly to switch.

Side effects of the new layout: the languageOpen / languageMenuRect
state, the dynamic placement effect, the resize-close effect, the
mousedown click-outside handler, and the languageRef are gone. The
global Escape handler no longer needs to guard against the menu being
open. CSS for .settings-language-picker, .settings-language-button,
.settings-language-menu, and .settings-language-option is replaced by
.settings-language-grid (auto-fill 180px minmax columns) +
.settings-language-tile.

Tests in SettingsDialog.execution.test.tsx that drove the dropdown
(click trigger → click menuitemradio → assert menu closed) are
rewritten to drive the tiles directly via the radio role.

Refs #1347

* fix(web): restore consistent app header layout

* fix(web): restore consistent app header layout

Generated-By: looper 0.7.2 (runner=fixer, agent=opencode)

* fix(web): restore consistent app header layout

Generated-By: looper 0.7.2 (runner=fixer, agent=opencode)

* fix(web): restore consistent app header layout

Generated-By: looper 0.7.2 (runner=fixer, agent=opencode)

* fix(web): hide project output chips in header

---------

Co-authored-by: Prantik Medhi <140103052+prantikmedhi@users.noreply.github.com>
Co-authored-by: 이용진 <90879448+Leesin0222@users.noreply.github.com>
Co-authored-by: Nicholas-Xiong <2482929840@qq.com>
Co-authored-by: Hesam <chngyzkhanwhsht@gmail.com>
Co-authored-by: Yuhao Chen <godcorn001@outlook.com>
Co-authored-by: chaoxiaoche <fanzhen910412@gmail.com>
Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: eggward han <32223217+Eggwardhan@users.noreply.github.com>
Co-authored-by: @aaronjmars <61592645+aaronjmars@users.noreply.github.com>
Co-authored-by: Bryan <121247296+bankielewicz@users.noreply.github.com>
Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
Co-authored-by: mrzhangkris <92247501+mrzhangkris@users.noreply.github.com>
Co-authored-by: laomo <laomo@openclaw.ai>
Co-authored-by: Nagendhra Madishetti <nagendhra.madishetti24@gmail.com>
Co-authored-by: Nagendhra <nagendhra405@gmail.com>
Co-authored-by: Mason <jinmeihong0201@gmail.com>
Co-authored-by: Yiang Yiyan <15089131836@163.com>
Co-authored-by: Rocky <101849785+MrRockySL@users.noreply.github.com>
Co-authored-by: nettee <nettee.liu@gmail.com>
Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>
Co-authored-by: Matt Van Horn <mvanhorn@users.noreply.github.com>
2026-05-12 23:15:46 +08:00
shangxinyu1
0220124e0f
test: add Memory and Routines coverage (#1400)
* test: align extended Playwright coverage with current UI behavior

* test: address extended suite review feedback

* test: fix Codex fallback config hydration in e2e

* test: add Memory and Routines coverage

* test: fix Memory and Routines component test typing

* test: include Memory and Routines e2e in extended suite
2026-05-12 17:48:56 +08:00
lefarcen
2a0ebea50b release: Open Design 0.7.0
- bump 14 monorepo package.json files to 0.7.0 (root + apps/{web,daemon,desktop,packaged,landing-page} + packages/{contracts,platform,sidecar,sidecar-proto} + tools/{dev,pack,pr} + e2e); apps/packaged was already at 0.6.1 from beta lane, all others at 0.6.0
- add CHANGELOG.md [0.7.0] - 2026-05-12 entry covering 97 merged PRs since 0.6.0:
  - Critique Theater: Phase 7 web client state machine (#1307) + Phase 6.2 daemon artifact extraction (#1085)
  - Web/UI: thumbs-up/down feedback widget (#1308), Cmd+, opens Settings (#1173), Finalize design package + Continue in CLI (#974), fetch models button for BYOK (#1034), provider models alphabetical sort (#1097), collapsible MCP JSON field-mapping (#1136), design file rename (#894)
  - Daemon: auto-memory store with chat-protocol-aware extraction (#999), install/uninstall skills & design systems (#1003), HTTP 206 range requests for video/audio (#1105), scheduled routines (#1033), agent runtime + route registration refactor (#1063, #1043)
  - HyperFrames: HTML-in-Canvas across web + skills (#866)
  - Skills/design systems: generic skills + design-templates split + finalize-design API (#955), agent-browser skill (#1284), WeChat design system + login-flow skill (#1083), hud/loom/trading-terminal design systems (#1069), release-notes-one-pager skill (#873), tokens.css schema (#1231)
  - Packaging: macOS Intel (x64) build (#759), official Nix flake (#402), beta packaging cache (#1095)
  - Maintainer ops: tools-pr PR-duty workspace (#1259), MAINTAINERS.md (#1290), contributor card bot (#932), PR→issue linking discipline (#1263)
  - Changed: conversation run isolation (#1271), default English i18n fallback (#1270), Codex CLI exit diagnostics / empty-response handling / path fallback (#1267, #1244, #1205)
  - Fixed: ~30 web + desktop + daemon + packaging bugfixes
  - Internal: nightly UI/desktop regression coverage (#1256), e2e/release report hardening (#1140), entry/settings automation (#954)
- catch up [Unreleased] compare link to v0.7.0 and add missing [0.6.0] release link
- add 97 PR footnote refs ([#402]..[#1330])

Verified locally: pnpm install + pre-build contracts/daemon/desktop dist + pnpm typecheck (exit 0 across all 14 packages on Node 22.22 with engine-warning).

Release workflow validation runs after merge via release-stable.
2026-05-12 15:33:28 +08:00
shangxinyu1
5d674410f2
test: stabilize extended Playwright coverage (#1341)
* test: align extended Playwright coverage with current UI behavior

* test: address extended suite review feedback

* test: restore Codex path hydration assertion
2026-05-12 15:11:34 +08:00
Eli
9c489aa045
feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select (#1161)
* feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select

- Render real previews on project cards: HTML iframe / image / video / hashed gradient fallback with project initial; lazily fetches the project's primary file when metadata.entryFile is unset, prefers index.html → newest html → image → video.
- Live artifact card thumbnails embed the rendered artifact URL via sandboxed iframe.
- Replace the per-card close button with a `…` overflow menu (Rename, Delete) that opens on hover/click; click-outside and Esc close it.
- Add multi-select mode (toolbar toggle → checkbox per card → "N selected · Delete · Cancel" pill) with batch delete via the existing onDelete prop.
- Add a category tag to every card (Prototype / Live Artifact / Slide / Media) derived from project.metadata.intent / kind / skillId.
- Replace browser prompt() and confirm() with custom modals (rename input + danger-confirm) reusing the existing .modal shell.
- Add `more-horizontal` icon and 16 new i18n keys across all 18 locales (zh-CN/zh-TW localized; others fall back to English).

* test(e2e): update home delete flow for overflow menu + custom confirm modal

The previous flow targeted a per-card X button labelled "delete project <name>"
and asserted on a native `dialog` event. The card UI now exposes a `…` overflow
menu and a styled confirm modal, so reach delete via the menu and assert against
the modal's Cancel / Delete buttons instead.

* fix(web): harden Designs tab preview sandbox

* fix(web): hide Designs select mode in kanban
2026-05-12 15:08:22 +08:00
Eli
928079daf5
feat(web): consolidate Image/Video/Audio entries into a Media tab (#1167)
Reduces the New Project panel's top-level tab count by collapsing the
three media surfaces into a single Media tab with an inner segmented
control, and polishes the controls inside that tab so they stop
dominating the panel:

- Media tab + segmented (Image / Video / Audio) inside the panel body.
  Underlying ProjectKind branches and submission contract unchanged —
  the daemon still receives kind=image/video/audio.
- Model picker rewritten as a combobox: one trigger row + searchable,
  provider-grouped popover with Recommended badges. Replaces the flat
  grid of provider-grouped cards that scrolled past the fold once the
  fourth provider landed.
- Aspect picker compressed from a 5-card grid to a single row of
  segmented pills with mini ratio glyphs.
- Image surface no longer carries a free-form Style notes field; it
  was redundant with the prompt template + main prompt input.
- Live artifact tab locks fidelity to high-fidelity (the wireframe
  option is now hidden) — a wireframe live artifact doesn't make
  sense and the picker added noise.

i18n: adds tabMedia / titleMedia / model* keys across all 18 locales,
removes imageStyleLabel / imageStylePlaceholder. Tests + e2e selectors
updated to drive the new Media tab + segmented surface flow.
2026-05-12 14:52:03 +08:00
Eli
1b307bf17f
feat(web): tweaks palette popover with HSL hue-shift recoloring (#1292)
* feat(web): tweaks palette popover with HSL hue-shift recoloring

Adds a Tweaks color-palette popover to the HTML preview toolbar.
Selecting a palette re-skins the iframe in place via a srcDoc-side
bridge that walks the DOM and shifts every chromatic paint to the
target hue while preserving each color's saturation and lightness —
pale tints stay pale, bold CTAs stay bold, just in the new color
family. Mono-noir desaturates instead of shifting.

- runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options
- file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected
- FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration
- PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir
- PreviewDrawOverlay: stub pass-through until the draw branch lands

* feat(web): hide finalize-design toolbar from project header

* test(e2e): skip project actions toolbar flow after toolbar removal
2026-05-12 14:38:00 +08:00
Sid
fb47d0ae51
style(web): polish EntryView UI — sidebar layout, folder tabs, slim form, blue selected token (#1360)
* chore(web): upgrade radius scale + introduce blue --selected token

UI polish pass — design tokens for follow-up commits.

Radius scale was visually too square at the small end. Bump up so
buttons / inputs / cards feel rounded rather than boxy:

- `--radius-sm: 6px → 8px`  (buttons, inputs, small chips)
- `--radius: 10px → 12px`  (medium containers, Recent filter pill)
- `--radius-lg: 14px → 16px`  (project cards)
- `--radius-pill: 999px` unchanged (status chips)

Introduce a separate "selected" colour so selection indicators
(card borders, focus rings) read as blue instead of fighting with
the orange brand accent that drives primary CTAs:

- `--selected: #2563eb`  (Tailwind blue-600)
- `--selected-soft: rgba(37, 99, 235, 0.16)`  (soft tint for shadows)

No selectors are migrated to `--selected` in this commit — that
happens in a later "selected state" commit so the diff stays scoped.

* refactor(web): replace entry global header with sidebar brand + reorder bottom chips

Pre-existing layout: a global \`AppChromeHeader\` strip sat across the
whole top of EntryView (logo + settings gear), then a 2-column body
below it. Visual mass concentrated in a thin horizontal bar that did
not relate to the page's column structure, and the settings gear
duplicated the bottom Local-CLI chip.

New layout matches the two-column "brand-in-sidebar + tabs-in-main"
pattern: the brand block lives at the top of \`.entry-side\` (left
column), the right tabs live at the top of \`.entry-main\`, and the
vertical divider between them is the only horizontal seam.

EntryView:

- Drop \`<AppChromeHeader actions={avatarMenu} />\` from EntryView's
  render — the home page no longer renders the global chrome strip.
  (ProjectView still uses AppChromeHeader for back-nav / file
  actions, so the component itself stays in the codebase.)
- Add a sidebar brand block inside \`.entry-side\` using the
  already-defined \`.entry-brand\` / \`.entry-brand-mark\` /
  \`.entry-brand-title\` classes that were sitting dead in
  index.css.
- Reorder \`.entry-side-foot\` chips so that the env-critical
  Local CLI row sits on top of the row, with the secondary
  toggles (language picker, pet adoption, X follow icon) compact
  on a second row. The Follow @nexudotio chip drops its text
  label and becomes icon-only — pure marketing content, so it no
  longer earns a full-width pill.
- Settings access moves entirely to the Local CLI chip's existing
  click handler; the top-right gear is gone (it was a duplicate).

CSS:

- \`.entry-shell\` grid: \`auto 1fr\` → \`1fr\` (no header row).
- \`.entry-side\` background: \`var(--bg-panel)\` → \`transparent\`,
  so the sidebar shares the page beige and only the New-prototype
  card reads as white. Removes the "everything on the left is on
  one big white sheet" feeling.
- \`.entry-brand\` gets \`padding: 24px 20px 18px\` so the logo +
  title block has breathing room at the top of the sidebar.
- \`.entry-brand-mark\` width/height \`44 → 34\`. The previous
  44px gradient ring was visually heavier than the title text it
  sat next to.
- \`.entry-brand-title\` weight \`600 → 450\`, color
  \`var(--text-strong)\` → \`var(--text)\`. Serif title still
  reads as the page anchor without the chunky "bold black" stamp.
- \`.entry-brand-actions\` added for future right-aligned actions
  (carries no actual content in this commit — kept so re-adding a
  settings/avatar entry point doesn't need new CSS).
- \`.entry-side-foot .foot-pill\` slim pass: padding
  \`4px 10px → 3px 8px\`, font \`11.5px → 10.5px\`, gap \`6 → 5\`,
  plus \`justify-content: center\` and \`min-height: 24px\` so the
  icon-only Follow pill stays the same height as the text pills
  next to it.

* style(web): align right tabs row with brand row + strip hover/focus noise

Right column's tabs row ("Designs / Templates / Design systems /
Image templates / Video templates") needed three things:

1. Vertical center of tab text aligned with the brand logo on the
   left (both rows feel like one row, separated by the vertical
   divider only).
2. Active tab's underline sitting flush on the horizontal divider
   below the tabs (not floating mid-row).
3. No hover background, no focus outline, no transition — tabs are
   a navigation strip, not action buttons.

Changes:

- `.entry-header` padding `0 28px` → `24px 28px 0`, drop the
  `min-height: 52px`. Padding-top mirrors the brand block's
  padding-top (24px) so left logo top and right tabs top land on
  the same Y. Header height now content-driven; underline meets
  the `border-bottom` divider naturally.
- `.entry-tabs` gets `align-self: stretch` + `align-items: center`
  + `gap: 2px → 24px`. The stretch lets the tabs container fill
  header height; the bigger gap matches Claude Design's tab
  rhythm.
- `.entry-tab` becomes a "plain underline tab":
  - `border-radius: 6px 6px 0 0 → 0` (no folder-tab look — that's
    on the left tabs).
  - `padding: 14px 11px → 6px 4px 8px` so text + underline form a
    tight group, with the underline sitting at the bottom of the
    tab box right above the header divider.
  - `font-size: 14px → 12px` matches the left newproj tabs (set
    in commit 4) — both columns share the same tab type-size.
  - `transition: none` removes the inherited 120ms background /
    border / color transition.
- Hover / focus / active states explicitly zero out background,
  border-color, outline. Hover keeps a subtle color change
  (`text-muted → text`) so the tab still feels interactive
  without flashing a chip behind it.
- Active state colors are duplicated across `.active`,
  `.active:hover`, `.active:focus`, `.active:focus-visible` so the
  black underline never gets overwritten by the inactive-state
  rules above.

* style(web): folder-tab merge on left newproj tabs + flat card top corners

The left "Prototype / Live artifact / Slide deck / …" tabs sat as
plain underline tabs above a fully-rounded card. The active tab and
card looked like two stacked rectangles with a gap.

Folder-tab pattern:

- Active tab gets a white background + 12px top corners + a 1px
  border on top / left / right.
- Active tab's bottom border matches the card's background color
  (effectively invisible) — so where the tab sits, the card's top
  border is "broken" and tab + card read as one merged shape.
- Card top corners are square (`border-radius: 0 0 12px 12px`),
  bottom corners stay 12px. With the active tab's square bottom
  edge, the merge line at the tab/card seam is a clean horizontal,
  not a curve mismatch.

Implementation:

- `.newproj-tabs-shell`:
  - `overflow: hidden → visible` so the tab's overlap with the
    card below isn't clipped at the shell's bottom edge.
  - `margin-bottom: -1px` + `z-index: 2` so the shell renders on
    top of the card and the 1px tab/card overlap actually paints.
  - The `.can-left { padding-left: 40 }` / `.can-right` overrides
    used to reserve room for scroll arrows are removed (arrows
    are hidden, no extra padding needed).
- `.newproj-tabs` keeps its horizontal `overflow-x: auto` so the
  8 project-type tabs can still scroll inside the sidebar width.
- `.newproj-tabs-arrow` becomes `display: none`. The two
  chevron-circle buttons added clutter without much benefit —
  users with touchpads / wheels / keyboard already scroll the
  tabs row natively, and the `::before` / `::after` linear-
  gradient fades (now using `--bg` instead of `--bg-panel` so
  they fade into the page beige, not the sidebar panel that no
  longer exists) signal there are more tabs to the right.
- `.newproj-tab`:
  - Replace the plain bottom-underline (`border-bottom: 2px
    solid transparent`) with a full transparent 1px border so
    the active state can flip just the colors without changing
    layout.
  - `border-radius: 0 → 12px 12px 0 0`.
  - `position: relative` for z-index stacking.
  - `padding: 10px 6px → 7px 14px` (less vertical, more
    horizontal — tabs read as "labels" rather than chunky
    buttons).
  - Symmetric top/bottom padding (`7px`) so the text + folder-
    tab top corners stack cleanly.
  - `transition: none` — no animation between active/inactive
    states (tabs are nav, not action buttons).
- All hover / focus / focus-visible / active states zeroed out
  background and border-color so the inherited `button { … }`
  base style (which adds bg-subtle on hover) does not bleed in.
  Subtle color change on hover (`text-muted → text`) is the
  only affordance.
- `.newproj-tab.active` (+ active hover/focus combos so the
  base rules don't override): white bg, full var(--border) on
  three sides, bottom border = var(--bg-panel) (invisible
  against card), z-index 3 (above non-active tabs and shell
  pseudo-elements).
- `.newproj-body`:
  - `margin: 0 24px` so the card breathes inside the sidebar
    (and the active tab's left edge aligns with the card's left
    edge).
  - `padding: 18px 24px 28px → 16px 18px 18px` — tighter.
  - `border-radius: (full 12) → 0 0 12px 12px` for the
    flat-top merge with the active tab.
  - Adds explicit `border` + `background: var(--bg-panel)` +
    `box-shadow: var(--shadow-xs)` so the form reads as a card
    floating on the transparent sidebar.
  - `flex: 1 → 0 0 auto` (and `min-height: 0` / `overflow-y:
    auto` removed) — the card is content-sized, not stretched
    to fill the sidebar. Empty space below the card is now
    page beige, not a giant white sheet.
  - `gap: 14px → 12px` between form sections.

* style(web): slim NewProjectPanel form (title, fidelity, buttons, ds-picker)

The form inside the new white card felt overweight against the
compacted layout from the previous commits — fidelity cards were
~133px tall, the Create button + Open-folder secondary button both
had ~11px symmetric padding, the design-system trigger had a 32px
avatar in a 55px-tall row. Slim every element so the card reads as a
focused form, not a stack of beefy buttons.

Title:
- \`.newproj-title\` font \`14px / 600 → 13px / 550\`. Still
  visibly the section heading but no longer competing with the
  serif brand title above.

Fidelity:
- \`.fidelity-thumb { aspect-ratio: 12/7 → 16/7 }\`. The previous
  aspect made cards taller than they needed to be in the narrow
  sidebar column.
- \`.fidelity-card { gap: 8 → 6, padding: 10/10/12 → 8/8/10 }\`.
  Combined with the thumb aspect change, card height drops from
  ~133px → ~102px (visually close to the Claude Design reference
  while keeping the same content).

Primary / secondary buttons:
- \`.newproj-create\` padding \`11px (symmetric) → 8px 11px\`,
  margin-top \`4 → 2\` — primary CTA no longer towers over the
  fidelity cards above it.
- \`.newproj-import\` padding \`10px → 6px 10px\` — the secondary
  "Import Claude Design ZIP" button feels like an alt option, not
  a peer of Create.

Design system trigger:
- \`.ds-picker-trigger\` gap \`10 → 8\`, padding \`8/10 → 6/10\`.
- \`.ds-picker-title\` font \`13 → 12.5\` so name + subtitle stay
  legible in the slimmer row without overflowing the column.
- \`.ds-avatar\` width/height \`32 → 26\`, border-radius \`6 → 5\`.
  The thumbnail was the dominant element in the row; shrinking it
  pulls the row height from ~55px → ~50px.

Footer disclaimer:
- \`.newproj-footer\` padding-top \`0 → 12px\`. The "Only you can
  see your project by default." line was butting against the card
  bottom; 12px of air separates the disclaimer (page-bg context)
  from the card (panel-bg context) cleanly.

* style(web): blue selected indicators + Recent filter rounded + neutral input focus

Three small "selection state" tweaks driven by the new
\`--selected\` token introduced earlier in this branch:

1. Fidelity card selected border is now blue, not the brand
   accent. The orange Create button + the orange selected card
   border were fighting for the same visual role (primary
   action vs primary selection). Blue clearly says "this is
   the one that is selected" without competing with the CTA.

   - \`.fidelity-card.active\` border-color
     \`var(--accent) → var(--selected)\`.
   - Box-shadow ring + soft 0.04 drop swapped from the orange
     \`180/90/59\` rgba tuple to the blue \`37/99/235\` tuple.
   - \`.fidelity-card.active .fidelity-thumb\` border swapped
     from \`var(--accent-soft) → var(--selected-soft)\`.

2. Recent / Your designs filter is no longer a fully-rounded
   pill. The bottom-left settings chips deserve to be the only
   "999px pill" shape — those are tertiary status indicators.
   The Recent/Your designs toggle is a higher-importance
   inline filter, so it gets the medium radius instead.

   - \`.subtab-pill\` wrapper border-radius
     \`var(--radius-pill) → var(--radius)\` (12px).
   - Inner button border-radius
     \`var(--radius-pill) → var(--radius-sm)\` (8px).
   - Active state background \`var(--text) → var(--bg-panel)\`,
     color \`var(--bg) → var(--text)\`. The "black filled pill"
     read as a status badge; white-on-faint-gray reads as
     "selected toggle" — same shape as Claude Design's Recent
     pill.

3. Input focus is neutralised. The base \`input:focus\` rule
   added an orange border + a 3px orange-soft ring around the
   focused field — way too much visual weight for a quiet form
   ("Project name" → focus made it scream).

   - \`input:focus / textarea:focus / select:focus\` border-color
     \`var(--accent) → var(--border-strong)\` (light grey).
   - Box-shadow ring removed (\`none\`). Focused inputs now only
     darken their border by one step — barely visible but enough
     to confirm focus.

These three changes are grouped because they all migrate selection-
state styling off the brand accent and onto neutral / blue tokens.
The next pass (if any) can sweep the remaining \`var(--accent)\`
selection sites (\`.ds-row.active\`, \`.ds-picker-trigger.open\`,
\`.conv-pill.open\`, …) to use \`--selected\` too, but each of those
lives in a different surface and felt out of scope for the entry
view polish.

* refactor(web): pet rail toggle moves inside pet pill as split button

WHAT
- Convert the pet pill from a single `<button>` to a `<div>` containing
  two buttons separated by a 1px divider:
  * `.pet-pill-main` keeps the existing "Adopt a pet" / "Change pet"
    glyph + label + unadopted dot, still wired to `onAdoptPet`.
  * `.pet-pill-toggle` is a small icon-only button that flips
    `petRailHidden` — eye icon when the rail is hidden ("click to show"),
    eye-off when visible ("click to hide").
- Drop the old avatar-menu popover from EntryView entirely:
  `avatarMenuOpen` state, the outside-click / Escape effect, and the
  cog-popover trigger are all removed. The `Settings` entry of that
  popover was already redundant with the `Local CLI` chip; the
  `Hide/Show pet picker` entry now lives directly on the pet pill.
- CSS in the `.pet-pill` block:
  * `height: 24px` + `padding: 0` so the outer pill matches every other
    chip in the row vertically.
  * `.pet-pill-glyph` reduced from 14px to 12px and constrained to a
    14x14 inline-flex box so the unicorn / paw glyph stops pushing the
    chip taller than 24px.
  * Per-region hover (`.pet-pill-main:hover`, `.pet-pill-toggle:hover`)
    so each side of the split lights up independently, with the divider
    inheriting the accent tint while the chip is in `pet-pill-fresh`.

WHY
- After commit 5fe5721c removed the global `<AppChromeHeader>`, the only
  entrypoint to "Show pet picker" was the avatar-menu popover. Putting
  the avatar cog back next to the brand mark felt wrong: it elevates
  Settings (already on the `Local CLI` chip) to a primary affordance and
  sits next to the logo, where it doesn't belong by hierarchy.
- The pet-rail toggle is fundamentally a pet-area control — it belongs
  with the pet adoption chip, not in a popover. Putting both on the same
  chip via a split button gives the rail toggle a stable, discoverable
  home and keeps `.entry-brand` a brand-only row.

SCOPE
- `apps/web/src/components/EntryView.tsx` + `apps/web/src/index.css`.
  No new state, no new i18n keys (reuses `pet.railShow` / `pet.railHide`).
- The orphan i18n keys `entry.openSettingsTitle` and
  `entry.openSettingsAria` are no longer referenced by EntryView but are
  left in place — they're shared types that other locale files still
  declare; a focused cleanup belongs in a separate commit.

* test(e2e): update entry chrome + project mgmt assertions for new layout

WHAT
- entry-chrome-flows.test.ts:
  - Rename `entry chrome settings menu toggles pet rail visibility` →
    `pet pill toggle hides and shows the pet rail`. The flow no longer
    goes through an `Open settings` cog + `.avatar-popover` chain;
    instead it clicks the in-pill `.pet-pill-toggle` directly and
    verifies its `aria-label` flips between `Hide` / `Show pet picker`.
  - Replace `.app-chrome-header` / `.app-chrome-brand` assertions with
    `.entry-brand` + `.entry-brand-title` text checks. The global
    chrome strip no longer exists on EntryView.
  - The compact-width overflow guard now measures `.entry-brand` rather
    than `.app-chrome-header`, since the brand row replaced the chrome
    strip as the only top-of-page horizontal stack.

- project-management-flows.test.ts:
  - Drop the `Scroll project types right` arrow click. The
    `.newproj-tabs-arrow` buttons are hidden (the folder-tab pattern
    leans on shadow gradients on `.newproj-tabs-shell::before/::after`
    instead). Playwright's `locator.click()` auto-calls
    `scrollIntoViewIfNeeded()`, so clicking `new-project-tab-image`
    after a tab-switch still reaches the off-screen tab.

WHY
- These selectors / interactions are tied to UI affordances the earlier
  commits in this branch deliberately replaced. The behaviors they pin
  (pet rail toggle reachability, no horizontal overflow at 820px,
  draft preservation across tab switches) are still asserted — only
  the selectors needed to follow the new structure.

VERIFICATION
- `pnpm exec playwright test ui/entry-chrome-flows.test.ts ui/entry-configuration-flows.test.ts ui/project-management-flows.test.ts`
  → 17/17 passed (chromium project, single worker, fresh daemon).

* fix(web): restore .newproj-body as scroll container (P1 regression)

WHAT
Reintroduce `flex: 1 1 auto; min-height: 0; overflow-y: auto;` on
`.newproj-body`, alongside the `display: flex` + `padding` that
commit ba44e396 kept. The parent `.newproj` is still `overflow: hidden`,
so without these three lines the card can clip its own content with
no scroll recovery.

WHY
Reported by @lefarcen (P1) and @Siri-Ray in review on #1360. Before
this commit the slim-form pass made the body shrink-wrap (`flex: 0 0 auto`)
to keep the empty-state caption snug against the card edge. That works
when the form is short, but the card can grow well past the available
sidebar height in real scenarios:
- Compact-height windows (≤ 720 vertical px).
- Image / media tabs that add aspect + model rows.
- Validation / error text after a failed Create.
- Design-system popover opened with many systems.

In all four cases the Create / Import / Open-folder stack — or the
picker's bottom options — were sliding below the visible sidebar with
no scroll bar to recover them. This is a regression against the
behavior that landed in #1038, which made `.newproj-body` the scroll
container precisely to keep the form bounded.

SCOPE
- `apps/web/src/index.css` only, one ruleset.
- Visual cost: the empty-state caption (`.newproj-footer`) now sits at
  the bottom of the available sidebar height instead of hugging the
  card, which is the same behavior pre-#1167 / pre-this branch.
- A short comment in CSS now flags the invariant so a future refactor
  doesn't quietly flip the flex semantics again.

* fix(web): restore :focus-visible ring on entry-tab + newproj-tab (a11y)

WHAT
Split the prior `:focus, :focus-visible, :active` group on both tab
selectors so that `:focus-visible` no longer inherits the zero-out
that was added to scrub the orange mouse-focus halo:

- `.newproj-tab:focus-visible` → 2px inset blue ring (`--selected`)
  hugging the folder-tab's 8px top-corner radius, plus `--text`
  foreground so the label reads at full contrast while focused.
- `.entry-tab:focus-visible` → 2px solid outline in `--selected` with
  `outline-offset: 2px` and `border-radius: 4px`. Outline is used
  here instead of inset shadow because the tab has no padding to
  spare against the active 1px bottom border, and outline doesn't
  participate in layout.

Mouse-driven `:focus` and `:active` keep the prior transparent
treatment — there is no orange ring on click, which is the polish
the rest of this branch is going for.

WHY
Flagged by @Siri-Ray (changed-range) and @lefarcen (P2) on #1360:
the polish-tab commits stripped the focus indicator entirely instead
of just suppressing mouse focus, so keyboard users had no way to see
which tab was active during arrow-key navigation. Re-introducing
`:focus-visible` only restores keyboard reachability while keeping
the visual quiet for pointer users.

SCOPE
- `apps/web/src/index.css` only. Two rulesets touched, one new
  `:focus-visible` rule added per selector.
- No JS, no aria, no test churn — the rules trigger off the existing
  `:focus-visible` pseudo-class, which the same Playwright tests
  already exercise via Tab.

* fix(web): scope quieted input focus to .entry-side, restore global ring (a11y)

WHAT
Split the global input focus rule into two layers:

- `input:focus, textarea:focus, select:focus` now keeps a visible
  focus indicator on every input across the app — but in the new
  `--selected` blue (border + 3px `--selected-soft` ring) instead of
  the original `--accent` orange. This preserves accessibility for
  every settings page, dialog, project workspace, and right-column
  control that was previously losing its focus halo.

- `.entry-side input:focus` keeps the neutral treatment from this
  branch — `border-color: var(--border-strong)`, no ring. The orange
  "Create" CTA on the entry sidebar is already the loudest element in
  that panel, so a competing blue ring on the title / path inputs
  next to it pulled the eye in the wrong direction. Scoping the
  quieter focus to the sidebar keeps that intent without leaking out
  to the rest of the app.

WHY
Flagged by @lefarcen as a P2 a11y regression on #1360: the previous
version of this rule scrubbed the focus indicator (`box-shadow: none`,
border only one shade darker) for every input in the app, not just on
the entry surface this branch is targeting. Keyboard users on
settings forms and dialogs were left without a visible focus state.

SCOPE
- `apps/web/src/index.css` only, one global rule restored and one
  scoped override added. No JS, no template change.
- Color shift global focus orange → blue is intentional: it consumes
  the new `--selected` token introduced in commit 13dc8a65 and
  matches the active-state direction this PR is establishing.

* chore(web): drop dead AppChromeHeader / isMacPlatform imports + document --selected token

WHAT
- Remove the `AppChromeHeader` import from `EntryView.tsx`. The
  component itself is still used (and re-exported) by ProjectView;
  EntryView dropped its render site in commit 5fe5721c and the import
  has been a stale reference ever since.
- Remove the `isMacPlatform` import too. It was only used by the old
  avatar-menu popover (for the `⌘,` / `Ctrl+,` Settings hint) which
  was deleted along with the popover when the pet-pill split button
  replaced it.
- Add a docblock above the `--selected` / `--selected-soft` token pair
  in `index.css` so the cascade has a local explanation for why this
  blue is separate from the brand `--accent`. The note calls out which
  affordances should reach for `--selected` (active option, focused
  input ring, active filter pill) and pins the 16% soft fill role.

WHY
Both flagged by @lefarcen on #1360:
- P3 — dead import: the TS config doesn't fail on unused imports, so
  this was silently shipping as dead code and obscuring the deliberate
  removal of the global chrome header.
- P3 — token doc: the `--accent` vs `--selected` split was only
  explained in the PR body. Putting the rationale next to the token
  makes the contract durable beyond this discussion.

SCOPE
- `apps/web/src/components/EntryView.tsx`: two `import` lines removed.
- `apps/web/src/index.css`: one comment block added directly above the
  token declaration.
- Verified: `pnpm --filter @open-design/web typecheck` → exit 0.
2026-05-12 14:26:39 +08:00
pftom
b55f171693 feat(web): redesign entry view with new navigation and home layout
- Introduced `EntryNavRail` for a streamlined left navigation experience, featuring primary actions and a brand logo.
- Created `EntryShell` to manage the entire home view layout, integrating the centered hero, recent projects, and plugins section.
- Developed `HomeHero` for user prompt input, allowing seamless interaction with plugins and project creation.
- Replaced the previous `PluginLoopHome` with a more cohesive `HomeView` that orchestrates plugin interactions and project submissions.
- Enhanced CSS styles for improved visual consistency across the redesigned components.

This update significantly enhances the user experience by providing a more intuitive and visually appealing entry point into the application.
2026-05-12 10:56:35 +08:00
chaoxiaoche
a75d9938c7
feat(design-systems): add structured tokens.css schema (default + kami) (#1231)
* feat(design-systems): add structured tokens.css schema (default + kami)

Compile each brand's DESIGN.md prose into a machine-readable :root
block agents paste verbatim, removing the "Primary → --accent"
translation step where most token misuse happens. Daemon prompt
injection lands in a follow-up; lint-artifact already enforces the
shared token vocabulary so no rule changes needed.

Schema validated across two contrasting aesthetics:
- default (sans-serif, cobalt, B2B utility) — stress test the
  shallow form, 2-level fg / 2-level surface
- kami (serif, parchment, ink-blue, print-first) — stress test the
  rich form, 4-level fg ramp, 3-level surface, ring elevation, i18n
  font stacks, and solid-hex tag tints (print renderers double-paint
  alpha)

Schema growth from kami's stress test (5 new optional slots, all
backward-compatible — default aliases via var() to existing tokens):
- --fg-2 / --meta (4-level fg ramp)
- --surface-warm (3-level surface)
- --border-soft (2-level border)
- --elev-ring (ring elevation as first-class level)

Brand-specific extensions live in tokens.css with explicit "NOT in
shared schema" labels and a documented promotion path (≥2 brands
need it → promote to schema slot).

components.html in each brand is a self-contained reference fixture
that exercises every token through real layouts. Both fixtures lint
clean against apps/daemon/src/lint-artifact.ts.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(design-systems): add token-fixture drift guard

Each design system in design-systems/<brand>/ ships two files agents
consume in tandem: tokens.css (canonical token bindings) and
components.html (a self-contained fixture whose first <style> embeds
the same :root paste so the file renders standalone). The fixture's
:root block is a copy of tokens.css's :root block, kept in sync only
by an inline comment.

This adds scripts/check-tokens-fixture-sync.ts and registers it in
pnpm guard. The check pairs each brand's tokens.css with its
components.html and asserts the unscoped :root block is byte-equivalent
after canonical normalization (CSS comments stripped, whitespace
collapsed, separator spacing normalized). Brands missing one half of
the pair, or with no :root rule in either file, fail the guard.

Scoped overrides like :root[lang="zh-CN"] are not required to appear
in the fixture (per the kami fixture's inline comment they are pasted
only when an artifact's <html lang> matches), so the check only
compares the unscoped :root block.

Verified: pnpm guard passes for default + kami, fails on intentional
value drift, fails on missing token, tolerates whitespace-only
formatting differences.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(design-systems): point fixture CTAs to real files

Both default and kami components.html advertised in-page anchors
(#tokens, #spec, #surface, #accent, #type, #components) but defined
no matching ids, so every CTA was a no-op when the fixture was
opened locally — flagged by mrcfps in #1231.

Re-point each link to a real artifact in the same brand directory:

- "View tokens" / "Inspect tokens" / "Inspect typography" → ./tokens.css
- "Read the spec" / "Read the rule" → ./DESIGN.md

Browsers render these as raw source views, which is the desired UX
for a reference fixture: clicking the CTA shows the underlying
contract instead of jumping to nothing. Agents copying the fixture
also learn the pattern of "buttons link to actual sibling resources".

The :root token block is unchanged, so the token-fixture drift guard
still passes for both brands.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(design-systems): codify token schema (A1/A2/B/C layers)

The two-brand pilot (default + kami) settled the shape of the shared
token schema; this commit codifies it as a machine-readable contract
and enforces it in pnpm guard, addressing lefarcen's review on #1231:

> the optional-vs-required split won't generalize cleanly when brand
> #3 needs different Layer A tokens or when multiple brands converge
> on the same extension (promoting C→B→A). Consider surfacing that
> limitation in the PR narrative or in a future SCHEMA.md.

Schema lives under design-systems/_schema/ as three files:

- tokens.schema.ts   — TypeScript declaration of every shared token
                       with its layer (A1-identity / A1-structure /
                       A2 / B-slot), plus per-brand C-extension
                       allowlists and a global C-prefix allowlist
- defaults.css       — CSS mirror of A2 fallback values, used as the
                       human-readable contract reviewer's-eye copy
                       and the future input to the derive script
- AGENTS.md          — schema layer model, C → B-slot → A2 promotion
                       rules, when-not-to-add-a-token guidance

Layer model:

  A1-identity    8 tokens — bg/surface/fg/muted/border/accent +
                 font-display/font-body. The brand IS these values;
                 no fallback is defensible.

  A1-structure  18 tokens — type scale (8), leading (2), tracking
                 (1), section-y (3), container (4). Structural
                 decisions vary per brand by design and have no
                 cross-brand default.

  A2            26 tokens — accent states, semantic colors, motion,
                 base spacing scale, radius, elevation, focus,
                 font-mono. Required in every tokens.css; fallback
                 lives in defaults.css for the future derive script
                 to inline when DESIGN.md does not specify the value.

  B-slot         4 tokens — fg-2 / meta / surface-warm / border-soft.
                 Brand may bind independently or alias the named
                 sibling via var(...) for components that target the
                 richer ramp.

  C-extension    n tokens — brand-specific names (kami's tag-bg-*,
                 leading-display, accent-light, etc.). Allowlisted
                 per-brand in BRAND_EXTENSIONS or globally by prefix
                 in BRAND_EXTENSION_PREFIXES. Promote when a second
                 brand adopts the same name.

Why A2 fails the guard today:
  Artifacts are generated by agents pasting one brand's :root block
  into a single <style>; there is no global stylesheet that supplies
  fallbacks at runtime. A tokens.css missing an A2 declaration would
  silently break any var() reference in the fixture. Until the
  derive script (PR-B) lands and inlines defaults, every brand's
  tokens.css must declare every A2 token directly. The guard
  enforces this strictly.

Why --font-mono lands in A2 (not A1):
  149 brands' DESIGN.md files were surveyed: 87 (58%) declare a
  monospace stack, 62 (42%) do not — including major brands like
  bmw / nike / apple / notion / mastercard / meta. Agent paste
  cannot rely on the brand author having written it down; a
  defaultable A2 fallback (with CJK brands like kami overriding) is
  safer than forcing every brand author to add a field they may not
  realize their kbd / code-block components need.

Five guard checks, each registered as its own entry in scripts/guard.ts
so failures attribute to a specific contract:

  1. token-fixture sync       — components.html :root ↔ tokens.css :root
                                 byte-equivalent (existing)
  2. A1 required tokens       — every brand declares every A1 token
  3. A2 required tokens       — every brand declares every A2 token
  4. unknown token allowlist  — every declared token is in schema or
                                 brand-extension allowlist
  5. A2 defaults parity       — defaults.css ↔ tokens.schema.ts
                                 fallback byte-equivalent

Verified on default + kami:
  - 26 A1 tokens declared in both brands
  - 26 A2 tokens declared in both brands
  - 129 total declarations, all match shared schema or brand extensions
  - defaults.css ↔ tokens.schema.ts parity holds
  - sanity test: drifting --motion-fast in defaults.css fails check 5
    with a clear divergence message

The PR description originally listed "Dedicated SCHEMA.md" as
explicitly NOT in this PR ("Once 3+ brands ship, extracting a single
source of truth becomes worthwhile"). That boundary moves: lefarcen's
review surfaced the schema-generalization risk, and the schema must
exist as a machine-enforced contract before the derive script can
read it. The TS file replaces the markdown that was deferred.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web/tests): pass missing designTemplates prop to ProjectView

Pre-existing typecheck regression on main: PR #955 (b5eb8c16,
"generic skills + split skills/design-templates + finalize-design
API") added required `designTemplates: SkillSummary[]` to ProjectView
Props but updated only two of the three test fixtures that render
ProjectView directly. The third — ProjectView.api-empty-response.test.tsx
— was missed, so `pnpm typecheck` (and CI on any PR merging into
main) fails on:

  apps/web/tests/components/ProjectView.api-empty-response.test.tsx
    (168,6): error TS2741: Property 'designTemplates' is missing in
    type ...

The other two ProjectView tests already pass `designTemplates={[]}`,
so this aligns this fixture with the existing pattern. Out of scope
for #1231 strictly, but the regression blocks the merged-state
typecheck CI runs that #1231 triggers, and the one-line fix here
restores main's typecheck health for everyone.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(design-systems): enforce B-slot required tokens in pnpm guard

Closes mrcfps + lefarcen review comment thread on #1231:

> The guard validates A2 required tokens here, but there's no
> sibling check for B-slot aliases (--fg-2, --meta, --surface-warm,
> --border-soft). Per the schema docs, every brand must declare
> A1 + A2 + B-slot names so shared components can safely read
> var(--fg-2) etc. Without a B-slot guard, a brand can omit those
> aliases, pass pnpm guard, and break any artifact that references
> them.

Same artifact-paste constraint as A2: agents render artifacts by
pasting one brand's :root block into a single <style>; there is no
runtime cascade, so a missing B-slot makes any var(--fg-2) reference
resolve to nothing. Until now the schema narrative claimed B-slots
were optional with a var() default, but no machine check enforced
declaration — a contract gap reviewers reasonably refused to merge.

This commit closes the gap in three places so machine and narrative
agree:

1. scripts/check-tokens-fixture-sync.ts
   - Add checkDesignSystemBSlotRequiredTokens, mirroring the A2
     check but using getBSlotNames() from the schema.
   - Failure message names each missing slot AND the schema-suggested
     alias (--fg-2 (default alias: var(--fg))) so a brand author
     fixing the failure has a copy-pasteable resolution.
   - Renumber section comments: 5 checks → 6 checks.

2. scripts/guard.ts
   - Register the new check between A2 required and unknown
     allowlist so failures attribute to a specific contract.

3. design-systems/_schema/AGENTS.md
   - Update the layer table: B-slot row's "If omitted" column
     changes from "resolves via var() to a richer sibling" to
     "guard fails — brand must declare, either as var(--sibling)
     (collapsed) or independent value (richer)".
   - Add a "Why B-slot is required (and what the alias is for)"
     section that distinguishes the schema-suggested alias from a
     runtime fallback, with worked examples for default (alias) and
     kami (independent bind).

Verified on default + kami:
- pnpm guard passes all 6 design-system checks
- 4 B-slot tokens declared in both brands (default aliases via var(),
  kami binds independently — both forms satisfy the contract)
- pnpm typecheck clean across the workspace
- Sanity test: removing --fg-2 + --meta from default/tokens.css fires
  the new guard with a precise per-token alias hint:
    [default] design-systems/default/tokens.css is missing 2 B-slot
    tokens (alias the named sibling via var(...) or bind
    independently):
      --fg-2 (default alias: var(--fg)),
      --meta (default alias: var(--muted))

The schema contract is now machine-enforced end-to-end (A1 + A2 +
B-slot all required-with-fixed-form-of-fallback). The derive script
in PR-B can rely on every brand's tokens.css containing every shared
slot name.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): skip leading-underscore meta-directories under design-systems/

CI for #1231 went red on `Validate workspace` after merging origin/main.
Cause is a clean collision between two recently-landed changes:

- main #1270 (be77dc03 "Default English resource i18n fallback")
  tightened tests/localized-content.test.ts so every directory under
  design-systems/ is run through assertResourceId() with the strict
  RESOURCE_ID_PATTERN /^[a-z0-9][a-z0-9-]*$/.

- this branch #1231 introduced design-systems/_schema/ as the home
  of the shared token contract (tokens.schema.ts, defaults.css,
  AGENTS.md). The leading underscore signals "meta-directory, not
  brand" — the same convention SCSS partials, Jekyll, Hugo all use.

The two changes never met until CI built the merge commit, where
assertResourceId('_schema') deterministically failed:

  Error: Design system directory _schema has malformed resource id: _schema
    at invariant tests/localized-content.test.ts:66:11
    at assertResourceId tests/localized-content.test.ts:71:3
    at readDesignSystemResources tests/localized-content.test.ts:202:8

Fix tightens readDesignSystemResources's directory filter so the
leading-underscore convention is recognised explicitly:

    .filter((entry) => entry.isDirectory() && !entry.name.startsWith('_'))

This aligns with what apps/daemon/src/design-systems.ts:listDesignSystems
already does implicitly — it requires DESIGN.md per directory, so
_schema/ was always invisible at runtime; the test was the only place
that surfaced it.

Verified locally on the post-merge tree:
- pnpm test (e2e vitest) — tests/localized-content.test.ts: 4 passed
- pnpm guard — all 6 design-system checks pass on default + kami
- pnpm typecheck — clean across the workspace (after pnpm install
  to pull deps for tools/pr that arrived with main)

The fix is intentionally narrow (one filter line in one test) and
documents the convention inline so future meta-directories under
design-systems/ (e.g. _archive/, _drafts/) are covered for free.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 22:23:34 +08:00
nettee
be77dc0394
Default English resource i18n fallback (#1270) 2026-05-11 20:29:05 +08:00
shangxinyu1
10802bb0b0
test: expand nightly UI and desktop regression coverage (#1256)
* e2e(ui): cover examples preview flows

* e2e(ui): cover Codex local CLI fallback UX

* test: expand desktop and connector regression coverage

* e2e(ui): cover workspace restoration flows

* e2e(ui): cover retry recovery workspace flow

* test: cover artifact and connector recovery flows

* e2e(ui): cover Continue in CLI stale provenance flow

* e2e(ui): cover BYOK model fetch caching

* test: expand Orbit and desktop connector coverage

* e2e(ui): cover workspace quick switcher recovery flows

* e2e(ui): cover connector pending authorization recovery

* e2e(ui): cover workspace and conversation restoration routes

* e2e(ui): cover conversation draft and attachment restoration

* e2e(ui): cover conversation history selection recovery

* e2e(ui): cover workspace surface conversation selection

* test: cover artifact presentation and orbit link behavior

* test: cover artifact external link restoration

* e2e(ui): cover root-route deep-link restoration

* e2e(specs): cover Orbit open-artifact desktop click

* e2e(specs): cover desktop artifact open link

* test: fix Orbit settings fixture type drift

* test: split Playwright critical and extended suites

* test: fix ProjectView design template fixtures

* ci: split workspace test stages

* guard: allow split Playwright suite scripts

* test: shrink Playwright critical suite

* test: restore omitted Playwright suites
2026-05-11 19:23:13 +08:00
Tom Huang
b5eb8c1647
feat: generic skills + split skills/design-templates + finalize-design API (#955)
* feat: general-purpose skills with @-mention composition and user import

Lift skills from "one mode-bound skill per project" to a generic capability
the user can compose per turn:

- Daemon: scan multiple skill roots (user-skills under runtime data, then
  the bundled `skills/`); user-imported skills can shadow built-ins by id.
- New `POST /api/skills/import` and `DELETE /api/skills/:id` endpoints,
  with CONFLICT/BAD_REQUEST/NOT_FOUND error codes and built-in delete
  protection.
- ChatRequest gains `skillIds: string[]`; the chat run concatenates each
  picked skill's body (and merges craftRequires) into the system prompt
  for that turn only — the project's persistent `skillId` is untouched.
- Web composer: `@` popover now lists skills alongside project files;
  picks render as removable chips above the textarea and ride along with
  the request as `skillIds`.
- Settings → Library: import form (name/description/triggers/body),
  per-card delete for user skills, "user" origin badge.

* chore(web): drop welcome pet teaser + add ds→prompt-template mapping util

- SettingsDialog: remove the inline pet adoption teaser from the welcome
  panel so the first-run modal stays focused on configuration.
- New `inferPromptTemplateCategoriesForDs(ds)` helper that maps a design
  system's authored metadata to prompt-template gallery categories.
  Imported by the design-system gallery wiring on a sibling branch; no
  callers in this branch yet.

* feat: split skills/design-templates and add finalize-design API

Phase 0 of the skills/design-templates refactor (specs/current/
skills-and-design-templates.md):

- Move ~104 rendering catalogue entries from skills/ to design-templates/
  and keep skills/ for the small set of functional skills that *do work*
  on user input (utilities, briefs, packagers).
- Add design-templates/AGENTS.md and skills/AGENTS.md describing the
  contract, and a brand-agnostic craft/ surface for opt-in craft rules.
- Daemon: add DESIGN_TEMPLATES_DIR / USER_DESIGN_TEMPLATES_DIR roots and
  an /api/design-templates surface mirroring /api/skills. Asset/example
  routes still span both registries so existing srcdoc URLs keep
  resolving across the rename.
- Web: split LibrarySection into SkillsSection + DesignSystemsSection,
  rename the EntryView "Examples" tab to "Templates", and update locales
  + the New-project picker accordingly.

Adds the finalize-design endpoint:

- New apps/daemon/src/finalize-design.ts and packages/contracts/src/api/
  finalize.ts — one-shot synthesis of a project's transcript + active
  design system + current artifact into <projectDir>/DESIGN.md via the
  Anthropic Messages API. Per-project .finalize.lock mirrors the
  transcript-export hygiene from PR #493; provider credentials are not
  persisted by the daemon.

Other supporting changes:

- README + AGENTS.md updates to document the new directory split and
  craft/ surface, plus i18n strings across 13 locales.
- Test refactors and new coverage (finalize-design, runs, sidecar
  server, plus refreshed daemon integration tests).
- .gitignore: scope the *.exe ignore to /OpenDesign.exe so legitimate
  vendor binaries are no longer hidden.

* fix(merge): move clinical-case-report to design-templates/

Origin/main added the clinical-case-report skill under skills/ before
the skills/design-templates split landed. Its od.mode is prototype, so
per specs/current/skills-and-design-templates.md it is a design template
and belongs alongside the other rendering catalogue entries — not under
the slimmed-down functional skills/ root. Moving it keeps the EntryView
Templates tab consistent with origin/main's intent.

* feat(skills): curated design/creative catalogue + collapsible Settings rows

Seed ~100 curated design/creative skill stubs under skills/ sourced from
awesome-claude-skills (ComposioHQ) and awesome-agent-skills (VoltAgent).
Each stub carries an od.category tag so the new filter pill row in
Settings -> Skills can group them. The seed script
(scripts/seed-curated-design-skills.ts, pnpm seed:curated-design-skills)
is idempotent: it only creates folders that don't already exist, so
hand-edited stubs are never overwritten.

- Daemon: parse and surface od.category on SkillInfo with a strict slug
  normaliser; mirror the field on SkillSummary in @open-design/contracts.
  Category is purely a UI hint — system-prompt composition is unchanged.
- Web: rewrite SkillsSection from a left-list / right-detail grid into a
  vertical stack of collapsible rows mirroring the External MCP panel
  (header always visible with name + mode/source/category pills + per-row
  enable toggle; SKILL.md preview, file tree and inline edit form expand
  on demand). Add a Category filter row above the list. Reorder Settings
  nav so Skills + External MCP sit above the Composio/MCP cluster. Update
  composer placeholder/hint across 17 locales to advertise '@ files or
  skills · / for commands'.
- Docs: extend skills/AGENTS.md with the curated catalogue rules
  (idempotency, category vocabulary, no upstream vendoring).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(skills): teach localized-content + system-prompt tests about the skills/design-templates split

mrcfps blocking review on PR #955: the skills/design-templates split
(b5993385) moved ~110 SKILL.md entries out of `skills/` and into
`design-templates/`, but two repo-level tests still hard-coded the
single-root layout, so CI gates went red on the merged branch:

- `e2e/tests/localized-content.test.ts` only scanned `<repo>/skills`
  while the locale `skillCopy` map keeps id-keyed entries spanning
  both roots (ExamplesTab/Templates uses one lookup regardless of
  origin). Teach the helper to read both `skills/` and
  `design-templates/`, deduplicating ids so the union matches the
  localized claim.
- `apps/daemon/tests/prompts/system.test.ts` read
  `skills/live-artifact/SKILL.md`, which now lives under
  `design-templates/live-artifact/`. Update the absolute path so
  composeSystemPrompt's coverage of the live-artifact preamble is
  exercised again.

Also enroll the curated design/creative catalogue (PR #955, ~91
stubs sourced from awesome-claude-skills / awesome-agent-skills) in
the DE / FR / RU `_SKILL_IDS_WITH_EN_FALLBACK` lists. The stubs are
English-only by design (frontmatter advertises an upstream URL); the
fallback list is exactly the place to acknowledge "we know this id
exists, English copy is fine here" so the localized-content coverage
gate passes without forcing a translation task per locale.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): always quote frontmatter name so importUserSkill round-trips numeric / boolean ids

mrcfps PR #955 review: `buildSkillMarkdown` emitted `name:
${escapeYamlString(name)}` without quotes, so YAML coerced names
like `123`, `true`, `false`, or `null` into non-string scalars on
re-parse. listSkills() then read `data.name` as a number/boolean
and the import flow's follow-up `findSkillById(skills, result.id)`
missed it, falling into `/api/skills/import`'s "imported skill
could not be re-read" 500 path for those ids.

Switch the emitter to a quoted scalar (`name: "..."`) — the
double-escape already in `escapeYamlString` makes the quoted form
safe — and add a round-trip test covering `123`, `true`, `false`,
`null`, and `0` to lock in the contract.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop staged-skill chips when the matching @<id> token leaves the draft

mrcfps PR #955 review: `submit()` always forwarded every id in
`stagedSkills`, but that state was only mutated on picker click and
chip removal. Hand-deleting an `@<id>` token from the textarea left
the chip staged, so the request still carried `skillIds: [<id>]` and
the daemon composed a skill the prompt no longer referenced.

Sync the chips with the draft inside `handleChange()` by pruning
`stagedSkills` whenever the new value no longer contains the
`@<id>` token (using the same whitespace boundary as
`removeStagedSkill`'s strip regex). Comment explains why this
prune does not run for `staged` file attachments — users frequently
add files via the upload button without leaving an `@<path>` token,
so a symmetric prune there would erase legitimate uploads.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(daemon): stage @-composed skills' side files alongside the active skill

codex PR #955 review: composing a per-turn `@`-picked skill into the
system prompt appended its body (with the `withSkillRootPreamble`
guidance pointing at relative paths under `<cwd>/.od-skills/<folder>/`)
but never staged the actual folder. `startChatRun` only copied
`activeSkillDir`, so when the project's primary skill was different
(or absent) the composed skill's references/, examples/, and scripts/
files lived only at their absolute repo path — agents that honour
the cwd-relative form (or that don't get `--add-dir`, e.g. Codex with
allowlisted gpt-image projects) couldn't reach them.

Thread the composed skills' dirs out of `composeDaemonSystemPrompt`
as `extraSkillDirs` and stage each one through the same
`stageActiveSkill` API used for the primary skill. Dedupe by folder
basename so a project whose primary skill is also `@`-composed isn't
copied twice. Each preamble already advertises its own folder, so the
prompt and the staged tree stay aligned without further changes.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): respect the Library disable toggle in the project @-mention picker

codex PR #955 review: only `EntryView` received `enabledSkills`
(filtered against `config.disabledSkills`); active projects still
got `skills={skills}` raw, so a skill the user disabled in Settings
kept appearing in the project's `@`-mention popover and could ride
along to the daemon via `skillIds`. That broke the Library toggle
for any project opened on the post-split branch.

Compute a functional-skills-only enabled subset
(`enabledFunctionalSkills`) and pass it into `<ProjectView>` instead.
Templates stay separate — design-templates are filtered through their
own `enabledDesignTemplates` memo for the Templates gallery — so
ProjectView's chat composer still only sees skills, never templates,
matching the pre-split prop surface.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): mock /api/design-templates for example-use-prompt flow

The Templates tab in EntryView fetches from /api/design-templates after
the skills/design-templates split (specs/current/skills-and-design-templates.md).
The example-use-prompt Playwright scenario only mocked /api/skills, so the
gallery card never appeared and the test timed out waiting on
example-card-warm-utility-example. Serve the same fixture summary on both
endpoints so the templates gallery renders the card the test clicks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(tools-pack): create design-templates fixture for resources test

The packaging resources copy now bundles the new design-templates tree
alongside skills (see resources.ts BUNDLED_RESOURCE_TREES). The
copyBundledResourceTrees fixture only created skills, design-systems,
craft, etc., so the recursive copy crashed with ENOENT on
design-templates before it could check the prompt-templates assertion.
Add the missing fixture directory so the test exercises the same set
of resource trees the packaged build does.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(skills): clone built-in side files into the shadow on first edit

mrcfps PR #955 review: editing a built-in skill wrote a USER_SKILLS_DIR
shadow folder that contained only a new SKILL.md. The next listSkills()
pass surfaced the shadow as the active dir, but every side-file resolver
(/api/skills/:id/files, /example, /assets/*, the system-prompt preamble,
and the per-turn cwd staging) reads through skill.dir. With nothing but
SKILL.md in the shadow, the bundled assets/, references/, scripts/, and
examples/ disappeared the moment the user hit save — a built-in like
last30days or live-artifact would break immediately after edit instead
of just having its body overridden.

Teach updateUserSkill() to take a `sourceDir` and clone every entry
except SKILL.md / dotfiles into the shadow on the very first edit. The
shadow stays self-contained, so all the resolvers keep working without
fallback bookkeeping. Subsequent edits detect the existing shadow and
skip the clone, so user tweaks under the side tree survive a re-save.

Wire `sourceDir: skill.dir` from server.ts's PUT /api/skills/:id handler
and add two regression tests:
- 'clones built-in side files into the shadow on the first edit' walks
  the file tree after save and asserts assets/template.html, references/
  notes.md, and scripts/helper.sh all round-trip from the built-in.
- 'preserves user-edited side files on subsequent edits' edits the
  staged assets/template.html, re-saves, and confirms the user content
  is still there.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): rename home tab from Examples to Templates

The Examples tab was renamed to Templates in EntryView (b5993385's
skills/design-templates split — entry.tabExamples became entry.tabTemplates
and the tab value moved from 'examples' to 'templates'), but
entry-chrome-flows still asserted the old label and testId. Update both.

* fix(skills+web): preserve template body in API mode and dir-based skill delete

Two follow-ups from PR #955 review:

1. ProjectView only received `enabledFunctionalSkills`, but
   `composedSystemPrompt()` still resolved `project.skillId` through that
   prop and `fetchSkill()`. Projects created from the new
   `/api/design-templates` surface keep a template id in `project.skillId`,
   so opening one in API mode dropped the template body from the system
   prompt and the upstream request ran without the project's primary
   template instructions. Now ProjectView takes a separate
   `designTemplates` prop (the unfiltered template list, so a
   later-disabled template still loads for projects already created from
   it) and `composedSystemPrompt()` plus the metadata / `isDeck` lookups
   fall back to that list, with `fetchDesignTemplate()` as the body-fetch
   fallback to `fetchSkill()`. The chat composer's `@`-picker keeps
   receiving only the enabled functional skills.

2. `DELETE /api/skills/:id` used `deleteUserSkill(USER_SKILLS_DIR, skill.id)`
   which re-slugified the frontmatter id and removed
   `<userSkillsDir>/<slug>/`. That matched the import shape but missed the
   install shape — `installFromTarget` writes the folder at
   `sanitizeRepoName(url)` (GitHub) or `path.basename(realpath)` (local
   symlink), neither of which is guaranteed to equal the slugified
   frontmatter `name`. A duplicate `app.delete('/api/skills/:id', ...)`
   handler at the install routes never fired because Express resolved the
   earlier registration first, leaving the install/uninstall path without
   working teardown. The handler now removes `skill.dir` (the absolute
   path listSkills already discovered) under a USER_SKILLS_DIR safety
   check, using `lstat` + `unlinkSync` so symlinked local installs unlink
   cleanly without recursing into the user's source tree. The dead
   duplicate handler is removed; `deleteUserSkill` is dropped from the
   server.ts import set (still exported and unit-tested in skills.ts).
   Regression coverage in `apps/daemon/tests/skills-delete-route.test.ts`
   pins both shapes plus the symlink-preserves-source case.

* test(daemon): point hyperframes system-prompt test at design-templates

The merge with main brought in a hyperframes system-prompt test that
reads `skills/hyperframes/SKILL.md`, but this branch's split moved
`hyperframes` into `design-templates/` (same migration as `live-artifact`
already handled above in this file). CI was failing with ENOENT on the
old path.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 17:48:34 +08:00
Caprika
f7f2661bda
[codex] Handle empty API responses as no output (#1244)
* Handle empty API responses as no output

* Fix empty API response comment cleanup

* Stabilize API empty response detection
2026-05-11 16:57:02 +08:00
PerishFire
421ddf553c
fix(pack/win): close running app before silent reinstall (#1238) 2026-05-11 16:35:07 +08:00
Tom Huang
e254d1280b
feat(memory): auto-memory store with chat-protocol-aware extraction (#999)
* feat(memory): auto-memory store with chat-protocol-aware extraction

Markdown memory store at <dataDir>/memory/ with two extractors —
heuristic regex for explicit "remember:" / "我是 X" markers, and a
small-model LLM pass after each turn — folded into the system prompt
so cross-chat preferences, role, and ongoing-work context survive
restarts.

Settings UI:
- Memory tab lists entries, exposes a hand-edited MEMORY.md index, and
  shows an extraction history with per-attempt phase/skip/failure rows.
- Memory model picker is inline next to the chat model picker (CLI and
  BYOK) so the choice "which fast model mines facts each turn?" sits
  next to the chat-model decision instead of a separate panel. The
  picker reuses the same SUGGESTED_MODELS table and "Custom..." pattern
  the chat picker uses.

LLM extractor supports all four protocols (anthropic / openai / azure /
google); pickProvider takes the chat agent id from the chat handler
and constrains its auto-pick to the chat's protocol family — Claude
Code chats no longer surprise users by silently extracting on whatever
OpenAI key happens to be in media-config. When no matching key is
configured the attempt records as 'skipped: no-provider' instead of
quietly switching vendors.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): keep hint outside <label> and disambiguate Model selectors

The inline Memory model picker wrapped its hint paragraph inside the
<label>, which made the hint's "API key" / "model" wording bleed into
the <select>'s accessible name and broke Playwright's getByLabel('API
key') / getByLabel('Model') strict-mode matching in the existing
settings-api-protocol e2e suite.

- Move the hint <p> out of the <label> in MemoryModelInline so the
  select's accessible name is just "Memory model".
- Switch the chat-Model selectors in settings-api-protocol.test.ts from
  getByLabel('Model') to getByRole('combobox', { name: 'Model', exact:
  true }) so they no longer collide with the new "Memory model" select
  that sits next to the chat Model picker.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): address review changes — BYOK wiring, MEMORY.md index, /v1, label wrapper

Addresses the four blocking review threads on PR #999.

1. MemoryModelInline accessibility (mrcfps)
   The inline picker still wrapped its select + custom input + flash +
   hint inside a single <label>, which made the select's accessible
   name absorb every text descendant — including the "API key" / "model"
   hint copy. The previous fix moved only the hint outside; the
   reviewer asked for a non-label wrapper. Switch to <div className="field">
   and associate just the short title with the controls via
   `aria-labelledby` / `aria-label`. The select's accessible name is
   now exactly "Memory model" so `getByLabel` strict-mode locators
   on the surrounding chat form stop cross-matching the memory copy.

2. Respect the hand-edited MEMORY.md index (mrcfps + codex)
   `composeMemoryBody()` was reading every *.md file in the memory
   dir, ignoring the index. Removing a `- [Name](id.md)` line had no
   effect on future prompts. Parse the index's `INDEX_LINK_RE` bullets
   and filter `listMemoryEntries()` to the linked id set, so the
   editor's "delete this line to disable injection" promise actually
   holds.

3. Versioned OpenAI-compatible base URLs (codex)
   `callOpenAI` and `callAnthropic` hard-coded `/v1` onto
   `provider.baseUrl`, breaking custom endpoints whose saved URL
   already includes `/v1` (`/v1/v1/chat/completions`). Apply the same
   conditional `appendVersionedApiPath` helper the chat proxy and
   connection-test routes already use.

4. Wire memory into BYOK / API-mode chats (mrcfps + codex)
   The previous PR's daemon-only memory hook never fired for BYOK,
   leaving the Memory tab + model picker as a no-op for that mode.
   Add the missing surface and wire it through ProjectView:
   - contracts: extend `composeSystemPrompt` with `memoryBody`,
     mirroring the daemon's local composer; add
     `MemorySystemPromptResponse` and the `attemptedLLM` flag on
     `ExtractMemoryResponse`.
   - daemon: expose `GET /api/memory/system-prompt` (returns the
     composed body) and turn `POST /api/memory/extract` into a
     two-phase endpoint — heuristic-only when only userMessage is
     supplied (pre-turn), LLM-only when assistantMessage is also
     supplied (post-turn), so the extraction-history doesn't double
     up.
   - web: ProjectView's BYOK branch now fetches the memory body
     before composing the system prompt, runs the heuristic
     extractor before the run (so "remember:" markers in this turn
     reach this turn's prompt), accumulates assistant text during
     streaming, and queues the LLM extractor on `onDone` — fire-and-
     forget so it never blocks the chat round-trip.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): re-sync BYOK memory override when chat config drifts

The inline memory-model picker captured `apiProtocol` / `chatApiKey` /
`chatBaseUrl` / `chatApiVersion` into the saved override only at the
moment the user clicked a model. If they later swapped the BYOK
protocol tab, rotated the API key, or edited the base URL in the same
settings flow, the daemon's background extractor kept calling the
*old* vendor / credential — directly contradicting the picker's
"borrows the surrounding chat picker's protocol, key, base URL, and
api-version automatically" promise.

Add a debounced effect that compares the persisted (masked) shape
against the live chat props and re-PATCHes /api/memory/config when
they drift. The masked config exposes `apiKeyTail` (last 4 chars), so
key rotation is detectable without ever round-tripping the secret
back to the browser. The 300 ms debounce coalesces the keystroke-
granularity prop updates the parent settings dialog streams during
its autosave loop, so a user editing the base URL doesn't trigger one
PATCH per character. Background re-syncs are silent — the "Saved!"
flash only fires for explicit user clicks, so the picker doesn't feel
like it's fighting them as they edit unrelated chat fields.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): thread BYOK chat config through /api/memory/extract default path

Leaving the BYOK memory picker on "Same as chat" still broke the
default LLM extraction path: `MemoryModelInline` clears the override
for that option, both `/api/memory/extract` calls in `ProjectView`
only sent the messages, and the daemon never persists BYOK creds, so
`extractWithLLM(..., { chatAgentId: null })` always reached
`pickProvider()` with no chat context and fell through to env /
media-config — the wrong vendor for a BYOK chat that works for
inference.

Thread the live BYOK chat config through the extract endpoint as a
per-call snapshot:

- contracts: extend `ExtractMemoryRequest` with an optional
  `chatProvider` (provider/apiKey/baseUrl/apiVersion/model) and add
  `'chat-byok'` to the credentialSource enum.
- daemon: parse + validate `chatProvider` on `/api/memory/extract`
  (provider must be one of the five known shapes) and forward to
  `extractWithLLM` as a new option. `pickProvider()` gets a new
  path 2 that uses the snapshot directly with the per-protocol
  fast-model default — so a memory pass on `gpt-4o` / `claude-sonnet-4-5`
  silently turns into a cheap `gpt-4o-mini` / `claude-haiku-4-5` call
  instead of paying chat-tier rates for sediment work. Override and
  CLI-agent-constrained paths still win when they apply.
- web: `ProjectView` snapshots `apiProtocol` / `apiKey` / `baseUrl` /
  `apiVersion` from the live `AppConfig` on each BYOK extract call
  (both pre-turn heuristic-only and post-turn LLM phases). The
  picker's existing drift-resync effect already covers explicit
  overrides; this snapshot covers the implicit "Same as chat"
  default that the override flow can't reach.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): treat empty apiKey on PATCH as a real clear

MemoryModelInline silently re-PATCHes /api/memory/config whenever the
surrounding BYOK chat creds drift. The previous reuse branch lumped
`apiKey === ''` together with `apiKey === undefined`, so clearing the
chat API key from the picker quietly preserved the old daemon-side
secret and kept calling the provider on a stale credential.

Distinguish four states for the apiKey field:
- absent       -> preserve stored secret (form re-save without re-typing)
- ''           -> clear stored secret (user removed it from the picker)
- 'sk-...'     -> replace
- new provider -> ignore stored secret entirely

Add tests/memory-config-route.test.ts covering all four cases.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 15:45:42 +08:00
PerishFire
31e57fd773
fix(daemon): persist runStatus/endedAt on chat run termination (#1230)
* fix(daemon): persist runStatus/endedAt on chat run termination (#135)

POST /api/runs created the run but never reconciled the messages row
on terminal status. If the web failed to persist the cancel (refresh,
dropped PUT), the row stayed at run_status='running' / ended_at=NULL,
and on reload the elapsed timer kept climbing because the renderer
fell back to now - startedAt.

Mirror routine/orbit reconciliation: attach a wait-completion handler
that updates run_status and ended_at, guarded by COALESCE and a
run_status IN ('queued','running') filter so concurrent web persists
are not clobbered.

Adds cancelRun helper and two regression specs under e2e/tests/dialog/.

* fix(daemon): annotate reconcile callback params for chat-routes

The chat run reconciliation block landed in chat-routes.ts after the
recent server-route split (#1043), where stricter type checking surfaces
implicit `any` parameters. Annotate the wait/then callback as
`{ status: string }` and the catch callback as `unknown`.

* refactor(daemon): extract reconcileAssistantMessageOnRunEnd helper

The inline if/wait/then/catch block in POST /api/runs read as a bolt-on
patch. Lift it to a named file-scope helper so the route handler stays
intent-level (start the run, arrange follow-up reconciliation) and the
guard for missing assistantMessageId is an internal detail.

The helper's docblock describes the invariant ("messages row reflects
the run's terminal state even without web persist"); commit history
keeps the issue context.

* test(e2e): wait for any terminal status in stop-reconcile spec

The earlier .catch fallback chained two waitForRunStatus calls (canceled
then succeeded). waitForRunStatus throws on the first non-expected
terminal, so a canceled run that resolves to failed (e.g. agent exits
non-zero on SIGTERM) would still abort the test before reaching the
messages-row assertion.

Add waitForRunTerminal to e2e/lib/vitest/runs.ts: polls until any
terminal status without throwing on mismatch, since this spec's claim
is about the resulting messages row, not which terminal the run took.

Addresses Codex inline review on PR #1230.
2026-05-11 15:37:52 +08:00
PerishFire
976edaf38e
test: harden e2e smoke and release reports (#1140)
* test: harden e2e inspect specs

* test: wire e2e release reports

* chore: bump packaged beta base to 0.6.1

* test: run release smoke vitest directly

* test: add suite-owned tools-dev lifecycle

* ci: harden stable release packaging

* fix(release,e2e): gate stable signing on verify and harden suite cleanup

- restore `needs: [metadata, verify]` on the stable release `build_mac`,
  `build_mac_intel`, `build_win`, and `build_linux` jobs so Apple
  signing/notarization and Windows release builds cannot run before
  pnpm guard, typecheck, and layout checks complete on the metadata commit.
- in `runToolsDevSuite`, drop the `started` flag and always attempt
  `stopToolsDevWeb` in `finally`; record stop errors in diagnostics, and
  when the test body succeeded, escalate the stop failure to the suite
  result and rethrow — so orphan daemon/web processes from an interrupted
  `startToolsDevWeb` or a broken shutdown can no longer pass silently.

Addresses PR #1140 review feedback from lefarcen and mrcfps.
2026-05-11 13:11:16 +08:00
shangxinyu1
d45bf3fb9a
test: expand entry and settings automation coverage (#954)
* test: harden new project panel metadata coverage

* test: expand entry e2e coverage

* test: drop e2e docs from the guarded package

* test: cover examples gallery interactions

* test: cover examples preview modal actions

* test: cover examples preview escape fullscreen

* test: cover examples template prompt filtering

* test: cover updated settings and entry tabs

* test: fix entry/settings coverage type drift

* test: fix example preview fetch assertion

* test: fix new project panel skill fixture
2026-05-11 10:49:42 +08:00
code-Y
84f768d4a2
feat: add WeChat design system, login-flow skill, and fix API mode tool_calls bug (#1083)
* feat: add WeChat design system, login-flow skill, and fix API mode tool_calls bug

- Add WeChat design system (design-systems/wechat/) with full brand spec
  including color palette, typography, and component rules for chat UI
- Add login-flow skill (skills/login-flow/) for mobile authentication flows
  with P0 checklist, example HTML, and i18n registration across 3 locales
- Fix DeepSeek V4 bug: API/BYOK mode (streamFormat=plain) models now receive
  a directive to emit only <artifact> HTML blocks and suppress tool_calls,
  since plain adapters proxy to external providers that cannot execute tools

* fix: restore full server.ts and WeChat DESIGN.md from ad46d8cd commit

Restore files that were corrupted in PR #1083 head branch.
The WeChat DESIGN.md was reduced to a single line (filename only)
and server.ts was reduced to ~1 line. Both are restored to their
original ad46d8cd state with full content.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: restore full server.ts and WeChat DESIGN.md from ad46d8cd

Restore files corrupted in PR #1083:
- apps/daemon/src/server.ts: restored 7106-line file
- design-systems/wechat/DESIGN.md: restored 301-line WeChat design spec
- skills/login-flow/SKILL.md: restored from local working state
- skills/login-flow/example.html: restored 351-line example HTML

* fix: only suppress tool_calls when streamFormat='plain' explicitly, remove nonexistent assets/template.html

1. streamFormat check now requires explicit 'plain' value instead of defaulting
   to 'plain' when undefined. This prevents normal tool-using chat runs from
   incorrectly inheriting the API/BYOK tool_calls suppression rule.

2. login-flow SKILL.md: removed reference to assets/template.html since that file
   does not exist in the skill bundle and derivePreflight() would inject a hard
   instruction to read it before any other tool, causing pre-flight to fail.

* fix: thread streamFormat to composeSystemPrompt in server.ts call

Previously the composeSystemPrompt call at line ~4940 omitted streamFormat,
causing the composer to default to 'plain' and suppress tool_calls even
for tool-using chat runs. Now streamFormat is passed through from the
adapter definition so the API mode rule only fires when streamFormat='plain'
is explicitly set.

* fix: WeChat category metadata, font-family, and login-flow example interactivity

WeChat DESIGN.md:
- Add Category: Social & Messaging metadata so it appears correctly in picker
- Fix font-family declaration: remove invalid -webkit-font-family prefix,
  use standard font-family so downstream CSS generation works correctly

skills/login-flow/example.html:
- Add password toggle click handler so show/hide actually works
- Change Apple icon fill from hardcoded #fff to currentColor so it is
  visible on light backgrounds

* fix: mirror streamFormat suppression in contracts composer and add WeChat i18n

1. packages/contracts/src/prompts/system.ts: Add streamFormat parameter to
   ComposeInput and ComposeInput interface, mirroring the same suppression
   rule from daemon prompts/system.ts. When streamFormat='plain' is passed,
   a directive is appended telling models not to emit tool_calls and to only
   output <artifact> HTML blocks.

2. apps/web/src/i18n/content.{ts,fr,ru}.ts: Add WeChat design system entries:
   - Add 'wechat' to DE/FR/RU_DESIGN_SYSTEM_IDS_WITH_EN_FALLBACK arrays
   - Add 'wechat' summary to DE/FR/RU_DESIGN_SYSTEM_SUMMARIES
   - Add 'Social & Messaging' category to DE/FR/RU_DESIGN_SYSTEM_CATEGORIES
     (matching the Category: Social & Messaging metadata in WeChat DESIGN.md)

* fix: thread streamFormat='plain' into web composeSystemPrompt for api mode

* test: focus localized content coverage on missing resources

---------

Co-authored-by: Open Design Contributor <z@open-design.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-10 20:38:33 +08:00
Marc Chan
b03a504da6
release: Open Design 0.6.0 (#1080) 2026-05-09 19:58:11 +08:00
Paul Stean
0b039777b9
fix/Bug#772-DesignFilesSortButton-DesignFilesTableNowSortableColumns (#804)
* Disclaimer: Changes made using OpenCode with Big Pickle, an AI code assistant.

1. Removed the ↑ button from DesignFilesPanel.tsx (was a "back" button that
    closed the preview pane — confusingly placed in the header as if it were for
    sorting).

2. Converted the file list to a single <table> with sortable columns:
    - Replaced the section-based grouping (Pages, Sketches, Scripts, Images,
    Other) with a flat, sortable table
    - Added column headers: Name, Kind, Modified — all clickable to toggle
    sort direction
    - Default sort is by modified time descending (same as previous behavior)
    - Sort indicator arrows (↑/↓) show the active sort column and direction
    - Live artifacts remain as a separate section above the table

3. Added i18n keys designFiles.colName, designFiles.colKind, designFiles.
    colModified to all 17 locale files and the type definitions.

4. Updated CSS with table layout styles (.df-table, .df-file-row, column
    width classes, sortable header styles).

    Files modified:
    - apps/web/src/components/DesignFilesPanel.tsx
    - apps/web/src/index.css
    - apps/web/src/i18n/types.ts
    - apps/web/src/i18n/locales/en.ts
    - apps/web/src/i18n/locales/*.ts (all 16 other locale files)

* Updated to preserve keyboard access to sorting

* Fixed keyboard to focus/activate/launch file from Design Files list. Single space bar will show preview, double spare bar will open the file as a tab

* Top pagination bar (above the table):
     - "Show" dropdown with options 15, 30 (default), 45, 60, All
     - Page range indicator (1–20 of 45)
     - Previous / Next buttons

     Bottom pagination bar (below the table):
     - Previous / Next buttons
     - "Go to page" dropdown listing all page numbers
     - Same page range indicator

     Implementation details:
     - All controls use native <select> and <button> elements — fully keyboard
     accessible (Tab, arrow keys, Enter/Space)
     - Page resets to 0 when page size changes
     - safePage clamps to valid bounds when file count changes (e.g. after delete)
     - "All" sets page size to total file count (effectively one page)
     - Prev/Next buttons show disabled state at boundaries with reduced opacity

* All 46 test files, all 385 tests pass. Here's what the regression test covers:
     ┌────────────────────┬──────────────────────────────────────────────────────┐
     │Test                │What it verifies                                      │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │default page size   │500 files → only 30 .df-file-row elements in DOM      │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │page size All       │changing per-page to "All" shows all 500 rows         │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │page size 60        │changing to 60 shows 60 rows                          │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │Next navigation     │clicking Next advances page and shows file-31 (sorted │
     │                    │by mtime desc)                                        │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │Prev/Next disabled  │Prev disabled on page 0, Next disabled on last page   │
     │states              │                                                      │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │jump to page        │bottom dropdown jumps to page 3 (shows file-91)       │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │page info text      │1–30 of 500 → after Next → 31–60 of 500               │
     ├────────────────────┼──────────────────────────────────────────────────────┤
     │render time         │renders 500 files in under 2s                         │
     └────────────────────┴──────────────────────────────────────────────────────┘

* Fixed i18n for DesignFiles, and Fixed DesignFilesPanel Test

* Fixed - P3 — .df-thead rule defined but never applied

* Fixed keyboard use for file navigation, focus and button usage

* Fix i18n for x of y in design files pagination

* Fixed SafePage clamping

* Fixed dupe file total count

* Fixed x of y i18n

* Fixed DeleteSelected i18n and missing from Test

* fix effective pagesize issue, and change duplicate file kind to a filesize

* Readded page/everything selection and i18n

* Fixed i18n issues

* Resolved indonesian i18n issue with cloudflare keys

* Fixed unrelated cloudflare i18n issues as requested in Pull Request by reviewer

* Fix e2e test: click filename button instead of row for preview

The DesignFilesPanel was refactored from <button> rows to a <tr> with
a nested <button> for the filename. The e2e test was still clicking
the <tr> which has no onClick handler, so the preview never appeared.

* Remove duplicate formatSize helper, reuse humanBytes instead
2026-05-09 09:01:57 +08:00
PerishFire
dcfab797c2
[codex] Add stable nightly promotion gate (#962)
* Upload beta e2e spec reports to R2

* Expose beta report URLs in summary

* Complete Indonesian deploy locale keys

* chore: factor release workflow scripts

* chore: bump packaged beta base version

* test: wait for mac packaged runtime health

* fix: capture mac packaged startup logs

* chore: improve mac release build observability

* fix: ad-hoc sign unsigned mac builds

* chore: diagnose mac packaged startup

* fix: relax unsigned mac launch signing

* chore: improve mac launch diagnostics

* chore: simplify beta mac release artifacts

* fix: align packaged mac smoke launch config

* fix: externalize mac daemon wasm dependency

* chore: require signed stable mac releases

* fix: use stable app version for nightly package builds

* chore: clean release artifacts after publish

* chore: publish beta reports as zip

* ci: disable beta mac tools-pack cache

* fix: skip mac framework binary symlinks when signing

* fix: sign mac framework version bundles

* ci: disable beta mac pnpm cache

* chore: align stable release reports

* ci: require matching nightly before stable release

* ci: avoid mac pnpm cache for packaged smoke
2026-05-08 21:48:54 +08:00
Tom Huang
d592f6087f
feat(mcp): external MCP client with daemon-managed OAuth and 39 design-focused templates (#898)
* feat(mcp): add external MCP client with daemon-managed OAuth and 17 design-focused templates

Open Design now acts as an MCP CLIENT and surfaces tools from third-party
MCP servers to the underlying agent (Claude Code, Hermes, Kimi).

Daemon
- New mcp-config / mcp-oauth / mcp-tokens modules: persist server entries
  to .od/mcp-config.json, run the OAuth dance for HTTP/SSE servers
  end-to-end on the daemon (so cloud deployments work and tokens
  survive across turns), and inject Authorization: Bearer headers into
  the per-spawn .mcp.json the daemon writes for Claude Code (or the
  ACP mcpServers map for Hermes/Kimi).
- /api/mcp/servers and /api/mcp/oauth/{start,status,disconnect}
  endpoints, plus spawn-time wiring in agents that hands the configured
  servers to the active agent CLI.
- System-prompt directive for connected external MCPs so the model
  does not chase Claude Code's synthetic *_authenticate /
  *_complete_authentication tools when the Bearer is already pinned.

Web
- Settings -> External MCP servers panel with per-row OAuth Connect /
  Disconnect / Refresh affordances and per-row template hints.
- New "Add server" picker categorized into 7 groups
  (image-generation, image-editing, web-capture, ui-components,
  data-viz, publishing, utilities) with a search box, sticky close
  button, collapsible <details> sections (auto-expand on search),
  60vh capped scroll region, and a pinned Custom-server footer.
- ChatComposer /mcp slash and MCP picker button forward to the new
  Settings tab; AssistantMessage renders MCP tool calls inline;
  markdown autolinker handles bare http(s) URLs (incl. OAuth links)
  before italic markers so OAuth callback URLs do not get
  italic-fragmented mid-token.

Contracts
- packages/contracts/src/api/mcp.ts owns the wire shapes
  (McpServerConfig, McpTemplate with stable McpTemplateCategory
  enum, McpServersResponse, OAuth start/status/disconnect bodies, the
  postMessage payload from the OAuth callback).

Templates (17 built-in)
- image-generation: Higgsfield (OpenClaw, OAuth HTTP), Pollinations,
  Allyson (animated SVG), AWS Bedrock Image (uvx).
- image-editing: Imagician, ImageSorcery.
- web-capture: just-every screenshot-website-fast, ScreenshotOne.
- ui-components: 21st.dev Magic, shadcn/ui, FlyonUI.
- data-viz: AntV Chart, Mermaid.
- publishing: EdgeOne Pages.
- utilities: Filesystem, GitHub, Fetch.

Tests
- apps/daemon/tests/mcp-{config,oauth,tokens,spawn}.test.ts cover
  storage round-trip, OAuth helpers, token persistence, spawn-time
  wiring, every template's transport / command / args / env-field
  invariants, and the canonical category enum.
- apps/web/tests/runtime/markdown.test.tsx covers the new autolinker
  ordering rules.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(mcp): add 21 more design-focused templates and a `design-systems` category

Expands the built-in MCP picker from 17 to 38 templates so users can compose
the full Open Design craft loop (design-system intake → generate → edit →
audit → publish) without leaving the Settings dialog. Every install spec is
verified live against the upstream README; templates that needed Go binaries,
multi-step `init` ceremonies, or massive runtime stacks (PostgreSQL + Redis
+ Ollama) are intentionally deferred so picking a template still resolves to
a working server in one click.

New `design-systems` category between `web-capture` and `ui-components`
(reflects the upstream-of-components position in the workflow). Mirrored in
`McpTemplateCategory` on both contracts and daemon, and `CATEGORY_ORDER` on
the web side.

New templates by category:

- image-generation (+4): prompt-to-asset (icons / favicons / OG / logos with
  free-tier routing across Cloudflare AI / NVIDIA NIM / HF / Stable Horde),
  Nano Banana (hosted streamable HTTP, virtual try-on + product placement),
  Seedream (hosted streamable HTTP, ByteDance Seedream v3-v5 + SeedEdit),
  fal.ai (uvx, 600+ models incl. FLUX / Kling / Hunyuan / MusicGen).
- image-editing (+3): Photopea (34 layered-editor tools — closes the PSD
  gap), Topaz Labs (AI upscale / denoise / sharpen), Transloadit (86+ media
  pipeline robots).
- web-capture (+1): Pagecast (browser → demo GIF / MP4 with auto-zoom).
- design-systems (+4, NEW category): Figma-Context (Framelink, designs →
  code), Design Token Bridge (Tailwind ⇄ CSS ⇄ Figma ⇄ M3 / SwiftUI / W3C
  DTCG + WCAG contrast), Design System Extractor (Storybook scrape),
  Aesthetics Wiki (cottagecore / dark-academia / y2k / … moodboards).
- data-viz (+2): MCP Dashboards (45+ chart types + KPI dashboards),
  Excalidraw Architect (hand-drawn architecture diagrams).
- publishing (+6): PageDrop, PDFSpark, OGForge, QRMint, Slideshot
  (HTML → PDF / PPTX / PNG with 7 themes), Deckrun (Markdown → PDF / video,
  hosted free tier with no key required).
- utilities (+1): A11y axe-core (WCAG 2.0/2.1/2.2 + color-contrast + ARIA).

Tests cover every new template's wiring (command, args, env / header
required-vs-optional, secret flag), the category enum invariant, and
in-category declaration order for image-generation, design-systems and
publishing buckets where the order is what users see in the picker. 21 new
test cases pass; full mcp-config suite is green.

Templates intentionally deferred (documented in PR body): figma-use
(needs Figma desktop with --remote-debugging-port=9222), m-moire (multi-step
`memi suite init` + daemon ceremony), gemini-media-mcp + trident-mcp (Go
binaries — no npx / uvx path), Pixelle-MCP (full app with web UI + ComfyUI
backend), storybook-addon-mcp (lives inside user's Storybook, not standalone),
primitiv (multi-step init / build / serve), ReftrixMCP (PostgreSQL + Redis +
Ollama + DINOv2), narasimhaponnada/mermaid (overlap with peng-shawn).

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(mcp): add figma-use template (write designs from chat) under design-systems

figma-use is the natural counterpart to Figma-Context already in this PR:
where Framelink reads Figma designs into the model, figma-use writes back
into the canvas (90+ tools — create frames / text / components / variants,
render JSX into Figma, export PNG/SVG, query nodes via XPath, lint for
WCAG / auto-layout / hardcoded colors, analyze design systems).

Wired as an HTTP MCP template (`http://localhost:38451/mcp`) because
`figma-use mcp serve` only exposes HTTP — there's no stdio mode in the
upstream `serve.ts`. No API key. Two prerequisites the user owns are
spelled out in the description so picking the template still resolves to
a working server: (1) start Figma with `--remote-debugging-port=9222`
(or `figma-use daemon start --pipe` on Figma 126+), and (2) leave
`npx figma-use mcp serve` running in a terminal.

Inserted between `design-system-extractor` and `aesthetics-wiki` so the
design-systems category reads as a workflow: read existing design (Figma
Context) → translate tokens (Token Bridge) → extract from Storybook
(Extractor) → write back to Figma (figma-use) → break creative block
(Aesthetics Wiki).

Tests cover the new template's transport (`http`), endpoint URL, the
empty header-fields invariant (no auth required), and bump the
design-systems group order to include it.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(settings): i18n the External MCP / MCP server / Connectors sidebar entries and make the dialog header track the active section

The External MCP sidebar entry this PR introduces was hardcoded English
("External MCP / Add MCP tools (Higgsfield, GitHub…)"). Same for the
adjacent Connectors and MCP server entries. The dialog header was also
pinned to "Execution & model" copy, so opening Settings → External MCP
showed a header that lied about which section the user was on.

Adds six translation keys — `settings.connectorsTitle/Hint`,
`settings.mcpServerTitle/Hint`, `settings.externalMcpTitle/Hint` — and
translates them across all 17 locales (ar, de, en, es-ES, fa, fr, hu, id,
ja, ko, pl, pt-BR, ru, tr, uk, zh-CN, zh-TW).

`SettingsDialog` now derives the header title/subtitle from the active
section (11 sections total) instead of a single hardcoded pair, so each
section renders an honest header.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): pin level: 3 on dialog heading lookups for Pets and Connectors

CI's Validate workspace job (#1479) failed two Playwright cases with the
strict-mode violation:

  getByRole('dialog').getByRole('heading', { name: 'Pets' })
  resolved to 2 elements:
    1) <h2>Pets</h2>
    2) <h3>Pets</h3>

Same root cause as the unit-test fix already in this PR: the dynamic
dialog `<h2>` now echoes the section's own `<h3>` because the dialog
header tracks the active section. Disambiguate to `level: 3` so each
assertion still pins the section heading specifically (which is what
the test intends to verify).

Audit of the rest of e2e/ for `dialog.getByRole('heading', ...)` —
settings-api-protocol.test.ts looks for "OpenAI API" / "Anthropic API"
section h3s which never appear in the dialog `<h2>` (always
"Execution & model"), so those stay safe.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mcp): bind OAuth refresh to the issuing client and skip stale tokens

Persist the OAuth client context (token endpoint, client_id, client_secret,
issuer, redirect_uri, resource) alongside the bearer token so refresh hits
the same client the refresh_token was bound to (RFC 6749 §6). The previous
refresh path re-ran beginAuth with a dummy OOB redirect URI, which kept
getOrRegisterClient from finding the original DCR client and made
providers reject the refresh on the next chat turn. Refreshes now reuse
the persisted endpoint/client pair directly.

Also stop injecting expired access tokens at spawn time when refresh is
unavailable or fails. Pinning a stale Bearer made every Claude MCP call
401 while the prompt still treated the server as connected; on that path
we now skip the entry and let the UI surface a reconnect.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 17:59:20 +08:00
Marc Chan
b06f26a5fd
test: strengthen e2e PR coverage (#796)
* test: strengthen e2e PR coverage

* fix: address e2e PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* ci: cache Windows packaged smoke builds

* test: fake additional agent runtimes

* fix: address e2e PR feedback

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Route tools-pack mac starts through a launch-time packaged config override so portable packaged smoke runs keep using the namespace runtime root that inspect and logs expect.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Fall back to the packaged app's embedded config when the build output config is missing so installed mac starts still work.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: align packaged mac PR smoke with tools-pack runtime mode

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Keep blake3-wasm out of the packaged mac daemon prebundle so the standalone runtime loads the Cloudflare asset hasher from node_modules instead of crashing in ESM.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address e2e PR feedback

Skip the portable mac launch override when the bundled packaged config is missing so installed fallback app targets can still boot with packaged defaults.

Add a regression test covering the missing-config start path.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(pack): remove duplicate mac prebundle dependency key
2026-05-08 16:48:10 +08:00
Marc Chan
e14b8092ea
feat: add Orbit activity summaries (#681)
* feat: add Orbit activity summaries

* fix(orbit): make runs navigable while agent continues

* fix(web): widen minimum chat panel

* feat: support Orbit template selection

* fix(daemon): avoid bogus skill side-file preflight

* fix(web): collapse orbit artifact project cards

* fix(web): preserve orbit project card titles

* fix: improve Orbit run daily briefing

* fix: handle Orbit digest data failures

* fix: load Orbit templates and connector tools reliably

* fix: keep Orbit summary counts consistent

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: apply Orbit template skill context

* fix: cache and curate connector tools for Orbit

* fix: align Orbit defaults and connector discovery

* fix: simplify Orbit template settings

* fix: move connectors into settings

* fix: compact connector settings catalog

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: address Orbit PR feedback

Generated-By: looper 0.6.1 (runner=fixer, agent=opencode)

* fix: prevent connector action button from stretching into pill

The icon-only connect/disconnect buttons in the embedded connectors
catalog inherited min-width: 92px / 106px from the non-embedded pill
rules, overriding the 24px square sizing and causing the buttons to
overlap the card head text. Reset min-width to 0 in the embedded
icon-only rule so the compact square layout holds.

* fix(web): align live artifact file rows

* fix: clean up Orbit connector settings lifecycle

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix: address Orbit review regressions

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* feat(web): localize Orbit and connector settings

* feat(web): gate Orbit runs without connectors

* feat(web): refine connector settings UX

* feat(web): safeguard Composio key clearing

* fix(web): refresh Composio tool badges

* feat(web): show connector logos

* feat(daemon): localize Orbit prompt window

* fix(daemon): clarify blocked connector callback closes

* test(daemon): harden flaky async probes

* fix(web): align Indonesian connector locale keys

* test(web): align connector browser props

* fix(web): preserve explicit credential clears

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): time out Composio logo proxy fetches

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): localize Indonesian connector settings copy

Translate the new connector settings strings in the Indonesian locale and lock them with a regression test so this surface no longer silently falls back to English.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): preserve discovered connector tools

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): preserve onboarding autosave completion

Keep settings autosave from clearing onboarding completion after the close gesture, and expose the desktop main types from source so workspace validation can typecheck packaged imports without a prior desktop build.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): defer Composio catalog cache hydration

Load persisted Composio catalog data only after the runtime data directory is configured so startup cannot read another namespace's cache. Add a regression test that exercises the module-load singleton path.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): treat discovery completion independently

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): preserve latest settings draft on close

Use the latest persisted settings draft when the dialog closes so onboarding completion does not race a stale daemon sync and overwrite newer Orbit/template selections.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): avoid syncing draft Composio key on Orbit run

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): localize Orbit settings copy

Translate the new Indonesian Orbit and autosave strings so the settings UI no longer falls back to English and the locale regression stays covered.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): prefer fresh connector catalog state

Keep refetched connector status/auth data authoritative while retaining discovery-only tool metadata so the connectors UI stays consistent after refreshes.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): declare Indonesian locale fallback keys explicitly

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): inline Indonesian fallback strings for CI

Replace the Indonesian locale's per-key English lookups with explicit strings so workspace typecheck no longer depends on brittle build-mode resolution in CI.

Add a regression test that blocks those per-key English lookups from reappearing in the CI-sensitive fallback sections.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): restrict proxied connector logos to image MIME types

Reject non-image upstream logo responses so the daemon never serves third-party HTML from its localhost origin.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* test(e2e): align settings dialog regressions

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): decouple Orbit runs from media sync failures

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): keep SPA catch-all export-compatible

Disable dynamic catch-all params for the exported SPA shell so Next.js static builds can emit the root route again. Add a regression test covering the route config against the web export mode.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): preserve Orbit config and workspace routes

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): block SVG in connector logo proxy

Reject SVG and other unsafe proxied logo responses so third-party logo content cannot execute under the daemon origin, while keeping raster logo fetches working and making rejected responses non-cacheable.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): fall back to static catalog for empty cache

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): disable Orbit run before connector gate resolves

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(desktop): export shipped desktop types

Point the desktop ./main type export at the generated declaration so installed consumers resolve the published file set.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): restore persisted question form selections

Render historical submitted answers directly so reloaded question forms keep their locked selections visible.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): retry forced media sync autosave

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): keep Composio logo timeout through body read

Keep the Composio logo fetch timeout active until the response body is fully consumed so stalled body reads abort and clear the inflight cache entry. Add a regression test that proves a delayed body read times out and the next request can recover.\n\nGenerated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): refresh Orbit gate after connector auth

Re-check connector availability when the settings window regains focus so Orbit unlocks as soon as a connector finishes authenticating in the same settings session.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): keep connector detail tool lists intact

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): ignore malformed Orbit summaries

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(e2e): stabilize design-system multi-select flow

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): cap Composio logo cache growth

Bound the Composio logo cache with LRU eviction and expired-entry pruning so repeated untrusted logo requests cannot grow daemon memory without limit.

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(daemon): bound proxied Composio logo payloads

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): align autosave settings tests

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): remove stray CSS conflict marker

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fixer: address PR #681 follow-up items

Generated-By: looper 0.6.2 (runner=fixer, agent=opencode)

* fix(web): restore restart routes and connector flows

* fix(web): keep SPA export route static

* fix(web): stabilize chat scroll tests

---------

Co-authored-by: lefarcen <935902669@qq.com>
2026-05-08 14:27:46 +08:00
shangxinyu1
7107623ee2
test: expand entry and settings automation coverage (#811)
* test: harden new project panel metadata coverage

* test: add settings and connector sync coverage

* test: expand entry e2e coverage

* test: satisfy exact optional property types in entry connector flow

* test: keep entry Playwright coverage under e2e/ui

* test: tighten coverage docs and settings test cleanup

* test: drop e2e docs from the guarded package

* docs: move automation coverage docs out of e2e

* test: restore clipboard cleanup without delete

* test: match composio save dialog behavior

* test: avoid placeholder assertion after composio save

* test: expect closeModal on settings saves

* test: align settings save assertions with closeModal flags

* test: fix settings save mocks

* test: align composio replacement hint
2026-05-08 09:30:16 +08:00
lefarcen
2bb029cb58
release: Open Design 0.5.0 (#820)
0.5.0 已从 c21cbc6 发布(https://github.com/nexu-io/open-design/releases/tag/open-design-v0.5.0);本次 squash 把版本 bump 与 CHANGELOG [0.5.0] 条目带到 main 历史,便于后续 0.5.1 release 在 main 上走标准 dispatch 流程。
2026-05-08 00:41:01 +08:00
PerishFire
a383b4bd3a
Preserve beta e2e spec reports in R2 (#812)
* Upload beta e2e spec reports to R2

* Expose beta report URLs in summary

* Complete Indonesian deploy locale keys
2026-05-07 20:55:03 +08:00
PerishFire
6efac8887e
Improve Windows beta packaging and installer flow (#768)
* Optimize Windows packaged web output

* Fix packaged contracts runtime build

* Optimize Windows packaged size pruning

* Prune Windows root Next payload

* Remove Windows bundled Node runtime

* Prune Windows standalone duplicate Next

* Add tools-pack cache foundation

* Cache Windows packaged build layers

* Cache Windows workspace builds

* Cache Electron-ready Windows app

* Split Windows tools-pack module

* Cache Windows dir build outputs

* Split Windows pack build modules

* Document Windows NSIS smoke namespace limits

* Move Windows NSIS smoke note to agents guide

* Optimize Windows beta packaging

* Bump packaged beta base version

* Improve Windows installer namespace UX

* Improve Windows tools-pack cache keys

* Stabilize Windows beta cache version keys

* Cache Windows workspace build outputs

* Optimize windows release beta cache layers

* Cache windows release dependencies

* Trim windows release cache before save

* Refresh windows tools-pack cache key

* Improve windows installer preflight prompts

* Fallback NSIS installer strings to English

* Fix Windows installer cleanup and preflight

* Improve Windows NSIS state logging

* Fix system NSIS Persian language alias

* Use long-path removal for Windows uninstall

* Fix mac tools-pack tests on Windows

* Address Windows packaging review feedback

* Fix Windows installer cache namespace isolation

* Include web output mode in Windows tarball cache key

* Use unique Windows release cache save keys
2026-05-07 16:44:15 +08:00
bojie.hbj
a5aaeb7905
Fix: Remove element selector tooltip in Tweaks mode (#697)
* Fix: Remove element selector tooltip in Tweaks mode

* fix: cover localized content fallbacks

* fix: keep comment target labels testable

---------

Co-authored-by: bojiehuang <bojiehuang@bojiehuangdeMacBook-Pro.local>
Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-07 00:02:38 +08:00
shangxinyu1
8301bcd46e
test: add desktop settings and project flow e2e coverage (#306)
* test: add desktop settings regression coverage

* test: stabilize desktop smoke interactions on latest main

* fixer: address PR #306 follow-up items

Generated-By: looper 0.2.7 (runner=fixer, agent=codex)

* test: expand ui e2e automation suite

* fix: add missing Ukrainian prompt template labels

* chore: align desktop e2e helpers with layout guard

* chore: move settings protocol e2e into ui suite

* fix: preserve api provider settings across protocol switches

* fix: avoid leaking api keys across protocol drafts

* test: fold desktop smoke coverage into mac spec

* fix: dedupe Ukrainian prompt template labels
2026-05-06 21:48:12 +08:00
lefarcen
ae4a08773a
chore(release): prepare 0.4.1 (#659)
- bump remaining monorepo package.json files to 0.4.1 after apps/packaged was already bumped in #637
- add CHANGELOG.md [0.4.1] - 2026-05-06 entry covering the startup hotfix and 19 merged PRs since 0.4.0:
  - Added: manual edit mode (#620), Cmd/Ctrl+P quick file switcher (#556), resizable chat panel (#563), PI status/cancel updates (#618), accessibility and RTL/Bidi craft modules (#587, #595), i18n structure checks (#608)
  - Changed: first-PR README links now surface help-wanted issues (#605)
  - Fixed: packaged contracts runtime exports (#577), packaged runtime beta gating (#637), ACP/MCP/agent fixes (#604, #612, #627), conversation error recovery (#623), native mac quit (#637)
  - Documentation/Internal: OD_DATA_DIR migration docs (#570), Simplified Chinese QUICKSTART (#578), zh-TW/ko README syncs (#586, #619), generated metrics (#592)

Release workflow validation runs after merge via release-stable.
2026-05-06 18:05:56 +08:00
PerishFire
f1cdb2844a
test(e2e): gate beta packaged runtime (#637)
* test(e2e): gate beta mac packaged runtime

* test(e2e): separate ui automation layout

* test(e2e): move localized content coverage

* chore(release): prepare packaged 0.4.1 beta validation

* test(e2e): keep ui lane playwright-only

* fix(web): keep chat recoverable after conversation load failure

* fix(desktop): honor native mac quit
2026-05-06 17:44:29 +08:00
Caprika
8eb9b1b506
Implement manual edit mode (#620) 2026-05-06 16:13:52 +08:00
nhancdt2602
d1d63f9dae
Fix/error message persistence (#623)
* test(ProjectView): persists daemon errors on assistant messages

Signed-off-by: nhancdt2602 <nhancu2602@gmail.com>

* fix(ProjectView): persists daemon errors on assistant message

Signed-off-by: nhancdt2602 <nhancu2602@gmail.com>

* test(e2e): cover agent switch and persistence

Signed-off-by: nhancdt2602 <nhancu2602@gmail.com>

* chore(chat-event): handle falsy empty error detail

Signed-off-by: nhancdt2602 <nhancu2602@gmail.com>

---------

Signed-off-by: nhancdt2602 <nhancu2602@gmail.com>
2026-05-06 15:28:30 +08:00
lefarcen
963bbf2500
release: Open Design 0.4.0 (#454) 2026-05-05 23:39:40 +08:00
PerishFire
bbdd4e84b5
chore: enforce test directory conventions (#496)
* chore: enforce test directory conventions

Move package, app, and tool tests out of src and add guard enforcement so source directories stay source-only.

* ci: use guard and package-scoped tests

Run the new repository guard in CI and keep test execution aligned with package-scoped commands after removing root aliases.

* ci: align stable release guard check

Use the new repository guard in stable release verification after replacing the residual-JS-only script.

* chore: tighten test layout enforcement

Enforce sibling tests directories, typecheck moved test suites with dedicated configs, and refresh remaining guidance that pointed at src-based tests.

* chore: clarify no-emit test tsconfigs

Explicitly disable declaration-only emit in test tsconfigs so review tooling sees they are no-emit typecheck configs.
2026-05-05 15:34:22 +08:00
Oscar Nogueira Neto
df2e8adba0
docs: add Brazilian Portuguese (pt-BR) translations (#460)
* docs: add Brazilian Portuguese (pt-BR) translations

Translate README, CONTRIBUTING, QUICKSTART, and the e2e/cases,
e2e/reports, skills/html-ppt, skills/guizang-ppt READMEs into pt-BR.
Add the pt-BR entry to the language switcher in every existing locale
variant (en, de, fr, ko, ja-JP, ru, uk, zh-CN, zh-TW for README; en,
de, fr, ja-JP, zh-CN for CONTRIBUTING; en, de, fr, ja-JP for
QUICKSTART) so readers in any language can jump to the Portuguese
version. Code blocks, file paths, identifiers, license attribution
links and brand names are kept verbatim with the source.

* docs(pt-BR): use repo-relative links in e2e READMEs

Replace author-local absolute paths (/Users/mac/...) with repo-relative links so the translated docs navigate correctly on GitHub and other checkouts.

* docs(pt-BR): fix README badge anchor and skill reference link

- Update Agents badge href to the Portuguese fragment (#agentes-de-código-suportados) so the badge jumps to the translated heading.
- Restore the [`SKILL.md`][skill] reference-link syntax in the comparison table; the opening bracket was lost in translation, breaking the row.
2026-05-05 09:14:06 +08:00
jiakeboge
2473ab9567
fix(web): add copy buttons for FileViewer code blocks (#471)
* fix(web): add copy buttons for FileViewer code blocks

* fix(web): harden FileViewer markdown copy controls

* fix(web): restore focus after clipboard fallback

* test(e2e): restore execCommand after markdown copy tests
2026-05-05 09:09:39 +08:00
lefarcen
016c08183f
release: Open Design 0.3.0 2026-05-03 23:07:28 +08:00
laurin
0c65a0508e
fix(web): keep Design Files view active after deleting a file (#329)
* fix(web): keep Design Files view active after deleting a file (#115)

Deleting a file from the Design Files panel was navigating the user
into another file's tab (typically the previously-opened file such as
`index.html`). Two unrelated symptoms contributed:

- `FileWorkspace.handleDelete` always called
  `setActiveTab(nextActive ?? DESIGN_FILES_TAB)`, which yanked the user
  out of the Design Files panel whenever `tabsState.active` still
  pointed at a real file (clicking the Design Files tab only updates
  local `activeTab`; it does not touch persisted `tabsState.active`).
- The row popover had `onMouseDown` propagation guard but no
  corresponding `onClick` guard on the destructive action path.

Fix in `FileWorkspace.handleDelete`:
- If the deleted file is the one being viewed (`activeTab === name`),
  fall back to another open tab (or Design Files if none remain) — the
  prior behavior, which is correct in this case.
- Otherwise (deletion came from the Design Files panel, the typical
  path), leave `activeTab` alone so the user stays in the list view.
  Only null out `tabsState.active` when it referenced the deleted file
  to avoid leaving a dangling pointer for the hydration `useEffect` to
  resync against.

Defensive cleanup in `DesignFilesPanel`:
- Add `onClick` propagation guard on the popover wrapper (mirrors the
  existing `onMouseDown` guard).
- Stop propagation on each popover button (Open / Download / Delete)
  and on the download `<a>` so destructive/menu actions can't bubble
  into row click handlers under any DOM arrangement.

Closes #115

* fix(web): address review feedback + add multi-tab delete regression E2E

Review feedback (#329):
- Drop the redundant `onClick={(e) => e.stopPropagation()}` on the
  download `<a>` wrapper; the inner `<button>` is the actual click
  target and already stops propagation.
- Expand the comment in `handleDelete`'s else branch to explicitly
  state that `activeTab` is preserved because the user is in a
  different context (Design Files panel or another tab) and shouldn't
  be navigated away.

Regression coverage:
- Extend `runDesignFilesDeleteFlow` to upload a sibling file
  (`keep-me.png`) before `trash-me.png`, so deletion has a fallback
  tab that the buggy code would have navigated to. After deletion,
  assert the Design Files tab still has `aria-selected="true"` and
  the sibling tab is still present (just not auto-activated).
2026-05-03 09:34:17 +08:00
lefarcen
62b01a6dbf
release: Open Design 0.2.0 (#297) 2026-05-02 22:28:59 +08:00
Caprika
0c00f241e7
Add preview comment attachments (#284) 2026-05-02 19:23:46 +08:00
Marc Chan
a93246d892
chore(ci): add GitHub CI workflow (#271)
* Add GitHub CI workflow

* Address CI workflow review feedback

Generated-By: looper 0.3.0 (runner=fixer, agent=openai/gpt-5.5)
2026-05-02 16:14:33 +08:00
Tom
5a0f954297
fix(daemon): emit tool_use from tool_execution_start in pi-rpc (#186)
* fix(daemon): emit tool_use from tool_execution_start in pi-rpc

The pi-rpc adapter emitted tool_use from message_end, which fires
before tool execution starts. The web UI pairs tool results to prior
tool_use events, so receiving tool_result without a preceding tool_use
broke tool card rendering and file auto-open behavior.

Move tool_use emission to tool_execution_start, matching the pattern
in copilot-stream.ts. Remove redundant tool call extraction from
message_end (tool_use is now emitted at execution time, usage is
already emitted from turn_end).

Extract mapPiRpcEvent as a pure exported function so tests exercise
the real event mapping logic instead of an inlined copy that can
diverge from production.

Ref: mrcfps review comments on PR #117

* docs(daemon): clarify mapPiRpcEvent mutability contract

The function mutates ctx.sentFirstToken to track streaming state.
Calling it "pure" is misleading; revised the doc comment to say
no I/O or child process interaction instead.

* fix(pi-rpc): remove redundant status(tool) emission from tool_execution_start

Now that tool_use fires inline from tool_execution_start, the
accompanying status(tool) event is redundant: tool_use already
carries the tool name, and the UI renders running state from the
tool card. The extra status pill breaks consecutive tool_use
grouping in AssistantMessage.buildBlocks. Aligns with
copilot-stream, which emits only tool_use from
tool.execution_start with no status event.
2026-05-02 16:06:37 +08:00
Siri-Ray
0bafc73d24
Add visible conversation timestamps (#120)
* Add visible conversation timestamps

Generated-By: looper 0.2.7 (runner=worker, agent=codex)

* Fix assistant message timestamps

Generated-By: looper 0.2.7 (runner=fixer, agent=codex)
2026-05-02 11:15:07 +08:00
d 🔹
f8af2cd875
fix(web): exit PreviewModal fullscreen on first Esc press (#168)
Closes #141.

When the user clicked the Fullscreen button, requestFullscreen() put the
stage element into native browser fullscreen and React's `fullscreen`
state was set true. Pressing Esc was meant to exit the overlay, but in
browsers like Firefox the browser consumes Esc to drop its native
fullscreen element without delivering keydown to JS. The React state
stayed true, the `ds-modal-fullscreen` class lingered, and only a second
Esc reached the keydown handler that flipped the state.

Subscribe to `fullscreenchange` so the React state mirrors the native
state. When the browser exits its fullscreen element, the overlay drops
on the same keystroke. The keydown handler is still needed for the
fallback path (no native fullscreen API support, where requestFullscreen
is undefined and only React state is set).

Adds three regression tests in e2e/tests/preview-modal-fullscreen.test.tsx
covering the bug fix path, the keydown fallback, and a non-collapse
guard for transitions where another element is still fullscreen.

Co-authored-by: d 🔹 <258577966+voidborne-d@users.noreply.github.com>
2026-04-30 23:35:01 +08:00
nettee
3fb849d047
Fix chat runs surviving web disconnects (#146)
* fix chat runs surviving web disconnects

* fix chat run create abort propagation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon keepalive reconnect budget

Generated-By: looper 0.0.0-dev (runner=fixer, agent=gpt-5.5)

* fix daemon stream disconnect cancellation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon stream abort cancellation race

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon run cancellation semantics

* fix load

* doc

* 2

* add run refresh recovery

* fix active run refresh status

* fix reattach abort handling

* fix

* fix chat initial scroll

* fix daemon start failures

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix stop run status

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* extract daemon run service

* move prompt composition to daemon

* fix prompt module resolution

* fix project id generation

* add project run status

* add designs kanban view with awaiting_input status

- add grid/kanban view toggle on Designs tab; persist choice in localStorage
- introduce awaiting_input project display status (daemon-derived from
  unanswered <question-form>) so projects asking the user aren't shown
  as Completed; ordered between Running and Completed with amber accent
- hide transient queued state from users: coerce queued/starting to
  running in daemon /api/projects projection and drop the queued kanban
  column
- a11y polish on Designs cards: Space activation, aria-labels on delete,
  focus-visible outlines, reveal delete on focus-within and touch,
  prefers-reduced-motion handling
- kanban layout uses flex sizing instead of viewport math; scoped icon-
  only pill button rule fixes view-toggle icon alignment

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-04-30 20:16:46 +08:00
Tom
8f34e39b7b
feat(daemon): add pi coding agent adapter (#117)
* feat(daemon): add pi coding agent adapter

Add pi (https://pi.dev) as a supported coding agent, using its
--mode rpc JSON-RPC protocol over stdio for structured event streaming.

Changes:
- apps/daemon/pi-rpc.js: new RPC session handler that drives pi's
  --mode rpc protocol, translating typed agent events (text_delta,
  thinking_delta, tool_use, tool_result, usage, status) into the
  daemon's UI event format. Auto-resolves extension UI requests
  (fire-and-forget consumed, dialogs auto-approved) so pi stays
  unblocked in the headless web UI. Kills the process after agent_end
  since pi's RPC process is designed for multi-prompt sessions.
- apps/daemon/agents.js: add pi agent definition with custom
  fetchModels (pi --list-models outputs to stderr, not stdout),
  575+ models from 20+ providers, reasoning/thinking level support
  via --thinking flag, and streamFormat 'pi-rpc'.
- apps/daemon/server.js: wire pi-rpc stream format to
  attachPiRpcSession; skip stdin.end() for pi-rpc since the RPC
  session manages stdin bidirectionally.
- apps/daemon/acp.js: export createJsonLineStream for reuse by
  pi-rpc.js.
- apps/daemon/pi-rpc.test.mjs: 19 unit tests covering model list
  parsing (TSV, dedup, edge cases), RPC event translation (text,
  thinking, tools, usage, compaction, retry), sendCommand wire
  format, extension UI auto-resolution.
- e2e/tests/structured-streams.test.ts: add pi RPC tool_use/tool_result
  event mapping test alongside existing Claude/Copilot fixtures.

Verified end-to-end: daemon /api/chat → pi RPC → SSE stream with
status, text_delta, usage, and tool events. Live E2E test passes
(OD_E2E_RUNTIMES=pi). All 59 project tests green.

* refactor(daemon): migrate pi-rpc to TypeScript

Follow upstream #118 TypeScript migration convention: rename
pi-rpc.js → pi-rpc.ts and pi-rpc.test.mjs → pi-rpc.test.ts
with @ts-nocheck header (same as all other daemon modules).
Import paths remain ./pi-rpc.js per NodeNext module resolution.

* fix(daemon): avoid duplicate usage events in pi-rpc handler

Pi emits both message_end and turn_end per turn, both carrying
usage data. Emitting from both handlers caused double-counting
in the UI and any consumer that aggregates usage.

Remove usage emission from the message_end branch since turn_end
is the canonical per-turn usage source. Keep tool call extraction
in message_end (unique data not available in turn_end).

Add regression test confirming exactly one usage event is emitted
when both message_end and turn_end carry usage for the same turn.

Addresses Copilot P2 review on PR #117.

* fix(daemon): scope pi RPC id counter per session, bump graceful shutdown

Move nextRpcId and sendCommand inside attachPiRpcSession as local
state, matching the pattern in acp.ts where nextId is scoped per
session. Prevents RPC id collisions across concurrent /api/chat
requests.

Bump post-agent_end SIGTERM grace period from 2s to 5s and make it
configurable via PI_GRACEFUL_SHUTDOWN_MS env var for resource-
constrained machines.

Add test confirming concurrent sessions get independent id sequences.

* fix(daemon): wrap parser.feed in try-catch in pi-rpc

Catch errors from parser.feed and route them through the
existing fail() handler instead of letting them propagate
as unhandled exceptions.
2026-04-30 17:45:11 +08:00
PerishFire
c6d11018a0
Refresh desktop integration control plane (#123)
* feat(dev): add desktop tools-dev control plane

* refactor(sidecar): split Open Design contracts

Move Open Design-specific sidecar protocol definitions into @open-design/contracts so sidecar and platform can remain descriptor-driven primitives.

* refactor(daemon): organize package sources

Keep daemon app code, tests, and sidecar entrypoints in separate package directories so each layer can be built and verified independently.

* chore(repo): streamline maintenance entrypoints

Centralize agent guidance by directory and reduce root command chains while preserving the existing build scope.

* docs: translate agent guidance to English

* fix(sidecar): tolerate stale IPC sockets

Remove stale Unix socket files only after confirming no listener is active, so tools-dev can restart after unclean shutdowns.
2026-04-30 14:23:53 +08:00
nettee
56d08b8c5f
Add shared contracts and migrate project code to TypeScript (#118) 2026-04-30 13:01:15 +08:00
Caprika
5c45c3b967
fix sse keepalive behind nginx (#111) 2026-04-30 11:31:18 +08:00
PerishFire
cfebff9653
Align app directories and isolate e2e tests (#102)
* chore: align app directories

* test: consolidate external suites under e2e
2026-04-30 09:47:03 +08:00
shangxinyu1
751c9de56d
Add UI e2e automation suite and reporting (#64)
* test: add e2e ui automation suite

* fix review feedback for ui e2e suite
Resolved the FileWorkspace.tsx merge-marker issue and kept the intended combination of multiple, accept="image/*", and data-testid.
Updated the e2e port handling so the test config no longer relies on a single hardcoded app port. It now resolves an available port first and passes the same port selection through the dev server and Playwright base URL. Since main has moved to the Next.js dev stack, this was also adapted from the old Vite-based flow to NEXT_PORT.
Kept test:ui serialized so cleanup completes before Playwright starts.
Updated reset-e2e-artifacts.mjs so cleanup failures are surfaced with a warning instead of being silently swallowed, except for the expected ENOENT case.
2026-04-29 23:31:17 +08:00