open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
kami	ac748d53fc	Fix pointer HTML artifact previews (#2075 ) Resolves short pointer-only HTML artifacts to the existing project HTML target before persisting preview artifacts, with helper and ProjectView regression coverage for #536. Validation: CI and nix-check were green on PR head `8c7ee5f38d`.	2026-05-18 18:28:28 +08:00
kami	0101a09b10	fix(mcp): support no-auth local HTTP servers (#2008 )	2026-05-18 17:08:46 +08:00
Yuhao Chen	25708fbe0f	fix(web): keep connector errors out of cards (#1740 ) * fix(web): keep connector errors out of cards * fix(web): route panel-alert open-details through openConnectorDetails The new "open details" action on the panel-alert row called `setDetailConnectorId(alert.connectorId)` directly, bypassing the `toolPreviewFailedIds` clear that `openConnectorDetails()` does on every other entry point. Concrete regression: user opens a connector's drawer from the card, the tool-preview hydration fetch fails so `toolPreviewFailedIds[id]` gets stamped with the current `toolPreviewRetryToken`, user closes the drawer, and later the same connector throws a connect error and shows up as a panel alert. Clicking the alert's open-details button re-sets `detailConnectorId` but never clears the failed-fetch token, so the hydration effect at the previous bail-out check (`toolPreviewFailedIds[detailConnector.id] === toolPreviewRetryToken`) returns immediately and the drawer never retries loading tools. Routing the click through `openConnectorDetails(alert.connectorId)` restores the same clear-before-set behavior the card and search entry points already had, so the reopened drawer always gets a fresh hydration attempt. * test(web): cover connector panel alert tool retry * fix(web): clarify connector panel alert status text * fix(web): clear stale connector cancel alerts --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-18 17:06:46 +08:00
Ethan Guo	bd1b41a9d4	feat(web): highlight captured Pod component on chip hover (#1982 ) * feat(web): highlight captured Pod component on chip hover Wires onPointerEnter/Leave on chip + onFocus/Blur on its × so the matching member overlay gets an outline-focused emphasis on canvas. * ci: trigger re-run for flaky palette dark-mode test * fix(web): skip non-mouse pointer events on pod chip hover Gate onPointerEnter/Leave to pointerType === 'mouse' so a touch tap on a chip no longer flickers the captured-member highlight; also add the CommentTargetOverlay class-wiring test, drop the redundant alpha on the hover outline, and document the non-member fallback path.	2026-05-18 16:34:19 +08:00
fyz3120	92d91101ae	Fix inspect ancestor selection notice (#2039 ) Co-authored-by: fuyizheng3120 <182291973+fuyizheng3120@users.noreply.github.com>	2026-05-18 16:33:42 +08:00
Jeongjin Shin	0026dfa344	fix: support CODEX_API_KEY for Codex CLI (#1880 ) Some checks failed ci / Packaged mac smoke (push) Blocked by required conditions Details ci / Packaged windows smoke (push) Blocked by required conditions Details ci / Detect PR change scopes (push) Failing after 1s Details ci / Validate workspace (push) Has been skipped Details landing-page-ci / Validate landing page (push) Failing after 1s Details landing-page-deploy / Deploy landing page (push) Has been skipped Details nix-check / build (push) Failing after 3s Details ci / Packaged linux headless smoke (push) Has been skipped Details * fix(daemon): allow Codex API key env * fix(web): expose Codex API key setting * fix(web): clarify Codex key labels	2026-05-18 14:29:23 +08:00
Ghxst	332e982a07	fix(web): stabilize HTML preview URL on tab switch (#1959 ) Some checks failed ci / Packaged mac smoke (push) Blocked by required conditions Details ci / Packaged windows smoke (push) Blocked by required conditions Details ci / Detect PR change scopes (push) Failing after 1s Details ci / Validate workspace (push) Has been skipped Details nix-check / build (push) Failing after 2s Details ci / Packaged linux headless smoke (push) Has been skipped Details Co-authored-by: GHX5T-SOL <200635707+GHX5T-SOL@users.noreply.github.com>	2026-05-17 23:25:14 +08:00
kami	91be2696dd	Persist routine failure reasons (#1963 ) Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 23:22:00 +08:00
hahaplus	89bf313782	fix(web): align Home prompt overlay with textarea so caret lands on click (#1958 ) Picking a chip such as Slide deck or Image loaded a default prompt into the Home textarea and rendered an overlay with `{{key}}` placeholders as interactive `<input>` / `<select>` controls. The overlay controls and the underlying textarea text were laid out independently: - Inputs declared `min-width: 8ch` and `Math.max(displayValue.length + 1, 10)ch` of width. - Selects added 18px of right-padding for the dropdown arrow. - The textarea kept the raw substituted string in a proportional font. The two layouts no longer matched column-for-column, so every slot shifted the textarea text to the left of where it appeared in the overlay, compounding across the line. Clicking on visible prose to position the caret hit a different character offset in the textarea and subsequent typing or deletion landed in the wrong place. For example, the Slide deck template Create a {{slideCount}}-slide {{deckType}} for {{audience}} about {{topic}}. ... renders with slideCount=10 (~2 ch in the textarea) under a slot input forced to 10 ch in the overlay — clicking right after the literal `-slide` placed the caret several characters into `pitch deck`. The Image / Video / Audio chips with their pre-filled subject, style, aspect values reproduced the same drift. Render the inline pills as read-only `<span>`s carrying the exact substring the textarea shows at that position, mark them `aria-hidden` so the textarea remains the single labelled control, and surface every plugin input field — including the ones referenced inline — in `PluginInputsForm` underneath. Editing flows through the form, the parent's `updateActiveInputs` already re-renders the prompt, and the pills stay aligned with the textarea on every keystroke. Also drop the now-unused inline helpers (updatePluginInput, getTemplateInputNames, shouldRenderSlotAsText, inlineFieldType, fileInputLabel, fileMetadata) and the dead `.home-hero__prompt-slot-control/input/select/toggle/file/text` CSS rules. Verified: - pnpm --filter @open-design/web typecheck - vitest run on HomeHero.plugin-picker, HomeHero.rail, and HomeView.prefill (29/29 pass; tests updated to reflect the new read-only span + always-on form contract) - Manual click-to-edit on Slide deck and Image chips in the pnpm tools-dev web runtime — caret now lands where the user clicked.	2026-05-17 23:18:28 +08:00
Ethan Guo	0d2c87242f	fix(web): snapshot pending before reload so expired-auth cancel actually fires (#1972 ) reloadConnectorStatuses commits its prune setState (and runs the connectorAuthorizationPendingRef mirror effect) between the two awaits in onFocus, so cancelStaleAuthorizations was reading a post-prune ref and never identified anyone as stuck. Capture the snapshot inside onFocus before reloadConnectorStatuses runs, and pass it explicitly so the expired-auth daemon cancel actually fires. Refs #1909 #1354	2026-05-17 23:13:53 +08:00
kami	d36ceacf3a	feat(web): surface saved Project instructions for review and retrieval (#1933 ) Saved Project instructions had no post-save read-back surface. The header control was a bare pencil icon that opened an editor and closed on Save, so after returning to the workspace a user could not confirm what was stored, revisit it, or tell whether it was still active (#1822). Make the saved state discoverable: once instructions exist, the header affordance becomes a persistent "Project instructions" chip. Opening it shows a read-only review panel with a preview of the saved text and an "Active - included in every message" status, plus a reopen-to-edit action. Saving lands back on the review panel so the stored value is read back immediately. The empty state keeps the pencil and opens the editor directly. Persistence and project-level system-prompt injection are unchanged. Closes #1822 Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 23:07:25 +08:00
kami	647433ccef	Fix Inspect preview transport flash (#1967 ) Some checks failed ci / Packaged mac smoke (push) Blocked by required conditions Details ci / Packaged windows smoke (push) Blocked by required conditions Details ci / Detect PR change scopes (push) Failing after 2s Details ci / Validate workspace (push) Has been skipped Details nix-check / build (push) Failing after 1s Details ci / Packaged linux headless smoke (push) Has been skipped Details * Fix inspect preview transport flash Co-authored-by: multica-agent <github@multica.ai> * Keep inactive srcdoc transport lazy Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 22:30:37 +08:00
kami	30ad8b8ac3	improve privacy consent modal: policy link, clearer CTAs, mobile layout (#1921 ) The first-run consent banner had no link to a privacy policy, an affirmative button ("Help improve") that didn't read as a consent choice, and a fixed bottom-right card that crowded content on phones. - Add a "Read the privacy policy" link (external-link icon, accent, underlined) above the actions, plus a root PRIVACY.md it points to documenting the telemetry behaviour the modal discloses. - Rename the CTAs to "Share usage data" / "Don't share" so both name the action; they stay equal-prominence per the EDPB/GDPR comment. - Stretch the banner to a bottom-edge bar under 540px with a safe-area inset so it clears mobile browser chrome. - Add PrivacyConsentModal tests; sync the new i18n key to every locale and update the consent-label assertion in App.connectors. Refs #1756	2026-05-17 20:24:15 +08:00
Ethan Guo	b7fa1d9286	fix(web): recover stuck Composio connector when auth is cancelled (#1909 ) * fix(web): recover stuck Composio connector when auth is cancelled When a user closes the Composio system-browser auth window without completing the flow, the connector card stayed in its loading state forever because nothing told the daemon the request was abandoned. The window-focus refresh now awaits the status poll and, for every connector still in connectorAuthorizationPending whose freshly fetched status is not 'connected', issues cancelConnectorAuthorizationRequest and clears the matching local pending/error UI state. The connected guard is read from the just-returned statuses so a postMessage success that lands just before focus is not racy. Fixes #1354 * fix(web): gate Composio focus auto-cancel on TTL + cancel response Require expiresAt to have elapsed before auto-cancelling a pending Composio auth on focus, and preserve the spinner on a failed cancel (matching the manual Cancel handler) so the UI never disagrees with the daemon. Refs https://github.com/nexu-io/open-design/pull/1909#discussion_r3254152045 Refs https://github.com/nexu-io/open-design/pull/1909#discussion_r3254152046	2026-05-17 20:15:38 +08:00
Ethan Guo	324eca27ea	feat(web): add manual removal for captured Pod components (#1951 ) * feat(web): add manual removal for captured Pod components Adds `removePodMember` helper and a per-chip × in `BoardComposerPopover`; leaves `comments.ts` untouched (avoid-zone from #1127). Closes #802 Contract: runs/2026-05-16T08-08-52_open-design_issue-feat/contract.md * style(web): hide Pod chip × until chip hover Swaps the unicode × for the existing `Icon name="close"` SVG so the hit target stays centered, and fades the button in only on chip hover / keyboard focus for a quieter resting state. * fix(web): auto-close Pod composer when last chip is removed Removing the last chip leaves a stale anchor; close so Send cannot attach to elements no longer visible. * refactor(web): extract BoardComposerPopover and Pod-member reducer Moves the popover to its own module and lifts the chip-removal reducer into a pure `applyPodMemberRemoval` so unit tests exercise the real code path and the popover's export is no longer test-only. * fix(web): rebuild Pod anchor when a member is removed Without this, the popover keeps the original union bbox / selector / label after each chip removal, so a subsequent Send to chat anchors the comment to elements no longer in the Pod. * fix(web): render every captured chip and scroll on overflow The previous slice(0, 6) cap left chips beyond the sixth invisible and undeletable. Render the full list inside a 132px-tall scrollable strip.	2026-05-17 20:13:56 +08:00
Ethan Guo	10c62bf6e4	fix(web): warn before editing a built-in skill creates a shadow (#1850 ) Clicking Edit on a built-in skill silently issued PUT /api/skills/:id, which the daemon stores as a user-owned shadow and then hides the built-in entry from the Settings list. From the user's perspective the row they just edited disappears with no explanation. Arm an inline override-warning banner on Edit for source==='built-in' skills, mirroring the existing inline delete-confirm pattern. The edit form only opens after the user explicitly confirms. User skills bypass the warning and edit directly. No daemon or listSkills contract change. Fixes #1378	2026-05-17 18:30:45 +08:00
Ethan Guo	a6288cec3f	fix(web): allow editing existing routines from RoutinesSection (#1884 ) The Routines list only exposed Run now / Pause / History / Delete, so any change to a routine required deleting and recreating it. The daemon already serves PATCH /api/routines/:id; only the web entry point was missing. Add an Edit button on each routine row that pre-populates the create form from the stored schedule and target via a new formFromRoutine helper, and branch the submit handler on editingId so it PATCHes /api/routines/:id with the updated fields instead of POSTing a new routine. Fixes #1373	2026-05-17 15:02:54 +08:00
张东明	bac56415a2	fix(web): surface daemon error messages for invalid folder imports (#1923 ) * fix(web): surface daemon error messages for invalid folder imports importFolderProject() was swallowing non-2xx responses by returning null, so the UI could only show a generic "Open folder failed: <path>" message even though the daemon already returns specific errors like "cannot import the filesystem root" or "folder not found". Parse the daemon error body and throw so the panel displays the actual reason. Also show feedback for empty path input instead of silently returning. Fixes #1186 * test(web): update folder import test to match new error propagation The existing test expected a generic "Open folder failed: <path>" message from a boolean return. Update to match the new behavior where the daemon's error message is thrown and displayed directly.	2026-05-17 15:00:49 +08:00
Yuhao Chen	26502ea124	test(web): decouple memory preview icon assertion (#1863 )	2026-05-16 11:37:34 +08:00
mehmet turac	6b15236843	fix(web): distinguish expanded memory preview action (#1813 )	2026-05-15 23:08:30 +08:00
mehmet turac	9689dce1ad	fix(web): align comment marker numbering (#1826 )	2026-05-15 23:07:32 +08:00
mehmet turac	444f7d03eb	fix(web): reveal memory editor after edit click (#1827 )	2026-05-15 23:07:18 +08:00
mehmet turac	7d9488300e	fix(web): keep draw overlay scrollable (#1848 )	2026-05-15 23:05:49 +08:00
lefarcen	22a3b99a47	Merge origin/main into preview/v0.8.0 Sync 49 commits from main. Conflicts resolved: - .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's linux specs + release-stable.yml + release-preview.yml triggers - .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder - apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText - apps/web/src/components/ChatPane.tsx: kept both new imports - apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks - e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's inline dialog navigation (UI was redesigned in v0.8.0) - nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh	2026-05-15 18:23:33 +08:00
mehmet turac	15ebe8b266	fix(web): keep picker hint clear of comments panel (#1820 )	2026-05-15 17:51:12 +08:00
mehmet turac	fd80d7c997	fix(web): clear draw ink when exiting mode (#1821 )	2026-05-15 17:48:28 +08:00
leessju	4e19c3f4f3	Prevent imported Claude canvases from zooming on scroll (#1726 ) * Preserve HTML preview state across mode toggles HTML previews could rebuild their iframe when switching into Edit or Comment, which reset scroll/canvas state and caused visible churn for multi-file artifacts. The viewer now keeps URL-loaded previews mounted when the artifact owns the mode bridge, relays file-refreshes through frame navigation, and restores preview scroll/viewport state across bridge mode changes. Constraint: Generic srcdoc-only bridges are still required for unbridged artifacts, inspect mode, palette tweaks, decks, draw overlays, and forceInline. Rejected: Keep all Edit/Comment previews on srcdoc \| causes unnecessary iframe replacement for bridge-capable URL-loaded artifacts. Confidence: high Scope-risk: moderate Directive: Do not enable URL-load for bridge-dependent modes unless the artifact has an owned postMessage bridge. Tested: pnpm guard Tested: pnpm --filter @open-design/web typecheck Tested: pnpm --filter @open-design/web test Tested: Playwright verified Edit and Comment toggles preserve iframe src and DOM node while receiving comment targets. * Prevent preview wheel gestures from escaping into zoom Trackpad pinch-like wheel events arrive with ctrl/meta modifiers on some platforms, which can make a normal vertical scroll feel like the preview zoomed. The preview now consumes those modified wheel events inside the host preview shell and in injected srcdoc previews, then maps the delta back to scroll where a scroll target exists. Constraint: URL-loaded sandbox iframes cannot always be inspected by the host, so srcdoc previews need their own in-frame guard. Rejected: Add allow-same-origin to preview iframes \| weakens the sandbox boundary for generated artifacts. Confidence: medium Scope-risk: narrow Directive: Do not broaden iframe sandbox permissions to fix gesture handling without a security review. Tested: pnpm guard Tested: pnpm --filter @open-design/web typecheck Tested: pnpm --filter @open-design/web exec vitest run tests/components/FileViewer.test.tsx tests/runtime/srcdoc.test.ts Tested: playwright-cli verified ctrl-wheel in preview keeps app zoom at 100% and prevents default in the iframe context * Revert "Prevent preview wheel gestures from escaping into zoom" This reverts commit `976407ab4c`. * Prevent imported Claude canvases from zooming on scroll Claude Design exports can classify ordinary macOS two-finger vertical wheel events as mouse-wheel zoom clicks inside design-canvas.jsx. Normalize that imported canvas code so plain wheel input pans, while Cmd+wheel remains the explicit zoom gesture. Constraint: The offending canvas code lives inside imported user artifacts rather than a tracked runtime component, so the fix belongs in the Claude Design zip import normalization path.\nRejected: Host-side wheel interception \| wheel events inside the sandboxed iframe are handled by the artifact before the host can reliably classify them.\nRejected: Disable all wheel zoom \| users still need Cmd+wheel as an explicit zoom control.\nConfidence: high\nScope-risk: narrow\nDirective: Keep plain wheel as pan-only for imported design-canvas.jsx unless a future bridge provides an explicit wheel-mode handshake.\nTested: pnpm --filter @open-design/daemon exec vitest run tests/claude-design-import.test.ts\nTested: pnpm --filter @open-design/daemon typecheck\nTested: pnpm guard --------- Co-authored-by: nicejames <nicejames@gmail.com> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 16:37:57 +08:00
Nicholas-Xiong	d16acf6462	fix: Add error feedback for manual folder path import (#1666 ) * fix: Add error feedback for manual folder path import Fixes #1408 When users manually enter a folder path and click 'Open folder' (non-Electron environment), the app now provides clear error feedback if the import fails. Before: - No error clearing before import - No error handling for failed imports - Silent failures left users confused After: - Clears previous errors before attempting import - Catches and displays import errors with clear messages - Success feedback is implicit (navigation to the opened project) Why implicit success feedback: The parent handler (Home.tsx) navigates to the newly opened project on success, which provides clear visual feedback by changing the entire view. An additional toast would be redundant. Error handling: - Catches all errors from onImportFolder - Displays user-friendly error messages - Preserves error details when available * fix: surface failed folder imports --------- Co-authored-by: Siri-Ray <2667192167@qq.com>	2026-05-15 16:36:24 +08:00
Zihan Zhao	cfcfbe0178	Inline attached file context for BYOK chats (#1730 ) BYOK/API-mode chats bypass the daemon run path, so attached project files were saved as message metadata but their readable contents were not sent to the provider. This adds a web-side attachment context step for API-mode requests, reusing raw text reads and existing document preview extraction. Constraint: Docker PDF previews require pdftotext in the runtime image Confidence: high Scope-risk: moderate Tested: corepack pnpm --filter @open-design/web test -- tests/api-attachment-context.test.ts tests/components/ProjectView.api-empty-response.test.tsx Tested: corepack pnpm --filter @open-design/web typecheck Tested: corepack pnpm --filter @open-design/web build Tested: corepack pnpm guard Tested: corepack pnpm typecheck	2026-05-15 15:52:15 +08:00
Yuhao Chen	b2d2635360	fix(web): hide resolved comments from preview overlays (#1762 )	2026-05-15 15:46:03 +08:00
Quang Do	88db51521d	feat(web): add custom select primitive (#1714 ) Some checks failed ci / Packaged mac smoke (push) Blocked by required conditions Details ci / Packaged windows smoke (push) Blocked by required conditions Details ci / Detect PR change scopes (push) Failing after 2s Details ci / Validate workspace (push) Has been skipped Details nix-check / build (push) Failing after 1s Details * feat(web): add custom select primitive * fix(web): harden custom select active option state	2026-05-15 14:43:18 +08:00
Tom Huang	c5d77a03bd	Garnet hemisphere (#1769 ) Some checks failed nix-check / build (push) Failing after 2s Details * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. - Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec * Reconcile run messages on cancel * Use active design system as visual direction * Fix active design system prompt wording * feat(workspace-tabs): implement workspace tabs functionality and file attachment handling - Introduced a new `WorkspaceTabsBar` component to manage workspace tabs, allowing users to navigate between different views (projects, marketplace, etc.). - Enhanced file handling capabilities in the `HomeHero` and `EntryShell` components, enabling users to stage and attach files before project creation. - Updated the `App` component to support auto-sending attachments alongside the first message in a project. - Improved CSS styles for workspace tabs and attachment UI, ensuring a cohesive design and user experience. This update significantly enhances the workspace navigation and file management features, providing users with a more intuitive and efficient workflow. * refactor(workspace-tabs): streamline workspace tabs and UI components - Removed unused components and actions from the `WorkspaceTabsBar` and `AppChromeHeader`, simplifying the codebase. - Updated CSS styles for the workspace shell and tabs, enhancing visual consistency and reducing element sizes for a cleaner layout. - Introduced a new client type detection mechanism to dynamically adjust the workspace shell's class, improving responsiveness. - Added tests for the `WorkspaceTabsBar` to ensure proper navigation and tab management functionality. These changes improve the overall performance and user experience of the workspace navigation system. * Update critical e2e for entry modal flow * Stabilize entry critical e2e flows * fix(ui): adjust workspace tabs and header styles for improved layout - Updated the CSS for workspace tabs and the app header, reducing element sizes and padding for a cleaner appearance. - Introduced a new button in the `WorkspaceTabsBar` for quick access to the home tab, enhancing navigation. - Minor adjustments to the layout and styles to ensure consistency across components. These changes enhance the user interface and improve the overall user experience in the workspace navigation system. * feat(workspace-tabs): implement pinned home tab functionality - Added a new pinned home tab feature to the `WorkspaceTabsBar`, allowing the home tab to remain accessible during navigation. - Updated tab management logic to collapse duplicate home tabs into a single pinned instance when restoring from local storage. - Enhanced CSS styles for workspace tabs to accommodate the new pinned tab design. - Updated tests to verify the behavior of the pinned home tab and its interaction with other tabs. These changes improve navigation consistency and user experience within the workspace. * refactor(workspace-tabs): enhance tab management and styling - Updated CSS styles for workspace tabs, adjusting padding and flex properties for improved layout and consistency. - Refactored tab creation logic to ensure unique IDs for project and marketplace tabs, enhancing navigation clarity. - Removed deprecated functions related to pinned home tabs, streamlining the codebase. - Improved test cases to verify independent behavior of home tabs during navigation. These changes enhance the user experience by providing a more intuitive tab management system and a cleaner UI. * style(workspace-tabs): update CSS for improved layout and visibility - Adjusted CSS properties for workspace tabs, including overflow, position, and z-index to enhance layout and stacking context. - Ensured consistent styling across tab components for better visual hierarchy. These changes contribute to a more polished and user-friendly interface within the workspace. * style(entry-layout): update CSS variables for improved layout consistency - Replaced fixed width values with CSS variables for the entry rail to enhance flexibility. - Adjusted padding and height properties for better visual alignment and spacing. - Introduced a new background style for the entry main topbar to improve aesthetics. These changes contribute to a more responsive and visually appealing layout in the entry view. --------- Co-authored-by: qiongyu1999 <2694684348@qq.com> Co-authored-by: Eli <129168833+qiongyu1999@users.noreply.github.com>	2026-05-15 14:42:11 +08:00
ngoduybien	843b6fec4f	fix(web): fall back to srcDoc when HTML preview needs sandbox shim (#1306 ) * fix(web): fall back to srcDoc preview when HTML needs the sandbox shim The URL-load HTML preview iframe is sandboxed with `allow-scripts` only — no `allow-same-origin` — so any artifact that reads `localStorage`/`sessionStorage` at startup throws SecurityError, its React tree unmounts, and the preview goes blank. The srcDoc path already polyfills both via `injectSandboxShim` (apps/web/src/runtime/srcdoc.ts) before any user script runs, but URL-load served raw HTML untouched. Agent-emitted React prototypes that read Web Storage at mount went blank until the user toggled Tweaks (which forces the srcDoc path). Detect the two reliable signals — `<script type="text/babel">` (Babel-standalone XHR-fetches and evals sibling `.jsx` files at runtime; those routinely read Web Storage from `useState` initializers) and direct `localStorage` / `sessionStorage` references in the source — and set `forceInline` automatically so those artifacts route back through the srcDoc path. Plain static HTML keeps the URL-load benefits (real source maps, per-asset HTTP caching, isolated per-script failures). No new daemon endpoint, no new contract, no sandbox loosening. Pure content sniff in the existing render-mode helper; reuses the same `forceInline` seam the `parseForceInline` opt-out already uses. Tests cover the new helper across positive (Babel-standalone variants with attribute reordering, quoting, whitespace, case) and negative (plain script tags, module type, JSON type, substring lookalikes) cases. * fix(web): address review feedback on sandbox-shim fallback - Accept unquoted `<script type=text/babel>` per HTML5 attribute syntax (regex `["']` → `["']?`). Adds a focused test covering bare unquoted, unquoted with unquoted `src=`, mixed unquoted/quoted, and the negative case `<script type=text/babelish>` to confirm the trailing word-boundary still rejects look-alikes. - Memoize `htmlNeedsSandboxShim(source)` on `source`. HtmlViewer re-renders on board/inspect/edit/slide state changes; the scan only changes when the source itself does. Cheap micro-opt, free correctness win. - Narrow the helper docstring's scope claim and add an explicit known limitation: external scripts (`<script src="./app.js">`, `<script type="module" src="./main.js">`) that read Web Storage during module eval are not covered — the helper only sees the document, not the linked subresource. Workaround documented: `?forceInline=1` or Tweaks. Catching this case would require fetching every script reference before deciding load strategy, duplicating browser work; not worth the cost until a real report surfaces. * fix(web): correct inline comment on `\b` boundary behavior The comment claimed `\b` rejects `text/babel-other`, but `\b` matches between `l` and `-` (hyphen is a non-word char), so the regex actually does match that input. The test asserts `text/babelish` as the negative case, which `\b` does correctly reject (`i` is a word char). Comment now matches the regex's actual behavior, with a note that hyphenated variants are a harmless false positive (srcDoc fallback is the safe direction) and a pointer to the `(?=[\s>"'])` lookahead tightening if a real case ever surfaces. No behavior change; existing tests still pass. * fix(web): align test comment with helper docstring on hyphenated variants Same class of inconsistency the previous commit fixed in the helper: the test comment claimed `type=text/babel-other` "remains a non-match", but the assertion actually covers `type=text/babelish`, and the helper docstring explicitly documents hyphenated variants as a safe false-positive that does match. Comment now describes both shapes correctly and explains why the hyphenated variant isn't asserted (it's the documented safe direction, not a regression). No behavior change; test count unchanged. * chore: trigger CI	2026-05-15 14:41:23 +08:00
chaoxiaoche	bcc58af931	refactor(web): rename Execution mode and tighten settings dialog UI (#1568 ) * refactor(web): rename Execution mode and tighten settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): polish Settings dialog — media providers, skills, MCP Media providers - Hide internal Stub fixture provider (settingsVisible: false) - Split provider list into Available (integrated, editable) and Coming Soon (collapsed <details> drawer with name/hint/Docs link only) - Drop right-side Integrated/Configured badges from every row; all rows in the main list are integrated by definition; inline grey "Saved" chip next to the provider name is the only status indicator now - "Saved" badge moves inline to the right of the provider name and uses a neutral grey treatment (was a standalone green pill below the name) - "Reload from daemon" button shows a 2s green "✓ Reloaded" flash on success instead of leaving a permanent paragraph under the header; errors remain sticky Skills - Replace three pill-row filter banks (Source, Type, Category) with a compact single-row toolbar: search + three inline <select> dropdowns side by side; active filter highlighted with a stronger border MCP server - Shorten section hint to one line - Move WHAT YOUR AGENT CAN DO capabilities above the client dropdown (motivate before asking to act) - Move "Build the daemon first" warning below the code block where it contextually explains why the command might fail, not as a top-level error before the user has done anything - Downgrade "Restart your client" left-border from accent orange to border-strong grey — it is a next step, not a warning External MCP - Shorten section hint to one line Misc CSS - Add .sr-only utility for accessible off-screen live regions - Add button.ghost.is-success-flash for transient success feedback - Add .library-filter-selects / .library-filter-select for dropdown filter rows - Add .media-provider-coming-soon-* for the roadmap drawer Co-authored-by: Cursor <cursoragent@cursor.com> * [codex] Add Cursor Agent auth diagnostics (#1538) * Add Cursor Agent auth diagnostics * Handle Cursor not logged in auth status * Address Cursor auth review feedback * Classify Cursor stdout auth failures * test: expand Memory and Routines coverage (#1521) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows * refactor(web): rename Execution mode and tighten settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): settings dialog UX polish — layout, dedup, and interactions - Remove duplicate section headers from all settings sections (Notifications, Appearance, Privacy, About, Design Systems, Skills, MCP server, Connectors, Media providers, Routines) - Restructure Notifications cards: title + toggle on same row, hint below - Restructure Skills toolbar: search + New skill button in row 1, filter dropdowns in row 2 with left-aligned labels - Restructure Pet section: tabs and Wake button on same row - MCP server: group capabilities and setup into separate cards, remove nested double border on client picker - Connectors: show connect errors as toast instead of inline card text, position toast inside panel, hide single-provider tab - Media providers: move Reload button to left-aligned small ghost button - Memory: info icon shows path on hover, Path copied badge inline; Extraction history and MEMORY.md as standalone collapsible cards; group header hidden when only one type visible - Pet grid cards: Adopt button hidden until hover, icon-only when adopted, description truncated to 2 lines, text fills full width via abs positioning - Agent cards: selected state uses accent border only, no background change - Add sun/moon icons to Appearance theme buttons (Light/Dark) - Shorten several hint strings for clarity Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): resolve i18n review comments from PR #1568 - Update settings.title and settings.envConfigure to localized "Execution mode" in all 17 non-English locale files - Add settings.memoryFlashPathCopied to all locales and use t() in MemorySection instead of hardcoded English "Path copied" - Add settings.agentModelHead to all locales and use t() in SettingsDialog for "Model for:" agent model row header Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): update tests to match settings dialog redesign - Add role prop to Toast (alert/status) so error toasts from ConnectorsBrowser are announced immediately by screen readers - Clear connectErrorToast on successful connector retry - Update SettingsDialog.execution tests: - Remove heading assertions for About and MCP server (headers were intentionally removed as duplicate nav labels) - Rewrite CLI env test to use codex-only fields (per-agent filtering means only selected agent's fields are shown) - Update Composio key hint text assertion to match shortened copy - Replace filter button click with select change for Type filter - Replace Configured/Unsupported/Integrated badge checks with updated assertions matching the new media provider UI - Replace disabled BFL row test with coming-soon section check - Update SettingsDialog.media test: remove Fal.ai input assertions (non-integrated providers no longer have editable fields) Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): unblock CI for #1568 Three small fixes to get Playwright back to green on the settings dialog redesign: 1. `en.ts`: revert `settings.envConfigure` to "Configure execution mode". This PR collapsed both `settings.title` (header gear) and `settings.envConfigure` (entry-side foot pill) to the same string "Execution mode", so `getByRole('button', { name: 'Execution mode' })` resolved to two elements and tripped Playwright strict mode in the three Composio-flow tests (entry-configuration-flows.test.ts:174, 228, 285). Restoring the distinct label also gives screen readers a clearer hint for the pill, which doubles as a status display. Non-English locales still alias the two keys; happy to follow up on those, but they don't gate the (English-only) Playwright suite. 2. entry-configuration-flows.test.ts:167 — `Connectors` heading is now rendered at `<h2>` in the modal-head (SettingsDialog.tsx:1545), with the inner `<h3>` removed by design (see comment around line 1448). Updated the assertion from `level: 3` to `level: 2`. 3. project-management-flows.test.ts:360 — same change for the `Pets` heading. Verified locally with `pnpm --filter @open-design/web typecheck` and `pnpm --filter @open-design/e2e typecheck`. The actual Playwright specs need the dev server up; I didn't rerun them here, but the locator changes are mechanical and match the new DOM. * fix(web): use exact match for Execution mode button locator Playwright's `getByRole({ name })` defaults to substring matching, so `{ name: 'Execution mode' }` still resolved to both the header gear (aria-label "Execution mode") and the entry-side foot pill (aria-label "Configure execution mode" — substring contains "Execution mode"). Strict mode tripped in the three composio-flow tests at lines 202, 257, and 319. Adding `exact: true` makes each call resolve to just the header gear, which opens the same dialog the foot pill does — the test outcomes are unchanged. --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: shangxinyu1 <shangxinyu@refly.ai> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 14:35:06 +08:00
Quang Do	a41d4f6126	fix(web): keep chat pinned during content growth (#1716 )	2026-05-15 14:12:00 +08:00
Prantik Medhi	01e54700a2	fix(web): make file grouping by kind work (#1551 ) * fix(web): group design files by kind * fix(web): unblock CI for #1551 - FileViewer test (line 434): add missing `projectKind="prototype"` to match every other instance; this was the source of the typecheck failure blocking workspace validation. - DesignFilesPanel "groups files by kind" test: assert against `.df-section-label` elements so the section header check is not ambiguous with the per-row kind cell text. - DesignFilesPanel batch-delete test: derive the expected file names from the rendered row testids and use `arrayContaining` so the assertion no longer depends on the (now kind-default) row order. * fix(web): satisfy strict-index typecheck in batch-delete test `onDeleteFiles.mock.calls[0][0]` tripped `noUncheckedIndexedAccess` ("Object is possibly 'undefined'"). Drop the separate length probe and assert the exact array instead — `selected` is a `Set`, `handleBatchDelete` spreads it with `[...selected]`, and the test clicks rows[0]/rows[1] in that order, so insertion order is deterministic and equals `[firstName, secondName]`. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 13:07:27 +08:00
Prantik Medhi	8aeedf368b	fix(web): localize accent controls in settings (#1565 ) * fix(web): localize accent controls * fix(web): localize accent default label * fix(web): unblock CI for #1565 Add missing `projectKind="prototype"` to the FileViewer deck-render test (line 434) so workspace typecheck stops failing on the `Property 'projectKind' is missing` error. This mirrors every other FileViewer render in the same file and is unrelated to the accent localization changes in this PR — it's drift from a recent change on main that made `projectKind` required. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 12:09:28 +08:00
Yuhao Chen	b0963fd874	fix(web): allow downloads from preview iframes (#1732 )	2026-05-15 11:55:29 +08:00
Tom Huang	76defffb93	Garnet hemisphere (#1702 ) * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. - Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec	2026-05-14 21:12:50 +08:00
sakshyasinha	c4a67a7b3e	Fix Kimi CLI icon contrast in light mode (#1667 ) * fix(web): improve Kimi CLI icon contrast * fix(web): render Kimi icon via theme-aware CSS mask Move Kimi to the MONO_ICONS set so it renders through CSS mask with currentColor adaptation, making it legible in both light and dark themes instead of baking a single dark fill that fails on dark backgrounds. * fix(web): adjust Kimi icon secondary mark for dual-theme contrast Keep Kimi as a baked two-tone asset: blue accent (#1783ff) for brand identity, mid-tone gray (#666666) secondary mark for acceptable contrast on both light and dark card surfaces. Revert from mask path to preserve the blue branding. * fix(web): correct corrupted Kimi SVG and strengthen asset validation test Remove extraneous PR discussion text that was accidentally included in the SVG file. Strengthen the test to validate the bundled asset is valid SVG with the expected fills (blue accent + gray secondary mark), catching asset corruption that would otherwise go undetected.	2026-05-14 20:32:52 +08:00
lefarcen	640a332276	Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge	2026-05-14 17:44:44 +08:00
lefarcen	3b7f87c7ae	Merge remote-tracking branch 'origin/main' into reconcile/garnet-main-merge	2026-05-14 17:44:26 +08:00
pftom	8e3af79dea	feat(github-installer): enhance GitHub content installation and error handling - Introduced new interfaces for GitHub content entries and budgets to streamline content fetching. - Enhanced the `installFromGithub` function to support installation from GitHub contents, including subpath handling. - Implemented robust error handling and retry logic for fetching GitHub content, improving installation resilience. - Updated tests to validate the new content fetching logic and ensure correct behavior across various scenarios. This update significantly improves the GitHub installation process, making it more flexible and user-friendly.	2026-05-14 17:31:29 +08:00
lefarcen	b268bbe169	Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix Brings in 11 new garnet commits, most importantly: - `1a90aef4` feat(plugin-use): implement plugin use handoff functionality — fixes the bug QA reported where /plugins Use Plugin would 422 silently for template plugins; new flow hands off to HomeView with the plugin pre-bound + input form prompted there. - `2ac58544` feat(plugin-inputs): enhance plugin input handling with file upload support — extends PluginInputsForm for file uploads. - `3b167b69` feat(plugins): registry protocol — new @open-design/registry-protocol workspace package (needs build before daemon boot). - Plus enhancements to plugin metadata, GitHub installer, plugin detail view, login/whoami, static HTML preview paths. Conflicts resolved: - packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief field + garnet's contextPlugins (@-mention plugin context refs) both kept on ProjectMetadata. - apps/landing-page/* (3 files): accepted HEAD — garnet had the older single-page landing-page header; main has the multi-page layout (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not related to the Use Plugin core fix. New @open-design/registry-protocol package must be built before daemon boots; pnpm install does this via postinstall already.	2026-05-14 16:32:35 +08:00
pftom	6614b9bf09	refactor(plugin-authoring): streamline prompt and input handling - Removed the redundant `buildPluginAuthoringPrompt` function call in `startPluginAuthoring` for cleaner code. - Introduced new functions to build prompts and inputs based on user goals, enhancing the authoring experience. - Updated `HomeView` to manage authoring inputs and prompts more effectively, ensuring better state handling. - Adjusted the `PluginImportModal` to reflect changes in the import process, removing references to template creation. - Enhanced tests to cover new input handling and prompt generation logic, ensuring reliability in the authoring flow. This update improves the clarity and efficiency of the plugin authoring process, making it more intuitive for users.	2026-05-14 16:18:34 +08:00
pftom	fbcf50382a	feat(github-installer): enhance GitHub source parsing and error handling - Updated the GitHub source regex to support more flexible parsing of repository references, including subpaths. - Introduced a new `parseGithubSource` function to handle the extraction of owner, repo, and potential subpaths, improving the robustness of the installation process. - Enhanced the `installFromGithub` function to retry fetching from multiple candidates and provide detailed error messages when installation fails. - Added tests to validate the new parsing logic and ensure correct behavior when handling various GitHub source formats. This update significantly improves the handling of GitHub sources, making the installation process more resilient and user-friendly.	2026-05-14 16:09:37 +08:00
Nagendhra Madishetti	40766ef1ba	test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. * feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * fix(web): tighten Phase 13 gates from lefarcen review (PR #1318) Address the actionable items from lefarcen's review of the two Phase 13 CI gates. The two questions about longer-term DX (pre- commit hook to auto-update the symbol table, AST-walker swap) are documented as deferred follow-ups rather than landed here. reducer-bench: - Describe renamed to 'reducer p99 regression gate (Phase 13.1)' so it reads as a gate, not a comparative benchmark. - Failure message now carries the full distribution (p50 / p90 / p99 / max + ceiling), so triage on a tripped gate can distinguish a real 20ms regression from a 4.001ms CI hiccup without re-running locally (lefarcen Q3). - Captured a baseline (p50=0.011ms p90=0.013ms p99=0.018ms max=0.244ms on a local Node 24 / Win11 run, 2026-05-11) inside the docblock so reviewers can see the actual reading sits ~222x below the 4ms ceiling (lefarcen Q1). - Replaced 'role as any' casts with PanelistRole-typed casts so the fixture is typecheck-strict. - Phase numbering corrected (13.2 → 13.1 to match the PR body). critique-coverage: - Symbols now grouped under four describe blocks (SSE events / panelist roles / lifecycle phases / i18n keys) so a failure points at the category that drifted at a glance (lefarcen nit). - Docblock now explains the grep-over-AST trade-off (the bug class is structural at the string level, not at the AST level) and points at the future AST-walker work as a deferred follow- up (lefarcen Q2). - Docblock now walks a contributor through the four-step maintenance flow (add to contract → add caller → add test → add literal here), so the next person to add an SSE event or i18n key knows the gate exists and what to update (lefarcen Q4). - Phase strings switched from 'phase: <name>' to bare-quoted literals so the walker is robust against single vs double quotes and ':' vs '===' source-shape changes. - Dead try/catch around 'stack = [root]' removed (cannot throw). - Per-symbol failure messages name the symbol AND which corpus is missing it, so the gate is self-describing on the next CI red. - Phase numbering corrected (13.4 → 13.2 to match the PR body). 63 / 63 vitest cases green (1 bench + 62 coverage). Web typecheck clean. * fix(web): tighten coverage walker semantics from lefarcen P2/P3 (PR #1318) Two follow-on findings on commit `338a185`: P2 — coverage gate weakened. The previous revision used one helper `corpusReferences` for both SRC and TEST corpora, and that helper accepted the unprefixed PanelEvent type form (`type: 'panelist_must_fix'`) as a substitute for the prefixed SSE wire name (`critique.panelist_must_fix`). The fallback is correct on the TEST side (reducer tests dispatch PanelEvent literals) but it weakened the SRC side: production code could drop the SSE channel name silently and the PanelEvent type alias would keep the walker green. Split into two helpers: `srcReferences` is strict (exact substring match only, no fallback) and `testReferences` keeps the lenient fallback for SSE events. The production-side assertions now route through `srcReferences` so the wire name is load-bearing again. P3 — maintenance doc overclaimed. The previous revision said 'CI red if you forget step 4' but the symbol arrays are partially hand- maintained, so a contributor adding a NEW phase string or i18n key without updating the array leaves CI green (the walker never knew to look). Rewrote the failure-mode section to distinguish the two cases: - Renaming an EXISTING symbol without updating the walker → CI red (existing assertion fails because the old name is gone). - Adding a NEW hand-maintained symbol without updating the walker → CI stays green (walker does not know to look for it). Also clarified that `SSE_EVENTS` and `PANELIST_ROLE_STRINGS` are auto-built from contracts so step 4 is one-line for `PHASE_STRINGS` and `I18N_KEYS` only. 63 / 63 vitest cases still green. * fix(web): close two P2 findings on PR #1318 (Siri-Ray + lefarcen) P2 (coverage walker counted self as evidence). The walker walked apps/web/tests, which contains apps/web/tests/components/Theater/ critique-coverage.test.ts itself. The hand-maintained PHASE_STRINGS and I18N_KEYS literals inside that file would satisfy the test-side coverage assertion against themselves, so a real Theater test that covers a symbol could be deleted and the gate would still pass. Excluded the walker file from TEST_FILES via path.resolve(__filename) filter so the test corpus only contains independent evidence. Once the walker stopped seeing itself, the gate correctly red-flagged nine i18n keys that no INDEPENDENT test exercises: critiqueTheater.userFacingName, roundLabel, composite, threshold, interrupt, interrupted, degradedHeading, shippedSummary, interruptedSummary. Component tests like TheaterCollapsed.test.tsx exercise the rendered text but never mention the key STRING, so the walker couldn't see them. Closed that gap by adding apps/web/tests/components/Theater/critique-i18n-keys.test.ts: 9 cases, one per watched key, asserting the dictionary entry exists as a non-empty string. That's both real coverage (catches a stale dict) and the independent evidence the walker requires. P2 (interruptedSummary missing from de/ja/ko/zh-TW). The native locale overrides were missing the key, so an interrupted run on a German / Japanese / Korean / Traditional Chinese UI silently fell back to the English string via the ...en spread. Added the key with {round} and {composite} placeholders preserved, using PerishCode's suggested copy from the earlier review thread. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm exec vitest run tests/components/Theater tests/i18n: 20 files / 190 tests green (critique-coverage 62 / 62, critique-i18n-keys 9 / 9 new, reducer-bench 1 / 1, locales 5 / 5). * fix(web): drop the Dict cast in i18n key coverage test (lefarcen P1 / Siri-Ray on PR #1318) The previous revision used `(en as Record<string, string>)[key]` to read each watched key. Dict has no string index signature, so CI's strict typecheck rejected the broad cast with TS2352 even though the runtime assertion was fine. Replaced with the typed pattern lefarcen suggested: type WATCHED_KEYS as `readonly (keyof typeof en)[]` and read `en[key]` directly. That removes the cast and also strengthens the test, because a renamed or removed key now fails the type check immediately rather than at runtime. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web exec vitest run tests/components/Theater/critique-i18n-keys.test.ts: 9 / 9 green. * fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). - Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). - apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4) The strict guard from PR #1314 round 3 enforced enum membership and Number.isFinite, but accepted any finite number where the contract intends a specific domain: scale: 0 (ScoreTicker divides by it), negative thresholds, fractional rounds, negative mustFix, etc. ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline CSS and divides by it for tick width, so a guard-passing scale: 0 shipped Infinity into the rendered style. Negative composite / score values reached downstream code that assumes >= 0. Resolution: mirror the daemon-side Zod domain constraints in the runtime guard. Three new helpers in packages/contracts/src/critique.ts: - isPositiveInt(v): integer with v > 0. Used for round, maxRounds, scale, protocolVersion (all 1-indexed in the orchestrator). - isNonNegativeInt(v): integer with v >= 0. Used for mustFix, position, bestRound. bestRound: 0 is the valid sentinel for 'interrupted before any round closed'. - isNonNegativeFinite(v): finite number with v >= 0. Used for composite, score, dimScore, threshold. Threshold may be fractional (e.g. 8.5 on a scale of 10). Cross-field check inside run_started: threshold <= scale (the daemon Zod schema enforces this with an epsilon refine, the wire guard matches the same intent). Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts: - 22 new rejection cases across every numeric field that previously slipped through: scale: 0, negative scale, fractional scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0, fractional protocolVersion, negative threshold, threshold > scale, round: 0, fractional round, negative dimScore / score, negative / fractional mustFix, negative composite, ship round: 0, negative / fractional bestRound, negative interrupted composite, negative / fractional parser_warning position. - 3 positive boundary cases that must still pass: threshold == scale, fractional threshold within [0, scale], interrupted with bestRound: 0 (no round completed before interrupt), parser_warning with position: 0 (start of stream). Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/contracts test: 4 files / 59 tests green (was 37 before the new domain cases). - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 110 files / 1004 tests green; no regression on Theater suite, sse validator, replay parser, or assistant-feedback widget tests. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * test(e2e): move critique-coverage walker from apps/web/tests to e2e/tests (Siri-Ray P2) The walker is by definition a cross-app consistency check: it reads the web reducer, the daemon critique module, the contracts package, and the e2e UI suite. Hosting it under apps/web/tests/ violated the repo boundary rule (root AGENTS.md): app packages must not import another app's private src/ or tests/ as a shared helper, and cross-app consistency checks belong in e2e/tests/. The web test lane was effectively coupled to daemon and e2e file layout, so a daemon-only refactor could break the web lane. Moved the file to e2e/tests/critique-coverage.test.ts and switched the contracts import to the import.meta.glob shape the e2e package already uses (see localized-content.test.ts), so the e2e package does not have to add @open-design/contracts as a workspace dep just to load two const arrays. REPO_ROOT and SELF_PATH recalculated for the new location. Web test lane no longer depends on daemon, contracts, or e2e layout. The e2e walker covers the same 62 assertions as before: e2e/tests/critique-coverage.test.ts 62 / 62 green Web typecheck clean, e2e typecheck clean. * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-14 15:55:36 +08:00
pftom	2ac5854432	feat(plugin-inputs): enhance plugin input handling with file upload support - Added support for file input fields in the PluginInputsForm, allowing users to upload files with serializable metadata. - Updated the HomeHero component to improve the layout and interaction of input fields, enhancing user experience. - Adjusted CSS styles for better visual representation of input fields and their states. - Modified HomeView to reflect changes in authoring chip IDs for better clarity in plugin actions. - Enhanced tests to cover new file input functionality and ensure correct behavior in various scenarios. This update significantly improves the plugin input handling, enabling users to upload files seamlessly and enhancing the overall interaction model.	2026-05-14 15:52:21 +08:00
pftom	1a90aef4a2	feat(plugin-use): implement plugin use handoff functionality - Added support for using installed plugins directly from the PluginsView, allowing users to initiate plugin actions seamlessly. - Enhanced the HomeView to handle plugin use handoffs, managing state and user interactions effectively. - Introduced new types and functions to facilitate the creation and processing of plugin use handoffs, improving the overall user experience. - Updated tests to cover the new plugin use functionality, ensuring reliability and correctness in the application flow. This update significantly enhances the interaction model for plugins, enabling users to utilize plugins more intuitively within the application.	2026-05-14 15:40:42 +08:00
pftom	9ea33e076b	feat(context-plugins): add support for context plugins in project metadata and UI - Introduced a new `contextPlugins` field in the `ProjectMetadata` type to accommodate plugins selected via `@` mentions, allowing for additive context in project creation. - Updated the `HomeHero` and `EntryShell` components to handle and display context plugins, enhancing user interaction with selected plugins. - Implemented rendering logic for context plugins in the metadata block, providing clear visibility of selected plugins and their descriptions. - Enhanced the UI to support the removal of context plugins and display additional details on hover, improving the overall user experience. This update significantly enriches the project creation process by allowing users to incorporate multiple context plugins seamlessly.	2026-05-14 15:29:49 +08:00
lefarcen	6c16283850	Merge origin/main (post-7c8305f4) into reconcile branch Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline.	2026-05-14 15:13:38 +08:00
shangxinyu1	2976c76fc3	test: expand Memory and Routines coverage (#1521 ) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows	2026-05-14 14:48:40 +08:00
Nagendhra Madishetti	5cb0508790	fix(web): deep-link Routines history rows to their specific conversation (Fixes #1505 ) (#1508 )	2026-05-14 14:27:34 +08:00
soulme	2a8ebff11a	feat(web): add collapsible comment side panel (#1607 )	2026-05-14 14:27:09 +08:00
Siri-Ray	d2738924fb	fix(web): freeze completed run durations across conversations (#1351 ) * fix(web): freeze completed run durations across conversations * fix(web): finalize stopped API runs Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(daemon): optimize conversation latest run lookup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): scope streaming cleanup to conversation Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): capture streaming conversation cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): guard stale run ref cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex)	2026-05-14 14:25:37 +08:00
pftom	3b167b6921	feat(plugins): add registry protocol and enhance plugin management features - Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries. - Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol. - Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities. - Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades. - Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards. This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities.	2026-05-14 08:55:36 +08:00
pftom	56c264c9bd	feat(plugins): add login and whoami commands for GitHub CLI authentication - Introduced `login` and `whoami` commands to the plugin CLI, enabling users to authenticate with the Open Design registry via GitHub CLI. - The `login` command wraps GitHub CLI authentication, allowing users to specify a host, defaulting to GitHub. - The `whoami` command retrieves and displays the authenticated GitHub account information, with an option for JSON output. - Updated the CLI help documentation to include usage instructions for the new commands. - Enhanced error handling for GitHub CLI dependencies and authentication status. This update improves the user experience by simplifying the authentication process for plugin publishing.	2026-05-14 07:25:05 +08:00
lefarcen	d83b228c81	Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge	2026-05-13 23:52:33 +08:00
pftom	7c48fbd902	feat(plugins): enhance PluginsView with new marketplace management features - Updated the PluginsView component to include new tabs for 'Installed', 'Available', and 'Sources', improving the organization of plugin management. - Introduced functions for adding, refreshing, removing, and setting trust for plugin marketplaces, enhancing the marketplace interaction capabilities. - Enhanced the UI to reflect the new structure, including updated CSS styles for better visual consistency and usability. - Added tests to ensure the functionality of the new marketplace features and verify the correct rendering of available plugins. This update significantly improves the user experience in managing plugins and marketplaces, providing a more intuitive interface for users.	2026-05-13 23:42:41 +08:00
lefarcen	53997990b7	Merge origin/main (post-0.7.0) into reconciled garnet branch Second-pass merge layering 41+ new commits from origin/main on top of the first reconcile commit. Headline upstream additions absorbed: - 0.7.0 release: redesigned chat bubble user-text styling, neutralised palette, lucide icons, ElevenLabs audio voice option discovery in the prompt composer, analytics tracking (PostHog) wired across home / studio / create surfaces, Prometheus `/api/metrics` endpoint, critique-theater drop-in mount with a settings toggle. - Misc upstream fixes (titlebar padding, release header layout, deck preview chrome, feedback form auto-scroll, conversation-created SSE on routine runs, etc.) Conflict resolutions (12 files, ~22 hunks): - contracts barrel + prompts/system: union of both sides; new analytics exports (`./analytics/events`, `./analytics/public-params`) added alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice fields (audioVoiceOptions/audioVoiceOptionsError, main) and pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput. - daemon/server.ts: Prometheus `/api/metrics` route inserted after garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call added before the chat-run service init alongside the prior reconcile note about the dropped legacy POST /api/projects body. - App.tsx: handleCreateProject now consumes both garnet's plugin fields (pluginId / appliedPluginSnapshotId / pluginInputs / autoSendFirstMessage) and main's analytics requestId. Tracking fires success + failure paths; PluginLoopHome auto-send sessionStorage flag is preserved. - ProjectView.tsx: the garnet auto-send useEffect coexists with main's `useCritiqueTheaterEnabled()` hook. - ChatComposer.tsx: imports merged (drop now-unused fetchSkills, add analytics provider + tracking + buildVisualAnnotationAttachment). - index.css: main's redesigned `.msg.user .user-text` chat bubble styling wins over garnet's plain text rule; garnet's `.msg-plugin-chip*` rules preserved alongside. - EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with reconcile decision #2. main's added PetRail / TopTab / analytics view tracking is intentionally NOT brought into the wrapper; the follow-up to re-integrate PetRail / image-templates / video-templates into EntryShell still stands and now also covers analytics view-tracking hooks. - daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node + prom-client coexist). - Test fixtures (FileWorkspace.test): kept garnet's plugin-folders describe block intact; main's projectKind="prototype" addition is dropped where it conflicted with garnet's plugin-folder fixture files. Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck` exits 0 across all workspace packages. Follow-up not done in this commit: - PetRail / image-templates / video-templates / 0.7.0 analytics view-tracking hooks need to be added to EntryShell. - Critique-theater settings toggle UX (added on main) lives in the SettingsDialog hierarchy; the reconcile state preserves the SettingsDialog so this should work without changes, but no end-to-end verification yet.	2026-05-13 23:29:56 +08:00
lefarcen	d3602be666	Merge origin/main into garnet-hemisphere (reconcile) Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory.	2026-05-13 22:29:21 +08:00
Yuhao Chen	828f8b93bf	fix(web): protect built-in skills from delete (#1583 )	2026-05-13 22:13:30 +08:00
Nagendhra Madishetti	38a5ab69e6	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. * feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. * feat(daemon): rollout flag resolver (Phase 15.1) Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority: 1. Skill-level policy (required wins, opt-out wins inversely) 2. Per-project override from the Settings toggle 3. OD_CRITIQUE_ENABLED env override 4. Rollout phase default M0 dark-launch false M1 settings only false (toggle is off until the user flips it) M2 per-skill true if skill opted in M3 global default true OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix. * feat(web): Settings toggle hook for Critique Theater (Phase 15.2) React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via: - the platform storage event (cross-tab) - a open-design:critique-theater-toggle CustomEvent (same-tab) Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload. setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write. The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state. 6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event. * test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * feat(daemon): route critique spawn-path eligibility through the rollout resolver The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live: 1. Per-skill od.critique.policy veto (opt-out / required) 2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint) 3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures) 4. OD_CRITIQUE_ROLLOUT_PHASE default M0 dark-launch false M1 settings only false M2 per-skill only when skillPolicy === 'opt-in' M3 global default true Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before. Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run- scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame and the e2e tests' stubbed routes hide the mismatch. Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport, AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct. P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false, dropping the panel prompt while still routing through runOrchestrator -> parser waits for tags that never arrive -> run degrades. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix: address PerishCode P1 + P2 follow-ups on PR #1338 Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real: 1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered. With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block plus removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. Falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage. A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as planned follow-up. Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path). * test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338) lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug. Added a regression case that: 1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one). 2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204. 3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string). The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here. 8 / 8 vitest cases green on the file. * fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338) The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/ src/db.ts. So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * feat(daemon): Critique Theater Phase 12 observability foundations Lands the metrics registry, the structured logger, the /api/metrics route, and the adapter-degraded bump that wires up the first data point. The orchestrator-side bumps for runs / rounds / composite / must-fix / interrupted / parser_errors / protocol_version land in a follow-up commit on this branch (kept separate so the wiring diff reads cleanly against the registry shape). Surfaces added: - apps/daemon/src/metrics/index.ts: 9 Prometheus series under the open_design_critique_* namespace with the histogram buckets the spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 / 2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at 0-10 integer steps). - apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line per call on stdout, namespaced critique. Matches the JSON-per-line convention cli.ts already uses; no new logger framework. - apps/daemon/src/server.ts: GET /api/metrics route. Honors OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs. - apps/daemon/src/critique/adapter-degraded.ts: markDegraded now bumps degraded_total so the adapter-health dashboard panel reflects every TTL refresh and every fresh mark. Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to apps/daemon/package.json. Both are zero-config no-ops without an exporter wired; daemon bundle size impact is ~150 KB uncompressed. The @opentelemetry/api dep is in place ahead of the OTel-spans follow-up commit; it adds no behavior on this commit. Tests: - tests/metrics/critique.test.ts (3 cases): registry shape + exposition text + reset-between-tests - tests/logging/critique.test.ts (4 cases): event shape + ordering + newline framing + namespace stamping Verification (Windows-local): - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites: 7 / 7 green - Existing adapter-degraded + conformance + rollout suites: 22 / 22 green; the bump is non-breaking * feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator Lights up the bump sites the Phase 12 foundations PR registered the series for. Every panel event the parser surfaces now reaches the matching Prometheus counter / histogram and the matching JSON log line on stdout. Switch-loop bumps + logs: - run_started: log run_started, set protocol_version gauge to the observed protocol version (small-integer cardinality). - panelist_open: record the first-open wall-clock per round so round_end can compute round_duration_ms; subsequent opens in the same round leave the start time untouched. - panelist_must_fix: bump must_fix_total with the panelist role. The wire event does not yet carry a dim name, so the label is 'unspecified' for now; a future parser revision can drop in the real dim without a metric rename. - round_end: bump rounds_total, observe composite_score, observe round_duration_ms (current ms minus the tracked start), log round_closed with the composite / mustFix / decision triple. - parser_warning (parser-yielded): bump parser_errors_total with the kind label, log parser_recover with kind + position. Orchestrator-side parser warnings (composite_mismatch and duplicate_ship from the daemon-authoritative scoring checks) go through a new emitParserWarning helper so the bus emit, the collectedEvents push, the metric bump, and the log line stay in lockstep. Three inline emission sites collapse to one-line helper calls. After the try/catch, a single terminal-status switch bumps runs_total{status, adapter, skill} once per run, with branch- specific log + counter: - shipped / below_threshold: log run_shipped - interrupted: bump interrupted_total, log run_failed{cause: interrupted} - timed_out: log run_failed{cause: timed_out} - failed: log run_failed{cause: orchestrator_internal} - degraded: log degraded{reason: orchestrator_classified} OrchestratorParams gains optional skill: string for the label; defaults to 'unknown' so spawn sites that have not yet threaded it keep working without a metric shape change. Tests: - The new metrics + logging suites (7 / 7) verify registry shape and event framing; orchestrator-side metric integration is exercised through the existing critique-conformance and critique-adapter-degraded suites (22 / 22 still green). - Logger test reassigns process.stdout.write directly instead of vi.spyOn so the Node overloaded write signature does not collide with MockInstance<unknown>. * feat(observability): Grafana dashboard JSON for Critique Theater Three default rows mapping to the metrics this branch wires up: 1. Fleet quality: composite score p50 / p90 / p99 line graph by adapter, plus a heatmap of the composite distribution. The line graph answers 'are my agents getting better over time'; the heatmap answers 'are the bad runs clustered around one adapter or smeared across the fleet'. 2. Adapter health: stacked bar charts for degraded marks (by adapter / reason) and parser errors (by adapter / kind) over a 5-minute window. The two queries together let an operator see 'is this adapter degraded because of malformed wire output or because of oversize blocks' without flipping panels. 3. Brief throughput: runs-per-hour by terminal status, an average rounds-per-run stat per adapter, and a round-duration ms p50 / p90 / p99 line. Throughput numbers fall straight out of the runs_total / rounds_total counters; the duration histogram is the same one the runs feed. The dashboard uses a templated $datasource var (defaults to 'prometheus') so an operator with multiple Prometheus instances can switch without editing JSON. Schema version 39 (Grafana 11). Operators import via: pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json or paste into a provisioned dashboards directory. The file is checked into the repo as a starting artifact; alert rules and SLO panels ship after the first 1000 runs inform the right thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity checked locally). * feat(daemon): OpenTelemetry outer span around the critique run Wraps each runOrchestrator call in a 'critique.run' span via the existing @opentelemetry/api dep added in the Phase 12 foundations commit. Attributes set on the span: - critique.run_id, critique.adapter, critique.skill at start - critique.final_status, critique.final_composite on terminal resolution - span status flipped to ERROR for failed / timed_out runs so a Tempo / Honeycomb / Jaeger filter on traces.status=error surfaces the right slice without joining back to Prometheus No exporter is wired by default; @opentelemetry/api is the API package and intentionally splits from @opentelemetry/sdk-, so the span is zero-overhead until an operator attaches an SDK through their runtime config. Inner per-round / parse_chunk / scoreboard_eval / persist_round / ship.persist spans defined in the Phase 12 plan are a follow-up: the outer span alone gives the trace a duration + final status + adapter/skill labels, which is the 80% value for dashboards that correlate runs across services. Adding child spans inside the existing 600-line orchestrator without restructuring is a separate careful change. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 29 / 29 critique + metrics + logging tests still green fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump nix-check failed on PR #1485 with hash mismatch in open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after the Phase 12 foundations commit (`2b8b7445`) added prom-client and @opentelemetry/api to apps/daemon/package.json and refreshed pnpm-lock.yaml. CI reported the new sha: specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8= got: 7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s= Both nix files pin the same workspace lockfile, so both flip in lockstep. No other Nix surface changes required. * fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2) 1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted agent values). The new observability path now records rs.composite and rs.mustFix (daemon-authoritative) instead of event.composite and event.mustFix when rs exists, and skips the bumps + log entirely when rs is missing (a degenerate round_end without any matching panelist_open). The dashboard p50 / p90 / p99 now agrees with persistence and ship decisions; an adapter reporting <ROUND_END composite='10'> while the daemon computed 6 logs 6 and still emits the composite_mismatch parser warning the prior block was already producing. 2. Codex P2 in server.ts (skill label always 'unknown'). The spawn path called runOrchestrator without passing the resolved skill id, so every live run bumped open_design_critique_{skill='unknown'} and the per-skill dashboard breakdown was always empty. Threaded effectiveSkillId (already computed at the same handler scope as the project skill fallback) through skill: . . . so the metric reflects the real skill when one is assigned, and the orchestrator default of 'unknown' only fires for runs that genuinely have none. 3. Codex P2 in conformance.ts (protocol-version mismatch let through). An adapter that emitted <CRITIQUE_RUN version='2'> followed by a valid SHIP classified as shipped because the harness only watched for terminal events. Added a guard inside the parse loop: if a run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION, mark the adapter degraded with reason 'protocol_version_mismatch' (already in DEGRADED_REASONS) and return early. ConformanceOutcome union widened to accept the new reason. 4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour panel under-reported by 3600x). 'rate(...[1h])' returns per-second. Multiplied by 3600 so the panel title and unit match the actual value rendered. Verification: - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites (7), existing adapter-degraded (7), conformance (5), rollout (10): 29 / 29 green - Grafana JSON re-parses with node -e 'JSON.parse(...)' fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485) * fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 22:11:27 +08:00
Nagendhra Madishetti	385e1d111d	feat(web): Critique Theater Phase 9 (drop-in mount wrapper, native i18n for de, ja, ko, zh-TW) (#1315 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * fix(web): address PerishCode P3 polish items on PR #1315 Three non-blocking findings from PerishCode's latest review folded in alongside the merge-with-main work. 1. bestRoundOf / bestCompositeOf could pair mismatched values. The split helpers walked state.rounds independently: bestRoundOf returned the LAST round with a numeric composite; bestCompositeOf returned the MAX composite. With non-monotonic composites (round 1 at 8.5, round 2 at 6.0), the user-initiated interrupt shipped { bestRound: 2, composite: 8.5 } which is a pair that never existed and the collapsed badge advertised forever (the reducer's interrupted phase is sticky). Folded into a single bestRoundAndComposite() that walks once and returns the matching pair. Multi-round regression test in CritiqueTheaterMount.test.tsx pins the fix. 2. critiqueTheater.interruptedSummary was missing from de.ts, ja.ts, ko.ts, zh-TW.ts. The Phase 9 native-locale work translated 39 of the 40 critiqueTheater.* keys; this one slipped through because the rest of the key block uses the locale-specific interrupted line as an anchor. The collapsed badge would have shown the English fallback string in an otherwise native UI on every interrupted run. Added native translations to all four locales (matching PerishCode's suggested copy). 3. InterruptButton's window-scope Escape handler could collide with Esc-to-dismiss on modals / popovers elsewhere on the page. The prior fix filtered text-entry focus but still fired from any body-level Esc, which double-handles when a non-text dismissable surface is open. New gate defers when a [role="dialog"] (or any aria-modal="true" element) without aria-hidden="true" is on the page AND the Escape didn't originate inside .theater-stage. Three new InterruptButton.test.tsx cases pin the deferral, the aria-hidden exemption, and the in-stage exemption. Verified: typecheck clean, 111 / 1007 tests green. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Pulled the corrected pattern from the Phase 10 commit: - snapshot runId / bestRound / composite at click time - dispatch interrupted only on res.ok - on rejection or non-2xx, clear interruptPending and log in dev so the user can retry and the real terminal event still wins Tests updated: rejection and non-2xx cases now assert the UI stays in running phase; the 204 cases wait for the async ack with waitFor. Web typecheck clean. * fix(web): drop shadow hasValidVariantShape; delegate to isPanelEvent (PerishCode follow-up on PR #1315) The wire-layer guard in `apps/web/src/components/Theater/state/sse.ts` carried a `hasValidVariantShape` second-pass filter and a docstring claiming `isPanelEvent` from contracts was a header-only check ("missing runId, unknown type"). That stopped being true after the Siri-Ray round-3 fix on PR #1314: `isPanelEvent` is now the strict guard, validating every variant's required fields, closed-enum membership against PANELIST_ROLES / SHIP_STATUSES / DEGRADED_REASONS / FAILED_CAUSES / PARSER_WARNING_KINDS / ROUND_DECISIONS, finite numerics, and the threshold <= scale cross-field check. Net effect: the shadow guard was strictly weaker than the contract guard above it. It could not reject any frame `isPanelEvent` already let through. The docstring also misled future readers into thinking the safety net was at the wire layer when it actually sits one layer up in contracts. Two changes: 1. Drop `hasValidVariantShape` entirely. `sseToPanelEvent` collapses to the one-liner: spread the payload, pin `type` from the channel, return `isPanelEvent(candidate) ? candidate : null`. Docstring rewritten to describe `isPanelEvent` accurately as the strict guard, and to call out (with a forward reference to this commit) that the deletion was intentional. 2. Add three small wire-layer regression cases to `sse.test.ts` so a future accidental weakening of either layer fails loudly at the boundary the reducer actually sees: - drops a `critique.ship` with an unknown status (closed-enum check) - drops a `critique.panelist_close` with an unknown role (closed-enum check) - drops a `critique.round_end` with `composite: NaN` (finite-numeric check) Verification: - pnpm --filter @open-design/web typecheck: clean - pnpm --filter @open-design/web exec vitest run tests/components/Theater/state/sse.test.ts: 22 / 22 green (19 prior + 3 new) - Net diff: -84 +41 lines (-43 net); the shadow function and its stale docstring are gone, the three regressions land at half the cost * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 21:21:41 +08:00
Nagendhra Madishetti	c326492203	feat: Critique Theater Phase 15 (rollout resolver + Settings toggle hook) (#1320 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. * feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. * feat(daemon): rollout flag resolver (Phase 15.1) Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority: 1. Skill-level policy (required wins, opt-out wins inversely) 2. Per-project override from the Settings toggle 3. OD_CRITIQUE_ENABLED env override 4. Rollout phase default M0 dark-launch false M1 settings only false (toggle is off until the user flips it) M2 per-skill true if skill opted in M3 global default true OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix. * feat(web): Settings toggle hook for Critique Theater (Phase 15.2) React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via: - the platform storage event (cross-tab) - a open-design:critique-theater-toggle CustomEvent (same-tab) Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload. setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write. The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state. 6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event. * test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). - Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). - apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. docs: tighten Phase 14 reasoning from lefarcen review (PR #1319) Four content gaps lefarcen flagged in the Phase 14 docs review, addressed inline rather than deferred. The fifth item (scope-drift between 'docs only' PR body and the cumulative stacked diff) is handled by rewriting the PR body, not the docs. 1. Round exit conditions (lefarcen P2-1). docs/critique-theater.md §2 'Auto-converging rounds' now lists the five conditions that stop a run (threshold reached, round budget exhausted, per-round timeout, total timeout, user interrupt) with their default values. A user debugging a run that stopped at round 1 with composite 5.4 can read this list and find the matching cause without spelunking the orchestrator. 2. Prior-art comparison (lefarcen P2-2). New §1.5 'Why an in-CLI panel and not a third-party design lint' pre-answers the 'why not Figma lint / Adobe checker / Material You conformance' question. Three differences: rule engines vs generative reviewers, post-hoc vs in-loop, external service vs same-CLI-session. 3. Composite formula rationale (lefarcen P2-4). §2 now explains why each weight is set the way it is: critic gates correctness so it gets 0.4; brand / a11y / copy are secondary quality dimensions at 0.2 each; designer is at 0.0 in v1 because aesthetic preference is not a ship gate. The slot stays in the schema so notes flow into the transcript and a v2 config release can bump the weight without a wire-shape change. 4. v2 cast-config ownership (lefarcen P2-3). Both AGENTS.md files (daemon + web) now declare a 'Designer weight frozen at 0.0 until v2 cast config' invariant. The daemon side calls out where the SKILL.md frontmatter resolver lands (apps/daemon/src/critique/config.ts); the web side calls out where the Settings surface lands (apps/web/src/components/ Settings/). A contributor reading either AGENTS.md before implementing v2 sees which module to touch first. * docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319) Two carryover items lefarcen flagged across the PR #1319 + #1320 reviews. 1. docs/critique-theater.md was sending users to /api/metrics/critique as the conformance-status check on malformed_block, but the Phase 12 metrics endpoint is explicitly deferred until after orchestrator wiring lands. Replaced the link with the pnpm conformance-harness command that DOES exist today (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) and noted that the dashboard surfaces this status as a series once Phase 12 ships. 2. apps/web/src/components/Theater/AGENTS.md module map was stale after Phase 15: the index.ts row said 'only two hooks are exported' but the barrel now exports useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled setter). Updated the row to list all three hooks + the setter + the reducer-derived contract types, and added a new row for hooks/useCritiqueTheaterEnabled.ts in the file table so a web contributor scanning the table sees the new hook without inferring it from the index.ts blurb. * docs(phase-15): clarify resolver + toggle scope in Section 3 (lefarcen P2 on PR #1320) Two P2 doc-accuracy items from the latest review: 1. The prior text said "the web toggle persists into the daemon's settings store; both surfaces flip the same flag." That isn't true on this PR's head: setCritiqueTheaterEnabled writes localStorage and dispatches a same-tab CustomEvent, but does NOT write through to the daemon (no /api/settings/critique endpoint ships in Phase 15, and no production caller of the setter rounds through to the daemon). Reworded the section to describe the actual scope: client-side in-session toggle this phase, daemon round-trip deferred to the Settings UI follow-up. 2. The prior text implied the rollout resolver was wired into the spawn-time gate, but apps/daemon/src/server.ts still reads critiqueCfg.enabled directly; isCritiqueEnabled is only called in tests. Reworded the section to make this explicit: Phase 15 ships the resolver in isolation so it can land green and be reviewed on its own, the one-line wiring change ships in the wireup PR that follows. Operators who want to enable the feature today should still set OD_CRITIQUE_ENABLED=1 rather than relying on the client toggle. That guidance is now explicit in the doc. * docs(rollout): align module docblock with actual Phase 15 scope (lefarcen P2 on PR #1320) The module-level docblock said the orchestrator entry and a GET /api/settings/critique endpoint already consume isCritiqueEnabled, but neither integration ships in this PR: the spawn-time gate in server.ts still reads critiqueCfg.enabled directly, and the daemon settings endpoint is deferred to the Settings UI follow-up. A future contributor reading the source comment would assume rollout phases and project overrides are live in generation routing when they are not yet wired. Reworded the docblock into three sections that mirror the user-facing docs/critique-theater.md scope notes: - What ships in Phase 15: the pure resolver function plus its supporting parsers, with full priority-matrix coverage. - Planned consumers (not yet wired): the orchestrator entry (one-line swap, lands in the wireup PR), the settings echo endpoint (lands with the Settings UI PR), and the conformance harness. - Operator guidance: until the wiring change lands, set OD_CRITIQUE_ENABLED=1 rather than relying on the client toggle. Resolution-order bullet 2 also updated: per-project override 'will write here once the Settings UI follow-up adds the daemon-side write path' instead of implying that write path exists today. Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/daemon typecheck clean. No runtime change. Documentation-only alignment. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * fix(web): export useCritiqueTheaterEnabled + setter from Theater barrel (Siri-Ray P2) The Phase 15 hook and its imperative setter were not in the public barrel, even though Phase 14 AGENTS.md describes index.ts as exporting them. That mismatch would force the Settings follow-up to import from the private hooks/ path (or render the AGENTS module map inaccurate). Added the export alongside useCritiqueStream and useCritiqueReplay so the Phase 15 public surface matches the module map. * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge * fix(contracts): restore numeric-domain guards in isPanelEvent (lefarcen + Siri-Ray P2 on PR #1320) --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 21:19:51 +08:00
pftom	d3d95121f3	feat(plugins): enhance visual score sorting and add new example templates - Updated the `sortByVisualAppeal` function to prioritize featured ranks, ensuring that curated plugins are displayed prominently. - Added tests to verify the new sorting logic, ensuring that plugins with numeric featured ranks are sorted correctly ahead of others. - Introduced new example templates for a magazine article layout, a Twitter share card, and a Xiaohongshu card, expanding the available options for users. - Enhanced the overall plugin preview experience by integrating these new templates, providing users with more visually appealing and functional examples. This update significantly improves the plugin sorting mechanism and enriches the template offerings, enhancing user engagement and experience.	2026-05-13 21:02:05 +08:00
pftom	8b2d48a258	feat(daemon, web): enhance plugin preview handling and add new templates - Introduced logic to assemble example slides with a companion template when the declared entry is missing, improving the user experience for plugin previews. - Updated the server logic to handle special cases for `example-slides.html`, ensuring proper fallback to `template.html` when applicable. - Enhanced tests to verify the new preview assembly functionality and ensure correct rendering of fallback content. - Added new HTML and Markdown examples for various skills, including a magazine article layout and a Twitter share card, expanding the available templates for users. This update significantly improves the plugin preview experience, providing users with more robust and visually appealing fallback options.	2026-05-13 20:58:24 +08:00
Caprika	06dbde51f9	[codex] Add Cursor Agent auth diagnostics (#1538 ) * Add Cursor Agent auth diagnostics * Handle Cursor not logged in auth status * Address Cursor auth review feedback * Classify Cursor stdout auth failures	2026-05-13 20:25:34 +08:00
Caprika	a3276ec542	[codex] Add visual draw annotation context (#1547 ) * feat(web): add visual draw annotation context * Fix visual draw annotation staging * Fix concurrent visual annotation IDs	2026-05-13 20:02:19 +08:00
Prantik Medhi	086be271d4	fix: hide preview chrome in source view (#1556 ) * fix: hide preview chrome in source view * fix: keep source-view edit controls	2026-05-13 19:12:00 +08:00
Prantik Medhi	660c5b88b4	fix(web): auto-scroll feedback form (#1566 )	2026-05-13 19:06:08 +08:00
lefarcen	fa2bb59ab3	fix(test): align FileViewer tests with merged component - Drop FileViewer.inspect-empty-hint.test.tsx (deleted on main; should have been gone after the merge but the modify/delete conflict left it in the tree). - Take main's FileViewer.test.tsx so the manual-edit interaction tests match the merged FileViewer.tsx behaviour, then patch every <FileViewer> render with projectKind="prototype" so it satisfies the prop requirement release added in #1509.	2026-05-13 18:49:44 +08:00
lefarcen	4096d65125	fix(web): align Icon names and FileViewer tests with merged state - SettingsDialog: use 'sparkles' for Pet section nav icon (main's choice; the merge picked up release's 'paw' which is not in main's IconName union). - FileViewer.test.tsx: take release's version which passes projectKind on every render (release added that prop in #1509 analytics; main's tests predate that prop). - FileViewer.manual-edit{,-history}.test.tsx: keep main's tests but pass projectKind="prototype" so they satisfy the merged FileViewer Props.	2026-05-13 18:40:48 +08:00
lefarcen	5172e37217	Merge origin/main into release/v0.7.0 to prepare merge-back PR Resolves 7 conflicts via hybrid strategy: - apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature) - apps/web/src/components/Icon.tsx: take main (switch-case refactor) - apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance) - apps/web/src/index.css: take main (project-target-platforms / instructions chip styles) - apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion - nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics call sites), and #1540 (release light background) are preserved via auto-merge.	2026-05-13 18:19:47 +08:00
Prantik Medhi	9040088f1c	fix(web): remove redundant bulk-select button (#1550 )	2026-05-13 16:38:14 +08:00
kami	4f76e836ae	feat(audio): add ElevenLabs audio support (#1384 ) * docs: add ElevenLabs audio support design * docs: add ElevenLabs audio implementation plan * feat(daemon): add ElevenLabs speech renderer * feat(daemon): add ElevenLabs sound effects renderer * fix(daemon): preserve ElevenLabs sfx durations * feat(web): expose ElevenLabs media providers * feat(daemon): document ElevenLabs audio contract * feat(audio): add ElevenLabs voice selection * chore: ignore superpowers scratch docs * fix(daemon): cache ElevenLabs voice options * fix(audio): expand ElevenLabs voice and SFX selection * fix(audio): align ElevenLabs SFX controls * fix(audio): tighten ElevenLabs SFX prompt budget * fix(audio): preflight ElevenLabs SFX prompt length * fix(audio): surface ElevenLabs lookup failures * fix(audio): sanitize ElevenLabs prompt errors	2026-05-13 15:53:41 +08:00
pftom	9e196d34af	feat(daemon, web): enhance plugin sharing workflows and UI components - Updated the plugin sharing prompts to utilize local daemon endpoints for publishing to GitHub and contributing to Open Design, streamlining the user experience. - Refactored the `PluginsView` and `PluginShareMenu` components to support new sharing functionalities, including confirmation modals and improved link handling. - Enhanced the CSS styles for the plugin share confirmation modal and related UI elements for better visual consistency. - Added tests to verify the functionality of the new sharing workflows and ensure proper integration within the existing plugin management system. This update significantly improves the plugin sharing experience, making it easier for users to publish and contribute their plugins effectively.	2026-05-13 14:35:09 +08:00
Sid	eda182c8a1	refactor(web): UI polish for v0.7.0 — neutralised palette, official brand glyphs, lucide (#1522 ) * refactor(web): adopt lucide-react for the inline Icon component The hand-rolled `<Icon>` set drifted in stroke weight and proportion across its 50+ glyphs as new icons were added. Swap the implementation to dispatch to `lucide-react` while keeping the same `<Icon name="..." size={X} />` API so the 246 existing call sites stay untouched. - Adds `lucide-react` as a dependency (tree-shaken; ~30KB gzipped for the ~50 icons we actually import). - `discord` and `x-brand` keep their bespoke inline SVG paths since lucide intentionally does not ship brand artwork. - `spinner` continues to use the existing `.icon-spin` className for its rotation; under the hood it now renders lucide's `Loader2`. - New `paw` glyph (lucide `PawPrint`) so the Pets nav item stops sharing the `sparkles` icon with External MCP. No behaviour change: the prop surface is identical, fill follows `currentColor` exactly as before, and aria-hidden / focusable defaults are preserved. Visual deltas are limited to the strokes themselves (slightly finer endcaps, more consistent baseline weights) — exactly the consistency upgrade lucide gives us. * feat(web): bundle official brand assets for agent icons `AgentIcon` previously approximated each agent's brand with hand-drawn SVG (orange Anthropic-ish sparkle, OpenAI-knot ellipses, etc). Replace those approximations with the real, vendor-published artwork shipped as static assets under `apps/web/public/agent-icons/`. - 13 SVG marks sourced from `@lobehub/icons-static-svg` (MIT) — color variants where the vendor published one (Claude, Codex, Gemini, Copilot, Qwen, Qoder, DeepSeek, Kimi, Mistral/Vibe), monochrome marks for the rest (Cursor, OpenCode, Hermes, MiMo, Pi, Kilo). - 1 PNG mark (Devin) sourced from devin.ai/icon.png, resized to 96×96 via `sips` since Cognition doesn't publish an SVG. - Each SVG was cleaned (stripped `<title>` brand text and the library's internal `style="flex:none;..."` ; dropped `width/height="1em"` so `viewBox` governs sizing) and run through `svgo --multipass`. Total bundle footprint: ~36 KB for all 17 files, only loaded on the agent cards that render them. - `AgentIcon` now resolves brands via a small `ICON_EXT` table and renders `<img src="/agent-icons/<id>.<ext>">`. Agents without an asset (`devin` is the lone outlier removed in this commit because PNG; new agents with no shipped artwork at all) fall back to an initial-letter pill that reads as "no official mark yet" rather than inventing brand artwork. - Removes the `simple-icons` dependency from a previous iteration since `AgentIcon` was its only consumer. Public-API stable: `<AgentIcon id={a.id} size={X} />` still accepts the same prop shape; `AvatarMenu`'s small-size usage continues to work. * refactor(web): polish entry view + Settings dialog UI for v0.7.0 A sweep over the two surfaces that have the most visual surface area in the app (the entry sidebar / New Project panel on the left, and the Settings modal). The work converged on a single neutral palette + a small set of shared dimensional standards documented in CSS, so future sections that get added slot into the same rhythm. New Project panel (apps/web/src/components/NewProjectPanel.tsx + .newproj* rules in index.css) - Adds a spec comment block at the top of the .newproj rules listing the canonical heights (input 30, dropdown 38, compact toggle 36, popover item 38) and the neutral colour rules. - Rebuilds PlatformPicker as a DS-picker-style dropdown trigger + popover (the previous 6-card 2×3 grid was ~280px tall; the dropdown collapses to a single 38px row with the same multi-select semantics). - Replaces SurfaceOptions' two heavy `ToggleRow` cards with the new compact one-line `CompactToggle`; the descriptive hint moves to a native `title` tooltip. - Compresses the Fidelity card grid (thumb aspect 16/7 → 16/5, tighter padding, smaller label). - Neutralises every selected/active state inside the panel: removes the orange accent fills and rings from `.newproj-card.active`, `.newproj-title-badge`, `.compact-toggle.on`, `.toggle-row.on`, the DS picker popover items + radio/check marks, the trigger open border and shadow, and the search-bar background. The Create CTA stays the only orange element on the panel. - Aligns the project-name input focus state across the sidebar: border `var(--text)` + 8% black halo (rgba is written out because the CSS pipeline collapses `color-mix(... 8%, transparent)` down to a solid `var(--text)`, which would render as a 3px solid black band). - Switches the body card from `flex: 1 1 auto` to `flex: 0 1 auto` so a short form variant doesn't leave a white void at the bottom of the card, and disables overscroll-bounce on the card so a fast scroll doesn't briefly expose the page-level gray under the white surface. - Pins the privacy footer below the card with a fixed 0 margin-top + shorter padding-top so it reads as a label of the card rather than a centred dialog footer. Entry sidebar footer (apps/web/src/components/EntryView.tsx + .entry-side-foot* rules) - Replaces the X social pill's `external-link` placeholder glyph with a bespoke filled `x-brand` SVG that mirrors the `discord` mark already in the icon set. - Wraps Discord + X in `.entry-side-foot-social` and lets that group flex-margin to the right of the row, so the two social pills read as a tight pair instead of a fourth pill stuck to the Pet pill. - Drops the "unadopted" red dot on the Pet pill (it duplicated the call to action that the label already carried). - Shrinks the footer icons to 10px and dims them to 55% / 75% opacity on hover so the labels are clearly the focal point — `currentColor` on the lucide-rendered SVGs would otherwise make the glyphs full black on hover. - Tightens the env-pill version text cap (180 → 142) so the top row ends close to the right edge of the Language + Pet group below it. Settings dialog (apps/web/src/components/SettingsDialog.tsx + .modal-settings / .settings-* / .seg-* / .agent-* rules) - Removes the "SETTINGS" kicker eyebrow above each section title (the big-typography title and modal context already make it redundant). - Switches the sidebar from a card-per-item layout to ChatGPT-style single-line pills: hides the `<small>` description, swaps the sidebar bg from gray to white, makes the active item a gray pill (no border, no shadow) so all items keep a consistent row height regardless of state. - Drops the modal-body's top border (already separated by the whitespace between modal-head and the body grid) and pins `.modal-settings { height: min(720px, 100vh - 64px) }` so the dialog no longer resizes when the user switches between short and long sections. - Compresses the Local CLI / BYOK seg-control from a 2-line ~52px card pair to a 1-line ~42px segmented pill that height-matches the active sidebar nav-item, and aligns the `.settings-content` padding-top with `.settings-sidebar` (22 → 16) so the first content row sits level with the first sidebar item. - Neutralises agent-card selected state, install/docs link colour, and protocol-chip active state — same accent-stripping pattern as the New Project panel. - Uniform agent-card height via `min-height: 64px` so installed cards (icon + name + version) align with unavailable cards (icon + name + not-installed + Install/Docs row). No prop-API changes, no business-logic edits — this is a pure visual refactor. Existing tests, providers and daemon contracts are untouched.	2026-05-13 13:59:19 +08:00
Jesse Yu	b2841f6045	Fix user chat message bubble styling (#1517 ) Co-authored-by: Haoyuan Yu <haoyuan.yu@shopee.com>	2026-05-13 13:59:15 +08:00
Caprika	6736310a01	Implement manual edit inspector (#1448 ) * feat(web): tweaks palette popover with HSL hue-shift recoloring Adds a Tweaks color-palette popover to the HTML preview toolbar. Selecting a palette re-skins the iframe in place via a srcDoc-side bridge that walks the DOM and shifts every chromatic paint to the target hue while preserving each color's saturation and lightness — pale tints stay pale, bold CTAs stay bold, just in the new color family. Mono-noir desaturates instead of shifting. - runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options - file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected - FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration - PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir - PreviewDrawOverlay: stub pass-through until the draw branch lands * feat(web): hide finalize-design toolbar from project header * test(e2e): skip project actions toolbar flow after toolbar removal * Polish manual edit inspector * Implement manual edit inspector * Fix manual edit review regressions * Fix FileViewer CI regressions * Fix remaining manual edit review issues * Flush manual edit styles before draw exit * Restore Critique Theater styles * Accept pixel line-height manual edits --------- Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-05-13 13:25:58 +08:00
pftom	c9cc3b88c0	feat(web): standardize plugin terminology and enhance UI components - Updated terminology from "Community" to "Official" across various components to reflect first-party plugin status. - Enhanced the ChatComposer, HomeHero, and PluginsHomeSection components to improve user experience and clarity in plugin management. - Improved CSS styles for better visual consistency and layout across plugin-related interfaces. - Added tests to ensure proper functionality and visibility of official plugins in the UI. This update reinforces the distinction between official and user-installed plugins, enhancing the overall user experience in plugin interactions.	2026-05-13 12:19:29 +08:00
lefarcen	dc7791ef9d	feat(analytics): add project_id + project_kind to studio/artifact events (#1509 ) Product tracking doc 260513 added project_id + project_kind to studio_view (artifact), studio_click (share_option), and artifact_export_result. The Studio funnel can now group by project type without joining run_created on the back end. - contracts: 3 props gain required project_id + project_kind - ProjectView → FileWorkspace → FileViewer: thread projectKind down, converting metadata.kind via projectKindToTracking once at the top - FileViewer + HtmlViewer: populate the three call sites	2026-05-13 12:13:55 +08:00
Siri-Ray	c16297f10c	Refine preview and project dropdown controls (#1514 ) * Refine preview and project dropdown controls * fix(web): gate OS widget metadata Generated-By: looper 0.7.4 (runner=fixer, agent=codex) * fix(web): mark platform picker listbox multi-select Generated-By: looper 0.7.4 (runner=fixer, agent=codex)	2026-05-13 12:13:31 +08:00
Nagendhra Madishetti	e2f409579d	docs: Critique Theater Phase 14 (user guide + 2 AGENTS module maps) (#1319 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. * feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. * docs: tighten Phase 14 reasoning from lefarcen review (PR #1319) Four content gaps lefarcen flagged in the Phase 14 docs review, addressed inline rather than deferred. The fifth item (scope-drift between 'docs only' PR body and the cumulative stacked diff) is handled by rewriting the PR body, not the docs. 1. Round exit conditions (lefarcen P2-1). docs/critique-theater.md §2 'Auto-converging rounds' now lists the five conditions that stop a run (threshold reached, round budget exhausted, per-round timeout, total timeout, user interrupt) with their default values. A user debugging a run that stopped at round 1 with composite 5.4 can read this list and find the matching cause without spelunking the orchestrator. 2. Prior-art comparison (lefarcen P2-2). New §1.5 'Why an in-CLI panel and not a third-party design lint' pre-answers the 'why not Figma lint / Adobe checker / Material You conformance' question. Three differences: rule engines vs generative reviewers, post-hoc vs in-loop, external service vs same-CLI-session. 3. Composite formula rationale (lefarcen P2-4). §2 now explains why each weight is set the way it is: critic gates correctness so it gets 0.4; brand / a11y / copy are secondary quality dimensions at 0.2 each; designer is at 0.0 in v1 because aesthetic preference is not a ship gate. The slot stays in the schema so notes flow into the transcript and a v2 config release can bump the weight without a wire-shape change. 4. v2 cast-config ownership (lefarcen P2-3). Both AGENTS.md files (daemon + web) now declare a 'Designer weight frozen at 0.0 until v2 cast config' invariant. The daemon side calls out where the SKILL.md frontmatter resolver lands (apps/daemon/src/critique/config.ts); the web side calls out where the Settings surface lands (apps/web/src/components/ Settings/). A contributor reading either AGENTS.md before implementing v2 sees which module to touch first. * docs(web): mirror the Designer-weight invariant in Theater AGENTS.md (PR #1319) lefarcen P1 follow-up on PR #1319: the daemon AGENTS.md already declares 'Designer weight is frozen at 0.0 until v2 cast config lands' as an invariant, but the web AGENTS.md's parallel bullet led with 'Composite weights are read-only on the web side' which buried the Designer-specific constraint. A web contributor reading that bullet would not realise the v1 weight distribution is wire-shape (changing it mid-v1 invalidates persisted critique_runs composite values). Rewrote the bullet to lead with the same 'Designer weight is frozen at 0.0 until v2 cast config lands' phrasing the daemon side uses, and added an explicit cross-link to the daemon AGENTS.md so the two halves of the invariant read as one rule. Web-side specifics retained: ScoreTicker / TheaterCollapsed read composite off the wire (no client recompute), v2 lands as a Settings surface at apps/web/src/components/Settings/, do not add a 'weights' prop to any component in this directory until the contracts package carries the v2 cast type. * docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319) Two carryover items lefarcen flagged across the PR #1319 + #1320 reviews. 1. docs/critique-theater.md was sending users to /api/metrics/critique as the conformance-status check on malformed_block, but the Phase 12 metrics endpoint is explicitly deferred until after orchestrator wiring lands. Replaced the link with the pnpm conformance-harness command that DOES exist today (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) and noted that the dashboard surfaces this status as a series once Phase 12 ships. 2. apps/web/src/components/Theater/AGENTS.md module map was stale after Phase 15: the index.ts row said 'only two hooks are exported' but the barrel now exports useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled setter). Updated the row to list all three hooks + the setter + the reducer-derived contract types, and added a new row for hooks/useCritiqueTheaterEnabled.ts in the file table so a web contributor scanning the table sees the new hook without inferring it from the index.ts blurb. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 12:11:48 +08:00
pftom	fcc3ae5838	feat(web): enhance HomeHero and related components for improved context selection and visibility handling - Updated the HomeHero component to support skill and MCP server mentions, allowing users to select these options seamlessly. - Improved CSS styles for the HomeHero component, enhancing the visual presentation of active selections and context tabs. - Refactored visibility handling for slides in the deck framework, ensuring proper display logic and preventing visibility issues with variant classes. - Added tests to verify the functionality of context selection and visibility handling, ensuring a smoother user experience. This update significantly enhances the user interface and interaction capabilities within the HomeHero component, improving the overall experience for users managing skills and presentations.	2026-05-13 11:48:15 +08:00
pftom	9006b74cde	feat(web): enhance ChatComposer and related components with skill and MCP server integration - Added support for skills and MCP servers in the ChatComposer, allowing users to apply skills and select MCP servers through mentions. - Updated the HomeHero and ProjectView components to manage skill selection and project skill changes. - Enhanced the EntryShell and other components to accommodate new skill and MCP server functionalities. - Improved CSS styles for better visual presentation of new features. - Added tests to ensure proper functionality of skill and MCP server integrations within the ChatComposer. This update significantly improves the user experience by enabling seamless integration of skills and MCP servers into the chat interface, enhancing project management capabilities.	2026-05-13 11:47:51 +08:00
pftom	f7a13c7b15	feat(web): enhance HomeHero component to support IME composition handling - Added a `composingRef` to track IME composition state, preventing submission during text composition. - Implemented `isImeComposing` function to check if the input is currently being composed. - Updated event handlers to ensure that submissions and plugin selections are blocked while composing. - Added tests to verify that submissions and plugin picks do not occur during IME composition, improving user experience for non-Latin input methods. This update enhances the input handling in the HomeHero component, ensuring a smoother experience for users utilizing IME for text input.	2026-05-13 11:26:10 +08:00
lefarcen	e2952acd05	Revert "fix(web): restore consistent app header layout (#1432 )" This reverts commit `3d3119333c`.	2026-05-13 11:20:16 +08:00
pftom	c36609c47d	feat(daemon, web): implement plugin sharing project creation and enhance CLI functionality - Added new flags for conversation, message, agent, and model in the CLI to support enhanced plugin sharing features. - Introduced a new API endpoint for creating share projects for plugins, allowing users to publish to GitHub or contribute to Open Design. - Updated the UI components to facilitate the new sharing functionalities, including prompts for user input during the sharing process. - Enhanced the project management system to handle new plugin share actions, improving user interaction and experience. - Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system. This update significantly enhances the plugin ecosystem by enabling users to share their creations more effectively and streamline collaboration.	2026-05-13 07:01:12 +08:00
Neha Prasad	342ba44383	fix memory extraction history affordance (#1447 )	2026-05-12 13:35:34 -04:00
Siri-Ray	3d3119333c	fix(web): restore consistent app header layout (#1432 ) * docs: add NotebookLM GitHub export script (#1062) * docs: add NotebookLM GitHub export script * fix: make NotebookLM export TOC anchors work * fix: escape TOC link text markdown chars * fix: include merged PRs when exporting --prs all * fix: allow --prs merged mode * fix: treat --limit as total export budget * fix: avoid starving buckets under global --limit * fix: support --issues none and handle repos w/ issues disabled * fix: avoid underfilling export when buckets empty * fix: keep disabled-issues fallback quiet * fix: silence disabled issues fallback * fix: satisfy script typecheck * prevent duplicate saves and add template deletion (#1294) * prevent duplicate template entries on repeated save * add delete button to saved template list Templates can now be removed from the template picker via a hover x button, calling the existing DELETE /api/templates/:id endpoint. * add missing onDeleteTemplate prop in test fixtures * add template deletion flow test for NewProjectPanel * reject template names longer than 100 characters * preserve original createdAt on template update * feat: add FAQ page skill (#1162) * fix: set writable OD_DATA_DIR default for nix run Fixes #1157 When running via 'nix run github:nexu-io/open-design', the daemon attempted to create runtime state under the Nix store package path: /nix/store/.../lib/open-design/.od/projects The Nix store is read-only at runtime, causing startup to fail with ENOENT when mkdir() tried to create the projects directory. This commit updates the nix run wrapper to export OD_DATA_DIR with a writable default ($HOME/.od) when the variable is unset. Users can still override it by setting OD_DATA_DIR before running. The Home Manager and NixOS modules already set OD_DATA_DIR, so they are unaffected by this change. * feat: add FAQ page skill Add a new skill for generating Frequently Asked Questions pages with: - Collapsible accordion sections for Q&A pairs - Real-time search functionality - Category filtering (Billing, Account, Technical, General) - Smooth animations and transitions - Keyboard navigation support - Mobile-friendly responsive design - Semantic HTML with proper ARIA attributes The skill includes: - SKILL.md with triggers, workflow, and output contract - example.html demonstrating a complete FAQ page with 12 questions Use cases: help centers, support pages, product documentation * fix: address PR review feedback for FAQ page skill - Fix craft slugs: use accessibility-baseline and state-coverage instead of non-existent slugs - Remove overly broad 'questions and answers' trigger - Add edge case handling for insufficient/excessive FAQs - Remove search highlighting requirement (XSS risk) - Update self-check to reflect filtering instead of highlighting Addresses review comments from @lefarcen and @chatgpt-codex-connector * feat: add localized copy for faq-page skill Add German, French, and Russian translations for the FAQ page skill example prompt to fix validation test failure. - DE: FAQ-Seite mit Akkordeon-Abschnitten, Suchfunktion und Kategoriefilterung - FR: Page FAQ avec sections accordéon, recherche et filtrage par catégorie - RU: Страница FAQ со складными секциями-аккордеонами, поиском и фильтрацией * fix: escape apostrophe in French translation Use double quotes to avoid syntax error with d'auth * fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins (#1110) * fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins fnm legacy installations use ~/.fnm/node-versions. Closes #1102 * fix: remove stray .fnm token from type declaration * docs: add Windows troubleshooting guide (#478) (#1170) * docs: add Windows troubleshooting guide (#478) Add docs/windows-troubleshooting.md with step-by-step fixes for the most common native-Windows setup errors: - Node 24 / nvm-windows gotchas (fake nvm file in System32) - pnpm not found after installation - Build scripts blocked by pnpm 10 (better-sqlite3, sharp) - Visual Studio / gyp build errors - Starting the dev server - Optional OpenCode CLI setup Also update CONTRIBUTING.md and QUICKSTART.md to link to the new guide instead of the vague "file an issue if it doesn't" note. * docs: fix Windows guide command accuracy (#1170) Address all 6 inline review comments from lefarcen: - Pin npm-global pnpm install to @10.33.2 (matches packageManager field) - Use where.exe instead of bare where (PowerShell alias conflict) - Fix OpenCode package: opencode-ai (not opencode), binary is opencode - Add EPERM fallback note for corepack enable on protected installs - Add Python check for gyp ERR! find Python - Expand diagnostic checklist with corepack, python, execution policy Also remove redundant corepack pnpm --version from checklist. * feat(daemon): inject compiled design-system tokens + fixture into prompts (#1385) * feat(daemon): inject compiled design-system tokens + fixture into prompts Follow-up to #1231. The prior PR landed the structured form of two brands (`default` + `kami`) and codified the schema; this PR teaches the daemon to actually consume those files when assembling the system prompt, so agents stop having to re-derive token names from DESIGN.md prose every turn. Gated behind `OD_DESIGN_TOKEN_CHANNEL=1` for the smoke-test phase — flag-off keeps the daemon byte-equivalent to today's behavior, flag-on appends two new prompt blocks (the brand's `tokens.css` :root contract and its `components.html` reference fixture) right after the existing DESIGN.md block. Brands without those sibling files (every brand except `default` and `kami` today) skip silently in either mode. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(daemon): only swallow ENOENT/ENOTDIR in readFileOptional, rethrow rest Reviewer feedback (nettee, #1385). The prior catch-all hid permission errors, EISDIR, and broken packaged-resource paths behind the same "undefined = absent" branch the legacy ~138-brand fallback uses, which would let `OD_DESIGN_TOKEN_CHANNEL=1` silently degrade to the DESIGN.md-only prompt while reporting success. That corrupts the exact signal the smoke-test rollout depends on. Now `readFileOptional` only returns undefined for ENOENT / ENOTDIR (real "file does not exist" cases) and rethrows everything else. Added a focused test that plants a directory at the tokens.css path to exercise the EISDIR branch, plus a partial-presence regression test to confirm the stricter contract preserves the legacy fallback. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com> * feat(daemon): make connection-test timeouts configurable (#1222) * feat(daemon): make connection-test timeouts configurable Provider and agent connection tests had hardcoded 12s / 45s budgets, which are too tight for slow networks or distant providers (the user sees "timeout" in Settings with no way to extend the budget). - Add OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS (default 12_000) - Add OD_CONNECTION_TEST_AGENT_TIMEOUT_MS (default 45_000) - Invalid values (non-numeric, zero, negative, fractional) emit a console.warn and fall back to the default, so a typo in the env never silently disables the safety timeout. - Export resolveConnectionTestTimeoutMs for unit testing; cover the three resolution paths (fallback / honored override / invalid). 41 connection-test tests pass (+3 new), full daemon suite 1170/1170. * fix(daemon): reject connection-test timeout overrides above Node's setTimeout maximum Node's `setTimeout` silently clamps any delay above `2^31-1` ms (2_147_483_647) to ~1 ms with a TimeoutOverflowWarning. The previous `Number.isInteger(n) && n >= 1` check accepted oversized values unchanged and passed them straight to `setTimeout`, so an override that intended to raise the budget — e.g. `OD_CONNECTION_TEST_AGENT_TIMEOUT_MS=3000000000` — instead caused every connection test to fail almost immediately. The safety timeout was effectively disarmed. Add `MAX_CONNECTION_TEST_TIMEOUT_MS = 2_147_483_647` and switch the guard to `Number.isSafeInteger(n) && n >= 1 && n <= MAX...`. The boundary value is still accepted; one millisecond past it falls back with a warn. Regression test exercises `3_000_000_000`, `2_147_483_647`, and `2_147_483_648`. Addresses #1222 review feedback from @chatgpt-codex-connector, @mrcfps, and @lefarcen. * fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN SSRF bypass) (#1122) * fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN bypass) new URL('http://192.168.1.5./').hostname returns '192.168.1.5.' — the trailing dot is the RFC 1034 absolute-FQDN form and resolves identically to '192.168.1.5'. parseIpv4 fails on the dotted form, so 169.254.169.254. slips past the metadata-service block, 192.168.1.5. slips past the LAN block, and localhost. slips past the loopback identification. Strip trailing dots in normalizeBracketedIpv6 so all downstream checks (isLoopbackApiHost, isBlockedExternalApiHostname, isBlockedIpv4, IPv6 range tests) see the canonical form. Adds 6 vitest cases covering loopback FQDN forms (localhost., foo.localhost., 127.0.0.1.) and SSRF FQDN bypasses (169.254.169.254., 192.168.1.5., 10.0.0.5.). Refs nexu-io/open-design#1119 review feedback (P2 from @lefarcen). * test(connectionTest): tighten trailing-dot coverage per #1122 review Two issues from #1122 review: 1. (P2 from @mrcfps + codex bot) The original `foo.localhost.` case asserted error===undefined on validateBaseUrl, which only proves the URL passed validation — not that the host is identified as loopback. Replaced with direct isLoopbackApiHost(...) assertions on the actual loopback FQDN forms (localhost., 127.0.0.1., 127.0.0.5.) so the test exercises the loopback path the comment claims. 2. (P3 from @lefarcen) Original blocked-FQDN tests covered only 3 of 7 ranges that isBlockedIpv4 handles. Added a dedicated case per range (0.0.0.0/8, 10/8, 100.64/10, 169.254/16, 172.16/12, 192.168/16, multicast >=224) so future regressions in normalizeBracketedIpv6 surface against the full coverage. * docs: drop misleading foo.localhost./endsWith claim in normalizer comment @lefarcen review feedback: isLoopbackApiHost only accepts exact 'localhost', '::1', loopback IPv4, and mapped loopback IPv4 — there's no subdomain or endsWith handling, so referencing 'foo.localhost.' overstates what the trailing-dot strip enables. Rewrite the comment to match actual call sites (isLoopbackApiHost equality + isBlockedIpv4 numeric parse). * feat(daemon): export self-contained HTML via /export/?inline=1 endpoint (#1312) test(daemon): add Red unit tests for inlineRelativeAssets helper 14 cases pinning the behavior contract for the upcoming apps/daemon/src/inline-assets.ts helper: - link/script inlining with verbatim body preservation - non-src script attrs preserved (type=module, defer, crossorigin) - relative path resolution (root + nested + deep-nested owners) - self-closing and single-quoted attr forms - negative cases: missing rel, rel=preload, absolute/data/blob/leading-slash - escaping: </style and </script inside body - null-fileReader graceful degradation - duplicate identical tags fully replaced (diverges from apps/web/src/components/FileViewer.tsx:5313's first-match-only; locked decision per plan §3.3) - HTML-escaped data-od-inline-asset attr Tests intentionally Red — module ../src/inline-assets.js does not yet exist. Phase B-G of plan declarative-roaming-gosling.md will turn them green by porting FileViewer.tsx:5248-5354 server-side. Refs nexu-io/open-design#368. * feat(daemon): port inlineRelativeAssets server-side for export endpoint Adds apps/daemon/src/inline-assets.ts — a pure helper that takes (html, ownerFileName, fileReader closure) and returns the HTML with every relative <link rel=stylesheet> and <script src> contents inlined into <style data-od-inline-asset="…">/<script>…</script> blocks. The fileReader closure keeps the helper free of fs/Express coupling so the route handler owns the filesystem boundary. Port source: apps/web/src/components/FileViewer.tsx:5248-5354 — five functions (inlineRelativeAssets, resolveProjectRelativePath, baseDirFor, readHtmlAttr, escapeHtmlAttr). The fetch hop becomes the fileReader closure; replace-all replaces first-match-only per locked design decision §3.3 (inline comment in inline-assets.ts cites the divergence from FileViewer.tsx:5313 and notes the web inline path is on a deprecation track since PR #384 made URL-load the default). Phase B-G of plan declarative-roaming-gosling.md. All 14 unit cases from the Red commit (`a60a9023`) now pass; tightens one case to use a realistic '&'-only filename (the original `<`/`>`-bearing filename was unreachable in real filesystems and exposed a regex limitation the web client carries too). Daemon delta: +14 tests (1704 → 1718). Typecheck clean. Refs nexu-io/open-design#368. * test(daemon): add Red integration tests for /export/?inline=1 route 9 HTTP cases against GET /api/projects/:id/export/?inline=1: - 3-file React-ish layout returns self-contained HTML (wiring guard: body assertions catch removal of the await inlineRelativeAssets(...) line, not just helper-internals changes) - missing inline / non-canonical values (0, false, foo, empty) → 400 - non-HTML file → 400 UNSUPPORTED_FILE_TYPE - missing file → 404 FILE_NOT_FOUND - invalid project id (..) → some 4xx (Express normalizes before route) - null-origin OPTIONS preflight → 204 + Access-Control-Allow-Origin: * - missing sibling asset → 200 with <link> tag intact, other asset inlined - nested HTML entry (pages/index.html + ../shared/util.js) → 200 inlined 8 of 9 tests Red (404 / 403); the invalid-project-id case is tolerant about how Express rejects .. so it accidentally passes Red — Green will tighten to 400 BAD_REQUEST via isSafeId. Phase C-R of plan declarative-roaming-gosling.md. C-G will register the route in apps/daemon/src/import-export-routes.ts. Refs nexu-io/open-design#368. * feat(daemon): wire GET /api/projects/:id/export/?inline=1 endpoint Adds the export-inline endpoint into registerProjectExportRoutes (import-export-routes.ts) alongside /export/pdf and /archive. The route: - Validates project id via ctx.validation.isSafeId - Requires ?inline=1 (accept-list: 1 / true / yes / on, matching Part 1's parseForceInline at file-viewer-render-mode.ts:59-66) - Reads the owner HTML via ctx.projectFiles.readProjectFile; maps ENOENT to 404 FILE_NOT_FOUND, everything else to 400 BAD_REQUEST - Gates non-HTML callers with 400 UNSUPPORTED_FILE_TYPE - Builds a fileReader closure that silently returns null on any sibling read failure (failure-local, not fatal — matches the web client's null-filter at FileViewer.tsx:5311) - Hands the buffer + relPath to inlineRelativeAssets and returns the result as text/html DI: RegisterProjectExportRoutesDeps gains 'projectFiles' \| 'validation'; server.ts:2879 passes the corresponding deps. Mirrors the dep shape of RegisterFinalizeRoutesDeps used by PR #832's /finalize/anthropic. Null-origin support intentionally omitted (decision §10 in the PR description): the daemon's null-origin allowlist is /raw/ and /codex-pets/.../spritesheet only, and export consumers are same-origin UI or server-side tooling — sandboxed-iframe srcdoc previews fetch /raw/* instead. Integration test #7 pins the 403 contract so a future allowlist change is deliberate. Phase C-G of plan declarative-roaming-gosling.md. All 23 tests green (14 unit + 9 integration); full daemon suite 1727 passing (delta +9 over B-G's 1718). Typecheck clean. Refs nexu-io/open-design#368. * test(daemon): add Red regression for inlined-body tag-literal corruption Reproduces the correctness bug Siri-Ray (looper) and codex-bot flagged on PR #1312: the reduce/split-join approach in inlineRelativeAssets re-scans the progressively mutated HTML, so a tag literal that happens to appear inside an already-inlined asset body gets the inner literal also replaced — corrupting the body and producing duplicate inlining. Concrete reproducer (CSS, where </style escape doesn't touch <link>): HTML: <link rel="stylesheet" href="a.css"> <link rel="stylesheet" href="b.css"> a.css: /* see also <link rel="stylesheet" href="b.css"> / b.css: body{color:red} Under split/join the second pass splits on `<link rel="stylesheet" href="b.css">` and matches BOTH the real outer tag AND the literal inside a.css's comment. Result: b.css's <style> block is injected inside a.css's comment, and b.css gets inlined twice. Phase F-R of plan declarative-roaming-gosling.md (post-PR-#1312 review round). F-G will rewrite the helper to collect matches by position in the original HTML and concat slices in a single pass, so already-inlined content is never re-scanned. Refs nexu-io/open-design#1312 review threads at apps/daemon/src/inline-assets.ts:122 (Siri-Ray looper + codex bot). feat(daemon): replace inliner reduce/split-join with position-based concat Fixes the inlined-body tag-literal corruption Siri-Ray (looper) + codex-bot flagged on PR #1312. The previous `replaceAllOccurrences` (`source.split(from).join(to)`) re-scanned the progressively mutated HTML on each pass, so a tag literal that appeared inside an already- inlined CSS/JS body got the inner literal replaced too, producing duplicate inlining and corrupted bodies. New shape: collect every match's {start, end} byte span from the ORIGINAL html via `matchAll`, await the per-match replacements in parallel, sort by start, and concat slices of the original html with the replacement strings in a single pass. Text introduced by an earlier replacement is never scanned for matches. The dup-tag fix (decision §8 — replace every occurrence, not first-match-only) is preserved: every original-tag position gets its own slice, so all duplicates are inlined. Also extracts buildInlineStyleBlock / buildInlineScriptBlock so the match-collection loops stay readable. Phase F-G of plan declarative-roaming-gosling.md. Regression test (`c809bccc`) goes Green; all 24 unit + integration tests pass; daemon suite still clean. Refs nexu-io/open-design#1312. * test(daemon): add Red CSP-sandbox test + P3 coverage gaps from PR #1312 review Three tests covering lefarcen's review on PR #1312: 1. [Red] CSP sandbox header (P2, lefarcen @ import-export-routes.ts:423). Top-level browser navigation to /export/?inline=1 sends no Origin header, so the daemon middleware lets it through and any JS in the exported document runs with daemon-origin privileges. Asserts the response sends `Content-Security-Policy: sandbox allow-scripts` so the browser treats it as a sandboxed iframe with an opaque origin (scripts still run, but no cookies / no /api/ access). This test fails until G1-G adds the header in the handler. 2. [Green-on-commit] Accept-list cases (P3, lefarcen @ test.ts:262). PR body decision §7 promises `inline=true/yes/on` case-insensitive, but round-1 tests only exercised inline=1. Pin the full accept list (true / yes / on + TRUE / Yes / ON). Already passes — the route's parser already implements the accept list; this just makes the contract testable. 3. [Green-on-commit] isSafeId guard (P3, lefarcen @ test.ts:287). Previous `..` test was normalized by Express before reaching the route. New input uses `bad!id` (URL-safe, but outside isSafeId's /^[A-Za-z0-9._-]+$/ char class), so Express passes it into req.params unchanged and isSafeId rejects with the documented 400 BAD_REQUEST envelope. Phase G1-R / H of plan declarative-roaming-gosling.md. Refs nexu-io/open-design#1312 review comments. feat(daemon): send Content-Security-Policy: sandbox allow-scripts on /export Closes the same-origin XSS surface lefarcen flagged on PR #1312 (P2 at import-export-routes.ts:423): top-level browser navigation to the export URL sends no Origin header, so the daemon's /api middleware admits the request and any JS in the exported document executes with daemon-origin privileges (cookies, /api/, localStorage). `Content-Security-Policy: sandbox allow-scripts` on the response makes the browser treat the document as a sandboxed iframe with an opaque origin. Scripts still execute (necessary for the screenshot use case — the whole point of inlining JS), but they cannot read cookies, hit /api/, or otherwise escalate to the daemon's origin. Phase G1-G of plan declarative-roaming-gosling.md. Daemon delta: +3 tests (the Red CSP test from `58151356` turns Green; the P3 coverage gap tests stay green). Refs nexu-io/open-design#1312. * test(daemon): add Red regression for <link> stylesheet attr preservation Currently `<link rel="stylesheet" href="print.css" media="print">` becomes a plain `<style data-od-inline-asset="print.css">…</style>` with no media query — print-only styles apply unconditionally. Same problem for `title` (alternate stylesheet sets), `disabled` (initial disabled state), and `nonce` (CSP nonce). All four are valid on both `<link rel=stylesheet>` and `<style>` per HTML spec, so the inliner must carry them across. PR #1312 round-2 review (lefarcen P2 @ inline-assets.ts:44). Phase G2-R; G2-G will extend buildInlineStyleBlock to copy the four attrs off the source <link>. Refs nexu-io/open-design#1312. * feat(daemon): preserve <link> stylesheet semantics on inlined <style> Closes lefarcen's P2 review note on PR #1312 (inline-assets.ts:44): `<link rel="stylesheet" href="print.css" media="print">` was becoming a plain <style> with no media query, so print-only styles applied unconditionally. Same issue for `title` (alternate stylesheet sets), `disabled` (initial disabled state), and `nonce` (CSP nonce). buildInlineStyleBlock now carries four attrs across from the source <link>: - media, title, nonce (value attrs, HTML-escaped via escapeHtmlAttr) - disabled (boolean attr — copied as bare presence) Other <link> attrs (rel, href, type, crossorigin, integrity, referrerpolicy) don't apply to <style> and are intentionally dropped. New `hasBooleanHtmlAttr` helper distinguishes presence-as-attr from substring-inside-another-attr-value via a regex that requires a word boundary after the name (whitespace, `=`, or `>`). Phase G2-G of plan declarative-roaming-gosling.md. All 28 tests pass. Refs nexu-io/open-design#1312. * docs(daemon): narrow inliner contract claim + document size-limit policy Closes lefarcen's P2 review notes on PR #1312: 1. "Self-contained" incomplete (inline-assets.ts:67): the helper only rewrites top-level <link rel=stylesheet> / <script src>. `<img src>`, CSS `url(...)`, CSS `@import`, ES module imports, font sources, and similar remain external in the response. The PR title/body claimed "self-contained HTML" which over-promised for screenshot tooling expecting bundled images/fonts. Module docstring now enumerates the full not-rewritten list and names the screenshot path as the primary use case (headless browser fetches each external asset on render, so inline-CSS- and-JS-only is sufficient). The route handler comment block mirrors the contract. A fully offline export with image/font bundling is filed as a follow-up — out of scope for this PR. 2. No response cap (inline-assets.ts:72): the helper does concurrent reads + multiple string copies and could spike daemon memory. The daemon is local-first (single-user, developer's machine — see open_design_architecture.md), so the effective ceiling is the size of the user's own project. The docstring now states this rationale and names the conditions under which a bounded-concurrency reader and output-size limit would be needed (non-trusted callers). Docs-only — no behavior change, all 28 tests still pass. Refs nexu-io/open-design#1312. * test(daemon): add Red regression for hasBooleanHtmlAttr quoted-value match PR #1312 round-2 review (lefarcen P3): `hasBooleanHtmlAttr` tests the tag string with no attr-quoting awareness, so the literal text `disabled` appearing inside any quoted attribute value followed by another whitespace char satisfies `\sdisabled(?=\s\|=\|/?>)`. <link rel=stylesheet href=x.css data-note="content disabled stuff"> emits a <style disabled> block, silently disabling a stylesheet the author wrote without that attr. Also adds a counterweight test for the legitimate-disabled case (<link … disabled>) so the next-commit fix doesn't over-correct and start dropping real boolean attrs. Phase I3-R of plan declarative-roaming-gosling.md (post-PR-#1312 round-2 review). I3-G will strip quoted attribute values from the tag string before testing for the bare attr. Refs nexu-io/open-design#1312. * feat(daemon): make hasBooleanHtmlAttr quote-aware to avoid false positives Closes lefarcen's P3 review note on PR #1312: `hasBooleanHtmlAttr` previously ran `\sname(?=\s\|=\|/?>)` over the full tag string, so the literal text `disabled` appearing inside any quoted attribute value followed by whitespace satisfied the regex. Source tags like `<link rel=stylesheet href=x.css data-note="content disabled stuff">` were emitting a <style disabled> block — silently disabling a stylesheet the author wrote without that attr. Fix: strip `="…"` and `='…'` substrings out of the tag with two regex passes BEFORE testing for the bare attr. The lookahead still requires `\s\|=\|/?>` after the attr name, so `<link disabled>`, `<link disabled="">`, `<link disabled/>`, etc. all match — but the attr name as a substring of any quoted value cannot match because values have been stripped to `""` / `''`. Phase I3-G of plan declarative-roaming-gosling.md. All 30 tests green (28 prior + 2 round-3 regression cases: false-positive and legitimate-disabled). Refs nexu-io/open-design#1312. * test(daemon): add Red cap-enforcement tests + scaffold InlineOptions PR #1312 round-2 review (lefarcen P2 — still open): round-2 only documented that no cap is enforced. Reviewer pushed back: the helper still builds unbounded candidate arrays + runs Promise.all over all asset reads + concatenates the full output in memory. Need actual limits in code. This commit adds the Red test surface that drives the next commit's enforcement: - InlineAssetsLimitError("owner") when owner HTML > maxOwnerBytes - InlineAssetsLimitError("candidates") when tag matches > maxCandidates - Per-asset graceful: oversized asset → tag stays as URL ref - InlineAssetsLimitError("total") when assembled output > maxTotalBytes - Bounded read concurrency: peak in-flight reads ≤ maxReadConcurrency - Integration: route maps the throw to 413 PAYLOAD_TOO_LARGE InlineOptions interface is added to the helper signature as a no-op test-door (per feedback_test_doors_over_fake_timers.md), so tests can exercise tiny fixtures while production callers use module-level defaults. The next commit (H3-G) wires the enforcement. Phase H3-R of plan declarative-roaming-gosling.md. Daemon delta on this commit: +6 tests (5 unit + 1 integration), all Red. Refs nexu-io/open-design#1312. * feat(daemon): enforce inliner caps + map limit errors to 413 PAYLOAD_TOO_LARGE Closes lefarcen's still-open P2 review on PR #1312 round 2 ("the code still builds unbounded candidate arrays + Promise.all over all asset reads + concatenates the full output in memory"). Caps are now enforced in code with the documented defaults: MAX_INLINE_OWNER_BYTES = 2 MiB MAX_INLINE_ASSET_BYTES = 5 MiB per sibling MAX_INLINE_CANDIDATES = 500 link/script matches MAX_INLINE_TOTAL_BYTES = 50 MiB assembled output MAX_INLINE_READ_CONCURRENCY = 8 simultaneous fileReader calls Enforcement points: - Owner cap (input): fires immediately at function entry. Cheap — Buffer.byteLength of the already-decoded UTF-8 string. - Candidate cap (planning): fires after matchAll, BEFORE any sibling read. Pathological HTML with thousands of <link>/<script src> tags is rejected without opening a single file descriptor. - Asset cap (per-sibling): post-read length check; oversized assets return null from the wrapped reader, so the tag stays as a URL ref and the response is still 200. This is the only "graceful" cap — one bad asset doesn't fail the whole export. - Total cap (output): tracked across the slice-and-concat loop, guarding both preserved-html slices AND injected replacements. - Concurrency cap (planning): a tiny in-module runWithConcurrency worker-pool keeps at most maxReadConcurrency fileReader calls in flight, with order-preserving results. `InlineAssetsLimitError` carries a `limit` discriminator so logs and clients can disambiguate owner/asset/candidates/total. The route handler catches it and emits 413 PAYLOAD_TOO_LARGE. Drive-by error-envelope fix while in the route: UNSUPPORTED_FILE_TYPE (an unregistered ApiErrorCode) → UNSUPPORTED_MEDIA_TYPE (the canonical code) with HTTP 415. The round-1 string was a slip; caught by reading packages/contracts/src/errors.ts:11 while wiring PAYLOAD_TOO_LARGE. Phase H3-G of plan declarative-roaming-gosling.md. All 36 tests green (28 prior + 2 round-3 quoted-attr + 5 cap unit + 1 cap integration). Refs nexu-io/open-design#1312. * feat(daemon): enforce inliner caps pre-buffer via AssetHandle contract Closes lefarcen's still-open P2 review on PR #1312 round 3 ("the helper enforces maxTotalBytes only after all candidate assets have already been read and converted to replacement strings" / "maxAssetBytes is checked after fileReader fully buffers each sibling"). Round-3 caps were defensive against the final output size but did not bound peak memory during read fanout — 500 assets at 5 MiB each could materialize ~2.5 GiB before the 413 fired. Contract change: InlineAssetReader now returns `AssetHandle \| null` where AssetHandle is `{ readonly size: number; read(): Promise<...> }`. Callers expose `size` from a cheap stat-equivalent (the route uses `resolveProjectFilePath`) and defer the full materialization to `read()`. The helper checks size against maxAssetBytes BEFORE invoking read, and against the running total BEFORE the reservation is committed. Enforcement flow inside runWithConcurrency: 1. await fileReader(p.resolved) → cheap stat-only call 2. if (handle.size > maxAssetBytes) return null ← pre-buffer 3. if (runningBytes + handle.size > maxTotalBytes) ← pre-buffer totalAborted = true; return null 4. runningBytes += handle.size ← reserve 5. await handle.read() ← only now 6. if (read returned null) runningBytes -= refund `totalAborted` is a shared flag the workers check at entry, so once the running total hits the cap, no new reads start. With maxReadConcurrency = 8, at most ~8 stat-side calls finish after abort — peak memory bounded. The concat-time guard stays as the exact final assertion (the pre-buffer reservation is approximate — it counts the original tag bytes and skips wrapper overhead). Route closure updated to do `resolveProjectFilePath` first, then `readProjectFile` inside the deferred `read()`. Test reader helpers (`readerFrom` + the concurrency-test reader) updated to the new shape. Two new unit tests pin the pre-buffer semantics: - `maxAssetBytes` is checked via handle.size BEFORE handle.read() (the reader's `read()` throws — must never run) - Running total abort stops further reads once exceeded (counting reader observes ≤ 2 reads when cap should fire after the first) Phase K of plan declarative-roaming-gosling.md (post-PR-#1312 round-3 review). All 38 tests green (36 prior + 2 round-4 pre-buffer cases). Refs nexu-io/open-design#1312. * test(daemon): add Red test pinning owner pre-buffer 413 before mime 415 PR #1312 round-5 (lefarcen P2): the route currently reads the owner file with readProjectFile() before any size check, so a 100 MiB owner HTML is fully buffered into memory before the helper's ownerBytes check fires. The fix is to stat with resolveProjectFilePath first, reject pre-buffer with 413 PAYLOAD_TOO_LARGE on oversize, then fold in the mime check (still 415 on mismatch, now pre-buffer), then readProjectFile when both gates pass. The Red→Green discriminator is the combination 'oversize AND non-HTML': pre-fix the route reads the buffer first and the text/plain mime check fires → 415; post-fix the route stats first and the size check fires before the mime check → 413. Asserting 'got 413, not 415' pins both the pre-buffer property and the check ordering (size before mime, per lefarcen's locked round-5 sequence). 2 MiB+1 byte fixture is acceptable in test setup; MAX_INLINE_OWNER_BYTES is the production 2 MiB so no test-door is needed. Red verified: AssertionError: expected 415 to be 413 (pre-fix flow reads → mime → 415). * feat(daemon): stat owner before readProjectFile in /export route to bound owner pre-buffer PR #1312 round-5 (lefarcen P2 confirmed at PR-1312#issuecomment-4424868413 follow-up): the route previously called readProjectFile() unconditionally on the owner, so a 100 MiB owner HTML was fully buffered into memory before the helper's ownerBytes check fired with InlineAssetsLimitError ('owner'). That meant the 413 envelope returned to the caller but only after peak memory had already hit the file size. Fix mirrors the sibling-asset stat-then-read contract round 4 added via the AssetHandle interface: call resolveProjectFilePath first (cheap stat), reject pre-buffer with 413 PAYLOAD_TOO_LARGE on size > MAX_INLINE_OWNER_BYTES, fold in the mime check (still 415 UNSUPPORTED_MEDIA_TYPE on mismatch, now also pre-buffer per lefarcen's 'fold-in is welcome'), then readProjectFile() only when both gates pass. Size check fires before mime check, so an oversize non-HTML file returns 413 rather than 415 — the observable Red→Green discriminator for this round. The helper's ownerBytes check (inline-assets.ts:127-133) stays as defense-in-depth for direct in-process callers that skip the route and for any drift between stat-reported size and the bytes returned by readFile. Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts ('returns 413 (not 415) for an oversize non-HTML file'). Daemon suite 1743/1743 passing. * test(daemon): add Red test pinning stat-vs-actual byte reconciliation PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413 follow-up): the helper trusts handle.size for the running-total guard and never reconciles with the actual byte length of content unless the per-asset cap is exceeded. A reader that under-reports size (stale stat, UTF-8 expansion at decode, sparse file, deliberate lie) can let many strings materialize in memory before the concat-time guard at the bottom of inlineRelativeAssets throws — defeating the round-4 pre-buffer cap intent. Fix is lefarcen-confirmed path-a: post-read, the helper computes actualBytes = Buffer.byteLength(content, 'utf8'), reconciles runningBytes (add actualBytes, refund handle.size), and if running total exceeds maxTotalBytes flips totalAborted = true and returns null. Subsequent workers see totalAborted before invoking their own read(). Helper still throws InlineAssetsLimitError('total') after Promise.all settles — preserving the round-2/3/4 graceful-fallback pattern instead of racing throws across in-flight workers. Red→Green discriminator is read count. Pre-fix the helper trusts the lying handle.size (10), so both reads complete (each returning 1000 bytes) under the reservation total of 56+10+10=76 < cap 500. The concat-time guard then catches the 2000+-byte assembly and throws 'total' — but only after both reads materialized in memory. Post-fix worker 1's reconciliation trips totalAborted as soon as actualBytes (1000) is folded into runningBytes; worker 2 skips its read. Red verified: AssertionError expected 1, received 2 (pre-fix flow completes both reads before concat-guard fires). * feat(daemon): reconcile inliner reservation with post-read actual bytes PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413 follow-up, path-a): the helper trusted handle.size for the running- total guard and only reconciled with actual bytes for the per-asset cap. A reader that under-reported size — stale stat, UTF-8 decode expansion at read time, sparse file, deliberate lie — could let many strings materialize before the concat-time guard at the bottom of inlineRelativeAssets caught the excess. That defeated the round-4 pre-buffer cap intent. Fix: after a successful read(), compute actualBytes = Buffer.byteLength(content, 'utf8'), reconcile runningBytes by folding in (actualBytes - handle.size), and re-check the total cap. If the reconciliation pushes runningBytes past maxTotalBytes, drop the asset's inlining (tag stays as URL ref), set totalAborted = true to block subsequent worker reads, and let Promise.all settle. The helper then throws InlineAssetsLimitError('total') below — matching the round-2/3/4 graceful-fallback pattern (no throw-before-settle race between in-flight workers). The per-asset cap check at line 228 is preserved for stat-lying readers that blow a single asset past maxAssetBytes; that branch refunds handle.size and drops without flipping totalAborted, so sibling assets still get a fair shot. Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts ('reconciles handle.size with actual content bytes'). Daemon suite 1744/1744 passing. --------- Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> * fix: truncate long template names on project cards (#1220) (#1302) Add min-width: 0 to .design-card-name so text-overflow: ellipsis works correctly in flex layouts. Long template names were pushing the task execution status (Running, Failed, etc.) out of view on project cards. Closes #1220 Co-authored-by: laomo <laomo@openclaw.ai> * fix(desktop): swallow setTypeOfService EINVAL crashes in dev main (#647) (#1298) * fix(desktop): swallow harmless setTypeOfService EINVAL crashes in dev main The packaged Electron entry (apps/packaged/src/logging.ts) already filters the undici "setTypeOfService EINVAL" crash that issue #895 introduced for the prod build, but the dev / source-built desktop entry was missing the parallel guard. Result: switching settings tabs in a from-source desktop run could fire a fresh fetch, undici would try to set IP_TOS on the outbound socket, the kernel would refuse on certain macOS / VPN configurations, and the rejection bubbled to Electron's default handler as the "JavaScript error in the main process" dialog reported in issue #647. Add the same defensive filter to apps/desktop: - isHarmlessSocketOptionError matches only the canonical undici shape (syscall name AND EINVAL code). A contradicting code (EACCES, EPERM, etc) explicitly fails the match so real bugs don't get hidden. - The uncaughtException handler logs harmless cases at warn and returns silently. For anything else it removes itself from the listener list and re-throws via setImmediate, restoring Node's default crash path so Electron's native dialog renders exactly as it would without this filter. - unhandledRejection mirrors the same harmless / fall-through split. The filter is installed BEFORE app.whenReady so it is armed by the time the renderer fires its first fetch. The helper is duplicated rather than imported from apps/packaged because AGENTS.md forbids cross-app private-source imports. The file header calls out the parallel and notes that the two copies should stay in sync until the helper is promoted to a shared workspace package (follow-up); the contract is identical so a regression in one will surface in the other's test suite. Tests in apps/desktop/tests/main/uncaught-exception.test.ts mirror apps/packaged/tests/logging.test.ts: 8 cases pinning the matcher shape, 2 cases pinning the handler's harmless-log-warn vs fall-through-rethrow split. Validated: pnpm guard, pnpm --filter @open-design/desktop typecheck, pnpm --filter @open-design/desktop build, and pnpm --filter @open-design/desktop test (14 passed, 10 new). * fix(desktop,packaged): fail-fast on non-harmless unhandled rejections The previous unhandledRejection listeners logged non-harmless reasons and returned, which kept the main process alive after any rejected promise. A real bug, a failed IPC registration, or any unexpected async exception was reduced to a console line instead of surfacing through Node/Electron's default crash path the filter was meant to preserve. Both copies now route non-harmless rejections through a parallel factory (createDesktopUnhandledRejectionHandler / createFatalUnhandledRejectionHandler) that mirrors the uncaughtException policy: harmless setTypeOfService EINVAL shapes log at warn and return, anything else logs at error, removes the listener, and re-throws via setImmediate. Listener removal happens before the scheduled throw, so the rethrown reason lands in the uncaughtException path with no recursion. Tests cover the harmless branch, the detach + ordered rethrow, and non-Error / primitive rejection reasons (Promise.reject(42)) which must fall through. Desktop suite: 13/13, packaged suite: 16/16. Flagged on PR #1298 by Siri-Ray and the codex P2 review thread; the two file copies stay in lockstep per the AGENTS.md sync invariant. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com> * feature: refine assistant artifact feedback (#1379) * feature: refine assistant artifact feedback * fix: clear hidden custom feedback reason * test: update assistant feedback expectations * fix: support object-style question-form options (#1293) * fix: support object-style question-form options * fix: preserve stable option values in form submissions * fix(daemon/acp): terminate ACP child after clean prompt completion (#1286) * fix(daemon/acp): terminate ACP child after clean prompt completion (Bug B / #1265) Some ACP agents (notably Devin for Terminal) keep the child process alive after stdin closes, waiting for the next prompt. Open Design spawns a fresh agent per chat turn and relies on child.on('close') to finalize the run, so without an explicit signal-driven shutdown the chat sits stuck in the 'working' state indefinitely. Three small, targeted changes: - apps/daemon/src/acp.ts: After a clean session/prompt response we schedule a 500ms grace period and then SIGTERM the child. This mirrors the pattern detectAcpModels() already uses after model discovery. The grace period leaves well-behaved agents that exit on stdin.end() unaffected. - apps/daemon/src/acp.ts: New completedSuccessfully() method on the session handle reports whether the prompt resolved without a fatal error or abort, so the consumer can distinguish 'clean signal exit' from 'genuine signal failure'. - apps/daemon/src/server.ts: child.on('close') now treats a SIGTERM exit as 'succeeded' when acpSession.completedSuccessfully() is true. - apps/web/src/providers/daemon.ts: Trust the server's authoritative endStatus; the signal/non-zero-code safety net no longer overrides an explicit 'succeeded' status, so the chat doesn't surface a fake 'agent exited with signal SIGTERM' error after a clean ACP run. Daemon tests cover the SIGTERM grace timer, clean early-exit (timer cleared), and completedSuccessfully() abort/error states. Manual UI test on plain main + this fix confirms Devin chats now return to ready automatically after Done · ... * fix(daemon/connectionTest): treat ACP clean SIGTERM as success Codex review on #1286 caught that the new SIGTERM in attachAcpSession breaks ACP connection tests for agents that don't shut down on stdin.end() (the exact Devin behavior the patch targets). attachAgentStreamHandlers() in connectionTest.ts now also respects acpSession.completedSuccessfully(), mirroring the same check we apply in server.ts. Without this, a clean prompt response followed by our SIGTERM would set winner.signal === 'SIGTERM', flip exitedCleanly to false, and the connection test would report 'agent_spawn_failed' even when the agent had returned a healthy response. Also widened the AgentSpawnHandle type so completedSuccessfully is visible on the structural type used inside connectionTest.ts. All 56 daemon tests still pass; typecheck + guard clean. * fix(daemon/acp): narrow ACP success-on-signal override to forced-SIGTERM Looper review on #1286 caught that the success predicate was broader than the SIGTERM case it was meant to handle. `completedSuccessfully()` flips to true as soon as the ACP `session/prompt` response is processed, but it does not say why the child later closed. With the broad predicate, an ACP agent that returned a prompt result and then exited with code 1 (or was killed by SIGKILL/SIGSEGV) was still marked 'succeeded', regressing the existing close-status behavior for genuine post-response process failures. Scope the override to the exact forced-shutdown shape this PR introduces: code === null && signal === 'SIGTERM' && acpCleanCompletion Applied to both `server.ts` (chat run finalization) and `connectionTest.ts` (connection-test classification). Any other post-response failure now falls through to 'failed' / 'agent_spawn_failed' as before. All 59 daemon tests still pass; typecheck + guard clean. * fix(web/daemon): only bypass exit-code safety net on explicit server success Looper review on #1286 caught that the previous web change trusted `endStatus === 'succeeded'` absolutely, but `endStatus` can become 'succeeded' in two distinct ways: 1. The SSE end event explicitly carries `status: 'succeeded'` (authoritative server declaration). 2. The end event omits or has an invalid `status` field and the handler silently falls back to 'succeeded' as a local default. Both produced `endStatus === 'succeeded'` in the existing code, so the new safety-net bypass treated them identically. That regressed backward compat: a compatible or older daemon emitting an end event like `{code:1}` or `{code:null,signal:"SIGTERM"}` with no `status` would suddenly skip the failure banner. Track explicit success separately via `serverDeclaredSuccess`, set true only when: - The SSE end event has `status === 'succeeded'`, or - The fallback `fetchChatRunStatus` REST path returns `status === 'succeeded'` (which the existing `isChatRunStatus()` guard already proves is explicit). The safety net is now bypassed only on that explicit signal; the local-fallback success path still reaches the exit-code/signal check so real failures surface as before. Adds three web-side regression tests in `apps/web/tests/providers/sse.test.ts`: - Explicit `status: 'succeeded'` + SIGTERM → onDone called, no error - End event with `{code:1}` and no `status` → onError surfaces 'agent exited with code 1' as before - End event with `{code:null,signal:'SIGTERM'}` and no `status` → onError surfaces 'agent exited with signal SIGTERM' as before `pnpm guard` + daemon typecheck clean; 27/27 SSE tests pass (up from 24). * Fix Codex wrapper launch paths (#1395) * test: add Memory and Routines coverage (#1400) * test: align extended Playwright coverage with current UI behavior * test: address extended suite review feedback * test: fix Codex fallback config hydration in e2e * test: add Memory and Routines coverage * test: fix Memory and Routines component test typing * test: include Memory and Routines e2e in extended suite * refactor(settings): use tiled language picker instead of dropdown (#1406) The Language section in Settings rendered a single-button dropdown trigger that opened a floating menu. With one visible label and lots of empty panel space, the layout misled users into thinking only one language existed. Replace the dropdown trigger + portaled menu with an inline tile grid that shows every locale at a glance and clicks directly to switch. Side effects of the new layout: the languageOpen / languageMenuRect state, the dynamic placement effect, the resize-close effect, the mousedown click-outside handler, and the languageRef are gone. The global Escape handler no longer needs to guard against the menu being open. CSS for .settings-language-picker, .settings-language-button, .settings-language-menu, and .settings-language-option is replaced by .settings-language-grid (auto-fill 180px minmax columns) + .settings-language-tile. Tests in SettingsDialog.execution.test.tsx that drove the dropdown (click trigger → click menuitemradio → assert menu closed) are rewritten to drive the tiles directly via the radio role. Refs #1347 * fix(web): restore consistent app header layout * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): hide project output chips in header --------- Co-authored-by: Prantik Medhi <140103052+prantikmedhi@users.noreply.github.com> Co-authored-by: 이용진 <90879448+Leesin0222@users.noreply.github.com> Co-authored-by: Nicholas-Xiong <2482929840@qq.com> Co-authored-by: Hesam <chngyzkhanwhsht@gmail.com> Co-authored-by: Yuhao Chen <godcorn001@outlook.com> Co-authored-by: chaoxiaoche <fanzhen910412@gmail.com> Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: eggward han <32223217+Eggwardhan@users.noreply.github.com> Co-authored-by: @aaronjmars <61592645+aaronjmars@users.noreply.github.com> Co-authored-by: Bryan <121247296+bankielewicz@users.noreply.github.com> Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> Co-authored-by: mrzhangkris <92247501+mrzhangkris@users.noreply.github.com> Co-authored-by: laomo <laomo@openclaw.ai> Co-authored-by: Nagendhra Madishetti <nagendhra.madishetti24@gmail.com> Co-authored-by: Nagendhra <nagendhra405@gmail.com> Co-authored-by: Mason <jinmeihong0201@gmail.com> Co-authored-by: Yiang Yiyan <15089131836@163.com> Co-authored-by: Rocky <101849785+MrRockySL@users.noreply.github.com> Co-authored-by: nettee <nettee.liu@gmail.com> Co-authored-by: shangxinyu1 <shangxinyu@refly.ai> Co-authored-by: Matt Van Horn <mvanhorn@users.noreply.github.com>	2026-05-12 23:15:46 +08:00
pftom	1f4259a190	feat(web): update routing and terminology for automation features - Modified the routing logic to recognize 'automations' as a valid entry point alongside 'tasks', ensuring backward compatibility. - Updated the EntryNavRail component to reflect the change from 'tasks' to 'automations' in labels and tooltips. - Renamed instances of 'tasks' to 'automations' in the TasksView component for consistency and clarity. - Enhanced HomeView and chip actions to include new input structures for better handling of automation scenarios. - Updated tests to validate the new routing and terminology changes, ensuring proper functionality across the application. This update improves the user experience by clarifying the distinction between tasks and automations, aligning the UI with the updated terminology.	2026-05-12 23:00:24 +08:00
pftom	358105c154	feat(web): enhance PluginsHomeSection with contribution card and improved plugin creation flow - Updated the `onCreatePlugin` prop to accept an optional goal parameter, allowing for more contextual plugin creation. - Introduced a contribution card that displays when no plugins match the current filters, providing users with a prompt to create new plugins. - Enhanced the `resolveContributionTarget` function to return detailed information about the contribution context. - Added new CSS styles for the contribution card to improve visual presentation. - Updated tests to cover new contribution scenarios and ensure proper functionality of the contribution card. This update significantly improves the user experience by guiding users to contribute plugins when no matches are found, fostering community engagement.	2026-05-12 22:46:43 +08:00
pftom	5f71968f61	feat(daemon, web): implement plugin sharing features for GitHub and Open Design contributions - Added new API endpoints for publishing plugins to GitHub and contributing to Open Design, enhancing the plugin sharing capabilities. - Introduced functions for handling plugin sharing actions, including `publishGeneratedPluginToGitHub` and `contributeGeneratedPluginToOpenDesign`. - Updated the `DesignFilesPanel` and `FileWorkspace` components to support new sharing functionalities, allowing users to publish or contribute plugins directly from the interface. - Enhanced the UI with new buttons for publishing and contributing plugins, improving user interaction and experience. - Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system. This update significantly improves the plugin ecosystem by enabling users to share their creations with the community and streamline collaboration.	2026-05-12 22:39:32 +08:00
pftom	72ecc09326	feat(daemon, web): enhance plugin input handling and subcategory filtering - Updated the `pickPluginFields` function to support legacy input aliases, improving compatibility with existing plugin structures. - Added tests to ensure the new input handling works correctly with legacy inputs when resolving plugin snapshots. - Enhanced CSS styles for subcategory elements in the plugin home section, improving visual clarity and user experience. - Introduced new tests for subcategory filtering within the active workflow lane, ensuring accurate plugin categorization. This update significantly improves the plugin management experience by enhancing input handling and refining the categorization system.	2026-05-12 22:09:26 +08:00
pftom	26d21a942e	feat(web): enhance plugin input handling and categorization - Added support for plugin inputs in the EntryShell and HomeView components, allowing for more dynamic plugin interactions. - Updated the PluginsHomeSection to include subcategory filtering, improving the user experience when navigating plugins. - Enhanced the PluginsView and related components to reflect the new categorization model, transitioning to a workflow-based approach. - Refactored tests to ensure coverage for new input handling and categorization features, maintaining reliability across the application. This update significantly improves the plugin management experience by providing clearer categorization and enhanced input handling for plugins.	2026-05-12 21:59:38 +08:00
Eli	49ea2499ac	[codex] Add draw annotation workflow (#1435 ) * feat(web): tweaks palette popover with HSL hue-shift recoloring Adds a Tweaks color-palette popover to the HTML preview toolbar. Selecting a palette re-skins the iframe in place via a srcDoc-side bridge that walks the DOM and shifts every chromatic paint to the target hue while preserving each color's saturation and lightness — pale tints stay pale, bold CTAs stay bold, just in the new color family. Mono-noir desaturates instead of shifting. - runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options - file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected - FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration - PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir - PreviewDrawOverlay: stub pass-through until the draw branch lands * feat(web): hide finalize-design toolbar from project header * test(e2e): skip project actions toolbar flow after toolbar removal * Add draw annotation workflow * Restore project actions toolbar	2026-05-12 21:54:59 +08:00
pftom	67a109335d	feat(web, daemon): enhance plugin categorization and introduce new export scenarios - Updated the plugin categorization system to reflect a workflow-based model, replacing the previous category bar with a curated workflow bar (From source, Generate, Export). - Added new export scenarios for Next.js, React, and Vue, providing users with starter plugins for downstream integration. - Enhanced the HomeView and PluginsHomeSection components to support the new categorization and improve user interaction with plugins. - Updated tests to cover new scenarios and ensure proper functionality across the updated plugin management system. This update significantly improves the user experience by providing clearer categorization and new tools for exporting Open Design artifacts.	2026-05-12 21:50:19 +08:00
Neha Prasad	2d405fae96	fix:align artifact preview exit button (#1445 )	2026-05-12 21:39:31 +08:00
Nagendhra Madishetti	09a8fa8d64	feat(web): Critique Theater Phase 8 (8 Theater components, barrel, role-keyed CSS) (#1314 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). - Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). - apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4) The strict guard from PR #1314 round 3 enforced enum membership and Number.isFinite, but accepted any finite number where the contract intends a specific domain: scale: 0 (ScoreTicker divides by it), negative thresholds, fractional rounds, negative mustFix, etc. ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline CSS and divides by it for tick width, so a guard-passing scale: 0 shipped Infinity into the rendered style. Negative composite / score values reached downstream code that assumes >= 0. Resolution: mirror the daemon-side Zod domain constraints in the runtime guard. Three new helpers in packages/contracts/src/critique.ts: - isPositiveInt(v): integer with v > 0. Used for round, maxRounds, scale, protocolVersion (all 1-indexed in the orchestrator). - isNonNegativeInt(v): integer with v >= 0. Used for mustFix, position, bestRound. bestRound: 0 is the valid sentinel for 'interrupted before any round closed'. - isNonNegativeFinite(v): finite number with v >= 0. Used for composite, score, dimScore, threshold. Threshold may be fractional (e.g. 8.5 on a scale of 10). Cross-field check inside run_started: threshold <= scale (the daemon Zod schema enforces this with an epsilon refine, the wire guard matches the same intent). Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts: - 22 new rejection cases across every numeric field that previously slipped through: scale: 0, negative scale, fractional scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0, fractional protocolVersion, negative threshold, threshold > scale, round: 0, fractional round, negative dimScore / score, negative / fractional mustFix, negative composite, ship round: 0, negative / fractional bestRound, negative interrupted composite, negative / fractional parser_warning position. - 3 positive boundary cases that must still pass: threshold == scale, fractional threshold within [0, scale], interrupted with bestRound: 0 (no round completed before interrupt), parser_warning with position: 0 (start of stream). Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/contracts test: 4 files / 59 tests green (was 37 before the new domain cases). - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 110 files / 1004 tests green; no regression on Theater suite, sse validator, replay parser, or assistant-feedback widget tests. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-12 21:38:58 +08:00

1 2 3 4 5 ...

262 commits