open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
kami	d36ceacf3a	feat(web): surface saved Project instructions for review and retrieval (#1933 ) Saved Project instructions had no post-save read-back surface. The header control was a bare pencil icon that opened an editor and closed on Save, so after returning to the workspace a user could not confirm what was stored, revisit it, or tell whether it was still active (#1822). Make the saved state discoverable: once instructions exist, the header affordance becomes a persistent "Project instructions" chip. Opening it shows a read-only review panel with a preview of the saved text and an "Active - included in every message" status, plus a reopen-to-edit action. Saving lands back on the review panel so the stored value is read back immediately. The empty state keeps the pencil and opens the editor directly. Persistence and project-level system-prompt injection are unchanged. Closes #1822 Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 23:07:25 +08:00
kami	647433ccef	Fix Inspect preview transport flash (#1967) * Fix inspect preview transport flash Co-authored-by: multica-agent <github@multica.ai> * Keep inactive srcdoc transport lazy Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 22:30:37 +08:00
kami	30ad8b8ac3	improve privacy consent modal: policy link, clearer CTAs, mobile layout (#1921 ) The first-run consent banner had no link to a privacy policy, an affirmative button ("Help improve") that didn't read as a consent choice, and a fixed bottom-right card that crowded content on phones. - Add a "Read the privacy policy" link (external-link icon, accent, underlined) above the actions, plus a root PRIVACY.md it points to documenting the telemetry behaviour the modal discloses. - Rename the CTAs to "Share usage data" / "Don't share" so both name the action; they stay equal-prominence per the EDPB/GDPR comment. - Stretch the banner to a bottom-edge bar under 540px with a safe-area inset so it clears mobile browser chrome. - Add PrivacyConsentModal tests; sync the new i18n key to every locale and update the consent-label assertion in App.connectors. Refs #1756	2026-05-17 20:24:15 +08:00
Ethan Guo	9cc1fb28f3	fix(web): apply Tweaks palette shift to :root CSS custom properties (#1952 ) Walk document.styleSheets so chromatic custom properties on :root/html/body/:host (including @media-nested rules) are hue-shifted via inline setProperty, not just the values returned by getComputedStyle. Fixes #1393 References: PR #1643 (open, in review) binds the Tweaks toolbar toggle to artifact `.tw-panel` panels via a new `injectTweaksBridge` in `srcdoc.ts` — adjacent surface but different problem. Our fix extends `injectPaletteBridge` to also shift `:root` CSS custom-property colors, which #1643 does not touch.	2026-05-17 20:16:57 +08:00
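The stylesheet walk described in the commit above can be sketched roughly as follows. This is not the code from `injectPaletteBridge`: the function name `collectRootCustomProperties` and the `RuleLike` / `StyleLike` interfaces are hypothetical stand-ins for the CSSOM types, chosen so the walk runs without a browser. The sketch only covers collecting custom properties from `:root` / `html` / `body` / `:host` rules, including rules nested in grouping rules such as `@media`. The hue shift itself is left out.

```typescript
// Hypothetical structural stand-ins for CSSStyleDeclaration and CSSRule.
interface StyleLike {
  length: number;
  item(index: number): string;
  getPropertyValue(name: string): string;
}
interface RuleLike {
  selectorText?: string;          // present on style rules
  style?: StyleLike;              // present on style rules
  cssRules?: ArrayLike<RuleLike>; // present on grouping rules such as @media
}

const ROOT_SELECTORS = new Set([":root", "html", "body", ":host"]);

// Walk a rule list (recursing into nested rule lists) and collect every
// custom property declared on a root-level selector.
function collectRootCustomProperties(
  rules: ArrayLike<RuleLike>,
  out: Map<string, string> = new Map(),
): Map<string, string> {
  for (let i = 0; i < rules.length; i++) {
    const rule = rules[i];
    if (!rule) continue;
    if (rule.cssRules) collectRootCustomProperties(rule.cssRules, out);
    if (!rule.selectorText || !rule.style) continue;
    const targetsRoot = rule.selectorText
      .split(",")
      .some((s) => ROOT_SELECTORS.has(s.trim().toLowerCase()));
    if (!targetsRoot) continue;
    for (let j = 0; j < rule.style.length; j++) {
      const name = rule.style.item(j);
      if (name.startsWith("--")) {
        out.set(name, rule.style.getPropertyValue(name).trim());
      }
    }
  }
  return out;
}
```

In a browser, each `CSSStyleSheet` in `document.styleSheets` exposes a `cssRules` list that could be passed to this function, and each shifted value could then be written back with `document.documentElement.style.setProperty(name, value)`.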
Ethan Guo	b7fa1d9286	fix(web): recover stuck Composio connector when auth is cancelled (#1909 ) * fix(web): recover stuck Composio connector when auth is cancelled When a user closes the Composio system-browser auth window without completing the flow, the connector card stayed in its loading state forever because nothing told the daemon the request was abandoned. The window-focus refresh now awaits the status poll and, for every connector still in connectorAuthorizationPending whose freshly fetched status is not 'connected', issues cancelConnectorAuthorizationRequest and clears the matching local pending/error UI state. The connected guard is read from the just-returned statuses so a postMessage success that lands just before focus is not racy. Fixes #1354 * fix(web): gate Composio focus auto-cancel on TTL + cancel response Require expiresAt to have elapsed before auto-cancelling a pending Composio auth on focus, and preserve the spinner on a failed cancel (matching the manual Cancel handler) so the UI never disagrees with the daemon. Refs https://github.com/nexu-io/open-design/pull/1909#discussion_r3254152045 Refs https://github.com/nexu-io/open-design/pull/1909#discussion_r3254152046	2026-05-17 20:15:38 +08:00
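The gating rules in the commit above (not connected in the freshly fetched statuses, TTL elapsed, and local state cleared only after a successful cancel) can be sketched as two pure functions. The names `connectorsToAutoCancel` and `clearPendingAfterCancel` and the `PendingAuth` shape are assumptions made for illustration, not the repository's actual code.

```typescript
// Hypothetical shape of a pending authorization entry; expiresAt is epoch ms.
interface PendingAuth {
  connectorId: string;
  expiresAt: number;
}

// Pick the pending connectors that are safe to auto-cancel on window focus:
// the freshly fetched status is not "connected" AND the TTL has elapsed.
function connectorsToAutoCancel(
  pending: PendingAuth[],
  freshStatuses: Record<string, string | undefined>,
  now: number,
): string[] {
  return pending
    .filter((p) => freshStatuses[p.connectorId] !== "connected")
    .filter((p) => p.expiresAt <= now)
    .map((p) => p.connectorId);
}

// Clear local pending state only when the daemon accepted the cancel, so the
// spinner is preserved when the cancel request fails.
function clearPendingAfterCancel(
  pendingIds: ReadonlySet<string>,
  connectorId: string,
  cancelSucceeded: boolean,
): Set<string> {
  const next = new Set(pendingIds);
  if (cancelSucceeded) next.delete(connectorId);
  return next;
}
```

Reading the connected guard from the just-returned statuses, rather than from older local state, is what keeps a success that lands just before focus from being cancelled.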
Ethan Guo	324eca27ea	feat(web): add manual removal for captured Pod components (#1951 ) * feat(web): add manual removal for captured Pod components Adds `removePodMember` helper and a per-chip × in `BoardComposerPopover`; leaves `comments.ts` untouched (avoid-zone from #1127). Closes #802 Contract: runs/2026-05-16T08-08-52_open-design_issue-feat/contract.md * style(web): hide Pod chip × until chip hover Swaps the unicode × for the existing `Icon name="close"` SVG so the hit target stays centered, and fades the button in only on chip hover / keyboard focus for a quieter resting state. * fix(web): auto-close Pod composer when last chip is removed Removing the last chip leaves a stale anchor; close so Send cannot attach to elements no longer visible. * refactor(web): extract BoardComposerPopover and Pod-member reducer Moves the popover to its own module and lifts the chip-removal reducer into a pure `applyPodMemberRemoval` so unit tests exercise the real code path and the popover's export is no longer test-only. * fix(web): rebuild Pod anchor when a member is removed Without this, the popover keeps the original union bbox / selector / label after each chip removal, so a subsequent Send to chat anchors the comment to elements no longer in the Pod. * fix(web): render every captured chip and scroll on overflow The previous slice(0, 6) cap left chips beyond the sixth invisible and undeletable. Render the full list inside a 132px-tall scrollable strip.	2026-05-17 20:13:56 +08:00
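A pure chip-removal reducer of the kind this commit describes might look like the sketch below. It is not the repository's `applyPodMemberRemoval`. The name `removePodMemberSketch`, the `PodState` shape, and the way the selector and label are rebuilt are all assumptions. It illustrates two behaviours from the message: the anchor (union bounding box, selector, label) is rebuilt after each removal, and removing the last chip signals the caller to close the popover.

```typescript
interface Box { x: number; y: number; width: number; height: number }
interface PodMember { id: string; selector: string; label: string; box: Box }
interface PodState {
  members: PodMember[];
  anchor: { selector: string; label: string; box: Box };
}

// Smallest box that contains every member box.
function unionBox(boxes: Box[]): Box {
  const left = Math.min(...boxes.map((b) => b.x));
  const top = Math.min(...boxes.map((b) => b.y));
  const right = Math.max(...boxes.map((b) => b.x + b.width));
  const bottom = Math.max(...boxes.map((b) => b.y + b.height));
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Returns the next state, or null when the last member was removed
// (the caller is expected to close the composer in that case).
function removePodMemberSketch(state: PodState, memberId: string): PodState | null {
  const members = state.members.filter((m) => m.id !== memberId);
  if (members.length === 0) return null;
  return {
    members,
    anchor: {
      selector: members.map((m) => m.selector).join(", "),
      label: members.map((m) => m.label).join(", "),
      box: unionBox(members.map((m) => m.box)),
    },
  };
}
```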
Ethan Guo	10c62bf6e4	fix(web): warn before editing a built-in skill creates a shadow (#1850 ) Clicking Edit on a built-in skill silently issued PUT /api/skills/:id, which the daemon stores as a user-owned shadow and then hides the built-in entry from the Settings list. From the user's perspective the row they just edited disappears with no explanation. Arm an inline override-warning banner on Edit for source==='built-in' skills, mirroring the existing inline delete-confirm pattern. The edit form only opens after the user explicitly confirms. User skills bypass the warning and edit directly. No daemon or listSkills contract change. Fixes #1378	2026-05-17 18:30:45 +08:00
Ethan Guo	a6288cec3f	fix(web): allow editing existing routines from RoutinesSection (#1884 ) The Routines list only exposed Run now / Pause / History / Delete, so any change to a routine required deleting and recreating it. The daemon already serves PATCH /api/routines/:id; only the web entry point was missing. Add an Edit button on each routine row that pre-populates the create form from the stored schedule and target via a new formFromRoutine helper, and branch the submit handler on editingId so it PATCHes /api/routines/:id with the updated fields instead of POSTing a new routine. Fixes #1373	2026-05-17 15:02:54 +08:00
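The submit branching described above could be sketched as below. The commit confirms `PATCH /api/routines/:id` for edits. The `POST /api/routines` create URL, the `RoutineFields` shape, and the function name `buildRoutineRequest` are assumptions made for this illustration.

```typescript
// Hypothetical form fields; the real form likely carries more detail.
interface RoutineFields {
  name: string;
  schedule: string;
  target: string;
}

interface RoutineRequest {
  method: "POST" | "PATCH";
  url: string;
  body: string;
}

// Branch on editingId: null means "create a new routine", anything else
// means "update the routine with that id in place".
function buildRoutineRequest(editingId: string | null, fields: RoutineFields): RoutineRequest {
  const body = JSON.stringify(fields);
  if (editingId === null) {
    return { method: "POST", url: "/api/routines", body }; // assumed create URL
  }
  return { method: "PATCH", url: `/api/routines/${encodeURIComponent(editingId)}`, body };
}
```

Keeping the branch in one small function means the create and edit paths share the same serialized form state, so a pre-populated form cannot drift from what is sent.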
张东明	bac56415a2	fix(web): surface daemon error messages for invalid folder imports (#1923 ) * fix(web): surface daemon error messages for invalid folder imports importFolderProject() was swallowing non-2xx responses by returning null, so the UI could only show a generic "Open folder failed: <path>" message even though the daemon already returns specific errors like "cannot import the filesystem root" or "folder not found". Parse the daemon error body and throw so the panel displays the actual reason. Also show feedback for empty path input instead of silently returning. Fixes #1186 * test(web): update folder import test to match new error propagation The existing test expected a generic "Open folder failed: <path>" message from a boolean return. Update to match the new behavior where the daemon's error message is thrown and displayed directly.	2026-05-17 15:00:49 +08:00
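One way to turn a non-2xx daemon response into a displayable message is sketched below. The commit does not show the daemon's error body, so the two JSON shapes handled here (`{ "error": "..." }` and `{ "error": { "message": "..." } }`) are assumptions, as is the function name `importErrorMessage`. The generic fallback string is the one quoted in the commit message.

```typescript
// Extract a human-readable reason from a daemon error body, falling back to
// the generic message when the body is not JSON or has no usable message.
function importErrorMessage(path: string, bodyText: string): string {
  const fallback = `Open folder failed: ${path}`;
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed !== "object" || parsed === null) return fallback;
    const err = (parsed as { error?: unknown }).error;
    if (typeof err === "string" && err.trim() !== "") return err;
    if (typeof err === "object" && err !== null) {
      const message = (err as { message?: unknown }).message;
      if (typeof message === "string" && message.trim() !== "") return message;
    }
  } catch {
    // Body was not valid JSON; use the generic message.
  }
  return fallback;
}
```

A caller would then `throw new Error(importErrorMessage(path, await response.text()))` when `response.ok` is false, instead of returning `null`.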
kami	e64f1d8497	fix(desktop): export PDF saves to a file instead of the OS print dialog (#1920 ) * fix(desktop): export PDF saves to a file instead of the OS print dialog The `od:print-pdf` IPC handler called `webContents.print()`, which opens the printer-first macOS system print dialog. That handler is the destination of the renderer's `window.__odDesktop.printPdf()` bridge, so on desktop "Export PDF" felt like a print flow rather than a file export — from both the PreviewModal share menu (which always uses this path) and the FileViewer share menu (whose daemon-route fallback lands here too). Route the handler through a direct Save-as-PDF flow instead: a native Save dialog, then `webContents.printToPDF()` straight to the chosen file — the same shape `exportPdfFromHtml` already uses for the daemon-backed export path. The flow is extracted into `savePrintReadyDocumentAsPdf` behind a structural `PrintReadyPdfTarget` surface that has no `print()` method, so the regression cannot be reintroduced. Fixes #1774 Co-authored-by: multica-agent <github@multica.ai> * fix(desktop): preserve PDF page sizing in print bridge Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 11:26:08 +08:00
Yuhao Chen	2f553941d3	fix(runtime): auto-annotate imported HTML elements for Tweaks selection (#892 ) (#1169 ) * fix(runtime): auto-annotate imported HTML elements for Tweaks selection (#892) * fix(runtime): always annotate missing od-ids and broaden selector (#892) - Remove the conditional gate so annotateMissingOdIds runs for every srcdoc, not just comment/inspect bridges. This fixes the persistence regression where saved Inspect tweaks reference synthesized data-od-id selectors that vanish when the bridge is rebuilt without annotations. - Expand the selector to cover div-based imported HTML (div[class], div[id]), headings, buttons, and links so Picker/Pods work on common anonymous wrapper markup. Addresses review feedback from lefarcen and mrcfps. * fix(runtime): narrow div selector, skip iframe/object/embed, add tests (#892) - Replace broad div[class] with direct-child combinators only under semantic containers, body, and [id] elements to avoid layout noise - Add iframe, object, embed to the skip list alongside script/style - Add tests for: direct-child divs, nested div skip, always-on behavior, skip list with id attributes, div children of [id] elements - Format selector as array join and skip tags as Set for readability * fix(runtime): let annotateManualEditSourcePaths coexist with data-od-id (#892) annotateMissingOdIds now runs unconditionally, so elements like main/h1 get data-od-id before annotateManualEditSourcePaths runs. The old skip condition (has data-od-id) incorrectly prevented source-path marking on those elements. Change the guard to skip only when the element already has data-od-source-path, allowing both attributes to coexist.	2026-05-16 16:17:36 +08:00
Yuhao Chen	26502ea124	test(web): decouple memory preview icon assertion (#1863)	2026-05-16 11:37:34 +08:00
mehmet turac	6b15236843	fix(web): distinguish expanded memory preview action (#1813)	2026-05-15 23:08:30 +08:00
Yuhao Chen	19d65678a3	fix(web): keep filter pill hover labels readable (#1828)	2026-05-15 23:07:43 +08:00
mehmet turac	9689dce1ad	fix(web): align comment marker numbering (#1826)	2026-05-15 23:07:32 +08:00
mehmet turac	444f7d03eb	fix(web): reveal memory editor after edit click (#1827)	2026-05-15 23:07:18 +08:00
mehmet turac	7d9488300e	fix(web): keep draw overlay scrollable (#1848)	2026-05-15 23:05:49 +08:00
mehmet turac	c4a8e92916	fix(web): add breathing room to plugin publish footer (#1849)	2026-05-15 23:02:36 +08:00
lefarcen	22a3b99a47	Merge origin/main into preview/v0.8.0 Sync 49 commits from main. Conflicts resolved: - .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's linux specs + release-stable.yml + release-preview.yml triggers - .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder - apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText - apps/web/src/components/ChatPane.tsx: kept both new imports - apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks - e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's inline dialog navigation (UI was redesigned in v0.8.0) - nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh	2026-05-15 18:23:33 +08:00
mehmet turac	15ebe8b266	fix(web): keep picker hint clear of comments panel (#1820)	2026-05-15 17:51:12 +08:00
mehmet turac	fd80d7c997	fix(web): clear draw ink when exiting mode (#1821)	2026-05-15 17:48:28 +08:00
Nicholas-Xiong	9cfb01e6b0	feat: add Italian (it) locale support (#1323 ) * feat: add Italian (it) locale support - Add complete Italian translation dictionary (apps/web/src/i18n/locales/it.ts) - Register 'it' locale in types.ts (Locale type, LOCALES array, LOCALE_LABEL) - Import and register Italian dictionary in index.tsx - All 1352 translation keys translated to Italian - Follows the same structure as French locale (PR #326) Closes #1245 * test: Update locale tests to include Italian (it) 1. Add 'it' to EXPECTED_LOCALES array 2. Add LOCALE_LABEL.it assertion to verify 'Italiano' label This fixes the CI test failure and completes the Italian locale integration. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 16:38:55 +08:00
leessju	4e19c3f4f3	Prevent imported Claude canvases from zooming on scroll (#1726) * Preserve HTML preview state across mode toggles HTML previews could rebuild their iframe when switching into Edit or Comment, which reset scroll/canvas state and caused visible churn for multi-file artifacts. The viewer now keeps URL-loaded previews mounted when the artifact owns the mode bridge, relays file-refreshes through frame navigation, and restores preview scroll/viewport state across bridge mode changes. Constraint: Generic srcdoc-only bridges are still required for unbridged artifacts, inspect mode, palette tweaks, decks, draw overlays, and forceInline. Rejected: Keep all Edit/Comment previews on srcdoc | causes unnecessary iframe replacement for bridge-capable URL-loaded artifacts. Confidence: high Scope-risk: moderate Directive: Do not enable URL-load for bridge-dependent modes unless the artifact has an owned postMessage bridge. Tested: pnpm guard Tested: pnpm --filter @open-design/web typecheck Tested: pnpm --filter @open-design/web test Tested: Playwright verified Edit and Comment toggles preserve iframe src and DOM node while receiving comment targets. * Prevent preview wheel gestures from escaping into zoom Trackpad pinch-like wheel events arrive with ctrl/meta modifiers on some platforms, which can make a normal vertical scroll feel like the preview zoomed. The preview now consumes those modified wheel events inside the host preview shell and in injected srcdoc previews, then maps the delta back to scroll where a scroll target exists. Constraint: URL-loaded sandbox iframes cannot always be inspected by the host, so srcdoc previews need their own in-frame guard. Rejected: Add allow-same-origin to preview iframes | weakens the sandbox boundary for generated artifacts. Confidence: medium Scope-risk: narrow Directive: Do not broaden iframe sandbox permissions to fix gesture handling without a security review.
Tested: pnpm guard Tested: pnpm --filter @open-design/web typecheck Tested: pnpm --filter @open-design/web exec vitest run tests/components/FileViewer.test.tsx tests/runtime/srcdoc.test.ts Tested: playwright-cli verified ctrl-wheel in preview keeps app zoom at 100% and prevents default in the iframe context * Revert "Prevent preview wheel gestures from escaping into zoom" This reverts commit `976407ab4c`. * Prevent imported Claude canvases from zooming on scroll Claude Design exports can classify ordinary macOS two-finger vertical wheel events as mouse-wheel zoom clicks inside design-canvas.jsx. Normalize that imported canvas code so plain wheel input pans, while Cmd+wheel remains the explicit zoom gesture. Constraint: The offending canvas code lives inside imported user artifacts rather than a tracked runtime component, so the fix belongs in the Claude Design zip import normalization path. Rejected: Host-side wheel interception | wheel events inside the sandboxed iframe are handled by the artifact before the host can reliably classify them. Rejected: Disable all wheel zoom | users still need Cmd+wheel as an explicit zoom control. Confidence: high Scope-risk: narrow Directive: Keep plain wheel as pan-only for imported design-canvas.jsx unless a future bridge provides an explicit wheel-mode handshake. Tested: pnpm --filter @open-design/daemon exec vitest run tests/claude-design-import.test.ts Tested: pnpm --filter @open-design/daemon typecheck Tested: pnpm guard --------- Co-authored-by: nicejames <nicejames@gmail.com> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 16:37:57 +08:00
Nicholas-Xiong	d16acf6462	fix: Add error feedback for manual folder path import (#1666 ) * fix: Add error feedback for manual folder path import Fixes #1408 When users manually enter a folder path and click 'Open folder' (non-Electron environment), the app now provides clear error feedback if the import fails. Before: - No error clearing before import - No error handling for failed imports - Silent failures left users confused After: - Clears previous errors before attempting import - Catches and displays import errors with clear messages - Success feedback is implicit (navigation to the opened project) Why implicit success feedback: The parent handler (Home.tsx) navigates to the newly opened project on success, which provides clear visual feedback by changing the entire view. An additional toast would be redundant. Error handling: - Catches all errors from onImportFolder - Displays user-friendly error messages - Preserves error details when available * fix: surface failed folder imports --------- Co-authored-by: Siri-Ray <2667192167@qq.com>	2026-05-15 16:36:24 +08:00
Zihan Zhao	cfcfbe0178	Inline attached file context for BYOK chats (#1730 ) BYOK/API-mode chats bypass the daemon run path, so attached project files were saved as message metadata but their readable contents were not sent to the provider. This adds a web-side attachment context step for API-mode requests, reusing raw text reads and existing document preview extraction. Constraint: Docker PDF previews require pdftotext in the runtime image Confidence: high Scope-risk: moderate Tested: corepack pnpm --filter @open-design/web test -- tests/api-attachment-context.test.ts tests/components/ProjectView.api-empty-response.test.tsx Tested: corepack pnpm --filter @open-design/web typecheck Tested: corepack pnpm --filter @open-design/web build Tested: corepack pnpm guard Tested: corepack pnpm typecheck	2026-05-15 15:52:15 +08:00
Yuhao Chen	b2d2635360	fix(web): hide resolved comments from preview overlays (#1762)	2026-05-15 15:46:03 +08:00
Quang Do	88db51521d	feat(web): add custom select primitive (#1714) * feat(web): add custom select primitive * fix(web): harden custom select active option state	2026-05-15 14:43:18 +08:00
Tom Huang	c5d77a03bd	Garnet hemisphere (#1769) * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content.
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. 
This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec * Reconcile run messages on cancel * Use active design system as visual direction * Fix active design system prompt wording * feat(workspace-tabs): implement workspace tabs functionality and file attachment handling - Introduced a new `WorkspaceTabsBar` component to manage workspace tabs, allowing users to navigate between different views (projects, marketplace, etc.). - Enhanced file handling capabilities in the `HomeHero` and `EntryShell` components, enabling users to stage and attach files before project creation. - Updated the `App` component to support auto-sending attachments alongside the first message in a project. - Improved CSS styles for workspace tabs and attachment UI, ensuring a cohesive design and user experience. This update significantly enhances the workspace navigation and file management features, providing users with a more intuitive and efficient workflow. * refactor(workspace-tabs): streamline workspace tabs and UI components - Removed unused components and actions from the `WorkspaceTabsBar` and `AppChromeHeader`, simplifying the codebase. - Updated CSS styles for the workspace shell and tabs, enhancing visual consistency and reducing element sizes for a cleaner layout. - Introduced a new client type detection mechanism to dynamically adjust the workspace shell's class, improving responsiveness. - Added tests for the `WorkspaceTabsBar` to ensure proper navigation and tab management functionality. These changes improve the overall performance and user experience of the workspace navigation system. * Update critical e2e for entry modal flow * Stabilize entry critical e2e flows * fix(ui): adjust workspace tabs and header styles for improved layout - Updated the CSS for workspace tabs and the app header, reducing element sizes and padding for a cleaner appearance. 
- Introduced a new button in the `WorkspaceTabsBar` for quick access to the home tab, enhancing navigation. - Minor adjustments to the layout and styles to ensure consistency across components. These changes enhance the user interface and improve the overall user experience in the workspace navigation system. * feat(workspace-tabs): implement pinned home tab functionality - Added a new pinned home tab feature to the `WorkspaceTabsBar`, allowing the home tab to remain accessible during navigation. - Updated tab management logic to collapse duplicate home tabs into a single pinned instance when restoring from local storage. - Enhanced CSS styles for workspace tabs to accommodate the new pinned tab design. - Updated tests to verify the behavior of the pinned home tab and its interaction with other tabs. These changes improve navigation consistency and user experience within the workspace. * refactor(workspace-tabs): enhance tab management and styling - Updated CSS styles for workspace tabs, adjusting padding and flex properties for improved layout and consistency. - Refactored tab creation logic to ensure unique IDs for project and marketplace tabs, enhancing navigation clarity. - Removed deprecated functions related to pinned home tabs, streamlining the codebase. - Improved test cases to verify independent behavior of home tabs during navigation. These changes enhance the user experience by providing a more intuitive tab management system and a cleaner UI. * style(workspace-tabs): update CSS for improved layout and visibility - Adjusted CSS properties for workspace tabs, including overflow, position, and z-index to enhance layout and stacking context. - Ensured consistent styling across tab components for better visual hierarchy. These changes contribute to a more polished and user-friendly interface within the workspace. 
* style(entry-layout): update CSS variables for improved layout consistency - Replaced fixed width values with CSS variables for the entry rail to enhance flexibility. - Adjusted padding and height properties for better visual alignment and spacing. - Introduced a new background style for the entry main topbar to improve aesthetics. These changes contribute to a more responsive and visually appealing layout in the entry view. --------- Co-authored-by: qiongyu1999 <2694684348@qq.com> Co-authored-by: Eli <129168833+qiongyu1999@users.noreply.github.com>	2026-05-15 14:42:11 +08:00
ngoduybien	843b6fec4f	fix(web): fall back to srcDoc when HTML preview needs sandbox shim (#1306 ) * fix(web): fall back to srcDoc preview when HTML needs the sandbox shim The URL-load HTML preview iframe is sandboxed with `allow-scripts` only — no `allow-same-origin` — so any artifact that reads `localStorage`/`sessionStorage` at startup throws SecurityError, its React tree unmounts, and the preview goes blank. The srcDoc path already polyfills both via `injectSandboxShim` (apps/web/src/runtime/srcdoc.ts) before any user script runs, but URL-load served raw HTML untouched. Agent-emitted React prototypes that read Web Storage at mount went blank until the user toggled Tweaks (which forces the srcDoc path). Detect the two reliable signals — `<script type="text/babel">` (Babel-standalone XHR-fetches and evals sibling `.jsx` files at runtime; those routinely read Web Storage from `useState` initializers) and direct `localStorage` / `sessionStorage` references in the source — and set `forceInline` automatically so those artifacts route back through the srcDoc path. Plain static HTML keeps the URL-load benefits (real source maps, per-asset HTTP caching, isolated per-script failures). No new daemon endpoint, no new contract, no sandbox loosening. Pure content sniff in the existing render-mode helper; reuses the same `forceInline` seam the `parseForceInline` opt-out already uses. Tests cover the new helper across positive (Babel-standalone variants with attribute reordering, quoting, whitespace, case) and negative (plain script tags, module type, JSON type, substring lookalikes) cases. * fix(web): address review feedback on sandbox-shim fallback - Accept unquoted `<script type=text/babel>` per HTML5 attribute syntax (regex `["']` → `["']?`). Adds a focused test covering bare unquoted, unquoted with unquoted `src=`, mixed unquoted/quoted, and the negative case `<script type=text/babelish>` to confirm the trailing word-boundary still rejects look-alikes. 
- Memoize `htmlNeedsSandboxShim(source)` on `source`. HtmlViewer re-renders on board/inspect/edit/slide state changes; the scan only changes when the source itself does. Cheap micro-opt, free correctness win. - Narrow the helper docstring's scope claim and add an explicit known limitation: external scripts (`<script src="./app.js">`, `<script type="module" src="./main.js">`) that read Web Storage during module eval are not covered — the helper only sees the document, not the linked subresource. Workaround documented: `?forceInline=1` or Tweaks. Catching this case would require fetching every script reference before deciding load strategy, duplicating browser work; not worth the cost until a real report surfaces. * fix(web): correct inline comment on `\b` boundary behavior The comment claimed `\b` rejects `text/babel-other`, but `\b` matches between `l` and `-` (hyphen is a non-word char), so the regex actually does match that input. The test asserts `text/babelish` as the negative case, which `\b` does correctly reject (`i` is a word char). Comment now matches the regex's actual behavior, with a note that hyphenated variants are a harmless false positive (srcDoc fallback is the safe direction) and a pointer to the `(?=[\s>"'])` lookahead tightening if a real case ever surfaces. No behavior change; existing tests still pass. * fix(web): align test comment with helper docstring on hyphenated variants Same class of inconsistency the previous commit fixed in the helper: the test comment claimed `type=text/babel-other` "remains a non-match", but the assertion actually covers `type=text/babelish`, and the helper docstring explicitly documents hyphenated variants as a safe false-positive that does match. Comment now describes both shapes correctly and explains why the hyphenated variant isn't asserted (it's the documented safe direction, not a regression). No behavior change; test count unchanged. * chore: trigger CI	2026-05-15 14:41:23 +08:00
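The content sniff described in this commit can be illustrated with the sketch below. It is not the repository's `htmlNeedsSandboxShim`: the name `needsSandboxShimSketch` and both regexes are reconstructed from the commit text and may differ from the real helper. It follows the stated rules: quoted or unquoted `type=text/babel` matches case-insensitively, `text/babelish` is rejected by the trailing `\b`, and a direct `localStorage` / `sessionStorage` reference also triggers the fallback.

```typescript
// <script ... type="text/babel"> with optional quotes, any attribute order.
// The trailing \b rejects look-alikes such as text/babelish; hyphenated
// variants (text/babel-other) still match, which is the safe direction.
const BABEL_SCRIPT_RE = /<script\b[^>]*\btype\s*=\s*["']?text\/babel\b/i;

// Direct Web Storage references anywhere in the document source.
const WEB_STORAGE_RE = /\b(?:localStorage|sessionStorage)\b/;

// True when the HTML should be routed through the srcDoc path, where the
// storage polyfill is injected before any user script runs.
function needsSandboxShimSketch(source: string): boolean {
  return BABEL_SCRIPT_RE.test(source) || WEB_STORAGE_RE.test(source);
}
```

As the commit notes, a sniff like this only sees the document itself. External scripts that read Web Storage during evaluation are not detected and need the `?forceInline=1` workaround.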
chaoxiaoche	bcc58af931	refactor(web): rename Execution mode and tighten settings dialog UI (#1568 ) * refactor(web): rename Execution mode and tighten settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): polish Settings dialog — media providers, skills, MCP Media providers - Hide internal Stub fixture provider (settingsVisible: false) - Split provider list into Available (integrated, editable) and Coming Soon (collapsed <details> drawer with name/hint/Docs link only) - Drop right-side Integrated/Configured badges from every row; all rows in the main list are integrated by definition; inline grey "Saved" chip next to the provider name is the only status indicator now - "Saved" badge moves inline to the right of the provider name and uses a neutral grey treatment (was a standalone green pill below the name) - "Reload from daemon" button shows a 2s green "✓ Reloaded" flash on success instead of leaving a permanent paragraph under the header; errors remain sticky Skills - Replace three pill-row filter banks (Source, Type, Category) with a compact single-row toolbar: search + three inline <select> dropdowns side by side; active filter highlighted with a stronger border MCP server - Shorten section hint to one line - Move WHAT YOUR AGENT CAN DO capabilities above the 
client dropdown (motivate before asking to act) - Move "Build the daemon first" warning below the code block where it contextually explains why the command might fail, not as a top-level error before the user has done anything - Downgrade "Restart your client" left-border from accent orange to border-strong grey — it is a next step, not a warning External MCP - Shorten section hint to one line Misc CSS - Add .sr-only utility for accessible off-screen live regions - Add button.ghost.is-success-flash for transient success feedback - Add .library-filter-selects / .library-filter-select for dropdown filter rows - Add .media-provider-coming-soon-* for the roadmap drawer Co-authored-by: Cursor <cursoragent@cursor.com> * [codex] Add Cursor Agent auth diagnostics (#1538) * Add Cursor Agent auth diagnostics * Handle Cursor not logged in auth status * Address Cursor auth review feedback * Classify Cursor stdout auth failures * test: expand Memory and Routines coverage (#1521) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows * refactor(web): rename Execution mode and tighten 
settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): settings dialog UX polish — layout, dedup, and interactions - Remove duplicate section headers from all settings sections (Notifications, Appearance, Privacy, About, Design Systems, Skills, MCP server, Connectors, Media providers, Routines) - Restructure Notifications cards: title + toggle on same row, hint below - Restructure Skills toolbar: search + New skill button in row 1, filter dropdowns in row 2 with left-aligned labels - Restructure Pet section: tabs and Wake button on same row - MCP server: group capabilities and setup into separate cards, remove nested double border on client picker - Connectors: show connect errors as toast instead of inline card text, position toast inside panel, hide single-provider tab - Media providers: move Reload button to left-aligned small ghost button - Memory: info icon shows path on hover, Path copied badge inline; Extraction history and MEMORY.md as standalone collapsible cards; group header hidden when only one type visible - Pet grid cards: Adopt button hidden until hover, icon-only when adopted, description truncated to 2 lines, text fills full width via abs positioning - Agent cards: selected state uses accent border only, no background change - Add sun/moon icons 
to Appearance theme buttons (Light/Dark) - Shorten several hint strings for clarity Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): resolve i18n review comments from PR #1568 - Update settings.title and settings.envConfigure to localized "Execution mode" in all 17 non-English locale files - Add settings.memoryFlashPathCopied to all locales and use t() in MemorySection instead of hardcoded English "Path copied" - Add settings.agentModelHead to all locales and use t() in SettingsDialog for "Model for:" agent model row header Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): update tests to match settings dialog redesign - Add role prop to Toast (alert/status) so error toasts from ConnectorsBrowser are announced immediately by screen readers - Clear connectErrorToast on successful connector retry - Update SettingsDialog.execution tests: - Remove heading assertions for About and MCP server (headers were intentionally removed as duplicate nav labels) - Rewrite CLI env test to use codex-only fields (per-agent filtering means only selected agent's fields are shown) - Update Composio key hint text assertion to match shortened copy - Replace filter button click with select change for Type filter - Replace Configured/Unsupported/Integrated badge checks with updated assertions matching the new media provider UI - Replace disabled BFL row test with coming-soon section check - Update SettingsDialog.media test: remove Fal.ai input assertions (non-integrated providers no longer have editable fields) Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): unblock CI for #1568 Three small fixes to get Playwright back to green on the settings dialog redesign: 1. `en.ts`: revert `settings.envConfigure` to "Configure execution mode". 
This PR collapsed both `settings.title` (header gear) and `settings.envConfigure` (entry-side foot pill) to the same string "Execution mode", so `getByRole('button', { name: 'Execution mode' })` resolved to two elements and tripped Playwright strict mode in the three Composio-flow tests (entry-configuration-flows.test.ts:174, 228, 285). Restoring the distinct label also gives screen readers a clearer hint for the pill, which doubles as a status display. Non-English locales still alias the two keys; happy to follow up on those, but they don't gate the (English-only) Playwright suite. 2. entry-configuration-flows.test.ts:167 — `Connectors` heading is now rendered at `<h2>` in the modal-head (SettingsDialog.tsx:1545), with the inner `<h3>` removed by design (see comment around line 1448). Updated the assertion from `level: 3` to `level: 2`. 3. project-management-flows.test.ts:360 — same change for the `Pets` heading. Verified locally with `pnpm --filter @open-design/web typecheck` and `pnpm --filter @open-design/e2e typecheck`. The actual Playwright specs need the dev server up; I didn't rerun them here, but the locator changes are mechanical and match the new DOM. * fix(web): use exact match for Execution mode button locator Playwright's `getByRole({ name })` defaults to substring matching, so `{ name: 'Execution mode' }` still resolved to both the header gear (aria-label "Execution mode") and the entry-side foot pill (aria-label "Configure execution mode" — substring contains "Execution mode"). Strict mode tripped in the three composio-flow tests at lines 202, 257, and 319. Adding `exact: true` makes each call resolve to just the header gear, which opens the same dialog the foot pill does — the test outcomes are unchanged. 
--------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: shangxinyu1 <shangxinyu@refly.ai> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 14:35:06 +08:00
Quang Do	a41d4f6126	fix(web): keep chat pinned during content growth (#1716)	2026-05-15 14:12:00 +08:00
Prantik Medhi	01e54700a2	fix(web): make file grouping by kind work (#1551) * fix(web): group design files by kind * fix(web): unblock CI for #1551 - FileViewer test (line 434): add missing `projectKind="prototype"` to match every other instance; this was the source of the typecheck failure blocking workspace validation. - DesignFilesPanel "groups files by kind" test: assert against `.df-section-label` elements so the section header check is not ambiguous with the per-row kind cell text. - DesignFilesPanel batch-delete test: derive the expected file names from the rendered row testids and use `arrayContaining` so the assertion no longer depends on the (now kind-default) row order. * fix(web): satisfy strict-index typecheck in batch-delete test `onDeleteFiles.mock.calls[0][0]` tripped `noUncheckedIndexedAccess` ("Object is possibly 'undefined'"). Drop the separate length probe and assert the exact array instead — `selected` is a `Set`, `handleBatchDelete` spreads it with `[...selected]`, and the test clicks rows[0]/rows[1] in that order, so insertion order is deterministic and equals `[firstName, secondName]`. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 13:07:27 +08:00
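
The ordering argument in the last fix above rests on `Set` iterating in insertion order. A minimal sketch, with made-up file names and `Array.from` standing in for the `[...selected]` spread (the two yield the same order):

```typescript
// Illustrative only: the file names are invented for this example.
const selected = new Set<string>();
selected.add("second.html"); // clicked first
selected.add("first.css");   // clicked second
selected.add("second.html"); // re-adding does not move or duplicate the entry

// Equivalent in order to `[...selected]`.
const deletedNames: string[] = Array.from(selected);
```

Because the order follows the clicks rather than any sort, a test can assert the exact array instead of probing its length first.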
Prantik Medhi	8aeedf368b	fix(web): localize accent controls in settings (#1565) * fix(web): localize accent controls * fix(web): localize accent default label * fix(web): unblock CI for #1565 Add missing `projectKind="prototype"` to the FileViewer deck-render test (line 434) so workspace typecheck stops failing on the `Property 'projectKind' is missing` error. This mirrors every other FileViewer render in the same file and is unrelated to the accent localization changes in this PR — it's drift from a recent change on main that made `projectKind` required. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 12:09:28 +08:00
Yuhao Chen	b0963fd874	fix(web): allow downloads from preview iframes (#1732)	2026-05-15 11:55:29 +08:00
Tom Huang	76defffb93	Garnet hemisphere (#1702 ) * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. 
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. 
This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec	2026-05-14 21:12:50 +08:00
sakshyasinha	c4a67a7b3e	Fix Kimi CLI icon contrast in light mode (#1667 ) * fix(web): improve Kimi CLI icon contrast * fix(web): render Kimi icon via theme-aware CSS mask Move Kimi to the MONO_ICONS set so it renders through CSS mask with currentColor adaptation, making it legible in both light and dark themes instead of baking a single dark fill that fails on dark backgrounds. * fix(web): adjust Kimi icon secondary mark for dual-theme contrast Keep Kimi as a baked two-tone asset: blue accent (#1783ff) for brand identity, mid-tone gray (#666666) secondary mark for acceptable contrast on both light and dark card surfaces. Revert from mask path to preserve the blue branding. * fix(web): correct corrupted Kimi SVG and strengthen asset validation test Remove extraneous PR discussion text that was accidentally included in the SVG file. Strengthen the test to validate the bundled asset is valid SVG with the expected fills (blue accent + gray secondary mark), catching asset corruption that would otherwise go undetected.	2026-05-14 20:32:52 +08:00
lefarcen	640a332276	Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge	2026-05-14 17:44:44 +08:00
lefarcen	3b7f87c7ae	Merge remote-tracking branch 'origin/main' into reconcile/garnet-main-merge	2026-05-14 17:44:26 +08:00
pftom	8e3af79dea	feat(github-installer): enhance GitHub content installation and error handling - Introduced new interfaces for GitHub content entries and budgets to streamline content fetching. - Enhanced the `installFromGithub` function to support installation from GitHub contents, including subpath handling. - Implemented robust error handling and retry logic for fetching GitHub content, improving installation resilience. - Updated tests to validate the new content fetching logic and ensure correct behavior across various scenarios. This update significantly improves the GitHub installation process, making it more flexible and user-friendly.	2026-05-14 17:31:29 +08:00
lefarcen	b268bbe169	Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix Brings in 11 new garnet commits, most importantly: - `1a90aef4` feat(plugin-use): implement plugin use handoff functionality — fixes the bug QA reported where /plugins Use Plugin would 422 silently for template plugins; new flow hands off to HomeView with the plugin pre-bound + input form prompted there. - `2ac58544` feat(plugin-inputs): enhance plugin input handling with file upload support — extends PluginInputsForm for file uploads. - `3b167b69` feat(plugins): registry protocol — new @open-design/registry-protocol workspace package (needs build before daemon boot). - Plus enhancements to plugin metadata, GitHub installer, plugin detail view, login/whoami, static HTML preview paths. Conflicts resolved: - packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief field + garnet's contextPlugins (@-mention plugin context refs) both kept on ProjectMetadata. - apps/landing-page/* (3 files): accepted HEAD — garnet had the older single-page landing-page header; main has the multi-page layout (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not related to the Use Plugin core fix. New @open-design/registry-protocol package must be built before daemon boots; pnpm install does this via postinstall already.	2026-05-14 16:32:35 +08:00
pftom	6614b9bf09	refactor(plugin-authoring): streamline prompt and input handling - Removed the redundant `buildPluginAuthoringPrompt` function call in `startPluginAuthoring` for cleaner code. - Introduced new functions to build prompts and inputs based on user goals, enhancing the authoring experience. - Updated `HomeView` to manage authoring inputs and prompts more effectively, ensuring better state handling. - Adjusted the `PluginImportModal` to reflect changes in the import process, removing references to template creation. - Enhanced tests to cover new input handling and prompt generation logic, ensuring reliability in the authoring flow. This update improves the clarity and efficiency of the plugin authoring process, making it more intuitive for users.	2026-05-14 16:18:34 +08:00
pftom	fbcf50382a	feat(github-installer): enhance GitHub source parsing and error handling - Updated the GitHub source regex to support more flexible parsing of repository references, including subpaths. - Introduced a new `parseGithubSource` function to handle the extraction of owner, repo, and potential subpaths, improving the robustness of the installation process. - Enhanced the `installFromGithub` function to retry fetching from multiple candidates and provide detailed error messages when installation fails. - Added tests to validate the new parsing logic and ensure correct behavior when handling various GitHub source formats. This update significantly improves the handling of GitHub sources, making the installation process more resilient and user-friendly.	2026-05-14 16:09:37 +08:00
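
To make the owner / repo / subpath extraction above concrete, here is a rough sketch. The commit does not spell out which source formats the real `parseGithubSource` accepts, so the optional `github:` prefix, the regex, and the `GithubSourceSketch` shape are all assumptions for illustration.

```typescript
// Hypothetical sketch; accepted formats and field names are assumed, not taken from the repo.
interface GithubSourceSketch {
  owner: string;
  repo: string;
  subpath: string | null;
}

// Assumed grammar: [github:]owner/repo[/sub/path]
const GITHUB_SOURCE = /^(?:github:)?([A-Za-z0-9][A-Za-z0-9-]*)\/([A-Za-z0-9._-]+)(?:\/(.+))?$/;

function parseGithubSourceSketch(input: string): GithubSourceSketch | null {
  const match = GITHUB_SOURCE.exec(input.trim());
  if (!match) return null;
  const [, owner, repo, rest] = match;
  if (owner === undefined || repo === undefined) return null;
  // Drop trailing slashes so "plugins/demo/" and "plugins/demo" compare equal.
  const subpath = rest ? rest.replace(/\/+$/, "") : "";
  return { owner, repo, subpath: subpath === "" ? null : subpath };
}
```

Returning `null` rather than throwing leaves it to the caller to decide how to report an unparseable source, for example by listing the candidates it tried.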
Nagendhra Madishetti	40766ef1ba	test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
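
The defensive behaviours listed for the Phase 7.1 reducer can be sketched in a reduced form. This is not the repository's `CritiqueState` reducer: the event union is cut down to four variants, and `StateSketch`, `EventSketch`, and `reduceSketch` are hypothetical names.

```typescript
// Hypothetical, heavily simplified sketch of the guards described above.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

type EventSketch =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "failed"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

interface StateSketch {
  phase: Phase;
  runId: string | null;
  warnings: string[];
}

const TERMINAL: ReadonlyArray<Phase> = ["shipped", "degraded", "interrupted", "failed"];
const initialState: StateSketch = { phase: "idle", runId: null, warnings: [] };

function reduceSketch(state: StateSketch, event: EventSketch): StateSketch {
  // A run_started for a different runId discards the prior state and reboots.
  if (event.type === "run_started" && event.runId !== state.runId) {
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run: return the same reference so subscribers do not re-render.
  if (event.runId !== state.runId) return state;
  // Late parser warnings are recorded in a side channel without changing phase.
  if (event.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events for the same run.
  if (TERMINAL.indexOf(state.phase) !== -1) return state;
  switch (event.type) {
    case "ship":
      return { ...state, phase: "shipped" };
    case "failed":
      return { ...state, phase: "failed" };
    default:
      return state; // duplicate run_started for the active run
  }
}
```

Returning the identical `state` object for ignored events is what gives the stable-identity guarantee the commit mentions for `useReducer`.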
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
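
The reconnect schedule in Phase 7.2 above (1s up to 30s, reset on `ready`) could look roughly like this. The commit only states the endpoints of the schedule, so the doubling factor is an assumption, as is the `BackoffSketch` class itself.

```typescript
// Hypothetical sketch; the doubling factor is assumed, only the 1s and 30s bounds come from the commit.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

class BackoffSketch {
  private attempt = 0;

  // Delay to wait before the next reconnect attempt, capped at MAX_DELAY_MS.
  nextDelayMs(): number {
    const delay = Math.min(BASE_DELAY_MS * Math.pow(2, this.attempt), MAX_DELAY_MS);
    this.attempt += 1;
    return delay;
  }

  // Called when the stream reports `ready`, so the next failure starts again at 1s.
  reset(): void {
    this.attempt = 0;
  }
}
```

In a connection manager, `nextDelayMs()` would feed the injected `setTimeout` seam and `reset()` would run in the `ready` listener.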
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
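
The "malformed NDJSON lines are skipped" behaviour from Phase 7.3 can be sketched as a small parser. `PanelEventSketch` and `isPanelEventSketch` are stand-ins for the contracts-level `PanelEvent` and `isPanelEvent`, whose real shapes are richer than what is assumed here.

```typescript
// Hypothetical sketch; the real predicate lives in the contracts package and checks more fields.
interface PanelEventSketch {
  type: string;
  runId: string;
}

function isPanelEventSketch(value: unknown): value is PanelEventSketch {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const type = record["type"];
  const runId = record["runId"];
  return typeof type === "string" && typeof runId === "string" && runId.length > 0;
}

// One JSON document per line; blank, malformed, or wrong-shape lines are dropped.
function parseTranscriptSketch(ndjson: string): PanelEventSketch[] {
  const events: PanelEventSketch[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isPanelEventSketch(parsed)) events.push(parsed);
    } catch {
      // Malformed line: skip it so valid events around it still land.
    }
  }
  return events;
}
```

With `speed=instant`, a replay hook could dispatch every element of the returned array into the reducer in one synchronous loop.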
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
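
The SSE payload-override fix in the commit above comes down to spread order plus revalidation. A minimal sketch, assuming a two-entry allow-list and a hypothetical `sseToEventSketch` in place of the real `sseToPanelEvent`:

```typescript
// Hypothetical sketch; the real code validates against the full PanelEvent contract.
const KNOWN_TYPES: ReadonlyArray<string> = ["run_started", "ship"];
const CHANNEL_PREFIX = "critique.";

type DecodedEventSketch = Record<string, unknown> & { type: string; runId: string };

function sseToEventSketch(channel: string, data: Record<string, unknown>): DecodedEventSketch | null {
  if (channel.indexOf(CHANNEL_PREFIX) !== 0) return null;
  const type = channel.slice(CHANNEL_PREFIX.length);
  if (KNOWN_TYPES.indexOf(type) === -1) return null;
  // Payload is spread first, so the channel-derived `type` written after it always wins.
  const candidate: Record<string, unknown> = { ...data, type };
  const runId = candidate["runId"];
  // Revalidate before handing anything to the reducer.
  if (typeof runId !== "string" || runId.length === 0) return null;
  return { ...candidate, type, runId };
}
```

Writing `{ type, ...data }` instead would reintroduce the bug, because a payload-supplied `type` would overwrite the channel-derived one.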
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. 
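A minimal sketch of the Phase 10.1 TTL registry described above, for readers who want the shape in code. Only the three reason strings, the `expiresAt` field, the 24h default, and the idea of a test-only clock seam come from the commit message; the class name, method names, and constructor signature below are hypothetical and are not the daemon's actual API.

```typescript
// Reasons named in the commit message; everything else here is illustrative.
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default window

class AdapterDegradedRegistry {
  private readonly records = new Map<string, DegradedRecord>();
  private readonly now: () => number;
  private readonly ttlMs: number;

  // `now` is the clock seam: tests pass a fake clock, production uses Date.now.
  constructor(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
    this.now = now;
    this.ttlMs = ttlMs;
  }

  markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
    this.records.set(adapterId, { reason, source, expiresAt: this.now() + this.ttlMs });
  }

  // An expired entry stops counting as degraded without an explicit clear call.
  isDegraded(adapterId: string): boolean {
    const record = this.records.get(adapterId);
    if (record === undefined) return false;
    if (this.now() >= record.expiresAt) {
      this.records.delete(adapterId);
      return false;
    }
    return true;
  }
}
```

Injecting the clock rather than mocking timers keeps the expiry check deterministic: a test advances a plain number and asks again.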
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * fix(web): tighten Phase 13 gates from lefarcen review (PR #1318) Address the actionable items from lefarcen's review of the two Phase 13 CI gates. The two questions about longer-term DX (pre-commit hook to auto-update the symbol table, AST-walker swap) are documented as deferred follow-ups rather than landed here. reducer-bench: - Describe renamed to 'reducer p99 regression gate (Phase 13.1)' so it reads as a gate, not a comparative benchmark. - Failure message now carries the full distribution (p50 / p90 / p99 / max + ceiling), so triage on a tripped gate can distinguish a real 20ms regression from a 4.001ms CI hiccup without re-running locally (lefarcen Q3). - Captured a baseline (p50=0.011ms p90=0.013ms p99=0.018ms max=0.244ms on a local Node 24 / Win11 run, 2026-05-11) inside the docblock so reviewers can see the actual reading sits ~222x below the 4ms ceiling (lefarcen Q1). - Replaced 'role as any' casts with PanelistRole-typed casts so the fixture is typecheck-strict. - Phase numbering corrected (13.2 → 13.1 to match the PR body). 
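The reducer-bench items above (a p99 ceiling plus a failure message carrying the full distribution) can be sketched roughly as follows. This is an illustration under assumptions, not the bench in the repo: `percentile`, `summarize`, and `assertP99Under` are hypothetical names, and the nearest-rank percentile method is a choice made here because the commit message does not say which method the real gate uses.

```typescript
interface Distribution {
  p50: number;
  p90: number;
  p99: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted sample list.
function percentile(sorted: readonly number[], p: number): number {
  if (sorted.length === 0) throw new Error("percentile() needs at least one sample");
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  const value = sorted[index];
  if (value === undefined) throw new Error("unreachable: index is clamped");
  return value;
}

function summarize(samplesMs: readonly number[]): Distribution {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p90: percentile(sorted, 90),
    p99: percentile(sorted, 99),
    max: percentile(sorted, 100),
  };
}

// Throws with the whole distribution so a tripped gate can be triaged
// from the CI log alone, without re-running locally.
function assertP99Under(samplesMs: readonly number[], ceilingMs: number): void {
  const d = summarize(samplesMs);
  if (d.p99 >= ceilingMs) {
    const fmt = (n: number): string => `${n.toFixed(3)}ms`;
    throw new Error(
      `reducer p99 gate tripped: p50=${fmt(d.p50)} p90=${fmt(d.p90)} ` +
        `p99=${fmt(d.p99)} max=${fmt(d.max)} ceiling=${fmt(ceilingMs)}`,
    );
  }
}
```

In a vitest case, the samples would come from timing each reducer pass with `performance.now()` over the 10k iterations and then calling `assertP99Under(samples, 4)`.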
critique-coverage: - Symbols now grouped under four describe blocks (SSE events / panelist roles / lifecycle phases / i18n keys) so a failure points at the category that drifted at a glance (lefarcen nit). - Docblock now explains the grep-over-AST trade-off (the bug class is structural at the string level, not at the AST level) and points at the future AST-walker work as a deferred follow-up (lefarcen Q2). - Docblock now walks a contributor through the four-step maintenance flow (add to contract → add caller → add test → add literal here), so the next person to add an SSE event or i18n key knows the gate exists and what to update (lefarcen Q4). - Phase strings switched from 'phase: <name>' to bare-quoted literals so the walker is robust against single vs double quotes and ':' vs '===' source-shape changes. - Dead try/catch around 'stack = [root]' removed (cannot throw). - Per-symbol failure messages name the symbol AND which corpus is missing it, so the gate is self-describing on the next CI red. - Phase numbering corrected (13.4 → 13.2 to match the PR body). 63 / 63 vitest cases green (1 bench + 62 coverage). Web typecheck clean. * fix(web): tighten coverage walker semantics from lefarcen P2/P3 (PR #1318) Two follow-on findings on commit `338a185`: P2: coverage gate weakened. The previous revision used one helper `corpusReferences` for both SRC and TEST corpora, and that helper accepted the unprefixed PanelEvent type form (`type: 'panelist_must_fix'`) as a substitute for the prefixed SSE wire name (`critique.panelist_must_fix`). The fallback is correct on the TEST side (reducer tests dispatch PanelEvent literals) but it weakened the SRC side: production code could drop the SSE channel name silently and the PanelEvent type alias would keep the walker green. Split into two helpers: `srcReferences` is strict (exact substring match only, no fallback) and `testReferences` keeps the lenient fallback for SSE events. 
The production-side assertions now route through `srcReferences` so the wire name is load-bearing again. P3: maintenance doc overclaimed. The previous revision said 'CI red if you forget step 4' but the symbol arrays are partially hand-maintained, so a contributor adding a NEW phase string or i18n key without updating the array leaves CI green (the walker never knew to look). Rewrote the failure-mode section to distinguish the two cases: - Renaming an EXISTING symbol without updating the walker → CI red (existing assertion fails because the old name is gone). - Adding a NEW hand-maintained symbol without updating the walker → CI stays green (walker does not know to look for it). Also clarified that `SSE_EVENTS` and `PANELIST_ROLE_STRINGS` are auto-built from contracts so step 4 is one-line for `PHASE_STRINGS` and `I18N_KEYS` only. 63 / 63 vitest cases still green. * fix(web): close two P2 findings on PR #1318 (Siri-Ray + lefarcen) P2 (coverage walker counted self as evidence). The walker walked apps/web/tests, which contains apps/web/tests/components/Theater/critique-coverage.test.ts itself. The hand-maintained PHASE_STRINGS and I18N_KEYS literals inside that file would satisfy the test-side coverage assertion against themselves, so a real Theater test that covers a symbol could be deleted and the gate would still pass. Excluded the walker file from TEST_FILES via path.resolve(__filename) filter so the test corpus only contains independent evidence. Once the walker stopped seeing itself, the gate correctly red-flagged nine i18n keys that no INDEPENDENT test exercises: critiqueTheater.userFacingName, roundLabel, composite, threshold, interrupt, interrupted, degradedHeading, shippedSummary, interruptedSummary. Component tests like TheaterCollapsed.test.tsx exercise the rendered text but never mention the key STRING, so the walker couldn't see them. 
Closed that gap by adding apps/web/tests/components/Theater/critique-i18n-keys.test.ts: 9 cases, one per watched key, asserting the dictionary entry exists as a non-empty string. That's both real coverage (catches a stale dict) and the independent evidence the walker requires. P2 (interruptedSummary missing from de/ja/ko/zh-TW). The native locale overrides were missing the key, so an interrupted run on a German / Japanese / Korean / Traditional Chinese UI silently fell back to the English string via the ...en spread. Added the key with {round} and {composite} placeholders preserved, using PerishCode's suggested copy from the earlier review thread. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm exec vitest run tests/components/Theater tests/i18n: 20 files / 190 tests green (critique-coverage 62 / 62, critique-i18n-keys 9 / 9 new, reducer-bench 1 / 1, locales 5 / 5). * fix(web): drop the Dict cast in i18n key coverage test (lefarcen P1 / Siri-Ray on PR #1318) The previous revision used `(en as Record<string, string>)[key]` to read each watched key. Dict has no string index signature, so CI's strict typecheck rejected the broad cast with TS2352 even though the runtime assertion was fine. Replaced with the typed pattern lefarcen suggested: type WATCHED_KEYS as `readonly (keyof typeof en)[]` and read `en[key]` directly. That removes the cast and also strengthens the test, because a renamed or removed key now fails the type check immediately rather than at runtime. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web exec vitest run tests/components/Theater/critique-i18n-keys.test.ts: 9 / 9 green. 
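The typed-key pattern from the last fix above (`readonly (keyof typeof en)[]` instead of a `Record<string, string>` cast) looks roughly like this. The `Dict` interface and `en` object here are a two-key stand-in invented for illustration, not the real locale file, and `emptyWatchedKeys` is a hypothetical helper name.

```typescript
// Stand-in dictionary type: like the real Dict, it has no string index signature.
interface Dict {
  "critiqueTheater.userFacingName": string;
  "critiqueTheater.interruptedSummary": string;
}

const en: Dict = {
  "critiqueTheater.userFacingName": "Design Jury",
  "critiqueTheater.interruptedSummary": "Interrupted at round {round}, best composite {composite}",
};

// Each entry must be a real key of `en`; a renamed or removed key becomes a
// compile error here instead of a runtime failure in the test body.
const WATCHED_KEYS: readonly (keyof typeof en)[] = [
  "critiqueTheater.userFacingName",
  "critiqueTheater.interruptedSummary",
];

// `en[key]` is typed as string with no cast because `key` is a known key.
function emptyWatchedKeys(): string[] {
  return WATCHED_KEYS.filter((key) => en[key].trim().length === 0);
}
```

A test then asserts that `emptyWatchedKeys()` is empty, or loops over `WATCHED_KEYS` with one case per key as the commit describes.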
* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). - Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). 
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4) The strict guard from PR #1314 round 3 enforced enum membership and Number.isFinite, but accepted any finite number where the contract intends a specific domain: scale: 0 (ScoreTicker divides by it), negative thresholds, fractional rounds, negative mustFix, etc. ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline CSS and divides by it for tick width, so a guard-passing scale: 0 shipped Infinity into the rendered style. Negative composite / score values reached downstream code that assumes >= 0. Resolution: mirror the daemon-side Zod domain constraints in the runtime guard. Three new helpers in packages/contracts/src/critique.ts: - isPositiveInt(v): integer with v > 0. Used for round, maxRounds, scale, protocolVersion (all 1-indexed in the orchestrator). - isNonNegativeInt(v): integer with v >= 0. Used for mustFix, position, bestRound. bestRound: 0 is the valid sentinel for 'interrupted before any round closed'. - isNonNegativeFinite(v): finite number with v >= 0. Used for composite, score, dimScore, threshold. Threshold may be fractional (e.g. 8.5 on a scale of 10). Cross-field check inside run_started: threshold <= scale (the daemon Zod schema enforces this with an epsilon refine, the wire guard matches the same intent). 
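A sketch of the three numeric-domain helpers named above, plus the `run_started` cross-field check. The helper names and the fields they apply to come from the commit message; the bodies are an assumption about how such guards are typically written, `hasValidRunStartedNumbers` is a hypothetical wrapper, and the plain `<=` comparison stands in for the epsilon refine the daemon Zod schema is said to use.

```typescript
// Integer with v > 0: round, maxRounds, scale, protocolVersion.
function isPositiveInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v > 0;
}

// Integer with v >= 0: mustFix, position, bestRound.
function isNonNegativeInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= 0;
}

// Finite number with v >= 0: composite, score, dimScore, threshold.
function isNonNegativeFinite(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v) && v >= 0;
}

// Hypothetical wrapper showing the cross-field rule threshold <= scale.
function hasValidRunStartedNumbers(event: {
  protocolVersion: unknown;
  maxRounds: unknown;
  threshold: unknown;
  scale: unknown;
}): boolean {
  const { protocolVersion, maxRounds, threshold, scale } = event;
  return (
    isPositiveInt(protocolVersion) &&
    isPositiveInt(maxRounds) &&
    isPositiveInt(scale) &&
    isNonNegativeFinite(threshold) &&
    threshold <= scale
  );
}
```

`Number.isInteger` already rejects `NaN` and `Infinity`, so the integer helpers need no separate finiteness check.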
Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts: - 22 new rejection cases across every numeric field that previously slipped through: scale: 0, negative scale, fractional scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0, fractional protocolVersion, negative threshold, threshold > scale, round: 0, fractional round, negative dimScore / score, negative / fractional mustFix, negative composite, ship round: 0, negative / fractional bestRound, negative interrupted composite, negative / fractional parser_warning position. - 3 positive boundary cases that must still pass: threshold == scale, fractional threshold within [0, scale], interrupted with bestRound: 0 (no round completed before interrupt), parser_warning with position: 0 (start of stream). Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/contracts test: 4 files / 59 tests green (was 37 before the new domain cases). - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 110 files / 1004 tests green; no regression on Theater suite, sse validator, replay parser, or assistant-feedback widget tests. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * test(e2e): move critique-coverage walker from apps/web/tests to e2e/tests (Siri-Ray P2) The walker is by definition a cross-app consistency check: it reads the web reducer, the daemon critique module, the contracts package, and the e2e UI suite. 
Hosting it under apps/web/tests/ violated the repo boundary rule (root AGENTS.md): app packages must not import another app's private src/ or tests/ as a shared helper, and cross-app consistency checks belong in e2e/tests/. The web test lane was effectively coupled to daemon and e2e file layout, so a daemon-only refactor could break the web lane. Moved the file to e2e/tests/critique-coverage.test.ts and switched the contracts import to the import.meta.glob shape the e2e package already uses (see localized-content.test.ts), so the e2e package does not have to add @open-design/contracts as a workspace dep just to load two const arrays. REPO_ROOT and SELF_PATH recalculated for the new location. Web test lane no longer depends on daemon, contracts, or e2e layout. The e2e walker covers the same 62 assertions as before: e2e/tests/critique-coverage.test.ts 62 / 62 green Web typecheck clean, e2e typecheck clean. * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-14 15:55:36 +08:00
pftom	2ac5854432	feat(plugin-inputs): enhance plugin input handling with file upload support - Added support for file input fields in the PluginInputsForm, allowing users to upload files with serializable metadata. - Updated the HomeHero component to improve the layout and interaction of input fields, enhancing user experience. - Adjusted CSS styles for better visual representation of input fields and their states. - Modified HomeView to reflect changes in authoring chip IDs for better clarity in plugin actions. - Enhanced tests to cover new file input functionality and ensure correct behavior in various scenarios. This update significantly improves the plugin input handling, enabling users to upload files seamlessly and enhancing the overall interaction model.	2026-05-14 15:52:21 +08:00
pftom	1a90aef4a2	feat(plugin-use): implement plugin use handoff functionality - Added support for using installed plugins directly from the PluginsView, allowing users to initiate plugin actions seamlessly. - Enhanced the HomeView to handle plugin use handoffs, managing state and user interactions effectively. - Introduced new types and functions to facilitate the creation and processing of plugin use handoffs, improving the overall user experience. - Updated tests to cover the new plugin use functionality, ensuring reliability and correctness in the application flow. This update significantly enhances the interaction model for plugins, enabling users to utilize plugins more intuitively within the application.	2026-05-14 15:40:42 +08:00
pftom	9ea33e076b	feat(context-plugins): add support for context plugins in project metadata and UI - Introduced a new `contextPlugins` field in the `ProjectMetadata` type to accommodate plugins selected via `@` mentions, allowing for additive context in project creation. - Updated the `HomeHero` and `EntryShell` components to handle and display context plugins, enhancing user interaction with selected plugins. - Implemented rendering logic for context plugins in the metadata block, providing clear visibility of selected plugins and their descriptions. - Enhanced the UI to support the removal of context plugins and display additional details on hover, improving the overall user experience. This update significantly enriches the project creation process by allowing users to incorporate multiple context plugins seamlessly.	2026-05-14 15:29:49 +08:00
lefarcen	6c16283850	Merge origin/main (post-7c8305f4) into reconcile branch Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline.	2026-05-14 15:13:38 +08:00
shangxinyu1	2976c76fc3	test: expand Memory and Routines coverage (#1521) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows	2026-05-14 14:48:40 +08:00
Nagendhra Madishetti	5cb0508790	fix(web): deep-link Routines history rows to their specific conversation (Fixes #1505) (#1508)	2026-05-14 14:27:34 +08:00
soulme	2a8ebff11a	feat(web): add collapsible comment side panel (#1607)	2026-05-14 14:27:09 +08:00
Siri-Ray	d2738924fb	fix(web): freeze completed run durations across conversations (#1351) * fix(web): freeze completed run durations across conversations * fix(web): finalize stopped API runs Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(daemon): optimize conversation latest run lookup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): scope streaming cleanup to conversation Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): capture streaming conversation cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): guard stale run ref cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex)	2026-05-14 14:25:37 +08:00
sukumarp2022	852a005b32	feat(web): add export as image screenshot to share menu (#1569) Add an option to export the current preview viewport as a PNG image. - Add requestPreviewSnapshot() utility in exports.ts (reuses the existing srcdoc snapshot bridge via postMessage) - Add exportAsImage() and dataUrlToBlob() helpers for Blob download - Add Export as image menu item in the HTML viewer share menu, gated behind srcdoc mode (bridge only present in srcdoc, not URL-load mode) - Add fileViewer.exportImage i18n key across all 19 locale files - Refactor PreviewDrawOverlay to delegate to the shared requestPreviewSnapshot() instead of duplicating the snapshot logic - Add 7 unit tests covering snapshot request, timeout, error handling, and download filename sanitization Fixes #1500	2026-05-14 11:07:28 +08:00
Bryan	54498f1ac5	fix(web): parse Provenance with Markdown-bold labels (#1584) * fix(web): parse Provenance with Markdown-bold labels (#1580) The daemon's finalize synthesis prompt at apps/daemon/src/finalize-design.ts:560-565 lists the five Provenance fields without pinning field-label syntax, so Claude renders them with Markdown-bold labels per Markdown convention (`- **Field:** value`). The parser at apps/web/src/lib/parse-provenance.ts:32-36 uses `[:\s]+` as its label/value separator, which stops at the trailing `**` after the colon; the capture group then slurps the `**` and any following whitespace into the value. Downstream of that, transcriptMessageCount and generatedAt parse as null because the captured tokens don't start with digits or a valid ISO 8601 prefix, and the Continue in CLI clipboard prompt shows `Design system: ** ...`, `Transcript message count when DESIGN.md was generated: unknown`, `DESIGN.md generated at: unknown`. Fix: strip leading and trailing Markdown emphasis (`*`, `_`, whitespace) from every captured value via a single helper threaded through extractField / extractFieldOrNone / extractNumber / extractDate. Widen the transcriptMessageCount regex's capture from `(\d+)` to `([^\n]+)` so the strip step gets a chance to run on `** 4`. Add `[^:]*` between `count` and `[:\s]+` to mirror the other label-walking regexes for bolded label variants. Defense-in-depth: tell the synthesis prompt to emit plain `- Field: value` bullets with no emphasis on the labels. The parser hardening is the load-bearing fix; this is belt-and-suspenders for new model variants. Red-Green-Refactor: - Phase 1 (Red): 3 new parse-provenance tests covering bold labels with backticked values, bold labels with a short `Generated:` form, and bold labels with `none` sentinels. All 3 failed against pre-fix source. - Phase 2 (Green): strip + regex widening. All 7 parse-provenance tests + 1158 web tests pass. 
- Phase 3: empirically verified against a live finalized DESIGN.md; all five fields now parse correctly. - Phase 4 (defense-in-depth): one-line addendum to synthesis prompt. - Phase 5: bold-labelled Provenance fixture added to the hook test (useDesignMdState.test.tsx) so the round-7 `unknown-provenance` fail-closed path is regression-pinned end-to-end. Backticks in field values are intentionally kept (out of scope per the issue spec; rendered clipboard text reads fine with them). The variant `- **Field**: value` (colon outside emphasis) is not in the issue enumeration and is not handled. Fixes #1580 fix(web): narrow Provenance strip to Markdown residue only Round-2 fix per lefarcen's review on PR #1584. The round-1 helper used `^[\s*_]+` / `[\s*_]+$`, which stripped a literal leading or trailing `*`/`_` from any captured value: `_draft.html` was corrupted to `draft.html`, and a build id like `build_id_v1_` lost its trailing underscore. Narrow stripMarkdownEmphasis to three explicit passes: 1. Leading `*`/`_` tokens FOLLOWED BY WHITESPACE: only matches the `** ` residue left after `- **Field:** value` is captured starting at the `**`. 2. Trailing WHITESPACE followed by `*`/`_` tokens: mirror of (1) if the value closes with emphasis after whitespace. 3. A single balanced wrap around the remaining value (`**X**` / `*X*` / `__X__` / `_X_`): handles the `- **Field:** **value**` shape and any plain-label `**value**` form. Asymmetric literal `*`/`_` characters in the value (no whitespace separator, no balanced closing token) are preserved by construction. Added regression tests: - plain label + `_draft.html` value - plain label + `build_id_v1_` value (trailing underscore) - bold label + `_draft.html` value (residue stripped, literal leading underscore preserved) - plain label + `wrapped-id*` value (balanced residue stripped) All 11 parse-provenance tests + 1162 web tests pass. Empirically re-verified against a live finalized DESIGN.md; all five fields still parse correctly. 
--------- Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>	2026-05-14 11:04:24 +08:00
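The three-pass narrowing described in the round-2 fix above can be sketched as follows. The function name `stripMarkdownEmphasis` is from the commit message, but the regexes are a guess at one way to implement the three passes; the actual expressions in parse-provenance.ts are not shown in this log and may differ.

```typescript
// Illustrative only: strips Markdown emphasis residue from a captured value
// while leaving literal, unbalanced `*` / `_` characters alone.
function stripMarkdownEmphasis(raw: string): string {
  let value = raw.trim();

  // Pass 1: leading emphasis token followed by whitespace (the "** " residue).
  value = value.replace(/^(?:\*{1,2}|_{1,2})\s+/, "");

  // Pass 2: whitespace followed by a trailing emphasis token (mirror of pass 1).
  value = value.replace(/\s+(?:\*{1,2}|_{1,2})$/, "");

  // Pass 3: a single balanced wrap such as **X**, *X*, __X__ or _X_.
  // The backreference requires the closing token to equal the opening one.
  const wrapped = /^(\*\*|__|\*|_)(.+)\1$/.exec(value);
  return wrapped?.[2] ?? value;
}
```

Because passes 1 and 2 require whitespace next to the token and pass 3 requires a matching close, a value like `_draft.html` or `build_id_v1_` passes through unchanged.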
Prantik Medhi	0c0da7cc23	fix(web): confirm continue-in-cli copy (#1604)	2026-05-14 11:02:36 +08:00
pftom	3b167b6921	feat(plugins): add registry protocol and enhance plugin management features - Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries. - Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol. - Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities. - Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades. - Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards. This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities.	2026-05-14 08:55:36 +08:00
pftom	56c264c9bd	feat(plugins): add login and whoami commands for GitHub CLI authentication - Introduced `login` and `whoami` commands to the plugin CLI, enabling users to authenticate with the Open Design registry via GitHub CLI. - The `login` command wraps GitHub CLI authentication, allowing users to specify a host, defaulting to GitHub. - The `whoami` command retrieves and displays the authenticated GitHub account information, with an option for JSON output. - Updated the CLI help documentation to include usage instructions for the new commands. - Enhanced error handling for GitHub CLI dependencies and authentication status. This update improves the user experience by simplifying the authentication process for plugin publishing.	2026-05-14 07:25:05 +08:00
lefarcen	d83b228c81	Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge	2026-05-13 23:52:33 +08:00
pftom	7c48fbd902	feat(plugins): enhance PluginsView with new marketplace management features - Updated the PluginsView component to include new tabs for 'Installed', 'Available', and 'Sources', improving the organization of plugin management. - Introduced functions for adding, refreshing, removing, and setting trust for plugin marketplaces, enhancing the marketplace interaction capabilities. - Enhanced the UI to reflect the new structure, including updated CSS styles for better visual consistency and usability. - Added tests to ensure the functionality of the new marketplace features and verify the correct rendering of available plugins. This update significantly improves the user experience in managing plugins and marketplaces, providing a more intuitive interface for users.	2026-05-13 23:42:41 +08:00
lefarcen	53997990b7	Merge origin/main (post-0.7.0) into reconciled garnet branch Second-pass merge layering 41+ new commits from origin/main on top of the first reconcile commit. Headline upstream additions absorbed: - 0.7.0 release: redesigned chat bubble user-text styling, neutralised palette, lucide icons, ElevenLabs audio voice option discovery in the prompt composer, analytics tracking (PostHog) wired across home / studio / create surfaces, Prometheus `/api/metrics` endpoint, critique-theater drop-in mount with a settings toggle. - Misc upstream fixes (titlebar padding, release header layout, deck preview chrome, feedback form auto-scroll, conversation-created SSE on routine runs, etc.) Conflict resolutions (12 files, ~22 hunks): - contracts barrel + prompts/system: union of both sides; new analytics exports (`./analytics/events`, `./analytics/public-params`) added alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice fields (audioVoiceOptions/audioVoiceOptionsError, main) and pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput. - daemon/server.ts: Prometheus `/api/metrics` route inserted after garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call added before the chat-run service init alongside the prior reconcile note about the dropped legacy POST /api/projects body. - App.tsx: handleCreateProject now consumes both garnet's plugin fields (pluginId / appliedPluginSnapshotId / pluginInputs / autoSendFirstMessage) and main's analytics requestId. Tracking fires success + failure paths; PluginLoopHome auto-send sessionStorage flag is preserved. - ProjectView.tsx: the garnet auto-send useEffect coexists with main's `useCritiqueTheaterEnabled()` hook. - ChatComposer.tsx: imports merged (drop now-unused fetchSkills, add analytics provider + tracking + buildVisualAnnotationAttachment). 
- index.css: main's redesigned `.msg.user .user-text` chat bubble styling wins over garnet's plain text rule; garnet's `.msg-plugin-chip*` rules preserved alongside. - EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with reconcile decision #2. main's added PetRail / TopTab / analytics view tracking is intentionally NOT brought into the wrapper; the follow-up to re-integrate PetRail / image-templates / video-templates into EntryShell still stands and now also covers analytics view-tracking hooks. - daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node + prom-client coexist). - Test fixtures (FileWorkspace.test): kept garnet's plugin-folders describe block intact; main's projectKind="prototype" addition is dropped where it conflicted with garnet's plugin-folder fixture files. Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck` exits 0 across all workspace packages. Follow-up not done in this commit: - PetRail / image-templates / video-templates / 0.7.0 analytics view-tracking hooks need to be added to EntryShell. - Critique-theater settings toggle UX (added on main) lives in the SettingsDialog hierarchy; the reconcile state preserves the SettingsDialog so this should work without changes, but no end-to-end verification yet.	2026-05-13 23:29:56 +08:00
lefarcen	d3602be666	Merge origin/main into garnet-hemisphere (reconcile) Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. 
Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory.	2026-05-13 22:29:21 +08:00
Yuhao Chen	828f8b93bf	fix(web): protect built-in skills from delete (#1583)	2026-05-13 22:13:30 +08:00
Nagendhra Madishetti	38a5ab69e6	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
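The defensive behaviours listed for the Phase 7.1 reducer can be sketched in a reduced form. This is an illustration only: the `SketchState` / `SketchEvent` shapes are hypothetical simplifications (no rounds or panelists), not the contracts-level `PanelEvent` or the real `CritiqueState`.

```typescript
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";
type TerminalType = "ship" | "degraded" | "interrupted" | "failed";

// Hypothetical, trimmed-down state and event shapes.
interface SketchState { phase: Phase; runId: string | null; warnings: string[] }
type SketchEvent =
  | { type: "run_started"; runId: string }
  | { type: TerminalType; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

const PHASE_FOR: Record<TerminalType, Phase> = {
  ship: "shipped", degraded: "degraded", interrupted: "interrupted", failed: "failed",
};
const TERMINAL = new Set<Phase>(["shipped", "degraded", "interrupted", "failed"]);

export const initialState: SketchState = { phase: "idle", runId: null, warnings: [] };

export function reduce(state: SketchState, event: SketchEvent): SketchState {
  // A run_started for a different runId discards prior state and reboots.
  if (event.type === "run_started" && event.runId !== state.runId) {
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run: return the SAME reference (no re-render).
  if (event.runId !== state.runId) return state;
  // parser_warning may land late; record it without changing phase.
  if (event.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.has(state.phase)) return state;
  if (event.type === "run_started") return state; // duplicate start for the active run
  return { ...state, phase: PHASE_FOR[event.type] };
}
```

Returning the identical `state` reference on ignored events is what lets `useReducer` skip re-rendering subscribers.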
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
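As an illustration of the Phase 7.3 parse step (one event per NDJSON line, malformed lines skipped), here is a small sketch. `isPanelEventLike` is a hypothetical stand-in for the shared `isPanelEvent` predicate, and `parseTranscript` is an illustrative name, not the hook's actual internals.

```typescript
// Hypothetical minimal event shape; the real PanelEvent carries more fields.
interface PanelEventLike { type: string; runId: string }

// Stand-in for the shared isPanelEvent predicate (assumed checks only).
function isPanelEventLike(value: unknown): value is PanelEventLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["type"] === "string" && typeof v["runId"] === "string" && v["runId"] !== "";
}

// The speed knob as described in the commit message.
export type ReplaySpeed = "paused" | "instant" | "live" | { intervalMs: number };

export function parseTranscript(ndjson: string): PanelEventLike[] {
  const events: PanelEventLike[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // blank lines and trailing newline
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isPanelEventLike(parsed)) events.push(parsed);
    } catch {
      // Malformed line: skip it so valid events around it still land.
    }
  }
  return events;
}
```

An empty transcript yields an empty array, which matches the "reports done with state still idle" case in the test list.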
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
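The SSE payload-override fix earlier in this commit comes down to spread order plus revalidation. A hedged sketch, assuming a `critique.` channel prefix and a minimal `runId` check in place of the real `isPanelEvent` predicate (the signature shown here is an assumption):

```typescript
const CHANNEL_PREFIX = "critique.";

interface DecodedEvent { type: string; runId: string; [key: string]: unknown }

export function sseToPanelEvent(channel: string, data: string): DecodedEvent | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  let payload: unknown;
  try {
    payload = JSON.parse(data);
  } catch {
    return null; // malformed JSON is dropped, not thrown
  }
  if (typeof payload !== "object" || payload === null || Array.isArray(payload)) return null;
  // Spread the payload FIRST so the channel-derived type is authoritative.
  const candidate = {
    ...(payload as Record<string, unknown>),
    type: channel.slice(CHANNEL_PREFIX.length),
  };
  // Revalidate before handing anything to the reducer.
  const runId = candidate["runId"];
  if (typeof runId !== "string" || runId === "") return null;
  return { ...candidate, runId };
}
```

With the pre-fix order (`type` first, payload spread last), a payload carrying `type: "ship"` would have overwritten the channel name.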
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
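To make the per-variant check from item 4 concrete, here is a sketch for the `ship` variant only. The required field names come from the commit message; the field types (numbers for `round` / `composite`, strings elsewhere) and the function name `hasValidShipShape` are assumptions.

```typescript
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}
const isFiniteNumber = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v);
const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v !== "";

// Hypothetical ship-only slice of a hasValidVariantShape-style check.
export function hasValidShipShape(event: unknown): boolean {
  if (!isRecord(event)) return false;
  const ref = event["artifactRef"];
  return (
    isFiniteNumber(event["round"]) &&
    isFiniteNumber(event["composite"]) && // guards a later composite.toFixed(1)
    isNonEmptyString(event["status"]) &&
    typeof event["summary"] === "string" &&
    isRecord(ref) &&
    isNonEmptyString(ref["projectId"]) &&
    isNonEmptyString(ref["artifactId"])
  );
}
```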
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. 
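The Phase 10.1 registry can be pictured as a map with an expiry check and an injectable clock. A minimal sketch under stated assumptions: `createDegradedRegistry`, `markDegraded`, and `isDegraded` are hypothetical names; only `clearDegraded`, the reason values, the `expiresAt` field, and the 24h default come from the commit message.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord { reason: DegradedReason; source: string; expiresAt: number }

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default window

// `now` is the clock seam: tests pass a fake clock, production uses Date.now.
export function createDegradedRegistry(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
      records.set(adapterId, { reason, source, expiresAt: now() + ttlMs });
    },
    // An expired entry stops counting without anyone calling clearDegraded.
    isDegraded(adapterId: string): boolean {
      const record = records.get(adapterId);
      return record !== undefined && now() < record.expiresAt;
    },
    clearDegraded(adapterId: string): void {
      records.delete(adapterId);
    },
  };
}
```

Checking expiry at read time keeps the registry free of background timers, which is what makes the clock seam sufficient for deterministic tests.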
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
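The composite formula quoted in the Phase 14.1 guide can be written as a weighted sum. The weights below are copied from the commit message; the `Role` type, the `WEIGHTS` table, and the `composite` function name are illustrative assumptions, not the daemon's actual scoring code.

```typescript
type Role = "designer" | "critic" | "brand" | "a11y" | "copy";

// Weights as documented: designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
const WEIGHTS: Record<Role, number> = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

export function composite(scores: Record<Role, number>): number {
  let total = 0;
  for (const role of Object.keys(WEIGHTS) as Role[]) {
    total += WEIGHTS[role] * scores[role];
  }
  return total;
}
```

Since the designer weight is 0, the designer's own score does not move the composite under this formula; the other four weights sum to 1.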
* feat(daemon): rollout flag resolver (Phase 15.1) Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority: 1. Skill-level policy (required wins, opt-out wins inversely) 2. Per-project override from the Settings toggle 3. OD_CRITIQUE_ENABLED env override 4. Rollout phase default M0 dark-launch false M1 settings only false (toggle is off until the user flips it) M2 per-skill true if skill opted in M3 global default true OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix. * feat(web): Settings toggle hook for Critique Theater (Phase 15.2) React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via: - the platform storage event (cross-tab) - a open-design:critique-theater-toggle CustomEvent (same-tab) Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload. setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write. The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state. 6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event. 
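The Phase 15.1 priority matrix reads naturally as a chain of early returns. A sketch under assumptions: the input field names (`phase`, `skillPolicy`, `projectOverride`, `envOverride`) follow the commit text, but the `RolloutInputs` shape, the exact signature of `isCritiqueEnabled`, and the trimming in `parseRolloutPhase` are guesses for illustration.

```typescript
type RolloutPhase = "M0" | "M1" | "M2" | "M3";
type SkillPolicy = "required" | "opt-in" | "opt-out" | null;

interface RolloutInputs {
  phase: RolloutPhase;
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null;
  envOverride: boolean | null;
}

// Unknown input falls back to M0 so a fresh install stays dark.
export function parseRolloutPhase(raw: string | undefined): RolloutPhase {
  const token = (raw ?? "").trim();
  return token === "M1" || token === "M2" || token === "M3" ? token : "M0";
}

export function isCritiqueEnabled(inputs: RolloutInputs): boolean {
  // 1. Skill-level policy: required forces on, opt-out forces off.
  if (inputs.skillPolicy === "required") return true;
  if (inputs.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (inputs.projectOverride !== null) return inputs.projectOverride;
  // 3. OD_CRITIQUE_ENABLED env override.
  if (inputs.envOverride !== null) return inputs.envOverride;
  // 4. Rollout phase default.
  switch (inputs.phase) {
    case "M2": return inputs.skillPolicy === "opt-in";
    case "M3": return true;
    default: return false; // M0 dark-launch, M1 settings only
  }
}
```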
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * feat(daemon): route critique spawn-path eligibility through the rollout resolver The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live: 1. Per-skill od.critique.policy veto (opt-out / required) 2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint) 3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures) 4. OD_CRITIQUE_ROLLOUT_PHASE default M0 dark-launch false M1 settings only false M2 per-skill only when skillPolicy === 'opt-in' M3 global default true Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before. Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED. 
skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. 
This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. 
Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). 
The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run-scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame and the e2e tests' stubbed routes hide the mismatch. Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport, AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct. P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false, dropping the panel prompt while still routing through runOrchestrator -> parser waits for tags that never arrive -> run degrades. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix: address PerishCode P1 + P2 follow-ups on PR #1338 Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real: 1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered. 
With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block plus removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. Falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage. 
A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. 
Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as planned follow-up. Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path). * test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338) lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug. Added a regression case that: 1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one). 2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204. 3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string). The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here. 8 / 8 vitest cases green on the file. * fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338) The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/src/db.ts. 
So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. 
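The read-merge-write flow and its two failure modes described above can be sketched roughly as follows. This is a minimal illustration, not the actual setCritiqueTheaterEnabled implementation: `ProjectTransport`, `mergeToggle`, `syncCritiqueToggle`, and the `SyncResult` labels are hypothetical names, and the transport seam stands in for the GET / PATCH calls against /api/projects/:id because the real response shape is not shown in this log.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical seam standing in for GET / PATCH /api/projects/:id.
interface ProjectTransport {
  readMetadata(projectId: string): Promise<Metadata | undefined>;
  writeMetadata(projectId: string, metadata: Metadata): Promise<void>;
}

type SyncResult = "patched" | "skipped-read-failed" | "patch-failed";

// Overlay the toggle on top of whatever metadata already exists so that
// unrelated keys (kind, templateId, linkedDirs, ...) survive the PATCH.
function mergeToggle(current: Metadata | undefined, next: boolean): Metadata {
  return { ...(current ?? {}), critiqueTheaterEnabled: next };
}

async function syncCritiqueToggle(
  transport: ProjectTransport,
  projectId: string,
  next: boolean,
): Promise<SyncResult> {
  let current: Metadata | undefined;
  try {
    current = await transport.readMetadata(projectId);
  } catch {
    // Without the current state a safe merged body cannot be built,
    // so the write is skipped rather than risking a wholesale overwrite.
    return "skipped-read-failed";
  }
  try {
    await transport.writeMetadata(projectId, mergeToggle(current, next));
    return "patched";
  } catch {
    // The in-session state is assumed to be handled elsewhere; only report.
    return "patch-failed";
  }
}
```

Returning a status label instead of throwing is a choice made for this sketch only, so that a caller (or a test) can tell the skipped-write case apart from a failed PATCH.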
Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * feat(daemon): Critique Theater Phase 12 observability foundations Lands the metrics registry, the structured logger, the /api/metrics route, and the adapter-degraded bump that wires up the first data point. The orchestrator-side bumps for runs / rounds / composite / must-fix / interrupted / parser_errors / protocol_version land in a follow-up commit on this branch (kept separate so the wiring diff reads cleanly against the registry shape). Surfaces added: - apps/daemon/src/metrics/index.ts: 9 Prometheus series under the open_design_critique_* namespace with the histogram buckets the spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 / 2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at 0-10 integer steps). - apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line per call on stdout, namespaced critique. Matches the JSON-per-line convention cli.ts already uses; no new logger framework. - apps/daemon/src/server.ts: GET /api/metrics route. Honors OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs. 
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now bumps degraded_total so the adapter-health dashboard panel reflects every TTL refresh and every fresh mark. Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to apps/daemon/package.json. Both are zero-config no-ops without an exporter wired; daemon bundle size impact is ~150 KB uncompressed. The @opentelemetry/api dep is in place ahead of the OTel-spans follow-up commit; it adds no behavior on this commit. Tests: - tests/metrics/critique.test.ts (3 cases): registry shape + exposition text + reset-between-tests - tests/logging/critique.test.ts (4 cases): event shape + ordering + newline framing + namespace stamping Verification (Windows-local): - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites: 7 / 7 green - Existing adapter-degraded + conformance + rollout suites: 22 / 22 green; the bump is non-breaking * feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator Lights up the bump sites the Phase 12 foundations PR registered the series for. Every panel event the parser surfaces now reaches the matching Prometheus counter / histogram and the matching JSON log line on stdout. Switch-loop bumps + logs: - run_started: log run_started, set protocol_version gauge to the observed protocol version (small-integer cardinality). - panelist_open: record the first-open wall-clock per round so round_end can compute round_duration_ms; subsequent opens in the same round leave the start time untouched. - panelist_must_fix: bump must_fix_total with the panelist role. The wire event does not yet carry a dim name, so the label is 'unspecified' for now; a future parser revision can drop in the real dim without a metric rename. - round_end: bump rounds_total, observe composite_score, observe round_duration_ms (current ms minus the tracked start), log round_closed with the composite / mustFix / decision triple. 
- parser_warning (parser-yielded): bump parser_errors_total with the kind label, log parser_recover with kind + position. Orchestrator-side parser warnings (composite_mismatch and duplicate_ship from the daemon-authoritative scoring checks) go through a new emitParserWarning helper so the bus emit, the collectedEvents push, the metric bump, and the log line stay in lockstep. Three inline emission sites collapse to one-line helper calls. After the try/catch, a single terminal-status switch bumps runs_total{status, adapter, skill} once per run, with branch-specific log + counter: - shipped / below_threshold: log run_shipped - interrupted: bump interrupted_total, log run_failed{cause: interrupted} - timed_out: log run_failed{cause: timed_out} - failed: log run_failed{cause: orchestrator_internal} - degraded: log degraded{reason: orchestrator_classified} OrchestratorParams gains optional skill: string for the label; defaults to 'unknown' so spawn sites that have not yet threaded it keep working without a metric shape change. Tests: - The new metrics + logging suites (7 / 7) verify registry shape and event framing; orchestrator-side metric integration is exercised through the existing critique-conformance and critique-adapter-degraded suites (22 / 22 still green). - Logger test reassigns process.stdout.write directly instead of vi.spyOn so the Node overloaded write signature does not collide with MockInstance<unknown>. * feat(observability): Grafana dashboard JSON for Critique Theater Three default rows mapping to the metrics this branch wires up: 1. Fleet quality: composite score p50 / p90 / p99 line graph by adapter, plus a heatmap of the composite distribution. The line graph answers 'are my agents getting better over time'; the heatmap answers 'are the bad runs clustered around one adapter or smeared across the fleet'. 2. Adapter health: stacked bar charts for degraded marks (by adapter / reason) and parser errors (by adapter / kind) over a 5-minute window. 
The two queries together let an operator see 'is this adapter degraded because of malformed wire output or because of oversize blocks' without flipping panels. 3. Brief throughput: runs-per-hour by terminal status, an average rounds-per-run stat per adapter, and a round-duration ms p50 / p90 / p99 line. Throughput numbers fall straight out of the runs_total / rounds_total counters; the duration histogram is the same one the runs feed. The dashboard uses a templated $datasource var (defaults to 'prometheus') so an operator with multiple Prometheus instances can switch without editing JSON. Schema version 39 (Grafana 11). Operators import via: pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json or paste into a provisioned dashboards directory. The file is checked into the repo as a starting artifact; alert rules and SLO panels ship after the first 1000 runs inform the right thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity checked locally). * feat(daemon): OpenTelemetry outer span around the critique run Wraps each runOrchestrator call in a 'critique.run' span via the existing @opentelemetry/api dep added in the Phase 12 foundations commit. Attributes set on the span: - critique.run_id, critique.adapter, critique.skill at start - critique.final_status, critique.final_composite on terminal resolution - span status flipped to ERROR for failed / timed_out runs so a Tempo / Honeycomb / Jaeger filter on traces.status=error surfaces the right slice without joining back to Prometheus No exporter is wired by default; @opentelemetry/api is the API package and intentionally splits from @opentelemetry/sdk-, so the span is zero-overhead until an operator attaches an SDK through their runtime config. 
Inner per-round / parse_chunk / scoreboard_eval / persist_round / ship.persist spans defined in the Phase 12 plan are a follow-up: the outer span alone gives the trace a duration + final status + adapter/skill labels, which is the 80% value for dashboards that correlate runs across services. Adding child spans inside the existing 600-line orchestrator without restructuring is a separate careful change. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 29 / 29 critique + metrics + logging tests still green fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump nix-check failed on PR #1485 with hash mismatch in open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after the Phase 12 foundations commit (`2b8b7445`) added prom-client and @opentelemetry/api to apps/daemon/package.json and refreshed pnpm-lock.yaml. CI reported the new sha: specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8= got: 7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s= Both nix files pin the same workspace lockfile, so both flip in lockstep. No other Nix surface changes required. * fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2) 1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted agent values). The new observability path now records rs.composite and rs.mustFix (daemon-authoritative) instead of event.composite and event.mustFix when rs exists, and skips the bumps + log entirely when rs is missing (a degenerate round_end without any matching panelist_open). The dashboard p50 / p90 / p99 now agrees with persistence and ship decisions; an adapter reporting <ROUND_END composite='10'> while the daemon computed 6 logs 6 and still emits the composite_mismatch parser warning the prior block was already producing. 2. Codex P2 in server.ts (skill label always 'unknown'). 
The spawn path called runOrchestrator without passing the resolved skill id, so every live run bumped open_design_critique_{skill='unknown'} and the per-skill dashboard breakdown was always empty. Threaded effectiveSkillId (already computed at the same handler scope as the project skill fallback) through skill: . . . so the metric reflects the real skill when one is assigned, and the orchestrator default of 'unknown' only fires for runs that genuinely have none. 3. Codex P2 in conformance.ts (protocol-version mismatch let through). An adapter that emitted <CRITIQUE_RUN version='2'> followed by a valid SHIP classified as shipped because the harness only watched for terminal events. Added a guard inside the parse loop: if a run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION, mark the adapter degraded with reason 'protocol_version_mismatch' (already in DEGRADED_REASONS) and return early. ConformanceOutcome union widened to accept the new reason. 4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour panel under-reported by 3600x). 'rate(...[1h])' returns per-second. Multiplied by 3600 so the panel title and unit match the actual value rendered. Verification: - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites (7), existing adapter-degraded (7), conformance (5), rollout (10): 29 / 29 green - Grafana JSON re-parses with node -e 'JSON.parse(...)' fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485) * fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 22:11:27 +08:00
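The four-tier priority matrix that the rollout-resolver commits in this entry describe (skill policy, project override, env override, phase default) can be sketched as below, together with the tolerant policy normalization. This is one reading of the commit text, not the real isCritiqueEnabled or normalizeCritiquePolicy source: the `ResolverInput` shape and `resolveCritiqueEnabled` name are assumptions, and treating both `required` and `opt-out` as hard vetoes over the lower tiers follows the "veto (opt-out / required)" wording.

```typescript
const KNOWN_POLICIES = ["required", "opt-in", "opt-out"] as const;
type SkillPolicy = (typeof KNOWN_POLICIES)[number] | null;
type RolloutPhase = "M0" | "M1" | "M2" | "M3";

// Hypothetical input shape; null means "no opinion, fall through".
interface ResolverInput {
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null;
  envOverride: boolean | null;
  phase: RolloutPhase;
}

// Case-insensitive, whitespace-tolerant parse; unknown tokens collapse to null
// so a typo cannot force the panel on or off.
function normalizeCritiquePolicy(raw: unknown): SkillPolicy {
  if (typeof raw !== "string") return null;
  const token = raw.trim().toLowerCase();
  return KNOWN_POLICIES.find((p) => p === token) ?? null;
}

function resolveCritiqueEnabled(input: ResolverInput): boolean {
  // Tier 1: per-skill veto.
  if (input.skillPolicy === "opt-out") return false;
  if (input.skillPolicy === "required") return true;
  // Tier 2: per-project override.
  if (input.projectOverride !== null) return input.projectOverride;
  // Tier 3: env override.
  if (input.envOverride !== null) return input.envOverride;
  // Tier 4: rollout phase default.
  if (input.phase === "M3") return true;
  if (input.phase === "M2") return input.skillPolicy === "opt-in";
  return false; // M0 (dark launch) and M1 (settings only)
}
```

With every input left at null and the phase at M0, the sketch returns false, which matches the "fresh install is unchanged" claim in the commit message.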
Nagendhra Madishetti	385e1d111d	feat(web): Critique Theater Phase 9 (drop-in mount wrapper, native i18n for de, ja, ko, zh-TW) (#1315 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
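The defensive reducer behaviour listed above (sticky terminal phases, late parser_warning recorded on the side, reboot on a new runId, same state reference for stray traffic) can be illustrated with a heavily reduced sketch. The state and event shapes here are simplified assumptions and omit rounds and panelists entirely; `SketchState`, `SketchEvent`, and `reduceSketch` are hypothetical names, not the real CritiqueState reducer. Ignoring a repeated run_started for the already-active run is also an assumption of this sketch.

```typescript
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

interface SketchState {
  phase: Phase;
  runId: string | null;
  warnings: string[]; // side channel for late parser warnings
}

type SketchEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "degraded"; runId: string }
  | { type: "interrupted"; runId: string }
  | { type: "failed"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

const TERMINAL_PHASES: ReadonlySet<Phase> = new Set<Phase>([
  "shipped", "degraded", "interrupted", "failed",
]);

const TERMINAL_PHASE_FOR = {
  ship: "shipped",
  degraded: "degraded",
  interrupted: "interrupted",
  failed: "failed",
} as const;

const initialSketchState: SketchState = { phase: "idle", runId: null, warnings: [] };

function reduceSketch(state: SketchState, event: SketchEvent): SketchState {
  if (event.type === "run_started") {
    // A run_started for a different runId discards prior state and reboots.
    if (event.runId === state.runId) return state;
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run returns the same reference (no re-render).
  if (event.runId !== state.runId) return state;
  if (event.type === "parser_warning") {
    // Recorded even after a terminal phase, without changing the phase.
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL_PHASES.has(state.phase)) return state;
  return { ...state, phase: TERMINAL_PHASE_FOR[event.type] };
}
```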
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
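The transcript parsing step in the Phase 7.3 commit above (one event per NDJSON line, malformed lines skipped, valid events around them kept) might look roughly like this. It is a sketch under stated assumptions: `looksLikePanelEvent` is a hypothetical stand-in for the shared isPanelEvent predicate and only checks for a string `type` and a non-empty `runId`, and `parseTranscript` is an illustrative name rather than the hook's real helper.

```typescript
interface MinimalEvent {
  type: string;
  runId: string;
}

// Hypothetical stand-in for the contracts-level predicate.
function looksLikePanelEvent(value: unknown): value is MinimalEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const runId = v["runId"];
  return typeof v["type"] === "string" && typeof runId === "string" && runId.length > 0;
}

function parseTranscript(ndjson: string): MinimalEvent[] {
  const events: MinimalEvent[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // blank lines carry no event
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (looksLikePanelEvent(parsed)) events.push(parsed);
    } catch {
      // Malformed JSON: skip this line, keep going with the rest.
    }
  }
  return events;
}
```

An empty transcript yields an empty array, which corresponds to the "empty transcript reports done with state still idle" case in the test list.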
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
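The SSE payload-override fix described above comes down to spread order plus revalidation: the payload is spread first and the channel-derived `type` is applied last, so a hostile `type` field in the data cannot win. The sketch below illustrates that idea only; `decodeFrame`, `DecodedEvent`, and the `KNOWN_TYPES` set (built from event names mentioned in this log) are assumptions, not the real sseToPanelEvent or CRITIQUE_SSE_EVENT_NAMES.

```typescript
const CHANNEL_PREFIX = "critique.";

// Illustrative list assembled from event names that appear in this log.
const KNOWN_TYPES: ReadonlySet<string> = new Set([
  "run_started", "panelist_open", "panelist_dim", "panelist_must_fix",
  "panelist_close", "round_end", "ship", "degraded", "interrupted",
  "failed", "parser_warning",
]);

interface DecodedEvent {
  type: string;
  runId: string;
  [key: string]: unknown;
}

function decodeFrame(channel: string, rawData: string): DecodedEvent | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  const type = channel.slice(CHANNEL_PREFIX.length);
  if (!KNOWN_TYPES.has(type)) return null; // unknown type: drop the frame

  let data: unknown;
  try {
    data = JSON.parse(rawData);
  } catch {
    return null; // malformed JSON: drop without tearing the stream
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;

  const payload = data as Record<string, unknown>;
  const runId = payload["runId"];
  if (typeof runId !== "string" || runId.length === 0) return null;

  // Payload first, channel-derived type last: the channel is authoritative.
  return { ...payload, runId, type };
}
```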
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * fix(web): address PerishCode P3 polish items on PR #1315 Three non-blocking findings from PerishCode's latest review folded in alongside the merge-with-main work. 1. bestRoundOf / bestCompositeOf could pair mismatched values. The split helpers walked state.rounds independently: bestRoundOf returned the LAST round with a numeric composite; bestCompositeOf returned the MAX composite. With non-monotonic composites (round 1 at 8.5, round 2 at 6.0), the user-initiated interrupt shipped { bestRound: 2, composite: 8.5 } which is a pair that never existed and the collapsed badge advertised forever (the reducer's interrupted phase is sticky). Folded into a single bestRoundAndComposite() that walks once and returns the matching pair. Multi-round regression test in CritiqueTheaterMount.test.tsx pins the fix. 2. critiqueTheater.interruptedSummary was missing from de.ts, ja.ts, ko.ts, zh-TW.ts. The Phase 9 native-locale work translated 39 of the 40 critiqueTheater.* keys; this one slipped through because the rest of the key block uses the locale-specific interrupted line as an anchor. 
The collapsed badge would have shown the English fallback string in an otherwise native UI on every interrupted run. Added native translations to all four locales (matching PerishCode's suggested copy). 3. InterruptButton's window-scope Escape handler could collide with Esc-to-dismiss on modals / popovers elsewhere on the page. The prior fix filtered text-entry focus but still fired from any body-level Esc, which double-handles when a non-text dismissable surface is open. New gate defers when a [role="dialog"] (or any aria-modal="true" element) without aria-hidden="true" is on the page AND the Escape didn't originate inside .theater-stage. Three new InterruptButton.test.tsx cases pin the deferral, the aria-hidden exemption, and the in-stage exemption. Verified: typecheck clean, 111 / 1007 tests green. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Pulled the corrected pattern from the Phase 10 commit: - snapshot runId / bestRound / composite at click time - dispatch interrupted only on res.ok - on rejection or non-2xx, clear interruptPending and log in dev so the user can retry and the real terminal event still wins Tests updated: rejection and non-2xx cases now assert the UI stays in running phase; the 204 cases wait for the async ack with waitFor. Web typecheck clean. * fix(web): drop shadow hasValidVariantShape; delegate to isPanelEvent (PerishCode follow-up on PR #1315) The wire-layer guard in `apps/web/src/components/Theater/state/sse.ts` carried a `hasValidVariantShape` second-pass filter and a docstring claiming `isPanelEvent` from contracts was a header-only check ("missing runId, unknown type"). 
That stopped being true after the Siri-Ray round-3 fix on PR #1314: `isPanelEvent` is now the strict guard, validating every variant's required fields, closed-enum membership against PANELIST_ROLES / SHIP_STATUSES / DEGRADED_REASONS / FAILED_CAUSES / PARSER_WARNING_KINDS / ROUND_DECISIONS, finite numerics, and the threshold <= scale cross-field check. Net effect: the shadow guard was strictly weaker than the contract guard above it. It could not reject any frame `isPanelEvent` already let through. The docstring also misled future readers into thinking the safety net was at the wire layer when it actually sits one layer up in contracts. Two changes: 1. Drop `hasValidVariantShape` entirely. `sseToPanelEvent` collapses to the one-liner: spread the payload, pin `type` from the channel, return `isPanelEvent(candidate) ? candidate : null`. Docstring rewritten to describe `isPanelEvent` accurately as the strict guard, and to call out (with a forward reference to this commit) that the deletion was intentional. 2. Add three small wire-layer regression cases to `sse.test.ts` so a future accidental weakening of either layer fails loudly at the boundary the reducer actually sees: - drops a `critique.ship` with an unknown status (closed-enum check) - drops a `critique.panelist_close` with an unknown role (closed-enum check) - drops a `critique.round_end` with `composite: NaN` (finite-numeric check) Verification: - pnpm --filter @open-design/web typecheck: clean - pnpm --filter @open-design/web exec vitest run tests/components/Theater/state/sse.test.ts: 22 / 22 green (19 prior + 3 new) - Net diff: -84 +41 lines (-43 net); the shadow function and its stale docstring are gone, the three regressions land at half the cost * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 21:21:41 +08:00
Nagendhra Madishetti	c326492203	feat: Critique Theater Phase 15 (rollout resolver + Settings toggle hook) (#1320) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer.
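The defensive rules listed for the Phase 7.1 reducer can be illustrated with a minimal sketch. The event and state shapes here (`SketchEvent`, `SketchState`) are simplified, hypothetical stand-ins; the real `PanelEvent` and `CritiqueState` carry more variants and fields.

```typescript
// Hypothetical, reduced shapes for illustration only.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

interface SketchEvent {
  type: "run_started" | "panelist_dim" | "ship" | "parser_warning";
  runId: string;
  round?: number;
  score?: number;
}

interface SketchState {
  phase: Phase;
  runId: string | null;
  rounds: Record<number, number[]>; // dim scores keyed by round number
  warnings: number; // side channel for late parser warnings
}

const initialState: SketchState = { phase: "idle", runId: null, rounds: {}, warnings: 0 };
const TERMINAL: ReadonlySet<Phase> = new Set<Phase>(["shipped", "degraded", "interrupted", "failed"]);

function reduce(state: SketchState, ev: SketchEvent): SketchState {
  // A run_started for a different runId discards prior state at any time.
  if (ev.type === "run_started" && ev.runId !== state.runId) {
    return { phase: "running", runId: ev.runId, rounds: {}, warnings: 0 };
  }
  // Stray traffic for another run returns the same reference (no re-render).
  if (ev.runId !== state.runId) return state;
  // parser_warning may land late; it is recorded without changing phase.
  if (ev.type === "parser_warning") return { ...state, warnings: state.warnings + 1 };
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.has(state.phase)) return state;
  switch (ev.type) {
    case "panelist_dim": {
      // Key by round number so out-of-order events land in the right bucket.
      const round = ev.round ?? 1;
      const prior = state.rounds[round] ?? [];
      return { ...state, rounds: { ...state.rounds, [round]: [...prior, ev.score ?? 0] } };
    }
    case "ship":
      return { ...state, phase: "shipped" };
    default:
      return state; // duplicate run_started for the active run is ignored
  }
}
```

Returning the identical `state` object on ignored events is what lets `useReducer` skip re-rendering subscribers.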
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. The speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature and is currently treated as instant; replay timestamps are not yet persisted with each event, so honest pacing requires a follow-up Phase 7+ task. The gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR.
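The NDJSON parsing and speed semantics described for Phase 7.3 can be sketched as two small functions. This is an illustration under assumptions, not the hook itself: `parseNdjson` and `play` are hypothetical names, the validator is passed in rather than imported from contracts, and dispatching the first paced event immediately is a choice made here for simplicity.

```typescript
type Speed = "paused" | "instant" | "live" | { intervalMs: number };
type TimerFn = (fn: () => void, ms: number) => unknown; // injectable setTimeout seam

// Parse one event per line; malformed or invalid lines are skipped.
function parseNdjson<T>(text: string, isValid: (v: unknown) => v is T): T[] {
  const out: T[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const value: unknown = JSON.parse(line);
      if (isValid(value)) out.push(value);
    } catch {
      // Malformed JSON: drop the line, keep the events around it.
    }
  }
  return out;
}

function play<T>(
  events: T[],
  speed: Speed,
  dispatch: (e: T) => void,
  setTimeoutFn: TimerFn = (fn, ms) => setTimeout(fn, ms),
): void {
  if (speed === "paused") return; // events are held back
  if (speed === "instant" || speed === "live") {
    // `live` is treated as instant, as the Phase 7.3 description states.
    events.forEach((e) => dispatch(e));
    return;
  }
  const intervalMs = speed.intervalMs;
  let cursor = 0;
  const tick = (): void => {
    if (cursor >= events.length) return;
    dispatch(events[cursor]);
    cursor += 1;
    if (cursor < events.length) setTimeoutFn(tick, intervalMs);
  };
  tick(); // assumption: the first event goes out immediately
}
```

Passing the timer as a parameter mirrors the injectable seam the tests rely on: a test can queue callbacks and run them by hand instead of waiting on real timers.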
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
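The payload-override fix comes down to spread order plus revalidation. A minimal sketch follows, with a hypothetical reduced guard (`isPanelEventSketch`) standing in for the contracts-level `isPanelEvent`, and a two-entry type set chosen only for illustration.

```typescript
interface SketchPanelEvent {
  type: string;
  runId: string;
  [key: string]: unknown;
}

// Hypothetical subset of event types; the real list lives in contracts.
const KNOWN_TYPES = new Set(["run_started", "ship"]);

function isPanelEventSketch(v: unknown): v is SketchPanelEvent {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return (
    typeof r.type === "string" &&
    KNOWN_TYPES.has(r.type) &&
    typeof r.runId === "string" &&
    r.runId.length > 0
  );
}

function sseToPanelEventSketch(
  channel: string,
  data: Record<string, unknown>,
): SketchPanelEvent | null {
  const type = channel.replace(/^critique\./, "");
  // `type` is written AFTER the payload spread, so a payload-supplied
  // `type` can never override the channel-derived one.
  const candidate = { ...data, type };
  return isPanelEventSketch(candidate) ? candidate : null;
}
```

With the spread in the opposite order (`{ type, ...data }`), a frame on the `critique.run_started` channel carrying `type: 'ship'` would reach the reducer as a `ship` action, which is the gap this commit closes.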
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob.
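The adapter-degraded registry from Phase 10.1 can be pictured as a map with an expiry check and an injectable clock. The sketch below is an assumption-laden illustration: `createDegradedRegistry`, `markDegraded`, and `isDegraded` are hypothetical names, and only `clearDegraded`, the three reasons, the source label, `expiresAt`, and the 24h default come from the commit message.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string; // label for who recorded the entry
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default window

// `now` is the clock seam: tests pass a fake clock and advance it by hand.
function createDegradedRegistry(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
      records.set(adapterId, { reason, source, expiresAt: now() + ttlMs });
    },
    // Expiry is checked on read, so an expired entry stops counting
    // as degraded without anyone calling clearDegraded.
    isDegraded(adapterId: string): boolean {
      const rec = records.get(adapterId);
      return rec !== undefined && now() < rec.expiresAt;
    },
    clearDegraded(adapterId: string): void {
      records.delete(adapterId);
    },
  };
}
```

Checking expiry at read time rather than with a sweep timer keeps the registry free of background work, which also makes the deterministic clock test straightforward.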
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
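The composite formula quoted in the Phase 14.1 user-guide entry is a plain weighted sum. A minimal sketch, assuming per-role scores on a shared scale; `WEIGHTS` and `composite` are hypothetical names, and only the five weights come from the commit message.

```typescript
// Weights as documented: designer 0, critic 0.4, brand 0.2, a11y 0.2, copy 0.2.
const WEIGHTS = { designer: 0, critic: 0.4, brand: 0.2, a11y: 0.2, copy: 0.2 } as const;
type Role = keyof typeof WEIGHTS;

function composite(scores: Record<Role, number>): number {
  return (Object.keys(WEIGHTS) as Role[]).reduce(
    (sum, role) => sum + WEIGHTS[role] * scores[role],
    0,
  );
}
```

The weights sum to 1, so a run where every panelist gives the same score yields that score (up to floating-point rounding). With a weight of 0, the designer's score does not move the composite under this formula.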
* feat(daemon): rollout flag resolver (Phase 15.1) Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority: 1. Skill-level policy (required wins, opt-out wins inversely) 2. Per-project override from the Settings toggle 3. OD_CRITIQUE_ENABLED env override 4. Rollout phase default: M0 (dark-launch) false; M1 (settings only) false (toggle is off until the user flips it); M2 (per-skill) true if skill opted in; M3 (global default) true. OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix. * feat(web): Settings toggle hook for Critique Theater (Phase 15.2) React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via: - the platform storage event (cross-tab) - an open-design:critique-theater-toggle CustomEvent (same-tab) Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload. setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write. The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state. 6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event. 
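The priority order in the Phase 15.1 entry above could be sketched as follows. The input shape, the `'opt-in'` policy value, and the function names are assumptions made for illustration; the log does not show the real `isCritiqueEnabled` signature.

```typescript
// Hypothetical types: the log names the phases M0..M3 but not the policy values.
type RolloutPhaseSketch = 'M0' | 'M1' | 'M2' | 'M3';
type SkillPolicySketch = 'required' | 'opt-in' | 'opt-out';

interface ResolverInputSketch {
  skillPolicy?: SkillPolicySketch; // 1. skill-level policy
  projectOverride?: boolean;       // 2. per-project override from the Settings toggle
  envOverride?: boolean;           // 3. OD_CRITIQUE_ENABLED, already parsed to a boolean
  phase: RolloutPhaseSketch;       // 4. rollout phase default
}

// Unknown input falls back to M0 so the feature stays off on a fresh install.
function parseRolloutPhaseSketch(raw: string | undefined): RolloutPhaseSketch {
  return raw === 'M1' || raw === 'M2' || raw === 'M3' ? raw : 'M0';
}

function resolveCritiqueEnabledSketch(input: ResolverInputSketch): boolean {
  // 1. Skill policy wins in both directions.
  if (input.skillPolicy === 'required') return true;
  if (input.skillPolicy === 'opt-out') return false;
  // 2. Per-project override, when one is set.
  if (typeof input.projectOverride === 'boolean') return input.projectOverride;
  // 3. Environment override, when one is set.
  if (typeof input.envOverride === 'boolean') return input.envOverride;
  // 4. Phase default: M3 on, M2 on only for opted-in skills, M0 and M1 off.
  if (input.phase === 'M3') return true;
  if (input.phase === 'M2') return input.skillPolicy === 'opt-in';
  return false;
}
```

Keeping the whole decision in one pure function is what makes a full matrix test practical: each cell is a single call with no I/O.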
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). 
- Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). - apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. * docs: tighten Phase 14 reasoning from lefarcen review (PR #1319) Four content gaps lefarcen flagged in the Phase 14 docs review, addressed inline rather than deferred. The fifth item (scope-drift between 'docs only' PR body and the cumulative stacked diff) is handled by rewriting the PR body, not the docs. 1. Round exit conditions (lefarcen P2-1). docs/critique-theater.md §2 'Auto-converging rounds' now lists the five conditions that stop a run (threshold reached, round budget exhausted, per-round timeout, total timeout, user interrupt) with their default values. 
A user debugging a run that stopped at round 1 with composite 5.4 can read this list and find the matching cause without spelunking the orchestrator. 2. Prior-art comparison (lefarcen P2-2). New §1.5 'Why an in-CLI panel and not a third-party design lint' pre-answers the 'why not Figma lint / Adobe checker / Material You conformance' question. Three differences: rule engines vs generative reviewers, post-hoc vs in-loop, external service vs same-CLI-session. 3. Composite formula rationale (lefarcen P2-4). §2 now explains why each weight is set the way it is: critic gates correctness so it gets 0.4; brand / a11y / copy are secondary quality dimensions at 0.2 each; designer is at 0.0 in v1 because aesthetic preference is not a ship gate. The slot stays in the schema so notes flow into the transcript and a v2 config release can bump the weight without a wire-shape change. 4. v2 cast-config ownership (lefarcen P2-3). Both AGENTS.md files (daemon + web) now declare a 'Designer weight frozen at 0.0 until v2 cast config' invariant. The daemon side calls out where the SKILL.md frontmatter resolver lands (apps/daemon/src/critique/config.ts); the web side calls out where the Settings surface lands (apps/web/src/components/ Settings/). A contributor reading either AGENTS.md before implementing v2 sees which module to touch first. * docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319) Two carryover items lefarcen flagged across the PR #1319 + #1320 reviews. 1. docs/critique-theater.md was sending users to /api/metrics/critique as the conformance-status check on malformed_block, but the Phase 12 metrics endpoint is explicitly deferred until after orchestrator wiring lands. Replaced the link with the pnpm conformance-harness command that DOES exist today (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) and noted that the dashboard surfaces this status as a series once Phase 12 ships. 2. 
apps/web/src/components/Theater/AGENTS.md module map was stale after Phase 15: the index.ts row said 'only two hooks are exported' but the barrel now exports useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled setter). Updated the row to list all three hooks + the setter + the reducer-derived contract types, and added a new row for hooks/useCritiqueTheaterEnabled.ts in the file table so a web contributor scanning the table sees the new hook without inferring it from the index.ts blurb. * docs(phase-15): clarify resolver + toggle scope in Section 3 (lefarcen P2 on PR #1320) Two P2 doc-accuracy items from the latest review: 1. The prior text said "the web toggle persists into the daemon's settings store; both surfaces flip the same flag." That isn't true on this PR's head: setCritiqueTheaterEnabled writes localStorage and dispatches a same-tab CustomEvent, but does NOT write through to the daemon (no /api/settings/critique endpoint ships in Phase 15, and no production caller of the setter rounds through to the daemon). Reworded the section to describe the actual scope: client-side in-session toggle this phase, daemon round-trip deferred to the Settings UI follow-up. 2. The prior text implied the rollout resolver was wired into the spawn-time gate, but apps/daemon/src/server.ts still reads critiqueCfg.enabled directly; isCritiqueEnabled is only called in tests. Reworded the section to make this explicit: Phase 15 ships the resolver in isolation so it can land green and be reviewed on its own, the one-line wiring change ships in the wireup PR that follows. Operators who want to enable the feature today should still set OD_CRITIQUE_ENABLED=1 rather than relying on the client toggle. That guidance is now explicit in the doc. 
* docs(rollout): align module docblock with actual Phase 15 scope (lefarcen P2 on PR #1320) The module-level docblock said the orchestrator entry and a GET /api/settings/critique endpoint already consume isCritiqueEnabled, but neither integration ships in this PR: the spawn-time gate in server.ts still reads critiqueCfg.enabled directly, and the daemon settings endpoint is deferred to the Settings UI follow-up. A future contributor reading the source comment would assume rollout phases and project overrides are live in generation routing when they are not yet wired. Reworded the docblock into three sections that mirror the user-facing docs/critique-theater.md scope notes: - What ships in Phase 15: the pure resolver function plus its supporting parsers, with full priority-matrix coverage. - Planned consumers (not yet wired): the orchestrator entry (one-line swap, lands in the wireup PR), the settings echo endpoint (lands with the Settings UI PR), and the conformance harness. - Operator guidance: until the wiring change lands, set OD_CRITIQUE_ENABLED=1 rather than relying on the client toggle. Resolution-order bullet 2 also updated: per-project override 'will write here once the Settings UI follow-up adds the daemon-side write path' instead of implying that write path exists today. Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/daemon typecheck clean. No runtime change. Documentation-only alignment. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. 
Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * fix(web): export useCritiqueTheaterEnabled + setter from Theater barrel (Siri-Ray P2) The Phase 15 hook and its imperative setter were not in the public barrel, even though Phase 14 AGENTS.md describes index.ts as exporting them. That mismatch would force the Settings follow-up to import from the private hooks/ path (or render the AGENTS module map inaccurate). Added the export alongside useCritiqueStream and useCritiqueReplay so the Phase 15 public surface matches the module map. * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge * fix(contracts): restore numeric-domain guards in isPanelEvent (lefarcen + Siri-Ray P2 on PR #1320) --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 21:19:51 +08:00
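The deep-validation approach described for `isPanelEvent` in the squash commit above (header check, required variant fields, closed-enum membership, `Number.isFinite` on numeric fields) might look roughly like this for the ship variant alone. The status values and the function name are placeholders: the log does not list the real `SHIP_STATUSES` members or the types of `artifactRef` and `summary`.

```typescript
// Placeholder enum members; the real SHIP_STATUSES values are not shown in the log.
const SHIP_STATUSES_SKETCH: readonly string[] = ['shipped', 'below_threshold'];

// Hypothetical guard for one variant: rejects anything a downstream
// `final.composite.toFixed(1)` or i18n table lookup could crash on.
function isShipEventSketch(value: unknown): boolean {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  const ev = value as Record<string, unknown>;

  // Header: known type plus non-empty runId.
  if (ev.type !== 'ship') return false;
  if (typeof ev.runId !== 'string' || ev.runId.length === 0) return false;

  // Closed-enum membership instead of a bare typeof === 'string'.
  const status = ev.status;
  if (typeof status !== 'string' || !SHIP_STATUSES_SKETCH.some((s) => s === status)) return false;

  // Number.isFinite rejects NaN and Infinity, which typeof === 'number' lets through.
  if (typeof ev.composite !== 'number' || !Number.isFinite(ev.composite)) return false;
  if (typeof ev.round !== 'number' || !Number.isFinite(ev.round)) return false;

  // Presence only: the log does not state the types of these two fields.
  return ev.artifactRef !== undefined && ev.summary !== undefined;
}
```

With a guard of this shape, a recorded line such as `{"type":"ship","runId":"r"}` is rejected before it reaches the reducer.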
Dongsen	8e6250f2c1	feat(web): render GFM tables in markdown artifact and chat renderers (#1496) * feat(web): render GFM pipe tables in markdown artifact and chat renderers Both hand-rolled markdown parsers (artifact preview and assistant chat) ignored pipe-table syntax, so GFM tables collapsed into a single paragraph. Add a table block to each parser that: - detects header+alignment+body rows; - honors :--- / :---: / ---: column alignment; - preserves inline formatting (code, bold, italic, links) inside cells; - restores escaped \| as a literal pipe; - breaks a preceding paragraph at a table start without a blank line. Closes #1495 * fix(web): use scan-based table cell splitter, drop NUL placeholder Address review on #1496: - lefarcen (P1): the placeholder approach in both splitters was unsafe. artifacts/markdown.ts embedded literal NUL bytes around 'ODPIPE', which flipped the whole file to binary in GitHub diffs and hid the patch from review. runtime/markdown.tsx used a printable ' ODPIPE ' sentinel that could collide with real cell content. - codex (P2): both splitters ran before inline parsing, so a literal '|' inside a backtick code span (e.g. a TypeScript union cell like `"ready" | "done"`) was treated as a column boundary and shredded the row. Both issues collapse to one root cause: splitting by regex on '|' without knowing what's a code span and what's an escape. Replace the placeholder-then-split routine in both files with a single character-level state-machine scanner that: - treats '|' inside backticks as cell content (tracks inCode); - resolves '\|' to a literal '|' as it accumulates each cell; - strips a single optional leading '|' and a single unescaped trailing '|' as row terminators rather than empty cells. No placeholders, no NUL bytes, no collision surface. Add a test in each test file covering the GFM-style pipe-inside-code-span cell, and the runtime test additionally asserts the row produces exactly two <td> cells (i.e. 
no phantom column from the embedded pipe). * style(web): make markdown table apply in artifact viewer + add header bg, borders The prior `.md-table` rules were scoped to `.prose-block`, which only wraps the chat assistant path (AssistantMessage). The artifact preview in FileViewer renders into `.markdown-rendered`, so tables in long-form documents (the most common case) inherited no styling and showed as borderless, structureless rows. Lift the rule out of `.prose-block` so both paths share it. Visual treatment matches the existing prose-block ecosystem (blockquote, markdown-status): - `var(--border)` for cell + outer borders, 6px outer radius. - `var(--bg-panel)` on `<th>` for the header band — same token already used by blockquote and code-copy chip. - Even-row zebra via `color-mix(in oklab, var(--bg-panel) 40%, transparent)` for a very faint stripe that holds up in both light and dark themes (no hardcoded color). - Padding bumped from 4×8 to 8×12; cells were touching before. - `display: block; overflow-x: auto` kept so wide tables scroll inside narrow columns instead of pushing layout. * fix(web): wrap md-table in scroll container so columns fill width Previously `.md-table` itself was `display: block; overflow-x: auto` to let wide tables scroll horizontally. The side effect: a block-mode table no longer behaves like `<table>` for layout, so columns collapse to natural content width and leave empty space on the right when the content is narrower than the container. Split the two concerns: the new `.md-table-wrap` <div> owns the scroll viewport (border, radius, overflow-x), and the inner `<table>` keeps its native `display: table` with `width: 100%` so column widths distribute across the available space again. Wide tables still scroll; narrow tables now fill the container. Both render paths (chat runtime + artifact HTML) emit the wrapper, and the corresponding unit tests assert the nested shape.	2026-05-13 21:15:23 +08:00
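The scan-based cell splitter described in the commit above can be sketched as a small character-level state machine. This is an illustration of the stated rules only (a pipe inside backticks is cell content, a backslash-escaped pipe becomes a literal pipe, one leading and one unescaped trailing pipe are row terminators); `splitTableRowSketch` is a hypothetical name, not the function in either parser.

```typescript
// Split one GFM table row into trimmed cell strings without placeholders.
function splitTableRowSketch(line: string): string[] {
  const text = line.trim();
  const cells: string[] = [];
  let current = '';
  let inCode = false;             // true while inside a backtick code span
  let endedOnSeparator = false;   // true when the last char was an unescaped pipe

  // A single optional leading pipe is a row terminator, not an empty cell.
  let i = text.startsWith('|') ? 1 : 0;

  for (; i < text.length; i++) {
    const ch = text.charAt(i);
    endedOnSeparator = false;
    if (ch === '\\' && text.charAt(i + 1) === '|') {
      current += '|';             // escaped pipe resolves to a literal pipe
      i++;                        // skip the pipe we just consumed
    } else if (ch === '`') {
      inCode = !inCode;
      current += ch;
    } else if (ch === '|' && !inCode) {
      cells.push(current.trim()); // unescaped pipe outside code ends the cell
      current = '';
      endedOnSeparator = true;
    } else {
      current += ch;
    }
  }

  // A trailing unescaped pipe closes the row; otherwise flush the last cell.
  if (!endedOnSeparator) cells.push(current.trim());
  return cells;
}

console.log(splitTableRowSketch('| state | `"ready" | "done"` |').length);
```

Because the scanner tracks code spans itself, it needs no sentinel string and so has nothing that could collide with real cell content.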
pftom	d3d95121f3	feat(plugins): enhance visual score sorting and add new example templates - Updated the `sortByVisualAppeal` function to prioritize featured ranks, ensuring that curated plugins are displayed prominently. - Added tests to verify the new sorting logic, ensuring that plugins with numeric featured ranks are sorted correctly ahead of others. - Introduced new example templates for a magazine article layout, a Twitter share card, and a Xiaohongshu card, expanding the available options for users. - Enhanced the overall plugin preview experience by integrating these new templates, providing users with more visually appealing and functional examples. This update significantly improves the plugin sorting mechanism and enriches the template offerings, enhancing user engagement and experience.	2026-05-13 21:02:05 +08:00
pftom	8b2d48a258	feat(daemon, web): enhance plugin preview handling and add new templates - Introduced logic to assemble example slides with a companion template when the declared entry is missing, improving the user experience for plugin previews. - Updated the server logic to handle special cases for `example-slides.html`, ensuring proper fallback to `template.html` when applicable. - Enhanced tests to verify the new preview assembly functionality and ensure correct rendering of fallback content. - Added new HTML and Markdown examples for various skills, including a magazine article layout and a Twitter share card, expanding the available templates for users. This update significantly improves the plugin preview experience, providing users with more robust and visually appealing fallback options.	2026-05-13 20:58:24 +08:00
Caprika	06dbde51f9	[codex] Add Cursor Agent auth diagnostics (#1538) * Add Cursor Agent auth diagnostics * Handle Cursor not logged in auth status * Address Cursor auth review feedback * Classify Cursor stdout auth failures	2026-05-13 20:25:34 +08:00
Caprika	a3276ec542	[codex] Add visual draw annotation context (#1547) * feat(web): add visual draw annotation context * Fix visual draw annotation staging * Fix concurrent visual annotation IDs	2026-05-13 20:02:19 +08:00
Sid	fb545b8d21	fix(web): count nested .slide elements in deck preview bridge (#1542) * fix(web): count nested .slide elements in deck preview bridge Generated HTML decks commonly nest .slide elements under an extra wrapper rather than placing them as direct children of .deck / .deck-stage / .deck-shell / body. The bridge then reported count: 0 and the preview toolbar showed 1 / 0 even though the deck visibly contained slides and its own keyboard handler navigated them. Keep the structured selectors first so decorative .slide markup in non-deck pages is not counted, and fall back to all .slide only when nothing structured matched. Pinned with a regression test that runs the extracted deck bridge script against a JSDOM nested-slide layout. Fixes #1530 * test(web): pin deck bridge structured-first precedence with decoy .slide Address review feedback on #1542: the existing structured-first case expected count === 3 but the fixture only contained the three real slides, so a regression that dropped the structured selector entirely and went straight to `.slide` would still pass. Add a decoy `.slide` in a sibling <header> outside `.deck`; the structured-first pass keeps count === 3, while a naive broad selector would now count 4. Verified by temporarily collapsing slides() to `document.querySelectorAll('.slide')` and observing the second case fail with `expected 4 to be 3`.	2026-05-13 19:42:20 +08:00
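The structured-first, broad-fallback counting described in the commit above could be sketched like this. The selector string is an assumption assembled from the containers named in the message (.deck, .deck-stage, .deck-shell, body); the bridge's actual selectors and its `slides()` helper are not shown in the log.

```typescript
// Minimal structural type so the sketch runs without a DOM;
// a browser `document` satisfies it.
interface QueryRootSketch {
  querySelectorAll(selector: string): ArrayLike<unknown>;
}

// Assumed structured selector: slides that are direct children of a known container.
const STRUCTURED_SLIDE_SELECTOR_SKETCH =
  '.deck > .slide, .deck-stage > .slide, .deck-shell > .slide, body > .slide';

function countSlidesSketch(root: QueryRootSketch): number {
  // Structured pass first, so decorative .slide markup elsewhere is not counted.
  const structured = root.querySelectorAll(STRUCTURED_SLIDE_SELECTOR_SKETCH);
  if (structured.length > 0) return structured.length;
  // Fallback only when nothing structured matched (for example, nested wrappers).
  return root.querySelectorAll('.slide').length;
}
```

In a browser this would be called as `countSlidesSketch(document)`; the decoy case from the follow-up test corresponds to a page where the structured pass finds 3 slides while the broad selector would find 4.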
Prantik Medhi	086be271d4	fix: hide preview chrome in source view (#1556) * fix: hide preview chrome in source view * fix: keep source-view edit controls	2026-05-13 19:12:00 +08:00
Prantik Medhi	660c5b88b4	fix(web): auto-scroll feedback form (#1566)	2026-05-13 19:06:08 +08:00
lefarcen	fa2bb59ab3	fix(test): align FileViewer tests with merged component - Drop FileViewer.inspect-empty-hint.test.tsx (deleted on main; should have been gone after the merge but the modify/delete conflict left it in the tree). - Take main's FileViewer.test.tsx so the manual-edit interaction tests match the merged FileViewer.tsx behaviour, then patch every <FileViewer> render with projectKind="prototype" so it satisfies the prop requirement release added in #1509.	2026-05-13 18:49:44 +08:00
lefarcen	4096d65125	fix(web): align Icon names and FileViewer tests with merged state - SettingsDialog: use 'sparkles' for Pet section nav icon (main's choice; the merge picked up release's 'paw' which is not in main's IconName union). - FileViewer.test.tsx: take release's version which passes projectKind on every render (release added that prop in #1509 analytics; main's tests predate that prop). - FileViewer.manual-edit{,-history}.test.tsx: keep main's tests but pass projectKind="prototype" so they satisfy the merged FileViewer Props.	2026-05-13 18:40:48 +08:00
lefarcen	5172e37217	Merge origin/main into release/v0.7.0 to prepare merge-back PR Resolves 7 conflicts via hybrid strategy: - apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature) - apps/web/src/components/Icon.tsx: take main (switch-case refactor) - apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance) - apps/web/src/index.css: take main (project-target-platforms / instructions chip styles) - apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion - nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics call sites), and #1540 (release light background) are preserved via auto-merge.	2026-05-13 18:19:47 +08:00
Prantik Medhi	9040088f1c	fix(web): remove redundant bulk-select button (#1550)	2026-05-13 16:38:14 +08:00
kami	4f76e836ae	feat(audio): add ElevenLabs audio support (#1384) * docs: add ElevenLabs audio support design * docs: add ElevenLabs audio implementation plan * feat(daemon): add ElevenLabs speech renderer * feat(daemon): add ElevenLabs sound effects renderer * fix(daemon): preserve ElevenLabs sfx durations * feat(web): expose ElevenLabs media providers * feat(daemon): document ElevenLabs audio contract * feat(audio): add ElevenLabs voice selection * chore: ignore superpowers scratch docs * fix(daemon): cache ElevenLabs voice options * fix(audio): expand ElevenLabs voice and SFX selection * fix(audio): align ElevenLabs SFX controls * fix(audio): tighten ElevenLabs SFX prompt budget * fix(audio): preflight ElevenLabs SFX prompt length * fix(audio): surface ElevenLabs lookup failures * fix(audio): sanitize ElevenLabs prompt errors	2026-05-13 15:53:41 +08:00
PerishFire	11b4750677	Update release light background (#1540)	2026-05-13 15:36:22 +08:00
Rocky	cd68c8a80a	fix(daemon+web): emit conversation-created SSE event when routine run starts (#1523 ) * fix(daemon+web): emit conversation-created SSE event when routine run starts When a Routine fires in "Reuse an existing project" mode, the daemon creates a new conversation in the project and writes a queued/running assistant message to the database, but the open `ProjectView` has no way to learn that anything happened: the project events SSE stream only carries `file-changed` and `live_artifact` events, and `ProjectView` reloads conversations only when `project.id` changes. The result is the user's own routine "Run now" appears to do nothing until they exit and re-enter the project (#1361). Fix: - Add a `conversation-created` payload type to the existing project events stream in `apps/web/src/providers/project-events.ts`. The payload carries `projectId`, `conversationId`, `title`, and `createdAt`. Mirror the existing `file-changed` listener pattern with explicit malformed-payload handling. - In `apps/daemon/src/server.ts`, after `insertConversation` runs in the routine `setRunHandler` (both reuse-an-existing-project and new-project paths), broadcast a `conversation-created` event through the existing `activeProjectEventSinks` map. The function body was already generic so it was renamed from `emitProjectLiveArtifactEvent` to `emitProjectEvent` and the two pre-existing callers updated. - In `apps/web/src/components/ProjectView.tsx`, when `handleProjectEvent` receives a `conversation-created` event whose `projectId` matches the currently-viewed project, refetch the conversation list via `listConversations`. The active conversation is intentionally NOT changed — per maintainer guidance on #1361, auto-switching is a separate UX decision left for a follow-up. - `projectEventToAgentEvent` returns null for `conversation-created` so it doesn't get routed into the live-artifact path. 
Tests (`apps/web/tests/providers/project-events.test.ts`): - A single `conversation-created` event reaches the consumer with the parsed payload. - Two consecutive `conversation-created` events from concurrent routine runs both reach the consumer (covers the multiple-concurrent-runs case reported in #1502). - Malformed `conversation-created` payloads are swallowed without throwing, matching the existing `file-changed` / `live_artifact` defensive behavior. Manual verification: - Built locally with `pnpm exec tools-pack mac build --to app --portable` and installed. - Created a routine in `Reuse an existing project` mode targeting an existing project. - With the project view open, clicked `Run now`. The new "Routine" conversation appeared in the project's conversation list within about a second, without exiting and re-entering the project, and the active conversation was not changed. - Clicked `Run now` twice in quick succession; both new conversations appeared in the list, covering the concurrent-runs case in #1502. - `pnpm guard` and `pnpm --filter @open-design/web typecheck` clean; full web test suite is 1016/1016 passing. Fixes #1361 * fix(#1523 review): share SSE type via contracts; guard conversation refresh against project-switch and reordering races Addresses Codex P1 + lefarcen P2 inline review on #1523 (#1361): 1. Move `ProjectConversationCreatedSsePayload` to `@open-design/contracts` (`packages/contracts/src/sse/chat.ts`) so the daemon producer and the web consumer share one type. The web provider re-exports it under the local `ProjectConversationCreatedEvent` name to keep the existing import shape stable for callers; the daemon emit site picks up the same shape via a JSDoc typedef so producer and consumer can't drift as this stream grows. (Addresses lefarcen P2 on project-events.ts:17.) 2. 
Guard the `conversation-created` async refresh in `ProjectView` against two distinct races: - Project-switch race: capture `project.id` at dispatch time and re-check it via a live `projectIdRef` after `listConversations` resolves; bail if the user switched projects while the request was in flight. The existing project-load effects use the same cancellation pattern. (Addresses Codex P1 on ProjectView.tsx:767.) - Concurrent-refresh re-ordering race: bump a monotonic `conversationsRefreshTokenRef` on every dispatch and capture each request's token; only the request whose captured token still equals the live ref at await-return applies its result. Two rapid `conversation-created` events (the #1502 concurrent Run-now case) can no longer drop the newest conversation when the earlier request resolves last with a stale, shorter list. (Addresses lefarcen P2 on ProjectView.tsx:767.) Both guards are documented inline with comments that point back at the review threads. The existing project-events tests (single delivery, concurrent delivery, malformed payloads) are unchanged — the new guards are defensive logic on the consumer, not new event shapes. `pnpm guard`, `pnpm --filter @open-design/web typecheck`, `pnpm --filter @open-design/daemon typecheck`, and the full web test suite (1016/1016) remain green.	2026-05-13 14:50:58 +08:00
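The two guards described in the review-fix commit above (captured project id plus a monotonic refresh token) can be sketched outside React as one small class. The class and method names are hypothetical; in `ProjectView` the same state is described as living in `projectIdRef` and `conversationsRefreshTokenRef`.

```typescript
// What each in-flight refresh captures at dispatch time.
interface RefreshTicketSketch {
  projectId: string;
  token: number;
}

// Hypothetical stand-in for the two refs used by the component.
class ConversationRefreshGuardSketch {
  private liveProjectId: string;
  private liveToken = 0;

  constructor(projectId: string) {
    this.liveProjectId = projectId;
  }

  // Called when the user navigates to another project.
  switchProject(projectId: string): void {
    this.liveProjectId = projectId;
  }

  // Called on every conversation-created event, before the list request starts.
  begin(): RefreshTicketSketch {
    this.liveToken += 1;
    return { projectId: this.liveProjectId, token: this.liveToken };
  }

  // Called after the request resolves: apply only if the project is unchanged
  // and no newer refresh has been dispatched since this one started.
  shouldApply(ticket: RefreshTicketSketch): boolean {
    return ticket.projectId === this.liveProjectId && ticket.token === this.liveToken;
  }
}

// Two rapid events: the earlier request resolving last must not overwrite the list.
const guard = new ConversationRefreshGuardSketch('project-a');
const first = guard.begin();
const second = guard.begin();
console.log(guard.shouldApply(first), guard.shouldApply(second));
```

The token check drops stale results rather than cancelling requests, so the newest request always decides the final list regardless of the order in which responses arrive.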
pftom	9e196d34af	feat(daemon, web): enhance plugin sharing workflows and UI components - Updated the plugin sharing prompts to utilize local daemon endpoints for publishing to GitHub and contributing to Open Design, streamlining the user experience. - Refactored the `PluginsView` and `PluginShareMenu` components to support new sharing functionalities, including confirmation modals and improved link handling. - Enhanced the CSS styles for the plugin share confirmation modal and related UI elements for better visual consistency. - Added tests to verify the functionality of the new sharing workflows and ensure proper integration within the existing plugin management system. This update significantly improves the plugin sharing experience, making it easier for users to publish and contribute their plugins effectively.	2026-05-13 14:35:09 +08:00
Sid	eda182c8a1	refactor(web): UI polish for v0.7.0 — neutralised palette, official brand glyphs, lucide (#1522 ) * refactor(web): adopt lucide-react for the inline Icon component The hand-rolled `<Icon>` set drifted in stroke weight and proportion across its 50+ glyphs as new icons were added. Swap the implementation to dispatch to `lucide-react` while keeping the same `<Icon name="..." size={X} />` API so the 246 existing call sites stay untouched. - Adds `lucide-react` as a dependency (tree-shaken; ~30KB gzipped for the ~50 icons we actually import). - `discord` and `x-brand` keep their bespoke inline SVG paths since lucide intentionally does not ship brand artwork. - `spinner` continues to use the existing `.icon-spin` className for its rotation; under the hood it now renders lucide's `Loader2`. - New `paw` glyph (lucide `PawPrint`) so the Pets nav item stops sharing the `sparkles` icon with External MCP. No behaviour change: the prop surface is identical, fill follows `currentColor` exactly as before, and aria-hidden / focusable defaults are preserved. Visual deltas are limited to the strokes themselves (slightly finer endcaps, more consistent baseline weights) — exactly the consistency upgrade lucide gives us. * feat(web): bundle official brand assets for agent icons `AgentIcon` previously approximated each agent's brand with hand-drawn SVG (orange Anthropic-ish sparkle, OpenAI-knot ellipses, etc). Replace those approximations with the real, vendor-published artwork shipped as static assets under `apps/web/public/agent-icons/`. - 13 SVG marks sourced from `@lobehub/icons-static-svg` (MIT) — color variants where the vendor published one (Claude, Codex, Gemini, Copilot, Qwen, Qoder, DeepSeek, Kimi, Mistral/Vibe), monochrome marks for the rest (Cursor, OpenCode, Hermes, MiMo, Pi, Kilo). - 1 PNG mark (Devin) sourced from devin.ai/icon.png, resized to 96×96 via `sips` since Cognition doesn't publish an SVG. 
- Each SVG was cleaned (stripped `<title>` brand text and the library's internal `style="flex:none;..."` ; dropped `width/height="1em"` so `viewBox` governs sizing) and run through `svgo --multipass`. Total bundle footprint: ~36 KB for all 17 files, only loaded on the agent cards that render them. - `AgentIcon` now resolves brands via a small `ICON_EXT` table and renders `<img src="/agent-icons/<id>.<ext>">`. Agents without an asset (`devin` is the lone outlier removed in this commit because PNG; new agents with no shipped artwork at all) fall back to an initial-letter pill that reads as "no official mark yet" rather than inventing brand artwork. - Removes the `simple-icons` dependency from a previous iteration since `AgentIcon` was its only consumer. Public-API stable: `<AgentIcon id={a.id} size={X} />` still accepts the same prop shape; `AvatarMenu`'s small-size usage continues to work. * refactor(web): polish entry view + Settings dialog UI for v0.7.0 A sweep over the two surfaces that have the most visual surface area in the app (the entry sidebar / New Project panel on the left, and the Settings modal). The work converged on a single neutral palette + a small set of shared dimensional standards documented in CSS, so future sections that get added slot into the same rhythm. New Project panel (apps/web/src/components/NewProjectPanel.tsx + .newproj* rules in index.css) - Adds a spec comment block at the top of the .newproj rules listing the canonical heights (input 30, dropdown 38, compact toggle 36, popover item 38) and the neutral colour rules. - Rebuilds PlatformPicker as a DS-picker-style dropdown trigger + popover (the previous 6-card 2×3 grid was ~280px tall; the dropdown collapses to a single 38px row with the same multi-select semantics). - Replaces SurfaceOptions' two heavy `ToggleRow` cards with the new compact one-line `CompactToggle`; the descriptive hint moves to a native `title` tooltip. 
- Compresses the Fidelity card grid (thumb aspect 16/7 → 16/5, tighter padding, smaller label). - Neutralises every selected/active state inside the panel: removes the orange accent fills and rings from `.newproj-card.active`, `.newproj-title-badge`, `.compact-toggle.on`, `.toggle-row.on`, the DS picker popover items + radio/check marks, the trigger open border and shadow, and the search-bar background. The Create CTA stays the only orange element on the panel. - Aligns the project-name input focus state across the sidebar: border `var(--text)` + 8% black halo (rgba is written out because the CSS pipeline collapses `color-mix(... 8%, transparent)` down to a solid `var(--text)`, which would render as a 3px solid black band). - Switches the body card from `flex: 1 1 auto` to `flex: 0 1 auto` so a short form variant doesn't leave a white void at the bottom of the card, and disables overscroll-bounce on the card so a fast scroll doesn't briefly expose the page-level gray under the white surface. - Pins the privacy footer below the card with a fixed 0 margin-top + shorter padding-top so it reads as a label of the card rather than a centred dialog footer. Entry sidebar footer (apps/web/src/components/EntryView.tsx + .entry-side-foot* rules) - Replaces the X social pill's `external-link` placeholder glyph with a bespoke filled `x-brand` SVG that mirrors the `discord` mark already in the icon set. - Wraps Discord + X in `.entry-side-foot-social` and lets that group flex-margin to the right of the row, so the two social pills read as a tight pair instead of a fourth pill stuck to the Pet pill. - Drops the "unadopted" red dot on the Pet pill (it duplicated the call to action that the label already carried). - Shrinks the footer icons to 10px and dims them to 55% / 75% opacity on hover so the labels are clearly the focal point — `currentColor` on the lucide-rendered SVGs would otherwise make the glyphs full black on hover. 
- Tightens the env-pill version text cap (180 → 142) so the top row ends close to the right edge of the Language + Pet group below it. Settings dialog (apps/web/src/components/SettingsDialog.tsx + .modal-settings / .settings-* / .seg-* / .agent-* rules) - Removes the "SETTINGS" kicker eyebrow above each section title (the big-typography title and modal context already make it redundant). - Switches the sidebar from a card-per-item layout to ChatGPT-style single-line pills: hides the `<small>` description, swaps the sidebar bg from gray to white, makes the active item a gray pill (no border, no shadow) so all items keep a consistent row height regardless of state. - Drops the modal-body's top border (already separated by the whitespace between modal-head and the body grid) and pins `.modal-settings { height: min(720px, 100vh - 64px) }` so the dialog no longer resizes when the user switches between short and long sections. - Compresses the Local CLI / BYOK seg-control from a 2-line ~52px card pair to a 1-line ~42px segmented pill that height-matches the active sidebar nav-item, and aligns the `.settings-content` padding-top with `.settings-sidebar` (22 → 16) so the first content row sits level with the first sidebar item. - Neutralises agent-card selected state, install/docs link colour, and protocol-chip active state — same accent-stripping pattern as the New Project panel. - Uniform agent-card height via `min-height: 64px` so installed cards (icon + name + version) align with unavailable cards (icon + name + not-installed + Install/Docs row). No prop-API changes, no business-logic edits — this is a pure visual refactor. Existing tests, providers and daemon contracts are untouched.	2026-05-13 13:59:19 +08:00
Jesse Yu	b2841f6045	Fix user chat message bubble styling (#1517) Co-authored-by: Haoyuan Yu <haoyuan.yu@shopee.com>	2026-05-13 13:59:15 +08:00
Caprika	6736310a01	Implement manual edit inspector (#1448 ) * feat(web): tweaks palette popover with HSL hue-shift recoloring Adds a Tweaks color-palette popover to the HTML preview toolbar. Selecting a palette re-skins the iframe in place via a srcDoc-side bridge that walks the DOM and shifts every chromatic paint to the target hue while preserving each color's saturation and lightness — pale tints stay pale, bold CTAs stay bold, just in the new color family. Mono-noir desaturates instead of shifting. - runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options - file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected - FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration - PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir - PreviewDrawOverlay: stub pass-through until the draw branch lands * feat(web): hide finalize-design toolbar from project header * test(e2e): skip project actions toolbar flow after toolbar removal * Polish manual edit inspector * Implement manual edit inspector * Fix manual edit review regressions * Fix FileViewer CI regressions * Fix remaining manual edit review issues * Flush manual edit styles before draw exit * Restore Critique Theater styles * Accept pixel line-height manual edits --------- Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-05-13 13:25:58 +08:00
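The hue-shift recoloring described in the first bullet of the commit above can be sketched as follows. This is a minimal illustration and not the code from `runtime/srcdoc`: the `Hsl` type, `shiftHue`, `desaturate`, and the chroma threshold are hypothetical names and values chosen for the example.

```typescript
// Hypothetical HSL triple: h in degrees [0, 360), s and l in percent [0, 100].
type Hsl = { h: number; s: number; l: number };

// Assumed cutoff: colors below this saturation are treated as gray and left untouched.
const CHROMA_THRESHOLD = 5;

// Move a chromatic color to the target hue while keeping its saturation and lightness,
// so pale tints stay pale and bold colors stay bold.
function shiftHue(color: Hsl, targetHue: number): Hsl {
  if (color.s < CHROMA_THRESHOLD) return { ...color };
  const h = ((targetHue % 360) + 360) % 360; // normalize negative or oversized hues
  return { h, s: color.s, l: color.l };
}

// A mono-noir style palette drops saturation instead of changing hue.
function desaturate(color: Hsl): Hsl {
  return { ...color, s: 0 };
}

const paleTint: Hsl = { h: 20, s: 60, l: 92 };
const boldCta: Hsl = { h: 20, s: 90, l: 50 };
console.log(shiftHue(paleTint, 210), shiftHue(boldCta, 210), desaturate(boldCta));
```

Working in HSL keeps the mapping to a single channel swap, which is why only `h` changes in `shiftHue` and only `s` changes in `desaturate`.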
pftom	c9cc3b88c0	feat(web): standardize plugin terminology and enhance UI components - Updated terminology from "Community" to "Official" across various components to reflect first-party plugin status. - Enhanced the ChatComposer, HomeHero, and PluginsHomeSection components to improve user experience and clarity in plugin management. - Improved CSS styles for better visual consistency and layout across plugin-related interfaces. - Added tests to ensure proper functionality and visibility of official plugins in the UI. This update reinforces the distinction between official and user-installed plugins, enhancing the overall user experience in plugin interactions.	2026-05-13 12:19:29 +08:00
nettee	0f0d2879ff	Make de/fr/ru content i18n optional (#1511)	2026-05-13 12:17:17 +08:00
lefarcen	dc7791ef9d	feat(analytics): add project_id + project_kind to studio/artifact events (#1509) Product tracking doc 260513 added project_id + project_kind to studio_view (artifact), studio_click (share_option), and artifact_export_result. The Studio funnel can now group by project type without joining run_created on the back end. - contracts: 3 props gain required project_id + project_kind - ProjectView → FileWorkspace → FileViewer: thread projectKind down, converting metadata.kind via projectKindToTracking once at the top - FileViewer + HtmlViewer: populate the three call sites	2026-05-13 12:13:55 +08:00
Siri-Ray	c16297f10c	Refine preview and project dropdown controls (#1514) * Refine preview and project dropdown controls * fix(web): gate OS widget metadata Generated-By: looper 0.7.4 (runner=fixer, agent=codex) * fix(web): mark platform picker listbox multi-select Generated-By: looper 0.7.4 (runner=fixer, agent=codex)	2026-05-13 12:13:31 +08:00
Nagendhra Madishetti	e2f409579d	docs: Critique Theater Phase 14 (user guide + 2 AGENTS module maps) (#1319 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. 
prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. 
- intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. 
Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. 
* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. 
Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. 
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. 
Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. 
The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. * feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. 
A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. 
FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. * docs: tighten Phase 14 reasoning from lefarcen review (PR #1319) Four content gaps lefarcen flagged in the Phase 14 docs review, addressed inline rather than deferred. The fifth item (scope-drift between 'docs only' PR body and the cumulative stacked diff) is handled by rewriting the PR body, not the docs. 1. Round exit conditions (lefarcen P2-1). docs/critique-theater.md §2 'Auto-converging rounds' now lists the five conditions that stop a run (threshold reached, round budget exhausted, per-round timeout, total timeout, user interrupt) with their default values. A user debugging a run that stopped at round 1 with composite 5.4 can read this list and find the matching cause without spelunking the orchestrator. 2. Prior-art comparison (lefarcen P2-2). New §1.5 'Why an in-CLI panel and not a third-party design lint' pre-answers the 'why not Figma lint / Adobe checker / Material You conformance' question. Three differences: rule engines vs generative reviewers, post-hoc vs in-loop, external service vs same-CLI-session. 3. Composite formula rationale (lefarcen P2-4). 
§2 now explains why each weight is set the way it is: critic gates correctness so it gets 0.4; brand / a11y / copy are secondary quality dimensions at 0.2 each; designer is at 0.0 in v1 because aesthetic preference is not a ship gate. The slot stays in the schema so notes flow into the transcript and a v2 config release can bump the weight without a wire-shape change. 4. v2 cast-config ownership (lefarcen P2-3). Both AGENTS.md files (daemon + web) now declare a 'Designer weight frozen at 0.0 until v2 cast config' invariant. The daemon side calls out where the SKILL.md frontmatter resolver lands (apps/daemon/src/critique/config.ts); the web side calls out where the Settings surface lands (apps/web/src/components/ Settings/). A contributor reading either AGENTS.md before implementing v2 sees which module to touch first. * docs(web): mirror the Designer-weight invariant in Theater AGENTS.md (PR #1319) lefarcen P1 follow-up on PR #1319: the daemon AGENTS.md already declares 'Designer weight is frozen at 0.0 until v2 cast config lands' as an invariant, but the web AGENTS.md's parallel bullet led with 'Composite weights are read-only on the web side' which buried the Designer-specific constraint. A web contributor reading that bullet would not realise the v1 weight distribution is wire-shape (changing it mid-v1 invalidates persisted critique_runs composite values). Rewrote the bullet to lead with the same 'Designer weight is frozen at 0.0 until v2 cast config lands' phrasing the daemon side uses, and added an explicit cross-link to the daemon AGENTS.md so the two halves of the invariant read as one rule. Web-side specifics retained: ScoreTicker / TheaterCollapsed read composite off the wire (no client recompute), v2 lands as a Settings surface at apps/web/src/components/Settings/, do not add a 'weights' prop to any component in this directory until the contracts package carries the v2 cast type. 
* docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319) Two carryover items lefarcen flagged across the PR #1319 + #1320 reviews. 1. docs/critique-theater.md was sending users to /api/metrics/critique as the conformance-status check on malformed_block, but the Phase 12 metrics endpoint is explicitly deferred until after orchestrator wiring lands. Replaced the link with the pnpm conformance-harness command that DOES exist today (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) and noted that the dashboard surfaces this status as a series once Phase 12 ships. 2. apps/web/src/components/Theater/AGENTS.md module map was stale after Phase 15: the index.ts row said 'only two hooks are exported' but the barrel now exports useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled setter). Updated the row to list all three hooks + the setter + the reducer-derived contract types, and added a new row for hooks/useCritiqueTheaterEnabled.ts in the file table so a web contributor scanning the table sees the new hook without inferring it from the index.ts blurb. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 12:11:48 +08:00
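The composite formula documented in the Phase 14.1 notes of the commit above (designer 0, critic 0.4, brand 0.2, a11y 0.2, copy 0.2) works out as in the sketch below. The weights are taken from the commit message; the `Role` and `RoleScores` types and the `composite` function are hypothetical names. Per the AGENTS.md invariant quoted in the same commit, the web side reads the composite off the wire rather than recomputing it, so this is only an arithmetic illustration.

```typescript
type Role = "designer" | "critic" | "brand" | "a11y" | "copy";
type RoleScores = Record<Role, number>;

// v1 weights as stated in the commit message; designer is held at 0.
const WEIGHTS: RoleScores = { designer: 0, critic: 0.4, brand: 0.2, a11y: 0.2, copy: 0.2 };

// Weighted sum of per-role scores (hypothetical helper, not the daemon's implementation).
function composite(scores: RoleScores): number {
  return (Object.keys(WEIGHTS) as Role[]).reduce(
    (sum, role) => sum + WEIGHTS[role] * scores[role],
    0,
  );
}

// Example scores are made up for the illustration.
const example: RoleScores = { designer: 3, critic: 8, brand: 7, a11y: 9, copy: 6 };
console.log(composite(example).toFixed(1));
```

Because the designer weight is 0, changing only the designer score leaves the composite unchanged, which matches the stated rationale that aesthetic preference is not a ship gate in v1.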
pftom	fcc3ae5838	feat(web): enhance HomeHero and related components for improved context selection and visibility handling - Updated the HomeHero component to support skill and MCP server mentions, allowing users to select these options seamlessly. - Improved CSS styles for the HomeHero component, enhancing the visual presentation of active selections and context tabs. - Refactored visibility handling for slides in the deck framework, ensuring proper display logic and preventing visibility issues with variant classes. - Added tests to verify the functionality of context selection and visibility handling, ensuring a smoother user experience. This update significantly enhances the user interface and interaction capabilities within the HomeHero component, improving the overall experience for users managing skills and presentations.	2026-05-13 11:48:15 +08:00
pftom	9006b74cde	feat(web): enhance ChatComposer and related components with skill and MCP server integration - Added support for skills and MCP servers in the ChatComposer, allowing users to apply skills and select MCP servers through mentions. - Updated the HomeHero and ProjectView components to manage skill selection and project skill changes. - Enhanced the EntryShell and other components to accommodate new skill and MCP server functionalities. - Improved CSS styles for better visual presentation of new features. - Added tests to ensure proper functionality of skill and MCP server integrations within the ChatComposer. This update significantly improves the user experience by enabling seamless integration of skills and MCP servers into the chat interface, enhancing project management capabilities.	2026-05-13 11:47:51 +08:00
pftom	f7a13c7b15	feat(web): enhance HomeHero component to support IME composition handling - Added a `composingRef` to track IME composition state, preventing submission during text composition. - Implemented `isImeComposing` function to check if the input is currently being composed. - Updated event handlers to ensure that submissions and plugin selections are blocked while composing. - Added tests to verify that submissions and plugin picks do not occur during IME composition, improving user experience for non-Latin input methods. This update enhances the input handling in the HomeHero component, ensuring a smoother experience for users utilizing IME for text input.	2026-05-13 11:26:10 +08:00
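A minimal sketch of the IME guard this commit describes, assuming the check combines the ref-tracked flag with the event's own composition signals. The `KeyLikeEvent` shape, the `composingFlag` parameter (standing in for `composingRef.current`), and `shouldSubmit` are hypothetical; only the `isImeComposing` name comes from the commit message.

```typescript
// Minimal event shape so the sketch does not depend on React or DOM typings.
interface KeyLikeEvent {
  key: string;
  isComposing?: boolean;
  keyCode?: number;
}

function isImeComposing(event: KeyLikeEvent, composingFlag: boolean): boolean {
  // keyCode 229 is what browsers report for keydown events handled by an IME.
  return composingFlag || event.isComposing === true || event.keyCode === 229;
}

function shouldSubmit(event: KeyLikeEvent, composingFlag: boolean): boolean {
  // Enter confirms a candidate while composing, so it must not submit.
  return event.key === "Enter" && !isImeComposing(event, composingFlag);
}
```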
lefarcen	e2952acd05	Revert "fix(web): restore consistent app header layout (#1432)" This reverts commit `3d3119333c`.	2026-05-13 11:20:16 +08:00
pftom	c36609c47d	feat(daemon, web): implement plugin sharing project creation and enhance CLI functionality - Added new flags for conversation, message, agent, and model in the CLI to support enhanced plugin sharing features. - Introduced a new API endpoint for creating share projects for plugins, allowing users to publish to GitHub or contribute to Open Design. - Updated the UI components to facilitate the new sharing functionalities, including prompts for user input during the sharing process. - Enhanced the project management system to handle new plugin share actions, improving user interaction and experience. - Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system. This update significantly enhances the plugin ecosystem by enabling users to share their creations more effectively and streamline collaboration.	2026-05-13 07:01:12 +08:00
Neha Prasad	342ba44383	fix memory extraction history affordance (#1447)	2026-05-12 13:35:34 -04:00
Siri-Ray	3d3119333c	fix(web): restore consistent app header layout (#1432 ) * docs: add NotebookLM GitHub export script (#1062) * docs: add NotebookLM GitHub export script * fix: make NotebookLM export TOC anchors work * fix: escape TOC link text markdown chars * fix: include merged PRs when exporting --prs all * fix: allow --prs merged mode * fix: treat --limit as total export budget * fix: avoid starving buckets under global --limit * fix: support --issues none and handle repos w/ issues disabled * fix: avoid underfilling export when buckets empty * fix: keep disabled-issues fallback quiet * fix: silence disabled issues fallback * fix: satisfy script typecheck * prevent duplicate saves and add template deletion (#1294) * prevent duplicate template entries on repeated save * add delete button to saved template list Templates can now be removed from the template picker via a hover x button, calling the existing DELETE /api/templates/:id endpoint. * add missing onDeleteTemplate prop in test fixtures * add template deletion flow test for NewProjectPanel * reject template names longer than 100 characters * preserve original createdAt on template update * feat: add FAQ page skill (#1162) * fix: set writable OD_DATA_DIR default for nix run Fixes #1157 When running via 'nix run github:nexu-io/open-design', the daemon attempted to create runtime state under the Nix store package path: /nix/store/.../lib/open-design/.od/projects The Nix store is read-only at runtime, causing startup to fail with ENOENT when mkdir() tried to create the projects directory. This commit updates the nix run wrapper to export OD_DATA_DIR with a writable default ($HOME/.od) when the variable is unset. Users can still override it by setting OD_DATA_DIR before running. The Home Manager and NixOS modules already set OD_DATA_DIR, so they are unaffected by this change. 
* feat: add FAQ page skill Add a new skill for generating Frequently Asked Questions pages with: - Collapsible accordion sections for Q&A pairs - Real-time search functionality - Category filtering (Billing, Account, Technical, General) - Smooth animations and transitions - Keyboard navigation support - Mobile-friendly responsive design - Semantic HTML with proper ARIA attributes The skill includes: - SKILL.md with triggers, workflow, and output contract - example.html demonstrating a complete FAQ page with 12 questions Use cases: help centers, support pages, product documentation * fix: address PR review feedback for FAQ page skill - Fix craft slugs: use accessibility-baseline and state-coverage instead of non-existent slugs - Remove overly broad 'questions and answers' trigger - Add edge case handling for insufficient/excessive FAQs - Remove search highlighting requirement (XSS risk) - Update self-check to reflect filtering instead of highlighting Addresses review comments from @lefarcen and @chatgpt-codex-connector * feat: add localized copy for faq-page skill Add German, French, and Russian translations for the FAQ page skill example prompt to fix validation test failure. - DE: FAQ-Seite mit Akkordeon-Abschnitten, Suchfunktion und Kategoriefilterung - FR: Page FAQ avec sections accordéon, recherche et filtrage par catégorie - RU: Страница FAQ со складными секциями-аккордеонами, поиском и фильтрацией * fix: escape apostrophe in French translation Use double quotes to avoid syntax error with d'auth * fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins (#1110) * fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins fnm legacy installations use ~/.fnm/node-versions. 
Closes #1102 * fix: remove stray .fnm token from type declaration * docs: add Windows troubleshooting guide (#478) (#1170) * docs: add Windows troubleshooting guide (#478) Add docs/windows-troubleshooting.md with step-by-step fixes for the most common native-Windows setup errors: - Node 24 / nvm-windows gotchas (fake nvm file in System32) - pnpm not found after installation - Build scripts blocked by pnpm 10 (better-sqlite3, sharp) - Visual Studio / gyp build errors - Starting the dev server - Optional OpenCode CLI setup Also update CONTRIBUTING.md and QUICKSTART.md to link to the new guide instead of the vague "file an issue if it doesn't" note. * docs: fix Windows guide command accuracy (#1170) Address all 6 inline review comments from lefarcen: - Pin npm-global pnpm install to @10.33.2 (matches packageManager field) - Use where.exe instead of bare where (PowerShell alias conflict) - Fix OpenCode package: opencode-ai (not opencode), binary is opencode - Add EPERM fallback note for corepack enable on protected installs - Add Python check for gyp ERR! find Python - Expand diagnostic checklist with corepack, python, execution policy Also remove redundant corepack pnpm --version from checklist. * feat(daemon): inject compiled design-system tokens + fixture into prompts (#1385) * feat(daemon): inject compiled design-system tokens + fixture into prompts Follow-up to #1231. The prior PR landed the structured form of two brands (`default` + `kami`) and codified the schema; this PR teaches the daemon to actually consume those files when assembling the system prompt, so agents stop having to re-derive token names from DESIGN.md prose every turn. Gated behind `OD_DESIGN_TOKEN_CHANNEL=1` for the smoke-test phase — flag-off keeps the daemon byte-equivalent to today's behavior, flag-on appends two new prompt blocks (the brand's `tokens.css` :root contract and its `components.html` reference fixture) right after the existing DESIGN.md block. 
Brands without those sibling files (every brand except `default` and `kami` today) skip silently in either mode. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(daemon): only swallow ENOENT/ENOTDIR in readFileOptional, rethrow rest Reviewer feedback (nettee, #1385). The prior catch-all hid permission errors, EISDIR, and broken packaged-resource paths behind the same "undefined = absent" branch the legacy ~138-brand fallback uses, which would let `OD_DESIGN_TOKEN_CHANNEL=1` silently degrade to the DESIGN.md-only prompt while reporting success. That corrupts the exact signal the smoke-test rollout depends on. Now `readFileOptional` only returns undefined for ENOENT / ENOTDIR (real "file does not exist" cases) and rethrows everything else. Added a focused test that plants a directory at the tokens.css path to exercise the EISDIR branch, plus a partial-presence regression test to confirm the stricter contract preserves the legacy fallback. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com> * feat(daemon): make connection-test timeouts configurable (#1222) * feat(daemon): make connection-test timeouts configurable Provider and agent connection tests had hardcoded 12s / 45s budgets, which are too tight for slow networks or distant providers (the user sees "timeout" in Settings with no way to extend the budget). - Add OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS (default 12_000) - Add OD_CONNECTION_TEST_AGENT_TIMEOUT_MS (default 45_000) - Invalid values (non-numeric, zero, negative, fractional) emit a console.warn and fall back to the default, so a typo in the env never silently disables the safety timeout. - Export resolveConnectionTestTimeoutMs for unit testing; cover the three resolution paths (fallback / honored override / invalid). 41 connection-test tests pass (+3 new), full daemon suite 1170/1170. 
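The timeout resolution described above could look roughly like the sketch below. It also includes the `setTimeout` upper bound that the follow-up fix in this same change adds. `resolveTimeoutMs` and its signature are hypothetical stand-ins for the exported `resolveConnectionTestTimeoutMs`, and treating an empty string as "unset" is an assumption.

```typescript
// Largest delay Node's setTimeout accepts without overflowing (2^31 - 1 ms).
const MAX_CONNECTION_TEST_TIMEOUT_MS = 2_147_483_647;

function resolveTimeoutMs(
  raw: string | undefined,
  fallbackMs: number,
  warn: (message: string) => void = console.warn,
): number {
  // Unset (or empty, by assumption) means "use the default" without a warning.
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const n = Number(raw);
  const valid = Number.isSafeInteger(n) && n >= 1 && n <= MAX_CONNECTION_TEST_TIMEOUT_MS;
  if (!valid) {
    // Non-numeric, zero, negative, fractional, or oversized: warn and fall back
    // so a typo never disables the safety timeout.
    warn(`invalid timeout override "${raw}", falling back to ${fallbackMs} ms`);
    return fallbackMs;
  }
  return n;
}
```

A caller would pass the raw environment value, for example the value of `OD_CONNECTION_TEST_AGENT_TIMEOUT_MS` with a fallback of `45_000`.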
* fix(daemon): reject connection-test timeout overrides above Node's setTimeout maximum Node's `setTimeout` silently clamps any delay above `2^31-1` ms (2_147_483_647) to ~1 ms with a TimeoutOverflowWarning. The previous `Number.isInteger(n) && n >= 1` check accepted oversized values unchanged and passed them straight to `setTimeout`, so an override that intended to raise the budget — e.g. `OD_CONNECTION_TEST_AGENT_TIMEOUT_MS=3000000000` — instead caused every connection test to fail almost immediately. The safety timeout was effectively disarmed. Add `MAX_CONNECTION_TEST_TIMEOUT_MS = 2_147_483_647` and switch the guard to `Number.isSafeInteger(n) && n >= 1 && n <= MAX...`. The boundary value is still accepted; one millisecond past it falls back with a warn. Regression test exercises `3_000_000_000`, `2_147_483_647`, and `2_147_483_648`. Addresses #1222 review feedback from @chatgpt-codex-connector, @mrcfps, and @lefarcen. * fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN SSRF bypass) (#1122) * fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN bypass) new URL('http://192.168.1.5./').hostname returns '192.168.1.5.' — the trailing dot is the RFC 1034 absolute-FQDN form and resolves identically to '192.168.1.5'. parseIpv4 fails on the dotted form, so 169.254.169.254. slips past the metadata-service block, 192.168.1.5. slips past the LAN block, and localhost. slips past the loopback identification. Strip trailing dots in normalizeBracketedIpv6 so all downstream checks (isLoopbackApiHost, isBlockedExternalApiHostname, isBlockedIpv4, IPv6 range tests) see the canonical form. Adds 6 vitest cases covering loopback FQDN forms (localhost., foo.localhost., 127.0.0.1.) and SSRF FQDN bypasses (169.254.169.254., 192.168.1.5., 10.0.0.5.). Refs nexu-io/open-design#1119 review feedback (P2 from @lefarcen). * test(connectionTest): tighten trailing-dot coverage per #1122 review Two issues from #1122 review: 1. 
(P2 from @mrcfps + codex bot) The original `foo.localhost.` case asserted error===undefined on validateBaseUrl, which only proves the URL passed validation — not that the host is identified as loopback. Replaced with direct isLoopbackApiHost(...) assertions on the actual loopback FQDN forms (localhost., 127.0.0.1., 127.0.0.5.) so the test exercises the loopback path the comment claims. 2. (P3 from @lefarcen) Original blocked-FQDN tests covered only 3 of 7 ranges that isBlockedIpv4 handles. Added a dedicated case per range (0.0.0.0/8, 10/8, 100.64/10, 169.254/16, 172.16/12, 192.168/16, multicast >=224) so future regressions in normalizeBracketedIpv6 surface against the full coverage. * docs: drop misleading foo.localhost./endsWith claim in normalizer comment @lefarcen review feedback: isLoopbackApiHost only accepts exact 'localhost', '::1', loopback IPv4, and mapped loopback IPv4 — there's no subdomain or endsWith handling, so referencing 'foo.localhost.' overstates what the trailing-dot strip enables. Rewrite the comment to match actual call sites (isLoopbackApiHost equality + isBlockedIpv4 numeric parse). 
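To make the trailing-dot bypass concrete, here is a small sketch under stated assumptions: `stripTrailingDots`, `parseIpv4`, and `isPrivateOrLinkLocalIpv4` are hypothetical helpers, not the daemon's actual `normalizeBracketedIpv6` / `isBlockedIpv4`, and only four of the blocked ranges are modelled.

```typescript
// Remove the RFC 1034 absolute-FQDN dot(s) so "192.168.1.5." parses like "192.168.1.5".
function stripTrailingDots(hostname: string): string {
  return hostname.replace(/\.+$/, "");
}

// Strict dotted-quad parser: returns null for anything that is not four octets.
function parseIpv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// Illustrative subset of blocked ranges: 10/8, 169.254/16, 172.16/12, 192.168/16.
function isPrivateOrLinkLocalIpv4(hostname: string): boolean {
  const octets = parseIpv4(stripTrailingDots(hostname));
  if (octets === null) return false;
  const [a = -1, b = -1] = octets;
  return (
    a === 10 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```

Without the strip, `parseIpv4("192.168.1.5.")` sees five dot-separated parts and returns null, which is the gap the fix closes.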
* feat(daemon): export self-contained HTML via /export/?inline=1 endpoint (#1312) test(daemon): add Red unit tests for inlineRelativeAssets helper 14 cases pinning the behavior contract for the upcoming apps/daemon/src/inline-assets.ts helper: - link/script inlining with verbatim body preservation - non-src script attrs preserved (type=module, defer, crossorigin) - relative path resolution (root + nested + deep-nested owners) - self-closing and single-quoted attr forms - negative cases: missing rel, rel=preload, absolute/data/blob/leading-slash - escaping: </style and </script inside body - null-fileReader graceful degradation - duplicate identical tags fully replaced (diverges from apps/web/src/components/FileViewer.tsx:5313's first-match-only; locked decision per plan §3.3) - HTML-escaped data-od-inline-asset attr Tests intentionally Red — module ../src/inline-assets.js does not yet exist. Phase B-G of plan declarative-roaming-gosling.md will turn them green by porting FileViewer.tsx:5248-5354 server-side. Refs nexu-io/open-design#368. * feat(daemon): port inlineRelativeAssets server-side for export endpoint Adds apps/daemon/src/inline-assets.ts — a pure helper that takes (html, ownerFileName, fileReader closure) and returns the HTML with every relative <link rel=stylesheet> and <script src> contents inlined into <style data-od-inline-asset="…">/<script>…</script> blocks. The fileReader closure keeps the helper free of fs/Express coupling so the route handler owns the filesystem boundary. Port source: apps/web/src/components/FileViewer.tsx:5248-5354 — five functions (inlineRelativeAssets, resolveProjectRelativePath, baseDirFor, readHtmlAttr, escapeHtmlAttr). The fetch hop becomes the fileReader closure; replace-all replaces first-match-only per locked design decision §3.3 (inline comment in inline-assets.ts cites the divergence from FileViewer.tsx:5313 and notes the web inline path is on a deprecation track since PR #384 made URL-load the default). 
Phase B-G of plan declarative-roaming-gosling.md. All 14 unit cases from the Red commit (`a60a9023`) now pass; tightens one case to use a realistic '&'-only filename (the original `<`/`>`-bearing filename was unreachable in real filesystems and exposed a regex limitation the web client carries too). Daemon delta: +14 tests (1704 → 1718). Typecheck clean. Refs nexu-io/open-design#368. * test(daemon): add Red integration tests for /export/?inline=1 route 9 HTTP cases against GET /api/projects/:id/export/?inline=1: - 3-file React-ish layout returns self-contained HTML (wiring guard: body assertions catch removal of the await inlineRelativeAssets(...) line, not just helper-internals changes) - missing inline / non-canonical values (0, false, foo, empty) → 400 - non-HTML file → 400 UNSUPPORTED_FILE_TYPE - missing file → 404 FILE_NOT_FOUND - invalid project id (..) → some 4xx (Express normalizes before route) - null-origin OPTIONS preflight → 204 + Access-Control-Allow-Origin: * - missing sibling asset → 200 with <link> tag intact, other asset inlined - nested HTML entry (pages/index.html + ../shared/util.js) → 200 inlined 8 of 9 tests Red (404 / 403); the invalid-project-id case is tolerant about how Express rejects .. so it accidentally passes Red — Green will tighten to 400 BAD_REQUEST via isSafeId. Phase C-R of plan declarative-roaming-gosling.md. C-G will register the route in apps/daemon/src/import-export-routes.ts. Refs nexu-io/open-design#368. * feat(daemon): wire GET /api/projects/:id/export/?inline=1 endpoint Adds the export-inline endpoint into registerProjectExportRoutes (import-export-routes.ts) alongside /export/pdf and /archive. 
The route: - Validates project id via ctx.validation.isSafeId - Requires ?inline=1 (accept-list: 1 / true / yes / on, matching Part 1's parseForceInline at file-viewer-render-mode.ts:59-66) - Reads the owner HTML via ctx.projectFiles.readProjectFile; maps ENOENT to 404 FILE_NOT_FOUND, everything else to 400 BAD_REQUEST - Gates non-HTML callers with 400 UNSUPPORTED_FILE_TYPE - Builds a fileReader closure that silently returns null on any sibling read failure (failure-local, not fatal; matches the web client's null-filter at FileViewer.tsx:5311) - Hands the buffer + relPath to inlineRelativeAssets and returns the result as text/html DI: RegisterProjectExportRoutesDeps gains 'projectFiles' | 'validation'; server.ts:2879 passes the corresponding deps. Mirrors the dep shape of RegisterFinalizeRoutesDeps used by PR #832's /finalize/anthropic. Null-origin support intentionally omitted (decision §10 in the PR description): the daemon's null-origin allowlist is /raw/ and /codex-pets/.../spritesheet only, and export consumers are same-origin UI or server-side tooling; sandboxed-iframe srcdoc previews fetch /raw/* instead. Integration test #7 pins the 403 contract so a future allowlist change is deliberate. Phase C-G of plan declarative-roaming-gosling.md. All 23 tests green (14 unit + 9 integration); full daemon suite 1727 passing (delta +9 over B-G's 1718). Typecheck clean. Refs nexu-io/open-design#368. * test(daemon): add Red regression for inlined-body tag-literal corruption Reproduces the correctness bug Siri-Ray (looper) and codex-bot flagged on PR #1312: the reduce/split-join approach in inlineRelativeAssets re-scans the progressively mutated HTML, so a tag literal that happens to appear inside an already-inlined asset body gets the inner literal also replaced, corrupting the body and producing duplicate inlining.
Concrete reproducer (CSS, where </style escape doesn't touch <link>): HTML: <link rel="stylesheet" href="a.css"> <link rel="stylesheet" href="b.css"> a.css: /* see also <link rel="stylesheet" href="b.css"> */ b.css: body{color:red} Under split/join the second pass splits on `<link rel="stylesheet" href="b.css">` and matches BOTH the real outer tag AND the literal inside a.css's comment. Result: b.css's <style> block is injected inside a.css's comment, and b.css gets inlined twice. Phase F-R of plan declarative-roaming-gosling.md (post-PR-#1312 review round). F-G will rewrite the helper to collect matches by position in the original HTML and concat slices in a single pass, so already-inlined content is never re-scanned. Refs nexu-io/open-design#1312 review threads at apps/daemon/src/inline-assets.ts:122 (Siri-Ray looper + codex bot). feat(daemon): replace inliner reduce/split-join with position-based concat Fixes the inlined-body tag-literal corruption Siri-Ray (looper) + codex-bot flagged on PR #1312. The previous `replaceAllOccurrences` (`source.split(from).join(to)`) re-scanned the progressively mutated HTML on each pass, so a tag literal that appeared inside an already-inlined CSS/JS body got the inner literal replaced too, producing duplicate inlining and corrupted bodies. New shape: collect every match's {start, end} byte span from the ORIGINAL html via `matchAll`, await the per-match replacements in parallel, sort by start, and concat slices of the original html with the replacement strings in a single pass. Text introduced by an earlier replacement is never scanned for matches. The dup-tag fix (decision §8: replace every occurrence, not first-match-only) is preserved: every original-tag position gets its own slice, so all duplicates are inlined. Also extracts buildInlineStyleBlock / buildInlineScriptBlock so the match-collection loops stay readable. Phase F-G of plan declarative-roaming-gosling.md.
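The position-based concat can be sketched as below, as a simplified illustration and not the helper in inline-assets.ts. `inlineStylesheets` and `AssetReader` are hypothetical names; the regex only handles `rel` before `href`, offsets are JavaScript string indices, and `</style` escaping plus attribute escaping are left out.

```typescript
type AssetReader = (href: string) => Promise<string | null>;

interface TagMatch {
  start: number;
  end: number;
  tag: string;
  href: string;
}

async function inlineStylesheets(html: string, read: AssetReader): Promise<string> {
  // Simplified pattern: rel="stylesheet" must appear before href.
  const linkTag = /<link\b[^>]*\brel=["']stylesheet["'][^>]*\bhref=["']([^"']+)["'][^>]*>/gi;

  // 1. Collect every match position from the ORIGINAL html only.
  const matches: TagMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = linkTag.exec(html)) !== null) {
    matches.push({ start: m.index, end: m.index + m[0].length, tag: m[0], href: m[1] ?? "" });
  }

  // 2. Resolve all replacements in parallel (order of results matches `matches`).
  const bodies = await Promise.all(matches.map((match) => read(match.href)));

  // 3. Single pass: original slices interleaved with replacements.
  //    Inlined bodies are appended, never re-scanned.
  let out = "";
  let cursor = 0;
  matches.forEach((match, i) => {
    const body = bodies[i];
    out += html.slice(cursor, match.start);
    out += body == null ? match.tag : `<style data-od-inline-asset="${match.href}">${body}</style>`;
    cursor = match.end;
  });
  return out + html.slice(cursor);
}
```

Because `exec` walks the original string left to right, the matches are already sorted by start; an explicit sort would only be needed if several patterns were collected separately, as with `<link>` and `<script src>` together.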
Regression test (`c809bccc`) goes Green; all 24 unit + integration tests pass; daemon suite still clean. Refs nexu-io/open-design#1312. * test(daemon): add Red CSP-sandbox test + P3 coverage gaps from PR #1312 review Three tests covering lefarcen's review on PR #1312: 1. [Red] CSP sandbox header (P2, lefarcen @ import-export-routes.ts:423). Top-level browser navigation to /export/?inline=1 sends no Origin header, so the daemon middleware lets it through and any JS in the exported document runs with daemon-origin privileges. Asserts the response sends `Content-Security-Policy: sandbox allow-scripts` so the browser treats it as a sandboxed iframe with an opaque origin (scripts still run, but no cookies / no /api/ access). This test fails until G1-G adds the header in the handler. 2. [Green-on-commit] Accept-list cases (P3, lefarcen @ test.ts:262). PR body decision §7 promises `inline=true/yes/on` case-insensitive, but round-1 tests only exercised inline=1. Pin the full accept list (true / yes / on + TRUE / Yes / ON). Already passes — the route's parser already implements the accept list; this just makes the contract testable. 3. [Green-on-commit] isSafeId guard (P3, lefarcen @ test.ts:287). Previous `..` test was normalized by Express before reaching the route. New input uses `bad!id` (URL-safe, but outside isSafeId's /^[A-Za-z0-9._-]+$/ char class), so Express passes it into req.params unchanged and isSafeId rejects with the documented 400 BAD_REQUEST envelope. Phase G1-R / H of plan declarative-roaming-gosling.md. Refs nexu-io/open-design#1312 review comments. feat(daemon): send Content-Security-Policy: sandbox allow-scripts on /export Closes the same-origin XSS surface lefarcen flagged on PR #1312 (P2 at import-export-routes.ts:423): top-level browser navigation to the export URL sends no Origin header, so the daemon's /api middleware admits the request and any JS in the exported document executes with daemon-origin privileges (cookies, /api/, localStorage). 
`Content-Security-Policy: sandbox allow-scripts` on the response makes the browser treat the document as a sandboxed iframe with an opaque origin. Scripts still execute (necessary for the screenshot use case — the whole point of inlining JS), but they cannot read cookies, hit /api/, or otherwise escalate to the daemon's origin. Phase G1-G of plan declarative-roaming-gosling.md. Daemon delta: +3 tests (the Red CSP test from `58151356` turns Green; the P3 coverage gap tests stay green). Refs nexu-io/open-design#1312. * test(daemon): add Red regression for <link> stylesheet attr preservation Currently `<link rel="stylesheet" href="print.css" media="print">` becomes a plain `<style data-od-inline-asset="print.css">…</style>` with no media query — print-only styles apply unconditionally. Same problem for `title` (alternate stylesheet sets), `disabled` (initial disabled state), and `nonce` (CSP nonce). All four are valid on both `<link rel=stylesheet>` and `<style>` per HTML spec, so the inliner must carry them across. PR #1312 round-2 review (lefarcen P2 @ inline-assets.ts:44). Phase G2-R; G2-G will extend buildInlineStyleBlock to copy the four attrs off the source <link>. Refs nexu-io/open-design#1312. * feat(daemon): preserve <link> stylesheet semantics on inlined <style> Closes lefarcen's P2 review note on PR #1312 (inline-assets.ts:44): `<link rel="stylesheet" href="print.css" media="print">` was becoming a plain <style> with no media query, so print-only styles applied unconditionally. Same issue for `title` (alternate stylesheet sets), `disabled` (initial disabled state), and `nonce` (CSP nonce). buildInlineStyleBlock now carries four attrs across from the source <link>: - media, title, nonce (value attrs, HTML-escaped via escapeHtmlAttr) - disabled (boolean attr — copied as bare presence) Other <link> attrs (rel, href, type, crossorigin, integrity, referrerpolicy) don't apply to <style> and are intentionally dropped. 
New `hasBooleanHtmlAttr` helper distinguishes presence-as-attr from substring-inside-another-attr-value via a regex that requires a word boundary after the name (whitespace, `=`, or `>`). Phase G2-G of plan declarative-roaming-gosling.md. All 28 tests pass. Refs nexu-io/open-design#1312. * docs(daemon): narrow inliner contract claim + document size-limit policy Closes lefarcen's P2 review notes on PR #1312: 1. "Self-contained" incomplete (inline-assets.ts:67): the helper only rewrites top-level <link rel=stylesheet> / <script src>. `<img src>`, CSS `url(...)`, CSS `@import`, ES module imports, font sources, and similar remain external in the response. The PR title/body claimed "self-contained HTML" which over-promised for screenshot tooling expecting bundled images/fonts. Module docstring now enumerates the full not-rewritten list and names the screenshot path as the primary use case (headless browser fetches each external asset on render, so inline-CSS- and-JS-only is sufficient). The route handler comment block mirrors the contract. A fully offline export with image/font bundling is filed as a follow-up — out of scope for this PR. 2. No response cap (inline-assets.ts:72): the helper does concurrent reads + multiple string copies and could spike daemon memory. The daemon is local-first (single-user, developer's machine — see open_design_architecture.md), so the effective ceiling is the size of the user's own project. The docstring now states this rationale and names the conditions under which a bounded-concurrency reader and output-size limit would be needed (non-trusted callers). Docs-only — no behavior change, all 28 tests still pass. Refs nexu-io/open-design#1312. 
* test(daemon): add Red regression for hasBooleanHtmlAttr quoted-value match PR #1312 round-2 review (lefarcen P3): `hasBooleanHtmlAttr` tests the tag string with no attr-quoting awareness, so the literal text `disabled` appearing inside any quoted attribute value followed by another whitespace char satisfies `\sdisabled(?=\s|=|/?>)`. <link rel=stylesheet href=x.css data-note="content disabled stuff"> emits a <style disabled> block, silently disabling a stylesheet the author wrote without that attr. Also adds a counterweight test for the legitimate-disabled case (<link … disabled>) so the next-commit fix doesn't over-correct and start dropping real boolean attrs. Phase I3-R of plan declarative-roaming-gosling.md (post-PR-#1312 round-2 review). I3-G will strip quoted attribute values from the tag string before testing for the bare attr. Refs nexu-io/open-design#1312. * feat(daemon): make hasBooleanHtmlAttr quote-aware to avoid false positives Closes lefarcen's P3 review note on PR #1312: `hasBooleanHtmlAttr` previously ran `\sname(?=\s|=|/?>)` over the full tag string, so the literal text `disabled` appearing inside any quoted attribute value followed by whitespace satisfied the regex. Source tags like `<link rel=stylesheet href=x.css data-note="content disabled stuff">` were emitting a <style disabled> block, silently disabling a stylesheet the author wrote without that attr. Fix: strip `="…"` and `='…'` substrings out of the tag with two regex passes BEFORE testing for the bare attr. The lookahead still requires `\s|=|/?>` after the attr name, so `<link disabled>`, `<link disabled="">`, `<link disabled/>`, etc. all match, but the attr name as a substring of any quoted value cannot match because values have been stripped to `""` / `''`. Phase I3-G of plan declarative-roaming-gosling.md. All 30 tests green (28 prior + 2 round-3 regression cases: false-positive and legitimate-disabled). Refs nexu-io/open-design#1312.
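A minimal sketch of the quote-aware check, following the two-pass strip described above. The function name matches the commit message, but the body is an illustration written from that description, not the code in inline-assets.ts.

```typescript
function hasBooleanHtmlAttr(tag: string, name: string): boolean {
  // Blank out quoted values first so attribute names inside them cannot match.
  const stripped = tag.replace(/="[^"]*"/g, '=""').replace(/='[^']*'/g, "=''");
  // Escape regex metacharacters in the attribute name before building the pattern.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Require whitespace before the name and whitespace, "=", ">" or "/>" after it.
  return new RegExp(`\\s${escaped}(?=\\s|=|/?>)`, "i").test(stripped);
}
```

With this shape, `<link rel=stylesheet href=x.css data-note="content disabled stuff">` reduces to a tag whose `data-note` value is empty before the test runs, so the false positive from the Red test cannot occur.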
* test(daemon): add Red cap-enforcement tests + scaffold InlineOptions PR #1312 round-2 review (lefarcen P2 — still open): round-2 only documented that no cap is enforced. Reviewer pushed back: the helper still builds unbounded candidate arrays + runs Promise.all over all asset reads + concatenates the full output in memory. Need actual limits in code. This commit adds the Red test surface that drives the next commit's enforcement: - InlineAssetsLimitError("owner") when owner HTML > maxOwnerBytes - InlineAssetsLimitError("candidates") when tag matches > maxCandidates - Per-asset graceful: oversized asset → tag stays as URL ref - InlineAssetsLimitError("total") when assembled output > maxTotalBytes - Bounded read concurrency: peak in-flight reads ≤ maxReadConcurrency - Integration: route maps the throw to 413 PAYLOAD_TOO_LARGE InlineOptions interface is added to the helper signature as a no-op test-door (per feedback_test_doors_over_fake_timers.md), so tests can exercise tiny fixtures while production callers use module-level defaults. The next commit (H3-G) wires the enforcement. Phase H3-R of plan declarative-roaming-gosling.md. Daemon delta on this commit: +6 tests (5 unit + 1 integration), all Red. Refs nexu-io/open-design#1312. * feat(daemon): enforce inliner caps + map limit errors to 413 PAYLOAD_TOO_LARGE Closes lefarcen's still-open P2 review on PR #1312 round 2 ("the code still builds unbounded candidate arrays + Promise.all over all asset reads + concatenates the full output in memory"). Caps are now enforced in code with the documented defaults: MAX_INLINE_OWNER_BYTES = 2 MiB MAX_INLINE_ASSET_BYTES = 5 MiB per sibling MAX_INLINE_CANDIDATES = 500 link/script matches MAX_INLINE_TOTAL_BYTES = 50 MiB assembled output MAX_INLINE_READ_CONCURRENCY = 8 simultaneous fileReader calls Enforcement points: - Owner cap (input): fires immediately at function entry. Cheap — Buffer.byteLength of the already-decoded UTF-8 string. 
- Candidate cap (planning): fires after matchAll, BEFORE any sibling read. Pathological HTML with thousands of <link>/<script src> tags is rejected without opening a single file descriptor. - Asset cap (per-sibling): post-read length check; oversized assets return null from the wrapped reader, so the tag stays as a URL ref and the response is still 200. This is the only "graceful" cap — one bad asset doesn't fail the whole export. - Total cap (output): tracked across the slice-and-concat loop, guarding both preserved-html slices AND injected replacements. - Concurrency cap (planning): a tiny in-module runWithConcurrency worker-pool keeps at most maxReadConcurrency fileReader calls in flight, with order-preserving results. `InlineAssetsLimitError` carries a `limit` discriminator so logs and clients can disambiguate owner/asset/candidates/total. The route handler catches it and emits 413 PAYLOAD_TOO_LARGE. Drive-by error-envelope fix while in the route: UNSUPPORTED_FILE_TYPE (an unregistered ApiErrorCode) → UNSUPPORTED_MEDIA_TYPE (the canonical code) with HTTP 415. The round-1 string was a slip; caught by reading packages/contracts/src/errors.ts:11 while wiring PAYLOAD_TOO_LARGE. Phase H3-G of plan declarative-roaming-gosling.md. All 36 tests green (28 prior + 2 round-3 quoted-attr + 5 cap unit + 1 cap integration). Refs nexu-io/open-design#1312. * feat(daemon): enforce inliner caps pre-buffer via AssetHandle contract Closes lefarcen's still-open P2 review on PR #1312 round 3 ("the helper enforces maxTotalBytes only after all candidate assets have already been read and converted to replacement strings" / "maxAssetBytes is checked after fileReader fully buffers each sibling"). Round-3 caps were defensive against the final output size but did not bound peak memory during read fanout — 500 assets at 5 MiB each could materialize ~2.5 GiB before the 413 fired. 
Contract change: InlineAssetReader now returns `AssetHandle \| null` where AssetHandle is `{ readonly size: number; read(): Promise<...> }`. Callers expose `size` from a cheap stat-equivalent (the route uses `resolveProjectFilePath`) and defer the full materialization to `read()`. The helper checks size against maxAssetBytes BEFORE invoking read, and against the running total BEFORE the reservation is committed. Enforcement flow inside runWithConcurrency: 1. await fileReader(p.resolved) → cheap stat-only call 2. if (handle.size > maxAssetBytes) return null ← pre-buffer 3. if (runningBytes + handle.size > maxTotalBytes) ← pre-buffer totalAborted = true; return null 4. runningBytes += handle.size ← reserve 5. await handle.read() ← only now 6. if (read returned null) runningBytes -= refund `totalAborted` is a shared flag the workers check at entry, so once the running total hits the cap, no new reads start. With maxReadConcurrency = 8, at most ~8 stat-side calls finish after abort — peak memory bounded. The concat-time guard stays as the exact final assertion (the pre-buffer reservation is approximate — it counts the original tag bytes and skips wrapper overhead). Route closure updated to do `resolveProjectFilePath` first, then `readProjectFile` inside the deferred `read()`. Test reader helpers (`readerFrom` + the concurrency-test reader) updated to the new shape. Two new unit tests pin the pre-buffer semantics: - `maxAssetBytes` is checked via handle.size BEFORE handle.read() (the reader's `read()` throws — must never run) - Running total abort stops further reads once exceeded (counting reader observes ≤ 2 reads when cap should fire after the first) Phase K of plan declarative-roaming-gosling.md (post-PR-#1312 round-3 review). All 38 tests green (36 prior + 2 round-4 pre-buffer cases). Refs nexu-io/open-design#1312. 
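The six-step enforcement flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the daemon's actual helper: `readWithinCaps`, the `Caps` shape, and the string-returning `read()` are hypothetical, and only the `AssetHandle` idea (`size` plus a deferred `read()`), `runningBytes`, `totalAborted`, and the cap names come from the commit message.

```typescript
// Hypothetical shapes modeled on the commit message; not the real inline-assets.ts.
interface AssetHandle {
  readonly size: number;          // cheap, stat-like size
  read(): Promise<string | null>; // full materialization, deferred
}
type InlineAssetReader = (path: string) => Promise<AssetHandle | null>;

interface Caps {
  maxAssetBytes: number;
  maxTotalBytes: number;
  maxReadConcurrency: number;
}

async function readWithinCaps(
  paths: string[],
  reader: InlineAssetReader,
  caps: Caps,
  ownerBytes = 0,
): Promise<{ results: (string | null)[]; totalAborted: boolean }> {
  const results = new Array<string | null>(paths.length).fill(null);
  let runningBytes = ownerBytes;
  let totalAborted = false; // shared flag, checked by every worker at entry
  let next = 0;

  async function worker(): Promise<void> {
    while (next < paths.length) {
      if (totalAborted) return;                      // no new reads after abort
      const i = next++;
      const handle = await reader(paths[i]);         // 1. stat-only call
      if (handle === null) continue;
      if (handle.size > caps.maxAssetBytes) continue; // 2. per-asset cap, pre-buffer
      if (runningBytes + handle.size > caps.maxTotalBytes) {
        totalAborted = true;                         // 3. total cap, pre-buffer
        continue;
      }
      runningBytes += handle.size;                   // 4. reserve
      const content = await handle.read();           // 5. only now materialize
      if (content === null) {
        runningBytes -= handle.size;                 // 6. refund on a failed read
        continue;
      }
      results[i] = content;                          // order-preserving slot
    }
  }

  const workers = Math.min(caps.maxReadConcurrency, paths.length);
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return { results, totalAborted };
}
```

Steps 2 through 4 run synchronously after the `await` on the reader, so the check and the reservation cannot interleave with another worker on a single-threaded event loop. A `null` slot stands for "tag stays as a URL ref".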
* test(daemon): add Red test pinning owner pre-buffer 413 before mime 415 PR #1312 round-5 (lefarcen P2): the route currently reads the owner file with readProjectFile() before any size check, so a 100 MiB owner HTML is fully buffered into memory before the helper's ownerBytes check fires. The fix is to stat with resolveProjectFilePath first, reject pre-buffer with 413 PAYLOAD_TOO_LARGE on oversize, then fold in the mime check (still 415 on mismatch, now pre-buffer), then readProjectFile when both gates pass. The Red→Green discriminator is the combination 'oversize AND non-HTML': pre-fix the route reads the buffer first and the text/plain mime check fires → 415; post-fix the route stats first and the size check fires before the mime check → 413. Asserting 'got 413, not 415' pins both the pre-buffer property and the check ordering (size before mime, per lefarcen's locked round-5 sequence). 2 MiB+1 byte fixture is acceptable in test setup; MAX_INLINE_OWNER_BYTES is the production 2 MiB so no test-door is needed. Red verified: AssertionError: expected 415 to be 413 (pre-fix flow reads → mime → 415). * feat(daemon): stat owner before readProjectFile in /export route to bound owner pre-buffer PR #1312 round-5 (lefarcen P2 confirmed at PR-1312#issuecomment-4424868413 follow-up): the route previously called readProjectFile() unconditionally on the owner, so a 100 MiB owner HTML was fully buffered into memory before the helper's ownerBytes check fired with InlineAssetsLimitError ('owner'). That meant the 413 envelope returned to the caller but only after peak memory had already hit the file size. 
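The check ordering that this round pins (size before mime, both before any read) might look like the sketch below. It is illustrative only: `gateOwner` and the `OwnerStat` shape are hypothetical, and the exact mime comparison is an assumption. Only the 2 MiB default, the 413/415 codes, and the ordering come from the commit message.

```typescript
// Hypothetical pre-buffer gate; the real route works from resolveProjectFilePath's result.
const MAX_INLINE_OWNER_BYTES = 2 * 1024 * 1024; // 2 MiB, per the documented default

interface OwnerStat {
  size: number; // bytes reported by the stat-like call
  mime: string; // assumed to be known before the file is read
}

type Gate =
  | { ok: true }
  | { ok: false; status: 413; code: "PAYLOAD_TOO_LARGE" }
  | { ok: false; status: 415; code: "UNSUPPORTED_MEDIA_TYPE" };

function gateOwner(stat: OwnerStat, maxOwnerBytes = MAX_INLINE_OWNER_BYTES): Gate {
  // Size first: an oversize non-HTML file must yield 413, not 415.
  if (stat.size > maxOwnerBytes) {
    return { ok: false, status: 413, code: "PAYLOAD_TOO_LARGE" };
  }
  // Mime second, still before any buffer is allocated (assumed check).
  if (stat.mime !== "text/html") {
    return { ok: false, status: 415, code: "UNSUPPORTED_MEDIA_TYPE" };
  }
  return { ok: true }; // only now would the route call readProjectFile()
}
```

For the Red test's fixture (2 MiB + 1 byte, `text/plain`), this gate returns the 413 branch, which is the Red→Green discriminator described above.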
Fix mirrors the sibling-asset stat-then-read contract round 4 added via the AssetHandle interface: call resolveProjectFilePath first (cheap stat), reject pre-buffer with 413 PAYLOAD_TOO_LARGE on size > MAX_INLINE_OWNER_BYTES, fold in the mime check (still 415 UNSUPPORTED_MEDIA_TYPE on mismatch, now also pre-buffer per lefarcen's 'fold-in is welcome'), then readProjectFile() only when both gates pass. Size check fires before mime check, so an oversize non-HTML file returns 413 rather than 415 — the observable Red→Green discriminator for this round. The helper's ownerBytes check (inline-assets.ts:127-133) stays as defense-in-depth for direct in-process callers that skip the route and for any drift between stat-reported size and the bytes returned by readFile. Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts ('returns 413 (not 415) for an oversize non-HTML file'). Daemon suite 1743/1743 passing. * test(daemon): add Red test pinning stat-vs-actual byte reconciliation PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413 follow-up): the helper trusts handle.size for the running-total guard and never reconciles with the actual byte length of content unless the per-asset cap is exceeded. A reader that under-reports size (stale stat, UTF-8 expansion at decode, sparse file, deliberate lie) can let many strings materialize in memory before the concat-time guard at the bottom of inlineRelativeAssets throws — defeating the round-4 pre-buffer cap intent. Fix is lefarcen-confirmed path-a: post-read, the helper computes actualBytes = Buffer.byteLength(content, 'utf8'), reconciles runningBytes (add actualBytes, refund handle.size), and if running total exceeds maxTotalBytes flips totalAborted = true and returns null. Subsequent workers see totalAborted before invoking their own read(). 
Helper still throws InlineAssetsLimitError('total') after Promise.all settles — preserving the round-2/3/4 graceful-fallback pattern instead of racing throws across in-flight workers. Red→Green discriminator is read count. Pre-fix the helper trusts the lying handle.size (10), so both reads complete (each returning 1000 bytes) under the reservation total of 56+10+10=76 < cap 500. The concat-time guard then catches the 2000+-byte assembly and throws 'total' — but only after both reads materialized in memory. Post-fix worker 1's reconciliation trips totalAborted as soon as actualBytes (1000) is folded into runningBytes; worker 2 skips its read. Red verified: AssertionError expected 1, received 2 (pre-fix flow completes both reads before concat-guard fires). * feat(daemon): reconcile inliner reservation with post-read actual bytes PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413 follow-up, path-a): the helper trusted handle.size for the running- total guard and only reconciled with actual bytes for the per-asset cap. A reader that under-reported size — stale stat, UTF-8 decode expansion at read time, sparse file, deliberate lie — could let many strings materialize before the concat-time guard at the bottom of inlineRelativeAssets caught the excess. That defeated the round-4 pre-buffer cap intent. Fix: after a successful read(), compute actualBytes = Buffer.byteLength(content, 'utf8'), reconcile runningBytes by folding in (actualBytes - handle.size), and re-check the total cap. If the reconciliation pushes runningBytes past maxTotalBytes, drop the asset's inlining (tag stays as URL ref), set totalAborted = true to block subsequent worker reads, and let Promise.all settle. The helper then throws InlineAssetsLimitError('total') below — matching the round-2/3/4 graceful-fallback pattern (no throw-before-settle race between in-flight workers). 
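A minimal sketch of the post-read reconciliation step, assuming a hypothetical `Budget` object and `reconcile` function (the real helper keeps this state inside `inlineRelativeAssets`). The commit message computes the byte length with `Buffer.byteLength(content, 'utf8')`; `TextEncoder` is used here only to keep the sketch runtime-neutral.

```typescript
// Hypothetical shared state for the running-total guard.
interface Budget {
  runningBytes: number;  // already includes the handle.size reservation
  totalAborted: boolean; // blocks subsequent worker reads once set
}

// Returns the content if it still fits, or null if the asset must be dropped
// (the tag then stays as a URL ref).
function reconcile(
  budget: Budget,
  reservedSize: number, // handle.size, as reserved before read()
  content: string,
  maxTotalBytes: number,
): string | null {
  const actualBytes = new TextEncoder().encode(content).length;
  // Fold in the difference between what was reserved and what materialized.
  budget.runningBytes += actualBytes - reservedSize;
  if (budget.runningBytes > maxTotalBytes) {
    budget.totalAborted = true;
    return null;
  }
  return content;
}
```

With the numbers from the Red test (56 bytes of owner HTML, a lying `handle.size` of 10, 1000 actual bytes, cap 500), the first worker's reconciliation moves the total from 66 to 1056 and flips `totalAborted`, so the second worker never reads.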
The per-asset cap check at line 228 is preserved for stat-lying readers that blow a single asset past maxAssetBytes; that branch refunds handle.size and drops without flipping totalAborted, so sibling assets still get a fair shot. Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts ('reconciles handle.size with actual content bytes'). Daemon suite 1744/1744 passing. --------- Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> * fix: truncate long template names on project cards (#1220) (#1302) Add min-width: 0 to .design-card-name so text-overflow: ellipsis works correctly in flex layouts. Long template names were pushing the task execution status (Running, Failed, etc.) out of view on project cards. Closes #1220 Co-authored-by: laomo <laomo@openclaw.ai> * fix(desktop): swallow setTypeOfService EINVAL crashes in dev main (#647) (#1298) * fix(desktop): swallow harmless setTypeOfService EINVAL crashes in dev main The packaged Electron entry (apps/packaged/src/logging.ts) already filters the undici "setTypeOfService EINVAL" crash that issue #895 introduced for the prod build, but the dev / source-built desktop entry was missing the parallel guard. Result: switching settings tabs in a from-source desktop run could fire a fresh fetch, undici would try to set IP_TOS on the outbound socket, the kernel would refuse on certain macOS / VPN configurations, and the rejection bubbled to Electron's default handler as the "JavaScript error in the main process" dialog reported in issue #647. Add the same defensive filter to apps/desktop: - isHarmlessSocketOptionError matches only the canonical undici shape (syscall name AND EINVAL code). A contradicting code (EACCES, EPERM, etc) explicitly fails the match so real bugs don't get hidden. - The uncaughtException handler logs harmless cases at warn and returns silently. 
For anything else it removes itself from the listener list and re-throws via setImmediate, restoring Node's default crash path so Electron's native dialog renders exactly as it would without this filter. - unhandledRejection mirrors the same harmless / fall-through split. The filter is installed BEFORE app.whenReady so it is armed by the time the renderer fires its first fetch. The helper is duplicated rather than imported from apps/packaged because AGENTS.md forbids cross-app private-source imports. The file header calls out the parallel and notes that the two copies should stay in sync until the helper is promoted to a shared workspace package (follow-up); the contract is identical so a regression in one will surface in the other's test suite. Tests in apps/desktop/tests/main/uncaught-exception.test.ts mirror apps/packaged/tests/logging.test.ts: 8 cases pinning the matcher shape, 2 cases pinning the handler's harmless-log-warn vs fall-through-rethrow split. Validated: pnpm guard, pnpm --filter @open-design/desktop typecheck, pnpm --filter @open-design/desktop build, and pnpm --filter @open-design/desktop test (14 passed, 10 new). * fix(desktop,packaged): fail-fast on non-harmless unhandled rejections The previous unhandledRejection listeners logged non-harmless reasons and returned, which kept the main process alive after any rejected promise. A real bug, a failed IPC registration, or any unexpected async exception was reduced to a console line instead of surfacing through Node/Electron's default crash path the filter was meant to preserve. Both copies now route non-harmless rejections through a parallel factory (createDesktopUnhandledRejectionHandler / createFatalUnhandledRejectionHandler) that mirrors the uncaughtException policy: harmless setTypeOfService EINVAL shapes log at warn and return, anything else logs at error, removes the listener, and re-throws via setImmediate. 
Listener removal happens before the scheduled throw, so the rethrown reason lands in the uncaughtException path with no recursion. Tests cover the harmless branch, the detach + ordered rethrow, and non-Error / primitive rejection reasons (Promise.reject(42)) which must fall through. Desktop suite: 13/13, packaged suite: 16/16. Flagged on PR #1298 by Siri-Ray and the codex P2 review thread; the two file copies stay in lockstep per the AGENTS.md sync invariant. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com> * feature: refine assistant artifact feedback (#1379) * feature: refine assistant artifact feedback * fix: clear hidden custom feedback reason * test: update assistant feedback expectations * fix: support object-style question-form options (#1293) * fix: support object-style question-form options * fix: preserve stable option values in form submissions * fix(daemon/acp): terminate ACP child after clean prompt completion (#1286) * fix(daemon/acp): terminate ACP child after clean prompt completion (Bug B / #1265) Some ACP agents (notably Devin for Terminal) keep the child process alive after stdin closes, waiting for the next prompt. Open Design spawns a fresh agent per chat turn and relies on child.on('close') to finalize the run, so without an explicit signal-driven shutdown the chat sits stuck in the 'working' state indefinitely. Three small, targeted changes: - apps/daemon/src/acp.ts: After a clean session/prompt response we schedule a 500ms grace period and then SIGTERM the child. This mirrors the pattern detectAcpModels() already uses after model discovery. The grace period leaves well-behaved agents that exit on stdin.end() unaffected. - apps/daemon/src/acp.ts: New completedSuccessfully() method on the session handle reports whether the prompt resolved without a fatal error or abort, so the consumer can distinguish 'clean signal exit' from 'genuine signal failure'. 
- apps/daemon/src/server.ts: child.on('close') now treats a SIGTERM exit as 'succeeded' when acpSession.completedSuccessfully() is true. - apps/web/src/providers/daemon.ts: Trust the server's authoritative endStatus; the signal/non-zero-code safety net no longer overrides an explicit 'succeeded' status, so the chat doesn't surface a fake 'agent exited with signal SIGTERM' error after a clean ACP run. Daemon tests cover the SIGTERM grace timer, clean early-exit (timer cleared), and completedSuccessfully() abort/error states. Manual UI test on plain main + this fix confirms Devin chats now return to ready automatically after Done · ... * fix(daemon/connectionTest): treat ACP clean SIGTERM as success Codex review on #1286 caught that the new SIGTERM in attachAcpSession breaks ACP connection tests for agents that don't shut down on stdin.end() (the exact Devin behavior the patch targets). attachAgentStreamHandlers() in connectionTest.ts now also respects acpSession.completedSuccessfully(), mirroring the same check we apply in server.ts. Without this, a clean prompt response followed by our SIGTERM would set winner.signal === 'SIGTERM', flip exitedCleanly to false, and the connection test would report 'agent_spawn_failed' even when the agent had returned a healthy response. Also widened the AgentSpawnHandle type so completedSuccessfully is visible on the structural type used inside connectionTest.ts. All 56 daemon tests still pass; typecheck + guard clean. * fix(daemon/acp): narrow ACP success-on-signal override to forced-SIGTERM Looper review on #1286 caught that the success predicate was broader than the SIGTERM case it was meant to handle. `completedSuccessfully()` flips to true as soon as the ACP `session/prompt` response is processed, but it does not say why the child later closed. 
With the broad predicate, an ACP agent that returned a prompt result and then exited with code 1 (or was killed by SIGKILL/SIGSEGV) was still marked 'succeeded', regressing the existing close-status behavior for genuine post-response process failures. Scope the override to the exact forced-shutdown shape this PR introduces: code === null && signal === 'SIGTERM' && acpCleanCompletion Applied to both `server.ts` (chat run finalization) and `connectionTest.ts` (connection-test classification). Any other post-response failure now falls through to 'failed' / 'agent_spawn_failed' as before. All 59 daemon tests still pass; typecheck + guard clean. * fix(web/daemon): only bypass exit-code safety net on explicit server success Looper review on #1286 caught that the previous web change trusted `endStatus === 'succeeded'` absolutely, but `endStatus` can become 'succeeded' in two distinct ways: 1. The SSE end event explicitly carries `status: 'succeeded'` (authoritative server declaration). 2. The end event omits or has an invalid `status` field and the handler silently falls back to 'succeeded' as a local default. Both produced `endStatus === 'succeeded'` in the existing code, so the new safety-net bypass treated them identically. That regressed backward compat: a compatible or older daemon emitting an end event like `{code:1}` or `{code:null,signal:"SIGTERM"}` with no `status` would suddenly skip the failure banner. Track explicit success separately via `serverDeclaredSuccess`, set true only when: - The SSE end event has `status === 'succeeded'`, or - The fallback `fetchChatRunStatus` REST path returns `status === 'succeeded'` (which the existing `isChatRunStatus()` guard already proves is explicit). The safety net is now bypassed only on that explicit signal; the local-fallback success path still reaches the exit-code/signal check so real failures surface as before. 
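The two narrowed rules above (daemon-side forced-SIGTERM override, web-side explicit-success bypass) can be sketched as two small pure functions. Both names are hypothetical, the `code === 0` success branch is an assumption about the pre-existing behavior, and other statuses such as cancellation are left out for brevity.

```typescript
// Daemon side: classify a child close event (hypothetical helper).
function classifyClose(
  code: number | null,
  signal: string | null,
  acpCleanCompletion: boolean,
): "succeeded" | "failed" {
  if (code === 0) return "succeeded"; // assumed existing behavior
  // Only the forced-shutdown shape this PR introduces counts as success.
  if (code === null && signal === "SIGTERM" && acpCleanCompletion) return "succeeded";
  return "failed"; // exit code 1, SIGKILL, SIGSEGV, and so on
}

// Web side: decide whether the exit-code/signal safety net should fire.
interface EndEvent {
  status?: unknown;
  code?: number | null;
  signal?: string | null;
}

function exitError(end: EndEvent): string | null {
  // Explicit server declaration only; a missing or invalid status does not count.
  const serverDeclaredSuccess = end.status === "succeeded";
  if (serverDeclaredSuccess) return null;
  if (end.signal) return `agent exited with signal ${end.signal}`;
  if (typeof end.code === "number" && end.code !== 0) {
    return `agent exited with code ${end.code}`;
  }
  return null;
}
```

Keeping `serverDeclaredSuccess` separate from the local default is what preserves compatibility with older daemons that emit an end event without a `status` field.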
Adds three web-side regression tests in `apps/web/tests/providers/sse.test.ts`: - Explicit `status: 'succeeded'` + SIGTERM → onDone called, no error - End event with `{code:1}` and no `status` → onError surfaces 'agent exited with code 1' as before - End event with `{code:null,signal:'SIGTERM'}` and no `status` → onError surfaces 'agent exited with signal SIGTERM' as before `pnpm guard` + daemon typecheck clean; 27/27 SSE tests pass (up from 24). * Fix Codex wrapper launch paths (#1395) * test: add Memory and Routines coverage (#1400) * test: align extended Playwright coverage with current UI behavior * test: address extended suite review feedback * test: fix Codex fallback config hydration in e2e * test: add Memory and Routines coverage * test: fix Memory and Routines component test typing * test: include Memory and Routines e2e in extended suite * refactor(settings): use tiled language picker instead of dropdown (#1406) The Language section in Settings rendered a single-button dropdown trigger that opened a floating menu. With one visible label and lots of empty panel space, the layout misled users into thinking only one language existed. Replace the dropdown trigger + portaled menu with an inline tile grid that shows every locale at a glance and clicks directly to switch. Side effects of the new layout: the languageOpen / languageMenuRect state, the dynamic placement effect, the resize-close effect, the mousedown click-outside handler, and the languageRef are gone. The global Escape handler no longer needs to guard against the menu being open. CSS for .settings-language-picker, .settings-language-button, .settings-language-menu, and .settings-language-option is replaced by .settings-language-grid (auto-fill 180px minmax columns) + .settings-language-tile. Tests in SettingsDialog.execution.test.tsx that drove the dropdown (click trigger → click menuitemradio → assert menu closed) are rewritten to drive the tiles directly via the radio role. 
Refs #1347 * fix(web): restore consistent app header layout * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): hide project output chips in header --------- Co-authored-by: Prantik Medhi <140103052+prantikmedhi@users.noreply.github.com> Co-authored-by: 이용진 <90879448+Leesin0222@users.noreply.github.com> Co-authored-by: Nicholas-Xiong <2482929840@qq.com> Co-authored-by: Hesam <chngyzkhanwhsht@gmail.com> Co-authored-by: Yuhao Chen <godcorn001@outlook.com> Co-authored-by: chaoxiaoche <fanzhen910412@gmail.com> Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: eggward han <32223217+Eggwardhan@users.noreply.github.com> Co-authored-by: @aaronjmars <61592645+aaronjmars@users.noreply.github.com> Co-authored-by: Bryan <121247296+bankielewicz@users.noreply.github.com> Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> Co-authored-by: mrzhangkris <92247501+mrzhangkris@users.noreply.github.com> Co-authored-by: laomo <laomo@openclaw.ai> Co-authored-by: Nagendhra Madishetti <nagendhra.madishetti24@gmail.com> Co-authored-by: Nagendhra <nagendhra405@gmail.com> Co-authored-by: Mason <jinmeihong0201@gmail.com> Co-authored-by: Yiang Yiyan <15089131836@163.com> Co-authored-by: Rocky <101849785+MrRockySL@users.noreply.github.com> Co-authored-by: nettee <nettee.liu@gmail.com> Co-authored-by: shangxinyu1 <shangxinyu@refly.ai> Co-authored-by: Matt Van Horn <mvanhorn@users.noreply.github.com>	2026-05-12 23:15:46 +08:00
pftom	1f4259a190	feat(web): update routing and terminology for automation features - Modified the routing logic to recognize 'automations' as a valid entry point alongside 'tasks', ensuring backward compatibility. - Updated the EntryNavRail component to reflect the change from 'tasks' to 'automations' in labels and tooltips. - Renamed instances of 'tasks' to 'automations' in the TasksView component for consistency and clarity. - Enhanced HomeView and chip actions to include new input structures for better handling of automation scenarios. - Updated tests to validate the new routing and terminology changes, ensuring proper functionality across the application. This update improves the user experience by clarifying the distinction between tasks and automations, aligning the UI with the updated terminology.	2026-05-12 23:00:24 +08:00
pftom	358105c154	feat(web): enhance PluginsHomeSection with contribution card and improved plugin creation flow - Updated the `onCreatePlugin` prop to accept an optional goal parameter, allowing for more contextual plugin creation. - Introduced a contribution card that displays when no plugins match the current filters, providing users with a prompt to create new plugins. - Enhanced the `resolveContributionTarget` function to return detailed information about the contribution context. - Added new CSS styles for the contribution card to improve visual presentation. - Updated tests to cover new contribution scenarios and ensure proper functionality of the contribution card. This update significantly improves the user experience by guiding users to contribute plugins when no matches are found, fostering community engagement.	2026-05-12 22:46:43 +08:00
pftom	5f71968f61	feat(daemon, web): implement plugin sharing features for GitHub and Open Design contributions - Added new API endpoints for publishing plugins to GitHub and contributing to Open Design, enhancing the plugin sharing capabilities. - Introduced functions for handling plugin sharing actions, including `publishGeneratedPluginToGitHub` and `contributeGeneratedPluginToOpenDesign`. - Updated the `DesignFilesPanel` and `FileWorkspace` components to support new sharing functionalities, allowing users to publish or contribute plugins directly from the interface. - Enhanced the UI with new buttons for publishing and contributing plugins, improving user interaction and experience. - Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system. This update significantly improves the plugin ecosystem by enabling users to share their creations with the community and streamline collaboration.	2026-05-12 22:39:32 +08:00
lefarcen	e1bc83a476	feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428 ) * feat(analytics): scaffold PostHog product-analytics integration - Add @open-design/contracts/analytics subpath with the 17 P0 event payload types, header constants, and code↔CSV enum mapping helpers. - Add apps/daemon/src/analytics.ts with env-gated posthog-node client, request-scoped analytics context reader, and artifact-id anonymizer. - Expose GET /api/analytics/config so the web bundle never embeds the PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST. - Add apps/web/src/analytics module (identity + lazy posthog-js client + React provider) and mount it under <I18nProvider> in app/layout. No event wiring yet — that lands in the next commit alongside trigger points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer, runs.ts). * feat(analytics): wire app_launch, home_view, home_click, project_create_result - App.tsx: fire app_launch once after first effect tick. handleCreateProject now emits project_create_result on both success and failure paths. - EntryView.tsx: home_view (page) gated on agents loading so has_available_cli isn't transiently false; home_view (asset_panel) fires per top-tab change with the right result_count. - NewProjectPanel.tsx: home_click create_button fires before delegating to the parent; a fresh request_id is generated here and threaded through onCreate so the matching project_create_result stitches via $insert_id. - contracts/analytics: tighten createTabToTracking and topTabToTracking for the worktree branch's renamed tabs (live-artifact, templates). * feat(analytics): wire settings_view + 3 settings_click events - settings_view fires on dialog mount and on every section switch, carrying the active section (mapped via settingsSectionToTracking for the 16-section worktree layout), execution_mode, and the selected CLI provider id when present. 
- settings_click execution_mode_tab: setMode now emits before/after values whenever the user toggles between Local CLI and BYOK. - settings_click cli_provider_card: agent card onClick reports cli_provider_id via agentIdToTracking (kiro → other). - settings_click byok_field: onFocus added to api_key, model select, and base_url inputs; provider_id widened to include google so the worktree's Gemini protocol slot type-checks. * feat(analytics): wire studio_view + studio_click chat, studio_view artifact - packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper produces a 16-hex anonymized id for (projectId, fileName). Stable cross-platform so the daemon and the web bundle resolve the same id without a Web Crypto round-trip; daemon now re-exports it. - ChatComposer: studio_view chat_panel fires once per project mount, studio_click chat_composer fires on attachment + send buttons with estimated user_query_tokens (length/4) and has_attachment. - FileViewer: studio_view artifact fires once per (project, file) at the dispatcher level, before any sub-viewer renders, with artifact_kind derived from the renderer registry / file.kind table. - Widen TrackingExportFormat to include markdown and cloudflare_pages so the worktree branch's full share menu can emit verbatim. * feat(analytics): wire studio_click share_option + artifact_export_result HtmlViewer's share menu now emits both events per click via a fireShareExport helper: - studio_click share_option fires immediately on click with the chosen export_format and a fresh request_id. - artifact_export_result fires when the export resolves — success for sync exporters (html, markdown, template) the moment the call returns, success/failed for async exporters (pdf, zip, deploy) via .then/.catch. The same request_id threads both events so PostHog stitches click → result via $insert_id. DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages slots; markdown is now a first-class export_format value. 
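For reference, a 64-bit FNV-1a hash rendered as 16 hex characters can be written as below. The offset basis and prime are the standard FNV-1a 64-bit parameters. How the real `artifact-id.ts` combines `projectId` and `fileName` is not stated in the commit message, so the NUL-separated join and both function names are assumptions.

```typescript
// Standard FNV-1a 64-bit parameters.
const FNV_OFFSET_BASIS = BigInt("0xcbf29ce484222325");
const FNV_PRIME = BigInt("0x100000001b3");
const MASK_64 = BigInt("0xffffffffffffffff");

function fnv1a64Hex(input: string): string {
  const bytes = new TextEncoder().encode(input); // hash the UTF-8 bytes
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i]);             // FNV-1a: xor first...
    hash = (hash * FNV_PRIME) & MASK_64;  // ...then multiply, wrapped to 64 bits
  }
  return hash.toString(16).padStart(16, "0");
}

// Hypothetical combiner: the separator is an assumption, not the repo's format.
function anonymizeArtifactId(projectId: string, fileName: string): string {
  return fnv1a64Hex(`${projectId}\u0000${fileName}`);
}
```

Because the function is pure arithmetic over bytes, the daemon and the web bundle would compute the same id for the same pair without any asynchronous crypto call.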
Also ignore .env.local so local POSTHOG_KEY / .env-style secrets don't get committed. * feat(analytics): emit run_created and run_finished from the daemon POST /api/runs now reads the analytics context off the x-od-analytics-* headers the web client sets on every fetch, then: - Captures run_created with project_id, conversation_id, run_id, model_id, agent_provider_id (mapped via agentIdToTracking), skill_id, design_system_id, plus the token_count_source marker. - Schedules a run_finished capture on runs.wait(run) resolution, mapping succeeded/canceled/failed to success/cancelled/failed and reporting total_duration_ms. Both events use a stable insert_id derived from the same uuid so PostHog dedupes the daemon-side mirror against any future web-side capture without double-counting. Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay omitted in v1 — the claude-stream parser only exposes input/output totals today. See tracking-doc-issues.md §3.2. * feat(analytics): emit settings_cli_test_result + settings_byok_test_result The original BLOCKING-list assumed these CSV P0 events were not implementable in this branch because main lacked Test buttons. The worktree HEAD actually wires `handleTestAgent` and `handleTestProvider` in SettingsDialog, so both events are now in scope. - handleTestAgent emits settings_cli_test_result on success and failure paths with cli_provider_id mapped via agentIdToTracking, result drawn from result.ok / catch branch, error_code from result.kind or the thrown error name, and duration_ms timed via performance.now(). - handleTestProvider emits settings_byok_test_result analogously, using apiProtocol (anthropic\|openai\|azure\|ollama\|google) directly as provider_id — wider than the CSV's 5-value enum, documented in tracking-doc-issues.md §2.5. Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps plus matching track* helpers. AnalyticsEventName union now covers all 14 P0 events this branch supports. 
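The run status mapping mentioned above for `run_finished` (succeeded/canceled/failed to success/cancelled/failed) is small enough to show in full. This is a sketch: the function names and the `result` property key are assumptions, while `run_id` and `total_duration_ms` are the field names the commit message uses.

```typescript
type RunStatus = "succeeded" | "canceled" | "failed";
type TrackingResult = "success" | "cancelled" | "failed";

// Note the spelling change: the daemon's "canceled" becomes "cancelled" in tracking.
function runStatusToTracking(status: RunStatus): TrackingResult {
  switch (status) {
    case "succeeded":
      return "success";
    case "canceled":
      return "cancelled";
    case "failed":
      return "failed";
  }
}

// Hypothetical payload builder for the run_finished event.
function buildRunFinished(
  runId: string,
  status: RunStatus,
  startedAtMs: number,
  finishedAtMs: number,
): { run_id: string; result: TrackingResult; total_duration_ms: number } {
  return {
    run_id: runId,
    result: runStatusToTracking(status),
    total_duration_ms: Math.max(0, finishedAtMs - startedAtMs),
  };
}
```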
* feat(analytics): gate PostHog on the existing telemetry.metrics consent The integration now reuses the same first-launch privacy banner + Settings → Privacy toggle that gates Langfuse, so a single user decision controls both telemetry sinks. - /api/analytics/config now consults the persisted AppConfigPrefs: it returns enabled=true only when POSTHOG_KEY is set AND the user has chosen "Share usage data" (telemetry.metrics === true). The response also echoes installationId so the web client uses the same anonymous id Langfuse keys off of — one identity per install, shared across both sinks. - Web AnalyticsProvider: - Bootstrap fetch resolves installationId and threads it through the x-od-analytics-anonymous-id header on every /api/* fetch, so daemon-side captures (run_created / run_finished / project_create_result) land on the same person record. - Exposes a setConsent(granted) method that calls posthog-js's opt_in_capturing / opt_out_capturing, wired from App.tsx via a useEffect watching config.telemetry?.metrics. Toggling Privacy → metrics now stops/resumes events immediately, no reload. - app_launch additionally gates on telemetry.metrics so a freshly- declined user fires nothing, and a freshly-opted-in user fires on the next reload. * feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env Wires PostHog product analytics through the same Langfuse-style build- secret pipeline so official Open Design builds ship with the key while fork builds compile without it (the integration short-circuits cleanly when POSTHOG_KEY is absent). tools/pack - resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from process.env at packaging time, validates them (no whitespace in the key, http(s) URL for host, trailing-slash strip), and stamps them on ToolPackConfig. Fork builds without the env vars simply omit the fields; the daemon-side gate keeps things off in that case. 
- Mac, Windows, and Linux packaged-config writers each append the two fields to open-design-config.json next to the existing telemetryRelayUrl entry. apps/packaged - RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost so the Electron entry and headless entry both forward them to the daemon sidecar. - buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into the daemon child env when present. The daemon's existing analytics module reads these via process.env — no daemon-side changes needed. - The headless packaged path falls back to process.env for fields the builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL is read there. CI - release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret) and POSTHOG_HOST (var) at workflow-env scope so every packaging job inherits them. PR / fork builds without these set simply skip the bake step. Tests - tools/pack: config.test.ts covers bake-through, fork-build omission, whitespace rejection, invalid-URL rejection, and trailing-slash normalization. - apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv forwarding the keys when present and omitting them when null. * feat(analytics): enable PostHog autocapture + perf + exceptions Flip on the PostHog SDK's automatic diagnostic features so we capture click paths, page transitions, web vitals, dead clicks, and browser exceptions without scattering instrumentation through the codebase. Privacy defense lives in one place — apps/web/src/analytics/scrub.ts — wired in via posthog-js's `before_send` hook so every outgoing event passes through the same audit point: - $autocapture / $rageclick / $dead_click / $copy_autocapture: strips $el_text and value/placeholder/aria-label attrs from any input, textarea, password input, or contenteditable element. 
PostHog autocapture does not capture input.value by default, but $el_text on a <textarea> reflects the typed content — that's the prompt body for us, so it has to be scrubbed every time. - $pageview / $pageleave: drops query string and fragment from $current_url / $referrer so any future ?q=… can't leak. - $exception: rewrites file:// and absolute filesystem paths in stack frames to app://apps/<repo-relative> so we don't ship the user's home directory. - Suppresses $opt_in entirely — duplicate of our explicit setConsent toggle in App.tsx. Element-level defense in depth is limited to the single most sensitive surface: the chat composer textarea gets `ph-no-capture` so PostHog never even generates an event for clicks inside that subtree. Every other input relies on scrub.ts — sprinkling the class through every form would be noisy and easy to forget on new surfaces. The existing Privacy → "Share usage data" toggle continues to gate every new feature: posthog-js's opt_out_capturing() halts autocapture, $pageview, $exception, web vitals, and dead clicks alongside the explicit capture() calls — one global switch. 11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts. * ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions Adding posthog-js to apps/web and posthog-node to apps/daemon changed pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by sha256. The CI nix flake check failed with: specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc= got: sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s= Copy the new hash into both nix/package-web.nix and nix/package-daemon.nix per the procedure documented in nix/README.md §"First-build hash pinning". * feat(analytics): unify PostHog identity with Langfuse installationId PostHog's distinct_id is the installationId stamped by /api/analytics/ config; Langfuse already reads the same id off app-config.json to populate trace.userId. 
With both sinks keying off the same anonymous identity, dashboards can correlate user actions (PostHog events) with LLM runs (Langfuse traces) without re-identifying. Two gaps closed: 1. applyConsent(false) — clear posthog-js's persisted ph__posthog localStorage entry on opt-out via posthog.reset(). Without this, a user who opts out, then clicks Delete my data, then re-opts in would see PostHog stitch their new session to the deleted identity because bootstrap.distinctID only takes effect on first init. 2. applyIdentity(newInstallationId) — Delete my data rotates the installationId in app-config; App.tsx now watches config.installationId and calls posthog.reset() then identify(newId) so the next event batch is fully decoupled from the deleted one. Idempotent on same-id re-renders so benign config refreshes don't churn PostHog identities. The fetch wrapper's x-od-analytics-anonymous-id header also flips to the new id on rotation so daemon-side captures (run_created / run_finished) land on the same person record from the very next API call, not after a reload. The end-to-end rotation flow is verified against a live PostHog project; these unit tests pin the safety guards (no-client paths, null inputs) since stubbing posthog-js's init-loaded callback chain is brittle. fix(langfuse): require both metrics AND content consent for trace reports Tightens the Langfuse gate so a user who shares anonymous metrics but NOT conversation content stops emitting Langfuse traces entirely — Langfuse is used for turn-quality evals which only make sense with prompt/output bodies. PostHog (product analytics, content-free) stays gated on `metrics` alone and is unaffected. i18n: "Conversation content" → "Conversation and tool content" with hints expanded to mention tool inputs/outputs so the consent surface matches what the trace actually carries (en + zh-CN). 
Bundled here per PR scope — change originated outside this PostHog PR but lands cleanly on the same files; gating Langfuse strictly on `content` makes the dual-sink consent model (PostHog = metrics, Langfuse = metrics + content) symmetric across both i18n locales and the daemon-side gate. * feat(analytics): wire byok_provider_option + fix PR review P1s Adds the BYOK protocol-chip click event (5-value provider_id mirroring the apiProtocol Settings UI) and resolves four P1 review threads on PR #1428. byok_provider_option: - New SettingsClickByokProviderOptionProps in contracts (provider_id = anthropic\|openai\|azure\|google\|ollama; maps to CSV's 5 values per tracking-doc-issues.md §2.5). - trackSettingsClickByokProviderOption helper in apps/web/src/analytics. - SettingsDialog hooks it on the protocol-chip onClick alongside the existing setApiProtocol call; is_selected reflects whether the chip was already active. Review fixes: 1. client.ts (Siri-Ray): clear `initPromise` when the resolution is null so a Privacy → metrics opt-in after a previous decline triggers a fresh /api/analytics/config fetch. Without this, the disabled response was cached forever — first-session opt-in needed a reload to start sending PostHog events. 2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a strict same-origin + /api/ pathname check (shared `isSameOriginApiCall` helper). Outbound third-party URLs containing `/api/` (e.g. provider.example.com/api/x) no longer receive our x-od-analytics-* headers. 3. provider.tsx (codex-connector, lefarcen): gate header injection on `resolvedAnonId` being non-null. When Privacy → metrics is off, /api/analytics/config returns enabled=false → resolvedAnonId stays null → wrapper never installs → daemon can't read consent-bearing headers → no daemon-side PostHog event. setConsent now also clears resolvedAnonId on opt-out and re-fetches on opt-in. 4. 
daemon/analytics.ts (defense in depth): createAnalyticsService now takes dataDir and capture() re-reads app-config to check telemetry.metrics inside the fire-and-forget wrapper. Even if a stale header somehow reaches the daemon after opt-out, the capture is dropped before posthog-node.capture is called. * fix(web): place "Share usage data" on the right in privacy consent banner Swap button order in PrivacyConsentModal and the in-settings ConsentCard so the affirmative "Share usage data" lands on the right and "Not now" on the left. Matches the OK-on-the-right pattern users expect for primary actions. Both buttons keep equal visual prominence (same .privacy-consent-action styling) so the swap doesn't change the EDPB equal-prominence stance called out in the original Langfuse telemetry spec. * feat(analytics): populate run_finished token totals from claude-stream usage Daemon's claude-stream parser already emits agent usage events with input_tokens / output_tokens totals; the run service buffers them in run.events and Langfuse reads them out the same way. The run_finished PostHog event was leaving these fields empty. Scan run.events for the most recent agent usage frame on terminal transition and emit input_tokens / output_tokens / total_tokens when present. token_count_source flips to 'provider_usage' only when at least one count landed; runs without provider-side usage data keep 'unknown'. Provider does not break the input down into the 7 sub-fields the tracking doc lists (memory / context / attachment / system_prompt / …); those stay omitted until a parser change exposes them. * feat(analytics): estimate user_query_tokens from prompt length The user_query_tokens field for run_created / run_finished was hardcoded to 0. We can't tokenize without bundling a model-specific tokenizer, but the character/4 heuristic is the industry-standard estimate when one isn't available and is enough for funnel analysis (prompt-length cohorts, short-vs-long-query conversion rates). 
The prompt is extracted from req.body via the same telemetryPromptFromRunRequest pattern the daemon already uses for langfuse-bridge (currentPrompt, then message as a fallback). Only the integer count goes to PostHog; the prompt text itself never leaves the daemon. token_count_source is set as follows: - run_created with a prompt: 'estimated' (was 'unknown') - run_created with no prompt: 'unknown' - run_finished with provider usage: 'provider_usage' (overrides baseProps' 'estimated' value) - run_finished without provider usage: inherits 'estimated' or 'unknown' from baseProps, so missing input/output counts don't mask the estimate.	2026-05-12 22:32:42 +08:00
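The character/4 estimate and the token_count_source rules described in the commit above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: the function names, the rounding direction (ceil), and the boolean inputs are assumptions; only the heuristic and the three source values come from the commit message.

```typescript
// Hypothetical helper names; only the character/4 heuristic and the three
// token_count_source values are taken from the commit message.
type TokenCountSource = 'unknown' | 'estimated' | 'provider_usage';

// Estimate the token count of a prompt as ceil(characters / 4).
// Rounding up is an assumption; the commit only says "character/4".
function estimateUserQueryTokens(prompt: string | null | undefined): number {
  if (!prompt) return 0;
  return Math.ceil(prompt.length / 4);
}

// Provider-reported usage wins; otherwise a prompt yields 'estimated',
// and no prompt at all leaves the source as 'unknown'.
function resolveTokenCountSource(
  hasPrompt: boolean,
  hasProviderUsage: boolean,
): TokenCountSource {
  if (hasProviderUsage) return 'provider_usage';
  return hasPrompt ? 'estimated' : 'unknown';
}

// Only the integer count is returned, never the prompt text.
const count = estimateUserQueryTokens('Design a landing page');
const source = resolveTokenCountSource(count > 0, false);
console.log(count, source);
```

Keeping the estimate as a pure function of the prompt length means only the resulting integer needs to be attached to the event, which matches the commit's statement that the prompt text never leaves the daemon.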
pftom	72ecc09326	feat(daemon, web): enhance plugin input handling and subcategory filtering - Updated the `pickPluginFields` function to support legacy input aliases, improving compatibility with existing plugin structures. - Added tests to ensure the new input handling works correctly with legacy inputs when resolving plugin snapshots. - Enhanced CSS styles for subcategory elements in the plugin home section, improving visual clarity and user experience. - Introduced new tests for subcategory filtering within the active workflow lane, ensuring accurate plugin categorization. This update significantly improves the plugin management experience by enhancing input handling and refining the categorization system.	2026-05-12 22:09:26 +08:00
pftom	26d21a942e	feat(web): enhance plugin input handling and categorization - Added support for plugin inputs in the EntryShell and HomeView components, allowing for more dynamic plugin interactions. - Updated the PluginsHomeSection to include subcategory filtering, improving the user experience when navigating plugins. - Enhanced the PluginsView and related components to reflect the new categorization model, transitioning to a workflow-based approach. - Refactored tests to ensure coverage for new input handling and categorization features, maintaining reliability across the application. This update significantly improves the plugin management experience by providing clearer categorization and enhanced input handling for plugins.	2026-05-12 21:59:38 +08:00
Eli	49ea2499ac	[codex] Add draw annotation workflow (#1435 ) * feat(web): tweaks palette popover with HSL hue-shift recoloring Adds a Tweaks color-palette popover to the HTML preview toolbar. Selecting a palette re-skins the iframe in place via a srcDoc-side bridge that walks the DOM and shifts every chromatic paint to the target hue while preserving each color's saturation and lightness — pale tints stay pale, bold CTAs stay bold, just in the new color family. Mono-noir desaturates instead of shifting. - runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options - file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected - FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration - PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir - PreviewDrawOverlay: stub pass-through until the draw branch lands * feat(web): hide finalize-design toolbar from project header * test(e2e): skip project actions toolbar flow after toolbar removal * Add draw annotation workflow * Restore project actions toolbar	2026-05-12 21:54:59 +08:00
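The hue-shift rule described in the commit above (replace the hue, keep saturation and lightness, desaturate for Mono noir) can be sketched in isolation. This is an illustrative sketch only: the types, the function name, and the saturation cutoff for "chromatic" are assumptions, and the real bridge walks the DOM inside the iframe rather than operating on plain values.

```typescript
// Sketch only: type and function names are hypothetical, not taken from the repository.
interface Hsl {
  h: number; // hue in degrees, [0, 360)
  s: number; // saturation, [0, 1]
  l: number; // lightness, [0, 1]
}

type Palette =
  | { kind: 'hue'; hue: number } // a coloured palette carrying one target hue
  | { kind: 'mono' };            // "Mono noir": desaturate instead of shifting

// Assumption: colours below this saturation are treated as achromatic and left alone.
const CHROMATIC_MIN_SATURATION = 0.02;

function recolor(color: Hsl, palette: Palette): Hsl {
  if (palette.kind === 'mono') {
    // Drop saturation, keep lightness so contrast is preserved.
    return { h: color.h, s: 0, l: color.l };
  }
  if (color.s < CHROMATIC_MIN_SATURATION) {
    // Greys, black, and white have no meaningful hue to shift.
    return { ...color };
  }
  // Normalise the target hue into [0, 360) and keep s and l untouched,
  // so pale tints stay pale and bold colours stay bold.
  const hue = ((palette.hue % 360) + 360) % 360;
  return { h: hue, s: color.s, l: color.l };
}

const paleBlue: Hsl = { h: 210, s: 0.2, l: 0.9 };
console.log(recolor(paleBlue, { kind: 'hue', hue: 16 }));
```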
pftom	67a109335d	feat(web, daemon): enhance plugin categorization and introduce new export scenarios - Updated the plugin categorization system to reflect a workflow-based model, replacing the previous category bar with a curated workflow bar (From source, Generate, Export). - Added new export scenarios for Next.js, React, and Vue, providing users with starter plugins for downstream integration. - Enhanced the HomeView and PluginsHomeSection components to support the new categorization and improve user interaction with plugins. - Updated tests to cover new scenarios and ensure proper functionality across the updated plugin management system. This update significantly improves the user experience by providing clearer categorization and new tools for exporting Open Design artifacts.	2026-05-12 21:50:19 +08:00
Neha Prasad	2d405fae96	fix: align artifact preview exit button (#1445)	2026-05-12 21:39:31 +08:00
Nagendhra Madishetti	09a8fa8d64	feat(web): Critique Theater Phase 8 (8 Theater components, barrel, role-keyed CSS) (#1314 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). - Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. 
Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). - apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4) The strict guard from PR #1314 round 3 enforced enum membership and Number.isFinite, but accepted any finite number where the contract intends a specific domain: scale: 0 (ScoreTicker divides by it), negative thresholds, fractional rounds, negative mustFix, etc. ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline CSS and divides by it for tick width, so a guard-passing scale: 0 shipped Infinity into the rendered style. Negative composite / score values reached downstream code that assumes >= 0. Resolution: mirror the daemon-side Zod domain constraints in the runtime guard. Three new helpers in packages/contracts/src/critique.ts: - isPositiveInt(v): integer with v > 0. Used for round, maxRounds, scale, protocolVersion (all 1-indexed in the orchestrator). - isNonNegativeInt(v): integer with v >= 0. Used for mustFix, position, bestRound. bestRound: 0 is the valid sentinel for 'interrupted before any round closed'. - isNonNegativeFinite(v): finite number with v >= 0. Used for composite, score, dimScore, threshold. 
Threshold may be fractional (e.g. 8.5 on a scale of 10). Cross-field check inside run_started: threshold <= scale (the daemon Zod schema enforces this with an epsilon refine, the wire guard matches the same intent). Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts: - 22 new rejection cases across every numeric field that previously slipped through: scale: 0, negative scale, fractional scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0, fractional protocolVersion, negative threshold, threshold > scale, round: 0, fractional round, negative dimScore / score, negative / fractional mustFix, negative composite, ship round: 0, negative / fractional bestRound, negative interrupted composite, negative / fractional parser_warning position. - 3 positive boundary cases that must still pass: threshold == scale, fractional threshold within [0, scale], interrupted with bestRound: 0 (no round completed before interrupt), parser_warning with position: 0 (start of stream). Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/contracts test: 4 files / 59 tests green (was 37 before the new domain cases). - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 110 files / 1004 tests green; no regression on Theater suite, sse validator, replay parser, or assistant-feedback widget tests. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-12 21:38:58 +08:00
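The three numeric-domain helpers named in the commit above are small enough to sketch directly. The helper names and their stated semantics come from the commit message; the bodies, the `RunStartedNumbers` shape, and `hasValidRunStartedDomains` are assumptions for illustration, and the plain `<=` comparison stands in for the epsilon refine the daemon-side schema is said to use.

```typescript
// Integer with v > 0 (round, maxRounds, scale, protocolVersion).
function isPositiveInt(v: unknown): v is number {
  return typeof v === 'number' && Number.isInteger(v) && v > 0;
}

// Integer with v >= 0 (mustFix, position, bestRound).
function isNonNegativeInt(v: unknown): v is number {
  return typeof v === 'number' && Number.isInteger(v) && v >= 0;
}

// Finite number with v >= 0 (composite, score, dimScore, threshold).
function isNonNegativeFinite(v: unknown): v is number {
  return typeof v === 'number' && Number.isFinite(v) && v >= 0;
}

// Hypothetical shape: only the field names come from the commit message.
interface RunStartedNumbers {
  protocolVersion: unknown;
  maxRounds: unknown;
  scale: unknown;
  threshold: unknown;
}

// Per-field domains plus the cross-field check threshold <= scale.
// Assumption: a plain comparison, without the epsilon the daemon schema applies.
function hasValidRunStartedDomains(e: RunStartedNumbers): boolean {
  const { protocolVersion, maxRounds, scale, threshold } = e;
  return (
    isPositiveInt(protocolVersion) &&
    isPositiveInt(maxRounds) &&
    isPositiveInt(scale) &&
    isNonNegativeFinite(threshold) &&
    threshold <= scale
  );
}

console.log(
  hasValidRunStartedDomains({ protocolVersion: 1, maxRounds: 3, scale: 10, threshold: 8.5 }),
);
```

Rejecting `scale: 0` at the guard is what keeps a division by the scale from producing Infinity further down, which is the failure the commit describes for ScoreTicker.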
pftom	6f818d971d	feat(daemon, web): implement plugin folder installation and enhance atom worker registry - Added a new API endpoint for installing plugins from specified folder paths, improving the plugin management experience. - Introduced functions for normalizing and validating project plugin folder paths, ensuring robust error handling. - Implemented a registry for built-in atom workers, allowing for dynamic signal aggregation during pipeline execution. - Enhanced the `runStageWithRegistry` function to support multiple atom workers, merging their outputs with pessimistic logic. - Updated the UI components to display plugin folder candidates and facilitate user interactions for plugin installation. - Added tests for the new atom worker registry and plugin folder installation features, ensuring reliability and correctness. This update significantly enhances the plugin installation process and the overall functionality of the atom worker system, providing users with better tools for managing plugins and their interactions.	2026-05-12 21:38:45 +08:00
pftom	61a68a34f8	feat(web): implement Home intent rail for streamlined project creation - Introduced a new Home intent rail in the HomeHero component, allowing users to select project categories or migration shortcuts via chips. - Enhanced the EntryShell and HomeView components to support project kind selection based on the chosen chip, improving the project creation flow. - Added functionality for folder import and template selection, integrating with existing project creation mechanisms. - Updated CSS styles for the Home intent rail to ensure a responsive and visually appealing layout. - Implemented tests for the new chip interactions and HomeHero functionality, ensuring reliability and user experience. This update significantly enhances the user experience by providing a more intuitive way to create projects and manage migrations directly from the Home interface.	2026-05-12 21:08:25 +08:00
pftom	ed2cbe171b	feat(daemon, web): implement media generation scenario and enhance plugin handling - Introduced a new `od-media-generation` scenario plugin for handling image, video, and audio projects, providing a default pipeline for media generation. - Updated the `collectBundledScenarios` function to deduplicate scenarios and prefer canonical IDs for task kinds, improving plugin routing. - Enhanced the `PluginsView` and `HomeHero` components to better display community and user-installed plugins, improving user experience. - Refactored tests to accommodate the new media generation scenario and ensure proper functionality across plugin types. This update significantly enhances the media handling capabilities and overall plugin management experience, making it easier for users to work with various media projects.	2026-05-12 20:54:33 +08:00
pftom	13d5598b0c	feat(web, daemon): enhance plugin import functionality and UI components - Added support for uploading plugins via zip files and folders, improving the plugin import process. - Introduced a new `PluginImportModal` for a streamlined user experience when importing plugins. - Updated the `PluginsView` to include disabled states for unfinished plugin areas, enhancing clarity for users. - Refactored various components to utilize the new `resolvePluginQueryFallback` function for improved localization handling. - Enhanced CSS styles for better visual feedback and responsiveness in the plugin import interface. This update significantly improves the plugin management experience, making it easier for users to import and manage plugins effectively.	2026-05-12 20:46:17 +08:00
pftom	443aea72c5	feat(daemon, web): enhance plugin handling and UI integration - Introduced a new plugin upload mechanism with file size limits and memory storage, allowing users to upload plugins directly. - Implemented fallback logic for plugin application, ensuring projects can be created without explicit plugin requests. - Enhanced the UI to support plugin selection and integration, including a new `PluginsView` component for managing plugins. - Updated various components to utilize localized text for plugin queries, improving user experience across different languages. - Added tests for new plugin functionalities and local skill loading, ensuring reliability and correctness. This update significantly improves the plugin management experience, providing users with better tools for plugin integration and interaction.	2026-05-12 20:42:40 +08:00
lefarcen	7b191b5f85	fix: load Orbit templates from design templates (#1442) (cherry picked from commit `988e727927`) Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>	2026-05-12 19:38:16 +08:00
Rocky	a7e6e0dc3d	fix(web/AssistantMessage): update status block detail to latest value instead of skipping (#1413 ) When using Open Design with an ACP agent (e.g. Devin for Terminal), after selecting a non-default model in the picker, the model badge under the conversation header kept showing the agent's initial default (e.g. `model swe-1-6-fast`) instead of the running model. The conversation header text and the agent itself reflected the selected model correctly — only the badge UI was stale. Root cause: `buildBlocks()` in this file de-duplicated consecutive status events with the same label by SKIPPING the new one rather than updating the existing block. The daemon emits two `status: label='model'` events per turn — once after `session/new` returns with the agent's default model, then again after `session/set_config_option` succeeds with the user-selected model (see `apps/daemon/src/acp.ts` ~lines 495 and 587). The dedupe kept the first and skipped the second, so the badge stayed stuck on the default. Fix: update the existing block's detail to the latest value instead. "Most recent detail wins" is more accurate than "first one wins forever" for every status label that reaches this code path (`'model'`, `'initializing'`, etc.; the filter at lines 1156-1162 already drops the labels we don't want to surface as badges: streaming, starting, requesting, thinking, empty_response). Adds two regression tests in apps/web/tests/components/AssistantMessage.test.tsx: - Two sequential `status: 'model'` events with different details render the second detail and not the first (the Bug A scenario). - Two sequential events with the SAME detail still collapse to a single badge (no regression in the existing dedupe behavior). `pnpm guard` clean; AssistantMessage tests pass 22/22. Manually verified working in a local build against Devin for Terminal `2026.5.6-4` with #1208 applied — badge now updates to the selected model on every turn.	2026-05-12 19:28:35 +08:00
PerishFire	e6c5560884	Fix appearance accent color persistence (#1439)	2026-05-12 19:11:09 +08:00
pftom	3d6d434f11	feat(web): enhance TasksView and plugin preview components with new features and styles - Updated `TasksView` to include a title row with a "Coming soon" label and added a preview note for user guidance. - Enhanced `DesignSystemSurface` to fetch and display design system showcases, improving visual representation in the plugin home section. - Refactored `PreviewSurface` to pass the `inView` prop to `DesignSystemSurface`, optimizing rendering behavior. - Improved CSS styles for tasks and design system components, ensuring a cohesive and visually appealing layout. This update significantly enhances user experience by providing clearer information and improved visual elements in the tasks and plugin previews.	2026-05-12 18:10:37 +08:00
pftom	9f4e76d507	feat(web): introduce integrations and tasks views with enhanced navigation and settings management - Added new `IntegrationsView` and `TasksView` components to facilitate user interaction with integrations and task management. - Updated the `App` component to manage initial tab states for integrations and settings navigation. - Enhanced `EntryNavRail` to include navigation options for tasks and integrations, improving accessibility. - Refactored `EntryShell` to support dynamic rendering of the new views and manage integration tab states. - Improved CSS styles for the new views, ensuring a cohesive design and responsive layout. This update significantly enhances the user experience by providing dedicated views for integrations and tasks, streamlining navigation and settings management.	2026-05-12 18:05:15 +08:00
Matt Van Horn	23218eacd9	refactor(settings): use tiled language picker instead of dropdown (#1406) The Language section in Settings rendered a single-button dropdown trigger that opened a floating menu. With one visible label and lots of empty panel space, the layout misled users into thinking only one language existed. Replace the dropdown trigger + portaled menu with an inline tile grid that shows every locale at a glance and clicks directly to switch. Side effects of the new layout: the languageOpen / languageMenuRect state, the dynamic placement effect, the resize-close effect, the mousedown click-outside handler, and the languageRef are gone. The global Escape handler no longer needs to guard against the menu being open. CSS for .settings-language-picker, .settings-language-button, .settings-language-menu, and .settings-language-option is replaced by .settings-language-grid (auto-fill 180px minmax columns) + .settings-language-tile. Tests in SettingsDialog.execution.test.tsx that drove the dropdown (click trigger → click menuitemradio → assert menu closed) are rewritten to drive the tiles directly via the radio role. Refs #1347	2026-05-12 17:49:04 +08:00
shangxinyu1	0220124e0f	test: add Memory and Routines coverage (#1400) * test: align extended Playwright coverage with current UI behavior * test: address extended suite review feedback * test: fix Codex fallback config hydration in e2e * test: add Memory and Routines coverage * test: fix Memory and Routines component test typing * test: include Memory and Routines e2e in extended suite	2026-05-12 17:48:56 +08:00
pftom	7464cb443a	feat(web): enhance plugin media detail and modal components with improved layout and functionality - Introduced a new custom stage container for media plugins, allowing for a unified modal experience across different media types (image, video, audio). - Updated the `PreviewModal` to support custom ReactNode stages, enhancing the flexibility of media previews. - Refactored the `PluginMetaSections` to include an optional heading for better clarity and organization of plugin information. - Improved the close button design in various modals for consistency and better user experience. - Enhanced CSS styles for modals and plugin details to ensure responsive design and visual consistency across the application. This update significantly improves the usability and aesthetics of media-related modals, making it easier for users to interact with and understand plugin content.	2026-05-12 17:44:32 +08:00
Rocky	6c3fd86642	fix(daemon/acp): terminate ACP child after clean prompt completion (#1286 ) * fix(daemon/acp): terminate ACP child after clean prompt completion (Bug B / #1265) Some ACP agents (notably Devin for Terminal) keep the child process alive after stdin closes, waiting for the next prompt. Open Design spawns a fresh agent per chat turn and relies on child.on('close') to finalize the run, so without an explicit signal-driven shutdown the chat sits stuck in the 'working' state indefinitely. Three small, targeted changes: - apps/daemon/src/acp.ts: After a clean session/prompt response we schedule a 500ms grace period and then SIGTERM the child. This mirrors the pattern detectAcpModels() already uses after model discovery. The grace period leaves well-behaved agents that exit on stdin.end() unaffected. - apps/daemon/src/acp.ts: New completedSuccessfully() method on the session handle reports whether the prompt resolved without a fatal error or abort, so the consumer can distinguish 'clean signal exit' from 'genuine signal failure'. - apps/daemon/src/server.ts: child.on('close') now treats a SIGTERM exit as 'succeeded' when acpSession.completedSuccessfully() is true. - apps/web/src/providers/daemon.ts: Trust the server's authoritative endStatus; the signal/non-zero-code safety net no longer overrides an explicit 'succeeded' status, so the chat doesn't surface a fake 'agent exited with signal SIGTERM' error after a clean ACP run. Daemon tests cover the SIGTERM grace timer, clean early-exit (timer cleared), and completedSuccessfully() abort/error states. Manual UI test on plain main + this fix confirms Devin chats now return to ready automatically after Done · ... * fix(daemon/connectionTest): treat ACP clean SIGTERM as success Codex review on #1286 caught that the new SIGTERM in attachAcpSession breaks ACP connection tests for agents that don't shut down on stdin.end() (the exact Devin behavior the patch targets). 
attachAgentStreamHandlers() in connectionTest.ts now also respects acpSession.completedSuccessfully(), mirroring the same check we apply in server.ts. Without this, a clean prompt response followed by our SIGTERM would set winner.signal === 'SIGTERM', flip exitedCleanly to false, and the connection test would report 'agent_spawn_failed' even when the agent had returned a healthy response. Also widened the AgentSpawnHandle type so completedSuccessfully is visible on the structural type used inside connectionTest.ts. All 56 daemon tests still pass; typecheck + guard clean. * fix(daemon/acp): narrow ACP success-on-signal override to forced-SIGTERM Looper review on #1286 caught that the success predicate was broader than the SIGTERM case it was meant to handle. `completedSuccessfully()` flips to true as soon as the ACP `session/prompt` response is processed, but it does not say why the child later closed. With the broad predicate, an ACP agent that returned a prompt result and then exited with code 1 (or was killed by SIGKILL/SIGSEGV) was still marked 'succeeded', regressing the existing close-status behavior for genuine post-response process failures. Scope the override to the exact forced-shutdown shape this PR introduces: code === null && signal === 'SIGTERM' && acpCleanCompletion Applied to both `server.ts` (chat run finalization) and `connectionTest.ts` (connection-test classification). Any other post-response failure now falls through to 'failed' / 'agent_spawn_failed' as before. All 59 daemon tests still pass; typecheck + guard clean. * fix(web/daemon): only bypass exit-code safety net on explicit server success Looper review on #1286 caught that the previous web change trusted `endStatus === 'succeeded'` absolutely, but `endStatus` can become 'succeeded' in two distinct ways: 1. The SSE end event explicitly carries `status: 'succeeded'` (authoritative server declaration). 2. 
The end event omits or has an invalid `status` field and the handler silently falls back to 'succeeded' as a local default. Both produced `endStatus === 'succeeded'` in the existing code, so the new safety-net bypass treated them identically. That regressed backward compat: a compatible or older daemon emitting an end event like `{code:1}` or `{code:null,signal:"SIGTERM"}` with no `status` would suddenly skip the failure banner. Track explicit success separately via `serverDeclaredSuccess`, set true only when: - The SSE end event has `status === 'succeeded'`, or - The fallback `fetchChatRunStatus` REST path returns `status === 'succeeded'` (which the existing `isChatRunStatus()` guard already proves is explicit). The safety net is now bypassed only on that explicit signal; the local-fallback success path still reaches the exit-code/signal check so real failures surface as before. Adds three web-side regression tests in `apps/web/tests/providers/sse.test.ts`: - Explicit `status: 'succeeded'` + SIGTERM → onDone called, no error - End event with `{code:1}` and no `status` → onError surfaces 'agent exited with code 1' as before - End event with `{code:null,signal:'SIGTERM'}` and no `status` → onError surfaces 'agent exited with signal SIGTERM' as before `pnpm guard` + daemon typecheck clean; 27/27 SSE tests pass (up from 24).	2026-05-12 17:13:10 +08:00
Yiang Yiyan	5ff578dc8d	fix: support object-style question-form options (#1293) * fix: support object-style question-form options * fix: preserve stable option values in form submissions	2026-05-12 17:03:45 +08:00
Mason	2f51f3c1ae	feature: refine assistant artifact feedback (#1379) * feature: refine assistant artifact feedback * fix: clear hidden custom feedback reason * test: update assistant feedback expectations	2026-05-12 17:00:42 +08:00
pftom	244e8b7981	feat(daemon): enhance plugin preview handling and add fallback mechanisms - Implemented a new `collectPluginPreviewCandidates` function to gather potential HTML assets for plugins, improving the robustness of the preview endpoint. - Introduced `discoverPluginHtmlAssets` to scan common directories for HTML files, ensuring that plugins with missing declared entries can still provide a valid preview. - Updated the `/api/plugins/:id/preview` route to utilize the new candidate collection and discovery functions, enhancing the user experience by preventing blank tiles in the gallery. - Added comprehensive tests for the new fallback logic to ensure reliability and correctness in various scenarios. This update significantly improves the plugin preview functionality, ensuring users have access to valid previews even when manifest entries are outdated or missing.	2026-05-12 16:45:24 +08:00
pftom	2e27745800	feat(web): enhance PluginsHomeSection with improved plugin card visuals and layout - Updated the `PluginsHomeSection` to change the section title from "Plugins" to "Community" and revised the subtitle for clarity. - Refactored the `PluginCard` component to include a new `PreviewSurface` for dynamic rendering of plugin previews based on type (media, HTML, design). - Introduced lazy loading for plugin previews using the `useInView` hook to optimize performance. - Added new components for different preview types: `MediaSurface`, `HtmlSurface`, `DesignSystemSurface`, and `TextSurface`, enhancing the visual presentation of plugins. - Improved CSS styles for a more responsive and visually appealing grid layout, accommodating up to five tiles per row on larger screens. - Implemented a new `inferPluginPreview` function to classify and render plugins based on their manifest data. This update significantly enhances the user experience by providing a more engaging and visually rich plugin gallery.	2026-05-12 15:51:39 +08:00
이용진	aeb6cde923	prevent duplicate saves and add template deletion (#1294) * prevent duplicate template entries on repeated save * add delete button to saved template list Templates can now be removed from the template picker via a hover x button, calling the existing DELETE /api/templates/:id endpoint. * add missing onDeleteTemplate prop in test fixtures * add template deletion flow test for NewProjectPanel * reject template names longer than 100 characters * preserve original createdAt on template update	2026-05-12 15:48:04 +08:00
pftom	6cccccdc56	feat(web): refactor PluginsHomeSection to implement 3-axis faceted filtering - Replaced the previous tag-based filtering with a new 3-axis model (SURFACE, TYPE, SCENARIO) for enhanced plugin discovery. - Introduced a new `usePluginFacets` hook to manage the independent selection of facets and their AND-composition. - Updated the `PluginsHomeSection` component to render facet rows and a Featured chip, improving user interaction and layout. - Removed legacy categorization logic and associated files, streamlining the codebase. - Enhanced CSS styles for the new layout and improved visual consistency across the plugins home section. - Added tests to ensure the correctness of facet extraction and filtering logic. This update significantly enhances the user experience by providing a more flexible and intuitive way to filter and discover plugins.	2026-05-12 15:29:59 +08:00
Eli	9c489aa045	feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select (#1161 ) * feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select - Render real previews on project cards: HTML iframe / image / video / hashed gradient fallback with project initial; lazily fetches the project's primary file when metadata.entryFile is unset, prefers index.html → newest html → image → video. - Live artifact card thumbnails embed the rendered artifact URL via sandboxed iframe. - Replace the per-card close button with a `…` overflow menu (Rename, Delete) that opens on hover/click; click-outside and Esc close it. - Add multi-select mode (toolbar toggle → checkbox per card → "N selected · Delete · Cancel" pill) with batch delete via the existing onDelete prop. - Add a category tag to every card (Prototype / Live Artifact / Slide / Media) derived from project.metadata.intent / kind / skillId. - Replace browser prompt() and confirm() with custom modals (rename input + danger-confirm) reusing the existing .modal shell. - Add `more-horizontal` icon and 16 new i18n keys across all 18 locales (zh-CN/zh-TW localized; others fall back to English). * test(e2e): update home delete flow for overflow menu + custom confirm modal The previous flow targeted a per-card X button labelled "delete project <name>" and asserted on a native `dialog` event. The card UI now exposes a `…` overflow menu and a styled confirm modal, so reach delete via the menu and assert against the modal's Cancel / Delete buttons instead. * fix(web): harden Designs tab preview sandbox * fix(web): hide Designs select mode in kanban	2026-05-12 15:08:22 +08:00
Eli	77f69257a7	feat(web): in-context comment thread for the artifact preview (#1276 ) * feat(web): free-pin fallback in comment mode for unannotated artifacts When the artifact has no data-od-id annotations, clicking in Comment mode now posts a synthetic position-based target so the host opens a popover at the click location. Daemon upsert validation requires a non-empty selector/label, so the pin uses [data-od-pin=ID] and label 'pin'. Coordinates are document-space (viewport + scrollY) so pins stay anchored after scroll/reload. Clicks on interactive elements (a/button/input/textarea/select/label/contenteditable) keep their native behavior and are not pinned. * feat(web): tighten comment popover layout for free-pin and element targets The popover header used to dump the raw elementId verbatim — fine for data-od-id targets like 'hero-cta' but jarring for free-pins where elementId is a synthetic 'pin-...' string. Branch on the prefix and show 'Pin · at X, Y' for free-pins; keep the label + selection kind for real element / pod targets. Replace the text 'Close' button with an icon-only close affordance to match the popover-as-card visual. Action row is now two right-aligned buttons (Comment + Send to Claude) for element targets and (Add note + Send to Claude) for pod targets, eliminating the three-button row that wrapped onto two lines at narrow widths. The 'Remove' affordance for existing comments stays left-aligned. * feat(web): drop comments tab from chat sidebar The chat sidebar's 'Comments' tab listed saved/attached preview comments but duplicates the per-element popover already shown in the artifact viewer. Hide the tab and its content while the right-side comment thread panel takes over the same surface in-context. The CommentsPanel / CommentSection components stay defined as dead code for the moment so callers and translation keys remain valid; a later pass can delete them. 
* feat(web): right-side comment thread panel in board mode Render a 320px CommentSidePanel anchored to the right of the artifact preview whenever board (comment) mode is on. The panel lists every saved preview comment for the current file with an avatar initial, the element label (or 'Pin' for free-pin synthetic ids), an Xd/Xh/Xm-ago timestamp, the note body, a Reply link, and a checkbox. Reply focuses the comment's element via liveSnapshotForComment so the popover opens at the right anchor. Selecting one or more comments via the checkboxes surfaces a 'N selected · Clear · Send to Claude' action bar above the list; Send to Claude reuses the existing onSendBoardCommentAttachments pipeline via commentsToAttachments. The panel takes the place of the chat sidebar's removed Comments tab so the thread lives next to the artifact instead of behind a tab switch. * feat(web): styles for right-side comment thread panel Floating 320px panel anchored to the right edge of the artifact preview with a scrollable comment list and a coral selection bar that appears when one or more comments are checked. Selected items get a coral tint; the reply / check / send-to-claude controls match the popover's coral primary tone. * feat(web): toast confirmation on comment save, close popover After savePersistentComment succeeds, close the popover via clearBoardComposer and surface a transient 'Comment saved' (or 'Pin saved' for free-pin targets) toast for 2.2s. Replaces the previous behavior where the popover stayed open with an empty draft after save, which left users uncertain whether the save landed and forced an extra click to dismiss. * feat(web): position the comment-save toast at the top of the preview * feat(web): allow editing saved comment notes via the side panel Rename the per-item 'Reply' affordance to 'Edit' (no thread model exists yet, so reply was misleading) and pre-fill the popover with the existing note when clicked. 
The save path goes through onSavePreviewComment which the daemon implements as an upsert keyed on (project, conversation, filePath, elementId), so the edit overwrites the existing row's note without spawning a duplicate. Also fall back to a snapshot synthesized from the saved comment's own fields when the corresponding live target is no longer in the iframe DOM (e.g. free-pin parents that were re-rendered), so the edit path still works after artifact reloads. * feat(web): hide already-sent comments from the side panel After Send to Claude, the daemon flips the comment status from 'open' to 'applying' (and then 'needs_review' / 'resolved' / 'failed' depending on the run). Filter the side panel to status === 'open' so sent comments visibly leave the list — the user gets clear feedback that the send landed and the panel stays focused on actionable, un-sent items. * feat(web): drop single-tab bar and conversation count badge After the Comments tab was removed the chat header still rendered a one-tab 'tablist' just for the Chat tab, which read as visual noise without a sibling to switch between. Drop the tabs wrapper entirely; the chat content stays mounted and the header now hosts only the conversation-history affordance. Also drop the numeric badge that overlaid the conversation history button: counting open conversations next to a generic history icon was easy to mistake for an unread / notification count. The dropdown itself remains the canonical place to see and switch between past conversations. * feat(web): right-align chat header actions after tab bar removal With the tabs wrapper gone, chat-header-actions sat flush left because nothing was pushing it across the header. Add margin-left: auto so the history / new-conversation / collapse buttons land at the right edge, matching the design files / index.html tab row's own right-aligned controls. 
* feat(web): rename board-mode toggle to Comment with comment icon The artifact preview toolbar's board-mode entry was labeled 'Tweaks' with the tweaks icon, which collided with the palette Tweaks button next to it and hid the comment capability behind a generic label. Rename to 'Comment' with the comment icon and switch to the viewer-action class so the button matches the surrounding toolbar items (Edit/Draw) and the coral active state lands on the right surface. * fix(web): pass designTemplates to ProjectView in api-empty-response test The test props for ProjectView were missing the designTemplates prop that was added to Props in #955 (generic skills split). CI's strict typecheck (tsc -b --noEmit) caught it; local runs that hit project references differently did not. Pass an empty SkillSummary array — matches the empty skills fixture for the same reason.	2026-05-12 15:05:08 +08:00
Eli	928079daf5	feat(web): consolidate Image/Video/Audio entries into a Media tab (#1167) Reduces the New Project panel's top-level tab count by collapsing the three media surfaces into a single Media tab with an inner segmented control, and polishes the controls inside that tab so they stop dominating the panel: - Media tab + segmented (Image / Video / Audio) inside the panel body. Underlying ProjectKind branches and submission contract unchanged — the daemon still receives kind=image/video/audio. - Model picker rewritten as a combobox: one trigger row + searchable, provider-grouped popover with Recommended badges. Replaces the flat grid of provider-grouped cards that scrolled past the fold once the fourth provider landed. - Aspect picker compressed from a 5-card grid to a single row of segmented pills with mini ratio glyphs. - Image surface no longer carries a free-form Style notes field; it was redundant with the prompt template + main prompt input. - Live artifact tab locks fidelity to high-fidelity (the wireframe option is now hidden) — a wireframe live artifact doesn't make sense and the picker added noise. i18n: adds tabMedia / titleMedia / model* keys across all 18 locales, removes imageStyleLabel / imageStylePlaceholder. Tests + e2e selectors updated to drive the new Media tab + segmented surface flow.	2026-05-12 14:52:03 +08:00
huyhoangnhh98	140a4e1ff6	Improve responsive preview and design handoff outputs (#1224) * feat: improve responsive design handoff * feat: refine cross-platform design outputs Changelog: - Add auto-fit responsive preview behavior for tablet/mobile frames. - Add landing page and OS widgets metadata options with project header chips. - Strengthen prompt contracts for modern breakpoints, app-specific modules, CJX-ready UX, and final product surfaces. - Require cross-platform outputs to use separate platform files instead of tabbed demo selectors. - Add DESIGN-MANIFEST.json plus richer handoff guidance to daemon/client exports. - Update archive/export tests for manifest and responsive viewport matrix. * feat: enforce screen-file design outputs Changelog: - Enforce screen-file-first generation for landing pages, app screens, platform surfaces, and OS widgets. - Update design handoff and manifest exports so coding tools map each screen file to separate routes/surfaces. - Strengthen minimal-brief visual guidance to avoid monochrome or unstyled design outputs. * fix: address responsive handoff review feedback * fix: address handoff review blockers * fix: preserve proxy auth and normalized export entry * fix: narrow frame wrapper filter to directory paths only * fix: make artifact save failure banner generic --------- Co-authored-by: Huy Hoàng <macos@MacBook-Pro-Hoang.local>	2026-05-12 14:18:33 +08:00
pftom	5af84c09af	feat(web): refactor PluginsHomeSection to use tag-based filtering and introduce PluginCard component - Replaced the legacy tabbed categorization in `PluginsHomeSection` with a tag-driven approach, allowing dynamic filtering based on plugin tags. - Introduced a new `PluginCard` component to encapsulate the rendering of individual plugin cards, improving separation of concerns and maintainability. - Added a `usePluginCategories` hook to manage plugin visibility and filtering logic, enhancing the overall structure and testability of the component. - Implemented a "More" pill for overflow tags in the filter row, improving user interaction with a cleaner UI. - Updated CSS styles to support the new layout and improve visual consistency across the plugins home section. This update significantly enhances the user experience by providing a more flexible and intuitive way to discover and interact with plugins.	2026-05-12 13:25:44 +08:00
pftom	9825b3ba1f	feat(web): enhance entry view with responsive topbar and GitHub star integration - Updated `EntryShell` to include a responsive topbar that collapses into a settings dropdown on narrow viewports, improving accessibility and user experience. - Integrated `GithubStarBadge` to display the current star count for the GitHub repository, encouraging user engagement. - Added a new `useGithubStars` hook to manage the star count fetching and caching, ensuring efficient updates without unnecessary API calls. - Enhanced CSS styles for the topbar and avatar items to improve visual consistency and interaction feedback. - Updated internationalization files to include translations for new UI elements related to the GitHub star feature. This update significantly improves the user interface by providing a more dynamic and engaging entry experience.	2026-05-12 11:57:41 +08:00
pftom	45760a75aa	feat(web): enhance entry view with API protocol and model switching - Introduced `InlineModelSwitcher` to allow users to switch between CLI and BYOK modes, along with selecting the active model. - Updated `App` component to handle API protocol and model changes, ensuring seamless configuration updates. - Modified routing to support distinct views for home, projects, and design systems, improving navigation and user experience. - Removed legacy `PluginsSection` from `NewProjectPanel`, streamlining the project creation process. - Enhanced CSS styles for better visual consistency across updated components. This update significantly improves user interaction by providing intuitive controls for managing execution modes and models directly from the entry view.	2026-05-12 11:25:06 +08:00
Nagendhra Madishetti	dbc94b83ed	feat(web): add thumbs-up/down feedback widget under completed assistant turns (#1288 ) (#1308 ) * feat(web): add thumbs-up/down feedback widget under completed assistant turns (#1288) Adds a lightweight feedback widget that surfaces under each assistant turn whose run succeeded. Users can submit positive or negative feedback in one click; the negative path opens an optional free-text comment area. The widget never blocks the message composer and only mounts after the run has produced its final artifact, matching the acceptance criteria. What ships - `<MessageFeedback>` (apps/web/src/components/MessageFeedback.tsx) renders the three states: idle (prompt + thumbs), submitted positive (confirmation + Change), submitted negative (confirmation + optional comment textarea + Send + Change). - AssistantMessage.tsx slots the widget under AssistantFooter, gated on `runSucceeded && !hasEmptyResponse`, so failed and empty-response turns don't ask the user to rate something that never finished. - The full record shape leaves room for the future analytics metadata the issue calls out (rating, comment, submittedAt; artifactRef / runId derivable from the surrounding message whenever the analytics pipeline lands). Persistence (v1 = localStorage) Lefarcen's clarifying comment on the issue asked whether v1 should be daemon-persisted or in-memory while the analytics pipeline is defined. The daemon's messages table is column-strict, so daemon persistence would require a SQLite migration plus a contract bump on `ChatMessage`; locking that shape in before the analytics pipeline is designed risks reworking it twice. localStorage is the middle ground: feedback survives reload (so the "feedback state is visually clear after submission" criterion holds across tabs and sessions) without committing the wire shape. 
The hook surface is just `(value, setter)`, so a future PR can swap the storage layer for a daemon mirror or an analytics shipper without touching the React surface. The store handles corrupted JSON, unknown future rating values, disabled storage (private-mode browsers), and broadcasts changes across listeners in the same tab via a CustomEvent so two mounts of the hook for the same messageId stay in sync. i18n 11 new keys under `feedback.` (prompt, thumbsUp/Down, two confirmation chips, comment label/placeholder/submit/saved, change). English source values authored alongside the keys; zh-CN translations added in the same pass so the locale alignment test stays green and Chinese users see Chinese strings from day one. The other 16 locales pick up English fallbacks via their existing `...en` spread. Test coverage - `tests/state/message-feedback.test.ts` (8 jsdom cases) — round-trip, null-clear, corrupted JSON, missing rating, unknown rating, key collision across messages. - `tests/components/MessageFeedback.test.tsx` (7 jsdom cases) — idle state, positive submit, negative submit, comment save, blank-comment Send disabled, Change unsticks the rating, rehydration from pre-populated storage. The locale alignment test continues to enforce that every locale declares the new keys (5/5 across 18 locales). Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - tests/i18n/locales.test.ts 5/5 - tests/state/message-feedback.test.ts 8/8 - tests/components/MessageFeedback.test.tsx 7/7 - Full web suite: 98 files, 903 tests fix(web): tighten feedback widget gate + storage sync + textarea, add styles (PR #1308 review) Addresses every P2/P3 from the codex + Siri-Ray + lefarcen reviews on PR #1308, plus a couple of polish items the review surfaced indirectly. Visibility gate (lefarcen P2) The gate was `runSucceeded && !hasEmptyResponse`, which also matched text-only acknowledgements and question-form replies. 
The issue scopes feedback to turns that produced a final artifact, so the gate now also requires `produced.length > 0`. New AssistantMessage suite (5 jsdom cases) pins: artifact -> shown, no-artifact -> hidden, streaming -> hidden, failed run -> hidden, empty_response -> hidden. Storage sync (codex P2 + lefarcen P2) The previous broadcast contract was: write storage, dispatch a bare CustomEvent, listeners re-read storage. That had two failure modes: - setItem throwing (private mode / quota / disabled storage) left the listener seeing null and clobbering the in-memory state the user just confirmed. - The clear path early-returned after removeItem and never dispatched, so a second mount of the same messageId stayed in the submitted state when the user clicked Change. New contract: every successful OR failed write dispatches a CustomEvent whose `detail.value` carries the new feedback record (or null). Listeners apply the value directly without re-reading. Same-tab sync survives storage failures and the clear path no longer early-returns. Cross-tab still re-reads on the platform `storage` event since that event has no detail. Two new storage tests pin the new broadcast contract (positive + null) and the failed-setItem path; two new component tests pin in-session confirmation under setItem failure and two-mount Submit + Change synchrony. Textarea draft fix (lefarcen P3) The textarea used `draftComment || feedback.comment || ''` as its controlled value, so erasing a saved comment snapped it back. The draft is now exclusively the source of truth; a ref-backed effect re-seeds the draft from feedback.comment whenever the rating transitions (mount, idle -> negative, cross-mount sync). Send is now enabled when `draftComment !== savedComment`, which lets the user both edit and clear a saved comment. New component test pins erase + Send actually removing a previously-saved comment. 
Accessibility The confirmation chip and "Comment saved" tag both gain `role="status"` + `aria-live="polite"` so screen readers announce the state transition. The thumb buttons keep their `aria-label`. CSS (lefarcen P3) The widget's `.message-feedback*` class set had no rules in index.css, so it rendered with default browser controls. Added a ~130-line block that mirrors the surrounding chat pill/chip vocabulary: bg-subtle background, border-pill confirmation chip, accent-tinted positive state and amber-tinted negative state to match the assistant-footer's data-unfinished pattern. Comment area sits below the chip and wraps on narrow widths so the composer isn't pushed off-screen on small panes. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - tests/state/message-feedback.test.ts 10/10 (was 8, +2 broadcast) - tests/components/MessageFeedback.test.tsx 10/10 (was 7, +3 sync / storage-failure / clear-saved-comment) - tests/components/AssistantMessage.test.tsx 5/5 (new file) - tests/i18n/locales.test.ts 5/5 - Full web suite: 866 tests --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-12 11:10:28 +08:00
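The broadcast contract described in the commit above (every write, successful or failed, notifies listeners with the new value itself) can be sketched roughly as follows. This is a minimal illustration, not the project's code: a plain listener array stands in for the window `CustomEvent`, and the `Feedback` shape, the `KeyValueStore` seam, and the key format are all assumptions.

```typescript
// Hypothetical record shape; the commit message only names rating, comment, submittedAt.
type Feedback = { rating: "up" | "down"; comment?: string; submittedAt: number };
type Listener = (messageId: string, value: Feedback | null) => void;

// Minimal storage seam (assumption) so the sketch runs without a browser localStorage.
interface KeyValueStore {
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

let listeners: Listener[] = [];

// Registers a listener and returns a function that removes it again.
function subscribe(listener: Listener): () => void {
  listeners.push(listener);
  return () => {
    listeners = listeners.filter((l) => l !== listener);
  };
}

function writeFeedback(store: KeyValueStore, messageId: string, value: Feedback | null): void {
  const key = "feedback:" + messageId; // hypothetical key format
  try {
    if (value === null) store.removeItem(key);
    else store.setItem(key, JSON.stringify(value));
  } catch {
    // Storage may be disabled or over quota; the broadcast below still happens.
  }
  // Carry the value in the notification so listeners never re-read storage.
  listeners.forEach((listener) => listener(messageId, value));
}
```

Because the value travels with the notification, a second mount for the same message stays in sync even when the write itself threw, and the clear path (`null`) is broadcast like any other write.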
pftom	b55f171693	feat(web): redesign entry view with new navigation and home layout - Introduced `EntryNavRail` for a streamlined left navigation experience, featuring primary actions and a brand logo. - Created `EntryShell` to manage the entire home view layout, integrating the centered hero, recent projects, and plugins section. - Developed `HomeHero` for user prompt input, allowing seamless interaction with plugins and project creation. - Replaced the previous `PluginLoopHome` with a more cohesive `HomeView` that orchestrates plugin interactions and project submissions. - Enhanced CSS styles for improved visual consistency across the redesigned components. This update significantly enhances the user experience by providing a more intuitive and visually appealing entry point into the application.	2026-05-12 10:56:35 +08:00
Nagendhra Madishetti	1df3eca161	feat(web): Critique Theater Phase 7 — reducer + useCritiqueStream + useCritiqueReplay (#1307 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-12 10:45:07 +08:00
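The payload-override fix in the commit above comes down to spread order plus revalidation. A minimal sketch, assuming a simplified `PanelEvent` shape and an illustrative subset of event types (the real contract and the real `isPanelEvent` predicate live in the project and are not reproduced here):

```typescript
// Hypothetical, simplified PanelEvent; the real contract is richer.
type PanelEvent = { type: string; runId: string; [key: string]: unknown };

// Illustrative subset of event types, not the project's full list.
const KNOWN_TYPES = ["run_started", "panelist_open", "round_end", "ship"];

function isPanelEvent(value: unknown): value is PanelEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const type = v["type"];
  const runId = v["runId"];
  return typeof type === "string" && KNOWN_TYPES.indexOf(type) !== -1
    && typeof runId === "string" && runId.length > 0;
}

function sseToPanelEvent(channel: string, rawData: string): PanelEvent | null {
  const prefix = "critique.";
  if (channel.indexOf(prefix) !== 0) return null;
  let data: unknown;
  try {
    data = JSON.parse(rawData);
  } catch {
    return null; // malformed payloads are dropped, not thrown
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  // Spread the payload first so the channel-derived type always wins.
  const candidate = { ...(data as Record<string, unknown>), type: channel.slice(prefix.length) };
  return isPanelEvent(candidate) ? candidate : null;
}
```

With this ordering, a frame on `critique.run_started` whose payload claims `type: "ship"` still decodes as `run_started`, and frames with a missing or empty `runId` are dropped.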
Nagendhra Madishetti	64510b790b	fix(web): translate Design Files refresh strings instead of hardcoding English (#1254 ) (#1300 ) * fix(web): translate Design Files / live artifact refresh strings instead of hardcoding English When the app language was set to Chinese, the Design Files refresh flow showed Chinese for the surrounding chrome but kept English for every label and message originating in describeRefreshStatus, describeEventPhase, and the refresh-event timeline body of LiveArtifactRefreshHistoryPanel. Same-screen mixed-language UX, the exact symptom reported in #1254. Root cause: those three sites bypassed i18n entirely. describeRefreshStatus returned hardcoded English label + description strings for the running / succeeded / failed / idle / never statuses; describeEventPhase returned hardcoded Started / Succeeded / Failed labels; the timeline body inlined "Refresh started…", "<n> source(s) updated", and "Refresh failed." string literals; and the empty-timeline copy ("No refresh activity yet in this session. Trigger Refresh to record a timeline…") was hardcoded too. Fix: thread the existing TranslateFn through both helpers, swap every hardcoded string for a t() lookup, and pull the empty-timeline copy and the failure-fallback through the same path. Added 13 new keys under liveArtifact.refresh.* — statusRunning, the five Description keys, three event-phase labels (eventStarted/Succeeded/Failed), eventStartedDetail, sourcesUpdatedOne/Many with an {n} placeholder, and timelineEmpty. Status labels for succeeded / failed / ready / never already had keys (statusSucceeded / statusFailed / statusReady / statusNever) so those are reused unchanged. Locales: full Chinese translations added to zh-CN.ts (the locale directly named in the issue). 
The other 16 locales pick up English fallbacks through their existing ...en spread, so the locale-key alignment test stays green; native translations for those locales can land via the usual locale-team passes without re-touching the source code. fix(web): cover the rest of the refresh panel under i18n + add a zh-CN render test Lefarcen's review on #1254 / PR #1300 surfaced that the first pass only translated three helpers (describeRefreshStatus, describeEventPhase, session timeline body) and left the rest of the panel in English. Under a Chinese UI the panel still mixed languages, which was exactly the regression the issue was filed for. This commit threads t() through every user-visible refresh-panel string the user would see in the Chinese flow: - Hero block: "Last refreshed" label + "Never" empty state. - Created / Last updated facts + their "Unknown" empty label. - Persisted refresh history header, hint, empty-state copy. - Persisted timeline status badge: succeeded / running / failed / cancelled / skipped now resolve through describePersistedStatus, which uses an exhaustive switch off LiveArtifactRefreshLogEntry's status union so a future contract addition trips tsc. - Session activity header, hint. - Document source header, hint, Type / Tool / Connector field labels. - Advanced debug metadata summary + note line. - "just now" relative-time fallback in the persisted timeline. 22 new i18n keys total (23 with the new heroLastRefreshedNever distinct from statusNever); zh-CN strings authored alongside the English source, every other locale picks them up via its existing ...en spread and the locale-key alignment test stays green. Intentionally untranslated surfaces: raw daemon payloads inside the <details> debug panel (event.step / refreshId / error.message and the JSON.stringify dump), since those are agent / connector identifiers and stack-trace style strings, not localised copy. 
The debug summary heading itself is translated; if the debug section should be hidden in localised primary flows, that is a separate UX call worth its own issue. Test coverage: new render test wraps LiveArtifactRefreshHistoryPanel in I18nProvider initial="zh-CN" and pins the Chinese rendering of every translated label, plus negative assertions that the formerly hardcoded English literals are NOT present in the markup. With the no-provider fallback returning English, the existing static-markup tests can't observe the regression this PR is meant to fix; the zh-CN render test is the only one that would have caught the original gap and will catch the next one. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, locales.test.ts (5/5), FileViewer.test.tsx (69/69, +1 new zh-CN test), full web suite (92 files, 841 tests). * fix(web): route formatRelativeTime through Intl.RelativeTimeFormat so units localise Lefarcen's second pass on PR #1300 caught the remaining hardcoded English path: formatRelativeTime() still emitted units like `5s ago` and `45m ago`, so Chinese users would see those strings inside the otherwise-translated refresh panel. The function now takes the active locale + TranslateFn and routes through Intl.RelativeTimeFormat with style: 'narrow', numeric: 'always'. That preserves the historical `5s ago` shape for English while producing locale-correct output for every other locale (zh-CN gets `5秒前` / `45分前`, with the right past / future suffix and word order). The `just now` carve-out (abs < 5s) keeps using t('liveArtifact.refresh.justNow') since Intl's narrow output for zero-delta reads awkwardly. A try/catch around the RTF constructor falls back to 'en' if the runtime rejects the locale, so the function is safe on engines with limited ICU data. 
Callsites threaded through: - LiveArtifactRefreshHistoryPanel hero metric (`lastRefreshedAt`) - Session timeline event row (`event.startedAt`) - Session timeline event time (`event.at`) - LiveArtifactRefreshFact for the created / last-updated facts; the component now accepts optional `locale` + `t` props and the panel passes them in. Test coverage extension: - The existing zh-CN render test sets a real lastRefreshedAt (now - 45s) and real session-event timestamps, then asserts the Chinese past-tense suffix `前` appears AND the legacy English `Xs ago` / `Xm ago` shapes do NOT. That was the gap lefarcen pointed at: setting `lastRefreshedAt: undefined` couldn't see the regression because no relative-time formatting ran. - Added a small second test for the lastRefreshedAt-undefined empty hero so the original `从未` coverage still pins. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, FileViewer.test.tsx (70/70, +1 new test), locales.test.ts (5/5), full web suite (92 files, 842 tests). --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-12 10:38:07 +08:00
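The relative-time change described above can be sketched along these lines. This is an assumption-laden illustration, not the project's function: the signature, the `Translate` type, and the unit cut-offs (60 s, 3600 s, 86400 s) are invented for the example; only the `narrow` / `always` options, the under-5-second `justNow` carve-out, and the fallback to `'en'` come from the commit message.

```typescript
type Translate = (key: string) => string; // stand-in for the project's TranslateFn

function formatRelativeTime(fromMs: number, nowMs: number, locale: string, t: Translate): string {
  const deltaSec = Math.round((fromMs - nowMs) / 1000); // negative means the past
  const abs = Math.abs(deltaSec);
  // Intl's narrow output for a near-zero delta reads awkwardly, so use a translated string.
  if (abs < 5) return t("liveArtifact.refresh.justNow");

  let rtf: Intl.RelativeTimeFormat;
  try {
    rtf = new Intl.RelativeTimeFormat(locale, { style: "narrow", numeric: "always" });
  } catch {
    // The runtime rejected the locale tag: fall back to English.
    rtf = new Intl.RelativeTimeFormat("en", { style: "narrow", numeric: "always" });
  }
  // Unit cut-offs below are assumptions made for this sketch.
  if (abs < 60) return rtf.format(deltaSec, "second");
  if (abs < 3600) return rtf.format(Math.trunc(deltaSec / 60), "minute");
  if (abs < 86400) return rtf.format(Math.trunc(deltaSec / 3600), "hour");
  return rtf.format(Math.trunc(deltaSec / 86400), "day");
}
```

The exact strings depend on the ICU data shipped with the runtime, so callers should not compare against hardcoded output.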
Caprika	5bd9763181	[codex] Improve Claude Code exit diagnostics (#1267 ) * fix daemon claude diagnostics * fix claude custom endpoint auth diagnostics * fix project view api empty response test props * fix claude diagnostic review gaps * fix silent custom endpoint claude diagnostics * fix claude diagnostic credential redaction * fix quoted api key redaction * fix claude diagnostic tail redaction * fix silent claude configured profile diagnostics	2026-05-12 00:08:31 +08:00
pftom	4983fc3ac4	feat(web): implement file operations summary in assistant messages - Added a new `FileOpsSummary` component to display a summary of file operations (read, write, edit) performed during an agent's run, enhancing user visibility into file interactions. - Integrated the `FileOpsSummary` into the `AssistantMessage` component, allowing it to show a compact view while streaming and expand to a detailed list once the run completes. - Created a new `file-ops` CSS style to manage the presentation of the file operations summary, including hover effects and status indicators. - Developed utility functions in `runtime/file-ops.ts` to derive and count file operations from agent events, ensuring accurate aggregation of file interactions. - Added tests for the `FileOpsSummary` component and the file operations logic to ensure functionality and prevent regressions. This update improves the user experience by providing clear insights into file operations related to agent activities.	2026-05-11 23:25:38 +08:00
pftom	070b8b07c6	feat(web): enhance PluginLoopHome with plugin details modal and refined plugin rail - Introduced a new `PluginDetailsModal` component to display detailed information about plugins directly from the PluginLoopHome interface, allowing users to preview queries, inputs, and other relevant data before applying a plugin. - Updated the `ChatComposer`, `ChatPane`, and `InlinePluginsRail` components to support a single pinned plugin display when a project is created with a specific plugin, improving user experience by preventing unnecessary plugin selection prompts. - Enhanced CSS styles for better visual presentation of plugin actions and details. - Added tests to ensure the new functionality works as intended and to prevent regressions related to the plugin rail behavior. This update streamlines the plugin selection process and enhances the overall user experience within the application.	2026-05-11 23:18:34 +08:00
pftom	b3dc3c3e0c	feat(web): integrate applied plugin snapshot for enhanced user experience - Added support for displaying an active plugin as a context chip in user messages when a project is created with a pinned plugin. This replaces the in-composer plugin rail to avoid re-prompting users for plugin selection. - Introduced `applied_plugin_snapshot_id` in the database schema and updated relevant components (ChatComposer, ChatPane, ProjectView) to handle the new functionality. - Implemented fetching of the applied plugin snapshot in ProjectView to ensure the active plugin is rendered correctly. - Enhanced CSS for the plugin chip to improve visual presentation. This change streamlines the user experience by providing context on previously selected plugins directly within the chat interface.	2026-05-11 22:53:40 +08:00
nettee	87a95b7fb4	Fix conversation run isolation (#1271 )	2026-05-11 21:13:54 +08:00
Kaelz31	3524a43d18	fix: pretty-print JSON file previews (#1206 ) * fix: pretty-print JSON file previews * fix: avoid formatting JSON with unsafe numbers * fix: preserve precision-sensitive JSON previews * fix: preserve signed zero in JSON previews * fix: scan JSON numbers without repeated slicing --------- Co-authored-by: Kael S <YOUR_GITHUB_EMAIL_HERE>	2026-05-11 20:52:55 +08:00
eggward han	a0316d2599	fix(web): suppress autosave indicator for draft-only Connector key edits (#1232 ) When the user typed a replacement Composio API key, the global Settings autosave loop persisted `buildPersistedConfig(cfg)` — which intentionally strips the in-flight secret — and then advanced the indicator through 'saving' -> 'saved' despite the key never actually being written. The "All changes saved" status then contradicted the section-local "Save key" gesture and eroded trust in the saved-state badge for a sensitive field. The autosave effect now tracks the snapshot at the last successful save (or the initial cfg on mount) and compares the next snapshot's persisted shape against it via a new `isAutosaveDraftOnlyChange` helper. When the only diffs since last save are fields that `buildPersistedConfig` strips (today the Composio API key, generalizing to any future save-on-explicit-confirm secret), the persist call is skipped and the indicator settles to 'idle' instead of flashing 'saved'. The forced media-provider sync path still runs because that is a real outbound effect even when the persisted shape hasn't changed. Refs #1187	2026-05-11 20:52:45 +08:00
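The draft-only check described in the commit above can be illustrated with a small sketch. The `Config` shape is hypothetical, `buildPersistedConfig` here only mimics the described behaviour (stripping the in-flight secret), and the JSON comparison assumes both snapshots list their keys in the same order; the real helper may compare differently.

```typescript
// Hypothetical config shape; only the Composio API key is named in the commit message.
type Config = { theme: string; model: string; composioApiKey?: string };

// Mimics the described behaviour: the in-flight secret is never part of the persisted shape.
function buildPersistedConfig(cfg: Config): Omit<Config, "composioApiKey"> {
  const { composioApiKey: _draftSecret, ...persisted } = cfg;
  return persisted;
}

// True when something changed since the last save, but nothing that would be persisted.
function isAutosaveDraftOnlyChange(lastSaved: Config, next: Config): boolean {
  const before = JSON.stringify(buildPersistedConfig(lastSaved));
  const after = JSON.stringify(buildPersistedConfig(next));
  const persistedUnchanged = before === after;
  const somethingChanged = JSON.stringify(lastSaved) !== JSON.stringify(next);
  return persistedUnchanged && somethingChanged;
}
```

An autosave loop built on this would skip the persist call and settle the indicator to idle whenever the function returns true.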
nettee	be77dc0394	Default English resource i18n fallback (#1270 )	2026-05-11 20:29:05 +08:00
pftom	1bdf765cf2	feat(daemon): enrich API responses with surface specs and add new flags - Implemented `--schema` flag for `od ui show` to return only the JSON Schema of the surface. - Enhanced the response of `GET /api/runs/:runId/genui/:surfaceId` to include the surface spec from the AppliedPluginSnapshot. - Introduced new flags for daemon and library commands to improve command handling and parsing. - Added tests for the new functionality, ensuring proper behavior of the enriched responses and flag handling. This change supports headless interactions by allowing code agents to inspect surface contracts before responding.	2026-05-11 20:27:05 +08:00
Caprika	fb079d8115	Add reliable agent-browser skill (#1284 ) * Add reliable agent browser skill * Fix ProjectView delete conversation test props	2026-05-11 20:09:12 +08:00
PerishFire	1eb20e3807	fix(web): keep tweaks selection usable without annotations (#1268 )	2026-05-11 20:06:49 +08:00
初晨	0f0d214298	fix(web): render static previews for sketch json files (#1060 ) * fix(web): render static previews for sketch json files * fix(web): tolerate malformed sketch text items * fix(web): harden sketch preview parsing * fix(web): preserve sketch items on round-trip * fix(web): clear sketch files destructively * fix(web): unblock unsupported sketch saves	2026-05-11 19:29:46 +08:00
Dongsen	12ce5ad38b	fix(web): ignore <artifact> tags inside markdown code spans and fences (#1132 ) * test(web): add failing parser cases for <artifact> recitation in markdown code Cover the three real-world prose contexts where the model legitimately quotes the artifact tag without intending to emit one: - inside an inline backtick span - inside a fenced code block - spread across streaming chunks crossing the fence boundary Establishes the RED baseline before parser code-fence awareness lands. * fix(web): ignore <artifact> tags inside markdown code spans and fences The streaming artifact parser scanned the buffer with a raw indexOf, guarded only by 'next char must be whitespace'. That meant any literal <artifact ...> the model recited while documenting the protocol — even inside backticks or a ```html fence — flipped the parser into artifact mode, swallowed the rest of the reply from the chat UI, and (when a matching </artifact> appeared in the recitation) silently wrote a spurious file to disk via persistArtifact. Replace findOpenTag with a linear scan that tracks fenced code blocks (```) and inline code spans (`), skipping any <artifact prefix found inside either. If the buffer ends mid-fence, return a partial match anchored at the fence start so the next streaming chunk can resolve the boundary without losing fence context. Closes #1130. * fix(web): match renderer fence/inline-code rules in artifact parser Codex review on PR #1132 caught that the previous fix toggled inFence on any triple-backtick run anywhere in the buffer, including mid-line, while the chat renderer (apps/web/src/runtime/markdown.tsx) only treats ``` as a fence when it occupies a whole line matching /^[ ]{0,3}```(\w[\w+-]*)?\s*$/. That asymmetry would suppress a real <artifact> tag emitted after a prose sentence like "the opening marker is ```html and the response then writes:". Rework findOpenTag in three passes that mirror the renderer: 1. 
Walk \n-terminated lines; only a line that matches FENCE_LINE_RE toggles fence state. Open fences without a close (or with an unterminated tail line) return partial so the next chunk can resolve. 2. Collect inline code spans with /`[^`]+`/g — the same regex used by renderInline — so what the parser skips matches what the user sees as code. Unmatched trailing backticks after the last \n hold back. 3. Find the first <artifact …> outside any skip range; preserve the existing partial-prefix tail handling. Adds a regression test covering the exact case Codex reported. * test(web): pin parser behavior on double-backtick and in-fence string literal recitation Two cases raised in PR #1132 review: - a real artifact tag wrapped in '``<artifact …>``' (double-backtick inline code span) should not be treated as a real artifact - a fenced JS example whose body contains a string literal like 'const fence = "```";' should not pop fence state early and let a later literal <artifact> be parsed as real Both already pass on 96e88ca because the line-anchored fence regex and the renderer-aligned inline regex handle them correctly. Pinning the behavior so future regressions surface as test failures. * fix(web): make stripArtifact markdown-aware to stop truncating literal recitations The streaming artifact parser was hardened in 96e88ca to skip <artifact> recitations inside backticks and fences, but the post-stream stripper at AssistantMessage.tsx still ran a naive 'content.indexOf("<artifact")' over the same text events. As reported by lefarcen on PR #1132, that meant chat replies with literal protocol recitations could still get silently truncated mid-explanation — even though the parser preserved them in the text stream and the file panel was no longer polluted with ghost files. 
Extract the renderer-aligned classification (FENCE_LINE_RE, INLINE_CODE_RE, computeSkipRanges, rangeContains) into a single source of truth at apps/web/src/artifacts/markdown-context.ts so the parser and the stripper agree on what counts as code. Add apps/web/src/artifacts/strip.ts with a markdown-aware stripArtifact that: - ignores any <artifact open inside a fenced block or inline code span - looks for </artifact> with the same skip-range filter, so a real open paired with a literal close inside backticks does not strip a literal body that is meant to render - returns content unchanged when an open exists with no matching real close (the previous implementation sliced to end-of-string, which would nuke trailing prose on a malformed or still-streaming tag) Refactor parser.ts to import the shared helpers; behavior preserved (all seven existing parser tests still pass). New strip.test.ts covers six cases including the empirically-verified inline-backtick regression. * fix(web): align artifact stripper/parser fence rules with renderer exactly Two gaps surfaced in review at a0bf05f: - markdown-context.ts used a single FENCE_LINE_RE that allowed 0-3 leading spaces and reused the same pattern for opening and closing fences. The chat renderer (runtime/markdown.tsx:44 and :49) is asymmetric — opens with /^```(\w[\w+-])?\s$/, closes with /^```\s$/, and rejects any leading indentation on either side. Indented " ```html" was being treated as a code fence even though the renderer keeps it as a paragraph, and a literal "```html" line inside an open fenced example was closing the skip range early — both could expose a real or literal <artifact …> to the wrong handler. - stripArtifact discarded computeSkipRanges' unclosedFenceStart, so a fenced literal that ends at EOF without a trailing newline (very common for chat output) leaked the inner <artifact …> recitation to the stripper, reproducing the original #1130 truncation symptom on a narrower input shape. 
Split FENCE_LINE_RE into FENCE_OPEN_RE / FENCE_CLOSE_RE with no leading indentation, gate the fence state machine on the right side of the toggle, and have stripArtifact extend skip ranges to end-of-content when a fence is left open. Also tightened the parser's tail-line hold-back regex to match the renderer's no-leading-space rule. Added regression tests for the EOF-unclosed-fence case, the indented pseudo-fence (renderer treats as paragraph, stripper must strip the real artifact), and a "```html" line inside an open fence. Refs nexu-io/open-design#1130 refactor(web): align streaming tail-line fence guard with FENCE_OPEN_RE The streaming parser's tail-line hold-back used a stricter local regex (/^```\w*$/) than the renderer's FENCE_OPEN_RE (/^```(\w[\w+-]*)?\s*$/), missing valid opener tails like ```c++, ```ts-, or ``` (trailing space). In practice these tails are still held back by the unmatched-backtick parity scan that runs immediately after — three backticks in a tail line are odd, so firstUnmatched stays set and the parser holds from that position. So this wasn't a runtime correctness bug, just a regex divergence that future readers could trip on. Drop the local regex and reuse FENCE_OPEN_RE so the tail check matches the same shape the rest of the pipeline already uses. Pinned the behavior with three new parser tests (`+`/`-` info-string suffix and trailing-space tails arriving as the first chunk) — they pass at HEAD, proving the parity scan was already covering these cases. Refs nexu-io/open-design#1132 (lefarcen polish P2) fix(web): scope inline-code skip ranges per block and reject <artifact prefix-shared opens INLINE_CODE_RE previously ran over the whole buffer, so an unmatched backtick in one paragraph could pair with a backtick in a later paragraph and create a phantom inline span that swallowed any real <artifact …> between them. 
Mirror runtime/markdown.tsx by splitting the buffer on fence / blank / heading / list / hr boundaries and running INLINE_CODE_RE per block region instead. stripArtifact accepted any unskipped `<artifact` substring as a real open, while the streaming parser already required a following whitespace character — so prose like `<artifactual>demo</artifact>` was being truncated to `prefix suffix`. Extract the parser's real-open guard into isRealArtifactOpenAt and reuse it from both sides. While reordering findOpenTag for the shared guard, also fix the related hold-back ordering issue tracked at #1141: a stray tail-line backtick or fence-opener prefix used to suppress an artifact already complete earlier in the buffer. Scan for the earliest complete real open first, then pick the earliest hold-back position only when no complete tag was found. Regressions pinned in parser.test.ts and strip.test.ts for both new finding shapes. * fix(web): keep HR-shaped lines inside paragraph regions for inline-code scanning The previous walker closed inline-scan regions on lines matching the HR regex, but `parseBlocks()` in runtime/markdown.tsx does not break a paragraph on HR — its inner accumulation loop only breaks on blank / fence / heading / ul / ol (runtime/markdown.tsx:95-104). HR is only an HR block in the outer loop's first-look, never mid-paragraph. So inputs like `intro \`\n---\n<artifact …>…</artifact>\n---\nclosing \`` are one paragraph in the renderer, whose two stray backticks pair to cover the literal artifact recitation — but the walker was splitting on the `---` lines, leaving the recitation outside skip ranges, and the parser/stripper would treat it as a real tag. Drop HR from the paragraph-break list (HR-shaped lines carry no backticks of their own, so keeping them inside the surrounding region is benign either way) and document the renderer-mirror rationale. Regressions pinned on both sides.	2026-05-11 19:29:22 +08:00
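The fence rules described in this commit can be illustrated with a minimal sketch. It is not the code in markdown-context.ts: the regex shapes are reconstructed from the commit text, `computeFenceRanges` and `findRealArtifactOpen` are hypothetical helper names, and inline-code spans and streaming hold-back are deliberately left out. Only `isRealArtifactOpenAt`, `FENCE_OPEN_RE`, and `FENCE_CLOSE_RE` are names taken from the message, and their bodies here are assumptions.

```typescript
// Sketch only: fence-aware detection of a real <artifact ...> open tag.
// The fence marker is built with repeat() so this listing stays fence-safe.
const TICKS = "`".repeat(3);
// Assumed shapes: no leading indentation; opener may carry an info string.
const FENCE_OPEN_RE = new RegExp("^" + TICKS + "(\\w[\\w+-]*)?\\s*$");
const FENCE_CLOSE_RE = new RegExp("^" + TICKS + "\\s*$");

type Range = { start: number; end: number };

// Hypothetical helper: walk lines and collect [start, end) ranges for fences.
function computeFenceRanges(content: string): Range[] {
  const ranges: Range[] = [];
  let offset = 0;
  let fenceStart = -1;
  for (const line of content.split("\n")) {
    if (fenceStart < 0) {
      // Outside a fence only the opener pattern can change state.
      if (FENCE_OPEN_RE.test(line)) fenceStart = offset;
    } else if (FENCE_CLOSE_RE.test(line)) {
      // Inside a fence only a bare closer ends it; an opener-shaped line
      // with an info string stays part of the fenced body.
      ranges.push({ start: fenceStart, end: offset + line.length });
      fenceStart = -1;
    }
    offset += line.length + 1;
  }
  // A fence left open at EOF extends the skip range to end-of-content.
  if (fenceStart >= 0) ranges.push({ start: fenceStart, end: content.length });
  return ranges;
}

// A real open is "<artifact" followed by whitespace, so "<artifactual>" is prose.
function isRealArtifactOpenAt(content: string, index: number): boolean {
  const tag = "<artifact";
  if (!content.startsWith(tag, index)) return false;
  return /\s/.test(content.charAt(index + tag.length));
}

// Hypothetical helper: index of the first real open outside fences, or -1.
function findRealArtifactOpen(content: string): number {
  const skip = computeFenceRanges(content);
  let from = 0;
  for (;;) {
    const i = content.indexOf("<artifact", from);
    if (i < 0) return -1;
    const skipped = skip.some((r) => i >= r.start && i < r.end);
    if (!skipped && isRealArtifactOpenAt(content, i)) return i;
    from = i + 1;
  }
}
```

Under these assumptions an indented pseudo-fence does not hide a following tag, while a fence left open at end of input does.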
Sid	156bf5a34e	fix(web): refresh home projects after deleting a conversation (#1202) (#1219) The home design cards render their `Needs input` badge from the cached `/api/projects` payload — App.tsx owns the `projects` state and exposes a `refreshProjects` callback that ProjectView already fires from every other state-changing branch (run end, live-artifact events, project rename, etc.). The conversation-delete branch silently skipped it: deleting a conversation that owned an unanswered `<question-form>` flips the daemon-side flag, but the home view kept showing the stale badge until the next manual reload. Call `onProjectsRefresh()` immediately after a successful `deleteConversation` API response (and only then — if the request fails the cached state is still the truth and we must not pretend otherwise). Adds `onProjectsRefresh` to the useCallback deps for exhaustive-deps correctness; matches the pattern at the four existing call sites in this file. New regression coverage in `apps/web/tests/components/ProjectView.deleteConversation.test.tsx`: - triggers onProjectsRefresh after deleting a conversation (verified RED before this fix, GREEN after) - does not trigger onProjectsRefresh when the delete request fails (defensive complement so a future "always refresh" refactor doesn't paper over a real failure with a stale-but-confident UI)	2026-05-11 19:29:09 +08:00
shangxinyu1	10802bb0b0	test: expand nightly UI and desktop regression coverage (#1256 ) * e2e(ui): cover examples preview flows * e2e(ui): cover Codex local CLI fallback UX * test: expand desktop and connector regression coverage * e2e(ui): cover workspace restoration flows * e2e(ui): cover retry recovery workspace flow * test: cover artifact and connector recovery flows * e2e(ui): cover Continue in CLI stale provenance flow * e2e(ui): cover BYOK model fetch caching * test: expand Orbit and desktop connector coverage * e2e(ui): cover workspace quick switcher recovery flows * e2e(ui): cover connector pending authorization recovery * e2e(ui): cover workspace and conversation restoration routes * e2e(ui): cover conversation draft and attachment restoration * e2e(ui): cover conversation history selection recovery * e2e(ui): cover workspace surface conversation selection * test: cover artifact presentation and orbit link behavior * test: cover artifact external link restoration * e2e(ui): cover root-route deep-link restoration * e2e(specs): cover Orbit open-artifact desktop click * e2e(specs): cover desktop artifact open link * test: fix Orbit settings fixture type drift * test: split Playwright critical and extended suites * test: fix ProjectView design template fixtures * ci: split workspace test stages * guard: allow split Playwright suite scripts * test: shrink Playwright critical suite * test: restore omitted Playwright suites	2026-05-11 19:23:13 +08:00
PerishFire	8c0fb8dc01	feat(tools-pr): add maintainer PR-duty workspace (#1259 ) * feat(tools-pr): add maintainer PR-duty workspace Adds `tools/pr` as the maintainer-only control plane for PR-duty work on this repo. Thin `gh` wrapper that encodes repo-specific knowledge: review lanes, forbidden surfaces, lane-specific checklists, validation command derivation from touched packages. Subcommands: - `list` — triage open queue by lane and review-state bucket. - `view <num>` — agent-friendly review brief for a single PR. - `classify [num]` — emit script-level tags for one PR or the whole open queue; full-queue JSON output lands under `.tmp/tools-pr/classify/` with rate-limit telemetry per run. - `assignment` — assigner-perspective view of PR ownership, idle time, and blockers (derived from existing tags; no new judgments). Tag dictionary (13 tags) covers: bot-only-approval, needs-rebase, forbidden-surface, unlabeled, duplicate-title, non-ascii-slug, maintainer-edits-disabled, org-member, unresolved-changes-requested, stale-approval, and three awaiting-* timing tags. Each rule is expressible as one factual sentence over `gh` data + repo paths — see `tools/pr/AGENTS.md` for the full dictionary plus precision rules. Templates in `tools/pr/templates/.md` are aesthetic references for recurring maintainer comments (duplicate-title ask, awaiting-author nudge, agent-review brief shape). `templates/examples/` holds frozen-in-time agent-review snapshots for three PR shapes. Infrastructure: - `gh()` wraps `execFile` with minimum-touch retry (2 attempts at 1s + 2s backoff) on transient 5xx / network errors. Persistent failures still surface — retry is anti-jitter, not an exponential-backoff resilience layer. - Heavy chunks (`reviews`, `comments`, `commits`, assignment timelines) use cursor-paginated `gh api graphql` via `fetchPaginatedPrList` to stay under GitHub's GraphQL server-side timeout. Light chunks stay on `gh pr list --json`. 
- `fetchOrgMembers` cached per process via `gh api orgs/<owner>/members --paginate`. Wiring: - Root `package.json` adds `pnpm tools-pr` to the allowed root entry points. - `scripts/postinstall.mjs` builds `tools/pr` alongside other workspace packages. - `scripts/guard.ts` allowlists `tools/pr/bin/tools-pr.mjs` and `tools/pr/esbuild.config.mjs`, and adds `pr/` to the `tools/` top-level layout allowlist. - Root `AGENTS.md` and `tools/AGENTS.md` document the new command surface, root-command-boundary update, and per-tool ownership. docs(agents): brief tools-pr in root AGENTS.md, link to tools/pr/AGENTS.md Adds a `PR-duty tooling` section to the root AGENTS.md summarising what `pnpm tools-pr` is, listing the four common subcommands (list / view / classify / assignment), and pointing readers to `tools/pr/AGENTS.md` for the full tag dictionary, operational playbook, templates, and design rules. The section keeps root-level guidance to high-level orientation while details stay local to the tool's own AGENTS.md. * fix(tools-pr): drop overly broad touches-root-package.json forbidden hit `deriveForbidden` was flagging any change to root `package.json` as a forbidden-surface hit, but AGENTS.md §Root command boundary only forbids specific lifecycle aliases (pnpm dev / test / build / daemon / preview / start) — tools-control-plane entrypoints like `pnpm tools-pr` are explicitly allowed. Distinguishing "forbidden alias" from "allowed entry" requires reading the diff content, which is `pnpm guard`'s job rather than a path-derived classify tag. Dogfooded on this branch's own PR (#1259), which added the `pnpm tools-pr` script and was incorrectly flagged. Removing the hit aligns the `forbidden-surface` tag with what tools-pr can mechanically detect from file paths alone (apps/nextjs/, packages/shared/). 
* fix(tools-pr): paginate commits fetch, recognise ready-to-merge, escape title-index separator Three review follow-ups on #1259, all factual fixes: - `fetchOpenPrCommits` now uses `fetchPaginatedPrList` instead of a one-shot `pullRequests(first: $first)` query. GitHub GraphQL caps connection page size at 100, so the previous implementation would fail at runtime when callers passed `--limit > 100`. The paginated path makes the commits fetch consistent with the other heavy chunks (reviews, comments, assignment timelines) and removes the artificial ceiling entirely. The `limit` parameter is dropped from `fetchOpenPrCommits`; the CLI `--limit` continues to bound the `gh pr list --json` chunks. - `deriveStatus` in `assignment.ts` now reads `facts.reviewDecision` and `facts.mergeStateStatus`. When the PR is `APPROVED` with merge state `CLEAN` or `UNSTABLE` and carries no blockers, status renders as `ready to merge` instead of falling through to `in review`. The assignment view loses its main triage signal without this — a clean human-approved PR rendered identical to a REVIEW_REQUIRED one. - `tags.ts:tagDuplicateTitle` and `tags.ts:buildContext` both constructed the title-index key with a literal NUL byte between author and title, which made the file appear as binary in `git diff` / review tooling. Replaced the literal byte with a Unicode escape sequence in source; the runtime string value is identical, the source stays plain text and round-trips through review tooling cleanly. * fix(tools-pr): raise default --limit to 1000 to cover the live open queue mrcfps flagged that `tools-pr list` (and `classify --all`, `assignment`) defaults to `--limit 100`, which silently drops every PR past the first 100 in the open queue. The repo currently sits at 104 open PRs, so the out-of-the-box run was already omitting four PRs. 
Raise the default to 1000 in `list.ts`, `classify.ts`, and `assignment.ts`, and remove the now-pointless 200 ceiling — `gh pr list --limit N` paginates internally, so a high cap is cheap. Users can still pass `--limit <small>` for a truncated preview. CLI help text on the three subcommands updated to match. * fix(web): pass designTemplates to ProjectView render helper #955 made `designTemplates` a required Prop on ProjectView, but the test helper added in #1244 (`renderProjectView` in `ProjectView.api-empty-response.test.tsx`) was never updated. The two PRs landed on main without conflicting, leaving `apps/web` typecheck red for every PR that rebases past `b5eb8c16`. Pass `designTemplates={[] as SkillSummary[]}` alongside the existing `skills={[] as SkillSummary[]}` so the helper compiles. The component already treats the array shape (empty included) as a no-op fallback in the empty-response paths the test exercises. * fix(tools-pr): correct author signal + merge inline review comments Two correctness gaps in the awaiting-* signal pipeline surfaced during review of the new tools-pr commands: 1. `authorSignalAt` iterated every PR commit unconditionally. On `maintainerCanModify=true` PRs a maintainer's follow-up push would advance the author timestamp, masking a stalled author response. Filter commits to those whose `authorLogin` matches `facts.author`, mirroring the same filter already applied to comments. 2. `fetchOpenPrComments` (and `fetchView`) only fetched `pullRequest.comments` / `gh pr view --json comments`, which is the issue-conversation thread. Inline review-thread replies — where authors and reviewers actually exchange most fix-up replies — live in `reviewThreads.comments` / REST `pulls/{n}/comments`. Missing them let `humanReviewerSignalAt` / `authorSignalAt` and the `view` brief point at the wrong side after someone replied inline. 
Extend the list-mode GraphQL to also sweep `reviewThreads(last: 20).comments(first: 20)`, and add a parallel REST inline-comments fetch in `fetchView` that merges into `GhView.comments`.	2026-05-11 19:17:21 +08:00
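The `ready to merge` rule and the author-only commit filter from this commit could look roughly like the sketch below. The `PrFacts` and `PrCommit` shapes, the `blockers` field, and both function signatures are assumptions for illustration. The real `authorSignalAt` also takes comments into account, which this sketch omits. Timestamps are assumed to be same-format UTC ISO strings, so plain string comparison orders them.

```typescript
// Sketch only: hypothetical shapes, not the types used in tools/pr.
type PrFacts = {
  author: string;
  reviewDecision: string;   // e.g. "APPROVED", "REVIEW_REQUIRED"
  mergeStateStatus: string; // e.g. "CLEAN", "UNSTABLE", "BLOCKED"
  blockers: string[];       // assumed list of blocking tags
};

type PrCommit = { authorLogin: string; committedAt: string };

// An approved PR with a CLEAN or UNSTABLE merge state and no blockers
// renders as "ready to merge"; everything else falls through to "in review".
function deriveStatus(facts: PrFacts): "ready to merge" | "in review" {
  const approved = facts.reviewDecision === "APPROVED";
  const mergeable =
    facts.mergeStateStatus === "CLEAN" || facts.mergeStateStatus === "UNSTABLE";
  return approved && mergeable && facts.blockers.length === 0
    ? "ready to merge"
    : "in review";
}

// Only commits authored by the PR author advance the author timestamp, so a
// maintainer's follow-up push does not mask a stalled author response.
function authorSignalAt(facts: PrFacts, commits: PrCommit[]): string | null {
  let latest: string | null = null;
  for (const commit of commits) {
    if (commit.authorLogin !== facts.author) continue;
    if (latest === null || commit.committedAt > latest) latest = commit.committedAt;
  }
  return latest;
}
```

Keeping both rules as pure functions over already-fetched facts matches the commit's framing of tags as statements derived from `gh` data rather than new judgments.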
Tom Huang	b5eb8c1647	feat: generic skills + split skills/design-templates + finalize-design API (#955 ) * feat: general-purpose skills with @-mention composition and user import Lift skills from "one mode-bound skill per project" to a generic capability the user can compose per turn: - Daemon: scan multiple skill roots (user-skills under runtime data, then the bundled `skills/`); user-imported skills can shadow built-ins by id. - New `POST /api/skills/import` and `DELETE /api/skills/:id` endpoints, with CONFLICT/BAD_REQUEST/NOT_FOUND error codes and built-in delete protection. - ChatRequest gains `skillIds: string[]`; the chat run concatenates each picked skill's body (and merges craftRequires) into the system prompt for that turn only — the project's persistent `skillId` is untouched. - Web composer: `@` popover now lists skills alongside project files; picks render as removable chips above the textarea and ride along with the request as `skillIds`. - Settings → Library: import form (name/description/triggers/body), per-card delete for user skills, "user" origin badge. * chore(web): drop welcome pet teaser + add ds→prompt-template mapping util - SettingsDialog: remove the inline pet adoption teaser from the welcome panel so the first-run modal stays focused on configuration. - New `inferPromptTemplateCategoriesForDs(ds)` helper that maps a design system's authored metadata to prompt-template gallery categories. Imported by the design-system gallery wiring on a sibling branch; no callers in this branch yet. * feat: split skills/design-templates and add finalize-design API Phase 0 of the skills/design-templates refactor (specs/current/ skills-and-design-templates.md): - Move ~104 rendering catalogue entries from skills/ to design-templates/ and keep skills/ for the small set of functional skills that do work on user input (utilities, briefs, packagers). 
- Add design-templates/AGENTS.md and skills/AGENTS.md describing the contract, and a brand-agnostic craft/ surface for opt-in craft rules. - Daemon: add DESIGN_TEMPLATES_DIR / USER_DESIGN_TEMPLATES_DIR roots and an /api/design-templates surface mirroring /api/skills. Asset/example routes still span both registries so existing srcdoc URLs keep resolving across the rename. - Web: split LibrarySection into SkillsSection + DesignSystemsSection, rename the EntryView "Examples" tab to "Templates", and update locales + the New-project picker accordingly. Adds the finalize-design endpoint: - New apps/daemon/src/finalize-design.ts and packages/contracts/src/api/ finalize.ts — one-shot synthesis of a project's transcript + active design system + current artifact into <projectDir>/DESIGN.md via the Anthropic Messages API. Per-project .finalize.lock mirrors the transcript-export hygiene from PR #493; provider credentials are not persisted by the daemon. Other supporting changes: - README + AGENTS.md updates to document the new directory split and craft/ surface, plus i18n strings across 13 locales. - Test refactors and new coverage (finalize-design, runs, sidecar server, plus refreshed daemon integration tests). - .gitignore: scope the .exe ignore to /OpenDesign.exe so legitimate vendor binaries are no longer hidden. fix(merge): move clinical-case-report to design-templates/ Origin/main added the clinical-case-report skill under skills/ before the skills/design-templates split landed. Its od.mode is prototype, so per specs/current/skills-and-design-templates.md it is a design template and belongs alongside the other rendering catalogue entries — not under the slimmed-down functional skills/ root. Moving it keeps the EntryView Templates tab consistent with origin/main's intent. 
* feat(skills): curated design/creative catalogue + collapsible Settings rows Seed ~100 curated design/creative skill stubs under skills/ sourced from awesome-claude-skills (ComposioHQ) and awesome-agent-skills (VoltAgent). Each stub carries an od.category tag so the new filter pill row in Settings -> Skills can group them. The seed script (scripts/seed-curated-design-skills.ts, pnpm seed:curated-design-skills) is idempotent: it only creates folders that don't already exist, so hand-edited stubs are never overwritten. - Daemon: parse and surface od.category on SkillInfo with a strict slug normaliser; mirror the field on SkillSummary in @open-design/contracts. Category is purely a UI hint — system-prompt composition is unchanged. - Web: rewrite SkillsSection from a left-list / right-detail grid into a vertical stack of collapsible rows mirroring the External MCP panel (header always visible with name + mode/source/category pills + per-row enable toggle; SKILL.md preview, file tree and inline edit form expand on demand). Add a Category filter row above the list. Reorder Settings nav so Skills + External MCP sit above the Composio/MCP cluster. Update composer placeholder/hint across 17 locales to advertise '@ files or skills · / for commands'. - Docs: extend skills/AGENTS.md with the curated catalogue rules (idempotency, category vocabulary, no upstream vendoring). 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(skills): teach localized-content + system-prompt tests about the skills/design-templates split mrcfps blocking review on PR #955: the skills/design-templates split (`b5993385`) moved ~110 SKILL.md entries out of `skills/` and into `design-templates/`, but two repo-level tests still hard-coded the single-root layout, so CI gates went red on the merged branch: - `e2e/tests/localized-content.test.ts` only scanned `<repo>/skills` while the locale `skillCopy` map keeps id-keyed entries spanning both roots (ExamplesTab/Templates uses one lookup regardless of origin). Teach the helper to read both `skills/` and `design-templates/`, deduplicating ids so the union matches the localized claim. - `apps/daemon/tests/prompts/system.test.ts` read `skills/live-artifact/SKILL.md`, which now lives under `design-templates/live-artifact/`. Update the absolute path so composeSystemPrompt's coverage of the live-artifact preamble is exercised again. Also enroll the curated design/creative catalogue (PR #955, ~91 stubs sourced from awesome-claude-skills / awesome-agent-skills) in the DE / FR / RU `_SKILL_IDS_WITH_EN_FALLBACK` lists. The stubs are English-only by design (frontmatter advertises an upstream URL); the fallback list is exactly the place to acknowledge "we know this id exists, English copy is fine here" so the localized-content coverage gate passes without forcing a translation task per locale. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): always quote frontmatter name so importUserSkill round-trips numeric / boolean ids mrcfps PR #955 review: `buildSkillMarkdown` emitted `name: ${escapeYamlString(name)}` without quotes, so YAML coerced names like `123`, `true`, `false`, or `null` into non-string scalars on re-parse. 
listSkills() then read `data.name` as a number/boolean and the import flow's follow-up `findSkillById(skills, result.id)` missed it, falling into `/api/skills/import`'s "imported skill could not be re-read" 500 path for those ids. Switch the emitter to a quoted scalar (`name: "..."`) — the double-escape already in `escapeYamlString` makes the quoted form safe — and add a round-trip test covering `123`, `true`, `false`, `null`, and `0` to lock in the contract. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): drop staged-skill chips when the matching @<id> token leaves the draft mrcfps PR #955 review: `submit()` always forwarded every id in `stagedSkills`, but that state was only mutated on picker click and chip removal. Hand-deleting an `@<id>` token from the textarea left the chip staged, so the request still carried `skillIds: [<id>]` and the daemon composed a skill the prompt no longer referenced. Sync the chips with the draft inside `handleChange()` by pruning `stagedSkills` whenever the new value no longer contains the `@<id>` token (using the same whitespace boundary as `removeStagedSkill`'s strip regex). Comment explains why this prune does not run for `staged` file attachments — users frequently add files via the upload button without leaving an `@<path>` token, so a symmetric prune there would erase legitimate uploads. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(daemon): stage @-composed skills' side files alongside the active skill codex PR #955 review: composing a per-turn `@`-picked skill into the system prompt appended its body (with the `withSkillRootPreamble` guidance pointing at relative paths under `<cwd>/.od-skills/<folder>/`) but never staged the actual folder. 
`startChatRun` only copied `activeSkillDir`, so when the project's primary skill was different (or absent) the composed skill's references/, examples/, and scripts/ files lived only at their absolute repo path — agents that honour the cwd-relative form (or that don't get `--add-dir`, e.g. Codex with allowlisted gpt-image projects) couldn't reach them. Thread the composed skills' dirs out of `composeDaemonSystemPrompt` as `extraSkillDirs` and stage each one through the same `stageActiveSkill` API used for the primary skill. Dedupe by folder basename so a project whose primary skill is also `@`-composed isn't copied twice. Each preamble already advertises its own folder, so the prompt and the staged tree stay aligned without further changes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): respect the Library disable toggle in the project @-mention picker codex PR #955 review: only `EntryView` received `enabledSkills` (filtered against `config.disabledSkills`); active projects still got `skills={skills}` raw, so a skill the user disabled in Settings kept appearing in the project's `@`-mention popover and could ride along to the daemon via `skillIds`. That broke the Library toggle for any project opened on the post-split branch. Compute a functional-skills-only enabled subset (`enabledFunctionalSkills`) and pass it into `<ProjectView>` instead. Templates stay separate — design-templates are filtered through their own `enabledDesignTemplates` memo for the Templates gallery — so ProjectView's chat composer still only sees skills, never templates, matching the pre-split prop surface. Co-authored-by: Cursor <cursoragent@cursor.com> * test(e2e): mock /api/design-templates for example-use-prompt flow The Templates tab in EntryView fetches from /api/design-templates after the skills/design-templates split (specs/current/skills-and-design-templates.md). 
The example-use-prompt Playwright scenario only mocked /api/skills, so the gallery card never appeared and the test timed out waiting on example-card-warm-utility-example. Serve the same fixture summary on both endpoints so the templates gallery renders the card the test clicks. Co-authored-by: Cursor <cursoragent@cursor.com> * test(tools-pack): create design-templates fixture for resources test The packaging resources copy now bundles the new design-templates tree alongside skills (see resources.ts BUNDLED_RESOURCE_TREES). The copyBundledResourceTrees fixture only created skills, design-systems, craft, etc., so the recursive copy crashed with ENOENT on design-templates before it could check the prompt-templates assertion. Add the missing fixture directory so the test exercises the same set of resource trees the packaged build does. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): clone built-in side files into the shadow on first edit mrcfps PR #955 review: editing a built-in skill wrote a USER_SKILLS_DIR shadow folder that contained only a new SKILL.md. The next listSkills() pass surfaced the shadow as the active dir, but every side-file resolver (/api/skills/:id/files, /example, /assets/, the system-prompt preamble, and the per-turn cwd staging) reads through skill.dir. With nothing but SKILL.md in the shadow, the bundled assets/, references/, scripts/, and examples/ disappeared the moment the user hit save — a built-in like last30days or live-artifact would break immediately after edit instead of just having its body overridden. Teach updateUserSkill() to take a `sourceDir` and clone every entry except SKILL.md / dotfiles into the shadow on the very first edit. The shadow stays self-contained, so all the resolvers keep working without fallback bookkeeping. Subsequent edits detect the existing shadow and skip the clone, so user tweaks under the side tree survive a re-save. 
Wire `sourceDir: skill.dir` from server.ts's PUT /api/skills/:id handler and add two regression tests: - 'clones built-in side files into the shadow on the first edit' walks the file tree after save and asserts assets/template.html, references/ notes.md, and scripts/helper.sh all round-trip from the built-in. - 'preserves user-edited side files on subsequent edits' edits the staged assets/template.html, re-saves, and confirms the user content is still there. Co-authored-by: Cursor <cursoragent@cursor.com> test(e2e): rename home tab from Examples to Templates The Examples tab was renamed to Templates in EntryView (b5993385's skills/design-templates split — entry.tabExamples became entry.tabTemplates and the tab value moved from 'examples' to 'templates'), but entry-chrome-flows still asserted the old label and testId. Update both. * fix(skills+web): preserve template body in API mode and dir-based skill delete Two follow-ups from PR #955 review: 1. ProjectView only received `enabledFunctionalSkills`, but `composedSystemPrompt()` still resolved `project.skillId` through that prop and `fetchSkill()`. Projects created from the new `/api/design-templates` surface keep a template id in `project.skillId`, so opening one in API mode dropped the template body from the system prompt and the upstream request ran without the project's primary template instructions. Now ProjectView takes a separate `designTemplates` prop (the unfiltered template list, so a later-disabled template still loads for projects already created from it) and `composedSystemPrompt()` plus the metadata / `isDeck` lookups fall back to that list, with `fetchDesignTemplate()` as the body-fetch fallback to `fetchSkill()`. The chat composer's `@`-picker keeps receiving only the enabled functional skills. 2. `DELETE /api/skills/:id` used `deleteUserSkill(USER_SKILLS_DIR, skill.id)` which re-slugified the frontmatter id and removed `<userSkillsDir>/<slug>/`. 
That matched the import shape but missed the install shape — `installFromTarget` writes the folder at `sanitizeRepoName(url)` (GitHub) or `path.basename(realpath)` (local symlink), neither of which is guaranteed to equal the slugified frontmatter `name`. A duplicate `app.delete('/api/skills/:id', ...)` handler at the install routes never fired because Express resolved the earlier registration first, leaving the install/uninstall path without working teardown. The handler now removes `skill.dir` (the absolute path listSkills already discovered) under a USER_SKILLS_DIR safety check, using `lstat` + `unlinkSync` so symlinked local installs unlink cleanly without recursing into the user's source tree. The dead duplicate handler is removed; `deleteUserSkill` is dropped from the server.ts import set (still exported and unit-tested in skills.ts). Regression coverage in `apps/daemon/tests/skills-delete-route.test.ts` pins both shapes plus the symlink-preserves-source case. * test(daemon): point hyperframes system-prompt test at design-templates The merge with main brought in a hyperframes system-prompt test that reads `skills/hyperframes/SKILL.md`, but this branch's split moved `hyperframes` into `design-templates/` (same migration as `live-artifact` already handled above in this file). CI was failing with ENOENT on the old path. --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 17:48:34 +08:00
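The frontmatter fix in this commit (always quoting `name` so ids like `123` or `true` survive a YAML round-trip) can be sketched as follows. The body of `escapeYamlString` and the helper `buildFrontmatterName` are assumptions for illustration. The commit only states that the name is emitted as a double-quoted scalar and that the existing escape makes that form safe.

```typescript
// Sketch only: assumed escaping for a YAML double-quoted scalar.
function escapeYamlString(value: string): string {
  // Escape backslashes first so the quote escapes added next are not doubled.
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Hypothetical helper: emit the frontmatter name line as a quoted scalar so a
// YAML parser reads "123", "true", "false", "null", and "0" back as strings.
function buildFrontmatterName(name: string): string {
  return 'name: "' + escapeYamlString(name) + '"';
}

// Usage: each id below would be coerced to a non-string scalar if unquoted.
for (const id of ["123", "true", "false", "null", "0"]) {
  console.log(buildFrontmatterName(id));
}
```

Quoting unconditionally is simpler than detecting which plain scalars YAML would coerce, at the cost of slightly noisier frontmatter.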
Caprika	f7f2661bda	[codex] Handle empty API responses as no output (#1244) * Handle empty API responses as no output * Fix empty API response comment cleanup * Stabilize API empty response detection	2026-05-11 16:57:02 +08:00
nettee	e859c31574	fix(web): complete finished tool calls missing results (#1240)	2026-05-11 15:54:11 +08:00
Tom Huang	e254d1280b	feat(memory): auto-memory store with chat-protocol-aware extraction (#999 ) * feat(memory): auto-memory store with chat-protocol-aware extraction Markdown memory store at <dataDir>/memory/ with two extractors — heuristic regex for explicit "remember:" / "我是 X" markers, and a small-model LLM pass after each turn — folded into the system prompt so cross-chat preferences, role, and ongoing-work context survive restarts. Settings UI: - Memory tab lists entries, exposes a hand-edited MEMORY.md index, and shows an extraction history with per-attempt phase/skip/failure rows. - Memory model picker is inline next to the chat model picker (CLI and BYOK) so the choice "which fast model mines facts each turn?" sits next to the chat-model decision instead of a separate panel. The picker reuses the same SUGGESTED_MODELS table and "Custom..." pattern the chat picker uses. LLM extractor supports all four protocols (anthropic / openai / azure / google); pickProvider takes the chat agent id from the chat handler and constrains its auto-pick to the chat's protocol family — Claude Code chats no longer surprise users by silently extracting on whatever OpenAI key happens to be in media-config. When no matching key is configured the attempt records as 'skipped: no-provider' instead of quietly switching vendors. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(memory): keep hint outside <label> and disambiguate Model selectors The inline Memory model picker wrapped its hint paragraph inside the <label>, which made the hint's "API key" / "model" wording bleed into the <select>'s accessible name and broke Playwright's getByLabel('API key') / getByLabel('Model') strict-mode matching in the existing settings-api-protocol e2e suite. - Move the hint <p> out of the <label> in MemoryModelInline so the select's accessible name is just "Memory model". 
- Switch the chat-Model selectors in settings-api-protocol.test.ts from getByLabel('Model') to getByRole('combobox', { name: 'Model', exact: true }) so they no longer collide with the new "Memory model" select that sits next to the chat Model picker. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(memory): address review changes — BYOK wiring, MEMORY.md index, /v1, label wrapper Addresses the four blocking review threads on PR #999. 1. MemoryModelInline accessibility (mrcfps) The inline picker still wrapped its select + custom input + flash + hint inside a single <label>, which made the select's accessible name absorb every text descendant — including the "API key" / "model" hint copy. The previous fix moved only the hint outside; the reviewer asked for a non-label wrapper. Switch to <div className="field"> and associate just the short title with the controls via `aria-labelledby` / `aria-label`. The select's accessible name is now exactly "Memory model" so `getByLabel` strict-mode locators on the surrounding chat form stop cross-matching the memory copy. 2. Respect the hand-edited MEMORY.md index (mrcfps + codex) `composeMemoryBody()` was reading every .md file in the memory dir, ignoring the index. Removing a `- [Name](id.md)` line had no effect on future prompts. Parse the index's `INDEX_LINK_RE` bullets and filter `listMemoryEntries()` to the linked id set, so the editor's "delete this line to disable injection" promise actually holds. 3. Versioned OpenAI-compatible base URLs (codex) `callOpenAI` and `callAnthropic` hard-coded `/v1` onto `provider.baseUrl`, breaking custom endpoints whose saved URL already includes `/v1` (`/v1/v1/chat/completions`). Apply the same conditional `appendVersionedApiPath` helper the chat proxy and connection-test routes already use. 4. Wire memory into BYOK / API-mode chats (mrcfps + codex) The previous PR's daemon-only memory hook never fired for BYOK, leaving the Memory tab + model picker as a no-op for that mode. 
Add the missing surface and wire it through ProjectView: - contracts: extend `composeSystemPrompt` with `memoryBody`, mirroring the daemon's local composer; add `MemorySystemPromptResponse` and the `attemptedLLM` flag on `ExtractMemoryResponse`. - daemon: expose `GET /api/memory/system-prompt` (returns the composed body) and turn `POST /api/memory/extract` into a two-phase endpoint — heuristic-only when only userMessage is supplied (pre-turn), LLM-only when assistantMessage is also supplied (post-turn), so the extraction-history doesn't double up. - web: ProjectView's BYOK branch now fetches the memory body before composing the system prompt, runs the heuristic extractor before the run (so "remember:" markers in this turn reach this turn's prompt), accumulates assistant text during streaming, and queues the LLM extractor on `onDone` — fire-and- forget so it never blocks the chat round-trip. Co-authored-by: Cursor <cursoragent@cursor.com> fix(memory): re-sync BYOK memory override when chat config drifts The inline memory-model picker captured `apiProtocol` / `chatApiKey` / `chatBaseUrl` / `chatApiVersion` into the saved override only at the moment the user clicked a model. If they later swapped the BYOK protocol tab, rotated the API key, or edited the base URL in the same settings flow, the daemon's background extractor kept calling the old vendor / credential — directly contradicting the picker's "borrows the surrounding chat picker's protocol, key, base URL, and api-version automatically" promise. Add a debounced effect that compares the persisted (masked) shape against the live chat props and re-PATCHes /api/memory/config when they drift. The masked config exposes `apiKeyTail` (last 4 chars), so key rotation is detectable without ever round-tripping the secret back to the browser. 
The 300 ms debounce coalesces the keystroke- granularity prop updates the parent settings dialog streams during its autosave loop, so a user editing the base URL doesn't trigger one PATCH per character. Background re-syncs are silent — the "Saved!" flash only fires for explicit user clicks, so the picker doesn't feel like it's fighting them as they edit unrelated chat fields. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(memory): thread BYOK chat config through /api/memory/extract default path Leaving the BYOK memory picker on "Same as chat" still broke the default LLM extraction path: `MemoryModelInline` clears the override for that option, both `/api/memory/extract` calls in `ProjectView` only sent the messages, and the daemon never persists BYOK creds, so `extractWithLLM(..., { chatAgentId: null })` always reached `pickProvider()` with no chat context and fell through to env / media-config — the wrong vendor for a BYOK chat that works for inference. Thread the live BYOK chat config through the extract endpoint as a per-call snapshot: - contracts: extend `ExtractMemoryRequest` with an optional `chatProvider` (provider/apiKey/baseUrl/apiVersion/model) and add `'chat-byok'` to the credentialSource enum. - daemon: parse + validate `chatProvider` on `/api/memory/extract` (provider must be one of the five known shapes) and forward to `extractWithLLM` as a new option. `pickProvider()` gets a new path 2 that uses the snapshot directly with the per-protocol fast-model default — so a memory pass on `gpt-4o` / `claude-sonnet-4-5` silently turns into a cheap `gpt-4o-mini` / `claude-haiku-4-5` call instead of paying chat-tier rates for sediment work. Override and CLI-agent-constrained paths still win when they apply. - web: `ProjectView` snapshots `apiProtocol` / `apiKey` / `baseUrl` / `apiVersion` from the live `AppConfig` on each BYOK extract call (both pre-turn heuristic-only and post-turn LLM phases). 
The picker's existing drift-resync effect already covers explicit overrides; this snapshot covers the implicit "Same as chat" default that the override flow can't reach. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(memory): treat empty apiKey on PATCH as a real clear MemoryModelInline silently re-PATCHes /api/memory/config whenever the surrounding BYOK chat creds drift. The previous reuse branch lumped `apiKey === ''` together with `apiKey === undefined`, so clearing the chat API key from the picker quietly preserved the old daemon-side secret and kept calling the provider on a stale credential. Distinguish four states for the apiKey field: - absent -> preserve stored secret (form re-save without re-typing) - '' -> clear stored secret (user removed it from the picker) - 'sk-...' -> replace - new provider -> ignore stored secret entirely Add tests/memory-config-route.test.ts covering all four cases. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 15:45:42 +08:00
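The four apiKey states described in the last commit of #999 can be sketched as below. This is a minimal illustration, not the daemon's code: the `StoredMemoryConfig` and `MemoryConfigPatch` shapes and the `resolveApiKey` name are hypothetical, and the real route handler is not shown in this log.

```typescript
// Hypothetical shapes: the daemon's real config types are not shown in the log.
interface StoredMemoryConfig {
  provider: string;
  apiKey?: string;
}

interface MemoryConfigPatch {
  provider: string;
  apiKey?: string; // absent, '' (clear), or a new secret
}

// Decide which secret should be stored after a PATCH, following the four
// states listed in the commit message.
function resolveApiKey(
  stored: StoredMemoryConfig | null,
  patch: MemoryConfigPatch,
): string | undefined {
  if (patch.apiKey === undefined) {
    // absent: preserve the stored secret, but only for the same provider;
    // a new provider ignores the stored secret entirely.
    if (stored !== null && stored.provider === patch.provider) {
      return stored.apiKey;
    }
    return undefined;
  }
  if (patch.apiKey === '') {
    // empty string: the user removed the key, so clear it.
    return undefined;
  }
  // any other string: replace the stored secret.
  return patch.apiKey;
}
```

Keeping `undefined` and `''` as separate branches is the point of the fix: collapsing them is what left a stale credential in place.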
Tom Huang	e11e86d468	feat(hyperframes): land HTML-in-Canvas across web + skills (#866 ) * feat(hyperframes): land HTML-in-Canvas across web + skills Ships HTML-in-Canvas as a first-class HyperFrames video path: - 7 new video prompt templates (liquid glass, iPhone+MacBook, portal, shatter, magnetic, liquid background, text-cursor reveal). - skills/hyperframes/references/html-in-canvas.md, surfaced via SKILL.md description+triggers and the system-prompt pre-flight references list. - ChatPane starter prompts now branch by project kind and video model, so the hyperframes-html surface shows HTML-in-canvas-shaped prompts instead of the generic prototype trio. - NewProjectPanel propagates a picked template's model+aspect onto the project, and defaults videoModel to hyperframes-html when the hyperframes skill resolves for the video tab. Polish bundled in the same branch: - DesignFilesPanel empty state becomes a centered pill with a "New sketch" CTA; designFiles.empty copy simplified across 19 locales. - Topbar project title + meta render on one baseline row separated by a middot. - scripts/seed-test-projects.ts hardens daemon URL discovery against pnpm engine warnings on stdout. * fix(new-project): preserve explicit video model choice across tab revisits Latch a videoModelTouched guard once the user picks a model via the dropdown or via a template that declares one, so the hyperframes-html auto-default no longer silently overwrites the override when the Video tab is re-entered. Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code) * fix(i18n): register hyperframes html-in-canvas templates, category, and tags Adds the seven new prompt-template ids, the "VFX / HTML-in-Canvas" category, and the new tag set to the de/ru/fr i18n bundles so the e2e localized-content coverage test passes. 
Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code) * fix(daemon): inject html-in-canvas preflight for hyperframes runs The contracts-side derivePreflight() learned about references/html-in-canvas.md when this PR landed, but the daemon copy at apps/daemon/src/prompts/system.ts kept the older five-ref allowlist. server.ts:4138 wires composeSystemPrompt from the daemon copy into live chat runs, so the main HyperFrames flow this PR is meant to improve still wasn't auto-injecting the preflight directive in production. Mirror the html-in-canvas case into the daemon composer and lock it behind a daemon-side test so the two copies cannot drift again on this reference. The broader live-artifact preflight gap (artifact- schema / connector-policy / refresh-contract) is pre-existing drift and is intentionally out of scope here. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): restyle designs empty state as centered card on grid backdrop Swap the horizontal pill for a stacked card and add a faint grid backdrop so the empty designs surface reads as an intentional canvas rather than a gap. Title now wraps instead of truncating; container is taller. * fix(new-project): pin skillId to hyperframes when videoModel is hyperframes-html When the Video tab resolves its skill it used to fall back to `list[0]?.id` if no skill declared `default_for: video`. That list is built from an unsorted `readdir()` in apps/daemon/src/skills.ts, so a freshly mounted project could land on `video-shortform` even when the user had explicitly chosen the HyperFrames-HTML model (or one of the new `hyperframes-html-in-canvas-*` templates). The agent then ran without the hyperframes SKILL body or its `references/html-in-canvas.md` preflight — the exact regression PR #866 was meant to land. `skillIdForTab` now pins to `hyperframes` whenever the current video model is `hyperframes-html`, regardless of discovery order. 
Added a unit test that mounts both `video-shortform` and `hyperframes` (with hyperframes last, simulating the bad readdir order) and asserts the create payload routes through `hyperframes`. --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 15:45:12 +08:00
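The pinning rule from the last fix in #866 can be sketched as follows. The function name `pickVideoSkillId`, the `SkillSummary` shape, and the `defaultFor` field are assumptions made for illustration; the real `skillIdForTab` signature is not shown in this log, and this sketch additionally assumes the pin applies only when a `hyperframes` skill is present.

```typescript
// Hypothetical skill summary; `defaultFor` stands in for the `default_for`
// declaration mentioned in the commit message.
interface SkillSummary {
  id: string;
  defaultFor?: string[];
}

// Resolve the skill id for the Video tab without depending on list order.
function pickVideoSkillId(
  skills: readonly SkillSummary[],
  videoModel: string | undefined,
): string | undefined {
  // Pin to hyperframes whenever the chosen model is hyperframes-html,
  // regardless of the (unsorted) discovery order.
  if (
    videoModel === 'hyperframes-html' &&
    skills.some((s) => s.id === 'hyperframes')
  ) {
    return 'hyperframes';
  }
  // Otherwise prefer a skill that declares itself the video default,
  // then fall back to the first entry as before.
  const declared = skills.find((s) => s.defaultFor?.includes('video'));
  return declared?.id ?? skills[0]?.id;
}
```

With `video-shortform` listed before `hyperframes`, the first-entry fallback alone would pick `video-shortform`; the pin makes the result independent of that order.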
shangxinyu1	d45bf3fb9a	test: expand entry and settings automation coverage (#954 ) * test: harden new project panel metadata coverage * test: expand entry e2e coverage * test: drop e2e docs from the guarded package * test: cover examples gallery interactions * test: cover examples preview modal actions * test: cover examples preview escape fullscreen * test: cover examples template prompt filtering * test: cover updated settings and entry tabs * test: fix entry/settings coverage type drift * test: fix example preview fetch assertion * test: fix new project panel skill fixture	2026-05-11 10:49:42 +08:00
Matt Van Horn	976a5900f8	fix: clear stale upload failure banner when previewing files (#797 ) * fix: clear stale upload failure banner when previewing existing files Closes #786 - Clear uploadError in openFile() so navigating to a file dismisses the banner - Scope banner visibility to the Design Files tab so stale errors do not bleed into preview surfaces - Add test pinning that no banner is rendered when there is no upload error * fix(workspace): move upload banner into DesignFilesPanel + interactive test Per @mrcfps + @lefarcen review on PR #797: - Move the upload-error banner from FileWorkspace into DesignFilesPanel body. Hide it whenever the in-panel preview is active (the missed flow that mrcfps and lefarcen flagged: single-click preview kept activeTab on DESIGN_FILES_TAB, so the old guard left the banner mounted above the preview). - Keep a fallback banner in FileWorkspace that fires only when activeTab is not Design Files. This preserves the partial-upload visibility flagged by chatgpt-codex-connector: a partial upload opens the last successful file (flipping activeTab to a viewer) and the failure note still surfaces. - Wrap uploadProjectFiles in try/catch so thrown errors surface a banner instead of disappearing. - Replace the brittle viewer-empty assertion with two interactive vitest cases: (1) mock-fail upload, banner visible, preview file, banner hidden, close preview, banner back, dismiss, banner gone; (2) partial-upload uploaded+failed, banner appears on the viewer surface with the existing 'Uploaded N file(s), but M failed' text. - Add df-upload-banner class and stable test ids upload-error-banner and upload-error-dismiss so future tests don't rely on the generic viewer-empty class. Closes #786 staleness; addresses follow-up review. --------- Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> Co-authored-by: mrcfps <mrc@powerformer.com>	2026-05-10 23:56:24 +08:00
Zihuailin	06e677cb72	Fix pending prompt clearing for templates (#1148 )	2026-05-10 21:52:49 +08:00
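Dongsen	7c1db80893	fix(web): intercept prose-as-HTML artifacts before writing to disk (#50 ) (#1144 ) * fix(web): reject prose-as-HTML artifacts at write time (#50) When the AI only makes in-place edits (producing no new canonical HTML), it occasionally still emits an `<artifact type="text/html">` block to satisfy the system prompt's non-negotiable closing rule, but the block contains only a one-sentence Chinese summary. `persistArtifact` previously did no content validation, so this prose was written to disk as legitimate HTML at `.od/projects/<id>/<id>.html` with a `kind: html` manifest, polluting the project file panel (screenshots in the #50 comments). Add a pure function `validateHtmlArtifact`: it requires non-empty content, a length of at least 64, and a `<!doctype html>` or `<html>` tag (case-insensitive, BOM-tolerant). `persistArtifact` calls the gate in the `ext === '.html'` branch; on failure it reports the error through `setError` and does not write the file. The scope is limited to the `<artifact>`-tag persistence path: HTML drafts that users save manually in FileViewer/FileWorkspace go through a different entry point and are unaffected. The prompt-level root cause (no clause allowing the artifact to be skipped) has been split out to #1143 for separate follow-up; this PR is a fallback defense at the persistence layer. * fix(web): anchor HTML structural check at content start (#1144 review) In the #1144 review, mrcfps pointed out a false negative in the original implementation: HTML_OPENING_TAG_RE / DOCTYPE_RE used .test() to search the whole string, so when the AI quoted a tag name inline while describing a change ("Updated the <html lang> attribute...") the content slipped through (longer than 64 characters and containing `<html `) and was likewise written as a ghost HTML file. Fix: merge the two regexes into STARTS_WITH_DOCUMENT_RE with a ^ anchor, requiring the first non-whitespace token of the trimmed content to be `<!doctype html>` or `<html`. Tag names appearing mid-string no longer count. Also, following lefarcen's non-blocking documentation suggestion, reword the docblock to be more precise: - "structural sniff" replaces "validation", making clear this is not an HTML validator - list three scope boundaries: not-a-linter / .jsx-tsx-skipped / the manual user save path is unaffected - the 64-character threshold rejects a 49-character minimal empty doc (e.g. `<!doctype html><html><body></body></html>`), and the docblock states that this is a deliberate trade-off: AI output is expected to be a non-trivial deliverable Add 3 tests covering the false negative mrcfps described: - long prose quoting `<html lang>` inline should be rejected - long prose quoting `<!doctype html>` inline should be rejected - a fragment whose first token is a non-document tag such as `<p>` should be rejected	2026-05-10 20:22:48 +08:00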
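The structural sniff described in #1144 can be sketched roughly as below. The names `validateHtmlArtifact` and `STARTS_WITH_DOCUMENT_RE` come from the commit message, but the exact regex, the `SniffResult` type, and the reason strings are assumptions made for illustration, not the repository's implementation.

```typescript
// Threshold stated in the commit message.
const MIN_ARTIFACT_LENGTH = 64;

// Assumed pattern: anchored at the start so a tag name quoted mid-prose
// does not count. Case-insensitive.
const STARTS_WITH_DOCUMENT_RE = /^(?:<!doctype\s+html\b|<html[\s>])/i;

// Hypothetical result type for this sketch.
type SniffResult = { ok: true } | { ok: false; reason: string };

// A structural sniff, not an HTML validator: it only checks that the content
// is non-empty, long enough, and starts like an HTML document.
function validateHtmlArtifact(content: string): SniffResult {
  // Tolerate a leading byte-order mark, then ignore surrounding whitespace.
  const trimmed = content.replace(/^\uFEFF/, '').trim();
  if (trimmed.length === 0) {
    return { ok: false, reason: 'empty' };
  }
  if (trimmed.length < MIN_ARTIFACT_LENGTH) {
    return { ok: false, reason: 'too-short' };
  }
  if (!STARTS_WITH_DOCUMENT_RE.test(trimmed)) {
    return { ok: false, reason: 'not-a-document' };
  }
  return { ok: true };
}
```

A caller in the `.html` persistence branch would check `ok` and report the failure instead of writing the file; user-saved drafts would not pass through this gate.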
Sid	e948405c22	fix(web): surface connector auth errors and stop silent popup close (#725 ) (#1128 ) * fix(web): surface connector auth errors and stop silent popup close (#725) Two layered bugs caused the "Twitter Connect button does nothing" symptom: 1. ConnectorsBrowser dropped result.error from connectConnector. On Electron desktop the popup is never opened (electronAPI.openExternal path), so the existing renderConnectorAuthError(null, ...) was a no-op and the user got zero feedback. 2. registry.ts silently called authWindow?.close() whenever the connect response did not carry { kind: 'redirect_required', redirectUrl }, leaving web users with a popup that vanishes without explanation. Patch: - Add a per-connector connectorAuthorizationError state and render it as an inline banner on both ConnectorCard and ConnectorDetailDrawer (mirrors the existing cancel-failed pattern; reuses the existing .connector-authorization-error styling). - Replace authWindow?.close() with a renderConnectorAuthInfo helper that branches on auth.kind ('connected' \| 'pending' \| unknown) and writes an explanatory message to the popup before the user closes it. - Tests: 1 registry test for the pending/info popup branch, 2 ConnectorsBrowser tests for surfacing and clearing the inline banner. * fix(web): clear connector auth error on background status refresh Addresses review feedback from @mrcfps and the Codex bot on PR #1128: the inline error banner stayed visible even after background status refresh marked the connector as `connected` (e.g. user completes auth out-of-band through the Composio dashboard, then focus/poll/message refresh observes the connection). - Add clearConnectorAuthorizationErrorsForConnected helper next to the existing pending-state helpers; same shape, returns the same object reference when nothing changes so React skips a re-render. 
- Wire it into reloadConnectorStatuses so every status refresh path (pending poll, focus, OAuth callback message) drops stale errors for any connector now reported as connected. - Add 2 unit tests for the helper next to the existing pending-state helper tests in EntryView.test.ts.	2026-05-10 19:38:18 +08:00
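The "returns the same object reference when nothing changes" behaviour of the helper in #1128 can be sketched like this. The helper name is taken from the commit message; the `ConnectorStatus` and `AuthErrorMap` shapes are assumptions, since the real state types are not shown in this log.

```typescript
// Hypothetical shapes for this sketch.
interface ConnectorStatus {
  id: string;
  status: string; // e.g. 'connected', 'pending'
}
type AuthErrorMap = Readonly<Record<string, string>>;

// Drop stale auth errors for connectors that are now connected.
// Returns the input object itself when nothing was removed, so a React
// state setter sees an identical reference and can skip the re-render.
function clearConnectorAuthorizationErrorsForConnected(
  errors: AuthErrorMap,
  connectors: readonly ConnectorStatus[],
): AuthErrorMap {
  let next: Record<string, string> | null = null;
  for (const connector of connectors) {
    const hasError = Object.prototype.hasOwnProperty.call(errors, connector.id);
    if (connector.status === 'connected' && hasError) {
      // Copy lazily, only once a change is actually needed.
      if (next === null) {
        next = { ...errors };
      }
      delete next[connector.id];
    }
  }
  return next ?? errors;
}
```

Copying only on the first real change keeps the no-op path allocation-free and preserves reference equality for the unchanged case.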
Priyanshu Kayarkar	eabf3a6e86	feat: add collapsible MCP JSON field-mapping helper (#1136 ) * feat(web): add collapsible MCP JSON helper component * feat(web): add collapsible MCP JSON field-mapping helper * test(web): add McpJsonHelper component tests for toggle behavior * fix(web): scope helper id per row and show helper * test(web): rewrite McpJsonHelper tests to use row-scoped ids * feat(mcp): use stable _localId for McpRow keys and aria-controls - Add _localId to DraftRow and genLocalId() - Use _localId as React key and helper id to avoid duplicate DOM ids - Move helper outside transport branches so helper is visible for all transports - Fix malformed template.homepage anchor * fix(web): restore _localId-scoped helperId and helper visibility for all transports * test(web): replace integration test with _localId-scoped helper tests * test(web): exercise McpJsonHelper via production McpClientSection in jsdom * fix(web): resolve typecheck errors * test(web):expand rows before querying helper toggles to fix timeout	2026-05-10 19:37:46 +08:00
soulme	cbb3c0e33a	Improve design files grouping (#1082 ) Add a modified-date grouping mode to make busy design workspaces easier to scan as generated files accumulate. The new view keeps existing batch actions and pagination available, adds localized labels, and covers date boundaries with component tests.	2026-05-10 11:55:34 +08:00
Pratik Rai	9f073f7b06	fix(web+desktop): handle popup-blocked PDF export with native Electron print dialog (#973 ) * fix(web): add alert when pdf export popup is blocked (#664) * fix(web): implement synchronous empty-tab strategy for pdf export * Update popup-blocked alert with browser-specific instructions * Add desktop preload script exposing native print-pdf IPC channel * Add native Electron print dialog for PDF export via IPC handler * fix(desktop): resolve electron preload, ipc security, and print callbacks * fix(desktop): resolve PR review comments for IPC lifecycle and merge fallout * fix(web): prevent double-print race by moving script injection to browser path * fix: resolve timing issues for blob revocation and desktop print readiness * fix(desktop): make waitForPrintableContent descend into sandboxed iframes * fix(desktop): handle print dialog cancellation gracefully without throwing errors * fix(desktop): send raw document to desktop bridge to fix readiness timing * fix(desktop): restore iframe sandbox and implement postMessage readiness handshake * fix(desktop): separate legacy print readiness from new handshake logic to fix regression * fix(desktop): resolve print handshake race condition and add regression test * fix(web): strip allow-modals from desktop sandbox to prevent hidden window stalls * fix(desktop): ensure print readiness cache is injected for all bridge exports * fix(web): make print readiness handshake explicitly wait for image completion * chore(desktop): add 30s timeout to print readiness handshake * fix(web): defer image readiness scan until after DOM load to catch lazy images * fix(desktop): secure print readiness handshake with per-export nonce	2026-05-10 11:45:46 +08:00
Bryan A	587c783dc0	feat(web): add Finalize design package + Continue in CLI buttons (#451 ) (#974 ) * feat(daemon): expose resolvedDir on GET /api/projects/:id (#451 prereq) Native projects (no metadata.baseDir) live at <projects root>/<id>, where projects root is daemon-side state. The web client cannot reconstruct an absolute path on its own, and shell.openPath on a relative path is undefined behavior. Without resolvedDir, the upcoming Continue in CLI button (#451) would render permanently disabled for native projects. Mirrors PR #832's pattern of exposing designMdPath in its response. Computed via the existing resolveProjectDir(...) helper. No behavior change to existing callers; they ignore the new field. Adds ProjectDetailResponse contract type and a focused projects-routes test covering imported-folder, native, and unknown-id paths. * feat(web): add parseProvenance helper for DESIGN.md staleness checks Pure helper that extracts Project ID, design system, current artifact, transcript message count, and generated UTC timestamp from the `## Provenance` section emitted by the daemon's finalize synthesis prompt (apps/daemon/src/finalize-design.ts). Used by useDesignMdState to derive the Continue in CLI button's stale/fresh state without an additional daemon endpoint. Handles missing section, "none" sentinels for design system / artifact, and malformed timestamps without throwing. Tests cover all four branches. * feat(web): add buildClipboardPrompt template for Continue in CLI Inline single-source-of-truth template per #451 spec §3.4. Names the project, the working directory, and the DESIGN.md-first operating contract for the receiving `claude` CLI session. Trailing TODO is the blank task slot the issue body specifies — left empty so the user fills it in before submitting. 
Also lands the shared copyToClipboard helper (jsdom-safe canonical path + execCommand fallback) so the new button and any future caller share one fallback path, mirroring the inline pattern in FileViewer.tsx. Tests cover happy-path field rendering, "none"/"unknown" sentinels when DESIGN.md fields are absent, and both clipboard branches. * feat(web): add useProjectDetail + useDesignMdState hooks useProjectDetail wraps GET /api/projects/:id, surfacing the resolvedDir field and falling back to metadata.baseDir for older daemons that don't include it. Continue in CLI needs an absolute working directory so the desktop bridge can openPath it; the web client never reconstructs the path itself. useDesignMdState fetches the project's file list, downloads DESIGN.md when present, parses the Provenance section, and computes a stale verdict by comparing the recorded generatedAt against the max mtime of non-DESIGN.md files and the max conversation updatedAt. Drives the button's three-state UI (disabled / fresh / stale) without a daemon-side endpoint. Tests cover happy path, fallback, and both stale branches plus the pure computeStale helper for the null-timestamp edge case. * feat(web): add useFinalizeProject hook with cancel + error-code mapping Wraps POST /api/projects/:id/finalize/anthropic for the Finalize design package button. Three concerns: 1. Lifecycle: idle → pending → success \| error. Double-clicking the button aborts the prior in-flight request before starting a new one so the daemon never sees stacked finalize calls per project. 2. Cancellation: AbortController plumbed through fetch + a 130 s timer (daemon timeout 120 s + 10 s buffer). Cancel returns to idle cleanly — it's a user gesture, not an error surface. 3. Daemon error mapping: when the response is non-OK, body.error.code drives the canonical user-facing toast string (table covers all 7 codes the daemon emits today plus a network-error catch-all). 
body.error.details, when a string, surfaces alongside the category message so account-usage-cap responses (Anthropic 400 → UPSTREAM_UNAVAILABLE) can show the upstream's own reason instead of just the daemon's category label — committed to lefarcen on #450 verification reply. Tests cover request body shape, all 8 error codes via it.each, the network-error path, the details-surfacing branch, the cancel ⇒ idle flow, and the unknown-code → catch-all message branch. * feat(web): add useTerminalLaunch with electron/web detection Capability-detected wrapper around window.electronAPI.openPath. On desktop the bridge forwards to shell.openPath, which opens the OS file manager at the project working directory (per Electron's contract for directory paths — it is NOT a terminal launcher; spawning a terminal application is deferred per #451 Non-goals). On browser builds the hook reports web-fallback so the caller renders a manual-instruction toast naming the working directory. Treats any non-empty string return from shell.openPath as ok: false so platform-specific failures surface the manual fallback toast. Behavior is exercised end-to-end by the upcoming ContinueInCliButton tests. * feat(desktop): expose shell.openPath via electronAPI bridge Adds an openPath bridge method that the Continue in CLI button (#451) uses to surface the project working directory in the OS file manager. shell.openPath is part of Electron's contract and resolves to '' on success / a non-empty error string on failure; the IPC handler forwards the result so the renderer can decide between the success toast and the manual fallback toast without a separate error channel. Empty / non-string inputs short-circuit to a self-describing error string so the renderer never needs to worry about undefined-input crashes from the main process. Web side: extracts Window.electronAPI into a single global declaration at apps/web/src/types/electron.d.ts so future bridge methods land in one place. 
Two pre-existing inline declare-global blocks (NewProjectPanel.tsx, providers/registry.ts) are deleted in favor of that single source of truth — the inline ones each carried a partial shape of the bridge and were diverging from the desktop preload. * feat(web): add FinalizeDesignButton, ContinueInCliButton, ProjectActionsToolbar Project-level toolbar that hosts the two new actions from #451. Mounted between AppChromeHeader and the chat/workspace split (wiring lands in the next commit). Per-file actions (Export PDF/PPTX/ZIP, Deploy) stay in the FileViewer share menu. FinalizeDesignButton has three idle labels driven by DESIGN.md existence + staleness, plus a pending state with a spinner and a cancel link that maps to useFinalizeProject's AbortController. Error toasts are owned by ProjectView so the button doesn't carry its own toast surface. ContinueInCliButton renders disabled with a Finalize-pointing tooltip when DESIGN.md is missing (so the workflow is discoverable rather than hidden), enabled when fresh, and enabled with a stale chip otherwise. Chip text is the spec's canonical "Spec is stale — regenerate?" — N-turns-ago is deferred per spec §4.6. Toast.tsx is a tiny transient component that mirrors PromptTemplatePreviewModal's state-based toast pattern; supports a secondary details line so daemon error envelopes that carry an upstream explanation (e.g. Anthropic account-usage cap) can surface the real reason alongside the daemon's category label. CSS appends one block to apps/web/src/index.css mirroring the existing app-project-title token usage; no CSS modules in this repo (verified by grep). * test(web): cover ContinueInCliButton states + interaction wiring Three rendered states (DESIGN.md missing → disabled with the Finalize-pointing tooltip; DESIGN.md fresh → enabled, no chip; DESIGN.md stale → enabled with the canonical "Spec is stale — regenerate?" chip), plus three onClick branches (no-op when disabled, fires once when fresh, fires once when stale). 
Click-handler integration with clipboard / shell.openPath / toast lives in ProjectView (the button is presentational and takes the handler in via props), so those are covered by Phase K's wiring + the manual smoke test rather than the per-component test. * feat(web): wire Continue in CLI + Finalize buttons into ProjectView Mounts the new project-actions toolbar between AppChromeHeader and the chat/workspace split, hidden when workspaceFocused so the focus-mode artifact view stays uncluttered. Wires the four hooks (useProjectDetail, useDesignMdState, useFinalizeProject, useTerminalLaunch) to a single shared toast surface. handleFinalize reads the request body from the existing config: AppConfig prop and uses effectiveMaxTokens(config) to match the chat-flow's maxTokens defaulting; on success it refreshes useDesignMdState so the toolbar re-renders with the new chip state. handleContinueInCli builds the literal clipboard prompt, copies it, opens the working directory via shell.openPath on desktop / falls through to a manual-instruction toast on browser, and surfaces shell.openPath failures with a fallback toast that names the path. Errors lift into the same toast surface (a useEffect tied to finalize.error) so the daemon's category message + body.error.details reach the user as the spec's two-line render — covered by hook test 16a in the prior commit. ⌘+Shift+K (mac) / Ctrl+Shift+K (others) is the keyboard accelerator for Continue in CLI; capture-phase, platform-gated, no-op when DESIGN.md is missing. Mirrors the existing FileWorkspace shortcut idiom and does not collide with ⌘+P (Quick Switcher). * fix(web): distinguish timeout abort from user cancel in useFinalizeProject Addresses codex P2 finding on PR #974: the catch block treated every AbortError as a user-initiated cancel and reset to idle silently. If the internal 130 s timeout fired, users saw no failure signal but the daemon's synthesis call may still have been in flight. 
Adds a timedOutRef set inside the setTimeout callback before controller.abort(), and branches in the catch: timeout → status 'error' with new TIMEOUT code ("Finalize timed out after 130 s. The daemon may still be running."), user cancel → existing idle reset. Reset the ref at the start of every trigger() so a previous timeout doesn't poison the next call. Adds one test using vi.useFakeTimers() that advances past 130_001 ms and asserts the TIMEOUT error surface. * fix(web): surface clipboard failures by rendering the prompt in the toast Addresses codex P2 finding on PR #974: handleContinueInCli ignored copyToClipboard's return value, so when both clipboard paths failed (restricted browser context / insecure origin) the toast still said "paste the prompt" though nothing had been copied — leaving users with no manual-copy recourse in exactly the environments where the fallback should help. handleContinueInCli now branches on copyToClipboard's boolean return. On failure the toast renders the prepared prompt in a scrollable <pre> block and pins itself open (no auto-dismiss) so the user has time to select-and-copy manually. Includes a Dismiss button + the working directory in the secondary details line so the user has the information needed to proceed. The folder-open call is skipped on copy failure because there's nothing to paste yet; the user copies first, then re-clicks Continue in CLI when they're ready. Toast component grows an optional
57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 
85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99%100%100% Unpacking: 0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11% 12% 13% 14% 15% 16% 17% 18% 19% 20% 21% 22% 23% 24% 25% 26% 27% 28% 29% 30% 31% 32% 33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43% 44% 45% 46% 47% 48% 49% 50% 51% 52% 53% 54% 55% 56% 57% 58% 59% 60% 61% 62% 63% 64% 65% 66% 67% 68% 69% 70% 71% 72% 73% 74% 75% 76% 77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87% 88% 89% 90% 91% 92% 93% 94% 95% 96% 97% 98% 99%100% Unpacked 4009 files and folders to /home/bryan/.vscode-server/bin/41dd792b5e652393e7787322889ed5fdc58bd75b. Looking for compatibility check script at /home/bryan/.vscode-server/bin/41dd792b5e652393e7787322889ed5fdc58bd75b/bin/helpers/check-requirements.sh Running compatibility check script Compatibility check successful (0) prop and the auto-dismiss TTL is suppressed whenever code is present. CSS adds .od-toast-code (monospace, max-height 240 with overflow-auto) and .od-toast-dismiss styling. 
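A minimal sketch of the copy-failure branching described in this commit message, under stated assumptions: `ToastSpec`, `ContinueOutcome`, `planContinueInCli`, the message strings, and the 4000 ms TTL are all hypothetical and not taken from the repository. Only the behaviour (show the prompt as code, pin the toast open, skip folder-open on failure) comes from the text.

```typescript
// Hypothetical toast description; field names are illustrative only.
interface ToastSpec {
  message: string;
  details: string; // secondary line, here the working directory
  code?: string; // prepared prompt, shown for manual copy when present
  autoDismissMs: number | null; // null means the toast stays open until dismissed
}

interface ContinueOutcome {
  toast: ToastSpec;
  openFolder: boolean;
}

const DEFAULT_TTL_MS = 4000; // assumed value, not from the source

// Decide what to show and whether to open the folder, based on the
// boolean that the clipboard helper returned.
function planContinueInCli(
  copied: boolean,
  prompt: string,
  workingDir: string,
): ContinueOutcome {
  if (copied) {
    return {
      toast: {
        message: "Prompt copied. Paste it into your CLI.",
        details: workingDir,
        autoDismissMs: DEFAULT_TTL_MS,
      },
      openFolder: true,
    };
  }
  // Copy failed: render the prompt itself and suppress auto-dismiss.
  return {
    toast: {
      message: "Could not copy. Select the prompt below and copy it manually.",
      details: workingDir,
      code: prompt,
      autoDismissMs: null,
    },
    openFolder: false, // nothing to paste yet, so skip the folder-open call
  };
}
```

Keeping the decision in a pure function like this is one way to test both branches without rendering the Toast component.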
Six new Toast tests cover details rendering, code rendering, no-auto-dismiss when code is present, auto-dismiss when code is absent, and the Dismiss button affordance. * fix(web): make ContinueInCliButton disabled-state guidance visible Addresses mrcfps's PR #974 review: native <button disabled> does not fire hover/focus events in browsers we ship against, so a `title` tooltip on the disabled button never surfaces. The only guidance for the missing-DESIGN.md state was effectively invisible — defeating the spec's "discoverable, not hidden" intent. Renders the help text as a visible sibling <span> next to the disabled button instead. Adds aria-describedby pointing the button at the hint's id so assistive tech announces the explanation when the disabled button gets focus. The native `disabled` attribute stays so the button still can't be clicked or submitted. CSS adds .project-actions-disabled-hint (muted italic, 11.5px, matches the existing meta/secondary text style on this surface). Test asserts the role="note" hint is in the DOM with the canonical text and that the button's aria-describedby links to its id. * fix(web): keep ProjectActionsToolbar at natural height inside the .app grid The .app container was `grid-template-rows: auto 1fr` — only two rows. Adding ProjectActionsToolbar as a third child between AppChromeHeader and the chat/workspace split made the toolbar the 2nd grid item, so it took the `1fr` row (filling roughly half the viewport) while the split got pushed into an implicit auto row at its content's natural height. Surfaced as a screenshot from Bryan showing the toolbar's background bleeding across most of the screen. Extend grid-template-rows to `auto auto 1fr` and pin the split to `grid-row: 3` explicitly. Now: - Toolbar visible: row 1 = header (auto), row 2 = toolbar (auto), row 3 = split (1fr, fills remaining viewport). 
- Toolbar hidden via hidden=workspaceFocused → ProjectActionsToolbar returns null, row 2 collapses to 0px (auto with no content), split still fills row 3. No JS changes; existing 609 tests still green. * fix(web): guard useFinalizeProject state writes against superseded triggers Addresses mrcfps's PR #974 P1 review on useFinalizeProject.ts:132 (also called out as P1.3 in lefarcen's deep-dive review). Calling trigger() twice in quick succession aborted the first controller and swapped abortRef to the new one, but the first request's later AbortError catch still unconditionally called setStatus('idle') / setError(null). That cleared the spinner and re-enabled both toolbar buttons while the replacement finalize was still pending — defeating the de-duplication this hook was meant to enforce. Adds an isCurrent() closure (`abortRef.current === controller`) and gates every state-write site after the await: success path, non-OK envelope path, AbortError-timeout, AbortError-cancel, and network-error all bail early when the trigger has been superseded. Per mrcfps: "make every state write request-scoped." Regression test triggers twice in quick succession with a never-resolving fetch, awaits the first promise (it rejects with AbortError), and asserts status stays 'pending' rather than collapsing to 'idle' under the replacement's lifetime. * fix(desktop): allowlist-validate shell.openPath against registered project roots Addresses mrcfps's PR #974 P1 review on runtime.ts:305 (also called out as P1.2 in lefarcen's deep-dive review): the new `shell:open-path` IPC handler accepted any renderer-supplied string and forwarded it straight into Electron's `shell.openPath`, widening the renderer→main trust boundary so XSS or a compromised renderer dependency could open arbitrary local paths to the user. Adds an explicit gate around the bridge: 1. 
validateExistingDirectory(p): floor check that rejects empty strings, relative paths, files, apps, and non-existent paths; realpath-resolves so symlink games can't be used to register one path and reach another. 2. createProjectRootGate(): Set-backed allowlist of daemon-validated project working directories. The renderer calls registerProjectRoot(absDir) once per project mount via a new IPC method (preload bridge); the main process only opens paths that pass both the floor check and the allowlist. ProjectView wires the registration via a useEffect tied to projectDetail.resolvedDir, so the active project's daemon-supplied working directory is always the one being approved (not a renderer-synthesized string). Threat-model caveat documented in the runtime.ts comment block: an attacker that fully controls the renderer can also call register with arbitrary paths. Closing that gap fully requires a daemon-side round-trip to derive the canonical resolvedDir from the daemon's project registry, which is deferred to keep this PR focused. Today's allowlist still defends against accidental misuse, bugs, and common XSS payloads that don't know to call register first. Adds apps/packaged/tests/desktop-project-root-gate.test.ts with 13 cases: floor-validation rejection cases (empty / relative / missing / file), happy-path resolution, symlink realpath canonicalization, and the allowlist's register/isApproved/reset semantics. Mirrors the existing apps/packaged/tests/desktop-url-allowlist.test.ts pattern from PR #911; the packaged workspace hosts the test because apps/desktop has no vitest setup yet. 
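The two-step gate above (floor check, then allowlist) can be sketched roughly as follows. This is an illustration under assumptions, not the code in runtime.ts: the `...Sketch` names and the null-on-reject return convention are hypothetical, and app-bundle rejection is left out.

```typescript
import { mkdtempSync, realpathSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { isAbsolute, join } from "node:path";

// Floor check: returns the canonical directory path, or null when rejected.
function validateExistingDirectorySketch(p: string): string | null {
  if (p.length === 0 || !isAbsolute(p)) return null; // empty or relative
  try {
    const real = realpathSync(p); // resolve symlinks before any comparison
    return statSync(real).isDirectory() ? real : null; // files are rejected
  } catch {
    return null; // path does not exist or cannot be read
  }
}

// Set-backed allowlist keyed by canonical (realpath-resolved) directories.
function createProjectRootGateSketch() {
  const approved = new Set<string>();
  return {
    register(dir: string): boolean {
      const real = validateExistingDirectorySketch(dir);
      if (real === null) return false;
      approved.add(real);
      return true;
    },
    isApproved(dir: string): boolean {
      const real = validateExistingDirectorySketch(dir);
      return real !== null && approved.has(real);
    },
    reset(): void {
      approved.clear();
    },
  };
}

// Usage: a real temp directory passes, a file inside it does not.
const root = mkdtempSync(join(tmpdir(), "od-gate-"));
const file = join(root, "DESIGN.md");
writeFileSync(file, "# demo\n");
const gate = createProjectRootGateSketch();
gate.register(root);
```

Because both `register` and `isApproved` canonicalize through `realpathSync`, a symlink pointing at an unregistered directory resolves to a path that is not in the set.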
* fix(daemon): wire request-lifecycle abort signal through finalize route Addresses mrcfps's PR #974 P1 review on apps/daemon/src/server.ts:3831-3837 (also called out as P1.1 in lefarcen's deep-dive review): `POST /api/projects/:id/finalize/anthropic` called `finalizeDesignPackage(...)` without threading any request-lifecycle abort, so cancelling the browser fetch only aborted the UI-side request — the daemon's 60–120 s Anthropic call kept running and still wrote DESIGN.md after the UI returned to idle. Adds an AbortController inside the route handler, fired from `res.on('close')`, and threads its signal into the existing `signal?: AbortSignal` parameter on `FinalizeOptions` (finalize-design.ts:70). `callAnthropicWithRetry` already passes the signal through to the underlying fetch, so a client disconnect now propagates all the way to the Anthropic SDK call. Listener-event choice: `res.on('close')` is the canonical event for "client disconnected before response was sent" in Express. The common alternative `req.on('close')` fires whenever the request stream finishes — for POST routes that means as soon as the body-parser middleware drains the body, well before the route does any work. Using req.on('close') would have flipped the abort controller in every successful run; the test caught this empirically. Caveat documented in the route's comment block: an abort fired after the upstream response has been received but before the atomic write completes still allows the write to land. The SDK contract bounds the network round-trip, not the post-network disk handoff. Adds tests/finalize-route-abort.test.ts: spins up the test server, mocks global fetch to capture the daemon-side AbortSignal at the Anthropic call, sends the request via raw http (so we can destroy the underlying socket), waits until the server reaches the Anthropic call, then destroys the socket and asserts that the daemon-side signal received an abort event within 5 s. 
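A rough sketch of the `res.on('close')` wiring described above. The helper name `abortSignalForResponse`, the `ClosableResponse` interface, and the `FakeResponse` stand-in are hypothetical; the `writableEnded` guard is an added assumption meant to keep the abort from firing once the response has already been ended, and the source does not say whether the route does this.

```typescript
// Minimal structural view of the response object (an http.ServerResponse
// or Express response has both members).
interface ClosableResponse {
  writableEnded: boolean;
  on(event: "close", listener: () => void): unknown;
}

// Returns a signal that aborts when the client disconnects early.
function abortSignalForResponse(res: ClosableResponse): AbortSignal {
  const controller = new AbortController();
  res.on("close", () => {
    // Assumption: only treat 'close' as a disconnect if end() never ran.
    if (!res.writableEnded) controller.abort();
  });
  return controller.signal;
}

// Stand-in for a server response so the sketch runs without a socket.
class FakeResponse implements ClosableResponse {
  writableEnded = false;
  private listeners: Array<() => void> = [];
  on(_event: "close", listener: () => void): this {
    this.listeners.push(listener);
    return this;
  }
  close(): void {
    for (const listener of this.listeners) listener();
  }
}

// Client drops the connection before the handler has responded.
const dropped = new FakeResponse();
const droppedSignal = abortSignalForResponse(dropped);
dropped.close();

// Handler finished normally, then the connection closed.
const finished = new FakeResponse();
const finishedSignal = abortSignalForResponse(finished);
finished.writableEnded = true;
finished.close();
```

The returned signal would then be passed into the `signal` option that the commit message says `FinalizeOptions` already accepts.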
Three pre-existing project-watchers chokidar tests show flaky timeouts under full-suite concurrency but pass in isolation; unrelated to this fix. * fix(daemon): refactor finalize-route-abort test to satisfy strict TS narrowing The CI typecheck (`pnpm --filter @open-design/daemon typecheck`, which runs both tsconfig.json and tsconfig.tests.json) caught what my pre-push validation missed: TS narrowed `capturedSignal` to literal `null` because it cannot prove that the vitest mockImplementation callback ever runs, leaving the bare `let capturedSignal: AbortSignal | null = null` permanently typed at its initial value. At line 184 (`expect(capturedSignal?.aborted).toBe(true)`) the right-hand side of the optional-chain became unreachable, and TS flagged it as `Property 'aborted' does not exist on type 'never'`. Switches to the standard ref-object pattern (`const capture: { signal: AbortSignal | null } = { signal: null }`). TS control-flow analysis does not see assignments to a `let` binding made inside a callback, but a property read on a typed object keeps its declared type, so `capture.signal` reads correctly across the closure boundary. Logic is unchanged. (Pre-push oversight: ran `pnpm --filter @open-design/web typecheck` but not the full repo `pnpm typecheck` after the daemon test landed; the daemon's own typecheck would have caught this. Adding `pnpm typecheck` back into the standard pre-push checklist.) * fix(desktop): make shell.openPath gate daemon-controlled and reject .app bundles Addresses lefarcen + mrcfps PR #974 P1 reviews on the previous path allowlist (commit `8bf56597`): - mrcfps (runtime.ts:45): `validateExistingDirectory` accepted macOS `.app` bundles because they're directories, so the gate would forward `/Applications/Safari.app` (or any other app bundle) into shell.openPath and launch the application, a stronger capability than the bridge's intended "reveal the project folder" feature. - lefarcen (runtime.ts:396): the allowlist was renderer-controlled. 
A compromised renderer could call `shell:register-project-root` with any existing absolute directory and then `shell:open-path` that same path; the IPC injection issue I'd documented as "deferred" was the central reviewer concern, not an acceptable caveat. Both reviewers asked for the gate to be derived from a daemon-authoritative source. The redesign drops the renderer-controlled register/openPath pair and replaces it with a single `openPath(projectId)` bridge call. The desktop main process resolves the project ID by calling the daemon's `GET /api/projects/:id` endpoint over the web sidecar proxy (which already forwards `/api/` to the daemon — verified in apps/web/sidecar/server.ts:209 and apps/web/next.config.ts:77), parses `resolvedDir` from the response, validates it against the floor (absolute, exists, is-directory, not .app), and only then forwards to `shell.openPath`. The renderer never names the path directly, so a compromised renderer cannot escalate to opening arbitrary local paths — it can only name a project the daemon already knows about, and the canonical path comes from the daemon's own response. Surface changes: - `runtime.ts`: `createProjectRootGate` removed. `fetchResolvedProjectDir(webUrl, projectId, fetchImpl?)` added. `validateExistingDirectory` rejects `.app` suffix after the realpath check (so symlinked launders are caught too). `shell:open-path` handler signature changes from `(path)` to `(projectId)`; `shell:register-project-root` handler removed. - `preload.cts`: `openPath(projectId)`; `registerProjectRoot` removed from the bridge surface. - `apps/web/src/types/electron.d.ts`: type updated to match. - `useTerminalLaunch.ts`: `open(projectId)` instead of `open(dir)`. - `ProjectView.tsx`: passes `project.id` to `terminalLauncher.open`; the registerProjectRoot useEffect is deleted. Toast text still reads `projectDir` (from `useProjectDetail.resolvedDir`) for fallback messages — the display* path is independent of the open mechanism. 
- `apps/packaged/tests/desktop-project-root-gate.test.ts`: rewritten to cover `validateExistingDirectory` (8 cases including the new `.app` suffix and symlinked-bundle rejection) and `fetchResolvedProjectDir` (8 cases including empty/invalid project ids, daemon HTTP success/failure, missing resolvedDir, network error, and URL canonicalization). Total: 16 passing tests, ~330 LOC churn including test rewrites. Lesson learned (from the iteration loop, not the code): when a reviewer asks for "ideally X, or at least Y," shipping Y with a deferred-X note flags the gap rather than fixing it. Either ship X or argue Y is sufficient; don't middle-ground. * feat(contracts,sidecar-proto): add desktop-auth IPC + fromTrustedPicker Schema-only prep for the PR #974 round-3 fix. Adds the two type extensions the daemon HTTP gate and the desktop main process will build on: - packages/sidecar-proto: SIDECAR_MESSAGES.REGISTER_DESKTOP_AUTH, with a base64-validated `{ secret }` payload + RegisterDesktopAuthResult. Updates normalizeDaemonSidecarMessage to accept the new message and pins both branches (accept + reject) in tests/index.test.ts. - packages/contracts: ProjectMetadata.fromTrustedPicker — a marker the daemon stamps on folder-imported projects whose POST /api/import/folder passed the desktop HMAC gate. The marker is privileged in the same way as `baseDir`: only the gated import handler sets it, and the desktop main process refuses to forward `shell.openPath` for folder-imported projects whose metadata lacks it. * fix(daemon): gate /api/import/folder on desktop HMAC token Closes the renderer→arbitrary-baseDir→shell.openPath bypass chain flagged by lefarcen and mrcfps in round 3 of PR #974. Both reviewers converged on the same gap: the previous round only moved path resolution into the daemon, but renderer JS could still POST /api/import/folder with any absolute path, get a project ID back, and then call openPath(projectId) to reveal the attacker-chosen path. 
Daemon-side closure: - New module-scope desktop auth secret + setter exported from apps/daemon/src/server.ts. The secret is null at boot (web/standalone mode unaffected) and gets set when the desktop main process registers it over the daemon's sidecar IPC. - New `verifyDesktopImportToken` pure helper. Verifies tokens shaped `${nonce}~${exp}~${signature}` against HMAC-SHA256(secret, baseDir + "\n" + nonce + "\n" + exp). Field separator is `~` (not `.`) because ISO 8601 expiries embed dots; `~` is in neither base64url nor ISO 8601 character sets. Rejects expired tokens, replayed nonces, and expiries beyond 2× the 60s TTL. - New middleware on POST /api/import/folder. When the secret is set, every request must carry a valid `X-OD-Desktop-Import-Token` header bound to the requested baseDir. Rejected requests return 403 with FORBIDDEN. When the secret is unset (no desktop registered), the route is unchanged so web-only deployments and standalone daemons keep working. - Trusted imports get `metadata.fromTrustedPicker: true` stamped on the project. POST /api/projects and PATCH /api/projects/:id reject any client-supplied `fromTrustedPicker` (privileged the same way as `baseDir`), and the PATCH preservation block re-stamps the marker on partial-metadata patches so it cannot be silently stripped. - Daemon sidecar IPC handler: REGISTER_DESKTOP_AUTH calls setDesktopAuthSecret with the base64-decoded secret. The HTTP and IPC servers share a process so the registration takes effect immediately for the next inbound /api/import/folder call. Tests: - apps/daemon/tests/desktop-import-token-gate.test.ts (15 cases): web mode acceptance, no-token rejection, malformed-token rejection, wrong-secret rejection, wrong-baseDir rejection, expired rejection, oversized-window rejection, valid mint + trusted-picker stamp + replay rejection, plus 6 pure-helper cases for verifyDesktopImportToken. afterAll() clears the secret to keep the shared HTTP server clean for sibling test files. 
- apps/daemon/tests/projects-routes.test.ts (+2 cases): POST and PATCH reject `fromTrustedPicker` in client-supplied metadata. Existing folder-import-route.test.ts continues to pass because none of those tests register a desktop secret; the gate stays dormant. * fix(desktop,web): atomic pickAndImport replacing pickFolder; openPath trusted-picker check Closes the renderer→arbitrary-baseDir bypass at the bridge boundary. The renderer no longer receives a raw filesystem path from the main process; the picker dialog and the import call live in a single main-process transaction. Desktop main: - runDesktopMain generates a per-process 32-byte secret and registers it with the daemon over the daemon's sidecar IPC before the BrowserWindow is created. registerDesktopAuthWithDaemon retries a few times because tools-dev / tools-pack spawn daemon, web, and desktop as siblings, so the daemon may not be listening yet on desktop boot. A failed registration logs a warning and the runtime refuses pickAndImport calls (no secret → no token can be minted). - runtime.ts replaces the `dialog:pick-folder` IPC with `dialog:pick-and-import`. The handler shows the picker, mints an HMAC token bound to the chosen path, POSTs /api/import/folder via the discovered web URL with the token + body, and returns the daemon's ImportFolderResponse to the renderer (or a structured failure envelope). Renderer never sees the path or the token. - shell:open-path now consults a new pure helper `isOpenPathAllowedForProject` that refuses folder-imported projects whose metadata lacks `fromTrustedPicker: true`. This is the literal interpretation of mrcfps's round-3 follow-up: openPath is gated to projects whose resolvedDir came from the trusted-picker flow, not just transitively via the import gate. Native projects (no baseDir → daemon-owned <projectsRoot>/<id>) are always safe to open. 
- fetchResolvedProjectDir now returns a `ResolvedProjectDirContext` with hasBaseDir + fromTrustedPicker so the openPath handler can enforce the marker check. - New `signDesktopImportToken` pure helper mirrors the daemon-side signer with the same `~`-separated wire shape, exported for the packaged workspace's test file. Preload bridge: - `pickFolder` is deleted. The new `pickAndImport(init?)` returns the daemon's import response or a structured failure. `openPath` keeps its existing signature; its trust gate now lives in the main process. Web renderer: - electron.d.ts drops `pickFolder` and adds `pickAndImport` with the shared DesktopPickAndImportResult union pulled from contracts. - NewProjectPanel: when running on Electron (pickAndImport bridge present), the "Open folder" button calls pickAndImport atomically and forwards the response through a new `onImportFolderResponse` prop. On web (no bridge), the existing manual baseDir input keeps working — browser builds have no shell.openPath surface so a renderer-named path cannot escalate. - EntryView and App.tsx pass through the new callback. App's `handleImportFolderResponse` updates state from the response without a second fetch (the import already happened in the main process). Tests (apps/packaged/tests/desktop-project-root-gate.test.ts): - 3 cases for `isOpenPathAllowedForProject`: native allowed, trusted-picker allowed, legacy folder-import refused. - 6 cases for `signDesktopImportToken`: shape (~-separated), determinism, signature flips when secret/baseDir/nonce/exp changes. - Existing fetchResolvedProjectDir cases extended for the new `context` shape and additional cases that prove the metadata inspection (hasBaseDir, fromTrustedPicker) reads the daemon response correctly. 
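To make the `~`-separated token shape concrete, here is a minimal sign/verify sketch. It follows the wire shape and HMAC payload stated in the commit messages, but the rest is assumed: the `...Sketch` function names, base64url encoding for the signature, an in-memory `Set` for replay tracking, and an explicit `nowMs` parameter are illustrative choices, not the daemon's actual implementation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TTL_MS = 60_000; // 60s TTL from the commit message

// HMAC-SHA256(secret, baseDir + "\n" + nonce + "\n" + exp)
function hmacFor(secret: Buffer, baseDir: string, nonce: string, exp: string): string {
  return createHmac("sha256", secret)
    .update(`${baseDir}\n${nonce}\n${exp}`)
    .digest("base64url"); // assumed encoding
}

function signTokenSketch(secret: Buffer, baseDir: string, nonce: string, exp: string): string {
  return `${nonce}~${exp}~${hmacFor(secret, baseDir, nonce, exp)}`;
}

function verifyTokenSketch(
  secret: Buffer,
  baseDir: string,
  token: string,
  seenNonces: Set<string>,
  nowMs: number,
): boolean {
  const parts = token.split("~");
  if (parts.length !== 3) return false; // malformed
  const [nonce, exp, signature] = parts as [string, string, string];
  const expMs = Date.parse(exp); // ISO 8601 expiry
  if (Number.isNaN(expMs)) return false;
  if (expMs <= nowMs) return false; // expired
  if (expMs - nowMs > 2 * TTL_MS) return false; // window larger than 2x TTL
  const expected = Buffer.from(hmacFor(secret, baseDir, nonce, exp));
  const actual = Buffer.from(signature);
  if (actual.length !== expected.length) return false;
  if (!timingSafeEqual(actual, expected)) return false; // wrong secret or baseDir
  if (seenNonces.has(nonce)) return false; // replay
  seenNonces.add(nonce); // single use
  return true;
}

// Usage with a fixed clock so the result does not depend on wall time.
const secret = Buffer.alloc(32, 7);
const nowMs = Date.parse("2026-01-01T00:00:00.000Z");
const exp = new Date(nowMs + TTL_MS).toISOString();
const token = signTokenSketch(secret, "/home/user/project", "nonce-1", exp);
```

The nonce is recorded only after the signature check passes, so a forged token cannot burn a legitimate nonce in this sketch.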
* fix(daemon): make desktop import-folder gate fail-closed (PR #974 round 4) lefarcen P1 on round 3 of PR #974: the gate's `secret == null → accept` branch (originally intended to keep web-only deployments unaffected) let a renderer bypass the import boundary in two real desktop edges: - Startup race: desktop's REGISTER_DESKTOP_AUTH IPC hasn't reached the daemon yet, but the renderer is already alive in the BrowserWindow and races to fetch /api/import/folder directly with arbitrary baseDir. - Daemon restart mid-session: the new daemon process boots tokenless while a desktop is still running. Same shape: renderer fetches the route, daemon falls through to "web mode", accepts the untrusted baseDir. shell.openPath rejects (no fromTrustedPicker marker) but the daemon's other file APIs (read/write project files, list directories) operate on the attacker-chosen path. Two coordinated mechanisms close that: (1) Sticky in-process flag. `desktopAuthEverRegistered` flips to true on first non-null `setDesktopAuthSecret(...)` and never goes back. setDesktopAuthSecret(null) (used by tests) does NOT relax the gate so production code can never silently fall back to fail-open. Add `resetDesktopAuthForTests()` for vitest cleanup. (2) Orchestrator-pinned mode via OD_REQUIRE_DESKTOP_AUTH=1 read at module load. tools-dev / tools-pack / apps/packaged set this when the daemon is spawned in a desktop-bundled flow (separate commits). With the env set, the gate is active from request 0 — a renderer racing /api/import/folder before registration completes gets a 503 DESKTOP_AUTH_PENDING (transient, retry). Standalone-daemon (web-only) deployments where neither mechanism fires keep the gate dormant and the route's behavior unchanged. Also addresses lefarcen P3 (whitespace HMAC mismatch): the desktop signs the exact picker output, so the daemon must verify the same string. 
The previous version trimmed `baseDir` before HMAC, which would reject legitimate paths whose final component carried edge whitespace. Use the raw request-body baseDir for verification; the existing trim()+realpath() logic still normalizes for fs operations. New error code: `DESKTOP_AUTH_PENDING` (HTTP 503, retryable). Tests: - `stays fail-closed (503 DESKTOP_AUTH_PENDING) after a registered secret is cleared` — exercises the sticky flag. - `verifies the exact request-body baseDir, not a trimmed version` — pins the round-4 P3 fix. - All existing desktop-import-token-gate cases continue to pass; the beforeEach/afterEach/afterAll resetters now use resetDesktopAuthForTests() to honor the sticky flag. * fix(tools-dev,packaged): pin desktop import-auth on daemon spawn PR #974 round-4 P1 follow-through. The daemon-side fail-closed gate needs OD_REQUIRE_DESKTOP_AUTH=1 in the daemon's spawn env whenever the daemon is paired with a desktop, so the gate is active from request 0 and the daemon-restart-mid-session bypass cannot reopen. tools-dev: - spawnDaemonRuntime accepts a `requireDesktopAuth` option that appends OD_REQUIRE_DESKTOP_AUTH=1 to the spawn env. - startDaemon takes the same flag and additionally checks whether a desktop runtime is already alive in this namespace; either branch pins the env (revival case where the daemon died mid-session and the user runs `tools-dev start daemon` to bring it back up). - startApp threads the bundled-target list down so the daemon spawn knows when desktop is queued in the same orchestration even though the daemon starts first. - The `start` / `restart` / `run` command actions pass the resolved target list into startApp. apps/packaged: - Packaged builds always pair a desktop with the daemon, so startPackagedSidecars unconditionally sets OD_REQUIRE_DESKTOP_AUTH=1 in the daemon child env. Headless builds also flow through this same path, so the same gate applies. 
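The way the env pin combines with the sticky in-process flag can be sketched as a small state machine. This is a simplified model under assumptions: `createDesktopAuthGateSketch` and the `GateDecision` labels are hypothetical, and the real daemon keeps this state at module scope rather than in a factory.

```typescript
// What the import route should do for an incoming request.
type GateDecision =
  | "dormant" // web-only: route behaves as before
  | "pending" // 503 DESKTOP_AUTH_PENDING, retryable
  | "verify"; // a secret is registered: require a valid token

// envRequired models OD_REQUIRE_DESKTOP_AUTH=1 being read at module load.
function createDesktopAuthGateSketch(envRequired: boolean) {
  let secret: Buffer | null = null;
  let everRegistered = false; // sticky: never flips back to false

  return {
    setSecret(next: Buffer | null): void {
      secret = next;
      if (next !== null) everRegistered = true;
    },
    decide(): GateDecision {
      if (secret !== null) return "verify";
      // No secret right now, but a desktop is (or was) paired: fail closed.
      if (everRegistered || envRequired) return "pending";
      return "dormant";
    },
  };
}

// Standalone daemon: no env pin, no registration.
const standalone = createDesktopAuthGateSketch(false);
// Desktop-bundled daemon: gate active from request 0.
const paired = createDesktopAuthGateSketch(true);
```

In this model, clearing the secret after a registration yields "pending" rather than "dormant", which matches the fail-closed behaviour the messages describe.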
Standalone-daemon flows unaffected: `tools-dev start daemon` (alone, no desktop running, no desktop in the bundled target list) does not set the env, and the daemon's gate stays dormant, so current web-only behavior is preserved. * fix(desktop,web): align project-id regex with daemon; surface pickAndImport failures mrcfps round-4 nits on PR #974. apps/desktop/src/main/runtime.ts (mrcfps #1): the previous client-side regex `^[a-zA-Z0-9_-]+$` rejected `.` even though the daemon's canonical isSafeId / POST /api/projects accept `[A-Za-z0-9._-]{1,128}`. Result: dotted ids like `my-project.v2` were valid backend-side but got "project id contains disallowed characters" before fetchResolvedProjectDir even hit the network, regressing Continue in CLI / Finalize for those projects. Align the regex with the daemon's shape, comment-tag the rationale. apps/packaged/tests/desktop-project-root-gate.test.ts: add a regression case for a dotted id and one for the 128-char length cap (the new regex exposes both, the old regex obscured the dotted one). apps/web/src/components/NewProjectPanel.tsx (mrcfps #2): the `if (!result || result.ok !== true) return` branch swallowed every non-OK pickAndImport shape (`desktop auth secret not registered`, `web sidecar URL not available`, daemon HTTP errors with details) the same way as the explicit `{ canceled: true }` cancel, leaving the user with a silent no-op when the trusted-picker flow couldn't even get off the ground. Reserve silent-return for the cancel case only; surface every other reason via a Toast (existing component, already used by ProjectView for related Continue-in-CLI flows). The new `formatPickAndImportErrorDetails` helper flattens daemon ApiError envelopes into a single readable secondary line so the operator sees both the category ("Open folder failed: daemon returned HTTP 503") and the upstream reason ("desktop auth required but secret not yet registered"). 
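The regex mismatch from the first nit is easy to see side by side. In this sketch the two character classes are copied from the commit message; the `^...$` anchoring on the daemon-shaped pattern and the names `OLD_CLIENT_ID_PATTERN`, `DAEMON_ID_PATTERN`, and `isSafeProjectIdSketch` are assumptions for illustration.

```typescript
// Previous client-side pattern: no dot allowed, no length cap.
const OLD_CLIENT_ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Daemon-shaped pattern: dot allowed, 1 to 128 characters.
// Anchors are assumed; the message quotes only the character class.
const DAEMON_ID_PATTERN = /^[A-Za-z0-9._-]{1,128}$/;

function isSafeProjectIdSketch(id: string): boolean {
  return DAEMON_ID_PATTERN.test(id);
}

// The dotted id from the commit message: rejected before, accepted now.
const dottedId = "my-project.v2";
const oldVerdict = OLD_CLIENT_ID_PATTERN.test(dottedId);
const newVerdict = isSafeProjectIdSketch(dottedId);
```

Note that this pattern on its own also matches `.` and `..`; whether the daemon's isSafeId handles those separately is not stated in the message.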
* docs(architecture): document desktop folder-import auth boundary lefarcen P3 on PR #974 round 4: the `Folder import` section in docs/architecture.md still documented only realpath / sandbox / RUNTIME_DATA_DIR checks and omitted the new desktop HMAC trust boundary, replay/TTL behavior, fail-closed semantics, daemon-restart edge, and legacy-import migration note. Without that subsection it's hard to review whether the 60s TTL, the `~`-separated token shape, or the legacy folder-imports needing re-pick are intentional product decisions or overlooked gaps. Add a "Desktop folder-import auth (PR #974)" subsection covering: - The trust handshake (32-byte secret over sidecar IPC at desktop boot). - Token shape (`${nonce}~${exp}~${signature}`), HMAC payload, and why `.` cannot be the field separator (ISO 8601 expiries embed dots). - TTL and replay behavior (60s, single-use, 2× TTL upper bound). - Fail-closed mechanisms — sticky in-process flag and OD_REQUIRE_DESKTOP_AUTH env var pinning. - Web-only deployments are unaffected (browser builds have no shell.openPath surface). - The `metadata.fromTrustedPicker` marker and the openPath-side defense-in-depth check. - Legacy folder-imports need re-pick to use the Continue-in-CLI button. - Daemon-restart edge: 503 DESKTOP_AUTH_PENDING until desktop re-registers; restart desktop to recover. * fix(packaged): skip desktop-auth gate in headless mode (PR #974 round 5 P2) Round 5 (lefarcen P2): packaged headless mode (daemon+web only, no Electron) was inheriting OD_REQUIRE_DESKTOP_AUTH=1 from the round-4 unconditional pin in startPackagedSidecars. Headless never runs desktop main, so no client could ever register an HMAC secret and folder import returned 503 DESKTOP_AUTH_PENDING permanently — even though headless has no shell.openPath surface to exploit. 
Plumb a required `requireDesktopAuth: boolean` option through startPackagedSidecars: apps/packaged/src/index.ts (Electron entry) passes true; apps/packaged/src/headless.ts passes false. Extract buildPackagedDaemonSpawnEnv as a pure helper so vitest can pin both branches without spawning a child process. Tests added in apps/packaged/tests/sidecars.test.ts cover both branches plus OD_LEGACY_DATA_DIR / daemonCliEntry env forwarding edges. Refs: nexu-io/open-design#974 * fix(desktop,daemon): lazy auth retry + canonical HMAC binding (PR #974 round 5 P1+P3) Round 5 (lefarcen P1, mrcfps): a daemon restart under OD_REQUIRE_DESKTOP_AUTH=1 left desktop holding a stale secret while the new daemon process required a fresh registration — folder import returned 503 DESKTOP_AUTH_PENDING permanently until the user restarted desktop. Same dead-end if the startup handshake missed its retry window. Round 5 (lefarcen P3): the daemon verified the HMAC against raw request-body baseDir, then trimmed before realpath(). A picker selection of "/tmp/foo " could authorize an import of "/tmp/foo" — token bound to a different path than the one imported. Three coordinated fixes: 1. P1 lazy retry: extract pickAndImportFolder as a pure helper that takes injected fetch / mintToken / registerDesktopAuth deps. On 503 DESKTOP_AUTH_PENDING from /api/import/folder, re-invoke the registration callback once, mint a fresh token (new nonce + new exp keeps replay protection), and POST again. Single retry, no infinite loop. Other failure shapes return immediately to the renderer. 2. P1 wiring: runDesktopMain now ALWAYS passes desktopAuthSecret to the runtime regardless of whether the initial handshake succeeded, plus a registerDesktopAuthWithDaemon callback the runtime invokes lazily. Soften the startup warning text to match the new recovery semantics. 3. P3 binding: trim picker output ONCE on the desktop side before both signing the HMAC and POSTing. 
Daemon-side verification stays against raw request-body baseDir (round-4 behavior); the daemon's defensive trim before realpath() is now a no-op for desktop traffic and only load-bearing for web-mode callers (path.isAbsolute(" /foo ") is false). End-to-end: desktop-signed string == request body == HMAC- verified string == realpath() input. Tests: - apps/packaged/tests/desktop-pick-and-import.test.ts (NEW, 7 cases): lazy-retry happy path; lazy-retry exhausted (re-register WAS called); single-attempt happy path (no unnecessary IPC); optional-callback no-op; non-503 failures bypass retry; network errors; non-PENDING 503 bypasses retry. - apps/daemon/tests/desktop-import-token-gate.test.ts: replace round-4 whitespace test with two round-5 binding tests — the trimmed string flows end-to-end (HMAC verifies, project metadata.baseDir equals realpath of trimmed input), and a request whose body baseDir diverges from the HMAC-bound string is rejected 403. docs/architecture.md §"Desktop folder-import auth" — update the daemon- restart-edge bullet to describe the lazy-retry recovery (round 4 said "restart desktop to recover", which is now wrong) and add a headless- packaged-mode bullet describing the round-5 P2 gate exclusion. Refs: nexu-io/open-design#974 * feat(sidecar-proto,daemon): surface desktopAuthGateActive over STATUS IPC (PR #974 round 6 prep) Round 6 (mrcfps): the split-start dev flow `tools-dev start daemon` -> `tools-dev start desktop` was leaving the daemon ungated because `OD_REQUIRE_DESKTOP_AUTH=1` is only injected when daemon and desktop spawn in the same orchestrator invocation. To fix that, tools-dev needs to introspect the running daemon's gate state before launching desktop main — but the existing STATUS IPC didn't carry the flag. 
This commit extends `DaemonStatusSnapshot` with a required `desktopAuthGateActive: boolean` and wires the daemon sidecar's STATUS handler (and the public `status()` method on the handle) to recompute the value from `isDesktopAuthGateActive()` per request, since the flag flips after `REGISTER_DESKTOP_AUTH` and stays sticky. Extracted `withCurrentDesktopAuthGate(snapshot)` as a tiny pure helper so the wiring is testable without booting a real IPC server. The new test pins four scenarios: - no secret registered (web-only mode) -> false - after `setDesktopAuthSecret(buf)` -> true - after `setDesktopAuthSecret(null)` (sticky) -> still true - input snapshot's stale value is overridden by the live flag The orchestrator-side consumer lands in the next commit (`tools/dev/src/desktop-auth-gate.ts`). Refs: nexu-io/open-design#974 * fix(tools-dev): auto-restart ungated daemon before desktop start (PR #974 round 6 mrcfps) Round 6 (mrcfps): the split-start dev sequence `tools-dev start daemon` -> `tools-dev start desktop` was leaving the daemon running without `OD_REQUIRE_DESKTOP_AUTH=1`. The env var is only injected when (A) daemon and desktop spawn in the same orchestrator invocation (`startApp` line ~682) or (B) a desktop runtime is already alive at daemon spawn time (`startDaemon` lines ~595-596). Neither fires for the split flow, so a renderer (or any local HTTP client) could `POST /api/import/folder` directly with an arbitrary `baseDir` before the desktop's first registration POST. Round-5's lazy retry didn't help: it triggers on `503 DESKTOP_AUTH_PENDING`, and the ungated daemon returns 200. Close the gap by introspecting the running daemon's `desktopAuthGateActive` (added to the STATUS IPC in the prior commit) at the start of `startApp(DESKTOP, ...)`. When the daemon reports the gate inactive, stop the daemon (and web, if running), respawn the daemon with `requireDesktopAuth: true`, restart web, then proceed with the desktop start. 
Restart order is critical and pinned by tests: web stops FIRST (so the web->daemon proxy doesn't serve a transient 502 against the down-then-up daemon), then daemon stops, then daemon respawns gated, then web restarts. The bundled-targets path (`pnpm tools-dev`) is unaffected because trigger (A) already armed the gate at first daemon spawn — the helper costs one ~800ms STATUS IPC roundtrip and returns no-op. Helper lives in its own module (`tools/dev/src/desktop-auth-gate.ts`) so the regression test can import it without triggering the `cli.parse()` side effect at the bottom of `tools/dev/src/index.ts`. Five `node:test` cases pin the call sequence — no daemon, gate active, gate inactive + no web, gate inactive + web running, log shape — so a future refactor can't silently regress the gate. Two synthetic `DaemonStatusSnapshot` literals in `inspectAppStatus` and `inspect` (used when the IPC is unreachable) get `desktopAuthGateActive: false` to satisfy the now-required type field — semantically correct since "no daemon answering" trivially means "no gate active." `docs/architecture.md` adds a new bullet under the Desktop folder- import auth section describing this auto-restart behavior. Refs: nexu-io/open-design#974 * fix(daemon): combine finalize request-abort + timeout signals (PR #974 round 7 lefarcen P1) Round 6 wired the route handler to pass `finalizeAbort.signal` into `finalizeDesignPackage`, but the helper only created its own DEFAULT_TIMEOUT_MS controller when no caller signal was supplied. The result: a client that stayed connected could hold the finalize lock and upstream call indefinitely. Always create the timeout controller; when the caller passes a signal, combine both via `AbortSignal.any` so neither cancel path replaces the other. 
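The signal-combining fix described above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: `withTimeout`, its return shape, and the 120 000 ms default are assumptions (the commit message mentions only a "120 s wait"), while `AbortSignal.any` is the standard API the commit names.

```typescript
// Assumed default; the commit message only mentions a 120 s wait.
const DEFAULT_TIMEOUT_MS = 120_000;

interface CombinedAbort {
  signal: AbortSignal;
  dispose: () => void; // clears the timer once the work has settled
}

// Hypothetical helper: always arm a timeout, and merge the caller's signal in
// rather than letting it replace the timeout.
function withTimeout(
  callerSignal?: AbortSignal,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): CombinedAbort {
  const timeoutController = new AbortController();
  const timer = setTimeout(() => {
    timeoutController.abort(new Error(`timed out after ${timeoutMs} ms`));
  }, timeoutMs);

  const signal = callerSignal
    ? AbortSignal.any([callerSignal, timeoutController.signal])
    : timeoutController.signal;

  return { signal, dispose: () => clearTimeout(timer) };
}

// A pre-aborted caller signal still cancels immediately.
const preAborted = withTimeout(AbortSignal.abort(), 60_000);
console.log(preAborted.signal.aborted); // true
preAborted.dispose();
```

Returning a `dispose` callback is one way to keep the timer from outliving the request; the real helper may handle cleanup differently.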
Adds two regression tests in finalize-design.test.ts: - timeout fires when caller signal never aborts - pre-aborted caller signal still cancels Adds an internal `timeoutMs` option to FinalizeOptions so tests can exercise the abort path without a 120 s wait or fake-timer chains. Production callers omit it; default remains DEFAULT_TIMEOUT_MS. * fix(daemon): allow PATCH preserving existing fromTrustedPicker marker (PR #974 round 7 lefarcen P2) The PATCH /api/projects/:id handler was rejecting any metadata that contained `fromTrustedPicker`, including the unchanged `true` marker that the linked-folder UI re-spreads when editing `linkedDirs`. Trusted folder-imported projects could not update other metadata fields without 400-ing on their own marker. Switch the rejection condition from `'in'` to a value comparison: only reject when the incoming value differs from the persisted one (`patch.metadata.fromTrustedPicker !== existingMeta?.fromTrustedPicker`). That keeps acquisition (existing=undefined, patch true) and flip (existing=true, patch false) attempts blocked while letting the UI re-spread the existing marker. POST /api/projects stays strict; that path has no existingMeta. Adds two regression tests in desktop-import-token-gate.test.ts: - allows PATCH preserving the existing fromTrustedPicker:true marker - rejects PATCH that flips fromTrustedPicker on a trusted project * fix(desktop,packaged): main-process api uses daemon URL not webUrl (PR #974 round 7 lefarcen P2) Packaged builds load the renderer from `od://app/` and report that URL through `discoverWebUrl`. But Node-side `globalThis.fetch` (undici) does not route through Electron's registered `od://` protocol handler — that handler runs in the renderer's protocol scope, not in main-process Node. So `pickAndImportFolder` and `fetchResolvedProjectDir` calls from main silently failed in packaged builds against the protocol scheme. Add `discoverDaemonUrl` to `DesktopRuntimeOptions` and `DesktopMainOptions`. 
The packaged shell already has the sidecar's real `http://127.0.0.1:<port>` URL (`sidecars.daemon.url` from STATUS IPC) — thread it through to the runtime. Main-process API calls now prefer the daemon URL and fall back to the renderer URL for tools-dev (where it is itself http://127.0.0.1). `PickAndImportFolderDeps.webUrl` renamed to `apiBaseUrl` so the boundary is explicit at the type level; `fetchResolvedProjectDir`'s first parameter renamed similarly. tools-dev callers see no behavior change — their web URL is already an http://127.0.0.1 URL Node fetch can hit. Test (`apps/packaged/tests/desktop-pick-and-import.test.ts`): - existing 7 cases updated to the new prop name (no behavior change) - new case pins URL composition: builds `${apiBaseUrl}/api/import/folder` and never produces a custom-protocol URL. Note for review: this test pins URL composition; full Electron protocol handler integration (renderer fetch through `od://`) is not exercised in unit tests here. * fix(tools-dev): preserve daemon/web ports across desktop-auth gate restart (PR #974 round 7 lefarcen P2) Round 6 added the split-start auto-restart in ensureDaemonGateForDesktop to close the dev-flow gap where `start daemon` then `start desktop` left the daemon ungated. The restart was passing the current `start desktop` CLI options to startDaemonGated/startWeb, which meant a stack started with `--daemon-port 17456 --web-port 17573` could be silently moved to random ports during the hardening restart, breaking browsers and scripts pinned to those ports. Extract the running ports from the STATUS snapshots (daemon.url and web.url) and forward them as explicit `{ port }` callback args. The closure in `tools/dev/src/index.ts` overrides the corresponding option when a port was extracted; null falls back to the original CLI flags. 
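The port-preservation step described above might look roughly like this. It is a sketch under stated assumptions: `extractPort` and `resolvePort` are hypothetical names, and the real code reads the URLs from the STATUS snapshots rather than from string literals.

```typescript
// Hypothetical helper: pull an explicit port out of a status URL, or null.
function extractPort(url: string | null | undefined): number | null {
  if (!url) return null;
  try {
    const { port } = new URL(url);
    // URL.port is "" when the URL carries no explicit (non-default) port.
    return port === "" ? null : Number(port);
  } catch {
    return null; // unparseable URL: let the caller fall back to its own options
  }
}

// Hypothetical override rule: a running port wins, null keeps the CLI value.
function resolvePort(runningUrl: string | null, cliPort: number | undefined): number | undefined {
  return extractPort(runningUrl) ?? cliPort;
}

console.log(extractPort("http://127.0.0.1:17456")); // 17456
console.log(extractPort("http://127.0.0.1")); // null
console.log(resolvePort("http://127.0.0.1:17573", undefined)); // 17573
```

Note that the WHATWG URL parser also reports an empty port for a scheme's default port (for example `:80` on `http`), so such URLs would fall back to the caller's options in this sketch.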
Adds three regression tests in tools/dev/tests/desktop-auth-gate.test.ts: - preserves the running daemon port across the hardening restart - preserves the running web port across the hardening restart - falls back to caller options (port:null) when the URL has no port * fix(web): refresh useDesignMdState on file/chat events (PR #974 round 7 mrcfps) useDesignMdState() previously only recomputed on mount and on explicit refresh() (called once after finalize). Once the user kept working — editing files or sending more chat turns — the stale/fresh badge could drift out of sync because file mtimes and conversation updatedAt moved past the recorded generatedAt without the hook re-checking. Hook accepts an optional `refreshKey: number` arg; ProjectView keeps a counter and bumps it on three events: - file-changed SSE (covers tool-emitted file mutations) - live_artifact* SSE (covers chat turns that emit artifacts) - streaming `true → false` edge (covers pure-text chat turns) The hook treats refreshKey as a compute() dep; React's Object.is comparison short-circuits the no-op renders, so each bump is a single recompute pass. Adds a regression test in useDesignMdState.test.tsx: - flips stale state after a refreshKey bump without remounting * fix(web): degraded-state useDesignMdState on malformed provenance (PR #974 round 7 mrcfps) useDesignMdState used to report `{ isStale: false, staleReason: null }` when the parser could not extract a comparison timestamp from the DESIGN.md `## Provenance` section. The pinned test made that the documented behavior. As mrcfps pointed out, that fails open exactly when the freshness signal is most untrustworthy: any provenance- formatting drift silently disables the staleness warning. Extend `DesignMdStaleReason` with a third variant `'unknown-provenance'`. On `generatedMs === null`, return `{ isStale: true, staleReason: 'unknown-provenance' }`. 
ContinueInCliButton renders a distinct chip text "Spec freshness unknown — regenerate to refresh signal" for that variant; the button stays enabled because not-comparable is not the same as broken state. Tests: - modify the existing pinned test to assert the new degraded state - add an end-to-end useDesignMdState test feeding a malformed Provenance section through compute() so a regression that re-pins fresh-on-null at the hook level (not just computeStale) fails fast - add ContinueInCliButton render + click tests for the new chip --------- Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-10 11:44:32 +08:00
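The fail-closed staleness rule from the last commit in this series can be illustrated with a small sketch. Only `'unknown-provenance'` and the `{ isStale, staleReason }` shape come from the commit message; the other two variant names, the parameter list, and the comparison order are assumptions made for illustration.

```typescript
// 'unknown-provenance' is from the commit message; the other two variant
// names are hypothetical placeholders.
type DesignMdStaleReason = "files-changed" | "conversation-updated" | "unknown-provenance";

interface DesignMdStaleState {
  isStale: boolean;
  staleReason: DesignMdStaleReason | null;
}

// Hypothetical signature: timestamps in epoch milliseconds.
function computeStale(
  generatedMs: number | null,
  latestFileMtimeMs: number,
  conversationUpdatedMs: number,
): DesignMdStaleState {
  // No comparable timestamp: report stale instead of silently reporting fresh.
  if (generatedMs === null) {
    return { isStale: true, staleReason: "unknown-provenance" };
  }
  if (latestFileMtimeMs > generatedMs) {
    return { isStale: true, staleReason: "files-changed" };
  }
  if (conversationUpdatedMs > generatedMs) {
    return { isStale: true, staleReason: "conversation-updated" };
  }
  return { isStale: false, staleReason: null };
}

console.log(computeStale(null, 0, 0).staleReason); // "unknown-provenance"
console.log(computeStale(2000, 1000, 1500).isStale); // false
```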
Cursor Agent	85beeb58c9	feat(web): native diff-review UI on GenUISurfaceRenderer (Phase 8 entry slice) Plan Q1 / spec §21.5. apps/web/src/components/GenUISurfaceRenderer.tsx now ships first- class branches for the auto-derived diff-review choice surface and generic single-enum-property choice surfaces. Diff-review (DiffReviewChoiceSurface): - Three top-level buttons: 'Accept all' / 'Reject all' / 'Partial…'. - Optional 'Skip' when the host supplies onSkip. - Optional notes textarea — forwarded as decision.reason when non-empty. - On 'Accept all': submits { decision: 'accept', accepted_files, rejected_files: [] } using the touched file list from pending.context.touchedFiles. Daemon side default-fills when the list is empty. - On 'Reject all': symmetric. - On 'Partial…': reveals a per-file accept/reject toggle for each touched file. Submit refuses locally when ANY file is left undecided (mirrors the daemon's 'partial must cover every touched file' contract from §3.O5 so the user doesn't ping the server with an obviously-invalid payload). - Disabled when context.touchedFiles is empty (the daemon's default-fill path doesn't help with a partial decision). Generic choice (GenericChoiceSurface): - Detects schemas of shape `{ properties: { <key>: { enum: [...] } } }` and renders one button per enum value. Property literally named 'decision' wins over other enum properties when several are declared (so plugin-author-customised diff-review schemas keep rendering as accept/reject/partial buttons even if they add extra fields). PendingSurface gains an optional `context: { touchedFiles?: [] }` field. Future runtime-context entries plug in here without bloating the GenUISurfaceSpec contract. Web tests: 586 → 593 (+7 cases on GenUISurfaceRenderer.diff-review: accept-all default-fill, reject- all default-fill, partial union, partial blocks on undecided file, partial disabled when context absent, optional reason forwarding, generic single-enum choice button group). 
Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 15:36:38 +00:00
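The generic-choice detection rule in the commit above (one button per enum value, with a property named 'decision' taking priority) could be sketched like this. The type and function names are hypothetical, and falling back to the first declared enum property when no 'decision' property exists is an assumption.

```typescript
// Minimal schema shape for illustration; real surface schemas carry more fields.
type JsonSchemaLike = {
  properties?: Record<string, { enum?: unknown[] } | undefined>;
};

interface ChoiceProperty {
  key: string;
  options: string[]; // one button per option
}

// Hypothetical helper: find the enum property a choice surface should render.
function pickChoiceProperty(schema: JsonSchemaLike): ChoiceProperty | null {
  const enumProps: ChoiceProperty[] = [];
  for (const [key, prop] of Object.entries(schema.properties ?? {})) {
    if (prop && Array.isArray(prop.enum) && prop.enum.length > 0) {
      enumProps.push({ key, options: prop.enum.map(String) });
    }
  }
  // 'decision' wins over other enum properties; otherwise take the first one.
  return enumProps.find((p) => p.key === "decision") ?? enumProps[0] ?? null;
}

const picked = pickChoiceProperty({
  properties: {
    priority: { enum: ["low", "high"] },
    decision: { enum: ["accept", "reject", "partial"] },
  },
});
console.log(picked?.key); // "decision"
```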
Demoniooo	617fb043fe	feat(settings): add fetch models button for BYOK providers (#1034)
* feat(settings): add fetch models button for BYOK providers
* fix(settings): exclude Ollama from fetch models, add manual-entry hint
* fix(provider-models): classify non-JSON upstream errors by HTTP status
* fix(i18n): drop redundant English overrides from non-English locales
* fix(provider-models): allow ollama through allowlist, return unsupported_protocol
---------
Co-authored-by: haolin122 <hl6593@nyu.edu>	2026-05-09 22:28:03 +08:00
mec dot	039cc3d674	fix(settings): add install onboarding links for unavailable local CLIs (#985)
* fix(settings): add install onboarding links for unavailable local CLIs
* fix(settings): rename Claude config dir label to config directory
---------
Co-authored-by: mrcfps <mrc@powerformer.com>	2026-05-09 21:49:44 +08:00
Sid	5db578123e	fix(web): dispatch Examples preview on od.preview.type (#897 ) (#1001 ) * fix(web): dispatch Examples preview on od.preview.type (#897) The Examples gallery unconditionally fetched `/api/skills/:id/example`, and the daemon endpoint only resolves HTML files (`example.html`, `assets/template.html`, `assets/index.html`, `examples/.html`). Skills that declare `od.preview.type: image` (`hatch-pet`) or `od.preview.type: markdown` (`dcf-valuation`, `last30days`, `x-research`) ship no such HTML — the fetch returns 404 and the modal landed on the misleading "Couldn't load this example. The example HTML failed to fetch." copy. Dispatch on `previewType` at the data layer (`fetchSkillExample`) and at the render layer (`PreviewModal`): - `fetchSkillExample(id, previewType)` short-circuits any non-`html` value to `{ unavailable: true, kind }` without firing a network call. - `PreviewView` grows an optional `unavailable: { kind }` shape; the modal renders a calm "no shipped preview" placeholder distinct from loading and error states. The Share menu disables (no HTML to export). - `ExamplesTab` tracks `previewUnavailable` per skill alongside the existing `previews` / `previewErrors` maps; the card placeholder swaps to "open to learn more" copy so users don't hover waiting for a render that won't come. - New `preview.unavailableTitle` / `preview.unavailableBody` and `examples.unavailablePlaceholder` / `examples.shareUnavailable` keys shipped across all 17 locales. Body copy uses the raw preview kind (`{kind}` placeholder) so future kinds slot in without a copy change. Tests: registry-level coverage that the dispatch never hits the network for non-html types; PreviewModal-level coverage that the unavailable affordance is mutually exclusive with loading/error and disables the Share menu; ExamplesTab-level coverage that the gallery renders the unavailable state for image/markdown skills and routes html skills through the existing fetch path. 
Updated the existing `#860` retry regression test for the new two-arg signature. fix(web): use neutral noun for preview.unavailableBody copy (#1001 review) P3 from lefarcen: `'a {kind} document'` reads awkwardly when `{kind}` is `image` ("a image document") and the article disagreement undermined the PR body's claim that future kinds slot in without copy changes. Drop the article and replace `document` with a more neutral noun (`output` / `resultat` / `产物` / `出力` / etc.) so every kind reads naturally: - `produces {kind} output` (English) - `produit un résultat {kind}` (French) - `生成 {kind} 产物` (Simplified Chinese) - … and 14 more `{kind}` placeholder stays literal in every locale; surrounding vocabulary for skill / prompt / chat preserved per existing file conventions.	2026-05-09 21:32:45 +08:00
CIoudherd	724d071c01	feat: add design file rename support (#894)
* feat(contracts): add project file rename contract
* feat(daemon): add safe project file rename API
* feat(web): support renaming design files
* fix(daemon): handle case-only file renames
* fix(web): prevent rename collisions with pending sketches
* fix(daemon): preserve source names during rename
* test(daemon): cover rename symlink escapes
* fix(daemon): avoid clobbering rename targets
* test(web): align rename tests after rebase
* test(web): align rename tests with latest main	2026-05-09 21:24:36 +08:00
meshackm	461a312002	fix: fix link handling in example preview iframe sandbox (#701)
* fix: fix link handling in example preview iframe sandbox
* fix: added missing single quote
* fix(srcdoc): security/correctness issues
  - Use location.hash = href for hash-link navigation so hashchange events, CSS :target, history/back and named anchors work correctly
  - Always pass 'noopener,noreferrer' to window.open() for _blank links
  - Guard e.target with instanceof Element before calling .closest() to avoid TypeError on text node click targets
* fix(srcdoc): security/correctness issues
  - Added a protocol allowlist
  - Re-navigation to same hash
  - Consistency with rest of shim IIFE using var instead of let and const
* fix(srcdoc): scroll to top on bare hash links (#)
  Intercept href="#" clicks and scroll to top of page in the iframe sandbox to mimic native browser behaviour.
* test(web): update PreviewModal sandbox test assertions to include popup flags	2026-05-09 21:24:25 +08:00
Cursor Agent	411d83b0bf	feat(web): MarketplaceView + PluginDetailView + /marketplace routes Plan G4 / spec §11.6. router.ts gains two new Route variants — 'marketplace' and 'marketplace-detail' — plus parsing for both /marketplace/<id> and the /plugins/<id> alias the public site (§13) reserves. App.tsx dispatches them outside the EntryView / ProjectView split so the discovery surface stays independent of any active project. New components: - MarketplaceView (apps/web/src/components/MarketplaceView.tsx) - Card grid of every installed plugin with trust-tier filters (All / Trusted / Restricted). - Secondary 'Configured catalogs' panel listing every row in /api/marketplaces with id / url / trust / plugin count. - Cards link to /marketplace/<id>. - PluginDetailView (apps/web/src/components/PluginDetailView.tsx) - Loads /api/plugins/:id, renders header (title, version, trust, sourceKind, taskKind), description, capability checklist, connector requirements (required + optional), and declared GenUI surfaces. - 'Use this plugin' button calls applyPlugin(id) and navigates back to Home so the existing inline rail / NewProjectPanel surface picks up the snapshot. Web tests: 579 → 586 (added router-marketplace 5 cases + MarketplaceView 2 cases). Typecheck clean. Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 12:28:59 +00:00
Cursor Agent	daceadefbd	feat(web): mount PluginsSection in NewProjectPanel Plan §3.F5 / spec §8 / §11.6. New apps/web/src/components/PluginsSection.tsx bundles InlinePluginsRail, ContextChipStrip, and PluginInputsForm into one host-agnostic widget. The host wires three optional callbacks: - onApplied(brief, applied) fires on apply + every input change - onCleared() fires when the user removes a context chip (clears the active plugin) - onValidityChange(valid) mirrors the inputs-form validity gate renderPluginBriefTemplate substitutes {{var}} placeholders inside useCase.query against the live inputs map so the brief output stays in sync as the user types. NewProjectPanel mounts PluginsSection right under the project-name input. The section is purely additive: clicking a plugin card hydrates the project name field (only when empty) with the rendered brief, so the existing Send-button rules are unchanged. The deeper composer gating + ChatComposer mount stay scheduled for the Phase 2B PR. Web test suite: 575 → 579 (added 4 cases on PluginsSection: empty state, apply hydration, input change re-emission, chip-remove clearing). Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 12:12:34 +00:00
Cursor Agent	adc2afd769	feat(web): plugin composer surface — applyPlugin + Rail + Inputs + GenUI renderer Plan §3.C1–§3.C4. Web composer integration for the plugin system: - apps/web/src/state/projects.ts gains: * applyPlugin(pluginId, { inputs?, projectId?, grantCaps? }) — wraps POST /api/plugins/:id/apply and returns the typed ApplyResult. * listPlugins() — wraps GET /api/plugins. * renderPluginBriefTemplate(template, inputs) — substitutes {{var}} placeholders inside useCase.query as the user types so the composer's brief textarea re-renders live. - New components: * InlinePluginsRail — the card strip that lives below the input box on Home and inside ChatComposer. Supports 'wide' / 'strip' layouts + taskKind / mode filters. * ContextChipStrip — typed ContextItem chips above the brief input. Optional onRemove for clearing the applied plugin. * PluginInputsForm — JSON-Schema-light form rendered between the input and Send. Required fields gate Send via onValidityChange; string/text/select/number/boolean field types are supported. * GenUISurfaceRenderer — first-class confirmation + oauth-prompt surfaces (form + choice fall back to a JSON Schema preview + free-form textarea until Phase 2A.5). * GenUIInbox — drawer that lists every persisted surface answer for a project; revoke calls POST /api/projects/:id/genui/:sid/revoke. - jsdom tests under apps/web/tests/components/: * InlinePluginsRail (mount fetch, click → applyPlugin → onApplied, taskKind filter) * PluginInputsForm (validity gating, default hydration, select options) * GenUISurfaceRenderer (confirmation true/false branches; oauth surface forwards { authorized, connectorId } per spec §10.3.1) Web test suite: 567 → 575 (added 8 plugin component cases). The NewProjectPanel / ChatComposer / ProjectView mounts will land in the follow-up commit so this PR's diff stays reviewable. Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 11:47:12 +00:00
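The `{{var}}` substitution that `renderPluginBriefTemplate` is described as performing can be sketched as below. This is not the function from `apps/web/src/state/projects.ts`: the name `renderBriefTemplateSketch`, the tolerance for whitespace inside the braces, and the choice to leave unresolved placeholders untouched are all assumptions.

```typescript
// Hypothetical stand-in for the brief-template renderer described above.
function renderBriefTemplateSketch(
  template: string,
  inputs: Record<string, unknown>,
): string {
  return template.replace(
    /\{\{\s*([A-Za-z0-9_.-]+)\s*\}\}/g,
    (placeholder: string, name: string) => {
      const value = inputs[name];
      // Assumption: keep the placeholder visible until the user supplies a value.
      if (value === undefined || value === null || value === "") return placeholder;
      return String(value);
    },
  );
}

const brief = renderBriefTemplateSketch(
  "Design a {{kind}} for {{audience}}",
  { kind: "landing page" },
);
console.log(brief); // "Design a landing page for {{audience}}"
```

Re-running the function on every input change, as the commit describes for the composer, keeps the rendered brief in sync without mutating the stored template.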
zztdan	fe879036fb	fix(web): restore media config from daemon on startup (#687)
* fix(web): restore media config from daemon on startup
* fix(media): preserve stored keys on settings save
* fix(web): harden daemon media restore flow
* fix(web): unify media provider empty-state rules
* fix(desktop): retry loading discovered web url
* fix(web): preserve local media providers on partial daemon reload
* fix(web): preserve media providers on daemon reload
* fix(web): skip media migration for masked-only local state
* fix(web): preserve daemon media state across reloads	2026-05-09 19:31:08 +08:00
Paul Stean	f3535cdd9f	feat(web): scroll question forms to top of viewport instead of pinning to bottom (#1044)
* feat(web): scroll question forms to top of viewport instead of pinning to bottom
* fix: reset scroll state on composer send after form has claimed control
* fix: use getBoundingClientRect for form scroll position; guard early-return on streaming
* fix: smooth scroll for all chat-scroll operations; polyfill scrollTo for jsdom tests
* fix: revert streaming bottom-pin to instant to avoid scroll event thrash
* fix: revert initial-load bottom-pin to instant to avoid scroll event thrash	2026-05-09 19:29:54 +08:00
Marc Chan	223d35f073	fix: improve Orbit and packaged data-dir startup errors (#1067)	2026-05-09 16:47:01 +08:00
Eli	1bf7836471	feat(web): redesign top bar — lift Share/Present, zoom dropdown, focus toggle (#1048 ) * feat(web): redesign top bar — lift Share/Present, add zoom dropdown, move focus toggle - AppChromeHeader: add #app-chrome-file-actions portal anchor so file viewers can render their primary actions (Present/Share) up in the project chrome instead of cramming a second toolbar row. - HtmlFileViewer / LiveArtifactViewer: portal Present + Share into the top bar via createPortal; Share gets a real chrome-action-primary button. - HtmlFileViewer: replace the 100% reset button with a zoom dropdown (50/75/100/125/150/200) with click-outside + Esc handling. - HtmlFileViewer: move Preview/Source tabs next to Reload (left side, view modes); move Tweaks to the right cluster next to Inspect/Edit. - HtmlFileViewer: showPresent no longer requires deck — any HTML artifact with loaded source can be presented (prototype/slide/regular HTML). - LiveArtifactViewer: add Present (in-tab/fullscreen/new-tab) with iframe ref + previewBodyRef wrapper; in-tab present hides chrome and overlays an exit button (Esc also exits). - ChatPane: add chevron-left collapse icon in chat header (onCollapse prop) so users can hide the chat from where it lives. - FileWorkspace: focus toggle is now icon-only and only renders when chat is collapsed, sitting on the LEFT of the workspace tabs row as a chevron-right expand button — direction matches where chat re-emerges from. - index.css: add chrome-action-primary/secondary, zoom-menu, present-exit-btn, app-chrome-file-actions styling, plus a narrow-width media query that collapses secondary action labels. * fix(web): tests — fall back to inline render when chrome portal anchor missing The Share/Present primary actions render via createPortal into #app-chrome-file-actions, which only exists when AppChromeHeader has mounted. 
Vitest renders FileViewer / LiveArtifactViewer in isolation, so the portal anchor was absent and the buttons disappeared from the test DOM, breaking 7 share-menu tests. - HtmlFileViewer / LiveArtifactViewer: when chromeActionsHost is null, render the present/share JSX inline instead of returning null. UX is identical in production (host is always present); tests now find the buttons without needing a portal-aware harness. - FileWorkspace tests: rewrite the "focus toggle in tab bar" assertions to reflect the new design — the toggle lives in ChatPane while the chat is open, and FileWorkspace only renders an icon-only expand button on the LEFT of the tab bar once chat is collapsed.	2026-05-09 15:26:22 +08:00
Tatsuyato	f5564c93a7	i18n: add full Thai translation (th) (#1018)
* i18n: add full Thai translation (th-TH)
* i18n: fix placeholders, update tests and complete documentation for Thai (th)
* i18n: fix placeholders, update tests and complete documentation for Thai locale
* chore: revert unrelated docker deployment changes (fix scope drift)
---------
Co-authored-by: ryu <ryu@example.com>	2026-05-09 15:19:47 +08:00
Caprika	b020f1e39a	fix opencode todowrite footer state (#1046)	2026-05-09 15:08:19 +08:00
lefarcen	6f74ac304d	fix(web): expand design file row click target (#1039)	2026-05-09 14:46:09 +08:00
lefarcen	286b6cdf8d	fix(web): make privacy consent choices explicit (#1031)	2026-05-09 14:37:43 +08:00
Caprika	8feb2e586c	fix(connectors): preserve OAuth state and advertised tool counts (#1036)
* fix(connectors): preserve oauth state and advertised counts
* test(connectors): type fixture for advertised count
* docs(connectors): align tool count badge contract
* docs(connectors): clarify curated tool names role	2026-05-09 13:54:35 +08:00
Herédi Áron	66f84972cf	feat(byok): Added Ollama Cloud as BYOK provider (#923)
* feat: add Ollama Cloud to KNOWN_PROVIDERS as OpenAI-compatible BYOK provider
* feat: add ollama.com to isOpenAICompatible base URL detection
* feat: add Ollama Cloud models to SUGGESTED_MODELS_BY_PROTOCOL fallback list
* fix: use full Ollama Cloud model list from /api/tags, drop -cloud suffix
* feat: add Ollama Cloud as native protocol with NDJSON streaming and connection test support
* fix: remove ollama.com from OpenAI compatibility check
* feat: add token overrides for Ollama Cloud models to prevent truncation
* fix: extend inferApiProtocol and legacy migration to recognize ollama.com base URLs
* fix: normalize Ollama Cloud base URL by stripping /api suffix during migration and in daemon
---------
Co-authored-by: herediaron <aronheredi346@gmail.com>	2026-05-09 11:21:16 +08:00
Sid	78ae6feb59	fix(web): surface empty-annotation state for Inspect/Picker (#890 ) (#1005 ) When the agent emits an HTML artifact with no `data-od-id` / `data-screen-label` annotations (a freeform PRD → HTML pass through a Claude-Code-compatible CLI without going through a skill, for example), the existing Inspect / Picker affordances no-oped silently: - The bridge's click handler walks up to <html>, finds nothing tagged, and bails before emitting `od:comment-target` — by design, since posting a synthetic id here would change save-to-source semantics for inspect overrides (the persisted CSS keys off the same elementId). - The host then sat at "Click any element with `data-od-id` to tune its style" — phrased as if the user just hadn't found the right element, when the page in fact had nothing matching at all. - Picker mode (Tweaks → Picker) had no hint at all. The bridge already broadcasts `od:comment-targets` with the full list on every mode toggle and DOM mutation, but the host's existing listener was gated on `boardMode` only — Inspect mode never learned the artifact's annotation count. Two surgical fixes: 1. `FileViewer.tsx`: a dedicated `od:comment-targets` listener that installs whenever Inspect OR Comments mode is active, mirroring the bridge's broadcast into `liveCommentTargets`. The comment-mode-only listener still owns its hover / click / pod events; this new listener only handles the targets list. 2. `FileViewer.tsx`: the inspect-empty-hint banner now dispatches on `liveCommentTargets.size === 0`. Empty: a clear "this artifact has no `data-od-id` annotations yet — ask the agent to add them" message that names the missing attribute. Populated: existing instructive copy. Mirrored across Inspect and Picker modes so the failure surface gives the same calibration signal in both. Tests: - `tests/runtime/srcdoc-bridge-empty-targets.test.ts` (3 cases): pin the bridge contract this fix depends on. 
Run the IIFE in jsdom and assert (a) `allTargets()` posts an empty list for unannotated DOM, (b) clicks on unannotated elements do NOT post `od:comment-target` (regression pin against future "synthetic id" fallbacks that would silently change save-to-source semantics), (c) clicks DO still resolve to an annotated ancestor when one exists. - `tests/components/FileViewer.inspect-empty-hint.test.tsx` (3 cases): pin the host dispatch — empty state in Inspect mode, the switch back to instructive copy when targets show up, and the mirrored affordance in Picker mode. Out of scope (flagged in the design comment so it isn't lost): - The follow-up scenario from #890 ("parent has data-od-id, target child does not → adjustments hit the parent") is a different bug that needs either synthetic-id fallback or a UI affordance to descend into the click target. Leaving that to a follow-up so this PR stays narrow. - i18n: the existing inspect-empty-hint copy is hardcoded English; rolling it into the 17-locale Dict is a separate cleanup.	2026-05-09 11:20:13 +08:00
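The bridge contract and host dispatch that commit 78ae6feb59 pins can be sketched like this. It is an illustration under assumptions: `NodeLike` stands in for a DOM element so the code runs without a browser, and `findAnnotatedTarget` and `inspectHint` are hypothetical names rather than the functions in `FileViewer.tsx` or the bridge.

```typescript
// Minimal stand-in for a DOM element (assumption: only attributes and parent matter here).
interface NodeLike {
  attrs: Record<string, string | undefined>;
  parent: NodeLike | null;
}

// Walk from the clicked node up to the root and return the first annotation found.
// Returns null for unannotated trees: no synthetic id is ever invented.
function findAnnotatedTarget(clicked: NodeLike): string | null {
  for (let node: NodeLike | null = clicked; node !== null; node = node.parent) {
    const id = node.attrs["data-od-id"] ?? node.attrs["data-screen-label"];
    if (id !== undefined) return id;
  }
  return null;
}

// Choose the banner copy from the live target count, as the commit describes.
function inspectHint(liveCommentTargets: Set<string>): "empty" | "instructive" {
  return liveCommentTargets.size === 0 ? "empty" : "instructive";
}
```

Returning null instead of a fallback id keeps the save-to-source semantics unchanged, which is the regression the bridge tests guard against.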
初晨	9ef136ced5	fix: sync Orbit last run with selected prompt template (#937 ) * fix(orbit): scope last run to selected template * fix(orbit): preserve legacy last run on upgrade * fix(orbit): pin legacy last-run fallback on refresh * fix(orbit): pin template id at run start * test(web): sync orbit fixtures with skill summary	2026-05-09 11:19:59 +08:00
lefarcen	afb331a288	feat: add opt-in Langfuse telemetry (#800 ) * docs(specs): add langfuse telemetry change spec Captures the design for forwarding completed agent runs to Langfuse, including data-model mapping, field-budget caps, privacy gates, build-secret injection, GDPR right-to-deletion approach, and the resolved decisions on default consent, identifier shape, region, and ownership. * feat(daemon): add langfuse-trace module and telemetry prefs Adds the dependency-free building blocks for forwarding completed agent runs to Langfuse. Two layers: - AppConfigPrefs gains installationId and a TelemetryPrefs object with metrics / content / artifactManifest gates. The daemon validator treats telemetry like agentModels — replace-on-write, drop-when-empty, reject non-boolean inner values. - New langfuse-trace.ts builds a {trace-create, generation-create} pair from a ReportContext, capping prompt at 8 KB, output at 16 KB, artifacts at 50 entries, and dropping any batch larger than 1 MB before send. reportRunCompleted is no-op when LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are unset (so dev runs and forks never emit) and short-circuits on prefs.metrics === false. Server-side wiring into the run-close path lands in a follow-up. * fix(langfuse): default to US Langfuse region End-to-end smoke against the project's actual dev key on 2026-05-07 returned 401 from cloud.langfuse.com (EU) and 207 from us.cloud.langfuse.com (US), confirming the org lives in US. Update the default base URL, the matching test, and the spec's Q3 decision row to match. Self-hosted or EU-region operators can still override via the LANGFUSE_BASE_URL env var. * feat(daemon): wire langfuse trace forwarding into run-close Adds the daemon-side glue to forward completed agent runs: - runs.ts gains an optional onTerminate hook fired once per run after it reaches a terminal state. Errors thrown from the hook are caught and logged, never propagated, so telemetry can never break the run path. 
- New langfuse-bridge.ts assembles a ReportContext from the in-memory run record, the conversation's persisted assistant message, and the user's app-config preferences. It tolerates a missing message (e.g. when web has not yet PUT the final delta) and a missing app-config. - server.ts stashes the original user prompt on the run object inside startChatRun so the bridge can include it without crossing the createChatRunService boundary, and registers the hook callback when building the run service. Behavior remains a no-op unless LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are set in the daemon env AND telemetry.metrics is true in app-config. A live smoke against us.cloud.langfuse.com on 2026-05-07 confirmed the matching trace + generation schema is accepted (HTTP 207, both events 201 created). * fix(langfuse): address PR #800 review feedback P1 — Move trace forwarding off the daemon-internal run-close hook and onto the message-persistence path. The original onTerminate hook ran inside finish() the moment the SSE 'end' event was emitted, which is before the web client's onDone handler refreshes project files and PUTs producedFiles + final assistant content back to SQLite. Reading SQLite at that moment routinely missed both. The fix: drop the runs.ts hook entirely and trigger from PUT /api/projects/:id/conversations/:cid/ messages/:mid when the saved row carries a terminal runStatus. A reportedRuns Set guards against the multiple PUT calls web makes per turn (each retry / state update). Set entries auto-evict after the same 30 min TTL the runs map uses. Web persists a terminal-status message in all three completion paths — onDone (succeeded), onError (failed), and cancel (canceled) — so this catches every run shape. P2 — postLangfuseBatch now parses the 207 Multi-Status response body. Langfuse legacy ingestion always returns 207, and response.ok is true for 207, so per-event validation errors used to slip through silently. We now warn when body.errors is non-empty. 
Two new unit tests. P2 — truncate() and the HARD_BATCH cap now compare UTF-8 byte length, not String.length (which counts UTF-16 code units). A 4096-character CJK prompt occupies 12 KB, well over the 8 KB input cap. truncate also walks backwards to a UTF-8 leading byte so the cut never lands inside a multi-byte codepoint. New unit test covers '设'.repeat(4096). P2 — Spec R7 now lists the actual Langfuse trace deletion endpoint (DELETE /api/public/traces/{traceId} for single, DELETE /api/public/traces with body for batch). Verified by curl on us.cloud.langfuse.com: DELETE /api/public/traces/X → 200; the path the original spec named (POST /api/public/trace/X) returns 404. Reference link points at langfuse.com/docs/administration/data-deletion. P3 — Q4 (legacy ingestion vs OTel) moved from Open Questions to Resolved Decisions. The implementation already commits to legacy and the trade-off was discussed during design; the open-question status was stale. * feat(web): privacy consent surface + Settings → Privacy tab Adds the user-facing half of the telemetry feature so the daemon-side hook from PR #800 has something to talk to. - AppConfig gains optional `installationId` (anonymous v4 uuid generated on first opt-in; null after explicit decline; undefined when the user has never seen the consent surface) and `telemetry: TelemetryConfig` ({metrics, content, artifactManifest}). syncConfigToDaemon round-trips both fields so the bridge module sees the same prefs. - SettingsDialog grows a Privacy section with two states. When the user has never made a consent decision (typical first-run path), the section renders the GDPR-aligned consent card: a kicker, the disclosure body listing both metrics and conversation content as separate bullets, and two equally-prominent buttons ("Share usage data" / "Don't share"). The Don't-share path keeps the app fully usable (core app must work with all tracking declined). 
After a decision the same panel switches to three independent toggles + the anonymous ID + a "Delete my data" button that rotates the ID and turns everything off. - App.tsx points the welcome modal at the new Privacy section so the consent decision is the first thing a fresh installation sees. - 17 i18n keys land in en + zh-CN + zh-TW with hand-translated copy, and as English placeholders in the remaining 14 locales — enough for the parity check to pass while leaving room for proper localisation in a follow-up. Dict type updated. - Minimal index.css for the consent card + toggle rows so the panel is legible without depending on follow-up design polish. Telemetry remains a no-op end-to-end until the user clicks Share usage data: the daemon gate (prefs.metrics === true) keeps every code path short-circuited otherwise. * refactor(web): rebuild Privacy panel using project-native settings primitives The first cut used custom .settings-privacy-* classes + raw HTML checkboxes that didn't match any other Settings tab. Replace with the shell other sections already use: - settings-subsection containers with section-head + h4 + .hint - seg-control / seg-btn pill toggles ("active" / "offline") for each of the three telemetry preferences, mirroring NotificationsSection - a 2-cell seg-control for the consent card so Share usage data and Don't share carry identical visual weight (the GDPR equal-prominence requirement that the previous accent / outline split missed) - ghost button + readonly text input for the installation id row, mirroring the API-key field pattern elsewhere Drop the bespoke CSS block in favor of inheriting the existing settings-section / seg-control / ghost styling. The only privacy- specific style left is a tight definition list inside the consent card for the metrics + content disclosure rows. 
* refactor(web): use .toggle-row iOS switch for Privacy preferences Active/offline pills (the seg-control single-cell pattern that NotificationsSection uses) read awkwardly for a flat preference list. Switch the three telemetry toggles to .toggle-row — the same control NewProjectPanel uses for "speaker notes" / "animations": label + hint on the left, iOS-style sliding switch on the right, full-row click target. The consent card's two-button seg-control stays as-is — there the equal-weight pill pair is exactly what GDPR equal-prominence wants. * feat(web): standalone first-run privacy consent banner Replaces the Settings-dialog-as-onboarding hack with a dedicated bottom-right banner card that mounts whenever the user has never made a privacy decision (cfg.installationId === undefined). The banner is prominent (anchored to the corner with a soft shadow) but non-blocking, mirrors cookie-consent UX, and shares the project's panel styling — same .modal-elevated background, --radius-lg corners, --shadow-lg lift. Wiring: - App.tsx imports PrivacyConsentModal and renders it at the root, gated on installationId === undefined && !settingsOpen so it doesn't double up with the Privacy tab's own consent card when Settings is already showing. - Share / Don't share both go through handleConfigPersist, so the resulting installationId + telemetry prefs land in localStorage and the daemon at the same time, reusing the existing autosave plumbing. - The previous attempt that pinned the welcome SettingsDialog to the Privacy section is reverted; onboarding now stays focused on agent configuration, and the consent decision lives in its own surface. * fix(web): keep privacy banner visible while Settings welcome modal is open The banner gated itself on `!settingsOpen` to avoid double-rendering with the Privacy tab's consent card. 
But the first-run path opens the Settings welcome modal automatically when `onboardingCompleted=false`, which fired immediately after bootstrap — so the banner flashed for a moment and then vanished behind the modal backdrop. Drop the `!settingsOpen` clause so the banner stays mounted whenever the user has not yet made a privacy decision, and bump its z-index above the modal backdrop (200 vs 100) so first-run users can actually reach the consent buttons. The minor visual overlap with the Privacy tab's own card is fine: clicking either copy resolves both surfaces. * copy(privacy): soften consent button labels Banner action buttons now read "Help improve Open Design" / "Not now" (en, with hand translations in zh-CN / zh-TW and English placeholders in the other 13 locales) instead of "Share usage data" / "Don't share". The new wording aligns the affirmative action with the kicker copy ("Help us improve Open Design") and reads less alarming, while the disclosure list above still names both data categories explicitly so the consent stays informed under GDPR. The decline button stays as a soft "Not now" rather than an aggressive "Don't share" so the reject path doesn't read as hostile to the user. No structural change — the two-cell seg-control still gives the buttons identical visual weight, and the underlying side-effects are unchanged (installationId is generated on Help / nulled on Not now, and the telemetry prefs flip the same way). * feat(telemetry): expand trace fields for evals & dataset construction Each Langfuse trace now ships the full per-turn + per-install fact sheet that the eval/dataset workflow needs, instead of only the bare turn id + token count from before. Everything below is gated by `prefs.metrics === true`; nothing here is content (those gates remain separate). Per-turn: - model — first-class generation.model field, drives Langfuse cost lookup and model-grouping in the UI; also mirrored in trace.metadata and trace.tags so list-view filters work. 
- reasoning — generation.modelParameters.{ reasoning } so the Model Parameters card lights up; mirrored in metadata. - skillId / designSystemId — metadata + tags, so dataset slices can group by which skill/DS produced which output. Per-process / build (constant within one daemon run, cached at start): - appVersion / appChannel / packaged from app-version.ts - nodeVersion (process.version), os (platform()), osRelease, arch (os.arch()) - clientType — desktop vs web, derived from a new X-OD-Client header the web layer sets in providers/daemon.ts (with a User-Agent sniff fallback for third-party callers). Plumbing: - startChatRun stashes model / reasoning / skillId / designSystemId on the run object alongside the existing userPrompt stash. - POST /api/runs reads X-OD-Client and stores run.clientType. - langfuse-bridge collects RuntimeInfo once per process and merges per-run client carrier; ReportContext gains optional `turn` + `runtime` blocks; existing fields stay backward compatible. Spec gains a "Telemetry Fields Catalog" section enumerating every field, its source, and the gate it lives under, so the eval team has a single place to look up what's available without reading the trace schema by example. Tests: - new langfuse-trace tests cover turn tags, runtime tags, generation model/modelParameters promotion, modelParameters omission when reasoning is unset, and metadata mirroring. - langfuse-bridge gains an end-to-end "turn-level config" test that threads model/reasoning/skill/DS/clientType + appVersion through the bridge and asserts the Langfuse payload shape. - existing tests adjusted to tolerate host-dependent os tag. * copy(privacy): trim Share button to verb phrase only "Help improve Open Design" overflowed the equal-width 2-cell seg-control on the consent banner — the product name is already in the kicker + headline above the buttons, so the button itself only needs the verb phrase. 
Drop the product name from all locales: - en: Help improve Open Design → Help improve - zh-CN: 帮助改进 Open Design → 帮助改进 - zh-TW: 協助改進 Open Design → 協助改進 The decline button ("Not now" / "暂不" / "暫不") was already short, so the two buttons now have comparable length and the equal-prominence seg-control fits cleanly. Standalone Settings → Privacy panel uses the same labels for consistency. * fix(web): defer Settings welcome modal until privacy decision is made Previously bootstrap raced two surfaces against each other on first launch: the privacy consent banner (gated on installationId === undefined) and the Settings welcome modal (gated on onboardingCompleted === false). The banner's higher z-index kept it above the backdrop visually, but having two foreground surfaces at once is still confusing UX. Sequence them instead: bootstrap only opens the welcome modal when the user has already resolved consent (installationId !== undefined). Until then the banner owns the foreground alone. Once the user clicks Help improve / Not now, the corresponding handler hands off to the welcome modal if onboarding is still pending. End state matches what it was before — just without the simultaneous-render flash. * debug(privacy): log banner gate state to track sudden disappearance Two console.log points to find which setCfg call (or stale bundle) is flipping cfg.installationId from undefined to a value while the banner is visible. To remove once the regression is reproduced. * fix(privacy): keep installationId + telemetry out of localStorage Daemon is now the single source of truth for the privacy decision. 
Why this matters: the consent banner gates on `config.installationId === undefined`, but loadConfig() merges localStorage on top of the daemon's reply, so a stale uuid in `open-design:config` (left over from a previous opt-in) was re-hydrating the React state and immediately syncing back to the daemon, defeating "Delete my data" and re-suppressing the banner within milliseconds of every page load. The deeper reason to fix it here, not just patch the gate: a privacy identifier persisted in browser storage that the user can't see or clear without DevTools is a compliance liability. Anything users can revoke needs one canonical place to store it. Daemon `app-config.json` already serves that role for everything else gated through syncConfigToDaemon, so installationId + telemetry now ride that path exclusively: - saveConfig() strips both keys before writing localStorage. - loadConfig() strips both keys when reading older stale payloads, so existing installs migrate transparently on next launch. - syncConfigToDaemon() / mergeDaemonConfig still round-trip them, so the React state stays in sync with the daemon as before. Net effect: clearing app-config.json (or hitting "Delete my data") now fully resets the install identity, with no residual cohort key in browser storage. * feat(privacy): scrub secrets + PII from prompt/output before send When prefs.content is on, daemon now runs the prompt and assistant text through a regex scrubber (apps/daemon/src/redact.ts) before posting to Langfuse. The scrubber is the simplest thing that gives the user-facing copy a truthful claim: pure regex, zero new dependencies, fully auditable in this Apache-2.0 repo (vs. pulling a single-maintainer 5-month-old npm package into a core process). Categories covered (each replaced with [REDACTED:<kind>]): - Anthropic / OpenAI sk- keys (incl. 
proj/live/test/ant variants) - Langfuse pk-lf- / sk-lf- (specific rule wins over generic sk-) - GitHub gh[opsur]_ tokens - AWS access key ids (AKIA + 16 uppercase) - Google API keys (AIza + 35) - Slack xox[abprs]- tokens - Stripe live/test keys - JWT header.payload.signature triples - Bearer-header values (scheme word stays readable) - Emails, IPv4, US-style phone numbers - Credit cards — 13–19 digit runs that pass a Luhn check, so order ids and unix-nanos timestamps that fail Luhn pass through unchanged Not covered, stated openly in spec + i18n: names, postal addresses, business-secret semantics, raw 40-hex tokens (too high a false-positive cost for artifact slugs). Those would require an ML layer. Wired in: - apps/daemon/src/redact.ts — exports redactSecrets() + redactSecretsWithCounts() helper for future audit-summary metadata. - apps/daemon/src/langfuse-bridge.ts — runs both prompt and output through redactSecrets() before they reach the trace builder. - 18 unit tests cover every pattern plus negative cases (Luhn-failing digit runs, out-of-range IPv4 octets, idempotence on re-redacted text, ordinary prose passthrough). - i18n privacyContentHint on en + zh-CN + zh-TW (plus 14 locale placeholders) enumerates the categories so the consent disclosure matches the implementation — the GDPR informed-consent requirement. - spec gains a Pre-send Redaction subsection with the regex shape table + intentional non-coverage list. Drive-by: dropped the [privacy] debug logs that traced the now-fixed bootstrap regression. * fix(telemetry): make Langfuse reporting resilient * feat(telemetry): nest Langfuse turn observations * feat(telemetry): emit Langfuse tool spans * fix(telemetry): report after finalized message writes * fix(telemetry): honor persisted terminal status * fix(web): let consent banner yield page clicks * fix(telemetry): report current turn prompt only	2026-05-09 10:06:01 +08:00
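Two of the pre-send safeguards described in commit afb331a288, byte-accurate truncation and Luhn-checked card redaction, can be sketched as below. This is a hedged illustration, not the contents of `langfuse-trace.ts` or `redact.ts`: the function names, the reading of 8 KB as 8192 bytes, and the `card` kind label are assumptions, and the card rule here only matches unbroken digit runs.

```typescript
// Truncate to at most maxBytes of UTF-8 without cutting inside a codepoint.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  let end = maxBytes;
  // Continuation bytes look like 10xxxxxx; step back until the first
  // excluded byte is a leading byte, so the kept prefix ends cleanly.
  while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) end--;
  return new TextDecoder().decode(bytes.subarray(0, end));
}

// Standard Luhn checksum over a string of ASCII digits.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Redact 13 to 19 digit runs only when they pass Luhn, so order ids and
// timestamps that fail the checksum pass through unchanged.
function redactCards(text: string): string {
  return text.replace(/\b\d{13,19}\b/g, (run) =>
    passesLuhn(run) ? "[REDACTED:card]" : run,
  );
}
```

With these definitions, `truncateUtf8("设".repeat(4096), 8192)` keeps 2730 whole characters (8190 bytes) rather than splitting the 2731st, which is the case the commit's CJK unit test targets.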
Paul Stean	0b039777b9	fix/Bug#772-DesignFilesSortButton-DesignFilesTableNowSortableColumns (#804 ) * Disclaimer: Changes made using OpenCode with Big Pickle, an AI code assistant. 1. Removed the ↑ button from DesignFilesPanel.tsx (was a "back" button that closed the preview pane — confusingly placed in the header as if it were for sorting). 2. Converted the file list to a single <table> with sortable columns: - Replaced the section-based grouping (Pages, Sketches, Scripts, Images, Other) with a flat, sortable table - Added column headers: Name, Kind, Modified — all clickable to toggle sort direction - Default sort is by modified time descending (same as previous behavior) - Sort indicator arrows (↑/↓) show the active sort column and direction - Live artifacts remain as a separate section above the table 3. Added i18n keys designFiles.colName, designFiles.colKind, designFiles. colModified to all 17 locale files and the type definitions. 4. Updated CSS with table layout styles (.df-table, .df-file-row, column width classes, sortable header styles). Files modified: - apps/web/src/components/DesignFilesPanel.tsx - apps/web/src/index.css - apps/web/src/i18n/types.ts - apps/web/src/i18n/locales/en.ts - apps/web/src/i18n/locales/.ts (all 16 other locale files) Updated to preserve keyboard access to sorting * Fixed keyboard to focus/activate/launch file from Design Files list. 
A single space bar press shows the preview; a double space bar press opens the file as a tab * Top pagination bar (above the table): - "Show" dropdown with options 15, 30 (default), 45, 60, All - Page range indicator (1–20 of 45) - Previous / Next buttons Bottom pagination bar (below the table): - Previous / Next buttons - "Go to page" dropdown listing all page numbers - Same page range indicator Implementation details: - All controls use native <select> and <button> elements, fully keyboard accessible (Tab, arrow keys, Enter/Space) - Page resets to 0 when page size changes - safePage clamps to valid bounds when file count changes (e.g. after delete) - "All" sets page size to total file count (effectively one page) - Prev/Next buttons show disabled state at boundaries with reduced opacity * All 46 test files, all 385 tests pass. Here's what the regression test covers (test: what it verifies): - default page size: 500 files → only 30 .df-file-row elements in DOM - page size All: changing per-page to "All" shows all 500 rows - page size 60: changing to 60 shows 60 rows - Next navigation: clicking Next advances page and shows file-31 (sorted by mtime desc) - Prev/Next disabled states: Prev disabled on page 0, Next disabled on last page - jump to page: bottom dropdown jumps to page 3 (shows file-91) - page info text: 1–30 of 500 → after Next → 31–60 of 500 - render time: renders 500 files in under 2s
 * Fixed i18n for DesignFiles, and fixed DesignFilesPanel test * Fixed P3: .df-thead rule defined but never applied * Fixed keyboard use for file navigation, focus and button usage * Fix i18n for x of y in design files pagination * Fixed safePage clamping * Fixed dupe file total count * Fixed x of y i18n * Fixed DeleteSelected i18n and missing from test * Fix effective page size issue, and change duplicate file kind to a file size * Re-added page/everything selection and i18n * Fixed i18n issues * Resolved Indonesian i18n issue with Cloudflare keys * Fixed unrelated Cloudflare i18n issues as requested in pull request by reviewer * Fix e2e test: click filename button instead of row for preview The DesignFilesPanel was refactored from <button> rows to a <tr> with a nested <button> for the filename. The e2e test was still clicking the <tr>, which has no onClick handler, so the preview never appeared. * Remove duplicate formatSize helper, reuse humanBytes instead	2026-05-09 09:01:57 +08:00
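The pagination rules listed in commit 0b039777b9 (page size with an "All" option, safePage clamping, and the "x–y of z" range text) can be sketched as a pure function. This is an assumed shape for illustration only; `paginate` and `Page` are hypothetical names and not taken from `DesignFilesPanel.tsx`.

```typescript
interface Page<T> {
  rows: T[];
  page: number; // zero-based, already clamped
  pageCount: number;
  rangeLabel: string; // e.g. "1–30 of 500"
}

function paginate<T>(
  items: T[],
  pageSize: number | "all",
  requestedPage: number,
): Page<T> {
  const total = items.length;
  // "All" means a single page holding everything; never allow a zero-size page.
  const size = pageSize === "all" ? Math.max(total, 1) : Math.max(pageSize, 1);
  const pageCount = Math.max(Math.ceil(total / size), 1);
  // Clamp so a stale page index (for example after a delete) stays in bounds.
  const page = Math.min(Math.max(requestedPage, 0), pageCount - 1);
  const start = page * size;
  const rows = items.slice(start, start + size);
  const first = total === 0 ? 0 : start + 1;
  const rangeLabel = `${first}–${start + rows.length} of ${total}`;
  return { rows, page, pageCount, rangeLabel };
}
```

Keeping the clamp inside the function means callers can pass whatever page index they last stored and still get a valid slice.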
Nagendhra Madishetti	72fd9a73a2	fix(web): keep chat auto-scroll glued to bottom across streaming chunks (#989 ) Issue #983: when an assistant turn streamed in, the chat log stopped auto-scrolling even though the user was sitting at the bottom of the conversation. Root cause: the auto-scroll effect re-measured `scrollHeight - scrollTop - clientHeight` AFTER the new chunk had already grown the element. A single tool-use card or markdown render adds 100+ px in one tick, so the post-content distance check (`< 80`) skipped the scroll exactly when the user expected it most. Switch the gate to the existing `scrolledFromBottom` state. That flag is maintained by the user-driven scroll listener (only flipped by a real scroll event, not by content growth), so it carries the user's pre-content intent through to the effect. New content auto-scrolls when the user was glued to the bottom; scrollback sessions still preserve their position. Existing chat-scroll-preservation tests still pass (6/6); the prior- state behavior we test there is bottom-pinned vs absolute-restore on tab switch, which this change does not affect. Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-09 08:15:15 +08:00
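The difference between the old post-content measurement and the intent flag from commit 72fd9a73a2 can be sketched as follows. The names are hypothetical, and reusing the 80 px threshold inside the scroll listener is an assumption; the commit only states that `scrolledFromBottom` is flipped by real scroll events.

```typescript
const BOTTOM_THRESHOLD_PX = 80; // threshold quoted in the commit message

// Old gate: measured after the chunk had already grown the element.
function postContentGate(scrollHeight: number, scrollTop: number, clientHeight: number): boolean {
  return scrollHeight - scrollTop - clientHeight < BOTTOM_THRESHOLD_PX;
}

// New gate: remembers where the user was before the content arrived.
class AutoScrollGate {
  private scrolledFromBottom = false;

  // Call only from the user-driven scroll listener.
  onUserScroll(scrollHeight: number, scrollTop: number, clientHeight: number): void {
    this.scrolledFromBottom =
      scrollHeight - scrollTop - clientHeight >= BOTTOM_THRESHOLD_PX;
  }

  // Call after appending content; growth alone never changes the answer.
  shouldAutoScroll(): boolean {
    return !this.scrolledFromBottom;
  }
}
```

For a user pinned to the bottom of a 1000 px log, a 150 px chunk makes the old gate return false, while the flag-based gate still reports that auto-scroll should happen.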
Sid	05461c64fd	fix(connectors): show stable curated tool count in connector card badge (#748 ) (#767 ) * fix(connectors): show stable curated tool count in connector card badge (#748) The connector card's "N tools" badge in `apps/web/src/components/ EntryView.tsx` rendered `connector.tools.length` for both pre- and post-Composio-hydration states, so the displayed count lurched without explanation: - Before configuring a Composio API key, GitHub showed "2 tools" (the static fallback catalog at `composio.ts:21-53`). - After hydration, the daemon merged in the full Composio provider inventory (`composio.ts:725-778`, ~868 GitHub tools), and the same badge jumped to "868 tools". Same field name, two different concepts — `connector.tools` is "everything the provider ships" while the agent-callable subset is `definition.allowedToolNames` (read-only auto-approval tools that pass `isAgentPreviewListableTool()`). Fix: surface `allowedToolNames` on the wire `ConnectorDetail` and have the badge use that count instead. The detail drawer below the card still iterates over `connector.tools` to enumerate the full inventory — the count and the list are intentionally different surfaces. The badge now stays close to "tools the agent can actually invoke" (≈2-30 for GitHub, depending on auto-allowed read tools) instead of the raw provider inventory (~868). No 800x jump on hydration. Wire format change: - packages/contracts/src/api/connectors.ts: add `allowedToolNames: string[]` to ConnectorDetail - apps/daemon/src/connectors/catalog.ts: same field on the daemon-internal type, populated in `connectorDefinitionToDetail()` as a defensive copy of `definition.allowedToolNames` - packages/contracts/src/examples.ts: extend the example fixture - apps/web/src/components/EntryView.tsx: badge call sites switch to `connector.allowedToolNames?.length ?? 
connector.tools.length` (the `??` keeps the badge alive against any older daemon build that hasn't shipped the field yet) Tests: - 4 new daemon tests in connectors-service.test.ts pin the contract: getConnector() emits the field, the array is a defensive copy, and the #748 regression guard simulates the Composio post-hydration shape (tools.length=801, allowedToolNames.length=1) to prove the badge invariant - web EntryView.test.ts fixtures updated to satisfy the new required field Verified locally: - daemon vitest: 925/925 - web vitest: 332/332 - daemon/web/contracts typecheck clean - i18n-check passes - Live `/api/connectors/discovery` returns the new field; pre- hydration GitHub/Notion/Google Drive badges all read "2 tools" (no regression vs before this change) Fixes #748 * fix(connectors): split drawer badge vs inventory counts; fix daemon test typecheck (#767 review) Two follow-ups for @mrcfps's review on PR #767. 1) P1: Drawer empty-state regression. The previous commit reused the curated `toolCount` for both the header badge AND the inventory section's loading gate / empty-state branch / section count. The inventory section renders `connector.tools` directly, so a hydrated connector with raw provider tools but an empty allowlist (e.g. a write-only Composio surface) would render "no tools available" and hide the actual inventory list — exactly the contradiction my own PR description warned against. Fix splits the two surfaces: - `badgeToolCount` (curated, via the new exported helper `getConnectorBadgeToolCount`) feeds the card and drawer header badge — the summary count, where the #748 stability matters. - `inventoryToolCount = connector.tools.length` (inline) drives the drawer's loading gate, section count, and empty-state branch — the surfaces describing the actual rendered list. The card has no inventory section so it stays on the badge helper unchanged. 2) CI: daemon test typecheck failed. 
The connectors-service test's `provisionedTools[0].safety` index access tripped daemon `tsconfig.tests.json`'s `noUncheckedIndexedAccess` strict setting, even though my local `tsc -p tsconfig.json --noEmit` was clean — that config is a separate compilation. Bind through a defined-checked local before reading `.safety`, per @mrcfps's exact suggestion. Tests: - 4 new web tests in `EntryView.test.ts` pin the `getConnectorBadgeToolCount` contract, including the explicit regression: a connector with `allowedToolNames=[]` and `tools.length=800` returns badge=0 but the inventory length stays at 800 — the drawer's empty-state branch must use the inventory count, never the badge count. - Existing daemon test fixed without losing assertion coverage. Verified locally: - daemon vitest: 921/921 - web vitest: 336/336 (was 332, +4) - daemon `tsc -p tsconfig.json` and `tsc -p tsconfig.tests.json` (the CI killer): both clean - web `tsc -b --noEmit` clean - i18n-check passes Process learning baked into this PR: from now on I'll always run the `tsconfig.tests.json` separately before pushing, since the workspace typecheck script chains both and the second one is what CI fails on. * fix(connectors): pin badge to curated catalog count, not the dynamic execution allowlist (#767 review v2) @lefarcen and @mrcfps both flagged that the previous iteration of this PR (commit `447a270`) used `allowedToolNames` as the badge source. That field isn't actually stable across Composio hydration: `apps/daemon/src/connectors/composio.ts:758-761` extends it with every provider-discovered tool whose classified safety is read+auto, so a connector like GitHub goes from 2 → ≈52 on hydration — better than 2 → 868, but still a 26x jump and still the kind of unexplained number-lurch the issue is about. My regression test only covered write-classified tools so this read-path inflation slipped through. 
Fix: introduce `curatedToolNames` as a separate field that is locked to the static catalog and never extended by discovery. - `packages/contracts/src/api/connectors.ts` and `apps/daemon/src/connectors/catalog.ts`: add `curatedToolNames: string[]` to `ConnectorDetail` (required on the wire) and as an optional field on `ConnectorCatalogDefinition` (defaults to allowedToolNames at serialize time). - `apps/daemon/src/connectors/composio.ts`: `definitionFromToolkit()` now sets `curatedToolNames = [...staticDefinition.allowedToolNames]` — the catalog set, no extension. The runtime `allowedToolNames` keeps its dynamic auto-allow behavior so the execution gate is unchanged end-to-end. - `apps/daemon/src/connectors/catalog.ts`: `connectorDefinitionToDetail()` populates `curatedToolNames: [...(definition.curatedToolNames ?? definition.allowedToolNames)]`, so non-Composio connectors (no discovery layer) trivially mirror the two fields. - `apps/web/src/components/EntryView.tsx`: `getConnectorBadgeToolCount` becomes a 3-tier fallback — `curatedToolNames` first, then `allowedToolNames`, then `tools.length`. The order means a half-deployed daemon (allowed but not curated) still produces a more meaningful number than the raw inventory; only when both are missing do we fall back to the original buggy behavior. Tests added: - daemon `connectors-service.test.ts`: 3 new tests pin the contract — wire shape, hydration stability (catalog stays at 1 while allowedToolNames grows to 51), and defensive copy. - web `EntryView.test.ts`: 5 tests on `getConnectorBadgeToolCount` including the @lefarcen scenario explicitly: tools=800, allowedToolNames=52, curatedToolNames=2 → badge=2. 
Verified locally: - daemon vitest: 924/924 (was 921, +3) - web vitest: 337/337 (was 332, +5 net) - daemon `tsconfig.json` and `tsconfig.tests.json` (the CI killer): both clean - web `tsc -b --noEmit` clean - contracts typecheck clean - i18n-check passes - Live wire format: `/api/connectors/discovery` ships `curatedToolNames: ['github.github_search_repositories', 'github.github_get_issue']` for GitHub (length 2) * fix: make ConnectorDetail.{allowed,curated}ToolNames optional in shape After merging main, several existing test fixtures (EntryView, SettingsDialog.orbit, ConnectorsBrowser) declared `ConnectorDetail` inline without the new `allowedToolNames` / `curatedToolNames` keys the PR introduced. Daemon-built payloads always populate both fields (see `connectorDefinitionToDetail` and the Composio normalization path), so making the wire shape `?`-optional is safe for runtime consumers and avoids touching every fixture. Updates the daemon-side mirror type to match and adds non-null assertions in the daemon tests that read these fields directly. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-08 23:42:52 +08:00
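The three-tier badge fallback described in the commit above can be sketched as follows. This is only an illustration: `ConnectorDetailLike` is a hypothetical, trimmed-down stand-in for the real `ConnectorDetail`, and the function body is inferred from the commit message rather than copied from `EntryView.tsx`.

```typescript
// Hypothetical, reduced shape; the real ConnectorDetail carries more fields.
interface ConnectorDetailLike {
  tools: { name: string }[];
  allowedToolNames?: string[];
  curatedToolNames?: string[];
}

// Prefer the curated catalog count, then the execution allowlist,
// and only fall back to the raw inventory when both are missing.
// An empty array counts as "present", so it yields a badge of 0.
function getConnectorBadgeToolCount(connector: ConnectorDetailLike): number {
  if (connector.curatedToolNames !== undefined) {
    return connector.curatedToolNames.length;
  }
  if (connector.allowedToolNames !== undefined) {
    return connector.allowedToolNames.length;
  }
  return connector.tools.length;
}
```

Checking for `undefined` rather than truthiness or length keeps the `allowedToolNames=[]` case at badge 0 instead of falling through to the inventory count.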
tenderpooh	109722de3a	feat(desktop): export artifacts directly to PDF (#532) * feat(desktop): export artifacts directly to PDF * fix(desktop): remove default margins from PDF export	2026-05-08 23:42:12 +08:00
soulme	e3423c2b7b	feat: add draggable file tab reordering (#936)	2026-05-08 22:21:19 +08:00
Nagendhra Madishetti	5d7568ba2c	fix(web): confirm before clearing a saved Media provider API key (#875 ) The Clear button on Settings → Media providers wiped the saved apiKey + baseUrl + model in a single click with no recovery — a fat-fingered click on the wrong row would silently delete a key the user just pasted in. Wrap the existing onClick in `window.confirm()` matching the same pattern the codebase already uses for destructive actions (conversation delete, design delete, FileWorkspace file delete). The prompt is localized via a new `settings.mediaProviderClearConfirm` key with `{name}` placeholders for the provider label, translated across all 17 locales. Updated the existing media-provider clear test to auto-accept the prompt, plus added a sibling test asserting that dismissing the prompt leaves the saved config intact. Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-08 22:15:59 +08:00
leprincep35700	f951ccb612	fix: keep examples filter counts consistent (#949 ) * fix: keep examples filter counts consistent * test: cover scoped examples scenario counts * test: satisfy examples fixture typing --------- Co-authored-by: leprincep35700 <leprincep35700@users.noreply.github.com>	2026-05-08 21:41:52 +08:00
leprincep35700	ce5f20918c	test: cover model option rendering (#948 ) * test: cover model option rendering * fix: strengthen model option regression coverage --------- Co-authored-by: leprincep35700 <leprincep35700@users.noreply.github.com>	2026-05-08 21:38:13 +08:00
Siri-Ray	208f09c60e	fix: settle completed runs and clean up shutdown children (#924 ) * fix: clean up completed and shutting down runs * fix: bound daemon CLI shutdown Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix: harden daemon shutdown cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix: harden daemon shutdown cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * test: align acp abort fake with typed child	2026-05-08 21:05:22 +08:00
nettee	32d820e4ee	fix(daemon): typecheck leaf modules (#943 ) * update drift * fix(daemon): typecheck leaf modules * fix(daemon): decode Qoder stdout buffers Generated-By: looper 0.5.6 (runner=fixer, agent=opencode)	2026-05-08 20:01:25 +08:00
Caprika	4f647f56ba	[codex] Optimize Composio connector previews (#907 ) * Optimize Composio connector previews * Fix partial connector tool preview hydration * Cancel pending connector authorization on daemon * Preserve Composio cached tool counts * Avoid pending state after OAuth launch failure * Preserve static tool count fallback * Fix connector preview retry state * Remove Composio auth config metrics * Hydrate unknown connector tool previews * Fix remaining connector review threads * Stop failed connector preview spinner * Hydrate only targeted agent connectors	2026-05-08 20:01:06 +08:00
ferasbusiness666	1e8926271b	Harden security scan findings and upgrade dependencies (#806 ) * feat: add accent color control and launcher for Open Design * fix: remove launcher binary from PR * test: cover accent appearance edge cases * Harden security scan findings and upgrade deps * Address proxy security review * Pin jsdom for web test stability --------- Co-authored-by: ferasbusiness666 <ferasbusiness666@users.noreply.github.com> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-08 19:46:34 +08:00
Tom Huang	d592f6087f	feat(mcp): external MCP client with daemon-managed OAuth and 39 design-focused templates (#898 ) * feat(mcp): add external MCP client with daemon-managed OAuth and 17 design-focused templates Open Design now acts as an MCP CLIENT and surfaces tools from third-party MCP servers to the underlying agent (Claude Code, Hermes, Kimi). Daemon - New mcp-config / mcp-oauth / mcp-tokens modules: persist server entries to .od/mcp-config.json, run the OAuth dance for HTTP/SSE servers end-to-end on the daemon (so cloud deployments work and tokens survive across turns), and inject Authorization: Bearer headers into the per-spawn .mcp.json the daemon writes for Claude Code (or the ACP mcpServers map for Hermes/Kimi). - /api/mcp/servers and /api/mcp/oauth/{start,status,disconnect} endpoints, plus spawn-time wiring in agents that hands the configured servers to the active agent CLI. - System-prompt directive for connected external MCPs so the model does not chase Claude Code's synthetic _authenticate / _complete_authentication tools when the Bearer is already pinned. Web - Settings -> External MCP servers panel with per-row OAuth Connect / Disconnect / Refresh affordances and per-row template hints. - New "Add server" picker categorized into 7 groups (image-generation, image-editing, web-capture, ui-components, data-viz, publishing, utilities) with a search box, sticky close button, collapsible <details> sections (auto-expand on search), 60vh capped scroll region, and a pinned Custom-server footer. - ChatComposer /mcp slash and MCP picker button forward to the new Settings tab; AssistantMessage renders MCP tool calls inline; markdown autolinker handles bare http(s) URLs (incl. OAuth links) before italic markers so OAuth callback URLs do not get italic-fragmented mid-token. 
Contracts - packages/contracts/src/api/mcp.ts owns the wire shapes (McpServerConfig, McpTemplate with stable McpTemplateCategory enum, McpServersResponse, OAuth start/status/disconnect bodies, the postMessage payload from the OAuth callback). Templates (17 built-in) - image-generation: Higgsfield (OpenClaw, OAuth HTTP), Pollinations, Allyson (animated SVG), AWS Bedrock Image (uvx). - image-editing: Imagician, ImageSorcery. - web-capture: just-every screenshot-website-fast, ScreenshotOne. - ui-components: 21st.dev Magic, shadcn/ui, FlyonUI. - data-viz: AntV Chart, Mermaid. - publishing: EdgeOne Pages. - utilities: Filesystem, GitHub, Fetch. Tests - apps/daemon/tests/mcp-{config,oauth,tokens,spawn}.test.ts cover storage round-trip, OAuth helpers, token persistence, spawn-time wiring, every template's transport / command / args / env-field invariants, and the canonical category enum. - apps/web/tests/runtime/markdown.test.tsx covers the new autolinker ordering rules. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(mcp): add 21 more design-focused templates and a `design-systems` category Expands the built-in MCP picker from 17 to 38 templates so users can compose the full Open Design craft loop (design-system intake → generate → edit → audit → publish) without leaving the Settings dialog. Every install spec is verified live against the upstream README; templates that needed Go binaries, multi-step `init` ceremonies, or massive runtime stacks (PostgreSQL + Redis + Ollama) are intentionally deferred so picking a template still resolves to a working server in one click. New `design-systems` category between `web-capture` and `ui-components` (reflects the upstream-of-components position in the workflow). Mirrored in `McpTemplateCategory` on both contracts and daemon, and `CATEGORY_ORDER` on the web side. 
New templates by category: - image-generation (+4): prompt-to-asset (icons / favicons / OG / logos with free-tier routing across Cloudflare AI / NVIDIA NIM / HF / Stable Horde), Nano Banana (hosted streamable HTTP, virtual try-on + product placement), Seedream (hosted streamable HTTP, ByteDance Seedream v3-v5 + SeedEdit), fal.ai (uvx, 600+ models incl. FLUX / Kling / Hunyuan / MusicGen). - image-editing (+3): Photopea (34 layered-editor tools — closes the PSD gap), Topaz Labs (AI upscale / denoise / sharpen), Transloadit (86+ media pipeline robots). - web-capture (+1): Pagecast (browser → demo GIF / MP4 with auto-zoom). - design-systems (+4, NEW category): Figma-Context (Framelink, designs → code), Design Token Bridge (Tailwind ⇄ CSS ⇄ Figma ⇄ M3 / SwiftUI / W3C DTCG + WCAG contrast), Design System Extractor (Storybook scrape), Aesthetics Wiki (cottagecore / dark-academia / y2k / … moodboards). - data-viz (+2): MCP Dashboards (45+ chart types + KPI dashboards), Excalidraw Architect (hand-drawn architecture diagrams). - publishing (+6): PageDrop, PDFSpark, OGForge, QRMint, Slideshot (HTML → PDF / PPTX / PNG with 7 themes), Deckrun (Markdown → PDF / video, hosted free tier with no key required). - utilities (+1): A11y axe-core (WCAG 2.0/2.1/2.2 + color-contrast + ARIA). Tests cover every new template's wiring (command, args, env / header required-vs-optional, secret flag), the category enum invariant, and in-category declaration order for image-generation, design-systems and publishing buckets where the order is what users see in the picker. 21 new test cases pass; full mcp-config suite is green. 
Templates intentionally deferred (documented in PR body): figma-use (needs Figma desktop with --remote-debugging-port=9222), m-moire (multi-step `memi suite init` + daemon ceremony), gemini-media-mcp + trident-mcp (Go binaries — no npx / uvx path), Pixelle-MCP (full app with web UI + ComfyUI backend), storybook-addon-mcp (lives inside user's Storybook, not standalone), primitiv (multi-step init / build / serve), ReftrixMCP (PostgreSQL + Redis + Ollama + DINOv2), narasimhaponnada/mermaid (overlap with peng-shawn). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(mcp): add figma-use template (write designs from chat) under design-systems figma-use is the natural counterpart to Figma-Context already in this PR: where Framelink reads Figma designs into the model, figma-use writes back into the canvas (90+ tools — create frames / text / components / variants, render JSX into Figma, export PNG/SVG, query nodes via XPath, lint for WCAG / auto-layout / hardcoded colors, analyze design systems). Wired as an HTTP MCP template (`http://localhost:38451/mcp`) because `figma-use mcp serve` only exposes HTTP — there's no stdio mode in the upstream `serve.ts`. No API key. Two prerequisites the user owns are spelled out in the description so picking the template still resolves to a working server: (1) start Figma with `--remote-debugging-port=9222` (or `figma-use daemon start --pipe` on Figma 126+), and (2) leave `npx figma-use mcp serve` running in a terminal. Inserted between `design-system-extractor` and `aesthetics-wiki` so the design-systems category reads as a workflow: read existing design (Figma Context) → translate tokens (Token Bridge) → extract from Storybook (Extractor) → write back to Figma (figma-use) → break creative block (Aesthetics Wiki). Tests cover the new template's transport (`http`), endpoint URL, the empty header-fields invariant (no auth required), and bump the design-systems group order to include it. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(settings): i18n the External MCP / MCP server / Connectors sidebar entries and make the dialog header track the active section The External MCP sidebar entry this PR introduces was hardcoded English ("External MCP / Add MCP tools (Higgsfield, GitHub…)"). Same for the adjacent Connectors and MCP server entries. The dialog header was also pinned to "Execution & model" copy, so opening Settings → External MCP showed a header that lied about which section the user was on. Adds six translation keys — `settings.connectorsTitle/Hint`, `settings.mcpServerTitle/Hint`, `settings.externalMcpTitle/Hint` — and translates them across all 17 locales (ar, de, en, es-ES, fa, fr, hu, id, ja, ko, pl, pt-BR, ru, tr, uk, zh-CN, zh-TW). `SettingsDialog` now derives the header title/subtitle from the active section (11 sections total) instead of a single hardcoded pair, so each section renders an honest header. Co-authored-by: Cursor <cursoragent@cursor.com> * test(e2e): pin level: 3 on dialog heading lookups for Pets and Connectors CI's Validate workspace job (#1479) failed two Playwright cases with the strict-mode violation: getByRole('dialog').getByRole('heading', { name: 'Pets' }) resolved to 2 elements: 1) <h2>Pets</h2> 2) <h3>Pets</h3> Same root cause as the unit-test fix already in this PR: the dynamic dialog `<h2>` now echoes the section's own `<h3>` because the dialog header tracks the active section. Disambiguate to `level: 3` so each assertion still pins the section heading specifically (which is what the test intends to verify). Audit of the rest of e2e/ for `dialog.getByRole('heading', ...)` — settings-api-protocol.test.ts looks for "OpenAI API" / "Anthropic API" section h3s which never appear in the dialog `<h2>` (always "Execution & model"), so those stay safe. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): bind OAuth refresh to the issuing client and skip stale tokens Persist the OAuth client context (token endpoint, client_id, client_secret, issuer, redirect_uri, resource) alongside the bearer token so refresh hits the same client the refresh_token was bound to (RFC 6749 §6). The previous refresh path re-ran beginAuth with a dummy OOB redirect URI, which kept getOrRegisterClient from finding the original DCR client and made providers reject the refresh on the next chat turn. Refreshes now reuse the persisted endpoint/client pair directly. Also stop injecting expired access tokens at spawn time when refresh is unavailable or fails. Pinning a stale Bearer made every Claude MCP call 401 while the prompt still treated the server as connected; on that path we now skip the entry and let the UI surface a reconnect. Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code) --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 17:59:20 +08:00
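The "skip stale tokens" rule from the last commit above can be sketched as below. All names here (`StoredToken`, `ServerEntry`, `buildSpawnServers`, the `expiresAt` field in epoch milliseconds) are hypothetical, and the sketch assumes any refresh attempt has already been made before this function runs. It also assumes servers without a stored token are passed through unchanged.

```typescript
// Hypothetical shapes for illustration; not the daemon's actual types.
interface StoredToken {
  accessToken: string;
  expiresAt?: number; // epoch milliseconds; absent means "no known expiry"
}

interface ServerEntry {
  id: string;
  url: string;
  token?: StoredToken;
}

interface SpawnServer {
  url: string;
  headers: Record<string, string>;
}

// Build the per-spawn server map. Entries whose token is already expired
// are dropped instead of being pinned with a Bearer that would only 401.
function buildSpawnServers(
  entries: ServerEntry[],
  now: number,
): Record<string, SpawnServer> {
  const servers: Record<string, SpawnServer> = {};
  for (const entry of entries) {
    const token = entry.token;
    if (token === undefined) {
      servers[entry.id] = { url: entry.url, headers: {} };
      continue;
    }
    const expired = token.expiresAt !== undefined && token.expiresAt <= now;
    if (expired) continue; // let the UI surface a reconnect instead
    servers[entry.id] = {
      url: entry.url,
      headers: { Authorization: `Bearer ${token.accessToken}` },
    };
  }
  return servers;
}
```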
Sid	8b0625aa6f	fix(web): unbreak Create button on plain HTTP / LAN-IP deployments (#849 ) (#900 ) `crypto.randomUUID()` is restricted to secure contexts (HTTPS or `localhost`), so when Open Design is served over plain HTTP on a LAN IP — the standard Docker / unRAID / NAS self-hosted setup, e.g. `http://192.168.1.10:17573` — Chromium silently makes the function undefined. Calls then throw `TypeError: crypto.randomUUID is not a function`, which the `try/catch` around `createProject()` swallows by returning `null`, which the click handler reads as "no project, do nothing". The Create button effectively becomes a silent no-op for every LAN-IP user (issue #849, also reported as #394). Centralize the call into a new `apps/web/src/utils/uuid.ts` helper with a three-tier fallback per @lefarcen's review: 1. `crypto.randomUUID()` — secure-context happy path, native and cryptographically random. 2. `crypto.getRandomValues()` + RFC 4122 §4.4 byte layout — still available in non-secure contexts since the Web Crypto API is not gated by `isSecureContext`. Yields a real v4 UUID with crypto-quality entropy. 3. `Math.random()` — last-resort polyfill for environments missing both, kept because the IDs we generate (project ids, message ids, client request ids) are scoped to a single user's local browser session — cryptographic uniqueness isn't required, just enough entropy to avoid collisions. Replace all four `crypto.randomUUID()` callsites confirmed in @lefarcen's audit: - `apps/web/src/state/projects.ts:48` (createProject id) - `apps/web/src/components/ProjectView.tsx:986` (user message id) - `apps/web/src/components/ProjectView.tsx:1013` (assistant message id) - `apps/web/src/components/ProjectView.tsx:1263` (daemon stream clientRequestId) with calls to the new `randomUUID()` helper. 
Tests: 6 new tests in `apps/web/tests/utils/uuid.test.ts` cover each fallback tier, RFC 4122 v4 format validation (regex + explicit version/variant nibble checks), the explicit "doesn't throw when `crypto.randomUUID` is undefined" assertion that pins the #849 root cause, and a 1000-iteration uniqueness check on the `getRandomValues` path. Verified locally: - web vitest: 522/522 (was 516, +6) - web `tsc -b --noEmit` clean - `tsx scripts/i18n-check.ts` passes	2026-05-08 16:50:59 +08:00
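A minimal sketch of the three-tier fallback described in the commit above, not the actual contents of `apps/web/src/utils/uuid.ts`. The `CryptoLike` interface and the injectable `cryptoImpl` parameter are assumptions added here so each tier can be exercised in isolation.

```typescript
// Hypothetical minimal view of the Web Crypto surface this helper needs.
interface CryptoLike {
  randomUUID?: () => string;
  getRandomValues?: (array: Uint8Array) => Uint8Array;
}

// Stamp the version (4) and variant (10xx) bits, then format as 8-4-4-4-12.
function formatV4(bytes: Uint8Array): string {
  bytes[6] = ((bytes[6] ?? 0) & 0x0f) | 0x40;
  bytes[8] = ((bytes[8] ?? 0) & 0x3f) | 0x80;
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

function randomUUID(
  cryptoImpl: CryptoLike | undefined = (globalThis as unknown as { crypto?: CryptoLike }).crypto,
): string {
  // Tier 1: native randomUUID (secure contexts).
  if (cryptoImpl && typeof cryptoImpl.randomUUID === "function") {
    return cryptoImpl.randomUUID();
  }
  const bytes = new Uint8Array(16);
  if (cryptoImpl && typeof cryptoImpl.getRandomValues === "function") {
    // Tier 2: crypto-quality bytes, formatted by hand.
    cryptoImpl.getRandomValues(bytes);
  } else {
    // Tier 3: last resort, not cryptographically strong.
    for (let i = 0; i < bytes.length; i++) {
      bytes[i] = Math.floor(Math.random() * 256);
    }
  }
  return formatV4(bytes);
}
```

Calling `randomUUID()` with no argument uses whatever `globalThis.crypto` provides; passing a stub object forces a specific tier.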
shangxinyu1	8fee22d358	Fix stuck chat runs and unintended cancels (#896 ) * Fix stuck chat runs and unintended cancels * Harden chat run stall watchdog	2026-05-08 15:47:44 +08:00
Marc Chan	e14b8092ea	feat: add Orbit activity summaries (#681 ) * feat: add Orbit activity summaries * fix(orbit): make runs navigable while agent continues * fix(web): widen minimum chat panel * feat: support Orbit template selection * fix(daemon): avoid bogus skill side-file preflight * fix(web): collapse orbit artifact project cards * fix(web): preserve orbit project card titles * fix: improve Orbit run daily briefing * fix: handle Orbit digest data failures * fix: load Orbit templates and connector tools reliably * fix: keep Orbit summary counts consistent Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: apply Orbit template skill context * fix: cache and curate connector tools for Orbit * fix: align Orbit defaults and connector discovery * fix: simplify Orbit template settings * fix: move connectors into settings * fix: compact connector settings catalog * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: prevent connector action button from stretching into pill The icon-only connect/disconnect buttons in the embedded connectors catalog inherited min-width: 92px / 106px from the non-embedded pill rules, overriding the 24px square sizing and causing the buttons to overlap the card head text. 
Reset min-width to 0 in the embedded icon-only rule so the compact square layout holds. * fix(web): align live artifact file rows * fix: clean up Orbit connector settings lifecycle Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address Orbit review regressions Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * feat(web): localize Orbit and connector settings * feat(web): gate Orbit runs without connectors * feat(web): refine connector settings UX * feat(web): safeguard Composio key clearing * fix(web): refresh Composio tool badges * feat(web): show connector logos * feat(daemon): localize Orbit prompt window * fix(daemon): clarify blocked connector callback closes * test(daemon): harden flaky async probes * fix(web): align Indonesian connector locale keys * test(web): align connector browser props * fix(web): preserve explicit credential clears Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): time out Composio logo proxy fetches Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): localize Indonesian connector settings copy Translate the new connector settings strings in the Indonesian locale and lock them with a regression test so this surface no longer silently falls back to English. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve discovered connector tools Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve onboarding autosave completion Keep settings autosave from clearing onboarding completion after the close gesture, and expose the desktop main types from source so workspace validation can typecheck packaged imports without a prior desktop build. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): defer Composio catalog cache hydration Load persisted Composio catalog data only after the runtime data directory is configured so startup cannot read another namespace's cache. 
Add a regression test that exercises the module-load singleton path. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): treat discovery completion independently Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve latest settings draft on close Use the latest persisted settings draft when the dialog closes so onboarding completion does not race a stale daemon sync and overwrite newer Orbit/template selections. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): avoid syncing draft Composio key on Orbit run Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): localize Orbit settings copy Translate the new Indonesian Orbit and autosave strings so the settings UI no longer falls back to English and the locale regression stays covered. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): prefer fresh connector catalog state Keep refetched connector status/auth data authoritative while retaining discovery-only tool metadata so the connectors UI stays consistent after refreshes. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): declare Indonesian locale fallback keys explicitly Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): inline Indonesian fallback strings for CI Replace the Indonesian locale's per-key English lookups with explicit strings so workspace typecheck no longer depends on brittle build-mode resolution in CI. Add a regression test that blocks those per-key English lookups from reappearing in the CI-sensitive fallback sections. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): restrict proxied connector logos to image MIME types Reject non-image upstream logo responses so the daemon never serves third-party HTML from its localhost origin. 
Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * test(e2e): align settings dialog regressions Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): decouple Orbit runs from media sync failures Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): keep SPA catch-all export-compatible Disable dynamic catch-all params for the exported SPA shell so Next.js static builds can emit the root route again. Add a regression test covering the route config against the web export mode. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve Orbit config and workspace routes Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): block SVG in connector logo proxy Reject SVG and other unsafe proxied logo responses so third-party logo content cannot execute under the daemon origin, while keeping raster logo fetches working and making rejected responses non-cacheable. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): fall back to static catalog for empty cache Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): disable Orbit run before connector gate resolves Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(desktop): export shipped desktop types Point the desktop ./main type export at the generated declaration so installed consumers resolve the published file set. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): restore persisted question form selections Render historical submitted answers directly so reloaded question forms keep their locked selections visible. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): retry forced media sync autosave Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): keep Composio logo timeout through body read Keep the Composio logo fetch timeout active until the response body is fully consumed so stalled body reads abort and clear the inflight cache entry. 
Add a regression test that proves a delayed body read times out and the next request can recover. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): refresh Orbit gate after connector auth Re-check connector availability when the settings window regains focus so Orbit unlocks as soon as a connector finishes authenticating in the same settings session. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): keep connector detail tool lists intact Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): ignore malformed Orbit summaries Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(e2e): stabilize design-system multi-select flow Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): cap Composio logo cache growth Bound the Composio logo cache with LRU eviction and expired-entry pruning so repeated untrusted logo requests cannot grow daemon memory without limit. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): bound proxied Composio logo payloads Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): align autosave settings tests Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): remove stray CSS conflict marker Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fixer: address PR #681 follow-up items Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): restore restart routes and connector flows * fix(web): keep SPA export route static * fix(web): stabilize chat scroll tests --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-08 14:27:46 +08:00
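The "LRU eviction and expired-entry pruning" idea from the logo-cache commit above can be illustrated with a small `Map`-based cache. This is a generic sketch, not the daemon's implementation: the class name, the size and TTL parameters, and the explicit `now` argument (added for deterministic testing) are all assumptions.

```typescript
// A bounded cache: entries expire after ttlMs, and the least recently
// used entry is evicted once maxEntries is exceeded. Map preserves
// insertion order, so the first key is always the oldest.
class BoundedTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    // Prune everything that has already expired.
    this.entries.forEach((entry, k) => {
      if (entry.expiresAt <= now) this.entries.delete(k);
    });
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    // Evict least recently used entries until within bounds.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Because both bounds are enforced on every `set`, repeated requests for distinct keys cannot grow the cache past `maxEntries`.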
shangxinyu1	aec9428b08	Fix desktop preview and packaged app interactions (#879 ) * Fix packaged deck navigation interactions * Fix connector auth in packaged app and localized content coverage * Fix Electron connector browser handoff contract	2026-05-08 14:26:10 +08:00
lefarcen	b9d30aa30e	test(web): de-flake chat-scroll-preservation across tab switches (#886 ) The earlier shape installed instance-level Object.defineProperty mocks on the remounted chat-log only after `await switchTab('Chat')`. Inside that act() the component schedules a rAF that writes scrollTop on the new element; depending on whether jsdom's rAF polyfill flushed before the await resolved, the write either landed on the still-default prototype setter (lost) or the not-yet-installed instance setter (also lost). The instance mock's closure-captured remountedTop then served its initial 0 forever and the assertion failed nondeterministically across CI runs without any product-code change. Patch the geometry at HTMLElement.prototype level so any chat-log React mounts later automatically reads/writes through a test-controlled `geom` object. The component's restore rAF can fire at any point and still write to the same place the assertion reads from. Verified 8/8 clean local runs.	2026-05-08 14:16:12 +08:00
Nagendhra Madishetti	661d11e60b	fix(web): confirm before clearing the saved Composio API key (#877 ) The Clear button on Settings → Connectors removed the daemon-stored Composio key in a single click with no recovery — a stray click wiped a credential the user had to fetch back from app.composio.dev. Wrap the existing onClick in window.confirm() matching the same pattern the codebase already uses for destructive actions (conversation delete, design delete, FileWorkspace file delete, and the Media providers Clear button shipped alongside this in issue #737). The prompt copy stays in English to match the rest of the Composio section, which is hardcoded English today. Updated the existing 'clears a saved Composio key' test to auto-accept the prompt, plus added a sibling test asserting that dismissing the prompt leaves the daemon-stored key intact in the saved payload. Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-08 12:39:04 +08:00
nettee	8930b9650c	feat: Add a toggle to reveal media provider API keys (#867)	2026-05-08 11:46:21 +08:00
Nagendhra Madishetti	655d561f38	fix(web): show explicit error/retry state when example preview HTML fails to load (#863) * fix(web): show explicit error/retry state when example preview HTML fails to load Reporter (#860) saw the example preview modal stuck with the toolbar buttons greyed out and only restarting the app got back to a usable state. Lefarcen confirmed the diagnosis: when /api/skills/:id/example fails, fetchSkillExample returns null, the modal stays at preview.loading forever, and the share menu's disabled={!activeHtml} guard sits in the disabled position with no recovery path. Three changes: 1. fetchSkillExample now returns a discriminated { html } | { error } instead of collapsing every failure into null, so callers can tell a real fetch failure from a normal load. 2. PreviewView gains an optional error field. When set, PreviewModal renders a stacked title/body/Retry affordance instead of the indefinite "Loading…" placeholder. Retry re-fires onView so the parent can re-run its fetch. 3. ExamplesTab tracks per-skill errors alongside per-skill html, clears the in-flight value before each fetch, and wires onView from the modal into loadPreview so the Retry button actually retries. i18n: three new keys (preview.errorTitle, preview.errorBody, preview.retry), translated across all 17 locales. The locales-aligned test stays green. CSS: .ds-modal-error stacks the new content vertically inside the existing .ds-modal-empty positioning, no other modals affected. * fix(web): stabilize preview onView and guard parallel preview fetches Codex caught a real bug in the round-1 fix: the inline onView={() => loadPreview(...)} prop was recreated on every parent render, and PreviewModal's mount effect re-fires onView whenever its identity changes. A persistent fetch failure would update state, recreate the prop, re-fire the effect, re-run loadPreview, and burn through the error UI in a flash instead of waiting for a Retry click.
Pin a stable onPreviewView via a useRef-backed callback so the modal sees a single identity for the lifetime of the panel; loadPreview is reached through the ref, so its closure refresh on state updates no longer leaks into the modal's effect dependencies. While in this surface, also add lefarcen's race guard: a synchronous inFlightRef Set so two parallel loadPreview calls (e.g. card hover firing while the modal opens) cannot both pass the cache check before either setState lands. The first caller adds the id pre-await; the second sees it and exits early. try/finally clears the entry on both success and failure paths. Adds tests/components/preview-modal-error-state.test.tsx covering: - error UI renders when view.error is set, - Retry click calls onView with the active view id, - re-rendering with the same onView identity does not re-fire the modal's mount effect (pins the no-auto-retry contract). * fix(web): close Retry over the active skill id, not the modal-internal view id mrcfps caught a real regression in round 2: PreviewModal calls onView(activeId) where activeId is the modal-local view id ('preview' in this component). The previous round forwarded that argument straight into loadPreview, so the mount effect and Retry button hit /api/skills/preview/example instead of /api/skills/{skill-id}/example. The new error state could not actually recover. Mirror the active skill id into a ref alongside loadPreviewRef and have onPreviewView ignore the modal-forwarded argument, fetching the selected skill via the ref instead. The callback identity stays stable, so the no-auto-retry contract from round 2 still holds. Adds tests/components/examples-tab-retry.test.tsx that mounts the real ExamplesTab, mocks fetchSkillExample to reject, opens the preview, clicks Retry, and asserts the second call hits the same skill id (and explicitly never gets called with 'preview'). --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-08 11:16:14 +08:00
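Two ideas from the commit above, the discriminated `{ html } | { error }` result and the synchronous in-flight guard, can be sketched outside React as follows. `createPreviewLoader` and its `results` map are hypothetical names for illustration; the real `ExamplesTab` keeps this state in React state and refs.

```typescript
// Discriminated result: callers can tell a failed fetch from a loaded one.
type ExampleResult = { html: string } | { error: string };

function createPreviewLoader(
  fetchExample: (skillId: string) => Promise<ExampleResult>,
) {
  const inFlight = new Set<string>();
  const results = new Map<string, ExampleResult>();

  async function loadPreview(skillId: string): Promise<void> {
    const cached = results.get(skillId);
    // A successful load is final; an error may be retried.
    if (cached !== undefined && "html" in cached) return;
    // The Set is updated before the first await, so a second caller in
    // the same tick sees the id and exits early.
    if (inFlight.has(skillId)) return;
    inFlight.add(skillId);
    try {
      results.set(skillId, await fetchExample(skillId));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.set(skillId, { error: message });
    } finally {
      // Cleared on both success and failure so Retry can run again.
      inFlight.delete(skillId);
    }
  }

  return { loadPreview, results };
}
```

The guard works because everything up to the first `await` runs synchronously; no asynchronous state update has to land before the second call is rejected.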
kami	2eae7da24b	feat: support Cloudflare Pages custom domains (#851 ) * Support Cloudflare Pages custom domains without hiding pages.dev fallback Keep the default Pages preview as the first public link while optional owned-zone binding provisions DNS and Pages custom-domain state in parallel. Constraint: Cloudflare deploys must use the existing direct-upload API path with no Wrangler dependency. Constraint: pages.dev must stay visible even while custom-domain verification is pending. Rejected: Vercel custom-domain support | outside requested Cloudflare-only scope. Rejected: overwriting arbitrary CNAME records | risks taking over user-managed DNS. Confidence: high Scope-risk: moderate Directive: Do not expose providerMetadata through public deploy contracts; keep custom-domain DNS ownership checks conservative. Tested: pnpm --dir apps/daemon exec vitest run -c vitest.config.ts tests/deploy.test.ts tests/deploy-routes.test.ts Tested: pnpm --filter @open-design/contracts build && pnpm --filter @open-design/contracts typecheck && pnpm --filter @open-design/contracts test Tested: pnpm --filter @open-design/web typecheck && pnpm --filter @open-design/web test -- providers/registry.test.ts components/FileViewer.test.tsx i18n/locales.test.ts Tested: pnpm i18n:check && pnpm guard && pnpm typecheck Tested: pnpm --filter @open-design/daemon build && pnpm --filter @open-design/web build && git diff --check Not-tested: real Cloudflare account/token/domain smoke test * Preserve Cloudflare fallback correctness under large accounts and races Constraint: Cloudflare Pages keeps pages.dev as the primary usable fallback while custom domains remain optional typed metadata. Rejected: Treating custom-domain DNS or binding failure as a top-level deployment failure | pages.dev can still be ready and usable. Confidence: high Scope-risk: moderate Directive: Keep custom-domain finality tied to Cloudflare Pages API active status plus URL reachability; do not expose providerMetadata. 
Tested: pnpm --dir apps/daemon exec vitest run -c vitest.config.ts tests/deploy.test.ts tests/deploy-routes.test.ts; pnpm --filter @open-design/web test -- components/FileViewer.test.tsx i18n/locales.test.ts providers/registry.test.ts; pnpm --filter @open-design/daemon typecheck; pnpm --filter @open-design/web typecheck; pnpm i18n:check; git diff --check; pnpm guard; pnpm typecheck; pnpm --filter @open-design/daemon build; pnpm --filter @open-design/web build Not-tested: Real Cloudflare token/account/zone smoke test. * Keep impeccable design notes local Constraint: .impeccable.md is local assistant/design context and should not be part of the PR diff. Rejected: Keeping the file tracked while adding it to .gitignore | tracked files are not ignored by Git. Confidence: high Scope-risk: narrow Directive: Keep .impeccable.md untracked and ignored; do not rely on it for required project documentation. Tested: git check-ignore -v .impeccable.md; git diff --check Not-tested: Full workspace tests not rerun for ignore-only metadata change.	2026-05-08 11:11:22 +08:00
Nagendhra Madishetti	77824ec029	fix(web): preserve Chat scroll position across Chat/Comments tab switches (#790 ) (#841 ) * fix(web): preserve Chat scroll position across Chat/Comments tab switches (#790) The chat-log <div> in ChatPane is conditionally rendered (the inner `{tab === 'chat' ? <>...</> : null}` branch). When the user switches to Comments and back, the chat-log is unmounted and remounted; the remounted element starts at scrollTop=0, and the initial-bottom-scroll effect skips because didInitialScrollRef.current is already true from the original mount. Result: the conversation view jumps to the top instead of preserving the user's reading position. Replaced the empty-deps scroll listener with a tab-keyed effect that: 1. Captures scrollTop in the existing onScroll handler so the saved position is always current. 2. On every mount of the chat-log (when tab becomes 'chat'), restores the saved scrollTop on the next animation frame so layout finishes before the scroll write lands. The existing scrolledFromBottom signal that drives the jump-to-bottom button is folded into the same handler and now correctly re-attaches on every chat-log remount, fixing a secondary issue where that listener would silently stop firing after a tab toggle. * fix(web): preserve bottom-pinned chat across off-tab streaming and snapshot on unmount Round 1 saved an absolute scrollTop, so a user who left Chat while pinned to the bottom came back above any new messages that streamed in while Comments was open. Save a discriminated state instead: { pinnedToBottom: true } when the user was within 50px of the bottom, otherwise { scrollTop }. On remount, pinned state snaps to the new scrollHeight so bottom-followers stay pinned; non-pinned state restores the absolute offset. Also snapshot the final scroll state in the effect cleanup before removing the listener, so programmatic scrolls or layout shifts right before unmount don't leave the ref stale. 
Adds tests/components/chat-scroll-preservation.test.tsx covering both branches. * fix(web): clear saved chat scroll state on conversation switch The savedChatScrollRef persisted across conversation changes, so switching to Comments while on conversation A and then switching to conversation B would, on returning to Chat, restore A's scrollTop instead of starting fresh at the bottom. Reset the ref alongside didInitialScrollRef when activeConversationId changes. Added a third test covering the cross-conversation case. * fix(web): scroll new conversation to its bottom when conv switch happened off-tab When activeConversationId changed while the user was on the Comments tab, the conversation-reset effect cleared didInitialScrollRef and the saved scroll ref, but the initial-bottom-scroll effect couldn't do anything because logRef.current was null. Returning to Chat then left the new conversation at scrollTop: 0 instead of its initial bottom. Add `tab` to the initial-scroll deps so the effect re-runs when the chat-log remounts, picks up the cleared didInitialScrollRef state, and scrolls the fresh conversation to its scrollHeight. Updated the cross-conversation test to assert the new conversation lands at its bottom (1000), not at scrollTop: 0. * fix(web): resync jump-to-latest button when restoring saved chat scroll position The rAF restore branch wrote scrollTop but never refreshed scrolledFromBottom, so a user who left Chat ~60px from the bottom and returned to find new messages stacked underneath would land hundreds of pixels above the latest turn while the jump-to-latest button stayed hidden until they manually scrolled. Recompute the distance and update scrolledFromBottom inside the restore rAF, mirroring what onScroll already does. Adds a test that asserts the jump-to-latest button is visible immediately after a non-pinned restore over a grown scrollHeight. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-08 11:10:56 +08:00
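The save/restore logic in the chat-scroll commit above can be illustrated with two pure functions. This is a sketch under stated assumptions: the names `captureScroll`, `restoreScrollTop`, and `ScrollMetrics` are hypothetical, and treating exactly 50px as still pinned is a guess; the discriminated `{ pinnedToBottom: true }` versus `{ scrollTop }` state and the snap to the new `scrollHeight` are taken from the commit message.

```typescript
// Saved state as described in the commit: pinned followers versus an absolute offset.
type SavedChatScroll = { pinnedToBottom: true } | { scrollTop: number };

// Hypothetical subset of the element metrics the handler would read.
interface ScrollMetrics {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

const PIN_THRESHOLD_PX = 50; // "within 50px of the bottom"; inclusive bound is assumed

function distanceFromBottom(m: ScrollMetrics): number {
  return m.scrollHeight - m.scrollTop - m.clientHeight;
}

// Called from the scroll handler and again in the effect cleanup.
function captureScroll(m: ScrollMetrics): SavedChatScroll {
  return distanceFromBottom(m) <= PIN_THRESHOLD_PX
    ? { pinnedToBottom: true }
    : { scrollTop: m.scrollTop };
}

// Called on remount: pinned state follows the (possibly grown) scrollHeight,
// everything else restores the absolute offset.
function restoreScrollTop(saved: SavedChatScroll, scrollHeight: number): number {
  return "pinnedToBottom" in saved ? scrollHeight : saved.scrollTop;
}
```

Storing the pinned flag instead of a number is what lets a bottom-follower land on messages that streamed in while the other tab was open.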
shangxinyu1	32df17b87b	Fix desktop preview interactions and connector auth feedback (#864 ) * Fix desktop preview modal interactions * Fix connector auth failures surfacing	2026-05-08 11:05:41 +08:00
Nagendhra Madishetti	8bb9900603	fix(web): scope settings save validation + sanitize payload to active sidebar section (#739 ) (#827 ) The footer Save button's enabled state was computed purely from execution-mode completeness (BYOK requires apiKey + model + valid baseUrl; Local CLI requires a selected available agent). That check ran regardless of which sidebar section the user was on, so a draft mode toggle on the execution section that left required fields empty would lock the Save button across every other section. After clicking BYOK without filling fields and navigating to Language or Appearance, the user could not save unrelated changes in those sections even though they had nothing to do with execution mode. Two paired helpers in apps/web/src/components/SettingsDialog.tsx address this: shouldEnableSettingsSave(cfg, activeSection, agents, isBaseUrlValid) returns true on any section other than 'execution' so unrelated sections do not get blocked by an incomplete execution draft. On 'execution' it keeps the original mode-completeness check unchanged (within-section invariant). sanitizeSettingsSavePayload(cfg, initial, activeSection, agents, isBaseUrlValid) is the counterpart used at the onSave call site. When Save is enabled on a non-execution section but the user's draft execution config is incomplete, it reverts the execution-mode fields (mode, apiKey, apiProtocol, apiVersion, apiProtocolConfigs, apiProviderBaseUrl, baseUrl, model, agentId, agentCliEnv, maxTokens) to their `initial` values so the unrelated section change is committed without leaving the app in a broken execution state. Within the execution section, or when execution is already valid, the cfg passes through unchanged. Both lefarcen and chatgpt-codex flagged this persistence gap on the first revision of this PR; mrcfps marked it blocking. The sanitize helper is the fix lefarcen suggested (revert-to-initial when the active section is not execution and the execution draft is incomplete). 
Tests in apps/web/tests/components/SettingsDialog.test.ts: - shouldEnableSettingsSave: 4 cases (the cross-section fix, daemon mode validity, api mode validity, regression guard for within-execution). - sanitizeSettingsSavePayload: 5 cases (revert path, no-op when execution is valid, no-op on the execution section itself, every non-execution section covered, edge case where the agent registry says unavailable but initial cfg was already valid daemon). Local: web tests 33/33, web typecheck and pnpm guard all clean. Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-08 10:57:12 +08:00
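The pairing of the two helpers in the settings commit above might look roughly like this. The helper names and argument order come from the commit message; the `SettingsConfig` shape is a reduced, assumed subset of the real config (the commit lists more execution fields), and `isExecutionComplete` is a hypothetical internal helper added for the sketch.

```typescript
type Section = "execution" | "language" | "appearance";

interface Agent {
  id: string;
  available: boolean;
}

// Reduced, assumed config shape for illustration only.
interface SettingsConfig {
  mode: "api" | "daemon";
  apiKey: string;
  baseUrl: string;
  model: string;
  agentId: string | null;
  language: string;
}

// Hypothetical helper: BYOK needs apiKey + model + valid baseUrl,
// Local CLI needs a selected agent that is available.
function isExecutionComplete(cfg: SettingsConfig, agents: Agent[], isBaseUrlValid: boolean): boolean {
  if (cfg.mode === "api") {
    return cfg.apiKey.trim() !== "" && cfg.model.trim() !== "" && isBaseUrlValid;
  }
  return agents.some((a) => a.id === cfg.agentId && a.available);
}

function shouldEnableSettingsSave(
  cfg: SettingsConfig, activeSection: Section, agents: Agent[], isBaseUrlValid: boolean,
): boolean {
  if (activeSection !== "execution") return true; // unrelated sections are never blocked
  return isExecutionComplete(cfg, agents, isBaseUrlValid);
}

function sanitizeSettingsSavePayload(
  cfg: SettingsConfig, initial: SettingsConfig, activeSection: Section,
  agents: Agent[], isBaseUrlValid: boolean,
): SettingsConfig {
  if (activeSection === "execution" || isExecutionComplete(cfg, agents, isBaseUrlValid)) {
    return cfg; // pass through unchanged
  }
  // Revert only the execution fields so the unrelated change still commits.
  return {
    ...cfg,
    mode: initial.mode,
    apiKey: initial.apiKey,
    baseUrl: initial.baseUrl,
    model: initial.model,
    agentId: initial.agentId,
  };
}
```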
Tom Huang	56bf6ee1b6	feat: agent-callable research command and /search (#615 ) * feat: pre-generation research (Tavily) for grounded generation Adds an optional pre-generation research step so the agent can produce slides / prototypes / decks grounded in real sources instead of guessing. User flow: 1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY). 2. Click the new Research button in the chat composer. 3. On send, the daemon runs a Tavily search, prepends the findings as a <research_context> block ahead of the system prompt, and spawns the agent. Research progress shows up as status pills in the chat stream; the agent cites sources inline as [1]/[2]/... Phase 1 surface: - Single provider (Tavily), single depth ('shallow'), no LLM synthesis pass (Tavily's `answer` is the summary). - Composer toggle only; no popover / depth picker yet. - Reuses the existing `status` SSE agent payload + StatusPill UI so no new event variants or renderer code are needed. Layers touched: - contracts: ResearchOptions / Source / Findings DTOs; ChatRequest.research; export from index. - daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator + provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook in startChatRun before prompt assembly. - web: ChatComposer toggle + ChatSendMeta; threaded through ChatPane / ProjectView / streamViaDaemon into ChatRequest. Side fix (required to land the feature, but useful on its own): contracts internal relative imports lacked the `.js` suffix that NodeNext module resolution requires. This was already breaking `pnpm --filter @open-design/daemon typecheck` on main; without the fix, none of the new research types were visible to the daemon. All internal contracts imports now carry `.js`. Spec: specs/current/research-feature.md (phases 2-4 outlined for follow-up: composer popover, multi-provider, deep recursion, example skills with research_recommends). 
Verified: - pnpm --filter @open-design/contracts typecheck/test - pnpm --filter @open-design/daemon typecheck (the chokidar project-watchers test is a pre-existing flake, unrelated) - pnpm --filter @open-design/web typecheck - node scripts/verify-media-models.mjs * fix(daemon): clamp Tavily max_results to 20 Tavily's /search endpoint requires `max_results` in [0, 20]; sending a larger value (e.g. when `research.depth: "deep"` resolves to 30) returns 400 and `runResearch` silently falls back to no-research. Clamp at the provider boundary so Phase 2 depth tiers above 20 still produce results instead of failing the request. Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code) * Remove stale research merge leftovers * Add agent-callable research search * Fix Indonesian locale typecheck * Fix research command invocation edge cases * Harden slash search prompt expansion * Honor research source caps in command contract * Require search reports in design files * Add research data provider settings * Wire web research provider fallback order * Update research provider fallback wording * Revert "Update research provider fallback wording" This reverts commit `86fb6001e3`. * Revert "Wire web research provider fallback order" This reverts commit `4c9e16036b`. * Revert "Add research data provider settings" This reverts commit `23630d1746`. * Add Dexter and Last30Days research skills * Add DCF and Last30Days OD skills * Add Last30Days and Dexter skills * Resolve research review threads --------- Co-authored-by: a1chzt <chizblank@gmail.com>	2026-05-08 10:33:44 +08:00
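The `max_results` clamp mentioned in the commit above amounts to a small guard at the provider boundary. The sketch below is illustrative only: `clampMaxResults` and the constant name are hypothetical, and truncating fractions and mapping non-finite input to 0 are assumptions; the [0, 20] range is the one the commit cites.

```typescript
// Upper bound cited in the commit for Tavily's `max_results`.
const TAVILY_MAX_RESULTS_LIMIT = 20;

// Hypothetical helper: keeps any requested count inside [0, 20] so depth tiers
// that resolve above the limit still produce a valid request.
function clampMaxResults(requested: number): number {
  if (!Number.isFinite(requested)) return 0; // assumption: treat NaN/Infinity as "no results"
  const whole = Math.trunc(requested); // assumption: drop fractional counts
  return Math.min(TAVILY_MAX_RESULTS_LIMIT, Math.max(0, whole));
}
```

With this in place, a depth tier that resolves to 30 is sent as 20 rather than triggering the 400 response and the silent no-research fallback.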
shangxinyu1	7107623ee2	test: expand entry and settings automation coverage (#811 ) * test: harden new project panel metadata coverage * test: add settings and connector sync coverage * test: expand entry e2e coverage * test: satisfy exact optional property types in entry connector flow * test: keep entry Playwright coverage under e2e/ui * test: tighten coverage docs and settings test cleanup * test: drop e2e docs from the guarded package * docs: move automation coverage docs out of e2e * test: restore clipboard cleanup without delete * test: match composio save dialog behavior * test: avoid placeholder assertion after composio save * test: expect closeModal on settings saves * test: align settings save assertions with closeModal flags * test: fix settings save mocks * test: align composio replacement hint	2026-05-08 09:30:16 +08:00
Nagendhra Madishetti	d4b547caa7	fix(web): keep saved Composio API key indicator visible while typing a replacement (#741 ) (#751 ) The saved-key badge was wired to `isSavedState = apiKeyConfigured && !hasPendingEdit`, which made it disappear on the first keystroke as soon as the user started typing a draft replacement. Users reading the settings panel saw the saved key indicator vanish before they had clicked Save and reasonably assumed the stored credential had already been overwritten or removed. Credential editing is a high-trust workflow; a UI that fakes a state change before the durable write is the wrong default. Replaced the boolean derivation with a single helper `deriveComposioCredentialState` returning one of `empty | pending-new | saved | saved-pending`. The component now shows the saved-key badge for both `saved` and `saved-pending`, so the indicator stays anchored while the user types. The hint text differentiates all four states so the unsaved-replacement case is still clearly called out. Helper is exported and unit-tested in `apps/web/tests/components/SettingsDialog.test.ts` against the empty, pending-new, saved, and saved-pending states plus the whitespace-only-draft edge case that should still resolve to `saved`. Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-07 21:12:32 +08:00
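The four-state derivation named in the commit above could be sketched as below. The helper name and the four state labels are from the commit message; the two-argument signature and the `showsSavedBadge` companion are assumptions made for illustration, since the commit does not show the real parameters.

```typescript
type ComposioCredentialState = "empty" | "pending-new" | "saved" | "saved-pending";

// Assumed signature: a stored-key flag plus the current draft text.
function deriveComposioCredentialState(
  apiKeyConfigured: boolean,
  draft: string,
): ComposioCredentialState {
  // A whitespace-only draft does not count as a pending edit.
  const hasPendingEdit = draft.trim().length > 0;
  if (apiKeyConfigured) return hasPendingEdit ? "saved-pending" : "saved";
  return hasPendingEdit ? "pending-new" : "empty";
}

// Hypothetical companion: the badge stays visible while a replacement is typed.
function showsSavedBadge(state: ComposioCredentialState): boolean {
  return state === "saved" || state === "saved-pending";
}
```

Deriving one state value instead of a boolean keeps the badge and the hint text from disagreeing about whether a key is stored.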
kami	09eb88f683	Add Cloudflare Pages artifact deployment Adds Cloudflare Pages artifact deployment support.	2026-05-07 20:04:22 +08:00
PerishFire	cb92c93ae0	Migrate beta release publishing to R2 (#805 ) * Prebundle standalone web packaged runtime * Harden mac standalone prebundle policy * Prebundle mac daemon packaged runtime * Prune mac Electron locales * Maximize mac release artifact compression * Publish beta mac artifacts to R2 * Use remote R2 uploads for beta releases * Fail fast on beta R2 access issues * Use S3-compatible uploads for beta R2 releases * Decouple beta versioning from GitHub releases * Remove legacy beta metadata source * Address release beta review notes	2026-05-07 19:13:52 +08:00
Pratik Rai	555dbebfe2	fix(web): add alert when pdf export popup is blocked (#664 ) Fixes PDF export feedback when popup blockers prevent opening the export preview.	2026-05-07 18:39:42 +08:00
Aqil Aziz	fcc37c6c2d	feat(i18n): add Indonesian locale	2026-05-07 16:48:05 +08:00
Tom Huang	38eb78a382	feat(web): add Inspect mode for live per-element style tuning	2026-05-07 16:40:30 +08:00
lefarcen	984797b3cd	fix: add DeepSeek v4 models to catalog (#722 )	2026-05-07 15:55:41 +08:00
shangxinyu1	9b501f12a5	Support overriding the Codex executable path (#755 ) * Support overriding the Codex executable path * Replace save-as-template prompts with an in-app dialog * Seed local packaged app config from workspace * Fix packaged config and connection test overrides * Keep tools-pack mac config seeding self-contained * Require absolute CODEX_BIN overrides	2026-05-07 15:00:52 +08:00
monshunter	e6e5928be1	feat(web): add connection tests for execution settings (#507 ) * feat(settings): add connection test for providers and CLI agents Adds a "Test" action in the Settings dialog that verifies the configured provider (Anthropic/OpenAI/Azure/Google) or CLI agent without sending a real chat. Backed by a new daemon endpoint and shared contracts, with categorized inline statuses and i18n strings across all supported locales. * fix(settings): address connection test review feedback * fix(daemon): pass empty MCP servers for connection probes * fix(connection-test): address review blockers * fix(daemon): fail json stream runs on structured errors * fix(contracts): build connection test subpath export * Use draft CLI env in agent connection tests * fix(i18n): add fallback ids for new curated content	2026-05-07 11:25:37 +08:00
Marc Chan	5a29fed7d3	fix(web): align design system default test fixture (#708 )	2026-05-07 09:07:28 +08:00
Sid	4c82e48e4f	fix web design system selection persistence (#621 )	2026-05-07 02:27:00 +08:00
Feroomon2010	576dfed9e1	feat: add accent color control and launcher for Open Design (#683 ) * feat: add accent color control and launcher for Open Design * fix: remove launcher binary from PR * test: cover accent appearance edge cases --------- Co-authored-by: ferasbusiness666 <ferasbusiness666@users.noreply.github.com>	2026-05-06 23:14:21 +08:00
shangxinyu1	8301bcd46e	test: add desktop settings and project flow e2e coverage (#306 ) * test: add desktop settings regression coverage * test: stabilize desktop smoke interactions on latest main * fixer: address PR #306 follow-up items Generated-By: looper 0.2.7 (runner=fixer, agent=codex) * test: expand ui e2e automation suite * fix: add missing Ukrainian prompt template labels * chore: align desktop e2e helpers with layout guard * chore: move settings protocol e2e into ui suite * fix: preserve api provider settings across protocol switches * fix: avoid leaking api keys across protocol drafts * test: fold desktop smoke coverage into mac spec * fix: dedupe Ukrainian prompt template labels	2026-05-06 21:48:12 +08:00
zztdan	f3024fdc22	feat(media): add Nano Banana image provider (#631 ) * feat(media): add Nano Banana image provider * fix(media): support Gemini API key headers for Nano Banana * refactor(media): move Nano Banana model override flag into provider metadata	2026-05-06 20:26:31 +08:00
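mamba	570d06419c	feat[qoder cli] add Qoder CLI agent support (#626 ) * chore(agent): add support for and detection of Qoder CLI - Add Qoder CLI to the QUICKSTART docs as an optional local agent CLI - Update the agents.ts code comments to include Qoder CLI scan support - Add Qoder CLI to the list of available CLIs detected on first load - Add Qoder CLI support and related badge counts to the README in multiple languages - Update code comments and docs for the agent adapter and event parsing, including the qoder-stream-json parser - Adjust spawn behavior on Windows so Qoder CLI receives the prompt via stdin - Fix incorrect supported-CLI counts in the multilingual docs so the numbers stay in sync Change-Id: I388f2f61c60ce8faa7cef5d84eb407950f8bdbfb Co-developed-by: Qoder <noreply@qoder.com> * chore(agent): add support for and detection of Qoder CLI - Add Qoder CLI to the QUICKSTART docs as an optional local agent CLI - Update the agents.ts code comments to include Qoder CLI scan support - Add Qoder CLI to the list of available CLIs detected on first load - Add Qoder CLI support and related badge counts to the README in multiple languages - Update code comments and docs for the agent adapter and event parsing, including the qoder-stream-json parser - Adjust spawn behavior on Windows so Qoder CLI receives the prompt via stdin - Fix incorrect supported-CLI counts in the multilingual docs so the numbers stay in sync Change-Id: Id33f125b7c0b1a1c0b0274073da74d1578c324f7 Co-developed-by: Qoder <noreply@qoder.com> * feat(agent-icon): add a new Qoder logo SVG glyph component - Add a qoderGlyph function that returns an SVG graphic at the specified size - The graphic comprises multiple path definitions, filled in dark gray and green - The component can replace or supplement the existing AgentIcon icons - Improves the application's brand identity and visual presentation Change-Id: I4eca18166b5e33bc6229b40b2531d5a54607a560 Co-developed-by: Qoder <noreply@qoder.com> * Translate to English: --- docs(readme): update to expand CLI agents to 16 - Increased the number of coding agent CLIs from 11 to 16 - New agents included: Devin for Terminal, Kiro CLI, Kilo, Mistral Vibe CLI, DeepSeek TUI docs(readme): update to expand supported coding agents to 16 - Increased the number of supported code agent CLIs from 11 to 16 - Added support for new CLI tools: Devin for Terminal, Kiro CLI, Kilo, Mistral Vibe CLI, DeepSeek CLI - Added automatic CLI detection and switching while maintaining support for more agents - Added BYOK proxy TUI - Expanded compatibility and support coverage in the README’s multiple language versions - Reflected changes across all README translations (Arabic, German, French, Japanese, Korean) - Updated badges and descriptions to reflect CLI count and 
feature changes - Added event parsers and protocols for the new CLIs in the agent transport implementation - Updated the BYOK proxy and tool exploration features to be compatible with the expanded CLIs Change-Id: I89786b4a0b09bd279fb23265c2177076206fc5af Co-developed-by: Qoder <noreply@qoder.com> * feat(daemon): support passing imagePaths to Qoder as attachment paths - Update buildArgs to add --attachment arguments for absolute paths in imagePaths - Filter out and ignore imagePaths entries that are empty strings, non-strings, or relative paths - Cover imagePaths support and invalid-entry filtering in unit tests - Document the Qoder runtime adapter's --attachment argument Change-Id: Ibfc3583ba86c6d258d524912559e97b77bf1dc87 Co-developed-by: Qoder <noreply@qoder.com> * docs(runtime): document that the Qoder adapter inherits the user-token environment variable - Document that agent detection is only an availability probe and performs no authentication - Explain that Qoder CLI account state is independent and that authentication failures surface through the runtime error path - Describe the child-process environment inheritance mechanism and the distinction between static environment variables and private user tokens - Clarify that QODER_PERSONAL_ACCESS_TOKEN is passed through the daemon environment and is not written to the static environment - Explain that Qoder authentication is handled by Qoder CLI, which supports persistent login and environment-variable injection for automation test(agent): add QODER_PERSONAL_ACCESS_TOKEN environment inheritance tests - Verify that the qoder adapter environment inherits QODER_PERSONAL_ACCESS_TOKEN from the daemon - Confirm that the qoder adapter does not define the user token in its static environment variables - Ensure the private user token is never written to the static adapter environment configuration Change-Id: Ie61869afbe497df1b16879b4e47b35123f758ed8 Co-developed-by: Qoder <noreply@qoder.com> * fix(daemon): improve Qoder mode support and error handling - Update the Qoder CLI arguments to use `--yolo` instead of `--permission-mode bypass_permissions` - Change the working-directory argument from `--cwd` to `-w` to match the Qoder docs - Capture errors in agent stream event handling and send them as SSE error events - Mark the run as failed if an agent stream error was detected by the time it ends - In tests, fix(daemon): refine Qoder agent arguments and error handling - Switch the Qoder launch arguments to `--yolo` and `-w` in place of the old ones, avoiding the argv length limit - Strengthen agent stream event handling to capture errors and, through the SSE error channel, update Qoder argument usage and the corresponding assertions in sync - Add end-to-end tests covering Qoder assistant errors being reported through the SSE error channel and the run being marked failed - Add utility functions to help tests read the event stream and poll run status Change-Id: I5d933745c3659e093b0d2d807f22726e7f83eb48 Co-developed-by: Qoder <noreply@qoder.com> * feat(qoder-stream): detect and report Qoder run error events - Add a messageFromResult function to extract the error message from the result object - Emit an error event based on the is_error field when handling result events - The error event carries the specific error message and the raw data - Add a test verifying that an error event is emitted correctly when a Qoder run returns is_error with exit code 0 - Update the qoder stream parsing tests to check the error event mapping - Add an end-to-end scenario for failed Qoder runs to the chat route tests Change-Id: Ie98ac518135dbec3181c52de5a49afdea993e279 Co-developed-by: Qoder <noreply@qoder.com>	2026-05-06 19:54:03 +08:00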
PerishFire	f1cdb2844a	test(e2e): gate beta packaged runtime (#637 ) * test(e2e): gate beta mac packaged runtime * test(e2e): separate ui automation layout * test(e2e): move localized content coverage * chore(release): prepare packaged 0.4.1 beta validation * test(e2e): keep ui lane playwright-only * fix(web): keep chat recoverable after conversation load failure * fix(desktop): honor native mac quit	2026-05-06 17:44:29 +08:00
Caprika	8eb9b1b506	Implement manual edit mode (#620 )	2026-05-06 16:13:52 +08:00
Sid	33255a8fdf	Fix agent CLI config and workspace focus mode (#604 ) * fix agent CLI config and workspace focus mode * address CLI env review follow-ups	2026-05-06 16:06:56 +08:00
nettee	8762f06297	Add i18n structure checks (#608 )	2026-05-06 11:55:59 +08:00
Kadu Maverick	2036ce0a8e	feat(web): add Cmd/Ctrl+P quick file switcher (#556 ) * feat(web): add Cmd/Ctrl+P quick file switcher A keyboard-driven file palette overlaid on the workspace. Press Cmd/Ctrl+P anywhere in the project view; type to fuzzy-filter the file list, ↑/↓ to navigate, Enter to open in a tab, Esc to dismiss. With an empty query the palette surfaces recents (per-project, localStorage) followed by the rest of the file list sorted by mtime. Adds: - apps/web/src/components/QuickSwitcher.tsx: palette UI and matcher - apps/web/src/quickSwitcherRecents.ts: per-project recents store - index.css: scoped .qs-* styles using existing design tokens - i18n: 6 new keys translated across all 16 locale files Wires into FileWorkspace's existing openFile() so recents and tab state behave identically to opening from DesignFilesPanel. Capture-phase keydown beats the browser's print dialog. No backend changes; uses the files prop already passed to FileWorkspace. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(web): address QuickSwitcher review feedback Three fixes from the PR review: - z-index: bump .qs-overlay from 200 to 1500 so the palette renders in the modal tier (alongside prompt-template-modal-overlay) instead of behind context menus and popovers (which sit at 200). - Arrow-key guard: skip setCursor when matches is empty. Without this, pressing ↓ on a no-results query set the cursor to -1, making the highlight selector miss every row on the next render. - Tests: add 19 unit tests covering scoreMatch ranking tiers, render output (empty state / row count / kbd hints / placeholder), and the full recents lifecycle (cap at 6, dedupe-on-push, corrupt-JSON recovery, per-project scoping, quota-exceeded no-op). Vitest stays on the node env via a small in-memory localStorage stub. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(web): QuickSwitcher review — wrap, IME, platform gate Three follow-ups from @mrcfps's review on #556: - ArrowUp/ArrowDown now wrap at list bounds (last → first, first → last) via modulo arithmetic in a new pure helper `nextCursor(current, total, direction)`. Previously they clamped, which contradicted the wrap behavior the PR test plan promised. Pulled into a pure function so boundary cases are unit-testable without simulating keyboard events. - Palette's onKeyDown now early-returns on `e.nativeEvent.isComposing`, so users typing CJK file names through an IME keep ↑/↓/Enter for candidate navigation instead of having them steered by the palette. The global Cmd/Ctrl+P opener already had the equivalent guard. - Global keydown is now platform-gated: macOS responds only to metaKey, win/linux only to ctrlKey. Previously both fired everywhere, which meant Ctrl+P on macOS was stealing native readline "previous line" in text fields (and the chat composer). Tests: +6 unit tests for `nextCursor` covering forward/backward wrap, mid-list moves, empty list (no division-by-zero), and single-item no-op. Suite now 258 passing (up from 252). Verified live: ↓ from last row → first row; ↑ from first row → last row, in a mocked-project Playwright harness. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 10:31:50 +08:00
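The wrap-around cursor helper described in the QuickSwitcher commit above can be sketched as a pure function. The name and parameter order `nextCursor(current, total, direction)` come from the commit message; representing `direction` as `1 | -1` and returning the cursor unchanged for an empty list are assumptions for this example.

```typescript
// Moves the highlighted row by one step and wraps at the list bounds.
// direction: 1 for ArrowDown, -1 for ArrowUp (assumed encoding).
function nextCursor(current: number, total: number, direction: 1 | -1): number {
  if (total <= 0) return current; // empty list: nothing to move to, no division by zero
  // Adding `total` before the modulo keeps the result non-negative when stepping up from 0.
  return (current + direction + total) % total;
}
```

For a three-item list, stepping down from index 2 yields 0 and stepping up from index 0 yields 2; a single-item list always stays at 0.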
soulme	6b7a40e5c3	Fix file tab wheel scrolling (#549 )	2026-05-05 23:23:48 +08:00
Martin Atrin	79fcaef129	Add Tweaks mode for HTML previews with picker, pod selection, and batched chat attachments (#513 ) * Add tweaks mode for HTML preview comments * Fix tweaks geometry and restore critique migration * Harden tweaks mode reload sync * Guard tweaks batch sends during active runs --------- Co-authored-by: puma <puma@pumas-MacBook-Air.local>	2026-05-05 21:09:20 +08:00
Marc Chan	c3d9136a0c	Add live artifacts and Composio connector catalog (#381 ) * docs: add live artifacts implementation spec * docs: align live artifacts implementation plan * Ralph iteration 1: work in progress * Ralph iteration 2: work in progress * Ralph iteration 3: work in progress * Ralph iteration 4: work in progress * Ralph iteration 5: work in progress * Ralph iteration 6: work in progress * Ralph iteration 7: work in progress * Ralph iteration 8: work in progress * Ralph iteration 9: work in progress * Ralph iteration 10: work in progress * Ralph iteration 11: work in progress * Ralph iteration 12: work in progress * Ralph iteration 13: work in progress * Ralph iteration 14: work in progress * Ralph iteration 15: work in progress * Ralph iteration 16: work in progress * Ralph iteration 17: work in progress * Ralph iteration 18: work in progress * Ralph iteration 19: work in progress * Ralph iteration 20: work in progress * Ralph iteration 21: work in progress * Ralph iteration 22: work in progress * Ralph iteration 23: work in progress * Ralph iteration 24: work in progress * Ralph iteration 25: work in progress * Ralph iteration 26: work in progress * Ralph iteration 27: work in progress * Ralph iteration 28: work in progress * Ralph iteration 29: work in progress * Ralph iteration 30: work in progress * Ralph iteration 31: work in progress * Ralph iteration 32: work in progress * Ralph iteration 33: work in progress * Ralph iteration 34: work in progress * Ralph iteration 35: work in progress * Ralph iteration 36: work in progress * Ralph iteration 37: work in progress * Ralph iteration 38: work in progress * Ralph iteration 39: work in progress * Ralph iteration 40: work in progress * Ralph iteration 41: work in progress * Ralph iteration 42: work in progress * Ralph iteration 43: work in progress * Ralph iteration 44: work in progress * Ralph iteration 45: work in progress * Ralph iteration 46: work in progress * Ralph iteration 47: work in progress 
* Ralph iteration 48: work in progress * Ralph iteration 49: work in progress * Ralph iteration 50: work in progress * Ralph iteration 51: work in progress * Ralph iteration 52: work in progress * Ralph iteration 53: work in progress * Ralph iteration 54: work in progress * Ralph iteration 55: work in progress * Ralph iteration 56: work in progress * Ralph iteration 57: work in progress * Ralph iteration 58: work in progress * Ralph iteration 59: work in progress * Ralph iteration 60: work in progress * Ralph iteration 61: work in progress * Ralph iteration 62: work in progress * Ralph iteration 63: work in progress * Ralph iteration 64: work in progress * Ralph iteration 65: work in progress * Ralph iteration 1: work in progress * Ralph iteration 2: work in progress * Ralph iteration 3: work in progress * Ralph iteration 4: work in progress * Ralph iteration 5: work in progress * Ralph iteration 6: work in progress * Ralph iteration 8: work in progress * Ralph iteration 9: work in progress * Ralph iteration 17: work in progress * Add Composio-backed connectors * Add Composio-backed connector catalog * Fix connector callback flow * Update live artifact connector refresh * Fix live artifact refresh updates * Improve live artifact viewer toolbar * Refine live artifact source tabs * Expand Composio connector catalog * Improve Composio connector browsing * Fix artifact refresh source safety checks Generated-By: looper 0.4.1 (runner=fixer, agent=opencode) * Fix live artifacts PR feedback Generated-By: looper 0.5.0 (runner=fixer, agent=opencode) * Fix live artifact preview CORS validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix connector OAuth IPv6 loopback hosts Allow bracketed IPv6 loopback Host headers when deriving connector OAuth callback URLs so IPv6-bound daemons can complete connection flow. 
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Preserve live artifact refresh permissions Respect explicit refresh permission choices during live artifact create and update flows so revoked connector sources remain gated. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix live artifact preview cache freshness Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix live artifact refresh validation Guard manual refreshes with local daemon checks and reject daemon_tool sources without a toolName before refresh execution. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix Composio credential invalidation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix live artifact CORS methods Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix workspace validation Restore media config test isolation under Vitest setup data-dir overrides and add the missing French live artifact display copy so the workspace test suite stays aligned.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix connector safety filtering Keep agent-preview connector listings aligned with execution safety policy and prune stale Composio OAuth state records before they accumulate. 
Generated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix agent runtime cleanup Generated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix live artifact daemon access Validate local-only live artifact routes against the peer socket address and pass daemon-resolved CLI paths to ACP MCP descriptors.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix connector run limit pruning Evict stale connector rate-limit buckets so long-lived daemon processes do not retain per-run entries indefinitely.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix connector compact schemas Generated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Improve connector connection feedback * Adjust connector gate positioning * Fix live artifact refresh commits Avoid marking refresh candidates failed after snapshot or state persistence errors by deferring live artifact mutations until the durable refresh metadata is written. Also align connector OAuth callback host validation with daemon loopback handling.\n\nGenerated-By: looper 0.5.4 (runner=fixer, agent=opencode) * Improve connector search relevance * fix(daemon): harden connector connection state Require loopback daemon validation before connector connect side effects and only clear provider-owned connector statuses during credential reset. Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): guard connector disconnect route Require local daemon request validation before connector disconnect side effects. 
Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): guard composio config updates Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): dispatch live artifacts mcp first Route the live-artifacts MCP server before the generic MCP CLI so od mcp live-artifacts starts the dedicated server instead of failing generic argument parsing.\n\nGenerated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): handle integer connector schemas Allow JSON Schema integer connector inputs while preserving fractional-value validation so generated connector tool schemas accept valid page sizes and limits. Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix: align live artifact refresh error codes Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * Fix live artifact connector refresh flow * Update live artifact design cards * Add beta badge to live artifact form * Remove live artifact tile model * Fix live artifact refresh sync * Fix live artifact MCP refresh durability Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * Fix live artifact refresh safety Enforce persisted refresh opt-out and connector auto-read gating before refresh sources execute. Generated-By: looper 0.5.5 (runner=fixer, agent=opencode)	2026-05-05 16:42:11 +08:00
PerishFire	bbdd4e84b5	chore: enforce test directory conventions (#496 ) * chore: enforce test directory conventions Move package, app, and tool tests out of src and add guard enforcement so source directories stay source-only. * ci: use guard and package-scoped tests Run the new repository guard in CI and keep test execution aligned with package-scoped commands after removing root aliases. * ci: align stable release guard check Use the new repository guard in stable release verification after replacing the residual-JS-only script. * chore: tighten test layout enforcement Enforce sibling tests directories, typecheck moved test suites with dedicated configs, and refresh remaining guidance that pointed at src-based tests. * chore: clarify no-emit test tsconfigs Explicitly disable declaration-only emit in test tsconfigs so review tooling sees they are no-emit typecheck configs.	2026-05-05 15:34:22 +08:00

439 commits