open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
open-design-bot[bot]	21e7522574	docs(blog): refresh daily indexing status (#2502 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-21 14:35:50 +08:00
open-design-bot[bot]	b7025c4a78	docs(blog): refresh 3-day traffic digest (#2503 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-21 11:50:47 +08:00
open-design-bot[bot]	5bf128ebdc	Update docs/assets/github-metrics.svg (#2493 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-21 11:46:41 +08:00
open-design-bot[bot]	7905e72962	docs(blog): refresh 3-day traffic digest (#2349 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-20 18:00:22 +08:00
shangxinyu1	71044bd3d6	test(e2e): harden extended coverage state assertions (#2245 ) * test(e2e): harden extended coverage contracts * docs(testing): add e2e hardening status * fix(web): persist artifact chips after daemon runs * ci: install playwright browsers for e2e vitest * Fix daemon run recovery across reloads Pin daemon-created runs to assistant messages immediately so hard reloads before the create response can reattach. Replay terminal and active run events from the beginning on reload so restored turns keep assistant text, thinking events, produced files, and artifacts. Fixes #2366 Fixes #2368 Fixes #2371 * test(e2e): preserve fake runtime selection across reload * fix(web): scope daemon run recovery to daemon mode * fix(e2e): remove duplicate delayed smoke flag * fix(web): scope replay artifact recovery to current run * fix(daemon): remove duplicate run-create pin	2026-05-20 16:21:01 +08:00
ashleyashli	65e760b88a	feat(seo): add GSC report opportunities (#2388 ) Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 16:19:14 +08:00
open-design-bot[bot]	b409e5b923	docs(blog): refresh daily indexing status (#2346 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-20 16:13:16 +08:00
ashleyashli	9122a38080	fix(seo): include fresh GSC data in daily report (#2343 ) Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 12:22:15 +08:00
open-design-bot[bot]	a89aa56db3	Update docs/assets/github-metrics.svg (#2312 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-20 10:41:27 +08:00
open-design-bot[bot]	6a6959ed30	docs(blog): refresh indexing status after deploy (#2250 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-19 18:25:55 +08:00
ashleyashli	07659b7272	feat(seo): add Search Console reporting workflows (#2229 ) * feat(blog): daily 3-day Search Console traffic digest Adds `blog-3day-report.yml` (cron 09:00 Asia/Shanghai) and a companion `report-3day.ts` script that refreshes `docs/blog-traffic-digest.md` once per day. The digest has two sections: - T-3 spotlight: posts published exactly three days ago, with their 3-day Search Analytics window plus current URL Inspection coverage state. - Rolling 30-day cohort: every post 1–30 days old with its latest 3-day Search Analytics window, sorted by impressions descending. The workflow is read-only against Google APIs (no Indexing API, no "request indexing" automation) and mirrors the secret / config plumbing already used by `blog-indexing-monitor.yml`. Output lands in a reviewable `automation/blog-traffic-digest` PR opened by the open-design bot. Also widens `querySearchAnalytics` to accept `windowDays: 3 \| 7 \| 28` and updates `docs/blog-indexing-automation.md` with the new pipeline. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(seo): post daily Search Console report to Feishu Co-authored-by: Cursor <cursoragent@cursor.com> * feat(blog): push traffic digest to Feishu Emit a compact JSON summary from the daily 3-day traffic digest and add a Feishu custom bot sender for the summary card. Wire the workflow to send the card when `FEISHU_BLOG_DIGEST_WEBHOOK` is configured while keeping Markdown PR output as the source of truth. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(landing-page): add Discord routing CTAs Add a lightweight Discord pill to the landing hero and Discord links in the landing and blog footers so community routing is visible without displacing the primary GitHub and download CTAs. Add a blog-ending conversion card that points guide and use-case readers to the internal workflows library, while keeping Discord as a secondary support path. 
Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 18:09:44 +08:00
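The two digest sections described in #2229 (T-3 spotlight and rolling 30-day cohort) could be selected roughly as follows. This is a minimal sketch only: `PostStats`, `ageInDays`, and `buildDigest` are hypothetical names, and the real `report-3day.ts` is not shown in this log and may work differently.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not taken from report-3day.ts.
interface PostStats {
  url: string;
  publishedAt: string; // date-only ISO string, parsed as UTC midnight
  impressions: number; // impressions in the latest 3-day Search Analytics window
}

const DAY_MS = 24 * 60 * 60 * 1000;
const SPOTLIGHT_AGE_DAYS = 3; // "published exactly three days ago"

// Whole days between the publish date and the report date.
function ageInDays(publishedAt: string, today: string): number {
  return Math.round((Date.parse(today) - Date.parse(publishedAt)) / DAY_MS);
}

function buildDigest(posts: PostStats[], today: string) {
  // T-3 spotlight: posts that are exactly three days old.
  const spotlight = posts.filter(
    (p) => ageInDays(p.publishedAt, today) === SPOTLIGHT_AGE_DAYS,
  );
  // Rolling cohort: posts 1 to 30 days old, sorted by impressions descending.
  const cohort = posts
    .filter((p) => {
      const age = ageInDays(p.publishedAt, today);
      return age >= 1 && age <= 30;
    })
    .sort((a, b) => b.impressions - a.impressions);
  return { spotlight, cohort };
}

// Made-up sample data, for illustration only.
const digest = buildDigest(
  [
    { url: "/blog/a", publishedAt: "2026-05-16", impressions: 40 },
    { url: "/blog/b", publishedAt: "2026-05-18", impressions: 90 },
    { url: "/blog/c", publishedAt: "2026-04-01", impressions: 500 },
    { url: "/blog/d", publishedAt: "2026-05-19", impressions: 10 },
  ],
  "2026-05-19",
);
console.log(digest.spotlight.map((p) => p.url), digest.cohort.map((p) => p.url));
```

In this sample, `/blog/c` is older than 30 days and `/blog/d` was published on the report date, so neither appears in the cohort.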
nettee	320bd4c303	docs: document ACP runtime integration (#2232 ) * docs: document ACP runtime integration * docs: link ACP runtime guide from entrypoints Generated-By: looper 0.7.5 (runner=fixer, agent=opencode, model=openai/gpt-5.4)	2026-05-19 18:03:10 +08:00
Tom Huang	86ec951fb9	[codex] Add automation templates and proposal workflows (#2193 ) * feat(web): introduce Automations tab with dual-track capability for routines This commit adds a new Automations tab that consolidates routines, schedules, and live artifacts, allowing users to manage automations seamlessly. The tab features a modal for creating and editing automations, which supports various scheduling options (hourly, daily, weekdays, weekly) and project modes (create_each_run, reuse). The CLI is also updated to expose automation commands, ensuring consistency between the web UI and CLI interfaces. Key changes include: - New `NewAutomationModal` component for automation creation and editing. - Updated `TasksView` to integrate the new Automations functionality. - Enhanced styling for the Automations tab to improve user experience. This implementation aligns with the dual-track capability exposure policy, ensuring all features are accessible via both the web UI and CLI. * feat(daemon): enhance automation context handling and CLI commands This commit introduces several improvements to the automation context management and updates the CLI commands accordingly. Key changes include: - Added support for new context fields (`plugin`, `mcp`, `connector`) in automation commands. - Updated the CLI to reflect new target options (`new-project`). - Enhanced error messages for invalid target inputs. - Introduced functions to handle context selection and normalization for routines, including the ability to parse and store context data in the database. - Updated the database schema to include a new `context_json` field for routines. - Improved the handling of context in routine routes and the web interface, ensuring that selected contexts are properly managed and displayed. These changes aim to provide a more robust and flexible automation experience, aligning with the recent enhancements in the web UI. 
* feat(web): enhance TasksView with automation run history and status indicators This commit introduces several new features to the TasksView component, including: - Added functionality to display automation run history for each routine, showing metadata such as status, timestamps, and project details. - Implemented status indicators for routine runs, providing visual feedback on their current state (succeeded, failed, running, queued). - Enhanced the UI to allow users to expand and view detailed run history, including the ability to open the corresponding project conversation. - Updated styles to improve the presentation of automation statuses and history. These changes aim to provide users with better insights into their automation routines and improve overall usability. * feat(daemon): implement automation ingestion and proposal management This commit introduces several new features related to automation ingestion and proposal management within the daemon. Key changes include: - Added new modules for handling automation source packets and proposals, allowing for the storage, retrieval, and management of automation-related data. - Implemented functions to list, create, and apply automation proposals, enhancing the automation workflow. - Introduced new CLI commands for interacting with memory entries and automation sources, providing users with more control over their automation processes. - Enhanced the server routes to support automation source and proposal APIs, enabling seamless integration with the existing system. These changes aim to improve the overall automation experience, making it easier for users to manage and utilize automation proposals and ingestions effectively.	2026-05-19 16:35:28 +08:00
ashleyashli	5d28f1c19d	Add community and use case blog posts (#2103 ) * Add community and use case blog posts Co-authored-by: Cursor <cursoragent@cursor.com> * Fix plugin workflow example path Co-authored-by: Cursor <cursoragent@cursor.com> * Fix plugin workflow commands and CTA Co-authored-by: Cursor <cursoragent@cursor.com> * Reduce blog topics diff noise Co-authored-by: Cursor <cursoragent@cursor.com> * Align plugin publishing docs with CLI Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-19 14:05:16 +08:00
open-design-bot[bot]	226bc57471	Update docs/assets/github-metrics.svg (#2145 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-19 13:50:05 +08:00
mec dot	76561964a2	docs: add Docker deployment guide with verification screenshots (#1851 )	2026-05-18 18:00:20 +08:00
Caprika	832bdeb535	Centralize daemon startup (#2054 )	2026-05-18 17:08:17 +08:00
open-design-bot[bot]	281c2273d1	Update docs/assets/github-metrics.svg (#2010 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-18 11:05:26 +08:00
open-design-bot[bot]	5b7a2fdb2f	Update docs/assets/github-metrics.svg (#1931 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-17 11:25:46 +08:00
open-design-bot[bot]	cfd361af3a	Update docs/assets/github-metrics.svg (#1915 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-16 22:45:00 +08:00
lefarcen	7d1adf9fd7	docs: point 0.8.0 preview contributors at main (#1846 ) * docs: point 0.8.0 preview contributors at main, not preview/v0.8.0 0.8.0 has been merged into main (#1832). Anywhere we used to tell contributors to checkout / PR against preview/v0.8.0 was actively mis-routing new PRs. Update: - docs/preview-v0.8.0-announcement.md + zh-CN: status line, Branch row, source-build checkout, and 'open a PR against' guidance now point at main - .github/ISSUE_TEMPLATE/bug-report.yml + feature-request.yml: phrase the 'use the preview template' nudge as 'about the 0.8.0 preview features (now on main)' instead of 'about the preview/v0.8.0 branch' - .github/ISSUE_TEMPLATE/config.yml: same rewording for the contact link - .github/ISSUE_TEMPLATE/preview-v0.8.0-feedback.yml: refresh the description and the intro body so it reads as 'preview features pre-tag', not 'features pre-merge' The preview-v0.8.0-feedback template and preview/v0.8.0 label are intentionally kept: 0.8.0 isn't tagged yet, so we still want a dedicated lane for preview-features feedback. * chore: stop treating preview/v0.8.0 as a live branch Earlier in this PR we kept the preview-v0.8.0 surface area intact — that was the wrong call. 0.8.0 is now on main; pretending there's a parallel 'preview' branch in the templates, labels, and copy was going to keep mis-routing contributors. Drop: - .github/ISSUE_TEMPLATE/preview-v0.8.0-feedback.yml (the dedicated template that auto-applied the preview/v0.8.0 label and prefix) - .github/ISSUE_TEMPLATE/config.yml contact_links entry pointing at it - bug-report.yml + feature-request.yml nudges that sent users there - The Preview-v0.8.0-feedback link block from both announcement docs (replaced with normal bug-report / feature-request links) Rename: - docs/preview-v0.8.0-announcement.{md,zh-CN.md} -> docs/v0.8.0-announcement.{md,zh-CN.md} so the on-disk doc title reads as a 0.8.0 announcement, not a branch-specific one. No other repo file referenced the old paths. 
The preview/v0.8.0 label and branch themselves are intentionally untouched — those are separate ops the maintainer will decide on later. This PR only removes mentions inside the repo. * chore: keep 0.8.0 preview-feedback template as a chooser-level ad The previous commit deleted preview-v0.8.0-feedback.yml entirely. Bring it back, but reframe it: it's now the dedicated 0.8.0 lane in the issue chooser — a high-visibility surface that tells visitors "0.8.0 is here as a preview, please share what you noticed." - Renamed in the chooser to "Open Design 0.8.0 — preview feedback" - Title prefix shortened from "[preview/v0.8.0] " to "[0.8.0] " so the branch slug no longer leaks into issue titles - label preview/v0.8.0 still auto-applied (the label entity is still in use across 26 issues; maintainer will decide on its fate separately) - Area dropdown widened from "Skills + Automations" to cover the actual 0.8.0 surface (plugins, headless, agent flow, desktop shell) - Intro body rewritten to read as a preview-release ad, not a feature-branch tester request Announcement docs (English + Chinese) also routed their "open an issue" CTA back through this template instead of the generic bug-report / feature-request links — same advertising goal.	2026-05-15 22:37:04 +08:00
lefarcen	e40399d39a	Merge pull request #1832 from nexu-io/sync/main-into-preview-v0.8.0 Release preview/v0.8.0 into main	2026-05-15 20:44:27 +08:00
ashleyashli	772ef97476	feat(landing): automate blog indexing monitoring (#1825 ) * feat(landing): add blog indexing automation Automate supported blog discovery checks through sitemap submission, URL Inspection monitoring, IndexNow notifications, and guarded SEO CI checks. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(landing): support oauth for blog indexing Use OAuth refresh-token auth as the preferred Search Console path while keeping service-account auth as a fallback, so the indexing workflows can run despite GSC service-account invite issues. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(landing): tighten blog indexing observability Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ashley li <ashleyli@ashleydeMacBook-Air-2.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-15 18:32:30 +08:00
lefarcen	22a3b99a47	Merge origin/main into preview/v0.8.0 Sync 49 commits from main. Conflicts resolved: - .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's linux specs + release-stable.yml + release-preview.yml triggers - .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder - apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText - apps/web/src/components/ChatPane.tsx: kept both new imports - apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks - e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's inline dialog navigation (UI was redesigned in v0.8.0) - nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh	2026-05-15 18:23:33 +08:00
Olin Hendershot	74637f1cb5	Add Linux packaged client parity smoke coverage (#1204 ) * docs: plan linux client issue 709 * fix: complete linux headless lifecycle routing * feat: add linux packaged inspect * test: add linux headless packaged smoke * ci: add linux headless packaged smoke * ci: smoke linux AppImage release artifacts * docs: document linux packaged client status * chore: finalize linux client audit remediation * docs: add linux client publication packet * test: harden linux client smoke coverage * ci: preserve linux smoke audit evidence * refactor: consolidate linux e2e helpers Move pathExists and the desktop/web/daemon app-key array out of linux.spec.ts into linux-helpers.ts, where expectPathInside and linuxUserHome already live. Keeps the spec file focused on tests and the helpers file as the canonical home for shared Linux e2e utilities. * fix: move linux e2e helpers to lib * fix: address linux release review blockers * fix: drop npm dependency from containerized linux build writeAssembledApp() previously called runNpmInstall() which executed `npm install` directly. Inside the containerized build path, electronuserland/builder:base strips npm/npx/corepack, so the inner tools-pack build would fail at the assembled-app install step. Route the install through OD_TOOLS_PACK_PNPM_BIN: buildDockerArgs sets the env to the standalone pnpm binary it bootstraps, and the new resolveProductionInstallCommand helper consumes that env to run `<bin> install --prod --no-lockfile --config.node-linker=hoisted`. Host invocations with no env set keep the prior npm behavior. --config.node-linker=hoisted preserves the flat node_modules layout that electron-builder packs the same way as npm-installed trees. New tests cover the resolver branches and assert the docker-arg-to- resolver chain end-to-end so reviewers can see the container's inner build receives the env that switches its install away from npm. 
* fix: harden linux container bootstrap * fix: validate desktop marker liveness in headless cleanup cleanup --headless previously skipped on any parseable desktop-root.json, trapping recovery when the AppImage had crashed and left a stale marker. Validate the marker the same way stopPackedLinuxApp does: if the PID is not in the live snapshot list, proceed through cleanup instead of skipping. Extract the validation into validateDesktopAppImageMarker so the stop and cleanup paths share one definition of live and owned. Tests cover both branches: a stale marker drives cleanup to remove the runtime/output roots, while a live marker drives cleanup to skip and preserve them.	2026-05-15 16:38:29 +08:00
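The stale-marker rule from the last fix in #1204 can be sketched like this. It models only the liveness half of the check (the commit also mentions ownership, which is omitted here). `DesktopMarker`, `ProcessSnapshot`, `isMarkerLive`, and `shouldSkipHeadlessCleanup` are hypothetical names, not the signatures of `validateDesktopAppImageMarker` or `stopPackedLinuxApp`, and treating a missing marker as "do not skip" is an assumption.

```typescript
// Hypothetical shapes; the real marker file and snapshot types are not shown in this log.
interface DesktopMarker {
  pid: number; // PID recorded in desktop-root.json
}

interface ProcessSnapshot {
  pid: number; // PID of a currently running process
}

// A marker is live only if its PID appears in the live snapshot list.
function isMarkerLive(
  marker: DesktopMarker,
  snapshots: ProcessSnapshot[],
): boolean {
  return snapshots.some((s) => s.pid === marker.pid);
}

// Cleanup skips only for a live marker; a stale marker no longer blocks recovery.
function shouldSkipHeadlessCleanup(
  marker: DesktopMarker | null,
  snapshots: ProcessSnapshot[],
): boolean {
  if (marker === null) return false; // assumption: no marker means nothing to protect
  return isMarkerLive(marker, snapshots);
}

const running: ProcessSnapshot[] = [{ pid: 4100 }, { pid: 4242 }];
const skipForLive = shouldSkipHeadlessCleanup({ pid: 4242 }, running);
const skipForStale = shouldSkipHeadlessCleanup({ pid: 9999 }, running);
console.log({ skipForLive, skipForStale });
```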
Tom Huang	c5d77a03bd	Garnet hemisphere (#1769 ) * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. 
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. 
This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec * Reconcile run messages on cancel * Use active design system as visual direction * Fix active design system prompt wording * feat(workspace-tabs): implement workspace tabs functionality and file attachment handling - Introduced a new `WorkspaceTabsBar` component to manage workspace tabs, allowing users to navigate between different views (projects, marketplace, etc.). - Enhanced file handling capabilities in the `HomeHero` and `EntryShell` components, enabling users to stage and attach files before project creation. - Updated the `App` component to support auto-sending attachments alongside the first message in a project. - Improved CSS styles for workspace tabs and attachment UI, ensuring a cohesive design and user experience. This update significantly enhances the workspace navigation and file management features, providing users with a more intuitive and efficient workflow. * refactor(workspace-tabs): streamline workspace tabs and UI components - Removed unused components and actions from the `WorkspaceTabsBar` and `AppChromeHeader`, simplifying the codebase. - Updated CSS styles for the workspace shell and tabs, enhancing visual consistency and reducing element sizes for a cleaner layout. - Introduced a new client type detection mechanism to dynamically adjust the workspace shell's class, improving responsiveness. - Added tests for the `WorkspaceTabsBar` to ensure proper navigation and tab management functionality. These changes improve the overall performance and user experience of the workspace navigation system. * Update critical e2e for entry modal flow * Stabilize entry critical e2e flows * fix(ui): adjust workspace tabs and header styles for improved layout - Updated the CSS for workspace tabs and the app header, reducing element sizes and padding for a cleaner appearance. 
- Introduced a new button in the `WorkspaceTabsBar` for quick access to the home tab, enhancing navigation. - Minor adjustments to the layout and styles to ensure consistency across components. These changes enhance the user interface and improve the overall user experience in the workspace navigation system. * feat(workspace-tabs): implement pinned home tab functionality - Added a new pinned home tab feature to the `WorkspaceTabsBar`, allowing the home tab to remain accessible during navigation. - Updated tab management logic to collapse duplicate home tabs into a single pinned instance when restoring from local storage. - Enhanced CSS styles for workspace tabs to accommodate the new pinned tab design. - Updated tests to verify the behavior of the pinned home tab and its interaction with other tabs. These changes improve navigation consistency and user experience within the workspace. * refactor(workspace-tabs): enhance tab management and styling - Updated CSS styles for workspace tabs, adjusting padding and flex properties for improved layout and consistency. - Refactored tab creation logic to ensure unique IDs for project and marketplace tabs, enhancing navigation clarity. - Removed deprecated functions related to pinned home tabs, streamlining the codebase. - Improved test cases to verify independent behavior of home tabs during navigation. These changes enhance the user experience by providing a more intuitive tab management system and a cleaner UI. * style(workspace-tabs): update CSS for improved layout and visibility - Adjusted CSS properties for workspace tabs, including overflow, position, and z-index to enhance layout and stacking context. - Ensured consistent styling across tab components for better visual hierarchy. These changes contribute to a more polished and user-friendly interface within the workspace. 
* style(entry-layout): update CSS variables for improved layout consistency - Replaced fixed width values with CSS variables for the entry rail to enhance flexibility. - Adjusted padding and height properties for better visual alignment and spacing. - Introduced a new background style for the entry main topbar to improve aesthetics. These changes contribute to a more responsive and visually appealing layout in the entry view. --------- Co-authored-by: qiongyu1999 <2694684348@qq.com> Co-authored-by: Eli <129168833+qiongyu1999@users.noreply.github.com>	2026-05-15 14:42:11 +08:00
chaoxiaoche	bcc58af931	refactor(web): rename Execution mode and tighten settings dialog UI (#1568 ) * refactor(web): rename Execution mode and tighten settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): polish Settings dialog — media providers, skills, MCP Media providers - Hide internal Stub fixture provider (settingsVisible: false) - Split provider list into Available (integrated, editable) and Coming Soon (collapsed <details> drawer with name/hint/Docs link only) - Drop right-side Integrated/Configured badges from every row; all rows in the main list are integrated by definition; inline grey "Saved" chip next to the provider name is the only status indicator now - "Saved" badge moves inline to the right of the provider name and uses a neutral grey treatment (was a standalone green pill below the name) - "Reload from daemon" button shows a 2s green "✓ Reloaded" flash on success instead of leaving a permanent paragraph under the header; errors remain sticky Skills - Replace three pill-row filter banks (Source, Type, Category) with a compact single-row toolbar: search + three inline <select> dropdowns side by side; active filter highlighted with a stronger border MCP server - Shorten section hint to one line - Move WHAT YOUR AGENT CAN DO capabilities above the 
client dropdown (motivate before asking to act) - Move "Build the daemon first" warning below the code block where it contextually explains why the command might fail, not as a top-level error before the user has done anything - Downgrade "Restart your client" left-border from accent orange to border-strong grey — it is a next step, not a warning External MCP - Shorten section hint to one line Misc CSS - Add .sr-only utility for accessible off-screen live regions - Add button.ghost.is-success-flash for transient success feedback - Add .library-filter-selects / .library-filter-select for dropdown filter rows - Add .media-provider-coming-soon-* for the roadmap drawer Co-authored-by: Cursor <cursoragent@cursor.com> * [codex] Add Cursor Agent auth diagnostics (#1538) * Add Cursor Agent auth diagnostics * Handle Cursor not logged in auth status * Address Cursor auth review feedback * Classify Cursor stdout auth failures * test: expand Memory and Routines coverage (#1521) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows * refactor(web): rename Execution mode and tighten 
settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): settings dialog UX polish — layout, dedup, and interactions - Remove duplicate section headers from all settings sections (Notifications, Appearance, Privacy, About, Design Systems, Skills, MCP server, Connectors, Media providers, Routines) - Restructure Notifications cards: title + toggle on same row, hint below - Restructure Skills toolbar: search + New skill button in row 1, filter dropdowns in row 2 with left-aligned labels - Restructure Pet section: tabs and Wake button on same row - MCP server: group capabilities and setup into separate cards, remove nested double border on client picker - Connectors: show connect errors as toast instead of inline card text, position toast inside panel, hide single-provider tab - Media providers: move Reload button to left-aligned small ghost button - Memory: info icon shows path on hover, Path copied badge inline; Extraction history and MEMORY.md as standalone collapsible cards; group header hidden when only one type visible - Pet grid cards: Adopt button hidden until hover, icon-only when adopted, description truncated to 2 lines, text fills full width via abs positioning - Agent cards: selected state uses accent border only, no background change - Add sun/moon icons 
to Appearance theme buttons (Light/Dark) - Shorten several hint strings for clarity Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): resolve i18n review comments from PR #1568 - Update settings.title and settings.envConfigure to localized "Execution mode" in all 17 non-English locale files - Add settings.memoryFlashPathCopied to all locales and use t() in MemorySection instead of hardcoded English "Path copied" - Add settings.agentModelHead to all locales and use t() in SettingsDialog for "Model for:" agent model row header Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): update tests to match settings dialog redesign - Add role prop to Toast (alert/status) so error toasts from ConnectorsBrowser are announced immediately by screen readers - Clear connectErrorToast on successful connector retry - Update SettingsDialog.execution tests: - Remove heading assertions for About and MCP server (headers were intentionally removed as duplicate nav labels) - Rewrite CLI env test to use codex-only fields (per-agent filtering means only selected agent's fields are shown) - Update Composio key hint text assertion to match shortened copy - Replace filter button click with select change for Type filter - Replace Configured/Unsupported/Integrated badge checks with updated assertions matching the new media provider UI - Replace disabled BFL row test with coming-soon section check - Update SettingsDialog.media test: remove Fal.ai input assertions (non-integrated providers no longer have editable fields) Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): unblock CI for #1568 Three small fixes to get Playwright back to green on the settings dialog redesign: 1. `en.ts`: revert `settings.envConfigure` to "Configure execution mode". 
This PR collapsed both `settings.title` (header gear) and `settings.envConfigure` (entry-side foot pill) to the same string "Execution mode", so `getByRole('button', { name: 'Execution mode' })` resolved to two elements and tripped Playwright strict mode in the three Composio-flow tests (entry-configuration-flows.test.ts:174, 228, 285). Restoring the distinct label also gives screen readers a clearer hint for the pill, which doubles as a status display. Non-English locales still alias the two keys; happy to follow up on those, but they don't gate the (English-only) Playwright suite. 2. entry-configuration-flows.test.ts:167 — `Connectors` heading is now rendered at `<h2>` in the modal-head (SettingsDialog.tsx:1545), with the inner `<h3>` removed by design (see comment around line 1448). Updated the assertion from `level: 3` to `level: 2`. 3. project-management-flows.test.ts:360 — same change for the `Pets` heading. Verified locally with `pnpm --filter @open-design/web typecheck` and `pnpm --filter @open-design/e2e typecheck`. The actual Playwright specs need the dev server up; I didn't rerun them here, but the locator changes are mechanical and match the new DOM. * fix(web): use exact match for Execution mode button locator Playwright's `getByRole({ name })` defaults to substring matching, so `{ name: 'Execution mode' }` still resolved to both the header gear (aria-label "Execution mode") and the entry-side foot pill (aria-label "Configure execution mode" — substring contains "Execution mode"). Strict mode tripped in the three composio-flow tests at lines 202, 257, and 319. Adding `exact: true` makes each call resolve to just the header gear, which opens the same dialog the foot pill does — the test outcomes are unchanged. 
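The difference between the two locator fixes above can be illustrated with a small sketch. It assumes only what Playwright documents for `getByRole` name matching (case-insensitive substring by default, case-sensitive whole-string with `exact: true`); `nameMatches` and `resolveButtons` are hypothetical helpers written for this note, not Playwright internals.

```typescript
// Hypothetical stand-in for accessible-name matching. This is NOT Playwright's
// implementation; it only mirrors the documented default (case-insensitive
// substring) versus `exact: true` (case-sensitive whole string).
function nameMatches(accessibleName: string, wanted: string, exact: boolean): boolean {
  if (exact) return accessibleName === wanted;
  return accessibleName.toLowerCase().includes(wanted.toLowerCase());
}

// The two aria-labels named in the commit message.
const buttonLabels: string[] = ["Execution mode", "Configure execution mode"];

function resolveButtons(wanted: string, exact: boolean): string[] {
  return buttonLabels.filter((label) => nameMatches(label, wanted, exact));
}

// Substring matching resolves to both buttons, which is what trips strict mode.
const substringHits = resolveButtons("Execution mode", false);
// Exact matching resolves to the header gear only.
const exactHits = resolveButtons("Execution mode", true);
```

In the real specs the equivalent call would be `getByRole('button', { name: 'Execution mode', exact: true })`.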
--------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: shangxinyu1 <shangxinyu@refly.ai> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 14:35:06 +08:00
PerishCode	4f15c33595	Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0	2026-05-14 21:10:03 +08:00
Nagendhra Madishetti	98bc6d63e6	feat: Critique Theater wireup (activate the stack, M0 dark-launch by default) (#1338)

* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase.
- A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action.
- Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer.
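The defensive rules listed for the Phase 7.1 reducer can be sketched as follows. This is a reduced illustration, not the CritiqueState reducer itself: `SketchEvent`, `SketchState`, and `reduceSketch` are hypothetical names, only four event types are modelled, and ignoring a repeated run_started for the same runId is an assumption.

```typescript
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

// Hypothetical, trimmed-down event union; the real PanelEvent has more variants.
type SketchEvent =
  | { type: "run_started"; runId: string }
  | { type: "panelist_dim"; runId: string; round: number; role: string; score: number }
  | { type: "ship"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

interface SketchState {
  phase: Phase;
  runId: string | null;
  rounds: Record<number, { role: string; score: number }[]>;
  warnings: string[];
}

const idleState: SketchState = { phase: "idle", runId: null, rounds: {}, warnings: [] };

function reduceSketch(state: SketchState, ev: SketchEvent): SketchState {
  if (ev.type === "run_started") {
    // Assumption: a repeated run_started for the active run is ignored.
    if (ev.runId === state.runId) return state;
    // A run_started for a different runId discards prior state and reboots.
    return { phase: "running", runId: ev.runId, rounds: {}, warnings: [] };
  }
  // Stray traffic for another run: return the same reference (no re-render).
  if (ev.runId !== state.runId) return state;
  // parser_warning may land late; record it without changing phase.
  if (ev.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, ev.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (state.phase !== "running") return state;
  if (ev.type === "ship") return { ...state, phase: "shipped" };
  // panelist_dim: bucket by round number, not "always last".
  const bucket = state.rounds[ev.round] ?? [];
  const entry = { role: ev.role, score: ev.score };
  return { ...state, rounds: { ...state.rounds, [ev.round]: [...bucket, entry] } };
}
```

Returning the incoming `state` object unchanged is what gives the stable-identity guarantee: React's useReducer bails out of a re-render when the reducer returns the same reference.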
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold.
- paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR.
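The "malformed NDJSON lines are skipped" behaviour of the replay hook can be sketched roughly like this. `looksLikePanelEvent` is a hypothetical stand-in for the shared isPanelEvent predicate (its real checks are not shown in this log), and `parseTranscript` is an illustrative name, not the hook's actual helper.

```typescript
interface ReplayEvent {
  type: string;
  runId: string;
  [key: string]: unknown;
}

// Hypothetical stand-in for the shared isPanelEvent predicate: it only checks
// for a string `type` and a non-empty string `runId`.
function looksLikePanelEvent(value: unknown): value is ReplayEvent {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const runId = record["runId"];
  return typeof record["type"] === "string" && typeof runId === "string" && runId.length > 0;
}

// Parse an NDJSON transcript (one event per line), skipping blank lines,
// lines that are not valid JSON, and JSON values that fail the predicate.
function parseTranscript(ndjson: string): ReplayEvent[] {
  const events: ReplayEvent[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (looksLikePanelEvent(parsed)) events.push(parsed);
    } catch {
      // Malformed line: drop it and keep going so valid events still land.
    }
  }
  return events;
}
```

Parsing up front and pacing separately is also what makes an `instant` flush trivial: the caller can dispatch the whole returned array in one synchronous loop.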
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
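The SSE payload-override fix described above comes down to spread order plus revalidation. The following is a minimal sketch under assumptions: `sseToPanelEventSketch`, `DecodedEvent`, and the `KNOWN_TYPES` set are hypothetical, and the real `isPanelEvent` predicate may check more than is shown here.

```typescript
interface DecodedEvent {
  type: string;
  runId: string;
  [key: string]: unknown;
}

// Illustrative subset of event names; the real list lives in the contracts package.
const KNOWN_TYPES = new Set<string>(["run_started", "round_end", "ship", "failed"]);

function sseToPanelEventSketch(channel: string, data: Record<string, unknown>): DecodedEvent | null {
  const prefix = "critique.";
  if (!channel.startsWith(prefix)) return null;
  const type = channel.slice(prefix.length);
  if (!KNOWN_TYPES.has(type)) return null;
  // Spread the payload FIRST, then set `type`, so the channel-derived value
  // is authoritative and a payload-provided `type` cannot override it.
  const candidate = { ...data, type };
  // Revalidate: frames with a missing or empty runId are dropped.
  const runId = candidate["runId"];
  if (typeof runId !== "string" || runId === "") return null;
  return { ...candidate, runId };
}
```

With the old order (`{ type, ...data }`), a payload carrying `type: 'ship'` on the `critique.run_started` channel would have reached the reducer as a ship action.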
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob.
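For the Phase 10.1 registry above, a TTL-with-clock-seam design might look like the sketch below. Only the reason values, `expiresAt`, the 24h default, and `clearDegraded` come from the commit message; `createDegradedRegistry`, `markDegraded`, and `isDegraded` are hypothetical names chosen for illustration.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h, per the commit message

// `now` is the clock seam: production uses Date.now, tests inject a fake clock.
function createDegradedRegistry(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapter: string, reason: DegradedReason, source: string): void {
      records.set(adapter, { reason, source, expiresAt: now() + ttlMs });
    },
    // Expiry is checked on read, so an expired entry stops counting as
    // degraded without anyone calling clearDegraded.
    isDegraded(adapter: string): boolean {
      const record = records.get(adapter);
      return record !== undefined && now() < record.expiresAt;
    },
    clearDegraded(adapter: string): void {
      records.delete(adapter);
    },
  };
}
```

Checking expiry lazily on read keeps the registry free of timers, which is what lets a test advance a fake clock and observe the entry lapse deterministically.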
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
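The composite formula quoted in the Phase 14.1 user guide commit can be worked through with a small sketch. The weights are the ones stated above; the example scores are made up for illustration, and `compositeScore` is a hypothetical helper, not the daemon's scoring code.

```typescript
type PanelistRole = "designer" | "critic" | "brand" | "a11y" | "copy";

// Weights as documented in the commit message; note the designer weight is 0.
const COMPOSITE_WEIGHTS: Record<PanelistRole, number> = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

// Weighted sum of per-role scores.
function compositeScore(scores: Record<PanelistRole, number>): number {
  let total = 0;
  for (const role of Object.keys(COMPOSITE_WEIGHTS) as PanelistRole[]) {
    total += COMPOSITE_WEIGHTS[role] * scores[role];
  }
  return total;
}

// Made-up scores: 0*10 + 0.4*8 + 0.2*7 + 0.2*9 + 0.2*6 = 3.2 + 1.4 + 1.8 + 1.2
const exampleScores: Record<PanelistRole, number> = {
  designer: 10,
  critic: 8,
  brand: 7,
  a11y: 9,
  copy: 6,
};
// Formatted to one decimal to absorb floating-point noise in the sum.
const exampleComposite = compositeScore(exampleScores).toFixed(1); // "7.6"
```

Because the designer weight is 0 in the documented formula, changing the designer score leaves the composite unchanged.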
* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority:
1. Skill-level policy (required forces the pipeline on; opt-out forces it off)
2. Per-project override from the Settings toggle
3. OD_CRITIQUE_ENABLED env override
4. Rollout phase default
   M0 dark-launch     false
   M1 settings only   false (toggle is off until the user flips it)
   M2 per-skill       true if skill opted in
   M3 global default  true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via:
- the platform storage event (cross-tab)
- an open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event.
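The Phase 15.1 priority matrix can be sketched as a single pure function. This is an illustration under assumptions rather than the daemon's resolver: `isCritiqueEnabledSketch`, `parseRolloutPhase`, and the `RolloutInputs` shape are hypothetical, the policy tokens (required / opt-in / opt-out / null) are the ones named later in this log, and trimming whitespace in the phase parser is a guess.

```typescript
type RolloutPhase = "M0" | "M1" | "M2" | "M3";
type SkillPolicy = "required" | "opt-in" | "opt-out" | null;

interface RolloutInputs {
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null; // null means "no opinion"
  envOverride: boolean | null; // null means OD_CRITIQUE_ENABLED is unset
  phase: RolloutPhase;
}

// Unknown or missing input collapses to M0 so a fresh install stays dark.
function parseRolloutPhase(raw: string | undefined): RolloutPhase {
  const token = (raw ?? "").trim(); // trimming is an assumption
  if (token === "M1" || token === "M2" || token === "M3") return token;
  return "M0";
}

function isCritiqueEnabledSketch(inputs: RolloutInputs): boolean {
  // 1. Skill-level policy: required forces on, opt-out forces off.
  if (inputs.skillPolicy === "required") return true;
  if (inputs.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (inputs.projectOverride !== null) return inputs.projectOverride;
  // 3. Env override.
  if (inputs.envOverride !== null) return inputs.envOverride;
  // 4. Rollout phase default.
  switch (inputs.phase) {
    case "M0":
      return false;
    case "M1":
      return false;
    case "M2":
      return inputs.skillPolicy === "opt-in";
    case "M3":
      return true;
  }
}
```

Using `null` for "no opinion" at each tier is what lets a higher tier abstain and the decision fall through to the next one.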
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * feat(daemon): route critique spawn-path eligibility through the rollout resolver The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live: 1. Per-skill od.critique.policy veto (opt-out / required) 2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint) 3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures) 4. OD_CRITIQUE_ROLLOUT_PHASE default M0 dark-launch false M1 settings only false M2 per-skill only when skillPolicy === 'opt-in' M3 global default true Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before. Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED. 
skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. 
This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. 
Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). 
The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run-scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame and the e2e tests' stubbed routes hide the mismatch. Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport, AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct. P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false, dropping the panel prompt while still routing through runOrchestrator -> parser waits for tags that never arrive -> run degrades. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix: address PerishCode P1 + P2 follow-ups on PR #1338 Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real: 1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered. 
With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block plus removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. Falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage. 
A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. 
Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as planned follow-up. Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path). * test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338) lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug. Added a regression case that: 1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one). 2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204. 3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string). The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here. 8 / 8 vitest cases green on the file. * fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338) The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/src/db.ts. 
So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. 
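The read-merge-write flow and its two failure modes can be sketched roughly as follows. This is an illustration under assumptions, not the actual setter: `buildMetadataPatch`, `persistToggle`, the `Fetcher` seam, and the assumption that the GET response exposes a top-level `metadata` key are all hypothetical.

```typescript
// Hypothetical sketch: helper names, the Fetcher seam, and the response shape are assumed.
type ProjectMetadata = Record<string, unknown>;

interface ProjectRow {
  metadata?: ProjectMetadata;
}

// Minimal fetch-like seam so the flow can be exercised without a network.
type Fetcher = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Pure merge step: keep every existing key, overlay only the toggle.
function buildMetadataPatch(
  existing: ProjectMetadata | undefined,
  next: boolean,
): { metadata: ProjectMetadata } {
  return { metadata: { ...(existing ?? {}), critiqueTheaterEnabled: next } };
}

async function persistToggle(
  projectId: string,
  next: boolean,
  fetcher: Fetcher,
): Promise<"patched" | "skipped" | "patch-failed"> {
  const url = `/api/projects/${encodeURIComponent(projectId)}`;
  let existing: ProjectMetadata | undefined;
  try {
    const res = await fetcher(url);
    if (!res.ok) return "skipped";
    existing = ((await res.json()) as ProjectRow).metadata;
  } catch {
    // GET failed: a bare PATCH would wipe other metadata, so do nothing.
    return "skipped";
  }
  try {
    const res = await fetcher(url, {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildMetadataPatch(existing, next)),
    });
    return res.ok ? "patched" : "patch-failed";
  } catch {
    // PATCH failed: the in-session UI was already updated by the caller.
    return "patch-failed";
  }
}
```

Keeping the merge in a pure helper makes the "does not stomp metadata" property checkable without stubbing any requests.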
Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge * fix(daemon): address PerishCode P3 trio on PR #1338 (emit helper reuse + spawn-input coverage + restored docs) --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-14 20:37:06 +08:00
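The single-pass helper from the bestRoundAndComposite fix in the commit above might look something like this. Only the helper name and the fallback to (state.activeRound, 0) come from the commit message; the `RoundView` and `TheaterState` shapes are hypothetical.

```typescript
// Hypothetical state shapes; only the helper name and fallback rule come from the commit message.
interface RoundView {
  round: number;
  composite?: number;
}

interface TheaterState {
  activeRound: number;
  rounds: RoundView[];
}

function bestRoundAndComposite(
  state: TheaterState,
): { bestRound: number; composite: number } {
  let best: { bestRound: number; composite: number } | null = null;
  // One walk over the rounds, so the round number and score always come from the same entry.
  for (const r of state.rounds) {
    if (typeof r.composite !== "number" || Number.isNaN(r.composite)) continue;
    if (best === null || r.composite > best.composite) {
      best = { bestRound: r.round, composite: r.composite };
    }
  }
  // No round has closed with a composite yet: fall back to the active round and 0.
  return best ?? { bestRound: state.activeRound, composite: 0 };
}
```

With round 1 at 8.5 and round 2 at 6.0, this returns round 1 paired with 8.5 rather than the mismatched round 2 / 8.5 pairing the commit describes.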
PerishCode	cba8bf151d	chore: align namespace lifecycle packaging	2026-05-14 16:35:46 +08:00
lefarcen	b268bbe169	Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix Brings in 11 new garnet commits, most importantly: - `1a90aef4` feat(plugin-use): implement plugin use handoff functionality — fixes the bug QA reported where /plugins Use Plugin would 422 silently for template plugins; new flow hands off to HomeView with the plugin pre-bound + input form prompted there. - `2ac58544` feat(plugin-inputs): enhance plugin input handling with file upload support — extends PluginInputsForm for file uploads. - `3b167b69` feat(plugins): registry protocol — new @open-design/registry-protocol workspace package (needs build before daemon boot). - Plus enhancements to plugin metadata, GitHub installer, plugin detail view, login/whoami, static HTML preview paths. Conflicts resolved: - packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief field + garnet's contextPlugins (@-mention plugin context refs) both kept on ProjectMetadata. - apps/landing-page/* (3 files): accepted HEAD — garnet had the older single-page landing-page header; main has the multi-page layout (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not related to the Use Plugin core fix. New @open-design/registry-protocol package must be built before daemon boots; pnpm install does this via postinstall already.	2026-05-14 16:32:35 +08:00
pftom	2ac5854432	feat(plugin-inputs): enhance plugin input handling with file upload support - Added support for file input fields in the PluginInputsForm, allowing users to upload files with serializable metadata. - Updated the HomeHero component to improve the layout and interaction of input fields, enhancing user experience. - Adjusted CSS styles for better visual representation of input fields and their states. - Modified HomeView to reflect changes in authoring chip IDs for better clarity in plugin actions. - Enhanced tests to cover new file input functionality and ensure correct behavior in various scenarios. This update significantly improves the plugin input handling, enabling users to upload files seamlessly and enhancing the overall interaction model.	2026-05-14 15:52:21 +08:00
pftom	9ea33e076b	feat(context-plugins): add support for context plugins in project metadata and UI - Introduced a new `contextPlugins` field in the `ProjectMetadata` type to accommodate plugins selected via `@` mentions, allowing for additive context in project creation. - Updated the `HomeHero` and `EntryShell` components to handle and display context plugins, enhancing user interaction with selected plugins. - Implemented rendering logic for context plugins in the metadata block, providing clear visibility of selected plugins and their descriptions. - Enhanced the UI to support the removal of context plugins and display additional details on hover, improving the overall user experience. This update significantly enriches the project creation process by allowing users to incorporate multiple context plugins seamlessly.	2026-05-14 15:29:49 +08:00
lefarcen	6c16283850	Merge origin/main (post-7c8305f4) into reconcile branch Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline.	2026-05-14 15:13:38 +08:00
github-actions[bot]	a7bebd926f	Auto-generated metrics for run #25835844101 (#1615)	2026-05-14 14:27:03 +08:00
pftom	f1d0dac58b	feat(plugins): enhance plugin registry with new metadata and preview features - Added new fields to the plugin and metadata types, including `mode`, `taskKind`, `surface`, and `preview`, to improve plugin description and categorization. - Implemented a `previewFrom` function to generate structured previews for plugins based on their metadata, enhancing the user experience. - Introduced a `visualKindFor` function to determine the visual representation of plugins based on their attributes, improving sorting and display logic. - Updated the `entryFromMarketplace` and `officialEntryFromManifest` functions to accommodate the new fields and ensure proper handling of plugin entries. - Created a new documentation file for the plugin system test suite, consolidating testing strategies and acceptance criteria for better clarity. This update significantly enhances the plugin registry's capabilities, providing richer metadata and improved user interactions with plugin previews.	2026-05-14 11:12:51 +08:00
pftom	3b167b6921	feat(plugins): add registry protocol and enhance plugin management features - Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries. - Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol. - Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities. - Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades. - Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards. This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities.	2026-05-14 08:55:36 +08:00
pftom	56c264c9bd	feat(plugins): add login and whoami commands for GitHub CLI authentication - Introduced `login` and `whoami` commands to the plugin CLI, enabling users to authenticate with the Open Design registry via GitHub CLI. - The `login` command wraps GitHub CLI authentication, allowing users to specify a host, defaulting to GitHub. - The `whoami` command retrieves and displays the authenticated GitHub account information, with an option for JSON output. - Updated the CLI help documentation to include usage instructions for the new commands. - Enhanced error handling for GitHub CLI dependencies and authentication status. This update improves the user experience by simplifying the authentication process for plugin publishing.	2026-05-14 07:25:05 +08:00
lefarcen	d83b228c81	Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge	2026-05-13 23:52:33 +08:00
pftom	7c48fbd902	feat(plugins): enhance PluginsView with new marketplace management features - Updated the PluginsView component to include new tabs for 'Installed', 'Available', and 'Sources', improving the organization of plugin management. - Introduced functions for adding, refreshing, removing, and setting trust for plugin marketplaces, enhancing the marketplace interaction capabilities. - Enhanced the UI to reflect the new structure, including updated CSS styles for better visual consistency and usability. - Added tests to ensure the functionality of the new marketplace features and verify the correct rendering of available plugins. This update significantly improves the user experience in managing plugins and marketplaces, providing a more intuitive interface for users.	2026-05-13 23:42:41 +08:00
lefarcen	53997990b7	Merge origin/main (post-0.7.0) into reconciled garnet branch Second-pass merge layering 41+ new commits from origin/main on top of the first reconcile commit. Headline upstream additions absorbed: - 0.7.0 release: redesigned chat bubble user-text styling, neutralised palette, lucide icons, ElevenLabs audio voice option discovery in the prompt composer, analytics tracking (PostHog) wired across home / studio / create surfaces, Prometheus `/api/metrics` endpoint, critique-theater drop-in mount with a settings toggle. - Misc upstream fixes (titlebar padding, release header layout, deck preview chrome, feedback form auto-scroll, conversation-created SSE on routine runs, etc.) Conflict resolutions (12 files, ~22 hunks): - contracts barrel + prompts/system: union of both sides; new analytics exports (`./analytics/events`, `./analytics/public-params`) added alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice fields (audioVoiceOptions/audioVoiceOptionsError, main) and pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput. - daemon/server.ts: Prometheus `/api/metrics` route inserted after garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call added before the chat-run service init alongside the prior reconcile note about the dropped legacy POST /api/projects body. - App.tsx: handleCreateProject now consumes both garnet's plugin fields (pluginId / appliedPluginSnapshotId / pluginInputs / autoSendFirstMessage) and main's analytics requestId. Tracking fires success + failure paths; PluginLoopHome auto-send sessionStorage flag is preserved. - ProjectView.tsx: the garnet auto-send useEffect coexists with main's `useCritiqueTheaterEnabled()` hook. - ChatComposer.tsx: imports merged (drop now-unused fetchSkills, add analytics provider + tracking + buildVisualAnnotationAttachment). 
- index.css: main's redesigned `.msg.user .user-text` chat bubble styling wins over garnet's plain text rule; garnet's `.msg-plugin-chip*` rules preserved alongside. - EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with reconcile decision #2. main's added PetRail / TopTab / analytics view tracking is intentionally NOT brought into the wrapper; the follow-up to re-integrate PetRail / image-templates / video-templates into EntryShell still stands and now also covers analytics view-tracking hooks. - daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node + prom-client coexist). - Test fixtures (FileWorkspace.test): kept garnet's plugin-folders describe block intact; main's projectKind="prototype" addition is dropped where it conflicted with garnet's plugin-folder fixture files. Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck` exits 0 across all workspace packages. Follow-up not done in this commit: - PetRail / image-templates / video-templates / 0.7.0 analytics view-tracking hooks need to be added to EntryShell. - Critique-theater settings toggle UX (added on main) lives in the SettingsDialog hierarchy; the reconcile state preserves the SettingsDialog so this should work without changes, but no end-to-end verification yet.	2026-05-13 23:29:56 +08:00
lefarcen	d3602be666	Merge origin/main into garnet-hemisphere (reconcile) Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. 
Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory.	2026-05-13 22:29:21 +08:00
pftom	0edbf38171	feat(plugins): add specVersion and version fields to plugin and marketplace schemas - Introduced `specVersion` and `version` fields to the plugin and marketplace schemas, ensuring better versioning and compatibility tracking. - Updated various components and functions to handle the new fields, including database migrations, plugin snapshots, and marketplace management. - Enhanced tests to validate the presence and correctness of the new fields in plugin manifests and marketplace entries. - Improved documentation to reflect the changes in schema requirements and provide guidance on the new versioning system. This update strengthens the plugin ecosystem by providing clear versioning, enhancing the reliability and maintainability of plugins and marketplaces.	2026-05-13 22:24:50 +08:00
Nagendhra Madishetti	38a5ab69e6	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
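The defensive rules listed for the Phase 7.1 reducer can be illustrated with a heavily reduced sketch. The event union, state shape, and `reduce` function below are hypothetical stand-ins: the real PanelEvent union has many more members, and round bookkeeping is left out entirely.

```typescript
// Hypothetical, heavily reduced shapes; not the contracts-level PanelEvent or CritiqueState.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

type PanelEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "parser_warning"; runId: string; message: string };

interface CritiqueState {
  phase: Phase;
  runId: string | null;
  warnings: string[];
}

const initialState: CritiqueState = { phase: "idle", runId: null, warnings: [] };
const TERMINAL: readonly Phase[] = ["shipped", "degraded", "interrupted", "failed"];

function reduce(state: CritiqueState, event: PanelEvent): CritiqueState {
  // A run_started for a different run always discards the prior state and reboots.
  if (event.type === "run_started" && event.runId !== state.runId) {
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run: same reference, so subscribers do not re-render.
  if (event.runId !== state.runId) return state;
  // Late parser warnings are recorded in a side channel without changing phase.
  if (event.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, event.message] };
  }
  // Terminal phases are sticky against further lifecycle events for the same run.
  if (TERMINAL.includes(state.phase)) return state;
  if (event.type === "ship") return { ...state, phase: "shipped" };
  return state;
}
```

Returning the same `state` reference on ignored events is what gives the stable-identity guarantee mentioned in the test coverage.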
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
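The transcript parsing step of the replay hook (one event per NDJSON line, malformed lines skipped) could be sketched as below. `PanelEventLike`, `isPanelEventLike`, and `parseTranscript` are hypothetical stand-ins; the real hook validates with the shared isPanelEvent predicate, whose exact checks are not shown in the commit message.

```typescript
// Hypothetical minimal event shape and predicate; the real isPanelEvent lives in the contracts package.
interface PanelEventLike {
  type: string;
  runId: string;
}

function isPanelEventLike(value: unknown): value is PanelEventLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const runId = v["runId"];
  return typeof v["type"] === "string" && typeof runId === "string" && runId.length > 0;
}

function parseTranscript(ndjson: string): PanelEventLike[] {
  const events: PanelEventLike[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // Blank lines (including a trailing newline) carry nothing.
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isPanelEventLike(parsed)) events.push(parsed);
    } catch {
      // Malformed line: skip it so valid events around it still land.
    }
  }
  return events;
}
```

An empty transcript yields an empty array, which matches the "done with state still idle" case in the test list.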
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
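The payload-override fix comes down to spread order plus revalidation. The sketch below is an assumption-laden illustration, not the actual `sseToPanelEvent`: `decodeFrame`, `Frame`, and `CHANNEL_PREFIX` are hypothetical names, and the validation is reduced to the runId checks the commit mentions.

```typescript
// Hypothetical minimal frame shape for illustration.
type Frame = { type: string; runId: string; [key: string]: unknown };

const CHANNEL_PREFIX = "critique.";

// Decode one SSE frame. The payload is spread FIRST and the channel-derived
// type is written LAST, so a hostile `type` inside the payload cannot win.
function decodeFrame(channel: string, rawData: string): Frame | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  let data: unknown;
  try {
    data = JSON.parse(rawData);
  } catch {
    return null; // malformed payload: drop the frame, keep the stream alive
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const type = channel.slice(CHANNEL_PREFIX.length);
  const candidate: Record<string, unknown> = { ...(data as Record<string, unknown>), type };
  // Revalidate after merging: missing or empty runId is rejected.
  const runId = candidate["runId"];
  if (typeof runId !== "string" || runId.length === 0) return null;
  return { ...candidate, type, runId };
}

const hostileFrame = decodeFrame("critique.run_started", '{"type":"ship","runId":"run-1"}');
const missingRunId = decodeFrame("critique.run_started", "{}");
const emptyRunId = decodeFrame("critique.run_started", '{"runId":""}');
```

Here `hostileFrame` resolves to `run_started` despite the payload claiming `ship`, and the two frames without a usable runId decode to `null`.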
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive zero-delay timer ticks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
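To illustrate the per-variant validation from item 4 above, here is a hedged sketch of the `ship` check only. The field list (round, composite, status, artifactRef.{projectId, artifactId}, summary) comes from the commit message; the concrete types assumed for each field and the names `hasValidShipShape`, `isFiniteNumber`, and `isNonEmptyString` are illustrative, not the real `hasValidVariantShape`.

```typescript
function isFiniteNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Assumed field types: round and composite numeric (composite is later
// formatted with toFixed), status and the artifactRef ids non-empty strings.
function hasValidShipShape(event: Record<string, unknown>): boolean {
  const ref = event["artifactRef"];
  if (typeof ref !== "object" || ref === null) return false;
  const artifactRef = ref as Record<string, unknown>;
  return (
    isFiniteNumber(event["round"]) &&
    isFiniteNumber(event["composite"]) &&
    isNonEmptyString(event["status"]) &&
    isNonEmptyString(artifactRef["projectId"]) &&
    isNonEmptyString(artifactRef["artifactId"]) &&
    typeof event["summary"] === "string"
  );
}

const fullShip = {
  type: "ship",
  runId: "run-1",
  round: 2,
  composite: 8.4,
  status: "shipped",
  artifactRef: { projectId: "proj-42", artifactId: "art-1" },
  summary: "Converged in two rounds.",
};
const shipWithoutComposite = { type: "ship", runId: "run-1", round: 2 };
```

A frame like `shipWithoutComposite` fails the check before it reaches the reducer, which is what keeps a collapsed summary from calling `toFixed` on `undefined`.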
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. 
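The TTL registry from Phase 10.1 can be sketched as a map plus an injectable clock. This is an illustration under assumptions: `createDegradedRegistry`, `markDegraded`, and `isDegraded` are hypothetical names (only `clearDegraded`, the three reasons, the source label, `expiresAt`, and the 24h default appear in the commit message), and lazy eviction on read is a design choice of this sketch.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default window

// `now` is the clock seam: tests pass a fake clock, production uses the wall clock.
function createDegradedRegistry(now: () => number = () => Date.now(), ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
      records.set(adapterId, { reason, source, expiresAt: now() + ttlMs });
    },
    isDegraded(adapterId: string): boolean {
      const record = records.get(adapterId);
      if (record === undefined) return false;
      if (now() >= record.expiresAt) {
        records.delete(adapterId); // expired: evict lazily on read
        return false;
      }
      return true;
    },
    clearDegraded(adapterId: string): void {
      records.delete(adapterId);
    },
  };
}

// Advance a fake clock past the TTL without ever calling clearDegraded.
let fakeNow = 0;
const registry = createDegradedRegistry(() => fakeNow);
registry.markDegraded("adapter-a", "malformed_block", "conformance");
const degradedBeforeExpiry = registry.isDegraded("adapter-a");
fakeNow = DEFAULT_TTL_MS;
const degradedAfterExpiry = registry.isDegraded("adapter-a");
```

With the fake clock, the entry counts as degraded right after it is recorded and stops counting once the clock reaches `expiresAt`, mirroring what the clock-seam test is described as proving.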
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
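The composite formula quoted in the user guide commit above is a plain weighted sum. The sketch below uses the weights exactly as documented there; the `Role` type, the `WEIGHTS` table, and the `composite` function name are illustrative, and the sample scores are made up.

```typescript
type Role = "designer" | "critic" | "brand" | "a11y" | "copy";

// Weights as documented: designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
const WEIGHTS: Record<Role, number> = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

function composite(scores: Record<Role, number>): number {
  let total = 0;
  for (const role of Object.keys(WEIGHTS) as Role[]) {
    total += WEIGHTS[role] * scores[role];
  }
  return total;
}

// Made-up scores: 0.4 * 8 + 0.2 * 7 + 0.2 * 9 + 0.2 * 6, roughly 7.6.
const exampleComposite = composite({ designer: 10, critic: 8, brand: 7, a11y: 9, copy: 6 });
```

Note that the designer score carries a weight of 0, so the 10 in the example contributes nothing; the result is approximately 7.6 (subject to floating-point rounding).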
* feat(daemon): rollout flag resolver (Phase 15.1) Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority: 1. Skill-level policy (required wins, opt-out wins inversely) 2. Per-project override from the Settings toggle 3. OD_CRITIQUE_ENABLED env override 4. Rollout phase default: M0 (dark-launch): false; M1 (settings only): false (toggle is off until the user flips it); M2 (per-skill): true if skill opted in; M3 (global default): true. OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix. * feat(web): Settings toggle hook for Critique Theater (Phase 15.2) React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via: - the platform storage event (cross-tab) - an open-design:critique-theater-toggle CustomEvent (same-tab) Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload. setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write. The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state. 6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event. 
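The four-tier priority from the rollout resolver commit (Phase 15.1) above can be sketched as a short chain of early returns. This is one reading of the commit message, not the daemon's code: `isCritiqueEnabledSketch`, `ResolverInput`, and `parsePhase` are hypothetical names, `opt-in` is assumed to matter only at the M2 phase default, and the real parser may normalise its input differently.

```typescript
type Phase = "M0" | "M1" | "M2" | "M3";
type SkillPolicy = "required" | "opt-in" | "opt-out" | null;

interface ResolverInput {
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null;
  envOverride: boolean | null;
  phase: Phase;
}

// Unknown or missing input collapses to M0 so a fresh install stays off.
function parsePhase(raw: string | undefined): Phase {
  return raw === "M1" || raw === "M2" || raw === "M3" ? raw : "M0";
}

function isCritiqueEnabledSketch(input: ResolverInput): boolean {
  // 1. Skill-level policy: required forces on, opt-out forces off.
  if (input.skillPolicy === "required") return true;
  if (input.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (input.projectOverride !== null) return input.projectOverride;
  // 3. Environment override.
  if (input.envOverride !== null) return input.envOverride;
  // 4. Rollout phase default.
  if (input.phase === "M3") return true;
  if (input.phase === "M2") return input.skillPolicy === "opt-in";
  return false; // M0 and M1 default to off
}

const freshInstall = isCritiqueEnabledSketch({
  skillPolicy: null,
  projectOverride: null,
  envOverride: null,
  phase: parsePhase(undefined),
});
```

`null` at any tier means "no opinion", which is what lets the decision fall through to the next tier; `freshInstall` comes out `false` because nothing overrides the M0 default.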
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * feat(daemon): route critique spawn-path eligibility through the rollout resolver The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live: 1. Per-skill od.critique.policy veto (opt-out / required) 2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint) 3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures) 4. OD_CRITIQUE_ROLLOUT_PHASE default: M0 (dark-launch): false; M1 (settings only): false; M2 (per-skill): only when skillPolicy === 'opt-in'; M3 (global default): true. Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before. Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED. 
skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. 
This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. 
Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). 
The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run-scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame and the e2e tests' stubbed routes hide the mismatch. Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport, AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct. P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false, dropping the panel prompt while still routing through runOrchestrator -> parser waits for tags that never arrive -> run degrades. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix: address PerishCode P1 + P2 follow-ups on PR #1338 Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real: 1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered. 
With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block plus removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. Falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage. 
A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. 
Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as planned follow-up. Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path). * test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338) lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug. Added a regression case that: 1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one). 2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204. 3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string). The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here. 8 / 8 vitest cases green on the file. * fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338) The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/src/db.ts. 
So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. 
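The read-merge-write flow and its two failure modes described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `mergeToggle`, `persistToggle`, `ProjectRow`, `Fetcher`, and the outcome strings are hypothetical names, and the sketch assumes the GET body is the project row with an optional `metadata` block.

```typescript
// Sketch only: every type and function name here is hypothetical.
type Metadata = Record<string, unknown>;
type ProjectRow = { metadata?: Metadata | null };
type FetchResult = { ok: boolean; json(): Promise<unknown> };
type Fetcher = (
  url: string,
  init?: { method: string; headers: Record<string, string>; body: string },
) => Promise<FetchResult>;
type Outcome = "patched" | "skipped_get_failed" | "patch_failed";

// Pure merge step: keep every existing key, overlay only the toggle.
function mergeToggle(current: Metadata | null | undefined, next: boolean): Metadata {
  return { ...(current ?? {}), critiqueTheaterEnabled: next };
}

async function persistToggle(
  projectId: string,
  next: boolean,
  fetcher: Fetcher,
): Promise<Outcome> {
  const url = `/api/projects/${encodeURIComponent(projectId)}`;

  // Step 1: read. Without the current metadata no safe body can be built,
  // so any GET failure skips the PATCH instead of sending a bare patch.
  let current: Metadata | null | undefined;
  try {
    const res = await fetcher(url);
    if (!res.ok) return "skipped_get_failed";
    // Assumption: the response is the project row with optional `metadata`.
    current = ((await res.json()) as ProjectRow | null)?.metadata;
  } catch {
    return "skipped_get_failed";
  }

  // Steps 2 and 3: merge, then write. A failed PATCH is reported, not thrown,
  // because the in-session UI was already updated before this call.
  try {
    const res = await fetcher(url, {
      method: "PATCH",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ metadata: mergeToggle(current, next) }),
    });
    return res.ok ? "patched" : "patch_failed";
  } catch {
    return "patch_failed";
  }
}
```

Keeping the merge as a pure function separates the "never drop existing keys" invariant from the network path, so it can be checked without stubbing fetch.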
Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * feat(daemon): Critique Theater Phase 12 observability foundations Lands the metrics registry, the structured logger, the /api/metrics route, and the adapter-degraded bump that wires up the first data point. The orchestrator-side bumps for runs / rounds / composite / must-fix / interrupted / parser_errors / protocol_version land in a follow-up commit on this branch (kept separate so the wiring diff reads cleanly against the registry shape). Surfaces added: - apps/daemon/src/metrics/index.ts: 9 Prometheus series under the open_design_critique_* namespace with the histogram buckets the spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 / 2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at 0-10 integer steps). - apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line per call on stdout, namespaced critique. Matches the JSON-per-line convention cli.ts already uses; no new logger framework. - apps/daemon/src/server.ts: GET /api/metrics route. Honors OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs. 
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now bumps degraded_total so the adapter-health dashboard panel reflects every TTL refresh and every fresh mark. Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to apps/daemon/package.json. Both are zero-config no-ops without an exporter wired; daemon bundle size impact is ~150 KB uncompressed. The @opentelemetry/api dep is in place ahead of the OTel-spans follow-up commit; it adds no behavior on this commit. Tests: - tests/metrics/critique.test.ts (3 cases): registry shape + exposition text + reset-between-tests - tests/logging/critique.test.ts (4 cases): event shape + ordering + newline framing + namespace stamping Verification (Windows-local): - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites: 7 / 7 green - Existing adapter-degraded + conformance + rollout suites: 22 / 22 green; the bump is non-breaking * feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator Lights up the bump sites the Phase 12 foundations PR registered the series for. Every panel event the parser surfaces now reaches the matching Prometheus counter / histogram and the matching JSON log line on stdout. Switch-loop bumps + logs: - run_started: log run_started, set protocol_version gauge to the observed protocol version (small-integer cardinality). - panelist_open: record the first-open wall-clock per round so round_end can compute round_duration_ms; subsequent opens in the same round leave the start time untouched. - panelist_must_fix: bump must_fix_total with the panelist role. The wire event does not yet carry a dim name, so the label is 'unspecified' for now; a future parser revision can drop in the real dim without a metric rename. - round_end: bump rounds_total, observe composite_score, observe round_duration_ms (current ms minus the tracked start), log round_closed with the composite / mustFix / decision triple. 
- parser_warning (parser-yielded): bump parser_errors_total with the kind label, log parser_recover with kind + position. Orchestrator-side parser warnings (composite_mismatch and duplicate_ship from the daemon-authoritative scoring checks) go through a new emitParserWarning helper so the bus emit, the collectedEvents push, the metric bump, and the log line stay in lockstep. Three inline emission sites collapse to one-line helper calls. After the try/catch, a single terminal-status switch bumps runs_total{status, adapter, skill} once per run, with branch-specific log + counter: - shipped / below_threshold: log run_shipped - interrupted: bump interrupted_total, log run_failed{cause: interrupted} - timed_out: log run_failed{cause: timed_out} - failed: log run_failed{cause: orchestrator_internal} - degraded: log degraded{reason: orchestrator_classified} OrchestratorParams gains optional skill: string for the label; defaults to 'unknown' so spawn sites that have not yet threaded it keep working without a metric shape change. Tests: - The new metrics + logging suites (7 / 7) verify registry shape and event framing; orchestrator-side metric integration is exercised through the existing critique-conformance and critique-adapter-degraded suites (22 / 22 still green). - Logger test reassigns process.stdout.write directly instead of vi.spyOn so the Node overloaded write signature does not collide with MockInstance<unknown>. * feat(observability): Grafana dashboard JSON for Critique Theater Three default rows mapping to the metrics this branch wires up: 1. Fleet quality: composite score p50 / p90 / p99 line graph by adapter, plus a heatmap of the composite distribution. The line graph answers 'are my agents getting better over time'; the heatmap answers 'are the bad runs clustered around one adapter or smeared across the fleet'. 2. Adapter health: stacked bar charts for degraded marks (by adapter / reason) and parser errors (by adapter / kind) over a 5-minute window. 
The two queries together let an operator see 'is this adapter degraded because of malformed wire output or because of oversize blocks' without flipping panels. 3. Brief throughput: runs-per-hour by terminal status, an average rounds-per-run stat per adapter, and a round-duration ms p50 / p90 / p99 line. Throughput numbers fall straight out of the runs_total / rounds_total counters; the duration histogram is the same one the runs feed. The dashboard uses a templated $datasource var (defaults to 'prometheus') so an operator with multiple Prometheus instances can switch without editing JSON. Schema version 39 (Grafana 11). Operators import via: pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json or paste into a provisioned dashboards directory. The file is checked into the repo as a starting artifact; alert rules and SLO panels ship after the first 1000 runs inform the right thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity checked locally). * feat(daemon): OpenTelemetry outer span around the critique run Wraps each runOrchestrator call in a 'critique.run' span via the existing @opentelemetry/api dep added in the Phase 12 foundations commit. Attributes set on the span: - critique.run_id, critique.adapter, critique.skill at start - critique.final_status, critique.final_composite on terminal resolution - span status flipped to ERROR for failed / timed_out runs so a Tempo / Honeycomb / Jaeger filter on traces.status=error surfaces the right slice without joining back to Prometheus No exporter is wired by default; @opentelemetry/api is the API package and intentionally splits from @opentelemetry/sdk-, so the span is zero-overhead until an operator attaches an SDK through their runtime config. 
Inner per-round / parse_chunk / scoreboard_eval / persist_round / ship.persist spans defined in the Phase 12 plan are a follow-up: the outer span alone gives the trace a duration + final status + adapter/skill labels, which is the 80% value for dashboards that correlate runs across services. Adding child spans inside the existing 600-line orchestrator without restructuring is a separate careful change. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 29 / 29 critique + metrics + logging tests still green fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump nix-check failed on PR #1485 with hash mismatch in open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after the Phase 12 foundations commit (`2b8b7445`) added prom-client and @opentelemetry/api to apps/daemon/package.json and refreshed pnpm-lock.yaml. CI reported the new sha: specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8= got: 7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s= Both nix files pin the same workspace lockfile, so both flip in lockstep. No other Nix surface changes required. * fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2) 1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted agent values). The new observability path now records rs.composite and rs.mustFix (daemon-authoritative) instead of event.composite and event.mustFix when rs exists, and skips the bumps + log entirely when rs is missing (a degenerate round_end without any matching panelist_open). The dashboard p50 / p90 / p99 now agrees with persistence and ship decisions; an adapter reporting <ROUND_END composite='10'> while the daemon computed 6 logs 6 and still emits the composite_mismatch parser warning the prior block was already producing. 2. Codex P2 in server.ts (skill label always 'unknown'). 
The spawn path called runOrchestrator without passing the resolved skill id, so every live run bumped open_design_critique_{skill='unknown'} and the per-skill dashboard breakdown was always empty. Threaded effectiveSkillId (already computed at the same handler scope as the project skill fallback) through skill: . . . so the metric reflects the real skill when one is assigned, and the orchestrator default of 'unknown' only fires for runs that genuinely have none. 3. Codex P2 in conformance.ts (protocol-version mismatch let through). An adapter that emitted <CRITIQUE_RUN version='2'> followed by a valid SHIP classified as shipped because the harness only watched for terminal events. Added a guard inside the parse loop: if a run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION, mark the adapter degraded with reason 'protocol_version_mismatch' (already in DEGRADED_REASONS) and return early. ConformanceOutcome union widened to accept the new reason. 4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour panel under-reported by 3600x). 'rate(...[1h])' returns per-second. Multiplied by 3600 so the panel title and unit match the actual value rendered. Verification: - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites (7), existing adapter-degraded (7), conformance (5), rollout (10): 29 / 29 green - Grafana JSON re-parses with node -e 'JSON.parse(...)' fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485) * fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 22:11:27 +08:00
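The runs-per-hour correction in finding 4 above comes down to a unit conversion: a rate taken over a one-hour window is still expressed per second. A small worked sketch of that arithmetic, with hypothetical helper names and a simplified average rate (no counter-reset handling or extrapolation):

```typescript
// Sketch only: helper names are hypothetical and the rate is a plain average.
const SECONDS_PER_HOUR = 3600;

// Average per-second rate of a counter that grew by `increase` over the window.
function perSecondRate(increase: number, windowSeconds: number): number {
  return increase / windowSeconds;
}

// What a "runs per hour" panel should display for that per-second rate.
function perHour(ratePerSecond: number): number {
  return ratePerSecond * SECONDS_PER_HOUR;
}

// 7200 runs in the last hour: the raw rate is 2 per second, so a panel
// titled "runs per hour" that shows the raw rate is off by a factor of 3600.
const raw = perSecondRate(7200, SECONDS_PER_HOUR);
const hourly = perHour(raw);
console.log({ raw, hourly });
```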
pftom	8b2d48a258	feat(daemon, web): enhance plugin preview handling and add new templates - Introduced logic to assemble example slides with a companion template when the declared entry is missing, improving the user experience for plugin previews. - Updated the server logic to handle special cases for `example-slides.html`, ensuring proper fallback to `template.html` when applicable. - Enhanced tests to verify the new preview assembly functionality and ensure correct rendering of fallback content. - Added new HTML and Markdown examples for various skills, including a magazine article layout and a Twitter share card, expanding the available templates for users. This update significantly improves the plugin preview experience, providing users with more robust and visually appealing fallback options.	2026-05-13 20:58:24 +08:00
Prantik Medhi	cd30ec6fef	docs: add Windows launcher note (#1546)	2026-05-13 16:46:42 +08:00
github-actions[bot]	1c27e34e02	Update docs/assets/github-metrics.svg - [Skip GitHub Action] (#1492) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2026-05-13 13:34:14 +08:00
pftom	c9cc3b88c0	feat(web): standardize plugin terminology and enhance UI components - Updated terminology from "Community" to "Official" across various components to reflect first-party plugin status. - Enhanced the ChatComposer, HomeHero, and PluginsHomeSection components to improve user experience and clarity in plugin management. - Improved CSS styles for better visual consistency and layout across plugin-related interfaces. - Added tests to ensure proper functionality and visibility of official plugins in the UI. This update reinforces the distinction between official and user-installed plugins, enhancing the overall user experience in plugin interactions.	2026-05-13 12:19:29 +08:00
nettee	0f0d2879ff	Make de/fr/ru content i18n optional (#1511)	2026-05-13 12:17:17 +08:00
Nagendhra Madishetti	e2f409579d	docs: Critique Theater Phase 14 (user guide + 2 AGENTS module maps) (#1319 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. * feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. 
prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. 
- intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. * fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. 
Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. 
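The SSE payload-override fix above hinges on spread order: the channel-derived `type` has to be applied after the payload, and the result revalidated. A minimal sketch under stated assumptions: `decodeFrame`, `isPanelEventLike`, and `PanelEventLike` are hypothetical stand-ins for `sseToPanelEvent`, `isPanelEvent`, and `PanelEvent`, and the event-name set is reconstructed from the names mentioned in this commit message rather than copied from `CRITIQUE_SSE_EVENT_NAMES`.

```typescript
// Sketch only: names and shapes are hypothetical stand-ins.
type PanelEventLike = { type: string; runId: string; [key: string]: unknown };

const CHANNEL_PREFIX = "critique.";

// Event names as listed in the commit text above (assumed complete).
const KNOWN_TYPES: ReadonlySet<string> = new Set([
  "run_started", "panelist_open", "panelist_dim", "panelist_must_fix",
  "panelist_close", "round_end", "ship", "degraded", "interrupted",
  "failed", "parser_warning",
]);

// Cheap structural check: known type plus a non-empty string runId.
function isPanelEventLike(value: unknown): value is PanelEventLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.type === "string" &&
    KNOWN_TYPES.has(v.type) &&
    typeof v.runId === "string" &&
    v.runId.length > 0
  );
}

function decodeFrame(channel: string, rawData: string): PanelEventLike | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;

  let data: unknown;
  try {
    data = JSON.parse(rawData);
  } catch {
    return null; // malformed payloads are dropped, the stream stays up
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;

  // Payload first, channel-derived type last, so the channel always wins.
  const candidate = {
    ...(data as Record<string, unknown>),
    type: channel.slice(CHANNEL_PREFIX.length),
  };
  return isPanelEventLike(candidate) ? candidate : null;
}
```

With this ordering a frame on `critique.run_started` whose payload carries `type: 'ship'` still decodes as `run_started`, and frames with a missing or empty `runId` decode to `null`.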
* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. 
Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. `--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. 
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. 
Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block \| oversize_block \| missing_artifact), source label, and expiresAt. 
The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. * feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. 
A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):
1. What is Design Jury
2. How it works (the five panelists, auto-converging rounds, the composite formula)
3. Settings (the M1 toggle and what it does)
4. Reading the score badge
5. Replay surface
6. Troubleshooting (degraded, interrupted, failed)
7.
FAQ

The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package.

* docs: tighten Phase 14 reasoning from lefarcen review (PR #1319)

Four content gaps lefarcen flagged in the Phase 14 docs review, addressed inline rather than deferred. The fifth item (scope-drift between 'docs only' PR body and the cumulative stacked diff) is handled by rewriting the PR body, not the docs.

1. Round exit conditions (lefarcen P2-1). docs/critique-theater.md §2 'Auto-converging rounds' now lists the five conditions that stop a run (threshold reached, round budget exhausted, per-round timeout, total timeout, user interrupt) with their default values. A user debugging a run that stopped at round 1 with composite 5.4 can read this list and find the matching cause without spelunking the orchestrator.
2. Prior-art comparison (lefarcen P2-2). New §1.5 'Why an in-CLI panel and not a third-party design lint' pre-answers the 'why not Figma lint / Adobe checker / Material You conformance' question. Three differences: rule engines vs generative reviewers, post-hoc vs in-loop, external service vs same-CLI-session.
3. Composite formula rationale (lefarcen P2-4).
§2 now explains why each weight is set the way it is: critic gates correctness so it gets 0.4; brand / a11y / copy are secondary quality dimensions at 0.2 each; designer is at 0.0 in v1 because aesthetic preference is not a ship gate. The slot stays in the schema so notes flow into the transcript and a v2 config release can bump the weight without a wire-shape change.
4. v2 cast-config ownership (lefarcen P2-3). Both AGENTS.md files (daemon + web) now declare a 'Designer weight frozen at 0.0 until v2 cast config' invariant. The daemon side calls out where the SKILL.md frontmatter resolver lands (apps/daemon/src/critique/config.ts); the web side calls out where the Settings surface lands (apps/web/src/components/Settings/). A contributor reading either AGENTS.md before implementing v2 sees which module to touch first.

* docs(web): mirror the Designer-weight invariant in Theater AGENTS.md (PR #1319)

lefarcen P1 follow-up on PR #1319: the daemon AGENTS.md already declares 'Designer weight is frozen at 0.0 until v2 cast config lands' as an invariant, but the web AGENTS.md's parallel bullet led with 'Composite weights are read-only on the web side' which buried the Designer-specific constraint. A web contributor reading that bullet would not realise the v1 weight distribution is wire-shape (changing it mid-v1 invalidates persisted critique_runs composite values).

Rewrote the bullet to lead with the same 'Designer weight is frozen at 0.0 until v2 cast config lands' phrasing the daemon side uses, and added an explicit cross-link to the daemon AGENTS.md so the two halves of the invariant read as one rule. Web-side specifics retained: ScoreTicker / TheaterCollapsed read composite off the wire (no client recompute), v2 lands as a Settings surface at apps/web/src/components/Settings/, do not add a 'weights' prop to any component in this directory until the contracts package carries the v2 cast type.
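
For readers who want the v1 composite arithmetic spelled out, the weights quoted in this message (designer 0.0, critic 0.4, brand 0.2, a11y 0.2, copy 0.2) can be sketched as a small weighted sum. This is an illustration only: `PanelistScores`, `V1_WEIGHTS`, and `compositeScore` are hypothetical names, not identifiers from the repository, and per the notes above the web side reads the composite off the wire rather than recomputing it.

```typescript
// Hypothetical names: PanelistRole, PanelistScores, V1_WEIGHTS and
// compositeScore are illustrative assumptions, not the project's code.
type PanelistRole = "designer" | "critic" | "brand" | "a11y" | "copy";
type PanelistScores = Record<PanelistRole, number>;

// Weights as quoted in the commit message; designer stays at 0.0 in v1.
const V1_WEIGHTS: Readonly<Record<PanelistRole, number>> = {
  designer: 0.0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

// Weighted sum of the per-panelist scores.
function compositeScore(
  scores: PanelistScores,
  weights: Readonly<Record<PanelistRole, number>> = V1_WEIGHTS,
): number {
  let total = 0;
  for (const role of Object.keys(weights) as PanelistRole[]) {
    total += scores[role] * weights[role];
  }
  return total;
}
```

Because the designer weight is zero, changing only the designer score leaves the composite untouched, which matches the statement that aesthetic preference is not a ship gate in v1.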
* docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319)

Two carryover items lefarcen flagged across the PR #1319 + #1320 reviews.

1. docs/critique-theater.md was sending users to /api/metrics/critique as the conformance-status check on malformed_block, but the Phase 12 metrics endpoint is explicitly deferred until after orchestrator wiring lands. Replaced the link with the pnpm conformance-harness command that DOES exist today (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) and noted that the dashboard surfaces this status as a series once Phase 12 ships.
2. apps/web/src/components/Theater/AGENTS.md module map was stale after Phase 15: the index.ts row said 'only two hooks are exported' but the barrel now exports useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled setter). Updated the row to list all three hooks + the setter + the reducer-derived contract types, and added a new row for hooks/useCritiqueTheaterEnabled.ts in the file table so a web contributor scanning the table sees the new hook without inferring it from the index.ts blurb.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 12:11:48 +08:00
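
The wait-for-daemon-ack pattern described in the last fix above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `interruptRun`, `RunSnapshot`, `InterruptDeps`, and the shape of the `interrupted` action are hypothetical, and the fetch function is injected so the flow can be exercised without a daemon. The interrupt URL is passed in rather than constructed, since the message only shows its tail.

```typescript
// Hypothetical shapes: these types and names are illustrative assumptions,
// not the repository's actual reducer actions or component API.
interface RunSnapshot {
  runId: string;
  bestRound: number;
  composite: number;
}

type InterruptedAction = { type: "interrupted" } & RunSnapshot;

interface InterruptDeps {
  fetchFn: (url: string, init: { method: "POST" }) => Promise<{ ok: boolean }>;
  dispatch: (action: InterruptedAction) => void;
  setInterruptPending: (pending: boolean) => void;
}

async function interruptRun(
  url: string,
  current: RunSnapshot,
  deps: InterruptDeps,
): Promise<boolean> {
  // Snapshot at click time so later state changes cannot alter the payload.
  const snapshot: RunSnapshot = { ...current };
  deps.setInterruptPending(true);
  try {
    const res = await deps.fetchFn(url, { method: "POST" });
    if (res.ok) {
      // Daemon acknowledged (for example with a 204): only now dispatch.
      deps.dispatch({ type: "interrupted", ...snapshot });
      return true;
    }
  } catch {
    // Network rejection: fall through so the user can retry.
  }
  // Rejection or non-2xx: leave the run live and re-enable the button.
  deps.setInterruptPending(false);
  return false;
}
```

On a 404 or 409 nothing is dispatched, so the run stays on the live stage and the real SSE terminal event can still be processed. Whether the pending flag is also cleared after a successful ack is not stated in the message; the sketch leaves it set.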


140 commits