open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
李韭二	440de58e10	docs(packaging): clarify Intel Mac ZIP support (#1874) Co-authored-by: li9292 <li9292@li9292s-MacBook-Air.local>	2026-05-16 22:27:33 +08:00
lefarcen	22a3b99a47	Merge origin/main into preview/v0.8.0 Sync 49 commits from main. Conflicts resolved: - .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's linux specs + release-stable.yml + release-preview.yml triggers - .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder - apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText - apps/web/src/components/ChatPane.tsx: kept both new imports - apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks - e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's inline dialog navigation (UI was redesigned in v0.8.0) - nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh	2026-05-15 18:23:33 +08:00
Olin Hendershot	74637f1cb5	Add Linux packaged client parity smoke coverage (#1204) * docs: plan linux client issue 709 * fix: complete linux headless lifecycle routing * feat: add linux packaged inspect * test: add linux headless packaged smoke * ci: add linux headless packaged smoke * ci: smoke linux AppImage release artifacts * docs: document linux packaged client status * chore: finalize linux client audit remediation * docs: add linux client publication packet * test: harden linux client smoke coverage * ci: preserve linux smoke audit evidence * refactor: consolidate linux e2e helpers Move pathExists and the desktop/web/daemon app-key array out of linux.spec.ts into linux-helpers.ts, where expectPathInside and linuxUserHome already live. Keeps the spec file focused on tests and the helpers file as the canonical home for shared Linux e2e utilities. * fix: move linux e2e helpers to lib * fix: address linux release review blockers * fix: drop npm dependency from containerized linux build writeAssembledApp() previously called runNpmInstall(), which executed `npm install` directly. Inside the containerized build path, electronuserland/builder:base strips npm/npx/corepack, so the inner tools-pack build would fail at the assembled-app install step. Route the install through OD_TOOLS_PACK_PNPM_BIN: buildDockerArgs sets the env to the standalone pnpm binary it bootstraps, and the new resolveProductionInstallCommand helper consumes that env to run `<bin> install --prod --no-lockfile --config.node-linker=hoisted`. Host invocations with no env set keep the prior npm behavior. --config.node-linker=hoisted preserves the flat node_modules layout that electron-builder packs the same way as npm-installed trees. New tests cover the resolver branches and assert the docker-arg-to-resolver chain end-to-end so reviewers can see the container's inner build receives the env that switches its install away from npm. 
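A minimal TypeScript sketch of the resolver branch described above, assuming the helper receives the environment as a plain record and returns a command plus an argument list. Only the env name `OD_TOOLS_PACK_PNPM_BIN` and the pnpm flags come from the commit message; the `InstallCommand` shape and the bare `npm install` fallback are illustrative assumptions, not the repository's actual signature.

```typescript
// Hypothetical return shape; the real helper's signature is not shown in the commit.
interface InstallCommand {
  command: string;
  args: string[];
}

// Env var named in the commit message.
const PNPM_BIN_ENV = "OD_TOOLS_PACK_PNPM_BIN";

function resolveProductionInstallCommand(
  env: Readonly<Record<string, string | undefined>>,
): InstallCommand {
  // An unset or blank value counts as "not inside the container".
  const pnpmBin = env[PNPM_BIN_ENV]?.trim();
  if (pnpmBin) {
    // Container path: npm is stripped from the builder image, so use the
    // bootstrapped pnpm binary. node-linker=hoisted keeps a flat, npm-like
    // node_modules tree for electron-builder to pack.
    return {
      command: pnpmBin,
      args: ["install", "--prod", "--no-lockfile", "--config.node-linker=hoisted"],
    };
  }
  // Host path: fall back to npm (these npm flags are assumed, not from the commit).
  return { command: "npm", args: ["install"] };
}
```

Passing the environment in as an argument, rather than reading a global, is what lets a test assert the docker-arg-to-resolver chain without mutating process state.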
* fix: harden linux container bootstrap * fix: validate desktop marker liveness in headless cleanup cleanup --headless previously skipped on any parseable desktop-root.json, trapping recovery when the AppImage had crashed and left a stale marker. Validate the marker the same way stopPackedLinuxApp does: if the PID is not in the live snapshot list, proceed through cleanup instead of skipping. Extract the validation into validateDesktopAppImageMarker so the stop and cleanup paths share one definition of live and owned. Tests cover both branches: a stale marker drives cleanup to remove the runtime/output roots, while a live marker drives cleanup to skip and preserve them.	2026-05-15 16:38:29 +08:00
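A rough sketch of the shared liveness check, assuming the marker records a PID and the AppImage path and that the process snapshot exposes each PID's executable. `DesktopMarker`, `ProcessSnapshot`, and `shouldSkipHeadlessCleanup` are hypothetical names; comparing executables is one guessed reading of "owned", and treating a missing marker as "proceed" is also an assumption.

```typescript
// Hypothetical shapes: the commit does not list desktop-root.json's fields.
interface DesktopMarker {
  pid: number;
  appImagePath: string;
}

interface ProcessSnapshot {
  pid: number;
  executable: string;
}

type MarkerState = "live" | "stale";

// Shared by stop and cleanup: "live and owned" means the PID is running AND
// belongs to our AppImage (guards against a recycled PID).
function validateDesktopAppImageMarker(
  marker: DesktopMarker,
  liveProcesses: readonly ProcessSnapshot[],
): MarkerState {
  const proc = liveProcesses.find((p) => p.pid === marker.pid);
  return proc !== undefined && proc.executable === marker.appImagePath ? "live" : "stale";
}

// cleanup --headless skips only for a live, owned marker; a stale marker left
// behind by a crashed AppImage no longer blocks recovery.
function shouldSkipHeadlessCleanup(
  marker: DesktopMarker | null,
  liveProcesses: readonly ProcessSnapshot[],
): boolean {
  if (marker === null) return false; // assumption: no marker means nothing to protect
  return validateDesktopAppImageMarker(marker, liveProcesses) === "live";
}
```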
PerishCode	545aed642e	Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0 # Conflicts: # nix/package-daemon.nix # scripts/postinstall.mjs	2026-05-14 21:46:05 +08:00
PerishCode	883598f556	Build registry protocol in packaged workspaces	2026-05-14 21:23:45 +08:00
Tom Huang	76defffb93	Garnet hemisphere (#1702) * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. 
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. 
This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec	2026-05-14 21:12:50 +08:00
PerishCode	4f15c33595	Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0	2026-05-14 21:10:03 +08:00
PerishCode	9ef4b1c048	Use channel mac executable identity	2026-05-14 19:35:51 +08:00
PerishCode	43b1b94c8e	Add preview release channel	2026-05-14 19:15:16 +08:00
PerishCode	e0c76a09f2	Fix packaged beta build resources	2026-05-14 18:01:52 +08:00
PerishCode	cba8bf151d	chore: align namespace lifecycle packaging	2026-05-14 16:35:46 +08:00
lefarcen	b268bbe169	Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix Brings in 11 new garnet commits, most importantly: - `1a90aef4` feat(plugin-use): implement plugin use handoff functionality — fixes the bug QA reported where /plugins Use Plugin would 422 silently for template plugins; new flow hands off to HomeView with the plugin pre-bound + input form prompted there. - `2ac58544` feat(plugin-inputs): enhance plugin input handling with file upload support — extends PluginInputsForm for file uploads. - `3b167b69` feat(plugins): registry protocol — new @open-design/registry-protocol workspace package (needs build before daemon boot). - Plus enhancements to plugin metadata, GitHub installer, plugin detail view, login/whoami, static HTML preview paths. Conflicts resolved: - packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief field + garnet's contextPlugins (@-mention plugin context refs) both kept on ProjectMetadata. - apps/landing-page/* (3 files): accepted HEAD — garnet had the older single-page landing-page header; main has the multi-page layout (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not related to the Use Plugin core fix. New @open-design/registry-protocol package must be built before daemon boots; pnpm install does this via postinstall already.	2026-05-14 16:32:35 +08:00
pftom	2ac5854432	feat(plugin-inputs): enhance plugin input handling with file upload support - Added support for file input fields in the PluginInputsForm, allowing users to upload files with serializable metadata. - Updated the HomeHero component to improve the layout and interaction of input fields, enhancing user experience. - Adjusted CSS styles for better visual representation of input fields and their states. - Modified HomeView to reflect changes in authoring chip IDs for better clarity in plugin actions. - Enhanced tests to cover new file input functionality and ensure correct behavior in various scenarios. This update significantly improves the plugin input handling, enabling users to upload files seamlessly and enhancing the overall interaction model.	2026-05-14 15:52:21 +08:00
lefarcen	6c16283850	Merge origin/main (post-7c8305f4) into reconcile branch Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline.	2026-05-14 15:13:38 +08:00
shangxinyu1	2976c76fc3	test: expand Memory and Routines coverage (#1521) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows	2026-05-14 14:48:40 +08:00
PerishFire	59ed000903	Fix Windows resource cache for Orbit templates (#1554)	2026-05-14 14:27:29 +08:00
lefarcen	53997990b7	Merge origin/main (post-0.7.0) into reconciled garnet branch Second-pass merge layering 41+ new commits from origin/main on top of the first reconcile commit. Headline upstream additions absorbed: - 0.7.0 release: redesigned chat bubble user-text styling, neutralised palette, lucide icons, ElevenLabs audio voice option discovery in the prompt composer, analytics tracking (PostHog) wired across home / studio / create surfaces, Prometheus `/api/metrics` endpoint, critique-theater drop-in mount with a settings toggle. - Misc upstream fixes (titlebar padding, release header layout, deck preview chrome, feedback form auto-scroll, conversation-created SSE on routine runs, etc.) Conflict resolutions (12 files, ~22 hunks): - contracts barrel + prompts/system: union of both sides; new analytics exports (`./analytics/events`, `./analytics/public-params`) added alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice fields (audioVoiceOptions/audioVoiceOptionsError, main) and pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput. - daemon/server.ts: Prometheus `/api/metrics` route inserted after garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call added before the chat-run service init alongside the prior reconcile note about the dropped legacy POST /api/projects body. - App.tsx: handleCreateProject now consumes both garnet's plugin fields (pluginId / appliedPluginSnapshotId / pluginInputs / autoSendFirstMessage) and main's analytics requestId. Tracking fires success + failure paths; PluginLoopHome auto-send sessionStorage flag is preserved. - ProjectView.tsx: the garnet auto-send useEffect coexists with main's `useCritiqueTheaterEnabled()` hook. - ChatComposer.tsx: imports merged (drop now-unused fetchSkills, add analytics provider + tracking + buildVisualAnnotationAttachment). 
- index.css: main's redesigned `.msg.user .user-text` chat bubble styling wins over garnet's plain text rule; garnet's `.msg-plugin-chip*` rules preserved alongside. - EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with reconcile decision #2. main's added PetRail / TopTab / analytics view tracking is intentionally NOT brought into the wrapper; the follow-up to re-integrate PetRail / image-templates / video-templates into EntryShell still stands and now also covers analytics view-tracking hooks. - daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node + prom-client coexist). - Test fixtures (FileWorkspace.test): kept garnet's plugin-folders describe block intact; main's projectKind="prototype" addition is dropped where it conflicted with garnet's plugin-folder fixture files. Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck` exits 0 across all workspace packages. Follow-up not done in this commit: - PetRail / image-templates / video-templates / 0.7.0 analytics view-tracking hooks need to be added to EntryShell. - Critique-theater settings toggle UX (added on main) lives in the SettingsDialog hierarchy; the reconcile state preserves the SettingsDialog so this should work without changes, but no end-to-end verification yet.	2026-05-13 23:29:56 +08:00
lefarcen	d3602be666	Merge origin/main into garnet-hemisphere (reconcile) Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. 
Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory.	2026-05-13 22:29:21 +08:00
Nagendhra Madishetti	38a5ab69e6	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning, which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
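To make the defensive rules above concrete, here is a pared-down reducer sketch. The event union, state fields, and role strings are hypothetical simplifications of the real `PanelEvent` and `CritiqueState` contracts; it only illustrates the runId guard, the reboot on a new `run_started`, sticky terminal phases with a `parser_warning` side channel, and round buckets keyed by round number.

```typescript
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

// Hypothetical subset of PanelEvent; the real contract carries many more fields.
type PanelEvent =
  | { type: "run_started"; runId: string }
  | { type: "panelist_dim"; runId: string; round: number; role: string }
  | { type: "ship" | "degraded" | "interrupted" | "failed"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

interface CritiqueState {
  phase: Phase;
  runId: string | null;
  // Keyed by round number (not "last round") so late events hit the right bucket.
  rounds: Readonly<Record<number, readonly string[]>>;
  warnings: readonly string[];
}

const initialState: CritiqueState = { phase: "idle", runId: null, rounds: {}, warnings: [] };

const TERMINAL = new Set<Phase>(["shipped", "degraded", "interrupted", "failed"]);
const TERMINAL_PHASE = {
  ship: "shipped",
  degraded: "degraded",
  interrupted: "interrupted",
  failed: "failed",
} as const;

function critiqueReducer(state: CritiqueState, event: PanelEvent): CritiqueState {
  if (event.type === "run_started") {
    // A different runId reboots from scratch, whatever phase we were in.
    return event.runId === state.runId
      ? state
      : { ...initialState, phase: "running", runId: event.runId };
  }
  // Stray traffic for another run returns the same reference, so React skips re-rendering.
  if (event.runId !== state.runId) return state;
  if (event.type === "parser_warning") {
    // Late warnings go to a side channel and never change phase.
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.has(state.phase)) return state;
  if (event.type === "panelist_dim") {
    const prior = state.rounds[event.round] ?? [];
    return { ...state, rounds: { ...state.rounds, [event.round]: [...prior, event.role] } };
  }
  return { ...state, phase: TERMINAL_PHASE[event.type] };
}
```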
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. The speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event, so honest pacing requires a follow-up Phase 7+ task. The gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
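A minimal sketch of the NDJSON parse step, assuming a simplified stand-in for the shared `isPanelEvent` predicate (here the hypothetical `isPanelEventLike`, which only checks `type` and a non-empty `runId`). It skips blank and malformed lines while keeping the valid events around them, and leaves out the gunzip seam and pacing.

```typescript
// Minimal stand-in for the contracts-level event shape; the real one is richer.
interface PanelEventLike {
  type: string;
  runId: string;
}

// Hypothetical predicate; the real isPanelEvent validates far more.
function isPanelEventLike(value: unknown): value is PanelEventLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.type === "string" && typeof v.runId === "string" && v.runId.length > 0;
}

// One JSON object per line; blank lines, malformed JSON, and invalid events are skipped.
function parseTranscript(ndjson: string): PanelEventLike[] {
  const events: PanelEventLike[] = [];
  for (const line of ndjson.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // drop the bad line, keep the rest of the transcript
    }
    if (isPanelEventLike(parsed)) events.push(parsed);
  }
  return events;
}
```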
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
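The two SSE hardening rules (the channel-derived `type` is authoritative, and per-variant shape validation runs before dispatch) can be sketched for the `ship` variant alone. `sseToShipEvent` and `hasValidShipShape` are hypothetical names, the field list follows the commit message, and the real `sseToPanelEvent` / `hasValidVariantShape` cover every variant and may differ in detail.

```typescript
const CHANNEL_PREFIX = "critique.";

// Trimmed, hypothetical ship shape; field names follow the commit message.
interface ShipEvent {
  type: "ship";
  runId: string;
  round: number;
  composite: number;
  status: string;
  artifactRef: { projectId: string; artifactId: string };
  summary: string;
}

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// Variant-level check for `ship` only: reject frames missing fields the UI dereferences.
function hasValidShipShape(e: Record<string, unknown>): boolean {
  const ref = e.artifactRef;
  if (typeof ref !== "object" || ref === null) return false;
  const r = ref as Record<string, unknown>;
  return (
    typeof e.round === "number" &&
    typeof e.composite === "number" &&
    Number.isFinite(e.composite) &&
    isNonEmptyString(e.status) &&
    isNonEmptyString(r.projectId) &&
    isNonEmptyString(r.artifactId) &&
    typeof e.summary === "string"
  );
}

function sseToShipEvent(channel: string, data: string): ShipEvent | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  let payload: unknown;
  try {
    payload = JSON.parse(data);
  } catch {
    return null; // malformed frame: drop it rather than tear down the stream
  }
  if (typeof payload !== "object" || payload === null) return null;
  // Spread the payload FIRST so the channel-derived `type` overrides any payload `type`.
  const candidate: Record<string, unknown> = {
    ...(payload as Record<string, unknown>),
    type: channel.slice(CHANNEL_PREFIX.length),
  };
  if (candidate.type !== "ship" || !isNonEmptyString(candidate.runId)) return null;
  return hasValidShipShape(candidate) ? (candidate as unknown as ShipEvent) : null;
}
```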
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob.
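The Phase 10.1 registry can be sketched as a `Map` keyed by adapter id, with an injected clock standing in for the test-only seam. This is a minimal sketch, not the daemon's code: `createDegradedRegistry` is a hypothetical name, and while `markDegraded` and `clearDegraded` are named in these commits, their signatures here (and the `isDegraded` reader) are assumptions.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;    // e.g. which harness or run flagged the adapter
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default window

// Hypothetical factory: `now` is the clock seam a test can drive deterministically.
function createDegradedRegistry(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();

  return {
    // Marking an adapter again overwrites its record, which refreshes the TTL.
    markDegraded(adapter: string, reason: DegradedReason, source: string): DegradedRecord {
      const record: DegradedRecord = { reason, source, expiresAt: now() + ttlMs };
      records.set(adapter, record);
      return record;
    },
    // Expired entries stop counting as degraded without an explicit clear.
    isDegraded(adapter: string): boolean {
      const record = records.get(adapter);
      if (!record) return false;
      if (record.expiresAt <= now()) {
        records.delete(adapter); // lazy eviction on read
        return false;
      }
      return true;
    },
    clearDegraded(adapter: string): boolean {
      return records.delete(adapter);
    },
  };
}
```

Evicting lazily on read keeps the registry free of timers, so a test only has to advance its fake clock past `expiresAt` to observe the entry lapse.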
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
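A sketch of that kind of gate, assuming nearest-rank percentiles and `performance.now()` timing per iteration. `percentile`, `measureP99`, and the constant names are hypothetical; the real bench may time and rank samples differently.

```typescript
const DOCUMENTED_BUDGET_MS = 2;
// Gate at 2x the documented budget so a noisy CI runner does not flake.
const P99_GATE_MS = DOCUMENTED_BUDGET_MS * 2;

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // p * n / 100 (rather than (p / 100) * n) avoids a fractional-float rank for whole-number inputs.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Time each iteration of `run` (for example, one full reducer replay) and return the p99 in ms.
function measureP99(run: () => void, iterations: number = 10000): number {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    run();
    samples.push(performance.now() - start);
  }
  return percentile(samples, 99);
}
```

In a vitest case, `run` would replay the scripted actions through the reducer, and the assertion would be `measureP99(run) < P99_GATE_MS`.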
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
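The composite formula from the user guide, written out as a sketch. Only the weights come from the text; the `PanelScores` shape and `compositeScore` name are assumptions. The weights sum to 1.0, and because the designer's weight is 0, the designer's score never moves the composite.

```typescript
type PanelRole = "designer" | "critic" | "brand" | "a11y" | "copy";
type PanelScores = Record<PanelRole, number>;

// designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
const COMPOSITE_WEIGHTS: PanelScores = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

// Weighted sum of the per-role scores.
function compositeScore(scores: PanelScores): number {
  let total = 0;
  for (const role of Object.keys(COMPOSITE_WEIGHTS) as PanelRole[]) {
    total += COMPOSITE_WEIGHTS[role] * scores[role];
  }
  return total;
}
```

For example, critic 9, brand 8, a11y 7, and copy 6 give 3.6 + 1.6 + 1.4 + 1.2 = 7.8, whatever the designer scored.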
* feat(daemon): rollout flag resolver (Phase 15.1) Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority: 1. Skill-level policy (required forces it on, opt-out forces it off) 2. Per-project override from the Settings toggle 3. OD_CRITIQUE_ENABLED env override 4. Rollout phase default: M0 dark-launch: false; M1 settings-only: false (the toggle is off until the user flips it); M2 per-skill: true if the skill opted in; M3 global default: true. The OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix. * feat(web): Settings toggle hook for Critique Theater (Phase 15.2) React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via: - the platform storage event (cross-tab) - an open-design:critique-theater-toggle CustomEvent (same-tab) The same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload. setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write. The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state. 6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event.
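A minimal sketch of the resolver's priority order, assuming a flat input object. `isCritiqueEnabled` and `OD_CRITIQUE_ROLLOUT_PHASE` are named in these commits, but the parameter shape and `parseRolloutPhase` are assumptions. So is treating `'opt-in'` as "no veto": it falls through the higher tiers and only decides the outcome at the M2 phase default.

```typescript
type RolloutPhase = "M0" | "M1" | "M2" | "M3";
type SkillCritiquePolicy = "required" | "opt-in" | "opt-out" | null;

interface CritiqueInputs {
  skillPolicy: SkillCritiquePolicy;
  projectOverride: boolean | null; // Settings toggle; null = no opinion
  envOverride: boolean | null;     // OD_CRITIQUE_ENABLED; null = unset
  phase: RolloutPhase;
}

// Unknown or missing input collapses to M0 so a fresh install never turns the feature on.
function parseRolloutPhase(raw: string | undefined): RolloutPhase {
  const value = raw?.trim();
  return value === "M1" || value === "M2" || value === "M3" ? value : "M0";
}

function isCritiqueEnabled(inputs: CritiqueInputs): boolean {
  // 1. Skill-level policy: required forces on, opt-out forces off.
  if (inputs.skillPolicy === "required") return true;
  if (inputs.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (inputs.projectOverride !== null) return inputs.projectOverride;
  // 3. Environment override.
  if (inputs.envOverride !== null) return inputs.envOverride;
  // 4. Rollout phase default.
  switch (inputs.phase) {
    case "M0":
    case "M1":
      return false;
    case "M2":
      return inputs.skillPolicy === "opt-in";
    case "M3":
      return true;
  }
}
```

Because each tier returns as soon as it has an opinion, a lower tier can never override a higher one.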
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
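A framework-free sketch of the two behaviors these commits pin. The first is a guarded read of the `open-design:config` blob that returns false for arrays, `null`, corrupt JSON, or a throwing `getItem`. The second is listener logic that prefers a boolean `event.detail.enabled` before falling back to that read. The `StorageLike` interface, `readToggle`'s signature, and `resolveToggleFromEvent` are hypothetical stand-ins for the hook internals.

```typescript
const CONFIG_KEY = "open-design:config";

interface StorageLike {
  getItem(key: string): string | null;
}

// Guarded read: any failure or non-object payload means "disabled".
function readToggle(storage: StorageLike): boolean {
  try {
    const raw = storage.getItem(CONFIG_KEY); // may throw (private mode, SecurityError)
    if (raw === null) return false;
    const parsed: unknown = JSON.parse(raw); // throws on corrupted JSON
    // typeof null === "object" and arrays are objects too, so rule both out explicitly.
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return false;
    return (parsed as { critiqueTheaterEnabled?: unknown }).critiqueTheaterEnabled === true;
  } catch {
    return false;
  }
}

// Same-tab listener logic: trust a typed payload, otherwise fall back to the stored value.
function resolveToggleFromEvent(event: { detail?: unknown }, readFallback: () => boolean): boolean {
  const detail = event.detail as { enabled?: unknown } | null | undefined;
  if (detail && typeof detail === "object" && typeof detail.enabled === "boolean") {
    return detail.enabled;
  }
  return readFallback(); // malformed CustomEvent or a cross-tab storage event
}
```

Reading the payload first is what keeps mounts coherent when `setItem` throws: the stored value is stale, but the dispatched detail still carries the user's choice.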
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * feat(daemon): route critique spawn-path eligibility through the rollout resolver The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live: 1. Per-skill od.critique.policy veto (opt-out / required) 2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint) 3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures) 4. OD_CRITIQUE_ROLLOUT_PHASE default: M0 dark-launch: false; M1 settings-only: false; M2 per-skill: true only when skillPolicy === 'opt-in'; M3 global default: true. Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before. Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED.
skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. 
This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. 
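A sketch of the normalization described in step 2: trim, lowercase, accept only the three known tokens, and collapse everything else to `null` so a typo falls through to the lower resolver tiers. `normalizeCritiquePolicy` is named in the commit; accepting `unknown` input and implementing "tolerates casing" with `toLowerCase()` are assumptions.

```typescript
type SkillCritiquePolicy = "required" | "opt-in" | "opt-out" | null;

function normalizeCritiquePolicy(raw: unknown): SkillCritiquePolicy {
  // Missing or non-scalar frontmatter values carry no opinion.
  if (typeof raw !== "string") return null;
  switch (raw.trim().toLowerCase()) {
    case "required":
      return "required";
    case "opt-in":
      return "opt-in";
    case "opt-out":
      return "opt-out";
    default:
      // Unknown token: never force the panel on or off, just fall through.
      return null;
  }
}
```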
Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). 
The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run-scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame and the e2e tests' stubbed routes hide the mismatch. Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport, AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct. P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false, dropping the panel prompt while still routing through runOrchestrator: the parser then waits for tags that never arrive, and the run degrades. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix: address PerishCode P1 + P2 follow-ups on PR #1338 Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real: 1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered.
With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block and removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. The helper falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage.
A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. 
Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as a planned follow-up. Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path). * test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338) lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug. Added a regression case that: 1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one). 2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204. 3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string). The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here. 8 / 8 vitest cases green on the file. * fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338) The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/src/db.ts.
So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. 
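A minimal sketch of the read-merge-write flow, with the fetch implementation injected so it can be stubbed. The GET and PATCH on `/api/projects/:id` come from the commit. The assumption that the GET body exposes the project's `metadata` at the top level, and the names `FetchLike`, `readProjectMetadata`, and `persistToggleToProject`, are hypothetical.

```typescript
interface ResponseLike {
  ok: boolean;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<ResponseLike>;

// Step 1: read current metadata; null means "unknown", so the caller must not PATCH.
async function readProjectMetadata(url: string, fetchImpl: FetchLike): Promise<Record<string, unknown> | null> {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) return null;
    const body = (await res.json()) as { metadata?: Record<string, unknown> | null };
    return body.metadata ?? {}; // a project with no metadata block merges onto {}
  } catch {
    return null;
  }
}

// Returns true only when the merged PATCH was sent and accepted.
async function persistToggleToProject(projectId: string, enabled: boolean, fetchImpl: FetchLike): Promise<boolean> {
  const url = `/api/projects/${encodeURIComponent(projectId)}`;
  const current = await readProjectMetadata(url, fetchImpl);
  if (current === null) return false; // a bare patch would wipe the other metadata fields

  // Step 2: overlay the toggle on top of every existing field.
  const metadata = { ...current, critiqueTheaterEnabled: enabled };

  // Step 3: PATCH the merged object; failures are swallowed because the in-session UI already flipped.
  try {
    const res = await fetchImpl(url, {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ metadata }),
    });
    return res.ok;
  } catch {
    return false;
  }
}
```

The dev-mode logging on a failed PATCH is left out here; a caller could log whenever the function returns false.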
Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * feat(daemon): Critique Theater Phase 12 observability foundations Lands the metrics registry, the structured logger, the /api/metrics route, and the adapter-degraded bump that wires up the first data point. The orchestrator-side bumps for runs / rounds / composite / must-fix / interrupted / parser_errors / protocol_version land in a follow-up commit on this branch (kept separate so the wiring diff reads cleanly against the registry shape). Surfaces added: - apps/daemon/src/metrics/index.ts: 9 Prometheus series under the open_design_critique_* namespace with the histogram buckets the spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 / 2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at 0-10 integer steps). - apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line per call on stdout, namespaced critique. Matches the JSON-per-line convention cli.ts already uses; no new logger framework. - apps/daemon/src/server.ts: GET /api/metrics route. Honors OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs. 
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now bumps degraded_total so the adapter-health dashboard panel reflects every TTL refresh and every fresh mark. Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to apps/daemon/package.json. Both are zero-config no-ops without an exporter wired; daemon bundle size impact is ~150 KB uncompressed. The @opentelemetry/api dep is in place ahead of the OTel-spans follow-up commit; it adds no behavior on this commit. Tests: - tests/metrics/critique.test.ts (3 cases): registry shape + exposition text + reset-between-tests - tests/logging/critique.test.ts (4 cases): event shape + ordering + newline framing + namespace stamping Verification (Windows-local): - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites: 7 / 7 green - Existing adapter-degraded + conformance + rollout suites: 22 / 22 green; the bump is non-breaking * feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator Lights up the bump sites the Phase 12 foundations PR registered the series for. Every panel event the parser surfaces now reaches the matching Prometheus counter / histogram and the matching JSON log line on stdout. Switch-loop bumps + logs: - run_started: log run_started, set protocol_version gauge to the observed protocol version (small-integer cardinality). - panelist_open: record the first-open wall-clock per round so round_end can compute round_duration_ms; subsequent opens in the same round leave the start time untouched. - panelist_must_fix: bump must_fix_total with the panelist role. The wire event does not yet carry a dim name, so the label is 'unspecified' for now; a future parser revision can drop in the real dim without a metric rename. - round_end: bump rounds_total, observe composite_score, observe round_duration_ms (current ms minus the tracked start), log round_closed with the composite / mustFix / decision triple. 
- parser_warning (parser-yielded): bump parser_errors_total with the kind label, log parser_recover with kind + position. Orchestrator-side parser warnings (composite_mismatch and duplicate_ship from the daemon-authoritative scoring checks) go through a new emitParserWarning helper so the bus emit, the collectedEvents push, the metric bump, and the log line stay in lockstep. Three inline emission sites collapse to one-line helper calls. After the try/catch, a single terminal-status switch bumps runs_total{status, adapter, skill} once per run, with branch-specific log + counter: - shipped / below_threshold: log run_shipped - interrupted: bump interrupted_total, log run_failed{cause: interrupted} - timed_out: log run_failed{cause: timed_out} - failed: log run_failed{cause: orchestrator_internal} - degraded: log degraded{reason: orchestrator_classified} OrchestratorParams gains optional skill: string for the label; defaults to 'unknown' so spawn sites that have not yet threaded it keep working without a metric shape change. Tests: - The new metrics + logging suites (7 / 7) verify registry shape and event framing; orchestrator-side metric integration is exercised through the existing critique-conformance and critique-adapter-degraded suites (22 / 22 still green). - Logger test reassigns process.stdout.write directly instead of vi.spyOn so the Node overloaded write signature does not collide with MockInstance<unknown>. * feat(observability): Grafana dashboard JSON for Critique Theater Three default rows mapping to the metrics this branch wires up: 1. Fleet quality: composite score p50 / p90 / p99 line graph by adapter, plus a heatmap of the composite distribution. The line graph answers 'are my agents getting better over time'; the heatmap answers 'are the bad runs clustered around one adapter or smeared across the fleet'. 2. Adapter health: stacked bar charts for degraded marks (by adapter / reason) and parser errors (by adapter / kind) over a 5-minute window.
The two queries together let an operator see 'is this adapter degraded because of malformed wire output or because of oversize blocks' without flipping panels. 3. Brief throughput: runs-per-hour by terminal status, an average rounds-per-run stat per adapter, and a round-duration ms p50 / p90 / p99 line. Throughput numbers fall straight out of the runs_total / rounds_total counters; the duration histogram is the same one the runs feed. The dashboard uses a templated $datasource var (defaults to 'prometheus') so an operator with multiple Prometheus instances can switch without editing JSON. Schema version 39 (Grafana 11). Operators import via: pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json or paste into a provisioned dashboards directory. The file is checked into the repo as a starting artifact; alert rules and SLO panels ship after the first 1000 runs inform the right thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity checked locally). * feat(daemon): OpenTelemetry outer span around the critique run Wraps each runOrchestrator call in a 'critique.run' span via the existing @opentelemetry/api dep added in the Phase 12 foundations commit. Attributes set on the span: - critique.run_id, critique.adapter, critique.skill at start - critique.final_status, critique.final_composite on terminal resolution - span status flipped to ERROR for failed / timed_out runs so a Tempo / Honeycomb / Jaeger filter on traces.status=error surfaces the right slice without joining back to Prometheus No exporter is wired by default; @opentelemetry/api is the API package and intentionally splits from @opentelemetry/sdk-, so the span is zero-overhead until an operator attaches an SDK through their runtime config. 
Inner per-round / parse_chunk / scoreboard_eval / persist_round / ship.persist spans defined in the Phase 12 plan are a follow-up: the outer span alone gives the trace a duration + final status + adapter/skill labels, which is the 80% value for dashboards that correlate runs across services. Adding child spans inside the existing 600-line orchestrator without restructuring is a separate careful change. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 29 / 29 critique + metrics + logging tests still green fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump nix-check failed on PR #1485 with a hash mismatch in open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after the Phase 12 foundations commit (`2b8b7445`) added prom-client and @opentelemetry/api to apps/daemon/package.json and refreshed pnpm-lock.yaml. CI reported the new sha: specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8= got: 7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s= Both nix files pin the same workspace lockfile, so both flip in lockstep. No other Nix surface changes are required. * fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2) 1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted agent values). The new observability path now records rs.composite and rs.mustFix (daemon-authoritative) instead of event.composite and event.mustFix when rs exists, and skips the bumps + log entirely when rs is missing (a degenerate round_end without any matching panelist_open). The dashboard p50 / p90 / p99 now agrees with persistence and ship decisions: if an adapter reports <ROUND_END composite='10'> while the daemon computed 6, the metric and log record 6, and the composite_mismatch parser warning that the prior block already produced still fires. 2. Codex P2 in server.ts (skill label always 'unknown').
The spawn path called runOrchestrator without passing the resolved skill id, so every live run bumped open_design_critique_{skill='unknown'} and the per-skill dashboard breakdown was always empty. Threaded effectiveSkillId (already computed at the same handler scope as the project skill fallback) through skill: . . . so the metric reflects the real skill when one is assigned, and the orchestrator default of 'unknown' only fires for runs that genuinely have none. 3. Codex P2 in conformance.ts (protocol-version mismatch let through). An adapter that emitted <CRITIQUE_RUN version='2'> followed by a valid SHIP classified as shipped because the harness only watched for terminal events. Added a guard inside the parse loop: if a run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION, mark the adapter degraded with reason 'protocol_version_mismatch' (already in DEGRADED_REASONS) and return early. ConformanceOutcome union widened to accept the new reason. 4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour panel under-reported by 3600x). 'rate(...[1h])' returns per-second. Multiplied by 3600 so the panel title and unit match the actual value rendered. Verification: - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites (7), existing adapter-degraded (7), conformance (5), rollout (10): 29 / 29 green - Grafana JSON re-parses with node -e 'JSON.parse(...)' fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485) * fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 22:11:27 +08:00
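The protocol-version fix (item 3 of the review-findings commit above) reduces to an early return inside the conformance parse loop. A minimal sketch, assuming simplified shapes: `PanelEvent`, `classifyConformance`, the `'incomplete'` status, and the value of `CRITIQUE_PROTOCOL_VERSION` are hypothetical; only the `'protocol_version_mismatch'` reason and the return-before-SHIP behaviour come from the commit message.

```typescript
// Assumed value for illustration; the real constant lives in the daemon.
const CRITIQUE_PROTOCOL_VERSION = 1;

// Simplified, hypothetical event shapes.
type PanelEvent =
  | { type: 'run_started'; protocolVersion: number }
  | { type: 'round_end' }
  | { type: 'ship' };

type ConformanceOutcome =
  | { status: 'shipped' }
  | { status: 'degraded'; reason: 'protocol_version_mismatch' }
  | { status: 'incomplete' }; // hypothetical: stream ended without a SHIP

function classifyConformance(events: PanelEvent[]): ConformanceOutcome {
  for (const event of events) {
    // Reject a version mismatch as soon as run_started is seen, so a
    // well-formed SHIP later in the stream cannot mask it.
    if (event.type === 'run_started' && event.protocolVersion !== CRITIQUE_PROTOCOL_VERSION) {
      return { status: 'degraded', reason: 'protocol_version_mismatch' };
    }
    if (event.type === 'ship') {
      return { status: 'shipped' };
    }
  }
  return { status: 'incomplete' };
}
```

Before the guard, a harness that only watched for terminal events would have returned `shipped` for the first stream in the tests below.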
lefarcen	5172e37217	Merge origin/main into release/v0.7.0 to prepare merge-back PR Resolves 7 conflicts via hybrid strategy: - apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature) - apps/web/src/components/Icon.tsx: take main (switch-case refactor) - apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance) - apps/web/src/index.css: take main (project-target-platforms / instructions chip styles) - apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion - nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics call sites), and #1540 (release light background) are preserved via auto-merge.	2026-05-13 18:19:47 +08:00
mehmet turac	8601dac9f4	fix(tools-pack): warn on stale dist in dev/workspace mode (#1470 ) * fix(picker): improve provider group header separation in Media model picker Added min-height and border-bottom to the sticky provider group header to ensure it fully separates from the model content below. Fixes #1434 * fix(tools-pack): warn on stale dist in dev/workspace mode Detect when dist is stale relative to source and emit a warning. This helps developers notice when they need to rebuild tools-pack after making changes to tools/pack/src/*. Fixes #1452 fix(tools-pack): remove TypeScript annotation and use recursive source mtime check Addressed review feedback: - Remove TypeScript annotation from .mjs file (P0) - Use recursive source file mtime check instead of directory mtime (P1) Fixes #1452	2026-05-13 16:28:57 +08:00
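The recursive source-mtime check requested in the #1470 review can be sketched as follows. This is TypeScript rather than the `.mjs` file the commit touches, and `newestFileMtimeMs`, `isDistStale`, `demoStaleCheck`, and the directory layout are hypothetical names, not the tools-pack code.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, rmSync, statSync, utimesSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Newest mtime (ms) of any regular file under `dir`, recursing into
// subdirectories. Directory mtimes are skipped: rewriting an existing file
// in place does not change its parent directory's mtime.
function newestFileMtimeMs(dir: string): number {
  let newest = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) newest = Math.max(newest, newestFileMtimeMs(full));
    else if (entry.isFile()) newest = Math.max(newest, statSync(full).mtimeMs);
  }
  return newest;
}

// dist counts as stale when it is missing or older than the newest source file.
function isDistStale(srcDir: string, distEntry: string): boolean {
  if (!existsSync(distEntry)) return true;
  return statSync(distEntry).mtimeMs < newestFileMtimeMs(srcDir);
}

// Demo on a throwaway tree: "build" dist, then "edit" a nested source file.
function demoStaleCheck(): { afterBuild: boolean; afterEdit: boolean } {
  const root = mkdtempSync(join(tmpdir(), 'stale-dist-'));
  try {
    const srcFile = join(root, 'src', 'nested', 'a.ts');
    const distFile = join(root, 'dist', 'index.mjs');
    mkdirSync(join(root, 'src', 'nested'), { recursive: true });
    mkdirSync(join(root, 'dist'));
    writeFileSync(srcFile, 'export {};\n');
    writeFileSync(distFile, '// built\n');
    // Pin mtimes (seconds since epoch) so the result does not depend on clock resolution.
    utimesSync(srcFile, 1000, 1000);
    utimesSync(distFile, 2000, 2000);
    const afterBuild = isDistStale(join(root, 'src'), distFile);
    utimesSync(srcFile, 3000, 3000);
    const afterEdit = isDistStale(join(root, 'src'), distFile);
    return { afterBuild, afterEdit };
  } finally {
    rmSync(root, { recursive: true, force: true });
  }
}
```

A dev-mode entry point would call `isDistStale` once at startup and print a rebuild hint rather than fail, matching the "warn" wording of the fix.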
PerishFire	61163d6b92	Optimize Windows packaged prebundle flow (#1389 )	2026-05-12 12:07:32 -04:00
lefarcen	e1bc83a476	feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428 ) * feat(analytics): scaffold PostHog product-analytics integration - Add @open-design/contracts/analytics subpath with the 17 P0 event payload types, header constants, and code↔CSV enum mapping helpers. - Add apps/daemon/src/analytics.ts with env-gated posthog-node client, request-scoped analytics context reader, and artifact-id anonymizer. - Expose GET /api/analytics/config so the web bundle never embeds the PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST. - Add apps/web/src/analytics module (identity + lazy posthog-js client + React provider) and mount it under <I18nProvider> in app/layout. No event wiring yet — that lands in the next commit alongside trigger points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer, runs.ts). * feat(analytics): wire app_launch, home_view, home_click, project_create_result - App.tsx: fire app_launch once after first effect tick. handleCreateProject now emits project_create_result on both success and failure paths. - EntryView.tsx: home_view (page) gated on agents loading so has_available_cli isn't transiently false; home_view (asset_panel) fires per top-tab change with the right result_count. - NewProjectPanel.tsx: home_click create_button fires before delegating to the parent; a fresh request_id is generated here and threaded through onCreate so the matching project_create_result stitches via $insert_id. - contracts/analytics: tighten createTabToTracking and topTabToTracking for the worktree branch's renamed tabs (live-artifact, templates). * feat(analytics): wire settings_view + 3 settings_click events - settings_view fires on dialog mount and on every section switch, carrying the active section (mapped via settingsSectionToTracking for the 16-section worktree layout), execution_mode, and the selected CLI provider id when present. 
- settings_click execution_mode_tab: setMode now emits before/after values whenever the user toggles between Local CLI and BYOK. - settings_click cli_provider_card: agent card onClick reports cli_provider_id via agentIdToTracking (kiro → other). - settings_click byok_field: onFocus added to api_key, model select, and base_url inputs; provider_id widened to include google so the worktree's Gemini protocol slot type-checks. * feat(analytics): wire studio_view + studio_click chat, studio_view artifact - packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper produces a 16-hex anonymized id for (projectId, fileName). Stable cross-platform so the daemon and the web bundle resolve the same id without a Web Crypto round-trip; daemon now re-exports it. - ChatComposer: studio_view chat_panel fires once per project mount, studio_click chat_composer fires on attachment + send buttons with estimated user_query_tokens (length/4) and has_attachment. - FileViewer: studio_view artifact fires once per (project, file) at the dispatcher level, before any sub-viewer renders, with artifact_kind derived from the renderer registry / file.kind table. - Widen TrackingExportFormat to include markdown and cloudflare_pages so the worktree branch's full share menu can emit verbatim. * feat(analytics): wire studio_click share_option + artifact_export_result HtmlViewer's share menu now emits both events per click via a fireShareExport helper: - studio_click share_option fires immediately on click with the chosen export_format and a fresh request_id. - artifact_export_result fires when the export resolves — success for sync exporters (html, markdown, template) the moment the call returns, success/failed for async exporters (pdf, zip, deploy) via .then/.catch. The same request_id threads both events so PostHog stitches click → result via $insert_id. DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages slots; markdown is now a first-class export_format value. 
Also ignore .env.local so local POSTHOG_KEY / .env-style secrets don't get committed. * feat(analytics): emit run_created and run_finished from the daemon POST /api/runs now reads the analytics context off the x-od-analytics-* headers the web client sets on every fetch, then: - Captures run_created with project_id, conversation_id, run_id, model_id, agent_provider_id (mapped via agentIdToTracking), skill_id, design_system_id, plus the token_count_source marker. - Schedules a run_finished capture on runs.wait(run) resolution, mapping succeeded/canceled/failed to success/cancelled/failed and reporting total_duration_ms. Both events use a stable insert_id derived from the same uuid so PostHog dedupes the daemon-side mirror against any future web-side capture without double-counting. Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay omitted in v1; the claude-stream parser only exposes input/output totals today. See tracking-doc-issues.md §3.2. * feat(analytics): emit settings_cli_test_result + settings_byok_test_result The original BLOCKING-list assumed these CSV P0 events were not implementable in this branch because main lacked Test buttons. The worktree HEAD actually wires `handleTestAgent` and `handleTestProvider` in SettingsDialog, so both events are now in scope. - handleTestAgent emits settings_cli_test_result on success and failure paths with cli_provider_id mapped via agentIdToTracking, result drawn from result.ok / catch branch, error_code from result.kind or the thrown error name, and duration_ms timed via performance.now(). - handleTestProvider emits settings_byok_test_result analogously, using apiProtocol (anthropic|openai|azure|ollama|google) directly as provider_id; this is wider than the CSV's 5-value enum, as documented in tracking-doc-issues.md §2.5. Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps plus matching track* helpers. AnalyticsEventName union now covers all 14 P0 events this branch supports.
* feat(analytics): gate PostHog on the existing telemetry.metrics consent The integration now reuses the same first-launch privacy banner + Settings → Privacy toggle that gates Langfuse, so a single user decision controls both telemetry sinks. - /api/analytics/config now consults the persisted AppConfigPrefs: it returns enabled=true only when POSTHOG_KEY is set AND the user has chosen "Share usage data" (telemetry.metrics === true). The response also echoes installationId so the web client uses the same anonymous id Langfuse keys off of — one identity per install, shared across both sinks. - Web AnalyticsProvider: - Bootstrap fetch resolves installationId and threads it through the x-od-analytics-anonymous-id header on every /api/* fetch, so daemon-side captures (run_created / run_finished / project_create_result) land on the same person record. - Exposes a setConsent(granted) method that calls posthog-js's opt_in_capturing / opt_out_capturing, wired from App.tsx via a useEffect watching config.telemetry?.metrics. Toggling Privacy → metrics now stops/resumes events immediately, no reload. - app_launch additionally gates on telemetry.metrics so a freshly- declined user fires nothing, and a freshly-opted-in user fires on the next reload. * feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env Wires PostHog product analytics through the same Langfuse-style build- secret pipeline so official Open Design builds ship with the key while fork builds compile without it (the integration short-circuits cleanly when POSTHOG_KEY is absent). tools/pack - resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from process.env at packaging time, validates them (no whitespace in the key, http(s) URL for host, trailing-slash strip), and stamps them on ToolPackConfig. Fork builds without the env vars simply omit the fields; the daemon-side gate keeps things off in that case. 
- Mac, Windows, and Linux packaged-config writers each append the two fields to open-design-config.json next to the existing telemetryRelayUrl entry. apps/packaged - RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost so the Electron entry and headless entry both forward them to the daemon sidecar. - buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into the daemon child env when present. The daemon's existing analytics module reads these via process.env — no daemon-side changes needed. - The headless packaged path falls back to process.env for fields the builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL is read there. CI - release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret) and POSTHOG_HOST (var) at workflow-env scope so every packaging job inherits them. PR / fork builds without these set simply skip the bake step. Tests - tools/pack: config.test.ts covers bake-through, fork-build omission, whitespace rejection, invalid-URL rejection, and trailing-slash normalization. - apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv forwarding the keys when present and omitting them when null. * feat(analytics): enable PostHog autocapture + perf + exceptions Flip on the PostHog SDK's automatic diagnostic features so we capture click paths, page transitions, web vitals, dead clicks, and browser exceptions without scattering instrumentation through the codebase. Privacy defense lives in one place — apps/web/src/analytics/scrub.ts — wired in via posthog-js's `before_send` hook so every outgoing event passes through the same audit point: - $autocapture / $rageclick / $dead_click / $copy_autocapture: strips $el_text and value/placeholder/aria-label attrs from any input, textarea, password input, or contenteditable element. 
PostHog autocapture does not capture input.value by default, but $el_text on a <textarea> reflects the typed content; that's the prompt body for us, so it has to be scrubbed every time. - $pageview / $pageleave: drops query string and fragment from $current_url / $referrer so any future ?q=… can't leak. - $exception: rewrites file:// and absolute filesystem paths in stack frames to app://apps/<repo-relative> so we don't ship the user's home directory. - Suppresses $opt_in entirely, since it duplicates our explicit setConsent toggle in App.tsx. Element-level defense in depth is limited to the single most sensitive surface: the chat composer textarea gets `ph-no-capture` so PostHog never even generates an event for clicks inside that subtree. Every other input relies on scrub.ts; sprinkling the class through every form would be noisy and easy to forget on new surfaces. The existing Privacy → "Share usage data" toggle continues to gate every new feature: posthog-js's opt_out_capturing() halts autocapture, $pageview, $exception, web vitals, and dead clicks alongside the explicit capture() calls: one global switch. 11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts. * ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions Adding posthog-js to apps/web and posthog-node to apps/daemon changed pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by sha256. The CI nix flake check failed with: specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc= got: sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s= Copy the new hash into both nix/package-web.nix and nix/package-daemon.nix per the procedure documented in nix/README.md §"First-build hash pinning". * feat(analytics): unify PostHog identity with Langfuse installationId PostHog's distinct_id is the installationId stamped by /api/analytics/config; Langfuse already reads the same id off app-config.json to populate trace.userId.
With both sinks keying off the same anonymous identity, dashboards can correlate user actions (PostHog events) with LLM runs (Langfuse traces) without re-identifying. Two gaps closed: 1. applyConsent(false) — clear posthog-js's persisted ph__posthog localStorage entry on opt-out via posthog.reset(). Without this, a user who opts out, then clicks Delete my data, then re-opts in would see PostHog stitch their new session to the deleted identity because bootstrap.distinctID only takes effect on first init. 2. applyIdentity(newInstallationId) — Delete my data rotates the installationId in app-config; App.tsx now watches config.installationId and calls posthog.reset() then identify(newId) so the next event batch is fully decoupled from the deleted one. Idempotent on same-id re-renders so benign config refreshes don't churn PostHog identities. The fetch wrapper's x-od-analytics-anonymous-id header also flips to the new id on rotation so daemon-side captures (run_created / run_finished) land on the same person record from the very next API call, not after a reload. The end-to-end rotation flow is verified against a live PostHog project; these unit tests pin the safety guards (no-client paths, null inputs) since stubbing posthog-js's init-loaded callback chain is brittle. fix(langfuse): require both metrics AND content consent for trace reports Tightens the Langfuse gate so a user who shares anonymous metrics but NOT conversation content stops emitting Langfuse traces entirely — Langfuse is used for turn-quality evals which only make sense with prompt/output bodies. PostHog (product analytics, content-free) stays gated on `metrics` alone and is unaffected. i18n: "Conversation content" → "Conversation and tool content" with hints expanded to mention tool inputs/outputs so the consent surface matches what the trace actually carries (en + zh-CN). 
Bundled here per PR scope: the change originated outside this PostHog PR but lands cleanly on the same files; gating Langfuse strictly on `content` makes the dual-sink consent model (PostHog = metrics, Langfuse = metrics + content) symmetric across both i18n locales and the daemon-side gate. * feat(analytics): wire byok_provider_option + fix PR review P1s Adds the BYOK protocol-chip click event (5-value provider_id mirroring the apiProtocol Settings UI) and resolves four P1 review threads on PR #1428. byok_provider_option: - New SettingsClickByokProviderOptionProps in contracts (provider_id = anthropic|openai|azure|google|ollama; maps to CSV's 5 values per tracking-doc-issues.md §2.5). - trackSettingsClickByokProviderOption helper in apps/web/src/analytics. - SettingsDialog hooks it on the protocol-chip onClick alongside the existing setApiProtocol call; is_selected reflects whether the chip was already active. Review fixes: 1. client.ts (Siri-Ray): clear `initPromise` when the resolution is null so a Privacy → metrics opt-in after a previous decline triggers a fresh /api/analytics/config fetch. Without this, the disabled response was cached forever, so first-session opt-in needed a reload to start sending PostHog events. 2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a strict same-origin + /api/ pathname check (shared `isSameOriginApiCall` helper). Outbound third-party URLs containing `/api/` (e.g. provider.example.com/api/x) no longer receive our x-od-analytics-* headers. 3. provider.tsx (codex-connector, lefarcen): gate header injection on `resolvedAnonId` being non-null. When Privacy → metrics is off, /api/analytics/config returns enabled=false → resolvedAnonId stays null → wrapper never installs → daemon can't read consent-bearing headers → no daemon-side PostHog event. setConsent now also clears resolvedAnonId on opt-out and re-fetches on opt-in. 4.
daemon/analytics.ts (defense in depth): createAnalyticsService now takes dataDir and capture() re-reads app-config to check telemetry.metrics inside the fire-and-forget wrapper. Even if a stale header somehow reaches the daemon after opt-out, the capture is dropped before posthog-node.capture is called. * fix(web): place "Share usage data" on the right in privacy consent banner Swap button order in PrivacyConsentModal and the in-settings ConsentCard so the affirmative "Share usage data" lands on the right and "Not now" on the left. Matches the OK-on-the-right pattern users expect for primary actions. Both buttons keep equal visual prominence (same .privacy-consent-action styling) so the swap doesn't change the EDPB equal-prominence stance called out in the original Langfuse telemetry spec. * feat(analytics): populate run_finished token totals from claude-stream usage Daemon's claude-stream parser already emits agent usage events with input_tokens / output_tokens totals; the run service buffers them in run.events and Langfuse reads them out the same way. The run_finished PostHog event was leaving these fields empty. Scan run.events for the most recent agent usage frame on terminal transition and emit input_tokens / output_tokens / total_tokens when present. token_count_source flips to 'provider_usage' only when at least one count landed; runs without provider-side usage data keep 'unknown'. Provider does not break the input down into the 7 sub-fields the tracking doc lists (memory / context / attachment / system_prompt / …); those stay omitted until a parser change exposes them. * feat(analytics): estimate user_query_tokens from prompt length The user_query_tokens field for run_created / run_finished was hardcoded to 0. We can't tokenize without bundling a model-specific tokenizer, but the character/4 heuristic is the industry-standard estimate when one isn't available and is enough for funnel analysis (prompt-length cohorts, short-vs-long-query conversion rates). 
Extracted from req.body via the same telemetryPromptFromRunRequest pattern the daemon already uses for langfuse-bridge (currentPrompt then message fallback). Only the integer count goes to PostHog — the prompt text itself never leaves the daemon. token_count_source flips appropriately: - run_created with a prompt: 'estimated' (was 'unknown') - run_created with no prompt: 'unknown' - run_finished with provider usage: 'provider_usage' (overrides baseProps' 'estimated' value) - run_finished without provider usage: inherits 'estimated' or 'unknown' from baseProps so input/output absent doesn't mask the estimate.	2026-05-12 22:32:42 +08:00
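The anonymized artifact id from the studio_view commit above is built on FNV-1a 64-bit, which needs only integer arithmetic and therefore gives identical results in Node and in the browser without Web Crypto. A sketch using `BigInt` for the 64-bit multiply; the offset basis and prime are the standard FNV-1a 64 constants, while `anonymizeArtifactId` and the NUL separator between `projectId` and `fileName` are assumptions, since the commit does not say how the two inputs are combined.

```typescript
// Standard FNV-1a 64-bit parameters.
const FNV64_OFFSET_BASIS = BigInt('0xcbf29ce484222325');
const FNV64_PRIME = BigInt('0x100000001b3');
const MASK_64 = (BigInt(1) << BigInt(64)) - BigInt(1);

// FNV-1a over the UTF-8 bytes of `input`, rendered as 16 lowercase hex chars.
function fnv1a64Hex(input: string): string {
  const bytes = new TextEncoder().encode(input);
  let hash = FNV64_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i]); // xor the byte in first (the "1a" ordering)...
    hash = (hash * FNV64_PRIME) & MASK_64; // ...then multiply, truncated to 64 bits
  }
  return hash.toString(16).padStart(16, '0');
}

// Hypothetical wrapper: the NUL separator keeps ("ab", "c") and ("a", "bc") distinct.
function anonymizeArtifactId(projectId: string, fileName: string): string {
  return fnv1a64Hex(`${projectId}\u0000${fileName}`);
}
```

The empty string hashes to the offset basis `cbf29ce484222325`, and `"a"` hashes to `af63dc4c8601ec8c`, the published FNV-1a 64 test vectors, which makes cross-runtime parity easy to pin in a unit test.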
lefarcen	2a0ebea50b	release: Open Design 0.7.0 - bump 14 monorepo package.json files to 0.7.0 (root + apps/{web,daemon,desktop,packaged,landing-page} + packages/{contracts,platform,sidecar,sidecar-proto} + tools/{dev,pack,pr} + e2e); apps/packaged was already at 0.6.1 from beta lane, all others at 0.6.0 - add CHANGELOG.md [0.7.0] - 2026-05-12 entry covering 97 merged PRs since 0.6.0: - Critique Theater: Phase 7 web client state machine (#1307) + Phase 6.2 daemon artifact extraction (#1085) - Web/UI: thumbs-up/down feedback widget (#1308), Cmd+, opens Settings (#1173), Finalize design package + Continue in CLI (#974), fetch models button for BYOK (#1034), provider models alphabetical sort (#1097), collapsible MCP JSON field-mapping (#1136), design file rename (#894) - Daemon: auto-memory store with chat-protocol-aware extraction (#999), install/uninstall skills & design systems (#1003), HTTP 206 range requests for video/audio (#1105), scheduled routines (#1033), agent runtime + route registration refactor (#1063, #1043) - HyperFrames: HTML-in-Canvas across web + skills (#866) - Skills/design systems: generic skills + design-templates split + finalize-design API (#955), agent-browser skill (#1284), WeChat design system + login-flow skill (#1083), hud/loom/trading-terminal design systems (#1069), release-notes-one-pager skill (#873), tokens.css schema (#1231) - Packaging: macOS Intel (x64) build (#759), official Nix flake (#402), beta packaging cache (#1095) - Maintainer ops: tools-pr PR-duty workspace (#1259), MAINTAINERS.md (#1290), contributor card bot (#932), PR→issue linking discipline (#1263) - Changed: conversation run isolation (#1271), default English i18n fallback (#1270), Codex CLI exit diagnostics / empty-response handling / path fallback (#1267, #1244, #1205) - Fixed: ~30 web + desktop + daemon + packaging bugfixes - Internal: nightly UI/desktop regression coverage (#1256), e2e/release report hardening (#1140), entry/settings automation (#954) - catch up 
[Unreleased] compare link to v0.7.0 and add missing [0.6.0] release link - add 97 PR footnote refs ([#402]..[#1330]) Verified locally: pnpm install + pre-build contracts/daemon/desktop dist + pnpm typecheck (exit 0 across all 14 packages on Node 22.22 with engine-warning). Release workflow validation runs after merge via release-stable.	2026-05-12 15:33:28 +08:00
lefarcen	43f7fc536a	Add Langfuse telemetry relay (#1296 ) * Add Langfuse telemetry relay * Configure telemetry worker custom domain * Add telemetry relay health check * Harden telemetry relay config	2026-05-12 13:59:19 +08:00
PerishFire	819c34fd8f	fix(tools-pr): fall back on reviewDecision for unresolved-changes-requested (#1287 ) * fix(tools-pr): fall back on reviewDecision for unresolved-changes-requested Patrol classify on the live 102-PR queue missed three PRs (#1101, #1127, #1163) where GitHub's reviewDecision is CHANGES_REQUESTED but the classify tag did not fire. Root cause is a divergence between two notions of "latest review state per reviewer": - GitHub's reviewDecision keeps a reviewer's CHANGES_REQUESTED in effect until that same reviewer submits APPROVED or DISMISSED. A subsequent COMMENTED review by the same reviewer does NOT supersede it. - Our `reduceLatestReviewsByAuthor` collapses every reviewer to their latest review with no special-casing of state, so a CHANGES_REQUESTED followed by COMMENTED disappears from the reduced view. `tagUnresolvedChangesRequested` filtered the reduced view for `state === "CHANGES_REQUESTED"`, so the three PRs above (each had a reviewer write CHANGES_REQUESTED → COMMENTED) escaped the rule even though the PR-level reviewDecision was still CHANGES_REQUESTED. Add a narrow fallback: when the first path returns no per-reviewer reviewers, trust `facts.reviewDecision === "CHANGES_REQUESTED"` as the source of truth. The fallback reason and source token differ from the first path so report consumers can tell which signal fired. Reducer semantics left alone on purpose — flipping COMMENTED handling there would cascade to `bot-only-approval`, `stale-approval`, and `humanReviewerSignalAt`, each of which has its own correctness story. * fix(tools-pr): keep fallback reason strictly factual Codex flagged that the fallback path's reason text asserted a specific review sequence ("CHANGES_REQUESTED then COMMENTED") that the condition alone does not prove. The condition only observes: - `facts.reviewDecision === "CHANGES_REQUESTED"`, and - after `reduceLatestReviewsByAuthor`, no review carries `state === "CHANGES_REQUESTED"`. 
Multiple GitHub configurations satisfy that pair — a reviewer's CR followed by COMMENTED, a CR that sits outside the `reviews(last: 30)` fetch window, etc. Per `tools/pr/AGENTS.md`'s strictly-factual rule, the reason must report only what is directly observed, not the most likely upstream cause. Drop the inferred-cause clause from `reason`; move the explanation of possible upstream causes into the code comment above the branch where it does not show up in classify output. * docs(tools-pr): document fallback data source for unresolved-changes-requested Siri-Ray and lefarcen both flagged that the tag dictionary row for `unresolved-changes-requested` only describes the primary per-reviewer path. The fallback added earlier in this PR emits the same tag with a different `source` token (`gh.reviewDecision` vs the original `gh.latestReviews[].state`), so report consumers need the dictionary to list both paths to interpret which one fired. Update the row to call out both: the primary per-reviewer rule, and the PR-level reviewDecision fallback that fires when no per-reviewer CR survives the latest-per-author reduction. The two-token source column mirrors the actual `Tag.source` strings emitted at runtime. * test(tools-pr): pin both emission paths of unresolved-changes-requested lefarcen flagged the fallback was validated only by live-PR examples in the PR body, so a refactor could silently regress the coverage. 
Add a deterministic test file `tests/tags-unresolved-cr.test.ts` that exercises `classifyPr` against crafted `PrFacts` fixtures: - primary path (per-reviewer CR after reduction) fires with source=gh.latestReviews[].state and surfaces the reviewer login - fallback path fires with source=gh.reviewDecision when no per-reviewer CR survives reduction (covers both the COMMENTED-follow-up shape and the empty-reviews shape — the latter pins the `reviews(last: 30)` out-of-window concern from the factual-reason fix) - primary wins over fallback when both signals are present (single tag emitted, source=gh.latestReviews[].state) - two negative cases: empty reviewDecision and APPROVED — neither emits Also extend the fallback's code comment with the observed scale (3 of 102 open PRs hit this gap: #1101, #1127, #1163) so future maintainers can tell this is a recurring queue pattern, not a theoretical edge case. This is the first test under `tools/pr/tests/`; the package test script already ran `node --import tsx --test tests/*.test.ts` against an empty glob, so no scaffolding changes are needed.	2026-05-12 09:40:50 +08:00
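The two emission paths that the new tests pin can be sketched as follows. `PrFacts`, `Tag`, `latestByAuthor`, and the reason strings are simplified hypotheticals; the two `source` tokens and the "primary wins over fallback" ordering are the ones described in the commit.

```typescript
// Simplified, hypothetical shapes; the real PrFacts / Tag carry more fields.
type ReviewState = 'APPROVED' | 'CHANGES_REQUESTED' | 'COMMENTED' | 'DISMISSED';
interface Review { author: string; state: ReviewState; submittedAt: string }
interface PrFacts { reviewDecision: string | null; reviews: Review[] }
interface Tag { name: 'unresolved-changes-requested'; source: string; reason: string }

// Latest review per author with no special-casing of state, so a
// CHANGES_REQUESTED followed by COMMENTED collapses to COMMENTED.
// ISO-8601 UTC timestamps compare correctly as plain strings.
function latestByAuthor(reviews: Review[]): Review[] {
  const latest = new Map<string, Review>();
  for (const review of reviews) {
    const prev = latest.get(review.author);
    if (!prev || review.submittedAt > prev.submittedAt) latest.set(review.author, review);
  }
  return Array.from(latest.values());
}

function tagUnresolvedChangesRequested(facts: PrFacts): Tag | null {
  const blockers = latestByAuthor(facts.reviews).filter((r) => r.state === 'CHANGES_REQUESTED');
  // Primary path: a per-reviewer CHANGES_REQUESTED survives the reduction.
  if (blockers.length > 0) {
    return {
      name: 'unresolved-changes-requested',
      source: 'gh.latestReviews[].state',
      reason: `changes requested by ${blockers.map((r) => r.author).join(', ')}`,
    };
  }
  // Fallback: trust the PR-level reviewDecision. The reason states only what
  // was observed, not which review sequence produced it.
  if (facts.reviewDecision === 'CHANGES_REQUESTED') {
    return {
      name: 'unresolved-changes-requested',
      source: 'gh.reviewDecision',
      reason: 'reviewDecision is CHANGES_REQUESTED; no per-reviewer CHANGES_REQUESTED after reduction',
    };
  }
  return null;
}
```

Keeping the reducer state-agnostic and adding the fallback here mirrors the commit's choice not to touch `reduceLatestReviewsByAuthor`, which other tags depend on.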
PerishFire	c3d41c7d45	fix(tools-pr): chunk stats fetch through cursor-paginated GraphQL (#1285 ) `fetchOpenPrs` was reading the stats chunk via `gh pr list --limit 1000 --json mergeStateStatus,...`. With the default limit raised to 1000 in #1259, this 502s reliably on the live open queue (107 PRs): GitHub's GraphQL gateway has to recompute mergeStateStatus for every PR up front, and the resulting query exceeds the gateway budget once the requested page passes ~60 PRs. Switch the stats chunk to `fetchPaginatedPrList`, the same cursor-paginated GraphQL helper that already drives reviews / comments / commits / assignment-timelines. Page size stays at PR_LIST_PAGE_SIZE (30), well within the gateway budget, and the heavy stats fetch is now consistent with the other heavy chunks. Verified locally: `pnpm tools-pr list` now completes against the live 107-PR queue without a 502.	2026-05-11 20:51:29 +08:00
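A generic version of the cursor-paginated loop that the stats chunk was moved onto might look like the sketch below. `fetchAllPages` and the `fetchPage` callback are hypothetical stand-ins for `fetchPaginatedPrList` and for a single `gh api graphql` call. Only the `pageInfo { hasNextPage endCursor }` shape follows GitHub's GraphQL connection convention.

```typescript
// Shape of one page of a GraphQL connection (nodes + pageInfo).
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}

interface Page<T> {
  nodes: T[];
  pageInfo: PageInfo;
}

// One request for `first` items after cursor `after` (null = first page).
type FetchPage<T> = (first: number, after: string | null) => Promise<Page<T>>;

const PAGE_SIZE = 30; // mirrors PR_LIST_PAGE_SIZE as described in the commit

// Request small pages in sequence, following endCursor until the server
// reports no further page. This keeps every query inside the gateway budget.
async function fetchAllPages<T>(fetchPage: FetchPage<T>, pageSize = PAGE_SIZE): Promise<T[]> {
  const out: T[] = [];
  let after: string | null = null;
  for (;;) {
    const page: Page<T> = await fetchPage(pageSize, after);
    out.push(...page.nodes);
    if (!page.pageInfo.hasNextPage || page.pageInfo.endCursor === null) break;
    after = page.pageInfo.endCursor;
  }
  return out;
}
```

The pages are fetched one after another because each cursor is only known once the previous page has arrived.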
PerishFire	8c0fb8dc01	feat(tools-pr): add maintainer PR-duty workspace (#1259 ) * feat(tools-pr): add maintainer PR-duty workspace Adds `tools/pr` as the maintainer-only control plane for PR-duty work on this repo. Thin `gh` wrapper that encodes repo-specific knowledge: review lanes, forbidden surfaces, lane-specific checklists, validation command derivation from touched packages. Subcommands: - `list` — triage open queue by lane and review-state bucket. - `view <num>` — agent-friendly review brief for a single PR. - `classify [num]` — emit script-level tags for one PR or the whole open queue; full-queue JSON output lands under `.tmp/tools-pr/classify/` with rate-limit telemetry per run. - `assignment` — assigner-perspective view of PR ownership, idle time, and blockers (derived from existing tags; no new judgments). Tag dictionary (13 tags) covers: bot-only-approval, needs-rebase, forbidden-surface, unlabeled, duplicate-title, non-ascii-slug, maintainer-edits-disabled, org-member, unresolved-changes-requested, stale-approval, and three awaiting-* timing tags. Each rule is expressible as one factual sentence over `gh` data + repo paths — see `tools/pr/AGENTS.md` for the full dictionary plus precision rules. Templates in `tools/pr/templates/.md` are aesthetic references for recurring maintainer comments (duplicate-title ask, awaiting-author nudge, agent-review brief shape). `templates/examples/` holds frozen-in-time agent-review snapshots for three PR shapes. Infrastructure: - `gh()` wraps `execFile` with minimum-touch retry (2 attempts at 1s + 2s backoff) on transient 5xx / network errors. Persistent failures still surface — retry is anti-jitter, not an exponential-backoff resilience layer. - Heavy chunks (`reviews`, `comments`, `commits`, assignment timelines) use cursor-paginated `gh api graphql` via `fetchPaginatedPrList` to stay under GitHub's GraphQL server-side timeout. Light chunks stay on `gh pr list --json`. 
- `fetchOrgMembers` cached per process via `gh api orgs/<owner>/members --paginate`. Wiring: - Root `package.json` adds `pnpm tools-pr` to the allowed root entry points. - `scripts/postinstall.mjs` builds `tools/pr` alongside other workspace packages. - `scripts/guard.ts` allowlists `tools/pr/bin/tools-pr.mjs` and `tools/pr/esbuild.config.mjs`, and adds `pr/` to the `tools/` top-level layout allowlist. - Root `AGENTS.md` and `tools/AGENTS.md` document the new command surface, root-command-boundary update, and per-tool ownership. docs(agents): brief tools-pr in root AGENTS.md, link to tools/pr/AGENTS.md Adds a `PR-duty tooling` section to the root AGENTS.md summarising what `pnpm tools-pr` is, listing the four common subcommands (list / view / classify / assignment), and pointing readers to `tools/pr/AGENTS.md` for the full tag dictionary, operational playbook, templates, and design rules. The section keeps root-level guidance to high-level orientation while details stay local to the tool's own AGENTS.md. * fix(tools-pr): drop overly broad touches-root-package.json forbidden hit `deriveForbidden` was flagging any change to root `package.json` as a forbidden-surface hit, but AGENTS.md §Root command boundary only forbids specific lifecycle aliases (pnpm dev / test / build / daemon / preview / start) — tools-control-plane entrypoints like `pnpm tools-pr` are explicitly allowed. Distinguishing "forbidden alias" from "allowed entry" requires reading the diff content, which is `pnpm guard`'s job rather than a path-derived classify tag. Dogfooded on this branch's own PR (#1259), which added the `pnpm tools-pr` script and was incorrectly flagged. Removing the hit aligns the `forbidden-surface` tag with what tools-pr can mechanically detect from file paths alone (apps/nextjs/, packages/shared/). 
* fix(tools-pr): paginate commits fetch, recognise ready-to-merge, escape title-index separator Three review follow-ups on #1259, all factual fixes: - `fetchOpenPrCommits` now uses `fetchPaginatedPrList` instead of a one-shot `pullRequests(first: $first)` query. GitHub GraphQL caps connection page size at 100, so the previous implementation would fail at runtime when callers passed `--limit > 100`. The paginated path makes the commits fetch consistent with the other heavy chunks (reviews, comments, assignment timelines) and removes the artificial ceiling entirely. The `limit` parameter is dropped from `fetchOpenPrCommits`; the CLI `--limit` continues to bound the `gh pr list --json` chunks. - `deriveStatus` in `assignment.ts` now reads `facts.reviewDecision` and `facts.mergeStateStatus`. When the PR is `APPROVED` with merge state `CLEAN` or `UNSTABLE` and carries no blockers, status renders as `ready to merge` instead of falling through to `in review`. The assignment view loses its main triage signal without this — a clean human-approved PR rendered identical to a REVIEW_REQUIRED one. - `tags.ts:tagDuplicateTitle` and `tags.ts:buildContext` both constructed the title-index key with a literal NUL byte between author and title, which made the file appear as binary in `git diff` / review tooling. Replaced the literal byte with a Unicode escape sequence in source; the runtime string value is identical, the source stays plain text and round-trips through review tooling cleanly. * fix(tools-pr): raise default --limit to 1000 to cover the live open queue mrcfps flagged that `tools-pr list` (and `classify --all`, `assignment`) defaults to `--limit 100`, which silently drops every PR past the first 100 in the open queue. The repo currently sits at 104 open PRs, so the out-of-the-box run was already omitting four PRs. 
Raise the default to 1000 in `list.ts`, `classify.ts`, and `assignment.ts`, and remove the now-pointless 200 ceiling — `gh pr list --limit N` paginates internally, so a high cap is cheap. Users can still pass `--limit <small>` for a truncated preview. CLI help text on the three subcommands updated to match. * fix(web): pass designTemplates to ProjectView render helper #955 made `designTemplates` a required Prop on ProjectView, but the test helper added in #1244 (`renderProjectView` in `ProjectView.api-empty-response.test.tsx`) was never updated. The two PRs landed on main without conflicting, leaving `apps/web` typecheck red for every PR that rebases past `b5eb8c16`. Pass `designTemplates={[] as SkillSummary[]}` alongside the existing `skills={[] as SkillSummary[]}` so the helper compiles. The component already treats the array shape (empty included) as a no-op fallback in the empty-response paths the test exercises. * fix(tools-pr): correct author signal + merge inline review comments Two correctness gaps in the awaiting-* signal pipeline surfaced during review of the new tools-pr commands: 1. `authorSignalAt` iterated every PR commit unconditionally. On `maintainerCanModify=true` PRs a maintainer's follow-up push would advance the author timestamp, masking a stalled author response. Filter commits to those whose `authorLogin` matches `facts.author`, mirroring the same filter already applied to comments. 2. `fetchOpenPrComments` (and `fetchView`) only fetched `pullRequest.comments` / `gh pr view --json comments`, which is the issue-conversation thread. Inline review-thread replies — where authors and reviewers actually exchange most fix-up replies — live in `reviewThreads.comments` / REST `pulls/{n}/comments`. Missing them let `humanReviewerSignalAt` / `authorSignalAt` and the `view` brief point at the wrong side after someone replied inline. 
Extend the list-mode GraphQL to also sweep `reviewThreads(last: 20).comments(first: 20)`, and add a parallel REST inline-comments fetch in `fetchView` that merges into `GhView.comments`.	2026-05-11 19:17:21 +08:00
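The retry policy described for `gh()` can be sketched independently of `execFile`. The sketch reads "2 attempts at 1s + 2s backoff" as two retries after the initial call, which is an assumption. `withTransientRetry`, `isTransient` and the transient-error regex are also assumptions, not the tool's actual code. A real `gh()` would pass in a closure that runs `gh` through `execFile`. The `sleep` parameter is injectable so that the policy can be tested without waiting.

```typescript
// Sketch only: backoff schedule and transient-error pattern are assumptions.
const BACKOFF_MS: readonly number[] = [1000, 2000]; // wait 1 s, then 2 s
const TRANSIENT = /\b50[0-4]\b|ECONNRESET|ETIMEDOUT|EAI_AGAIN/;

function isTransient(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return TRANSIENT.test(msg);
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withTransientRetry<T>(
  run: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      // Non-transient errors, and transient ones that outlast the schedule,
      // surface unchanged: this is anti-jitter, not a resilience layer.
      if (attempt >= BACKOFF_MS.length || !isTransient(err)) throw err;
      await sleep(BACKOFF_MS[attempt]);
    }
  }
}
```

A persistent 5xx therefore costs at most three calls and about 3 s before the caller sees the original error.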
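The `authorSignalAt` fix amounts to applying the same author filter to commits that was already applied to comments. Below is a minimal sketch with hypothetical field names (`authorLogin`, `committedAt`, `createdAt`). It also assumes ISO-8601 UTC timestamps, so that the latest one is the largest string.

```typescript
// Hypothetical shapes; the real PrFacts fields in tools/pr may differ.
interface CommitFact {
  authorLogin: string | null;
  committedAt: string;
}

interface CommentFact {
  authorLogin: string;
  createdAt: string;
}

interface SignalFacts {
  author: string;
  commits: CommitFact[];
  comments: CommentFact[];
}

// Latest activity by the PR author. Commits are filtered by authorLogin so
// that a maintainer's push on a maintainerCanModify PR does not count as an
// author response.
function authorSignalAt(f: SignalFacts): string | null {
  const stamps = [
    ...f.commits.filter((c) => c.authorLogin === f.author).map((c) => c.committedAt),
    ...f.comments.filter((c) => c.authorLogin === f.author).map((c) => c.createdAt),
  ];
  if (stamps.length === 0) return null;
  return stamps.reduce((a, b) => (b > a ? b : a));
}
```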
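The `ready to merge` rule added to `deriveStatus` reduces to a single predicate. This simplified sketch models only that branch. `deriveStatusSketch`, `StatusFacts` and `blockerCount` are hypothetical. Every other status the real function can return is collapsed into `in review` here.

```typescript
// Hypothetical inputs; the real function reads facts.reviewDecision,
// facts.mergeStateStatus and the derived blockers.
interface StatusFacts {
  reviewDecision: string;   // e.g. "APPROVED", "REVIEW_REQUIRED"
  mergeStateStatus: string; // e.g. "CLEAN", "UNSTABLE", "BLOCKED", "DIRTY"
  blockerCount: number;
}

type StatusSketch = "ready to merge" | "in review";

function deriveStatusSketch(f: StatusFacts): StatusSketch {
  const mergeable = f.mergeStateStatus === "CLEAN" || f.mergeStateStatus === "UNSTABLE";
  if (f.reviewDecision === "APPROVED" && mergeable && f.blockerCount === 0) {
    return "ready to merge";
  }
  return "in review"; // simplified fall-through for this sketch
}
```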
Tom Huang	b5eb8c1647	feat: generic skills + split skills/design-templates + finalize-design API (#955 ) * feat: general-purpose skills with @-mention composition and user import Lift skills from "one mode-bound skill per project" to a generic capability the user can compose per turn: - Daemon: scan multiple skill roots (user-skills under runtime data, then the bundled `skills/`); user-imported skills can shadow built-ins by id. - New `POST /api/skills/import` and `DELETE /api/skills/:id` endpoints, with CONFLICT/BAD_REQUEST/NOT_FOUND error codes and built-in delete protection. - ChatRequest gains `skillIds: string[]`; the chat run concatenates each picked skill's body (and merges craftRequires) into the system prompt for that turn only — the project's persistent `skillId` is untouched. - Web composer: `@` popover now lists skills alongside project files; picks render as removable chips above the textarea and ride along with the request as `skillIds`. - Settings → Library: import form (name/description/triggers/body), per-card delete for user skills, "user" origin badge. * chore(web): drop welcome pet teaser + add ds→prompt-template mapping util - SettingsDialog: remove the inline pet adoption teaser from the welcome panel so the first-run modal stays focused on configuration. - New `inferPromptTemplateCategoriesForDs(ds)` helper that maps a design system's authored metadata to prompt-template gallery categories. Imported by the design-system gallery wiring on a sibling branch; no callers in this branch yet. * feat: split skills/design-templates and add finalize-design API Phase 0 of the skills/design-templates refactor (specs/current/skills-and-design-templates.md): - Move ~104 rendering catalogue entries from skills/ to design-templates/ and keep skills/ for the small set of functional skills that do work on user input (utilities, briefs, packagers).
- Add design-templates/AGENTS.md and skills/AGENTS.md describing the contract, and a brand-agnostic craft/ surface for opt-in craft rules. - Daemon: add DESIGN_TEMPLATES_DIR / USER_DESIGN_TEMPLATES_DIR roots and an /api/design-templates surface mirroring /api/skills. Asset/example routes still span both registries so existing srcdoc URLs keep resolving across the rename. - Web: split LibrarySection into SkillsSection + DesignSystemsSection, rename the EntryView "Examples" tab to "Templates", and update locales + the New-project picker accordingly. Adds the finalize-design endpoint: - New apps/daemon/src/finalize-design.ts and packages/contracts/src/api/finalize.ts — one-shot synthesis of a project's transcript + active design system + current artifact into <projectDir>/DESIGN.md via the Anthropic Messages API. Per-project .finalize.lock mirrors the transcript-export hygiene from PR #493; provider credentials are not persisted by the daemon. Other supporting changes: - README + AGENTS.md updates to document the new directory split and craft/ surface, plus i18n strings across 13 locales. - Test refactors and new coverage (finalize-design, runs, sidecar server, plus refreshed daemon integration tests). - .gitignore: scope the .exe ignore to /OpenDesign.exe so legitimate vendor binaries are no longer hidden. fix(merge): move clinical-case-report to design-templates/ Origin/main added the clinical-case-report skill under skills/ before the skills/design-templates split landed. Its od.mode is prototype, so per specs/current/skills-and-design-templates.md it is a design template and belongs alongside the other rendering catalogue entries — not under the slimmed-down functional skills/ root. Moving it keeps the EntryView Templates tab consistent with origin/main's intent.
* feat(skills): curated design/creative catalogue + collapsible Settings rows Seed ~100 curated design/creative skill stubs under skills/ sourced from awesome-claude-skills (ComposioHQ) and awesome-agent-skills (VoltAgent). Each stub carries an od.category tag so the new filter pill row in Settings -> Skills can group them. The seed script (scripts/seed-curated-design-skills.ts, pnpm seed:curated-design-skills) is idempotent: it only creates folders that don't already exist, so hand-edited stubs are never overwritten. - Daemon: parse and surface od.category on SkillInfo with a strict slug normaliser; mirror the field on SkillSummary in @open-design/contracts. Category is purely a UI hint — system-prompt composition is unchanged. - Web: rewrite SkillsSection from a left-list / right-detail grid into a vertical stack of collapsible rows mirroring the External MCP panel (header always visible with name + mode/source/category pills + per-row enable toggle; SKILL.md preview, file tree and inline edit form expand on demand). Add a Category filter row above the list. Reorder Settings nav so Skills + External MCP sit above the Composio/MCP cluster. Update composer placeholder/hint across 17 locales to advertise '@ files or skills · / for commands'. - Docs: extend skills/AGENTS.md with the curated catalogue rules (idempotency, category vocabulary, no upstream vendoring). 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(skills): teach localized-content + system-prompt tests about the skills/design-templates split mrcfps blocking review on PR #955: the skills/design-templates split (`b5993385`) moved ~110 SKILL.md entries out of `skills/` and into `design-templates/`, but two repo-level tests still hard-coded the single-root layout, so CI gates went red on the merged branch: - `e2e/tests/localized-content.test.ts` only scanned `<repo>/skills` while the locale `skillCopy` map keeps id-keyed entries spanning both roots (ExamplesTab/Templates uses one lookup regardless of origin). Teach the helper to read both `skills/` and `design-templates/`, deduplicating ids so the union matches the localized claim. - `apps/daemon/tests/prompts/system.test.ts` read `skills/live-artifact/SKILL.md`, which now lives under `design-templates/live-artifact/`. Update the absolute path so composeSystemPrompt's coverage of the live-artifact preamble is exercised again. Also enroll the curated design/creative catalogue (PR #955, ~91 stubs sourced from awesome-claude-skills / awesome-agent-skills) in the DE / FR / RU `_SKILL_IDS_WITH_EN_FALLBACK` lists. The stubs are English-only by design (frontmatter advertises an upstream URL); the fallback list is exactly the place to acknowledge "we know this id exists, English copy is fine here" so the localized-content coverage gate passes without forcing a translation task per locale. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): always quote frontmatter name so importUserSkill round-trips numeric / boolean ids mrcfps PR #955 review: `buildSkillMarkdown` emitted `name: ${escapeYamlString(name)}` without quotes, so YAML coerced names like `123`, `true`, `false`, or `null` into non-string scalars on re-parse. 
listSkills() then read `data.name` as a number/boolean and the import flow's follow-up `findSkillById(skills, result.id)` missed it, falling into `/api/skills/import`'s "imported skill could not be re-read" 500 path for those ids. Switch the emitter to a quoted scalar (`name: "..."`) — the double-escape already in `escapeYamlString` makes the quoted form safe — and add a round-trip test covering `123`, `true`, `false`, `null`, and `0` to lock in the contract. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): drop staged-skill chips when the matching @<id> token leaves the draft mrcfps PR #955 review: `submit()` always forwarded every id in `stagedSkills`, but that state was only mutated on picker click and chip removal. Hand-deleting an `@<id>` token from the textarea left the chip staged, so the request still carried `skillIds: [<id>]` and the daemon composed a skill the prompt no longer referenced. Sync the chips with the draft inside `handleChange()` by pruning `stagedSkills` whenever the new value no longer contains the `@<id>` token (using the same whitespace boundary as `removeStagedSkill`'s strip regex). Comment explains why this prune does not run for `staged` file attachments — users frequently add files via the upload button without leaving an `@<path>` token, so a symmetric prune there would erase legitimate uploads. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(daemon): stage @-composed skills' side files alongside the active skill codex PR #955 review: composing a per-turn `@`-picked skill into the system prompt appended its body (with the `withSkillRootPreamble` guidance pointing at relative paths under `<cwd>/.od-skills/<folder>/`) but never staged the actual folder. 
`startChatRun` only copied `activeSkillDir`, so when the project's primary skill was different (or absent) the composed skill's references/, examples/, and scripts/ files lived only at their absolute repo path — agents that honour the cwd-relative form (or that don't get `--add-dir`, e.g. Codex with allowlisted gpt-image projects) couldn't reach them. Thread the composed skills' dirs out of `composeDaemonSystemPrompt` as `extraSkillDirs` and stage each one through the same `stageActiveSkill` API used for the primary skill. Dedupe by folder basename so a project whose primary skill is also `@`-composed isn't copied twice. Each preamble already advertises its own folder, so the prompt and the staged tree stay aligned without further changes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): respect the Library disable toggle in the project @-mention picker codex PR #955 review: only `EntryView` received `enabledSkills` (filtered against `config.disabledSkills`); active projects still got `skills={skills}` raw, so a skill the user disabled in Settings kept appearing in the project's `@`-mention popover and could ride along to the daemon via `skillIds`. That broke the Library toggle for any project opened on the post-split branch. Compute a functional-skills-only enabled subset (`enabledFunctionalSkills`) and pass it into `<ProjectView>` instead. Templates stay separate — design-templates are filtered through their own `enabledDesignTemplates` memo for the Templates gallery — so ProjectView's chat composer still only sees skills, never templates, matching the pre-split prop surface. Co-authored-by: Cursor <cursoragent@cursor.com> * test(e2e): mock /api/design-templates for example-use-prompt flow The Templates tab in EntryView fetches from /api/design-templates after the skills/design-templates split (specs/current/skills-and-design-templates.md). 
The example-use-prompt Playwright scenario only mocked /api/skills, so the gallery card never appeared and the test timed out waiting on example-card-warm-utility-example. Serve the same fixture summary on both endpoints so the templates gallery renders the card the test clicks. Co-authored-by: Cursor <cursoragent@cursor.com> * test(tools-pack): create design-templates fixture for resources test The packaging resources copy now bundles the new design-templates tree alongside skills (see resources.ts BUNDLED_RESOURCE_TREES). The copyBundledResourceTrees fixture only created skills, design-systems, craft, etc., so the recursive copy crashed with ENOENT on design-templates before it could check the prompt-templates assertion. Add the missing fixture directory so the test exercises the same set of resource trees the packaged build does. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): clone built-in side files into the shadow on first edit mrcfps PR #955 review: editing a built-in skill wrote a USER_SKILLS_DIR shadow folder that contained only a new SKILL.md. The next listSkills() pass surfaced the shadow as the active dir, but every side-file resolver (/api/skills/:id/files, /example, /assets/, the system-prompt preamble, and the per-turn cwd staging) reads through skill.dir. With nothing but SKILL.md in the shadow, the bundled assets/, references/, scripts/, and examples/ disappeared the moment the user hit save — a built-in like last30days or live-artifact would break immediately after edit instead of just having its body overridden. Teach updateUserSkill() to take a `sourceDir` and clone every entry except SKILL.md / dotfiles into the shadow on the very first edit. The shadow stays self-contained, so all the resolvers keep working without fallback bookkeeping. Subsequent edits detect the existing shadow and skip the clone, so user tweaks under the side tree survive a re-save. 
Wire `sourceDir: skill.dir` from server.ts's PUT /api/skills/:id handler and add two regression tests: - 'clones built-in side files into the shadow on the first edit' walks the file tree after save and asserts assets/template.html, references/notes.md, and scripts/helper.sh all round-trip from the built-in. - 'preserves user-edited side files on subsequent edits' edits the staged assets/template.html, re-saves, and confirms the user content is still there. Co-authored-by: Cursor <cursoragent@cursor.com> test(e2e): rename home tab from Examples to Templates The Examples tab was renamed to Templates in EntryView (b5993385's skills/design-templates split — entry.tabExamples became entry.tabTemplates and the tab value moved from 'examples' to 'templates'), but entry-chrome-flows still asserted the old label and testId. Update both. * fix(skills+web): preserve template body in API mode and dir-based skill delete Two follow-ups from PR #955 review: 1. ProjectView only received `enabledFunctionalSkills`, but `composedSystemPrompt()` still resolved `project.skillId` through that prop and `fetchSkill()`. Projects created from the new `/api/design-templates` surface keep a template id in `project.skillId`, so opening one in API mode dropped the template body from the system prompt and the upstream request ran without the project's primary template instructions. Now ProjectView takes a separate `designTemplates` prop (the unfiltered template list, so a later-disabled template still loads for projects already created from it) and `composedSystemPrompt()` plus the metadata / `isDeck` lookups fall back to that list, with `fetchDesignTemplate()` as the body-fetch fallback to `fetchSkill()`. The chat composer's `@`-picker keeps receiving only the enabled functional skills. 2. `DELETE /api/skills/:id` used `deleteUserSkill(USER_SKILLS_DIR, skill.id)` which re-slugified the frontmatter id and removed `<userSkillsDir>/<slug>/`.
That matched the import shape but missed the install shape — `installFromTarget` writes the folder at `sanitizeRepoName(url)` (GitHub) or `path.basename(realpath)` (local symlink), neither of which is guaranteed to equal the slugified frontmatter `name`. A duplicate `app.delete('/api/skills/:id', ...)` handler at the install routes never fired because Express resolved the earlier registration first, leaving the install/uninstall path without working teardown. The handler now removes `skill.dir` (the absolute path listSkills already discovered) under a USER_SKILLS_DIR safety check, using `lstat` + `unlinkSync` so symlinked local installs unlink cleanly without recursing into the user's source tree. The dead duplicate handler is removed; `deleteUserSkill` is dropped from the server.ts import set (still exported and unit-tested in skills.ts). Regression coverage in `apps/daemon/tests/skills-delete-route.test.ts` pins both shapes plus the symlink-preserves-source case. * test(daemon): point hyperframes system-prompt test at design-templates The merge with main brought in a hyperframes system-prompt test that reads `skills/hyperframes/SKILL.md`, but this branch's split moved `hyperframes` into `design-templates/` (same migration as `live-artifact` already handled above in this file). CI was failing with ENOENT on the old path. --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 17:48:34 +08:00
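The chip-pruning rule from the `handleChange()` fix can be sketched as a filter over the staged ids. The commit does not show `removeStagedSkill`'s strip regex. The whitespace boundary used below is therefore an assumption: start of text or whitespace before `@`, and whitespace or end of text after the id. The helper names `escapeRegExp`, `hasMentionToken` and `pruneStagedSkills` are also hypothetical.

```typescript
// Escape regex metacharacters so a skill id is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when the draft still contains `@<id>` as a whitespace-delimited token.
function hasMentionToken(draft: string, id: string): boolean {
  return new RegExp(`(^|\\s)@${escapeRegExp(id)}(?=\\s|$)`).test(draft);
}

// Keep only the staged skill ids whose `@<id>` token survives in the draft.
// File attachments are deliberately not pruned this way (see commit text).
function pruneStagedSkills(staged: string[], draft: string): string[] {
  return staged.filter((id) => hasMentionToken(draft, id));
}
```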
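The reason the frontmatter `name` must always be quoted is that a YAML parser resolves a bare `name: 123` or `name: true` as a number or boolean, not as the string id that was imported. Below is a minimal sketch of a double-quoted emitter. `escapeYamlDoubleQuoted` and `frontmatterNameLine` are hypothetical stand-ins for `escapeYamlString` and `buildSkillMarkdown`, and they only handle backslash, double quote and three common control characters.

```typescript
// Escape for a YAML double-quoted scalar. Backslash goes first so that the
// escapes added afterwards are not doubled again.
function escapeYamlDoubleQuoted(s: string): string {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\t/g, "\\t");
}

// Always emit a quoted scalar, so ids like 123, true, false, null or 0
// round-trip as strings.
function frontmatterNameLine(name: string): string {
  return `name: "${escapeYamlDoubleQuoted(name)}"`;
}
```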
PerishFire	421ddf553c	fix(pack/win): close running app before silent reinstall (#1238 )	2026-05-11 16:35:07 +08:00
Bryan A	587c783dc0	feat(web): add Finalize design package + Continue in CLI buttons (#451 ) (#974 ) * feat(daemon): expose resolvedDir on GET /api/projects/:id (#451 prereq) Native projects (no metadata.baseDir) live at <projects root>/<id>, where projects root is daemon-side state. The web client cannot reconstruct an absolute path on its own, and shell.openPath on a relative path is undefined behavior. Without resolvedDir, the upcoming Continue in CLI button (#451) would render permanently disabled for native projects. Mirrors PR #832's pattern of exposing designMdPath in its response. Computed via the existing resolveProjectDir(...) helper. No behavior change to existing callers; they ignore the new field. Adds ProjectDetailResponse contract type and a focused projects-routes test covering imported-folder, native, and unknown-id paths. * feat(web): add parseProvenance helper for DESIGN.md staleness checks Pure helper that extracts Project ID, design system, current artifact, transcript message count, and generated UTC timestamp from the `## Provenance` section emitted by the daemon's finalize synthesis prompt (apps/daemon/src/finalize-design.ts). Used by useDesignMdState to derive the Continue in CLI button's stale/fresh state without an additional daemon endpoint. Handles missing section, "none" sentinels for design system / artifact, and malformed timestamps without throwing. Tests cover all four branches. * feat(web): add buildClipboardPrompt template for Continue in CLI Inline single-source-of-truth template per #451 spec §3.4. Names the project, the working directory, and the DESIGN.md-first operating contract for the receiving `claude` CLI session. Trailing TODO is the blank task slot the issue body specifies — left empty so the user fills it in before submitting. 
Also lands the shared copyToClipboard helper (jsdom-safe canonical path + execCommand fallback) so the new button and any future caller share one fallback path, mirroring the inline pattern in FileViewer.tsx. Tests cover happy-path field rendering, "none"/"unknown" sentinels when DESIGN.md fields are absent, and both clipboard branches. * feat(web): add useProjectDetail + useDesignMdState hooks useProjectDetail wraps GET /api/projects/:id, surfacing the resolvedDir field and falling back to metadata.baseDir for older daemons that don't include it. Continue in CLI needs an absolute working directory so the desktop bridge can openPath it; the web client never reconstructs the path itself. useDesignMdState fetches the project's file list, downloads DESIGN.md when present, parses the Provenance section, and computes a stale verdict by comparing the recorded generatedAt against the max mtime of non-DESIGN.md files and the max conversation updatedAt. Drives the button's three-state UI (disabled / fresh / stale) without a daemon-side endpoint. Tests cover happy path, fallback, and both stale branches plus the pure computeStale helper for the null-timestamp edge case. * feat(web): add useFinalizeProject hook with cancel + error-code mapping Wraps POST /api/projects/:id/finalize/anthropic for the Finalize design package button. Three concerns: 1. Lifecycle: idle → pending → success | error. Double-clicking the button aborts the prior in-flight request before starting a new one so the daemon never sees stacked finalize calls per project. 2. Cancellation: AbortController plumbed through fetch + a 130 s timer (daemon timeout 120 s + 10 s buffer). Cancel returns to idle cleanly — it's a user gesture, not an error surface. 3. Daemon error mapping: when the response is non-OK, body.error.code drives the canonical user-facing toast string (table covers all 7 codes the daemon emits today plus a network-error catch-all).
body.error.details, when a string, surfaces alongside the category message so account-usage-cap responses (Anthropic 400 → UPSTREAM_UNAVAILABLE) can show the upstream's own reason instead of just the daemon's category label — committed to lefarcen on #450 verification reply. Tests cover request body shape, all 8 error codes via it.each, the network-error path, the details-surfacing branch, the cancel ⇒ idle flow, and the unknown-code → catch-all message branch. * feat(web): add useTerminalLaunch with electron/web detection Capability-detected wrapper around window.electronAPI.openPath. On desktop the bridge forwards to shell.openPath, which opens the OS file manager at the project working directory (per Electron's contract for directory paths — it is NOT a terminal launcher; spawning a terminal application is deferred per #451 Non-goals). On browser builds the hook reports web-fallback so the caller renders a manual-instruction toast naming the working directory. Treats any non-empty string return from shell.openPath as ok: false so platform-specific failures surface the manual fallback toast. Behavior is exercised end-to-end by the upcoming ContinueInCliButton tests. * feat(desktop): expose shell.openPath via electronAPI bridge Adds an openPath bridge method that the Continue in CLI button (#451) uses to surface the project working directory in the OS file manager. shell.openPath is part of Electron's contract and resolves to '' on success / a non-empty error string on failure; the IPC handler forwards the result so the renderer can decide between the success toast and the manual fallback toast without a separate error channel. Empty / non-string inputs short-circuit to a self-describing error string so the renderer never needs to worry about undefined-input crashes from the main process. Web side: extracts Window.electronAPI into a single global declaration at apps/web/src/types/electron.d.ts so future bridge methods land in one place. 
Two pre-existing inline declare-global blocks (NewProjectPanel.tsx, providers/registry.ts) are deleted in favor of that single source of truth — the inline ones each carried a partial shape of the bridge and were diverging from the desktop preload. * feat(web): add FinalizeDesignButton, ContinueInCliButton, ProjectActionsToolbar Project-level toolbar that hosts the two new actions from #451. Mounted between AppChromeHeader and the chat/workspace split (wiring lands in the next commit). Per-file actions (Export PDF/PPTX/ZIP, Deploy) stay in the FileViewer share menu. FinalizeDesignButton has three idle labels driven by DESIGN.md existence + staleness, plus a pending state with a spinner and a cancel link that maps to useFinalizeProject's AbortController. Error toasts are owned by ProjectView so the button doesn't carry its own toast surface. ContinueInCliButton renders disabled with a Finalize-pointing tooltip when DESIGN.md is missing (so the workflow is discoverable rather than hidden), enabled when fresh, and enabled with a stale chip otherwise. Chip text is the spec's canonical "Spec is stale — regenerate?" — N-turns-ago is deferred per spec §4.6. Toast.tsx is a tiny transient component that mirrors PromptTemplatePreviewModal's state-based toast pattern; supports a secondary details line so daemon error envelopes that carry an upstream explanation (e.g. Anthropic account-usage cap) can surface the real reason alongside the daemon's category label. CSS appends one block to apps/web/src/index.css mirroring the existing app-project-title token usage; no CSS modules in this repo (verified by grep). * test(web): cover ContinueInCliButton states + interaction wiring Three rendered states (DESIGN.md missing → disabled with the Finalize-pointing tooltip; DESIGN.md fresh → enabled, no chip; DESIGN.md stale → enabled with the canonical "Spec is stale — regenerate?" chip), plus three onClick branches (no-op when disabled, fires once when fresh, fires once when stale). 
Click-handler integration with clipboard / shell.openPath / toast lives in ProjectView (the button is presentational and takes the handler in via props), so those are covered by Phase K's wiring + the manual smoke test rather than the per-component test. * feat(web): wire Continue in CLI + Finalize buttons into ProjectView Mounts the new project-actions toolbar between AppChromeHeader and the chat/workspace split, hidden when workspaceFocused so the focus-mode artifact view stays uncluttered. Wires the four hooks (useProjectDetail, useDesignMdState, useFinalizeProject, useTerminalLaunch) to a single shared toast surface. handleFinalize reads the request body from the existing config: AppConfig prop and uses effectiveMaxTokens(config) to match the chat-flow's maxTokens defaulting; on success it refreshes useDesignMdState so the toolbar re-renders with the new chip state. handleContinueInCli builds the literal clipboard prompt, copies it, opens the working directory via shell.openPath on desktop / falls through to a manual-instruction toast on browser, and surfaces shell.openPath failures with a fallback toast that names the path. Errors lift into the same toast surface (a useEffect tied to finalize.error) so the daemon's category message + body.error.details reach the user as the spec's two-line render — covered by hook test 16a in the prior commit. ⌘+Shift+K (mac) / Ctrl+Shift+K (others) is the keyboard accelerator for Continue in CLI; capture-phase, platform-gated, no-op when DESIGN.md is missing. Mirrors the existing FileWorkspace shortcut idiom and does not collide with ⌘+P (Quick Switcher). * fix(web): distinguish timeout abort from user cancel in useFinalizeProject Addresses codex P2 finding on PR #974: the catch block treated every AbortError as a user-initiated cancel and reset to idle silently. If the internal 130 s timeout fired, users saw no failure signal but the daemon's synthesis call may still have been in flight. 
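Here is a framework-free sketch of the timeout-versus-cancel distinction. It assumes a promise-returning `request(signal)` in place of the hook's fetch. `createFinalizeRunner` and the state shape are hypothetical names, not the hook's API, and `NETWORK_ERROR` is an invented code name. The sketch also folds in the request-scoped `isCurrent()` guard that a later commit in this series adds for superseded triggers.

```typescript
type FinalizeStatus = "idle" | "pending" | "success" | "error";

interface FinalizeState {
  status: FinalizeStatus;
  error: { code: string; message: string } | null;
}

const FINALIZE_TIMEOUT_MS = 130_000;

// `request` must reject with an error named "AbortError" when its signal aborts.
function createFinalizeRunner(
  request: (signal: AbortSignal) => Promise<void>,
  onState: (state: FinalizeState) => void,
  timeoutMs: number = FINALIZE_TIMEOUT_MS,
) {
  let current: AbortController | null = null;

  async function trigger(): Promise<void> {
    current?.abort(); // supersede any in-flight request
    const controller = new AbortController();
    current = controller;
    const isCurrent = () => current === controller;
    const timedOutRef = { current: false }; // fresh per trigger, so old timeouts cannot leak

    const timer = setTimeout(() => {
      timedOutRef.current = true; // set before abort() so the catch can tell the cases apart
      controller.abort();
    }, timeoutMs);

    onState({ status: "pending", error: null });
    try {
      await request(controller.signal);
      if (isCurrent()) onState({ status: "success", error: null });
    } catch (err) {
      if (!isCurrent()) return; // superseded: the replacement request owns the state
      const isAbort =
        typeof err === "object" && err !== null && (err as { name?: unknown }).name === "AbortError";
      if (isAbort && timedOutRef.current) {
        onState({
          status: "error",
          error: { code: "TIMEOUT", message: "Finalize timed out after 130 s. The daemon may still be running." },
        });
      } else if (isAbort) {
        onState({ status: "idle", error: null }); // user cancel: silent reset
      } else {
        onState({ status: "error", error: { code: "NETWORK_ERROR", message: String(err) } });
      }
    } finally {
      clearTimeout(timer);
      if (isCurrent()) current = null;
    }
  }

  return { trigger, cancel: () => current?.abort() };
}
```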
Adds a timedOutRef set inside the setTimeout callback before controller.abort(), and branches in the catch: timeout → status 'error' with new TIMEOUT code ("Finalize timed out after 130 s. The daemon may still be running."), user cancel → existing idle reset. Reset the ref at the start of every trigger() so a previous timeout doesn't poison the next call. Adds one test using vi.useFakeTimers() that advances past 130_001 ms and asserts the TIMEOUT error surface. * fix(web): surface clipboard failures by rendering the prompt in the toast Addresses codex P2 finding on PR #974: handleContinueInCli ignored copyToClipboard's return value, so when both clipboard paths failed (restricted browser context / insecure origin) the toast still said "paste the prompt" though nothing had been copied — leaving users with no manual-copy recourse in exactly the environments where the fallback should help. handleContinueInCli now branches on copyToClipboard's boolean return. On failure the toast renders the prepared prompt in a scrollable <pre> block and pins itself open (no auto-dismiss) so the user has time to select-and-copy manually. Includes a Dismiss button + the working directory in the secondary details line so the user has the information needed to proceed. The folder-open call is skipped on copy failure because there's nothing to paste yet; the user copies first, then re-clicks Continue in CLI when they're ready. Toast component grows an optional prop and the auto-dismiss TTL is suppressed whenever code is present. CSS adds .od-toast-code (monospace, max-height 240 with overflow-auto) and .od-toast-dismiss styling.
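The clipboard-failure branch can be sketched as a function that returns the toast to render. `ToastSpec`, `continueInCli`, the toast wording, and the 4-second TTL are assumptions for illustration. The behavior it encodes comes from the text above: the prompt is shown for manual copy, there is no auto-dismiss while code is present, and the folder-open step is skipped when the copy fails.

```typescript
// Hypothetical toast model; the real Toast props may be named differently.
interface ToastSpec {
  message: string;
  details?: string; // secondary line, e.g. the working directory
  code?: string; // when present, rendered in a scrollable <pre> block
}

const DEFAULT_TOAST_TTL_MS = 4_000; // assumed value, not taken from the commit

// A toast carrying code stays pinned until the user dismisses it.
function autoDismissMs(toast: ToastSpec): number | null {
  return toast.code === undefined ? DEFAULT_TOAST_TTL_MS : null;
}

async function continueInCli(
  copyToClipboard: (text: string) => Promise<boolean>,
  openFolder: () => Promise<void>,
  prompt: string,
  projectDir: string,
): Promise<ToastSpec> {
  const copied = await copyToClipboard(prompt);
  if (!copied) {
    // Nothing reached the clipboard: show the prompt for manual copying and
    // skip the folder-open step, since there is nothing to paste yet.
    return {
      message: "Could not copy the prompt. Select it below and copy it manually.",
      details: `Working directory: ${projectDir}`,
      code: prompt,
    };
  }
  await openFolder();
  return { message: "Prompt copied. Paste it into your CLI.", details: `Working directory: ${projectDir}` };
}
```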
Six new Toast tests cover details rendering, code rendering, no-auto-dismiss when code is present, auto-dismiss when code is absent, and the Dismiss button affordance. * fix(web): make ContinueInCliButton disabled-state guidance visible Addresses mrcfps's PR #974 review: native <button disabled> does not fire hover/focus events in browsers we ship against, so a `title` tooltip on the disabled button never surfaces. The only guidance for the missing-DESIGN.md state was effectively invisible — defeating the spec's "discoverable, not hidden" intent. Renders the help text as a visible sibling <span> next to the disabled button instead. Adds aria-describedby pointing the button at the hint's id so assistive tech announces the explanation when the disabled button gets focus. The native `disabled` attribute stays so the button still can't be clicked or submitted. CSS adds .project-actions-disabled-hint (muted italic, 11.5px, matches the existing meta/secondary text style on this surface). Test asserts the role="note" hint is in the DOM with the canonical text and that the button's aria-describedby links to its id. * fix(web): keep ProjectActionsToolbar at natural height inside the .app grid The .app container was `grid-template-rows: auto 1fr` — only two rows. Adding ProjectActionsToolbar as a third child between AppChromeHeader and the chat/workspace split made the toolbar the 2nd grid item, so it took the `1fr` row (filling roughly half the viewport) while the split got pushed into an implicit auto row at its content's natural height. Surfaced as a screenshot from Bryan showing the toolbar's background bleeding across most of the screen. Extend grid-template-rows to `auto auto 1fr` and pin the split to `grid-row: 3` explicitly. Now: - Toolbar visible: row 1 = header (auto), row 2 = toolbar (auto), row 3 = split (1fr, fills remaining viewport). 
- Toolbar hidden via hidden=workspaceFocused → ProjectActionsToolbar returns null, row 2 collapses to 0px (auto with no content), split still fills row 3. No JS changes; existing 609 tests still green. * fix(web): guard useFinalizeProject state writes against superseded triggers Addresses mrcfps's PR #974 P1 review on useFinalizeProject.ts:132 (also called out as P1.3 in lefarcen's deep-dive review). Calling trigger() twice in quick succession aborted the first controller and swapped abortRef to the new one, but the first request's later AbortError catch still unconditionally called setStatus('idle') / setError(null). That cleared the spinner and re-enabled both toolbar buttons while the replacement finalize was still pending — defeating the de-duplication this hook was meant to enforce. Adds an isCurrent() closure (`abortRef.current === controller`) and gates every state-write site after the await: success path, non-OK envelope path, AbortError-timeout, AbortError-cancel, and network-error all bail early when the trigger has been superseded. Per mrcfps: "make every state write request-scoped." Regression test triggers twice in quick succession with a never-resolving fetch, awaits the first promise (it rejects with AbortError), and asserts status stays 'pending' rather than collapsing to 'idle' under the replacement's lifetime. * fix(desktop): allowlist-validate shell.openPath against registered project roots Addresses mrcfps's PR #974 P1 review on runtime.ts:305 (also called out as P1.2 in lefarcen's deep-dive review): the new `shell:open-path` IPC handler accepted any renderer-supplied string and forwarded it straight into Electron's `shell.openPath`, widening the renderer→main trust boundary so XSS or a compromised renderer dependency could open arbitrary local paths to the user. Adds an explicit gate around the bridge: 1. 
validateExistingDirectory(p): floor check that rejects empty strings, relative paths, files, apps, and non-existent paths; realpath-resolves so symlink games can't be used to register one path and reach another. 2. createProjectRootGate(): Set-backed allowlist of daemon-validated project working directories. The renderer calls registerProjectRoot(absDir) once per project mount via a new IPC method (preload bridge); the main process only opens paths that pass both the floor check and the allowlist. ProjectView wires the registration via a useEffect tied to projectDetail.resolvedDir, so the active project's daemon-supplied working directory is always the one being approved (not a renderer-synthesized string). Threat-model caveat documented in the runtime.ts comment block: an attacker that fully controls the renderer can also call register with arbitrary paths. Closing that gap fully requires a daemon-side round-trip to derive the canonical resolvedDir from the daemon's project registry, which is deferred to keep this PR focused. Today's allowlist still defends against accidental misuse, bugs, and common XSS payloads that don't know to call register first. Adds apps/packaged/tests/desktop-project-root-gate.test.ts with 13 cases: floor-validation rejection cases (empty / relative / missing / file), happy-path resolution, symlink realpath canonicalization, and the allowlist's register/isApproved/reset semantics. Mirrors the existing apps/packaged/tests/desktop-url-allowlist.test.ts pattern from PR #911; the packaged workspace hosts the test because apps/desktop has no vitest setup yet.
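Here is a sketch of the floor check using Node's `fs` and `path` modules. The `{ ok, path | reason }` return shape is hypothetical. The `.app` rejection reflects what a later commit in this series adds after the realpath step. This is an illustration, not the code in `runtime.ts`.

```typescript
import { realpathSync, statSync } from "node:fs";
import { isAbsolute } from "node:path";

// Hypothetical return shape; the real helper may throw or return differently.
type DirCheck = { ok: true; path: string } | { ok: false; reason: string };

function validateExistingDirectory(input: unknown): DirCheck {
  if (typeof input !== "string" || input.length === 0) {
    return { ok: false, reason: "path is empty" };
  }
  if (!isAbsolute(input)) return { ok: false, reason: "path is not absolute" };

  let real: string;
  try {
    // Resolve symlinks first so every later check applies to the real target.
    real = realpathSync(input);
  } catch {
    return { ok: false, reason: "path does not exist" };
  }
  if (!statSync(real).isDirectory()) return { ok: false, reason: "path is not a directory" };

  // macOS .app bundles are directories, but shell.openPath would launch them.
  if (real.endsWith(".app")) return { ok: false, reason: "application bundles are not allowed" };

  return { ok: true, path: real };
}
```

Because the suffix check runs on the realpath, a symlink named `project` that points at `Safari.app` is rejected as well.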
* fix(daemon): wire request-lifecycle abort signal through finalize route Addresses mrcfps's PR #974 P1 review on apps/daemon/src/server.ts:3831-3837 (also called out as P1.1 in lefarcen's deep-dive review): `POST /api/projects/:id/finalize/anthropic` called `finalizeDesignPackage(...)` without threading any request-lifecycle abort, so cancelling the browser fetch only aborted the UI-side request — the daemon's 60–120 s Anthropic call kept running and still wrote DESIGN.md after the UI returned to idle. Adds an AbortController inside the route handler, fired from `res.on('close')`, and threads its signal into the existing `signal?: AbortSignal` parameter on `FinalizeOptions` (finalize-design.ts:70). `callAnthropicWithRetry` already passes the signal through to the underlying fetch, so a client disconnect now propagates all the way to the Anthropic SDK call. Listener-event choice: `res.on('close')` is the canonical event for "client disconnected before response was sent" in Express. The common alternative `req.on('close')` fires whenever the request stream finishes — for POST routes that means as soon as the body-parser middleware drains the body, well before the route does any work. Using req.on('close') would have flipped the abort controller in every successful run; the test caught this empirically. Caveat documented in the route's comment block: an abort fired after the upstream response has been received but before the atomic write completes still allows the write to land. The SDK contract bounds the network round-trip, not the post-network disk handoff. Adds tests/finalize-route-abort.test.ts: spins up the test server, mocks global fetch to capture the daemon-side AbortSignal at the Anthropic call, sends the request via raw http (so we can destroy the underlying socket), waits until the server reaches the Anthropic call, then destroys the socket and asserts that the daemon-side signal received an abort event within 5 s. 
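The listener choice can be isolated into a small helper that turns a response's `close` event into an `AbortSignal`. `abortSignalForResponse` is a hypothetical name. The `writableFinished` guard is an extra precaution that the commit does not mention.

```typescript
// The two members of a Node/Express response that this helper relies on.
interface ClosableResponse {
  once(event: "close", listener: () => void): unknown;
  readonly writableFinished: boolean;
}

// Returns a signal that aborts when the client goes away before the
// response is fully written. Attach it before any awaited work.
function abortSignalForResponse(res: ClosableResponse): AbortSignal {
  const controller = new AbortController();
  res.once("close", () => {
    // 'close' also fires after a normal res.end(); the writableFinished check
    // (an assumption, not from the commit) keeps successful runs from aborting.
    if (!res.writableFinished) controller.abort();
  });
  return controller.signal;
}
```

In the route handler, the signal would be created first and passed as the `signal` option of `finalizeDesignPackage`. Listening on `req` instead would fire as soon as the body parser drains the request, which is the failure mode the commit describes.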
Three pre-existing project-watchers chokidar tests show flaky timeouts under full-suite concurrency but pass in isolation; unrelated to this fix. * fix(daemon): refactor finalize-route-abort test to satisfy strict TS narrowing The CI typecheck (`pnpm --filter @open-design/daemon typecheck`, which runs both tsconfig.json and tsconfig.tests.json) caught what my pre-push validation missed: TS narrowed `capturedSignal` to literal `null` because vitest's mockImplementation closure can't prove its callback runs, leaving the bare `let capturedSignal: AbortSignal | null = null` permanently typed at its initial value. At line 184 (`expect(capturedSignal?.aborted).toBe(true)`) the right-hand side of the optional chain became unreachable, and TS flagged it as `Property 'aborted' does not exist on type 'never'`. Switches to the standard ref-object pattern (`const capture: { signal: AbortSignal | null } = { signal: null }`). Control-flow analysis narrows the `let` to its `null` initializer and ignores assignments made inside the callback; an annotated object's property is not narrowed that way, so `capture.signal` keeps its declared `AbortSignal | null` type across the closure boundary. Logic is unchanged. (Pre-push oversight: ran `pnpm --filter @open-design/web typecheck` but not the full repo `pnpm typecheck` after the daemon test landed; the daemon's own typecheck would have caught this. Adding `pnpm typecheck` back into the standard pre-push checklist.) * fix(desktop): make shell.openPath gate daemon-controlled and reject .app bundles Addresses lefarcen + mrcfps PR #974 P1 reviews on the previous path allowlist (commit `8bf56597`): - mrcfps (runtime.ts:45): `validateExistingDirectory` accepted macOS `.app` bundles because they're directories, so the gate would forward `/Applications/Safari.app` (or any other app bundle) into shell.openPath and launch the application, a stronger capability than the bridge's intended "reveal the project folder" feature. - lefarcen (runtime.ts:396): the allowlist was renderer-controlled.
A compromised renderer could call `shell:register-project-root` with any existing absolute directory and then `shell:open-path` that same path; the IPC injection issue I'd documented as "deferred" was the central reviewer concern, not an acceptable caveat. Both reviewers asked for the gate to be derived from a daemon-authoritative source. The redesign drops the renderer-controlled register/openPath pair and replaces it with a single `openPath(projectId)` bridge call. The desktop main process resolves the project ID by calling the daemon's `GET /api/projects/:id` endpoint over the web sidecar proxy (which already forwards `/api/` to the daemon, as verified in apps/web/sidecar/server.ts:209 and apps/web/next.config.ts:77), parses `resolvedDir` from the response, validates it against the floor (absolute, exists, is-directory, not .app), and only then forwards to `shell.openPath`. The renderer never names the path directly, so a compromised renderer cannot escalate to opening arbitrary local paths: it can only name a project the daemon already knows about, and the canonical path comes from the daemon's own response. Surface changes: - `runtime.ts`: `createProjectRootGate` removed. `fetchResolvedProjectDir(webUrl, projectId, fetchImpl?)` added. `validateExistingDirectory` rejects the `.app` suffix after the realpath check (so a symlink that points at an app bundle is caught too). `shell:open-path` handler signature changes from `(path)` to `(projectId)`; `shell:register-project-root` handler removed. - `preload.cts`: `openPath(projectId)`; `registerProjectRoot` removed from the bridge surface. - `apps/web/src/types/electron.d.ts`: type updated to match. - `useTerminalLaunch.ts`: `open(projectId)` instead of `open(dir)`. - `ProjectView.tsx`: passes `project.id` to `terminalLauncher.open`; the registerProjectRoot useEffect is deleted. Toast text still reads `projectDir` (from `useProjectDetail.resolvedDir`) for fallback messages; the displayed path is independent of the open mechanism.
- `apps/packaged/tests/desktop-project-root-gate.test.ts`: rewritten to cover `validateExistingDirectory` (8 cases including the new `.app` suffix and symlinked-bundle rejection) and `fetchResolvedProjectDir` (8 cases including empty/invalid project ids, daemon HTTP success/failure, missing resolvedDir, network error, and URL canonicalization). Total: 16 passing tests, ~330 LOC churn including test rewrites. Lesson learned (from the iteration loop, not the code): when a reviewer asks for "ideally X, or at least Y," shipping Y with a deferred-X note flags the gap rather than fixing it. Either ship X or argue Y is sufficient; don't middle-ground. * feat(contracts,sidecar-proto): add desktop-auth IPC + fromTrustedPicker Schema-only prep for the PR #974 round-3 fix. Adds the two type extensions the daemon HTTP gate and the desktop main process will build on: - packages/sidecar-proto: SIDECAR_MESSAGES.REGISTER_DESKTOP_AUTH, with a base64-validated `{ secret }` payload + RegisterDesktopAuthResult. Updates normalizeDaemonSidecarMessage to accept the new message and pins both branches (accept + reject) in tests/index.test.ts. - packages/contracts: ProjectMetadata.fromTrustedPicker — a marker the daemon stamps on folder-imported projects whose POST /api/import/folder passed the desktop HMAC gate. The marker is privileged in the same way as `baseDir`: only the gated import handler sets it, and the desktop main process refuses to forward `shell.openPath` for folder-imported projects whose metadata lacks it. * fix(daemon): gate /api/import/folder on desktop HMAC token Closes the renderer→arbitrary-baseDir→shell.openPath bypass chain flagged by lefarcen and mrcfps in round 3 of PR #974. Both reviewers converged on the same gap: the previous round only moved path resolution into the daemon, but renderer JS could still POST /api/import/folder with any absolute path, get a project ID back, and then call openPath(projectId) to reveal the attacker-chosen path. 
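The daemon-authoritative lookup from the redesign above might look like the following sketch. Several parts are assumptions: the response body shape, the `FetchLike` type, and the extra dot-segment guard. The `hasBaseDir`/`fromTrustedPicker` context and the `isOpenPathAllowedForProject` policy correspond to the extension that a later commit in this series describes. The id pattern is the one the daemon accepts.

```typescript
// Same shape the daemon accepts for project ids.
const PROJECT_ID_RE = /^[A-Za-z0-9._-]{1,128}$/;

interface ResolvedProjectDirContext {
  resolvedDir: string;
  hasBaseDir: boolean;
  fromTrustedPicker: boolean;
}

// Minimal fetch-like signature so tests can inject a fake.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumed body: { resolvedDir, project: { metadata: { baseDir?, fromTrustedPicker? } } }.
async function fetchResolvedProjectDir(
  webUrl: string,
  projectId: string,
  fetchImpl: FetchLike,
): Promise<ResolvedProjectDirContext> {
  // Extra guard (assumption): "." and ".." match the pattern but would be
  // collapsed by URL path normalization.
  if (!PROJECT_ID_RE.test(projectId) || projectId === "." || projectId === "..") {
    throw new Error("project id contains disallowed characters");
  }
  const url = new URL(`/api/projects/${encodeURIComponent(projectId)}`, webUrl).toString();
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`daemon returned HTTP ${res.status}`);
  const body = (await res.json()) as {
    resolvedDir?: unknown;
    project?: { metadata?: { baseDir?: unknown; fromTrustedPicker?: unknown } };
  };
  if (typeof body.resolvedDir !== "string" || body.resolvedDir.length === 0) {
    throw new Error("daemon response is missing resolvedDir");
  }
  const metadata: { baseDir?: unknown; fromTrustedPicker?: unknown } = body.project?.metadata ?? {};
  return {
    resolvedDir: body.resolvedDir,
    hasBaseDir: typeof metadata.baseDir === "string",
    fromTrustedPicker: metadata.fromTrustedPicker === true,
  };
}

// Native projects live under the daemon-owned projects root and are always
// openable; folder imports must carry the trusted-picker marker.
function isOpenPathAllowedForProject(ctx: ResolvedProjectDirContext): boolean {
  return !ctx.hasBaseDir || ctx.fromTrustedPicker;
}
```

The main process would then run the returned `resolvedDir` through the floor check before calling `shell.openPath`. The renderer only ever supplies the project id.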
Daemon-side closure: - New module-scope desktop auth secret + setter exported from apps/daemon/src/server.ts. The secret is null at boot (web/standalone mode unaffected) and gets set when the desktop main process registers it over the daemon's sidecar IPC. - New `verifyDesktopImportToken` pure helper. Verifies tokens shaped `${nonce}~${exp}~${signature}` against HMAC-SHA256(secret, baseDir + "\n" + nonce + "\n" + exp). Field separator is `~` (not `.`) because ISO 8601 expiries embed dots; `~` is in neither base64url nor ISO 8601 character sets. Rejects expired tokens, replayed nonces, and expiries beyond 2× the 60s TTL. - New middleware on POST /api/import/folder. When the secret is set, every request must carry a valid `X-OD-Desktop-Import-Token` header bound to the requested baseDir. Rejected requests return 403 with FORBIDDEN. When the secret is unset (no desktop registered), the route is unchanged so web-only deployments and standalone daemons keep working. - Trusted imports get `metadata.fromTrustedPicker: true` stamped on the project. POST /api/projects and PATCH /api/projects/:id reject any client-supplied `fromTrustedPicker` (privileged the same way as `baseDir`), and the PATCH preservation block re-stamps the marker on partial-metadata patches so it cannot be silently stripped. - Daemon sidecar IPC handler: REGISTER_DESKTOP_AUTH calls setDesktopAuthSecret with the base64-decoded secret. The HTTP and IPC servers share a process so the registration takes effect immediately for the next inbound /api/import/folder call. Tests: - apps/daemon/tests/desktop-import-token-gate.test.ts (15 cases): web mode acceptance, no-token rejection, malformed-token rejection, wrong-secret rejection, wrong-baseDir rejection, expired rejection, oversized-window rejection, valid mint + trusted-picker stamp + replay rejection, plus 6 pure-helper cases for verifyDesktopImportToken. afterAll() clears the secret to keep the shared HTTP server clean for sibling test files. 
- apps/daemon/tests/projects-routes.test.ts (+2 cases): POST and PATCH reject `fromTrustedPicker` in client-supplied metadata. Existing folder-import-route.test.ts continues to pass because none of those tests register a desktop secret; the gate stays dormant. * fix(desktop,web): atomic pickAndImport replacing pickFolder; openPath trusted-picker check Closes the renderer→arbitrary-baseDir bypass at the bridge boundary. The renderer no longer receives a raw filesystem path from the main process; the picker dialog and the import call live in a single main-process transaction. Desktop main: - runDesktopMain generates a per-process 32-byte secret and registers it with the daemon over the daemon's sidecar IPC before the BrowserWindow is created. registerDesktopAuthWithDaemon retries a few times because tools-dev / tools-pack spawn daemon, web, and desktop as siblings, so the daemon may not be listening yet on desktop boot. A failed registration logs a warning and the runtime refuses pickAndImport calls (no secret → no token can be minted). - runtime.ts replaces the `dialog:pick-folder` IPC with `dialog:pick-and-import`. The handler shows the picker, mints an HMAC token bound to the chosen path, POSTs /api/import/folder via the discovered web URL with the token + body, and returns the daemon's ImportFolderResponse to the renderer (or a structured failure envelope). Renderer never sees the path or the token. - shell:open-path now consults a new pure helper `isOpenPathAllowedForProject` that refuses folder-imported projects whose metadata lacks `fromTrustedPicker: true`. This is the literal interpretation of mrcfps's round-3 follow-up: openPath is gated to projects whose resolvedDir came from the trusted-picker flow, not just transitively via the import gate. Native projects (no baseDir → daemon-owned <projectsRoot>/<id>) are always safe to open. 
- fetchResolvedProjectDir now returns a `ResolvedProjectDirContext` with hasBaseDir + fromTrustedPicker so the openPath handler can enforce the marker check. - New `signDesktopImportToken` pure helper mirrors the daemon-side signer with the same `~`-separated wire shape, exported for the packaged workspace's test file. Preload bridge: - `pickFolder` is deleted. The new `pickAndImport(init?)` returns the daemon's import response or a structured failure. `openPath` keeps its existing signature; its trust gate now lives in the main process. Web renderer: - electron.d.ts drops `pickFolder` and adds `pickAndImport` with the shared DesktopPickAndImportResult union pulled from contracts. - NewProjectPanel: when running on Electron (pickAndImport bridge present), the "Open folder" button calls pickAndImport atomically and forwards the response through a new `onImportFolderResponse` prop. On web (no bridge), the existing manual baseDir input keeps working — browser builds have no shell.openPath surface so a renderer-named path cannot escalate. - EntryView and App.tsx pass through the new callback. App's `handleImportFolderResponse` updates state from the response without a second fetch (the import already happened in the main process). Tests (apps/packaged/tests/desktop-project-root-gate.test.ts): - 3 cases for `isOpenPathAllowedForProject`: native allowed, trusted-picker allowed, legacy folder-import refused. - 6 cases for `signDesktopImportToken`: shape (~-separated), determinism, signature flips when secret/baseDir/nonce/exp changes. - Existing fetchResolvedProjectDir cases extended for the new `context` shape and additional cases that prove the metadata inspection (hasBaseDir, fromTrustedPicker) reads the daemon response correctly. 
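Below is a sketch of the token mint and check, using Node's `crypto` module and the wire shape described above. Several details are assumptions: the base64url encoding (implied by the `~` separator rationale), the 16-byte nonce, and the in-memory replay map. `createDesktopImportTokenVerifier` is a hypothetical name, and the real `verifyDesktopImportToken` may be structured differently.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

const TOKEN_TTL_MS = 60_000;

// HMAC-SHA256 over baseDir, nonce and expiry joined by "\n".
function signature(secret: Buffer, baseDir: string, nonce: string, exp: string): string {
  return createHmac("sha256", secret).update(`${baseDir}\n${nonce}\n${exp}`).digest("base64url");
}

// Wire shape: `${nonce}~${exp}~${signature}`, with exp as an ISO 8601 string.
function signDesktopImportToken(
  secret: Buffer,
  baseDir: string,
  now: number = Date.now(),
  nonce: string = randomBytes(16).toString("base64url"),
): string {
  const exp = new Date(now + TOKEN_TTL_MS).toISOString();
  return `${nonce}~${exp}~${signature(secret, baseDir, nonce, exp)}`;
}

type VerifyResult = { ok: true } | { ok: false; reason: string };

function createDesktopImportTokenVerifier(secret: Buffer) {
  const seen = new Map<string, number>(); // nonce -> expiry (ms), for replay rejection

  return function verify(token: string, baseDir: string, now: number = Date.now()): VerifyResult {
    const parts = token.split("~");
    if (parts.length !== 3) return { ok: false, reason: "malformed token" };
    const [nonce = "", exp = "", sig = ""] = parts;
    const expMs = Date.parse(exp);
    if (nonce === "" || Number.isNaN(expMs)) return { ok: false, reason: "malformed token" };
    if (expMs <= now) return { ok: false, reason: "expired" };
    if (expMs - now > 2 * TOKEN_TTL_MS) return { ok: false, reason: "expiry window too large" };

    // Verify against the raw baseDir from the request body, never a trimmed copy.
    const expected = Buffer.from(signature(secret, baseDir, nonce, exp));
    const given = Buffer.from(sig);
    if (expected.length !== given.length || !timingSafeEqual(expected, given)) {
      return { ok: false, reason: "bad signature" };
    }

    for (const [n, e] of seen) if (e <= now) seen.delete(n); // drop expired nonces
    if (seen.has(nonce)) return { ok: false, reason: "replayed nonce" };
    seen.set(nonce, expMs);
    return { ok: true };
  };
}
```

The signature is checked before the nonce is recorded, so a forged token cannot burn a legitimate nonce. `timingSafeEqual` avoids leaking how many leading bytes matched.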
* fix(daemon): make desktop import-folder gate fail-closed (PR #974 round 4) lefarcen P1 on round 3 of PR #974: the gate's `secret == null → accept` branch (originally intended to keep web-only deployments unaffected) let a renderer bypass the import boundary in two real desktop edges: - Startup race: desktop's REGISTER_DESKTOP_AUTH IPC hasn't reached the daemon yet, but the renderer is already alive in the BrowserWindow and races to fetch /api/import/folder directly with arbitrary baseDir. - Daemon restart mid-session: the new daemon process boots tokenless while a desktop is still running. Same shape: renderer fetches the route, daemon falls through to "web mode", accepts the untrusted baseDir. shell.openPath rejects (no fromTrustedPicker marker) but the daemon's other file APIs (read/write project files, list directories) operate on the attacker-chosen path. Two coordinated mechanisms close that: (1) Sticky in-process flag. `desktopAuthEverRegistered` flips to true on first non-null `setDesktopAuthSecret(...)` and never goes back. setDesktopAuthSecret(null) (used by tests) does NOT relax the gate so production code can never silently fall back to fail-open. Add `resetDesktopAuthForTests()` for vitest cleanup. (2) Orchestrator-pinned mode via OD_REQUIRE_DESKTOP_AUTH=1 read at module load. tools-dev / tools-pack / apps/packaged set this when the daemon is spawned in a desktop-bundled flow (separate commits). With the env set, the gate is active from request 0 — a renderer racing /api/import/folder before registration completes gets a 503 DESKTOP_AUTH_PENDING (transient, retry). Standalone-daemon (web-only) deployments where neither mechanism fires keep the gate dormant and the route's behavior unchanged. Also addresses lefarcen P3 (whitespace HMAC mismatch): the desktop signs the exact picker output, so the daemon must verify the same string. 
The previous version trimmed `baseDir` before HMAC, which would reject legitimate paths whose final component carried edge whitespace. Use the raw request-body baseDir for verification; the existing trim()+realpath() logic still normalizes for fs operations. New error code: `DESKTOP_AUTH_PENDING` (HTTP 503, retryable). Tests: - `stays fail-closed (503 DESKTOP_AUTH_PENDING) after a registered secret is cleared` — exercises the sticky flag. - `verifies the exact request-body baseDir, not a trimmed version` — pins the round-4 P3 fix. - All existing desktop-import-token-gate cases continue to pass; the beforeEach/afterEach/afterAll resetters now use resetDesktopAuthForTests() to honor the sticky flag. * fix(tools-dev,packaged): pin desktop import-auth on daemon spawn PR #974 round-4 P1 follow-through. The daemon-side fail-closed gate needs OD_REQUIRE_DESKTOP_AUTH=1 in the daemon's spawn env whenever the daemon is paired with a desktop, so the gate is active from request 0 and the daemon-restart-mid-session bypass cannot reopen. tools-dev: - spawnDaemonRuntime accepts a `requireDesktopAuth` option that appends OD_REQUIRE_DESKTOP_AUTH=1 to the spawn env. - startDaemon takes the same flag and additionally checks whether a desktop runtime is already alive in this namespace; either branch pins the env (revival case where the daemon died mid-session and the user runs `tools-dev start daemon` to bring it back up). - startApp threads the bundled-target list down so the daemon spawn knows when desktop is queued in the same orchestration even though the daemon starts first. - The `start` / `restart` / `run` command actions pass the resolved target list into startApp. apps/packaged: - Packaged builds always pair a desktop with the daemon, so startPackagedSidecars unconditionally sets OD_REQUIRE_DESKTOP_AUTH=1 in the daemon child env. Headless builds also flow through this same path, so the same gate applies. 
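The gate rules from the fail-closed commit reduce to a small decision function: a sticky "ever registered" flag, plus the `OD_REQUIRE_DESKTOP_AUTH=1` pin that the orchestrators set at spawn. This is a hypothetical sketch. `DesktopAuthGate`, `GateDecision`, and the method names are illustrative, and HMAC verification appears only as the `verify-token` outcome. The env var, the 503 `DESKTOP_AUTH_PENDING` code, and the sticky semantics come from the commits.

```typescript
type GateDecision =
  | { kind: "allow-unauthenticated" } // gate dormant: web-only behavior
  | { kind: "pending"; status: 503; code: "DESKTOP_AUTH_PENDING" }
  | { kind: "verify-token" }; // caller must check the HMAC token next

class DesktopAuthGate {
  private readonly pinnedByEnv: boolean;
  private secret: Buffer | null = null;
  private everRegistered = false;

  constructor(pinnedByEnv: boolean) {
    this.pinnedByEnv = pinnedByEnv;
  }

  // Mirrors the orchestrator pin: the gate is active from request 0 when
  // OD_REQUIRE_DESKTOP_AUTH is "1".
  static fromEnv(env: Record<string, string | undefined>): DesktopAuthGate {
    return new DesktopAuthGate(env.OD_REQUIRE_DESKTOP_AUTH === "1");
  }

  setSecret(secret: Buffer | null): void {
    this.secret = secret;
    // Sticky: clearing the secret later never re-opens the gate.
    if (secret !== null) this.everRegistered = true;
  }

  isActive(): boolean {
    return this.pinnedByEnv || this.everRegistered;
  }

  decide(): GateDecision {
    if (!this.isActive()) return { kind: "allow-unauthenticated" };
    if (this.secret === null) {
      // Active but no secret yet (startup race or daemon restart): retryable.
      return { kind: "pending", status: 503, code: "DESKTOP_AUTH_PENDING" };
    }
    return { kind: "verify-token" };
  }
}
```

A standalone daemon built with `DesktopAuthGate.fromEnv(process.env)` and never handed a secret stays in the dormant branch. That matches the preserved web-only behavior.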
Standalone-daemon flows unaffected: `tools-dev start daemon` (alone, no desktop running, no desktop in the bundled target list) does not set the env, and the daemon's gate stays dormant — current web-only behavior is preserved. * fix(desktop,web): align project-id regex with daemon; surface pickAndImport failures mrcfps round-4 nits on PR #974. apps/desktop/src/main/runtime.ts (mrcfps #1): the previous client-side regex `^[a-zA-Z0-9_-]+$` rejected `.` even though the daemon's canonical isSafeId / POST /api/projects accept `[A-Za-z0-9._-]{1,128}`. Result: dotted ids like `my-project.v2` were valid backend-side but got "project id contains disallowed characters" before fetchResolvedProjectDir even hit the network, regressing Continue in CLI / Finalize for those projects. Align the regex with the daemon's shape, comment-tag the rationale. apps/packaged/tests/desktop-project-root-gate.test.ts: add a regression case for a dotted id and one for the 128-char length cap (the new regex exposes both, the old regex obscured the dotted one). apps/web/src/components/NewProjectPanel.tsx (mrcfps #2): the `if (!result || result.ok !== true) return` branch swallowed every non-OK pickAndImport shape (`desktop auth secret not registered`, `web sidecar URL not available`, daemon HTTP errors with details) the same way as the explicit `{ canceled: true }` cancel — leaving the user with a silent no-op when the trusted-picker flow couldn't even get off the ground. Reserve silent-return for the cancel case only; surface every other reason via a Toast (existing component, already used by ProjectView for related Continue-in-CLI flows). The new `formatPickAndImportErrorDetails` helper flattens daemon ApiError envelopes into a single readable secondary line so the operator sees both the category ("Open folder failed: daemon returned HTTP 503") and the upstream reason ("desktop auth required but secret not yet registered").
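The aligned client-side check is a single anchored regex. This is a minimal sketch. It assumes the daemon's rule is exactly the `[A-Za-z0-9._-]{1,128}` shape quoted above, and `isSafeProjectId` is a hypothetical name, not the daemon's `isSafeId`.

```typescript
// Same character class and length cap as the daemon-side rule quoted in
// the commit: letters, digits, dot, underscore, hyphen; 1 to 128 chars.
const SAFE_PROJECT_ID = /^[A-Za-z0-9._-]{1,128}$/;

function isSafeProjectId(id: string): boolean {
  return SAFE_PROJECT_ID.test(id);
}

// The previous client-side pattern, kept for comparison: it rejects `.`
// and has no length cap.
const LEGACY_PROJECT_ID = /^[a-zA-Z0-9_-]+$/;
```

Under this pattern `my-project.v2` passes and a 129-character id fails. The pattern on its own also admits `.` and `..`. The commit does not say whether the daemon rejects those through a separate check.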
* docs(architecture): document desktop folder-import auth boundary lefarcen P3 on PR #974 round 4: the `Folder import` section in docs/architecture.md still documented only realpath / sandbox / RUNTIME_DATA_DIR checks and omitted the new desktop HMAC trust boundary, replay/TTL behavior, fail-closed semantics, daemon-restart edge, and legacy-import migration note. Without that subsection it's hard to review whether the 60s TTL, the `~`-separated token shape, or the legacy folder-imports needing re-pick are intentional product decisions or overlooked gaps. Add a "Desktop folder-import auth (PR #974)" subsection covering: - The trust handshake (32-byte secret over sidecar IPC at desktop boot). - Token shape (`${nonce}~${exp}~${signature}`), HMAC payload, and why `.` cannot be the field separator (ISO 8601 expiries embed dots). - TTL and replay behavior (60s, single-use, 2× TTL upper bound). - Fail-closed mechanisms — sticky in-process flag and OD_REQUIRE_DESKTOP_AUTH env var pinning. - Web-only deployments are unaffected (browser builds have no shell.openPath surface). - The `metadata.fromTrustedPicker` marker and the openPath-side defense-in-depth check. - Legacy folder-imports need re-pick to use the Continue-in-CLI button. - Daemon-restart edge: 503 DESKTOP_AUTH_PENDING until desktop re-registers; restart desktop to recover. * fix(packaged): skip desktop-auth gate in headless mode (PR #974 round 5 P2) Round 5 (lefarcen P2): packaged headless mode (daemon+web only, no Electron) was inheriting OD_REQUIRE_DESKTOP_AUTH=1 from the round-4 unconditional pin in startPackagedSidecars. Headless never runs desktop main, so no client could ever register an HMAC secret and folder import returned 503 DESKTOP_AUTH_PENDING permanently — even though headless has no shell.openPath surface to exploit. 
Plumb a required `requireDesktopAuth: boolean` option through startPackagedSidecars: apps/packaged/src/index.ts (Electron entry) passes true; apps/packaged/src/headless.ts passes false. Extract buildPackagedDaemonSpawnEnv as a pure helper so vitest can pin both branches without spawning a child process. Tests added in apps/packaged/tests/sidecars.test.ts cover both branches plus OD_LEGACY_DATA_DIR / daemonCliEntry env forwarding edges. Refs: nexu-io/open-design#974 * fix(desktop,daemon): lazy auth retry + canonical HMAC binding (PR #974 round 5 P1+P3) Round 5 (lefarcen P1, mrcfps): a daemon restart under OD_REQUIRE_DESKTOP_AUTH=1 left desktop holding a stale secret while the new daemon process required a fresh registration — folder import returned 503 DESKTOP_AUTH_PENDING permanently until the user restarted desktop. Same dead-end if the startup handshake missed its retry window. Round 5 (lefarcen P3): the daemon verified the HMAC against raw request-body baseDir, then trimmed before realpath(). A picker selection of "/tmp/foo " could authorize an import of "/tmp/foo" — token bound to a different path than the one imported. Three coordinated fixes: 1. P1 lazy retry: extract pickAndImportFolder as a pure helper that takes injected fetch / mintToken / registerDesktopAuth deps. On 503 DESKTOP_AUTH_PENDING from /api/import/folder, re-invoke the registration callback once, mint a fresh token (new nonce + new exp keeps replay protection), and POST again. Single retry, no infinite loop. Other failure shapes return immediately to the renderer. 2. P1 wiring: runDesktopMain now ALWAYS passes desktopAuthSecret to the runtime regardless of whether the initial handshake succeeded, plus a registerDesktopAuthWithDaemon callback the runtime invokes lazily. Soften the startup warning text to match the new recovery semantics. 3. P3 binding: trim picker output ONCE on the desktop side before both signing the HMAC and POSTing. 
Daemon-side verification stays against raw request-body baseDir (round-4 behavior); the daemon's defensive trim before realpath() is now a no-op for desktop traffic and only load-bearing for web-mode callers (path.isAbsolute(" /foo ") is false). End-to-end: desktop-signed string == request body == HMAC-verified string == realpath() input. Tests: - apps/packaged/tests/desktop-pick-and-import.test.ts (NEW, 7 cases): lazy-retry happy path; lazy-retry exhausted (re-register WAS called); single-attempt happy path (no unnecessary IPC); optional-callback no-op; non-503 failures bypass retry; network errors; non-PENDING 503 bypasses retry. - apps/daemon/tests/desktop-import-token-gate.test.ts: replace round-4 whitespace test with two round-5 binding tests — the trimmed string flows end-to-end (HMAC verifies, project metadata.baseDir equals realpath of trimmed input), and a request whose body baseDir diverges from the HMAC-bound string is rejected 403. docs/architecture.md §"Desktop folder-import auth" — update the daemon-restart-edge bullet to describe the lazy-retry recovery (round 4 said "restart desktop to recover", which is now wrong) and add a headless-packaged-mode bullet describing the round-5 P2 gate exclusion. Refs: nexu-io/open-design#974 * feat(sidecar-proto,daemon): surface desktopAuthGateActive over STATUS IPC (PR #974 round 6 prep) Round 6 (mrcfps): the split-start dev flow `tools-dev start daemon` -> `tools-dev start desktop` was leaving the daemon ungated because `OD_REQUIRE_DESKTOP_AUTH=1` is only injected when daemon and desktop spawn in the same orchestrator invocation. To fix that, tools-dev needs to introspect the running daemon's gate state before launching desktop main — but the existing STATUS IPC didn't carry the flag.
This commit extends `DaemonStatusSnapshot` with a required `desktopAuthGateActive: boolean` and wires the daemon sidecar's STATUS handler (and the public `status()` method on the handle) to recompute the value from `isDesktopAuthGateActive()` per request, since the flag flips after `REGISTER_DESKTOP_AUTH` and stays sticky. Extracted `withCurrentDesktopAuthGate(snapshot)` as a tiny pure helper so the wiring is testable without booting a real IPC server. The new test pins four scenarios: - no secret registered (web-only mode) -> false - after `setDesktopAuthSecret(buf)` -> true - after `setDesktopAuthSecret(null)` (sticky) -> still true - input snapshot's stale value is overridden by the live flag The orchestrator-side consumer lands in the next commit (`tools/dev/src/desktop-auth-gate.ts`). Refs: nexu-io/open-design#974 * fix(tools-dev): auto-restart ungated daemon before desktop start (PR #974 round 6 mrcfps) Round 6 (mrcfps): the split-start dev sequence `tools-dev start daemon` -> `tools-dev start desktop` was leaving the daemon running without `OD_REQUIRE_DESKTOP_AUTH=1`. The env var is only injected when (A) daemon and desktop spawn in the same orchestrator invocation (`startApp` line ~682) or (B) a desktop runtime is already alive at daemon spawn time (`startDaemon` lines ~595-596). Neither fires for the split flow, so a renderer (or any local HTTP client) could `POST /api/import/folder` directly with an arbitrary `baseDir` before the desktop's first registration POST. Round-5's lazy retry didn't help: it triggers on `503 DESKTOP_AUTH_PENDING`, and the ungated daemon returns 200. Close the gap by introspecting the running daemon's `desktopAuthGateActive` (added to the STATUS IPC in the prior commit) at the start of `startApp(DESKTOP, ...)`. When the daemon reports the gate inactive, stop the daemon (and web, if running), respawn the daemon with `requireDesktopAuth: true`, restart web, then proceed with the desktop start. 
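The hardening restart reads naturally as a fixed sequence over injected callbacks. This is a hypothetical sketch. `ensureGateBeforeDesktop`, `GateRestartDeps`, and the callback names are illustrative, not the signatures in `tools/dev/src/desktop-auth-gate.ts`. The sketch follows the order the commit's tests pin: web stops before the daemon, and web restarts only once the daemon is back up gated.

```typescript
interface StatusLike {
  desktopAuthGateActive: boolean;
}

interface GateRestartDeps {
  daemonStatus(): Promise<StatusLike | null>; // null: no daemon answering
  webRunning(): Promise<boolean>;
  stopWeb(): Promise<void>;
  stopDaemon(): Promise<void>;
  startDaemonGated(): Promise<void>; // spawns with OD_REQUIRE_DESKTOP_AUTH=1
  startWeb(): Promise<void>;
}

// Returns what it did so callers (and tests) can assert on the outcome.
async function ensureGateBeforeDesktop(
  deps: GateRestartDeps,
): Promise<"no-daemon" | "already-gated" | "restarted"> {
  const status = await deps.daemonStatus();
  if (status === null) return "no-daemon";
  if (status.desktopAuthGateActive) return "already-gated";

  const hadWeb = await deps.webRunning();
  // Stop web first so its daemon proxy never serves a transient 502.
  if (hadWeb) await deps.stopWeb();
  await deps.stopDaemon();
  await deps.startDaemonGated();
  // Bring web back only after the gated daemon is up.
  if (hadWeb) await deps.startWeb();
  return "restarted";
}
```

Because every side effect goes through `deps`, a test can record the call order with plain async fakes and never spawn a process.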
Restart order is critical and pinned by tests: web stops FIRST (so the web->daemon proxy doesn't serve a transient 502 against the down-then-up daemon), then daemon stops, then daemon respawns gated, then web restarts. The bundled-targets path (`pnpm tools-dev`) is unaffected because trigger (A) already armed the gate at first daemon spawn — the helper costs one ~800ms STATUS IPC roundtrip and returns no-op. Helper lives in its own module (`tools/dev/src/desktop-auth-gate.ts`) so the regression test can import it without triggering the `cli.parse()` side effect at the bottom of `tools/dev/src/index.ts`. Five `node:test` cases pin the call sequence — no daemon, gate active, gate inactive + no web, gate inactive + web running, log shape — so a future refactor can't silently regress the gate. Two synthetic `DaemonStatusSnapshot` literals in `inspectAppStatus` and `inspect` (used when the IPC is unreachable) get `desktopAuthGateActive: false` to satisfy the now-required type field — semantically correct since "no daemon answering" trivially means "no gate active." `docs/architecture.md` adds a new bullet under the Desktop folder-import auth section describing this auto-restart behavior. Refs: nexu-io/open-design#974 * fix(daemon): combine finalize request-abort + timeout signals (PR #974 round 7 lefarcen P1) Round 6 wired the route handler to pass `finalizeAbort.signal` into `finalizeDesignPackage`, but the helper only created its own DEFAULT_TIMEOUT_MS controller when no caller signal was supplied. The result: a client that stayed connected could hold the finalize lock and upstream call indefinitely. Always create the timeout controller; when the caller passes a signal, combine both via `AbortSignal.any` so neither cancel path replaces the other.
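Combining the caller's signal with an always-armed timeout takes one call to `AbortSignal.any` (Node.js 20.3 or later, and current browsers), paired here with `AbortSignal.timeout`. This is a minimal sketch. `finalizeSignal` is a hypothetical helper name, and the 120 s default is inferred from the "120 s wait" remark in the tests note rather than a confirmed value of `DEFAULT_TIMEOUT_MS`.

```typescript
// Assumed value: the commit's test note mentions a 120 s default wait.
const DEFAULT_TIMEOUT_MS = 120_000;

// Always arms the timeout. When the caller also supplies a signal, the
// returned signal aborts as soon as either source aborts, so neither
// cancel path replaces the other.
function finalizeSignal(
  callerSignal?: AbortSignal,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): AbortSignal {
  const timeout = AbortSignal.timeout(timeoutMs);
  return callerSignal ? AbortSignal.any([callerSignal, timeout]) : timeout;
}
```

A pre-aborted caller signal yields an already-aborted combined signal. A caller that never aborts still gets cut off when the timeout fires. These are the two cases the round-7 regression tests cover.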
Adds two regression tests in finalize-design.test.ts: - timeout fires when caller signal never aborts - pre-aborted caller signal still cancels Adds an internal `timeoutMs` option to FinalizeOptions so tests can exercise the abort path without a 120 s wait or fake-timer chains. Production callers omit it; default remains DEFAULT_TIMEOUT_MS. * fix(daemon): allow PATCH preserving existing fromTrustedPicker marker (PR #974 round 7 lefarcen P2) The PATCH /api/projects/:id handler was rejecting any metadata that contained `fromTrustedPicker`, including the unchanged `true` marker that the linked-folder UI re-spreads when editing `linkedDirs`. Trusted folder-imported projects could not update other metadata fields without 400-ing on their own marker. Switch the rejection condition from `'in'` to a value comparison: only reject when the incoming value differs from the persisted one (`patch.metadata.fromTrustedPicker !== existingMeta?.fromTrustedPicker`). That keeps acquisition (existing=undefined, patch true) and flip (existing=true, patch false) attempts blocked while letting the UI re-spread the existing marker. POST /api/projects stays strict; that path has no existingMeta. Adds two regression tests in desktop-import-token-gate.test.ts: - allows PATCH preserving the existing fromTrustedPicker:true marker - rejects PATCH that flips fromTrustedPicker on a trusted project * fix(desktop,packaged): main-process api uses daemon URL not webUrl (PR #974 round 7 lefarcen P2) Packaged builds load the renderer from `od://app/` and report that URL through `discoverWebUrl`. But Node-side `globalThis.fetch` (undici) does not route through Electron's registered `od://` protocol handler — that handler runs in the renderer's protocol scope, not in main-process Node. So `pickAndImportFolder` and `fetchResolvedProjectDir` calls from main silently failed in packaged builds against the protocol scheme. Add `discoverDaemonUrl` to `DesktopRuntimeOptions` and `DesktopMainOptions`. 
The packaged shell already has the sidecar's real `http://127.0.0.1:<port>` URL (`sidecars.daemon.url` from STATUS IPC) — thread it through to the runtime. Main-process API calls now prefer the daemon URL and fall back to the renderer URL for tools-dev (where it is itself http://127.0.0.1). `PickAndImportFolderDeps.webUrl` renamed to `apiBaseUrl` so the boundary is explicit at the type level; `fetchResolvedProjectDir`'s first parameter renamed similarly. tools-dev callers see no behavior change — their web URL is already an http://127.0.0.1 URL Node fetch can hit. Test (`apps/packaged/tests/desktop-pick-and-import.test.ts`): - existing 7 cases updated to the new prop name (no behavior change) - new case pins URL composition: builds `${apiBaseUrl}/api/import/folder` and never produces a custom-protocol URL. Note for review: this test pins URL composition; full Electron protocol handler integration (renderer fetch through `od://`) is not exercised in unit tests here. * fix(tools-dev): preserve daemon/web ports across desktop-auth gate restart (PR #974 round 7 lefarcen P2) Round 6 added the split-start auto-restart in ensureDaemonGateForDesktop to close the dev-flow gap where `start daemon` then `start desktop` left the daemon ungated. The restart was passing the current `start desktop` CLI options to startDaemonGated/startWeb, which meant a stack started with `--daemon-port 17456 --web-port 17573` could be silently moved to random ports during the hardening restart, breaking browsers and scripts pinned to those ports. Extract the running ports from the STATUS snapshots (daemon.url and web.url) and forward them as explicit `{ port }` callback args. The closure in `tools/dev/src/index.ts` overrides the corresponding option when a port was extracted; null falls back to the original CLI flags. 
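Pulling the running port out of a STATUS URL needs only the WHATWG `URL` parser. This is a hypothetical sketch: `portFromStatusUrl` and `resolvePort` are illustrative names. `URL.port` is the empty string when the URL omits the port or uses the scheme's default port, which is what makes the `null` fallback path reachable.

```typescript
// Returns the explicit port from a status URL such as
// "http://127.0.0.1:17456", or null when none can be extracted.
function portFromStatusUrl(url: string | null | undefined): number | null {
  if (!url) return null;
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return null; // unparseable URL: fall back to the CLI options
  }
  if (parsed.port === "") return null; // no port, or the scheme default
  return Number(parsed.port);
}

// Prefer the port the running service reported; otherwise keep the flag.
function resolvePort(running: number | null, cliPort: number | undefined): number | undefined {
  return running ?? cliPort;
}
```

For example, `resolvePort(portFromStatusUrl("http://127.0.0.1:17456"), undefined)` keeps 17456 across the restart. A URL without a port falls through to whatever `--daemon-port` or `--web-port` value the caller passed.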
Adds three regression tests in tools/dev/tests/desktop-auth-gate.test.ts: - preserves the running daemon port across the hardening restart - preserves the running web port across the hardening restart - falls back to caller options (port:null) when the URL has no port * fix(web): refresh useDesignMdState on file/chat events (PR #974 round 7 mrcfps) useDesignMdState() previously only recomputed on mount and on explicit refresh() (called once after finalize). Once the user kept working — editing files or sending more chat turns — the stale/fresh badge could drift out of sync because file mtimes and conversation updatedAt moved past the recorded generatedAt without the hook re-checking. Hook accepts an optional `refreshKey: number` arg; ProjectView keeps a counter and bumps it on three events: - file-changed SSE (covers tool-emitted file mutations) - live_artifact* SSE (covers chat turns that emit artifacts) - streaming `true → false` edge (covers pure-text chat turns) The hook treats refreshKey as a compute() dep; React's Object.is comparison short-circuits the no-op renders, so each bump is a single recompute pass. Adds a regression test in useDesignMdState.test.tsx: - flips stale state after a refreshKey bump without remounting * fix(web): degraded-state useDesignMdState on malformed provenance (PR #974 round 7 mrcfps) useDesignMdState used to report `{ isStale: false, staleReason: null }` when the parser could not extract a comparison timestamp from the DESIGN.md `## Provenance` section. The pinned test made that the documented behavior. As mrcfps pointed out, that fails open exactly when the freshness signal is most untrustworthy: any provenance- formatting drift silently disables the staleness warning. Extend `DesignMdStaleReason` with a third variant `'unknown-provenance'`. On `generatedMs === null`, return `{ isStale: true, staleReason: 'unknown-provenance' }`. 
ContinueInCliButton renders a distinct chip text "Spec freshness unknown — regenerate to refresh signal" for that variant; the button stays enabled because not-comparable is not the same as broken state. Tests: - modify the existing pinned test to assert the new degraded state - add an end-to-end useDesignMdState test feeding a malformed Provenance section through compute() so a regression that re-pins fresh-on-null at the hook level (not just computeStale) fails fast - add ContinueInCliButton render + click tests for the new chip --------- Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-10 11:44:32 +08:00
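The fail-closed staleness rule from the last round-7 fix can be sketched as a pure function. This is hypothetical. Only `'unknown-provenance'`, `generatedMs`, `isStale`, and `staleReason` come from the commit. The `'newer-activity'` variant and the `latestActivityMs` input stand in for the existing comparison, whose reason names the commit does not give.

```typescript
// 'newer-activity' is a placeholder for the pre-existing stale reasons.
type StaleReason = "newer-activity" | "unknown-provenance";

interface StaleState {
  isStale: boolean;
  staleReason: StaleReason | null;
}

function computeStale(generatedMs: number | null, latestActivityMs: number): StaleState {
  // Fail closed: if no timestamp could be parsed from the Provenance
  // section, report a degraded state instead of claiming the spec is fresh.
  if (generatedMs === null) {
    return { isStale: true, staleReason: "unknown-provenance" };
  }
  if (latestActivityMs > generatedMs) {
    return { isStale: true, staleReason: "newer-activity" };
  }
  return { isStale: false, staleReason: null };
}
```

Returning a distinct reason, rather than reusing a generic stale flag, lets the UI pick the "Spec freshness unknown" chip text without disabling the button.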
PerishFire	cc343f8828	ci: optimize beta release packaging cache (#1095) * ci: optimize beta release packaging cache * fix: version windows builder cache * fix: forward linux app version in container	2026-05-10 10:11:05 +08:00
Cursor Agent	2230b5767c	feat(deploy): per-cloud Helm value overrides (spec §15.5) Plan L1. Seven values-<cloud>.yaml files at tools/pack/helm/open-design/: values-aws.yaml (EFS + gp3 + ALB ingress); values-gcp.yaml (Filestore + pd-balanced + GCLB ingress); values-azure.yaml (azurefile-csi + managed-csi + App Gateway ingress); values-aliyun.yaml (alicloud-nas + alicloud-disk-essd); values-tencent.yaml (cfs + cbs); values-huawei.yaml (sfs-turbo + csi-disk); values-self.yaml (cluster-default storageClass; ingress off). Operators install with: `helm install od ./tools/pack/helm/open-design -f tools/pack/helm/open-design/values-aws.yaml --set secrets.apiToken=$(openssl rand -hex 32)`. Each override is intentionally minimal — it only differs from the baseline values.yaml on the volume + ingress dimensions the matching cloud needs. Secret-manager wiring (External Secrets Operator, secrets-store CSI) is documented inline; the chart's Secret stays the destination of the sync. Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 13:45:53 +00:00
Cursor Agent	9e12d146e2	feat(deploy): Helm chart templates (Phase 5) Plan K2 / spec §15.5. Lands the canonical Kubernetes shape behind the parameter surface shipped in J4. Six templates: - _helpers.tpl name / fullname / labels / selector helpers - secret.yaml OD_API_TOKEN + ANTHROPIC_API_KEY + TAVILY_API_KEY (operators inject through values, not committed) - configmap.yaml non-secret env block from .Values.env - pvc.yaml /data/od + /data/config (gated by persistence.enabled) - service.yaml ClusterIP / LoadBalancer per .Values.service.type - deployment.yaml pod with envFrom both maps, /api/daemon/status liveness + readiness probes, persistent volume mounts at OD_DATA_DIR / OD_MEDIA_CONFIG_DIR, checksum/config + checksum/secret annotations so a values change rolls the pod - ingress.yaml optional, gated by ingress.enabled - NOTES.txt post-install instructions including port-forward + bearer-token reminder The chart now installs end-to-end with: helm install od ./tools/pack/helm/open-design --set secrets.apiToken=$(openssl rand -hex 32) Per-cloud override files (values-aws.yaml, values-gcp.yaml, …) stay scheduled — they're tiny diffs against this baseline. Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 13:32:45 +00:00
Cursor Agent	bf30b308e3	feat(deploy): docker-compose + Helm chart entry slice (Phase 5) Plan J4 / spec §15.4 / §15.5 / §16 Phase 5. Three landings: - deploy/Dockerfile now COPYs plugins/_official/ into the image so the bundled atom plugins from §3.I3 register on container boot — without this, registerBundledPlugins() silently no-ops inside the container and the §23 self-bootstrap promise breaks for hosted deployments. - tools/pack/docker-compose.yml ships the canonical hosted-mode manifest spec §15.4 calls out: two-volume layout (od-data + od-config) per §15.2, OD_BIND_HOST=0.0.0.0 + OD_API_TOKEN + OD_NAMESPACE + snapshot retention knobs as env, /api/daemon/status as the healthcheck endpoint (Phase 1.5). Drop-in usable with `docker compose -f tools/pack/docker-compose.yml up -d`. - tools/pack/helm/open-design/{Chart.yaml,values.yaml,README.md} pins the Helm chart parameter surface for the per-cloud overrides spec §15.5 enumerates (AWS / GCP / Azure / Aliyun / Tencent / Huawei / self-hosted). Templates land in the Phase 5 follow-up PR; the values schema is locked here so the per-cloud override files (values-<cloud>.yaml) review in isolation. scripts/guard.ts allowlist gains `packages/agui-adapter/esbuild.config.mjs` so the new package passes the residual-JS guard. Daemon tests stay at 1486/1486 (deploy artifacts only). Co-authored-by: Tom Huang <1043269994@qq.com>	2026-05-09 13:20:53 +00:00
Marc Chan	b03a504da6	release: Open Design 0.6.0 (#1080)	2026-05-09 19:58:11 +08:00
PerishFire	dcfab797c2	[codex] Add stable nightly promotion gate (#962) * Upload beta e2e spec reports to R2 * Expose beta report URLs in summary * Complete Indonesian deploy locale keys * chore: factor release workflow scripts * chore: bump packaged beta base version * test: wait for mac packaged runtime health * fix: capture mac packaged startup logs * chore: improve mac release build observability * fix: ad-hoc sign unsigned mac builds * chore: diagnose mac packaged startup * fix: relax unsigned mac launch signing * chore: improve mac launch diagnostics * chore: simplify beta mac release artifacts * fix: align packaged mac smoke launch config * fix: externalize mac daemon wasm dependency * chore: require signed stable mac releases * fix: use stable app version for nightly package builds * chore: clean release artifacts after publish * chore: publish beta reports as zip * ci: disable beta mac tools-pack cache * fix: skip mac framework binary symlinks when signing * fix: sign mac framework version bundles * ci: disable beta mac pnpm cache * chore: align stable release reports * ci: require matching nightly before stable release * ci: avoid mac pnpm cache for packaged smoke	2026-05-08 21:48:54 +08:00
ferasbusiness666	1e8926271b	Harden security scan findings and upgrade dependencies (#806) * feat: add accent color control and launcher for Open Design * fix: remove launcher binary from PR * test: cover accent appearance edge cases * Harden security scan findings and upgrade deps * Address proxy security review * Pin jsdom for web test stability --------- Co-authored-by: ferasbusiness666 <ferasbusiness666@users.noreply.github.com> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-08 19:46:34 +08:00
Marc Chan	b06f26a5fd	test: strengthen e2e PR coverage (#796) * test: strengthen e2e PR coverage * fix: address e2e PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * ci: cache Windows packaged smoke builds * test: fake additional agent runtimes * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Route tools-pack mac starts through a launch-time packaged config override so portable packaged smoke runs keep using the namespace runtime root that inspect and logs expect. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Fall back to the packaged app's embedded config when the build output config is missing so installed mac starts still work. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: align packaged mac PR smoke with tools-pack runtime mode Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Keep blake3-wasm out of the packaged mac daemon prebundle so the standalone runtime loads the Cloudflare asset hasher from node_modules instead of crashing in ESM. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Skip the portable mac launch override when the bundled packaged config is missing so installed fallback app targets can still boot with packaged defaults. Add a regression test covering the missing-config start path. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(pack): remove duplicate mac prebundle dependency key	2026-05-08 16:48:10 +08:00
Marc Chan	e14b8092ea	feat: add Orbit activity summaries (#681) * feat: add Orbit activity summaries * fix(orbit): make runs navigable while agent continues * fix(web): widen minimum chat panel * feat: support Orbit template selection * fix(daemon): avoid bogus skill side-file preflight * fix(web): collapse orbit artifact project cards * fix(web): preserve orbit project card titles * fix: improve Orbit run daily briefing * fix: handle Orbit digest data failures * fix: load Orbit templates and connector tools reliably * fix: keep Orbit summary counts consistent Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: apply Orbit template skill context * fix: cache and curate connector tools for Orbit * fix: align Orbit defaults and connector discovery * fix: simplify Orbit template settings * fix: move connectors into settings * fix: compact connector settings catalog * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address Orbit PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: prevent connector action button from stretching into pill The icon-only connect/disconnect buttons in the embedded connectors catalog inherited min-width: 92px / 106px from the non-embedded pill rules, overriding the 24px square sizing and causing the buttons to overlap the card head text.
Reset min-width to 0 in the embedded icon-only rule so the compact square layout holds. * fix(web): align live artifact file rows * fix: clean up Orbit connector settings lifecycle Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address Orbit review regressions Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * feat(web): localize Orbit and connector settings * feat(web): gate Orbit runs without connectors * feat(web): refine connector settings UX * feat(web): safeguard Composio key clearing * fix(web): refresh Composio tool badges * feat(web): show connector logos * feat(daemon): localize Orbit prompt window * fix(daemon): clarify blocked connector callback closes * test(daemon): harden flaky async probes * fix(web): align Indonesian connector locale keys * test(web): align connector browser props * fix(web): preserve explicit credential clears Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): time out Composio logo proxy fetches Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): localize Indonesian connector settings copy Translate the new connector settings strings in the Indonesian locale and lock them with a regression test so this surface no longer silently falls back to English. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve discovered connector tools Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve onboarding autosave completion Keep settings autosave from clearing onboarding completion after the close gesture, and expose the desktop main types from source so workspace validation can typecheck packaged imports without a prior desktop build. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): defer Composio catalog cache hydration Load persisted Composio catalog data only after the runtime data directory is configured so startup cannot read another namespace's cache. 
Add a regression test that exercises the module-load singleton path. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): treat discovery completion independently Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve latest settings draft on close Use the latest persisted settings draft when the dialog closes so onboarding completion does not race a stale daemon sync and overwrite newer Orbit/template selections. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): avoid syncing draft Composio key on Orbit run Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): localize Orbit settings copy Translate the new Indonesian Orbit and autosave strings so the settings UI no longer falls back to English and the locale regression stays covered. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): prefer fresh connector catalog state Keep refetched connector status/auth data authoritative while retaining discovery-only tool metadata so the connectors UI stays consistent after refreshes. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): declare Indonesian locale fallback keys explicitly Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): inline Indonesian fallback strings for CI Replace the Indonesian locale's per-key English lookups with explicit strings so workspace typecheck no longer depends on brittle build-mode resolution in CI. Add a regression test that blocks those per-key English lookups from reappearing in the CI-sensitive fallback sections. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): restrict proxied connector logos to image MIME types Reject non-image upstream logo responses so the daemon never serves third-party HTML from its localhost origin. 
Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * test(e2e): align settings dialog regressions Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): decouple Orbit runs from media sync failures Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): keep SPA catch-all export-compatible Disable dynamic catch-all params for the exported SPA shell so Next.js static builds can emit the root route again. Add a regression test covering the route config against the web export mode. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): preserve Orbit config and workspace routes Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): block SVG in connector logo proxy Reject SVG and other unsafe proxied logo responses so third-party logo content cannot execute under the daemon origin, while keeping raster logo fetches working and making rejected responses non-cacheable. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): fall back to static catalog for empty cache Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): disable Orbit run before connector gate resolves Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(desktop): export shipped desktop types Point the desktop ./main type export at the generated declaration so installed consumers resolve the published file set. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): restore persisted question form selections Render historical submitted answers directly so reloaded question forms keep their locked selections visible. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): retry forced media sync autosave Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): keep Composio logo timeout through body read Keep the Composio logo fetch timeout active until the response body is fully consumed so stalled body reads abort and clear the inflight cache entry. 
Add a regression test that proves a delayed body read times out and the next request can recover. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): refresh Orbit gate after connector auth Re-check connector availability when the settings window regains focus so Orbit unlocks as soon as a connector finishes authenticating in the same settings session. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): keep connector detail tool lists intact Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): ignore malformed Orbit summaries Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(e2e): stabilize design-system multi-select flow Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): cap Composio logo cache growth Bound the Composio logo cache with LRU eviction and expired-entry pruning so repeated untrusted logo requests cannot grow daemon memory without limit. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(daemon): bound proxied Composio logo payloads Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): align autosave settings tests Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): remove stray CSS conflict marker Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fixer: address PR #681 follow-up items Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(web): restore restart routes and connector flows * fix(web): keep SPA export route static * fix(web): stabilize chat scroll tests --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-08 14:27:46 +08:00
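Several daemon fixes in the commit above (restricting proxied connector logos to image MIME types, blocking SVG, and capping the logo cache with LRU eviction plus expired-entry pruning) fit together as one small policy. The following is a hypothetical TypeScript sketch of that policy, not the daemon's actual code: `isSafeLogoContentType`, `LogoCache`, the raster allowlist, and the size/TTL parameters are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch; not the Open Design daemon implementation.

// Raster formats only. SVG is deliberately excluded because it can carry script.
const ALLOWED_LOGO_TYPES = new Set(["image/png", "image/jpeg", "image/gif", "image/webp"]);

// Accept only allowlisted image MIME types, ignoring parameters such as "; charset=...".
function isSafeLogoContentType(header: string | null): boolean {
  if (!header) return false;
  const mime = header.split(";")[0].trim().toLowerCase();
  return ALLOWED_LOGO_TYPES.has(mime);
}

interface CacheEntry {
  body: Uint8Array;
  contentType: string;
  expiresAt: number;
}

// Bounded LRU cache. A Map iterates in insertion order, so re-inserting a key on
// every read keeps the least recently used entry at the front.
class LogoCache {
  private readonly entries = new Map<string, CacheEntry>();

  constructor(
    private readonly maxEntries: number,
    private readonly ttlMs: number,
  ) {}

  get(key: string, now: number = Date.now()): CacheEntry | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expiresAt <= now) return undefined; // expired: leave it deleted
    this.entries.set(key, entry); // refresh recency
    return entry;
  }

  set(key: string, body: Uint8Array, contentType: string, now: number = Date.now()): void {
    this.pruneExpired(now);
    this.entries.delete(key);
    this.entries.set(key, { body, contentType, expiresAt: now + this.ttlMs });
    // Evict least recently used entries until the cache is back under its cap.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }

  private pruneExpired(now: number): void {
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) this.entries.delete(key);
    }
  }
}
```

Checking the content type before caching means a rejected response never enters the cache, which matches the commit's note that rejected responses are non-cacheable; the byte cap from "bound proxied Composio logo payloads" is left out of this sketch.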
shangxinyu1	aec9428b08	Fix desktop preview and packaged app interactions (#879) * Fix packaged deck navigation interactions * Fix connector auth in packaged app and localized content coverage * Fix Electron connector browser handoff contract	2026-05-08 14:26:10 +08:00
Tom Huang	56bf6ee1b6	feat: agent-callable research command and /search (#615) * feat: pre-generation research (Tavily) for grounded generation Adds an optional pre-generation research step so the agent can produce slides / prototypes / decks grounded in real sources instead of guessing. User flow: 1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY). 2. Click the new Research button in the chat composer. 3. On send, the daemon runs a Tavily search, prepends the findings as a <research_context> block ahead of the system prompt, and spawns the agent. Research progress shows up as status pills in the chat stream; the agent cites sources inline as [1]/[2]/... Phase 1 surface: - Single provider (Tavily), single depth ('shallow'), no LLM synthesis pass (Tavily's `answer` is the summary). - Composer toggle only; no popover / depth picker yet. - Reuses the existing `status` SSE agent payload + StatusPill UI so no new event variants or renderer code are needed. Layers touched: - contracts: ResearchOptions / Source / Findings DTOs; ChatRequest.research; export from index. - daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator + provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook in startChatRun before prompt assembly. - web: ChatComposer toggle + ChatSendMeta; threaded through ChatPane / ProjectView / streamViaDaemon into ChatRequest. Side fix (required to land the feature, but useful on its own): contracts internal relative imports lacked the `.js` suffix that NodeNext module resolution requires. This was already breaking `pnpm --filter @open-design/daemon typecheck` on main; without the fix, none of the new research types were visible to the daemon. All internal contracts imports now carry `.js`. Spec: specs/current/research-feature.md (phases 2-4 outlined for follow-up: composer popover, multi-provider, deep recursion, example skills with research_recommends). 
Verified: - pnpm --filter @open-design/contracts typecheck/test - pnpm --filter @open-design/daemon typecheck (the chokidar project-watchers test is a pre-existing flake, unrelated) - pnpm --filter @open-design/web typecheck - node scripts/verify-media-models.mjs * fix(daemon): clamp Tavily max_results to 20 Tavily's /search endpoint requires `max_results` in [0, 20]; sending a larger value (e.g. when `research.depth: "deep"` resolves to 30) returns 400 and `runResearch` silently falls back to no-research. Clamp at the provider boundary so Phase 2 depth tiers above 20 still produce results instead of failing the request. Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code) * Remove stale research merge leftovers * Add agent-callable research search * Fix Indonesian locale typecheck * Fix research command invocation edge cases * Harden slash search prompt expansion * Honor research source caps in command contract * Require search reports in design files * Add research data provider settings * Wire web research provider fallback order * Update research provider fallback wording * Revert "Update research provider fallback wording" This reverts commit `86fb6001e3`. * Revert "Wire web research provider fallback order" This reverts commit `4c9e16036b`. * Revert "Add research data provider settings" This reverts commit `23630d1746`. * Add Dexter and Last30Days research skills * Add DCF and Last30Days OD skills * Add Last30Days and Dexter skills * Resolve research review threads --------- Co-authored-by: a1chzt <chizblank@gmail.com>	2026-05-08 10:33:44 +08:00
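The Tavily fix in the commit above clamps `max_results` at the provider boundary so depth tiers that resolve above 20 still succeed. A minimal TypeScript sketch of that clamp, assuming a hypothetical `clampTavilyMaxResults` helper and taking the [0, 20] range from the commit message; the request object is illustrative only, and the fallback for non-finite input is an arbitrary choice for this sketch.

```typescript
// Hypothetical helper; the [0, 20] bounds come from the commit message above.
const TAVILY_MAX_RESULTS_LIMIT = 20;

function clampTavilyMaxResults(requested: number): number {
  // Assumption: non-finite input (NaN, Infinity) falls back to the upper limit.
  if (!Number.isFinite(requested)) return TAVILY_MAX_RESULTS_LIMIT;
  const whole = Math.floor(requested);
  return Math.min(TAVILY_MAX_RESULTS_LIMIT, Math.max(0, whole));
}

// A "deep" tier that resolves to 30 is sent as 20 instead of triggering a 400.
const requestBody = { query: "example topic", max_results: clampTavilyMaxResults(30) };
```

Clamping at the provider boundary, rather than in the depth-tier table, keeps the tier definitions provider-neutral while preventing a request the endpoint would reject.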
lefarcen	2bb029cb58	release: Open Design 0.5.0 (#820) 0.5.0 was released from `c21cbc6` (https://github.com/nexu-io/open-design/releases/tag/open-design-v0.5.0); this squash brings the version bump and the CHANGELOG [0.5.0] entry into main's history so that the upcoming 0.5.1 release can follow the standard dispatch flow on main.	2026-05-08 00:41:01 +08:00
Nagendhra Madishetti	294fe94c67	fix(pack/win): close detection gaps that let `Open Design.exe` stay locked at install time (#821) (#823) The custom NSIS pre-install flow detects and closes running OD processes before extraction, but two gaps let `$INSTDIR\Open Design.exe` stay locked when the installer reaches `MUI_PAGE_INSTFILES`. The user then sees NSIS's native "file in use" Retry/Cancel dialog (not the custom `RunningInstancesCloseFailed` text), which is what kutzki reported. `DetectRunningInstances` and `CloseRunningInstances` previously matched processes only by `Win32_Process.ExecutablePath` under the install root. WMI returns null `ExecutablePath` for processes the caller cannot fully introspect: insufficient access tokens, processes mid-spawn, protected-process states. A child spawned in the millisecond window between the previous OD running and the installer's detection step can hit this and slip past the filter. Both functions now fall back to a CommandLine prefix match against the install root for null-`ExecutablePath` rows, which is OD-specific enough to avoid false positives without relying on a global `Name` match. `CloseRunningInstances` previously called `Stop-Process -Force` and returned without waiting for the OS to actually finalize the process exit. On Windows the file handle GC for an exiting process is async, so a `MUI_PAGE_INSTFILES` overwrite right after the kill can race the handle release and trigger NSIS's native file-in-use prompt even though the kill succeeded. The function now calls `WaitForExit(5000)` per PID after the force-stop loop, before returning, so the lock has time to clear before NSIS attempts the overwrite. Both changes were endorsed by @lefarcen in the issue thread after they ran their own code review and confirmed the matching diagnosis. The third part of the proposed fix (cross-platform `before-quit` cleanup in the Electron app) is in scope for #422 and not touched here. Local validation: `pnpm guard` clean. 
`pnpm --filter @open-design/tools-pack typecheck` fails on a pre-existing issue (missing `@electron/rebuild` devDep in tools-pack/src/win/app.ts on current main, reproducible by checking out main directly without my edit), unrelated to this change. The PowerShell embedded in the NSIS template is not exercised by the workspace test suite, so the change has no unit-test surface. Honest caveat: I do not have a Windows packaged-build environment to run `pnpm tools-pack win build --to nsis` and reproduce the locked-file dialog end-to-end. The PowerShell edits are textual and match the patterns already in the file, but a verifying install pass on a real Windows host with a previous OD already installed and running is recommended before merge. Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-07 21:42:50 +08:00
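The detection change in the commit above lives in PowerShell inside the NSIS template, but the matching rule itself is a small predicate: prefer `ExecutablePath`, and fall back to a command-line prefix match only when WMI returns a null path. The TypeScript sketch below models that rule for illustration only; `ProcessRow`, `isUnderInstallRoot`, and `isOpenDesignProcess` are hypothetical names, and the case-insensitive, separator-normalized comparison is an assumption about Windows path semantics rather than code taken from the installer.

```typescript
// Hypothetical model of a Win32_Process row; WMI may return null for either field.
interface ProcessRow {
  pid: number;
  executablePath: string | null;
  commandLine: string | null;
}

// Windows paths are case-insensitive; normalize separators and case before comparing.
function normalizeWinPath(p: string): string {
  return p.replace(/\//g, "\\").toLowerCase();
}

// Require a trailing separator so "Open Design Other" does not match "Open Design".
function isUnderInstallRoot(path: string, installRoot: string): boolean {
  const root = normalizeWinPath(installRoot).replace(/\\+$/, "") + "\\";
  return normalizeWinPath(path).startsWith(root);
}

function isOpenDesignProcess(row: ProcessRow, installRoot: string): boolean {
  if (row.executablePath) {
    // Primary rule: the executable lives under the install root.
    return isUnderInstallRoot(row.executablePath, installRoot);
  }
  if (row.commandLine) {
    // Fallback for null ExecutablePath: prefix-match the command line,
    // dropping the opening quote that usually wraps the executable path.
    return isUnderInstallRoot(row.commandLine.replace(/^"/, ""), installRoot);
  }
  return false;
}
```

Anchoring the fallback to the install root, instead of matching on a process `Name`, mirrors the commit's reasoning: it stays specific to this installation and avoids closing unrelated processes that happen to share an executable name.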
PerishFire	cb92c93ae0	Migrate beta release publishing to R2 (#805) * Prebundle standalone web packaged runtime * Harden mac standalone prebundle policy * Prebundle mac daemon packaged runtime * Prune mac Electron locales * Maximize mac release artifact compression * Publish beta mac artifacts to R2 * Use remote R2 uploads for beta releases * Fail fast on beta R2 access issues * Use S3-compatible uploads for beta R2 releases * Decouple beta versioning from GitHub releases * Remove legacy beta metadata source * Address release beta review notes	2026-05-07 19:13:52 +08:00
PerishFire	6efac8887e	Improve Windows beta packaging and installer flow (#768) * Optimize Windows packaged web output * Fix packaged contracts runtime build * Optimize Windows packaged size pruning * Prune Windows root Next payload * Remove Windows bundled Node runtime * Prune Windows standalone duplicate Next * Add tools-pack cache foundation * Cache Windows packaged build layers * Cache Windows workspace builds * Cache Electron-ready Windows app * Split Windows tools-pack module * Cache Windows dir build outputs * Split Windows pack build modules * Document Windows NSIS smoke namespace limits * Move Windows NSIS smoke note to agents guide * Optimize Windows beta packaging * Bump packaged beta base version * Improve Windows installer namespace UX * Improve Windows tools-pack cache keys * Stabilize Windows beta cache version keys * Cache Windows workspace build outputs * Optimize windows release beta cache layers * Cache windows release dependencies * Trim windows release cache before save * Refresh windows tools-pack cache key * Improve windows installer preflight prompts * Fallback NSIS installer strings to English * Fix Windows installer cleanup and preflight * Improve Windows NSIS state logging * Fix system NSIS Persian language alias * Use long-path removal for Windows uninstall * Fix mac tools-pack tests on Windows * Address Windows packaging review feedback * Fix Windows installer cache namespace isolation * Include web output mode in Windows tarball cache key * Use unique Windows release cache save keys	2026-05-07 16:44:15 +08:00
shangxinyu1	9b501f12a5	Support overriding the Codex executable path (#755) * Support overriding the Codex executable path * Replace save-as-template prompts with an in-app dialog * Seed local packaged app config from workspace * Fix packaged config and connection test overrides * Keep tools-pack mac config seeding self-contained * Require absolute CODEX_BIN overrides	2026-05-07 15:00:52 +08:00
Jheison Martinez Bolivar	4368b8f163	feat(linux): add headless mode for install/start/stop operations (#686) * feat(linux): add headless mode for install/start/stop operations * docs(linux): document headless mode commands and usage * refactor(linux-headless): write web-root.json instead of polling IPC for URL * fix(linux-headless): fail start when web identity never appears instead of returning success * docs(linux-headless): add use-case context and clarify launcher path dependency * fix(linux-headless): ensure launcher, identity and shutdown align with tools-pack - Bake OD_DATA_DIR into launcher so manual runs use the same paths as tools-pack - Validate web-root.json fields before accepting to reject stale identity - Remove web-root.json on successful stop - Add IPC server for graceful STATUS/SHUTDOWN handling * fix(linux-headless): create IPC server before writing web-root.json	2026-05-07 01:52:03 +08:00
Feroomon2010	576dfed9e1	feat: add accent color control and launcher for Open Design (#683) * feat: add accent color control and launcher for Open Design * fix: remove launcher binary from PR * test: cover accent appearance edge cases --------- Co-authored-by: ferasbusiness666 <ferasbusiness666@users.noreply.github.com>	2026-05-06 23:14:21 +08:00
iulian	80416b185a	Diagnose missing Next package during tools-dev web startup (#675) * fix(tools-dev): diagnose missing Next package * fix(web): remove duplicate Ukrainian prompt labels	2026-05-06 20:45:41 +08:00


80 commits