open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
PerishFire	bd48c597b0	chore: pin dependency versions and harden CI caches (#2189) * chore: pin dependency versions * ci: enforce pinned dependency specs * ci: fix pnpm executable invocation	2026-05-19 13:58:27 +08:00
PerishFire	4424f08be0	[codex] Add packaged desktop auto-update (#1375) * Add packaged desktop auto-update * Handle counted beta nightly update versions * Refresh desktop auto-update branch for main * Serialize desktop updater operations * Refresh auto-update branch for packaged paths	2026-05-19 11:20:05 +08:00
kami	8a629eb999	fix: discover codex models from cli (#2082)	2026-05-18 22:11:51 +08:00
chaoxiaoche	f7eb82d7a5	feat(design-systems): import design system projects (#2112) * feat(design-systems): define project manifest contract * feat(design-systems): add default project manifest * feat(daemon): consume design system manifests * feat(design-systems): import local project systems * feat(design-systems): import from github repositories --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-05-18 20:20:38 +08:00
kami	7b3b7c3b74	Fix finalize provider routing for Gemini BYOK (#1964) Routes Finish Design/finalize requests through the selected BYOK provider, including Gemini, while preserving the Anthropic fallback. Validation: CI and nix-check were green on PR head `6c334e08d1`.	2026-05-18 18:03:44 +08:00
kami	0101a09b10	fix(mcp): support no-auth local HTTP servers (#2008)	2026-05-18 17:08:46 +08:00
chaoxiaoche	46a64edce3	feat(design-systems): extract component manifests (#2051) Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-05-18 16:48:59 +08:00
Yuhao Chen	5d59eb906a	fix(platform): detect mise npm package bins (#2035) * fix(platform): detect mise npm package bins * test(platform): cover mise codex latest symlink	2026-05-18 16:32:31 +08:00
kami	91be2696dd	Persist routine failure reasons (#1963) Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 23:22:00 +08:00
leessju	e92f91fb06	fix(contracts): tighten LiveArtifactSsePayload.refreshStatus to LiveArtifactRefreshStatus (#1871) The SSE `live_artifact` event payload typed `refreshStatus` as `string | undefined`, but the underlying REST API (`LiveArtifact`) and every consumer treat it as the canonical `LiveArtifactRefreshStatus` union (`'never' | 'idle' | 'running' | 'succeeded' | 'failed'`). Daemon emitters already pass `artifact.refreshStatus`, which is typed as the enum at the source. The old type let new union members drift between REST and SSE unnoticed: a daemon adding a new refresh state for the REST API could ship out via SSE as a bare `string`, and web/CLI consumers switching on the union would silently fall through. Tighten the SSE field to the same enum. The field stays optional to preserve forward/backward compatibility with daemons that omit it on legacy events, but the type now narrows to the canonical members. Runtime behavior is unchanged — this is a type-only fix. Validation: contracts build/test, web typecheck, daemon typecheck. Co-authored-by: nicejames <nicejames@gmail.com>	2026-05-16 21:39:39 +08:00
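The type tightening described in the commit above can be illustrated with a minimal TypeScript sketch. The union members are quoted from the commit message; the trimmed-down `LiveArtifactSsePayload` interface and the `describeRefresh` helper are hypothetical and exist only for illustration, not as the repository's actual code.

```typescript
// Canonical union, with members as quoted in the commit message.
type LiveArtifactRefreshStatus = 'never' | 'idle' | 'running' | 'succeeded' | 'failed';

// Hypothetical, trimmed-down payload: only the field under discussion.
// Before the fix the field was typed `string | undefined`.
interface LiveArtifactSsePayload {
  refreshStatus?: LiveArtifactRefreshStatus;
}

// Hypothetical consumer. The `never` assignment in the default branch makes
// the compiler reject this switch if a new union member is added without a case.
function describeRefresh(status: LiveArtifactRefreshStatus | undefined): string {
  if (status === undefined) return 'unknown (legacy event)';
  switch (status) {
    case 'never': return 'never refreshed';
    case 'idle': return 'idle';
    case 'running': return 'refresh in progress';
    case 'succeeded': return 'last refresh succeeded';
    case 'failed': return 'last refresh failed';
    default: {
      const unreachable: never = status;
      return unreachable;
    }
  }
}

const payload: LiveArtifactSsePayload = { refreshStatus: 'running' };
console.log(describeRefresh(payload.refreshStatus));
```

With the old `string | undefined` type, a consumer could not rely on this kind of exhaustiveness check, which is the drift the commit message describes.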
Yuhao Chen	0e61313347	fix(prompts): stabilize discovery brand answers (#1861)	2026-05-16 15:50:52 +08:00
Tom Huang	c5d77a03bd	Garnet hemisphere (#1769) * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content.
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. 
This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec * Reconcile run messages on cancel * Use active design system as visual direction * Fix active design system prompt wording * feat(workspace-tabs): implement workspace tabs functionality and file attachment handling - Introduced a new `WorkspaceTabsBar` component to manage workspace tabs, allowing users to navigate between different views (projects, marketplace, etc.). - Enhanced file handling capabilities in the `HomeHero` and `EntryShell` components, enabling users to stage and attach files before project creation. - Updated the `App` component to support auto-sending attachments alongside the first message in a project. - Improved CSS styles for workspace tabs and attachment UI, ensuring a cohesive design and user experience. This update significantly enhances the workspace navigation and file management features, providing users with a more intuitive and efficient workflow. * refactor(workspace-tabs): streamline workspace tabs and UI components - Removed unused components and actions from the `WorkspaceTabsBar` and `AppChromeHeader`, simplifying the codebase. - Updated CSS styles for the workspace shell and tabs, enhancing visual consistency and reducing element sizes for a cleaner layout. - Introduced a new client type detection mechanism to dynamically adjust the workspace shell's class, improving responsiveness. - Added tests for the `WorkspaceTabsBar` to ensure proper navigation and tab management functionality. These changes improve the overall performance and user experience of the workspace navigation system. * Update critical e2e for entry modal flow * Stabilize entry critical e2e flows * fix(ui): adjust workspace tabs and header styles for improved layout - Updated the CSS for workspace tabs and the app header, reducing element sizes and padding for a cleaner appearance. 
- Introduced a new button in the `WorkspaceTabsBar` for quick access to the home tab, enhancing navigation. - Minor adjustments to the layout and styles to ensure consistency across components. These changes enhance the user interface and improve the overall user experience in the workspace navigation system. * feat(workspace-tabs): implement pinned home tab functionality - Added a new pinned home tab feature to the `WorkspaceTabsBar`, allowing the home tab to remain accessible during navigation. - Updated tab management logic to collapse duplicate home tabs into a single pinned instance when restoring from local storage. - Enhanced CSS styles for workspace tabs to accommodate the new pinned tab design. - Updated tests to verify the behavior of the pinned home tab and its interaction with other tabs. These changes improve navigation consistency and user experience within the workspace. * refactor(workspace-tabs): enhance tab management and styling - Updated CSS styles for workspace tabs, adjusting padding and flex properties for improved layout and consistency. - Refactored tab creation logic to ensure unique IDs for project and marketplace tabs, enhancing navigation clarity. - Removed deprecated functions related to pinned home tabs, streamlining the codebase. - Improved test cases to verify independent behavior of home tabs during navigation. These changes enhance the user experience by providing a more intuitive tab management system and a cleaner UI. * style(workspace-tabs): update CSS for improved layout and visibility - Adjusted CSS properties for workspace tabs, including overflow, position, and z-index to enhance layout and stacking context. - Ensured consistent styling across tab components for better visual hierarchy. These changes contribute to a more polished and user-friendly interface within the workspace. 
* style(entry-layout): update CSS variables for improved layout consistency - Replaced fixed width values with CSS variables for the entry rail to enhance flexibility. - Adjusted padding and height properties for better visual alignment and spacing. - Introduced a new background style for the entry main topbar to improve aesthetics. These changes contribute to a more responsive and visually appealing layout in the entry view. --------- Co-authored-by: qiongyu1999 <2694684348@qq.com> Co-authored-by: Eli <129168833+qiongyu1999@users.noreply.github.com>	2026-05-15 14:42:11 +08:00
lefarcen	b268bbe169	Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix Brings in 11 new garnet commits, most importantly: - `1a90aef4` feat(plugin-use): implement plugin use handoff functionality — fixes the bug QA reported where /plugins Use Plugin would 422 silently for template plugins; new flow hands off to HomeView with the plugin pre-bound + input form prompted there. - `2ac58544` feat(plugin-inputs): enhance plugin input handling with file upload support — extends PluginInputsForm for file uploads. - `3b167b69` feat(plugins): registry protocol — new @open-design/registry-protocol workspace package (needs build before daemon boot). - Plus enhancements to plugin metadata, GitHub installer, plugin detail view, login/whoami, static HTML preview paths. Conflicts resolved: - packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief field + garnet's contextPlugins (@-mention plugin context refs) both kept on ProjectMetadata. - apps/landing-page/* (3 files): accepted HEAD — garnet had the older single-page landing-page header; main has the multi-page layout (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not related to the Use Plugin core fix. New @open-design/registry-protocol package must be built before daemon boots; pnpm install does this via postinstall already.	2026-05-14 16:32:35 +08:00
pftom	2ac5854432	feat(plugin-inputs): enhance plugin input handling with file upload support - Added support for file input fields in the PluginInputsForm, allowing users to upload files with serializable metadata. - Updated the HomeHero component to improve the layout and interaction of input fields, enhancing user experience. - Adjusted CSS styles for better visual representation of input fields and their states. - Modified HomeView to reflect changes in authoring chip IDs for better clarity in plugin actions. - Enhanced tests to cover new file input functionality and ensure correct behavior in various scenarios. This update significantly improves the plugin input handling, enabling users to upload files seamlessly and enhancing the overall interaction model.	2026-05-14 15:52:21 +08:00
pftom	9ea33e076b	feat(context-plugins): add support for context plugins in project metadata and UI - Introduced a new `contextPlugins` field in the `ProjectMetadata` type to accommodate plugins selected via `@` mentions, allowing for additive context in project creation. - Updated the `HomeHero` and `EntryShell` components to handle and display context plugins, enhancing user interaction with selected plugins. - Implemented rendering logic for context plugins in the metadata block, providing clear visibility of selected plugins and their descriptions. - Enhanced the UI to support the removal of context plugins and display additional details on hover, improving the overall user experience. This update significantly enriches the project creation process by allowing users to incorporate multiple context plugins seamlessly.	2026-05-14 15:29:49 +08:00
lefarcen	6c16283850	Merge origin/main (post-7c8305f4) into reconcile branch Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline.	2026-05-14 15:13:38 +08:00
Siri-Ray	d2738924fb	fix(web): freeze completed run durations across conversations (#1351) * fix(web): freeze completed run durations across conversations * fix(web): finalize stopped API runs Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(daemon): optimize conversation latest run lookup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): scope streaming cleanup to conversation Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): capture streaming conversation cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(web): guard stale run ref cleanup Generated-By: looper 0.6.0 (runner=fixer, agent=codex)	2026-05-14 14:25:37 +08:00
Marc Chan	055e55abd8	Add batch design system testing (#1515) * feat: add batch design system testing * fix: use daemon default agent for batch tests * fix: honor batch project prompt flags Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix: persist batch run output * fix: honor dry-run before daemon resolution Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix: persist batch assistant run ids Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix: cancel timed-out batch runs Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-14 14:19:32 +08:00
pftom	3b167b6921	feat(plugins): add registry protocol and enhance plugin management features - Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries. - Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol. - Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities. - Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades. - Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards. This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities.	2026-05-14 08:55:36 +08:00
pftom	56c264c9bd	feat(plugins): add login and whoami commands for GitHub CLI authentication - Introduced `login` and `whoami` commands to the plugin CLI, enabling users to authenticate with the Open Design registry via GitHub CLI. - The `login` command wraps GitHub CLI authentication, allowing users to specify a host, defaulting to GitHub. - The `whoami` command retrieves and displays the authenticated GitHub account information, with an option for JSON output. - Updated the CLI help documentation to include usage instructions for the new commands. - Enhanced error handling for GitHub CLI dependencies and authentication status. This update improves the user experience by simplifying the authentication process for plugin publishing.	2026-05-14 07:25:05 +08:00
lefarcen	d83b228c81	Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge	2026-05-13 23:52:33 +08:00
lefarcen	53997990b7	Merge origin/main (post-0.7.0) into reconciled garnet branch Second-pass merge layering 41+ new commits from origin/main on top of the first reconcile commit. Headline upstream additions absorbed: - 0.7.0 release: redesigned chat bubble user-text styling, neutralised palette, lucide icons, ElevenLabs audio voice option discovery in the prompt composer, analytics tracking (PostHog) wired across home / studio / create surfaces, Prometheus `/api/metrics` endpoint, critique-theater drop-in mount with a settings toggle. - Misc upstream fixes (titlebar padding, release header layout, deck preview chrome, feedback form auto-scroll, conversation-created SSE on routine runs, etc.) Conflict resolutions (12 files, ~22 hunks): - contracts barrel + prompts/system: union of both sides; new analytics exports (`./analytics/events`, `./analytics/public-params`) added alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice fields (audioVoiceOptions/audioVoiceOptionsError, main) and pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput. - daemon/server.ts: Prometheus `/api/metrics` route inserted after garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call added before the chat-run service init alongside the prior reconcile note about the dropped legacy POST /api/projects body. - App.tsx: handleCreateProject now consumes both garnet's plugin fields (pluginId / appliedPluginSnapshotId / pluginInputs / autoSendFirstMessage) and main's analytics requestId. Tracking fires success + failure paths; PluginLoopHome auto-send sessionStorage flag is preserved. - ProjectView.tsx: the garnet auto-send useEffect coexists with main's `useCritiqueTheaterEnabled()` hook. - ChatComposer.tsx: imports merged (drop now-unused fetchSkills, add analytics provider + tracking + buildVisualAnnotationAttachment). 
- index.css: main's redesigned `.msg.user .user-text` chat bubble styling wins over garnet's plain text rule; garnet's `.msg-plugin-chip*` rules preserved alongside. - EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with reconcile decision #2. main's added PetRail / TopTab / analytics view tracking is intentionally NOT brought into the wrapper; the follow-up to re-integrate PetRail / image-templates / video-templates into EntryShell still stands and now also covers analytics view-tracking hooks. - daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node + prom-client coexist). - Test fixtures (FileWorkspace.test): kept garnet's plugin-folders describe block intact; main's projectKind="prototype" addition is dropped where it conflicted with garnet's plugin-folder fixture files. Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck` exits 0 across all workspace packages. Follow-up not done in this commit: - PetRail / image-templates / video-templates / 0.7.0 analytics view-tracking hooks need to be added to EntryShell. - Critique-theater settings toggle UX (added on main) lives in the SettingsDialog hierarchy; the reconcile state preserves the SettingsDialog so this should work without changes, but no end-to-end verification yet.	2026-05-13 23:29:56 +08:00
lefarcen	d3602be666	Merge origin/main into garnet-hemisphere (reconcile) Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. 
Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory.	2026-05-13 22:29:21 +08:00
pftom	0edbf38171	feat(plugins): add specVersion and version fields to plugin and marketplace schemas - Introduced `specVersion` and `version` fields to the plugin and marketplace schemas, ensuring better versioning and compatibility tracking. - Updated various components and functions to handle the new fields, including database migrations, plugin snapshots, and marketplace management. - Enhanced tests to validate the presence and correctness of the new fields in plugin manifests and marketplace entries. - Improved documentation to reflect the changes in schema requirements and provide guidance on the new versioning system. This update strengthens the plugin ecosystem by providing clear versioning, enhancing the reliability and maintainability of plugins and marketplaces.	2026-05-13 22:24:50 +08:00
Caprika	06dbde51f9	[codex] Add Cursor Agent auth diagnostics (#1538) * Add Cursor Agent auth diagnostics * Handle Cursor not logged in auth status * Address Cursor auth review feedback * Classify Cursor stdout auth failures	2026-05-13 20:25:34 +08:00
Caprika	a3276ec542	[codex] Add visual draw annotation context (#1547) * feat(web): add visual draw annotation context * Fix visual draw annotation staging * Fix concurrent visual annotation IDs	2026-05-13 20:02:19 +08:00
lefarcen	5172e37217	Merge origin/main into release/v0.7.0 to prepare merge-back PR Resolves 7 conflicts via hybrid strategy: - apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature) - apps/web/src/components/Icon.tsx: take main (switch-case refactor) - apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance) - apps/web/src/index.css: take main (project-target-platforms / instructions chip styles) - apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion - nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics call sites), and #1540 (release light background) are preserved via auto-merge.	2026-05-13 18:19:47 +08:00
kami	4f76e836ae	feat(audio): add ElevenLabs audio support (#1384) * docs: add ElevenLabs audio support design * docs: add ElevenLabs audio implementation plan * feat(daemon): add ElevenLabs speech renderer * feat(daemon): add ElevenLabs sound effects renderer * fix(daemon): preserve ElevenLabs sfx durations * feat(web): expose ElevenLabs media providers * feat(daemon): document ElevenLabs audio contract * feat(audio): add ElevenLabs voice selection * chore: ignore superpowers scratch docs * fix(daemon): cache ElevenLabs voice options * fix(audio): expand ElevenLabs voice and SFX selection * fix(audio): align ElevenLabs SFX controls * fix(audio): tighten ElevenLabs SFX prompt budget * fix(audio): preflight ElevenLabs SFX prompt length * fix(audio): surface ElevenLabs lookup failures * fix(audio): sanitize ElevenLabs prompt errors	2026-05-13 15:53:41 +08:00
Rocky	cd68c8a80a	fix(daemon+web): emit conversation-created SSE event when routine run starts (#1523) * fix(daemon+web): emit conversation-created SSE event when routine run starts When a Routine fires in "Reuse an existing project" mode, the daemon creates a new conversation in the project and writes a queued/running assistant message to the database, but the open `ProjectView` has no way to learn that anything happened: the project events SSE stream only carries `file-changed` and `live_artifact` events, and `ProjectView` reloads conversations only when `project.id` changes. The result is the user's own routine "Run now" appears to do nothing until they exit and re-enter the project (#1361). Fix: - Add a `conversation-created` payload type to the existing project events stream in `apps/web/src/providers/project-events.ts`. The payload carries `projectId`, `conversationId`, `title`, and `createdAt`. Mirror the existing `file-changed` listener pattern with explicit malformed-payload handling. - In `apps/daemon/src/server.ts`, after `insertConversation` runs in the routine `setRunHandler` (both reuse-an-existing-project and new-project paths), broadcast a `conversation-created` event through the existing `activeProjectEventSinks` map. The function body was already generic so it was renamed from `emitProjectLiveArtifactEvent` to `emitProjectEvent` and the two pre-existing callers updated. - In `apps/web/src/components/ProjectView.tsx`, when `handleProjectEvent` receives a `conversation-created` event whose `projectId` matches the currently-viewed project, refetch the conversation list via `listConversations`. The active conversation is intentionally NOT changed — per maintainer guidance on #1361, auto-switching is a separate UX decision left for a follow-up. - `projectEventToAgentEvent` returns null for `conversation-created` so it doesn't get routed into the live-artifact path.
Tests (`apps/web/tests/providers/project-events.test.ts`): - A single `conversation-created` event reaches the consumer with the parsed payload. - Two consecutive `conversation-created` events from concurrent routine runs both reach the consumer (covers the multiple-concurrent-runs case reported in #1502). - Malformed `conversation-created` payloads are swallowed without throwing, matching the existing `file-changed` / `live_artifact` defensive behavior. Manual verification: - Built locally with `pnpm exec tools-pack mac build --to app --portable` and installed. - Created a routine in `Reuse an existing project` mode targeting an existing project. - With the project view open, clicked `Run now`. The new "Routine" conversation appeared in the project's conversation list within about a second, without exiting and re-entering the project, and the active conversation was not changed. - Clicked `Run now` twice in quick succession; both new conversations appeared in the list, covering the concurrent-runs case in #1502. - `pnpm guard` and `pnpm --filter @open-design/web typecheck` clean; full web test suite is 1016/1016 passing. Fixes #1361 fix(#1523 review): share SSE type via contracts; guard conversation refresh against project-switch and reordering races Addresses Codex P1 + lefarcen P2 inline review on #1523 (#1361): 1. Move `ProjectConversationCreatedSsePayload` to `@open-design/contracts` (`packages/contracts/src/sse/chat.ts`) so the daemon producer and the web consumer share one type. The web provider re-exports it under the local `ProjectConversationCreatedEvent` name to keep the existing import shape stable for callers; the daemon emit site picks up the same shape via a JSDoc typedef so producer and consumer can't drift as this stream grows. (Addresses lefarcen P2 on project-events.ts:17.) 2. 
Guard the `conversation-created` async refresh in `ProjectView` against two distinct races: - Project-switch race: capture `project.id` at dispatch time and re-check it via a live `projectIdRef` after `listConversations` resolves; bail if the user switched projects while the request was in flight. The existing project-load effects use the same cancellation pattern. (Addresses Codex P1 on ProjectView.tsx:767.) - Concurrent-refresh re-ordering race: bump a monotonic `conversationsRefreshTokenRef` on every dispatch and capture each request's token; only the request whose captured token still equals the live ref at await-return applies its result. Two rapid `conversation-created` events (the #1502 concurrent Run-now case) can no longer drop the newest conversation when the earlier request resolves last with a stale, shorter list. (Addresses lefarcen P2 on ProjectView.tsx:767.) Both guards are documented inline with comments that point back at the review threads. The existing project-events tests (single delivery, concurrent delivery, malformed payloads) are unchanged — the new guards are defensive logic on the consumer, not new event shapes. `pnpm guard`, `pnpm --filter @open-design/web typecheck`, `pnpm --filter @open-design/daemon typecheck`, and the full web test suite (1016/1016) remain green.	2026-05-13 14:50:58 +08:00
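The two guards described in the commit above (project-switch check and monotonic refresh token) can be sketched in TypeScript as follows. This is a simplified illustration under stated assumptions, not the code in `ProjectView.tsx`: `createConversationRefresher` and its parameters are hypothetical stand-ins for the component's `projectIdRef`, `conversationsRefreshTokenRef`, and `listConversations` call.

```typescript
interface Conversation {
  id: string;
}

// Hypothetical factory standing in for the refs and state a React component would hold.
function createConversationRefresher(
  getCurrentProjectId: () => string,
  listConversations: (projectId: string) => Promise<Conversation[]>,
  apply: (list: Conversation[]) => void,
) {
  // Monotonic counter: each dispatch bumps it and remembers its own value.
  let latestToken = 0;

  return async function refresh(): Promise<boolean> {
    const projectIdAtDispatch = getCurrentProjectId();
    const token = ++latestToken;

    const list = await listConversations(projectIdAtDispatch);

    // Project-switch guard: the user moved to another project while the request was in flight.
    if (getCurrentProjectId() !== projectIdAtDispatch) return false;
    // Re-ordering guard: a newer refresh was dispatched, so this result may be stale.
    if (token !== latestToken) return false;

    apply(list);
    return true;
  };
}
```

In this sketch, only the most recently dispatched request may apply its result, so an older request that resolves last with a shorter list cannot overwrite the newer one.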
pftom	9e196d34af	feat(daemon, web): enhance plugin sharing workflows and UI components - Updated the plugin sharing prompts to utilize local daemon endpoints for publishing to GitHub and contributing to Open Design, streamlining the user experience. - Refactored the `PluginsView` and `PluginShareMenu` components to support new sharing functionalities, including confirmation modals and improved link handling. - Enhanced the CSS styles for the plugin share confirmation modal and related UI elements for better visual consistency. - Added tests to verify the functionality of the new sharing workflows and ensure proper integration within the existing plugin management system. This update significantly improves the plugin sharing experience, making it easier for users to publish and contribute their plugins effectively.	2026-05-13 14:35:09 +08:00
pftom	c9cc3b88c0	feat(web): standardize plugin terminology and enhance UI components - Updated terminology from "Community" to "Official" across various components to reflect first-party plugin status. - Enhanced the ChatComposer, HomeHero, and PluginsHomeSection components to improve user experience and clarity in plugin management. - Improved CSS styles for better visual consistency and layout across plugin-related interfaces. - Added tests to ensure proper functionality and visibility of official plugins in the UI. This update reinforces the distinction between official and user-installed plugins, enhancing the overall user experience in plugin interactions.	2026-05-13 12:19:29 +08:00
lefarcen	dc7791ef9d	feat(analytics): add project_id + project_kind to studio/artifact events (#1509) Product tracking doc 260513 added project_id + project_kind to studio_view (artifact), studio_click (share_option), and artifact_export_result. The Studio funnel can now group by project type without joining run_created on the back end. - contracts: 3 props gain required project_id + project_kind - ProjectView → FileWorkspace → FileViewer: thread projectKind down, converting metadata.kind via projectKindToTracking once at the top - FileViewer + HtmlViewer: populate the three call sites	2026-05-13 12:13:55 +08:00
pftom	fcc3ae5838	feat(web): enhance HomeHero and related components for improved context selection and visibility handling - Updated the HomeHero component to support skill and MCP server mentions, allowing users to select these options seamlessly. - Improved CSS styles for the HomeHero component, enhancing the visual presentation of active selections and context tabs. - Refactored visibility handling for slides in the deck framework, ensuring proper display logic and preventing visibility issues with variant classes. - Added tests to verify the functionality of context selection and visibility handling, ensuring a smoother user experience. This update significantly enhances the user interface and interaction capabilities within the HomeHero component, improving the overall experience for users managing skills and presentations.	2026-05-13 11:48:15 +08:00
lefarcen	e2952acd05	Revert "fix(web): restore consistent app header layout (#1432)" This reverts commit `3d3119333c`.	2026-05-13 11:20:16 +08:00
pftom	c36609c47d	feat(daemon, web): implement plugin sharing project creation and enhance CLI functionality - Added new flags for conversation, message, agent, and model in the CLI to support enhanced plugin sharing features. - Introduced a new API endpoint for creating share projects for plugins, allowing users to publish to GitHub or contribute to Open Design. - Updated the UI components to facilitate the new sharing functionalities, including prompts for user input during the sharing process. - Enhanced the project management system to handle new plugin share actions, improving user interaction and experience. - Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system. This update significantly enhances the plugin ecosystem by enabling users to share their creations more effectively and streamline collaboration.	2026-05-13 07:01:12 +08:00
sukumarp2022	b167991d7c	feat: add project-level and user-level custom instructions (#1304 ) * feat: add project-level and user-level custom instructions Implements #510 — editable custom instructions that get injected into every model message, at both user level (Settings → Memory) and project level (pencil icon in project header). - Add customInstructions to Project, AppConfigPrefs contracts - Add custom_instructions column migration to projects table - Inject user + project instructions into system prompt (after memory, before design system; project-level wins on conflict) - Add Settings textarea for user-level instructions - Add inline editor bar in ProjectView for project-level instructions - Sync user-level instructions through daemon app-config round-trip * fix: address PR review — validation, draft reset, length limit - Reset instructionsDraft on Cancel and toggle close (stale draft bug) - Thread customInstructions through POST /api/projects create handler - Add type + length validation (5000 chars) in PATCH handler - Enforce length cap in app-config applyConfigValue - Add maxLength={5000} to both UI textareas - Resync draft via useEffect when editor is closed - Remove stray run.sh from commit * fix: address maintainer review — save race condition, precedence wording - Make handleSaveInstructions async with await + revert on failure - Add instructionsSaving state to disable Save/Cancel/textarea during save - Clarify precedence wording with concrete example in both prompt composers - UpdateProjectRequest already has customInstructions (verified) * fix: use server-returned project in save handler, drop optimistic update The previous optimistic-update + revert approach captured a stale project snapshot in the useCallback closure. On failure, reverting with the captured object could clobber unrelated project fields that changed during the async request. 
Switch to pessimistic update: wait for patchProject to succeed, then call onProjectChange(result) with the server-returned project object. The instructionsSaving flag disables the editor UI during the round-trip. * fix: align create/PATCH validation for customInstructions Create endpoint now rejects invalid types and >5000 char values with 400 instead of silently truncating, matching the PATCH handler behavior.	2026-05-12 14:27:57 -04:00
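The aligned create/PATCH validation for `customInstructions` might look roughly like the sketch below. The 5000-character cap and the 400 status come from the commit message; the function name, the result shape, and the treatment of `undefined` as "field omitted" are assumptions made for illustration.

```typescript
// Cap taken from the commit message; everything else here is hypothetical.
const MAX_CUSTOM_INSTRUCTIONS_LENGTH = 5000;

type CustomInstructionsCheck =
  | { ok: true; value: string | undefined }
  | { ok: false; status: 400; message: string };

function validateCustomInstructions(input: unknown): CustomInstructionsCheck {
  // Assumption: an omitted field is allowed and leaves the value unset.
  if (input === undefined) return { ok: true, value: undefined };
  if (typeof input !== "string") {
    return { ok: false, status: 400, message: "customInstructions must be a string" };
  }
  // Reject rather than truncate, so create and PATCH behave the same way.
  if (input.length > MAX_CUSTOM_INSTRUCTIONS_LENGTH) {
    return {
      ok: false,
      status: 400,
      message: `customInstructions exceeds ${MAX_CUSTOM_INSTRUCTIONS_LENGTH} characters`,
    };
  }
  return { ok: true, value: input };
}
```

Sharing one validator between both handlers is one way to keep the two endpoints from drifting apart again; how the real handlers treat `null` is not stated in the commit.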
Siri-Ray	3d3119333c	fix(web): restore consistent app header layout (#1432 ) * docs: add NotebookLM GitHub export script (#1062) * docs: add NotebookLM GitHub export script * fix: make NotebookLM export TOC anchors work * fix: escape TOC link text markdown chars * fix: include merged PRs when exporting --prs all * fix: allow --prs merged mode * fix: treat --limit as total export budget * fix: avoid starving buckets under global --limit * fix: support --issues none and handle repos w/ issues disabled * fix: avoid underfilling export when buckets empty * fix: keep disabled-issues fallback quiet * fix: silence disabled issues fallback * fix: satisfy script typecheck * prevent duplicate saves and add template deletion (#1294) * prevent duplicate template entries on repeated save * add delete button to saved template list Templates can now be removed from the template picker via a hover x button, calling the existing DELETE /api/templates/:id endpoint. * add missing onDeleteTemplate prop in test fixtures * add template deletion flow test for NewProjectPanel * reject template names longer than 100 characters * preserve original createdAt on template update * feat: add FAQ page skill (#1162) * fix: set writable OD_DATA_DIR default for nix run Fixes #1157 When running via 'nix run github:nexu-io/open-design', the daemon attempted to create runtime state under the Nix store package path: /nix/store/.../lib/open-design/.od/projects The Nix store is read-only at runtime, causing startup to fail with ENOENT when mkdir() tried to create the projects directory. This commit updates the nix run wrapper to export OD_DATA_DIR with a writable default ($HOME/.od) when the variable is unset. Users can still override it by setting OD_DATA_DIR before running. The Home Manager and NixOS modules already set OD_DATA_DIR, so they are unaffected by this change. 
* feat: add FAQ page skill Add a new skill for generating Frequently Asked Questions pages with: - Collapsible accordion sections for Q&A pairs - Real-time search functionality - Category filtering (Billing, Account, Technical, General) - Smooth animations and transitions - Keyboard navigation support - Mobile-friendly responsive design - Semantic HTML with proper ARIA attributes The skill includes: - SKILL.md with triggers, workflow, and output contract - example.html demonstrating a complete FAQ page with 12 questions Use cases: help centers, support pages, product documentation * fix: address PR review feedback for FAQ page skill - Fix craft slugs: use accessibility-baseline and state-coverage instead of non-existent slugs - Remove overly broad 'questions and answers' trigger - Add edge case handling for insufficient/excessive FAQs - Remove search highlighting requirement (XSS risk) - Update self-check to reflect filtering instead of highlighting Addresses review comments from @lefarcen and @chatgpt-codex-connector * feat: add localized copy for faq-page skill Add German, French, and Russian translations for the FAQ page skill example prompt to fix validation test failure. - DE: FAQ-Seite mit Akkordeon-Abschnitten, Suchfunktion und Kategoriefilterung - FR: Page FAQ avec sections accordéon, recherche et filtrage par catégorie - RU: Страница FAQ со складными секциями-аккордеонами, поиском и фильтрацией * fix: escape apostrophe in French translation Use double quotes to avoid syntax error with d'auth * fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins (#1110) * fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins fnm legacy installations use ~/.fnm/node-versions. 
Closes #1102 * fix: remove stray .fnm token from type declaration * docs: add Windows troubleshooting guide (#478) (#1170) * docs: add Windows troubleshooting guide (#478) Add docs/windows-troubleshooting.md with step-by-step fixes for the most common native-Windows setup errors: - Node 24 / nvm-windows gotchas (fake nvm file in System32) - pnpm not found after installation - Build scripts blocked by pnpm 10 (better-sqlite3, sharp) - Visual Studio / gyp build errors - Starting the dev server - Optional OpenCode CLI setup Also update CONTRIBUTING.md and QUICKSTART.md to link to the new guide instead of the vague "file an issue if it doesn't" note. * docs: fix Windows guide command accuracy (#1170) Address all 6 inline review comments from lefarcen: - Pin npm-global pnpm install to @10.33.2 (matches packageManager field) - Use where.exe instead of bare where (PowerShell alias conflict) - Fix OpenCode package: opencode-ai (not opencode), binary is opencode - Add EPERM fallback note for corepack enable on protected installs - Add Python check for gyp ERR! find Python - Expand diagnostic checklist with corepack, python, execution policy Also remove redundant corepack pnpm --version from checklist. * feat(daemon): inject compiled design-system tokens + fixture into prompts (#1385) * feat(daemon): inject compiled design-system tokens + fixture into prompts Follow-up to #1231. The prior PR landed the structured form of two brands (`default` + `kami`) and codified the schema; this PR teaches the daemon to actually consume those files when assembling the system prompt, so agents stop having to re-derive token names from DESIGN.md prose every turn. Gated behind `OD_DESIGN_TOKEN_CHANNEL=1` for the smoke-test phase — flag-off keeps the daemon byte-equivalent to today's behavior, flag-on appends two new prompt blocks (the brand's `tokens.css` :root contract and its `components.html` reference fixture) right after the existing DESIGN.md block. 
Brands without those sibling files (every brand except `default` and `kami` today) skip silently in either mode. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(daemon): only swallow ENOENT/ENOTDIR in readFileOptional, rethrow rest Reviewer feedback (nettee, #1385). The prior catch-all hid permission errors, EISDIR, and broken packaged-resource paths behind the same "undefined = absent" branch the legacy ~138-brand fallback uses, which would let `OD_DESIGN_TOKEN_CHANNEL=1` silently degrade to the DESIGN.md-only prompt while reporting success. That corrupts the exact signal the smoke-test rollout depends on. Now `readFileOptional` only returns undefined for ENOENT / ENOTDIR (real "file does not exist" cases) and rethrows everything else. Added a focused test that plants a directory at the tokens.css path to exercise the EISDIR branch, plus a partial-presence regression test to confirm the stricter contract preserves the legacy fallback. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com> * feat(daemon): make connection-test timeouts configurable (#1222) * feat(daemon): make connection-test timeouts configurable Provider and agent connection tests had hardcoded 12s / 45s budgets, which are too tight for slow networks or distant providers (the user sees "timeout" in Settings with no way to extend the budget). - Add OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS (default 12_000) - Add OD_CONNECTION_TEST_AGENT_TIMEOUT_MS (default 45_000) - Invalid values (non-numeric, zero, negative, fractional) emit a console.warn and fall back to the default, so a typo in the env never silently disables the safety timeout. - Export resolveConnectionTestTimeoutMs for unit testing; cover the three resolution paths (fallback / honored override / invalid). 41 connection-test tests pass (+3 new), full daemon suite 1170/1170. 
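The override rules above (fall back on non-numeric, zero, negative, or fractional values) can be sketched as below. The commit exports `resolveConnectionTestTimeoutMs`, but its signature is not shown, so `resolveTimeoutMs`, its parameters, and the injected `warn` callback are hypothetical. The upper bound reflects the follow-up fix in the same squash, described next; treating an empty string as unset is also an assumption.

```typescript
// Largest delay setTimeout honors: 2^31 - 1 ms.
const MAX_TIMER_DELAY_MS = 2_147_483_647;

function resolveTimeoutMs(
  raw: string | undefined,
  fallbackMs: number,
  warn: (message: string) => void = () => {},
): number {
  // Unset (or empty) env var: use the default silently.
  if (raw === undefined || raw.trim() === "") return fallbackMs;

  const parsed = Number(raw);
  const valid =
    Number.isSafeInteger(parsed) && parsed >= 1 && parsed <= MAX_TIMER_DELAY_MS;
  if (valid) return parsed;

  // A typo must never disable the safety timeout, so warn and fall back.
  warn(`Ignoring invalid timeout override "${raw}"; using ${fallbackMs} ms`);
  return fallbackMs;
}
```

A caller would pass something like `process.env.OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS` with `12_000` as the fallback and `console.warn` as the callback.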
* fix(daemon): reject connection-test timeout overrides above Node's setTimeout maximum Node's `setTimeout` silently clamps any delay above `2^31-1` ms (2_147_483_647) to ~1 ms with a TimeoutOverflowWarning. The previous `Number.isInteger(n) && n >= 1` check accepted oversized values unchanged and passed them straight to `setTimeout`, so an override that intended to raise the budget — e.g. `OD_CONNECTION_TEST_AGENT_TIMEOUT_MS=3000000000` — instead caused every connection test to fail almost immediately. The safety timeout was effectively disarmed. Add `MAX_CONNECTION_TEST_TIMEOUT_MS = 2_147_483_647` and switch the guard to `Number.isSafeInteger(n) && n >= 1 && n <= MAX...`. The boundary value is still accepted; one millisecond past it falls back with a warn. Regression test exercises `3_000_000_000`, `2_147_483_647`, and `2_147_483_648`. Addresses #1222 review feedback from @chatgpt-codex-connector, @mrcfps, and @lefarcen. * fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN SSRF bypass) (#1122) * fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN bypass) new URL('http://192.168.1.5./').hostname returns '192.168.1.5.' — the trailing dot is the RFC 1034 absolute-FQDN form and resolves identically to '192.168.1.5'. parseIpv4 fails on the dotted form, so 169.254.169.254. slips past the metadata-service block, 192.168.1.5. slips past the LAN block, and localhost. slips past the loopback identification. Strip trailing dots in normalizeBracketedIpv6 so all downstream checks (isLoopbackApiHost, isBlockedExternalApiHostname, isBlockedIpv4, IPv6 range tests) see the canonical form. Adds 6 vitest cases covering loopback FQDN forms (localhost., foo.localhost., 127.0.0.1.) and SSRF FQDN bypasses (169.254.169.254., 192.168.1.5., 10.0.0.5.). Refs nexu-io/open-design#1119 review feedback (P2 from @lefarcen). * test(connectionTest): tighten trailing-dot coverage per #1122 review Two issues from #1122 review: 1. 
(P2 from @mrcfps + codex bot) The original `foo.localhost.` case asserted error===undefined on validateBaseUrl, which only proves the URL passed validation — not that the host is identified as loopback. Replaced with direct isLoopbackApiHost(...) assertions on the actual loopback FQDN forms (localhost., 127.0.0.1., 127.0.0.5.) so the test exercises the loopback path the comment claims. 2. (P3 from @lefarcen) Original blocked-FQDN tests covered only 3 of 7 ranges that isBlockedIpv4 handles. Added a dedicated case per range (0.0.0.0/8, 10/8, 100.64/10, 169.254/16, 172.16/12, 192.168/16, multicast >=224) so future regressions in normalizeBracketedIpv6 surface against the full coverage. * docs: drop misleading foo.localhost./endsWith claim in normalizer comment @lefarcen review feedback: isLoopbackApiHost only accepts exact 'localhost', '::1', loopback IPv4, and mapped loopback IPv4 — there's no subdomain or endsWith handling, so referencing 'foo.localhost.' overstates what the trailing-dot strip enables. Rewrite the comment to match actual call sites (isLoopbackApiHost equality + isBlockedIpv4 numeric parse). 
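The trailing-dot bypass and its fix can be illustrated with a small sketch. The functions below are simplified re-implementations written for this example, not the project's `normalizeBracketedIpv6`, `parseIpv4`, or `isBlockedIpv4`; only the seven blocked ranges are taken from the commit text.

```typescript
// Remove the absolute-FQDN trailing dot(s) so "10.0.0.5." compares as "10.0.0.5".
function stripTrailingDots(hostname: string): string {
  return hostname.replace(/\.+$/, "");
}

// Hypothetical strict dotted-quad parser: four decimal octets, each 0-255.
function parseDottedQuad(host: string): [number, number, number, number] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return null;
  return octets as [number, number, number, number];
}

// Ranges listed in the commit: 0/8, 10/8, 100.64/10, 169.254/16,
// 172.16/12, 192.168/16, and multicast or above (first octet >= 224).
function isBlockedIpv4Sketch(hostname: string): boolean {
  const ip = parseDottedQuad(stripTrailingDots(hostname));
  if (!ip) return false;
  const [a, b] = ip;
  return (
    a === 0 ||
    a === 10 ||
    (a === 100 && b >= 64 && b <= 127) ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    a >= 224
  );
}
```

Without the `stripTrailingDots` call, `"169.254.169.254."` splits into five parts, fails the parse, and is treated as an ordinary hostname, which is the bypass the commit closes.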
* feat(daemon): export self-contained HTML via /export/?inline=1 endpoint (#1312) test(daemon): add Red unit tests for inlineRelativeAssets helper 14 cases pinning the behavior contract for the upcoming apps/daemon/src/inline-assets.ts helper: - link/script inlining with verbatim body preservation - non-src script attrs preserved (type=module, defer, crossorigin) - relative path resolution (root + nested + deep-nested owners) - self-closing and single-quoted attr forms - negative cases: missing rel, rel=preload, absolute/data/blob/leading-slash - escaping: </style and </script inside body - null-fileReader graceful degradation - duplicate identical tags fully replaced (diverges from apps/web/src/components/FileViewer.tsx:5313's first-match-only; locked decision per plan §3.3) - HTML-escaped data-od-inline-asset attr Tests intentionally Red — module ../src/inline-assets.js does not yet exist. Phase B-G of plan declarative-roaming-gosling.md will turn them green by porting FileViewer.tsx:5248-5354 server-side. Refs nexu-io/open-design#368. * feat(daemon): port inlineRelativeAssets server-side for export endpoint Adds apps/daemon/src/inline-assets.ts — a pure helper that takes (html, ownerFileName, fileReader closure) and returns the HTML with every relative <link rel=stylesheet> and <script src> contents inlined into <style data-od-inline-asset="…">/<script>…</script> blocks. The fileReader closure keeps the helper free of fs/Express coupling so the route handler owns the filesystem boundary. Port source: apps/web/src/components/FileViewer.tsx:5248-5354 — five functions (inlineRelativeAssets, resolveProjectRelativePath, baseDirFor, readHtmlAttr, escapeHtmlAttr). The fetch hop becomes the fileReader closure; replace-all replaces first-match-only per locked design decision §3.3 (inline comment in inline-assets.ts cites the divergence from FileViewer.tsx:5313 and notes the web inline path is on a deprecation track since PR #384 made URL-load the default). 
Phase B-G of plan declarative-roaming-gosling.md. All 14 unit cases from the Red commit (`a60a9023`) now pass; tightens one case to use a realistic '&'-only filename (the original `<`/`>`-bearing filename was unreachable in real filesystems and exposed a regex limitation the web client carries too). Daemon delta: +14 tests (1704 → 1718). Typecheck clean. Refs nexu-io/open-design#368. * test(daemon): add Red integration tests for /export/?inline=1 route 9 HTTP cases against GET /api/projects/:id/export/?inline=1: - 3-file React-ish layout returns self-contained HTML (wiring guard: body assertions catch removal of the await inlineRelativeAssets(...) line, not just helper-internals changes) - missing inline / non-canonical values (0, false, foo, empty) → 400 - non-HTML file → 400 UNSUPPORTED_FILE_TYPE - missing file → 404 FILE_NOT_FOUND - invalid project id (..) → some 4xx (Express normalizes before route) - null-origin OPTIONS preflight → 204 + Access-Control-Allow-Origin: * - missing sibling asset → 200 with <link> tag intact, other asset inlined - nested HTML entry (pages/index.html + ../shared/util.js) → 200 inlined 8 of 9 tests Red (404 / 403); the invalid-project-id case is tolerant about how Express rejects .. so it accidentally passes Red — Green will tighten to 400 BAD_REQUEST via isSafeId. Phase C-R of plan declarative-roaming-gosling.md. C-G will register the route in apps/daemon/src/import-export-routes.ts. Refs nexu-io/open-design#368. * feat(daemon): wire GET /api/projects/:id/export/?inline=1 endpoint Adds the export-inline endpoint into registerProjectExportRoutes (import-export-routes.ts) alongside /export/pdf and /archive. 
The route: - Validates project id via ctx.validation.isSafeId - Requires ?inline=1 (accept-list: 1 / true / yes / on, matching Part 1's parseForceInline at file-viewer-render-mode.ts:59-66) - Reads the owner HTML via ctx.projectFiles.readProjectFile; maps ENOENT to 404 FILE_NOT_FOUND, everything else to 400 BAD_REQUEST - Gates non-HTML callers with 400 UNSUPPORTED_FILE_TYPE - Builds a fileReader closure that silently returns null on any sibling read failure (failure-local, not fatal; matches the web client's null-filter at FileViewer.tsx:5311) - Hands the buffer + relPath to inlineRelativeAssets and returns the result as text/html DI: RegisterProjectExportRoutesDeps gains 'projectFiles' | 'validation'; server.ts:2879 passes the corresponding deps. Mirrors the dep shape of RegisterFinalizeRoutesDeps used by PR #832's /finalize/anthropic. Null-origin support intentionally omitted (decision §10 in the PR description): the daemon's null-origin allowlist is /raw/ and /codex-pets/.../spritesheet only, and export consumers are same-origin UI or server-side tooling; sandboxed-iframe srcdoc previews fetch /raw/* instead. Integration test #7 pins the 403 contract so a future allowlist change is deliberate. Phase C-G of plan declarative-roaming-gosling.md. All 23 tests green (14 unit + 9 integration); full daemon suite 1727 passing (delta +9 over B-G's 1718). Typecheck clean. Refs nexu-io/open-design#368. * test(daemon): add Red regression for inlined-body tag-literal corruption Reproduces the correctness bug Siri-Ray (looper) and codex-bot flagged on PR #1312: the reduce/split-join approach in inlineRelativeAssets re-scans the progressively mutated HTML, so a tag literal that happens to appear inside an already-inlined asset body gets the inner literal also replaced, corrupting the body and producing duplicate inlining. 
Concrete reproducer (CSS, where </style escape doesn't touch <link>): HTML: <link rel="stylesheet" href="a.css"> <link rel="stylesheet" href="b.css"> a.css: /* see also <link rel="stylesheet" href="b.css"> */ b.css: body{color:red} Under split/join the second pass splits on `<link rel="stylesheet" href="b.css">` and matches BOTH the real outer tag AND the literal inside a.css's comment. Result: b.css's <style> block is injected inside a.css's comment, and b.css gets inlined twice. Phase F-R of plan declarative-roaming-gosling.md (post-PR-#1312 review round). F-G will rewrite the helper to collect matches by position in the original HTML and concat slices in a single pass, so already-inlined content is never re-scanned. Refs nexu-io/open-design#1312 review threads at apps/daemon/src/inline-assets.ts:122 (Siri-Ray looper + codex bot). feat(daemon): replace inliner reduce/split-join with position-based concat Fixes the inlined-body tag-literal corruption Siri-Ray (looper) + codex-bot flagged on PR #1312. The previous `replaceAllOccurrences` (`source.split(from).join(to)`) re-scanned the progressively mutated HTML on each pass, so a tag literal that appeared inside an already-inlined CSS/JS body got the inner literal replaced too, producing duplicate inlining and corrupted bodies. New shape: collect every match's {start, end} byte span from the ORIGINAL html via `matchAll`, await the per-match replacements in parallel, sort by start, and concat slices of the original html with the replacement strings in a single pass. Text introduced by an earlier replacement is never scanned for matches. The dup-tag fix (decision §8: replace every occurrence, not first-match-only) is preserved: every original-tag position gets its own slice, so all duplicates are inlined. Also extracts buildInlineStyleBlock / buildInlineScriptBlock so the match-collection loops stay readable. Phase F-G of plan declarative-roaming-gosling.md. 
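A minimal sketch of the position-based approach described above, restricted to `<link rel="stylesheet">` tags. It is not the code in `inline-assets.ts`: `inlineStylesheetsSketch` and `ReadAsset` are hypothetical names, the regex is a simplification that expects `rel` before `href`, and the `</style` escaping and `data-od-inline-asset` attribute are left out.

```typescript
// Hypothetical reader: resolves to the asset body, or null if unavailable.
type ReadAsset = (href: string) => Promise<string | null>;

async function inlineStylesheetsSketch(html: string, readAsset: ReadAsset): Promise<string> {
  const linkTag = /<link\b[^>]*\brel=["']stylesheet["'][^>]*\bhref=["']([^"']+)["'][^>]*>/gi;

  // Collect spans from the ORIGINAL html only; nothing inserted later is scanned.
  const matches = Array.from(html.matchAll(linkTag)).map((m) => {
    const start = m.index ?? 0;
    return { start, end: start + m[0].length, href: m[1] ?? "" };
  });

  // Read all assets in parallel; results stay aligned with `matches`.
  const bodies = await Promise.all(matches.map((m) => readAsset(m.href)));

  // Single pass: original slice, replacement, original slice, ...
  let out = "";
  let cursor = 0;
  matches.forEach((m, i) => {
    const css = bodies[i];
    out += html.slice(cursor, m.start);
    // Unreadable asset: keep the original tag untouched.
    out += css == null ? html.slice(m.start, m.end) : `<style>${css}</style>`;
    cursor = m.end;
  });
  return out + html.slice(cursor);
}
```

Because matches from one regex already arrive in document order, no sort is needed here; merging link and script matches, as the commit describes, would require sorting by `start` before the concat loop.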
Regression test (`c809bccc`) goes Green; all 24 unit + integration tests pass; daemon suite still clean. Refs nexu-io/open-design#1312. * test(daemon): add Red CSP-sandbox test + P3 coverage gaps from PR #1312 review Three tests covering lefarcen's review on PR #1312: 1. [Red] CSP sandbox header (P2, lefarcen @ import-export-routes.ts:423). Top-level browser navigation to /export/?inline=1 sends no Origin header, so the daemon middleware lets it through and any JS in the exported document runs with daemon-origin privileges. Asserts the response sends `Content-Security-Policy: sandbox allow-scripts` so the browser treats it as a sandboxed iframe with an opaque origin (scripts still run, but no cookies / no /api/ access). This test fails until G1-G adds the header in the handler. 2. [Green-on-commit] Accept-list cases (P3, lefarcen @ test.ts:262). PR body decision §7 promises `inline=true/yes/on` case-insensitive, but round-1 tests only exercised inline=1. Pin the full accept list (true / yes / on + TRUE / Yes / ON). Already passes — the route's parser already implements the accept list; this just makes the contract testable. 3. [Green-on-commit] isSafeId guard (P3, lefarcen @ test.ts:287). Previous `..` test was normalized by Express before reaching the route. New input uses `bad!id` (URL-safe, but outside isSafeId's /^[A-Za-z0-9._-]+$/ char class), so Express passes it into req.params unchanged and isSafeId rejects with the documented 400 BAD_REQUEST envelope. Phase G1-R / H of plan declarative-roaming-gosling.md. Refs nexu-io/open-design#1312 review comments. feat(daemon): send Content-Security-Policy: sandbox allow-scripts on /export Closes the same-origin XSS surface lefarcen flagged on PR #1312 (P2 at import-export-routes.ts:423): top-level browser navigation to the export URL sends no Origin header, so the daemon's /api middleware admits the request and any JS in the exported document executes with daemon-origin privileges (cookies, /api/, localStorage). 
`Content-Security-Policy: sandbox allow-scripts` on the response makes the browser treat the document as a sandboxed iframe with an opaque origin. Scripts still execute (necessary for the screenshot use case — the whole point of inlining JS), but they cannot read cookies, hit /api/, or otherwise escalate to the daemon's origin. Phase G1-G of plan declarative-roaming-gosling.md. Daemon delta: +3 tests (the Red CSP test from `58151356` turns Green; the P3 coverage gap tests stay green). Refs nexu-io/open-design#1312. * test(daemon): add Red regression for <link> stylesheet attr preservation Currently `<link rel="stylesheet" href="print.css" media="print">` becomes a plain `<style data-od-inline-asset="print.css">…</style>` with no media query — print-only styles apply unconditionally. Same problem for `title` (alternate stylesheet sets), `disabled` (initial disabled state), and `nonce` (CSP nonce). All four are valid on both `<link rel=stylesheet>` and `<style>` per HTML spec, so the inliner must carry them across. PR #1312 round-2 review (lefarcen P2 @ inline-assets.ts:44). Phase G2-R; G2-G will extend buildInlineStyleBlock to copy the four attrs off the source <link>. Refs nexu-io/open-design#1312. * feat(daemon): preserve <link> stylesheet semantics on inlined <style> Closes lefarcen's P2 review note on PR #1312 (inline-assets.ts:44): `<link rel="stylesheet" href="print.css" media="print">` was becoming a plain <style> with no media query, so print-only styles applied unconditionally. Same issue for `title` (alternate stylesheet sets), `disabled` (initial disabled state), and `nonce` (CSP nonce). buildInlineStyleBlock now carries four attrs across from the source <link>: - media, title, nonce (value attrs, HTML-escaped via escapeHtmlAttr) - disabled (boolean attr — copied as bare presence) Other <link> attrs (rel, href, type, crossorigin, integrity, referrerpolicy) don't apply to <style> and are intentionally dropped. 
New `hasBooleanHtmlAttr` helper distinguishes presence-as-attr from substring-inside-another-attr-value via a regex that requires a word boundary after the name (whitespace, `=`, or `>`). Phase G2-G of plan declarative-roaming-gosling.md. All 28 tests pass. Refs nexu-io/open-design#1312. * docs(daemon): narrow inliner contract claim + document size-limit policy Closes lefarcen's P2 review notes on PR #1312: 1. "Self-contained" incomplete (inline-assets.ts:67): the helper only rewrites top-level <link rel=stylesheet> / <script src>. `<img src>`, CSS `url(...)`, CSS `@import`, ES module imports, font sources, and similar remain external in the response. The PR title/body claimed "self-contained HTML" which over-promised for screenshot tooling expecting bundled images/fonts. Module docstring now enumerates the full not-rewritten list and names the screenshot path as the primary use case (headless browser fetches each external asset on render, so inline-CSS- and-JS-only is sufficient). The route handler comment block mirrors the contract. A fully offline export with image/font bundling is filed as a follow-up — out of scope for this PR. 2. No response cap (inline-assets.ts:72): the helper does concurrent reads + multiple string copies and could spike daemon memory. The daemon is local-first (single-user, developer's machine — see open_design_architecture.md), so the effective ceiling is the size of the user's own project. The docstring now states this rationale and names the conditions under which a bounded-concurrency reader and output-size limit would be needed (non-trusted callers). Docs-only — no behavior change, all 28 tests still pass. Refs nexu-io/open-design#1312. 
* test(daemon): add Red regression for hasBooleanHtmlAttr quoted-value match PR #1312 round-2 review (lefarcen P3): `hasBooleanHtmlAttr` tests the tag string with no attr-quoting awareness, so the literal text `disabled` appearing inside any quoted attribute value followed by another whitespace char satisfies `\sdisabled(?=\s|=|/?>)`. <link rel=stylesheet href=x.css data-note="content disabled stuff"> emits a <style disabled> block, silently disabling a stylesheet the author wrote without that attr. Also adds a counterweight test for the legitimate-disabled case (<link … disabled>) so the next-commit fix doesn't over-correct and start dropping real boolean attrs. Phase I3-R of plan declarative-roaming-gosling.md (post-PR-#1312 round-2 review). I3-G will strip quoted attribute values from the tag string before testing for the bare attr. Refs nexu-io/open-design#1312. * feat(daemon): make hasBooleanHtmlAttr quote-aware to avoid false positives Closes lefarcen's P3 review note on PR #1312: `hasBooleanHtmlAttr` previously ran `\sname(?=\s|=|/?>)` over the full tag string, so the literal text `disabled` appearing inside any quoted attribute value followed by whitespace satisfied the regex. Source tags like `<link rel=stylesheet href=x.css data-note="content disabled stuff">` were emitting a <style disabled> block, silently disabling a stylesheet the author wrote without that attr. Fix: strip `="…"` and `='…'` substrings out of the tag with two regex passes BEFORE testing for the bare attr. The lookahead still requires `\s|=|/?>` after the attr name, so `<link disabled>`, `<link disabled="">`, `<link disabled/>`, etc. all match, but the attr name as a substring of any quoted value cannot match because values have been stripped to `""` / `''`. Phase I3-G of plan declarative-roaming-gosling.md. All 30 tests green (28 prior + 2 round-3 regression cases: false-positive and legitimate-disabled). Refs nexu-io/open-design#1312. 
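The quote-aware check described above can be sketched as follows. The function name is changed to `hasBooleanAttrSketch` to make clear this is an illustration rather than the project's helper, and it assumes `name` is a plain attribute name with no regex metacharacters.

```typescript
function hasBooleanAttrSketch(tag: string, name: string): boolean {
  // Two passes empty out quoted values so their contents cannot match.
  const withoutValues = tag
    .replace(/="[^"]*"/g, '=""')
    .replace(/='[^']*'/g, "=''");

  // Attribute name preceded by whitespace and followed by
  // whitespace, "=", ">" or "/>".
  const bareAttr = new RegExp(`\\s${name}(?=\\s|=|/?>)`, "i");
  return bareAttr.test(withoutValues);
}
```

With this shape, `data-note="content disabled stuff"` collapses to `data-note=""` before the test runs, while `<link disabled>`, `<link disabled="">`, and `<link disabled/>` still match.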
* test(daemon): add Red cap-enforcement tests + scaffold InlineOptions
PR #1312 round-2 review (lefarcen P2, still open): round-2 only documented that no cap is enforced. Reviewer pushed back: the helper still builds unbounded candidate arrays + runs Promise.all over all asset reads + concatenates the full output in memory. Need actual limits in code.
This commit adds the Red test surface that drives the next commit's enforcement:
- InlineAssetsLimitError("owner") when owner HTML > maxOwnerBytes
- InlineAssetsLimitError("candidates") when tag matches > maxCandidates
- Per-asset graceful: oversized asset → tag stays as URL ref
- InlineAssetsLimitError("total") when assembled output > maxTotalBytes
- Bounded read concurrency: peak in-flight reads ≤ maxReadConcurrency
- Integration: route maps the throw to 413 PAYLOAD_TOO_LARGE
InlineOptions interface is added to the helper signature as a no-op test-door (per feedback_test_doors_over_fake_timers.md), so tests can exercise tiny fixtures while production callers use module-level defaults. The next commit (H3-G) wires the enforcement.
Phase H3-R of plan declarative-roaming-gosling.md. Daemon delta on this commit: +6 tests (5 unit + 1 integration), all Red. Refs nexu-io/open-design#1312.
* feat(daemon): enforce inliner caps + map limit errors to 413 PAYLOAD_TOO_LARGE
Closes lefarcen's still-open P2 review on PR #1312 round 2 ("the code still builds unbounded candidate arrays + Promise.all over all asset reads + concatenates the full output in memory"). Caps are now enforced in code with the documented defaults:
MAX_INLINE_OWNER_BYTES = 2 MiB
MAX_INLINE_ASSET_BYTES = 5 MiB per sibling
MAX_INLINE_CANDIDATES = 500 link/script matches
MAX_INLINE_TOTAL_BYTES = 50 MiB assembled output
MAX_INLINE_READ_CONCURRENCY = 8 simultaneous fileReader calls
Enforcement points:
- Owner cap (input): fires immediately at function entry. Cheap: Buffer.byteLength of the already-decoded UTF-8 string.
- Candidate cap (planning): fires after matchAll, BEFORE any sibling read. Pathological HTML with thousands of <link>/<script src> tags is rejected without opening a single file descriptor.
- Asset cap (per-sibling): post-read length check; oversized assets return null from the wrapped reader, so the tag stays as a URL ref and the response is still 200. This is the only "graceful" cap: one bad asset doesn't fail the whole export.
- Total cap (output): tracked across the slice-and-concat loop, guarding both preserved-html slices AND injected replacements.
- Concurrency cap (planning): a tiny in-module runWithConcurrency worker-pool keeps at most maxReadConcurrency fileReader calls in flight, with order-preserving results.
`InlineAssetsLimitError` carries a `limit` discriminator so logs and clients can disambiguate owner/asset/candidates/total. The route handler catches it and emits 413 PAYLOAD_TOO_LARGE.
Drive-by error-envelope fix while in the route: UNSUPPORTED_FILE_TYPE (an unregistered ApiErrorCode) → UNSUPPORTED_MEDIA_TYPE (the canonical code) with HTTP 415. The round-1 string was a slip; caught by reading packages/contracts/src/errors.ts:11 while wiring PAYLOAD_TOO_LARGE.
Phase H3-G of plan declarative-roaming-gosling.md. All 36 tests green (28 prior + 2 round-3 quoted-attr + 5 cap unit + 1 cap integration). Refs nexu-io/open-design#1312.
* feat(daemon): enforce inliner caps pre-buffer via AssetHandle contract
Closes lefarcen's still-open P2 review on PR #1312 round 3 ("the helper enforces maxTotalBytes only after all candidate assets have already been read and converted to replacement strings" / "maxAssetBytes is checked after fileReader fully buffers each sibling"). Round-3 caps were defensive against the final output size but did not bound peak memory during read fanout: 500 assets at 5 MiB each could materialize ~2.5 GiB before the 413 fired.
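The concurrency cap in the enforcement points above names an in-module `runWithConcurrency` worker pool with order-preserving results. One way such a pool could look is sketched below. Only the name and the two properties (bounded in-flight calls, ordered results) come from the commit message; the signature is an assumption.

```typescript
// Hypothetical signature; the daemon's actual helper may differ.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  // Each lane pulls the next unclaimed index until the list is exhausted,
  // so at most `laneCount` worker calls are in flight at any moment.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i); // write by index keeps order
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Writing each result into its original index is what keeps the output aligned with the input tags even though reads may finish out of order.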
Contract change: InlineAssetReader now returns `AssetHandle | null` where AssetHandle is `{ readonly size: number; read(): Promise<...> }`. Callers expose `size` from a cheap stat-equivalent (the route uses `resolveProjectFilePath`) and defer the full materialization to `read()`. The helper checks size against maxAssetBytes BEFORE invoking read, and against the running total BEFORE the reservation is committed.
Enforcement flow inside runWithConcurrency:
1. await fileReader(p.resolved) → cheap stat-only call
2. if (handle.size > maxAssetBytes) return null ← pre-buffer
3. if (runningBytes + handle.size > maxTotalBytes) ← pre-buffer
   totalAborted = true; return null
4. runningBytes += handle.size ← reserve
5. await handle.read() ← only now
6. if (read returned null) runningBytes -= refund
`totalAborted` is a shared flag the workers check at entry, so once the running total hits the cap, no new reads start. With maxReadConcurrency = 8, at most ~8 stat-side calls finish after abort, so peak memory stays bounded. The concat-time guard stays as the exact final assertion (the pre-buffer reservation is approximate: it counts the original tag bytes and skips wrapper overhead).
Route closure updated to do `resolveProjectFilePath` first, then `readProjectFile` inside the deferred `read()`. Test reader helpers (`readerFrom` + the concurrency-test reader) updated to the new shape.
Two new unit tests pin the pre-buffer semantics:
- `maxAssetBytes` is checked via handle.size BEFORE handle.read() (the reader's `read()` throws and must never run)
- Running total abort stops further reads once exceeded (counting reader observes ≤ 2 reads when cap should fire after the first)
Phase K of plan declarative-roaming-gosling.md (post-PR-#1312 round-3 review). All 38 tests green (36 prior + 2 round-4 pre-buffer cases). Refs nexu-io/open-design#1312.
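The six-step enforcement flow above can be sketched for a single asset as follows. This is an illustration under assumptions, not the helper itself: the commit elides the return type of `read()`, so `Promise<string | null>` is a guess, and `Budget` and `readWithinBudget` are hypothetical names. The real helper runs this logic inside its worker pool; the sketch takes one path at a time.

```typescript
interface AssetHandle {
  readonly size: number;
  read(): Promise<string | null>; // assumed; the commit writes Promise<...>
}
type InlineAssetReader = (path: string) => Promise<AssetHandle | null>;

// Hypothetical shared state, mirroring runningBytes / totalAborted in the text.
interface Budget {
  runningBytes: number;
  totalAborted: boolean;
}

async function readWithinBudget(
  path: string,
  fileReader: InlineAssetReader,
  budget: Budget,
  maxAssetBytes: number,
  maxTotalBytes: number,
): Promise<string | null> {
  if (budget.totalAborted) return null; // checked at entry: no new reads after abort
  const handle = await fileReader(path); // 1. cheap stat-only call
  if (handle === null) return null;
  if (handle.size > maxAssetBytes) return null; // 2. per-asset cap, pre-buffer
  if (budget.runningBytes + handle.size > maxTotalBytes) {
    budget.totalAborted = true; // 3. total cap, pre-buffer
    return null;
  }
  budget.runningBytes += handle.size; // 4. reserve before reading
  const content = await handle.read(); // 5. only now materialize the bytes
  if (content === null) budget.runningBytes -= handle.size; // 6. refund
  return content;
}
```

A `null` result means the tag stays as a URL reference; the caller decides after all workers settle whether `totalAborted` should become a limit error.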
* test(daemon): add Red test pinning owner pre-buffer 413 before mime 415 PR #1312 round-5 (lefarcen P2): the route currently reads the owner file with readProjectFile() before any size check, so a 100 MiB owner HTML is fully buffered into memory before the helper's ownerBytes check fires. The fix is to stat with resolveProjectFilePath first, reject pre-buffer with 413 PAYLOAD_TOO_LARGE on oversize, then fold in the mime check (still 415 on mismatch, now pre-buffer), then readProjectFile when both gates pass. The Red→Green discriminator is the combination 'oversize AND non-HTML': pre-fix the route reads the buffer first and the text/plain mime check fires → 415; post-fix the route stats first and the size check fires before the mime check → 413. Asserting 'got 413, not 415' pins both the pre-buffer property and the check ordering (size before mime, per lefarcen's locked round-5 sequence). 2 MiB+1 byte fixture is acceptable in test setup; MAX_INLINE_OWNER_BYTES is the production 2 MiB so no test-door is needed. Red verified: AssertionError: expected 415 to be 413 (pre-fix flow reads → mime → 415). * feat(daemon): stat owner before readProjectFile in /export route to bound owner pre-buffer PR #1312 round-5 (lefarcen P2 confirmed at PR-1312#issuecomment-4424868413 follow-up): the route previously called readProjectFile() unconditionally on the owner, so a 100 MiB owner HTML was fully buffered into memory before the helper's ownerBytes check fired with InlineAssetsLimitError ('owner'). That meant the 413 envelope returned to the caller but only after peak memory had already hit the file size. 
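The check ordering that the Red test pins (size before mime, both before any read) can be illustrated with a small pure function. `gateOwner` and `OwnerStat` are hypothetical names, and the assumption that the stat step yields both a size and a mime type is mine. Only the 2 MiB constant, the status codes and the error codes come from the commit messages.

```typescript
const MAX_INLINE_OWNER_BYTES = 2 * 1024 * 1024; // 2 MiB, per the documented default

// Hypothetical shape of what a cheap stat step might report.
interface OwnerStat {
  size: number;
  mime: string;
}

type OwnerGate =
  | { ok: true }
  | { ok: false; status: 413; code: "PAYLOAD_TOO_LARGE" }
  | { ok: false; status: 415; code: "UNSUPPORTED_MEDIA_TYPE" };

function gateOwner(stat: OwnerStat): OwnerGate {
  // Size first: an oversize non-HTML file must yield 413, not 415.
  if (stat.size > MAX_INLINE_OWNER_BYTES) {
    return { ok: false, status: 413, code: "PAYLOAD_TOO_LARGE" };
  }
  // Mime second, still before any bytes are buffered.
  if (!/^text\/html\b/.test(stat.mime)) {
    return { ok: false, status: 415, code: "UNSUPPORTED_MEDIA_TYPE" };
  }
  return { ok: true }; // only now would the route read the file
}
```

The "oversize AND non-HTML" fixture is the only input where the two orderings disagree, which is why the test uses it as the discriminator.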
Fix mirrors the sibling-asset stat-then-read contract round 4 added via the AssetHandle interface: call resolveProjectFilePath first (cheap stat), reject pre-buffer with 413 PAYLOAD_TOO_LARGE on size > MAX_INLINE_OWNER_BYTES, fold in the mime check (still 415 UNSUPPORTED_MEDIA_TYPE on mismatch, now also pre-buffer per lefarcen's 'fold-in is welcome'), then readProjectFile() only when both gates pass. Size check fires before mime check, so an oversize non-HTML file returns 413 rather than 415 — the observable Red→Green discriminator for this round. The helper's ownerBytes check (inline-assets.ts:127-133) stays as defense-in-depth for direct in-process callers that skip the route and for any drift between stat-reported size and the bytes returned by readFile. Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts ('returns 413 (not 415) for an oversize non-HTML file'). Daemon suite 1743/1743 passing. * test(daemon): add Red test pinning stat-vs-actual byte reconciliation PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413 follow-up): the helper trusts handle.size for the running-total guard and never reconciles with the actual byte length of content unless the per-asset cap is exceeded. A reader that under-reports size (stale stat, UTF-8 expansion at decode, sparse file, deliberate lie) can let many strings materialize in memory before the concat-time guard at the bottom of inlineRelativeAssets throws — defeating the round-4 pre-buffer cap intent. Fix is lefarcen-confirmed path-a: post-read, the helper computes actualBytes = Buffer.byteLength(content, 'utf8'), reconciles runningBytes (add actualBytes, refund handle.size), and if running total exceeds maxTotalBytes flips totalAborted = true and returns null. Subsequent workers see totalAborted before invoking their own read(). 
Helper still throws InlineAssetsLimitError('total') after Promise.all settles — preserving the round-2/3/4 graceful-fallback pattern instead of racing throws across in-flight workers. Red→Green discriminator is read count. Pre-fix the helper trusts the lying handle.size (10), so both reads complete (each returning 1000 bytes) under the reservation total of 56+10+10=76 < cap 500. The concat-time guard then catches the 2000+-byte assembly and throws 'total' — but only after both reads materialized in memory. Post-fix worker 1's reconciliation trips totalAborted as soon as actualBytes (1000) is folded into runningBytes; worker 2 skips its read. Red verified: AssertionError expected 1, received 2 (pre-fix flow completes both reads before concat-guard fires). * feat(daemon): reconcile inliner reservation with post-read actual bytes PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413 follow-up, path-a): the helper trusted handle.size for the running- total guard and only reconciled with actual bytes for the per-asset cap. A reader that under-reported size — stale stat, UTF-8 decode expansion at read time, sparse file, deliberate lie — could let many strings materialize before the concat-time guard at the bottom of inlineRelativeAssets caught the excess. That defeated the round-4 pre-buffer cap intent. Fix: after a successful read(), compute actualBytes = Buffer.byteLength(content, 'utf8'), reconcile runningBytes by folding in (actualBytes - handle.size), and re-check the total cap. If the reconciliation pushes runningBytes past maxTotalBytes, drop the asset's inlining (tag stays as URL ref), set totalAborted = true to block subsequent worker reads, and let Promise.all settle. The helper then throws InlineAssetsLimitError('total') below — matching the round-2/3/4 graceful-fallback pattern (no throw-before-settle race between in-flight workers). 
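The reconciliation step can be shown as plain arithmetic on the running total. In this sketch `Budget` and `reconcileAfterRead` are hypothetical names, and `actualBytes` is passed in rather than computed (the commit computes it with `Buffer.byteLength(content, 'utf8')`). The numbers in the usage note are the ones from the Red test description.

```typescript
// Hypothetical shared state, mirroring runningBytes / totalAborted in the text.
interface Budget {
  runningBytes: number;
  totalAborted: boolean;
}

// Returns true if the asset may still be inlined, false if it must be
// dropped (tag stays as a URL ref) because the corrected total is over the cap.
function reconcileAfterRead(
  budget: Budget,
  reservedSize: number, // handle.size, already added to runningBytes
  actualBytes: number, // byte length of the content that was really read
  maxTotalBytes: number,
): boolean {
  // Fold in the difference between what was reserved and what arrived.
  budget.runningBytes += actualBytes - reservedSize;
  if (budget.runningBytes > maxTotalBytes) {
    budget.totalAborted = true; // later workers skip their read()
    return false;
  }
  return true;
}
```

With the test's figures, the first worker has reserved 10 bytes on top of 56, so `runningBytes` is 66. Folding in an actual size of 1000 gives 66 + (1000 - 10) = 1056, which exceeds the cap of 500, so the flag flips before the second worker reads anything.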
The per-asset cap check at line 228 is preserved for stat-lying readers that blow a single asset past maxAssetBytes; that branch refunds handle.size and drops without flipping totalAborted, so sibling assets still get a fair shot. Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts ('reconciles handle.size with actual content bytes'). Daemon suite 1744/1744 passing. --------- Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> * fix: truncate long template names on project cards (#1220) (#1302) Add min-width: 0 to .design-card-name so text-overflow: ellipsis works correctly in flex layouts. Long template names were pushing the task execution status (Running, Failed, etc.) out of view on project cards. Closes #1220 Co-authored-by: laomo <laomo@openclaw.ai> * fix(desktop): swallow setTypeOfService EINVAL crashes in dev main (#647) (#1298) * fix(desktop): swallow harmless setTypeOfService EINVAL crashes in dev main The packaged Electron entry (apps/packaged/src/logging.ts) already filters the undici "setTypeOfService EINVAL" crash that issue #895 introduced for the prod build, but the dev / source-built desktop entry was missing the parallel guard. Result: switching settings tabs in a from-source desktop run could fire a fresh fetch, undici would try to set IP_TOS on the outbound socket, the kernel would refuse on certain macOS / VPN configurations, and the rejection bubbled to Electron's default handler as the "JavaScript error in the main process" dialog reported in issue #647. Add the same defensive filter to apps/desktop: - isHarmlessSocketOptionError matches only the canonical undici shape (syscall name AND EINVAL code). A contradicting code (EACCES, EPERM, etc) explicitly fails the match so real bugs don't get hidden. - The uncaughtException handler logs harmless cases at warn and returns silently. 
For anything else it removes itself from the listener list and re-throws via setImmediate, restoring Node's default crash path so Electron's native dialog renders exactly as it would without this filter. - unhandledRejection mirrors the same harmless / fall-through split. The filter is installed BEFORE app.whenReady so it is armed by the time the renderer fires its first fetch. The helper is duplicated rather than imported from apps/packaged because AGENTS.md forbids cross-app private-source imports. The file header calls out the parallel and notes that the two copies should stay in sync until the helper is promoted to a shared workspace package (follow-up); the contract is identical so a regression in one will surface in the other's test suite. Tests in apps/desktop/tests/main/uncaught-exception.test.ts mirror apps/packaged/tests/logging.test.ts: 8 cases pinning the matcher shape, 2 cases pinning the handler's harmless-log-warn vs fall-through-rethrow split. Validated: pnpm guard, pnpm --filter @open-design/desktop typecheck, pnpm --filter @open-design/desktop build, and pnpm --filter @open-design/desktop test (14 passed, 10 new). * fix(desktop,packaged): fail-fast on non-harmless unhandled rejections The previous unhandledRejection listeners logged non-harmless reasons and returned, which kept the main process alive after any rejected promise. A real bug, a failed IPC registration, or any unexpected async exception was reduced to a console line instead of surfacing through Node/Electron's default crash path the filter was meant to preserve. Both copies now route non-harmless rejections through a parallel factory (createDesktopUnhandledRejectionHandler / createFatalUnhandledRejectionHandler) that mirrors the uncaughtException policy: harmless setTypeOfService EINVAL shapes log at warn and return, anything else logs at error, removes the listener, and re-throws via setImmediate. 
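The matcher and the harmless / fall-through split described above might be sketched as below. The function name `isHarmlessSocketOptionError` is from the commit message. The assumption that the error carries `syscall` and `code` string fields, and the `classifyFailure` helper with its return labels, are illustrative and not taken from the source files.

```typescript
// Assumed error shape: Node system errors usually expose `syscall` and `code`.
function isHarmlessSocketOptionError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { syscall?: unknown; code?: unknown };
  // Both parts must agree: a contradicting code (EACCES, EPERM, ...) fails.
  return e.syscall === "setTypeOfService" && e.code === "EINVAL";
}

// Hypothetical policy helper: what a handler would do with a given reason.
type FailurePolicy = "log-warn-and-continue" | "detach-and-rethrow";

function classifyFailure(reason: unknown): FailurePolicy {
  return isHarmlessSocketOptionError(reason)
    ? "log-warn-and-continue"
    : "detach-and-rethrow"; // includes primitives such as Promise.reject(42)
}
```

Keeping the decision in a pure function like this makes the split easy to pin in unit tests without installing real process listeners.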
Listener removal happens before the scheduled throw, so the rethrown reason lands in the uncaughtException path with no recursion. Tests cover the harmless branch, the detach + ordered rethrow, and non-Error / primitive rejection reasons (Promise.reject(42)) which must fall through. Desktop suite: 13/13, packaged suite: 16/16. Flagged on PR #1298 by Siri-Ray and the codex P2 review thread; the two file copies stay in lockstep per the AGENTS.md sync invariant. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com> * feature: refine assistant artifact feedback (#1379) * feature: refine assistant artifact feedback * fix: clear hidden custom feedback reason * test: update assistant feedback expectations * fix: support object-style question-form options (#1293) * fix: support object-style question-form options * fix: preserve stable option values in form submissions * fix(daemon/acp): terminate ACP child after clean prompt completion (#1286) * fix(daemon/acp): terminate ACP child after clean prompt completion (Bug B / #1265) Some ACP agents (notably Devin for Terminal) keep the child process alive after stdin closes, waiting for the next prompt. Open Design spawns a fresh agent per chat turn and relies on child.on('close') to finalize the run, so without an explicit signal-driven shutdown the chat sits stuck in the 'working' state indefinitely. Three small, targeted changes: - apps/daemon/src/acp.ts: After a clean session/prompt response we schedule a 500ms grace period and then SIGTERM the child. This mirrors the pattern detectAcpModels() already uses after model discovery. The grace period leaves well-behaved agents that exit on stdin.end() unaffected. - apps/daemon/src/acp.ts: New completedSuccessfully() method on the session handle reports whether the prompt resolved without a fatal error or abort, so the consumer can distinguish 'clean signal exit' from 'genuine signal failure'. 
- apps/daemon/src/server.ts: child.on('close') now treats a SIGTERM exit as 'succeeded' when acpSession.completedSuccessfully() is true. - apps/web/src/providers/daemon.ts: Trust the server's authoritative endStatus; the signal/non-zero-code safety net no longer overrides an explicit 'succeeded' status, so the chat doesn't surface a fake 'agent exited with signal SIGTERM' error after a clean ACP run. Daemon tests cover the SIGTERM grace timer, clean early-exit (timer cleared), and completedSuccessfully() abort/error states. Manual UI test on plain main + this fix confirms Devin chats now return to ready automatically after Done · ... * fix(daemon/connectionTest): treat ACP clean SIGTERM as success Codex review on #1286 caught that the new SIGTERM in attachAcpSession breaks ACP connection tests for agents that don't shut down on stdin.end() (the exact Devin behavior the patch targets). attachAgentStreamHandlers() in connectionTest.ts now also respects acpSession.completedSuccessfully(), mirroring the same check we apply in server.ts. Without this, a clean prompt response followed by our SIGTERM would set winner.signal === 'SIGTERM', flip exitedCleanly to false, and the connection test would report 'agent_spawn_failed' even when the agent had returned a healthy response. Also widened the AgentSpawnHandle type so completedSuccessfully is visible on the structural type used inside connectionTest.ts. All 56 daemon tests still pass; typecheck + guard clean. * fix(daemon/acp): narrow ACP success-on-signal override to forced-SIGTERM Looper review on #1286 caught that the success predicate was broader than the SIGTERM case it was meant to handle. `completedSuccessfully()` flips to true as soon as the ACP `session/prompt` response is processed, but it does not say why the child later closed. 
With the broad predicate, an ACP agent that returned a prompt result and then exited with code 1 (or was killed by SIGKILL/SIGSEGV) was still marked 'succeeded', regressing the existing close-status behavior for genuine post-response process failures. Scope the override to the exact forced-shutdown shape this PR introduces: code === null && signal === 'SIGTERM' && acpCleanCompletion Applied to both `server.ts` (chat run finalization) and `connectionTest.ts` (connection-test classification). Any other post-response failure now falls through to 'failed' / 'agent_spawn_failed' as before. All 59 daemon tests still pass; typecheck + guard clean. * fix(web/daemon): only bypass exit-code safety net on explicit server success Looper review on #1286 caught that the previous web change trusted `endStatus === 'succeeded'` absolutely, but `endStatus` can become 'succeeded' in two distinct ways: 1. The SSE end event explicitly carries `status: 'succeeded'` (authoritative server declaration). 2. The end event omits or has an invalid `status` field and the handler silently falls back to 'succeeded' as a local default. Both produced `endStatus === 'succeeded'` in the existing code, so the new safety-net bypass treated them identically. That regressed backward compat: a compatible or older daemon emitting an end event like `{code:1}` or `{code:null,signal:"SIGTERM"}` with no `status` would suddenly skip the failure banner. Track explicit success separately via `serverDeclaredSuccess`, set true only when: - The SSE end event has `status === 'succeeded'`, or - The fallback `fetchChatRunStatus` REST path returns `status === 'succeeded'` (which the existing `isChatRunStatus()` guard already proves is explicit). The safety net is now bypassed only on that explicit signal; the local-fallback success path still reaches the exit-code/signal check so real failures surface as before. 
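The two narrowed checks from these review rounds can be sketched side by side. The daemon predicate and the error strings are quoted from the commit messages. The function names, the `EndEvent` shape and the `EndOutcome` type are hypothetical, and handling of explicit non-success statuses is left out of this sketch.

```typescript
// Daemon side: only the forced-shutdown shape this PR introduces is a success.
function isForcedCleanShutdown(
  code: number | null,
  signal: string | null,
  acpCleanCompletion: boolean,
): boolean {
  return code === null && signal === "SIGTERM" && acpCleanCompletion;
}

// Web side: hypothetical shape of the SSE end event payload.
interface EndEvent {
  status?: unknown;
  code?: number | null;
  signal?: string | null;
}
type EndOutcome = { kind: "done" } | { kind: "error"; message: string };

function resolveEndEvent(ev: EndEvent): EndOutcome {
  // Explicit success from the server bypasses the safety net.
  const serverDeclaredSuccess = ev.status === "succeeded";
  if (serverDeclaredSuccess) return { kind: "done" };

  // Missing or invalid status: the exit-code/signal safety net still applies.
  if (typeof ev.signal === "string" && ev.signal !== "") {
    return { kind: "error", message: `agent exited with signal ${ev.signal}` };
  }
  if (typeof ev.code === "number" && ev.code !== 0) {
    return { kind: "error", message: `agent exited with code ${ev.code}` };
  }
  return { kind: "done" }; // local default when nothing indicates failure
}
```

The point of tracking the explicit flag separately is that an older daemon sending `{code:1}` with no `status` still reaches the failure branch.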
Adds three web-side regression tests in `apps/web/tests/providers/sse.test.ts`: - Explicit `status: 'succeeded'` + SIGTERM → onDone called, no error - End event with `{code:1}` and no `status` → onError surfaces 'agent exited with code 1' as before - End event with `{code:null,signal:'SIGTERM'}` and no `status` → onError surfaces 'agent exited with signal SIGTERM' as before `pnpm guard` + daemon typecheck clean; 27/27 SSE tests pass (up from 24). * Fix Codex wrapper launch paths (#1395) * test: add Memory and Routines coverage (#1400) * test: align extended Playwright coverage with current UI behavior * test: address extended suite review feedback * test: fix Codex fallback config hydration in e2e * test: add Memory and Routines coverage * test: fix Memory and Routines component test typing * test: include Memory and Routines e2e in extended suite * refactor(settings): use tiled language picker instead of dropdown (#1406) The Language section in Settings rendered a single-button dropdown trigger that opened a floating menu. With one visible label and lots of empty panel space, the layout misled users into thinking only one language existed. Replace the dropdown trigger + portaled menu with an inline tile grid that shows every locale at a glance and clicks directly to switch. Side effects of the new layout: the languageOpen / languageMenuRect state, the dynamic placement effect, the resize-close effect, the mousedown click-outside handler, and the languageRef are gone. The global Escape handler no longer needs to guard against the menu being open. CSS for .settings-language-picker, .settings-language-button, .settings-language-menu, and .settings-language-option is replaced by .settings-language-grid (auto-fill 180px minmax columns) + .settings-language-tile. Tests in SettingsDialog.execution.test.tsx that drove the dropdown (click trigger → click menuitemradio → assert menu closed) are rewritten to drive the tiles directly via the radio role. 
Refs #1347 * fix(web): restore consistent app header layout * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): restore consistent app header layout Generated-By: looper 0.7.2 (runner=fixer, agent=opencode) * fix(web): hide project output chips in header --------- Co-authored-by: Prantik Medhi <140103052+prantikmedhi@users.noreply.github.com> Co-authored-by: 이용진 <90879448+Leesin0222@users.noreply.github.com> Co-authored-by: Nicholas-Xiong <2482929840@qq.com> Co-authored-by: Hesam <chngyzkhanwhsht@gmail.com> Co-authored-by: Yuhao Chen <godcorn001@outlook.com> Co-authored-by: chaoxiaoche <fanzhen910412@gmail.com> Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: eggward han <32223217+Eggwardhan@users.noreply.github.com> Co-authored-by: @aaronjmars <61592645+aaronjmars@users.noreply.github.com> Co-authored-by: Bryan <121247296+bankielewicz@users.noreply.github.com> Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> Co-authored-by: mrzhangkris <92247501+mrzhangkris@users.noreply.github.com> Co-authored-by: laomo <laomo@openclaw.ai> Co-authored-by: Nagendhra Madishetti <nagendhra.madishetti24@gmail.com> Co-authored-by: Nagendhra <nagendhra405@gmail.com> Co-authored-by: Mason <jinmeihong0201@gmail.com> Co-authored-by: Yiang Yiyan <15089131836@163.com> Co-authored-by: Rocky <101849785+MrRockySL@users.noreply.github.com> Co-authored-by: nettee <nettee.liu@gmail.com> Co-authored-by: shangxinyu1 <shangxinyu@refly.ai> Co-authored-by: Matt Van Horn <mvanhorn@users.noreply.github.com>	2026-05-12 23:15:46 +08:00
lefarcen	e1bc83a476	feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428 ) * feat(analytics): scaffold PostHog product-analytics integration - Add @open-design/contracts/analytics subpath with the 17 P0 event payload types, header constants, and code↔CSV enum mapping helpers. - Add apps/daemon/src/analytics.ts with env-gated posthog-node client, request-scoped analytics context reader, and artifact-id anonymizer. - Expose GET /api/analytics/config so the web bundle never embeds the PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST. - Add apps/web/src/analytics module (identity + lazy posthog-js client + React provider) and mount it under <I18nProvider> in app/layout. No event wiring yet — that lands in the next commit alongside trigger points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer, runs.ts). * feat(analytics): wire app_launch, home_view, home_click, project_create_result - App.tsx: fire app_launch once after first effect tick. handleCreateProject now emits project_create_result on both success and failure paths. - EntryView.tsx: home_view (page) gated on agents loading so has_available_cli isn't transiently false; home_view (asset_panel) fires per top-tab change with the right result_count. - NewProjectPanel.tsx: home_click create_button fires before delegating to the parent; a fresh request_id is generated here and threaded through onCreate so the matching project_create_result stitches via $insert_id. - contracts/analytics: tighten createTabToTracking and topTabToTracking for the worktree branch's renamed tabs (live-artifact, templates). * feat(analytics): wire settings_view + 3 settings_click events - settings_view fires on dialog mount and on every section switch, carrying the active section (mapped via settingsSectionToTracking for the 16-section worktree layout), execution_mode, and the selected CLI provider id when present. 
- settings_click execution_mode_tab: setMode now emits before/after values whenever the user toggles between Local CLI and BYOK. - settings_click cli_provider_card: agent card onClick reports cli_provider_id via agentIdToTracking (kiro → other). - settings_click byok_field: onFocus added to api_key, model select, and base_url inputs; provider_id widened to include google so the worktree's Gemini protocol slot type-checks. * feat(analytics): wire studio_view + studio_click chat, studio_view artifact - packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper produces a 16-hex anonymized id for (projectId, fileName). Stable cross-platform so the daemon and the web bundle resolve the same id without a Web Crypto round-trip; daemon now re-exports it. - ChatComposer: studio_view chat_panel fires once per project mount, studio_click chat_composer fires on attachment + send buttons with estimated user_query_tokens (length/4) and has_attachment. - FileViewer: studio_view artifact fires once per (project, file) at the dispatcher level, before any sub-viewer renders, with artifact_kind derived from the renderer registry / file.kind table. - Widen TrackingExportFormat to include markdown and cloudflare_pages so the worktree branch's full share menu can emit verbatim. * feat(analytics): wire studio_click share_option + artifact_export_result HtmlViewer's share menu now emits both events per click via a fireShareExport helper: - studio_click share_option fires immediately on click with the chosen export_format and a fresh request_id. - artifact_export_result fires when the export resolves — success for sync exporters (html, markdown, template) the moment the call returns, success/failed for async exporters (pdf, zip, deploy) via .then/.catch. The same request_id threads both events so PostHog stitches click → result via $insert_id. DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages slots; markdown is now a first-class export_format value. 
Also ignore .env.local so local POSTHOG_KEY / .env-style secrets don't get committed.
* feat(analytics): emit run_created and run_finished from the daemon
POST /api/runs now reads the analytics context off the x-od-analytics-* headers the web client sets on every fetch, then:
- Captures run_created with project_id, conversation_id, run_id, model_id, agent_provider_id (mapped via agentIdToTracking), skill_id, design_system_id, plus the token_count_source marker.
- Schedules a run_finished capture on runs.wait(run) resolution, mapping succeeded/canceled/failed to success/cancelled/failed and reporting total_duration_ms.
Both events use a stable insert_id derived from the same uuid so PostHog dedupes the daemon-side mirror against any future web-side capture without double-counting. Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay omitted in v1 because the claude-stream parser only exposes input/output totals today. See tracking-doc-issues.md §3.2.
* feat(analytics): emit settings_cli_test_result + settings_byok_test_result
The original BLOCKING-list assumed these CSV P0 events were not implementable in this branch because main lacked Test buttons. The worktree HEAD actually wires `handleTestAgent` and `handleTestProvider` in SettingsDialog, so both events are now in scope.
- handleTestAgent emits settings_cli_test_result on success and failure paths with cli_provider_id mapped via agentIdToTracking, result drawn from result.ok / catch branch, error_code from result.kind or the thrown error name, and duration_ms timed via performance.now().
- handleTestProvider emits settings_byok_test_result analogously, using apiProtocol (anthropic|openai|azure|ollama|google) directly as provider_id, which is wider than the CSV's 5-value enum and documented in tracking-doc-issues.md §2.5.
Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps plus matching track* helpers. AnalyticsEventName union now covers all 14 P0 events this branch supports.
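The status mapping for run_finished described above (succeeded/canceled/failed to success/cancelled/failed, plus total_duration_ms) could be expressed as a lookup table. The field names `run_id` and `total_duration_ms` appear in the commit message. The `result` field name, the type names and `buildRunFinishedProps` are hypothetical.

```typescript
type RunStatus = "succeeded" | "canceled" | "failed";
type TrackingRunResult = "success" | "cancelled" | "failed";

// Note the spelling change: the daemon's "canceled" becomes "cancelled".
const RUN_STATUS_TO_RESULT: Record<RunStatus, TrackingRunResult> = {
  succeeded: "success",
  canceled: "cancelled",
  failed: "failed",
};

// Hypothetical payload shape for the run_finished event.
interface RunFinishedProps {
  run_id: string;
  result: TrackingRunResult;
  total_duration_ms: number;
}

function buildRunFinishedProps(
  runId: string,
  status: RunStatus,
  startedAtMs: number,
  finishedAtMs: number,
): RunFinishedProps {
  return {
    run_id: runId,
    result: RUN_STATUS_TO_RESULT[status],
    // Clamp at zero in case the two clock readings arrive out of order.
    total_duration_ms: Math.max(0, finishedAtMs - startedAtMs),
  };
}
```

Typing the table as `Record<RunStatus, TrackingRunResult>` means adding a new run status without a tracking value fails at compile time.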
* feat(analytics): gate PostHog on the existing telemetry.metrics consent The integration now reuses the same first-launch privacy banner + Settings → Privacy toggle that gates Langfuse, so a single user decision controls both telemetry sinks. - /api/analytics/config now consults the persisted AppConfigPrefs: it returns enabled=true only when POSTHOG_KEY is set AND the user has chosen "Share usage data" (telemetry.metrics === true). The response also echoes installationId so the web client uses the same anonymous id Langfuse keys off of — one identity per install, shared across both sinks. - Web AnalyticsProvider: - Bootstrap fetch resolves installationId and threads it through the x-od-analytics-anonymous-id header on every /api/* fetch, so daemon-side captures (run_created / run_finished / project_create_result) land on the same person record. - Exposes a setConsent(granted) method that calls posthog-js's opt_in_capturing / opt_out_capturing, wired from App.tsx via a useEffect watching config.telemetry?.metrics. Toggling Privacy → metrics now stops/resumes events immediately, no reload. - app_launch additionally gates on telemetry.metrics so a freshly- declined user fires nothing, and a freshly-opted-in user fires on the next reload. * feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env Wires PostHog product analytics through the same Langfuse-style build- secret pipeline so official Open Design builds ship with the key while fork builds compile without it (the integration short-circuits cleanly when POSTHOG_KEY is absent). tools/pack - resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from process.env at packaging time, validates them (no whitespace in the key, http(s) URL for host, trailing-slash strip), and stamps them on ToolPackConfig. Fork builds without the env vars simply omit the fields; the daemon-side gate keeps things off in that case. 
- Mac, Windows, and Linux packaged-config writers each append the two fields to open-design-config.json next to the existing telemetryRelayUrl entry. apps/packaged - RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost so the Electron entry and headless entry both forward them to the daemon sidecar. - buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into the daemon child env when present. The daemon's existing analytics module reads these via process.env — no daemon-side changes needed. - The headless packaged path falls back to process.env for fields the builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL is read there. CI - release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret) and POSTHOG_HOST (var) at workflow-env scope so every packaging job inherits them. PR / fork builds without these set simply skip the bake step. Tests - tools/pack: config.test.ts covers bake-through, fork-build omission, whitespace rejection, invalid-URL rejection, and trailing-slash normalization. - apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv forwarding the keys when present and omitting them when null. * feat(analytics): enable PostHog autocapture + perf + exceptions Flip on the PostHog SDK's automatic diagnostic features so we capture click paths, page transitions, web vitals, dead clicks, and browser exceptions without scattering instrumentation through the codebase. Privacy defense lives in one place — apps/web/src/analytics/scrub.ts — wired in via posthog-js's `before_send` hook so every outgoing event passes through the same audit point: - $autocapture / $rageclick / $dead_click / $copy_autocapture: strips $el_text and value/placeholder/aria-label attrs from any input, textarea, password input, or contenteditable element. 
PostHog autocapture does not capture input.value by default, but $el_text on a <textarea> reflects the typed content — that's the prompt body for us, so it has to be scrubbed every time. - $pageview / $pageleave: drops query string and fragment from $current_url / $referrer so any future ?q=… can't leak. - $exception: rewrites file:// and absolute filesystem paths in stack frames to app://apps/<repo-relative> so we don't ship the user's home directory. - Suppresses $opt_in entirely — duplicate of our explicit setConsent toggle in App.tsx. Element-level defense in depth is limited to the single most sensitive surface: the chat composer textarea gets `ph-no-capture` so PostHog never even generates an event for clicks inside that subtree. Every other input relies on scrub.ts — sprinkling the class through every form would be noisy and easy to forget on new surfaces. The existing Privacy → "Share usage data" toggle continues to gate every new feature: posthog-js's opt_out_capturing() halts autocapture, $pageview, $exception, web vitals, and dead clicks alongside the explicit capture() calls — one global switch. 11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts. * ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions Adding posthog-js to apps/web and posthog-node to apps/daemon changed pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by sha256. The CI nix flake check failed with: specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc= got: sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s= Copy the new hash into both nix/package-web.nix and nix/package-daemon.nix per the procedure documented in nix/README.md §"First-build hash pinning". * feat(analytics): unify PostHog identity with Langfuse installationId PostHog's distinct_id is the installationId stamped by /api/analytics/ config; Langfuse already reads the same id off app-config.json to populate trace.userId. 
With both sinks keying off the same anonymous identity, dashboards can correlate user actions (PostHog events) with LLM runs (Langfuse traces) without re-identifying. Two gaps closed: 1. applyConsent(false) — clear posthog-js's persisted ph__posthog localStorage entry on opt-out via posthog.reset(). Without this, a user who opts out, then clicks Delete my data, then re-opts in would see PostHog stitch their new session to the deleted identity because bootstrap.distinctID only takes effect on first init. 2. applyIdentity(newInstallationId) — Delete my data rotates the installationId in app-config; App.tsx now watches config.installationId and calls posthog.reset() then identify(newId) so the next event batch is fully decoupled from the deleted one. Idempotent on same-id re-renders so benign config refreshes don't churn PostHog identities. The fetch wrapper's x-od-analytics-anonymous-id header also flips to the new id on rotation so daemon-side captures (run_created / run_finished) land on the same person record from the very next API call, not after a reload. The end-to-end rotation flow is verified against a live PostHog project; these unit tests pin the safety guards (no-client paths, null inputs) since stubbing posthog-js's init-loaded callback chain is brittle. fix(langfuse): require both metrics AND content consent for trace reports Tightens the Langfuse gate so a user who shares anonymous metrics but NOT conversation content stops emitting Langfuse traces entirely — Langfuse is used for turn-quality evals which only make sense with prompt/output bodies. PostHog (product analytics, content-free) stays gated on `metrics` alone and is unaffected. i18n: "Conversation content" → "Conversation and tool content" with hints expanded to mention tool inputs/outputs so the consent surface matches what the trace actually carries (en + zh-CN). 
Bundled here per PR scope — change originated outside this PostHog PR but lands cleanly on the same files; gating Langfuse strictly on `content` makes the dual-sink consent model (PostHog = metrics, Langfuse = metrics + content) symmetric across both i18n locales and the daemon-side gate. * feat(analytics): wire byok_provider_option + fix PR review P1s Adds the BYOK protocol-chip click event (5-value provider_id mirroring the apiProtocol Settings UI) and resolves four P1 review threads on PR #1428. byok_provider_option: - New SettingsClickByokProviderOptionProps in contracts (provider_id = anthropic\|openai\|azure\|google\|ollama; maps to CSV's 5 values per tracking-doc-issues.md §2.5). - trackSettingsClickByokProviderOption helper in apps/web/src/analytics. - SettingsDialog hooks it on the protocol-chip onClick alongside the existing setApiProtocol call; is_selected reflects whether the chip was already active. Review fixes: 1. client.ts (Siri-Ray): clear `initPromise` when the resolution is null so a Privacy → metrics opt-in after a previous decline triggers a fresh /api/analytics/config fetch. Without this, the disabled response was cached forever — first-session opt-in needed a reload to start sending PostHog events. 2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a strict same-origin + /api/ pathname check (shared `isSameOriginApiCall` helper). Outbound third-party URLs containing `/api/` (e.g. provider.example.com/api/x) no longer receive our x-od-analytics-* headers. 3. provider.tsx (codex-connector, lefarcen): gate header injection on `resolvedAnonId` being non-null. When Privacy → metrics is off, /api/analytics/config returns enabled=false → resolvedAnonId stays null → wrapper never installs → daemon can't read consent-bearing headers → no daemon-side PostHog event. setConsent now also clears resolvedAnonId on opt-out and re-fetches on opt-in. 4. 
daemon/analytics.ts (defense in depth): createAnalyticsService now takes dataDir and capture() re-reads app-config to check telemetry.metrics inside the fire-and-forget wrapper. Even if a stale header somehow reaches the daemon after opt-out, the capture is dropped before posthog-node.capture is called. * fix(web): place "Share usage data" on the right in privacy consent banner Swap button order in PrivacyConsentModal and the in-settings ConsentCard so the affirmative "Share usage data" lands on the right and "Not now" on the left. Matches the OK-on-the-right pattern users expect for primary actions. Both buttons keep equal visual prominence (same .privacy-consent-action styling) so the swap doesn't change the EDPB equal-prominence stance called out in the original Langfuse telemetry spec. * feat(analytics): populate run_finished token totals from claude-stream usage Daemon's claude-stream parser already emits agent usage events with input_tokens / output_tokens totals; the run service buffers them in run.events and Langfuse reads them out the same way. The run_finished PostHog event was leaving these fields empty. Scan run.events for the most recent agent usage frame on terminal transition and emit input_tokens / output_tokens / total_tokens when present. token_count_source flips to 'provider_usage' only when at least one count landed; runs without provider-side usage data keep 'unknown'. Provider does not break the input down into the 7 sub-fields the tracking doc lists (memory / context / attachment / system_prompt / …); those stay omitted until a parser change exposes them. * feat(analytics): estimate user_query_tokens from prompt length The user_query_tokens field for run_created / run_finished was hardcoded to 0. We can't tokenize without bundling a model-specific tokenizer, but the character/4 heuristic is the industry-standard estimate when one isn't available and is enough for funnel analysis (prompt-length cohorts, short-vs-long-query conversion rates). 
Extracted from req.body via the same telemetryPromptFromRunRequest pattern the daemon already uses for langfuse-bridge (currentPrompt then message fallback). Only the integer count goes to PostHog — the prompt text itself never leaves the daemon. token_count_source flips appropriately: - run_created with a prompt: 'estimated' (was 'unknown') - run_created with no prompt: 'unknown' - run_finished with provider usage: 'provider_usage' (overrides baseProps' 'estimated' value) - run_finished without provider usage: inherits 'estimated' or 'unknown' from baseProps so input/output absent doesn't mask the estimate.	2026-05-12 22:32:42 +08:00
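The character/4 estimate and the token_count_source rules described in the commit above can be sketched as follows. This is a minimal illustration, not the daemon's code: the function names `estimateUserQueryTokens` and `resolveTokenCountSource` are hypothetical, and rounding up with `Math.ceil` is an assumption (the commit message only says "character/4").

```typescript
// Hypothetical sketch of the heuristic described in the commit message.
// Names and rounding behaviour are assumptions, not the actual daemon code.
type TokenCountSource = "unknown" | "estimated" | "provider_usage";

// Estimate tokens as characters / 4. Only this integer would be reported;
// the prompt text itself is never part of the result.
function estimateUserQueryTokens(prompt: string | null | undefined): number {
  if (!prompt) return 0;
  return Math.ceil(prompt.length / 4); // rounding up is an assumption
}

// Provider-reported usage wins; otherwise the source is 'estimated' when a
// prompt was present and 'unknown' when it was not.
function resolveTokenCountSource(
  prompt: string | null | undefined,
  hasProviderUsage: boolean,
): TokenCountSource {
  if (hasProviderUsage) return "provider_usage";
  return prompt ? "estimated" : "unknown";
}
```

Under these assumptions, a run_created event would call `resolveTokenCountSource(prompt, false)`, while run_finished would pass `true` only when at least one provider count landed.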
Nagendhra Madishetti	09a8fa8d64	feat(web): Critique Theater Phase 8 (8 Theater components, barrel, role-keyed CSS) (#1314 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused \| instant \| live \| { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). - Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. 
Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). - apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4) The strict guard from PR #1314 round 3 enforced enum membership and Number.isFinite, but accepted any finite number where the contract intends a specific domain: scale: 0 (ScoreTicker divides by it), negative thresholds, fractional rounds, negative mustFix, etc. ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline CSS and divides by it for tick width, so a guard-passing scale: 0 shipped Infinity into the rendered style. Negative composite / score values reached downstream code that assumes >= 0. Resolution: mirror the daemon-side Zod domain constraints in the runtime guard. Three new helpers in packages/contracts/src/critique.ts: - isPositiveInt(v): integer with v > 0. Used for round, maxRounds, scale, protocolVersion (all 1-indexed in the orchestrator). - isNonNegativeInt(v): integer with v >= 0. Used for mustFix, position, bestRound. bestRound: 0 is the valid sentinel for 'interrupted before any round closed'. - isNonNegativeFinite(v): finite number with v >= 0. Used for composite, score, dimScore, threshold. 
Threshold may be fractional (e.g. 8.5 on a scale of 10). Cross-field check inside run_started: threshold <= scale (the daemon Zod schema enforces this with an epsilon refine, the wire guard matches the same intent). Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts: - 22 new rejection cases across every numeric field that previously slipped through: scale: 0, negative scale, fractional scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0, fractional protocolVersion, negative threshold, threshold > scale, round: 0, fractional round, negative dimScore / score, negative / fractional mustFix, negative composite, ship round: 0, negative / fractional bestRound, negative interrupted composite, negative / fractional parser_warning position. - 3 positive boundary cases that must still pass: threshold == scale, fractional threshold within [0, scale], interrupted with bestRound: 0 (no round completed before interrupt), parser_warning with position: 0 (start of stream). Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/contracts test: 4 files / 59 tests green (was 37 before the new domain cases). - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 110 files / 1004 tests green; no regression on Theater suite, sse validator, replay parser, or assistant-feedback widget tests. --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-12 21:38:58 +08:00
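The three numeric-domain helpers named in the commit above are described only in prose. A minimal sketch consistent with that description might look like the following. It is not the contents of packages/contracts/src/critique.ts: the `hasValidRunStartedNumbers` wrapper is hypothetical, and the plain `<=` comparison stands in for the epsilon refine the commit attributes to the daemon-side Zod schema.

```typescript
// Sketch of the helpers as described in the commit message; the actual
// implementations may differ.
const isPositiveInt = (v: unknown): v is number =>
  typeof v === "number" && Number.isInteger(v) && v > 0;

const isNonNegativeInt = (v: unknown): v is number =>
  typeof v === "number" && Number.isInteger(v) && v >= 0;

const isNonNegativeFinite = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v) && v >= 0;

// Hypothetical wrapper showing the run_started checks, including the
// cross-field rule threshold <= scale (no epsilon here; an assumption).
function hasValidRunStartedNumbers(e: {
  protocolVersion: unknown;
  maxRounds: unknown;
  scale: unknown;
  threshold: unknown;
}): boolean {
  return (
    isPositiveInt(e.protocolVersion) &&
    isPositiveInt(e.maxRounds) &&
    isPositiveInt(e.scale) &&
    isNonNegativeFinite(e.threshold) &&
    e.threshold <= e.scale
  );
}
```

With guards of this shape, a `scale: 0` frame is rejected before any component divides by it, while the boundary cases listed in the commit (threshold equal to scale, `bestRound: 0`, `position: 0`) still pass.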
pftom	6f818d971d	feat(daemon, web): implement plugin folder installation and enhance atom worker registry - Added a new API endpoint for installing plugins from specified folder paths, improving the plugin management experience. - Introduced functions for normalizing and validating project plugin folder paths, ensuring robust error handling. - Implemented a registry for built-in atom workers, allowing for dynamic signal aggregation during pipeline execution. - Enhanced the `runStageWithRegistry` function to support multiple atom workers, merging their outputs with pessimistic logic. - Updated the UI components to display plugin folder candidates and facilitate user interactions for plugin installation. - Added tests for the new atom worker registry and plugin folder installation features, ensuring reliability and correctness. This update significantly enhances the plugin installation process and the overall functionality of the atom worker system, providing users with better tools for managing plugins and their interactions.	2026-05-12 21:38:45 +08:00
pftom	ed2cbe171b	feat(daemon, web): implement media generation scenario and enhance plugin handling - Introduced a new `od-media-generation` scenario plugin for handling image, video, and audio projects, providing a default pipeline for media generation. - Updated the `collectBundledScenarios` function to deduplicate scenarios and prefer canonical IDs for task kinds, improving plugin routing. - Enhanced the `PluginsView` and `HomeHero` components to better display community and user-installed plugins, improving user experience. - Refactored tests to accommodate the new media generation scenario and ensure proper functionality across plugin types. This update significantly enhances the media handling capabilities and overall plugin management experience, making it easier for users to work with various media projects.	2026-05-12 20:54:33 +08:00
pftom	443aea72c5	feat(daemon, web): enhance plugin handling and UI integration - Introduced a new plugin upload mechanism with file size limits and memory storage, allowing users to upload plugins directly. - Implemented fallback logic for plugin application, ensuring projects can be created without explicit plugin requests. - Enhanced the UI to support plugin selection and integration, including a new `PluginsView` component for managing plugins. - Updated various components to utilize localized text for plugin queries, improving user experience across different languages. - Added tests for new plugin functionalities and local skill loading, ensuring reliability and correctness. This update significantly improves the plugin management experience, providing users with better tools for plugin integration and interaction.	2026-05-12 20:42:40 +08:00
nettee	28d3e5faf5	Fix Codex wrapper launch paths (#1395)	2026-05-12 17:20:32 +08:00
Mason	2f51f3c1ae	feature: refine assistant artifact feedback (#1379) * feature: refine assistant artifact feedback * fix: clear hidden custom feedback reason * test: update assistant feedback expectations	2026-05-12 17:00:42 +08:00
@aaronjmars	377d65b7e4	fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN SSRF bypass) (#1122 ) * fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN bypass) new URL('http://192.168.1.5./').hostname returns '192.168.1.5.' — the trailing dot is the RFC 1034 absolute-FQDN form and resolves identically to '192.168.1.5'. parseIpv4 fails on the dotted form, so 169.254.169.254. slips past the metadata-service block, 192.168.1.5. slips past the LAN block, and localhost. slips past the loopback identification. Strip trailing dots in normalizeBracketedIpv6 so all downstream checks (isLoopbackApiHost, isBlockedExternalApiHostname, isBlockedIpv4, IPv6 range tests) see the canonical form. Adds 6 vitest cases covering loopback FQDN forms (localhost., foo.localhost., 127.0.0.1.) and SSRF FQDN bypasses (169.254.169.254., 192.168.1.5., 10.0.0.5.). Refs nexu-io/open-design#1119 review feedback (P2 from @lefarcen). * test(connectionTest): tighten trailing-dot coverage per #1122 review Two issues from #1122 review: 1. (P2 from @mrcfps + codex bot) The original `foo.localhost.` case asserted error===undefined on validateBaseUrl, which only proves the URL passed validation — not that the host is identified as loopback. Replaced with direct isLoopbackApiHost(...) assertions on the actual loopback FQDN forms (localhost., 127.0.0.1., 127.0.0.5.) so the test exercises the loopback path the comment claims. 2. (P3 from @lefarcen) Original blocked-FQDN tests covered only 3 of 7 ranges that isBlockedIpv4 handles. Added a dedicated case per range (0.0.0.0/8, 10/8, 100.64/10, 169.254/16, 172.16/12, 192.168/16, multicast >=224) so future regressions in normalizeBracketedIpv6 surface against the full coverage. 
* docs: drop misleading foo.localhost./endsWith claim in normalizer comment @lefarcen review feedback: isLoopbackApiHost only accepts exact 'localhost', '::1', loopback IPv4, and mapped loopback IPv4 — there's no subdomain or endsWith handling, so referencing 'foo.localhost.' overstates what the trailing-dot strip enables. Rewrite the comment to match actual call sites (isLoopbackApiHost equality + isBlockedIpv4 numeric parse).	2026-05-12 16:36:09 +08:00
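The trailing-dot bypass described in the commit above can be sketched as follows. This is a minimal illustration, not the project's code: `stripTrailingDots`, `parseIpv4`, and `isBlockedIpv4Sketch` are hypothetical stand-ins, and the blocked ranges are the seven listed in the commit message.

```typescript
// Hypothetical sketch; the real normalizeBracketedIpv6 / isBlockedIpv4
// in open-design may be structured differently.
type Ipv4 = [number, number, number, number];

// Remove the absolute-FQDN trailing dot(s) so "10.0.0.5." becomes "10.0.0.5".
function stripTrailingDots(hostname: string): string {
  return hostname.replace(/\.+$/, "");
}

// Strict dotted-quad parser: a trailing dot yields a fifth, empty part
// and the parse fails, which is the gap the commit closes.
function parseIpv4(host: string): Ipv4 | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  if (!parts.every((p) => /^\d{1,3}$/.test(p))) return null;
  const octets = parts.map(Number);
  if (octets.some((n) => n > 255)) return null;
  return octets as Ipv4;
}

// Ranges taken from the commit message: 0/8, 10/8, 100.64/10, 169.254/16,
// 172.16/12, 192.168/16, and multicast or above (first octet >= 224).
function isBlockedIpv4Sketch(host: string): boolean {
  const octets = parseIpv4(stripTrailingDots(host));
  if (octets === null) return false;
  const [a, b] = octets;
  return (
    a === 0 ||
    a === 10 ||
    (a === 100 && b >= 64 && b <= 127) ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    a >= 224
  );
}
```

Without the normalization step, `parseIpv4("169.254.169.254.")` returns `null`, so a range check built on it never fires. Normalizing once, before any check runs, keeps every downstream test on the same canonical form.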
Hesam	d97b6041eb	fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins (#1110) * fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins fnm legacy installations use ~/.fnm/node-versions. Closes #1102 * fix: remove stray .fnm token from type declaration	2026-05-12 16:15:52 +08:00
lefarcen	2a0ebea50b	release: Open Design 0.7.0 - bump 14 monorepo package.json files to 0.7.0 (root + apps/{web,daemon,desktop,packaged,landing-page} + packages/{contracts,platform,sidecar,sidecar-proto} + tools/{dev,pack,pr} + e2e); apps/packaged was already at 0.6.1 from beta lane, all others at 0.6.0 - add CHANGELOG.md [0.7.0] - 2026-05-12 entry covering 97 merged PRs since 0.6.0: - Critique Theater: Phase 7 web client state machine (#1307) + Phase 6.2 daemon artifact extraction (#1085) - Web/UI: thumbs-up/down feedback widget (#1308), Cmd+, opens Settings (#1173), Finalize design package + Continue in CLI (#974), fetch models button for BYOK (#1034), provider models alphabetical sort (#1097), collapsible MCP JSON field-mapping (#1136), design file rename (#894) - Daemon: auto-memory store with chat-protocol-aware extraction (#999), install/uninstall skills & design systems (#1003), HTTP 206 range requests for video/audio (#1105), scheduled routines (#1033), agent runtime + route registration refactor (#1063, #1043) - HyperFrames: HTML-in-Canvas across web + skills (#866) - Skills/design systems: generic skills + design-templates split + finalize-design API (#955), agent-browser skill (#1284), WeChat design system + login-flow skill (#1083), hud/loom/trading-terminal design systems (#1069), release-notes-one-pager skill (#873), tokens.css schema (#1231) - Packaging: macOS Intel (x64) build (#759), official Nix flake (#402), beta packaging cache (#1095) - Maintainer ops: tools-pr PR-duty workspace (#1259), MAINTAINERS.md (#1290), contributor card bot (#932), PR→issue linking discipline (#1263) - Changed: conversation run isolation (#1271), default English i18n fallback (#1270), Codex CLI exit diagnostics / empty-response handling / path fallback (#1267, #1244, #1205) - Fixed: ~30 web + desktop + daemon + packaging bugfixes - Internal: nightly UI/desktop regression coverage (#1256), e2e/release report hardening (#1140), entry/settings automation (#954) - catch up 
[Unreleased] compare link to v0.7.0 and add missing [0.6.0] release link - add 97 PR footnote refs ([#402]..[#1330]) Verified locally: pnpm install + pre-build contracts/daemon/desktop dist + pnpm typecheck (exit 0 across all 14 packages on Node 22.22 with engine-warning). Release workflow validation runs after merge via release-stable.	2026-05-12 15:33:28 +08:00
huyhoangnhh98	140a4e1ff6	Improve responsive preview and design handoff outputs (#1224) * feat: improve responsive design handoff * feat: refine cross-platform design outputs Changelog: - Add auto-fit responsive preview behavior for tablet/mobile frames. - Add landing page and OS widgets metadata options with project header chips. - Strengthen prompt contracts for modern breakpoints, app-specific modules, CJX-ready UX, and final product surfaces. - Require cross-platform outputs to use separate platform files instead of tabbed demo selectors. - Add DESIGN-MANIFEST.json plus richer handoff guidance to daemon/client exports. - Update archive/export tests for manifest and responsive viewport matrix. * feat: enforce screen-file design outputs Changelog: - Enforce screen-file-first generation for landing pages, app screens, platform surfaces, and OS widgets. - Update design handoff and manifest exports so coding tools map each screen file to separate routes/surfaces. - Strengthen minimal-brief visual guidance to avoid monochrome or unstyled design outputs. * fix: address responsive handoff review feedback * fix: address handoff review blockers * fix: preserve proxy auth and normalized export entry * fix: narrow frame wrapper filter to directory paths only * fix: make artifact save failure banner generic --------- Co-authored-by: Huy Hoàng <macos@MacBook-Pro-Hoang.local>	2026-05-12 14:18:33 +08:00
pftom	b3dc3c3e0c	feat(web): integrate applied plugin snapshot for enhanced user experience - Added support for displaying an active plugin as a context chip in user messages when a project is created with a pinned plugin. This replaces the in-composer plugin rail to avoid re-prompting users for plugin selection. - Introduced `applied_plugin_snapshot_id` in the database schema and updated relevant components (ChatComposer, ChatPane, ProjectView) to handle the new functionality. - Implemented fetching of the applied plugin snapshot in ProjectView to ensure the active plugin is rendered correctly. - Enhanced CSS for the plugin chip to improve visual presentation. This change streamlines the user experience by providing context on previously selected plugins directly within the chat interface.	2026-05-11 22:53:40 +08:00
Sebastian Westberg	8962088c75	feat(daemon): guard against agent-emitted stub artifact regressions (#1171 ) * feat(daemon): guard against agent-emitted stub artifact regressions When an agent emits an <artifact> block whose body is a placeholder ("see other-file.html in this project", a bare filename string, a tiny fallback page) instead of the full document, the daemon writes the placeholder to disk verbatim. Users see a 25-500 byte HTML file where their previous version had tens of kilobytes of real markup. Add a structural regression guard in writeProjectFile: before writing an html/deck artifact whose manifest carries metadata.identifier, scan the project dir for prior siblings matching <identifier>(-\d+)?\.html? and compare sizes. If the new body is below minRetainedRatio (default 0.2) of the largest prior sibling >= minPriorBytes (default 4096), flag a regression. Three modes via env: - OD_ARTIFACT_STUB_GUARD=warn (default) writes the file and attaches stubGuardWarning to the response so the frontend can surface it. - OD_ARTIFACT_STUB_GUARD=reject throws ArtifactRegressionError before fs.writeFile; the route returns 422 ARTIFACT_REGRESSION with the prior sibling's name and size in error.details. - OD_ARTIFACT_STUB_GUARD=off skips the guard entirely. Cross-agent by design: anchored on size delta + identifier match, no agent-specific stub-phrase regex, so works for any agent backend behind the agent-adapter abstraction. The body-then-manifest write order pre-dates this change; the reject path throws before fs.writeFile so rejections never leave a partial state behind. 24 unit + 8 HTTP tests cover happy paths, all three modes, deck kind, .htm extension sibling detection, ratio=1 edge case, and verify rejected writes leave neither the html nor its manifest sidecar on disk. * fix(stub-guard): close same-name, nested-dir, and non-slug bypasses Code review on PR #1171 (lefarcen, Codex, mrcfps) found three holes where the stub guard could be silently bypassed. 
All three are now closed with HTTP test coverage. Same-name overwrite (lefarcen P1): the writer's prior-sibling scan deliberately skipped the file at safeName, but for an in-session overwrite (persistArtifact reuses the same fileName when savedArtifactRef.current matches) that file is the prior content, not the new entry. Drop the exclude-by-name filter; the current on-disk size at scan time is always the prior because the overwrite happens after this check. Subdirectory scoping (Codex/mrcfps P2): writeProjectFile creates parent directories for nested paths like reports/overview.html, but the guard only scanned the project root. Pass path.dirname(target) as scanDir so nested artifacts are evaluated against their real sibling set. Non-slug identifier (Codex/lefarcen/mrcfps P2): the web's persistArtifact slugifies the filename basename but stores the raw identifier in the manifest, so an identifier like "Landing Page" yields filename landing-page.html with metadata.identifier="Landing Page". Build the sibling regex from both the raw identifier and a slugified variant (mirroring the frontend's slugifier) so either form matches the same priors. Also surface warn-mode warnings in the web UI: ProjectView now checks file.stubGuardWarning after writeProjectTextFile and renders the warning via setError. Reject-mode 422 surfacing requires restructuring writeProjectTextFile's return contract and is deferred. API change inside the daemon: evaluateArtifactStubGuard / findPriorArtifactSiblings drop excludeSafeName and rename projectDir to scanDir. Tests updated. Tests: 4 new HTTP cases (same-name overwrite preserves prior body, nested subdir rejects, slug-form match rejects, plus the existing warn/off/deck/.htm cases) and 1 new unit case (slug-form sibling match). 44 tests pass. * fix(stub-guard): empty-slug fallback + reject-mode UI surface Round 3 review on PR #1171 (lefarcen, mrcfps) found two remaining holes after `9cc82430` closed the same-name / subdir / non-slug bypasses. 
Empty-slug fallback bypass (lefarcen P2): an identifier like "测试" (all-non-ASCII) strips to empty through the web slugifier, and persistArtifact's `slice(0,60) || 'artifact'` falls back to the literal "artifact" basename. The guard searched for raw identifier + slug only, so a later artifact-2.html stub bypassed the prior. Add EMPTY_SLUG_FALLBACK_NAME = 'artifact' as a sibling-name candidate when the slug is empty, mirroring the frontend fallback exactly. Reject-mode UI silence (mrcfps P2 + lefarcen P2): writeProjectTextFile collapses any non-OK response (including 422 ARTIFACT_REGRESSION) to null, and persistArtifact previously had no else branch. Users in reject mode saw the daemon log fire but the UI was silent. Add an else branch that surfaces a generic banner pointing at the most likely cause and mentions checking the daemon logs for structured details. Also clear savedArtifactRef.current on failure so retries re-enter the persistence path. Plumbing the structured 422 details through writeProjectTextFile itself remains out of scope (cross-cutting client contract change affecting 5+ call sites). The generic banner is the "at minimum" path mrcfps suggested. Tests: 1 new unit case (artifact.html sibling discovery for non-ASCII identifier) + 1 new HTTP case (empty-slug stub regression rejected end-to-end). 46 tests pass across stub-guard suites (was 44). * fix(stub-guard): verify sidecar identity to avoid cross-identifier false positives Round 4 review on PR #1171 (mrcfps inline + lefarcen review) caught a false-positive introduced by the round-3 empty-slug fallback. Two distinct identifiers that both slugify to empty (e.g. "测试" and "首页") share the artifact.html basename, so a brand-new save under the second identifier was being compared against (and falsely rejected because of) the unrelated first. 
The same shape exists symmetrically: a non-empty-slug identifier literally named "artifact" would falsely match empty-slug fallback files written under any other identifier. Fix: filename pattern matching is now a candidate generator, not the source of truth. For every candidate sibling, read its .artifact.json sidecar and verify metadata.identifier matches the input via artifactIdentifiersMatch (raw equality OR shared non-empty slug). Files without a sidecar are skipped — they weren't written through the artifact-tag path this guard targets, and treating them as priors was always a stretch. Empty-slug equivalence is intentionally NOT honored: 测试 != 首页 even though both slugify to empty. The whole bug was conflating distinct identifiers via the fallback name; slug-equivalence kicks in only for non-empty slugs (Landing Page <-> landing-page). Tests: unit fixtures now write file+sidecar pairs (mirrors prod); new artifactIdentifiersMatch suite covers the 5 equivalence cases; new HTTP test does NOT cross-reject distinct empty-slug identifiers asserts the second save returns 200 instead of 422; new unit test skips files without a sidecar. 42 tests pass across stub-guard suites. fix(stub-guard): require canonical-form anchor in identifier match to avoid 60-char truncation collisions Round 5 review on PR #1171 (mrcfps) caught another false-positive in artifactIdentifiersMatch: slugifyArtifactIdentifier truncates at 60 chars, so two distinct >60-char identifiers that share their first 60 chars (e.g. "A...A1" and "A...A2", 70 chars each) slugify to the same string and would falsely bridge. Same shape as the empty-slug fallback bug from round 4, just at the other end of the input range. Tighten the rule: slug-equivalence requires at least one input to BE its own canonical slug form. That keeps the legitimate bridge ("Landing Page" <-> "landing-page" — second input IS the slug) but rejects truncation collisions ("A...A1" <-> "A...A2" — neither is in canonical form). 
Side effect: two non-canonical forms that slugify to the same value no longer bridge (e.g. "Landing Page" vs "LANDING-PAGE"). This is correct: without one canonical anchor we can't safely call them the same lineage. Updated the slug-equivalence test to assert the new semantics explicitly with both directions and a negative case. Tests: 2 new cases (no bridge for >60-char truncation collision; raw 70-char to its 60-char truncated slug still bridges) + 1 negative test for the non-canonical-pair case. 45 tests pass. * fix(stub-guard): cover legacy sidecar-less HTML priors Round 6 review on PR #1171 (mrcfps, non-blocking) caught a real legacy bypass: round 4's sidecar-required policy skipped any HTML file without an .artifact.json companion, but readManifestForPath (projects.ts) treats those same files as legitimate artifacts via inferLegacyManifest. So a project with an older sidecar-less dashboard.html (pre-sidecar era, Write-tool-emitted, paste-text, manual import, etc.) let its first stub rewrite through as a supposed "first emission". Fix: when the sidecar is missing, derive a synthetic identifier from the filename (strip the (-N)?\.html? suffix) and run it through the same artifactIdentifiersMatch rules. Synthetic identifiers come from already-slugified filenames, so they bridge raw inputs only via the canonical-form rule established in round 5 — no truncation collisions, no empty-slug conflation, no unrelated cross-identifier matches. Tests: 3 new unit cases (legacy fallback finds the prior; bridges raw->slug under the same rules; does NOT bridge unrelated slug forms via inference) + 1 new HTTP test that seeds a sidecar-less prior via the artifact-manifest-less write path and asserts the stub rewrite is rejected with 422 ARTIFACT_REGRESSION. 48 tests pass across stub-guard suites (was 45). 
* fix(stub-guard): try both interpretations for legacy filename inference Round 7 review on PR #1171 (mrcfps, non-blocking) caught a real ambiguity in the round-6 legacy fallback: a filename like `phase-2.html` is genuinely ambiguous without a sidecar. It could be the identifier "phase" with a -2 collision suffix, OR the standalone identifier "phase-2". The round-6 helper only stripped the suffix, so a sidecar-less `phase-2.html` followed by a stub emission with metadata.identifier="phase-2" bypassed the guard ("phase-2" doesn't match the inferred "phase"). Fix: when the sidecar is missing, generate both candidate identifiers (full basename and suffix-stripped basename) and accept the file as a prior if either matches. Visible false positives are preferable to silent false negatives — and the canonical-form anchor in artifactIdentifiersMatch still rules out truncation collisions and empty-slug conflations regardless of which candidate matched. Tests: 2 new unit cases (full-basename interpretation finds "phase-2"; suffix-stripped interpretation also finds "phase") and 1 new HTTP test that seeds a sidecar-less `phase-2.html` and asserts the stub rewrite is rejected with 422 ARTIFACT_REGRESSION. 51 tests pass across stub-guard suites (was 48). --------- Co-authored-by: Sebastian Westberg <sebastianwestberg@users.noreply.github.com>	2026-05-11 19:59:37 +08:00
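The stub-guard rules described in the commit above (size ratio against the largest prior sibling, plus identifier matching with a canonical-form anchor) can be sketched roughly as below. This is an illustration under assumptions, not the daemon's code: `slugify`, `identifiersMatch`, `findRegression`, and the `PriorSibling` shape are hypothetical, and the slugifier only approximates the frontend one (lowercase, non-alphanumerics collapsed to `-`, 60-character cap). The defaults 0.2 and 4096 come from the commit message. Sidecar reads, legacy filename inference, and the warn/reject/off modes are left out.

```typescript
// Hypothetical shapes and names, for illustration only.
interface PriorSibling {
  name: string;       // e.g. "dashboard.html"
  identifier: string; // metadata.identifier read from the sidecar
  bytes: number;      // current size on disk
}

interface GuardOptions {
  minRetainedRatio?: number; // default 0.2 per the commit message
  minPriorBytes?: number;    // default 4096 per the commit message
}

// Assumed approximation of the frontend slugifier.
function slugify(id: string): string {
  return id
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .slice(0, 60)
    .replace(/^-+|-+$/g, "");
}

// Raw equality, or a shared non-empty slug where at least one input
// already IS its canonical slug form (the round-5 anchor rule).
function identifiersMatch(a: string, b: string): boolean {
  if (a === b) return true;
  const sa = slugify(a);
  const sb = slugify(b);
  if (sa === "" || sa !== sb) return false;
  return a === sa || b === sb;
}

// Returns the prior sibling that the new body regresses against, or null.
function findRegression(
  identifier: string,
  newBytes: number,
  siblings: PriorSibling[],
  opts: GuardOptions = {},
): PriorSibling | null {
  const ratio = opts.minRetainedRatio ?? 0.2;
  const minPrior = opts.minPriorBytes ?? 4096;
  let largest: PriorSibling | null = null;
  for (const s of siblings) {
    if (s.bytes < minPrior) continue; // too small to count as a real prior
    if (!identifiersMatch(identifier, s.identifier)) continue;
    if (largest === null || s.bytes > largest.bytes) largest = s;
  }
  if (largest === null) return null;
  return newBytes < largest.bytes * ratio ? largest : null;
}
```

For example, a 300-byte body saved under `dashboard` next to a 20,000-byte `dashboard.html` falls below the 4,000-byte threshold (0.2 × 20,000) and is flagged, while a 5,000-byte body is not. Keeping the anchor rule in `identifiersMatch` separate from the size check means the truncation-collision and empty-slug cases are excluded before any sizes are compared.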


130 commits