Commit graph

30 commits

Author SHA1 Message Date
Denis Redozubov
c847ace554
Add run-scoped media execution policy (#3106)
* feat(contracts): add run media execution policy

* feat(daemon): enforce run media execution policy

* test(daemon): cover media execution policy gates
2026-05-28 09:19:40 +00:00
Siri-Ray
170a05f5d2
Formalize skill artifacts into plugins (#3085)
* Add skill-to-plugin candidate flow

* Fix skill plugin candidate card reuse

Generated-By: looper 0.9.1 (runner=fixer, agent=codex)

* Fix skill plugin candidate dismiss and URL gates

Generated-By: looper 0.9.1 (runner=fixer, agent=codex)

* Polish skill plugin candidate copy
2026-05-27 08:26:00 +00:00
shangxinyu1
cc6edb9afe
Proxy GitHub metadata through the daemon (#2654)
* Proxy GitHub metadata through the daemon

* fix(contracts): share GitHub metadata responses

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* fix(contracts): align GitHub fetchedAt payload types

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* Proxy GitHub metadata through the daemon

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)
2026-05-22 14:06:07 +08:00
Eli-tangerine
8193981511
Keep PR 2400 changes without folder pickers (#2462)
* feat(daemon): add project working directory management and editor hand-off functionality

- Introduced new flags for project commands to manage working directories, including `--working-dir` and `--dir`.
- Implemented API routes for listing available editors and opening projects in selected editors.
- Added a hand-off button in the ChatPane header to facilitate opening project folders in local applications.
- Enhanced the HomeHero component to include working directory and design system settings, improving user experience in project creation.
- Created HomeHeroSettingsChips component for inline management of working directory and design system selection.

* feat(chat): implement voice transcription proxy and enhance UI components

- Added a new API route for voice transcription using OpenAI's `/audio/transcriptions` endpoint, allowing users to send audio blobs directly for transcription.
- Integrated multer for handling audio file uploads in memory, ensuring efficient processing without disk storage.
- Updated the HomeHero component to include example prompt suggestions for plugins, enhancing user interaction.
- Introduced the EditorIcon component to visually represent different editors in the hand-off menu, improving the user experience.
- Refined the HandoffButton component to utilize the new EditorIcon, providing a more cohesive interface for selecting editors.
- Enhanced CSS styles for various components to improve layout and responsiveness, including adjustments to tab and button sizes for better usability.

* style(workspace-shell): enhance layout and overflow handling

- Updated CSS for .workspace-shell to ensure full viewport width and height, with proper overflow management.
- Adjusted grid layout to prevent content overflow and maintain responsiveness.
- Modified styles for .workspace-tabs-chrome to improve width handling and prevent overflow issues.

* refactor(chat): remove voice transcription proxy and related components

- Deleted the voice transcription proxy implementation, including the associated API route and multer configuration.
- Removed the MicButton component from the ChatComposer and HomeHero components to streamline the UI.
- Updated HomeHero to include example suggestions without the voice input functionality.
- Adjusted CSS styles for various components to maintain layout consistency after the removal of the MicButton.

* feat(daemon): implement minting of HMAC tokens for working directory management

- Added a new function `mintImportTokenFromCurrentSecret` to generate HMAC tokens bound to a specified base directory, enhancing security for working directory operations.
- Updated the `desktop-auth.ts` file to include the new token minting functionality, which returns structured errors when the desktop auth secret is cleared.
- Introduced new IPC message types for minting import tokens in the sidecar protocol, allowing seamless integration with the daemon's working directory management.
- Enhanced the `WorkingDirPill` component to utilize the new token minting flow for secure directory selection in desktop builds.
- Updated CSS styles for the HomeHero component to accommodate new example suggestion features and maintain layout consistency.
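
A minimal TypeScript sketch of a directory-bound HMAC token, using Node's built-in `node:crypto`. The commit only says that tokens are HMAC-bound to a base directory and that a cleared secret produces a structured error. The `mintDirToken` / `verifyDirToken` names, the `dir.expiry.signature` layout, the expiry field, and the `SECRET_CLEARED` code are all hypothetical. This is not the daemon's actual `mintImportTokenFromCurrentSecret`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import * as path from "node:path";

// Structured result: a token on success, or an error when the secret has been cleared (hypothetical shape).
type MintResult =
  | { ok: true; token: string }
  | { ok: false; error: { code: "SECRET_CLEARED"; message: string } };

function sign(secret: Buffer, payload: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Mint a token that is only valid for `baseDir` until `expiresAt` (ms since epoch).
export function mintDirToken(secret: Buffer | null, baseDir: string, expiresAt: number): MintResult {
  if (!secret || secret.length === 0) {
    return { ok: false, error: { code: "SECRET_CLEARED", message: "desktop auth secret is not set" } };
  }
  const dirB64 = Buffer.from(path.resolve(baseDir)).toString("base64url");
  const payload = `${dirB64}.${expiresAt}`;
  return { ok: true, token: `${payload}.${sign(secret, payload)}` };
}

// Check the signature (constant-time), the expiry, and that the token names this exact directory.
export function verifyDirToken(secret: Buffer, token: string, baseDir: string, now: number): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [dirB64, expiry, sig] = parts;
  const expected = Buffer.from(sign(secret, `${dirB64}.${expiry}`));
  const actual = Buffer.from(sig);
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return false;
  const exp = Number(expiry);
  if (!Number.isFinite(exp) || exp < now) return false;
  return Buffer.from(dirB64, "base64url").toString() === path.resolve(baseDir);
}
```

Because the resolved path is part of the signed payload, a token minted for one folder cannot be replayed against another. `timingSafeEqual` keeps signature bytes from leaking through comparison timing.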

* fix(HomeView): import HOME_HERO_CHIPS constant for improved chip management

- Updated the HomeView component to import the HOME_HERO_CHIPS constant from the chips module, enhancing the management of hero chips within the component.

* feat(daemon): implement mintImportTokenViaSidecar for secure working directory management

- Introduced the `mintImportTokenViaSidecar` function to facilitate the minting of HMAC tokens for desktop-import operations via the daemon's sidecar IPC. This allows CLI commands to bypass authentication when the desktop-auth gate is active.
- Updated the CLI to utilize the new token minting function when setting the working directory, ensuring secure access to trust-gated API endpoints.
- Enhanced the sidecar server to handle minting requests and return structured error messages for improved user feedback.
- Added tests to validate the new token minting functionality and its integration with the working directory management process.
- Refactored related components to support the new token flow, improving overall security and user experience.

* feat(HomeHero): enhance UI components and styles for improved user experience

- Updated HomeHero component to replace active dot indicators with Plug icons for better visual representation of active plugins.
- Adjusted CSS styles for various elements, including padding and dimensions, to enhance layout consistency and responsiveness.
- Introduced new styles for active type icons and improved hover effects for buttons.
- Updated HomeHeroSettingsChips to change button titles and icons for clarity.
- Added tests to ensure proper rendering and functionality of updated components.

* feat(ProjectDesignSystemPicker): enhance design system selection with preview functionality

- Updated the ProjectDesignSystemPicker component to include a preview feature for design systems, allowing users to see a preview of the selected design system.
- Implemented hover functionality to update the preview based on the hovered design system.
- Added fullscreen preview capability for a more immersive experience.
- Enhanced CSS styles for the design system picker to improve layout and responsiveness.
- Introduced tests to validate the new preview functionality and ensure proper interaction within the component.

* feat: refactor project metadata handling and enhance design system picker

- Updated the default scenario plugin ID retrieval to use project metadata, improving the logic for determining the appropriate plugin based on project intent.
- Enhanced the ProjectDesignSystemPicker and related components to support localized design system summaries and categories, improving user experience.
- Introduced new translations for working directory and design system picker components, ensuring better accessibility and usability across different locales.
- Added a new 'live-artifact' project type to the HomeHero chips, expanding the functionality for users creating refreshable artifacts.
- Updated tests to validate the new project metadata handling and design system picker functionalities.

* feat: enhance localization and styling for design system components

- Added French translations for working directory and design system picker components, improving accessibility for French-speaking users.
- Updated CSS styles for the pet task item to ensure consistent padding and layout.
- Introduced a new test suite for HomeHeroSettingsChips to validate localization and design system selection functionality.
- Enhanced ProjectDesignSystemPicker tests to ensure proper localization and interaction with design system categories.

* fix: update .gitignore to include all claude-sessions directories and remove specific session files

- Modified .gitignore to ensure all claude-sessions directories are ignored by using a wildcard pattern.
- Deleted two specific claude-sessions markdown files to clean up unnecessary session data.

* fix: repair home automation CI regressions

* fix: stabilize artifact consistency e2e

* Remove folder picker changes from PR 2400

---------

Co-authored-by: pftom <1043269994@qq.com>
Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-20 22:07:30 +08:00
Bryan
c530d163f8
feat(web): "Resume conversation in new chat" UI — #462 Commit B (companion to #1718) (#2264)
* feat(contracts): add handoff request/response DTOs

Adds HandoffRequest, HandoffResponse, and HANDOFF_SCHEMA_VERSION for
the upcoming POST /api/projects/:id/handoff synthesis endpoint. Mirrors
the finalize.ts subpath pattern (package.json#exports + esbuild entry +
index re-export) so daemon and web can import
@open-design/contracts/api/handoff.

Refs nexu-io/open-design#462.

* feat(daemon): add handoff synthesis pipeline (buildHandoffPrompt + synthesizeHandoffPrompt)

Adds `apps/daemon/src/handoff-design.ts` exposing the resume-conversation
synthesis primitives the upcoming `POST /api/projects/:id/handoff` route will
call into.

- `buildHandoffPrompt({ projectId, transcriptJsonl, transcriptMessageCount,
  now })` returns the system + user prompts. System prompt asks Claude to
  emit a structured Markdown body with Context / Decisions made / Open
  questions / Current focus / Provenance, with Provenance bullets explicitly
  flat (no Markdown emphasis on labels) to preempt the PR #1584 round-2
  parser bug.
- `synthesizeHandoffPrompt(db, projectsRoot, projectId, options)` reuses the
  existing finalize-design pipeline pieces: `exportProjectTranscript` →
  `truncateTranscriptForPrompt` → `buildHandoffPrompt` →
  `callAnthropicWithRetry` → `extractDesignMd`, but without the lockfile,
  disk write, design-system, or artifact-resolution paths.
- Promotes `DEFAULT_TIMEOUT_MS` in finalize-design.ts to `export const` so
  handoff shares the same 120s upstream-call bound.

Refs nexu-io/open-design#462.

* feat(daemon): wire POST /api/projects/:id/handoff route

Adds the handoff HTTP route and registers it in server.ts. Validation
block + error-mapping shape mirror registerFinalizeRoutes (BYOK payload,
upstream-error → ApiErrorCode mapping, redactSecrets on the raw upstream
body). Handoff has no lockfile, so the CONFLICT branch is omitted.

`res.on('close')` is wired to flip an AbortController whose signal is
threaded into synthesizeHandoffPrompt, so a UI-side cancel actually
aborts the daemon-side Anthropic call rather than letting it keep
running after the client walks away (mirrors the PR #974 fix for
finalize).
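
The abort wiring described above can be sketched roughly as follows. It assumes Node's `http.ServerResponse`, which is an `EventEmitter` that exposes `writableEnded`. `abortOnClientClose` and `slowUpstreamCall` are hypothetical helpers, not the daemon's code. In the real route, the signal is threaded into `synthesizeHandoffPrompt`.

```typescript
import { EventEmitter } from "node:events";

// The only parts of http.ServerResponse this needs: its events and `writableEnded`.
interface ClosableResponse extends EventEmitter {
  writableEnded: boolean;
}

// Returns a signal that aborts if the client goes away before the response was finished.
export function abortOnClientClose(res: ClosableResponse): AbortSignal {
  const controller = new AbortController();
  res.on("close", () => {
    // 'close' also fires after a normal, completed response; only abort an unfinished one.
    if (!res.writableEnded) controller.abort();
  });
  return controller.signal;
}

// Hypothetical slow upstream call that stops as soon as the signal aborts.
export function slowUpstreamCall(signal: AbortSignal, ms: number): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => resolve("done"), ms);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(new Error("aborted"));
      },
      { once: true },
    );
  });
}
```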

- `apps/daemon/src/handoff-routes.ts` — new, exports registerHandoffRoutes
  + RegisterHandoffRoutesDeps.
- `apps/daemon/src/server-context.ts` — adds handoff slot to ServerContext.
- `apps/daemon/src/route-context-contract.ts` — adds RegisterHandoffRoutesDeps
  to the compile-time coverage assertion.
- `apps/daemon/src/server.ts` — imports synthesizeHandoffPrompt +
  registerHandoffRoutes, builds handoffDeps, registers the route next
  to finalize.
- `apps/daemon/tests/handoff-route.test.ts` — 12 HTTP-layer tests:
  validation (400/403/404), happy path, upstream error mapping
  (401/429/502/502 non-JSON), api-key redaction.
- `apps/daemon/tests/handoff-route-abort.test.ts` — client-disconnect
  aborts the daemon-side controller.

Refs nexu-io/open-design#462.

* fix(daemon): map TranscriptExportLockedError to 409 CONFLICT on handoff route

`exportProjectTranscript` acquires a per-project `.transcript.lock`
internally (apps/daemon/src/transcript-export.ts:131-163) and throws
`TranscriptExportLockedError` on EEXIST. Concurrent handoff requests —
or a handoff that races `/api/projects/:id/finalize/anthropic` — lost
that lock and surfaced as 500 INTERNAL_ERROR through the route's
generic catch.

- `apps/daemon/src/handoff-routes.ts` — catch `TranscriptExportLockedError`
  and return `409 CONFLICT` ahead of the generic 500 branch, mirroring
  the existing `FinalizePackageLockedError → 409 CONFLICT` mapping at
  `apps/daemon/src/import-export-routes.ts:603-605`.
- `apps/daemon/src/server.ts` — thread `TranscriptExportLockedError`
  through `handoffDeps` so the route can match without a direct import.
- `apps/daemon/src/handoff-design.ts` — correct the module header
  comment that incorrectly claimed "no lockfile (concurrent handoff
  calls are safe)" — handoff does not add its own lock, but it does
  transitively acquire `.transcript.lock` via the transcript-export
  call.
- `apps/daemon/tests/handoff-route.test.ts` — regression test that
  pre-acquires `.transcript.lock` on disk via `fs.openSync(lockPath, 'wx')`
  before firing a handoff request, asserts 409 CONFLICT.
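
A sketch of the lock-to-409 mapping. It assumes a lockfile created with `fs.openSync(path, 'wx')`, as in the regression test. `LockedError`, `withLock`, `statusFor`, and `makeScratchDir` are hypothetical stand-ins for `TranscriptExportLockedError` and the route's catch blocks.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for TranscriptExportLockedError.
export class LockedError extends Error {
  readonly lockPath: string;
  constructor(lockPath: string) {
    super(`lock already held: ${lockPath}`);
    this.name = "LockedError";
    this.lockPath = lockPath;
  }
}

// Create the lockfile exclusively: the 'wx' flag fails with EEXIST if it already exists.
function acquire(lockPath: string): number {
  try {
    return fs.openSync(lockPath, "wx");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") throw new LockedError(lockPath);
    throw err;
  }
}

// Run `fn` while holding the lock, releasing it even if `fn` throws.
export function withLock<T>(lockPath: string, fn: () => T): T {
  const fd = acquire(lockPath);
  try {
    return fn();
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}

// The specific lock error is matched before the generic 500 branch.
export function statusFor(err: unknown): { status: number; code: string } {
  if (err instanceof LockedError) return { status: 409, code: "CONFLICT" };
  return { status: 500, code: "INTERNAL_ERROR" };
}

// Hypothetical helper: a throwaway directory for trying the sketch out.
export function makeScratchDir(): string {
  return fs.mkdtempSync(path.join(os.tmpdir(), "handoff-lock-"));
}
```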

Refs nexu-io/open-design#462 — addresses @nettee's blocking review on
PR #1718 (comment 3242251338).

* fix(daemon): keep handoff request timeout armed through the response body read

`synthesizeHandoffPrompt` cleared the upstream-call timeout in a `finally`
that ran as soon as `callAnthropicWithRetry` returned. But `fetch()`
resolves once the upstream sends *headers* — so the subsequent
`await response.json()` body read ran with no timeout. A response that
sends headers and then stalls its body could hang `/api/projects/:id/handoff`
indefinitely instead of failing.

- `apps/daemon/src/handoff-design.ts` — move `clearTimeout(timeoutId)` into a
  single outer `finally` spanning both the call and the `response.json()`
  body parse, so the timeout stays armed until the body is fully consumed.
- `apps/daemon/src/handoff-design.ts` — the body-parse catch now re-throws
  `AbortError` as-is, mirroring the call-phase catch. Without this a
  body-phase timeout would surface as `502` "non-JSON body"; re-throwing
  lets the route map it to the intended `503` "handoff timed out"
  (`handoff-routes.ts:122-124`).
- `apps/daemon/tests/handoff-design.test.ts` — regression test: a `fetchImpl`
  returning a `Response` whose body never closes after headers, raced
  against a 500ms deadline, asserts the call aborts (not hangs) and rejects
  with `AbortError`.
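
A minimal sketch of the fix's shape, assuming a `fetchImpl` that accepts an `AbortSignal`. With a real `fetch()`, aborting the signal passed to it also errors an in-progress body read. The stand-in response here only exposes `json()`, so a test double has to simulate that behaviour. `fetchJsonWithDeadline` is a hypothetical name, not the daemon's function.

```typescript
// Minimal stand-in for a fetch Response: only the body read is needed here.
interface JsonBody {
  json(): Promise<unknown>;
}

// One AbortController and one timer cover both phases: the call (which, like fetch(),
// resolves once headers arrive) and the body read that follows it.
export async function fetchJsonWithDeadline(
  fetchImpl: (signal: AbortSignal) => Promise<JsonBody>,
  timeoutMs: number,
): Promise<unknown> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(controller.signal);
    try {
      return await response.json();
    } catch (err) {
      // Pass aborts through unchanged so the caller can report "timed out", not "non-JSON body".
      if (err instanceof Error && err.name === "AbortError") throw err;
      throw new Error("upstream returned a non-JSON body");
    }
  } finally {
    // Cleared only once the body has been consumed (or the whole attempt failed).
    clearTimeout(timeoutId);
  }
}
```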

Refs nexu-io/open-design#462 — addresses @nettee's round-2 blocking review
on PR #1718 (`handoff-design.ts:196`).

* fix(daemon): map upstream 400 to 400 BAD_REQUEST on handoff route

`callAnthropicWithRetry` preserves a non-retryable upstream status, so an
Anthropic HTTP 400 (`invalid_request_error` — unknown model, invalid
maxTokens, malformed body) reached the route's `FinalizeUpstreamError`
branch and fell through to `502 UPSTREAM_UNAVAILABLE`. That reported
deterministic caller input as a transient server outage, inviting
pointless retries and hiding which field was wrong.

- `apps/daemon/src/handoff-routes.ts` — special-case `err.status === 400`
  to `400 BAD_REQUEST` with the redacted upstream detail, ahead of the
  generic 502. Also refresh the route docblock: it claimed the 409 branch
  was omitted (stale since the R1 TranscriptExportLockedError fix) and
  that error mapping fully mirrors finalize (now diverges on 400).
- `apps/daemon/tests/handoff-route.test.ts` — route test driving an
  Anthropic `400 invalid_request_error`: asserts 400 BAD_REQUEST, the
  upstream detail is surfaced, and an echoed key is redacted.
- `packages/contracts/tests/package-runtime.test.ts` — import
  `@open-design/contracts/api/handoff` through the package `exports` map
  and assert `HANDOFF_SCHEMA_VERSION`, covering the built publish surface
  (esbuild entry + exports map + root re-export) that the source-only
  `handoff-contract.test.ts` does not exercise.
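
The route's branch ordering can be sketched as a small mapping function. `UpstreamError`, `mapUpstreamError`, and the `sk-` redaction pattern are hypothetical. Only the 400 branch from this commit and the generic 502 fallback are shown. The route's other specific branches (such as 401 and 429) are omitted.

```typescript
// Hypothetical stand-in for FinalizeUpstreamError: the upstream HTTP status plus its raw body.
export class UpstreamError extends Error {
  readonly status: number;
  readonly detail: string;
  constructor(status: number, detail: string) {
    super(`upstream responded ${status}`);
    this.status = status;
    this.detail = detail;
  }
}

// Hypothetical redactor: masks tokens shaped like "sk-...".
export function redact(text: string): string {
  return text.replace(/sk-[A-Za-z0-9_-]+/g, "[REDACTED]");
}

export function mapUpstreamError(err: UpstreamError): { status: number; code: string; detail: string } {
  const detail = redact(err.detail);
  // A 400 is deterministic caller input (unknown model, bad maxTokens): report it as such.
  if (err.status === 400) return { status: 400, code: "BAD_REQUEST", detail };
  // Everything else falls through to the generic outage branch.
  return { status: 502, code: "UPSTREAM_UNAVAILABLE", detail };
}
```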

Refs nexu-io/open-design#462 — addresses @nettee's round-3 blocking
review on PR #1718.

* fix(daemon): await the now-async external base-URL validator on handoff route

Main's #1176 (`9a64fccd`) made `validateExternalApiBaseUrl` DNS-aware and
asynchronous (`validateBaseUrlResolved`) and updated the proxy and finalize
callers to `await` it. The handoff route — added on this branch in parallel,
against the old synchronous validator — still called it without `await`, so
`validated` was a Promise: `validated.error` / `validated.forbidden` were
`undefined`, the SSRF / malformed-URL guard silently no-opped, and a bad
`baseUrl` fell through to the upstream call and surfaced as 502.

A semantic merge break — no textual conflict, green on the branch in
isolation, red once CI re-merged latest main.

- `apps/daemon/src/handoff-routes.ts` — `await validateExternalApiBaseUrl(...)`,
  mirroring the finalize route (`import-export-routes.ts:561`). The handler
  is already `async`.
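
The failure mode is easy to reproduce in isolation. In this sketch, `validateBaseUrl` is a hypothetical async stand-in for `validateExternalApiBaseUrl` and does no DNS lookup. The buggy variant uses a cast to mimic a call site whose result was loosely typed.

```typescript
type Validation = { error?: string; forbidden?: boolean };

// Hypothetical async validator; a real one would also resolve DNS.
export async function validateBaseUrl(raw: string): Promise<Validation> {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { error: "baseUrl is not a valid URL" };
  }
  // Sketch only: flags a few literal private IPv4 prefixes, no DNS lookup.
  if (/^(10\.|127\.|192\.168\.)/.test(url.hostname)) return { forbidden: true };
  return {};
}

// Buggy: `validated` is a Promise, so `.error` and `.forbidden` are undefined
// and both guards silently pass.
export function checkBaseUrlBuggy(raw: string): number {
  const validated = validateBaseUrl(raw) as unknown as Validation;
  if (validated.error) return 400;
  if (validated.forbidden) return 403;
  return 200; // would go on to the upstream call
}

// Fixed: await the validator before reading its result.
export async function checkBaseUrl(raw: string): Promise<number> {
  const validated = await validateBaseUrl(raw);
  if (validated.error) return 400;
  if (validated.forbidden) return 403;
  return 200;
}
```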

The existing `handoff-route.test.ts` cases "400 BAD_REQUEST when baseUrl is
not a valid URL" and "403 FORBIDDEN when baseUrl points at a private internal
IP" already encode this — red against branch + latest main, green now.

Refs nexu-io/open-design#462 — PR #1718 CI fix.

* chore(daemon): list handoff in the assertServerContextSatisfiesRoutes literal

The `assertServerContextSatisfiesRoutes({...})` call in `server.ts` enumerates
every route registrar's deps but omitted `handoff`. Adding `handoff: handoffDeps`
makes the literal complete and consistent with the other route deps.

This was not a typecheck break: route-dep coverage is guaranteed by the
`Assert<ServerContext extends AllRegisteredRouteDeps>` type in
`route-context-contract.ts` — and `AllRegisteredRouteDeps` already includes
`RegisterHandoffRoutesDeps` — not by this assertion-call literal. The literal
has omitted `handoff` since this branch's first push (`806db576`), with CI green
throughout; `tsc -p tsconfig.json --noEmit` is clean before and after.
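
The difference between the type-level guarantee and the call-site literal can be sketched with a conditional type. The deps interfaces here are hypothetical and far smaller than the real `ServerContext`. `Assert` is applied to a conditional because a bare `A extends B` is not a type expression on its own.

```typescript
// `Assert<T>` only accepts `true`, so a check that resolves to `false` is a compile error.
type Assert<T extends true> = T;

// Hypothetical, heavily reduced deps shapes for two route registrars.
interface FinalizeDepsSketch { finalize: { synthesize: () => Promise<string> } }
interface HandoffDepsSketch { handoff: { synthesize: () => Promise<string> } }
type AllRegisteredRouteDepsSketch = FinalizeDepsSketch & HandoffDepsSketch;

// The context the server builds; it must structurally satisfy every registrar's deps.
interface ServerContextSketch {
  finalize: { synthesize: () => Promise<string> };
  handoff: { synthesize: () => Promise<string> };
}

// Resolves to `true` today. Removing `handoff` from ServerContextSketch would make the
// conditional `false`, and `Assert<false>` does not compile.
export type ServerContextCoversRoutes = Assert<
  ServerContextSketch extends AllRegisteredRouteDepsSketch ? true : false
>;
```

Because the check lives in the type system, an incomplete object literal at the call site does not weaken it.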

Refs nexu-io/open-design#462 — addresses @nettee's round-4 review note on PR #1718.

* feat(web): add "Resume conversation in new chat" action (#462)

Adds a Resume control to the chat header, next to "New conversation".
Clicking it synthesizes a handoff prompt from the current transcript
via POST /api/projects/:id/handoff, opens a fresh conversation, and
auto-sends the synthesized prompt as its first user message — so a
drifted session resumes without the user replaying context by hand.
The old conversation is preserved.

- synthesizeHandoff() web-state wrapper in apps/web/src/state/projects.ts
- resume-conversation icon button in ChatPane (onResumeConversation /
  resumeConversationDisabled props)
- handleResumeConversation + pendingResumeRef + auto-send effect in
  ProjectView; effect gates on messagesConversationId so the prompt
  cannot fire before the new conversation's message read settles
- chat.resumeConversation i18n key across all 19 locales

Commit B of #462; Commit A is the daemon endpoint (PR #1718). This
branch is stacked on feat/handoff-endpoint so the web code resolves
@open-design/contracts/api/handoff.

* fix(daemon): scope handoff to one conversation + reject empty transcripts (#462)

Addresses the review on #1718 and #2264:

- mrcfps (#2264): the handoff endpoint exported the whole project's
  transcript, so a multi-conversation project blended unrelated chats
  into the synthesized prompt. HandoffRequest now carries a required
  conversationId; the route validates it belongs to the project
  (404 CONVERSATION_NOT_FOUND), and exportProjectTranscript takes an
  optional conversationId filter so only that conversation is exported.
- nettee (#1718): a zero-message conversation still called Anthropic and
  fabricated a handoff. synthesizeHandoffPrompt now throws
  EmptyTranscriptError on messageCount === 0; the route maps it to
  400 EMPTY_TRANSCRIPT before any BYOK tokens are spent.

HANDOFF_SCHEMA_VERSION bumped to 2 (conversationId is a new required
request field). Regression tests: a two-conversation scoping test, an
empty-conversation route + pipeline test, and a transcript-export
conversationId-filter unit test.
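
A sketch of the two new guards, assuming an in-memory list of messages tagged with a conversation id. `TranscriptMessage`, `scopeTranscript`, `statusForHandoffError`, and the error classes are hypothetical. In the daemon, the filter lives in `exportProjectTranscript` and the empty check lives in `synthesizeHandoffPrompt`. Both run before any Anthropic call.

```typescript
// Hypothetical message shape: every message knows which conversation it belongs to.
interface TranscriptMessage { conversationId: string; role: "user" | "assistant"; text: string }

export class ConversationNotFoundError extends Error {}
export class EmptyTranscriptError extends Error {}

// Keep only the requested conversation; refuse before any tokens are spent if it is unknown or empty.
export function scopeTranscript(
  messages: TranscriptMessage[],
  projectConversationIds: Set<string>,
  conversationId: string,
): TranscriptMessage[] {
  if (!projectConversationIds.has(conversationId)) throw new ConversationNotFoundError(conversationId);
  const scoped = messages.filter((m) => m.conversationId === conversationId);
  if (scoped.length === 0) throw new EmptyTranscriptError(conversationId);
  return scoped;
}

// Route-side mapping for the two new failure classes; anything else is left to other branches.
export function statusForHandoffError(err: unknown): { status: number; code: string } | null {
  if (err instanceof ConversationNotFoundError) return { status: 404, code: "CONVERSATION_NOT_FOUND" };
  if (err instanceof EmptyTranscriptError) return { status: 400, code: "EMPTY_TRANSCRIPT" };
  return null;
}
```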

* feat(web): send conversationId with the resume handoff request (#462)

Follows the handoff endpoint becoming conversation-scoped. The resume
flow now passes the active conversationId to POST /handoff so the
synthesized prompt summarizes only the conversation being resumed.
handleResumeConversation bails when there is no active conversation;
synthesizeHandoff and the resume tests carry the new field.

* feat(daemon): add `od project handoff` CLI + register handoff error codes (#462)

Addresses the second-round review on #1718 and #2264:

- mrcfps (#2264): per AGENTS.md "Capability exposure (UI/CLI dual-track)",
  a user-facing capability must be reachable through the `od` CLI, not
  only the web UI. Adds `od project handoff <id> --conversation <id>
  --api-key <key> --model <model> [--base-url] [--max-tokens] [--json]`,
  driving the same POST /api/projects/:id/handoff endpoint. The logic
  lives in a testable handoff-cli.ts sibling module (mirrors
  artifacts-cli.ts) so cli.ts's import-time dispatch stays out of tests.
- nettee (#1718): the route emitted CONVERSATION_NOT_FOUND and
  EMPTY_TRANSCRIPT, which were absent from the shared API_ERROR_CODES
  union. Both are now registered in packages/contracts/src/errors.ts,
  with a contract test pinning them so the route and contract cannot
  drift again.

A CLI contract test covers the conversation-scoped request shape,
--json output, flag validation, and daemon-error surfacing.

* fix(daemon): fail `od project handoff` on a malformed 2xx response (#462)

Addresses nettee's review on #1718: runProjectHandoff treated any 2xx
response as success, so a broken daemon/proxy 200 with malformed or
shape-invalid JSON would print `undefined` (or `{}` under --json) and
still exit 0 — breaking the fail-fast contract scripts rely on. It now
validates the body is a well-formed HandoffResponse via an
isHandoffResponse type guard and fails fast otherwise. Regression tests
cover a shape-invalid and an unparseable 200 body.
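
A sketch of such a type guard. This commit does not list the real `HandoffResponse` fields, so the `schemaVersion` and `prompt` fields and `parseHandoffBody` are assumptions for illustration only.

```typescript
// Assumed response shape; the real type is exported from
// @open-design/contracts/api/handoff and may have different fields.
interface HandoffResponseSketch { schemaVersion: number; prompt: string }

export function isHandoffResponse(value: unknown): value is HandoffResponseSketch {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const prompt = record.prompt;
  return typeof record.schemaVersion === "number" && typeof prompt === "string" && prompt.length > 0;
}

type ParseResult = { ok: true; prompt: string } | { ok: false; exitCode: 1; message: string };

// Treat a 2xx with a bad body as a failure so scripts see a non-zero exit, not `undefined`.
export function parseHandoffBody(raw: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, exitCode: 1, message: "daemon returned an unparseable body" };
  }
  if (!isHandoffResponse(parsed)) {
    return { ok: false, exitCode: 1, message: "daemon returned a malformed handoff response" };
  }
  return { ok: true, prompt: parsed.prompt };
}
```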

* feat(web): surface the daemon's classified handoff error in the resume toast (#462)

Addresses mrcfps's non-blocking note on #2264: synthesizeHandoff returned
null for every non-2xx response, so RATE_LIMITED, EMPTY_TRANSCRIPT, and an
upstream 400 with provider detail all collapsed into one generic "check
your API key" toast — even though handoff-routes.ts had already classified
and sanitized them.

synthesizeHandoff now returns the daemon's structured `{ error }` on a
classified failure; `null` stays reserved for a transport failure or an
unparseable body. handleResumeConversation surfaces error.message plus
redacted details for the `{ error }` case, and a distinct
daemon-unreachable message for null.
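
The three-way result can be sketched as follows. The body shapes, `classifyHandoff`, `resumeToast`, and the toast strings are hypothetical. Here, `bodyText === null` stands in for a fetch that never produced a response.

```typescript
interface ApiError { code: string; message: string; details?: string }
type HandoffResult = { prompt: string } | { error: ApiError } | null;

// `bodyText === null` models a transport failure (the request never got a response).
export function classifyHandoff(ok: boolean, bodyText: string | null): HandoffResult {
  if (bodyText === null) return null;
  let body: unknown;
  try {
    body = JSON.parse(bodyText);
  } catch {
    return null; // unparseable body
  }
  if (typeof body !== "object" || body === null) return null;
  const b = body as { prompt?: unknown; error?: { code?: unknown; message?: unknown; details?: unknown } };
  if (ok && typeof b.prompt === "string") return { prompt: b.prompt };
  const e = b.error;
  if (!ok && e && typeof e.code === "string" && typeof e.message === "string") {
    const details = typeof e.details === "string" ? e.details : undefined;
    return { error: { code: e.code, message: e.message, details } };
  }
  return null;
}

// Toast text: the daemon's classified message (plus details), or a distinct unreachable message.
export function resumeToast(result: HandoffResult): string | null {
  if (result === null) return "Could not reach the daemon.";
  if ("error" in result) {
    const { message, details } = result.error;
    return details ? `${message} (${details})` : message;
  }
  return null; // success: no toast
}
```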

* fix(web): omit empty baseUrl from the resume handoff request (#462)

Addresses mrcfps's review on #2264: the default Anthropic config
normalizes baseUrl to '' (config.ts), and the handoff route 400s an
explicit empty baseUrl — so the Resume action failed before synthesis
for every user who never set a custom base URL.

handleResumeConversation now forwards baseUrl only when config.baseUrl
is a non-empty string, matching the contract's optional-field semantics.
Tests: the default-config path asserts baseUrl is absent from the
request, and a new case covers a custom baseUrl being forwarded.
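
A sketch of the conditional forwarding. The config and request shapes are hypothetical and reduced to the fields that matter here.

```typescript
// Hypothetical shapes; the real config and HandoffRequest types have more fields.
interface ProviderConfigSketch { apiKey: string; model: string; baseUrl: string }
interface HandoffRequestSketch { conversationId: string; apiKey: string; model: string; baseUrl?: string }

export function buildHandoffRequest(config: ProviderConfigSketch, conversationId: string): HandoffRequestSketch {
  return {
    conversationId,
    apiKey: config.apiKey,
    model: config.model,
    // The default config normalizes baseUrl to ''; spread nothing so the optional field is absent.
    ...(config.baseUrl.length > 0 ? { baseUrl: config.baseUrl } : {}),
  };
}
```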

* refactor(daemon): dispatch `od project handoff` before the generic project parser (#462)

Addresses nettee's non-blocking note on #1718: runProject ran the shared
parseFlags(PROJECT_*) before reaching the handoff switch case, so a
malformed `od project handoff` invocation (`--unknown`, `--max-tokens`
with no value) threw out of the generic parser instead of hitting
handoff-cli's structured fail() — the entrypoint behaved differently
from the unit-tested runProjectHandoff helper.

The handoff sub now short-circuits before parseFlags / projectDaemonUrl,
so `od project handoff` runs exactly runProjectHandoff with no
intervening parsing. handoff-cli.test.ts gains unknown-flag and
missing-value cases covering the structured fail path.
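
A sketch of the dispatch order, assuming a generic parser that throws on flags it does not recognise. `runProject`, `runHandoffSketch`, and `parseGenericProjectFlags` are hypothetical. The flag list mirrors the `od project handoff` usage given in the earlier commit.

```typescript
type CliResult = { exitCode: number; stderr: string };

const HANDOFF_VALUE_FLAGS = new Set(["--conversation", "--api-key", "--model", "--base-url", "--max-tokens"]);

// Hypothetical handoff runner: reports bad input as a structured failure instead of throwing.
function runHandoffSketch(args: string[]): CliResult {
  for (let i = 0; i < args.length; i += 1) {
    const arg = args[i];
    if (!arg.startsWith("--") || arg === "--json") continue; // positional <id> or boolean flag
    if (!HANDOFF_VALUE_FLAGS.has(arg)) return { exitCode: 1, stderr: `handoff: unknown flag ${arg}` };
    const value = args[i + 1];
    if (value === undefined || value.startsWith("--")) {
      return { exitCode: 1, stderr: `handoff: ${arg} needs a value` };
    }
    i += 1; // skip the consumed value
  }
  return { exitCode: 0, stderr: "" };
}

// Hypothetical shared parser for the other project subcommands: throws on any flag.
function parseGenericProjectFlags(args: string[]): void {
  for (const arg of args) {
    if (arg.startsWith("--")) throw new Error(`unknown option ${arg}`);
  }
}

export function runProject(sub: string, args: string[]): CliResult {
  // Short-circuit first, so `handoff` never reaches the generic parser.
  if (sub === "handoff") return runHandoffSketch(args);
  parseGenericProjectFlags(args);
  return { exitCode: 0, stderr: "" };
}
```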

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
2026-05-20 13:28:27 +08:00
Tom Huang
86ec951fb9
[codex] Add automation templates and proposal workflows (#2193)
* feat(web): introduce Automations tab with dual-track capability for routines

This commit adds a new Automations tab that consolidates routines, schedules, and live artifacts, allowing users to manage automations seamlessly. The tab features a modal for creating and editing automations, which supports various scheduling options (hourly, daily, weekdays, weekly) and project modes (create_each_run, reuse). The CLI is also updated to expose automation commands, ensuring consistency between the web UI and CLI interfaces.

Key changes include:
- New `NewAutomationModal` component for automation creation and editing.
- Updated `TasksView` to integrate the new Automations functionality.
- Enhanced styling for the Automations tab to improve user experience.

This implementation aligns with the dual-track capability exposure policy, ensuring all features are accessible via both the web UI and CLI.

* feat(daemon): enhance automation context handling and CLI commands

This commit introduces several improvements to the automation context management and updates the CLI commands accordingly. Key changes include:

- Added support for new context fields (`plugin`, `mcp`, `connector`) in automation commands.
- Updated the CLI to reflect new target options (`new-project`).
- Enhanced error messages for invalid target inputs.
- Introduced functions to handle context selection and normalization for routines, including the ability to parse and store context data in the database.
- Updated the database schema to include a new `context_json` field for routines.
- Improved the handling of context in routine routes and the web interface, ensuring that selected contexts are properly managed and displayed.

These changes aim to provide a more robust and flexible automation experience, aligning with the recent enhancements in the web UI.

* feat(web): enhance TasksView with automation run history and status indicators

This commit introduces several new features to the TasksView component, including:

- Added functionality to display automation run history for each routine, showing metadata such as status, timestamps, and project details.
- Implemented status indicators for routine runs, providing visual feedback on their current state (succeeded, failed, running, queued).
- Enhanced the UI to allow users to expand and view detailed run history, including the ability to open the corresponding project conversation.
- Updated styles to improve the presentation of automation statuses and history.

These changes aim to provide users with better insights into their automation routines and improve overall usability.

* feat(daemon): implement automation ingestion and proposal management

This commit introduces several new features related to automation ingestion and proposal management within the daemon. Key changes include:

- Added new modules for handling automation source packets and proposals, allowing for the storage, retrieval, and management of automation-related data.
- Implemented functions to list, create, and apply automation proposals, enhancing the automation workflow.
- Introduced new CLI commands for interacting with memory entries and automation sources, providing users with more control over their automation processes.
- Enhanced the server routes to support automation source and proposal APIs, enabling seamless integration with the existing system.

These changes aim to improve the overall automation experience, making it easier for users to manage and utilize automation proposals and ingestions effectively.
2026-05-19 16:35:28 +08:00
chaoxiaoche
46a64edce3
feat(design-systems): extract component manifests (#2051)
Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
2026-05-18 16:48:59 +08:00
lefarcen
53997990b7
Merge origin/main (post-0.7.0) into reconciled garnet branch
Second-pass merge layering 41+ new commits from origin/main on top of
the first reconcile commit. Headline upstream additions absorbed:

- 0.7.0 release: redesigned chat bubble user-text styling, neutralised
  palette, lucide icons, ElevenLabs audio voice option discovery in the
  prompt composer, analytics tracking (PostHog) wired across home /
  studio / create surfaces, Prometheus `/api/metrics` endpoint,
  critique-theater drop-in mount with a settings toggle.
- Misc upstream fixes (titlebar padding, release header layout, deck
  preview chrome, feedback form auto-scroll, conversation-created SSE
  on routine runs, etc.)

Conflict resolutions (12 files, ~22 hunks):

- contracts barrel + prompts/system: union of both sides; new analytics
  exports (`./analytics/events`, `./analytics/public-params`) added
  alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice
  fields (audioVoiceOptions/audioVoiceOptionsError, main) and
  pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput.
- daemon/server.ts: Prometheus `/api/metrics` route inserted after
  garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call
  added before the chat-run service init alongside the prior reconcile
  note about the dropped legacy POST /api/projects body.
- App.tsx: handleCreateProject now consumes both garnet's plugin
  fields (pluginId / appliedPluginSnapshotId / pluginInputs /
  autoSendFirstMessage) and main's analytics requestId. Tracking
  fires success + failure paths; PluginLoopHome auto-send sessionStorage
  flag is preserved.
- ProjectView.tsx: the garnet auto-send useEffect coexists with main's
  `useCritiqueTheaterEnabled()` hook.
- ChatComposer.tsx: imports merged (drop now-unused fetchSkills,
  add analytics provider + tracking + buildVisualAnnotationAttachment).
- index.css: main's redesigned `.msg.user .user-text` chat bubble
  styling wins over garnet's plain text rule; garnet's
  `.msg-plugin-chip*` rules preserved alongside.
- EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with
  reconcile decision #2. main's added PetRail / TopTab / analytics
  view tracking is intentionally NOT brought into the wrapper; the
  follow-up to re-integrate PetRail / image-templates / video-templates
  into EntryShell still stands and now also covers analytics
  view-tracking hooks.
- daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node +
  prom-client coexist).
- Test fixtures (FileWorkspace.test): kept garnet's plugin-folders
  describe block intact; main's projectKind="prototype" addition is
  dropped where it conflicted with garnet's plugin-folder fixture
  files.

Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck`
exits 0 across all workspace packages.

Follow-up not done in this commit:
- PetRail / image-templates / video-templates / 0.7.0 analytics
  view-tracking hooks need to be added to EntryShell.
- Critique-theater settings toggle UX (added on main) lives in the
  SettingsDialog hierarchy; the reconcile state preserves the
  SettingsDialog so this should work without changes, but it has not
  yet been verified end to end.
2026-05-13 23:29:56 +08:00
lefarcen
d3602be666
Merge origin/main into garnet-hemisphere (reconcile)
Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the
161-commit garnet-hemisphere line, reconciling the product-vibe-coded
plugin/marketplace/EntryShell surfaces from garnet with the routines /
skills / live-artifacts feature work landed on main since the fork point.

Headline decisions (full rationale + side-by-side screenshots in
`specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`):

- #1 SettingsDialog: keep main's Memory / Skills / External MCP /
  Connectors / Routines / MCP server nav items even though the top-level
  /integrations + /automations routes also cover them. Two entries
  coexist for now; revisit once Track A/B fill in the placeholder content.
- #2 EntryView: accept garnet's thin wrapper delegating to EntryShell.
  Main's PetRail sidebar + image-templates/video-templates tabs are
  intentionally deferred to a follow-up that re-integrates them into
  the new EntryShell layout.
- #3 /integrations + /automations top-level routes: kept (garnet's
  product intent). Skills tab is still a "Coming soon" placeholder
  awaiting Track A; Routines/Schedules/Live-artifacts cards on
  /automations are still mock awaiting Track B.
- #5 DesignFilesPanel: hybrid. Main's pagination is the primary list,
  and garnet's Plugin folders section is preserved between the
  live-artifacts block and the pagination block. (The by-kind sections
  are dropped in favour of pagination; plugin-folders rendering stays
  because it is a garnet-specific product addition.)
- #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk
  merge. Garnet's daemon admin and plugin/genui routes and main's
  routines/memory/skills upgrades are all preserved. Garnet's inline
  project route block is kept alongside main's `registerProjectRoutes` /
  `registerProjectUploadRoutes` modular wiring; auditing the duplicate
  routes is a follow-up. Garnet's POST /api/projects plugin-snapshot
  resolution + default-scenario fallback is intentionally dropped from
  the inline body (now handled by registerProjectRoutes) and listed for
  follow-up re-integration into `project-routes.ts`.

Verification (worktree at /Users/elian/Documents/open-design-garnet):
- `pnpm typecheck` exits 0 across all workspace packages
- daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots,
  serves `/api/daemon/status` healthy, and survives a Playwright
  walkthrough of /integrations / /automations / home / projects /
  design-systems / plugins / settings dialog
- `@open-design/plugin-runtime` package built (its dist/ was missing on
  garnet); without it the daemon's plugins/* imports fail at boot

Track A (Skills tab → real SkillsSection) and Track B (Automations
cards → real routines / live-artifacts backend) are the two remaining
follow-ups blocking the placeholder/mock content from going live. See
`spec.md` and `track-skills.md` in the same directory.
2026-05-13 22:29:21 +08:00
lefarcen
e1bc83a476
feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428)
* feat(analytics): scaffold PostHog product-analytics integration

- Add @open-design/contracts/analytics subpath with the 17 P0 event
  payload types, header constants, and code↔CSV enum mapping helpers.
- Add apps/daemon/src/analytics.ts with env-gated posthog-node client,
  request-scoped analytics context reader, and artifact-id anonymizer.
- Expose GET /api/analytics/config so the web bundle never embeds the
  PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST.
- Add apps/web/src/analytics module (identity + lazy posthog-js client
  + React provider) and mount it under <I18nProvider> in app/layout.

No event wiring yet — that lands in the next commit alongside trigger
points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer,
runs.ts).

* feat(analytics): wire app_launch, home_view, home_click, project_create_result

- App.tsx: fire app_launch once after first effect tick. handleCreateProject
  now emits project_create_result on both success and failure paths.
- EntryView.tsx: home_view (page) gated on agents loading so
  has_available_cli isn't transiently false; home_view (asset_panel) fires
  per top-tab change with the right result_count.
- NewProjectPanel.tsx: home_click create_button fires before delegating to
  the parent; a fresh request_id is generated here and threaded through
  onCreate so the matching project_create_result stitches via $insert_id.
- contracts/analytics: tighten createTabToTracking and topTabToTracking
  for the worktree branch's renamed tabs (live-artifact, templates).

* feat(analytics): wire settings_view + 3 settings_click events

- settings_view fires on dialog mount and on every section switch,
  carrying the active section (mapped via settingsSectionToTracking
  for the 16-section worktree layout), execution_mode, and the
  selected CLI provider id when present.
- settings_click execution_mode_tab: setMode now emits before/after
  values whenever the user toggles between Local CLI and BYOK.
- settings_click cli_provider_card: agent card onClick reports
  cli_provider_id via agentIdToTracking (kiro → other).
- settings_click byok_field: onFocus added to api_key, model select,
  and base_url inputs; provider_id widened to include google so the
  worktree's Gemini protocol slot type-checks.

* feat(analytics): wire studio_view + studio_click chat, studio_view artifact

- packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper
  produces a 16-hex anonymized id for (projectId, fileName). Stable
  cross-platform so the daemon and the web bundle resolve the same id
  without a Web Crypto round-trip; daemon now re-exports it.
- ChatComposer: studio_view chat_panel fires once per project mount,
  studio_click chat_composer fires on attachment + send buttons with
  estimated user_query_tokens (length/4) and has_attachment.
- FileViewer: studio_view artifact fires once per (project, file) at
  the dispatcher level, before any sub-viewer renders, with
  artifact_kind derived from the renderer registry / file.kind table.
- Widen TrackingExportFormat to include markdown and cloudflare_pages
  so the worktree branch's full share menu can emit verbatim.
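The anonymized artifact id described above can be sketched as FNV-1a 64-bit over UTF-8 bytes, using BigInt to keep full 64-bit precision. This is an illustration of the technique, not the contracts helper itself: `anonymizeArtifactId` and its NUL separator between projectId and fileName are hypothetical, since the commit does not say how the pair is combined.

```typescript
// Standard FNV-1a 64-bit parameters.
const FNV64_OFFSET_BASIS = 0xcbf29ce484222325n;
const FNV64_PRIME = 0x100000001b3n;
const MASK_64 = (1n << 64n) - 1n;

// FNV-1a over the UTF-8 bytes of `input`, rendered as 16 lowercase hex chars.
function fnv1a64Hex(input: string): string {
  let hash = FNV64_OFFSET_BASIS;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte); // xor the byte in first (the "1a" ordering)...
    hash = (hash * FNV64_PRIME) & MASK_64; // ...then multiply, truncated to 64 bits
  }
  return hash.toString(16).padStart(16, "0");
}

// Hypothetical composition: the real key format for (projectId, fileName) is not stated.
function anonymizeArtifactId(projectId: string, fileName: string): string {
  return fnv1a64Hex(`${projectId}\u0000${fileName}`);
}
```

Because the hash is plain integer arithmetic, the daemon and the web bundle produce the same id without any Web Crypto call, which is the property the commit relies on.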

* feat(analytics): wire studio_click share_option + artifact_export_result

HtmlViewer's share menu now emits both events per click via a
fireShareExport helper:

- studio_click share_option fires immediately on click with the chosen
  export_format and a fresh request_id.
- artifact_export_result fires when the export resolves — success for
  sync exporters (html, markdown, template) the moment the call
  returns, success/failed for async exporters (pdf, zip, deploy)
  via .then/.catch. The same request_id threads both events so
  PostHog stitches click → result via $insert_id.

DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages
slots; markdown is now a first-class export_format value.

Also ignore .env.local so local POSTHOG_KEY / .env-style secrets
don't get committed.

* feat(analytics): emit run_created and run_finished from the daemon

POST /api/runs now reads the analytics context off the
x-od-analytics-* headers the web client sets on every fetch, then:

- Captures run_created with project_id, conversation_id, run_id,
  model_id, agent_provider_id (mapped via agentIdToTracking),
  skill_id, design_system_id, plus the token_count_source marker.
- Schedules a run_finished capture on runs.wait(run) resolution,
  mapping succeeded/canceled/failed to success/cancelled/failed and
  reporting total_duration_ms.
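A minimal sketch of the run_finished status mapping above. The three run statuses, their tracked spellings, and total_duration_ms come from the commit; the `runFinishedTiming` helper, the `result` property name, and the clamp to zero are assumptions for illustration.

```typescript
type RunStatus = "succeeded" | "canceled" | "failed";
type RunResult = "success" | "cancelled" | "failed";

// Note the spelling change: the run service says "canceled", the tracking enum "cancelled".
const RUN_RESULT_BY_STATUS: Record<RunStatus, RunResult> = {
  succeeded: "success",
  canceled: "cancelled",
  failed: "failed",
};

interface RunFinishedTiming {
  result: RunResult; // hypothetical property name
  total_duration_ms: number;
}

function runFinishedTiming(status: RunStatus, startedAtMs: number, finishedAtMs: number): RunFinishedTiming {
  return {
    result: RUN_RESULT_BY_STATUS[status],
    // Clamp so a clock adjustment never yields a negative duration (an assumption).
    total_duration_ms: Math.max(0, finishedAtMs - startedAtMs),
  };
}
```

Using a `Record` keyed by the status union makes the compiler flag any new run status that lacks a tracked value.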

Both events use a stable insert_id derived from the same uuid so
PostHog dedupes the daemon-side mirror against any future
web-side capture without double-counting.

Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay
omitted in v1 — the claude-stream parser only exposes input/output
totals today. See tracking-doc-issues.md §3.2.

* feat(analytics): emit settings_cli_test_result + settings_byok_test_result

The original BLOCKING-list assumed these CSV P0 events were not
implementable in this branch because main lacked Test buttons. The
worktree HEAD actually wires `handleTestAgent` and `handleTestProvider`
in SettingsDialog, so both events are now in scope.

- handleTestAgent emits settings_cli_test_result on success and
  failure paths with cli_provider_id mapped via agentIdToTracking,
  result drawn from result.ok / catch branch, error_code from
  result.kind or the thrown error name, and duration_ms timed via
  performance.now().
- handleTestProvider emits settings_byok_test_result analogously,
  using apiProtocol (anthropic|openai|azure|ollama|google) directly
  as provider_id — wider than the CSV's 5-value enum, documented in
  tracking-doc-issues.md §2.5.

Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps
plus matching track* helpers. AnalyticsEventName union now covers all
14 P0 events this branch supports.

* feat(analytics): gate PostHog on the existing telemetry.metrics consent

The integration now reuses the same first-launch privacy banner +
Settings → Privacy toggle that gates Langfuse, so a single user
decision controls both telemetry sinks.

- /api/analytics/config now consults the persisted AppConfigPrefs:
  it returns enabled=true only when POSTHOG_KEY is set AND the user
  has chosen "Share usage data" (telemetry.metrics === true). The
  response also echoes installationId so the web client uses the
  same anonymous id Langfuse keys off of — one identity per install,
  shared across both sinks.
- Web AnalyticsProvider:
  - Bootstrap fetch resolves installationId and threads it through
    the x-od-analytics-anonymous-id header on every /api/* fetch,
    so daemon-side captures (run_created / run_finished /
    project_create_result) land on the same person record.
  - Exposes a setConsent(granted) method that calls posthog-js's
    opt_in_capturing / opt_out_capturing, wired from App.tsx via a
    useEffect watching config.telemetry?.metrics. Toggling Privacy
    → metrics now stops/resumes events immediately, no reload.
- app_launch additionally gates on telemetry.metrics so a freshly-
  declined user fires nothing, and a freshly-opted-in user fires on
  the next reload.
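The /api/analytics/config gate can be sketched as a pure function of the environment and the persisted prefs. The types and the `resolveAnalyticsConfig` name are hypothetical. Returning installationId only when enabled is also an assumption, chosen so that it matches the later behaviour where a disabled response leaves the client's anonymous id null.

```typescript
interface AppConfigPrefs {
  installationId?: string;
  telemetry?: { metrics?: boolean };
}

interface AnalyticsConfigResponse {
  enabled: boolean;
  installationId: string | null;
}

function resolveAnalyticsConfig(
  env: Record<string, string | undefined>,
  prefs: AppConfigPrefs,
): AnalyticsConfigResponse {
  const hasKey = (env["POSTHOG_KEY"] ?? "").trim().length > 0;
  // Only an explicit "Share usage data" choice counts; undefined means no decision yet.
  const consented = prefs.telemetry?.metrics === true;
  const enabled = hasKey && consented;
  return { enabled, installationId: enabled ? (prefs.installationId ?? null) : null };
}
```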

* feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env

Wires PostHog product analytics through the same Langfuse-style build-
secret pipeline so official Open Design builds ship with the key while
fork builds compile without it (the integration short-circuits cleanly
when POSTHOG_KEY is absent).

tools/pack
- resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from
  process.env at packaging time, validates them (no whitespace in the
  key, http(s) URL for host, trailing-slash strip), and stamps them on
  ToolPackConfig. Fork builds without the env vars simply omit the
  fields; the daemon-side gate keeps things off in that case.
- Mac, Windows, and Linux packaged-config writers each append the two
  fields to open-design-config.json next to the existing
  telemetryRelayUrl entry.
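The packaging-time validation listed above (no whitespace in the key, an http(s) host, trailing-slash strip, fields omitted on fork builds) can be sketched as below. `resolvePosthogPackFields` is a hypothetical stand-in for the relevant slice of resolveToolPackConfig, and the error messages are illustrative.

```typescript
interface PosthogPackFields {
  posthogKey?: string;
  posthogHost?: string;
}

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

function resolvePosthogPackFields(env: Record<string, string | undefined>): PosthogPackFields {
  const fields: PosthogPackFields = {}; // fork builds: both fields stay absent
  const key = env["POSTHOG_KEY"];
  if (key) {
    if (/\s/.test(key)) throw new Error("POSTHOG_KEY must not contain whitespace");
    fields.posthogKey = key;
  }
  const host = env["POSTHOG_HOST"];
  if (host) {
    const url = parseUrl(host);
    if (!url || (url.protocol !== "http:" && url.protocol !== "https:")) {
      throw new Error(`POSTHOG_HOST must be an http(s) URL, got ${JSON.stringify(host)}`);
    }
    fields.posthogHost = host.replace(/\/+$/, ""); // strip trailing slashes
  }
  return fields;
}
```

Failing loudly at packaging time keeps a malformed secret from silently shipping in a release build.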

apps/packaged
- RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost
  so the Electron entry and headless entry both forward them to the
  daemon sidecar.
- buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into
  the daemon child env when present. The daemon's existing analytics
  module reads these via process.env — no daemon-side changes needed.
- The headless packaged path falls back to process.env for fields the
  builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL
  is read there.

CI
- release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret)
  and POSTHOG_HOST (var) at workflow-env scope so every packaging job
  inherits them. PR / fork builds without these set simply skip the
  bake step.

Tests
- tools/pack: config.test.ts covers bake-through, fork-build omission,
  whitespace rejection, invalid-URL rejection, and trailing-slash
  normalization.
- apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv
  forwarding the keys when present and omitting them when null.

* feat(analytics): enable PostHog autocapture + perf + exceptions

Flip on the PostHog SDK's automatic diagnostic features so we capture
click paths, page transitions, web vitals, dead clicks, and browser
exceptions without scattering instrumentation through the codebase.

Privacy defense lives in one place — apps/web/src/analytics/scrub.ts —
wired in via posthog-js's `before_send` hook so every outgoing event
passes through the same audit point:

  - $autocapture / $rageclick / $dead_click / $copy_autocapture:
    strips $el_text and value/placeholder/aria-label attrs from any
    input, textarea, password input, or contenteditable element. PostHog
    autocapture does not capture input.value by default, but $el_text
    on a <textarea> reflects the typed content — that's the prompt
    body for us, so it has to be scrubbed every time.
  - $pageview / $pageleave: drops query string and fragment from
    $current_url / $referrer so any future ?q=… can't leak.
  - $exception: rewrites file:// and absolute filesystem paths in
    stack frames to app://apps/<repo-relative> so we don't ship the
    user's home directory.
  - Suppresses $opt_in entirely — duplicate of our explicit
    setConsent toggle in App.tsx.
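The $pageview / $pageleave rule above can be sketched with the WHATWG URL API. The function names are hypothetical and this is not the contents of scrub.ts. The fallback for non-URL values such as PostHog's "$direct" referrer is an assumption.

```typescript
type EventProps = Record<string, unknown>;

// Drop ?query and #fragment; non-absolute values are cut at the first ? or #.
function stripQueryAndFragment(raw: string): string {
  try {
    const url = new URL(raw);
    url.search = "";
    url.hash = "";
    return url.toString();
  } catch {
    return raw.split(/[?#]/, 1)[0] ?? "";
  }
}

function scrubPageviewProps(props: EventProps): EventProps {
  const out: EventProps = { ...props };
  for (const key of ["$current_url", "$referrer"]) {
    const value = out[key];
    if (typeof value === "string") out[key] = stripQueryAndFragment(value);
  }
  return out;
}
```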

Element-level defense in depth is limited to the single most sensitive
surface: the chat composer textarea gets `ph-no-capture` so PostHog
never even generates an event for clicks inside that subtree. Every
other input relies on scrub.ts — sprinkling the class through every
form would be noisy and easy to forget on new surfaces.

The existing Privacy → "Share usage data" toggle continues to gate
every new feature: posthog-js's opt_out_capturing() halts autocapture,
$pageview, $exception, web vitals, and dead clicks alongside the
explicit capture() calls — one global switch.

11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts.

* ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions

Adding posthog-js to apps/web and posthog-node to apps/daemon changed
pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by
sha256. The CI nix flake check failed with:

  specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc=
  got:       sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s=

Copy the new hash into both nix/package-web.nix and
nix/package-daemon.nix per the procedure documented in nix/README.md
§"First-build hash pinning".

* feat(analytics): unify PostHog identity with Langfuse installationId

PostHog's distinct_id is the installationId stamped by /api/analytics/config;
Langfuse already reads the same id off app-config.json to
populate trace.userId. With both sinks keying off the same anonymous
identity, dashboards can correlate user actions (PostHog events) with
LLM runs (Langfuse traces) without re-identifying.

Two gaps closed:

1. applyConsent(false) — clear posthog-js's persisted ph_*_posthog
   localStorage entry on opt-out via posthog.reset(). Without this, a
   user who opts out, then clicks Delete my data, then re-opts in
   would see PostHog stitch their new session to the deleted identity
   because bootstrap.distinctID only takes effect on first init.

2. applyIdentity(newInstallationId) — Delete my data rotates the
   installationId in app-config; App.tsx now watches config.installationId
   and calls posthog.reset() then identify(newId) so the next event
   batch is fully decoupled from the deleted one. Idempotent on
   same-id re-renders so benign config refreshes don't churn PostHog
   identities.
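The rotation guard in item 2 can be sketched against a minimal client interface, so it can be exercised without posthog-js. The interface mirrors the `reset()` / `identify(id)` pair the commit names; `createIdentitySync` and the initial-id parameter (standing in for the bootstrap distinctID) are hypothetical.

```typescript
interface IdentityClient {
  reset(): void;
  identify(distinctId: string): void;
}

function createIdentitySync(client: IdentityClient, initialId: string | null) {
  let currentId = initialId;
  return function applyIdentity(installationId: string | null): void {
    // Null inputs and same-id re-renders are no-ops, so config refreshes don't churn identities.
    if (!installationId || installationId === currentId) return;
    client.reset(); // drop the old distinct_id and persisted state first
    client.identify(installationId);
    currentId = installationId;
  };
}
```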

The fetch wrapper's x-od-analytics-anonymous-id header also flips to
the new id on rotation so daemon-side captures (run_created /
run_finished) land on the same person record from the very next API
call, not after a reload.

The end-to-end rotation flow was verified against a live PostHog
project; the unit tests added here pin only the safety guards
(no-client paths, null inputs), since stubbing posthog-js's init-loaded
callback chain is brittle.

* fix(langfuse): require both metrics AND content consent for trace reports

Tightens the Langfuse gate so a user who shares anonymous metrics but
NOT conversation content stops emitting Langfuse traces entirely —
Langfuse is used for turn-quality evals which only make sense with
prompt/output bodies. PostHog (product analytics, content-free) stays
gated on `metrics` alone and is unaffected.

i18n: "Conversation content" → "Conversation and tool content" with
hints expanded to mention tool inputs/outputs so the consent surface
matches what the trace actually carries (en + zh-CN).

Bundled into this PR: the change originated outside the PostHog work
but applies cleanly to the same files, and gating Langfuse strictly on
`content` makes the dual-sink consent model (PostHog = metrics,
Langfuse = metrics + content) symmetric across both i18n locales and
the daemon-side gate.

* feat(analytics): wire byok_provider_option + fix PR review P1s

Adds the BYOK protocol-chip click event (5-value provider_id mirroring
the apiProtocol Settings UI) and resolves four P1 review threads on
PR #1428.

byok_provider_option:
- New SettingsClickByokProviderOptionProps in contracts (provider_id =
  anthropic|openai|azure|google|ollama; maps to CSV's 5 values per
  tracking-doc-issues.md §2.5).
- trackSettingsClickByokProviderOption helper in apps/web/src/analytics.
- SettingsDialog hooks it on the protocol-chip onClick alongside the
  existing setApiProtocol call; is_selected reflects whether the chip
  was already active.

Review fixes:

1. client.ts (Siri-Ray): clear `initPromise` when the resolution is
   null so a Privacy → metrics opt-in after a previous decline triggers
   a fresh /api/analytics/config fetch. Without this, the disabled
   response was cached forever — first-session opt-in needed a reload
   to start sending PostHog events.

2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a
   strict same-origin + /api/ pathname check (shared
   `isSameOriginApiCall` helper). Outbound third-party URLs containing
   `/api/` (e.g. provider.example.com/api/x) no longer receive our
   x-od-analytics-* headers.

3. provider.tsx (codex-connector, lefarcen): gate header injection on
   `resolvedAnonId` being non-null. When Privacy → metrics is off,
   /api/analytics/config returns enabled=false → resolvedAnonId stays
   null → wrapper never installs → daemon can't read consent-bearing
   headers → no daemon-side PostHog event. setConsent now also clears
   resolvedAnonId on opt-out and re-fetches on opt-in.

4. daemon/analytics.ts (defense in depth): createAnalyticsService now
   takes dataDir and capture() re-reads app-config to check
   telemetry.metrics inside the fire-and-forget wrapper. Even if a
   stale header somehow reaches the daemon after opt-out, the capture
   is dropped before posthog-node.capture is called.
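The same-origin check from review fix 2 can be sketched as follows. Taking the page origin as an explicit parameter (rather than reading window.location) and accepting only string inputs are assumptions for testability; the real `isSameOriginApiCall` signature is not shown in the commit.

```typescript
function resolveUrl(raw: string, base: string): URL | null {
  try {
    return new URL(raw, base);
  } catch {
    return null;
  }
}

// True only for requests to this page's origin whose path is under /api/.
function isSameOriginApiCall(requestUrl: string, pageOrigin: string): boolean {
  const page = resolveUrl(pageOrigin, pageOrigin);
  const target = resolveUrl(requestUrl, pageOrigin); // relative URLs resolve against the page
  if (!page || !target) return false;
  return target.origin === page.origin && target.pathname.startsWith("/api/");
}
```

Comparing parsed origins and pathnames, instead of substring-matching the raw string, is what keeps `https://provider.example.com/api/x` from receiving the x-od-analytics-* headers.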

* fix(web): place "Share usage data" on the right in privacy consent banner

Swap button order in PrivacyConsentModal and the in-settings ConsentCard
so the affirmative "Share usage data" lands on the right and "Not now"
on the left. Matches the OK-on-the-right pattern users expect for
primary actions.

Both buttons keep equal visual prominence (same .privacy-consent-action
styling) so the swap doesn't change the EDPB equal-prominence stance
called out in the original Langfuse telemetry spec.

* feat(analytics): populate run_finished token totals from claude-stream usage

Daemon's claude-stream parser already emits agent usage events with
input_tokens / output_tokens totals; the run service buffers them in
run.events and Langfuse reads them out the same way. The run_finished
PostHog event was leaving these fields empty.

Scan run.events for the most recent agent usage frame on terminal
transition and emit input_tokens / output_tokens / total_tokens when
present. token_count_source flips to 'provider_usage' only when at
least one count landed; runs without provider-side usage data keep
'unknown'.

The provider does not break the input down into the 7 sub-fields the
tracking doc lists (memory / context / attachment / system_prompt /
…); those stay omitted until a parser change exposes them.

* feat(analytics): estimate user_query_tokens from prompt length

The user_query_tokens field for run_created / run_finished was hardcoded
to 0. We can't tokenize without bundling a model-specific tokenizer, but
the character/4 heuristic is the industry-standard estimate when one
isn't available and is enough for funnel analysis (prompt-length cohorts,
short-vs-long-query conversion rates).

Extracted from req.body via the same telemetryPromptFromRunRequest
pattern the daemon already uses for langfuse-bridge (currentPrompt then
message fallback). Only the integer count goes to PostHog — the prompt
text itself never leaves the daemon.

token_count_source flips appropriately:
- run_created with a prompt: 'estimated' (was 'unknown')
- run_created with no prompt: 'unknown'
- run_finished with provider usage: 'provider_usage' (overrides
  baseProps' 'estimated' value)
- run_finished without provider usage: inherits 'estimated' or 'unknown'
  from baseProps so input/output absent doesn't mask the estimate.
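The token_count_source rules above can be sketched as two small functions. The property names come from the commit; rounding the character/4 estimate up, the function names, and the `ProviderUsage` shape are assumptions.

```typescript
type TokenCountSource = "estimated" | "provider_usage" | "unknown";

interface TokenProps {
  user_query_tokens?: number;
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
  token_count_source: TokenCountSource;
}

interface ProviderUsage {
  input_tokens?: number;
  output_tokens?: number;
}

// character/4 heuristic; rounding up is an assumption.
function estimateQueryTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// run_created: only the integer estimate is reported, never the prompt text.
function baseTokenProps(prompt: string | null | undefined): TokenProps {
  if (!prompt) return { token_count_source: "unknown" };
  return { user_query_tokens: estimateQueryTokens(prompt), token_count_source: "estimated" };
}

// run_finished: provider usage overrides the marker only when at least one count landed.
function finishedTokenProps(base: TokenProps, usage: ProviderUsage | null): TokenProps {
  const input = usage?.input_tokens;
  const output = usage?.output_tokens;
  if (input === undefined && output === undefined) return { ...base }; // inherit baseProps
  const props: TokenProps = { ...base, token_count_source: "provider_usage" };
  if (input !== undefined) props.input_tokens = input;
  if (output !== undefined) props.output_tokens = output;
  props.total_tokens = (input ?? 0) + (output ?? 0);
  return props;
}
```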
2026-05-12 22:32:42 +08:00
Tom Huang
e254d1280b
feat(memory): auto-memory store with chat-protocol-aware extraction (#999)
* feat(memory): auto-memory store with chat-protocol-aware extraction

Markdown memory store at <dataDir>/memory/ with two extractors: a
heuristic regex for explicit "remember:" / "我是 X" ("I am X") markers,
and a small-model LLM pass after each turn. Extracted memories are folded
into the system prompt so cross-chat preferences, role, and ongoing-work
context survive restarts.

Settings UI:
- Memory tab lists entries, exposes a hand-edited MEMORY.md index, and
  shows an extraction history with per-attempt phase/skip/failure rows.
- Memory model picker is inline next to the chat model picker (CLI and
  BYOK) so the choice "which fast model mines facts each turn?" sits
  next to the chat-model decision instead of a separate panel. The
  picker reuses the same SUGGESTED_MODELS table and "Custom..." pattern
  the chat picker uses.

LLM extractor supports all four protocols (anthropic / openai / azure /
google); pickProvider takes the chat agent id from the chat handler
and constrains its auto-pick to the chat's protocol family — Claude
Code chats no longer surprise users by silently extracting on whatever
OpenAI key happens to be in media-config. When no matching key is
configured the attempt records as 'skipped: no-provider' instead of
quietly switching vendors.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): keep hint outside <label> and disambiguate Model selectors

The inline Memory model picker wrapped its hint paragraph inside the
<label>, which made the hint's "API key" / "model" wording bleed into
the <select>'s accessible name and broke Playwright's getByLabel('API
key') / getByLabel('Model') strict-mode matching in the existing
settings-api-protocol e2e suite.

- Move the hint <p> out of the <label> in MemoryModelInline so the
  select's accessible name is just "Memory model".
- Switch the chat-Model selectors in settings-api-protocol.test.ts from
  getByLabel('Model') to getByRole('combobox', { name: 'Model', exact:
  true }) so they no longer collide with the new "Memory model" select
  that sits next to the chat Model picker.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): address review changes — BYOK wiring, MEMORY.md index, /v1, label wrapper

Addresses the four blocking review threads on PR #999.

1. MemoryModelInline accessibility (mrcfps)
   The inline picker still wrapped its select + custom input + flash +
   hint inside a single <label>, which made the select's accessible
   name absorb every text descendant — including the "API key" / "model"
   hint copy. The previous fix moved only the hint outside; the
   reviewer asked for a non-label wrapper. Switch to <div className="field">
   and associate just the short title with the controls via
   `aria-labelledby` / `aria-label`. The select's accessible name is
   now exactly "Memory model" so `getByLabel` strict-mode locators
   on the surrounding chat form stop cross-matching the memory copy.

2. Respect the hand-edited MEMORY.md index (mrcfps + codex)
   `composeMemoryBody()` was reading every *.md file in the memory
   dir, ignoring the index. Removing a `- [Name](id.md)` line had no
   effect on future prompts. Parse the index's `INDEX_LINK_RE` bullets
   and filter `listMemoryEntries()` to the linked id set, so the
   editor's "delete this line to disable injection" promise actually
   holds.

3. Versioned OpenAI-compatible base URLs (codex)
   `callOpenAI` and `callAnthropic` hard-coded `/v1` onto
   `provider.baseUrl`, breaking custom endpoints whose saved URL
   already includes `/v1` (`/v1/v1/chat/completions`). Apply the same
   conditional `appendVersionedApiPath` helper the chat proxy and
   connection-test routes already use.

4. Wire memory into BYOK / API-mode chats (mrcfps + codex)
   The previous PR's daemon-only memory hook never fired for BYOK,
   leaving the Memory tab + model picker as a no-op for that mode.
   Add the missing surface and wire it through ProjectView:
   - contracts: extend `composeSystemPrompt` with `memoryBody`,
     mirroring the daemon's local composer; add
     `MemorySystemPromptResponse` and the `attemptedLLM` flag on
     `ExtractMemoryResponse`.
   - daemon: expose `GET /api/memory/system-prompt` (returns the
     composed body) and turn `POST /api/memory/extract` into a
     two-phase endpoint — heuristic-only when only userMessage is
     supplied (pre-turn), LLM-only when assistantMessage is also
     supplied (post-turn), so the extraction-history doesn't double
     up.
   - web: ProjectView's BYOK branch now fetches the memory body
     before composing the system prompt, runs the heuristic
     extractor before the run (so "remember:" markers in this turn
     reach this turn's prompt), accumulates assistant text during
     streaming, and queues the LLM extractor on `onDone` — fire-and-
     forget so it never blocks the chat round-trip.
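The index filter from item 2 can be sketched as: parse the `- [Name](id.md)` bullets, then keep only linked entries. The regex below is a hypothetical stand-in for `INDEX_LINK_RE`, whose actual pattern is not shown, and `MemoryEntry` is a simplified shape.

```typescript
// Hypothetical pattern for index bullets such as "- [Role](role.md)".
const INDEX_BULLET_RE = /^\s*[-*]\s+\[[^\]]*\]\(([^)\s]+)\.md\)/;

interface MemoryEntry {
  id: string;
  body: string;
}

function linkedIdsFromIndex(indexMarkdown: string): Set<string> {
  const ids = new Set<string>();
  for (const line of indexMarkdown.split(/\r?\n/)) {
    const match = INDEX_BULLET_RE.exec(line);
    if (match && match[1]) ids.add(match[1]);
  }
  return ids;
}

// Deleting a bullet from MEMORY.md removes that entry from future prompts.
function entriesForPrompt(entries: MemoryEntry[], indexMarkdown: string): MemoryEntry[] {
  const linked = linkedIdsFromIndex(indexMarkdown);
  return entries.filter((entry) => linked.has(entry.id));
}
```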

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): re-sync BYOK memory override when chat config drifts

The inline memory-model picker captured `apiProtocol` / `chatApiKey` /
`chatBaseUrl` / `chatApiVersion` into the saved override only at the
moment the user clicked a model. If they later swapped the BYOK
protocol tab, rotated the API key, or edited the base URL in the same
settings flow, the daemon's background extractor kept calling the
*old* vendor / credential — directly contradicting the picker's
"borrows the surrounding chat picker's protocol, key, base URL, and
api-version automatically" promise.

Add a debounced effect that compares the persisted (masked) shape
against the live chat props and re-PATCHes /api/memory/config when
they drift. The masked config exposes `apiKeyTail` (last 4 chars), so
key rotation is detectable without ever round-tripping the secret
back to the browser. The 300 ms debounce coalesces the keystroke-
granularity prop updates the parent settings dialog streams during
its autosave loop, so a user editing the base URL doesn't trigger one
PATCH per character. Background re-syncs are silent — the "Saved!"
flash only fires for explicit user clicks, so the picker doesn't feel
like it's fighting them as they edit unrelated chat fields.
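The drift check and debounce described above can be sketched as follows. The field names of both shapes are hypothetical except `apiKeyTail`. One limitation follows from comparing only the last 4 characters: a rotated key that happens to share its tail with the old one goes undetected.

```typescript
interface MaskedMemoryConfig {
  provider: string;
  baseUrl: string;
  apiVersion: string;
  apiKeyTail: string; // last 4 chars only; the secret never reaches the browser
}

interface LiveChatProps {
  apiProtocol: string;
  chatApiKey: string;
  chatBaseUrl: string;
  chatApiVersion: string;
}

function hasDrifted(saved: MaskedMemoryConfig, live: LiveChatProps): boolean {
  return (
    saved.provider !== live.apiProtocol ||
    saved.baseUrl !== live.chatBaseUrl ||
    saved.apiVersion !== live.chatApiVersion ||
    saved.apiKeyTail !== live.chatApiKey.slice(-4)
  );
}

// Coalesce keystroke-level prop updates into one call after `ms` of quiet.
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}
```

In use, the component would wrap its silent re-PATCH as `debounce(resync, 300)` and call it only when `hasDrifted` returns true.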

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): thread BYOK chat config through /api/memory/extract default path

Leaving the BYOK memory picker on "Same as chat" still broke the
default LLM extraction path: `MemoryModelInline` clears the override
for that option, both `/api/memory/extract` calls in `ProjectView`
only sent the messages, and the daemon never persists BYOK creds, so
`extractWithLLM(..., { chatAgentId: null })` always reached
`pickProvider()` with no chat context and fell through to env /
media-config, which is the wrong vendor for a BYOK chat whose own
credentials already work for inference.

Thread the live BYOK chat config through the extract endpoint as a
per-call snapshot:

- contracts: extend `ExtractMemoryRequest` with an optional
  `chatProvider` (provider/apiKey/baseUrl/apiVersion/model) and add
  `'chat-byok'` to the credentialSource enum.
- daemon: parse + validate `chatProvider` on `/api/memory/extract`
  (provider must be one of the five known shapes) and forward to
  `extractWithLLM` as a new option. `pickProvider()` gets a new
  path 2 that uses the snapshot directly with the per-protocol
  fast-model default, so a memory pass for a `gpt-4o` / `claude-sonnet-4-5`
  chat silently turns into a cheap `gpt-4o-mini` / `claude-haiku-4-5` call
  instead of paying chat-tier rates for background extraction work. Override and
  CLI-agent-constrained paths still win when they apply.
- web: `ProjectView` snapshots `apiProtocol` / `apiKey` / `baseUrl` /
  `apiVersion` from the live `AppConfig` on each BYOK extract call
  (both pre-turn heuristic-only and post-turn LLM phases). The
  picker's existing drift-resync effect already covers explicit
  overrides; this snapshot covers the implicit "Same as chat"
  default that the override flow can't reach.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(memory): treat empty apiKey on PATCH as a real clear

MemoryModelInline silently re-PATCHes /api/memory/config whenever the
surrounding BYOK chat creds drift. The previous reuse branch lumped
`apiKey === ''` together with `apiKey === undefined`, so clearing the
chat API key from the picker quietly preserved the old daemon-side
secret and kept calling the provider on a stale credential.

Distinguish four states for the apiKey field:
- absent       -> preserve stored secret (form re-save without re-typing)
- ''           -> clear stored secret (user removed it from the picker)
- 'sk-...'     -> replace
- new provider -> ignore stored secret entirely

Add tests/memory-config-route.test.ts covering all four cases.
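The four apiKey states can be sketched as a single resolver. `resolveStoredApiKey` and both config shapes are hypothetical; only the four-way behaviour comes from the commit.

```typescript
interface StoredMemoryConfig {
  provider: string;
  apiKey: string | null;
}

interface MemoryConfigPatch {
  provider: string;
  apiKey?: string; // absent, '' and a real value mean different things
}

function resolveStoredApiKey(stored: StoredMemoryConfig | null, patch: MemoryConfigPatch): string | null {
  // New provider: never reuse a secret that belonged to another vendor.
  if (!stored || stored.provider !== patch.provider) return patch.apiKey || null;
  if (patch.apiKey === undefined) return stored.apiKey; // absent: preserve
  if (patch.apiKey === "") return null; // '': explicit clear
  return patch.apiKey; // 'sk-...': replace
}
```

Checking `=== undefined` and `=== ""` separately is the whole fix: a truthiness test would fold the two together again.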

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 15:45:42 +08:00
Cursor Agent
e6eaa62294
feat(plugins): handoff atom + ArtifactManifest provenance fields
Plan N3 / spec §11.5.1 / §21.5.

@open-design/contracts ArtifactManifest gains the spec §11.5.1
provenance + downstream-distribution surface as additive optional
fields:

  sourcePluginSnapshotId / sourcePluginId / sourcePluginVersion /
    sourceTaskKind / sourceRunId / sourceProjectId / parentArtifactId
  artifactKind / renderKind / handoffKind
  exportTargets[] / deployTargets[]

Spec §11.5.1 invariants:
  - sourcePluginSnapshotId NEVER changes after first write.
  - exportTargets[] / deployTargets[] are append-only.
  - handoffKind promotes monotonically along
    design-only < implementation-plan < patch < deployable-app.

apps/daemon/src/plugins/atoms/handoff.ts ships the daemon-side
helper:

  recordHandoff({ manifest, exportTarget?, deployTarget?,
                  handoffKind?, enforceMonotonicHandoff? })
    → { manifest, changed }

  - Idempotent: a (surface, target) pair only ever lands once on
    exportTargets[]; same for (provider, location) on deployTargets[].
  - handoffKind defaults to monotonic; pass enforceMonotonicHandoff:
    false on a rollback path.

  isDeployableAppEligible({ manifest, buildPassing, testsPassing })
    → boolean

  Spec §11.5.1 promotion rule for the deployable-app tier: requires
  build.passing + tests.passing AND at least one exportTargets[]
  entry on docker / cli surface. Centralises the rule so plugins
  don't reimplement it.
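A simplified sketch of the two handoff helpers above. It follows the stated invariants (append-only, idempotent exportTargets[] keyed by (surface, target), monotonic handoffKind with a rollback escape) but omits deployTargets[] and the other manifest fields. Silently ignoring a downgrade, rather than throwing, is an assumption; the daemon's helper may refuse differently.

```typescript
const HANDOFF_ORDER = ["design-only", "implementation-plan", "patch", "deployable-app"] as const;
type HandoffKind = (typeof HANDOFF_ORDER)[number];

interface ExportTarget {
  surface: string;
  target: string;
}

interface ManifestSlice {
  exportTargets?: ExportTarget[];
  handoffKind?: HandoffKind;
}

function recordHandoff(opts: {
  manifest: ManifestSlice;
  exportTarget?: ExportTarget;
  handoffKind?: HandoffKind;
  enforceMonotonicHandoff?: boolean;
}): { manifest: ManifestSlice; changed: boolean } {
  const { exportTarget, handoffKind, enforceMonotonicHandoff = true } = opts;
  let manifest = opts.manifest;
  let changed = false;

  if (exportTarget) {
    const existing = manifest.exportTargets ?? [];
    const alreadyRecorded = existing.some(
      (t) => t.surface === exportTarget.surface && t.target === exportTarget.target,
    );
    if (!alreadyRecorded) {
      manifest = { ...manifest, exportTargets: [...existing, exportTarget] }; // append-only
      changed = true;
    }
  }

  if (handoffKind && handoffKind !== manifest.handoffKind) {
    const currentRank = manifest.handoffKind ? HANDOFF_ORDER.indexOf(manifest.handoffKind) : -1;
    const isPromotion = HANDOFF_ORDER.indexOf(handoffKind) > currentRank;
    if (isPromotion || !enforceMonotonicHandoff) {
      manifest = { ...manifest, handoffKind };
      changed = true;
    }
  }
  return { manifest, changed };
}

// deployable-app tier: build + tests passing AND a docker or cli export on record.
function isDeployableAppEligible(args: {
  manifest: ManifestSlice;
  buildPassing: boolean;
  testsPassing: boolean;
}): boolean {
  const hasRunnableExport = (args.manifest.exportTargets ?? []).some(
    (t) => t.surface === "docker" || t.surface === "cli",
  );
  return args.buildPassing && args.testsPassing && hasRunnableExport;
}
```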

packages/contracts/src/index.ts now uses .js extensions on every
re-export so the daemon's NodeNext moduleResolution picks up the
new types end-to-end.

Daemon tests: 1534 → 1543 (+9 cases on plugins-handoff: appends
exportTargets / deployTargets, idempotency, monotonic handoffKind
promotion, downgrade refusal vs. rollback escape, deployable-app
eligibility rule).

Co-authored-by: Tom Huang <1043269994@qq.com>
2026-05-09 14:48:29 +00:00
Demoniooo
617fb043fe
feat(settings): add fetch models button for BYOK providers (#1034)
* feat(settings): add fetch models button for BYOK providers

* fix(settings): exclude Ollama from fetch models, add manual-entry hint

* fix(provider-models): classify non-JSON upstream errors by HTTP status

* fix(i18n): drop redundant English overrides from non-English locales

* fix(provider-models): allow ollama through allowlist, return unsupported_protocol

---------

Co-authored-by: haolin122 <hl6593@nyu.edu>
2026-05-09 22:28:03 +08:00
Cursor Agent
847304ebc5
feat(plugins): atom SKILL.md body loader + renderActiveStageBlock (spec §23.4)
Plan J3 / spec §23.3.2 patch 2 / §23.4.

Lays the substrate slice for migrating prompt fragments out of
`apps/daemon/src/prompts/system.ts` and into the bundled atom
SKILL.md bodies registered by §3.I3.

apps/daemon/src/plugins/atom-bodies.ts owns the daemon-side loader:

  loadAtomBodies(db, atomIds) → AtomBodyEntry[]

The function looks each atom id up in installed_plugins (bundled
rows win), reads the matching fsPath/SKILL.md, strips
front-matter, and returns the raw body. Atoms with no installed
plugin or unreadable SKILL.md are silently skipped — the caller
drops empty entries from the prompt.
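A minimal sketch of the front-matter stripping step, assuming SKILL.md
front-matter is a leading YAML block delimited by `---` lines.
stripFrontMatter and nonEmptyBodies are hypothetical helpers; the real
loader's installed_plugins lookup and file read are omitted.

```typescript
// Leading "---" line, optional YAML lines, closing "---" line (LF or CRLF).
const FRONT_MATTER = /^---\r?\n(?:[\s\S]*?\r?\n)?---[ \t]*(?:\r?\n|$)/;

// Hypothetical helper: drop the front-matter block, keep the body as-is.
function stripFrontMatter(source: string): string {
  return source.replace(FRONT_MATTER, "");
}

// Hypothetical caller-side filter mirroring "the caller drops empty entries".
function nonEmptyBodies(bodies: string[]): string[] {
  return bodies.filter((b) => b.trim().length > 0);
}
```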

packages/contracts/src/prompts/atom-block.ts ships the pure
renderer:

  renderActiveStageBlock({ stageId, bodies, iteration? }) → string

Mirrors spec §23.4's composeSystemPrompt sketch. Empty bodies
return ''; multiple bodies are separated by '---' with no trailing
separator. Lives in contracts so the daemon-side composer and any
future contracts-side composer share one definition (§11.8 PB1
single-import guarantee).
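A sketch of just the joining rules stated above: empty input yields '', and
'---' appears only between bodies. The heading text and the way `iteration`
is shown are assumptions; spec §23.4's actual block format is not
reproduced here.

```typescript
function renderActiveStageBlock(args: {
  stageId: string;
  bodies: string[];
  iteration?: number;
}): string {
  const { stageId, bodies, iteration } = args;
  // No bodies: render nothing so the composer can skip the section.
  if (bodies.length === 0) return "";
  // Assumed heading format, for illustration only.
  const header =
    iteration === undefined
      ? `## Active stage: ${stageId}`
      : `## Active stage: ${stageId} (iteration ${iteration})`;
  // join() places '---' between bodies and never after the last one.
  return `${header}\n\n${bodies.join("\n\n---\n\n")}`;
}
```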

The composeSystemPrompt() rewiring itself is the next PR. This
commit leaves that PR no scaffolding to build: the helpers are
reachable and tested, and the bundled atom plugins from §3.I3 already
have the matching SKILL.md bodies on disk.

Tests: contracts 8 → 12 (+4 cases on atom-block); daemon
1482 → 1486 (+4 cases on plugins-atom-bodies covering the
end-to-end loadAtomBodies → renderActiveStageBlock path).

Co-authored-by: Tom Huang <1043269994@qq.com>
2026-05-09 13:15:52 +00:00
Tom Huang
643d0cf637
feat: add scheduled routines for unattended agent runs (#1033)
* feat: add scheduled routines for unattended agent runs

Generalizes Orbit's single hard-coded daily-digest scheduler into
user-defined routines: each one fires on a schedule (hourly / daily /
weekdays / weekly with IANA timezone) and starts a fresh agent
conversation, either inside an existing project or in a new project
minted on the spot.

Backend:
- New RoutineService with timezone-aware nextRunAt computed via
  Intl.DateTimeFormat (no new dependency); two-pass tzWallToUtc so
  DST transitions stay correct. Each fire chains rescheduleOne in
  finally() to keep the cadence alive.
- routines + routine_runs SQLite tables; schedule_json is the
  authoritative form, with legacy schedule_kind/value kept populated.
- /api/routines CRUD + /api/routines/:id/run + /api/routines/:id/runs.
- Run handler resolves agent (routine override -> app config -> first
  available), creates project (or reuses configured one) and a fresh
  conversation per fire, then dispatches into startChatRun.

UI (Settings -> Routines):
- Pill-chip schedule kind picker, time + timezone fields, weekday
  picker for Weekly. Live preview line ("Runs daily at 9:00 AM GMT+8").
- Routine list with inline status pill, next/last meta, expandable run
  history; each history row links into the project the run wrote to
  via the existing router primitive.

* fix(daemon): swallow trailing finally rejection for inflight cleanup

Without a terminal `.catch`, the promise returned by `promise.finally(...)`
mirrors the original rejection and produces an unhandled rejection — fatal
in modern Node — when the run handler rejects before producing a start
handle. Callers still see the rejection on the returned `promise`.
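A minimal sketch of the pattern this fix describes. The inflight map and
trackInflight are hypothetical names; the point is that
`promise.finally(...)` returns a second promise that re-rejects, so it needs
its own terminal `.catch`.

```typescript
// Hypothetical in-flight registry keyed by routine id.
const inflight = new Map<string, Promise<unknown>>();

function trackInflight<T>(id: string, promise: Promise<T>): Promise<T> {
  inflight.set(id, promise);
  // finally() yields a new promise that mirrors the original rejection.
  // The terminal catch swallows that copy so it never becomes unhandled;
  // callers still observe the rejection on `promise` itself.
  promise
    .finally(() => {
      inflight.delete(id);
    })
    .catch(() => {});
  return promise;
}
```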

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* fix(daemon): handle DST spring-forward gap in tzWallToUtc

The two-pass conversion picked the pre-gap candidate when the requested
wall time fell inside a spring-forward gap (e.g. 02:30 in
America/New_York on 2026-03-08), so the resulting instant rendered back
as 01:30 local and a 02:30 routine fired an hour early on the
transition day. Routines are local wall-clock schedules, so firing
before the requested time breaks the contract.

Now we round-trip both candidates through partsInTimezone, return the
one whose wall-clock matches the request, and on a gap day where
neither matches return the later candidate so the routine fires at the
first valid post-gap instant on the same day.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* fix(daemon): preserve both wall-time candidates on DST fall-back day

On a fall-back day, the requested wall time inside the repeated hour
(e.g. 01:30 America/New_York) maps to two distinct UTC instants. The
previous tzWallToUtc collapsed them to the first (pre-transition) one,
so a daemon that woke between the two instants would skip the second
01:30 entirely and fire a day late once per fall-back. Replace it with
tzWallToUtcCandidates (returns all valid instants, ascending) plus a
gap-only fallback for spring-forward, and have nextWallTimeMatching
walk both ambiguous candidates before advancing to the next day. Adds
fixtures for the repeated-hour case so the intended behavior stays
locked in.
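A self-contained sketch of the candidate search described above, built on
Intl.DateTimeFormat only. partsInTimezone and tzWallToUtcCandidates reuse
the commit's names but their bodies are assumptions, and resolveWallTime is
a hypothetical wrapper for the gap-only fallback. It probes offsets one day
either side of the requested wall time, which assumes at most one
transition in that window.

```typescript
interface WallTime { year: number; month: number; day: number; hour: number; minute: number }

const DAY_MS = 86_400_000;

// Render an instant as wall-clock fields in `timeZone`, using only Intl.
function partsInTimezone(instantMs: number, timeZone: string): WallTime {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(new Date(instantMs));
  const get = (type: string): number => Number(parts.find((p) => p.type === type)?.value);
  // Some runtimes print midnight as "24" when hour12 is false.
  return { year: get("year"), month: get("month"), day: get("day"), hour: get("hour") % 24, minute: get("minute") };
}

// UTC offset (ms) in force at `instantMs`, at minute precision.
function offsetMs(instantMs: number, timeZone: string): number {
  const w = partsInTimezone(instantMs, timeZone);
  const wallAsUtc = Date.UTC(w.year, w.month - 1, w.day, w.hour, w.minute);
  return wallAsUtc - Math.floor(instantMs / 60_000) * 60_000;
}

const sameWall = (a: WallTime, b: WallTime): boolean =>
  a.year === b.year && a.month === b.month && a.day === b.day && a.hour === b.hour && a.minute === b.minute;

// Every UTC instant whose local wall clock equals `wall`, ascending:
// one on a normal day, two in a fall-back repeated hour, none in a gap.
function tzWallToUtcCandidates(wall: WallTime, timeZone: string): number[] {
  const naive = Date.UTC(wall.year, wall.month - 1, wall.day, wall.hour, wall.minute);
  const offsets = new Set([naive - DAY_MS, naive, naive + DAY_MS].map((t) => offsetMs(t, timeZone)));
  const hits = Array.from(offsets)
    .map((o) => naive - o)
    .filter((t) => sameWall(partsInTimezone(t, timeZone), wall));
  return Array.from(new Set(hits)).sort((a, b) => a - b);
}

// Gap-only fallback: with no exact match, take the later probe instant,
// which lands after the spring-forward jump on the same day.
function resolveWallTime(wall: WallTime, timeZone: string): number[] {
  const exact = tzWallToUtcCandidates(wall, timeZone);
  if (exact.length > 0) return exact;
  const naive = Date.UTC(wall.year, wall.month - 1, wall.day, wall.hour, wall.minute);
  return [Math.max(naive - offsetMs(naive - DAY_MS, timeZone), naive - offsetMs(naive + DAY_MS, timeZone))];
}
```

A scheduler that walks every returned candidate before advancing a day, as
the commit describes for nextWallTimeMatching, no longer skips the second
01:30 on a fall-back day.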

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* fix(web): make routine timezone picker IANA-complete and DST-truthful

The timezone dropdown was a hardcoded subset, but the backend validator
accepts any IANA zone — so users could not pick zones like
`America/Phoenix` or `Africa/Johannesburg` unless they happened to be
local. And `gmtLabel()` always derived the offset from `new Date()`,
which drifted seasonally for DST-observing zones (a New York routine
created in winter rendered `GMT-5` while it would actually fire on
`GMT-4` after DST started).

Source the picker from `Intl.supportedValuesOf('timeZone')` (with a
curated fallback for older runtimes) and anchor the GMT label to the
routine's next fire time. When the next fire time is unknown (e.g.
the live preview while the form is open) and in the dropdown itself,
fall back to the IANA city, which is stable year-round.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* fix(web): always include UTC in routine timezone picker

`Intl.supportedValuesOf('timeZone')` returns only canonical region
names on current runtimes (Node 24, recent browsers) and omits `UTC`,
so the previous picker dropped the most common non-local zone unless
the runtime itself was already UTC. The backend validator and the
contract examples still accept `UTC`, so a user on a non-UTC machine
could not create a documented UTC routine from Settings.

Prepend `UTC` inside `listSupportedTimezones()` when the runtime list
omits it, so the picker stays aligned with the supported schedule
surface.
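A sketch of the picker's source list. listSupportedTimezones reuses the
commit's name, but its body and the FALLBACK_ZONES list are assumptions.
Intl.supportedValuesOf is feature-detected so the code also type-checks
against older TypeScript lib settings.

```typescript
// Hypothetical curated fallback for runtimes without Intl.supportedValuesOf.
const FALLBACK_ZONES = ["UTC", "America/New_York", "Europe/London", "Asia/Shanghai"];

function listSupportedTimezones(): string[] {
  const intl = Intl as unknown as { supportedValuesOf?: (key: string) => string[] };
  const zones =
    typeof intl.supportedValuesOf === "function"
      ? intl.supportedValuesOf("timeZone")
      : FALLBACK_ZONES;
  // Canonical-only runtime lists omit "UTC"; prepend it so it is always pickable.
  return zones.includes("UTC") ? [...zones] : ["UTC", ...zones];
}
```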

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)
2026-05-09 19:30:22 +08:00
pftom
89be57b2c4
feat(genui): introduce GenUI surface management and event handling
- Added a new GenUI module for managing user interface surfaces, including creation, response handling, and state synchronization.
- Implemented API endpoints for listing and responding to GenUI surfaces associated with runs and projects.
- Introduced event types and payload helpers for GenUI surface events, enhancing the interaction model for headless operations.
- Established a persistent state writer for GenUI surfaces, ensuring reliable data management and retrieval.
- Enhanced the plugin system to support auto-derived OAuth prompts for required connectors, improving user experience during plugin application.
2026-05-09 18:44:04 +08:00
pftom
4c7cd5d9f2
feat(plugins): introduce plugin system with installation and management capabilities
- Added support for a new plugin system, allowing users to install, uninstall, and manage plugins through the daemon.
- Implemented API endpoints for listing installed plugins, retrieving plugin details, and applying plugins with input validation.
- Introduced a plugin doctor feature to validate plugin manifests and check for issues before application.
- Established a plugin persistence layer with SQLite migrations for managing installed plugins and their metadata.
- Enhanced the CLI with commands for plugin operations, improving user interaction with the plugin ecosystem.
2026-05-09 18:24:44 +08:00
初晨
9ef136ced5
fix: sync Orbit last run with selected prompt template (#937)
* fix(orbit): scope last run to selected template

* fix(orbit): preserve legacy last run on upgrade

* fix(orbit): pin legacy last-run fallback on refresh

* fix(orbit): pin template id at run start

* test(web): sync orbit fixtures with skill summary
2026-05-09 11:19:59 +08:00
Bryan A
e13adf2e63
feat(daemon): finalize design package endpoint (closes #450) (#832)
* feat(daemon): scaffold /api/projects/:id/finalize/anthropic (refs #450)

Phase C of the PR 2 plan for issue #450: scaffold the route + module
shape so subsequent phases (D-I) land function bodies and tests against
a stable surface that already passes typecheck.

What lands here:
- apps/daemon/src/finalize-design.ts: module-level constants
  (DEFAULT_BASE_URL, DEFAULT_MAX_TOKENS=16000, INPUT_BODY_CAP_BYTES=384KiB,
  LOCK_FILENAME=.finalize.lock, OUTPUT_FILENAME=DESIGN.md,
  DEFAULT_TIMEOUT_MS=120s); inline interfaces for the request/response
  shape (kept out of packages/contracts per scope rules); two error
  classes - FinalizePackageLockedError (mirrors PR #493's
  TranscriptExportLockedError) and FinalizeUpstreamError (carries upstream
  HTTP status for the route's error mapping); function stub that throws
  "not yet implemented".
- apps/daemon/tests/finalize-design.test.ts: vitest harness with
  describe.skip placeholder so the file imports cleanly. Real cases land
  in phases D-I. Default-import of node:fs, since vi.spyOn cannot
  redefine properties on the frozen ESM module namespace while the CJS
  exports object is mutable.
- apps/daemon/src/server.ts: route handler at
  POST /api/projects/:id/finalize/anthropic, slotted next to the
  existing :id/deploy* family. Validates apiKey/model non-empty, optional
  baseUrl via the existing validateExternalApiBaseUrl closure (forbidden
  -> 403, invalid -> 400), optional maxTokens positive number; calls
  getProject (404 on miss); calls finalizeDesignPackage (which throws,
  caught and mapped to 500 for now); maps known error classes
  (FinalizePackageLockedError -> 409, FinalizeUpstreamError -> 502)
  pre-emptively.

Path shape rationale (Bryan-confirmed): project-scoped path matches every
sibling /api/projects/:id/* route in server.ts (deploy, deployments,
deploy/preflight); provider-namespaced segment leaves a clean expansion
line for /api/projects/:id/finalize/openai etc. as follow-ups.

Field-name rationale: apiKey, baseUrl, model, maxTokens match
ProxyStreamRequest verbatim (packages/contracts/src/api/proxy.ts:8-19)
so a future caller can reuse the same body shape. baseUrl is optional
here (an intentional divergence from the proxy route in server.ts, which
requires it) so standard Anthropic users do not need to set it; Bedrock /
self-hosted-proxy users still can.

Verification: pnpm --filter @open-design/daemon typecheck exits 0;
finalize-design.test.ts loads cleanly with 1 skipped placeholder; no
other tests touched.

Refs nexu-io/open-design#450 (PR 2 scaffold; pipeline body in subsequent
commits)

* feat(daemon): transcript truncation helper for /finalize prompt

Phase D of the PR 2 plan for issue #450: lands the helper that bounds
the transcript section of the synthesis prompt.

Why this exists: real-world signal at authoring time was a local project
transcript already at 3.95 MB. Anthropic's claude-opus-4-7 context cap
is roughly 200K tokens (~700 KB at typical density). Inserting an
unbounded transcript would 4xx upstream on the first real call. This
helper keeps the on-disk .transcript.jsonl lossless (PR #493's contract)
while making the prompt-inclusion bounded.

Strategy:
- Cap output at INPUT_BODY_CAP_BYTES (384 KiB) so the prompt has room
  for the system prompt + design system body + current artifact + room
  for the synthesis output.
- Always preserve the header line - it carries projectId, schemaVersion,
  conversation/message counts, attachment counts; synthesis quality
  depends on knowing the original sizes.
- Split equal byte budgets between head and tail so both project genesis
  and most-recent intent survive. Two thinking segments separated only
  by mid-session truncation lose the same kind of boundary that PR #493
  preserves between thinking blocks - that's accepted; smarter
  semantic chunking is a follow-up.
- Insert a single `{"kind":"truncated","reason":"size","omittedBytes":N}`
  sentinel JSON line between the head and tail so a synthesis consumer
  can detect the gap. omittedBytes is the difference between the
  original UTF-8 byte length and the output's UTF-8 byte length.
- If the head + tail budgets together cover the whole body (e.g. all
  message lines are tiny), no marker is emitted - the output is the
  input verbatim.
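A sketch of the head+tail strategy, parameterised on capBytes so it can be
exercised with small inputs. It keeps whole lines only and reserves a fixed
128 bytes for the marker (an assumption). Here omittedBytes counts the bytes
of the dropped lines; the commit instead defines it as original length minus
output length, which additionally nets out the marker line.

```typescript
const INPUT_BODY_CAP_BYTES = 384 * 1024;
const MARKER_RESERVE_BYTES = 128; // assumed headroom for the sentinel line

const byteLen = (s: string): number => Buffer.byteLength(s, "utf8");

function truncateTranscriptForPrompt(jsonl: string, capBytes = INPUT_BODY_CAP_BYTES): string {
  if (byteLen(jsonl) <= capBytes) return jsonl; // fits: pass through verbatim
  const [header = "", ...body] = jsonl.split("\n");
  // The header line always survives; split the rest of the budget evenly.
  const half = Math.max(0, Math.floor((capBytes - byteLen(header) - 1 - MARKER_RESERVE_BYTES) / 2));

  // Keep whole lines from the front until the head budget is spent.
  let headEnd = 0;
  for (let used = 0; headEnd < body.length; headEnd++) {
    const cost = byteLen(body[headEnd] ?? "") + 1; // +1 for the joining newline
    if (used + cost > half) break;
    used += cost;
  }
  // Then from the back, never crossing into the head.
  let tailStart = body.length;
  for (let used = 0; tailStart > headEnd; tailStart--) {
    const cost = byteLen(body[tailStart - 1] ?? "") + 1;
    if (used + cost > half) break;
    used += cost;
  }
  if (tailStart <= headEnd) return jsonl; // head and tail cover everything: no marker

  const omittedBytes = body
    .slice(headEnd, tailStart)
    .reduce((n, line) => n + byteLen(line) + 1, 0);
  const marker = JSON.stringify({ kind: "truncated", reason: "size", omittedBytes });
  return [header, ...body.slice(0, headEnd), marker, ...body.slice(tailStart)].join("\n");
}
```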

Tests:
- "returns the input verbatim when the JSONL fits under the 384 KiB cap"
  pins that small transcripts pass through unchanged with no marker.
- "head+tail truncates with a single marker line when the JSONL exceeds
  the 384 KiB cap" pins that output is bounded, header survives, exactly
  one marker emitted with non-zero omittedBytes, both ends of the body
  preserved, and at least one middle message omitted.

Suite delta: +2 tests in finalize-design.test.ts.

Refs nexu-io/open-design#450

* fix(daemon): resolve noUncheckedIndexedAccess in truncateTranscriptForPrompt

D1 (0eaa123) shipped with `body[headIndex]` and `body[i]` typed as
`string | undefined` under TypeScript's `noUncheckedIndexedAccess`
strict mode. Local typecheck would have caught it but the prior
verification piped through `tail` which masked the non-zero exit code
of `tsc`. Coalesce each access via `?? ''` (the array is from
`String.split('\n')` so undefined elements are not actually reachable;
the coalesce is a type-narrowing convenience, not a behavior change).

Verification: `pnpm --filter @open-design/daemon typecheck` exits 0;
`pnpm --filter @open-design/daemon test finalize-design` shows 2/2 +
1 skipped, identical to the pre-fix run.

Refs nexu-io/open-design#450

* feat(daemon): current-artifact resolver for /finalize

Phase E of the PR 2 plan for issue #450: resolves which artifact (if
any) accompanies the transcript + design system in the synthesis prompt.

Priority order (Bryan-locked in plan §6):
  1. The file referenced by tabs.is_active = 1 IF an
     <name>.artifact.json sidecar exists on disk. Sidecar presence is
     the discriminator: an inferred manifest from
     `inferLegacyManifest` (e.g. for a bare .html with no sidecar)
     does NOT count, and an active tab pointing at a non-artifact file
     (.md, .txt) falls through.
  2. Newest project file with a real .artifact.json sidecar, sorted by
     manifest.updatedAt descending. Files without an updatedAt sort
     last so legacy pre-streaming manifests do not get accidentally
     promoted.
  3. Returns null - "no artifact in scope". The Phase H caller will
     emit `artifact: null` in the response and the prompt's "Current
     artifact" section will read "none".

Sidecar presence is checked via `existsSync` on the on-disk path, NOT
via the `artifactManifest` field returned by readProjectFile/listFiles
(those run inferLegacyManifest as a fallback for known kinds, which
would otherwise cause a bare .html with no sidecar to look like an
artifact).
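The priority order can be sketched as a pure function over an in-memory view
of the project. ProjectFile and its hasSidecar flag are hypothetical; in the
daemon, hasSidecar corresponds to the existsSync probe on
<name>.artifact.json described above, and this resolveCurrentArtifact
shares only its name with the real function.

```typescript
// Hypothetical, IO-free view of what the resolver reads from disk and the DB.
interface ProjectFile {
  name: string;
  hasSidecar: boolean; // true only when <name>.artifact.json exists on disk
  updatedAt?: string; // manifest.updatedAt (ISO 8601), when present
}

function resolveCurrentArtifact(activeTab: string | null, files: ProjectFile[]): ProjectFile | null {
  // 1. Intent first: the active tab wins if it has a real sidecar.
  const active = activeTab === null ? undefined : files.find((f) => f.name === activeTab);
  if (active?.hasSidecar) return active;
  // 2. Otherwise the newest real artifact; a missing updatedAt sorts last.
  const ranked = files
    .filter((f) => f.hasSidecar)
    .sort((a, b) => {
      if (!a.updatedAt) return b.updatedAt ? 1 : 0;
      if (!b.updatedAt) return -1;
      return Date.parse(b.updatedAt) - Date.parse(a.updatedAt);
    });
  // 3. Nothing in scope.
  return ranked[0] ?? null;
}
```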

Tests:
- "returns the active-tab artifact when its sidecar is present, even
  if a newer artifact exists elsewhere": pinned.html (older
  updatedAt) is in the active tab; newer.html (newer updatedAt) is
  not. Resolver returns pinned.html - intent (active tab) beats
  recency.
- "falls through to newest .artifact.json when active tab points at a
  non-artifact file": README.md is the active tab (no sidecar);
  design.html has a real sidecar. Resolver falls through and returns
  design.html.
- "returns null when no active tab and no .artifact.json sidecars
  exist": only a README.md is in the project; no tabs row. Resolver
  returns null.

Suite delta: +3 tests in finalize-design.test.ts (5 active total).

Refs nexu-io/open-design#450

* feat(daemon): synthesis prompt construction for /finalize

Phase F of the PR 2 plan for issue #450: builds the system + user
prompts that get sent to Anthropic's Messages API in the synthesis
call. Pure function; no IO, no side effects.

System prompt (literal, stored as a module-level constant): instructs
Claude to emit a DESIGN.md document with a fixed structure: a # DESIGN.md
title followed by seven sections (## Summary / ## Brand & Voice /
## Information Architecture / ## Components & Patterns / ## Visual System /
## Open Questions / ## Provenance). The Provenance section is required to
list project ID, design system, current artifact, transcript message
count, and the UTC generation timestamp.

User prompt (built at runtime): structured payload with the truncated
transcript JSONL, the design system body, and the current artifact
body, each under a ## heading. Missing inputs (no design system
selected, no artifact in scope) produce explicit "none" headings +
parenthetical placeholder body so Claude does not hallucinate content
for absent sections.
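A sketch of the user-prompt assembly. Only the two "none" headings are taken
from the tests listed in this commit; the transcript heading, the
design-system heading format with an id, and the parenthetical wording are
assumptions.

```typescript
interface SynthesisInputs {
  transcriptJsonl: string; // already bounded by truncateTranscriptForPrompt
  designSystemId: string | null;
  designSystemBody: string | null;
  artifact: { name: string; body: string } | null;
}

function buildUserPrompt(input: SynthesisInputs): string {
  const sections: string[] = [`## Transcript (JSONL)\n\n${input.transcriptJsonl}`];
  // Absent inputs get an explicit "none" heading plus a placeholder body.
  sections.push(
    input.designSystemId && input.designSystemBody !== null
      ? `## Active design system: ${input.designSystemId}\n\n${input.designSystemBody}`
      : "## Active design system: none\n\n(No design system is selected for this project.)",
  );
  sections.push(
    input.artifact
      ? `## Current artifact: ${input.artifact.name}\n\n${input.artifact.body}`
      : "## Current artifact: none\n\n(No artifact is in scope.)",
  );
  return sections.join("\n\n");
}
```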

Truncation is the caller's concern - this function does not
re-truncate. The caller (Phase H pipeline) feeds in a JSONL that has
already been bounded by truncateTranscriptForPrompt.

Tests:
- "includes the transcript JSONL verbatim and the generation context":
  pins all section headings, the transcript body verbatim, the design
  system body verbatim, the artifact body verbatim, and every
  generation-context line.
- "falls back to \"none\" + parenthetical when no design system is
  selected": designSystemId=null and designSystemBody=null -> heading
  reads "## Active design system: none" with the parenthetical body.
- "falls back to \"none\" + parenthetical when no artifact is in
  scope": artifact=null -> heading reads "## Current artifact: none"
  with the parenthetical body.

Suite delta: +3 tests in finalize-design.test.ts (8 active total).

Refs nexu-io/open-design#450

* feat(daemon): Anthropic call + retry strategy for /finalize

Phase G of the PR 2 plan for issue #450: lands the upstream Claude
Messages API call with a single transient-error retry, plus the
response extractor that turns Anthropic's content array into the
DESIGN.md body.

What lands here:
- appendVersionedApiPath: inlined from the connectionTest helper at
  apps/daemon/src/connectionTest.ts:188-195 (it is not exported there).
  Appends /v1/messages when the base URL has no /vN segment, otherwise
  appends /messages directly. Same semantics; ~5 lines.
- callAnthropicWithRetry: POSTs to <base>/v1/messages with the canonical
  Anthropic headers (content-type, x-api-key, anthropic-version:
  2023-06-01) and body shape ({ model, max_tokens, system, messages,
  stream:false }). One retry on transient (HTTP 429 or 5xx); on terminal
  failure throws FinalizeUpstreamError carrying the upstream HTTP
  status and raw body text. The route handler in Phase I maps status
  to AUTH_FAILED / RATE_LIMITED / UPSTREAM_FAILED and runs the body
  through redactSecrets before exposing it as `details`.
- extractDesignMd: concatenates content[].text for every block where
  type === 'text', preserving order. Throws FinalizeUpstreamError(502)
  on three malformed-response shapes: non-object payload, missing
  content array, zero text blocks. The route handler maps the throw
  to 502 UPSTREAM_FAILED so synthesis cannot land a half-empty
  DESIGN.md on disk.
- Test-only `_sleepMs` injection on the call params so the retry-delay
  sleep is instant under vitest. Default sleep uses setTimeout.
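A compact sketch of the three pieces, assuming a fetch-compatible fetchImpl
and the injectable sleep the commit describes (named sleepMs here). The
error class shape, the exact /vN check (trailing segment only) and the
1-second retry delay are assumptions, not the daemon's code.

```typescript
class FinalizeUpstreamError extends Error {
  status: number;
  body: string;
  constructor(status: number, body: string, message?: string) {
    super(message ?? `upstream Anthropic returned HTTP ${status}`);
    this.name = "FinalizeUpstreamError";
    this.status = status;
    this.body = body;
  }
}

// Append /v1/messages unless the base URL already ends in a /vN segment.
function appendVersionedApiPath(baseUrl: string): string {
  const trimmed = baseUrl.replace(/\/+$/, "");
  return /\/v\d+$/.test(trimmed) ? `${trimmed}/messages` : `${trimmed}/v1/messages`;
}

const isTransient = (status: number): boolean => status === 429 || status >= 500;

async function callAnthropicWithRetry(params: {
  baseUrl: string;
  apiKey: string;
  body: unknown;
  fetchImpl?: typeof fetch;
  sleepMs?: (ms: number) => Promise<void>;
}): Promise<unknown> {
  const fetchImpl = params.fetchImpl ?? fetch;
  const sleep = params.sleepMs ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const url = appendVersionedApiPath(params.baseUrl);
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": params.apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify(params.body),
    });
    if (res.ok) return res.json();
    // One retry on 429/5xx; any other status, or a second failure, is terminal.
    if (attempt === 0 && isTransient(res.status)) {
      await sleep(1_000); // assumed delay; the commit does not state one
      continue;
    }
    throw new FinalizeUpstreamError(res.status, await res.text());
  }
}

// Concatenate every text block in order; reject shapes with no usable text.
function extractDesignMd(payload: unknown): string {
  if (typeof payload !== "object" || payload === null) {
    throw new FinalizeUpstreamError(502, "", "non-object payload");
  }
  const content = (payload as { content?: unknown }).content;
  if (!Array.isArray(content)) throw new FinalizeUpstreamError(502, "", "missing content array");
  const texts = content.filter(
    (b): b is { type: "text"; text: string } =>
      typeof b === "object" && b !== null && b.type === "text" && typeof b.text === "string",
  );
  if (texts.length === 0) throw new FinalizeUpstreamError(502, "", "no text blocks");
  return texts.map((b) => b.text).join("");
}
```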

The retry posture (one retry on transient errors) is opinionated. The
maintainer's "standard exponential backoff" answer was directional; a
single retry stays close to the existing daemon's posture (transcript
export and connectionTest do zero retries) while keeping /finalize
within the daemon's blocking-fast budget.

Tests:
- callAnthropicWithRetry: throws on 401 with no retry; retries once
  on 429 and resolves on second 200; throws after both 5xx attempts;
  propagates AbortError when signal is pre-aborted.
- extractDesignMd: concatenates ordered text blocks; throws on
  missing content array; throws on content with zero text blocks.

A spurious typecheck error from `exactOptionalPropertyTypes` (signal
typed as AbortSignal | undefined where RequestInit expects
AbortSignal | null) was resolved by conditionally spreading signal
into the RequestInit literal.
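A minimal illustration of the conditional-spread fix. buildInit is a
hypothetical helper; the spread omits the signal key entirely when none is
supplied, which satisfies exactOptionalPropertyTypes.

```typescript
// Writing `signal: maybeSignal` is rejected under exactOptionalPropertyTypes
// because RequestInit's signal is AbortSignal | null, not | undefined.
function buildInit(body: string, signal?: AbortSignal): RequestInit {
  return {
    method: "POST",
    body,
    ...(signal ? { signal } : {}),
  };
}
```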

Suite delta: +7 tests in finalize-design.test.ts (15 active total).

Refs nexu-io/open-design#450

* feat(daemon): wire /finalize pipeline end-to-end

Phase H of the PR 2 plan for issue #450: stitches together every
phase D-G primitive into the full finalizeDesignPackage pipeline that
the route handler in Phase I will expose over HTTP.

Pipeline (in execution order, all inside a try/finally that always
releases the lockfile):
1. getProject(db, projectId): defensive 404 (the route validates first;
   this throw catches direct CLI/script callers).
2. mkdirSync(<projectDir>, { recursive: true }): some projects have DB
   rows but no on-disk dir yet (the same fix PR #493 applied).
3. fs.openSync(.finalize.lock, 'wx'): EEXIST -> FinalizePackageLockedError
   (mirror PR #493's TranscriptExportLockedError).
4. exportProjectTranscript(db, projectsRoot, projectId, { now }): produces
   .transcript.jsonl on disk; we read the body and run it through
   truncateTranscriptForPrompt to bound the prompt-inclusion size.
5. readDesignSystem(designSystemsRoot, designSystemId): returns null when
   the project has no design_system_id selected, when the design system
   directory does not exist, or when the DESIGN.md file is missing.
6. resolveCurrentArtifact(db, projectsRoot, projectId): active tab ->
   newest .artifact.json by manifest.updatedAt -> null.
7. buildSynthesisPrompt({...}): system + user prompt (per Phase F).
8. callAnthropicWithRetry({...}): one retry on 429/5xx; throws
   FinalizeUpstreamError on terminal failure.
9. extractDesignMd(payload): concatenates content[].text blocks; throws
   FinalizeUpstreamError(502) on malformed shape.
10. Atomic write: writeFileSync({flag:'wx'}) -> reopen for fsync ->
    rename. Errors unlink tmp before rethrowing.
11. Lock release in finally (always closeSync + unlinkSync).
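A sketch of steps 3, 10 and 11 using only node:fs. The function names are
hypothetical, the tmp-file naming scheme (pid plus timestamp) is an
assumption, and the pipeline steps in between are elided.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

class FinalizePackageLockedError extends Error {
  constructor() {
    super("finalize already in progress for this project");
    this.name = "FinalizePackageLockedError";
  }
}

// Create the lockfile exclusively; EEXIST means another finalize holds it.
function acquireLock(lockPath: string): number {
  try {
    return fs.openSync(lockPath, "wx");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") throw new FinalizePackageLockedError();
    throw err;
  }
}

// Run `work` under .finalize.lock. acquireLock throws before the try block,
// so a lock held by someone else is never unlinked here.
function withFinalizeLock<T>(dir: string, work: () => T): T {
  const lockPath = path.join(dir, ".finalize.lock");
  const fd = acquireLock(lockPath);
  try {
    return work();
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}

// Write a unique tmp file, fsync it, then rename it over the target.
function writeFileAtomic(target: string, body: string): void {
  const tmp = `${target}.tmp.${process.pid}.${Date.now()}`;
  try {
    fs.writeFileSync(tmp, body, { flag: "wx" });
    const fd = fs.openSync(tmp, "r+");
    try {
      fs.fsyncSync(fd);
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(tmp, target);
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // tmp may not exist if the first write failed
    throw err;
  }
}
```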

Bounded blocking: the function uses its own AbortController + 120s
timeout when the caller does not supply a signal. Caller-supplied
signal takes precedence.
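A sketch of the signal selection above. resolveSignal is hypothetical, and
returning a dispose callback to clear the timer is an assumed detail (it
keeps a finished call from leaving a 120s timer pending).

```typescript
const DEFAULT_TIMEOUT_MS = 120_000;

// Prefer the caller's signal; otherwise arm our own timeout-backed controller.
function resolveSignal(
  callerSignal?: AbortSignal,
  timeoutMs = DEFAULT_TIMEOUT_MS,
): { signal: AbortSignal; dispose: () => void } {
  if (callerSignal) return { signal: callerSignal, dispose: () => {} };
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return { signal: controller.signal, dispose: () => clearTimeout(timer) };
}
```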

Type tightening: switched the local Db interface to
`type Db = Database.Database` (better-sqlite3) so the function signature
is compatible with `exportProjectTranscript`'s typed parameter. The
daemon already imports `better-sqlite3` (in the claude-design-import
area), so this adds no new dependency.

Tests:
- "writes DESIGN.md atomically on the happy path": end-to-end with
  seeded project + conversation + 2 messages + design system on disk;
  asserts file at exact path + body bytes match the fetch mock.
- "response carries every documented field with correct types":
  designMdPath/bytesWritten/model/inputTokens/outputTokens/artifact/
  transcriptMessageCount/designSystemId all present and typed.
- "emits design system 'none' in the prompt when no design_system_id is
  set": fetch mock asserts on the body it receives.
- "throws FinalizePackageLockedError when .finalize.lock is already
  held": pre-create lockfile; assert throw + DESIGN.md not written +
  pre-existing lock NOT unlinked (we did not own it).
- "replaces an existing DESIGN.md atomically on a second finalize":
  inject a sentinel between two finalize calls; assert sentinel is
  gone after second run.
- "cleans up tmp file AND lock file on every error path": mock
  fs.writeFileSync to throw on the tmp path; assert no DESIGN.md.tmp.*
  remain, no DESIGN.md, no .finalize.lock.
- "uses the default https://api.anthropic.com baseUrl when baseUrl is
  omitted": fetch URL begins with the default; baseUrl=undefined path.

vi.restoreAllMocks() now runs in afterEach so the writeFileSync spy
from the cleanup test does not leak into subsequent tests.

Suite delta: +7 tests in finalize-design.test.ts (22 active total).

Refs nexu-io/open-design#450

* feat(daemon): /finalize HTTP route handler + error mapping

Phase I of the PR 2 plan for issue #450: replaces the Phase C stub's
catch-all 500 with status-aware error mapping that surfaces the right
HTTP status + error code for each documented failure mode, and adds
HTTP-layer tests that boot startServer to exercise the route's
validation branches.

Route handler changes:
- :id format guard: an inline regex matching isSafeId at
  apps/daemon/src/projects.ts:556-558 rejects unsafe ids with 400
  BAD_REQUEST before any DB or filesystem work. Without this, an id
  like 'bad!id' would either fail getProject as 404 (wrong code) or
  reach the function and throw 'invalid project id' (mapped to 500).
- FinalizeUpstreamError mapping is now status-aware:
  - upstream 401 -> 401 AUTH_FAILED
  - upstream 429 -> 429 RATE_LIMITED
  - upstream 5xx (or our own 502 sentinel for malformed responses)
    -> 502 UPSTREAM_FAILED
  In all cases the upstream raw text is run through redactSecrets so
  the apiKey cannot leak through `details` even if the upstream
  echoes the inbound headers.
- AbortError mapping: when the 120s AbortController fires (or the
  caller pre-aborted the signal), surface as 503 TIMEOUT.
- Default case: console.error the error per daemon convention; client
  sees 500 INTERNAL with the message routed through redactSecrets.
- Imported redactSecrets alongside the existing connectionTest
  imports (apps/daemon/src/server.ts:51).
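A sketch of the status-aware mapping as a pure function. mapFinalizeError is
hypothetical, and redact is a simple stand-in that masks the API key, not
the daemon's redactSecrets. Upstream statuses other than 401, 429 and 5xx
fall through to 502 here, which is an assumption the commit does not spell
out. The codes are the ones this commit introduces.

```typescript
interface MappedError {
  status: number;
  body: { code: string; message: string; details?: string };
}

// Stand-in for redactSecrets (not its real implementation): masks the API key.
function redact(text: string, apiKey: string): string {
  return apiKey ? text.split(apiKey).join("[REDACTED]") : text;
}

function mapFinalizeError(err: unknown, apiKey: string): MappedError {
  const e = (err ?? {}) as { name?: string; status?: number; body?: string; message?: string };
  const message = redact(e.message ?? String(err), apiKey);
  if (e.name === "FinalizePackageLockedError") {
    return { status: 409, body: { code: "FINALIZE_IN_PROGRESS", message } };
  }
  if (e.name === "AbortError") return { status: 503, body: { code: "TIMEOUT", message } };
  if (e.name === "FinalizeUpstreamError") {
    const details = redact(e.body ?? "", apiKey);
    if (e.status === 401) return { status: 401, body: { code: "AUTH_FAILED", message, details } };
    if (e.status === 429) return { status: 429, body: { code: "RATE_LIMITED", message, details } };
    // 5xx and the internal 502 sentinel (other statuses also land here in this sketch).
    return { status: 502, body: { code: "UPSTREAM_FAILED", message, details } };
  }
  return { status: 500, body: { code: "INTERNAL", message } };
}
```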

HTTP-layer tests (boot startServer({port:0,returnServer:true}) once
in beforeAll, mirroring the proxy-routes.test.ts pattern):
- "400 BAD_REQUEST when baseUrl is not a valid URL (test #13)":
  baseUrl='not-a-url'.
- "403 FORBIDDEN when baseUrl points at a private internal IP (test
  #14)": baseUrl='http://10.0.0.1'. Note: validateBaseUrl explicitly
  allows loopback (for local OpenAI-compatible servers) and only
  blocks non-loopback private IPs (10/8, 172.16/12, 192.168/16,
  fc00::/7, fe80::/10).
- "400 BAD_REQUEST when apiKey is missing (test #15)": apiKey omitted.
- "400 BAD_REQUEST when :id contains characters outside the safe-id
  regex (test #16)": id='bad!id' contains '!' which is not in
  [A-Za-z0-9._-].
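A sketch of the allow/deny rule the 403 test relies on. classifyBaseUrl is a
hypothetical name, not the daemon's validateBaseUrl. It only inspects
literal IP hosts and does not resolve DNS or handle IPv4-mapped IPv6 forms,
which a production check would need to consider.

```typescript
type BaseUrlVerdict = "ok" | "invalid" | "forbidden";

// Literal-address check only: no DNS resolution, no IPv4-mapped IPv6 handling.
function classifyBaseUrl(raw: string): BaseUrlVerdict {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "invalid";
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return "invalid";
  // URL keeps brackets around IPv6 hostnames; strip them before matching.
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();

  const v4 = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    if (a === 10) return "forbidden"; // 10/8
    if (a === 172 && b >= 16 && b <= 31) return "forbidden"; // 172.16/12
    if (a === 192 && b === 168) return "forbidden"; // 192.168/16
    return "ok"; // public, or 127/8 loopback, which stays allowed
  }
  if (host.includes(":")) {
    const first = parseInt(host.slice(0, host.indexOf(":")) || "0", 16);
    if ((first & 0xfe00) === 0xfc00) return "forbidden"; // fc00::/7
    if ((first & 0xffc0) === 0xfe80) return "forbidden"; // fe80::/10
  }
  return "ok"; // hostnames and ::1 loopback
}
```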

Suite delta: +4 tests (26 active in finalize-design.test.ts).
Full daemon suite: 1078/1078 pass (baseline + 26). Landing 5 tests above
the plan target reflects splitting retry and extract into more granular
unit tests than originally enumerated; all are real, none skipped.

Refs nexu-io/open-design#450

* fix(daemon): tighten isSafeId to reject pure-dot project ids

Addresses the P1 path-traversal finding from @lefarcen on PR #832
(https://github.com/nexu-io/open-design/pull/832#discussion_r3202512644).

The pre-fix `isSafeId` at apps/daemon/src/projects.ts:556-558 used
regex `/^[A-Za-z0-9._-]{1,128}$/` which permitted pure-dot ids
(`.`, `..`, `...`) because `.` is in the character class. `projectDir`
and `resolveProjectDir` both delegated to `isSafeId`, so an id of
`..` would resolve to the PARENT of `.od/projects/` via `path.join`.

Threat model (per @lefarcen):
- An attacker creates a project row whose stored id is `..` (or
  another pure-dot variant) — for instance via a workflow that
  writes the row directly without going through the API. Subsequent
  finalize/write ops keyed by that id then escape the project tree.
- A direct CLI / scripted caller passing `..` as the project id
  reaches the function without HTTP normalization saving us. (Express
  normalizes %2e%2e to .. and collapses path segments, which yields
  404 for the URL `/api/projects/%2e%2e/...` in practice — but that's
  Express's protection, not ours.)

Fix:
- isSafeId now explicitly rejects pure-dot ids (`/^\.+$/.test(id)`)
  before the char-class regex check. Empty string and inputs longer
  than 128 chars are also rejected explicitly so the function fails
  closed on edge cases.
- isSafeId is now exported from apps/daemon/src/projects.ts so the
  /finalize route handler in apps/daemon/src/server.ts can use the
  same validator instead of re-implementing the regex inline. This
  prevents drift between the route guard and the projectDir guard,
  which was how this hole originally appeared.
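A sketch of the tightened validator, reproducing only the rules listed
above (type check, length bounds, pure-dot rejection, then the original
character class). It is illustrative, not a copy of projects.ts.

```typescript
const SAFE_ID_CHARS = /^[A-Za-z0-9._-]+$/;

function isSafeId(id: unknown): id is string {
  if (typeof id !== "string") return false; // null, undefined, numbers, ...
  if (id.length === 0 || id.length > 128) return false; // fail closed on edges
  if (/^\.+$/.test(id)) return false; // ".", "..", ... would escape via path.join
  return SAFE_ID_CHARS.test(id);
}
```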

Tests (in finalize-design.test.ts because that's where the threat was
flagged; isSafeId is daemon-wide so a dedicated test file would also
work):
- isSafeId rejects `.`, `..`, `...`, `....`
- isSafeId rejects ids with `/`, `\`, `!`, leading whitespace
- isSafeId rejects empty string and >128 chars
- isSafeId rejects non-string inputs (null/undefined/number)
- isSafeId accepts plain ids, ids with mid-string dots, UUIDs, single chars

Suite delta: +7 tests (33 active in finalize-design.test.ts).
Full daemon suite: 1085/1085.

Refs nexu-io/open-design#832

* fix(daemon): address PR #832 P1 findings — imported folders + network 502

Addresses two of the three P1 findings from @lefarcen on PR #832:

1. Imported-folder projects route DESIGN.md to metadata.baseDir
   (https://github.com/nexu-io/open-design/pull/832#discussion_r3202512656,
   also flagged independently by @chatgpt-codex-connector at #discussion_r3202430470)

   The pipeline previously called `projectDir(projectsRoot, projectId)`
   unconditionally, which resolves to `.od/projects/<id>`. For projects
   created via /api/import/folder the project row's `metadata.baseDir`
   carries the user's actual folder; without threading metadata through,
   finalize would silently land DESIGN.md in the hidden daemon data dir
   and the current-artifact resolver would miss the user's real files.

   Fix: switch from `projectDir` to `resolveProjectDir(projectsRoot,
   projectId, metadata)` in both `finalizeDesignPackage` and
   `resolveCurrentArtifact`. Thread `project.metadata` (from
   `getProject`'s normalized row) through both call paths. The resolver
   gets a new optional `metadata` parameter; native projects pass null
   and get identical behavior.

2. Network failures and JSON parse errors now map to 502 UPSTREAM_FAILED
   (https://github.com/nexu-io/open-design/pull/832#discussion_r3202512661)

   Pre-fix, only HTTP-non-OK responses were wrapped as
   FinalizeUpstreamError. DNS failures (ECONNREFUSED, ENOTFOUND), fetch
   TypeErrors, and `response.json()` SyntaxErrors fell through to the
   route's catch-all and surfaced as 500 INTERNAL — incorrect: those are
   upstream-level failures, not daemon bugs.

   Fix:
   - Wrap callAnthropicWithRetry in a try/catch that passes
     FinalizeUpstreamError and AbortError through verbatim, but rewraps
     any other thrown error as FinalizeUpstreamError(502, '', message).
   - Wrap response.json() in a try/catch that rewraps SyntaxError as
     FinalizeUpstreamError(502, '', "upstream Anthropic returned non-JSON
     body: ...").
   - The route handler's existing FinalizeUpstreamError mapping then
     correctly maps these to 502 with the message in `details` (run
     through redactSecrets first).
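A sketch of the two rewrap points. fetchUpstreamJson is a hypothetical
helper that collapses the fetch and parse steps into one function, the error
class shape is assumed, and the retry loop is omitted.

```typescript
class FinalizeUpstreamError extends Error {
  status: number;
  body: string;
  constructor(status: number, body: string, message?: string) {
    super(message ?? `upstream Anthropic returned HTTP ${status}`);
    this.name = "FinalizeUpstreamError";
    this.status = status;
    this.body = body;
  }
}

const messageOf = (err: unknown): string => (err instanceof Error ? err.message : String(err));

async function fetchUpstreamJson(fetchImpl: typeof fetch, url: string, init: RequestInit): Promise<unknown> {
  // Network-level failures (DNS, fetch TypeErrors) become upstream 502s;
  // our own upstream errors and aborts pass through unchanged.
  const res = await fetchImpl(url, init).catch((err: unknown) => {
    const name = (err as { name?: string } | null)?.name;
    if (name === "FinalizeUpstreamError" || name === "AbortError") throw err;
    throw new FinalizeUpstreamError(502, "", messageOf(err));
  });
  if (!res.ok) throw new FinalizeUpstreamError(res.status, await res.text());
  try {
    return await res.json();
  } catch (err) {
    // A 200 with a non-JSON body is an upstream fault, not a daemon bug.
    throw new FinalizeUpstreamError(502, "", `upstream Anthropic returned non-JSON body: ${messageOf(err)}`);
  }
}
```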

Tests:
- "writes DESIGN.md under metadata.baseDir for imported-folder projects":
  inserts a project row with metadata.baseDir pointing at a
  user-folder temp dir; asserts result.designMdPath lands there AND
  the hidden .od/projects/<id> dir does NOT contain a DESIGN.md.
- "rewraps fetch network rejection as FinalizeUpstreamError(502)":
  fetchImpl throws TypeError with cause.code='ENOTFOUND'; assert thrown
  error has name=FinalizeUpstreamError and status=502.
- "rewraps 200 with non-JSON body as FinalizeUpstreamError(502)":
  fetchImpl returns 200 with text/html body; response.json() throws
  SyntaxError internally; assert FinalizeUpstreamError(502).

Suite delta: +3 tests (36 active in finalize-design.test.ts).
Full daemon suite: green at last check; will re-verify before push.

Refs nexu-io/open-design#832

* refactor(daemon): move /finalize DTOs to contracts + map error codes + validate active-tab

Addresses the P2 and P3 findings from @lefarcen on PR #832:

P2 — Error codes + DTOs not in packages/contracts
  https://github.com/nexu-io/open-design/pull/832#discussion_r3202512673

  Reverses my plan's locked decision #10 ("no contracts changes in this
  PR; inline the request/response types"). That rule came from the
  predecessor PROMPT brief's anti-pattern table; @lefarcen's review is
  fresher signal and supersedes it. Drift risk between the daemon's
  inline types and any future PR 3 web client is real.

  - New contracts module: packages/contracts/src/api/finalize.ts with
    FinalizeAnthropicRequest / FinalizeArtifactRef /
    FinalizeAnthropicResponse. Re-exported from the package root and
    made addressable via `@open-design/contracts/api/finalize` subpath.
  - Daemon source imports the canonical types from contracts and
    re-exports the public type names so internal references keep
    working without touching every call site.
  - Daemon-local error codes remapped to existing ApiErrorCode union
    members (apps/daemon/src/server.ts), per @lefarcen's suggested
    mapping:
      FINALIZE_IN_PROGRESS -> CONFLICT
      AUTH_FAILED          -> UNAUTHORIZED
      UPSTREAM_FAILED      -> UPSTREAM_UNAVAILABLE
      TIMEOUT              -> UPSTREAM_UNAVAILABLE (status 503)
      INTERNAL             -> INTERNAL_ERROR
    HTTP status codes are unchanged; only the `code` field in the
    error JSON body changed.
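A sketch of the code remapping as a typed lookup table. `LegacyFinalizeCode`,
`ApiErrorCodeSubset`, `FINALIZE_CODE_MAP`, and `toErrorBody` are hypothetical
names, and the `{ error: { code, message } }` body shape is an assumption;
the real `ApiErrorCode` union in contracts has more members.

```typescript
// Daemon-local codes used before this change.
type LegacyFinalizeCode =
  | "FINALIZE_IN_PROGRESS"
  | "AUTH_FAILED"
  | "UPSTREAM_FAILED"
  | "TIMEOUT"
  | "INTERNAL";

// Only the ApiErrorCode members named in the mapping above.
type ApiErrorCodeSubset =
  | "CONFLICT"
  | "UNAUTHORIZED"
  | "UPSTREAM_UNAVAILABLE"
  | "INTERNAL_ERROR";

// Record<> forces every legacy code to have exactly one target.
const FINALIZE_CODE_MAP: Record<LegacyFinalizeCode, ApiErrorCodeSubset> = {
  FINALIZE_IN_PROGRESS: "CONFLICT",
  AUTH_FAILED: "UNAUTHORIZED",
  UPSTREAM_FAILED: "UPSTREAM_UNAVAILABLE",
  TIMEOUT: "UPSTREAM_UNAVAILABLE",
  INTERNAL: "INTERNAL_ERROR",
};

// The HTTP status is passed through unchanged; only `code` is remapped.
function toErrorBody(legacy: LegacyFinalizeCode, status: number, message: string) {
  return { status, body: { error: { code: FINALIZE_CODE_MAP[legacy], message } } };
}
```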

P3 — Active-tab name not validated before sidecar probe
  https://github.com/nexu-io/open-design/pull/832#discussion_r3202512684

  resolveCurrentArtifact now runs the active tab's name through
  validateProjectPath BEFORE composing it into a path.join expression.
  An invalid tab (traversal segments, absolute path, null byte,
  reserved segment) causes resolveCurrentArtifact to fall through to
  the newest-artifact branch rather than abort or probe outside the
  project directory.
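A sketch of the validate-then-join ordering, assuming `validateProjectPath`
returns `null` for unsafe names. The helper bodies, the reserved-segment list
(`.od`, `.git`), and the injected `exists` / `newestArtifact` callbacks are
illustrative assumptions, not the daemon's signatures.

```typescript
import * as path from "node:path";

// Assumed reserved segments; the daemon's actual list may differ.
const RESERVED_SEGMENTS = new Set([".od", ".git"]);

// Hypothetical stand-in: returns a normalized relative path, or null if unsafe.
function validateProjectPath(name: string): string | null {
  if (name.length === 0 || name.includes("\0")) return null; // empty or null byte
  if (path.isAbsolute(name)) return null; // absolute path
  const segments = name.split(/[\\/]/);
  if (segments.some((s) => s === ".." || RESERVED_SEGMENTS.has(s))) return null;
  return path.normalize(name);
}

// Validate BEFORE path.join; an invalid tab falls through to the newest
// artifact instead of throwing or probing outside projectDir.
function resolveCurrentArtifact(
  projectDir: string,
  activeTabName: string | null,
  exists: (absPath: string) => boolean,
  newestArtifact: () => string | null,
): string | null {
  if (activeTabName !== null) {
    const safe = validateProjectPath(activeTabName);
    if (safe !== null && exists(path.join(projectDir, safe))) return safe;
  }
  return newestArtifact();
}
```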

Tests:
- "falls through (does not throw) when active tab name contains
  traversal segments": injects a malformed `tabs.name =
  '../../../etc/passwd'` row directly via SQL (bypassing production
  tab-creation validation), seeds a real artifact, asserts the
  resolver returns the real artifact rather than the malformed name.

Suite delta: +1 test (37 active in finalize-design.test.ts).
Full daemon suite: 1089/1089 green.

Refs nexu-io/open-design#832

* fix(contracts): publish /api/finalize as standalone runtime entrypoint

Addresses @mrcfps's CI-red review on PR #832
(https://github.com/nexu-io/open-design/pull/832, inline comment on
packages/contracts/package.json).

The previous J3 commit added `./api/finalize` as a type-only subpath:
the entry had only a `types` field, no `default`. That broke the
contracts package-runtime gate (packages/contracts/tests/package-
runtime.test.ts:38-47) which asserts every exports entry exposes both
a `.mjs` runtime and a `.d.ts` types target. mrcfps proposed two fixes;
this commit takes path B — make finalize a first-class published
module rather than a type-only re-export from the package root.
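The package-runtime gate's rule can be sketched as a check over the exports
map. `findIncompleteExports` is a hypothetical reimplementation of the rule
stated above, not the code in package-runtime.test.ts; the example paths
follow the `./dist/api/finalize.*` layout named in this message.

```typescript
// One conditional exports entry, or a bare string target.
type ExportsEntry = { types?: string; default?: string } | string;

// Return every subpath that lacks either a `.mjs` runtime target or a
// `.d.ts` types target (a bare string has no types target, so it fails).
function findIncompleteExports(exportsMap: Record<string, ExportsEntry>): string[] {
  const bad: string[] = [];
  for (const [subpath, entry] of Object.entries(exportsMap)) {
    if (typeof entry === "string") {
      bad.push(subpath);
      continue;
    }
    const hasRuntime = entry.default?.endsWith(".mjs") ?? false;
    const hasTypes = entry.types?.endsWith(".d.ts") ?? false;
    if (!hasRuntime || !hasTypes) bad.push(subpath);
  }
  return bad;
}
```

A type-only `./api/finalize` entry (only `types`, no `default`) is exactly
what this check reports.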

Why path B over path A (confirmed by a peer-AI second opinion via
/collaborate): under NodeNext + ESM with exports-map semantics, TypeScript
validates re-exported symbols against the published module-identity
surface. Because the previous J3 had `./api/finalize` neither declared as
an exports-map entry nor materialized as a standalone .mjs, TS omitted
the re-exported names during package boundary analysis. At runtime,
`FINALIZE_SCHEMA_VERSION` was still reachable via
`import('@open-design/contracts')` from the bundled index.mjs, but the
type-checker rejected it. Path B aligns the runtime and declaration
surfaces.

Changes:
- packages/contracts/esbuild.config.mjs: add `./src/api/finalize.ts`
  to entryPoints so dist/api/finalize.mjs is generated as a standalone
  module rather than only inlined into the bundled root.
- packages/contracts/package.json: re-add `./api/finalize` to the
  exports map with both `default: ./dist/api/finalize.mjs` AND
  `types: ./dist/api/finalize.d.ts`. Mirrors `./api/connectionTest`'s
  shape (the canonical pattern for first-class submodule entries).
- packages/contracts/src/api/finalize.ts: keep the runtime export
  `FINALIZE_SCHEMA_VERSION = 1` (giving the standalone module a real
  value to emit beyond the type-only interfaces) and update the
  doc-comment now that the standalone .mjs is wired.
- apps/daemon/src/finalize-design.ts: switch the type import from
  the inline declarations introduced in the prior J3 fallback to
  `import type { ... } from '@open-design/contracts/api/finalize'`.
  Re-export the names so internal references inside finalize-design.ts
  keep working without touching every call site.

Verified:
- node --input-type=module -e "import('@open-design/contracts/api/finalize').then(m=>console.log(JSON.stringify(Object.keys(m))))"
  prints ["FINALIZE_SCHEMA_VERSION"] — runtime resolution clean.
- pnpm --filter @open-design/contracts test: 6/6 (including both
  package-runtime.test.ts cases on the rebuilt exports map).
- pnpm --filter @open-design/daemon typecheck: exits 0.
- pnpm --filter @open-design/daemon test: 1089/1089 (no regression vs
  the prior J3 number).

Refs nexu-io/open-design#832

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
2026-05-08 19:52:11 +08:00
Tom Huang
d592f6087f
feat(mcp): external MCP client with daemon-managed OAuth and 39 design-focused templates (#898)
* feat(mcp): add external MCP client with daemon-managed OAuth and 17 design-focused templates

Open Design now acts as an MCP CLIENT and surfaces tools from third-party
MCP servers to the underlying agent (Claude Code, Hermes, Kimi).

Daemon
- New mcp-config / mcp-oauth / mcp-tokens modules: persist server entries
  to .od/mcp-config.json, run the OAuth dance for HTTP/SSE servers
  end-to-end on the daemon (so cloud deployments work and tokens
  survive across turns), and inject Authorization: Bearer headers into
  the per-spawn .mcp.json the daemon writes for Claude Code (or the
  ACP mcpServers map for Hermes/Kimi).
- /api/mcp/servers and /api/mcp/oauth/{start,status,disconnect}
  endpoints, plus spawn-time wiring in agents that hands the configured
  servers to the active agent CLI.
- System-prompt directive for connected external MCPs so the model
  does not chase Claude Code's synthetic *_authenticate /
  *_complete_authentication tools when the Bearer is already pinned.
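A sketch of the spawn-time Bearer injection. The `StoredServer` shape and
`buildSpawnMcpJson` are hypothetical; the output assumes Claude Code's
project-scoped `.mcp.json` layout (an `mcpServers` map whose remote entries
carry `type`, `url`, and `headers`). The ACP path for Hermes/Kimi is not
covered.

```typescript
// Hypothetical stored entry; field names are illustrative, not McpServerConfig.
interface StoredServer {
  name: string;
  transport: "http" | "sse" | "stdio";
  url?: string;
  command?: string;
  args?: string[];
}

// Remote entries carry type/url/headers; local entries carry command/args.
type SpawnEntry =
  | { type: "http" | "sse"; url: string; headers?: Record<string, string> }
  | { command: string; args: string[] };

// Build the mcpServers map for the per-spawn .mcp.json, pinning
// `Authorization: Bearer <token>` on remote servers that have a stored token.
function buildSpawnMcpJson(
  servers: StoredServer[],
  tokens: Map<string, string>,
): { mcpServers: Record<string, SpawnEntry> } {
  const mcpServers: Record<string, SpawnEntry> = {};
  for (const s of servers) {
    if (s.transport === "stdio") {
      if (s.command) mcpServers[s.name] = { command: s.command, args: s.args ?? [] };
      continue;
    }
    if (!s.url) continue; // a remote entry without a URL cannot be spawned
    const type: "http" | "sse" = s.transport === "sse" ? "sse" : "http";
    const token = tokens.get(s.name);
    mcpServers[s.name] = token
      ? { type, url: s.url, headers: { Authorization: `Bearer ${token}` } }
      : { type, url: s.url };
  }
  return { mcpServers };
}
```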

Web
- Settings -> External MCP servers panel with per-row OAuth Connect /
  Disconnect / Refresh affordances and per-row template hints.
- New "Add server" picker categorized into 7 groups
  (image-generation, image-editing, web-capture, ui-components,
  data-viz, publishing, utilities) with a search box, sticky close
  button, collapsible <details> sections (auto-expand on search),
  60vh capped scroll region, and a pinned Custom-server footer.
- ChatComposer /mcp slash and MCP picker button forward to the new
  Settings tab; AssistantMessage renders MCP tool calls inline;
  markdown autolinker handles bare http(s) URLs (incl. OAuth links)
  before italic markers so OAuth callback URLs do not get
  italic-fragmented mid-token.

Contracts
- packages/contracts/src/api/mcp.ts owns the wire shapes
  (McpServerConfig, McpTemplate with stable McpTemplateCategory
  enum, McpServersResponse, OAuth start/status/disconnect bodies, the
  postMessage payload from the OAuth callback).

Templates (17 built-in)
- image-generation: Higgsfield (OpenClaw, OAuth HTTP), Pollinations,
  Allyson (animated SVG), AWS Bedrock Image (uvx).
- image-editing: Imagician, ImageSorcery.
- web-capture: just-every screenshot-website-fast, ScreenshotOne.
- ui-components: 21st.dev Magic, shadcn/ui, FlyonUI.
- data-viz: AntV Chart, Mermaid.
- publishing: EdgeOne Pages.
- utilities: Filesystem, GitHub, Fetch.

Tests
- apps/daemon/tests/mcp-{config,oauth,tokens,spawn}.test.ts cover
  storage round-trip, OAuth helpers, token persistence, spawn-time
  wiring, every template's transport / command / args / env-field
  invariants, and the canonical category enum.
- apps/web/tests/runtime/markdown.test.tsx covers the new autolinker
  ordering rules.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(mcp): add 21 more design-focused templates and a `design-systems` category

Expands the built-in MCP picker from 17 to 38 templates so users can compose
the full Open Design craft loop (design-system intake → generate → edit →
audit → publish) without leaving the Settings dialog. Every install spec is
verified live against the upstream README; templates that needed Go binaries,
multi-step `init` ceremonies, or massive runtime stacks (PostgreSQL + Redis
+ Ollama) are intentionally deferred so picking a template still resolves to
a working server in one click.

New `design-systems` category between `web-capture` and `ui-components`
(reflects the upstream-of-components position in the workflow). Mirrored in
`McpTemplateCategory` on both contracts and daemon, and `CATEGORY_ORDER` on
the web side.

New templates by category:

- image-generation (+4): prompt-to-asset (icons / favicons / OG / logos with
  free-tier routing across Cloudflare AI / NVIDIA NIM / HF / Stable Horde),
  Nano Banana (hosted streamable HTTP, virtual try-on + product placement),
  Seedream (hosted streamable HTTP, ByteDance Seedream v3-v5 + SeedEdit),
  fal.ai (uvx, 600+ models incl. FLUX / Kling / Hunyuan / MusicGen).
- image-editing (+3): Photopea (34 layered-editor tools — closes the PSD
  gap), Topaz Labs (AI upscale / denoise / sharpen), Transloadit (86+ media
  pipeline robots).
- web-capture (+1): Pagecast (browser → demo GIF / MP4 with auto-zoom).
- design-systems (+4, NEW category): Figma-Context (Framelink, designs →
  code), Design Token Bridge (Tailwind ⇄ CSS ⇄ Figma ⇄ M3 / SwiftUI / W3C
  DTCG + WCAG contrast), Design System Extractor (Storybook scrape),
  Aesthetics Wiki (cottagecore / dark-academia / y2k / … moodboards).
- data-viz (+2): MCP Dashboards (45+ chart types + KPI dashboards),
  Excalidraw Architect (hand-drawn architecture diagrams).
- publishing (+6): PageDrop, PDFSpark, OGForge, QRMint, Slideshot
  (HTML → PDF / PPTX / PNG with 7 themes), Deckrun (Markdown → PDF / video,
  hosted free tier with no key required).
- utilities (+1): A11y axe-core (WCAG 2.0/2.1/2.2 + color-contrast + ARIA).

Tests cover every new template's wiring (command, args, env / header
required-vs-optional, secret flag), the category enum invariant, and
in-category declaration order for image-generation, design-systems and
publishing buckets where the order is what users see in the picker. 21 new
test cases pass; full mcp-config suite is green.

Templates intentionally deferred (documented in PR body): figma-use
(needs Figma desktop with --remote-debugging-port=9222), m-moire (multi-step
`memi suite init` + daemon ceremony), gemini-media-mcp + trident-mcp (Go
binaries — no npx / uvx path), Pixelle-MCP (full app with web UI + ComfyUI
backend), storybook-addon-mcp (lives inside user's Storybook, not standalone),
primitiv (multi-step init / build / serve), ReftrixMCP (PostgreSQL + Redis +
Ollama + DINOv2), narasimhaponnada/mermaid (overlap with peng-shawn).

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(mcp): add figma-use template (write designs from chat) under design-systems

figma-use is the natural counterpart to Figma-Context already in this PR:
where Framelink reads Figma designs into the model, figma-use writes back
into the canvas (90+ tools — create frames / text / components / variants,
render JSX into Figma, export PNG/SVG, query nodes via XPath, lint for
WCAG / auto-layout / hardcoded colors, analyze design systems).

Wired as an HTTP MCP template (`http://localhost:38451/mcp`) because
`figma-use mcp serve` only exposes HTTP — there's no stdio mode in the
upstream `serve.ts`. No API key. Two prerequisites the user owns are
spelled out in the description so picking the template still resolves to
a working server: (1) start Figma with `--remote-debugging-port=9222`
(or `figma-use daemon start --pipe` on Figma 126+), and (2) leave
`npx figma-use mcp serve` running in a terminal.

Inserted between `design-system-extractor` and `aesthetics-wiki` so the
design-systems category reads as a workflow: read existing design (Figma
Context) → translate tokens (Token Bridge) → extract from Storybook
(Extractor) → write back to Figma (figma-use) → break creative block
(Aesthetics Wiki).

Tests cover the new template's transport (`http`), endpoint URL, the
empty header-fields invariant (no auth required), and bump the
design-systems group order to include it.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(settings): i18n the External MCP / MCP server / Connectors sidebar entries and make the dialog header track the active section

The External MCP sidebar entry this PR introduces was hardcoded English
("External MCP / Add MCP tools (Higgsfield, GitHub…)"). Same for the
adjacent Connectors and MCP server entries. The dialog header was also
pinned to "Execution & model" copy, so opening Settings → External MCP
showed a header that lied about which section the user was on.

Adds six translation keys — `settings.connectorsTitle/Hint`,
`settings.mcpServerTitle/Hint`, `settings.externalMcpTitle/Hint` — and
translates them across all 17 locales (ar, de, en, es-ES, fa, fr, hu, id,
ja, ko, pl, pt-BR, ru, tr, uk, zh-CN, zh-TW).

`SettingsDialog` now derives the header title/subtitle from the active
section (11 sections total) instead of a single hardcoded pair, so each
section renders an honest header.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(e2e): pin level: 3 on dialog heading lookups for Pets and Connectors

CI's Validate workspace job (#1479) failed two Playwright cases with the
strict-mode violation:

  getByRole('dialog').getByRole('heading', { name: 'Pets' })
  resolved to 2 elements:
    1) <h2>Pets</h2>
    2) <h3>Pets</h3>

Same root cause as the unit-test fix already in this PR: the dynamic
dialog `<h2>` now echoes the section's own `<h3>` because the dialog
header tracks the active section. Disambiguate to `level: 3` so each
assertion still pins the section heading specifically (which is what
the test intends to verify).

Audited the rest of e2e/ for `dialog.getByRole('heading', ...)`:
settings-api-protocol.test.ts looks for the "OpenAI API" / "Anthropic API"
section h3s, which never appear in the dialog `<h2>` (always
"Execution & model"), so those lookups stay safe.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mcp): bind OAuth refresh to the issuing client and skip stale tokens

Persist the OAuth client context (token endpoint, client_id, client_secret,
issuer, redirect_uri, resource) alongside the bearer token so refresh hits
the same client the refresh_token was bound to (RFC 6749 §6). The previous
refresh path re-ran beginAuth with a dummy OOB redirect URI, which kept
getOrRegisterClient from finding the original DCR client and made
providers reject the refresh on the next chat turn. Refreshes now reuse
the persisted endpoint/client pair directly.

Also stop injecting expired access tokens at spawn time when refresh is
unavailable or fails. Pinning a stale Bearer made every Claude MCP call
401 while the prompt still treated the server as connected; on that path
we now skip the entry and let the UI surface a reconnect.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)
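A sketch of refreshing against the persisted client context and skipping
stale tokens at spawn time. `StoredToken`, `buildRefreshBody`, and
`tokenForSpawn` are hypothetical; the form fields follow RFC 6749 §6, with
`client_secret` sent in the request body (one of the client-authentication
options RFC 6749 allows) and `resource` assumed to be an RFC 8707 resource
indicator. The injected `refresh` callback stands in for the actual HTTP POST.

```typescript
// Hypothetical persisted record: the bearer plus the client it was issued to.
interface StoredToken {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
  refreshToken?: string;
  tokenEndpoint: string;
  clientId: string;
  clientSecret?: string;
  resource?: string;
}

// RFC 6749 §6 refresh body built from the persisted client context,
// not from a freshly registered client.
function buildRefreshBody(t: StoredToken): URLSearchParams {
  const body = new URLSearchParams({
    grant_type: "refresh_token",
    refresh_token: t.refreshToken ?? "",
    client_id: t.clientId,
  });
  if (t.clientSecret) body.set("client_secret", t.clientSecret);
  if (t.resource) body.set("resource", t.resource);
  return body;
}

// Use a fresh token as-is; try to refresh an expired one; otherwise return
// null so the caller skips the server instead of pinning a stale Bearer.
async function tokenForSpawn(
  t: StoredToken,
  now: number,
  refresh: (endpoint: string, body: URLSearchParams) => Promise<string | null>,
): Promise<string | null> {
  if (t.expiresAt > now) return t.accessToken;
  if (!t.refreshToken) return null;
  try {
    return await refresh(t.tokenEndpoint, buildRefreshBody(t));
  } catch {
    return null;
  }
}
```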

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 17:59:20 +08:00
Tom Huang
56bf6ee1b6
feat: agent-callable research command and /search (#615)
* feat: pre-generation research (Tavily) for grounded generation

Adds an optional pre-generation research step so the agent can produce
slides / prototypes / decks grounded in real sources instead of guessing.

User flow:
  1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY).
  2. Click the new Research button in the chat composer.
  3. On send, the daemon runs a Tavily search, prepends the findings
     as a <research_context> block ahead of the system prompt, and
     spawns the agent. Research progress shows up as status pills in
     the chat stream; the agent cites sources inline as [1]/[2]/...

Phase 1 surface:
  - Single provider (Tavily), single depth ('shallow'), no LLM
    synthesis pass (Tavily's `answer` is the summary).
  - Composer toggle only; no popover / depth picker yet.
  - Reuses the existing `status` SSE agent payload + StatusPill UI
    so no new event variants or renderer code are needed.

Layers touched:
  - contracts: ResearchOptions / Source / Findings DTOs;
    ChatRequest.research; export from index.
  - daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator
    + provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook
    in startChatRun before prompt assembly.
  - web: ChatComposer toggle + ChatSendMeta; threaded through
    ChatPane / ProjectView / streamViaDaemon into ChatRequest.

Side fix (required to land the feature, but useful on its own):
  contracts internal relative imports lacked the `.js` suffix that
  NodeNext module resolution requires. This was already breaking
  `pnpm --filter @open-design/daemon typecheck` on main; without the
  fix, none of the new research types were visible to the daemon.
  All internal contracts imports now carry `.js`.

Spec: specs/current/research-feature.md (phases 2-4 outlined for
follow-up: composer popover, multi-provider, deep recursion, example
skills with research_recommends).

Verified:
  - pnpm --filter @open-design/contracts typecheck/test
  - pnpm --filter @open-design/daemon typecheck (the chokidar
    project-watchers test is a pre-existing flake, unrelated)
  - pnpm --filter @open-design/web typecheck
  - node scripts/verify-media-models.mjs

* fix(daemon): clamp Tavily max_results to 20

Tavily's /search endpoint requires `max_results` in [0, 20]; sending a
larger value (e.g. when `research.depth: "deep"` resolves to 30) returns
400 and `runResearch` silently falls back to no-research. Clamp at the
provider boundary so Phase 2 depth tiers above 20 still produce results
instead of failing the request.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)
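A one-function sketch of the clamp at the provider boundary. `clampMaxResults`
is a hypothetical helper name; the [0, 20] bound comes from this commit
message, and mapping `NaN` to the cap is an arbitrary choice for the sketch.

```typescript
// Upper bound for Tavily /search max_results, per the note above.
const TAVILY_MAX_RESULTS = 20;

// Clamp a depth-derived request (e.g. 30 for "deep") into [0, 20] so the
// provider call succeeds instead of returning 400.
function clampMaxResults(requested: number): number {
  if (Number.isNaN(requested)) return TAVILY_MAX_RESULTS;
  return Math.min(TAVILY_MAX_RESULTS, Math.max(0, Math.floor(requested)));
}
```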

* Remove stale research merge leftovers

* Add agent-callable research search

* Fix Indonesian locale typecheck

* Fix research command invocation edge cases

* Harden slash search prompt expansion

* Honor research source caps in command contract

* Require search reports in design files

* Add research data provider settings

* Wire web research provider fallback order

* Update research provider fallback wording

* Revert "Update research provider fallback wording"

This reverts commit 86fb6001e3.

* Revert "Wire web research provider fallback order"

This reverts commit 4c9e16036b.

* Revert "Add research data provider settings"

This reverts commit 23630d1746.

* Add Dexter and Last30Days research skills

* Add DCF and Last30Days OD skills

* Add Last30Days and Dexter skills

* Resolve research review threads

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
2026-05-08 10:33:44 +08:00
monshunter
e6e5928be1
feat(web): add connection tests for execution settings (#507)
* feat(settings): add connection test for providers and CLI agents

Adds a "Test" action in the Settings dialog that verifies the configured
provider (Anthropic/OpenAI/Azure/Google) or CLI agent without sending a
real chat. Backed by a new daemon endpoint and shared contracts, with
categorized inline statuses and i18n strings across all supported locales.

* fix(settings): address connection test review feedback

* fix(daemon): pass empty MCP servers for connection probes

* fix(connection-test): address review blockers

* fix(daemon): fail json stream runs on structured errors

* fix(contracts): build connection test subpath export

* Use draft CLI env in agent connection tests

* fix(i18n): add fallback ids for new curated content
2026-05-07 11:25:37 +08:00
Marc Chan
c3d9136a0c
Add live artifacts and Composio connector catalog (#381)
* docs: add live artifacts implementation spec

* docs: align live artifacts implementation plan

* Ralph iterations 1-65: work in progress

* Ralph iterations 1-6, 8, 9, 17: work in progress

* Add Composio-backed connectors

* Add Composio-backed connector catalog

* Fix connector callback flow

* Update live artifact connector refresh

* Fix live artifact refresh updates

* Improve live artifact viewer toolbar

* Refine live artifact source tabs

* Expand Composio connector catalog

* Improve Composio connector browsing

* Fix artifact refresh source safety checks

Generated-By: looper 0.4.1 (runner=fixer, agent=opencode)

* Fix live artifacts PR feedback

Generated-By: looper 0.5.0 (runner=fixer, agent=opencode)

* Fix live artifact preview CORS validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix connector OAuth IPv6 loopback hosts

Allow bracketed IPv6 loopback Host headers when deriving connector OAuth callback URLs so IPv6-bound daemons can complete the connection flow.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
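A sketch of loopback detection that accepts the bracketed IPv6 `Host` form.
`hostnameFromHostHeader` and `isLoopbackHost` are hypothetical names, and
treating `localhost`, `127.0.0.0/8`, and `::1` as loopback is an assumption
about the daemon's policy.

```typescript
// Extract the hostname from a Host header: "[::1]:7456" -> "::1",
// "localhost:7456" -> "localhost", "127.0.0.1" -> "127.0.0.1".
function hostnameFromHostHeader(host: string): string {
  if (host.startsWith("[")) {
    const end = host.indexOf("]");
    return end === -1 ? host : host.slice(1, end); // unterminated bracket stays as-is
  }
  const colon = host.lastIndexOf(":");
  return colon === -1 ? host : host.slice(0, colon);
}

function isLoopbackHost(host: string): boolean {
  const name = hostnameFromHostHeader(host).toLowerCase();
  return name === "localhost" || name === "::1" || /^127(\.\d{1,3}){3}$/.test(name);
}
```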

* Preserve live artifact refresh permissions

Respect explicit refresh permission choices during live artifact create and update flows so revoked connector sources remain gated.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix live artifact preview cache freshness

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix live artifact refresh validation

Guard manual refreshes with local daemon checks and reject daemon_tool sources without a toolName before refresh execution.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix Composio credential invalidation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix live artifact CORS methods

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix workspace validation

Restore media config test isolation under Vitest setup data-dir overrides and add the missing French live artifact display copy so the workspace test suite stays aligned.

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix connector safety filtering

Keep agent-preview connector listings aligned with execution safety policy and prune stale Composio OAuth state records before they accumulate.

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix agent runtime cleanup

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix live artifact daemon access

Validate local-only live artifact routes against the peer socket address and pass daemon-resolved CLI paths to ACP MCP descriptors.

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix connector run limit pruning

Evict stale connector rate-limit buckets so long-lived daemon processes do not retain per-run entries indefinitely.

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)
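A sketch of the eviction, assuming fixed-window buckets keyed by run.
`Bucket` and `pruneStaleBuckets` are hypothetical; the real limiter's bucket
shape and window length are not stated in this message.

```typescript
// Hypothetical per-run rate-limit bucket.
interface Bucket {
  count: number;
  windowStart: number; // epoch milliseconds
}

// Drop buckets whose window has fully elapsed so a long-lived daemon does not
// keep one entry per historical run. Deleting during Map iteration is safe in JS.
function pruneStaleBuckets(buckets: Map<string, Bucket>, now: number, windowMs: number): number {
  let removed = 0;
  for (const [key, bucket] of buckets) {
    if (now - bucket.windowStart >= windowMs) {
      buckets.delete(key);
      removed++;
    }
  }
  return removed;
}
```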

* Fix connector compact schemas

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Improve connector connection feedback

* Adjust connector gate positioning

* Fix live artifact refresh commits

Avoid marking refresh candidates failed after snapshot or state persistence errors by deferring live artifact mutations until the durable refresh metadata is written. Also align connector OAuth callback host validation with daemon loopback handling.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* Improve connector search relevance

* fix(daemon): harden connector connection state

Require loopback daemon validation before connector connect side effects and only clear provider-owned connector statuses during credential reset.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): guard connector disconnect route

Require local daemon request validation before connector disconnect side effects.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): guard composio config updates

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): dispatch live artifacts mcp first

Route the live-artifacts MCP server before the generic MCP CLI so `od mcp live-artifacts` starts the dedicated server instead of failing generic argument parsing.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): handle integer connector schemas

Allow JSON Schema integer connector inputs while preserving fractional-value validation so generated connector tool schemas accept valid page sizes and limits.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)
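A sketch of the integer handling for a minimal JSON Schema subset.
`NumericSchema` and `validateNumeric` are hypothetical and cover only `type`,
`minimum`, and `maximum`; JSON Schema itself defines `integer` as a number
with a zero fractional part.

```typescript
// Hypothetical numeric subset of a connector tool input schema.
type NumericSchema = { type: "integer" | "number"; minimum?: number; maximum?: number };

// "integer" accepts whole numbers and still rejects fractional values;
// "number" accepts any finite value. Bounds apply to both.
function validateNumeric(schema: NumericSchema, value: unknown): boolean {
  if (typeof value !== "number" || !Number.isFinite(value)) return false;
  if (schema.type === "integer" && !Number.isInteger(value)) return false;
  if (schema.minimum !== undefined && value < schema.minimum) return false;
  if (schema.maximum !== undefined && value > schema.maximum) return false;
  return true;
}
```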

* fix: align live artifact refresh error codes

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* Fix live artifact connector refresh flow

* Update live artifact design cards

* Add beta badge to live artifact form

* Remove live artifact tile model

* Fix live artifact refresh sync

* Fix live artifact MCP refresh durability

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* Fix live artifact refresh safety

Enforce persisted refresh opt-out and connector auto-read gating before refresh sources execute.

Generated-By: looper 0.5.5 (runner=fixer, agent=opencode)
2026-05-05 16:42:11 +08:00
Nagendhra Madishetti
76e6c7a9f6
feat: Critique Theater Phase 4 (persistence + transcript + orchestrator) (#481)
* docs(specs): add Critique Theater design spec for panel-tempered artifacts

* docs(specs): add Critique Theater implementation plan

* docs(specs): rename UI to Design Jury, add lane-density modes, ship-rule explainer, label sizing

* feat(contracts): add CritiqueConfig schema and defaults

* fix(contracts): apply Task 1.1 review (CRITIQUE_PROTOCOL_VERSION rename, descriptions, RoleWeights export)

* feat(contracts): add PanelEvent discriminated union and isPanelEvent guard

* fix(contracts): apply Task 1.2 review (exhaustive event-type list, runId guard, import order)

* feat(contracts): add CritiqueSseEvent variants and panelEventToSse mapper

* test(daemon): add v1 wire-protocol golden fixtures for Critique Theater parser

* feat(daemon): add v1 streaming parser for Critique Theater wire protocol

* chore(contracts): add .js extensions to relative imports for NodeNext consumers

* fix(daemon): satisfy noUncheckedIndexedAccess in v1 parser regex match access

* test(daemon): cover parser failure modes; fix unclosed-PANELIST swallow bug

* fix(daemon,contracts): address PR #387 review

- parser now clamps panelist + DIM scores against the run-declared scale
  captured from <CRITIQUE_RUN scale=...>, not a hardcoded 100
- PANELIST appearing before any <ROUND n=...> opens now throws
  MalformedBlockError rather than emitting events with NaN round
- DIM_RE and MUST_FIX_RE hoisted to module scope and lastIndex reset per
  call so the parser hot path stops recompiling regex per artifact
- overflow check after drain simplified to a plain buf.length > cap test
  (the prior compound condition was always true on the right side and
  obscured intent)
- scoreThreshold <= scoreScale refine gains a 1e-9 epsilon so floating
  slack does not reject semantically valid configs
- round-1 designer ARTIFACT guard gains a comment naming the spec
  invariant and the v2 relaxation path
- 3 new regression tests cover the panelist-without-round, scale=10
  clamp, and scale=20 plumbing cases
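A sketch of the two numeric fixes above: clamping to the run-declared scale
and the epsilon on the threshold refine. `thresholdWithinScale` and
`clampScore` are hypothetical names, and the lower bound of 0 is an
assumption.

```typescript
// Floating-point slack allowed when comparing threshold to scale.
const SCALE_EPSILON = 1e-9;

// Config refine: the threshold may not exceed the scale, give or take
// SCALE_EPSILON, so 0.1 + 0.2 against 0.3 is not rejected.
function thresholdWithinScale(scoreThreshold: number, scoreScale: number): boolean {
  return scoreThreshold <= scoreScale + SCALE_EPSILON;
}

// Clamp a parsed panelist or DIM score to the scale declared by
// <CRITIQUE_RUN scale=...> instead of a hardcoded 100.
function clampScore(score: number, runScale: number): number {
  return Math.min(runScale, Math.max(0, score));
}
```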

* docs(specs): rationale for non-goals, failure-mode rate targets, Phase 10 matrix, Phase 14 doc layout

* Merge branch 'main' into feat/critique-theater

Resolves the contracts/index.ts conflict by keeping the .js extensions added
by chore(contracts) 2d6e8d6 and slotting in the new export for ./api/app-config
introduced upstream by #255 (9d700ec). Critique Theater additions
(./sse/critique, ./critique) preserved in their original positions.

Verified after merge:
  pnpm --filter @open-design/contracts test    -> 10/10 pass
  pnpm --filter @open-design/contracts typecheck -> exit 0
  pnpm --filter @open-design/daemon typecheck  -> exit 0
  pnpm --filter @open-design/web typecheck     -> exit 0

Two daemon tests in tests/media-config.test.ts fail both before and after the
merge because they read real OAuth credentials from the developer machine
instead of using mock fixtures. That's an upstream isolation issue on
origin/main, not something this branch introduces.

* fix: unblock web build and address mrcfps PANELIST oversize bypass

The chore commit that added .js extensions to satisfy daemon's nodenext
typecheck broke apps/web's Next.js build, because webpack tried to resolve
the literal ./common.js when only common.ts exists on disk. Replaced with
a subpath approach: contracts/exports gains a './critique' entry pointing
straight at src/critique.ts (which has no relative imports), and daemon
imports route through @open-design/contracts/critique instead of the
barrel. Web keeps the bundler-friendly barrel; daemon's nodenext walks
only the leaf module. All 13 contracts source files reverted to no-.js.

Separately, mrcfps flagged that parserMaxBlockBytes was only enforced on
the leftover buffer after drain returned, so a complete oversized block
arriving in one chunk slipped past the cap. Added an explicit per-block
size check inside drain for every buffered block type (PANELIST,
ROUND_END, SHIP). Three regression tests yield the whole stream as a
single chunk and assert OversizeBlockError fires before any events emit.

* fix(daemon): close three v1 parser invariant gaps from mrcfps review

Three independent gaps that all let malformed or oversized protocol
output pass the v1 envelope contract:

(1) Envelope guard. ROUND, PANELIST, ROUND_END, and SHIP now throw
MalformedBlockError when state.inRun is false. Without this, a stream
that omits <CRITIQUE_RUN> could still emit panelist_* events without
the run_started handshake, leaving downstream reducers with no run-level
config.

(2) UTF-8 byte length. Both the per-block size check and the post-drain
buf-size check now compare Buffer.byteLength(text, 'utf8') against
parserMaxBlockBytes. The previous string-length comparison let multibyte
content (CJK, emoji) inside <NOTES>/<SUMMARY> exceed the configured
byte cap while staying under the JS string length cap, bypassing the
daemon's resource guard.

(3) Header-end ordering. PANELIST, ROUND_END, and SHIP now require the
opener's > to appear before the matched closing tag. A malformed opener
like <PANELIST role="x" score="8"</PANELIST> previously fell through
to the closing tag's > and emitted events for an invalid block.

Four regression tests cover each gap (ROUND-without-run,
SHIP-without-run, multibyte-byte-cap, malformed-opener).
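A minimal TypeScript sketch of gaps (2) and (3), for illustration only. The error classes are local stand-ins, assertWithinByteCap and assertOpenerEndsBeforeClose are hypothetical names, and utf8ByteLength uses TextEncoder, which counts the same UTF-8 bytes as Buffer.byteLength(text, 'utf8'):

```typescript
// Local stand-ins for the parser's error classes (not the daemon's exports).
class OversizeBlockError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "OversizeBlockError";
  }
}

class MalformedBlockError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MalformedBlockError";
  }
}

const utf8 = new TextEncoder();

// UTF-8 byte count; same result as Buffer.byteLength(text, 'utf8') in Node.
function utf8ByteLength(text: string): number {
  return utf8.encode(text).length;
}

// Gap (2): compare encoded bytes, not UTF-16 code units, against the cap.
function assertWithinByteCap(block: string, maxBlockBytes: number): void {
  const bytes = utf8ByteLength(block);
  if (bytes > maxBlockBytes) {
    throw new OversizeBlockError(`block is ${bytes} bytes; cap is ${maxBlockBytes}`);
  }
}

// Gap (3): the opener's '>' must appear before the matched closing tag;
// otherwise the first '>' found is the one belonging to the closer itself.
function assertOpenerEndsBeforeClose(block: string, tag: string): void {
  const closeIdx = block.indexOf(`</${tag}>`);
  const headerEnd = block.indexOf(">");
  if (closeIdx === -1 || headerEnd === -1 || headerEnd > closeIdx) {
    throw new MalformedBlockError(`<${tag}> opener is not terminated before </${tag}>`);
  }
}

// "你好" is 2 UTF-16 code units but 6 UTF-8 bytes.
console.log("你好".length, utf8ByteLength("你好")); // 2 6
```

With a 10-byte cap, "你好你好" has a string length of only 4, so a length-based check would accept it, while the byte-based check rejects its 12 bytes.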

* feat(daemon): add critique_runs persistence (Task 4.1)

Introduces a new SQLite table critique_runs to back the orchestrator's
run lifecycle. Plan called for ALTER TABLE artifacts ADD COLUMN ..., but
artifacts is not a DB concept in this repo; runs get their own table.

- migrateCritique(db) creates the table + two indexes idempotently and
  is wired into the existing migrate(db) flow on daemon boot.
- CRUD helpers (insertCritiqueRun, getCritiqueRun, updateCritiqueRun,
  listCritiqueRunsByProject, deleteCritiqueRun) round-trip rounds_json
  through helpers so callers see typed CritiqueRunRow.
- reconcileStaleRuns flips stale 'running' rows to 'interrupted' with
  a recoveryReason='daemon_restart' marker, supporting the spec's
  daemon-restart-mid-run failure mode.
- Public CritiqueRunStatus union excludes the in-flight 'running' value
  but the runtime CHECK accepts it, matching the spec's lifecycle.
- 11 vitest cases cover migration idempotence, round-trip, default
  rounds, status validation, update + list ordering, deletion, and
  reconciliation, plus FK CASCADE on project deletion.
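The reconciliation rule above, sketched over in-memory rows rather than SQLite. RunRow and its fields are illustrative assumptions (the real table keeps the marker inside rounds_json), and judging staleness by updatedAt is also assumed:

```typescript
// Status values named in the spec's lifecycle; 'running' is the in-flight state.
type RunStatus =
  | "running"
  | "shipped"
  | "below_threshold"
  | "timed_out"
  | "interrupted"
  | "degraded"
  | "failed";

// Hypothetical row shape; stands in for a critique_runs row.
interface RunRow {
  id: string;
  status: RunStatus;
  updatedAt: number; // epoch milliseconds
  rounds: { recoveryReason?: string }; // stands in for the rounds_json column
}

// Flips 'running' rows not updated within staleAfterMs to 'interrupted',
// tags them with recoveryReason='daemon_restart', and returns the flip count.
function reconcileStaleRuns(rows: RunRow[], now: number, staleAfterMs: number): number {
  let flipped = 0;
  for (const row of rows) {
    if (row.status !== "running" || now - row.updatedAt < staleAfterMs) continue;
    row.status = "interrupted";
    row.rounds = { ...row.rounds, recoveryReason: "daemon_restart" };
    row.updatedAt = now;
    flipped += 1;
  }
  return flipped;
}
```

Running it twice is harmless: the second pass finds no 'running' rows left to flip, which mirrors the idempotence the migration tests check.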

* feat(daemon): add Critique Theater transcript writer (Task 4.2)

Streams PanelEvent sequences to .ndjson on disk under the artifact dir,
gzipping to .ndjson.gz when the cumulative UTF-8 byte size crosses
gzipThresholdBytes (default 256 KiB). Uses Node fs streams plus
zlib.createGzip so the writer never holds the full transcript in memory.
readTranscript inverts the path and streams events back, picking the
right pipeline by file extension. Covers happy path, large multibyte,
empty input, mid-stream failure cleanup, and unknown-extension reject.
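A sketch of the writer's format decisions only (no file I/O). The 256 KiB default comes from this commit; the file names, the strict '>' comparison for "crosses", and the helper names are assumptions:

```typescript
const DEFAULT_GZIP_THRESHOLD_BYTES = 256 * 1024;
const encoder = new TextEncoder();

// Cumulative UTF-8 size of the events as NDJSON (one JSON object per line).
function ndjsonByteSize(events: unknown[]): number {
  let total = 0;
  for (const event of events) {
    total += encoder.encode(JSON.stringify(event) + "\n").length;
  }
  return total;
}

// Plain NDJSON until the cumulative size crosses the threshold, then gzip.
function transcriptFileName(
  totalBytes: number,
  gzipThresholdBytes: number = DEFAULT_GZIP_THRESHOLD_BYTES,
): string {
  return totalBytes > gzipThresholdBytes ? "transcript.ndjson.gz" : "transcript.ndjson";
}

type ReadPipeline = "gunzip-then-ndjson" | "ndjson";

// Reading inverts the path: the extension picks the pipeline; anything else is rejected.
function readPipelineFor(filePath: string): ReadPipeline {
  if (filePath.endsWith(".ndjson.gz")) return "gunzip-then-ndjson";
  if (filePath.endsWith(".ndjson")) return "ndjson";
  throw new Error(`unknown transcript extension: ${filePath}`);
}
```

Counting encoded bytes rather than string length keeps the threshold meaningful for multibyte transcripts, the same reasoning as the parser's byte cap.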

* feat(daemon): add Critique Theater orchestrator (Task 4.3)

Drives one run end-to-end: parses stdout via parseCritiqueStream, scores
each round through scoreboard helpers, persists lifecycle to critique_runs,
and emits CritiqueSseEvent variants on the existing project event bus.
Honors per-round and total timeouts, applies fallbackPolicy when no
<SHIP> arrives, and tees events into writeTranscript so transcripts
stream to disk without buffering the whole run in memory. Defensive entry
validation throws RangeError on invalid CritiqueConfig before any side
effect.

Also adds scoreboard.ts (computeComposite, decideRound, selectFallbackRound)
and re-exports panelEventToSse/CritiqueSseEvent from the critique subpath
so daemon imports never touch the barrel. Fixes missing .js extensions in
sse/critique.ts that caused NodeNext module resolution errors.

* feat(daemon): wire Critique Theater orchestrator into spawn path (Task 4.4)

Adds loadCritiqueConfigFromEnv to read OD_CRITIQUE_* keys with strict
validation at boot. Branches the existing CLI spawn flow on cfg.enabled:
when false (the M0 default) the legacy single-pass generation runs
unchanged; when true the orchestrator owns the run end-to-end. Same SSE
bus, same artifact dir, no behavior change for users until they flip the
flag.
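A minimal sketch of strict env validation at boot. Only "disabled by default" comes from this commit; the OD_CRITIQUE_ENABLED / OD_CRITIQUE_ROUND_TIMEOUT_MS / OD_CRITIQUE_TOTAL_TIMEOUT_MS key names and the default timeouts are hypothetical, and the env is passed in as a parameter so the sketch runs without process.env:

```typescript
interface CritiqueEnvConfig {
  enabled: boolean;
  roundTimeoutMs: number;
  totalTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Accepts only explicit boolean spellings; anything else is a boot error.
function readBool(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  if (raw === "1" || raw === "true") return true;
  if (raw === "0" || raw === "false") return false;
  throw new RangeError(`${key} must be 1, 0, true or false; got "${raw}"`);
}

// Accepts only decimal digits with a value above zero.
function readPositiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  if (!/^[0-9]+$/.test(raw) || Number(raw) <= 0) {
    throw new RangeError(`${key} must be a positive integer; got "${raw}"`);
  }
  return Number(raw);
}

// Throws at boot instead of letting a bad value surface mid-run.
function loadCritiqueConfigFromEnv(env: Env): CritiqueEnvConfig {
  const cfg: CritiqueEnvConfig = {
    enabled: readBool(env, "OD_CRITIQUE_ENABLED", false), // M0 default: off
    roundTimeoutMs: readPositiveInt(env, "OD_CRITIQUE_ROUND_TIMEOUT_MS", 120000),
    totalTimeoutMs: readPositiveInt(env, "OD_CRITIQUE_TOTAL_TIMEOUT_MS", 600000),
  };
  if (cfg.roundTimeoutMs > cfg.totalTimeoutMs) {
    throw new RangeError("OD_CRITIQUE_ROUND_TIMEOUT_MS must not exceed OD_CRITIQUE_TOTAL_TIMEOUT_MS");
  }
  return cfg;
}
```

In the daemon this would be called once with process.env; a typo such as OD_CRITIQUE_ENABLED=yes then fails loudly at startup instead of silently keeping the legacy path.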

* fix(lockfile): regenerate to include contracts zod + vitest entries

The earlier conflict resolution took main's lockfile and ran pnpm
install, but the install pass on Windows didn't write the contracts
package's zod and vitest entries back into the lockfile. CI's
--frozen-lockfile install rejected the resulting state. Re-running
pnpm install with --no-frozen-lockfile rewrites the lockfile so it
now matches every package.json across the workspace, including
contracts/zod ^3.23.8 and contracts/vitest ^2.1.8. Verified locally:
pnpm install --frozen-lockfile passes.

* fix(daemon): parser ship envelope, SHIP-before-round guard, real artifactRef (Defects 3 + 5)

- ParserOptions gains projectId + artifactId; the parser threads them into
  every emitted ship event's artifactRef so downstream consumers see the
  real run identity instead of empty placeholders.
- <SHIP> now requires at least one closed <ROUND_END> in the same run;
  malformed streams that emit SHIP before any round completes now throw
  MalformedBlockError instead of bypassing the round-1 artifact invariant.
- The SHIP handler validates the inner <ARTIFACT> block is present and
  non-empty; missing artifact raises MissingArtifactError.
- Three new regressions: SHIP-before-round, SHIP-without-artifact,
  artifactRef populated from parser options.
- Orchestrator threads projectId + artifactId into parserOpts.
- Test fixtures updated to include <ARTIFACT> inside <SHIP> blocks.
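The three SHIP checks above, sketched as one hypothetical helper. The error classes are local stand-ins named after the commit's errors, and the ShipEvent shape is a simplified assumption:

```typescript
class MalformedBlockError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MalformedBlockError";
  }
}

class MissingArtifactError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MissingArtifactError";
  }
}

interface ParserOptions {
  projectId: string;
  artifactId: string;
}

interface ShipEvent {
  type: "ship";
  round: number;
  artifact: string;
  artifactRef: { projectId: string; artifactId: string };
}

const ARTIFACT_RE = /<ARTIFACT>([\s\S]*?)<\/ARTIFACT>/;

function buildShipEvent(
  round: number,
  shipBody: string,
  closedRoundsInRun: number,
  opts: ParserOptions,
): ShipEvent {
  // SHIP is only valid once at least one ROUND_END has closed in this run.
  if (closedRoundsInRun < 1) {
    throw new MalformedBlockError("<SHIP> arrived before any closed <ROUND_END>");
  }
  // The inner ARTIFACT must exist and contain something besides whitespace.
  const artifact = ARTIFACT_RE.exec(shipBody)?.[1]?.trim() ?? "";
  if (artifact === "") {
    throw new MissingArtifactError("<SHIP> has no non-empty <ARTIFACT>");
  }
  // artifactRef is taken from parser options instead of empty placeholders.
  return {
    type: "ship",
    round,
    artifact,
    artifactRef: { projectId: opts.projectId, artifactId: opts.artifactId },
  };
}
```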

* fix(daemon): orchestrator owns lifecycle, gzip atomicity, fallback on timeout (Defects 2,4,7,8)

- Orchestrator now accepts child + childExitPromise, races parser /
  child-exit / abort / timeout in one awaited flow, and SIGTERMs the
  child on every non-clean termination. Server awaits the result so
  the run lifecycle has a single owner.
- ChildExitError surfaces when child exits non-zero mid-stream; the
  run is classified as failed with cause cli_exit_nonzero.
- Timeout / abort with at least one completed round elects a fallback
  via selectFallbackRound and emits a synthetic ship event with
  status=timed_out or interrupted; the score persists to
  critique_runs instead of staying null.
- applyTimeouts includes childExitRace in every Promise.race so early
  child exits are classified without waiting for the total timeout.
  iter.return() cleanup is capped at 200ms to prevent hangs on
  stalled generators.
- writeTranscript writes gzip output to transcript.ndjson.gz.tmp,
  fsyncs, then atomic-renames. Crashes mid-write leave no partial
  .gz or .gz.tmp on disk.

* fix(daemon): plain-stream gating, per-run artifact dir, boot reconcile (Defects 1, 2, 6)

- Spawn-path branch now inspects def.streamFormat and only routes through
  runOrchestrator when format === 'plain'. Adapters emitting wrapper
  formats (claude-stream-json, copilot-stream-json, json-event-stream,
  acp-json-rpc, pi-rpc) fall through to legacy single-pass with a
  one-time stderr warning per format. Per-format decoding into the
  orchestrator is reserved for v2.
- critiqueArtifactDir is now path.join(ARTIFACTS_DIR, projectId, runId)
  so concurrent or sequential runs in the same project never overwrite
  each other's transcript or final HTML. Persistence stores the relative
  per-run path.
- reconcileStaleRuns is now invoked after openDatabase on every daemon
  boot with staleAfterMs = critiqueCfg.totalTimeoutMs. Stale running
  rows from a prior crash flip to interrupted with
  rounds_json.recoveryReason='daemon_restart'. Logs a one-line warning
  naming the flipped count when it is greater than zero.
- Spawn now passes child + childExitPromise to runOrchestrator so the
  orchestrator can race child exit against the parser, abort signal,
  and timeouts in one awaited flow. Server awaits the orchestrator's
  result and surfaces failures through the existing run lifecycle.
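A sketch of the spawn-path gating and per-run directory. The stream format names come from this commit; routeForStreamFormat, the injectable warn callback, and the '/' joining (the daemon uses path.join) are illustrative assumptions:

```typescript
type SpawnRoute = "orchestrator" | "legacy";

// Formats that have already produced a warning in this process.
const warnedFormats = new Set<string>();

function routeForStreamFormat(
  streamFormat: string,
  critiqueEnabled: boolean,
  warn: (message: string) => void = (m) => console.warn(m),
): SpawnRoute {
  if (!critiqueEnabled) return "legacy"; // M0 default: unchanged single-pass generation
  if (streamFormat === "plain") return "orchestrator";
  // Wrapper formats (claude-stream-json, acp-json-rpc, ...) fall back to legacy,
  // warning once per format rather than once per run.
  if (!warnedFormats.has(streamFormat)) {
    warnedFormats.add(streamFormat);
    warn(`critique: stream format "${streamFormat}" is not decoded yet; using legacy single-pass`);
  }
  return "legacy";
}

// One directory per run so runs in the same project never overwrite each other.
function critiqueArtifactDir(artifactsDir: string, projectId: string, runId: string): string {
  return `${artifactsDir}/${projectId}/${runId}`;
}
```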

* fix(daemon): daemon-authoritative scoring, lifecycle status, stderr ordering, insert type

Round 2 review feedback on PR #481.

1. CritiqueRunInsert.status now accepts 'running' so the boot-reconcile
   tests (and any caller seeding an in-flight row) typecheck without
   casting. The runtime check in insertCritiqueRun already accepted
   'running' against the DB constraint set; only the public type was
   stricter than the DB.
2. round_end keeps the daemon-computed composite authoritative. The
   agent's <ROUND_END composite=...> attribute is advisory: a divergence
   beyond COMPOSITE_TOLERANCE emits a composite_mismatch parser_warning
   so the discrepancy is observable, but the daemon value is what scores
   and persists. Same policy for must_fix.
3. SHIP-handling derives the final status from decideRound(...) using the
   daemon's scored round rather than trusting <SHIP composite=... status=...>.
   A run that the agent claims as shipped but whose daemon composite is
   below threshold now finalizes as below_threshold, so a malformed or
   adversarial stream cannot force a ship.
4. server.ts captures the orchestrator's result and maps the critique
   terminal status to the chat run lifecycle. shipped/below_threshold
   finalize as 'succeeded'; timed_out/interrupted/degraded/failed
   finalize as 'failed'. cancelRequested is honored.
5. stderr forwarding and child.on('error') registrations moved BEFORE
   the orchestrator await so a CLI that floods stderr cannot fill the
   OS pipe and deadlock until the total timeout, and so an early
   child error fired during the run is observed by the same listener
   used after.
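Points 2 to 4 above, sketched as small pure helpers. The COMPOSITE_TOLERANCE value, the threshold-only decision rule in decideStatus (a stand-in for decideRound), and mapping cancelRequested to 'canceled' are assumptions:

```typescript
const COMPOSITE_TOLERANCE = 0.01; // hypothetical value

type CritiqueTerminal =
  | "shipped"
  | "below_threshold"
  | "timed_out"
  | "interrupted"
  | "degraded"
  | "failed";
type ChatRunStatus = "succeeded" | "failed" | "canceled";

// The daemon's composite decides the status; the agent's <SHIP status=...> is never read.
function decideStatus(daemonComposite: number, scoreThreshold: number): "shipped" | "below_threshold" {
  return daemonComposite >= scoreThreshold ? "shipped" : "below_threshold";
}

// The agent's <ROUND_END composite=...> is advisory: a divergence is reported, never scored.
function compositeMismatchWarning(daemonComposite: number, agentComposite: number | undefined): string | null {
  if (agentComposite === undefined) return null;
  const delta = Math.abs(daemonComposite - agentComposite);
  return delta > COMPOSITE_TOLERANCE
    ? `composite_mismatch: daemon=${daemonComposite} agent=${agentComposite}`
    : null;
}

// Critique terminal status to chat run lifecycle.
function toChatRunStatus(status: CritiqueTerminal, cancelRequested: boolean): ChatRunStatus {
  if (cancelRequested) return "canceled";
  return status === "shipped" || status === "below_threshold" ? "succeeded" : "failed";
}
```

For the lying-ship case, an agent claiming composite 9.5 over a daemon-scored 6.0 with threshold 8 yields below_threshold plus a composite_mismatch warning, and the chat run still finalizes as succeeded.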

Tests:
- tests/critique-authority.test.ts: 3 new regressions (lying ship
  downgraded to below_threshold, mismatch warning emitted, aligned
  composites stay quiet).
- All four affected suites green: 14 orchestrator + 10 spawn-wiring +
  3 boot-reconcile + 3 authority = 30/30.

Workspace typechecks: contracts, daemon, web all exit 0.

* fix(daemon,contracts): inline critique SSE, signal-terminated child, null shipped artifactPath

Round 3 review feedback on PR #481.

1. packages/contracts/src/critique.ts inlines CritiqueSseEvent +
   panelEventToSse + CRITIQUE_SSE_EVENT_NAMES + a local mirror of
   SseTransportEvent. The previous re-export from './sse/critique.js'
   broke the workspace web build (Turbopack cannot rewrite .js to .ts
   on a relative source import) while removing the .js extension broke
   daemon's NodeNext typecheck (it walks this leaf via the './critique'
   subpath export which requires explicit .js extensions). Inlining
   removes the cross-file relative import entirely so both consumers
   walk one self-contained file. packages/contracts/src/sse/critique.ts
   is removed and its co-located test moves up to
   packages/contracts/src/critique.test.ts. The barrel
   packages/contracts/src/index.ts drops the redundant
   './sse/critique' re-export since './critique' already exports the
   same symbols.

2. apps/daemon/src/critique/orchestrator.ts treats a signal-terminated
   child as a terminal race rejection. Previously the race only caught
   non-zero numeric exit codes and treated code === null as
   indefinitely pending, so a SIGTERM from /api/runs/:id/cancel
   resolved childExitPromise as { code: null, signal: 'SIGTERM' } and
   the orchestrator fell through to the no-SHIP fallback path,
   persisting below_threshold instead of interrupted. The race now
   rejects with a new ChildSignaledError when signal !== null, and a
   new catch branch classifies the run as 'interrupted' and (if at
   least one round closed) emits a synthetic ship event with
   status='interrupted' so the persisted row and the SSE transcript
   reflect the actual cause.

3. Same file, ship-handling: artifactPath is now persisted as null on
   shipped runs until a future phase actually extracts the
   <SHIP><ARTIFACT> body to disk. Previously the orchestrator wrote
   ${artifactDir}/${artifactId} even though no file existed at that
   path, so any later replay/export/UI code that trusted
   critique_runs.artifact_path would dereference a missing file. The
   transcript still records the ship event with the artifact reference
   so consumers can find the run.
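The child-exit handling in point 2, sketched as pure functions. ChildExitError and ChildSignaledError are local stand-ins named after the commit's classes; classifyChildExit, terminationFor, and the Termination shape are hypothetical. The ChildExit shape follows Node's child_process 'exit' event, where one of code/signal is always non-null:

```typescript
class ChildExitError extends Error {
  readonly code: number;
  constructor(code: number) {
    super(`child exited with code ${code}`);
    this.name = "ChildExitError";
    this.code = code;
  }
}

class ChildSignaledError extends Error {
  readonly signal: string;
  constructor(signal: string) {
    super(`child terminated by ${signal}`);
    this.name = "ChildSignaledError";
    this.signal = signal;
  }
}

interface ChildExit {
  code: number | null;
  signal: string | null;
}

// Returns the error the race should reject with, or null for a clean exit.
// Before the fix, { code: null, signal: 'SIGTERM' } was treated as still pending.
function classifyChildExit(exit: ChildExit): Error | null {
  if (exit.signal !== null) return new ChildSignaledError(exit.signal);
  if (exit.code !== null && exit.code !== 0) return new ChildExitError(exit.code);
  return null;
}

interface Termination {
  status: "interrupted" | "failed";
  cause?: string;
  emitSyntheticShip: boolean;
}

// Matches on err.name so the check survives ES5 downleveling of Error subclasses.
function terminationFor(err: Error, closedRounds: number): Termination {
  if (err.name === "ChildSignaledError") {
    return { status: "interrupted", emitSyntheticShip: closedRounds > 0 };
  }
  if (err.name === "ChildExitError") {
    return { status: "failed", cause: "cli_exit_nonzero", emitSyntheticShip: false };
  }
  return { status: "failed", emitSyntheticShip: false };
}
```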

Tests:
- apps/daemon/tests/critique-lifecycle.test.ts: 2 new regressions
  (SIGTERM-terminated child after one closed round persists
  'interrupted' with a synthetic ship event of the same status; shipped
  run leaves artifactPath null in result and DB row).
- 43 critique-suite tests pass: 14 orchestrator + 11 transcript +
  10 spawn-wiring + 3 boot-reconcile + 3 authority + 2 lifecycle.

Workspace typechecks: contracts, daemon, web all exit 0.

* fix(daemon): buffer raw SHIP, emit only normalized; reject SHIP for unclosed round

Round 4 review feedback on PR #481.

The parser-event loop used to unconditionally collectedEvents.push(event)
and bus.emit(panelEventToSse(event)) for every event, including raw
<SHIP>. SSE clients and the transcript could see the agent's forged
status="shipped" / composite="9.5" before decideRound(...) ran, even
when the daemon later corrected the persisted DB row to below_threshold.
The loop now skips ship events entirely; the orchestrator buffers the
raw shipEvent, runs daemon-authoritative scoring, and emits a single
normalized ship payload built from the daemon's computed composite,
selectFallbackRound's mustFix, and decideRound's status. The transcript
and SSE bus now only ever see the daemon-scored ship.

The unknown-round fallback used to make the agent-claimed status/composite
authoritative when SHIP referenced a round that was never closed: a
malformed stream could close a low-scoring round 1 and then send
<SHIP round="2" status="shipped" composite="10">; because
completedRounds.find(r => r.n === 2) returned undefined, the orchestrator
persisted the agent's value. That re-opened the scoring-integrity hole the
previous round was meant to close. The orchestrator now drops a SHIP
whose round isn't in
completedRounds, emits a parser_warning, and falls through to the
no-SHIP fallback policy. The synthetic ship from selectFallbackRound
gets emitted instead, with daemon-authoritative round/composite/status.
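A sketch of the normalization step described above: the agent's SHIP only selects a round, and every emitted field comes from daemon scoring. The "highest composite, earliest on ties" rule in selectFallbackRoundSketch and the threshold-only status rule are assumptions, not the orchestrator's actual selectFallbackRound/decideRound:

```typescript
interface ClosedRound {
  n: number;
  composite: number; // daemon-computed
  mustFix: number;
}

// What the agent claimed; only `round` is ever consulted.
interface AgentShip {
  round: number;
  status: string;
  composite: number;
}

interface NormalizedShip {
  round: number;
  composite: number;
  mustFix: number;
  status: "shipped" | "below_threshold";
  synthetic: boolean; // true when produced by the fallback policy
}

function selectFallbackRoundSketch(rounds: ClosedRound[]): ClosedRound | undefined {
  let best: ClosedRound | undefined;
  for (const r of rounds) {
    if (best === undefined || r.composite > best.composite) best = r;
  }
  return best;
}

// Builds the only ship payload that reaches the SSE bus and the transcript.
function normalizeShip(
  agentShip: AgentShip | undefined,
  completedRounds: ClosedRound[],
  scoreThreshold: number,
  warn: (message: string) => void,
): NormalizedShip | undefined {
  const claimed = agentShip?.round;
  let round = claimed === undefined ? undefined : completedRounds.find((r) => r.n === claimed);
  let synthetic = false;
  if (claimed !== undefined && round === undefined) {
    // SHIP for an unclosed round is dropped; fall through to the no-SHIP policy.
    warn(`SHIP references unclosed round ${claimed}; using fallback`);
  }
  if (round === undefined) {
    round = selectFallbackRoundSketch(completedRounds);
    synthetic = true;
  }
  if (round === undefined) return undefined; // no closed rounds at all
  return {
    round: round.n,
    composite: round.composite,
    mustFix: round.mustFix,
    status: round.composite >= scoreThreshold ? "shipped" : "below_threshold",
    synthetic,
  };
}
```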

Tests:
- tests/critique-authority.test.ts: extended the lying-ship regression
  to also assert the emitted critique.ship payload is downgraded
  (status='below_threshold', composite < threshold), so the SSE bus
  cannot see the agent's claim. Added a new regression where SHIP
  references an unclosed round 2: the agent ship is dropped, a
  parser_warning fires, the fallback selects round 1, and the only
  emitted critique.ship has round=1 and status=below_threshold.
- 44 critique-suite tests pass: 14 orchestrator + 11 transcript + 10
  spawn-wiring + 3 boot-reconcile + 4 authority + 2 lifecycle.

Workspace daemon typecheck exits 0.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-05 15:50:35 +08:00
Nagendhra Madishetti
47eeaf445d
feat: Critique Theater foundation (contracts + parser, Phases 0-2) (#387)
* docs(specs): add Critique Theater design spec for panel-tempered artifacts

* docs(specs): add Critique Theater implementation plan

* docs(specs): rename UI to Design Jury, add lane-density modes, ship-rule explainer, label sizing

* feat(contracts): add CritiqueConfig schema and defaults

* fix(contracts): apply Task 1.1 review (CRITIQUE_PROTOCOL_VERSION rename, descriptions, RoleWeights export)

* feat(contracts): add PanelEvent discriminated union and isPanelEvent guard

* fix(contracts): apply Task 1.2 review (exhaustive event-type list, runId guard, import order)

* feat(contracts): add CritiqueSseEvent variants and panelEventToSse mapper

* test(daemon): add v1 wire-protocol golden fixtures for Critique Theater parser

* feat(daemon): add v1 streaming parser for Critique Theater wire protocol

* chore(contracts): add .js extensions to relative imports for NodeNext consumers

* fix(daemon): satisfy noUncheckedIndexedAccess in v1 parser regex match access

* test(daemon): cover parser failure modes; fix unclosed-PANELIST swallow bug

* fix(daemon,contracts): address PR #387 review

- parser now clamps panelist + DIM scores against the run-declared scale
  captured from <CRITIQUE_RUN scale=...>, not a hardcoded 100
- PANELIST appearing before any <ROUND n=...> opens now throws
  MalformedBlockError rather than emitting events with NaN round
- DIM_RE and MUST_FIX_RE hoisted to module scope and lastIndex reset per
  call so the parser hot path stops recompiling regex per artifact
- overflow check after drain simplified to a plain buf.length > cap test
  (the prior compound condition was always true on the right side and
  obscured intent)
- scoreThreshold <= scoreScale refine gains a 1e-9 epsilon so floating
  slack does not reject semantically valid configs
- round-1 designer ARTIFACT guard gains a comment naming the spec
  invariant and the v2 relaxation path
- 3 new regression tests cover the panelist-without-round, scale=10
  clamp, and scale=20 plumbing cases
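Three of the review fixes above, sketched together: scale-aware clamping, a module-scope /g regex with lastIndex reset, and the epsilon-tolerant threshold refine. The <DIM name=... score=.../> attribute syntax and the lower clamp bound of 0 are assumptions about the wire format:

```typescript
const SCALE_EPSILON = 1e-9;
// Compiled once at module scope instead of per artifact (hypothetical DIM syntax).
const DIM_RE = /<DIM name="([^"]+)" score="([^"]+)"\s*\/>/g;

// Clamp against the run-declared scale rather than a hardcoded 100.
function clampScore(raw: number, scale: number): number {
  return Math.min(Math.max(raw, 0), scale);
}

function parseDims(block: string, scale: number): Array<{ name: string; score: number }> {
  DIM_RE.lastIndex = 0; // a shared /g regex keeps state between exec() calls; reset per call
  const dims: Array<{ name: string; score: number }> = [];
  let m: RegExpExecArray | null;
  while ((m = DIM_RE.exec(block)) !== null) {
    dims.push({ name: m[1] ?? "", score: clampScore(Number(m[2]), scale) });
  }
  return dims;
}

// scoreThreshold <= scoreScale, with slack so floating-point error cannot reject valid configs.
function thresholdFitsScale(scoreThreshold: number, scoreScale: number): boolean {
  return scoreThreshold <= scoreScale + SCALE_EPSILON;
}
```

For example, 0.1 * 3 evaluates to 0.30000000000000004, so a plain `<=` against 0.3 fails, while the epsilon-tolerant check accepts it.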

* docs(specs): rationale for non-goals, failure-mode rate targets, Phase 10 matrix, Phase 14 doc layout

* Merge branch 'main' into feat/critique-theater

Resolves the contracts/index.ts conflict by keeping the .js extensions added
by chore(contracts) 2d6e8d6 and slotting in the new export for ./api/app-config
introduced upstream by #255 (9d700ec). Critique Theater additions
(./sse/critique, ./critique) preserved in their original positions.

Verified after merge:
  pnpm --filter @open-design/contracts test    -> 10/10 pass
  pnpm --filter @open-design/contracts typecheck -> exit 0
  pnpm --filter @open-design/daemon typecheck  -> exit 0
  pnpm --filter @open-design/web typecheck     -> exit 0

Two daemon tests in tests/media-config.test.ts fail both before and after the
merge because they read real OAuth credentials from the developer machine
instead of using mock fixtures. That's an upstream isolation issue on
origin/main, not something this branch introduces.

* fix: unblock web build and address mrcfps PANELIST oversize bypass

The chore commit that added .js extensions to satisfy daemon's NodeNext
typecheck broke apps/web's Next.js build, because webpack tried to resolve
the literal ./common.js when only common.ts exists on disk. Replaced with
a subpath approach: the contracts package.json exports map gains a
'./critique' entry pointing straight at src/critique.ts (which has no
relative imports), and daemon imports route through
@open-design/contracts/critique instead of the barrel. Web keeps the
bundler-friendly barrel, while daemon's NodeNext resolution walks only the
leaf module. All 13 contracts source files reverted to extensionless
relative imports.

Separately, mrcfps flagged that parserMaxBlockBytes was only enforced on
the leftover buffer after drain returned, so a complete oversized block
arriving in one chunk slipped past the cap. Added an explicit per-block
size check inside drain for every buffered block type (PANELIST,
ROUND_END, SHIP). Three regression tests yield the whole stream as a
single chunk and assert OversizeBlockError fires before any events emit.

* fix(daemon): close three v1 parser invariant gaps from mrcfps review

Three independent gaps that all let malformed or oversized protocol
output pass the v1 envelope contract:

(1) Envelope guard. ROUND, PANELIST, ROUND_END, and SHIP now throw
MalformedBlockError when state.inRun is false. Without this, a stream
that omits <CRITIQUE_RUN> could still emit panelist_* events without
the run_started handshake, leaving downstream reducers with no run-level
config.

(2) UTF-8 byte length. Both the per-block size check and the post-drain
buf-size check now compare Buffer.byteLength(text, 'utf8') against
parserMaxBlockBytes. The previous string-length comparison let multibyte
content (CJK, emoji) inside <NOTES>/<SUMMARY> exceed the configured
byte cap while staying under the JS string length cap, bypassing the
daemon's resource guard.

(3) Header-end ordering. PANELIST, ROUND_END, and SHIP now require the
opener's > to appear before the matched closing tag. A malformed opener
like <PANELIST role="x" score="8"</PANELIST> previously fell through
to the closing tag's > and emitted events for an invalid block.

Four regression tests cover each gap (ROUND-without-run,
SHIP-without-run, multibyte-byte-cap, malformed-opener).

* fix(lockfile): regenerate to include contracts zod + vitest entries

The earlier conflict resolution took main's lockfile and ran pnpm
install, but the install pass on Windows didn't write the contracts
package's zod and vitest entries back into the lockfile. CI's
--frozen-lockfile install rejected the resulting state. Re-running
pnpm install with --no-frozen-lockfile rewrites the lockfile so it
now matches every package.json across the workspace, including
contracts/zod ^3.23.8 and contracts/vitest ^2.1.8. Verified locally:
pnpm install --frozen-lockfile passes.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-04 20:28:28 +08:00
Ajay Satish
9d700ec74f
feat(daemon): persist code agent startup (#255)
* feat(daemon): persist code agent startup

* fix: complete all suggestions

* fix: types for app config

* chore: revert local origin

* chore: format to single quotes

* fix: duplicate headers

* fix: isLocalSameOrigin rewriting issue

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-03 12:14:04 +08:00
Caprika
0c00f241e7
Add preview comment attachments (#284) 2026-05-02 19:23:46 +08:00
Aresdgi
59e4966dda
feat(version): add app version awareness (#204)
* feat(version): add app version awareness

* fix(version): detect packaged sidecars across platforms
2026-05-01 17:26:54 +08:00
nettee
3fb849d047
Fix chat runs surviving web disconnects (#146)
* fix chat runs surviving web disconnects

* fix chat run create abort propagation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon keepalive reconnect budget

Generated-By: looper 0.0.0-dev (runner=fixer, agent=gpt-5.5)

* fix daemon stream disconnect cancellation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon stream abort cancellation race

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon run cancellation semantics

* fix load

* doc

* 2

* add run refresh recovery

* fix active run refresh status

* fix reattach abort handling

* fix

* fix chat initial scroll

* fix daemon start failures

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix stop run status

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* extract daemon run service

* move prompt composition to daemon

* fix prompt module resolution

* fix project id generation

* add project run status

* add designs kanban view with awaiting_input status

- add grid/kanban view toggle on Designs tab; persist choice in localStorage
- introduce awaiting_input project display status (daemon-derived from
  unanswered <question-form>) so projects asking the user aren't shown
  as Completed; ordered between Running and Completed with amber accent
- hide transient queued state from users: coerce queued/starting to
  running in daemon /api/projects projection and drop the queued kanban
  column
- a11y polish on Designs cards: Space activation, aria-labels on delete,
  focus-visible outlines, reveal delete on focus-within and touch,
  prefers-reduced-motion handling
- kanban layout uses flex sizing instead of viewport math; scoped icon-
  only pill button rule fixes view-toggle icon alignment
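A sketch of the /api/projects status projection described in this list. Only running, queued, starting, awaiting_input, and the "unanswered <question-form>" rule come from the commit; the other run and display status names and the lookup-table approach are assumptions:

```typescript
type RunStatus = "queued" | "starting" | "running" | "succeeded" | "failed" | "canceled";
type DisplayStatus = "running" | "awaiting_input" | "completed" | "failed" | "canceled";

// Kanban column order: awaiting_input sits between running and completed; no queued column.
const KANBAN_COLUMNS: readonly DisplayStatus[] = ["running", "awaiting_input", "completed", "failed", "canceled"];

// Transient queued/starting states are coerced to running before reaching the UI.
const BASE_DISPLAY: Record<RunStatus, DisplayStatus> = {
  queued: "running",
  starting: "running",
  running: "running",
  succeeded: "completed",
  failed: "failed",
  canceled: "canceled",
};

function projectDisplayStatus(latestRun: RunStatus, hasUnansweredQuestionForm: boolean): DisplayStatus {
  const base = BASE_DISPLAY[latestRun];
  // A finished run that left a <question-form> unanswered is waiting on the user.
  return base === "completed" && hasUnansweredQuestionForm ? "awaiting_input" : base;
}
```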

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-04-30 20:16:46 +08:00
nettee
56d08b8c5f
Add shared contracts and migrate project code to TypeScript (#118) 2026-04-30 13:01:15 +08:00