Commit graph

439 commits

Author SHA1 Message Date
Ethan Guo
38b59fa4d9
fix(web): confirm before deleting saved template in New Project (#2330)
* fix(web): confirm before deleting saved template in New Project

EntryShell and NewProjectModal were dropping the onDeleteTemplate prop, so the trash button in the template picker was inert; thread it through and gate it behind an alertdialog confirm.

Fixes #2237

* fix(web/i18n): translate deleteTemplate keys across 18 locales

The initial commit shipped English placeholders for the 4 new newproj.deleteTemplate* keys; replace with localized strings matching each locale's existing designs.deleteConfirm style.

* fix(web): gate confirm-dialog backdrop on in-flight delete + style error message

Backdrop now no-ops while `deleting` is true (matching the Cancel and Delete button gating), preventing a false-cancellation race where the backgrounded DELETE keeps running and leaks a stale `deleteError=true` into the next dialog open; also adds the missing `.modal-confirm-error` rule so the inline failure copy renders as a destructive state instead of default body styling.
2026-05-20 14:50:36 +08:00
Bryan
c530d163f8
feat(web): "Resume conversation in new chat" UI — #462 Commit B (companion to #1718) (#2264)
* feat(contracts): add handoff request/response DTOs

Adds HandoffRequest, HandoffResponse, and HANDOFF_SCHEMA_VERSION for
the upcoming POST /api/projects/:id/handoff synthesis endpoint. Mirrors
the finalize.ts subpath pattern (package.json#exports + esbuild entry +
index re-export) so daemon and web can import
@open-design/contracts/api/handoff.

Refs nexu-io/open-design#462.

* feat(daemon): add handoff synthesis pipeline (buildHandoffPrompt + synthesizeHandoffPrompt)

Adds `apps/daemon/src/handoff-design.ts` exposing the resume-conversation
synthesis primitives the upcoming `POST /api/projects/:id/handoff` route will
call into.

- `buildHandoffPrompt({ projectId, transcriptJsonl, transcriptMessageCount,
  now })` returns the system + user prompts. System prompt asks Claude to
  emit a structured Markdown body with Context / Decisions made / Open
  questions / Current focus / Provenance, with Provenance bullets explicitly
  flat (no Markdown emphasis on labels) to preempt the PR #1584 round-2
  parser bug.
- `synthesizeHandoffPrompt(db, projectsRoot, projectId, options)` reuses the
  existing finalize-design pipeline pieces: `exportProjectTranscript` →
  `truncateTranscriptForPrompt` → `buildHandoffPrompt` →
  `callAnthropicWithRetry` → `extractDesignMd`, but without the lockfile,
  disk write, design-system, or artifact-resolution paths.
- Promotes `DEFAULT_TIMEOUT_MS` in finalize-design.ts to `export const` so
  handoff shares the same 120s upstream-call bound.

Refs nexu-io/open-design#462.

* feat(daemon): wire POST /api/projects/:id/handoff route

Adds the handoff HTTP route and registers it in server.ts. Validation
block + error-mapping shape mirror registerFinalizeRoutes (BYOK payload,
upstream-error → ApiErrorCode mapping, redactSecrets on the raw upstream
body). Handoff has no lockfile, so the CONFLICT branch is omitted.

`res.on('close')` is wired to flip an AbortController whose signal is
threaded into synthesizeHandoffPrompt, so a UI-side cancel actually
aborts the daemon-side Anthropic call rather than letting it keep
running after the client walks away (mirrors the PR #974 fix for
finalize).

- `apps/daemon/src/handoff-routes.ts` — new, exports registerHandoffRoutes
  + RegisterHandoffRoutesDeps.
- `apps/daemon/src/server-context.ts` — adds handoff slot to ServerContext.
- `apps/daemon/src/route-context-contract.ts` — adds RegisterHandoffRoutesDeps
  to the compile-time coverage assertion.
- `apps/daemon/src/server.ts` — imports synthesizeHandoffPrompt +
  registerHandoffRoutes, builds handoffDeps, registers the route next
  to finalize.
- `apps/daemon/tests/handoff-route.test.ts` — 12 HTTP-layer tests:
  validation (400/403/404), happy path, upstream error mapping
  (401/429/502/502 non-JSON), api-key redaction.
- `apps/daemon/tests/handoff-route-abort.test.ts` — client-disconnect
  aborts the daemon-side controller.

Refs nexu-io/open-design#462.

* fix(daemon): map TranscriptExportLockedError to 409 CONFLICT on handoff route

`exportProjectTranscript` acquires a per-project `.transcript.lock`
internally (apps/daemon/src/transcript-export.ts:131-163) and throws
`TranscriptExportLockedError` on EEXIST. Concurrent handoff requests —
or a handoff that races `/api/projects/:id/finalize/anthropic` — lost
that lock and surfaced as 500 INTERNAL_ERROR through the route's
generic catch.

- `apps/daemon/src/handoff-routes.ts` — catch `TranscriptExportLockedError`
  and return `409 CONFLICT` ahead of the generic 500 branch, mirroring
  the existing `FinalizePackageLockedError → 409 CONFLICT` mapping at
  `apps/daemon/src/import-export-routes.ts:603-605`.
- `apps/daemon/src/server.ts` — thread `TranscriptExportLockedError`
  through `handoffDeps` so the route can match without a direct import.
- `apps/daemon/src/handoff-design.ts` — correct the module header
  comment that incorrectly claimed "no lockfile (concurrent handoff
  calls are safe)" — handoff does not add its own lock, but it does
  transitively acquire `.transcript.lock` via the transcript-export
  call.
- `apps/daemon/tests/handoff-route.test.ts` — regression test that
  pre-acquires `.transcript.lock` on disk via `fs.openSync(lockPath, 'wx')`
  before firing a handoff request, asserts 409 CONFLICT.

Refs nexu-io/open-design#462 — addresses @nettee's blocking review on
PR #1718 (comment 3242251338).

* fix(daemon): keep handoff request timeout armed through the response body read

`synthesizeHandoffPrompt` cleared the upstream-call timeout in a `finally`
that ran as soon as `callAnthropicWithRetry` returned. But `fetch()`
resolves once the upstream sends *headers* — so the subsequent
`await response.json()` body read ran with no timeout. A response that
sends headers and then stalls its body could hang `/api/projects/:id/handoff`
indefinitely instead of failing.

- `apps/daemon/src/handoff-design.ts` — move `clearTimeout(timeoutId)` into a
  single outer `finally` spanning both the call and the `response.json()`
  body parse, so the timeout stays armed until the body is fully consumed.
- `apps/daemon/src/handoff-design.ts` — the body-parse catch now re-throws
  `AbortError` as-is, mirroring the call-phase catch. Without this a
  body-phase timeout would surface as `502` "non-JSON body"; re-throwing
  lets the route map it to the intended `503` "handoff timed out"
  (`handoff-routes.ts:122-124`).
- `apps/daemon/tests/handoff-design.test.ts` — regression test: a `fetchImpl`
  returning a `Response` whose body never closes after headers, raced
  against a 500ms deadline, asserts the call aborts (not hangs) and rejects
  with `AbortError`.

Refs nexu-io/open-design#462 — addresses @nettee's round-2 blocking review
on PR #1718 (`handoff-design.ts:196`).

* fix(daemon): map upstream 400 to 400 BAD_REQUEST on handoff route

`callAnthropicWithRetry` preserves a non-retryable upstream status, so an
Anthropic HTTP 400 (`invalid_request_error` — unknown model, invalid
maxTokens, malformed body) reached the route's `FinalizeUpstreamError`
branch and fell through to `502 UPSTREAM_UNAVAILABLE`. That reported
deterministic caller input as a transient server outage, inviting
pointless retries and hiding which field was wrong.

- `apps/daemon/src/handoff-routes.ts` — special-case `err.status === 400`
  to `400 BAD_REQUEST` with the redacted upstream detail, ahead of the
  generic 502. Also refresh the route docblock: it claimed the 409 branch
  was omitted (stale since the R1 TranscriptExportLockedError fix) and
  that error mapping fully mirrors finalize (now diverges on 400).
- `apps/daemon/tests/handoff-route.test.ts` — route test driving an
  Anthropic `400 invalid_request_error`: asserts 400 BAD_REQUEST, the
  upstream detail is surfaced, and an echoed key is redacted.
- `packages/contracts/tests/package-runtime.test.ts` — import
  `@open-design/contracts/api/handoff` through the package `exports` map
  and assert `HANDOFF_SCHEMA_VERSION`, covering the built publish surface
  (esbuild entry + exports map + root re-export) that the source-only
  `handoff-contract.test.ts` does not exercise.

Refs nexu-io/open-design#462 — addresses @nettee's round-3 blocking
review on PR #1718.

* fix(daemon): await the now-async external base-URL validator on handoff route

Main's #1176 (`9a64fccd`) made `validateExternalApiBaseUrl` DNS-aware and
asynchronous (`validateBaseUrlResolved`) and updated the proxy and finalize
callers to `await` it. The handoff route — added on this branch in parallel,
against the old synchronous validator — still called it without `await`, so
`validated` was a Promise: `validated.error` / `validated.forbidden` were
`undefined`, the SSRF / malformed-URL guard silently no-opped, and a bad
`baseUrl` fell through to the upstream call and surfaced as 502.

A semantic merge break — no textual conflict, green on the branch in
isolation, red once CI re-merged latest main.

- `apps/daemon/src/handoff-routes.ts` — `await validateExternalApiBaseUrl(...)`,
  mirroring the finalize route (`import-export-routes.ts:561`). The handler
  is already `async`.

The existing `handoff-route.test.ts` cases "400 BAD_REQUEST when baseUrl is
not a valid URL" and "403 FORBIDDEN when baseUrl points at a private internal
IP" already encode this — red against branch + latest main, green now.

Refs nexu-io/open-design#462 — PR #1718 CI fix.

* chore(daemon): list handoff in the assertServerContextSatisfiesRoutes literal

The `assertServerContextSatisfiesRoutes({...})` call in `server.ts` enumerates
every route registrar's deps but omitted `handoff`. Adding `handoff: handoffDeps`
makes the literal complete and consistent with the other route deps.

This was not a typecheck break: route-dep coverage is guaranteed by the
`Assert<ServerContext extends AllRegisteredRouteDeps>` type in
`route-context-contract.ts` — and `AllRegisteredRouteDeps` already includes
`RegisterHandoffRoutesDeps` — not by this assertion-call literal. The literal
has omitted `handoff` since this branch's first push (`806db576`) through green
CI throughout; `tsc -p tsconfig.json --noEmit` is clean before and after.

Refs nexu-io/open-design#462 — addresses @nettee's round-4 review note on PR #1718.

* feat(web): add "Resume conversation in new chat" action (#462)

Adds a Resume control to the chat header, next to "New conversation".
Clicking it synthesizes a handoff prompt from the current transcript
via POST /api/projects/:id/handoff, opens a fresh conversation, and
auto-sends the synthesized prompt as its first user message — so a
drifted session resumes without the user replaying context by hand.
The old conversation is preserved.

- synthesizeHandoff() web-state wrapper in apps/web/src/state/projects.ts
- resume-conversation icon button in ChatPane (onResumeConversation /
  resumeConversationDisabled props)
- handleResumeConversation + pendingResumeRef + auto-send effect in
  ProjectView; effect gates on messagesConversationId so the prompt
  cannot fire before the new conversation's message read settles
- chat.resumeConversation i18n key across all 19 locales

Commit B of #462; Commit A is the daemon endpoint (PR #1718). This
branch is stacked on feat/handoff-endpoint so the web code resolves
@open-design/contracts/api/handoff.

* fix(daemon): scope handoff to one conversation + reject empty transcripts (#462)

Addresses the review on #1718 and #2264:

- mrcfps (#2264): the handoff endpoint exported the whole project's
  transcript, so a multi-conversation project blended unrelated chats
  into the synthesized prompt. HandoffRequest now carries a required
  conversationId; the route validates it belongs to the project
  (404 CONVERSATION_NOT_FOUND), and exportProjectTranscript takes an
  optional conversationId filter so only that conversation is exported.
- nettee (#1718): a zero-message conversation still called Anthropic and
  fabricated a handoff. synthesizeHandoffPrompt now throws
  EmptyTranscriptError on messageCount === 0; the route maps it to
  400 EMPTY_TRANSCRIPT before any BYOK tokens are spent.

HANDOFF_SCHEMA_VERSION bumped to 2 (conversationId is a new required
request field). Regression tests: a two-conversation scoping test, an
empty-conversation route + pipeline test, and a transcript-export
conversationId-filter unit test.

* feat(web): send conversationId with the resume handoff request (#462)

Follows the handoff endpoint becoming conversation-scoped. The resume
flow now passes the active conversationId to POST /handoff so the
synthesized prompt summarizes only the conversation being resumed.
handleResumeConversation bails when there is no active conversation;
synthesizeHandoff and the resume tests carry the new field.

* feat(daemon): add `od project handoff` CLI + register handoff error codes (#462)

Addresses the second-round review on #1718 and #2264:

- mrcfps (#2264): per AGENTS.md "Capability exposure (UI/CLI dual-track)",
  a user-facing capability must be reachable through the `od` CLI, not
  only the web UI. Adds `od project handoff <id> --conversation <id>
  --api-key <key> --model <model> [--base-url] [--max-tokens] [--json]`,
  driving the same POST /api/projects/:id/handoff endpoint. The logic
  lives in a testable handoff-cli.ts sibling module (mirrors
  artifacts-cli.ts) so cli.ts's import-time dispatch stays out of tests.
- nettee (#1718): the route emitted CONVERSATION_NOT_FOUND and
  EMPTY_TRANSCRIPT, which were absent from the shared API_ERROR_CODES
  union. Both are now registered in packages/contracts/src/errors.ts,
  with a contract test pinning them so the route and contract cannot
  drift again.

A CLI contract test covers the conversation-scoped request shape,
--json output, flag validation, and daemon-error surfacing.

* fix(daemon): fail `od project handoff` on a malformed 2xx response (#462)

Addresses nettee's review on #1718: runProjectHandoff treated any 2xx
response as success, so a broken daemon/proxy 200 with malformed or
shape-invalid JSON would print `undefined` (or `{}` under --json) and
still exit 0 — breaking the fail-fast contract scripts rely on. It now
validates the body is a well-formed HandoffResponse via an
isHandoffResponse type guard and fails fast otherwise. Regression tests
cover a shape-invalid and an unparseable 200 body.

* feat(web): surface the daemon's classified handoff error in the resume toast (#462)

Addresses mrcfps's non-blocking note on #2264: synthesizeHandoff returned
null for every non-2xx response, so RATE_LIMITED, EMPTY_TRANSCRIPT, and an
upstream 400 with provider detail all collapsed into one generic "check
your API key" toast — even though handoff-routes.ts had already classified
and sanitized them.

synthesizeHandoff now returns the daemon's structured `{ error }` on a
classified failure; `null` stays reserved for a transport failure or an
unparseable body. handleResumeConversation surfaces error.message plus
redacted details for the `{ error }` case, and a distinct
daemon-unreachable message for null.

* fix(web): omit empty baseUrl from the resume handoff request (#462)

Addresses mrcfps's review on #2264: the default Anthropic config
normalizes baseUrl to '' (config.ts), and the handoff route 400s an
explicit empty baseUrl — so the Resume action failed before synthesis
for every user who never set a custom base URL.

handleResumeConversation now forwards baseUrl only when config.baseUrl
is a non-empty string, matching the contract's optional-field semantics.
Tests: the default-config path asserts baseUrl is absent from the
request, and a new case covers a custom baseUrl being forwarded.

* refactor(daemon): dispatch `od project handoff` before the generic project parser (#462)

Addresses nettee's non-blocking note on #1718: runProject ran the shared
parseFlags(PROJECT_*) before reaching the handoff switch case, so a
malformed `od project handoff` invocation (`--unknown`, `--max-tokens`
with no value) threw out of the generic parser instead of hitting
handoff-cli's structured fail() — the entrypoint behaved differently
from the unit-tested runProjectHandoff helper.

The handoff sub now short-circuits before parseFlags / projectDaemonUrl,
so `od project handoff` runs exactly runProjectHandoff with no
intervening parsing. handoff-cli.test.ts gains unknown-flag and
missing-value cases covering the structured fail path.

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
2026-05-20 13:28:27 +08:00
lefarcen
204599a7ae
feat(analytics): ship PostHog v2 event schema (#2285)
* feat(analytics): ship PostHog v2 event schema

Aligns the PostHog wire format with the product team's v2 tracking
spec (Open Design 埋点文档 2.0). The previous v1 catalogue defined a
flat per-page event name (home_view / studio_click / settings_view…);
v2 collapses everything to four core events identified through the
page_name + area + element triplet so dashboards can group by surface
without owning a separate event per page.

Key changes
- packages/contracts/src/analytics: collapse to page_view / ui_click /
  surface_view / *_result event names; bump EVENT_SCHEMA_VERSION to 2;
  rename the wire field anonymous_id → device_id (value unchanged);
  promote the configure-state triplet (has_available_configure_cli /
  configure_type / configure_availability) to a global PostHog register
  so every event inherits it without per-helper boilerplate.
- apps/web/src/analytics: rewrite the 43 trackXxx helpers behind the
  new typed catalogue; opt out of PostHog's built-in UA bot filter so
  legitimate embedded webviews, fingerprinted browsers, and the
  Playwright-based e2e runs ingest captures (the Privacy → "Share
  usage data" toggle remains the single consent gate).
- apps/web components: wire P0/P1/P2 click + view + result surfaces
  end-to-end — left nav, toolbar, home chat composer, recent projects,
  new project modal, plugins / design systems / integrations /
  automations pages, file manager, artifact toolbar/header/share popup,
  feedback panel, settings sidebar / language / appearance /
  notifications / pets / privacy / connectors. Fixes the v1 feedback
  bug where action=clear_feedback_rating shipped rating=null instead of
  the rating being cleared.
- apps/daemon: extend run_created / run_finished with the v2 context
  (entry_from / project_kind / target_platforms / fidelity /
  connectors / etc.), add explicit error_code classification on
  result=failed (run.errorCode → AGENT_SIGNAL_* → AGENT_EXIT_* →
  AGENT_TERMINATED_UNKNOWN), and read device_id from the new
  x-od-analytics-device-id header. Also moves the run_created /
  run_finished emission to the canonical /api/runs handler in
  server.ts; the chat-routes copy was shadowed by Express's earlier
  registration and never executed, which also meant run.clientType
  never made it to Langfuse — fixed in the same move.

Verification
- pnpm guard / pnpm typecheck clean for daemon, web, and contracts.
- pnpm --filter @open-design/web test: 1645/1645 passing.
- End-to-end smoke through Playwright + local PostHog ingest project
  420348: every page_view (home/projects/automations/design_systems/
  plugins/integrations/chat_panel/file_manager), every nav element,
  the new_project_modal surface_view + tab + create flow, the
  plugin_replacement_modal surface_view, settings_view across nine
  sections, settings_cli_test_result (codex CLI), the
  project_create_result success path, and run_created + run_finished
  (result=failed, error_code=AGENT_EXIT_1) all reached PostHog with
  the v2 schema and the expected device_id / page_name / area /
  element / fidelity / target_platforms props. The remaining
  *_result events (artifact_export / feedback_submit / file_upload /
  plugin_replacement / settings_byok_test / settings_connector_auth)
  are wired in code; production traffic will trigger them.

* fix(analytics): preserve style category on design-systems surface chip switch

The merge resolution in DesignSystemsTab incorrectly re-introduced a
`setCategory('All')` call alongside the new `trackDesignSystemsTopClick`
emit. main intentionally keeps the active style category when the surface
filter refines within it; the regression was caught by the existing
"keeps the style category when a surface chip refines within it" test
in tests/components/DesignSystemsTab.test.tsx.

* fix(analytics): address review — senseaudio passthrough + daemon-side configure-state

Two follow-ups from the v2 schema review on #2285:

1. `byokProtocolToTracking()` was still falling through to `null` for
   `senseaudio` even though the v2 BYOK provider enum now lists it. Every
   `SettingsDialog` BYOK call site guards on `if (byokProviderId)`, so a
   user on SenseAudio was silently dropping the provider-option,
   field-focus, and test-result captures. Added the missing case so
   SenseAudio gets the same analytics coverage as the other providers.

2. The daemon-authoritative `run_created` / `run_finished` events were
   missing the configure-state triplet (`has_available_configure_cli` /
   `configure_type` / `configure_availability`) that v2 promotes to a
   global register on the web side. Daemon captures don't go through the
   PostHog global register, so dashboards couldn't segment run lifecycle
   by execution setup after the migration.

   The fix derives the triplet server-side from `detectAgents()` and the
   request's `agentId` before `design.analytics.capture(...)`:
     - has_available_configure_cli: any CLI on PATH reports installed
     - configure_type: 'local_cli' when the run targets an installed CLI,
       otherwise 'unknown' (daemon can't see BYOK keys, which live in
       web-client storage)
     - configure_availability: 'available' / 'unavailable' / 'unknown'
       based on the requested agent's install status, with a fallback to
       'available' when any CLI is installed

   This keeps the v2 schema consistent across both daemon-side and
   web-side captures.

* fix(analytics): wire setConfigureGlobals so browser events carry fresh state

Third follow-up from the v2 schema review on #2285. The previous fix
addressed senseaudio + daemon-side configure-state, but reviewer flagged
that `setConfigureGlobals` was still defined-only — no caller — so every
browser-side capture inherited the boot defaults
(`has_available_configure_cli=false`, `configure_type='unknown'`,
`configure_availability='unknown'`). PostHog dashboards therefore could
not segment the new `page_view` / `ui_click` / `surface_view` events by
execution setup after a user configured their environment.

Changes:

- `packages/contracts/src/analytics/events.ts` — add a pure
  `deriveConfigureGlobals(mode, agentId, agents, byokConfigured)` helper
  so the web client and the daemon can derive the triplet from the same
  source of truth. The helper covers all 5 `configure_type` buckets
  (`local_cli` / `byok` / `both` / `none` / `unknown`) and the 3
  `configure_availability` buckets (`available` / `unavailable` /
  `unknown`).
- `apps/web/src/App.tsx` — add a useEffect that re-derives the triplet
  whenever the user changes execution mode, selects a new CLI, saves a
  BYOK key, or the detected-agent list refreshes, then pushes it to
  PostHog via `analytics.setConfigureGlobals(...)`. The setter goes
  through the provider so the analytics module stays the single source
  of truth.
- `apps/web/src/analytics/provider.tsx` — expose
  `setConfigureGlobals` on the analytics context and the test stub so
  consumers route through the provider boundary.
- `apps/daemon/src/server.ts` — switch the daemon-side derive in
  `/api/runs` to the shared `deriveConfigureGlobals` helper so the
  authoritative run_created/run_finished captures match the web-side
  payload. BYOK credentials live in the web client and stay invisible
  to the daemon, so the daemon arm passes `byokConfigured: undefined`
  and falls back to the installed-CLI signal.
- `apps/web/tests/analytics-configure-globals.test.ts` — new regression
  test that pins the derive behavior across all branches and confirms
  the setter actually mutates the client-side store. Locks the wire-up
  so a future refactor can't silently turn the setter back into a
  no-op.

Verification: pnpm guard clean; daemon / web typecheck clean; web tests
1703/1703 passing (up from 1696 — 7 new tests in the configure-globals
suite).

* fix(analytics): emit projects page_view + drop misattributed chat_panel source

Fourth review pass on PR #2285. Two follow-ups from mrcfps:

1. DesignsTab (projects landing) was emitting click events but no
   matching page_view. Opening /projects without clicking anything left
   the surface invisible in PostHog. Added a once-per-mount
   trackPageView({ page_name: 'projects' }) with the same ref-keyed
   pattern HomeView / PluginsView use.

2. ChatComposer was hard-coding source: 'recent_project' on every
   chat_panel page_view. The web router currently only carries
   projectId / conversationId / fileName, so we cannot distinguish a
   New-project launch from a template-pick or a Recent-projects click
   from this layer. A false constant would over-attribute every chat
   launch to 'recent_project' and break the funnel slice this schema
   was meant to unlock. Dropped the field for now — better no source
   than the wrong source — until the router grows a launch-source
   channel; the field is still defined as optional on PageViewProps so
   the channel can land in a follow-up PR.

Verification: web typecheck clean; web tests 1703/1703 passing.

* fix(analytics): correct plugin-replacement async result + heterogeneous upload + missing requestId

Three follow-ups from the fifth review pass on PR #2285:

1. **plugin_replacement_result emitted before the apply settled**
   (`apps/web/src/components/HomeView.tsx`). The modal's confirm action
   was a synchronous wrapper around an async `usePlugin(...)` call, so
   the surrounding try/catch never observed real failures and every
   attempt was reported as `result=success`. Changed `PendingReplacement.
   confirm` to return `Promise<void>`, made the wrapper return the
   underlying promise, and moved the analytics emit into an async
   IIFE in the click handler so the success/failure branches reflect
   the actual outcome.

2. **file_upload_result mis-typed heterogeneous batches**
   (`apps/web/src/components/FileWorkspace.tsx`). The earlier
   implementation only inspected `picked[0]`, so a mixed batch like
   `image.png + demo.mp4` reported `file_type=image`. Per the comment
   above the block ("mixed batches collapse to other"), the
   implementation now maps every file to a tracking type, collapses to
   `other` when more than one distinct type is present, and falls
   back to the single type otherwise.

3. **project_create_result lost the click→result correlation id**
   (`apps/web/src/components/NewProjectPanel.tsx`). The click event
   no longer carried the locally-generated `requestId` that
   `project_create_result` keeps, so the two could not be joined.
   `trackNewProjectModalElementClick()` now accepts an optional
   `{ requestId }`, mirroring the other helpers, and the create-button
   click threads the same id used for the result.

Verification: web typecheck clean; web tests 1703/1703 passing.

* fix(analytics): gate configure-state on agents probe + drop unsent run_created fields

Two follow-ups from the sixth review pass on PR #2285:

1. **Cold-start configure-state was stamped before fetchAgents() landed**
   (`apps/web/src/App.tsx`). The useEffect that pushes the v2 triplet
   into the PostHog global register fired on first paint with
   `agents=[]`, so the first home/projects/plugins page_view reported
   `has_available_configure_cli=false` / `configure_availability=
   unavailable` even on machines that did have an installed CLI. The
   effect now waits on `agentsLoading === false` and leaves the boot
   defaults ('unknown'/'unknown') in place until the probe resolves.

2. **Daemon read run-context fields the web never sends**
   (`apps/daemon/src/server.ts`). The daemon-side run_created /
   run_finished baseProps read `projectKind`, `entryFrom`,
   `projectSource`, `targetPlatforms`, `companionSurfaces`, `fidelity`,
   `connectors`, `useSpeakerNotes`, `includeAnimations`,
   `referenceTemplate`, and `aspect` from `req.body`, but
   `packages/contracts/src/api/chat.ts` and
   `apps/web/src/providers/daemon.ts` don't carry those keys on the
   wire. Reading them therefore always produced null/undefined.
   Dropped the unsent fields from the daemon capture; a follow-up can
   extend the create payload to thread the real context through. The
   `design_system_id` field stays because the chat contract does send
   it.

Tests: added 3 regression tests in `tests/analytics-configure-globals.
test.ts` covering the boot-time gating contract (empty agents +
daemon mode → unavailable / local_cli; installed agent → available;
undefined agents list → unavailable). Verification: web typecheck
clean; daemon typecheck clean; web tests 1706/1706 passing (up from
1703 — 3 new cold-start tests).

* fix(analytics): pin mode='daemon' so missing-agent run reports unavailable

Eleventh review pass on PR #2285. mrcfps flagged that
`apps/daemon/src/server.ts` was calling `deriveConfigureGlobals(...)`
without `mode`, so the helper fell through to the generic branch.
Result: a run for an uninstalled agent was tagged
`configure_availability: 'available'` whenever any OTHER CLI was on
PATH, because the generic branch only looks at the cohort-wide
"any installed?" signal. That precisely undermines the slice the
daemon emit is trying to power.

The daemon's /api/runs handler is always a daemon-mode capture
(daemon is the local CLI runner — BYOK lives in the web layer), so we
now pin `mode: 'daemon'` on the call site. The helper then judges
`configure_availability` from the REQUESTED agent's install status and
reports `unavailable` when the user picked an agent that is not
installed, even if peers are.

Added a regression case in `tests/analytics-configure-globals.test.ts`:
`{ mode: 'daemon', agentId: 'codex', agents: [{claude,true},{codex,false}] }`
→ `{ has_available_configure_cli: true, configure_type: 'local_cli',
configure_availability: 'unavailable' }`.

Verification: daemon typecheck clean; web tests 1707/1707 passing
(up from 1706 — 1 new regression test).

* fix(analytics): hoist chat_panel page_view + thread requestId

- Move chat_panel page_view emit from ChatComposer to ProjectView so
  it survives activeConversationId-driven ChatPane remounts. ProjectView
  keys the dedupe ref by project.id; the composer drops its duplicate.
- Thread { requestId } into trackAssistantFeedbackReasonSubmitClick so
  the click pairs with the existing feedback_submit_result on the same
  request id (mirrors the trackNewProjectModalElementClick pattern).

* fix(analytics): keep v2 super-props alive across reset and stamp design_system_source

- Snapshot the register payload in client.ts on PostHog init and
  re-register it from applyConsent(true) and applyIdentity() so a
  privacy-toggle or Delete-my-data rotation does not resume capture
  without event_schema_version / device_id / session_id / locale /
  configure-state globals. setConfigureGlobals() also patches the
  cache so a later restore picks up the current configure state.
- Stamp design_system_source on daemon-side run_created / run_finished
  (it is required by RunCreatedProps / RunFinishedProps). Daemon
  can't tell default vs user_selected vs inherited from the wire, so
  it derives 'unknown' when designSystemId is present, 'not_applicable'
  otherwise — a follow-up that threads designSystemSource through
  CreateRunRequest can replace this with the precise source.
2026-05-20 13:04:20 +08:00
PerishFire
15d08d4158
feat: add windows packaged auto update flow (#2362) 2026-05-20 12:56:14 +08:00
lefarcen
1ab8758045
Revert "fix(web): demote Plugins and Integrations to nav rail footer (#1806)" (#2360)
This reverts commit a38e09f931.
2026-05-20 12:43:49 +08:00
Ethan Guo
41e0e26d86
fix(web): stop re-syncing composer scroll on every keystroke (#2282)
The `draft` dep on the overlay scroll-snap effect triggered setComposerScrollTop on every keystroke, looping through the v0.8.0 mention overlay state and crashing the chat composer with Maximum update depth exceeded.

Fixes #2097
2026-05-20 11:15:59 +08:00
kami
d72d689430
fix(web): preserve daemon run reattach across project switch (#2317)
Co-authored-by: multica-agent <github@multica.ai>
2026-05-20 11:14:25 +08:00
Sid
ba6497b399
fix(web): gate srcDoc transport activation on shell ready (#2320)
Opening Tweaks could leave the HTML preview iframe blank when the
host posted `od:srcdoc-transport-activate` before the lazy transport
shell had registered its message listener. The dropped activation
was then suppressed by the dedupe check in subsequent onLoad calls,
stranding the iframe on the 536-byte empty shell.

The race surfaces after the iframe re-mounts: closing Tweaks
increments `srcDocTransportResetKey`, which re-mounts the srcDoc
iframe with a fresh shell. Re-opening Tweaks immediately fires the
`useUrlLoadPreview`-driven `activateSrcDocTransport()` while the new
shell is still loading. The post lands in a window with no listener
yet, but `activatedSrcDocTransportHtmlRef.current` is marked to the
current srcDoc, so the iframe's onLoad call later short-circuits.

Fix:
  - The lazy transport shell now posts `od:srcdoc-transport-ready`
    to its parent once its activate-listener is installed.
  - FileViewer tracks `srcDocShellReady`, reset on iframe re-mount
    and set when ready arrives (with onLoad as belt-and-suspenders).
  - Activation is funneled through a new pure helper
    `canActivateSrcDocTransport()` that adds the ready gate to the
    existing pre-conditions. When ready flips later, the
    `activateSrcDocTransport` useCallback regenerates and the
    enclosing useEffect re-fires the activation cleanly.

Tests cover the shell handshake (red on main: shell never posts
ready) and the gating helper (red on main: function did not exist).

Fixes #2253
2026-05-20 11:13:40 +08:00
kami
dea07840f3
fix: stop stale pinned todos after terminal runs (#2321)
Co-authored-by: multica-agent <github@multica.ai>
2026-05-20 11:13:20 +08:00
Sid
8b16d21785
fix(web): coalesce chokidar rewrite bursts before refreshing files (#2326)
Agent rewrites surface to chokidar as `unlink` + `add` (+ optional
`change`) within a single tick. ProjectView refreshes the file list
on every event, so the open tab's active file vanishes for one frame
between the `unlink` and the `add`. FileWorkspace's `activeFile`
resolver returns null when the active name disappears from
`visibleFiles`, and the preview falls back to an empty state mid-run.

Add a trailing-coalesce around the file-changed refresh:
  - First event arms an 80ms quiet window.
  - Subsequent events inside the window reset it.
  - A 250ms maxWait cap ensures a sustained edit storm still flushes.
  - The intermediate `unlink` is absorbed by the next `add` and the
    UI only sees a single, file-present refresh.

The coalesce is a small `useCoalescedCallback` hook so the timing
contract is unit-testable without standing up ProjectView. Tests
cover the rewrite burst, isolated single triggers, the maxWait cap
under sustained triggers, latest-callback semantics, and unmount
cleanup.

Fixes #2195
2026-05-20 11:12:53 +08:00
Bryan
b426f614f5
fix(web): scope design-systems surface chip counts to active style filter (#2141)
On the Design systems page the built-in library surface chips (All / Web /
Image / …) counted the whole catalog, so applying a style category left
them showing stale totals while the grid shrank. Clicking a chip also
called setCategory('All'), discarding the user's style filter instead of
refining within it.

Surface chip counts and the grid now derive from a query-scoped set
(style category + search) over librarySystems; the chip click only
switches surface. An unscoped surfaceTotals count still backs the
"surface is empty" effect guard so it reacts to the catalog, not a
transient filter. The chip row always keeps "all" and the active surface
chip rendered, so a transient search can never hide the active filter.

Re-applied onto the DesignSystemsTab restructure from #2187. Adds 4 tests
covering scoped counts, category preservation on chip click, empty-chip
hiding, and active-chip persistence under a zero-result search.

Fixes #2062

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
2026-05-19 23:42:45 +08:00
mzl163
210b94069a
feat(senseaudio): BYOK chat with image + video generation tools (#2065)
* feat(senseaudio): BYOK chat with image + video generation tools

Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.

- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
  connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
  (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
  periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
  default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
  key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
  DesignFilesPanel, and project export pick them up automatically;
  legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns ![alt](url) image syntax with a scheme
  allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
  1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
  (7), proxy-routes (47), markdown image rendering (8)

* fix(senseaudio): unblock image gen + design file preview switching

- SenseAudio /v1/image/sync rejected the previous size mapping with
  `参数错误:size` (1664x936, 936x1664, 1280x960, 960x1280 are not in
  the gateway's accepted set). Switched to standard HD / SD sizes that
  every aspect bucket can hit: 1024×1024, 1280×720, 720×1280,
  1024×768, 768×1024. Kept the byok-tools and media.ts tables in sync
  so the BYOK chat tool and the CLI agent path both stop failing on
  non-square aspects.

- DesignFilesPanel's <DfPreview> was missing a key prop, so React
  reused the same iframe DOM node when the user picked a different
  file — the src prop changed but the iframe never navigated. Added
  key={previewFile.name} so the previous preview unmounts cleanly.

- Updated byok-tools + media-senseaudio-image tests for the new size
  expectations.

* docs(senseaudio): clear stale provider hint + update README

- Settings → Media → SenseAudio: clear the auto-promoted
  "Image · TTS · 70+ voices · clone" hint; the provider label alone is
  enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
  the BYOK section reflects what the daemon actually serves, and
  mention the generate_image / generate_video chat tools that ship
  with the SenseAudio path.

* fix(senseaudio): address PR #2065 review feedback

Three non-blocking review notes from @PerishCode on PR #2065:

1. Drop the dead /api/byok-image/:id route. The PR description claimed
   it was "legacy fallback for old chat history" but that storage
   layout never existed on main, so the route can only ever 400 or
   404 — never 200. Removed the handler, the isSafeByokImageId
   export, the unused createReadStream / stat / path / Request /
   Response imports, and the two byok-image regression tests.

2. Add rejectProxyPluginContext guard to the senseaudio proxy
   handler so it matches the invariant the other five proxy paths
   already enforce (plugin runs must go through /api/runs for
   snapshot pinning). Extended the existing "API fallback rejects
   plugin runs" describe to also cover /api/proxy/senseaudio/stream
   with the 409 PLUGIN_REQUIRES_DAEMON expectation.

3. Wrap the secondary image / video downloads (the URLs the
   SenseAudio gateway hands back in /v1/image/sync .url and
   /v1/video/status .video_url) in validateBaseUrlResolved so a
   malicious gateway can't point us at 169.254.169.254 (AWS / Azure
   metadata) or RFC1918 hosts via the response payload. Also passed
   `redirect: 'error'` on both fetches to match the SSRF posture
   the primary proxy fetch already uses. The new
   assertExternalAssetUrl helper lives next to executeGenerateImage
   so future tool downloads can reuse it.

Tests: 120/120 daemon tests pass; guard + typecheck green.

* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path

Follow-up to 01b1260a — the chat-tool fix in byok-tools.ts wasn't
mirrored onto the parallel renderSenseAudioImage path in media.ts.
Same attacker-controllable shape (gateway-returned `data.url`),
same one-line fix.

- Hoist assertExternalAssetUrl from byok-tools.ts into
  connectionTest.ts next to validateBaseUrlResolved so both call
  sites (the BYOK chat tool loop AND the CLI agent media dispatcher)
  share one helper. Made the error strings provider-agnostic so a
  future caller doesn't get a misleading "senseaudio" attribution
  for a Volcengine / Grok / etc. download.
- renderSenseAudioImage now runs the response url through
  assertExternalAssetUrl before fetching bytes, and passes
  redirect: 'error' to block a 3xx hop into private space.

Scope intentionally limited to the senseaudio path PerishCode
flagged; the other unguarded fetch(entry.url) call sites in
media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing
patterns and belong in a separate follow-up if the daemon wants
defense-in-depth across every provider.

Tests: 127/127 daemon tests pass; guard + typecheck green.

---------

Co-authored-by: unknown <mazeliang@sensetime.com>
2026-05-19 23:14:56 +08:00
Eli
431a5e2d79
[codex] Add global onboarding flow without AMR (#2272)
* Add global onboarding flow

* Remove AMR from onboarding variant

* Add onboarding role question
2026-05-19 22:00:40 +08:00
PerishFire
ad37fd30cf
Add desktop updater UI flow (#2270) 2026-05-19 21:36:51 +08:00
Eli
e94663bfbd
Add connector memory extraction flow (#2265) 2026-05-19 21:27:41 +08:00
kami
d1eb8b7bef
Fix editorial deck navigation (#2173) 2026-05-19 19:30:30 +08:00
Neha Prasad
716f06cb73
fix(web): allow Comment/Inspect picker to select iframe-backed components (#2254) 2026-05-19 19:26:11 +08:00
Neha Prasad
e6d0f4eeab
fix(web): restore scrolling in New project modal for tall tabs (#2032) 2026-05-19 19:09:23 +08:00
PerishFire
2c128e0e91
refactor desktop host bridge (#2246) 2026-05-19 18:27:05 +08:00
Neha Prasad
c6b7c44424
confirm before clearing memory extraction history (#2241) 2026-05-19 17:56:14 +08:00
Zellux Wang
52b702648e
Enable LAN web dev access (#1947) 2026-05-19 17:50:50 +08:00
Tom Huang
86ec951fb9
[codex] Add automation templates and proposal workflows (#2193)
* feat(web): introduce Automations tab with dual-track capability for routines

This commit adds a new Automations tab that consolidates routines, schedules, and live artifacts, allowing users to manage automations seamlessly. The tab features a modal for creating and editing automations, which supports various scheduling options (hourly, daily, weekdays, weekly) and project modes (create_each_run, reuse). The CLI is also updated to expose automation commands, ensuring consistency between the web UI and CLI interfaces.

Key changes include:
- New `NewAutomationModal` component for automation creation and editing.
- Updated `TasksView` to integrate the new Automations functionality.
- Enhanced styling for the Automations tab to improve user experience.

This implementation aligns with the dual-track capability exposure policy, ensuring all features are accessible via both the web UI and CLI.

* feat(daemon): enhance automation context handling and CLI commands

This commit introduces several improvements to the automation context management and updates the CLI commands accordingly. Key changes include:

- Added support for new context fields (`plugin`, `mcp`, `connector`) in automation commands.
- Updated the CLI to reflect new target options (`new-project`).
- Enhanced error messages for invalid target inputs.
- Introduced functions to handle context selection and normalization for routines, including the ability to parse and store context data in the database.
- Updated the database schema to include a new `context_json` field for routines.
- Improved the handling of context in routine routes and the web interface, ensuring that selected contexts are properly managed and displayed.

These changes aim to provide a more robust and flexible automation experience, aligning with the recent enhancements in the web UI.

* feat(web): enhance TasksView with automation run history and status indicators

This commit introduces several new features to the TasksView component, including:

- Added functionality to display automation run history for each routine, showing metadata such as status, timestamps, and project details.
- Implemented status indicators for routine runs, providing visual feedback on their current state (succeeded, failed, running, queued).
- Enhanced the UI to allow users to expand and view detailed run history, including the ability to open the corresponding project conversation.
- Updated styles to improve the presentation of automation statuses and history.

These changes aim to provide users with better insights into their automation routines and improve overall usability.

* feat(daemon): implement automation ingestion and proposal management

This commit introduces several new features related to automation ingestion and proposal management within the daemon. Key changes include:

- Added new modules for handling automation source packets and proposals, allowing for the storage, retrieval, and management of automation-related data.
- Implemented functions to list, create, and apply automation proposals, enhancing the automation workflow.
- Introduced new CLI commands for interacting with memory entries and automation sources, providing users with more control over their automation processes.
- Enhanced the server routes to support automation source and proposal APIs, enabling seamless integration with the existing system.

These changes aim to improve the overall automation experience, making it easier for users to manage and utilize automation proposals and ingestions effectively.
2026-05-19 16:35:28 +08:00
Siri-Ray
eb127e0f79
Remove live artifact home chip (#2221) 2026-05-19 16:35:09 +08:00
Eli
4376d8a8ec
[codex] Add pet task center and desktop pet (#1833)
* feat: add pet task center and desktop pet

* Fix pet task center review regressions
2026-05-19 15:38:39 +08:00
kami
6990291217
fix: escape chat transcript role delimiters (#2156)
Co-authored-by: multica-agent <github@multica.ai>
2026-05-19 15:37:35 +08:00
lefarcen
387dc83b27
fix(plugins): wire Open Design "Open Design PR" button end-to-end (#2182)
Some checks failed
ci / Detect CI change scopes (push) Failing after 2s
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-deploy / Deploy landing page (push) Has been skipped
nix-check / build (push) Failing after 1s
ci / Preflight (push) Has been skipped
ci / Core package tests (push) Has been skipped
ci / Tools workspace tests (push) Has been skipped
ci / Daemon workspace tests (push) Has been skipped
ci / Web workspace tests (push) Has been skipped
ci / E2E vitest (push) Has been skipped
ci / Playwright critical (push) Has been skipped
ci / Build workspaces (push) Has been skipped
ci / App workspace tests (push) Has been skipped
ci / Validate workspace (push) Failing after 0s
Two-part fix for the Plugin folder card's "Open Design PR" button. The
flow was broken end-to-end: the CLI emitted a 404 URL, and the agent
prompt under-specified the contribution steps so the agent stalled
mid-turn or fell back to the legacy issue-URL path.

Catalog target — `apps/daemon/src/plugins/publish.ts`:

`buildPublishLink({ catalog: 'open-design' })` hardcoded a submission URL
at `github.com/open-design/plugin-registry`, the dedicated registry repo
proposed in docs/plans/plugin-registry.md §1.2. That repo doesn't exist
yet (P3.1 notes "creating the external GitHub repo is an operational
launch step, not a code blocker"), so every generated URL 404'd. Retarget
the catalog at the live `nexu-io/open-design` monorepo and update the
target-path hint in the PR body to `plugins/community/<plugin-name>/`
(the actual layout under main). Plan §1.2 stays the long-term goal — see
the code comment.

Also updates the matching `od plugin yank` issue URL in cli.ts.

Agent prompt — `apps/web/src/components/design-files/pluginFolderActions.ts`:

The `contribute` action prompt only said "use the supported `od plugin
publish` Open Design registry flow", which produces an issue URL (the
legacy path) and left the agent to invent the remaining steps. The agent
ended up calling `AskUserQuestion` mid-turn waiting for input that the
DesignFilesPanel buttons couldn't satisfy, then stalled for 600s.

Rewrite the `contribute` prompt to drive the full PR flow via raw `gh`
commands:
  1. preflight `gh --version` / `gh auth status` (detect-and-instruct on
     missing CLI, never auto-install anything)
  2. read manifest, resolve author login
  3. `gh repo fork nexu-io/open-design --remote=false`
  4. clone fork, branch, cp plugin into `plugins/community/<name>/`,
     commit/push using author's git identity (not a hardcoded bot)
  5. `gh pr create ... --web` so the author reviews and clicks Create

Hard bans on `AskUserQuestion`, retry-on-failure, and the legacy
`od plugin publish --to open-design` CLI keep the turn fire-and-forget.
The `install` / `publish` action prompts are unchanged.

Tests:

* `apps/daemon/tests/plugins-publish.test.ts` — assert URL host +
  catalogLabel match `nexu-io/open-design`, body contains the new path
  hint.
* `apps/web/tests/components/pluginFolderActions.test.ts` (new) — lock
  the contribute prompt's command surface, the `--web` review window,
  the `AskUserQuestion` ban, the install-tool ban, and folder-path
  interpolation. Nine assertions covering the prompt contract without
  coupling to exact wording.

Validation:

* `pnpm --filter @open-design/daemon exec vitest run tests/plugins-publish.test.ts`
  → 11/11 passed
* `pnpm --filter @open-design/web exec vitest run tests/components/pluginFolderActions.test.ts`
  → 9/9 passed
* `pnpm --filter @open-design/web typecheck` clean
* E2E: button click in DesignFilesPanel under namespace=e2e-pr-test ran
  `gh --version` → fork → clone → branch → commit/push → ended at
  `gh pr create --web` opening the GitHub PR draft in browser. No
  AskUserQuestion stall this time.

Doesn't touch: `5f71968f` server-side endpoints (`/contribute-open-design`
+ `/publish-github` still in main as dead code — separate cleanup),
plan/spec docs (per maintainer call to keep the dedicated-repo as the
long-term goal), or `registry-backends.test.ts` (plan terminal-state
fixture; no production caller).
2026-05-19 15:26:59 +08:00
lefarcen
9596a0ccd5
feat(privacy): collapse first-run consent banner to a single "I get it" button (#2202)
* feat(privacy): collapse first-run banner to a single "I get it" button

Replaces the first-run privacy disclosure's two-button decision picker
("Share usage data" / "Don't share") with a single "I get it"
acknowledgement. Clicking it accepts the same default telemetry surface
the previous "Share usage data" path enabled — the banner shifts from
binary consent picker to informed disclosure.

To keep the surface honest, the banner footer is rewritten to spell out
the new default and point at the off switch:

  "Data sharing is on by default. You can turn it off any time in
   Settings → Privacy. We never upload the contents of your generated
   artifact files."

Settings → Privacy (PrivacySection.tsx) is unchanged — that surface still
exposes both Share and Don't share buttons so users who arrive there
later (or come back to flip the choice) keep the explicit picker.

Mechanics:

* `PrivacyConsentModal.tsx`: drop `onDecline` prop and the second button;
  rename `onShare` → `onAccept` to match the new semantic. Footer hint
  now reads from `settings.privacyConsentBannerFooter` (new key) so the
  banner copy can speak in single-button voice without disturbing the
  reused `settings.privacyConsentFooter` that PrivacySection still
  displays.

* `App.tsx`: drop the `onDecline` handler. Single `onAccept` handler
  applies the same opt-in payload as the previous `onShare` branch
  (`telemetry.metrics = true`, `telemetry.content = true`, fresh
  `installationId`), so the wire format daemon-side is unchanged.

* `i18n/types.ts` + `locales/en.ts`: two new keys —
  `settings.privacyConsentAccept` ("I get it") and
  `settings.privacyConsentBannerFooter` (the default-on disclosure copy).

* `i18n/locales/*.ts` (all 18 non-en dictionaries): added the two new
  keys. zh-CN and zh-TW are translated; the remaining 16 locales follow
  the project convention of leaving the EN string as a fallback for
  later contributor passes (same shape used by privacyConsentShare /
  Decline today).

* `tests/components/PrivacyConsentModal.test.tsx`: rewritten. The four
  new tests lock the new contract — single "I get it" button, no
  Share/Decline labels, default-on disclosure text in the footer, the
  external privacy-policy link, and onAccept firing on click. Replaces
  the prior "equal-prominence" tests, which only made sense for the
  two-button shape.

Validation:

* `pnpm --filter @open-design/web exec vitest run tests/components/PrivacyConsentModal.test.tsx`
  → 4/4 passed
* `pnpm --filter @open-design/web exec vitest run tests/i18n/locales.test.ts`
  → 5/5 passed (every locale aligned with English keys + placeholders;
  the new keys ship to all 19 dictionaries)
* `pnpm --filter @open-design/web typecheck` clean

* test(privacy): update App.connectors fixture to match single-button banner

`App.connectors.test.tsx > does not show first-run privacy consent until
daemon config hydration finishes` hardcoded the previous "Share usage
data" affirmative button label. The single-button banner now renders
"I get it", so the assertion was looking for a button that no longer
exists.

CI signal: 26081467446 → Web workspace tests → `× does not show
first-run privacy consent until daemon config hydration finishes`. The
App workspace tests + Validate workspace failures cascaded from this
one — both are aggregator jobs.

Local: vitest run tests/components/App.connectors.test.tsx → 5/5 passed.
2026-05-19 15:26:56 +08:00
Eli
3cecb1c881
Polish project workspace UI (#2201) 2026-05-19 15:14:16 +08:00
chaoxiaoche
a38e09f931
fix(web): demote Plugins and Integrations to nav rail footer (#1806)
* fix(web): demote Plugins and Integrations to nav rail footer

Plugins and Integrations are platform-configuration surfaces, not
daily-use destinations. Moving them to the footer section of the
left nav rail — separated from primary items by a thin divider —
keeps them reachable while giving the primary four items
(Home, Projects, Automations, Design Systems) the visual weight
they deserve.

- EntryNavRail: remove Plugins/Integrations from the main __group
  and place them in the __footer above the help launcher
- entry-layout.css: add __divider rule (1 px separator) to visually
  mark the boundary between primary and secondary nav regions

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): remove settings dropdown, gear opens settings directly

The gear/cog button previously opened a dropdown that mixed three
unrelated concerns: community links (X, Discord), preference quick-
access (Language, Appearance), a feature shortcut (Use everywhere),
and a redundant Settings entry — creating two separate paths to the
same Settings dialog and duplicating Language/Appearance relative to
the Settings sidebar.

Changes:
- Gear button now directly opens the Settings dialog (no intermediate
  dropdown layer)
- Follow @nexudotio on X and Join Discord moved to the Help menu at
  the bottom of the nav rail, where community/external links belong
- Language and Appearance remain exclusively in the Settings dialog
- Use everywhere remains exclusively in the topbar chip
- Remove dead state (avatarMenuOpen, languageExpanded,
  appearanceExpanded), ref, outside-click effect, and module-level
  constants (APPEARANCE_THEMES, APPEARANCE_LABEL, describeModelChip)
  that were introduced solely to support the dropdown

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(web): move output-type chips above chat input as tab bar

The home hero previously had two chip rows below the chat input that
mixed two unrelated dimensions: output type (Prototype, Slide deck,
Image…) and workflow source (Create plugin, From Figma, From folder,
From template). Users had no visual cue that these were different
categories.

This change separates the two dimensions clearly:

- Output type (create group) becomes a tab bar positioned above the
  input card. Tabs share the same chip data and onPickChip dispatch,
  so plugin selection, active state, and pending state are unchanged.
  Active tab shows a colored underline; the bar border visually
  connects to the input card below.

- Workflow source (migrate group) stays as the chip row below the
  input card, now standing alone with unambiguous "how to start"
  semantics.

- Subtitle updated from "Pick a plugin below" to "Pick a type"
  to match the new placement.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): skip replace-prompt confirmation when switching output-type tabs

Output-type tab clicks (create group) are mode-selection gestures;
the user expects the prompt to update immediately without a dialog.
The confirmation is still shown for migrate-group chips (From Figma
etc.) where the replacement carries meaningful user-provided content.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): use brand logo as Home destination, drop redundant brand mentions

The entry nav rail previously rendered the brand logo and a separate
Home icon back-to-back; both invoked `onViewChange('home')`, so the
Home button was pure duplicate affordance. The hero pane also
displayed a third "Open Design" lockup that competed with the rail
logo and the rebranded title.

This collapses those affordances:

* `EntryNavRail` drops the dedicated Home `NavButton`. The brand
  logo button now carries `aria-current="page"` and an `is-active`
  visual state when the home view is showing, and its tooltip reads
  "Open Design · Home" off-home so the navigation behavior stays
  discoverable for new users.
* `entry-layout.css` adds the matching `.entry-nav-rail__logo.is-active`
  accent treatment so the "you are here" cue reads at parity with
  primary rail buttons.
* `HomeHero` removes the inline `home-hero__brand` lockup and the
  associated CSS, then retunes the title/subtitle/type-tab spacing
  so the headline group still pairs tightly with the type tabs and
  input card below.
* `entry-chrome-flows` is updated to assert the logo carries the
  active page treatment and that no `entry-nav-home` test id
  resurfaces by mistake.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): auto-grow home chat input, disable internal scroll and drag-resize

The home hero textarea was capped at a fixed `min-height: 84px` with
`resize: vertical`, so users had to either drag the bottom-right
corner to enlarge it or scroll the textarea internally to read
longer prompts. That hid context (loaded plugin templates routinely
overflow three lines) and exposed a manual grip whose state was easy
to leave in an awkward height.

This change makes the chat box grow with its content:

* `HomeHero` adds a `useLayoutEffect` that, on every prompt change,
  resets the textarea height to `auto` so the browser can measure a
  smaller content, then writes back `scrollHeight` in pixels. The
  effect uses layout phase (not effect phase) to avoid a one-frame
  flash at the previous height when a plugin loads a long example
  prompt.
* `home-hero.css` swaps `resize: vertical` for `resize: none` and
  adds `overflow: hidden`, so the manual drag grip disappears and
  the textarea never scrolls internally. `min-height: 84px` is kept
  so the empty input still reads as a chunky chat box. The outer
  page handles overflow when the prompt is genuinely very long.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): make output-type tabs read as folder-tabs attached to chat box

The previous tab bar used a small icon + label per tab and signaled
the active tab with a 2px accent underline sitting on a horizontal
divider line. That divider visually broke the relationship between
the tabs and the input card below — the active tab looked like a
free-floating header, not a flap of the chat surface, so users had
to do extra work to mentally connect "I picked Prototype" with the
prompt area immediately underneath.

This switches the tab bar to a folder-tab pattern:

* `HomeHero` drops the per-tab `Icon` element. The seven labels
  (Prototype, Live artifact, Slide deck, Image, Video, HyperFrames,
  Audio) already disambiguate at the type sizes used here, and the
  icons were primarily decorative.
* `home-hero.css` rewrites the tab styling:
  - The bar no longer paints its own baseline border; the input
    card's top edge serves as the baseline.
  - Each tab is a rounded-top container with a 1px transparent
    border and a `-1px` bottom margin, so its bottom edge overlaps
    the card's top border by exactly one pixel.
  - The active tab borrows the card's panel background and border
    color, and overrides its bottom border with the panel color
    so it visually erases the card's top border for the tab's
    width. The result reads as one continuous "Prototype is this
    chat box" surface.
  - The card's prior `margin-top: 8px` is removed so the tab bar
    bottom and card top sit at the same y coordinate, which is the
    geometric precondition for the 1px overlap to land cleanly.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): enlarge home chat attach and submit buttons for legibility

The attach (paper-clip) and submit (arrow-up) controls on the home
chat input rendered at 32px diameter with 14px and 18px glyphs
respectively. After stroke antialiasing the icons read at roughly
11–12px on typical displays — small enough that users reported the
glyphs were illegible and had to be discovered by trial-and-error.

This bumps both controls to 38px circles and grows their glyphs
(attach 14 → 18, submit 18 → 22). The primary call-to-action now
has clearly more visual weight than the surrounding muted hint
text, and the paper-clip is recognisable at a glance.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): paint a chevron on inline select slots in the home prompt

Inline select slots in the home prompt overlay (e.g. the
`high-fidelity` / `wireframe` fidelity picker on the Live artifact
template) rendered as plain pill highlights, visually identical to
free-text and read-only text slots. The slot's `appearance: none`
strips the browser's default chevron, so users had no affordance
hinting the value was switchable.

This wraps select-type slots in an `inline-flex` span and overlays
an explicit chevron-down `Icon` against the slot's trailing padding
(bumped from 18px to 22px to make room). The chevron inherits the
accent color via the wrap's `color` property so it themes correctly
in light and dark modes. Click semantics are unchanged because the
chevron is `pointer-events: none`.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): size inline number slots to their value, not the whole row

Number-type inline slots in the home prompt (e.g. the slide-count
spinner on the Slide deck template) inherited the generic slot
`min-width: 8ch` and then stretched to the browser's default
`<input type=number>` width, so a two-digit value like "10" ate
the entire remaining line and pushed the native spinner buttons
to the far right edge — far away from the value they control.

This sizes the number input to its actual content plus four
characters of trailing room for the spinner buttons (clamped to
6–14ch), and adds a `home-hero__prompt-slot-input--number`
modifier that overrides the slot `min-width` to 4ch. The
spinner now sits flush against the value and the slot reads as
a compact inline pill matching the surrounding text/select
slots.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): widen prompt line-height so slot pills do not collide vertically

Each interactive slot and mention in the home prompt paints a 2px
outline ring via `box-shadow`. At the previous `line-height: 1.55`
on a 15px font, two lines were ~23px apart while a single pill
occupied ~20px of vertical space (text box + ring on both sides),
leaving roughly 2px of clear space between rows. When the prompt
wrapped onto multiple lines — common for the Image template's
"Generate a {kind} of {subject}. Style: {style}. Aspect: {ratio}."
example — the rings from line N and line N+1 visually merged into
a single bar, making it ambiguous which pill the user was about to
click or edit.

This bumps the line-height of both the highlight overlay and the
underlying editable textarea to 1.85 (~28px per line), restoring
~8px of clear space between pill rows. The two values must stay
identical so the overlay glyphs continue to track the textarea
caret positions exactly; a brief comment in each rule documents
this coupling for future edits.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): paint dropdown chevron on select element with neutral color

The previous chevron treatment wrapped each select slot in an
inline-flex span and laid an accent-orange `<Icon>` over the
trailing padding. At the prompt's normal display size the orange
chevron blended into the orange pill ring corners, so users
perceived "a bunch of dropdown arrows" across every highlighted
slot, even though only the actual select rendered one.

Move the chevron onto the `<select>` itself as a small neutral-
gray `background-image` SVG. The grey contrasts clearly with the
pill's accent ring and the chevron lives inside the select's own
padding, so it can never visually overlap the value text and can
never appear on non-select slots.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): shrink select-slot dropdown caret so it stops reading as a check-mark

The 8×6 stroked chevron looked like a check-mark in zoomed views
because the stroke width was a large fraction of the glyph. Swap
for a small (9×5) filled triangle drawn as a `background-image`,
with `background-size` pinned so browsers can't scale it to its
intrinsic SVG box. The caret is now unambiguously a dropdown
indicator without crowding the value text.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): make read-only prompt slots visually distinct from editable pills

Multi-word context values are rendered as plain <span>s with
pointer-events: none, but they inherited the same orange pill +
ring treatment as the truly editable <input>/<select> slots.
Users couldn't tell which pills they could click into.

Strip the pill background, the ring, the radius, and the padding
from `.home-hero__prompt-slot-text` and replace them with a
subtle dashed bottom border. The orange foreground keeps the
slot family link, but read-only highlights now clearly read as
"context value spliced into the prompt" rather than as
"interactive control".

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): scale select-slot dropdown caret up to 12px wide for legibility

The 9×5 caret was too quiet at the prompt's font size to read as
a dropdown affordance. Bump to a 12×6 filled triangle and pad the
select's trailing space (24px) so the value text still has clear
breathing room from the caret.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): collapse plugins search and filter strips into one bar

The plugins-home gallery laid out search, count, mode (Featured),
total-in-catalog, clear-filters, the main category row, and the
sub-category row as four floating clusters. Search lived up in
the section header, far from the chips it actually scopes; the
mode strip wedged "Featured + 386 in catalog + Clear filters"
between the header and the category row with no obvious
ownership; and the two clear-filter affordances duplicated each
other (chip strip + empty-state).

Fold the mode strip into the category row: Featured is now the
leading chip on the same line as the category pills, and the
search field, the result count, and a compact Clear link sit as
a right-aligned tools cluster on that same row. The header
shrinks back to just the title, subtitle, and Browse registry
link. The sub-category row stays as the contextual second line
when a category is active.

Tests: keeps the existing data-testids (plugins-home-chip-featured,
plugins-home-row-category, plugins-home-clear, plugins-home-count,
plugins-home-search) so the existing 11-case section spec passes
without modification.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): hide Recent projects rail on first-run home when empty

A brand-new user landing on Home saw an empty dashed box with the
copy "No projects yet — type a prompt to start one." sitting
above the plugin gallery. The hero already prompts the user to
type, so the empty rail just adds vertical noise without telling
them anything new.

Return null from RecentProjectsStrip when the recent list is
empty (loading or not) so the rail appears only when there is
actually something to show. Update the entry-chrome e2e to
assert toHaveCount(0) on first run — codifying the new contract
that the rail is conditional, not always-mounted.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): hide structured inputs form when every field is in the prompt

The PluginInputsForm below the chat textarea surfaced every plugin
input — even the ones the prompt template already substitutes
inline as highlighted slot pills. For a plugin like Prototype
that's five identical labelled inputs duplicating the five slot
pills above them, making the chat box look like it has grown a
second composer.

Compute the set of placeholder keys actually referenced in the
prompt template (via INPUT_PLACEHOLDER_PATTERN) and skip those
when deciding what the structured form needs to render. The form
still appears for plugins that expose inputs not referenced in
the prompt text (e.g. a "Run in background" toggle), but
template-only plugins collapse the redundant second editor.

Update the picker spec accordingly: when every field is in the
template the form is now expected to be absent rather than to
render a duplicate of the inline slot.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop redundant filtered-count and Clear link beside the search

The combined plugins filter bar trailed the search field with
"59 / 386" and an inline "Clear" link, but both repeat what the
chip strip already shows: every chip carries its own count
(Slides 59, All 386, …) and clicking the All chip already resets
the filters. Users had no way to know what the bare "59 / 386"
fraction or "Clear" referred to without inference.

Strip the trailing tools cluster down to just the search input.
The empty-results message at the bottom of the gallery still
exposes a contextual "Clear filters" button when a stacked
filter yields no matches, so the affordance isn't lost — just
removed from a position where it didn't read as actionable.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): drop default subtitle copy from the Official starters strip

The Home gallery rendered a two-line explanatory subtitle under
"Official starters" — "Ready-to-use Open Design workflows bundled
with this runtime. Pick one to load a starter prompt, or browse
the registry for more." — every visit. Returning users skip it,
new users get the same message from the section title plus the
Browse registry link plus the visual card grid itself; the prose
read as filler chrome above the chip strip.

Default the subtitle prop to undefined and only render the <p>
when a caller passes an explicit string. Other surfaces that
mount PluginsHomeSection with their own copy keep their
subtitle; the bare Home gallery loses one row of vertical noise.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fade out empty workflow lanes in Plugins home filter

Empty top-level lanes (Deploy 0, Refine 0, etc.) and empty
sub-categories (Vercel 0 under Deploy, Figma 0 under Import, etc.)
used to look identical to populated ones, so users couldn't tell at
a glance which chips were real catalog buckets and which were
"contribute a plugin" invites. The strip kept all lanes visible by
design — the workflow shape (Import / Create / Export / Share /
Deploy / Refine / Extend) is part of what we want users to see —
so we keep them clickable, but tag count-zero chips with
data-empty='true' and give them a faded, dashed-border treatment.
"All" pills stay solid since their count reflects the parent lane,
not their own emptiness.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): update home nav test expectations

* fix(e2e): align critical smoke with entry chrome

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 14:58:15 +08:00
Eli
18b947c25f
[codex] Land design system GitHub intake handoff (#2187)
* Add Claude-style design system workflow

* Merge design system workflow into main

* Restore design system workflow UI styles

* Fix design system setup scrolling

* Fix design system setup connector button

* Preserve connector auth link after popup block

* Simplify connected GitHub setup state

* Open generated design system workspace project

* Summarize design system auto prompt in chat

* Add bounded GitHub connector design intake

* Prefer path-scoped GitHub intake tools

* Restore branch GitHub design context intake

* Restore design system review workspace

* Restore design system manager tab

* Let design system workflow routes own details

* Open editable design systems as projects

* Restore design system workspace coverage

* Fix bounded GitHub connector intake

* Hide design system review while generating

* Suppress design system generation questions

* Constrain GitHub design intake to bounded command

* Tolerate oversized GitHub metadata during intake

* Rebuild daemon CLI when sources change

* Fallback when GitHub connector snapshots are rate limited

* Allow GitHub intake without Composio

* Use native GitHub auth for design intake

* Remove design system review group heading

* Improve design system extraction evidence

* Align design system scaffold with Claude output

* Add evidence inventory for design system intake

* Add local design system evidence intake

* Add design system package audit gate

* Allow auditing Claude Design reference packages

* Audit design system package content quality

* Migrate legacy design system artifacts

* Clean migrated design system artifacts

* Require modular design system UI kits

* Reject thin design system UI kits

* Prioritize core design evidence intake

* Require role-based design system UI kits

* Clean stale design system manifest references

* Require representative preserved design assets

* Warn on generic design system visuals

* Enforce design system quality warnings

* Audit connected design system UI kits

* Require mounted design system UI kits

* Require composed design system app shells

* Require runnable JSX design system kits

* Require browser globals for design system components

* Infer design system names from source URLs

* Require source examples in design system packages

* Bind preserved fonts in design system tokens

* Require skill frontmatter in design system packages

* Preserve build icons in design system packages

* Require real assets in brand previews

* Require substantive source examples

* Require product overview in design system README

* Require reusable UI kit README

* Require reusable design system skill docs

* Seed Claude-style UI kit entry contract

* Preserve runtime build assets in design packages

* Audit design system packages after generation

* Audit design system first-run output

* Audit source-backed preview cards

* Align design system UI kit scaffolds

* Materialize design evidence package artifacts

* Show project chat during design system setup

* Hand off design system setup to project chat

* Auto-repair design system audit failures

* Harden design system evidence preservation

* Tighten design system package guidance

* Add targeted design system repair guidance

* Bound design system audit auto repair

* Use connector statuses in design system setup

* Audit design system preview manifests

* Require README preview manifests for design systems

* Fix design system GitHub intake handoff

* Fix daemon prompt CI assertions
2026-05-19 14:30:17 +08:00
PerishFire
bd48c597b0
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions

* ci: enforce pinned dependency specs

* ci: fix pnpm executable invocation
2026-05-19 13:58:27 +08:00
kami
68b0e0258d
fix(web): scope daemon transcript to active agent (#2157) 2026-05-19 13:49:34 +08:00
kami
b838e94b88
fix(web): expand skill mention picker (#2170) 2026-05-19 13:49:15 +08:00
shangxinyu1
f650a043d9
test(e2e): align entry coverage with redesigned flows (#2101)
* Migrate entry E2E coverage and split CI

* test(e2e): relax connectors auth error assertions

* ci: route scenario registry changes to extended e2e

* ci: decouple packaged smoke jobs from validate gate

* ci: restore pre-split workflow

* test(e2e): slim critical ui smoke coverage

* test(e2e): move broader entry flows out of critical

* test(e2e): restore entry chrome coverage to ci

* ci: parallelize workspace validation jobs

* test(web): stabilize media palette bridge assertion

* ci: cache e2e playwright browsers
2026-05-19 11:26:40 +08:00
Caprika
34f66113a0
Implement Home audio essential workflow (#2104)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 1s
ci / Validate workspace (push) Has been skipped
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-deploy / Deploy landing page (push) Has been skipped
nix-check / build (push) Failing after 1s
ci / Packaged linux headless smoke (push) Has been skipped
* fix(web): align Home prompt overlay with textarea so caret lands on click

Picking a chip such as Slide deck or Image loaded a default prompt
into the Home textarea and rendered an overlay with `{{key}}`
placeholders as interactive `<input>` / `<select>` controls. The
overlay controls and the underlying textarea text were laid out
independently:

- Inputs declared `min-width: 8ch` and `Math.max(displayValue.length
  + 1, 10)ch` of width.
- Selects added 18px of right-padding for the dropdown arrow.
- The textarea kept the raw substituted string in a proportional
  font.

The two layouts no longer matched column-for-column, so every slot
shifted the textarea text to the left of where it appeared in the
overlay, compounding across the line. Clicking on visible prose to
position the caret hit a different character offset in the textarea
and subsequent typing or deletion landed in the wrong place.

For example, the Slide deck template

  Create a {{slideCount}}-slide {{deckType}} for {{audience}} about
  {{topic}}. ...

renders with slideCount=10 (~2 ch in the textarea) under a slot
input forced to 10 ch in the overlay — clicking right after the
literal `-slide` placed the caret several characters into `pitch
deck`. The Image / Video / Audio chips with their pre-filled
subject, style, aspect values reproduced the same drift.

Render the inline pills as read-only `<span>`s carrying the exact
substring the textarea shows at that position, mark them
`aria-hidden` so the textarea remains the single labelled control,
and surface every plugin input field — including the ones referenced
inline — in `PluginInputsForm` underneath. Editing flows through the
form, the parent's `updateActiveInputs` already re-renders the
prompt, and the pills stay aligned with the textarea on every
keystroke.

Also drop the now-unused inline helpers (updatePluginInput,
getTemplateInputNames, shouldRenderSlotAsText, inlineFieldType,
fileInputLabel, fileMetadata) and the dead
`.home-hero__prompt-slot-control/input/select/toggle/file/text` CSS
rules.

Verified:
- pnpm --filter @open-design/web typecheck
- vitest run on HomeHero.plugin-picker, HomeHero.rail, and
  HomeView.prefill (29/29 pass; tests updated to reflect the new
  read-only span + always-on form contract)
- Manual click-to-edit on Slide deck and Image chips in the
  pnpm tools-dev web runtime — caret now lands where the user
  clicked.

* Implement home audio essential workflow

* Fix Home media composer review issues

* Guard stale Home media apply results

---------

Co-authored-by: hahaplus <zmjdll@gmail.com>
2026-05-18 23:14:03 +08:00
kami
1523defad1
feat(web): add plugin registry detail drawer (#2087)
Co-authored-by: multica-agent <github@multica.ai>
2026-05-18 22:12:16 +08:00
kami
8a629eb999
fix: discover codex models from cli (#2082) 2026-05-18 22:11:51 +08:00
kami
ec53959601
feat(web): unify plugin trust badges (#2088)
Normalizes plugin trust badge rendering across Home, plugin details, registry entries, sources, and plugin share/install surfaces while mapping bundled and official sources to the user-facing Official tier.

Validation: CI and nix-check were green on PR head 1a68f20571.
2026-05-18 19:20:01 +08:00
kami
ac748d53fc
Fix pointer HTML artifact previews (#2075)
Resolves short pointer-only HTML artifacts to the existing project HTML target before persisting preview artifacts, with helper and ProjectView regression coverage for #536. Validation: CI and nix-check were green on PR head 8c7ee5f38d.
2026-05-18 18:28:28 +08:00
kami
7b3b7c3b74
Fix finalize provider routing for Gemini BYOK (#1964)
Routes Finish Design/finalize requests through the selected BYOK provider, including Gemini, while preserving the Anthropic fallback. Validation: CI and nix-check were green on PR head 6c334e08d1.
2026-05-18 18:03:44 +08:00
kami
0101a09b10
fix(mcp): support no-auth local HTTP servers (#2008) 2026-05-18 17:08:46 +08:00
Yuhao Chen
25708fbe0f
fix(web): keep connector errors out of cards (#1740)
* fix(web): keep connector errors out of cards

* fix(web): route panel-alert open-details through openConnectorDetails

The new "open details" action on the panel-alert row called
`setDetailConnectorId(alert.connectorId)` directly, bypassing the
`toolPreviewFailedIds` clear that `openConnectorDetails()` does on
every other entry point.

Concrete regression: user opens a connector's drawer from the card,
the tool-preview hydration fetch fails so `toolPreviewFailedIds[id]`
gets stamped with the current `toolPreviewRetryToken`, user closes
the drawer, and later the same connector throws a connect error and
shows up as a panel alert. Clicking the alert's open-details button
re-sets `detailConnectorId` but never clears the failed-fetch token,
so the hydration effect at the previous bail-out check
(`toolPreviewFailedIds[detailConnector.id] === toolPreviewRetryToken`)
returns immediately and the drawer never retries loading tools.

Routing the click through `openConnectorDetails(alert.connectorId)`
restores the same clear-before-set behavior the card and search
entry points already had, so the reopened drawer always gets a fresh
hydration attempt.

* test(web): cover connector panel alert tool retry

* fix(web): clarify connector panel alert status text

* fix(web): clear stale connector cancel alerts

---------

Co-authored-by: lefarcen <935902669@qq.com>
2026-05-18 17:06:46 +08:00
Ethan Guo
bd1b41a9d4
feat(web): highlight captured Pod component on chip hover (#1982)
* feat(web): highlight captured Pod component on chip hover

Wires onPointerEnter/Leave on chip + onFocus/Blur on its × so the matching member overlay gets an outline-focused emphasis on canvas.

* ci: trigger re-run for flaky palette dark-mode test

* fix(web): skip non-mouse pointer events on pod chip hover

Gate onPointerEnter/Leave to pointerType === 'mouse' so a touch tap on a chip no longer flickers the captured-member highlight; also add the CommentTargetOverlay class-wiring test, drop the redundant alpha on the hover outline, and document the non-member fallback path.
2026-05-18 16:34:19 +08:00
fyz3120
92d91101ae
Fix inspect ancestor selection notice (#2039)
Co-authored-by: fuyizheng3120 <182291973+fuyizheng3120@users.noreply.github.com>
2026-05-18 16:33:42 +08:00
Jeongjin Shin
0026dfa344
fix: support CODEX_API_KEY for Codex CLI (#1880)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 1s
ci / Validate workspace (push) Has been skipped
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-deploy / Deploy landing page (push) Has been skipped
nix-check / build (push) Failing after 3s
ci / Packaged linux headless smoke (push) Has been skipped
* fix(daemon): allow Codex API key env

* fix(web): expose Codex API key setting

* fix(web): clarify Codex key labels
2026-05-18 14:29:23 +08:00
Yuhao Chen
156191d565
fix(web): show unavailable state for missing examples (#1983) 2026-05-18 14:02:22 +08:00
Ghxst
332e982a07
fix(web): stabilize HTML preview URL on tab switch (#1959)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 1s
ci / Validate workspace (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Packaged linux headless smoke (push) Has been skipped
Co-authored-by: GHX5T-SOL <200635707+GHX5T-SOL@users.noreply.github.com>
2026-05-17 23:25:14 +08:00
kami
91be2696dd
Persist routine failure reasons (#1963)
Co-authored-by: multica-agent <github@multica.ai>
2026-05-17 23:22:00 +08:00
hahaplus
89bf313782
fix(web): align Home prompt overlay with textarea so caret lands on click (#1958)
Picking a chip such as Slide deck or Image loaded a default prompt
into the Home textarea and rendered an overlay with `{{key}}`
placeholders as interactive `<input>` / `<select>` controls. The
overlay controls and the underlying textarea text were laid out
independently:

- Inputs declared `min-width: 8ch` and `Math.max(displayValue.length
  + 1, 10)ch` of width.
- Selects added 18px of right-padding for the dropdown arrow.
- The textarea kept the raw substituted string in a proportional
  font.

The two layouts no longer matched column-for-column, so every slot
shifted the textarea text to the left of where it appeared in the
overlay, compounding across the line. Clicking on visible prose to
position the caret hit a different character offset in the textarea
and subsequent typing or deletion landed in the wrong place.

For example, the Slide deck template

  Create a {{slideCount}}-slide {{deckType}} for {{audience}} about
  {{topic}}. ...

renders with slideCount=10 (~2 ch in the textarea) under a slot
input forced to 10 ch in the overlay — clicking right after the
literal `-slide` placed the caret several characters into `pitch
deck`. The Image / Video / Audio chips with their pre-filled
subject, style, aspect values reproduced the same drift.

Render the inline pills as read-only `<span>`s carrying the exact
substring the textarea shows at that position, mark them
`aria-hidden` so the textarea remains the single labelled control,
and surface every plugin input field — including the ones referenced
inline — in `PluginInputsForm` underneath. Editing flows through the
form, the parent's `updateActiveInputs` already re-renders the
prompt, and the pills stay aligned with the textarea on every
keystroke.

Also drop the now-unused inline helpers (updatePluginInput,
getTemplateInputNames, shouldRenderSlotAsText, inlineFieldType,
fileInputLabel, fileMetadata) and the dead
`.home-hero__prompt-slot-control/input/select/toggle/file/text` CSS
rules.

Verified:
- pnpm --filter @open-design/web typecheck
- vitest run on HomeHero.plugin-picker, HomeHero.rail, and
  HomeView.prefill (29/29 pass; tests updated to reflect the new
  read-only span + always-on form contract)
- Manual click-to-edit on Slide deck and Image chips in the
  pnpm tools-dev web runtime — caret now lands where the user
  clicked.
2026-05-17 23:18:28 +08:00
Ethan Guo
0d2c87242f
fix(web): snapshot pending before reload so expired-auth cancel actually fires (#1972)
reloadConnectorStatuses commits its prune setState (and runs the connectorAuthorizationPendingRef mirror effect) between the two awaits in onFocus, so cancelStaleAuthorizations was reading a post-prune ref and never identified anyone as stuck. Capture the snapshot inside onFocus before reloadConnectorStatuses runs, and pass it explicitly so the expired-auth daemon cancel actually fires.

Refs #1909 #1354
2026-05-17 23:13:53 +08:00
kami
d36ceacf3a
feat(web): surface saved Project instructions for review and retrieval (#1933)
Saved Project instructions had no post-save read-back surface. The
header control was a bare pencil icon that opened an editor and closed
on Save, so after returning to the workspace a user could not confirm
what was stored, revisit it, or tell whether it was still active
(#1822).

Make the saved state discoverable: once instructions exist, the header
affordance becomes a persistent "Project instructions" chip. Opening it
shows a read-only review panel with a preview of the saved text and an
"Active - included in every message" status, plus a reopen-to-edit
action. Saving lands back on the review panel so the stored value is
read back immediately. The empty state keeps the pencil and opens the
editor directly.

Persistence and project-level system-prompt injection are unchanged.

Closes #1822

Co-authored-by: multica-agent <github@multica.ai>
2026-05-17 23:07:25 +08:00
kami
647433ccef
Fix Inspect preview transport flash (#1967)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 2s
ci / Validate workspace (push) Has been skipped
nix-check / build (push) Failing after 1s
ci / Packaged linux headless smoke (push) Has been skipped
* Fix inspect preview transport flash

Co-authored-by: multica-agent <github@multica.ai>

* Keep inactive srcdoc transport lazy

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-17 22:30:37 +08:00
kami
30ad8b8ac3
improve privacy consent modal: policy link, clearer CTAs, mobile layout (#1921)
The first-run consent banner had no link to a privacy policy, an
affirmative button ("Help improve") that didn't read as a consent
choice, and a fixed bottom-right card that crowded content on phones.

- Add a "Read the privacy policy" link (external-link icon, accent,
  underlined) above the actions, plus a root PRIVACY.md it points to
  documenting the telemetry behaviour the modal discloses.
- Rename the CTAs to "Share usage data" / "Don't share" so both name
  the action; they stay equal-prominence per the EDPB/GDPR comment.
- Stretch the banner to a bottom-edge bar under 540px with a
  safe-area inset so it clears mobile browser chrome.
- Add PrivacyConsentModal tests; sync the new i18n key to every
  locale and update the consent-label assertion in App.connectors.

Refs #1756
2026-05-17 20:24:15 +08:00
Ethan Guo
9cc1fb28f3
fix(web): apply Tweaks palette shift to :root CSS custom properties (#1952)
Walk document.styleSheets so chromatic custom properties on :root/html/body/:host (including @media-nested rules) are hue-shifted via inline setProperty, not just the values returned by getComputedStyle.

Fixes #1393
References: PR #1643 (open, in review) binds the Tweaks toolbar toggle to artifact `.tw-panel` panels via a new `injectTweaksBridge` in `srcdoc.ts` — adjacent surface but different problem. Our fix extends `injectPaletteBridge` to also shift `:root` CSS custom-property colors, which #1643 does not touch.
2026-05-17 20:16:57 +08:00
Ethan Guo
b7fa1d9286
fix(web): recover stuck Composio connector when auth is cancelled (#1909)
* fix(web): recover stuck Composio connector when auth is cancelled

When a user closes the Composio system-browser auth window without completing the flow, the connector card stayed in its loading state forever because nothing told the daemon the request was abandoned.

The window-focus refresh now awaits the status poll and, for every connector still in connectorAuthorizationPending whose freshly fetched status is not 'connected', issues cancelConnectorAuthorizationRequest and clears the matching local pending/error UI state. The connected guard is read from the just-returned statuses so a postMessage success that lands just before focus is not racy.

Fixes #1354

* fix(web): gate Composio focus auto-cancel on TTL + cancel response

Require expiresAt to have elapsed before auto-cancelling a pending
Composio auth on focus, and preserve the spinner on a failed cancel
(matching the manual Cancel handler) so the UI never disagrees with
the daemon.

Refs https://github.com/nexu-io/open-design/pull/1909#discussion_r3254152045
Refs https://github.com/nexu-io/open-design/pull/1909#discussion_r3254152046
2026-05-17 20:15:38 +08:00
Ethan Guo
324eca27ea
feat(web): add manual removal for captured Pod components (#1951)
* feat(web): add manual removal for captured Pod components

Adds `removePodMember` helper and a per-chip × in `BoardComposerPopover`; leaves `comments.ts` untouched (avoid-zone from #1127).

Closes #802
Contract: runs/2026-05-16T08-08-52_open-design_issue-feat/contract.md

* style(web): hide Pod chip × until chip hover

Swaps the unicode × for the existing `Icon name="close"` SVG so the
hit target stays centered, and fades the button in only on chip
hover / keyboard focus for a quieter resting state.

* fix(web): auto-close Pod composer when last chip is removed

Removing the last chip leaves a stale anchor; close so Send cannot attach to elements no longer visible.

* refactor(web): extract BoardComposerPopover and Pod-member reducer

Moves the popover to its own module and lifts the chip-removal reducer into a pure `applyPodMemberRemoval` so unit tests exercise the real code path and the popover's export is no longer test-only.

* fix(web): rebuild Pod anchor when a member is removed

Without this, the popover keeps the original union bbox / selector / label after each chip removal, so a subsequent Send to chat anchors the comment to elements no longer in the Pod.

* fix(web): render every captured chip and scroll on overflow

The previous slice(0, 6) cap left chips beyond the sixth invisible and undeletable. Render the full list inside a 132px-tall scrollable strip.
2026-05-17 20:13:56 +08:00
Ethan Guo
10c62bf6e4
fix(web): warn before editing a built-in skill creates a shadow (#1850)
Clicking Edit on a built-in skill silently issued PUT /api/skills/:id,
which the daemon stores as a user-owned shadow and then hides the
built-in entry from the Settings list. From the user's perspective the
row they just edited disappears with no explanation.

Arm an inline override-warning banner on Edit for source==='built-in'
skills, mirroring the existing inline delete-confirm pattern. The edit
form only opens after the user explicitly confirms. User skills bypass
the warning and edit directly. No daemon or listSkills contract change.

Fixes #1378
2026-05-17 18:30:45 +08:00
Ethan Guo
a6288cec3f
fix(web): allow editing existing routines from RoutinesSection (#1884)
The Routines list only exposed Run now / Pause / History / Delete, so any
change to a routine required deleting and recreating it. The daemon already
serves PATCH /api/routines/:id; only the web entry point was missing.

Add an Edit button on each routine row that pre-populates the create form
from the stored schedule and target via a new formFromRoutine helper, and
branch the submit handler on editingId so it PATCHes /api/routines/:id
with the updated fields instead of POSTing a new routine.

Fixes #1373
2026-05-17 15:02:54 +08:00
张东明
bac56415a2
fix(web): surface daemon error messages for invalid folder imports (#1923)
* fix(web): surface daemon error messages for invalid folder imports

importFolderProject() was swallowing non-2xx responses by returning
null, so the UI could only show a generic "Open folder failed: <path>"
message even though the daemon already returns specific errors like
"cannot import the filesystem root" or "folder not found".

Parse the daemon error body and throw so the panel displays the actual
reason. Also show feedback for empty path input instead of silently
returning.

Fixes #1186

* test(web): update folder import test to match new error propagation

The existing test expected a generic "Open folder failed: <path>"
message from a boolean return. Update to match the new behavior where
the daemon's error message is thrown and displayed directly.
2026-05-17 15:00:49 +08:00
kami
e64f1d8497
fix(desktop): export PDF saves to a file instead of the OS print dialog (#1920)
* fix(desktop): export PDF saves to a file instead of the OS print dialog

The `od:print-pdf` IPC handler called `webContents.print()`, which
opens the printer-first macOS system print dialog. That handler is the
destination of the renderer's `window.__odDesktop.printPdf()` bridge,
so on desktop "Export PDF" felt like a print flow rather than a file
export — from both the PreviewModal share menu (which always uses this
path) and the FileViewer share menu (whose daemon-route fallback lands
here too).

Route the handler through a direct Save-as-PDF flow instead: a native
Save dialog, then `webContents.printToPDF()` straight to the chosen
file — the same shape `exportPdfFromHtml` already uses for the
daemon-backed export path. The flow is extracted into
`savePrintReadyDocumentAsPdf` behind a structural `PrintReadyPdfTarget`
surface that has no `print()` method, so the regression cannot be
reintroduced.

Fixes #1774

Co-authored-by: multica-agent <github@multica.ai>

* fix(desktop): preserve PDF page sizing in print bridge

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-17 11:26:08 +08:00
Yuhao Chen
2f553941d3
fix(runtime): auto-annotate imported HTML elements for Tweaks selection (#892) (#1169)
* fix(runtime): auto-annotate imported HTML elements for Tweaks selection (#892)

* fix(runtime): always annotate missing od-ids and broaden selector (#892)

- Remove the conditional gate so annotateMissingOdIds runs for every
  srcdoc, not just comment/inspect bridges. This fixes the persistence
  regression where saved Inspect tweaks reference synthesized data-od-id
  selectors that vanish when the bridge is rebuilt without annotations.

- Expand the selector to cover div-based imported HTML (div[class],
  div[id]), headings, buttons, and links so Picker/Pods work on
  common anonymous wrapper markup.

Addresses review feedback from lefarcen and mrcfps.

* fix(runtime): narrow div selector, skip iframe/object/embed, add tests (#892)

- Replace broad div[class] with direct-child combinators only under
  semantic containers, body, and [id] elements to avoid layout noise
- Add iframe, object, embed to the skip list alongside script/style
- Add tests for: direct-child divs, nested div skip, always-on behavior,
  skip list with id attributes, div children of [id] elements
- Format selector as array join and skip tags as Set for readability

* fix(runtime): let annotateManualEditSourcePaths coexist with data-od-id (#892)

annotateMissingOdIds now runs unconditionally, so elements like main/h1
get data-od-id before annotateManualEditSourcePaths runs. The old skip
condition (has data-od-id) incorrectly prevented source-path marking on
those elements. Change the guard to skip only when the element already
has data-od-source-path, allowing both attributes to coexist.
2026-05-16 16:17:36 +08:00
Yuhao Chen
26502ea124
test(web): decouple memory preview icon assertion (#1863) 2026-05-16 11:37:34 +08:00
mehmet turac
6b15236843
fix(web): distinguish expanded memory preview action (#1813) 2026-05-15 23:08:30 +08:00
Yuhao Chen
19d65678a3
fix(web): keep filter pill hover labels readable (#1828) 2026-05-15 23:07:43 +08:00
mehmet turac
9689dce1ad
fix(web): align comment marker numbering (#1826) 2026-05-15 23:07:32 +08:00
mehmet turac
444f7d03eb
fix(web): reveal memory editor after edit click (#1827) 2026-05-15 23:07:18 +08:00
mehmet turac
7d9488300e
fix(web): keep draw overlay scrollable (#1848) 2026-05-15 23:05:49 +08:00
mehmet turac
c4a8e92916
fix(web): add breathing room to plugin publish footer (#1849) 2026-05-15 23:02:36 +08:00
lefarcen
22a3b99a47 Merge origin/main into preview/v0.8.0
Sync 49 commits from main. Conflicts resolved:
- .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's
  linux specs + release-stable.yml + release-preview.yml triggers
- .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder
- apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops
  summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText
- apps/web/src/components/ChatPane.tsx: kept both new imports
- apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks
- e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's
  inline dialog navigation (UI was redesigned in v0.8.0)
- nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh
2026-05-15 18:23:33 +08:00
mehmet turac
15ebe8b266
fix(web): keep picker hint clear of comments panel (#1820) 2026-05-15 17:51:12 +08:00
mehmet turac
fd80d7c997
fix(web): clear draw ink when exiting mode (#1821) 2026-05-15 17:48:28 +08:00
Nicholas-Xiong
9cfb01e6b0
feat: add Italian (it) locale support (#1323)
* feat: add Italian (it) locale support

- Add complete Italian translation dictionary (apps/web/src/i18n/locales/it.ts)
- Register 'it' locale in types.ts (Locale type, LOCALES array, LOCALE_LABEL)
- Import and register Italian dictionary in index.tsx
- All 1352 translation keys translated to Italian
- Follows the same structure as French locale (PR #326)

Closes #1245

* test: Update locale tests to include Italian (it)

1. Add 'it' to EXPECTED_LOCALES array
2. Add LOCALE_LABEL.it assertion to verify 'Italiano' label

This fixes the CI test failure and completes the Italian locale integration.

---------

Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 16:38:55 +08:00
leessju
4e19c3f4f3
Prevent imported Claude canvases from zooming on scroll (#1726)
* Preserve HTML preview state across mode toggles

HTML previews could rebuild their iframe when switching into Edit or Comment, which reset scroll/canvas state and caused visible churn for multi-file artifacts. The viewer now keeps URL-loaded previews mounted when the artifact owns the mode bridge, relays file-refreshes through frame navigation, and restores preview scroll/viewport state across bridge mode changes.

Constraint: Generic srcdoc-only bridges are still required for unbridged artifacts, inspect mode, palette tweaks, decks, draw overlays, and forceInline.

Rejected: Keep all Edit/Comment previews on srcdoc | causes unnecessary iframe replacement for bridge-capable URL-loaded artifacts.

Confidence: high

Scope-risk: moderate

Directive: Do not enable URL-load for bridge-dependent modes unless the artifact has an owned postMessage bridge.

Tested: pnpm guard

Tested: pnpm --filter @open-design/web typecheck

Tested: pnpm --filter @open-design/web test

Tested: Playwright verified Edit and Comment toggles preserve iframe src and DOM node while receiving comment targets.

* Prevent preview wheel gestures from escaping into zoom

Trackpad pinch-like wheel events arrive with ctrl/meta modifiers on some platforms, which can make a normal vertical scroll feel like the preview zoomed. The preview now consumes those modified wheel events inside the host preview shell and in injected srcdoc previews, then maps the delta back to scroll where a scroll target exists.

Constraint: URL-loaded sandbox iframes cannot always be inspected by the host, so srcdoc previews need their own in-frame guard.

Rejected: Add allow-same-origin to preview iframes | weakens the sandbox boundary for generated artifacts.

Confidence: medium

Scope-risk: narrow

Directive: Do not broaden iframe sandbox permissions to fix gesture handling without a security review.

Tested: pnpm guard

Tested: pnpm --filter @open-design/web typecheck

Tested: pnpm --filter @open-design/web exec vitest run tests/components/FileViewer.test.tsx tests/runtime/srcdoc.test.ts

Tested: playwright-cli verified ctrl-wheel in preview keeps app zoom at 100% and prevents default in the iframe context

* Revert "Prevent preview wheel gestures from escaping into zoom"

This reverts commit 976407ab4c.

* Prevent imported Claude canvases from zooming on scroll

Claude Design exports can classify ordinary macOS two-finger vertical wheel events as mouse-wheel zoom clicks inside design-canvas.jsx. Normalize that imported canvas code so plain wheel input pans, while Cmd+wheel remains the explicit zoom gesture.

Constraint: The offending canvas code lives inside imported user artifacts rather than a tracked runtime component, so the fix belongs in the Claude Design zip import normalization path.\nRejected: Host-side wheel interception | wheel events inside the sandboxed iframe are handled by the artifact before the host can reliably classify them.\nRejected: Disable all wheel zoom | users still need Cmd+wheel as an explicit zoom control.\nConfidence: high\nScope-risk: narrow\nDirective: Keep plain wheel as pan-only for imported design-canvas.jsx unless a future bridge provides an explicit wheel-mode handshake.\nTested: pnpm --filter @open-design/daemon exec vitest run tests/claude-design-import.test.ts\nTested: pnpm --filter @open-design/daemon typecheck\nTested: pnpm guard

---------

Co-authored-by: nicejames <nicejames@gmail.com>
Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 16:37:57 +08:00
Nicholas-Xiong
d16acf6462
fix: Add error feedback for manual folder path import (#1666)
* fix: Add error feedback for manual folder path import

Fixes #1408

When users manually enter a folder path and click 'Open folder' (non-Electron
environment), the app now provides clear error feedback if the import fails.

**Before:**
- No error clearing before import
- No error handling for failed imports
- Silent failures left users confused

**After:**
- Clears previous errors before attempting import
- Catches and displays import errors with clear messages
- Success feedback is implicit (navigation to the opened project)

**Why implicit success feedback:**
The parent handler (Home.tsx) navigates to the newly opened project on success,
which provides clear visual feedback by changing the entire view. An additional
toast would be redundant.

**Error handling:**
- Catches all errors from onImportFolder
- Displays user-friendly error messages
- Preserves error details when available

* fix: surface failed folder imports

---------

Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-05-15 16:36:24 +08:00
Zihan Zhao
cfcfbe0178
Inline attached file context for BYOK chats (#1730)
BYOK/API-mode chats bypass the daemon run path, so attached project
files were saved as message metadata but their readable contents were
not sent to the provider. This adds a web-side attachment context step
for API-mode requests, reusing raw text reads and existing document
preview extraction.

Constraint: Docker PDF previews require pdftotext in the runtime image
Confidence: high
Scope-risk: moderate
Tested: corepack pnpm --filter @open-design/web test -- tests/api-attachment-context.test.ts tests/components/ProjectView.api-empty-response.test.tsx
Tested: corepack pnpm --filter @open-design/web typecheck
Tested: corepack pnpm --filter @open-design/web build
Tested: corepack pnpm guard
Tested: corepack pnpm typecheck
2026-05-15 15:52:15 +08:00
Yuhao Chen
b2d2635360
fix(web): hide resolved comments from preview overlays (#1762) 2026-05-15 15:46:03 +08:00
Quang Do
88db51521d
feat(web): add custom select primitive (#1714)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 2s
ci / Validate workspace (push) Has been skipped
nix-check / build (push) Failing after 1s
* feat(web): add custom select primitive

* fix(web): harden custom select active option state
2026-05-15 14:43:18 +08:00
Tom Huang
c5d77a03bd
Garnet hemisphere (#1769)
Some checks failed
nix-check / build (push) Failing after 2s
* feat(chat-composer): enhance mention handling and input overlay

- Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type.
- Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction.
- Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management.
- Refactored related components to ensure consistent handling of project files and mentions across the application.

This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins.

* feat(plugin-management): enhance plugin action panels and UI components

- Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins.
- Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin.
- Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience.
- Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management.

This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins.

* fix(assistant-message): refine plugin folder candidate selection logic

- Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content.
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates.
- Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection.
- Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text.

These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates.

* feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management

- Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute.
- Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins.
- Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders.
- Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components.

This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience.

* Fix PR 1702 CI blockers

* Fix PR 1702 remaining CI checks

* Prebuild AGUI adapter after install

* Restore plugin project snapshot wiring

* feat(marketplace): refactor marketplace URL handling and enhance fetching logic

- Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations.
- Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data.
- Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management.

This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities.

* Fix project auto-send cleanup spec

* Reconcile run messages on cancel

* Use active design system as visual direction

* Fix active design system prompt wording

* feat(workspace-tabs): implement workspace tabs functionality and file attachment handling

- Introduced a new `WorkspaceTabsBar` component to manage workspace tabs, allowing users to navigate between different views (projects, marketplace, etc.).
- Enhanced file handling capabilities in the `HomeHero` and `EntryShell` components, enabling users to stage and attach files before project creation.
- Updated the `App` component to support auto-sending attachments alongside the first message in a project.
- Improved CSS styles for workspace tabs and attachment UI, ensuring a cohesive design and user experience.

This update significantly enhances the workspace navigation and file management features, providing users with a more intuitive and efficient workflow.

* refactor(workspace-tabs): streamline workspace tabs and UI components

- Removed unused components and actions from the `WorkspaceTabsBar` and `AppChromeHeader`, simplifying the codebase.
- Updated CSS styles for the workspace shell and tabs, enhancing visual consistency and reducing element sizes for a cleaner layout.
- Introduced a new client type detection mechanism to dynamically adjust the workspace shell's class, improving responsiveness.
- Added tests for the `WorkspaceTabsBar` to ensure proper navigation and tab management functionality.

These changes improve the overall performance and user experience of the workspace navigation system.

* Update critical e2e for entry modal flow

* Stabilize entry critical e2e flows

* fix(ui): adjust workspace tabs and header styles for improved layout

- Updated the CSS for workspace tabs and the app header, reducing element sizes and padding for a cleaner appearance.
- Introduced a new button in the `WorkspaceTabsBar` for quick access to the home tab, enhancing navigation.
- Minor adjustments to the layout and styles to ensure consistency across components.

These changes enhance the user interface and improve the overall user experience in the workspace navigation system.

* feat(workspace-tabs): implement pinned home tab functionality

- Added a new pinned home tab feature to the `WorkspaceTabsBar`, allowing the home tab to remain accessible during navigation.
- Updated tab management logic to collapse duplicate home tabs into a single pinned instance when restoring from local storage.
- Enhanced CSS styles for workspace tabs to accommodate the new pinned tab design.
- Updated tests to verify the behavior of the pinned home tab and its interaction with other tabs.

These changes improve navigation consistency and user experience within the workspace.

* refactor(workspace-tabs): enhance tab management and styling

- Updated CSS styles for workspace tabs, adjusting padding and flex properties for improved layout and consistency.
- Refactored tab creation logic to ensure unique IDs for project and marketplace tabs, enhancing navigation clarity.
- Removed deprecated functions related to pinned home tabs, streamlining the codebase.
- Improved test cases to verify independent behavior of home tabs during navigation.

These changes enhance the user experience by providing a more intuitive tab management system and a cleaner UI.

* style(workspace-tabs): update CSS for improved layout and visibility

- Adjusted CSS properties for workspace tabs, including overflow, position, and z-index to enhance layout and stacking context.
- Ensured consistent styling across tab components for better visual hierarchy.

These changes contribute to a more polished and user-friendly interface within the workspace.

* style(entry-layout): update CSS variables for improved layout consistency

- Replaced fixed width values with CSS variables for the entry rail to enhance flexibility.
- Adjusted padding and height properties for better visual alignment and spacing.
- Introduced a new background style for the entry main topbar to improve aesthetics.

These changes contribute to a more responsive and visually appealing layout in the entry view.

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
Co-authored-by: Eli <129168833+qiongyu1999@users.noreply.github.com>
2026-05-15 14:42:11 +08:00
ngoduybien
843b6fec4f
fix(web): fall back to srcDoc when HTML preview needs sandbox shim (#1306)
* fix(web): fall back to srcDoc preview when HTML needs the sandbox shim

The URL-load HTML preview iframe is sandboxed with `allow-scripts`
only — no `allow-same-origin` — so any artifact that reads
`localStorage`/`sessionStorage` at startup throws SecurityError, its
React tree unmounts, and the preview goes blank. The srcDoc path
already polyfills both via `injectSandboxShim`
(apps/web/src/runtime/srcdoc.ts) before any user script runs, but
URL-load served raw HTML untouched. Agent-emitted React prototypes
that read Web Storage at mount went blank until the user toggled
Tweaks (which forces the srcDoc path).

Detect the two reliable signals — `<script type="text/babel">`
(Babel-standalone XHR-fetches and evals sibling `.jsx` files at
runtime; those routinely read Web Storage from `useState`
initializers) and direct `localStorage` / `sessionStorage` references
in the source — and set `forceInline` automatically so those
artifacts route back through the srcDoc path. Plain static HTML keeps
the URL-load benefits (real source maps, per-asset HTTP caching,
isolated per-script failures).

No new daemon endpoint, no new contract, no sandbox loosening. Pure
content sniff in the existing render-mode helper; reuses the same
`forceInline` seam the `parseForceInline` opt-out already uses. Tests
cover the new helper across positive (Babel-standalone variants with
attribute reordering, quoting, whitespace, case) and negative (plain
script tags, module type, JSON type, substring lookalikes) cases.

* fix(web): address review feedback on sandbox-shim fallback

- Accept unquoted `<script type=text/babel>` per HTML5 attribute syntax
  (regex `["']` → `["']?`). Adds a focused test covering bare unquoted,
  unquoted with unquoted `src=`, mixed unquoted/quoted, and the negative
  case `<script type=text/babelish>` to confirm the trailing word-boundary
  still rejects look-alikes.

- Memoize `htmlNeedsSandboxShim(source)` on `source`. HtmlViewer re-renders
  on board/inspect/edit/slide state changes; the scan only changes when
  the source itself does. Cheap micro-opt, free correctness win.

- Narrow the helper docstring's scope claim and add an explicit known
  limitation: external scripts (`<script src="./app.js">`,
  `<script type="module" src="./main.js">`) that read Web Storage during
  module eval are *not* covered — the helper only sees the document, not
  the linked subresource. Workaround documented: `?forceInline=1` or
  Tweaks. Catching this case would require fetching every script
  reference before deciding load strategy, duplicating browser work; not
  worth the cost until a real report surfaces.

* fix(web): correct inline comment on `\b` boundary behavior

The comment claimed `\b` rejects `text/babel-other`, but `\b` matches
between `l` and `-` (hyphen is a non-word char), so the regex actually
does match that input. The test asserts `text/babelish` as the negative
case, which `\b` does correctly reject (`i` is a word char). Comment
now matches the regex's actual behavior, with a note that hyphenated
variants are a harmless false positive (srcDoc fallback is the safe
direction) and a pointer to the `(?=[\s>"'])` lookahead tightening if
a real case ever surfaces.

No behavior change; existing tests still pass.

* fix(web): align test comment with helper docstring on hyphenated variants

Same class of inconsistency the previous commit fixed in the helper:
the test comment claimed `type=text/babel-other` "remains a non-match",
but the assertion actually covers `type=text/babelish`, and the helper
docstring explicitly documents hyphenated variants as a safe
false-positive that does match. Comment now describes both shapes
correctly and explains why the hyphenated variant isn't asserted
(it's the documented safe direction, not a regression).

No behavior change; test count unchanged.

* chore: trigger CI
2026-05-15 14:41:23 +08:00
chaoxiaoche
bcc58af931
refactor(web): rename Execution mode and tighten settings dialog UI (#1568)
* refactor(web): rename Execution mode and tighten settings dialog UI

- Rename "Settings → Execution & model" to "Settings → Execution mode"
  across the web UI, i18n keys, docs, and e2e selectors.
- Redesign SettingsDialog: kicker + title row in the modal head, a
  flatMap-driven agent grid that renders the inline test-result row
  beside the selected card, compact unavailable cards with right-aligned
  install/docs links, and an install guide that only shows when the
  user has no working agent picked.
- Trim verbose subtitle / hint copy across chat model, CLI proxy,
  media providers, custom instructions, and memory sections.
- Add an `info` Icon variant for the redesigned settings hints.
- Update e2e selectors and docs that referenced the old menu label.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(web): polish Settings dialog — media providers, skills, MCP

Media providers
- Hide internal Stub fixture provider (settingsVisible: false)
- Split provider list into Available (integrated, editable) and Coming
  Soon (collapsed <details> drawer with name/hint/Docs link only)
- Drop right-side Integrated/Configured badges from every row; all rows
  in the main list are integrated by definition; inline grey "Saved"
  chip next to the provider name is the only status indicator now
- "Saved" badge moves inline to the right of the provider name and uses
  a neutral grey treatment (was a standalone green pill below the name)
- "Reload from daemon" button shows a 2s green "✓ Reloaded" flash on
  success instead of leaving a permanent paragraph under the header;
  errors remain sticky

Skills
- Replace three pill-row filter banks (Source, Type, Category) with a
  compact single-row toolbar: search + three inline <select> dropdowns
  side by side; active filter highlighted with a stronger border

MCP server
- Shorten section hint to one line
- Move WHAT YOUR AGENT CAN DO capabilities above the client dropdown
  (motivate before asking to act)
- Move "Build the daemon first" warning below the code block where it
  contextually explains why the command might fail, not as a top-level
  error before the user has done anything
- Downgrade "Restart your client" left-border from accent orange to
  border-strong grey — it is a next step, not a warning

External MCP
- Shorten section hint to one line

Misc CSS
- Add .sr-only utility for accessible off-screen live regions
- Add button.ghost.is-success-flash for transient success feedback
- Add .library-filter-selects / .library-filter-select for dropdown
  filter rows
- Add .media-provider-coming-soon-* for the roadmap drawer

Co-authored-by: Cursor <cursoragent@cursor.com>

* [codex] Add Cursor Agent auth diagnostics (#1538)

* Add Cursor Agent auth diagnostics

* Handle Cursor not logged in auth status

* Address Cursor auth review feedback

* Classify Cursor stdout auth failures

* test: expand Memory and Routines coverage (#1521)

* test: expand settings and packaged coverage

* test: extend memory settings coverage

* test: cover routine settings failure states

* test: cover routine operation failures

* test: fix daemon test typing on CI

* test: decouple packaged smoke from orbit bug

* test: avoid live memory LLM calls in route tests

* test: fix daemon fetch typing in CI

* fix: restore preview comment and inspect toggles

* test: align manual edit flow with current inspector UX

* test: align comment attachment flow with current preview comments UI

* fix: probe resolved Codex launch path during detection

* fix: remove duplicate board activation helper after rebase

* test: update ghost cli detection mock

* test: align FileViewer toolbar expectation

* ci: move full app tests to extended lane

* ci: run app tests by changed scope

* ci: cover shared app inputs in test scopes

* ci: avoid setup-node cache in windows packaged smoke

* test: align extended settings and manual edit flows

* refactor(web): rename Execution mode and tighten settings dialog UI

- Rename "Settings → Execution & model" to "Settings → Execution mode"
  across the web UI, i18n keys, docs, and e2e selectors.
- Redesign SettingsDialog: kicker + title row in the modal head, a
  flatMap-driven agent grid that renders the inline test-result row
  beside the selected card, compact unavailable cards with right-aligned
  install/docs links, and an install guide that only shows when the
  user has no working agent picked.
- Trim verbose subtitle / hint copy across chat model, CLI proxy,
  media providers, custom instructions, and memory sections.
- Add an `info` Icon variant for the redesigned settings hints.
- Update e2e selectors and docs that referenced the old menu label.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(web): settings dialog UX polish — layout, dedup, and interactions

- Remove duplicate section headers from all settings sections
  (Notifications, Appearance, Privacy, About, Design Systems, Skills,
  MCP server, Connectors, Media providers, Routines)
- Restructure Notifications cards: title + toggle on same row, hint below
- Restructure Skills toolbar: search + New skill button in row 1,
  filter dropdowns in row 2 with left-aligned labels
- Restructure Pet section: tabs and Wake button on same row
- MCP server: group capabilities and setup into separate cards,
  remove nested double border on client picker
- Connectors: show connect errors as toast instead of inline card text,
  position toast inside panel, hide single-provider tab
- Media providers: move Reload button to left-aligned small ghost button
- Memory: info icon shows path on hover, Path copied badge inline;
  Extraction history and MEMORY.md as standalone collapsible cards;
  group header hidden when only one type visible
- Pet grid cards: Adopt button hidden until hover, icon-only when adopted,
  description truncated to 2 lines, text fills full width via abs positioning
- Agent cards: selected state uses accent border only, no background change
- Add sun/moon icons to Appearance theme buttons (Light/Dark)
- Shorten several hint strings for clarity

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): resolve i18n review comments from PR #1568

- Update settings.title and settings.envConfigure to localized
  "Execution mode" in all 17 non-English locale files
- Add settings.memoryFlashPathCopied to all locales and use t()
  in MemorySection instead of hardcoded English "Path copied"
- Add settings.agentModelHead to all locales and use t() in
  SettingsDialog for "Model for:" agent model row header

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): update tests to match settings dialog redesign

- Add role prop to Toast (alert/status) so error toasts from
  ConnectorsBrowser are announced immediately by screen readers
- Clear connectErrorToast on successful connector retry
- Update SettingsDialog.execution tests:
  - Remove heading assertions for About and MCP server (headers
    were intentionally removed as duplicate nav labels)
  - Rewrite CLI env test to use codex-only fields (per-agent
    filtering means only selected agent's fields are shown)
  - Update Composio key hint text assertion to match shortened copy
  - Replace filter button click with select change for Type filter
  - Replace Configured/Unsupported/Integrated badge checks with
    updated assertions matching the new media provider UI
  - Replace disabled BFL row test with coming-soon section check
- Update SettingsDialog.media test: remove Fal.ai input assertions
  (non-integrated providers no longer have editable fields)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): unblock CI for #1568

Three small fixes to get Playwright back to green on the settings
dialog redesign:

1. `en.ts`: revert `settings.envConfigure` to "Configure execution mode".
   This PR collapsed both `settings.title` (header gear) and
   `settings.envConfigure` (entry-side foot pill) to the same string
   "Execution mode", so `getByRole('button', { name: 'Execution mode' })`
   resolved to two elements and tripped Playwright strict mode in the
   three Composio-flow tests (entry-configuration-flows.test.ts:174,
   228, 285). Restoring the distinct label also gives screen readers
   a clearer hint for the pill, which doubles as a status display.
   Non-English locales still alias the two keys; happy to follow up
   on those, but they don't gate the (English-only) Playwright suite.

2. entry-configuration-flows.test.ts:167 — `Connectors` heading is now
   rendered at `<h2>` in the modal-head (SettingsDialog.tsx:1545), with
   the inner `<h3>` removed by design (see comment around line 1448).
   Updated the assertion from `level: 3` to `level: 2`.

3. project-management-flows.test.ts:360 — same change for the `Pets`
   heading.

Verified locally with `pnpm --filter @open-design/web typecheck` and
`pnpm --filter @open-design/e2e typecheck`. The actual Playwright
specs need the dev server up; I didn't rerun them here, but the
locator changes are mechanical and match the new DOM.

* fix(web): use exact match for Execution mode button locator

Playwright's `getByRole({ name })` defaults to substring matching, so
`{ name: 'Execution mode' }` still resolved to both the header gear
(aria-label "Execution mode") and the entry-side foot pill (aria-label
"Configure execution mode" — substring contains "Execution mode").
Strict mode tripped in the three composio-flow tests at lines 202,
257, and 319.

Adding `exact: true` makes each call resolve to just the header gear,
which opens the same dialog the foot pill does — the test outcomes
are unchanged.

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>
Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 14:35:06 +08:00
Quang Do
a41d4f6126
fix(web): keep chat pinned during content growth (#1716) 2026-05-15 14:12:00 +08:00
Prantik Medhi
01e54700a2
fix(web): make file grouping by kind work (#1551)
* fix(web): group design files by kind

* fix(web): unblock CI for #1551

- FileViewer test (line 434): add missing `projectKind="prototype"`
  to match every other instance; this was the source of the typecheck
  failure blocking workspace validation.
- DesignFilesPanel "groups files by kind" test: assert against
  `.df-section-label` elements so the section header check is not
  ambiguous with the per-row kind cell text.
- DesignFilesPanel batch-delete test: derive the expected file names
  from the rendered row testids and use `arrayContaining` so the
  assertion no longer depends on the (now kind-default) row order.

* fix(web): satisfy strict-index typecheck in batch-delete test

`onDeleteFiles.mock.calls[0][0]` tripped `noUncheckedIndexedAccess`
("Object is possibly 'undefined'"). Drop the separate length probe and
assert the exact array instead — `selected` is a `Set`, `handleBatchDelete`
spreads it with `[...selected]`, and the test clicks rows[0]/rows[1] in
that order, so insertion order is deterministic and equals
`[firstName, secondName]`.

---------

Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 13:07:27 +08:00
Prantik Medhi
8aeedf368b
fix(web): localize accent controls in settings (#1565)
* fix(web): localize accent controls

* fix(web): localize accent default label

* fix(web): unblock CI for #1565

Add missing `projectKind="prototype"` to the FileViewer deck-render
test (line 434) so workspace typecheck stops failing on the
`Property 'projectKind' is missing` error. This mirrors every other
FileViewer render in the same file and is unrelated to the accent
localization changes in this PR — it's drift from a recent change
on main that made `projectKind` required.

---------

Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 12:09:28 +08:00
Yuhao Chen
b0963fd874
fix(web): allow downloads from preview iframes (#1732) 2026-05-15 11:55:29 +08:00
Tom Huang
76defffb93
Garnet hemisphere (#1702)
* feat(chat-composer): enhance mention handling and input overlay

- Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type.
- Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction.
- Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management.
- Refactored related components to ensure consistent handling of project files and mentions across the application.

This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins.

* feat(plugin-management): enhance plugin action panels and UI components

- Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins.
- Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin.
- Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience.
- Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management.

This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins.

* fix(assistant-message): refine plugin folder candidate selection logic

- Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content.
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates.
- Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection.
- Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text.

These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates.

* feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management

- Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute.
- Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins.
- Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders.
- Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components.

This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience.

* Fix PR 1702 CI blockers

* Fix PR 1702 remaining CI checks

* Prebuild AGUI adapter after install

* Restore plugin project snapshot wiring

* feat(marketplace): refactor marketplace URL handling and enhance fetching logic

- Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations.
- Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data.
- Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management.

This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities.

* Fix project auto-send cleanup spec
2026-05-14 21:12:50 +08:00
sakshyasinha
c4a67a7b3e
Fix Kimi CLI icon contrast in light mode (#1667)
* fix(web): improve Kimi CLI icon contrast

* fix(web): render Kimi icon via theme-aware CSS mask

Move Kimi to the MONO_ICONS set so it renders through CSS mask
with currentColor adaptation, making it legible in both light and
dark themes instead of baking a single dark fill that fails on
dark backgrounds.

* fix(web): adjust Kimi icon secondary mark for dual-theme contrast

Keep Kimi as a baked two-tone asset: blue accent (#1783ff) for brand
identity, mid-tone gray (#666666) secondary mark for acceptable contrast
on both light and dark card surfaces. Revert from mask path to preserve
the blue branding.

* fix(web): correct corrupted Kimi SVG and strengthen asset validation test

Remove extraneous PR discussion text that was accidentally included
in the SVG file. Strengthen the test to validate the bundled asset
is valid SVG with the expected fills (blue accent + gray secondary
mark), catching asset corruption that would otherwise go undetected.
2026-05-14 20:32:52 +08:00
lefarcen
640a332276 Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge 2026-05-14 17:44:44 +08:00
lefarcen
3b7f87c7ae Merge remote-tracking branch 'origin/main' into reconcile/garnet-main-merge 2026-05-14 17:44:26 +08:00
pftom
8e3af79dea feat(github-installer): enhance GitHub content installation and error handling
- Introduced new interfaces for GitHub content entries and budgets to streamline content fetching.
- Enhanced the `installFromGithub` function to support installation from GitHub contents, including subpath handling.
- Implemented robust error handling and retry logic for fetching GitHub content, improving installation resilience.
- Updated tests to validate the new content fetching logic and ensure correct behavior across various scenarios.

This update significantly improves the GitHub installation process, making it more flexible and user-friendly.
2026-05-14 17:31:29 +08:00
lefarcen
b268bbe169 Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix
Brings in 11 new garnet commits, most importantly:
- 1a90aef4 feat(plugin-use): implement plugin use handoff functionality —
  fixes the bug QA reported where /plugins Use Plugin would 422 silently
  for template plugins; new flow hands off to HomeView with the plugin
  pre-bound + input form prompted there.
- 2ac58544 feat(plugin-inputs): enhance plugin input handling with file
  upload support — extends PluginInputsForm for file uploads.
- 3b167b69 feat(plugins): registry protocol — new @open-design/registry-protocol
  workspace package (needs build before daemon boot).
- Plus enhancements to plugin metadata, GitHub installer, plugin detail
  view, login/whoami, static HTML preview paths.

Conflicts resolved:
- packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief
  field + garnet's contextPlugins (@-mention plugin context refs) both
  kept on ProjectMetadata.
- apps/landing-page/* (3 files): accepted HEAD — garnet had the older
  single-page landing-page header; main has the multi-page layout
  (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not
  related to the Use Plugin core fix.

New @open-design/registry-protocol package must be built before daemon
boots; pnpm install does this via postinstall already.
2026-05-14 16:32:35 +08:00
pftom
6614b9bf09 refactor(plugin-authoring): streamline prompt and input handling
- Removed the redundant `buildPluginAuthoringPrompt` function call in `startPluginAuthoring` for cleaner code.
- Introduced new functions to build prompts and inputs based on user goals, enhancing the authoring experience.
- Updated `HomeView` to manage authoring inputs and prompts more effectively, ensuring better state handling.
- Adjusted the `PluginImportModal` to reflect changes in the import process, removing references to template creation.
- Enhanced tests to cover new input handling and prompt generation logic, ensuring reliability in the authoring flow.

This update improves the clarity and efficiency of the plugin authoring process, making it more intuitive for users.
2026-05-14 16:18:34 +08:00
pftom
fbcf50382a feat(github-installer): enhance GitHub source parsing and error handling
- Updated the GitHub source regex to support more flexible parsing of repository references, including subpaths.
- Introduced a new `parseGithubSource` function to handle the extraction of owner, repo, and potential subpaths, improving the robustness of the installation process.
- Enhanced the `installFromGithub` function to retry fetching from multiple candidates and provide detailed error messages when installation fails.
- Added tests to validate the new parsing logic and ensure correct behavior when handling various GitHub source formats.

This update significantly improves the handling of GitHub sources, making the installation process more resilient and user-friendly.
2026-05-14 16:09:37 +08:00
Nagendhra Madishetti
40766ef1ba
test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* fix(web): tighten Phase 13 gates from lefarcen review (PR #1318)

Address the actionable items from lefarcen's review of the two
Phase 13 CI gates. The two questions about longer-term DX (pre-
commit hook to auto-update the symbol table, AST-walker swap)
are documented as deferred follow-ups rather than landed here.

reducer-bench:
  - Describe renamed to 'reducer p99 regression gate (Phase 13.1)'
    so it reads as a gate, not a comparative benchmark.
  - Failure message now carries the full distribution
    (p50 / p90 / p99 / max + ceiling), so triage on a tripped gate
    can distinguish a real 20ms regression from a 4.001ms CI hiccup
    without re-running locally (lefarcen Q3).
  - Captured a baseline (p50=0.011ms p90=0.013ms p99=0.018ms
    max=0.244ms on a local Node 24 / Win11 run, 2026-05-11) inside
    the docblock so reviewers can see the actual reading sits ~222x
    below the 4ms ceiling (lefarcen Q1).
  - Replaced 'role as any' casts with PanelistRole-typed casts so
    the fixture is typecheck-strict.
  - Phase numbering corrected (13.2 → 13.1 to match the PR body).

critique-coverage:
  - Symbols now grouped under four describe blocks (SSE events /
    panelist roles / lifecycle phases / i18n keys) so a failure
    points at the category that drifted at a glance (lefarcen nit).
  - Docblock now explains the grep-over-AST trade-off (the bug
    class is structural at the string level, not at the AST level)
    and points at the future AST-walker work as a deferred follow-
    up (lefarcen Q2).
  - Docblock now walks a contributor through the four-step
    maintenance flow (add to contract → add caller → add test →
    add literal here), so the next person to add an SSE event or
    i18n key knows the gate exists and what to update (lefarcen
    Q4).
  - Phase strings switched from 'phase: <name>' to bare-quoted
    literals so the walker is robust against single vs double
    quotes and ':' vs '===' source-shape changes.
  - Dead try/catch around 'stack = [root]' removed (cannot throw).
  - Per-symbol failure messages name the symbol AND which corpus
    is missing it, so the gate is self-describing on the next
    CI red.
  - Phase numbering corrected (13.4 → 13.2 to match the PR body).

63 / 63 vitest cases green (1 bench + 62 coverage). Web
typecheck clean.

* fix(web): tighten coverage walker semantics from lefarcen P2/P3 (PR #1318)

Two follow-on findings on commit 338a185:

P2 — coverage gate weakened. The previous revision used one helper
`corpusReferences` for both SRC and TEST corpora, and that helper
accepted the unprefixed PanelEvent type form (`type: 'panelist_must_fix'`)
as a substitute for the prefixed SSE wire name (`critique.panelist_must_fix`).
The fallback is correct on the TEST side (reducer tests dispatch
PanelEvent literals) but it weakened the SRC side: production code
could drop the SSE channel name silently and the PanelEvent type
alias would keep the walker green.

Split into two helpers: `srcReferences` is strict (exact substring
match only, no fallback) and `testReferences` keeps the lenient
fallback for SSE events. The production-side assertions now route
through `srcReferences` so the wire name is load-bearing again.

P3 — maintenance doc overclaimed. The previous revision said 'CI red
if you forget step 4' but the symbol arrays are partially hand-
maintained, so a contributor adding a NEW phase string or i18n key
without updating the array leaves CI green (the walker never knew
to look). Rewrote the failure-mode section to distinguish the two
cases:

  - Renaming an EXISTING symbol without updating the walker → CI red
    (existing assertion fails because the old name is gone).
  - Adding a NEW hand-maintained symbol without updating the walker
    → CI stays green (walker does not know to look for it).

Also clarified that `SSE_EVENTS` and `PANELIST_ROLE_STRINGS` are
auto-built from contracts so step 4 is one-line for `PHASE_STRINGS`
and `I18N_KEYS` only.

63 / 63 vitest cases still green.

* fix(web): close two P2 findings on PR #1318 (Siri-Ray + lefarcen)

P2 (coverage walker counted self as evidence). The walker walked
apps/web/tests, which contains apps/web/tests/components/Theater/
critique-coverage.test.ts itself. The hand-maintained PHASE_STRINGS
and I18N_KEYS literals inside that file would satisfy the test-side
coverage assertion against themselves, so a real Theater test that
covers a symbol could be deleted and the gate would still pass.

Excluded the walker file from TEST_FILES via path.resolve(__filename)
filter so the test corpus only contains independent evidence.

Once the walker stopped seeing itself, the gate correctly red-flagged
nine i18n keys that no INDEPENDENT test exercises:
critiqueTheater.userFacingName, roundLabel, composite, threshold,
interrupt, interrupted, degradedHeading, shippedSummary,
interruptedSummary. Component tests like TheaterCollapsed.test.tsx
exercise the rendered text but never mention the key STRING, so the
walker couldn't see them. Closed that gap by adding
apps/web/tests/components/Theater/critique-i18n-keys.test.ts: 9 cases,
one per watched key, asserting the dictionary entry exists as a
non-empty string. That's both real coverage (catches a stale dict)
and the independent evidence the walker requires.

P2 (interruptedSummary missing from de/ja/ko/zh-TW). The native
locale overrides were missing the key, so an interrupted run on a
German / Japanese / Korean / Traditional Chinese UI silently fell
back to the English string via the ...en spread. Added the key with
{round} and {composite} placeholders preserved, using PerishCode's
suggested copy from the earlier review thread.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm exec vitest run tests/components/Theater tests/i18n:
  20 files / 190 tests green (critique-coverage 62 / 62,
  critique-i18n-keys 9 / 9 new, reducer-bench 1 / 1, locales 5 / 5).

* fix(web): drop the Dict cast in i18n key coverage test (lefarcen P1 / Siri-Ray on PR #1318)

The previous revision used `(en as Record<string, string>)[key]` to
read each watched key. Dict has no string index signature, so CI's
strict typecheck rejected the broad cast with TS2352 even though the
runtime assertion was fine.

Replaced with the typed pattern lefarcen suggested: type WATCHED_KEYS
as `readonly (keyof typeof en)[]` and read `en[key]` directly. That
removes the cast and also strengthens the test, because a renamed or
removed key now fails the type check immediately rather than at
runtime.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web exec vitest run
  tests/components/Theater/critique-i18n-keys.test.ts: 9 / 9 green.

* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314)

The variant validator on the web SSE path previously accepted any
`typeof === 'string'` for closed-enum fields (ship.status,
panelist_*.role, degraded.reason, failed.cause, parser_warning.kind,
run_started.cast[]) and any `typeof === 'number'` for numeric fields,
which let NaN / Infinity through. Downstream components index i18n
tables by enum value, so an unknown status or role would land
`SHIP_BADGE_KEY[final.status]` on undefined and crash the translator.

The replay parser had a separate gap: `useCritiqueReplay.parseTranscript`
called the cheap `isPanelEvent` header check directly, so a recorded
line like `{"type":"ship","runId":"r"}` reached the reducer with
composite, status, round, artifactRef, summary all undefined and
TheaterCollapsed then called `final.composite.toFixed(1)` on undefined.

Resolution: move all wire-side validation into the contract guard.

- Export const arrays for the closed enums:
  SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS,
  ROUND_DECISIONS (PANELIST_ROLES already existed).
- Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the
  single deep validator: header (known type + non-empty runId) plus
  every variant-specific required field plus closed-enum membership
  plus Number.isFinite on every numeric field. Documented as the wire
  source of truth.
- Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent
  now relies entirely on the contract guard, and parseTranscript in
  useCritiqueReplay (which already uses isPanelEvent) gets the deeper
  validation for free.

Tests (TDD, red-first):

- packages/contracts/tests/critique.test.ts: 13 new cases pinning the
  strict guard directly (well-formed across every variant, every
  rejection path: unknown type, empty/non-string runId, unknown enum,
  non-finite numeric, missing variant field).
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for
  each closed-enum rejection on the wire path plus a positive sweep
  across every legal enum value across every variant.
- apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx:
  2 new cases for incomplete and unknown-enum transcript lines.

Verified:
- pnpm --filter @open-design/contracts test 4 files / 30 tests green.
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test 107 files / 976 tests green.

* fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4)

The strict guard from PR #1314 round 3 enforced enum membership and
Number.isFinite, but accepted any finite number where the contract
intends a specific domain: scale: 0 (ScoreTicker divides by it),
negative thresholds, fractional rounds, negative mustFix, etc.
ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline
CSS and divides by it for tick width, so a guard-passing scale: 0
shipped Infinity into the rendered style. Negative composite /
score values reached downstream code that assumes >= 0.

Resolution: mirror the daemon-side Zod domain constraints in the
runtime guard.

Three new helpers in packages/contracts/src/critique.ts:

  - isPositiveInt(v): integer with v > 0. Used for round, maxRounds,
    scale, protocolVersion (all 1-indexed in the orchestrator).
  - isNonNegativeInt(v): integer with v >= 0. Used for mustFix,
    position, bestRound. bestRound: 0 is the valid sentinel for
    'interrupted before any round closed'.
  - isNonNegativeFinite(v): finite number with v >= 0. Used for
    composite, score, dimScore, threshold. Threshold may be
    fractional (e.g. 8.5 on a scale of 10).

Cross-field check inside run_started: threshold <= scale (the daemon
Zod schema enforces this with an epsilon refine, the wire guard
matches the same intent).

Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts:

  - 22 new rejection cases across every numeric field that
    previously slipped through: scale: 0, negative scale, fractional
    scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0,
    fractional protocolVersion, negative threshold, threshold > scale,
    round: 0, fractional round, negative dimScore / score, negative /
    fractional mustFix, negative composite, ship round: 0, negative /
    fractional bestRound, negative interrupted composite, negative /
    fractional parser_warning position.
  - 3 positive boundary cases that must still pass: threshold == scale,
    fractional threshold within [0, scale], interrupted with
    bestRound: 0 (no round completed before interrupt), parser_warning
    with position: 0 (start of stream).

Verified:
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/contracts test: 4 files / 59 tests green
  (was 37 before the new domain cases).
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 110 files / 1004 tests green;
  no regression on Theater suite, sse validator, replay parser, or
  assistant-feedback widget tests.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* test(e2e): move critique-coverage walker from apps/web/tests to e2e/tests (Siri-Ray P2)

The walker is by definition a cross-app consistency check: it reads
the web reducer, the daemon critique module, the contracts package,
and the e2e UI suite. Hosting it under apps/web/tests/ violated the
repo boundary rule (root AGENTS.md): app packages must not import
another app's private src/ or tests/ as a shared helper, and
cross-app consistency checks belong in e2e/tests/. The web test
lane was effectively coupled to daemon and e2e file layout, so a
daemon-only refactor could break the web lane.

Moved the file to e2e/tests/critique-coverage.test.ts and switched
the contracts import to the import.meta.glob shape the e2e package
already uses (see localized-content.test.ts), so the e2e package
does not have to add @open-design/contracts as a workspace dep just
to load two const arrays. REPO_ROOT and SELF_PATH recalculated for
the new location.

Web test lane no longer depends on daemon, contracts, or e2e layout.
The e2e walker covers the same 62 assertions as before:

  e2e/tests/critique-coverage.test.ts  62 / 62 green

Web typecheck clean, e2e typecheck clean.

* fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-14 15:55:36 +08:00
pftom
2ac5854432 feat(plugin-inputs): enhance plugin input handling with file upload support
- Added support for file input fields in the PluginInputsForm, allowing users to upload files with serializable metadata.
- Updated the HomeHero component to improve the layout and interaction of input fields, enhancing user experience.
- Adjusted CSS styles for better visual representation of input fields and their states.
- Modified HomeView to reflect changes in authoring chip IDs for better clarity in plugin actions.
- Enhanced tests to cover new file input functionality and ensure correct behavior in various scenarios.

This update significantly improves the plugin input handling, enabling users to upload files seamlessly and enhancing the overall interaction model.
2026-05-14 15:52:21 +08:00
pftom
1a90aef4a2 feat(plugin-use): implement plugin use handoff functionality
- Added support for using installed plugins directly from the PluginsView, allowing users to initiate plugin actions seamlessly.
- Enhanced the HomeView to handle plugin use handoffs, managing state and user interactions effectively.
- Introduced new types and functions to facilitate the creation and processing of plugin use handoffs, improving the overall user experience.
- Updated tests to cover the new plugin use functionality, ensuring reliability and correctness in the application flow.

This update significantly enhances the interaction model for plugins, enabling users to utilize plugins more intuitively within the application.
2026-05-14 15:40:42 +08:00
pftom
9ea33e076b feat(context-plugins): add support for context plugins in project metadata and UI
- Introduced a new `contextPlugins` field in the `ProjectMetadata` type to accommodate plugins selected via `@` mentions, allowing for additive context in project creation.
- Updated the `HomeHero` and `EntryShell` components to handle and display context plugins, enhancing user interaction with selected plugins.
- Implemented rendering logic for context plugins in the metadata block, providing clear visibility of selected plugins and their descriptions.
- Enhanced the UI to support the removal of context plugins and display additional details on hover, improving the overall user experience.

This update significantly enriches the project creation process by allowing users to incorporate multiple context plugins seamlessly.
2026-05-14 15:29:49 +08:00
lefarcen
6c16283850 Merge origin/main (post-7c8305f4) into reconcile branch
Brings in 10 new main commits: routine deep-link to specific
conversations (#1508), Windows resource cache fix for Orbit templates,
collapsible comment side panel (#1607), routines project radio polish,
Copilot logo swap, and minor UI fixes.

Conflicts resolved:
- router.ts: garnet's home/view + marketplace routes + main's
  per-project conversationId deep-link field coexist on Route union
- ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper +
  main's isStoppableAssistantMessage helper both kept
- ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's
  phantom-row regression test); main's three new tests for
  finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker
  / shouldClearActiveRunRefs are queued as a follow-up TODO inline.
2026-05-14 15:13:38 +08:00
shangxinyu1
2976c76fc3
test: expand Memory and Routines coverage (#1521)
* test: expand settings and packaged coverage

* test: extend memory settings coverage

* test: cover routine settings failure states

* test: cover routine operation failures

* test: fix daemon test typing on CI

* test: decouple packaged smoke from orbit bug

* test: avoid live memory LLM calls in route tests

* test: fix daemon fetch typing in CI

* fix: restore preview comment and inspect toggles

* test: align manual edit flow with current inspector UX

* test: align comment attachment flow with current preview comments UI

* fix: probe resolved Codex launch path during detection

* fix: remove duplicate board activation helper after rebase

* test: update ghost cli detection mock

* test: align FileViewer toolbar expectation

* ci: move full app tests to extended lane

* ci: run app tests by changed scope

* ci: cover shared app inputs in test scopes

* ci: avoid setup-node cache in windows packaged smoke

* test: align extended settings and manual edit flows
2026-05-14 14:48:40 +08:00
Nagendhra Madishetti
5cb0508790
fix(web): deep-link Routines history rows to their specific conversation (Fixes #1505) (#1508) 2026-05-14 14:27:34 +08:00
soulme
2a8ebff11a
feat(web): add collapsible comment side panel (#1607) 2026-05-14 14:27:09 +08:00
Siri-Ray
d2738924fb
fix(web): freeze completed run durations across conversations (#1351)
* fix(web): freeze completed run durations across conversations

* fix(web): finalize stopped API runs

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* fix(daemon): optimize conversation latest run lookup

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* fix(web): scope streaming cleanup to conversation

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* fix(web): capture streaming conversation cleanup

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* fix(web): guard stale run ref cleanup

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)
2026-05-14 14:25:37 +08:00
sukumarp2022
852a005b32
feat(web): add export as image screenshot to share menu (#1569)
Add an option to export the current preview viewport as a PNG image.

- Add requestPreviewSnapshot() utility in exports.ts (reuses the existing
  srcdoc snapshot bridge via postMessage)
- Add exportAsImage() and dataUrlToBlob() helpers for Blob download
- Add Export as image menu item in the HTML viewer share menu, gated
  behind srcdoc mode (bridge only present in srcdoc, not URL-load mode)
- Refactor PreviewDrawOverlay to delegate to the shared
  requestPreviewSnapshot() instead of duplicating the snapshot logic
- Add fileViewer.exportImage i18n key across all 19 locale files
- Add 7 unit tests covering snapshot request, timeout, error handling,
  and download filename sanitization

Fixes #1500
2026-05-14 11:07:28 +08:00
Bryan
54498f1ac5
fix(web): parse Provenance with Markdown-bold labels (#1584)
* fix(web): parse Provenance with Markdown-bold labels (#1580)

The daemon's finalize synthesis prompt at apps/daemon/src/finalize-design.ts:560-565
lists the five Provenance fields without pinning field-label syntax, so Claude
renders them with Markdown-bold labels per Markdown convention
(`- **Field:** value`). The parser at apps/web/src/lib/parse-provenance.ts:32-36
uses `[:\s]+` as its label/value separator, which stops at the trailing `**`
after the colon; the capture group then slurps the `**` and any following
whitespace into the value. Downstream of that, transcriptMessageCount and
generatedAt parse as null because the captured tokens don't start with digits
or a valid ISO 8601 prefix, and the Continue in CLI clipboard prompt shows
`Design system: ** ...`, `Transcript message count when DESIGN.md was
generated: unknown`, `DESIGN.md generated at: unknown`.

Fix: strip leading and trailing Markdown emphasis (`*`, `_`, whitespace) from
every captured value via a single helper threaded through extractField /
extractFieldOrNone / extractNumber / extractDate. Widen the
transcriptMessageCount regex's capture from `(\d+)` to `([^\n]+)` so the
strip step gets a chance to run on `** 4`. Add `[^:]*` between `count` and
`[:\s]+` to mirror the other label-walking regexes for bolded label
variants.

Defense-in-depth: tell the synthesis prompt to emit plain `- Field: value`
bullets with no emphasis on the labels. The parser hardening is the
load-bearing fix; this is belt-and-suspenders for new model variants.

Red-Green-Refactor:
- Phase 1 (Red): 3 new parse-provenance tests covering bold labels with
  backticked values, bold labels with a short `Generated:` form, and bold
  labels with `none` sentinels. All 3 failed against pre-fix source.
- Phase 2 (Green): strip + regex widening. All 7 parse-provenance tests
  + 1158 web tests pass.
- Phase 3: empirically verified against a live finalized DESIGN.md — all
  five fields now parse correctly.
- Phase 4 (defense-in-depth): one-line addendum to synthesis prompt.
- Phase 5: bold-labelled Provenance fixture added to the hook test
  (useDesignMdState.test.tsx) so the round-7 `unknown-provenance`
  fail-closed path is regression-pinned end-to-end.

Backticks in field values are intentionally kept (out of scope per the
issue spec; rendered clipboard text reads fine with them). The variant
`- **Field**: value` (colon outside emphasis) is not in the issue
enumeration and is not handled.

Fixes #1580

* fix(web): narrow Provenance strip to Markdown residue only

Round-2 fix per lefarcen's review on PR #1584. The round-1 helper
used `^[\s*_]+` / `[\s*_]+$`, which stripped a literal leading or
trailing `*`/`_` from any captured value — `_draft.html` corrupted
to `draft.html`, and a build id like `build_id_v1_` lost its
trailing underscore.

Narrow stripMarkdownEmphasis to three explicit passes:
  1. Leading `*`/`_` tokens FOLLOWED BY WHITESPACE — only matches
     the `** ` residue left after `- **Field:** value` is captured
     starting at the `*`.
  2. Trailing WHITESPACE followed by `*`/`_` tokens — mirror of (1)
     if the value closes with emphasis after whitespace.
  3. A single balanced wrap around the remaining value
     (`**X**` / `*X*` / `__X__` / `_X_`) — handles the
     `- **Field:** **value**` shape and any plain-label
     `**value**` form.

Asymmetric literal `*`/`_` characters in the value (no whitespace
separator, no balanced closing token) are preserved by
construction.

Added regression tests:
  - plain label + `_draft.html` value
  - plain label + `build_id_v1_` value (trailing underscore)
  - bold label + `_draft.html` value (residue stripped, literal
    leading underscore preserved)
  - plain label + `**wrapped-id**` value (balanced residue stripped)

All 11 parse-provenance tests + 1162 web tests pass. Empirically
re-verified against a live finalized DESIGN.md — all five fields
still parse correctly.

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
2026-05-14 11:04:24 +08:00
Prantik Medhi
0c0da7cc23
fix(web): confirm continue-in-cli copy (#1604) 2026-05-14 11:02:36 +08:00
pftom
3b167b6921 feat(plugins): add registry protocol and enhance plugin management features
- Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries.
- Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol.
- Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities.
- Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades.
- Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards.

This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities.
2026-05-14 08:55:36 +08:00
pftom
56c264c9bd feat(plugins): add login and whoami commands for GitHub CLI authentication
- Introduced `login` and `whoami` commands to the plugin CLI, enabling users to authenticate with the Open Design registry via GitHub CLI.
- The `login` command wraps GitHub CLI authentication, allowing users to specify a host, defaulting to GitHub.
- The `whoami` command retrieves and displays the authenticated GitHub account information, with an option for JSON output.
- Updated the CLI help documentation to include usage instructions for the new commands.
- Enhanced error handling for GitHub CLI dependencies and authentication status.

This update improves the user experience by simplifying the authentication process for plugin publishing.
2026-05-14 07:25:05 +08:00
lefarcen
d83b228c81 Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge 2026-05-13 23:52:33 +08:00
pftom
7c48fbd902 feat(plugins): enhance PluginsView with new marketplace management features
- Updated the PluginsView component to include new tabs for 'Installed', 'Available', and 'Sources', improving the organization of plugin management.
- Introduced functions for adding, refreshing, removing, and setting trust for plugin marketplaces, enhancing the marketplace interaction capabilities.
- Enhanced the UI to reflect the new structure, including updated CSS styles for better visual consistency and usability.
- Added tests to ensure the functionality of the new marketplace features and verify the correct rendering of available plugins.

This update significantly improves the user experience in managing plugins and marketplaces, providing a more intuitive interface for users.
2026-05-13 23:42:41 +08:00
lefarcen
53997990b7 Merge origin/main (post-0.7.0) into reconciled garnet branch
Second-pass merge layering 41+ new commits from origin/main on top of
the first reconcile commit. Headline upstream additions absorbed:

- 0.7.0 release: redesigned chat bubble user-text styling, neutralised
  palette, lucide icons, ElevenLabs audio voice option discovery in the
  prompt composer, analytics tracking (PostHog) wired across home /
  studio / create surfaces, Prometheus `/api/metrics` endpoint,
  critique-theater drop-in mount with a settings toggle.
- Misc upstream fixes (titlebar padding, release header layout, deck
  preview chrome, feedback form auto-scroll, conversation-created SSE
  on routine runs, etc.)

Conflict resolutions (12 files, ~22 hunks):

- contracts barrel + prompts/system: union of both sides; new analytics
  exports (`./analytics/events`, `./analytics/public-params`) added
  alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice
  fields (audioVoiceOptions/audioVoiceOptionsError, main) and
  pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput.
- daemon/server.ts: Prometheus `/api/metrics` route inserted after
  garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call
  added before the chat-run service init alongside the prior reconcile
  note about the dropped legacy POST /api/projects body.
- App.tsx: handleCreateProject now consumes both garnet's plugin
  fields (pluginId / appliedPluginSnapshotId / pluginInputs /
  autoSendFirstMessage) and main's analytics requestId. Tracking
  fires success + failure paths; PluginLoopHome auto-send sessionStorage
  flag is preserved.
- ProjectView.tsx: the garnet auto-send useEffect coexists with main's
  `useCritiqueTheaterEnabled()` hook.
- ChatComposer.tsx: imports merged (drop now-unused fetchSkills,
  add analytics provider + tracking + buildVisualAnnotationAttachment).
- index.css: main's redesigned `.msg.user .user-text` chat bubble
  styling wins over garnet's plain text rule; garnet's
  `.msg-plugin-chip*` rules preserved alongside.
- EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with
  reconcile decision #2. main's added PetRail / TopTab / analytics
  view tracking is intentionally NOT brought into the wrapper; the
  follow-up to re-integrate PetRail / image-templates / video-templates
  into EntryShell still stands and now also covers analytics
  view-tracking hooks.
- daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node +
  prom-client coexist).
- Test fixtures (FileWorkspace.test): kept garnet's plugin-folders
  describe block intact; main's projectKind="prototype" addition is
  dropped where it conflicted with garnet's plugin-folder fixture
  files.

Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck`
exits 0 across all workspace packages.

Follow-up not done in this commit:
- PetRail / image-templates / video-templates / 0.7.0 analytics
  view-tracking hooks need to be added to EntryShell.
- Critique-theater settings toggle UX (added on main) lives in the
  SettingsDialog hierarchy; the reconcile state preserves the
  SettingsDialog so this should work without changes, but no
  end-to-end verification yet.
2026-05-13 23:29:56 +08:00
lefarcen
d3602be666 Merge origin/main into garnet-hemisphere (reconcile)
Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the
161-commit garnet-hemisphere line, reconciling the product-vibe-coded
plugin/marketplace/EntryShell surfaces from garnet with the routines /
skills / live-artifacts feature work landed on main since the fork point.

Headline decisions (full rationale + side-by-side screenshots in
`specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`):

- #1 SettingsDialog: keep main's Memory / Skills / External MCP /
  Connectors / Routines / MCP server nav items even though the top-level
  /integrations + /automations routes also cover them. Two entries
  coexist for now; revisit once Track A/B fill in the placeholder content.
- #2 EntryView: accept garnet's thin wrapper delegating to EntryShell.
  Main's PetRail sidebar + image-templates/video-templates tabs are
  intentionally deferred to a follow-up that re-integrates them into
  the new EntryShell layout.
- #3 /integrations + /automations top-level routes: kept (garnet's
  product intent). Skills tab is still a "Coming soon" placeholder
  awaiting Track A; Routines/Schedules/Live-artifacts cards on
  /automations are still mock awaiting Track B.
- #5 DesignFilesPanel: hybrid — main's pagination as primary list,
  garnet's Plugin folders section preserved between the live-artifacts
  block and the pagination block. (by-kind sections drop in favour of
  pagination; plugin-folders rendering stays because it is a
  garnet-specific product addition.)
- #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk
  merge. Both daemon admin routes + plugin/genui routes (garnet) and
  routines/memory/skills upgrades (main) preserved. Garnet's inline
  project route block kept alongside main's `registerProjectRoutes` /
  `registerProjectUploadRoutes` modular wiring — duplicate route
  audit is a follow-up. Garnet's POST /api/projects plugin-snapshot
  resolution + default-scenario fallback is intentionally dropped from
  the inline body (now handled by registerProjectRoutes) and listed for
  follow-up re-integration into `project-routes.ts`.

Verification (worktree at /Users/elian/Documents/open-design-garnet):
- `pnpm typecheck` exits 0 across all workspace packages
- daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots,
  serves `/api/daemon/status` healthy, and survives a Playwright
  walkthrough of /integrations / /automations / home / projects /
  design-systems / plugins / settings dialog
- `@open-design/plugin-runtime` package built (was missing dist/ on
  garnet); without it the daemon's plugins/* imports fail at boot

Track A (Skills tab → real SkillsSection) and Track B (Automations
cards → real routines / live-artifacts backend) are the two remaining
follow-ups blocking the placeholder/mock content from going live. See
`spec.md` and `track-skills.md` in the same directory.
2026-05-13 22:29:21 +08:00
Yuhao Chen
828f8b93bf
fix(web): protect built-in skills from delete (#1583) 2026-05-13 22:13:30 +08:00
Nagendhra Madishetti
38a5ab69e6
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required wins, opt-out wins inversely)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.

10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - a open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.

10/10 vitest cases green (6 existing + 4 new).

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail (or worse, the
production claim 'in-session event keeps mounts coherent' was
hollow).

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates
the critique pipeline on critiqueCfg.enabled, which is just the
OD_CRITIQUE_ENABLED env var. After this commit it gates on
isCritiqueEnabled(...) from the Phase 15 resolver, so the full
priority matrix is live:

  1. Per-skill od.critique.policy veto (opt-out / required)
  2. Per-project override (M1 Settings toggle, written through the
     existing Phase 6 settings endpoint)
  3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
  4. OD_CRITIQUE_ROLLOUT_PHASE default
       M0 dark-launch      false
       M1 settings only    false
       M2 per-skill        only when skillPolicy === 'opt-in'
       M3 global default   true

Default behaviour on a fresh install is unchanged: the resolver
returns false at M0 without an env override or a project override,
so prod traffic falls through to the legacy single-pass path
exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE,
envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride
are passed as null for the v1 cutover; the daemon-side handler that
round-trips critiqueTheaterEnabled on the project settings row and
the od.critique.policy frontmatter resolver land as the next two
commits in this branch.

The three call sites that used critiqueCfg.enabled (the brand-thread
guard, the skill-thread guard, the top-line critiqueShouldRun
compound) now read from a single locally-scoped critiqueEnabledForRun
boolean, so the eligibility check is computed exactly once per spawn
and the prompt composer + orchestrator stay in lockstep the way
the existing comment already promised.

Tests still green: daemon vitest 22 / 22 across rollout +
conformance + adapter-degraded. Daemon typecheck clean.

* feat(web): mount CritiqueTheaterMount in ProjectView

The web counterpart of the daemon wireup. ProjectView now renders
<CritiqueTheaterMount projectId={project.id} enabled={...} /> as a
sibling of <AppChromeHeader> inside the top-level <div className="app">.

The mount is the drop-in from the Phase 9 stack: it owns the SSE
subscription, the kill-request handshake, and the phase-aware swap
from the live <TheaterStage> to the collapsed badge once a run
settles. The mount returns null until the daemon emits a
critique.run_started for the active project, so the visual surface
is byte-for-byte unchanged for users who have not opted in.

Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings
toggle from the existing open-design:config localStorage blob and
stays in sync with both the platform storage event (cross-tab) and
the same-tab open-design:critique-theater-toggle CustomEvent the
Phase 15 setter dispatches. The hook honors the event payload
directly so a private-mode browser that cannot persist the toggle
still updates the in-session UI correctly.

The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts)
remains the authority for whether a run is actually wired through
the critique pipeline. This hook only governs whether the web layer
renders the resulting SSE stream when the daemon emits one. The
two-layer gate is intentional: an integrator embedding the Theater
in a custom UI can flip the web visibility independent of the
daemon's routing decision, and a daemon-side env override flips
backend gating without touching the web's localStorage.

Tests still green: web Theater suite 181 / 181 across 16 files.
Web typecheck clean.

* feat(daemon): resolve od.critique.policy frontmatter at the spawn site

The next step in the wireup branch's ladder: replace the placeholder
`skillPolicy: null` with the actual value parsed from the active
skill's SKILL.md frontmatter.

Three small edits, one new field on a public type:

1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field
   carrying the parsed `od.critique.policy` token (required /
   opt-in / opt-out / null). The field is null when the skill has
   no opinion, which lets the lower-priority resolver tiers
   (projectOverride, envOverride, phase default) decide.

2. listSkills() populates the new field via a small
   `normalizeCritiquePolicy` helper that tolerates the YAML
   scalar's casing and trims whitespace. Unknown tokens collapse
   to null so a typo in SKILL.md cannot accidentally force the
   panel on or off; it just falls through. Derived example cards
   inherit the parent's policy.

3. server.ts captures `skill.critiquePolicy` into a hoisted
   `skillCritiquePolicy` variable inside the existing skill-load
   block, then threads it into the isCritiqueEnabled call as the
   skillPolicy input. The hoisting keeps the variable in scope at
   the resolver call site without restructuring the spawn handler.

After this commit, the priority matrix the rollout resolver was
designed for is live for its top tier. The previous commit wired
env + phase; this one wires skill. The projectOverride input
remains null pending the next commit that extends the Phase 6
settings endpoint.

Daemon vitest: 10 / 10 rollout cases pass against the new wiring.
Daemon typecheck: clean.

* feat(daemon): feed projectOverride into the rollout resolver from project metadata

Replaces the placeholder `projectOverride: null` in the spawn
handler with the actual value the Settings panel writes onto the
project's metadata blob: `critiqueTheaterEnabled?: boolean`.

The read is defensive at the boundary: the metadata object is
typed loosely (it round-trips through SQLite as a free-form JSON
blob), so the spawn handler narrows to `boolean` and falls
through to `null` for any other shape. A missing key, a malformed
value, or a project that has never visited Settings collapses to
`null`, which is exactly the resolver's "no opinion, fall
through to env / phase" signal.

The `critique` frontmatter slot also gets typed on the
SkillFrontmatter shape so the `od.critique.policy` chain the
previous commit introduced no longer needs a bracket-access
cast. Same pattern as the existing `craft`, `preview`, and
`design_system` nested-record slots.

After this commit, every tier of the rollout resolver's priority
matrix is wired:

  1. skillPolicy   (from SKILL.md od.critique.policy)
  2. projectOverride (from project metadata critiqueTheaterEnabled)
  3. envOverride   (from OD_CRITIQUE_ENABLED)
  4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE)

The write path for projectOverride still flows through the
existing project-update handler the Settings panel already uses
to persist project metadata; no new endpoint is needed. The
Settings UI button that calls setCritiqueTheaterEnabled and
posts the new field is the next commit on this branch.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix(daemon): forward critique events to project sinks + align composer gate (PR #1338)

Two codex review items addressed in one commit since they share the
same root cause (resolver-enabled run hits a transport / prompt
contract that was still env-gated):

P1 (transport mismatch). The daemon emits critique.* SSE frames
through critiqueBus -> design.runs.emit, which fans out on
/api/runs/:runId/events. The web CritiqueTheaterMount subscribes to
/api/projects/:projectId/events (it's project-scoped, not run-
scoped, because the mount lives at the project workspace and
follows the user across runs). Result: in production the mount
never sees a real frame and the e2e tests' stubbed routes hide the
mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the
existing runs.emit transport, AND the per-project event-sinks map.
The project-events route emits via sse.send(payload.type, payload),
so we pack the SSE channel name onto payload.type and let the sink
push the right channel. The web sseToPanelEvent overwrites type
from the channel name on the way back into a PanelEvent, so the
round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads
cfg.enabled to decide whether to append the panel addendum, but
critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run
the resolver enabled via phase / project / skill (env unset) would
have critiqueShouldRun = true while critiqueCfg.enabled remained
false, dropping the panel prompt while still routing through
runOrchestrator -> parser waits for tags that never arrive -> run
degrades.

Fixed by passing a derived config { ...critiqueCfg, enabled: true }
to the composer when critiqueShouldRun is true. The composer's own
gate now agrees with the resolver decision on every input the
spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR.
Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands
   the user on '/' (home route) where ProjectView (and therefore
   CritiqueTheaterMount) is never rendered. With the suite as
   written, every assertion would time out the first time the
   lane runs in CI, contradicting the PR body's claim that the
   suite stays parked behind test.describe.fixme.

   The state diverged from my earlier Phase 11 work because the
   merge from main on commit 4ab719c6 brought in #1307's
   squash-merged version of the e2e file (the pre-fixme shape).

   Re-applied test.describe.fixme to the describe block plus
   removed ui/critique-theater.test.ts from the test:ui:extended
   script in e2e/package.json. Added a file-header docblock
   explaining what the follow-up commit needs to do: replace
   goto('/') with /projects/:id navigation similar to
   app-design-files.test.ts, split the SSE fixture into a live
   prefix and terminal suffix (Codex P2 on PR #1320), and commit
   the first PNG baselines.

2. bestRoundOf in CritiqueTheaterMount returned the LAST round
   with a numeric composite, not the round with the HIGHEST
   composite, while bestCompositeOf correctly returned the max.
   A run that closed round 1 at 8.5 and round 2 at 6.0 would
   dispatch interrupted { bestRound: 2, composite: 8.5 } on a
   user-clicked interrupt.

   Folded the two helpers into a single bestRoundAndComposite
   that walks state.rounds once and returns the matching pair so
   the two values cannot drift. The onInterrupt callback now
   destructures from one helper instead of two independent reads.
   Falls back to (state.activeRound, 0) when no round has closed
   with a composite yet.

Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases
still green against the new helper.

* fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338)

Three lefarcen P2s on the latest review pass, all real:

1. M1 project override was half-wired: the daemon read
   metadata.critiqueTheaterEnabled but the web setter only
   wrote localStorage. A user opt-in would render the Theater
   on the web (localStorage was set) while the daemon resolved
   projectOverride=null and skipped critique unless env / phase
   already permitted. Two halves talking past each other.

   Extended setCritiqueTheaterEnabled to accept an optional
   { projectId, fetchProjectSettings } options bag. When a
   projectId is supplied, the setter ALSO sends a
   PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled
   } } so the daemon's spawn-time resolver picks the same value up
   on the next generation. The existing project-routes endpoint
   already accepts arbitrary metadata patches, so no new endpoint
   is needed. The local write + the CustomEvent dispatch still
   fire before the PATCH, so a network failure does not unwind
   the in-session UI flip. Three new vitest cases pin the new
   path: PATCHes when projectId is provided, skips when it is
   not, swallows a rejected PATCH so the in-session UI still
   flips.

2. Rollout docs (docs/critique-theater.md section 3) claimed the
   Settings toggle persists into the daemon settings store, but
   the previous implementation only had a localStorage reader /
   writer plus a daemon read of project metadata, with no
   round-trip. Rewrote the section to lead with the four-tier
   resolver (skill policy / project override / env / phase),
   document that the setter now round-trips via the existing
   PATCH endpoint when given a projectId, and call out the
   Settings panel UI control as a deliberate follow-up.

3. Troubleshooting table pointed users at /api/metrics/critique
   (Phase 12, deferred) and 'od adapters clear-degraded <id>'
   (CLI wrapper that does not exist). Replaced the metrics
   reference with the local conformance harness command
   (pnpm --filter @open-design/daemon vitest run
   tests/critique-conformance.test.ts) that ships today, with a
   note that the Phase 12 dashboard surfaces this status as a
   series once that PR lands. Replaced the CLI command with the
   programmatic clearDegraded() helper that exists today and
   flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11
existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix:
the existing CritiqueTheaterMount.test.tsx interrupt cases only
exercised a single-round state, so a future refactor back to two
independent helpers wouldn't be caught by the test suite even
though it'd reintroduce the round / composite drift bug.

Added a regression case that:

  1. Drives the reducer through two complete rounds with the
     full 5-role cast closing at distinct composites: round 1
     at 8.5, round 2 at 6.0 (the high-composite round is NOT the
     most recent one).
  2. Clicks Interrupt + waits for the daemon ack via the test
     seam fetcher returning 204.
  3. Asserts the collapsed badge displays "round 1" (the
     correct best-composite round), and queryByText for
     "round 2 ... 8.5" returns null (the buggy pairing
     would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass
and returns the matching pair, so the round number and the
composite cannot drift apart. This test locks the fix in: a
refactor that splits the helpers back into independent walks
will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } }
as the entire PATCH body. The daemon's project-routes handler only
re-stamps three immutable fields (baseDir, importedFrom,
fromTrustedPicker) before calling updateProject(db, id, patch),
which then does a shallow { ...existing, ...patch } in apps/daemon/
src/db.ts. So patch.metadata replaces the row's metadata wholesale,
dropping kind, templateId, linkedDirs, and every other field the rest
of the app reads.

No in-tree caller passes projectId today (only vitest cases), so the
bug had not surfaced yet. But the surface is documented in
docs/critique-theater.md section 3 and the function's own JSDoc as
the M1 round-trip path, so it would have shipped as a latent footgun
for the next integrator: a Settings UI follow-up, or any third party
that wires the setter into a project-aware surface.

Fix: read-merge-write rather than a bare patch.

- GET /api/projects/:id to read the row's current metadata.
- Spread that metadata into the PATCH body and overlay
  critiqueTheaterEnabled: next on top, mirroring the partial-metadata
  pattern already used in ChatComposer.tsx for linkedDirs.
- PATCH the merged object.

Failure handling:
- GET fails: skip the PATCH entirely. We cannot construct a safe
  merged body without the current state, and a bare patch would
  wipe other metadata. The in-session CustomEvent fired earlier in
  the setter still keeps every mounted hook consistent; the next
  save retries the round-trip.
- PATCH fails: log in dev. The in-session UI is already correct via
  the CustomEvent.

Tests (TDD, red-first):

- 'GETs the project then PATCHes with merged metadata when a
  projectId is supplied': stubs a GET that returns
  { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] }
  and asserts the PATCH body equals the merge plus the toggle.
- 'PATCHes with just the toggle when the project has no prior
  metadata': stubs a GET that returns no metadata block.
- 'skips the PATCH (does not stomp metadata) when the prefetch GET
  fails': stubs a rejecting GET and asserts only the GET fires.
- 'swallows a rejected PATCH after a successful prefetch': stubs a
  successful GET and a rejecting PATCH; asserts the in-session UI
  still flips via the CustomEvent.

Doc updated on the setter's JSDoc to describe the new three-step
flow (localStorage, CustomEvent, read-merge-write PATCH) and the
two failure modes.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 111 files / 1055 tests green
  (was 1052, +3 from the new merge-flow cases).

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* feat(daemon): Critique Theater Phase 12 observability foundations

Lands the metrics registry, the structured logger, the /api/metrics
route, and the adapter-degraded bump that wires up the first data
point. The orchestrator-side bumps for runs / rounds / composite /
must-fix / interrupted / parser_errors / protocol_version land in a
follow-up commit on this branch (kept separate so the wiring diff
reads cleanly against the registry shape).

Surfaces added:

- apps/daemon/src/metrics/index.ts: 9 Prometheus series under the
  open_design_critique_* namespace with the histogram buckets the
  spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 /
  2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at
  0-10 integer steps).
- apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line
  per call on stdout, namespaced critique. Matches the JSON-per-line
  convention cli.ts already uses; no new logger framework.
- apps/daemon/src/server.ts: GET /api/metrics route. Honors
  OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs.
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now
  bumps degraded_total so the adapter-health dashboard panel
  reflects every TTL refresh and every fresh mark.

Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to
apps/daemon/package.json. Both are zero-config no-ops without an
exporter wired; daemon bundle size impact is ~150 KB uncompressed.
The @opentelemetry/api dep is in place ahead of the OTel-spans
follow-up commit; it adds no behavior on this commit.

Tests:
- tests/metrics/critique.test.ts (3 cases): registry shape +
  exposition text + reset-between-tests
- tests/logging/critique.test.ts (4 cases): event shape + ordering
  + newline framing + namespace stamping

Verification (Windows-local):
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites: 7 / 7 green
- Existing adapter-degraded + conformance + rollout suites:
  22 / 22 green; the bump is non-breaking

* feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator

Lights up the bump sites the Phase 12 foundations PR registered the
series for. Every panel event the parser surfaces now reaches the
matching Prometheus counter / histogram and the matching JSON log
line on stdout.

Switch-loop bumps + logs:

- run_started: log run_started, set protocol_version gauge to the
  observed protocol version (small-integer cardinality).
- panelist_open: record the first-open wall-clock per round so
  round_end can compute round_duration_ms; subsequent opens in the
  same round leave the start time untouched.
- panelist_must_fix: bump must_fix_total with the panelist role.
  The wire event does not yet carry a dim name, so the label is
  'unspecified' for now; a future parser revision can drop in the
  real dim without a metric rename.
- round_end: bump rounds_total, observe composite_score, observe
  round_duration_ms (current ms minus the tracked start), log
  round_closed with the composite / mustFix / decision triple.
- parser_warning (parser-yielded): bump parser_errors_total with
  the kind label, log parser_recover with kind + position.

Orchestrator-side parser warnings (composite_mismatch and
duplicate_ship from the daemon-authoritative scoring checks) go
through a new emitParserWarning helper so the bus emit, the
collectedEvents push, the metric bump, and the log line stay in
lockstep. Three inline emission sites collapse to one-line helper
calls.

After the try/catch, a single terminal-status switch bumps
runs_total{status, adapter, skill} once per run, with branch-
specific log + counter:

- shipped / below_threshold: log run_shipped
- interrupted: bump interrupted_total, log run_failed{cause: interrupted}
- timed_out: log run_failed{cause: timed_out}
- failed: log run_failed{cause: orchestrator_internal}
- degraded: log degraded{reason: orchestrator_classified}

OrchestratorParams gains optional skill: string for the label;
defaults to 'unknown' so spawn sites that have not yet threaded it
keep working without a metric shape change.

Tests:
- The new metrics + logging suites (7 / 7) verify registry shape
  and event framing; orchestrator-side metric integration is
  exercised through the existing critique-conformance and
  critique-adapter-degraded suites (22 / 22 still green).
- Logger test reassigns process.stdout.write directly instead of
  vi.spyOn so the Node overloaded write signature does not
  collide with MockInstance<unknown>.

* feat(observability): Grafana dashboard JSON for Critique Theater

Three default rows mapping to the metrics this branch wires up:

1. Fleet quality: composite score p50 / p90 / p99 line graph by
   adapter, plus a heatmap of the composite distribution. The
   line graph answers 'are my agents getting better over time';
   the heatmap answers 'are the bad runs clustered around one
   adapter or smeared across the fleet'.

2. Adapter health: stacked bar charts for degraded marks (by
   adapter / reason) and parser errors (by adapter / kind) over
   a 5-minute window. The two queries together let an operator
   see 'is this adapter degraded because of malformed wire output
   or because of oversize blocks' without flipping panels.

3. Brief throughput: runs-per-hour by terminal status, an average
   rounds-per-run stat per adapter, and a round-duration ms p50 /
   p90 / p99 line. Throughput numbers fall straight out of the
   runs_total / rounds_total counters; the duration histogram is
   the same one the runs feed.

The dashboard uses a templated $datasource var (defaults to
'prometheus') so an operator with multiple Prometheus instances
can switch without editing JSON. Schema version 39 (Grafana 11).

Operators import via:

  pnpm dlx @grafana/cli dashboard import     tools/dev/dashboards/critique.json

or paste into a provisioned dashboards directory. The file is
checked into the repo as a starting artifact; alert rules and
SLO panels ship after the first 1000 runs inform the right
thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity
checked locally).

* feat(daemon): OpenTelemetry outer span around the critique run

Wraps each runOrchestrator call in a 'critique.run' span via the
existing @opentelemetry/api dep added in the Phase 12 foundations
commit. Attributes set on the span:

- critique.run_id, critique.adapter, critique.skill at start
- critique.final_status, critique.final_composite on terminal
  resolution
- span status flipped to ERROR for failed / timed_out runs so a
  Tempo / Honeycomb / Jaeger filter on traces.status=error
  surfaces the right slice without joining back to Prometheus

No exporter is wired by default; @opentelemetry/api is the API
package and intentionally splits from @opentelemetry/sdk-*, so
the span is zero-overhead until an operator attaches an SDK
through their runtime config.

Inner per-round / parse_chunk / scoreboard_eval / persist_round /
ship.persist spans defined in the Phase 12 plan are a follow-up:
the outer span alone gives the trace a duration + final status +
adapter/skill labels, which is the 80% value for dashboards that
correlate runs across services. Adding child spans inside the
existing 600-line orchestrator without restructuring is a separate
careful change.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 29 / 29 critique + metrics + logging tests still green

* fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump

nix-check failed on PR #1485 with hash mismatch in
open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after
the Phase 12 foundations commit (2b8b7445) added prom-client and
@opentelemetry/api to apps/daemon/package.json and refreshed
pnpm-lock.yaml.

CI reported the new sha:
  specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8=
  got:       7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s=

Both nix files pin the same workspace lockfile, so both flip in
lockstep. No other Nix surface changes required.

* fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2)

1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted
   agent values). The new observability path now records rs.composite
   and rs.mustFix (daemon-authoritative) instead of event.composite
   and event.mustFix when rs exists, and skips the bumps + log
   entirely when rs is missing (a degenerate round_end without any
   matching panelist_open). The dashboard p50 / p90 / p99 now agrees
   with persistence and ship decisions; an adapter reporting <ROUND_END
   composite='10'> while the daemon computed 6 logs 6 and still emits
   the composite_mismatch parser warning the prior block was already
   producing.

2. Codex P2 in server.ts (skill label always 'unknown'). The spawn
   path called runOrchestrator without passing the resolved skill id,
   so every live run bumped open_design_critique_*{skill='unknown'}
   and the per-skill dashboard breakdown was always empty. Threaded
   effectiveSkillId (already computed at the same handler scope as
   the project skill fallback) through skill: . . . so the metric
   reflects the real skill when one is assigned, and the orchestrator
   default of 'unknown' only fires for runs that genuinely have none.

3. Codex P2 in conformance.ts (protocol-version mismatch let through).
   An adapter that emitted <CRITIQUE_RUN version='2'> followed by a
   valid SHIP classified as shipped because the harness only watched
   for terminal events. Added a guard inside the parse loop: if a
   run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION,
   mark the adapter degraded with reason 'protocol_version_mismatch'
   (already in DEGRADED_REASONS) and return early. ConformanceOutcome
   union widened to accept the new reason.

4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour
   panel under-reported by 3600x). 'rate(...[1h])' returns per-second.
   Multiplied by 3600 so the panel title and unit match the actual
   value rendered.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites (7), existing adapter-degraded (7),
  conformance (5), rollout (10): 29 / 29 green
- Grafana JSON re-parses with node -e 'JSON.parse(...)'

* fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485)

* fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 22:11:27 +08:00
Nagendhra Madishetti
385e1d111d
feat(web): Critique Theater Phase 9 (drop-in mount wrapper, native i18n for de, ja, ko, zh-TW) (#1315)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* fix(web): address PerishCode P3 polish items on PR #1315

Three non-blocking findings from PerishCode's latest review folded
in alongside the merge-with-main work.

1. bestRoundOf / bestCompositeOf could pair mismatched values.
   The split helpers walked state.rounds independently: bestRoundOf
   returned the LAST round with a numeric composite; bestCompositeOf
   returned the MAX composite. With non-monotonic composites (round
   1 at 8.5, round 2 at 6.0), the user-initiated interrupt shipped
   { bestRound: 2, composite: 8.5 } which is a pair that never
   existed and the collapsed badge advertised forever (the reducer's
   interrupted phase is sticky). Folded into a single
   bestRoundAndComposite() that walks once and returns the matching
   pair. Multi-round regression test in CritiqueTheaterMount.test.tsx
   pins the fix.

2. critiqueTheater.interruptedSummary was missing from de.ts,
   ja.ts, ko.ts, zh-TW.ts. The Phase 9 native-locale work translated
   39 of the 40 critiqueTheater.* keys; this one slipped through
   because the rest of the key block uses the locale-specific
   interrupted line as an anchor. The collapsed badge would have
   shown the English fallback string in an otherwise native UI on
   every interrupted run. Added native translations to all four
   locales (matching PerishCode's suggested copy).

3. InterruptButton's window-scope Escape handler could collide with
   Esc-to-dismiss on modals / popovers elsewhere on the page. The
   prior fix filtered text-entry focus but still fired from any
   body-level Esc, which double-handles when a non-text dismissable
   surface is open. New gate defers when a [role="dialog"] (or any
   aria-modal="true" element) without aria-hidden="true" is on the
   page AND the Escape didn't originate inside .theater-stage. Three
   new InterruptButton.test.tsx cases pin the deferral, the
   aria-hidden exemption, and the in-stage exemption.

Verified: typecheck clean, 111 / 1007 tests green.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Pulled the corrected pattern from the Phase 10 commit:
- snapshot runId / bestRound / composite at click time
- dispatch interrupted only on res.ok
- on rejection or non-2xx, clear interruptPending and log in dev so
  the user can retry and the real terminal event still wins

Tests updated: rejection and non-2xx cases now assert the UI stays
in running phase; the 204 cases wait for the async ack with waitFor.
Web typecheck clean.

* fix(web): drop shadow hasValidVariantShape; delegate to isPanelEvent (PerishCode follow-up on PR #1315)

The wire-layer guard in `apps/web/src/components/Theater/state/sse.ts` carried a `hasValidVariantShape` second-pass filter and a docstring claiming `isPanelEvent` from contracts was a header-only check ("missing runId, unknown type"). That stopped being true after the Siri-Ray round-3 fix on PR #1314: `isPanelEvent` is now the strict guard, validating every variant's required fields, closed-enum membership against PANELIST_ROLES / SHIP_STATUSES / DEGRADED_REASONS / FAILED_CAUSES / PARSER_WARNING_KINDS / ROUND_DECISIONS, finite numerics, and the threshold <= scale cross-field check.

Net effect: the shadow guard was strictly weaker than the contract guard above it. It could not reject any frame `isPanelEvent` already let through. The docstring also misled future readers into thinking the safety net was at the wire layer when it actually sits one layer up in contracts.

Two changes:

1. Drop `hasValidVariantShape` entirely. `sseToPanelEvent` collapses to the one-liner: spread the payload, pin `type` from the channel, return `isPanelEvent(candidate) ? candidate : null`. Docstring rewritten to describe `isPanelEvent` accurately as the strict guard, and to call out (with a forward reference to this commit) that the deletion was intentional.

2. Add three small wire-layer regression cases to `sse.test.ts` so a future accidental weakening of either layer fails loudly at the boundary the reducer actually sees:
   - drops a `critique.ship` with an unknown status (closed-enum check)
   - drops a `critique.panelist_close` with an unknown role (closed-enum check)
   - drops a `critique.round_end` with `composite: NaN` (finite-numeric check)

Verification:
- pnpm --filter @open-design/web typecheck: clean
- pnpm --filter @open-design/web exec vitest run tests/components/Theater/state/sse.test.ts: 22 / 22 green (19 prior + 3 new)
- Net diff: -84 +41 lines (-43 net); the shadow function and its stale docstring are gone, the three regressions land at half the cost

* fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 21:21:41 +08:00
Nagendhra Madishetti
c326492203
feat: Critique Theater Phase 15 (rollout resolver + Settings toggle hook) (#1320)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required wins, opt-out wins inversely)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.

10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - a open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.

10/10 vitest cases green (6 existing + 4 new).

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail (or worse, the
production claim 'in-session event keeps mounts coherent' was
hollow).

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314)

The variant validator on the web SSE path previously accepted any
`typeof === 'string'` for closed-enum fields (ship.status,
panelist_*.role, degraded.reason, failed.cause, parser_warning.kind,
run_started.cast[]) and any `typeof === 'number'` for numeric fields,
which let NaN / Infinity through. Downstream components index i18n
tables by enum value, so an unknown status or role would land
`SHIP_BADGE_KEY[final.status]` on undefined and crash the translator.

The replay parser had a separate gap: `useCritiqueReplay.parseTranscript`
called the cheap `isPanelEvent` header check directly, so a recorded
line like `{"type":"ship","runId":"r"}` reached the reducer with
composite, status, round, artifactRef, summary all undefined and
TheaterCollapsed then called `final.composite.toFixed(1)` on undefined.

Resolution: move all wire-side validation into the contract guard.

- Export const arrays for the closed enums:
  SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS,
  ROUND_DECISIONS (PANELIST_ROLES already existed).
- Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the
  single deep validator: header (known type + non-empty runId) plus
  every variant-specific required field plus closed-enum membership
  plus Number.isFinite on every numeric field. Documented as the wire
  source of truth.
- Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent
  now relies entirely on the contract guard, and parseTranscript in
  useCritiqueReplay (which already uses isPanelEvent) gets the deeper
  validation for free.

Tests (TDD, red-first):

- packages/contracts/tests/critique.test.ts: 13 new cases pinning the
  strict guard directly (well-formed across every variant, every
  rejection path: unknown type, empty/non-string runId, unknown enum,
  non-finite numeric, missing variant field).
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for
  each closed-enum rejection on the wire path plus a positive sweep
  across every legal enum value across every variant.
- apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx:
  2 new cases for incomplete and unknown-enum transcript lines.

Verified:
- pnpm --filter @open-design/contracts test 4 files / 30 tests green.
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test 107 files / 976 tests green.

* docs: tighten Phase 14 reasoning from lefarcen review (PR #1319)

Four content gaps lefarcen flagged in the Phase 14 docs review,
addressed inline rather than deferred. The fifth item (scope-drift
between 'docs only' PR body and the cumulative stacked diff) is
handled by rewriting the PR body, not the docs.

1. Round exit conditions (lefarcen P2-1).
   docs/critique-theater.md §2 'Auto-converging rounds' now lists
   the five conditions that stop a run (threshold reached, round
   budget exhausted, per-round timeout, total timeout, user
   interrupt) with their default values. A user debugging a run
   that stopped at round 1 with composite 5.4 can read this list
   and find the matching cause without spelunking the orchestrator.

2. Prior-art comparison (lefarcen P2-2).
   New §1.5 'Why an in-CLI panel and not a third-party design lint'
   pre-answers the 'why not Figma lint / Adobe checker / Material
   You conformance' question. Three differences: rule engines vs
   generative reviewers, post-hoc vs in-loop, external service vs
   same-CLI-session.

3. Composite formula rationale (lefarcen P2-4).
   §2 now explains why each weight is set the way it is: critic
   gates correctness so it gets 0.4; brand / a11y / copy are
   secondary quality dimensions at 0.2 each; designer is at 0.0
   in v1 because aesthetic preference is not a ship gate. The slot
   stays in the schema so notes flow into the transcript and a v2
   config release can bump the weight without a wire-shape change.

4. v2 cast-config ownership (lefarcen P2-3).
   Both AGENTS.md files (daemon + web) now declare a 'Designer
   weight frozen at 0.0 until v2 cast config' invariant. The
   daemon side calls out where the SKILL.md frontmatter resolver
   lands (apps/daemon/src/critique/config.ts); the web side calls
   out where the Settings surface lands (apps/web/src/components/
   Settings/). A contributor reading either AGENTS.md before
   implementing v2 sees which module to touch first.

* docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319)

Two carryover items lefarcen flagged across the PR #1319 + #1320
reviews.

1. docs/critique-theater.md was sending users to
   /api/metrics/critique as the conformance-status check on
   malformed_block, but the Phase 12 metrics endpoint is
   explicitly deferred until after orchestrator wiring lands.
   Replaced the link with the pnpm conformance-harness command
   that DOES exist today (pnpm --filter @open-design/daemon
   vitest run tests/critique-conformance.test.ts) and noted
   that the dashboard surfaces this status as a series once
   Phase 12 ships.

2. apps/web/src/components/Theater/AGENTS.md module map was
   stale after Phase 15: the index.ts row said 'only two hooks
   are exported' but the barrel now exports
   useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled
   setter). Updated the row to list all three hooks + the
   setter + the reducer-derived contract types, and added a
   new row for hooks/useCritiqueTheaterEnabled.ts in the file
   table so a web contributor scanning the table sees the new
   hook without inferring it from the index.ts blurb.

* docs(phase-15): clarify resolver + toggle scope in Section 3 (lefarcen P2 on PR #1320)

Two P2 doc-accuracy items from the latest review:

1. The prior text said "the web toggle persists into the daemon's
   settings store; both surfaces flip the same flag." That isn't
   true on this PR's head: setCritiqueTheaterEnabled writes
   localStorage and dispatches a same-tab CustomEvent, but does NOT
   write through to the daemon (no /api/settings/critique endpoint
   ships in Phase 15, and no production caller of the setter rounds
   through to the daemon). Reworded the section to describe the
   actual scope: client-side in-session toggle this phase, daemon
   round-trip deferred to the Settings UI follow-up.

2. The prior text implied the rollout resolver was wired into the
   spawn-time gate, but apps/daemon/src/server.ts still reads
   critiqueCfg.enabled directly; isCritiqueEnabled is only called in
   tests. Reworded the section to make this explicit: Phase 15 ships
   the resolver in isolation so it can land green and be reviewed on
   its own, the one-line wiring change ships in the wireup PR that
   follows.

Operators who want to enable the feature today should still set
OD_CRITIQUE_ENABLED=1 rather than relying on the client toggle. That
guidance is now explicit in the doc.

* docs(rollout): align module docblock with actual Phase 15 scope (lefarcen P2 on PR #1320)

The module-level docblock said the orchestrator entry and a
GET /api/settings/critique endpoint already consume isCritiqueEnabled,
but neither integration ships in this PR: the spawn-time gate in
server.ts still reads critiqueCfg.enabled directly, and the daemon
settings endpoint is deferred to the Settings UI follow-up. A future
contributor reading the source comment would assume rollout phases
and project overrides are live in generation routing when they are
not yet wired.

Reworded the docblock into three sections that mirror the user-facing
docs/critique-theater.md scope notes:

- What ships in Phase 15: the pure resolver function plus its
  supporting parsers, with full priority-matrix coverage.
- Planned consumers (not yet wired): the orchestrator entry (one-line
  swap, lands in the wireup PR), the settings echo endpoint (lands
  with the Settings UI PR), and the conformance harness.
- Operator guidance: until the wiring change lands, set
  OD_CRITIQUE_ENABLED=1 rather than relying on the client toggle.

Resolution-order bullet 2 also updated: per-project override 'will
write here once the Settings UI follow-up adds the daemon-side write
path' instead of implying that write path exists today.

Verified:
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/daemon typecheck clean.

No runtime change. Documentation-only alignment.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* fix(web): export useCritiqueTheaterEnabled + setter from Theater barrel (Siri-Ray P2)

The Phase 15 hook and its imperative setter were not in the public
barrel, even though Phase 14 AGENTS.md describes index.ts as
exporting them. That mismatch would force the Settings follow-up to
import from the private hooks/ path (or render the AGENTS module map
inaccurate).

Added the export alongside useCritiqueStream and useCritiqueReplay
so the Phase 15 public surface matches the module map.

* fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge

* fix(contracts): restore numeric-domain guards in isPanelEvent (lefarcen + Siri-Ray P2 on PR #1320)

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 21:19:51 +08:00
Dongsen
8e6250f2c1
feat(web): render GFM tables in markdown artifact and chat renderers (#1496)
* feat(web): render GFM pipe tables in markdown artifact and chat renderers

Both hand-rolled markdown parsers (artifact preview and assistant chat)
ignored pipe-table syntax, so GFM tables collapsed into a single
paragraph. Add a table block to each parser that:

- detects header+alignment+body rows;
- honors :--- / :---: / ---: column alignment;
- preserves inline formatting (code, bold, italic, links) inside cells;
- restores escaped \| as a literal pipe;
- breaks a preceding paragraph at a table start without a blank line.

Closes #1495

* fix(web): use scan-based table cell splitter, drop NUL placeholder

Address review on #1496:

- lefarcen (P1): the placeholder approach in both splitters was unsafe.
  artifacts/markdown.ts embedded literal NUL bytes around 'ODPIPE', which
  flipped the whole file to binary in GitHub diffs and hid the patch from
  review. runtime/markdown.tsx used a printable ' ODPIPE ' sentinel that
  could collide with real cell content.
- codex (P2): both splitters ran before inline parsing, so a literal '|'
  inside a backtick code span (e.g. a TypeScript union cell like
  \`\"ready\" | \"done\"\`) was treated as a column boundary and shredded
  the row.

Both issues collapse to one root cause — splitting by regex on '|'
without knowing what's a code span and what's an escape. Replace the
placeholder-then-split routine in both files with a single character-
level state-machine scanner that:

- treats '|' inside backticks as cell content (tracks inCode);
- resolves '\\|' to a literal '|' as it accumulates each cell;
- strips a single optional leading '|' and a single unescaped trailing
  '|' as row terminators rather than empty cells.

No placeholders, no NUL bytes, no collision surface. Add a test in each
test file covering the GFM-style pipe-inside-code-span cell, and the
runtime test additionally asserts the row produces exactly two <td>
cells (i.e. no phantom column from the embedded pipe).

* style(web): make markdown table apply in artifact viewer + add header bg, borders

The prior `.md-table` rules were scoped to `.prose-block`, which only
wraps the chat assistant path (AssistantMessage). The artifact preview
in FileViewer renders into `.markdown-rendered`, so tables in long-form
documents (the most common case) inherited no styling and showed as
borderless, structureless rows.

Lift the rule out of `.prose-block` so both paths share it. Visual
treatment matches the existing prose-block ecosystem (blockquote,
markdown-status):

- `var(--border)` for cell + outer borders, 6px outer radius.
- `var(--bg-panel)` on `<th>` for the header band — same token already
  used by blockquote and code-copy chip.
- Even-row zebra via `color-mix(in oklab, var(--bg-panel) 40%, transparent)`
  for a very faint stripe that holds up in both light and dark themes
  (no hardcoded color).
- Padding bumped from 4×8 to 8×12; cells were touching before.
- `display: block; overflow-x: auto` kept so wide tables scroll inside
  narrow columns instead of pushing layout.

* fix(web): wrap md-table in scroll container so columns fill width

Previously `.md-table` itself was `display: block; overflow-x: auto`
to let wide tables scroll horizontally. The side effect: a block-mode
table no longer behaves like `<table>` for layout, so columns collapse
to natural content width and leave empty space on the right when the
content is narrower than the container.

Split the two concerns: the new `.md-table-wrap` <div> owns the scroll
viewport (border, radius, overflow-x), and the inner `<table>` keeps
its native `display: table` with `width: 100%` so column widths
distribute across the available space again. Wide tables still scroll;
narrow tables now fill the container.

Both render paths (chat runtime + artifact HTML) emit the wrapper, and
the corresponding unit tests assert the nested shape.
2026-05-13 21:15:23 +08:00
pftom
d3d95121f3 feat(plugins): enhance visual score sorting and add new example templates
- Updated the `sortByVisualAppeal` function to prioritize featured ranks, ensuring that curated plugins are displayed prominently.
- Added tests to verify the new sorting logic, ensuring that plugins with numeric featured ranks are sorted correctly ahead of others.
- Introduced new example templates for a magazine article layout, a Twitter share card, and a Xiaohongshu card, expanding the available options for users.
- Enhanced the overall plugin preview experience by integrating these new templates, providing users with more visually appealing and functional examples.

This update significantly improves the plugin sorting mechanism and enriches the template offerings, enhancing user engagement and experience.
2026-05-13 21:02:05 +08:00
pftom
8b2d48a258 feat(daemon, web): enhance plugin preview handling and add new templates
- Introduced logic to assemble example slides with a companion template when the declared entry is missing, improving the user experience for plugin previews.
- Updated the server logic to handle special cases for `example-slides.html`, ensuring proper fallback to `template.html` when applicable.
- Enhanced tests to verify the new preview assembly functionality and ensure correct rendering of fallback content.
- Added new HTML and Markdown examples for various skills, including a magazine article layout and a Twitter share card, expanding the available templates for users.

This update significantly improves the plugin preview experience, providing users with more robust and visually appealing fallback options.
2026-05-13 20:58:24 +08:00
Caprika
06dbde51f9
[codex] Add Cursor Agent auth diagnostics (#1538)
* Add Cursor Agent auth diagnostics

* Handle Cursor not logged in auth status

* Address Cursor auth review feedback

* Classify Cursor stdout auth failures
2026-05-13 20:25:34 +08:00
Caprika
a3276ec542
[codex] Add visual draw annotation context (#1547)
* feat(web): add visual draw annotation context

* Fix visual draw annotation staging

* Fix concurrent visual annotation IDs
2026-05-13 20:02:19 +08:00
Sid
fb545b8d21
fix(web): count nested .slide elements in deck preview bridge (#1542)
* fix(web): count nested .slide elements in deck preview bridge

Generated HTML decks commonly nest .slide elements under an extra
wrapper rather than placing them as direct children of .deck /
.deck-stage / .deck-shell / body. The bridge then reported count: 0
and the preview toolbar showed 1 / 0 even though the deck visibly
contained slides and its own keyboard handler navigated them.

Keep the structured selectors first so decorative .slide markup in
non-deck pages is not counted, and fall back to all .slide only when
nothing structured matched. Pinned with a regression test that runs
the extracted deck bridge script against a JSDOM nested-slide layout.

Fixes #1530

* test(web): pin deck bridge structured-first precedence with decoy .slide

Address review feedback on #1542: the existing structured-first case
expected count === 3 but the fixture only contained the three real
slides, so a regression that dropped the structured selector entirely
and went straight to `.slide` would still pass. Add a decoy `.slide`
in a sibling <header> outside `.deck`; the structured-first pass
keeps count === 3, while a naive broad selector would now count 4.

Verified by temporarily collapsing slides() to `document.querySelectorAll('.slide')`
and observing the second case fail with `expected 4 to be 3`.
2026-05-13 19:42:20 +08:00
Prantik Medhi
086be271d4
fix: hide preview chrome in source view (#1556)
* fix: hide preview chrome in source view

* fix: keep source-view edit controls
2026-05-13 19:12:00 +08:00
Prantik Medhi
660c5b88b4
fix(web): auto-scroll feedback form (#1566) 2026-05-13 19:06:08 +08:00
lefarcen
fa2bb59ab3 fix(test): align FileViewer tests with merged component
- Drop FileViewer.inspect-empty-hint.test.tsx (deleted on main; should
  have been gone after the merge but the modify/delete conflict left it
  in the tree).
- Take main's FileViewer.test.tsx so the manual-edit interaction tests
  match the merged FileViewer.tsx behaviour, then patch every
  <FileViewer> render with projectKind="prototype" so it satisfies
  the prop requirement release added in #1509.
2026-05-13 18:49:44 +08:00
lefarcen
4096d65125 fix(web): align Icon names and FileViewer tests with merged state
- SettingsDialog: use 'sparkles' for Pet section nav icon (main's choice;
  the merge picked up release's 'paw' which is not in main's IconName union).
- FileViewer.test.tsx: take release's version which passes projectKind on
  every render (release added that prop in #1509 analytics; main's tests
  predate that prop).
- FileViewer.manual-edit{,-history}.test.tsx: keep main's tests but pass
  projectKind="prototype" so they satisfy the merged FileViewer Props.
2026-05-13 18:40:48 +08:00
lefarcen
5172e37217 Merge origin/main into release/v0.7.0 to prepare merge-back PR
Resolves 7 conflicts via hybrid strategy:
- apps/web/src/components/EntryView.tsx: take main (Discord+X pills are forward feature)
- apps/web/src/components/Icon.tsx: take main (switch-case refactor)
- apps/web/src/components/NewProjectPanel.tsx: take release (preserve #1514 dropdown UX validated in 0.7.0 acceptance)
- apps/web/src/index.css: take main (project-target-platforms / instructions chip styles)
- apps/web/tests/components/FileViewer.inspect-empty-hint.test.tsx: accept main's deletion
- nix/package-daemon.nix, nix/package-web.nix: take main pnpmDepsHash

Non-conflicting hunks from #1519 (AppChromeHeader), #1428 (PostHog analytics
call sites), and #1540 (release light background) are preserved via auto-merge.
2026-05-13 18:19:47 +08:00
Prantik Medhi
9040088f1c
fix(web): remove redundant bulk-select button (#1550) 2026-05-13 16:38:14 +08:00
kami
4f76e836ae
feat(audio): add ElevenLabs audio support (#1384)
* docs: add ElevenLabs audio support design

* docs: add ElevenLabs audio implementation plan

* feat(daemon): add ElevenLabs speech renderer

* feat(daemon): add ElevenLabs sound effects renderer

* fix(daemon): preserve ElevenLabs sfx durations

* feat(web): expose ElevenLabs media providers

* feat(daemon): document ElevenLabs audio contract

* feat(audio): add ElevenLabs voice selection

* chore: ignore superpowers scratch docs

* fix(daemon): cache ElevenLabs voice options

* fix(audio): expand ElevenLabs voice and SFX selection

* fix(audio): align ElevenLabs SFX controls

* fix(audio): tighten ElevenLabs SFX prompt budget

* fix(audio): preflight ElevenLabs SFX prompt length

* fix(audio): surface ElevenLabs lookup failures

* fix(audio): sanitize ElevenLabs prompt errors
2026-05-13 15:53:41 +08:00
PerishFire
11b4750677
Update release light background (#1540) 2026-05-13 15:36:22 +08:00
Rocky
cd68c8a80a
fix(daemon+web): emit conversation-created SSE event when routine run starts (#1523)
* fix(daemon+web): emit conversation-created SSE event when routine run starts

When a Routine fires in "Reuse an existing project" mode, the daemon
creates a new conversation in the project and writes a queued/running
assistant message to the database, but the open `ProjectView` has no
way to learn that anything happened: the project events SSE stream
only carries `file-changed` and `live_artifact*` events, and
`ProjectView` reloads conversations only when `project.id` changes.
The result is the user's own routine "Run now" appears to do nothing
until they exit and re-enter the project (#1361).

Fix:

- Add a `conversation-created` payload type to the existing project
  events stream in `apps/web/src/providers/project-events.ts`. The
  payload carries `projectId`, `conversationId`, `title`, and
  `createdAt`. Mirror the existing `file-changed` listener pattern
  with explicit malformed-payload handling.
- In `apps/daemon/src/server.ts`, after `insertConversation` runs in
  the routine `setRunHandler` (both reuse-an-existing-project and
  new-project paths), broadcast a `conversation-created` event
  through the existing `activeProjectEventSinks` map. The function
  body was already generic so it was renamed from
  `emitProjectLiveArtifactEvent` to `emitProjectEvent` and the two
  pre-existing callers updated.
- In `apps/web/src/components/ProjectView.tsx`, when
  `handleProjectEvent` receives a `conversation-created` event whose
  `projectId` matches the currently-viewed project, refetch the
  conversation list via `listConversations`. The active conversation
  is intentionally NOT changed — per maintainer guidance on #1361,
  auto-switching is a separate UX decision left for a follow-up.
- `projectEventToAgentEvent` returns null for `conversation-created`
  so it doesn't get routed into the live-artifact path.

Tests (`apps/web/tests/providers/project-events.test.ts`):

- A single `conversation-created` event reaches the consumer with
  the parsed payload.
- Two consecutive `conversation-created` events from concurrent
  routine runs both reach the consumer (covers the
  multiple-concurrent-runs case reported in #1502).
- Malformed `conversation-created` payloads are swallowed without
  throwing, matching the existing `file-changed` / `live_artifact`
  defensive behavior.

Manual verification:

- Built locally with `pnpm exec tools-pack mac build --to app
  --portable` and installed.
- Created a routine in `Reuse an existing project` mode targeting an
  existing project.
- With the project view open, clicked `Run now`. The new "Routine"
  conversation appeared in the project's conversation list within
  about a second, without exiting and re-entering the project, and
  the active conversation was not changed.
- Clicked `Run now` twice in quick succession; both new
  conversations appeared in the list, covering the concurrent-runs
  case in #1502.
- `pnpm guard` and `pnpm --filter @open-design/web typecheck` clean;
  full web test suite is 1016/1016 passing.

Fixes #1361

* fix(#1523 review): share SSE type via contracts; guard conversation refresh against project-switch and reordering races

Addresses Codex P1 + lefarcen P2 inline review on #1523 (#1361):

1. Move `ProjectConversationCreatedSsePayload` to
   `@open-design/contracts` (`packages/contracts/src/sse/chat.ts`)
   so the daemon producer and the web consumer share one type. The
   web provider re-exports it under the local
   `ProjectConversationCreatedEvent` name to keep the existing import
   shape stable for callers; the daemon emit site picks up the same
   shape via a JSDoc typedef so producer and consumer can't drift as
   this stream grows. (Addresses lefarcen P2 on project-events.ts:17.)

2. Guard the `conversation-created` async refresh in `ProjectView`
   against two distinct races:

   - Project-switch race: capture `project.id` at dispatch time and
     re-check it via a live `projectIdRef` after `listConversations`
     resolves; bail if the user switched projects while the request
     was in flight. The existing project-load effects use the same
     cancellation pattern. (Addresses Codex P1 on ProjectView.tsx:767.)

   - Concurrent-refresh re-ordering race: bump a monotonic
     `conversationsRefreshTokenRef` on every dispatch and capture each
     request's token; only the request whose captured token still
     equals the live ref at await-return applies its result. Two
     rapid `conversation-created` events (the #1502 concurrent
     Run-now case) can no longer drop the newest conversation when
     the earlier request resolves last with a stale, shorter list.
     (Addresses lefarcen P2 on ProjectView.tsx:767.)

Both guards are documented inline with comments that point back at the
review threads. The existing project-events tests (single delivery,
concurrent delivery, malformed payloads) are unchanged — the new
guards are defensive logic on the consumer, not new event shapes.

`pnpm guard`, `pnpm --filter @open-design/web typecheck`,
`pnpm --filter @open-design/daemon typecheck`, and the full web test
suite (1016/1016) remain green.
2026-05-13 14:50:58 +08:00
pftom
9e196d34af feat(daemon, web): enhance plugin sharing workflows and UI components
- Updated the plugin sharing prompts to utilize local daemon endpoints for publishing to GitHub and contributing to Open Design, streamlining the user experience.
- Refactored the `PluginsView` and `PluginShareMenu` components to support new sharing functionalities, including confirmation modals and improved link handling.
- Enhanced the CSS styles for the plugin share confirmation modal and related UI elements for better visual consistency.
- Added tests to verify the functionality of the new sharing workflows and ensure proper integration within the existing plugin management system.

This update significantly improves the plugin sharing experience, making it easier for users to publish and contribute their plugins effectively.
2026-05-13 14:35:09 +08:00
Sid
eda182c8a1
refactor(web): UI polish for v0.7.0 — neutralised palette, official brand glyphs, lucide (#1522)
* refactor(web): adopt lucide-react for the inline Icon component

The hand-rolled `<Icon>` set drifted in stroke weight and proportion across
its 50+ glyphs as new icons were added. Swap the implementation to dispatch
to `lucide-react` while keeping the same `<Icon name="..." size={X} />` API
so the 246 existing call sites stay untouched.

- Adds `lucide-react` as a dependency (tree-shaken; ~30KB gzipped for the
  ~50 icons we actually import).
- `discord` and `x-brand` keep their bespoke inline SVG paths since lucide
  intentionally does not ship brand artwork.
- `spinner` continues to use the existing `.icon-spin` className for its
  rotation; under the hood it now renders lucide's `Loader2`.
- New `paw` glyph (lucide `PawPrint`) so the Pets nav item stops sharing
  the `sparkles` icon with External MCP.

No behaviour change: the prop surface is identical, fill follows
`currentColor` exactly as before, and aria-hidden / focusable defaults are
preserved. Visual deltas are limited to the strokes themselves (slightly
finer endcaps, more consistent baseline weights) — exactly the
consistency upgrade lucide gives us.

* feat(web): bundle official brand assets for agent icons

`AgentIcon` previously approximated each agent's brand with hand-drawn
SVG (orange Anthropic-ish sparkle, OpenAI-knot ellipses, etc). Replace
those approximations with the real, vendor-published artwork shipped as
static assets under `apps/web/public/agent-icons/`.

- 13 SVG marks sourced from `@lobehub/icons-static-svg` (MIT) — color
  variants where the vendor published one (Claude, Codex, Gemini,
  Copilot, Qwen, Qoder, DeepSeek, Kimi, Mistral/Vibe), monochrome marks
  for the rest (Cursor, OpenCode, Hermes, MiMo, Pi, Kilo).
- 1 PNG mark (Devin) sourced from devin.ai/icon.png, resized to 96×96
  via `sips` since Cognition doesn't publish an SVG.
- Each SVG was cleaned (stripped `<title>` brand text and the library's
  internal `style="flex:none;..."` ; dropped `width/height="1em"` so
  `viewBox` governs sizing) and run through `svgo --multipass`. Total
  bundle footprint: ~36 KB for all 17 files, only loaded on the agent
  cards that render them.
- `AgentIcon` now resolves brands via a small `ICON_EXT` table and
  renders `<img src="/agent-icons/<id>.<ext>">`. Agents without an asset
  (`devin` is the lone outlier removed in this commit because PNG; new
  agents with no shipped artwork at all) fall back to an initial-letter
  pill that reads as "no official mark yet" rather than inventing
  brand artwork.
- Removes the `simple-icons` dependency from a previous iteration since
  `AgentIcon` was its only consumer.

Public-API stable: `<AgentIcon id={a.id} size={X} />` still accepts the
same prop shape; `AvatarMenu`'s small-size usage continues to work.

* refactor(web): polish entry view + Settings dialog UI for v0.7.0

A sweep over the two surfaces that have the most visual surface area in
the app (the entry sidebar / New Project panel on the left, and the
Settings modal). The work converged on a single neutral palette + a
small set of shared dimensional standards documented in CSS, so future
sections that get added slot into the same rhythm.

New Project panel (apps/web/src/components/NewProjectPanel.tsx +
.newproj* rules in index.css)
- Adds a spec comment block at the top of the .newproj rules listing
  the canonical heights (input 30, dropdown 38, compact toggle 36,
  popover item 38) and the neutral colour rules.
- Rebuilds PlatformPicker as a DS-picker-style dropdown trigger +
  popover (the previous 6-card 2×3 grid was ~280px tall; the dropdown
  collapses to a single 38px row with the same multi-select semantics).
- Replaces SurfaceOptions' two heavy `ToggleRow` cards with the new
  compact one-line `CompactToggle`; the descriptive hint moves to a
  native `title` tooltip.
- Compresses the Fidelity card grid (thumb aspect 16/7 → 16/5, tighter
  padding, smaller label).
- Neutralises every selected/active state inside the panel: removes the
  orange accent fills and rings from `.newproj-card.active`,
  `.newproj-title-badge`, `.compact-toggle.on`, `.toggle-row.on`, the
  DS picker popover items + radio/check marks, the trigger open border
  and shadow, and the search-bar background. The Create CTA stays the
  only orange element on the panel.
- Aligns the project-name input focus state across the sidebar:
  border `var(--text)` + 8% black halo (rgba is written out because the
  CSS pipeline collapses `color-mix(... 8%, transparent)` down to a
  solid `var(--text)`, which would render as a 3px solid black band).
- Switches the body card from `flex: 1 1 auto` to `flex: 0 1 auto` so a
  short form variant doesn't leave a white void at the bottom of the
  card, and disables overscroll-bounce on the card so a fast scroll
  doesn't briefly expose the page-level gray under the white surface.
- Pins the privacy footer below the card with a fixed 0 margin-top +
  shorter padding-top so it reads as a label of the card rather than a
  centred dialog footer.

Entry sidebar footer (apps/web/src/components/EntryView.tsx +
.entry-side-foot* rules)
- Replaces the X social pill's `external-link` placeholder glyph with a
  bespoke filled `x-brand` SVG that mirrors the `discord` mark already
  in the icon set.
- Wraps Discord + X in `.entry-side-foot-social` and lets that group
  flex-margin to the right of the row, so the two social pills read as
  a tight pair instead of a fourth pill stuck to the Pet pill.
- Drops the "unadopted" red dot on the Pet pill (it duplicated the call
  to action that the label already carried).
- Shrinks the footer icons to 10px and dims them to 55% / 75% opacity
  on hover so the labels are clearly the focal point — `currentColor`
  on the lucide-rendered SVGs would otherwise make the glyphs full
  black on hover.
- Tightens the env-pill version text cap (180 → 142) so the top row
  ends close to the right edge of the Language + Pet group below it.

Settings dialog (apps/web/src/components/SettingsDialog.tsx +
.modal-settings / .settings-* / .seg-* / .agent-* rules)
- Removes the "SETTINGS" kicker eyebrow above each section title (the
  big-typography title and modal context already make it redundant).
- Switches the sidebar from a card-per-item layout to ChatGPT-style
  single-line pills: hides the `<small>` description, swaps the
  sidebar bg from gray to white, makes the active item a gray pill (no
  border, no shadow) so all items keep a consistent row height
  regardless of state.
- Drops the modal-body's top border (already separated by the
  whitespace between modal-head and the body grid) and pins
  `.modal-settings { height: min(720px, 100vh - 64px) }` so the
  dialog no longer resizes when the user switches between short and
  long sections.
- Compresses the Local CLI / BYOK seg-control from a 2-line ~52px card
  pair to a 1-line ~42px segmented pill that height-matches the active
  sidebar nav-item, and aligns the `.settings-content` padding-top
  with `.settings-sidebar` (22 → 16) so the first content row sits
  level with the first sidebar item.
- Neutralises agent-card selected state, install/docs link colour, and
  protocol-chip active state — same accent-stripping pattern as the
  New Project panel.
- Uniform agent-card height via `min-height: 64px` so installed cards
  (icon + name + version) align with unavailable cards (icon + name +
  not-installed + Install/Docs row).

No prop-API changes, no business-logic edits — this is a pure visual
refactor. Existing tests, providers and daemon contracts are untouched.
2026-05-13 13:59:19 +08:00
Jesse Yu
b2841f6045
Fix user chat message bubble styling (#1517)
Co-authored-by: Haoyuan Yu <haoyuan.yu@shopee.com>
2026-05-13 13:59:15 +08:00
Caprika
6736310a01
Implement manual edit inspector (#1448)
* feat(web): tweaks palette popover with HSL hue-shift recoloring

Adds a Tweaks color-palette popover to the HTML preview toolbar.
Selecting a palette re-skins the iframe in place via a srcDoc-side
bridge that walks the DOM and shifts every chromatic paint to the
target hue while preserving each color's saturation and lightness —
pale tints stay pale, bold CTAs stay bold, just in the new color
family. Mono-noir desaturates instead of shifting.

- runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options
- file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected
- FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration
- PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir
- PreviewDrawOverlay: stub pass-through until the draw branch lands

* feat(web): hide finalize-design toolbar from project header

* test(e2e): skip project actions toolbar flow after toolbar removal

* Polish manual edit inspector

* Implement manual edit inspector

* Fix manual edit review regressions

* Fix FileViewer CI regressions

* Fix remaining manual edit review issues

* Flush manual edit styles before draw exit

* Restore Critique Theater styles

* Accept pixel line-height manual edits

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-13 13:25:58 +08:00
pftom
c9cc3b88c0 feat(web): standardize plugin terminology and enhance UI components
- Updated terminology from "Community" to "Official" across various components to reflect first-party plugin status.
- Enhanced the ChatComposer, HomeHero, and PluginsHomeSection components to improve user experience and clarity in plugin management.
- Improved CSS styles for better visual consistency and layout across plugin-related interfaces.
- Added tests to ensure proper functionality and visibility of official plugins in the UI.

This update reinforces the distinction between official and user-installed plugins, enhancing the overall user experience in plugin interactions.
2026-05-13 12:19:29 +08:00
nettee
0f0d2879ff
Make de/fr/ru content i18n optional (#1511) 2026-05-13 12:17:17 +08:00
lefarcen
dc7791ef9d
feat(analytics): add project_id + project_kind to studio/artifact events (#1509)
Product tracking doc 260513 added project_id + project_kind to
studio_view (artifact), studio_click (share_option), and
artifact_export_result. The Studio funnel can now group by project
type without joining run_created on the back end.

- contracts: 3 props gain required project_id + project_kind
- ProjectView → FileWorkspace → FileViewer: thread projectKind down,
  converting metadata.kind via projectKindToTracking once at the top
- FileViewer + HtmlViewer: populate the three call sites
2026-05-13 12:13:55 +08:00
Siri-Ray
c16297f10c
Refine preview and project dropdown controls (#1514)
* Refine preview and project dropdown controls

* fix(web): gate OS widget metadata

Generated-By: looper 0.7.4 (runner=fixer, agent=codex)

* fix(web): mark platform picker listbox multi-select

Generated-By: looper 0.7.4 (runner=fixer, agent=codex)
2026-05-13 12:13:31 +08:00
Nagendhra Madishetti
e2f409579d
docs: Critique Theater Phase 14 (user guide + 2 AGENTS module maps) (#1319)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* docs: tighten Phase 14 reasoning from lefarcen review (PR #1319)

Four content gaps lefarcen flagged in the Phase 14 docs review,
addressed inline rather than deferred. The fifth item (scope-drift
between 'docs only' PR body and the cumulative stacked diff) is
handled by rewriting the PR body, not the docs.

1. Round exit conditions (lefarcen P2-1).
   docs/critique-theater.md §2 'Auto-converging rounds' now lists
   the five conditions that stop a run (threshold reached, round
   budget exhausted, per-round timeout, total timeout, user
   interrupt) with their default values. A user debugging a run
   that stopped at round 1 with composite 5.4 can read this list
   and find the matching cause without spelunking the orchestrator.

2. Prior-art comparison (lefarcen P2-2).
   New §1.5 'Why an in-CLI panel and not a third-party design lint'
   pre-answers the 'why not Figma lint / Adobe checker / Material
   You conformance' question. Three differences: rule engines vs
   generative reviewers, post-hoc vs in-loop, external service vs
   same-CLI-session.

3. Composite formula rationale (lefarcen P2-4).
   §2 now explains why each weight is set the way it is: critic
   gates correctness so it gets 0.4; brand / a11y / copy are
   secondary quality dimensions at 0.2 each; designer is at 0.0
   in v1 because aesthetic preference is not a ship gate. The slot
   stays in the schema so notes flow into the transcript and a v2
   config release can bump the weight without a wire-shape change.

4. v2 cast-config ownership (lefarcen P2-3).
   Both AGENTS.md files (daemon + web) now declare a 'Designer
   weight frozen at 0.0 until v2 cast config' invariant. The
   daemon side calls out where the SKILL.md frontmatter resolver
   lands (apps/daemon/src/critique/config.ts); the web side calls
   out where the Settings surface lands (apps/web/src/components/
   Settings/). A contributor reading either AGENTS.md before
   implementing v2 sees which module to touch first.

* docs(web): mirror the Designer-weight invariant in Theater AGENTS.md (PR #1319)

lefarcen P1 follow-up on PR #1319: the daemon AGENTS.md already
declares 'Designer weight is frozen at 0.0 until v2 cast config
lands' as an invariant, but the web AGENTS.md's parallel bullet
led with 'Composite weights are read-only on the web side' which
buried the Designer-specific constraint. A web contributor
reading that bullet would not realise the v1 weight distribution
is wire-shape (changing it mid-v1 invalidates persisted
critique_runs composite values).

Rewrote the bullet to lead with the same 'Designer weight is
frozen at 0.0 until v2 cast config lands' phrasing the daemon
side uses, and added an explicit cross-link to the daemon
AGENTS.md so the two halves of the invariant read as one rule.

Web-side specifics retained: ScoreTicker / TheaterCollapsed read
composite off the wire (no client recompute), v2 lands as a
Settings surface at apps/web/src/components/Settings/, do not
add a 'weights' prop to any component in this directory until
the contracts package carries the v2 cast type.

* docs: replace deferred metrics endpoint reference + refresh Theater module map (PR #1319)

Two carryover items lefarcen flagged across the PR #1319 + #1320
reviews.

1. docs/critique-theater.md was sending users to
   /api/metrics/critique as the conformance-status check on
   malformed_block, but the Phase 12 metrics endpoint is
   explicitly deferred until after orchestrator wiring lands.
   Replaced the link with the pnpm conformance-harness command
   that DOES exist today (pnpm --filter @open-design/daemon
   vitest run tests/critique-conformance.test.ts) and noted
   that the dashboard surfaces this status as a series once
   Phase 12 ships.

2. apps/web/src/components/Theater/AGENTS.md module map was
   stale after Phase 15: the index.ts row said 'only two hooks
   are exported' but the barrel now exports
   useCritiqueTheaterEnabled too (plus the setCritiqueTheaterEnabled
   setter). Updated the row to list all three hooks + the
   setter + the reducer-derived contract types, and added a
   new row for hooks/useCritiqueTheaterEnabled.ts in the file
   table so a web contributor scanning the table sees the new
   hook without inferring it from the index.ts blurb.

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 12:11:48 +08:00
pftom
fcc3ae5838 feat(web): enhance HomeHero and related components for improved context selection and visibility handling
- Updated the HomeHero component to support skill and MCP server mentions, allowing users to select these options seamlessly.
- Improved CSS styles for the HomeHero component, enhancing the visual presentation of active selections and context tabs.
- Refactored visibility handling for slides in the deck framework, ensuring proper display logic and preventing visibility issues with variant classes.
- Added tests to verify the functionality of context selection and visibility handling, ensuring a smoother user experience.

This update significantly enhances the user interface and interaction capabilities within the HomeHero component, improving the overall experience for users managing skills and presentations.
2026-05-13 11:48:15 +08:00
pftom
9006b74cde feat(web): enhance ChatComposer and related components with skill and MCP server integration
- Added support for skills and MCP servers in the ChatComposer, allowing users to apply skills and select MCP servers through mentions.
- Updated the HomeHero and ProjectView components to manage skill selection and project skill changes.
- Enhanced the EntryShell and other components to accommodate new skill and MCP server functionalities.
- Improved CSS styles for better visual presentation of new features.
- Added tests to ensure proper functionality of skill and MCP server integrations within the ChatComposer.

This update significantly improves the user experience by enabling seamless integration of skills and MCP servers into the chat interface, enhancing project management capabilities.
2026-05-13 11:47:51 +08:00
pftom
f7a13c7b15 feat(web): enhance HomeHero component to support IME composition handling
- Added a `composingRef` to track IME composition state, preventing submission during text composition.
- Implemented `isImeComposing` function to check if the input is currently being composed.
- Updated event handlers to ensure that submissions and plugin selections are blocked while composing.
- Added tests to verify that submissions and plugin picks do not occur during IME composition, improving user experience for non-Latin input methods.

This update enhances the input handling in the HomeHero component, ensuring a smoother experience for users utilizing IME for text input.
2026-05-13 11:26:10 +08:00
lefarcen
e2952acd05 Revert "fix(web): restore consistent app header layout (#1432)"
This reverts commit 3d3119333c.
2026-05-13 11:20:16 +08:00
pftom
c36609c47d feat(daemon, web): implement plugin sharing project creation and enhance CLI functionality
- Added new flags for conversation, message, agent, and model in the CLI to support enhanced plugin sharing features.
- Introduced a new API endpoint for creating share projects for plugins, allowing users to publish to GitHub or contribute to Open Design.
- Updated the UI components to facilitate the new sharing functionalities, including prompts for user input during the sharing process.
- Enhanced the project management system to handle new plugin share actions, improving user interaction and experience.
- Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system.

This update significantly enhances the plugin ecosystem by enabling users to share their creations more effectively and streamline collaboration.
2026-05-13 07:01:12 +08:00
Neha Prasad
342ba44383
fix memory extraction history affordance (#1447) 2026-05-12 13:35:34 -04:00
Siri-Ray
3d3119333c
fix(web): restore consistent app header layout (#1432)
* docs: add NotebookLM GitHub export script (#1062)

* docs: add NotebookLM GitHub export script

* fix: make NotebookLM export TOC anchors work

* fix: escape TOC link text markdown chars

* fix: include merged PRs when exporting --prs all

* fix: allow --prs merged mode

* fix: treat --limit as total export budget

* fix: avoid starving buckets under global --limit

* fix: support --issues none and handle repos w/ issues disabled

* fix: avoid underfilling export when buckets empty

* fix: keep disabled-issues fallback quiet

* fix: silence disabled issues fallback

* fix: satisfy script typecheck

* prevent duplicate saves and add template deletion (#1294)

* prevent duplicate template entries on repeated save

* add delete button to saved template list

Templates can now be removed from the template picker via a hover x button, calling the existing DELETE /api/templates/:id endpoint.

* add missing onDeleteTemplate prop in test fixtures

* add template deletion flow test for NewProjectPanel

* reject template names longer than 100 characters

* preserve original createdAt on template update

* feat: add FAQ page skill (#1162)

* fix: set writable OD_DATA_DIR default for nix run

Fixes #1157

When running via 'nix run github:nexu-io/open-design', the daemon
attempted to create runtime state under the Nix store package path:

  /nix/store/.../lib/open-design/.od/projects

The Nix store is read-only at runtime, causing startup to fail with
ENOENT when mkdir() tried to create the projects directory.

This commit updates the nix run wrapper to export OD_DATA_DIR with
a writable default ($HOME/.od) when the variable is unset. Users
can still override it by setting OD_DATA_DIR before running.

The Home Manager and NixOS modules already set OD_DATA_DIR, so they
are unaffected by this change.

* feat: add FAQ page skill

Add a new skill for generating Frequently Asked Questions pages with:
- Collapsible accordion sections for Q&A pairs
- Real-time search functionality
- Category filtering (Billing, Account, Technical, General)
- Smooth animations and transitions
- Keyboard navigation support
- Mobile-friendly responsive design
- Semantic HTML with proper ARIA attributes

The skill includes:
- SKILL.md with triggers, workflow, and output contract
- example.html demonstrating a complete FAQ page with 12 questions

Use cases: help centers, support pages, product documentation

* fix: address PR review feedback for FAQ page skill

- Fix craft slugs: use accessibility-baseline and state-coverage instead of non-existent slugs
- Remove overly broad 'questions and answers' trigger
- Add edge case handling for insufficient/excessive FAQs
- Remove search highlighting requirement (XSS risk)
- Update self-check to reflect filtering instead of highlighting

Addresses review comments from @lefarcen and @chatgpt-codex-connector

* feat: add localized copy for faq-page skill

Add German, French, and Russian translations for the FAQ page skill
example prompt to fix validation test failure.

- DE: FAQ-Seite mit Akkordeon-Abschnitten, Suchfunktion und Kategoriefilterung
- FR: Page FAQ avec sections accordéon, recherche et filtrage par catégorie
- RU: Страница FAQ со складными секциями-аккордеонами, поиском и фильтрацией

* fix: escape apostrophe in French translation

Use double quotes to avoid syntax error with d'auth

* fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins (#1110)

* fix(platform): add legacy ~/.fnm path to wellKnownUserToolchainBins

fnm legacy installations use ~/.fnm/node-versions. Closes #1102

* fix: remove stray .fnm token from type declaration

* docs: add Windows troubleshooting guide (#478) (#1170)

* docs: add Windows troubleshooting guide (#478)

Add docs/windows-troubleshooting.md with step-by-step fixes for the
most common native-Windows setup errors:

- Node 24 / nvm-windows gotchas (fake nvm file in System32)
- pnpm not found after installation
- Build scripts blocked by pnpm 10 (better-sqlite3, sharp)
- Visual Studio / gyp build errors
- Starting the dev server
- Optional OpenCode CLI setup

Also update CONTRIBUTING.md and QUICKSTART.md to link to the new
guide instead of the vague "file an issue if it doesn't" note.

* docs: fix Windows guide command accuracy (#1170)

Address all 6 inline review comments from lefarcen:

- Pin npm-global pnpm install to @10.33.2 (matches packageManager field)
- Use where.exe instead of bare where (PowerShell alias conflict)
- Fix OpenCode package: opencode-ai (not opencode), binary is opencode
- Add EPERM fallback note for corepack enable on protected installs
- Add Python check for gyp ERR! find Python
- Expand diagnostic checklist with corepack, python, execution policy

Also remove redundant corepack pnpm --version from checklist.

* feat(daemon): inject compiled design-system tokens + fixture into prompts (#1385)

* feat(daemon): inject compiled design-system tokens + fixture into prompts

Follow-up to #1231. The prior PR landed the structured form of two
brands (`default` + `kami`) and codified the schema; this PR teaches
the daemon to actually consume those files when assembling the system
prompt, so agents stop having to re-derive token names from DESIGN.md
prose every turn.

Gated behind `OD_DESIGN_TOKEN_CHANNEL=1` for the smoke-test phase —
flag-off keeps the daemon byte-equivalent to today's behavior, flag-on
appends two new prompt blocks (the brand's `tokens.css` :root contract
and its `components.html` reference fixture) right after the existing
DESIGN.md block. Brands without those sibling files (every brand
except `default` and `kami` today) skip silently in either mode.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(daemon): only swallow ENOENT/ENOTDIR in readFileOptional, rethrow rest

Reviewer feedback (nettee, #1385). The prior catch-all hid permission
errors, EISDIR, and broken packaged-resource paths behind the same
"undefined = absent" branch the legacy ~138-brand fallback uses,
which would let `OD_DESIGN_TOKEN_CHANNEL=1` silently degrade to the
DESIGN.md-only prompt while reporting success. That corrupts the
exact signal the smoke-test rollout depends on.

Now `readFileOptional` only returns undefined for ENOENT / ENOTDIR
(real "file does not exist" cases) and rethrows everything else.
Added a focused test that plants a directory at the tokens.css path
to exercise the EISDIR branch, plus a partial-presence regression
test to confirm the stricter contract preserves the legacy fallback.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16>
Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(daemon): make connection-test timeouts configurable (#1222)

* feat(daemon): make connection-test timeouts configurable

Provider and agent connection tests had hardcoded 12s / 45s budgets,
which are too tight for slow networks or distant providers (the user
sees "timeout" in Settings with no way to extend the budget).

- Add OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS (default 12_000)
- Add OD_CONNECTION_TEST_AGENT_TIMEOUT_MS (default 45_000)
- Invalid values (non-numeric, zero, negative, fractional) emit a
  console.warn and fall back to the default, so a typo in the env
  never silently disables the safety timeout.
- Export resolveConnectionTestTimeoutMs for unit testing; cover the
  three resolution paths (fallback / honored override / invalid).

41 connection-test tests pass (+3 new), full daemon suite 1170/1170.

* fix(daemon): reject connection-test timeout overrides above Node's setTimeout maximum

Node's `setTimeout` silently clamps any delay above `2^31-1` ms
(2_147_483_647) to ~1 ms with a TimeoutOverflowWarning. The previous
`Number.isInteger(n) && n >= 1` check accepted oversized values
unchanged and passed them straight to `setTimeout`, so an override
that *intended* to raise the budget — e.g.
`OD_CONNECTION_TEST_AGENT_TIMEOUT_MS=3000000000` — instead caused
every connection test to fail almost immediately. The safety
timeout was effectively disarmed.

Add `MAX_CONNECTION_TEST_TIMEOUT_MS = 2_147_483_647` and switch the
guard to `Number.isSafeInteger(n) && n >= 1 && n <= MAX...`. The
boundary value is still accepted; one millisecond past it falls
back with a warn. Regression test exercises `3_000_000_000`,
`2_147_483_647`, and `2_147_483_648`.

Addresses #1222 review feedback from @chatgpt-codex-connector,
@mrcfps, and @lefarcen.

* fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN SSRF bypass) (#1122)

* fix(security): strip trailing dot in normalizeBracketedIpv6 (FQDN bypass)

new URL('http://192.168.1.5./').hostname returns '192.168.1.5.' — the
trailing dot is the RFC 1034 absolute-FQDN form and resolves identically
to '192.168.1.5'. parseIpv4 fails on the dotted form, so 169.254.169.254.
slips past the metadata-service block, 192.168.1.5. slips past the LAN
block, and localhost. slips past the loopback identification.

Strip trailing dots in normalizeBracketedIpv6 so all downstream checks
(isLoopbackApiHost, isBlockedExternalApiHostname, isBlockedIpv4, IPv6
range tests) see the canonical form.

Adds 6 vitest cases covering loopback FQDN forms (localhost.,
foo.localhost., 127.0.0.1.) and SSRF FQDN bypasses (169.254.169.254.,
192.168.1.5., 10.0.0.5.).

Refs nexu-io/open-design#1119 review feedback (P2 from @lefarcen).

* test(connectionTest): tighten trailing-dot coverage per #1122 review

Two issues from #1122 review:

1. (P2 from @mrcfps + codex bot) The original `foo.localhost.` case
   asserted error===undefined on validateBaseUrl, which only proves the
   URL passed validation — not that the host is identified as loopback.
   Replaced with direct isLoopbackApiHost(...) assertions on the actual
   loopback FQDN forms (localhost., 127.0.0.1., 127.0.0.5.) so the test
   exercises the loopback path the comment claims.

2. (P3 from @lefarcen) Original blocked-FQDN tests covered only 3 of 7
   ranges that isBlockedIpv4 handles. Added a dedicated case per range
   (0.0.0.0/8, 10/8, 100.64/10, 169.254/16, 172.16/12, 192.168/16,
   multicast >=224) so future regressions in normalizeBracketedIpv6
   surface against the full coverage.

* docs: drop misleading foo.localhost./endsWith claim in normalizer comment

@lefarcen review feedback: isLoopbackApiHost only accepts exact 'localhost',
'::1', loopback IPv4, and mapped loopback IPv4 — there's no subdomain or
endsWith handling, so referencing 'foo.localhost.' overstates what the
trailing-dot strip enables. Rewrite the comment to match actual call sites
(isLoopbackApiHost equality + isBlockedIpv4 numeric parse).

* feat(daemon): export self-contained HTML via /export/*?inline=1 endpoint (#1312)

* test(daemon): add Red unit tests for inlineRelativeAssets helper

14 cases pinning the behavior contract for the upcoming
apps/daemon/src/inline-assets.ts helper:

- link/script inlining with verbatim body preservation
- non-src script attrs preserved (type=module, defer, crossorigin)
- relative path resolution (root + nested + deep-nested owners)
- self-closing and single-quoted attr forms
- negative cases: missing rel, rel=preload, absolute/data/blob/leading-slash
- escaping: </style and </script inside body
- null-fileReader graceful degradation
- duplicate identical tags fully replaced (diverges from
  apps/web/src/components/FileViewer.tsx:5313's first-match-only;
  locked decision per plan §3.3)
- HTML-escaped data-od-inline-asset attr

Tests intentionally Red — module ../src/inline-assets.js does not yet
exist. Phase B-G of plan declarative-roaming-gosling.md will turn them
green by porting FileViewer.tsx:5248-5354 server-side.

Refs nexu-io/open-design#368.

* feat(daemon): port inlineRelativeAssets server-side for export endpoint

Adds apps/daemon/src/inline-assets.ts — a pure helper that takes
(html, ownerFileName, fileReader closure) and returns the HTML with
every relative <link rel=stylesheet> and <script src> contents inlined
into <style data-od-inline-asset="…">/<script>…</script> blocks. The
fileReader closure keeps the helper free of fs/Express coupling so the
route handler owns the filesystem boundary.

Port source: apps/web/src/components/FileViewer.tsx:5248-5354 — five
functions (inlineRelativeAssets, resolveProjectRelativePath, baseDirFor,
readHtmlAttr, escapeHtmlAttr). The fetch hop becomes the fileReader
closure; replace-all replaces first-match-only per locked design
decision §3.3 (inline comment in inline-assets.ts cites the divergence
from FileViewer.tsx:5313 and notes the web inline path is on a
deprecation track since PR #384 made URL-load the default).

Phase B-G of plan declarative-roaming-gosling.md. All 14 unit cases
from the Red commit (a60a9023) now pass; tightens one case to use a
realistic '&'-only filename (the original `<`/`>`-bearing filename was
unreachable in real filesystems and exposed a regex limitation the
web client carries too).

Daemon delta: +14 tests (1704 → 1718). Typecheck clean.

Refs nexu-io/open-design#368.

* test(daemon): add Red integration tests for /export/*?inline=1 route

9 HTTP cases against GET /api/projects/:id/export/*?inline=1:

- 3-file React-ish layout returns self-contained HTML (wiring guard:
  body assertions catch removal of the await inlineRelativeAssets(...)
  line, not just helper-internals changes)
- missing inline / non-canonical values (0, false, foo, empty) → 400
- non-HTML file → 400 UNSUPPORTED_FILE_TYPE
- missing file → 404 FILE_NOT_FOUND
- invalid project id (..) → some 4xx (Express normalizes before route)
- null-origin OPTIONS preflight → 204 + Access-Control-Allow-Origin: *
- missing sibling asset → 200 with <link> tag intact, other asset inlined
- nested HTML entry (pages/index.html + ../shared/util.js) → 200 inlined

8 of 9 tests Red (404 / 403); the invalid-project-id case is tolerant
about how Express rejects .. so it accidentally passes Red — Green
will tighten to 400 BAD_REQUEST via isSafeId.

Phase C-R of plan declarative-roaming-gosling.md. C-G will register
the route in apps/daemon/src/import-export-routes.ts.

Refs nexu-io/open-design#368.

* feat(daemon): wire GET /api/projects/:id/export/*?inline=1 endpoint

Adds the export-inline endpoint into registerProjectExportRoutes
(import-export-routes.ts) alongside /export/pdf and /archive. The
route:

- Validates project id via ctx.validation.isSafeId
- Requires ?inline=1 (accept-list: 1 / true / yes / on, matching Part
  1's parseForceInline at file-viewer-render-mode.ts:59-66)
- Reads the owner HTML via ctx.projectFiles.readProjectFile; maps
  ENOENT to 404 FILE_NOT_FOUND, everything else to 400 BAD_REQUEST
- Gates non-HTML callers with 400 UNSUPPORTED_FILE_TYPE
- Builds a fileReader closure that silently returns null on any sibling
  read failure (failure-local, not fatal — matches the web client's
  null-filter at FileViewer.tsx:5311)
- Hands the buffer + relPath to inlineRelativeAssets and returns the
  result as text/html

DI: RegisterProjectExportRoutesDeps gains 'projectFiles' | 'validation';
server.ts:2879 passes the corresponding deps. Mirrors the dep shape of
RegisterFinalizeRoutesDeps used by PR #832's /finalize/anthropic.

Null-origin support intentionally omitted (decision §10 in the PR
description): the daemon's null-origin allowlist is /raw/* and
/codex-pets/.../spritesheet only, and export consumers are same-origin
UI or server-side tooling — sandboxed-iframe srcdoc previews fetch
/raw/* instead. Integration test #7 pins the 403 contract so a future
allowlist change is deliberate.

Phase C-G of plan declarative-roaming-gosling.md. All 23 tests green
(14 unit + 9 integration); full daemon suite 1727 passing (delta +9
over B-G's 1718). Typecheck clean.

Refs nexu-io/open-design#368.

* test(daemon): add Red regression for inlined-body tag-literal corruption

Reproduces the correctness bug Siri-Ray (looper) and codex-bot flagged
on PR #1312: the reduce/split-join approach in inlineRelativeAssets
re-scans the progressively mutated HTML, so a tag literal that happens
to appear inside an already-inlined asset body gets the inner literal
also replaced — corrupting the body and producing duplicate inlining.

Concrete reproducer (CSS, where </style escape doesn't touch <link>):

  HTML:    <link rel="stylesheet" href="a.css">
           <link rel="stylesheet" href="b.css">
  a.css:   /* see also <link rel="stylesheet" href="b.css"> */
  b.css:   body{color:red}

Under split/join the second pass splits on `<link rel="stylesheet"
href="b.css">` and matches BOTH the real outer tag AND the literal
inside a.css's comment. Result: b.css's <style> block is injected
inside a.css's comment, and b.css gets inlined twice.

Phase F-R of plan declarative-roaming-gosling.md (post-PR-#1312
review round). F-G will rewrite the helper to collect matches by
position in the original HTML and concat slices in a single pass,
so already-inlined content is never re-scanned.

Refs nexu-io/open-design#1312 review threads at
apps/daemon/src/inline-assets.ts:122 (Siri-Ray looper + codex bot).

* feat(daemon): replace inliner reduce/split-join with position-based concat

Fixes the inlined-body tag-literal corruption Siri-Ray (looper) +
codex-bot flagged on PR #1312. The previous `replaceAllOccurrences`
(`source.split(from).join(to)`) re-scanned the progressively mutated
HTML on each pass, so a tag literal that appeared inside an already-
inlined CSS/JS body got the inner literal replaced too, producing
duplicate inlining and corrupted bodies.

New shape: collect every match's {start, end} byte span from the
ORIGINAL html via `matchAll`, await the per-match replacements in
parallel, sort by start, and concat slices of the original html with
the replacement strings in a single pass. Text introduced by an
earlier replacement is never scanned for matches.

The dup-tag fix (decision §8 — replace every occurrence, not
first-match-only) is preserved: every original-tag position gets its
own slice, so all duplicates are inlined.

Also extracts buildInlineStyleBlock / buildInlineScriptBlock so the
match-collection loops stay readable.

Phase F-G of plan declarative-roaming-gosling.md. Regression test
(c809bccc) goes Green; all 24 unit + integration tests pass; daemon
suite still clean.

Refs nexu-io/open-design#1312.

* test(daemon): add Red CSP-sandbox test + P3 coverage gaps from PR #1312 review

Three tests covering lefarcen's review on PR #1312:

1. [Red] CSP sandbox header (P2, lefarcen @ import-export-routes.ts:423).
   Top-level browser navigation to /export/*?inline=1 sends no Origin
   header, so the daemon middleware lets it through and any JS in the
   exported document runs with daemon-origin privileges. Asserts the
   response sends `Content-Security-Policy: sandbox allow-scripts` so
   the browser treats it as a sandboxed iframe with an opaque origin
   (scripts still run, but no cookies / no /api/ access). This test
   fails until G1-G adds the header in the handler.

2. [Green-on-commit] Accept-list cases (P3, lefarcen @ test.ts:262).
   PR body decision §7 promises `inline=true/yes/on` case-insensitive,
   but round-1 tests only exercised inline=1. Pin the full accept list
   (true / yes / on + TRUE / Yes / ON). Already passes — the route's
   parser already implements the accept list; this just makes the
   contract testable.

3. [Green-on-commit] isSafeId guard (P3, lefarcen @ test.ts:287).
   Previous `..` test was normalized by Express before reaching the
   route. New input uses `bad!id` (URL-safe, but outside isSafeId's
   /^[A-Za-z0-9._-]+$/ char class), so Express passes it into
   req.params unchanged and isSafeId rejects with the documented
   400 BAD_REQUEST envelope.

Phase G1-R / H of plan declarative-roaming-gosling.md. Refs
nexu-io/open-design#1312 review comments.

* feat(daemon): send Content-Security-Policy: sandbox allow-scripts on /export

Closes the same-origin XSS surface lefarcen flagged on PR #1312 (P2
at import-export-routes.ts:423): top-level browser navigation to the
export URL sends no Origin header, so the daemon's /api middleware
admits the request and any JS in the exported document executes with
daemon-origin privileges (cookies, /api/, localStorage).

`Content-Security-Policy: sandbox allow-scripts` on the response
makes the browser treat the document as a sandboxed iframe with an
opaque origin. Scripts still execute (necessary for the screenshot
use case — the whole point of inlining JS), but they cannot read
cookies, hit /api/, or otherwise escalate to the daemon's origin.

Phase G1-G of plan declarative-roaming-gosling.md. Daemon delta: +3
tests (the Red CSP test from 58151356 turns Green; the P3 coverage
gap tests stay green).

Refs nexu-io/open-design#1312.

* test(daemon): add Red regression for <link> stylesheet attr preservation

Currently `<link rel="stylesheet" href="print.css" media="print">`
becomes a plain `<style data-od-inline-asset="print.css">…</style>`
with no media query — print-only styles apply unconditionally. Same
problem for `title` (alternate stylesheet sets), `disabled` (initial
disabled state), and `nonce` (CSP nonce). All four are valid on
both `<link rel=stylesheet>` and `<style>` per HTML spec, so the
inliner must carry them across.

PR #1312 round-2 review (lefarcen P2 @ inline-assets.ts:44). Phase
G2-R; G2-G will extend buildInlineStyleBlock to copy the four attrs
off the source <link>.

Refs nexu-io/open-design#1312.

* feat(daemon): preserve <link> stylesheet semantics on inlined <style>

Closes lefarcen's P2 review note on PR #1312 (inline-assets.ts:44):
`<link rel="stylesheet" href="print.css" media="print">` was becoming
a plain <style> with no media query, so print-only styles applied
unconditionally. Same issue for `title` (alternate stylesheet sets),
`disabled` (initial disabled state), and `nonce` (CSP nonce).

buildInlineStyleBlock now carries four attrs across from the source
<link>:
  - media, title, nonce  (value attrs, HTML-escaped via escapeHtmlAttr)
  - disabled             (boolean attr — copied as bare presence)

Other <link> attrs (rel, href, type, crossorigin, integrity,
referrerpolicy) don't apply to <style> and are intentionally dropped.

New `hasBooleanHtmlAttr` helper distinguishes presence-as-attr from
substring-inside-another-attr-value via a regex that requires a
word boundary after the name (whitespace, `=`, or `>`).

Phase G2-G of plan declarative-roaming-gosling.md. All 28 tests pass.

Refs nexu-io/open-design#1312.

* docs(daemon): narrow inliner contract claim + document size-limit policy

Closes lefarcen's P2 review notes on PR #1312:

1. "Self-contained" incomplete (inline-assets.ts:67): the helper
   only rewrites top-level <link rel=stylesheet> / <script src>.
   `<img src>`, CSS `url(...)`, CSS `@import`, ES module imports,
   font sources, and similar remain external in the response. The
   PR title/body claimed "self-contained HTML" which over-promised
   for screenshot tooling expecting bundled images/fonts.

   Module docstring now enumerates the full not-rewritten list and
   names the screenshot path as the primary use case (headless
   browser fetches each external asset on render, so inline-CSS-
   and-JS-only is sufficient). The route handler comment block
   mirrors the contract.

   A fully offline export with image/font bundling is filed as a
   follow-up — out of scope for this PR.

2. No response cap (inline-assets.ts:72): the helper does
   concurrent reads + multiple string copies and could spike daemon
   memory. The daemon is local-first (single-user, developer's
   machine — see open_design_architecture.md), so the effective
   ceiling is the size of the user's own project. The docstring now
   states this rationale and names the conditions under which a
   bounded-concurrency reader and output-size limit would be
   needed (non-trusted callers).

Docs-only — no behavior change, all 28 tests still pass.

Refs nexu-io/open-design#1312.

* test(daemon): add Red regression for hasBooleanHtmlAttr quoted-value match

PR #1312 round-2 review (lefarcen P3): `hasBooleanHtmlAttr` tests the
tag string with no attr-quoting awareness, so the literal text
`disabled` appearing inside any quoted attribute value followed by
another whitespace char satisfies `\sdisabled(?=\s|=|/?>)`.

  <link rel=stylesheet href=x.css data-note="content disabled stuff">

emits a <style disabled> block, silently disabling a stylesheet the
author wrote without that attr.

Also adds a counterweight test for the legitimate-disabled case
(<link … disabled>) so the next-commit fix doesn't over-correct
and start dropping real boolean attrs.

Phase I3-R of plan declarative-roaming-gosling.md (post-PR-#1312
round-2 review). I3-G will strip quoted attribute values from the
tag string before testing for the bare attr.

Refs nexu-io/open-design#1312.

* feat(daemon): make hasBooleanHtmlAttr quote-aware to avoid false positives

Closes lefarcen's P3 review note on PR #1312:
`hasBooleanHtmlAttr` previously ran `\sname(?=\s|=|/?>)` over the
full tag string, so the literal text `disabled` appearing inside any
quoted attribute value followed by whitespace satisfied the regex.
Source tags like `<link rel=stylesheet href=x.css
data-note="content disabled stuff">` were emitting a <style
disabled> block — silently disabling a stylesheet the author wrote
without that attr.

Fix: strip `="…"` and `='…'` substrings out of the tag with two
regex passes BEFORE testing for the bare attr. The lookahead still
requires `\s|=|/?>` after the attr name, so `<link disabled>`,
`<link disabled="">`, `<link disabled/>`, etc. all match — but the
attr name as a substring of any quoted value cannot match because
values have been stripped to `""` / `''`.

Phase I3-G of plan declarative-roaming-gosling.md. All 30 tests
green (28 prior + 2 round-3 regression cases: false-positive and
legitimate-disabled). Refs nexu-io/open-design#1312.

* test(daemon): add Red cap-enforcement tests + scaffold InlineOptions

PR #1312 round-2 review (lefarcen P2 — still open): round-2 only
documented that no cap is enforced. Reviewer pushed back: the helper
still builds unbounded candidate arrays + runs Promise.all over all
asset reads + concatenates the full output in memory. Need actual
limits in code.

This commit adds the Red test surface that drives the next commit's
enforcement:

  - InlineAssetsLimitError("owner") when owner HTML > maxOwnerBytes
  - InlineAssetsLimitError("candidates") when tag matches > maxCandidates
  - Per-asset graceful: oversized asset → tag stays as URL ref
  - InlineAssetsLimitError("total") when assembled output > maxTotalBytes
  - Bounded read concurrency: peak in-flight reads ≤ maxReadConcurrency
  - Integration: route maps the throw to 413 PAYLOAD_TOO_LARGE

InlineOptions interface is added to the helper signature as a no-op
test-door (per feedback_test_doors_over_fake_timers.md), so tests
can exercise tiny fixtures while production callers use module-level
defaults. The next commit (H3-G) wires the enforcement.

Phase H3-R of plan declarative-roaming-gosling.md. Daemon delta on
this commit: +6 tests (5 unit + 1 integration), all Red.

Refs nexu-io/open-design#1312.

* feat(daemon): enforce inliner caps + map limit errors to 413 PAYLOAD_TOO_LARGE

Closes lefarcen's still-open P2 review on PR #1312 round 2 ("the
code still builds unbounded candidate arrays + Promise.all over all
asset reads + concatenates the full output in memory"). Caps are now
enforced in code with the documented defaults:

  MAX_INLINE_OWNER_BYTES       = 2 MiB
  MAX_INLINE_ASSET_BYTES       = 5 MiB per sibling
  MAX_INLINE_CANDIDATES        = 500 link/script matches
  MAX_INLINE_TOTAL_BYTES       = 50 MiB assembled output
  MAX_INLINE_READ_CONCURRENCY  = 8 simultaneous fileReader calls

Enforcement points:

- Owner cap (input): fires immediately at function entry. Cheap —
  Buffer.byteLength of the already-decoded UTF-8 string.
- Candidate cap (planning): fires after matchAll, BEFORE any sibling
  read. Pathological HTML with thousands of <link>/<script src>
  tags is rejected without opening a single file descriptor.
- Asset cap (per-sibling): post-read length check; oversized assets
  return null from the wrapped reader, so the tag stays as a URL ref
  and the response is still 200. This is the only "graceful" cap —
  one bad asset doesn't fail the whole export.
- Total cap (output): tracked across the slice-and-concat loop,
  guarding both preserved-html slices AND injected replacements.
- Concurrency cap (planning): a tiny in-module runWithConcurrency
  worker-pool keeps at most maxReadConcurrency fileReader calls in
  flight, with order-preserving results.

`InlineAssetsLimitError` carries a `limit` discriminator so logs and
clients can disambiguate owner/asset/candidates/total. The route
handler catches it and emits 413 PAYLOAD_TOO_LARGE.

Drive-by error-envelope fix while in the route: UNSUPPORTED_FILE_TYPE
(an unregistered ApiErrorCode) → UNSUPPORTED_MEDIA_TYPE (the
canonical code) with HTTP 415. The round-1 string was a slip;
caught by reading packages/contracts/src/errors.ts:11 while wiring
PAYLOAD_TOO_LARGE.

Phase H3-G of plan declarative-roaming-gosling.md. All 36 tests
green (28 prior + 2 round-3 quoted-attr + 5 cap unit + 1 cap
integration). Refs nexu-io/open-design#1312.

* feat(daemon): enforce inliner caps pre-buffer via AssetHandle contract

Closes lefarcen's still-open P2 review on PR #1312 round 3 ("the
helper enforces maxTotalBytes only after all candidate assets have
already been read and converted to replacement strings" /
"maxAssetBytes is checked after fileReader fully buffers each
sibling"). Round-3 caps were defensive against the final output
size but did not bound peak memory during read fanout — 500 assets
at 5 MiB each could materialize ~2.5 GiB before the 413 fired.

Contract change: InlineAssetReader now returns `AssetHandle | null`
where AssetHandle is `{ readonly size: number; read(): Promise<...> }`.
Callers expose `size` from a cheap stat-equivalent (the route uses
`resolveProjectFilePath`) and defer the full materialization to
`read()`. The helper checks size against maxAssetBytes BEFORE
invoking read, and against the running total BEFORE the reservation
is committed.

Enforcement flow inside runWithConcurrency:

  1. await fileReader(p.resolved) → cheap stat-only call
  2. if (handle.size > maxAssetBytes) return null   ← pre-buffer
  3. if (runningBytes + handle.size > maxTotalBytes) ← pre-buffer
     totalAborted = true; return null
  4. runningBytes += handle.size                    ← reserve
  5. await handle.read()                            ← only now
  6. if (read returned null) runningBytes -= refund

`totalAborted` is a shared flag the workers check at entry, so
once the running total hits the cap, no new reads start. With
maxReadConcurrency = 8, at most ~8 stat-side calls finish after
abort — peak memory bounded.

The concat-time guard stays as the exact final assertion (the
pre-buffer reservation is approximate — it counts the original tag
bytes and skips wrapper overhead).

Route closure updated to do `resolveProjectFilePath` first, then
`readProjectFile` inside the deferred `read()`. Test reader helpers
(`readerFrom` + the concurrency-test reader) updated to the new
shape.

Two new unit tests pin the pre-buffer semantics:

  - `maxAssetBytes` is checked via handle.size BEFORE handle.read()
    (the reader's `read()` throws — must never run)
  - Running total abort stops further reads once exceeded (counting
    reader observes ≤ 2 reads when cap should fire after the first)

Phase K of plan declarative-roaming-gosling.md (post-PR-#1312
round-3 review). All 38 tests green (36 prior + 2 round-4
pre-buffer cases).

Refs nexu-io/open-design#1312.

* test(daemon): add Red test pinning owner pre-buffer 413 before mime 415

PR #1312 round-5 (lefarcen P2): the route currently reads the owner
file with readProjectFile() before any size check, so a 100 MiB owner
HTML is fully buffered into memory before the helper's ownerBytes
check fires. The fix is to stat with resolveProjectFilePath first,
reject pre-buffer with 413 PAYLOAD_TOO_LARGE on oversize, then fold
in the mime check (still 415 on mismatch, now pre-buffer), then
readProjectFile when both gates pass.

The Red→Green discriminator is the combination 'oversize AND
non-HTML': pre-fix the route reads the buffer first and the
text/plain mime check fires → 415; post-fix the route stats first
and the size check fires before the mime check → 413. Asserting
'got 413, not 415' pins both the pre-buffer property and the check
ordering (size before mime, per lefarcen's locked round-5 sequence).

2 MiB+1 byte fixture is acceptable in test setup; MAX_INLINE_OWNER_BYTES
is the production 2 MiB so no test-door is needed.

Red verified: AssertionError: expected 415 to be 413 (pre-fix flow
reads → mime → 415).

* feat(daemon): stat owner before readProjectFile in /export route to bound owner pre-buffer

PR #1312 round-5 (lefarcen P2 confirmed at PR-1312#issuecomment-4424868413
follow-up): the route previously called readProjectFile() unconditionally
on the owner, so a 100 MiB owner HTML was fully buffered into memory
before the helper's ownerBytes check fired with InlineAssetsLimitError
('owner'). That meant the 413 envelope returned to the caller but only
after peak memory had already hit the file size.

Fix mirrors the sibling-asset stat-then-read contract round 4 added via
the AssetHandle interface: call resolveProjectFilePath first (cheap
stat), reject pre-buffer with 413 PAYLOAD_TOO_LARGE on size >
MAX_INLINE_OWNER_BYTES, fold in the mime check (still 415
UNSUPPORTED_MEDIA_TYPE on mismatch, now also pre-buffer per lefarcen's
'fold-in is welcome'), then readProjectFile() only when both gates
pass. Size check fires before mime check, so an oversize non-HTML file
returns 413 rather than 415 — the observable Red→Green discriminator
for this round.

The helper's ownerBytes check (inline-assets.ts:127-133) stays as
defense-in-depth for direct in-process callers that skip the route and
for any drift between stat-reported size and the bytes returned by
readFile.

Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts
('returns 413 (not 415) for an oversize non-HTML file'). Daemon suite
1743/1743 passing.

* test(daemon): add Red test pinning stat-vs-actual byte reconciliation

PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413
follow-up): the helper trusts handle.size for the running-total guard
and never reconciles with the actual byte length of content unless the
per-asset cap is exceeded. A reader that under-reports size (stale
stat, UTF-8 expansion at decode, sparse file, deliberate lie) can let
many strings materialize in memory before the concat-time guard at
the bottom of inlineRelativeAssets throws — defeating the round-4
pre-buffer cap intent.

Fix is lefarcen-confirmed path-a: post-read, the helper computes
actualBytes = Buffer.byteLength(content, 'utf8'), reconciles
runningBytes (add actualBytes, refund handle.size), and if running
total exceeds maxTotalBytes flips totalAborted = true and returns
null. Subsequent workers see totalAborted before invoking their own
read(). Helper still throws InlineAssetsLimitError('total') after
Promise.all settles — preserving the round-2/3/4 graceful-fallback
pattern instead of racing throws across in-flight workers.

Red→Green discriminator is read count. Pre-fix the helper trusts
the lying handle.size (10), so both reads complete (each returning
1000 bytes) under the reservation total of 56+10+10=76 < cap 500.
The concat-time guard then catches the 2000+-byte assembly and
throws 'total' — but only after both reads materialized in memory.
Post-fix worker 1's reconciliation trips totalAborted as soon as
actualBytes (1000) is folded into runningBytes; worker 2 skips its
read.

Red verified: AssertionError expected 1, received 2 (pre-fix flow
completes both reads before concat-guard fires).

* feat(daemon): reconcile inliner reservation with post-read actual bytes

PR #1312 round-5 (lefarcen P3 confirmed at PR-1312#issuecomment-4424868413
follow-up, path-a): the helper trusted handle.size for the running-
total guard and only reconciled with actual bytes for the per-asset
cap. A reader that under-reported size — stale stat, UTF-8 decode
expansion at read time, sparse file, deliberate lie — could let
many strings materialize before the concat-time guard at the bottom
of inlineRelativeAssets caught the excess. That defeated the
round-4 pre-buffer cap intent.

Fix: after a successful read(), compute actualBytes =
Buffer.byteLength(content, 'utf8'), reconcile runningBytes by
folding in (actualBytes - handle.size), and re-check the total cap.
If the reconciliation pushes runningBytes past maxTotalBytes,
drop the asset's inlining (tag stays as URL ref), set
totalAborted = true to block subsequent worker reads, and let
Promise.all settle. The helper then throws
InlineAssetsLimitError('total') below — matching the round-2/3/4
graceful-fallback pattern (no throw-before-settle race between
in-flight workers).

The per-asset cap check at line 228 is preserved for stat-lying
readers that blow a single asset past maxAssetBytes; that branch
refunds handle.size and drops without flipping totalAborted, so
sibling assets still get a fair shot.

Verifies the round-5 Red at apps/daemon/tests/export-inline-route.ts
('reconciles handle.size with actual content bytes'). Daemon suite
1744/1744 passing.

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>

* fix: truncate long template names on project cards (#1220) (#1302)

Add min-width: 0 to .design-card-name so text-overflow: ellipsis
works correctly in flex layouts. Long template names were pushing
the task execution status (Running, Failed, etc.) out of view on
project cards.

Closes #1220

Co-authored-by: laomo <laomo@openclaw.ai>

* fix(desktop): swallow setTypeOfService EINVAL crashes in dev main (#647) (#1298)

* fix(desktop): swallow harmless setTypeOfService EINVAL crashes in dev main

The packaged Electron entry (apps/packaged/src/logging.ts) already
filters the undici "setTypeOfService EINVAL" crash that issue #895
introduced for the prod build, but the dev / source-built desktop
entry was missing the parallel guard. Result: switching settings
tabs in a from-source desktop run could fire a fresh fetch, undici
would try to set IP_TOS on the outbound socket, the kernel would
refuse on certain macOS / VPN configurations, and the rejection
bubbled to Electron's default handler as the "JavaScript error in
the main process" dialog reported in issue #647.

Add the same defensive filter to apps/desktop:

  - isHarmlessSocketOptionError matches only the canonical undici
    shape (syscall name AND EINVAL code). A contradicting code
    (EACCES, EPERM, etc) explicitly fails the match so real bugs
    don't get hidden.
  - The uncaughtException handler logs harmless cases at warn and
    returns silently. For anything else it removes itself from the
    listener list and re-throws via setImmediate, restoring Node's
    default crash path so Electron's native dialog renders exactly
    as it would without this filter.
  - unhandledRejection mirrors the same harmless / fall-through
    split.

The filter is installed BEFORE app.whenReady so it is armed by the
time the renderer fires its first fetch.

The helper is duplicated rather than imported from apps/packaged
because AGENTS.md forbids cross-app private-source imports. The file
header calls out the parallel and notes that the two copies should
stay in sync until the helper is promoted to a shared workspace
package (follow-up); the contract is identical so a regression in
one will surface in the other's test suite.

Tests in apps/desktop/tests/main/uncaught-exception.test.ts mirror
apps/packaged/tests/logging.test.ts: 8 cases pinning the matcher
shape, 2 cases pinning the handler's harmless-log-warn vs
fall-through-rethrow split.

Validated: pnpm guard, pnpm --filter @open-design/desktop typecheck,
pnpm --filter @open-design/desktop build, and pnpm --filter
@open-design/desktop test (14 passed, 10 new).

* fix(desktop,packaged): fail-fast on non-harmless unhandled rejections

The previous unhandledRejection listeners logged non-harmless reasons
and returned, which kept the main process alive after any rejected
promise. A real bug, a failed IPC registration, or any unexpected
async exception was reduced to a console line instead of surfacing
through Node/Electron's default crash path the filter was meant to
preserve.

Both copies now route non-harmless rejections through a parallel
factory (createDesktopUnhandledRejectionHandler /
createFatalUnhandledRejectionHandler) that mirrors the
uncaughtException policy: harmless setTypeOfService EINVAL shapes log
at warn and return, anything else logs at error, removes the
listener, and re-throws via setImmediate. Listener removal happens
before the scheduled throw, so the rethrown reason lands in the
uncaughtException path with no recursion.

Tests cover the harmless branch, the detach + ordered rethrow, and
non-Error / primitive rejection reasons (Promise.reject(42)) which
must fall through. Desktop suite: 13/13, packaged suite: 16/16.
Flagged on PR #1298 by Siri-Ray and the codex P2 review thread; the
two file copies stay in lockstep per the AGENTS.md sync invariant.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>

* feature: refine assistant artifact feedback (#1379)

* feature: refine assistant artifact feedback

* fix: clear hidden custom feedback reason

* test: update assistant feedback expectations

* fix: support object-style question-form options (#1293)

* fix: support object-style question-form options

* fix: preserve stable option values in form submissions

* fix(daemon/acp): terminate ACP child after clean prompt completion (#1286)

* fix(daemon/acp): terminate ACP child after clean prompt completion (Bug B / #1265)

Some ACP agents (notably Devin for Terminal) keep the child process
alive after stdin closes, waiting for the next prompt. Open Design
spawns a fresh agent per chat turn and relies on child.on('close') to
finalize the run, so without an explicit signal-driven shutdown the
chat sits stuck in the 'working' state indefinitely.

Three small, targeted changes:

- apps/daemon/src/acp.ts: After a clean session/prompt response we
  schedule a 500ms grace period and then SIGTERM the child. This
  mirrors the pattern detectAcpModels() already uses after model
  discovery. The grace period leaves well-behaved agents that exit on
  stdin.end() unaffected.
- apps/daemon/src/acp.ts: New completedSuccessfully() method on the
  session handle reports whether the prompt resolved without a fatal
  error or abort, so the consumer can distinguish 'clean signal exit'
  from 'genuine signal failure'.
- apps/daemon/src/server.ts: child.on('close') now treats a SIGTERM
  exit as 'succeeded' when acpSession.completedSuccessfully() is true.
- apps/web/src/providers/daemon.ts: Trust the server's authoritative
  endStatus; the signal/non-zero-code safety net no longer overrides
  an explicit 'succeeded' status, so the chat doesn't surface a fake
  'agent exited with signal SIGTERM' error after a clean ACP run.

Daemon tests cover the SIGTERM grace timer, clean early-exit (timer
cleared), and completedSuccessfully() abort/error states. Manual UI
test on plain main + this fix confirms Devin chats now return to ready
automatically after Done · ...

* fix(daemon/connectionTest): treat ACP clean SIGTERM as success

Codex review on #1286 caught that the new SIGTERM in attachAcpSession
breaks ACP connection tests for agents that don't shut down on
stdin.end() (the exact Devin behavior the patch targets).

attachAgentStreamHandlers() in connectionTest.ts now also respects
acpSession.completedSuccessfully(), mirroring the same check we apply
in server.ts. Without this, a clean prompt response followed by our
SIGTERM would set winner.signal === 'SIGTERM', flip exitedCleanly to
false, and the connection test would report 'agent_spawn_failed'
even when the agent had returned a healthy response.

Also widened the AgentSpawnHandle type so completedSuccessfully is
visible on the structural type used inside connectionTest.ts.

All 56 daemon tests still pass; typecheck + guard clean.

* fix(daemon/acp): narrow ACP success-on-signal override to forced-SIGTERM

Looper review on #1286 caught that the success predicate was broader
than the SIGTERM case it was meant to handle. `completedSuccessfully()`
flips to true as soon as the ACP `session/prompt` response is
processed, but it does not say why the child later closed. With the
broad predicate, an ACP agent that returned a prompt result and then
exited with code 1 (or was killed by SIGKILL/SIGSEGV) was still marked
'succeeded', regressing the existing close-status behavior for genuine
post-response process failures.

Scope the override to the exact forced-shutdown shape this PR
introduces:

  code === null && signal === 'SIGTERM' && acpCleanCompletion

Applied to both `server.ts` (chat run finalization) and
`connectionTest.ts` (connection-test classification). Any other
post-response failure now falls through to 'failed' / 'agent_spawn_failed'
as before.

All 59 daemon tests still pass; typecheck + guard clean.

* fix(web/daemon): only bypass exit-code safety net on explicit server success

Looper review on #1286 caught that the previous web change trusted
`endStatus === 'succeeded'` absolutely, but `endStatus` can become
'succeeded' in two distinct ways:

1. The SSE end event explicitly carries `status: 'succeeded'`
   (authoritative server declaration).
2. The end event omits or has an invalid `status` field and the
   handler silently falls back to 'succeeded' as a local default.

Both produced `endStatus === 'succeeded'` in the existing code, so
the new safety-net bypass treated them identically. That regressed
backward compat: a compatible or older daemon emitting an end event
like `{code:1}` or `{code:null,signal:"SIGTERM"}` with no
`status` would suddenly skip the failure banner.

Track explicit success separately via `serverDeclaredSuccess`,
set true only when:

- The SSE end event has `status === 'succeeded'`, or
- The fallback `fetchChatRunStatus` REST path returns
  `status === 'succeeded'` (which the existing `isChatRunStatus()`
  guard already proves is explicit).

The safety net is now bypassed only on that explicit signal; the
local-fallback success path still reaches the exit-code/signal
check so real failures surface as before.

Adds three web-side regression tests in `apps/web/tests/providers/sse.test.ts`:

- Explicit `status: 'succeeded'` + SIGTERM → onDone called, no error
- End event with `{code:1}` and no `status` → onError surfaces
  'agent exited with code 1' as before
- End event with `{code:null,signal:'SIGTERM'}` and no `status` →
  onError surfaces 'agent exited with signal SIGTERM' as before

`pnpm guard` + daemon typecheck clean; 27/27 SSE tests pass (up
from 24).

* Fix Codex wrapper launch paths (#1395)

* test: add Memory and Routines coverage (#1400)

* test: align extended Playwright coverage with current UI behavior

* test: address extended suite review feedback

* test: fix Codex fallback config hydration in e2e

* test: add Memory and Routines coverage

* test: fix Memory and Routines component test typing

* test: include Memory and Routines e2e in extended suite

* refactor(settings): use tiled language picker instead of dropdown (#1406)

The Language section in Settings rendered a single-button dropdown
trigger that opened a floating menu. With one visible label and lots of
empty panel space, the layout misled users into thinking only one
language existed. Replace the dropdown trigger + portaled menu with an
inline tile grid that shows every locale at a glance and clicks
directly to switch.

Side effects of the new layout: the languageOpen / languageMenuRect
state, the dynamic placement effect, the resize-close effect, the
mousedown click-outside handler, and the languageRef are gone. The
global Escape handler no longer needs to guard against the menu being
open. CSS for .settings-language-picker, .settings-language-button,
.settings-language-menu, and .settings-language-option is replaced by
.settings-language-grid (auto-fill 180px minmax columns) +
.settings-language-tile.

Tests in SettingsDialog.execution.test.tsx that drove the dropdown
(click trigger → click menuitemradio → assert menu closed) are
rewritten to drive the tiles directly via the radio role.

Refs #1347

* fix(web): restore consistent app header layout

* fix(web): restore consistent app header layout

Generated-By: looper 0.7.2 (runner=fixer, agent=opencode)

* fix(web): restore consistent app header layout

Generated-By: looper 0.7.2 (runner=fixer, agent=opencode)

* fix(web): restore consistent app header layout

Generated-By: looper 0.7.2 (runner=fixer, agent=opencode)

* fix(web): hide project output chips in header

---------

Co-authored-by: Prantik Medhi <140103052+prantikmedhi@users.noreply.github.com>
Co-authored-by: 이용진 <90879448+Leesin0222@users.noreply.github.com>
Co-authored-by: Nicholas-Xiong <2482929840@qq.com>
Co-authored-by: Hesam <chngyzkhanwhsht@gmail.com>
Co-authored-by: Yuhao Chen <godcorn001@outlook.com>
Co-authored-by: chaoxiaoche <fanzhen910412@gmail.com>
Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: eggward han <32223217+Eggwardhan@users.noreply.github.com>
Co-authored-by: @aaronjmars <61592645+aaronjmars@users.noreply.github.com>
Co-authored-by: Bryan <121247296+bankielewicz@users.noreply.github.com>
Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
Co-authored-by: mrzhangkris <92247501+mrzhangkris@users.noreply.github.com>
Co-authored-by: laomo <laomo@openclaw.ai>
Co-authored-by: Nagendhra Madishetti <nagendhra.madishetti24@gmail.com>
Co-authored-by: Nagendhra <nagendhra405@gmail.com>
Co-authored-by: Mason <jinmeihong0201@gmail.com>
Co-authored-by: Yiang Yiyan <15089131836@163.com>
Co-authored-by: Rocky <101849785+MrRockySL@users.noreply.github.com>
Co-authored-by: nettee <nettee.liu@gmail.com>
Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>
Co-authored-by: Matt Van Horn <mvanhorn@users.noreply.github.com>
2026-05-12 23:15:46 +08:00
pftom
1f4259a190 feat(web): update routing and terminology for automation features
- Modified the routing logic to recognize 'automations' as a valid entry point alongside 'tasks', ensuring backward compatibility.
- Updated the EntryNavRail component to reflect the change from 'tasks' to 'automations' in labels and tooltips.
- Renamed instances of 'tasks' to 'automations' in the TasksView component for consistency and clarity.
- Enhanced HomeView and chip actions to include new input structures for better handling of automation scenarios.
- Updated tests to validate the new routing and terminology changes, ensuring proper functionality across the application.

This update improves the user experience by clarifying the distinction between tasks and automations, aligning the UI with the updated terminology.
2026-05-12 23:00:24 +08:00
pftom
358105c154 feat(web): enhance PluginsHomeSection with contribution card and improved plugin creation flow
- Updated the `onCreatePlugin` prop to accept an optional goal parameter, allowing for more contextual plugin creation.
- Introduced a contribution card that displays when no plugins match the current filters, providing users with a prompt to create new plugins.
- Enhanced the `resolveContributionTarget` function to return detailed information about the contribution context.
- Added new CSS styles for the contribution card to improve visual presentation.
- Updated tests to cover new contribution scenarios and ensure proper functionality of the contribution card.

This update significantly improves the user experience by guiding users to contribute plugins when no matches are found, fostering community engagement.
2026-05-12 22:46:43 +08:00
pftom
5f71968f61 feat(daemon, web): implement plugin sharing features for GitHub and Open Design contributions
- Added new API endpoints for publishing plugins to GitHub and contributing to Open Design, enhancing the plugin sharing capabilities.
- Introduced functions for handling plugin sharing actions, including `publishGeneratedPluginToGitHub` and `contributeGeneratedPluginToOpenDesign`.
- Updated the `DesignFilesPanel` and `FileWorkspace` components to support new sharing functionalities, allowing users to publish or contribute plugins directly from the interface.
- Enhanced the UI with new buttons for publishing and contributing plugins, improving user interaction and experience.
- Added tests to ensure the reliability of the new sharing features and their integration within the existing plugin management system.

This update significantly improves the plugin ecosystem by enabling users to share their creations with the community and streamline collaboration.
2026-05-12 22:39:32 +08:00
lefarcen
e1bc83a476
feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428)
* feat(analytics): scaffold PostHog product-analytics integration

- Add @open-design/contracts/analytics subpath with the 17 P0 event
  payload types, header constants, and code↔CSV enum mapping helpers.
- Add apps/daemon/src/analytics.ts with env-gated posthog-node client,
  request-scoped analytics context reader, and artifact-id anonymizer.
- Expose GET /api/analytics/config so the web bundle never embeds the
  PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST.
- Add apps/web/src/analytics module (identity + lazy posthog-js client
  + React provider) and mount it under <I18nProvider> in app/layout.

No event wiring yet — that lands in the next commit alongside trigger
points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer,
runs.ts).

* feat(analytics): wire app_launch, home_view, home_click, project_create_result

- App.tsx: fire app_launch once after first effect tick. handleCreateProject
  now emits project_create_result on both success and failure paths.
- EntryView.tsx: home_view (page) gated on agents loading so
  has_available_cli isn't transiently false; home_view (asset_panel) fires
  per top-tab change with the right result_count.
- NewProjectPanel.tsx: home_click create_button fires before delegating to
  the parent; a fresh request_id is generated here and threaded through
  onCreate so the matching project_create_result stitches via $insert_id.
- contracts/analytics: tighten createTabToTracking and topTabToTracking
  for the worktree branch's renamed tabs (live-artifact, templates).

* feat(analytics): wire settings_view + 3 settings_click events

- settings_view fires on dialog mount and on every section switch,
  carrying the active section (mapped via settingsSectionToTracking
  for the 16-section worktree layout), execution_mode, and the
  selected CLI provider id when present.
- settings_click execution_mode_tab: setMode now emits before/after
  values whenever the user toggles between Local CLI and BYOK.
- settings_click cli_provider_card: agent card onClick reports
  cli_provider_id via agentIdToTracking (kiro → other).
- settings_click byok_field: onFocus added to api_key, model select,
  and base_url inputs; provider_id widened to include google so the
  worktree's Gemini protocol slot type-checks.

* feat(analytics): wire studio_view + studio_click chat, studio_view artifact

- packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper
  produces a 16-hex anonymized id for (projectId, fileName). Stable
  cross-platform so the daemon and the web bundle resolve the same id
  without a Web Crypto round-trip; daemon now re-exports it.
- ChatComposer: studio_view chat_panel fires once per project mount,
  studio_click chat_composer fires on attachment + send buttons with
  estimated user_query_tokens (length/4) and has_attachment.
- FileViewer: studio_view artifact fires once per (project, file) at
  the dispatcher level, before any sub-viewer renders, with
  artifact_kind derived from the renderer registry / file.kind table.
- Widen TrackingExportFormat to include markdown and cloudflare_pages
  so the worktree branch's full share menu can emit verbatim.

* feat(analytics): wire studio_click share_option + artifact_export_result

HtmlViewer's share menu now emits both events per click via a
fireShareExport helper:

- studio_click share_option fires immediately on click with the chosen
  export_format and a fresh request_id.
- artifact_export_result fires when the export resolves — success for
  sync exporters (html, markdown, template) the moment the call
  returns, success/failed for async exporters (pdf, zip, deploy)
  via .then/.catch. The same request_id threads both events so
  PostHog stitches click → result via $insert_id.

DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages
slots; markdown is now a first-class export_format value.

Also ignore .env.local so local POSTHOG_KEY / .env-style secrets
don't get committed.

* feat(analytics): emit run_created and run_finished from the daemon

POST /api/runs now reads the analytics context off the
x-od-analytics-* headers the web client sets on every fetch, then:

- Captures run_created with project_id, conversation_id, run_id,
  model_id, agent_provider_id (mapped via agentIdToTracking),
  skill_id, design_system_id, plus the token_count_source marker.
- Schedules a run_finished capture on runs.wait(run) resolution,
  mapping succeeded/canceled/failed to success/cancelled/failed and
  reporting total_duration_ms.

Both events use a stable insert_id derived from the same uuid so
PostHog dedupes the daemon-side mirror against any future
web-side capture without double-counting.

Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay
omitted in v1 — the claude-stream parser only exposes input/output
totals today. See tracking-doc-issues.md §3.2.

* feat(analytics): emit settings_cli_test_result + settings_byok_test_result

The original BLOCKING-list assumed these CSV P0 events were not
implementable in this branch because main lacked Test buttons. The
worktree HEAD actually wires `handleTestAgent` and `handleTestProvider`
in SettingsDialog, so both events are now in scope.

- handleTestAgent emits settings_cli_test_result on success and
  failure paths with cli_provider_id mapped via agentIdToTracking,
  result drawn from result.ok / catch branch, error_code from
  result.kind or the thrown error name, and duration_ms timed via
  performance.now().
- handleTestProvider emits settings_byok_test_result analogously,
  using apiProtocol (anthropic|openai|azure|ollama|google) directly
  as provider_id — wider than the CSV's 5-value enum, documented in
  tracking-doc-issues.md §2.5.

Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps
plus matching track* helpers. AnalyticsEventName union now covers all
14 P0 events this branch supports.

* feat(analytics): gate PostHog on the existing telemetry.metrics consent

The integration now reuses the same first-launch privacy banner +
Settings → Privacy toggle that gates Langfuse, so a single user
decision controls both telemetry sinks.

- /api/analytics/config now consults the persisted AppConfigPrefs:
  it returns enabled=true only when POSTHOG_KEY is set AND the user
  has chosen "Share usage data" (telemetry.metrics === true). The
  response also echoes installationId so the web client uses the
  same anonymous id Langfuse keys off of — one identity per install,
  shared across both sinks.
- Web AnalyticsProvider:
  - Bootstrap fetch resolves installationId and threads it through
    the x-od-analytics-anonymous-id header on every /api/* fetch,
    so daemon-side captures (run_created / run_finished /
    project_create_result) land on the same person record.
  - Exposes a setConsent(granted) method that calls posthog-js's
    opt_in_capturing / opt_out_capturing, wired from App.tsx via a
    useEffect watching config.telemetry?.metrics. Toggling Privacy
    → metrics now stops/resumes events immediately, no reload.
- app_launch additionally gates on telemetry.metrics so a freshly-
  declined user fires nothing, and a freshly-opted-in user fires on
  the next reload.

* feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env

Wires PostHog product analytics through the same Langfuse-style build-
secret pipeline so official Open Design builds ship with the key while
fork builds compile without it (the integration short-circuits cleanly
when POSTHOG_KEY is absent).

tools/pack
- resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from
  process.env at packaging time, validates them (no whitespace in the
  key, http(s) URL for host, trailing-slash strip), and stamps them on
  ToolPackConfig. Fork builds without the env vars simply omit the
  fields; the daemon-side gate keeps things off in that case.
- Mac, Windows, and Linux packaged-config writers each append the two
  fields to open-design-config.json next to the existing
  telemetryRelayUrl entry.

apps/packaged
- RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost
  so the Electron entry and headless entry both forward them to the
  daemon sidecar.
- buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into
  the daemon child env when present. The daemon's existing analytics
  module reads these via process.env — no daemon-side changes needed.
- The headless packaged path falls back to process.env for fields the
  builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL
  is read there.

CI
- release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret)
  and POSTHOG_HOST (var) at workflow-env scope so every packaging job
  inherits them. PR / fork builds without these set simply skip the
  bake step.

Tests
- tools/pack: config.test.ts covers bake-through, fork-build omission,
  whitespace rejection, invalid-URL rejection, and trailing-slash
  normalization.
- apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv
  forwarding the keys when present and omitting them when null.

* feat(analytics): enable PostHog autocapture + perf + exceptions

Flip on the PostHog SDK's automatic diagnostic features so we capture
click paths, page transitions, web vitals, dead clicks, and browser
exceptions without scattering instrumentation through the codebase.

Privacy defense lives in one place — apps/web/src/analytics/scrub.ts —
wired in via posthog-js's `before_send` hook so every outgoing event
passes through the same audit point:

  - $autocapture / $rageclick / $dead_click / $copy_autocapture:
    strips $el_text and value/placeholder/aria-label attrs from any
    input, textarea, password input, or contenteditable element. PostHog
    autocapture does not capture input.value by default, but $el_text
    on a <textarea> reflects the typed content — that's the prompt
    body for us, so it has to be scrubbed every time.
  - $pageview / $pageleave: drops query string and fragment from
    $current_url / $referrer so any future ?q=… can't leak.
  - $exception: rewrites file:// and absolute filesystem paths in
    stack frames to app://apps/<repo-relative> so we don't ship the
    user's home directory.
  - Suppresses $opt_in entirely — duplicate of our explicit
    setConsent toggle in App.tsx.

Element-level defense in depth is limited to the single most sensitive
surface: the chat composer textarea gets `ph-no-capture` so PostHog
never even generates an event for clicks inside that subtree. Every
other input relies on scrub.ts — sprinkling the class through every
form would be noisy and easy to forget on new surfaces.

The existing Privacy → "Share usage data" toggle continues to gate
every new feature: posthog-js's opt_out_capturing() halts autocapture,
$pageview, $exception, web vitals, and dead clicks alongside the
explicit capture() calls — one global switch.

11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts.

* ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions

Adding posthog-js to apps/web and posthog-node to apps/daemon changed
pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by
sha256. The CI nix flake check failed with:

  specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc=
  got:       sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s=

Copy the new hash into both nix/package-web.nix and
nix/package-daemon.nix per the procedure documented in nix/README.md
§"First-build hash pinning".

* feat(analytics): unify PostHog identity with Langfuse installationId

PostHog's distinct_id is the installationId stamped by /api/analytics/
config; Langfuse already reads the same id off app-config.json to
populate trace.userId. With both sinks keying off the same anonymous
identity, dashboards can correlate user actions (PostHog events) with
LLM runs (Langfuse traces) without re-identifying.

Two gaps closed:

1. applyConsent(false) — clear posthog-js's persisted ph_*_posthog
   localStorage entry on opt-out via posthog.reset(). Without this, a
   user who opts out, then clicks Delete my data, then re-opts in
   would see PostHog stitch their new session to the deleted identity
   because bootstrap.distinctID only takes effect on first init.

2. applyIdentity(newInstallationId) — Delete my data rotates the
   installationId in app-config; App.tsx now watches config.installationId
   and calls posthog.reset() then identify(newId) so the next event
   batch is fully decoupled from the deleted one. Idempotent on
   same-id re-renders so benign config refreshes don't churn PostHog
   identities.

The fetch wrapper's x-od-analytics-anonymous-id header also flips to
the new id on rotation so daemon-side captures (run_created /
run_finished) land on the same person record from the very next API
call, not after a reload.

The end-to-end rotation flow is verified against a live PostHog
project; these unit tests pin the safety guards (no-client paths, null
inputs) since stubbing posthog-js's init-loaded callback chain is
brittle.

* fix(langfuse): require both metrics AND content consent for trace reports

Tightens the Langfuse gate so a user who shares anonymous metrics but
NOT conversation content stops emitting Langfuse traces entirely —
Langfuse is used for turn-quality evals which only make sense with
prompt/output bodies. PostHog (product analytics, content-free) stays
gated on `metrics` alone and is unaffected.

i18n: "Conversation content" → "Conversation and tool content" with
hints expanded to mention tool inputs/outputs so the consent surface
matches what the trace actually carries (en + zh-CN).

Bundled here per PR scope — change originated outside this PostHog
PR but lands cleanly on the same files; gating Langfuse strictly
on `content` makes the dual-sink consent model (PostHog = metrics,
Langfuse = metrics + content) symmetric across both i18n locales and
the daemon-side gate.

* feat(analytics): wire byok_provider_option + fix PR review P1s

Adds the BYOK protocol-chip click event (5-value provider_id mirroring
the apiProtocol Settings UI) and resolves four P1 review threads on
PR #1428.

byok_provider_option:
- New SettingsClickByokProviderOptionProps in contracts (provider_id =
  anthropic|openai|azure|google|ollama; maps to CSV's 5 values per
  tracking-doc-issues.md §2.5).
- trackSettingsClickByokProviderOption helper in apps/web/src/analytics.
- SettingsDialog hooks it on the protocol-chip onClick alongside the
  existing setApiProtocol call; is_selected reflects whether the chip
  was already active.

Review fixes:

1. client.ts (Siri-Ray): clear `initPromise` when the resolution is
   null so a Privacy → metrics opt-in after a previous decline triggers
   a fresh /api/analytics/config fetch. Without this, the disabled
   response was cached forever — first-session opt-in needed a reload
   to start sending PostHog events.

2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a
   strict same-origin + /api/ pathname check (shared
   `isSameOriginApiCall` helper). Outbound third-party URLs containing
   `/api/` (e.g. provider.example.com/api/x) no longer receive our
   x-od-analytics-* headers.

3. provider.tsx (codex-connector, lefarcen): gate header injection on
   `resolvedAnonId` being non-null. When Privacy → metrics is off,
   /api/analytics/config returns enabled=false → resolvedAnonId stays
   null → wrapper never installs → daemon can't read consent-bearing
   headers → no daemon-side PostHog event. setConsent now also clears
   resolvedAnonId on opt-out and re-fetches on opt-in.

4. daemon/analytics.ts (defense in depth): createAnalyticsService now
   takes dataDir and capture() re-reads app-config to check
   telemetry.metrics inside the fire-and-forget wrapper. Even if a
   stale header somehow reaches the daemon after opt-out, the capture
   is dropped before posthog-node.capture is called.

* fix(web): place "Share usage data" on the right in privacy consent banner

Swap button order in PrivacyConsentModal and the in-settings ConsentCard
so the affirmative "Share usage data" lands on the right and "Not now"
on the left. Matches the OK-on-the-right pattern users expect for
primary actions.

Both buttons keep equal visual prominence (same .privacy-consent-action
styling) so the swap doesn't change the EDPB equal-prominence stance
called out in the original Langfuse telemetry spec.

* feat(analytics): populate run_finished token totals from claude-stream usage

Daemon's claude-stream parser already emits agent usage events with
input_tokens / output_tokens totals; the run service buffers them in
run.events and Langfuse reads them out the same way. The run_finished
PostHog event was leaving these fields empty.

Scan run.events for the most recent agent usage frame on terminal
transition and emit input_tokens / output_tokens / total_tokens when
present. token_count_source flips to 'provider_usage' only when at
least one count landed; runs without provider-side usage data keep
'unknown'.

Provider does not break the input down into the 7 sub-fields the
tracking doc lists (memory / context / attachment / system_prompt /
…); those stay omitted until a parser change exposes them.

* feat(analytics): estimate user_query_tokens from prompt length

The user_query_tokens field for run_created / run_finished was hardcoded
to 0. We can't tokenize without bundling a model-specific tokenizer, but
the character/4 heuristic is the industry-standard estimate when one
isn't available and is enough for funnel analysis (prompt-length cohorts,
short-vs-long-query conversion rates).

Extracted from req.body via the same telemetryPromptFromRunRequest
pattern the daemon already uses for langfuse-bridge (currentPrompt then
message fallback). Only the integer count goes to PostHog — the prompt
text itself never leaves the daemon.

token_count_source flips appropriately:
- run_created with a prompt: 'estimated' (was 'unknown')
- run_created with no prompt: 'unknown'
- run_finished with provider usage: 'provider_usage' (overrides
  baseProps' 'estimated' value)
- run_finished without provider usage: inherits 'estimated' or 'unknown'
  from baseProps so input/output absent doesn't mask the estimate.
2026-05-12 22:32:42 +08:00
pftom
72ecc09326 feat(daemon, web): enhance plugin input handling and subcategory filtering
- Updated the `pickPluginFields` function to support legacy input aliases, improving compatibility with existing plugin structures.
- Added tests to ensure the new input handling works correctly with legacy inputs when resolving plugin snapshots.
- Enhanced CSS styles for subcategory elements in the plugin home section, improving visual clarity and user experience.
- Introduced new tests for subcategory filtering within the active workflow lane, ensuring accurate plugin categorization.

This update significantly improves the plugin management experience by enhancing input handling and refining the categorization system.
2026-05-12 22:09:26 +08:00
pftom
26d21a942e feat(web): enhance plugin input handling and categorization
- Added support for plugin inputs in the EntryShell and HomeView components, allowing for more dynamic plugin interactions.
- Updated the PluginsHomeSection to include subcategory filtering, improving the user experience when navigating plugins.
- Enhanced the PluginsView and related components to reflect the new categorization model, transitioning to a workflow-based approach.
- Refactored tests to ensure coverage for new input handling and categorization features, maintaining reliability across the application.

This update significantly improves the plugin management experience by providing clearer categorization and enhanced input handling for plugins.
2026-05-12 21:59:38 +08:00
Eli
49ea2499ac
[codex] Add draw annotation workflow (#1435)
* feat(web): tweaks palette popover with HSL hue-shift recoloring

Adds a Tweaks color-palette popover to the HTML preview toolbar.
Selecting a palette re-skins the iframe in place via a srcDoc-side
bridge that walks the DOM and shifts every chromatic paint to the
target hue while preserving each color's saturation and lightness —
pale tints stay pale, bold CTAs stay bold, just in the new color
family. Mono-noir desaturates instead of shifting.

- runtime/srcdoc: new injectPaletteBridge + paletteBridge / initialPalette options
- file-viewer-render-mode: paletteActive flips URL-load back to srcDoc so the bridge can be injected
- FileViewer: state, popover, postMessage wiring, srcDoc + useUrlLoadPreview integration
- PaletteTweaks: popover UI with Original + Coral / Electric / Acid forest / Risograph / Mono noir
- PreviewDrawOverlay: stub pass-through until the draw branch lands

* feat(web): hide finalize-design toolbar from project header

* test(e2e): skip project actions toolbar flow after toolbar removal

* Add draw annotation workflow

* Restore project actions toolbar
2026-05-12 21:54:59 +08:00
pftom
67a109335d feat(web, daemon): enhance plugin categorization and introduce new export scenarios
- Updated the plugin categorization system to reflect a workflow-based model, replacing the previous category bar with a curated workflow bar (From source, Generate, Export).
- Added new export scenarios for Next.js, React, and Vue, providing users with starter plugins for downstream integration.
- Enhanced the HomeView and PluginsHomeSection components to support the new categorization and improve user interaction with plugins.
- Updated tests to cover new scenarios and ensure proper functionality across the updated plugin management system.

This update significantly improves the user experience by providing clearer categorization and new tools for exporting Open Design artifacts.
2026-05-12 21:50:19 +08:00
Neha Prasad
2d405fae96
fix:align artifact preview exit button (#1445) 2026-05-12 21:39:31 +08:00
Nagendhra Madishetti
09a8fa8d64
feat(web): Critique Theater Phase 8 (8 Theater components, barrel, role-keyed CSS) (#1314)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314)

The variant validator on the web SSE path previously accepted any
`typeof === 'string'` for closed-enum fields (ship.status,
panelist_*.role, degraded.reason, failed.cause, parser_warning.kind,
run_started.cast[]) and any `typeof === 'number'` for numeric fields,
which let NaN / Infinity through. Downstream components index i18n
tables by enum value, so an unknown status or role would land
`SHIP_BADGE_KEY[final.status]` on undefined and crash the translator.

The replay parser had a separate gap: `useCritiqueReplay.parseTranscript`
called the cheap `isPanelEvent` header check directly, so a recorded
line like `{"type":"ship","runId":"r"}` reached the reducer with
composite, status, round, artifactRef, summary all undefined and
TheaterCollapsed then called `final.composite.toFixed(1)` on undefined.

Resolution: move all wire-side validation into the contract guard.

- Export const arrays for the closed enums:
  SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS,
  ROUND_DECISIONS (PANELIST_ROLES already existed).
- Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the
  single deep validator: header (known type + non-empty runId) plus
  every variant-specific required field plus closed-enum membership
  plus Number.isFinite on every numeric field. Documented as the wire
  source of truth.
- Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent
  now relies entirely on the contract guard, and parseTranscript in
  useCritiqueReplay (which already uses isPanelEvent) gets the deeper
  validation for free.

Tests (TDD, red-first):

- packages/contracts/tests/critique.test.ts: 13 new cases pinning the
  strict guard directly (well-formed across every variant, every
  rejection path: unknown type, empty/non-string runId, unknown enum,
  non-finite numeric, missing variant field).
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for
  each closed-enum rejection on the wire path plus a positive sweep
  across every legal enum value across every variant.
- apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx:
  2 new cases for incomplete and unknown-enum transcript lines.

Verified:
- pnpm --filter @open-design/contracts test 4 files / 30 tests green.
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test 107 files / 976 tests green.

* fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4)

The strict guard from PR #1314 round 3 enforced enum membership and
Number.isFinite, but accepted any finite number where the contract
intends a specific domain: scale: 0 (ScoreTicker divides by it),
negative thresholds, fractional rounds, negative mustFix, etc.
ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline
CSS and divides by it for tick width, so a guard-passing scale: 0
shipped Infinity into the rendered style. Negative composite /
score values reached downstream code that assumes >= 0.

Resolution: mirror the daemon-side Zod domain constraints in the
runtime guard.

Three new helpers in packages/contracts/src/critique.ts:

  - isPositiveInt(v): integer with v > 0. Used for round, maxRounds,
    scale, protocolVersion (all 1-indexed in the orchestrator).
  - isNonNegativeInt(v): integer with v >= 0. Used for mustFix,
    position, bestRound. bestRound: 0 is the valid sentinel for
    'interrupted before any round closed'.
  - isNonNegativeFinite(v): finite number with v >= 0. Used for
    composite, score, dimScore, threshold. Threshold may be
    fractional (e.g. 8.5 on a scale of 10).

Cross-field check inside run_started: threshold <= scale (the daemon
Zod schema enforces this with an epsilon refine, the wire guard
matches the same intent).

Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts:

  - 22 new rejection cases across every numeric field that
    previously slipped through: scale: 0, negative scale, fractional
    scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0,
    fractional protocolVersion, negative threshold, threshold > scale,
    round: 0, fractional round, negative dimScore / score, negative /
    fractional mustFix, negative composite, ship round: 0, negative /
    fractional bestRound, negative interrupted composite, negative /
    fractional parser_warning position.
  - 3 positive boundary cases that must still pass: threshold == scale,
    fractional threshold within [0, scale], interrupted with
    bestRound: 0 (no round completed before interrupt), parser_warning
    with position: 0 (start of stream).

Verified:
- pnpm --filter @open-design/contracts build clean.
- pnpm --filter @open-design/contracts test: 4 files / 59 tests green
  (was 37 before the new domain cases).
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 110 files / 1004 tests green;
  no regression on Theater suite, sse validator, replay parser, or
  assistant-feedback widget tests.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-12 21:38:58 +08:00
pftom
6f818d971d feat(daemon, web): implement plugin folder installation and enhance atom worker registry
- Added a new API endpoint for installing plugins from specified folder paths, improving the plugin management experience.
- Introduced functions for normalizing and validating project plugin folder paths, ensuring robust error handling.
- Implemented a registry for built-in atom workers, allowing for dynamic signal aggregation during pipeline execution.
- Enhanced the `runStageWithRegistry` function to support multiple atom workers, merging their outputs with pessimistic logic.
- Updated the UI components to display plugin folder candidates and facilitate user interactions for plugin installation.
- Added tests for the new atom worker registry and plugin folder installation features, ensuring reliability and correctness.

This update significantly enhances the plugin installation process and the overall functionality of the atom worker system, providing users with better tools for managing plugins and their interactions.
2026-05-12 21:38:45 +08:00
pftom
61a68a34f8 feat(web): implement Home intent rail for streamlined project creation
- Introduced a new Home intent rail in the HomeHero component, allowing users to select project categories or migration shortcuts via chips.
- Enhanced the EntryShell and HomeView components to support project kind selection based on the chosen chip, improving the project creation flow.
- Added functionality for folder import and template selection, integrating with existing project creation mechanisms.
- Updated CSS styles for the Home intent rail to ensure a responsive and visually appealing layout.
- Implemented tests for the new chip interactions and HomeHero functionality, ensuring reliability and user experience.

This update significantly enhances the user experience by providing a more intuitive way to create projects and manage migrations directly from the Home interface.
2026-05-12 21:08:25 +08:00
pftom
ed2cbe171b feat(daemon, web): implement media generation scenario and enhance plugin handling
- Introduced a new `od-media-generation` scenario plugin for handling image, video, and audio projects, providing a default pipeline for media generation.
- Updated the `collectBundledScenarios` function to deduplicate scenarios and prefer canonical IDs for task kinds, improving plugin routing.
- Enhanced the `PluginsView` and `HomeHero` components to better display community and user-installed plugins, improving user experience.
- Refactored tests to accommodate the new media generation scenario and ensure proper functionality across plugin types.

This update significantly enhances the media handling capabilities and overall plugin management experience, making it easier for users to work with various media projects.
2026-05-12 20:54:33 +08:00
pftom
13d5598b0c feat(web, daemon): enhance plugin import functionality and UI components
- Added support for uploading plugins via zip files and folders, improving the plugin import process.
- Introduced a new `PluginImportModal` for a streamlined user experience when importing plugins.
- Updated the `PluginsView` to include disabled states for unfinished plugin areas, enhancing clarity for users.
- Refactored various components to utilize the new `resolvePluginQueryFallback` function for improved localization handling.
- Enhanced CSS styles for better visual feedback and responsiveness in the plugin import interface.

This update significantly improves the plugin management experience, making it easier for users to import and manage plugins effectively.
2026-05-12 20:46:17 +08:00
pftom
443aea72c5 feat(daemon, web): enhance plugin handling and UI integration
- Introduced a new plugin upload mechanism with file size limits and memory storage, allowing users to upload plugins directly.
- Implemented fallback logic for plugin application, ensuring projects can be created without explicit plugin requests.
- Enhanced the UI to support plugin selection and integration, including a new `PluginsView` component for managing plugins.
- Updated various components to utilize localized text for plugin queries, improving user experience across different languages.
- Added tests for new plugin functionalities and local skill loading, ensuring reliability and correctness.

This update significantly improves the plugin management experience, providing users with better tools for plugin integration and interaction.
2026-05-12 20:42:40 +08:00
lefarcen
7b191b5f85
fix: load Orbit templates from design templates (#1442)
(cherry picked from commit 988e727927)

Co-authored-by: shangxinyu1 <shangxinyu@refly.ai>
2026-05-12 19:38:16 +08:00
Rocky
a7e6e0dc3d
fix(web/AssistantMessage): update status block detail to latest value instead of skipping (#1413)
When using Open Design with an ACP agent (e.g. Devin for Terminal),
after selecting a non-default model in the picker, the model badge
under the conversation header kept showing the agent's initial default
(e.g. `model swe-1-6-fast`) instead of the running model. The
conversation header text and the agent itself reflected the selected
model correctly — only the badge UI was stale.

Root cause: `buildBlocks()` in this file de-duplicated consecutive
status events with the same label by SKIPPING the new one rather than
updating the existing block. The daemon emits two `status:
label='model'` events per turn — once after `session/new` returns
with the agent's default model, then again after
`session/set_config_option` succeeds with the user-selected model
(see `apps/daemon/src/acp.ts` ~lines 495 and 587). The dedupe kept
the first and skipped the second, so the badge stayed stuck on the
default.

Fix: update the existing block's detail to the latest value instead.
"Most recent detail wins" is more accurate than "first one wins
forever" for every status label that reaches this code path
(`'model'`, `'initializing'`, etc.; the filter at lines 1156-1162
already drops the labels we don't want to surface as badges:
streaming, starting, requesting, thinking, empty_response).

Adds two regression tests in
apps/web/tests/components/AssistantMessage.test.tsx:

- Two sequential `status: 'model'` events with different details
  render the second detail and not the first (the Bug A scenario).
- Two sequential events with the SAME detail still collapse to a
  single badge (no regression in the existing dedupe behavior).

`pnpm guard` clean; AssistantMessage tests pass 22/22. Manually
verified working in a local build against Devin for Terminal
`2026.5.6-4` with #1208 applied — badge now updates to the
selected model on every turn.
2026-05-12 19:28:35 +08:00
PerishFire
e6c5560884
Fix appearance accent color persistence (#1439) 2026-05-12 19:11:09 +08:00
pftom
3d6d434f11 feat(web): enhance TasksView and plugin preview components with new features and styles
- Updated `TasksView` to include a title row with a "Coming soon" label and added a preview note for user guidance.
- Enhanced `DesignSystemSurface` to fetch and display design system showcases, improving visual representation in the plugin home section.
- Refactored `PreviewSurface` to pass the `inView` prop to `DesignSystemSurface`, optimizing rendering behavior.
- Improved CSS styles for tasks and design system components, ensuring a cohesive and visually appealing layout.

This update significantly enhances user experience by providing clearer information and improved visual elements in the tasks and plugin previews.
2026-05-12 18:10:37 +08:00
pftom
9f4e76d507 feat(web): introduce integrations and tasks views with enhanced navigation and settings management
- Added new `IntegrationsView` and `TasksView` components to facilitate user interaction with integrations and task management.
- Updated the `App` component to manage initial tab states for integrations and settings navigation.
- Enhanced `EntryNavRail` to include navigation options for tasks and integrations, improving accessibility.
- Refactored `EntryShell` to support dynamic rendering of the new views and manage integration tab states.
- Improved CSS styles for the new views, ensuring a cohesive design and responsive layout.

This update significantly enhances the user experience by providing dedicated views for integrations and tasks, streamlining navigation and settings management.
2026-05-12 18:05:15 +08:00
Matt Van Horn
23218eacd9
refactor(settings): use tiled language picker instead of dropdown (#1406)
The Language section in Settings rendered a single-button dropdown
trigger that opened a floating menu. With one visible label and lots of
empty panel space, the layout misled users into thinking only one
language existed. Replace the dropdown trigger + portaled menu with an
inline tile grid that shows every locale at a glance and clicks
directly to switch.

Side effects of the new layout: the languageOpen / languageMenuRect
state, the dynamic placement effect, the resize-close effect, the
mousedown click-outside handler, and the languageRef are gone. The
global Escape handler no longer needs to guard against the menu being
open. CSS for .settings-language-picker, .settings-language-button,
.settings-language-menu, and .settings-language-option is replaced by
.settings-language-grid (auto-fill 180px minmax columns) +
.settings-language-tile.

Tests in SettingsDialog.execution.test.tsx that drove the dropdown
(click trigger → click menuitemradio → assert menu closed) are
rewritten to drive the tiles directly via the radio role.

Refs #1347
2026-05-12 17:49:04 +08:00
shangxinyu1
0220124e0f
test: add Memory and Routines coverage (#1400)
* test: align extended Playwright coverage with current UI behavior

* test: address extended suite review feedback

* test: fix Codex fallback config hydration in e2e

* test: add Memory and Routines coverage

* test: fix Memory and Routines component test typing

* test: include Memory and Routines e2e in extended suite
2026-05-12 17:48:56 +08:00
pftom
7464cb443a feat(web): enhance plugin media detail and modal components with improved layout and functionality
- Introduced a new custom stage container for media plugins, allowing for a unified modal experience across different media types (image, video, audio).
- Updated the `PreviewModal` to support custom ReactNode stages, enhancing the flexibility of media previews.
- Refactored the `PluginMetaSections` to include an optional heading for better clarity and organization of plugin information.
- Improved the close button design in various modals for consistency and better user experience.
- Enhanced CSS styles for modals and plugin details to ensure responsive design and visual consistency across the application.

This update significantly improves the usability and aesthetics of media-related modals, making it easier for users to interact with and understand plugin content.
2026-05-12 17:44:32 +08:00
Rocky
6c3fd86642
fix(daemon/acp): terminate ACP child after clean prompt completion (#1286)
* fix(daemon/acp): terminate ACP child after clean prompt completion (Bug B / #1265)

Some ACP agents (notably Devin for Terminal) keep the child process
alive after stdin closes, waiting for the next prompt. Open Design
spawns a fresh agent per chat turn and relies on child.on('close') to
finalize the run, so without an explicit signal-driven shutdown the
chat sits stuck in the 'working' state indefinitely.

Three small, targeted changes:

- apps/daemon/src/acp.ts: After a clean session/prompt response we
  schedule a 500ms grace period and then SIGTERM the child. This
  mirrors the pattern detectAcpModels() already uses after model
  discovery. The grace period leaves well-behaved agents that exit on
  stdin.end() unaffected.
- apps/daemon/src/acp.ts: New completedSuccessfully() method on the
  session handle reports whether the prompt resolved without a fatal
  error or abort, so the consumer can distinguish 'clean signal exit'
  from 'genuine signal failure'.
- apps/daemon/src/server.ts: child.on('close') now treats a SIGTERM
  exit as 'succeeded' when acpSession.completedSuccessfully() is true.
- apps/web/src/providers/daemon.ts: Trust the server's authoritative
  endStatus; the signal/non-zero-code safety net no longer overrides
  an explicit 'succeeded' status, so the chat doesn't surface a fake
  'agent exited with signal SIGTERM' error after a clean ACP run.

Daemon tests cover the SIGTERM grace timer, clean early-exit (timer
cleared), and completedSuccessfully() abort/error states. Manual UI
test on plain main + this fix confirms Devin chats now return to ready
automatically after Done · ...

* fix(daemon/connectionTest): treat ACP clean SIGTERM as success

Codex review on #1286 caught that the new SIGTERM in attachAcpSession
breaks ACP connection tests for agents that don't shut down on
stdin.end() (the exact Devin behavior the patch targets).

attachAgentStreamHandlers() in connectionTest.ts now also respects
acpSession.completedSuccessfully(), mirroring the same check we apply
in server.ts. Without this, a clean prompt response followed by our
SIGTERM would set winner.signal === 'SIGTERM', flip exitedCleanly to
false, and the connection test would report 'agent_spawn_failed'
even when the agent had returned a healthy response.

Also widened the AgentSpawnHandle type so completedSuccessfully is
visible on the structural type used inside connectionTest.ts.

All 56 daemon tests still pass; typecheck + guard clean.

* fix(daemon/acp): narrow ACP success-on-signal override to forced-SIGTERM

Looper review on #1286 caught that the success predicate was broader
than the SIGTERM case it was meant to handle. `completedSuccessfully()`
flips to true as soon as the ACP `session/prompt` response is
processed, but it does not say why the child later closed. With the
broad predicate, an ACP agent that returned a prompt result and then
exited with code 1 (or was killed by SIGKILL/SIGSEGV) was still marked
'succeeded', regressing the existing close-status behavior for genuine
post-response process failures.

Scope the override to the exact forced-shutdown shape this PR
introduces:

  code === null && signal === 'SIGTERM' && acpCleanCompletion

Applied to both `server.ts` (chat run finalization) and
`connectionTest.ts` (connection-test classification). Any other
post-response failure now falls through to 'failed' / 'agent_spawn_failed'
as before.

All 59 daemon tests still pass; typecheck + guard clean.

* fix(web/daemon): only bypass exit-code safety net on explicit server success

Looper review on #1286 caught that the previous web change trusted
`endStatus === 'succeeded'` absolutely, but `endStatus` can become
'succeeded' in two distinct ways:

1. The SSE end event explicitly carries `status: 'succeeded'`
   (authoritative server declaration).
2. The end event omits or has an invalid `status` field and the
   handler silently falls back to 'succeeded' as a local default.

Both produced `endStatus === 'succeeded'` in the existing code, so
the new safety-net bypass treated them identically. That regressed
backward compat: a compatible or older daemon emitting an end event
like `{code:1}` or `{code:null,signal:"SIGTERM"}` with no
`status` would suddenly skip the failure banner.

Track explicit success separately via `serverDeclaredSuccess`,
set true only when:

- The SSE end event has `status === 'succeeded'`, or
- The fallback `fetchChatRunStatus` REST path returns
  `status === 'succeeded'` (which the existing `isChatRunStatus()`
  guard already proves is explicit).

The safety net is now bypassed only on that explicit signal; the
local-fallback success path still reaches the exit-code/signal
check so real failures surface as before.

Adds three web-side regression tests in `apps/web/tests/providers/sse.test.ts`:

- Explicit `status: 'succeeded'` + SIGTERM → onDone called, no error
- End event with `{code:1}` and no `status` → onError surfaces
  'agent exited with code 1' as before
- End event with `{code:null,signal:'SIGTERM'}` and no `status` →
  onError surfaces 'agent exited with signal SIGTERM' as before

`pnpm guard` + daemon typecheck clean; 27/27 SSE tests pass (up
from 24).
2026-05-12 17:13:10 +08:00
Yiang Yiyan
5ff578dc8d
fix: support object-style question-form options (#1293)
* fix: support object-style question-form options

* fix: preserve stable option values in form submissions
2026-05-12 17:03:45 +08:00
Mason
2f51f3c1ae
feature: refine assistant artifact feedback (#1379)
* feature: refine assistant artifact feedback

* fix: clear hidden custom feedback reason

* test: update assistant feedback expectations
2026-05-12 17:00:42 +08:00
pftom
244e8b7981 feat(daemon): enhance plugin preview handling and add fallback mechanisms
- Implemented a new `collectPluginPreviewCandidates` function to gather potential HTML assets for plugins, improving the robustness of the preview endpoint.
- Introduced `discoverPluginHtmlAssets` to scan common directories for HTML files, ensuring that plugins with missing declared entries can still provide a valid preview.
- Updated the `/api/plugins/:id/preview` route to utilize the new candidate collection and discovery functions, enhancing the user experience by preventing blank tiles in the gallery.
- Added comprehensive tests for the new fallback logic to ensure reliability and correctness in various scenarios.

This update significantly improves the plugin preview functionality, ensuring users have access to valid previews even when manifest entries are outdated or missing.
2026-05-12 16:45:24 +08:00
pftom
2e27745800 feat(web): enhance PluginsHomeSection with improved plugin card visuals and layout
- Updated the `PluginsHomeSection` to change the section title from "Plugins" to "Community" and revised the subtitle for clarity.
- Refactored the `PluginCard` component to include a new `PreviewSurface` for dynamic rendering of plugin previews based on type (media, HTML, design).
- Introduced lazy loading for plugin previews using the `useInView` hook to optimize performance.
- Added new components for different preview types: `MediaSurface`, `HtmlSurface`, `DesignSystemSurface`, and `TextSurface`, enhancing the visual presentation of plugins.
- Improved CSS styles for a more responsive and visually appealing grid layout, accommodating up to five tiles per row on larger screens.
- Implemented a new `inferPluginPreview` function to classify and render plugins based on their manifest data.

This update significantly enhances the user experience by providing a more engaging and visually rich plugin gallery.
2026-05-12 15:51:39 +08:00
이용진
aeb6cde923
prevent duplicate saves and add template deletion (#1294)
* prevent duplicate template entries on repeated save

* add delete button to saved template list

Templates can now be removed from the template picker via a hover x button, calling the existing DELETE /api/templates/:id endpoint.

* add missing onDeleteTemplate prop in test fixtures

* add template deletion flow test for NewProjectPanel

* reject template names longer than 100 characters

* preserve original createdAt on template update
2026-05-12 15:48:04 +08:00
pftom
6cccccdc56 feat(web): refactor PluginsHomeSection to implement 3-axis faceted filtering
- Replaced the previous tag-based filtering with a new 3-axis model (SURFACE, TYPE, SCENARIO) for enhanced plugin discovery.
- Introduced a new `usePluginFacets` hook to manage the independent selection of facets and their AND-composition.
- Updated the `PluginsHomeSection` component to render facet rows and a Featured chip, improving user interaction and layout.
- Removed legacy categorization logic and associated files, streamlining the codebase.
- Enhanced CSS styles for the new layout and improved visual consistency across the plugins home section.
- Added tests to ensure the correctness of facet extraction and filtering logic.

This update significantly enhances the user experience by providing a more flexible and intuitive way to filter and discover plugins.
2026-05-12 15:29:59 +08:00
Eli
9c489aa045
feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select (#1161)
* feat(web): redesign Designs tab cards — covers, tags, overflow menu, multi-select

- Render real previews on project cards: HTML iframe / image / video / hashed gradient fallback with project initial; lazily fetches the project's primary file when metadata.entryFile is unset, prefers index.html → newest html → image → video.
- Live artifact card thumbnails embed the rendered artifact URL via sandboxed iframe.
- Replace the per-card close button with a `…` overflow menu (Rename, Delete) that opens on hover/click; click-outside and Esc close it.
- Add multi-select mode (toolbar toggle → checkbox per card → "N selected · Delete · Cancel" pill) with batch delete via the existing onDelete prop.
- Add a category tag to every card (Prototype / Live Artifact / Slide / Media) derived from project.metadata.intent / kind / skillId.
- Replace browser prompt() and confirm() with custom modals (rename input + danger-confirm) reusing the existing .modal shell.
- Add `more-horizontal` icon and 16 new i18n keys across all 18 locales (zh-CN/zh-TW localized; others fall back to English).

* test(e2e): update home delete flow for overflow menu + custom confirm modal

The previous flow targeted a per-card X button labelled "delete project <name>"
and asserted on a native `dialog` event. The card UI now exposes a `…` overflow
menu and a styled confirm modal, so reach delete via the menu and assert against
the modal's Cancel / Delete buttons instead.

* fix(web): harden Designs tab preview sandbox

* fix(web): hide Designs select mode in kanban
2026-05-12 15:08:22 +08:00
Eli
77f69257a7
feat(web): in-context comment thread for the artifact preview (#1276)
* feat(web): free-pin fallback in comment mode for unannotated artifacts

When the artifact has no data-od-id annotations, clicking in Comment
mode now posts a synthetic position-based target so the host opens a
popover at the click location. Daemon upsert validation requires a
non-empty selector/label, so the pin uses [data-od-pin=ID] and label
'pin'. Coordinates are document-space (viewport + scrollY) so pins
stay anchored after scroll/reload. Clicks on interactive elements
(a/button/input/textarea/select/label/contenteditable) keep their
native behavior and are not pinned.

* feat(web): tighten comment popover layout for free-pin and element targets

The popover header used to dump the raw elementId verbatim — fine for
data-od-id targets like 'hero-cta' but jarring for free-pins where
elementId is a synthetic 'pin-...' string. Branch on the prefix and
show 'Pin · at X, Y' for free-pins; keep the label + selection kind
for real element / pod targets. Replace the text 'Close' button with
an icon-only close affordance to match the popover-as-card visual.

Action row is now two right-aligned buttons (Comment + Send to
Claude) for element targets and (Add note + Send to Claude) for pod
targets, eliminating the three-button row that wrapped onto two
lines at narrow widths. The 'Remove' affordance for existing
comments stays left-aligned.

* feat(web): drop comments tab from chat sidebar

The chat sidebar's 'Comments' tab listed saved/attached preview
comments but duplicates the per-element popover already shown in the
artifact viewer. Hide the tab and its content while the right-side
comment thread panel takes over the same surface in-context. The
CommentsPanel / CommentSection components stay defined as dead code
for the moment so callers and translation keys remain valid; a later
pass can delete them.

* feat(web): right-side comment thread panel in board mode

Render a 320px CommentSidePanel anchored to the right of the
artifact preview whenever board (comment) mode is on. The panel
lists every saved preview comment for the current file with an
avatar initial, the element label (or 'Pin' for free-pin synthetic
ids), an Xd/Xh/Xm-ago timestamp, the note body, a Reply link, and
a checkbox.

Reply focuses the comment's element via liveSnapshotForComment so
the popover opens at the right anchor. Selecting one or more
comments via the checkboxes surfaces a 'N selected · Clear · Send
to Claude' action bar above the list; Send to Claude reuses the
existing onSendBoardCommentAttachments pipeline via
commentsToAttachments. The panel takes the place of the chat
sidebar's removed Comments tab so the thread lives next to the
artifact instead of behind a tab switch.

* feat(web): styles for right-side comment thread panel

Floating 320px panel anchored to the right edge of the artifact
preview with a scrollable comment list and a coral selection bar
that appears when one or more comments are checked. Selected items
get a coral tint; the reply / check / send-to-claude controls
match the popover's coral primary tone.

* feat(web): toast confirmation on comment save, close popover

After savePersistentComment succeeds, close the popover via
clearBoardComposer and surface a transient 'Comment saved' (or
'Pin saved' for free-pin targets) toast for 2.2s. Replaces the
previous behavior where the popover stayed open with an empty
draft after save, which left users uncertain whether the save
landed and forced an extra click to dismiss.

* feat(web): position the comment-save toast at the top of the preview

* feat(web): allow editing saved comment notes via the side panel

Rename the per-item 'Reply' affordance to 'Edit' (no thread model
exists yet, so reply was misleading) and pre-fill the popover with
the existing note when clicked. The save path goes through
onSavePreviewComment which the daemon implements as an upsert keyed
on (project, conversation, filePath, elementId), so the edit
overwrites the existing row's note without spawning a duplicate.

Also fall back to a snapshot synthesized from the saved comment's
own fields when the corresponding live target is no longer in the
iframe DOM (e.g. free-pin parents that were re-rendered), so the
edit path still works after artifact reloads.

* feat(web): hide already-sent comments from the side panel

After Send to Claude, the daemon flips the comment status from
'open' to 'applying' (and then 'needs_review' / 'resolved' /
'failed' depending on the run). Filter the side panel to status
=== 'open' so sent comments visibly leave the list — the user
gets clear feedback that the send landed and the panel stays
focused on actionable, un-sent items.

* feat(web): drop single-tab bar and conversation count badge

After the Comments tab was removed the chat header still rendered
a one-tab 'tablist' just for the Chat tab, which read as visual
noise without a sibling to switch between. Drop the tabs wrapper
entirely; the chat content stays mounted and the header now hosts
only the conversation-history affordance.

Also drop the numeric badge that overlaid the conversation history
button: counting open conversations next to a generic history icon
was easy to mistake for an unread / notification count. The dropdown
itself remains the canonical place to see and switch between past
conversations.

* feat(web): right-align chat header actions after tab bar removal

With the tabs wrapper gone, chat-header-actions sat flush left
because nothing was pushing it across the header. Add margin-left:
auto so the history / new-conversation / collapse buttons land at
the right edge, matching the design files / index.html tab row's
own right-aligned controls.

* feat(web): rename board-mode toggle to Comment with comment icon

The artifact preview toolbar's board-mode entry was labeled 'Tweaks'
with the tweaks icon, which collided with the palette Tweaks button
next to it and hid the comment capability behind a generic label.
Rename to 'Comment' with the comment icon and switch to the
viewer-action class so the button matches the surrounding toolbar
items (Edit/Draw) and the coral active state lands on the right
surface.

* fix(web): pass designTemplates to ProjectView in api-empty-response test

The test props for ProjectView were missing the designTemplates
prop that was added to Props in #955 (generic skills split). CI's
strict typecheck (tsc -b --noEmit) caught it; local runs that hit
project references differently did not. Pass an empty SkillSummary
array — matches the empty skills fixture for the same reason.
2026-05-12 15:05:08 +08:00
Eli
928079daf5
feat(web): consolidate Image/Video/Audio entries into a Media tab (#1167)
Reduces the New Project panel's top-level tab count by collapsing the
three media surfaces into a single Media tab with an inner segmented
control, and polishes the controls inside that tab so they stop
dominating the panel:

- Media tab + segmented (Image / Video / Audio) inside the panel body.
  Underlying ProjectKind branches and submission contract unchanged —
  the daemon still receives kind=image/video/audio.
- Model picker rewritten as a combobox: one trigger row + searchable,
  provider-grouped popover with Recommended badges. Replaces the flat
  grid of provider-grouped cards that scrolled past the fold once the
  fourth provider landed.
- Aspect picker compressed from a 5-card grid to a single row of
  segmented pills with mini ratio glyphs.
- Image surface no longer carries a free-form Style notes field; it
  was redundant with the prompt template + main prompt input.
- Live artifact tab locks fidelity to high-fidelity (the wireframe
  option is now hidden) — a wireframe live artifact doesn't make
  sense and the picker added noise.

i18n: adds tabMedia / titleMedia / model* keys across all 18 locales,
removes imageStyleLabel / imageStylePlaceholder. Tests + e2e selectors
updated to drive the new Media tab + segmented surface flow.
2026-05-12 14:52:03 +08:00
huyhoangnhh98
140a4e1ff6
Improve responsive preview and design handoff outputs (#1224)
* feat: improve responsive design handoff

* feat: refine cross-platform design outputs

Changelog:\n- Add auto-fit responsive preview behavior for tablet/mobile frames.\n- Add landing page and OS widgets metadata options with project header chips.\n- Strengthen prompt contracts for modern breakpoints, app-specific modules, CJX-ready UX, and final product surfaces.\n- Require cross-platform outputs to use separate platform files instead of tabbed demo selectors.\n- Add DESIGN-MANIFEST.json plus richer handoff guidance to daemon/client exports.\n- Update archive/export tests for manifest and responsive viewport matrix.

* feat: enforce screen-file design outputs

Changelog:\n- Enforce screen-file-first generation for landing pages, app screens, platform surfaces, and OS widgets.\n- Update design handoff and manifest exports so coding tools map each screen file to separate routes/surfaces.\n- Strengthen minimal-brief visual guidance to avoid monochrome or unstyled design outputs.

* fix: address responsive handoff review feedback

* fix: address handoff review blockers

* fix: preserve proxy auth and normalized export entry

* fix: narrow frame wrapper filter to directory paths only

* fix: make artifact save failure banner generic

---------

Co-authored-by: Huy Hoàng <macos@MacBook-Pro-Hoang.local>
2026-05-12 14:18:33 +08:00
pftom
5af84c09af feat(web): refactor PluginsHomeSection to use tag-based filtering and introduce PluginCard component
- Replaced the legacy tabbed categorization in `PluginsHomeSection` with a tag-driven approach, allowing dynamic filtering based on plugin tags.
- Introduced a new `PluginCard` component to encapsulate the rendering of individual plugin cards, improving separation of concerns and maintainability.
- Added a `usePluginCategories` hook to manage plugin visibility and filtering logic, enhancing the overall structure and testability of the component.
- Implemented a "More" pill for overflow tags in the filter row, improving user interaction with a cleaner UI.
- Updated CSS styles to support the new layout and improve visual consistency across the plugins home section.

This update significantly enhances the user experience by providing a more flexible and intuitive way to discover and interact with plugins.
2026-05-12 13:25:44 +08:00
pftom
9825b3ba1f feat(web): enhance entry view with responsive topbar and GitHub star integration
- Updated `EntryShell` to include a responsive topbar that collapses into a settings dropdown on narrow viewports, improving accessibility and user experience.
- Integrated `GithubStarBadge` to display the current star count for the GitHub repository, encouraging user engagement.
- Added a new `useGithubStars` hook to manage the star count fetching and caching, ensuring efficient updates without unnecessary API calls.
- Enhanced CSS styles for the topbar and avatar items to improve visual consistency and interaction feedback.
- Updated internationalization files to include translations for new UI elements related to the GitHub star feature.

This update significantly improves the user interface by providing a more dynamic and engaging entry experience.
2026-05-12 11:57:41 +08:00
pftom
45760a75aa feat(web): enhance entry view with API protocol and model switching
- Introduced `InlineModelSwitcher` to allow users to switch between CLI and BYOK modes, along with selecting the active model.
- Updated `App` component to handle API protocol and model changes, ensuring seamless configuration updates.
- Modified routing to support distinct views for home, projects, and design systems, improving navigation and user experience.
- Removed legacy `PluginsSection` from `NewProjectPanel`, streamlining the project creation process.
- Enhanced CSS styles for better visual consistency across updated components.

This update significantly improves user interaction by providing intuitive controls for managing execution modes and models directly from the entry view.
2026-05-12 11:25:06 +08:00
Nagendhra Madishetti
dbc94b83ed
feat(web): add thumbs-up/down feedback widget under completed assistant turns (#1288) (#1308)
* feat(web): add thumbs-up/down feedback widget under completed assistant turns (#1288)

Adds a lightweight feedback widget that surfaces under each
assistant turn whose run succeeded. Users can submit positive or
negative feedback in one click; the negative path opens an optional
free-text comment area. The widget never blocks the message
composer and only mounts after the run has produced its final
artifact, matching the acceptance criteria.

What ships

- `<MessageFeedback>` (apps/web/src/components/MessageFeedback.tsx)
  renders the three states: idle (prompt + thumbs), submitted
  positive (confirmation + Change), submitted negative
  (confirmation + optional comment textarea + Send + Change).
- AssistantMessage.tsx slots the widget under AssistantFooter,
  gated on `runSucceeded && !hasEmptyResponse`, so failed and
  empty-response turns don't ask the user to rate something that
  never finished.
- The full record shape leaves room for the future analytics
  metadata the issue calls out (rating, comment, submittedAt;
  artifactRef / runId derivable from the surrounding message
  whenever the analytics pipeline lands).

Persistence (v1 = localStorage)

Lefarcen's clarifying comment on the issue asked whether v1 should
be daemon-persisted or in-memory while the analytics pipeline is
defined. The daemon's messages table is column-strict, so daemon
persistence would require a SQLite migration plus a contract bump
on `ChatMessage`; locking that shape in before the analytics
pipeline is designed risks reworking it twice. localStorage is the
middle ground: feedback survives reload (so the "feedback state is
visually clear after submission" criterion holds across tabs and
sessions) without committing the wire shape. The hook surface is
just `(value, setter)`, so a future PR can swap the storage layer
for a daemon mirror or an analytics shipper without touching the
React surface.

The store handles corrupted JSON, unknown future rating values,
disabled storage (private-mode browsers), and broadcasts changes
across listeners in the same tab via a CustomEvent so two mounts
of the hook for the same messageId stay in sync.

i18n

11 new keys under `feedback.*` (prompt, thumbsUp/Down, two
confirmation chips, comment label/placeholder/submit/saved, change).
English source values authored alongside the keys; zh-CN
translations added in the same pass so the locale alignment test
stays green and Chinese users see Chinese strings from day one.
The other 16 locales pick up English fallbacks via their existing
`...en` spread.

Test coverage

- `tests/state/message-feedback.test.ts` (8 jsdom cases) — round-trip,
  null-clear, corrupted JSON, missing rating, unknown rating, key
  collision across messages.
- `tests/components/MessageFeedback.test.tsx` (7 jsdom cases) — idle
  state, positive submit, negative submit, comment save, blank-comment
  Send disabled, Change unsticks the rating, rehydration from
  pre-populated storage.

The locale alignment test continues to enforce that every locale
declares the new keys (5/5 across 18 locales).

Validated

- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- tests/i18n/locales.test.ts 5/5
- tests/state/message-feedback.test.ts 8/8
- tests/components/MessageFeedback.test.tsx 7/7
- Full web suite: 98 files, 903 tests

* fix(web): tighten feedback widget gate + storage sync + textarea, add styles (PR #1308 review)

Addresses every P2/P3 from the codex + Siri-Ray + lefarcen reviews
on PR #1308, plus a couple of polish items the review surfaced
indirectly.

Visibility gate (lefarcen P2)

The gate was `runSucceeded && !hasEmptyResponse`, which also matched
text-only acknowledgements and question-form replies. The issue
scopes feedback to turns that produced a final artifact, so the
gate now also requires `produced.length > 0`. New AssistantMessage
suite (5 jsdom cases) pins: artifact -> shown, no-artifact -> hidden,
streaming -> hidden, failed run -> hidden, empty_response -> hidden.

Storage sync (codex P2 + lefarcen P2)

The previous broadcast contract was: write storage, dispatch a bare
CustomEvent, listeners re-read storage. That had two failure modes:

  - setItem throwing (private mode / quota / disabled storage) left
    the listener seeing null and clobbering the in-memory state the
    user just confirmed.
  - The clear path early-returned after removeItem and never
    dispatched, so a second mount of the same messageId stayed in
    the submitted state when the user clicked Change.

New contract: every successful OR failed write dispatches a
CustomEvent whose `detail.value` carries the new feedback record (or
null). Listeners apply the value directly without re-reading. Same-
tab sync survives storage failures and the clear path no longer
early-returns. Cross-tab still re-reads on the platform `storage`
event since that event has no detail. Two new storage tests pin the
new broadcast contract (positive + null) and the failed-setItem path;
two new component tests pin in-session confirmation under setItem
failure and two-mount Submit + Change synchrony.

Textarea draft fix (lefarcen P3)

The textarea used `draftComment || feedback.comment || ''` as its
controlled value, so erasing a saved comment snapped it back. The
draft is now exclusively the source of truth; a ref-backed effect
re-seeds the draft from feedback.comment whenever the rating
transitions (mount, idle -> negative, cross-mount sync). Send is now
enabled when `draftComment !== savedComment`, which lets the user
both edit and clear a saved comment. New component test pins erase+
Send actually removing a previously-saved comment.

Accessibility

The confirmation chip and "Comment saved" tag both gain
`role="status"` + `aria-live="polite"` so screen readers announce the
state transition. The thumb buttons keep their `aria-label`.

CSS (lefarcen P3)

The widget's `.message-feedback*` class set had no rules in
index.css, so it rendered with default browser controls. Added a
~130-line block that mirrors the surrounding chat pill/chip
vocabulary: bg-subtle background, border-pill confirmation chip,
accent-tinted positive state and amber-tinted negative state to
match the assistant-footer's data-unfinished pattern. Comment area
sits below the chip and wraps on narrow widths so the composer
isn't pushed off-screen on small panes.

Validated

- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- tests/state/message-feedback.test.ts 10/10 (was 8, +2 broadcast)
- tests/components/MessageFeedback.test.tsx 10/10 (was 7, +3 sync /
  storage-failure / clear-saved-comment)
- tests/components/AssistantMessage.test.tsx 5/5 (new file)
- tests/i18n/locales.test.ts 5/5
- Full web suite: 866 tests

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-12 11:10:28 +08:00
pftom
b55f171693 feat(web): redesign entry view with new navigation and home layout
- Introduced `EntryNavRail` for a streamlined left navigation experience, featuring primary actions and a brand logo.
- Created `EntryShell` to manage the entire home view layout, integrating the centered hero, recent projects, and plugins section.
- Developed `HomeHero` for user prompt input, allowing seamless interaction with plugins and project creation.
- Replaced the previous `PluginLoopHome` with a more cohesive `HomeView` that orchestrates plugin interactions and project submissions.
- Enhanced CSS styles for improved visual consistency across the redesigned components.

This update significantly enhances the user experience by providing a more intuitive and visually appealing entry point into the application.
2026-05-12 10:56:35 +08:00
Nagendhra Madishetti
1df3eca161
feat(web): Critique Theater Phase 7 — reducer + useCritiqueStream + useCritiqueReplay (#1307)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-12 10:45:07 +08:00
Nagendhra Madishetti
64510b790b
fix(web): translate Design Files refresh strings instead of hardcoding English (#1254) (#1300)
* fix(web): translate Design Files / live artifact refresh strings instead of hardcoding English

When the app language was set to Chinese, the Design Files refresh
flow showed Chinese for the surrounding chrome but kept English for
every label and message originating in describeRefreshStatus,
describeEventPhase, and the refresh-event timeline body of
LiveArtifactRefreshHistoryPanel. Same-screen mixed-language UX, the
exact symptom reported in #1254.

Root cause: those three sites bypassed i18n entirely. describeRefreshStatus
returned hardcoded English label + description strings for the
running / succeeded / failed / idle / never statuses;
describeEventPhase returned hardcoded Started / Succeeded / Failed
labels; the timeline body inlined "Refresh started…",
"<n> source(s) updated", and "Refresh failed." string literals; and
the empty-timeline copy ("No refresh activity yet in this session.
Trigger Refresh to record a timeline…") was hardcoded too.

Fix: thread the existing TranslateFn through both helpers, swap every
hardcoded string for a t() lookup, and pull the empty-timeline copy
and the failure-fallback through the same path. Added 13 new keys
under liveArtifact.refresh.* — statusRunning, the five
*Description keys, three event-phase labels (eventStarted/Succeeded/Failed),
eventStartedDetail, sourcesUpdatedOne/Many with an {n} placeholder,
and timelineEmpty. Status labels for succeeded / failed / ready / never
already had keys (statusSucceeded / statusFailed / statusReady /
statusNever) so those are reused unchanged.

Locales: full Chinese translations added to zh-CN.ts (the locale
directly named in the issue). The other 16 locales pick up English
fallbacks through their existing ...en spread, so the locale-key
alignment test stays green; native translations for those locales
can land via the usual locale-team passes without re-touching the
source code.

* fix(web): cover the rest of the refresh panel under i18n + add a zh-CN render test

Lefarcen's review on #1254 / PR #1300 surfaced that the first pass
only translated three helpers (describeRefreshStatus,
describeEventPhase, session timeline body) and left the rest of the
panel in English. Under a Chinese UI the panel still mixed
languages, which was exactly the regression the issue was filed for.

This commit threads t() through every user-visible refresh-panel
string the user would see in the Chinese flow:

- Hero block: "Last refreshed" label + "Never" empty state.
- Created / Last updated facts + their "Unknown" empty label.
- Persisted refresh history header, hint, empty-state copy.
- Persisted timeline status badge: succeeded / running / failed /
  cancelled / skipped now resolve through describePersistedStatus,
  which uses an exhaustive switch off LiveArtifactRefreshLogEntry's
  status union so a future contract addition trips tsc.
- Session activity header, hint.
- Document source header, hint, Type / Tool / Connector field
  labels.
- Advanced debug metadata summary + note line.
- "just now" relative-time fallback in the persisted timeline.

22 new i18n keys total (23 with the new
heroLastRefreshedNever distinct from statusNever); zh-CN strings
authored alongside the English source, every other locale picks
them up via its existing ...en spread and the locale-key alignment
test stays green.

Intentionally untranslated surfaces: raw daemon payloads inside
the <details> debug panel (event.step / refreshId / error.message
and the JSON.stringify dump), since those are agent / connector
identifiers and stack-trace style strings, not localised copy.
The debug summary heading itself is translated; if the
debug section should be hidden in localised primary flows, that
is a separate UX call worth its own issue.

Test coverage: new render test wraps LiveArtifactRefreshHistoryPanel
in I18nProvider initial="zh-CN" and pins the Chinese rendering of
every translated label, plus negative assertions that the formerly
hardcoded English literals are NOT present in the markup. With the
no-provider fallback returning English, the existing static-markup
tests can't observe the regression this PR is meant to fix; the
zh-CN render test is the only one that would have caught the
original gap and will catch the next one.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
locales.test.ts (5/5), FileViewer.test.tsx (69/69, +1 new zh-CN
test), full web suite (92 files, 841 tests).

* fix(web): route formatRelativeTime through Intl.RelativeTimeFormat so units localise

Lefarcen's second pass on PR #1300 caught the remaining hardcoded
English path: formatRelativeTime() still emitted units like `5s ago`
and `45m ago`, so Chinese users would see those strings inside the
otherwise-translated refresh panel. The function now takes the
active locale + TranslateFn and routes through
Intl.RelativeTimeFormat with style: 'narrow', numeric: 'always'.
That preserves the historical `5s ago` shape for English while
producing locale-correct output for every other locale (zh-CN gets
`5秒前` / `45分前`, with the right past / future suffix and word
order).

The `just now` carve-out (abs < 5s) keeps using
t('liveArtifact.refresh.justNow') since Intl's narrow output for
zero-delta reads awkwardly. A try/catch around the RTF constructor
falls back to 'en' if the runtime rejects the locale, so the
function is safe on engines with limited ICU data.

Callsites threaded through:
- LiveArtifactRefreshHistoryPanel hero metric (`lastRefreshedAt`)
- Session timeline event row (`event.startedAt`)
- Session timeline event time (`event.at`)
- LiveArtifactRefreshFact for the created / last-updated facts;
  the component now accepts optional `locale` + `t` props and the
  panel passes them in.

Test coverage extension:
- The existing zh-CN render test sets a real lastRefreshedAt
  (now - 45s) and real session-event timestamps, then asserts the
  Chinese past-tense suffix `前` appears AND the legacy English
  `Xs ago` / `Xm ago` shapes do NOT. That was the gap lefarcen
  pointed at: setting `lastRefreshedAt: undefined` couldn't see
  the regression because no relative-time formatting ran.
- Added a small second test for the lastRefreshedAt-undefined
  empty hero so the original `从未` coverage still pins.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
FileViewer.test.tsx (70/70, +1 new test), locales.test.ts (5/5),
full web suite (92 files, 842 tests).

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-12 10:38:07 +08:00
Caprika
5bd9763181
[codex] Improve Claude Code exit diagnostics (#1267)
* fix daemon claude diagnostics

* fix claude custom endpoint auth diagnostics

* fix project view api empty response test props

* fix claude diagnostic review gaps

* fix silent custom endpoint claude diagnostics

* fix claude diagnostic credential redaction

* fix quoted api key redaction

* fix claude diagnostic tail redaction

* fix silent claude configured profile diagnostics
2026-05-12 00:08:31 +08:00
pftom
4983fc3ac4 feat(web): implement file operations summary in assistant messages
- Added a new `FileOpsSummary` component to display a summary of file operations (read, write, edit) performed during an agent's run, enhancing user visibility into file interactions.
- Integrated the `FileOpsSummary` into the `AssistantMessage` component, allowing it to show a compact view while streaming and expand to a detailed list once the run completes.
- Created a new `file-ops` CSS style to manage the presentation of the file operations summary, including hover effects and status indicators.
- Developed utility functions in `runtime/file-ops.ts` to derive and count file operations from agent events, ensuring accurate aggregation of file interactions.
- Added tests for the `FileOpsSummary` component and the file operations logic to ensure functionality and prevent regressions.

This update improves the user experience by providing clear insights into file operations related to agent activities.
2026-05-11 23:25:38 +08:00
pftom
070b8b07c6 feat(web): enhance PluginLoopHome with plugin details modal and refined plugin rail
- Introduced a new `PluginDetailsModal` component to display detailed information about plugins directly from the PluginLoopHome interface, allowing users to preview queries, inputs, and other relevant data before applying a plugin.
- Updated the `ChatComposer`, `ChatPane`, and `InlinePluginsRail` components to support a single pinned plugin display when a project is created with a specific plugin, improving user experience by preventing unnecessary plugin selection prompts.
- Enhanced CSS styles for better visual presentation of plugin actions and details.
- Added tests to ensure the new functionality works as intended and to prevent regressions related to the plugin rail behavior.

This update streamlines the plugin selection process and enhances the overall user experience within the application.
2026-05-11 23:18:34 +08:00
pftom
b3dc3c3e0c feat(web): integrate applied plugin snapshot for enhanced user experience
- Added support for displaying an active plugin as a context chip in user messages when a project is created with a pinned plugin. This replaces the in-composer plugin rail to avoid re-prompting users for plugin selection.
- Introduced `applied_plugin_snapshot_id` in the database schema and updated relevant components (ChatComposer, ChatPane, ProjectView) to handle the new functionality.
- Implemented fetching of the applied plugin snapshot in ProjectView to ensure the active plugin is rendered correctly.
- Enhanced CSS for the plugin chip to improve visual presentation.

This change streamlines the user experience by providing context on previously selected plugins directly within the chat interface.
2026-05-11 22:53:40 +08:00
nettee
87a95b7fb4
Fix conversation run isolation (#1271) 2026-05-11 21:13:54 +08:00
Kaelz31
3524a43d18
fix: pretty-print JSON file previews (#1206)
* fix: pretty-print JSON file previews

* fix: avoid formatting JSON with unsafe numbers

* fix: preserve precision-sensitive JSON previews

* fix: preserve signed zero in JSON previews

* fix: scan JSON numbers without repeated slicing

---------

Co-authored-by: Kael S <YOUR_GITHUB_EMAIL_HERE>
2026-05-11 20:52:55 +08:00
eggward han
a0316d2599
fix(web): suppress autosave indicator for draft-only Connector key edits (#1232)
When the user typed a replacement Composio API key, the global Settings
autosave loop persisted `buildPersistedConfig(cfg)` — which intentionally
strips the in-flight secret — and then advanced the indicator through
'saving' -> 'saved' despite the key never actually being written. The
"All changes saved" status then contradicted the section-local "Save key"
gesture and eroded trust in the saved-state badge for a sensitive field.

The autosave effect now tracks the snapshot at the last successful save
(or the initial cfg on mount) and compares the next snapshot's persisted
shape against it via a new `isAutosaveDraftOnlyChange` helper. When the
only diffs since last save are fields that `buildPersistedConfig` strips
(today the Composio API key, generalizing to any future
save-on-explicit-confirm secret), the persist call is skipped and the
indicator settles to 'idle' instead of flashing 'saved'. The forced
media-provider sync path still runs because that is a real outbound
effect even when the persisted shape hasn't changed.

Refs #1187
2026-05-11 20:52:45 +08:00
nettee
be77dc0394
Default English resource i18n fallback (#1270) 2026-05-11 20:29:05 +08:00
pftom
1bdf765cf2 feat(daemon): enrich API responses with surface specs and add new flags
- Implemented `--schema` flag for `od ui show` to return only the JSON Schema of the surface.
- Enhanced the response of `GET /api/runs/:runId/genui/:surfaceId` to include the surface spec from the AppliedPluginSnapshot.
- Introduced new flags for daemon and library commands to improve command handling and parsing.
- Added tests for the new functionality, ensuring proper behavior of the enriched responses and flag handling.

This change supports headless interactions by allowing code agents to inspect surface contracts before responding.
2026-05-11 20:27:05 +08:00
Caprika
fb079d8115
Add reliable agent-browser skill (#1284)
* Add reliable agent browser skill

* Fix ProjectView delete conversation test props
2026-05-11 20:09:12 +08:00
PerishFire
1eb20e3807
fix(web): keep tweaks selection usable without annotations (#1268) 2026-05-11 20:06:49 +08:00
初晨
0f0d214298
fix(web): render static previews for sketch json files (#1060)
* fix(web): render static previews for sketch json files

* fix(web): tolerate malformed sketch text items

* fix(web): harden sketch preview parsing

* fix(web): preserve sketch items on round-trip

* fix(web): clear sketch files destructively

* fix(web): unblock unsupported sketch saves
2026-05-11 19:29:46 +08:00
Dongsen
12ce5ad38b
fix(web): ignore <artifact> tags inside markdown code spans and fences (#1132)
* test(web): add failing parser cases for <artifact> recitation in markdown code

Cover the three real-world prose contexts where the model legitimately
quotes the artifact tag without intending to emit one:

- inside an inline backtick span
- inside a fenced code block
- spread across streaming chunks crossing the fence boundary

Establishes the RED baseline before parser code-fence awareness lands.

* fix(web): ignore <artifact> tags inside markdown code spans and fences

The streaming artifact parser scanned the buffer with a raw indexOf,
guarded only by 'next char must be whitespace'. That meant any literal
<artifact ...> the model recited while documenting the protocol — even
inside backticks or a ```html fence — flipped the parser into artifact
mode, swallowed the rest of the reply from the chat UI, and (when a
matching </artifact> appeared in the recitation) silently wrote a
spurious file to disk via persistArtifact.

Replace findOpenTag with a linear scan that tracks fenced code blocks
(```) and inline code spans (`), skipping any <artifact prefix found
inside either. If the buffer ends mid-fence, return a partial match
anchored at the fence start so the next streaming chunk can resolve
the boundary without losing fence context.

Closes #1130.

* fix(web): match renderer fence/inline-code rules in artifact parser

Codex review on PR #1132 caught that the previous fix toggled inFence on
any triple-backtick run anywhere in the buffer, including mid-line, while
the chat renderer (apps/web/src/runtime/markdown.tsx) only treats ``` as
a fence when it occupies a whole line matching /^[ ]{0,3}```(\w[\w+-]*)?\s*$/.
That asymmetry would suppress a real <artifact> tag emitted after a prose
sentence like "the opening marker is ```html and the response then writes:".

Rework findOpenTag in three passes that mirror the renderer:

  1. Walk \n-terminated lines; only a line that matches FENCE_LINE_RE
     toggles fence state. Open fences without a close (or with an
     unterminated tail line) return partial so the next chunk can resolve.
  2. Collect inline code spans with /`[^`]+`/g — the same regex used by
     renderInline — so what the parser skips matches what the user sees as
     code. Unmatched trailing backticks after the last \n hold back.
  3. Find the first <artifact …> outside any skip range; preserve the
     existing partial-prefix tail handling.

Adds a regression test covering the exact case Codex reported.

* test(web): pin parser behavior on double-backtick and in-fence string literal recitation

Two cases raised in PR #1132 review:

- a real artifact tag wrapped in '``<artifact …>``' (double-backtick
  inline code span) should not be treated as a real artifact
- a fenced JS example whose body contains a string literal like
  'const fence = "```";' should not pop fence state early and let a
  later literal <artifact> be parsed as real

Both already pass on 96e88ca because the line-anchored fence regex and
the renderer-aligned inline regex handle them correctly. Pinning the
behavior so future regressions surface as test failures.

* fix(web): make stripArtifact markdown-aware to stop truncating literal recitations

The streaming artifact parser was hardened in 96e88ca to skip <artifact>
recitations inside backticks and fences, but the post-stream stripper at
AssistantMessage.tsx still ran a naive 'content.indexOf("<artifact")' over
the same text events. As reported by lefarcen on PR #1132, that meant
chat replies with literal protocol recitations could still get silently
truncated mid-explanation — even though the parser preserved them in the
text stream and the file panel was no longer polluted with ghost files.

Extract the renderer-aligned classification (FENCE_LINE_RE, INLINE_CODE_RE,
computeSkipRanges, rangeContains) into a single source of truth at
apps/web/src/artifacts/markdown-context.ts so the parser and the stripper
agree on what counts as code. Add apps/web/src/artifacts/strip.ts with a
markdown-aware stripArtifact that:

- ignores any <artifact open inside a fenced block or inline code span
- looks for </artifact> with the same skip-range filter, so a real open
  paired with a literal close inside backticks does not strip a literal
  body that is meant to render
- returns content unchanged when an open exists with no matching real
  close (the previous implementation sliced to end-of-string, which would
  nuke trailing prose on a malformed or still-streaming tag)

Refactor parser.ts to import the shared helpers; behavior preserved (all
seven existing parser tests still pass). New strip.test.ts covers six
cases including the empirically-verified inline-backtick regression.

* fix(web): align artifact stripper/parser fence rules with renderer exactly

Two gaps surfaced in review at a0bf05f:

- markdown-context.ts used a single FENCE_LINE_RE that allowed 0-3 leading
  spaces and reused the same pattern for opening and closing fences. The
  chat renderer (runtime/markdown.tsx:44 and :49) is asymmetric — opens
  with /^```(\w[\w+-]*)?\s*$/, closes with /^```\s*$/, and rejects any
  leading indentation on either side. Indented "   ```html" was being
  treated as a code fence even though the renderer keeps it as a paragraph,
  and a literal "```html" line inside an open fenced example was closing
  the skip range early — both could expose a real or literal <artifact …>
  to the wrong handler.
- stripArtifact discarded computeSkipRanges' unclosedFenceStart, so a
  fenced literal that ends at EOF without a trailing newline (very common
  for chat output) leaked the inner <artifact …> recitation to the
  stripper, reproducing the original #1130 truncation symptom on a
  narrower input shape.

Split FENCE_LINE_RE into FENCE_OPEN_RE / FENCE_CLOSE_RE with no leading
indentation, gate the fence state machine on the right side of the toggle,
and have stripArtifact extend skip ranges to end-of-content when a fence
is left open. Also tightened the parser's tail-line hold-back regex to
match the renderer's no-leading-space rule. Added regression tests for the
EOF-unclosed-fence case, the indented pseudo-fence (renderer treats as
paragraph, stripper must strip the real artifact), and a "```html" line
inside an open fence.

Refs nexu-io/open-design#1130

* refactor(web): align streaming tail-line fence guard with FENCE_OPEN_RE

The streaming parser's tail-line hold-back used a stricter local regex
(/^```\w*$/) than the renderer's FENCE_OPEN_RE (/^```(\w[\w+-]*)?\s*$/),
missing valid opener tails like ```c++, ```ts-, or ``` (trailing space).

In practice these tails are still held back by the unmatched-backtick
parity scan that runs immediately after — three backticks in a tail line
are odd, so firstUnmatched stays set and the parser holds from that
position. So this wasn't a runtime correctness bug, just a regex
divergence that future readers could trip on.

Drop the local regex and reuse FENCE_OPEN_RE so the tail check matches
the same shape the rest of the pipeline already uses. Pinned the
behavior with three new parser tests (`+`/`-` info-string suffix and
trailing-space tails arriving as the first chunk) — they pass at HEAD,
proving the parity scan was already covering these cases.

Refs nexu-io/open-design#1132 (lefarcen polish P2)

* fix(web): scope inline-code skip ranges per block and reject <artifact prefix-shared opens

INLINE_CODE_RE previously ran over the whole buffer, so an unmatched
backtick in one paragraph could pair with a backtick in a later
paragraph and create a phantom inline span that swallowed any real
<artifact …> between them. Mirror runtime/markdown.tsx by splitting the
buffer on fence / blank / heading / list / hr boundaries and running
INLINE_CODE_RE per block region instead.

stripArtifact accepted any unskipped `<artifact` substring as a real
open, while the streaming parser already required a following whitespace
character — so prose like `<artifactual>demo</artifact>` was being
truncated to `prefix  suffix`. Extract the parser's real-open guard into
isRealArtifactOpenAt and reuse it from both sides.

While reordering findOpenTag for the shared guard, also fix the related
hold-back ordering issue tracked at #1141: a stray tail-line backtick or
fence-opener prefix used to suppress an artifact already complete
earlier in the buffer. Scan for the earliest complete real open first,
then pick the earliest hold-back position only when no complete tag was
found.

Regressions pinned in parser.test.ts and strip.test.ts for both new
finding shapes.

* fix(web): keep HR-shaped lines inside paragraph regions for inline-code scanning

The previous walker closed inline-scan regions on lines matching the HR
regex, but `parseBlocks()` in runtime/markdown.tsx does not break a
paragraph on HR — its inner accumulation loop only breaks on blank /
fence / heading / ul / ol (runtime/markdown.tsx:95-104). HR is only an
HR block in the outer loop's first-look, never mid-paragraph.

So inputs like `intro \`\n---\n<artifact …>…</artifact>\n---\nclosing \``
are one paragraph in the renderer, whose two stray backticks pair to
cover the literal artifact recitation — but the walker was splitting on
the `---` lines, leaving the recitation outside skip ranges, and the
parser/stripper would treat it as a real tag.

Drop HR from the paragraph-break list (HR-shaped lines carry no
backticks of their own, so keeping them inside the surrounding region
is benign either way) and document the renderer-mirror rationale.

Regressions pinned on both sides.
2026-05-11 19:29:22 +08:00
Sid
156bf5a34e
fix(web): refresh home projects after deleting a conversation (#1202) (#1219)
The home design cards render their `Needs input` badge from the
cached `/api/projects` payload — App.tsx owns the `projects` state
and exposes a `refreshProjects` callback that ProjectView already
fires from every other state-changing branch (run end, live-artifact
events, project rename, etc.). The conversation-delete branch
silently skipped it: deleting a conversation that owned an unanswered
`<question-form>` flips the daemon-side flag, but the home view kept
showing the stale badge until the next manual reload.

Call `onProjectsRefresh()` immediately after a successful
`deleteConversation` API response (and only then — if the request
fails the cached state is still the truth and we must not pretend
otherwise). Adds `onProjectsRefresh` to the useCallback deps for
exhaustive-deps correctness; matches the pattern at the four
existing call sites in this file.

New regression coverage in
`apps/web/tests/components/ProjectView.deleteConversation.test.tsx`:
- triggers onProjectsRefresh after deleting a conversation
  (verified RED before this fix, GREEN after)
- does not trigger onProjectsRefresh when the delete request fails
  (defensive complement so a future "always refresh" refactor
  doesn't paper over a real failure with a stale-but-confident UI)
2026-05-11 19:29:09 +08:00