Commit graph

39 commits

Author SHA1 Message Date
kami
333a62cda6
fix: link od bin after fresh install (#2069)
* fix: link od bin after fresh install

* test: lock root od bin shim path

* test: cover root workspace deps in postinstall scan

* chore(nix): refresh pnpm deps hash
2026-05-31 04:36:49 +00:00
lefarcen
df8a0faff6
feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent

AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.

The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.

Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.

Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)

Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).

* fix(runtimes/amr): pin a concrete default model and bare openai ids

End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:

1. vela rejects `session/prompt` with `session/set_model must be called
   before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
   skips set_model whenever the picked model is the synthetic 'default'
   id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
   def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
   default option and `fallbackModels[0]`, which makes attachAcpSession
   always send a real `session/set_model` for AMR turns.

2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
   it forwards to opencode's openai provider. With OpenRouter-style ids
   like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
   `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
   The new fallback list ships the bare ids opencode's openai registry
   actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).

Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
  way real vela does, so a regression that silently goes back to
  model: 'default' would surface as a fatal error in tests instead of a
  hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
  no 'openai/' prefix in fallbackModels, and a negative case that
  asserts session/prompt fails when no model is set.
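
The two contracts above can be sketched as a small check over a fallback list. This is only an illustration, not the daemon's actual test: `ModelOption` and `findFallbackViolations` are hypothetical names, and the sample ids are the ones quoted in this message.

```typescript
// Hypothetical shape; the daemon's real model-option type may differ.
interface ModelOption {
  id: string;
}

// Returns one human-readable violation per entry that breaks either contract:
// no synthetic 'default' id, and no 'openai/' prefix (vela adds it itself).
function findFallbackViolations(models: ModelOption[]): string[] {
  const violations: string[] = [];
  for (const model of models) {
    if (model.id === "default") {
      violations.push(`"${model.id}": synthetic default would skip session/set_model`);
    }
    if (model.id.startsWith("openai/")) {
      violations.push(`"${model.id}": would be forwarded as openai/${model.id}`);
    }
  }
  return violations;
}

const bareIds: ModelOption[] = [{ id: "gpt-5.4-mini" }, { id: "gpt-5.4" }, { id: "gpt-5.4-fast" }];
const badIds: ModelOption[] = [{ id: "default" }, { id: "openai/gpt-5.4-mini" }];
```

A list made only of bare ids produces no violations; the second list trips both checks.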

Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.

Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.

* fix(runtimes/amr): substitute concrete model when chat run sends 'default'

A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:

  json-rpc id 3: session/set_model must be called before session/prompt

The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.

This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:

  1. If the caller picked a real id, pass it through.
  2. Else, if `def.defaultModelEnvVar` is set and the daemon process
     env has a non-empty value for it, return that (operator escape
     hatch — see below).
  3. Else, if the def's `fallbackModels` does NOT contain a 'default'
     id, return `fallbackModels[0].id`.
  4. Else, return the original value (the historic shape — defs that
     list 'default' themselves are untouched).

AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Worth noting the env var must live in the daemon's `process.env`
(Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver) — the new field's docblock spells this out.
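
The four resolver branches listed above can be sketched as follows. This is a simplified illustration, not the code in the daemon: the `AgentDef` and `Env` types are hypothetical, `env` is passed in explicitly rather than read from `process.env`, and treating `null` like 'default' is an assumption.

```typescript
// Hypothetical, simplified types for illustration only.
interface ModelOption {
  id: string;
}
interface AgentDef {
  fallbackModels: ModelOption[];
  defaultModelEnvVar?: string;
}
type Env = Record<string, string | undefined>;

const DEFAULT_MODEL_ID = "default";

function resolveModelForAgent(def: AgentDef, resolved: string | null, env: Env = {}): string | null {
  // 1. A real id picked by the caller passes through untouched.
  if (resolved !== null && resolved !== DEFAULT_MODEL_ID) return resolved;

  // 2. Operator escape hatch: a non-empty value in the named env var wins.
  if (def.defaultModelEnvVar) {
    const override = env[def.defaultModelEnvVar]?.trim();
    if (override) return override;
  }

  // 3. Defs that do not list 'default' get their first concrete fallback id.
  const listsDefault = def.fallbackModels.some((m) => m.id === DEFAULT_MODEL_ID);
  const first = def.fallbackModels[0];
  if (!listsDefault && first) return first.id;

  // 4. Historic shape: defs that list 'default' themselves are left alone.
  return resolved;
}
```

Under these assumptions, an AMR-like def with `fallbackModels: [{ id: 'gpt-5.4-mini' }]` resolves 'default' to `gpt-5.4-mini`, or to the value of `VELA_DEFAULT_MODEL` when that is set in the env object.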

Coverage:
- `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all
  four resolver branches plus the env-override happy path / fallback /
  ignore-when-user-picked-a-real-id case.
- `pnpm --filter @open-design/daemon typecheck` clean.

* chore(runtimes/amr): move AMR to the top of the base agent list

So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.

* feat(amr): Sign-in / Sign-out button on the AMR Settings card

The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:

  - GET  /api/integrations/vela/status   reads ~/.vela/config.json
    for the active profile and returns { loggedIn, profile, user }
    (without leaking the runtime/control keys themselves).
  - POST /api/integrations/vela/login    spawns `vela login` once
    (409 if one is already in flight). The vela CLI opens the user's
    browser to the device-authorization page itself — Open Design
    only needs to kick the subprocess off.
  - POST /api/integrations/vela/logout   removes ~/.vela/config.json
    so the next status read returns logged-out.
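
A minimal sketch of the status projection, to make the no-secrets rule concrete. The on-disk layout of `~/.vela/config.json` is not shown in this message, so `VelaConfig`, `VelaProfile`, and the `activeProfile` / `profiles` field names are assumptions; only the `{ loggedIn, profile, user }` result shape and the "no runtimeKey means logged-out" rule come from the text.

```typescript
// Hypothetical config layout; the real file may be structured differently.
interface VelaUser {
  email?: string;
  plan?: string;
}
interface VelaProfile {
  runtimeKey?: string;
  controlKey?: string;
  user?: VelaUser;
}
interface VelaConfig {
  activeProfile?: string;
  profiles?: Record<string, VelaProfile>;
}
interface VelaStatus {
  loggedIn: boolean;
  profile: string | null;
  user: VelaUser | null;
}

// Builds the public projection field by field so keys can never leak
// through an accidental spread of the whole profile object.
function projectVelaStatus(config: VelaConfig | null): VelaStatus {
  const name = config?.activeProfile ?? null;
  const profile = name !== null ? config?.profiles?.[name] : undefined;
  const user = profile?.user;
  return {
    loggedIn: Boolean(profile?.runtimeKey),
    profile: name,
    user: user ? { email: user.email, plan: user.plan } : null,
  };
}
```

Copying named fields instead of spreading the profile is the design choice that keeps `runtimeKey` and `controlKey` out of the payload even if the config gains new secret fields later.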

`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(nested interactive elements). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.

i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.

Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
  against a tmp HOME — including the negative case where a profile
  has user info but no runtimeKey (still logged-out), and the
  secret-leak guard ("rt-secret-*" must not appear in the projection
  payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
  states (logged-out, logging-in, logged-in, logging-out) plus the
  click-propagation invariant the divergent card was built to keep.

`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.

Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.

* fix(connection-test): substitute model for AMR before attachAcpSession

The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, which made the Test Connection button on the AMR Settings
card deadlock with the same `session/set_model must be called before
session/prompt` error the chat-run path already handles — surfaced as a
permanent "Testing connection…" spinner in the UI.

Reuse the same helper here so Test Connection mirrors chat-run behavior.

* test(amr): three-layer end-to-end coverage for the AMR login + turn flow

The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:

1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
   Spins up the real daemon Express app via `startServer({port:0,...})`,
   persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
   and exercises every /api/integrations/vela/* endpoint against the
   extended fake-vela stub:
     - status reads ~/.vela/config.json under various states
     - login spawns the fake, waits for config.json to appear, returns
       pid + startedAt + profile
     - 409 already-running guard with the stub's delay knob
     - logout removes the file (idempotent)
     - secrets (runtimeKey / controlKey) never leak in the projection
     - login → status round-trip flips loggedIn=false → true

2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
   Boots a namespaced daemon + web pair through `createSmokeSuite`,
   inlines a self-contained fake `vela` binary that handles BOTH
   `vela login` (writes ~/.vela/config.json) and
   `vela agent run --runtime opencode` (ACP stdio with the
   `session/set_model must precede session/prompt` gate the real binary
   enforces), then drives a complete /api/runs lifecycle for
   `agentId: 'amr', model: 'default'` and asserts the assistant message
   captures the fake's streamed text. This is the test that would have
   surfaced today's plugin-default-model regression (the `set_model
   before prompt` error) at PR time instead of demo time.

3. e2e/ui/amr-login-pill.test.ts (Playwright)
   Mocks /api/agents + /api/integrations/vela/{status,login,logout}
   to drive the Settings AMR card through the full Sign in → Signed in
   → Sign out cycle. Pins the AmrLoginPill polling contract and the
   aria-label semantics (the pill's accessible name is "Sign out" once
   logged in, regardless of which label the hover-state text shows).

fake-vela.mjs extensions:
   - Handles `vela login` argv by writing
     ~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
     mirrors real vela's on-disk side-effect without the device-auth
     loop.
   - FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
     in-flight state of the spawn lifecycle.
   - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
     user fields end-to-end.

Validated:
   - `pnpm guard` + `pnpm typecheck` (all workspace projects)
   - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
     including the new 8-test integration suite.
   - `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
   - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
     1 / 1 passing (6.7s).

* feat(amr): package native cli and refine login ui

* feat(amr): wire vela cli beta packaging

* docs(amr): document vela ci packaging review

* docs(amr): refine vela ci integration review

* fix(ci): refresh nix pnpm dependency hashes

* fix(pack): clean up Vela CLI packaging

* fix(pack): bundle Vela CLI support files

* fix(amr): recover login attempts from stale auth state

* test: expand AMR and automations coverage

* fix(amr): address review follow-ups

* test(web): align tasks fixtures with contracts

* fix(daemon): type wildcard route params

* fix(ci): refresh PR merge validation

* fix(amr): clear env credentials on logout

* feat(settings): inline local CLI model configuration

* fix(amr): recognize daemon env credentials

* [codex] Fix Vela companion packaging (#2979)

* Fix Vela companion packaging

* Update Nix pnpm dependency hashes

* [codex] Surface AMR account failures (#2980)

* fix: surface AMR account failures

* fix: cover AMR recovery error guidance

* chore: bump beta base version to 0.8.1 (#2990)

* Fix AMR profile and packaged runtime review issues

* Detect packaged AMR OpenCode companion tree

* feat(web): polish AMR frontend flows

* Polish AMR onboarding card

* fix: read AMR login state from dot-amr config (#3048)

* test: tighten AMR credential and packaging coverage

* test: restore AMR executable test env helper

* [codex] Fix packaged mac Dock identity and AMR label (#3076)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR live models and dot-amr login state (#3073)

* fix: read AMR login state from dot-amr config

* fix: load live AMR models before runs

* fix: point AMR onboarding link to production wallet

* fix: address AMR model review feedback

* fix: persist live AMR model fallback

* [codex] Fix AMR link catalog model ids (#3088)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR link catalog model ids

* Fix AMR model normalization typecheck

* Use live AMR model for default runs

* fix: polish AMR runtime settings UI

* Accelerate AMR startup defaults (#3092)

* Surface AMR insufficient balance wallet URL (#3099)

* fix(web): polish onboarding controls (#3112)

* fix(web): show CLI scan loading state

* Avoid duplicate AMR wallet recharge links (#3117)

* Avoid duplicate AMR wallet recharge links

* Use Vela CLI 0.0.3 test package

* chore(nix): refresh pnpm deps hash

* Fix AMR wallet guidance display

---------

Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>

* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)

* chore(nix): refresh pnpm deps hash

* chore(pack): pin Vela CLI 0.0.3

* chore(nix): refresh pnpm deps hash

* fix(web): suppress AMR exit 130 fallback (#3136)

* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)

* feat(web): nudge users to hosted AMR on model/auth/quota failures

When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.

Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.

* feat(web): rework failed-run AMR guidance into per-case error UI

Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:

- The error card is now neutral gray (was red) and always carries a
  retry button; it is driven by the persisted per-message error event so
  it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
  card under the error card offers "switch to AMR & retry" — switches the
  run to AMR, opens Settings on the AMR card, and auto-retries once the
  account signs in (ProjectView polls vela login status, independent of
  the Settings pill lifecycle, with success / 5-min-timeout / unmount
  exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
  wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
  pulses, and an authorize-button coachmark (a fake hand cursor that
  rises in and dismisses on hover) points at the sign-in control when not
  yet authorized.

analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.

* fix(daemon): require status context for numeric service-failure codes

Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.

* fix(web): keep error pill for failed runs ChatPane's card doesn't cover

Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.

* fix(daemon): don't treat a process exit code as an HTTP status

Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.
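
The status-context rule from this fix and the earlier one can be sketched with a few regular expressions. This is an illustration under assumptions, not the daemon's classifier: the pattern list, the reason phrases, the function names, and the mapping of 5xx codes to UPSTREAM_UNAVAILABLE are all hypothetical.

```typescript
type FailureCode = "AGENT_AUTH_REQUIRED" | "RATE_LIMITED" | "UPSTREAM_UNAVAILABLE";

// Each pattern captures a three-digit number only when explicit context
// surrounds it. A bare number, or a bare "code N", matches none of them.
const CONTEXT_PATTERNS: RegExp[] = [
  // "HTTP 500"
  /\bhttp\s+(\d{3})\b/i,
  // "status 503", "status code 429", "error code 500", "response code 502"
  /\b(?:status(?:\s+code)?|(?:error|response)\s+code)\s*[:=]?\s*(\d{3})\b/i,
  // punctuation-bound "code: 401"
  /\bcode\s*[:=]\s*(\d{3})\b/i,
  // "502 Bad Gateway" and similar reason phrases
  /\b(\d{3})\s+(?:bad gateway|service unavailable|too many requests|unauthorized)\b/i,
];

function statusWithContext(line: string): number | null {
  for (const pattern of CONTEXT_PATTERNS) {
    const match = pattern.exec(line);
    if (match && match[1] !== undefined) return Number(match[1]);
  }
  return null;
}

function classifyServiceFailure(line: string): FailureCode | null {
  const status = statusWithContext(line);
  if (status === 401) return "AGENT_AUTH_REQUIRED";
  if (status === 429) return "RATE_LIMITED";
  if (status === 500 || status === 502 || status === 503) return "UPSTREAM_UNAVAILABLE";
  return null;
}
```

With these patterns, `exit code 401`, `line 500`, and `read 502 bytes` all classify as null, while `HTTP 500`, `status 503`, `code: 401`, and `502 Bad Gateway` are recognized.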

* chore(web): translate AMR card / error keys for 16 remaining locales

PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.

* fix(amr): address review feedback on #2355

Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.

- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
  verifies the inner `opencode` executable exists + is runnable, not
  just the directory. This closes the false-positive availability path
  that let `detectAgents()` surface AMR as available even when the
  packaged companion was empty / partially copied (mrcfps, 4 threads).

- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
  the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
  stale `opencode` on the user's PATH, so packaged AMR builds can't be
  hijacked by a global installation.

- web/EntryShell.tsx: when the Local CLI scan returns an available
  agent and the previously-selected agent is AMR, switch the selection
  to the first available local agent so the runtime and persisted
  agent agree before Continue.

- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
  BEFORE rejecting on an empty live-model catalog — a signed-out user
  was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
  the correct `AMR_AUTH_REQUIRED` (sign-in affordance).

- server.ts (default model fallback): if the user asked for the AMR
  agent default and the cached id is no longer in the FRESH catalog,
  fall back to `liveModels[0]` from the probe instead of rejecting the
  run as `AMR_MODEL_UNAVAILABLE`.

- integrations/vela.ts: route `vela login` through
  `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
  shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
  verbatim args (matches `execAgentFile` / chat-run spawning).

- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
  the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
  to the container-side path. The host path was being passed in as-is
  even though the default container only mounts /project, /tools-pack
  and cache/home — `copyOptionalVelaCliBinary` saw a missing path.
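
The first two `runtimes/executables.ts` items above (verify the inner executable, prefer the bundled copy) can be sketched together. This is a simplified illustration: `resolveOpenCode` and the injected `isRunnable` check are hypothetical, and only the `bin/libexec/opencode/opencode` layout comes from the message.

```typescript
// `isRunnable` stands in for a real filesystem check (exists + executable);
// it is injected so the sketch stays free of Node-specific imports.
type RunnableCheck = (path: string) => boolean;

function resolveOpenCode(
  resourceRoot: string | null,
  pathCandidate: string | null,
  isRunnable: RunnableCheck,
): string | null {
  // Prefer the bundled companion, but only if the inner executable itself
  // is runnable. An empty or partially copied directory does not count.
  if (resourceRoot) {
    const bundled = `${resourceRoot}/bin/libexec/opencode/opencode`;
    if (isRunnable(bundled)) return bundled;
  }
  // Otherwise fall back to whatever was found on the user's PATH.
  if (pathCandidate && isRunnable(pathCandidate)) return pathCandidate;
  return null;
}
```

Checking the executable rather than its parent directory is what closes the false-positive availability path described in the first item.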

Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
  UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
  for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
  premature today, since `@powerformer/vela-cli` only publishes the
  `darwin-arm64` platform binary; adding the flag elsewhere
  would fail the builds. Revisit once win/x64/linux binaries ship.

* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)

The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
a `const` binding stays in the temporal dead zone (TDZ) until its
declaration executes, that branch would have thrown `ReferenceError:
Cannot access 'sendAmrAccountFailure' before initialization` for the
exact users it tries to help, defeating the original intent.

Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.
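
A reduced sketch of the failure mode and the hoist. The function names and the closure are hypothetical and much smaller than the real handler; the closure is only there so the early call type-checks while still running before the `const` is initialized.

```typescript
// Pre-fix shape: the helper is declared with `const` below the code that calls it.
function preflightBeforeHoist(): string {
  const signedOutBranch = (): string => sendFailure("AMR_AUTH_REQUIRED");
  let outcome: string;
  try {
    // Runs while `sendFailure` is still uninitialized. On an ES2015+ target
    // this throws a ReferenceError (temporal dead zone).
    outcome = signedOutBranch();
  } catch (err) {
    outcome = err instanceof Error ? err.name : "unknown";
  }
  const sendFailure = (code: string): string => `sent ${code}`;
  return outcome;
}

// Post-fix shape: the helper is declared above every code path that uses it.
function preflightAfterHoist(): string {
  const sendFailure = (code: string): string => `sent ${code}`;
  const signedOutBranch = (): string => sendFailure("AMR_AUTH_REQUIRED");
  return signedOutBranch();
}
```

Only the hoisted version ever reaches the helper; the pre-fix version reports the name of the thrown error instead.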

Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch
uses packaged built-in Vela for AMR` was creating the
`<resourceRoot>/bin/libexec/opencode/` companion *directory* only, but
this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree`
also requires the inner `opencode` executable. Add it to that fixture
to match the new contract; the test was a sibling of the executables /
env-and-detection fixtures already updated in 13fc4f4.

Addresses #2355 review (mrcfps, 2026-05-28).

* feat(web): add hover cancel for AMR login (#3158)

* feat(web): add hover cancel for AMR login

* fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel

Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR
flagged the same race in the new local-cancel path: `handleCancelLogin`
dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately
after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener
unconditionally re-enters `refresh()` and then restarts polling
whenever `/api/integrations/vela/status` still reports
`loginInFlight: true`.

That is a real race because the daemon's `cancelVelaLogin()` only sends
SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` =
2000 ms) and keeps the child in `activeLoginProcs` until it actually
exits — so the first `/status` read after a successful cancel can
legally still come back as in-flight. Under that window the pill flips
back to 'Signing in…' and can later surface the timeout/error path even
though the user already canceled, defeating the behavior promised in
the PR description.

Fix the listener instead of every dispatch site: in the
`login-canceled` branch, after the local reset (stopPolling +
setPending(null) + clear refs), optimistically mark every subscribed
pill instance as not-in-flight (`setStatus((c) => c ? { ...c,
loginInFlight: false } : c)`) and `return` — skip the
refresh-and-reconcile branch below entirely. The next explicit refresh
(component mount, user interaction, or a `status-changed` event) will
pick up the daemon's confirmed state once the child has actually
exited.
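
The listener change can be sketched as a pure function that decides the next status and whether to refresh. This is an illustration only: `onLoginStatusEvent` and `ListenerResult` are hypothetical names, and the real listener also stops polling and clears refs, which is omitted here.

```typescript
interface VelaStatus {
  loggedIn: boolean;
  loginInFlight: boolean;
}
type LoginEvent = "login-canceled" | "status-changed";
interface ListenerResult {
  status: VelaStatus | null;
  shouldRefresh: boolean;
}

function onLoginStatusEvent(current: VelaStatus | null, event: LoginEvent): ListenerResult {
  if (event === "login-canceled") {
    // Optimistically mark not-in-flight and skip the refresh: the daemon may
    // still report loginInFlight=true until the child process has exited.
    return {
      status: current ? { ...current, loginInFlight: false } : current,
      shouldRefresh: false,
    };
  }
  // Every other event keeps the refresh-and-reconcile behavior.
  return { status: current, shouldRefresh: true };
}
```

Because the cancel branch never triggers a refresh, a stale in-flight `/status` response cannot restart polling after a local cancel.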

Add a focused regression test that holds `/api/integrations/vela/status`
at `loginInFlight: true` even after a successful `/login/cancel`,
asserting that the pill stays at the Canceled → Authorize sequence and
never bounces back to 'Signing in…'. This test fails on the pre-fix
listener and passes on the new behavior; existing
'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser
completion to Signed in after local cancel' tests continue to pass.

Addresses review feedback on #3158 (chatgpt-codex-connector, nettee).

---------

Co-authored-by: lefarcen <935902669@qq.com>

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
Co-authored-by: Amy <1184569493@qq.com>
Co-authored-by: Mason <jinmeihong0201@gmail.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-28 05:09:55 +00:00
Patrick A
7bc11b398d
chore(deps): upgrade express 4 -> 5 in daemon (#2311)
* chore(deps): upgrade express 4.22.1 -> 5.2.1 and @types/express

Breaking changes addressed:
- Renamed all bare wildcard route segments from * to *splat across
  src/server.ts, src/static-resource-routes.ts, src/project-routes.ts,
  src/import-export-routes.ts, and all three test stubs that define
  app.get/options/delete routes using /raw/* patterns
- Updated wildcard param access from (req.params as any)[0] / req.params[0]
  to Array.isArray(req.params.splat) ? req.params.splat.join('/') : String(...)
  to handle the Express 5 / path-to-regexp v8 change where wildcard params
  are now string[] instead of string
- Updated app.get('*') SPA fallback to app.get('/*splat') in server.ts
- Annotated five connector route handlers with Request<{ connectorId: string }>
  so the typed param resolves as string, not string | string[], fixing the
  10 TS2345 / TS2322 errors that surfaced when @types/express moved to 5.0.6
- Fixed two app.listen() beforeAll callbacks in origin-validation.test.ts to
  accept and propagate the optional Error argument Express 5 now passes to
  the listen callback, resolving TS2769 overload mismatch
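
The wildcard-param change above can be sketched as a small helper. `splatToPath` is a hypothetical name, and the argument to `String(...)` is elided in this message, so the empty-string default below is an assumption.

```typescript
// Under Express 5 / path-to-regexp v8 a `*splat` wildcard arrives as
// string[]; older code expected a single string.
type SplatParam = string | string[] | undefined;

function splatToPath(splat: SplatParam): string {
  // Join the segments back into a relative path; tolerate the string form.
  return Array.isArray(splat) ? splat.join("/") : String(splat ?? "");
}
```

A route handler would call this with `req.params.splat` in place of the old `req.params[0]` access.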

* chore(nix): refresh daemonHash for rebased lockfile

* fix(daemon): await res.sendFile() in async route handlers for Express 5 compatibility

Express 5 res.sendFile() returns a Promise. Without await, async route
handlers return before the response is sent, causing Express to call
next() and fall through to a 404. Add await to all res.sendFile() calls
in async handlers in static-resource-routes.ts and server.ts.

* fix(daemon): use readFile+send for spritesheet route instead of sendFile

Express 5 res.sendFile() returns undefined (not a Promise). ENOENT errors
call next() asynchronously after the route handler's try/catch has returned,
causing unhandled 404 responses. Replacing with fs.promises.readFile + res.send
keeps the error path fully within the handler's try/catch.

---------

Co-authored-by: Patrick A <259201958+eefynet@users.noreply.github.com>
2026-05-26 03:16:48 +00:00
lefarcen
c14baf07d3
Merge origin/main into release/v0.8.0
PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits
on top of 58 release-side commits accumulated during the 0.8.0 cycle.

Resolution summary:

Take main (theirs) where main carried deliberate forward progress:
- apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration:
  hardcoded English aria-labels/titles replaced with t() calls keyed
  on pluginCard.* (all 8 keys verified present in en.ts).
- apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion
  feature: sortedRoutines (newest-first), sourceIngestionTemplates,
  patchSourceForm, submitSourceIngestion. activeCount/pausedCount
  semantics preserved (now keyed on sortedRoutines, count unchanged).
- e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts
  imports needed by main-side test helpers.
- e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal
  helper block added by main.

Keep both sides where each added a different field to the same object
literal:
- apps/web/src/components/ProjectView.tsx (locale + analyticsHints
  spread).
- apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints).

Take release (ours) where release carried deliberate work that ships
0.8.0:
- CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's
  Unreleased section was the same body of work, now finalized.
- apps/landing-page/public/{apple-touch-icon,favicon}.png +
  apps/web/public/app-icon.svg — release-side visual refresh assets
  consistent with 0.8.0 stable ship.
- tools/pack/src/linux.ts — packageVersion const required by line 466;
  taking main's empty line would build-error.
- e2e/ui/project-management-flows.test.ts +
  e2e/ui/settings-api-protocol.test.ts +
  e2e/ui/settings-memory-routines.test.ts — release-side release-smoke
  hardening (shangxinyu1 + PerishFire) takes precedence on overlap.

Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.
2026-05-23 12:17:18 +08:00
YOMXXX
3e2f037730
feat(daemon): add CTA hierarchy static QA pass (refs #2251) (#2427)
* feat(daemon): add CTA hierarchy static QA pass

Introduce apps/daemon/src/qa/cta-hierarchy.ts exporting a pure
analyseCtaHierarchy(html) that parses generated prototypes with cheerio
and flags three precision-biased findings: multiple-primary CTAs in the
same section, ambiguous-weight (all CTAs share identical class + inline
style), and misleading-prominence (secondary-coded copy like "Learn
more" / "了解更多" styled with primary weight).

CTA candidates come from <button>, <a>, role="button" with btn/button/cta
class markers plus CTA copy keywords covering both English (Get started,
Sign up, Buy, Subscribe, Learn more, ...) and Chinese (立即购买,
立即下单, 了解更多, ...). Weight is inferred from class tokens
(primary/solid/filled/accent/cta) and from non-transparent inline
background-color, matching the inverse of the issue #2251 sample where
the header CTA was rendered with the neutral .btn style.

This PR only ships the pure function plus its tests. HTTP route, CLI
subcommand, and any auto-repair feedback loop are deliberate follow-ups
so the first cut can land without touching the daemon HTTP surface.

Refs #2251

* fix(qa): respect container boundaries in CTA hierarchy heuristics

Two precision fixes from review of #2427:

- computeContainerKey()'s parent fallback keyed by tag name alone, so
  flat layouts like <div><a class=btn-primary>...</a></div> repeated
  for sibling cards all landed in 'parent:div' and
  detectMultiplePrimary() reported a fake shared-section conflict on
  what is in fact one CTA per card. Switch to parent-node identity
  (positional index of the matched parent within its tag group, same
  trick the landmark branch already uses), so each sibling wrapper
  gets its own bucket.
- detectAmbiguousWeight() compared signatures across the entire
  document, so two unrelated sections each containing one '.btn' CTA
  with matching style would trigger 'ambiguous-weight' despite neither
  container having 2+ CTAs. The PR body's rule is narrower — 'every
  CTA in a container shares the same class + inline style' — so bucket
  by containerKey first and only emit the finding for containers with
  2+ CTAs whose signatures are identical.

Tests lock both behaviors down:
- sibling <div> card-grid without a landmark ancestor stays under the
  multiple-primary threshold;
- one-CTA-per-section pairs stay under the ambiguous-weight threshold.
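
The narrowed ambiguous-weight rule (bucket by container first, then require
2+ CTAs with identical signatures) might look roughly like the sketch below.
`CtaCandidate` and `findAmbiguousContainers` are hypothetical names, and the
signature format is an assumption; only the bucketing logic mirrors the fix
described above.

```typescript
// Hypothetical sketch; the real detectAmbiguousWeight() may differ.
interface CtaCandidate {
  containerKey: string;
  className: string;
  inlineStyle: string;
}

function findAmbiguousContainers(ctas: CtaCandidate[]): string[] {
  // Group CTA signatures by the container they belong to.
  const buckets = new Map<string, string[]>();
  ctas.forEach((cta) => {
    const signature = `${cta.className.trim()}|${cta.inlineStyle.trim()}`;
    const list = buckets.get(cta.containerKey) ?? [];
    list.push(signature);
    buckets.set(cta.containerKey, list);
  });

  // Flag only containers with 2+ CTAs that all share one signature.
  const flagged: string[] = [];
  buckets.forEach((signatures, key) => {
    if (signatures.length >= 2 && new Set(signatures).size === 1) {
      flagged.push(key);
    }
  });
  return flagged;
}
```

Two unrelated sections that each hold a single '.btn' CTA produce two
one-element buckets, so nothing is flagged.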
2026-05-22 16:53:14 +08:00
lefarcen
aedbb9dbe4 release: Open Design 0.8.0
Bumps 14 workspace package.json files from 0.7.0 to 0.8.0:
- root, apps/{web,daemon,desktop,landing-page}
- packages/{contracts,host,platform,sidecar,sidecar-proto}
- tools/{dev,pack,pr}, e2e

apps/packaged was already at 0.8.0 from the preview lane.
Independently versioned packages keep their own tracks.

Adds CHANGELOG [0.8.0] - 2026-05-20 entry covering the
305 PRs merged since 0.7.0 by 75 contributors:

- Plugin engine rebuild + Plugin Registry surface
- Headless by default (desktop is a thin wrapper around the CLI)
- Critique Theater Phases 9 through 16
- 149 design systems with structured tokens.css
- Italian locale + CJK font fallback
- Leonardo.ai, ElevenLabs, SenseAudio providers
- Windows packaged auto-update
- Visual refresh + Quick-brief discovery overhaul
- PostHog v2 analytics
- Manual edit UX overhaul
2026-05-20 21:22:17 +08:00
Patrick A
aa8f02dbac
chore(deps): upgrade posthog-node 4.18.0 -> 5.34.6 in daemon (#2309)
Breaking changes addressed:
- posthog-node v5 replaces axios/follow-redirects/proxy-from-env with
  @posthog/core (native fetch); no call-site changes required — the
  PostHog constructor signature, capture(), identify(), groupIdentify(),
  on(), and shutdown() surface used by apps/daemon/src/analytics.ts is
  stable across the major boundary.
- shutdown() is still async in @posthog/core (PostHogCoreStateless base
  class); the IPostHog interface in posthog-node types it as void but the
  inherited Promise<void> from @posthog/core keeps await client.shutdown()
  correct at runtime.
- protobufjs resolved version: 7.5.7 (pre-existing; posthog-node v5 does
  not pull in @opentelemetry, so no change to protobufjs from this bump).
2026-05-20 15:23:28 +08:00
lefarcen
80d305858b
feat(diagnostics): add one-click log export from Settings → About (#798)
* feat(diagnostics): add one-click log export from Settings → About

Adds a new "Export diagnostics" entry under the About section that bundles
daemon/web/desktop logs, machine info, and recent macOS crash reports into
a zip the user can share when reporting issues.

- Browser hits a new daemon HTTP endpoint and triggers a download.
- Electron uses an IPC bridge with the native save dialog and reveals the
  saved file in Finder/Explorer; the Help menu also exposes it as a
  fallback when the daemon is unresponsive.

Packaging and redaction live in a new @open-design/diagnostics package so
both surfaces share it. Sensitive JSON keys, URL query secrets, and the
current user's home path are redacted before packaging.

* build(nix): include packages/diagnostics in daemon build targets

The Nix daemon derivation builds workspace siblings in dependency order
before compiling apps/daemon. Without @open-design/diagnostics in that
list, the daemon TypeScript build fails inside the Nix sandbox with
`Cannot find module '@open-design/diagnostics'` because pnpm install
only creates the symlink — the dist output that the package.json
exports point at isn't produced until each sibling's build script runs.

* build(tools-pack): include @open-design/diagnostics in packaged INTERNAL_PACKAGES

Without this, packaged win/mac/linux builds fail with `npm error 404` when
the post-build `npm install --omit=dev --no-package-lock` step in the
assembled app tries to resolve `@open-design/diagnostics@0.2.0` from the
public npm registry. The package is workspace-private, so it has to be
tarballed via `pnpm pack` and file:-referenced from the assembled
package.json like every other internal workspace dep that daemon/desktop
depend on.

Also wires the package's `pnpm --filter ... build` into the pre-pack
workspace build step so the dist/ exists before pnpm pack runs, and
updates the two test fixtures (`win-app.test.ts`, `workspace-build.test.ts`)
that mirror INTERNAL_PACKAGES.

The diagnostics package itself is repinned to exact dependency versions
already used elsewhere in the workspace (`jszip 3.10.1`, `@types/node
20.19.39`, `esbuild 0.28.0`, `typescript 5.9.3`, `vitest 4.1.6`) so it
passes the new `pnpm guard` exact-version rule and produces a minimal
lockfile diff vs main (additions only, no resolution-string churn).

* fix(diagnostics): include `~` in bearer-token redaction char class

RFC 6750 token68 syntax allows `~`, so tokens like `Authorization: Bearer
abcd~efgh` were only partially matched by `HTTP_AUTH_SCHEME_RE`. The
regex stopped at the first `~`, leaving the tail (`~efgh`) un-redacted in
the exported diagnostics zip — a clear leak since this feature explicitly
generates support bundles for external sharing.

Add `~` to the character class and a regression test.
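
A minimal sketch of the fixed character class, assuming a simplified pattern:
`BEARER_TOKEN_RE` and `redactBearerTokens` are hypothetical stand-ins, and the
real `HTTP_AUTH_SCHEME_RE` likely covers more schemes than `Bearer`. The class
follows the RFC 6750 token grammar (letters, digits, `-`, `.`, `_`, `~`, `+`,
`/`, then optional `=` padding).

```typescript
// Hypothetical simplified pattern; not the package's actual regex.
const BEARER_TOKEN_RE = /\b(Bearer)\s+[A-Za-z0-9\-._~+\/]+=*/g;

function redactBearerTokens(text: string): string {
  // Keep the scheme name, replace the whole credential.
  return text.replace(BEARER_TOKEN_RE, "$1 [REDACTED]");
}
```

With `~` in the class, the match runs to the end of `abcd~efgh` instead of
stopping at the tilde, so no tail survives in the output.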

* fix(diagnostics): only collect renderer.log from desktop

`buildSidecarLogSources` unconditionally added `logs/${app}/renderer.log`
for daemon/web/desktop, but only the desktop runtime writes a renderer
log (see apps/desktop/src/main/runtime.ts) — daemon and web are pure
Node services with no Electron renderer. Every export therefore produced
missing-file placeholders and manifest warnings for the two phantom
paths, polluting the bundle.

Gate the renderer.log source on APP_KEYS.DESKTOP so the daemon-side
collector matches the desktop-side collector in apps/desktop/src/main/
diagnostics.ts:63.
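
The gate can be sketched as follows. The `APP_KEYS` values, the
`logSourcesFor` helper, and the `latest.log` file name are assumptions made
for illustration; only the desktop-only `renderer.log` condition comes from
the change described above.

```typescript
// Hypothetical sketch; key values and the base log name are assumed.
const APP_KEYS = { DAEMON: "daemon", WEB: "web", DESKTOP: "desktop" } as const;
type AppKey = (typeof APP_KEYS)[keyof typeof APP_KEYS];

function logSourcesFor(app: AppKey): string[] {
  const sources = [`logs/${app}/latest.log`];
  // Only the desktop runtime writes a renderer log.
  if (app === APP_KEYS.DESKTOP) {
    sources.push(`logs/${app}/renderer.log`);
  }
  return sources;
}
```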

* fix(diagnostics): mirror desktop-side renderer.log gate

The previous fix only updated the daemon-side `buildSidecarLogSources`
in `apps/daemon/src/diagnostics-export.ts`. The desktop-side collector
at `apps/desktop/src/main/diagnostics.ts` had an identical copy of the
same bug that I overlooked: it also unconditionally added
`logs/${appKey}/renderer.log` for daemon/web/desktop, producing
missing-file placeholders + manifest warnings for the two phantom paths
on every desktop-initiated export.

Apply the same `appKey === APP_KEYS.DESKTOP` gate here so both export
entry points (browser via daemon HTTP, Electron via native save dialog)
emit the same clean manifest.

* feat(diagnostics): add `od diagnostics export` CLI subcommand

AGENTS.md's dual-track capability-exposure contract requires every
user-facing feature to ship on both the web UI and the `od` CLI. The
diagnostics export was only reachable through Settings → About and the
desktop Help menu; this commit closes the loop with an `od diagnostics
export [<path>] [--json]` subcommand registered in SUBCOMMAND_MAP.

The CLI is a thin shell over the existing GET /api/diagnostics/export
endpoint — same zip output, same redaction, same crash-report scope.
Defaults to writing `open-design-diagnostics-<timestamp>.zip` in the
current directory; `--output <path>` or a positional arg overrides.
`--json` prints `{path, sizeBytes}` for shell pipelines.

Use cases this unlocks:
- A CI script can `od diagnostics export ~/artifacts/bundle.zip` after
  a failed run.
- Bug reporters on headless boxes can grab a bundle without booting
  the web UI.
- `od doctor` follow-ups can collect a full snapshot when a probe fails.

* fix(diagnostics): surface non-sidecar launch in manifest warnings

`buildSidecarLogSources()` returns `[]` when the daemon has no sidecar
runtime context, which is the standard `od` (plain) launch path —
`runDaemonCliStartup()` -> `startDaemonRuntime()` does not pass a
runtime. Settings → About and the new `od diagnostics export` previously
reported success but produced a bundle with only the summary JSONs, so
operators could not tell "no logs because plain launch" from "no logs
because something genuinely broke."

- Extend `DiagnosticsContext` with an optional upstream `warnings:
  string[]` that `buildManifest` merges into the manifest warnings.
- Emit STANDALONE_LAUNCH_WARNING from the daemon handler when
  `options.runtime == null`. The warning names the limitation and
  points the user at the sidecar entry points that DO capture logs.
- Add a regression spec at `apps/daemon/tests/diagnostics-export.test.ts`
  that drives the handler with `runtime: null` and asserts the warning
  surfaces in `summary/manifest.json` (and that `files` is empty so a
  user reading the bundle does not confuse "no log sources" with
  "missing files").
2026-05-20 09:10:51 +08:00
PerishFire
bd48c597b0
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions

* ci: enforce pinned dependency specs

* ci: fix pnpm executable invocation
2026-05-19 13:58:27 +08:00
lefarcen
b268bbe169 Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix
Brings in 11 new garnet commits, most importantly:
- 1a90aef4 feat(plugin-use): implement plugin use handoff functionality —
  fixes the bug QA reported where /plugins Use Plugin would 422 silently
  for template plugins; new flow hands off to HomeView with the plugin
  pre-bound + input form prompted there.
- 2ac58544 feat(plugin-inputs): enhance plugin input handling with file
  upload support — extends PluginInputsForm for file uploads.
- 3b167b69 feat(plugins): registry protocol — new @open-design/registry-protocol
  workspace package (needs build before daemon boot).
- Plus enhancements to plugin metadata, GitHub installer, plugin detail
  view, login/whoami, static HTML preview paths.

Conflicts resolved:
- packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief
  field + garnet's contextPlugins (@-mention plugin context refs) both
  kept on ProjectMetadata.
- apps/landing-page/* (3 files): accepted HEAD — garnet had the older
  single-page landing-page header; main has the multi-page layout
  (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not
  related to the Use Plugin core fix.

New @open-design/registry-protocol package must be built before daemon
boots; pnpm install does this via postinstall already.
2026-05-14 16:32:35 +08:00
pftom
3b167b6921 feat(plugins): add registry protocol and enhance plugin management features
- Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries.
- Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol.
- Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities.
- Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades.
- Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards.

This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities.
2026-05-14 08:55:36 +08:00
lefarcen
53997990b7 Merge origin/main (post-0.7.0) into reconciled garnet branch
Second-pass merge layering 41+ new commits from origin/main on top of
the first reconcile commit. Headline upstream additions absorbed:

- 0.7.0 release: redesigned chat bubble user-text styling, neutralised
  palette, lucide icons, ElevenLabs audio voice option discovery in the
  prompt composer, analytics tracking (PostHog) wired across home /
  studio / create surfaces, Prometheus `/api/metrics` endpoint,
  critique-theater drop-in mount with a settings toggle.
- Misc upstream fixes (titlebar padding, release header layout, deck
  preview chrome, feedback form auto-scroll, conversation-created SSE
  on routine runs, etc.)

Conflict resolutions (12 files, ~22 hunks):

- contracts barrel + prompts/system: union of both sides; new analytics
  exports (`./analytics/events`, `./analytics/public-params`) added
  alongside garnet's plugin/atom/genui exports. Both ElevenLabs voice
  fields (audioVoiceOptions/audioVoiceOptionsError, main) and
  pluginBlock/activeStageBlocks (garnet) preserved on ComposeInput.
- daemon/server.ts: Prometheus `/api/metrics` route inserted after
  garnet's `/api/daemon/shutdown`. main's `createAnalyticsService` call
  added before the chat-run service init alongside the prior reconcile
  note about the dropped legacy POST /api/projects body.
- App.tsx: handleCreateProject now consumes both garnet's plugin
  fields (pluginId / appliedPluginSnapshotId / pluginInputs /
  autoSendFirstMessage) and main's analytics requestId. Tracking
  fires success + failure paths; PluginLoopHome auto-send sessionStorage
  flag is preserved.
- ProjectView.tsx: the garnet auto-send useEffect coexists with main's
  `useCritiqueTheaterEnabled()` hook.
- ChatComposer.tsx: imports merged (drop now-unused fetchSkills,
  add analytics provider + tracking + buildVisualAnnotationAttachment).
- index.css: main's redesigned `.msg.user .user-text` chat bubble
  styling wins over garnet's plain text rule; garnet's
  `.msg-plugin-chip*` rules preserved alongside.
- EntryView.tsx: accepted HEAD (garnet wrapper) — consistent with
  reconcile decision #2. main's added PetRail / TopTab / analytics
  view tracking is intentionally NOT brought into the wrapper; the
  follow-up to re-integrate PetRail / image-templates / video-templates
  into EntryShell still stands and now also covers analytics
  view-tracking hooks.
- daemon/package.json + pnpm-lock: merged dep set (tar + posthog-node +
  prom-client coexist).
- Test fixtures (FileWorkspace.test): kept garnet's plugin-folders
  describe block intact; main's projectKind="prototype" addition is
  dropped where it conflicted with garnet's plugin-folder fixture
  files.

Verification: `pnpm install` (after lockfile reconciled), `pnpm typecheck`
exits 0 across all workspace packages.

Follow-up not done in this commit:
- PetRail / image-templates / video-templates / 0.7.0 analytics
  view-tracking hooks need to be added to EntryShell.
- Critique-theater settings toggle UX (added on main) lives in the
  SettingsDialog hierarchy; the reconcile state preserves the
  SettingsDialog so this should work without changes, but no
  end-to-end verification yet.
2026-05-13 23:29:56 +08:00
lefarcen
d3602be666 Merge origin/main into garnet-hemisphere (reconcile)
Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the
161-commit garnet-hemisphere line, reconciling the product-vibe-coded
plugin/marketplace/EntryShell surfaces from garnet with the routines /
skills / live-artifacts feature work landed on main since the fork point.

Headline decisions (full rationale + side-by-side screenshots in
`specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`):

- #1 SettingsDialog: keep main's Memory / Skills / External MCP /
  Connectors / Routines / MCP server nav items even though the top-level
  /integrations + /automations routes also cover them. Two entries
  coexist for now; revisit once Track A/B fill in the placeholder content.
- #2 EntryView: accept garnet's thin wrapper delegating to EntryShell.
  Main's PetRail sidebar + image-templates/video-templates tabs are
  intentionally deferred to a follow-up that re-integrates them into
  the new EntryShell layout.
- #3 /integrations + /automations top-level routes: kept (garnet's
  product intent). Skills tab is still a "Coming soon" placeholder
  awaiting Track A; Routines/Schedules/Live-artifacts cards on
  /automations are still mock awaiting Track B.
- #5 DesignFilesPanel: hybrid — main's pagination as primary list,
  garnet's Plugin folders section preserved between the live-artifacts
  block and the pagination block. (by-kind sections drop in favour of
  pagination; plugin-folders rendering stays because it is a
  garnet-specific product addition.)
- #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk
  merge. Both daemon admin routes + plugin/genui routes (garnet) and
  routines/memory/skills upgrades (main) preserved. Garnet's inline
  project route block kept alongside main's `registerProjectRoutes` /
  `registerProjectUploadRoutes` modular wiring — duplicate route
  audit is a follow-up. Garnet's POST /api/projects plugin-snapshot
  resolution + default-scenario fallback is intentionally dropped from
  the inline body (now handled by registerProjectRoutes) and listed for
  follow-up re-integration into `project-routes.ts`.

Verification (worktree at /Users/elian/Documents/open-design-garnet):
- `pnpm typecheck` exits 0 across all workspace packages
- daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots,
  serves `/api/daemon/status` healthy, and survives a Playwright
  walkthrough of /integrations / /automations / home / projects /
  design-systems / plugins / settings dialog
- `@open-design/plugin-runtime` package built (was missing dist/ on
  garnet); without it the daemon's plugins/* imports fail at boot

Track A (Skills tab → real SkillsSection) and Track B (Automations
cards → real routines / live-artifacts backend) are the two remaining
follow-ups blocking the placeholder/mock content from going live. See
`spec.md` and `track-skills.md` in the same directory.
2026-05-13 22:29:21 +08:00
Nagendhra Madishetti
38a5ab69e6
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.
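
A reduced sketch of the first three guards, under stated assumptions: the
state and event shapes here (`SketchState`, `SketchEvent`) are heavily
trimmed stand-ins for the real `CritiqueState` and `PanelEvent`, and
`reduceSketch` is a hypothetical name. Round bookkeeping is omitted.

```typescript
// Hypothetical, trimmed-down shapes; not the contracts-level types.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

interface SketchState {
  phase: Phase;
  runId: string | null;
  warnings: string[];
}

type SketchEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

const TERMINAL: ReadonlyArray<Phase> = ["shipped", "degraded", "interrupted", "failed"];

function reduceSketch(state: SketchState, event: SketchEvent): SketchState {
  // A run_started for a different run discards prior state and reboots.
  if (event.type === "run_started" && event.runId !== state.runId) {
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run: return the same reference (no re-render).
  if (event.runId !== state.runId) return state;
  // Late parser warnings go to a side channel without changing phase.
  if (event.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events.
  if (TERMINAL.indexOf(state.phase) !== -1) return state;
  if (event.type === "ship") return { ...state, phase: "shipped" };
  return state;
}
```

Returning the identical `state` object on ignored events is what lets
`useReducer` bail out of re-rendering.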

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.
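
The reconnect schedule could be sketched like this. The message above only
states the 1s floor and 30s ceiling; the doubling factor and the
`reconnectDelayMs` name are assumptions.

```typescript
// Hypothetical helper: doubling backoff clamped to a ceiling.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 30000): number {
  // attempt 0 -> base, then double each time until the cap is hit.
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

A connection manager would presumably reset `attempt` to 0 when the `ready`
event arrives, so the next drop starts again from the 1s delay.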

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.
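
A minimal sketch of the line-by-line parse, assuming a simplified predicate:
`isMinimalPanelEvent` below is a hypothetical stand-in for the shared
`isPanelEvent` and only checks `type` and a non-empty `runId`.

```typescript
// Hypothetical stand-in for the shared isPanelEvent predicate.
interface MinimalPanelEvent {
  type: string;
  runId: string;
}

function isMinimalPanelEvent(value: unknown): value is MinimalPanelEvent {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const runId = record["runId"];
  return typeof record["type"] === "string" && typeof runId === "string" && runId.length > 0;
}

function parseTranscript(ndjson: string): MinimalPanelEvent[] {
  const events: MinimalPanelEvent[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isMinimalPanelEvent(parsed)) events.push(parsed);
    } catch {
      // Malformed line: skip it so valid events around it still land.
    }
  }
  return events;
}
```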

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.
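
The spread-order fix can be illustrated with a small sketch. `toPanelEvent`
is a hypothetical simplification of `sseToPanelEvent`, and the inline runId
check stands in for the `isPanelEvent` revalidation.

```typescript
// Hypothetical simplification; the real function validates far more.
function toPanelEvent(
  channel: string,
  data: Record<string, unknown>,
): Record<string, unknown> | null {
  const prefix = "critique.";
  if (channel.indexOf(prefix) !== 0) return null;
  const type = channel.slice(prefix.length);

  // Spread data FIRST so the channel-derived type always wins.
  const candidate: Record<string, unknown> = { ...data, type };

  // Revalidate the merged object before handing it to the reducer.
  const runId = candidate["runId"];
  if (typeof runId !== "string" || runId.length === 0) return null;
  return candidate;
}
```

With the original order (`{ type, ...data }`), a payload carrying
`type: "ship"` would have overwritten the channel-derived value.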

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive zero-delay timers via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.
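
A minimal sketch of the TTL registry with a clock seam, assuming a
factory-function shape. Only the reason names, expiresAt, markDegraded,
and clearDegraded come from this log; createDegradedRegistry, isDegraded,
and the parameter order are hypothetical.

```typescript
// Hypothetical shapes: the real registry's API may differ.
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // the 24h default window

// `now` is the clock seam: production passes nothing, tests pass a fake clock.
function createDegradedRegistry(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
      // Re-marking refreshes the TTL window.
      records.set(adapterId, { reason, source, expiresAt: now() + ttlMs });
    },
    isDegraded(adapterId: string): boolean {
      const record = records.get(adapterId);
      if (record === undefined) return false;
      if (now() >= record.expiresAt) {
        records.delete(adapterId); // lazy expiry, no clearDegraded call needed
        return false;
      }
      return true;
    },
    clearDegraded(adapterId: string): void {
      records.delete(adapterId);
    },
  };
}
```

Expiry is checked lazily on read, so an entry stops counting as degraded
once the injected clock passes expiresAt, without a background timer.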

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.
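
The chunked replay can be sketched as follows, assuming the transcript is
already available as bytes. chunkBytes and concatBytes are hypothetical
helper names, not the fixtures' actual API.

```typescript
// Split a byte buffer into fixed-size slices; the last slice may be shorter.
function chunkBytes(bytes: Uint8Array, chunkSize: number = 512): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray clamps the end index, so the tail chunk is handled for free.
    chunks.push(bytes.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Reassemble slices, which is what a streaming consumer must be equivalent to.
function concatBytes(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const chunk of chunks) total += chunk.length;
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

Slicing by bytes rather than by characters means a multi-byte character
can straddle a chunk boundary, which is one of the cases a streaming
parser has to tolerate.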

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.
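
A sketch of the p99 gate, assuming a nearest-rank percentile and an
injected clock. percentile, benchP99, and the constant names are
hypothetical; the real bench may measure and aggregate differently.

```typescript
const DOCUMENTED_BUDGET_MS = 2;
const GATE_MS = DOCUMENTED_BUDGET_MS * 2; // 2x headroom for noisy CI runners

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const value = sorted[rank - 1];
  if (value === undefined) throw new RangeError("rank out of bounds");
  return value;
}

// Time `run` for `iterations` rounds and return the p99 duration in milliseconds.
// `clock` is injected so the sketch stays deterministic under test.
function benchP99(run: () => void, iterations: number, clock: () => number): number {
  const durations: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = clock();
    run();
    durations.push(clock() - start);
  }
  return percentile(durations, 99);
}
```

In a real bench the clock would typically be a high-resolution timer such
as performance.now, and the assertion would be that the returned value
stays below GATE_MS.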

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.
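
The walker's check can be sketched as below, assuming each corpus has
been concatenated into a single string and that matching is by plain
substring. findCoverageGaps and CoverageGap are hypothetical names.

```typescript
interface CoverageGap {
  symbol: string;
  corpus: "src" | "test";
}

// Report every named symbol that is missing from either corpus.
function findCoverageGaps(symbols: string[], srcCorpus: string, testCorpus: string): CoverageGap[] {
  const gaps: CoverageGap[] = [];
  for (const symbol of symbols) {
    if (!srcCorpus.includes(symbol)) gaps.push({ symbol, corpus: "src" });
    if (!testCorpus.includes(symbol)) gaps.push({ symbol, corpus: "test" });
  }
  return gaps;
}
```

Plain substring matching is a simplification: a short symbol can match
inside a longer identifier, so a stricter walker would likely match on
quoted literals or word boundaries.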

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.
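
A worked sketch of that formula. The weights are taken from the line
above; compositeScore, COMPOSITE_WEIGHTS, and the role type are
hypothetical names, and any rounding the product applies is not modelled.

```typescript
type PanelistRole = "designer" | "critic" | "brand" | "a11y" | "copy";

// Weights as documented: the designer's own score carries no weight.
const COMPOSITE_WEIGHTS: Record<PanelistRole, number> = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

const ROLES: PanelistRole[] = ["designer", "critic", "brand", "a11y", "copy"];

// Weighted sum of the five panelist scores.
function compositeScore(scores: Record<PanelistRole, number>): number {
  let total = 0;
  for (const role of ROLES) {
    total += scores[role] * COMPOSITE_WEIGHTS[role];
  }
  return total;
}
```

For example, scores of designer 9, critic 8, brand 7, a11y 6, copy 9
give 0 + 3.2 + 1.4 + 1.2 + 1.8, which is approximately 7.6 (subject to
floating-point rounding).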

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tabulates every file and the invariant each one owns, plus a 'when
you change anything here' guide, so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required forces the pipeline on, opt-out forces it off)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.
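
The priority order above can be sketched as follows. The input names
(skillPolicy, projectOverride, envOverride, phase) and isCritiqueEnabled
appear later in this log; the exact signature, the ResolverInput shape,
and the trimming in parseRolloutPhase are assumptions.

```typescript
type RolloutPhase = "M0" | "M1" | "M2" | "M3";
type SkillPolicy = "required" | "opt-in" | "opt-out" | null;

interface ResolverInput {
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null; // null means "no opinion"
  envOverride: boolean | null; // null means the env var is unset
  phase: RolloutPhase;
}

// Unknown or missing input falls back to M0 so the feature stays dark.
function parseRolloutPhase(raw: string | undefined): RolloutPhase {
  const token = (raw ?? "").trim();
  if (token === "M1" || token === "M2" || token === "M3") return token;
  return "M0";
}

function isCritiqueEnabled(input: ResolverInput): boolean {
  // 1. Skill-level veto in either direction.
  if (input.skillPolicy === "required") return true;
  if (input.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (input.projectOverride !== null) return input.projectOverride;
  // 3. Env override.
  if (input.envOverride !== null) return input.envOverride;
  // 4. Rollout phase default.
  if (input.phase === "M3") return true;
  if (input.phase === "M2") return input.skillPolicy === "opt-in";
  return false; // M0 and M1
}
```

Each tier returns as soon as it has an opinion, so a lower tier is only
consulted when every tier above it passed null (or, for skills, a policy
that is neither required nor opt-out).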

10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - an open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.
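
The first three read-side guards can be sketched as one function,
assuming storage is injected rather than read from the global. readToggle
and the open-design:config key come from this log; StorageLike and the
parameter are hypothetical.

```typescript
// Minimal structural stand-in for the part of Storage the read path needs.
interface StorageLike {
  getItem(key: string): string | null;
}

const CONFIG_KEY = "open-design:config";

function readToggle(storage: StorageLike): boolean {
  try {
    const raw = storage.getItem(CONFIG_KEY); // may throw in private mode
    if (raw === null) return false;
    const parsed: unknown = JSON.parse(raw); // may throw on corrupt JSON
    // typeof null === "object", and arrays are objects too, so both need explicit checks.
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return false;
    return (parsed as Record<string, unknown>)["critiqueTheaterEnabled"] === true;
  } catch {
    // Swallow storage and parse failures so the rest of the app keeps rendering.
    return false;
  }
}
```

Comparing against true rather than coercing to a boolean keeps a stray
string such as "false" from enabling the feature.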

10/10 vitest cases green (6 existing + 4 new).

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail (or worse, the
production claim 'in-session event keeps mounts coherent' was
hollow).

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.
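
A sketch of the listener's decision, assuming the event is reduced to an
object with an optional detail. resolveToggleFromEvent and fallbackRead
are hypothetical names; in the hook the fallback would be readToggle.

```typescript
// Prefer the typed payload; fall back to storage only when it is unusable.
function resolveToggleFromEvent(event: { detail?: unknown }, fallbackRead: () => boolean): boolean {
  const detail = event.detail;
  if (typeof detail === "object" && detail !== null) {
    const enabled = (detail as { enabled?: unknown }).enabled;
    if (typeof enabled === "boolean") return enabled;
  }
  // Malformed events and cross-tab storage events carry no typed payload.
  return fallbackRead();
}
```

Reading the payload first is what lets the in-session UI flip even when
the storage write failed and the stored value is stale.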

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wire-up edit that Phase 10 and Phase 15 left open: today server.ts gates
the critique pipeline on critiqueCfg.enabled, which is just the
OD_CRITIQUE_ENABLED env var. After this commit it gates on
isCritiqueEnabled(...) from the Phase 15 resolver, so the full
priority matrix is live:

  1. Per-skill od.critique.policy veto (opt-out / required)
  2. Per-project override (M1 Settings toggle, written through the
     existing Phase 6 settings endpoint)
  3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
  4. OD_CRITIQUE_ROLLOUT_PHASE default
       M0 dark-launch      false
       M1 settings only    false
       M2 per-skill        only when skillPolicy === 'opt-in'
       M3 global default   true

Default behaviour on a fresh install is unchanged: the resolver
returns false at M0 without an env override or a project override,
so prod traffic falls through to the legacy single-pass path
exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE,
envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride
are passed as null for the v1 cutover; the daemon-side handler that
round-trips critiqueTheaterEnabled on the project settings row and
the od.critique.policy frontmatter resolver land as the next two
commits in this branch.

The three call sites that used critiqueCfg.enabled (the brand-thread
guard, the skill-thread guard, the top-line critiqueShouldRun
compound) now read from a single locally-scoped critiqueEnabledForRun
boolean, so the eligibility check is computed exactly once per spawn
and the prompt composer + orchestrator stay in lockstep the way
the existing comment already promised.

Tests still green: daemon vitest 22 / 22 across rollout +
conformance + adapter-degraded. Daemon typecheck clean.

* feat(web): mount CritiqueTheaterMount in ProjectView

The web counterpart of the daemon wireup. ProjectView now renders
<CritiqueTheaterMount projectId={project.id} enabled={...} /> as a
sibling of <AppChromeHeader> inside the top-level <div className="app">.

The mount is the drop-in from the Phase 9 stack: it owns the SSE
subscription, the kill-request handshake, and the phase-aware swap
from the live <TheaterStage> to the collapsed badge once a run
settles. The mount returns null until the daemon emits a
critique.run_started for the active project, so the visual surface
is byte-for-byte unchanged for users who have not opted in.

Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings
toggle from the existing open-design:config localStorage blob and
stays in sync with both the platform storage event (cross-tab) and
the same-tab open-design:critique-theater-toggle CustomEvent the
Phase 15 setter dispatches. The hook honors the event payload
directly so a private-mode browser that cannot persist the toggle
still updates the in-session UI correctly.

The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts)
remains the authority for whether a run is actually wired through
the critique pipeline. This hook only governs whether the web layer
renders the resulting SSE stream when the daemon emits one. The
two-layer gate is intentional: an integrator embedding the Theater
in a custom UI can flip the web visibility independent of the
daemon's routing decision, and a daemon-side env override flips
backend gating without touching the web's localStorage.

Tests still green: web Theater suite 181 / 181 across 16 files.
Web typecheck clean.

* feat(daemon): resolve od.critique.policy frontmatter at the spawn site

The next step in the wireup branch's ladder: replace the placeholder
`skillPolicy: null` with the actual value parsed from the active
skill's SKILL.md frontmatter.

Three small edits, one new field on a public type:

1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field
   carrying the parsed `od.critique.policy` token (required /
   opt-in / opt-out / null). The field is null when the skill has
   no opinion, which lets the lower-priority resolver tiers
   (projectOverride, envOverride, phase default) decide.

2. listSkills() populates the new field via a small
   `normalizeCritiquePolicy` helper that tolerates the YAML
   scalar's casing and trims whitespace. Unknown tokens collapse
   to null so a typo in SKILL.md cannot accidentally force the
   panel on or off; it just falls through. Derived example cards
   inherit the parent's policy.

3. server.ts captures `skill.critiquePolicy` into a hoisted
   `skillCritiquePolicy` variable inside the existing skill-load
   block, then threads it into the isCritiqueEnabled call as the
   skillPolicy input. The hoisting keeps the variable in scope at
   the resolver call site without restructuring the spawn handler.
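
The normalization in step 2 can be sketched as below. The helper name and
the three tokens come from the list above; the unknown-typed parameter
is an assumption about how the YAML scalar arrives.

```typescript
type SkillCritiquePolicy = "required" | "opt-in" | "opt-out" | null;

// Tolerate casing and surrounding whitespace; collapse anything else to null.
function normalizeCritiquePolicy(raw: unknown): SkillCritiquePolicy {
  if (typeof raw !== "string") return null;
  const token = raw.trim().toLowerCase();
  if (token === "required" || token === "opt-in" || token === "opt-out") return token;
  return null; // a typo falls through to the lower resolver tiers
}
```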

After this commit, the priority matrix the rollout resolver was
designed for is live for its top tier. The previous commit wired
env + phase; this one wires skill. The projectOverride input
remains null pending the next commit that extends the Phase 6
settings endpoint.

Daemon vitest: 10 / 10 rollout cases pass against the new wiring.
Daemon typecheck: clean.

* feat(daemon): feed projectOverride into the rollout resolver from project metadata

Replaces the placeholder `projectOverride: null` in the spawn
handler with the actual value the Settings panel writes onto the
project's metadata blob: `critiqueTheaterEnabled?: boolean`.

The read is defensive at the boundary: the metadata object is
typed loosely (it round-trips through SQLite as a free-form JSON
blob), so the spawn handler narrows to `boolean` and falls
through to `null` for any other shape. A missing key, a malformed
value, or a project that has never visited Settings collapses to
`null`, which is exactly the resolver's "no opinion, fall
through to env / phase" signal.
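
A sketch of that boundary narrowing, assuming the metadata blob arrives
as an unknown value. readProjectOverride is a hypothetical name; only the
critiqueTheaterEnabled key is taken from the text above.

```typescript
// Narrow a free-form metadata blob to the resolver's boolean | null input.
function readProjectOverride(metadata: unknown): boolean | null {
  if (typeof metadata !== "object" || metadata === null) return null;
  const value = (metadata as Record<string, unknown>)["critiqueTheaterEnabled"];
  // Only a real boolean counts; strings, numbers, and missing keys mean "no opinion".
  return typeof value === "boolean" ? value : null;
}
```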

The `critique` frontmatter slot also gets typed on the
SkillFrontmatter shape so the `od.critique.policy` chain the
previous commit introduced no longer needs a bracket-access
cast. Same pattern as the existing `craft`, `preview`, and
`design_system` nested-record slots.

After this commit, every tier of the rollout resolver's priority
matrix is wired:

  1. skillPolicy   (from SKILL.md od.critique.policy)
  2. projectOverride (from project metadata critiqueTheaterEnabled)
  3. envOverride   (from OD_CRITIQUE_ENABLED)
  4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE)

The write path for projectOverride still flows through the
existing project-update handler the Settings panel already uses
to persist project metadata; no new endpoint is needed. The
Settings UI button that calls setCritiqueTheaterEnabled and
posts the new field is the next commit on this branch.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix(daemon): forward critique events to project sinks + align composer gate (PR #1338)

Two codex review items addressed in one commit since they share the
same root cause (resolver-enabled run hits a transport / prompt
contract that was still env-gated):

P1 (transport mismatch). The daemon emits critique.* SSE frames
through critiqueBus -> design.runs.emit, which fans out on
/api/runs/:runId/events. The web CritiqueTheaterMount subscribes to
/api/projects/:projectId/events (it's project-scoped, not run-
scoped, because the mount lives at the project workspace and
follows the user across runs). Result: in production the mount
never sees a real frame and the e2e tests' stubbed routes hide the
mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the
existing runs.emit transport, AND the per-project event-sinks map.
The project-events route emits via sse.send(payload.type, payload),
so we pack the SSE channel name onto payload.type and let the sink
push the right channel. The web sseToPanelEvent overwrites type
from the channel name on the way back into a PanelEvent, so the
round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads
cfg.enabled to decide whether to append the panel addendum, but
critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run
the resolver enabled via phase / project / skill (env unset) would
have critiqueShouldRun = true while critiqueCfg.enabled remained
false. That drops the panel prompt while still routing through
runOrchestrator, so the parser waits for tags that never arrive and
the run degrades.

Fixed by passing a derived config { ...critiqueCfg, enabled: true }
to the composer when critiqueShouldRun is true. The composer's own
gate now agrees with the resolver decision on every input the
spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR.
Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands
   the user on '/' (home route) where ProjectView (and therefore
   CritiqueTheaterMount) is never rendered. With the suite as
   written, every assertion would time out the first time the
   lane runs in CI, contradicting the PR body's claim that the
   suite stays parked behind test.describe.fixme.

   The state diverged from my earlier Phase 11 work because the
   merge from main on commit 4ab719c6 brought in #1307's
   squash-merged version of the e2e file (the pre-fixme shape).

   Re-applied test.describe.fixme to the describe block plus
   removed ui/critique-theater.test.ts from the test:ui:extended
   script in e2e/package.json. Added a file-header docblock
   explaining what the follow-up commit needs to do: replace
   goto('/') with /projects/:id navigation similar to
   app-design-files.test.ts, split the SSE fixture into a live
   prefix and terminal suffix (Codex P2 on PR #1320), and commit
   the first PNG baselines.

2. bestRoundOf in CritiqueTheaterMount returned the LAST round
   with a numeric composite, not the round with the HIGHEST
   composite, while bestCompositeOf correctly returned the max.
   A run that closed round 1 at 8.5 and round 2 at 6.0 would
   dispatch interrupted { bestRound: 2, composite: 8.5 } on a
   user-clicked interrupt.

   Folded the two helpers into a single bestRoundAndComposite
   that walks state.rounds once and returns the matching pair so
   the two values cannot drift. The onInterrupt callback now
   destructures from one helper instead of two independent reads.
   Falls back to (state.activeRound, 0) when no round has closed
   with a composite yet.
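
A sketch of the single-pass helper from item 2, assuming each round
carries a number and an optional composite. The helper name and the
(activeRound, 0) fallback come from the text above; RoundState and the
parameter list are hypothetical.

```typescript
// Hypothetical round shape; the reducer's real state may differ.
interface RoundState {
  n: number;
  composite?: number | null;
}

interface BestRound {
  bestRound: number;
  composite: number;
}

// Walk the rounds once and return the round number together with its composite,
// so the two values always describe the same round.
function bestRoundAndComposite(rounds: RoundState[], activeRound: number): BestRound {
  let best: BestRound | null = null;
  for (const round of rounds) {
    const composite = round.composite;
    if (typeof composite !== "number" || Number.isNaN(composite)) continue;
    if (best === null || composite > best.composite) {
      best = { bestRound: round.n, composite };
    }
  }
  return best ?? { bestRound: activeRound, composite: 0 };
}
```

With the strict comparison, a tie keeps the earlier round; whether the
real helper breaks ties the same way is not stated in this log.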

Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases
still green against the new helper.

* fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338)

Three lefarcen P2s on the latest review pass, all real:

1. M1 project override was half-wired: the daemon read
   metadata.critiqueTheaterEnabled but the web setter only
   wrote localStorage. A user opt-in would render the Theater
   on the web (localStorage was set) while the daemon resolved
   projectOverride=null and skipped critique unless env / phase
   already permitted. Two halves talking past each other.

   Extended setCritiqueTheaterEnabled to accept an optional
   { projectId, fetchProjectSettings } options bag. When a
   projectId is supplied, the setter ALSO sends a
   PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled
   } } so the daemon's spawn-time resolver picks the same value up
   on the next generation. The existing project-routes endpoint
   already accepts arbitrary metadata patches, so no new endpoint
   is needed. The local write + the CustomEvent dispatch still
   fire before the PATCH, so a network failure does not unwind
   the in-session UI flip. Three new vitest cases pin the new
   path: PATCHes when projectId is provided, skips when it is
   not, swallows a rejected PATCH so the in-session UI still
   flips.

2. Rollout docs (docs/critique-theater.md section 3) claimed the
   Settings toggle persists into the daemon settings store, but
   the previous implementation only had a localStorage reader /
   writer plus a daemon read of project metadata, with no
   round-trip. Rewrote the section to lead with the four-tier
   resolver (skill policy / project override / env / phase),
   document that the setter now round-trips via the existing
   PATCH endpoint when given a projectId, and call out the
   Settings panel UI control as a deliberate follow-up.

3. Troubleshooting table pointed users at /api/metrics/critique
   (Phase 12, deferred) and 'od adapters clear-degraded <id>'
   (CLI wrapper that does not exist). Replaced the metrics
   reference with the local conformance harness command
   (pnpm --filter @open-design/daemon vitest run
   tests/critique-conformance.test.ts) that ships today, with a
   note that the Phase 12 dashboard surfaces this status as a
   series once that PR lands. Replaced the CLI command with the
   programmatic clearDegraded() helper that exists today and
   flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11
existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix:
the existing CritiqueTheaterMount.test.tsx interrupt cases only
exercised a single-round state, so a future refactor back to two
independent helpers wouldn't be caught by the test suite even
though it'd reintroduce the round / composite drift bug.

Added a regression case that:

  1. Drives the reducer through two complete rounds with the
     full 5-role cast closing at distinct composites: round 1
     at 8.5, round 2 at 6.0 (the high-composite round is NOT the
     most recent one).
  2. Clicks Interrupt + waits for the daemon ack via the test
     seam fetcher returning 204.
  3. Asserts the collapsed badge displays "round 1" (the
     correct best-composite round), and queryByText for
     "round 2 ... 8.5" returns null (the buggy pairing
     would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass
and returns the matching pair, so the round number and the
composite cannot drift apart. This test locks the fix in: a
refactor that splits the helpers back into independent walks
will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } }
as the entire PATCH body. The daemon's project-routes handler only
re-stamps three immutable fields (baseDir, importedFrom,
fromTrustedPicker) before calling updateProject(db, id, patch),
which then does a shallow { ...existing, ...patch } in apps/daemon/
src/db.ts. So patch.metadata replaces the row's metadata wholesale,
dropping kind, templateId, linkedDirs, and every other field the rest
of the app reads.

No in-tree caller passes projectId today (only vitest cases), so the
bug had not surfaced yet. But the surface is documented in
docs/critique-theater.md section 3 and the function's own JSDoc as
the M1 round-trip path, so it would have shipped as a latent footgun
for the next integrator: a Settings UI follow-up, or any third party
that wires the setter into a project-aware surface.

Fix: read-merge-write rather than a bare patch.

- GET /api/projects/:id to read the row's current metadata.
- Spread that metadata into the PATCH body and overlay
  critiqueTheaterEnabled: next on top, mirroring the partial-metadata
  pattern already used in ChatComposer.tsx for linkedDirs.
- PATCH the merged object.
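
The difference between the bare patch and the merged patch can be
sketched without any network calls. ProjectRow, shallowUpdate, and
buildMergedPatch are hypothetical names; shallowUpdate only mimics the
shallow { ...existing, ...patch } behaviour described above.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical row shape, reduced to what the example needs.
interface ProjectRow {
  name: string;
  metadata?: Metadata;
}

// Stand-in for the shallow row update: patch.metadata replaces the row's metadata wholesale.
function shallowUpdate(existing: ProjectRow, patch: Partial<ProjectRow>): ProjectRow {
  return { ...existing, ...patch };
}

// Build the PATCH body from the metadata returned by the prefetch GET.
function buildMergedPatch(currentMetadata: Metadata | undefined, next: boolean): { metadata: Metadata } {
  return { metadata: { ...(currentMetadata ?? {}), critiqueTheaterEnabled: next } };
}
```

Applying a bare { metadata: { critiqueTheaterEnabled } } patch through
shallowUpdate drops kind and templateId, while the body from
buildMergedPatch carries them through.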

Failure handling:
- GET fails: skip the PATCH entirely. We cannot construct a safe
  merged body without the current state, and a bare patch would
  wipe other metadata. The in-session CustomEvent fired earlier in
  the setter still keeps every mounted hook consistent; the next
  save retries the round-trip.
- PATCH fails: log in dev. The in-session UI is already correct via
  the CustomEvent.

Tests (TDD, red-first):

- 'GETs the project then PATCHes with merged metadata when a
  projectId is supplied': stubs a GET that returns
  { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] }
  and asserts the PATCH body equals the merge plus the toggle.
- 'PATCHes with just the toggle when the project has no prior
  metadata': stubs a GET that returns no metadata block.
- 'skips the PATCH (does not stomp metadata) when the prefetch GET
  fails': stubs a rejecting GET and asserts only the GET fires.
- 'swallows a rejected PATCH after a successful prefetch': stubs a
  successful GET and a rejecting PATCH; asserts the in-session UI
  still flips via the CustomEvent.

Doc updated on the setter's JSDoc to describe the new three-step
flow (localStorage, CustomEvent, read-merge-write PATCH) and the
two failure modes.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 111 files / 1055 tests green
  (was 1052, +3 from the new merge-flow cases).

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* feat(daemon): Critique Theater Phase 12 observability foundations

Lands the metrics registry, the structured logger, the /api/metrics
route, and the adapter-degraded bump that wires up the first data
point. The orchestrator-side bumps for runs / rounds / composite /
must-fix / interrupted / parser_errors / protocol_version land in a
follow-up commit on this branch (kept separate so the wiring diff
reads cleanly against the registry shape).

Surfaces added:

- apps/daemon/src/metrics/index.ts: 9 Prometheus series under the
  open_design_critique_* namespace with the histogram buckets the
  spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 /
  2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at
  0-10 integer steps).
- apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line
  per call on stdout, namespaced critique. Matches the JSON-per-line
  convention cli.ts already uses; no new logger framework.
- apps/daemon/src/server.ts: GET /api/metrics route. Honors
  OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs.
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now
  bumps degraded_total so the adapter-health dashboard panel
  reflects every TTL refresh and every fresh mark.

Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to
apps/daemon/package.json. Both are zero-config no-ops without an
exporter wired; daemon bundle size impact is ~150 KB uncompressed.
The @opentelemetry/api dep is in place ahead of the OTel-spans
follow-up commit; it adds no behavior on this commit.

Tests:
- tests/metrics/critique.test.ts (3 cases): registry shape +
  exposition text + reset-between-tests
- tests/logging/critique.test.ts (4 cases): event shape + ordering
  + newline framing + namespace stamping

Verification (Windows-local):
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites: 7 / 7 green
- Existing adapter-degraded + conformance + rollout suites:
  22 / 22 green; the bump is non-breaking

* feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator

Lights up the bump sites for the series the Phase 12 foundations PR
registered. Every panel event the parser surfaces now reaches the
matching Prometheus counter / histogram and the matching JSON log
line on stdout.

Switch-loop bumps + logs:

- run_started: log run_started, set protocol_version gauge to the
  observed protocol version (small-integer cardinality).
- panelist_open: record the first-open wall-clock per round so
  round_end can compute round_duration_ms; subsequent opens in the
  same round leave the start time untouched.
- panelist_must_fix: bump must_fix_total with the panelist role.
  The wire event does not yet carry a dim name, so the label is
  'unspecified' for now; a future parser revision can drop in the
  real dim without a metric rename.
- round_end: bump rounds_total, observe composite_score, observe
  round_duration_ms (current ms minus the tracked start), log
  round_closed with the composite / mustFix / decision triple.
- parser_warning (parser-yielded): bump parser_errors_total with
  the kind label, log parser_recover with kind + position.

Orchestrator-side parser warnings (composite_mismatch and
duplicate_ship from the daemon-authoritative scoring checks) go
through a new emitParserWarning helper so the bus emit, the
collectedEvents push, the metric bump, and the log line stay in
lockstep. Three inline emission sites collapse to one-line helper
calls.
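
A minimal sketch of the lockstep idea, assuming injected sinks: the `WarningSinks` interface, its field names, and the reuse of the `parser_recover` log event for orchestrator-side warnings are all assumptions made for illustration, not the daemon's real types.

```typescript
// Hypothetical shapes; the real bus, counter, and logger are not shown in
// this commit message.
interface ParserWarningEvent {
  type: "parser_warning";
  kind: string;
  position: number;
}

interface WarningSinks {
  emit(event: ParserWarningEvent): void;        // bus emit
  collected: ParserWarningEvent[];              // collectedEvents
  bumpParserErrors(kind: string): void;         // parser_errors_total{kind}
  log(line: Record<string, unknown>): void;     // structured JSON log line
}

// One call performs all four side effects, so no call site can forget one.
function emitParserWarning(sinks: WarningSinks, kind: string, position: number): void {
  const event: ParserWarningEvent = { type: "parser_warning", kind, position };
  sinks.emit(event);
  sinks.collected.push(event);
  sinks.bumpParserErrors(kind);
  sinks.log({ event: "parser_recover", kind, position });
}
```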

After the try/catch, a single terminal-status switch bumps
runs_total{status, adapter, skill} once per run, with branch-
specific log + counter:

- shipped / below_threshold: log run_shipped
- interrupted: bump interrupted_total, log run_failed{cause: interrupted}
- timed_out: log run_failed{cause: timed_out}
- failed: log run_failed{cause: orchestrator_internal}
- degraded: log degraded{reason: orchestrator_classified}
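
The branch table above could be expressed as a pure mapping like the one below. It is a sketch only: `planTerminal` and the `TerminalPlan` shape are hypothetical names, and the real switch performs the log and counter calls directly rather than returning a plan.

```typescript
type TerminalStatus =
  | "shipped"
  | "below_threshold"
  | "interrupted"
  | "timed_out"
  | "failed"
  | "degraded";

// Hypothetical description of what the terminal switch does per status.
interface TerminalPlan {
  logEvent: string;
  fields: Record<string, string>;
  bumpInterrupted: boolean;
}

function planTerminal(status: TerminalStatus): TerminalPlan {
  switch (status) {
    case "shipped":
    case "below_threshold":
      return { logEvent: "run_shipped", fields: {}, bumpInterrupted: false };
    case "interrupted":
      return { logEvent: "run_failed", fields: { cause: "interrupted" }, bumpInterrupted: true };
    case "timed_out":
      return { logEvent: "run_failed", fields: { cause: "timed_out" }, bumpInterrupted: false };
    case "failed":
      return { logEvent: "run_failed", fields: { cause: "orchestrator_internal" }, bumpInterrupted: false };
    case "degraded":
      return { logEvent: "degraded", fields: { reason: "orchestrator_classified" }, bumpInterrupted: false };
  }
}
```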

OrchestratorParams gains optional skill: string for the label;
defaults to 'unknown' so spawn sites that have not yet threaded it
keep working without a metric shape change.

Tests:
- The new metrics + logging suites (7 / 7) verify registry shape
  and event framing; orchestrator-side metric integration is
  exercised through the existing critique-conformance and
  critique-adapter-degraded suites (22 / 22 still green).
- Logger test reassigns process.stdout.write directly instead of
  vi.spyOn so the Node overloaded write signature does not
  collide with MockInstance<unknown>.

* feat(observability): Grafana dashboard JSON for Critique Theater

Three default rows mapping to the metrics this branch wires up:

1. Fleet quality: composite score p50 / p90 / p99 line graph by
   adapter, plus a heatmap of the composite distribution. The
   line graph answers 'are my agents getting better over time';
   the heatmap answers 'are the bad runs clustered around one
   adapter or smeared across the fleet'.

2. Adapter health: stacked bar charts for degraded marks (by
   adapter / reason) and parser errors (by adapter / kind) over
   a 5-minute window. The two queries together let an operator
   see 'is this adapter degraded because of malformed wire output
   or because of oversize blocks' without flipping panels.

3. Brief throughput: runs-per-hour by terminal status, an average
   rounds-per-run stat per adapter, and a round-duration ms p50 /
   p90 / p99 line. Throughput numbers fall straight out of the
   runs_total / rounds_total counters; the duration histogram is
   the same one the runs feed.

The dashboard uses a templated $datasource var (defaults to
'prometheus') so an operator with multiple Prometheus instances
can switch without editing JSON. Schema version 39 (Grafana 11).

Operators import via:

  pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json

or paste into a provisioned dashboards directory. The file is
checked into the repo as a starting artifact; alert rules and
SLO panels ship after the first 1000 runs inform the right
thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity
checked locally).

* feat(daemon): OpenTelemetry outer span around the critique run

Wraps each runOrchestrator call in a 'critique.run' span via the
existing @opentelemetry/api dep added in the Phase 12 foundations
commit. Attributes set on the span:

- critique.run_id, critique.adapter, critique.skill at start
- critique.final_status, critique.final_composite on terminal
  resolution
- span status flipped to ERROR for failed / timed_out runs so a
  Tempo / Honeycomb / Jaeger filter on traces.status=error
  surfaces the right slice without joining back to Prometheus

No exporter is wired by default; @opentelemetry/api is the API
package and intentionally splits from @opentelemetry/sdk-*, so
the span is zero-overhead until an operator attaches an SDK
through their runtime config.
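
The terminal-resolution step could look roughly like this. To stay dependency-free the sketch declares a small `SpanLike` interface instead of importing `@opentelemetry/api`; `finishCritiqueSpan` and `SPAN_STATUS_ERROR` are hypothetical names, and the constant assumes the numeric value of the API's `SpanStatusCode.ERROR`.

```typescript
// Structural subset of an OpenTelemetry span, declared locally (assumption)
// so the sketch runs without the real package installed.
interface SpanLike {
  setAttribute(key: string, value: string | number): void;
  setStatus(status: { code: number }): void;
  end(): void;
}

// Assumed to match SpanStatusCode.ERROR in @opentelemetry/api.
const SPAN_STATUS_ERROR = 2;

// Stamp the terminal attributes, flip to ERROR for failed / timed_out runs,
// then close the span.
function finishCritiqueSpan(
  span: SpanLike,
  finalStatus: string,
  finalComposite: number | null,
): void {
  span.setAttribute("critique.final_status", finalStatus);
  if (finalComposite !== null) {
    span.setAttribute("critique.final_composite", finalComposite);
  }
  if (finalStatus === "failed" || finalStatus === "timed_out") {
    span.setStatus({ code: SPAN_STATUS_ERROR });
  }
  span.end();
}
```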

Inner per-round / parse_chunk / scoreboard_eval / persist_round /
ship.persist spans defined in the Phase 12 plan are a follow-up:
the outer span alone gives the trace a duration + final status +
adapter/skill labels, which is the 80% value for dashboards that
correlate runs across services. Adding child spans inside the
existing 600-line orchestrator without restructuring is a separate
careful change.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 29 / 29 critique + metrics + logging tests still green

* fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump

nix-check failed on PR #1485 with hash mismatch in
open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after
the Phase 12 foundations commit (2b8b7445) added prom-client and
@opentelemetry/api to apps/daemon/package.json and refreshed
pnpm-lock.yaml.

CI reported the new sha:
  specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8=
  got:       7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s=

Both nix files pin the same workspace lockfile, so both flip in
lockstep. No other Nix surface changes required.

* fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2)

1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted
   agent values). The new observability path now records rs.composite
   and rs.mustFix (daemon-authoritative) instead of event.composite
   and event.mustFix when rs exists, and skips the bumps + log
   entirely when rs is missing (a degenerate round_end without any
   matching panelist_open). The dashboard p50 / p90 / p99 now agrees
   with persistence and ship decisions; an adapter reporting <ROUND_END
   composite='10'> while the daemon computed 6 logs 6 and still emits
   the composite_mismatch parser warning the prior block was already
   producing.

2. Codex P2 in server.ts (skill label always 'unknown'). The spawn
   path called runOrchestrator without passing the resolved skill id,
   so every live run bumped open_design_critique_*{skill='unknown'}
   and the per-skill dashboard breakdown was always empty. Threaded
   effectiveSkillId (already computed at the same handler scope as
   the project skill fallback) through skill: . . . so the metric
   reflects the real skill when one is assigned, and the orchestrator
   default of 'unknown' only fires for runs that genuinely have none.

3. Codex P2 in conformance.ts (protocol-version mismatch let through).
   An adapter that emitted <CRITIQUE_RUN version='2'> followed by a
   valid SHIP classified as shipped because the harness only watched
   for terminal events. Added a guard inside the parse loop: if a
   run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION,
   mark the adapter degraded with reason 'protocol_version_mismatch'
   (already in DEGRADED_REASONS) and return early. ConformanceOutcome
   union widened to accept the new reason.

4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour
   panel under-reported by 3600x). 'rate(...[1h])' returns per-second.
   Multiplied by 3600 so the panel title and unit match the actual
   value rendered.
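
The 3600x factor in item 4 follows from simple arithmetic, shown here with an assumed sample of 7200 runs in the last hour. The helper name `perSecondRate` is hypothetical; it only mimics what a per-second rate over a window means.

```typescript
// A per-second rate over a window is the counter increase divided by the
// window length in seconds.
function perSecondRate(increase: number, windowSeconds: number): number {
  return increase / windowSeconds;
}

const runsInLastHour = 7200;                                   // assumed sample
const runsPerSecond = perSecondRate(runsInLastHour, 3600);     // 2 runs per second
const runsPerHour = runsPerSecond * 3600;                      // 7200 runs per hour
```

Without the multiplication, a panel titled "runs per hour" would display 2 instead of 7200.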

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites (7), existing adapter-degraded (7),
  conformance (5), rollout (10): 29 / 29 green
- Grafana JSON re-parses with node -e 'JSON.parse(...)'

* fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485)

* fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 22:11:27 +08:00
lefarcen
e1bc83a476
feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428)
* feat(analytics): scaffold PostHog product-analytics integration

- Add @open-design/contracts/analytics subpath with the 17 P0 event
  payload types, header constants, and code↔CSV enum mapping helpers.
- Add apps/daemon/src/analytics.ts with env-gated posthog-node client,
  request-scoped analytics context reader, and artifact-id anonymizer.
- Expose GET /api/analytics/config so the web bundle never embeds the
  PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST.
- Add apps/web/src/analytics module (identity + lazy posthog-js client
  + React provider) and mount it under <I18nProvider> in app/layout.

No event wiring yet — that lands in the next commit alongside trigger
points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer,
runs.ts).

* feat(analytics): wire app_launch, home_view, home_click, project_create_result

- App.tsx: fire app_launch once after first effect tick. handleCreateProject
  now emits project_create_result on both success and failure paths.
- EntryView.tsx: home_view (page) gated on agents loading so
  has_available_cli isn't transiently false; home_view (asset_panel) fires
  per top-tab change with the right result_count.
- NewProjectPanel.tsx: home_click create_button fires before delegating to
  the parent; a fresh request_id is generated here and threaded through
  onCreate so the matching project_create_result stitches via $insert_id.
- contracts/analytics: tighten createTabToTracking and topTabToTracking
  for the worktree branch's renamed tabs (live-artifact, templates).

* feat(analytics): wire settings_view + 3 settings_click events

- settings_view fires on dialog mount and on every section switch,
  carrying the active section (mapped via settingsSectionToTracking
  for the 16-section worktree layout), execution_mode, and the
  selected CLI provider id when present.
- settings_click execution_mode_tab: setMode now emits before/after
  values whenever the user toggles between Local CLI and BYOK.
- settings_click cli_provider_card: agent card onClick reports
  cli_provider_id via agentIdToTracking (kiro → other).
- settings_click byok_field: onFocus added to api_key, model select,
  and base_url inputs; provider_id widened to include google so the
  worktree's Gemini protocol slot type-checks.

* feat(analytics): wire studio_view + studio_click chat, studio_view artifact

- packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper
  produces a 16-hex anonymized id for (projectId, fileName). Stable
  cross-platform so the daemon and the web bundle resolve the same id
  without a Web Crypto round-trip; daemon now re-exports it.
- ChatComposer: studio_view chat_panel fires once per project mount,
  studio_click chat_composer fires on attachment + send buttons with
  estimated user_query_tokens (length/4) and has_attachment.
- FileViewer: studio_view artifact fires once per (project, file) at
  the dispatcher level, before any sub-viewer renders, with
  artifact_kind derived from the renderer registry / file.kind table.
- Widen TrackingExportFormat to include markdown and cloudflare_pages
  so the worktree branch's full share menu can emit verbatim.
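
For reference, a 64-bit FNV-1a hash rendered as 16 hex characters can be written as below. The hash itself uses the standard FNV-1a constants; how the real helper combines projectId and fileName is not stated in the commit, so the NUL separator and the name `anonymizeArtifactId` are assumptions.

```typescript
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS = BigInt("0xcbf29ce484222325");
const FNV_PRIME = BigInt("0x100000001b3");

// Hash the UTF-8 bytes of the input: XOR each byte in, then multiply by the
// prime, keeping only the low 64 bits.
function fnv1a64Hex(input: string): string {
  const bytes = new TextEncoder().encode(input);
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i] ?? 0);
    hash = BigInt.asUintN(64, hash * FNV_PRIME);
  }
  return hash.toString(16).padStart(16, "0");
}

// Hypothetical combination of the two inputs (separator is an assumption).
function anonymizeArtifactId(projectId: string, fileName: string): string {
  return fnv1a64Hex(projectId + "\u0000" + fileName);
}
```

Because the function is pure arithmetic, the daemon and the browser would produce identical ids without any async crypto call.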

* feat(analytics): wire studio_click share_option + artifact_export_result

HtmlViewer's share menu now emits both events per click via a
fireShareExport helper:

- studio_click share_option fires immediately on click with the chosen
  export_format and a fresh request_id.
- artifact_export_result fires when the export resolves — success for
  sync exporters (html, markdown, template) the moment the call
  returns, success/failed for async exporters (pdf, zip, deploy)
  via .then/.catch. The same request_id threads both events so
  PostHog stitches click → result via $insert_id.
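
A rough sketch of the click-then-result pattern, under assumptions: the `Track` callback, the property names (`element`, `export_format`, `request_id`, `result`), and the treatment of a synchronous throw as `failed` are illustrative and not taken from HtmlViewer.

```typescript
// Hypothetical capture callback.
type Track = (event: string, props: Record<string, string>) => void;

function fireShareExport(
  track: Track,
  exportFormat: string,
  requestId: string,
  run: () => void | Promise<void>,
): void {
  // The click event fires immediately.
  track("studio_click", {
    element: "share_option",
    export_format: exportFormat,
    request_id: requestId,
  });

  // The result event reuses the same request_id so the two can be stitched.
  const report = (result: "success" | "failed"): void => {
    track("artifact_export_result", {
      export_format: exportFormat,
      request_id: requestId,
      result,
    });
  };

  let outcome: void | Promise<void>;
  try {
    outcome = run();
  } catch {
    report("failed"); // assumption: a sync exporter that throws counts as failed
    return;
  }
  if (outcome instanceof Promise) {
    outcome.then(() => report("success"), () => report("failed"));
  } else {
    report("success");
  }
}
```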

DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages
slots; markdown is now a first-class export_format value.

Also ignore .env.local so local POSTHOG_KEY / .env-style secrets
don't get committed.

* feat(analytics): emit run_created and run_finished from the daemon

POST /api/runs now reads the analytics context off the
x-od-analytics-* headers the web client sets on every fetch, then:

- Captures run_created with project_id, conversation_id, run_id,
  model_id, agent_provider_id (mapped via agentIdToTracking),
  skill_id, design_system_id, plus the token_count_source marker.
- Schedules a run_finished capture on runs.wait(run) resolution,
  mapping succeeded/canceled/failed to success/cancelled/failed and
  reporting total_duration_ms.

Both events use a stable insert_id derived from the same uuid so
PostHog dedupes the daemon-side mirror against any future
web-side capture without double-counting.
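
The status mapping described above is small enough to show in full. The function names and the timestamp parameters are hypothetical; only the three status pairs and the total_duration_ms field come from the commit message.

```typescript
type RunStatus = "succeeded" | "canceled" | "failed";
type TrackedResult = "success" | "cancelled" | "failed";

// Map the daemon's run status onto the tracking vocabulary.
function toTrackedResult(status: RunStatus): TrackedResult {
  switch (status) {
    case "succeeded":
      return "success";
    case "canceled":
      return "cancelled";
    case "failed":
      return "failed";
  }
}

// Hypothetical builder for the run_finished payload fields discussed here.
function runFinishedProps(status: RunStatus, startedAtMs: number, finishedAtMs: number) {
  return {
    result: toTrackedResult(status),
    total_duration_ms: finishedAtMs - startedAtMs,
  };
}
```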

Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay
omitted in v1 — the claude-stream parser only exposes input/output
totals today. See tracking-doc-issues.md §3.2.

* feat(analytics): emit settings_cli_test_result + settings_byok_test_result

The original BLOCKING-list assumed these CSV P0 events were not
implementable in this branch because main lacked Test buttons. The
worktree HEAD actually wires `handleTestAgent` and `handleTestProvider`
in SettingsDialog, so both events are now in scope.

- handleTestAgent emits settings_cli_test_result on success and
  failure paths with cli_provider_id mapped via agentIdToTracking,
  result drawn from result.ok / catch branch, error_code from
  result.kind or the thrown error name, and duration_ms timed via
  performance.now().
- handleTestProvider emits settings_byok_test_result analogously,
  using apiProtocol (anthropic|openai|azure|ollama|google) directly
  as provider_id — wider than the CSV's 5-value enum, documented in
  tracking-doc-issues.md §2.5.

Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps
plus matching track* helpers. AnalyticsEventName union now covers all
14 P0 events this branch supports.

* feat(analytics): gate PostHog on the existing telemetry.metrics consent

The integration now reuses the same first-launch privacy banner +
Settings → Privacy toggle that gates Langfuse, so a single user
decision controls both telemetry sinks.

- /api/analytics/config now consults the persisted AppConfigPrefs:
  it returns enabled=true only when POSTHOG_KEY is set AND the user
  has chosen "Share usage data" (telemetry.metrics === true). The
  response also echoes installationId so the web client uses the
  same anonymous id Langfuse keys off of — one identity per install,
  shared across both sinks.
- Web AnalyticsProvider:
  - Bootstrap fetch resolves installationId and threads it through
    the x-od-analytics-anonymous-id header on every /api/* fetch,
    so daemon-side captures (run_created / run_finished /
    project_create_result) land on the same person record.
  - Exposes a setConsent(granted) method that calls posthog-js's
    opt_in_capturing / opt_out_capturing, wired from App.tsx via a
    useEffect watching config.telemetry?.metrics. Toggling Privacy
    → metrics now stops/resumes events immediately, no reload.
- app_launch additionally gates on telemetry.metrics so a freshly-
  declined user fires nothing, and a freshly-opted-in user fires on
  the next reload.

* feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env

Wires PostHog product analytics through the same Langfuse-style build-
secret pipeline so official Open Design builds ship with the key while
fork builds compile without it (the integration short-circuits cleanly
when POSTHOG_KEY is absent).

tools/pack
- resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from
  process.env at packaging time, validates them (no whitespace in the
  key, http(s) URL for host, trailing-slash strip), and stamps them on
  ToolPackConfig. Fork builds without the env vars simply omit the
  fields; the daemon-side gate keeps things off in that case.
- Mac, Windows, and Linux packaged-config writers each append the two
  fields to open-design-config.json next to the existing
  telemetryRelayUrl entry.
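
The three validation rules could be sketched as follows. This is not the resolveToolPackConfig source: `resolvePosthogConfig`, the `PosthogPackConfig` shape, and the error messages are assumptions, and treating an empty string like an unset variable is also a guess.

```typescript
interface PosthogPackConfig {
  posthogKey?: string;
  posthogHost?: string;
}

function resolvePosthogConfig(env: Record<string, string | undefined>): PosthogPackConfig {
  const config: PosthogPackConfig = {};

  const key = env["POSTHOG_KEY"];
  if (key !== undefined && key !== "") {
    if (/\s/.test(key)) throw new Error("POSTHOG_KEY must not contain whitespace");
    config.posthogKey = key;
  }

  const host = env["POSTHOG_HOST"];
  if (host !== undefined && host !== "") {
    let protocol: string;
    try {
      protocol = new URL(host).protocol;
    } catch {
      throw new Error("POSTHOG_HOST must be a valid URL");
    }
    if (protocol !== "http:" && protocol !== "https:") {
      throw new Error("POSTHOG_HOST must use http or https");
    }
    // Strip trailing slashes so later path joins do not double them.
    config.posthogHost = host.replace(/\/+$/, "");
  }

  // Fork builds with neither variable set get an empty object.
  return config;
}
```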

apps/packaged
- RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost
  so the Electron entry and headless entry both forward them to the
  daemon sidecar.
- buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into
  the daemon child env when present. The daemon's existing analytics
  module reads these via process.env — no daemon-side changes needed.
- The headless packaged path falls back to process.env for fields the
  builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL
  is read there.

CI
- release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret)
  and POSTHOG_HOST (var) at workflow-env scope so every packaging job
  inherits them. PR / fork builds without these set simply skip the
  bake step.

Tests
- tools/pack: config.test.ts covers bake-through, fork-build omission,
  whitespace rejection, invalid-URL rejection, and trailing-slash
  normalization.
- apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv
  forwarding the keys when present and omitting them when null.

* feat(analytics): enable PostHog autocapture + perf + exceptions

Flip on the PostHog SDK's automatic diagnostic features so we capture
click paths, page transitions, web vitals, dead clicks, and browser
exceptions without scattering instrumentation through the codebase.

Privacy defense lives in one place — apps/web/src/analytics/scrub.ts —
wired in via posthog-js's `before_send` hook so every outgoing event
passes through the same audit point:

  - $autocapture / $rageclick / $dead_click / $copy_autocapture:
    strips $el_text and value/placeholder/aria-label attrs from any
    input, textarea, password input, or contenteditable element. PostHog
    autocapture does not capture input.value by default, but $el_text
    on a <textarea> reflects the typed content — that's the prompt
    body for us, so it has to be scrubbed every time.
  - $pageview / $pageleave: drops query string and fragment from
    $current_url / $referrer so any future ?q=… can't leak.
  - $exception: rewrites file:// and absolute filesystem paths in
    stack frames to app://apps/<repo-relative> so we don't ship the
    user's home directory.
  - Suppresses $opt_in entirely — duplicate of our explicit
    setConsent toggle in App.tsx.
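
As an illustration of the $pageview / $pageleave rule only, dropping the query string and fragment could look like the sketch below. The function names are hypothetical and the fallback for non-absolute values is an assumption; scrub.ts itself is not reproduced here.

```typescript
// Remove the query string and fragment from a URL-like string.
function stripQueryAndFragment(rawUrl: string): string {
  try {
    const url = new URL(rawUrl);
    url.search = "";
    url.hash = "";
    return url.toString();
  } catch {
    // Not an absolute URL: cut at the first '?' or '#', if any.
    const cut = rawUrl.search(/[?#]/);
    return cut === -1 ? rawUrl : rawUrl.slice(0, cut);
  }
}

// Apply the rule to the two URL-bearing properties named in the commit.
function scrubPageEvent(props: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = { ...props };
  for (const key of ["$current_url", "$referrer"]) {
    const value = out[key];
    if (typeof value === "string") out[key] = stripQueryAndFragment(value);
  }
  return out;
}
```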

Element-level defense in depth is limited to the single most sensitive
surface: the chat composer textarea gets `ph-no-capture` so PostHog
never even generates an event for clicks inside that subtree. Every
other input relies on scrub.ts — sprinkling the class through every
form would be noisy and easy to forget on new surfaces.

The existing Privacy → "Share usage data" toggle continues to gate
every new feature: posthog-js's opt_out_capturing() halts autocapture,
$pageview, $exception, web vitals, and dead clicks alongside the
explicit capture() calls — one global switch.

11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts.

* ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions

Adding posthog-js to apps/web and posthog-node to apps/daemon changed
pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by
sha256. The CI nix flake check failed with:

  specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc=
  got:       sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s=

Copy the new hash into both nix/package-web.nix and
nix/package-daemon.nix per the procedure documented in nix/README.md
§"First-build hash pinning".

* feat(analytics): unify PostHog identity with Langfuse installationId

PostHog's distinct_id is the installationId stamped by /api/analytics/
config; Langfuse already reads the same id off app-config.json to
populate trace.userId. With both sinks keying off the same anonymous
identity, dashboards can correlate user actions (PostHog events) with
LLM runs (Langfuse traces) without re-identifying.

Two gaps closed:

1. applyConsent(false) — clear posthog-js's persisted ph_*_posthog
   localStorage entry on opt-out via posthog.reset(). Without this, a
   user who opts out, then clicks Delete my data, then re-opts in
   would see PostHog stitch their new session to the deleted identity
   because bootstrap.distinctID only takes effect on first init.

2. applyIdentity(newInstallationId) — Delete my data rotates the
   installationId in app-config; App.tsx now watches config.installationId
   and calls posthog.reset() then identify(newId) so the next event
   batch is fully decoupled from the deleted one. Idempotent on
   same-id re-renders so benign config refreshes don't churn PostHog
   identities.

The fetch wrapper's x-od-analytics-anonymous-id header also flips to
the new id on rotation so daemon-side captures (run_created /
run_finished) land on the same person record from the very next API
call, not after a reload.

The end-to-end rotation flow is verified against a live PostHog
project; these unit tests pin the safety guards (no-client paths, null
inputs) since stubbing posthog-js's init-loaded callback chain is
brittle.

* fix(langfuse): require both metrics AND content consent for trace reports

Tightens the Langfuse gate so a user who shares anonymous metrics but
NOT conversation content stops emitting Langfuse traces entirely —
Langfuse is used for turn-quality evals which only make sense with
prompt/output bodies. PostHog (product analytics, content-free) stays
gated on `metrics` alone and is unaffected.

i18n: "Conversation content" → "Conversation and tool content" with
hints expanded to mention tool inputs/outputs so the consent surface
matches what the trace actually carries (en + zh-CN).

Bundled here per PR scope — change originated outside this PostHog
PR but lands cleanly on the same files; gating Langfuse strictly
on `content` makes the dual-sink consent model (PostHog = metrics,
Langfuse = metrics + content) symmetric across both i18n locales and
the daemon-side gate.
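
The dual-sink consent model reduces to two small predicates. This is a sketch under assumptions: the `TelemetryPrefs` shape and both function names are hypothetical, while the gating rules (PostHog on metrics plus a configured key, Langfuse on metrics and content) follow the descriptions in this PR.

```typescript
// Hypothetical subset of the persisted telemetry preferences.
interface TelemetryPrefs {
  metrics?: boolean;
  content?: boolean;
}

// PostHog: needs a configured key and the metrics consent.
function posthogEnabled(prefs: TelemetryPrefs, posthogKey: string | undefined): boolean {
  return posthogKey !== undefined && posthogKey !== "" && prefs.metrics === true;
}

// Langfuse: needs both metrics and content consent.
function langfuseEnabled(prefs: TelemetryPrefs): boolean {
  return prefs.metrics === true && prefs.content === true;
}
```

Comparing against `true` explicitly means an undecided user (both fields undefined) is treated the same as one who declined.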

* feat(analytics): wire byok_provider_option + fix PR review P1s

Adds the BYOK protocol-chip click event (5-value provider_id mirroring
the apiProtocol Settings UI) and resolves four P1 review threads on
PR #1428.

byok_provider_option:
- New SettingsClickByokProviderOptionProps in contracts (provider_id =
  anthropic|openai|azure|google|ollama; maps to CSV's 5 values per
  tracking-doc-issues.md §2.5).
- trackSettingsClickByokProviderOption helper in apps/web/src/analytics.
- SettingsDialog hooks it on the protocol-chip onClick alongside the
  existing setApiProtocol call; is_selected reflects whether the chip
  was already active.

Review fixes:

1. client.ts (Siri-Ray): clear `initPromise` when the resolution is
   null so a Privacy → metrics opt-in after a previous decline triggers
   a fresh /api/analytics/config fetch. Without this, the disabled
   response was cached forever — first-session opt-in needed a reload
   to start sending PostHog events.

2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a
   strict same-origin + /api/ pathname check (shared
   `isSameOriginApiCall` helper). Outbound third-party URLs containing
   `/api/` (e.g. provider.example.com/api/x) no longer receive our
   x-od-analytics-* headers.

3. provider.tsx (codex-connector, lefarcen): gate header injection on
   `resolvedAnonId` being non-null. When Privacy → metrics is off,
   /api/analytics/config returns enabled=false → resolvedAnonId stays
   null → wrapper never installs → daemon can't read consent-bearing
   headers → no daemon-side PostHog event. setConsent now also clears
   resolvedAnonId on opt-out and re-fetches on opt-in.

4. daemon/analytics.ts (defense in depth): createAnalyticsService now
   takes dataDir and capture() re-reads app-config to check
   telemetry.metrics inside the fire-and-forget wrapper. Even if a
   stale header somehow reaches the daemon after opt-out, the capture
   is dropped before posthog-node.capture is called.
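
The difference between a substring check and the strict check in item 2 can be sketched like this. The signature shown (taking the page origin as a parameter) is an assumption; the real `isSameOriginApiCall` helper may read the origin from the browser instead.

```typescript
// True only for requests to the page's own origin whose path starts with /api/.
function isSameOriginApiCall(input: string, pageOrigin: string): boolean {
  let origin: string;
  let pathname: string;
  try {
    // Relative inputs resolve against the page origin.
    const url = new URL(input, pageOrigin);
    origin = url.origin;
    pathname = url.pathname;
  } catch {
    return false;
  }
  return origin === pageOrigin && pathname.startsWith("/api/");
}
```

A check such as `url.includes('/api/')` would accept the third-party URL in the example from item 2; the strict version rejects it because the origin differs.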

* fix(web): place "Share usage data" on the right in privacy consent banner

Swap button order in PrivacyConsentModal and the in-settings ConsentCard
so the affirmative "Share usage data" lands on the right and "Not now"
on the left. Matches the OK-on-the-right pattern users expect for
primary actions.

Both buttons keep equal visual prominence (same .privacy-consent-action
styling) so the swap doesn't change the EDPB equal-prominence stance
called out in the original Langfuse telemetry spec.

* feat(analytics): populate run_finished token totals from claude-stream usage

Daemon's claude-stream parser already emits agent usage events with
input_tokens / output_tokens totals; the run service buffers them in
run.events and Langfuse reads them out the same way. The run_finished
PostHog event was leaving these fields empty.

Scan run.events for the most recent agent usage frame on terminal
transition and emit input_tokens / output_tokens / total_tokens when
present. token_count_source flips to 'provider_usage' only when at
least one count landed; runs without provider-side usage data keep
'unknown'.
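
A sketch of the reverse scan, with assumed shapes: the `RunEvent` type, the `"usage"` discriminator, and computing total_tokens as the sum of whichever counts are present are illustrative guesses rather than the run service's actual event format.

```typescript
// Hypothetical event shape; real run.events entries carry more fields.
interface RunEvent {
  type: string;
  input_tokens?: number;
  output_tokens?: number;
}

interface UsageProps {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
  token_count_source: "provider_usage" | "unknown";
}

// Walk backwards so the most recent usage frame wins.
function usagePropsFromEvents(events: RunEvent[]): UsageProps {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (event === undefined || event.type !== "usage") continue;
    const props: UsageProps = { token_count_source: "unknown" };
    if (typeof event.input_tokens === "number") props.input_tokens = event.input_tokens;
    if (typeof event.output_tokens === "number") props.output_tokens = event.output_tokens;
    // Flip the source only when at least one count landed.
    if (props.input_tokens !== undefined || props.output_tokens !== undefined) {
      props.total_tokens = (props.input_tokens ?? 0) + (props.output_tokens ?? 0);
      props.token_count_source = "provider_usage";
    }
    return props;
  }
  return { token_count_source: "unknown" };
}
```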

The provider does not break the input down into the 7 sub-fields the
tracking doc lists (memory / context / attachment / system_prompt /
…); those stay omitted until a parser change exposes them.

* feat(analytics): estimate user_query_tokens from prompt length

The user_query_tokens field for run_created / run_finished was hardcoded
to 0. We can't tokenize without bundling a model-specific tokenizer, but
the character/4 heuristic is the industry-standard estimate when one
isn't available and is enough for funnel analysis (prompt-length cohorts,
short-vs-long-query conversion rates).

Extracted from req.body via the same telemetryPromptFromRunRequest
pattern the daemon already uses for langfuse-bridge (currentPrompt then
message fallback). Only the integer count goes to PostHog — the prompt
text itself never leaves the daemon.

token_count_source flips appropriately:
- run_created with a prompt: 'estimated' (was 'unknown')
- run_created with no prompt: 'unknown'
- run_finished with provider usage: 'provider_usage' (overrides
  baseProps' 'estimated' value)
- run_finished without provider usage: inherits 'estimated' or 'unknown'
  from baseProps, so missing input/output counts do not mask the estimate.
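
The heuristic and the token_count_source rules above can be sketched as follows. The function names are hypothetical, and rounding up with `Math.ceil` is an assumption, since the commit only says "character/4".

```typescript
type TokenCountSource = "estimated" | "provider_usage" | "unknown";

// Roughly four characters per token; the rounding direction is assumed.
function estimateUserQueryTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

// run_created: 'estimated' when a prompt is present, otherwise 'unknown'.
function runCreatedSource(prompt: string | undefined): TokenCountSource {
  return prompt !== undefined && prompt.length > 0 ? "estimated" : "unknown";
}

// run_finished: provider usage overrides; otherwise inherit the base value.
function runFinishedSource(base: TokenCountSource, hasProviderUsage: boolean): TokenCountSource {
  return hasProviderUsage ? "provider_usage" : base;
}
```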
2026-05-12 22:32:42 +08:00
lefarcen
2a0ebea50b
release: Open Design 0.7.0
- bump 14 monorepo package.json files to 0.7.0 (root + apps/{web,daemon,desktop,packaged,landing-page} + packages/{contracts,platform,sidecar,sidecar-proto} + tools/{dev,pack,pr} + e2e); apps/packaged was already at 0.6.1 from beta lane, all others at 0.6.0
- add CHANGELOG.md [0.7.0] - 2026-05-12 entry covering 97 merged PRs since 0.6.0:
  - Critique Theater: Phase 7 web client state machine (#1307) + Phase 6.2 daemon artifact extraction (#1085)
  - Web/UI: thumbs-up/down feedback widget (#1308), Cmd+, opens Settings (#1173), Finalize design package + Continue in CLI (#974), fetch models button for BYOK (#1034), provider models alphabetical sort (#1097), collapsible MCP JSON field-mapping (#1136), design file rename (#894)
  - Daemon: auto-memory store with chat-protocol-aware extraction (#999), install/uninstall skills & design systems (#1003), HTTP 206 range requests for video/audio (#1105), scheduled routines (#1033), agent runtime + route registration refactor (#1063, #1043)
  - HyperFrames: HTML-in-Canvas across web + skills (#866)
  - Skills/design systems: generic skills + design-templates split + finalize-design API (#955), agent-browser skill (#1284), WeChat design system + login-flow skill (#1083), hud/loom/trading-terminal design systems (#1069), release-notes-one-pager skill (#873), tokens.css schema (#1231)
  - Packaging: macOS Intel (x64) build (#759), official Nix flake (#402), beta packaging cache (#1095)
  - Maintainer ops: tools-pr PR-duty workspace (#1259), MAINTAINERS.md (#1290), contributor card bot (#932), PR→issue linking discipline (#1263)
  - Changed: conversation run isolation (#1271), default English i18n fallback (#1270), Codex CLI exit diagnostics / empty-response handling / path fallback (#1267, #1244, #1205)
  - Fixed: ~30 web + desktop + daemon + packaging bugfixes
  - Internal: nightly UI/desktop regression coverage (#1256), e2e/release report hardening (#1140), entry/settings automation (#954)
- catch up [Unreleased] compare link to v0.7.0 and add missing [0.6.0] release link
- add 97 PR footnote refs ([#402]..[#1330])

Verified locally: pnpm install + pre-build contracts/daemon/desktop dist + pnpm typecheck (exit 0 across all 14 packages on Node 22.22 with engine-warning).

Release workflow validation runs after merge via release-stable.
2026-05-12 15:33:28 +08:00
Cursor Agent
0631f04a00
feat(plugins): @open-design/agui-adapter package + GET /api/runs/:id/agui
Plan J1 + J2 / spec §10.3.5 / Phase 4.

New workspace package: packages/agui-adapter/. Pure-TS
bidirectional bridge between OD's native PersistedAgentEvent /
GenUIEvent / PluginPipelineStageEvent union and the AG-UI canonical
event protocol (https://github.com/CopilotKit/CopilotKit).

  - src/types.ts        — AGUIEvent discriminated union (agent.message,
                          tool_call, state_update, ui.surface_requested,
                          ui.surface_responded, run.lifecycle).
  - src/encode.ts       — encodeOdEventForAgui(event, ctx): maps every
                          OD native event onto the canonical shape; drops
                          events the encoder can't translate so external
                          AG-UI clients always see a clean stream.
  - tests/encode.test.ts (9 cases) covers message_chunk, tool_call,
                          run_started, end → started/completed/failed/
                          cancelled, pipeline_stage_started/completed,
                          genui_surface_request/response/timeout,
                          genui_state_synced, and the unknown-event drop.

apps/daemon/src/server.ts mounts GET /api/runs/:id/agui:

  - 404 for unknown run ids.
  - Replays the run's recorded events through the encoder on subscribe
    (so a reconnecting client with Last-Event-ID picks up exactly the
    AG-UI events it missed).
  - Subscribes to future events via a thin adapter client wrapper that
    routes through the existing run.clients fan-out, so the encoder
    runs lazily on each broadcast (no double event buffering).

Daemon depends on @open-design/agui-adapter; the package builds clean
and ships pure ESM. v1 plugins consume CopilotKit / agent-protocol
clients without modification — the adapter ships independently from
daemon main, so upstream protocol revs do not couple to the daemon
release cadence (per spec §10.3.5 Phase 4 contract).

Tests: agui-adapter 9/9, daemon 1481 → 1482 (+1 case on agui-route).

Co-authored-by: Tom Huang <1043269994@qq.com>
2026-05-09 13:11:48 +00:00
Marc Chan
b03a504da6
release: Open Design 0.6.0 (#1080)
2026-05-09 19:58:11 +08:00
Cursor Agent
b9d40094b5
feat(plugins): github tarball + https archive install sources
Plan §3.A6 / spec §7.2.

The installer gains a top-level dispatcher `installPlugin` that picks
the right backend off the source string:

  - ./folder, /abs/path  → installFromLocalFolder (existing behavior)
  - github:owner/repo[@ref][/subpath]
                          → fetch codeload tarball, extract via tar.x,
                            optionally chroot into a subpath, then re-use
                            the local backend for copy / persist
  - https://*.tar.gz, *.tgz
                          → same archive backend, recorded as source_kind=url
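
One way to classify the three source shapes is sketched below. It is not the `installPlugin` implementation: the type names, the regular expressions, and the assumption that a ref contains no `/` are illustrative choices.

```typescript
interface LocalSource { kind: "local"; path: string }
interface GithubSource { kind: "github"; owner: string; repo: string; ref?: string; subpath?: string }
interface UrlSource { kind: "url"; url: string }
type PluginSource = LocalSource | GithubSource | UrlSource;

// github:owner/repo[@ref][/subpath]; assumes the ref contains no '/'.
const GITHUB_SOURCE = /^github:([^/@\s]+)\/([^/@\s]+)(?:@([^/\s]+))?(?:\/(.+))?$/;

function classifyPluginSource(source: string): PluginSource {
  const match = GITHUB_SOURCE.exec(source);
  if (match) {
    const [, owner, repo, ref, subpath] = match;
    if (owner === undefined || repo === undefined) {
      throw new Error("malformed github source: " + source);
    }
    const out: GithubSource = { kind: "github", owner, repo };
    if (ref !== undefined) out.ref = ref;
    if (subpath !== undefined) out.subpath = subpath;
    return out;
  }
  if (/^https:\/\/.+\.(?:tar\.gz|tgz)$/.test(source)) {
    return { kind: "url", url: source };
  }
  if (source.startsWith("./") || source.startsWith("/")) {
    return { kind: "local", path: source };
  }
  throw new Error("unsupported plugin source: " + source);
}
```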

Hard guards inherited from spec §7.2:
  - Symbolic / hard link entries inside the archive abort the install with
    a clean error.
  - Path-traversal segments (..) abort.
  - Total extracted size is measured against maxBytes (50 MiB default) and
    rejected if exceeded.
  - Local-folder backend now preserves the recorded source / source_kind
    when called from the archive backend so installed_plugins records
    accurate provenance.
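
The first three guards can be illustrated over a list of entry descriptors. Everything here is a sketch under assumptions: `ArchiveEntry`, its `type` strings, and `assertArchiveSafe` are hypothetical, and the real installer performs these checks while extracting rather than over a pre-built list.

```typescript
// Hypothetical descriptor for one archive member.
interface ArchiveEntry {
  path: string;
  type: "File" | "Directory" | "SymbolicLink" | "Link";
  size: number;
}

const DEFAULT_MAX_BYTES = 50 * 1024 * 1024; // 50 MiB

function assertArchiveSafe(entries: ArchiveEntry[], maxBytes: number = DEFAULT_MAX_BYTES): void {
  let total = 0;
  for (const entry of entries) {
    // Symbolic and hard links abort the install.
    if (entry.type === "SymbolicLink" || entry.type === "Link") {
      throw new Error("link entry rejected: " + entry.path);
    }
    // Any '..' path segment aborts the install.
    if (entry.path.split(/[\\/]/).includes("..")) {
      throw new Error("path traversal rejected: " + entry.path);
    }
    // Running total of extracted bytes against the cap.
    total += entry.size;
    if (total > maxBytes) {
      throw new Error("archive exceeds " + maxBytes + " bytes");
    }
  }
}
```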

POST /api/plugins/install accepts the new shapes; the SSE event stream
shape is unchanged. The fetcher is pluggable so tests don't need network.

apps/daemon/package.json adds tar@^7 + @types/tar@^6.

Daemon tests: 1429 → 1434 (added plugins-installer-archive with 5 cases).

Co-authored-by: Tom Huang <1043269994@qq.com>
2026-05-09 11:26:54 +00:00
pftom
4c7cd5d9f2
feat(plugins): introduce plugin system with installation and management capabilities
- Added support for a new plugin system, allowing users to install, uninstall, and manage plugins through the daemon.
- Implemented API endpoints for listing installed plugins, retrieving plugin details, and applying plugins with input validation.
- Introduced a plugin doctor feature to validate plugin manifests and check for issues before application.
- Established a plugin persistence layer with SQLite migrations for managing installed plugins and their metadata.
- Enhanced the CLI with commands for plugin operations, improving user interaction with the plugin ecosystem.
2026-05-09 18:24:44 +08:00
nettee
ef9ca7baff
fix(daemon): typecheck core server paths (#952) 2026-05-08 20:43:51 +08:00
ferasbusiness666
1e8926271b
Harden security scan findings and upgrade dependencies (#806)
* feat: add accent color control and launcher for Open Design

* fix: remove launcher binary from PR

* test: cover accent appearance edge cases

* Harden security scan findings and upgrade deps

* Address proxy security review

* Pin jsdom for web test stability

---------

Co-authored-by: ferasbusiness666 <ferasbusiness666@users.noreply.github.com>
Co-authored-by: lefarcen <935902669@qq.com>
2026-05-08 19:46:34 +08:00
lefarcen
2bb029cb58
release: Open Design 0.5.0 (#820)
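0.5.0 has already been released from c21cbc6 (https://github.com/nexu-io/open-design/releases/tag/open-design-v0.5.0); this squash brings the version bump and the CHANGELOG [0.5.0] entry into main's history so that the upcoming 0.5.1 release can follow the standard dispatch flow on main.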
2026-05-08 00:41:01 +08:00
kami
09eb88f683
Add Cloudflare Pages artifact deployment
Adds Cloudflare Pages artifact deployment support.
2026-05-07 20:04:22 +08:00
nettee
84ac93c945
fix(daemon): extend OpenAI image request timeouts (#788) 2026-05-07 18:02:31 +08:00
lefarcen
ae4a08773a
chore(release): prepare 0.4.1 (#659)
- bump remaining monorepo package.json files to 0.4.1 after apps/packaged was already bumped in #637
- add CHANGELOG.md [0.4.1] - 2026-05-06 entry covering the startup hotfix and 19 merged PRs since 0.4.0:
  - Added: manual edit mode (#620), Cmd/Ctrl+P quick file switcher (#556), resizable chat panel (#563), PI status/cancel updates (#618), accessibility and RTL/Bidi craft modules (#587, #595), i18n structure checks (#608)
  - Changed: first-PR README links now surface help-wanted issues (#605)
  - Fixed: packaged contracts runtime exports (#577), packaged runtime beta gating (#637), ACP/MCP/agent fixes (#604, #612, #627), conversation error recovery (#623), native mac quit (#637)
  - Documentation/Internal: OD_DATA_DIR migration docs (#570), Simplified Chinese QUICKSTART (#578), zh-TW/ko README syncs (#586, #619), generated metrics (#592)

Release workflow validation runs after merge via release-stable.
2026-05-06 18:05:56 +08:00
lefarcen
963bbf2500
release: Open Design 0.4.0 (#454) 2026-05-05 23:39:40 +08:00
ChildhoodAndy
009d7a5478
refactor(daemon): eliminate duplicate dist tree from two-tsconfig build (#553)
Move sidecar source under src/ so a single tsconfig produces all daemon
output. Removes the parallel dist/src/ tree that was emitted by
tsconfig.sidecar.json (it included src/**/*.ts to type-check the
`../src/server.js` cross-tree import).

Build now emits:
- dist/<flat>            (cli.js, server.js, app-version.js, ...)
- dist/sidecar/{index,server}.js

`dist/sidecar/server.js` reaches the main daemon via `../server.js`
instead of `../src/server.js`, so there is no second copy of the source
tree in the published tarball.

Background — issue #534 (already fixed by #537):
The packaged Settings → About panel showed 0.0.0 because the sidecar
chain loaded the duplicated `dist/src/app-version.js`, where the fixed
`new URL('../package.json', import.meta.url)` resolved to a non-existent
`dist/package.json`. #537 patched the symptom by walking parents until a
real `package.json` is found and by writing `appVersion` into the Linux
packaged config. Both stay in place — they're sound defenses — but the
underlying duplicate-emit was never addressed; any future relative
resource lookup (templates, schemas, prompts) anchored on
`import.meta.url` would have hit the same trap.

This change removes the trap.
2026-05-05 23:31:14 +08:00
PerishFire
bbdd4e84b5
chore: enforce test directory conventions (#496)
* chore: enforce test directory conventions

Move package, app, and tool tests out of src and add guard enforcement so source directories stay source-only.

* ci: use guard and package-scoped tests

Run the new repository guard in CI and keep test execution aligned with package-scoped commands after removing root aliases.

* ci: align stable release guard check

Use the new repository guard in stable release verification after replacing the residual-JS-only script.

* chore: tighten test layout enforcement

Enforce sibling tests directories, typecheck moved test suites with dedicated configs, and refresh remaining guidance that pointed at src-based tests.

* chore: clarify no-emit test tsconfigs

Explicitly disable declaration-only emit in test tsconfigs so review tooling sees they are no-emit typecheck configs.
2026-05-05 15:34:22 +08:00
emilneander
33c3b94b42
feat(daemon): add od mcp - expose Open Design as an MCP server (#399)
* feat(daemon): add `od mcp` subcommand for stdio MCP server

Lets a coding agent in a different repo (Claude Code, Cursor, Zed)
pull files from a locally-running OD project over the Model Context
Protocol — no export/import zip dance.

The MCP server is a thin stdio process that proxies read-only tool
calls to the daemon's existing HTTP API; no daemon-side changes
required. Exposes 8 tools:

  list_projects, get_project,
  list_files, get_file,
  list_skills, get_skill,
  list_design_systems, get_design_system

Wired exactly like `od media`: a hoisted flag set, a SUBCOMMAND_MAP
entry, a thin handler that resolves OD_DAEMON_URL and hands off to
src/mcp.ts. Tool dispatch is a switch over the tool name; each branch
fetches the matching daemon route and surfaces the response as MCP
text content. Binary mimes return a clear error pending phase-2
support.

Lifecycle gotcha worth flagging: Server.connect(transport) only
*starts* the stdio reader; the promise resolves immediately. Unless
the handler keeps awaiting until the transport or stdin closes,
cli.ts's top-level process.exit(0) kills the server before the first
request arrives. The fix in src/mcp.ts holds until onclose / stdin EOF.

Wire-up example for a consuming repo:

    {
      "mcpServers": {
        "open-design": {
          "command": "od",
          "args": ["mcp"],
          "env": { "OD_DAEMON_URL": "http://127.0.0.1:7456" }
        }
      }
    }

New dep: @modelcontextprotocol/sdk (MIT, official Anthropic SDK).

* feat(daemon): add MCP server instructions for zero-shot LLM context

Hand the consuming LLM a system-prompt-style overview of the OD
workflow so it picks the right tool without prompt-engineering on
the user's side. Mentions get_artifact and project-name resolution
ahead of their actual implementation; both ship in the same batch.

* feat(daemon): resolve MCP project args by UUID, name, or substring

Lets a consuming agent say `project: "recaptr"` instead of pasting a
UUID. Match order: exact id → exact name (case-insensitive) →
slug-normalized name (strips trailing " (N)", normalizes whitespace) →
substring (errors if multiple). UUID inputs short-circuit and never
hit the daemon.
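
The match order above could look roughly like the following. This is a sketch under assumptions: `Project`, `slugNormalize`, and `resolveProject` are hypothetical names, and the project list is passed in instead of being fetched from the daemon.

```typescript
interface Project {
  id: string;
  name: string;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Strip a trailing " (N)" suffix, collapse whitespace, lowercase.
function slugNormalize(name: string): string {
  return name.replace(/\s*\(\d+\)\s*$/, "").replace(/\s+/g, " ").trim().toLowerCase();
}

function resolveProject(input: string, projects: Project[]): string {
  // UUID inputs short-circuit: the project list is never consulted.
  if (UUID_RE.test(input)) return input;

  const byId = projects.find((p) => p.id === input);
  if (byId) return byId.id;

  const lower = input.trim().toLowerCase();
  const byName = projects.find((p) => p.name.toLowerCase() === lower);
  if (byName) return byName.id;

  const slug = slugNormalize(input);
  const bySlug = projects.find((p) => slugNormalize(p.name) === slug);
  if (bySlug) return bySlug.id;

  const partial = projects.filter((p) => p.name.toLowerCase().includes(lower));
  const only = partial[0];
  if (partial.length === 1 && only) return only.id;
  if (partial.length > 1) {
    throw new Error(`Ambiguous project "${input}": ${partial.map((p) => p.name).join(", ")}`);
  }
  throw new Error(`No project matches "${input}"`);
}
```

Each tier is tried only when the stricter one before it finds nothing, so a loose substring can never override an exact name.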

* feat(daemon): surface entryFile and kind on MCP get_project response

Promote metadata.entryFile and metadata.kind to top-level fields so
consumers (including get_artifact in this branch) can find the entry
without digging through nested metadata blobs.

* feat(daemon): add MCP get_artifact tool for bundle retrieval

A design rarely lives in a single file. get_artifact pulls the entry
HTML/JSX plus every sibling it references (tokens CSS, JSX modules,
imported components) in one call, so a consuming agent doesn't need
to parse HTML and round-trip per file.

Three modes:
  auto (default): BFS over relative <script src>, <link href>,
    <img src>, <source/video src>, JSX import/from, CSS url(), with
    depth cap 3 and a visited set. CDN, data:, mailto:, anchors, and
    paths containing .. are skipped.
  all:    every textual file in the project (mirror of /archive
          minus binaries).
  shallow: just the entry file (same as get_file).

Output is a structured JSON blob with name/mime/size/content per
file and the project's manifest metadata at the top.
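
A rough sketch of the auto mode's traversal, assuming "depth cap 3" means files up to three hops from the entry are included. `collectArtifact`, `isSkippable`, and the `refsOf` callback are hypothetical; `refsOf` stands in for fetching a file and extracting its references, and the sketch treats references as project-relative names without resolving them against the referencing file's directory.

```typescript
// Skip anything with a URL scheme (https:, data:, mailto:), protocol-relative
// URLs, anchors, and paths containing a ".." segment.
function isSkippable(ref: string): boolean {
  return /^(?:[a-z][a-z0-9+.-]*:|\/\/|#)/i.test(ref) || ref.split("/").includes("..");
}

// Breadth-first walk from the entry file with a visited set and a depth cap.
function collectArtifact(
  entry: string,
  refsOf: (file: string) => string[],
  maxDepth: number = 3,
): string[] {
  const visited = new Set<string>();
  visited.add(entry);
  let frontier: string[] = [entry];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const ref of refsOf(file)) {
        if (isSkippable(ref) || visited.has(ref)) continue;
        visited.add(ref);
        next.push(ref);
      }
    }
    frontier = next;
  }
  return Array.from(visited); // insertion order: entry first, then level by level
}
```

The visited set is what keeps a cycle (a module importing the entry back) from looping, and the depth cap bounds the work even on a very deep import chain.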

* feat(daemon): add /api/projects/:id/search route + MCP search_files

Server-side substring search across textual project files. Returns
file, 1-indexed line, and snippet, capped at 1000 matches. Exposed
through the MCP layer as search_files(project, query, pattern?, max?).

Treats the query as a literal substring (regex chars escaped) to
avoid catastrophic-backtracking attacks from LLM-supplied input.
Honors the project dir's existing path-safety guards via listFiles.
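
To illustrate the literal-substring treatment, here is a small sketch that escapes regex metacharacters before building the pattern. `escapeRegExp` and `searchFiles` are hypothetical names, the in-memory `files` record replaces the daemon's file listing, and case-sensitive matching is an assumption.

```typescript
interface SearchMatch {
  file: string;
  line: number; // 1-indexed
  snippet: string;
}

// Escape every regex metacharacter so the query only ever matches literally.
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function searchFiles(files: Record<string, string>, query: string, max: number = 1000): SearchMatch[] {
  const pattern = new RegExp(escapeRegExp(query));
  const matches: SearchMatch[] = [];
  for (const file of Object.keys(files)) {
    const lines = (files[file] ?? "").split("\n");
    for (let i = 0; i < lines.length; i++) {
      const text = lines[i] ?? "";
      if (!pattern.test(text)) continue;
      matches.push({ file, line: i + 1, snippet: text.trim() });
      if (matches.length >= max) return matches; // hard cap on returned matches
    }
  }
  return matches;
}
```

With the escape in place, a query such as `(a+)+` is searched as those five characters and never compiled as a nested quantifier.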

* feat(daemon): add since= filter to /files route + MCP list_files arg

Lets a consumer poll for "what's changed since I last looked" without
re-walking every file. Daemon-side: parse since= as ms, filter
listFiles output by mtime. MCP-side: forward as URL query.

* feat(daemon): expose skills and design systems as MCP resources

Catalog reads are stable reference material — they fit MCP's
resources surface (LLM-passive) better than tools (LLM-active).
Skills and design systems each become resources at
od://skills/<id>/SKILL.md and od://design-systems/<id>/DESIGN.md;
existing list_skills / get_skill / list_design_systems /
get_design_system tools remain as fallbacks for clients that don't
handle resources cleanly.

* fix(daemon): tighten MCP correctness in get_artifact and resources

Several silent-failure paths and minor footguns the first pass missed:

  - get_artifact auto: the entry's own fetch now raises a clear
    error instead of returning files: []. Previously a typo in
    `entry:` looked like an empty project.
  - get_artifact: invalid `include` value returns a clear error
    listing the valid modes instead of silently behaving as auto.
  - get_artifact all: includes binary files as metadata stubs to
    match auto's behavior. Both modes are now strict supersets of
    shallow.
  - extractRelativeRefs: gate JS-only patterns (import/from/require/
    dynamic-import) by file mime/extension so prose in markdown or
    HTML doesn't generate spurious 404 round-trips on words like
    "imported from 'X'".
  - extractRelativeRefs: cover <iframe>, <audio>, srcset, and
    CSS @import — common in real OD output.
  - resources/list descriptions are collapsed to a single line
    (newlines + repeated whitespace -> one space) so MCP UIs that
    don't normalize whitespace render cleanly.
  - fetchProjectFile: 0-byte binary files no longer report size: null
    due to falsy short-circuit on Number(content-length).

* perf(daemon): cache MCP project list for 5s in resolveProjectId

A typical agent session calls list_files/get_file/get_artifact several
times in a row, each with a project name argument. Each previously
re-fetched /api/projects. Cache the list in module scope with a 5s
TTL so back-to-back lookups are local; renames in the OD UI still
propagate within a few seconds.
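
A minimal sketch of a module-scope TTL cache like the one described. `cachedWithTtl` is a hypothetical helper; the loader is synchronous here to keep the example short (the real lookup fetches /api/projects), and the injectable `now` clock exists only to make the behavior easy to exercise.

```typescript
// Wrap a loader so repeated calls within ttlMs reuse the last value.
function cachedWithTtl<T>(
  load: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): () => T {
  let cached: { value: T; expiresAt: number } | undefined;
  return () => {
    const t = now();
    if (cached !== undefined && t < cached.expiresAt) {
      return cached.value; // still fresh: no reload
    }
    const value = load();
    cached = { value, expiresAt: t + ttlMs };
    return value;
  };
}
```

A caller would wrap the project-list loader once, for example `cachedWithTtl(loadProjects, 5000)` with a hypothetical `loadProjects`, and call the returned function from every tool handler.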

* feat(daemon): MCP UX polish — tool order, annotations, get_artifact maxBytes

Three changes well-behaved MCP clients pick up automatically:

  - Tool ordering. list_projects + get_artifact are now first; LLMs
    that weight earlier entries surface the bundle path before
    per-file fetching. Catalog tools (list_skills, get_skill,
    list_design_systems, get_design_system) sit at the bottom; they
    are also exposed as MCP resources.
  - readOnlyHint / idempotentHint / openWorldHint annotations on
    every tool so clients can skip confirmation prompts on safe
    tools and let the LLM know re-running is fine. Per-tool `title`
    annotations give clients a friendlier display name than the
    snake_case tool id.
  - get_artifact gains a `maxBytes` arg (default 1.5MB). Once the
    accumulated textual content crosses the cap, remaining files
    are dropped and `truncated: true` is set on the bundle so the
    consumer knows to use list_files / get_file for the rest.

* feat(daemon): expose user's active OD project/file via MCP

The "what file are you on?" round-trip the agent had to do every
session is now answered automatically. Three pieces:

  - Daemon: in-memory active-context slot with 5-minute TTL.
    POST /api/active sets {projectId, fileName}; GET /api/active
    returns the current value enriched with projectName, or
    {active:false} when the slot is empty/stale. Cleared on
    daemon restart.
  - Web: a small useEffect in App.tsx posts the active project +
    file to the daemon on every route change. Best-effort fire-
    and-forget; a missing daemon doesn't surface an error.
  - MCP: get_active_context tool (no args) and a matching MCP
    resource at od://focus/active. The tool is listed second,
    right after list_projects, so an LLM picks it up before
    asking for ids. Server instructions tell the model to call
    it FIRST when the user says "this file" / "the design I have
    open" / "what I'm looking at."

End to end: user opens a project in OD, agent in another repo
calls get_active_context() → gets {projectName: "recaptr",
fileName: "recaptr-onboarding-4.html"}, then immediately calls
get_artifact(project: "recaptr") with no further user input.

* feat(daemon): make MCP project arg optional, fall back to active OD context

get_artifact, get_project, get_file, search_files, and list_files now
accept project as optional. When omitted, the MCP resolves project
from /api/active so an agent in another repo can call

  search_files({ query: "Polaroid" })

without first asking the user "which project?". get_file and
get_artifact also default their path/entry to the active file, so
get_file({}) returns whatever the user is currently looking at.

The implicit path stamps `usedActiveContext` on JSON responses (or a
separate `[od:active-context …]` content block on get_file) so the
agent can see exactly which project/file got chosen. Explicit
project args pass through with zero added overhead.

Cuts the common case from two MCP round trips
(get_active_context → search_files) to one. Server instructions and
get_active_context's own description are updated to point at the
new default.

* fix(daemon): require same-origin for /api/active POST and GET

The active-context endpoint was added without an isLocalSameOrigin
guard. Since the daemon binds 0.0.0.0 by default, a LAN peer could
GET it to learn what file the user has open, or POST it to redirect
the MCP fallback to a project of their choice. Same-origin only is
the right scope: the web app proxies its requests through Next.js
on the daemon port, and the MCP runs over loopback in-process, so
both legitimate callers pass.

Pattern matches the existing /api/app-config etc. guards.

* feat(daemon): add /api/mcp/install-info for cross-platform install snippets

The Settings -> MCP server panel needs absolute paths to node and
the daemon's built cli.js so it can render snippets that work on a
fresh source clone (where `od` is not on PATH) and dodge the
/usr/bin/od octal-dump tool that ships on macOS/Linux and would
otherwise shadow ours.

Endpoint returns:
  - command: process.execPath (the node binary running the daemon)
  - args: [<absolute path to dist/cli.js>, "mcp"]
  - daemonUrl: http://127.0.0.1:<port>
  - platform: process.platform (so the panel can localize ~/.cursor
    vs %USERPROFILE%\.cursor and Cmd vs Ctrl shortcuts)
  - cliExists / nodeExists: existsSync checks on both binaries
  - buildHint: human-readable build/reinstall instructions when
    either path is missing

Same isLocalSameOrigin guard as /api/active. Cached for 5s because
the panel may re-fetch on every open and the paths cannot change
without a daemon restart.

Test file covers the happy path, cross-origin rejection, two
allowed-Origin variants, and the cache by counting fresh resolves
across rapid calls. 5/5 pass.

* refactor(daemon): tighten MCP surface, trim descriptions, polish copy

Three intertwined cleanups that all live in mcp.ts + cli.ts:

1. Drop catalog tools from MCP. list_skills / get_skill /
   list_design_systems / get_design_system are removed. The audience
   is a coding agent in a separate repo consuming Open Design's
   output; it cannot run skills (those are recipes Open Design uses
   to generate) and design-system DESIGN.md is reference material
   that already ships as an MCP resource. Keeping the catalog as
   tools cost ~350 tokens of overhead per turn for capabilities the
   agent could not act on. Tool count: 11 -> 7.

2. Trim tool descriptions. The active-context fallback explanation
   was repeated in 5 separate tool descriptions; hoisted into
   PROJECT_ARG and explained once in the server `instructions`
   block instead. Saves ~150-200 tokens per tools/list response.

3. User-facing branding pass. Tool titles, tool descriptions,
   resource names, error messages, comments, and `od mcp --help`
   now consistently use "Open Design" rather than "OD". Internal
   abbreviation `OD` is retained only inside the server
   instructions block where it is introduced inline as "Open Design
   (OD)" for compactness across multi-paragraph guidance.

Em dashes replaced with hyphens throughout, per project style.

* feat(web): add MCP server install panel in Settings

New "MCP server" section in the Settings dialog, surfacing
copy-paste install snippets for the major MCP-compatible coding
agents (Claude Code, Cursor, VS Code, Antigravity, Zed, Windsurf).

Highlights:
  - In-brand custom dropdown (reuses the existing .ds-picker
    pattern from the design-system / prompt-template pickers, click
    outside / Escape to close, chevron animates) instead of a
    native <select>.
  - Per-client snippet that uses absolute paths to node + cli.js
    fetched from /api/mcp/install-info on mount, so it works even
    when `od` is not on PATH.
  - Cursor gets a one-click "Install in Cursor" deeplink
    (cursor://anysphere.cursor-deeplink/mcp/install) that pops an
    approval dialog and writes the config for the user. UTF-8-safe
    base64 so paths with accented characters do not throw.
  - Per-OS path hints (~/.cursor on POSIX, %USERPROFILE%\.cursor
    on Windows) and keyboard shortcuts (Cmd vs Ctrl).
  - Build-required warning card when cli.js or the node binary
    does not exist on disk; deeplink button disables in that state.
  - Prominent "restart your client to pick up the new server"
    callout below the snippet, with per-client guidance.
  - Capability list ("what your agent can do") instead of a tool-
    name dump, so non-developer designers can also tell what is
    possible without reading MCP docs.

README adds a short "Use Open Design from your coding agent"
section that points at the panel and summarizes the per-client
flow (one-click for Cursor, JSON merge elsewhere). Read-only by
design; the daemon must be running locally.

* docs(readme): align MCP server section with the Settings panel

The "Use Open Design from your coding agent" section had drifted
from what the panel actually emits and lists.

- Add Antigravity to the supported-client list (previously missing).
- Drop the "(GitHub Copilot)" parenthetical from VS Code so the
  label matches the panel.
- Fix the Claude Code line: we no longer emit a single
  `claude mcp add ...` shell command. The snippet is JSON; the
  panel additionally suggests `claude mcp add-json` as the safer
  way to apply it instead of hand-editing ~/.claude.json.
- Swap the "find the Polaroid section" example for two more
  universal phrases ("build this in my app", "match these
  styles") that match what the panel surfaces.
- Add a one-line "restart or reload your client after install"
  note - this was prominent in the panel and absent from the
  README.
- Trim the /usr/bin/od octal-dump aside; it was technical detail
  that did not earn its space at the README intro level.

* feat(web): add Codex CLI to the MCP server install panel

Codex is a first-class supported coding agent (listed alongside
Claude Code, Cursor, etc. in the README's PATH-detected agent
table) but the install panel was missing it.

Codex stores MCP server config at ~/.codex/config.toml (TOML, not
JSON) under an `[mcp_servers.<name>]` table, and the same file is
shared between the Codex CLI and the Codex IDE extension - so one
install covers both. Added a 7th client entry that emits the right
TOML snippet, expanded the snippet-lang union to include 'toml'
(behaves like 'json' for whitespace handling, just a different
syntax-highlight hint).

For our minimal payload (just command + args), JSON.stringify
happens to produce valid TOML values since TOML basic
strings use the same double-quote escape rules as JSON, and TOML
inline arrays match JSON array syntax. No new TOML serializer
needed.
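
As an illustration of that shortcut, a snippet builder might look like this. `codexTomlSnippet` is a hypothetical name and the paths in the usage note are made up; the sketch also assumes the server name is a valid TOML bare key (letters, digits, `_`, `-`).

```typescript
// Build an [mcp_servers.<name>] table, using JSON.stringify for the values.
// Assumption: `name` is a valid TOML bare key, so it needs no quoting.
function codexTomlSnippet(name: string, command: string, args: string[]): string {
  return [
    `[mcp_servers.${name}]`,
    `command = ${JSON.stringify(command)}`, // JSON string syntax doubles as a TOML basic string
    `args = ${JSON.stringify(args)}`, // a JSON array of strings doubles as a TOML array
  ].join("\n");
}
```

Calling `codexTomlSnippet("open-design", "/usr/bin/node", ["/opt/od/dist/cli.js", "mcp"])` yields a three-line table with a quoted `command` and a bracketed `args` array.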

README updated to list Codex among the supported clients.

Schema verified against https://developers.openai.com/codex/mcp.

* fix(daemon): accept any loopback origin in same-origin guard

The previous port-pinned check required the request's Origin to match
either the daemon's own port or OD_WEB_PORT. tools-dev does not pass
OD_WEB_PORT to the daemon process, so any browser POST to /api/active
proxied through the dev web (port 17573 etc.) was rejected with 403,
and get_active_context always returned {active: false}.

Relax to a loopback-prefix match: any http://127.0.0.1:*,
http://localhost:*, or http://[::1]:* origin passes regardless of
port. Cross-origin (https://evil.com) is still rejected. The
trade-off is that another local web app on a different loopback port
could now CSRF the daemon; same-origin checks are inherently a CSRF
defense, not a network ACL.
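
A sketch of what a loopback-prefix match could look like. `LOOPBACK_ORIGIN_RE` and `isLoopbackOrigin` are hypothetical names, not the guard in server.ts; making the port optional is also an assumption.

```typescript
// Accept http origins on 127.0.0.1, localhost, or [::1] with any port.
// Anchoring both ends keeps hosts like localhost.evil.com from matching.
const LOOPBACK_ORIGIN_RE = /^http:\/\/(?:127\.0\.0\.1|localhost|\[::1\])(?::\d+)?$/;

function isLoopbackOrigin(origin: string | undefined): boolean {
  return typeof origin === "string" && LOOPBACK_ORIGIN_RE.test(origin);
}
```

The trailing `$` matters as much as the leading `^`: without it, an origin that merely starts with a loopback host and port would slip through.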

* fix(web): make Claude Code MCP snippet a real copyable one-liner

claude mcp add-json open-design '<json>' takes only the inner
server-config object, not the full {"mcpServers": ...} wrapper, and
rejected the wrapped shape with "Invalid configuration: : Invalid
input". Pass only the inner config, and inline the JSON into the
command itself so the snippet is a real one-liner the user can copy
and paste, no template substitution.

* test(daemon): drop loopback-prefix assertions superseded by upstream origin policy

The two proxy-flow allow tests were added in ae13094 to cover our
relaxed isLocalSameOrigin. Main's port-pinned implementation (from
#365) now handles the dev-flow via the web sidecar proxy origin
rewrite (#a719f02), making the relaxation -- and these tests --
unnecessary.

Also replace the inline LOOPBACK_*_RE / isLocalSameOrigin replica in
mcp-install-info.test.ts with a direct import from server.ts so both
test files stay in sync with the production guard automatically.

* fix(daemon): bake daemon URL into MCP install-info args

The install panel snippet previously emitted `od mcp` with no daemon
URL, so the MCP server always fell back to the hardcoded default port
7456. When tools-dev starts the daemon on a non-default port the
snippet silently targets the wrong daemon.

Fix: include --daemon-url http://127.0.0.1:<port> as the third arg so
the generated snippet is always tied to the running daemon's actual
port. Update the matching mini-app and assertion in the install-info
test.

* fix(daemon): address MCP reviewer feedback

- extractRelativeRefs: replace blanket `includes('..')` rejection with
  proper POSIX-style path normalization. `../tokens.css` in a nested
  project layout now resolves to `tokens.css` instead of being
  silently dropped.

- getArtifact: add MAX_FILES=200 cap to BFS auto and include=all modes.
  Pass `remainingBytes` to fetchProjectFile so it can bail early when
  the server-advertised content-length would already exceed the budget.

- resolveProjectId: return {id, name, source} instead of a bare id.
  Callers echo `resolvedProject` in the response when the match was by
  slug or substring, letting the agent confirm which project was
  chosen without an extra round-trip.

- getFile: thread `resolved` through so substring matches surface
  the same `[od:resolved-project ...]` annotation.

- @ts-nocheck: add a comment explaining the Zod-vs-JSON-Schema SDK
  mismatch so future contributors don't remove it accidentally.

- get_active_context description: note the ~5-minute cache TTL.
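
For the first item, POSIX-style normalization of a relative reference could be sketched as below. `resolveProjectRef` is a hypothetical name and this is not the actual extractRelativeRefs code; it assumes forward-slash paths relative to the project root.

```typescript
// Resolve `ref` against the directory of `fromFile`, both project-relative.
// Returns null when the reference would climb above the project root.
function resolveProjectRef(fromFile: string, ref: string): string | null {
  const stack = fromFile.split("/").slice(0, -1); // directory of the referencing file
  for (const segment of ref.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (stack.length === 0) return null; // escape attempt
      stack.pop();
    } else {
      stack.push(segment);
    }
  }
  return stack.length > 0 ? stack.join("/") : null;
}
```

Unlike a blanket `includes('..')` check, this accepts `../tokens.css` from a nested file while still refusing a reference that leaves the project.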

* test(daemon): restore @ts-nocheck on mcp-install-info test

Dropped accidentally when replacing the import header. The directive
suppresses expected test-file noise (baseUrl pre-assignment and
res.json() unknown return type); keeping it avoids littering the test
body with `as any` casts for zero real safety benefit.

* docs(readme): expand MCP section with why-MCP, security model, and recovery note

- Soften "No zip export, no copy-paste" to "Replaces the
  export-then-attach loop" per reviewer feedback.
- Add "Why MCP?" paragraph explaining the structured-API benefit over
  zip exports.
- Add daemon-not-running recovery note (clear error, not a crash;
  start with pnpm tools-dev and retry).
- Add security model callout: read-only, loopback-only, Host/Origin
  guard rejects non-loopback requests.

* docs: complete security model and daemon recovery notes for MCP section

8.3: Expand README security model to include stdio child process context,
trust framing (treat like a VS Code extension), and OD_BIND_HOST opt-in
for LAN exposure.

8.4: Replace terse "daemon not running" note in README with a full
recovery sentence covering the start-agent-before-Open-Design case.
Add the same recovery note as a footer paragraph in IntegrationsSection
so users see it in the Settings panel without needing to read the README.

* fix(daemon): pass resolved through get_artifact so substring matches echo resolvedProject

* feat(daemon): add MCP unit tests and fill description/instructions gaps

- Export extractRelativeRefs, resolveProjectId, resolveProjectArg,
  withActiveEcho, fetchProjectFile, getArtifact for testing
- mcp-extract-refs.test.ts: 10 cases covering flat, nested, deep,
  escape attempts, external/data/anchor/mailto URLs, srcset
- mcp-get-artifact.test.ts: MAX_FILES=200 cap, maxBytes cap,
  per-file content-length pre-check via fetchProjectFile
- mcp-resolve-project.test.ts: uuid/exact/slug/substring source
  values, ambiguity error, withActiveEcho resolvedProject stamping
- get_artifact maxBytes description now mentions the 200-file cap
- Instructions block now mentions resolvedProject field and when it
  appears (slug or substring match)

* docs(daemon): document MCP active-context TTL and surface wake-up hint

Address PR #399 review item P2.5 (active-context TTL undocumented) plus
the related UX gap where the agent had no way to tell the user that
clicking around in Open Design refreshes the cache.

- PROJECT_ARG, get_artifact entry, get_file path: append TTL note to
  argument descriptions so agents see the ~5-minute fallback window.
- get_active_context: when /api/active reports active:false, return
  an explicit hint string explaining the recovery action ("ask the
  user to click into a project") instead of a bare {active:false}
  the agent can't act on.
- get_active_context tool description: mention the new hint payload.
- resolveProjectArg error: extend the missing-active-context message
  with the same TTL + recovery wording for tool calls that omit
  project= and have no fallback.

* feat(daemon): add offset/limit pagination to MCP get_file

Real-world MCP usage hit a wall on large files: get_file returned the
full body, the agent decided the result was too large for its context
budget, and recovered by spawning a sub-agent that ran Python with
manual brace-matching for several minutes. That defeats the value
proposition of skipping zip-export.

Mirror Claude Code's Read tool semantics: get_file now accepts
optional offset (0-indexed line) and limit (default 2000) args, slices
the file in mcp.ts after fetching from the daemon, and stamps an
[od:file-window offset=.. returnedLines=.. totalLines=..] marker on
sliced or truncated responses so the agent can page by re-calling
with the next offset.
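
The windowing described above can be sketched as follows. `sliceFile` is a hypothetical helper; the marker's field names follow the commit text, but the exact formatting and the choice to omit the marker when the whole file is returned are assumptions.

```typescript
// Return a window of lines plus a marker when the response is not the whole file.
function sliceFile(
  body: string,
  offset: number = 0,
  limit: number = 2000,
): { text: string; marker: string | undefined } {
  const lines = body.split("\n");
  const start = Math.min(Math.max(0, offset), lines.length); // clamp at EOF
  const page = lines.slice(start, start + Math.max(0, limit));
  const partial = start > 0 || page.length < lines.length;
  return {
    text: page.join("\n"),
    marker: partial
      ? `[od:file-window offset=${start} returnedLines=${page.length} totalLines=${lines.length}]`
      : undefined,
  };
}
```

An agent pages by calling again with `offset + returnedLines` until that sum reaches `totalLines`.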

- Tool definition: add offset/limit args, expand description.
- getFile helper: line-split, slice, marker, range clamp at EOF.
- Instructions block: mention pagination in the get_file bullet.
- Binary rejection unchanged.
- New tests in mcp-get-file.test.ts cover default behavior, limit
  truncation, mid-file offset, offset past EOF, and binary rejection.

* fix(daemon): set truncated: true when per-file content-length pre-check fires

When fetchProjectFile throws because a file's advertised content-length
exceeds the remaining byte budget, both the include=all loop and the auto
BFS loop silently skipped the file without setting truncated: true. The
bundle could then report truncated: false even though files were dropped.

Introduce BudgetExceededError as a sentinel so callers can distinguish a
budget rejection (truncated: true) from a genuine fetch failure (404,
network) that should just be skipped. Both getArtifact call sites now
check instanceof BudgetExceededError and set truncated accordingly.
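
A sketch of the sentinel pattern, with the daemon fetch replaced by an in-memory stand-in. `Candidate`, `fetchSize`, and `bundleFiles` are hypothetical; only the `BudgetExceededError` name and the instanceof check come from the description above.

```typescript
class BudgetExceededError extends Error {
  constructor(file: string, size: number, remaining: number) {
    super(`${file} (${size} bytes) exceeds the remaining budget of ${remaining} bytes`);
    this.name = "BudgetExceededError";
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical stand-in for a project file: size is null when the fetch would fail.
interface Candidate {
  name: string;
  size: number | null;
}

function fetchSize(file: Candidate, remaining: number): number {
  if (file.size === null) throw new Error(`404: ${file.name}`);
  if (file.size > remaining) throw new BudgetExceededError(file.name, file.size, remaining);
  return file.size;
}

function bundleFiles(files: Candidate[], maxBytes: number): { included: string[]; truncated: boolean } {
  const included: string[] = [];
  let total = 0;
  let truncated = false;
  for (const file of files) {
    if (total >= maxBytes) {
      truncated = true;
      break;
    }
    try {
      total += fetchSize(file, maxBytes - total);
      included.push(file.name);
    } catch (err) {
      // A budget rejection means content was dropped, so report it.
      if (err instanceof BudgetExceededError) truncated = true;
      // Any other failure (404, network) is skipped without marking truncation.
    }
  }
  return { included, truncated };
}
```

Using a dedicated error class lets the loop tell "dropped for size" apart from "could not be fetched" without inspecting message strings.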

Adds a regression test: 5 files of 250 bytes with explicit content-length,
maxBytes=400. Only file 0 fits; files 1-4 each exceed the remaining 150
bytes. totalTextBytes never reaches maxBytes, so only the new path sets
truncated=true. Previously the bundle reported truncated: false.
2026-05-04 22:34:17 +08:00
Bryan A
d637297313
feat(preview): live-reload iframes when project files change on disk (#409)
* feat(preview): live-reload iframes when project files change on disk

Add a chokidar-backed file watcher per active project on the daemon, surface
changes via an SSE endpoint at /api/projects/:id/events, and consume them in
the web app to bump the file list. The new mtime then propagates to the
FileViewer iframe through PR #384's ?v=${mtime} cache-bust, reloading the
preview automatically — no manual refresh click.

Daemon:
- New apps/daemon/src/project-watchers.ts: refcounted per-project watcher
  registry. First subscribe lazy-creates a chokidar watcher; last unsubscribe
  closes it. Ignores .git, node_modules, .od, debug, .DS_Store. Returns a
  ready promise so callers can await initial scan.
- New endpoint GET /api/projects/:id/events using the existing
  createSseResponse helper. Sends one ready event after chokidar binds, then
  one file-changed event per add/change/unlink.
- Adds chokidar ^5.0.0 dependency.

Web:
- New apps/web/src/providers/project-events.ts exposing
  createProjectEventsConnection (pure, testable) and useProjectFileEvents
  hook. EventSource with exponential backoff (1s -> 30s cap), reset on a
  successful ready event.
- ProjectView.tsx subscribes when daemonLive && project.id, and on each
  event bumps the existing filesRefresh signal — no FileViewer changes
  needed because PR #384 already URL-loads with mtime cache-bust.
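
The reconnect schedule could be sketched like this. `ReconnectBackoff` is a hypothetical class, and the doubling factor is an assumption; the commit only states exponential growth from 1s to a 30s cap, with a reset on a successful ready event.

```typescript
// Tracks the delay before the next reconnect attempt.
class ReconnectBackoff {
  private attempt = 0;
  private readonly baseMs: number;
  private readonly capMs: number;

  constructor(baseMs: number = 1000, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
  }

  // Delay doubles on each consecutive failure until it hits the cap.
  nextDelay(): number {
    const delay = Math.min(this.capMs, this.baseMs * Math.pow(2, this.attempt));
    this.attempt += 1;
    return delay;
  }

  // Call when the stream delivers its ready event.
  reset(): void {
    this.attempt = 0;
  }
}
```

Resetting only on ready, instead of on the socket opening, avoids a tight retry loop against an endpoint that accepts connections but never finishes initializing.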

Tests:
- 6 new daemon unit + integration tests (refcounting, real chokidar
  add/change/unlink, ignore patterns).
- 8 new web hook unit tests (URL encoding, payload parsing, malformed
  payload tolerance, exponential backoff, backoff reset on ready,
  close cancels reconnects, no-op when EventSource missing).

Closes #370

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(preview): test ignore patterns relative to watch root

The ignore predicate matched against absolute paths, so segments in the
watch root's ancestors (e.g. the daemon's own .od/ runtime dir, which
contains every project) silenced every event. In production this meant
zero file-changed events ever fired — every file inside a project sat
under .od/projects/<id>/, and .od matched the ignore.

Tests passed because mkdtemp puts test roots in /tmp/od-watchers-XXX/,
which has no .od ancestor.

Fix: compute the path relative to the watch root, then test segments.
Add a regression test that reproduces the production layout
(.od/projects/<id>/...) and asserts events still fire.
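
The fix can be illustrated with a predicate that tests only the segments below the watch root. `isIgnored` is a hypothetical name, the ignored names are taken from the list in the first commit of this PR, and the sketch assumes POSIX separators and treats paths outside the root as not ignored.

```typescript
const IGNORED_SEGMENTS = new Set<string>([".git", "node_modules", ".od", "debug", ".DS_Store"]);

// Test ignore patterns against the path relative to the watch root, so
// ancestors of the root (such as the daemon's own .od directory) never count.
function isIgnored(watchRoot: string, absPath: string): boolean {
  const root = watchRoot.replace(/\/+$/, "");
  if (absPath === root) return false; // the root itself is always watched
  if (!absPath.startsWith(root + "/")) return false; // outside the root: nothing to test
  const relative = absPath.slice(root.length + 1);
  return relative.split("/").some((segment) => IGNORED_SEGMENTS.has(segment));
}
```

With a root of `.od/projects/<id>`, a file directly inside the project is no longer silenced by the `.od` segment in its ancestry, while a nested `node_modules` still is.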

Also folds in a small consolidation of the SSE route handler from the
prior commit on this branch (single route, surfaces sub.ready before
emitting `ready`, propagates err.message in the error path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(project-view): only auto-open files that exist in the project

Agent Write/Edit tool results unconditionally called requestOpenFile on the
basename of the edited path, which created permanent placeholder tabs
("Open a file from Design Files.") whenever the agent edited a file outside
the project's working directory (e.g. an upstream repo source file).

Add decideAutoOpenAfterWrite() — a pure helper that gates the auto-open on
the file actually appearing in the refreshed project file list. Same
nextFiles-from-then() pattern already used at ProjectView.tsx:968.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(preview): chokidar resilience + auto-open path-suffix matching

Address PR #409 review feedback in one bundle:

- chokidar error listener (codex P1): FSWatcher is an EventEmitter;
  without an error handler, transient FS faults (ENOSPC, EPERM, EMFILE)
  surfaced as unhandled exceptions and could crash the daemon. Watcher
  now logs in dev mode and continues; refcount cleanup unaffected.
- followSymlinks: false (mrcfps): keep the watcher's resource boundary
  aligned with the project boundary so a symlink inside the project
  cannot pull paths outside the project into the watch. A regression
  test against real chokidar is included.
- decideAutoOpenAfterWrite path-suffix matching (mrcfps): pass the
  agent's full file_path through (not just the basename); resolve via
  path-suffix match against project file paths, with single-unambiguous
  basename fallback only when filePath has no slash. Fixes the
  same-basename collision case where an external Write to App.jsx
  could open a project's prototype/App.jsx.
- Dev-mode logs (lefarcen P3): warn when the broadcast subscriber
  loop or the SSE payload parser swallows an error, so subscriber
  bugs and payload-shape regressions don't go silent during testing.
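
The path-suffix rule for auto-open can be sketched as below. This is a
simplified illustration under assumptions: `resolveAutoOpenTarget` is a
hypothetical name (not the signature of decideAutoOpenAfterWrite), project
files are assumed to be project-relative paths with forward slashes, and the
longest match is preferred when several suffixes match.

```typescript
// Returns the project-relative path to open, or null when nothing should open.
function resolveAutoOpenTarget(filePath: string, projectFiles: string[]): string | null {
  const normalized = filePath.replace(/\\/g, "/");

  if (!normalized.includes("/")) {
    // Bare basename: only accept a single, unambiguous match.
    const byName = projectFiles.filter(
      (file) => file === normalized || file.endsWith("/" + normalized),
    );
    return byName.length === 1 ? byName[0] ?? null : null;
  }

  // Full path: a project file matches when it is a whole-segment suffix of
  // the agent's path. Prefer the longest (most specific) suffix.
  const bySuffix = projectFiles
    .filter((file) => normalized === file || normalized.endsWith("/" + file))
    .sort((a, b) => b.length - a.length);
  return bySuffix[0] ?? null;
}
```

In this sketch an external write to `/repo/src/App.jsx` no longer opens a
project's `prototype/App.jsx`, because `prototype/App.jsx` is not a suffix of
the external path. A file at the project root with the same basename would
still match on suffix alone, so a fuller implementation would likely also
take the project directory into account.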

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 20:47:22 +08:00
lefarcen
016c08183f
release: Open Design 0.3.0 2026-05-03 23:07:28 +08:00
louie42
61cdc3fe4b
fix(daemon): upgrade better-sqlite3 for Node 24 Windows prebuilt support (#357)
Merged per maintainer approval in the PR resolution group chat.
2026-05-03 15:59:41 +08:00
lefarcen
62b01a6dbf
release: Open Design 0.2.0 (#297) 2026-05-02 22:28:59 +08:00
Sid
52677555f7
fix(daemon): include package.json in tarball so packaged app reports correct version (#260)
The packaged app's Settings → About page displays version `0.0.0` instead
of the actual release version because the daemon's `package.json` was
excluded from its `pnpm pack` tarball.

Root cause: `apps/daemon/package.json` declares `"files": ["dist"]`, so
`pnpm pack` only includes the `dist/` directory. At runtime,
`readCurrentAppVersionInfo()` in `apps/daemon/src/app-version.ts` resolves
`new URL('../package.json', import.meta.url)` from the compiled
`dist/app-version.js`, which points at the tarball's root `package.json`.
Because that file isn't packed, the read fails silently, falls through
the catch in `readPackageMetadata()`, and the version falls back to the
`APP_VERSION_FALLBACK = '0.0.0'` constant. `/api/version` and
`/api/health` then report `'0.0.0'` for every packaged install.

Fix: add `"package.json"` to the daemon's `files` array so it ships in
the tarball. The package already declares `"./package.json"` as an
exports entry, so consumers expect this file to be available.
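
The silent fallback described above can be reproduced with a small sketch.
`readVersionOrFallback` and its injected `readText` reader are hypothetical
(they stand in for the real file read in app-version.ts); only the fallback
constant comes from the description above.

```typescript
const APP_VERSION_FALLBACK = "0.0.0";

// `readText` stands in for reading package.json; it throws when the file is
// missing, which is what happens when the file is left out of the tarball.
function readVersionOrFallback(readText: () => string): string {
  try {
    const parsed: unknown = JSON.parse(readText());
    if (typeof parsed === "object" && parsed !== null) {
      const version = (parsed as { version?: unknown }).version;
      if (typeof version === "string" && version.length > 0) return version;
    }
  } catch {
    // Read or parse failure is swallowed, so the caller only sees the fallback.
  }
  return APP_VERSION_FALLBACK;
}
```

When the reader throws, as it would for a missing file, the function returns
the fallback without surfacing any error, which matches the symptom reported
for packaged installs.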

Closes #224
2026-05-02 14:52:23 +08:00
PerishFire
a40d817d28
Add mac packaged runtime and beta release flow (#170)
* feat(pack): add mac packaged runtime control plane

* feat(pack): harden mac packaged runtime lifecycle

Keep packaged state namespace-scoped, make daemon paths explicit through sidecar launch env, and add conservative desktop identity/logging fallbacks for local mac package validation.

* feat(pack): add mac beta release flow

* fix(pack): generate mac update feed fallback

* fix(pack): write portable beta checksums

* fix(pack): make beta artifacts portable

* fix(pack): clean up mac install visuals

* fix(pack): address packaged runtime review feedback
2026-04-30 20:25:49 +08:00
PerishFire
c6d11018a0
Refresh desktop integration control plane (#123)
* feat(dev): add desktop tools-dev control plane

* refactor(sidecar): split Open Design contracts

Move Open Design-specific sidecar protocol definitions into @open-design/contracts so sidecar and platform can remain descriptor-driven primitives.

* refactor(daemon): organize package sources

Keep daemon app code, tests, and sidecar entrypoints in separate package directories so each layer can be built and verified independently.

* chore(repo): streamline maintenance entrypoints

Centralize agent guidance by directory and reduce root command chains while preserving the existing build scope.

* docs: translate agent guidance to English

* fix(sidecar): tolerate stale IPC sockets

Remove stale Unix socket files only after confirming no listener is active, so tools-dev can restart after unclean shutdowns.
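
One way to confirm that no listener is active before removing the file is to
attempt a connection first. The sketch below assumes Node's `net` and `fs`
modules on a Unix-like system; `probeSocket` and `removeIfStale` are
hypothetical names, not the sidecar's actual functions.

```typescript
import { connect } from "node:net";
import { unlinkSync } from "node:fs";

type SocketState = "active" | "stale" | "missing" | "unknown";

// Tries to connect to the Unix socket path and classifies the outcome.
function probeSocket(socketPath: string, timeoutMs = 500): Promise<SocketState> {
  return new Promise((resolve) => {
    const socket = connect(socketPath);
    const finish = (state: SocketState) => {
      socket.destroy();
      resolve(state);
    };
    socket.once("connect", () => finish("active")); // something is listening
    socket.once("timeout", () => finish("unknown")); // be conservative
    socket.once("error", (err: Error) => {
      const code = (err as Error & { code?: string }).code;
      if (code === "ECONNREFUSED") finish("stale"); // file exists, no listener
      else if (code === "ENOENT") finish("missing"); // nothing to clean up
      else finish("unknown");
    });
    socket.setTimeout(timeoutMs);
  });
}

// Deletes the socket file only when the probe positively identifies it as stale.
async function removeIfStale(socketPath: string): Promise<boolean> {
  if ((await probeSocket(socketPath)) !== "stale") return false;
  unlinkSync(socketPath);
  return true;
}
```

Treating every outcome other than a refused connection as "do not delete"
keeps a live listener's socket from being removed by mistake.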
2026-04-30 14:23:53 +08:00
nettee
56d08b8c5f
Add shared contracts and migrate project code to TypeScript (#118) 2026-04-30 13:01:15 +08:00
PerishFire
cfebff9653
Align app directories and isolate e2e tests (#102)
* chore: align app directories

* test: consolidate external suites under e2e
2026-04-30 09:47:03 +08:00