Commit graph

35 commits

Author SHA1 Message Date
lefarcen
df8a0faff6
feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent

AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.

The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.

Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.

Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)

Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).

* fix(runtimes/amr): pin a concrete default model and bare openai ids

End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:

1. vela rejects `session/prompt` with `session/set_model must be called
   before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
   skips set_model whenever the picked model is the synthetic 'default'
   id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
   def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
   default option and `fallbackModels[0]`, which makes attachAcpSession
   always send a real `session/set_model` for AMR turns.

2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
   it forwards to opencode's openai provider. With OpenRouter-style ids
   like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
   `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
   The new fallback list ships the bare ids opencode's openai registry
   actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).

Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
  way real vela does, so a regression that silently goes back to
  model: 'default' would surface as a fatal error in tests instead of a
  hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
  no 'openai/' prefix in fallbackModels, and a negative case that
  asserts session/prompt fails when no model is set.

Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.

Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.

* fix(runtimes/amr): substitute concrete model when chat run sends 'default'

A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:

  json-rpc id 3: session/set_model must be called before session/prompt

The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.

This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:

  1. If the caller picked a real id, pass it through.
  2. Else, if `def.defaultModelEnvVar` is set and the daemon process
     env has a non-empty value for it, return that (operator escape
     hatch — see below).
  3. Else, if the def's `fallbackModels` does NOT contain a 'default'
     id, return `fallbackModels[0].id`.
  4. Else, return the original value (the historic shape — defs that
     list 'default' themselves are untouched).

AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Worth noting the env var must live in the daemon's `process.env`
(Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver) — the new field's docblock spells this out.

Coverage:
- `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all
  four resolver branches plus the env-override happy path / fallback /
  ignore-when-user-picked-a-real-id case.
- `pnpm --filter @open-design/daemon typecheck` clean.

* chore(runtimes/amr): move AMR to the top of the base agent list

So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.

* feat(amr): Sign-in / Sign-out button on the AMR Settings card

The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:

  - GET  /api/integrations/vela/status   reads ~/.vela/config.json
    for the active profile and returns { loggedIn, profile, user }
    (without leaking the runtime/control keys themselves).
  - POST /api/integrations/vela/login    spawns `vela login` once
    (409 if one is already in flight). The vela CLI opens the user's
    browser to the device-authorization page itself — Open Design
    only needs to kick the subprocess off.
  - POST /api/integrations/vela/logout   removes ~/.vela/config.json
    so the next status read returns logged-out.

`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(nested interactive elements). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.

i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.

Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
  against a tmp HOME — including the negative case where a profile
  has user info but no runtimeKey (still logged-out), and the
  secret-leak guard ("rt-secret-*" must not appear in the projection
  payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
  states (logged-out, logging-in, logged-in, logging-out) plus the
  click-propagation invariant the divergent card was built to keep.

`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.

Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.

* fix(connection-test): substitute model for AMR before attachAcpSession

The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, which made the Test Connection button on the AMR Settings
card deadlock with the same `session/set_model must be called before
session/prompt` error the chat-run path already handles — surfaced as a
permanent "Testing connection…" spinner in the UI.

Reuse the same helper here so Test Connection mirrors chat-run behavior.

* test(amr): three-layer end-to-end coverage for the AMR login + turn flow

The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:

1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
   Spins up the real daemon Express app via `startServer({port:0,...})`,
   persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
   and exercises every /api/integrations/vela/* endpoint against the
   extended fake-vela stub:
     - status reads ~/.vela/config.json under various states
     - login spawns the fake, waits for config.json to appear, returns
       pid + startedAt + profile
     - 409 already-running guard with the stub's delay knob
     - logout removes the file (idempotent)
     - secrets (runtimeKey / controlKey) never leak in the projection
     - login → status round-trip flips loggedIn=false → true

2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
   Boots a namespaced daemon + web pair through `createSmokeSuite`,
   inlines a self-contained fake `vela` binary that handles BOTH
   `vela login` (writes ~/.vela/config.json) and
   `vela agent run --runtime opencode` (ACP stdio with the
   `session/set_model must precede session/prompt` gate the real binary
   enforces), then drives a complete /api/runs lifecycle for
   `agentId: 'amr', model: 'default'` and asserts the assistant message
   captures the fake's streamed text. This is the test that would have
   surfaced today's plugin-default-model regression (the `set_model
   before prompt` error) at PR time instead of demo time.

3. e2e/ui/amr-login-pill.test.ts (Playwright)
   Mocks /api/agents + /api/integrations/vela/{status,login,logout}
   to drive the Settings AMR card through the full Sign in → Signed in
   → Sign out cycle. Pins the AmrLoginPill polling contract and the
   aria-label semantics (the pill's accessible name is "Sign out" once
   logged in, regardless of which label the hover-state text shows).

fake-vela.mjs extensions:
   - Handles `vela login` argv by writing
     ~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
     mirrors real vela's on-disk side-effect without the device-auth
     loop.
   - FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
     in-flight state of the spawn lifecycle.
   - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
     user fields end-to-end.

Validated:
   - `pnpm guard` + `pnpm typecheck` (all workspace projects)
   - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
     including the new 8-test integration suite.
   - `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
   - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
     1 / 1 passing (6.7s).

* feat(amr): package native cli and refine login ui

* feat(amr): wire vela cli beta packaging

* docs(amr): document vela ci packaging review

* docs(amr): refine vela ci integration review

* fix(ci): refresh nix pnpm dependency hashes

* fix(pack): clean up Vela CLI packaging

* fix(pack): bundle Vela CLI support files

* fix(amr): recover login attempts from stale auth state

* test: expand AMR and automations coverage

* fix(amr): address review follow-ups

* test(web): align tasks fixtures with contracts

* fix(daemon): type wildcard route params

* fix(ci): refresh PR merge validation

* fix(amr): clear env credentials on logout

* feat(settings): inline local CLI model configuration

* fix(amr): recognize daemon env credentials

* [codex] Fix Vela companion packaging (#2979)

* Fix Vela companion packaging

* Update Nix pnpm dependency hashes

* [codex] Surface AMR account failures (#2980)

* fix: surface AMR account failures

* fix: cover AMR recovery error guidance

* chore: bump beta base version to 0.8.1 (#2990)

* Fix AMR profile and packaged runtime review issues

* Detect packaged AMR OpenCode companion tree

* feat(web): polish AMR frontend flows

* Polish AMR onboarding card

* fix: read AMR login state from dot-amr config (#3048)

* test: tighten AMR credential and packaging coverage

* test: restore AMR executable test env helper

* [codex] Fix packaged mac Dock identity and AMR label (#3076)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR live models and dot-amr login state (#3073)

* fix: read AMR login state from dot-amr config

* fix: load live AMR models before runs

* fix: point AMR onboarding link to production wallet

* fix: address AMR model review feedback

* fix: persist live AMR model fallback

* [codex] Fix AMR link catalog model ids (#3088)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR link catalog model ids

* Fix AMR model normalization typecheck

* Use live AMR model for default runs

* fix: polish AMR runtime settings UI

* Accelerate AMR startup defaults (#3092)

* Surface AMR insufficient balance wallet URL (#3099)

* fix(web): polish onboarding controls (#3112)

* fix(web): show CLI scan loading state

* Avoid duplicate AMR wallet recharge links (#3117)

* Avoid duplicate AMR wallet recharge links

* Use Vela CLI 0.0.3 test package

* chore(nix): refresh pnpm deps hash

* Fix AMR wallet guidance display

---------

Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>

* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)

* chore(nix): refresh pnpm deps hash

* chore(pack): pin Vela CLI 0.0.3

* chore(nix): refresh pnpm deps hash

* fix(web): suppress AMR exit 130 fallback (#3136)

* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)

* feat(web): nudge users to hosted AMR on model/auth/quota failures

When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.

Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.

* feat(web): rework failed-run AMR guidance into per-case error UI

Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:

- The error card is now neutral gray (was red) and always carries a
  retry button; it is driven by the persisted per-message error event so
  it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
  card under the error card offers "switch to AMR & retry" — switches the
  run to AMR, opens Settings on the AMR card, and auto-retries once the
  account signs in (ProjectView polls vela login status, independent of
  the Settings pill lifecycle, with success / 5-min-timeout / unmount
  exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
  wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
  pulses, and an authorize-button coachmark (a fake hand cursor that
  rises in and dismisses on hover) points at the sign-in control when not
  yet authorized.

analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.

* fix(daemon): require status context for numeric service-failure codes

Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.

* fix(web): keep error pill for failed runs ChatPane's card doesn't cover

Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.

* fix(daemon): don't treat a process exit code as an HTTP status

Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.

* chore(web): translate AMR card / error keys for 16 remaining locales

PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.

* fix(amr): address review feedback on #2355

Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.

- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
  verifies the inner `opencode` executable exists + is runnable, not
  just the directory. This closes the false-positive availability path
  that let `detectAgents()` surface AMR as available even when the
  packaged companion was empty / partially copied (mrcfps, 4 threads).

- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
  the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
  stale `opencode` on the user's PATH, so packaged AMR builds can't be
  hijacked by a global installation.

- web/EntryShell.tsx: when the Local CLI scan returns an available
  agent and the previously-selected agent is AMR, switch the selection
  to the first available local agent so the runtime and persisted
  agent agree before Continue.

- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
  BEFORE rejecting on an empty live-model catalog — a signed-out user
  was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
  the correct `AMR_AUTH_REQUIRED` (sign-in affordance).

- server.ts (default model fallback): if the user asked for the AMR
  agent default and the cached id is no longer in the FRESH catalog,
  fall back to `liveModels[0]` from the probe instead of rejecting the
  run as `AMR_MODEL_UNAVAILABLE`.

- integrations/vela.ts: route `vela login` through
  `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
  shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
  verbatim args (matches `execAgentFile` / chat-run spawning).

- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
  the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
  to the container-side path. The host path was being passed in as-is
  even though the default container only mounts /project, /tools-pack
  and cache/home — `copyOptionalVelaCliBinary` saw a missing path.

Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
  UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
  for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
  prematurely blocked — `@powerformer/vela-cli` only publishes the
  `darwin-arm64` platform binary today; adding the flag elsewhere
  would fail the builds. Revisit once win/x64/linux binaries ship.

* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)

The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
`const` is TDZ-aware, that branch would have thrown `ReferenceError:
Cannot access 'sendAmrAccountFailure' before initialization` for the
exact users it tries to help — defeating the original intent.

Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.

Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch
uses packaged built-in Vela for AMR` was creating the
`<resourceRoot>/bin/libexec/opencode/` companion *directory* only, but
this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree`
also requires the inner `opencode` executable. Add it to that fixture
to match the new contract; the test was a sibling of the executables /
env-and-detection fixtures already updated in 13fc4f4.

Addresses #2355 review (mrcfps, 2026-05-28).

* feat(web): add hover cancel for AMR login (#3158)

* feat(web): add hover cancel for AMR login

* fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel

Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR
flagged the same race in the new local-cancel path: `handleCancelLogin`
dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately
after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener
unconditionally re-enters `refresh()` and then restarts polling
whenever `/api/integrations/vela/status` still reports
`loginInFlight: true`.

That is a real race because the daemon's `cancelVelaLogin()` only sends
SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` =
2000 ms) and keeps the child in `activeLoginProcs` until it actually
exits — so the first `/status` read after a successful cancel can
legally still come back as in-flight. Under that window the pill flips
back to 'Signing in…' and can later surface the timeout/error path even
though the user already canceled, defeating the behavior promised in
the PR description.

Fix the listener instead of every dispatch site: in the
`login-canceled` branch, after the local reset (stopPolling +
setPending(null) + clear refs), optimistically mark every subscribed
pill instance as not-in-flight (`setStatus((c) => c ? { ...c,
loginInFlight: false } : c)`) and `return` — skip the
refresh-and-reconcile branch below entirely. The next explicit refresh
(component mount, user interaction, or a `status-changed` event) will
pick up the daemon's confirmed state once the child has actually
exited.

Add a focused regression test that holds `/api/integrations/vela/status`
at `loginInFlight: true` even after a successful `/login/cancel`,
asserting that the pill stays at the Canceled → Authorize sequence and
never bounces back to 'Signing in…'. This test fails on the pre-fix
listener and passes on the new behavior; existing
'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser
completion to Signed in after local cancel' tests continue to pass.

Addresses review feedback on #3158 (chatgpt-codex-connector, nettee).

---------

Co-authored-by: lefarcen <935902669@qq.com>

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
Co-authored-by: Amy <1184569493@qq.com>
Co-authored-by: Mason <jinmeihong0201@gmail.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-28 05:09:55 +00:00
lefarcen
c14baf07d3 Merge origin/main into release/v0.8.0
PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits
on top of 58 release-side commits accumulated during the 0.8.0 cycle.

Resolution summary:

Take main (theirs) where main carried deliberate forward progress:
- apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration:
  hardcoded English aria-labels/titles replaced with t() calls keyed
  on pluginCard.* (all 8 keys verified present in en.ts).
- apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion
  feature: sortedRoutines (newest-first), sourceIngestionTemplates,
  patchSourceForm, submitSourceIngestion. activeCount/pausedCount
  semantics preserved (now keyed on sortedRoutines, count unchanged).
- e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts
  imports needed by main-side test helpers.
- e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal
  helper block added by main.

Keep both sides where each added a different field to the same object
literal:
- apps/web/src/components/ProjectView.tsx (locale + analyticsHints
  spread).
- apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints).

Take release (ours) where release carried deliberate work that ships
0.8.0:
- CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's
  Unreleased section was the same body of work, now finalized.
- apps/landing-page/public/{apple-touch-icon,favicon}.png +
  apps/web/public/app-icon.svg — release-side visual refresh assets
  consistent with 0.8.0 stable ship.
- tools/pack/src/linux.ts — packageVersion const required by line 466;
  taking main's empty line would build-error.
- e2e/ui/project-management-flows.test.ts +
  e2e/ui/settings-api-protocol.test.ts +
  e2e/ui/settings-memory-routines.test.ts — release-side release-smoke
  hardening (shangxinyu1 + PerishFire) takes precedence on overlap.

Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.
2026-05-23 12:17:18 +08:00
Marc Chan
a3872b97a9
fix(tools-dev): preserve web origin trust on web start (#2715)
* fix(tools-dev): preserve web origin trust on web start

Restart daemon/web when the trusted web port is missing, and reuse the active web port during repeated starts so run web and start web keep app-config origin checks aligned.

Generated-By: looper 0.0.0-dev (runner=worker, agent=opencode)

* fix(plugins): refresh official registry bundled count

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): preserve daemon/web reserved ports

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): preserve daemon reuse on web start

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): preserve running daemon port on web reuse

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): reserve explicit web port before daemon allocation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* test(web): stabilize media provider reload flash timing

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(web): restore merged reattach workspace coverage

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): reserve allocated daemon port

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* test(e2e): wait for artifact manifest persistence

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-23 00:25:43 +08:00
lefarcen
50a4dc8a62 Merge origin/main into release/v0.8.0 2026-05-21 13:17:52 +08:00
Patrick A
85276df284
chore(deps): patch security override and patch bumps (#2306)
- Add pnpm override: protobufjs 8.4.0 (CVE-2026-45740, GHSA-jggg-4jg4-v7c6)
- Bump postcss 8.5.14 -> 8.5.15 in apps/web (and root override)
- Bump tsx 4.22.2 -> 4.22.3 across all workspace packages

Co-authored-by: Patrick A <259201958+eefynet@users.noreply.github.com>
2026-05-21 11:51:54 +08:00
lefarcen
722ddfa235 Merge origin/main into release/v0.8.0
Conflicts resolved by taking origin/main on both files. Root cause:
main's PR #2460 (fix(landing): align logo.webp with brand icon) changed
HomeHero.tsx's .home-hero__brand-mark to render <img src=/app-icon.svg>
instead of an inlined <HeroBrandIcon /> SVG, and bundled the matching
CSS (26px round badge with bg-panel + border + padding 2px) plus a
gap/font-size tune. The release-side visual-refresh CSS still targeted
the SVG layout (38px square, transparent, inset SVG selector). Keeping
release's CSS would leave main's <img> unstyled.

- apps/web/src/styles/home/home-hero.css  three blocks, all taken from
  main: .home-hero__brand gap 8px, .home-hero__brand-mark redesigned for
  <img> child, .home-hero__brand-name font-size 16px.
- apps/web/src/index.css  two blocks, both taken from main: workspace
  tab close column 22px and .workspace-tab__close 18x18 (paired
  tune-down of tab UI spacing).
2026-05-20 22:28:38 +08:00
Eli-tangerine
8193981511
Keep PR 2400 changes without folder pickers (#2462)
* feat(daemon): add project working directory management and editor hand-off functionality

- Introduced new flags for project commands to manage working directories, including `--working-dir` and `--dir`.
- Implemented API routes for listing available editors and opening projects in selected editors.
- Added a hand-off button in the ChatPane header to facilitate opening project folders in local applications.
- Enhanced the HomeHero component to include working directory and design system settings, improving user experience in project creation.
- Created HomeHeroSettingsChips component for inline management of working directory and design system selection.

* feat(chat): implement voice transcription proxy and enhance UI components

- Added a new API route for voice transcription using OpenAI's `/audio/transcriptions` endpoint, allowing users to send audio blobs directly for transcription.
- Integrated multer for handling audio file uploads in memory, ensuring efficient processing without disk storage.
- Updated the HomeHero component to include example prompt suggestions for plugins, enhancing user interaction.
- Introduced the EditorIcon component to visually represent different editors in the hand-off menu, improving the user experience.
- Refined the HandoffButton component to utilize the new EditorIcon, providing a more cohesive interface for selecting editors.
- Enhanced CSS styles for various components to improve layout and responsiveness, including adjustments to tab and button sizes for better usability.

* style(workspace-shell): enhance layout and overflow handling

- Updated CSS for .workspace-shell to ensure full viewport width and height, with proper overflow management.
- Adjusted grid layout to prevent content overflow and maintain responsiveness.
- Modified styles for .workspace-tabs-chrome to improve width handling and prevent overflow issues.

* refactor(chat): remove voice transcription proxy and related components

- Deleted the voice transcription proxy implementation, including the associated API route and multer configuration.
- Removed the MicButton component from the ChatComposer and HomeHero components to streamline the UI.
- Updated HomeHero to include example suggestions without the voice input functionality.
- Adjusted CSS styles for various components to maintain layout consistency after the removal of the MicButton.

* feat(daemon): implement minting of HMAC tokens for working directory management

- Added a new function `mintImportTokenFromCurrentSecret` to generate HMAC tokens bound to a specified base directory, enhancing security for working directory operations.
- Updated the `desktop-auth.ts` file to include the new token minting functionality, which returns structured errors when the desktop auth secret is cleared.
- Introduced new IPC message types for minting import tokens in the sidecar protocol, allowing seamless integration with the daemon's working directory management.
- Enhanced the `WorkingDirPill` component to utilize the new token minting flow for secure directory selection in desktop builds.
- Updated CSS styles for the HomeHero component to accommodate new example suggestion features and maintain layout consistency.

* fix(HomeView): import HOME_HERO_CHIPS constant for improved chip management

- Updated the HomeView component to import the HOME_HERO_CHIPS constant from the chips module, enhancing the management of hero chips within the component.

* feat(daemon): implement mintImportTokenViaSidecar for secure working directory management

- Introduced the `mintImportTokenViaSidecar` function to facilitate the minting of HMAC tokens for desktop-import operations via the daemon's sidecar IPC. This allows CLI commands to bypass authentication when the desktop-auth gate is active.
- Updated the CLI to utilize the new token minting function when setting the working directory, ensuring secure access to trust-gated API endpoints.
- Enhanced the sidecar server to handle minting requests and return structured error messages for improved user feedback.
- Added tests to validate the new token minting functionality and its integration with the working directory management process.
- Refactored related components to support the new token flow, improving overall security and user experience.

* feat(HomeHero): enhance UI components and styles for improved user experience

- Updated HomeHero component to replace active dot indicators with Plug icons for better visual representation of active plugins.
- Adjusted CSS styles for various elements, including padding and dimensions, to enhance layout consistency and responsiveness.
- Introduced new styles for active type icons and improved hover effects for buttons.
- Updated HomeHeroSettingsChips to change button titles and icons for clarity.
- Added tests to ensure proper rendering and functionality of updated components.

* feat(ProjectDesignSystemPicker): enhance design system selection with preview functionality

- Updated the ProjectDesignSystemPicker component to include a preview feature for design systems, allowing users to see a preview of the selected design system.
- Implemented hover functionality to update the preview based on the hovered design system.
- Added fullscreen preview capability for a more immersive experience.
- Enhanced CSS styles for the design system picker to improve layout and responsiveness.
- Introduced tests to validate the new preview functionality and ensure proper interaction within the component.

* feat: refactor project metadata handling and enhance design system picker

- Updated the default scenario plugin ID retrieval to use project metadata, improving the logic for determining the appropriate plugin based on project intent.
- Enhanced the ProjectDesignSystemPicker and related components to support localized design system summaries and categories, improving user experience.
- Introduced new translations for working directory and design system picker components, ensuring better accessibility and usability across different locales.
- Added a new 'live-artifact' project type to the HomeHero chips, expanding the functionality for users creating refreshable artifacts.
- Updated tests to validate the new project metadata handling and design system picker functionalities.

* feat: enhance localization and styling for design system components

- Added French translations for working directory and design system picker components, improving accessibility for French-speaking users.
- Updated CSS styles for the pet task item to ensure consistent padding and layout.
- Introduced a new test suite for HomeHeroSettingsChips to validate localization and design system selection functionality.
- Enhanced ProjectDesignSystemPicker tests to ensure proper localization and interaction with design system categories.

* fix: update .gitignore to include all claude-sessions directories and remove specific session files

- Modified .gitignore to ensure all claude-sessions directories are ignored by using a wildcard pattern.
- Deleted two specific claude-sessions markdown files to clean up unnecessary session data.

* fix: repair home automation ci regressions

* fix: stabilize artifact consistency e2e

* Remove folder picker changes from PR 2400

---------

Co-authored-by: pftom <1043269994@qq.com>
Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-20 22:07:30 +08:00
lefarcen
aedbb9dbe4 release: Open Design 0.8.0
Bumps 14 workspace package.json files from 0.7.0 to 0.8.0:
- root, apps/{web,daemon,desktop,landing-page}
- packages/{contracts,host,platform,sidecar,sidecar-proto}
- tools/{dev,pack,pr}, e2e

apps/packaged was already at 0.8.0 from the preview lane.
Independently versioned packages keep their own tracks.

Adds CHANGELOG [0.8.0] - 2026-05-20 entry covering the
305 PRs merged since 0.7.0 by 75 contributors:

- Plugin engine rebuild + Plugin Registry surface
- Headless by default (desktop is thin wrapper around CLI)
- Critique Theater Phases 9 through 16
- 149 design systems with structured tokens.css
- Italian locale + CJK font fallback
- Leonardo.ai, ElevenLabs, SenseAudio providers
- Windows packaged auto-update
- Visual refresh + Quick-brief discovery overhaul
- PostHog v2 analytics
- Manual edit UX overhaul
2026-05-20 21:22:17 +08:00
Eli
18b947c25f
[codex] Land design system GitHub intake handoff (#2187)
* Add Claude-style design system workflow

* Merge design system workflow into main

* Restore design system workflow UI styles

* Fix design system setup scrolling

* Fix design system setup connector button

* Preserve connector auth link after popup block

* Simplify connected GitHub setup state

* Open generated design system workspace project

* Summarize design system auto prompt in chat

* Add bounded GitHub connector design intake

* Prefer path-scoped GitHub intake tools

* Restore branch GitHub design context intake

* Restore design system review workspace

* Restore design system manager tab

* Let design system workflow routes own details

* Open editable design systems as projects

* Restore design system workspace coverage

* Fix bounded GitHub connector intake

* Hide design system review while generating

* Suppress design system generation questions

* Constrain GitHub design intake to bounded command

* Tolerate oversized GitHub metadata during intake

* Rebuild daemon CLI when sources change

* Fallback when GitHub connector snapshots are rate limited

* Allow GitHub intake without Composio

* Use native GitHub auth for design intake

* Remove design system review group heading

* Improve design system extraction evidence

* Align design system scaffold with Claude output

* Add evidence inventory for design system intake

* Add local design system evidence intake

* Add design system package audit gate

* Allow auditing Claude Design reference packages

* Audit design system package content quality

* Migrate legacy design system artifacts

* Clean migrated design system artifacts

* Require modular design system UI kits

* Reject thin design system UI kits

* Prioritize core design evidence intake

* Require role-based design system UI kits

* Clean stale design system manifest references

* Require representative preserved design assets

* Warn on generic design system visuals

* Enforce design system quality warnings

* Audit connected design system UI kits

* Require mounted design system UI kits

* Require composed design system app shells

* Require runnable JSX design system kits

* Require browser globals for design system components

* Infer design system names from source URLs

* Require source examples in design system packages

* Bind preserved fonts in design system tokens

* Require skill frontmatter in design system packages

* Preserve build icons in design system packages

* Require real assets in brand previews

* Require substantive source examples

* Require product overview in design system README

* Require reusable UI kit README

* Require reusable design system skill docs

* Seed Claude-style UI kit entry contract

* Preserve runtime build assets in design packages

* Audit design system packages after generation

* Audit design system first-run output

* Audit source-backed preview cards

* Align design system UI kit scaffolds

* Materialize design evidence package artifacts

* Show project chat during design system setup

* Hand off design system setup to project chat

* Auto-repair design system audit failures

* Harden design system evidence preservation

* Tighten design system package guidance

* Add targeted design system repair guidance

* Bound design system audit auto repair

* Use connector statuses in design system setup

* Audit design system preview manifests

* Require README preview manifests for design systems

* Fix design system GitHub intake handoff

* Fix daemon prompt CI assertions
2026-05-19 14:30:17 +08:00
PerishFire
bd48c597b0
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions

* ci: enforce pinned dependency specs

* ci: fix pnpm executable invocation
2026-05-19 13:58:27 +08:00
PerishFire
4424f08be0
[codex] Add packaged desktop auto-update (#1375)
* Add packaged desktop auto-update

* Handle counted beta nightly update versions

* Refresh desktop auto-update branch for main

* Serialize desktop updater operations

* Refresh auto-update branch for packaged paths
2026-05-19 11:20:05 +08:00
Qiaochu Hu
d55f05fcfa
fix: remove dead ternary in WORKSPACE_ROOT resolution (#487)
* fix: remove dead ternary in WORKSPACE_ROOT resolution

Both tools/dev and tools/pack config files had:
  ENTRY_DIR_NAME === "dist" ? "../../.." : "../../.."
with identical branches. Since `src/` and `dist/` are siblings under
`tools/{dev,pack}/`, both resolve to the same path. The ternary and
ENTRY_DIR_NAME constant were dead code — simplify to "../..".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix workspace root depth

---------

Co-authored-by: Test User <test@example.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: PerishCode <perishcode@gmail.com>
2026-05-18 15:50:07 +08:00
Nagendhra Madishetti
38a5ab69e6
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)

Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.

Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
  sticky against further lifecycle events for the same run, except
  for parser_warning which can land late and is recorded in a side
  channel without changing phase.
- A new run_started for a different runId at any time discards the
  prior state and reboots, so the UI can launch consecutive runs
  without an explicit reset action.
- Events whose runId does not match the active run return the same
  state reference, so React's useReducer doesn't re-render
  subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
  so an out-of-order panelist_dim for round 1 arriving after a
  round 2 dim does not corrupt the round 2 bucket.

Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.

* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)

createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.

useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.

Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
  CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
  shape back to PanelEvent; malformed JSON is swallowed and does
  not stop the stream; exponential backoff schedule and ready-reset
  semantics are pinned with a setTimeout seam; close() cancels
  pending reconnects and shuts the live source; no-op fallback
  when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
  reducer driven by synthetic actions, no connection when disabled
  or projectId is null, clean close on unmount, projectId change
  reopens cleanly.

* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)

Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.

speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
  finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
  can watch the run unfold.
- paused parses the transcript but holds events back until the
  caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
  feature, currently treated as instant; replay timestamps are not
  yet persisted with each event so honest pacing requires a
  follow-up Phase 7+ task.

gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.

Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
  shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
  done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
  still land.
- .gz transcripts route through the gunzip seam.

Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.

* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)

Two P1 fixes from lefarcen's review on PR #1307:

SSE payload override

`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.

Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.

Replay pause/resume

`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).

Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
  reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
  the next scheduled timer, flips to instant, and confirms the
  remaining transcript drains exactly once (cursor was preserved)

Doc consistency

The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.

Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.

* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)

* feat(web): Theater PanelistLane component (Phase 8.1)

* feat(web): Theater ScoreTicker component (Phase 8.2)

* feat(web): Theater RoundDivider component (Phase 8.3)

* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)

* feat(web): Theater TheaterDegraded chip (Phase 8.5)

* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)

* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)

* feat(web): Theater TheaterStage top-level container (Phase 8.8)

* feat(web): Theater CSS using existing semantic tokens (no hex literals)

* feat(web): Theater public exports barrel

* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)

Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.

State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
   Host hooks dispatch it when their gating prop changes so a stale
   run from a prior project / transcript cannot bleed into the next
   context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
   connection effect, so a workspace switch from project A (which
   streamed a critique) to project B clears the reducer before the
   new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
   parse effect, so transcriptUrl swaps (including swap-to-null after
   a replay reached `shipped`) lift the reducer back to idle before
   the new fetch starts.

SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
   check after the cheap `isPanelEvent` predicate. A
   `critique.ship` frame missing `composite` / `round` / `status` /
   `artifactRef` is rejected before reaching the reducer, so
   TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
   Every variant's required fields are validated: run_started
   (protocolVersion, non-empty cast, maxRounds, threshold, scale),
   panelist_* (round, role, plus variant-specific shape), round_end
   (round, composite, mustFix, decision in {continue,ship}, reason),
   ship (round, composite, status, artifactRef.{projectId,artifactId},
   summary), degraded (reason, adapter), interrupted (bestRound,
   composite), failed (cause), parser_warning (kind, position).

Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
   view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
   the in-progress lane the instant the tag opens. Before this, a
   stream that emitted only `panelist_open` after `run_started` left
   `rounds = []` and the UI rendered no current round until a later
   `panelist_dim` arrived.

Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
   `var(--purple, var(--accent))`. `--purple` is actually defined
   across the design systems; `--magenta` is not, so Brand was
   silently falling through to `--accent` and looking identical to
   Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
   interrupted-collapse copy ("Interrupted at round N, best
   composite X.X"). Previously the interrupted branch reused
   `shippedSummary` and the UI read "Shipped at round..." for a run
   that specifically did not ship. Native value in en + zh-CN; other
   locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
   hardcoded `theater-degraded-heading`, so two chips rendered on
   the same page (chat history with multiple completed runs) keep
   their aria-labelledby references unambiguous.

Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales

* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)

* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)

* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)

Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.

Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
  /api/projects/:id/critique/:runId/interrupt alongside the
  optimistic local dispatch. Before this fix, clicking Interrupt
  only flipped the React state to interrupted while the daemon job
  kept running. The fetch is best-effort: a 404 (endpoint not wired
  yet, lands in Phase 15) is swallowed with a dev-mode console.warn
  so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
  and simulate the "daemon not ready yet" path. Two tests pin both:
  the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
  rejected fetch still flips the UI.

interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
  one we saw; when it changes, interruptPending is cleared. A user
  who interrupts run-1 and then triggers run-2 from the same mount
  now gets a fresh, enabled kill button instead of one stuck in
  "Interrupting…". Pinned by a new mount test.

Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
  input, textarea, select, or contenteditable element is ignored
  (and any ancestor of those via closest() is treated the same
  way). Body-level focus still fires the keybind so the Theater
  area's affordance keeps working. Four new tests cover textarea,
  input, contenteditable, and the body-focus positive case.

userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
  critiqueTheater.userFacingName key so the "Design Jury" label can
  be renamed without touching code. Phase 8 introduced
  critiqueTheater.title by mistake; renamed across types.ts, en.ts,
  zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
  TheaterStage.tsx. The locale alignment test stays green.

Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
  the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
  the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.

* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)

In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.

Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.

7/7 vitest cases green.

* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)

Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.

* feat(daemon): adapter conformance harness (Phase 10.3)

runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.

5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.

* test(e2e): Critique Theater Playwright suite (Phase 11)

Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.

Coverage:

  1. Happy path: a single mounted theater plays the full
     fixture (1 run_started, 5 panelists open / dim / must_fix /
     close, 1 round_end, 1 ship) and ends on the score badge.
  2. Interrupt mid-run: the panelist that is open at the time
     the interrupt button is clicked closes with an interrupted
     marker and the transcript freezes there.
  3. Visual regression at 375x720 mobile.
  4. Visual regression at 768x1024 tablet.
  5. Visual regression at 1280x800 desktop.
  6. A11y role tree: the theater region exposes a labelled
     landmark, each panelist lane is a group with an accessible
     name, the score is a status live region.

All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.

* test(web): reducer p99 bench at 10k iterations (Phase 13.1)

Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.

The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.

* test(web): critique surface coverage walker (Phase 13.2)

Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.

62 assertions, one per symbol per corpus.

* docs: Critique Theater user guide (Phase 14.1)

Seven sections aimed at end users (not contributors):

  1. What is Design Jury
  2. How it works (the five panelists, auto-converging rounds,
     the composite formula)
  3. Settings (the M1 toggle and what it does)
  4. Reading the score badge
  5. Replay surface
  6. Troubleshooting (degraded, interrupted, failed)
  7. FAQ

The composite formula is documented as
    designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.

* docs(daemon): critique module AGENTS map (Phase 14.2)

Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.

* docs(web): Theater module AGENTS map (Phase 14.3)

Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.

* feat(daemon): rollout flag resolver (Phase 15.1)

Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:

  1. Skill-level policy (required wins, opt-out wins inversely)
  2. Per-project override from the Settings toggle
  3. OD_CRITIQUE_ENABLED env override
  4. Rollout phase default
       M0 dark-launch      false
       M1 settings only    false (toggle is off until the user flips it)
       M2 per-skill        true if skill opted in
       M3 global default   true

OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.

10/10 vitest cases green covering every cell of the matrix.

* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)

React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:

  - the platform storage event (cross-tab)
  - a open-design:critique-theater-toggle CustomEvent (same-tab)

Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.

setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.

The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.

6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.

* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)

lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:

1. Returns false on a stored JSON value that parses to an array
   (non-object). Catches a regression where the guard treats
   anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
   typeof null === 'object' in JS, so the guard has to check null
   explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
   disabled storage / SecurityError). The hook must swallow and
   return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
   CustomEvent when localStorage.setItem throws (quota exceeded /
   disabled storage). The dispatch path is the in-session
   broadcast that keeps every mounted hook coherent even when
   persistence is unavailable; verified by mounting two probes
   and asserting both flip after the setter is called with a
   throwing setItem.

10/10 vitest cases green (6 existing + 4 new).

* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)

Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in affcdd27: the test
asserts the in-session UI flips when localStorage.setItem throws,
but the CustomEvent listener was ignoring the event's typed
detail and just calling readToggle(). Under a throwing setItem
the localStorage value is stale (or absent), so the listener
would see the OLD value and the test would fail (or worse, the
production claim 'in-session event keeps mounts coherent' was
hollow).

Fixed the hook, not the test: the listener now reads
event.detail.enabled when it is a boolean, falling back to
readToggle() only for malformed events or for cross-tab storage
events (which do not carry a typed payload). The setter already
dispatched the detail; the listener just was not consuming it.

Test changes:

  - The existing 'setItem throws' test now asserts the right
    behavior for the right reason. Updated the inline comment to
    say the listener reads from detail, not localStorage.
  - New test 'falls back to readToggle when the CustomEvent
    carries no usable detail' pins the fallback path: a
    malformed dispatcher (no detail, or detail.enabled not a
    boolean) degrades cleanly instead of throwing or being
    silently ignored.

11 / 11 vitest cases green (10 prior + 1 new fallback).

* feat(daemon): route critique spawn-path eligibility through the rollout resolver

The wireup edit Phase 10 and Phase 15 carved out: today server.ts gates
the critique pipeline on critiqueCfg.enabled, which is just the
OD_CRITIQUE_ENABLED env var. After this commit it gates on
isCritiqueEnabled(...) from the Phase 15 resolver, so the full
priority matrix is live:

  1. Per-skill od.critique.policy veto (opt-out / required)
  2. Per-project override (M1 Settings toggle, written through the
     existing Phase 6 settings endpoint)
  3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures)
  4. OD_CRITIQUE_ROLLOUT_PHASE default
       M0 dark-launch      false
       M1 settings only    false
       M2 per-skill        only when skillPolicy === 'opt-in'
       M3 global default   true

Default behaviour on a fresh install is unchanged: the resolver
returns false at M0 without an env override or a project override,
so prod traffic falls through to the legacy single-pass path
exactly the way it did before.

Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE,
envOverride from OD_CRITIQUE_ENABLED. skillPolicy and projectOverride
are passed as null for the v1 cutover; the daemon-side handler that
round-trips critiqueTheaterEnabled on the project settings row and
the od.critique.policy frontmatter resolver land as the next two
commits in this branch.

The three call sites that used critiqueCfg.enabled (the brand-thread
guard, the skill-thread guard, the top-line critiqueShouldRun
compound) now read from a single locally-scoped critiqueEnabledForRun
boolean, so the eligibility check is computed exactly once per spawn
and the prompt composer + orchestrator stay in lockstep the way
the existing comment already promised.

Tests still green: daemon vitest 22 / 22 across rollout +
conformance + adapter-degraded. Daemon typecheck clean.

* feat(web): mount CritiqueTheaterMount in ProjectView

The web counterpart of the daemon wireup. ProjectView now renders
<CritiqueTheaterMount projectId={project.id} enabled={...} /> as a
sibling of <AppChromeHeader> inside the top-level <div className="app">.

The mount is the drop-in from the Phase 9 stack: it owns the SSE
subscription, the kill-request handshake, and the phase-aware swap
from the live <TheaterStage> to the collapsed badge once a run
settles. The mount returns null until the daemon emits a
critique.run_started for the active project, so the visual surface
is byte-for-byte unchanged for users who have not opted in.

Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings
toggle from the existing open-design:config localStorage blob and
stays in sync with both the platform storage event (cross-tab) and
the same-tab open-design:critique-theater-toggle CustomEvent the
Phase 15 setter dispatches. The hook honors the event payload
directly so a private-mode browser that cannot persist the toggle
still updates the in-session UI correctly.

The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts)
remains the authority for whether a run is actually wired through
the critique pipeline. This hook only governs whether the web layer
renders the resulting SSE stream when the daemon emits one. The
two-layer gate is intentional: an integrator embedding the Theater
in a custom UI can flip the web visibility independent of the
daemon's routing decision, and a daemon-side env override flips
backend gating without touching the web's localStorage.

Tests still green: web Theater suite 181 / 181 across 16 files.
Web typecheck clean.

* feat(daemon): resolve od.critique.policy frontmatter at the spawn site

The next step in the wireup branch's ladder: replace the placeholder
`skillPolicy: null` with the actual value parsed from the active
skill's SKILL.md frontmatter.

Three small edits, one new field on a public type:

1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field
   carrying the parsed `od.critique.policy` token (required /
   opt-in / opt-out / null). The field is null when the skill has
   no opinion, which lets the lower-priority resolver tiers
   (projectOverride, envOverride, phase default) decide.

2. listSkills() populates the new field via a small
   `normalizeCritiquePolicy` helper that tolerates the YAML
   scalar's casing and trims whitespace. Unknown tokens collapse
   to null so a typo in SKILL.md cannot accidentally force the
   panel on or off; it just falls through. Derived example cards
   inherit the parent's policy.

3. server.ts captures `skill.critiquePolicy` into a hoisted
   `skillCritiquePolicy` variable inside the existing skill-load
   block, then threads it into the isCritiqueEnabled call as the
   skillPolicy input. The hoisting keeps the variable in scope at
   the resolver call site without restructuring the spawn handler.

After this commit, the priority matrix the rollout resolver was
designed for is live for its top tier. The previous commit wired
env + phase; this one wires skill. The projectOverride input
remains null pending the next commit that extends the Phase 6
settings endpoint.

Daemon vitest: 10 / 10 rollout cases pass against the new wiring.
Daemon typecheck: clean.

* feat(daemon): feed projectOverride into the rollout resolver from project metadata

Replaces the placeholder `projectOverride: null` in the spawn
handler with the actual value the Settings panel writes onto the
project's metadata blob: `critiqueTheaterEnabled?: boolean`.

The read is defensive at the boundary: the metadata object is
typed loosely (it round-trips through SQLite as a free-form JSON
blob), so the spawn handler narrows to `boolean` and falls
through to `null` for any other shape. A missing key, a malformed
value, or a project that has never visited Settings collapses to
`null`, which is exactly the resolver's "no opinion, fall
through to env / phase" signal.

The `critique` frontmatter slot also gets typed on the
SkillFrontmatter shape so the `od.critique.policy` chain the
previous commit introduced no longer needs a bracket-access
cast. Same pattern as the existing `craft`, `preview`, and
`design_system` nested-record slots.

After this commit, every tier of the rollout resolver's priority
matrix is wired:

  1. skillPolicy   (from SKILL.md od.critique.policy)
  2. projectOverride (from project metadata critiqueTheaterEnabled)
  3. envOverride   (from OD_CRITIQUE_ENABLED)
  4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE)

The write path for projectOverride still flows through the
existing project-update handler the Settings panel already uses
to persist project metadata; no new endpoint is needed. The
Settings UI button that calls setCritiqueTheaterEnabled and
posts the new field is the next commit on this branch.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix(daemon): forward critique events to project sinks + align composer gate (PR #1338)

Two codex review items addressed in one commit since they share the
same root cause (resolver-enabled run hits a transport / prompt
contract that was still env-gated):

P1 (transport mismatch). The daemon emits critique.* SSE frames
through critiqueBus -> design.runs.emit, which fans out on
/api/runs/:runId/events. The web CritiqueTheaterMount subscribes to
/api/projects/:projectId/events (it's project-scoped, not run-
scoped, because the mount lives at the project workspace and
follows the user across runs). Result: in production the mount
never sees a real frame and the e2e tests' stubbed routes hide the
mismatch.

Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the
existing runs.emit transport, AND the per-project event-sinks map.
The project-events route emits via sse.send(payload.type, payload),
so we pack the SSE channel name onto payload.type and let the sink
push the right channel. The web sseToPanelEvent overwrites type
from the channel name on the way back into a PanelEvent, so the
round-trip stays correct.

P2 (prompt gate misalignment). composeSystemPrompt reads
cfg.enabled to decide whether to append the panel addendum, but
critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run
the resolver enabled via phase / project / skill (env unset) would
have critiqueShouldRun = true while critiqueCfg.enabled remained
false, dropping the panel prompt while still routing through
runOrchestrator -> parser waits for tags that never arrive -> run
degrades.

Fixed by passing a derived config { ...critiqueCfg, enabled: true }
to the composer when critiqueShouldRun is true. The composer's own
gate now agrees with the resolver decision on every input the
spec defines.

Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases
still green against the new wiring.

* fix: address PerishCode P1 + P2 follow-ups on PR #1338

Two follow-up items PerishCode flagged on the activation PR.
Non-blocking but both are real:

1. Phase 11 e2e suite was wired into test:ui:extended but lands
   the user on '/' (home route) where ProjectView (and therefore
   CritiqueTheaterMount) is never rendered. With the suite as
   written, every assertion would time out the first time the
   lane runs in CI, contradicting the PR body's claim that the
   suite stays parked behind test.describe.fixme.

   The state diverged from my earlier Phase 11 work because the
   merge from main on commit 4ab719c6 brought in #1307's
   squash-merged version of the e2e file (the pre-fixme shape).

   Re-applied test.describe.fixme to the describe block plus
   removed ui/critique-theater.test.ts from the test:ui:extended
   script in e2e/package.json. Added a file-header docblock
   explaining what the follow-up commit needs to do: replace
   goto('/') with /projects/:id navigation similar to
   app-design-files.test.ts, split the SSE fixture into a live
   prefix and terminal suffix (Codex P2 on PR #1320), and commit
   the first PNG baselines.

2. bestRoundOf in CritiqueTheaterMount returned the LAST round
   with a numeric composite, not the round with the HIGHEST
   composite, while bestCompositeOf correctly returned the max.
   A run that closed round 1 at 8.5 and round 2 at 6.0 would
   dispatch interrupted { bestRound: 2, composite: 8.5 } on a
   user-clicked interrupt.

   Folded the two helpers into a single bestRoundAndComposite
   that walks state.rounds once and returns the matching pair so
   the two values cannot drift. The onInterrupt callback now
   destructures from one helper instead of two independent reads.
   Falls back to (state.activeRound, 0) when no round has closed
   with a composite yet.

Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases
still green against the new helper.

* fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338)

Three lefarcen P2s on the latest review pass, all real:

1. M1 project override was half-wired: the daemon read
   metadata.critiqueTheaterEnabled but the web setter only
   wrote localStorage. A user opt-in would render the Theater
   on the web (localStorage was set) while the daemon resolved
   projectOverride=null and skipped critique unless env / phase
   already permitted. Two halves talking past each other.

   Extended setCritiqueTheaterEnabled to accept an optional
   { projectId, fetchProjectSettings } options bag. When a
   projectId is supplied, the setter ALSO sends a
   PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled
   } } so the daemon's spawn-time resolver picks the same value up
   on the next generation. The existing project-routes endpoint
   already accepts arbitrary metadata patches, so no new endpoint
   is needed. The local write + the CustomEvent dispatch still
   fire before the PATCH, so a network failure does not unwind
   the in-session UI flip. Three new vitest cases pin the new
   path: PATCHes when projectId is provided, skips when it is
   not, swallows a rejected PATCH so the in-session UI still
   flips.

2. Rollout docs (docs/critique-theater.md section 3) claimed the
   Settings toggle persists into the daemon settings store, but
   the previous implementation only had a localStorage reader /
   writer plus a daemon read of project metadata, with no
   round-trip. Rewrote the section to lead with the four-tier
   resolver (skill policy / project override / env / phase),
   document that the setter now round-trips via the existing
   PATCH endpoint when given a projectId, and call out the
   Settings panel UI control as a deliberate follow-up.

3. Troubleshooting table pointed users at /api/metrics/critique
   (Phase 12, deferred) and 'od adapters clear-degraded <id>'
   (CLI wrapper that does not exist). Replaced the metrics
   reference with the local conformance harness command
   (pnpm --filter @open-design/daemon vitest run
   tests/critique-conformance.test.ts) that ships today, with a
   note that the Phase 12 dashboard surfaces this status as a
   series once that PR lands. Replaced the CLI command with the
   programmatic clearDegraded() helper that exists today and
   flagged the CLI wrapper as planned follow-up.

Web typecheck: clean. Toggle hook tests: 14 / 14 green (11
existing + 3 new for the round-trip path).

* test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338)

lefarcen P3 follow-up to the previous bestRoundAndComposite fix:
the existing CritiqueTheaterMount.test.tsx interrupt cases only
exercised a single-round state, so a future refactor back to two
independent helpers wouldn't be caught by the test suite even
though it'd reintroduce the round / composite drift bug.

Added a regression case that:

  1. Drives the reducer through two complete rounds with the
     full 5-role cast closing at distinct composites: round 1
     at 8.5, round 2 at 6.0 (the high-composite round is NOT the
     most recent one).
  2. Clicks Interrupt + waits for the daemon ack via the test
     seam fetcher returning 204.
  3. Asserts the collapsed badge displays "round 1" (the
     correct best-composite round), and queryByText for
     "round 2 ... 8.5" returns null (the buggy pairing
     would have produced that string).

The bestRoundAndComposite helper walks state.rounds in one pass
and returns the matching pair, so the round number and the
composite cannot drift apart. This test locks the fix in: a
refactor that splits the helpers back into independent walks
will be caught here.

8 / 8 vitest cases green on the file.

* fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338)

The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } }
as the entire PATCH body. The daemon's project-routes handler only
re-stamps three immutable fields (baseDir, importedFrom,
fromTrustedPicker) before calling updateProject(db, id, patch),
which then does a shallow { ...existing, ...patch } in apps/daemon/
src/db.ts. So patch.metadata replaces the row's metadata wholesale,
dropping kind, templateId, linkedDirs, and every other field the rest
of the app reads.

No in-tree caller passes projectId today (only vitest cases), so the
bug had not surfaced yet. But the surface is documented in
docs/critique-theater.md section 3 and the function's own JSDoc as
the M1 round-trip path, so it would have shipped as a latent footgun
for the next integrator: a Settings UI follow-up, or any third party
that wires the setter into a project-aware surface.

Fix: read-merge-write rather than a bare patch.

- GET /api/projects/:id to read the row's current metadata.
- Spread that metadata into the PATCH body and overlay
  critiqueTheaterEnabled: next on top, mirroring the partial-metadata
  pattern already used in ChatComposer.tsx for linkedDirs.
- PATCH the merged object.

Failure handling:
- GET fails: skip the PATCH entirely. We cannot construct a safe
  merged body without the current state, and a bare patch would
  wipe other metadata. The in-session CustomEvent fired earlier in
  the setter still keeps every mounted hook consistent; the next
  save retries the round-trip.
- PATCH fails: log in dev. The in-session UI is already correct via
  the CustomEvent.

Tests (TDD, red-first):

- 'GETs the project then PATCHes with merged metadata when a
  projectId is supplied': stubs a GET that returns
  { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] }
  and asserts the PATCH body equals the merge plus the toggle.
- 'PATCHes with just the toggle when the project has no prior
  metadata': stubs a GET that returns no metadata block.
- 'skips the PATCH (does not stomp metadata) when the prefetch GET
  fails': stubs a rejecting GET and asserts only the GET fires.
- 'swallows a rejected PATCH after a successful prefetch': stubs a
  successful GET and a rejecting PATCH; asserts the in-session UI
  still flips via the CustomEvent.

Doc updated on the setter's JSDoc to describe the new three-step
flow (localStorage, CustomEvent, read-merge-write PATCH) and the
two failure modes.

Verified:
- pnpm --filter @open-design/web typecheck clean.
- pnpm --filter @open-design/web test: 111 files / 1055 tests green
  (was 1052, +3 from the new merge-flow cases).

* fix(web): restore wait-for-daemon-ack pattern on Theater interrupt

Same regression as flagged on PR #1316 post-main-merge: the
optimistic local dispatch fired before the POST resolved, so a
daemon 404 / 409 still terminalized the UI and the real SSE
terminal event got ignored by the sticky interrupted phase.

Snapshot runId / bestRound / composite at click time, dispatch
interrupted only on res.ok, clear interruptPending on rejection or
non-2xx so the user can retry. Tests cover rejection + 404 leaving
the run on the live stage; the 204 path waits for the ack.

* feat(daemon): Critique Theater Phase 12 observability foundations

Lands the metrics registry, the structured logger, the /api/metrics
route, and the adapter-degraded bump that wires up the first data
point. The orchestrator-side bumps for runs / rounds / composite /
must-fix / interrupted / parser_errors / protocol_version land in a
follow-up commit on this branch (kept separate so the wiring diff
reads cleanly against the registry shape).

Surfaces added:

- apps/daemon/src/metrics/index.ts: 9 Prometheus series under the
  open_design_critique_* namespace with the histogram buckets the
  spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 /
  2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at
  0-10 integer steps).
- apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line
  per call on stdout, namespaced critique. Matches the JSON-per-line
  convention cli.ts already uses; no new logger framework.
- apps/daemon/src/server.ts: GET /api/metrics route. Honors
  OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs.
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now
  bumps degraded_total so the adapter-health dashboard panel
  reflects every TTL refresh and every fresh mark.

Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to
apps/daemon/package.json. Both are zero-config no-ops without an
exporter wired; daemon bundle size impact is ~150 KB uncompressed.
The @opentelemetry/api dep is in place ahead of the OTel-spans
follow-up commit; it adds no behavior on this commit.

Tests:
- tests/metrics/critique.test.ts (3 cases): registry shape +
  exposition text + reset-between-tests
- tests/logging/critique.test.ts (4 cases): event shape + ordering
  + newline framing + namespace stamping

Verification (Windows-local):
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites: 7 / 7 green
- Existing adapter-degraded + conformance + rollout suites:
  22 / 22 green; the bump is non-breaking

* feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator

Lights up the bump sites the Phase 12 foundations PR registered the
series for. Every panel event the parser surfaces now reaches the
matching Prometheus counter / histogram and the matching JSON log
line on stdout.

Switch-loop bumps + logs:

- run_started: log run_started, set protocol_version gauge to the
  observed protocol version (small-integer cardinality).
- panelist_open: record the first-open wall-clock per round so
  round_end can compute round_duration_ms; subsequent opens in the
  same round leave the start time untouched.
- panelist_must_fix: bump must_fix_total with the panelist role.
  The wire event does not yet carry a dim name, so the label is
  'unspecified' for now; a future parser revision can drop in the
  real dim without a metric rename.
- round_end: bump rounds_total, observe composite_score, observe
  round_duration_ms (current ms minus the tracked start), log
  round_closed with the composite / mustFix / decision triple.
- parser_warning (parser-yielded): bump parser_errors_total with
  the kind label, log parser_recover with kind + position.

Orchestrator-side parser warnings (composite_mismatch and
duplicate_ship from the daemon-authoritative scoring checks) go
through a new emitParserWarning helper so the bus emit, the
collectedEvents push, the metric bump, and the log line stay in
lockstep. Three inline emission sites collapse to one-line helper
calls.

After the try/catch, a single terminal-status switch bumps
runs_total{status, adapter, skill} once per run, with branch-
specific log + counter:

- shipped / below_threshold: log run_shipped
- interrupted: bump interrupted_total, log run_failed{cause: interrupted}
- timed_out: log run_failed{cause: timed_out}
- failed: log run_failed{cause: orchestrator_internal}
- degraded: log degraded{reason: orchestrator_classified}

OrchestratorParams gains optional skill: string for the label;
defaults to 'unknown' so spawn sites that have not yet threaded it
keep working without a metric shape change.

Tests:
- The new metrics + logging suites (7 / 7) verify registry shape
  and event framing; orchestrator-side metric integration is
  exercised through the existing critique-conformance and
  critique-adapter-degraded suites (22 / 22 still green).
- Logger test reassigns process.stdout.write directly instead of
  vi.spyOn so the Node overloaded write signature does not
  collide with MockInstance<unknown>.

* feat(observability): Grafana dashboard JSON for Critique Theater

Three default rows mapping to the metrics this branch wires up:

1. Fleet quality: composite score p50 / p90 / p99 line graph by
   adapter, plus a heatmap of the composite distribution. The
   line graph answers 'are my agents getting better over time';
   the heatmap answers 'are the bad runs clustered around one
   adapter or smeared across the fleet'.

2. Adapter health: stacked bar charts for degraded marks (by
   adapter / reason) and parser errors (by adapter / kind) over
   a 5-minute window. The two queries together let an operator
   see 'is this adapter degraded because of malformed wire output
   or because of oversize blocks' without flipping panels.

3. Brief throughput: runs-per-hour by terminal status, an average
   rounds-per-run stat per adapter, and a round-duration ms p50 /
   p90 / p99 line. Throughput numbers fall straight out of the
   runs_total / rounds_total counters; the duration histogram is
   the same one the runs feed.

The dashboard uses a templated $datasource var (defaults to
'prometheus') so an operator with multiple Prometheus instances
can switch without editing JSON. Schema version 39 (Grafana 11).

Operators import via:

  pnpm dlx @grafana/cli dashboard import     tools/dev/dashboards/critique.json

or paste into a provisioned dashboards directory. The file is
checked into the repo as a starting artifact; alert rules and
SLO panels ship after the first 1000 runs inform the right
thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity
checked locally).

* feat(daemon): OpenTelemetry outer span around the critique run

Wraps each runOrchestrator call in a 'critique.run' span via the
existing @opentelemetry/api dep added in the Phase 12 foundations
commit. Attributes set on the span:

- critique.run_id, critique.adapter, critique.skill at start
- critique.final_status, critique.final_composite on terminal
  resolution
- span status flipped to ERROR for failed / timed_out runs so a
  Tempo / Honeycomb / Jaeger filter on traces.status=error
  surfaces the right slice without joining back to Prometheus

No exporter is wired by default; @opentelemetry/api is the API
package and intentionally splits from @opentelemetry/sdk-*, so
the span is zero-overhead until an operator attaches an SDK
through their runtime config.

Inner per-round / parse_chunk / scoreboard_eval / persist_round /
ship.persist spans defined in the Phase 12 plan are a follow-up:
the outer span alone gives the trace a duration + final status +
adapter/skill labels, which is the 80% value for dashboards that
correlate runs across services. Adding child spans inside the
existing 600-line orchestrator without restructuring is a separate
careful change.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- 29 / 29 critique + metrics + logging tests still green

* fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump

nix-check failed on PR #1485 with hash mismatch in
open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after
the Phase 12 foundations commit (2b8b7445) added prom-client and
@opentelemetry/api to apps/daemon/package.json and refreshed
pnpm-lock.yaml.

CI reported the new sha:
  specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8=
  got:       7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s=

Both nix files pin the same workspace lockfile, so both flip in
lockstep. No other Nix surface changes required.

* fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2)

1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted
   agent values). The new observability path now records rs.composite
   and rs.mustFix (daemon-authoritative) instead of event.composite
   and event.mustFix when rs exists, and skips the bumps + log
   entirely when rs is missing (a degenerate round_end without any
   matching panelist_open). The dashboard p50 / p90 / p99 now agrees
   with persistence and ship decisions; an adapter reporting <ROUND_END
   composite='10'> while the daemon computed 6 logs 6 and still emits
   the composite_mismatch parser warning the prior block was already
   producing.

2. Codex P2 in server.ts (skill label always 'unknown'). The spawn
   path called runOrchestrator without passing the resolved skill id,
   so every live run bumped open_design_critique_*{skill='unknown'}
   and the per-skill dashboard breakdown was always empty. Threaded
   effectiveSkillId (already computed at the same handler scope as
   the project skill fallback) through skill: . . . so the metric
   reflects the real skill when one is assigned, and the orchestrator
   default of 'unknown' only fires for runs that genuinely have none.

3. Codex P2 in conformance.ts (protocol-version mismatch let through).
   An adapter that emitted <CRITIQUE_RUN version='2'> followed by a
   valid SHIP classified as shipped because the harness only watched
   for terminal events. Added a guard inside the parse loop: if a
   run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION,
   mark the adapter degraded with reason 'protocol_version_mismatch'
   (already in DEGRADED_REASONS) and return early. ConformanceOutcome
   union widened to accept the new reason.

4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour
   panel under-reported by 3600x). 'rate(...[1h])' returns per-second.
   Multiplied by 3600 so the panel title and unit match the actual
   value rendered.

Verification:
- pnpm --filter @open-design/daemon typecheck: clean
- New metrics + logging suites (7), existing adapter-degraded (7),
  conformance (5), rollout (10): 29 / 29 green
- Grafana JSON re-parses with node -e 'JSON.parse(...)'

* fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485)

* fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output

---------

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-05-13 22:11:27 +08:00
lefarcen
2a0ebea50b release: Open Design 0.7.0
- bump 14 monorepo package.json files to 0.7.0 (root + apps/{web,daemon,desktop,packaged,landing-page} + packages/{contracts,platform,sidecar,sidecar-proto} + tools/{dev,pack,pr} + e2e); apps/packaged was already at 0.6.1 from beta lane, all others at 0.6.0
- add CHANGELOG.md [0.7.0] - 2026-05-12 entry covering 97 merged PRs since 0.6.0:
  - Critique Theater: Phase 7 web client state machine (#1307) + Phase 6.2 daemon artifact extraction (#1085)
  - Web/UI: thumbs-up/down feedback widget (#1308), Cmd+, opens Settings (#1173), Finalize design package + Continue in CLI (#974), fetch models button for BYOK (#1034), provider models alphabetical sort (#1097), collapsible MCP JSON field-mapping (#1136), design file rename (#894)
  - Daemon: auto-memory store with chat-protocol-aware extraction (#999), install/uninstall skills & design systems (#1003), HTTP 206 range requests for video/audio (#1105), scheduled routines (#1033), agent runtime + route registration refactor (#1063, #1043)
  - HyperFrames: HTML-in-Canvas across web + skills (#866)
  - Skills/design systems: generic skills + design-templates split + finalize-design API (#955), agent-browser skill (#1284), WeChat design system + login-flow skill (#1083), hud/loom/trading-terminal design systems (#1069), release-notes-one-pager skill (#873), tokens.css schema (#1231)
  - Packaging: macOS Intel (x64) build (#759), official Nix flake (#402), beta packaging cache (#1095)
  - Maintainer ops: tools-pr PR-duty workspace (#1259), MAINTAINERS.md (#1290), contributor card bot (#932), PR→issue linking discipline (#1263)
  - Changed: conversation run isolation (#1271), default English i18n fallback (#1270), Codex CLI exit diagnostics / empty-response handling / path fallback (#1267, #1244, #1205)
  - Fixed: ~30 web + desktop + daemon + packaging bugfixes
  - Internal: nightly UI/desktop regression coverage (#1256), e2e/release report hardening (#1140), entry/settings automation (#954)
- catch up [Unreleased] compare link to v0.7.0 and add missing [0.6.0] release link
- add 97 PR footnote refs ([#402]..[#1330])

Verified locally: pnpm install + pre-build contracts/daemon/desktop dist + pnpm typecheck (exit 0 across all 14 packages on Node 22.22 with engine-warning).

Release workflow validation runs after merge via release-stable.
2026-05-12 15:33:28 +08:00
Bryan A
587c783dc0
feat(web): add Finalize design package + Continue in CLI buttons (#451) (#974)
* feat(daemon): expose resolvedDir on GET /api/projects/:id (#451 prereq)

Native projects (no metadata.baseDir) live at <projects root>/<id>, where
projects root is daemon-side state. The web client cannot reconstruct an
absolute path on its own, and shell.openPath on a relative path is
undefined behavior. Without resolvedDir, the upcoming Continue in CLI
button (#451) would render permanently disabled for native projects.

Mirrors PR #832's pattern of exposing designMdPath in its response.
Computed via the existing resolveProjectDir(...) helper. No behavior
change to existing callers; they ignore the new field.

Adds ProjectDetailResponse contract type and a focused projects-routes
test covering imported-folder, native, and unknown-id paths.

* feat(web): add parseProvenance helper for DESIGN.md staleness checks

Pure helper that extracts Project ID, design system, current artifact,
transcript message count, and generated UTC timestamp from the
`## Provenance` section emitted by the daemon's finalize synthesis
prompt (apps/daemon/src/finalize-design.ts). Used by useDesignMdState
to derive the Continue in CLI button's stale/fresh state without an
additional daemon endpoint.

Handles missing section, "none" sentinels for design system /
artifact, and malformed timestamps without throwing. Tests cover all
four branches.

* feat(web): add buildClipboardPrompt template for Continue in CLI

Inline single-source-of-truth template per #451 spec §3.4. Names the
project, the working directory, and the DESIGN.md-first operating
contract for the receiving `claude` CLI session. Trailing TODO is
the blank task slot the issue body specifies — left empty so the user
fills it in before submitting.

Also lands the shared copyToClipboard helper (jsdom-safe canonical path
+ execCommand fallback) so the new button and any future caller share
one fallback path, mirroring the inline pattern in FileViewer.tsx.

Tests cover happy-path field rendering, "none"/"unknown" sentinels
when DESIGN.md fields are absent, and both clipboard branches.

* feat(web): add useProjectDetail + useDesignMdState hooks

useProjectDetail wraps GET /api/projects/:id, surfacing the resolvedDir
field and falling back to metadata.baseDir for older daemons that don't
include it. Continue in CLI needs an absolute working directory so the
desktop bridge can openPath it; the web client never reconstructs the
path itself.

useDesignMdState fetches the project's file list, downloads DESIGN.md
when present, parses the Provenance section, and computes a stale
verdict by comparing the recorded generatedAt against the max mtime of
non-DESIGN.md files and the max conversation updatedAt. Drives the
button's three-state UI (disabled / fresh / stale) without a
daemon-side endpoint.

Tests cover happy path, fallback, and both stale branches plus the
pure computeStale helper for the null-timestamp edge case.

* feat(web): add useFinalizeProject hook with cancel + error-code mapping

Wraps POST /api/projects/:id/finalize/anthropic for the Finalize design
package button. Three concerns:

  1. Lifecycle: idle → pending → success | error. Double-clicking the
     button aborts the prior in-flight request before starting a new
     one so the daemon never sees stacked finalize calls per project.

  2. Cancellation: AbortController plumbed through fetch + a 130 s
     timer (daemon timeout 120 s + 10 s buffer). Cancel returns to idle
     cleanly — it's a user gesture, not an error surface.

  3. Daemon error mapping: when the response is non-OK, body.error.code
     drives the canonical user-facing toast string (table covers all
     7 codes the daemon emits today plus a network-error catch-all).
     body.error.details, when a string, surfaces alongside the category
     message so account-usage-cap responses (Anthropic 400 →
     UPSTREAM_UNAVAILABLE) can show the upstream's own reason instead
     of just the daemon's category label — committed to lefarcen on
     #450 verification reply.

Tests cover request body shape, all 8 error codes via it.each, the
network-error path, the details-surfacing branch, the cancel ⇒ idle
flow, and the unknown-code → catch-all message branch.

* feat(web): add useTerminalLaunch with electron/web detection

Capability-detected wrapper around window.electronAPI.openPath. On
desktop the bridge forwards to shell.openPath, which opens the OS
file manager at the project working directory (per Electron's
contract for directory paths — it is NOT a terminal launcher;
spawning a terminal application is deferred per #451 Non-goals). On
browser builds the hook reports web-fallback so the caller renders
a manual-instruction toast naming the working directory.

Treats any non-empty string return from shell.openPath as ok: false
so platform-specific failures surface the manual fallback toast.
Behavior is exercised end-to-end by the upcoming
ContinueInCliButton tests.

* feat(desktop): expose shell.openPath via electronAPI bridge

Adds an openPath bridge method that the Continue in CLI button (#451)
uses to surface the project working directory in the OS file manager.
shell.openPath is part of Electron's contract and resolves to '' on
success / a non-empty error string on failure; the IPC handler
forwards the result so the renderer can decide between the success
toast and the manual fallback toast without a separate error channel.

Empty / non-string inputs short-circuit to a self-describing error
string so the renderer never needs to worry about undefined-input
crashes from the main process.

Web side: extracts Window.electronAPI into a single global declaration
at apps/web/src/types/electron.d.ts so future bridge methods land in
one place. Two pre-existing inline declare-global blocks
(NewProjectPanel.tsx, providers/registry.ts) are deleted in favor of
that single source of truth — the inline ones each carried a partial
shape of the bridge and were diverging from the desktop preload.

* feat(web): add FinalizeDesignButton, ContinueInCliButton, ProjectActionsToolbar

Project-level toolbar that hosts the two new actions from #451.
Mounted between AppChromeHeader and the chat/workspace split (wiring
lands in the next commit). Per-file actions (Export PDF/PPTX/ZIP,
Deploy) stay in the FileViewer share menu.

FinalizeDesignButton has three idle labels driven by DESIGN.md
existence + staleness, plus a pending state with a spinner and a
cancel link that maps to useFinalizeProject's AbortController. Error
toasts are owned by ProjectView so the button doesn't carry its own
toast surface.

ContinueInCliButton renders disabled with a Finalize-pointing
tooltip when DESIGN.md is missing (so the workflow is discoverable
rather than hidden), enabled when fresh, and enabled with a stale
chip otherwise. Chip text is the spec's canonical "Spec is stale —
regenerate?" — N-turns-ago is deferred per spec §4.6.

Toast.tsx is a tiny transient component that mirrors
PromptTemplatePreviewModal's state-based toast pattern; supports a
secondary details line so daemon error envelopes that carry an
upstream explanation (e.g. Anthropic account-usage cap) can surface
the real reason alongside the daemon's category label.

CSS appends one block to apps/web/src/index.css mirroring the
existing app-project-title token usage; no CSS modules in this
repo (verified by grep).

* test(web): cover ContinueInCliButton states + interaction wiring

Three rendered states (DESIGN.md missing → disabled with the
Finalize-pointing tooltip; DESIGN.md fresh → enabled, no chip;
DESIGN.md stale → enabled with the canonical "Spec is stale —
regenerate?" chip), plus three onClick branches (no-op when
disabled, fires once when fresh, fires once when stale).

Click-handler integration with clipboard / shell.openPath / toast
lives in ProjectView (the button is presentational and takes the
handler in via props), so those are covered by Phase K's wiring +
the manual smoke test rather than the per-component test.

* feat(web): wire Continue in CLI + Finalize buttons into ProjectView

Mounts the new project-actions toolbar between AppChromeHeader and
the chat/workspace split, hidden when workspaceFocused so the
focus-mode artifact view stays uncluttered.

Wires the four hooks (useProjectDetail, useDesignMdState,
useFinalizeProject, useTerminalLaunch) to a single shared toast
surface. handleFinalize reads the request body from the existing
config: AppConfig prop and uses effectiveMaxTokens(config) to match
the chat-flow's maxTokens defaulting; on success it refreshes
useDesignMdState so the toolbar re-renders with the new chip state.

handleContinueInCli builds the literal clipboard prompt, copies it,
opens the working directory via shell.openPath on desktop /
falls through to a manual-instruction toast on browser, and surfaces
shell.openPath failures with a fallback toast that names the path.

Errors lift into the same toast surface (a useEffect tied to
finalize.error) so the daemon's category message + body.error.details
reach the user as the spec's two-line render — covered by hook test
16a in the prior commit.

⌘+Shift+K (mac) / Ctrl+Shift+K (others) is the keyboard
accelerator for Continue in CLI; capture-phase, platform-gated,
no-op when DESIGN.md is missing. Mirrors the existing FileWorkspace
shortcut idiom and does not collide with ⌘+P (Quick Switcher).

* fix(web): distinguish timeout abort from user cancel in useFinalizeProject

Addresses codex P2 finding on PR #974: the catch block treated every
AbortError as a user-initiated cancel and reset to idle silently. If
the internal 130 s timeout fired, users saw no failure signal but the
daemon's synthesis call may still have been in flight.

Adds a timedOutRef set inside the setTimeout callback before
controller.abort(), and branches in the catch: timeout → status
'error' with new TIMEOUT code ("Finalize timed out after 130 s. The
daemon may still be running."), user cancel → existing idle reset.
Reset the ref at the start of every trigger() so a previous timeout
doesn't poison the next call.

Adds one test using vi.useFakeTimers() that advances past 130_001 ms
and asserts the TIMEOUT error surface.

* fix(web): surface clipboard failures by rendering the prompt in the toast

Addresses codex P2 finding on PR #974: handleContinueInCli ignored
copyToClipboard's return value, so when both clipboard paths failed
(restricted browser context / insecure origin) the toast still said
"paste the prompt" though nothing had been copied — leaving users
with no manual-copy recourse in exactly the environments where the
fallback should help.

handleContinueInCli now branches on copyToClipboard's boolean return.
On failure the toast renders the prepared prompt in a scrollable
<pre> block and pins itself open (no auto-dismiss) so the user has
time to select-and-copy manually. Includes a Dismiss button + the
working directory in the secondary details line so the user has the
information needed to proceed.

The folder-open call is skipped on copy failure because there's
nothing to paste yet; the user copies first, then re-clicks Continue
in CLI when they're ready.

Toast component grows an optional Updating VS Code Server to version 41dd792b5e652393e7787322889ed5fdc58bd75b
Removing previous installation...
Installing VS Code Server for Linux x64 (41dd792b5e652393e7787322889ed5fdc58bd75b)
Downloading:       0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  0%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  1%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  2%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  3%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  4%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  5%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  6%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  7%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  8%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9%  9% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 11% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 12% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 13% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 14% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 15% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 16% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 17% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 18% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 20% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 21% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 22% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 23% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 24% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 25% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 26% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 27% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 28% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 29% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 30% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 31% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 32% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 33% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 34% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 35% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 36% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 37% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 38% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 39% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 40% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 41% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 42% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 43% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 44% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 45% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 46% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 47% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 48% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 49% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 50% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 51% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 52% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 53% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 54% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 55% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 56% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 57% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 58% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 59% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 60% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 61% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 62% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 63% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 64% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 66% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 67% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 68% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 69% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 70% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 71% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 72% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 73% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 74% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 75% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 76% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 77% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 78% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 79% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 80% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 81% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 82% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 83% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 85% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 86% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 87% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 88% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 89% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 90% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 91% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 92% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 93% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 94% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 95% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 96% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 97% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 98% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 99%100%100%
Unpacking:   0%  1%  2%  3%  4%  5%  6%  7%  8%  9% 10% 11% 12% 13% 14% 15% 16% 17% 18% 19% 20% 21% 22% 23% 24% 25% 26% 27% 28% 29% 30% 31% 32% 33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43% 44% 45% 46% 47% 48% 49% 50% 51% 52% 53% 54% 55% 56% 57% 58% 59% 60% 61% 62% 63% 64% 65% 66% 67% 68% 69% 70% 71% 72% 73% 74% 75% 76% 77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87% 88% 89% 90% 91% 92% 93% 94% 95% 96% 97% 98% 99%100%
Unpacked 4009 files and folders to /home/bryan/.vscode-server/bin/41dd792b5e652393e7787322889ed5fdc58bd75b.
Looking for compatibility check script at /home/bryan/.vscode-server/bin/41dd792b5e652393e7787322889ed5fdc58bd75b/bin/helpers/check-requirements.sh
Running compatibility check script
Compatibility check successful (0) prop and the auto-dismiss
TTL is suppressed whenever code is present. CSS adds .od-toast-code
(monospace, max-height 240 with overflow-auto) and .od-toast-dismiss
styling.

Six new Toast tests cover details rendering, code rendering,
no-auto-dismiss when code is present, auto-dismiss when code is
absent, and the Dismiss button affordance.

* fix(web): make ContinueInCliButton disabled-state guidance visible

Addresses mrcfps's PR #974 review: native <button disabled> does
not fire hover/focus events in browsers we ship against, so a
`title` tooltip on the disabled button never surfaces. The only
guidance for the missing-DESIGN.md state was effectively invisible —
defeating the spec's "discoverable, not hidden" intent.

Renders the help text as a visible sibling <span> next to the
disabled button instead. Adds aria-describedby pointing the button
at the hint's id so assistive tech announces the explanation when
the disabled button gets focus. The native `disabled` attribute
stays so the button still can't be clicked or submitted.

CSS adds .project-actions-disabled-hint (muted italic, 11.5px,
matches the existing meta/secondary text style on this surface).

Test asserts the role="note" hint is in the DOM with the canonical
text and that the button's aria-describedby links to its id.

* fix(web): keep ProjectActionsToolbar at natural height inside the .app grid

The .app container was `grid-template-rows: auto 1fr` — only two
rows. Adding ProjectActionsToolbar as a third child between
AppChromeHeader and the chat/workspace split made the toolbar the
2nd grid item, so it took the `1fr` row (filling roughly half the
viewport) while the split got pushed into an implicit auto row at
its content's natural height. Surfaced as a screenshot from Bryan
showing the toolbar's background bleeding across most of the screen.

Extend grid-template-rows to `auto auto 1fr` and pin the split to
`grid-row: 3` explicitly. Now:
- Toolbar visible: row 1 = header (auto), row 2 = toolbar (auto),
  row 3 = split (1fr, fills remaining viewport).
- Toolbar hidden via hidden=workspaceFocused → ProjectActionsToolbar
  returns null, row 2 collapses to 0px (auto with no content), split
  still fills row 3.

No JS changes; existing 609 tests still green.

* fix(web): guard useFinalizeProject state writes against superseded triggers

Addresses mrcfps's PR #974 P1 review on useFinalizeProject.ts:132
(also called out as P1.3 in lefarcen's deep-dive review).

Calling trigger() twice in quick succession aborted the first
controller and swapped abortRef to the new one, but the first
request's later AbortError catch still unconditionally called
setStatus('idle') / setError(null). That cleared the spinner and
re-enabled both toolbar buttons while the replacement finalize was
still pending — defeating the de-duplication this hook was meant to
enforce.

Adds an isCurrent() closure (`abortRef.current === controller`)
and gates every state-write site after the await: success path,
non-OK envelope path, AbortError-timeout, AbortError-cancel, and
network-error all bail early when the trigger has been superseded.
Per mrcfps: "make every state write request-scoped."

Regression test triggers twice in quick succession with a
never-resolving fetch, awaits the first promise (it rejects with
AbortError), and asserts status stays 'pending' rather than
collapsing to 'idle' under the replacement's lifetime.

* fix(desktop): allowlist-validate shell.openPath against registered project roots

Addresses mrcfps's PR #974 P1 review on runtime.ts:305 (also called
out as P1.2 in lefarcen's deep-dive review): the new
`shell:open-path` IPC handler accepted any renderer-supplied
string and forwarded it straight into Electron's `shell.openPath`,
widening the renderer→main trust boundary so XSS or a compromised
renderer dependency could open arbitrary local paths to the user.

Adds an explicit gate around the bridge:

  1. validateExistingDirectory(p) — floor check that rejects empty
     strings, relative paths, files, apps, and non-existent paths;
     realpath-resolves so symlink games can't be used to register
     one path and reach another.

  2. createProjectRootGate() — Set-backed allowlist of
     daemon-validated project working directories. The renderer
     calls registerProjectRoot(absDir) once per project mount via
     a new IPC method (preload bridge); the main process only
     opens paths that pass both the floor check and the allowlist.

ProjectView wires the registration via a useEffect tied to
projectDetail.resolvedDir, so the active project's daemon-supplied
working directory is always the one being approved (not a renderer-
synthesized string).

Threat-model caveat documented in the runtime.ts comment block: an
attacker that fully controls the renderer can also call register
with arbitrary paths. Closing that gap fully requires a daemon-side
round-trip to derive the canonical resolvedDir from the daemon's
project registry, which is deferred to keep this PR focused.
Today's allowlist still defends against accidental misuse, bugs,
and common XSS payloads that don't know to call register first.

Adds apps/packaged/tests/desktop-project-root-gate.test.ts with 13
cases: floor-validation rejection cases (empty / relative / missing
/ file), happy-path resolution, symlink realpath canonicalization,
and the allowlist's register/isApproved/reset semantics. Mirrors
the existing apps/packaged/tests/desktop-url-allowlist.test.ts
pattern from PR #911 — the packaged workspace hosts the test
because apps/desktop has no vitest setup yet.

* fix(daemon): wire request-lifecycle abort signal through finalize route

Addresses mrcfps's PR #974 P1 review on
apps/daemon/src/server.ts:3831-3837 (also called out as P1.1 in
lefarcen's deep-dive review): `POST /api/projects/:id/finalize/anthropic`
called `finalizeDesignPackage(...)` without threading any
request-lifecycle abort, so cancelling the browser fetch only
aborted the UI-side request — the daemon's 60–120 s Anthropic call
kept running and still wrote DESIGN.md after the UI returned to idle.

Adds an AbortController inside the route handler, fired from
`res.on('close')`, and threads its signal into the existing
`signal?: AbortSignal` parameter on `FinalizeOptions`
(finalize-design.ts:70). `callAnthropicWithRetry` already passes
the signal through to the underlying fetch, so a client disconnect
now propagates all the way to the Anthropic SDK call.

Listener-event choice: `res.on('close')` is the canonical event
for "client disconnected before response was sent" in Express. The
common alternative `req.on('close')` fires whenever the *request*
stream finishes — for POST routes that means as soon as the
body-parser middleware drains the body, well before the route does
any work. Using req.on('close') would have flipped the abort
controller in every successful run; the test caught this empirically.

Caveat documented in the route's comment block: an abort fired
*after* the upstream response has been received but *before* the
atomic write completes still allows the write to land. The SDK
contract bounds the network round-trip, not the post-network disk
handoff.

Adds tests/finalize-route-abort.test.ts: spins up the test server,
mocks global fetch to capture the daemon-side AbortSignal at the
Anthropic call, sends the request via raw http (so we can destroy
the underlying socket), waits until the server reaches the
Anthropic call, then destroys the socket and asserts that the
daemon-side signal received an abort event within 5 s.

Three pre-existing project-watchers chokidar tests show flaky
timeouts under full-suite concurrency but pass in isolation;
unrelated to this fix.

* fix(daemon): refactor finalize-route-abort test to satisfy strict TS narrowing

The CI typecheck (`pnpm --filter @open-design/daemon typecheck`,
which runs both tsconfig.json and tsconfig.tests.json) caught what
my pre-push validation missed: TS narrowed `capturedSignal` to
literal `null` because vitest's mockImplementation closure can't
prove its callback runs, leaving the bare `let capturedSignal:
AbortSignal | null = null` permanently typed at its initial value.
At line 184 (`expect(capturedSignal?.aborted).toBe(true)`) the
right-hand side of the optional-chain became unreachable, and TS
flagged it as `Property 'aborted' does not exist on type 'never'`.

Switches to the standard ref-object pattern
(`const capture: { signal: AbortSignal | null } = { signal: null }`).
TS narrows let bindings inside closures conservatively but treats
object-property writes as opaque, so `capture.signal` reads
correctly across the closure boundary. Logic is unchanged.

(Pre-push oversight: ran `pnpm --filter @open-design/web typecheck`
but not the full repo `pnpm typecheck` after the daemon test
landed; the daemon's own typecheck would have caught this. Adding
`pnpm typecheck` back into the standard pre-push checklist.)

* fix(desktop): make shell.openPath gate daemon-controlled and reject .app bundles

Addresses lefarcen + mrcfps PR #974 P1 reviews on the previous path
allowlist (commit 8bf56597):

  - mrcfps (runtime.ts:45): `validateExistingDirectory` accepted
    macOS `.app` bundles because they're directories, so the gate
    would forward `/Applications/Safari.app` (or any other app
    bundle) into shell.openPath and *launch* the application — a
    stronger capability than the bridge's intended "reveal the
    project folder" feature.

  - lefarcen (runtime.ts:396): the allowlist was renderer-controlled.
    A compromised renderer could call `shell:register-project-root`
    with any existing absolute directory and then `shell:open-path`
    that same path; the IPC injection issue I'd documented as
    "deferred" was the central reviewer concern, not an acceptable
    caveat. Both reviewers asked for the gate to be derived from
    a daemon-authoritative source.

The redesign drops the renderer-controlled register/openPath pair
and replaces it with a single `openPath(projectId)` bridge call.
The desktop main process resolves the project ID by calling the
daemon's `GET /api/projects/:id` endpoint over the web sidecar
proxy (which already forwards `/api/*` to the daemon — verified
in apps/web/sidecar/server.ts:209 and apps/web/next.config.ts:77),
parses `resolvedDir` from the response, validates it against the
floor (absolute, exists, is-directory, not .app), and only then
forwards to `shell.openPath`. The renderer never names the path
directly, so a compromised renderer cannot escalate to opening
arbitrary local paths — it can only name a project the daemon
already knows about, and the canonical path comes from the daemon's
own response.

Surface changes:

  - `runtime.ts`: `createProjectRootGate` removed.
    `fetchResolvedProjectDir(webUrl, projectId, fetchImpl?)` added.
    `validateExistingDirectory` rejects `.app` suffix after the
    realpath check (so symlinked launders are caught too).
    `shell:open-path` handler signature changes from `(path)` to
    `(projectId)`; `shell:register-project-root` handler removed.

  - `preload.cts`: `openPath(projectId)`; `registerProjectRoot`
    removed from the bridge surface.

  - `apps/web/src/types/electron.d.ts`: type updated to match.

  - `useTerminalLaunch.ts`: `open(projectId)` instead of
    `open(dir)`.

  - `ProjectView.tsx`: passes `project.id` to
    `terminalLauncher.open`; the registerProjectRoot useEffect is
    deleted. Toast text still reads `projectDir` (from
    `useProjectDetail.resolvedDir`) for fallback messages — the
    *display* path is independent of the *open* mechanism.

  - `apps/packaged/tests/desktop-project-root-gate.test.ts`:
    rewritten to cover `validateExistingDirectory` (8 cases
    including the new `.app` suffix and symlinked-bundle rejection)
    and `fetchResolvedProjectDir` (8 cases including empty/invalid
    project ids, daemon HTTP success/failure, missing resolvedDir,
    network error, and URL canonicalization).

Total: 16 passing tests, ~330 LOC churn including test rewrites.

Lesson learned (from the iteration loop, not the code): when a
reviewer asks for "ideally X, or at least Y," shipping Y with a
deferred-X note flags the gap rather than fixing it. Either ship X
or argue Y is sufficient; don't middle-ground.

* feat(contracts,sidecar-proto): add desktop-auth IPC + fromTrustedPicker

Schema-only prep for the PR #974 round-3 fix. Adds the two type
extensions the daemon HTTP gate and the desktop main process will
build on:

- packages/sidecar-proto: SIDECAR_MESSAGES.REGISTER_DESKTOP_AUTH, with a
  base64-validated `{ secret }` payload + RegisterDesktopAuthResult.
  Updates normalizeDaemonSidecarMessage to accept the new message and
  pins both branches (accept + reject) in tests/index.test.ts.

- packages/contracts: ProjectMetadata.fromTrustedPicker — a marker the
  daemon stamps on folder-imported projects whose POST /api/import/folder
  passed the desktop HMAC gate. The marker is privileged in the same
  way as `baseDir`: only the gated import handler sets it, and the
  desktop main process refuses to forward `shell.openPath` for
  folder-imported projects whose metadata lacks it.

* fix(daemon): gate /api/import/folder on desktop HMAC token

Closes the renderer→arbitrary-baseDir→shell.openPath bypass chain
flagged by lefarcen and mrcfps in round 3 of PR #974. Both reviewers
converged on the same gap: the previous round only moved path
resolution into the daemon, but renderer JS could still POST
/api/import/folder with any absolute path, get a project ID back, and
then call openPath(projectId) to reveal the attacker-chosen path.

Daemon-side closure:

- New module-scope desktop auth secret + setter exported from
  apps/daemon/src/server.ts. The secret is null at boot (web/standalone
  mode unaffected) and gets set when the desktop main process
  registers it over the daemon's sidecar IPC.

- New `verifyDesktopImportToken` pure helper. Verifies tokens shaped
  `${nonce}~${exp}~${signature}` against HMAC-SHA256(secret, baseDir +
  "\n" + nonce + "\n" + exp). Field separator is `~` (not `.`) because
  ISO 8601 expiries embed dots; `~` is in neither base64url nor ISO
  8601 character sets. Rejects expired tokens, replayed nonces, and
  expiries beyond 2× the 60s TTL.

- New middleware on POST /api/import/folder. When the secret is set,
  every request must carry a valid `X-OD-Desktop-Import-Token` header
  bound to the requested baseDir. Rejected requests return 403 with
  FORBIDDEN. When the secret is unset (no desktop registered), the
  route is unchanged so web-only deployments and standalone daemons
  keep working.

- Trusted imports get `metadata.fromTrustedPicker: true` stamped on
  the project. POST /api/projects and PATCH /api/projects/:id reject
  any client-supplied `fromTrustedPicker` (privileged the same way as
  `baseDir`), and the PATCH preservation block re-stamps the marker
  on partial-metadata patches so it cannot be silently stripped.

- Daemon sidecar IPC handler: REGISTER_DESKTOP_AUTH calls
  setDesktopAuthSecret with the base64-decoded secret. The HTTP and
  IPC servers share a process so the registration takes effect
  immediately for the next inbound /api/import/folder call.

Tests:

- apps/daemon/tests/desktop-import-token-gate.test.ts (15 cases): web
  mode acceptance, no-token rejection, malformed-token rejection,
  wrong-secret rejection, wrong-baseDir rejection, expired rejection,
  oversized-window rejection, valid mint + trusted-picker stamp +
  replay rejection, plus 6 pure-helper cases for verifyDesktopImportToken.
  afterAll() clears the secret to keep the shared HTTP server clean
  for sibling test files.

- apps/daemon/tests/projects-routes.test.ts (+2 cases): POST and PATCH
  reject `fromTrustedPicker` in client-supplied metadata.

Existing folder-import-route.test.ts continues to pass because none of
those tests register a desktop secret; the gate stays dormant.

* fix(desktop,web): atomic pickAndImport replacing pickFolder; openPath trusted-picker check

Closes the renderer→arbitrary-baseDir bypass at the bridge boundary.
The renderer no longer receives a raw filesystem path from the main
process; the picker dialog and the import call live in a single
main-process transaction.

Desktop main:

- runDesktopMain generates a per-process 32-byte secret and registers
  it with the daemon over the daemon's sidecar IPC *before* the
  BrowserWindow is created. registerDesktopAuthWithDaemon retries a
  few times because tools-dev / tools-pack spawn daemon, web, and
  desktop as siblings, so the daemon may not be listening yet on
  desktop boot. A failed registration logs a warning and the runtime
  refuses pickAndImport calls (no secret → no token can be minted).

- runtime.ts replaces the `dialog:pick-folder` IPC with
  `dialog:pick-and-import`. The handler shows the picker, mints an
  HMAC token bound to the chosen path, POSTs /api/import/folder via
  the discovered web URL with the token + body, and returns the
  daemon's ImportFolderResponse to the renderer (or a structured
  failure envelope). Renderer never sees the path or the token.

- shell:open-path now consults a new pure helper
  `isOpenPathAllowedForProject` that refuses folder-imported projects
  whose metadata lacks `fromTrustedPicker: true`. This is the literal
  interpretation of mrcfps's round-3 follow-up: openPath is gated to
  projects whose resolvedDir came from the trusted-picker flow, not
  just transitively via the import gate. Native projects (no
  baseDir → daemon-owned <projectsRoot>/<id>) are always safe to open.

- fetchResolvedProjectDir now returns a `ResolvedProjectDirContext`
  with hasBaseDir + fromTrustedPicker so the openPath handler can
  enforce the marker check.

- New `signDesktopImportToken` pure helper mirrors the daemon-side
  signer with the same `~`-separated wire shape, exported for the
  packaged workspace's test file.

Preload bridge:

- `pickFolder` is deleted. The new `pickAndImport(init?)` returns the
  daemon's import response or a structured failure. `openPath` keeps
  its existing signature; its trust gate now lives in the main
  process.

Web renderer:

- electron.d.ts drops `pickFolder` and adds `pickAndImport` with the
  shared DesktopPickAndImportResult union pulled from contracts.

- NewProjectPanel: when running on Electron (pickAndImport bridge
  present), the "Open folder" button calls pickAndImport atomically
  and forwards the response through a new `onImportFolderResponse`
  prop. On web (no bridge), the existing manual baseDir input keeps
  working — browser builds have no shell.openPath surface so a
  renderer-named path cannot escalate.

- EntryView and App.tsx pass through the new callback. App's
  `handleImportFolderResponse` updates state from the response without
  a second fetch (the import already happened in the main process).

Tests (apps/packaged/tests/desktop-project-root-gate.test.ts):

- 3 cases for `isOpenPathAllowedForProject`: native allowed,
  trusted-picker allowed, legacy folder-import refused.

- 6 cases for `signDesktopImportToken`: shape (~-separated), determinism,
  signature flips when secret/baseDir/nonce/exp changes.

- Existing fetchResolvedProjectDir cases extended for the new
  `context` shape and additional cases that prove the metadata
  inspection (hasBaseDir, fromTrustedPicker) reads the daemon
  response correctly.

* fix(daemon): make desktop import-folder gate fail-closed (PR #974 round 4)

lefarcen P1 on round 3 of PR #974: the gate's `secret == null → accept`
branch (originally intended to keep web-only deployments unaffected)
let a renderer bypass the import boundary in two real desktop edges:

- Startup race: desktop's REGISTER_DESKTOP_AUTH IPC hasn't reached the
  daemon yet, but the renderer is already alive in the BrowserWindow
  and races to fetch /api/import/folder directly with arbitrary baseDir.
- Daemon restart mid-session: the new daemon process boots tokenless
  while a desktop is still running. Same shape: renderer fetches the
  route, daemon falls through to "web mode", accepts the untrusted
  baseDir. shell.openPath rejects (no fromTrustedPicker marker) but
  the daemon's other file APIs (read/write project files, list
  directories) operate on the attacker-chosen path.

Two coordinated mechanisms close that:

(1) Sticky in-process flag. `desktopAuthEverRegistered` flips to true
    on first non-null `setDesktopAuthSecret(...)` and never goes back.
    setDesktopAuthSecret(null) (used by tests) does NOT relax the gate
    so production code can never silently fall back to fail-open. Add
    `resetDesktopAuthForTests()` for vitest cleanup.

(2) Orchestrator-pinned mode via OD_REQUIRE_DESKTOP_AUTH=1 read at
    module load. tools-dev / tools-pack / apps/packaged set this when
    the daemon is spawned in a desktop-bundled flow (separate commits).
    With the env set, the gate is active from request 0 — a renderer
    racing /api/import/folder before registration completes gets a
    503 DESKTOP_AUTH_PENDING (transient, retry).

Standalone-daemon (web-only) deployments where neither mechanism fires
keep the gate dormant and the route's behavior unchanged.

Also addresses lefarcen P3 (whitespace HMAC mismatch): the desktop
signs the exact picker output, so the daemon must verify the same
string. The previous version trimmed `baseDir` before HMAC, which
would reject legitimate paths whose final component carried edge
whitespace. Use the raw request-body baseDir for verification; the
existing trim()+realpath() logic still normalizes for fs operations.

New error code: `DESKTOP_AUTH_PENDING` (HTTP 503, retryable).

Tests:

- `stays fail-closed (503 DESKTOP_AUTH_PENDING) after a registered
  secret is cleared` — exercises the sticky flag.
- `verifies the exact request-body baseDir, not a trimmed version` —
  pins the round-4 P3 fix.
- All existing desktop-import-token-gate cases continue to pass; the
  beforeEach/afterEach/afterAll resetters now use
  resetDesktopAuthForTests() to honor the sticky flag.

* fix(tools-dev,packaged): pin desktop import-auth on daemon spawn

PR #974 round-4 P1 follow-through. The daemon-side fail-closed gate
needs OD_REQUIRE_DESKTOP_AUTH=1 in the daemon's spawn env whenever
the daemon is paired with a desktop, so the gate is active from
request 0 and the daemon-restart-mid-session bypass cannot reopen.

tools-dev:
- spawnDaemonRuntime accepts a `requireDesktopAuth` option that
  appends OD_REQUIRE_DESKTOP_AUTH=1 to the spawn env.
- startDaemon takes the same flag and additionally checks whether a
  desktop runtime is already alive in this namespace; either branch
  pins the env (revival case where the daemon died mid-session and
  the user runs `tools-dev start daemon` to bring it back up).
- startApp threads the bundled-target list down so the daemon spawn
  knows when desktop is queued in the same orchestration even though
  the daemon starts first.
- The `start` / `restart` / `run` command actions pass the resolved
  target list into startApp.

apps/packaged:
- Packaged builds always pair a desktop with the daemon, so
  startPackagedSidecars unconditionally sets OD_REQUIRE_DESKTOP_AUTH=1
  in the daemon child env. Headless builds also flow through this
  same path, so the same gate applies.

Standalone-daemon flows unaffected: `tools-dev start daemon` (alone,
no desktop running, no desktop in the bundled target list) does not
set the env, and the daemon's gate stays dormant — current web-only
behavior is preserved.

* fix(desktop,web): align project-id regex with daemon; surface pickAndImport failures

mrcfps round-4 nits on PR #974.

apps/desktop/src/main/runtime.ts (mrcfps #1): the previous client-side
regex `^[a-zA-Z0-9_-]+$` rejected `.` even though the daemon's
canonical isSafeId / POST /api/projects accept `[A-Za-z0-9._-]{1,128}`.
Result: dotted ids like `my-project.v2` were valid backend-side but
got "project id contains disallowed characters" before
fetchResolvedProjectDir even hit the network, regressing Continue in
CLI / Finalize for those projects. Align the regex with the daemon's
shape, comment-tag the rationale.

apps/packaged/tests/desktop-project-root-gate.test.ts: add a
regression case for a dotted id and one for the 128-char length cap
(the new regex exposes both, the old regex obscured the dotted one).

apps/web/src/components/NewProjectPanel.tsx (mrcfps #2): the
`if (!result || result.ok !== true) return` branch swallowed every
non-OK pickAndImport shape (`desktop auth secret not registered`,
`web sidecar URL not available`, daemon HTTP errors with details)
the same way as the explicit `{ canceled: true }` cancel — leaving
the user with a silent no-op when the trusted-picker flow couldn't
even get off the ground. Reserve silent-return for the cancel case
only; surface every other reason via a Toast (existing component,
already used by ProjectView for related Continue-in-CLI flows).
The new `formatPickAndImportErrorDetails` helper flattens daemon
ApiError envelopes into a single readable secondary line so the
operator sees both the category ("Open folder failed: daemon
returned HTTP 503") and the upstream reason
("desktop auth required but secret not yet registered").

* docs(architecture): document desktop folder-import auth boundary

lefarcen P3 on PR #974 round 4: the `Folder import` section in
docs/architecture.md still documented only realpath / sandbox /
RUNTIME_DATA_DIR checks and omitted the new desktop HMAC trust
boundary, replay/TTL behavior, fail-closed semantics, daemon-restart
edge, and legacy-import migration note. Without that subsection it's
hard to review whether the 60s TTL, the `~`-separated token shape,
or the legacy folder-imports needing re-pick are intentional product
decisions or overlooked gaps.

Add a "Desktop folder-import auth (PR #974)" subsection covering:
- The trust handshake (32-byte secret over sidecar IPC at desktop boot).
- Token shape (`${nonce}~${exp}~${signature}`), HMAC payload, and
  why `.` cannot be the field separator (ISO 8601 expiries embed dots).
- TTL and replay behavior (60s, single-use, 2× TTL upper bound).
- Fail-closed mechanisms — sticky in-process flag and
  OD_REQUIRE_DESKTOP_AUTH env var pinning.
- Web-only deployments are unaffected (browser builds have no
  shell.openPath surface).
- The `metadata.fromTrustedPicker` marker and the openPath-side
  defense-in-depth check.
- Legacy folder-imports need re-pick to use the Continue-in-CLI button.
- Daemon-restart edge: 503 DESKTOP_AUTH_PENDING until desktop
  re-registers; restart desktop to recover.

* fix(packaged): skip desktop-auth gate in headless mode (PR #974 round 5 P2)

Round 5 (lefarcen P2): packaged headless mode (daemon+web only, no
Electron) was inheriting OD_REQUIRE_DESKTOP_AUTH=1 from the round-4
unconditional pin in startPackagedSidecars. Headless never runs desktop
main, so no client could ever register an HMAC secret and folder import
returned 503 DESKTOP_AUTH_PENDING permanently — even though headless has
no shell.openPath surface to exploit.

Plumb a required `requireDesktopAuth: boolean` option through
startPackagedSidecars: apps/packaged/src/index.ts (Electron entry)
passes true; apps/packaged/src/headless.ts passes false. Extract
buildPackagedDaemonSpawnEnv as a pure helper so vitest can pin both
branches without spawning a child process.

Tests added in apps/packaged/tests/sidecars.test.ts cover both branches
plus OD_LEGACY_DATA_DIR / daemonCliEntry env forwarding edges.

Refs: nexu-io/open-design#974

* fix(desktop,daemon): lazy auth retry + canonical HMAC binding (PR #974 round 5 P1+P3)

Round 5 (lefarcen P1, mrcfps): a daemon restart under
OD_REQUIRE_DESKTOP_AUTH=1 left desktop holding a stale secret while the
new daemon process required a fresh registration — folder import
returned 503 DESKTOP_AUTH_PENDING permanently until the user restarted
desktop. Same dead-end if the startup handshake missed its retry window.

Round 5 (lefarcen P3): the daemon verified the HMAC against raw
request-body baseDir, then trimmed before realpath(). A picker selection
of "/tmp/foo " could authorize an import of "/tmp/foo" — token bound to
a different path than the one imported.

Three coordinated fixes:

1. P1 lazy retry: extract pickAndImportFolder as a pure helper that
   takes injected fetch / mintToken / registerDesktopAuth deps. On 503
   DESKTOP_AUTH_PENDING from /api/import/folder, re-invoke the
   registration callback once, mint a fresh token (new nonce + new exp
   keeps replay protection), and POST again. Single retry, no infinite
   loop. Other failure shapes return immediately to the renderer.

2. P1 wiring: runDesktopMain now ALWAYS passes desktopAuthSecret to the
   runtime regardless of whether the initial handshake succeeded, plus
   a registerDesktopAuthWithDaemon callback the runtime invokes lazily.
   Soften the startup warning text to match the new recovery semantics.

3. P3 binding: trim picker output ONCE on the desktop side before both
   signing the HMAC and POSTing. Daemon-side verification stays against
   raw request-body baseDir (round-4 behavior); the daemon's defensive
   trim before realpath() is now a no-op for desktop traffic and only
   load-bearing for web-mode callers (path.isAbsolute("  /foo  ") is
   false). End-to-end: desktop-signed string == request body == HMAC-
   verified string == realpath() input.

Tests:

- apps/packaged/tests/desktop-pick-and-import.test.ts (NEW, 7 cases):
  lazy-retry happy path; lazy-retry exhausted (re-register WAS called);
  single-attempt happy path (no unnecessary IPC); optional-callback
  no-op; non-503 failures bypass retry; network errors; non-PENDING 503
  bypasses retry.

- apps/daemon/tests/desktop-import-token-gate.test.ts: replace round-4
  whitespace test with two round-5 binding tests — the trimmed string
  flows end-to-end (HMAC verifies, project metadata.baseDir equals
  realpath of trimmed input), and a request whose body baseDir diverges
  from the HMAC-bound string is rejected 403.

docs/architecture.md §"Desktop folder-import auth" — update the daemon-
restart-edge bullet to describe the lazy-retry recovery (round 4 said
"restart desktop to recover", which is now wrong) and add a headless-
packaged-mode bullet describing the round-5 P2 gate exclusion.

Refs: nexu-io/open-design#974

* feat(sidecar-proto,daemon): surface desktopAuthGateActive over STATUS IPC (PR #974 round 6 prep)

Round 6 (mrcfps): the split-start dev flow `tools-dev start daemon` ->
`tools-dev start desktop` was leaving the daemon ungated because
`OD_REQUIRE_DESKTOP_AUTH=1` is only injected when daemon and desktop
spawn in the same orchestrator invocation. To fix that, tools-dev needs
to introspect the running daemon's gate state before launching desktop
main — but the existing STATUS IPC didn't carry the flag.

This commit extends `DaemonStatusSnapshot` with a required
`desktopAuthGateActive: boolean` and wires the daemon sidecar's STATUS
handler (and the public `status()` method on the handle) to recompute
the value from `isDesktopAuthGateActive()` per request, since the flag
flips after `REGISTER_DESKTOP_AUTH` and stays sticky.

Extracted `withCurrentDesktopAuthGate(snapshot)` as a tiny pure helper
so the wiring is testable without booting a real IPC server. The new
test pins four scenarios:
- no secret registered (web-only mode) -> false
- after `setDesktopAuthSecret(buf)` -> true
- after `setDesktopAuthSecret(null)` (sticky) -> still true
- input snapshot's stale value is overridden by the live flag

The orchestrator-side consumer lands in the next commit
(`tools/dev/src/desktop-auth-gate.ts`).

Refs: nexu-io/open-design#974

* fix(tools-dev): auto-restart ungated daemon before desktop start (PR #974 round 6 mrcfps)

Round 6 (mrcfps): the split-start dev sequence
`tools-dev start daemon` -> `tools-dev start desktop` was leaving the
daemon running without `OD_REQUIRE_DESKTOP_AUTH=1`. The env var is
only injected when (A) daemon and desktop spawn in the same
orchestrator invocation (`startApp` line ~682) or (B) a desktop
runtime is already alive at daemon spawn time (`startDaemon` lines
~595-596). Neither fires for the split flow, so a renderer (or any
local HTTP client) could `POST /api/import/folder` directly with an
arbitrary `baseDir` before the desktop's first registration POST.
Round-5's lazy retry didn't help: it triggers on `503 DESKTOP_AUTH_PENDING`,
and the ungated daemon returns 200.

Close the gap by introspecting the running daemon's
`desktopAuthGateActive` (added to the STATUS IPC in the prior
commit) at the start of `startApp(DESKTOP, ...)`. When the daemon
reports the gate inactive, stop the daemon (and web, if running),
respawn the daemon with `requireDesktopAuth: true`, restart web,
then proceed with the desktop start. Restart order is critical and
pinned by tests: web stops FIRST (so the web->daemon proxy doesn't
serve a transient 502 against the down-then-up daemon), then daemon
stops, then daemon respawns gated, then web restarts.

The bundled-targets path (`pnpm tools-dev`) is unaffected because
trigger (A) already armed the gate at first daemon spawn — the
helper costs one ~800ms STATUS IPC roundtrip and returns no-op.

Helper lives in its own module (`tools/dev/src/desktop-auth-gate.ts`)
so the regression test can import it without triggering the
`cli.parse()` side effect at the bottom of `tools/dev/src/index.ts`.
Five `node:test` cases pin the call sequence — no daemon, gate
active, gate inactive + no web, gate inactive + web running, log
shape — so a future refactor can't silently regress the gate.

Two synthetic `DaemonStatusSnapshot` literals in `inspectAppStatus`
and `inspect` (used when the IPC is unreachable) get
`desktopAuthGateActive: false` to satisfy the now-required type
field — semantically correct since "no daemon answering" trivially
means "no gate active."

`docs/architecture.md` adds a new bullet under the Desktop folder-
import auth section describing this auto-restart behavior.

Refs: nexu-io/open-design#974

* fix(daemon): combine finalize request-abort + timeout signals (PR #974 round 7 lefarcen P1)

Round 6 wired the route handler to pass `finalizeAbort.signal` into
`finalizeDesignPackage`, but the helper only created its own
DEFAULT_TIMEOUT_MS controller when no caller signal was supplied. The
result: a client that stayed connected could hold the finalize lock and
upstream call indefinitely. Always create the timeout controller; when
the caller passes a signal, combine both via `AbortSignal.any` so
neither cancel path replaces the other.

Adds two regression tests in finalize-design.test.ts:
- timeout fires when caller signal never aborts
- pre-aborted caller signal still cancels

Adds an internal `timeoutMs` option to FinalizeOptions so tests can
exercise the abort path without a 120 s wait or fake-timer chains.
Production callers omit it; default remains DEFAULT_TIMEOUT_MS.

* fix(daemon): allow PATCH preserving existing fromTrustedPicker marker (PR #974 round 7 lefarcen P2)

The PATCH /api/projects/:id handler was rejecting any metadata that
contained `fromTrustedPicker`, including the unchanged `true` marker
that the linked-folder UI re-spreads when editing `linkedDirs`. Trusted
folder-imported projects could not update other metadata fields without
400-ing on their own marker.

Switch the rejection condition from `'in'` to a value comparison: only
reject when the incoming value differs from the persisted one
(`patch.metadata.fromTrustedPicker !== existingMeta?.fromTrustedPicker`).
That keeps acquisition (existing=undefined, patch true) and flip
(existing=true, patch false) attempts blocked while letting the UI
re-spread the existing marker.

POST /api/projects stays strict; that path has no existingMeta.

Adds two regression tests in desktop-import-token-gate.test.ts:
- allows PATCH preserving the existing fromTrustedPicker:true marker
- rejects PATCH that flips fromTrustedPicker on a trusted project

* fix(desktop,packaged): main-process api uses daemon URL not webUrl (PR #974 round 7 lefarcen P2)

Packaged builds load the renderer from `od://app/` and report that URL
through `discoverWebUrl`. But Node-side `globalThis.fetch` (undici) does
not route through Electron's registered `od://` protocol handler — that
handler runs in the renderer's protocol scope, not in main-process Node.
So `pickAndImportFolder` and `fetchResolvedProjectDir` calls from main
silently failed in packaged builds against the protocol scheme.

Add `discoverDaemonUrl` to `DesktopRuntimeOptions` and `DesktopMainOptions`.
The packaged shell already has the sidecar's real `http://127.0.0.1:<port>`
URL (`sidecars.daemon.url` from STATUS IPC) — thread it through to the
runtime. Main-process API calls now prefer the daemon URL and fall back
to the renderer URL for tools-dev (where it is itself http://127.0.0.1).

`PickAndImportFolderDeps.webUrl` renamed to `apiBaseUrl` so the boundary
is explicit at the type level; `fetchResolvedProjectDir`'s first
parameter renamed similarly. tools-dev callers see no behavior change —
their web URL is already an http://127.0.0.1 URL Node fetch can hit.

Test (`apps/packaged/tests/desktop-pick-and-import.test.ts`):
- existing 7 cases updated to the new prop name (no behavior change)
- new case pins URL composition: builds `${apiBaseUrl}/api/import/folder`
  and never produces a custom-protocol URL.

Note for review: this test pins URL composition; full Electron protocol
handler integration (renderer fetch through `od://`) is not exercised in
unit tests here.

* fix(tools-dev): preserve daemon/web ports across desktop-auth gate restart (PR #974 round 7 lefarcen P2)

Round 6 added the split-start auto-restart in ensureDaemonGateForDesktop
to close the dev-flow gap where `start daemon` then `start desktop`
left the daemon ungated. The restart was passing the current
`start desktop` CLI options to startDaemonGated/startWeb, which meant a
stack started with `--daemon-port 17456 --web-port 17573` could be
silently moved to random ports during the hardening restart, breaking
browsers and scripts pinned to those ports.

Extract the running ports from the STATUS snapshots (daemon.url and
web.url) and forward them as explicit `{ port }` callback args. The
closure in `tools/dev/src/index.ts` overrides the corresponding option
when a port was extracted; null falls back to the original CLI flags.

Adds three regression tests in tools/dev/tests/desktop-auth-gate.test.ts:
- preserves the running daemon port across the hardening restart
- preserves the running web port across the hardening restart
- falls back to caller options (port:null) when the URL has no port

* fix(web): refresh useDesignMdState on file/chat events (PR #974 round 7 mrcfps)

useDesignMdState() previously only recomputed on mount and on explicit
refresh() (called once after finalize). Once the user kept working —
editing files or sending more chat turns — the stale/fresh badge could
drift out of sync because file mtimes and conversation updatedAt moved
past the recorded generatedAt without the hook re-checking.

Hook accepts an optional `refreshKey: number` arg; ProjectView keeps a
counter and bumps it on three events:
- file-changed SSE (covers tool-emitted file mutations)
- live_artifact* SSE (covers chat turns that emit artifacts)
- streaming `true → false` edge (covers pure-text chat turns)

The hook treats refreshKey as a compute() dep; React's Object.is
comparison short-circuits the no-op renders, so each bump is a single
recompute pass.

Adds a regression test in useDesignMdState.test.tsx:
- flips stale state after a refreshKey bump without remounting

* fix(web): degraded-state useDesignMdState on malformed provenance (PR #974 round 7 mrcfps)

useDesignMdState used to report `{ isStale: false, staleReason: null }`
when the parser could not extract a comparison timestamp from the
DESIGN.md `## Provenance` section. The pinned test made that the
documented behavior. As mrcfps pointed out, that fails open exactly
when the freshness signal is most untrustworthy: any provenance-
formatting drift silently disables the staleness warning.

Extend `DesignMdStaleReason` with a third variant `'unknown-provenance'`.
On `generatedMs === null`, return `{ isStale: true, staleReason: 'unknown-provenance' }`.
ContinueInCliButton renders a distinct chip text "Spec freshness
unknown — regenerate to refresh signal" for that variant; the button
stays enabled because not-comparable is not the same as broken state.

Tests:
- modify the existing pinned test to assert the new degraded state
- add an end-to-end useDesignMdState test feeding a malformed Provenance
  section through compute() so a regression that re-pins fresh-on-null
  at the hook level (not just computeStale) fails fast
- add ContinueInCliButton render + click tests for the new chip

---------

Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai>
Co-authored-by: lefarcen <935902669@qq.com>
2026-05-10 11:44:32 +08:00
Marc Chan
b03a504da6
release: Open Design 0.6.0 (#1080) 2026-05-09 19:58:11 +08:00
lefarcen
2bb029cb58
release: Open Design 0.5.0 (#820)
0.5.0 已从 c21cbc6 发布(https://github.com/nexu-io/open-design/releases/tag/open-design-v0.5.0);本次 squash 把版本 bump 与 CHANGELOG [0.5.0] 条目带到 main 历史,便于后续 0.5.1 release 在 main 上走标准 dispatch 流程。
2026-05-08 00:41:01 +08:00
iulian
80416b185a
Diagnose missing Next package during tools-dev web startup (#675)
* fix(tools-dev): diagnose missing Next package

* fix(web): remove duplicate Ukrainian prompt labels
2026-05-06 20:45:41 +08:00
lefarcen
ae4a08773a
chore(release): prepare 0.4.1 (#659)
- bump remaining monorepo package.json files to 0.4.1 after apps/packaged was already bumped in #637
- add CHANGELOG.md [0.4.1] - 2026-05-06 entry covering the startup hotfix and 19 merged PRs since 0.4.0:
  - Added: manual edit mode (#620), Cmd/Ctrl+P quick file switcher (#556), resizable chat panel (#563), PI status/cancel updates (#618), accessibility and RTL/Bidi craft modules (#587, #595), i18n structure checks (#608)
  - Changed: first-PR README links now surface help-wanted issues (#605)
  - Fixed: packaged contracts runtime exports (#577), packaged runtime beta gating (#637), ACP/MCP/agent fixes (#604, #612, #627), conversation error recovery (#623), native mac quit (#637)
  - Documentation/Internal: OD_DATA_DIR migration docs (#570), Simplified Chinese QUICKSTART (#578), zh-TW/ko README syncs (#586, #619), generated metrics (#592)

Release workflow validation runs after merge via release-stable.
2026-05-06 18:05:56 +08:00
lefarcen
963bbf2500
release: Open Design 0.4.0 (#454) 2026-05-05 23:39:40 +08:00
ChildhoodAndy
009d7a5478
refactor(daemon): eliminate duplicate dist tree from two-tsconfig build (#553)
Move sidecar source under src/ so a single tsconfig produces all daemon
output. Removes the parallel dist/src/ tree that was emitted by
tsconfig.sidecar.json (it included src/**/*.ts to type-check the
`../src/server.js` cross-tree import).

Build now emits:
- dist/<flat>            (cli.js, server.js, app-version.js, ...)
- dist/sidecar/{index,server}.js

`dist/sidecar/server.js` reaches the main daemon via `../server.js`
instead of `../src/server.js`, so there is no second copy of the source
tree in the published tarball.

Background — issue #534 (already fixed by #537):
The packaged Settings → About panel showed 0.0.0 because the sidecar
chain loaded the duplicated `dist/src/app-version.js`, where the fixed
`new URL('../package.json', import.meta.url)` resolved to a non-existent
`dist/package.json`. #537 patched the symptom by walking parents until a
real `package.json` is found and by writing `appVersion` into the Linux
packaged config. Both stay in place — they're sound defenses — but the
underlying duplicate-emit was never addressed; any future relative
resource lookup (templates, schemas, prompts) anchored on
`import.meta.url` would have hit the same trap.

This change removes the trap.
2026-05-05 23:31:14 +08:00
PerishFire
bbdd4e84b5
chore: enforce test directory conventions (#496)
* chore: enforce test directory conventions

Move package, app, and tool tests out of src and add guard enforcement so source directories stay source-only.

* ci: use guard and package-scoped tests

Run the new repository guard in CI and keep test execution aligned with package-scoped commands after removing root aliases.

* ci: align stable release guard check

Use the new repository guard in stable release verification after replacing the residual-JS-only script.

* chore: tighten test layout enforcement

Enforce sibling tests directories, typecheck moved test suites with dedicated configs, and refresh remaining guidance that pointed at src-based tests.

* chore: clarify no-emit test tsconfigs

Explicitly disable declaration-only emit in test tsconfigs so review tooling sees they are no-emit typecheck configs.
2026-05-05 15:34:22 +08:00
lefarcen
a719f02aa2
fix(web): normalize daemon proxy origins
Fix web sidecar proxy requests so same-origin browser requests reach the daemon without tripping origin validation, while unrelated origins remain rejected. Fixes #388.
2026-05-04 03:39:19 +08:00
lefarcen
016c08183f
release: Open Design 0.3.0 2026-05-03 23:07:28 +08:00
Sid
648374d839
fix(platform): wrap cmd.exe shim invocations to survive /s /c quote stripping (#339)
PR #258 standardized agent spawning through `createCommandInvocation`,
which on Windows wraps `.cmd` / `.bat` paths in `cmd.exe /d /s /c <line>`
and quotes each argument with cmd-style doubled quotes. PR #232's
follow-up fix for `shell:true` was lost in that refactor, and the new
shape has its own quoting bug on argv-style spawn:

1. cmd.exe `/s /c` strips exactly one leading and one trailing `"` from
   the rest of the command line.
2. Node, with `windowsVerbatimArguments` unset, escapes each argv element
   using CommandLineToArgvW rules — so the inner `"path with space"`
   ends up surfacing to cmd.exe with an extra layer of `\"` escaping
   that cmd doesn't understand.

Together these collapse `"C:\Users\Ethical Byte\...\codex.CMD" --help`
into `C:\Users\Ethical Byte\...\codex.CMD --help` with no quoting
preserved, and cmd.exe parses the first space as a token boundary —
"`Ethical` is not recognized as an internal or external command." See
issue #315 for the full repro.

The fix mirrors what Node's own `child_process.spawn({ shell: true })`
does internally: wrap the entire joined command line in an extra `"…"`
and set `windowsVerbatimArguments: true`. The outer wrap absorbs the
`/s /c` strip, leaving inner per-arg quoting intact, and the verbatim
flag tells Node to pass argv through to CreateProcess unchanged.

Changes:

- `packages/platform/src/index.ts`
  - Extend `CommandInvocation` with optional `windowsVerbatimArguments`.
  - Extract the cmd.exe shim builder into `buildCmdShimInvocation` and
    apply the outer wrap + verbatim flag in both `createCommandInvocation`
    and `createPackageManagerInvocation`.
  - Forward the flag through `spawnBackgroundProcess` and
    `spawnLoggedProcess`.
- `apps/daemon/src/server.ts` — agent spawn forwards
  `invocation.windowsVerbatimArguments`. This is the call site that
  hit #315 in the wild (Codex CLI `.CMD` shim, user dir with space).
- `tools/pack/src/win.ts` — `runPnpm` and `runNpmInstall` forward the
  flag through `execFileAsync`. Affects the Windows packaged-build
  pipeline when run from a path with spaces.
- `tools/dev/src/index.ts` — `runLoggedCommand` accepts and forwards the
  flag; `buildDesktop` propagates it from
  `createPackageManagerInvocation`. Affects local dev on Windows.

Tests:

- 9 new unit tests in `packages/platform/src/index.test.ts` stub
  `process.platform` so both Windows and POSIX branches run on every
  CI runner. Coverage:
    - POSIX pass-through.
    - Windows non-shim binary pass-through.
    - `.CMD` shim with spaces in the binary path (the #315 repro).
    - `.bat` shim parity.
    - Argv elements with spaces alongside the shim path.
    - Argv elements without whitespace stay unquoted.
    - `process.env.ComSpec` fallback.
    - `npm_execpath` short-circuit (cross-platform).
    - POSIX pnpm pass-through.
    - Windows pnpm wrapped through cmd.exe.

Closes #315.
2026-05-03 10:00:46 +08:00
lefarcen
62b01a6dbf
release: Open Design 0.2.0 (#297) 2026-05-02 22:28:59 +08:00
Kevin Tsai
c0589ed05e
fix(desktop): launch reliably on Windows from Electron-based parent shells (#292)
* fix(tools-dev): strip ELECTRON_RUN_AS_NODE before spawning desktop

Parent processes such as Electron-based IDEs may set ELECTRON_RUN_AS_NODE=1
in their environment for sidecar/script reuse. When tools-dev inherits this
env via process.env, the spawned electron.exe runs as plain Node and fails
to inject main-process APIs (app, BrowserWindow, protocol all become
undefined). Explicitly drop the variable before spawning so desktop always
boots in real Electron mode regardless of caller environment.

* fix(desktop): ensure BrowserWindow is visible on initial load

Windows focus-stealing prevention can leave detached-spawned GUI windows
minimized or hidden, even when constructed with show:true. Add a small
ensureWindowVisible helper that restores from minimized state and forces
show+focus after the placeholder URL loads. Cross-platform safe: only
acts when window is actually hidden or minimized, preserving any user
window-state adjustments.
2026-05-02 22:28:56 +08:00
Foximo24
a4fd4f949f
fix(tools-dev): use junction instead of dir symlink on Windows (#231)
ensureWebDevNodeModules() called fs.symlink(target, path, "dir") to
link apps/web/node_modules into the web runtime root. On Windows,
"dir" symlinks require either Administrator rights or Developer Mode
(SeCreateSymbolicLinkPrivilege). Standard non-elevated user accounts
without Developer Mode get EPERM and tools-dev exits before web ever
starts.

Junctions ("junction") are functionally equivalent for directory-only
links on the same volume, work for any user without elevation or
Developer Mode, and are silently treated as plain symlinks on POSIX.
The existing isSymbolicLink() check on the next launch still matches
junctions on Node.

Reproduced on Windows 11 + Node 24 with a non-elevated PowerShell
session and Developer Mode off.
2026-05-02 10:17:43 +08:00
yamsfeer
4510c69ba1
feat(tools-dev): add --prod flag and OD_HOST for headless server deployment (#222)
* feat(tools-dev): add --prod flag and OD_HOST for headless server deployment

- Lazy-load electronBinaryPath so daemon+web can start without Electron
- Add --prod flag to tools-dev start/run that sets NODE_ENV, OD_WEB_PROD,
  and OD_WEB_OUTPUT_MODE automatically for production Next.js builds
- Add OD_HOST env var support to daemon and web sidecar bind addresses
  (defaults to 127.0.0.1, set to 0.0.0.0 for remote access)
- Skip next.config.ts distDir override when OD_WEB_PROD=1 so production
  builds resolve the default .next directory

Closes #221

* fix: address PR review — hardcode daemon bind to 127.0.0.1, cache electron path

- Revert OD_HOST from daemon: daemon is a local privileged process and must
  only bind 127.0.0.1. Remote access goes through the web sidecar proxy.
- Web sidecar resolveDaemonOrigin now always uses 127.0.0.1, separated from
  the web bind address (OD_HOST) so the daemon proxy works correctly even
  when the web listener binds 0.0.0.0.
- Add OD_HOST character validation to reject clearly invalid values early.
- Cache electronBinaryPath getter result to avoid repeated require() calls.

---------

Co-authored-by: yamsfeer <yamsfeer@users.noreply.github.com>
2026-05-02 09:27:16 +08:00
Waleed978
89722379c5
fix(tools-dev): normalize web dev tsconfig paths on Windows (#174)
tools-dev generated a temp web tsconfig with Windows backslash relative paths in extends, which Next/TypeScript failed to resolve in some environments. Normalize runtime tsconfig/dist path strings to POSIX separators so dev config resolution works consistently across Windows/Linux/macOS.
2026-04-30 22:45:02 +08:00
nettee
3fb849d047
Fix chat runs surviving web disconnects (#146)
* fix chat runs surviving web disconnects

* fix chat run create abort propagation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon keepalive reconnect budget

Generated-By: looper 0.0.0-dev (runner=fixer, agent=gpt-5.5)

* fix daemon stream disconnect cancellation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon stream abort cancellation race

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon run cancellation semantics

* fix load

* doc

* 2

* add run refresh recovery

* fix active run refresh status

* fix reattach abort handling

* fix

* fix chat initial scroll

* fix daemon start failures

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix stop run status

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* extract daemon run service

* move prompt composition to daemon

* fix prompt module resolution

* fix project id generation

* add project run status

* add designs kanban view with awaiting_input status

- add grid/kanban view toggle on Designs tab; persist choice in localStorage
- introduce awaiting_input project display status (daemon-derived from
  unanswered <question-form>) so projects asking the user aren't shown
  as Completed; ordered between Running and Completed with amber accent
- hide transient queued state from users: coerce queued/starting to
  running in daemon /api/projects projection and drop the queued kanban
  column
- a11y polish on Designs cards: Space activation, aria-labels on delete,
  focus-visible outlines, reveal delete on focus-within and touch,
  prefers-reduced-motion handling
- kanban layout uses flex sizing instead of viewport math; scoped icon-
  only pill button rule fixes view-toggle icon alignment

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-04-30 20:16:46 +08:00
Marc Chan
ac9c239b1e
Improve tools-dev native addon diagnostics (#153)
Surface daemon log tails when tools-dev waits for daemon status and add targeted guidance for native Node addon ABI mismatches.

Generated-By: looper 0.0.0-dev (runner=worker, agent=gpt-5.5)
2026-04-30 17:11:59 +08:00
nettee
86c256ad56
Improve tools-dev web startup flow (#128) 2026-04-30 14:58:52 +08:00
PerishFire
a19c866d5b
Fix tools-dev default startup usability (#127)
Allow sidecar port zero for auto allocation and make lifecycle command output easier to read.
2026-04-30 14:50:44 +08:00
PerishFire
c6d11018a0
Refresh desktop integration control plane (#123)
* feat(dev): add desktop tools-dev control plane

* refactor(sidecar): split Open Design contracts

Move Open Design-specific sidecar protocol definitions into @open-design/contracts so sidecar and platform can remain descriptor-driven primitives.

* refactor(daemon): organize package sources

Keep daemon app code, tests, and sidecar entrypoints in separate package directories so each layer can be built and verified independently.

* chore(repo): streamline maintenance entrypoints

Centralize agent guidance by directory and reduce root command chains while preserving the existing build scope.

* docs: translate agent guidance to English

* fix(sidecar): tolerate stale IPC sockets

Remove stale Unix socket files only after confirming no listener is active, so tools-dev can restart after unclean shutdowns.
2026-04-30 14:23:53 +08:00