open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
lefarcen	df8a0faff6	feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent
AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode` speaks ACP JSON-RPC over stdio (see vela's `specs/current/runtime/manual-agent-run-openrouter.md`); per `docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat: 'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.
The new `defs/amr.ts` is the entire wiring: `buildArgs` returns `['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses `detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN` allowlist + install/docs URLs, so users can configure the per-agent env in Settings without leaking into other adapters.
Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns the documented `initialize` / `session/new` / `session/set_model` / `session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via `child_process.spawn` and drives a full turn through `attachAcpSession` and `detectAcpModels`, so the ACP transport contract for AMR is end-to-end verified locally even before a real `vela` binary is installed.
Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)
Deferred: real OpenRouter-backed turn through a built `vela` binary. The runtime def needs no changes for that path, only `VELA_RUNTIME_KEY` and `VELA_LINK_URL` in env (or Settings).
* fix(runtimes/amr): pin a concrete default model and bare openai ids
End-to-end validation against a freshly-built `vela` (nexu-io/vela@main) + OpenRouter surfaced two contract details the first AMR runtime def got wrong:
1. vela rejects `session/prompt` with `session/set_model must be called before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts skips set_model whenever the picked model is the synthetic 'default' id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The def now ships a concrete `gpt-5.4-mini` as both `fetchModels`' default option and `fallbackModels[0]`, which makes attachAcpSession always send a real `session/set_model` for AMR turns.
2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId it forwards to opencode's openai provider. With OpenRouter-style ids like `openai/gpt-5.4-mini`, opencode receives the double-prefixed `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`. The new fallback list ships the bare ids opencode's openai registry actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).
Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same way real vela does, so a regression that silently goes back to model: 'default' would surface as a fatal error in tests instead of a hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' / no 'openai/' prefix in fallbackModels, and a negative case that asserts session/prompt fails when no model is set.
Adds `apps/daemon/scripts/verify-amr-real-vela.mjs`, a small dev-time runner that drives `attachAcpSession` against a real `vela` binary and prints the daemon's chat events, so future protocol drift can be checked against an actual OpenRouter call.
Verified locally: `vela agent run --runtime opencode` + OpenRouter returns the prompted string ("AMR-E2E-PASS") through the full daemon pipeline; daemon test suite stays 2883/2883.
* fix(runtimes/amr): substitute concrete model when chat run sends 'default' A plugin-driven AMR run from the UI surfaced a real-world hole in the prior commit: json-rpc id 3: session/set_model must be called before session/prompt The Default-design-router plugin (and any caller that doesn't pin a real model) sends `model: 'default'` straight through, which the AMR runtime def cannot accept — vela rejects `session/prompt` without `session/set_model` and attachAcpSession skips set_model whenever model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the adapter's `fallbackModels` is not enough: the chat-run handler in server.ts still forwarded 'default' verbatim. This adds `resolveModelForAgent(def, resolved, env?)` as the single source of truth for the substitution: 1. If the caller picked a real id, pass it through. 2. Else, if `def.defaultModelEnvVar` is set and the daemon process env has a non-empty value for it, return that (operator escape hatch — see below). 3. Else, if the def's `fallbackModels` does NOT contain a 'default' id, return `fallbackModels[0].id`. 4. Else, return the original value (the historic shape — defs that list 'default' themselves are untouched). AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when opencode's openai-provider registry deprecates `gpt-5.4-mini` upstream, an operator can swap the fallback id without a code change by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev / od. Worth noting the env var must live in the daemon's `process.env` (Settings-UI per-agent env values only reach the spawned child, not the daemon's resolver) — the new field's docblock spells this out. Coverage: - `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all four resolver branches plus the env-override happy path / fallback / ignore-when-user-picked-a-real-id case. - `pnpm --filter @open-design/daemon typecheck` clean. 
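The four resolver branches listed above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: only the name `resolveModelForAgent`, its three parameters, and the branch order come from the commit message. The `AgentDef` and `ModelOption` shapes, the `DEFAULT_MODEL_ID` constant, the reading of "a real id" as "non-empty and not 'default'", and the empty-object default for `env` are assumptions.

```typescript
// Hypothetical shapes for illustration; the real runtime def has more fields.
interface ModelOption {
  id: string;
}

interface AgentDef {
  fallbackModels: ModelOption[];
  defaultModelEnvVar?: string;
}

// Assumed name for the synthetic 'default' model id.
const DEFAULT_MODEL_ID = "default";

function resolveModelForAgent(
  def: AgentDef,
  resolved: string | null | undefined,
  env: Record<string, string | undefined> = {}, // assumption: the daemon would pass its own process env
): string | null | undefined {
  // 1. A real id picked by the caller passes through untouched.
  if (resolved && resolved !== DEFAULT_MODEL_ID) return resolved;

  // 2. Operator escape hatch: a non-empty value in the def's env var wins.
  if (def.defaultModelEnvVar) {
    const override = env[def.defaultModelEnvVar]?.trim();
    if (override) return override;
  }

  // 3. Defs that do not list 'default' get their first concrete fallback id.
  const listsDefault = def.fallbackModels.some((m) => m.id === DEFAULT_MODEL_ID);
  const first = def.fallbackModels[0];
  if (!listsDefault && first) return first.id;

  // 4. Historic shape: defs that list 'default' themselves are left alone.
  return resolved;
}

// Example def modeled on the AMR description above.
const amr: AgentDef = {
  fallbackModels: [{ id: "gpt-5.4-mini" }, { id: "gpt-5.4" }],
  defaultModelEnvVar: "VELA_DEFAULT_MODEL",
};

const legacy: AgentDef = {
  fallbackModels: [{ id: "default" }, { id: "some-model" }],
};
```

With these example defs, `resolveModelForAgent(amr, "default")` yields `"gpt-5.4-mini"`, passing `{ VELA_DEFAULT_MODEL: "gpt-5.5" }` as `env` yields `"gpt-5.5"`, and `resolveModelForAgent(legacy, "default")` still returns `"default"`.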
* chore(runtimes/amr): move AMR to the top of the base agent list So `AMR (vela)` shows up first in the agent picker / status views, ahead of claude / codex. Pure ordering change; no behavior delta. * feat(amr): Sign-in / Sign-out button on the AMR Settings card The first half of the AMR work assumed the operator would set VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never surfaced login state to users. This adds the missing UX so a fresh install can drive the full path from Settings: - GET /api/integrations/vela/status reads ~/.vela/config.json for the active profile and returns { loggedIn, profile, user } (without leaking the runtime/control keys themselves). - POST /api/integrations/vela/login spawns `vela login` once (409 if one is already in flight). The vela CLI opens the user's browser to the device-authorization page itself — Open Design only needs to kick the subprocess off. - POST /api/integrations/vela/logout removes ~/.vela/config.json so the next status read returns logged-out. `AmrAgentCard` is a dedicated agent-card component for AMR because the existing `<button>` row can't host an interactive sub-control (nested interactive elements). It polls /status after a login click until the daemon reports loggedIn=true (or 5 minutes elapse), and exposes a Sign-out action on hover. Other adapters (claude, codex, hermes, …) keep their existing `<button>` card. i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.) added to en + zh-CN. Other locales spread `en` and inherit the English copy until translations land. Coverage: - `tests/integrations/vela.test.ts` pins the config.json reader against a tmp HOME — including the negative case where a profile has user info but no runtimeKey (still logged-out), and the secret-leak guard ("rt-secret-" must not appear in the projection payload). 
- `tests/components/AmrAgentCard.test.tsx` covers all four UI states (logged-out, logging-in, logged-in, logging-out) plus the click-propagation invariant the divergent card was built to keep. `pnpm --filter @open-design/daemon test` 2901 / 2901 passing. `pnpm --filter @open-design/web test` 1719 / 1719 passing. `pnpm typecheck` + `pnpm guard` clean. Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs` no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if VELA_PROFILE is set, the vela CLI is allowed to resolve credentials from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to `scripts/guard.ts` allowlist with the executable-fixture / dev-runner rationale. fix(connection-test): substitute model for AMR before attachAcpSession The chat-run path in server.ts already routes the requested model through `resolveModelForAgent` so AMR / vela (whose CLI demands an explicit `session/set_model` before `session/prompt`) gets the def's first concrete fallback id when the chat run ships `model: 'default'`. `connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })` directly, which made the Test Connection button on the AMR Settings card deadlock with the same `session/set_model must be called before session/prompt` error the chat-run path already handles — surfaced as a permanent "Testing connection…" spinner in the UI. Reuse the same helper here so Test Connection mirrors chat-run behavior. * test(amr): three-layer end-to-end coverage for the AMR login + turn flow The PR up to this point shipped runtime + UI code with unit-level Vitest coverage. This commit adds the cross-layer regression net the live demo relied on: 1. 
apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest) Spins up the real daemon Express app via `startServer({port:0,...})`, persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json, and exercises every /api/integrations/vela/* endpoint against the extended fake-vela stub: - status reads ~/.vela/config.json under various states - login spawns the fake, waits for config.json to appear, returns pid + startedAt + profile - 409 already-running guard with the stub's delay knob - logout removes the file (idempotent) - secrets (runtimeKey / controlKey) never leak in the projection - login → status round-trip flips loggedIn=false → true 2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest) Boots a namespaced daemon + web pair through `createSmokeSuite`, inlines a self-contained fake `vela` binary that handles BOTH `vela login` (writes ~/.vela/config.json) and `vela agent run --runtime opencode` (ACP stdio with the `session/set_model must precede session/prompt` gate the real binary enforces), then drives a complete /api/runs lifecycle for `agentId: 'amr', model: 'default'` and asserts the assistant message captures the fake's streamed text. This is the test that would have surfaced today's plugin-default-model regression (the `set_model before prompt` error) at PR time instead of demo time. 3. e2e/ui/amr-login-pill.test.ts (Playwright) Mocks /api/agents + /api/integrations/vela/{status,login,logout} to drive the Settings AMR card through the full Sign in → Signed in → Sign out cycle. Pins the AmrLoginPill polling contract and the aria-label semantics (the pill's accessible name is "Sign out" once logged in, regardless of which label the hover-state text shows). fake-vela.mjs extensions: - Handles `vela login` argv by writing ~/.vela/config.json for the active VELA_PROFILE and exiting 0 — mirrors real vela's on-disk side-effect without the device-auth loop. 
- FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the in-flight state of the spawn lifecycle. - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced user fields end-to-end. Validated: - `pnpm guard` + `pnpm typecheck` (all workspace projects) - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing, including the new 8-test integration suite. - `cd e2e && pnpm test tests/amr`: 1 / 1 passing. - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`: 1 / 1 passing (6.7s). * feat(amr): package native cli and refine login ui * feat(amr): wire vela cli beta packaging * docs(amr): document vela ci packaging review * docs(amr): refine vela ci integration review * fix(ci): refresh nix pnpm dependency hashes * fix(pack): clean up Vela CLI packaging * fix(pack): bundle Vela CLI support files * fix(amr): recover login attempts from stale auth state * test: expand AMR and automations coverage * fix(amr): address review follow-ups * test(web): align tasks fixtures with contracts * fix(daemon): type wildcard route params * fix(ci): refresh PR merge validation * fix(amr): clear env credentials on logout * feat(settings): inline local CLI model configuration * fix(amr): recognize daemon env credentials * [codex] Fix Vela companion packaging (#2979) * Fix Vela companion packaging * Update Nix pnpm dependency hashes * [codex] Surface AMR account failures (#2980) * fix: surface AMR account failures * fix: cover AMR recovery error guidance * chore: bump beta base version to 0.8.1 (#2990) * Fix AMR profile and packaged runtime review issues * Detect packaged AMR OpenCode companion tree * feat(web): polish AMR frontend flows * Polish AMR onboarding card * fix: read AMR login state from dot-amr config (#3048) * test: tighten AMR credential and packaging coverage * test: restore AMR executable test env helper * [codex] Fix packaged mac Dock identity and AMR label (#3076) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR live 
models and dot-amr login state (#3073) * fix: read AMR login state from dot-amr config * fix: load live AMR models before runs * fix: point AMR onboarding link to production wallet * fix: address AMR model review feedback * fix: persist live AMR model fallback * [codex] Fix AMR link catalog model ids (#3088) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR link catalog model ids * Fix AMR model normalization typecheck * Use live AMR model for default runs * fix: polish AMR runtime settings UI * Accelerate AMR startup defaults (#3092) * Surface AMR insufficient balance wallet URL (#3099) * fix(web): polish onboarding controls (#3112) * fix(web): show CLI scan loading state * Avoid duplicate AMR wallet recharge links (#3117) * Avoid duplicate AMR wallet recharge links * Use Vela CLI 0.0.3 test package * chore(nix): refresh pnpm deps hash * Fix AMR wallet guidance display --------- Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com> * chore(pack): pin Vela CLI 0.0.3-test.1 (#3127) * chore(nix): refresh pnpm deps hash * chore(pack): pin Vela CLI 0.0.3 * chore(nix): refresh pnpm deps hash * fix(web): suppress AMR exit 130 fallback (#3136) * feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083) * feat(web): nudge users to hosted AMR on model/auth/quota failures When a non-AMR agent run fails with an auth / quota / upstream model error, surface an inline nudge under the error pill linking to Open Design's hosted AMR gateway (https://open-design.ai/amr). The nudge fires `surface_view` (element=run_failed_toast) on impression and `ui_click` (element=go_amr) on the link. Also teach the daemon to classify CLI-agent auth/quota/upstream failures (Claude Code, codex, ...) into specific API error codes (AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of the generic AGENT_EXECUTION_FAILED, so both the error message and the nudge key off accurate codes. 
AMR's own runs are excluded from the nudge — they keep the dedicated sign-in / recharge affordances. * feat(web): rework failed-run AMR guidance into per-case error UI Replace the single inline nudge with a per-case failed-run experience driven by the run's error code + agent: - The error card is now neutral gray (was red) and always carries a retry button; it is driven by the persisted per-message error event so it survives a reload. - Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion card under the error card offers "switch to AMR & retry" — switches the run to AMR, opens Settings on the AMR card, and auto-retries once the account signs in (ProjectView polls vela login status, independent of the Settings pill lifecycle, with success / 5-min-timeout / unmount exits). - AMR agent unauthorized: clearer copy + an "authorize & retry" button. - AMR agent out of balance: clearer copy + a "top up" button to the AMR wallet, with manual retry. - Settings AMR card: when opened from the nudge, it scrolls into view and pulses, and an authorize-button coachmark (a fake hand cursor that rises in and dismisses on hover) points at the sign-in control when not yet authorized. analytics: surface_view (run_failed_toast) on the promotion card and ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.* and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall back to en) and drops the old chat.amrErrorGuidance keys. * fix(daemon): require status context for numeric service-failure codes Per review on #3083: the model-service classifier matched bare HTTP status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like `line 500`, `read 502 bytes`, or `exit code 401` could be misclassified as a provider outage / auth wall and wrongly surface the AMR nudge. 
Now a status number only counts when it carries explicit context (`HTTP 500`, `status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases (overloaded, bad gateway, service unavailable, rate limit, …) are unchanged. Adds fixtures proving unrelated numeric output stays null. * fix(web): keep error pill for failed runs ChatPane's card doesn't cover Per review on #3083: the per-message gray error pill was suppressed for every persisted error status event, but ChatPane only renders the replacement top-level error card for `retryableAssistantMessage` (the last failed assistant). So a failed turn that is no longer last (after a follow-up) or an older failed run in history showed neither the pill nor the card — its error detail vanished, undercutting reload/history survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose error the card represents); AssistantMessage suppresses only that one pill and keeps rendering StatusPill for all other error events. * fix(daemon): don't treat a process exit code as an HTTP status Follow-up to review on #3083: the status-context helper accepted a bare `code` prefix, so `exit code 401` / `process exited with code 429` still matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the very `exit code 401` case the comment calls out as noise). `code` now only counts when qualified (`status code` / `error code` / `response code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer matches. Adds fixtures for exit-code lines returning null. * chore(web): translate AMR card / error keys for 16 remaining locales PR #3083 added 10 new `chat.amrCard.` / `chat.amrError.` keys but only provided en/zh-CN/zh-TW translations; the other 16 locales fell back to English. Translate the card title/body, three chips, primary CTA, and the AMR self-error (auth / balance) messages and buttons for ar, de, es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk. 
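The status-context rule from the two fix(daemon) entries above can be sketched as follows. This is a hedged illustration only: the error codes and the example strings come from the commit messages, while the function name `classifyServiceFailure`, the regular expressions, the status-to-code table, and the mapping of textual phrases to codes are assumptions. The sketch also does not special-case a line such as `exit code: 401`, which the real helper may treat differently.

```typescript
type FailureCode = "AGENT_AUTH_REQUIRED" | "RATE_LIMITED" | "UPSTREAM_UNAVAILABLE";

// Assumed mapping from HTTP status to the API error codes named above.
const STATUS_TO_CODE: Record<string, FailureCode> = {
  "401": "AGENT_AUTH_REQUIRED",
  "429": "RATE_LIMITED",
  "500": "UPSTREAM_UNAVAILABLE",
  "502": "UPSTREAM_UNAVAILABLE",
  "503": "UPSTREAM_UNAVAILABLE",
};

// A status number counts only after explicit context: "HTTP", "status",
// a qualified "status/error/response code", or punctuation-bound "code:".
// A bare "code 401" (as in "exit code 401") deliberately does not match.
const CONTEXT_BEFORE =
  /\b(?:http|status(?:\s+code)?|(?:error|response)\s+code|code\s*[:=])\s*[:=]?\s*(401|429|500|502|503)\b/i;

// ...or when the number is followed by its HTTP reason phrase ("502 Bad Gateway").
const REASON_AFTER =
  /\b(401|429|500|502|503)\s+(?:unauthorized|too many requests|internal server error|bad gateway|service unavailable)\b/i;

function classifyServiceFailure(output: string): FailureCode | null {
  const match = CONTEXT_BEFORE.exec(output) ?? REASON_AFTER.exec(output);
  const status = match?.[1];
  if (status) return STATUS_TO_CODE[status] ?? null;

  // Textual provider phrases are matched without needing a number.
  if (/rate limit/i.test(output)) return "RATE_LIMITED";
  if (/overloaded|bad gateway|service unavailable/i.test(output)) return "UPSTREAM_UNAVAILABLE";

  // Unrelated numeric output ("line 500", "read 502 bytes") stays unclassified.
  return null;
}
```

Under these assumptions, `HTTP 500`, `status 503`, `code: 401`, and `502 Bad Gateway` are classified, while `line 500`, `read 502 bytes`, `exit code 401`, and `process exited with code 429` return `null`.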
* fix(amr): address review feedback on #2355 Targeted fixes for the unresolved review threads on #2355. Each fix includes / updates a focused test. - runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now verifies the inner `opencode` executable exists + is runnable, not just the directory. This closes the false-positive availability path that let `detectAgents()` surface AMR as available even when the packaged companion was empty / partially copied (mrcfps, 4 threads). - runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a stale `opencode` on the user's PATH, so packaged AMR builds can't be hijacked by a global installation. - web/EntryShell.tsx: when the Local CLI scan returns an available agent and the previously-selected agent is AMR, switch the selection to the first available local agent so the runtime and persisted agent agree before Continue. - server.ts (model-probe branch): for AMR, check `readVelaLoginStatus` BEFORE rejecting on an empty live-model catalog — a signed-out user was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of the correct `AMR_AUTH_REQUIRED` (sign-in affordance). - server.ts (default model fallback): if the user asked for the AMR agent default and the cached id is no longer in the FRESH catalog, fall back to `liveModels[0]` from the probe instead of rejecting the run as `AMR_MODEL_UNAVAILABLE`. - integrations/vela.ts: route `vela login` through `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat` shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with verbatim args (matches `execAgentFile` / chat-run spawning). - tools/pack/src/linux.ts: in containerized Linux builds, bind-mount the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env to the container-side path. 
The host path was being passed in as-is even though the default container only mounts /project, /tools-pack and cache/home — `copyOptionalVelaCliBinary` saw a missing path. Deferred (out of scope for this PR): - `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked for a separate focused PR. - Strict `--require-vela-cli` for Windows + mac-x64 beta builds: prematurely blocked — `@powerformer/vela-cli` only publishes the `darwin-arm64` platform binary today; adding the flag elsewhere would fail the builds. Revisit once win/x64/linux binaries ship. * fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ) The new signed-out AMR branch in the catalog preflight at server.ts:10875 calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the const declaration sat ~100 lines below at the outer function scope. Because `const` is TDZ-aware, that branch would have thrown `ReferenceError: Cannot access 'sendAmrAccountFailure' before initialization` for the exact users it tries to help — defeating the original intent. Hoist the helper to just above the AMR preflight block so it's available to every AMR code path in this function. Behavior elsewhere is unchanged. Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch uses packaged built-in Vela for AMR` was creating the `<resourceRoot>/bin/libexec/opencode/` companion directory only, but this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree` also requires the inner `opencode` executable. Add it to that fixture to match the new contract; the test was a sibling of the executables / env-and-detection fixtures already updated in `13fc4f4`. Addresses #2355 review (mrcfps, 2026-05-28). 
* feat(web): add hover cancel for AMR login (#3158) * feat(web): add hover cancel for AMR login * fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR flagged the same race in the new local-cancel path: `handleCancelLogin` dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener unconditionally re-enters `refresh()` and then restarts polling whenever `/api/integrations/vela/status` still reports `loginInFlight: true`. That is a real race because the daemon's `cancelVelaLogin()` only sends SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` = 2000 ms) and keeps the child in `activeLoginProcs` until it actually exits — so the first `/status` read after a successful cancel can legally still come back as in-flight. Under that window the pill flips back to 'Signing in…' and can later surface the timeout/error path even though the user already canceled, defeating the behavior promised in the PR description. Fix the listener instead of every dispatch site: in the `login-canceled` branch, after the local reset (stopPolling + setPending(null) + clear refs), optimistically mark every subscribed pill instance as not-in-flight (`setStatus((c) => c ? { ...c, loginInFlight: false } : c)`) and `return` — skip the refresh-and-reconcile branch below entirely. The next explicit refresh (component mount, user interaction, or a `status-changed` event) will pick up the daemon's confirmed state once the child has actually exited. Add a focused regression test that holds `/api/integrations/vela/status` at `loginInFlight: true` even after a successful `/login/cancel`, asserting that the pill stays at the Canceled → Authorize sequence and never bounces back to 'Signing in…'. 
This test fails on the pre-fix listener and passes on the new behavior; existing 'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser completion to Signed in after local cancel' tests continue to pass. Addresses review feedback on #3158 (chatgpt-codex-connector, nettee). --------- Co-authored-by: lefarcen <935902669@qq.com> --------- Co-authored-by: a1chzt <chizblank@gmail.com> Co-authored-by: Amy <1184569493@qq.com> Co-authored-by: Mason <jinmeihong0201@gmail.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-28 05:09:55 +00:00
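The `login-canceled` listener fix described for #3158 above can be sketched as a pure function. This is a simplified illustration under stated assumptions: the event reasons `login-canceled` and `status-changed`, the `loginInFlight` field, and the "mark not-in-flight and skip the refresh" behavior come from the commit message, while `VelaStatus`, `ListenerOutcome`, and `handleLoginStatusEvent` are hypothetical names, and the real listener lives inside a React component that also stops polling and clears refs.

```typescript
// Hypothetical, simplified state; the real status payload carries more fields.
interface VelaStatus {
  loggedIn: boolean;
  loginInFlight: boolean;
}

type LoginEventReason = "login-canceled" | "status-changed";

interface ListenerOutcome {
  status: VelaStatus | null;
  shouldRefresh: boolean;
}

function handleLoginStatusEvent(
  reason: LoginEventReason,
  current: VelaStatus | null,
): ListenerOutcome {
  if (reason === "login-canceled") {
    // The daemon may still report loginInFlight: true until the child exits,
    // so trust the local cancel and skip the refresh-and-reconcile step.
    return {
      status: current ? { ...current, loginInFlight: false } : current,
      shouldRefresh: false,
    };
  }
  // Any other event re-reads the daemon's confirmed state.
  return { status: current, shouldRefresh: true };
}
```

Keeping the decision in one place mirrors the commit's choice to fix the listener rather than every dispatch site.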
lefarcen	c14baf07d3	Merge origin/main into release/v0.8.0
PR #2461 sync prep: resolves 14 conflicts merging 84 main-side commits on top of 58 release-side commits accumulated during the 0.8.0 cycle.
Resolution summary:
Take main (theirs) where main carried deliberate forward progress:
- apps/web/src/components/PluginCard.tsx: 7 hunks, i18n migration: hardcoded English aria-labels/titles replaced with t() calls keyed on pluginCard.* (all 8 keys verified present in en.ts).
- apps/web/src/components/TasksView.tsx: 1 hunk, source-ingestion feature: sortedRoutines (newest-first), sourceIngestionTemplates, patchSourceForm, submitSourceIngestion. activeCount/pausedCount semantics preserved (now keyed on sortedRoutines, count unchanged).
- e2e/ui/app.test.ts: new node:fs/promises + tmpdir + path + @/timeouts imports needed by main-side test helpers.
- e2e/ui/settings-local-cli-codex-fallback.test.ts: menu-dismissal helper block added by main.
Keep both sides where each added a different field to the same object literal:
- apps/web/src/components/ProjectView.tsx (locale + analyticsHints spread).
- apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints).
Take release (ours) where release carried deliberate work that ships 0.8.0:
- CHANGELOG.md: release-side 0.8.0 entry + PR link refs; main's Unreleased section was the same body of work, now finalized.
- apps/landing-page/public/{apple-touch-icon,favicon}.png + apps/web/public/app-icon.svg: release-side visual refresh assets consistent with 0.8.0 stable ship.
- tools/pack/src/linux.ts: packageVersion const required by line 466; taking main's empty line would build-error.
- e2e/ui/project-management-flows.test.ts + e2e/ui/settings-api-protocol.test.ts + e2e/ui/settings-memory-routines.test.ts: release-side release-smoke hardening (shangxinyu1 + PerishFire) takes precedence on overlap.
Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.	2026-05-23 12:17:18 +08:00
Marc Chan	a3872b97a9	fix(tools-dev): preserve web origin trust on web start (#2715)
* fix(tools-dev): preserve web origin trust on web start
Restart daemon/web when the trusted web port is missing, and reuse the active web port during repeated starts so run web and start web keep app-config origin checks aligned.
Generated-By: looper 0.0.0-dev (runner=worker, agent=opencode)
* fix(plugins): refresh official registry bundled count
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* fix(tools-dev): preserve daemon/web reserved ports
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* fix(tools-dev): preserve daemon reuse on web start
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* fix(tools-dev): preserve running daemon port on web reuse
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* fix(tools-dev): reserve explicit web port before daemon allocation
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* test(web): stabilize media provider reload flash timing
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* fix(web): restore merged reattach workspace coverage
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* fix(tools-dev): reserve allocated daemon port
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
* test(e2e): wait for artifact manifest persistence
Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-23 00:25:43 +08:00
lefarcen	50a4dc8a62	Merge origin/main into release/v0.8.0	2026-05-21 13:17:52 +08:00
Patrick A	85276df284	chore(deps): patch security override and patch bumps (#2306)
- Add pnpm override: protobufjs 8.4.0 (CVE-2026-45740, GHSA-jggg-4jg4-v7c6)
- Bump postcss 8.5.14 -> 8.5.15 in apps/web (and root override)
- Bump tsx 4.22.2 -> 4.22.3 across all workspace packages
Co-authored-by: Patrick A <259201958+eefynet@users.noreply.github.com>	2026-05-21 11:51:54 +08:00
lefarcen	722ddfa235	Merge origin/main into release/v0.8.0
Conflicts resolved by taking origin/main on both files.
Root cause: main's PR #2460 (fix(landing): align logo.webp with brand icon) changed HomeHero.tsx's .home-hero__brand-mark to render <img src=/app-icon.svg> instead of an inlined <HeroBrandIcon /> SVG, and bundled the matching CSS (26px round badge with bg-panel + border + padding 2px) plus a gap/font-size tune. The release-side visual-refresh CSS still targeted the SVG layout (38px square, transparent, inset SVG selector). Keeping release's CSS would leave main's <img> unstyled.
- apps/web/src/styles/home/home-hero.css three blocks, all taken from main: .home-hero__brand gap 8px, .home-hero__brand-mark redesigned for <img> child, .home-hero__brand-name font-size 16px.
- apps/web/src/index.css two blocks, both taken from main: workspace tab close column 22px and .workspace-tab__close 18x18 (paired tune-down of tab UI spacing).	2026-05-20 22:28:38 +08:00
Eli-tangerine	8193981511	Keep PR 2400 changes without folder pickers (#2462 ) * feat(daemon): add project working directory management and editor hand-off functionality - Introduced new flags for project commands to manage working directories, including `--working-dir` and `--dir`. - Implemented API routes for listing available editors and opening projects in selected editors. - Added a hand-off button in the ChatPane header to facilitate opening project folders in local applications. - Enhanced the HomeHero component to include working directory and design system settings, improving user experience in project creation. - Created HomeHeroSettingsChips component for inline management of working directory and design system selection. * feat(chat): implement voice transcription proxy and enhance UI components - Added a new API route for voice transcription using OpenAI's `/audio/transcriptions` endpoint, allowing users to send audio blobs directly for transcription. - Integrated multer for handling audio file uploads in memory, ensuring efficient processing without disk storage. - Updated the HomeHero component to include example prompt suggestions for plugins, enhancing user interaction. - Introduced the EditorIcon component to visually represent different editors in the hand-off menu, improving the user experience. - Refined the HandoffButton component to utilize the new EditorIcon, providing a more cohesive interface for selecting editors. - Enhanced CSS styles for various components to improve layout and responsiveness, including adjustments to tab and button sizes for better usability. * style(workspace-shell): enhance layout and overflow handling - Updated CSS for .workspace-shell to ensure full viewport width and height, with proper overflow management. - Adjusted grid layout to prevent content overflow and maintain responsiveness. - Modified styles for .workspace-tabs-chrome to improve width handling and prevent overflow issues. 
* refactor(chat): remove voice transcription proxy and related components - Deleted the voice transcription proxy implementation, including the associated API route and multer configuration. - Removed the MicButton component from the ChatComposer and HomeHero components to streamline the UI. - Updated HomeHero to include example suggestions without the voice input functionality. - Adjusted CSS styles for various components to maintain layout consistency after the removal of the MicButton. * feat(daemon): implement minting of HMAC tokens for working directory management - Added a new function `mintImportTokenFromCurrentSecret` to generate HMAC tokens bound to a specified base directory, enhancing security for working directory operations. - Updated the `desktop-auth.ts` file to include the new token minting functionality, which returns structured errors when the desktop auth secret is cleared. - Introduced new IPC message types for minting import tokens in the sidecar protocol, allowing seamless integration with the daemon's working directory management. - Enhanced the `WorkingDirPill` component to utilize the new token minting flow for secure directory selection in desktop builds. - Updated CSS styles for the HomeHero component to accommodate new example suggestion features and maintain layout consistency. * fix(HomeView): import HOME_HERO_CHIPS constant for improved chip management - Updated the HomeView component to import the HOME_HERO_CHIPS constant from the chips module, enhancing the management of hero chips within the component. * feat(daemon): implement mintImportTokenViaSidecar for secure working directory management - Introduced the `mintImportTokenViaSidecar` function to facilitate the minting of HMAC tokens for desktop-import operations via the daemon's sidecar IPC. This allows CLI commands to bypass authentication when the desktop-auth gate is active. 
- Updated the CLI to utilize the new token minting function when setting the working directory, ensuring secure access to trust-gated API endpoints. - Enhanced the sidecar server to handle minting requests and return structured error messages for improved user feedback. - Added tests to validate the new token minting functionality and its integration with the working directory management process. - Refactored related components to support the new token flow, improving overall security and user experience. * feat(HomeHero): enhance UI components and styles for improved user experience - Updated HomeHero component to replace active dot indicators with Plug icons for better visual representation of active plugins. - Adjusted CSS styles for various elements, including padding and dimensions, to enhance layout consistency and responsiveness. - Introduced new styles for active type icons and improved hover effects for buttons. - Updated HomeHeroSettingsChips to change button titles and icons for clarity. - Added tests to ensure proper rendering and functionality of updated components. * feat(ProjectDesignSystemPicker): enhance design system selection with preview functionality - Updated the ProjectDesignSystemPicker component to include a preview feature for design systems, allowing users to see a preview of the selected design system. - Implemented hover functionality to update the preview based on the hovered design system. - Added fullscreen preview capability for a more immersive experience. - Enhanced CSS styles for the design system picker to improve layout and responsiveness. - Introduced tests to validate the new preview functionality and ensure proper interaction within the component. * feat: refactor project metadata handling and enhance design system picker - Updated the default scenario plugin ID retrieval to use project metadata, improving the logic for determining the appropriate plugin based on project intent. 
- Enhanced the ProjectDesignSystemPicker and related components to support localized design system summaries and categories, improving user experience. - Introduced new translations for working directory and design system picker components, ensuring better accessibility and usability across different locales. - Added a new 'live-artifact' project type to the HomeHero chips, expanding the functionality for users creating refreshable artifacts. - Updated tests to validate the new project metadata handling and design system picker functionalities. * feat: enhance localization and styling for design system components - Added French translations for working directory and design system picker components, improving accessibility for French-speaking users. - Updated CSS styles for the pet task item to ensure consistent padding and layout. - Introduced a new test suite for HomeHeroSettingsChips to validate localization and design system selection functionality. - Enhanced ProjectDesignSystemPicker tests to ensure proper localization and interaction with design system categories. * fix: update .gitignore to include all claude-sessions directories and remove specific session files - Modified .gitignore to ensure all claude-sessions directories are ignored by using a wildcard pattern. - Deleted two specific claude-sessions markdown files to clean up unnecessary session data. * fix: repair home automation ci regressions * fix: stabilize artifact consistency e2e * Remove folder picker changes from PR 2400 --------- Co-authored-by: pftom <1043269994@qq.com> Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-05-20 22:07:30 +08:00
lefarcen	aedbb9dbe4	release: Open Design 0.8.0 Bumps 14 workspace package.json files from 0.7.0 to 0.8.0: - root, apps/{web,daemon,desktop,landing-page} - packages/{contracts,host,platform,sidecar,sidecar-proto} - tools/{dev,pack,pr}, e2e apps/packaged was already at 0.8.0 from the preview lane. Independently versioned packages keep their own tracks. Adds CHANGELOG [0.8.0] - 2026-05-20 entry covering the 305 PRs merged since 0.7.0 by 75 contributors: - Plugin engine rebuild + Plugin Registry surface - Headless by default (desktop is thin wrapper around CLI) - Critique Theater Phases 9 through 16 - 149 design systems with structured tokens.css - Italian locale + CJK font fallback - Leonardo.ai, ElevenLabs, SenseAudio providers - Windows packaged auto-update - Visual refresh + Quick-brief discovery overhaul - PostHog v2 analytics - Manual edit UX overhaul	2026-05-20 21:22:17 +08:00
Eli	18b947c25f	[codex] Land design system GitHub intake handoff (#2187 ) * Add Claude-style design system workflow * Merge design system workflow into main * Restore design system workflow UI styles * Fix design system setup scrolling * Fix design system setup connector button * Preserve connector auth link after popup block * Simplify connected GitHub setup state * Open generated design system workspace project * Summarize design system auto prompt in chat * Add bounded GitHub connector design intake * Prefer path-scoped GitHub intake tools * Restore branch GitHub design context intake * Restore design system review workspace * Restore design system manager tab * Let design system workflow routes own details * Open editable design systems as projects * Restore design system workspace coverage * Fix bounded GitHub connector intake * Hide design system review while generating * Suppress design system generation questions * Constrain GitHub design intake to bounded command * Tolerate oversized GitHub metadata during intake * Rebuild daemon CLI when sources change * Fallback when GitHub connector snapshots are rate limited * Allow GitHub intake without Composio * Use native GitHub auth for design intake * Remove design system review group heading * Improve design system extraction evidence * Align design system scaffold with Claude output * Add evidence inventory for design system intake * Add local design system evidence intake * Add design system package audit gate * Allow auditing Claude Design reference packages * Audit design system package content quality * Migrate legacy design system artifacts * Clean migrated design system artifacts * Require modular design system UI kits * Reject thin design system UI kits * Prioritize core design evidence intake * Require role-based design system UI kits * Clean stale design system manifest references * Require representative preserved design assets * Warn on generic design system visuals * Enforce design system quality 
warnings * Audit connected design system UI kits * Require mounted design system UI kits * Require composed design system app shells * Require runnable JSX design system kits * Require browser globals for design system components * Infer design system names from source URLs * Require source examples in design system packages * Bind preserved fonts in design system tokens * Require skill frontmatter in design system packages * Preserve build icons in design system packages * Require real assets in brand previews * Require substantive source examples * Require product overview in design system README * Require reusable UI kit README * Require reusable design system skill docs * Seed Claude-style UI kit entry contract * Preserve runtime build assets in design packages * Audit design system packages after generation * Audit design system first-run output * Audit source-backed preview cards * Align design system UI kit scaffolds * Materialize design evidence package artifacts * Show project chat during design system setup * Hand off design system setup to project chat * Auto-repair design system audit failures * Harden design system evidence preservation * Tighten design system package guidance * Add targeted design system repair guidance * Bound design system audit auto repair * Use connector statuses in design system setup * Audit design system preview manifests * Require README preview manifests for design systems * Fix design system GitHub intake handoff * Fix daemon prompt CI assertions	2026-05-19 14:30:17 +08:00
PerishFire	bd48c597b0	chore: pin dependency versions and harden CI caches (#2189) * chore: pin dependency versions * ci: enforce pinned dependency specs * ci: fix pnpm executable invocation	2026-05-19 13:58:27 +08:00
PerishFire	4424f08be0	[codex] Add packaged desktop auto-update (#1375) * Add packaged desktop auto-update * Handle counted beta nightly update versions * Refresh desktop auto-update branch for main * Serialize desktop updater operations * Refresh auto-update branch for packaged paths	2026-05-19 11:20:05 +08:00
Qiaochu Hu	d55f05fcfa	fix: remove dead ternary in WORKSPACE_ROOT resolution (#487) * fix: remove dead ternary in WORKSPACE_ROOT resolution Both tools/dev and tools/pack config files had: ENTRY_DIR_NAME === "dist" ? "../../.." : "../../.." with identical branches. Since `src/` and `dist/` are siblings under `tools/{dev,pack}/`, both resolve to the same path. The ternary and ENTRY_DIR_NAME constant were dead code — simplify to "../..". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Fix workspace root depth --------- Co-authored-by: Test User <test@example.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: PerishCode <perishcode@gmail.com>	2026-05-18 15:50:07 +08:00
Nagendhra Madishetti	38a5ab69e6	feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
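The defensive behaviour listed above (runId guard, sticky terminal phases, late parser warnings, reboot on a new run) can be pictured with a small pure reducer. This is a minimal sketch under stated assumptions, not the code from the PR: `SketchEvent`, `SketchState`, and `reduce` are hypothetical, trimmed-down stand-ins for `PanelEvent`, `CritiqueState`, and the real reducer.

```typescript
// Hypothetical, simplified shapes; the real PanelEvent / CritiqueState carry more fields.
type Phase = "idle" | "running" | "shipped" | "degraded" | "interrupted" | "failed";

type SketchEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string }
  | { type: "failed"; runId: string }
  | { type: "parser_warning"; runId: string; kind: string };

interface SketchState {
  phase: Phase;
  runId: string | null;
  warnings: string[];
}

const initialState: SketchState = { phase: "idle", runId: null, warnings: [] };

const TERMINAL: ReadonlySet<Phase> = new Set<Phase>([
  "shipped",
  "degraded",
  "interrupted",
  "failed",
]);

function reduce(state: SketchState, event: SketchEvent): SketchState {
  // A run_started for a different runId discards the prior state and reboots.
  if (event.type === "run_started" && event.runId !== state.runId) {
    return { phase: "running", runId: event.runId, warnings: [] };
  }
  // Stray traffic for another run: return the same reference so subscribers do not re-render.
  if (event.runId !== state.runId) return state;
  // Late parser warnings go to a side channel without changing phase.
  if (event.type === "parser_warning") {
    return { ...state, warnings: [...state.warnings, event.kind] };
  }
  // Terminal phases are sticky against further lifecycle events for the same run.
  if (TERMINAL.has(state.phase)) return state;
  switch (event.type) {
    case "ship":
      return { ...state, phase: "shipped" };
    case "failed":
      return { ...state, phase: "failed" };
    default:
      return state;
  }
}
```

Returning the identical `state` reference on ignored events is what lets React's `useReducer` skip a re-render, which is the stable-identity guarantee the commit message mentions.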
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
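The "malformed NDJSON lines are skipped" behaviour described above might look roughly like the sketch below. It is illustrative only: `EventLike`, `isEventLike`, and `parseTranscript` are hypothetical names, and `isEventLike` is a much weaker stand-in for the shared `isPanelEvent` predicate.

```typescript
// Hypothetical minimal event shape; the real check is the contracts-level isPanelEvent.
interface EventLike {
  type: string;
  runId: string;
}

function isEventLike(value: unknown): value is EventLike {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const type = record["type"];
  const runId = record["runId"];
  return typeof type === "string" && typeof runId === "string" && runId.length > 0;
}

// Parse an NDJSON transcript: one JSON object per line, blank lines ignored.
function parseTranscript(ndjson: string): EventLike[] {
  const events: EventLike[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      // Well-formed JSON that is not an event is dropped as well.
      if (isEventLike(parsed)) events.push(parsed);
    } catch {
      // Malformed line: skip it so the valid events around it still land.
    }
  }
  return events;
}
```

Parsing the whole transcript up front, before any pacing happens, is what allows an `instant` replay to flush every event in one pass.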
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
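The cursor-preservation idea in the replay fix above can be sketched without React. This is a hedged illustration, not the hook's implementation: `createReplayCursor` is a hypothetical helper, and its closure variable stands in for the ref that holds the playback position.

```typescript
type Speed = "paused" | "instant";

// Hypothetical stand-in for the ref-held cursor: `position` survives speed changes.
function createReplayCursor<E>(events: readonly E[], dispatch: (event: E) => void) {
  let position = 0;

  // Dispatch exactly one event, the way a single interval tick would.
  const step = (): boolean => {
    if (position >= events.length) return false;
    const next = events[position] as E;
    position += 1;
    dispatch(next);
    return true;
  };

  return {
    step,
    // paused holds events back; instant flushes the remainder from the current position.
    apply(speed: Speed): void {
      if (speed === "paused") return;
      while (step()) {
        // keep draining until the transcript is exhausted
      }
    },
    isDone: (): boolean => position >= events.length,
  };
}
```

Because the position is never reset when the speed changes, flipping from paused to instant resumes where playback stopped, so `run_started` is not dispatched a second time.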
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
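The two SSE hardening steps from the review passes above (channel-derived `type` wins over the payload, then a per-variant shape check) can be sketched together. This is a minimal illustration under assumptions: `decodeFrame` and `DecodedEvent` are hypothetical, only two variants are modelled, and the real `ship` check also covers `status` and `artifactRef`.

```typescript
// Hypothetical, trimmed-down variants; the real PanelEvent union is larger.
type DecodedEvent =
  | { type: "run_started"; runId: string }
  | { type: "ship"; runId: string; round: number; composite: number };

const CHANNEL_PREFIX = "critique.";

function decodeFrame(channel: string, rawData: string): DecodedEvent | null {
  if (!channel.startsWith(CHANNEL_PREFIX)) return null;
  let data: unknown;
  try {
    data = JSON.parse(rawData);
  } catch {
    return null; // malformed payloads are dropped, not thrown
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;

  // Spread the payload first so the channel-derived type is authoritative.
  const candidate: Record<string, unknown> = {
    ...(data as Record<string, unknown>),
    type: channel.slice(CHANNEL_PREFIX.length),
  };

  const runId = candidate["runId"];
  if (typeof runId !== "string" || runId.length === 0) return null;

  // Per-variant shape check: reject frames missing required fields.
  switch (candidate["type"]) {
    case "run_started":
      return { type: "run_started", runId };
    case "ship": {
      const round = candidate["round"];
      const composite = candidate["composite"];
      if (typeof round !== "number" || typeof composite !== "number") return null;
      return { type: "ship", runId, round, composite };
    }
    default:
      return null;
  }
}
```

Rebuilding the event from validated fields, instead of casting the parsed payload, means a consumer calling `composite.toFixed(1)` never sees `undefined`.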
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. 
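The TTL registry with a clock seam from Phase 10.1 above could be sketched as follows. Treat it as an assumption-laden illustration: `createDegradedRegistry`, `markDegraded`, and `isDegraded` are hypothetical names (only `clearDegraded`, the three reasons, and the 24h default come from the commit message).

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";

interface DegradedRecord {
  reason: DegradedReason;
  source: string;
  expiresAt: number; // epoch milliseconds
}

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default named in the commit message

// `now` is the clock seam: tests inject a fake clock, callers otherwise get Date.now.
function createDegradedRegistry(now: () => number = Date.now, ttlMs: number = DEFAULT_TTL_MS) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapterId: string, reason: DegradedReason, source: string): void {
      records.set(adapterId, { reason, source, expiresAt: now() + ttlMs });
    },
    // Expiry is evaluated lazily on read, so nobody has to call clearDegraded.
    isDegraded(adapterId: string): boolean {
      const record = records.get(adapterId);
      return record !== undefined && now() < record.expiresAt;
    },
    clearDegraded(adapterId: string): void {
      records.delete(adapterId);
    },
  };
}
```

Checking `expiresAt` at read time is one simple way to get the property the tests describe: an expired entry stops counting as degraded purely because the clock advanced.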
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * docs: Critique Theater user guide (Phase 14.1) Seven sections aimed at end users (not contributors): 1. What is Design Jury 2. How it works (the five panelists, auto-converging rounds, the composite formula) 3. Settings (the M1 toggle and what it does) 4. Reading the score badge 5. Replay surface 6. Troubleshooting (degraded, interrupted, failed) 7. FAQ The composite formula is documented as designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2 because anyone trying to reverse-engineer the score is going to search for those weights and the docs are the place they should land first. * docs(daemon): critique module AGENTS map (Phase 14.2) Daemon-side wayfinder for the apps/daemon/src/critique directory. Tables every file, what owns what invariant, and the 'when you change anything here' guide so a future contributor does not have to reverse-engineer the rollout resolver before adding a new SSE event. * docs(web): Theater module AGENTS map (Phase 14.3) Web-side mirror of the daemon AGENTS map. Same file table, same invariants section, same change-impact guide, sized to the Theater component package. 
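The composite formula quoted in the user-guide commit above can be written out as a weighted sum. This is a sketch that simply transcribes the weights as stated in the commit message; the `Role` type, `WEIGHTS` table, and `composite` function are hypothetical names, and whether the daemon computes the score this way is not confirmed by the log.

```typescript
type Role = "designer" | "critic" | "brand" | "a11y" | "copy";

// Weights exactly as quoted in the commit message (they sum to 1.0).
const WEIGHTS: Record<Role, number> = {
  designer: 0,
  critic: 0.4,
  brand: 0.2,
  a11y: 0.2,
  copy: 0.2,
};

// Weighted sum of per-role scores.
function composite(scores: Record<Role, number>): number {
  const roles = Object.keys(WEIGHTS) as Role[];
  return roles.reduce((sum, role) => sum + WEIGHTS[role] * scores[role], 0);
}
```

With a designer weight of 0 as quoted, the designer's own score does not move the composite in this sketch; only the other four panelists contribute.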
* feat(daemon): rollout flag resolver (Phase 15.1) Single decision point every caller consults to know whether the orchestrator should wire the critique pipeline for a given run. Priority: 1. Skill-level policy (required wins, opt-out wins inversely) 2. Per-project override from the Settings toggle 3. OD_CRITIQUE_ENABLED env override 4. Rollout phase default M0 dark-launch false M1 settings only false (toggle is off until the user flips it) M2 per-skill true if skill opted in M3 global default true OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input so a fresh install never surprises a user with the feature on. 10/10 vitest cases green covering every cell of the matrix. * feat(web): Settings toggle hook for Critique Theater (Phase 15.2) React hook that reads critiqueTheaterEnabled from the existing open-design:config localStorage blob and stays in sync via: - the platform storage event (cross-tab) - an open-design:critique-theater-toggle CustomEvent (same-tab) Same-tab event is the one that fires when the Settings panel saves in the current window: the toggle and every mounted theater update without a page reload. setCritiqueTheaterEnabled(next) is the imperative setter the Settings panel calls. It preserves the rest of the stored config (mode, apiKey, etc.) and dispatches the same-tab event after the localStorage write. The web hook reflects what the user toggled; the daemon-side isCritiqueEnabled is the final routing authority (project override, env, rollout phase). When they disagree, the daemon wins for backend gating and the web reflects the toggle state. 6/6 vitest cases green covering first read, stored read, same-tab event flip, config preservation, corrupted JSON tolerance, and cross-tab storage event. 
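The priority matrix from the Phase 15.1 message above can be expressed as a short decision function. This is a minimal sketch, not the daemon's `isCritiqueEnabled`: the input shape, the `isCritiqueEnabledSketch` and `parseRolloutPhase` names, the `"opt-in"` policy value, and the exact-match phase parsing are assumptions made for illustration.

```typescript
type RolloutPhase = "M0" | "M1" | "M2" | "M3";
type SkillPolicy = "required" | "opt-out" | "opt-in" | null;

// Hypothetical input shape; null means "no opinion at this level".
interface ResolverInput {
  phase: RolloutPhase;
  skillPolicy: SkillPolicy;
  projectOverride: boolean | null;
  envOverride: boolean | null;
}

// Unknown or missing input falls back to M0 so a fresh install stays dark.
function parseRolloutPhase(raw: string | undefined): RolloutPhase {
  return raw === "M1" || raw === "M2" || raw === "M3" ? raw : "M0";
}

function isCritiqueEnabledSketch(input: ResolverInput): boolean {
  // 1. Skill-level policy: required forces on, opt-out forces off.
  if (input.skillPolicy === "required") return true;
  if (input.skillPolicy === "opt-out") return false;
  // 2. Per-project override from the Settings toggle.
  if (input.projectOverride !== null) return input.projectOverride;
  // 3. Environment override.
  if (input.envOverride !== null) return input.envOverride;
  // 4. Rollout phase default: M0 and M1 off, M2 only for opted-in skills, M3 on.
  if (input.phase === "M3") return true;
  if (input.phase === "M2") return input.skillPolicy === "opt-in";
  return false;
}
```

Ordering the checks from most specific to most general keeps each level's veto local: a later level is consulted only when every earlier level has no opinion.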
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320) lefarcen P2 on PR #1320 flagged that the PR body claimed safe behavior for disabled localStorage, non-object JSON, and missing CustomEvent shim, but the suite only covered corrupt JSON plus happy-path storage events. Added four failure-mode tests so the swallowed errors are not silently traded for a throw in a future refactor: 1. Returns false on a stored JSON value that parses to an array (non-object). Catches a regression where the guard treats anything truthy as a config blob. 2. Returns false on a stored JSON value of literal 'null'. typeof null === 'object' in JS, so the guard has to check null explicitly; this test pins that check. 3. Returns false when localStorage.getItem throws (private mode / disabled storage / SecurityError). The hook must swallow and return false so the rest of the app keeps rendering. 4. setCritiqueTheaterEnabled still dispatches the same-tab CustomEvent when localStorage.setItem throws (quota exceeded / disabled storage). The dispatch path is the in-session broadcast that keeps every mounted hook coherent even when persistence is unavailable; verified by mounting two probes and asserting both flip after the setter is called with a throwing setItem. 10/10 vitest cases green (6 existing + 4 new). * fix(web): honor CustomEvent payload in toggle hook listener (PR #1320) Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same real bug in the failure-mode test I added in `affcdd27`: the test asserts the in-session UI flips when localStorage.setItem throws, but the CustomEvent listener was ignoring the event's typed detail and just calling readToggle(). Under a throwing setItem the localStorage value is stale (or absent), so the listener would see the OLD value and the test would fail (or worse, the production claim 'in-session event keeps mounts coherent' was hollow). 
Fixed the hook, not the test: the listener now reads event.detail.enabled when it is a boolean, falling back to readToggle() only for malformed events or for cross-tab storage events (which do not carry a typed payload). The setter already dispatched the detail; the listener just was not consuming it. Test changes: - The existing 'setItem throws' test now asserts the right behavior for the right reason. Updated the inline comment to say the listener reads from detail, not localStorage. - New test 'falls back to readToggle when the CustomEvent carries no usable detail' pins the fallback path: a malformed dispatcher (no detail, or detail.enabled not a boolean) degrades cleanly instead of throwing or being silently ignored. 11 / 11 vitest cases green (10 prior + 1 new fallback). * feat(daemon): route critique spawn-path eligibility through the rollout resolver This is the wireup edit that Phase 10 and Phase 15 carved out: today server.ts gates the critique pipeline on critiqueCfg.enabled, which is just the OD_CRITIQUE_ENABLED env var. After this commit it gates on isCritiqueEnabled(...) from the Phase 15 resolver, so the full priority matrix is live: 1. Per-skill od.critique.policy veto (opt-out / required) 2. Per-project override (M1 Settings toggle, written through the existing Phase 6 settings endpoint) 3. OD_CRITIQUE_ENABLED env override (power-user lane / CI fixtures) 4. OD_CRITIQUE_ROLLOUT_PHASE default: M0 dark-launch: false; M1 settings only: false; M2 per-skill: only when skillPolicy === 'opt-in'; M3 global default: true. Default behaviour on a fresh install is unchanged: the resolver returns false at M0 without an env override or a project override, so prod traffic falls through to the legacy single-pass path exactly the way it did before. Inputs threaded today: phase from OD_CRITIQUE_ROLLOUT_PHASE, envOverride from OD_CRITIQUE_ENABLED. 
skillPolicy and projectOverride are passed as null for the v1 cutover; the daemon-side handler that round-trips critiqueTheaterEnabled on the project settings row and the od.critique.policy frontmatter resolver land as the next two commits in this branch. The three call sites that used critiqueCfg.enabled (the brand-thread guard, the skill-thread guard, the top-line critiqueShouldRun compound) now read from a single locally-scoped critiqueEnabledForRun boolean, so the eligibility check is computed exactly once per spawn and the prompt composer + orchestrator stay in lockstep the way the existing comment already promised. Tests still green: daemon vitest 22 / 22 across rollout + conformance + adapter-degraded. Daemon typecheck clean. * feat(web): mount CritiqueTheaterMount in ProjectView The web counterpart of the daemon wireup. ProjectView now renders <CritiqueTheaterMount projectId={project.id} enabled={...} /> as a sibling of <AppChromeHeader> inside the top-level <div className="app">. The mount is the drop-in from the Phase 9 stack: it owns the SSE subscription, the kill-request handshake, and the phase-aware swap from the live <TheaterStage> to the collapsed badge once a run settles. The mount returns null until the daemon emits a critique.run_started for the active project, so the visual surface is byte-for-byte unchanged for users who have not opted in. Enabled wiring: useCritiqueTheaterEnabled() reads the M1 Settings toggle from the existing open-design:config localStorage blob and stays in sync with both the platform storage event (cross-tab) and the same-tab open-design:critique-theater-toggle CustomEvent the Phase 15 setter dispatches. The hook honors the event payload directly so a private-mode browser that cannot persist the toggle still updates the in-session UI correctly. The daemon-side gate (isCritiqueEnabled in apps/daemon/src/server.ts) remains the authority for whether a run is actually wired through the critique pipeline. 
This hook only governs whether the web layer renders the resulting SSE stream when the daemon emits one. The two-layer gate is intentional: an integrator embedding the Theater in a custom UI can flip the web visibility independent of the daemon's routing decision, and a daemon-side env override flips backend gating without touching the web's localStorage. Tests still green: web Theater suite 181 / 181 across 16 files. Web typecheck clean. * feat(daemon): resolve od.critique.policy frontmatter at the spawn site The next step in the wireup branch's ladder: replace the placeholder `skillPolicy: null` with the actual value parsed from the active skill's SKILL.md frontmatter. Three small edits, one new field on a public type: 1. SkillInfo gains a `critiquePolicy: SkillCritiquePolicy` field carrying the parsed `od.critique.policy` token (required / opt-in / opt-out / null). The field is null when the skill has no opinion, which lets the lower-priority resolver tiers (projectOverride, envOverride, phase default) decide. 2. listSkills() populates the new field via a small `normalizeCritiquePolicy` helper that tolerates the YAML scalar's casing and trims whitespace. Unknown tokens collapse to null so a typo in SKILL.md cannot accidentally force the panel on or off; it just falls through. Derived example cards inherit the parent's policy. 3. server.ts captures `skill.critiquePolicy` into a hoisted `skillCritiquePolicy` variable inside the existing skill-load block, then threads it into the isCritiqueEnabled call as the skillPolicy input. The hoisting keeps the variable in scope at the resolver call site without restructuring the spawn handler. After this commit, the priority matrix the rollout resolver was designed for is live for its top tier. The previous commit wired env + phase; this one wires skill. The projectOverride input remains null pending the next commit that extends the Phase 6 settings endpoint. Daemon vitest: 10 / 10 rollout cases pass against the new wiring. 
Daemon typecheck: clean. * feat(daemon): feed projectOverride into the rollout resolver from project metadata Replaces the placeholder `projectOverride: null` in the spawn handler with the actual value the Settings panel writes onto the project's metadata blob: `critiqueTheaterEnabled?: boolean`. The read is defensive at the boundary: the metadata object is typed loosely (it round-trips through SQLite as a free-form JSON blob), so the spawn handler narrows to `boolean` and falls through to `null` for any other shape. A missing key, a malformed value, or a project that has never visited Settings collapses to `null`, which is exactly the resolver's "no opinion, fall through to env / phase" signal. The `critique` frontmatter slot also gets typed on the SkillFrontmatter shape so the `od.critique.policy` chain the previous commit introduced no longer needs a bracket-access cast. Same pattern as the existing `craft`, `preview`, and `design_system` nested-record slots. After this commit, every tier of the rollout resolver's priority matrix is wired: 1. skillPolicy (from SKILL.md od.critique.policy) 2. projectOverride (from project metadata critiqueTheaterEnabled) 3. envOverride (from OD_CRITIQUE_ENABLED) 4. rollout phase (from OD_CRITIQUE_ROLLOUT_PHASE) The write path for projectOverride still flows through the existing project-update handler the Settings panel already uses to persist project metadata; no new endpoint is needed. The Settings UI button that calls setCritiqueTheaterEnabled and posts the new field is the next commit on this branch. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix(daemon): forward critique events to project sinks + align composer gate (PR #1338) Two codex review items addressed in one commit since they share the same root cause (resolver-enabled run hits a transport / prompt contract that was still env-gated): P1 (transport mismatch). 
The daemon emits critique.* SSE frames through critiqueBus -> design.runs.emit, which fans out on /api/runs/:runId/events. The web CritiqueTheaterMount subscribes to /api/projects/:projectId/events (it's project-scoped, not run-scoped, because the mount lives at the project workspace and follows the user across runs). Result: in production the mount never sees a real frame and the e2e tests' stubbed routes hide the mismatch. Fixed by extending critiqueBus.emit to fan out to BOTH sinks: the existing runs.emit transport, AND the per-project event-sinks map. The project-events route emits via sse.send(payload.type, payload), so we pack the SSE channel name onto payload.type and let the sink push the right channel. The web sseToPanelEvent overwrites type from the channel name on the way back into a PanelEvent, so the round-trip stays correct. P2 (prompt gate misalignment). composeSystemPrompt reads cfg.enabled to decide whether to append the panel addendum, but critiqueCfg.enabled is loaded from OD_CRITIQUE_ENABLED only. A run the resolver enabled via phase / project / skill (env unset) would have critiqueShouldRun = true while critiqueCfg.enabled remained false, dropping the panel prompt while still routing through runOrchestrator -> parser waits for tags that never arrive -> run degrades. Fixed by passing a derived config { ...critiqueCfg, enabled: true } to the composer when critiqueShouldRun is true. The composer's own gate now agrees with the resolver decision on every input the spec defines. Daemon typecheck: clean. Daemon vitest: 10 / 10 rollout cases still green against the new wiring. * fix: address PerishCode P1 + P2 follow-ups on PR #1338 Two follow-up items PerishCode flagged on the activation PR. Non-blocking but both are real: 1. Phase 11 e2e suite was wired into test:ui:extended but lands the user on '/' (home route) where ProjectView (and therefore CritiqueTheaterMount) is never rendered. 
With the suite as written, every assertion would time out the first time the lane runs in CI, contradicting the PR body's claim that the suite stays parked behind test.describe.fixme. The state diverged from my earlier Phase 11 work because the merge from main on commit `4ab719c6` brought in #1307's squash-merged version of the e2e file (the pre-fixme shape). Re-applied test.describe.fixme to the describe block plus removed ui/critique-theater.test.ts from the test:ui:extended script in e2e/package.json. Added a file-header docblock explaining what the follow-up commit needs to do: replace goto('/') with /projects/:id navigation similar to app-design-files.test.ts, split the SSE fixture into a live prefix and terminal suffix (Codex P2 on PR #1320), and commit the first PNG baselines. 2. bestRoundOf in CritiqueTheaterMount returned the LAST round with a numeric composite, not the round with the HIGHEST composite, while bestCompositeOf correctly returned the max. A run that closed round 1 at 8.5 and round 2 at 6.0 would dispatch interrupted { bestRound: 2, composite: 8.5 } on a user-clicked interrupt. Folded the two helpers into a single bestRoundAndComposite that walks state.rounds once and returns the matching pair so the two values cannot drift. The onInterrupt callback now destructures from one helper instead of two independent reads. Falls back to (state.activeRound, 0) when no round has closed with a composite yet. Web typecheck: clean. CritiqueTheaterMount.test.tsx: 7 / 7 cases still green against the new helper. * fix: wire M1 project override end-to-end + correct deferred-surface doc claims (PR #1338) Three lefarcen P2s on the latest review pass, all real: 1. M1 project override was half-wired: the daemon read metadata.critiqueTheaterEnabled but the web setter only wrote localStorage. 
A user opt-in would render the Theater on the web (localStorage was set) while the daemon resolved projectOverride=null and skipped critique unless env / phase already permitted. Two halves talking past each other. Extended setCritiqueTheaterEnabled to accept an optional { projectId, fetchProjectSettings } options bag. When a projectId is supplied, the setter ALSO sends a PATCH /api/projects/:id with { metadata: { critiqueTheaterEnabled } } so the daemon's spawn-time resolver picks the same value up on the next generation. The existing project-routes endpoint already accepts arbitrary metadata patches, so no new endpoint is needed. The local write + the CustomEvent dispatch still fire before the PATCH, so a network failure does not unwind the in-session UI flip. Three new vitest cases pin the new path: PATCHes when projectId is provided, skips when it is not, swallows a rejected PATCH so the in-session UI still flips. 2. Rollout docs (docs/critique-theater.md section 3) claimed the Settings toggle persists into the daemon settings store, but the previous implementation only had a localStorage reader / writer plus a daemon read of project metadata, with no round-trip. Rewrote the section to lead with the four-tier resolver (skill policy / project override / env / phase), document that the setter now round-trips via the existing PATCH endpoint when given a projectId, and call out the Settings panel UI control as a deliberate follow-up. 3. Troubleshooting table pointed users at /api/metrics/critique (Phase 12, deferred) and 'od adapters clear-degraded <id>' (CLI wrapper that does not exist). Replaced the metrics reference with the local conformance harness command (pnpm --filter @open-design/daemon vitest run tests/critique-conformance.test.ts) that ships today, with a note that the Phase 12 dashboard surfaces this status as a series once that PR lands. 
Replaced the CLI command with the programmatic clearDegraded() helper that exists today and flagged the CLI wrapper as planned follow-up. Web typecheck: clean. Toggle hook tests: 14 / 14 green (11 existing + 3 new for the round-trip path). * test(web): multi-round interrupt regression for bestRoundAndComposite (PR #1338) lefarcen P3 follow-up to the previous bestRoundAndComposite fix: the existing CritiqueTheaterMount.test.tsx interrupt cases only exercised a single-round state, so a future refactor back to two independent helpers wouldn't be caught by the test suite even though it'd reintroduce the round / composite drift bug. Added a regression case that: 1. Drives the reducer through two complete rounds with the full 5-role cast closing at distinct composites: round 1 at 8.5, round 2 at 6.0 (the high-composite round is NOT the most recent one). 2. Clicks Interrupt + waits for the daemon ack via the test seam fetcher returning 204. 3. Asserts the collapsed badge displays "round 1" (the correct best-composite round), and queryByText for "round 2 ... 8.5" returns null (the buggy pairing would have produced that string). The bestRoundAndComposite helper walks state.rounds in one pass and returns the matching pair, so the round number and the composite cannot drift apart. This test locks the fix in: a refactor that splits the helpers back into independent walks will be caught here. 8 / 8 vitest cases green on the file. * fix(web): read-merge-write the project metadata in setCritiqueTheaterEnabled (PerishCode P2 on PR #1338) The previous round-trip sent { metadata: { critiqueTheaterEnabled: next } } as the entire PATCH body. The daemon's project-routes handler only re-stamps three immutable fields (baseDir, importedFrom, fromTrustedPicker) before calling updateProject(db, id, patch), which then does a shallow { ...existing, ...patch } in apps/daemon/src/db.ts. 
So patch.metadata replaces the row's metadata wholesale, dropping kind, templateId, linkedDirs, and every other field the rest of the app reads. No in-tree caller passes projectId today (only vitest cases), so the bug had not surfaced yet. But the surface is documented in docs/critique-theater.md section 3 and the function's own JSDoc as the M1 round-trip path, so it would have shipped as a latent footgun for the next integrator: a Settings UI follow-up, or any third party that wires the setter into a project-aware surface. Fix: read-merge-write rather than a bare patch. - GET /api/projects/:id to read the row's current metadata. - Spread that metadata into the PATCH body and overlay critiqueTheaterEnabled: next on top, mirroring the partial-metadata pattern already used in ChatComposer.tsx for linkedDirs. - PATCH the merged object. Failure handling: - GET fails: skip the PATCH entirely. We cannot construct a safe merged body without the current state, and a bare patch would wipe other metadata. The in-session CustomEvent fired earlier in the setter still keeps every mounted hook consistent; the next save retries the round-trip. - PATCH fails: log in dev. The in-session UI is already correct via the CustomEvent. Tests (TDD, red-first): - 'GETs the project then PATCHes with merged metadata when a projectId is supplied': stubs a GET that returns { kind: 'template', templateId: 'modern-blog', linkedDirs: [...] } and asserts the PATCH body equals the merge plus the toggle. - 'PATCHes with just the toggle when the project has no prior metadata': stubs a GET that returns no metadata block. - 'skips the PATCH (does not stomp metadata) when the prefetch GET fails': stubs a rejecting GET and asserts only the GET fires. - 'swallows a rejected PATCH after a successful prefetch': stubs a successful GET and a rejecting PATCH; asserts the in-session UI still flips via the CustomEvent. 
Doc updated on the setter's JSDoc to describe the new three-step flow (localStorage, CustomEvent, read-merge-write PATCH) and the two failure modes. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 111 files / 1055 tests green (was 1052, +3 from the new merge-flow cases). * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * feat(daemon): Critique Theater Phase 12 observability foundations Lands the metrics registry, the structured logger, the /api/metrics route, and the adapter-degraded bump that wires up the first data point. The orchestrator-side bumps for runs / rounds / composite / must-fix / interrupted / parser_errors / protocol_version land in a follow-up commit on this branch (kept separate so the wiring diff reads cleanly against the registry shape). Surfaces added: - apps/daemon/src/metrics/index.ts: 9 Prometheus series under the open_design_critique_* namespace with the histogram buckets the spec calls out (round_duration_ms at 100 / 250 / 500 / 1000 / 2500 / 5000 / 10000 / 30000 / 60000 ms; composite_score at 0-10 integer steps). - apps/daemon/src/logging/critique.ts: 6 typed events, one JSON line per call on stdout, namespaced critique. Matches the JSON-per-line convention cli.ts already uses; no new logger framework. - apps/daemon/src/server.ts: GET /api/metrics route. Honors OD_METRICS_ENDPOINT=disabled to opt out for air-gapped installs. 
- apps/daemon/src/critique/adapter-degraded.ts: markDegraded now bumps degraded_total so the adapter-health dashboard panel reflects every TTL refresh and every fresh mark. Deps: prom-client ^15.1.0, @opentelemetry/api ^1.9.0 added to apps/daemon/package.json. Both are zero-config no-ops without an exporter wired; daemon bundle size impact is ~150 KB uncompressed. The @opentelemetry/api dep is in place ahead of the OTel-spans follow-up commit; it adds no behavior on this commit. Tests: - tests/metrics/critique.test.ts (3 cases): registry shape + exposition text + reset-between-tests - tests/logging/critique.test.ts (4 cases): event shape + ordering + newline framing + namespace stamping Verification (Windows-local): - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites: 7 / 7 green - Existing adapter-degraded + conformance + rollout suites: 22 / 22 green; the bump is non-breaking * feat(daemon): wire Critique Theater metrics + structured logs from the orchestrator Lights up the bump sites the Phase 12 foundations PR registered the series for. Every panel event the parser surfaces now reaches the matching Prometheus counter / histogram and the matching JSON log line on stdout. Switch-loop bumps + logs: - run_started: log run_started, set protocol_version gauge to the observed protocol version (small-integer cardinality). - panelist_open: record the first-open wall-clock per round so round_end can compute round_duration_ms; subsequent opens in the same round leave the start time untouched. - panelist_must_fix: bump must_fix_total with the panelist role. The wire event does not yet carry a dim name, so the label is 'unspecified' for now; a future parser revision can drop in the real dim without a metric rename. - round_end: bump rounds_total, observe composite_score, observe round_duration_ms (current ms minus the tracked start), log round_closed with the composite / mustFix / decision triple. 
- parser_warning (parser-yielded): bump parser_errors_total with the kind label, log parser_recover with kind + position. Orchestrator-side parser warnings (composite_mismatch and duplicate_ship from the daemon-authoritative scoring checks) go through a new emitParserWarning helper so the bus emit, the collectedEvents push, the metric bump, and the log line stay in lockstep. Three inline emission sites collapse to one-line helper calls. After the try/catch, a single terminal-status switch bumps runs_total{status, adapter, skill} once per run, with branch-specific log + counter: - shipped / below_threshold: log run_shipped - interrupted: bump interrupted_total, log run_failed{cause: interrupted} - timed_out: log run_failed{cause: timed_out} - failed: log run_failed{cause: orchestrator_internal} - degraded: log degraded{reason: orchestrator_classified} OrchestratorParams gains optional skill: string for the label; defaults to 'unknown' so spawn sites that have not yet threaded it keep working without a metric shape change. Tests: - The new metrics + logging suites (7 / 7) verify registry shape and event framing; orchestrator-side metric integration is exercised through the existing critique-conformance and critique-adapter-degraded suites (22 / 22 still green). - Logger test reassigns process.stdout.write directly instead of vi.spyOn so the Node overloaded write signature does not collide with MockInstance<unknown>. * feat(observability): Grafana dashboard JSON for Critique Theater Three default rows mapping to the metrics this branch wires up: 1. Fleet quality: composite score p50 / p90 / p99 line graph by adapter, plus a heatmap of the composite distribution. The line graph answers 'are my agents getting better over time'; the heatmap answers 'are the bad runs clustered around one adapter or smeared across the fleet'. 2. Adapter health: stacked bar charts for degraded marks (by adapter / reason) and parser errors (by adapter / kind) over a 5-minute window. 
The two queries together let an operator see 'is this adapter degraded because of malformed wire output or because of oversize blocks' without flipping panels. 3. Brief throughput: runs-per-hour by terminal status, an average rounds-per-run stat per adapter, and a round-duration ms p50 / p90 / p99 line. Throughput numbers fall straight out of the runs_total / rounds_total counters; the duration histogram is the same one the runs feed. The dashboard uses a templated $datasource var (defaults to 'prometheus') so an operator with multiple Prometheus instances can switch without editing JSON. Schema version 39 (Grafana 11). Operators import via: pnpm dlx @grafana/cli dashboard import tools/dev/dashboards/critique.json or paste into a provisioned dashboards directory. The file is checked into the repo as a starting artifact; alert rules and SLO panels ship after the first 1000 runs inform the right thresholds. JSON validates with node -e 'JSON.parse(...)' (sanity checked locally). * feat(daemon): OpenTelemetry outer span around the critique run Wraps each runOrchestrator call in a 'critique.run' span via the existing @opentelemetry/api dep added in the Phase 12 foundations commit. Attributes set on the span: - critique.run_id, critique.adapter, critique.skill at start - critique.final_status, critique.final_composite on terminal resolution - span status flipped to ERROR for failed / timed_out runs so a Tempo / Honeycomb / Jaeger filter on traces.status=error surfaces the right slice without joining back to Prometheus No exporter is wired by default; @opentelemetry/api is the API package and intentionally splits from @opentelemetry/sdk-, so the span is zero-overhead until an operator attaches an SDK through their runtime config. 
Inner per-round / parse_chunk / scoreboard_eval / persist_round / ship.persist spans defined in the Phase 12 plan are a follow-up: the outer span alone gives the trace a duration + final status + adapter/skill labels, which is the 80% value for dashboards that correlate runs across services. Adding child spans inside the existing 600-line orchestrator without restructuring is a separate careful change. Verification: - pnpm --filter @open-design/daemon typecheck: clean - 29 / 29 critique + metrics + logging tests still green fix(nix): bump pnpmDepsHash for prom-client + @opentelemetry/api lockfile bump nix-check failed on PR #1485 with hash mismatch in open-design-daemon-pnpm-deps and open-design-web-pnpm-deps after the Phase 12 foundations commit (`2b8b7445`) added prom-client and @opentelemetry/api to apps/daemon/package.json and refreshed pnpm-lock.yaml. CI reported the new sha: specified: HFLm+8hv3o5x3Xem4MXNsNclIgiVRc70+EBafL0rVn8= got: 7R1sQC38gOT0gsZ2oNOviCZ486cbbGJGJCis6WI8z9s= Both nix files pin the same workspace lockfile, so both flip in lockstep. No other Nix surface changes required. * fix(daemon): four Phase 12 review findings (Codex P2 x2 + Siri-Ray P2 + lefarcen P2) 1. Siri-Ray P2 in orchestrator.ts (round metric / log used untrusted agent values). The new observability path now records rs.composite and rs.mustFix (daemon-authoritative) instead of event.composite and event.mustFix when rs exists, and skips the bumps + log entirely when rs is missing (a degenerate round_end without any matching panelist_open). The dashboard p50 / p90 / p99 now agrees with persistence and ship decisions; an adapter reporting <ROUND_END composite='10'> while the daemon computed 6 logs 6 and still emits the composite_mismatch parser warning the prior block was already producing. 2. Codex P2 in server.ts (skill label always 'unknown'). 
The spawn path called runOrchestrator without passing the resolved skill id, so every live run bumped open_design_critique_{skill='unknown'} and the per-skill dashboard breakdown was always empty. Threaded effectiveSkillId (already computed at the same handler scope as the project skill fallback) through skill: . . . so the metric reflects the real skill when one is assigned, and the orchestrator default of 'unknown' only fires for runs that genuinely have none. 3. Codex P2 in conformance.ts (protocol-version mismatch let through). An adapter that emitted <CRITIQUE_RUN version='2'> followed by a valid SHIP classified as shipped because the harness only watched for terminal events. Added a guard inside the parse loop: if a run_started carries protocolVersion !== CRITIQUE_PROTOCOL_VERSION, mark the adapter degraded with reason 'protocol_version_mismatch' (already in DEGRADED_REASONS) and return early. ConformanceOutcome union widened to accept the new reason. 4. lefarcen P2 in tools/dev/dashboards/critique.json (runs-per-hour panel under-reported by 3600x). 'rate(...[1h])' returns per-second. Multiplied by 3600 so the panel title and unit match the actual value rendered. Verification: - pnpm --filter @open-design/daemon typecheck: clean - New metrics + logging suites (7), existing adapter-degraded (7), conformance (5), rollout (10): 29 / 29 green - Grafana JSON re-parses with node -e 'JSON.parse(...)' fix(nix): set pnpmDepsHash to fakeHash so CI surfaces the real hash for the regenerated lockfile (lefarcen P1 on PR #1485) * fix(nix): pin pnpmDepsHash to sha256-NtXbiRU0YZ4EVJVNC6N3sR1S0ozA3BvCwgXI0L0OMH4= from CI nix-check output --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-13 22:11:27 +08:00
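The single-pass pairing that the bestRoundAndComposite fix in the commit above describes can be sketched as follows. This is an illustration under stated assumptions, not the code from CritiqueTheaterMount: the `RoundSketch` and `TheaterStateSketch` shapes and the function name are hypothetical, and keeping the earliest round on a tie is a choice made for this sketch.

```typescript
// Hypothetical state shape; the real reducer state is not shown in the log.
interface RoundSketch {
  n: number;          // round number
  composite?: number; // present only once the round has closed with a score
}

interface TheaterStateSketch {
  activeRound: number;
  rounds: RoundSketch[];
}

interface BestPair {
  bestRound: number;
  composite: number;
}

// Walk the rounds once and return the round number together with its
// composite, so the two values always describe the same round.
function bestRoundAndCompositeSketch(state: TheaterStateSketch): BestPair {
  let best: BestPair | null = null;
  for (const round of state.rounds) {
    const score = round.composite;
    if (typeof score !== 'number' || Number.isNaN(score)) continue;
    // Strict ">" keeps the earliest round on a tie (an assumption here).
    if (best === null || score > best.composite) {
      best = { bestRound: round.n, composite: score };
    }
  }
  // No round has closed with a composite yet: fall back to (activeRound, 0).
  return best ?? { bestRound: state.activeRound, composite: 0 };
}
```

For the scenario in the commit message (round 1 closing at 8.5 and round 2 at 6.0), the helper returns round 1 paired with 8.5 rather than the buggy round 2 paired with 8.5.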
lefarcen	2a0ebea50b	release: Open Design 0.7.0 - bump 14 monorepo package.json files to 0.7.0 (root + apps/{web,daemon,desktop,packaged,landing-page} + packages/{contracts,platform,sidecar,sidecar-proto} + tools/{dev,pack,pr} + e2e); apps/packaged was already at 0.6.1 from beta lane, all others at 0.6.0 - add CHANGELOG.md [0.7.0] - 2026-05-12 entry covering 97 merged PRs since 0.6.0: - Critique Theater: Phase 7 web client state machine (#1307) + Phase 6.2 daemon artifact extraction (#1085) - Web/UI: thumbs-up/down feedback widget (#1308), Cmd+, opens Settings (#1173), Finalize design package + Continue in CLI (#974), fetch models button for BYOK (#1034), provider models alphabetical sort (#1097), collapsible MCP JSON field-mapping (#1136), design file rename (#894) - Daemon: auto-memory store with chat-protocol-aware extraction (#999), install/uninstall skills & design systems (#1003), HTTP 206 range requests for video/audio (#1105), scheduled routines (#1033), agent runtime + route registration refactor (#1063, #1043) - HyperFrames: HTML-in-Canvas across web + skills (#866) - Skills/design systems: generic skills + design-templates split + finalize-design API (#955), agent-browser skill (#1284), WeChat design system + login-flow skill (#1083), hud/loom/trading-terminal design systems (#1069), release-notes-one-pager skill (#873), tokens.css schema (#1231) - Packaging: macOS Intel (x64) build (#759), official Nix flake (#402), beta packaging cache (#1095) - Maintainer ops: tools-pr PR-duty workspace (#1259), MAINTAINERS.md (#1290), contributor card bot (#932), PR→issue linking discipline (#1263) - Changed: conversation run isolation (#1271), default English i18n fallback (#1270), Codex CLI exit diagnostics / empty-response handling / path fallback (#1267, #1244, #1205) - Fixed: ~30 web + desktop + daemon + packaging bugfixes - Internal: nightly UI/desktop regression coverage (#1256), e2e/release report hardening (#1140), entry/settings automation (#954) - catch up 
[Unreleased] compare link to v0.7.0 and add missing [0.6.0] release link - add 97 PR footnote refs ([#402]..[#1330]) Verified locally: pnpm install + pre-build contracts/daemon/desktop dist + pnpm typecheck (exit 0 across all 14 packages on Node 22.22 with engine-warning). Release workflow validation runs after merge via release-stable.	2026-05-12 15:33:28 +08:00
Bryan A	587c783dc0	feat(web): add Finalize design package + Continue in CLI buttons (#451) (#974) * feat(daemon): expose resolvedDir on GET /api/projects/:id (#451 prereq) Native projects (no metadata.baseDir) live at <projects root>/<id>, where projects root is daemon-side state. The web client cannot reconstruct an absolute path on its own, and shell.openPath on a relative path is undefined behavior. Without resolvedDir, the upcoming Continue in CLI button (#451) would render permanently disabled for native projects. Mirrors PR #832's pattern of exposing designMdPath in its response. Computed via the existing resolveProjectDir(...) helper. No behavior change to existing callers; they ignore the new field. Adds ProjectDetailResponse contract type and a focused projects-routes test covering imported-folder, native, and unknown-id paths. * feat(web): add parseProvenance helper for DESIGN.md staleness checks Pure helper that extracts Project ID, design system, current artifact, transcript message count, and generated UTC timestamp from the `## Provenance` section emitted by the daemon's finalize synthesis prompt (apps/daemon/src/finalize-design.ts). Used by useDesignMdState to derive the Continue in CLI button's stale/fresh state without an additional daemon endpoint. Handles missing section, "none" sentinels for design system / artifact, and malformed timestamps without throwing. Tests cover all four branches. * feat(web): add buildClipboardPrompt template for Continue in CLI Inline single-source-of-truth template per #451 spec §3.4. Names the project, the working directory, and the DESIGN.md-first operating contract for the receiving `claude` CLI session. Trailing TODO is the blank task slot the issue body specifies, left empty so the user fills it in before submitting. 
Also lands the shared copyToClipboard helper (jsdom-safe canonical path + execCommand fallback) so the new button and any future caller share one fallback path, mirroring the inline pattern in FileViewer.tsx. Tests cover happy-path field rendering, "none"/"unknown" sentinels when DESIGN.md fields are absent, and both clipboard branches. * feat(web): add useProjectDetail + useDesignMdState hooks useProjectDetail wraps GET /api/projects/:id, surfacing the resolvedDir field and falling back to metadata.baseDir for older daemons that don't include it. Continue in CLI needs an absolute working directory so the desktop bridge can openPath it; the web client never reconstructs the path itself. useDesignMdState fetches the project's file list, downloads DESIGN.md when present, parses the Provenance section, and computes a stale verdict by comparing the recorded generatedAt against the max mtime of non-DESIGN.md files and the max conversation updatedAt. Drives the button's three-state UI (disabled / fresh / stale) without a daemon-side endpoint. Tests cover happy path, fallback, and both stale branches plus the pure computeStale helper for the null-timestamp edge case. * feat(web): add useFinalizeProject hook with cancel + error-code mapping Wraps POST /api/projects/:id/finalize/anthropic for the Finalize design package button. Three concerns: 1. Lifecycle: idle → pending → success \| error. Double-clicking the button aborts the prior in-flight request before starting a new one so the daemon never sees stacked finalize calls per project. 2. Cancellation: AbortController plumbed through fetch + a 130 s timer (daemon timeout 120 s + 10 s buffer). Cancel returns to idle cleanly — it's a user gesture, not an error surface. 3. Daemon error mapping: when the response is non-OK, body.error.code drives the canonical user-facing toast string (table covers all 7 codes the daemon emits today plus a network-error catch-all). 
body.error.details, when a string, surfaces alongside the category message so account-usage-cap responses (Anthropic 400 → UPSTREAM_UNAVAILABLE) can show the upstream's own reason instead of just the daemon's category label — committed to lefarcen on #450 verification reply. Tests cover request body shape, all 8 error codes via it.each, the network-error path, the details-surfacing branch, the cancel ⇒ idle flow, and the unknown-code → catch-all message branch. * feat(web): add useTerminalLaunch with electron/web detection Capability-detected wrapper around window.electronAPI.openPath. On desktop the bridge forwards to shell.openPath, which opens the OS file manager at the project working directory (per Electron's contract for directory paths — it is NOT a terminal launcher; spawning a terminal application is deferred per #451 Non-goals). On browser builds the hook reports web-fallback so the caller renders a manual-instruction toast naming the working directory. Treats any non-empty string return from shell.openPath as ok: false so platform-specific failures surface the manual fallback toast. Behavior is exercised end-to-end by the upcoming ContinueInCliButton tests. * feat(desktop): expose shell.openPath via electronAPI bridge Adds an openPath bridge method that the Continue in CLI button (#451) uses to surface the project working directory in the OS file manager. shell.openPath is part of Electron's contract and resolves to '' on success / a non-empty error string on failure; the IPC handler forwards the result so the renderer can decide between the success toast and the manual fallback toast without a separate error channel. Empty / non-string inputs short-circuit to a self-describing error string so the renderer never needs to worry about undefined-input crashes from the main process. Web side: extracts Window.electronAPI into a single global declaration at apps/web/src/types/electron.d.ts so future bridge methods land in one place. 
Two pre-existing inline declare-global blocks (NewProjectPanel.tsx, providers/registry.ts) are deleted in favor of that single source of truth — the inline ones each carried a partial shape of the bridge and were diverging from the desktop preload. * feat(web): add FinalizeDesignButton, ContinueInCliButton, ProjectActionsToolbar Project-level toolbar that hosts the two new actions from #451. Mounted between AppChromeHeader and the chat/workspace split (wiring lands in the next commit). Per-file actions (Export PDF/PPTX/ZIP, Deploy) stay in the FileViewer share menu. FinalizeDesignButton has three idle labels driven by DESIGN.md existence + staleness, plus a pending state with a spinner and a cancel link that maps to useFinalizeProject's AbortController. Error toasts are owned by ProjectView so the button doesn't carry its own toast surface. ContinueInCliButton renders disabled with a Finalize-pointing tooltip when DESIGN.md is missing (so the workflow is discoverable rather than hidden), enabled when fresh, and enabled with a stale chip otherwise. Chip text is the spec's canonical "Spec is stale — regenerate?" — N-turns-ago is deferred per spec §4.6. Toast.tsx is a tiny transient component that mirrors PromptTemplatePreviewModal's state-based toast pattern; supports a secondary details line so daemon error envelopes that carry an upstream explanation (e.g. Anthropic account-usage cap) can surface the real reason alongside the daemon's category label. CSS appends one block to apps/web/src/index.css mirroring the existing app-project-title token usage; no CSS modules in this repo (verified by grep). * test(web): cover ContinueInCliButton states + interaction wiring Three rendered states (DESIGN.md missing → disabled with the Finalize-pointing tooltip; DESIGN.md fresh → enabled, no chip; DESIGN.md stale → enabled with the canonical "Spec is stale — regenerate?" chip), plus three onClick branches (no-op when disabled, fires once when fresh, fires once when stale). 
Click-handler integration with clipboard / shell.openPath / toast lives in ProjectView (the button is presentational and takes the handler in via props), so those are covered by Phase K's wiring + the manual smoke test rather than the per-component test. * feat(web): wire Continue in CLI + Finalize buttons into ProjectView Mounts the new project-actions toolbar between AppChromeHeader and the chat/workspace split, hidden when workspaceFocused so the focus-mode artifact view stays uncluttered. Wires the four hooks (useProjectDetail, useDesignMdState, useFinalizeProject, useTerminalLaunch) to a single shared toast surface. handleFinalize reads the request body from the existing config: AppConfig prop and uses effectiveMaxTokens(config) to match the chat-flow's maxTokens defaulting; on success it refreshes useDesignMdState so the toolbar re-renders with the new chip state. handleContinueInCli builds the literal clipboard prompt, copies it, opens the working directory via shell.openPath on desktop / falls through to a manual-instruction toast on browser, and surfaces shell.openPath failures with a fallback toast that names the path. Errors lift into the same toast surface (a useEffect tied to finalize.error) so the daemon's category message + body.error.details reach the user as the spec's two-line render — covered by hook test 16a in the prior commit. ⌘+Shift+K (mac) / Ctrl+Shift+K (others) is the keyboard accelerator for Continue in CLI; capture-phase, platform-gated, no-op when DESIGN.md is missing. Mirrors the existing FileWorkspace shortcut idiom and does not collide with ⌘+P (Quick Switcher). * fix(web): distinguish timeout abort from user cancel in useFinalizeProject Addresses codex P2 finding on PR #974: the catch block treated every AbortError as a user-initiated cancel and reset to idle silently. If the internal 130 s timeout fired, users saw no failure signal but the daemon's synthesis call may still have been in flight. 
Adds a timedOutRef set inside the setTimeout callback before controller.abort(), and branches in the catch: timeout → status 'error' with new TIMEOUT code ("Finalize timed out after 130 s. The daemon may still be running."), user cancel → existing idle reset. Reset the ref at the start of every trigger() so a previous timeout doesn't poison the next call. Adds one test using vi.useFakeTimers() that advances past 130_001 ms and asserts the TIMEOUT error surface. * fix(web): surface clipboard failures by rendering the prompt in the toast Addresses codex P2 finding on PR #974: handleContinueInCli ignored copyToClipboard's return value, so when both clipboard paths failed (restricted browser context / insecure origin) the toast still said "paste the prompt" though nothing had been copied, leaving users with no manual-copy recourse in exactly the environments where the fallback should help. handleContinueInCli now branches on copyToClipboard's boolean return. On failure the toast renders the prepared prompt in a scrollable <pre> block and pins itself open (no auto-dismiss) so the user has time to select-and-copy manually. Includes a Dismiss button + the working directory in the secondary details line so the user has the information needed to proceed. The folder-open call is skipped on copy failure because there's nothing to paste yet; the user copies first, then re-clicks Continue in CLI when they're ready. Toast component grows an optional `code` prop and the auto-dismiss TTL is suppressed whenever code is present. CSS adds .od-toast-code (monospace, max-height 240 with overflow-auto) and .od-toast-dismiss styling.
Six new Toast tests cover details rendering, code rendering, no-auto-dismiss when code is present, auto-dismiss when code is absent, and the Dismiss button affordance. * fix(web): make ContinueInCliButton disabled-state guidance visible Addresses mrcfps's PR #974 review: native <button disabled> does not fire hover/focus events in browsers we ship against, so a `title` tooltip on the disabled button never surfaces. The only guidance for the missing-DESIGN.md state was effectively invisible — defeating the spec's "discoverable, not hidden" intent. Renders the help text as a visible sibling <span> next to the disabled button instead. Adds aria-describedby pointing the button at the hint's id so assistive tech announces the explanation when the disabled button gets focus. The native `disabled` attribute stays so the button still can't be clicked or submitted. CSS adds .project-actions-disabled-hint (muted italic, 11.5px, matches the existing meta/secondary text style on this surface). Test asserts the role="note" hint is in the DOM with the canonical text and that the button's aria-describedby links to its id. * fix(web): keep ProjectActionsToolbar at natural height inside the .app grid The .app container was `grid-template-rows: auto 1fr` — only two rows. Adding ProjectActionsToolbar as a third child between AppChromeHeader and the chat/workspace split made the toolbar the 2nd grid item, so it took the `1fr` row (filling roughly half the viewport) while the split got pushed into an implicit auto row at its content's natural height. Surfaced as a screenshot from Bryan showing the toolbar's background bleeding across most of the screen. Extend grid-template-rows to `auto auto 1fr` and pin the split to `grid-row: 3` explicitly. Now: - Toolbar visible: row 1 = header (auto), row 2 = toolbar (auto), row 3 = split (1fr, fills remaining viewport). 
- Toolbar hidden via hidden=workspaceFocused → ProjectActionsToolbar returns null, row 2 collapses to 0px (auto with no content), split still fills row 3. No JS changes; existing 609 tests still green. * fix(web): guard useFinalizeProject state writes against superseded triggers Addresses mrcfps's PR #974 P1 review on useFinalizeProject.ts:132 (also called out as P1.3 in lefarcen's deep-dive review). Calling trigger() twice in quick succession aborted the first controller and swapped abortRef to the new one, but the first request's later AbortError catch still unconditionally called setStatus('idle') / setError(null). That cleared the spinner and re-enabled both toolbar buttons while the replacement finalize was still pending — defeating the de-duplication this hook was meant to enforce. Adds an isCurrent() closure (`abortRef.current === controller`) and gates every state-write site after the await: success path, non-OK envelope path, AbortError-timeout, AbortError-cancel, and network-error all bail early when the trigger has been superseded. Per mrcfps: "make every state write request-scoped." Regression test triggers twice in quick succession with a never-resolving fetch, awaits the first promise (it rejects with AbortError), and asserts status stays 'pending' rather than collapsing to 'idle' under the replacement's lifetime. * fix(desktop): allowlist-validate shell.openPath against registered project roots Addresses mrcfps's PR #974 P1 review on runtime.ts:305 (also called out as P1.2 in lefarcen's deep-dive review): the new `shell:open-path` IPC handler accepted any renderer-supplied string and forwarded it straight into Electron's `shell.openPath`, widening the renderer→main trust boundary so XSS or a compromised renderer dependency could open arbitrary local paths to the user. Adds an explicit gate around the bridge: 1. 
validateExistingDirectory(p) — floor check that rejects empty strings, relative paths, files, apps, and non-existent paths; realpath-resolves so symlink games can't be used to register one path and reach another. 2. createProjectRootGate() — Set-backed allowlist of daemon-validated project working directories. The renderer calls registerProjectRoot(absDir) once per project mount via a new IPC method (preload bridge); the main process only opens paths that pass both the floor check and the allowlist. ProjectView wires the registration via a useEffect tied to projectDetail.resolvedDir, so the active project's daemon-supplied working directory is always the one being approved (not a renderer- synthesized string). Threat-model caveat documented in the runtime.ts comment block: an attacker that fully controls the renderer can also call register with arbitrary paths. Closing that gap fully requires a daemon-side round-trip to derive the canonical resolvedDir from the daemon's project registry, which is deferred to keep this PR focused. Today's allowlist still defends against accidental misuse, bugs, and common XSS payloads that don't know to call register first. Adds apps/packaged/tests/desktop-project-root-gate.test.ts with 13 cases: floor-validation rejection cases (empty / relative / missing / file), happy-path resolution, symlink realpath canonicalization, and the allowlist's register/isApproved/reset semantics. Mirrors the existing apps/packaged/tests/desktop-url-allowlist.test.ts pattern from PR #911 — the packaged workspace hosts the test because apps/desktop has no vitest setup yet. 
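A minimal sketch of the two-layer gate this commit describes, for orientation only. The function names `validateExistingDirectory` and `createProjectRootGate` and the `register` / `isApproved` / `reset` methods come from the commit message; the parameter types, the `FloorResult` return shape, and the reason strings are assumptions made for illustration and may not match `runtime.ts`.

```typescript
import { realpathSync, statSync } from "node:fs";
import { isAbsolute } from "node:path";

// Assumed result shape; the real helper may report failures differently.
type FloorResult = { ok: true; path: string } | { ok: false; reason: string };

// Floor check: non-empty, absolute, existing, and a directory.
// The path is realpath-resolved so a symlink is judged by its target.
export function validateExistingDirectory(p: unknown): FloorResult {
  if (typeof p !== "string" || p.length === 0) {
    return { ok: false, reason: "empty or non-string path" };
  }
  if (!isAbsolute(p)) {
    return { ok: false, reason: "path must be absolute" };
  }
  try {
    const resolved = realpathSync(p); // throws when the path does not exist
    if (!statSync(resolved).isDirectory()) {
      return { ok: false, reason: "not a directory" };
    }
    return { ok: true, path: resolved };
  } catch {
    return { ok: false, reason: "path does not exist" };
  }
}

// Set-backed allowlist keyed by the canonical (realpath) form of each root.
export function createProjectRootGate() {
  const approved = new Set<string>();
  return {
    register(dir: string): boolean {
      const result = validateExistingDirectory(dir);
      if (result.ok) approved.add(result.path);
      return result.ok;
    },
    isApproved(dir: string): boolean {
      const result = validateExistingDirectory(dir);
      return result.ok && approved.has(result.path);
    },
    reset(): void {
      approved.clear();
    },
  };
}
```

Storing the realpath rather than the caller's string means a path registered through a symlink and opened through its target (or the reverse) compares equal, which is the canonicalization behavior the test list above mentions.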
* fix(daemon): wire request-lifecycle abort signal through finalize route Addresses mrcfps's PR #974 P1 review on apps/daemon/src/server.ts:3831-3837 (also called out as P1.1 in lefarcen's deep-dive review): `POST /api/projects/:id/finalize/anthropic` called `finalizeDesignPackage(...)` without threading any request-lifecycle abort, so cancelling the browser fetch only aborted the UI-side request — the daemon's 60–120 s Anthropic call kept running and still wrote DESIGN.md after the UI returned to idle. Adds an AbortController inside the route handler, fired from `res.on('close')`, and threads its signal into the existing `signal?: AbortSignal` parameter on `FinalizeOptions` (finalize-design.ts:70). `callAnthropicWithRetry` already passes the signal through to the underlying fetch, so a client disconnect now propagates all the way to the Anthropic SDK call. Listener-event choice: `res.on('close')` is the canonical event for "client disconnected before response was sent" in Express. The common alternative `req.on('close')` fires whenever the request stream finishes — for POST routes that means as soon as the body-parser middleware drains the body, well before the route does any work. Using req.on('close') would have flipped the abort controller in every successful run; the test caught this empirically. Caveat documented in the route's comment block: an abort fired after the upstream response has been received but before the atomic write completes still allows the write to land. The SDK contract bounds the network round-trip, not the post-network disk handoff. Adds tests/finalize-route-abort.test.ts: spins up the test server, mocks global fetch to capture the daemon-side AbortSignal at the Anthropic call, sends the request via raw http (so we can destroy the underlying socket), waits until the server reaches the Anthropic call, then destroys the socket and asserts that the daemon-side signal received an abort event within 5 s. 
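The listener choice above can be illustrated with a small sketch. It assumes a helper named `abortSignalForResponse` and a `FakeResponse` test double, both hypothetical and not taken from `server.ts`. The `writableEnded` guard is also an assumption of this sketch: the commit message only says the controller is fired from `res.on('close')`, and the guard is one way to keep a normally completed response from flipping the signal.

```typescript
import { EventEmitter } from "node:events";

// Minimal shape shared by Node's http.ServerResponse and the fake below.
interface ResponseLike {
  writableEnded: boolean;
  on(event: "close", listener: () => void): unknown;
}

// Returns a signal that aborts when the response closes before end() was called,
// i.e. when the client went away while the handler was still working.
export function abortSignalForResponse(res: ResponseLike): AbortSignal {
  const controller = new AbortController();
  res.on("close", () => {
    if (!res.writableEnded) controller.abort();
  });
  return controller.signal;
}

// Hypothetical stand-in for a response object, for demonstration only.
export class FakeResponse extends EventEmitter {
  writableEnded = false;

  // Normal completion: the handler finished and the stream then closes.
  end(): void {
    this.writableEnded = true;
    this.emit("close");
  }

  // Premature disconnect: the socket is gone before end() ran.
  destroy(): void {
    this.emit("close");
  }
}
```

In a route handler the returned signal would be passed as the `signal` option of the downstream call, so a dropped connection cancels the upstream request instead of letting it run to completion.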
Three pre-existing project-watchers chokidar tests show flaky timeouts under full-suite concurrency but pass in isolation; unrelated to this fix. * fix(daemon): refactor finalize-route-abort test to satisfy strict TS narrowing The CI typecheck (`pnpm --filter @open-design/daemon typecheck`, which runs both tsconfig.json and tsconfig.tests.json) caught what my pre-push validation missed: TS narrowed `capturedSignal` to literal `null` because vitest's mockImplementation closure can't prove its callback runs, leaving the bare `let capturedSignal: AbortSignal \| null = null` permanently typed at its initial value. At line 184 (`expect(capturedSignal?.aborted).toBe(true)`) the right-hand side of the optional-chain became unreachable, and TS flagged it as `Property 'aborted' does not exist on type 'never'`. Switches to the standard ref-object pattern (`const capture: { signal: AbortSignal \| null } = { signal: null }`). TS narrows let bindings inside closures conservatively but treats object-property writes as opaque, so `capture.signal` reads correctly across the closure boundary. Logic is unchanged. (Pre-push oversight: ran `pnpm --filter @open-design/web typecheck` but not the full repo `pnpm typecheck` after the daemon test landed; the daemon's own typecheck would have caught this. Adding `pnpm typecheck` back into the standard pre-push checklist.) * fix(desktop): make shell.openPath gate daemon-controlled and reject .app bundles Addresses lefarcen + mrcfps PR #974 P1 reviews on the previous path allowlist (commit `8bf56597`): - mrcfps (runtime.ts:45): `validateExistingDirectory` accepted macOS `.app` bundles because they're directories, so the gate would forward `/Applications/Safari.app` (or any other app bundle) into shell.openPath and launch the application — a stronger capability than the bridge's intended "reveal the project folder" feature. - lefarcen (runtime.ts:396): the allowlist was renderer-controlled. 
A compromised renderer could call `shell:register-project-root` with any existing absolute directory and then `shell:open-path` that same path; the IPC injection issue I'd documented as "deferred" was the central reviewer concern, not an acceptable caveat. Both reviewers asked for the gate to be derived from a daemon-authoritative source. The redesign drops the renderer-controlled register/openPath pair and replaces it with a single `openPath(projectId)` bridge call. The desktop main process resolves the project ID by calling the daemon's `GET /api/projects/:id` endpoint over the web sidecar proxy (which already forwards `/api/` to the daemon — verified in apps/web/sidecar/server.ts:209 and apps/web/next.config.ts:77), parses `resolvedDir` from the response, validates it against the floor (absolute, exists, is-directory, not .app), and only then forwards to `shell.openPath`. The renderer never names the path directly, so a compromised renderer cannot escalate to opening arbitrary local paths — it can only name a project the daemon already knows about, and the canonical path comes from the daemon's own response. Surface changes: - `runtime.ts`: `createProjectRootGate` removed. `fetchResolvedProjectDir(webUrl, projectId, fetchImpl?)` added. `validateExistingDirectory` rejects `.app` suffix after the realpath check (so symlinked launders are caught too). `shell:open-path` handler signature changes from `(path)` to `(projectId)`; `shell:register-project-root` handler removed. - `preload.cts`: `openPath(projectId)`; `registerProjectRoot` removed from the bridge surface. - `apps/web/src/types/electron.d.ts`: type updated to match. - `useTerminalLaunch.ts`: `open(projectId)` instead of `open(dir)`. - `ProjectView.tsx`: passes `project.id` to `terminalLauncher.open`; the registerProjectRoot useEffect is deleted. Toast text still reads `projectDir` (from `useProjectDetail.resolvedDir`) for fallback messages — the display* path is independent of the open mechanism. 
- `apps/packaged/tests/desktop-project-root-gate.test.ts`: rewritten to cover `validateExistingDirectory` (8 cases including the new `.app` suffix and symlinked-bundle rejection) and `fetchResolvedProjectDir` (8 cases including empty/invalid project ids, daemon HTTP success/failure, missing resolvedDir, network error, and URL canonicalization). Total: 16 passing tests, ~330 LOC churn including test rewrites. Lesson learned (from the iteration loop, not the code): when a reviewer asks for "ideally X, or at least Y," shipping Y with a deferred-X note flags the gap rather than fixing it. Either ship X or argue Y is sufficient; don't middle-ground. * feat(contracts,sidecar-proto): add desktop-auth IPC + fromTrustedPicker Schema-only prep for the PR #974 round-3 fix. Adds the two type extensions the daemon HTTP gate and the desktop main process will build on: - packages/sidecar-proto: SIDECAR_MESSAGES.REGISTER_DESKTOP_AUTH, with a base64-validated `{ secret }` payload + RegisterDesktopAuthResult. Updates normalizeDaemonSidecarMessage to accept the new message and pins both branches (accept + reject) in tests/index.test.ts. - packages/contracts: ProjectMetadata.fromTrustedPicker — a marker the daemon stamps on folder-imported projects whose POST /api/import/folder passed the desktop HMAC gate. The marker is privileged in the same way as `baseDir`: only the gated import handler sets it, and the desktop main process refuses to forward `shell.openPath` for folder-imported projects whose metadata lacks it. * fix(daemon): gate /api/import/folder on desktop HMAC token Closes the renderer→arbitrary-baseDir→shell.openPath bypass chain flagged by lefarcen and mrcfps in round 3 of PR #974. Both reviewers converged on the same gap: the previous round only moved path resolution into the daemon, but renderer JS could still POST /api/import/folder with any absolute path, get a project ID back, and then call openPath(projectId) to reveal the attacker-chosen path. 
Daemon-side closure: - New module-scope desktop auth secret + setter exported from apps/daemon/src/server.ts. The secret is null at boot (web/standalone mode unaffected) and gets set when the desktop main process registers it over the daemon's sidecar IPC. - New `verifyDesktopImportToken` pure helper. Verifies tokens shaped `${nonce}~${exp}~${signature}` against HMAC-SHA256(secret, baseDir + "\n" + nonce + "\n" + exp). Field separator is `~` (not `.`) because ISO 8601 expiries embed dots; `~` is in neither base64url nor ISO 8601 character sets. Rejects expired tokens, replayed nonces, and expiries beyond 2× the 60s TTL. - New middleware on POST /api/import/folder. When the secret is set, every request must carry a valid `X-OD-Desktop-Import-Token` header bound to the requested baseDir. Rejected requests return 403 with FORBIDDEN. When the secret is unset (no desktop registered), the route is unchanged so web-only deployments and standalone daemons keep working. - Trusted imports get `metadata.fromTrustedPicker: true` stamped on the project. POST /api/projects and PATCH /api/projects/:id reject any client-supplied `fromTrustedPicker` (privileged the same way as `baseDir`), and the PATCH preservation block re-stamps the marker on partial-metadata patches so it cannot be silently stripped. - Daemon sidecar IPC handler: REGISTER_DESKTOP_AUTH calls setDesktopAuthSecret with the base64-decoded secret. The HTTP and IPC servers share a process so the registration takes effect immediately for the next inbound /api/import/folder call. Tests: - apps/daemon/tests/desktop-import-token-gate.test.ts (15 cases): web mode acceptance, no-token rejection, malformed-token rejection, wrong-secret rejection, wrong-baseDir rejection, expired rejection, oversized-window rejection, valid mint + trusted-picker stamp + replay rejection, plus 6 pure-helper cases for verifyDesktopImportToken. afterAll() clears the secret to keep the shared HTTP server clean for sibling test files. 
- apps/daemon/tests/projects-routes.test.ts (+2 cases): POST and PATCH reject `fromTrustedPicker` in client-supplied metadata. Existing folder-import-route.test.ts continues to pass because none of those tests register a desktop secret; the gate stays dormant. * fix(desktop,web): atomic pickAndImport replacing pickFolder; openPath trusted-picker check Closes the renderer→arbitrary-baseDir bypass at the bridge boundary. The renderer no longer receives a raw filesystem path from the main process; the picker dialog and the import call live in a single main-process transaction. Desktop main: - runDesktopMain generates a per-process 32-byte secret and registers it with the daemon over the daemon's sidecar IPC before the BrowserWindow is created. registerDesktopAuthWithDaemon retries a few times because tools-dev / tools-pack spawn daemon, web, and desktop as siblings, so the daemon may not be listening yet on desktop boot. A failed registration logs a warning and the runtime refuses pickAndImport calls (no secret → no token can be minted). - runtime.ts replaces the `dialog:pick-folder` IPC with `dialog:pick-and-import`. The handler shows the picker, mints an HMAC token bound to the chosen path, POSTs /api/import/folder via the discovered web URL with the token + body, and returns the daemon's ImportFolderResponse to the renderer (or a structured failure envelope). Renderer never sees the path or the token. - shell:open-path now consults a new pure helper `isOpenPathAllowedForProject` that refuses folder-imported projects whose metadata lacks `fromTrustedPicker: true`. This is the literal interpretation of mrcfps's round-3 follow-up: openPath is gated to projects whose resolvedDir came from the trusted-picker flow, not just transitively via the import gate. Native projects (no baseDir → daemon-owned <projectsRoot>/<id>) are always safe to open. 
- fetchResolvedProjectDir now returns a `ResolvedProjectDirContext` with hasBaseDir + fromTrustedPicker so the openPath handler can enforce the marker check. - New `signDesktopImportToken` pure helper mirrors the daemon-side signer with the same `~`-separated wire shape, exported for the packaged workspace's test file. Preload bridge: - `pickFolder` is deleted. The new `pickAndImport(init?)` returns the daemon's import response or a structured failure. `openPath` keeps its existing signature; its trust gate now lives in the main process. Web renderer: - electron.d.ts drops `pickFolder` and adds `pickAndImport` with the shared DesktopPickAndImportResult union pulled from contracts. - NewProjectPanel: when running on Electron (pickAndImport bridge present), the "Open folder" button calls pickAndImport atomically and forwards the response through a new `onImportFolderResponse` prop. On web (no bridge), the existing manual baseDir input keeps working — browser builds have no shell.openPath surface so a renderer-named path cannot escalate. - EntryView and App.tsx pass through the new callback. App's `handleImportFolderResponse` updates state from the response without a second fetch (the import already happened in the main process). Tests (apps/packaged/tests/desktop-project-root-gate.test.ts): - 3 cases for `isOpenPathAllowedForProject`: native allowed, trusted-picker allowed, legacy folder-import refused. - 6 cases for `signDesktopImportToken`: shape (~-separated), determinism, signature flips when secret/baseDir/nonce/exp changes. - Existing fetchResolvedProjectDir cases extended for the new `context` shape and additional cases that prove the metadata inspection (hasBaseDir, fromTrustedPicker) reads the daemon response correctly. 
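The token shape and HMAC payload described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the code in `server.ts` or `runtime.ts`: the names `mintSketchToken` and `verifySketchToken`, the base64url encoding of the nonce and signature, the 16-byte nonce size, and the in-memory nonce set are all assumptions made for the example. Only the `~`-separated shape, the `baseDir + "\n" + nonce + "\n" + exp` payload, the 60s TTL, and the 2× TTL upper bound come from the commit text.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

const TTL_MS = 60_000; // 60s TTL, as stated in the commit message

// HMAC-SHA256(secret, baseDir + "\n" + nonce + "\n" + exp); base64url output is an assumption.
function signPayload(secret: Buffer, baseDir: string, nonce: string, exp: string): string {
  return createHmac("sha256", secret).update(`${baseDir}\n${nonce}\n${exp}`).digest("base64url");
}

// Hypothetical signer: returns `${nonce}~${exp}~${signature}` with an ISO 8601 expiry.
function mintSketchToken(secret: Buffer, baseDir: string, now: number = Date.now()): string {
  const nonce = randomBytes(16).toString("base64url");
  const exp = new Date(now + TTL_MS).toISOString();
  return `${nonce}~${exp}~${signPayload(secret, baseDir, nonce, exp)}`;
}

// Nonces that have already been accepted once (assumed to live in process memory).
const seenNonces = new Set<string>();

// Hypothetical verifier: rejects malformed, expired, oversized-window, forged, and replayed tokens.
function verifySketchToken(secret: Buffer, baseDir: string, token: string, now: number = Date.now()): boolean {
  const parts = token.split("~");
  if (parts.length !== 3) return false;
  const [nonce, exp, signature] = parts as [string, string, string];
  const expMs = Date.parse(exp);
  if (Number.isNaN(expMs)) return false;
  if (expMs <= now) return false; // expired
  if (expMs > now + 2 * TTL_MS) return false; // expiry beyond 2x the TTL
  const expected = Buffer.from(signPayload(secret, baseDir, nonce, exp));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return false;
  if (seenNonces.has(nonce)) return false; // replay
  seenNonces.add(nonce); // only authentic tokens consume a nonce
  return true;
}
```

Checking the signature before recording the nonce is a choice made for this sketch, so that a forged token cannot burn a nonce that a legitimate token later needs.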
* fix(daemon): make desktop import-folder gate fail-closed (PR #974 round 4) lefarcen P1 on round 3 of PR #974: the gate's `secret == null → accept` branch (originally intended to keep web-only deployments unaffected) let a renderer bypass the import boundary in two real desktop edges: - Startup race: desktop's REGISTER_DESKTOP_AUTH IPC hasn't reached the daemon yet, but the renderer is already alive in the BrowserWindow and races to fetch /api/import/folder directly with arbitrary baseDir. - Daemon restart mid-session: the new daemon process boots tokenless while a desktop is still running. Same shape: renderer fetches the route, daemon falls through to "web mode", accepts the untrusted baseDir. shell.openPath rejects (no fromTrustedPicker marker) but the daemon's other file APIs (read/write project files, list directories) operate on the attacker-chosen path. Two coordinated mechanisms close that: (1) Sticky in-process flag. `desktopAuthEverRegistered` flips to true on first non-null `setDesktopAuthSecret(...)` and never goes back. setDesktopAuthSecret(null) (used by tests) does NOT relax the gate so production code can never silently fall back to fail-open. Add `resetDesktopAuthForTests()` for vitest cleanup. (2) Orchestrator-pinned mode via OD_REQUIRE_DESKTOP_AUTH=1 read at module load. tools-dev / tools-pack / apps/packaged set this when the daemon is spawned in a desktop-bundled flow (separate commits). With the env set, the gate is active from request 0 — a renderer racing /api/import/folder before registration completes gets a 503 DESKTOP_AUTH_PENDING (transient, retry). Standalone-daemon (web-only) deployments where neither mechanism fires keep the gate dormant and the route's behavior unchanged. Also addresses lefarcen P3 (whitespace HMAC mismatch): the desktop signs the exact picker output, so the daemon must verify the same string. 
The previous version trimmed `baseDir` before HMAC, which would reject legitimate paths whose final component carried edge whitespace. Use the raw request-body baseDir for verification; the existing trim()+realpath() logic still normalizes for fs operations. New error code: `DESKTOP_AUTH_PENDING` (HTTP 503, retryable). Tests: - `stays fail-closed (503 DESKTOP_AUTH_PENDING) after a registered secret is cleared` — exercises the sticky flag. - `verifies the exact request-body baseDir, not a trimmed version` — pins the round-4 P3 fix. - All existing desktop-import-token-gate cases continue to pass; the beforeEach/afterEach/afterAll resetters now use resetDesktopAuthForTests() to honor the sticky flag. * fix(tools-dev,packaged): pin desktop import-auth on daemon spawn PR #974 round-4 P1 follow-through. The daemon-side fail-closed gate needs OD_REQUIRE_DESKTOP_AUTH=1 in the daemon's spawn env whenever the daemon is paired with a desktop, so the gate is active from request 0 and the daemon-restart-mid-session bypass cannot reopen. tools-dev: - spawnDaemonRuntime accepts a `requireDesktopAuth` option that appends OD_REQUIRE_DESKTOP_AUTH=1 to the spawn env. - startDaemon takes the same flag and additionally checks whether a desktop runtime is already alive in this namespace; either branch pins the env (revival case where the daemon died mid-session and the user runs `tools-dev start daemon` to bring it back up). - startApp threads the bundled-target list down so the daemon spawn knows when desktop is queued in the same orchestration even though the daemon starts first. - The `start` / `restart` / `run` command actions pass the resolved target list into startApp. apps/packaged: - Packaged builds always pair a desktop with the daemon, so startPackagedSidecars unconditionally sets OD_REQUIRE_DESKTOP_AUTH=1 in the daemon child env. Headless builds also flow through this same path, so the same gate applies. 
Standalone-daemon flows unaffected: `tools-dev start daemon` (alone, no desktop running, no desktop in the bundled target list) does not set the env, and the daemon's gate stays dormant — current web-only behavior is preserved. * fix(desktop,web): align project-id regex with daemon; surface pickAndImport failures mrcfps round-4 nits on PR #974. apps/desktop/src/main/runtime.ts (mrcfps #1): the previous client-side regex `^[a-zA-Z0-9_-]+$` rejected `.` even though the daemon's canonical isSafeId / POST /api/projects accept `[A-Za-z0-9._-]{1,128}`. Result: dotted ids like `my-project.v2` were valid backend-side but got "project id contains disallowed characters" before fetchResolvedProjectDir even hit the network, regressing Continue in CLI / Finalize for those projects. Align the regex with the daemon's shape, comment-tag the rationale. apps/packaged/tests/desktop-project-root-gate.test.ts: add a regression case for a dotted id and one for the 128-char length cap (the new regex exposes both, the old regex obscured the dotted one). apps/web/src/components/NewProjectPanel.tsx (mrcfps #2): the `if (!result \|\| result.ok !== true) return` branch swallowed every non-OK pickAndImport shape (`desktop auth secret not registered`, `web sidecar URL not available`, daemon HTTP errors with details) the same way as the explicit `{ canceled: true }` cancel — leaving the user with a silent no-op when the trusted-picker flow couldn't even get off the ground. Reserve silent-return for the cancel case only; surface every other reason via a Toast (existing component, already used by ProjectView for related Continue-in-CLI flows). The new `formatPickAndImportErrorDetails` helper flattens daemon ApiError envelopes into a single readable secondary line so the operator sees both the category ("Open folder failed: daemon returned HTTP 503") and the upstream reason ("desktop auth required but secret not yet registered"). 
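The project-id regex mismatch described above can be illustrated with the two patterns quoted in the message. This is a sketch only: `isSafeProjectIdSketch` is a hypothetical name, the anchors on the daemon-side shape are added here for a full-string match, and whether the real `isSafeId` applies further checks (for example rejecting `.` or `..`) is not stated in the commit text.

```typescript
// Old desktop-side pattern quoted in the commit message: dots are rejected.
const OLD_CLIENT_ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Daemon-aligned shape quoted in the commit message, anchored for a full-string match.
const DAEMON_ALIGNED_ID_PATTERN = /^[A-Za-z0-9._-]{1,128}$/;

// Hypothetical helper: true when the id fits the daemon-aligned shape.
function isSafeProjectIdSketch(id: string): boolean {
  return DAEMON_ALIGNED_ID_PATTERN.test(id);
}
```

With these patterns, a dotted id such as `my-project.v2` fails the old check but passes the aligned one, and the `{1,128}` quantifier enforces the length cap that the old `+` quantifier did not.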
* docs(architecture): document desktop folder-import auth boundary lefarcen P3 on PR #974 round 4: the `Folder import` section in docs/architecture.md still documented only realpath / sandbox / RUNTIME_DATA_DIR checks and omitted the new desktop HMAC trust boundary, replay/TTL behavior, fail-closed semantics, daemon-restart edge, and legacy-import migration note. Without that subsection it's hard to review whether the 60s TTL, the `~`-separated token shape, or the legacy folder-imports needing re-pick are intentional product decisions or overlooked gaps. Add a "Desktop folder-import auth (PR #974)" subsection covering: - The trust handshake (32-byte secret over sidecar IPC at desktop boot). - Token shape (`${nonce}~${exp}~${signature}`), HMAC payload, and why `.` cannot be the field separator (ISO 8601 expiries embed dots). - TTL and replay behavior (60s, single-use, 2× TTL upper bound). - Fail-closed mechanisms — sticky in-process flag and OD_REQUIRE_DESKTOP_AUTH env var pinning. - Web-only deployments are unaffected (browser builds have no shell.openPath surface). - The `metadata.fromTrustedPicker` marker and the openPath-side defense-in-depth check. - Legacy folder-imports need re-pick to use the Continue-in-CLI button. - Daemon-restart edge: 503 DESKTOP_AUTH_PENDING until desktop re-registers; restart desktop to recover. * fix(packaged): skip desktop-auth gate in headless mode (PR #974 round 5 P2) Round 5 (lefarcen P2): packaged headless mode (daemon+web only, no Electron) was inheriting OD_REQUIRE_DESKTOP_AUTH=1 from the round-4 unconditional pin in startPackagedSidecars. Headless never runs desktop main, so no client could ever register an HMAC secret and folder import returned 503 DESKTOP_AUTH_PENDING permanently — even though headless has no shell.openPath surface to exploit. 
Plumb a required `requireDesktopAuth: boolean` option through startPackagedSidecars: apps/packaged/src/index.ts (Electron entry) passes true; apps/packaged/src/headless.ts passes false. Extract buildPackagedDaemonSpawnEnv as a pure helper so vitest can pin both branches without spawning a child process. Tests added in apps/packaged/tests/sidecars.test.ts cover both branches plus OD_LEGACY_DATA_DIR / daemonCliEntry env forwarding edges. Refs: nexu-io/open-design#974 * fix(desktop,daemon): lazy auth retry + canonical HMAC binding (PR #974 round 5 P1+P3) Round 5 (lefarcen P1, mrcfps): a daemon restart under OD_REQUIRE_DESKTOP_AUTH=1 left desktop holding a stale secret while the new daemon process required a fresh registration — folder import returned 503 DESKTOP_AUTH_PENDING permanently until the user restarted desktop. Same dead-end if the startup handshake missed its retry window. Round 5 (lefarcen P3): the daemon verified the HMAC against raw request-body baseDir, then trimmed before realpath(). A picker selection of "/tmp/foo " could authorize an import of "/tmp/foo" — token bound to a different path than the one imported. Three coordinated fixes: 1. P1 lazy retry: extract pickAndImportFolder as a pure helper that takes injected fetch / mintToken / registerDesktopAuth deps. On 503 DESKTOP_AUTH_PENDING from /api/import/folder, re-invoke the registration callback once, mint a fresh token (new nonce + new exp keeps replay protection), and POST again. Single retry, no infinite loop. Other failure shapes return immediately to the renderer. 2. P1 wiring: runDesktopMain now ALWAYS passes desktopAuthSecret to the runtime regardless of whether the initial handshake succeeded, plus a registerDesktopAuthWithDaemon callback the runtime invokes lazily. Soften the startup warning text to match the new recovery semantics. 3. P3 binding: trim picker output ONCE on the desktop side before both signing the HMAC and POSTing. 
Daemon-side verification stays against raw request-body baseDir (round-4 behavior); the daemon's defensive trim before realpath() is now a no-op for desktop traffic and only load-bearing for web-mode callers (path.isAbsolute(" /foo ") is false). End-to-end: desktop-signed string == request body == HMAC- verified string == realpath() input. Tests: - apps/packaged/tests/desktop-pick-and-import.test.ts (NEW, 7 cases): lazy-retry happy path; lazy-retry exhausted (re-register WAS called); single-attempt happy path (no unnecessary IPC); optional-callback no-op; non-503 failures bypass retry; network errors; non-PENDING 503 bypasses retry. - apps/daemon/tests/desktop-import-token-gate.test.ts: replace round-4 whitespace test with two round-5 binding tests — the trimmed string flows end-to-end (HMAC verifies, project metadata.baseDir equals realpath of trimmed input), and a request whose body baseDir diverges from the HMAC-bound string is rejected 403. docs/architecture.md §"Desktop folder-import auth" — update the daemon- restart-edge bullet to describe the lazy-retry recovery (round 4 said "restart desktop to recover", which is now wrong) and add a headless- packaged-mode bullet describing the round-5 P2 gate exclusion. Refs: nexu-io/open-design#974 * feat(sidecar-proto,daemon): surface desktopAuthGateActive over STATUS IPC (PR #974 round 6 prep) Round 6 (mrcfps): the split-start dev flow `tools-dev start daemon` -> `tools-dev start desktop` was leaving the daemon ungated because `OD_REQUIRE_DESKTOP_AUTH=1` is only injected when daemon and desktop spawn in the same orchestrator invocation. To fix that, tools-dev needs to introspect the running daemon's gate state before launching desktop main — but the existing STATUS IPC didn't carry the flag. 
This commit extends `DaemonStatusSnapshot` with a required `desktopAuthGateActive: boolean` and wires the daemon sidecar's STATUS handler (and the public `status()` method on the handle) to recompute the value from `isDesktopAuthGateActive()` per request, since the flag flips after `REGISTER_DESKTOP_AUTH` and stays sticky. Extracted `withCurrentDesktopAuthGate(snapshot)` as a tiny pure helper so the wiring is testable without booting a real IPC server. The new test pins four scenarios: - no secret registered (web-only mode) -> false - after `setDesktopAuthSecret(buf)` -> true - after `setDesktopAuthSecret(null)` (sticky) -> still true - input snapshot's stale value is overridden by the live flag The orchestrator-side consumer lands in the next commit (`tools/dev/src/desktop-auth-gate.ts`). Refs: nexu-io/open-design#974 * fix(tools-dev): auto-restart ungated daemon before desktop start (PR #974 round 6 mrcfps) Round 6 (mrcfps): the split-start dev sequence `tools-dev start daemon` -> `tools-dev start desktop` was leaving the daemon running without `OD_REQUIRE_DESKTOP_AUTH=1`. The env var is only injected when (A) daemon and desktop spawn in the same orchestrator invocation (`startApp` line ~682) or (B) a desktop runtime is already alive at daemon spawn time (`startDaemon` lines ~595-596). Neither fires for the split flow, so a renderer (or any local HTTP client) could `POST /api/import/folder` directly with an arbitrary `baseDir` before the desktop's first registration POST. Round-5's lazy retry didn't help: it triggers on `503 DESKTOP_AUTH_PENDING`, and the ungated daemon returns 200. Close the gap by introspecting the running daemon's `desktopAuthGateActive` (added to the STATUS IPC in the prior commit) at the start of `startApp(DESKTOP, ...)`. When the daemon reports the gate inactive, stop the daemon (and web, if running), respawn the daemon with `requireDesktopAuth: true`, restart web, then proceed with the desktop start. 
Restart order is critical and pinned by tests: web stops FIRST (so the web->daemon proxy doesn't serve a transient 502 against the down-then-up daemon), then daemon stops, then daemon respawns gated, then web restarts. The bundled-targets path (`pnpm tools-dev`) is unaffected because trigger (A) already armed the gate at first daemon spawn — the helper costs one ~800ms STATUS IPC roundtrip and returns no-op. Helper lives in its own module (`tools/dev/src/desktop-auth-gate.ts`) so the regression test can import it without triggering the `cli.parse()` side effect at the bottom of `tools/dev/src/index.ts`. Five `node:test` cases pin the call sequence — no daemon, gate active, gate inactive + no web, gate inactive + web running, log shape — so a future refactor can't silently regress the gate. Two synthetic `DaemonStatusSnapshot` literals in `inspectAppStatus` and `inspect` (used when the IPC is unreachable) get `desktopAuthGateActive: false` to satisfy the now-required type field — semantically correct since "no daemon answering" trivially means "no gate active." `docs/architecture.md` adds a new bullet under the Desktop folder- import auth section describing this auto-restart behavior. Refs: nexu-io/open-design#974 * fix(daemon): combine finalize request-abort + timeout signals (PR #974 round 7 lefarcen P1) Round 6 wired the route handler to pass `finalizeAbort.signal` into `finalizeDesignPackage`, but the helper only created its own DEFAULT_TIMEOUT_MS controller when no caller signal was supplied. The result: a client that stayed connected could hold the finalize lock and upstream call indefinitely. Always create the timeout controller; when the caller passes a signal, combine both via `AbortSignal.any` so neither cancel path replaces the other. 
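The signal-combining fix can be sketched as below. This is an illustration under assumptions, not the body of `finalizeDesignPackage`: `combinedSignalSketch` and its `dispose` callback are hypothetical names, and the timeout is passed in explicitly rather than read from `DEFAULT_TIMEOUT_MS`. It relies on `AbortSignal.any`, which needs a reasonably recent Node.js runtime.

```typescript
// Hypothetical helper: always arms a timeout, and combines it with the caller's
// signal (when one is supplied) so that either source can cancel the work.
function combinedSignalSketch(
  callerSignal: AbortSignal | undefined,
  timeoutMs: number,
): { signal: AbortSignal; dispose: () => void } {
  const timeoutController = new AbortController();
  const timer = setTimeout(
    () => timeoutController.abort(new Error("finalize timed out")),
    timeoutMs,
  );
  const signal = callerSignal
    ? AbortSignal.any([callerSignal, timeoutController.signal])
    : timeoutController.signal;
  // dispose() clears the timer once the guarded work has settled.
  return { signal, dispose: () => clearTimeout(timer) };
}
```

A caller would pass `signal` to the upstream request and call `dispose()` in a `finally` block, so a finished call does not leave a pending timer behind.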
Adds two regression tests in finalize-design.test.ts: - timeout fires when caller signal never aborts - pre-aborted caller signal still cancels Adds an internal `timeoutMs` option to FinalizeOptions so tests can exercise the abort path without a 120 s wait or fake-timer chains. Production callers omit it; default remains DEFAULT_TIMEOUT_MS. * fix(daemon): allow PATCH preserving existing fromTrustedPicker marker (PR #974 round 7 lefarcen P2) The PATCH /api/projects/:id handler was rejecting any metadata that contained `fromTrustedPicker`, including the unchanged `true` marker that the linked-folder UI re-spreads when editing `linkedDirs`. Trusted folder-imported projects could not update other metadata fields without 400-ing on their own marker. Switch the rejection condition from `'in'` to a value comparison: only reject when the incoming value differs from the persisted one (`patch.metadata.fromTrustedPicker !== existingMeta?.fromTrustedPicker`). That keeps acquisition (existing=undefined, patch true) and flip (existing=true, patch false) attempts blocked while letting the UI re-spread the existing marker. POST /api/projects stays strict; that path has no existingMeta. Adds two regression tests in desktop-import-token-gate.test.ts: - allows PATCH preserving the existing fromTrustedPicker:true marker - rejects PATCH that flips fromTrustedPicker on a trusted project * fix(desktop,packaged): main-process api uses daemon URL not webUrl (PR #974 round 7 lefarcen P2) Packaged builds load the renderer from `od://app/` and report that URL through `discoverWebUrl`. But Node-side `globalThis.fetch` (undici) does not route through Electron's registered `od://` protocol handler — that handler runs in the renderer's protocol scope, not in main-process Node. So `pickAndImportFolder` and `fetchResolvedProjectDir` calls from main silently failed in packaged builds against the protocol scheme. Add `discoverDaemonUrl` to `DesktopRuntimeOptions` and `DesktopMainOptions`. 
The packaged shell already has the sidecar's real `http://127.0.0.1:<port>` URL (`sidecars.daemon.url` from STATUS IPC) — thread it through to the runtime. Main-process API calls now prefer the daemon URL and fall back to the renderer URL for tools-dev (where it is itself http://127.0.0.1). `PickAndImportFolderDeps.webUrl` renamed to `apiBaseUrl` so the boundary is explicit at the type level; `fetchResolvedProjectDir`'s first parameter renamed similarly. tools-dev callers see no behavior change — their web URL is already an http://127.0.0.1 URL Node fetch can hit. Test (`apps/packaged/tests/desktop-pick-and-import.test.ts`): - existing 7 cases updated to the new prop name (no behavior change) - new case pins URL composition: builds `${apiBaseUrl}/api/import/folder` and never produces a custom-protocol URL. Note for review: this test pins URL composition; full Electron protocol handler integration (renderer fetch through `od://`) is not exercised in unit tests here. * fix(tools-dev): preserve daemon/web ports across desktop-auth gate restart (PR #974 round 7 lefarcen P2) Round 6 added the split-start auto-restart in ensureDaemonGateForDesktop to close the dev-flow gap where `start daemon` then `start desktop` left the daemon ungated. The restart was passing the current `start desktop` CLI options to startDaemonGated/startWeb, which meant a stack started with `--daemon-port 17456 --web-port 17573` could be silently moved to random ports during the hardening restart, breaking browsers and scripts pinned to those ports. Extract the running ports from the STATUS snapshots (daemon.url and web.url) and forward them as explicit `{ port }` callback args. The closure in `tools/dev/src/index.ts` overrides the corresponding option when a port was extracted; null falls back to the original CLI flags. 
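The port-preservation step can be sketched like this. `extractPortSketch` is a hypothetical name, and the actual extraction logic in `tools/dev/src/desktop-auth-gate.ts` may differ; the sketch only shows how a running port could be recovered from a STATUS snapshot URL, with `null` signalling "fall back to the original CLI flags".

```typescript
// Hypothetical helper: returns the explicit port of a snapshot URL, or null when
// the URL is missing, unparsable, or carries no explicit port.
function extractPortSketch(url: string | null | undefined): number | null {
  if (!url) return null;
  try {
    const { port } = new URL(url);
    // URL.port is "" when no port is given or when it is the scheme's default.
    return port === "" ? null : Number(port);
  } catch {
    return null;
  }
}
```

One caveat of this approach: the WHATWG URL parser drops a scheme's default port, so `http://127.0.0.1:80` would yield `null` here rather than `80`.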
Adds three regression tests in tools/dev/tests/desktop-auth-gate.test.ts: - preserves the running daemon port across the hardening restart - preserves the running web port across the hardening restart - falls back to caller options (port:null) when the URL has no port * fix(web): refresh useDesignMdState on file/chat events (PR #974 round 7 mrcfps) useDesignMdState() previously only recomputed on mount and on explicit refresh() (called once after finalize). Once the user kept working — editing files or sending more chat turns — the stale/fresh badge could drift out of sync because file mtimes and conversation updatedAt moved past the recorded generatedAt without the hook re-checking. Hook accepts an optional `refreshKey: number` arg; ProjectView keeps a counter and bumps it on three events: - file-changed SSE (covers tool-emitted file mutations) - live_artifact* SSE (covers chat turns that emit artifacts) - streaming `true → false` edge (covers pure-text chat turns) The hook treats refreshKey as a compute() dep; React's Object.is comparison short-circuits the no-op renders, so each bump is a single recompute pass. Adds a regression test in useDesignMdState.test.tsx: - flips stale state after a refreshKey bump without remounting * fix(web): degraded-state useDesignMdState on malformed provenance (PR #974 round 7 mrcfps) useDesignMdState used to report `{ isStale: false, staleReason: null }` when the parser could not extract a comparison timestamp from the DESIGN.md `## Provenance` section. The pinned test made that the documented behavior. As mrcfps pointed out, that fails open exactly when the freshness signal is most untrustworthy: any provenance- formatting drift silently disables the staleness warning. Extend `DesignMdStaleReason` with a third variant `'unknown-provenance'`. On `generatedMs === null`, return `{ isStale: true, staleReason: 'unknown-provenance' }`. 
ContinueInCliButton renders a distinct chip text "Spec freshness unknown — regenerate to refresh signal" for that variant; the button stays enabled because not-comparable is not the same as broken state. Tests: - modify the existing pinned test to assert the new degraded state - add an end-to-end useDesignMdState test feeding a malformed Provenance section through compute() so a regression that re-pins fresh-on-null at the hook level (not just computeStale) fails fast - add ContinueInCliButton render + click tests for the new chip --------- Co-authored-by: DevForgeAI CI/CD Engineer <devforge-ai@development.ai> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-10 11:44:32 +08:00
Marc Chan	b03a504da6	release: Open Design 0.6.0 (#1080)	2026-05-09 19:58:11 +08:00
lefarcen	2bb029cb58	release: Open Design 0.5.0 (#820) 0.5.0 has already been released from `c21cbc6` (https://github.com/nexu-io/open-design/releases/tag/open-design-v0.5.0); this squash brings the version bump and the CHANGELOG [0.5.0] entry into main's history, so that the upcoming 0.5.1 release can go through the standard dispatch flow on main.	2026-05-08 00:41:01 +08:00
iulian	80416b185a	Diagnose missing Next package during tools-dev web startup (#675) * fix(tools-dev): diagnose missing Next package * fix(web): remove duplicate Ukrainian prompt labels	2026-05-06 20:45:41 +08:00
lefarcen	ae4a08773a	chore(release): prepare 0.4.1 (#659 ) - bump remaining monorepo package.json files to 0.4.1 after apps/packaged was already bumped in #637 - add CHANGELOG.md [0.4.1] - 2026-05-06 entry covering the startup hotfix and 19 merged PRs since 0.4.0: - Added: manual edit mode (#620), Cmd/Ctrl+P quick file switcher (#556), resizable chat panel (#563), PI status/cancel updates (#618), accessibility and RTL/Bidi craft modules (#587, #595), i18n structure checks (#608) - Changed: first-PR README links now surface help-wanted issues (#605) - Fixed: packaged contracts runtime exports (#577), packaged runtime beta gating (#637), ACP/MCP/agent fixes (#604, #612, #627), conversation error recovery (#623), native mac quit (#637) - Documentation/Internal: OD_DATA_DIR migration docs (#570), Simplified Chinese QUICKSTART (#578), zh-TW/ko README syncs (#586, #619), generated metrics (#592) Release workflow validation runs after merge via release-stable.	2026-05-06 18:05:56 +08:00
lefarcen	963bbf2500	release: Open Design 0.4.0 (#454)	2026-05-05 23:39:40 +08:00
ChildhoodAndy	009d7a5478	refactor(daemon): eliminate duplicate dist tree from two-tsconfig build (#553 ) Move sidecar source under src/ so a single tsconfig produces all daemon output. Removes the parallel dist/src/ tree that was emitted by tsconfig.sidecar.json (it included src/*/.ts to type-check the `../src/server.js` cross-tree import). Build now emits: - dist/<flat> (cli.js, server.js, app-version.js, ...) - dist/sidecar/{index,server}.js `dist/sidecar/server.js` reaches the main daemon via `../server.js` instead of `../src/server.js`, so there is no second copy of the source tree in the published tarball. Background — issue #534 (already fixed by #537): The packaged Settings → About panel showed 0.0.0 because the sidecar chain loaded the duplicated `dist/src/app-version.js`, where the fixed `new URL('../package.json', import.meta.url)` resolved to a non-existent `dist/package.json`. #537 patched the symptom by walking parents until a real `package.json` is found and by writing `appVersion` into the Linux packaged config. Both stay in place — they're sound defenses — but the underlying duplicate-emit was never addressed; any future relative resource lookup (templates, schemas, prompts) anchored on `import.meta.url` would have hit the same trap. This change removes the trap.	2026-05-05 23:31:14 +08:00
PerishFire	bbdd4e84b5	chore: enforce test directory conventions (#496 ) * chore: enforce test directory conventions Move package, app, and tool tests out of src and add guard enforcement so source directories stay source-only. * ci: use guard and package-scoped tests Run the new repository guard in CI and keep test execution aligned with package-scoped commands after removing root aliases. * ci: align stable release guard check Use the new repository guard in stable release verification after replacing the residual-JS-only script. * chore: tighten test layout enforcement Enforce sibling tests directories, typecheck moved test suites with dedicated configs, and refresh remaining guidance that pointed at src-based tests. * chore: clarify no-emit test tsconfigs Explicitly disable declaration-only emit in test tsconfigs so review tooling sees they are no-emit typecheck configs.	2026-05-05 15:34:22 +08:00
lefarcen	a719f02aa2	fix(web): normalize daemon proxy origins Fix web sidecar proxy requests so same-origin browser requests reach the daemon without tripping origin validation, while unrelated origins remain rejected. Fixes #388.	2026-05-04 03:39:19 +08:00
lefarcen	016c08183f	release: Open Design 0.3.0	2026-05-03 23:07:28 +08:00
Sid	648374d839	fix(platform): wrap cmd.exe shim invocations to survive /s /c quote stripping (#339 ) PR #258 standardized agent spawning through `createCommandInvocation`, which on Windows wraps `.cmd` / `.bat` paths in `cmd.exe /d /s /c <line>` and quotes each argument with cmd-style doubled quotes. PR #232's follow-up fix for `shell:true` was lost in that refactor, and the new shape has its own quoting bug on argv-style spawn: 1. cmd.exe `/s /c` strips exactly one leading and one trailing `"` from the rest of the command line. 2. Node, with `windowsVerbatimArguments` unset, escapes each argv element using CommandLineToArgvW rules — so the inner `"path with space"` ends up surfacing to cmd.exe with an extra layer of `\"` escaping that cmd doesn't understand. Together these collapse `"C:\Users\Ethical Byte\...\codex.CMD" --help` into `C:\Users\Ethical Byte\...\codex.CMD --help` with no quoting preserved, and cmd.exe parses the first space as a token boundary — "`Ethical` is not recognized as an internal or external command." See issue #315 for the full repro. The fix mirrors what Node's own `child_process.spawn({ shell: true })` does internally: wrap the entire joined command line in an extra `"…"` and set `windowsVerbatimArguments: true`. The outer wrap absorbs the `/s /c` strip, leaving inner per-arg quoting intact, and the verbatim flag tells Node to pass argv through to CreateProcess unchanged. Changes: - `packages/platform/src/index.ts` - Extend `CommandInvocation` with optional `windowsVerbatimArguments`. - Extract the cmd.exe shim builder into `buildCmdShimInvocation` and apply the outer wrap + verbatim flag in both `createCommandInvocation` and `createPackageManagerInvocation`. - Forward the flag through `spawnBackgroundProcess` and `spawnLoggedProcess`. - `apps/daemon/src/server.ts` — agent spawn forwards `invocation.windowsVerbatimArguments`. This is the call site that hit #315 in the wild (Codex CLI `.CMD` shim, user dir with space). 
- `tools/pack/src/win.ts` — `runPnpm` and `runNpmInstall` forward the flag through `execFileAsync`. Affects the Windows packaged-build pipeline when run from a path with spaces. - `tools/dev/src/index.ts` — `runLoggedCommand` accepts and forwards the flag; `buildDesktop` propagates it from `createPackageManagerInvocation`. Affects local dev on Windows. Tests: - 9 new unit tests in `packages/platform/src/index.test.ts` stub `process.platform` so both Windows and POSIX branches run on every CI runner. Coverage: - POSIX pass-through. - Windows non-shim binary pass-through. - `.CMD` shim with spaces in the binary path (the #315 repro). - `.bat` shim parity. - Argv elements with spaces alongside the shim path. - Argv elements without whitespace stay unquoted. - `process.env.ComSpec` fallback. - `npm_execpath` short-circuit (cross-platform). - POSIX pnpm pass-through. - Windows pnpm wrapped through cmd.exe. Closes #315.	2026-05-03 10:00:46 +08:00
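The outer-wrap fix described in this commit can be sketched as follows. This is not the code in `packages/platform/src/index.ts`: `buildCmdShimInvocationSketch`, `quoteForCmdSketch`, the `comSpec` parameter, and the example path are hypothetical, and the exact quoting rules of the real builder are not spelled out in the message beyond "cmd-style doubled quotes" and "argv elements without whitespace stay unquoted".

```typescript
interface ShimInvocationSketch {
  command: string;
  args: string[];
  windowsVerbatimArguments: true;
}

// Quote only when needed; embedded quotes are doubled, cmd-style.
function quoteForCmdSketch(arg: string): string {
  return /[\s"]/.test(arg) ? `"${arg.replace(/"/g, '""')}"` : arg;
}

// Hypothetical builder: joins the per-arg quoted line, then wraps the whole line
// in one extra pair of quotes for `/s /c` to strip.
function buildCmdShimInvocationSketch(
  shimPath: string,
  args: string[],
  comSpec: string = "cmd.exe", // assumed default when ComSpec is not supplied
): ShimInvocationSketch {
  const line = [shimPath, ...args].map(quoteForCmdSketch).join(" ");
  return {
    command: comSpec,
    args: ["/d", "/s", "/c", `"${line}"`],
    // Ask Node to pass argv to CreateProcess without re-escaping it.
    windowsVerbatimArguments: true,
  };
}
```

The result is meant to be handed to `child_process.spawn(command, args, { windowsVerbatimArguments })`; after cmd.exe strips the outer pair of quotes, the inner quoting around a path with spaces is still intact.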
lefarcen	62b01a6dbf	release: Open Design 0.2.0 (#297 )	2026-05-02 22:28:59 +08:00
Kevin Tsai	c0589ed05e	fix(desktop): launch reliably on Windows from Electron-based parent shells (#292 ) * fix(tools-dev): strip ELECTRON_RUN_AS_NODE before spawning desktop Parent processes such as Electron-based IDEs may set ELECTRON_RUN_AS_NODE=1 in their environment for sidecar/script reuse. When tools-dev inherits this env via process.env, the spawned electron.exe runs as plain Node and fails to inject main-process APIs (app, BrowserWindow, protocol all become undefined). Explicitly drop the variable before spawning so desktop always boots in real Electron mode regardless of caller environment. * fix(desktop): ensure BrowserWindow is visible on initial load Windows focus-stealing prevention can leave detached-spawned GUI windows minimized or hidden, even when constructed with show:true. Add a small ensureWindowVisible helper that restores from minimized state and forces show+focus after the placeholder URL loads. Cross-platform safe: only acts when window is actually hidden or minimized, preserving any user window-state adjustments.	2026-05-02 22:28:56 +08:00
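The environment cleanup from the first part of #292 might look something like the following. It is a sketch only: `withoutElectronRunAsNode` and the `Env` alias are hypothetical names, and the case-insensitive key comparison is an assumption made here because a copied environment object, unlike `process.env` on Windows, is case-sensitive.

```typescript
type Env = Record<string, string | undefined>;

// Return a copy of the environment with ELECTRON_RUN_AS_NODE removed,
// leaving the caller's object untouched.
function withoutElectronRunAsNode(env: Env): Env {
  const next: Env = {};
  for (const key of Object.keys(env)) {
    if (key.toUpperCase() !== "ELECTRON_RUN_AS_NODE") {
      next[key] = env[key];
    }
  }
  return next;
}

// Example: an environment inherited from an Electron-based parent shell.
const inheritedEnv: Env = { PATH: "/usr/bin", ELECTRON_RUN_AS_NODE: "1" };
const desktopEnv = withoutElectronRunAsNode(inheritedEnv);
```

The cleaned object would then be passed as the `env` option when spawning the desktop process, so Electron starts in its normal mode regardless of what the parent shell exported.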
Foximo24	a4fd4f949f	fix(tools-dev): use junction instead of dir symlink on Windows (#231 ) ensureWebDevNodeModules() called fs.symlink(target, path, "dir") to link apps/web/node_modules into the web runtime root. On Windows, "dir" symlinks require either Administrator rights or Developer Mode (SeCreateSymbolicLinkPrivilege). Standard non-elevated user accounts without Developer Mode get EPERM and tools-dev exits before web ever starts. Junctions ("junction") are functionally equivalent for directory-only links on the same volume, work for any user without elevation or Developer Mode, and are silently treated as plain symlinks on POSIX. The existing isSymbolicLink() check on the next launch still matches junctions on Node. Reproduced on Windows 11 + Node 24 with a non-elevated PowerShell session and Developer Mode off.	2026-05-02 10:17:43 +08:00
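The change in #231 amounts to asking Node for a junction rather than a directory symlink. The snippet below is a self-contained sketch of that idea using a throwaway temp directory; `linkDirectory` and the directory names are hypothetical and are not taken from `ensureWebDevNodeModules()`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Link a directory without needing elevation on Windows.
// The type argument is only used on Windows; other platforms ignore it
// and create an ordinary symlink.
function linkDirectory(target: string, linkPath: string): void {
  fs.symlinkSync(target, linkPath, "junction");
}

// Demonstrate against a temporary directory (names are illustrative only).
const sketchRoot = fs.mkdtempSync(path.join(os.tmpdir(), "junction-sketch-"));
const sketchTarget = path.join(sketchRoot, "node_modules");
const sketchLink = path.join(sketchRoot, "linked_node_modules");
fs.mkdirSync(sketchTarget);
linkDirectory(sketchTarget, sketchLink);

// The same kind of check the commit message mentions for the next launch.
const linkIsSymlink = fs.lstatSync(sketchLink).isSymbolicLink();

// Clean up the temporary directory.
fs.rmSync(sketchRoot, { recursive: true, force: true });
```

Junctions only work for directories on local volumes, which matches the use described in the commit message.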
yamsfeer	4510c69ba1	feat(tools-dev): add --prod flag and OD_HOST for headless server deployment (#222 ) * feat(tools-dev): add --prod flag and OD_HOST for headless server deployment - Lazy-load electronBinaryPath so daemon+web can start without Electron - Add --prod flag to tools-dev start/run that sets NODE_ENV, OD_WEB_PROD, and OD_WEB_OUTPUT_MODE automatically for production Next.js builds - Add OD_HOST env var support to daemon and web sidecar bind addresses (defaults to 127.0.0.1, set to 0.0.0.0 for remote access) - Skip next.config.ts distDir override when OD_WEB_PROD=1 so production builds resolve the default .next directory Closes #221 * fix: address PR review — hardcode daemon bind to 127.0.0.1, cache electron path - Revert OD_HOST from daemon: daemon is a local privileged process and must only bind 127.0.0.1. Remote access goes through the web sidecar proxy. - Web sidecar resolveDaemonOrigin now always uses 127.0.0.1, separated from the web bind address (OD_HOST) so the daemon proxy works correctly even when the web listener binds 0.0.0.0. - Add OD_HOST character validation to reject clearly invalid values early. - Cache electronBinaryPath getter result to avoid repeated require() calls. --------- Co-authored-by: yamsfeer <yamsfeer@users.noreply.github.com>	2026-05-02 09:27:16 +08:00
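The `OD_HOST` handling that #222 settles on (web sidecar bind address configurable, default `127.0.0.1`, obviously invalid values rejected early) could be sketched as below. The function name `resolveWebBindHost` and the exact character set are assumptions for illustration; the commit message does not state which characters the real validation accepts.

```typescript
const DEFAULT_BIND_HOST = "127.0.0.1";

// Assumed allow-list: letters, digits, dots, colons, and hyphens, which
// covers hostnames, IPv4 addresses, and unbracketed IPv6 addresses.
const HOST_CHARS = /^[A-Za-z0-9.:-]+$/;

// Resolve the web sidecar bind host from a raw OD_HOST value.
function resolveWebBindHost(raw: string | undefined): string {
  if (raw === undefined || raw.trim() === "") return DEFAULT_BIND_HOST;
  const host = raw.trim();
  if (!HOST_CHARS.test(host)) {
    throw new Error(`Invalid OD_HOST value: ${JSON.stringify(raw)}`);
  }
  return host;
}

// Per the review fix, the daemon origin stays on loopback no matter what
// the web listener binds to.
const DAEMON_HOST = "127.0.0.1";
```

Under this sketch, `resolveWebBindHost("0.0.0.0")` returns `0.0.0.0` for remote access, while a value containing spaces or shell punctuation throws before any listener is started.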
Waleed978	89722379c5	fix(tools-dev): normalize web dev tsconfig paths on Windows (#174 ) tools-dev generated a temp web tsconfig with Windows backslash relative paths in extends, which Next/TypeScript failed to resolve in some environments. Normalize runtime tsconfig/dist path strings to POSIX separators so dev config resolution works consistently across Windows/Linux/macOS.	2026-04-30 22:45:02 +08:00
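The normalization described in #174 can be illustrated with a small helper. This is a sketch, not the tools-dev implementation: `toPosixPath` and the sample relative path are hypothetical.

```typescript
// Convert Windows backslash separators to POSIX forward slashes so that
// values written into a generated tsconfig resolve the same on every OS.
function toPosixPath(value: string): string {
  return value.replace(/\\/g, "/");
}

// Example: a relative path as path.relative() might produce it on Windows.
const windowsRelative = "..\\..\\apps\\web\\tsconfig.json";
const generatedTsconfig = { extends: toPosixPath(windowsRelative) };
```

Paths that already use forward slashes pass through unchanged, so the helper is safe to apply unconditionally on Linux and macOS.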
nettee	3fb849d047	Fix chat runs surviving web disconnects (#146 ) * fix chat runs surviving web disconnects * fix chat run create abort propagation Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5) * fix daemon keepalive reconnect budget Generated-By: looper 0.0.0-dev (runner=fixer, agent=gpt-5.5) * fix daemon stream disconnect cancellation Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5) * fix daemon stream abort cancellation race Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5) * fix daemon run cancellation semantics * fix load * doc * 2 * add run refresh recovery * fix active run refresh status * fix reattach abort handling * fix * fix chat initial scroll * fix daemon start failures Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * fix background run recovery Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * fix stop run status Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * fix background run recovery Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * extract daemon run service * move prompt composition to daemon * fix prompt module resolution * fix project id generation * add project run status * add designs kanban view with awaiting_input status - add grid/kanban view toggle on Designs tab; persist choice in localStorage - introduce awaiting_input project display status (daemon-derived from unanswered <question-form>) so projects asking the user aren't shown as Completed; ordered between Running and Completed with amber accent - hide transient queued state from users: coerce queued/starting to running in daemon /api/projects projection and drop the queued kanban column - a11y polish on Designs cards: Space activation, aria-labels on delete, focus-visible outlines, reveal delete on focus-within and touch, prefers-reduced-motion handling - kanban layout uses flex sizing instead of viewport math; scoped icon-only pill button rule fixes 
view-toggle icon alignment --------- Co-authored-by: mrcfps <mrc@powerformer.com>	2026-04-30 20:16:46 +08:00
Marc Chan	ac9c239b1e	Improve tools-dev native addon diagnostics (#153 ) Surface daemon log tails when tools-dev waits for daemon status and add targeted guidance for native Node addon ABI mismatches. Generated-By: looper 0.0.0-dev (runner=worker, agent=gpt-5.5)	2026-04-30 17:11:59 +08:00
nettee	86c256ad56	Improve tools-dev web startup flow (#128 )	2026-04-30 14:58:52 +08:00
PerishFire	a19c866d5b	Fix tools-dev default startup usability (#127 ) Allow sidecar port zero for auto allocation and make lifecycle command output easier to read.	2026-04-30 14:50:44 +08:00
PerishFire	c6d11018a0	Refresh desktop integration control plane (#123 ) * feat(dev): add desktop tools-dev control plane * refactor(sidecar): split Open Design contracts Move Open Design-specific sidecar protocol definitions into @open-design/contracts so sidecar and platform can remain descriptor-driven primitives. * refactor(daemon): organize package sources Keep daemon app code, tests, and sidecar entrypoints in separate package directories so each layer can be built and verified independently. * chore(repo): streamline maintenance entrypoints Centralize agent guidance by directory and reduce root command chains while preserving the existing build scope. * docs: translate agent guidance to English * fix(sidecar): tolerate stale IPC sockets Remove stale Unix socket files only after confirming no listener is active, so tools-dev can restart after unclean shutdowns.	2026-04-30 14:23:53 +08:00

35 commits