open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
Amy	1c2a1c4459	Add launch review regression coverage and stabilize daemon tests (#3207) * Add launch review E2E regression coverage * Harden daemon launch review regressions * Stabilize daemon runtime tests * fix(tests): restore e2e preflight typing Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(tests): make fake plugin runtime ESM-safe Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * Stabilize e2e fake agent and regression tests * fix(tests): repair fake agent cjs runtime Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(review): harden plugin authoring checks Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * fix(tests): bind plugin authoring run to seeded conversation Generated-By: looper 0.9.2 (runner=fixer, agent=codex)	2026-05-29 02:39:33 +00:00
lefarcen	df8a0faff6	feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355 ) * feat(runtimes): register AMR (vela) as an ACP stdio agent AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode` speaks ACP JSON-RPC over stdio (see vela's `specs/current/runtime/manual-agent-run-openrouter.md`); per `docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat: 'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc. The new `defs/amr.ts` is the entire wiring — `buildArgs` returns `['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses `detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN` allowlist + install/docs URLs, so users can configure the per-agent env in Settings without leaking into other adapters. Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns the documented `initialize` / `session/new` / `session/set_model` / `session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via `child_process.spawn` and drives a full turn through `attachAcpSession` and `detectAcpModels`, so the ACP transport contract for AMR is end-to-end verified locally even before a real `vela` binary is installed. Validated: - pnpm guard - pnpm typecheck (all workspace projects) - pnpm --filter @open-design/daemon test (2881/2881) Deferred: real OpenRouter-backed turn through a built `vela` binary — the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY` and `VELA_LINK_URL` in env (or Settings). * fix(runtimes/amr): pin a concrete default model and bare openai ids End-to-end validation against a freshly-built `vela` (nexu-io/vela@main) + OpenRouter surfaced two contract details the first AMR runtime def got wrong: 1. 
vela rejects `session/prompt` with `session/set_model must be called before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts skips set_model whenever the picked model is the synthetic 'default' id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The def now ships a concrete `gpt-5.4-mini` as both `fetchModels`' default option and `fallbackModels[0]`, which makes attachAcpSession always send a real `session/set_model` for AMR turns. 2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId it forwards to opencode's openai provider. With OpenRouter-style ids like `openai/gpt-5.4-mini`, opencode receives the double-prefixed `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`. The new fallback list ships the bare ids opencode's openai registry actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.). Stub + tests: - tests/fixtures/fake-vela.mjs now enforces the set_model gate the same way real vela does, so a regression that silently goes back to model: 'default' would surface as a fatal error in tests instead of a hidden production failure. - tests/amr-acp-integration.test.ts pins both contracts: no 'default' / no 'openai/' prefix in fallbackModels, and a negative case that asserts session/prompt fails when no model is set. Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time runner that drives `attachAcpSession` against a real `vela` binary and prints the daemon's chat events, so future protocol drift can be checked against an actual OpenRouter call. Verified locally: `vela agent run --runtime opencode` + OpenRouter returns the prompted string ("AMR-E2E-PASS") through the full daemon pipeline; daemon test suite stays 2883/2883. 
* fix(runtimes/amr): substitute concrete model when chat run sends 'default' A plugin-driven AMR run from the UI surfaced a real-world hole in the prior commit: json-rpc id 3: session/set_model must be called before session/prompt The Default-design-router plugin (and any caller that doesn't pin a real model) sends `model: 'default'` straight through, which the AMR runtime def cannot accept — vela rejects `session/prompt` without `session/set_model` and attachAcpSession skips set_model whenever model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the adapter's `fallbackModels` is not enough: the chat-run handler in server.ts still forwarded 'default' verbatim. This adds `resolveModelForAgent(def, resolved, env?)` as the single source of truth for the substitution: 1. If the caller picked a real id, pass it through. 2. Else, if `def.defaultModelEnvVar` is set and the daemon process env has a non-empty value for it, return that (operator escape hatch — see below). 3. Else, if the def's `fallbackModels` does NOT contain a 'default' id, return `fallbackModels[0].id`. 4. Else, return the original value (the historic shape — defs that list 'default' themselves are untouched). AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when opencode's openai-provider registry deprecates `gpt-5.4-mini` upstream, an operator can swap the fallback id without a code change by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev / od. Worth noting the env var must live in the daemon's `process.env` (Settings-UI per-agent env values only reach the spawned child, not the daemon's resolver) — the new field's docblock spells this out. Coverage: - `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all four resolver branches plus the env-override happy path / fallback / ignore-when-user-picked-a-real-id case. - `pnpm --filter @open-design/daemon typecheck` clean. 
* chore(runtimes/amr): move AMR to the top of the base agent list So `AMR (vela)` shows up first in the agent picker / status views, ahead of claude / codex. Pure ordering change; no behavior delta. * feat(amr): Sign-in / Sign-out button on the AMR Settings card The first half of the AMR work assumed the operator would set VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never surfaced login state to users. This adds the missing UX so a fresh install can drive the full path from Settings: - GET /api/integrations/vela/status reads ~/.vela/config.json for the active profile and returns { loggedIn, profile, user } (without leaking the runtime/control keys themselves). - POST /api/integrations/vela/login spawns `vela login` once (409 if one is already in flight). The vela CLI opens the user's browser to the device-authorization page itself — Open Design only needs to kick the subprocess off. - POST /api/integrations/vela/logout removes ~/.vela/config.json so the next status read returns logged-out. `AmrAgentCard` is a dedicated agent-card component for AMR because the existing `<button>` row can't host an interactive sub-control (nested interactive elements). It polls /status after a login click until the daemon reports loggedIn=true (or 5 minutes elapse), and exposes a Sign-out action on hover. Other adapters (claude, codex, hermes, …) keep their existing `<button>` card. i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.) added to en + zh-CN. Other locales spread `en` and inherit the English copy until translations land. Coverage: - `tests/integrations/vela.test.ts` pins the config.json reader against a tmp HOME — including the negative case where a profile has user info but no runtimeKey (still logged-out), and the secret-leak guard ("rt-secret-" must not appear in the projection payload). 
- `tests/components/AmrAgentCard.test.tsx` covers all four UI states (logged-out, logging-in, logged-in, logging-out) plus the click-propagation invariant the divergent card was built to keep. `pnpm --filter @open-design/daemon test` 2901 / 2901 passing. `pnpm --filter @open-design/web test` 1719 / 1719 passing. `pnpm typecheck` + `pnpm guard` clean. Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs` no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if VELA_PROFILE is set, the vela CLI is allowed to resolve credentials from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to `scripts/guard.ts` allowlist with the executable-fixture / dev-runner rationale. fix(connection-test): substitute model for AMR before attachAcpSession The chat-run path in server.ts already routes the requested model through `resolveModelForAgent` so AMR / vela (whose CLI demands an explicit `session/set_model` before `session/prompt`) gets the def's first concrete fallback id when the chat run ships `model: 'default'`. `connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })` directly, which made the Test Connection button on the AMR Settings card deadlock with the same `session/set_model must be called before session/prompt` error the chat-run path already handles — surfaced as a permanent "Testing connection…" spinner in the UI. Reuse the same helper here so Test Connection mirrors chat-run behavior. * test(amr): three-layer end-to-end coverage for the AMR login + turn flow The PR up to this point shipped runtime + UI code with unit-level Vitest coverage. This commit adds the cross-layer regression net the live demo relied on: 1. 
apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest) Spins up the real daemon Express app via `startServer({port:0,...})`, persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json, and exercises every /api/integrations/vela/* endpoint against the extended fake-vela stub: - status reads ~/.vela/config.json under various states - login spawns the fake, waits for config.json to appear, returns pid + startedAt + profile - 409 already-running guard with the stub's delay knob - logout removes the file (idempotent) - secrets (runtimeKey / controlKey) never leak in the projection - login → status round-trip flips loggedIn=false → true 2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest) Boots a namespaced daemon + web pair through `createSmokeSuite`, inlines a self-contained fake `vela` binary that handles BOTH `vela login` (writes ~/.vela/config.json) and `vela agent run --runtime opencode` (ACP stdio with the `session/set_model must precede session/prompt` gate the real binary enforces), then drives a complete /api/runs lifecycle for `agentId: 'amr', model: 'default'` and asserts the assistant message captures the fake's streamed text. This is the test that would have surfaced today's plugin-default-model regression (the `set_model before prompt` error) at PR time instead of demo time. 3. e2e/ui/amr-login-pill.test.ts (Playwright) Mocks /api/agents + /api/integrations/vela/{status,login,logout} to drive the Settings AMR card through the full Sign in → Signed in → Sign out cycle. Pins the AmrLoginPill polling contract and the aria-label semantics (the pill's accessible name is "Sign out" once logged in, regardless of which label the hover-state text shows). fake-vela.mjs extensions: - Handles `vela login` argv by writing ~/.vela/config.json for the active VELA_PROFILE and exiting 0 — mirrors real vela's on-disk side-effect without the device-auth loop. 
- FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the in-flight state of the spawn lifecycle. - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced user fields end-to-end. Validated: - `pnpm guard` + `pnpm typecheck` (all workspace projects) - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing, including the new 8-test integration suite. - `cd e2e && pnpm test tests/amr`: 1 / 1 passing. - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`: 1 / 1 passing (6.7s). * feat(amr): package native cli and refine login ui * feat(amr): wire vela cli beta packaging * docs(amr): document vela ci packaging review * docs(amr): refine vela ci integration review * fix(ci): refresh nix pnpm dependency hashes * fix(pack): clean up Vela CLI packaging * fix(pack): bundle Vela CLI support files * fix(amr): recover login attempts from stale auth state * test: expand AMR and automations coverage * fix(amr): address review follow-ups * test(web): align tasks fixtures with contracts * fix(daemon): type wildcard route params * fix(ci): refresh PR merge validation * fix(amr): clear env credentials on logout * feat(settings): inline local CLI model configuration * fix(amr): recognize daemon env credentials * [codex] Fix Vela companion packaging (#2979) * Fix Vela companion packaging * Update Nix pnpm dependency hashes * [codex] Surface AMR account failures (#2980) * fix: surface AMR account failures * fix: cover AMR recovery error guidance * chore: bump beta base version to 0.8.1 (#2990) * Fix AMR profile and packaged runtime review issues * Detect packaged AMR OpenCode companion tree * feat(web): polish AMR frontend flows * Polish AMR onboarding card * fix: read AMR login state from dot-amr config (#3048) * test: tighten AMR credential and packaging coverage * test: restore AMR executable test env helper * [codex] Fix packaged mac Dock identity and AMR label (#3076) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR live 
models and dot-amr login state (#3073) * fix: read AMR login state from dot-amr config * fix: load live AMR models before runs * fix: point AMR onboarding link to production wallet * fix: address AMR model review feedback * fix: persist live AMR model fallback * [codex] Fix AMR link catalog model ids (#3088) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR link catalog model ids * Fix AMR model normalization typecheck * Use live AMR model for default runs * fix: polish AMR runtime settings UI * Accelerate AMR startup defaults (#3092) * Surface AMR insufficient balance wallet URL (#3099) * fix(web): polish onboarding controls (#3112) * fix(web): show CLI scan loading state * Avoid duplicate AMR wallet recharge links (#3117) * Avoid duplicate AMR wallet recharge links * Use Vela CLI 0.0.3 test package * chore(nix): refresh pnpm deps hash * Fix AMR wallet guidance display --------- Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com> * chore(pack): pin Vela CLI 0.0.3-test.1 (#3127) * chore(nix): refresh pnpm deps hash * chore(pack): pin Vela CLI 0.0.3 * chore(nix): refresh pnpm deps hash * fix(web): suppress AMR exit 130 fallback (#3136) * feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083) * feat(web): nudge users to hosted AMR on model/auth/quota failures When a non-AMR agent run fails with an auth / quota / upstream model error, surface an inline nudge under the error pill linking to Open Design's hosted AMR gateway (https://open-design.ai/amr). The nudge fires `surface_view` (element=run_failed_toast) on impression and `ui_click` (element=go_amr) on the link. Also teach the daemon to classify CLI-agent auth/quota/upstream failures (Claude Code, codex, ...) into specific API error codes (AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of the generic AGENT_EXECUTION_FAILED, so both the error message and the nudge key off accurate codes. 
AMR's own runs are excluded from the nudge — they keep the dedicated sign-in / recharge affordances. * feat(web): rework failed-run AMR guidance into per-case error UI Replace the single inline nudge with a per-case failed-run experience driven by the run's error code + agent: - The error card is now neutral gray (was red) and always carries a retry button; it is driven by the persisted per-message error event so it survives a reload. - Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion card under the error card offers "switch to AMR & retry" — switches the run to AMR, opens Settings on the AMR card, and auto-retries once the account signs in (ProjectView polls vela login status, independent of the Settings pill lifecycle, with success / 5-min-timeout / unmount exits). - AMR agent unauthorized: clearer copy + an "authorize & retry" button. - AMR agent out of balance: clearer copy + a "top up" button to the AMR wallet, with manual retry. - Settings AMR card: when opened from the nudge, it scrolls into view and pulses, and an authorize-button coachmark (a fake hand cursor that rises in and dismisses on hover) points at the sign-in control when not yet authorized. analytics: surface_view (run_failed_toast) on the promotion card and ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.* and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall back to en) and drops the old chat.amrErrorGuidance keys. * fix(daemon): require status context for numeric service-failure codes Per review on #3083: the model-service classifier matched bare HTTP status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like `line 500`, `read 502 bytes`, or `exit code 401` could be misclassified as a provider outage / auth wall and wrongly surface the AMR nudge. 
Now a status number only counts when it carries explicit context (`HTTP 500`, `status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases (overloaded, bad gateway, service unavailable, rate limit, …) are unchanged. Adds fixtures proving unrelated numeric output stays null. * fix(web): keep error pill for failed runs ChatPane's card doesn't cover Per review on #3083: the per-message gray error pill was suppressed for every persisted error status event, but ChatPane only renders the replacement top-level error card for `retryableAssistantMessage` (the last failed assistant). So a failed turn that is no longer last (after a follow-up) or an older failed run in history showed neither the pill nor the card — its error detail vanished, undercutting reload/history survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose error the card represents); AssistantMessage suppresses only that one pill and keeps rendering StatusPill for all other error events. * fix(daemon): don't treat a process exit code as an HTTP status Follow-up to review on #3083: the status-context helper accepted a bare `code` prefix, so `exit code 401` / `process exited with code 429` still matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the very `exit code 401` case the comment calls out as noise). `code` now only counts when qualified (`status code` / `error code` / `response code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer matches. Adds fixtures for exit-code lines returning null. * chore(web): translate AMR card / error keys for 16 remaining locales PR #3083 added 10 new `chat.amrCard.` / `chat.amrError.` keys but only provided en/zh-CN/zh-TW translations; the other 16 locales fell back to English. Translate the card title/body, three chips, primary CTA, and the AMR self-error (auth / balance) messages and buttons for ar, de, es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk. 
* fix(amr): address review feedback on #2355 Targeted fixes for the unresolved review threads on #2355. Each fix includes / updates a focused test. - runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now verifies the inner `opencode` executable exists + is runnable, not just the directory. This closes the false-positive availability path that let `detectAgents()` surface AMR as available even when the packaged companion was empty / partially copied (mrcfps, 4 threads). - runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a stale `opencode` on the user's PATH, so packaged AMR builds can't be hijacked by a global installation. - web/EntryShell.tsx: when the Local CLI scan returns an available agent and the previously-selected agent is AMR, switch the selection to the first available local agent so the runtime and persisted agent agree before Continue. - server.ts (model-probe branch): for AMR, check `readVelaLoginStatus` BEFORE rejecting on an empty live-model catalog — a signed-out user was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of the correct `AMR_AUTH_REQUIRED` (sign-in affordance). - server.ts (default model fallback): if the user asked for the AMR agent default and the cached id is no longer in the FRESH catalog, fall back to `liveModels[0]` from the probe instead of rejecting the run as `AMR_MODEL_UNAVAILABLE`. - integrations/vela.ts: route `vela login` through `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat` shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with verbatim args (matches `execAgentFile` / chat-run spawning). - tools/pack/src/linux.ts: in containerized Linux builds, bind-mount the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env to the container-side path. 
The host path was being passed in as-is even though the default container only mounts /project, /tools-pack and cache/home — `copyOptionalVelaCliBinary` saw a missing path. Deferred (out of scope for this PR): - `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked for a separate focused PR. - Strict `--require-vela-cli` for Windows + mac-x64 beta builds: prematurely blocked — `@powerformer/vela-cli` only publishes the `darwin-arm64` platform binary today; adding the flag elsewhere would fail the builds. Revisit once win/x64/linux binaries ship. * fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ) The new signed-out AMR branch in the catalog preflight at server.ts:10875 calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the const declaration sat ~100 lines below at the outer function scope. Because `const` is TDZ-aware, that branch would have thrown `ReferenceError: Cannot access 'sendAmrAccountFailure' before initialization` for the exact users it tries to help — defeating the original intent. Hoist the helper to just above the AMR preflight block so it's available to every AMR code path in this function. Behavior elsewhere is unchanged. Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch uses packaged built-in Vela for AMR` was creating the `<resourceRoot>/bin/libexec/opencode/` companion directory only, but this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree` also requires the inner `opencode` executable. Add it to that fixture to match the new contract; the test was a sibling of the executables / env-and-detection fixtures already updated in `13fc4f4`. Addresses #2355 review (mrcfps, 2026-05-28). 
* feat(web): add hover cancel for AMR login (#3158) * feat(web): add hover cancel for AMR login * fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR flagged the same race in the new local-cancel path: `handleCancelLogin` dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener unconditionally re-enters `refresh()` and then restarts polling whenever `/api/integrations/vela/status` still reports `loginInFlight: true`. That is a real race because the daemon's `cancelVelaLogin()` only sends SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` = 2000 ms) and keeps the child in `activeLoginProcs` until it actually exits — so the first `/status` read after a successful cancel can legally still come back as in-flight. Under that window the pill flips back to 'Signing in…' and can later surface the timeout/error path even though the user already canceled, defeating the behavior promised in the PR description. Fix the listener instead of every dispatch site: in the `login-canceled` branch, after the local reset (stopPolling + setPending(null) + clear refs), optimistically mark every subscribed pill instance as not-in-flight (`setStatus((c) => c ? { ...c, loginInFlight: false } : c)`) and `return` — skip the refresh-and-reconcile branch below entirely. The next explicit refresh (component mount, user interaction, or a `status-changed` event) will pick up the daemon's confirmed state once the child has actually exited. Add a focused regression test that holds `/api/integrations/vela/status` at `loginInFlight: true` even after a successful `/login/cancel`, asserting that the pill stays at the Canceled → Authorize sequence and never bounces back to 'Signing in…'. 
This test fails on the pre-fix listener and passes on the new behavior; existing 'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser completion to Signed in after local cancel' tests continue to pass. Addresses review feedback on #3158 (chatgpt-codex-connector, nettee). --------- Co-authored-by: lefarcen <935902669@qq.com> --------- Co-authored-by: a1chzt <chizblank@gmail.com> Co-authored-by: Amy <1184569493@qq.com> Co-authored-by: Mason <jinmeihong0201@gmail.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-28 05:09:55 +00:00
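The four-branch model substitution described in df8a0faff6 (`resolveModelForAgent(def, resolved, env?)`) can be sketched roughly as below. This is a minimal illustration, not the daemon's actual code: the `AgentDef` and `ModelOption` shapes, the `DEFAULT_MODEL_ID` constant, and the reading of "a real id" as any non-empty id other than 'default' are assumptions. The env argument defaults to an empty object here only to keep the sketch free of Node typings; the commit message says the real resolver reads the daemon's `process.env`.

```typescript
// Hypothetical shapes for illustration; the real runtime def has more fields.
interface ModelOption {
  id: string;
}

interface AgentDef {
  fallbackModels: ModelOption[];
  // Name of a daemon-process env var that may override the default model.
  defaultModelEnvVar?: string;
}

// Assumed name for the synthetic 'default' model id mentioned in the log.
const DEFAULT_MODEL_ID = 'default';

function resolveModelForAgent(
  def: AgentDef,
  resolved: string | null | undefined,
  env: Record<string, string | undefined> = {},
): string | null | undefined {
  // 1. The caller picked a real id: pass it through untouched.
  if (resolved && resolved !== DEFAULT_MODEL_ID) return resolved;

  // 2. Operator escape hatch: a non-empty value in the named env var wins.
  if (def.defaultModelEnvVar) {
    const override = env[def.defaultModelEnvVar]?.trim();
    if (override) return override;
  }

  // 3. Defs that do not list 'default' get their first concrete fallback id.
  const listsDefault = def.fallbackModels.some((m) => m.id === DEFAULT_MODEL_ID);
  const first = def.fallbackModels[0];
  if (!listsDefault && first) return first.id;

  // 4. Historic shape: defs that list 'default' themselves are left alone.
  return resolved;
}
```

With a def shaped like AMR's (`fallbackModels` starting at `gpt-5.4-mini`, `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`), a request for 'default' would resolve to `gpt-5.4-mini` unless `VELA_DEFAULT_MODEL` is set in the passed env, in which case that value is returned instead.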
Marc Chan	125dcd0174	fix(ci): run fork visual reports from trusted code (#2935) * fix: run fork visual reports from trusted code * fix: auto-approve strict web visual capture * fix: address visual report review feedback Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: propagate visual report storage failures Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: validate PR screenshots before upload Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: validate visual PR identity before comment * fix: harden fork visual report validation Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: address remaining fork visual report review feedback Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: handle stale fork visual report lookup Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: allow stale fork visual report fallback Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)	2026-05-26 06:17:04 +00:00
lefarcen	c14baf07d3	Merge origin/main into release/v0.8.0 PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits on top of 58 release-side commits accumulated during the 0.8.0 cycle. Resolution summary: Take main (theirs) where main carried deliberate forward progress: - apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration: hardcoded English aria-labels/titles replaced with t() calls keyed on pluginCard.* (all 8 keys verified present in en.ts). - apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion feature: sortedRoutines (newest-first), sourceIngestionTemplates, patchSourceForm, submitSourceIngestion. activeCount/pausedCount semantics preserved (now keyed on sortedRoutines, count unchanged). - e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts imports needed by main-side test helpers. - e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal helper block added by main. Keep both sides where each added a different field to the same object literal: - apps/web/src/components/ProjectView.tsx (locale + analyticsHints spread). - apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints). Take release (ours) where release carried deliberate work that ships 0.8.0: - CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's Unreleased section was the same body of work, now finalized. - apps/landing-page/public/{apple-touch-icon,favicon}.png + apps/web/public/app-icon.svg — release-side visual refresh assets consistent with 0.8.0 stable ship. - tools/pack/src/linux.ts — packageVersion const required by line 466; taking main's empty line would build-error. - e2e/ui/project-management-flows.test.ts + e2e/ui/settings-api-protocol.test.ts + e2e/ui/settings-memory-routines.test.ts — release-side release-smoke hardening (shangxinyu1 + PerishFire) takes precedence on overlap. Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.	2026-05-23 12:17:18 +08:00
Marc Chan	a3872b97a9	fix(tools-dev): preserve web origin trust on web start (#2715) * fix(tools-dev): preserve web origin trust on web start Restart daemon/web when the trusted web port is missing, and reuse the active web port during repeated starts so run web and start web keep app-config origin checks aligned. Generated-By: looper 0.0.0-dev (runner=worker, agent=opencode) * fix(plugins): refresh official registry bundled count Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(tools-dev): preserve daemon/web reserved ports Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(tools-dev): preserve daemon reuse on web start Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(tools-dev): preserve running daemon port on web reuse Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(tools-dev): reserve explicit web port before daemon allocation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * test(web): stabilize media provider reload flash timing Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(web): restore merged reattach workspace coverage Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(tools-dev): reserve allocated daemon port Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * test(e2e): wait for artifact manifest persistence Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-23 00:25:43 +08:00
PerishFire	1ef865dd31	fix: defer Windows packaged updater installer (#2575) * fix: tighten packaged updater flow * test: prune noisy extended ui coverage * fix: hide unpublished release artifacts * test: validate release updater channels * fix: align prerelease release namespaces * fix: align packaged updater validation	2026-05-21 19:13:18 +08:00
PerishFire	526c7f7c26	Fix packaged auto-update release validation (#2565) * fix: tighten packaged updater flow * test: prune noisy extended ui coverage * fix: hide unpublished release artifacts * test: validate release updater channels * fix: align prerelease release namespaces	2026-05-21 18:15:53 +08:00
lefarcen	f5f8937421	Merge origin/main into release/v0.8.0 Conflict resolved by taking origin/main: - apps/web/src/components/EntryNavRail.tsx design-systems rail button icon name palette-filled (release-side) -> blocks (main); main's icon swap is part of the more recent design-systems rail pass.	2026-05-21 10:52:08 +08:00
lefarcen	722ddfa235	Merge origin/main into release/v0.8.0 Conflicts resolved by taking origin/main on both files. Root cause: main's PR #2460 (fix(landing): align logo.webp with brand icon) changed HomeHero.tsx's .home-hero__brand-mark to render <img src=/app-icon.svg> instead of an inlined <HeroBrandIcon /> SVG, and bundled the matching CSS (26px round badge with bg-panel + border + padding 2px) plus a gap/font-size tune. The release-side visual-refresh CSS still targeted the SVG layout (38px square, transparent, inset SVG selector). Keeping release's CSS would leave main's <img> unstyled. - apps/web/src/styles/home/home-hero.css three blocks, all taken from main: .home-hero__brand gap 8px, .home-hero__brand-mark redesigned for <img> child, .home-hero__brand-name font-size 16px. - apps/web/src/index.css two blocks, both taken from main: workspace tab close column 22px and .workspace-tab__close 18x18 (paired tune-down of tab UI spacing).	2026-05-20 22:28:38 +08:00
Marc Chan	e727168676	chore(ci): expand visual regression coverage (#2381 ) * Improve visual diff annotations * Expand visual regression coverage * fix(ci): cap visual diff canvas pixels Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * Stabilize visual regression screenshots * test(e2e): stub routines for visual snapshot Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * Expand visual regression surfaces * fix(e2e): order design system visual mocks Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(e2e): order design system visual mocks Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * Tune visual diff box stroke * fix(e2e): stabilize visual detail mocks Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(e2e): harden visual diff box helpers Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(web): 
preserve deep-linked project bootstrap Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(e2e): stub automation task mocks Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)	2026-05-20 22:25:41 +08:00
Eli-tangerine	8193981511	Keep PR 2400 changes without folder pickers (#2462 ) * feat(daemon): add project working directory management and editor hand-off functionality - Introduced new flags for project commands to manage working directories, including `--working-dir` and `--dir`. - Implemented API routes for listing available editors and opening projects in selected editors. - Added a hand-off button in the ChatPane header to facilitate opening project folders in local applications. - Enhanced the HomeHero component to include working directory and design system settings, improving user experience in project creation. - Created HomeHeroSettingsChips component for inline management of working directory and design system selection. * feat(chat): implement voice transcription proxy and enhance UI components - Added a new API route for voice transcription using OpenAI's `/audio/transcriptions` endpoint, allowing users to send audio blobs directly for transcription. - Integrated multer for handling audio file uploads in memory, ensuring efficient processing without disk storage. - Updated the HomeHero component to include example prompt suggestions for plugins, enhancing user interaction. - Introduced the EditorIcon component to visually represent different editors in the hand-off menu, improving the user experience. - Refined the HandoffButton component to utilize the new EditorIcon, providing a more cohesive interface for selecting editors. - Enhanced CSS styles for various components to improve layout and responsiveness, including adjustments to tab and button sizes for better usability. * style(workspace-shell): enhance layout and overflow handling - Updated CSS for .workspace-shell to ensure full viewport width and height, with proper overflow management. - Adjusted grid layout to prevent content overflow and maintain responsiveness. - Modified styles for .workspace-tabs-chrome to improve width handling and prevent overflow issues. 
* refactor(chat): remove voice transcription proxy and related components - Deleted the voice transcription proxy implementation, including the associated API route and multer configuration. - Removed the MicButton component from the ChatComposer and HomeHero components to streamline the UI. - Updated HomeHero to include example suggestions without the voice input functionality. - Adjusted CSS styles for various components to maintain layout consistency after the removal of the MicButton. * feat(daemon): implement minting of HMAC tokens for working directory management - Added a new function `mintImportTokenFromCurrentSecret` to generate HMAC tokens bound to a specified base directory, enhancing security for working directory operations. - Updated the `desktop-auth.ts` file to include the new token minting functionality, which returns structured errors when the desktop auth secret is cleared. - Introduced new IPC message types for minting import tokens in the sidecar protocol, allowing seamless integration with the daemon's working directory management. - Enhanced the `WorkingDirPill` component to utilize the new token minting flow for secure directory selection in desktop builds. - Updated CSS styles for the HomeHero component to accommodate new example suggestion features and maintain layout consistency. * fix(HomeView): import HOME_HERO_CHIPS constant for improved chip management - Updated the HomeView component to import the HOME_HERO_CHIPS constant from the chips module, enhancing the management of hero chips within the component. * feat(daemon): implement mintImportTokenViaSidecar for secure working directory management - Introduced the `mintImportTokenViaSidecar` function to facilitate the minting of HMAC tokens for desktop-import operations via the daemon's sidecar IPC. This allows CLI commands to bypass authentication when the desktop-auth gate is active. 
- Updated the CLI to utilize the new token minting function when setting the working directory, ensuring secure access to trust-gated API endpoints. - Enhanced the sidecar server to handle minting requests and return structured error messages for improved user feedback. - Added tests to validate the new token minting functionality and its integration with the working directory management process. - Refactored related components to support the new token flow, improving overall security and user experience. * feat(HomeHero): enhance UI components and styles for improved user experience - Updated HomeHero component to replace active dot indicators with Plug icons for better visual representation of active plugins. - Adjusted CSS styles for various elements, including padding and dimensions, to enhance layout consistency and responsiveness. - Introduced new styles for active type icons and improved hover effects for buttons. - Updated HomeHeroSettingsChips to change button titles and icons for clarity. - Added tests to ensure proper rendering and functionality of updated components. * feat(ProjectDesignSystemPicker): enhance design system selection with preview functionality - Updated the ProjectDesignSystemPicker component to include a preview feature for design systems, allowing users to see a preview of the selected design system. - Implemented hover functionality to update the preview based on the hovered design system. - Added fullscreen preview capability for a more immersive experience. - Enhanced CSS styles for the design system picker to improve layout and responsiveness. - Introduced tests to validate the new preview functionality and ensure proper interaction within the component. * feat: refactor project metadata handling and enhance design system picker - Updated the default scenario plugin ID retrieval to use project metadata, improving the logic for determining the appropriate plugin based on project intent. 
- Enhanced the ProjectDesignSystemPicker and related components to support localized design system summaries and categories, improving user experience. - Introduced new translations for working directory and design system picker components, ensuring better accessibility and usability across different locales. - Added a new 'live-artifact' project type to the HomeHero chips, expanding the functionality for users creating refreshable artifacts. - Updated tests to validate the new project metadata handling and design system picker functionalities. * feat: enhance localization and styling for design system components - Added French translations for working directory and design system picker components, improving accessibility for French-speaking users. - Updated CSS styles for the pet task item to ensure consistent padding and layout. - Introduced a new test suite for HomeHeroSettingsChips to validate localization and design system selection functionality. - Enhanced ProjectDesignSystemPicker tests to ensure proper localization and interaction with design system categories. * fix: update .gitignore to include all claude-sessions directories and remove specific session files - Modified .gitignore to ensure all claude-sessions directories are ignored by using a wildcard pattern. - Deleted two specific claude-sessions markdown files to clean up unnecessary session data. * fix: repair home automation ci regressions * fix: stabilize artifact consistency e2e * Remove folder picker changes from PR 2400 --------- Co-authored-by: pftom <1043269994@qq.com> Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-05-20 22:07:30 +08:00
lefarcen	1cfe274a90	Merge origin/main into release/v0.8.0 Conflicts resolved by taking origin/main on all six points: - apps/web/src/components/HomeHero.tsx:479-487 brand div removed (main dropped the .home-hero__brand wrapper; the release-side visual refresh still had it). - apps/web/src/components/HomeHero.tsx:894-898 attach Icon size 18 (main's update) replaces 20 from release. - apps/web/src/components/HomeHero.tsx:913-927 submit button uses <Icon name="arrow-up" size={22} /> (main's component refactor) instead of the release-side inline SVG. - apps/web/src/components/EntryShell.tsx:578-582 Discord Icon size 14 (main) instead of 16 (release). - apps/web/src/styles/home/home-hero.css drop .home-hero__brand / __brand-mark / __brand-name rules — main removed both the component div and these CSS rules together; keeping the CSS would be dead code. - apps/web/src/styles/home/entry-layout.css Discord badge icon color #5865f2 (main, the brand color introduced by PR #2386) instead of release's neutral var(--text-strong).	2026-05-20 20:59:00 +08:00
PerishFire	7a47829279	Fix nightly release smoke identity (#2446 ) * Fix Windows nightly release smoke identity * Fix mac nightly release smoke identity	2026-05-20 20:34:26 +08:00
shangxinyu1	71044bd3d6	test(e2e): harden extended coverage state assertions (#2245 ) * test(e2e): harden extended coverage contracts * docs(testing): add e2e hardening status * fix(web): persist artifact chips after daemon runs * ci: install playwright browsers for e2e vitest * Fix daemon run recovery across reloads Pin daemon-created runs to assistant messages immediately so hard reloads before the create response can reattach. Replay terminal and active run events from the beginning on reload so restored turns keep assistant text, thinking events, produced files, and artifacts. Fixes #2366 Fixes #2368 Fixes #2371 * test(e2e): preserve fake runtime selection across reload * fix(web): scope daemon run recovery to daemon mode * fix(e2e): remove duplicate delayed smoke flag * fix(web): scope replay artifact recovery to current run * fix(daemon): remove duplicate run-create pin	2026-05-20 16:21:01 +08:00
Sid	8bcd96f5e5	fix(frames): resolve relative screen= against embedder URL (#2316 ) Shared device frames serve at /frames/<name>.html and previously assigned the raw ?screen= value to the inner iframe.src. A project-relative value like screen=screens/foo.html resolved against /frames/, producing /frames/screens/foo.html (404), instead of the embedding project's /api/projects/:id/raw/screens/foo.html. The five frame HTML files now resolve relative ?screen= values against document.referrer when present (the embedding project preview), falling back to location.href so standalone /frames/* loads keep working. Absolute and root-relative paths are passed through unchanged. Adds an e2e Vitest spec that evaluates each frame's inline <script> in a Node vm and asserts iframe.src under five scenarios per file (25 cases total): project-relative against referrer, root-relative pass-through, absolute pass-through, empty referrer fallback, and missing ?screen= no-op. Fixes #2234	2026-05-20 10:03:01 +08:00
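The resolution rule described in the commit above can be sketched as a small pure function. This is only an illustration under stated assumptions: `resolveScreenSrc` is a hypothetical name (the real frames inline their logic in a `<script>` tag), and the absolute-URL check is one plausible way to detect a scheme, not necessarily the one the frames use.

```typescript
// Hypothetical helper mirroring the rule in the commit message:
// relative ?screen= values resolve against the embedder (document.referrer),
// falling back to the frame's own URL; absolute and root-relative values
// pass through unchanged; a missing value is a no-op.
function resolveScreenSrc(
  screen: string | null,
  referrer: string,
  locationHref: string,
): string | null {
  // Missing or empty ?screen=: leave iframe.src untouched.
  if (!screen) return null;
  // Assumed detection: "scheme:" prefix means absolute; a leading "/" covers
  // both root-relative and protocol-relative values.
  const hasScheme = /^[a-z][a-z0-9+.-]*:/i.test(screen);
  if (hasScheme || screen.startsWith("/")) return screen;
  // Project-relative value: resolve against the embedder when known.
  const base = referrer || locationHref;
  return new URL(screen, base).href;
}
```

With a referrer of `/api/projects/:id/raw/index.html`, a value like `screens/foo.html` lands under the project's raw path instead of under `/frames/`, which is the 404 the commit describes.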
PerishFire	bb13eee765	chore: optimize CI and beta release runtime (#2231 ) * chore(ci): add runtime trace summaries * chore(ci): tighten measured workspace steps * chore(release): tighten beta setup steps * chore(release): slim beta windows smoke * chore(ci): shard daemon tests * chore(ci): harden runtime trace lookup * chore(release): avoid mac pnpm cache in beta * chore(ci): split critical playwright checks * chore(release): publish beta platforms from builders * test(e2e): update beta release workflow expectation * chore(ci): stop gating PRs on nix check * fix(release): keep beta latest complete	2026-05-19 18:06:28 +08:00
PerishFire	99b42726b8	Simplify CI PR gate (#2183 )	2026-05-19 13:18:41 +08:00
Olin Hendershot	74637f1cb5	Add Linux packaged client parity smoke coverage (#1204 ) * docs: plan linux client issue 709 * fix: complete linux headless lifecycle routing * feat: add linux packaged inspect * test: add linux headless packaged smoke * ci: add linux headless packaged smoke * ci: smoke linux AppImage release artifacts * docs: document linux packaged client status * chore: finalize linux client audit remediation * docs: add linux client publication packet * test: harden linux client smoke coverage * ci: preserve linux smoke audit evidence * refactor: consolidate linux e2e helpers Move pathExists and the desktop/web/daemon app-key array out of linux.spec.ts into linux-helpers.ts, where expectPathInside and linuxUserHome already live. Keeps the spec file focused on tests and the helpers file as the canonical home for shared Linux e2e utilities. * fix: move linux e2e helpers to lib * fix: address linux release review blockers * fix: drop npm dependency from containerized linux build writeAssembledApp() previously called runNpmInstall() which executed `npm install` directly. Inside the containerized build path, electronuserland/builder:base strips npm/npx/corepack, so the inner tools-pack build would fail at the assembled-app install step. Route the install through OD_TOOLS_PACK_PNPM_BIN: buildDockerArgs sets the env to the standalone pnpm binary it bootstraps, and the new resolveProductionInstallCommand helper consumes that env to run `<bin> install --prod --no-lockfile --config.node-linker=hoisted`. Host invocations with no env set keep the prior npm behavior. --config.node-linker=hoisted preserves the flat node_modules layout that electron-builder packs the same way as npm-installed trees. New tests cover the resolver branches and assert the docker-arg-to- resolver chain end-to-end so reviewers can see the container's inner build receives the env that switches its install away from npm. 
* fix: harden linux container bootstrap * fix: validate desktop marker liveness in headless cleanup cleanup --headless previously skipped on any parseable desktop-root.json, trapping recovery when the AppImage had crashed and left a stale marker. Validate the marker the same way stopPackedLinuxApp does: if the PID is not in the live snapshot list, proceed through cleanup instead of skipping. Extract the validation into validateDesktopAppImageMarker so the stop and cleanup paths share one definition of live and owned. Tests cover both branches: a stale marker drives cleanup to remove the runtime/output roots, while a live marker drives cleanup to skip and preserve them.	2026-05-15 16:38:29 +08:00
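As a rough sketch of the install routing described in the commit above: the env variable name and the pnpm flags come from the commit message, while the function signature, the `InstallCommand` shape, and the exact npm fallback arguments are assumptions made for illustration.

```typescript
// Assumed return shape; the real helper's signature is not shown in the log.
type InstallCommand = { command: string; args: string[] };

// Sketch of resolveProductionInstallCommand: when OD_TOOLS_PACK_PNPM_BIN is
// set (the containerized build path), install through that pnpm binary;
// otherwise keep the prior npm behavior for host invocations.
function resolveProductionInstallCommand(
  env: Record<string, string | undefined>,
): InstallCommand {
  const pnpmBin = env.OD_TOOLS_PACK_PNPM_BIN;
  if (pnpmBin) {
    return {
      command: pnpmBin,
      // Flags as listed in the commit message; the hoisted linker keeps a
      // flat node_modules layout for electron-builder.
      args: ["install", "--prod", "--no-lockfile", "--config.node-linker=hoisted"],
    };
  }
  // Assumption: the host fallback is a plain `npm install`.
  return { command: "npm", args: ["install"] };
}
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps both branches easy to cover in unit tests.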
Nagendhra Madishetti	40766ef1ba	test(web): Critique Theater Phase 13 (reducer p99 bench + surface coverage walker) (#1318 ) * feat(web): pure reducer for Critique Theater states (Phase 7.1) Pure CritiqueState reducer driven by the contracts-level PanelEvent (the same shape both the live SSE stream and the recorded transcript emit), so a single reducer powers both the in-flight panel and the rerun replay. Lifecycle covers run_started → running → (shipped / degraded / interrupted / failed), with panelist_open / dim / must_fix / close / round_end events building per-round CritiquePanelistView entries as they arrive. Defensive behaviour that surfaced while writing the spec tests: - Terminal phases (shipped / degraded / interrupted / failed) are sticky against further lifecycle events for the same run, except for parser_warning which can land late and is recorded in a side channel without changing phase. - A new run_started for a different runId at any time discards the prior state and reboots, so the UI can launch consecutive runs without an explicit reset action. - Events whose runId does not match the active run return the same state reference, so React's useReducer doesn't re-render subscribers on stray traffic. - Round bookkeeping keys by round number rather than "always last", so an out-of-order panelist_dim for round 1 arriving after a round 2 dim does not corrupt the round 2 bucket. Test coverage: 18 cases covering each transition, the runId guard, sticky-terminal behaviour, the out-of-order round invariant, and the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire SSE + replay into the same reducer. 
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2) createCritiqueEventsConnection is a pure connection manager that mirrors apps/web/src/providers/project-events.ts: opens an EventSource at /api/projects/:id/events, listens for every name in CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent (stripping the critique. prefix and merging the data payload), and hands it to the caller's onEvent. Reconnect uses exponential backoff (1s → 30s) and resets on `ready`; malformed payloads drop with a dev-mode warning rather than tearing the stream. useCritiqueStream wraps the manager in a useReducer that owns the CritiqueState. enabled=false or a null projectId tears down the connection cleanly; switching projectId closes the old connection and opens a fresh one. The returned dispatch lets local UI synthesise actions (e.g. an Esc keypress firing a synthetic interrupted while a kill request is in flight); production traffic comes from the SSE stream. Test coverage: - sse.test.ts (10 cases, node env): subscription set covers every CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire shape back to PanelEvent; malformed JSON is swallowed and does not stop the stream; exponential backoff schedule and ready-reset semantics are pinned with a setTimeout seam; close() cancels pending reconnects and shuts the live source; no-op fallback when EventSource is unavailable. - useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event, reducer driven by synthetic actions, no connection when disabled or projectId is null, clean close on unmount, projectId change reopens cleanly. * feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3) Fetches the per-run NDJSON transcript (one PanelEvent per line), parses every line via the shared isPanelEvent predicate, and dispatches into the same CritiqueState reducer the live SSE stream uses. 
A single reducer means the UI rendering a replay can be identical to the live panel, and a UI mounting both useCritiqueStream and useCritiqueReplay in parallel does not have to reconcile two state shapes. speed knob is `paused | instant | live | { intervalMs: N }`. - instant flushes every event synchronously, useful for opening a finished run already at its terminal state. - intervalMs paces dispatches at a fixed cadence so the reviewer can watch the run unfold. - paused parses the transcript but holds events back until the caller advances speed (consumers can drive a scrubber later). - live is reserved for the future "playback at original cadence" feature, currently treated as instant; replay timestamps are not yet persisted with each event so honest pacing requires a follow-up Phase 7+ task. gunzip seam handles `.ndjson.gz` transcripts via DecompressionStream when present; the production fetch path picks between text and arrayBuffer based on the URL extension. Both seams are injectable so the unit tests don't need to spin up a real network or a real gzip pipeline. Test coverage (8 cases, jsdom env): - Idle status before any URL is provided. - speed=instant flushes the full transcript synchronously to shipped state. - speed={intervalMs:N} paces with the setTimeout seam, reaching done after the last tick. - speed=paused leaves status=playing with no dispatches. - Empty transcript reports done with state still idle. - Fetch rejection surfaces an error status with the message. - Malformed NDJSON lines are skipped; valid events around them still land. - .gz transcripts route through the gunzip seam. Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream + replay), all on one branch ready for review. Phases 8+ (Theater components) consume these from this PR. 
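The transcript parsing behaviour described for Phase 7.3 (one event per NDJSON line, malformed lines skipped, valid lines around them kept) might look roughly like the sketch below. `PanelEventLike`, `isPanelEventLike`, and `parseTranscript` are hypothetical stand-ins; the real `isPanelEvent` predicate lives in the contracts package and checks more than this.

```typescript
// Stand-in event shape: only the two fields this sketch validates.
type PanelEventLike = { type: string; runId: string; [key: string]: unknown };

// Assumed minimal predicate: a string type and a non-empty runId.
function isPanelEventLike(value: unknown): value is PanelEventLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.type === "string" && typeof v.runId === "string" && v.runId.length > 0;
}

// Parse an NDJSON transcript, skipping blank, malformed, or invalid lines
// instead of failing the whole replay.
function parseTranscript(ndjson: string): PanelEventLike[] {
  const events: PanelEventLike[] = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isPanelEventLike(parsed)) events.push(parsed);
    } catch {
      // Malformed JSON on this line: drop it and keep going.
    }
  }
  return events;
}
```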
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review) Two P1 fixes from lefarcen's review on PR #1307: SSE payload override `sseToPanelEvent` previously spread `data` after the channel-derived `type`, so a payload-provided `type` could override the channel and route a `critique.run_started` frame into the reducer as a `ship` action. Reversed the spread so the channel-derived `type` is authoritative, and revalidated the resulting object through the contracts-level `isPanelEvent` predicate before returning. Frames that fail validation (missing runId, empty runId, unknown type) are dropped, so a malformed or compromised SSE frame can no longer dispatch a wrong-shape action into the reducer. Three new sse.test.ts cases pin the regression: hostile `type:'ship'` in the payload still resolves to `run_started`, missing runId is dropped, empty runId is dropped. Replay pause/resume `useCritiqueReplay` had one big effect keyed on `transcriptUrl` only, so flipping `speed` from `paused` to `instant` never re-fired and the held events sat undispatched. Split into a parse effect (depends on URL, fetches and stores events in state) and a pace effect (depends on parsed-events + speed, owns the cursor + timers). The playback cursor lives in a ref that survives pause/resume cycles, so flipping `paused` -> `instant` flushes from the current position rather than restarting (which would double-dispatch `run_started` and reset the reducer). 
Two new useCritiqueReplay.test.tsx cases: - paused-then-instant transitions from `playing` to `done` and reaches the shipped terminal phase - intervalMs paced playback dispatches one event, pauses to drain the next scheduled timer, flips to instant, and confirms the remaining transcript drains exactly once (cursor was preserved) Doc consistency The earlier source comment in useCritiqueReplay.ts claimed `live` "paces by recorded timestamps" while the impl used zero-delay timers and the PR body said it behaves like `instant`. Aligned to reality: `live` currently behaves like `{ intervalMs: 0 }` (events drain on successive microtasks via setTimeoutFn) because transcripts do not yet carry per-event timestamps. Honest timestamp-driven pacing is queued as a Phase 7+ follow-up. Validated: pnpm guard, pnpm --filter @open-design/web typecheck, Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite 96 files / 888 tests. * feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread) * feat(web): Theater PanelistLane component (Phase 8.1) * feat(web): Theater ScoreTicker component (Phase 8.2) * feat(web): Theater RoundDivider component (Phase 8.3) * feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4) * feat(web): Theater TheaterDegraded chip (Phase 8.5) * feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6) * feat(web): Theater TheaterTranscript replay surface (Phase 8.7) * feat(web): Theater TheaterStage top-level container (Phase 8.8) * feat(web): Theater CSS using existing semantic tokens (no hex literals) * feat(web): Theater public exports barrel * fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314) Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen. State-lifecycle fixes (3 x P2) 1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`). 
Host hooks dispatch it when their gating prop changes so a stale run from a prior project / transcript cannot bleed into the next context. Reset is idempotent on idle (returns the same reference). 2. `useCritiqueStream` dispatches `__reset__` at the top of its connection effect, so a workspace switch from project A (which streamed a critique) to project B clears the reducer before the new EventSource opens. enabled=false also clears. 3. `useCritiqueReplay` dispatches `__reset__` at the top of its parse effect, so transcriptUrl swaps (including swap-to-null after a replay reached `shipped`) lift the reducer back to idle before the new fetch starts. SSE validation (1 x P2) 4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape` check after the cheap `isPanelEvent` predicate. A `critique.ship` frame missing `composite` / `round` / `status` / `artifactRef` is rejected before reaching the reducer, so TheaterCollapsed can no longer crash on `undefined.toFixed(1)`. Every variant's required fields are validated: run_started (protocolVersion, non-empty cast, maxRounds, threshold, scale), panelist_* (round, role, plus variant-specific shape), round_end (round, composite, mustFix, decision in {continue,ship}, reason), ship (round, composite, status, artifactRef.{projectId,artifactId}, summary), degraded (reason, adapter), interrupted (bestRound, composite), failed (cause), parser_warning (kind, position). Reducer correctness (1 x P2) 5. `panelist_open` now materializes the round + an empty panelist view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight the in-progress lane the instant the tag opens. Before this, a stream that emitted only `panelist_open` after `run_started` left `rounds = []` and the UI rendered no current round until a later `panelist_dim` arrived. Polish (3 x P3) 6. Brand role tint swaps from `var(--magenta, var(--accent))` to `var(--purple, var(--accent))`. 
`--purple` is actually defined across the design systems; `--magenta` is not, so Brand was silently falling through to `--accent` and looking identical to Designer. 7. New i18n key `critiqueTheater.interruptedSummary` for the interrupted-collapse copy ("Interrupted at round N, best composite X.X"). Previously the interrupted branch reused `shippedSummary` and the UI read "Shipped at round..." for a run that specifically did not ship. Native value in en + zh-CN; other locales fall back via `...en` spread. 8. `TheaterDegraded` heading id comes from `useId()` instead of a hardcoded `theater-degraded-heading`, so two chips rendered on the same page (chat history with multiple completed runs) keep their aria-labelledby references unambiguous. Tests (15 new cases) - reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data. - sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship. - useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false. - useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped. - TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...". - TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`. 
Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new) - tests/i18n/locales.test.ts 5 of 5 across 18 locales * feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1) * feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2) * fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315) Addresses every blocker from codex, Siri-Ray, and lefarcen. The three state-lifecycle and SSE-validation issues they also flagged inherit fixes from PR #1314's review pass that this branch now sits on top of after rebase. Real daemon kill on Interrupt (P1) - CritiqueTheaterMount now POSTs to /api/projects/:id/critique/:runId/interrupt alongside the optimistic local dispatch. Before this fix, clicking Interrupt only flipped the React state to interrupted while the daemon job kept running. The fetch is best-effort: a 404 (endpoint not wired yet, lands in Phase 15) is swallowed with a dev-mode console.warn so the UI still moves to the collapsed badge. - New fetchInterrupt test seam lets RTL assert on the URL / method and simulate the "daemon not ready yet" path. Two tests pin both: the happy URL proj-42/critique/run-abc/interrupt POSTs, and a rejected fetch still flips the UI. interruptPending reset on new run (P2) - A ref-backed effect compares the current runId against the last one we saw; when it changes, interruptPending is cleared. A user who interrupts run-1 and then triggers run-2 from the same mount now gets a fresh, enabled kill button instead of one stuck in "Interrupting…". Pinned by a new mount test. Escape keybind scope (P2) - InterruptButton now checks the keydown target. Escape inside an input, textarea, select, or contenteditable element is ignored (and any ancestor of those via closest() is treated the same way). Body-level focus still fires the keybind so the Theater area's affordance keeps working. 
Four new tests cover textarea, input, contenteditable, and the body-focus positive case. userFacingName i18n key (P2) - The spec at specs/current/critique-theater.md:6 mandates a single critiqueTheater.userFacingName key so the "Design Jury" label can be renamed without touching code. Phase 8 introduced critiqueTheater.title by mistake; renamed across types.ts, en.ts, zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer TheaterStage.tsx. The locale alignment test stays green. Validated - pnpm guard clean - pnpm --filter @open-design/web typecheck clean - Theater suite: 14 files, 112 tests (was 101 before, +11 new for the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope; the rest were already in #1314's review fix). - tests/i18n/locales.test.ts 5 of 5 across 18 locales. * feat(daemon): adapter-degraded registry with TTL (Phase 10.1) In-memory registry recording adapters that produced malformed or oversize transcripts so the orchestrator can skip them for a TTL window (default 24h) instead of cycling through known-bad providers on every run. Records carry reason (malformed_block | oversize_block | missing_artifact), source label, and expiresAt. The test-only clock seam lets the suite advance time deterministically and prove that an expired entry stops counting as degraded without anyone calling clearDegraded. 7/7 vitest cases green. * feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2) Two test-only adapters that read the existing v1 transcript fixtures (happy-3-rounds and malformed-unbalanced) and replay them as either a full string or a 512-byte chunked stream. The chunked form is what the conformance harness uses to prove the parser holds together when the transcript arrives in arbitrary network slices, not as one buffered blob. 
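A minimal sketch of the adapter-degraded registry from Phase 10.1, assuming an injectable clock as the test seam. The reason values, the `expiresAt` field, the 24h default, and the `clearDegraded` name come from the commit message; `createDegradedRegistry`, `markDegraded`, and `isDegraded` are hypothetical names chosen for illustration.

```typescript
type DegradedReason = "malformed_block" | "oversize_block" | "missing_artifact";
type DegradedRecord = { reason: DegradedReason; source: string; expiresAt: number };

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // default 24h window

// In-memory registry; `now` is the clock seam so tests can advance time.
function createDegradedRegistry(
  now: () => number = Date.now,
  ttlMs: number = DEFAULT_TTL_MS,
) {
  const records = new Map<string, DegradedRecord>();
  return {
    markDegraded(adapter: string, reason: DegradedReason, source: string): void {
      records.set(adapter, { reason, source, expiresAt: now() + ttlMs });
    },
    isDegraded(adapter: string): boolean {
      const record = records.get(adapter);
      if (!record) return false;
      // An expired entry stops counting without an explicit clear.
      if (now() >= record.expiresAt) {
        records.delete(adapter);
        return false;
      }
      return true;
    },
    clearDegraded(adapter: string): void {
      records.delete(adapter);
    },
  };
}
```

Checking expiry lazily on read means no timer is needed, which is what lets a test prove expiry purely by moving the injected clock.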
* feat(daemon): adapter conformance harness (Phase 10.3) runAdapterConformance pulls a transcript through the same parseCritiqueStream pipeline the orchestrator uses and classifies the outcome as shipped, degraded, or failed. On a degraded outcome it forwards the matched reason to the adapter-degraded registry, so a single nightly conformance run is what populates the skip list rather than the orchestrator learning each adapter is broken at request time. 5/5 vitest cases green covering shipped, malformed degraded, oversize degraded, no-ship failure, and the harness-thrown failure path. * test(e2e): Critique Theater Playwright suite (Phase 11) Six tests, one viewport per visual case, deterministic SSE fixtures stubbed via page.route(). Adds the suite to test:ui:extended so the existing extended-UI lane picks it up. Coverage: 1. Happy path: a single mounted theater plays the full fixture (1 run_started, 5 panelists open / dim / must_fix / close, 1 round_end, 1 ship) and ends on the score badge. 2. Interrupt mid-run: the panelist that is open at the time the interrupt button is clicked closes with an interrupted marker and the transcript freezes there. 3. Visual regression at 375x720 mobile. 4. Visual regression at 768x1024 tablet. 5. Visual regression at 1280x800 desktop. 6. A11y role tree: the theater region exposes a labelled landmark, each panelist lane is a group with an accessible name, the score is a status live region. All SSE traffic is stubbed by page.route so the suite runs in CI without a daemon. The toggle is seeded via localStorage by bootAppWithCritiqueEnabled so the gate behaves as if Settings flipped it on. typecheck clean; playwright --list reports 6. * test(web): reducer p99 bench at 10k iterations (Phase 13.1) Locks the documented 2ms budget for the Critique Theater reducer on a representative SSE script (27 actions, one full happy run) behind a regression gate. 
Asserts p99 stays under 4ms (2x the documented budget) so CI runners with a noisy neighbour do not flake while a real regression to 20ms or 200ms still trips. The bench is a vitest case rather than a bare microbenchmark so it runs in the same CI lane as every other web test and does not need a parallel runner. * test(web): critique surface coverage walker (Phase 13.2) Walks the public critique surface (11 SSE event names, 5 panelist roles, 6 lifecycle phases, 9 named i18n keys) and asserts each named symbol appears in both the src corpus and the test corpus. The walker is the gate that catches a rename in one half of the codebase without a matching update in the other half: a future PR that drops 'panelist_must_fix' from the reducer without also removing its test reference fails this suite. 62 assertions, one per symbol per corpus. * fix(web): tighten Phase 13 gates from lefarcen review (PR #1318) Address the actionable items from lefarcen's review of the two Phase 13 CI gates. The two questions about longer-term DX (pre-commit hook to auto-update the symbol table, AST-walker swap) are documented as deferred follow-ups rather than landed here. reducer-bench: - Describe renamed to 'reducer p99 regression gate (Phase 13.1)' so it reads as a gate, not a comparative benchmark. - Failure message now carries the full distribution (p50 / p90 / p99 / max + ceiling), so triage on a tripped gate can distinguish a real 20ms regression from a 4.001ms CI hiccup without re-running locally (lefarcen Q3). - Captured a baseline (p50=0.011ms p90=0.013ms p99=0.018ms max=0.244ms on a local Node 24 / Win11 run, 2026-05-11) inside the docblock so reviewers can see the actual reading sits ~222x below the 4ms ceiling (lefarcen Q1). - Replaced 'role as any' casts with PanelistRole-typed casts so the fixture is typecheck-strict. - Phase numbering corrected (13.2 → 13.1 to match the PR body). 
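The reducer-bench items above describe a gate that fails with the full distribution in its message. A rough sketch of that shape follows, assuming a nearest-rank percentile; `percentile`, `summarize`, and `assertP99Under` are hypothetical helpers, not names from the bench file, and the timing loop that produces the samples is left out.

```typescript
// Hypothetical sketch of a p99 regression gate over pre-collected timings (ms).
interface Distribution {
  p50: number;
  p90: number;
  p99: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty sample.
function percentile(sortedMs: readonly number[], p: number): number {
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

function summarize(samplesMs: readonly number[]): Distribution {
  if (samplesMs.length === 0) throw new Error('no samples collected');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p90: percentile(sorted, 90),
    p99: percentile(sorted, 99),
    max: sorted[sorted.length - 1],
  };
}

// Throws with the whole distribution so a tripped gate can be triaged from the log.
function assertP99Under(samplesMs: readonly number[], ceilingMs: number): Distribution {
  const d = summarize(samplesMs);
  if (d.p99 >= ceilingMs) {
    throw new Error(
      `reducer p99 regression: p50=${d.p50}ms p90=${d.p90}ms ` +
        `p99=${d.p99}ms max=${d.max}ms ceiling=${ceilingMs}ms`,
    );
  }
  return d;
}

// 2ms documented budget, doubled to 4ms to absorb CI noise (figures from the commit).
const BUDGET_MS = 2;
const CEILING_MS = 2 * BUDGET_MS;

// Synthetic samples 0.01ms..1.00ms stand in for real reducer timings.
const samples = Array.from({ length: 100 }, (_, i) => (i + 1) / 100);
const dist = assertP99Under(samples, CEILING_MS);
```

Putting every percentile in the error text is what lets a reader tell a 4.001ms hiccup from a 20ms regression without re-running the bench.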
critique-coverage: - Symbols now grouped under four describe blocks (SSE events / panelist roles / lifecycle phases / i18n keys) so a failure points at the category that drifted at a glance (lefarcen nit). - Docblock now explains the grep-over-AST trade-off (the bug class is structural at the string level, not at the AST level) and points at the future AST-walker work as a deferred follow- up (lefarcen Q2). - Docblock now walks a contributor through the four-step maintenance flow (add to contract → add caller → add test → add literal here), so the next person to add an SSE event or i18n key knows the gate exists and what to update (lefarcen Q4). - Phase strings switched from 'phase: <name>' to bare-quoted literals so the walker is robust against single vs double quotes and ':' vs '===' source-shape changes. - Dead try/catch around 'stack = [root]' removed (cannot throw). - Per-symbol failure messages name the symbol AND which corpus is missing it, so the gate is self-describing on the next CI red. - Phase numbering corrected (13.4 → 13.2 to match the PR body). 63 / 63 vitest cases green (1 bench + 62 coverage). Web typecheck clean. * fix(web): tighten coverage walker semantics from lefarcen P2/P3 (PR #1318) Two follow-on findings on commit `338a185`: P2 — coverage gate weakened. The previous revision used one helper `corpusReferences` for both SRC and TEST corpora, and that helper accepted the unprefixed PanelEvent type form (`type: 'panelist_must_fix'`) as a substitute for the prefixed SSE wire name (`critique.panelist_must_fix`). The fallback is correct on the TEST side (reducer tests dispatch PanelEvent literals) but it weakened the SRC side: production code could drop the SSE channel name silently and the PanelEvent type alias would keep the walker green. Split into two helpers: `srcReferences` is strict (exact substring match only, no fallback) and `testReferences` keeps the lenient fallback for SSE events. 
The production-side assertions now route through `srcReferences` so the wire name is load-bearing again. P3 — maintenance doc overclaimed. The previous revision said 'CI red if you forget step 4' but the symbol arrays are partially hand- maintained, so a contributor adding a NEW phase string or i18n key without updating the array leaves CI green (the walker never knew to look). Rewrote the failure-mode section to distinguish the two cases: - Renaming an EXISTING symbol without updating the walker → CI red (existing assertion fails because the old name is gone). - Adding a NEW hand-maintained symbol without updating the walker → CI stays green (walker does not know to look for it). Also clarified that `SSE_EVENTS` and `PANELIST_ROLE_STRINGS` are auto-built from contracts so step 4 is one-line for `PHASE_STRINGS` and `I18N_KEYS` only. 63 / 63 vitest cases still green. * fix(web): close two P2 findings on PR #1318 (Siri-Ray + lefarcen) P2 (coverage walker counted self as evidence). The walker walked apps/web/tests, which contains apps/web/tests/components/Theater/ critique-coverage.test.ts itself. The hand-maintained PHASE_STRINGS and I18N_KEYS literals inside that file would satisfy the test-side coverage assertion against themselves, so a real Theater test that covers a symbol could be deleted and the gate would still pass. Excluded the walker file from TEST_FILES via path.resolve(__filename) filter so the test corpus only contains independent evidence. Once the walker stopped seeing itself, the gate correctly red-flagged nine i18n keys that no INDEPENDENT test exercises: critiqueTheater.userFacingName, roundLabel, composite, threshold, interrupt, interrupted, degradedHeading, shippedSummary, interruptedSummary. Component tests like TheaterCollapsed.test.tsx exercise the rendered text but never mention the key STRING, so the walker couldn't see them. 
Closed that gap by adding apps/web/tests/components/Theater/critique-i18n-keys.test.ts: 9 cases, one per watched key, asserting the dictionary entry exists as a non-empty string. That's both real coverage (catches a stale dict) and the independent evidence the walker requires. P2 (interruptedSummary missing from de/ja/ko/zh-TW). The native locale overrides were missing the key, so an interrupted run on a German / Japanese / Korean / Traditional Chinese UI silently fell back to the English string via the ...en spread. Added the key with {round} and {composite} placeholders preserved, using PerishCode's suggested copy from the earlier review thread. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm exec vitest run tests/components/Theater tests/i18n: 20 files / 190 tests green (critique-coverage 62 / 62, critique-i18n-keys 9 / 9 new, reducer-bench 1 / 1, locales 5 / 5). * fix(web): drop the Dict cast in i18n key coverage test (lefarcen P1 / Siri-Ray on PR #1318) The previous revision used `(en as Record<string, string>)[key]` to read each watched key. Dict has no string index signature, so CI's strict typecheck rejected the broad cast with TS2352 even though the runtime assertion was fine. Replaced with the typed pattern lefarcen suggested: type WATCHED_KEYS as `readonly (keyof typeof en)[]` and read `en[key]` directly. That removes the cast and also strengthens the test, because a renamed or removed key now fails the type check immediately rather than at runtime. Verified: - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web exec vitest run tests/components/Theater/critique-i18n-keys.test.ts: 9 / 9 green. 
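The typed-key pattern from the last fix can be sketched as follows. The `en` object here is a small hypothetical stand-in for the real dictionary (only the "Design Jury" label is taken from the log; the other values are placeholders), and `emptyKeys` is an illustrative helper rather than anything in the test file.

```typescript
// Hypothetical stand-in for the English dictionary; values other than
// "Design Jury" are placeholders for illustration.
const en = {
  'critiqueTheater.userFacingName': 'Design Jury',
  'critiqueTheater.interrupt': 'Interrupt',
  'critiqueTheater.interrupted': 'Interrupted',
};

// Typing the list as keys of `en` removes the need for a Record<string, string>
// cast: a renamed or removed key becomes a compile error instead of a runtime miss.
const WATCHED_KEYS: readonly (keyof typeof en)[] = [
  'critiqueTheater.userFacingName',
  'critiqueTheater.interrupt',
  'critiqueTheater.interrupted',
];

// Returns the watched keys whose dictionary entry is empty or whitespace-only.
function emptyKeys(
  dict: typeof en,
  keys: readonly (keyof typeof en)[],
): string[] {
  return keys.filter((key) => dict[key].trim().length === 0);
}

const stale = emptyKeys(en, WATCHED_KEYS);
```

With this typing, adding a misspelled entry such as `'critiqueTheater.title'` to `WATCHED_KEYS` would be rejected by the type checker, since that key does not exist on `en` in this sketch.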
* fix(web): tighten isPanelEvent in contracts so enum + numeric fields are checked end-to-end (Siri-Ray round-3 P1 on PR #1314) The variant validator on the web SSE path previously accepted any `typeof === 'string'` for closed-enum fields (ship.status, panelist_.role, degraded.reason, failed.cause, parser_warning.kind, run_started.cast[]) and any `typeof === 'number'` for numeric fields, which let NaN / Infinity through. Downstream components index i18n tables by enum value, so an unknown status or role would land `SHIP_BADGE_KEY[final.status]` on undefined and crash the translator. The replay parser had a separate gap: `useCritiqueReplay.parseTranscript` called the cheap `isPanelEvent` header check directly, so a recorded line like `{"type":"ship","runId":"r"}` reached the reducer with composite, status, round, artifactRef, summary all undefined and TheaterCollapsed then called `final.composite.toFixed(1)` on undefined. Resolution: move all wire-side validation into the contract guard. - Export const arrays for the closed enums: SHIP_STATUSES, DEGRADED_REASONS, FAILED_CAUSES, PARSER_WARNING_KINDS, ROUND_DECISIONS (PANELIST_ROLES already existed). - Rewrite `isPanelEvent` in packages/contracts/src/critique.ts to be the single deep validator: header (known type + non-empty runId) plus every variant-specific required field plus closed-enum membership plus Number.isFinite on every numeric field. Documented as the wire source of truth. - Drop the local `hasValidVariantShape` from web/sse.ts; sseToPanelEvent now relies entirely on the contract guard, and parseTranscript in useCritiqueReplay (which already uses isPanelEvent) gets the deeper validation for free. Tests (TDD, red-first): - packages/contracts/tests/critique.test.ts: 13 new cases pinning the strict guard directly (well-formed across every variant, every rejection path: unknown type, empty/non-string runId, unknown enum, non-finite numeric, missing variant field). 
- apps/web/tests/components/Theater/state/sse.test.ts: 9 new cases for each closed-enum rejection on the wire path plus a positive sweep across every legal enum value across every variant. - apps/web/tests/components/Theater/hooks/useCritiqueReplay.test.tsx: 2 new cases for incomplete and unknown-enum transcript lines. Verified: - pnpm --filter @open-design/contracts test 4 files / 30 tests green. - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test 107 files / 976 tests green. fix(contracts): enforce numeric domains in isPanelEvent (lefarcen P2 on PR #1314 round 4) The strict guard from PR #1314 round 3 enforced enum membership and Number.isFinite, but accepted any finite number where the contract intends a specific domain: scale: 0 (ScoreTicker divides by it), negative thresholds, fractional rounds, negative mustFix, etc. ScoreTicker.tsx writes `var(--scale, ${state.scale})` into inline CSS and divides by it for tick width, so a guard-passing scale: 0 shipped Infinity into the rendered style. Negative composite / score values reached downstream code that assumes >= 0. Resolution: mirror the daemon-side Zod domain constraints in the runtime guard. Three new helpers in packages/contracts/src/critique.ts: - isPositiveInt(v): integer with v > 0. Used for round, maxRounds, scale, protocolVersion (all 1-indexed in the orchestrator). - isNonNegativeInt(v): integer with v >= 0. Used for mustFix, position, bestRound. bestRound: 0 is the valid sentinel for 'interrupted before any round closed'. - isNonNegativeFinite(v): finite number with v >= 0. Used for composite, score, dimScore, threshold. Threshold may be fractional (e.g. 8.5 on a scale of 10). Cross-field check inside run_started: threshold <= scale (the daemon Zod schema enforces this with an epsilon refine, the wire guard matches the same intent). 
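The three domain helpers named above could look roughly like the sketch below. The helper names come from the commit message, but the bodies are an assumption about how they are written, and `hasValidThreshold` is a hypothetical wrapper for the cross-field check; it uses a plain comparison rather than the epsilon refine the daemon schema is said to apply.

```typescript
// Sketch of the numeric domain helpers; bodies are assumed, not copied from
// packages/contracts/src/critique.ts.

// Integer with v > 0 (round, maxRounds, scale, protocolVersion).
function isPositiveInt(v: unknown): v is number {
  return typeof v === 'number' && Number.isInteger(v) && v > 0;
}

// Integer with v >= 0 (mustFix, position, bestRound).
function isNonNegativeInt(v: unknown): v is number {
  return typeof v === 'number' && Number.isInteger(v) && v >= 0;
}

// Finite number with v >= 0 (composite, score, dimScore, threshold).
function isNonNegativeFinite(v: unknown): v is number {
  return typeof v === 'number' && Number.isFinite(v) && v >= 0;
}

// Hypothetical cross-field check for run_started: threshold must not exceed scale.
function hasValidThreshold(scale: unknown, threshold: unknown): boolean {
  return isPositiveInt(scale) && isNonNegativeFinite(threshold) && threshold <= scale;
}

const examples = {
  zeroScale: isPositiveInt(0), // false: a zero scale would be divided by downstream
  fractionalRound: isPositiveInt(1.5), // false
  bestRoundSentinel: isNonNegativeInt(0), // true: interrupted before any round closed
  fractionalThreshold: hasValidThreshold(10, 8.5), // true
  thresholdAboveScale: hasValidThreshold(10, 11), // false
};
```

Number.isInteger already rejects NaN and the infinities, so the integer helpers need no separate finiteness check.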
Tests (TDD, red-first) added in packages/contracts/tests/critique.test.ts: - 22 new rejection cases across every numeric field that previously slipped through: scale: 0, negative scale, fractional scale, maxRounds: 0, fractional maxRounds, protocolVersion: 0, fractional protocolVersion, negative threshold, threshold > scale, round: 0, fractional round, negative dimScore / score, negative / fractional mustFix, negative composite, ship round: 0, negative / fractional bestRound, negative interrupted composite, negative / fractional parser_warning position. - 3 positive boundary cases that must still pass: threshold == scale, fractional threshold within [0, scale], interrupted with bestRound: 0 (no round completed before interrupt), parser_warning with position: 0 (start of stream). Verified: - pnpm --filter @open-design/contracts build clean. - pnpm --filter @open-design/contracts test: 4 files / 59 tests green (was 37 before the new domain cases). - pnpm --filter @open-design/web typecheck clean. - pnpm --filter @open-design/web test: 110 files / 1004 tests green; no regression on Theater suite, sse validator, replay parser, or assistant-feedback widget tests. * fix(web): restore wait-for-daemon-ack pattern on Theater interrupt Same regression as flagged on PR #1316 post-main-merge: the optimistic local dispatch fired before the POST resolved, so a daemon 404 / 409 still terminalized the UI and the real SSE terminal event got ignored by the sticky interrupted phase. Snapshot runId / bestRound / composite at click time, dispatch interrupted only on res.ok, clear interruptPending on rejection or non-2xx so the user can retry. Tests cover rejection + 404 leaving the run on the live stage; the 204 path waits for the ack. * test(e2e): move critique-coverage walker from apps/web/tests to e2e/tests (Siri-Ray P2) The walker is by definition a cross-app consistency check: it reads the web reducer, the daemon critique module, the contracts package, and the e2e UI suite. 
Hosting it under apps/web/tests/ violated the repo boundary rule (root AGENTS.md): app packages must not import another app's private src/ or tests/ as a shared helper, and cross-app consistency checks belong in e2e/tests/. The web test lane was effectively coupled to daemon and e2e file layout, so a daemon-only refactor could break the web lane. Moved the file to e2e/tests/critique-coverage.test.ts and switched the contracts import to the import.meta.glob shape the e2e package already uses (see localized-content.test.ts), so the e2e package does not have to add @open-design/contracts as a workspace dep just to load two const arrays. REPO_ROOT and SELF_PATH recalculated for the new location. Web test lane no longer depends on daemon, contracts, or e2e layout. The e2e walker covers the same 62 assertions as before: e2e/tests/critique-coverage.test.ts 62 / 62 green Web typecheck clean, e2e typecheck clean. * fix(test): add projectKind prop to FileViewer deck render after v0.7.0 merge --------- Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-05-14 15:55:36 +08:00
nettee	0f0d2879ff	Make de/fr/ru content i18n optional (#1511)	2026-05-13 12:17:17 +08:00
chaoxiaoche	a75d9938c7	feat(design-systems): add structured tokens.css schema (default + kami) (#1231 ) * feat(design-systems): add structured tokens.css schema (default + kami) Compile each brand's DESIGN.md prose into a machine-readable :root block agents paste verbatim, removing the "Primary → --accent" translation step where most token misuse happens. Daemon prompt injection lands in a follow-up; lint-artifact already enforces the shared token vocabulary so no rule changes needed. Schema validated across two contrasting aesthetics: - default (sans-serif, cobalt, B2B utility) — stress test the shallow form, 2-level fg / 2-level surface - kami (serif, parchment, ink-blue, print-first) — stress test the rich form, 4-level fg ramp, 3-level surface, ring elevation, i18n font stacks, and solid-hex tag tints (print renderers double-paint alpha) Schema growth from kami's stress test (5 new optional slots, all backward-compatible — default aliases via var() to existing tokens): - --fg-2 / --meta (4-level fg ramp) - --surface-warm (3-level surface) - --border-soft (2-level border) - --elev-ring (ring elevation as first-class level) Brand-specific extensions live in tokens.css with explicit "NOT in shared schema" labels and a documented promotion path (≥2 brands need it → promote to schema slot). components.html in each brand is a self-contained reference fixture that exercises every token through real layouts. Both fixtures lint clean against apps/daemon/src/lint-artifact.ts. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): add token-fixture drift guard Each design system in design-systems/<brand>/ ships two files agents consume in tandem: tokens.css (canonical token bindings) and components.html (a self-contained fixture whose first <style> embeds the same :root paste so the file renders standalone). The fixture's :root block is a copy of tokens.css's :root block, kept in sync only by an inline comment. 
This adds scripts/check-tokens-fixture-sync.ts and registers it in pnpm guard. The check pairs each brand's tokens.css with its components.html and asserts the unscoped :root block is byte-equivalent after canonical normalization (CSS comments stripped, whitespace collapsed, separator spacing normalized). Brands missing one half of the pair, or with no :root rule in either file, fail the guard. Scoped overrides like :root[lang="zh-CN"] are not required to appear in the fixture (per the kami fixture's inline comment they are pasted only when an artifact's <html lang> matches), so the check only compares the unscoped :root block. Verified: pnpm guard passes for default + kami, fails on intentional value drift, fails on missing token, tolerates whitespace-only formatting differences. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(design-systems): point fixture CTAs to real files Both default and kami components.html advertised in-page anchors (#tokens, #spec, #surface, #accent, #type, #components) but defined no matching ids, so every CTA was a no-op when the fixture was opened locally — flagged by mrcfps in #1231. Re-point each link to a real artifact in the same brand directory: - "View tokens" / "Inspect tokens" / "Inspect typography" → ./tokens.css - "Read the spec" / "Read the rule" → ./DESIGN.md Browsers render these as raw source views, which is the desired UX for a reference fixture: clicking the CTA shows the underlying contract instead of jumping to nothing. Agents copying the fixture also learn the pattern of "buttons link to actual sibling resources". The :root token block is unchanged, so the token-fixture drift guard still passes for both brands. 
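The normalization step of the token-fixture drift guard (comments stripped, whitespace collapsed, separator spacing normalized, unscoped `:root` only) can be approximated with a short sketch. The function names and the regex-based extraction are assumptions for illustration, not the contents of scripts/check-tokens-fixture-sync.ts, and the token values in the usage lines are made up.

```typescript
// Hypothetical sketch of comparing the unscoped :root block of two files.

// Returns the body of the first unscoped `:root { ... }` rule, or null if none.
function extractUnscopedRoot(source: string): string | null {
  // Strip CSS comments first so a `}` inside a comment cannot end the match early.
  const withoutComments = source.replace(/\/\*[\s\S]*?\*\//g, '');
  // `:root` must be followed directly by `{`, so `:root[lang="zh-CN"]` is skipped.
  const match = /:root\s*\{([^}]*)\}/.exec(withoutComments);
  return match ? match[1] : null;
}

// Collapses whitespace and removes spacing around `:`, `;` and `,`.
function normalizeBlock(block: string): string {
  return block
    .replace(/\s+/g, ' ')
    .replace(/\s*([:;,])\s*/g, '$1')
    .trim();
}

// True only when both files have an unscoped :root block and the blocks agree.
function rootBlocksMatch(tokensCss: string, fixtureHtml: string): boolean {
  const a = extractUnscopedRoot(tokensCss);
  const b = extractUnscopedRoot(fixtureHtml);
  if (a === null || b === null) return false;
  return normalizeBlock(a) === normalizeBlock(b);
}

// Made-up token values; the scoped override is present only in tokens.css.
const tokensCss = `
:root[lang="zh-CN"] { --font-body: serif; }
:root {
  --accent: #0047ab; /* brand accent */
  --fg:     #111111;
}`;
const fixtureHtml = '<style>:root{--accent:#0047ab;--fg:#111111;}</style>';
const inSync = rootBlocksMatch(tokensCss, fixtureHtml);
const drifted = rootBlocksMatch(tokensCss, fixtureHtml.replace('#0047ab', '#0047ac'));
```

Comparing normalized strings tolerates formatting-only differences while still failing on a changed value or a missing token, which matches the behaviour the commit reports verifying.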
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): codify token schema (A1/A2/B/C layers) The two-brand pilot (default + kami) settled the shape of the shared token schema; this commit codifies it as a machine-readable contract and enforces it in pnpm guard, addressing lefarcen's review on #1231: > the optional-vs-required split won't generalize cleanly when brand > #3 needs different Layer A tokens or when multiple brands converge > on the same extension (promoting C→B→A). Consider surfacing that > limitation in the PR narrative or in a future SCHEMA.md. Schema lives under design-systems/_schema/ as three files: - tokens.schema.ts — TypeScript declaration of every shared token with its layer (A1-identity / A1-structure / A2 / B-slot), plus per-brand C-extension allowlists and a global C-prefix allowlist - defaults.css — CSS mirror of A2 fallback values, used as the human-readable contract reviewer's-eye copy and the future input to the derive script - AGENTS.md — schema layer model, C → B-slot → A2 promotion rules, when-not-to-add-a-token guidance Layer model: A1-identity 8 tokens — bg/surface/fg/muted/border/accent + font-display/font-body. The brand IS these values; no fallback is defensible. A1-structure 18 tokens — type scale (8), leading (2), tracking (1), section-y (3), container (4). Structural decisions vary per brand by design and have no cross-brand default. A2 26 tokens — accent states, semantic colors, motion, base spacing scale, radius, elevation, focus, font-mono. Required in every tokens.css; fallback lives in defaults.css for the future derive script to inline when DESIGN.md does not specify the value. B-slot 4 tokens — fg-2 / meta / surface-warm / border-soft. Brand may bind independently or alias the named sibling via var(...) for components that target the richer ramp. C-extension n tokens — brand-specific names (kami's tag-bg-, leading-display, accent-light, etc.). 
Allowlisted per-brand in BRAND_EXTENSIONS or globally by prefix in BRAND_EXTENSION_PREFIXES. Promote when a second brand adopts the same name. Why A2 fails the guard today: Artifacts are generated by agents pasting one brand's :root block into a single <style>; there is no global stylesheet that supplies fallbacks at runtime. A tokens.css missing an A2 declaration would silently break any var() reference in the fixture. Until the derive script (PR-B) lands and inlines defaults, every brand's tokens.css must declare every A2 token directly. The guard enforces this strictly. Why --font-mono lands in A2 (not A1): 149 brands' DESIGN.md files were surveyed: 87 (58%) declare a monospace stack, 62 (42%) do not — including major brands like bmw / nike / apple / notion / mastercard / meta. Agent paste cannot rely on the brand author having written it down; a defaultable A2 fallback (with CJK brands like kami overriding) is safer than forcing every brand author to add a field they may not realize their kbd / code-block components need. Five guard checks, each registered as its own entry in scripts/guard.ts so failures attribute to a specific contract: 1. token-fixture sync — components.html :root ↔ tokens.css :root byte-equivalent (existing) 2. A1 required tokens — every brand declares every A1 token 3. A2 required tokens — every brand declares every A2 token 4. unknown token allowlist — every declared token is in schema or brand-extension allowlist 5. 
A2 defaults parity — defaults.css ↔ tokens.schema.ts fallback byte-equivalent Verified on default + kami: - 26 A1 tokens declared in both brands - 26 A2 tokens declared in both brands - 129 total declarations, all match shared schema or brand extensions - defaults.css ↔ tokens.schema.ts parity holds - sanity test: drifting --motion-fast in defaults.css fails check 5 with a clear divergence message The PR description originally listed "Dedicated SCHEMA.md" as explicitly NOT in this PR ("Once 3+ brands ship, extracting a single source of truth becomes worthwhile"). That boundary moves: lefarcen's review surfaced the schema-generalization risk, and the schema must exist as a machine-enforced contract before the derive script can read it. The TS file replaces the markdown that was deferred. Co-authored-by: Cursor <cursoragent@cursor.com> fix(web/tests): pass missing designTemplates prop to ProjectView Pre-existing typecheck regression on main: PR #955 (`b5eb8c16`, "generic skills + split skills/design-templates + finalize-design API") added required `designTemplates: SkillSummary[]` to ProjectView Props but updated only two of the three test fixtures that render ProjectView directly. The third — ProjectView.api-empty-response.test.tsx — was missed, so `pnpm typecheck` (and CI on any PR merging into main) fails on: apps/web/tests/components/ProjectView.api-empty-response.test.tsx (168,6): error TS2741: Property 'designTemplates' is missing in type ... The other two ProjectView tests already pass `designTemplates={[]}`, so this aligns this fixture with the existing pattern. Out of scope for #1231 strictly, but the regression blocks the merged-state typecheck CI runs that #1231 triggers, and the one-line fix here restores main's typecheck health for everyone. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): enforce B-slot required tokens in pnpm guard Closes mrcfps + lefarcen review comment thread on #1231: > The guard validates A2 required tokens here, but there's no > sibling check for B-slot aliases (--fg-2, --meta, --surface-warm, > --border-soft). Per the schema docs, every brand must declare > A1 + A2 + B-slot names so shared components can safely read > var(--fg-2) etc. Without a B-slot guard, a brand can omit those > aliases, pass pnpm guard, and break any artifact that references > them. Same artifact-paste constraint as A2: agents render artifacts by pasting one brand's :root block into a single <style>; there is no runtime cascade, so a missing B-slot makes any var(--fg-2) reference resolve to nothing. Until now the schema narrative claimed B-slots were optional with a var() default, but no machine check enforced declaration — a contract gap reviewers reasonably refused to merge. This commit closes the gap in three places so machine and narrative agree: 1. scripts/check-tokens-fixture-sync.ts - Add checkDesignSystemBSlotRequiredTokens, mirroring the A2 check but using getBSlotNames() from the schema. - Failure message names each missing slot AND the schema-suggested alias (--fg-2 (default alias: var(--fg))) so a brand author fixing the failure has a copy-pasteable resolution. - Renumber section comments: 5 checks → 6 checks. 2. scripts/guard.ts - Register the new check between A2 required and unknown allowlist so failures attribute to a specific contract. 3. design-systems/_schema/AGENTS.md - Update the layer table: B-slot row's "If omitted" column changes from "resolves via var() to a richer sibling" to "guard fails — brand must declare, either as var(--sibling) (collapsed) or independent value (richer)". 
- Add a "Why B-slot is required (and what the alias is for)" section that distinguishes the schema-suggested alias from a runtime fallback, with worked examples for default (alias) and kami (independent bind). Verified on default + kami: - pnpm guard passes all 6 design-system checks - 4 B-slot tokens declared in both brands (default aliases via var(), kami binds independently — both forms satisfy the contract) - pnpm typecheck clean across the workspace - Sanity test: removing --fg-2 + --meta from default/tokens.css fires the new guard with a precise per-token alias hint: [default] design-systems/default/tokens.css is missing 2 B-slot tokens (alias the named sibling via var(...) or bind independently): --fg-2 (default alias: var(--fg)), --meta (default alias: var(--muted)) The schema contract is now machine-enforced end-to-end (A1 + A2 + B-slot all required-with-fixed-form-of-fallback). The derive script in PR-B can rely on every brand's tokens.css containing every shared slot name. Co-authored-by: Cursor <cursoragent@cursor.com> * test(e2e): skip leading-underscore meta-directories under design-systems/ CI for #1231 went red on `Validate workspace` after merging origin/main. Cause is a clean collision between two recently-landed changes: - main #1270 (`be77dc03` "Default English resource i18n fallback") tightened tests/localized-content.test.ts so every directory under design-systems/ is run through assertResourceId() with the strict RESOURCE_ID_PATTERN /^[a-z0-9][a-z0-9-]*$/. - this branch #1231 introduced design-systems/_schema/ as the home of the shared token contract (tokens.schema.ts, defaults.css, AGENTS.md). The leading underscore signals "meta-directory, not brand" — the same convention SCSS partials, Jekyll, Hugo all use. 
The two changes never met until CI built the merge commit, where assertResourceId('_schema') deterministically failed: Error: Design system directory _schema has malformed resource id: _schema at invariant tests/localized-content.test.ts:66:11 at assertResourceId tests/localized-content.test.ts:71:3 at readDesignSystemResources tests/localized-content.test.ts:202:8 Fix tightens readDesignSystemResources's directory filter so the leading-underscore convention is recognised explicitly: .filter((entry) => entry.isDirectory() && !entry.name.startsWith('_')) This aligns with what apps/daemon/src/design-systems.ts:listDesignSystems already does implicitly — it requires DESIGN.md per directory, so _schema/ was always invisible at runtime; the test was the only place that surfaced it. Verified locally on the post-merge tree: - pnpm test (e2e vitest) — tests/localized-content.test.ts: 4 passed - pnpm guard — all 6 design-system checks pass on default + kami - pnpm typecheck — clean across the workspace (after pnpm install to pull deps for tools/pr that arrived with main) The fix is intentionally narrow (one filter line in one test) and documents the convention inline so future meta-directories under design-systems/ (e.g. _archive/, _drafts/) are covered for free. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 22:23:34 +08:00
nettee	be77dc0394	Default English resource i18n fallback (#1270)	2026-05-11 20:29:05 +08:00
Tom Huang	b5eb8c1647	feat: generic skills + split skills/design-templates + finalize-design API (#955 ) * feat: general-purpose skills with @-mention composition and user import Lift skills from "one mode-bound skill per project" to a generic capability the user can compose per turn: - Daemon: scan multiple skill roots (user-skills under runtime data, then the bundled `skills/`); user-imported skills can shadow built-ins by id. - New `POST /api/skills/import` and `DELETE /api/skills/:id` endpoints, with CONFLICT/BAD_REQUEST/NOT_FOUND error codes and built-in delete protection. - ChatRequest gains `skillIds: string[]`; the chat run concatenates each picked skill's body (and merges craftRequires) into the system prompt for that turn only — the project's persistent `skillId` is untouched. - Web composer: `@` popover now lists skills alongside project files; picks render as removable chips above the textarea and ride along with the request as `skillIds`. - Settings → Library: import form (name/description/triggers/body), per-card delete for user skills, "user" origin badge. * chore(web): drop welcome pet teaser + add ds→prompt-template mapping util - SettingsDialog: remove the inline pet adoption teaser from the welcome panel so the first-run modal stays focused on configuration. - New `inferPromptTemplateCategoriesForDs(ds)` helper that maps a design system's authored metadata to prompt-template gallery categories. Imported by the design-system gallery wiring on a sibling branch; no callers in this branch yet. * feat: split skills/design-templates and add finalize-design API Phase 0 of the skills/design-templates refactor (specs/current/ skills-and-design-templates.md): - Move ~104 rendering catalogue entries from skills/ to design-templates/ and keep skills/ for the small set of functional skills that do work on user input (utilities, briefs, packagers). 
- Add design-templates/AGENTS.md and skills/AGENTS.md describing the contract, and a brand-agnostic craft/ surface for opt-in craft rules. - Daemon: add DESIGN_TEMPLATES_DIR / USER_DESIGN_TEMPLATES_DIR roots and an /api/design-templates surface mirroring /api/skills. Asset/example routes still span both registries so existing srcdoc URLs keep resolving across the rename. - Web: split LibrarySection into SkillsSection + DesignSystemsSection, rename the EntryView "Examples" tab to "Templates", and update locales + the New-project picker accordingly. Adds the finalize-design endpoint: - New apps/daemon/src/finalize-design.ts and packages/contracts/src/api/ finalize.ts — one-shot synthesis of a project's transcript + active design system + current artifact into <projectDir>/DESIGN.md via the Anthropic Messages API. Per-project .finalize.lock mirrors the transcript-export hygiene from PR #493; provider credentials are not persisted by the daemon. Other supporting changes: - README + AGENTS.md updates to document the new directory split and craft/ surface, plus i18n strings across 13 locales. - Test refactors and new coverage (finalize-design, runs, sidecar server, plus refreshed daemon integration tests). - .gitignore: scope the .exe ignore to /OpenDesign.exe so legitimate vendor binaries are no longer hidden. fix(merge): move clinical-case-report to design-templates/ Origin/main added the clinical-case-report skill under skills/ before the skills/design-templates split landed. Its od.mode is prototype, so per specs/current/skills-and-design-templates.md it is a design template and belongs alongside the other rendering catalogue entries — not under the slimmed-down functional skills/ root. Moving it keeps the EntryView Templates tab consistent with origin/main's intent. 
* feat(skills): curated design/creative catalogue + collapsible Settings rows Seed ~100 curated design/creative skill stubs under skills/ sourced from awesome-claude-skills (ComposioHQ) and awesome-agent-skills (VoltAgent). Each stub carries an od.category tag so the new filter pill row in Settings -> Skills can group them. The seed script (scripts/seed-curated-design-skills.ts, pnpm seed:curated-design-skills) is idempotent: it only creates folders that don't already exist, so hand-edited stubs are never overwritten. - Daemon: parse and surface od.category on SkillInfo with a strict slug normaliser; mirror the field on SkillSummary in @open-design/contracts. Category is purely a UI hint — system-prompt composition is unchanged. - Web: rewrite SkillsSection from a left-list / right-detail grid into a vertical stack of collapsible rows mirroring the External MCP panel (header always visible with name + mode/source/category pills + per-row enable toggle; SKILL.md preview, file tree and inline edit form expand on demand). Add a Category filter row above the list. Reorder Settings nav so Skills + External MCP sit above the Composio/MCP cluster. Update composer placeholder/hint across 17 locales to advertise '@ files or skills · / for commands'. - Docs: extend skills/AGENTS.md with the curated catalogue rules (idempotency, category vocabulary, no upstream vendoring). 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(skills): teach localized-content + system-prompt tests about the skills/design-templates split mrcfps blocking review on PR #955: the skills/design-templates split (`b5993385`) moved ~110 SKILL.md entries out of `skills/` and into `design-templates/`, but two repo-level tests still hard-coded the single-root layout, so CI gates went red on the merged branch: - `e2e/tests/localized-content.test.ts` only scanned `<repo>/skills` while the locale `skillCopy` map keeps id-keyed entries spanning both roots (ExamplesTab/Templates uses one lookup regardless of origin). Teach the helper to read both `skills/` and `design-templates/`, deduplicating ids so the union matches the localized claim. - `apps/daemon/tests/prompts/system.test.ts` read `skills/live-artifact/SKILL.md`, which now lives under `design-templates/live-artifact/`. Update the absolute path so composeSystemPrompt's coverage of the live-artifact preamble is exercised again. Also enroll the curated design/creative catalogue (PR #955, ~91 stubs sourced from awesome-claude-skills / awesome-agent-skills) in the DE / FR / RU `_SKILL_IDS_WITH_EN_FALLBACK` lists. The stubs are English-only by design (frontmatter advertises an upstream URL); the fallback list is exactly the place to acknowledge "we know this id exists, English copy is fine here" so the localized-content coverage gate passes without forcing a translation task per locale. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): always quote frontmatter name so importUserSkill round-trips numeric / boolean ids mrcfps PR #955 review: `buildSkillMarkdown` emitted `name: ${escapeYamlString(name)}` without quotes, so YAML coerced names like `123`, `true`, `false`, or `null` into non-string scalars on re-parse. 
listSkills() then read `data.name` as a number/boolean and the import flow's follow-up `findSkillById(skills, result.id)` missed it, falling into `/api/skills/import`'s "imported skill could not be re-read" 500 path for those ids. Switch the emitter to a quoted scalar (`name: "..."`) — the double-escape already in `escapeYamlString` makes the quoted form safe — and add a round-trip test covering `123`, `true`, `false`, `null`, and `0` to lock in the contract. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): drop staged-skill chips when the matching @<id> token leaves the draft mrcfps PR #955 review: `submit()` always forwarded every id in `stagedSkills`, but that state was only mutated on picker click and chip removal. Hand-deleting an `@<id>` token from the textarea left the chip staged, so the request still carried `skillIds: [<id>]` and the daemon composed a skill the prompt no longer referenced. Sync the chips with the draft inside `handleChange()` by pruning `stagedSkills` whenever the new value no longer contains the `@<id>` token (using the same whitespace boundary as `removeStagedSkill`'s strip regex). Comment explains why this prune does not run for `staged` file attachments — users frequently add files via the upload button without leaving an `@<path>` token, so a symmetric prune there would erase legitimate uploads. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(daemon): stage @-composed skills' side files alongside the active skill codex PR #955 review: composing a per-turn `@`-picked skill into the system prompt appended its body (with the `withSkillRootPreamble` guidance pointing at relative paths under `<cwd>/.od-skills/<folder>/`) but never staged the actual folder. 
`startChatRun` only copied `activeSkillDir`, so when the project's primary skill was different (or absent) the composed skill's references/, examples/, and scripts/ files lived only at their absolute repo path — agents that honour the cwd-relative form (or that don't get `--add-dir`, e.g. Codex with allowlisted gpt-image projects) couldn't reach them. Thread the composed skills' dirs out of `composeDaemonSystemPrompt` as `extraSkillDirs` and stage each one through the same `stageActiveSkill` API used for the primary skill. Dedupe by folder basename so a project whose primary skill is also `@`-composed isn't copied twice. Each preamble already advertises its own folder, so the prompt and the staged tree stay aligned without further changes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): respect the Library disable toggle in the project @-mention picker codex PR #955 review: only `EntryView` received `enabledSkills` (filtered against `config.disabledSkills`); active projects still got `skills={skills}` raw, so a skill the user disabled in Settings kept appearing in the project's `@`-mention popover and could ride along to the daemon via `skillIds`. That broke the Library toggle for any project opened on the post-split branch. Compute a functional-skills-only enabled subset (`enabledFunctionalSkills`) and pass it into `<ProjectView>` instead. Templates stay separate — design-templates are filtered through their own `enabledDesignTemplates` memo for the Templates gallery — so ProjectView's chat composer still only sees skills, never templates, matching the pre-split prop surface. Co-authored-by: Cursor <cursoragent@cursor.com> * test(e2e): mock /api/design-templates for example-use-prompt flow The Templates tab in EntryView fetches from /api/design-templates after the skills/design-templates split (specs/current/skills-and-design-templates.md). 
The example-use-prompt Playwright scenario only mocked /api/skills, so the gallery card never appeared and the test timed out waiting on example-card-warm-utility-example. Serve the same fixture summary on both endpoints so the templates gallery renders the card the test clicks. Co-authored-by: Cursor <cursoragent@cursor.com> * test(tools-pack): create design-templates fixture for resources test The packaging resources copy now bundles the new design-templates tree alongside skills (see resources.ts BUNDLED_RESOURCE_TREES). The copyBundledResourceTrees fixture only created skills, design-systems, craft, etc., so the recursive copy crashed with ENOENT on design-templates before it could check the prompt-templates assertion. Add the missing fixture directory so the test exercises the same set of resource trees the packaged build does. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(skills): clone built-in side files into the shadow on first edit mrcfps PR #955 review: editing a built-in skill wrote a USER_SKILLS_DIR shadow folder that contained only a new SKILL.md. The next listSkills() pass surfaced the shadow as the active dir, but every side-file resolver (/api/skills/:id/files, /example, /assets/, the system-prompt preamble, and the per-turn cwd staging) reads through skill.dir. With nothing but SKILL.md in the shadow, the bundled assets/, references/, scripts/, and examples/ disappeared the moment the user hit save — a built-in like last30days or live-artifact would break immediately after edit instead of just having its body overridden. Teach updateUserSkill() to take a `sourceDir` and clone every entry except SKILL.md / dotfiles into the shadow on the very first edit. The shadow stays self-contained, so all the resolvers keep working without fallback bookkeeping. Subsequent edits detect the existing shadow and skip the clone, so user tweaks under the side tree survive a re-save. 
Wire `sourceDir: skill.dir` from server.ts's PUT /api/skills/:id handler and add two regression tests: - 'clones built-in side files into the shadow on the first edit' walks the file tree after save and asserts assets/template.html, references/ notes.md, and scripts/helper.sh all round-trip from the built-in. - 'preserves user-edited side files on subsequent edits' edits the staged assets/template.html, re-saves, and confirms the user content is still there. Co-authored-by: Cursor <cursoragent@cursor.com> test(e2e): rename home tab from Examples to Templates The Examples tab was renamed to Templates in EntryView (b5993385's skills/design-templates split — entry.tabExamples became entry.tabTemplates and the tab value moved from 'examples' to 'templates'), but entry-chrome-flows still asserted the old label and testId. Update both. * fix(skills+web): preserve template body in API mode and dir-based skill delete Two follow-ups from PR #955 review: 1. ProjectView only received `enabledFunctionalSkills`, but `composedSystemPrompt()` still resolved `project.skillId` through that prop and `fetchSkill()`. Projects created from the new `/api/design-templates` surface keep a template id in `project.skillId`, so opening one in API mode dropped the template body from the system prompt and the upstream request ran without the project's primary template instructions. Now ProjectView takes a separate `designTemplates` prop (the unfiltered template list, so a later-disabled template still loads for projects already created from it) and `composedSystemPrompt()` plus the metadata / `isDeck` lookups fall back to that list, with `fetchDesignTemplate()` as the body-fetch fallback to `fetchSkill()`. The chat composer's `@`-picker keeps receiving only the enabled functional skills. 2. `DELETE /api/skills/:id` used `deleteUserSkill(USER_SKILLS_DIR, skill.id)` which re-slugified the frontmatter id and removed `<userSkillsDir>/<slug>/`. 
That matched the import shape but missed the install shape — `installFromTarget` writes the folder at `sanitizeRepoName(url)` (GitHub) or `path.basename(realpath)` (local symlink), neither of which is guaranteed to equal the slugified frontmatter `name`. A duplicate `app.delete('/api/skills/:id', ...)` handler at the install routes never fired because Express resolved the earlier registration first, leaving the install/uninstall path without working teardown. The handler now removes `skill.dir` (the absolute path listSkills already discovered) under a USER_SKILLS_DIR safety check, using `lstat` + `unlinkSync` so symlinked local installs unlink cleanly without recursing into the user's source tree. The dead duplicate handler is removed; `deleteUserSkill` is dropped from the server.ts import set (still exported and unit-tested in skills.ts). Regression coverage in `apps/daemon/tests/skills-delete-route.test.ts` pins both shapes plus the symlink-preserves-source case. * test(daemon): point hyperframes system-prompt test at design-templates The merge with main brought in a hyperframes system-prompt test that reads `skills/hyperframes/SKILL.md`, but this branch's split moved `hyperframes` into `design-templates/` (same migration as `live-artifact` already handled above in this file). CI was failing with ENOENT on the old path. --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 17:48:34 +08:00
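The staged-skill pruning described in the commit above (dropping a chip once its `@<id>` token leaves the draft) might look roughly like this. It is a sketch under stated assumptions: `pruneStagedSkills` and `escapeRegExp` are hypothetical helpers, and the whitespace-boundary rule is inferred from the commit text rather than copied from `removeStagedSkill`.

```typescript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Keep only the staged skill ids whose `@<id>` token still appears in the
// draft, delimited by whitespace or the start/end of the text.
function pruneStagedSkills(draft: string, stagedIds: string[]): string[] {
  return stagedIds.filter((id) => {
    const token = new RegExp(`(^|\\s)@${escapeRegExp(id)}(?=\\s|$)`);
    return token.test(draft);
  });
}
```

Calling this from the textarea change handler would keep the chips and the outgoing `skillIds` aligned with what the prompt actually references. As the commit notes, the same prune is deliberately not applied to file attachments, which can be staged without any `@<path>` token.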
PerishFire	31e57fd773	fix(daemon): persist runStatus/endedAt on chat run termination (#1230 ) * fix(daemon): persist runStatus/endedAt on chat run termination (#135) POST /api/runs created the run but never reconciled the messages row on terminal status. If the web failed to persist the cancel (refresh, dropped PUT), the row stayed at run_status='running' / ended_at=NULL, and on reload the elapsed timer kept climbing because the renderer fell back to now - startedAt. Mirror routine/orbit reconciliation: attach a wait-completion handler that updates run_status and ended_at, guarded by COALESCE and a run_status IN ('queued','running') filter so concurrent web persists are not clobbered. Adds cancelRun helper and two regression specs under e2e/tests/dialog/. * fix(daemon): annotate reconcile callback params for chat-routes The chat run reconciliation block landed in chat-routes.ts after the recent server-route split (#1043), where stricter type checking surfaces implicit `any` parameters. Annotate the wait/then callback as `{ status: string }` and the catch callback as `unknown`. * refactor(daemon): extract reconcileAssistantMessageOnRunEnd helper The inline if/wait/then/catch block in POST /api/runs read as a bolt-on patch. Lift it to a named file-scope helper so the route handler stays intent-level (start the run, arrange follow-up reconciliation) and the guard for missing assistantMessageId is an internal detail. The helper's docblock describes the invariant ("messages row reflects the run's terminal state even without web persist"); commit history keeps the issue context. * test(e2e): wait for any terminal status in stop-reconcile spec The earlier .catch fallback chained two waitForRunStatus calls (canceled then succeeded). waitForRunStatus throws on the first non-expected terminal, so a canceled run that resolves to failed (e.g. agent exits non-zero on SIGTERM) would still abort the test before reaching the messages-row assertion. 
Add waitForRunTerminal to e2e/lib/vitest/runs.ts: polls until any terminal status without throwing on mismatch, since this spec's claim is about the resulting messages row, not which terminal the run took. Addresses Codex inline review on PR #1230.	2026-05-11 15:37:52 +08:00
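The reconciliation guard from the commit above can be modelled in memory. This is only a sketch of the invariant, not the daemon's SQL: `MessageRow` and `reconcileOnRunEnd` are hypothetical names, and the two comments mark where the `run_status IN ('queued','running')` filter and the `COALESCE` on `ended_at` would apply in the real update.

```typescript
// Hypothetical in-memory stand-in for the relevant columns of a messages row.
interface MessageRow {
  runStatus: string;
  endedAt: number | null;
}

// Apply a terminal status to a row without clobbering a state that the web
// client has already persisted.
function reconcileOnRunEnd(
  row: MessageRow,
  terminalStatus: string,
  now: number,
): MessageRow {
  // Equivalent of: WHERE run_status IN ('queued', 'running')
  if (row.runStatus !== 'queued' && row.runStatus !== 'running') {
    return row;
  }
  return {
    runStatus: terminalStatus,
    // Equivalent of: ended_at = COALESCE(ended_at, :now)
    endedAt: row.endedAt ?? now,
  };
}
```

A row left at `running` with no end time picks up the terminal status and a timestamp, while a row the web client already marked as finished is returned unchanged.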
PerishFire	976edaf38e	test: harden e2e smoke and release reports (#1140 ) * test: harden e2e inspect specs * test: wire e2e release reports * chore: bump packaged beta base to 0.6.1 * test: run release smoke vitest directly * test: add suite-owned tools-dev lifecycle * ci: harden stable release packaging * fix(release,e2e): gate stable signing on verify and harden suite cleanup - restore `needs: [metadata, verify]` on the stable release `build_mac`, `build_mac_intel`, `build_win`, and `build_linux` jobs so Apple signing/notarization and Windows release builds cannot run before pnpm guard, typecheck, and layout checks complete on the metadata commit. - in `runToolsDevSuite`, drop the `started` flag and always attempt `stopToolsDevWeb` in `finally`; record stop errors in diagnostics, and when the test body succeeded, escalate the stop failure to the suite result and rethrow — so orphan daemon/web processes from an interrupted `startToolsDevWeb` or a broken shutdown can no longer pass silently. Addresses PR #1140 review feedback from lefarcen and mrcfps.	2026-05-11 13:11:16 +08:00
code-Y	84f768d4a2	feat: add WeChat design system, login-flow skill, and fix API mode tool_calls bug (#1083 ) * feat: add WeChat design system, login-flow skill, and fix API mode tool_calls bug - Add WeChat design system (design-systems/wechat/) with full brand spec including color palette, typography, and component rules for chat UI - Add login-flow skill (skills/login-flow/) for mobile authentication flows with P0 checklist, example HTML, and i18n registration across 3 locales - Fix DeepSeek V4 bug: API/BYOK mode (streamFormat=plain) models now receive a directive to emit only <artifact> HTML blocks and suppress tool_calls, since plain adapters proxy to external providers that cannot execute tools * fix: restore full server.ts and WeChat DESIGN.md from ad46d8cd commit Restore files that were corrupted in PR #1083 head branch. The WeChat DESIGN.md was reduced to a single line (filename only) and server.ts was reduced to ~1 line. Both are restored to their original ad46d8cd state with full content. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: restore full server.ts and WeChat DESIGN.md from ad46d8cd Restore files corrupted in PR #1083: - apps/daemon/src/server.ts: restored 7106-line file - design-systems/wechat/DESIGN.md: restored 301-line WeChat design spec - skills/login-flow/SKILL.md: restored from local working state - skills/login-flow/example.html: restored 351-line example HTML * fix: only suppress tool_calls when streamFormat='plain' explicitly, remove nonexistent assets/template.html 1. streamFormat check now requires explicit 'plain' value instead of defaulting to 'plain' when undefined. This prevents normal tool-using chat runs from incorrectly inheriting the API/BYOK tool_calls suppression rule. 2. login-flow SKILL.md: removed reference to assets/template.html since that file does not exist in the skill bundle and derivePreflight() would inject a hard instruction to read it before any other tool, causing pre-flight to fail. 
* fix: thread streamFormat to composeSystemPrompt in server.ts call Previously the composeSystemPrompt call at line ~4940 omitted streamFormat, causing the composer to default to 'plain' and suppress tool_calls even for tool-using chat runs. Now streamFormat is passed through from the adapter definition so the API mode rule only fires when streamFormat='plain' is explicitly set. * fix: WeChat category metadata, font-family, and login-flow example interactivity WeChat DESIGN.md: - Add Category: Social & Messaging metadata so it appears correctly in picker - Fix font-family declaration: remove invalid -webkit-font-family prefix, use standard font-family so downstream CSS generation works correctly skills/login-flow/example.html: - Add password toggle click handler so show/hide actually works - Change Apple icon fill from hardcoded #fff to currentColor so it is visible on light backgrounds * fix: mirror streamFormat suppression in contracts composer and add WeChat i18n 1. packages/contracts/src/prompts/system.ts: Add streamFormat parameter to ComposeInput and ComposeInput interface, mirroring the same suppression rule from daemon prompts/system.ts. When streamFormat='plain' is passed, a directive is appended telling models not to emit tool_calls and to only output <artifact> HTML blocks. 2. apps/web/src/i18n/content.{ts,fr,ru}.ts: Add WeChat design system entries: - Add 'wechat' to DE/FR/RU_DESIGN_SYSTEM_IDS_WITH_EN_FALLBACK arrays - Add 'wechat' summary to DE/FR/RU_DESIGN_SYSTEM_SUMMARIES - Add 'Social & Messaging' category to DE/FR/RU_DESIGN_SYSTEM_CATEGORIES (matching the Category: Social & Messaging metadata in WeChat DESIGN.md) * fix: thread streamFormat='plain' into web composeSystemPrompt for api mode * test: focus localized content coverage on missing resources --------- Co-authored-by: Open Design Contributor <z@open-design.dev> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: mrcfps <mrc@powerformer.com>	2026-05-10 20:38:33 +08:00
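The explicit `streamFormat` check described in the commit above can be illustrated with a small sketch. `ComposeInputSketch`, `composeSystemPromptSketch`, and the directive wording are hypothetical stand-ins; only the rule itself (append the directive when `streamFormat` is exactly `'plain'`, never when it is undefined) comes from the commit text.

```typescript
// Hypothetical, trimmed-down input for a system-prompt composer.
interface ComposeInputSketch {
  basePrompt: string;
  streamFormat?: string;
}

// Illustrative wording only; the real directive text is not shown in the log.
const PLAIN_MODE_DIRECTIVE =
  'Do not emit tool_calls. Output only <artifact> HTML blocks.';

function composeSystemPromptSketch(input: ComposeInputSketch): string {
  // Require an explicit 'plain'. An undefined streamFormat must NOT default
  // to plain, otherwise tool-using runs would inherit the suppression rule.
  if (input.streamFormat === 'plain') {
    return `${input.basePrompt}\n\n${PLAIN_MODE_DIRECTIVE}`;
  }
  return input.basePrompt;
}
```

The design point is that a caller which forgets to thread `streamFormat` through gets the permissive behaviour (tools still allowed) instead of silently suppressing tool calls.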
Marc Chan	b06f26a5fd	test: strengthen e2e PR coverage (#796 ) * test: strengthen e2e PR coverage * fix: address e2e PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.1 (runner=fixer, agent=opencode) * ci: cache Windows packaged smoke builds * test: fake additional agent runtimes * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Route tools-pack mac starts through a launch-time packaged config override so portable packaged smoke runs keep using the namespace runtime root that inspect and logs expect. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Fall back to the packaged app's embedded config when the build output config is missing so installed mac starts still work. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: align packaged mac PR smoke with tools-pack runtime mode Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Keep blake3-wasm out of the packaged mac daemon prebundle so the standalone runtime loads the Cloudflare asset hasher from node_modules instead of crashing in ESM. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix: address e2e PR feedback Skip the portable mac launch override when the bundled packaged config is missing so installed fallback app targets can still boot with packaged defaults. Add a regression test covering the missing-config start path. Generated-By: looper 0.6.2 (runner=fixer, agent=opencode) * fix(pack): remove duplicate mac prebundle dependency key	2026-05-08 16:48:10 +08:00
PerishFire	f1cdb2844a	test(e2e): gate beta packaged runtime (#637) * test(e2e): gate beta mac packaged runtime * test(e2e): separate ui automation layout * test(e2e): move localized content coverage * chore(release): prepare packaged 0.4.1 beta validation * test(e2e): keep ui lane playwright-only * fix(web): keep chat recoverable after conversation load failure * fix(desktop): honor native mac quit	2026-05-06 17:44:29 +08:00
nhancdt2602	d1d63f9dae	Fix/error message persistence (#623 ) * test(ProjectView): persists daemon errors on assistant messages Signed-off-by: nhancdt2602 <nhancu2602@gmail.com> * fix(ProjectView): persists daemon errors on assistant message Signed-off-by: nhancdt2602 <nhancu2602@gmail.com> * test(e2e): cover agent switch and persistence Signed-off-by: nhancdt2602 <nhancu2602@gmail.com> * chore(chat-event): handle falsy empty error detail Signed-off-by: nhancdt2602 <nhancu2602@gmail.com> --------- Signed-off-by: nhancdt2602 <nhancu2602@gmail.com>	2026-05-06 15:28:30 +08:00
jiakeboge	2473ab9567	fix(web): add copy buttons for FileViewer code blocks (#471) * fix(web): add copy buttons for FileViewer code blocks * fix(web): harden FileViewer markdown copy controls * fix(web): restore focus after clipboard fallback * test(e2e): restore execCommand after markdown copy tests	2026-05-05 09:09:39 +08:00
Tom	5a0f954297	fix(daemon): emit tool_use from tool_execution_start in pi-rpc (#186 ) * fix(daemon): emit tool_use from tool_execution_start in pi-rpc The pi-rpc adapter emitted tool_use from message_end, which fires before tool execution starts. The web UI pairs tool results to prior tool_use events, so receiving tool_result without a preceding tool_use broke tool card rendering and file auto-open behavior. Move tool_use emission to tool_execution_start, matching the pattern in copilot-stream.ts. Remove redundant tool call extraction from message_end (tool_use is now emitted at execution time, usage is already emitted from turn_end). Extract mapPiRpcEvent as a pure exported function so tests exercise the real event mapping logic instead of an inlined copy that can diverge from production. Ref: mrcfps review comments on PR #117 * docs(daemon): clarify mapPiRpcEvent mutability contract The function mutates ctx.sentFirstToken to track streaming state. Calling it "pure" is misleading; revised the doc comment to say no I/O or child process interaction instead. * fix(pi-rpc): remove redundant status(tool) emission from tool_execution_start Now that tool_use fires inline from tool_execution_start, the accompanying status(tool) event is redundant: tool_use already carries the tool name, and the UI renders running state from the tool card. The extra status pill breaks consecutive tool_use grouping in AssistantMessage.buildBlocks. Aligns with copilot-stream, which emits only tool_use from tool.execution_start with no status event.	2026-05-02 16:06:37 +08:00
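The event ordering fixed in the commit above can be sketched as a small mapping function. The event and field names below (`toolCallId`, `toolName`, `tool_execution_end`, and the `UiEvent` shapes) are assumptions made for illustration; the log names only `tool_execution_start`, `message_end`, `tool_use`, and `tool_result`, so this is not the real `mapPiRpcEvent`.

```typescript
// Hypothetical subset of agent events; field names are illustrative.
type AgentEvent =
  | { type: 'tool_execution_start'; toolCallId: string; toolName: string; args: unknown }
  | { type: 'tool_execution_end'; toolCallId: string; result: unknown }
  | { type: 'message_end' };

// Hypothetical UI-facing events.
type UiEvent =
  | { type: 'tool_use'; id: string; name: string; input: unknown }
  | { type: 'tool_result'; toolUseId: string; content: unknown };

// Map one agent event to zero or more UI events. tool_use is emitted when
// execution starts, so it always precedes the matching tool_result.
function mapAgentEvent(event: AgentEvent): UiEvent[] {
  switch (event.type) {
    case 'tool_execution_start':
      return [{ type: 'tool_use', id: event.toolCallId, name: event.toolName, input: event.args }];
    case 'tool_execution_end':
      return [{ type: 'tool_result', toolUseId: event.toolCallId, content: event.result }];
    case 'message_end':
      // No tool_use here: it was already emitted at execution time.
      return [];
  }
}
```

Keeping the mapping free of I/O, as the commit describes, lets tests feed it recorded events directly instead of maintaining an inlined copy of the logic.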
Siri-Ray	0bafc73d24	Add visible conversation timestamps (#120) * Add visible conversation timestamps Generated-By: looper 0.2.7 (runner=worker, agent=codex) * Fix assistant message timestamps Generated-By: looper 0.2.7 (runner=fixer, agent=codex)	2026-05-02 11:15:07 +08:00
d 🔹	f8af2cd875	fix(web): exit PreviewModal fullscreen on first Esc press (#168 ) Closes #141. When the user clicked the Fullscreen button, requestFullscreen() put the stage element into native browser fullscreen and React's `fullscreen` state was set true. Pressing Esc was meant to exit the overlay, but in browsers like Firefox the browser consumes Esc to drop its native fullscreen element without delivering keydown to JS. The React state stayed true, the `ds-modal-fullscreen` class lingered, and only a second Esc reached the keydown handler that flipped the state. Subscribe to `fullscreenchange` so the React state mirrors the native state. When the browser exits its fullscreen element, the overlay drops on the same keystroke. The keydown handler is still needed for the fallback path (no native fullscreen API support, where requestFullscreen is undefined and only React state is set). Adds three regression tests in e2e/tests/preview-modal-fullscreen.test.tsx covering the bug fix path, the keydown fallback, and a non-collapse guard for transitions where another element is still fullscreen. Co-authored-by: d 🔹 <258577966+voidborne-d@users.noreply.github.com>	2026-04-30 23:35:01 +08:00
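The `fullscreenchange` subscription described in the commit above might be sketched as follows. `FullscreenDocLike` and `mirrorNativeFullscreen` are hypothetical names; the sketch takes a document-like object rather than the global `document` so it can run outside a browser, and it assumes the overlay state is a simple boolean setter.

```typescript
// Hypothetical minimal view of the parts of Document the sketch needs.
interface FullscreenDocLike {
  fullscreenElement: unknown;
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

// Mirror the browser's native fullscreen state into UI state. Returns an
// unsubscribe function suitable for an effect cleanup.
function mirrorNativeFullscreen(
  doc: FullscreenDocLike,
  setFullscreen: (value: boolean) => void,
): () => void {
  const onChange = () => {
    // Only collapse the overlay when nothing is fullscreen any more; if
    // another element is still fullscreen, leave the state alone.
    if (!doc.fullscreenElement) {
      setFullscreen(false);
    }
  };
  doc.addEventListener('fullscreenchange', onChange);
  return () => doc.removeEventListener('fullscreenchange', onChange);
}
```

With this in place, an Esc press that the browser consumes to leave native fullscreen still reaches the UI through the event, while the keydown handler remains the fallback where no native fullscreen API is available.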
Tom	8f34e39b7b	feat(daemon): add pi coding agent adapter (#117 ) * feat(daemon): add pi coding agent adapter Add pi (https://pi.dev) as a supported coding agent, using its --mode rpc JSON-RPC protocol over stdio for structured event streaming. Changes: - apps/daemon/pi-rpc.js: new RPC session handler that drives pi's --mode rpc protocol, translating typed agent events (text_delta, thinking_delta, tool_use, tool_result, usage, status) into the daemon's UI event format. Auto-resolves extension UI requests (fire-and-forget consumed, dialogs auto-approved) so pi stays unblocked in the headless web UI. Kills the process after agent_end since pi's RPC process is designed for multi-prompt sessions. - apps/daemon/agents.js: add pi agent definition with custom fetchModels (pi --list-models outputs to stderr, not stdout), 575+ models from 20+ providers, reasoning/thinking level support via --thinking flag, and streamFormat 'pi-rpc'. - apps/daemon/server.js: wire pi-rpc stream format to attachPiRpcSession; skip stdin.end() for pi-rpc since the RPC session manages stdin bidirectionally. - apps/daemon/acp.js: export createJsonLineStream for reuse by pi-rpc.js. - apps/daemon/pi-rpc.test.mjs: 19 unit tests covering model list parsing (TSV, dedup, edge cases), RPC event translation (text, thinking, tools, usage, compaction, retry), sendCommand wire format, extension UI auto-resolution. - e2e/tests/structured-streams.test.ts: add pi RPC tool_use/tool_result event mapping test alongside existing Claude/Copilot fixtures. Verified end-to-end: daemon /api/chat → pi RPC → SSE stream with status, text_delta, usage, and tool events. Live E2E test passes (OD_E2E_RUNTIMES=pi). All 59 project tests green. * refactor(daemon): migrate pi-rpc to TypeScript Follow upstream #118 TypeScript migration convention: rename pi-rpc.js → pi-rpc.ts and pi-rpc.test.mjs → pi-rpc.test.ts with @ts-nocheck header (same as all other daemon modules). 
Import paths remain ./pi-rpc.js per NodeNext module resolution. * fix(daemon): avoid duplicate usage events in pi-rpc handler Pi emits both message_end and turn_end per turn, both carrying usage data. Emitting from both handlers caused double-counting in the UI and any consumer that aggregates usage. Remove usage emission from the message_end branch since turn_end is the canonical per-turn usage source. Keep tool call extraction in message_end (unique data not available in turn_end). Add regression test confirming exactly one usage event is emitted when both message_end and turn_end carry usage for the same turn. Addresses Copilot P2 review on PR #117. * fix(daemon): scope pi RPC id counter per session, bump graceful shutdown Move nextRpcId and sendCommand inside attachPiRpcSession as local state, matching the pattern in acp.ts where nextId is scoped per session. Prevents RPC id collisions across concurrent /api/chat requests. Bump post-agent_end SIGTERM grace period from 2s to 5s and make it configurable via PI_GRACEFUL_SHUTDOWN_MS env var for resource- constrained machines. Add test confirming concurrent sessions get independent id sequences. * fix(daemon): wrap parser.feed in try-catch in pi-rpc Catch errors from parser.feed and route them through the existing fail() handler instead of letting them propagate as unhandled exceptions.	2026-04-30 17:45:11 +08:00
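The per-session RPC id scoping described in the commit above can be sketched with a closure. `createRpcSession` and the JSON line layout are hypothetical; only the idea of keeping `nextRpcId` and `sendCommand` as session-local state comes from the commit text.

```typescript
// Create an isolated RPC session. The id counter lives in the closure, so
// concurrent sessions cannot collide on ids.
function createRpcSession(write: (line: string) => void) {
  let nextRpcId = 1;

  // Serialize a command as one JSON line and return the id that was used.
  // The wire format here is illustrative, not pi's documented protocol.
  function sendCommand(type: string, params: Record<string, unknown> = {}): number {
    const id = nextRpcId++;
    write(JSON.stringify({ id, type, ...params }));
    return id;
  }

  return { sendCommand };
}
```

A module-level counter would be shared by every concurrent chat request. Moving it inside the session factory gives each session its own sequence starting from 1.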
PerishFire	c6d11018a0	Refresh desktop integration control plane (#123 ) * feat(dev): add desktop tools-dev control plane * refactor(sidecar): split Open Design contracts Move Open Design-specific sidecar protocol definitions into @open-design/contracts so sidecar and platform can remain descriptor-driven primitives. * refactor(daemon): organize package sources Keep daemon app code, tests, and sidecar entrypoints in separate package directories so each layer can be built and verified independently. * chore(repo): streamline maintenance entrypoints Centralize agent guidance by directory and reduce root command chains while preserving the existing build scope. * docs: translate agent guidance to English * fix(sidecar): tolerate stale IPC sockets Remove stale Unix socket files only after confirming no listener is active, so tools-dev can restart after unclean shutdowns.	2026-04-30 14:23:53 +08:00
PerishFire	cfebff9653	Align app directories and isolate e2e tests (#102) * chore: align app directories * test: consolidate external suites under e2e	2026-04-30 09:47:03 +08:00

36 commits