mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
24 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
76c7d31c53
|
chore: bump vela cli to 0.0.4 (#3239)
* chore: bump vela cli to 0.0.4-test.0 * chore: refresh lockfile for vela cli 0.0.4-test.0 * chore(nix): refresh pnpm deps hash * fix: materialize electron before mac release checks * fix: rebuild electron when mac framework links are invalid * revert: drop release workflow experiments * chore(nix): refresh pnpm deps hash * fix: stop blocking beta mac release on electron symlink preflight * fix: stop using custom electron dist for beta mac packaging * fix: guard oversized chat images and opencode overflow * chore: bump vela cli to 0.0.4 * chore(nix): refresh pnpm deps hash * fix(daemon): surface prompt-image stat failures instead of dropping them resolveSafePromptImagePaths only swallowed unresolvable path input; once a path was confirmed inside UPLOAD_DIR and existed, a statSync failure (EACCES/EPERM, a file vanishing mid-run) silently dropped the image and let the run continue without that prompt context. Since this helper is now also the 1 MB enforcement point, that turned an infra/validation failure into a 'successful' run with missing required context. Collect those into a new failedImages bucket and fail the run with INTERNAL_ERROR at the call site, mirroring the oversized-image guard. Add a unit test covering statSync throwing. --------- Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com> Co-authored-by: lefarcen <935902669@qq.com> |
||
|
|
df8a0faff6
|
feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent
AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.
The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.
Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.
Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)
Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).
* fix(runtimes/amr): pin a concrete default model and bare openai ids
End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:
1. vela rejects `session/prompt` with `session/set_model must be called
before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
skips set_model whenever the picked model is the synthetic 'default'
id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
default option and `fallbackModels[0]`, which makes attachAcpSession
always send a real `session/set_model` for AMR turns.
2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
it forwards to opencode's openai provider. With OpenRouter-style ids
like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
`openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
The new fallback list ships the bare ids opencode's openai registry
actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).
Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
way real vela does, so a regression that silently goes back to
model: 'default' would surface as a fatal error in tests instead of a
hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
no 'openai/' prefix in fallbackModels, and a negative case that
asserts session/prompt fails when no model is set.
Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.
Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.
* fix(runtimes/amr): substitute concrete model when chat run sends 'default'
A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:
json-rpc id 3: session/set_model must be called before session/prompt
The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.
This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:
1. If the caller picked a real id, pass it through.
2. Else, if `def.defaultModelEnvVar` is set and the daemon process
env has a non-empty value for it, return that (operator escape
hatch — see below).
3. Else, if the def's `fallbackModels` does NOT contain a 'default'
id, return `fallbackModels[0].id`.
4. Else, return the original value (the historic shape — defs that
list 'default' themselves are untouched).
AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Worth noting the env var must live in the daemon's `process.env`
(Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver) — the new field's docblock spells this out.
Coverage:
- `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all
four resolver branches plus the env-override happy path / fallback /
ignore-when-user-picked-a-real-id case.
- `pnpm --filter @open-design/daemon typecheck` clean.
* chore(runtimes/amr): move AMR to the top of the base agent list
So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.
* feat(amr): Sign-in / Sign-out button on the AMR Settings card
The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:
- GET /api/integrations/vela/status reads ~/.vela/config.json
for the active profile and returns { loggedIn, profile, user }
(without leaking the runtime/control keys themselves).
- POST /api/integrations/vela/login spawns `vela login` once
(409 if one is already in flight). The vela CLI opens the user's
browser to the device-authorization page itself — Open Design
only needs to kick the subprocess off.
- POST /api/integrations/vela/logout removes ~/.vela/config.json
so the next status read returns logged-out.
`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(nested interactive elements). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.
i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.
Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
against a tmp HOME — including the negative case where a profile
has user info but no runtimeKey (still logged-out), and the
secret-leak guard ("rt-secret-*" must not appear in the projection
payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
states (logged-out, logging-in, logged-in, logging-out) plus the
click-propagation invariant the divergent card was built to keep.
`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.
Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.
* fix(connection-test): substitute model for AMR before attachAcpSession
The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, which made the Test Connection button on the AMR Settings
card deadlock with the same `session/set_model must be called before
session/prompt` error the chat-run path already handles — surfaced as a
permanent "Testing connection…" spinner in the UI.
Reuse the same helper here so Test Connection mirrors chat-run behavior.
* test(amr): three-layer end-to-end coverage for the AMR login + turn flow
The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:
1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
Spins up the real daemon Express app via `startServer({port:0,...})`,
persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
and exercises every /api/integrations/vela/* endpoint against the
extended fake-vela stub:
- status reads ~/.vela/config.json under various states
- login spawns the fake, waits for config.json to appear, returns
pid + startedAt + profile
- 409 already-running guard with the stub's delay knob
- logout removes the file (idempotent)
- secrets (runtimeKey / controlKey) never leak in the projection
- login → status round-trip flips loggedIn=false → true
2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
Boots a namespaced daemon + web pair through `createSmokeSuite`,
inlines a self-contained fake `vela` binary that handles BOTH
`vela login` (writes ~/.vela/config.json) and
`vela agent run --runtime opencode` (ACP stdio with the
`session/set_model must precede session/prompt` gate the real binary
enforces), then drives a complete /api/runs lifecycle for
`agentId: 'amr', model: 'default'` and asserts the assistant message
captures the fake's streamed text. This is the test that would have
surfaced today's plugin-default-model regression (the `set_model
before prompt` error) at PR time instead of demo time.
3. e2e/ui/amr-login-pill.test.ts (Playwright)
Mocks /api/agents + /api/integrations/vela/{status,login,logout}
to drive the Settings AMR card through the full Sign in → Signed in
→ Sign out cycle. Pins the AmrLoginPill polling contract and the
aria-label semantics (the pill's accessible name is "Sign out" once
logged in, regardless of which label the hover-state text shows).
fake-vela.mjs extensions:
- Handles `vela login` argv by writing
~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
mirrors real vela's on-disk side-effect without the device-auth
loop.
- FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
in-flight state of the spawn lifecycle.
- FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
user fields end-to-end.
Validated:
- `pnpm guard` + `pnpm typecheck` (all workspace projects)
- `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
including the new 8-test integration suite.
- `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
- `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
1 / 1 passing (6.7s).
* feat(amr): package native cli and refine login ui
* feat(amr): wire vela cli beta packaging
* docs(amr): document vela ci packaging review
* docs(amr): refine vela ci integration review
* fix(ci): refresh nix pnpm dependency hashes
* fix(pack): clean up Vela CLI packaging
* fix(pack): bundle Vela CLI support files
* fix(amr): recover login attempts from stale auth state
* test: expand AMR and automations coverage
* fix(amr): address review follow-ups
* test(web): align tasks fixtures with contracts
* fix(daemon): type wildcard route params
* fix(ci): refresh PR merge validation
* fix(amr): clear env credentials on logout
* feat(settings): inline local CLI model configuration
* fix(amr): recognize daemon env credentials
* [codex] Fix Vela companion packaging (#2979)
* Fix Vela companion packaging
* Update Nix pnpm dependency hashes
* [codex] Surface AMR account failures (#2980)
* fix: surface AMR account failures
* fix: cover AMR recovery error guidance
* chore: bump beta base version to 0.8.1 (#2990)
* Fix AMR profile and packaged runtime review issues
* Detect packaged AMR OpenCode companion tree
* feat(web): polish AMR frontend flows
* Polish AMR onboarding card
* fix: read AMR login state from dot-amr config (#3048)
* test: tighten AMR credential and packaging coverage
* test: restore AMR executable test env helper
* [codex] Fix packaged mac Dock identity and AMR label (#3076)
* Fix packaged mac sidecar Dock identity
* Rename AMR assistant label
* Fix AMR live models and dot-amr login state (#3073)
* fix: read AMR login state from dot-amr config
* fix: load live AMR models before runs
* fix: point AMR onboarding link to production wallet
* fix: address AMR model review feedback
* fix: persist live AMR model fallback
* [codex] Fix AMR link catalog model ids (#3088)
* Fix packaged mac sidecar Dock identity
* Rename AMR assistant label
* Fix AMR link catalog model ids
* Fix AMR model normalization typecheck
* Use live AMR model for default runs
* fix: polish AMR runtime settings UI
* Accelerate AMR startup defaults (#3092)
* Surface AMR insufficient balance wallet URL (#3099)
* fix(web): polish onboarding controls (#3112)
* fix(web): show CLI scan loading state
* Avoid duplicate AMR wallet recharge links (#3117)
* Avoid duplicate AMR wallet recharge links
* Use Vela CLI 0.0.3 test package
* chore(nix): refresh pnpm deps hash
* Fix AMR wallet guidance display
---------
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)
* chore(nix): refresh pnpm deps hash
* chore(pack): pin Vela CLI 0.0.3
* chore(nix): refresh pnpm deps hash
* fix(web): suppress AMR exit 130 fallback (#3136)
* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)
* feat(web): nudge users to hosted AMR on model/auth/quota failures
When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.
Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.
* feat(web): rework failed-run AMR guidance into per-case error UI
Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:
- The error card is now neutral gray (was red) and always carries a
retry button; it is driven by the persisted per-message error event so
it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
card under the error card offers "switch to AMR & retry" — switches the
run to AMR, opens Settings on the AMR card, and auto-retries once the
account signs in (ProjectView polls vela login status, independent of
the Settings pill lifecycle, with success / 5-min-timeout / unmount
exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
pulses, and an authorize-button coachmark (a fake hand cursor that
rises in and dismisses on hover) points at the sign-in control when not
yet authorized.
analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.
* fix(daemon): require status context for numeric service-failure codes
Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.
* fix(web): keep error pill for failed runs ChatPane's card doesn't cover
Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.
* fix(daemon): don't treat a process exit code as an HTTP status
Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.
* chore(web): translate AMR card / error keys for 16 remaining locales
PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.
* fix(amr): address review feedback on #2355
Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.
- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
verifies the inner `opencode` executable exists + is runnable, not
just the directory. This closes the false-positive availability path
that let `detectAgents()` surface AMR as available even when the
packaged companion was empty / partially copied (mrcfps, 4 threads).
- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
stale `opencode` on the user's PATH, so packaged AMR builds can't be
hijacked by a global installation.
- web/EntryShell.tsx: when the Local CLI scan returns an available
agent and the previously-selected agent is AMR, switch the selection
to the first available local agent so the runtime and persisted
agent agree before Continue.
- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
BEFORE rejecting on an empty live-model catalog — a signed-out user
was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
the correct `AMR_AUTH_REQUIRED` (sign-in affordance).
- server.ts (default model fallback): if the user asked for the AMR
agent default and the cached id is no longer in the FRESH catalog,
fall back to `liveModels[0]` from the probe instead of rejecting the
run as `AMR_MODEL_UNAVAILABLE`.
- integrations/vela.ts: route `vela login` through
`createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
verbatim args (matches `execAgentFile` / chat-run spawning).
- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
to the container-side path. The host path was being passed in as-is
even though the default container only mounts /project, /tools-pack
and cache/home — `copyOptionalVelaCliBinary` saw a missing path.
Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
prematurely blocked — `@powerformer/vela-cli` only publishes the
`darwin-arm64` platform binary today; adding the flag elsewhere
would fail the builds. Revisit once win/x64/linux binaries ship.
* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)
The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
`const` is TDZ-aware, that branch would have thrown `ReferenceError:
Cannot access 'sendAmrAccountFailure' before initialization` for the
exact users it tries to help — defeating the original intent.
Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.
Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch
uses packaged built-in Vela for AMR` was creating the
`<resourceRoot>/bin/libexec/opencode/` companion *directory* only, but
this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree`
also requires the inner `opencode` executable. Add it to that fixture
to match the new contract; the test was a sibling of the executables /
env-and-detection fixtures already updated in
|
||
|
|
d5659d82d4
|
chore(nix): streamline pnpm deps hash maintenance (#2919)
* chore(nix): streamline pnpm deps hash maintenance Generated-By: looper 0.9.0 (runner=worker, agent=opencode) * fix(ci): satisfy actionlint in nix hash autofix Generated-By: looper 0.9.0 (runner=fixer, agent=opencode) * fix(ci): allow nix hash autofix on fork PRs Generated-By: looper 0.9.0 (runner=fixer, agent=opencode) * fix(ci): follow up nix hash review Generated-By: looper 0.9.0 (runner=fixer, agent=opencode) * fix(ci): tolerate nix hash bot token failures Generated-By: looper 0.9.0 (runner=fixer, agent=opencode) |
||
|
|
7bc11b398d
|
chore(deps): upgrade express 4 -> 5 in daemon (#2311)
* chore(deps): upgrade express 4.22.1 -> 5.2.1 and @types/express
Breaking changes addressed:
- Renamed all bare wildcard route segments from * to *splat across
src/server.ts, src/static-resource-routes.ts, src/project-routes.ts,
src/import-export-routes.ts, and all three test stubs that define
app.get/options/delete routes using /raw/* or /raw/* patterns
- Updated wildcard param access from (req.params as any)[0] / req.params[0]
to Array.isArray(req.params.splat) ? req.params.splat.join('/') : String(...)
to handle the Express 5 / path-to-regexp v8 change where wildcard params
are now string[] instead of string
- Updated app.get('*') SPA fallback to app.get('/*splat') in server.ts
- Annotated five connector route handlers with Request<{ connectorId: string }>
so the typed param resolves as string, not string | string[], fixing the
10 TS2345 / TS2322 errors that surfaced when @types/express moved to 5.0.6
- Fixed two app.listen() beforeAll callbacks in origin-validation.test.ts to
accept and propagate the optional Error argument Express 5 now passes to
the listen callback, resolving TS2769 overload mismatch
* chore(nix): refresh daemonHash for rebased lockfile
* fix(daemon): await res.sendFile() in async route handlers for Express 5 compatibility
Express 5 res.sendFile() returns a Promise. Without await, async route
handlers return before the response is sent, causing Express to call
next() and fall through to a 404. Add await to all res.sendFile() calls
in async handlers in static-resource-routes.ts and server.ts.
* fix(daemon): use readFile+send for spritesheet route instead of sendFile
Express 5 res.sendFile() returns undefined (not a Promise). ENOENT errors
call next() asynchronously after the route handler's try/catch has returned,
causing unhandled 404 responses. Replacing with fs.promises.readFile + res.send
keeps the error path fully within the handler's try/catch.
---------
Co-authored-by: Patrick A <259201958+eefynet@users.noreply.github.com>
|
||
|
|
128b62f863
|
chore: patch qs advisory (#2833)
* chore: patch qs advisory * chore: update daemon pnpm deps hash |
||
|
|
34165ff189
|
chore: retire tools-pr (#2867) | ||
|
|
a5b47c5f76
|
fix(ci): narrow workflow scope and reuse setup steps (#2708)
* fix(ci): narrow workflow scope and reuse setup steps * fix(ci): narrow workflow scope and reuse setup steps Repair Nix fixed-output hashes for the filtered daemon and web source trees. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): narrow workflow scope and reuse setup steps Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): narrow workflow scope and reuse setup steps Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): repair daemon and nix checks Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) |
||
|
|
7f03030f3f
|
perf(landing): self-host fonts + inline critical CSS (#2599)
* perf(landing): self-host fonts + inline critical CSS
PageSpeed Insights flagged ~2.3s of render-blocking on /:
globals.css 12.9 KB external link, 160ms
fonts CSS 2.2 KB fonts.googleapis.com, 750ms
+ 4 woff2 ~1200ms each from fonts.gstatic.com
Two changes drop that whole chain:
1. Self-host fonts via @fontsource-variable/{inter,inter-tight,
playfair-display,jetbrains-mono}. Each family ships a single variable
woff2 (covers all weights we use) that Astro bundles into /_astro/*
alongside the rest of the build, served same-origin through CF Pages —
no separate TLS handshake, no Google Fonts CSS round-trip. The CSS
variable names get an extra alias in front (`'Inter Tight Variable',
'Inter Tight', ...`) so a system fallback still works if the package
ever ships under a different family name.
2. `astro.config.ts: build.inlineStylesheets: 'always'` inlines every
emitted <style> into the HTML <head> instead of emitting a separate
/_astro/*.css link. The HTML grows from ~13KB to ~28KB (gzip) but
loses one stylesheet round-trip + the entire @font-face chain that
used to gate text rendering.
Component cleanup: the `<FontStylesheet>` component (preconnect + link to
fonts.googleapis.com) is no longer needed and is deleted, removed from
all 7 places that mounted it. og.astro keeps its own font setup since
it renders to a screenshot.
Expected effect (from PageSpeed Insights "Render-blocking requests"
diagnostic on the previous build):
FCP 1.9s → ~1.2s
LCP 2.2s → ~1.5s
Verified: pnpm typecheck 0 errors, pnpm build 1853 pages 78s, preview
serves /_astro/*.woff2 as font/woff2 same-origin, 0 fonts.googleapis or
fonts.gstatic references in the built HTML.
* perf(landing): include Playfair italic + bump nix pnpm-deps hash
Two follow-ups on the self-host fonts PR:
1. globals.css imported only `@fontsource-variable/playfair-display`,
which ships @font-face for font-style: normal only. The previous
Google Fonts URL included the italic axis (`ital,wght@0,500;1,400;
...`) and several rules (.roman, .work-rule .roman, .sec-rule .roman,
plus 8 other italics across globals.css + sub-pages.css) render
Playfair italics via `font-family: var(--serif); font-style: italic`.
Without the italic face self-hosted, those would fall through to
Times New Roman italic or browser synthesis. Adding
`wght-italic.css` keeps the typography visually equivalent.
2. nix/pnpm-deps.nix uses a fixed-output derivation hash that has to
match the pnpm vendored store; adding the four fontsource packages
changed pnpm-lock.yaml so the hash has to be bumped to the value Nix
reported in CI.
Codex (Looper reviewer) flagged #1 as non-blocking.
* perf(landing): pin fontsource versions exactly per repo guard
`pnpm add` defaulted to caret ranges (`^5.2.8`) but repo guard rejects
non-exact specs ("dependency specs must be exact versions like 1.2.3 or
workspace:*"). That was the actual cause of the Preflight + Validate
workspace failures — pinning to the locked versions Codex reviewer
called out:
@fontsource-variable/inter 5.2.8
@fontsource-variable/inter-tight 5.2.7
@fontsource-variable/jetbrains-mono 5.2.8
@fontsource-variable/playfair-display 5.2.8
`pnpm guard` now passes locally (6/6 tests).
|
||
|
|
10192dcc52
|
fix(ci): catch nix hash drift before merge (#2530)
* fix(ci): catch nix hash drift before merge * fix(nix): add pnpm hash refresh helper * chore(nix): drop redundant hash alias * fix(nix): raise update-hash output buffer Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(nix): handle current pnpm deps hash Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(nix): reject non-mismatch hash updates Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) |
||
|
|
c45c5c9764
|
fix(ci): align visual selectors and nix hashes (#2471)
* fix(ci): align visual selectors and nix hashes * fix(ci): add strict PR visual verification * fix(ci): repair visual-home captures Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) |
||
|
|
7b1cc16988
|
fix(nix): force http:// scheme on bundled caddy site address (#2485)
A bare `host:port` site address lets Caddy pick the listener scheme by port heuristic, which fights `auto_https off` and surfaces as TLS errors when the browser hits plain HTTP on a non-standard port. Hardcode the `http://` prefix in both the Home Manager and NixOS Caddyfile templates — the bundled proxy is plaintext-only by design, so users who need TLS run their own front-end with `webFrontend.enable = false`. |
||
|
|
80d305858b
|
feat(diagnostics): add one-click log export from Settings → About (#798)
* feat(diagnostics): add one-click log export from Settings → About
Adds a new "Export diagnostics" entry under the About section that bundles
daemon/web/desktop logs, machine info, and recent macOS crash reports into
a zip the user can share when reporting issues.
- Browser hits a new daemon HTTP endpoint and triggers a download.
- Electron uses an IPC bridge with the native save dialog and reveals the
saved file in Finder/Explorer; the Help menu also exposes it as a
fallback when the daemon is unresponsive.
Packaging + redaction lives in a new @open-design/diagnostics package so
both surfaces share it. Sensitive JSON keys, URL query secrets, and the
current user's home path are redacted before packaging.
* build(nix): include packages/diagnostics in daemon build targets
The Nix daemon derivation builds workspace siblings in dependency order
before compiling apps/daemon. Without @open-design/diagnostics in that
list, the daemon TypeScript build fails inside the Nix sandbox with
`Cannot find module '@open-design/diagnostics'` because pnpm install
only creates the symlink — the dist output that the package.json
exports point at isn't produced until each sibling's build script runs.
* build(tools-pack): include @open-design/diagnostics in packaged INTERNAL_PACKAGES
Without this, packaged win/mac/linux builds fail with `npm error 404` when
the post-build `npm install --omit=dev --no-package-lock` step in the
assembled app tries to resolve `@open-design/diagnostics@0.2.0` from the
public npm registry. The package is workspace-private, so it has to be
tarballed via `pnpm pack` and file:-referenced from the assembled
package.json like every other internal workspace dep that daemon/desktop
depend on.
Also wires the package's `pnpm --filter ... build` into the pre-pack
workspace build step so the dist/ exists before pnpm pack runs, and
updates the two test fixtures (`win-app.test.ts`, `workspace-build.test.ts`)
that mirror INTERNAL_PACKAGES.
The diagnostics package itself is repinned to exact dependency versions
already used elsewhere in the workspace (`jszip 3.10.1`, `@types/node
20.19.39`, `esbuild 0.28.0`, `typescript 5.9.3`, `vitest 4.1.6`) so it
passes the new `pnpm guard` exact-version rule and produces a minimal
lockfile diff vs main (additions only, no resolution-string churn).
* fix(diagnostics): include `~` in bearer-token redaction char class
RFC 6750 token68 syntax allows `~`, so tokens like `Authorization: Bearer
abcd~efgh` were only partially matched by `HTTP_AUTH_SCHEME_RE`. The
regex stopped at the first `~`, leaving the tail (`~efgh`) un-redacted in
the exported diagnostics zip — a clear leak since this feature explicitly
generates support bundles for external sharing.
Add `~` to the character class and a regression test.
* fix(diagnostics): only collect renderer.log from desktop
`buildSidecarLogSources` unconditionally added `logs/${app}/renderer.log`
for daemon/web/desktop, but only the desktop runtime writes a renderer
log (see apps/desktop/src/main/runtime.ts) — daemon and web are pure
Node services with no Electron renderer. Every export therefore produced
missing-file placeholders and manifest warnings for the two phantom
paths, polluting the bundle.
Gate the renderer.log source on APP_KEYS.DESKTOP so the daemon-side
collector matches the desktop-side collector in apps/desktop/src/main/
diagnostics.ts:63.
* fix(diagnostics): mirror desktop-side renderer.log gate
The previous fix only updated the daemon-side `buildSidecarLogSources`
in `apps/daemon/src/diagnostics-export.ts`. The desktop-side collector
at `apps/desktop/src/main/diagnostics.ts` had an identical copy of the
same bug that I overlooked: it also unconditionally added
`logs/${appKey}/renderer.log` for daemon/web/desktop, producing
missing-file placeholders + manifest warnings for the two phantom paths
on every desktop-initiated export.
Apply the same `appKey === APP_KEYS.DESKTOP` gate here so both export
entry points (browser via daemon HTTP, Electron via native save dialog)
emit the same clean manifest.
* feat(diagnostics): add `od diagnostics export` CLI subcommand
AGENTS.md's dual-track capability-exposure contract requires every
user-facing feature to ship on both the web UI and the `od` CLI. The
diagnostics export was only reachable through Settings → About and the
desktop Help menu; this commit closes the loop with an `od diagnostics
export [<path>] [--json]` subcommand registered in SUBCOMMAND_MAP.
The CLI is a thin shell over the existing GET /api/diagnostics/export
endpoint — same zip output, same redaction, same crash-report scope.
Defaults to writing `open-design-diagnostics-<timestamp>.zip` in the
current directory; `--output <path>` or a positional arg overrides.
`--json` prints `{path, sizeBytes}` for shell pipelines.
Use cases this unlocks:
- A CI script can `od diagnostics export ~/artifacts/bundle.zip` after
a failed run.
- Bug reporters on headless boxes can grab a bundle without booting
the web UI.
- `od doctor` follow-ups can collect a full snapshot when a probe fails.
* fix(diagnostics): surface non-sidecar launch in manifest warnings
`buildSidecarLogSources()` returns `[]` when the daemon has no sidecar
runtime context, which is the standard `od` (plain) launch path —
`runDaemonCliStartup()` -> `startDaemonRuntime()` does not pass a
runtime. Settings → About and the new `od diagnostics export` previously
reported success but produced a bundle with only the summary JSONs, so
operators could not tell "no logs because plain launch" from "no logs
because something genuinely broke."
- Extend `DiagnosticsContext` with an optional upstream `warnings:
string[]` that `buildManifest` merges into the manifest warnings.
- Emit STANDALONE_LAUNCH_WARNING from the daemon handler when
`options.runtime == null`. The warning names the limitation and
points the user at the sidecar entry points that DO capture logs.
- Add a regression spec at `apps/daemon/tests/diagnostics-export.test.ts`
that drives the handler with `runtime: null` and asserts the warning
surfaces in `summary/manifest.json` (and that `files` is empty so a
user reading the bundle does not confuse "no log sources" with
"missing files").
|
||
|
|
60bd58a1f3
|
fix(nix): prebuild host package for web build (#2280)
Some checks failed
ci / Detect CI change scopes (push) Successful in 0s
landing-page-ci / Validate landing page (push) Failing after 2s
landing-page-deploy / Deploy landing page (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Preflight (push) Failing after 2s
ci / Core package tests (push) Failing after 1s
ci / Tools workspace tests (push) Failing after 1s
ci / Daemon workspace tests (1/2) (push) Failing after 1s
ci / Daemon workspace tests (2/2) (push) Failing after 2s
ci / Web workspace tests (push) Failing after 1s
ci / E2E vitest (push) Failing after 2s
ci / Playwright critical (starters) (push) Failing after 1s
ci / Playwright critical (core) (push) Failing after 2s
ci / Build workspaces (push) Failing after 1s
ci / App workspace tests (push) Failing after 0s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
|
||
|
|
bb13eee765
|
chore: optimize CI and beta release runtime (#2231)
* chore(ci): add runtime trace summaries * chore(ci): tighten measured workspace steps * chore(release): tighten beta setup steps * chore(release): slim beta windows smoke * chore(ci): shard daemon tests * chore(ci): harden runtime trace lookup * chore(release): avoid mac pnpm cache in beta * chore(ci): split critical playwright checks * chore(release): publish beta platforms from builders * test(e2e): update beta release workflow expectation * chore(ci): stop gating PRs on nix check * fix(release): keep beta latest complete |
||
|
|
bd48c597b0
|
chore: pin dependency versions and harden CI caches (#2189)
* chore: pin dependency versions * ci: enforce pinned dependency specs * ci: fix pnpm executable invocation |
||
|
|
14cff69435 |
fix(nix): refresh pnpmDepsHash after merging main
The merge brought new entries into pnpm-lock.yaml (Leonardo.ai provider, custom select primitive, Critique Theater wireup, etc.), so the vendored pnpm store hash changed. CI computed the new hash from the freshly built fixed-output derivation. specified: sha256-mzGiD5K1l5SEFpQ++L4wq775ne+ViG4ruXYZYEdT6zQ= (pre-merge) got: sha256-lROdH5HgKFf3R7DYGbc8n/GrmINwLbfVwC4Xp7SrHN4= (post-merge) |
||
|
|
545aed642e |
Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0
# Conflicts: # nix/package-daemon.nix # scripts/postinstall.mjs |
||
|
|
883598f556 | Build registry protocol in packaged workspaces | ||
|
|
dbfe08eaf3 | Update Nix pnpm deps hash | ||
|
|
76defffb93
|
Garnet hemisphere (#1702)
* feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. - Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec |
||
|
|
38a5ab69e6
|
feat(daemon): Critique Theater Phase 12 (9 Prometheus metrics + 6 log events + OTel span + Grafana dashboard) (#1485)
* feat(web): pure reducer for Critique Theater states (Phase 7.1)
Pure CritiqueState reducer driven by the contracts-level PanelEvent
(the same shape both the live SSE stream and the recorded transcript
emit), so a single reducer powers both the in-flight panel and the
rerun replay. Lifecycle covers run_started → running → (shipped /
degraded / interrupted / failed), with panelist_open / dim /
must_fix / close / round_end events building per-round
CritiquePanelistView entries as they arrive.
Defensive behaviour that surfaced while writing the spec tests:
- Terminal phases (shipped / degraded / interrupted / failed) are
sticky against further lifecycle events for the same run, except
for parser_warning which can land late and is recorded in a side
channel without changing phase.
- A new run_started for a different runId at any time discards the
prior state and reboots, so the UI can launch consecutive runs
without an explicit reset action.
- Events whose runId does not match the active run return the same
state reference, so React's useReducer doesn't re-render
subscribers on stray traffic.
- Round bookkeeping keys by round number rather than "always last",
so an out-of-order panelist_dim for round 1 arriving after a
round 2 dim does not corrupt the round 2 bucket.
Test coverage: 18 cases covering each transition, the runId guard,
sticky-terminal behaviour, the out-of-order round invariant, and
the stable-identity guarantee. Sets up Phase 7.2 and 7.3 to wire
SSE + replay into the same reducer.
* feat(web): useCritiqueStream hook subscribes to SSE and feeds reducer (Phase 7.2)
createCritiqueEventsConnection is a pure connection manager that
mirrors apps/web/src/providers/project-events.ts: opens an
EventSource at /api/projects/:id/events, listens for every name in
CRITIQUE_SSE_EVENT_NAMES, decodes each frame back into a PanelEvent
(stripping the critique. prefix and merging the data payload), and
hands it to the caller's onEvent. Reconnect uses exponential
backoff (1s → 30s) and resets on `ready`; malformed payloads drop
with a dev-mode warning rather than tearing the stream.
useCritiqueStream wraps the manager in a useReducer that owns the
CritiqueState. enabled=false or a null projectId tears down the
connection cleanly; switching projectId closes the old connection
and opens a fresh one. The returned dispatch lets local UI
synthesise actions (e.g. an Esc keypress firing a synthetic
interrupted while a kill request is in flight); production traffic
comes from the SSE stream.
Test coverage:
- sse.test.ts (10 cases, node env): subscription set covers every
CRITIQUE_SSE_EVENT_NAMES channel; payload decoding lifts the wire
shape back to PanelEvent; malformed JSON is swallowed and does
not stop the stream; exponential backoff schedule and ready-reset
semantics are pinned with a setTimeout seam; close() cancels
pending reconnects and shuts the live source; no-op fallback
when EventSource is unavailable.
- useCritiqueStream.test.tsx (6 cases, jsdom env): idle pre-event,
reducer driven by synthetic actions, no connection when disabled
or projectId is null, clean close on unmount, projectId change
reopens cleanly.
* feat(web): useCritiqueReplay hook drives reducer from transcript file (Phase 7.3)
Fetches the per-run NDJSON transcript (one PanelEvent per line),
parses every line via the shared isPanelEvent predicate, and
dispatches into the same CritiqueState reducer the live SSE stream
uses. A single reducer means the UI rendering a replay can be
identical to the live panel, and a UI mounting both
useCritiqueStream and useCritiqueReplay in parallel does not have
to reconcile two state shapes.
speed knob is `paused | instant | live | { intervalMs: N }`.
- instant flushes every event synchronously, useful for opening a
finished run already at its terminal state.
- intervalMs paces dispatches at a fixed cadence so the reviewer
can watch the run unfold.
- paused parses the transcript but holds events back until the
caller advances speed (consumers can drive a scrubber later).
- live is reserved for the future "playback at original cadence"
feature, currently treated as instant; replay timestamps are not
yet persisted with each event so honest pacing requires a
follow-up Phase 7+ task.
gunzip seam handles `.ndjson.gz` transcripts via
DecompressionStream when present; the production fetch path picks
between text and arrayBuffer based on the URL extension. Both seams
are injectable so the unit tests don't need to spin up a real
network or a real gzip pipeline.
Test coverage (8 cases, jsdom env):
- Idle status before any URL is provided.
- speed=instant flushes the full transcript synchronously to
shipped state.
- speed={intervalMs:N} paces with the setTimeout seam, reaching
done after the last tick.
- speed=paused leaves status=playing with no dispatches.
- Empty transcript reports done with state still idle.
- Fetch rejection surfaces an error status with the message.
- Malformed NDJSON lines are skipped; valid events around them
still land.
- .gz transcripts route through the gunzip seam.
Closes the Phase 7 plan tasks 7.1 / 7.2 / 7.3 (reducer + stream +
replay), all on one branch ready for review. Phases 8+ (Theater
components) consume these from this PR.
* fix(web): close payload-override gap + paused-resume bug in Critique Theater hooks (Phase 7 review)
Two P1 fixes from lefarcen's review on PR #1307:
SSE payload override
`sseToPanelEvent` previously spread `data` after the channel-derived
`type`, so a payload-provided `type` could override the channel and
route a `critique.run_started` frame into the reducer as a `ship`
action. Reversed the spread so the channel-derived `type` is
authoritative, and revalidated the resulting object through the
contracts-level `isPanelEvent` predicate before returning. Frames
that fail validation (missing runId, empty runId, unknown type) are
dropped, so a malformed or compromised SSE frame can no longer
dispatch a wrong-shape action into the reducer.
Three new sse.test.ts cases pin the regression: hostile `type:'ship'`
in the payload still resolves to `run_started`, missing runId is
dropped, empty runId is dropped.
Replay pause/resume
`useCritiqueReplay` had one big effect keyed on `transcriptUrl`
only, so flipping `speed` from `paused` to `instant` never re-fired
and the held events sat undispatched. Split into a parse effect
(depends on URL, fetches and stores events in state) and a pace
effect (depends on parsed-events + speed, owns the cursor + timers).
The playback cursor lives in a ref that survives pause/resume
cycles, so flipping `paused` -> `instant` flushes from the current
position rather than restarting (which would double-dispatch
`run_started` and reset the reducer).
Two new useCritiqueReplay.test.tsx cases:
- paused-then-instant transitions from `playing` to `done` and
reaches the shipped terminal phase
- intervalMs paced playback dispatches one event, pauses to drain
the next scheduled timer, flips to instant, and confirms the
remaining transcript drains exactly once (cursor was preserved)
Doc consistency
The earlier source comment in useCritiqueReplay.ts claimed `live`
"paces by recorded timestamps" while the impl used zero-delay
timers and the PR body said it behaves like `instant`. Aligned to
reality: `live` currently behaves like `{ intervalMs: 0 }` (events
drain on successive microtasks via setTimeoutFn) because transcripts
do not yet carry per-event timestamps. Honest timestamp-driven
pacing is queued as a Phase 7+ follow-up.
Validated: pnpm guard, pnpm --filter @open-design/web typecheck,
Theater suite 47/47 (up from 42, +3 sse + 2 replay), full web suite
96 files / 888 tests.
* feat(i18n): seed Critique Theater key block (en + zh-CN; other locales fall back via spread)
* feat(web): Theater PanelistLane component (Phase 8.1)
* feat(web): Theater ScoreTicker component (Phase 8.2)
* feat(web): Theater RoundDivider component (Phase 8.3)
* feat(web): Theater InterruptButton component with Escape keybind (Phase 8.4)
* feat(web): Theater TheaterDegraded chip (Phase 8.5)
* feat(web): Theater TheaterCollapsed post-run summary (Phase 8.6)
* feat(web): Theater TheaterTranscript replay surface (Phase 8.7)
* feat(web): Theater TheaterStage top-level container (Phase 8.8)
* feat(web): Theater CSS using existing semantic tokens (no hex literals)
* feat(web): Theater public exports barrel
* fix(web): resolve P2 + P3 review feedback on Phase 8 (PR #1314)
Addresses all 4 P2 + 3 P3 items from codex, Siri-Ray, and lefarcen.
State-lifecycle fixes (3 x P2)
1. Reducer learns a synthetic `__reset__` action (`CritiqueResetAction`).
Host hooks dispatch it when their gating prop changes so a stale
run from a prior project / transcript cannot bleed into the next
context. Reset is idempotent on idle (returns the same reference).
2. `useCritiqueStream` dispatches `__reset__` at the top of its
connection effect, so a workspace switch from project A (which
streamed a critique) to project B clears the reducer before the
new EventSource opens. enabled=false also clears.
3. `useCritiqueReplay` dispatches `__reset__` at the top of its
parse effect, so transcriptUrl swaps (including swap-to-null after
a replay reached `shipped`) lift the reducer back to idle before
the new fetch starts.
SSE validation (1 x P2)
4. `sseToPanelEvent` now runs a per-variant `hasValidVariantShape`
check after the cheap `isPanelEvent` predicate. A
`critique.ship` frame missing `composite` / `round` / `status` /
`artifactRef` is rejected before reaching the reducer, so
TheaterCollapsed can no longer crash on `undefined.toFixed(1)`.
Every variant's required fields are validated: run_started
(protocolVersion, non-empty cast, maxRounds, threshold, scale),
panelist_* (round, role, plus variant-specific shape), round_end
(round, composite, mustFix, decision in {continue,ship}, reason),
ship (round, composite, status, artifactRef.{projectId,artifactId},
summary), degraded (reason, adapter), interrupted (bestRound,
composite), failed (cause), parser_warning (kind, position).
Reducer correctness (1 x P2)
5. `panelist_open` now materializes the round + an empty panelist
view (`{dims: [], mustFixes: []}`) so TheaterStage can highlight
the in-progress lane the instant the tag opens. Before this, a
stream that emitted only `panelist_open` after `run_started` left
`rounds = []` and the UI rendered no current round until a later
`panelist_dim` arrived.
Polish (3 x P3)
6. Brand role tint swaps from `var(--magenta, var(--accent))` to
`var(--purple, var(--accent))`. `--purple` is actually defined
across the design systems; `--magenta` is not, so Brand was
silently falling through to `--accent` and looking identical to
Designer.
7. New i18n key `critiqueTheater.interruptedSummary` for the
interrupted-collapse copy ("Interrupted at round N, best
composite X.X"). Previously the interrupted branch reused
`shippedSummary` and the UI read "Shipped at round..." for a run
that specifically did not ship. Native value in en + zh-CN; other
locales fall back via `...en` spread.
8. `TheaterDegraded` heading id comes from `useId()` instead of a
hardcoded `theater-degraded-heading`, so two chips rendered on
the same page (chat history with multiple completed runs) keep
their aria-labelledby references unambiguous.
Tests (15 new cases)
- reducer.test.ts (+5): __reset__ on running/terminal/idle, panelist_open materializes round, panelist_open does not stomp prior panelist data.
- sse.test.ts (+6): variant-level rejection for ship without required fields, degraded without adapter, run_started with empty cast, panelist_dim with non-numeric score, round_end with unknown decision, plus a positive fully-formed ship.
- useCritiqueStream.test.tsx (+2): state reset on projectId change, state reset on enabled flip false.
- useCritiqueReplay.test.tsx (+1): state reset on transcriptUrl swap to null after a replay reached shipped.
- TheaterCollapsed.test.tsx (text-pinning update): asserts the interrupted branch reads "Interrupted at round 1" + "best composite 7.9", and explicitly NOT "Shipped at round...".
- TheaterDegraded.test.tsx (+1): two chips on the same page get unique aria-labelledby ids that each resolve to an `<h3>`.
Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 13 files, 101 tests (was 86 on the first Phase 8 push, +15 new)
- tests/i18n/locales.test.ts 5 of 5 across 18 locales
* feat(web): CritiqueTheaterMount wires SSE + reducer into a single drop-in (Phase 9.1)
* feat(i18n): Critique Theater strings for de + ja + ko + zh-TW (Phase 9.2)
* fix(web): resolve P1 + P2 review feedback on Phase 9 (PR #1315)
Addresses every blocker from codex, Siri-Ray, and lefarcen. The
three state-lifecycle and SSE-validation issues they also flagged
inherit fixes from PR #1314's review pass that this branch now sits
on top of after rebase.
Real daemon kill on Interrupt (P1)
- CritiqueTheaterMount now POSTs to
/api/projects/:id/critique/:runId/interrupt alongside the
optimistic local dispatch. Before this fix, clicking Interrupt
only flipped the React state to interrupted while the daemon job
kept running. The fetch is best-effort: a 404 (endpoint not wired
yet, lands in Phase 15) is swallowed with a dev-mode console.warn
so the UI still moves to the collapsed badge.
- New fetchInterrupt test seam lets RTL assert on the URL / method
and simulate the "daemon not ready yet" path. Two tests pin both:
the happy URL proj-42/critique/run-abc/interrupt POSTs, and a
rejected fetch still flips the UI.
interruptPending reset on new run (P2)
- A ref-backed effect compares the current runId against the last
one we saw; when it changes, interruptPending is cleared. A user
who interrupts run-1 and then triggers run-2 from the same mount
now gets a fresh, enabled kill button instead of one stuck in
"Interrupting…". Pinned by a new mount test.
Escape keybind scope (P2)
- InterruptButton now checks the keydown target. Escape inside an
input, textarea, select, or contenteditable element is ignored
(and any ancestor of those via closest() is treated the same
way). Body-level focus still fires the keybind so the Theater
area's affordance keeps working. Four new tests cover textarea,
input, contenteditable, and the body-focus positive case.
userFacingName i18n key (P2)
- The spec at specs/current/critique-theater.md:6 mandates a single
critiqueTheater.userFacingName key so the "Design Jury" label can
be renamed without touching code. Phase 8 introduced
critiqueTheater.title by mistake; renamed across types.ts, en.ts,
zh-CN.ts, de.ts, ja.ts, ko.ts, zh-TW.ts, and the lone consumer
TheaterStage.tsx. The locale alignment test stays green.
Validated
- pnpm guard clean
- pnpm --filter @open-design/web typecheck clean
- Theater suite: 14 files, 112 tests (was 101 before, +11 new for
the Phase 9 review pass: 3 mount + 4 InterruptButton focus scope;
the rest were already in #1314's review fix).
- tests/i18n/locales.test.ts 5 of 5 across 18 locales.
* feat(daemon): adapter-degraded registry with TTL (Phase 10.1)
In-memory registry recording adapters that produced malformed or
oversize transcripts so the orchestrator can skip them for a TTL
window (default 24h) instead of cycling through known-bad providers
on every run.
Records carry reason (malformed_block | oversize_block |
missing_artifact), source label, and expiresAt. The test-only
clock seam lets the suite advance time deterministically and prove
that an expired entry stops counting as degraded without anyone
calling clearDegraded.
7/7 vitest cases green.
* feat(daemon): synthetic good + bad adapter fixtures (Phase 10.2)
Two test-only adapters that read the existing v1 transcript
fixtures (happy-3-rounds and malformed-unbalanced) and replay them
as either a full string or a 512-byte chunked stream. The chunked
form is what the conformance harness uses to prove the parser
holds together when the transcript arrives in arbitrary network
slices, not as one buffered blob.
* feat(daemon): adapter conformance harness (Phase 10.3)
runAdapterConformance pulls a transcript through the same
parseCritiqueStream pipeline the orchestrator uses and classifies
the outcome as shipped, degraded, or failed. On a degraded
outcome it forwards the matched reason to the adapter-degraded
registry, so a single nightly conformance run is what populates
the skip list rather than the orchestrator learning each adapter
is broken at request time.
5/5 vitest cases green covering shipped, malformed degraded,
oversize degraded, no-ship failure, and the harness-thrown
failure path.
* test(e2e): Critique Theater Playwright suite (Phase 11)
Six tests, one viewport per visual case, deterministic SSE
fixtures stubbed via page.route(). Adds the suite to
test:ui:extended so the existing extended-UI lane picks it up.
Coverage:
1. Happy path: a single mounted theater plays the full
fixture (1 run_started, 5 panelists open / dim / must_fix /
close, 1 round_end, 1 ship) and ends on the score badge.
2. Interrupt mid-run: the panelist that is open at the time
the interrupt button is clicked closes with an interrupted
marker and the transcript freezes there.
3. Visual regression at 375x720 mobile.
4. Visual regression at 768x1024 tablet.
5. Visual regression at 1280x800 desktop.
6. A11y role tree: the theater region exposes a labelled
landmark, each panelist lane is a group with an accessible
name, the score is a status live region.
All SSE traffic is stubbed by page.route so the suite runs in CI
without a daemon. The toggle is seeded via localStorage by
bootAppWithCritiqueEnabled so the gate behaves as if Settings
flipped it on. typecheck clean; playwright --list reports 6.
* test(web): reducer p99 bench at 10k iterations (Phase 13.1)
Locks the documented 2ms budget for the Critique Theater reducer
on a representative SSE script (27 actions, one full happy run)
behind a regression gate. Asserts p99 stays under 4ms (2x the
documented budget) so CI runners with a noisy neighbour do not
flake while a real regression to 20ms or 200ms still trips.
The bench is a vitest case rather than a bare microbenchmark so
it runs in the same CI lane as every other web test and does not
need a parallel runner.
* test(web): critique surface coverage walker (Phase 13.2)
Walks the public critique surface (11 SSE event names, 5 panelist
roles, 6 lifecycle phases, 9 named i18n keys) and asserts each
named symbol appears in both the src corpus and the test corpus.
The walker is the gate that catches a rename in one half of the
codebase without a matching update in the other half: a future
PR that drops 'panelist_must_fix' from the reducer without also
removing its test reference fails this suite.
62 assertions, one per symbol per corpus.
* docs: Critique Theater user guide (Phase 14.1)
Seven sections aimed at end users (not contributors):
1. What is Design Jury
2. How it works (the five panelists, auto-converging rounds,
the composite formula)
3. Settings (the M1 toggle and what it does)
4. Reading the score badge
5. Replay surface
6. Troubleshooting (degraded, interrupted, failed)
7. FAQ
The composite formula is documented as
designer * 0 + critic * 0.4 + brand * 0.2 + a11y * 0.2 + copy * 0.2
because anyone trying to reverse-engineer the score is going to
search for those weights and the docs are the place they should
land first.
* docs(daemon): critique module AGENTS map (Phase 14.2)
Daemon-side wayfinder for the apps/daemon/src/critique directory.
Tables every file, what owns what invariant, and the 'when you
change anything here' guide so a future contributor does not
have to reverse-engineer the rollout resolver before adding a
new SSE event.
* docs(web): Theater module AGENTS map (Phase 14.3)
Web-side mirror of the daemon AGENTS map. Same file table, same
invariants section, same change-impact guide, sized to the
Theater component package.
* feat(daemon): rollout flag resolver (Phase 15.1)
Single decision point every caller consults to know whether the
orchestrator should wire the critique pipeline for a given run.
Priority:
1. Skill-level policy (required wins, opt-out wins inversely)
2. Per-project override from the Settings toggle
3. OD_CRITIQUE_ENABLED env override
4. Rollout phase default
M0 dark-launch false
M1 settings only false (toggle is off until the user flips it)
M2 per-skill true if skill opted in
M3 global default true
OD_CRITIQUE_ROLLOUT_PHASE parser defaults to M0 on unknown input
so a fresh install never surprises a user with the feature on.
10/10 vitest cases green covering every cell of the matrix.
* feat(web): Settings toggle hook for Critique Theater (Phase 15.2)
React hook that reads critiqueTheaterEnabled from the existing
open-design:config localStorage blob and stays in sync via:
- the platform storage event (cross-tab)
- a open-design:critique-theater-toggle CustomEvent (same-tab)
Same-tab event is the one that fires when the Settings panel saves
in the current window: the toggle and every mounted theater update
without a page reload.
setCritiqueTheaterEnabled(next) is the imperative setter the Settings
panel calls. It preserves the rest of the stored config (mode, apiKey,
etc.) and dispatches the same-tab event after the localStorage write.
The web hook reflects what the user toggled; the daemon-side
isCritiqueEnabled is the final routing authority (project override,
env, rollout phase). When they disagree, the daemon wins for backend
gating and the web reflects the toggle state.
6/6 vitest cases green covering first read, stored read, same-tab
event flip, config preservation, corrupted JSON tolerance, and
cross-tab storage event.
* test(web): Phase 15 toggle hook failure-mode coverage (PR #1320)
lefarcen P2 on PR #1320 flagged that the PR body claimed safe
behavior for disabled localStorage, non-object JSON, and missing
CustomEvent shim, but the suite only covered corrupt JSON plus
happy-path storage events. Added four failure-mode tests so the
swallowed errors are not silently traded for a throw in a future
refactor:
1. Returns false on a stored JSON value that parses to an array
(non-object). Catches a regression where the guard treats
anything truthy as a config blob.
2. Returns false on a stored JSON value of literal 'null'.
typeof null === 'object' in JS, so the guard has to check null
explicitly; this test pins that check.
3. Returns false when localStorage.getItem throws (private mode /
disabled storage / SecurityError). The hook must swallow and
return false so the rest of the app keeps rendering.
4. setCritiqueTheaterEnabled still dispatches the same-tab
CustomEvent when localStorage.setItem throws (quota exceeded /
disabled storage). The dispatch path is the in-session
broadcast that keeps every mounted hook coherent even when
persistence is unavailable; verified by mounting two probes
and asserting both flip after the setter is called with a
throwing setItem.
10/10 vitest cases green (6 existing + 4 new).
* fix(web): honor CustomEvent payload in toggle hook listener (PR #1320)
Both Siri-Ray (blocking) and lefarcen (P2 new) caught the same
real bug in the failure-mode test I added in
|
||
|
|
a1859b7f40 |
fix(nix): update pnpmDepsHash for merged lockfile
The merged pnpm-lock.yaml (release/v0.7.0 contents on top of main) has a different hash than either parent. Adopt the value Nix computed in CI. |
||
|
|
f621dbbfea
|
feat(web): Add Tailwind foundation (#1388) | ||
|
|
c61ba320fd
|
feat(nix): Add official flake with home-manager and NixOS support (#402)
* nix: add official flake with home-manager and nixos modules
* Pin pnpm version
* Format README.md
* Populate PATH files to discover installed CLIs
* Revert "Populate PATH files to discover installed CLIs"
This reverts commit 18d88781a88b8781913cf5a8b680dfb38eabf7e4.
* Fix missing sqlite issue
* Fix system issue
* Reapply "Populate PATH files to discover installed CLIs"
This reverts commit
|