open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
kami	333a62cda6	fix: link od bin after fresh install (#2069 ) * fix: link od bin after fresh install * test: lock root od bin shim path * test: cover root workspace deps in postinstall scan * chore(nix): refresh pnpm deps hash	2026-05-31 04:36:49 +00:00
lefarcen	da19ff3ca0	feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241 ) * feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok Drops in a `mocks/` top-level dir that pretends to be the real agent CLIs by streaming pre-recorded sessions in each CLI's native stdout protocol. Zero LLM tokens. ## Use cases - E2E tests in `apps/daemon/tests/` — exercise the full chat-server pipeline against a known trace, assert UI events / artifacts. - Self-validation during dev — iterate on `claude-stream.ts` / `json-event-stream.ts` parser changes without burning provider budget. - Regression harness — replay the same trace before and after a charter / parser change; diff the daemon events the UI surfaces. - Demo / onboarding — show what a 17-tool claude editing session looks like end-to-end, offline. ## How - 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid. - `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` / `_BY_PROMPT_HASH`), streams the trace in the requested format. 
- Each format renderer matches the EXACT JSON shape the OD daemon parser expects, verified line-by-line against `apps/daemon/src/{json-event-stream,claude-stream}.ts`:

| CLI                       | streamFormat              | parser source                              |
| ------------------------- | ------------------------- | ------------------------------------------ |
| `opencode`                | `json-event-stream`       | `handleOpenCodeEvent`                      |
| `codex`                   | `json-event-stream`       | `handleCodexEvent`                         |
| `claude`                  | `claude-stream-json`      | `createClaudeStreamHandler`                |
| `deepseek` `qwen` `grok`  | `plain`                   | `server.ts` (raw stdout)                   |

## Quick start

```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377   # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1

echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```

The mock binary announces the picked trace id on stderr: `[mock-opencode] picked 04097377… via fixed`.

Recording selection (env, in priority order):

- `SYNCLO_EXPLORE_MOCK_TRACE=<id>`: fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt: `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>`: random within `agent:claude` / `skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random - `SYNCLO_EXPLORE_MOCK_SEED=<str>` — reproducible "random" - `SYNCLO_EXPLORE_MOCK_NO_DELAY=1` — skip inter-event waits ## Dataset 179 anonymized Langfuse traces from this project's own production telemetry: - 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 · cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1 - outcomes: succeeded 144 · failed 35 - skills: default 71 · ad-creative 50 · algorithmic-art 30 · agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt / brainstorming / data-report / penpot-flutter-design-source 1 each - 124 multi-turn (sessions with ≥2 turns) - 18 produce `<artifact>` output - ~4.5 MB on disk total Anonymization: `/Users/<name>/` → `${HOME}/`, `C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs → stable `proj-001`, `proj-002`, …. Tool input/output payloads preserved verbatim (templated UI, no cell-level PII). ## Smoke test `bash mocks/scripts/smoke-test.sh` — 6 checks across all 6 agents. All pass on this branch (verified locally): ``` ✓ opencode first event = step_start ✓ codex first event = thread.started ✓ claude first event = system ✓ deepseek emitted plain text (144 chars on first line) ✓ qwen emitted plain text (144 chars on first line) ✓ grok emitted plain text (144 chars on first line) All mock CLIs working. ✅ ``` ## Adding more recordings The exporter that produced this set lives in [nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore) (see `cli/src/local/orchestrator/langfuse-import.ts` + the `local langfuse-import` CLI command). Operators with the Langfuse keys can pull more by tag / outcome / artifact / multi-turn filter, then run `local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`. `mocks/README.md` has the full instructions. 
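The anonymization rules listed in the Dataset notes above (home directories replaced with placeholders, project UUIDs mapped to stable `proj-NNN` aliases) could look roughly like the sketch below. This is not the harvester's code: `makeAnonymizer` is a hypothetical name, and the sketch assumes that every UUID in the text is a project UUID and that paths are not JSON-escaped.

```typescript
// Hypothetical sketch of the anonymization rules; not the real harvester.
// Assumption: every UUID found in the text is a project UUID.
function makeAnonymizer(): (text: string) => string {
  // Remembers UUID -> alias so the same project always gets the same alias.
  const projectIds = new Map<string, string>();
  const uuid = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;

  return (text) =>
    text
      // /Users/<name>/  ->  ${HOME}/
      .replace(/\/Users\/[^/\s"]+\//g, () => "${HOME}/")
      // C:\Users\<name>\  ->  %USERPROFILE%\
      .replace(/C:\\Users\\[^\\\s"]+\\/g, () => "%USERPROFILE%\\")
      // project UUIDs  ->  proj-001, proj-002, ... in first-seen order
      .replace(uuid, (id) => {
        const key = id.toLowerCase();
        let alias = projectIds.get(key);
        if (alias === undefined) {
          alias = `proj-${String(projectIds.size + 1).padStart(3, "0")}`;
          projectIds.set(key, alias);
        }
        return alias;
      });
}
```

Keeping the alias map inside one anonymizer instance is what makes the aliases stable across all lines of a recording; a fresh instance starts numbering again from `proj-001`.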
## Out of scope (follow-ups) - ACP agents (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need a JSON-RPC server on stdio rather than a one-shot stream — separate `format-acp.mjs` module not yet written. - Per-agent json-event-stream variants (`cursor-agent`, `gemini`, `qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer; their parsers are in `apps/daemon/src/json-event-stream.ts` and follow the same template as `format-codex.mjs`. ## AGENTS.md updates - Added `mocks/` to the top-level content directories listing - Added a Validation strategy bullet pointing here for agent-stream / parser changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout - Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary RuntimeAgentDef bin names OD resolves before any fallback. Without these, a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli, kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback). - Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop so coverage is verified end-to-end. - Call .unref() on the 30s safety timeout in format-acp.mjs so a completed ACP session exits promptly instead of waiting the full 30 seconds. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * feat(mocks): add vela (AMR) — login / models / ACP with strict set_model gate Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name `apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`, `streamFormat: 'acp-json-rpc'`). It's richer than the generic ACP agents — covers full login + models + chat-session lifecycle. ### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs) 1. 
`vela login` — writes ~/.amr/config.json with a fake profile (controlKey, runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl). The on-disk projection is what OD's daemon login route + AmrLoginPill poller read; production goes through device-auth, the mock skips straight to the file write. 2. `vela models` — prints the production-shaped public model catalog as newline-separated `public_model_* vela` lines. Override via FAKE_VELA_MODELS env. 3. `vela agent run --runtime opencode` — ACP JSON-RPC server with three vela-specific protocol extensions: a. `initialize` response carries `agentCapabilities` (`promptCapabilities.embeddedContext`) + `models` (`currentModelId` + `availableModels`). b. `session/new` response carries the same `models` block. c. Strict set_model gate: `session/prompt` is rejected with JSON-RPC -32602 ("session/set_model must be called before session/prompt") UNLESS `session/set_model` (or `session/set_config_option`) has been called for the current sessionId. Mirrors real vela 0.0.1 contract; catches regressions in `attachAcpSession` that silently skip set_model. 
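The strict set_model gate in item 3c can be pictured with the minimal sketch below. It is not the code in `format-vela.mjs`: `createGate` and the result payloads are placeholders, and only the -32602 error code, the error message, and the method names come from the description above. The -32601 code is the standard JSON-RPC "method not found" code.

```typescript
// Minimal sketch of a per-session set_model gate; names are hypothetical.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { sessionId?: string };
};
type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number; result: unknown }
  | { jsonrpc: "2.0"; id: number; error: { code: number; message: string } };

function createGate(requireSetModel = true) {
  // sessionIds for which a model has been selected
  const modelSet = new Set<string>();

  return function handle(req: JsonRpcRequest): JsonRpcResponse {
    const sessionId = req.params?.sessionId ?? "";
    switch (req.method) {
      case "session/set_model":
      case "session/set_config_option":
        modelSet.add(sessionId);
        return { jsonrpc: "2.0", id: req.id, result: {} }; // placeholder result
      case "session/prompt":
        if (requireSetModel && !modelSet.has(sessionId)) {
          return {
            jsonrpc: "2.0",
            id: req.id,
            error: {
              code: -32602,
              message: "session/set_model must be called before session/prompt",
            },
          };
        }
        return { jsonrpc: "2.0", id: req.id, result: { ok: true } }; // placeholder result
      default:
        return {
          jsonrpc: "2.0",
          id: req.id,
          error: { code: -32601, message: `method not found: ${req.method}` },
        };
    }
  };
}
```

Passing `false` to `createGate` mirrors the intent of `FAKE_VELA_REQUIRE_SET_MODEL='0'`: prompts are accepted without a prior model selection.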
### Error injection envs (in sync with fake-vela.mjs)

- FAKE_VELA_SESSION_ID - sessionId returned by session/new
- FAKE_VELA_TEXT - override assistant text
- FAKE_VELA_THOUGHT - optional thought_chunk before text
- FAKE_VELA_SESSION_NEW_ERROR - fail session/new
- FAKE_VELA_SET_MODEL_ERROR - fail session/set_model
- FAKE_VELA_PROMPT_ERROR - fail session/prompt
- FAKE_VELA_REQUIRE_SET_MODEL='0' - disable the strict gate (legacy)
- FAKE_VELA_LOGIN_USER_EMAIL - email written into config profile
- FAKE_VELA_LOGIN_USER_PLAN - plan written into config profile
- FAKE_VELA_LOGIN_DELAY_MS - sleep before write (test in-flight)
- FAKE_VELA_LOGIN_FAIL - print + exit 1
- FAKE_VELA_MODELS - override models stdout
- VELA_PROFILE - profile slot (prod | test | local)

### Components

`mocks/lib/format-vela.mjs` (~205 LOC)

- Full ACP server with vela protocol extensions
- Strict set_model gate
- Error injection plumbing

`mocks/lib/vela-subcommands.mjs` (~90 LOC)

- runVelaLogin(): writes ~/.amr/config.json
- runVelaModels(): prints catalog

`mocks/bin/vela`: dispatcher wrapper. Forwards `vela <subcmd>` to mock-agent.mjs which routes to login/models or falls through to ACP.

`mocks/mock-agent.mjs`: parseArgs now collects positionals so the vela dispatcher can read subcommand from there; switch case added for vela.

`mocks/scripts/smoke-test.sh`: +4 assertions:

- vela models prints ≥10 catalog lines
- vela login writes ~/.amr/config.json with the requested email
- vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
- vela strict set_model gate rejects prompt without prior set_model

### Verified locally

✓ vela models printed 15 catalog lines
✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
✓ vela strict set_model gate rejects session/prompt without prior set_model

All 21 smoke checks pass (up from 17 with previous P3 ACP commit).
### AGENTS.md + README updates AGENTS.md — mention `vela (AMR — vela CLI)` alongside ACP agents in the directory listing entry. mocks/README.md — protocol table row + dedicated vela section with subcommand contract, strict gate explanation, env-injection cheat sheet. Mock-tree listing updated. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mocks): honor REPORT_FILE env when --report-file flag not given Harnesses that spawn the mock without translating their report-path contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's orchestrator, which passes REPORT_FILE as env per the existing opencode/claude/codex agent launchers) wouldn't get a report file written, so the harness's "agent exit 0 but produced no report" check would always fire and mark mock runs as failure even though the stdout stream was complete. Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE when --report-file wasn't provided on argv. Each format renderer already accepts opts.reportFile and writes the recording's final assistant text to it (`format-.mjs` already had this — only the wiring was missing). Verified: synclo-explore run with `mock=true, mock_trace=04097377` against the opencode wrapper now produces a plan.md with the recording's 17-tool claude editing session report. ~1.5s per run vs ~70s real opencode. mocks: move recordings to Cloudflare R2; PR→main→Action upload path The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has been moved off git into Cloudflare R2 at the bucket open-design-mocks under recordings/v1/. 
The repo now ships: - mocks/manifest.json — the canonical catalog (renamed from recordings/index.json) with sha256 + storage hints; consumers fetch this to discover what exists, then pull individual jsonl files on demand - mocks/scripts/fetch-recordings.sh — parallel, sha256-verified, idempotent puller for the public r2.dev URL - mocks/scripts/add-recording.sh — local maintainer helper that validates a new .jsonl and copies it into recordings-staging/ (no R2 calls; no credentials needed) - mocks/scripts/upload-to-r2.mjs — called only by the CI workflow - mocks/scripts/lib/manifest-utils.mjs — shared sha256/meta/ rebuild-histograms logic, used by both add-recording (preview) and upload-to-r2 (actual write) so the entry shape never drifts - .github/workflows/sync-mocks-to-r2.yml — fires on push to main when mocks/recordings-staging/ changes; uploads to R2, updates manifest, commits cleanup back; serialized via concurrency group Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a laptop. Read stays public via the r2.dev URL. Why not pnpm install integration: contributors who do not touch agent code do not pay the fetch cost. Fetch happens on first smoke-test run (auto-fallback) or when a mock spawn needs data. Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest + scripts). Smoke test (21 checks) still green against the fetched corpus. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: scope R2 write token to a dedicated secret name Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared CLOUDFLARE_API_TOKEN that landing-page-.yml uses for Pages deploys) so the R2 write capability can be scoped to just the open-design-mocks bucket without bleeding extra capability into the Pages workflows. Also hardcode the powerformer CF account_id directly in the workflow (account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID secret may point at a different account). 
Workflow now fails fast with an actionable error message + dashboard link if the secret is unset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate) wrangler 4.x calls /memberships before any r2 action, requiring user:read scope. R2 "Object Read & Write" tokens deliberately lack that scope (defense in depth — a leaked token should not enumerate account-level resources). The workflow now uses the aws CLI talking straight to the R2 S3-compatible endpoint with SigV4, no membership lookup. Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention). End-to-end tested locally: PUT recording → manifest rebuild → manifest PUT → staging cleanup all green. aws CLI is pre-installed on ubuntu-latest, so no install step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout These mocks were copy-pasted from synclo-explore, where they originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var convention. That brand-bleed is not appropriate in OD: rename the public env surface to OD_MOCKS_* (matching OD-native prefixes like OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS). Renames: SYNCLO_EXPLORE_MOCK_TRACE → OD_MOCKS_TRACE SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH → OD_MOCKS_BY_PROMPT_HASH SYNCLO_EXPLORE_MOCK_POOL → OD_MOCKS_POOL SYNCLO_EXPLORE_MOCK_SEED → OD_MOCKS_SEED SYNCLO_EXPLORE_MOCK_NO_DELAY → OD_MOCKS_NO_DELAY SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR → OD_MOCKS_RECORDINGS_DIR SYNCLO_EXPLORE_MOCK_SMOKE_TRACE → OD_MOCKS_SMOKE_TRACE SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING → OD_MOCKS_ALLOW_LOCAL_UPLOAD Also drop the inline harvester usage from README. 
The harvester is an external CLI in nexu-io/agent-pr-explore — its README is the right place for langfuse-import flags, anonymization options, etc. OD only documents its own staging→PR→Action workflow. Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end verified to route correctly. Consumers of the OLD env names (notably the orchestrator in nexu-io/agent-pr-explore) need a matching rename. No back-compat shim here — the explore side has zero external users today and a one-line follow-up is cleaner than a permanent deprecation layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_) Missed in the prior commit (`a30b868a`) — only grepped mocks/ subdir. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> mocks: drop staging dir + GH Action; back to local-script upload The staging-dir + Action design (added earlier in this PR) had a flaw the user caught: new recordings briefly entered the repo on their way through staging, leaving them in git history forever even after the Action cleanup commit removed them from HEAD. That defeats the whole point of moving recordings to R2. Replace with the simpler local-maintainer flow: bash mocks/scripts/upload-recording.sh /path/to/<trace>.jsonl # → validates, wrangler r2 put, updates manifest.json, wrangler r2 put manifest git add mocks/manifest.json && git commit && git push # → only the ~200B manifest delta enters git The wrangler-OAuth gate replaces the CI secret + Action duo. For a solo / small maintainer team this collapses the trust chain down to "do you have wrangler login to the powerformer account?" — no GH secrets to rotate, no concurrency window to worry about, no inevitable repo-history bloat. 
Deletes: - .github/workflows/sync-mocks-to-r2.yml - mocks/scripts/upload-to-r2.mjs (CI-only) - mocks/scripts/add-recording.sh (staging helper, now obsolete) - mocks/recordings-staging/ (empty dir, never to be repopulated) Adds: - mocks/scripts/upload-recording.sh Kept: - mocks/scripts/fetch-recordings.sh - mocks/scripts/lib/manifest-utils.mjs (still used by upload-recording.sh) - mocks/manifest.json (committed; the only mocks artifact in git) End-to-end tested locally: re-upload an existing recording is idempotent, manifest math is stable, fetch + smoke test still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: address review — guard allowlist + safe ~/.amr + loud OD_MOCKS_TRACE typo Three concrete issues raised across recent Siri-Ray (Looper) review threads on #3241: 1. scripts/guard.ts only allowlisted mocks/lib/ + mocks/mock-agent.mjs, leaving mocks/scripts/lib/manifest-utils.mjs outside the residual- JS guard. Result: Preflight fail on every push. Extend the allowlist to mocks/scripts/ — same precedent as the lib/ entry directly above. 2. mocks/scripts/smoke-test.sh moved the caller real ~/.amr to ~/.amr-smoke-backup, ran vela login (which writes a fake config), then rm -rf the .amr and restored the backup. Two failure modes: crash mid-run loses the user real config, and re-running before restore overwrites the backup with the fake login. Fix: sandbox vela login into a mktemp -d HOME via env (HOME=$amr_sandbox vela login). Never touches the real ~/.amr at all. trap cleans up. 3. mocks/lib/recording-picker.mjs silently fell through to prompt-hash → pool → random when OD_MOCKS_TRACE was set but did not match any recording (typo, prefix too short, corpus not fetched). Tests using a pinned trace would silently get a different trace, hiding regressions. Fix: throw an explicit error with the failing value + a pointer at fetch-recordings.sh. 
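The loud-fail behavior from item 3 can be sketched as follows. This is an illustration under assumptions, not the code in `mocks/lib/recording-picker.mjs`: `pickPinnedTrace` and the `Recording` shape are hypothetical, and taking the first recording whose id starts with the prefix is an assumption about how ambiguous prefixes are handled.

```typescript
// Hypothetical sketch of the "pinned trace must match" rule.
interface Recording {
  id: string;
}

function pickPinnedTrace(
  recordings: Recording[],
  pinned: string | undefined, // value of OD_MOCKS_TRACE, if set
  dir: string,                // recordings directory, used only in the error text
): Recording | undefined {
  // Env unset: the caller continues with prompt-hash, pool, then random.
  if (!pinned) return undefined;

  // Assumption: first id that starts with the given prefix wins.
  const match = recordings.find((r) => r.id.startsWith(pinned));
  if (!match) {
    // Fail loudly instead of silently picking another trace.
    throw new Error(
      `mock-agent: OD_MOCKS_TRACE=${pinned} set but no matching recording in ${dir}` +
        " (run mocks/scripts/fetch-recordings.sh if the corpus is not fetched)",
    );
  }
  return match;
}
```

The point of throwing is that a test pinned to one trace can no longer pass against a different one after a typo.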
Verified locally: pnpm guard prints "Residual JavaScript check passed", smoke-test still 21/21, ~/.amr mtime unchanged after run, typo on OD_MOCKS_TRACE now produces "mock-agent: OD_MOCKS_TRACE=... set but no matching recording in <dir>" on stderr. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fetch-recordings: detect empty filter result before line-counting printf '%s\n' on an empty string emits a single empty line, so the previous TOTAL=$(printf ... \| grep -c "") math returned 1 on an empty $ENTRIES_TSV — a typo like `--agent no-such-agent` printed "Fetching up to 1 recordings", downloaded zero, and exited 0 ("ready"). Check `-z $ENTRIES_TSV` first. Reproduced + fix verified per the reviewer thread. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: address mrcfps review — goldens + provenance + contract check Three durability improvements suggested in the PR #3241 top-level review: ## 1. Golden daemon-event snapshots (mocks/golden/.events.json + apps/daemon/tests/mocks-golden.test.ts) Smoke-test verified that mocks RUN; that catches crashes but not a parser change that semantically reshapes the events the daemon emits. Commit the daemon-event sequence for 3 representative traces: - claude 314d6833 — median-complexity agent-browser session - codex dcdff3b3 — 14-tool refactor - opencode 9a9522ec — 7-tool data-report apps/daemon/tests/mocks-golden.test.ts spawns the mock, feeds stdout through the real createClaudeStreamHandler / createJsonEventStreamHandler, normalizes per-spawn volatile fields (only sessionId today, only on claude), and deep-equals against the committed snapshot. A parser regression fails the test loudly. After an intentional parser change, regenerate: MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden git diff mocks/golden/ # eyeball; commit if shapes match intent ## 2. 
Provenance fields on every manifest entry (mocks/scripts/lib/manifest-utils.mjs + mocks/manifest.json) Augment inspectRecording() to write: captured_at — ISO 8601 from existing meta.timestamp cli_version — null until harvester writes it protocol_version — null until harvester writes it anonymization_version — null until harvester writes it captured_at is now populated for all 179 existing entries from the meta event the harvester already emits. The harvester in nexu-io/agent-pr-explore is the next step for cli_version / protocol_version / anonymization_version — once those are populated, consumers can detect when a recording is older than ~1 minor version behind the live CLI and flag for re-harvest. No matrix of (cli_version × agent) recordings — that explodes maintenance. Just metadata per recording so trust decay is visible. ## 3. Real-CLI contract check (mocks/scripts/contract-check.sh + docs/MOCKS-CONTRACT-CHECK.md) Mocks catch parser regressions against recordings; they do NOT catch recordings drifting away from the live agent CLI as that CLI evolves. The contract check spawns the real CLI alongside the mock with a fixed deterministic prompt + diffs top-level event-type distributions. Deliberately human-driven, not cron-scheduled: - costs real LLM tokens per invocation - requires real CLI auth - maintainer reads the output, not a regex Suggested triggers per doc: real-CLI release notes mentioning "output format" / "stream" / "JSON" / "events"; before a parser refactor; ad-hoc when something looks off. ## Coverage note README updated to position mocks as "deterministic protocol/parser coverage" (not "e2e replacement") per mrcfps framing. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(mocks-golden test): drop import of non-exported ParserKind Use plain string (the type alias is `string` anyway) — Preflight typecheck on `a31fa71a` failed: tests/mocks-golden.test.ts(29,8): error TS2459: Module "../src/json-event-stream.js" declares "ParserKind" locally, but it is not exported. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * recording-picker: structured OD_MOCKS_POOL + hard-fail no-match Siri-Ray review: \`OD_MOCKS_POOL=outcome:failed\` was documented as a supported selection knob, but the matcher only checked tags and \`meta.agent\` — so the negative-path pool found 0 candidates and silently fell through to global random, validating against any recording instead of a failed trace. Fix: - Parse \`<dim>:<value>\` shape and route each dim to the right meta field: \`outcome\` → \`meta.outcome\`, \`agent\` → \`meta.agent\`, \`skill\` → \`tags[]\`. Bare values still fall back to tag substring. - If the env was set and matched nothing, throw with the failing value and a jq one-liner for inspection. Same loud-fail policy as OD_MOCKS_TRACE — silent fallback was the original bug. Verified locally: outcome:failed, agent:codex, skill:agent-browser all route correctly; outcome:nonsense throws the explicit error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * contract-check.sh: fix lost $PROMPT in mock invocation Siri-Ray review on `e576074a`: the mock side wrapped its pipeline in `bash -c "printf %s \"\$PROMPT\" \| ..."` — but $PROMPT was a parent shell variable, not exported, so the child bash expanded it to an empty string. Result: the contract check sent the real prompt to the real CLI and an empty string to the mock, defeating the same-input invariant the whole script rests on. Also let the mock randomly select a different trace whenever a maintainer happens to have OD_MOCKS_BY_PROMPT_HASH=1 in their env. 
Fix: drop the inner bash -c entirely; use a subshell that scopes the PATH overlay and pipes printf into the PATH-resolved mock binary directly. The subshell limits the PATH change without var-passing. Verified locally: with prompt-A the mock picks trace 54ec02ee via hash; prompt-B → 2667e851 via hash; empty prompt (old broken behavior) → random — confirms the prompt is now actually reaching the mock under PATH overlay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 07:17:20 +00:00
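The structured `OD_MOCKS_POOL` matching described in the recording-picker commit above can be sketched as below. This is a hedged illustration, not the shipped matcher: `filterPool` and the `RecordingMeta` shape are hypothetical, skill names are assumed to be stored as bare strings in `tags`, and the real error also carries a jq one-liner that is omitted here.

```typescript
// Hypothetical sketch of <dim>:<value> pool routing with hard-fail on no match.
interface RecordingMeta {
  id: string;
  agent: string;
  outcome: string;
  tags: string[]; // assumption: skill names appear here as bare strings
}

function filterPool(recordings: RecordingMeta[], pool: string): RecordingMeta[] {
  const sep = pool.indexOf(":");
  const dim = sep === -1 ? "" : pool.slice(0, sep);
  const value = sep === -1 ? pool : pool.slice(sep + 1);

  const matches = recordings.filter((r) => {
    switch (dim) {
      case "outcome":
        return r.outcome === value;
      case "agent":
        return r.agent === value;
      case "skill":
        return r.tags.includes(value);
      default:
        // Bare values (and unknown dims) fall back to tag substring matching.
        return r.tags.some((t) => t.includes(pool));
    }
  });

  if (matches.length === 0) {
    // Same loud-fail policy as OD_MOCKS_TRACE: never fall through to global random.
    throw new Error(`mock-agent: OD_MOCKS_POOL=${pool} matched no recordings`);
  }
  return matches;
}
```

A caller would then pick randomly within the returned list, so `outcome:failed` can only ever yield a failed trace.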
lefarcen	df8a0faff6	feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355 ) * feat(runtimes): register AMR (vela) as an ACP stdio agent AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode` speaks ACP JSON-RPC over stdio (see vela's `specs/current/runtime/manual-agent-run-openrouter.md`); per `docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat: 'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc. The new `defs/amr.ts` is the entire wiring — `buildArgs` returns `['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses `detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN` allowlist + install/docs URLs, so users can configure the per-agent env in Settings without leaking into other adapters. Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns the documented `initialize` / `session/new` / `session/set_model` / `session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via `child_process.spawn` and drives a full turn through `attachAcpSession` and `detectAcpModels`, so the ACP transport contract for AMR is end-to-end verified locally even before a real `vela` binary is installed. Validated: - pnpm guard - pnpm typecheck (all workspace projects) - pnpm --filter @open-design/daemon test (2881/2881) Deferred: real OpenRouter-backed turn through a built `vela` binary — the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY` and `VELA_LINK_URL` in env (or Settings). * fix(runtimes/amr): pin a concrete default model and bare openai ids End-to-end validation against a freshly-built `vela` (nexu-io/vela@main) + OpenRouter surfaced two contract details the first AMR runtime def got wrong: 1. 
vela rejects `session/prompt` with `session/set_model must be called before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts skips set_model whenever the picked model is the synthetic 'default' id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The def now ships a concrete `gpt-5.4-mini` as both `fetchModels`' default option and `fallbackModels[0]`, which makes attachAcpSession always send a real `session/set_model` for AMR turns. 2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId it forwards to opencode's openai provider. With OpenRouter-style ids like `openai/gpt-5.4-mini`, opencode receives the double-prefixed `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`. The new fallback list ships the bare ids opencode's openai registry actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.). Stub + tests: - tests/fixtures/fake-vela.mjs now enforces the set_model gate the same way real vela does, so a regression that silently goes back to model: 'default' would surface as a fatal error in tests instead of a hidden production failure. - tests/amr-acp-integration.test.ts pins both contracts: no 'default' / no 'openai/' prefix in fallbackModels, and a negative case that asserts session/prompt fails when no model is set. Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time runner that drives `attachAcpSession` against a real `vela` binary and prints the daemon's chat events, so future protocol drift can be checked against an actual OpenRouter call. Verified locally: `vela agent run --runtime opencode` + OpenRouter returns the prompted string ("AMR-E2E-PASS") through the full daemon pipeline; daemon test suite stays 2883/2883. 
* fix(runtimes/amr): substitute concrete model when chat run sends 'default' A plugin-driven AMR run from the UI surfaced a real-world hole in the prior commit: json-rpc id 3: session/set_model must be called before session/prompt The Default-design-router plugin (and any caller that doesn't pin a real model) sends `model: 'default'` straight through, which the AMR runtime def cannot accept — vela rejects `session/prompt` without `session/set_model` and attachAcpSession skips set_model whenever model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the adapter's `fallbackModels` is not enough: the chat-run handler in server.ts still forwarded 'default' verbatim. This adds `resolveModelForAgent(def, resolved, env?)` as the single source of truth for the substitution: 1. If the caller picked a real id, pass it through. 2. Else, if `def.defaultModelEnvVar` is set and the daemon process env has a non-empty value for it, return that (operator escape hatch — see below). 3. Else, if the def's `fallbackModels` does NOT contain a 'default' id, return `fallbackModels[0].id`. 4. Else, return the original value (the historic shape — defs that list 'default' themselves are untouched). AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when opencode's openai-provider registry deprecates `gpt-5.4-mini` upstream, an operator can swap the fallback id without a code change by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev / od. Worth noting the env var must live in the daemon's `process.env` (Settings-UI per-agent env values only reach the spawned child, not the daemon's resolver) — the new field's docblock spells this out. Coverage: - `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all four resolver branches plus the env-override happy path / fallback / ignore-when-user-picked-a-real-id case. - `pnpm --filter @open-design/daemon typecheck` clean. 
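The four resolver branches listed above can be sketched as follows. This is a minimal illustration under assumptions, not the daemon's implementation: the `AgentDef` shape is reduced to the two fields the description mentions, the literal `'default'` stands in for `DEFAULT_MODEL_OPTION`, and `env` defaults to an empty object here to keep the sketch self-contained (the description says the real resolver reads the daemon's `process.env`).

```typescript
// Hypothetical, reduced shapes; only the fields named in the commit message.
interface ModelOption {
  id: string;
}
interface AgentDef {
  fallbackModels: ModelOption[];
  defaultModelEnvVar?: string;
}

const DEFAULT_MODEL_ID = "default"; // stand-in for DEFAULT_MODEL_OPTION's id

function resolveModelForAgent(
  def: AgentDef,
  resolved: string | null | undefined,
  env: Record<string, string | undefined> = {}, // assumption: real code uses process.env
): string | null | undefined {
  // 1. The caller picked a real id: pass it through.
  if (resolved && resolved !== DEFAULT_MODEL_ID) return resolved;

  // 2. Operator escape hatch: non-empty value in the def's env var.
  const override = def.defaultModelEnvVar
    ? env[def.defaultModelEnvVar]?.trim()
    : undefined;
  if (override) return override;

  // 3. Defs that do not list 'default' get their first concrete fallback id.
  const listsDefault = def.fallbackModels.some((m) => m.id === DEFAULT_MODEL_ID);
  const first = def.fallbackModels[0];
  if (!listsDefault && first) return first.id;

  // 4. Historic shape: defs that list 'default' themselves are untouched.
  return resolved;
}
```

With an AMR-like def whose first fallback is `gpt-5.4-mini`, a run that sends `'default'` resolves to `gpt-5.4-mini`, or to the value of `VELA_DEFAULT_MODEL` when that is set in the supplied env.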
* chore(runtimes/amr): move AMR to the top of the base agent list So `AMR (vela)` shows up first in the agent picker / status views, ahead of claude / codex. Pure ordering change; no behavior delta. * feat(amr): Sign-in / Sign-out button on the AMR Settings card The first half of the AMR work assumed the operator would set VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never surfaced login state to users. This adds the missing UX so a fresh install can drive the full path from Settings: - GET /api/integrations/vela/status reads ~/.vela/config.json for the active profile and returns { loggedIn, profile, user } (without leaking the runtime/control keys themselves). - POST /api/integrations/vela/login spawns `vela login` once (409 if one is already in flight). The vela CLI opens the user's browser to the device-authorization page itself — Open Design only needs to kick the subprocess off. - POST /api/integrations/vela/logout removes ~/.vela/config.json so the next status read returns logged-out. `AmrAgentCard` is a dedicated agent-card component for AMR because the existing `<button>` row can't host an interactive sub-control (nested interactive elements). It polls /status after a login click until the daemon reports loggedIn=true (or 5 minutes elapse), and exposes a Sign-out action on hover. Other adapters (claude, codex, hermes, …) keep their existing `<button>` card. i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.) added to en + zh-CN. Other locales spread `en` and inherit the English copy until translations land. Coverage: - `tests/integrations/vela.test.ts` pins the config.json reader against a tmp HOME — including the negative case where a profile has user info but no runtimeKey (still logged-out), and the secret-leak guard ("rt-secret-" must not appear in the projection payload). 
- `tests/components/AmrAgentCard.test.tsx` covers all four UI states (logged-out, logging-in, logged-in, logging-out) plus the click-propagation invariant the divergent card was built to keep. `pnpm --filter @open-design/daemon test` 2901 / 2901 passing. `pnpm --filter @open-design/web test` 1719 / 1719 passing. `pnpm typecheck` + `pnpm guard` clean. Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs` no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if VELA_PROFILE is set, the vela CLI is allowed to resolve credentials from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to `scripts/guard.ts` allowlist with the executable-fixture / dev-runner rationale. fix(connection-test): substitute model for AMR before attachAcpSession The chat-run path in server.ts already routes the requested model through `resolveModelForAgent` so AMR / vela (whose CLI demands an explicit `session/set_model` before `session/prompt`) gets the def's first concrete fallback id when the chat run ships `model: 'default'`. `connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })` directly, which made the Test Connection button on the AMR Settings card deadlock with the same `session/set_model must be called before session/prompt` error the chat-run path already handles — surfaced as a permanent "Testing connection…" spinner in the UI. Reuse the same helper here so Test Connection mirrors chat-run behavior. * test(amr): three-layer end-to-end coverage for the AMR login + turn flow The PR up to this point shipped runtime + UI code with unit-level Vitest coverage. This commit adds the cross-layer regression net the live demo relied on: 1. 
apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest) Spins up the real daemon Express app via `startServer({port:0,...})`, persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json, and exercises every /api/integrations/vela/* endpoint against the extended fake-vela stub: - status reads ~/.vela/config.json under various states - login spawns the fake, waits for config.json to appear, returns pid + startedAt + profile - 409 already-running guard with the stub's delay knob - logout removes the file (idempotent) - secrets (runtimeKey / controlKey) never leak in the projection - login → status round-trip flips loggedIn=false → true 2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest) Boots a namespaced daemon + web pair through `createSmokeSuite`, inlines a self-contained fake `vela` binary that handles BOTH `vela login` (writes ~/.vela/config.json) and `vela agent run --runtime opencode` (ACP stdio with the `session/set_model must precede session/prompt` gate the real binary enforces), then drives a complete /api/runs lifecycle for `agentId: 'amr', model: 'default'` and asserts the assistant message captures the fake's streamed text. This is the test that would have surfaced today's plugin-default-model regression (the `set_model before prompt` error) at PR time instead of demo time. 3. e2e/ui/amr-login-pill.test.ts (Playwright) Mocks /api/agents + /api/integrations/vela/{status,login,logout} to drive the Settings AMR card through the full Sign in → Signed in → Sign out cycle. Pins the AmrLoginPill polling contract and the aria-label semantics (the pill's accessible name is "Sign out" once logged in, regardless of which label the hover-state text shows). fake-vela.mjs extensions: - Handles `vela login` argv by writing ~/.vela/config.json for the active VELA_PROFILE and exiting 0 — mirrors real vela's on-disk side-effect without the device-auth loop. 
- FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the in-flight state of the spawn lifecycle. - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced user fields end-to-end. Validated: - `pnpm guard` + `pnpm typecheck` (all workspace projects) - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing, including the new 8-test integration suite. - `cd e2e && pnpm test tests/amr`: 1 / 1 passing. - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`: 1 / 1 passing (6.7s). * feat(amr): package native cli and refine login ui * feat(amr): wire vela cli beta packaging * docs(amr): document vela ci packaging review * docs(amr): refine vela ci integration review * fix(ci): refresh nix pnpm dependency hashes * fix(pack): clean up Vela CLI packaging * fix(pack): bundle Vela CLI support files * fix(amr): recover login attempts from stale auth state * test: expand AMR and automations coverage * fix(amr): address review follow-ups * test(web): align tasks fixtures with contracts * fix(daemon): type wildcard route params * fix(ci): refresh PR merge validation * fix(amr): clear env credentials on logout * feat(settings): inline local CLI model configuration * fix(amr): recognize daemon env credentials * [codex] Fix Vela companion packaging (#2979) * Fix Vela companion packaging * Update Nix pnpm dependency hashes * [codex] Surface AMR account failures (#2980) * fix: surface AMR account failures * fix: cover AMR recovery error guidance * chore: bump beta base version to 0.8.1 (#2990) * Fix AMR profile and packaged runtime review issues * Detect packaged AMR OpenCode companion tree * feat(web): polish AMR frontend flows * Polish AMR onboarding card * fix: read AMR login state from dot-amr config (#3048) * test: tighten AMR credential and packaging coverage * test: restore AMR executable test env helper * [codex] Fix packaged mac Dock identity and AMR label (#3076) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR live 
models and dot-amr login state (#3073) * fix: read AMR login state from dot-amr config * fix: load live AMR models before runs * fix: point AMR onboarding link to production wallet * fix: address AMR model review feedback * fix: persist live AMR model fallback * [codex] Fix AMR link catalog model ids (#3088) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR link catalog model ids * Fix AMR model normalization typecheck * Use live AMR model for default runs * fix: polish AMR runtime settings UI * Accelerate AMR startup defaults (#3092) * Surface AMR insufficient balance wallet URL (#3099) * fix(web): polish onboarding controls (#3112) * fix(web): show CLI scan loading state * Avoid duplicate AMR wallet recharge links (#3117) * Avoid duplicate AMR wallet recharge links * Use Vela CLI 0.0.3 test package * chore(nix): refresh pnpm deps hash * Fix AMR wallet guidance display --------- Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com> * chore(pack): pin Vela CLI 0.0.3-test.1 (#3127) * chore(nix): refresh pnpm deps hash * chore(pack): pin Vela CLI 0.0.3 * chore(nix): refresh pnpm deps hash * fix(web): suppress AMR exit 130 fallback (#3136) * feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083) * feat(web): nudge users to hosted AMR on model/auth/quota failures When a non-AMR agent run fails with an auth / quota / upstream model error, surface an inline nudge under the error pill linking to Open Design's hosted AMR gateway (https://open-design.ai/amr). The nudge fires `surface_view` (element=run_failed_toast) on impression and `ui_click` (element=go_amr) on the link. Also teach the daemon to classify CLI-agent auth/quota/upstream failures (Claude Code, codex, ...) into specific API error codes (AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of the generic AGENT_EXECUTION_FAILED, so both the error message and the nudge key off accurate codes. 
AMR's own runs are excluded from the nudge — they keep the dedicated sign-in / recharge affordances. * feat(web): rework failed-run AMR guidance into per-case error UI Replace the single inline nudge with a per-case failed-run experience driven by the run's error code + agent: - The error card is now neutral gray (was red) and always carries a retry button; it is driven by the persisted per-message error event so it survives a reload. - Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion card under the error card offers "switch to AMR & retry" — switches the run to AMR, opens Settings on the AMR card, and auto-retries once the account signs in (ProjectView polls vela login status, independent of the Settings pill lifecycle, with success / 5-min-timeout / unmount exits). - AMR agent unauthorized: clearer copy + an "authorize & retry" button. - AMR agent out of balance: clearer copy + a "top up" button to the AMR wallet, with manual retry. - Settings AMR card: when opened from the nudge, it scrolls into view and pulses, and an authorize-button coachmark (a fake hand cursor that rises in and dismisses on hover) points at the sign-in control when not yet authorized. analytics: surface_view (run_failed_toast) on the promotion card and ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.* and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall back to en) and drops the old chat.amrErrorGuidance keys. * fix(daemon): require status context for numeric service-failure codes Per review on #3083: the model-service classifier matched bare HTTP status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like `line 500`, `read 502 bytes`, or `exit code 401` could be misclassified as a provider outage / auth wall and wrongly surface the AMR nudge. 
Now a status number only counts when it carries explicit context (`HTTP 500`, `status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases (overloaded, bad gateway, service unavailable, rate limit, …) are unchanged. Adds fixtures proving unrelated numeric output stays null. * fix(web): keep error pill for failed runs ChatPane's card doesn't cover Per review on #3083: the per-message gray error pill was suppressed for every persisted error status event, but ChatPane only renders the replacement top-level error card for `retryableAssistantMessage` (the last failed assistant). So a failed turn that is no longer last (after a follow-up) or an older failed run in history showed neither the pill nor the card — its error detail vanished, undercutting reload/history survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose error the card represents); AssistantMessage suppresses only that one pill and keeps rendering StatusPill for all other error events. * fix(daemon): don't treat a process exit code as an HTTP status Follow-up to review on #3083: the status-context helper accepted a bare `code` prefix, so `exit code 401` / `process exited with code 429` still matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the very `exit code 401` case the comment calls out as noise). `code` now only counts when qualified (`status code` / `error code` / `response code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer matches. Adds fixtures for exit-code lines returning null. * chore(web): translate AMR card / error keys for 16 remaining locales PR #3083 added 10 new `chat.amrCard.` / `chat.amrError.` keys but only provided en/zh-CN/zh-TW translations; the other 16 locales fell back to English. Translate the card title/body, three chips, primary CTA, and the AMR self-error (auth / balance) messages and buttons for ar, de, es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk. 
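The status-context rule from the two daemon fixes above can be sketched as follows. This is an illustration only: the function names, the regular expressions, and the status-to-code mapping are assumptions, not the daemon's actual classifier, and the textual provider phrases (overloaded, rate limit, and so on) are left out.

```typescript
type ServiceFailureCode =
  | 'AGENT_AUTH_REQUIRED'
  | 'RATE_LIMITED'
  | 'UPSTREAM_UNAVAILABLE';

// Hypothetical patterns: a 3-digit number only counts with explicit context.
const STATUS_WITH_CONTEXT: RegExp[] = [
  /\bHTTP\s+(\d{3})\b/i, // "HTTP 500"
  /\b(?:status|error|response)\s+code\s+(\d{3})\b/i, // qualified "status code 401"
  /\bstatus[:\s]+(\d{3})\b/i, // "status 503"
  /\bcode:\s*(\d{3})\b/i, // punctuation-bound "code: 401"
  /\b(\d{3})\s+(?:Bad Gateway|Service Unavailable|Too Many Requests|Unauthorized)\b/i,
];

// Assumed mapping from HTTP status to the API error codes named in the commit.
function codeForStatus(status: number): ServiceFailureCode | null {
  if (status === 401) return 'AGENT_AUTH_REQUIRED';
  if (status === 429) return 'RATE_LIMITED';
  if (status >= 500 && status <= 599) return 'UPSTREAM_UNAVAILABLE';
  return null;
}

function classifyServiceFailure(line: string): ServiceFailureCode | null {
  for (const pattern of STATUS_WITH_CONTEXT) {
    const match = pattern.exec(line);
    if (!match) continue;
    const code = codeForStatus(Number(match[1]));
    if (code) return code;
  }
  // Bare numbers ("line 500", "exit code 401") carry no status context.
  return null;
}
```

With these patterns, `HTTP 500`, `status 503`, and `502 Bad Gateway` classify as `UPSTREAM_UNAVAILABLE` and `code: 401` as `AGENT_AUTH_REQUIRED`, while `line 500`, `read 502 bytes`, `exit code 401`, and `process exited with code 429` all stay `null`.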
* fix(amr): address review feedback on #2355 Targeted fixes for the unresolved review threads on #2355. Each fix includes / updates a focused test. - runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now verifies the inner `opencode` executable exists + is runnable, not just the directory. This closes the false-positive availability path that let `detectAgents()` surface AMR as available even when the packaged companion was empty / partially copied (mrcfps, 4 threads). - runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a stale `opencode` on the user's PATH, so packaged AMR builds can't be hijacked by a global installation. - web/EntryShell.tsx: when the Local CLI scan returns an available agent and the previously-selected agent is AMR, switch the selection to the first available local agent so the runtime and persisted agent agree before Continue. - server.ts (model-probe branch): for AMR, check `readVelaLoginStatus` BEFORE rejecting on an empty live-model catalog — a signed-out user was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of the correct `AMR_AUTH_REQUIRED` (sign-in affordance). - server.ts (default model fallback): if the user asked for the AMR agent default and the cached id is no longer in the FRESH catalog, fall back to `liveModels[0]` from the probe instead of rejecting the run as `AMR_MODEL_UNAVAILABLE`. - integrations/vela.ts: route `vela login` through `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat` shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with verbatim args (matches `execAgentFile` / chat-run spawning). - tools/pack/src/linux.ts: in containerized Linux builds, bind-mount the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env to the container-side path. 
The host path was being passed in as-is even though the default container only mounts /project, /tools-pack and cache/home — `copyOptionalVelaCliBinary` saw a missing path. Deferred (out of scope for this PR): - `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked for a separate focused PR. - Strict `--require-vela-cli` for Windows + mac-x64 beta builds: prematurely blocked — `@powerformer/vela-cli` only publishes the `darwin-arm64` platform binary today; adding the flag elsewhere would fail the builds. Revisit once win/x64/linux binaries ship. * fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ) The new signed-out AMR branch in the catalog preflight at server.ts:10875 calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the const declaration sat ~100 lines below at the outer function scope. Because `const` is TDZ-aware, that branch would have thrown `ReferenceError: Cannot access 'sendAmrAccountFailure' before initialization` for the exact users it tries to help — defeating the original intent. Hoist the helper to just above the AMR preflight block so it's available to every AMR code path in this function. Behavior elsewhere is unchanged. Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch uses packaged built-in Vela for AMR` was creating the `<resourceRoot>/bin/libexec/opencode/` companion directory only, but this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree` also requires the inner `opencode` executable. Add it to that fixture to match the new contract; the test was a sibling of the executables / env-and-detection fixtures already updated in `13fc4f4`. Addresses #2355 review (mrcfps, 2026-05-28). 
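The temporal-dead-zone failure mode behind the hoist can be reproduced in a few lines. This is a standalone sketch with hypothetical function names and return values, not the code in server.ts; it assumes the early branch reaches the helper through a closure, which is one way such a call can pass type-checking and still fail at runtime.

```typescript
// Broken ordering: the helper is declared below the branch that needs it.
function preflightBroken(signedOut: boolean): string {
  // The closure compiles, but the `const` binding is still uninitialized here.
  const failSignedOut = (): string => sendFailure('AMR_AUTH_REQUIRED');
  if (signedOut) {
    // On an ES2015+ target this call throws because `sendFailure`
    // is in its temporal dead zone.
    return failSignedOut();
  }
  const sendFailure = (code: string): string => `failed: ${code}`;
  return sendFailure('AMR_MODEL_UNAVAILABLE');
}

// Fixed ordering: hoist the helper above every code path that calls it.
function preflightFixed(signedOut: boolean): string {
  const sendFailure = (code: string): string => `failed: ${code}`;
  if (signedOut) {
    return sendFailure('AMR_AUTH_REQUIRED');
  }
  return sendFailure('AMR_MODEL_UNAVAILABLE');
}
```

Calling `preflightBroken(true)` throws before any failure is reported, while `preflightFixed(true)` returns the intended `AMR_AUTH_REQUIRED` result; the signed-in path behaves the same in both versions.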
* feat(web): add hover cancel for AMR login (#3158) * feat(web): add hover cancel for AMR login * fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR flagged the same race in the new local-cancel path: `handleCancelLogin` dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener unconditionally re-enters `refresh()` and then restarts polling whenever `/api/integrations/vela/status` still reports `loginInFlight: true`. That is a real race because the daemon's `cancelVelaLogin()` only sends SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` = 2000 ms) and keeps the child in `activeLoginProcs` until it actually exits — so the first `/status` read after a successful cancel can legally still come back as in-flight. Under that window the pill flips back to 'Signing in…' and can later surface the timeout/error path even though the user already canceled, defeating the behavior promised in the PR description. Fix the listener instead of every dispatch site: in the `login-canceled` branch, after the local reset (stopPolling + setPending(null) + clear refs), optimistically mark every subscribed pill instance as not-in-flight (`setStatus((c) => c ? { ...c, loginInFlight: false } : c)`) and `return` — skip the refresh-and-reconcile branch below entirely. The next explicit refresh (component mount, user interaction, or a `status-changed` event) will pick up the daemon's confirmed state once the child has actually exited. Add a focused regression test that holds `/api/integrations/vela/status` at `loginInFlight: true` even after a successful `/login/cancel`, asserting that the pill stays at the Canceled → Authorize sequence and never bounces back to 'Signing in…'. 
This test fails on the pre-fix listener and passes on the new behavior; existing 'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser completion to Signed in after local cancel' tests continue to pass. Addresses review feedback on #3158 (chatgpt-codex-connector, nettee). --------- Co-authored-by: lefarcen <935902669@qq.com> --------- Co-authored-by: a1chzt <chizblank@gmail.com> Co-authored-by: Amy <1184569493@qq.com> Co-authored-by: Mason <jinmeihong0201@gmail.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-28 05:09:55 +00:00
Marc Chan	125dcd0174	fix(ci): run fork visual reports from trusted code (#2935) * fix: run fork visual reports from trusted code * fix: auto-approve strict web visual capture * fix: address visual report review feedback Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: propagate visual report storage failures Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: validate PR screenshots before upload Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: validate visual PR identity before comment * fix: harden fork visual report validation Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: address remaining fork visual report review feedback Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: handle stale fork visual report lookup Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix: allow stale fork visual report fallback Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)	2026-05-26 06:17:04 +00:00
Jiannanya	f4af51d550	fix(skills): update d3-visualization skill upstream to snow-d3 and expand skill metadata (#1981) * update d3 visualization skill * update d3 skill info * fix(skills/d3): align seed triggers and clone path with SKILL.md - Add 'd3 scroll' to the d3-visualization triggers array in seed-curated-design-skills.ts so it matches the 16 triggers already present in skills/d3-visualization/SKILL.md. - Change `git clone` target from `.` to `skills/snow-d3` so the install command produces the path described by the prose.	2026-05-25 10:58:23 +00:00
PerishFire	34165ff189	chore: retire tools-pr (#2867)	2026-05-25 05:15:04 +00:00
Marc Chan	e70d5bdd3f	fix(ci): disambiguate fork PR workflow auto-approval (#2857) Fork PR workflow runs can arrive without GitHub PR associations, which left ci and visual checks stuck awaiting manual approval even for allowlisted changes. Fall back to a single open PR match on head repo/ref/SHA so the approver stays conservative while still unblocking low-risk fork PR workflows.	2026-05-25 03:44:33 +00:00
icc	587f6de46d	fix: restore Atelier Zero deck plugin prompt (#2822) Co-authored-by: icc <iccccccccccccc@users.noreply.github.com>	2026-05-24 14:20:25 +00:00
Marc Chan	a04c2c8f3f	fix(ci): broaden fork PR workflow auto-approval (#2788)	2026-05-23 13:00:43 +08:00
lefarcen	c14baf07d3	Merge origin/main into release/v0.8.0 PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits on top of 58 release-side commits accumulated during the 0.8.0 cycle. Resolution summary: Take main (theirs) where main carried deliberate forward progress: - apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration: hardcoded English aria-labels/titles replaced with t() calls keyed on pluginCard.* (all 8 keys verified present in en.ts). - apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion feature: sortedRoutines (newest-first), sourceIngestionTemplates, patchSourceForm, submitSourceIngestion. activeCount/pausedCount semantics preserved (now keyed on sortedRoutines, count unchanged). - e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts imports needed by main-side test helpers. - e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal helper block added by main. Keep both sides where each added a different field to the same object literal: - apps/web/src/components/ProjectView.tsx (locale + analyticsHints spread). - apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints). Take release (ours) where release carried deliberate work that ships 0.8.0: - CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's Unreleased section was the same body of work, now finalized. - apps/landing-page/public/{apple-touch-icon,favicon}.png + apps/web/public/app-icon.svg — release-side visual refresh assets consistent with 0.8.0 stable ship. - tools/pack/src/linux.ts — packageVersion const required by line 466; taking main's empty line would build-error. - e2e/ui/project-management-flows.test.ts + e2e/ui/settings-api-protocol.test.ts + e2e/ui/settings-memory-routines.test.ts — release-side release-smoke hardening (shangxinyu1 + PerishFire) takes precedence on overlap. Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.	2026-05-23 12:17:18 +08:00
Marc Chan	6592d638ce	ci: gate fork PR workflow auto-approval (#2683 ) * ci: gate fork PR workflow auto-approval * ci: rename fork PR approval workflow * ci: normalize fork workflow paths Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): match action_required workflow runs Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): denylist tool config paths Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): retry action_required workflow lookup Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): restrict fork workflow approvals to target PR Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): keep polling fork workflow approvals Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): revalidate fork workflow approvals before approving Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): poll longer for first fork approval run Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): make fork approval poll budget configurable Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): drop stale fork approval runs Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): deny dotted tsconfig variants in fork approvals Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): run fork approval regression in guard Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): refresh Nix pnpm deps hash Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * test(web): mock useI18n in reattach restore test Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): accept status-only fork approvals Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): rerun fork approval on retarget Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): ignore base tip churn in PR association Generated-By: looper 0.0.0-dev 
(runner=fixer, agent=opencode) * fix(ci): broaden pending approval run fetch Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): skip non-retarget fork approval edits Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): checkout visual comment workflow head Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): paginate workflow approval run lookup Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): harden fork workflow follow-ups Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): honor full post-appearance settling window Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): validate manual visual comment checkout Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-23 11:48:36 +08:00
Marc Chan	a5b47c5f76	fix(ci): narrow workflow scope and reuse setup steps (#2708) * fix(ci): narrow workflow scope and reuse setup steps * fix(ci): narrow workflow scope and reuse setup steps Repair Nix fixed-output hashes for the filtered daemon and web source trees. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): narrow workflow scope and reuse setup steps Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): narrow workflow scope and reuse setup steps Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix(ci): repair daemon and nix checks Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-22 18:58:53 +08:00
PerishFire	b4e94b0534	Harden packaged updater downloads and install handoff (#2677) * Add managed download package for updater resumes * fix(download): clear stale pid locks * test(e2e): harden windows updater resume smoke * feat(updater): make update downloads silent in ui * fix(updater): keep install handoff prompt visible * fix(ci): build platform before download in postinstall	2026-05-22 15:44:28 +08:00
PerishFire	526c7f7c26	Fix packaged auto-update release validation (#2565) * fix: tighten packaged updater flow * test: prune noisy extended ui coverage * fix: hide unpublished release artifacts * test: validate release updater channels * fix: align prerelease release namespaces	2026-05-21 18:15:53 +08:00
Marc Chan	10192dcc52	fix(ci): catch nix hash drift before merge (#2530) * fix(ci): catch nix hash drift before merge * fix(nix): add pnpm hash refresh helper * chore(nix): drop redundant hash alias * fix(nix): raise update-hash output buffer Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(nix): handle current pnpm deps hash Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(nix): reject non-mismatch hash updates Generated-By: looper 0.8.1 (runner=fixer, agent=opencode)	2026-05-21 16:08:13 +08:00
Eli-tangerine	8193981511	Keep PR 2400 changes without folder pickers (#2462 ) * feat(daemon): add project working directory management and editor hand-off functionality - Introduced new flags for project commands to manage working directories, including `--working-dir` and `--dir`. - Implemented API routes for listing available editors and opening projects in selected editors. - Added a hand-off button in the ChatPane header to facilitate opening project folders in local applications. - Enhanced the HomeHero component to include working directory and design system settings, improving user experience in project creation. - Created HomeHeroSettingsChips component for inline management of working directory and design system selection. * feat(chat): implement voice transcription proxy and enhance UI components - Added a new API route for voice transcription using OpenAI's `/audio/transcriptions` endpoint, allowing users to send audio blobs directly for transcription. - Integrated multer for handling audio file uploads in memory, ensuring efficient processing without disk storage. - Updated the HomeHero component to include example prompt suggestions for plugins, enhancing user interaction. - Introduced the EditorIcon component to visually represent different editors in the hand-off menu, improving the user experience. - Refined the HandoffButton component to utilize the new EditorIcon, providing a more cohesive interface for selecting editors. - Enhanced CSS styles for various components to improve layout and responsiveness, including adjustments to tab and button sizes for better usability. * style(workspace-shell): enhance layout and overflow handling - Updated CSS for .workspace-shell to ensure full viewport width and height, with proper overflow management. - Adjusted grid layout to prevent content overflow and maintain responsiveness. - Modified styles for .workspace-tabs-chrome to improve width handling and prevent overflow issues. 
* refactor(chat): remove voice transcription proxy and related components - Deleted the voice transcription proxy implementation, including the associated API route and multer configuration. - Removed the MicButton component from the ChatComposer and HomeHero components to streamline the UI. - Updated HomeHero to include example suggestions without the voice input functionality. - Adjusted CSS styles for various components to maintain layout consistency after the removal of the MicButton. * feat(daemon): implement minting of HMAC tokens for working directory management - Added a new function `mintImportTokenFromCurrentSecret` to generate HMAC tokens bound to a specified base directory, enhancing security for working directory operations. - Updated the `desktop-auth.ts` file to include the new token minting functionality, which returns structured errors when the desktop auth secret is cleared. - Introduced new IPC message types for minting import tokens in the sidecar protocol, allowing seamless integration with the daemon's working directory management. - Enhanced the `WorkingDirPill` component to utilize the new token minting flow for secure directory selection in desktop builds. - Updated CSS styles for the HomeHero component to accommodate new example suggestion features and maintain layout consistency. * fix(HomeView): import HOME_HERO_CHIPS constant for improved chip management - Updated the HomeView component to import the HOME_HERO_CHIPS constant from the chips module, enhancing the management of hero chips within the component. * feat(daemon): implement mintImportTokenViaSidecar for secure working directory management - Introduced the `mintImportTokenViaSidecar` function to facilitate the minting of HMAC tokens for desktop-import operations via the daemon's sidecar IPC. This allows CLI commands to bypass authentication when the desktop-auth gate is active. 
- Updated the CLI to utilize the new token minting function when setting the working directory, ensuring secure access to trust-gated API endpoints. - Enhanced the sidecar server to handle minting requests and return structured error messages for improved user feedback. - Added tests to validate the new token minting functionality and its integration with the working directory management process. - Refactored related components to support the new token flow, improving overall security and user experience. * feat(HomeHero): enhance UI components and styles for improved user experience - Updated HomeHero component to replace active dot indicators with Plug icons for better visual representation of active plugins. - Adjusted CSS styles for various elements, including padding and dimensions, to enhance layout consistency and responsiveness. - Introduced new styles for active type icons and improved hover effects for buttons. - Updated HomeHeroSettingsChips to change button titles and icons for clarity. - Added tests to ensure proper rendering and functionality of updated components. * feat(ProjectDesignSystemPicker): enhance design system selection with preview functionality - Updated the ProjectDesignSystemPicker component to include a preview feature for design systems, allowing users to see a preview of the selected design system. - Implemented hover functionality to update the preview based on the hovered design system. - Added fullscreen preview capability for a more immersive experience. - Enhanced CSS styles for the design system picker to improve layout and responsiveness. - Introduced tests to validate the new preview functionality and ensure proper interaction within the component. * feat: refactor project metadata handling and enhance design system picker - Updated the default scenario plugin ID retrieval to use project metadata, improving the logic for determining the appropriate plugin based on project intent. 
- Enhanced the ProjectDesignSystemPicker and related components to support localized design system summaries and categories, improving user experience. - Introduced new translations for working directory and design system picker components, ensuring better accessibility and usability across different locales. - Added a new 'live-artifact' project type to the HomeHero chips, expanding the functionality for users creating refreshable artifacts. - Updated tests to validate the new project metadata handling and design system picker functionalities. * feat: enhance localization and styling for design system components - Added French translations for working directory and design system picker components, improving accessibility for French-speaking users. - Updated CSS styles for the pet task item to ensure consistent padding and layout. - Introduced a new test suite for HomeHeroSettingsChips to validate localization and design system selection functionality. - Enhanced ProjectDesignSystemPicker tests to ensure proper localization and interaction with design system categories. * fix: update .gitignore to include all claude-sessions directories and remove specific session files - Modified .gitignore to ensure all claude-sessions directories are ignored by using a wildcard pattern. - Deleted two specific claude-sessions markdown files to clean up unnecessary session data. * fix: repair home automation ci regressions * fix: stabilize artifact consistency e2e * Remove folder picker changes from PR 2400 --------- Co-authored-by: pftom <1043269994@qq.com> Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-05-20 22:07:30 +08:00
Marc Chan	f294ab4915	chore(ci): add visual regression PR workflow (#2372) * Add visual regression PR workflow * Allow manual visual PR comments * Post visual comments for same-repo PRs * fix(ci): surface R2 lookup failures in visual report Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * Align visual workflow names	2026-05-20 15:05:59 +08:00
lefarcen	80d305858b	feat(diagnostics): add one-click log export from Settings → About (#798 ) * feat(diagnostics): add one-click log export from Settings → About Adds a new "Export diagnostics" entry under the About section that bundles daemon/web/desktop logs, machine info, and recent macOS crash reports into a zip the user can share when reporting issues. - Browser hits a new daemon HTTP endpoint and triggers a download. - Electron uses an IPC bridge with the native save dialog and reveals the saved file in Finder/Explorer; the Help menu also exposes it as a fallback when the daemon is unresponsive. Packaging + redaction lives in a new @open-design/diagnostics package so both surfaces share it. Sensitive JSON keys, URL query secrets, and the current user's home path are redacted before packaging. * build(nix): include packages/diagnostics in daemon build targets The Nix daemon derivation builds workspace siblings in dependency order before compiling apps/daemon. Without @open-design/diagnostics in that list, the daemon TypeScript build fails inside the Nix sandbox with `Cannot find module '@open-design/diagnostics'` because pnpm install only creates the symlink — the dist output that the package.json exports point at isn't produced until each sibling's build script runs. * build(tools-pack): include @open-design/diagnostics in packaged INTERNAL_PACKAGES Without this, packaged win/mac/linux builds fail with `npm error 404` when the post-build `npm install --omit=dev --no-package-lock` step in the assembled app tries to resolve `@open-design/diagnostics@0.2.0` from the public npm registry. The package is workspace-private, so it has to be tarballed via `pnpm pack` and file:-referenced from the assembled package.json like every other internal workspace dep that daemon/desktop depend on. Also wires the package's `pnpm --filter ... 
build` into the pre-pack workspace build step so the dist/ exists before pnpm pack runs, and updates the two test fixtures (`win-app.test.ts`, `workspace-build.test.ts`) that mirror INTERNAL_PACKAGES. The diagnostics package itself is repinned to exact dependency versions already used elsewhere in the workspace (`jszip 3.10.1`, `@types/node 20.19.39`, `esbuild 0.28.0`, `typescript 5.9.3`, `vitest 4.1.6`) so it passes the new `pnpm guard` exact-version rule and produces a minimal lockfile diff vs main (additions only, no resolution-string churn). * fix(diagnostics): include `~` in bearer-token redaction char class RFC 6750 token68 syntax allows `~`, so tokens like `Authorization: Bearer abcd~efgh` were only partially matched by `HTTP_AUTH_SCHEME_RE`. The regex stopped at the first `~`, leaving the tail (`~efgh`) un-redacted in the exported diagnostics zip — a clear leak since this feature explicitly generates support bundles for external sharing. Add `~` to the character class and a regression test. * fix(diagnostics): only collect renderer.log from desktop `buildSidecarLogSources` unconditionally added `logs/${app}/renderer.log` for daemon/web/desktop, but only the desktop runtime writes a renderer log (see apps/desktop/src/main/runtime.ts) — daemon and web are pure Node services with no Electron renderer. Every export therefore produced missing-file placeholders and manifest warnings for the two phantom paths, polluting the bundle. Gate the renderer.log source on APP_KEYS.DESKTOP so the daemon-side collector matches the desktop-side collector in apps/desktop/src/main/ diagnostics.ts:63. * fix(diagnostics): mirror desktop-side renderer.log gate The previous fix only updated the daemon-side `buildSidecarLogSources` in `apps/daemon/src/diagnostics-export.ts`. 
The desktop-side collector at `apps/desktop/src/main/diagnostics.ts` had an identical copy of the same bug that I overlooked: it also unconditionally added `logs/${appKey}/renderer.log` for daemon/web/desktop, producing missing-file placeholders + manifest warnings for the two phantom paths on every desktop-initiated export. Apply the same `appKey === APP_KEYS.DESKTOP` gate here so both export entry points (browser via daemon HTTP, Electron via native save dialog) emit the same clean manifest. * feat(diagnostics): add `od diagnostics export` CLI subcommand AGENTS.md's dual-track capability-exposure contract requires every user-facing feature to ship on both the web UI and the `od` CLI. The diagnostics export was only reachable through Settings → About and the desktop Help menu; this commit closes the loop with an `od diagnostics export [<path>] [--json]` subcommand registered in SUBCOMMAND_MAP. The CLI is a thin shell over the existing GET /api/diagnostics/export endpoint — same zip output, same redaction, same crash-report scope. Defaults to writing `open-design-diagnostics-<timestamp>.zip` in the current directory; `--output <path>` or a positional arg overrides. `--json` prints `{path, sizeBytes}` for shell pipelines. Use cases this unlocks: - A CI script can `od diagnostics export ~/artifacts/bundle.zip` after a failed run. - Bug reporters on headless boxes can grab a bundle without booting the web UI. - `od doctor` follow-ups can collect a full snapshot when a probe fails. * fix(diagnostics): surface non-sidecar launch in manifest warnings `buildSidecarLogSources()` returns `[]` when the daemon has no sidecar runtime context, which is the standard `od` (plain) launch path — `runDaemonCliStartup()` -> `startDaemonRuntime()` does not pass a runtime. 
Settings → About and the new `od diagnostics export` previously reported success but produced a bundle with only the summary JSONs, so operators could not tell "no logs because plain launch" from "no logs because something genuinely broke." - Extend `DiagnosticsContext` with an optional upstream `warnings: string[]` that `buildManifest` merges into the manifest warnings. - Emit STANDALONE_LAUNCH_WARNING from the daemon handler when `options.runtime == null`. The warning names the limitation and points the user at the sidecar entry points that DO capture logs. - Add a regression spec at `apps/daemon/tests/diagnostics-export.test.ts` that drives the handler with `runtime: null` and asserts the warning surfaces in `summary/manifest.json` (and that `files` is empty so a user reading the bundle does not confuse "no log sources" with "missing files").	2026-05-20 09:10:51 +08:00
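The `~` redaction fix described in 80d305858b can be illustrated with a minimal sketch. The two regexes and the `redactBearer` helper below are hypothetical stand-ins, not the project's actual `HTTP_AUTH_SCHEME_RE`; they assume only the RFC 6750 token character set mentioned in the commit message.

```typescript
// Hypothetical illustration only; the real HTTP_AUTH_SCHEME_RE in
// @open-design/diagnostics may differ in shape and scope.

// Character class without `~`: matching stops at the first tilde.
const BEARER_WITHOUT_TILDE = /\b(Bearer)\s+[A-Za-z0-9\-._+\/]+=*/gi;

// Character class covering the RFC 6750 token characters, including `~`.
const BEARER_WITH_TILDE = /\b(Bearer)\s+[A-Za-z0-9\-._~+\/]+=*/gi;

// Replace the credential part of every match, keeping the scheme name.
function redactBearer(text: string, pattern: RegExp): string {
  return text.replace(pattern, "$1 [REDACTED]");
}

const header = "Authorization: Bearer abcd~efgh";
const leaky = redactBearer(header, BEARER_WITHOUT_TILDE); // tail "~efgh" survives
const fixed = redactBearer(header, BEARER_WITH_TILDE);    // whole token is replaced
```

With the narrower class the tail after the tilde stays in the output, which is the leak the commit closes; widening the class by one character is enough to cover the whole token.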
PerishFire	2c128e0e91	refactor desktop host bridge (#2246)	2026-05-19 18:27:05 +08:00
chaoxiaoche	6a08dfe111	Add design system package quality guard (#2224) * Add design system import manifest schema * Generate hybrid design system imports * Read design system usage and cached manifests * Add design system pull-file tool * Show design system package evidence * Wire design system import semantics * Add design system package quality guard --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-05-19 16:53:29 +08:00
PerishFire	bd48c597b0	chore: pin dependency versions and harden CI caches (#2189) * chore: pin dependency versions * ci: enforce pinned dependency specs * ci: fix pnpm executable invocation	2026-05-19 13:58:27 +08:00
PerishFire	4424f08be0	[codex] Add packaged desktop auto-update (#1375) * Add packaged desktop auto-update * Handle counted beta nightly update versions * Refresh desktop auto-update branch for main * Serialize desktop updater operations * Refresh auto-update branch for packaged paths	2026-05-19 11:20:05 +08:00
chaoxiaoche	f7eb82d7a5	feat(design-systems): import design system projects (#2112) * feat(design-systems): define project manifest contract * feat(design-systems): add default project manifest * feat(daemon): consume design system manifests * feat(design-systems): import local project systems * feat(design-systems): import from github repositories --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-05-18 20:20:38 +08:00
chaoxiaoche	1f66c53203	feat(daemon): consume component manifests (#2053) * feat(design-systems): extract component manifests * feat(daemon): consume component manifests --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-05-18 16:50:52 +08:00
chaoxiaoche	46a64edce3	feat(design-systems): extract component manifests (#2051) Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-05-18 16:48:59 +08:00
chaoxiaoche	d1a2f9f07e	chore(design-systems): report component fixture coverage (#2049) Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-05-18 16:47:34 +08:00
asim48-ctrl	08d6bc73ac	Clarify ui-ux-pro-max catalog scope (#1960)	2026-05-18 14:02:33 +08:00
leessju	3944ef34b8	feat(scripts): add informational i18n coverage report (#1896 ) Adds `scripts/i18n-coverage-report.ts` (wired via `pnpm i18n:coverage`) that reports per-locale key-coverage drift against English. The output looks like: Locale (key total = 1578 on en) en English keys=1578 missing= 0 ... zh-CN 简体中文 keys=1578 missing= 0 coverage=100% th ภาษาไทย keys=1261 missing= 317 coverage=80% ... Why this is useful right now: - The existing `scripts/i18n-check.ts` validates structural consistency (locale registration, README switcher alignment, core doc link references) and exits non-zero on failure. It does not surface content-coverage drift. - The test suite enforces strict English-parity only for Indonesian (`id`) — other locales pass CI even when an English key was added without a matching translation. Today the average locale is missing ~140–300 keys vs English on `main`. This script is purely informational — exit code stays 0. It gives contributors and release managers a fast way to see which locale needs translation work without breaking PRs until the locale catches up. Issue #1894 covers the policy question of whether strict enforcement should extend beyond Indonesian (e.g., to a tier-1 locale set); this script is the substrate for that policy conversation, not a substitute for it. Refs #1894. Validation: - `pnpm i18n:coverage` runs end-to-end against the 19 locale files under `apps/web/src/i18n/locales/`, exit 0. - `pnpm guard` green. - `pnpm exec tsc -p scripts/tsconfig.json --noEmit` green. Co-authored-by: nicejames <nicejames@gmail.com>	2026-05-16 16:06:25 +08:00
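The per-locale numbers in the sample output of 3944ef34b8 can be reproduced with a small sketch. This is not the actual `scripts/i18n-coverage-report.ts`; the `coverageAgainstEnglish` helper, its types, and the rounding rule are assumptions chosen so that 1261 present keys out of 1578 report as 80%, as in the commit message.

```typescript
// Hypothetical sketch; names, types, and rounding are assumptions and
// do not come from the real coverage script.
type Dictionary = Record<string, string>;

interface CoverageRow {
  locale: string;
  keys: number;            // keys present in the target locale
  missing: number;         // English keys absent from the target locale
  coveragePercent: number; // share of English keys that are translated
}

function coverageAgainstEnglish(
  locale: string,
  en: Dictionary,
  target: Dictionary,
): CoverageRow {
  const englishKeys = Object.keys(en);
  const missing = englishKeys.filter(
    (key) => !Object.prototype.hasOwnProperty.call(target, key),
  ).length;
  const total = englishKeys.length;
  // An empty English dictionary is treated as fully covered.
  const coveragePercent =
    total === 0 ? 100 : Math.round(((total - missing) / total) * 100);
  return { locale, keys: Object.keys(target).length, missing, coveragePercent };
}

const en: Dictionary = { a: "A", b: "B", c: "C", d: "D", e: "E" };
const th: Dictionary = { a: "ก", b: "ข", c: "ค", d: "ง" };
const row = coverageAgainstEnglish("th", en, th);
```

Because the report is informational, a caller would print each row and leave the exit code at 0 regardless of the missing count.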
lefarcen	22a3b99a47	Merge origin/main into preview/v0.8.0 Sync 49 commits from main. Conflicts resolved: - .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's linux specs + release-stable.yml + release-preview.yml triggers - .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder - apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText - apps/web/src/components/ChatPane.tsx: kept both new imports - apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks - e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's inline dialog navigation (UI was redesigned in v0.8.0) - nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh	2026-05-15 18:23:33 +08:00
PerishCode	883598f556	Build registry protocol in packaged workspaces	2026-05-14 21:23:45 +08:00
PerishCode	4f15c33595	Merge remote-tracking branch 'origin/preview/0.8.0' into preview/v0.8.0	2026-05-14 21:10:03 +08:00
정수현	63baff5222	fix(skills): repoint coreyhaines31 upstream URLs to marketingskills (#1659) The upstream repo github.com/coreyhaines31/skills was renamed to github.com/coreyhaines31/marketingskills, so the four curated marketing-creative stubs (ad-creative, copywriting, marketing-psychology, paywall-upgrade-cro) advertised a source URL that now 404s. Update od.upstream and the body source/open links in all four SKILL.md stubs, plus the matching entries in the seed script so re-seeding stays consistent.	2026-05-14 20:10:14 +08:00
PerishCode	7ea77cf8b1	Build plugin runtime during postinstall	2026-05-14 19:19:28 +08:00
PerishCode	43b1b94c8e	Add preview release channel	2026-05-14 19:15:16 +08:00
PerishCode	cba8bf151d	chore: align namespace lifecycle packaging	2026-05-14 16:35:46 +08:00
lefarcen	b268bbe169	Merge origin/garnet-hemisphere (post-9e196d34) — Use Plugin handoff fix Brings in 11 new garnet commits, most importantly: - `1a90aef4` feat(plugin-use): implement plugin use handoff functionality — fixes the bug QA reported where /plugins Use Plugin would 422 silently for template plugins; new flow hands off to HomeView with the plugin pre-bound + input form prompted there. - `2ac58544` feat(plugin-inputs): enhance plugin input handling with file upload support — extends PluginInputsForm for file uploads. - `3b167b69` feat(plugins): registry protocol — new @open-design/registry-protocol workspace package (needs build before daemon boot). - Plus enhancements to plugin metadata, GitHub installer, plugin detail view, login/whoami, static HTML preview paths. Conflicts resolved: - packages/contracts/src/api/projects.ts: HEAD's skipDiscoveryBrief field + garnet's contextPlugins (@-mention plugin context refs) both kept on ProjectMetadata. - apps/landing-page/* (3 files): accepted HEAD — garnet had the older single-page landing-page header; main has the multi-page layout (/skills/, /systems/, /templates/, /craft/) with dynamic counts. Not related to the Use Plugin core fix. New @open-design/registry-protocol package must be built before daemon boots; pnpm install does this via postinstall already.	2026-05-14 16:32:35 +08:00
lefarcen	6c16283850	Merge origin/main (post-7c8305f4) into reconcile branch Brings in 10 new main commits: routine deep-link to specific conversations (#1508), Windows resource cache fix for Orbit templates, collapsible comment side panel (#1607), routines project radio polish, Copilot logo swap, and minor UI fixes. Conflicts resolved: - router.ts: garnet's home/view + marketplace routes + main's per-project conversationId deep-link field coexist on Route union - ProjectView.tsx: garnet's isPhantomDaemonRunMessage helper + main's isStoppableAssistantMessage helper both kept - ProjectView.run-cleanup.test.tsx: accepted HEAD (garnet's phantom-row regression test); main's three new tests for finalizeActiveAssistantMessagesOnStop / clearStreamingConversationMarker / shouldClearActiveRunRefs are queued as a follow-up TODO inline.	2026-05-14 15:13:38 +08:00
Marc Chan	055e55abd8	Add batch design system testing (#1515) * feat: add batch design system testing * fix: use daemon default agent for batch tests * fix: honor batch project prompt flags Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix: persist batch run output * fix: honor dry-run before daemon resolution Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix: persist batch assistant run ids Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * fix: cancel timed-out batch runs Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)	2026-05-14 14:14:19 +08:00
chaoxiaoche	e57e028222	feat(daemon): make design-system token channel default-on (PR-D) (#1544 ) * feat(daemon): make design-system token channel default-on (PR-D) Flip `OD_DESIGN_TOKEN_CHANNEL` from default-off to default-on. Every chat that picks a brand with `tokens.css` + `components.html` siblings (today: `default`, `kami`) now gets the structured token contract appended to the system prompt automatically. `OD_DESIGN_TOKEN_CHANNEL=0` keeps the DESIGN.md-only path as a kill switch. Adds `scripts/check-design-system-flag-parity.ts`, registered in `pnpm guard`. The guard walks every brand and asserts: - 147 prose-only brands produce byte-identical prompts under flag-off vs flag-on (PR-D's "no-op for legacy brands" promise) - 2 structured brands diverge as expected (catches a future regression that silently dropped the structured blocks) Smoke evidence on #1385 (PR-C): - `default` — 10/10 brand tokens used byte-for-byte in treatment vs 0/10 invented colors in control - `kami` — treatment recovers brand name (`Kami · 纸`), the two-tier surface (`--bg` parchment + `--surface` ivory), the CN font stack override, and the `components.html` card pattern; control invented "Replica" as a brand name Co-authored-by: Cursor <cursoragent@cursor.com> * review: address @nettee + @lefarcen feedback on parity guard Two blocking findings from #1544 review: 1. @nettee — guard's inventory walk silently passed on unreadable filesystem state. `fileExists` swallowed every `stat` error and the bare `readdir` catch returned `[]` for any failure. A renamed `design-systems/` tree, a permission-denied DESIGN.md, or a directory at the brand path would have left `pnpm guard` happy after checking 0 brands — exactly the silent misconfiguration this guard exists to catch. Both error paths now treat only ENOENT / ENOTDIR as absence and rethrow everything else, mirroring the `readFileOptional` fix already applied to PR-C's `apps/daemon/src/design-systems.ts`. 2. 
@nettee — guard exercised `composeSystemPrompt` directly, bypassing the `process.env.OD_DESIGN_TOKEN_CHANNEL !== '0'` gate in server.ts that PR-D actually flipped. A regression that restored `=== '1'`, typo'd the env name, or stopped reading assets when the var is unset would still leave the guard green. Extracted the predicate into `isDesignTokenChannelEnabled(env)` next to `readDesignSystemAssets` and added 6 unit tests pinning every value that matters: unset / `'1'` / `'true'` / empty / `'0'` / whitespace-padded. server.ts now calls the predicate. Any regression on the env-flag semantics fails `tests/design-system-assets.test.ts` independently of the composer-level coverage. Verified: pnpm guard (13/13), tsc -p scripts/tsconfig.json (clean), @open-design/daemon typecheck (clean), 32/32 prompt + asset tests. Co-authored-by: Cursor <cursoragent@cursor.com> * review: pin server-layer asset resolution end-to-end (lefarcen P2) Round-2 review feedback from @lefarcen on #1544: the predicate suite in tests/design-system-assets.test.ts pinned the env-flag boolean but did NOT exercise the server prompt-assembly path that PR-D actually flipped — the seam where the daemon decides whether to read tokens.css / components.html from disk and hand them to composeSystemPrompt. A regression that, say, restored an inline `=== '1'` gate or stopped calling isDesignTokenChannelEnabled() from server.ts would still leave the predicate test green. Extracted that whole seam into `resolveDesignSystemAssets(id, builtInRoot, userInstalledRoot, env)` on apps/daemon/src/design-systems.ts. The function combines: 1. the env-flag gate (kill switch on `OD_DESIGN_TOKEN_CHANNEL=0`) 2. the built-in → user-installed root fallback chain (per-file) 3. the DesignSystemAssets result shape consumed by composeSystemPrompt server.ts at the prompt-assembly site is now a thin caller of this function. 
The previous 13-line inline block (env check + per-file fallback) collapses to one call, so the whole asset-resolution path now has a single testable seam. 7 new tests in tests/design-system-assets.test.ts run the full pipeline end-to-end against real disk fixtures: - env unset (default-on): returns built-in assets - env=`'0'` (kill switch): returns undefined even with files on disk - env=`'1'` (legacy opt-in): still works - mixed builtin/user-installed: per-file fallback merges correctly - both halves built-in: skips user-installed roundtrip verbatim - prose-only brand (no files): undefined / undefined - nonexistent brand directory: undefined / undefined Verified: pnpm guard (13/13), tsc -p scripts/tsconfig.json (clean), @open-design/daemon typecheck (clean), 39/39 prompt + asset tests (was 32; +7 new server-layer-resolution tests). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(test): add missing projectKind to FileViewer deck preview test The deck preview test added in #1556 (`086be271`) renders <FileViewer/> without `projectKind`, which became a required prop in #1509. CI on main is currently red on this; pick up the trivial fix here so PR-D can land cleanly. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-14 14:14:19 +08:00
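The default-on gate from e57e028222 can be sketched as a pure predicate over an environment map. The function name `isDesignTokenChannelEnabled` and the `!== '0'` comparison come from the commit message; the body and the `Env` type are assumptions, and how the real predicate treats empty or whitespace-padded values is not stated in this log.

```typescript
// Sketch based on the gate quoted in the commit message:
//   process.env.OD_DESIGN_TOKEN_CHANNEL !== '0'
// The real implementation in apps/daemon/src/design-systems.ts may
// normalize the value differently; this version compares it verbatim.
type Env = Record<string, string | undefined>;

function isDesignTokenChannelEnabled(env: Env): boolean {
  // Default-on: only the literal string "0" acts as the kill switch.
  return env["OD_DESIGN_TOKEN_CHANNEL"] !== "0";
}

const whenUnset = isDesignTokenChannelEnabled({});
const whenLegacyOptIn = isDesignTokenChannelEnabled({ OD_DESIGN_TOKEN_CHANNEL: "1" });
const whenKilled = isDesignTokenChannelEnabled({ OD_DESIGN_TOKEN_CHANNEL: "0" });
```

Taking the environment as a parameter instead of reading `process.env` inside the function is what lets unit tests pin each value without mutating global state.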
pftom	3b167b6921	feat(plugins): add registry protocol and enhance plugin management features - Introduced the `@open-design/registry-protocol` package, enabling improved interactions with plugin registries. - Updated the `typecheck` script in the daemon's `package.json` to include the new registry protocol. - Enhanced the CLI with new flags and commands for better plugin management, including `yank` and additional marketplace functionalities. - Implemented a plugin lockfile system to manage installed plugins and their versions, improving reliability during upgrades. - Added new marketplace doctor functionality to validate plugin entries and ensure compliance with registry standards. This update significantly enhances the plugin ecosystem by providing robust registry interactions and improved management capabilities.	2026-05-14 08:55:36 +08:00
lefarcen	d83b228c81	Merge remote-tracking branch 'origin/garnet-hemisphere' into reconcile/garnet-main-merge	2026-05-13 23:52:33 +08:00
lefarcen	d3602be666	Merge origin/main into garnet-hemisphere (reconcile) Merge of `origin/main` (`03ed3960`, 2026-05-13 pre-0.7.0) into the 161-commit garnet-hemisphere line, reconciling the product-vibe-coded plugin/marketplace/EntryShell surfaces from garnet with the routines / skills / live-artifacts feature work landed on main since the fork point. Headline decisions (full rationale + side-by-side screenshots in `specs/change/20260513-garnet-skills-automations/reconcile-result-vs-garnet.md`): - #1 SettingsDialog: keep main's Memory / Skills / External MCP / Connectors / Routines / MCP server nav items even though the top-level /integrations + /automations routes also cover them. Two entries coexist for now; revisit once Track A/B fill in the placeholder content. - #2 EntryView: accept garnet's thin wrapper delegating to EntryShell. Main's PetRail sidebar + image-templates/video-templates tabs are intentionally deferred to a follow-up that re-integrates them into the new EntryShell layout. - #3 /integrations + /automations top-level routes: kept (garnet's product intent). Skills tab is still a "Coming soon" placeholder awaiting Track A; Routines/Schedules/Live-artifacts cards on /automations are still mock awaiting Track B. - #5 DesignFilesPanel: hybrid — main's pagination as primary list, garnet's Plugin folders section preserved between the live-artifacts block and the pagination block. (by-kind sections drop in favour of pagination; plugin-folders rendering stays because it is a garnet-specific product addition.) - #7 server.ts (10 hunks, ~5400 conflict lines): manual hunk-by-hunk merge. Both daemon admin routes + plugin/genui routes (garnet) and routines/memory/skills upgrades (main) preserved. Garnet's inline project route block kept alongside main's `registerProjectRoutes` / `registerProjectUploadRoutes` modular wiring — duplicate route audit is a follow-up. 
Garnet's POST /api/projects plugin-snapshot resolution + default-scenario fallback is intentionally dropped from the inline body (now handled by registerProjectRoutes) and listed for follow-up re-integration into `project-routes.ts`. Verification (worktree at /Users/elian/Documents/open-design-garnet): - `pnpm typecheck` exits 0 across all workspace packages - daemon (`pnpm tools-dev run web --namespace reconcile-shots`) boots, serves `/api/daemon/status` healthy, and survives a Playwright walkthrough of /integrations / /automations / home / projects / design-systems / plugins / settings dialog - `@open-design/plugin-runtime` package built (was missing dist/ on garnet); without it the daemon's plugins/* imports fail at boot Track A (Skills tab → real SkillsSection) and Track B (Automations cards → real routines / live-artifacts backend) are the two remaining follow-ups blocking the placeholder/mock content from going live. See `spec.md` and `track-skills.md` in the same directory.	2026-05-13 22:29:21 +08:00
pftom	d3d95121f3	feat(plugins): enhance visual score sorting and add new example templates - Updated the `sortByVisualAppeal` function to prioritize featured ranks, ensuring that curated plugins are displayed prominently. - Added tests to verify the new sorting logic, ensuring that plugins with numeric featured ranks are sorted correctly ahead of others. - Introduced new example templates for a magazine article layout, a Twitter share card, and a Xiaohongshu card, expanding the available options for users. - Enhanced the overall plugin preview experience by integrating these new templates, providing users with more visually appealing and functional examples. This update significantly improves the plugin sorting mechanism and enriches the template offerings, enhancing user engagement and experience.	2026-05-13 21:02:05 +08:00
pftom	9e196d34af	feat(daemon, web): enhance plugin sharing workflows and UI components - Updated the plugin sharing prompts to utilize local daemon endpoints for publishing to GitHub and contributing to Open Design, streamlining the user experience. - Refactored the `PluginsView` and `PluginShareMenu` components to support new sharing functionalities, including confirmation modals and improved link handling. - Enhanced the CSS styles for the plugin share confirmation modal and related UI elements for better visual consistency. - Added tests to verify the functionality of the new sharing workflows and ensure proper integration within the existing plugin management system. This update significantly improves the plugin sharing experience, making it easier for users to publish and contribute their plugins effectively.	2026-05-13 14:35:09 +08:00
nettee	f621dbbfea	feat(web): Add Tailwind foundation (#1388)	2026-05-12 21:48:16 +08:00
Prantik Medhi	325d1d3ceb	docs: add NotebookLM GitHub export script (#1062) * docs: add NotebookLM GitHub export script * fix: make NotebookLM export TOC anchors work * fix: escape TOC link text markdown chars * fix: include merged PRs when exporting --prs all * fix: allow --prs merged mode * fix: treat --limit as total export budget * fix: avoid starving buckets under global --limit * fix: support --issues none and handle repos w/ issues disabled * fix: avoid underfilling export when buckets empty * fix: keep disabled-issues fallback quiet * fix: silence disabled issues fallback * fix: satisfy script typecheck	2026-05-12 15:47:32 +08:00
pftom	5af84c09af	feat(web): refactor PluginsHomeSection to use tag-based filtering and introduce PluginCard component - Replaced the legacy tabbed categorization in `PluginsHomeSection` with a tag-driven approach, allowing dynamic filtering based on plugin tags. - Introduced a new `PluginCard` component to encapsulate the rendering of individual plugin cards, improving separation of concerns and maintainability. - Added a `usePluginCategories` hook to manage plugin visibility and filtering logic, enhancing the overall structure and testability of the component. - Implemented a "More" pill for overflow tags in the filter row, improving user interaction with a cleaner UI. - Updated CSS styles to support the new layout and improve visual consistency across the plugins home section. This update significantly enhances the user experience by providing a more flexible and intuitive way to discover and interact with plugins.	2026-05-12 13:25:44 +08:00
chaoxiaoche	a75d9938c7	feat(design-systems): add structured tokens.css schema (default + kami) (#1231 ) * feat(design-systems): add structured tokens.css schema (default + kami) Compile each brand's DESIGN.md prose into a machine-readable :root block agents paste verbatim, removing the "Primary → --accent" translation step where most token misuse happens. Daemon prompt injection lands in a follow-up; lint-artifact already enforces the shared token vocabulary so no rule changes needed. Schema validated across two contrasting aesthetics: - default (sans-serif, cobalt, B2B utility) — stress test the shallow form, 2-level fg / 2-level surface - kami (serif, parchment, ink-blue, print-first) — stress test the rich form, 4-level fg ramp, 3-level surface, ring elevation, i18n font stacks, and solid-hex tag tints (print renderers double-paint alpha) Schema growth from kami's stress test (5 new optional slots, all backward-compatible — default aliases via var() to existing tokens): - --fg-2 / --meta (4-level fg ramp) - --surface-warm (3-level surface) - --border-soft (2-level border) - --elev-ring (ring elevation as first-class level) Brand-specific extensions live in tokens.css with explicit "NOT in shared schema" labels and a documented promotion path (≥2 brands need it → promote to schema slot). components.html in each brand is a self-contained reference fixture that exercises every token through real layouts. Both fixtures lint clean against apps/daemon/src/lint-artifact.ts. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): add token-fixture drift guard Each design system in design-systems/<brand>/ ships two files agents consume in tandem: tokens.css (canonical token bindings) and components.html (a self-contained fixture whose first <style> embeds the same :root paste so the file renders standalone). The fixture's :root block is a copy of tokens.css's :root block, kept in sync only by an inline comment. 
This adds scripts/check-tokens-fixture-sync.ts and registers it in pnpm guard. The check pairs each brand's tokens.css with its components.html and asserts the unscoped :root block is byte-equivalent after canonical normalization (CSS comments stripped, whitespace collapsed, separator spacing normalized). Brands missing one half of the pair, or with no :root rule in either file, fail the guard. Scoped overrides like :root[lang="zh-CN"] are not required to appear in the fixture (per the kami fixture's inline comment they are pasted only when an artifact's <html lang> matches), so the check only compares the unscoped :root block. Verified: pnpm guard passes for default + kami, fails on intentional value drift, fails on missing token, tolerates whitespace-only formatting differences. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(design-systems): point fixture CTAs to real files Both default and kami components.html advertised in-page anchors (#tokens, #spec, #surface, #accent, #type, #components) but defined no matching ids, so every CTA was a no-op when the fixture was opened locally — flagged by mrcfps in #1231. Re-point each link to a real artifact in the same brand directory: - "View tokens" / "Inspect tokens" / "Inspect typography" → ./tokens.css - "Read the spec" / "Read the rule" → ./DESIGN.md Browsers render these as raw source views, which is the desired UX for a reference fixture: clicking the CTA shows the underlying contract instead of jumping to nothing. Agents copying the fixture also learn the pattern of "buttons link to actual sibling resources". The :root token block is unchanged, so the token-fixture drift guard still passes for both brands. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): codify token schema (A1/A2/B/C layers) The two-brand pilot (default + kami) settled the shape of the shared token schema; this commit codifies it as a machine-readable contract and enforces it in pnpm guard, addressing lefarcen's review on #1231: > the optional-vs-required split won't generalize cleanly when brand > #3 needs different Layer A tokens or when multiple brands converge > on the same extension (promoting C→B→A). Consider surfacing that > limitation in the PR narrative or in a future SCHEMA.md. Schema lives under design-systems/_schema/ as three files: - tokens.schema.ts — TypeScript declaration of every shared token with its layer (A1-identity / A1-structure / A2 / B-slot), plus per-brand C-extension allowlists and a global C-prefix allowlist - defaults.css — CSS mirror of A2 fallback values, used as the human-readable contract reviewer's-eye copy and the future input to the derive script - AGENTS.md — schema layer model, C → B-slot → A2 promotion rules, when-not-to-add-a-token guidance Layer model: A1-identity 8 tokens — bg/surface/fg/muted/border/accent + font-display/font-body. The brand IS these values; no fallback is defensible. A1-structure 18 tokens — type scale (8), leading (2), tracking (1), section-y (3), container (4). Structural decisions vary per brand by design and have no cross-brand default. A2 26 tokens — accent states, semantic colors, motion, base spacing scale, radius, elevation, focus, font-mono. Required in every tokens.css; fallback lives in defaults.css for the future derive script to inline when DESIGN.md does not specify the value. B-slot 4 tokens — fg-2 / meta / surface-warm / border-soft. Brand may bind independently or alias the named sibling via var(...) for components that target the richer ramp. C-extension n tokens — brand-specific names (kami's tag-bg-, leading-display, accent-light, etc.). 
Allowlisted per-brand in BRAND_EXTENSIONS or globally by prefix in BRAND_EXTENSION_PREFIXES. Promote when a second brand adopts the same name. Why A2 fails the guard today: Artifacts are generated by agents pasting one brand's :root block into a single <style>; there is no global stylesheet that supplies fallbacks at runtime. A tokens.css missing an A2 declaration would silently break any var() reference in the fixture. Until the derive script (PR-B) lands and inlines defaults, every brand's tokens.css must declare every A2 token directly. The guard enforces this strictly. Why --font-mono lands in A2 (not A1): 149 brands' DESIGN.md files were surveyed: 87 (58%) declare a monospace stack, 62 (42%) do not — including major brands like bmw / nike / apple / notion / mastercard / meta. Agent paste cannot rely on the brand author having written it down; a defaultable A2 fallback (with CJK brands like kami overriding) is safer than forcing every brand author to add a field they may not realize their kbd / code-block components need. Five guard checks, each registered as its own entry in scripts/guard.ts so failures attribute to a specific contract: 1. token-fixture sync — components.html :root ↔ tokens.css :root byte-equivalent (existing) 2. A1 required tokens — every brand declares every A1 token 3. A2 required tokens — every brand declares every A2 token 4. unknown token allowlist — every declared token is in schema or brand-extension allowlist 5. 
A2 defaults parity — defaults.css ↔ tokens.schema.ts fallback byte-equivalent Verified on default + kami: - 26 A1 tokens declared in both brands - 26 A2 tokens declared in both brands - 129 total declarations, all match shared schema or brand extensions - defaults.css ↔ tokens.schema.ts parity holds - sanity test: drifting --motion-fast in defaults.css fails check 5 with a clear divergence message The PR description originally listed "Dedicated SCHEMA.md" as explicitly NOT in this PR ("Once 3+ brands ship, extracting a single source of truth becomes worthwhile"). That boundary moves: lefarcen's review surfaced the schema-generalization risk, and the schema must exist as a machine-enforced contract before the derive script can read it. The TS file replaces the markdown that was deferred. Co-authored-by: Cursor <cursoragent@cursor.com> fix(web/tests): pass missing designTemplates prop to ProjectView Pre-existing typecheck regression on main: PR #955 (`b5eb8c16`, "generic skills + split skills/design-templates + finalize-design API") added required `designTemplates: SkillSummary[]` to ProjectView Props but updated only two of the three test fixtures that render ProjectView directly. The third — ProjectView.api-empty-response.test.tsx — was missed, so `pnpm typecheck` (and CI on any PR merging into main) fails on: apps/web/tests/components/ProjectView.api-empty-response.test.tsx (168,6): error TS2741: Property 'designTemplates' is missing in type ... The other two ProjectView tests already pass `designTemplates={[]}`, so this aligns this fixture with the existing pattern. Out of scope for #1231 strictly, but the regression blocks the merged-state typecheck CI runs that #1231 triggers, and the one-line fix here restores main's typecheck health for everyone. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(design-systems): enforce B-slot required tokens in pnpm guard Closes mrcfps + lefarcen review comment thread on #1231: > The guard validates A2 required tokens here, but there's no > sibling check for B-slot aliases (--fg-2, --meta, --surface-warm, > --border-soft). Per the schema docs, every brand must declare > A1 + A2 + B-slot names so shared components can safely read > var(--fg-2) etc. Without a B-slot guard, a brand can omit those > aliases, pass pnpm guard, and break any artifact that references > them. Same artifact-paste constraint as A2: agents render artifacts by pasting one brand's :root block into a single <style>; there is no runtime cascade, so a missing B-slot makes any var(--fg-2) reference resolve to nothing. Until now the schema narrative claimed B-slots were optional with a var() default, but no machine check enforced declaration — a contract gap reviewers reasonably refused to merge. This commit closes the gap in three places so machine and narrative agree: 1. scripts/check-tokens-fixture-sync.ts - Add checkDesignSystemBSlotRequiredTokens, mirroring the A2 check but using getBSlotNames() from the schema. - Failure message names each missing slot AND the schema-suggested alias (--fg-2 (default alias: var(--fg))) so a brand author fixing the failure has a copy-pasteable resolution. - Renumber section comments: 5 checks → 6 checks. 2. scripts/guard.ts - Register the new check between A2 required and unknown allowlist so failures attribute to a specific contract. 3. design-systems/_schema/AGENTS.md - Update the layer table: B-slot row's "If omitted" column changes from "resolves via var() to a richer sibling" to "guard fails — brand must declare, either as var(--sibling) (collapsed) or independent value (richer)". 
- Add a "Why B-slot is required (and what the alias is for)" section that distinguishes the schema-suggested alias from a runtime fallback, with worked examples for default (alias) and kami (independent bind). Verified on default + kami: - pnpm guard passes all 6 design-system checks - 4 B-slot tokens declared in both brands (default aliases via var(), kami binds independently — both forms satisfy the contract) - pnpm typecheck clean across the workspace - Sanity test: removing --fg-2 + --meta from default/tokens.css fires the new guard with a precise per-token alias hint: [default] design-systems/default/tokens.css is missing 2 B-slot tokens (alias the named sibling via var(...) or bind independently): --fg-2 (default alias: var(--fg)), --meta (default alias: var(--muted)) The schema contract is now machine-enforced end-to-end (A1 + A2 + B-slot all required-with-fixed-form-of-fallback). The derive script in PR-B can rely on every brand's tokens.css containing every shared slot name. Co-authored-by: Cursor <cursoragent@cursor.com> * test(e2e): skip leading-underscore meta-directories under design-systems/ CI for #1231 went red on `Validate workspace` after merging origin/main. Cause is a clean collision between two recently-landed changes: - main #1270 (`be77dc03` "Default English resource i18n fallback") tightened tests/localized-content.test.ts so every directory under design-systems/ is run through assertResourceId() with the strict RESOURCE_ID_PATTERN /^[a-z0-9][a-z0-9-]*$/. - this branch #1231 introduced design-systems/_schema/ as the home of the shared token contract (tokens.schema.ts, defaults.css, AGENTS.md). The leading underscore signals "meta-directory, not brand" — the same convention SCSS partials, Jekyll, Hugo all use. 
The two changes never met until CI built the merge commit, where assertResourceId('_schema') deterministically failed: Error: Design system directory _schema has malformed resource id: _schema at invariant tests/localized-content.test.ts:66:11 at assertResourceId tests/localized-content.test.ts:71:3 at readDesignSystemResources tests/localized-content.test.ts:202:8 Fix tightens readDesignSystemResources's directory filter so the leading-underscore convention is recognised explicitly: .filter((entry) => entry.isDirectory() && !entry.name.startsWith('_')) This aligns with what apps/daemon/src/design-systems.ts:listDesignSystems already does implicitly — it requires DESIGN.md per directory, so _schema/ was always invisible at runtime; the test was the only place that surfaced it. Verified locally on the post-merge tree: - pnpm test (e2e vitest) — tests/localized-content.test.ts: 4 passed - pnpm guard — all 6 design-system checks pass on default + kami - pnpm typecheck — clean across the workspace (after pnpm install to pull deps for tools/pr that arrived with main) The fix is intentionally narrow (one filter line in one test) and documents the convention inline so future meta-directories under design-systems/ (e.g. _archive/, _drafts/) are covered for free. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: chaoxiaoche <chaoxiaoche@192.168.10.16> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 22:23:34 +08:00
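A minimal sketch of the leading-underscore filter from the commit above. The filter predicate and RESOURCE_ID_PATTERN are quoted from the message; the `DirEntryLike` interface and the `brandDirectories` name are assumptions added so the example runs without touching the file system.

```typescript
// Hypothetical stand-in for the directory entries the test reads from design-systems/.
interface DirEntryLike {
  name: string;
  isDirectory(): boolean;
}

// Pattern quoted in the commit message.
const RESOURCE_ID_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

/** Keep brand directories only; names starting with "_" are meta-directories. */
function brandDirectories(entries: DirEntryLike[]): string[] {
  return entries
    .filter((entry) => entry.isDirectory() && !entry.name.startsWith("_"))
    .map((entry) => entry.name);
}

/** Small factory so the usage example needs no real directory listing. */
function fakeEntry(name: string, isDir: boolean): DirEntryLike {
  return { name, isDirectory: () => isDir };
}

const listing = [
  fakeEntry("default", true),
  fakeEntry("kami", true),
  fakeEntry("_schema", true),
  fakeEntry("README.md", false),
];

// Every surviving name satisfies the strict pattern; "_schema" never reaches the check.
const allValid = brandDirectories(listing).every((name) => RESOURCE_ID_PATTERN.test(name));
console.log(brandDirectories(listing), allValid);
```

Filtering before the id assertion keeps the strict pattern unchanged for real brands, which is why the fix can stay a single line.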
shangxinyu1	10802bb0b0	test: expand nightly UI and desktop regression coverage (#1256 ) * e2e(ui): cover examples preview flows * e2e(ui): cover Codex local CLI fallback UX * test: expand desktop and connector regression coverage * e2e(ui): cover workspace restoration flows * e2e(ui): cover retry recovery workspace flow * test: cover artifact and connector recovery flows * e2e(ui): cover Continue in CLI stale provenance flow * e2e(ui): cover BYOK model fetch caching * test: expand Orbit and desktop connector coverage * e2e(ui): cover workspace quick switcher recovery flows * e2e(ui): cover connector pending authorization recovery * e2e(ui): cover workspace and conversation restoration routes * e2e(ui): cover conversation draft and attachment restoration * e2e(ui): cover conversation history selection recovery * e2e(ui): cover workspace surface conversation selection * test: cover artifact presentation and orbit link behavior * test: cover artifact external link restoration * e2e(ui): cover root-route deep-link restoration * e2e(specs): cover Orbit open-artifact desktop click * e2e(specs): cover desktop artifact open link * test: fix Orbit settings fixture type drift * test: split Playwright critical and extended suites * test: fix ProjectView design template fixtures * ci: split workspace test stages * guard: allow split Playwright suite scripts * test: shrink Playwright critical suite * test: restore omitted Playwright suites	2026-05-11 19:23:13 +08:00
PerishFire	8c0fb8dc01	feat(tools-pr): add maintainer PR-duty workspace (#1259 ) * feat(tools-pr): add maintainer PR-duty workspace Adds `tools/pr` as the maintainer-only control plane for PR-duty work on this repo. Thin `gh` wrapper that encodes repo-specific knowledge: review lanes, forbidden surfaces, lane-specific checklists, validation command derivation from touched packages. Subcommands: - `list` — triage open queue by lane and review-state bucket. - `view <num>` — agent-friendly review brief for a single PR. - `classify [num]` — emit script-level tags for one PR or the whole open queue; full-queue JSON output lands under `.tmp/tools-pr/classify/` with rate-limit telemetry per run. - `assignment` — assigner-perspective view of PR ownership, idle time, and blockers (derived from existing tags; no new judgments). Tag dictionary (13 tags) covers: bot-only-approval, needs-rebase, forbidden-surface, unlabeled, duplicate-title, non-ascii-slug, maintainer-edits-disabled, org-member, unresolved-changes-requested, stale-approval, and three awaiting-* timing tags. Each rule is expressible as one factual sentence over `gh` data + repo paths — see `tools/pr/AGENTS.md` for the full dictionary plus precision rules. Templates in `tools/pr/templates/.md` are aesthetic references for recurring maintainer comments (duplicate-title ask, awaiting-author nudge, agent-review brief shape). `templates/examples/` holds frozen-in-time agent-review snapshots for three PR shapes. Infrastructure: - `gh()` wraps `execFile` with minimum-touch retry (2 attempts at 1s + 2s backoff) on transient 5xx / network errors. Persistent failures still surface — retry is anti-jitter, not an exponential-backoff resilience layer. - Heavy chunks (`reviews`, `comments`, `commits`, assignment timelines) use cursor-paginated `gh api graphql` via `fetchPaginatedPrList` to stay under GitHub's GraphQL server-side timeout. Light chunks stay on `gh pr list --json`. 
- `fetchOrgMembers` cached per process via `gh api orgs/<owner>/members --paginate`. Wiring: - Root `package.json` adds `pnpm tools-pr` to the allowed root entry points. - `scripts/postinstall.mjs` builds `tools/pr` alongside other workspace packages. - `scripts/guard.ts` allowlists `tools/pr/bin/tools-pr.mjs` and `tools/pr/esbuild.config.mjs`, and adds `pr/` to the `tools/` top-level layout allowlist. - Root `AGENTS.md` and `tools/AGENTS.md` document the new command surface, root-command-boundary update, and per-tool ownership. docs(agents): brief tools-pr in root AGENTS.md, link to tools/pr/AGENTS.md Adds a `PR-duty tooling` section to the root AGENTS.md summarising what `pnpm tools-pr` is, listing the four common subcommands (list / view / classify / assignment), and pointing readers to `tools/pr/AGENTS.md` for the full tag dictionary, operational playbook, templates, and design rules. The section keeps root-level guidance to high-level orientation while details stay local to the tool's own AGENTS.md. * fix(tools-pr): drop overly broad touches-root-package.json forbidden hit `deriveForbidden` was flagging any change to root `package.json` as a forbidden-surface hit, but AGENTS.md §Root command boundary only forbids specific lifecycle aliases (pnpm dev / test / build / daemon / preview / start) — tools-control-plane entrypoints like `pnpm tools-pr` are explicitly allowed. Distinguishing "forbidden alias" from "allowed entry" requires reading the diff content, which is `pnpm guard`'s job rather than a path-derived classify tag. Dogfooded on this branch's own PR (#1259), which added the `pnpm tools-pr` script and was incorrectly flagged. Removing the hit aligns the `forbidden-surface` tag with what tools-pr can mechanically detect from file paths alone (apps/nextjs/, packages/shared/). 
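The path-only scope of the forbidden-surface tag can be sketched as follows. The two prefixes come from the commit message; the function name `deriveForbiddenHits` and the choice to return the matched prefixes are assumptions for illustration, not the shape of the real `deriveForbidden`.

```typescript
// Prefixes named in the commit message as mechanically detectable from file paths.
const FORBIDDEN_PREFIXES = ["apps/nextjs/", "packages/shared/"];

/**
 * Hypothetical path-derived check: report each forbidden prefix that at least
 * one touched path falls under. Root package.json is deliberately not listed,
 * because telling a forbidden alias from an allowed entry needs the diff content.
 */
function deriveForbiddenHits(touchedPaths: string[]): string[] {
  return FORBIDDEN_PREFIXES.filter((prefix) =>
    touchedPaths.some((path) => path.startsWith(prefix)),
  );
}

// A PR that only adds a root script and tool sources produces no hit in this sketch.
console.log(deriveForbiddenHits(["package.json", "tools/pr/src/list.ts"]));
```

Keeping the rule to path prefixes leaves content-sensitive decisions to `pnpm guard`, which is the split the fix describes.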
* fix(tools-pr): paginate commits fetch, recognise ready-to-merge, escape title-index separator Three review follow-ups on #1259, all factual fixes: - `fetchOpenPrCommits` now uses `fetchPaginatedPrList` instead of a one-shot `pullRequests(first: $first)` query. GitHub GraphQL caps connection page size at 100, so the previous implementation would fail at runtime when callers passed `--limit > 100`. The paginated path makes the commits fetch consistent with the other heavy chunks (reviews, comments, assignment timelines) and removes the artificial ceiling entirely. The `limit` parameter is dropped from `fetchOpenPrCommits`; the CLI `--limit` continues to bound the `gh pr list --json` chunks. - `deriveStatus` in `assignment.ts` now reads `facts.reviewDecision` and `facts.mergeStateStatus`. When the PR is `APPROVED` with merge state `CLEAN` or `UNSTABLE` and carries no blockers, status renders as `ready to merge` instead of falling through to `in review`. The assignment view loses its main triage signal without this — a clean human-approved PR rendered identical to a REVIEW_REQUIRED one. - `tags.ts:tagDuplicateTitle` and `tags.ts:buildContext` both constructed the title-index key with a literal NUL byte between author and title, which made the file appear as binary in `git diff` / review tooling. Replaced the literal byte with a Unicode escape sequence in source; the runtime string value is identical, the source stays plain text and round-trips through review tooling cleanly. * fix(tools-pr): raise default --limit to 1000 to cover the live open queue mrcfps flagged that `tools-pr list` (and `classify --all`, `assignment`) defaults to `--limit 100`, which silently drops every PR past the first 100 in the open queue. The repo currently sits at 104 open PRs, so the out-of-the-box run was already omitting four PRs. 
Raise the default to 1000 in `list.ts`, `classify.ts`, and `assignment.ts`, and remove the now-pointless 200 ceiling — `gh pr list --limit N` paginates internally, so a high cap is cheap. Users can still pass `--limit <small>` for a truncated preview. CLI help text on the three subcommands updated to match. * fix(web): pass designTemplates to ProjectView render helper #955 made `designTemplates` a required Prop on ProjectView, but the test helper added in #1244 (`renderProjectView` in `ProjectView.api-empty-response.test.tsx`) was never updated. The two PRs landed on main without conflicting, leaving `apps/web` typecheck red for every PR that rebases past `b5eb8c16`. Pass `designTemplates={[] as SkillSummary[]}` alongside the existing `skills={[] as SkillSummary[]}` so the helper compiles. The component already treats the array shape (empty included) as a no-op fallback in the empty-response paths the test exercises. * fix(tools-pr): correct author signal + merge inline review comments Two correctness gaps in the awaiting-* signal pipeline surfaced during review of the new tools-pr commands: 1. `authorSignalAt` iterated every PR commit unconditionally. On `maintainerCanModify=true` PRs a maintainer's follow-up push would advance the author timestamp, masking a stalled author response. Filter commits to those whose `authorLogin` matches `facts.author`, mirroring the same filter already applied to comments. 2. `fetchOpenPrComments` (and `fetchView`) only fetched `pullRequest.comments` / `gh pr view --json comments`, which is the issue-conversation thread. Inline review-thread replies — where authors and reviewers actually exchange most fix-up replies — live in `reviewThreads.comments` / REST `pulls/{n}/comments`. Missing them let `humanReviewerSignalAt` / `authorSignalAt` and the `view` brief point at the wrong side after someone replied inline. 
Extend the list-mode GraphQL to also sweep `reviewThreads(last: 20).comments(first: 20)`, and add a parallel REST inline-comments fetch in `fetchView` that merges into `GhView.comments`.	2026-05-11 19:17:21 +08:00
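The author-signal fix in item 1 above can be illustrated with a small sketch. It assumes the signal is the latest timestamp among the PR author's own commits and comments; the record shapes and the three-argument signature are hypothetical, since the real `authorSignalAt` reads from `facts`.

```typescript
// Hypothetical record shapes; field names other than authorLogin are illustrative.
interface CommitFact {
  authorLogin: string | null;
  committedAt: string; // ISO 8601 timestamp
}

interface CommentFact {
  authorLogin: string | null;
  createdAt: string; // ISO 8601 timestamp
}

/**
 * Latest activity timestamp attributable to the PR author, or null if none.
 * Commits pushed by anyone else (for example a maintainer on a
 * maintainerCanModify PR) are ignored, mirroring the filter on comments.
 */
function authorSignalAt(
  author: string,
  commits: CommitFact[],
  comments: CommentFact[],
): string | null {
  const stamps = commits
    .filter((commit) => commit.authorLogin === author)
    .map((commit) => commit.committedAt)
    .concat(
      comments
        .filter((comment) => comment.authorLogin === author)
        .map((comment) => comment.createdAt),
    );
  if (stamps.length === 0) return null;
  return stamps.reduce((latest, stamp) =>
    Date.parse(stamp) > Date.parse(latest) ? stamp : latest,
  );
}
```

Without the `authorLogin` filter on commits, a maintainer's later push would become the newest timestamp and hide a stalled author response, which is the gap the commit describes.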


86 commits