feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241)

* feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok

Drops in a `mocks/` top-level dir that pretends to be the real agent CLIs by streaming pre-recorded sessions in each CLI's native stdout protocol. Zero LLM tokens.

## Use cases

- **E2E tests** in `apps/daemon/tests/`: exercise the full chat-server pipeline against a known trace, assert UI events / artifacts.
- **Self-validation during dev**: iterate on `claude-stream.ts` / `json-event-stream.ts` parser changes without burning provider budget.
- **Regression harness**: replay the same trace before and after a charter / parser change; diff the daemon events the UI surfaces.
- **Demo / onboarding**: show what a 17-tool claude editing session looks like end-to-end, offline.

## How

- 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid.
- `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` / `_BY_PROMPT_HASH`), streams the trace in the requested format.
- Each format renderer matches the EXACT JSON shape the OD daemon parser expects, verified line-by-line against `apps/daemon/src/{json-event-stream,claude-stream}.ts`:

| CLI                       | streamFormat         | parser source               |
| ------------------------- | -------------------- | --------------------------- |
| `opencode`                | `json-event-stream`  | `handleOpenCodeEvent`       |
| `codex`                   | `json-event-stream`  | `handleCodexEvent`          |
| `claude`                  | `claude-stream-json` | `createClaudeStreamHandler` |
| `deepseek` `qwen` `grok`  | `plain`              | `server.ts` (raw stdout)    |

## Quick start

```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377   # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1

echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```

The mock binary announces the picked trace id on stderr: `[mock-opencode] picked 04097377… via fixed`.
Recording selection (env, in priority order):

- `SYNCLO_EXPLORE_MOCK_TRACE=<id>`: fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt: `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>`: random within `agent:claude` / `skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random
- `SYNCLO_EXPLORE_MOCK_SEED=<str>`: reproducible "random"
- `SYNCLO_EXPLORE_MOCK_NO_DELAY=1`: skip inter-event waits

## Dataset

179 anonymized Langfuse traces from this project's own production telemetry:

- 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 · cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1
- outcomes: succeeded 144 · failed 35
- skills: default 71 · ad-creative 50 · algorithmic-art 30 · agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt / brainstorming / data-report / penpot-flutter-design-source 1 each
- 124 multi-turn (sessions with ≥2 turns)
- 18 produce `<artifact>` output
- ~4.5 MB on disk total

Anonymization: `/Users/<name>/` → `${HOME}/`, `C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs → stable `proj-001`, `proj-002`, …. Tool input/output payloads preserved verbatim (templated UI, no cell-level PII).

## Smoke test

`bash mocks/scripts/smoke-test.sh` runs 6 checks across all 6 agents. All pass on this branch (verified locally):

```
✓ opencode first event = step_start
✓ codex first event = thread.started
✓ claude first event = system
✓ deepseek emitted plain text (144 chars on first line)
✓ qwen emitted plain text (144 chars on first line)
✓ grok emitted plain text (144 chars on first line)
All mock CLIs working. ✅
```

## Adding more recordings

The exporter that produced this set lives in [nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore) (see `cli/src/local/orchestrator/langfuse-import.ts` + the `local langfuse-import` CLI command).
Operators with the Langfuse keys can pull more by tag / outcome / artifact / multi-turn filter, then run `local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`. `mocks/README.md` has the full instructions.

## Out of scope (follow-ups)

- **ACP agents** (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need a JSON-RPC server on stdio rather than a one-shot stream; the separate `format-acp.mjs` module is not yet written.
- **Per-agent json-event-stream variants** (`cursor-agent`, `gemini`, `qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer; their parsers are in `apps/daemon/src/json-event-stream.ts` and follow the same template as `format-codex.mjs`.

## AGENTS.md updates

- Added `mocks/` to the top-level content directories listing
- Added a Validation strategy bullet pointing here for agent-stream / parser changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout

- Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary RuntimeAgentDef bin names OD resolves before any fallback. Without these, a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli, kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback).
- Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop so coverage is verified end-to-end.
- Call .unref() on the 30s safety timeout in format-acp.mjs so a completed ACP session exits promptly instead of waiting the full 30 seconds.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mocks): add vela (AMR) - login / models / ACP with strict set_model gate

Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name `apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`, `streamFormat: 'acp-json-rpc'`).
It's richer than the generic ACP agents: it covers the full login + models + chat-session lifecycle.

### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs)

1. `vela login`: writes ~/.amr/config.json with a fake profile (controlKey, runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl). The on-disk projection is what OD's daemon login route + AmrLoginPill poller read; production goes through device-auth, the mock skips straight to the file write.
2. `vela models`: prints the production-shaped public model catalog as newline-separated `public_model_* vela` lines. Override via FAKE_VELA_MODELS env.
3. `vela agent run --runtime opencode`: ACP JSON-RPC server with three vela-specific protocol extensions:
   a. `initialize` response carries `agentCapabilities` (`promptCapabilities.embeddedContext`) + `models` (`currentModelId` + `availableModels`).
   b. `session/new` response carries the same `models` block.
   c. **Strict set_model gate**: `session/prompt` is rejected with JSON-RPC -32602 ("session/set_model must be called before session/prompt") UNLESS `session/set_model` (or `session/set_config_option`) has been called for the current sessionId. Mirrors the real vela 0.0.1 contract; catches regressions in `attachAcpSession` that silently skip set_model.
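The strict gate in 3c amounts to a small per-session state check. The block below is a minimal sketch of that idea, not the code in `mocks/lib/format-vela.mjs`: the `handle` function, the `RpcRequest` / `RpcResponse` types, the in-memory `Set`, and the `stopReason` result field are all illustrative assumptions. Only the method names, the -32602 code, and the error message are taken from the description above.

```typescript
// Hypothetical sketch of the strict set_model gate; every name here is illustrative.
type RpcRequest = { id: number; method: string; params: { sessionId: string } };
type RpcResponse =
  | { id: number; result: Record<string, unknown> }
  | { id: number; error: { code: number; message: string } };

// Sessions that have already seen session/set_model (or session/set_config_option).
const modelSelected = new Set<string>();

function handle(req: RpcRequest): RpcResponse {
  const { sessionId } = req.params;
  switch (req.method) {
    case 'session/set_model':
    case 'session/set_config_option':
      modelSelected.add(sessionId);
      return { id: req.id, result: {} };
    case 'session/prompt':
      if (!modelSelected.has(sessionId)) {
        // Reject with "invalid params" until a model was chosen for this session.
        return {
          id: req.id,
          error: {
            code: -32602,
            message: 'session/set_model must be called before session/prompt',
          },
        };
      }
      // Assumed result shape, for illustration only.
      return { id: req.id, result: { stopReason: 'end_turn' } };
    default:
      // -32601 is the standard JSON-RPC "method not found" code.
      return { id: req.id, error: { code: -32601, message: `unknown method ${req.method}` } };
  }
}
```

Keying the state on `sessionId` rather than on the process means a second `session/new` cannot inherit the first session's model choice, which is the regression the gate is meant to surface.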
### Error injection envs (in sync with fake-vela.mjs)

    FAKE_VELA_SESSION_ID             - sessionId returned by session/new
    FAKE_VELA_TEXT                   - override assistant text
    FAKE_VELA_THOUGHT                - optional thought_chunk before text
    FAKE_VELA_SESSION_NEW_ERROR      - fail session/new
    FAKE_VELA_SET_MODEL_ERROR        - fail session/set_model
    FAKE_VELA_PROMPT_ERROR           - fail session/prompt
    FAKE_VELA_REQUIRE_SET_MODEL='0'  - disable the strict gate (legacy)
    FAKE_VELA_LOGIN_USER_EMAIL       - email written into config profile
    FAKE_VELA_LOGIN_USER_PLAN        - plan written into config profile
    FAKE_VELA_LOGIN_DELAY_MS         - sleep before write (test in-flight)
    FAKE_VELA_LOGIN_FAIL             - print + exit 1
    FAKE_VELA_MODELS                 - override models stdout
    VELA_PROFILE                     - profile slot (prod | test | local)

### Components

`mocks/lib/format-vela.mjs` (~205 LOC)
- Full ACP server with vela protocol extensions
- Strict set_model gate
- Error injection plumbing

`mocks/lib/vela-subcommands.mjs` (~90 LOC)
- runVelaLogin(): writes ~/.amr/config.json
- runVelaModels(): prints catalog

`mocks/bin/vela`: dispatcher wrapper. Forwards `vela <subcmd>` to mock-agent.mjs, which routes to login/models or falls through to ACP.

`mocks/mock-agent.mjs`: parseArgs now collects positionals so the vela dispatcher can read the subcommand from there; switch case added for vela.

`mocks/scripts/smoke-test.sh`: +4 assertions:
- vela models prints ≥10 catalog lines
- vela login writes ~/.amr/config.json with the requested email
- vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
- vela strict set_model gate rejects prompt without prior set_model

### Verified locally

    ✓ vela models printed 15 catalog lines
    ✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
    ✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
    ✓ vela strict set_model gate rejects session/prompt without prior set_model

All 21 smoke checks pass (up from 17 with previous P3 ACP commit).
### AGENTS.md + README updates

AGENTS.md: mention `vela (AMR — vela CLI)` alongside ACP agents in the directory listing entry.

mocks/README.md: protocol table row + dedicated vela section with subcommand contract, strict gate explanation, env-injection cheat sheet. Mock-tree listing updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): honor REPORT_FILE env when --report-file flag not given

Harnesses that spawn the mock without translating their report-path contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's orchestrator, which passes REPORT_FILE as env per the existing opencode/claude/codex agent launchers) wouldn't get a report file written, so the harness's "agent exit 0 but produced no report" check would always fire and mark mock runs as failure even though the stdout stream was complete.

Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE when --report-file wasn't provided on argv. Each format renderer already accepts opts.reportFile and writes the recording's final assistant text to it (`format-*.mjs` already had this; only the wiring was missing).

Verified: synclo-explore run with `mock=true, mock_trace=04097377` against the opencode wrapper now produces a plan.md with the recording's 17-tool claude editing session report. ~1.5s per run vs ~70s real opencode.

* mocks: move recordings to Cloudflare R2; PR→main→Action upload path

The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has been moved off git into Cloudflare R2 at the bucket open-design-mocks under recordings/v1/.
The repo now ships:

- mocks/manifest.json: the canonical catalog (renamed from recordings/index.json) with sha256 + storage hints; consumers fetch this to discover what exists, then pull individual jsonl files on demand
- mocks/scripts/fetch-recordings.sh: parallel, sha256-verified, idempotent puller for the public r2.dev URL
- mocks/scripts/add-recording.sh: local maintainer helper that validates a new .jsonl and copies it into recordings-staging/ (no R2 calls; no credentials needed)
- mocks/scripts/upload-to-r2.mjs: called only by the CI workflow
- mocks/scripts/lib/manifest-utils.mjs: shared sha256/meta/rebuild-histograms logic, used by both add-recording (preview) and upload-to-r2 (actual write) so the entry shape never drifts
- .github/workflows/sync-mocks-to-r2.yml: fires on push to main when mocks/recordings-staging/ changes; uploads to R2, updates manifest, commits cleanup back; serialized via concurrency group

Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a laptop. Read stays public via the r2.dev URL.

Why not pnpm install integration: contributors who do not touch agent code do not pay the fetch cost. Fetch happens on first smoke-test run (auto-fallback) or when a mock spawn needs data.

Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest + scripts). Smoke test (21 checks) still green against the fetched corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scope R2 write token to a dedicated secret name

Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared CLOUDFLARE_API_TOKEN that landing-page-*.yml uses for Pages deploys) so the R2 write capability can be scoped to just the open-design-mocks bucket without bleeding extra capability into the Pages workflows.
Also hardcode the powerformer CF account_id directly in the workflow (account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID secret may point at a different account). Workflow now fails fast with an actionable error message + dashboard link if the secret is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate)

wrangler 4.x calls /memberships before any r2 action, requiring user:read scope. R2 "Object Read & Write" tokens deliberately lack that scope (defense in depth: a leaked token should not enumerate account-level resources). The workflow now uses the aws CLI talking straight to the R2 S3-compatible endpoint with SigV4, no membership lookup.

Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention).

End-to-end tested locally: PUT recording → manifest rebuild → manifest PUT → staging cleanup all green. aws CLI is pre-installed on ubuntu-latest, so no install step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout

These mocks were copy-pasted from synclo-explore, where they originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var convention. That brand-bleed is not appropriate in OD: rename the public env surface to OD_MOCKS_* (matching OD-native prefixes like OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS).
Renames:

    SYNCLO_EXPLORE_MOCK_TRACE            → OD_MOCKS_TRACE
    SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH   → OD_MOCKS_BY_PROMPT_HASH
    SYNCLO_EXPLORE_MOCK_POOL             → OD_MOCKS_POOL
    SYNCLO_EXPLORE_MOCK_SEED             → OD_MOCKS_SEED
    SYNCLO_EXPLORE_MOCK_NO_DELAY         → OD_MOCKS_NO_DELAY
    SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR   → OD_MOCKS_RECORDINGS_DIR
    SYNCLO_EXPLORE_MOCK_SMOKE_TRACE      → OD_MOCKS_SMOKE_TRACE
    SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING → OD_MOCKS_ALLOW_LOCAL_UPLOAD

Also drop the inline harvester usage from README. The harvester is an external CLI in nexu-io/agent-pr-explore; its README is the right place for langfuse-import flags, anonymization options, etc. OD only documents its own staging→PR→Action workflow.

Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end verified to route correctly.

Consumers of the OLD env names (notably the orchestrator in nexu-io/agent-pr-explore) need a matching rename. No back-compat shim here: the explore side has zero external users today and a one-line follow-up is cleaner than a permanent deprecation layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_*)

Missed in the prior commit (a30b868a), which only grepped the mocks/ subdir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: drop staging dir + GH Action; back to local-script upload

The staging-dir + Action design (added earlier in this PR) had a flaw the user caught: new recordings briefly entered the repo on their way through staging, leaving them in git history forever even after the Action cleanup commit removed them from HEAD. That defeats the whole point of moving recordings to R2.
Replace with the simpler local-maintainer flow:

    bash mocks/scripts/upload-recording.sh /path/to/<trace>.jsonl
    # → validates, wrangler r2 put, updates manifest.json, wrangler r2 put manifest
    git add mocks/manifest.json && git commit && git push
    # → only the ~200B manifest delta enters git

The wrangler-OAuth gate replaces the CI secret + Action duo. For a solo / small maintainer team this collapses the trust chain down to "do you have wrangler login to the powerformer account?": no GH secrets to rotate, no concurrency window to worry about, no inevitable repo-history bloat.

Deletes:
- .github/workflows/sync-mocks-to-r2.yml
- mocks/scripts/upload-to-r2.mjs (CI-only)
- mocks/scripts/add-recording.sh (staging helper, now obsolete)
- mocks/recordings-staging/ (empty dir, never to be repopulated)

Adds:
- mocks/scripts/upload-recording.sh

Kept:
- mocks/scripts/fetch-recordings.sh
- mocks/scripts/lib/manifest-utils.mjs (still used by upload-recording.sh)
- mocks/manifest.json (committed; the only mocks artifact in git)

End-to-end tested locally: re-uploading an existing recording is idempotent, manifest math is stable, fetch + smoke test still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address review - guard allowlist + safe ~/.amr + loud OD_MOCKS_TRACE typo

Three concrete issues raised across recent Siri-Ray (Looper) review threads on #3241:

1. scripts/guard.ts only allowlisted mocks/lib/ + mocks/mock-agent.mjs, leaving mocks/scripts/lib/manifest-utils.mjs outside the residual-JS guard. Result: Preflight failed on every push. Extend the allowlist to mocks/scripts/, following the same precedent as the lib/ entry directly above.
2. mocks/scripts/smoke-test.sh moved the caller's real ~/.amr to ~/.amr-smoke-backup, ran vela login (which writes a fake config), then ran rm -rf on the .amr and restored the backup. Two failure modes: a crash mid-run loses the user's real config, and re-running before the restore overwrites the backup with the fake login.
   Fix: sandbox vela login into a mktemp -d HOME via env (HOME=$amr_sandbox vela login). Never touches the real ~/.amr at all. trap cleans up.
3. mocks/lib/recording-picker.mjs silently fell through to prompt-hash → pool → random when OD_MOCKS_TRACE was set but did not match any recording (typo, prefix too short, corpus not fetched). Tests using a pinned trace would silently get a different trace, hiding regressions. Fix: throw an explicit error with the failing value + a pointer at fetch-recordings.sh.

Verified locally: pnpm guard prints "Residual JavaScript check passed", smoke-test still 21/21, ~/.amr mtime unchanged after run, typo on OD_MOCKS_TRACE now produces "mock-agent: OD_MOCKS_TRACE=... set but no matching recording in <dir>" on stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fetch-recordings: detect empty filter result before line-counting

printf '%s\n' on an empty string emits a single empty line, so the previous TOTAL=$(printf ... | grep -c "") math returned 1 on an empty $ENTRIES_TSV. A typo like `--agent no-such-agent` printed "Fetching up to 1 recordings", downloaded zero, and exited 0 ("ready"). Check `-z $ENTRIES_TSV` first.

Reproduced + fix verified per the reviewer thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address mrcfps review - goldens + provenance + contract check

Three durability improvements suggested in the PR #3241 top-level review:

## 1. Golden daemon-event snapshots (mocks/golden/*.events.json + apps/daemon/tests/mocks-golden.test.ts)

Smoke-test verified that mocks RUN; that catches crashes but not a parser change that semantically reshapes the events the daemon emits.
Commit the daemon-event sequence for 3 representative traces:

- claude 314d6833: median-complexity agent-browser session
- codex dcdff3b3: 14-tool refactor
- opencode 9a9522ec: 7-tool data-report

apps/daemon/tests/mocks-golden.test.ts spawns the mock, feeds stdout through the real createClaudeStreamHandler / createJsonEventStreamHandler, normalizes per-spawn volatile fields (only sessionId today, only on claude), and deep-equals against the committed snapshot. A parser regression fails the test loudly.

After an intentional parser change, regenerate:

    MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
    git diff mocks/golden/   # eyeball; commit if shapes match intent

## 2. Provenance fields on every manifest entry (mocks/scripts/lib/manifest-utils.mjs + mocks/manifest.json)

Augment inspectRecording() to write:

    captured_at            - ISO 8601 from existing meta.timestamp
    cli_version            - null until harvester writes it
    protocol_version       - null until harvester writes it
    anonymization_version  - null until harvester writes it

captured_at is now populated for all 179 existing entries from the meta event the harvester already emits. The harvester in nexu-io/agent-pr-explore is the next step for cli_version / protocol_version / anonymization_version. Once those are populated, consumers can detect when a recording is more than ~1 minor version behind the live CLI and flag it for re-harvest.

No matrix of (cli_version × agent) recordings, since that explodes maintenance. Just metadata per recording so trust decay is visible.

## 3. Real-CLI contract check (mocks/scripts/contract-check.sh + docs/MOCKS-CONTRACT-CHECK.md)

Mocks catch parser regressions against recordings; they do NOT catch recordings drifting away from the live agent CLI as that CLI evolves. The contract check spawns the real CLI alongside the mock with a fixed deterministic prompt + diffs top-level event-type distributions.
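To make "diffs top-level event-type distributions" concrete, here is a minimal sketch of that comparison. It is not the logic in `mocks/scripts/contract-check.sh`; `typeHistogram`, `driftReport`, and the `<non-json>` / `<no-type>` bucket names are hypothetical, and the sketch assumes both CLIs emit one JSON object per line with a top-level `type` field.

```typescript
// Hypothetical helper: count top-level `type` values in a JSONL stream.
function typeHistogram(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    let key = '<non-json>'; // bucket for plain-text or malformed lines
    try {
      const parsed: unknown = JSON.parse(line);
      key =
        typeof parsed === 'object' && parsed !== null && 'type' in parsed
          ? String((parsed as { type: unknown }).type)
          : '<no-type>';
    } catch {
      // keep the '<non-json>' bucket
    }
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical helper: event types that appear in only one of the two streams.
function driftReport(
  realStdout: string,
  mockStdout: string,
): { onlyInReal: string[]; onlyInMock: string[] } {
  const real = typeHistogram(realStdout);
  const mock = typeHistogram(mockStdout);
  return {
    onlyInReal: Array.from(real.keys()).filter(k => !mock.has(k)).sort(),
    onlyInMock: Array.from(mock.keys()).filter(k => !real.has(k)).sort(),
  };
}
```

A type listed under `onlyInReal` would suggest the live CLI has started emitting something the recordings predate; one under `onlyInMock` would suggest the recordings still carry an event the CLI has dropped or renamed. Comparing only the set of types, rather than exact counts, keeps the report stable when the two runs take a different number of tool steps.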
Deliberately human-driven, not cron-scheduled:

- costs real LLM tokens per invocation
- requires real CLI auth
- maintainer reads the output, not a regex

Suggested triggers per doc: real-CLI release notes mentioning "output format" / "stream" / "JSON" / "events"; before a parser refactor; ad-hoc when something looks off.

## Coverage note

README updated to position mocks as "deterministic protocol/parser coverage" (not "e2e replacement") per mrcfps framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks-golden test): drop import of non-exported ParserKind

Use plain string (the type alias is `string` anyway). Preflight typecheck on a31fa71a failed:

    tests/mocks-golden.test.ts(29,8): error TS2459: Module "../src/json-event-stream.js" declares "ParserKind" locally, but it is not exported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* recording-picker: structured OD_MOCKS_POOL + hard-fail no-match

Siri-Ray review: `OD_MOCKS_POOL=outcome:failed` was documented as a supported selection knob, but the matcher only checked tags and `meta.agent`, so the negative-path pool found 0 candidates and silently fell through to global random, validating against any recording instead of a failed trace.

Fix:

- Parse `<dim>:<value>` shape and route each dim to the right meta field: `outcome` → `meta.outcome`, `agent` → `meta.agent`, `skill` → `tags[]`. Bare values still fall back to tag substring.
- If the env was set and matched nothing, throw with the failing value and a jq one-liner for inspection. Same loud-fail policy as OD_MOCKS_TRACE; silent fallback was the original bug.

Verified locally: outcome:failed, agent:codex, skill:agent-browser all route correctly; outcome:nonsense throws the explicit error.
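The routing described in the fix can be sketched as follows. This is an illustration under stated assumptions, not the code in `mocks/lib/recording-picker.mjs`: `RecordingMeta`, `matchPool`, and the error wording are hypothetical, and the sketch assumes a skill may be stored in `tags` either bare (`agent-browser`) or prefixed (`skill:agent-browser`).

```typescript
// Hypothetical metadata shape; field names follow the commit text above.
interface RecordingMeta {
  id: string;
  agent: string;
  outcome: string;
  tags: string[];
}

// Route `<dim>:<value>` to the matching field and fail loudly when nothing matches.
function matchPool(pool: string, recordings: RecordingMeta[]): RecordingMeta[] {
  const sep = pool.indexOf(':');
  const dim = sep === -1 ? '' : pool.slice(0, sep);
  const value = sep === -1 ? pool : pool.slice(sep + 1);

  const matches = recordings.filter(r => {
    switch (dim) {
      case 'outcome':
        return r.outcome === value;
      case 'agent':
        return r.agent === value;
      case 'skill':
        // Assumption: tags may hold the bare skill name or the prefixed form.
        return r.tags.includes(value) || r.tags.includes(pool);
      default:
        // Bare values (and unknown dims) fall back to tag substring matching.
        return r.tags.some(t => t.includes(pool));
    }
  });

  if (matches.length === 0) {
    // No silent fallback to global random.
    throw new Error(`OD_MOCKS_POOL=${pool} set but matched no recording`);
  }
  return matches;
}
```

Throwing on an empty match is the point of the change: a negative-path test that asks for `outcome:failed` should never be satisfied by an arbitrary successful trace.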
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* contract-check.sh: fix lost $PROMPT in mock invocation

Siri-Ray review on e576074a: the mock side wrapped its pipeline in `bash -c "printf %s \"\$PROMPT\" | ..."`, but $PROMPT was a parent shell variable, not exported, so the child bash expanded it to an empty string. Result: the contract check sent the real prompt to the real CLI and an empty string to the mock, defeating the same-input invariant the whole script rests on. The bug also let the mock randomly select a different trace whenever a maintainer happened to have OD_MOCKS_BY_PROMPT_HASH=1 in their env.

Fix: drop the inner bash -c entirely; use a subshell that scopes the PATH overlay and pipes printf into the PATH-resolved mock binary directly. The subshell limits the PATH change without var-passing.

Verified locally: with prompt-A the mock picks trace 54ec02ee via hash; prompt-B → 2667e851 via hash; empty prompt (old broken behavior) → random. This confirms the prompt is now actually reaching the mock under PATH overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 19:04:39 +07:00 · 2026-05-29 15:17:20 +08:00 · 2026-05-29 15:17:20 +08:00 · da19ff3ca0
commit da19ff3ca0
parent 4b7c018a9b
45 changed files with 6888 additions and 1 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -14,7 +14,7 @@ This file is the single source of truth for agents entering this repository. Rea
 ## Workspace directories

 - Workspace packages come from `pnpm-workspace.yaml`: `apps/*`, `packages/*`, `tools/*`, and `e2e`.
- Top-level content directories: `skills/` (functional skills the agent invokes mid-task — utilities, briefs, packagers; see `skills/AGENTS.md`), `design-templates/` (rendering catalogue: decks, prototypes, image/video/audio templates; see `design-templates/AGENTS.md` and `specs/current/skills-and-design-templates.md`), `design-systems/` (brand `DESIGN.md` files), `craft/` (universal brand-agnostic craft rules a skill can opt into via `od.craft.requires`).
+- Top-level content directories: `skills/` (functional skills the agent invokes mid-task — utilities, briefs, packagers; see `skills/AGENTS.md`), `design-templates/` (rendering catalogue: decks, prototypes, image/video/audio templates; see `design-templates/AGENTS.md` and `specs/current/skills-and-design-templates.md`), `design-systems/` (brand `DESIGN.md` files), `craft/` (universal brand-agnostic craft rules a skill can opt into via `od.craft.requires`), `mocks/` (replay-based mock CLIs for `opencode`/`claude`/`codex`/`gemini`/`cursor-agent`/`deepseek`/`qwen`/`grok`, the ACP family `devin`/`hermes`/`kilo`/`kimi`/`kiro`/`vibe`, and the AMR `vela` CLI (login + models + ACP), built from anonymized Langfuse traces — PATH-overlay drop-in for tests and self-validation; see `mocks/README.md`).
 - `apps/web` is the Next.js 16 App Router + React 18 web runtime; do not restore `apps/nextjs`.
 - `apps/daemon` is the local privileged daemon and `od` bin. It owns `/api/*`, agent spawning, skills, design systems, artifacts, and static serving.
 - `apps/desktop` is the Electron shell; it discovers the web URL through sidecar IPC.
@@ -167,6 +167,7 @@ root `pnpm tools-pr` script without a new explicit maintainer decision.
 ## Validation strategy

 - After package, workspace, or command-entry changes, run `pnpm install` so workspace links and generated dist entries stay fresh.
+- For agent-stream / parser changes (`apps/daemon/src/claude-stream.ts`, `json-event-stream.ts`, `qoder-stream.ts`, etc.), replay a recorded session through the mock CLIs in `mocks/` to verify event shapes round-trip without burning provider budget. PATH-overlay activation: `export PATH="$PWD/mocks/bin:$PATH" OD_MOCKS_TRACE=<8-char-id> OD_MOCKS_NO_DELAY=1`. See `mocks/README.md` for the trace catalog and selection knobs.
 - Treat every `pnpm-lock.yaml` change as requiring a Nix pnpm deps hash refresh check. `nix/pnpm-deps.nix` is a generated lock artifact; use `pnpm nix:update-hash` only when intentionally maintaining Nix packaging, then re-run `nix flake check --print-build-logs --keep-going`. Contributors without Nix can rely on the PR `Validate workspace` gate, which now uploads or auto-applies the generated hash-only fix when possible.
 - Before marking regular work ready, run at least `pnpm guard` and `pnpm typecheck`, plus the package-scoped tests/builds that match the files changed. Do not use or add root `pnpm test`/`pnpm build` aliases.
 - For local web runtime loops, prefer `pnpm tools-dev run web --daemon-port <port> --web-port <port>`.
--- a/apps/daemon/tests/mocks-golden.test.ts
+++ b/apps/daemon/tests/mocks-golden.test.ts
@@ -0,0 +1,113 @@
+// Golden daemon-event snapshots — addresses the regression-signal point
+// from review on #3241: smoke-testing that mocks RUN catches only crashes
+// or protocol-level garbage; it does NOT catch a parser change that
+// semantically reshapes the events the daemon emits to the UI.
+//
+// This test replays representative recordings through the actual daemon
+// stream handlers and asserts the emitted event sequence matches a
+// committed `mocks/golden/<trace>.events.json`. A parser tweak that
+// drops a tool_result, changes a usage shape, or renames an event type
+// fails this test loudly.
+//
+// Update flow when a parser change is INTENTIONAL:
+//   MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
+// then `git diff mocks/golden/` and commit the new shapes.
+//
+// Auto-skips when the recording corpus hasn't been fetched yet (see
+// `mocks/scripts/fetch-recordings.sh`); CI that exercises this test must
+// fetch first.
+
+import { describe, it, expect } from 'vitest';
+import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'node:fs';
+import { spawnSync } from 'node:child_process';
+import { dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+import { createClaudeStreamHandler } from '../src/claude-stream.js';
+import { createJsonEventStreamHandler } from '../src/json-event-stream.js';
+
+const HERE = dirname(fileURLToPath(import.meta.url));
+const REPO = join(HERE, '../../..');
+const MOCK_AGENT = join(REPO, 'mocks/mock-agent.mjs');
+const GOLDEN_DIR = join(REPO, 'mocks/golden');
+const RECORDINGS_DIR = join(REPO, 'mocks/recordings');
+
+// Median-tool-count successful traces per agent (selected from manifest
+// 2026-05-29). Each one's `.jsonl` lives in `mocks/recordings/` after
+// `bash mocks/scripts/fetch-recordings.sh`.
+const CASES: Array<{ agent: 'claude' | 'codex' | 'opencode'; trace: string }> = [
+  { agent: 'claude',   trace: '314d6833-0377-4ac4-ba11-2b8d7eca5511' },
+  { agent: 'codex',    trace: 'dcdff3b3-cd39-4dcd-be83-372830a29639' },
+  { agent: 'opencode', trace: '9a9522ec-575f-432f-aeed-efc491e900aa' },
+];
+
+// Replace per-spawn-volatile fields with stable sentinels so the
+// snapshot stays diffable across runs. Currently only `sessionId` —
+// claude's mock emits a fresh UUID every spawn. Opencode/codex carry
+// the recording's own session/thread id so they're already stable.
+function normalizeVolatile(events: unknown[]): unknown[] {
+  return events.map(e => {
+    if (!e || typeof e !== 'object') return e;
+    const rec = e as Record<string, unknown>;
+    const out: Record<string, unknown> = { ...rec };
+    if ('sessionId' in out) out.sessionId = '<normalized>';
+    return out;
+  });
+}
+
+function runMockAndCollectEvents(agent: string, trace: string): unknown[] {
+  // Force no-delay so the spawn returns quickly + deterministically.
+  const proc = spawnSync(
+    process.execPath,
+    [MOCK_AGENT, '--as', agent, '--no-delay'],
+    {
+      env: { ...process.env, OD_MOCKS_TRACE: trace, OD_MOCKS_NO_DELAY: '1' },
+      input: 'golden-test-prompt',
+      encoding: 'utf-8',
+      timeout: 30_000,
+      maxBuffer: 50 * 1024 * 1024,
+    },
+  );
+  if (proc.status !== 0) {
+    throw new Error(
+      `mock-agent --as ${agent} exit ${proc.status}: ${proc.stderr.slice(0, 500)}`,
+    );
+  }
+
+  const events: unknown[] = [];
+  const sink = (e: unknown) => events.push(e);
+  const handler =
+    agent === 'claude'
+      ? createClaudeStreamHandler(sink)
+      : createJsonEventStreamHandler(agent, sink);
+  handler.feed(proc.stdout);
+  return normalizeVolatile(events);
+}
+
+const recordingsAvailable =
+  existsSync(RECORDINGS_DIR) &&
+  CASES.every(c => existsSync(join(RECORDINGS_DIR, `${c.trace}.jsonl`)));
+
+describe.skipIf(!recordingsAvailable)(
+  'mocks goldens — daemon event shape regression',
+  () => {
+    for (const { agent, trace } of CASES) {
+      it(`${agent} ${trace.slice(0, 8)}`, () => {
+        const events = runMockAndCollectEvents(agent, trace);
+        const goldenPath = join(GOLDEN_DIR, `${trace}.events.json`);
+
+        if (process.env.MOCKS_GOLDEN_UPDATE === '1') {
+          mkdirSync(GOLDEN_DIR, { recursive: true });
+          writeFileSync(
+            goldenPath,
+            JSON.stringify({ agent, trace, events }, null, 2) + '\n',
+          );
+          return;
+        }
+
+        const golden = JSON.parse(readFileSync(goldenPath, 'utf-8'));
+        expect({ agent, trace, events }).toEqual(golden);
+      });
+    }
+  },
+);
--- a/docs/MOCKS-CONTRACT-CHECK.md
+++ b/docs/MOCKS-CONTRACT-CHECK.md
@ -0,0 +1,108 @@
+# Real-CLI contract check
+
+The replay mocks under `mocks/` impersonate real agent CLIs by emitting
+recorded traces in each CLI's native protocol. They're great for parser
+regression coverage but they can silently drift away from the real CLI
+when:
+
+- An agent CLI ships a new event `type` that the mock doesn't know about.
+- A field gets renamed (`sessionID` → `sessionId`) and the mock keeps
+  emitting the old name. OD's parser may have been updated to accept
+  both, so smoke tests stay green, but new fields aren't surfaced.
+- A protocol version bump changes the shape of `usage` / tool calls /
+  init blocks.
+
+The contract check is the periodic ritual that catches that drift.
+
+## Scope
+
+It is **not** a CI gate. The check:
+
+- Costs real LLM tokens (a few cents per agent per run).
+- Requires the real CLI installed + authenticated locally or on a
+  maintainer-controlled runner.
+- Wants a human to eyeball the output, not a regex.
+
+Treat it like a maintenance task: roughly monthly is plenty, plus ad hoc
+whenever a relevant CLI's release notes mention output-format changes.
+
+## How to run
+
+```bash
+bash mocks/scripts/contract-check.sh claude
+bash mocks/scripts/contract-check.sh codex
+bash mocks/scripts/contract-check.sh opencode
+```
+
+The script:
+
+1. Resolves the real CLI binary (ignoring the `mocks/bin/` PATH overlay).
+2. Sends a fixed prompt to the real CLI: *"List the entries of the current
+   working directory and tell me how many JSON files are present."*
+3. Runs the same prompt through the mock CLI.
+4. Prints a side-by-side distribution of top-level event `type`
+   values from both.
+5. Leaves both raw JSONL outputs in `/tmp` for you to `diff`.
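
The distribution comparison in step 4 can be sketched as below. This is an illustrative TypeScript sketch, not the contents of `contract-check.sh`: `typeDistribution` and `sideBySide` are hypothetical names, and it assumes each output line is one JSON object with a top-level `type` string.

```typescript
// Hypothetical helper: count top-level `type` values in JSONL text.
function typeDistribution(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    let type = '<unparseable>';
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (parsed !== null && typeof parsed === 'object') {
        const t = (parsed as Record<string, unknown>).type;
        type = typeof t === 'string' ? t : '<no type>';
      } else {
        type = '<no type>';
      }
    } catch {
      // Not valid JSON: keep the '<unparseable>' bucket.
    }
    counts.set(type, (counts.get(type) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical helper: one row per event type seen on either side.
function sideBySide(real: Map<string, number>, mock: Map<string, number>): string[] {
  const types = Array.from(
    new Set(Array.from(real.keys()).concat(Array.from(mock.keys()))),
  ).sort();
  return types.map(t => `${t}\treal=${real.get(t) ?? 0}\tmock=${mock.get(t) ?? 0}`);
}
```

A row with `mock=0` is the case to investigate first; a row with `real=0` is usually harmless.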
+
+## What to look for
+
+Compare the two `type` distributions. Acceptable differences:
+
+- Counts vary slightly (mock plays a single recorded trace, real CLI
+  may take a different number of turns for the same prompt).
+- Mock emits a superset of the real CLI's event types — the recordings
+  span historical CLI versions.
+
+**Red flags**:
+
+- Real CLI emits a `type` value the mock never produces → the mock
+  needs a new event handler in `mocks/lib/format-<agent>.mjs`.
+- Real CLI's event uses different field names than the mock → either
+  the real CLI changed and the parser may already be out of sync, or
+  the mock is drifting toward an internal convention.
+- Mock crashes / emits nothing → the agent's `--no-delay` path is
+  broken.
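
The first two red flags can be checked mechanically once both outputs are parsed. The sketch below is a rough aid under stated assumptions, not part of the shipped script: `fieldsByType` and `redFlags` are hypothetical names, and only top-level field names are compared.

```typescript
type FieldsByType = Map<string, Set<string>>;

// Collect the union of top-level field names seen for each event `type`.
function fieldsByType(events: Array<Record<string, unknown>>): FieldsByType {
  const out: FieldsByType = new Map();
  for (const event of events) {
    const type = typeof event.type === 'string' ? event.type : '<no type>';
    const fields = out.get(type) ?? new Set<string>();
    for (const key of Object.keys(event)) fields.add(key);
    out.set(type, fields);
  }
  return out;
}

// Report types only the real CLI emits, and fields the real CLI uses that
// the mock never emits for the same type. Mock-only types are not flagged.
function redFlags(real: FieldsByType, mock: FieldsByType): string[] {
  const flags: string[] = [];
  real.forEach((realFields, type) => {
    const mockFields = mock.get(type);
    if (!mockFields) {
      flags.push(`type "${type}" is emitted only by the real CLI`);
      return;
    }
    realFields.forEach(field => {
      if (!mockFields.has(field)) {
        flags.push(`type "${type}": field "${field}" missing from mock`);
      }
    });
  });
  return flags;
}
```

An empty result does not prove the contract holds, since nested shapes are not inspected; it only rules out the most visible kinds of drift.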
+
+## Suggested cadence
+
+No fixed schedule, no automated cron — the check is human-driven:
+
+- **On real-CLI release**: when Anthropic / OpenAI / OpenCode publishes
+  a release whose notes mention "output format" / "JSON" / "stream" /
+  "events" / "API", run the affected agent's check. This is the
+  highest-signal trigger.
+- **Before a parser refactor**: lock the contract before touching
+  `apps/daemon/src/claude-stream.ts` / `json-event-stream.ts`, so a
+  post-refactor failure means "I broke the parser" rather than "the
+  real CLI had already drifted and the old parser was silently
+  tolerating it".
+- **Ad-hoc**: if something feels off — UI suddenly missing a tool call,
+  duplicate events, unfamiliar field names in logs — a contract check
+  is the fast first step.
+
+Putting this on a cron would burn LLM tokens every run with no human
+review of the output, defeating the point. The check is an artifact a
+maintainer reads, not a CI gate.
+
+## Future improvements
+
+The current script only compares top-level `type` distributions
+because a deeper structural diff is hard to do without a schema.
+Possible follow-ups:
+
+1. **JSON-shape schema per agent** — generate a JSON Schema from the
+   mock formatters' output, run a validator against real-CLI output,
+   report violations with field paths.
+2. **Recorded-then-replayed delta** — capture the real CLI's output
+   for the fixed prompt, save under `mocks/contracts/<agent>.golden.jsonl`,
+   then in CI replay that golden through the daemon parser and assert
+   no parser errors. Cheaper than calling the LLM every CI run but
+   only catches *parser* drift, not *CLI* drift.
+
+Neither is implemented today.
+
+## Related
+
+- `mocks/scripts/contract-check.sh` — the script itself.
+- `apps/daemon/tests/mocks-golden.test.ts` — daemon-event golden
+  snapshots (catches parser regressions against the mocks, complementary
+  to this check which catches mock-vs-real drift).
--- a/mocks/README.md
+++ b/mocks/README.md
@ -0,0 +1,475 @@
+# `mocks/` — replay-based mock CLIs for OD's supported agents
+
+A drop-in replacement for the real agent CLIs (`claude`, `opencode`,
+`codex`, `gemini`, `cursor-agent`, `deepseek`, `qwen`, `grok`, the
+ACP family `devin` / `hermes` / `kilo` / `kimi` / `kiro` / `vibe`, and
+the AMR `vela` CLI) that replays pre-recorded sessions in each CLI's
+native protocol — stdout streaming for most, JSON-RPC over stdio for
+ACP and AMR. **Zero LLM tokens.**
+
+Used by:
+
+- **E2E tests** in `apps/daemon/tests/` — run the full chat-server
+  pipeline against a known agent trace, assert UI events / artifacts.
+- **Local self-tests during development** — iterate on `chat-routes.ts`,
+  `claude-stream.ts`, `json-event-stream.ts` parser changes without
+  burning provider budget.
+- **Demo / onboarding** — show what a 17-tool `claude` editing session
+  looks like end-to-end, offline.
+- **Regression harness** — replay the same trace before and after a
+  charter / parser change; diff the events the daemon surfaces.
+
+The recordings are anonymized exports from open-design's Langfuse
+project (179 traces across 9 agents and 9 skill tags, including
+`default`, as of this commit).
+
+---
+
+## tl;dr
+
+```bash
+# First-time setup — pull the recording corpus from R2 (~30s, 4.5MB):
+bash mocks/scripts/fetch-recordings.sh
+# Subsequent runs hit the local cache (sha256-verified, instant).
+
+# Make the mock CLIs override the real ones for this shell:
+export PATH="$PWD/mocks/bin:$PATH"
+
+# Pick any recording to play back (8-char prefix OK):
+export OD_MOCKS_TRACE=04097377
+
+# Speed up replay (skip inter-event sleeps):
+export OD_MOCKS_NO_DELAY=1
+
+# Now anything that spawns opencode/claude/codex gets the recording:
+echo "any prompt body" | opencode run
+echo "any prompt"     | claude -p --output-format=stream-json
+echo "any prompt"     | codex exec
+```
+
+The mock binaries are bash wrappers that exec
+`node mocks/mock-agent.mjs --as <agent>`. Anything fed to stdin is
+discarded by the renderer but used by the recording picker (see hash
+mode below).
+
+## Recordings live on R2, not in this repo
+
+The 179-recording corpus (~4.5 MB) is hosted in the `open-design-mocks`
+Cloudflare R2 bucket and fetched **on demand**: `pnpm install` does NOT
+pull it, so the repo stays small. Recordings only land in
+`mocks/recordings/` when:
+
+1. You run `bash mocks/scripts/fetch-recordings.sh` directly, OR
+2. `bash mocks/scripts/smoke-test.sh` runs and the dir is empty
+   (auto-fetch fallback).
+
+A mock binary never fetches on its own: if a spawn finds no data, it
+errors with a pointer to the fetch script (no silent failure).
+
+This is by design: contributors who don't touch agent code don't pay
+the fetch cost. CI jobs that DO touch agent code (`apps/daemon/tests/`
+parser changes, etc.) run the fetch as a quick pre-step and cache
+`mocks/recordings/` between runs.
+
+```bash
+# Fetch everything (parallel, sha256-verified, idempotent):
+bash mocks/scripts/fetch-recordings.sh
+
+# Fetch a subset:
+bash mocks/scripts/fetch-recordings.sh --agent claude       # 57 claude traces
+bash mocks/scripts/fetch-recordings.sh --outcome failed     # 35 failed-path traces
+bash mocks/scripts/fetch-recordings.sh --skill agent-browser
+
+# Override cache location (e.g. share across multiple OD checkouts):
+OD_MOCKS_CACHE_DIR=~/.cache/od-mocks bash mocks/scripts/fetch-recordings.sh
+```
+
+Manifest at `mocks/manifest.json` is the committed source of truth —
+it lists every recording's `trace_id`, `sha256`, `bytes`, `agent`,
+`outcome`, `skills`, `multi_turn`, plus histograms over the corpus.
+Tooling reads this; you don't have to.
+
+### Provenance per recording
+
+Beyond identity (`trace_id`, `sha256`), each manifest entry carries
+fixture-trust signals so consumers can decide whether the recording
+is still meaningful as the real CLIs evolve:
+
+| Field | Meaning |
+|---|---|
+| `captured_at` | ISO 8601 timestamp of the original session — populated for all 179 current entries |
+| `cli_version` | The CLI version the trace was captured against (e.g. `"claude-code 1.0.65"`) — populated only on traces the harvester writes it to, null otherwise |
+| `protocol_version` | Stream-format version (`"claude-stream-json/v1"`, `"opencode/json-event-stream"`) — populated by harvester |
+| `anonymization_version` | Which anonymizer pass scrubbed the recording — populated by harvester |
+
+For now the harvester-written fields (`cli_version`,
+`protocol_version`, `anonymization_version`) are null on most of the
+existing 179 entries; the harvester in
+[nexu-io/agent-pr-explore][harvester] still needs to be taught to
+write them. Once a recording's `cli_version` falls behind the actual
+CLI by more than one minor version, treat it as a candidate for
+re-harvest.
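
The "more than one minor version behind" rule could be checked along these lines. This is a sketch only: `majorMinor` and `reharvestStatus` are hypothetical names, no such check ships in this commit, and the handling of major-version differences is an assumption.

```typescript
// Extract [major, minor] from strings such as "claude-code 1.0.65".
function majorMinor(version: string): [number, number] | null {
  const match = /(\d+)\.(\d+)(?:\.\d+)?/.exec(version);
  return match ? [Number(match[1]), Number(match[2])] : null;
}

// Assumptions: a newer installed major version always counts as stale, and
// a null `cli_version` (not yet written by the harvester) is "unknown".
function reharvestStatus(
  recorded: string | null,
  installed: string,
): 'fresh' | 'stale' | 'unknown' {
  if (recorded === null) return 'unknown';
  const rec = majorMinor(recorded);
  const cur = majorMinor(installed);
  if (!rec || !cur) return 'unknown';
  if (cur[0] !== rec[0]) return cur[0] > rec[0] ? 'stale' : 'fresh';
  return cur[1] - rec[1] > 1 ? 'stale' : 'fresh';
}
```

With most entries still carrying a null `cli_version`, the common answer today would be `unknown`, which is itself a reason to prioritize the harvester change.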
+
+### Golden daemon-event snapshots
+
+`mocks/golden/<trace>.events.json` holds the exact event sequence the
+OD daemon handler emits when fed the mock CLI's output for that trace.
+`apps/daemon/tests/mocks-golden.test.ts` diffs it on every
+`pnpm --filter @open-design/daemon test` run.
+
+A parser refactor that semantically changes events (drops a field,
+renames `sessionId`, stops emitting `turn_end`) fails the diff loudly.
+After an intentional parser change, regenerate:
+
+```bash
+MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
+git diff mocks/golden/    # eyeball the new shapes
+git add mocks/golden/ && git commit -m "mocks: refresh goldens for <parser change>"
+```
+
+Per-spawn volatile fields (currently just claude's generated
+`sessionId`) are replaced with `"<normalized>"` so the snapshot stays
+stable. See `mocks/golden/README.md` for the coverage rationale.
+
+### Real-CLI contract check
+
+The mocks catch parser regressions against the recordings; they do
+**not** catch the recordings themselves drifting away from the live
+agent CLIs. For that, `mocks/scripts/contract-check.sh` spawns a real
+CLI alongside the mock with a fixed prompt and prints a side-by-side
+event-type distribution.
+
+This is human-driven and costs real LLM tokens — run on a real-CLI
+release or before a parser refactor, not on a cron. Full doc:
+[`docs/MOCKS-CONTRACT-CHECK.md`](../docs/MOCKS-CONTRACT-CHECK.md).
+
+---
+
+## What gets emitted
+
+Each renderer matches the EXACT event shapes the OD daemon expects, as
+verified line-by-line against the parsers in `apps/daemon/src/`:
+
+| CLI | OD streamFormat | Parser source |
+|---|---|---|
+| `opencode`        | `json-event-stream` (opencode kind)     | `json-event-stream.ts:handleOpenCodeEvent`   |
+| `codex`           | `json-event-stream` (codex kind)        | `json-event-stream.ts:handleCodexEvent`      |
+| `claude`          | `claude-stream-json`                    | `claude-stream.ts:createClaudeStreamHandler` |
+| `gemini`          | `json-event-stream` (gemini kind)       | `json-event-stream.ts:handleGeminiEvent`     |
+| `cursor-agent`    | `json-event-stream` (cursor-agent kind) | `json-event-stream.ts:handleCursorEvent`     |
+| `deepseek` `qwen` `grok` | `plain`                          | `server.ts` (raw stdout = final assistant text) |
+| `devin` `hermes` `kilo` `kimi` `kiro` `vibe` | `acp-json-rpc` | `acp.ts:attachAcpSession`                       |
+| `vela` (AMR) | `acp-json-rpc` + `login` / `models` subcommands | `runtimes/defs/amr.ts` + `apps/daemon/tests/fixtures/fake-vela.mjs` (sibling stub) |
+
+> **Note on `gemini` and `cursor-agent`**: OD's parsers for these two
+> agents do NOT recognize tool-call events — only init / assistant text /
+> usage. The renderers therefore emit ONLY the final assistant text wrapped
+> in the expected init/text/usage envelope. Tool calls present in the
+> source recording are silently dropped (which matches the real CLI's UI
+> behavior — these agents don't surface tools in OD's chat view).
+
+> **Note on ACP agents** (`devin` / `hermes` / `kilo` / `kimi` / `kiro` /
+> `vibe`): These do NOT use a one-way stdout stream; they speak JSON-RPC 2.0 over stdio.
+> OD's daemon sends `initialize` → `session/new` → (optional `session/set_model`)
+> → `session/prompt`; the mock responds in order, streams text via
+> `session/update` notifications carrying `agent_message_chunk` parts,
+> then responds to the prompt request with usage stats. Tool calls
+> aren't part of the ACP protocol on this path (tools surface via MCP or
+> other side channels), so they're dropped from playback.
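
The exchange described in the ACP note can be pictured with the sketch below. The method names come from the note; every `params` field, the `modelId` key, and the exact `session/update` payload shape are assumptions made for illustration, and `daemonRequests` / `textChunkNotification` are hypothetical names rather than functions in `format-acp.mjs`.

```typescript
type JsonRpcRequest = {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params: Record<string, unknown>;
};
type JsonRpcNotification = {
  jsonrpc: '2.0';
  method: string;
  params: Record<string, unknown>;
};

// Requests in the order the note describes; params are placeholders.
function daemonRequests(sessionId: string, prompt: string, model?: string): JsonRpcRequest[] {
  const steps: Array<[string, Record<string, unknown>]> = [
    ['initialize', {}],
    ['session/new', {}],
  ];
  if (model !== undefined) steps.push(['session/set_model', { sessionId, modelId: model }]);
  steps.push(['session/prompt', { sessionId, prompt }]);
  return steps.map(
    ([method, params], index): JsonRpcRequest => ({ jsonrpc: '2.0', id: index + 1, method, params }),
  );
}

// Mock side: a notification (no `id`) carrying one chunk of assistant text.
function textChunkNotification(sessionId: string, text: string): JsonRpcNotification {
  return {
    jsonrpc: '2.0',
    method: 'session/update',
    // Assumed payload shape for an `agent_message_chunk` update.
    params: {
      sessionId,
      update: { sessionUpdate: 'agent_message_chunk', content: { type: 'text', text } },
    },
  };
}
```

The structural point is that chunks travel as notifications without an `id`, while the usage stats arrive in the response that reuses the `id` of the `session/prompt` request.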
+
+> **Note on `vela` (AMR)**: vela is the bin OD's AMR runtime spawns. It
+> extends the generic ACP shape with `agentCapabilities` + `models`
+> blocks in `initialize` / `session/new`, plus a **strict set_model
+> gate**: `session/prompt` is rejected with JSON-RPC error -32602
+> until `session/set_model` (or `session/set_config_option`) has been
+> called for the current sessionId, mirroring the real vela 0.0.1
+> contract.
+>
+> vela also has two non-ACP subcommands:
+>
+> - `vela login` → writes `~/.amr/config.json` with a fake profile so
+>   OD's daemon login route + `AmrLoginPill` poller see the same on-disk
+>   projection production produces.
+> - `vela models` → prints the production-shaped `public_model_*    vela`
+>   catalog.
+>
+> Error injection envs (kept in sync with
+> `apps/daemon/tests/fixtures/fake-vela.mjs`):
+> `FAKE_VELA_SESSION_NEW_ERROR` / `FAKE_VELA_SET_MODEL_ERROR` /
+> `FAKE_VELA_PROMPT_ERROR` / `FAKE_VELA_LOGIN_FAIL` /
+> `FAKE_VELA_REQUIRE_SET_MODEL=0`.
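
The set_model gate amounts to a small per-session state machine. The sketch below illustrates that idea and is not the code in `format-vela.mjs`: `createSetModelGate` is a hypothetical name, the error message text is invented, and mapping `FAKE_VELA_REQUIRE_SET_MODEL=0` to `requireSetModel = false` is an assumption about what that env var does.

```typescript
type RpcError = { code: number; message: string };

function createSetModelGate(requireSetModel: boolean = true) {
  // Session ids for which a model has been selected.
  const ready = new Set<string>();
  return {
    // Call when `session/set_model` or `session/set_config_option` arrives.
    markModelSet(sessionId: string): void {
      ready.add(sessionId);
    },
    // Returns the error to send for `session/prompt`, or null to proceed.
    checkPrompt(sessionId: string): RpcError | null {
      if (!requireSetModel || ready.has(sessionId)) return null;
      return {
        code: -32602, // JSON-RPC "Invalid params"
        message: 'session/set_model must be called before session/prompt',
      };
    },
  };
}
```

Keying the gate by sessionId matters: selecting a model in one session must not unlock prompts in another.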
+
+For the formats that carry tool calls (`opencode`, `codex`, `claude`),
+each tool call from the recording is rendered with its original input
+arguments and tool output. The agent's assistant text is rendered as
+the final message.
+
+---
+
+## Recording selection
+
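+Driven by env vars. The first three rows select the recording, in
+priority order; the remaining rows tune randomness, timing, and location: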
+
+| Env | Behavior |
+|---|---|
+| `OD_MOCKS_TRACE=<id>` | Always play this trace. 8-char prefix OK. |
+| `OD_MOCKS_BY_PROMPT_HASH=1` + stdin prompt | Deterministic by `sha256(prompt) % len(all)`. Same prompt → same trace. Useful for "stable answer per question" tests. |
+| `OD_MOCKS_POOL=<tag>` | Random within the tag pool. Examples: `agent:claude`, `skill:agent-browser`, `outcome:failed`. |
+| `OD_MOCKS_SEED=<str>` | Makes "random" picks reproducible across runs. |
+| `OD_MOCKS_NO_DELAY=1` | Skip inter-event waits. |
+| `OD_MOCKS_RECORDINGS_DIR=<path>` | Override the recordings dir. |
+
+If none are set, a uniformly random recording is played each invocation.
+
+The mock binary announces the picked trace id on stderr:
+
+```
+[mock-opencode] picked 04097377… via fixed
+```
+
+This line is invisible to OD's stdout parser but useful for "wait, why
+did my test get the FAQ-fix trace?" debugging.
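
The selection priority can be summarized in code. This is a sketch of the documented behavior, not the contents of `recording-picker.mjs`: `pickRecording` is a hypothetical name, the hash function and random source are injected so the block has no imports, only `via: 'fixed'` is a label confirmed by the stderr line above (the other `via` values are invented), and the hash reduction uses just the first 8 hex digits, which may differ from the real picker.

```typescript
type Entry = { trace_id: string; tags: string[] };
type Env = Record<string, string | undefined>;
type Picked = { entry: Entry; via: string };

function pickRecording(
  env: Env,
  prompt: string,
  all: Entry[],
  sha256Hex: (text: string) => string,
  random: () => number = Math.random,
): Picked {
  if (all.length === 0) {
    throw new Error('no recordings; run mocks/scripts/fetch-recordings.sh');
  }
  // 1. A pinned trace wins; a prefix of the id is enough.
  const fixed = env.OD_MOCKS_TRACE;
  if (fixed) {
    const entry = all.find(e => e.trace_id.startsWith(fixed));
    if (!entry) throw new Error(`no recording matches ${fixed}`);
    return { entry, via: 'fixed' };
  }
  // 2. Same prompt, same trace. Simplified reduction (see lead-in).
  if (env.OD_MOCKS_BY_PROMPT_HASH === '1') {
    const index = parseInt(sha256Hex(prompt).slice(0, 8), 16) % all.length;
    return { entry: all[index], via: 'hash' };
  }
  // 3. Random within a tag pool, else random over everything.
  const tag = env.OD_MOCKS_POOL;
  const pool = tag ? all.filter(e => e.tags.indexOf(tag) !== -1) : all;
  if (pool.length === 0) throw new Error(`pool ${tag} is empty`);
  return { entry: pool[Math.floor(random() * pool.length)], via: tag ? 'pool' : 'random' };
}
```

Passing a seeded generator as `random` is one way to get the reproducibility that `OD_MOCKS_SEED` promises.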
+
+---
+
+## Recording catalog
+
+The recordings live as one JSONL file per Langfuse trace under
+`recordings/`. Each file starts with a `meta` event carrying:
+
+```json
+{
+  "type": "meta",
+  "source": {"provider": "langfuse", "trace_id": "...", "project_id": "..."},
+  "agent": "claude" | "codex" | "opencode" | "gemini" | "cursor-agent" | "qwen" | "copilot" | "deepseek" | "antigravity",
+  "model": "...",
+  "outcome": "succeeded" | "failed" | "errored" | "interrupted",
+  "duration_ms": 33620,
+  "tool_call_count": 17,
+  "error_count": 0,
+  "total_tokens": 12345,
+  "tags": ["agent:claude", "skill:agent-browser", "open-design", ...],
+  "user_input": "...",
+  "session_id": "..."
+}
+```
+
+Subsequent events are `tool_call`, `tool_result`, and `report` (the
+final assistant text).
+
+### Indexed metadata
+
+`mocks/manifest.json` is a flat manifest with one entry per recording
+plus histograms over all recordings, committed to the repo. It's also
+mirrored to R2 alongside the .jsonl files so consumers can fetch the
+current catalog without cloning. Query with `jq`:
+
+```bash
+# Multi-turn claude sessions (first 50 lines of output)
+jq '.entries[] | select(.agent=="claude" and .multi_turn==true)' \
+  mocks/manifest.json | head -50
+
+# Failed codex traces (negative-path tests)
+jq '.entries[] | select(.agent=="codex" and .outcome=="failed") | .trace_id' \
+  mocks/manifest.json
+
+# Agent-browser skill, sorted by tool count desc
+jq '[.entries[] | select(.skills | index("agent-browser"))] | sort_by(-.tool_count)' \
+  mocks/manifest.json
+```
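
Test code may prefer to make the same selections without shelling out to `jq`. The sketch below assumes the manifest's `entries` array has already been parsed and that each entry carries the fields named in this README; `ManifestEntry`, `TraceFilter`, and `selectTraces` are hypothetical names, not an API this commit exports.

```typescript
type ManifestEntry = {
  trace_id: string;
  agent: string;
  outcome: string;
  skills: string[];
  multi_turn: boolean;
};

type TraceFilter = {
  agent?: string;
  outcome?: string;
  skill?: string;
  multiTurn?: boolean;
};

// Return the trace ids matching every criterion that is set.
function selectTraces(entries: ManifestEntry[], filter: TraceFilter): string[] {
  return entries
    .filter(e => filter.agent === undefined || e.agent === filter.agent)
    .filter(e => filter.outcome === undefined || e.outcome === filter.outcome)
    .filter(e => filter.skill === undefined || e.skills.indexOf(filter.skill) !== -1)
    .filter(e => filter.multiTurn === undefined || e.multi_turn === filter.multiTurn)
    .map(e => e.trace_id);
}
```

For example, `selectTraces(entries, { agent: 'codex', outcome: 'failed' })` mirrors the second `jq` query above.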
+
+### Headline stats (current dataset)
+
+| Dimension | Distribution |
+|---|---|
+| Agents | claude 57 · opencode 41 · codex 38 · gemini 25 · cursor-agent 11 · qwen/copilot/deepseek 2 each · antigravity 1 |
+| Outcomes | succeeded 144 · failed 35 |
+| Skills | default 71 · ad-creative 50 · algorithmic-art 30 · agent-browser 22 · video-hyperframes 2 · magazine-web-ppt / brainstorming / data-report / penpot-flutter 1 each |
+| Multi-turn | 124 traces tied to a session with ≥2 turns |
+| Artifact | 18 traces produce `<artifact>` output |
+
+---
+
+## Anonymization
+
+User-specific data has been scrubbed from every recording:
+
+- `/Users/<name>/…`, `/home/<name>/…`, `C:\Users\<name>\…`
+  → `${HOME}/…` / `%USERPROFILE%\…`
+- Project UUIDs → stable `proj-001`, `proj-002`, … per recording
+- meta tag `project:<uuid>` rewritten too
+
+The anonymizer is idempotent. Tool input/output payloads (HTML, code,
+etc.) are preserved verbatim — they're templated UI without cell-level
+PII; if a future audit finds otherwise, add specific scrubs in the
+harvester repo (see "Adding more recordings" below) and re-run.
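
The two rewrites listed above can be illustrated as follows. The real anonymizer lives in the harvester repo and may work differently: `scrubHomePaths` and `scrubProjectIds` are hypothetical names, and locating project UUIDs by a preceding `projects/` path segment is an assumption made for this sketch.

```typescript
// Replace per-user home directories with the placeholders named above.
function scrubHomePaths(text: string): string {
  return text
    .replace(/\/(?:Users|home)\/[^\/\s"']+/g, () => '${HOME}')
    .replace(/[A-Za-z]:\\Users\\[^\\\s"']+/g, () => '%USERPROFILE%');
}

// Map each distinct project UUID to proj-001, proj-002, ... in order of
// first appearance. Call once per recording so numbering restarts each time.
function scrubProjectIds(text: string): string {
  const aliases = new Map<string, string>();
  const pattern = /projects\/([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})/gi;
  return text.replace(pattern, (_match: string, uuid: string) => {
    const key = uuid.toLowerCase();
    let alias = aliases.get(key);
    if (alias === undefined) {
      alias = 'proj-' + ('00' + (aliases.size + 1)).slice(-3);
      aliases.set(key, alias);
    }
    return 'projects/' + alias;
  });
}
```

Both functions leave already-scrubbed text unchanged, which is what makes a pass of this kind idempotent. User names containing spaces would need a broader pattern than the one shown.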
+
+---
+
+## Adding more recordings
+
+Local maintainer flow — the .jsonl never enters the repo. Only the
+manifest delta (≈200 B per entry) gets committed.
+
+### Step 1 — produce an anonymized .jsonl
+
+The harvester that produced the current 179-trace set lives in a
+separate repo, [nexu-io/agent-pr-explore][harvester]. See its README
+for how to authenticate against your trace store, filter by skill /
+agent / outcome, and anonymize the result. Output is one
+`<trace-id>.jsonl` file per recording.
+
+[harvester]: https://github.com/nexu-io/agent-pr-explore
+
+### Step 2 — one-shot upload + manifest update
+
+```bash
+# prereq, once: wrangler login (OAuth, no token to manage)
+bash mocks/scripts/upload-recording.sh /path/to/<trace-id>.jsonl
+```
+
+The script validates the file, prints the manifest entry it will add,
+uploads the .jsonl to R2, rewrites `mocks/manifest.json` locally, then
+uploads the updated manifest to R2 too (so consumers see the new entry
+without waiting for the next git push).
+
+### Step 3 — commit the manifest delta
+
+```bash
+git add mocks/manifest.json
+git commit -m "mocks: add recording <trace-id>"
+git push     # or open a PR — your call
+```
+
+The only thing in the commit is a ~200-byte JSON edit listing the new
+entry's `trace_id`, `sha256`, `bytes`, `agent`, `outcome`, `skills`,
+etc. The .jsonl itself stays in R2.
+
+### Trust model
+
+- **R2 write is wrangler-OAuth gated.** Maintainers do `wrangler login`
+  once. The bucket is on the powerformer Cloudflare account (pinned in
+  the script). No long-lived tokens in repo secrets, no Action to
+  hijack — just account access.
+- **Repo stays small forever.** No .jsonl files ever land in git; the
+  manifest grows by ~200 B per recording.
+- **Read stays public.** Anyone can fetch via the r2.dev URL — see
+  [Recordings live on R2, not in this repo](#recordings-live-on-r2-not-in-this-repo).
+
+### Removing a recording
+
+```bash
+# 1. delete from R2
+export CLOUDFLARE_ACCOUNT_ID=64ad4569ffd912432d6b86d5656484c4
+wrangler r2 object delete open-design-mocks/recordings/v1/<trace-id>.jsonl --remote
+# 2. drop the entry from manifest.json (edit by hand, or use `jq`)
+# 3. re-upload manifest
+wrangler r2 object put open-design-mocks/recordings/v1/manifest.json \
+  --file mocks/manifest.json --remote
+# 4. git add mocks/manifest.json && git commit && git push
+```
+
+There's no automation for delete because (a) it's rare and (b) you
+want a human to think about whether removing a recording would
+invalidate any test fixtures that pin it via `OD_MOCKS_TRACE=<id>`.
+
+---
+
+## Usage from OD's test code
+
+### From a test (Vitest / Jest)
+
+```ts
+import { spawn } from 'node:child_process';
+import { join } from 'node:path';
+
+const MOCK_BIN = join(__dirname, '../../mocks/bin');
+
+it('parses an opencode session with 4 tool calls into 4 UI events', async () => {
+  const child = spawn('opencode', ['run'], {
+    env: {
+      ...process.env,
+      PATH: `${MOCK_BIN}:${process.env.PATH}`,
+      OD_MOCKS_TRACE: '06a9324a',   // 4-tool claude recording, replayed here in opencode's format
+      OD_MOCKS_NO_DELAY: '1',
+    },
+    stdio: ['pipe', 'pipe', 'pipe'],
+  });
+  child.stdin.write('test prompt');
+  child.stdin.end();
+  // ... assert events parsed from child.stdout
+});
+```
+
+### From a manual playback
+
+```bash
+# See what claude's 17-tool "delete v2" session emits to OD:
+export PATH=$(git rev-parse --show-toplevel)/mocks/bin:$PATH
+export OD_MOCKS_TRACE=04097377
+export OD_MOCKS_NO_DELAY=1
+echo "anything" | claude -p --output-format=stream-json | jq .type | uniq -c
+```
+
+---
+
+## Files
+
+```
+mocks/
+├── README.md                 ← you are here
+├── mock-agent.mjs                ← entry; routes --as <agent> to format renderer
+├── lib/
+│   ├── recording-picker.mjs      ← env-driven trace selection
+│   ├── format-opencode.mjs       ← matches handleOpenCodeEvent
+│   ├── format-codex.mjs          ← matches handleCodexEvent
+│   ├── format-claude.mjs         ← matches createClaudeStreamHandler
+│   ├── format-gemini.mjs         ← matches handleGeminiEvent
+│   ├── format-cursor-agent.mjs   ← matches handleCursorEvent
+│   ├── format-acp.mjs            ← JSON-RPC server matching attachAcpSession
+│   ├── format-vela.mjs           ← AMR vela: ACP + models block + set_model gate
+│   ├── vela-subcommands.mjs      ← `vela login` + `vela models` handlers
+│   └── format-plain.mjs          ← raw stdout (deepseek/qwen/grok)
+├── bin/
+│   ├── opencode  claude  codex
+│   ├── gemini    cursor-agent
+│   ├── deepseek  qwen    grok
+│   ├── devin hermes kilo kimi kiro vibe
+│   ├── vela                       ← 15 bash wrappers, PATH-overlay
+│   └── opencode-cli kiro-cli vibe-acp ← alias wrappers for the bin names OD resolves
+├── manifest.json                 ← committed: 179 entries' metadata + sha256 + provenance + R2 storage hints
+├── golden/                       ← committed: daemon-event regression snapshots
+│   ├── README.md
+│   └── *.events.json             ← 3 representative traces (claude/codex/opencode)
+├── scripts/
+│   ├── smoke-test.sh             ← 21 checks; auto-fetches recordings if empty
+│   ├── fetch-recordings.sh       ← pull from R2 (parallel, sha256-verified, idempotent)
+│   ├── upload-recording.sh       ← maintainer-local: validate + wrangler put + manifest update
+│   ├── contract-check.sh         ← real-CLI vs mock protocol drift check (manual)
+│   └── lib/
+│       └── manifest-utils.mjs    ← shared sha256 / meta-parse / manifest-rebuild logic
+└── recordings/                   ← populated at runtime, gitignored .jsonl
+    └── .gitignore                ← recordings come via fetch
+```
+
+No external dependencies. Pure node:`fs`/`crypto`/`child_process`. Works
+under any Node ≥18.
+
+---
+
+## Limitations
+
+- `copilot`, `qoder`, `pi` (the niche `copilot-stream-json` /
+  `qoder-stream-json` / `pi-rpc` formats) are recorded but not yet
+  rendered as their native protocols — they fall back to the plain
+  renderer for now. If you need them, add a `format-<agent>.mjs`
+  following the same pattern as `format-codex.mjs`; the parsers are
+  in `apps/daemon/src/{copilot-stream,qoder-stream}.ts` and the pi-rpc
+  handler inside `apps/daemon/src/server.ts`.
+- The mock does not honor CLI flags that change semantics (`--model`,
+  `--permission-mode`, `--allowed-tools`). They're silently ignored.
+
+---
+
+## Provenance / safety
+
+All recordings come from open-design's own Langfuse project (the
+`open-design` project under the `powerformer` org). Users opted into
+telemetry when they installed the desktop client. The anonymizer
+removed user-identifying paths and project UUIDs before the recordings
+were uploaded to R2.
+
+If you find a recording that includes content that should be redacted,
+follow the [Removing a recording](#removing-a-recording) flow above.
--- a/mocks/bin/claude
+++ b/mocks/bin/claude
@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+# Mock wrapper for claude — runs the bundled mock-agent in this agent's
+# stdout protocol. The wrapper accepts (and ignores) any flags the real
+# CLI would take; the mock doesn't honor model selection, tool gates, or
+# permission modes — recording selection is via OD_MOCKS_*
+# env vars (see ../README.md).
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as claude "$@"
--- a/mocks/bin/codex
+++ b/mocks/bin/codex
@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+# Mock wrapper for codex — runs the bundled mock-agent in this agent's
+# stdout protocol. The wrapper accepts (and ignores) any flags the real
+# CLI would take; the mock doesn't honor model selection, tool gates, or
+# permission modes — recording selection is via OD_MOCKS_*
+# env vars (see ../README.md).
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as codex "$@"
--- a/mocks/bin/cursor-agent
+++ b/mocks/bin/cursor-agent
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as cursor-agent "$@"
--- a/mocks/bin/deepseek
+++ b/mocks/bin/deepseek
@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+# Mock wrapper for deepseek — runs the bundled mock-agent in this agent's
+# stdout protocol. The wrapper accepts (and ignores) any flags the real
+# CLI would take; the mock doesn't honor model selection, tool gates, or
+# permission modes — recording selection is via OD_MOCKS_*
+# env vars (see ../README.md).
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as deepseek "$@"
--- a/mocks/bin/devin
+++ b/mocks/bin/devin
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as devin "$@"
--- a/mocks/bin/gemini
+++ b/mocks/bin/gemini
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as gemini "$@"
--- a/mocks/bin/grok
+++ b/mocks/bin/grok
@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+# Mock wrapper for grok — runs the bundled mock-agent in this agent's
+# stdout protocol. The wrapper accepts (and ignores) any flags the real
+# CLI would take; the mock doesn't honor model selection, tool gates, or
+# permission modes — recording selection is via OD_MOCKS_*
+# env vars (see ../README.md).
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as grok "$@"
--- a/mocks/bin/hermes
+++ b/mocks/bin/hermes
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as hermes "$@"
--- a/mocks/bin/kilo
+++ b/mocks/bin/kilo
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as kilo "$@"
--- a/mocks/bin/kimi
+++ b/mocks/bin/kimi
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as kimi "$@"
--- a/mocks/bin/kiro
+++ b/mocks/bin/kiro
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as kiro "$@"
--- a/mocks/bin/kiro-cli
+++ b/mocks/bin/kiro-cli
@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+# Alias wrapper for kiro-cli — the primary bin name OD resolves for the
+# kiro RuntimeAgentDef (bin: 'kiro-cli').
+# Delegates to the same mock as mocks/bin/kiro.
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as kiro "$@"
--- a/mocks/bin/opencode
+++ b/mocks/bin/opencode
@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+# Mock wrapper for opencode — runs the bundled mock-agent in this agent's
+# stdout protocol. The wrapper accepts (and ignores) any flags the real
+# CLI would take; the mock doesn't honor model selection, tool gates, or
+# permission modes — recording selection is via OD_MOCKS_*
+# env vars (see ../README.md).
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as opencode "$@"
--- a/mocks/bin/opencode-cli
+++ b/mocks/bin/opencode-cli
@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+# Alias wrapper for opencode-cli — the primary bin name OD resolves for
+# the opencode RuntimeAgentDef (bin: 'opencode-cli', fallbackBins: ['opencode']).
+# Delegates to the same mock as mocks/bin/opencode.
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as opencode "$@"
--- a/mocks/bin/qwen
+++ b/mocks/bin/qwen
@ -0,0 +1,9 @@
+#!/usr/bin/env bash
+# Mock wrapper for qwen — runs the bundled mock-agent in this agent's
+# stdout protocol. The wrapper accepts (and ignores) any flags the real
+# CLI would take; the mock doesn't honor model selection, tool gates, or
+# permission modes — recording selection is via OD_MOCKS_*
+# env vars (see ../README.md).
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as qwen "$@"
--- a/mocks/bin/vela
+++ b/mocks/bin/vela
@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+# vela mock CLI — dispatches by first argv:
+#   vela login    → write ~/.amr/config.json (fake credentials)
+#   vela models   → print public model catalog
+#   vela agent run --runtime opencode → ACP JSON-RPC server (default)
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as vela "$@"
--- a/mocks/bin/vibe
+++ b/mocks/bin/vibe
@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as vibe "$@"
--- a/mocks/bin/vibe-acp
+++ b/mocks/bin/vibe-acp
@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+# Alias wrapper for vibe-acp — the primary bin name OD resolves for the
+# vibe RuntimeAgentDef (bin: 'vibe-acp').
+# Delegates to the same mock as mocks/bin/vibe.
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+exec node "$HERE/../mock-agent.mjs" --as vibe "$@"
--- a/mocks/golden/314d6833-0377-4ac4-ba11-2b8d7eca5511.events.json
+++ b/mocks/golden/314d6833-0377-4ac4-ba11-2b8d7eca5511.events.json
@ -0,0 +1,114 @@
+{
+  "agent": "claude",
+  "trace": "314d6833-0377-4ac4-ba11-2b8d7eca5511",
+  "events": [
+    {
+      "type": "status",
+      "label": "initializing",
+      "model": null,
+      "sessionId": "<normalized>"
+    },
+    {
+      "type": "tool_use",
+      "id": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_01W4ChHzyM5SX3pnfNPzvvQe",
+      "name": "Read",
+      "input": {
+        "file_path": "${HOME}/Library/Application Support/Open Design/namespaces/release-stable/data/projects/proj-001/ref-b-tickets-v3.html",
+        "offset": 100,
+        "limit": 40
+      }
+    },
+    {
+      "type": "turn_end",
+      "stopReason": "tool_use"
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_01W4ChHzyM5SX3pnfNPzvvQe",
+      "content": "100\t  .led-grid .led { width: 100%; aspect-ratio: 1 / 1; max-height: 22px; min-height: 12px;\n101\t    display: grid; place-items: center; font-size: 8px; font-weight: 700; }\n102\t  .led.up { background: rgba(0,255,102,0.18); color: var(--green); border: 1px solid rgba(0,255,102,0.55); box-shadow: inset 0 0 6px rgba(0,255,102,0.35); }\n103\t  .led.dn { background: rgba(255,51,68,0.16); color: var(--red); border: 1px solid rgba(255,51,68,0.5); box-shadow: inset 0 0 6px rgba(255,51,68,0.3); }\n104\t  .led.flat { background: rgba(112,112,112,0.12); color: var(--muted); border: 1px solid var(--line-strong); }\n105\t  .led-grid .net { font-size: 10px; color: var(--fg-strong); font-weight: 700; text-align: right; letter-spacing: 0.04em; }\n106\t  .led-grid .net.up { color: var(--green); }\n107\t  .led-grid .net.dn { color: var(--red); }\n108\t  .led-legend { display: flex; gap: 12px; margin-top: 8px; font-size: 9px; color: var(--muted); letter-spacing: 0.08em; }\n109\t  .led-legend .lg { display: flex; align-items: center; gap: 4px; }\n110\t  .led-legend i { width: 10px; height: 10px; }\n111\t  .led-legend i.up { background: rgba(0,255,102,0.45); border: 1px solid var(--green); }\n112\t  .led-legend i.dn { background: rgba(255,51,68,0.4); border: 1px solid var(--red); }\n113\t  .led-legend i.flat { background: rgba(112,112,112,0.2); border: 1px solid var(--muted); }\n114\t\n115\t  /* MOD 2: Activity log */\n116\t  .log-list { font-size: 10px; line-height: 1.5; letter-spacing: 0.02em; }\n117\t  .log-row { display: grid; grid-template-columns: 56px 38px 1fr auto; gap: 8px; padding: 4px 0; border-bottom: 1px dotted var(--line); }\n118\t  .log-row:last-child { border-bottom: 0; }\n119\t  .log-time { color: var(--dim); }\n120\t  .log-type { color: var(--amber); font-weight: 700; letter-spacing: 0.06em; }\n121\t  .log-type.cl { color: var(--green); }\n122\t  .log-type.cr { color: var(--cyan); }\n123\t  .log-msg { color: var(--fg); white-space: nowrap; 
overflow: hidden; text-overflow: ellipsis; }\n124\t  .log-d { font-weight: 700; }\n125\t  .log-d.up { color: var(--green); }\n126\t  .log-d.dn { color: var(--red); }\n127\t\n128\t  /* MOD 3: ASCII heatmap */\n129\t  .heat-wrap { font-family: var(--font-mono); font-size: 10px; line-height: 1.55; color: var(--fg); }\n130\t  .heat-wrap .heat-head, .heat-wrap .heat-row { display: grid; grid-template-columns: 36px repeat(4, 1fr); gap: 6px; }\n131\t  .heat-wrap .heat-head { color: var(--muted); font-size: 9px; letter-spacing: 0.1em; padding-bottom: 4px; border-bottom: 1px dotted var(--line-strong); margin-bottom: 4px; }\n132\t  .heat-wrap .heat-row { padding: 3px 0; }\n133\t  .heat-wrap .heat-row .m { color: var(--muted); font-weight: 600; letter-spacing: 0.06em; }\n134\t  .heat-cell { padding: 2px 4px; text-align: center; font-weight: 700; letter-spacing: 0.04em; }\n135\t  .heat-cell.h0 { background: #0f0f0f; color: var(--dim); }\n136\t  .heat-cell.h1 { background: rgba(255,176,0,0.12); color: var(--amber-dim); }\n137\t  .heat-cell.h2 { background: rgba(255,176,0,0.28); color: var(--amber); }\n138\t  .heat-cell.h3 { background: rgba(255,176,0,0.5); color: #fff; }\n139\t  .heat-cell.h4 { background: rgba(255,176,0,0.78); color: #050505; }",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_01SifooERKXmRaxihtaFixFU",
+      "name": "Grep",
+      "input": {
+        "pattern": "kpi-cell::before|kpi-cell\\s*\\{|\\.kpi-strip|\\.kpi-cell",
+        "path": "${HOME}/Library/Application Support/Open Design/namespaces/release-stable/data/projects/proj-001/ref-b-tickets-v3.html",
+        "output_mode": "content",
+        "-n": true
+      }
+    },
+    {
+      "type": "turn_end",
+      "stopReason": "tool_use"
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_01SifooERKXmRaxihtaFixFU",
+      "content": "55:  .kpi-strip { background: var(--bg-2); border-bottom: 1px solid var(--line); display: grid; grid-template-columns: repeat(4, 1fr); }\n56:  .kpi-cell { padding: 14px 22px; border-right: 1px solid var(--line); position: relative; }\n57:  .kpi-cell:last-child { border-right: 0; }\n58:  .kpi-cell::before { content:''; position: absolute; left: 0; top: 12px; bottom: 12px; width: 3px; }\n59:  .kpi-cell.cust-create::before { background: var(--cyan); }\n60:  .kpi-cell.cust-close::before { background: var(--green); }\n61:  .kpi-cell.int-create::before { background: var(--amber); }\n62:  .kpi-cell.int-close::before { background: var(--magenta); }",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_0163uAuJyfBhuHWtxXoSwi3N",
+      "name": "Edit",
+      "input": {
+        "file_path": "${HOME}/Library/Application Support/Open Design/namespaces/release-stable/data/projects/proj-001/ref-b-tickets-v3.html",
+        "old_string": "  .kpi-cell::before { content:''; position: absolute; left: 0; top: 12px; bottom: 12px; width: 3px; }",
+        "new_string": "  .kpi-cell::before { content:''; position: absolute; left: 0; top: 14px; bottom: 14px; width: 3px; }"
+      }
+    },
+    {
+      "type": "turn_end",
+      "stopReason": "tool_use"
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_0163uAuJyfBhuHWtxXoSwi3N",
+      "content": "The file ${HOME}/Library/Application Support/Open Design/namespaces/release-stable/data/projects/proj-001/ref-b-tickets-v3.html has been updated successfully. (file state is current in your context — no need to Read it back)",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_01QdvPxPh8Qk9Ch4bgAgGZxT",
+      "name": "Grep",
+      "input": {
+        "-n": true,
+        "output_mode": "content",
+        "path": "${HOME}/Library/Application Support/Open Design/namespaces/release-stable/data/projects/proj-001/ref-b-tickets-v3.html",
+        "pattern": "topbar|\\.bar\\s"
+      }
+    },
+    {
+      "type": "turn_end",
+      "stopReason": "tool_use"
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "314d6833-0377-4ac4-ba11-2b8d7eca5511-tool-toolu_bdrk_01QdvPxPh8Qk9Ch4bgAgGZxT",
+      "content": "40:  /* topbar */\n41:  .topbar { background: var(--bg-2); border-bottom: 1px solid var(--line); display: grid;\n44:  .topbar .lbl { color: var(--muted); }\n45:  .topbar .v { color: var(--amber); }\n46:  .topbar .v.up { color: var(--green); }\n47:  .topbar .v.dn { color: var(--red); }\n48:  .topbar .sep { color: var(--dim); }\n214:  <header class=\"topbar\">",
+      "isError": false
+    },
+    {
+      "type": "text_delta",
+      "delta": "让我看一下 KPI 色条的当前样式定位。让我找 KPI 色条样式的定义。找到了。当前 `top: 12px; bottom: 12px` — KPI cell 高度是 78px（来自 stage grid `32px 78px 1fr 44px` 的第二行），padding 是 `14px 22px`。\n\n色条 `top:12 / bottom:12` 意味着色条上下各留 12px，色条本身高度 = 78 - 12 - 12 = 54px。\n\n但 KPI cell 的 padding 是 `14px 22px`（上下 14px），意味着内容区从 14px 开始。色条 `top:12px` 比内容区上沿（14px）高 2px，bottom 同理。这看起来挺合理啊...\n\n等等，可能是用户感觉\"统一往下偏移几像素\"是因为色条上端比 KPI 数字（在 padding 14px 后开始）更高 2px，但下端比 KPI 文字下边缘更深 — 视觉上整条色条的中点跟内容的中点错开了。\n\n让我让色条跟 KPI 内容（padding 14 上下）完全对齐：`top:14px / bottom:14px`。改完。色条上下从 12px 改成 14px，跟 KPI cell 的 padding（14px）完全对齐 — 现在色条的上下端正好齐内容区的上下沿，不再有 2px 错位。\n\n但你说\"统一往下偏移几个像素\" — 还有一种可能是色条整体被往下推了。让我再看一下是不是有别的样式干扰。topbar 32px 高度正常。color bar 现在 `top:14 / bottom:14` 应该对齐到 KPI 内容区的上下沿了。\n\n刷新 ref-b-tickets-v3.html 看看现在对齐了没？如果还是觉得偏移，告诉我具体是色条整体往下移、还是上下不对称，我再精调。"
+    },
+    {
+      "type": "turn_end",
+      "stopReason": "end_turn"
+    },
+    {
+      "type": "usage",
+      "usage": {
+        "input_tokens": 0,
+        "output_tokens": 1594,
+        "cache_creation_input_tokens": 0,
+        "cache_read_input_tokens": 0
+      },
+      "costUsd": 0,
+      "durationMs": 40885,
+      "stopReason": "end_turn"
+    }
+  ]
+}
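In the snapshot above, each `tool_use` event is later answered by a `tool_result` whose `toolUseId` equals the `id` of the call. A small sketch of a consistency check over that shape is shown here. `unmatchedToolUses` is a hypothetical name and the sample ids are shortened stand-ins, not values from the recording.

```javascript
// Hypothetical check: collect tool_use ids that never receive a
// tool_result with a matching toolUseId.
function unmatchedToolUses(events) {
  const pending = new Set();
  for (const ev of events) {
    if (ev.type === 'tool_use') pending.add(ev.id);
    if (ev.type === 'tool_result') pending.delete(ev.toolUseId);
  }
  return [...pending];
}

// Trimmed sample in the same shape as the golden file (ids are made up).
const sample = [
  { type: 'tool_use', id: 't1', name: 'Read', input: {} },
  { type: 'turn_end', stopReason: 'tool_use' },
  { type: 'tool_result', toolUseId: 't1', content: '...', isError: false },
  { type: 'tool_use', id: 't2', name: 'Grep', input: {} },
];
console.log(unmatchedToolUses(sample)); // t2 has no result in this sample
```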
--- a/mocks/golden/9a9522ec-575f-432f-aeed-efc491e900aa.events.json
+++ b/mocks/golden/9a9522ec-575f-432f-aeed-efc491e900aa.events.json
@@ -0,0 +1,187 @@
+{
+  "agent": "opencode",
+  "trace": "9a9522ec-575f-432f-aeed-efc491e900aa",
+  "events": [
+    {
+      "type": "status",
+      "label": "running"
+    },
+    {
+      "type": "tool_use",
+      "id": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_9F86gxR0AA5Ea5dwfJQA0062",
+      "name": "read",
+      "input": {
+        "filePath": "%USERPROFILE%\\AppData\\Roaming\\Open Design\\namespaces\\release-stable-win\\data\\projects\\proj-001\\evidguard-pitch-deck.html"
+      }
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_9F86gxR0AA5Ea5dwfJQA0062",
+      "content": "<path>%USERPROFILE%\\AppData\\Roaming\\Open Design\\namespaces\\release-stable-win\\data\\projects\\proj-001\\evidguard-pitch-deck.html</path>\n<type>file</type>\n<content>\n1: \n2: <!doctype html>\n3: <html lang=\"es\">\n4: <head>\n5:   <meta charset=\"utf-8\" />\n6:   <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n7:   <title>EvidGuard · Pitch deck</title>\n8:   <style>\n9:     :root {\n10:       /* Apple system — Zhangzara accent */\n11:       --bg: #ffffff;\n12:       --surface: #f5f5f7;\n13:       --fg: #1d1d1f;\n14:       --fg-2: #424245;\n15:       --muted: #6e6e73;\n16:       --border: #d2d2d7;\n17:       --accent: #e5ff00;\n18:       --accent-on: #000000;\n19:       --accent-soft: color-mix(in oklch, var(--accent) 14%, transparent);\n20: \n21:       --font-display: \"SF Pro Display\", \"SF Pro Icons\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n22:       --font-body:    \"SF Pro Text\", \"SF Pro Icons\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n23:       --font-mono:    \"SF Mono\", ui-monospace, \"JetBrains Mono\", \"IBM Plex Mono\", Menlo, monospace;\n24: \n25:       --dark-bg: #000000;\n26:       --light-text: #f5f5f7;\n27: \n28:       --radius-sm: 8px;\n29:       --radius-md: 12px;\n30:       --radius-lg: 18px;\n31:       --radius-pill: 980px;\n32: \n33:       --elev-flat: none;\n34:       --elev-ring: 0 0 0 1px var(--border);\n35:       --elev-raised: 0 12px 32px rgba(0,0,0,0.08);\n36: \n37:       --motion-fast: 150ms;\n38:       --motion-base: 220ms;\n39:       --ease-standard: cubic-bezier(0.28, 0, 0.22, 1);\n40:     }\n41: \n42:     *, *::before, *::after { box-sizing: border-box; }\n43:     @keyframes fadeSlideIn {\n44:       from { opacity: 0; transform: translateY(24px); }\n45:       to   { opacity: 1; transform: translateY(0); }\n46:     }\n47:     html, body { margin: 0; height: 100%; }\n48:     body {\n49:       background: var(--dark-bg);\n50:       color: var(--fg);\n51:      
 font-family: var(--font-body);\n52:       font-size: 18px;\n53:       line-height: 1.5;\n54:       -webkit-font-smoothing: antialiased;\n55:       display: flex;\n56:       overflow-x: auto;\n57:       overflow-y: hidden;\n58:       scroll-snap-type: x mandatory;\n59:       scroll-behavior: smooth;\n60:     }\n61:     body::-webkit-scrollbar { display: none; }\n62:     p { text-wrap: pretty; }\n63:     h1, h2, h3 { text-wrap: balance; }\n64: \n65:     .slide {\n66:       flex: 0 0 100vw;\n67:       width: 100vw;\n68:       height: 100vh;\n69:       scroll-snap-align: start;\n70:       padding: clamp(48px, 7vw, 96px) clamp(48px, 8vw, 112px);\n71:       display: flex;\n72:       flex-direction: column;\n73:       justify-content: center;\n74:       position: relative;\n75:       overflow: hidden;\n76:     }\n77:     .slide > * { animation: fadeSlideIn 0.6s var(--ease-standard) both; }\n78:     .slide > *:nth-child(1) { animation-delay: 0.1s; }\n79:     .slide > *:nth-child(2) { animation-delay: 0.2s; }\n80:     .slide > *:nth-child(3) { animation-delay: 0.3s; }\n81:     .slide > *:nth-child(4) { animation-delay: 0.4s; }\n82:     .slide > *:nth-child(5) { animation-delay: 0.5s; }\n83:     .slide > *:nth-child(6) { animation-delay: 0.6s; }\n84: \n85:     .slide.light { background: var(--bg); color: var(--fg); }\n86:     .slide.dark  { background: var(--dark-bg); color: var(--light-text); }\n87:     .slide.dark .muted    { color: rgba(245,245,247,0.65); }\n88:     .slide.dark .border   { border-color: rgba(245,245,247,0.18); }\n89:     .slide.hero { padding-block: clamp(64px, 9vw, 128px); }\n90:     .slide.center { align-items: center; text-align: center; justify-content: center; }\n91:     .slide.center .body { margin-inline: auto; }\n92: \n93:     .eyebrow {\n94:       font-family: var(--font-mono);\n95:       font-size: 12px;\n96:       letter-spacing: 0.1em;\n97:       text-transform: uppercase;\n98:       color: var(--accent);\n99:       margin: 0 0 28px;\n100:    
 }\n101:     .slide.light .eyebrow { color: var(--fg); }\n102:     .slide.light .eyebrow span { color: var(--accent); }\n103: \n104:     .h-hero {\n105:       font-family: var(--font-display);\n106:       font-size: clamp(56px",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_XGSoulTKMbeomO2LlXC40391",
+      "name": "todowrite",
+      "input": {
+        "todos": [
+          {
+            "content": "Rewrite CSS with cleaner minimal styling and advanced animations",
+            "status": "in_progress",
+            "priority": "high"
+          },
+          {
+            "content": "Simplify slide layouts — remove excess chrome, tighter copy",
+            "status": "pending",
+            "priority": "high"
+          },
+          {
+            "content": "Add scroll-triggered parallax and staggered micro-animations",
+            "status": "pending",
+            "priority": "high"
+          },
+          {
+            "content": "Verify all animations work and deck is responsive",
+            "status": "pending",
+            "priority": "medium"
+          }
+        ]
+      }
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_XGSoulTKMbeomO2LlXC40391",
+      "content": "[{\"content\":\"Rewrite CSS with cleaner minimal styling and advanced animations\",\"status\":\"in_progress\",\"priority\":\"high\"},{\"content\":\"Simplify slide layouts — remove excess chrome, tighter copy\",\"status\":\"pending\",\"priority\":\"high\"},{\"content\":\"Add scroll-triggered parallax and staggered micro-animations\",\"status\":\"pending\",\"priority\":\"high\"},{\"content\":\"Verify all animations work and deck is responsive\",\"status\":\"pending\",\"priority\":\"medium\"}]",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_rb6Fsyip171hkqP6OXk18071",
+      "name": "write",
+      "input": "{\"filePath\":\"C:\\\\Users\\\\borde\\\\AppData\\\\Roaming\\\\Open Design\\\\namespaces\\\\release-stable-win\\\\data\\\\projects\\\\proj-001\\\\evidguard-pitch-deck.html\",\"content\":\"<!doctype html>\\n<html lang=\\\"es\\\">\\n<head>\\n  <meta charset=\\\"utf-8\\\" />\\n  <meta name=\\\"viewport\\\" content=\\\"width=device-width, initial-scale=1\\\" />\\n  <title>EvidGuard · Pitch deck</title>\\n  <style>\\n    :root {\\n      --bg: #ffffff;\\n      --surface: #f5f5f7;\\n      --fg: #1d1d1f;\\n      --fg-2: #424245;\\n      --muted: #6e6e73;\\n      --border: #d2d2d7;\\n      --accent: #e5ff00;\\n      --accent-on: #000000;\\n\\n      --font-display: \\\"SF Pro Display\\\", \\\"SF Pro Icons\\\", \\\"Helvetica Neue\\\", Helvetica, Arial, sans-serif;\\n      --font-body:    \\\"SF Pro Text\\\", \\\"SF Pro Icons\\\", \\\"Helvetica Neue\\\", Helvetica, Arial, sans-serif;\\n      --font-mono:    \\\"SF Mono\\\", ui-monospace, \\\"JetBrains Mono\\\", \\\"IBM Plex Mono\\\", Menlo, monospace;\\n\\n      --dark-bg: #000000;\\n      --light-text: #f5f5f7;\\n\\n      --radius-sm: 6px;\\n      --radius-pill: 980px;\\n\\n      --ease-standard: cubic-bezier(0.28, 0, 0.22, 1);\\n      --ease-out: cubic-bezier(0.16, 1, 0.3, 1);\\n      --ease-in-out: cubic-bezier(0.76, 0, 0.24, 1);\\n    }\\n    *,*::before,*::after{box-sizing:border-box;margin:0;padding:0}\\n\\n    @keyframes revealUp {\\n      0%   { opacity: 0; transform: translateY(28px); }\\n      100% { opacity: 1; transform: translateY(0); }\\n    }\\n    @keyframes revealScale {\\n      0%   { opacity: 0; transform: scale(0.92); }\\n      100% { opacity: 1; transform: scale(1); }\\n    }\\n    @keyframes revealAccent {\\n      0%   { opacity: 0; transform: scaleX(0); }\\n      100% { opacity: 1; transform: scaleX(1); }\\n    }\\n    @keyframes glowPulse {\\n      0%, 100% { box-shadow: 0 0 30px color-mix(in oklch, var(--accent) 10%, transparent); }\\n      50%      { box-shadow: 0 0 60px color-mix(in 
oklch, var(--accent) 18%, transparent); }\\n    }\\n    @keyframes counterPop {\\n      0%   { transform: scale(1); }\\n      50%  { transform: scale(1.12); }\\n      100% { transform: scale(1); }\\n    }\\n    @keyframes progressShimmer {\\n      0%   { background-position: -200% 0; }\\n      100% { background-position: 200% 0; }\\n    }\\n\\n    html,body{margin:0;height:100%}\\n    body {\\n      background: var(--dark-bg);\\n      color: var(--fg);\\n      font-family: var(--font-body);\\n      font-size: 18px;\\n      line-height: 1.55;\\n      -webkit-font-smoothing: antialiased;\\n      display: flex;\\n      overflow-x: auto;\\n      overflow-y: hidden;\\n      scroll-snap-type: x mandatory;\\n      scroll-behavior: smooth;\\n    }\\n    body::-webkit-scrollbar{display:none}\\n    p{text-wrap:pretty}\\n    h1,h2,h3,h4{text-wrap:balance;font-weight:500}\\n\\n    .slide {\\n      flex: 0 0 100vw;\\n      width: 100vw;\\n      height: 100vh;\\n      scroll-snap-align: start;\\n      padding: clamp(60px, 8vw, 100px) clamp(64px, 10vw, 140px);\\n      display: flex;\\n      flex-direction: column;\\n      justify-content: center;\\n      position: relative;\\n      overflow: hidden;\\n    }\\n\\n    /* ── Entrance animation layers ── */\\n    .slide .entrance-1 { animation: revealUp 0.7s var(--ease-out) both; }\\n    .slide .entrance-2 { animation: revealUp 0.7s var(--ease-out) 0.15s both; }\\n    .slide .entrance-3 { animation: revealUp 0.7s var(--ease-out) 0.3s both; }\\n    .slide .entrance-4 { animation: revealUp 0.7s var(--ease-out) 0.45s both; }\\n    .slide .entrance-5 { animation: revealUp 0.7s var(--ease-out) 0.6s both; }\\n    .slide .entrance-scale { animation: revealScale 0.8s var(--ease-out) 0.1s both; }\\n    .slide.hero .eyebrow   { animation: revealUp 0.6s var(--ease-out) 0.05s both; }\\n    .slide.hero .h-hero    { animation: revealUp 0.8s var(--ease-out) 0.2s both; }\\n\\n    .slide.light { background: var(--bg); }\\n    .slide.dark  { 
background: var(--dark-bg); color: var(--light-text); }\\n    .slide.center { align-items: center; text-align: center; }\\n    .slide.center .entrance-1,\\n    .slide.center .entrance-2,\\n    .slide.center .entrance-3 { margin-inline: auto; }\\n\\n    .eyebrow {\\n      font-"
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_rb6Fsyip171hkqP6OXk18071",
+      "content": "Wrote file successfully.",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_4eTH38Zw5qidE8X3PTkU1514",
+      "name": "todowrite",
+      "input": {
+        "todos": [
+          {
+            "priority": "high",
+            "content": "Rewrite CSS with cleaner minimal styling and advanced animations",
+            "status": "completed"
+          },
+          {
+            "priority": "high",
+            "content": "Simplify slide layouts — remove excess chrome, tighter copy",
+            "status": "completed"
+          },
+          {
+            "priority": "high",
+            "content": "Add scroll-triggered parallax and staggered micro-animations",
+            "status": "completed"
+          },
+          {
+            "priority": "medium",
+            "content": "Verify all animations work and deck is responsive",
+            "status": "in_progress"
+          }
+        ]
+      }
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_4eTH38Zw5qidE8X3PTkU1514",
+      "content": "[{\"content\":\"Rewrite CSS with cleaner minimal styling and advanced animations\",\"status\":\"completed\",\"priority\":\"high\"},{\"content\":\"Simplify slide layouts — remove excess chrome, tighter copy\",\"status\":\"completed\",\"priority\":\"high\"},{\"content\":\"Add scroll-triggered parallax and staggered micro-animations\",\"status\":\"completed\",\"priority\":\"high\"},{\"content\":\"Verify all animations work and deck is responsive\",\"status\":\"in_progress\",\"priority\":\"medium\"}]",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_01_C7MtbryrHlpniLdo33pp2533",
+      "name": "grep",
+      "input": {
+        "pattern": "class=\"slide",
+        "path": "%USERPROFILE%\\AppData\\Roaming\\Open Design\\namespaces\\release-stable-win\\data\\projects\\proj-001",
+        "include": "evidguard-pitch-deck.html"
+      }
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_01_C7MtbryrHlpniLdo33pp2533",
+      "content": "Found 5 matches\n%USERPROFILE%\\AppData\\Roaming\\Open Design\\namespaces\\release-stable-win\\data\\projects\\proj-001\\evidguard-pitch-deck.html:\n  Line 301:   <section class=\"slide hero dark center\" data-screen-label=\"01 Cover\">\n\n  Line 314:   <section class=\"slide light\" data-screen-label=\"02 Performance\">\n\n  Line 336:   <section class=\"slide hero dark center\" data-screen-label=\"03 Enterprise\">\n\n  Line 342:   <section class=\"slide light\" data-screen-label=\"04 ROI\">\n\n  Line 361:   <section class=\"slide hero dark center\" data-screen-label=\"05 CTA\">\n",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_O7Acmy77VQy2GyT4o6mk6480",
+      "name": "bash",
+      "input": {
+        "command": "Get-Content \"%USERPROFILE%\\AppData\\Roaming\\Open Design\\namespaces\\release-stable-win\\data\\projects\\proj-001\\evidguard-pitch-deck.html\" -Tail 5",
+        "description": "Verify file ends properly"
+      }
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_O7Acmy77VQy2GyT4o6mk6480",
+      "content": "      } catch (_) { setActive(0); }\r\n    })();\r\n  </script>\r\n</body>\r\n</html>\r\n",
+      "isError": false
+    },
+    {
+      "type": "tool_use",
+      "id": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_njLhyQDJQcFCAXLXT4fD3309",
+      "name": "todowrite",
+      "input": {
+        "todos": [
+          {
+            "priority": "high",
+            "content": "Rewrite CSS with cleaner minimal styling and advanced animations",
+            "status": "completed"
+          },
+          {
+            "priority": "high",
+            "content": "Simplify slide layouts — remove excess chrome, tighter copy",
+            "status": "completed"
+          },
+          {
+            "priority": "high",
+            "content": "Add scroll-triggered parallax and staggered micro-animations",
+            "status": "completed"
+          },
+          {
+            "priority": "medium",
+            "content": "Verify all animations work and deck is responsive",
+            "status": "completed"
+          }
+        ]
+      }
+    },
+    {
+      "type": "tool_result",
+      "toolUseId": "9a9522ec-575f-432f-aeed-efc491e900aa-tool-call_00_njLhyQDJQcFCAXLXT4fD3309",
+      "content": "[{\"content\":\"Rewrite CSS with cleaner minimal styling and advanced animations\",\"status\":\"completed\",\"priority\":\"high\"},{\"content\":\"Simplify slide layouts — remove excess chrome, tighter copy\",\"status\":\"completed\",\"priority\":\"high\"},{\"content\":\"Add scroll-triggered parallax and staggered micro-animations\",\"status\":\"completed\",\"priority\":\"high\"},{\"content\":\"Verify all animations work and deck is responsive\",\"status\":\"completed\",\"priority\":\"medium\"}]",
+      "isError": false
+    },
+    {
+      "type": "text_delta",
+      "delta": "Hecho — `evidguard-pitch-deck.html` actualizado con dos mejoras clave:\n\n**Minimalista:** sin borders dashed en el logo, sin hint text, sin bullets decorativos, tipografía más limpia, líneas de acento finas como único adorno visual.\n\n**Animaciones profesionales:** cada slide tiene 4 capas de entrada con `revealUp` escalonadas (0.05s → 0.2s → 0.3s → 0.45s), el número grande (94%) escala con `revealScale`, el contador rebota con `counterPop` al cambiar de slide, la barra de progreso tiene un shimmer animado, y las easing curves son Apple-style (`cubic-bezier(0.16, 1, 0.3, 1)`) para un movimiento que se siente físico, no lineal."
+    },
+    {
+      "type": "usage",
+      "usage": {
+        "input_tokens": 0,
+        "output_tokens": 379,
+        "thought_tokens": 0,
+        "cached_read_tokens": 0,
+        "cached_write_tokens": 0
+      },
+      "costUsd": 0
+    }
+  ]
+}
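The opencode snapshot above contains seven `tool_use` events (`read`, `todowrite`, `write`, `todowrite`, `grep`, `bash`, `todowrite`). A rough sketch of how a per-tool tally could be derived from an `events` array is shown below. `toolCounts` is a hypothetical helper, and the sample keeps only the tool names from the snapshot, with made-up ids.

```javascript
// Hypothetical helper: tally tool_use events by tool name.
function toolCounts(events) {
  const counts = {};
  for (const ev of events) {
    if (ev.type !== 'tool_use') continue;
    counts[ev.name] = (counts[ev.name] ?? 0) + 1;
  }
  return counts;
}

// Tool names in the order they appear in the snapshot; ids are placeholders.
const names = ['read', 'todowrite', 'write', 'todowrite', 'grep', 'bash', 'todowrite'];
const events = names.map((name, i) => ({ type: 'tool_use', id: `call-${i}`, name, input: {} }));

const counts = toolCounts(events);
const total = Object.values(counts).reduce((a, b) => a + b, 0);
console.log(counts, total);
```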
--- a/mocks/golden/README.md
+++ b/mocks/golden/README.md
@@ -0,0 +1,56 @@
+# `mocks/golden/` — daemon-event regression snapshots
+
+Each `<trace-id>.events.json` is a committed snapshot of the event
+sequence that flows out of the OD daemon's stream handler when fed the
+matching mock CLI's output for that recording.
+
+The shape is:
+
+```json
+{
+  "agent": "claude",
+  "trace": "314d6833-...",
+  "events": [
+    { "type": "status", ... },
+    { "type": "tool_use", "name": "Read", ... },
+    { "type": "tool_result", ... },
+    ...
+  ]
+}
+```
+
+## Why these exist
+
+`mocks/scripts/smoke-test.sh` confirms the mock CLIs *run* and emit
+events. It doesn't tell you whether the events have the *right shape*
+— a parser change in `apps/daemon/src/claude-stream.ts` or
+`json-event-stream.ts` could silently drop a field, rename
+`sessionID` → `sessionId`, or stop emitting `turn_end`. The goldens
+are diffed by `apps/daemon/tests/mocks-golden.test.ts`; a regression
+makes that test fail loudly.
+
+## Updating after an intentional parser change
+
+```bash
+MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
+git diff mocks/golden/
+git add mocks/golden/ && git commit -m "mocks: refresh goldens — parser X now emits Y"
+```
+
+The diff is the part the reviewer eyeballs. If the change matches what
+the parser refactor intended, ship. If it doesn't, the refactor broke
+something.
+
+## Coverage
+
+| File | Agent | Trace | Tools | Why it's representative |
+|---|---|---|---|---|
+| `314d6833…events.json` | claude | 17-tool agent-browser session | 8 | Median complexity for claude; exercises Read / Write / Edit / Bash tool shapes + content_block streaming |
+| `9a9522ec…events.json` | opencode | data-report session | 7 | Covers `step_start` / `step_finish` / `tool_use` / `cost` envelope |
+| `dcdff3b3…events.json` | codex | 14-tool refactor | 14 | High tool density; exercises `thread.started` + thread message streaming |
+
+Other agents (gemini / cursor-agent / plain / ACP / vela) aren't
+golden-tested here because their parsers either (a) emit only assistant
+text (gemini, cursor-agent, plain) — covered by simpler unit tests, or
+(b) use JSON-RPC over stdio rather than streamed stdout (ACP / vela)
+— which needs a different harness; see `apps/daemon/tests/acp.test.ts`.
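The README above describes the goldens as being diffed against the events the daemon emits. A minimal sketch of such a comparison, assuming a plain event-by-event equality check, is shown below. `firstMismatch` is a hypothetical name; the real logic lives in `apps/daemon/tests/mocks-golden.test.ts` and may work differently.

```javascript
// Hypothetical comparison step: returns the index of the first event that
// differs between the two sequences, or -1 when they are identical.
function firstMismatch(actualEvents, goldenEvents) {
  const len = Math.max(actualEvents.length, goldenEvents.length);
  for (let i = 0; i < len; i++) {
    // JSON.stringify is key-order sensitive, so a reordered object also
    // counts as a change in this sketch.
    if (JSON.stringify(actualEvents[i]) !== JSON.stringify(goldenEvents[i])) return i;
  }
  return -1;
}

const golden = [
  { type: 'status', label: 'initializing', sessionId: '<normalized>' },
  { type: 'turn_end', stopReason: 'end_turn' },
];
// Simulated regression: a field name changed in the first event.
const actual = [
  { type: 'status', label: 'initializing', sessionID: '<normalized>' },
  { type: 'turn_end', stopReason: 'end_turn' },
];
console.log(firstMismatch(actual, golden)); // index of the first differing event
console.log(firstMismatch(golden, golden)); // no difference
```

Reporting the index rather than a boolean points the reviewer at the first event that changed, which is usually the one the parser change touched.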
--- a/mocks/golden/dcdff3b3-cd39-4dcd-be83-372830a29639.events.json
+++ b/mocks/golden/dcdff3b3-cd39-4dcd-be83-372830a29639.events.json
--- a/mocks/lib/format-acp.mjs
+++ b/mocks/lib/format-acp.mjs
@@ -0,0 +1,204 @@
+// OD-faithful ACP (agent-client-protocol) renderer.
+//
+// Unlike the streaming formats (opencode/codex/claude/gemini/cursor-agent),
+// ACP is a real JSON-RPC server over stdio. The mock has to:
+//
+//   1. Listen on stdin for newline-delimited JSON-RPC messages from OD.
+//   2. Respond to `initialize` (id=1) with the protocol version.
+//   3. Respond to `session/new` (id=2) with a synthetic sessionId.
+//   4. Optionally respond to `session/set_model` (id=3) with {}.
+//   5. When `session/prompt` (id=N) arrives, push a series of
+//      `session/update` NOTIFICATIONS carrying agent_message_chunk
+//      content from the recording, then respond to the prompt request
+//      with a final usage stats result.
+//   6. Stay alive until OD closes stdin or the request completes.
+//
+// Verified against the spec implementation at
+//   apps/daemon/src/acp.ts (attachAcpSession + ACP_PROTOCOL_VERSION).
+//
+// Tool events from the recording are dropped: OD's ACP parser only
+// recognizes agent_thought_chunk + agent_message_chunk updates. Tool use
+// in the real ACP agents (Hermes/Kimi/Kilo/Kiro/Vibe/Devin) surfaces
+// through MCP or other side channels, not the stdio protocol.
+
+import { writeFile } from 'node:fs/promises';
+
+const PROTOCOL_VERSION = 1;
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
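+// `out` is a stream or a bare emit function (runAcpServer accepts both via opts.emit).
+const writeTo = (out, s) => (typeof out === 'function' ? out(s) : out.write(s));
+const writeRpcResult = (out, id, result) =>
+  writeTo(out, JSON.stringify({ jsonrpc: '2.0', id, result }) + '\n');
+
+const writeNotification = (out, method, params) =>
+  writeTo(out, JSON.stringify({ jsonrpc: '2.0', method, params }) + '\n');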
+
+/** Split a long text into approximately N-char chunks at word boundaries
+ *  so OD's UI sees a streamed message rather than one giant burst. */
+function chunkText(text, targetChunkSize = 400) {
+  if (text.length <= targetChunkSize) return [text];
+  const chunks = [];
+  let pos = 0;
+  while (pos < text.length) {
+    let end = Math.min(text.length, pos + targetChunkSize);
+    if (end < text.length) {
+      // Extend to the next space (fewer than 200 chars further) so we don't split mid-word
+      const ws = text.indexOf(' ', end);
+      if (ws !== -1 && ws - end < 200) end = ws + 1;
+    }
+    chunks.push(text.slice(pos, end));
+    pos = end;
+  }
+  return chunks;
+}
+
+/**
+ * Run the ACP server on stdin/stdout. Returns a promise that resolves
+ * when the prompt round-trip completes.
+ */
+export async function runAcpServer(events, opts = {}) {
+  const out = opts.emit ?? process.stdout;
+  const writeFn = typeof out === 'function' ? out : s => out.write(s);
+  const writeLine = obj => writeFn(JSON.stringify(obj) + '\n');
+
+  const meta = events.find(e => e.type === 'meta');
+  const sessionId = opts.sessionId ?? `mock-acp-${Date.now()}`;
+  const reportEvent = events.find(e => e.type === 'report');
+  const reportText = reportEvent?.content ?? '';
+
+  // The recordings don't separate thinking from tool calls, so there is
+  // nothing to replay as thought chunks: only the final report is
+  // streamed, as agent_message_chunk updates. A future enhancement could
+  // pull "I should..." style reasoning out of tool_call inputs and emit
+  // it as thought chunks. `meta` is read below for the usage stats and
+  // the model config option.
+
+  // Buffer for incoming JSON-RPC lines.
+  let buf = '';
+  let aborted = false;
+
+  // Track which request we owe a response to.
+  let lastPromptId = null;
+
+  // Chunks generator — used after session/prompt arrives.
+  const startPromptResponse = async (id) => {
+    lastPromptId = id;
+    const chunks = chunkText(reportText, 400);
+    for (const chunk of chunks) {
+      if (aborted) break;
+      if (!opts.noDelay) await sleep(50);
+      writeLine({
+        jsonrpc: '2.0',
+        method: 'session/update',
+        params: {
+          sessionId,
+          update: {
+            sessionUpdate: 'agent_message_chunk',
+            content: { type: 'text', text: chunk },
+          },
+        },
+      });
+    }
+    // Final prompt response carries usage stats. ACP parser reads:
+    //   formatUsage({inputTokens, outputTokens, cachedReadTokens, totalTokens})
+    writeRpcResult(out, id, {
+      stopReason: 'end_turn',
+      usage: {
+        inputTokens: 0,
+        outputTokens: meta?.total_tokens ?? 0,
+        cachedReadTokens: 0,
+        totalTokens: meta?.total_tokens ?? 0,
+      },
+    });
+
+    if (opts.reportFile) await writeFile(opts.reportFile, reportText).catch(() => {});
+  };
+
+  // Dispatch incoming RPC messages.
+  const handleRpc = async (obj) => {
+    if (!obj || typeof obj !== 'object') return;
+    const { id, method } = obj;
+    if (method === 'initialize') {
+      writeRpcResult(out, id, {
+        protocolVersion: PROTOCOL_VERSION,
+        capabilities: {
+          loadSession: false,
+          // Tool calls aren't supported via stdio in this mock —
+          // matches the actual ACP agents on OD's side.
+          tools: false,
+        },
+      });
+      return;
+    }
+    if (method === 'session/new') {
+      writeRpcResult(out, id, {
+        sessionId,
+        configOptions: [
+          {
+            configId: 'model',
+            category: 'model',
+            currentValue: meta?.model ?? 'default',
+            values: [meta?.model ?? 'default'],
+          },
+        ],
+      });
+      return;
+    }
+    if (method === 'session/set_model' || method === 'session/set_config_option') {
+      writeRpcResult(out, id, {});
+      return;
+    }
+    if (method === 'session/prompt') {
+      // Fire-and-forget the chunk streaming; the prompt request gets
+      // responded to once the last chunk has been streamed.
+      void startPromptResponse(id);
+      return;
+    }
+    if (method === 'session/cancel') {
+      aborted = true;
+      // OD doesn't expect a response to cancel notifications.
+      return;
+    }
+    // Any other method: respond with an empty result to keep the parser happy.
+    if (id !== undefined && id !== null) {
+      writeRpcResult(out, id, {});
+    }
+  };
+
+  // Resolves when stdin closes or errors, on SIGINT/SIGTERM, or on the no-prompt timeout.
+  return new Promise((resolve) => {
+    process.stdin.setEncoding('utf-8');
+    process.stdin.on('data', async (data) => {
+      buf += data;
+      let nl;
+      while ((nl = buf.indexOf('\n')) !== -1) {
+        const line = buf.slice(0, nl).trim();
+        buf = buf.slice(nl + 1);
+        if (!line) continue;
+        let obj;
+        try { obj = JSON.parse(line); } catch { continue; }
+        await handleRpc(obj);
+      }
+    });
+    process.stdin.on('end', () => {
+      // OD closed stdin after our prompt response — graceful shutdown.
+      resolve();
+    });
+    process.stdin.on('error', () => resolve());
+
+    const onSigint = () => { aborted = true; resolve(); };
+    process.once('SIGINT', onSigint);
+    process.once('SIGTERM', onSigint);
+
+    // Safety timeout: if OD never sends `session/prompt` within 30s,
+    // bail out so we don't hang in CI. .unref() lets the process exit
+    // promptly once the prompt round-trip completes normally.
+    setTimeout(() => {
+      if (lastPromptId === null) {
+        process.stderr.write('[mock-acp] no prompt received within 30s; exiting\n');
+        resolve();
+      }
+    }, 30_000).unref();
+  });
+}
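As a quick illustration of the chunking used by the ACP mock above, the sketch below re-declares `chunkText` standalone (copied only for self-containment) and checks two properties: the chunks concatenate back to the input, and a chunk may overshoot the 400-char target while the function looks for the next space. The `sample` text and the `chunks` variable are illustrative assumptions, not part of this PR.

```javascript
// Standalone copy of the mock's chunker, for illustration only.
function chunkText(text, targetChunkSize = 400) {
  if (text.length <= targetChunkSize) return [text];
  const chunks = [];
  let pos = 0;
  while (pos < text.length) {
    let end = Math.min(text.length, pos + targetChunkSize);
    if (end < text.length) {
      // Extend to the next space, but only if it is fewer than 200 chars away.
      const ws = text.indexOf(' ', end);
      if (ws !== -1 && ws - end < 200) end = ws + 1;
    }
    chunks.push(text.slice(pos, end));
    pos = end;
  }
  return chunks;
}

// Hypothetical sample: "word " repeated 300 times (1500 chars).
const sample = 'word '.repeat(300);
const chunks = chunkText(sample);

// Print chunk count and per-chunk lengths.
console.log(chunks.length, chunks.map(c => c.length));
```

Because every chunk ends right after a space (or at the end of the text), a consumer that simply concatenates `agent_message_chunk` payloads reconstructs the report exactly.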
--- a/mocks/lib/format-claude.mjs
+++ b/mocks/lib/format-claude.mjs
@ -0,0 +1,92 @@
+// OD-faithful claude-stream-json renderer. Matches OD's
+// `claude-stream.ts:createClaudeStreamHandler` parser.
+//
+// Each tool call lives in its own assistant message wrapper (the
+// "finalized blocks" path — simpler than stream_event deltas, identical
+// semantics).
+
+import { writeFile } from 'node:fs/promises';
+import { randomUUID } from 'node:crypto';
+
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
+export async function renderAsClaude(events, opts = {}) {
+  const emit = opts.emit ?? (s => process.stdout.write(s));
+  const maxSleep = opts.maxSleepMs ?? 3000;
+  const meta = events.find(e => e.type === 'meta');
+  const sessionId = opts.sessionId ?? randomUUID();
+
+  emit(JSON.stringify({
+    type: 'system',
+    subtype: 'init',
+    model: meta?.model ?? null,
+    session_id: sessionId,
+  }) + '\n');
+
+  const results = new Map();
+  for (const e of events) if (e.type === 'tool_result') results.set(e.obs_id, e);
+
+  let lastT = 0;
+  for (const e of events) {
+    if (e.type === 'meta' || e.type === 'stdout' || e.type === 'tool_result') continue;
+    const t = typeof e.t_ms === 'number' ? e.t_ms : undefined;
+    if (!opts.noDelay && t !== undefined) {
+      const delta = Math.min(maxSleep, Math.max(0, t - lastT));
+      if (delta > 0) await sleep(delta);
+      lastT = t;
+    }
+    if (e.type === 'tool_call') {
+      const result = results.get(e.obs_id);
+      const messageId = `msg_${randomUUID().replace(/-/g, '').slice(0, 24)}`;
+      emit(JSON.stringify({
+        type: 'assistant',
+        message: {
+          id: messageId,
+          role: 'assistant',
+          content: [{
+            type: 'tool_use', id: e.obs_id, name: e.name, input: e.input ?? {},
+          }],
+          stop_reason: 'tool_use',
+        },
+      }) + '\n');
+      emit(JSON.stringify({
+        type: 'user',
+        message: {
+          role: 'user',
+          content: [{
+            type: 'tool_result',
+            tool_use_id: e.obs_id,
+            content: result?.output ?? '',
+            is_error: result?.status === 'error',
+          }],
+        },
+      }) + '\n');
+    } else if (e.type === 'report') {
+      const messageId = `msg_${randomUUID().replace(/-/g, '').slice(0, 24)}`;
+      emit(JSON.stringify({
+        type: 'assistant',
+        message: {
+          id: messageId,
+          role: 'assistant',
+          content: [{ type: 'text', text: e.content }],
+          stop_reason: 'end_turn',
+        },
+      }) + '\n');
+      if (opts.reportFile) await writeFile(opts.reportFile, e.content).catch(() => {});
+    }
+  }
+
+  emit(JSON.stringify({
+    type: 'result',
+    subtype: 'success',
+    usage: {
+      input_tokens: 0,
+      output_tokens: meta?.total_tokens ?? 0,
+      cache_creation_input_tokens: 0,
+      cache_read_input_tokens: 0,
+    },
+    total_cost_usd: 0,
+    duration_ms: meta?.duration_ms ?? 0,
+    stop_reason: 'end_turn',
+  }) + '\n');
+}
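The pacing rule that the claude, codex and opencode renderers share (sleep for the gap between recorded `t_ms` values, capped at `maxSleepMs` and never negative) can be sketched in isolation. `replayDelays` is a hypothetical helper written for this note; the renderers compute the same value inline and skip `meta`, `stdout` and `tool_result` events before this step.

```javascript
// Hypothetical helper: compute the sleep before each event, as the
// renderers' loops do inline. Events without a numeric t_ms get no delay.
function replayDelays(events, maxSleepMs = 3000) {
  let lastT = 0;
  return events.map(e => {
    if (typeof e.t_ms !== 'number') return 0;
    const delta = Math.min(maxSleepMs, Math.max(0, e.t_ms - lastT));
    lastT = e.t_ms;
    return delta;
  });
}

const delays = replayDelays([
  { type: 'tool_call', t_ms: 500 },
  { type: 'tool_call', t_ms: 9000 }, // long gap, capped at maxSleepMs
  { type: 'tool_call', t_ms: 8000 }, // out-of-order timestamp, clamped to 0
  { type: 'report' },                // no timestamp, no delay
]);
console.log(delays); // → [500, 3000, 0, 0]
```

The cap keeps a recording with a multi-minute tool call from stalling a test run, while `noDelay` in the real renderers bypasses the sleep entirely.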
--- a/mocks/lib/format-codex.mjs
+++ b/mocks/lib/format-codex.mjs
@ -0,0 +1,92 @@
+// OD-faithful codex renderer. Emits the shape OD's
+// `json-event-stream.ts:handleCodexEvent` parser accepts.
+// Codex parser only knows command_execution (Bash) + agent_message —
+// non-Bash tools are translated to a shell-equivalent command.
+
+import { writeFile } from 'node:fs/promises';
+
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
+function commandFor(name, input) {
+  if (!input || typeof input !== 'object') return `# ${name}`;
+  const o = input;
+  switch (name) {
+    case 'Bash':
+      return typeof o.command === 'string' ? o.command : '# Bash';
+    case 'Read': {
+      const fp = typeof o.file_path === 'string' ? o.file_path : '';
+      const notes = ['offset', 'limit'].filter(k => typeof o[k] === 'number').map(k => `${k}=${o[k]}`);
+      const suffix = notes.length > 0 ? ` # ${notes.join(' ')}` : ''; // keep limit= inside the shell comment even without offset
+      return `cat "${fp}"${suffix}`;
+    }
+    case 'Grep': {
+      const pat = typeof o.pattern === 'string' ? o.pattern : '';
+      const path = typeof o.path === 'string' ? o.path : '.';
+      return `grep -n "${pat.replace(/"/g, '\\"')}" "${path}"`;
+    }
+    case 'Glob': {
+      const pat = typeof o.pattern === 'string' ? o.pattern : '';
+      return `find . -path "${pat}" -print`;
+    }
+    case 'Edit':
+    case 'Write':
+      return `# ${name} ${typeof o.file_path === 'string' ? o.file_path : ''}`;
+    default:
+      return `# ${name} ${JSON.stringify(o).slice(0, 200)}`;
+  }
+}
+
+export async function renderAsCodex(events, opts = {}) {
+  const emit = opts.emit ?? (s => process.stdout.write(s));
+  const maxSleep = opts.maxSleepMs ?? 3000;
+  const meta = events.find(e => e.type === 'meta');
+
+  const results = new Map();
+  for (const e of events) if (e.type === 'tool_result') results.set(e.obs_id, e);
+
+  emit(JSON.stringify({ type: 'thread.started' }) + '\n');
+  emit(JSON.stringify({ type: 'turn.started' }) + '\n');
+
+  let lastT = 0;
+  let agentMessageCount = 0;
+  for (const e of events) {
+    if (e.type === 'meta' || e.type === 'stdout' || e.type === 'tool_result') continue;
+    const t = typeof e.t_ms === 'number' ? e.t_ms : undefined;
+    if (!opts.noDelay && t !== undefined) {
+      const delta = Math.min(maxSleep, Math.max(0, t - lastT));
+      if (delta > 0) await sleep(delta);
+      lastT = t;
+    }
+    if (e.type === 'tool_call') {
+      const result = results.get(e.obs_id);
+      const command = commandFor(e.name, e.input);
+      emit(JSON.stringify({
+        type: 'item.started',
+        item: { type: 'command_execution', id: e.obs_id, command },
+      }) + '\n');
+      emit(JSON.stringify({
+        type: 'item.completed',
+        item: {
+          type: 'command_execution',
+          id: e.obs_id,
+          command,
+          aggregated_output: result?.output ?? '',
+          exit_code: result?.status === 'error' ? 1 : 0,
+          status: result?.status === 'error' ? 'failed' : 'completed',
+        },
+      }) + '\n');
+    } else if (e.type === 'report') {
+      agentMessageCount += 1;
+      emit(JSON.stringify({
+        type: 'item.completed',
+        item: { type: 'agent_message', id: `mock-msg-${agentMessageCount}`, text: e.content },
+      }) + '\n');
+      if (opts.reportFile) await writeFile(opts.reportFile, e.content).catch(() => {});
+    }
+  }
+
+  emit(JSON.stringify({
+    type: 'turn.completed',
+    usage: { input_tokens: 0, output_tokens: meta?.total_tokens ?? 0, cached_input_tokens: 0 },
+  }) + '\n');
+}
--- a/mocks/lib/format-cursor-agent.mjs
+++ b/mocks/lib/format-cursor-agent.mjs
@ -0,0 +1,63 @@
+// OD-faithful cursor-agent renderer.
+//
+// Matches the JSONL shape OD's `json-event-stream.ts:handleCursorEvent`
+// parser accepts:
+//   {"type":"system","subtype":"init","model":"..."}
+//   {"type":"assistant","message":{
+//      "content":[{"type":"text","text":"..."}, ...]},
+//      "timestamp_ms": ...}
+//   {"type":"result","usage":{"inputTokens","outputTokens","cacheReadTokens","cacheWriteTokens"},
+//      "duration_ms": ...}
+//
+// Cursor's parser handles delta-vs-replacement detection itself: when a
+// later text block STARTS WITH the prior accumulated text, the parser
+// strips the prefix and emits only the delta. So we can either emit a
+// single message containing the full text, OR emit progressive chunks
+// where each chunk is the cumulative-so-far. We use the single-message
+// form — simplest, no risk of state desync.
+//
+// Like gemini, the cursor-agent parser does NOT recognize tool events.
+// Tool calls in the recording are ignored; only the final assistant
+// text is rendered.
+
+import { writeFile } from 'node:fs/promises';
+
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
+export async function renderAsCursorAgent(events, opts = {}) {
+  const emit = opts.emit ?? (s => process.stdout.write(s));
+  const maxSleep = opts.maxSleepMs ?? 2000;
+  const meta = events.find(e => e.type === 'meta');
+
+  emit(JSON.stringify({
+    type: 'system',
+    subtype: 'init',
+    model: meta?.model ?? 'cursor-default',
+  }) + '\n');
+
+  if (!opts.noDelay) await sleep(Math.min(maxSleep, 200));
+
+  for (const e of events) {
+    if (e.type === 'report') {
+      emit(JSON.stringify({
+        type: 'assistant',
+        message: {
+          content: [{ type: 'text', text: e.content }],
+        },
+        timestamp_ms: Date.now(),
+      }) + '\n');
+      if (opts.reportFile) await writeFile(opts.reportFile, e.content).catch(() => {});
+    }
+  }
+
+  emit(JSON.stringify({
+    type: 'result',
+    usage: {
+      inputTokens: 0,
+      outputTokens: meta?.total_tokens ?? 0,
+      cacheReadTokens: 0,
+      cacheWriteTokens: 0,
+    },
+    duration_ms: meta?.duration_ms ?? 0,
+  }) + '\n');
+}
--- a/mocks/lib/format-gemini.mjs
+++ b/mocks/lib/format-gemini.mjs
@ -0,0 +1,53 @@
+// OD-faithful gemini renderer.
+//
+// Matches the JSONL shape OD's `json-event-stream.ts:handleGeminiEvent`
+// parser accepts. The parser only recognizes THREE event types:
+//   {"type":"init","model":"..."}                        → status:initializing
+//   {"type":"message","role":"assistant","content":"…"}  → text_delta
+//   {"type":"result","stats":{...}}                      → usage
+//
+// Notably ABSENT: any tool-call event shape. OD's gemini surface doesn't
+// render tool calls in the UI — they're stripped at the parser layer.
+// So our renderer only emits the final assistant text wrapped in the
+// init/message/result envelope. Tool calls in the recording are ignored.
+
+import { writeFile } from 'node:fs/promises';
+
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
+export async function renderAsGemini(events, opts = {}) {
+  const emit = opts.emit ?? (s => process.stdout.write(s));
+  const maxSleep = opts.maxSleepMs ?? 2000;
+  const meta = events.find(e => e.type === 'meta');
+
+  emit(JSON.stringify({
+    type: 'init',
+    model: meta?.model ?? 'gemini-2.5-pro',
+  }) + '\n');
+
+  // Stream the report text as one assistant message. It could instead be
+  // chunked for a more "live streaming" feel, since OD's gemini parser
+  // also accepts multiple message events (each is emitted as a text_delta).
+  if (!opts.noDelay) await sleep(Math.min(maxSleep, 200));
+  for (const e of events) {
+    if (e.type === 'report') {
+      emit(JSON.stringify({
+        type: 'message',
+        role: 'assistant',
+        content: e.content,
+      }) + '\n');
+      if (opts.reportFile) await writeFile(opts.reportFile, e.content).catch(() => {});
+    }
+  }
+
+  // Final stats wrapper.
+  emit(JSON.stringify({
+    type: 'result',
+    stats: {
+      input_tokens: 0,
+      output_tokens: meta?.total_tokens ?? 0,
+      cached: 0,
+      duration_ms: meta?.duration_ms ?? 0,
+    },
+  }) + '\n');
+}
--- a/mocks/lib/format-opencode.mjs
+++ b/mocks/lib/format-opencode.mjs
@ -0,0 +1,60 @@
+// OD-faithful opencode renderer. Emits the JSONL shape OD's
+// `json-event-stream.ts:handleOpenCodeEvent` parser accepts.
+
+import { writeFile } from 'node:fs/promises';
+
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
+export async function renderAsOpencode(events, opts = {}) {
+  const emit = opts.emit ?? (s => process.stdout.write(s));
+  const maxSleep = opts.maxSleepMs ?? 3000;
+  const sessionId = opts.sessionId ?? `mock-${Date.now()}`;
+  const meta = events.find(e => e.type === 'meta');
+
+  const results = new Map();
+  for (const e of events) if (e.type === 'tool_result') results.set(e.obs_id, e);
+
+  emit(JSON.stringify({ type: 'step_start' }) + '\n');
+
+  let lastT = 0;
+  for (const e of events) {
+    if (e.type === 'meta' || e.type === 'stdout' || e.type === 'tool_result') continue;
+    const t = typeof e.t_ms === 'number' ? e.t_ms : undefined;
+    if (!opts.noDelay && t !== undefined) {
+      const delta = Math.min(maxSleep, Math.max(0, t - lastT));
+      if (delta > 0) await sleep(delta);
+      lastT = t;
+    }
+    if (e.type === 'tool_call') {
+      const result = results.get(e.obs_id);
+      const isErr = result?.status === 'error';
+      emit(JSON.stringify({
+        type: 'tool_use',
+        sessionID: sessionId,
+        part: {
+          tool: e.name,
+          callID: e.obs_id,
+          state: {
+            input: e.input ?? null,
+            status: isErr ? 'failed' : 'completed',
+            output: result?.output ?? '',
+          },
+        },
+      }) + '\n');
+    } else if (e.type === 'report') {
+      emit(JSON.stringify({
+        type: 'text',
+        part: { text: e.content },
+      }) + '\n');
+      if (opts.reportFile) await writeFile(opts.reportFile, e.content).catch(() => {});
+    }
+  }
+
+  emit(JSON.stringify({
+    type: 'step_finish',
+    part: {
+      tokens: { input: 0, output: meta?.total_tokens ?? 0, reasoning: 0, cache: { read: 0, write: 0 } },
+      cost: 0,
+    },
+  }) + '\n');
+}
--- a/mocks/lib/format-plain.mjs
+++ b/mocks/lib/format-plain.mjs
@ -0,0 +1,40 @@
+// Plain (raw stdout) renderer. OD's `plain` streamFormat (deepseek /
+// qwen / grok / grok-build) treats stdout as the final assistant response.
+// We emit only the report content, plus an HTML-comment tool summary when opts.includeToolTrace is set.
+
+import { writeFile } from 'node:fs/promises';
+
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
+export async function renderAsPlain(events, opts = {}) {
+  const emit = opts.emit ?? (s => process.stdout.write(s));
+  const maxSleep = opts.maxSleepMs ?? 2000;
+
+  if (opts.includeToolTrace) {
+    const tools = events.filter(e => e.type === 'tool_call');
+    if (tools.length > 0) {
+      emit(`<!-- tools used (${tools.length}):\n`);
+      for (const tc of tools) {
+        const summary = (() => {
+          if (!tc.input || typeof tc.input !== 'object') return '';
+          const o = tc.input;
+          if (typeof o.file_path === 'string') return o.file_path;
+          if (typeof o.command === 'string') return o.command;
+          if (typeof o.pattern === 'string') return `${o.pattern} in ${o.path ?? '.'}`;
+          return JSON.stringify(o).slice(0, 80);
+        })();
+        emit(`  ${tc.name} ${summary}\n`);
+      }
+      emit(`-->\n`);
+    }
+  }
+
+  for (const e of events) {
+    if (e.type === 'report') {
+      if (!opts.noDelay) await sleep(Math.min(maxSleep, 200));
+      emit(e.content);
+      if (!e.content.endsWith('\n')) emit('\n');
+      if (opts.reportFile) await writeFile(opts.reportFile, e.content).catch(() => {});
+    }
+  }
+}
--- a/mocks/lib/format-vela.mjs
+++ b/mocks/lib/format-vela.mjs
@ -0,0 +1,224 @@
+// OD-faithful vela renderer for AMR (Agent Management Runtime).
+//
+// vela is the bin OD's AMR runtime spawns; `bin: 'vela'` in
+// apps/daemon/src/runtimes/defs/amr.ts. The protocol is ACP JSON-RPC
+// over stdio, but vela's spec extends the generic ACP shape with:
+//
+//   1. `initialize` response carries `agentCapabilities` and `models`
+//      (currentModelId + availableModels).
+//   2. `session/new` response carries the same `models` block.
+//   3. `session/prompt` is REJECTED unless `session/set_model` (or
+//      `session/set_config_option`) has been called for the current
+//      sessionId — the strict-set_model gate. This is the contract real
+//      vela 0.0.1 enforces; we mirror it so a regression in
+//      attachAcpSession that silently skips set_model gets caught.
+//   4. Final prompt response includes `usage.{inputTokens, outputTokens,
+//      totalTokens}` (no cachedReadTokens — vela doesn't surface those).
+//
+// Error-injection envs (mirror apps/daemon/tests/fixtures/fake-vela.mjs):
+//   FAKE_VELA_SESSION_NEW_ERROR — fail session/new with this message
+//   FAKE_VELA_SET_MODEL_ERROR   — fail session/set_model
+//   FAKE_VELA_PROMPT_ERROR      — fail session/prompt
+//   FAKE_VELA_REQUIRE_SET_MODEL='0' — disable the strict gate (legacy)
+//
+// Content envs:
+//   FAKE_VELA_SESSION_ID  — the sessionId returned by session/new
+//   FAKE_VELA_TEXT        — override the assistant text (defaults to recording's
+//                           report content)
+//   FAKE_VELA_THOUGHT     — optional thought_chunk emitted before text
+
+import { writeFile } from 'node:fs/promises';
+
+const PROTOCOL_VERSION = 1;
+const sleep = ms => new Promise(r => setTimeout(r, ms));
+
+const DEFAULT_AVAILABLE_MODELS = [
+  { modelId: 'openai/gpt-5.4-mini',         name: 'gpt-5.4-mini' },
+  { modelId: 'anthropic/claude-3.7-sonnet', name: 'claude-3.7-sonnet' },
+  { modelId: 'deepseek-v3.2',               name: 'deepseek-v3.2' },
+  { modelId: 'glm-5.1',                     name: 'glm-5.1' },
+];
+
+function writeRpcResult(out, id, result) {
+  out.write(JSON.stringify({ jsonrpc: '2.0', id, result }) + '\n');
+}
+
+function writeRpcError(out, id, message, code = -32603) {
+  out.write(JSON.stringify({ jsonrpc: '2.0', id, error: { code, message } }) + '\n');
+}
+
+function writeNotification(out, method, params) {
+  out.write(JSON.stringify({ jsonrpc: '2.0', method, params }) + '\n');
+}
+
+/** Split long text on word boundaries — chunk size mirrors the real vela
+ *  agent server's typical streaming cadence (the existing fake-vela.mjs
+ *  uses 16-char chunks; we use 400 to match the higher-throughput
+ *  recordings without flooding the test harness). */
+function chunkText(text, target = 400) {
+  if (text.length <= target) return [text];
+  const chunks = [];
+  let pos = 0;
+  while (pos < text.length) {
+    let end = Math.min(text.length, pos + target);
+    if (end < text.length) {
+      const ws = text.indexOf(' ', end);
+      if (ws !== -1 && ws - end < 200) end = ws + 1;
+    }
+    chunks.push(text.slice(pos, end));
+    pos = end;
+  }
+  return chunks;
+}
+
+/**
+ * Run the vela ACP server on stdin/stdout. Returns a promise that resolves
+ * when stdin closes or errors, on SIGINT/SIGTERM, or after the 30s safety timeout.
+ */
+export async function runVelaAcpServer(events, opts = {}) {
+  const out = typeof opts.emit === 'function' ? { write: opts.emit } : (opts.emit ?? process.stdout);
+  const meta = events.find(e => e.type === 'meta');
+  const reportEvent = events.find(e => e.type === 'report');
+  const reportText = process.env.FAKE_VELA_TEXT ?? reportEvent?.content ?? '';
+  const thoughtText = process.env.FAKE_VELA_THOUGHT ?? '';
+
+  const sessionId = opts.sessionId ?? process.env.FAKE_VELA_SESSION_ID ?? `fake-vela-session-${Date.now()}`;
+  const strict = process.env.FAKE_VELA_REQUIRE_SET_MODEL !== '0';
+  const SESSION_NEW_ERROR = process.env.FAKE_VELA_SESSION_NEW_ERROR ?? '';
+  const SET_MODEL_ERROR   = process.env.FAKE_VELA_SET_MODEL_ERROR   ?? '';
+  const PROMPT_ERROR      = process.env.FAKE_VELA_PROMPT_ERROR      ?? '';
+
+  let currentModelId = null;
+  const sessionsWithModel = new Set();
+  let buf = '';
+  let aborted = false;
+
+  const startPromptResponse = async (id, sid) => {
+    if (thoughtText) {
+      writeNotification(out, 'session/update', {
+        sessionId: sid,
+        update: {
+          sessionUpdate: 'agent_thought_chunk',
+          content: { type: 'text', text: thoughtText },
+        },
+      });
+    }
+    const chunks = chunkText(reportText, 400);
+    for (const chunk of chunks) {
+      if (aborted) break;
+      if (!opts.noDelay) await sleep(30);
+      writeNotification(out, 'session/update', {
+        sessionId: sid,
+        update: {
+          sessionUpdate: 'agent_message_chunk',
+          content: { type: 'text', text: chunk },
+        },
+      });
+    }
+    writeRpcResult(out, id, {
+      stopReason: 'end_turn',
+      usage: {
+        inputTokens: 12,
+        outputTokens: meta?.total_tokens ?? 7,
+        totalTokens: (meta?.total_tokens ?? 7) + 12,
+      },
+    });
+    if (opts.reportFile) await writeFile(opts.reportFile, reportText).catch(() => {});
+  };
+
+  const handleRpc = async (obj) => {
+    if (!obj || typeof obj !== 'object') return;
+    const { id, method, params } = obj;
+    switch (method) {
+      case 'initialize':
+        writeRpcResult(out, id, {
+          protocolVersion: PROTOCOL_VERSION,
+          agentCapabilities: { promptCapabilities: { embeddedContext: false } },
+          models: {
+            currentModelId,
+            availableModels: DEFAULT_AVAILABLE_MODELS,
+          },
+        });
+        return;
+      case 'session/new': {
+        if (SESSION_NEW_ERROR) {
+          writeRpcError(out, id, SESSION_NEW_ERROR);
+          return;
+        }
+        writeRpcResult(out, id, {
+          sessionId,
+          models: { currentModelId, availableModels: DEFAULT_AVAILABLE_MODELS },
+        });
+        return;
+      }
+      case 'session/set_model': {
+        if (SET_MODEL_ERROR) {
+          writeRpcError(out, id, SET_MODEL_ERROR, -32099);
+          return;
+        }
+        const next = typeof params?.modelId === 'string' ? params.modelId.trim() : '';
+        const sid = typeof params?.sessionId === 'string' ? params.sessionId : sessionId;
+        if (next) currentModelId = next;
+        sessionsWithModel.add(sid);
+        writeRpcResult(out, id, {});
+        return;
+      }
+      case 'session/set_config_option': {
+        // Treat config-option model selection as set_model for the
+        // strict-set_model gate.
+        const sid = typeof params?.sessionId === 'string' ? params.sessionId : sessionId;
+        sessionsWithModel.add(sid);
+        writeRpcResult(out, id, {});
+        return;
+      }
+      case 'session/prompt': {
+        if (PROMPT_ERROR) {
+          writeRpcError(out, id, PROMPT_ERROR, -32602);
+          return;
+        }
+        const sid = typeof params?.sessionId === 'string' ? params.sessionId : sessionId;
+        if (strict && !sessionsWithModel.has(sid)) {
+          writeRpcError(out, id, 'session/set_model must be called before session/prompt', -32602);
+          return;
+        }
+        void startPromptResponse(id, sid);
+        return;
+      }
+      case 'session/cancel':
+        aborted = true;
+        return;
+      default:
+        if (id !== undefined && id !== null) {
+          writeRpcError(out, id, `unknown method: ${method}`, -32601);
+        }
+    }
+  };
+
+  return new Promise((resolve) => {
+    process.stdin.setEncoding('utf-8');
+    process.stdin.on('data', async (data) => {
+      buf += data;
+      let nl;
+      while ((nl = buf.indexOf('\n')) !== -1) {
+        const line = buf.slice(0, nl).trim();
+        buf = buf.slice(nl + 1);
+        if (!line) continue;
+        let obj;
+        try { obj = JSON.parse(line); } catch { continue; }
+        await handleRpc(obj);
+      }
+    });
+    process.stdin.on('end', () => resolve());
+    process.stdin.on('error', () => resolve());
+
+    const onSig = () => { aborted = true; resolve(); };
+    process.once('SIGINT', onSig);
+    process.once('SIGTERM', onSig);
+
+    // 30s safety timeout (fires whether or not a prompt arrived); .unref() so a completed session exits promptly.
+    setTimeout(() => {
+      process.stderr.write('[mock-vela] 30s safety timeout reached; exiting\n');
+      resolve();
+    }, 30_000).unref();
+  });
+}
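The strict-set_model gate described in the header of `format-vela.mjs` can be sketched as a small, I/O-free state machine. `createGate` and its simplified replies are hypothetical names invented for this note; the mock itself writes its replies to stdout and also handles `initialize`, `session/new` and error injection, which are omitted here.

```javascript
// Hypothetical, I/O-free model of the gate: returns the JSON-RPC reply
// the mock would write for each request (simplified result payloads).
function createGate({ strict = true } = {}) {
  const sessionsWithModel = new Set();
  return function handle({ id, method, params }) {
    const sid = params?.sessionId;
    if (method === 'session/set_model' || method === 'session/set_config_option') {
      sessionsWithModel.add(sid);
      return { jsonrpc: '2.0', id, result: {} };
    }
    if (method === 'session/prompt') {
      if (strict && !sessionsWithModel.has(sid)) {
        return {
          jsonrpc: '2.0',
          id,
          error: { code: -32602, message: 'session/set_model must be called before session/prompt' },
        };
      }
      return { jsonrpc: '2.0', id, result: { stopReason: 'end_turn' } };
    }
    return { jsonrpc: '2.0', id, error: { code: -32601, message: `unknown method: ${method}` } };
  };
}

const handle = createGate();
const rejected = handle({ id: 1, method: 'session/prompt', params: { sessionId: 's1' } });
handle({ id: 2, method: 'session/set_model', params: { sessionId: 's1', modelId: 'glm-5.1' } });
const accepted = handle({ id: 3, method: 'session/prompt', params: { sessionId: 's1' } });
console.log(rejected.error.code, accepted.result.stopReason); // → -32602 end_turn
```

Tracking the gate per `sessionId` rather than globally means a client that sets the model on one session and prompts on another is still rejected, which is the regression the gate is meant to catch.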
--- a/mocks/lib/recording-picker.mjs
+++ b/mocks/lib/recording-picker.mjs
@ -0,0 +1,128 @@
+// Pick which recording to play back, driven by env vars.
+//
+// Priority order:
+//   1. OD_MOCKS_TRACE                → fixed trace id (or prefix)
+//   2. OD_MOCKS_BY_PROMPT_HASH=1     → hash(prompt) → trace
+//   3. OD_MOCKS_POOL=<tag>           → random within tag pool
+//   4. (default)                     → random across all
+//
+// OD_MOCKS_SEED gives reproducible "random" selection.
+// OD_MOCKS_RECORDINGS_DIR overrides the default recordings dir
+// (defaults to ../recordings/ relative to this file).
+
+import { readdir, readFile, stat } from 'node:fs/promises';
+import { createHash } from 'node:crypto';
+import { join, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const HERE = dirname(fileURLToPath(import.meta.url));
+export const DEFAULT_RECORDINGS_DIR =
+  process.env.OD_MOCKS_RECORDINGS_DIR ||
+  join(HERE, '..', 'recordings');
+
+async function listRecordings(dir) {
+  try {
+    return (await readdir(dir))
+      .filter(f => f.endsWith('.jsonl'))
+      .map(f => f.replace(/\.jsonl$/, ''))
+      .sort();
+  } catch {
+    return [];
+  }
+}
+
+async function readMeta(dir, traceId) {
+  try {
+    const txt = await readFile(join(dir, `${traceId}.jsonl`), 'utf-8');
+    const firstLine = txt.split('\n', 1)[0];
+    return JSON.parse(firstLine);
+  } catch {
+    return null;
+  }
+}
+
+function pickRandom(arr, seed) {
+  if (arr.length === 0) return null;
+  if (!seed) return arr[Math.floor(Math.random() * arr.length)];
+  const h = parseInt(createHash('sha256').update(String(seed)).digest('hex').slice(0, 12), 16);
+  return arr[h % arr.length];
+}
+
+export async function pickRecording({ prompt } = {}) {
+  const dir = DEFAULT_RECORDINGS_DIR;
+  const all = await listRecordings(dir);
+  if (all.length === 0) return null;
+
+  // 1. fixed: if the env is set, refuse to fall through to random / pool /
+  //    hash selection. A typo in `OD_MOCKS_TRACE` should surface loudly
+  //    rather than silently select a different trace and poison a test.
+  const fixed = process.env.OD_MOCKS_TRACE;
+  if (fixed) {
+    const hit = all.find(id => id === fixed) ?? all.find(id => id.startsWith(fixed));
+    if (hit) return { traceId: hit, path: join(dir, `${hit}.jsonl`), method: 'fixed' };
+    throw new Error(
+      `OD_MOCKS_TRACE="${fixed}" set but no matching recording in ${dir}. ` +
+      `8-char id prefix is supported; check spelling and that the corpus has been fetched ` +
+      `(\`bash mocks/scripts/fetch-recordings.sh\`).`
+    );
+  }
+
+  // 2. prompt-hash
+  if (process.env.OD_MOCKS_BY_PROMPT_HASH === '1' && prompt) {
+    const picked = pickRandom(all, prompt);
+    if (picked) return { traceId: picked, path: join(dir, `${picked}.jsonl`), method: 'hash' };
+  }
+
+  // 3. pool by tag — supports structured `<dimension>:<value>` shortcuts
+  //    documented in README (agent:claude, skill:agent-browser,
+  //    outcome:failed). The dimension routes to the right meta field;
+  //    bare values fall back to tag substring match. Mirrors the
+  //    OD_MOCKS_TRACE policy: if the env is set and matches nothing,
+  //    refuse to fall through to global random — surface the typo.
+  const pool = process.env.OD_MOCKS_POOL;
+  if (pool) {
+    const colonIdx = pool.indexOf(':');
+    const dim = colonIdx >= 0 ? pool.slice(0, colonIdx) : null;
+    const value = colonIdx >= 0 ? pool.slice(colonIdx + 1) : null;
+
+    const candidates = [];
+    for (const id of all) {
+      const meta = await readMeta(dir, id);
+      if (!meta) continue;
+      const tags = meta.tags ?? [];
+
+      let match = false;
+      if (dim === 'outcome' && meta.outcome === value)               match = true;
+      else if (dim === 'agent'  && meta.agent === value)             match = true;
+      else if (dim === 'skill'  && tags.some(t => t === `skill:${value}`)) match = true;
+      else if (tags.includes(pool))                                  match = true;
+      else if (meta.agent === pool)                                  match = true;
+      else if (tags.some(t => typeof t === 'string' && t.includes(pool))) match = true;
+
+      if (match) candidates.push(id);
+    }
+    if (candidates.length === 0) {
+      throw new Error(
+        `OD_MOCKS_POOL="${pool}" matched no recordings in ${dir}. ` +
+        `Supported shapes: agent:<name>, skill:<name>, outcome:<succeeded|failed|errored>, ` +
+        `or any tag substring. Check candidates with ` +
+        `\`jq '[.entries[] | {agent, outcome, skills}] | unique' mocks/manifest.json\`.`,
+      );
+    }
+    const picked = pickRandom(candidates, process.env.OD_MOCKS_SEED);
+    if (picked) return { traceId: picked, path: join(dir, `${picked}.jsonl`), method: 'pool', pool };
+  }
+
+  // 4. random
+  const picked = pickRandom(all, process.env.OD_MOCKS_SEED);
+  if (!picked) return null;
+  return { traceId: picked, path: join(dir, `${picked}.jsonl`), method: 'random' };
+}
+
+export async function readRecording(path) {
+  const text = await readFile(path, 'utf-8');
+  return text
+    .split('\n')
+    .filter(Boolean)
+    .map(line => JSON.parse(line));
+}
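The `OD_MOCKS_POOL` matching rules in `pickRecording` can be read as a single predicate over a recording's meta line. `matchesPool` below is a hypothetical extraction of that inline logic, and the `meta` object is made-up sample data; real recordings may carry additional fields.

```javascript
// Hypothetical extraction of the per-recording match test in pickRecording.
function matchesPool(meta, pool) {
  const tags = meta.tags ?? [];
  const colonIdx = pool.indexOf(':');
  const dim = colonIdx >= 0 ? pool.slice(0, colonIdx) : null;
  const value = colonIdx >= 0 ? pool.slice(colonIdx + 1) : null;
  // Structured `<dimension>:<value>` shortcuts route to a meta field.
  if (dim === 'outcome' && meta.outcome === value) return true;
  if (dim === 'agent' && meta.agent === value) return true;
  if (dim === 'skill' && tags.includes(`skill:${value}`)) return true;
  // Bare values: exact tag, exact agent name, then tag substring.
  if (tags.includes(pool) || meta.agent === pool) return true;
  return tags.some(t => typeof t === 'string' && t.includes(pool));
}

// Made-up meta line for illustration.
const meta = { agent: 'claude', outcome: 'failed', tags: ['skill:agent-browser', 'long-session'] };
console.log(
  matchesPool(meta, 'agent:claude'),
  matchesPool(meta, 'outcome:succeeded'),
  matchesPool(meta, 'browser'),
); // → true false true
```

Note that the substring fallback is permissive: a bare value such as `browser` matches any tag containing it, so structured `agent:` / `skill:` / `outcome:` forms are the safer choice when a test depends on a specific pool.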
--- a/mocks/lib/vela-subcommands.mjs
+++ b/mocks/lib/vela-subcommands.mjs
@ -0,0 +1,99 @@
+// vela subcommand handlers (login + models) used by the mock CLI when
+// the first positional argument is not the default 'agent' (which falls
+// through to the ACP server in format-vela.mjs).
+//
+// Mirrors the contract that
+// apps/daemon/tests/fixtures/fake-vela.mjs implements — kept in sync
+// because both feed the same daemon-side login route + status reader
+// (apps/daemon/src/integrations/vela-profile.ts +
+// apps/web/src/components/amrLoginPolling.ts).
+
+import { mkdirSync, writeFileSync } from 'node:fs';
+import { homedir } from 'node:os';
+import { dirname, join } from 'node:path';
+
+const DEFAULT_MODELS_STDOUT = [
+  'public_model_deepseek_v3_2          vela',
+  'public_model_deepseek_v4_flash      vela',
+  'public_model_deepseek_v4_pro        vela',
+  'public_model_gemini_2_5_flash       vela',
+  'public_model_gemini_3_1_flash_lite_preview  vela',
+  'public_model_gemini_3_1_pro_preview vela',
+  'public_model_gpt_5_4                vela',
+  'public_model_gpt_5_4_mini           vela',
+  'public_model_glm_5                  vela',
+  'public_model_glm_5_1                vela',
+  'public_model_gpt_image_2            vela',
+  'public_model_kimi_k2_6              vela',
+  'public_model_minimax_m2_7           vela',
+  'public_model_qwen3_235b_a22b        vela',
+  'public_model_seedance_2             vela',
+].join('\n');
+
+/**
+ * `vela login` — writes the AMR config file the daemon's status reader
+ * + AmrLoginPill component expect on disk. The real vela goes through a
+ * device-authorization browser approval; we skip the loop and just
+ * project the same on-disk artifact a successful real login produces.
+ *
+ * Envs (compat with fake-vela.mjs):
+ *   VELA_PROFILE                — profile slot to populate (prod|test|local)
+ *   FAKE_VELA_LOGIN_DELAY_MS    — sleep before the write (test in-flight states)
+ *   FAKE_VELA_LOGIN_USER_EMAIL  — email written into the profile
+ *   FAKE_VELA_LOGIN_USER_PLAN   — plan written into the profile
+ *   FAKE_VELA_LOGIN_FAIL        — when set, prints to stderr + exits 1
+ */
+export async function runVelaLogin() {
+  if (process.env.FAKE_VELA_LOGIN_FAIL) {
+    process.stderr.write(`${process.env.FAKE_VELA_LOGIN_FAIL}\n`);
+    process.exit(1);
+  }
+  const allowed = new Set(['prod', 'test', 'local']);
+  const requested = (process.env.VELA_PROFILE || 'prod').trim() || 'prod';
+  const profile = allowed.has(requested) ? requested : 'prod';
+  if (!allowed.has(requested)) {
+    process.stderr.write(`[mock-vela] unknown profile ${requested}; defaulting to prod\n`);
+  }
+  const delayMs = Number(process.env.FAKE_VELA_LOGIN_DELAY_MS) || 0;
+  const userEmail = process.env.FAKE_VELA_LOGIN_USER_EMAIL || 'fake-user@example.com';
+  const userPlan  = process.env.FAKE_VELA_LOGIN_USER_PLAN  || 'free';
+
+  const write = () => {
+    const file = join(homedir(), '.amr', 'config.json');
+    mkdirSync(dirname(file), { recursive: true });
+    const payload = {
+      profiles: {
+        [profile]: {
+          // Fake credentials — never used by real vela traffic since the
+          // daemon's link client points at the same FAKE_VELA_API_URL.
+          controlKey: 'fake-control-key-0000000000000000000000',
+          runtimeKey: 'fake-runtime-key-0000000000000000000000',
+          apiUrl:  profile === 'local' ? 'http://localhost:18080' : '',
+          linkUrl: profile === 'local' ? 'http://localhost:18081' : '',
+          user: {
+            id: 'fake-user-id',
+            email: userEmail,
+            name: 'Fake User',
+            plan: userPlan,
+          },
+        },
+      },
+    };
+    writeFileSync(file, JSON.stringify(payload, null, 2), 'utf8');
+    process.stdout.write(`Login successful for ${userEmail}.\n`);
+    process.exit(0);
+  };
+
+  if (delayMs > 0) setTimeout(write, delayMs);
+  else write();
+}
+
+/**
+ * `vela models` — prints the production-shaped public model catalog.
+ * Override via FAKE_VELA_MODELS env (newline-separated lines).
+ */
+export function runVelaModels() {
+  const out = process.env.FAKE_VELA_MODELS || DEFAULT_MODELS_STDOUT;
+  process.stdout.write(`${out}\n`);
+  process.exit(0);
+}
--- a/mocks/manifest.json
+++ b/mocks/manifest.json
--- a/mocks/mock-agent.mjs
+++ b/mocks/mock-agent.mjs
@ -0,0 +1,149 @@
+#!/usr/bin/env node
+/**
+ * mock-agent.mjs — pretends to be one of OD's supported agent CLIs
+ * (claude / opencode / codex / gemini / cursor-agent / deepseek / qwen /
+ * grok, plus the ACP family and vela) by streaming a pre-recorded session
+ * in that CLI's native stdout protocol. Zero LLM tokens.
+ *
+ * Usage (driven by the wrappers in bin/, not directly):
+ *   ./mock-agent.mjs --as opencode [--no-delay] [--report-file <path>]
+ *
+ * Recording selection — see lib/recording-picker.mjs. The wrappers
+ * announce the picked trace id on stderr.
+ *
+ * Trace data: ./recordings/<trace-id>.jsonl (anonymized exports from
+ * Langfuse). Index: ./recordings/index.json.
+ */
+
+import { pickRecording, readRecording } from './lib/recording-picker.mjs';
+import { renderAsOpencode }    from './lib/format-opencode.mjs';
+import { renderAsCodex }       from './lib/format-codex.mjs';
+import { renderAsClaude }      from './lib/format-claude.mjs';
+import { renderAsGemini }      from './lib/format-gemini.mjs';
+import { renderAsCursorAgent } from './lib/format-cursor-agent.mjs';
+import { renderAsPlain }       from './lib/format-plain.mjs';
+import { runAcpServer }        from './lib/format-acp.mjs';
+import { runVelaAcpServer }    from './lib/format-vela.mjs';
+import { runVelaLogin, runVelaModels } from './lib/vela-subcommands.mjs';
+
+function parseArgs(argv) {
+  const opts = { as: null, noDelay: false, reportFile: null, positionals: [] };
+  for (let i = 0; i < argv.length; i++) {
+    const a = argv[i];
+    if (a === '--as' || a === '--agent') { opts.as = argv[++i]; continue; }
+    if (a === '--no-delay')              { opts.noDelay = true; continue; }
+    if (a === '--report-file')           { opts.reportFile = argv[++i]; continue; }
+    if (a.startsWith('-')) continue;     // Unknown flag — silently skip (model/permission flags etc.)
+    // Anything left is a positional — used by vela subcommand dispatch.
+    opts.positionals.push(a);
+  }
+  if (process.env.OD_MOCKS_NO_DELAY === '1') opts.noDelay = true;
+  // Fall through to REPORT_FILE env when --report-file wasn't supplied.
+  // Some harnesses (e.g. the agent-pr-explore orchestrator) set
+  // REPORT_FILE as env but expect the agent to write there
+  // autonomously — real opencode/claude do via their Write tool, but
+  // the mock needs to project the recording's final assistant text to
+  // that path so the harness sees a report.
+  if (!opts.reportFile && process.env.REPORT_FILE) {
+    opts.reportFile = process.env.REPORT_FILE;
+  }
+  return opts;
+}
+
+async function readStdinIfPiped() {
+  if (process.stdin.isTTY) return '';
+  return new Promise(resolve => {
+    let acc = '';
+    process.stdin.setEncoding('utf-8');
+    process.stdin.on('data', c => { acc += c; });
+    process.stdin.on('end',  () => resolve(acc));
+    process.stdin.on('error', () => resolve(acc));
+    // Safety timeout if the parent never closes stdin (PTY); unref'd so it can't delay exit.
+    setTimeout(() => resolve(acc), 1500).unref();
+  });
+}
+
+async function main() {
+  const opts = parseArgs(process.argv.slice(2));
+  if (!opts.as) {
+    process.stderr.write(
+      'mock-agent: --as <agent> required\n' +
+      '  supported: opencode | claude | codex | gemini | cursor-agent |\n' +
+      '             deepseek | qwen | grok | plain |\n' +
+      '             devin | hermes | kilo | kimi | kiro | vibe   (ACP)\n' +
+      '             vela                                          (AMR — vela CLI)\n',
+    );
+    process.exit(2);
+  }
+
+  // `vela` dispatches by the first positional arg passed by OD (login /
+  // models / agent). Subcommands run BEFORE recording selection because
+  // they don't use trace data at all.
+  if (opts.as === 'vela') {
+    const cmd = (opts.positionals[0] || '').trim();
+    if (cmd === 'login')  return runVelaLogin();
+    if (cmd === 'models') return runVelaModels();
+    // Default: `agent run --runtime opencode` — fall through to the ACP
+    // server below with the vela-flavored protocol.
+  }
+
+  // ACP agents read JSON-RPC messages off stdin one line at a time, so the
+  // bulk-prompt buffering logic below doesn't apply — pickRecording sees no
+  // prompt for hash-mode (use OD_MOCKS_TRACE or _POOL instead).
+  const ACP_AGENTS = new Set(['devin', 'hermes', 'kilo', 'kimi', 'kiro', 'vibe', 'vela']);
+  const isAcp = ACP_AGENTS.has(opts.as);
+  const prompt = isAcp ? '' : await readStdinIfPiped();
+  const picked = await pickRecording({ prompt });
+  if (!picked) {
+    process.stderr.write(
+      'mock-agent: no recordings on disk yet.\n' +
+      'The recording corpus is hosted on Cloudflare R2 (see mocks/manifest.json)\n' +
+      'and is fetched on demand. Run:\n' +
+      '\n' +
+      '  bash mocks/scripts/fetch-recordings.sh             # all 179 (~30s, 4.5MB)\n' +
+      '  bash mocks/scripts/fetch-recordings.sh --agent claude   # subset\n' +
+      '\n' +
+      'Or set OD_MOCKS_RECORDINGS_DIR if you stashed them elsewhere.\n',
+    );
+    process.exit(3);
+  }
+
+  process.stderr.write(
+    `[mock-${opts.as}] picked ${picked.traceId.slice(0, 8)}… via ${picked.method}` +
+    (picked.pool ? ` (pool="${picked.pool}")` : '') +
+    '\n',
+  );
+
+  const events = await readRecording(picked.path);
+  const renderOpts = { noDelay: opts.noDelay, reportFile: opts.reportFile };
+
+  switch (opts.as) {
+    case 'opencode':     await renderAsOpencode(events, renderOpts);    break;
+    case 'codex':        await renderAsCodex(events, renderOpts);       break;
+    case 'claude':       await renderAsClaude(events, renderOpts);      break;
+    case 'gemini':       await renderAsGemini(events, renderOpts);      break;
+    case 'cursor-agent': await renderAsCursorAgent(events, renderOpts); break;
+    case 'deepseek':
+    case 'qwen':
+    case 'grok':
+    case 'plain':        await renderAsPlain(events, renderOpts);       break;
+    // ACP family — JSON-RPC server over stdio.
+    case 'devin':
+    case 'hermes':
+    case 'kilo':
+    case 'kimi':
+    case 'kiro':
+    case 'vibe':         await runAcpServer(events, renderOpts);        break;
+    // AMR (vela CLI) — ACP with vela-specific protocol extensions
+    // (agentCapabilities + models block + strict set_model gate).
+    case 'vela':         await runVelaAcpServer(events, renderOpts);    break;
+    default:
+      process.stderr.write(`mock-agent: unknown agent "${opts.as}"\n`);
+      process.exit(2);
+  }
+}
+
+main().catch(err => {
+  process.stderr.write(`mock-agent: ${err.message}\n`);
+  process.exit(1);
+});
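
The flag handling in `parseArgs` has one consequence worth seeing in isolation: unknown flags are skipped, but their values are not. The sketch below is a reduced copy of the loop (env handling and `--report-file` omitted), and `--model` is only an assumed example of an unknown flag, not one confirmed by this change.

```javascript
// Reduced copy of parseArgs for illustration; not the full function.
function parseArgs(argv) {
  const opts = { as: null, noDelay: false, positionals: [] };
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    if (a === '--as' || a === '--agent') { opts.as = argv[++i]; continue; }
    if (a === '--no-delay') { opts.noDelay = true; continue; }
    if (a.startsWith('-')) continue; // unknown flag: the flag is dropped, its value is not
    opts.positionals.push(a);
  }
  return opts;
}

// Subcommand first: positionals[0] is the subcommand, as the vela dispatch expects.
const typical = parseArgs(['--as', 'vela', 'agent', 'run', '--runtime', 'opencode']);
console.log(typical.positionals); // [ 'agent', 'run', 'opencode' ]

// Assumed flag with a value placed before the subcommand: the value
// becomes positionals[0] and the `login` dispatch would be missed.
const shifted = parseArgs(['--as', 'vela', '--model', 'x', 'login']);
console.log(shifted.positionals); // [ 'x', 'login' ]
```

If a caller ever passes value-carrying flags ahead of the vela subcommand, the dispatch would need either a list of known value flags or a scan for the first positional that is a recognized subcommand.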
--- a/mocks/recordings/.gitignore
+++ b/mocks/recordings/.gitignore
@ -0,0 +1,6 @@
+# Recording corpus is hosted on Cloudflare R2 (see ../manifest.json).
+# Pull it with: bash mocks/scripts/fetch-recordings.sh
+# The .jsonl files land here at runtime; the manifest is the source of
+# truth committed in the parent dir.
+*.jsonl
+index.json
--- a/mocks/scripts/contract-check.sh
+++ b/mocks/scripts/contract-check.sh
@ -0,0 +1,113 @@
+#!/usr/bin/env bash
+# Real-CLI contract check — spot-check that the actual agent CLI's
+# stream protocol still matches what the mock formatters emit. Catches
+# drift like:
+#   - real CLI adds a new event `type`
+#   - real CLI changes a field name (e.g. `sessionID` → `sessionId`)
+#   - real CLI's usage object gains/loses a sub-key
+#
+# When mocks drift toward "satisfy the current OD parser" instead of
+# "mimic the actual CLI protocol", the OD-level smoke tests stay green
+# but real-world behavior diverges. Periodic runs of this script (manual
+# or scheduled in CI on a real-CLI-available runner) surface that drift
+# before it becomes a real-PR debugging session.
+#
+#   bash mocks/scripts/contract-check.sh <agent>
+#   bash mocks/scripts/contract-check.sh claude
+#   bash mocks/scripts/contract-check.sh opencode
+#
+# Prereqs: the real agent CLI must be installed AND authenticated. This
+# script DOES make a live LLM call (couple of cents of tokens) so it's
+# not free.
+#
+# Output: side-by-side top-level event `type` distribution from real
+# vs mock for the same prompt. A maintainer eyeballs the diff. Future
+# work (see docs/MOCKS-CONTRACT-CHECK.md) is to lift this into a
+# stricter structural compare.
+
+set -euo pipefail
+
+AGENT="${1:-}"
+if [ -z "$AGENT" ]; then
+  echo "usage: $0 <agent>" >&2
+  echo "       supported: claude | opencode | codex" >&2
+  exit 2
+fi
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+MOCKS_DIR="$(cd "$HERE/.." && pwd -P)"
+
+# Fixed deterministic prompt — small, no creative variability, costs
+# pennies. Picked to be a realistic OD-shaped task: 1-2 tool calls.
+PROMPT='List the entries of the current working directory and tell me how many JSON files are present. Reply with just the count, like "N JSON files".'
+
+tmp_dir="$(mktemp -d -t contract-check.XXXXXX)"
+real_out="$tmp_dir/real.jsonl"; mock_out="$tmp_dir/mock.jsonl"
+trap 'rm -rf "$tmp_dir"' EXIT
+
+# Resolve the real CLI binary. We deliberately ignore mocks/bin/ PATH
+# overlays.
+real_bin=""
+case "$AGENT" in
+  claude)   real_bin="$(PATH=$(echo "$PATH" | tr ':' '\n' | grep -v "$MOCKS_DIR/bin" | paste -sd: -) command -v claude || true)" ;;
+  codex)    real_bin="$(PATH=$(echo "$PATH" | tr ':' '\n' | grep -v "$MOCKS_DIR/bin" | paste -sd: -) command -v codex  || true)" ;;
+  opencode) real_bin="$(PATH=$(echo "$PATH" | tr ':' '\n' | grep -v "$MOCKS_DIR/bin" | paste -sd: -) command -v opencode || true)" ;;
+  *) echo "✗ unsupported agent for contract check: $AGENT" >&2; exit 2 ;;
+esac
+if [ -z "$real_bin" ]; then
+  echo "✗ real '$AGENT' CLI not on PATH. Install + login, then re-run." >&2
+  exit 1
+fi
+echo "real CLI:  $real_bin"
+echo "prompt:    $PROMPT"
+echo
+
+# 1. Real CLI
+echo "→ invoking real $AGENT…"
+case "$AGENT" in
+  claude)
+    printf '%s' "$PROMPT" | "$real_bin" -p --output-format=stream-json --verbose >"$real_out" 2>&1 || true ;;
+  codex)
+    printf '%s' "$PROMPT" | "$real_bin" exec >"$real_out" 2>&1 || true ;;
+  opencode)
+    printf '%s' "$PROMPT" | "$real_bin" run >"$real_out" 2>&1 || true ;;
+esac
+
+# 2. Mock CLI — same prompt, PATH-overlayed to the mock bin.
+# `bash -c` here would lose $PROMPT (parent shell var, not exported)
+# and silently send an empty string to the mock — defeating the
+# "same input on both sides" property the rest of the script relies on.
+# A subshell scopes the PATH override locally, no var-passing dance.
+echo "→ invoking mock $AGENT…"
+(
+  export PATH="$MOCKS_DIR/bin:$PATH"
+  export OD_MOCKS_NO_DELAY=1
+  case "$AGENT" in
+    claude)
+      printf '%s' "$PROMPT" | claude -p --output-format=stream-json --verbose >"$mock_out" 2>&1 ;;
+    codex)
+      printf '%s' "$PROMPT" | codex exec >"$mock_out" 2>&1 ;;
+    opencode)
+      printf '%s' "$PROMPT" | opencode run >"$mock_out" 2>&1 ;;
+  esac
+) || true
+
+# 3. Compare top-level event `type` distributions (skip content)
+summarize() {
+  jq -rR 'fromjson? | .type? // empty' "$1" 2>/dev/null | sort | uniq -c | sort -rn || true
+}
+real_summary=$(summarize "$real_out")
+mock_summary=$(summarize "$mock_out")
+
+echo
+echo "real:"
+printf '%s\n' "${real_summary:-  (no parseable JSON events)}" | sed 's/^/  /'
+echo
+echo "mock:"
+printf '%s\n' "${mock_summary:-  (no parseable JSON events)}" | sed 's/^/  /'
+echo
+echo "raw outputs kept at:"
+echo "  real: $real_out"
+echo "  mock: $mock_out"
+echo "(diff manually: diff <(jq -rR 'fromjson? | .type? // empty' $real_out | sort -u) <(jq -rR 'fromjson? | .type? // empty' $mock_out | sort -u))"
+trap - EXIT   # leave the tmpfiles for the maintainer to inspect
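
The comparison this script prints (top-level event `type` counts for real vs mock) can also be sketched in JavaScript. This is only an illustration under assumptions: `typeHistogram` and `typeDrift` are hypothetical names, and the sample streams are made up. The point it shows is that captured output mixes stderr into the file, so lines that are not JSON have to be skipped rather than aborting the parse.

```javascript
// Hypothetical helpers; not part of contract-check.sh.
function typeHistogram(text) {
  const counts = new Map();
  for (const line of text.split('\n')) {
    let evt;
    try { evt = JSON.parse(line); } catch { continue; } // skip stderr noise and blanks
    if (evt === null || typeof evt !== 'object' || typeof evt.type !== 'string') continue;
    counts.set(evt.type, (counts.get(evt.type) ?? 0) + 1);
  }
  return counts;
}

function typeDrift(realText, mockText) {
  const real = typeHistogram(realText);
  const mock = typeHistogram(mockText);
  return {
    onlyInReal: [...real.keys()].filter(t => !mock.has(t)).sort(),
    onlyInMock: [...mock.keys()].filter(t => !real.has(t)).sort(),
  };
}

// Made-up sample streams; the event names are placeholders.
const realText = '{"type":"system"}\n{"type":"assistant"}\n{"type":"result"}\n';
const mockText = '[mock-claude] picked 04097377 via fixed\n{"type":"system"}\n{"type":"assistant"}\n';

console.log(typeDrift(realText, mockText)); // { onlyInReal: [ 'result' ], onlyInMock: [] }
```

A set difference like this only flags event types missing on one side; field-level drift inside an event still needs the stricter structural compare mentioned in the script header.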
--- a/mocks/scripts/fetch-recordings.sh
+++ b/mocks/scripts/fetch-recordings.sh
@ -0,0 +1,158 @@
+#!/usr/bin/env bash
+# Fetch the recording corpus referenced by mocks/manifest.json from
+# Cloudflare R2 into mocks/recordings/. Skips files already on disk
+# whose sha256 matches the manifest. Verifies every download.
+#
+# Usage:
+#   bash mocks/scripts/fetch-recordings.sh                  # fetch all
+#   bash mocks/scripts/fetch-recordings.sh --agent claude   # fetch claude only
+#   bash mocks/scripts/fetch-recordings.sh --outcome failed # fetch failed only
+#   bash mocks/scripts/fetch-recordings.sh --skill agent-browser
+#   bash mocks/scripts/fetch-recordings.sh --concurrency 16
+#   bash mocks/scripts/fetch-recordings.sh --force          # re-download all
+#   bash mocks/scripts/fetch-recordings.sh --cache-dir <p>  # override cache location
+#
+# Default cache: mocks/recordings/. Override with OD_MOCKS_CACHE_DIR env
+# or --cache-dir flag — useful for sharing across multiple OD checkouts.
+
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+MOCKS_DIR="$(cd "$HERE/.." && pwd -P)"
+MANIFEST="$MOCKS_DIR/manifest.json"
+
+FILTER_AGENT=""
+FILTER_OUTCOME=""
+FILTER_SKILL=""
+CONCURRENCY=8
+FORCE=0
+CACHE_DIR="${OD_MOCKS_CACHE_DIR:-$MOCKS_DIR/recordings}"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --agent)        FILTER_AGENT="$2";   shift 2 ;;
+    --outcome)      FILTER_OUTCOME="$2"; shift 2 ;;
+    --skill)        FILTER_SKILL="$2";   shift 2 ;;
+    --concurrency)  CONCURRENCY="$2";    shift 2 ;;
+    --cache-dir)    CACHE_DIR="$2";      shift 2 ;;
+    --force)        FORCE=1;             shift   ;;
+    -h|--help)
+      sed -n '2,17p' "$0" | sed 's/^# //; s/^#//'; exit 0 ;;
+    *) echo "unknown flag: $1" >&2; exit 2 ;;
+  esac
+done
+
+if [ ! -f "$MANIFEST" ]; then
+  echo "✗ manifest not found at $MANIFEST" >&2
+  exit 1
+fi
+
+mkdir -p "$CACHE_DIR"
+
+# Use node to walk the manifest — sturdier than shell JSON parsing.
+PUBLIC_URL=$(node -e '
+const m = JSON.parse(require("fs").readFileSync(process.argv[1],"utf-8"));
+process.stdout.write(m.storage.public_url_base + "/" + m.storage.object_prefix);
+' "$MANIFEST")
+
+# Select entries matching filters, write one TSV row per entry:
+# <trace_id>\t<sha256>\t<bytes>
+ENTRIES_TSV=$(node -e '
+const m = JSON.parse(require("fs").readFileSync(process.argv[1],"utf-8"));
+const fa = process.argv[2], fo = process.argv[3], fs = process.argv[4];
+for (const e of m.entries) {
+  if (fa && e.agent !== fa) continue;
+  if (fo && e.outcome !== fo) continue;
+  if (fs && !(e.skills || []).includes(fs)) continue;
+  process.stdout.write(`${e.trace_id}\t${e.sha256}\t${e.bytes}\n`);
+}
+' "$MANIFEST" "$FILTER_AGENT" "$FILTER_OUTCOME" "$FILTER_SKILL")
+
+# Empty-string check has to come BEFORE any line-counting — `printf '%s\n' ""`
+# emits a single empty line, which `grep -c ""` / `wc -l` would count as 1
+# and let a typo'd `--agent xyz` quietly succeed with zero downloads.
+if [ -z "$ENTRIES_TSV" ]; then
+  echo "no entries matched filter" >&2
+  exit 0
+fi
+TOTAL=$(printf '%s\n' "$ENTRIES_TSV" | wc -l | tr -d ' ')
+
+echo "Fetching up to $TOTAL recordings → $CACHE_DIR"
+echo "  manifest:    $MANIFEST"
+echo "  R2 prefix:   $PUBLIC_URL"
+[ -n "$FILTER_AGENT" ]   && echo "  filter:      agent=$FILTER_AGENT"
+[ -n "$FILTER_OUTCOME" ] && echo "  filter:      outcome=$FILTER_OUTCOME"
+[ -n "$FILTER_SKILL" ]   && echo "  filter:      skill=$FILTER_SKILL"
+[ "$FORCE" -eq 1 ]       && echo "  --force: re-downloading all matched"
+echo
+
+# Function called by xargs — must be exported. Writes one of:
+#   ✓ <id>   (newly fetched)
+#   • <id>   (skipped — sha256 already matches)
+#   ✗ <id>   (failed — sha256 mismatch or download error)
+fetch_one() {
+  local id="$1" sha="$2" bytes="$3"
+  local dest="$CACHE_DIR/$id.jsonl"
+  if [ "$FORCE" -ne 1 ] && [ -f "$dest" ]; then
+    local existing
+    existing=$(shasum -a 256 "$dest" 2>/dev/null | awk '{print $1}')
+    if [ "$existing" = "$sha" ]; then
+      echo "• $id"
+      return 0
+    fi
+  fi
+  local url="${PUBLIC_URL}${id}.jsonl"
+  if ! curl -sf -o "$dest.tmp" "$url"; then
+    echo "✗ $id (download failed)"
+    rm -f "$dest.tmp"
+    return 1
+  fi
+  local got
+  got=$(shasum -a 256 "$dest.tmp" | awk '{print $1}')
+  if [ "$got" != "$sha" ]; then
+    echo "✗ $id (sha256 mismatch: got $got expected $sha)"
+    rm -f "$dest.tmp"
+    return 1
+  fi
+  mv "$dest.tmp" "$dest"
+  echo "✓ $id"
+}
+
+export PUBLIC_URL CACHE_DIR FORCE
+export -f fetch_one
+
+printf '%s\n' "$ENTRIES_TSV" \
+  | xargs -P "$CONCURRENCY" -L 1 bash -c 'fetch_one "$1" "$2" "$3"' _ \
+  > /tmp/od-mocks-fetch-progress.txt 2>&1 || true   # failures are counted and reported below
+
+new=$(grep -c "^✓"  /tmp/od-mocks-fetch-progress.txt || true)
+skip=$(grep -c "^•" /tmp/od-mocks-fetch-progress.txt || true)
+fail=$(grep -c "^✗" /tmp/od-mocks-fetch-progress.txt || true)
+
+echo "  ✓ fetched: $new"
+echo "  • cached:  $skip"
+if [ "$fail" -gt 0 ]; then
+  echo "  ✗ failed:  $fail"
+  echo
+  grep "^✗" /tmp/od-mocks-fetch-progress.txt | head -5
+  echo "  …(full log /tmp/od-mocks-fetch-progress.txt)"
+  exit 1
+fi
+
+# Symlink (or copy) into mocks/recordings/ when cache lives elsewhere so
+# the mock-agent recording-picker keeps working without env overrides.
+if [ "$CACHE_DIR" != "$MOCKS_DIR/recordings" ]; then
+  mkdir -p "$MOCKS_DIR/recordings"
+  for f in "$CACHE_DIR"/*.jsonl; do
+    [ -e "$f" ] || continue
+    bn=$(basename "$f")
+    if [ ! -e "$MOCKS_DIR/recordings/$bn" ]; then
+      ln -sf "$f" "$MOCKS_DIR/recordings/$bn"
+    fi
+  done
+  # Also link the manifest so picker/index-aware tooling sees it.
+  ln -sf "$MANIFEST" "$MOCKS_DIR/recordings/index.json" 2>/dev/null || true
+fi
+
+echo
+echo "✅ ready: $MOCKS_DIR/recordings/"
--- a/mocks/scripts/lib/manifest-utils.mjs
+++ b/mocks/scripts/lib/manifest-utils.mjs
@ -0,0 +1,103 @@
+// Manifest math — shared between the local `add-recording.sh` preview
+// step and the GitHub Action that actually does R2 uploads.
+//
+// Concentrated here so a maintainer who eyeballs add-recording's
+// preview output sees the SAME entry shape the workflow will commit
+// post-merge. No drift.
+
+import { readFileSync as readSync, writeFileSync } from 'node:fs';
+import { createHash } from 'node:crypto';
+import { basename } from 'node:path';
+
+/**
+ * Parse a recording's first JSONL line (the `meta` event) and return
+ * the shape we store in the manifest.
+ */
+export function inspectRecording(jsonlPath) {
+  const buf = readSync(jsonlPath);
+  const sha256 = createHash('sha256').update(buf).digest('hex');
+  const firstLine = buf.toString('utf-8').split('\n', 1)[0];
+  let meta;
+  try {
+    meta = JSON.parse(firstLine);
+  } catch (err) {
+    throw new Error(`first line of ${jsonlPath} is not valid JSON: ${err.message}`);
+  }
+  if (meta.type !== 'meta') {
+    throw new Error(`first line of ${jsonlPath} has type="${meta.type}" — expected "meta"`);
+  }
+  const traceId = basename(jsonlPath, '.jsonl');
+  const skills = (meta.tags ?? [])
+    .filter(t => typeof t === 'string' && t.startsWith('skill:'))
+    .map(t => t.slice('skill:'.length));
+  return {
+    trace_id: traceId,
+    sha256,
+    bytes: buf.byteLength,
+    agent: meta.agent ?? 'unknown',
+    model: meta.model ?? null,
+    outcome: meta.outcome ?? 'unknown',
+    tool_count: meta.tool_call_count ?? 0,
+    duration_ms: meta.duration_ms ?? 0,
+    skills: skills.length > 0 ? skills : ['default'],
+    session_id: meta.session_id ?? null,
+    user_input_preview: meta.user_input
+      ? String(meta.user_input).slice(0, 200).replace(/\s+/g, ' ').trim()
+      : null,
+    multi_turn: false,                    // filled in after histograms
+    // ──── Provenance (fixture-trust signals; mrcfps review #3241) ────
+    // Capture-time fields the harvester writes into the meta event.
+    // Older recordings may not have them all — null is the explicit "we
+    // don't know" so consumers can decide whether the fixture is still
+    // trustworthy as the real CLIs evolve.
+    captured_at: meta.timestamp ?? null,            // ISO 8601 of the original session
+    cli_version: meta.cli_version ?? null,          // e.g. "claude-code 1.0.65" / "codex 0.40"
+    protocol_version: meta.protocol_version ?? null,
+    anonymization_version: meta.anonymization_version ?? null,
+  };
+}
+
+/**
+ * Insert / replace a manifest entry and rebuild all derived fields
+ * (histograms, multi_turn flags, total_bytes, total). Returns the
+ * mutated manifest object — caller writes it back if it wants to.
+ */
+export function upsertEntry(manifest, newEntry) {
+  const idx = manifest.entries.findIndex(e => e.trace_id === newEntry.trace_id);
+  if (idx >= 0) {
+    manifest.entries[idx] = newEntry;
+  } else {
+    manifest.entries.push(newEntry);
+  }
+  // Rebuild derived fields from scratch.
+  const byAgent = {};
+  const byOutcome = {};
+  const bySkill = {};
+  const sessions = {};
+  let totalBytes = 0;
+  for (const e of manifest.entries) {
+    byAgent[e.agent] = (byAgent[e.agent] ?? 0) + 1;
+    byOutcome[e.outcome] = (byOutcome[e.outcome] ?? 0) + 1;
+    for (const s of e.skills ?? []) bySkill[s] = (bySkill[s] ?? 0) + 1;
+    if (e.session_id) (sessions[e.session_id] ??= []).push(e.trace_id);
+    totalBytes += e.bytes;
+  }
+  for (const e of manifest.entries) {
+    e.multi_turn = !!(e.session_id && (sessions[e.session_id]?.length ?? 0) >= 2);
+  }
+  manifest.entries.sort((a, b) => a.trace_id.localeCompare(b.trace_id));
+  manifest.generated_at = new Date().toISOString();
+  manifest.total = manifest.entries.length;
+  manifest.total_bytes = totalBytes;
+  manifest.histograms = { by_agent: byAgent, by_outcome: byOutcome, by_skill: bySkill };
+  manifest.sessions_with_multi_turn = Object.values(sessions).filter(s => s.length >= 2).length;
+  return manifest;
+}
+
+export function readManifest(path) {
+  return JSON.parse(readSync(path, 'utf-8'));
+}
+
+export function writeManifest(path, manifest) {
+  writeFileSync(path, JSON.stringify(manifest, null, 2));
+}
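
The derived-field rebuild in `upsertEntry` can be illustrated with a reduced sketch. `rebuildDerived` is a hypothetical name and the three entries are made up; only the rules are taken from the code above: one histogram bucket per agent, a byte total, and `multi_turn` set when at least two entries share a `session_id`.

```javascript
// Reduced illustration of the rebuild step; not the full upsertEntry.
function rebuildDerived(entries) {
  const byAgent = {};
  const sessions = {};
  let totalBytes = 0;
  for (const e of entries) {
    byAgent[e.agent] = (byAgent[e.agent] ?? 0) + 1;
    if (e.session_id) (sessions[e.session_id] ??= []).push(e.trace_id);
    totalBytes += e.bytes;
  }
  // An entry is multi-turn when its session holds two or more traces.
  const multiTurn = Object.fromEntries(
    entries.map(e => [
      e.trace_id,
      Boolean(e.session_id && sessions[e.session_id].length >= 2),
    ]),
  );
  return { byAgent, totalBytes, multiTurn };
}

// Made-up entries: t1 and t2 share a session, t3 has none.
const derived = rebuildDerived([
  { trace_id: 't1', agent: 'claude', session_id: 's1', bytes: 100 },
  { trace_id: 't2', agent: 'claude', session_id: 's1', bytes: 250 },
  { trace_id: 't3', agent: 'codex', session_id: null, bytes: 40 },
]);

console.log(derived.byAgent);    // { claude: 2, codex: 1 }
console.log(derived.totalBytes); // 390
console.log(derived.multiTurn);  // { t1: true, t2: true, t3: false }
```

Rebuilding from scratch on every upsert keeps the counts correct when an existing entry is replaced, at the cost of a full pass over the manifest.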
--- a/mocks/scripts/smoke-test.sh
+++ b/mocks/scripts/smoke-test.sh
@ -0,0 +1,175 @@
+#!/usr/bin/env bash
+# Quick smoke test for the mock CLIs.
+# Runs each agent's wrapper against a known recording and asserts that:
+#   1. The mock binary exits 0
+#   2. Stdout produces a sensible number of lines (>= 5 for JSON formats,
+#      >= 1 for plain)
+#   3. The first JSON line for each JSON agent has the expected shape
+#
+# Usage:
+#   bash mocks/scripts/smoke-test.sh
+
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+MOCKS="$(cd "$HERE/.." && pwd -P)"
+TRACE_ID="${OD_MOCKS_SMOKE_TRACE:-04097377}"   # the 17-tool claude session
+
+# Ensure recordings are on disk — the corpus is hosted on R2 and fetched
+# on demand. If nothing's been pulled yet (or only a few are), run the
+# fetcher so smoke covers the full agent matrix.
+if ! ls "$MOCKS/recordings"/*.jsonl >/dev/null 2>&1; then
+  echo "no recordings found — fetching from R2 first..."
+  bash "$HERE/fetch-recordings.sh" >/dev/null
+  echo
+fi
+
+export PATH="$MOCKS/bin:$PATH"
+export OD_MOCKS_TRACE="$TRACE_ID"
+export OD_MOCKS_NO_DELAY=1
+
+failed=0
+pass()  { printf '  \033[32m✓\033[0m %s\n' "$1"; }
+fail()  { printf '  \033[31m✗\033[0m %s\n' "$1"; failed=$((failed + 1)); }
+
+check_json_first_event() {
+  local agent="$1" expected_type="$2"
+  local first
+  first=$(echo "smoke" | "$agent" run 2>/dev/null | head -1 || true)
+  if [ -z "$first" ]; then fail "$agent: empty stdout"; return; fi
+  local got
+  got=$(printf '%s' "$first" | node -e 'let buf="";process.stdin.on("data",d=>buf+=d);process.stdin.on("end",()=>{try{console.log(JSON.parse(buf).type||"")}catch{console.log("INVALID")}})')
+  if [ "$got" = "$expected_type" ]; then
+    pass "$agent first event = $expected_type"
+  else
+    fail "$agent first event = $got (wanted $expected_type)"
+  fi
+}
+
+echo "Smoke testing mock CLIs against trace $TRACE_ID"
+echo
+
+# opencode / opencode-cli (primary OD-facing bin) → step_start
+check_json_first_event opencode step_start
+check_json_first_event opencode-cli step_start
+
+# codex → thread.started
+check_json_first_event codex thread.started
+
+# claude → system / init
+# (codex/claude have a different entry verb; using a uniform "first line type" check)
+first=$(echo smoke | claude -p 2>/dev/null | head -1 || true)
+if printf '%s' "$first" | grep -q '"type":"system"'; then
+  pass "claude first event = system"
+else
+  fail "claude first event missing system shape: ${first:0:80}"
+fi
+
+# gemini → init
+check_json_first_event gemini init
+
+# cursor-agent → system + subtype:init
+first=$(echo smoke | cursor-agent 2>/dev/null | head -1 || true)
+if printf '%s' "$first" | grep -q '"type":"system"' && printf '%s' "$first" | grep -q '"subtype":"init"'; then
+  pass "cursor-agent first event = system+init"
+else
+  fail "cursor-agent first event missing system/init shape: ${first:0:80}"
+fi
+
+# Plain agents — first non-empty line should be from the report content.
+for agent in deepseek qwen grok; do
+  out=$(echo smoke | "$agent" 2>/dev/null | head -1 || true)
+  if [ -n "$out" ]; then
+    pass "$agent emitted plain text (${#out} chars on first line)"
+  else
+    fail "$agent emitted nothing"
+  fi
+done
+
+# vela subcommands — `vela models` (prints catalog) and `vela login` (writes
+# ~/.amr/config.json). These exit immediately, no recording involved.
+vela_models_out=$(vela models 2>/dev/null | wc -l | tr -d ' ')
+if [ "$vela_models_out" -ge 10 ]; then
+  pass "vela models printed $vela_models_out catalog lines"
+else
+  fail "vela models printed only $vela_models_out lines (expected ≥10)"
+fi
+
+# Sandbox vela login into a temp HOME so we never touch the caller's real
+# ~/.amr config (which holds the production vela login state for anyone
+# using the real CLI). vela's login subcommand resolves ~/.amr from $HOME,
+# so override just for this one invocation.
+amr_sandbox="$(mktemp -d -t od-mocks-amr.XXXXXX)"
+trap 'rm -rf "$amr_sandbox"' EXIT
+if HOME="$amr_sandbox" FAKE_VELA_LOGIN_USER_EMAIL=smoke@od.local vela login >/dev/null 2>&1 \
+   && [ -f "$amr_sandbox/.amr/config.json" ]; then
+  email=$(node -e "console.log(JSON.parse(require('fs').readFileSync('$amr_sandbox/.amr/config.json','utf-8')).profiles.prod.user.email)" 2>/dev/null || echo "")
+  if [ "$email" = "smoke@od.local" ]; then
+    pass "vela login wrote ~/.amr/config.json with profiles.prod.user.email"
+  else
+    fail "vela login config.json missing expected email (got: $email)"
+  fi
+else
+  fail "vela login did not produce ~/.amr/config.json"
+fi
+
+# vela ACP roundtrip (strict set_model gate enforced).
+vela_acp_out=$(cat <<EOF | vela agent run --runtime opencode 2>/dev/null
+{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
+{"jsonrpc":"2.0","id":2,"method":"session/new","params":{"cwd":"/tmp"}}
+{"jsonrpc":"2.0","id":3,"method":"session/set_model","params":{"sessionId":"fake-vela-session-1","modelId":"deepseek-v3.2"}}
+{"jsonrpc":"2.0","id":4,"method":"session/prompt","params":{"sessionId":"fake-vela-session-1","prompt":[{"type":"text","text":"hi"}]}}
+EOF
+)
+if printf '%s' "$vela_acp_out" | grep -q '"agentCapabilities"' \
+  && printf '%s' "$vela_acp_out" | grep -q '"availableModels"' \
+  && printf '%s' "$vela_acp_out" | grep -q '"id":3,"result":{}' \
+  && printf '%s' "$vela_acp_out" | grep -q '"sessionUpdate":"agent_message_chunk"' \
+  && printf '%s' "$vela_acp_out" | grep -q '"id":4,"result":{"stopReason":'; then
+  pass "vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)"
+else
+  fail "vela agent run ACP roundtrip incomplete"
+fi
+
+# vela strict set_model gate — skipping set_model must reject prompt.
+vela_gate_out=$(cat <<EOF | vela agent run --runtime opencode 2>/dev/null
+{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
+{"jsonrpc":"2.0","id":2,"method":"session/new","params":{"cwd":"/tmp"}}
+{"jsonrpc":"2.0","id":3,"method":"session/prompt","params":{"sessionId":"fake-vela-session-1","prompt":[{"type":"text","text":"hi"}]}}
+EOF
+)
+if printf '%s' "$vela_gate_out" | grep -q 'session/set_model must be called before session/prompt'; then
+  pass "vela strict set_model gate rejects session/prompt without prior set_model"
+else
+  fail "vela strict set_model gate did not reject (negative-path regression)"
+fi
+
+# ACP agents: JSON-RPC servers on stdio. Send initialize + session/new +
+# session/prompt and verify each expected response is present (order unchecked).
+# kiro-cli and vibe-acp are the primary OD-facing bin names; test them
+# alongside the fallback names (kiro, vibe).
+for agent in hermes kimi kilo kiro kiro-cli vibe vibe-acp devin; do
+  out=$(cat <<'EOF' | "$agent" 2>/dev/null
+{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
+{"jsonrpc":"2.0","id":2,"method":"session/new","params":{"cwd":"/tmp"}}
+{"jsonrpc":"2.0","id":3,"method":"session/prompt","params":{"sessionId":"X","prompt":[{"type":"text","text":"hi"}]}}
+EOF
+)
+  # Expect: id=1 initialize result, id=2 session/new result, ≥1 session/update, id=3 prompt result
+  if printf '%s' "$out" | grep -q '"id":1,"result":{"protocolVersion":1' \
+    && printf '%s' "$out" | grep -q '"id":2,"result":{"sessionId":' \
+    && printf '%s' "$out" | grep -q '"sessionUpdate":"agent_message_chunk"' \
+    && printf '%s' "$out" | grep -q '"id":3,"result":{"stopReason":'; then
+    pass "$agent ACP roundtrip complete (init → session/new → update → prompt result)"
+  else
+    fail "$agent ACP roundtrip incomplete"
+  fi
+done
+
+echo
+if [ "$failed" -eq 0 ]; then
+  echo "All mock CLIs working. ✅"
+else
+  echo "$failed check(s) failed. ❌"
+  exit 1
+fi
--- a/mocks/scripts/upload-recording.sh
+++ b/mocks/scripts/upload-recording.sh
@@ -0,0 +1,103 @@
+#!/usr/bin/env bash
+# Upload a new anonymized recording to Cloudflare R2 and update the
+# local manifest. Local-maintainer flow — no GitHub Action involved,
+# nothing about the recording ever lands in git.
+#
+#   bash mocks/scripts/upload-recording.sh <recording.jsonl>
+#
+# Prereqs
+# -------
+# - `wrangler login` once (OAuth, no token to manage). The logged-in
+#   account must have access to the powerformer R2 namespace where the
+#   `open-design-mocks` bucket lives.
+# - That's it. Bucket is public-read, manifest is in repo; consumers
+#   pull via `fetch-recordings.sh`.
+#
+# What it does
+# ------------
+#   1. Validate the .jsonl (first line = meta event, UUID filename)
+#   2. Compute sha256, parse meta, build manifest entry
+#   3. `wrangler r2 object put` the recording → recordings/v1/<id>.jsonl
+#   4. Update mocks/manifest.json with the new entry (rebuilt histograms)
+#   5. `wrangler r2 object put` the manifest too → recordings/v1/manifest.json
+#   6. Tell you to commit + push the manifest change
+#
+# The .jsonl is never copied into the repo. Only mocks/manifest.json
+# (≈200B added per entry) gets git-tracked.
+
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
+MOCKS_DIR="$(cd "$HERE/.." && pwd -P)"
+MANIFEST="$MOCKS_DIR/manifest.json"
+LIB="$HERE/lib/manifest-utils.mjs"
+BUCKET='open-design-mocks'
+KEY_PREFIX='recordings/v1/'
+# powerformer hosts the bucket; pin so wrangler doesn't ask which
+# account in non-interactive mode when the OAuth login spans several.
+export CLOUDFLARE_ACCOUNT_ID='64ad4569ffd912432d6b86d5656484c4'
+
+if [ $# -lt 1 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
+  sed -n '2,26p' "$0" | sed -E 's/^# ?//'
+  exit 0
+fi
+
+INPUT="$1"
+if [ ! -f "$INPUT" ]; then echo "✗ no such file: $INPUT" >&2; exit 1; fi
+
+INPUT_ABS="$(cd "$(dirname "$INPUT")" && pwd -P)/$(basename "$INPUT")"
+TRACE_ID="$(basename "$INPUT" .jsonl)"
+
+if ! printf '%s' "$TRACE_ID" | grep -qE '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'; then
+  echo "✗ trace id '$TRACE_ID' is not a UUID. Rename the file." >&2
+  exit 1
+fi
+
+if ! command -v wrangler >/dev/null 2>&1; then
+  echo "✗ wrangler not installed. \`npm i -g wrangler\` (or pnpm/yarn) and \`wrangler login\` first." >&2
+  exit 1
+fi
+
+# Step 1+2: validate + build entry preview via shared lib (Node — same code
+# the consumers use, so no shape drift)
+ENTRY_JSON=$(RECORDING="$INPUT_ABS" node --input-type=module -e "
+import { inspectRecording } from '$LIB';
+process.stdout.write(JSON.stringify(inspectRecording(process.env.RECORDING), null, 2));
+" 2>&1) || {
+  echo "✗ validation failed:" >&2
+  echo "$ENTRY_JSON" | sed 's/^/  /' >&2
+  exit 1
+}
+
+echo "manifest entry to add:"
+echo "$ENTRY_JSON" | sed 's/^/  /'
+echo
+
+# Step 3: upload the recording itself
+echo "→ uploading recording to R2…"
+wrangler r2 object put "${BUCKET}/${KEY_PREFIX}${TRACE_ID}.jsonl" \
+  --file "$INPUT_ABS" --remote >/dev/null
+echo "  ✓ $TRACE_ID.jsonl"
+
+# Step 4: update local manifest.json
+echo "→ updating local mocks/manifest.json…"
+RECORDING="$INPUT_ABS" node --input-type=module -e "
+import { inspectRecording, upsertEntry, readManifest, writeManifest } from '$LIB';
+const m = readManifest('$MANIFEST');
+upsertEntry(m, inspectRecording(process.env.RECORDING));
+writeManifest('$MANIFEST', m);
+console.log('  ✓ now ' + m.total + ' entries (' + (m.total_bytes/1024).toFixed(0) + ' KB total)');
+"
+
+# Step 5: upload the updated manifest to R2 so consumers see the new entry
+# without waiting for the next git push.
+echo "→ uploading manifest to R2…"
+wrangler r2 object put "${BUCKET}/${KEY_PREFIX}manifest.json" \
+  --file "$MANIFEST" --remote >/dev/null
+echo "  ✓ manifest.json"
+
+echo
+echo "✅ done. Next:"
+echo "  git add mocks/manifest.json"
+echo "  git commit -m 'mocks: add recording $TRACE_ID'"
+echo "  git push"
--- a/scripts/guard.ts
+++ b/scripts/guard.ts
@@ -117,6 +117,17 @@ const residualAllowedPathPrefixes = [
  "design-templates/last30days/scripts/lib/vendor/",
  // Vendored upstream html-ppt runtime assets (lewislulu/html-ppt-skill, design template).
  "design-templates/html-ppt/assets/",
+  // Replay-based mock CLIs that impersonate the agent CLIs OD spawns
+  // (opencode/claude/codex/gemini/cursor-agent + ACP family). They must
+  // be directly executable via Node so `child_process.spawn` from test
+  // harnesses and PATH-overlay shells work without any transform step.
+  // `mocks/scripts/` holds the maintainer-facing helpers (manifest math,
+  // fetch from R2), which are also pure-Node single-file modules. This
+  // follows the precedent of `apps/daemon/tests/fixtures/fake-vela.mjs`
+  // (an ACP stdio stub, allowlisted individually above). See `mocks/README.md`.
+  "mocks/lib/",
+  "mocks/mock-agent.mjs",
+  "mocks/scripts/",
  "test-results/",
  "vendor/",
 ];