open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-05-31 19:04:39 +07:00


lefarcen da19ff3ca0 feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241)

* feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok

Drops in a `mocks/` top-level dir that pretends to be the real agent CLIs by streaming pre-recorded sessions in each CLI's native stdout protocol. Zero LLM tokens.

## Use cases

- E2E tests in `apps/daemon/tests/`: exercise the full chat-server pipeline against a known trace, assert UI events / artifacts.
- Self-validation during dev: iterate on `claude-stream.ts` / `json-event-stream.ts` parser changes without burning provider budget.
- Regression harness: replay the same trace before and after a charter / parser change; diff the daemon events the UI surfaces.
- Demo / onboarding: show what a 17-tool claude editing session looks like end-to-end, offline.

## How

- 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid.
- `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` / `_BY_PROMPT_HASH`), streams the trace in the requested format.
- Each format renderer matches the EXACT JSON shape the OD daemon parser expects, verified line-by-line against `apps/daemon/src/{json-event-stream,claude-stream}.ts`:

| CLI | streamFormat | parser source |
| --- | --- | --- |
| `opencode` | `json-event-stream` | `handleOpenCodeEvent` |
| `codex` | `json-event-stream` | `handleCodexEvent` |
| `claude` | `claude-stream-json` | `createClaudeStreamHandler` |
| `deepseek` `qwen` `grok` | `plain` | `server.ts` (raw stdout) |

## Quick start

```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377   # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1
echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```

The mock binary announces the picked trace id on stderr: `[mock-opencode] picked 04097377… via fixed`.

Recording selection (env, in priority order):

- `SYNCLO_EXPLORE_MOCK_TRACE=<id>`: fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt: `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>`: random within `agent:claude` / `skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random
- `SYNCLO_EXPLORE_MOCK_SEED=<str>`: reproducible "random"
- `SYNCLO_EXPLORE_MOCK_NO_DELAY=1`: skip inter-event waits

## Dataset

179 anonymized Langfuse traces from this project's own production telemetry:

- 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 · cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1
- outcomes: succeeded 144 · failed 35
- skills: default 71 · ad-creative 50 · algorithmic-art 30 · agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt / brainstorming / data-report / penpot-flutter-design-source 1 each
- 124 multi-turn (sessions with ≥2 turns)
- 18 produce `<artifact>` output
- ~4.5 MB on disk total

Anonymization: `/Users/<name>/` → `${HOME}/`, `C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs → stable `proj-001`, `proj-002`, …. Tool input/output payloads preserved verbatim (templated UI, no cell-level PII).

## Smoke test

`bash mocks/scripts/smoke-test.sh`: 6 checks across all 6 agents. All pass on this branch (verified locally):

```
✓ opencode first event = step_start
✓ codex first event = thread.started
✓ claude first event = system
✓ deepseek emitted plain text (144 chars on first line)
✓ qwen emitted plain text (144 chars on first line)
✓ grok emitted plain text (144 chars on first line)
All mock CLIs working. ✅
```

## Adding more recordings

The exporter that produced this set lives in [nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore) (see `cli/src/local/orchestrator/langfuse-import.ts` + the `local langfuse-import` CLI command). Operators with the Langfuse keys can pull more by tag / outcome / artifact / multi-turn filter, then run `local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`. `mocks/README.md` has the full instructions.
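The anonymization rules listed under Dataset can be sketched roughly as below. This is only an illustration, not the exporter's code (which lives in nexu-io/agent-pr-explore): the names `createAnonymizer` and `anonymizeText`, the regexes, and the assumption that every UUID-shaped string is a project id are all hypothetical.

```javascript
// Hypothetical sketch of the path / project-id anonymization described above.
// Function names and regexes are illustrative assumptions, not the real exporter.
function createAnonymizer() {
  const projectAliases = new Map(); // original UUID -> stable alias

  // Assumption: project ids are standard 8-4-4-4-12 hex UUIDs.
  const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

  return function anonymizeText(text) {
    return text
      // /Users/<name>/ -> ${HOME}/
      .replace(/\/Users\/[^/\s]+\//g, '${HOME}/')
      // C:\Users\<name>\ -> %USERPROFILE%\
      .replace(/C:\\Users\\[^\\\s]+\\/g, '%USERPROFILE%\\')
      // project UUIDs -> proj-001, proj-002, ... (same UUID, same alias)
      .replace(UUID_RE, (uuid) => {
        const key = uuid.toLowerCase();
        if (!projectAliases.has(key)) {
          const n = String(projectAliases.size + 1).padStart(3, '0');
          projectAliases.set(key, `proj-${n}`);
        }
        return projectAliases.get(key);
      });
  };
}
```

Keeping the alias map inside one anonymizer instance is what makes the `proj-NNN` numbering stable across a whole export run; a fresh instance would restart at `proj-001`.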
## Out of scope (follow-ups)

- ACP agents (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need a JSON-RPC server on stdio rather than a one-shot stream; separate `format-acp.mjs` module not yet written.
- Per-agent json-event-stream variants (`cursor-agent`, `gemini`, `qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer; their parsers are in `apps/daemon/src/json-event-stream.ts` and follow the same template as `format-codex.mjs`.

## AGENTS.md updates

- Added `mocks/` to the top-level content directories listing
- Added a Validation strategy bullet pointing here for agent-stream / parser changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout

- Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary RuntimeAgentDef bin names OD resolves before any fallback. Without these, a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli, kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback).
- Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop so coverage is verified end-to-end.
- Call .unref() on the 30s safety timeout in format-acp.mjs so a completed ACP session exits promptly instead of waiting the full 30 seconds.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mocks): add vela (AMR) - login / models / ACP with strict set_model gate

Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name `apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`, `streamFormat: 'acp-json-rpc'`). It's richer than the generic ACP agents: it covers the full login + models + chat-session lifecycle.

### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs)

1. `vela login`: writes ~/.amr/config.json with a fake profile (controlKey, runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl). The on-disk projection is what OD's daemon login route + AmrLoginPill poller read; production goes through device-auth, the mock skips straight to the file write.
2. `vela models`: prints the production-shaped public model catalog as newline-separated `public_model_* vela` lines. Override via FAKE_VELA_MODELS env.
3. `vela agent run --runtime opencode`: ACP JSON-RPC server with three vela-specific protocol extensions:
   a. `initialize` response carries `agentCapabilities` (`promptCapabilities.embeddedContext`) + `models` (`currentModelId` + `availableModels`).
   b. `session/new` response carries the same `models` block.
   c. Strict set_model gate: `session/prompt` is rejected with JSON-RPC -32602 ("session/set_model must be called before session/prompt") UNLESS `session/set_model` (or `session/set_config_option`) has been called for the current sessionId. Mirrors real vela 0.0.1 contract; catches regressions in `attachAcpSession` that silently skip set_model.
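The strict set_model gate in item 3c can be pictured with a small sketch like the one below. It is a hedged illustration, not the contents of `mocks/lib/format-vela.mjs`: `createGate` and `handle` are hypothetical names, the `stopReason: 'end_turn'` result is a placeholder for the real streamed session, and the `-32601` fallback for unknown methods is the generic JSON-RPC code rather than anything the source specifies.

```javascript
// Hypothetical sketch of the strict set_model gate; not the real format-vela.mjs.
function createGate() {
  const modelSetFor = new Set(); // sessionIds for which a model has been selected

  // Takes a parsed JSON-RPC request and returns a JSON-RPC response object.
  return function handle(request) {
    const { id, method, params = {} } = request;
    const sessionId = params.sessionId;

    if (method === 'session/set_model' || method === 'session/set_config_option') {
      modelSetFor.add(sessionId);
      return { jsonrpc: '2.0', id, result: {} };
    }

    if (method === 'session/prompt') {
      if (!modelSetFor.has(sessionId)) {
        // Gate: reject prompts for sessions that never selected a model.
        return {
          jsonrpc: '2.0',
          id,
          error: { code: -32602, message: 'session/set_model must be called before session/prompt' },
        };
      }
      // Placeholder result; the real mock would stream the recorded session here.
      return { jsonrpc: '2.0', id, result: { stopReason: 'end_turn' } };
    }

    // Assumed fallback: standard JSON-RPC "method not found".
    return { jsonrpc: '2.0', id, error: { code: -32601, message: `Method not found: ${method}` } };
  };
}
```

Tracking the gate per sessionId rather than globally matters: a client that sets the model on one session and then prompts on another should still be rejected.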
### Error injection envs (in sync with fake-vela.mjs)

- FAKE_VELA_SESSION_ID - sessionId returned by session/new
- FAKE_VELA_TEXT - override assistant text
- FAKE_VELA_THOUGHT - optional thought_chunk before text
- FAKE_VELA_SESSION_NEW_ERROR - fail session/new
- FAKE_VELA_SET_MODEL_ERROR - fail session/set_model
- FAKE_VELA_PROMPT_ERROR - fail session/prompt
- FAKE_VELA_REQUIRE_SET_MODEL='0' - disable the strict gate (legacy)
- FAKE_VELA_LOGIN_USER_EMAIL - email written into config profile
- FAKE_VELA_LOGIN_USER_PLAN - plan written into config profile
- FAKE_VELA_LOGIN_DELAY_MS - sleep before write (test in-flight)
- FAKE_VELA_LOGIN_FAIL - print + exit 1
- FAKE_VELA_MODELS - override models stdout
- VELA_PROFILE - profile slot (prod | test | local)

### Components

`mocks/lib/format-vela.mjs` (~205 LOC)

- Full ACP server with vela protocol extensions
- Strict set_model gate
- Error injection plumbing

`mocks/lib/vela-subcommands.mjs` (~90 LOC)

- runVelaLogin(): writes ~/.amr/config.json
- runVelaModels(): prints catalog

`mocks/bin/vela`: dispatcher wrapper. Forwards `vela <subcmd>` to mock-agent.mjs which routes to login/models or falls through to ACP.

`mocks/mock-agent.mjs`: parseArgs now collects positionals so the vela dispatcher can read subcommand from there; switch case added for vela.

`mocks/scripts/smoke-test.sh`: +4 assertions:

- vela models prints ≥10 catalog lines
- vela login writes ~/.amr/config.json with the requested email
- vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
- vela strict set_model gate rejects prompt without prior set_model

### Verified locally

    ✓ vela models printed 15 catalog lines
    ✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
    ✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
    ✓ vela strict set_model gate rejects session/prompt without prior set_model

All 21 smoke checks pass (up from 17 with previous P3 ACP commit).
### AGENTS.md + README updates

AGENTS.md: mention `vela (AMR — vela CLI)` alongside ACP agents in the directory listing entry.

mocks/README.md: protocol table row + dedicated vela section with subcommand contract, strict gate explanation, env-injection cheat sheet. Mock-tree listing updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): honor REPORT_FILE env when --report-file flag not given

Harnesses that spawn the mock without translating their report-path contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's orchestrator, which passes REPORT_FILE as env per the existing opencode/claude/codex agent launchers) wouldn't get a report file written, so the harness's "agent exit 0 but produced no report" check would always fire and mark mock runs as failure even though the stdout stream was complete.

Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE when --report-file wasn't provided on argv. Each format renderer already accepts opts.reportFile and writes the recording's final assistant text to it (`format-.mjs` already had this; only the wiring was missing).

Verified: synclo-explore run with `mock=true, mock_trace=04097377` against the opencode wrapper now produces a plan.md with the recording's 17-tool claude editing session report. ~1.5s per run vs ~70s real opencode.

mocks: move recordings to Cloudflare R2; PR→main→Action upload path

The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has been moved off git into Cloudflare R2 at the bucket open-design-mocks under recordings/v1/.
The repo now ships:

- mocks/manifest.json: the canonical catalog (renamed from recordings/index.json) with sha256 + storage hints; consumers fetch this to discover what exists, then pull individual jsonl files on demand
- mocks/scripts/fetch-recordings.sh: parallel, sha256-verified, idempotent puller for the public r2.dev URL
- mocks/scripts/add-recording.sh: local maintainer helper that validates a new .jsonl and copies it into recordings-staging/ (no R2 calls; no credentials needed)
- mocks/scripts/upload-to-r2.mjs: called only by the CI workflow
- mocks/scripts/lib/manifest-utils.mjs: shared sha256/meta/rebuild-histograms logic, used by both add-recording (preview) and upload-to-r2 (actual write) so the entry shape never drifts
- .github/workflows/sync-mocks-to-r2.yml: fires on push to main when mocks/recordings-staging/ changes; uploads to R2, updates manifest, commits cleanup back; serialized via concurrency group

Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a laptop. Read stays public via the r2.dev URL.

Why not pnpm install integration: contributors who do not touch agent code do not pay the fetch cost. Fetch happens on first smoke-test run (auto-fallback) or when a mock spawn needs data.

Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest + scripts). Smoke test (21 checks) still green against the fetched corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scope R2 write token to a dedicated secret name

Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared CLOUDFLARE_API_TOKEN that landing-page-.yml uses for Pages deploys) so the R2 write capability can be scoped to just the open-design-mocks bucket without bleeding extra capability into the Pages workflows. Also hardcode the powerformer CF account_id directly in the workflow (account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID secret may point at a different account).
Workflow now fails fast with an actionable error message + dashboard link if the secret is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate)

wrangler 4.x calls /memberships before any r2 action, requiring user:read scope. R2 "Object Read & Write" tokens deliberately lack that scope (defense in depth: a leaked token should not enumerate account-level resources). The workflow now uses the aws CLI talking straight to the R2 S3-compatible endpoint with SigV4, no membership lookup.

Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention).

End-to-end tested locally: PUT recording → manifest rebuild → manifest PUT → staging cleanup all green. aws CLI is pre-installed on ubuntu-latest, so no install step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout

These mocks were copy-pasted from synclo-explore, where they originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var convention. That brand-bleed is not appropriate in OD: rename the public env surface to OD_MOCKS_* (matching OD-native prefixes like OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS).

Renames:

- SYNCLO_EXPLORE_MOCK_TRACE → OD_MOCKS_TRACE
- SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH → OD_MOCKS_BY_PROMPT_HASH
- SYNCLO_EXPLORE_MOCK_POOL → OD_MOCKS_POOL
- SYNCLO_EXPLORE_MOCK_SEED → OD_MOCKS_SEED
- SYNCLO_EXPLORE_MOCK_NO_DELAY → OD_MOCKS_NO_DELAY
- SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR → OD_MOCKS_RECORDINGS_DIR
- SYNCLO_EXPLORE_MOCK_SMOKE_TRACE → OD_MOCKS_SMOKE_TRACE
- SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING → OD_MOCKS_ALLOW_LOCAL_UPLOAD

Also drop the inline harvester usage from README.
The harvester is an external CLI in nexu-io/agent-pr-explore; its README is the right place for langfuse-import flags, anonymization options, etc. OD only documents its own staging→PR→Action workflow.

Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end verified to route correctly.

Consumers of the OLD env names (notably the orchestrator in nexu-io/agent-pr-explore) need a matching rename. No back-compat shim here: the explore side has zero external users today and a one-line follow-up is cleaner than a permanent deprecation layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_)

Missed in the prior commit (`a30b868a`); only grepped mocks/ subdir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mocks: drop staging dir + GH Action; back to local-script upload

The staging-dir + Action design (added earlier in this PR) had a flaw the user caught: new recordings briefly entered the repo on their way through staging, leaving them in git history forever even after the Action cleanup commit removed them from HEAD. That defeats the whole point of moving recordings to R2.

Replace with the simpler local-maintainer flow:

    bash mocks/scripts/upload-recording.sh /path/to/<trace>.jsonl
    # → validates, wrangler r2 put, updates manifest.json, wrangler r2 put manifest
    git add mocks/manifest.json && git commit && git push
    # → only the ~200B manifest delta enters git

The wrangler-OAuth gate replaces the CI secret + Action duo. For a solo / small maintainer team this collapses the trust chain down to "do you have wrangler login to the powerformer account?": no GH secrets to rotate, no concurrency window to worry about, no inevitable repo-history bloat.
Deletes:

- .github/workflows/sync-mocks-to-r2.yml
- mocks/scripts/upload-to-r2.mjs (CI-only)
- mocks/scripts/add-recording.sh (staging helper, now obsolete)
- mocks/recordings-staging/ (empty dir, never to be repopulated)

Adds:

- mocks/scripts/upload-recording.sh

Kept:

- mocks/scripts/fetch-recordings.sh
- mocks/scripts/lib/manifest-utils.mjs (still used by upload-recording.sh)
- mocks/manifest.json (committed; the only mocks artifact in git)

End-to-end tested locally: re-uploading an existing recording is idempotent, manifest math is stable, fetch + smoke test still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address review - guard allowlist + safe ~/.amr + loud OD_MOCKS_TRACE typo

Three concrete issues raised across recent Siri-Ray (Looper) review threads on #3241:

1. scripts/guard.ts only allowlisted mocks/lib/ + mocks/mock-agent.mjs, leaving mocks/scripts/lib/manifest-utils.mjs outside the residual-JS guard. Result: Preflight fail on every push. Extend the allowlist to mocks/scripts/, following the same precedent as the lib/ entry directly above.
2. mocks/scripts/smoke-test.sh moved the caller's real ~/.amr to ~/.amr-smoke-backup, ran vela login (which writes a fake config), then ran rm -rf on ~/.amr and restored the backup. Two failure modes: a crash mid-run loses the user's real config, and re-running before restore overwrites the backup with the fake login. Fix: sandbox vela login into a mktemp -d HOME via env (HOME=$amr_sandbox vela login). Never touches the real ~/.amr at all. trap cleans up.
3. mocks/lib/recording-picker.mjs silently fell through to prompt-hash → pool → random when OD_MOCKS_TRACE was set but did not match any recording (typo, prefix too short, corpus not fetched). Tests using a pinned trace would silently get a different trace, hiding regressions. Fix: throw an explicit error with the failing value + a pointer at fetch-recordings.sh.
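The loud-fail behavior from item 3 could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in `mocks/lib/recording-picker.mjs`: `pickPinnedTrace`, its parameters, and the wording after the semicolon in the error message are hypothetical.

```javascript
// Hypothetical sketch of the loud-fail trace lookup; not the real recording-picker.mjs.
function pickPinnedTrace(env, recordingIds, recordingsDir) {
  const wanted = env.OD_MOCKS_TRACE;
  // Not pinned: the caller continues with prompt-hash -> pool -> random.
  if (!wanted) return null;

  // Prefix match, so an 8-char prefix of a trace id is enough.
  const match = recordingIds.find((id) => id.startsWith(wanted));
  if (!match) {
    // Pinned but unmatched: fail instead of silently picking another trace.
    throw new Error(
      `mock-agent: OD_MOCKS_TRACE=${wanted} set but no matching recording in ${recordingsDir}; ` +
        'try running mocks/scripts/fetch-recordings.sh'
    );
  }
  return match;
}
```

The key design point is the split between "env var unset" (return `null`, keep falling through) and "env var set but unmatched" (throw), since only the second case indicates a typo or an unfetched corpus.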
Verified locally: pnpm guard prints "Residual JavaScript check passed", smoke-test still 21/21, ~/.amr mtime unchanged after run, typo on OD_MOCKS_TRACE now produces "mock-agent: OD_MOCKS_TRACE=... set but no matching recording in <dir>" on stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fetch-recordings: detect empty filter result before line-counting

printf '%s\n' on an empty string emits a single empty line, so the previous TOTAL=$(printf ... | grep -c "") math returned 1 on an empty $ENTRIES_TSV. A typo like `--agent no-such-agent` printed "Fetching up to 1 recordings", downloaded zero, and exited 0 ("ready"). Check `-z $ENTRIES_TSV` first.

Reproduced + fix verified per the reviewer thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address mrcfps review - goldens + provenance + contract check

Three durability improvements suggested in the PR #3241 top-level review:

## 1. Golden daemon-event snapshots (mocks/golden/.events.json + apps/daemon/tests/mocks-golden.test.ts)

Smoke-test verified that mocks RUN; that catches crashes but not a parser change that semantically reshapes the events the daemon emits. Commit the daemon-event sequence for 3 representative traces:

- claude 314d6833: median-complexity agent-browser session
- codex dcdff3b3: 14-tool refactor
- opencode 9a9522ec: 7-tool data-report

apps/daemon/tests/mocks-golden.test.ts spawns the mock, feeds stdout through the real createClaudeStreamHandler / createJsonEventStreamHandler, normalizes per-spawn volatile fields (only sessionId today, only on claude), and deep-equals against the committed snapshot. A parser regression fails the test loudly.

After an intentional parser change, regenerate:

    MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
    git diff mocks/golden/   # eyeball; commit if shapes match intent

## 2. Provenance fields on every manifest entry (mocks/scripts/lib/manifest-utils.mjs + mocks/manifest.json)

Augment inspectRecording() to write:

- captured_at: ISO 8601 from existing meta.timestamp
- cli_version: null until harvester writes it
- protocol_version: null until harvester writes it
- anonymization_version: null until harvester writes it

captured_at is now populated for all 179 existing entries from the meta event the harvester already emits. The harvester in nexu-io/agent-pr-explore is the next step for cli_version / protocol_version / anonymization_version. Once those are populated, consumers can detect when a recording is older than ~1 minor version behind the live CLI and flag for re-harvest.

No matrix of (cli_version × agent) recordings; that explodes maintenance. Just metadata per recording so trust decay is visible.

## 3. Real-CLI contract check (mocks/scripts/contract-check.sh + docs/MOCKS-CONTRACT-CHECK.md)

Mocks catch parser regressions against recordings; they do NOT catch recordings drifting away from the live agent CLI as that CLI evolves. The contract check spawns the real CLI alongside the mock with a fixed deterministic prompt + diffs top-level event-type distributions.

Deliberately human-driven, not cron-scheduled:

- costs real LLM tokens per invocation
- requires real CLI auth
- maintainer reads the output, not a regex

Suggested triggers per doc: real-CLI release notes mentioning "output format" / "stream" / "JSON" / "events"; before a parser refactor; ad-hoc when something looks off.

## Coverage note

README updated to position mocks as "deterministic protocol/parser coverage" (not "e2e replacement") per mrcfps framing.
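The "diffs top-level event-type distributions" step of the contract check in section 3 can be illustrated with the sketch below. The real check is a shell script (`mocks/scripts/contract-check.sh`); this JavaScript version is only a hedged approximation. The function names are hypothetical, and it assumes each stdout line is a JSON object whose event kind sits in a top-level `type` field.

```javascript
// Hypothetical sketch of an event-type distribution diff; not the real contract-check.sh.
// Assumption: each JSONL line is an object with a top-level string `type` field.
function eventTypeCounts(jsonl) {
  const counts = {};
  for (const line of jsonl.split('\n')) {
    if (!line.trim()) continue; // skip blank lines
    let type = '(non-json)';
    try {
      const event = JSON.parse(line);
      type = event && typeof event.type === 'string' ? event.type : '(no type)';
    } catch {
      // leave type as '(non-json)' for lines that are not valid JSON
    }
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Returns one entry per event type whose count differs between real CLI and mock.
function diffCounts(realCounts, mockCounts) {
  const types = new Set([...Object.keys(realCounts), ...Object.keys(mockCounts)]);
  const diff = [];
  for (const type of [...types].sort()) {
    const real = realCounts[type] || 0;
    const mock = mockCounts[type] || 0;
    if (real !== mock) diff.push({ type, real, mock });
  }
  return diff;
}
```

Comparing distributions rather than exact event sequences tolerates the run-to-run variation of a live LLM while still surfacing event types that appear on only one side.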
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(mocks-golden test): drop import of non-exported ParserKind

Use plain string (the type alias is `string` anyway). Preflight typecheck on `a31fa71a` failed:

    tests/mocks-golden.test.ts(29,8): error TS2459: Module "../src/json-event-stream.js" declares "ParserKind" locally, but it is not exported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* recording-picker: structured OD_MOCKS_POOL + hard-fail no-match

Siri-Ray review: `OD_MOCKS_POOL=outcome:failed` was documented as a supported selection knob, but the matcher only checked tags and `meta.agent`, so the negative-path pool found 0 candidates and silently fell through to global random, validating against any recording instead of a failed trace.

Fix:

- Parse `<dim>:<value>` shape and route each dim to the right meta field: `outcome` → `meta.outcome`, `agent` → `meta.agent`, `skill` → `tags[]`. Bare values still fall back to tag substring.
- If the env was set and matched nothing, throw with the failing value and a jq one-liner for inspection. Same loud-fail policy as OD_MOCKS_TRACE; silent fallback was the original bug.

Verified locally: outcome:failed, agent:codex, skill:agent-browser all route correctly; outcome:nonsense throws the explicit error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* contract-check.sh: fix lost $PROMPT in mock invocation

Siri-Ray review on `e576074a`: the mock side wrapped its pipeline in `bash -c "printf %s \"\$PROMPT\" | ..."`, but $PROMPT was a parent shell variable, not exported, so the child bash expanded it to an empty string. Result: the contract check sent the real prompt to the real CLI and an empty string to the mock, defeating the same-input invariant the whole script rests on. Also let the mock randomly select a different trace whenever a maintainer happens to have OD_MOCKS_BY_PROMPT_HASH=1 in their env.
Fix: drop the inner bash -c entirely; use a subshell that scopes the PATH overlay and pipes printf into the PATH-resolved mock binary directly. The subshell limits the PATH change without var-passing.

Verified locally: with prompt-A the mock picks trace 54ec02ee via hash; prompt-B → 2667e851 via hash; empty prompt (old broken behavior) → random. This confirms the prompt is now actually reaching the mock under PATH overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-29 07:17:20 +00:00
format-acp.mjs	feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241)	2026-05-29 07:17:20 +00:00
format-claude.mjs	(#3241)	2026-05-29 07:17:20 +00:00
format-codex.mjs	(#3241)	2026-05-29 07:17:20 +00:00
format-cursor-agent.mjs	(#3241)	2026-05-29 07:17:20 +00:00
format-gemini.mjs	(#3241)	2026-05-29 07:17:20 +00:00
format-opencode.mjs	(#3241)	2026-05-29 07:17:20 +00:00
format-plain.mjs	(#3241)	2026-05-29 07:17:20 +00:00
format-vela.mjs	(#3241)	2026-05-29 07:17:20 +00:00
recording-picker.mjs	(#3241)	2026-05-29 07:17:20 +00:00
vela-subcommands.mjs	(#3241)	2026-05-29 07:17:20 +00:00