open-design/mocks/golden/dcdff3b3-cd39-4dcd-be83-372830a29639.events.json
lefarcen da19ff3ca0
feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241)
* feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok

Drops in a `mocks/` top-level dir that pretends to be the real agent
CLIs by streaming pre-recorded sessions in each CLI's native stdout
protocol. Zero LLM tokens.

## Use cases

- **E2E tests** in `apps/daemon/tests/` — exercise the full chat-server
  pipeline against a known trace, assert UI events / artifacts.
- **Self-validation during dev** — iterate on `claude-stream.ts` /
  `json-event-stream.ts` parser changes without burning provider budget.
- **Regression harness** — replay the same trace before and after a
  charter / parser change; diff the daemon events the UI surfaces.
- **Demo / onboarding** — show what a 17-tool claude editing session
  looks like end-to-end, offline.

## How

- 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid.
- `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks
  one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` /
  `_BY_PROMPT_HASH`), streams the trace in the requested format.
- Each format renderer matches the EXACT JSON shape the OD daemon
  parser expects, verified line-by-line against
  `apps/daemon/src/{json-event-stream,claude-stream}.ts`:

  | CLI                       | streamFormat              | parser source                              |
  | ------------------------- | ------------------------- | ------------------------------------------ |
  | `opencode`                | `json-event-stream`       | `handleOpenCodeEvent`                      |
  | `codex`                   | `json-event-stream`       | `handleCodexEvent`                         |
  | `claude`                  | `claude-stream-json`      | `createClaudeStreamHandler`                |
  | `deepseek` `qwen` `grok`  | `plain`                   | `server.ts` (raw stdout)                   |

## Quick start

```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377   # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1

echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```

The mock binary announces the picked trace id on stderr:
`[mock-opencode] picked 04097377… via fixed`.

Recording selection (env, in priority order):
- `SYNCLO_EXPLORE_MOCK_TRACE=<id>` — fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt — `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>` — random within `agent:claude` /
  `skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random
- `SYNCLO_EXPLORE_MOCK_SEED=<str>` — reproducible "random"
- `SYNCLO_EXPLORE_MOCK_NO_DELAY=1` — skip inter-event waits
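
The priority order above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in `mocks/mock-agent.mjs`: the function name `pickRecording`, the recording shape (`id`, `tags`), the `via` labels other than `fixed`, and the way the digest is reduced modulo N are all hypothetical. The caller is assumed to pass the SHA-256 digest of the prompt as a hex string, and an injectable `rand` stands in for the seeded generator.

```javascript
// Hypothetical sketch of the documented selection order; not the real picker.
// `recordings` is assumed to look like [{ id: '04097377-aaaa', tags: ['agent:claude'] }].
function pickRecording(env, recordings, promptDigestHex, rand = Math.random) {
  const randomFrom = (list) => list[Math.floor(rand() * list.length)];

  // 1. Fixed trace: a full id or a prefix of it.
  const wanted = env.SYNCLO_EXPLORE_MOCK_TRACE;
  if (wanted) {
    const hit = recordings.find((r) => r.id.startsWith(wanted));
    if (hit) return { id: hit.id, via: 'fixed' };
  }

  // 2. Prompt hash: sha256(prompt) % N, with the digest supplied as hex (assumption).
  if (env.SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH === '1' && promptDigestHex) {
    const index = Number(BigInt('0x' + promptDigestHex) % BigInt(recordings.length));
    return { id: recordings[index].id, via: 'hash' };
  }

  // 3. Pool: random within the recordings carrying the requested tag.
  const tag = env.SYNCLO_EXPLORE_MOCK_POOL;
  if (tag) {
    const pool = recordings.filter((r) => r.tags.includes(tag));
    if (pool.length > 0) return { id: randomFrom(pool).id, via: 'pool' };
  }

  // 4. Default: uniform random over everything.
  return { id: randomFrom(recordings).id, via: 'random' };
}
```

In this sketch a fixed trace that matches nothing simply falls through to the next rule.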

## Dataset

179 anonymized Langfuse traces from this project's own production
telemetry:

- 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 ·
  cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1
- outcomes: succeeded 144 · failed 35
- skills: default 71 · ad-creative 50 · algorithmic-art 30 ·
  agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt /
  brainstorming / data-report / penpot-flutter-design-source 1 each
- 124 multi-turn (sessions with ≥2 turns)
- 18 produce `<artifact>` output
- ~4.5 MB on disk total

Anonymization: `/Users/<name>/` → `${HOME}/`,
`C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs →
stable `proj-001`, `proj-002`, …. Tool input/output payloads
preserved verbatim (templated UI, no cell-level PII).
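
A minimal sketch of the three rewrites listed above, assuming the set of project UUIDs is known up front and numbered in the order given. `createAnonymizer` and its regexes are hypothetical illustrations; the real exporter lives in a separate repository and may detect paths and projects differently.

```javascript
// Hypothetical sketch of the anonymization rewrites; helper names are illustrative.
function createAnonymizer(projectUuids) {
  // Stable labels: the first project UUID becomes proj-001, the next proj-002, ...
  const labels = new Map(
    projectUuids.map((uuid, i) => [uuid, `proj-${String(i + 1).padStart(3, '0')}`]),
  );

  return function anonymize(text) {
    let out = text
      // /Users/<name>/ -> ${HOME}/
      .replace(/\/Users\/[^\/\s]+\//g, () => '${HOME}/')
      // C:\Users\<name>\ -> %USERPROFILE%\
      .replace(/C:\\Users\\[^\\\s]+\\/g, () => '%USERPROFILE%\\');
    // Project UUIDs -> stable proj-NNN labels.
    for (const [uuid, label] of labels) out = out.split(uuid).join(label);
    return out;
  };
}
```

Function replacers are used so that `$` in the replacement text is never interpreted as a substitution pattern.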

## Smoke test

`bash mocks/scripts/smoke-test.sh` — 6 checks across all 6 agents.
All pass on this branch (verified locally):

```
  ✓ opencode first event = step_start
  ✓ codex first event = thread.started
  ✓ claude first event = system
  ✓ deepseek emitted plain text (144 chars on first line)
  ✓ qwen emitted plain text (144 chars on first line)
  ✓ grok emitted plain text (144 chars on first line)
All mock CLIs working.
```
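
The first-event checks above could be expressed along these lines. This is a sketch only: it assumes each structured mock emits JSONL with a top-level `type` field, and `firstEventType` / `checkFirstEvent` are hypothetical names, not functions from `smoke-test.sh`.

```javascript
// First stdout event type each structured mock is expected to emit,
// taken from the smoke-test output above.
const EXPECTED_FIRST_EVENT = {
  opencode: 'step_start',
  codex: 'thread.started',
  claude: 'system',
};

// Returns the `type` of the first non-empty line, or null if it is not a typed JSON object.
function firstEventType(stdout) {
  const firstLine = stdout.split('\n').find((line) => line.trim() !== '');
  if (firstLine === undefined) return null;
  try {
    const event = JSON.parse(firstLine);
    return event && typeof event.type === 'string' ? event.type : null;
  } catch {
    return null; // plain-text output, e.g. from the `plain` renderer
  }
}

function checkFirstEvent(agent, stdout) {
  const expected = EXPECTED_FIRST_EVENT[agent];
  return expected !== undefined && firstEventType(stdout) === expected;
}
```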

## Adding more recordings

The exporter that produced this set lives in
[nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore)
(see `cli/src/local/orchestrator/langfuse-import.ts` + the `local
langfuse-import` CLI command). Operators with the Langfuse keys can pull
more by tag / outcome / artifact / multi-turn filter, then run
`local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`.
`mocks/README.md` has the full instructions.

## Out of scope (follow-ups)

- **ACP agents** (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need
  a JSON-RPC server on stdio rather than a one-shot stream — separate
  `format-acp.mjs` module not yet written.
- **Per-agent json-event-stream variants** (`cursor-agent`, `gemini`,
  `qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer;
  their parsers are in `apps/daemon/src/json-event-stream.ts` and follow
  the same template as `format-codex.mjs`.

## AGENTS.md updates

- Added `mocks/` to the top-level content directories listing
- Added a Validation strategy bullet pointing here for agent-stream /
  parser changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout

- Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary
  RuntimeAgentDef bin names OD resolves before any fallback. Without these,
  a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli,
  kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback).
- Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop
  so coverage is verified end-to-end.
- Call .unref() on the 30s safety timeout in format-acp.mjs so a completed
  ACP session exits promptly instead of waiting the full 30 seconds.
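
For readers unfamiliar with the `.unref()` fix, a minimal sketch of the pattern follows. `armSafetyTimeout` is a hypothetical helper and what the real timeout callback does is an assumption; only the 30-second value and the `.unref()` call come from the description above.

```javascript
// Hypothetical helper; the 30-second default mirrors the safety timeout described above.
function armSafetyTimeout(onTimeout, ms = 30000) {
  const timer = setTimeout(onTimeout, ms);
  // unref() tells Node not to keep the event loop alive for this timer alone,
  // so a session that finishes early lets the process exit right away.
  timer.unref();
  return timer;
}

const safetyTimer = armSafetyTimeout(() => {
  process.exitCode = 1; // assumed reaction; only reached if the session really hangs
});
```

Without `unref()`, the pending timer would hold the process open until it fires, even when all other work is done.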

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mocks): add vela (AMR) — login / models / ACP with strict set_model gate

Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name
`apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`,
`streamFormat: 'acp-json-rpc'`). It's richer than the generic ACP
agents — covers full login + models + chat-session lifecycle.

### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs)

1. `vela login` — writes ~/.amr/config.json with a fake profile (controlKey,
   runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl).
   The on-disk projection is what OD's daemon login route + AmrLoginPill
   poller read; production goes through device-auth, the mock skips
   straight to the file write.

2. `vela models` — prints the production-shaped public model catalog as
   newline-separated `public_model_*    vela` lines. Override via
   FAKE_VELA_MODELS env.

3. `vela agent run --runtime opencode` — ACP JSON-RPC server with three
   vela-specific protocol extensions:

   a. `initialize` response carries `agentCapabilities`
      (`promptCapabilities.embeddedContext`) + `models`
      (`currentModelId` + `availableModels`).
   b. `session/new` response carries the same `models` block.
   c. **Strict set_model gate**: `session/prompt` is rejected with
      JSON-RPC -32602 ("session/set_model must be called before
      session/prompt") UNLESS `session/set_model` (or
      `session/set_config_option`) has been called for the current
      sessionId. Mirrors real vela 0.0.1 contract; catches regressions
      in `attachAcpSession` that silently skip set_model.
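
The strict gate in (c) can be sketched as a small request handler. This is a hedged illustration, not `format-vela.mjs`: `createVelaGate` is a hypothetical name, the success bodies are empty placeholders, and the `requireSetModel` option stands in for the `FAKE_VELA_REQUIRE_SET_MODEL='0'` switch. Only the method names, the -32602 code, and the error message come from the description above.

```javascript
// Hypothetical sketch of the gate; response bodies are simplified placeholders.
function createVelaGate({ requireSetModel = true } = {}) {
  const sessionsWithModel = new Set();

  return function handle(request) {
    const { id, method, params = {} } = request;
    const ok = (result) => ({ jsonrpc: '2.0', id, result });

    // Either call satisfies the gate for that sessionId.
    if (method === 'session/set_model' || method === 'session/set_config_option') {
      sessionsWithModel.add(params.sessionId);
      return ok({});
    }

    // Reject prompts for sessions that never selected a model.
    if (method === 'session/prompt' && requireSetModel && !sessionsWithModel.has(params.sessionId)) {
      return {
        jsonrpc: '2.0',
        id,
        error: { code: -32602, message: 'session/set_model must be called before session/prompt' },
      };
    }

    return ok({});
  };
}
```

The gate is tracked per sessionId, so selecting a model in one session does not unlock prompts in another.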

### Error injection envs (in sync with fake-vela.mjs)

  FAKE_VELA_SESSION_ID            - sessionId returned by session/new
  FAKE_VELA_TEXT                  - override assistant text
  FAKE_VELA_THOUGHT               - optional thought_chunk before text
  FAKE_VELA_SESSION_NEW_ERROR     - fail session/new
  FAKE_VELA_SET_MODEL_ERROR       - fail session/set_model
  FAKE_VELA_PROMPT_ERROR          - fail session/prompt
  FAKE_VELA_REQUIRE_SET_MODEL='0' - disable the strict gate (legacy)
  FAKE_VELA_LOGIN_USER_EMAIL      - email written into config profile
  FAKE_VELA_LOGIN_USER_PLAN       - plan written into config profile
  FAKE_VELA_LOGIN_DELAY_MS        - sleep before write (test in-flight)
  FAKE_VELA_LOGIN_FAIL            - print + exit 1
  FAKE_VELA_MODELS                - override models stdout
  VELA_PROFILE                    - profile slot (prod | test | local)

### Components

`mocks/lib/format-vela.mjs` (~205 LOC)
  - Full ACP server with vela protocol extensions
  - Strict set_model gate
  - Error injection plumbing

`mocks/lib/vela-subcommands.mjs` (~90 LOC)
  - runVelaLogin() — writes ~/.amr/config.json
  - runVelaModels() — prints catalog

`mocks/bin/vela` — dispatcher wrapper. Forwards `vela <subcmd>` to
mock-agent.mjs which routes to login/models or falls through to ACP.

`mocks/mock-agent.mjs` — parseArgs now collects positionals so the vela
dispatcher can read subcommand from there; switch case added for vela.

`mocks/scripts/smoke-test.sh` — +4 assertions:
  vela models prints ≥10 catalog lines
  vela login writes ~/.amr/config.json with the requested email
  vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
  vela strict set_model gate rejects prompt without prior set_model

### Verified locally

  ✓ vela models printed 15 catalog lines
  ✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
  ✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
  ✓ vela strict set_model gate rejects session/prompt without prior set_model

All 21 smoke checks pass (up from 17 with previous P3 ACP commit).

### AGENTS.md + README updates

  AGENTS.md — mention `vela (AMR — vela CLI)` alongside ACP agents in
  the directory listing entry.
  mocks/README.md — protocol table row + dedicated vela section with
  subcommand contract, strict gate explanation, env-injection cheat
  sheet. Mock-tree listing updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): honor REPORT_FILE env when --report-file flag not given

Harnesses that spawn the mock without translating their report-path
contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's
orchestrator, which passes REPORT_FILE as env per the existing
opencode/claude/codex agent launchers) wouldn't get a report file
written, so the harness's "agent exit 0 but produced no report"
check would always fire and mark mock runs as failure even though the
stdout stream was complete.

Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE
when --report-file wasn't provided on argv. Each format renderer already
accepts opts.reportFile and writes the recording's final assistant text
to it (`format-*.mjs` already had this — only the wiring was missing).
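
The fallback can be sketched as below. This is an assumption-laden illustration rather than the actual `parseArgs`: the accepted flag spellings (`--report-file <path>` and `--report-file=<path>`) and the returned shape are hypothetical; only the flag name, the `REPORT_FILE` variable, and the argv-first precedence come from the text.

```javascript
// Hypothetical sketch of the argv-then-env fallback for the report path.
function parseArgs(argv, env) {
  const opts = { positionals: [], reportFile: undefined };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--report-file') {
      opts.reportFile = argv[++i];
    } else if (arg.startsWith('--report-file=')) {
      opts.reportFile = arg.slice('--report-file='.length);
    } else if (!arg.startsWith('-')) {
      opts.positionals.push(arg);
    }
  }
  // The fix: fall through to REPORT_FILE only when argv did not supply a path.
  if (!opts.reportFile && env.REPORT_FILE) opts.reportFile = env.REPORT_FILE;
  return opts;
}
```

A call such as `parseArgs(process.argv.slice(2), process.env)` would then hand `opts.reportFile` to whichever format renderer runs.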

Verified: synclo-explore run with `mock=true, mock_trace=04097377`
against the opencode wrapper now produces a plan.md with the recording's
17-tool claude editing session report. ~1.5s per run vs ~70s real opencode.

* mocks: move recordings to Cloudflare R2; PR→main→Action upload path

The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has
been moved off git into Cloudflare R2 at the bucket open-design-mocks
under recordings/v1/. The repo now ships:

- mocks/manifest.json — the canonical catalog (renamed from
  recordings/index.json) with sha256 + storage hints; consumers
  fetch this to discover what exists, then pull individual jsonl
  files on demand
- mocks/scripts/fetch-recordings.sh — parallel, sha256-verified,
  idempotent puller for the public r2.dev URL
- mocks/scripts/add-recording.sh — local maintainer helper that
  validates a new .jsonl and copies it into recordings-staging/
  (no R2 calls; no credentials needed)
- mocks/scripts/upload-to-r2.mjs — called only by the CI workflow
- mocks/scripts/lib/manifest-utils.mjs — shared sha256/meta/
  rebuild-histograms logic, used by both add-recording (preview)
  and upload-to-r2 (actual write) so the entry shape never drifts
- .github/workflows/sync-mocks-to-r2.yml — fires on push to main
  when mocks/recordings-staging/ changes; uploads to R2, updates
  manifest, commits cleanup back; serialized via concurrency group

Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN,
CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a
laptop. Read stays public via the r2.dev URL.

Why not pnpm install integration: contributors who do not touch
agent code do not pay the fetch cost. Fetch happens on first
smoke-test run (auto-fallback) or when a mock spawn needs data.

Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest +
scripts). Smoke test (21 checks) still green against the fetched
corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scope R2 write token to a dedicated secret name

Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared
CLOUDFLARE_API_TOKEN that landing-page-*.yml uses for Pages deploys)
so the R2 write capability can be scoped to just the
open-design-mocks bucket without bleeding extra capability into the
Pages workflows.

Also hardcode the powerformer CF account_id directly in the workflow
(account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID
secret may point at a different account).

Workflow now fails fast with an actionable error message + dashboard
link if the secret is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate)

wrangler 4.x calls /memberships before any r2 action, requiring
user:read scope. R2 "Object Read & Write" tokens deliberately lack
that scope (defense in depth — a leaked token should not enumerate
account-level resources). The workflow now uses the aws CLI talking
straight to the R2 S3-compatible endpoint with SigV4, no membership
lookup.

Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by
CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the
existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention). End-to-end
tested locally: PUT recording → manifest rebuild → manifest PUT →
staging cleanup all green.

aws CLI is pre-installed on ubuntu-latest, so no install step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout

These mocks were copy-pasted from synclo-explore, where they
originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var
convention. That brand-bleed is not appropriate in OD: rename the
public env surface to OD_MOCKS_* (matching OD-native prefixes like
OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS).

Renames:
  SYNCLO_EXPLORE_MOCK_TRACE             → OD_MOCKS_TRACE
  SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH    → OD_MOCKS_BY_PROMPT_HASH
  SYNCLO_EXPLORE_MOCK_POOL              → OD_MOCKS_POOL
  SYNCLO_EXPLORE_MOCK_SEED              → OD_MOCKS_SEED
  SYNCLO_EXPLORE_MOCK_NO_DELAY          → OD_MOCKS_NO_DELAY
  SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR    → OD_MOCKS_RECORDINGS_DIR
  SYNCLO_EXPLORE_MOCK_SMOKE_TRACE       → OD_MOCKS_SMOKE_TRACE
  SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING  → OD_MOCKS_ALLOW_LOCAL_UPLOAD

Also drop the inline harvester usage from README. The harvester is an
external CLI in nexu-io/agent-pr-explore — its README is the right
place for langfuse-import flags, anonymization options, etc. OD only
documents its own staging→PR→Action workflow.

Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end
verified to route correctly.

Consumers of the OLD env names (notably the orchestrator in
nexu-io/agent-pr-explore) need a matching rename. No back-compat
shim here — the explore side has zero external users today and a
one-line follow-up is cleaner than a permanent deprecation layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_*)

Missed in the prior commit (a30b868a) — only grepped mocks/ subdir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: drop staging dir + GH Action; back to local-script upload

The staging-dir + Action design (added earlier in this PR) had a flaw
the user caught: new recordings briefly entered the repo on their way
through staging, leaving them in git history forever even after the
Action cleanup commit removed them from HEAD. That defeats the whole
point of moving recordings to R2.

Replace with the simpler local-maintainer flow:

  bash mocks/scripts/upload-recording.sh /path/to/<trace>.jsonl
  # → validates, wrangler r2 put, updates manifest.json, wrangler r2 put manifest
  git add mocks/manifest.json && git commit && git push
  # → only the ~200B manifest delta enters git

The wrangler-OAuth gate replaces the CI secret + Action duo. For a
solo / small maintainer team this collapses the trust chain down to
"do you have wrangler login to the powerformer account?": no GH
secrets to rotate, no concurrency window to worry about, and no
repo-history bloat.

Deletes:
- .github/workflows/sync-mocks-to-r2.yml
- mocks/scripts/upload-to-r2.mjs   (CI-only)
- mocks/scripts/add-recording.sh   (staging helper, now obsolete)
- mocks/recordings-staging/        (empty dir, never to be repopulated)

Adds:
- mocks/scripts/upload-recording.sh

Kept:
- mocks/scripts/fetch-recordings.sh
- mocks/scripts/lib/manifest-utils.mjs (still used by upload-recording.sh)
- mocks/manifest.json (committed; the only mocks artifact in git)

End-to-end tested locally: re-upload an existing recording is
idempotent, manifest math is stable, fetch + smoke test still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address review — guard allowlist + safe ~/.amr + loud OD_MOCKS_TRACE typo

Three concrete issues raised across recent Siri-Ray (Looper) review
threads on #3241:

1. scripts/guard.ts only allowlisted mocks/lib/ + mocks/mock-agent.mjs,
   leaving mocks/scripts/lib/manifest-utils.mjs outside the residual-
   JS guard. Result: Preflight fail on every push. Extend the allowlist
   to mocks/scripts/ — same precedent as the lib/ entry directly above.

2. mocks/scripts/smoke-test.sh moved the caller's real ~/.amr to
   ~/.amr-smoke-backup, ran vela login (which writes a fake config),
   then ran rm -rf on ~/.amr and restored the backup. Two failure modes:
   a crash mid-run loses the user's real config, and re-running before
   the restore overwrites the backup with the fake login. Fix: sandbox
   vela login into a mktemp -d HOME via env (HOME=$amr_sandbox vela
   login). This never touches the real ~/.amr, and a trap cleans up.

3. mocks/lib/recording-picker.mjs silently fell through to
   prompt-hash → pool → random when OD_MOCKS_TRACE was set but did
   not match any recording (typo, prefix too short, corpus not
   fetched). Tests using a pinned trace would silently get a
   different trace, hiding regressions. Fix: throw an explicit error
   with the failing value + a pointer at fetch-recordings.sh.

Verified locally: pnpm guard prints "Residual JavaScript check
passed", smoke-test still 21/21, ~/.amr mtime unchanged after run,
typo on OD_MOCKS_TRACE now produces "mock-agent: OD_MOCKS_TRACE=...
set but no matching recording in <dir>" on stderr.
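
The loud-fail behavior from item 3 might look roughly like this. `resolveFixedTrace` is a hypothetical name and the hint wording after the quoted message is an assumption; the leading part of the error text follows the message quoted above.

```javascript
// Hypothetical sketch: a pinned trace either resolves or throws, never falls through.
function resolveFixedTrace(trace, recordingIds, dir) {
  const hit = recordingIds.find((id) => id.startsWith(trace));
  if (hit === undefined) {
    throw new Error(
      `mock-agent: OD_MOCKS_TRACE=${trace} set but no matching recording in ${dir} ` +
        '(if the corpus has not been fetched, run mocks/scripts/fetch-recordings.sh)',
    );
  }
  return hit;
}
```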

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fetch-recordings: detect empty filter result before line-counting

printf '%s\n' on an empty string emits a single empty line, so the
previous TOTAL=$(printf ... | grep -c "") math returned 1 on an
empty $ENTRIES_TSV — a typo like `--agent no-such-agent` printed
"Fetching up to 1 recordings", downloaded zero, and exited 0
("ready"). Check `-z $ENTRIES_TSV` first.

Reproduced + fix verified per the reviewer thread.
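
The same off-by-one is easy to reproduce outside the shell: splitting an empty string on newlines also yields one (empty) element. The JavaScript sketch below is only an analogy to the `-z $ENTRIES_TSV` guard; `naiveCount` and `countEntries` are hypothetical helpers and are not part of `fetch-recordings.sh`.

```javascript
// Naive line count: an empty input still reports one "line".
const naiveCount = (entriesTsv) => entriesTsv.split('\n').length;

// Guarded count: check for the empty result before counting lines.
function countEntries(entriesTsv) {
  if (entriesTsv === '') return 0;
  return entriesTsv.split('\n').length;
}
```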

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address mrcfps review — goldens + provenance + contract check

Three durability improvements suggested in the PR #3241 top-level
review:

## 1. Golden daemon-event snapshots (mocks/golden/*.events.json + apps/daemon/tests/mocks-golden.test.ts)

Smoke-test verified that mocks RUN; that catches crashes but not a
parser change that semantically reshapes the events the daemon emits.

Commit the daemon-event sequence for 3 representative traces:
- claude  314d6833 — median-complexity agent-browser session
- codex   dcdff3b3 — 14-tool refactor
- opencode 9a9522ec — 7-tool data-report

apps/daemon/tests/mocks-golden.test.ts spawns the mock, feeds stdout
through the real createClaudeStreamHandler / createJsonEventStreamHandler,
normalizes per-spawn volatile fields (only sessionId today, only on
claude), and deep-equals against the committed snapshot. A parser
regression fails the test loudly.
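
The normalize-then-compare step can be sketched as follows. This is not the code in `mocks-golden.test.ts`: `normalize`, `matchesGolden`, and the `<normalized>` placeholder are hypothetical, and the real test presumably uses its test framework's deep-equality assertion rather than JSON text comparison.

```javascript
// Hypothetical sketch: replace per-spawn volatile fields, then compare structurally.
const VOLATILE_KEYS = new Set(['sessionId']);

function normalize(value) {
  if (Array.isArray(value)) return value.map(normalize);
  if (value && typeof value === 'object') {
    const out = {};
    // Sort keys so that key order never causes a spurious mismatch.
    for (const key of Object.keys(value).sort()) {
      out[key] = VOLATILE_KEYS.has(key) ? '<normalized>' : normalize(value[key]);
    }
    return out;
  }
  return value;
}

function matchesGolden(actualEvents, goldenEvents) {
  // Keys are sorted in normalize(), so JSON text equality is a structural comparison here.
  return JSON.stringify(normalize(actualEvents)) === JSON.stringify(normalize(goldenEvents));
}
```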

After an intentional parser change, regenerate:

  MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
  git diff mocks/golden/
  # eyeball; commit if shapes match intent

## 2. Provenance fields on every manifest entry (mocks/scripts/lib/manifest-utils.mjs + mocks/manifest.json)

Augment inspectRecording() to write:

  captured_at         — ISO 8601 from existing meta.timestamp
  cli_version         — null until harvester writes it
  protocol_version    — null until harvester writes it
  anonymization_version — null until harvester writes it

captured_at is now populated for all 179 existing entries from the
meta event the harvester already emits. The harvester in
nexu-io/agent-pr-explore is the next step for cli_version /
protocol_version / anonymization_version. Once those are
populated, consumers can detect when a recording is more than ~1
minor version behind the live CLI and flag it for re-harvest.

No matrix of (cli_version × agent) recordings — that explodes
maintenance. Just metadata per recording so trust decay is visible.
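
One way a consumer might use these fields is sketched below. It assumes `cli_version` will hold a `major.minor.patch` string once the harvester populates it, which the source does not confirm; `needsReharvest` and its `maxMinorLag` parameter are hypothetical.

```javascript
// Hypothetical staleness check; returns null while cli_version is still unpopulated.
function needsReharvest(entry, liveCliVersion, maxMinorLag = 1) {
  if (entry.cli_version == null) return null; // unknown until the harvester writes it
  const [recMajor, recMinor] = entry.cli_version.split('.').map(Number);
  const [liveMajor, liveMinor] = liveCliVersion.split('.').map(Number);
  // A recording from an older major version is always treated as stale.
  if (recMajor !== liveMajor) return recMajor < liveMajor;
  return liveMinor - recMinor > maxMinorLag;
}
```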

## 3. Real-CLI contract check (mocks/scripts/contract-check.sh + docs/MOCKS-CONTRACT-CHECK.md)

Mocks catch parser regressions against recordings; they do NOT
catch recordings drifting away from the live agent CLI as that CLI
evolves. The contract check spawns the real CLI alongside the mock
with a fixed deterministic prompt + diffs top-level event-type
distributions.
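
The distribution diff could be approximated like this. The sketch assumes both streams are JSONL with a top-level `type` field and buckets anything else as `<untyped>`; `eventTypeCounts` and `diffEventTypes` are hypothetical names, and `contract-check.sh` itself may compute the diff differently.

```javascript
// Hypothetical sketch: count top-level event types per stream, then list the differences.
function eventTypeCounts(jsonl) {
  const counts = {};
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    let type = '<untyped>';
    try {
      const event = JSON.parse(line);
      if (event && typeof event.type === 'string') type = event.type;
    } catch {
      // lines that do not parse stay in the <untyped> bucket
    }
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

function diffEventTypes(realJsonl, mockJsonl) {
  const real = eventTypeCounts(realJsonl);
  const mock = eventTypeCounts(mockJsonl);
  const types = [...new Set([...Object.keys(real), ...Object.keys(mock)])].sort();
  return types
    .filter((type) => (real[type] || 0) !== (mock[type] || 0))
    .map((type) => ({ type, real: real[type] || 0, mock: mock[type] || 0 }));
}
```

An event type that appears on only one side is the kind of difference a maintainer would want to read closely.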

Deliberately human-driven, not cron-scheduled:
- costs real LLM tokens per invocation
- requires real CLI auth
- maintainer reads the output, not a regex

Suggested triggers per doc: real-CLI release notes mentioning
"output format" / "stream" / "JSON" / "events"; before a parser
refactor; ad-hoc when something looks off.

## Coverage note

README updated to position mocks as "deterministic protocol/parser
coverage" (not "e2e replacement") per mrcfps framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks-golden test): drop import of non-exported ParserKind

Use plain string (the type alias is `string` anyway) — Preflight
typecheck on a31fa71a failed:

  tests/mocks-golden.test.ts(29,8): error TS2459: Module
  "../src/json-event-stream.js" declares "ParserKind" locally, but
  it is not exported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* recording-picker: structured OD_MOCKS_POOL + hard-fail no-match

Siri-Ray review: `OD_MOCKS_POOL=outcome:failed` was documented as a
supported selection knob, but the matcher only checked tags and
`meta.agent`, so the negative-path pool found 0 candidates and
silently fell through to global random, validating against any
recording instead of a failed trace.

Fix:
- Parse the `<dim>:<value>` shape and route each dim to the right meta
  field: `outcome` → `meta.outcome`, `agent` → `meta.agent`,
  `skill` → `tags[]`. Bare values still fall back to tag substring.
- If the env was set and matched nothing, throw with the failing
  value and a jq one-liner for inspection. Same loud-fail policy as
  OD_MOCKS_TRACE — silent fallback was the original bug.

Verified locally: outcome:failed, agent:codex, skill:agent-browser
all route correctly; outcome:nonsense throws the explicit error.
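
A rough sketch of the structured matcher follows. The recording shape (`meta.outcome`, `meta.agent`, `tags`), the assumption that skill tags may be stored either bare or as `skill:<value>`, the function name `matchPool`, and the error wording are all hypothetical; only the routing rules and the hard-fail policy come from the description above.

```javascript
// Hypothetical sketch of <dim>:<value> routing with a hard failure on no match.
function matchPool(poolSpec, recordings) {
  const sep = poolSpec.indexOf(':');
  const dim = sep === -1 ? null : poolSpec.slice(0, sep);
  const value = sep === -1 ? poolSpec : poolSpec.slice(sep + 1);

  const matchers = new Map([
    ['outcome', (r) => r.meta.outcome === value],
    ['agent', (r) => r.meta.agent === value],
    // Assumption: skill tags are stored either bare or with a "skill:" prefix.
    ['skill', (r) => r.tags.includes(value) || r.tags.includes(`skill:${value}`)],
  ]);
  // Bare values (or unknown dims) fall back to a tag substring match.
  const bare = (r) => r.tags.some((tag) => tag.includes(poolSpec));

  const hits = recordings.filter(matchers.get(dim) || bare);
  if (hits.length === 0) {
    throw new Error(`mock-agent: OD_MOCKS_POOL=${poolSpec} matched no recording`);
  }
  return hits;
}
```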

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* contract-check.sh: fix lost $PROMPT in mock invocation

Siri-Ray review on e576074a: the mock side wrapped its pipeline in
`bash -c "printf %s \"\$PROMPT\" | ..."` — but $PROMPT was a parent
shell variable, not exported, so the child bash expanded it to an
empty string. Result: the contract check sent the real prompt to the
real CLI and an empty string to the mock, defeating the
same-input invariant the whole script rests on. The empty prompt also
let the mock select a random trace whenever a maintainer happened to
have OD_MOCKS_BY_PROMPT_HASH=1 in their env.

Fix: drop the inner bash -c entirely; use a subshell that scopes the
PATH overlay and pipes printf into the PATH-resolved mock binary
directly. The subshell limits the PATH change without var-passing.

Verified locally: with prompt-A the mock picks trace 54ec02ee via
hash; prompt-B → 2667e851 via hash; empty prompt (old broken
behavior) → random — confirms the prompt is now actually reaching
the mock under PATH overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 07:17:20 +00:00

222 lines
32 KiB
JSON

{
"agent": "codex",
"trace": "dcdff3b3-cd39-4dcd-be83-372830a29639",
"events": [
{
"type": "status",
"label": "initializing"
},
{
"type": "status",
"label": "running"
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_3",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'rg --files -n .'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_3",
"content": "zsh:1: command not found: rg\n",
"isError": true
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_4",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc \"sed -n '1,220p' '${HOME}/.codex/plugins/cache/openai-curated/build-web-apps/9b3c8689/skills/frontend-app-builder/SKILL.md'\""
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_4",
"content": "---\nname: frontend-app-builder\ndescription: Use for new frontend applications, dashboards, games, creative websites, hero sections, and visually driven UI from scratch, or when the user explicitly asks for a redesign/restyle/modernization. Builds from clean, airy, high-taste, readable image-generated concept design with section-specific references, faithful implementation, and browser testing.\n---\n\n# Frontend App Builder\n\nUse this skill to create polished frontend apps, dashboards, games, creative websites, hero sections, redesigns, and other visually driven UI. Act first as a senior front-end designer, then as an engineer implementing an approved design spec.\n\n## Core Standard\n\nThe two priorities of this skill outrank everything else:\n\n1. Create enough great-looking Image Gen design first: clean, airy, distinctive, complete, readable, section-specific when needed, and not repetitive by default.\n2. Do not stop until the accepted design and browser implementation match 10/10. Keep fixing visual, interaction, responsive, asset, and typography mismatches until `view_image` comparison would pass agency sign-off.\n\n## Hard Rules\n\n1. Use Image Gen for the visual concept unless the user explicitly opts out or the task is a small UI fix inside an existing design system.\n2. Design the complete requested surface before coding. For a full page, app, dashboard, game, or product interface, a header or hero concept is not enough. For multi-section websites and long landing pages, prefer coordinated section-by-section concepts, plus an optional overview for rhythm, over one tall image that loses detail. For apps, dashboards, games, or compact product surfaces, generate the full primary screen plus any needed state, responsive, or asset concepts first.\n3. Inside Codex, default multi-section website concepting to one fresh, large, readable Image Gen screenshot per major section. 
If the request has 1-10 sections, expect roughly 1-10 primary section images. Generate additional section/detail screenshots whenever text, buttons, card anatomy, typography, spacing, or colors are too small to extract. Do not crop or zoom an old full-page image as the main reference; regenerate a fresh standalone section or detail image that preserves the same design system.\n4. In Plan mode, generate the design first, then use `request_user_input` to get design approval before planning implementation details.\n5. Once accepted, the concept is a production design spec. No creative liberties during implementation: do not reinterpret layout, visible copy, hierarchy, container model, styling, imagery, density, or sections unless the user approves it or a concrete blocker requires it. General design heuristics never override the accepted concept.\n6. The completion bar is agency-signoff faithful implementation: 10/10 fidelity to the accepted spec plus production-quality code. If the browser-rendered UI would receive design-review comments, keep fixing it.\n7. Before coding, build a small design system from the accepted image: tokens, typography, component families, variants, spacing, icon treatment, and container rules. Include both content typography and UI chrome typography for tools, editors, and dashboards. Implement from that system so repeated elements stay consistent.\n8. For new complex app UIs such as dashboards, admin tools, editors, data-heavy tools, and multi-panel product surfaces, default to React + Vite unless the user specifies another framework, the existing repo already dictates one, or the task is explicitly a single-file/static deliverable.\n9. Hero eyebrow, kicker, pretitle, badge, or pill labels above the main heading are prohibited by default. Use one only when the user explicitly requested it or the accepted/reference design already contains it.\n10. Verify in the Browser plugin / built-in browser first. 
Use Playwright Chromium only when Browser/IAB is unavailable or unreliable, and state the fallback reason.\n11. Final handoff is blocked until you use `view_image` on both the accepted concept and the latest browser screenshot. This cannot be skipped ",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_5",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'find . -maxdepth 2 -type d -print'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_5",
"content": ".\n",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_9",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'mkdir -p css js screens'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_9",
"content": "",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_15",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'find . -maxdepth 3 -type f -print | sort'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_15",
"content": "./css/mobile.css\n./index.html\n./js/queue-app.js\n./screens/android-detail.html\n./screens/android-discover.html\n./screens/android-ticket.html\n./screens/ios-detail.html\n./screens/ios-discover.html\n./screens/ios-ticket.html\n",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_16",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc \"grep -RInE 'lorem|placeholder|Feature One|Feature Two|TODO|REPLACE|demo only|target count|settings panel|viewport selector|theme knob' .\""
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_16",
"content": "./css/mobile.css:164:.search-box input::placeholder {\n./screens/android-discover.html:28: <input data-search type=\"search\" placeholder=\"Doctor, clinic, symptom\" />\n./screens/ios-discover.html:26: <input data-search type=\"search\" placeholder=\"Search doctor, clinic, specialty\" />\n",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_17",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc \"python3 - <<'PY'\nfrom pathlib import Path\nfiles=list(Path('.').glob('**/*.html'))\nfor p in files:\n s=p.read_text()\n if '<\"'!doctype html>'\"' not in s[:40].lower(): print('missing doctype', p)\n if s.count('<html')\"'!=1 or s.count('\"'</html>')\"'!=1: print('\"'html count', p)\n if s.count('<body')\"'!=1 or s.count('\"'</body>')\"'!=1: print('\"'body count', p)\nprint('checked', len(files), 'html files')\nPY\""
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_17",
"content": "checked 7 html files\n",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_18",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc \"sed -n '1,180p' '${HOME}/.codex/plugins/cache/openai-bundled/browser/26.519.81530/skills/browser/SKILL.md'\""
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_18",
"content": "---\nname: browser\ndescription: \"Browser automation for the Codex in-app browser. Use to open, navigate, inspect, test, click, type, screenshot, or verify local targets such as localhost, [REDACTED:ipv4], ::1, file://, the current in-app browser tab, and websites shown side by side inside Codex.\"\n---\n\n# Browser\n\nUse this skill for browser automation tasks such as inspecting pages, navigating, testing local apps, clicking, typing, taking screenshots, and reading visible page state. After setup, select the `iab` browser.\n\nKeep browser work in the background by default.\n\nShow the browser when the user's request is primarily to put a page in front of them or let them watch the interaction, such as \"open localhost:3000\", \"go to the docs page\", \"take me to the PR\", \"show me the current tab\", or \"keep the browser open while you test checkout\".\n\nDo not show the browser when navigation is only a means to answer a question or verify behavior, such as \"check localhost:3000 and tell me whether login works\", \"inspect the docs page and summarize what changed\", or \"verify the modal still opens correctly\". Localhost targets and ordinary page navigation do not by themselves require visibility.\n\nWhen the browser should be visible to the user, actually present it with `await (await browser.capabilities.get(\"visibility\")).set(true)`.\n\nIf this plugin is listed as available in the session, treat that as mandatory reading before browser work. Open and follow this skill before saying that Browser is unavailable and before falling back to standalone Playwright or Computer Use.\n\nDo not skip this skill just because Computer Use MCP tool calls are directly visible or appear easier to invoke. The presence of Computer Use tools is not evidence that Computer Use is the preferred browser surface.\n\nBefore using this skill for the first time in the current conversation context, read the entire `SKILL.md` file in one read. 
Do not use a partial range such as `sed -n '1,220p'`; read through the end of the file. Do not mention this internal skill-loading step to the user.\n\n## Bootstrap\n\nThese setup details are internal. User-facing progress updates should be less technical in nature. Never mention `Node REPL`, `node_repl`, `REPL`, JavaScript sessions, or module exports unless a user is asking for that exact information. If setup or recovery is needed, describe it naturally as connecting to the browser or retrying the browser connection.\n\nThe `browser-client` module is the core entry point for browser use, and is available under `scripts/browser-client.mjs` in this plugin's root directory. ALWAYS import it using an absolute path.\nIMPORTANT: If this path cannot be found, stop and report that this plugin is missing `scripts/browser-client.mjs`. NEVER use the built in `browser-client` library.\n\nRun browser setup code through the Node REPL `js` tool. In this environment the callable tool id typically appears as `mcp__node_repl__js`; `js_reset` only clears state and is not the execution tool. Run this once per fresh `node_repl` session:\n\n```js\nconst { setupBrowserRuntime } = await import(\"<plugin root>/scripts/browser-client.mjs\");\nawait setupBrowserRuntime({ globals: globalThis });\nglobalThis.browser = await agent.browsers.get(\"iab\");\n```\n\nUse the browser bound to `browser` for tasks in this skill.\n\n## Troubleshooting\n\nIMPORTANT: do NOT attempt to dig through source code or control the browser through unrelated mechanisms before attempting the workflow for the selected backend. If you run into issues, follow the steps below FIRST.\n\n- Do not fall back to Computer Use just because its tool calls are already visible. Read and attempt this workflow first.\n- If `js_reset` is visible but `js` is not, do not conclude that `node_repl` is unusable. 
Use tool discovery for `node_repl js`, then `mcp__node_repl__js`, then `js`, then `node_repl js JavaScript execution`; run the bootstrap cell with the Node REPL `js` tool once it is exposed.\n- If the Node REPL `js` execution tool is still unavailable after those searches, say that explicitly before choosing any fallback browser-",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_19",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'python3 -m http.server 5173'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_19",
"content": "",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_20",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc \"sed -n '181,360p' '${HOME}/.codex/plugins/cache/openai-bundled/browser/26.519.81530/skills/browser/SKILL.md'\""
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_20",
"content": "In `node_repl` you can use Node filesystem libraries when needed.\n\nFor file operations, prefer the Node runtime libraries directly:\n```js\nconst fs = await import(\"node:fs/promises\");\n\n// write a file\nawait fs.writeFile(\"hello.txt\", \"Hello world\");\n\n// read a file\nconst contents = await fs.readFile(\"hello.txt\", \"utf-8\");\n```\n\n#### Browser interactions\n\nUse the guarded first-browser-cell pattern above when starting browser work. It prepares the browser runtime and top-level `agent` object for browser work.\n\n## API Use Behavior\n\nThe ability to interact directly with the browser is exposed through the `browser-client` runtime via the `agent.browsers.*` API.\n\nOnly the Node REPL `js` tool (`mcp__node_repl__js`) can be used to control the in-app browser. Do not use external MCP browser-control tools, separate browser automation servers, or other browser skills for this surface. References to Playwright mean the in-skill `tab.playwright` API after browser-client setup.\n\n### How to use the API\n\n* You are provided with various options for interacting with the browser (Playwright, vision), and you should use the most appropriate tool for the job.\n* Prefer Playwright where possible, but if it is not clear how to best use it, prefer vision.\n* Always make sure you understand what is on the screen before proceeding to your next action. After clicking, scrolling, typing, or other interactions, collect the cheapest state check that answers the next question. Prefer a fresh DOM snapshot when you need locator ground truth, prefer a screenshot when visual confirmation matters, and avoid requesting both by default.\n* Remember that variables are persistent across calls to the REPL. By default, define `tab` once and keep using it. 
Only re-query a tab when you are intentionally switching to a different tab, after a kernel reset, or after a failed cell that never created the binding.\n\n### General guidance\n\n* Minimize interruptions as much as possible. Only ask clarifying questions if you really need to. If a user has an under-specified prompt, try to fulfill it first before asking for more information.\n* Remember, the user is asking questions about what they see on the screen. Base your interactions on what is visible to the user (based on DOM and screenshots) rather than programmatically determining what they are talking about. The \"first link\" on the page is not necessarily the first `a href` in the DOM.\n* Try not to over-complicate things. It is okay to click based on node ID if it is not clear how to determine the UI element in Playwright.\n* If a tab is already on a given URL, do not call `goto` with the same URL. This will reload the page and may lose any in-progress information the user has provided. When you intentionally need to reload, call `tab.reload()`.\n* If browser-use is interrupted because the extension or user took control, do not quote the raw runtime error. Summarize it naturally for the user, for example: \"Browser use was stopped in the extension.\" Avoid internal terms like turn_id, runtime, retry, or plugin error text unless the user asks for details.\n* When testing a user's local app on `localhost`, `[REDACTED:ipv4]`, `::1`, or another local development URL in a framework that does not support hot reloading or hot reloading is disabled, call `tab.reload()` after code or build changes before verifying the UI. After reloading, take a fresh DOM snapshot or screenshot before continuing.\n* Do not brute-force undocumented site search URLs, query parameter variants, search engine query grids, or candidate URL arrays unless the user explicitly asks for exhaustive coverage.\n* If a guessed URL, search query, or candidate page fails, try at most one new approach. 
After that, switch to visible page navigation, the site's own search UI, or give the best current answer with uncertainty.\n* If you use a search engine fallback, run one focused query, inspect the strongest results, and open the best candidate. Do not keep rewriting the query in loops.\n* Once you have one strong candidate page, verify it directly instead of collecting more can",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_24",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'curl -I http://[REDACTED:ipv4]:5173/index.html'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_24",
"content": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 8879 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\nHTTP/1.0 200 OK\r\nServer: SimpleHTTP/0.6 Python/3.9.6\r\nDate: Thu, 28 May 2026 16:20:20 GMT\r\nContent-type: text/html\r\nContent-Length: 8879\r\nLast-Modified: Thu, 28 May 2026 16:18:55 GMT\r\n\r\n",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_25",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'curl -I http://[REDACTED:ipv4]:5173/screens/android-ticket.html'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_25",
"content": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 2356 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\nHTTP/1.0 200 OK\r\nServer: SimpleHTTP/0.6 Python/3.9.6\r\nDate: Thu, 28 May 2026 16:20:20 GMT\r\nContent-type: text/html\r\nContent-Length: 2356\r\nLast-Modified: Thu, 28 May 2026 16:17:56 GMT\r\n\r\n",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_26",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc 'curl -I http://[REDACTED:ipv4]:5173/screens/ios-discover.html'"
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_26",
"content": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 4518 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\nHTTP/1.0 200 OK\r\nServer: SimpleHTTP/0.6 Python/3.9.6\r\nDate: Thu, 28 May 2026 16:20:20 GMT\r\nContent-type: text/html\r\nContent-Length: 4518\r\nLast-Modified: Thu, 28 May 2026 16:17:55 GMT\r\n\r\n",
"isError": false
},
{
"type": "tool_use",
"id": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_29",
"name": "Bash",
"input": {
"command": "/bin/zsh -lc \"pkill -f 'python3 -m http.server 5173'\""
}
},
{
"type": "tool_result",
"toolUseId": "dcdff3b3-cd39-4dcd-be83-372830a29639-tool-item_29",
"content": "sysmon request failed with error: sysmond service not found\npkill: Cannot get process list\n",
"isError": true
},
{
"type": "text_delta",
"delta": "Using the frontend app builder workflow because this is a new cross-platform mobile prototype. I'll create separate iOS and Android app screens, with an `index.html` gallery that uses the provided device frames.\nI'm checking the app-building guidance and the empty project folder now, then I'll lock the screen rhythm before writing files.\nThe project folder is empty. I'm choosing a healthcare-native, human approachable direction: clean white clinical surfaces, teal primary action, soft blue secondary states, high-contrast queue status colors, and platform-native iOS/Android navigation.\nThe screen arc is locked: iOS discovery, iOS doctor queue detail, iOS active ticket, Android discovery, Android doctor queue detail, Android active ticket. The interaction path is search/filter → choose doctor queue → join queue → receive token → check in/leave queue.\nI'm writing the prototype as real app surfaces now: shared mobile CSS and JavaScript, then separate screen files for each platform/screen so the frame gallery can compare native adaptations side by side.\nThe app screens are in place with search, queue confirmation, check-in, and leave-queue state changes. I'm adding the frame gallery next so iOS and Android can be reviewed as native surfaces rather than one combined long page.\nThe launcher is written and the copy is specific to queue booking rather than generic scheduling. I'm doing the implementation checks now: file structure, placeholder audit, basic HTML sanity, and then a browser smoke test if the local preview can be loaded.\nThe in-app browser surface is not available in this session, so I'm falling back to lightweight local checks instead of claiming a rendered-browser pass. 
The files still load through the static server; the frame URLs are intended for the Open Design preview host where `/frames/` is available.The P0 checks pass for structure and specificity: seven HTML files have complete document shells, the core screens return 200 locally, and there are no filler sections or generated-design controls in the product UI. The only placeholder matches are real search input placeholder attributes.<artifact identifier=\"queuecare-mobile-prototype\" type=\"text/html\" title=\"QueueCare Mobile Prototype\">\n<!doctype html>\n<html lang=\"en\">\n<head>\n <meta charset=\"utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <title>QueueCare Mobile Prototype</title>\n <style>\n :root {\n --bg: oklch(98% 0.004 240);\n --surface: oklch(100% 0 0);\n --fg: oklch(20% 0.02 240);\n --muted: oklch(50% 0.018 240);\n --border: oklch(90% 0.006 240);\n --accent: oklch(56% 0.12 170);\n --blue: oklch(58% 0.14 245);\n --font-display: 'Sohne', 'Avenir Next', -apple-system, BlinkMacSystemFont, system-ui, sans-serif;\n --font-body: -apple-system, BlinkMacSystemFont, 'SF Pro Text', system-ui, sans-serif;\n }\n\n * { box-sizing: border-box; }\n\n html,\n body {\n margin: 0;\n min-height: 100%;\n background:\n radial-gradient(circle at 8% 0%, color-mix(in oklch, var(--accent), transparent 86%) 0 220px, transparent 221px),\n radial-gradient(circle at 92% 12%, color-mix(in oklch, var(--blue), transparent 88%) 0 260px, transparent 261px),\n var(--bg);\n color: var(--fg);\n font-family: var(--font-body);\n -webkit-font-smoothing: antialiased;\n }\n\n body { overflow-x: hidden; }\n\n .page {\n width: min(1440px, calc(100% - 32px));\n margin: 0 auto;\n padding: 34px 0 54px;\n }\n\n .hero {\n display: grid;\n grid-template-columns: minmax(280px, 0.9fr) minmax(320px, 1.1fr);\n gap: 32px;\n align-items: end;\n margin-bottom: 32px;\n }\n\n .brand {\n display: inline-flex;\n align-items: center;\n gap: 10px;\n color: var(--fg);\n 
font-weight: 900;\n letter-spacing: 0;\n }\n\n .mark {\n width: 34px;\n height: 34px;\n border-radius: 12px;\n background:\n linear-gradient(90deg, transparent 42%, #fff 42% 58%, transparent 58%),\n linear-gradient(0deg, transparent 42%, #fff 42% 58%, transparent 58%),\n linear-gradient(145deg, var(--accent), var(--blue));\n box-shadow: 0 14px 30px color-mix(in oklch, var(--accent), transparent 72%);\n }\n\n h1 {\n margin: 28px 0 14px;\n font-family: var(--font-display);\n font-size: clamp(44px, 7vw, 96px);\n line-height: 0.92;\n letter-spacing: 0;\n text-wrap: balance;\n max-width: 820px;\n }\n\n .hero p {\n margin: 0;\n color: var(--muted);\n font-size: clamp(17px, 1.7vw, 22px);\n line-height: 1.42;\n max-width: 620px;\n }\n\n .summary {\n display: grid;\n gap: 12px;\n align-self: stretch;\n }\n\n .summary-card {\n border: 1px solid var(--border);\n border-radius: 8px;\n background: color-mix(in oklch, var(--surface), transparent 5%);\n padding: 16px;\n box-shadow: 0 18px 42px color-mix(in oklch, var(--fg), transparent 94%);\n }\n\n .summary-card strong {\n display: block;\n margin-bottom: 6px;\n font-size: 15px;\n }\n\n .summary-card span {\n display: block;\n color: var(--muted);\n font-size: 13px;\n line-height: 1.42;\n }\n\n .section-heading {\n display: flex;\n align-items: end;\n justify-content: space-between;\n gap: 20px;\n margin: 34px 0 18px;\n }\n\n .section-heading h2 {\n margin: 0;\n font-size: clamp(26px, 3vw, 42px);\n line-height: 1;\n }\n\n .section-heading p {\n margin: 0;\n color: var(--muted);\n max-width: 500px;\n font-size: 15px;\n line-height: 1.42;\n }\n\n .frame-grid {\n display: grid;\n grid-template-columns: repeat(3, minmax(310px, 1fr));\n gap: 18px;\n align-items: start;\n }\n\n .frame-card {\n border: 1px solid var(--border);\n border-radius: 8px;\n background: color-mix(in oklch, var(--surface), transparent 3%);\n padding: 14px;\n box-shadow: 0 22px 52px color-mix(in oklch, var(--fg), transparent 91%);\n }\n\n .frame-title {\n 
display: flex;\n justify-content: space-between;\n gap: 12px;\n align-items: baseline;\n margin: 0 0 10px;\n min-height: 32px;\n }\n\n .frame-title h3 {\n margin: 0;\n font-size: 15px;\n }\n\n .frame-title a {\n color: var(--accent);\n font-size: 12px;\n font-weight: 900;\n text-decoration: none;\n }\n\n iframe {\n display: block;\n width: 100%;\n height: auto;\n aspect-ratio: 390 / 844;\n border: 0;\n border-radius: 34px;\n background: transparent;\n }\n\n .android-frame iframe { aspect-ratio: 412 / 900; }\n\n @media (max-width: 1180px) {\n .hero { grid-template-columns: 1fr; }\n .summary { grid-template-columns: repeat(3, 1fr); }\n .frame-grid { grid-template-columns: repeat(2, minmax(300px, 1fr)); }\n }\n\n @media (max-width: 760px) {\n .page {\n width: min(100% - 20px, 430px);\n padding-top: 22px;\n }\n\n .summary { grid-template-columns: 1fr; }\n .section-heading { display: block; }\n .section-heading p { margin-top: 8px; }\n .frame-grid { grid-template-columns: 1fr; }\n }\n </style>\n</head>\n<body>\n <main class=\"page\">\n <section class=\"hero\" aria-labelledby=\"prototype-title\">\n <div>\n <div class=\"brand\"><span class=\"mark\" aria-hidden=\"true\"></span><span>QueueCare</span></div>\n <h1 id=\"prototype-title\">Doctor queues without appointment slots.</h1>\n <p>Patients choose a live doctor queue, receive a token, track how many people are ahead, and check in only when it is worth leaving.</p>\n </div>\n <div class=\"summary\" aria-label=\"Product workflow\">\n <div class=\"summary-card\"><strong>Patient first</strong><span>The core promise is a held queue position, not a fixed time that clinics cannot honor.</span></div>\n <div class=\"summary-card\"><strong>Live queue state</strong><span>Every doctor card carries patients ahead, expected call, token availability, and clinic notes.</span></div>\n <div class=\"summary-card\"><strong>Native adaptation</strong><span>iOS uses compact tab navigation; Android uses Material-style top actions, bottom nav, 
and a floating queue shortcut.</span></div>\n </div>\n </section>\n\n <section aria-labelledby=\"ios-title\">\n <div class=\"section-heading\">\n <h2 id=\"ios-title\">iOS flow</h2>\n <p>Discovery, doctor queue detail, and active token are separate app screens inside the iPhone frame.</p>\n </div>\n <div class=\"frame-grid\">\n <article class=\"frame-card\">\n <div class=\"frame-title\"><h3>Find queues</h3><a href=\"screens/ios-discover.html\">Open</a></div>\n <iframe title=\"iOS discovery screen\" src=\"/frames/iphone-15-pro.html?screen=screens/ios-discover.html\" width=\"390\" height=\"844\" loading=\"lazy\"></iframe>\n </article>\n <article class=\"frame-card\">\n <div class=\"frame-title\"><h3>Join queue</h3><a href=\"screens/ios-detail.html\">Open</a></div>\n <iframe title=\"iOS queue detail screen\" src=\"/frames/iphone-15-pro.html?screen=screens/ios-detail.html\" width=\"390\" height=\"844\" loading=\"lazy\"></iframe>\n </article>\n <article class=\"frame-card\">\n <div class=\"frame-title\"><h3>Track token</h3><a href=\"screens/ios-ticket.html\">Open</a></div>\n <iframe title=\"iOS ticket screen\" src=\"/frames/iphone-15-pro.html?screen=screens/ios-ticket.html\" width=\"390\" height=\"844\" loading=\"lazy\"></iframe>\n </article>\n </div>\n </section>\n\n <section aria-labelledby=\"android-title\">\n <div class=\"section-heading\">\n <h2 id=\"android-title\">Android flow</h2>\n <p>The same product adapts to Android density, action placement, and Material-style controls.</p>\n </div>\n <div class=\"frame-grid\">\n <article class=\"frame-card android-frame\">\n <div class=\"frame-title\"><h3>Find queues</h3><a href=\"screens/android-discover.html\">Open</a></div>\n <iframe title=\"Android discovery screen\" src=\"/frames/android-pixel.html?screen=screens/android-discover.html\" width=\"412\" height=\"900\" loading=\"lazy\"></iframe>\n </article>\n <article class=\"frame-card android-frame\">\n <div class=\"frame-title\"><h3>Join queue</h3><a 
href=\"screens/android-detail.html\">Open</a></div>\n <iframe title=\"Android queue detail screen\" src=\"/frames/android-pixel.html?screen=screens/android-detail.html\" width=\"412\" height=\"900\" loading=\"lazy\"></iframe>\n </article>\n <article class=\"frame-card android-frame\">\n <div class=\"frame-title\"><h3>Track token</h3><a href=\"screens/android-ticket.html\">Open</a></div>\n <iframe title=\"Android ticket screen\" src=\"/frames/android-pixel.html?screen=screens/android-ticket.html\" width=\"412\" height=\"900\" loading=\"lazy\"></iframe>\n </article>\n </div>\n </section>\n </main>\n</body>\n</html>\n</artifact>"
},
{
"type": "usage",
"usage": {
"input_tokens": 0,
"output_tokens": 1982076,
"cached_read_tokens": 0
}
}
]
}