mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite (#1743 )

* feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite

Claude calls `AskUserQuestion` for mid-conversation clarifications when
the natural answer is one of a small finite set of choices. Until now
the tool round trip hit two dead ends in headless mode: claude-code -p
cannot prompt the user, so it auto-errored the tool and retried 4x;
the model then hedged by also writing the same options as a markdown
bulleted list. The host had no way to feed a real `tool_result` back.

This change makes the AskUserQuestion round trip work end to end:

* Switch Claude to `--input-format stream-json`. The daemon wraps the
  prompt as a JSONL `user` message on stdin and keeps stdin OPEN, so
  later writes (a `tool_result` for the open AskUserQuestion) feed
  back into the same child instead of needing a fresh spawn.
* New `RuntimeAdapter.promptInputFormat()` ('text' default,
  'stream-json' for Claude) so the spawn loop keeps the old close-on-
  prompt behavior for every other agent.
* New `POST /api/runs/:id/tool-result` daemon endpoint and
  `submitChatRunToolResult` web helper. Body carries `toolUseId` and
  `content`; daemon writes a JSONL `user` message with the matching
  `tool_result` content block.
* Track outstanding host answers on the run (`pendingHostAnswers`)
  and close stdin on either a `usage` event or a synthesized
  `turn_end` event (extracted from `assistant.message.stop_reason`
  in `claude-stream`). Without the per-turn `turn_end` signal stdin
  would never close after the follow up turn finished and the run
  would hang until the inactivity watchdog killed it.
* System prompt: tell Claude to use AskUserQuestion for follow ups
  with 2-4 finite choices, and to STOP after the tool call instead
  of writing a markdown duplicate.

Web UI:

* New `AskUserQuestionCard` renders the tool input as labelled chip
  buttons (single or multi select) with a Submit button styled like
  the composer's Send. On submit the answer routes through
  `submitChatRunToolResult` (live tool_result path) and falls back
  to `onSubmitForm` (plain user message) only if the run has already
  terminated. Selected chips persist across page reloads by re
  parsing the stored `tool_result.content`.
* Hide markdown text that follows an AskUserQuestion in the same
  turn — defense in depth against the model emitting the duplicate.
* Collapse identical `AskUserQuestion` / `TodoWrite` retries inside
  any tool group to a single card. TodoWrite is a snapshot tool,
  so older calls are duplicates of state.
* Pinned TodoCard above the chat composer. The latest TodoWrite
  snapshot across the conversation renders once, expandable /
  collapsible header, count shows in-progress + completed (1/4),
  Done button dismisses when all tasks finish, soft fade gradient
  above so scrolling chat text dissolves into the panel instead of
  hard clipping under the card.
* Composer gains a top shadow that only appears when the pinned
  todo slot sits directly above it (dark mode strengthened).
* Accordion expand / collapse motion shared between TodoCard, the
  ToolGroupCard disclosure, and BashCard output via
  `grid-template-rows: 0fr -> 1fr` with `cubic-bezier(0.23, 1, 0.32, 1)`
  and asymmetric durations (200ms enter, 140ms exit) per Emil
  Kowalski's animation framework.
* Jump-to-latest button no longer unmounts on hide; slides up with
  scale 0.9 -> 1 + fade on show, slides down with scale + fade on
  hide. Always horizontally centered via `margin: 0 auto`.

i18n:

* `tool.askQuestion`, `tool.askQuestionSubmit`, `tool.askQuestionPending`,
  `tool.askQuestionAnswered`, `tool.todosExpand`, `tool.todosCollapse`,
  `tool.todosDone`, `tool.todosDismiss` added to all 18 locales.

Unblocker:

* Fix a pre-existing render loop in `ProjectView` when the user
  clicks "New conversation". `handleNewConversation` now navigates
  to the fresh conversation id synchronously after
  `setActiveConversationId` so the route-sync effect at L512 and
  the URL-sync effect at L851 do not ping pong (route mismatch
  triggered repeated reverts; React's nested-update guard fired).

* fix(claude): order turn_end after content blocks + cover chat switching

Two follow-up fixes to the AskUserQuestion + new-conversation work:

* `claude-stream.ts` emitted `turn_end` BEFORE iterating the assistant
  message's content blocks. When claude-code lacks
  `--include-partial-messages` (older builds), tool_use events surface
  only from that loop, so the daemon's stdin-close handler saw an
  empty `pendingHostAnswers` set and closed stdin before the
  AskUserQuestion tool_use was even registered. The result: the model
  retried, hit the same race, and gave up writing the questions in
  prose. Emit `turn_end` AFTER the content loop so tool_use ids land
  in `pendingHostAnswers` first.

* `server.ts` now ignores `turn_end` events with
  `stop_reason: 'tool_use'`. That stop reason means the model paused
  to wait for a tool execution (claude-code's internal tool runner
  for Bash / Edit / Read, or a host-answered tool like
  AskUserQuestion). Either way the conversation is still in flight —
  closing stdin there would kill the follow-up response. Only the
  natural turn-end stop reasons (`end_turn`, etc.) close stdin.

* `ProjectView.handleSelectConversation` now navigates to the picked
  conversation id synchronously, mirroring the fix already in
  handleNewConversation. The route-sync effect at L512 was reverting
  the active conversation on every switch, ping-ponging with the
  URL-sync effect at L851 until React's nested-update guard fired
  with "Maximum update depth exceeded". Same bug class as the
  pre-existing new-conversation render loop.

* docs(agents): capture AskUserQuestion runtime + chat UI conventions

Record the patterns this PR introduces so future contributors can find
them without spelunking server.ts:

* Agent runtime conventions — `RuntimeAgentDef.promptInputFormat`,
  `run.pendingHostAnswers` / `run.stdinOpen` lifecycle, `turn_end`
  ordering rule, `POST /api/runs/:id/tool-result` endpoint shape, the
  Claude only system prompt block that nudges AskUserQuestion, and the
  `suppressAskUserQuestionFallbackText` defense in depth.
* Chat UI conventions — URL-load vs srcDoc render mode dispatch with
  bridge disqualifiers, the dual iframe visibility swap pattern,
  `isOurIframe` plus the active-iframe re-check for signals that must
  only come from the visible iframe, pinned TodoCard via
  `PinnedTodoSlot`, count includes `in_progress`,
  `dedupeSnapshotToolRetries` for AskUserQuestion / TodoWrite stacks.
* i18n keys — 18 locale files, add the key to `types.ts` first.
* UI animation philosophy — `cubic-bezier(0.23, 1, 0.32, 1)` ease out,
  asymmetric 200/140ms enter/exit, accordion via `grid-template-rows`,
  no `transform: scale(0)`, keep mounted + toggle class for exit
  transitions instead of relying on React unmount.

* fix(claude): read promptInputFormat as field, close stdin on deferred answer

Two PR review follow-ups on the AskUserQuestion stream-json wiring.

* server.ts:4616 referenced `runtimeAdapter.promptInputFormat()` — but
  `runtimeAdapter` is not declared, imported, or assigned anywhere. The
  prior adapter abstraction was deleted in #1656; when the changes
  were folded back into the inline handler the format was moved onto
  `RuntimeAgentDef.promptInputFormat`, but this call site was missed.
  `server.ts` starts with `// @ts-nocheck` so typecheck never caught
  it — every chat run hit `ReferenceError: runtimeAdapter is not
  defined` the moment we wrote the prompt to a stdin-fed child, which
  is every agent with `promptViaStdin: true` (claude, codex, copilot,
  cursor-agent, gemini, opencode, pi, qoder). Read the format off the
  in-scope `def` and default missing values to `'text'`.

* `submitToolResultToRun` cleared the answered id from
  `pendingHostAnswers` but never closed stdin if a `turn_end` /
  `usage` event had already fired with the set non-empty (deferred
  by the event handler). The child then waited indefinitely for
  further input until the inactivity watchdog killed it, losing the
  model's follow-up response. Close stdin on the last-answer
  transition when stream-json stdin is still open.

Test: pin `promptInputFormat` for every `promptViaStdin: true` agent
so future regressions of the field-vs-method contract fail at
typecheck-adjacent test time instead of in production. The new test
asserts `typeof def.promptInputFormat` is a string (or undefined),
not a function — exactly the shape mistake the original line made.

* fix(web): keep AskUserQuestion multi-select chips selected after reload when labels contain commas

`handleSubmit` joined multi-select answers with `', '` while the
reload parser split them on `','`. The pair is asymmetric: a valid
model-generated option like `"Yes, including images"` round-tripped
as `["Yes", "including images"]`, so after a page reload the locked
question card showed the user's pick as unselected — even though the
`tool_result` content the daemon actually wrote into the run was
correct, and the model saw the right answer. Bounded to post-reload
visual state, but silently confusing.

Switch to a `- ` bullet list per option, one per line, with the
parser stripping the leading `- ` back off. Newlines never appear
inside a label so the round trip is exact. The outer pairs separator
stays `\n\n` because individual answer bodies still never contain
that double-newline.

* chore: drop accidental personal design-system file

`design-systems/foldar/DESIGN.md` was added to the AskUserQuestion
branch in 31ac531 by mistake — it's a personal brand spec that does
not belong in the upstream design-systems catalogue. Removing it
keeps the branch's surface area scoped to the feature.

2026-05-15 15:50:27 +08:00

23 KiB

Raw Blame History

Directory guide

This file is the single source of truth for agents entering this repository. Read this file first; after entering apps/, packages/, tools/, or e2e/, read that layer's AGENTS.md for module-level details. Do not copy module details back into the root file; root stays focused on cross-repository boundaries, workflow, and commands.

Core documentation index

Product and onboarding: README.md, README.zh-CN.md, QUICKSTART.md.
Contribution and environment: CONTRIBUTING.md, CONTRIBUTING.zh-CN.md.
Architecture and protocols: docs/spec.md, docs/architecture.md, docs/skills-protocol.md, docs/agent-adapters.md, docs/modes.md.
Roadmap and references: docs/roadmap.md, docs/references.md, docs/code-review-guidelines.md, specs/current/maintainability-roadmap.md.
Directory-level agent guidance: apps/AGENTS.md, packages/AGENTS.md, tools/AGENTS.md, e2e/AGENTS.md.

Workspace directories

Workspace packages come from pnpm-workspace.yaml: apps/*, packages/*, tools/*, and e2e.
Top-level content directories: skills/ (functional skills the agent invokes mid-task — utilities, briefs, packagers; see skills/AGENTS.md), design-templates/ (rendering catalogue: decks, prototypes, image/video/audio templates; see design-templates/AGENTS.md and specs/current/skills-and-design-templates.md), design-systems/ (brand DESIGN.md files), craft/ (universal brand-agnostic craft rules a skill can opt into via od.craft.requires).
apps/web is the Next.js 16 App Router + React 18 web runtime; do not restore apps/nextjs.
apps/daemon is the local privileged daemon and od bin. It owns /api/*, agent spawning, skills, design systems, artifacts, and static serving.
apps/desktop is the Electron shell; it discovers the web URL through sidecar IPC.
apps/packaged is the thin packaged Electron runtime entry; it starts packaged sidecars and owns the od:// entry glue only.
packages/contracts is the pure TypeScript web/daemon app contract layer.
packages/sidecar-proto owns the Open Design sidecar business protocol; packages/sidecar owns the generic sidecar runtime; packages/platform owns generic OS process primitives.
tools/dev is the local development lifecycle control plane.
tools/pack is the local packaged build/start/stop/logs control plane and mac beta release artifact preparation surface.
tools/pr is the maintainer PR-duty control plane: a thin gh wrapper that encodes this repo's review-lane derivation, forbidden-surface flags, lane checklists, and validation-command suggestions.
e2e owns user-level end-to-end smoke tests and Playwright UI automation; read e2e/AGENTS.md before editing its tests or commands.

Inactive or placeholder directories

apps/nextjs and packages/shared have been removed; do not recreate or reference them.
.od/, .tmp/, Playwright reports, and agent scratch directories are local runtime data and must stay out of git.

Development workflow

Environment baseline

Runtime target is Node ~24 and pnpm@10.33.2; use Corepack so the pnpm version pinned in package.json is selected.
New project-owned entrypoints, modules, scripts, tests, reporters, and configs should default to TypeScript.
Residual JavaScript is limited to generated output, vendored dependencies, explicitly documented compatibility build artifacts, and the allowlist in scripts/guard.ts.

Local lifecycle

Use pnpm tools-dev as the only local development lifecycle entry point.
Do not add or restore root lifecycle aliases: pnpm dev, pnpm dev:all, pnpm daemon, pnpm preview, or pnpm start.
Ports are governed by tools-dev flags: --daemon-port and --web-port.
tools-dev exports OD_PORT for the web proxy target and OD_WEB_PORT for the web listener; do not use NEXT_PORT.

Root command boundary

Keep root scripts reserved for true repo-level checks and tools control-plane entrypoints: pnpm guard, pnpm typecheck, pnpm tools-dev, pnpm tools-pack, and pnpm tools-pr.
Do not add root aggregate pnpm build or pnpm test aliases. Build/test commands must stay package-scoped (pnpm --filter <package> ...) or tool-scoped (pnpm tools-pack ... / pnpm tools-pr ...).
Do not add root e2e aliases; e2e package commands and ownership rules live in e2e/AGENTS.md.

Boundary constraints

Tests under apps/, packages/, and tools/ live in a package/app/tool-level tests/ directory sibling to src/; keep src/ source-only and do not add new *.test.ts or *.test.tsx files under src/. Playwright UI automation belongs to e2e/ui/, not app packages.
App packages must not import another app's private src/ or tests/ implementation as a shared helper. In particular, apps/web/** must not import apps/daemon/src/**; web/daemon integration belongs behind HTTP APIs, packages/contracts, and app-local provider boundaries.
Cross-app, cross-runtime, or repository-resource consistency checks belong in e2e/tests/ when they need to observe more than one app/package boundary; promote reusable logic to a pure package instead of borrowing another app's private source.
Keep shared API DTOs, SSE event unions, error shapes, task shapes, and example payloads in packages/contracts; update contracts before wiring divergent web/daemon request or response shapes.
Keep packages/contracts pure TypeScript and free of Next.js, Express, Node filesystem/process APIs, browser APIs, SQLite, daemon internals, and sidecar control-plane dependencies.
Keep project-owned entrypoints, modules, scripts, tests, reporters, and configs TypeScript-first; generated dist/*.js is runtime output, and source edits belong in .ts files.
New .js, .mjs, or .cjs files need an explicit generated/vendor/compatibility reason and must pass pnpm guard.
App business logic must not know about sidecar/control-plane concepts. Keep sidecar awareness in apps/<app>/sidecar or the desktop sidecar entry wrapper.
Shared web/daemon app contracts belong in packages/contracts; that package must not depend on Next.js, Express, Node filesystem/process APIs, browser APIs, SQLite, daemon internals, or the sidecar control-plane protocol.
Sidecar process stamps must have exactly five fields: app, mode, namespace, ipc, and source.
Orchestration layers (tools-dev, tools-pack, packaged launchers) must call package primitives; do not hand-build --od-stamp-* args or process-scan regexes.
Packaged runtime paths must be namespace-scoped and independent from daemon/web ports; ports are transient transport details only.
Default runtime files live under <project-root>/.tmp/<source>/<namespace>/...; POSIX IPC sockets are fixed at /tmp/open-design/ipc/<namespace>/<app>.sock.

Git commit policy

Git commits must not include Co-authored-by trailers or any other co-author metadata.

Pull request expectations

Opening a PR uses .github/pull_request_template.md; fill every section, not just the title.
"Why" must answer both the author's use case (what made you write this PR) and the pain being addressed (user problem, technical debt, prod issue, or unblocker), not just a one-line restatement of the title.
"What users will see" describes the change from a user's perspective — what they click, what new thing appears, what default behavior changed — not from a code perspective.
The Surface area checklist must reflect actual surfaces touched; check every box that applies, including extension points (skills/, design-systems/, design-templates/, craft/), CLI flags, env vars, i18n keys, and new root package.json dependencies.
If any UI surface is checked, attach screenshots showing the entry point — where users discover the change — not just the feature in isolation; before/after is best for behavior changes.
For bug-fix PRs, link the red-spec test that reproduces the bug and confirm it went red on main and green on the branch, per the Bug follow-up workflow section above.
CONTRIBUTING.md covers PR scope, title format, dependency policy, and the issue-first rule for non-trivial features; docs/code-review-guidelines.md is the reviewer-facing complement.

Code review guide

Use docs/code-review-guidelines.md as the repository-wide review standard. That document is the operational guide; this AGENTS.md is the source of truth when the two disagree.
Walk reviews top-down through docs/code-review-guidelines.md: Product relevance test → forbidden surfaces → ownership/scope → matching lane → checklist → comments → approval bar.
Pick the matching review lane: default code/tests, contract and protocol changes, design-system additions, skill additions, or craft additions.
Before reviewing changes under apps/, packages/, tools/, or e2e/, read that directory's AGENTS.md and apply its local boundaries.
Blocking review feedback should focus on correctness, security/secrets, data integrity, repository boundary violations, contract/migration breakage, missing required validation, or high-risk maintainability issues.
Only maintainers may close a PR instead of requesting changes, and only when the change is not salvageable on the existing branch (wrong target product, foreign test harness, DOM/API assumptions absent from this repo, or scripts that conflict with lifecycle rules).

PR-duty tooling

pnpm tools-pr is the maintainer-only control plane for PR-duty work on this repo. It is a thin gh wrapper that encodes repo-specific knowledge — review-lane derivation, forbidden-surface flags, per-lane checklists, validation-command suggestions, and a fixed dictionary of factual classify tags (bot-only-approval, needs-rebase, stale-approval, unresolved-changes-requested, awaiting-* timing, org-member, etc.). The tool is read-only on the PR surface: it never approves, merges, comments, or closes; those side effects stay in explicit gh invocations the maintainer runs.

Common subcommands:

pnpm tools-pr list — triage the open queue by lane and review-state bucket.
pnpm tools-pr view <num> — factual review brief for a single PR.
pnpm tools-pr classify --all — script-level tag JSON for the whole open queue (entry point for cron / digest consumers); per-PR classify <num> for spot checks.
pnpm tools-pr assignment — assigner-perspective ownership + idle-time / blocker view across the queue.

For the full tag dictionary, operational playbook (direct merge / duplicate-title / awaiting-author / org-member / agent-review flows), comment templates, language-detection rules, and tool-design constraints (precision boundaries, factual-output rule, retry + pagination strategy), see tools/pr/AGENTS.md.

Agent runtime conventions

RuntimeAgentDef.promptInputFormat selects how the daemon writes the prompt to a child's stdin. The default 'text' writes the composed prompt and ends stdin immediately. 'stream-json' wraps the prompt as one JSONL user message and KEEPS stdin open so the daemon can stream further user messages back in mid-turn. Claude (apps/daemon/src/runtimes/defs/claude.ts) ships 'stream-json' together with --input-format stream-json so the host can answer interactive tools like AskUserQuestion with a real tool_result block. Every other agent stays on 'text'.
apps/daemon/src/server.ts tracks run.pendingHostAnswers (a Set of tool_use_id strings) and run.stdinOpen on the run object. The claude-stream-json event handler adds AskUserQuestion ids to the set and closes stdin only when both the set is empty AND a turn_end (or usage) event arrives with a non tool_use stop_reason. The tool_use stop reason means the model paused mid tool (waiting on claude-code's internal runner or on a host answer); closing stdin there would truncate the follow up response.
claude-stream.ts emits the turn_end event AFTER iterating the assistant message's content blocks, not before. When --include-partial-messages is unsupported, tool_use events surface only from the assistant wrapper, so emitting turn_end first would let the daemon close stdin before the host had registered any pending answers.
POST /api/runs/:id/tool-result is the daemon endpoint for feeding a tool_result block back into a still running stream-json child. Body shape: { toolUseId: string, content: string, isError?: boolean }. Web callers use submitChatRunToolResult from apps/web/src/providers/daemon.ts. The daemon writes a JSONL user message containing one tool_result content block, removes the id from pendingHostAnswers, and lets the next turn_end decide when to close stdin.
AskUserQuestion specifically: Claude's system prompt section in apps/daemon/src/prompts/system.ts (Claude only block at the bottom of composeSystemPrompt) tells the model to use the tool for 2 to 4 finite choices, and to stop generating tokens after the tool call instead of also writing a markdown duplicate. AssistantMessage.suppressAskUserQuestionFallbackText is the belt and suspenders that hides any trailing markdown text in the same turn.

Chat UI conventions

apps/web/src/components/file-viewer-render-mode.ts decides URL-load vs srcDoc for HTML previews. Bridges (deck, comment/inspect selection, palette, edit, tweaks) can ONLY inject through the srcDoc path. Add a new disqualifier to UrlLoadDecision whenever a feature needs a srcDoc-only bridge; pass it from FileViewer.tsx based on a source-content heuristic where appropriate (e.g. hasTweaksTemplate). The host keeps both iframes mounted simultaneously and swaps CSS visibility so toggling render mode does not cause an iframe reload flash; iframeRef.current stays aligned with the active iframe via useEffect. Receive filters use isOurIframe(ev.source) to accept messages from either iframe but signals that should ONLY come from the active iframe (e.g. od:tweaks-available) re-check ev.source === iframeRef.current?.contentWindow.
TodoWrite UI pins one canonical task list above the chat composer via PinnedTodoSlot in ChatPane.tsx. The slot reads the latest TodoWrite snapshot across the conversation through latestTodoWriteInputFromMessages (apps/web/src/runtime/todos.ts). AssistantMessage.stripTodoToolGroups removes any TodoWrite tool groups from per message rendering so there is exactly one TodoCard on screen. The progress count includes both completed and in_progress items (1/4 reads "one underway" not "zero finished"). Dismissal via the Done button is keyed on the snapshot's JSON, so a fresh TodoWrite from the agent automatically re shows the card.
AskUserQuestionCard (in ToolCard.tsx) prefers the live onAnswerToolUse(toolUseId, content) route (POSTs to /api/runs/:id/tool-result) and falls back to the legacy onSubmitForm(text) path when the run has already terminated. Selected chips persist across reloads by parsing the stored tool_result.content back into the selections shape.
Tool group rendering uses dedupeSnapshotToolRetries to collapse identical AskUserQuestion retries (one card per unique input, keeping the latest tool_use_id) and TodoWrite snapshots (only the most recent call, since each call is a state replace).

i18n keys

apps/web/src/i18n/types.ts is the typed Dict; every key must be defined in all 18 locale files under apps/web/src/i18n/locales/*.ts (ar, de, en, es-ES, fa, fr, hu, id, ja, ko, pl, pt-BR, ru, th, tr, uk, zh-CN, zh-TW). Add the key to types.ts first; missing translations produce a typecheck error.

UI animation philosophy

Default ease-out for UI transitions: cubic-bezier(0.23, 1, 0.32, 1). Built-in ease is too weak; ease-in is forbidden for UI elements because it feels sluggish.
Asymmetric durations: enter around 200ms, exit around 140ms. Exit reads as decisive because the user has already chosen to dismiss.
Accordion expand and collapse uses grid-template-rows: 0fr -> 1fr (modern auto height pattern). Pair with opacity fade and the easing above. The shared .accordion-collapsible + .accordion-collapsible-inner class pair (defined in apps/web/src/index.css) is the canonical implementation; reuse it for new disclosure UI.
Never animate from transform: scale(0). Start from scale(0.9) or higher with opacity: 0.
For elements that show conditionally, keep them mounted and toggle a CSS class (e.g. .chat-jump-btn-active). React unmounts skip the exit transition entirely.

Validation strategy

After package, workspace, or command-entry changes, run pnpm install so workspace links and generated dist entries stay fresh.
Before marking regular work ready, run at least pnpm guard and pnpm typecheck, plus the package-scoped tests/builds that match the files changed. Do not use or add root pnpm test/pnpm build aliases.
For local web runtime loops, prefer pnpm tools-dev run web --daemon-port <port> --web-port <port>.
On a GUI-capable machine, validate desktop by running pnpm tools-dev, then pnpm tools-dev inspect desktop status.
Stamp/namespace changes must validate two concurrent namespaces and run desktop inspect eval plus inspect screenshot for each namespace.
Path/log changes must run pnpm tools-dev logs --namespace <name> --json and confirm log paths are under .tmp/tools-dev/<namespace>/....

Bug follow-up workflow

The following is a working playbook for routine bug follow-ups, distilled from recent practice. Treat it as a default action shape, not a contract — production reality always has edges these bullets can't anticipate, so use judgment when the situation doesn't fit cleanly.

Lead with a red spec. Default to encoding the bug as a falsifiable test that goes red before any source change, so the fix is anchored in observable behavior rather than source-code intuition. If a red spec can't be written cheaply, that's usually a signal to clarify scope rather than push forward on a guess.
Try the cheapest layer first. Reach for the lightest test layer that can still see the symptom (e2e Vitest at the daemon HTTP boundary → app-local Vitest → Playwright UI → platform-native harnesses), and drop down only when the cheaper layer can't.
Hold the spec's scope. Defects discovered outside the bug's described boundary belong in a follow-up — their own red spec, their own PR — not in this fix. List them in the PR body's "Adjacent issues" section with the rationale and move on.
Let the fix read as an invariant. Prefer a named helper whose docblock describes what must hold over a bolt-on if guard with apologetic history-comments. The call site should read as intent.
Diff against the baseline. When neighboring suites have pre-existing failures, stash or check out upstream before claiming "no new failures."
Link the issue from the PR body. Use Fixes #N / Closes #N / Resolves #N so the issue auto-closes on merge and the release-time reverse lookup (gh issue view N --json closedByPullRequestsReferences → git tag --contains <merge sha>) actually has a chain to follow. The repo's PR template prompts for this; deleting the prompt is fine when the PR genuinely closes nothing.
Stage human verification for visible bugs. When the symptom needs an eye to confirm — UI, platform-native behavior, animations, race conditions a unit test can't see — green specs alone aren't acceptance. Stand up a buggy-vs-fix comparison the reviewer can drive themselves (typical shape: two namespaced runtimes, one on main, one on the fix branch), and seed any required data only through production HTTP APIs; source-level test backdoors invalidate the verification because they prove a fake flow rather than the real one.

For a worked example of one full loop (red e2e spec → fix → green), see e2e/tests/dialog/stop-reconciles-message.test.ts (issue #135).

Common commands

pnpm install
pnpm tools-dev
pnpm tools-dev start web
pnpm tools-dev run web --daemon-port 17456 --web-port 17573
pnpm tools-dev status --json
pnpm tools-dev logs --json
pnpm tools-dev inspect desktop status --json
pnpm tools-dev inspect desktop screenshot --path /tmp/open-design.png
pnpm tools-dev stop
pnpm tools-dev check

pnpm guard
pnpm typecheck

pnpm tools-pr list
pnpm tools-pr list --bucket=merge-ready,approved-blocked
pnpm tools-pr list --lane=skill,contract --json
pnpm tools-pr view 1180
pnpm tools-pr view 1180 --json

pnpm --filter @open-design/web typecheck
pnpm --filter @open-design/web test
pnpm --filter @open-design/web build
pnpm --filter @open-design/daemon test
pnpm --filter @open-design/daemon build
pnpm --filter @open-design/desktop build
pnpm --filter @open-design/tools-dev build
pnpm --filter @open-design/tools-pack build
pnpm --filter @open-design/tools-pr build

pnpm tools-pack mac build --to all
pnpm tools-pack mac install
pnpm tools-pack mac cleanup
pnpm tools-pack win build --to nsis
pnpm tools-pack win install
pnpm tools-pack win cleanup
pnpm tools-pack linux build --to appimage
pnpm tools-pack linux install
pnpm tools-pack linux build --containerized

FAQ

Why is there no root `pnpm dev` / `pnpm start`?

To avoid starting daemon, web, and desktop through inconsistent env, port, namespace, or log paths. All local lifecycle flows must go through pnpm tools-dev.

Why should `apps/nextjs` not be restored?

The current web runtime is apps/web. The historical apps/nextjs layout has been removed from the active repo shape; restoring it would reintroduce duplicate app boundaries and stale scripts.

How does desktop discover the web URL?

Desktop queries runtime status through sidecar IPC. The web URL comes from tools-dev launch status, not from desktop guessing ports or reading web internals.

How are sidecar-proto, sidecar, and platform split?

@open-design/sidecar-proto owns Open Design app/mode/source constants, namespace validation, stamp fields/flags, IPC message schema, status shapes, and error semantics. @open-design/sidecar provides only generic bootstrap, IPC transport, path/runtime resolution, launch env, and JSON runtime files. @open-design/platform provides only generic OS process stamp serialization, command parsing, and process matching/search primitives, consuming the proto descriptor.

Where is data written?

The daemon writes .od/ by default: SQLite at .od/app.sqlite, agent CWDs under .od/projects/<id>/, saved renders under .od/artifacts/, and credentials at .od/media-config.json. Two env vars override the storage root, in order:

OD_DATA_DIR=<dir> — relocates all daemon runtime data to <dir> (used by Playwright for test isolation, and by the packaged daemon and the Home Manager / NixOS modules to point the daemon at a writable directory when the install root is read-only). The path is resolved with ~/ expansion and relative paths anchored to <projectRoot>.
OD_MEDIA_CONFIG_DIR=<dir> — narrower override that relocates only media-config.json. Same resolution semantics. Most installs do not need this; it exists for setups that want to keep API credentials in a different location from the rest of the runtime data.

Default precedence is OD_MEDIA_CONFIG_DIR > OD_DATA_DIR > <projectRoot>/.od.

When is `pnpm install` required?

Run pnpm install after changing package manifests, workspace layout, command entrypoints, bin/link-related content, or after adding/removing workspace packages.

23 KiB Raw Blame History