open-design/apps/daemon/src/claude-stream.ts
Chris Seifert 9cf265e520
feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite (#1743)
* feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite

Claude calls `AskUserQuestion` for mid-conversation clarifications when
the natural answer is one of a small finite set of choices. Until now
the tool round trip hit two dead ends in headless mode: claude-code -p
cannot prompt the user, so it auto-errored the tool and retried 4x;
the model then hedged by also writing the same options as a markdown
bulleted list. The host had no way to feed a real `tool_result` back.

This change makes the AskUserQuestion round trip work end to end:

* Switch Claude to `--input-format stream-json`. The daemon wraps the
  prompt as a JSONL `user` message on stdin and keeps stdin OPEN, so
  later writes (a `tool_result` for the open AskUserQuestion) feed
  back into the same child instead of needing a fresh spawn.
* New `RuntimeAdapter.promptInputFormat()` ('text' default,
  'stream-json' for Claude) so the spawn loop keeps the old close-on-
  prompt behavior for every other agent.
* New `POST /api/runs/:id/tool-result` daemon endpoint and
  `submitChatRunToolResult` web helper. Body carries `toolUseId` and
  `content`; daemon writes a JSONL `user` message with the matching
  `tool_result` content block.
* Track outstanding host answers on the run (`pendingHostAnswers`)
  and close stdin on either a `usage` event or a synthesized
  `turn_end` event (extracted from `assistant.message.stop_reason`
  in `claude-stream`). Without the per-turn `turn_end` signal stdin
  would never close after the follow up turn finished and the run
  would hang until the inactivity watchdog killed it.
* System prompt: tell Claude to use AskUserQuestion for follow ups
  with 2-4 finite choices, and to STOP after the tool call instead
  of writing a markdown duplicate.

Web UI:

* New `AskUserQuestionCard` renders the tool input as labelled chip
  buttons (single or multi select) with a Submit button styled like
  the composer's Send. On submit the answer routes through
  `submitChatRunToolResult` (live tool_result path) and falls back
  to `onSubmitForm` (plain user message) only if the run has already
  terminated. Selected chips persist across page reloads by re
  parsing the stored `tool_result.content`.
* Hide markdown text that follows an AskUserQuestion in the same
  turn — defense in depth against the model emitting the duplicate.
* Collapse identical `AskUserQuestion` / `TodoWrite` retries inside
  any tool group to a single card. TodoWrite is a snapshot tool,
  so older calls are duplicates of state.
* Pinned TodoCard above the chat composer. The latest TodoWrite
  snapshot across the conversation renders once, expandable /
  collapsible header, count shows in-progress + completed (1/4),
  Done button dismisses when all tasks finish, soft fade gradient
  above so scrolling chat text dissolves into the panel instead of
  hard clipping under the card.
* Composer gains a top shadow that only appears when the pinned
  todo slot sits directly above it (dark mode strengthened).
* Accordion expand / collapse motion shared between TodoCard, the
  ToolGroupCard disclosure, and BashCard output via
  `grid-template-rows: 0fr -> 1fr` with `cubic-bezier(0.23, 1, 0.32, 1)`
  and asymmetric durations (200ms enter, 140ms exit) per Emil
  Kowalski's animation framework.
* Jump-to-latest button no longer unmounts on hide; slides up with
  scale 0.9 -> 1 + fade on show, slides down with scale + fade on
  hide. Always horizontally centered via `margin: 0 auto`.

i18n:

* `tool.askQuestion`, `tool.askQuestionSubmit`, `tool.askQuestionPending`,
  `tool.askQuestionAnswered`, `tool.todosExpand`, `tool.todosCollapse`,
  `tool.todosDone`, `tool.todosDismiss` added to all 18 locales.

Unblocker:

* Fix a pre-existing render loop in `ProjectView` when the user
  clicks "New conversation". `handleNewConversation` now navigates
  to the fresh conversation id synchronously after
  `setActiveConversationId` so the route-sync effect at L512 and
  the URL-sync effect at L851 do not ping pong (route mismatch
  triggered repeated reverts; React's nested-update guard fired).

* fix(claude): order turn_end after content blocks + cover chat switching

Two follow-up fixes to the AskUserQuestion + new-conversation work:

* `claude-stream.ts` emitted `turn_end` BEFORE iterating the assistant
  message's content blocks. When claude-code lacks
  `--include-partial-messages` (older builds), tool_use events surface
  only from that loop, so the daemon's stdin-close handler saw an
  empty `pendingHostAnswers` set and closed stdin before the
  AskUserQuestion tool_use was even registered. The result: the model
  retried, hit the same race, and gave up writing the questions in
  prose. Emit `turn_end` AFTER the content loop so tool_use ids land
  in `pendingHostAnswers` first.

* `server.ts` now ignores `turn_end` events with
  `stop_reason: 'tool_use'`. That stop reason means the model paused
  to wait for a tool execution (claude-code's internal tool runner
  for Bash / Edit / Read, or a host-answered tool like
  AskUserQuestion). Either way the conversation is still in flight —
  closing stdin there would kill the follow-up response. Only the
  natural turn-end stop reasons (`end_turn`, etc.) close stdin.

* `ProjectView.handleSelectConversation` now navigates to the picked
  conversation id synchronously, mirroring the fix already in
  handleNewConversation. The route-sync effect at L512 was reverting
  the active conversation on every switch, ping-ponging with the
  URL-sync effect at L851 until React's nested-update guard fired
  with "Maximum update depth exceeded". Same bug class as the
  pre-existing new-conversation render loop.

* docs(agents): capture AskUserQuestion runtime + chat UI conventions

Record the patterns this PR introduces so future contributors can find
them without spelunking server.ts:

* Agent runtime conventions — `RuntimeAgentDef.promptInputFormat`,
  `run.pendingHostAnswers` / `run.stdinOpen` lifecycle, `turn_end`
  ordering rule, `POST /api/runs/:id/tool-result` endpoint shape, the
  Claude only system prompt block that nudges AskUserQuestion, and the
  `suppressAskUserQuestionFallbackText` defense in depth.
* Chat UI conventions — URL-load vs srcDoc render mode dispatch with
  bridge disqualifiers, the dual iframe visibility swap pattern,
  `isOurIframe` plus the active-iframe re-check for signals that must
  only come from the visible iframe, pinned TodoCard via
  `PinnedTodoSlot`, count includes `in_progress`,
  `dedupeSnapshotToolRetries` for AskUserQuestion / TodoWrite stacks.
* i18n keys — 18 locale files, add the key to `types.ts` first.
* UI animation philosophy — `cubic-bezier(0.23, 1, 0.32, 1)` ease out,
  asymmetric 200/140ms enter/exit, accordion via `grid-template-rows`,
  no `transform: scale(0)`, keep mounted + toggle class for exit
  transitions instead of relying on React unmount.

* fix(claude): read promptInputFormat as field, close stdin on deferred answer

Two PR review follow-ups on the AskUserQuestion stream-json wiring.

* server.ts:4616 referenced `runtimeAdapter.promptInputFormat()` — but
  `runtimeAdapter` is not declared, imported, or assigned anywhere. The
  prior adapter abstraction was deleted in #1656; when the changes
  were folded back into the inline handler the format was moved onto
  `RuntimeAgentDef.promptInputFormat`, but this call site was missed.
  `server.ts` starts with `// @ts-nocheck` so typecheck never caught
  it — every chat run hit `ReferenceError: runtimeAdapter is not
  defined` the moment we wrote the prompt to a stdin-fed child, which
  is every agent with `promptViaStdin: true` (claude, codex, copilot,
  cursor-agent, gemini, opencode, pi, qoder). Read the format off the
  in-scope `def` and default missing values to `'text'`.

* `submitToolResultToRun` cleared the answered id from
  `pendingHostAnswers` but never closed stdin if a `turn_end` /
  `usage` event had already fired with the set non-empty (deferred
  by the event handler). The child then waited indefinitely for
  further input until the inactivity watchdog killed it, losing the
  model's follow-up response. Close stdin on the last-answer
  transition when stream-json stdin is still open.

Test: pin `promptInputFormat` for every `promptViaStdin: true` agent
so future regressions of the field-vs-method contract fail at
typecheck-adjacent test time instead of in production. The new test
asserts `typeof def.promptInputFormat` is a string (or undefined),
not a function — exactly the shape mistake the original line made.

* fix(web): keep AskUserQuestion multi-select chips selected after reload when labels contain commas

`handleSubmit` joined multi-select answers with `', '` while the
reload parser split them on `','`. The pair is asymmetric: a valid
model-generated option like `"Yes, including images"` round-tripped
as `["Yes", "including images"]`, so after a page reload the locked
question card showed the user's pick as unselected — even though the
`tool_result` content the daemon actually wrote into the run was
correct, and the model saw the right answer. Bounded to post-reload
visual state, but silently confusing.

Switch to a `- ` bullet list per option, one per line, with the
parser stripping the leading `- ` back off. Newlines never appear
inside a label so the round trip is exact. The outer pairs separator
stays `\n\n` because individual answer bodies still never contain
that double-newline.

* chore: drop accidental personal design-system file

`design-systems/foldar/DESIGN.md` was added to the AskUserQuestion
branch in 31ac531 by mistake — it's a personal brand spec that does
not belong in the upstream design-systems catalogue. Removing it
keeps the branch's surface area scoped to the feature.
2026-05-15 15:50:27 +08:00

269 lines
9.8 KiB
TypeScript

/**
* Parses Claude Code's `--output-format stream-json --verbose` JSONL stream
* (with or without `--include-partial-messages`) into a small set of
* UI-friendly events. With partial messages on, text arrives as
* `stream_event` deltas; without it (older builds <1.0.86, or any build
* where the flag isn't passed) text arrives only in the final `assistant`
* wrapper. We handle both. The UI only needs to know five things:
*
* - status : high-level lifecycle ("initializing", "requesting",
* "thinking")
* - text_delta : assistant text chunk (gets fed to the artifact parser)
* - thinking_delta: extended-thinking chunk (shown in a collapsed block)
* - tool_use : { id, name, input } (fires when input is complete)
* - tool_result : { tool_use_id, content, is_error }
* - usage : aggregated input/output/cache tokens + cost
*
* Callers give us `onEvent({ type, ...payload })`. We track per-content-block
* state to accumulate partial tool_use input JSON and emit a single
* `tool_use` event when that block stops.
*/
type StreamEvent = Record<string, unknown>;
type EventSink = (event: StreamEvent) => void;
type BlockState = { type?: unknown; name?: unknown; id?: unknown; input: string };
function isRecord(value: unknown): value is Record<string, unknown> {
return Boolean(value) && typeof value === 'object' && !Array.isArray(value);
}
export function createClaudeStreamHandler(onEvent: EventSink) {
let buffer = '';
// Per-content-block scratch, keyed by `${messageId}:${blockIndex}`.
const blocks = new Map<string, BlockState>();
// Tool uses already emitted from streamed `input_json_delta` data.
// Claude Code still repeats them in the final assistant wrapper, often with
// empty `{}` inputs, so we suppress that duplicate emission.
const streamedToolUseIds = new Set<string>();
// Most recent assistant message id so content_block_* events without an id
// can be attributed correctly.
let currentMessageId: string | null = null;
// Message ids that already streamed text via `stream_event` deltas.
// When `--include-partial-messages` is OFF (older Claude Code, e.g. 1.0.84
// pre-flag), no deltas arrive — only the final `assistant` wrapper carries
// text. The fallback below emits that text once, but we must skip it for
// newer builds that already streamed deltas, otherwise the message would
// duplicate.
const textStreamed = new Set<string>();
function blockKey(index: unknown): string {
return `${currentMessageId ?? 'anon'}:${index}`;
}
function feed(chunk: string) {
buffer += chunk;
let nl;
while ((nl = buffer.indexOf('\n')) !== -1) {
const line = buffer.slice(0, nl).trim();
buffer = buffer.slice(nl + 1);
if (!line) continue;
let obj;
try {
obj = JSON.parse(line);
} catch {
onEvent({ type: 'raw', line });
continue;
}
handleObject(obj);
}
}
function flush() {
const rem = buffer.trim();
buffer = '';
if (!rem) return;
try {
handleObject(JSON.parse(rem));
} catch {
onEvent({ type: 'raw', line: rem });
}
}
function handleObject(obj: unknown) {
if (!isRecord(obj)) return;
if (obj.type === 'system' && obj.subtype === 'init') {
onEvent({
type: 'status',
label: 'initializing',
model: obj.model ?? null,
sessionId: obj.session_id ?? null,
});
return;
}
if (obj.type === 'system' && obj.subtype === 'status') {
onEvent({ type: 'status', label: obj.status ?? 'working' });
return;
}
if (obj.type === 'stream_event' && isRecord(obj.event)) {
handleStreamEvent(obj.event);
return;
}
// `assistant` messages are the "block finished" signal for the current
// content block. For tool_use blocks whose input finished assembling,
// emit tool_use now with the final parsed input. For text blocks, emit
// the text as a single delta — but only if no streaming deltas already
// covered it (older Claude Code without --include-partial-messages
// delivers text only here; newer builds stream it and would duplicate).
if (obj.type === 'assistant' && isRecord(obj.message) && Array.isArray(obj.message.content)) {
currentMessageId = typeof obj.message.id === 'string' ? obj.message.id : currentMessageId;
const msgId = typeof obj.message.id === 'string' ? obj.message.id : null;
const alreadyStreamed = msgId ? textStreamed.has(msgId) : false;
// Per-turn `stop_reason` is emitted as `turn_end` AFTER the content
// blocks have been processed (see below). When `--include-partial-
// messages` is unsupported, tool_use events surface only from the
// assistant wrapper here — emitting `turn_end` before that loop
// would let the daemon's stdin-close handler see an empty
// `pendingHostAnswers` set and close stdin before the
// AskUserQuestion tool_use was registered, which made the round
// trip silently fail. Read the stop_reason now, emit after.
const stopReason = typeof obj.message.stop_reason === 'string'
? obj.message.stop_reason
: null;
for (const block of obj.message.content) {
if (!isRecord(block)) continue;
if (block.type === 'tool_use') {
if (typeof block.id === 'string' && streamedToolUseIds.has(block.id)) {
streamedToolUseIds.delete(block.id);
continue;
}
onEvent({
type: 'tool_use',
id: block.id,
name: block.name,
input: block.input ?? null,
});
} else if (
!alreadyStreamed &&
block.type === 'text' &&
typeof block.text === 'string' &&
block.text.length > 0
) {
onEvent({ type: 'text_delta', delta: block.text });
} else if (
!alreadyStreamed &&
block.type === 'thinking' &&
typeof block.thinking === 'string' &&
block.thinking.length > 0
) {
onEvent({ type: 'thinking_delta', delta: block.thinking });
}
}
// Surface the turn_end signal now that every tool_use in this
// assistant message has been emitted, so the daemon's stdin-close
// handler has the up-to-date `pendingHostAnswers` set before
// deciding whether to close stream-json input stdin.
if (stopReason) {
onEvent({ type: 'turn_end', stopReason });
}
return;
}
// `user` messages in a stream-json transcript are usually tool_result
// wrappers from prior turns.
if (obj.type === 'user' && isRecord(obj.message) && Array.isArray(obj.message.content)) {
for (const block of obj.message.content) {
if (!isRecord(block)) continue;
if (block.type === 'tool_result') {
onEvent({
type: 'tool_result',
toolUseId: block.tool_use_id,
content: stringifyToolResult(block.content),
isError: Boolean(block.is_error),
});
}
}
return;
}
if (obj.type === 'result') {
onEvent({
type: 'usage',
usage: obj.usage ?? null,
costUsd: obj.total_cost_usd ?? null,
durationMs: obj.duration_ms ?? null,
stopReason: obj.stop_reason ?? null,
});
return;
}
}
function handleStreamEvent(ev: Record<string, unknown>) {
if (ev.type === 'message_start') {
currentMessageId = isRecord(ev.message) && typeof ev.message.id === 'string' ? ev.message.id : null;
if (typeof ev.ttft_ms === 'number') {
onEvent({ type: 'status', label: 'streaming', ttftMs: ev.ttft_ms });
}
return;
}
if (ev.type === 'content_block_start' && isRecord(ev.content_block)) {
const key = blockKey(ev.index);
const block = ev.content_block;
blocks.set(key, { type: block.type, name: block.name, id: block.id, input: '' });
if (block.type === 'thinking') {
onEvent({ type: 'thinking_start' });
}
return;
}
if (ev.type === 'content_block_delta' && isRecord(ev.delta)) {
const state = blocks.get(blockKey(ev.index));
const delta = ev.delta;
if (delta.type === 'text_delta' && typeof delta.text === 'string') {
if (currentMessageId) textStreamed.add(currentMessageId);
onEvent({ type: 'text_delta', delta: delta.text });
return;
}
if (delta.type === 'thinking_delta' && typeof delta.thinking === 'string') {
if (currentMessageId) textStreamed.add(currentMessageId);
onEvent({ type: 'thinking_delta', delta: delta.thinking });
return;
}
if (delta.type === 'input_json_delta' && typeof delta.partial_json === 'string') {
if (state && state.type === 'tool_use') {
state.input += delta.partial_json;
}
return;
}
}
if (ev.type === 'content_block_stop') {
const key = blockKey(ev.index);
const state = blocks.get(key);
if (state && state.type === 'tool_use' && typeof state.id === 'string' && state.input.trim()) {
try {
onEvent({
type: 'tool_use',
id: state.id,
name: state.name,
input: JSON.parse(state.input),
});
streamedToolUseIds.add(state.id);
} catch {
// Fall through to the final assistant wrapper's input if the
// streamed JSON is malformed or incomplete.
}
}
blocks.delete(key);
return;
}
}
return { feed, flush };
}
function stringifyToolResult(content: unknown): string {
if (typeof content === 'string') return content;
if (Array.isArray(content)) {
return content
.map((c) => (isRecord(c) && c.type === 'text' ? String(c.text) : JSON.stringify(c)))
.join('\n');
}
return JSON.stringify(content);
}