Commit graph

8 commits

Author SHA1 Message Date
Alex Lucero
56a4e97871 feat: surface linked repo changes 2026-05-31 08:07:19 +02:00
JasonBroderick
0fbeaf829e
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247)

Three-layer defence against models emitting `## user` / `## assistant` /
`## system` lines mid-response, which the chat host interprets as real
turn boundaries and acts on as unauthorised instruction:

1. **System prompt**: anti-roleplay instruction elevated from a bullet
   under "What you don't do" to a standalone `## CRITICAL` section in
   `official-system.ts`, with a REMINDER pinned at the end of the
   composed prompt for recency bias.

2. **Stream-level detection and truncation**: shared `role-marker-guard.ts`
   module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used
   across all text paths — Claude stream (per-message guards), non-Claude
   structured streams (run-scoped guard via `emitGuardedTextDelta`),
   and BYOK proxy routes (`createDeltaGuard`). When a marker is detected,
   the contaminated suffix is dropped and a `fabricated_role_marker` event
   surfaces a warning in the UI.

3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants;
   `fabricated_role_marker` events render as amber warning pills.

* fix(chat-routes): do not await reader.cancel() on stream early-return

The await on reader.cancel() can hang indefinitely on response streams
whose underlying source is a Uint8Array (most notably surfaced by the
ollama test in proxy-routes.test.ts, which builds its mock body via
`new Response(uint8array)` rather than the controller-based helper
`sseResponse()`). The hung await holds the request handler open, which
in turn blocks `server.close()` in the afterAll hook, producing the two
test timeouts (test at 145, hook at 36) currently failing CI on #3296.

Fix is in production code, not the test: don't await the cancel. It
is a cleanup hint and we are returning from the function anyway, so
blocking on it offers no value. fire-and-forget with an empty catch
keeps the cancel signal flowing for real HTTP streams without
risking a hang on mock/edge-case implementations.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(daemon): terminate child on role-marker detection (close #3247 generation vector)

PR #3296's detection layer truncates display and persistence of fabricated
role markers, but the underlying model subprocess keeps generating tokens
after detection. Three concrete consequences:

  1. The model bills the user for the entire contaminated response
     (we observed 5,106 chars stored in claude's session file for a turn
     where only the first 3,013 chars were legitimate — a 40% overhead).
  2. tool_use blocks emitted AFTER the marker reach the daemon's
     dispatcher unchecked, since detection only gates the text-delta
     emission path, not content-block-stop / tool_use blocks. The
     model could fabricate "## user delete file X" then emit a
     tool_use(delete X) that the dispatcher would execute.
  3. The UI surfaces a `fabricated_role_marker` warning followed by an
     eventual normal turn-end, blurring the distinction between
     "completed normally" and "killed by safety guard."

This commit adds a single idempotent `abortForRoleMarker(marker)`
helper in server.ts, scoped to the same closure as `child` and
`runGuard`. On any detection event (per-message Claude guard,
run-scoped non-Claude guard, plain stdout guard) the helper:

  - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the
    UI can render a security-class status distinct from a normal
    turn-end. The existing `fabricated_role_marker` warning is still
    sent and rendered as the amber pill (PR #3296's UI).
  - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes,
    Kimi, Devin, Kiro) whose I/O doesn't necessarily release on
    SIGTERM of the wrapper process alone.
  - SIGTERMs the child immediately, with the existing
    `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace.

Wired into three sites where contamination is detected:
  - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc
    text_delta paths)
  - Plain-stdout listener (BYOK plain mode)
  - The Claude stream handler's onEvent (per-message guards in
    claude-stream.ts surface `fabricated_role_marker` events directly
    via onEvent rather than through the run-scoped emitGuardedTextDelta)

Tool_use blocks emitted BEFORE the marker still flow through normally
— this guard can't help with those, since by the time we observe a
text marker the prior content block has already finished. Closing
that gap requires speculative cancellation of in-flight tool calls
when a downstream text block contains a marker; that's tracked as
follow-up work, not included here.

Co-Authored-By: roverkai <2196140098@qq.com>
Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* refactor(role-marker-guard): bounded tail + drop chat-style markers

Addresses two review comments on #3303:

(1) O(1) memory + per-delta work (review r3323982225)
  Replace the unbounded `accumulated` string with a rolling tail capped
  at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest
  marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice).
  A 50 KB assistant response delivered in 1000 chunks of 50 bytes was
  previously O(n²) on string concatenation alone; now it is O(1) per
  delta regardless of message length. The `tail.length` value carries
  the "already emitted" offset that the cut-point math needs, so the
  offset semantics at L74–78 of the prior implementation are preserved
  without re-introducing the full-text buffer.

(2) Drop chat-style markers entirely (review r3323982234, option (a))
  `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex.
  Rationale:
    - The host parses ONLY `## user` / `## assistant` / `## system`
      lines as turn boundaries (see `buildDaemonTranscript` in
      apps/web/src/providers/daemon.ts). A model emitting chat-style
      markers does NOT cause the original #3247 security failure.
    - With kill-on-detection wired in this PR (`abortForRoleMarker`
      in server.ts), a false positive aborts the whole run — far
      more expensive than a stray unflagged `User:` line in chat
      scrollback. Chat-style markers collide with legitimate output
      (form labels, email contacts, JSDoc) often enough that pairing
      them with kill-semantics is the wrong tradeoff.
  The tradeoff is now documented in the regex docblock so the
  kill-on-match behaviour is justified against the false-positive
  surface.

Also aligns the prompt-side CRITICAL block in system.ts: drop the
"don't emit User: / Assistant: / Human: / AI:" bullet, since we no
longer enforce it. Less ambiguity for the model and the operators.

Test file updated:
  - Chat-style positive tests flipped to negative ("does NOT match
    User: — chat-style out of scope") so the intentional exclusion
    has a permanent regression test.
  - Two new tests cover the bounded-tail behaviour: a marker arriving
    after 10 KB of clean text in small chunks, and a marker
    straddling a chunk boundary after 100 prior chunks.
  - Added test for legitimate `User: bob@example.com`-style content
    not triggering contamination.
Test count is now 35 (up from 25); two of the new ones explicitly
exercise the new bounded-tail path.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995)

Blocking correctness bug introduced by commit 4 (bounded-tail refactor):
once \`tail\` is a rolling slice of mid-stream text, \`^\` in the
canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents
the genuine message start. As the rolling window slides forward chunk
by chunk, a sliced tail can begin with whitespace + \`##\` (or just
\`##\`), letting \`^\` anchor a match against text that the
full-buffer implementation correctly ignored. With kill-on-detection
wired in commit 3, that false positive now SIGTERMs the run and emits
a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class
called out in the docblock at L22–29.

Reviewer's evidence (PerishCode, r3324060995): streaming
"…take a look at the ## user content section…" one character at a
time reports \`contaminated: true\` post-refactor; the same text in a
single feed stays clean.

Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very
first non-empty feed (where \`^\` legitimately points at the message
start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\`
(\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all
subsequent feeds. A \`firstChunk\` boolean tracks the state. Real
newline-preceded markers straddling chunk boundaries are still caught
because the preceding \`\\n\` is retained inside the 64-char tail.

Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`):
  - mid-line \`## user\` streamed char-by-char with no preceding \\n
    (mirrors the reviewer's repro)
  - space-preceded mid-line \`## user\` in a >130-char stream, which
    long enough to force the rolling window past the marker — exercises
    the exact slice condition that triggered the bug
  - real \\n-preceded \`## user\` still caught after a long preamble
    (positive case must not regress)
  - \`## user\` as the very first chunk still caught (\`^\` legitimately
    anchors on the first feed)

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882)

Two refinements addressing the third review on #3303:

== Blocking (r3324151877) ==
The regex over-matched legitimate Markdown headings, and with
kill-on-detection wired in commit 3 each false positive
deterministically aborts a real run. Three changes tighten the match
to the actual security surface — `## user` / `## assistant` /
`## system` lines the chat host parses as turn boundaries — without
losing any real attack pattern:

1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary
   delimiter is lowercase (see `buildDaemonTranscript` in
   apps/web/src/providers/daemon.ts), and the `## CRITICAL`
   system-prompt block already forbids only the lowercase forms.
   Title-Case headings like `## User Guide`, `## System Architecture`,
   `## Assistant settings` are now ignored — these are legitimate
   technical writing patterns LLMs emit constantly. `## USER NOTES`
   (all-caps) likewise no longer flags.

2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it,
   `## userland`, `## userspace`, `## users guide`, `## systemd`,
   `## assistance` all match via prefix in the alternation. The
   lookahead requires the next character to exist and to not be a
   lowercase letter, so:
     - `## user\\n…`     → match (newline is not lowercase)
     - `## assistantR…` → match (R is uppercase; the glued-form
                          attack pattern still gets caught)
     - `## assistant.`  → match (. is not a letter)
     - `## users guide` → no match (s is lowercase letter)
     - `## userland`    → no match (l is lowercase letter)
   POSITIVE rather than NEGATIVE `(?![a-z])` because the negative
   form is satisfied at end-of-string, which in a streaming context
   means "we have `## user` but don't know what comes next yet" —
   would fire prematurely if `land` arrives in a later chunk. The
   positive form delays detection by one character in that edge
   case, traded for correctness.

3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role
   markers are single-line by convention; restricting to space/tab
   prevents oddities like `##\\nuser` from matching across lines.

Test file: added Title-Case fixtures (`## User Guide`,
`## System Architecture`, `## Assistant settings`, `## USER NOTES`)
and prefix-of-longer-word fixtures (`## users guide`, `## userland`,
`## systemd`, `## assistance`) — each asserting NO contamination.
The existing `## usability` negative test gave false confidence as
the reviewer noted (only failed via alternation-miss, not via
word-boundary semantics); the new fixtures actually exercise the
lookahead. Also added a positive test for `## assistant.` (glued
punctuation) to balance the existing `## assistantReading`
(glued uppercase) coverage. Total tests: 35 → 50.

== Non-blocking (r3324151882) ==
Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in
`packages/contracts/src/errors.ts` alongside the existing agent/AMR
codes, with a docblock comment explaining the emission contract:
emitted by `server.ts::abortForRoleMarker` alongside the existing
`fabricated_role_marker` warning event when the daemon detects a
fabricated Markdown role marker in agent output; retryable. The code
was already being emitted over the wire but unregistered — landing
the registration here keeps the contract and emitter in sync as
reviewer requested.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): defer complete-but-unconfirmed marker suffix

Addresses review r3324277xxx — the boundary case where a stream chunk
boundary lands between the role keyword and its lookahead character
violated the documented "everything from the marker onward is silently
dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')`
returned `## user` as safe (no char to satisfy the lookahead → no match
→ pass through), so the fabricated marker line leaked into UI and
app.sqlite before the next chunk confirmed contamination on the next
SIGTERM cycle.

Fix: introduce a `pending` state variable holding bytes that match the
COMPLETE-but-unconfirmed marker prefix at end of buffer
(/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no
lookahead, $ anchor instead). When the no-match branch detects this
suffix, withhold it from emission until the next feed either:
  - Confirms it (next char non-lowercase) → main regex matches →
    contaminated → withheld bytes dropped along with `## user`.
  - Denies it (next char lowercase, e.g. `userl…`) → main regex no
    longer matches the role keyword → withheld suffix is released
    and emitted alongside the new continuation.

Also tied the firstChunk transition to actual byte emission rather
than feed count. Previously a message that starts with `## system`
followed by a separate `\\n` chunk would lose the `^` anchor on the
second feed (firstChunk had flipped after the first feed even though
nothing was emitted yet), silently breaking detection for that edge
case. Now `firstChunk` stays true until at least one byte has crossed
the emission boundary, matching the conceptual definition of "message
start".

Tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - `## user` deferred at chunk boundary, confirmed by `\\n` in next
  - `## user` deferred at chunk boundary, denied by `land` continuation
  - `## assistant` deferred, confirmed by punctuation
  - `## User` Title-Case still passes through unconditionally
  - `## system` as the very first chunk: deferred, confirmed by \\n
    in next chunk (tests the firstChunk-stays-true-when-nothing-
    emitted invariant)

Total tests: 50 → 55.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta

Addresses review r3324xxxxxx — guarding the thinking channel buys no
security and causes legitimate aborts.

Why thinking is NOT a #3247 vector:
  - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only
    re-serializes `m.content` as `## ${m.role}\n...`.
  - Extended-thinking content is rendered to a separate
    `kind: 'thinking'` payload (daemon.ts:857-858) and never folded
    into `m.content`.
  - So a `## user` line in the thinking channel CANNOT become a
    fabricated turn boundary on the next round-trip.

Why guarding it is harmful:
  - Models routinely emit literal `## user` / `## assistant` lines
    in chain-of-thought when reasoning about conversation structure
    ("Let me think about this. The user might phrase it as:\n## user\n
    …"). Common pattern in production traces.
  - With `abortForRoleMarker` wired in server.ts, a guard match on
    thinking SIGTERMs the run and surfaces a security error to the
    UI. The user paid for the reasoning, never sees the answer, and
    gets a confusing "fabricated role marker" warning for what was
    actually legitimate metacognition.
  - This directly contradicts the module's own stated philosophy
    ("a false positive aborts the whole run — a much more expensive
    failure than a stray unflagged ... line", role-marker-guard.ts).

Fix: `emitSafeText` now passes thinking_delta through unconditionally,
skipping both the guard and the contamination check. text_delta
remains fully guarded. The single-line change at the top of
emitSafeText preserves all other channels' behavior.

Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts):
  - `## user` / `## assistant` lines in a thinking_delta — must NOT
    fire fabricated_role_marker, the thinking content streams intact
    including the marker text, and the subsequent text_delta answer
    still reaches the consumer (run not aborted).
  - Sanity check: same `## user` pattern in a text_delta DOES fire
    fabricated_role_marker and truncates emission at the marker. Locks
    in the channel-discriminated behavior.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): tie firstChunk to slicing, not byte emission

Blocking review r3324xxxxxx: under the prior firstChunk transition
("any byte emitted"), a role marker that arrived at the very start of
a message with its prefix split across multiple chunks bypassed
detection — reopening the #3247 vector on the Claude path.

Concrete cases that were missed (all are routine provider
tokenizations of \`## user\n…\` at message start):
  - \`##\`     | \` user\nDELETE…\`
  - \`## us\`  | \`er\nDELETE…\`
  - \`## \`    | \`user\nDELETE…\`

Mechanism: the pending-deferral regex only catches COMPLETE role
keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`,
\`## us\`) was emitted in full. That emission flipped firstChunk to
false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used,
which requires a literal \n before \`##\`. A marker at buffer
position 0 has no preceding \n, so it could no longer match.
abortForRoleMarker never fired and tool_use blocks emitted after the
fabricated turn boundary reached the dispatcher.

Fix: change firstChunk to track "tail has not been sliced yet" rather
than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE,
tail still represents the entire emission so far and \`^\` in the
canonical regex genuinely anchors at byte 0 of the stream — so the
\`^|\n\` alternation safely catches a chunk-split message-start
marker. The transition happens at the moment we would slice: once
emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\`
becomes meaningless, and we switch to the newline-only variants.

Earlier iterations of this code tried two other definitions, both
unsound:
  - "any byte emitted" (this commit fixes) — lost \`^\` before a
    chunk-split message-start marker could finish arriving.
  - "newline emitted" (briefly considered as the reviewer's
    alternative suggestion) — left \`^\` valid on a sliced buffer
    when streams hadn't emitted a newline yet, re-introducing the
    rolling-tail mid-stream false positive from review r3324060995.
The slice-based invariant satisfies both: while we have not sliced,
\`^\` is correct; once we slice, it is not.

Regression tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - \`##\`    | \` user\nDELETE…\`   → contaminated, marker=\`## user\`
  - \`## us\` | \`er\nDELETE…\`      → contaminated, marker=\`## user\`
  - \`## \`   | \`user\nDELETE…\`    → contaminated, marker=\`## user\`
  - \`#\`     | \`# user\nDELETE…\`  → contaminated, marker=\`## user\`
The fourth case (single \`#\` first chunk) exercises an even more
adversarial tokenization than the reviewer's examples; it is also
caught.

Total tests: 55 → 59.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(tests): wrap events in stream_event envelope in thinking test

feedJsonl was feeding raw events without the `{ type: 'stream_event',
event: ... }` wrapper that createClaudeStreamHandler requires (line 141
of claude-stream.ts). Events silently fell through all branches, making
both tests pass vacuously. Also fix TS2532 on warnings[0].marker with
non-null assertion (safe after the toHaveLength(1) guard).

Co-Authored-By: RoverKai <roverkai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: roverkai <2196140098@qq.com>
Co-authored-by: JasonBroderick <jason@buddyboss.com>
Co-authored-by: RoverKai <roverkai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 03:57:56 +00:00
leessju
e92f91fb06
fix(contracts): tighten LiveArtifactSsePayload.refreshStatus to LiveArtifactRefreshStatus (#1871)
The SSE `live_artifact` event payload typed `refreshStatus` as
`string | undefined`, but the underlying REST API (`LiveArtifact`) and
every consumer treat it as the canonical `LiveArtifactRefreshStatus`
union (`'never' | 'idle' | 'running' | 'succeeded' | 'failed'`).
Daemon emitters already pass `artifact.refreshStatus`, which is
typed as the enum at the source.

The old type let new union members drift between REST and SSE
unnoticed: a daemon adding a new refresh state for the REST API
could ship out via SSE as a bare `string`, and web/CLI consumers
switching on the union would silently fall through.

Tighten the SSE field to the same enum. The field stays optional
to preserve forward/backward compatibility with daemons that omit
it on legacy events, but the type now narrows to the canonical
members. Runtime behavior is unchanged — this is a type-only fix.

Validation: contracts build/test, web typecheck, daemon typecheck.

Co-authored-by: nicejames <nicejames@gmail.com>
2026-05-16 21:39:39 +08:00
Rocky
cd68c8a80a
fix(daemon+web): emit conversation-created SSE event when routine run starts (#1523)
* fix(daemon+web): emit conversation-created SSE event when routine run starts

When a Routine fires in "Reuse an existing project" mode, the daemon
creates a new conversation in the project and writes a queued/running
assistant message to the database, but the open `ProjectView` has no
way to learn that anything happened: the project events SSE stream
only carries `file-changed` and `live_artifact*` events, and
`ProjectView` reloads conversations only when `project.id` changes.
The result is the user's own routine "Run now" appears to do nothing
until they exit and re-enter the project (#1361).

Fix:

- Add a `conversation-created` payload type to the existing project
  events stream in `apps/web/src/providers/project-events.ts`. The
  payload carries `projectId`, `conversationId`, `title`, and
  `createdAt`. Mirror the existing `file-changed` listener pattern
  with explicit malformed-payload handling.
- In `apps/daemon/src/server.ts`, after `insertConversation` runs in
  the routine `setRunHandler` (both reuse-an-existing-project and
  new-project paths), broadcast a `conversation-created` event
  through the existing `activeProjectEventSinks` map. The function
  body was already generic so it was renamed from
  `emitProjectLiveArtifactEvent` to `emitProjectEvent` and the two
  pre-existing callers updated.
- In `apps/web/src/components/ProjectView.tsx`, when
  `handleProjectEvent` receives a `conversation-created` event whose
  `projectId` matches the currently-viewed project, refetch the
  conversation list via `listConversations`. The active conversation
  is intentionally NOT changed — per maintainer guidance on #1361,
  auto-switching is a separate UX decision left for a follow-up.
- `projectEventToAgentEvent` returns null for `conversation-created`
  so it doesn't get routed into the live-artifact path.

Tests (`apps/web/tests/providers/project-events.test.ts`):

- A single `conversation-created` event reaches the consumer with
  the parsed payload.
- Two consecutive `conversation-created` events from concurrent
  routine runs both reach the consumer (covers the
  multiple-concurrent-runs case reported in #1502).
- Malformed `conversation-created` payloads are swallowed without
  throwing, matching the existing `file-changed` / `live_artifact`
  defensive behavior.

Manual verification:

- Built locally with `pnpm exec tools-pack mac build --to app
  --portable` and installed.
- Created a routine in `Reuse an existing project` mode targeting an
  existing project.
- With the project view open, clicked `Run now`. The new "Routine"
  conversation appeared in the project's conversation list within
  about a second, without exiting and re-entering the project, and
  the active conversation was not changed.
- Clicked `Run now` twice in quick succession; both new
  conversations appeared in the list, covering the concurrent-runs
  case in #1502.
- `pnpm guard` and `pnpm --filter @open-design/web typecheck` clean;
  full web test suite is 1016/1016 passing.

Fixes #1361

* fix(#1523 review): share SSE type via contracts; guard conversation refresh against project-switch and reordering races

Addresses Codex P1 + lefarcen P2 inline review on #1523 (#1361):

1. Move `ProjectConversationCreatedSsePayload` to
   `@open-design/contracts` (`packages/contracts/src/sse/chat.ts`)
   so the daemon producer and the web consumer share one type. The
   web provider re-exports it under the local
   `ProjectConversationCreatedEvent` name to keep the existing import
   shape stable for callers; the daemon emit site picks up the same
   shape via a JSDoc typedef so producer and consumer can't drift as
   this stream grows. (Addresses lefarcen P2 on project-events.ts:17.)

2. Guard the `conversation-created` async refresh in `ProjectView`
   against two distinct races:

   - Project-switch race: capture `project.id` at dispatch time and
     re-check it via a live `projectIdRef` after `listConversations`
     resolves; bail if the user switched projects while the request
     was in flight. The existing project-load effects use the same
     cancellation pattern. (Addresses Codex P1 on ProjectView.tsx:767.)

   - Concurrent-refresh re-ordering race: bump a monotonic
     `conversationsRefreshTokenRef` on every dispatch and capture each
     request's token; only the request whose captured token still
     equals the live ref at await-return applies its result. Two
     rapid `conversation-created` events (the #1502 concurrent
     Run-now case) can no longer drop the newest conversation when
     the earlier request resolves last with a stale, shorter list.
     (Addresses lefarcen P2 on ProjectView.tsx:767.)

Both guards are documented inline with comments that point back at the
review threads. The existing project-events tests (single delivery,
concurrent delivery, malformed payloads) are unchanged — the new
guards are defensive logic on the consumer, not new event shapes.

`pnpm guard`, `pnpm --filter @open-design/web typecheck`,
`pnpm --filter @open-design/daemon typecheck`, and the full web test
suite (1016/1016) remain green.
2026-05-13 14:50:58 +08:00
Tom Huang
56bf6ee1b6
feat: agent-callable research command and /search (#615)
* feat: pre-generation research (Tavily) for grounded generation

Adds an optional pre-generation research step so the agent can produce
slides / prototypes / decks grounded in real sources instead of guessing.

User flow:
  1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY).
  2. Click the new Research button in the chat composer.
  3. On send, the daemon runs a Tavily search, prepends the findings
     as a <research_context> block ahead of the system prompt, and
     spawns the agent. Research progress shows up as status pills in
     the chat stream; the agent cites sources inline as [1]/[2]/...

Phase 1 surface:
  - Single provider (Tavily), single depth ('shallow'), no LLM
    synthesis pass (Tavily's `answer` is the summary).
  - Composer toggle only; no popover / depth picker yet.
  - Reuses the existing `status` SSE agent payload + StatusPill UI
    so no new event variants or renderer code are needed.

Layers touched:
  - contracts: ResearchOptions / Source / Findings DTOs;
    ChatRequest.research; export from index.
  - daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator
    + provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook
    in startChatRun before prompt assembly.
  - web: ChatComposer toggle + ChatSendMeta; threaded through
    ChatPane / ProjectView / streamViaDaemon into ChatRequest.

Side fix (required to land the feature, but useful on its own):
  contracts internal relative imports lacked the `.js` suffix that
  NodeNext module resolution requires. This was already breaking
  `pnpm --filter @open-design/daemon typecheck` on main; without the
  fix, none of the new research types were visible to the daemon.
  All internal contracts imports now carry `.js`.

Spec: specs/current/research-feature.md (phases 2-4 outlined for
follow-up: composer popover, multi-provider, deep recursion, example
skills with research_recommends).

Verified:
  - pnpm --filter @open-design/contracts typecheck/test
  - pnpm --filter @open-design/daemon typecheck (the chokidar
    project-watchers test is a pre-existing flake, unrelated)
  - pnpm --filter @open-design/web typecheck
  - node scripts/verify-media-models.mjs

* fix(daemon): clamp Tavily max_results to 20

Tavily's /search endpoint requires `max_results` in [0, 20]; sending a
larger value (e.g. when `research.depth: "deep"` resolves to 30) returns
400 and `runResearch` silently falls back to no-research. Clamp at the
provider boundary so Phase 2 depth tiers above 20 still produce results
instead of failing the request.

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* Remove stale research merge leftovers

* Add agent-callable research search

* Fix Indonesian locale typecheck

* Fix research command invocation edge cases

* Harden slash search prompt expansion

* Honor research source caps in command contract

* Require search reports in design files

* Add research data provider settings

* Wire web research provider fallback order

* Update research provider fallback wording

* Revert "Update research provider fallback wording"

This reverts commit 86fb6001e3.

* Revert "Wire web research provider fallback order"

This reverts commit 4c9e16036b.

* Revert "Add research data provider settings"

This reverts commit 23630d1746.

* Add Dexter and Last30Days research skills

* Add DCF and Last30Days OD skills

* Add Last30Days and Dexter skills

* Resolve research review threads

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
2026-05-08 10:33:44 +08:00
Marc Chan
c3d9136a0c
Add live artifacts and Composio connector catalog (#381)
* docs: add live artifacts implementation spec

* docs: align live artifacts implementation plan

* Ralph iteration 1: work in progress

* Ralph iteration 2: work in progress

* Ralph iteration 3: work in progress

* Ralph iteration 4: work in progress

* Ralph iteration 5: work in progress

* Ralph iteration 6: work in progress

* Ralph iteration 7: work in progress

* Ralph iteration 8: work in progress

* Ralph iteration 9: work in progress

* Ralph iteration 10: work in progress

* Ralph iteration 11: work in progress

* Ralph iteration 12: work in progress

* Ralph iteration 13: work in progress

* Ralph iteration 14: work in progress

* Ralph iteration 15: work in progress

* Ralph iteration 16: work in progress

* Ralph iteration 17: work in progress

* Ralph iteration 18: work in progress

* Ralph iteration 19: work in progress

* Ralph iteration 20: work in progress

* Ralph iteration 21: work in progress

* Ralph iteration 22: work in progress

* Ralph iteration 23: work in progress

* Ralph iteration 24: work in progress

* Ralph iteration 25: work in progress

* Ralph iteration 26: work in progress

* Ralph iteration 27: work in progress

* Ralph iteration 28: work in progress

* Ralph iteration 29: work in progress

* Ralph iteration 30: work in progress

* Ralph iteration 31: work in progress

* Ralph iteration 32: work in progress

* Ralph iteration 33: work in progress

* Ralph iteration 34: work in progress

* Ralph iteration 35: work in progress

* Ralph iteration 36: work in progress

* Ralph iteration 37: work in progress

* Ralph iteration 38: work in progress

* Ralph iteration 39: work in progress

* Ralph iteration 40: work in progress

* Ralph iteration 41: work in progress

* Ralph iteration 42: work in progress

* Ralph iteration 43: work in progress

* Ralph iteration 44: work in progress

* Ralph iteration 45: work in progress

* Ralph iteration 46: work in progress

* Ralph iteration 47: work in progress

* Ralph iteration 48: work in progress

* Ralph iteration 49: work in progress

* Ralph iteration 50: work in progress

* Ralph iteration 51: work in progress

* Ralph iteration 52: work in progress

* Ralph iteration 53: work in progress

* Ralph iteration 54: work in progress

* Ralph iteration 55: work in progress

* Ralph iteration 56: work in progress

* Ralph iteration 57: work in progress

* Ralph iteration 58: work in progress

* Ralph iteration 59: work in progress

* Ralph iteration 60: work in progress

* Ralph iteration 61: work in progress

* Ralph iteration 62: work in progress

* Ralph iteration 63: work in progress

* Ralph iteration 64: work in progress

* Ralph iteration 65: work in progress

* Ralph iteration 1: work in progress

* Ralph iteration 2: work in progress

* Ralph iteration 3: work in progress

* Ralph iteration 4: work in progress

* Ralph iteration 5: work in progress

* Ralph iteration 6: work in progress

* Ralph iteration 8: work in progress

* Ralph iteration 9: work in progress

* Ralph iteration 17: work in progress

* Add Composio-backed connectors

* Add Composio-backed connector catalog

* Fix connector callback flow

* Update live artifact connector refresh

* Fix live artifact refresh updates

* Improve live artifact viewer toolbar

* Refine live artifact source tabs

* Expand Composio connector catalog

* Improve Composio connector browsing

* Fix artifact refresh source safety checks

Generated-By: looper 0.4.1 (runner=fixer, agent=opencode)

* Fix live artifacts PR feedback

Generated-By: looper 0.5.0 (runner=fixer, agent=opencode)

* Fix live artifact preview CORS validation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix connector OAuth IPv6 loopback hosts

Allow bracketed IPv6 loopback Host headers when deriving connector OAuth callback URLs so IPv6-bound daemons can complete connection flow.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Preserve live artifact refresh permissions

Respect explicit refresh permission choices during live artifact create and update flows so revoked connector sources remain gated.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix live artifact preview cache freshness

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix live artifact refresh validation

Guard manual refreshes with local daemon checks and reject daemon_tool sources without a toolName before refresh execution.

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix Composio credential invalidation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix live artifact CORS methods

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* Fix workspace validation

Restore media config test isolation under Vitest setup data-dir overrides and add the missing French live artifact display copy so the workspace test suite stays aligned.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix connector safety filtering

Keep agent-preview connector listings aligned with execution safety policy and prune stale Composio OAuth state records before they accumulate.

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix agent runtime cleanup

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix live artifact daemon access

Validate local-only live artifact routes against the peer socket address and pass daemon-resolved CLI paths to ACP MCP descriptors.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix connector run limit pruning

Evict stale connector rate-limit buckets so long-lived daemon processes do not retain per-run entries indefinitely.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Fix connector compact schemas

Generated-By: looper 0.5.2 (runner=fixer, agent=opencode)

* Improve connector connection feedback

* Adjust connector gate positioning

* Fix live artifact refresh commits

Avoid marking refresh candidates failed after snapshot or state persistence errors by deferring live artifact mutations until the durable refresh metadata is written. Also align connector OAuth callback host validation with daemon loopback handling.\n\nGenerated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* Improve connector search relevance

* fix(daemon): harden connector connection state

Require loopback daemon validation before connector connect side effects and only clear provider-owned connector statuses during credential reset.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): guard connector disconnect route

Require local daemon request validation before connector disconnect side effects.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): guard composio config updates

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): dispatch live artifacts mcp first

Route the live-artifacts MCP server before the generic MCP CLI so od mcp live-artifacts starts the dedicated server instead of failing generic argument parsing.\n\nGenerated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix(daemon): handle integer connector schemas

Allow JSON Schema integer connector inputs while preserving fractional-value validation so generated connector tool schemas accept valid page sizes and limits.

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* fix: align live artifact refresh error codes

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* Fix live artifact connector refresh flow

* Update live artifact design cards

* Add beta badge to live artifact form

* Remove live artifact tile model

* Fix live artifact refresh sync

* Fix live artifact MCP refresh durability

Generated-By: looper 0.5.4 (runner=fixer, agent=opencode)

* Fix live artifact refresh safety

Enforce persisted refresh opt-out and connector auto-read gating before refresh sources execute.

Generated-By: looper 0.5.5 (runner=fixer, agent=opencode)
2026-05-05 16:42:11 +08:00
nettee
3fb849d047
Fix chat runs surviving web disconnects (#146)
* fix chat runs surviving web disconnects

* fix chat run create abort propagation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon keepalive reconnect budget

Generated-By: looper 0.0.0-dev (runner=fixer, agent=gpt-5.5)

* fix daemon stream disconnect cancellation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon stream abort cancellation race

Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5)

* fix daemon run cancellation semantics

* fix load

* doc

* 2

* add run refresh recovery

* fix active run refresh status

* fix reattach abort handling

* fix

* fix chat initial scroll

* fix daemon start failures

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix stop run status

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* fix background run recovery

Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5)

* extract daemon run service

* move prompt composition to daemon

* fix prompt module resolution

* fix project id generation

* add project run status

* add designs kanban view with awaiting_input status

- add grid/kanban view toggle on Designs tab; persist choice in localStorage
- introduce awaiting_input project display status (daemon-derived from
  unanswered <question-form>) so projects asking the user aren't shown
  as Completed; ordered between Running and Completed with amber accent
- hide transient queued state from users: coerce queued/starting to
  running in daemon /api/projects projection and drop the queued kanban
  column
- a11y polish on Designs cards: Space activation, aria-labels on delete,
  focus-visible outlines, reveal delete on focus-within and touch,
  prefers-reduced-motion handling
- kanban layout uses flex sizing instead of viewport math; scoped icon-
  only pill button rule fixes view-toggle icon alignment

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-04-30 20:16:46 +08:00
nettee
56d08b8c5f
Add shared contracts and migrate project code to TypeScript (#118) 2026-04-30 13:01:15 +08:00