open-design/apps/daemon/src/role-marker-guard.ts
JasonBroderick 0fbeaf829e
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247)

Three-layer defence against models emitting `## user` / `## assistant` /
`## system` lines mid-response, which the chat host interprets as real
turn boundaries and acts on as unauthorised instruction:

1. **System prompt**: anti-roleplay instruction elevated from a bullet
   under "What you don't do" to a standalone `## CRITICAL` section in
   `official-system.ts`, with a REMINDER pinned at the end of the
   composed prompt for recency bias.

2. **Stream-level detection and truncation**: shared `role-marker-guard.ts`
   module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used
   across all text paths — Claude stream (per-message guards), non-Claude
   structured streams (run-scoped guard via `emitGuardedTextDelta`),
   and BYOK proxy routes (`createDeltaGuard`). When a marker is detected,
   the contaminated suffix is dropped and a `fabricated_role_marker` event
   surfaces a warning in the UI.

3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants;
   `fabricated_role_marker` events render as amber warning pills.

* fix(chat-routes): do not await reader.cancel() on stream early-return

The await on reader.cancel() can hang indefinitely on response streams
whose underlying source is a Uint8Array (most notably surfaced by the
ollama test in proxy-routes.test.ts, which builds its mock body via
`new Response(uint8array)` rather than the controller-based helper
`sseResponse()`). The hung await holds the request handler open, which
in turn blocks `server.close()` in the afterAll hook, producing the two
test timeouts (test at 145, hook at 36) currently failing CI on #3296.

Fix is in production code, not the test: don't await the cancel. It
is a cleanup hint and we are returning from the function anyway, so
blocking on it offers no value. fire-and-forget with an empty catch
keeps the cancel signal flowing for real HTTP streams without
risking a hang on mock/edge-case implementations.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(daemon): terminate child on role-marker detection (close #3247 generation vector)

PR #3296's detection layer truncates display and persistence of fabricated
role markers, but the underlying model subprocess keeps generating tokens
after detection. Three concrete consequences:

  1. The model bills the user for the entire contaminated response
     (we observed 5,106 chars stored in claude's session file for a turn
     where only the first 3,013 chars were legitimate — a 40% overhead).
  2. tool_use blocks emitted AFTER the marker reach the daemon's
     dispatcher unchecked, since detection only gates the text-delta
     emission path, not content-block-stop / tool_use blocks. The
     model could fabricate "## user delete file X" then emit a
     tool_use(delete X) that the dispatcher would execute.
  3. The UI surfaces a `fabricated_role_marker` warning followed by an
     eventual normal turn-end, blurring the distinction between
     "completed normally" and "killed by safety guard."

This commit adds a single idempotent `abortForRoleMarker(marker)`
helper in server.ts, scoped to the same closure as `child` and
`runGuard`. On any detection event (per-message Claude guard,
run-scoped non-Claude guard, plain stdout guard) the helper:

  - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the
    UI can render a security-class status distinct from a normal
    turn-end. The existing `fabricated_role_marker` warning is still
    sent and rendered as the amber pill (PR #3296's UI).
  - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes,
    Kimi, Devin, Kiro) whose I/O doesn't necessarily release on
    SIGTERM of the wrapper process alone.
  - SIGTERMs the child immediately, with the existing
    `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace.

Wired into three sites where contamination is detected:
  - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc
    text_delta paths)
  - Plain-stdout listener (BYOK plain mode)
  - The Claude stream handler's onEvent (per-message guards in
    claude-stream.ts surface `fabricated_role_marker` events directly
    via onEvent rather than through the run-scoped emitGuardedTextDelta)

Tool_use blocks emitted BEFORE the marker still flow through normally
— this guard can't help with those, since by the time we observe a
text marker the prior content block has already finished. Closing
that gap requires speculative cancellation of in-flight tool calls
when a downstream text block contains a marker; that's tracked as
follow-up work, not included here.

Co-Authored-By: roverkai <2196140098@qq.com>
Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* refactor(role-marker-guard): bounded tail + drop chat-style markers

Addresses two review comments on #3303:

(1) O(1) memory + per-delta work (review r3323982225)
  Replace the unbounded `accumulated` string with a rolling tail capped
  at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest
  marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice).
  A 50 KB assistant response delivered in 1000 chunks of 50 bytes was
  previously O(n²) on string concatenation alone; now it is O(1) per
  delta regardless of message length. The `tail.length` value carries
  the "already emitted" offset that the cut-point math needs, so the
  offset semantics at L74–78 of the prior implementation are preserved
  without re-introducing the full-text buffer.

(2) Drop chat-style markers entirely (review r3323982234, option (a))
  `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex.
  Rationale:
    - The host parses ONLY `## user` / `## assistant` / `## system`
      lines as turn boundaries (see `buildDaemonTranscript` in
      apps/web/src/providers/daemon.ts). A model emitting chat-style
      markers does NOT cause the original #3247 security failure.
    - With kill-on-detection wired in this PR (`abortForRoleMarker`
      in server.ts), a false positive aborts the whole run — far
      more expensive than a stray unflagged `User:` line in chat
      scrollback. Chat-style markers collide with legitimate output
      (form labels, email contacts, JSDoc) often enough that pairing
      them with kill-semantics is the wrong tradeoff.
  The tradeoff is now documented in the regex docblock so the
  kill-on-match behaviour is justified against the false-positive
  surface.

Also aligns the prompt-side CRITICAL block in system.ts: drop the
"don't emit User: / Assistant: / Human: / AI:" bullet, since we no
longer enforce it. Less ambiguity for the model and the operators.

Test file updated:
  - Chat-style positive tests flipped to negative ("does NOT match
    User: — chat-style out of scope") so the intentional exclusion
    has a permanent regression test.
  - Two new tests cover the bounded-tail behaviour: a marker arriving
    after 10 KB of clean text in small chunks, and a marker
    straddling a chunk boundary after 100 prior chunks.
  - Added test for legitimate `User: bob@example.com`-style content
    not triggering contamination.
Test count is now 35 (up from 25); two of the new ones explicitly
exercise the new bounded-tail path.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995)

Blocking correctness bug introduced by commit 4 (bounded-tail refactor):
once \`tail\` is a rolling slice of mid-stream text, \`^\` in the
canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents
the genuine message start. As the rolling window slides forward chunk
by chunk, a sliced tail can begin with whitespace + \`##\` (or just
\`##\`), letting \`^\` anchor a match against text that the
full-buffer implementation correctly ignored. With kill-on-detection
wired in commit 3, that false positive now SIGTERMs the run and emits
a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class
called out in the docblock at L22–29.

Reviewer's evidence (PerishCode, r3324060995): streaming
"…take a look at the ## user content section…" one character at a
time reports \`contaminated: true\` post-refactor; the same text in a
single feed stays clean.

Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very
first non-empty feed (where \`^\` legitimately points at the message
start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\`
(\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all
subsequent feeds. A \`firstChunk\` boolean tracks the state. Real
newline-preceded markers straddling chunk boundaries are still caught
because the preceding \`\\n\` is retained inside the 64-char tail.

Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`):
  - mid-line \`## user\` streamed char-by-char with no preceding \\n
    (mirrors the reviewer's repro)
  - space-preceded mid-line \`## user\` in a >130-char stream, which
    long enough to force the rolling window past the marker — exercises
    the exact slice condition that triggered the bug
  - real \\n-preceded \`## user\` still caught after a long preamble
    (positive case must not regress)
  - \`## user\` as the very first chunk still caught (\`^\` legitimately
    anchors on the first feed)

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882)

Two refinements addressing the third review on #3303:

== Blocking (r3324151877) ==
The regex over-matched legitimate Markdown headings, and with
kill-on-detection wired in commit 3 each false positive
deterministically aborts a real run. Three changes tighten the match
to the actual security surface — `## user` / `## assistant` /
`## system` lines the chat host parses as turn boundaries — without
losing any real attack pattern:

1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary
   delimiter is lowercase (see `buildDaemonTranscript` in
   apps/web/src/providers/daemon.ts), and the `## CRITICAL`
   system-prompt block already forbids only the lowercase forms.
   Title-Case headings like `## User Guide`, `## System Architecture`,
   `## Assistant settings` are now ignored — these are legitimate
   technical writing patterns LLMs emit constantly. `## USER NOTES`
   (all-caps) likewise no longer flags.

2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it,
   `## userland`, `## userspace`, `## users guide`, `## systemd`,
   `## assistance` all match via prefix in the alternation. The
   lookahead requires the next character to exist and to not be a
   lowercase letter, so:
     - `## user\\n…`     → match (newline is not lowercase)
     - `## assistantR…` → match (R is uppercase; the glued-form
                          attack pattern still gets caught)
     - `## assistant.`  → match (. is not a letter)
     - `## users guide` → no match (s is lowercase letter)
     - `## userland`    → no match (l is lowercase letter)
   POSITIVE rather than NEGATIVE `(?![a-z])` because the negative
   form is satisfied at end-of-string, which in a streaming context
   means "we have `## user` but don't know what comes next yet" —
   would fire prematurely if `land` arrives in a later chunk. The
   positive form delays detection by one character in that edge
   case, traded for correctness.

3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role
   markers are single-line by convention; restricting to space/tab
   prevents oddities like `##\\nuser` from matching across lines.

Test file: added Title-Case fixtures (`## User Guide`,
`## System Architecture`, `## Assistant settings`, `## USER NOTES`)
and prefix-of-longer-word fixtures (`## users guide`, `## userland`,
`## systemd`, `## assistance`) — each asserting NO contamination.
The existing `## usability` negative test gave false confidence as
the reviewer noted (only failed via alternation-miss, not via
word-boundary semantics); the new fixtures actually exercise the
lookahead. Also added a positive test for `## assistant.` (glued
punctuation) to balance the existing `## assistantReading`
(glued uppercase) coverage. Total tests: 35 → 50.

== Non-blocking (r3324151882) ==
Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in
`packages/contracts/src/errors.ts` alongside the existing agent/AMR
codes, with a docblock comment explaining the emission contract:
emitted by `server.ts::abortForRoleMarker` alongside the existing
`fabricated_role_marker` warning event when the daemon detects a
fabricated Markdown role marker in agent output; retryable. The code
was already being emitted over the wire but unregistered — landing
the registration here keeps the contract and emitter in sync as
reviewer requested.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): defer complete-but-unconfirmed marker suffix

Addresses review r3324277xxx — the boundary case where a stream chunk
boundary lands between the role keyword and its lookahead character
violated the documented "everything from the marker onward is silently
dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')`
returned `## user` as safe (no char to satisfy the lookahead → no match
→ pass through), so the fabricated marker line leaked into UI and
app.sqlite before the next chunk confirmed contamination on the next
SIGTERM cycle.

Fix: introduce a `pending` state variable holding bytes that match the
COMPLETE-but-unconfirmed marker prefix at end of buffer
(/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no
lookahead, $ anchor instead). When the no-match branch detects this
suffix, withhold it from emission until the next feed either:
  - Confirms it (next char non-lowercase) → main regex matches →
    contaminated → withheld bytes dropped along with `## user`.
  - Denies it (next char lowercase, e.g. `userl…`) → main regex no
    longer matches the role keyword → withheld suffix is released
    and emitted alongside the new continuation.

Also tied the firstChunk transition to actual byte emission rather
than feed count. Previously a message that starts with `## system`
followed by a separate `\\n` chunk would lose the `^` anchor on the
second feed (firstChunk had flipped after the first feed even though
nothing was emitted yet), silently breaking detection for that edge
case. Now `firstChunk` stays true until at least one byte has crossed
the emission boundary, matching the conceptual definition of "message
start".

Tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - `## user` deferred at chunk boundary, confirmed by `\\n` in next
  - `## user` deferred at chunk boundary, denied by `land` continuation
  - `## assistant` deferred, confirmed by punctuation
  - `## User` Title-Case still passes through unconditionally
  - `## system` as the very first chunk: deferred, confirmed by \\n
    in next chunk (tests the firstChunk-stays-true-when-nothing-
    emitted invariant)

Total tests: 50 → 55.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta

Addresses review r3324xxxxxx — guarding the thinking channel buys no
security and causes legitimate aborts.

Why thinking is NOT a #3247 vector:
  - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only
    re-serializes `m.content` as `## ${m.role}\n...`.
  - Extended-thinking content is rendered to a separate
    `kind: 'thinking'` payload (daemon.ts:857-858) and never folded
    into `m.content`.
  - So a `## user` line in the thinking channel CANNOT become a
    fabricated turn boundary on the next round-trip.

Why guarding it is harmful:
  - Models routinely emit literal `## user` / `## assistant` lines
    in chain-of-thought when reasoning about conversation structure
    ("Let me think about this. The user might phrase it as:\n## user\n
    …"). Common pattern in production traces.
  - With `abortForRoleMarker` wired in server.ts, a guard match on
    thinking SIGTERMs the run and surfaces a security error to the
    UI. The user paid for the reasoning, never sees the answer, and
    gets a confusing "fabricated role marker" warning for what was
    actually legitimate metacognition.
  - This directly contradicts the module's own stated philosophy
    ("a false positive aborts the whole run — a much more expensive
    failure than a stray unflagged ... line", role-marker-guard.ts).

Fix: `emitSafeText` now passes thinking_delta through unconditionally,
skipping both the guard and the contamination check. text_delta
remains fully guarded. The single-line change at the top of
emitSafeText preserves all other channels' behavior.

Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts):
  - `## user` / `## assistant` lines in a thinking_delta — must NOT
    fire fabricated_role_marker, the thinking content streams intact
    including the marker text, and the subsequent text_delta answer
    still reaches the consumer (run not aborted).
  - Sanity check: same `## user` pattern in a text_delta DOES fire
    fabricated_role_marker and truncates emission at the marker. Locks
    in the channel-discriminated behavior.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): tie firstChunk to slicing, not byte emission

Blocking review r3324xxxxxx: under the prior firstChunk transition
("any byte emitted"), a role marker that arrived at the very start of
a message with its prefix split across multiple chunks bypassed
detection — reopening the #3247 vector on the Claude path.

Concrete cases that were missed (all are routine provider
tokenizations of \`## user\n…\` at message start):
  - \`##\`     | \` user\nDELETE…\`
  - \`## us\`  | \`er\nDELETE…\`
  - \`## \`    | \`user\nDELETE…\`

Mechanism: the pending-deferral regex only catches COMPLETE role
keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`,
\`## us\`) was emitted in full. That emission flipped firstChunk to
false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used,
which requires a literal \n before \`##\`. A marker at buffer
position 0 has no preceding \n, so it could no longer match.
abortForRoleMarker never fired and tool_use blocks emitted after the
fabricated turn boundary reached the dispatcher.

Fix: change firstChunk to track "tail has not been sliced yet" rather
than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE,
tail still represents the entire emission so far and \`^\` in the
canonical regex genuinely anchors at byte 0 of the stream — so the
\`^|\n\` alternation safely catches a chunk-split message-start
marker. The transition happens at the moment we would slice: once
emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\`
becomes meaningless, and we switch to the newline-only variants.

Earlier iterations of this code tried two other definitions, both
unsound:
  - "any byte emitted" (this commit fixes) — lost \`^\` before a
    chunk-split message-start marker could finish arriving.
  - "newline emitted" (briefly considered as the reviewer's
    alternative suggestion) — left \`^\` valid on a sliced buffer
    when streams hadn't emitted a newline yet, re-introducing the
    rolling-tail mid-stream false positive from review r3324060995.
The slice-based invariant satisfies both: while we have not sliced,
\`^\` is correct; once we slice, it is not.

Regression tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - \`##\`    | \` user\nDELETE…\`   → contaminated, marker=\`## user\`
  - \`## us\` | \`er\nDELETE…\`      → contaminated, marker=\`## user\`
  - \`## \`   | \`user\nDELETE…\`    → contaminated, marker=\`## user\`
  - \`#\`     | \`# user\nDELETE…\`  → contaminated, marker=\`## user\`
The fourth case (single \`#\` first chunk) exercises an even more
adversarial tokenization than the reviewer's examples; it is also
caught.

Total tests: 55 → 59.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(tests): wrap events in stream_event envelope in thinking test

feedJsonl was feeding raw events without the `{ type: 'stream_event',
event: ... }` wrapper that createClaudeStreamHandler requires (line 141
of claude-stream.ts). Events silently fell through all branches, making
both tests pass vacuously. Also fix TS2532 on warnings[0].marker with
non-null assertion (safe after the toHaveLength(1) guard).

Co-Authored-By: RoverKai <roverkai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: roverkai <2196140098@qq.com>
Co-authored-by: JasonBroderick <jason@buddyboss.com>
Co-authored-by: RoverKai <roverkai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 03:57:56 +00:00

297 lines
15 KiB
TypeScript
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

/**
* Shared utility for detecting and stripping fabricated role-marker lines
* (`## user`, `## assistant`, `## system`) injected by the model into its
* own output (see #3247 — same class as #2102 / #2464).
*
* `createRoleMarkerGuard()` — stateful per-message guard for structured
* stream handlers that can track message boundaries (Claude, Copilot,
* Qoder, OpenCode/Codex, Pi, ACP). Returns `{ feedText, contaminated,
* warningEvent }`.
*/
// Regex matching fabricated role-marker lines injected by the model into
// its own output. Anchored to start-of-line via (?:^|\n) so we don't
// false-positive on user prose like "here is the ## user content".
//
// Scope (deliberately narrow): Markdown-style `## user` / `## assistant`
// / `## assist` / `## system` only — these are the patterns the chat
// host actually parses as turn boundaries (see `buildDaemonTranscript`
// in apps/web/src/providers/daemon.ts). Chat-style markers like
// `User:` / `Assistant:` / `Human:` / `AI:` are intentionally NOT
// included, because:
// (1) The host never parses them as turn boundaries; a model emitting
// them does NOT cause the original #3247 security failure mode.
// (2) They collide with legitimate output far more often than the
// Markdown family (e.g., "User: bob@example.com", form labels,
// JSDoc lines). With kill-on-detection wired in server.ts
// (`abortForRoleMarker`), a false positive aborts the whole run
// — a much more expensive failure than a stray unflagged
// `User:` line in the chat scrollback.
// If a host frontend ever starts parsing chat-style markers as
// boundaries, narrow the additions to that frontend's specific
// path rather than the shared regex.
//
// Three deliberate refinements vs. a naive `## role` match:
//
// 1. CASE-SENSITIVE. The chat host's turn-boundary delimiter is
// lowercase (`## user` / `## assistant` / `## system` — see
// `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and
// the `## CRITICAL` system-prompt block forbids only the lowercase
// forms. Title-Case Markdown headings like `## User Guide`,
// `## System Architecture`, `## Assistant settings` are LEGITIMATE
// content (LLMs emit these constantly in technical writing) and
// must not contaminate. Matching with `/i` would deterministically
// abort any run that produced such a heading — exactly the
// "false positive aborts the whole run" cost the docblock cites
// as the reason to keep the regex narrow.
// (See PR #3303 review r3324151877.)
//
// 2. POSITIVE LOOKAHEAD `(?=[^a-z])`. Without it, `## userland`,
// `## userspace`, `## users guide`, `## systemd`, `## assistance`
// all match via prefix in the alternation. The positive lookahead
// requires the character after the role keyword to exist AND to NOT
// be a lowercase letter:
// - `## user\n…` → match (newline is not lowercase)
// - `## assistantR…` → match (R is uppercase; the glued-form
// attack pattern still gets caught)
// - `## assistant.` → match (. is not a letter)
// - `## users guide` → no match (s is lowercase letter)
// - `## userland` → no match (l is lowercase letter)
// Why POSITIVE `[^a-z]` rather than NEGATIVE `(?![a-z])`: the
// negative form is satisfied at end-of-string, which in a streaming
// context means "we have just received `## user` but don't know
// what comes next yet". A negative lookahead would fire prematurely
// if the rest of the role-keyword landed in a later chunk (e.g.
// the model emits `## user` then `land` arrives). The positive
// form requires an actual non-lowercase character to be present,
// so detection waits one more chunk in that edge case — a
// one-character latency traded for correctness.
//
// 3. `[ \t]` instead of `\s` for inner whitespace. `\s` matches
// newlines, which would let oddities like `##\nuser` match across
// lines. Markdown role markers are always single-line by
// convention; restricting to space/tab tightens the match without
// losing any real attack pattern.
//
// Alternation order: `assistant` is listed before `assist` so a
// fully-spelled `## assistant` consumes 9 chars (not 6) and the
// `(?![a-z])` check is applied at position 9 (after the full word)
// rather than position 6. Truncated forms (`## assist\n` from a
// stream cut mid-emission) still match via the `assist` branch.
export const FABRICATED_ROLE_MARKER_RE =
/(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|assist|system)(?=[^a-z])/;
// Internal-only variant used after the first chunk has been processed.
// Drops the `^` alternative: once `tail` is a rolling slice of
// mid-stream text, `^` no longer represents the genuine message start
// — applying it would let the regex anchor at an arbitrary cut point
// inside legitimate prose ("…take a look at the ## user content…"
// fed char-by-char would eventually slide a tail window onto leading
// whitespace + `## user` and false-positive). Only `\n`-preceded
// markers are real role boundaries on subsequent chunks; the preceding
// newline is retained inside the 64-char tail so genuine markers
// straddling a chunk boundary are still caught.
// (See PR #3303 review r3324060995.)
const NEWLINE_ANCHORED_ROLE_MARKER_RE =
/\n[ \t]*##[ \t]+(?:user|assistant|assist|system)(?=[^a-z])/;
// Pending-marker variants used in the no-match branch to detect a
// COMPLETE-but-unconfirmed marker prefix at the end of the buffer.
// Drop the `(?=[^a-z])` lookahead and anchor with `$` instead — the
// lookahead's whole purpose is to require a non-lowercase character
// AFTER the role keyword, which by definition can't be present when
// the chunk boundary fell exactly between the role keyword and its
// next byte. If one of these matches, the role keyword IS at the end
// of the current buffer; we withhold it and revisit on the next
// feed, where one of three things will happen:
// (1) The next char is non-lowercase → main regex matches →
// contaminated → withheld bytes dropped.
// (2) The next char is lowercase (e.g. `## userl…`) → main regex
// no longer matches the role keyword → withheld bytes are
// confirmed safe and emitted alongside the new chunk.
// (3) The role keyword is part of a longer word that itself is a
// role keyword (only `user` ⊂ `users`, etc. — none extend to
// a different role) → still case (2), since the extension is
// lowercase.
// This implements the suggested fix on review r3324277xxx —
// preserves the documented "everything from the marker onward is
// silently dropped" contract across chunk boundaries that fall
// inside the lookahead-detection window.
const FIRST_CHUNK_PENDING_MARKER_TAIL_RE =
/(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|assist|system)$/;
const NEWLINE_ANCHORED_PENDING_MARKER_TAIL_RE =
/\n[ \t]*##[ \t]+(?:user|assistant|assist|system)$/;
// Bounded tail size for cross-chunk matching. Must comfortably exceed
// the longest possible marker prefix:
// "\n" + whitespace run + "##" + whitespace + "assistant" ≈ 1624
// chars in practice (LLMs rarely emit more than a couple newlines or a
// handful of spaces between sections). 64 leaves generous margin and
// keeps the guard's memory + per-delta work O(1) regardless of message
// length — important because a 50KB assistant response delivered in
// 1000 chunks of 50 bytes is otherwise O(n²) on string concatenation
// alone.
const TAIL_BUFFER_SIZE = 64;
export interface RoleMarkerGuard {
/** Feed a text delta for the current message. Returns the safe portion
* to emit (may be shorter than `text` if a marker was found mid-chunk,
* or empty string if the entire chunk is past the cut point). */
feedText(text: string): string;
/** Whether a fabricated marker was detected (further text is dropped). */
readonly contaminated: boolean;
/** If contaminated, the warning event to emit. `null` if clean. */
warningEvent(): { type: 'fabricated_role_marker'; marker: string; messageId: string } | null;
}
/**
* Create a stateful guard that detects fabricated role markers across
* chunk boundaries. Memory + per-call work is O(1): instead of
* accumulating the full message text, the guard retains only a small
* trailing suffix (TAIL_BUFFER_SIZE chars) — enough for the matcher to
* see across chunk boundaries when a marker straddles them.
*
* Usage in a stream handler:
*
* const guard = createRoleMarkerGuard(messageId);
* for (const delta of deltas) {
* const safe = guard.feedText(delta.text);
* if (safe.length > 0) onEvent({ type: 'text_delta', delta: safe });
* if (guard.contaminated) {
* onEvent(guard.warningEvent()!);
* break; // stop emitting text for this message
* }
* }
*/
export function createRoleMarkerGuard(messageId: string): RoleMarkerGuard {
// Rolling tail of the bytes we have ALREADY EMITTED, capped at
// TAIL_BUFFER_SIZE. Used as the prefix when matching against new
// text so we catch markers that straddle a chunk boundary.
let tail = '';
// Bytes we have RECEIVED but DEFERRED — held back because they form
// a complete-but-unconfirmed marker suffix at the end of the buffer
// and we don't yet know whether the next chunk will confirm them
// (next char non-lowercase → contaminated, drop) or deny them
// (next char lowercase → suffix was part of a longer word, emit).
// Without this, a chunk boundary falling exactly between the role
// keyword and its lookahead char would leak the marker line itself
// into the UI / app.sqlite before we could classify it. See review
// r3324277xxx.
let pending = '';
// Tracks whether `tail` still represents the ENTIRE emission so
// far — i.e. no slicing has occurred yet and `^` in the canonical
// regex genuinely anchors at byte 0 of the message stream. While
// this holds, the `^|\n` alternation safely catches a role marker
// that arrives at the start of the stream even if its prefix is
// split across multiple chunks (`## ` | `user\n…`, `## us` | `er\n…`,
// `##` | ` user\n…`). The moment `tail` would exceed
// TAIL_BUFFER_SIZE, the slice turns `tail` into a mid-stream
// window and `^` no longer represents the stream start — we then
// switch to the newline-only variants so a sliding window cannot
// manufacture a match from prose. The transition is on slicing,
// not on first emission: earlier definitions ("any byte emitted",
// "newline emitted") both had failure modes — see PR #3303 reviews
// r3324060995 and r3324xxxxxx, and the regression tests below.
let firstChunk = true;
let _contaminated = false;
let markerText: string | null = null;
return {
get contaminated() {
return _contaminated;
},
feedText(text: string): string {
if (_contaminated) return '';
if (text.length === 0) return '';
// Combine `tail` (already-emitted suffix for cross-chunk matching),
// `pending` (deferred-from-prior-call suspicious suffix), and the
// new `text` into a single matching buffer.
const buffer = tail + pending + text;
const matchRe = firstChunk
? FABRICATED_ROLE_MARKER_RE
: NEWLINE_ANCHORED_ROLE_MARKER_RE;
const pendingRe = firstChunk
? FIRST_CHUNK_PENDING_MARKER_TAIL_RE
: NEWLINE_ANCHORED_PENDING_MARKER_TAIL_RE;
// `firstChunk` transitions are tied to actual byte emission, not
// feed count — see comment above. Transitioned at the end of
// this function only when we emit at least one byte.
const match = matchRe.exec(buffer);
if (match) {
// Marker confirmed. Compute the safe-to-emit portion (bytes
// between previously-emitted `tail` and the marker), drop
// `pending` (the deferred portion sits inside the marker
// region by definition once the lookahead char arrives), and
// mark contaminated. Subsequent feeds early-return.
_contaminated = true;
markerText = match[0].trim();
pending = '';
const alreadyEmitted = tail.length;
const markerStart = match.index;
if (markerStart <= alreadyEmitted) return '';
return buffer.slice(alreadyEmitted, markerStart);
}
// No confirmed marker. Check whether the buffer ends with a
// complete-but-unconfirmed marker prefix (role keyword present,
// lookahead char not yet arrived). If so, withhold that suffix
// until the next feed; emit the rest.
const pendingMatch = pendingRe.exec(buffer);
const alreadyEmitted = tail.length;
const pendingStart = pendingMatch
// Never withhold bytes we have already emitted in a prior
// feed — the suspicious suffix could in pathological cases
// start inside `tail` (we held back `pending` correctly on
// the prior call, but the suffix-start position is upstream
// of where we hold). Clamp to alreadyEmitted so safeToEmit
// never goes negative.
? Math.max(pendingMatch.index, alreadyEmitted)
: buffer.length;
const safeToEmit = buffer.slice(alreadyEmitted, pendingStart);
pending = buffer.slice(pendingStart);
// Roll the emitted-bytes tail forward.
const fullEmitted = tail + safeToEmit;
const willSlice = fullEmitted.length > TAIL_BUFFER_SIZE;
tail = willSlice
? fullEmitted.slice(fullEmitted.length - TAIL_BUFFER_SIZE)
: fullEmitted;
// `firstChunk` is true exactly while `tail` still represents the
// entire emission so far — i.e. no slice has occurred and `^` in
// the canonical regex genuinely anchors at byte 0 of the stream.
// The moment we slice (emitted bytes exceed TAIL_BUFFER_SIZE),
// `tail` becomes a mid-stream window, `^` becomes meaningless,
// and we switch to the newline-only variants.
//
// Earlier iterations of this code used "any byte emitted" or
// "newline emitted" as the transition trigger. Both were wrong:
// - "any byte" lost the `^` anchor before a chunk-split
// message-start marker (e.g. `## ` | `user\n…`,
// `## us` | `er\n…`) could finish arriving — see PR #3303
// review r3324xxxxxx, and the new tests below.
// - "newline emitted" left `^` valid on a sliced buffer for
// streams that hadn't yet emitted a newline, which then
// false-positived the rolling-tail mid-stream case from
// review r3324060995.
// Slice-based is the invariant that satisfies both: while we
// haven't sliced, `^` is correct; once we slice, it isn't.
if (willSlice) firstChunk = false;
return safeToEmit;
},
warningEvent() {
if (!_contaminated || !markerText) return null;
return {
type: 'fabricated_role_marker',
marker: markerText,
messageId,
};
},
};
}