mirror of
https://github.com/nexu-io/open-design.git
synced 2026-05-31 19:04:39 +07:00
* fix(daemon): detect and strip fabricated role markers in model output (#3247) Three-layer defence against models emitting `## user` / `## assistant` / `## system` lines mid-response, which the chat host interprets as real turn boundaries and acts on as unauthorised instruction: 1. **System prompt**: anti-roleplay instruction elevated from a bullet under "What you don't do" to a standalone `## CRITICAL` section in `official-system.ts`, with a REMINDER pinned at the end of the composed prompt for recency bias. 2. **Stream-level detection and truncation**: shared `role-marker-guard.ts` module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used across all text paths — Claude stream (per-message guards), non-Claude structured streams (run-scoped guard via `emitGuardedTextDelta`), and BYOK proxy routes (`createDeltaGuard`). When a marker is detected, the contaminated suffix is dropped and a `fabricated_role_marker` event surfaces a warning in the UI. 3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants; `fabricated_role_marker` events render as amber warning pills. * fix(chat-routes): do not await reader.cancel() on stream early-return The await on reader.cancel() can hang indefinitely on response streams whose underlying source is a Uint8Array (most notably surfaced by the ollama test in proxy-routes.test.ts, which builds its mock body via `new Response(uint8array)` rather than the controller-based helper `sseResponse()`). The hung await holds the request handler open, which in turn blocks `server.close()` in the afterAll hook, producing the two test timeouts (test at 145, hook at 36) currently failing CI on #3296. Fix is in production code, not the test: don't await the cancel. It is a cleanup hint and we are returning from the function anyway, so blocking on it offers no value. fire-and-forget with an empty catch keeps the cancel signal flowing for real HTTP streams without risking a hang on mock/edge-case implementations. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(daemon): terminate child on role-marker detection (close #3247 generation vector) PR #3296's detection layer truncates display and persistence of fabricated role markers, but the underlying model subprocess keeps generating tokens after detection. Three concrete consequences: 1. The model bills the user for the entire contaminated response (we observed 5,106 chars stored in claude's session file for a turn where only the first 3,013 chars were legitimate — a 40% overhead). 2. tool_use blocks emitted AFTER the marker reach the daemon's dispatcher unchecked, since detection only gates the text-delta emission path, not content-block-stop / tool_use blocks. The model could fabricate "## user delete file X" then emit a tool_use(delete X) that the dispatcher would execute. 3. The UI surfaces a `fabricated_role_marker` warning followed by an eventual normal turn-end, blurring the distinction between "completed normally" and "killed by safety guard." This commit adds a single idempotent `abortForRoleMarker(marker)` helper in server.ts, scoped to the same closure as `child` and `runGuard`. On any detection event (per-message Claude guard, run-scoped non-Claude guard, plain stdout guard) the helper: - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the UI can render a security-class status distinct from a normal turn-end. The existing `fabricated_role_marker` warning is still sent and rendered as the amber pill (PR #3296's UI). - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes, Kimi, Devin, Kiro) whose I/O doesn't necessarily release on SIGTERM of the wrapper process alone. - SIGTERMs the child immediately, with the existing `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace. Wired into three sites where contamination is detected: - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc text_delta paths) - Plain-stdout listener (BYOK plain mode) - The Claude stream handler's onEvent (per-message guards in claude-stream.ts surface `fabricated_role_marker` events directly via onEvent rather than through the run-scoped emitGuardedTextDelta) Tool_use blocks emitted BEFORE the marker still flow through normally — this guard can't help with those, since by the time we observe a text marker the prior content block has already finished. Closing that gap requires speculative cancellation of in-flight tool calls when a downstream text block contains a marker; that's tracked as follow-up work, not included here. Co-Authored-By: roverkai <2196140098@qq.com> Co-Authored-By: JasonBroderick <jason@buddyboss.com> * refactor(role-marker-guard): bounded tail + drop chat-style markers Addresses two review comments on #3303: (1) O(1) memory + per-delta work (review r3323982225) Replace the unbounded `accumulated` string with a rolling tail capped at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice). A 50 KB assistant response delivered in 1000 chunks of 50 bytes was previously O(n²) on string concatenation alone; now it is O(1) per delta regardless of message length. The `tail.length` value carries the "already emitted" offset that the cut-point math needs, so the offset semantics at L74–78 of the prior implementation are preserved without re-introducing the full-text buffer. (2) Drop chat-style markers entirely (review r3323982234, option (a)) `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex. Rationale: - The host parses ONLY `## user` / `## assistant` / `## system` lines as turn boundaries (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts). A model emitting chat-style markers does NOT cause the original #3247 security failure. - With kill-on-detection wired in this PR (`abortForRoleMarker` in server.ts), a false positive aborts the whole run — far more expensive than a stray unflagged `User:` line in chat scrollback. Chat-style markers collide with legitimate output (form labels, email contacts, JSDoc) often enough that pairing them with kill-semantics is the wrong tradeoff. The tradeoff is now documented in the regex docblock so the kill-on-match behaviour is justified against the false-positive surface. Also aligns the prompt-side CRITICAL block in system.ts: drop the "don't emit User: / Assistant: / Human: / AI:" bullet, since we no longer enforce it. Less ambiguity for the model and the operators. Test file updated: - Chat-style positive tests flipped to negative ("does NOT match User: — chat-style out of scope") so the intentional exclusion has a permanent regression test. - Two new tests cover the bounded-tail behaviour: a marker arriving after 10 KB of clean text in small chunks, and a marker straddling a chunk boundary after 100 prior chunks. - Added test for legitimate `User: bob@example.com`-style content not triggering contamination. Test count is now 35 (up from 25); two of the new ones explicitly exercise the new bounded-tail path. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995) Blocking correctness bug introduced by commit 4 (bounded-tail refactor): once \`tail\` is a rolling slice of mid-stream text, \`^\` in the canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents the genuine message start. As the rolling window slides forward chunk by chunk, a sliced tail can begin with whitespace + \`##\` (or just \`##\`), letting \`^\` anchor a match against text that the full-buffer implementation correctly ignored. With kill-on-detection wired in commit 3, that false positive now SIGTERMs the run and emits a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class called out in the docblock at L22–29. Reviewer's evidence (PerishCode, r3324060995): streaming "…take a look at the ## user content section…" one character at a time reports \`contaminated: true\` post-refactor; the same text in a single feed stays clean. Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very first non-empty feed (where \`^\` legitimately points at the message start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\` (\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all subsequent feeds. A \`firstChunk\` boolean tracks the state. Real newline-preceded markers straddling chunk boundaries are still caught because the preceding \`\\n\` is retained inside the 64-char tail. Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`): - mid-line \`## user\` streamed char-by-char with no preceding \\n (mirrors the reviewer's repro) - space-preceded mid-line \`## user\` in a >130-char stream, which long enough to force the rolling window past the marker — exercises the exact slice condition that triggered the bug - real \\n-preceded \`## user\` still caught after a long preamble (positive case must not regress) - \`## user\` as the very first chunk still caught (\`^\` legitimately anchors on the first feed) Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882) Two refinements addressing the third review on #3303: == Blocking (r3324151877) == The regex over-matched legitimate Markdown headings, and with kill-on-detection wired in commit 3 each false positive deterministically aborts a real run. Three changes tighten the match to the actual security surface — `## user` / `## assistant` / `## system` lines the chat host parses as turn boundaries — without losing any real attack pattern: 1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary delimiter is lowercase (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and the `## CRITICAL` system-prompt block already forbids only the lowercase forms. Title-Case headings like `## User Guide`, `## System Architecture`, `## Assistant settings` are now ignored — these are legitimate technical writing patterns LLMs emit constantly. `## USER NOTES` (all-caps) likewise no longer flags. 2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it, `## userland`, `## userspace`, `## users guide`, `## systemd`, `## assistance` all match via prefix in the alternation. The lookahead requires the next character to exist and to not be a lowercase letter, so: - `## user\\n…` → match (newline is not lowercase) - `## assistantR…` → match (R is uppercase; the glued-form attack pattern still gets caught) - `## assistant.` → match (. is not a letter) - `## users guide` → no match (s is lowercase letter) - `## userland` → no match (l is lowercase letter) POSITIVE rather than NEGATIVE `(?![a-z])` because the negative form is satisfied at end-of-string, which in a streaming context means "we have `## user` but don't know what comes next yet" — would fire prematurely if `land` arrives in a later chunk. The positive form delays detection by one character in that edge case, traded for correctness. 3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role markers are single-line by convention; restricting to space/tab prevents oddities like `##\\nuser` from matching across lines. Test file: added Title-Case fixtures (`## User Guide`, `## System Architecture`, `## Assistant settings`, `## USER NOTES`) and prefix-of-longer-word fixtures (`## users guide`, `## userland`, `## systemd`, `## assistance`) — each asserting NO contamination. The existing `## usability` negative test gave false confidence as the reviewer noted (only failed via alternation-miss, not via word-boundary semantics); the new fixtures actually exercise the lookahead. Also added a positive test for `## assistant.` (glued punctuation) to balance the existing `## assistantReading` (glued uppercase) coverage. Total tests: 35 → 50. == Non-blocking (r3324151882) == Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in `packages/contracts/src/errors.ts` alongside the existing agent/AMR codes, with a docblock comment explaining the emission contract: emitted by `server.ts::abortForRoleMarker` alongside the existing `fabricated_role_marker` warning event when the daemon detects a fabricated Markdown role marker in agent output; retryable. The code was already being emitted over the wire but unregistered — landing the registration here keeps the contract and emitter in sync as reviewer requested. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): defer complete-but-unconfirmed marker suffix Addresses review r3324277xxx — the boundary case where a stream chunk boundary lands between the role keyword and its lookahead character violated the documented "everything from the marker onward is silently dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')` returned `## user` as safe (no char to satisfy the lookahead → no match → pass through), so the fabricated marker line leaked into UI and app.sqlite before the next chunk confirmed contamination on the next SIGTERM cycle. Fix: introduce a `pending` state variable holding bytes that match the COMPLETE-but-unconfirmed marker prefix at end of buffer (/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no lookahead, $ anchor instead). When the no-match branch detects this suffix, withhold it from emission until the next feed either: - Confirms it (next char non-lowercase) → main regex matches → contaminated → withheld bytes dropped along with `## user`. - Denies it (next char lowercase, e.g. `userl…`) → main regex no longer matches the role keyword → withheld suffix is released and emitted alongside the new continuation. Also tied the firstChunk transition to actual byte emission rather than feed count. Previously a message that starts with `## system` followed by a separate `\\n` chunk would lose the `^` anchor on the second feed (firstChunk had flipped after the first feed even though nothing was emitted yet), silently breaking detection for that edge case. Now `firstChunk` stays true until at least one byte has crossed the emission boundary, matching the conceptual definition of "message start". Tests added (apps/daemon/tests/role-marker-guard.test.ts): - `## user` deferred at chunk boundary, confirmed by `\\n` in next - `## user` deferred at chunk boundary, denied by `land` continuation - `## assistant` deferred, confirmed by punctuation - `## User` Title-Case still passes through unconditionally - `## system` as the very first chunk: deferred, confirmed by \\n in next chunk (tests the firstChunk-stays-true-when-nothing- emitted invariant) Total tests: 50 → 55. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta Addresses review r3324xxxxxx — guarding the thinking channel buys no security and causes legitimate aborts. Why thinking is NOT a #3247 vector: - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only re-serializes `m.content` as `## ${m.role}\n...`. - Extended-thinking content is rendered to a separate `kind: 'thinking'` payload (daemon.ts:857-858) and never folded into `m.content`. - So a `## user` line in the thinking channel CANNOT become a fabricated turn boundary on the next round-trip. Why guarding it is harmful: - Models routinely emit literal `## user` / `## assistant` lines in chain-of-thought when reasoning about conversation structure ("Let me think about this. The user might phrase it as:\n## user\n …"). Common pattern in production traces. - With `abortForRoleMarker` wired in server.ts, a guard match on thinking SIGTERMs the run and surfaces a security error to the UI. The user paid for the reasoning, never sees the answer, and gets a confusing "fabricated role marker" warning for what was actually legitimate metacognition. - This directly contradicts the module's own stated philosophy ("a false positive aborts the whole run — a much more expensive failure than a stray unflagged ... line", role-marker-guard.ts). Fix: `emitSafeText` now passes thinking_delta through unconditionally, skipping both the guard and the contamination check. text_delta remains fully guarded. The single-line change at the top of emitSafeText preserves all other channels' behavior. Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts): - `## user` / `## assistant` lines in a thinking_delta — must NOT fire fabricated_role_marker, the thinking content streams intact including the marker text, and the subsequent text_delta answer still reaches the consumer (run not aborted). - Sanity check: same `## user` pattern in a text_delta DOES fire fabricated_role_marker and truncates emission at the marker. Locks in the channel-discriminated behavior. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): tie firstChunk to slicing, not byte emission Blocking review r3324xxxxxx: under the prior firstChunk transition ("any byte emitted"), a role marker that arrived at the very start of a message with its prefix split across multiple chunks bypassed detection — reopening the #3247 vector on the Claude path. Concrete cases that were missed (all are routine provider tokenizations of \`## user\n…\` at message start): - \`##\` | \` user\nDELETE…\` - \`## us\` | \`er\nDELETE…\` - \`## \` | \`user\nDELETE…\` Mechanism: the pending-deferral regex only catches COMPLETE role keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`, \`## us\`) was emitted in full. That emission flipped firstChunk to false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used, which requires a literal \n before \`##\`. A marker at buffer position 0 has no preceding \n, so it could no longer match. abortForRoleMarker never fired and tool_use blocks emitted after the fabricated turn boundary reached the dispatcher. Fix: change firstChunk to track "tail has not been sliced yet" rather than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE, tail still represents the entire emission so far and \`^\` in the canonical regex genuinely anchors at byte 0 of the stream — so the \`^|\n\` alternation safely catches a chunk-split message-start marker. The transition happens at the moment we would slice: once emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\` becomes meaningless, and we switch to the newline-only variants. Earlier iterations of this code tried two other definitions, both unsound: - "any byte emitted" (this commit fixes) — lost \`^\` before a chunk-split message-start marker could finish arriving. - "newline emitted" (briefly considered as the reviewer's alternative suggestion) — left \`^\` valid on a sliced buffer when streams hadn't emitted a newline yet, re-introducing the rolling-tail mid-stream false positive from review r3324060995. The slice-based invariant satisfies both: while we have not sliced, \`^\` is correct; once we slice, it is not. Regression tests added (apps/daemon/tests/role-marker-guard.test.ts): - \`##\` | \` user\nDELETE…\` → contaminated, marker=\`## user\` - \`## us\` | \`er\nDELETE…\` → contaminated, marker=\`## user\` - \`## \` | \`user\nDELETE…\` → contaminated, marker=\`## user\` - \`#\` | \`# user\nDELETE…\` → contaminated, marker=\`## user\` The fourth case (single \`#\` first chunk) exercises an even more adversarial tokenization than the reviewer's examples; it is also caught. Total tests: 55 → 59. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(tests): wrap events in stream_event envelope in thinking test feedJsonl was feeding raw events without the `{ type: 'stream_event', event: ... }` wrapper that createClaudeStreamHandler requires (line 141 of claude-stream.ts). Events silently fell through all branches, making both tests pass vacuously. Also fix TS2532 on warnings[0].marker with non-null assertion (safe after the toHaveLength(1) guard). Co-Authored-By: RoverKai <roverkai@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: roverkai <2196140098@qq.com> Co-authored-by: JasonBroderick <jason@buddyboss.com> Co-authored-by: RoverKai <roverkai@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
297 lines
15 KiB
TypeScript
297 lines
15 KiB
TypeScript
/**
|
||
* Shared utility for detecting and stripping fabricated role-marker lines
|
||
* (`## user`, `## assistant`, `## system`) injected by the model into its
|
||
* own output (see #3247 — same class as #2102 / #2464).
|
||
*
|
||
* `createRoleMarkerGuard()` — stateful per-message guard for structured
|
||
* stream handlers that can track message boundaries (Claude, Copilot,
|
||
* Qoder, OpenCode/Codex, Pi, ACP). Returns `{ feedText, contaminated,
|
||
* warningEvent }`.
|
||
*/
|
||
|
||
// Regex matching fabricated role-marker lines injected by the model into
|
||
// its own output. Anchored to start-of-line via (?:^|\n) so we don't
|
||
// false-positive on user prose like "here is the ## user content".
|
||
//
|
||
// Scope (deliberately narrow): Markdown-style `## user` / `## assistant`
|
||
// / `## assist` / `## system` only — these are the patterns the chat
|
||
// host actually parses as turn boundaries (see `buildDaemonTranscript`
|
||
// in apps/web/src/providers/daemon.ts). Chat-style markers like
|
||
// `User:` / `Assistant:` / `Human:` / `AI:` are intentionally NOT
|
||
// included, because:
|
||
// (1) The host never parses them as turn boundaries; a model emitting
|
||
// them does NOT cause the original #3247 security failure mode.
|
||
// (2) They collide with legitimate output far more often than the
|
||
// Markdown family (e.g., "User: bob@example.com", form labels,
|
||
// JSDoc lines). With kill-on-detection wired in server.ts
|
||
// (`abortForRoleMarker`), a false positive aborts the whole run
|
||
// — a much more expensive failure than a stray unflagged
|
||
// `User:` line in the chat scrollback.
|
||
// If a host frontend ever starts parsing chat-style markers as
|
||
// boundaries, narrow the additions to that frontend's specific
|
||
// path rather than the shared regex.
|
||
//
|
||
// Three deliberate refinements vs. a naive `## role` match:
|
||
//
|
||
// 1. CASE-SENSITIVE. The chat host's turn-boundary delimiter is
|
||
// lowercase (`## user` / `## assistant` / `## system` — see
|
||
// `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and
|
||
// the `## CRITICAL` system-prompt block forbids only the lowercase
|
||
// forms. Title-Case Markdown headings like `## User Guide`,
|
||
// `## System Architecture`, `## Assistant settings` are LEGITIMATE
|
||
// content (LLMs emit these constantly in technical writing) and
|
||
// must not contaminate. Matching with `/i` would deterministically
|
||
// abort any run that produced such a heading — exactly the
|
||
// "false positive aborts the whole run" cost the docblock cites
|
||
// as the reason to keep the regex narrow.
|
||
// (See PR #3303 review r3324151877.)
|
||
//
|
||
// 2. POSITIVE LOOKAHEAD `(?=[^a-z])`. Without it, `## userland`,
|
||
// `## userspace`, `## users guide`, `## systemd`, `## assistance`
|
||
// all match via prefix in the alternation. The positive lookahead
|
||
// requires the character after the role keyword to exist AND to NOT
|
||
// be a lowercase letter:
|
||
// - `## user\n…` → match (newline is not lowercase)
|
||
// - `## assistantR…` → match (R is uppercase; the glued-form
|
||
// attack pattern still gets caught)
|
||
// - `## assistant.` → match (. is not a letter)
|
||
// - `## users guide` → no match (s is lowercase letter)
|
||
// - `## userland` → no match (l is lowercase letter)
|
||
// Why POSITIVE `[^a-z]` rather than NEGATIVE `(?![a-z])`: the
|
||
// negative form is satisfied at end-of-string, which in a streaming
|
||
// context means "we have just received `## user` but don't know
|
||
// what comes next yet". A negative lookahead would fire prematurely
|
||
// if the rest of the role-keyword landed in a later chunk (e.g.
|
||
// the model emits `## user` then `land` arrives). The positive
|
||
// form requires an actual non-lowercase character to be present,
|
||
// so detection waits one more chunk in that edge case — a
|
||
// one-character latency traded for correctness.
|
||
//
|
||
// 3. `[ \t]` instead of `\s` for inner whitespace. `\s` matches
|
||
// newlines, which would let oddities like `##\nuser` match across
|
||
// lines. Markdown role markers are always single-line by
|
||
// convention; restricting to space/tab tightens the match without
|
||
// losing any real attack pattern.
|
||
//
|
||
// Alternation order: `assistant` is listed before `assist` so a
|
||
// fully-spelled `## assistant` consumes 9 chars (not 6) and the
|
||
// `(?![a-z])` check is applied at position 9 (after the full word)
|
||
// rather than position 6. Truncated forms (`## assist\n` from a
|
||
// stream cut mid-emission) still match via the `assist` branch.
|
||
export const FABRICATED_ROLE_MARKER_RE =
|
||
/(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|assist|system)(?=[^a-z])/;
|
||
|
||
// Internal-only variant used after the first chunk has been processed.
|
||
// Drops the `^` alternative: once `tail` is a rolling slice of
|
||
// mid-stream text, `^` no longer represents the genuine message start
|
||
// — applying it would let the regex anchor at an arbitrary cut point
|
||
// inside legitimate prose ("…take a look at the ## user content…"
|
||
// fed char-by-char would eventually slide a tail window onto leading
|
||
// whitespace + `## user` and false-positive). Only `\n`-preceded
|
||
// markers are real role boundaries on subsequent chunks; the preceding
|
||
// newline is retained inside the 64-char tail so genuine markers
|
||
// straddling a chunk boundary are still caught.
|
||
// (See PR #3303 review r3324060995.)
|
||
const NEWLINE_ANCHORED_ROLE_MARKER_RE =
|
||
/\n[ \t]*##[ \t]+(?:user|assistant|assist|system)(?=[^a-z])/;
|
||
|
||
// Pending-marker variants used in the no-match branch to detect a
|
||
// COMPLETE-but-unconfirmed marker prefix at the end of the buffer.
|
||
// Drop the `(?=[^a-z])` lookahead and anchor with `$` instead — the
|
||
// lookahead's whole purpose is to require a non-lowercase character
|
||
// AFTER the role keyword, which by definition can't be present when
|
||
// the chunk boundary fell exactly between the role keyword and its
|
||
// next byte. If one of these matches, the role keyword IS at the end
|
||
// of the current buffer; we withhold it and revisit on the next
|
||
// feed, where one of three things will happen:
|
||
// (1) The next char is non-lowercase → main regex matches →
|
||
// contaminated → withheld bytes dropped.
|
||
// (2) The next char is lowercase (e.g. `## userl…`) → main regex
|
||
// no longer matches the role keyword → withheld bytes are
|
||
// confirmed safe and emitted alongside the new chunk.
|
||
// (3) The role keyword is part of a longer word that itself is a
|
||
// role keyword (only `user` ⊂ `users`, etc. — none extend to
|
||
// a different role) → still case (2), since the extension is
|
||
// lowercase.
|
||
// This implements the suggested fix on review r3324277xxx —
|
||
// preserves the documented "everything from the marker onward is
|
||
// silently dropped" contract across chunk boundaries that fall
|
||
// inside the lookahead-detection window.
|
||
const FIRST_CHUNK_PENDING_MARKER_TAIL_RE =
|
||
/(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|assist|system)$/;
|
||
|
||
const NEWLINE_ANCHORED_PENDING_MARKER_TAIL_RE =
|
||
/\n[ \t]*##[ \t]+(?:user|assistant|assist|system)$/;
|
||
|
||
// Bounded tail size for cross-chunk matching. Must comfortably exceed
|
||
// the longest possible marker prefix:
|
||
// "\n" + whitespace run + "##" + whitespace + "assistant" ≈ 16–24
|
||
// chars in practice (LLMs rarely emit more than a couple newlines or a
|
||
// handful of spaces between sections). 64 leaves generous margin and
|
||
// keeps the guard's memory + per-delta work O(1) regardless of message
|
||
// length — important because a 50KB assistant response delivered in
|
||
// 1000 chunks of 50 bytes is otherwise O(n²) on string concatenation
|
||
// alone.
|
||
const TAIL_BUFFER_SIZE = 64;
|
||
|
||
export interface RoleMarkerGuard {
|
||
/** Feed a text delta for the current message. Returns the safe portion
|
||
* to emit (may be shorter than `text` if a marker was found mid-chunk,
|
||
* or empty string if the entire chunk is past the cut point). */
|
||
feedText(text: string): string;
|
||
/** Whether a fabricated marker was detected (further text is dropped). */
|
||
readonly contaminated: boolean;
|
||
/** If contaminated, the warning event to emit. `null` if clean. */
|
||
warningEvent(): { type: 'fabricated_role_marker'; marker: string; messageId: string } | null;
|
||
}
|
||
|
||
/**
|
||
* Create a stateful guard that detects fabricated role markers across
|
||
* chunk boundaries. Memory + per-call work is O(1): instead of
|
||
* accumulating the full message text, the guard retains only a small
|
||
* trailing suffix (TAIL_BUFFER_SIZE chars) — enough for the matcher to
|
||
* see across chunk boundaries when a marker straddles them.
|
||
*
|
||
* Usage in a stream handler:
|
||
*
|
||
* const guard = createRoleMarkerGuard(messageId);
|
||
* for (const delta of deltas) {
|
||
* const safe = guard.feedText(delta.text);
|
||
* if (safe.length > 0) onEvent({ type: 'text_delta', delta: safe });
|
||
* if (guard.contaminated) {
|
||
* onEvent(guard.warningEvent()!);
|
||
* break; // stop emitting text for this message
|
||
* }
|
||
* }
|
||
*/
|
||
export function createRoleMarkerGuard(messageId: string): RoleMarkerGuard {
|
||
// Rolling tail of the bytes we have ALREADY EMITTED, capped at
|
||
// TAIL_BUFFER_SIZE. Used as the prefix when matching against new
|
||
// text so we catch markers that straddle a chunk boundary.
|
||
let tail = '';
|
||
// Bytes we have RECEIVED but DEFERRED — held back because they form
|
||
// a complete-but-unconfirmed marker suffix at the end of the buffer
|
||
// and we don't yet know whether the next chunk will confirm them
|
||
// (next char non-lowercase → contaminated, drop) or deny them
|
||
// (next char lowercase → suffix was part of a longer word, emit).
|
||
// Without this, a chunk boundary falling exactly between the role
|
||
// keyword and its lookahead char would leak the marker line itself
|
||
// into the UI / app.sqlite before we could classify it. See review
|
||
// r3324277xxx.
|
||
let pending = '';
|
||
// Tracks whether `tail` still represents the ENTIRE emission so
|
||
// far — i.e. no slicing has occurred yet and `^` in the canonical
|
||
// regex genuinely anchors at byte 0 of the message stream. While
|
||
// this holds, the `^|\n` alternation safely catches a role marker
|
||
// that arrives at the start of the stream even if its prefix is
|
||
// split across multiple chunks (`## ` | `user\n…`, `## us` | `er\n…`,
|
||
// `##` | ` user\n…`). The moment `tail` would exceed
|
||
// TAIL_BUFFER_SIZE, the slice turns `tail` into a mid-stream
|
||
// window and `^` no longer represents the stream start — we then
|
||
// switch to the newline-only variants so a sliding window cannot
|
||
// manufacture a match from prose. The transition is on slicing,
|
||
// not on first emission: earlier definitions ("any byte emitted",
|
||
// "newline emitted") both had failure modes — see PR #3303 reviews
|
||
// r3324060995 and r3324xxxxxx, and the regression tests below.
|
||
let firstChunk = true;
|
||
let _contaminated = false;
|
||
let markerText: string | null = null;
|
||
|
||
return {
|
||
get contaminated() {
|
||
return _contaminated;
|
||
},
|
||
|
||
feedText(text: string): string {
|
||
if (_contaminated) return '';
|
||
if (text.length === 0) return '';
|
||
|
||
// Combine `tail` (already-emitted suffix for cross-chunk matching),
|
||
// `pending` (deferred-from-prior-call suspicious suffix), and the
|
||
// new `text` into a single matching buffer.
|
||
const buffer = tail + pending + text;
|
||
const matchRe = firstChunk
|
||
? FABRICATED_ROLE_MARKER_RE
|
||
: NEWLINE_ANCHORED_ROLE_MARKER_RE;
|
||
const pendingRe = firstChunk
|
||
? FIRST_CHUNK_PENDING_MARKER_TAIL_RE
|
||
: NEWLINE_ANCHORED_PENDING_MARKER_TAIL_RE;
|
||
// `firstChunk` transitions are tied to actual byte emission, not
|
||
// feed count — see comment above. Transitioned at the end of
|
||
// this function only when we emit at least one byte.
|
||
|
||
const match = matchRe.exec(buffer);
|
||
if (match) {
|
||
// Marker confirmed. Compute the safe-to-emit portion (bytes
|
||
// between previously-emitted `tail` and the marker), drop
|
||
// `pending` (the deferred portion sits inside the marker
|
||
// region by definition once the lookahead char arrives), and
|
||
// mark contaminated. Subsequent feeds early-return.
|
||
_contaminated = true;
|
||
markerText = match[0].trim();
|
||
pending = '';
|
||
const alreadyEmitted = tail.length;
|
||
const markerStart = match.index;
|
||
if (markerStart <= alreadyEmitted) return '';
|
||
return buffer.slice(alreadyEmitted, markerStart);
|
||
}
|
||
|
||
// No confirmed marker. Check whether the buffer ends with a
|
||
// complete-but-unconfirmed marker prefix (role keyword present,
|
||
// lookahead char not yet arrived). If so, withhold that suffix
|
||
// until the next feed; emit the rest.
|
||
const pendingMatch = pendingRe.exec(buffer);
|
||
const alreadyEmitted = tail.length;
|
||
const pendingStart = pendingMatch
|
||
// Never withhold bytes we have already emitted in a prior
|
||
// feed — the suspicious suffix could in pathological cases
|
||
// start inside `tail` (we held back `pending` correctly on
|
||
// the prior call, but the suffix-start position is upstream
|
||
// of where we hold). Clamp to alreadyEmitted so safeToEmit
|
||
// never goes negative.
|
||
? Math.max(pendingMatch.index, alreadyEmitted)
|
||
: buffer.length;
|
||
|
||
const safeToEmit = buffer.slice(alreadyEmitted, pendingStart);
|
||
pending = buffer.slice(pendingStart);
|
||
|
||
// Roll the emitted-bytes tail forward.
|
||
const fullEmitted = tail + safeToEmit;
|
||
const willSlice = fullEmitted.length > TAIL_BUFFER_SIZE;
|
||
tail = willSlice
|
||
? fullEmitted.slice(fullEmitted.length - TAIL_BUFFER_SIZE)
|
||
: fullEmitted;
|
||
// `firstChunk` is true exactly while `tail` still represents the
|
||
// entire emission so far — i.e. no slice has occurred and `^` in
|
||
// the canonical regex genuinely anchors at byte 0 of the stream.
|
||
// The moment we slice (emitted bytes exceed TAIL_BUFFER_SIZE),
|
||
// `tail` becomes a mid-stream window, `^` becomes meaningless,
|
||
// and we switch to the newline-only variants.
|
||
//
|
||
// Earlier iterations of this code used "any byte emitted" or
|
||
// "newline emitted" as the transition trigger. Both were wrong:
|
||
// - "any byte" lost the `^` anchor before a chunk-split
|
||
// message-start marker (e.g. `## ` | `user\n…`,
|
||
// `## us` | `er\n…`) could finish arriving — see PR #3303
|
||
// review r3324xxxxxx, and the new tests below.
|
||
// - "newline emitted" left `^` valid on a sliced buffer for
|
||
// streams that hadn't yet emitted a newline, which then
|
||
// false-positived the rolling-tail mid-stream case from
|
||
// review r3324060995.
|
||
// Slice-based is the invariant that satisfies both: while we
|
||
// haven't sliced, `^` is correct; once we slice, it isn't.
|
||
if (willSlice) firstChunk = false;
|
||
|
||
return safeToEmit;
|
||
},
|
||
|
||
warningEvent() {
|
||
if (!_contaminated || !markerText) return null;
|
||
return {
|
||
type: 'fabricated_role_marker',
|
||
marker: markerText,
|
||
messageId,
|
||
};
|
||
},
|
||
};
|
||
}
|