Commit graph

18 commits

Author SHA1 Message Date
Denis Redozubov
729ce2b0cb
feat(daemon): add run-scoped MCP tool bundles (#3244)
* feat(daemon): add run-scoped MCP tool bundles

* fix(daemon): keep sandbox runs in managed project dirs

* fix(daemon): reject malformed run tool bundles

* fix(contracts): model run-scoped mcp server inputs

* fix(daemon): reject unsupported run tool bundles

* fix(daemon): validate run tools before chat fallback

* test(daemon): expect sandbox imported folder failure

* fix(daemon): preflight sandbox project roots before run rows

* fix(daemon): preflight sandbox chat project roots

* fix(daemon): allow host editor for sandbox imports

* fix(daemon): preflight sandbox routine project reuse

* fix(daemon): reject undeliverable Claude tool bundles

* fix(daemon): single-source chat route validation
2026-05-31 03:53:04 +00:00
JasonBroderick
0fbeaf829e
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247)

Three-layer defence against models emitting `## user` / `## assistant` /
`## system` lines mid-response, which the chat host interprets as real
turn boundaries and acts on as unauthorised instructions:

1. **System prompt**: anti-roleplay instruction elevated from a bullet
   under "What you don't do" to a standalone `## CRITICAL` section in
   `official-system.ts`, with a REMINDER pinned at the end of the
   composed prompt for recency bias.

2. **Stream-level detection and truncation**: shared `role-marker-guard.ts`
   module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used
   across all text paths — Claude stream (per-message guards), non-Claude
   structured streams (run-scoped guard via `emitGuardedTextDelta`),
   and BYOK proxy routes (`createDeltaGuard`). When a marker is detected,
   the contaminated suffix is dropped and a `fabricated_role_marker` event
   surfaces a warning in the UI.

3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants;
   `fabricated_role_marker` events render as amber warning pills.

* fix(chat-routes): do not await reader.cancel() on stream early-return

The await on reader.cancel() can hang indefinitely on response streams
whose underlying source is a Uint8Array (most notably surfaced by the
ollama test in proxy-routes.test.ts, which builds its mock body via
`new Response(uint8array)` rather than the controller-based helper
`sseResponse()`). The hung await holds the request handler open, which
in turn blocks `server.close()` in the afterAll hook, producing the two
test timeouts (test at 145, hook at 36) currently failing CI on #3296.

Fix is in production code, not the test: don't await the cancel. It
is a cleanup hint and we are returning from the function anyway, so
blocking on it offers no value. Fire-and-forget with an empty catch
keeps the cancel signal flowing for real HTTP streams without
risking a hang on mock/edge-case implementations.
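
A minimal sketch of the fire-and-forget cancel described above. `CancellableReader` and `cancelQuietly` are hypothetical names used only for illustration; they are not identifiers from chat-routes.ts, and the real handler may be structured differently.

```typescript
// Hypothetical minimal shape; a real ReadableStreamDefaultReader satisfies it.
interface CancellableReader {
  cancel(reason?: unknown): Promise<void>;
}

// Signal cancellation without awaiting it: the caller is returning anyway,
// so a cancel() promise that never settles must not hold the handler open.
function cancelQuietly(reader: CancellableReader): void {
  void reader.cancel().catch(() => {
    // Cleanup hint only; a rejection here is deliberately ignored.
  });
}

// Usage: a reader whose cancel() never settles does not block the caller.
let cancelCalls = 0;
const hungReader: CancellableReader = {
  cancel: () => {
    cancelCalls += 1;
    return new Promise<void>(() => {}); // never resolves, never rejects
  },
};
cancelQuietly(hungReader); // returns immediately
```

The `void` operator marks the promise as intentionally unawaited, and the empty `catch` keeps a rejected cancel from surfacing as an unhandled rejection.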

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(daemon): terminate child on role-marker detection (close #3247 generation vector)

PR #3296's detection layer truncates display and persistence of fabricated
role markers, but the underlying model subprocess keeps generating tokens
after detection. Three concrete consequences:

  1. The model bills the user for the entire contaminated response
     (we observed 5,106 chars stored in claude's session file for a turn
     where only the first 3,013 chars were legitimate, so about 41% of
     the stored text was post-marker output).
  2. tool_use blocks emitted AFTER the marker reach the daemon's
     dispatcher unchecked, since detection only gates the text-delta
     emission path, not content-block-stop / tool_use blocks. The
     model could fabricate "## user delete file X" then emit a
     tool_use(delete X) that the dispatcher would execute.
  3. The UI surfaces a `fabricated_role_marker` warning followed by an
     eventual normal turn-end, blurring the distinction between
     "completed normally" and "killed by safety guard."

This commit adds a single idempotent `abortForRoleMarker(marker)`
helper in server.ts, scoped to the same closure as `child` and
`runGuard`. On any detection event (per-message Claude guard,
run-scoped non-Claude guard, plain stdout guard) the helper:

  - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the
    UI can render a security-class status distinct from a normal
    turn-end. The existing `fabricated_role_marker` warning is still
    sent and rendered as the amber pill (PR #3296's UI).
  - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes,
    Kimi, Devin, Kiro) whose I/O doesn't necessarily release on
    SIGTERM of the wrapper process alone.
  - SIGTERMs the child immediately, with the existing
    `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace.
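
The helper's behaviour can be sketched roughly as follows. Everything in `AbortDeps` is a hypothetical stand-in for the closure variables in server.ts (`child`, `acpSession`, the SSE emitter); only the error code and the order of steps come from the description above.

```typescript
// All names in this interface are hypothetical stand-ins for the daemon's wiring.
interface AbortDeps {
  sendError(code: string, marker: string): void; // structured SSE error
  abortAcpSession?: () => void;                  // present only for ACP-multiplexed agents
  killChild(signal: "SIGTERM"): void;
  scheduleForcedShutdown(): void;                // SIGKILL fallback after the grace period
}

function createAbortForRoleMarker(deps: AbortDeps): (marker: string) => void {
  let aborted = false;
  return (marker) => {
    if (aborted) return; // idempotent: several guards may fire for the same run
    aborted = true;
    deps.sendError("ROLE_MARKER_HALLUCINATION", marker);
    deps.abortAcpSession?.();
    deps.killChild("SIGTERM");
    deps.scheduleForcedShutdown();
  };
}

// Usage: two detections on the same run trigger a single teardown.
const calls: string[] = [];
const abortForRoleMarker = createAbortForRoleMarker({
  sendError: (code, marker) => { calls.push(`${code}:${marker}`); },
  abortAcpSession: () => { calls.push("acp-abort"); },
  killChild: (signal) => { calls.push(signal); },
  scheduleForcedShutdown: () => { calls.push("forced-shutdown-scheduled"); },
});
abortForRoleMarker("## user");
abortForRoleMarker("## user"); // no-op
```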

Wired into three sites where contamination is detected:
  - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc
    text_delta paths)
  - Plain-stdout listener (BYOK plain mode)
  - The Claude stream handler's onEvent (per-message guards in
    claude-stream.ts surface `fabricated_role_marker` events directly
    via onEvent rather than through the run-scoped emitGuardedTextDelta)

Tool_use blocks emitted BEFORE the marker still flow through normally
— this guard can't help with those, since by the time we observe a
text marker the prior content block has already finished. Closing
that gap requires speculative cancellation of in-flight tool calls
when a downstream text block contains a marker; that's tracked as
follow-up work, not included here.

Co-Authored-By: roverkai <2196140098@qq.com>
Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* refactor(role-marker-guard): bounded tail + drop chat-style markers

Addresses two review comments on #3303:

(1) O(1) memory + per-delta work (review r3323982225)
  Replace the unbounded `accumulated` string with a rolling tail capped
  at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest
  marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice).
  A 50 KB assistant response delivered in 1000 chunks of 50 bytes was
  previously O(n²) on string concatenation alone; now it is O(1) per
  delta regardless of message length. The `tail.length` value carries
  the "already emitted" offset that the cut-point math needs, so the
  offset semantics at L74–78 of the prior implementation are preserved
  without re-introducing the full-text buffer.

(2) Drop chat-style markers entirely (review r3323982234, option (a))
  `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex.
  Rationale:
    - The host parses ONLY `## user` / `## assistant` / `## system`
      lines as turn boundaries (see `buildDaemonTranscript` in
      apps/web/src/providers/daemon.ts). A model emitting chat-style
      markers does NOT cause the original #3247 security failure.
    - With kill-on-detection wired in this PR (`abortForRoleMarker`
      in server.ts), a false positive aborts the whole run — far
      more expensive than a stray unflagged `User:` line in chat
      scrollback. Chat-style markers collide with legitimate output
      (form labels, email contacts, JSDoc) often enough that pairing
      them with kill-semantics is the wrong tradeoff.
  The tradeoff is now documented in the regex docblock so the
  kill-on-match behaviour is justified against the false-positive
  surface.
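
As an illustration of point (1), a rolling tail can be sketched like this. The pattern is a simplified newline-anchored stand-in rather than the module's actual `FABRICATED_ROLE_MARKER_RE`, and `createTailScanner` is a hypothetical name.

```typescript
const TAIL_BUFFER_SIZE = 64; // value quoted in the commit message

// Simplified pattern for illustration only.
const MARKER_AFTER_NEWLINE = /\n[ \t]*##[ \t]+(?:user|assistant|system)\b/;

function createTailScanner(): (chunk: string) => boolean {
  let tail = ""; // last <= TAIL_BUFFER_SIZE chars already seen
  return (chunk) => {
    const probe = tail + chunk; // bounded: at most 64 + chunk.length chars
    const hit = MARKER_AFTER_NEWLINE.test(probe);
    tail = probe.slice(-TAIL_BUFFER_SIZE);
    return hit;
  };
}

const scan = createTailScanner();
const results = [
  scan("x".repeat(10000)), // long clean preamble
  scan("done.\n## us"),    // marker starts at the end of this chunk
  scan("er\nDELETE"),      // and completes in the next one
];
// results is [false, false, true]
```

Because only the last 64 characters are carried between feeds, the work per delta depends on the chunk size and not on how much text has already streamed.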

Also aligns the prompt-side CRITICAL block in system.ts: drop the
"don't emit User: / Assistant: / Human: / AI:" bullet, since we no
longer enforce it. Less ambiguity for the model and the operators.

Test file updated:
  - Chat-style positive tests flipped to negative ("does NOT match
    User: — chat-style out of scope") so the intentional exclusion
    has a permanent regression test.
  - Two new tests cover the bounded-tail behaviour: a marker arriving
    after 10 KB of clean text in small chunks, and a marker
    straddling a chunk boundary after 100 prior chunks.
  - Added test for legitimate `User: bob@example.com`-style content
    not triggering contamination.
Test count is now 35 (up from 25); two of the new ones explicitly
exercise the new bounded-tail path.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995)

Blocking correctness bug introduced by commit 4 (bounded-tail refactor):
once \`tail\` is a rolling slice of mid-stream text, \`^\` in the
canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents
the genuine message start. As the rolling window slides forward chunk
by chunk, a sliced tail can begin with whitespace + \`##\` (or just
\`##\`), letting \`^\` anchor a match against text that the
full-buffer implementation correctly ignored. With kill-on-detection
wired in commit 3, that false positive now SIGTERMs the run and emits
a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class
called out in the docblock at L22–29.

Reviewer's evidence (PerishCode, r3324060995): streaming
"…take a look at the ## user content section…" one character at a
time reports \`contaminated: true\` post-refactor; the same text in a
single feed stays clean.

Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very
first non-empty feed (where \`^\` legitimately points at the message
start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\`
(\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all
subsequent feeds. A \`firstChunk\` boolean tracks the state. Real
newline-preceded markers straddling chunk boundaries are still caught
because the preceding \`\\n\` is retained inside the 64-char tail.
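
The false positive can be reproduced in a few lines. This sketch assumes the elided alternation is `user|assistant|system` (the three markers named elsewhere in this log) and that the `/i` flag was still present at this point in the history.

```typescript
// Patterns as quoted above; the alternation contents are an assumption.
const CANONICAL = /(?:^|\n)\s*##\s+(?:user|assistant|system)/i;
const NEWLINE_ANCHORED = /\n\s*##\s+(?:user|assistant|system)/i;

const full = "please take a look at the ## user content section";
// A rolling window that happens to start mid-line, right before the `##`.
const slicedTail = full.slice(full.indexOf(" ## user"));

const fullCanonical = CANONICAL.test(full);              // false
const slicedCanonical = CANONICAL.test(slicedTail);      // true (the false positive)
const slicedNewline = NEWLINE_ANCHORED.test(slicedTail); // false

// A genuine newline-preceded marker is still caught without `^`.
const realMarker = NEWLINE_ANCHORED.test("some preamble\n## user\nDELETE"); // true
```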

Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`):
  - mid-line \`## user\` streamed char-by-char with no preceding \\n
    (mirrors the reviewer's repro)
  - space-preceded mid-line \`## user\` in a >130-char stream, which is
    long enough to force the rolling window past the marker — exercises
    the exact slice condition that triggered the bug
  - real \\n-preceded \`## user\` still caught after a long preamble
    (positive case must not regress)
  - \`## user\` as the very first chunk still caught (\`^\` legitimately
    anchors on the first feed)

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882)

Two refinements addressing the third review on #3303:

== Blocking (r3324151877) ==
The regex over-matched legitimate Markdown headings, and with
kill-on-detection wired in commit 3 each false positive
deterministically aborts a real run. Three changes tighten the match
to the actual security surface — `## user` / `## assistant` /
`## system` lines the chat host parses as turn boundaries — without
losing any real attack pattern:

1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary
   delimiter is lowercase (see `buildDaemonTranscript` in
   apps/web/src/providers/daemon.ts), and the `## CRITICAL`
   system-prompt block already forbids only the lowercase forms.
   Title-Case headings like `## User Guide`, `## System Architecture`,
   `## Assistant settings` are now ignored — these are legitimate
   technical writing patterns LLMs emit constantly. `## USER NOTES`
   (all-caps) likewise no longer flags.

2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it,
   `## userland`, `## userspace`, `## users guide`, `## systemd`,
   `## assistance` all match via prefix in the alternation. The
   lookahead requires the next character to exist and to not be a
   lowercase letter, so:
     - `## user\\n…`     → match (newline is not lowercase)
     - `## assistantR…` → match (R is uppercase; the glued-form
                          attack pattern still gets caught)
     - `## assistant.`  → match (. is not a letter)
     - `## users guide` → no match (s is lowercase letter)
     - `## userland`    → no match (l is lowercase letter)
   POSITIVE rather than NEGATIVE `(?![a-z])` because the negative
   form is satisfied at end-of-string, which in a streaming context
   means "we have `## user` but don't know what comes next yet" —
   would fire prematurely if `land` arrives in a later chunk. The
   positive form delays detection by one character in that edge
   case, traded for correctness.

3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role
   markers are single-line by convention; restricting to space/tab
   prevents oddities like `##\\nuser` from matching across lines.
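
The three changes can be checked against the fixtures listed above. The pattern below is reconstructed from this description and is an approximation; the regex in role-marker-guard.ts may differ in detail.

```typescript
// Reconstructed for illustration; not copied from the module.
const POSITIVE = /(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|system)(?=[^a-z])/;
const NEGATIVE = /(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|system)(?![a-z])/;

const cases: Array<[string, boolean]> = [
  ["## user\nhello", true],       // newline is not a lowercase letter
  ["## assistantReading", true],  // glued uppercase form still caught
  ["## assistant.", true],        // punctuation is not a letter
  ["## users guide", false],      // prefix of a longer word
  ["## userland", false],
  ["## systemd", false],
  ["## assistance", false],
  ["## User Guide", false],       // Title-Case ignored (no /i flag)
  ["##\nuser\n", false],          // [ \t] does not span lines
];
const mismatches = cases.filter(([text, expected]) => POSITIVE.test(text) !== expected);

// End-of-chunk behaviour: the reason for choosing the positive form.
const positiveAtEnd = POSITIVE.test("## user"); // false: waits for the next character
const negativeAtEnd = NEGATIVE.test("## user"); // true: would fire before `land` arrives
```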

Test file: added Title-Case fixtures (`## User Guide`,
`## System Architecture`, `## Assistant settings`, `## USER NOTES`)
and prefix-of-longer-word fixtures (`## users guide`, `## userland`,
`## systemd`, `## assistance`) — each asserting NO contamination.
The existing `## usability` negative test gave false confidence as
the reviewer noted (only failed via alternation-miss, not via
word-boundary semantics); the new fixtures actually exercise the
lookahead. Also added a positive test for `## assistant.` (glued
punctuation) to balance the existing `## assistantReading`
(glued uppercase) coverage. Total tests: 35 → 50.

== Non-blocking (r3324151882) ==
Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in
`packages/contracts/src/errors.ts` alongside the existing agent/AMR
codes, with a docblock comment explaining the emission contract:
emitted by `server.ts::abortForRoleMarker` alongside the existing
`fabricated_role_marker` warning event when the daemon detects a
fabricated Markdown role marker in agent output; retryable. The code
was already being emitted over the wire but unregistered — landing
the registration here keeps the contract and emitter in sync as
reviewer requested.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): defer complete-but-unconfirmed marker suffix

Addresses review r3324277xxx — the boundary case where a stream chunk
boundary lands between the role keyword and its lookahead character
violated the documented "everything from the marker onward is silently
dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')`
returned `## user` as safe (no char to satisfy the lookahead → no match
→ pass through), so the fabricated marker line leaked into UI and
app.sqlite before the next chunk confirmed contamination on the next
SIGTERM cycle.

Fix: introduce a `pending` state variable holding bytes that match the
COMPLETE-but-unconfirmed marker prefix at end of buffer
(/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no
lookahead, $ anchor instead). When the no-match branch detects this
suffix, withhold it from emission until the next feed either:
  - Confirms it (next char non-lowercase) → main regex matches →
    contaminated → withheld bytes dropped along with `## user`.
  - Denies it (next char lowercase, e.g. `userl…`) → main regex no
    longer matches the role keyword → withheld suffix is released
    and emitted alongside the new continuation.

Also tied the firstChunk transition to actual byte emission rather
than feed count. Previously a message that starts with `## system`
followed by a separate `\\n` chunk would lose the `^` anchor on the
second feed (firstChunk had flipped after the first feed even though
nothing was emitted yet), silently breaking detection for that edge
case. Now `firstChunk` stays true until at least one byte has crossed
the emission boundary, matching the conceptual definition of "message
start".

Tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - `## user` deferred at chunk boundary, confirmed by `\\n` in next
  - `## user` deferred at chunk boundary, denied by `land` continuation
  - `## assistant` deferred, confirmed by punctuation
  - `## User` Title-Case still passes through unconditionally
  - `## system` as the very first chunk: deferred, confirmed by \\n
    in next chunk (tests the firstChunk-stays-true-when-nothing-
    emitted invariant)

Total tests: 50 → 55.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta

Addresses review r3324xxxxxx — guarding the thinking channel buys no
security and causes legitimate aborts.

Why thinking is NOT a #3247 vector:
  - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only
    re-serializes `m.content` as `## ${m.role}\n...`.
  - Extended-thinking content is rendered to a separate
    `kind: 'thinking'` payload (daemon.ts:857-858) and never folded
    into `m.content`.
  - So a `## user` line in the thinking channel CANNOT become a
    fabricated turn boundary on the next round-trip.

Why guarding it is harmful:
  - Models routinely emit literal `## user` / `## assistant` lines
    in chain-of-thought when reasoning about conversation structure
    ("Let me think about this. The user might phrase it as:\n## user\n
    …"). Common pattern in production traces.
  - With `abortForRoleMarker` wired in server.ts, a guard match on
    thinking SIGTERMs the run and surfaces a security error to the
    UI. The user paid for the reasoning, never sees the answer, and
    gets a confusing "fabricated role marker" warning for what was
    actually legitimate metacognition.
  - This directly contradicts the module's own stated philosophy
    ("a false positive aborts the whole run — a much more expensive
    failure than a stray unflagged ... line", role-marker-guard.ts).

Fix: `emitSafeText` now passes thinking_delta through unconditionally,
skipping both the guard and the contamination check. text_delta
remains fully guarded. The single-line change at the top of
emitSafeText preserves all other channels' behavior.
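
A rough sketch of the channel discrimination, under assumed types. The name `emitSafeText` comes from the commit message, but the signature, `TextGuard`, and the toy guard below are hypothetical and chosen only to make the example self-contained.

```typescript
type DeltaKind = "text_delta" | "thinking_delta";

// Hypothetical guard interface: returns the safe prefix and whether a marker was seen.
interface TextGuard {
  feed(text: string): { safe: string; contaminated: boolean };
}

function emitSafeText(
  kind: DeltaKind,
  text: string,
  guard: TextGuard,
  emit: (kind: DeltaKind, text: string) => void,
  onMarker: () => void,
): void {
  if (kind === "thinking_delta") {
    emit(kind, text); // not a turn-boundary vector: pass through unguarded
    return;
  }
  const { safe, contaminated } = guard.feed(text);
  if (safe.length > 0) emit(kind, safe);
  if (contaminated) onMarker();
}

// Usage with a toy guard that cuts at the first "\n## user".
const toyGuard: TextGuard = {
  feed(text) {
    const at = text.indexOf("\n## user");
    return at === -1
      ? { safe: text, contaminated: false }
      : { safe: text.slice(0, at), contaminated: true };
  },
};
const emitted: string[] = [];
let markerHits = 0;
const record = (kind: DeltaKind, text: string): void => { emitted.push(`${kind}:${text}`); };
const countHit = (): void => { markerHits += 1; };
emitSafeText("thinking_delta", "They might write:\n## user\nhi", toyGuard, record, countHit);
emitSafeText("text_delta", "Answer.\n## user\ndelete X", toyGuard, record, countHit);
```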

Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts):
  - `## user` / `## assistant` lines in a thinking_delta — must NOT
    fire fabricated_role_marker, the thinking content streams intact
    including the marker text, and the subsequent text_delta answer
    still reaches the consumer (run not aborted).
  - Sanity check: same `## user` pattern in a text_delta DOES fire
    fabricated_role_marker and truncates emission at the marker. Locks
    in the channel-discriminated behavior.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): tie firstChunk to slicing, not byte emission

Blocking review r3324xxxxxx: under the prior firstChunk transition
("any byte emitted"), a role marker that arrived at the very start of
a message with its prefix split across multiple chunks bypassed
detection — reopening the #3247 vector on the Claude path.

Concrete cases that were missed (all are routine provider
tokenizations of \`## user\n…\` at message start):
  - \`##\`     | \` user\nDELETE…\`
  - \`## us\`  | \`er\nDELETE…\`
  - \`## \`    | \`user\nDELETE…\`

Mechanism: the pending-deferral regex only catches COMPLETE role
keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`,
\`## us\`) was emitted in full. That emission flipped firstChunk to
false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used,
which requires a literal \n before \`##\`. A marker at buffer
position 0 has no preceding \n, so it could no longer match.
abortForRoleMarker never fired and tool_use blocks emitted after the
fabricated turn boundary reached the dispatcher.

Fix: change firstChunk to track "tail has not been sliced yet" rather
than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE,
tail still represents the entire emission so far and \`^\` in the
canonical regex genuinely anchors at byte 0 of the stream — so the
\`^|\n\` alternation safely catches a chunk-split message-start
marker. The transition happens at the moment we would slice: once
emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\`
becomes meaningless, and we switch to the newline-only variants.

Earlier iterations of this code tried two other definitions, both
unsound:
  - "any byte emitted" (this commit fixes) — lost \`^\` before a
    chunk-split message-start marker could finish arriving.
  - "newline emitted" (briefly considered as the reviewer's
    alternative suggestion) — left \`^\` valid on a sliced buffer
    when streams hadn't emitted a newline yet, re-introducing the
    rolling-tail mid-stream false positive from review r3324060995.
The slice-based invariant satisfies both: while we have not sliced,
\`^\` is correct; once we slice, it is not.
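
Putting the bounded tail, the pending deferral, and the slice-based anchor switch together, the guard can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code in role-marker-guard.ts: the function name, the `FeedResult` shape, and the three-keyword alternation are all hypothetical.

```typescript
const TAIL_BUFFER_SIZE = 64;
const KEYWORDS = "(?:user|assistant|system)"; // assumed alternation
// The `^` variants are valid only while the tail still holds the whole emission.
const CONFIRMED_FROM_START = new RegExp(`(?:^|\\n)[ \\t]*##[ \\t]+${KEYWORDS}(?=[^a-z])`);
const CONFIRMED_AFTER_NEWLINE = new RegExp(`\\n[ \\t]*##[ \\t]+${KEYWORDS}(?=[^a-z])`);
const PENDING_FROM_START = new RegExp(`(?:^|\\n)[ \\t]*##[ \\t]+${KEYWORDS}$`);
const PENDING_AFTER_NEWLINE = new RegExp(`\\n[ \\t]*##[ \\t]+${KEYWORDS}$`);

interface FeedResult { safe: string; contaminated: boolean }

function createRoleMarkerGuardSketch(): (chunk: string) => FeedResult {
  let tail = "";     // last <= TAIL_BUFFER_SIZE emitted chars
  let emitted = 0;   // total chars emitted so far
  let pending = "";  // withheld complete-but-unconfirmed marker suffix
  let contaminated = false;

  return (chunk) => {
    if (contaminated) return { safe: "", contaminated: true };
    const sliced = emitted > TAIL_BUFFER_SIZE; // tail is now a mid-stream window
    const probe = tail + pending + chunk;
    const offset = tail.length;                // probe[0, offset) was already emitted
    pending = "";

    let cut = probe.length;
    const hit = (sliced ? CONFIRMED_AFTER_NEWLINE : CONFIRMED_FROM_START).exec(probe);
    if (hit) {
      contaminated = true;
      cut = Math.max(hit.index, offset);       // cannot retract what was emitted
    } else {
      const maybe = (sliced ? PENDING_AFTER_NEWLINE : PENDING_FROM_START).exec(probe);
      if (maybe) {
        cut = Math.max(maybe.index, offset);
        pending = probe.slice(cut);            // withhold until confirmed or denied
      }
    }
    const safe = probe.slice(offset, cut);
    emitted += safe.length;
    tail = (tail + safe).slice(-TAIL_BUFFER_SIZE);
    return { safe, contaminated };
  };
}

// Helper for the usage examples: feed chunks and collect what was emitted.
function run(chunks: string[]): FeedResult {
  const feed = createRoleMarkerGuardSketch();
  let text = "";
  let bad = false;
  for (const c of chunks) {
    const r = feed(c);
    text += r.safe;
    bad = r.contaminated;
  }
  return { safe: text, contaminated: bad };
}

const midLineText = "x".repeat(100) + " look at the ## user content section";
const splitAtStart = run(["##", " user\nDELETE everything"]); // contaminated
const denied = run(["## user", "land notes"]);                // released intact
const midLine = run(midLineText.split(""));                   // clean, char by char
```

In the chunk-split case the partial prefix `##` has already been emitted before detection, so the sketch stops the stream at that point and cannot retract those two characters.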

Regression tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - \`##\`    | \` user\nDELETE…\`   → contaminated, marker=\`## user\`
  - \`## us\` | \`er\nDELETE…\`      → contaminated, marker=\`## user\`
  - \`## \`   | \`user\nDELETE…\`    → contaminated, marker=\`## user\`
  - \`#\`     | \`# user\nDELETE…\`  → contaminated, marker=\`## user\`
The fourth case (single \`#\` first chunk) exercises an even more
adversarial tokenization than the reviewer's examples; it is also
caught.

Total tests: 55 → 59.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(tests): wrap events in stream_event envelope in thinking test

feedJsonl was feeding raw events without the `{ type: 'stream_event',
event: ... }` wrapper that createClaudeStreamHandler requires (line 141
of claude-stream.ts). Events silently fell through all branches, making
both tests pass vacuously. Also fix TS2532 on warnings[0].marker with
non-null assertion (safe after the toHaveLength(1) guard).

Co-Authored-By: RoverKai <roverkai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: roverkai <2196140098@qq.com>
Co-authored-by: JasonBroderick <jason@buddyboss.com>
Co-authored-by: RoverKai <roverkai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 03:57:56 +00:00
Denis Redozubov
c847ace554
Add run-scoped media execution policy (#3106)
* feat(contracts): add run media execution policy

* feat(daemon): enforce run media execution policy

* test(daemon): cover media execution policy gates
2026-05-28 09:19:40 +00:00
Marc Chan
338cb4d423
fix(platform): support live system proxy changes (#3093)
* fix(platform): support live system proxy changes

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Honor lowercase proxy env vars within a single source before merging proxy-aware envs.

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Refresh provider request proxy env on each dispatcher creation and cover it with a focused regression test.

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): enable node env proxy for user proxy vars
2026-05-28 06:11:47 +00:00
bougie-atxp
d28acdc879
Fix Gemini BYOK model URL normalization (#2761)
Co-authored-by: ATXP Earn Clowdbot <bougie-atxp@users.noreply.github.com>
2026-05-24 03:23:36 +00:00
Dhruv Rana
ecfe9b9d10
fix: gate chat token params by model family (#1675)
* fix: gate chat token params by model family

* fix: retry azure deployment token params

* fix: retry azure v1 token params

* fix: report azure retry latency

* test: cover azure failed retry latency
2026-05-23 14:58:47 +00:00
lefarcen
c14baf07d3
Merge origin/main into release/v0.8.0
PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits
on top of 58 release-side commits accumulated during the 0.8.0 cycle.

Resolution summary:

Take main (theirs) where main carried deliberate forward progress:
- apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration:
  hardcoded English aria-labels/titles replaced with t() calls keyed
  on pluginCard.* (all 8 keys verified present in en.ts).
- apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion
  feature: sortedRoutines (newest-first), sourceIngestionTemplates,
  patchSourceForm, submitSourceIngestion. activeCount/pausedCount
  semantics preserved (now keyed on sortedRoutines, count unchanged).
- e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts
  imports needed by main-side test helpers.
- e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal
  helper block added by main.

Keep both sides where each added a different field to the same object
literal:
- apps/web/src/components/ProjectView.tsx (locale + analyticsHints
  spread).
- apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints).

Take release (ours) where release carried deliberate work that ships
0.8.0:
- CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's
  Unreleased section was the same body of work, now finalized.
- apps/landing-page/public/{apple-touch-icon,favicon}.png +
  apps/web/public/app-icon.svg — release-side visual refresh assets
  consistent with 0.8.0 stable ship.
- tools/pack/src/linux.ts — packageVersion const required by line 466;
  taking main's empty line would build-error.
- e2e/ui/project-management-flows.test.ts +
  e2e/ui/settings-api-protocol.test.ts +
  e2e/ui/settings-memory-routines.test.ts — release-side release-smoke
  hardening (shangxinyu1 + PerishFire) takes precedence on overlap.

Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.
2026-05-23 12:17:18 +08:00
Fl0rencess
5b53c44e13
fix: enable SenseAudio BYOK TTS (#2570)
* fix: enable SenseAudio BYOK TTS

* fix: handle BYOK SenseAudio TTS failures

* fix: harden SenseAudio BYOK TTS responses

* fix: preserve BYOK speech tool failure kind

* fix: handle versioned SenseAudio TTS base URLs
2026-05-22 14:04:29 +08:00
lefarcen
6690dbd5bb
feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback

Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:

- Contracts: TrackingFeedback* enums and the four dedicated
  assistant_feedback_* event payload types (click, reason_view,
  reason_click, reason_submit), plus normalizeCustomReason helper.
  The new event-name variants are added to TrackingEventName and the
  AnalyticsEventPayload discriminated union next to the existing
  surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
  thin route that validates rating, allowlists reasonCodes through a
  simple string filter, and fire-and-forgets into the daemon's
  reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
  forwards the rating + reasonCodes into Langfuse as user_rating
  (NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
  score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
  fetch) and apps/web/src/components/ProjectView.tsx wiring so each
  thumbs-up/down + reason submission posts the side-channel.
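
The rating-to-score mapping described for `reportRunFeedbackFromDaemon` might look roughly like this. The `Rating` and `ScoreEntry` shapes, the function name, and the sample reason codes are hypothetical; they are not the Langfuse SDK's types or the daemon's actual payload.

```typescript
// Hypothetical shapes for illustration only.
type Rating = "up" | "down";
interface ScoreEntry {
  name: "user_rating" | "user_rating_reason";
  dataType: "NUMERIC" | "CATEGORICAL";
  value: number | string;
}

function buildScoreEntries(rating: Rating, reasonCodes: string[]): ScoreEntry[] {
  // One numeric score for the thumb itself: +1 for up, -1 for down.
  const entries: ScoreEntry[] = [
    { name: "user_rating", dataType: "NUMERIC", value: rating === "up" ? 1 : -1 },
  ];
  // One categorical score per reason code.
  for (const code of reasonCodes) {
    entries.push({ name: "user_rating_reason", dataType: "CATEGORICAL", value: code });
  }
  return entries;
}

// Sample reason codes are made up for the example.
const scores = buildScoreEntries("down", ["inaccurate", "too_slow"]);
```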

Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
  file_upload_result / feedback_submit_result / settings_* event
  variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
  add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
  /feedback routes; merge RegisterChatRoutesDeps to include both
  'paths' and 'telemetry'. Drop PR's chat-routes-local
  reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
  server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
  keep main's projectKindForTracking prop name and its existing
  emission of surface_view / ui_click / feedback_submit_result; the
  PR's analyticsCtx-based reason_view/click/submit emission is dropped
  in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
  to match ChatPane's current prop name.

Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
  the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
  stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
  user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
  when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
  ChatMessageFeedbackReasonCode or deduped.

* fix(analytics): address review on assistant feedback rebase

Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:

- chat-routes.ts: `/feedback` now awaits the daemon's preflight
  outcome and echoes it as the response. The contract was already
  shaped as `accepted | skipped_consent | skipped_no_sink`, but the
  previous handler always returned `accepted` because the network
  send was fire-and-forget. The consent + sink decision is local
  (a small file read and an env-var lookup); the actual Langfuse
  upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
  contract's reason-code union and deduplicated before reaching
  Langfuse, so a stale or replayed client can't poison the
  Langfuse score table with unknown categorical values or
  duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
  fire-and-forget network send so the route can claim `accepted`
  honestly. The legacy `skipped_no_sink` return on app-config read
  failure is preserved.
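
The allowlist-and-dedupe step for reasonCodes could look something like the sketch below. The function name and the two-entry allowlist are placeholders; the real union lives in the contracts package and is larger.

```typescript
// Hypothetical subset of the reason-code union, for illustration only.
const ALLOWED_REASON_CODES: ReadonlySet<string> = new Set<string>([
  "followed_design_system",
  "missed_design_system",
]);

// Keeps only known string codes; the first occurrence wins and order is preserved.
function sanitizeReasonCodes(
  input: unknown,
  allowed: ReadonlySet<string> = ALLOWED_REASON_CODES,
): string[] {
  if (!Array.isArray(input)) return [];
  const seen = new Set<string>();
  for (const item of input) {
    if (typeof item === "string" && allowed.has(item)) seen.add(item);
  }
  return Array.from(seen);
}
```

Non-array input yields an empty list rather than an error, which matches a route that treats reason codes as optional; whether the real handler rejects such input instead is not stated here.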

Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
  drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
  add `followed_design_system` and `missed_design_system` so the
  analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
  raw custom-reason text is bucketed before send. Product reversed
  that on 2026-05-13 (raw text now ships, consent-gated). Replace
  the stale comment with the real semantics + a note that there is
  no tombstone path for reason codes the user removes in a
  follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
  `AssistantFeedbackAnalyticsCtx` interface and a stray blank-line
  delete from the rebase; restore the analytics-context comment
  above the feedback hook.

Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
  ProjectView still skips reportChatRunFeedback on `change===null`,
  so Langfuse retains the previous rating until the user re-submits.
  The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
  smaller set — buildFeedbackPayload only overwrites the codes
  present in the current payload.

* feat(analytics): wire PR's dedicated assistant_feedback_* events

The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.

Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.

Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button)           ↔ assistant_feedback_click
- ui_click (reason submit button)      ↔ assistant_feedback_reason_click
- feedback_submit_result               ↔ assistant_feedback_reason_submit

Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
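
As a rough illustration of the pairing and the presence check, assuming hypothetical gesture labels and a hypothetical `FeedbackContext` shape (neither is taken from the codebase):

```typescript
// Gesture labels are illustrative; they are not wire-level values.
type FeedbackGesture =
  | "reason_panel_view"
  | "feedback_button_click"
  | "reason_submit_click"
  | "submit_result";

interface EventPair {
  umbrella: "surface_view" | "ui_click" | "feedback_submit_result";
  dedicated: string;
}

// Mirrors the pairing table above.
const FEEDBACK_EVENT_PAIRS: Record<FeedbackGesture, EventPair> = {
  reason_panel_view: { umbrella: "surface_view", dedicated: "assistant_feedback_reason_view" },
  feedback_button_click: { umbrella: "ui_click", dedicated: "assistant_feedback_click" },
  reason_submit_click: { umbrella: "ui_click", dedicated: "assistant_feedback_reason_click" },
  submit_result: { umbrella: "feedback_submit_result", dedicated: "assistant_feedback_reason_submit" },
};

interface FeedbackContext {
  projectId: string | null;
  projectKind: string | null;
  conversationId: string | null;
}

// The umbrella event always fires; the dedicated one only when all three
// context fields are present, since its payload type requires them.
function eventNamesFor(gesture: FeedbackGesture, ctx: FeedbackContext): string[] {
  const pair = FEEDBACK_EVENT_PAIRS[gesture];
  const hasContext =
    ctx.projectId !== null && ctx.projectKind !== null && ctx.conversationId !== null;
  return hasContext ? [pair.umbrella, pair.dedicated] : [pair.umbrella];
}
```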
2026-05-21 19:28:51 +08:00
lefarcen
204599a7ae
feat(analytics): ship PostHog v2 event schema (#2285)
* feat(analytics): ship PostHog v2 event schema

Aligns the PostHog wire format with the product team's v2 tracking
spec (Open Design Event Tracking Document 2.0). The previous v1 catalogue defined a
flat per-page event name (home_view / studio_click / settings_view…);
v2 collapses everything to four core events identified through the
page_name + area + element triplet so dashboards can group by surface
without owning a separate event per page.

Key changes
- packages/contracts/src/analytics: collapse to page_view / ui_click /
  surface_view / *_result event names; bump EVENT_SCHEMA_VERSION to 2;
  rename the wire field anonymous_id → device_id (value unchanged);
  promote the configure-state triplet (has_available_configure_cli /
  configure_type / configure_availability) to a global PostHog register
  so every event inherits it without per-helper boilerplate.
- apps/web/src/analytics: rewrite the 43 trackXxx helpers behind the
  new typed catalogue; opt out of PostHog's built-in UA bot filter so
  legitimate embedded webviews, fingerprinted browsers, and the
  Playwright-based e2e runs ingest captures (the Privacy → "Share
  usage data" toggle remains the single consent gate).
- apps/web components: wire P0/P1/P2 click + view + result surfaces
  end-to-end — left nav, toolbar, home chat composer, recent projects,
  new project modal, plugins / design systems / integrations /
  automations pages, file manager, artifact toolbar/header/share popup,
  feedback panel, settings sidebar / language / appearance /
  notifications / pets / privacy / connectors. Fixes the v1 feedback
  bug where action=clear_feedback_rating shipped rating=null instead of
  the rating being cleared.
- apps/daemon: extend run_created / run_finished with the v2 context
  (entry_from / project_kind / target_platforms / fidelity /
  connectors / etc.), add explicit error_code classification on
  result=failed (run.errorCode → AGENT_SIGNAL_* → AGENT_EXIT_* →
  AGENT_TERMINATED_UNKNOWN), and read device_id from the new
  x-od-analytics-device-id header. Also moves the run_created /
  run_finished emission to the canonical /api/runs handler in
  server.ts; the chat-routes copy was shadowed by Express's earlier
  registration and never executed, which also meant run.clientType
  never made it to Langfuse — fixed in the same move.
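
The error_code fallback chain for result=failed might be sketched like this. The `RunExitInfo` shape and the exact suffix format for signals are assumptions; only the ordering and the `AGENT_EXIT_1` form appear in this message.

```typescript
// Hypothetical view of what the daemon knows when a run ends.
interface RunExitInfo {
  errorCode?: string | null; // explicit code recorded on the run, if any
  signal?: string | null; // e.g. "SIGTERM" when the agent was killed
  exitCode?: number | null; // process exit code, if the agent exited on its own
}

// Precedence: run.errorCode, then signal, then exit code, then the catch-all.
function classifyErrorCode(run: RunExitInfo): string {
  if (run.errorCode) return run.errorCode;
  if (run.signal) return `AGENT_SIGNAL_${run.signal}`;
  if (typeof run.exitCode === "number") return `AGENT_EXIT_${run.exitCode}`;
  return "AGENT_TERMINATED_UNKNOWN";
}
```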

Verification
- pnpm guard / pnpm typecheck clean for daemon, web, and contracts.
- pnpm --filter @open-design/web test: 1645/1645 passing.
- End-to-end smoke through Playwright + local PostHog ingest project
  420348: every page_view (home/projects/automations/design_systems/
  plugins/integrations/chat_panel/file_manager), every nav element,
  the new_project_modal surface_view + tab + create flow, the
  plugin_replacement_modal surface_view, settings_view across nine
  sections, settings_cli_test_result (codex CLI), the
  project_create_result success path, and run_created + run_finished
  (result=failed, error_code=AGENT_EXIT_1) all reached PostHog with
  the v2 schema and the expected device_id / page_name / area /
  element / fidelity / target_platforms props. The remaining
  *_result events (artifact_export / feedback_submit / file_upload /
  plugin_replacement / settings_byok_test / settings_connector_auth)
  are wired in code; production traffic will trigger them.

* fix(analytics): preserve style category on design-systems surface chip switch

The merge resolution in DesignSystemsTab incorrectly re-introduced a
`setCategory('All')` call alongside the new `trackDesignSystemsTopClick`
emit. main intentionally keeps the active style category when the surface
filter refines within it; the regression was caught by the existing
"keeps the style category when a surface chip refines within it" test
in tests/components/DesignSystemsTab.test.tsx.

* fix(analytics): address review — senseaudio passthrough + daemon-side configure-state

Two follow-ups from the v2 schema review on #2285:

1. `byokProtocolToTracking()` was still falling through to `null` for
   `senseaudio` even though the v2 BYOK provider enum now lists it. Every
   `SettingsDialog` BYOK call site guards on `if (byokProviderId)`, so a
   user on SenseAudio was silently dropping the provider-option,
   field-focus, and test-result captures. Added the missing case so
   SenseAudio gets the same analytics coverage as the other providers.

2. The daemon-authoritative `run_created` / `run_finished` events were
   missing the configure-state triplet (`has_available_configure_cli` /
   `configure_type` / `configure_availability`) that v2 promotes to a
   global register on the web side. Daemon captures don't go through the
   PostHog global register, so dashboards couldn't segment run lifecycle
   by execution setup after the migration.

   The fix derives the triplet server-side from `detectAgents()` and the
   request's `agentId` before `design.analytics.capture(...)`:
     - has_available_configure_cli: any CLI on PATH reports installed
     - configure_type: 'local_cli' when the run targets an installed CLI,
       otherwise 'unknown' (daemon can't see BYOK keys, which live in
       web-client storage)
     - configure_availability: 'available' / 'unavailable' / 'unknown'
       based on the requested agent's install status, with a fallback to
       'available' when any CLI is installed

   This keeps the v2 schema consistent across both daemon-side and
   web-side captures.

* fix(analytics): wire setConfigureGlobals so browser events carry fresh state

Third follow-up from the v2 schema review on #2285. The previous fix
addressed senseaudio + daemon-side configure-state, but reviewer flagged
that `setConfigureGlobals` was still defined-only — no caller — so every
browser-side capture inherited the boot defaults
(`has_available_configure_cli=false`, `configure_type='unknown'`,
`configure_availability='unknown'`). PostHog dashboards therefore could
not segment the new `page_view` / `ui_click` / `surface_view` events by
execution setup after a user configured their environment.

Changes:

- `packages/contracts/src/analytics/events.ts` — add a pure
  `deriveConfigureGlobals(mode, agentId, agents, byokConfigured)` helper
  so the web client and the daemon can derive the triplet from the same
  source of truth. The helper covers all 5 `configure_type` buckets
  (`local_cli` / `byok` / `both` / `none` / `unknown`) and the 3
  `configure_availability` buckets (`available` / `unavailable` /
  `unknown`).
- `apps/web/src/App.tsx` — add a useEffect that re-derives the triplet
  whenever the user changes execution mode, selects a new CLI, saves a
  BYOK key, or the detected-agent list refreshes, then pushes it to
  PostHog via `analytics.setConfigureGlobals(...)`. The setter goes
  through the provider so the analytics module stays the single source
  of truth.
- `apps/web/src/analytics/provider.tsx` — expose
  `setConfigureGlobals` on the analytics context and the test stub so
  consumers route through the provider boundary.
- `apps/daemon/src/server.ts` — switch the daemon-side derive in
  `/api/runs` to the shared `deriveConfigureGlobals` helper so the
  authoritative run_created/run_finished captures match the web-side
  payload. BYOK credentials live in the web client and stay invisible
  to the daemon, so the daemon arm passes `byokConfigured: undefined`
  and falls back to the installed-CLI signal.
- `apps/web/tests/analytics-configure-globals.test.ts` — new regression
  test that pins the derive behavior across all branches and confirms
  the setter actually mutates the client-side store. Locks the wire-up
  so a future refactor can't silently turn the setter back into a
  no-op.

Verification: pnpm guard clean; daemon / web typecheck clean; web tests
1703/1703 passing (up from 1696 — 7 new tests in the configure-globals
suite).

* fix(analytics): emit projects page_view + drop misattributed chat_panel source

Fourth review pass on PR #2285. Two follow-ups from mrcfps:

1. DesignsTab (projects landing) was emitting click events but no
   matching page_view. Opening /projects without clicking anything left
   the surface invisible in PostHog. Added a once-per-mount
   trackPageView({ page_name: 'projects' }) with the same ref-keyed
   pattern HomeView / PluginsView use.

2. ChatComposer was hard-coding source: 'recent_project' on every
   chat_panel page_view. The web router currently only carries
   projectId / conversationId / fileName, so we cannot distinguish a
   New-project launch from a template-pick or a Recent-projects click
   from this layer. A false constant would over-attribute every chat
   launch to 'recent_project' and break the funnel slice this schema
   was meant to unlock. Dropped the field for now — better no source
   than the wrong source — until the router grows a launch-source
   channel; the field is still defined as optional on PageViewProps so
   the channel can land in a follow-up PR.

Verification: web typecheck clean; web tests 1703/1703 passing.

* fix(analytics): correct plugin-replacement async result + heterogeneous upload + missing requestId

Three follow-ups from the fifth review pass on PR #2285:

1. **plugin_replacement_result emitted before the apply settled**
   (`apps/web/src/components/HomeView.tsx`). The modal's confirm action
   was a synchronous wrapper around an async `usePlugin(...)` call, so
   the surrounding try/catch never observed real failures and every
   attempt was reported as `result=success`. Changed
   `PendingReplacement.confirm` to return `Promise<void>`, made the wrapper return the
   underlying promise, and moved the analytics emit into an async
   IIFE in the click handler so the success/failure branches reflect
   the actual outcome.

2. **file_upload_result mis-typed heterogeneous batches**
   (`apps/web/src/components/FileWorkspace.tsx`). The earlier
   implementation only inspected `picked[0]`, so a mixed batch like
   `image.png + demo.mp4` reported `file_type=image`. Per the comment
   above the block ("mixed batches collapse to other"), the
   implementation now maps every file to a tracking type, collapses to
   `other` when more than one distinct type is present, and falls
   back to the single type otherwise.

3. **project_create_result lost the click→result correlation id**
   (`apps/web/src/components/NewProjectPanel.tsx`). The click event
   no longer carried the locally-generated `requestId` that
   `project_create_result` keeps, so the two could not be joined.
   `trackNewProjectModalElementClick()` now accepts an optional
   `{ requestId }`, mirroring the other helpers, and the create-button
   click threads the same id used for the result.
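
For item 2, the "mixed batches collapse to other" rule can be illustrated with a small sketch. The MIME-prefix mapping and the bucket names other than `image` and `other` are assumptions, as is the choice to report `other` for an empty batch.

```typescript
// Hypothetical tracking buckets; only "image" and "other" are named above.
type TrackingFileType = "image" | "video" | "audio" | "other";

function trackingTypeForMime(mime: string): TrackingFileType {
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("video/")) return "video";
  if (mime.startsWith("audio/")) return "audio";
  return "other";
}

// Maps every file to a bucket, then collapses to "other" unless the whole
// batch shares exactly one bucket.
function batchFileType(mimes: readonly string[]): TrackingFileType {
  const distinct = new Set(mimes.map((mime) => trackingTypeForMime(mime)));
  if (distinct.size !== 1) return "other"; // empty or mixed batch
  return Array.from(distinct)[0] ?? "other";
}
```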

Verification: web typecheck clean; web tests 1703/1703 passing.

* fix(analytics): gate configure-state on agents probe + drop unsent run_created fields

Two follow-ups from the sixth review pass on PR #2285:

1. **Cold-start configure-state was stamped before fetchAgents() landed**
   (`apps/web/src/App.tsx`). The useEffect that pushes the v2 triplet
   into the PostHog global register fired on first paint with
   `agents=[]`, so the first home/projects/plugins page_view reported
   `has_available_configure_cli=false` /
   `configure_availability=unavailable` even on machines that did have an installed CLI. The
   effect now waits on `agentsLoading === false` and leaves the boot
   defaults ('unknown'/'unknown') in place until the probe resolves.

2. **Daemon read run-context fields the web never sends**
   (`apps/daemon/src/server.ts`). The daemon-side run_created /
   run_finished baseProps read `projectKind`, `entryFrom`,
   `projectSource`, `targetPlatforms`, `companionSurfaces`, `fidelity`,
   `connectors`, `useSpeakerNotes`, `includeAnimations`,
   `referenceTemplate`, and `aspect` from `req.body`, but
   `packages/contracts/src/api/chat.ts` and
   `apps/web/src/providers/daemon.ts` don't carry those keys on the
   wire. Reading them therefore always produced null/undefined.
   Dropped the unsent fields from the daemon capture; a follow-up can
   extend the create payload to thread the real context through. The
   `design_system_id` field stays because the chat contract does send
   it.

Tests: added 3 regression tests in
`tests/analytics-configure-globals.test.ts` covering the boot-time gating contract (empty agents +
daemon mode → unavailable / local_cli; installed agent → available;
undefined agents list → unavailable). Verification: web typecheck
clean; daemon typecheck clean; web tests 1706/1706 passing (up from
1703 — 3 new cold-start tests).

* fix(analytics): pin mode='daemon' so missing-agent run reports unavailable

Eleventh review pass on PR #2285. mrcfps flagged that
`apps/daemon/src/server.ts` was calling `deriveConfigureGlobals(...)`
without `mode`, so the helper fell through to the generic branch.
Result: a run for an uninstalled agent was tagged
`configure_availability: 'available'` whenever any OTHER CLI was on
PATH, because the generic branch only looks at the cohort-wide
"any installed?" signal. That precisely undermines the slice the
daemon emit is trying to power.

The daemon's /api/runs handler is always a daemon-mode capture
(daemon is the local CLI runner — BYOK lives in the web layer), so we
now pin `mode: 'daemon'` on the call site. The helper then judges
`configure_availability` from the REQUESTED agent's install status and
reports `unavailable` when the user picked an agent that is not
installed, even if peers are.

Added a regression case in `tests/analytics-configure-globals.test.ts`:
`{ mode: 'daemon', agentId: 'codex', agents: [{claude,true},{codex,false}] }`
→ `{ has_available_configure_cli: true, configure_type: 'local_cli',
configure_availability: 'unavailable' }`.
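
A minimal sketch of the daemon-mode branch, consistent with the regression case above. It is not the shared `deriveConfigureGlobals` helper: the function name, the `DetectedAgent` shape, and the omission of every non-daemon mode are simplifications for illustration.

```typescript
// Hypothetical shape for an entry in the detected-agent list.
interface DetectedAgent {
  id: string;
  installed: boolean;
}

interface ConfigureGlobals {
  has_available_configure_cli: boolean;
  configure_type: "local_cli" | "byok" | "both" | "none" | "unknown";
  configure_availability: "available" | "unavailable" | "unknown";
}

// Daemon mode only: availability is judged from the REQUESTED agent, while
// has_available_configure_cli still reflects whether any CLI is installed.
function deriveDaemonConfigureGlobals(
  agentId: string | undefined,
  agents: readonly DetectedAgent[] | undefined,
): ConfigureGlobals {
  const list = agents ?? [];
  const requested = list.find((agent) => agent.id === agentId);
  return {
    has_available_configure_cli: list.some((agent) => agent.installed),
    configure_type: "local_cli",
    configure_availability: requested?.installed ? "available" : "unavailable",
  };
}
```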

Verification: daemon typecheck clean; web tests 1707/1707 passing
(up from 1706 — 1 new regression test).

* fix(analytics): hoist chat_panel page_view + thread requestId

- Move chat_panel page_view emit from ChatComposer to ProjectView so
  it survives activeConversationId-driven ChatPane remounts. ProjectView
  keys the dedupe ref by project.id; the composer drops its duplicate.
- Thread { requestId } into trackAssistantFeedbackReasonSubmitClick so
  the click pairs with the existing feedback_submit_result on the same
  request id (mirrors the trackNewProjectModalElementClick pattern).

* fix(analytics): keep v2 super-props alive across reset and stamp design_system_source

- Snapshot the register payload in client.ts on PostHog init and
  re-register it from applyConsent(true) and applyIdentity() so a
  privacy-toggle or Delete-my-data rotation does not resume capture
  without event_schema_version / device_id / session_id / locale /
  configure-state globals. setConfigureGlobals() also patches the
  cache so a later restore picks up the current configure state.
- Stamp design_system_source on daemon-side run_created / run_finished
  (it is required by RunCreatedProps / RunFinishedProps). Daemon
  can't tell default vs user_selected vs inherited from the wire, so
  it derives 'unknown' when designSystemId is present, 'not_applicable'
  otherwise — a follow-up that threads designSystemSource through
  CreateRunRequest can replace this with the precise source.
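
The snapshot-and-restore idea in the first bullet could be sketched as below. The `RegisterCapableClient` interface is a stand-in that assumes only a `register(props)` method, and `createSuperPropsCache` is a hypothetical name, not the structure of `client.ts`.

```typescript
type SuperProps = Record<string, string | number | boolean>;

// Minimal stand-in for the analytics client; only `register` is assumed.
interface RegisterCapableClient {
  register(props: SuperProps): void;
}

function createSuperPropsCache(client: RegisterCapableClient) {
  let cached: SuperProps = {};
  return {
    // At init: remember the full payload and register it.
    init(props: SuperProps): void {
      cached = { ...props };
      client.register({ ...cached });
    },
    // From the configure-state setter: patch the cache and register the delta.
    patch(partial: SuperProps): void {
      cached = { ...cached, ...partial };
      client.register({ ...partial });
    },
    // After consent is re-enabled or identity rotates: re-register everything.
    restore(): void {
      client.register({ ...cached });
    },
  };
}
```

Patching the cache on every setter call is what lets a later restore carry the current configure state instead of the boot defaults.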
2026-05-20 13:04:20 +08:00
mzl163
210b94069a
feat(senseaudio): BYOK chat with image + video generation tools (#2065)
* feat(senseaudio): BYOK chat with image + video generation tools

Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.

- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
  connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
  (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
  periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
  default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
  key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
  DesignFilesPanel, and project export pick them up automatically;
  legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns ![alt](url) image syntax with a scheme
  allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
  1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
  (7), proxy-routes (47), markdown image rendering (8)
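
The generate_video polling behaviour (5s interval, 30s progress log, 10-minute ceiling) can be sketched as follows. Everything here is illustrative: `fetchStatus` is a hypothetical wrapper around the status endpoint, the `VideoStatus` shape is assumed, and the clock and sleep are injected so the loop can be exercised without real timers.

```typescript
// Assumed status shape; the gateway's real response fields may differ.
type VideoStatus =
  | { state: "pending" }
  | { state: "done"; videoUrl: string }
  | { state: "failed"; message: string };

interface PollDeps {
  fetchStatus: () => Promise<VideoStatus>;
  sleep: (ms: number) => Promise<void>;
  now: () => number;
  log: (line: string) => void;
}

const POLL_INTERVAL_MS = 5000;
const PROGRESS_LOG_MS = 30000;
const CEILING_MS = 10 * 60 * 1000;

// Polls until the job finishes, fails, or the ceiling is reached.
async function pollVideo(deps: PollDeps): Promise<string> {
  const startedAt = deps.now();
  let lastLogAt = startedAt;
  for (;;) {
    const current = await deps.fetchStatus();
    if (current.state === "done") return current.videoUrl;
    if (current.state === "failed") throw new Error(current.message);
    const elapsed = deps.now() - startedAt;
    if (elapsed >= CEILING_MS) throw new Error("video generation timed out");
    if (deps.now() - lastLogAt >= PROGRESS_LOG_MS) {
      deps.log(`still generating (${Math.round(elapsed / 1000)}s elapsed)`);
      lastLogAt = deps.now();
    }
    await deps.sleep(POLL_INTERVAL_MS);
  }
}
```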

* fix(senseaudio): unblock image gen + design file preview switching

- SenseAudio /v1/image/sync rejected the previous size mapping with
  `参数错误:size` ("parameter error: size"); 1664x936, 936x1664, 1280x960, and 960x1280 are not in
  the gateway's accepted set. Switched to standard HD / SD sizes that
  every aspect bucket can hit: 1024×1024, 1280×720, 720×1280,
  1024×768, 768×1024. Kept the byok-tools and media.ts tables in sync
  so the BYOK chat tool and the CLI agent path both stop failing on
  non-square aspects.
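
One way to picture the shared size table is sketched below. The sizes come from the list above; the aspect-bucket names, the `WxH` string format, and the square fallback are assumptions for illustration.

```typescript
// Bucket names are assumed; only the pixel sizes come from the fix above.
type AspectBucket = "1:1" | "16:9" | "9:16" | "4:3" | "3:4";

const SIZE_BY_ASPECT: Record<AspectBucket, string> = {
  "1:1": "1024x1024",
  "16:9": "1280x720",
  "9:16": "720x1280",
  "4:3": "1024x768",
  "3:4": "768x1024",
};

function isAspectBucket(value: string): value is AspectBucket {
  return Object.prototype.hasOwnProperty.call(SIZE_BY_ASPECT, value);
}

// Unknown aspects fall back to the square size (an assumption of this sketch).
function sizeForAspect(aspect: string): string {
  return isAspectBucket(aspect) ? SIZE_BY_ASPECT[aspect] : SIZE_BY_ASPECT["1:1"];
}
```

Keeping a single table that both call sites import is one way to stop the two paths drifting apart again.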

- DesignFilesPanel's <DfPreview> was missing a key prop, so React
  reused the same iframe DOM node when the user picked a different
  file — the src prop changed but the iframe never navigated. Added
  key={previewFile.name} so the previous preview unmounts cleanly.

- Updated byok-tools + media-senseaudio-image tests for the new size
  expectations.

* docs(senseaudio): clear stale provider hint + update README

- Settings → Media → SenseAudio: clear the auto-promoted
  "Image · TTS · 70+ voices · clone" hint; the provider label alone is
  enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
  the BYOK section reflects what the daemon actually serves, and
  mention the generate_image / generate_video chat tools that ship
  with the SenseAudio path.

* fix(senseaudio): address PR #2065 review feedback

Three non-blocking review notes from @PerishCode on PR #2065:

1. Drop the dead /api/byok-image/:id route. The PR description claimed
   it was "legacy fallback for old chat history" but that storage
   layout never existed on main, so the route can only ever 400 or
   404 — never 200. Removed the handler, the isSafeByokImageId
   export, the unused createReadStream / stat / path / Request /
   Response imports, and the two byok-image regression tests.

2. Add rejectProxyPluginContext guard to the senseaudio proxy
   handler so it matches the invariant the other five proxy paths
   already enforce (plugin runs must go through /api/runs for
   snapshot pinning). Extended the existing "API fallback rejects
   plugin runs" describe to also cover /api/proxy/senseaudio/stream
   with the 409 PLUGIN_REQUIRES_DAEMON expectation.

3. Wrap the secondary image / video downloads (the URLs the
   SenseAudio gateway hands back in /v1/image/sync .url and
   /v1/video/status .video_url) in validateBaseUrlResolved so a
   malicious gateway can't point us at 169.254.169.254 (AWS / Azure
   metadata) or RFC1918 hosts via the response payload. Also passed
   `redirect: 'error'` on both fetches to match the SSRF posture
   the primary proxy fetch already uses. The new
   assertExternalAssetUrl helper lives next to executeGenerateImage
   so future tool downloads can reuse it.

Tests: 120/120 daemon tests pass; guard + typecheck green.

* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path

Follow-up to 01b1260a — the chat-tool fix in byok-tools.ts wasn't
mirrored onto the parallel renderSenseAudioImage path in media.ts.
Same attacker-controllable shape (gateway-returned `data.url`),
same one-line fix.

- Hoist assertExternalAssetUrl from byok-tools.ts into
  connectionTest.ts next to validateBaseUrlResolved so both call
  sites (the BYOK chat tool loop AND the CLI agent media dispatcher)
  share one helper. Made the error strings provider-agnostic so a
  future caller doesn't get a misleading "senseaudio" attribution
  for a Volcengine / Grok / etc. download.
- renderSenseAudioImage now runs the response url through
  assertExternalAssetUrl before fetching bytes, and passes
  redirect: 'error' to block a 3xx hop into private space.

Scope intentionally limited to the senseaudio path PerishCode
flagged; the other unguarded fetch(entry.url) call sites in
media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing
patterns and belong in a separate follow-up if the daemon wants
defense-in-depth across every provider.

Tests: 127/127 daemon tests pass; guard + typecheck green.

---------

Co-authored-by: unknown <mazeliang@sensetime.com>
2026-05-19 23:14:56 +08:00
@aaronjmars
9a64fccdc0
fix(security): resolve hostname before approving external API base URLs (#1176)
* fix(security): resolve hostname before approving external API base URLs

Before this change the daemon-side base-URL validator only inspected the
literal hostname string. A public DNS record that points at an internal
address ('internal.example.com' → 10.0.0.5) passed validation, and the
daemon would issue the upstream request anyway — turning the BYOK proxy
into an SSRF primitive against internal infrastructure.

Add a small companion ('validateBaseUrlResolved') that runs the existing
sync check, resolves the hostname with 'dns.lookup({ all: true })', and
re-applies the block-list against every resolved address. Wire it into
the wrapper the daemon already uses ('validateExternalApiBaseUrl'), so
every proxy/finalize handler picks it up without further edits.

Carve-outs match the existing sync validator:

- Loopback hostnames skip DNS (Ollama-style local LLMs still work,
  including '*.localhost' / 'lvh.me'-style names that resolve to 127.0.0.1
  per RFC 6761).
- IP literals are already vetted by the sync pass; no need to re-resolve.
- DNS resolver errors fall through to the existing fetch error path —
  a transient ENOTFOUND should not turn into a 403.
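
The resolve-then-recheck flow and two of the carve-outs can be sketched as follows. This is a simplified illustration, not `validateBaseUrlResolved`: the block-list covers only a few IPv4 ranges, the IP-literal short-circuit and IPv6 are omitted, and the resolver is injected as a hypothetical `Lookup` function instead of calling the `dns` module directly.

```typescript
interface ResolvedAddress {
  address: string;
  family: number;
}
type Lookup = (hostname: string) => Promise<ResolvedAddress[]>;
type Verdict = { ok: true } | { ok: false; reason: string };

// IPv4-only private / link-local check, kept short for the example.
function isBlockedIPv4(address: string): boolean {
  const parts = address.split(".");
  if (parts.length !== 4) return false;
  const nums = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (nums.some((n) => Number.isNaN(n) || n > 255)) return false;
  const a = nums[0] ?? -1;
  const b = nums[1] ?? -1;
  return (
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function isLoopbackHostname(hostname: string): boolean {
  return hostname === "localhost" || hostname.endsWith(".localhost");
}

async function checkResolvedHost(hostname: string, lookup: Lookup): Promise<Verdict> {
  if (isLoopbackHostname(hostname)) return { ok: true }; // carve-out: skip DNS
  let records: ResolvedAddress[];
  try {
    records = await lookup(hostname);
  } catch {
    return { ok: true }; // resolver errors are left to the fetch error path
  }
  // A single private record is enough to reject, which covers dual-stack hosts.
  const blocked = records.find((r) => r.family === 4 && isBlockedIPv4(r.address));
  return blocked ? { ok: false, reason: `resolves to ${blocked.address}` } : { ok: true };
}
```

In a Node daemon the injected lookup would typically wrap `dns.promises.lookup(hostname, { all: true })`, which returns the same `{ address, family }` records.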

The 6 callers that previously consumed the sync result now 'await' the
async wrapper. All call sites are already inside async route handlers.

Vitest coverage in apps/daemon/tests/connection-test.test.ts covers:
sync rejection passthrough, loopback / IP-literal short-circuits,
private IPv4 and IPv6 resolution, dual-stack with one private record,
public→public passes (api.openai.com), '*.localhost' resolved→loopback,
and resolver-error fallback.

Detected by: aeon (manual review + semgrep + osv + trufflehog).
Class: CWE-918 (SSRF) — DNS-based bypass.

* fix(security): cover remaining daemon baseUrl fetch surfaces + Ollama redirects

Addresses PR #1176 review feedback (lefarcen / mrcfps): the resolved-IP
wrapper only covered the proxy/finalize routes, leaving three adjacent
SSRF gaps open.

- testProviderConnection (/api/test/connection provider mode): switch
  from sync validateBaseUrl to await validateBaseUrlResolved so a
  hostname that resolves to a private IP is rejected before the daemon
  POSTs the smoke prompt upstream.
- listProviderModels (/api/provider/models): same swap. Import the
  DNS-aware helper from ./connectionTest.js since it carries the dns
  binding the daemon owns; contracts stays pure.
- Ollama proxy stream fetch: align with the other four proxy routes
  by setting redirect: 'error', so a validated public host cannot 3xx
  the daemon to a private/internal URL after the pre-fetch check.

Regression coverage:
- POST /api/provider/models — DNS spy returns 10.0.0.5 for a synthetic
  hostname; route must respond { ok: false, kind: 'forbidden' } and
  must not invoke upstream fetch.
- POST /api/test/connection provider mode — same shape.
- /api/proxy/ollama/stream — fetch mock asserts redirect: 'error' on
  the upstream Ollama call.

The existing /api/provider/models timeout test now stubs dnsPromises
so it doesn't race the probe timer against real DNS.

---------

Co-authored-by: aeon <aeon@aaronjmars.com>
2026-05-15 23:12:52 +08:00
lefarcen
22a3b99a47 Merge origin/main into preview/v0.8.0
Sync 49 commits from main. Conflicts resolved:
- .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's
  linux specs + release-stable.yml + release-preview.yml triggers
- .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder
- apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops
  summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText
- apps/web/src/components/ChatPane.tsx: kept both new imports
- apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks
- e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's
  inline dialog navigation (UI was redesigned in v0.8.0)
- nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh
2026-05-15 18:23:33 +08:00
Chris Seifert
9cf265e520
feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite (#1743)
* feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite

Claude calls `AskUserQuestion` for mid-conversation clarifications when
the natural answer is one of a small finite set of choices. Until now
the tool round trip hit two dead ends in headless mode: claude-code -p
cannot prompt the user, so it auto-errored the tool and retried 4x;
the model then hedged by also writing the same options as a markdown
bulleted list. The host had no way to feed a real `tool_result` back.

This change makes the AskUserQuestion round trip work end to end:

* Switch Claude to `--input-format stream-json`. The daemon wraps the
  prompt as a JSONL `user` message on stdin and keeps stdin OPEN, so
  later writes (a `tool_result` for the open AskUserQuestion) feed
  back into the same child instead of needing a fresh spawn.
* New `RuntimeAdapter.promptInputFormat()` ('text' default,
  'stream-json' for Claude) so the spawn loop keeps the old close-on-
  prompt behavior for every other agent.
* New `POST /api/runs/:id/tool-result` daemon endpoint and
  `submitChatRunToolResult` web helper. Body carries `toolUseId` and
  `content`; daemon writes a JSONL `user` message with the matching
  `tool_result` content block.
* Track outstanding host answers on the run (`pendingHostAnswers`)
  and close stdin on either a `usage` event or a synthesized
  `turn_end` event (extracted from `assistant.message.stop_reason`
  in `claude-stream`). Without the per-turn `turn_end` signal stdin
  would never close after the follow up turn finished and the run
  would hang until the inactivity watchdog killed it.
* System prompt: tell Claude to use AskUserQuestion for follow ups
  with 2-4 finite choices, and to STOP after the tool call instead
  of writing a markdown duplicate.
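
The stdin framing described above can be sketched roughly as follows. This is an illustration only, not the daemon's code: `encodeToolResultLine` is a hypothetical name, and the exact envelope (`type`, `message.role`, the content block fields) is an assumption based on the `tool_result` block shape the bullets mention.

```typescript
// Hypothetical helper: builds one JSONL line carrying a tool_result block.
// The envelope shape below is an assumption made for illustration.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

interface UserLine {
  type: "user";
  message: { role: "user"; content: ToolResultBlock[] };
}

function encodeToolResultLine(toolUseId: string, content: string): string {
  const line: UserLine = {
    type: "user",
    message: {
      role: "user",
      content: [{ type: "tool_result", tool_use_id: toolUseId, content }],
    },
  };
  // JSON.stringify escapes embedded newlines, so each message stays on a
  // single line and the trailing "\n" is the only record separator.
  return JSON.stringify(line) + "\n";
}
```

Because stdin stays open, a line like this could be written to the same child process once the user answers, instead of spawning a new one.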

Web UI:

* New `AskUserQuestionCard` renders the tool input as labelled chip
  buttons (single or multi select) with a Submit button styled like
  the composer's Send. On submit the answer routes through
  `submitChatRunToolResult` (live tool_result path) and falls back
  to `onSubmitForm` (plain user message) only if the run has already
  terminated. Selected chips persist across page reloads by
  re-parsing the stored `tool_result.content`.
* Hide markdown text that follows an AskUserQuestion in the same
  turn — defense in depth against the model emitting the duplicate.
* Collapse identical `AskUserQuestion` / `TodoWrite` retries inside
  any tool group to a single card. TodoWrite is a snapshot tool,
  so older calls are duplicates of state.
* Pinned TodoCard above the chat composer. The latest TodoWrite
  snapshot across the conversation renders once, expandable /
  collapsible header, count shows in-progress + completed (1/4),
  Done button dismisses when all tasks finish, soft fade gradient
  above so scrolling chat text dissolves into the panel instead of
  hard clipping under the card.
* Composer gains a top shadow that only appears when the pinned
  todo slot sits directly above it (dark mode strengthened).
* Accordion expand / collapse motion shared between TodoCard, the
  ToolGroupCard disclosure, and BashCard output via
  `grid-template-rows: 0fr -> 1fr` with `cubic-bezier(0.23, 1, 0.32, 1)`
  and asymmetric durations (200ms enter, 140ms exit) per Emil
  Kowalski's animation framework.
* Jump-to-latest button no longer unmounts on hide; slides up with
  scale 0.9 -> 1 + fade on show, slides down with scale + fade on
  hide. Always horizontally centered via `margin: 0 auto`.

i18n:

* `tool.askQuestion`, `tool.askQuestionSubmit`, `tool.askQuestionPending`,
  `tool.askQuestionAnswered`, `tool.todosExpand`, `tool.todosCollapse`,
  `tool.todosDone`, `tool.todosDismiss` added to all 18 locales.

Unblocker:

* Fix a pre-existing render loop in `ProjectView` when the user
  clicks "New conversation". `handleNewConversation` now navigates
  to the fresh conversation id synchronously after
  `setActiveConversationId` so the route-sync effect at L512 and
  the URL-sync effect at L851 do not ping-pong (route mismatch
  triggered repeated reverts; React's nested-update guard fired).

* fix(claude): order turn_end after content blocks + cover chat switching

Two follow-up fixes to the AskUserQuestion + new-conversation work:

* `claude-stream.ts` emitted `turn_end` BEFORE iterating the assistant
  message's content blocks. When claude-code lacks
  `--include-partial-messages` (older builds), tool_use events surface
  only from that loop, so the daemon's stdin-close handler saw an
  empty `pendingHostAnswers` set and closed stdin before the
  AskUserQuestion tool_use was even registered. The result: the model
  retried, hit the same race, and gave up writing the questions in
  prose. Emit `turn_end` AFTER the content loop so tool_use ids land
  in `pendingHostAnswers` first.

* `server.ts` now ignores `turn_end` events with
  `stop_reason: 'tool_use'`. That stop reason means the model paused
  to wait for a tool execution (claude-code's internal tool runner
  for Bash / Edit / Read, or a host-answered tool like
  AskUserQuestion). Either way the conversation is still in flight —
  closing stdin there would kill the follow-up response. Only the
  natural turn-end stop reasons (`end_turn`, etc.) close stdin.

* `ProjectView.handleSelectConversation` now navigates to the picked
  conversation id synchronously, mirroring the fix already in
  handleNewConversation. The route-sync effect at L512 was reverting
  the active conversation on every switch, ping-ponging with the
  URL-sync effect at L851 until React's nested-update guard fired
  with "Maximum update depth exceeded". Same bug class as the
  pre-existing new-conversation render loop.
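
The stdin-close rule from the first two bullets can be summarised in a small predicate. This is a sketch under assumptions: the event shape and the name `shouldCloseStdin` are hypothetical, and the real handler in `server.ts` may track more state.

```typescript
// Hypothetical event shapes; the daemon's real types may differ.
type StdinSignal =
  | { type: "turn_end"; stopReason: string }
  | { type: "usage" };

function shouldCloseStdin(
  signal: StdinSignal,
  pendingHostAnswers: ReadonlySet<string>,
): boolean {
  // A tool_use stop means the model paused for a tool, so the turn is
  // still in flight and stdin must stay open.
  if (signal.type === "turn_end" && signal.stopReason === "tool_use") {
    return false;
  }
  // Unanswered host tools (for example AskUserQuestion) defer the close.
  return pendingHostAnswers.size === 0;
}
```

This only works if tool_use ids are registered before `turn_end` is evaluated, which is why the ordering fix in `claude-stream.ts` matters.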

* docs(agents): capture AskUserQuestion runtime + chat UI conventions

Record the patterns this PR introduces so future contributors can find
them without spelunking server.ts:

* Agent runtime conventions — `RuntimeAgentDef.promptInputFormat`,
  `run.pendingHostAnswers` / `run.stdinOpen` lifecycle, `turn_end`
  ordering rule, `POST /api/runs/:id/tool-result` endpoint shape, the
  Claude-only system prompt block that nudges AskUserQuestion, and the
  `suppressAskUserQuestionFallbackText` defense in depth.
* Chat UI conventions — URL-load vs srcDoc render mode dispatch with
  bridge disqualifiers, the dual iframe visibility swap pattern,
  `isOurIframe` plus the active-iframe re-check for signals that must
  only come from the visible iframe, pinned TodoCard via
  `PinnedTodoSlot`, count includes `in_progress`,
  `dedupeSnapshotToolRetries` for AskUserQuestion / TodoWrite stacks.
* i18n keys — 18 locale files, add the key to `types.ts` first.
* UI animation philosophy — `cubic-bezier(0.23, 1, 0.32, 1)` ease out,
  asymmetric 200/140ms enter/exit, accordion via `grid-template-rows`,
  no `transform: scale(0)`, keep mounted + toggle class for exit
  transitions instead of relying on React unmount.

* fix(claude): read promptInputFormat as field, close stdin on deferred answer

Two PR review follow-ups on the AskUserQuestion stream-json wiring.

* server.ts:4616 referenced `runtimeAdapter.promptInputFormat()` — but
  `runtimeAdapter` is not declared, imported, or assigned anywhere. The
  prior adapter abstraction was deleted in #1656; when the changes
  were folded back into the inline handler the format was moved onto
  `RuntimeAgentDef.promptInputFormat`, but this call site was missed.
  `server.ts` starts with `// @ts-nocheck` so typecheck never caught
  it — every chat run hit `ReferenceError: runtimeAdapter is not
  defined` the moment we wrote the prompt to a stdin-fed child, which
  is every agent with `promptViaStdin: true` (claude, codex, copilot,
  cursor-agent, gemini, opencode, pi, qoder). Read the format off the
  in-scope `def` and default missing values to `'text'`.

* `submitToolResultToRun` cleared the answered id from
  `pendingHostAnswers` but never closed stdin if a `turn_end` /
  `usage` event had already fired with the set non-empty (deferred
  by the event handler). The child then waited indefinitely for
  further input until the inactivity watchdog killed it, losing the
  model's follow-up response. Close stdin on the last-answer
  transition when stream-json stdin is still open.

Test: pin `promptInputFormat` for every `promptViaStdin: true` agent
so future regressions of the field-vs-method contract fail at
typecheck-adjacent test time instead of in production. The new test
asserts `typeof def.promptInputFormat` is a string (or undefined),
not a function — exactly the shape mistake the original line made.
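
A minimal sketch of the field-vs-method contract, assuming a trimmed-down agent definition. `AgentDefSketch` and both helper names are hypothetical; only `promptInputFormat`, `promptViaStdin`, and the `'text'` default come from the description above.

```typescript
type PromptInputFormat = "text" | "stream-json";

// Hypothetical, reduced stand-in for RuntimeAgentDef.
interface AgentDefSketch {
  id: string;
  promptViaStdin?: boolean;
  promptInputFormat?: PromptInputFormat;
}

// Read the format as a plain field and default missing values to 'text'.
function resolvePromptInputFormat(def: AgentDefSketch): PromptInputFormat {
  return def.promptInputFormat ?? "text";
}

// The shape check the regression test pins: a string or undefined,
// never a function.
function hasFieldShapedFormat(def: AgentDefSketch): boolean {
  const kind = typeof def.promptInputFormat;
  return kind === "string" || kind === "undefined";
}
```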

* fix(web): keep AskUserQuestion multi-select chips selected after reload when labels contain commas

`handleSubmit` joined multi-select answers with `', '` while the
reload parser split them on `','`. The pair is asymmetric: a valid
model-generated option like `"Yes, including images"` round-tripped
as `["Yes", "including images"]`, so after a page reload the locked
question card showed the user's pick as unselected — even though the
`tool_result` content the daemon actually wrote into the run was
correct, and the model saw the right answer. Bounded to post-reload
visual state, but silently confusing.

Switch to a `- ` bullet list per option, one per line, with the
parser stripping the leading `- ` back off. Newlines never appear
inside a label so the round trip is exact. The outer pairs separator
stays `\n\n` because individual answer bodies still never contain
that double-newline.
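
The round trip can be illustrated with a pair of helpers. This is a sketch, not the component's code: `encodeSelections` and `decodeSelections` are hypothetical names, and it relies on the stated invariant that labels never contain newlines.

```typescript
// One "- " bullet per selected label, one per line.
function encodeSelections(labels: string[]): string {
  return labels.map((label) => `- ${label}`).join("\n");
}

// Inverse of encodeSelections: keep bullet lines and strip the "- " prefix.
function decodeSelections(body: string): string[] {
  return body
    .split("\n")
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2));
}
```

Commas inside a label no longer matter, because the only separator is the newline between bullets.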

* chore: drop accidental personal design-system file

`design-systems/foldar/DESIGN.md` was added to the AskUserQuestion
branch in 31ac531 by mistake — it's a personal brand spec that does
not belong in the upstream design-systems catalogue. Removing it
keeps the branch's surface area scoped to the feature.
2026-05-15 15:50:27 +08:00
Tom Huang
76defffb93
Garnet hemisphere (#1702)
* feat(chat-composer): enhance mention handling and input overlay

- Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type.
- Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction.
- Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management.
- Refactored related components to ensure consistent handling of project files and mentions across the application.

This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins.

* feat(plugin-management): enhance plugin action panels and UI components

- Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins.
- Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin.
- Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience.
- Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management.

This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins.

* fix(assistant-message): refine plugin folder candidate selection logic

- Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content.
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates.
- Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection.
- Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text.

These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates.

* feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management

- Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute.
- Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins.
- Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders.
- Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components.

This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience.

* Fix PR 1702 CI blockers

* Fix PR 1702 remaining CI checks

* Prebuild AGUI adapter after install

* Restore plugin project snapshot wiring

* feat(marketplace): refactor marketplace URL handling and enhance fetching logic

- Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations.
- Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data.
- Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management.

This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities.

* Fix project auto-send cleanup spec
2026-05-14 21:12:50 +08:00
lefarcen
e1bc83a476
feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428)
* feat(analytics): scaffold PostHog product-analytics integration

- Add @open-design/contracts/analytics subpath with the 17 P0 event
  payload types, header constants, and code↔CSV enum mapping helpers.
- Add apps/daemon/src/analytics.ts with env-gated posthog-node client,
  request-scoped analytics context reader, and artifact-id anonymizer.
- Expose GET /api/analytics/config so the web bundle never embeds the
  PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST.
- Add apps/web/src/analytics module (identity + lazy posthog-js client
  + React provider) and mount it under <I18nProvider> in app/layout.

No event wiring yet — that lands in the next commit alongside trigger
points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer,
runs.ts).

* feat(analytics): wire app_launch, home_view, home_click, project_create_result

- App.tsx: fire app_launch once after first effect tick. handleCreateProject
  now emits project_create_result on both success and failure paths.
- EntryView.tsx: home_view (page) gated on agents loading so
  has_available_cli isn't transiently false; home_view (asset_panel) fires
  per top-tab change with the right result_count.
- NewProjectPanel.tsx: home_click create_button fires before delegating to
  the parent; a fresh request_id is generated here and threaded through
  onCreate so the matching project_create_result stitches via $insert_id.
- contracts/analytics: tighten createTabToTracking and topTabToTracking
  for the worktree branch's renamed tabs (live-artifact, templates).

* feat(analytics): wire settings_view + 3 settings_click events

- settings_view fires on dialog mount and on every section switch,
  carrying the active section (mapped via settingsSectionToTracking
  for the 16-section worktree layout), execution_mode, and the
  selected CLI provider id when present.
- settings_click execution_mode_tab: setMode now emits before/after
  values whenever the user toggles between Local CLI and BYOK.
- settings_click cli_provider_card: agent card onClick reports
  cli_provider_id via agentIdToTracking (kiro → other).
- settings_click byok_field: onFocus added to api_key, model select,
  and base_url inputs; provider_id widened to include google so the
  worktree's Gemini protocol slot type-checks.

* feat(analytics): wire studio_view + studio_click chat, studio_view artifact

- packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper
  produces a 16-hex anonymized id for (projectId, fileName). Stable
  cross-platform so the daemon and the web bundle resolve the same id
  without a Web Crypto round-trip; daemon now re-exports it.
- ChatComposer: studio_view chat_panel fires once per project mount,
  studio_click chat_composer fires on attachment + send buttons with
  estimated user_query_tokens (length/4) and has_attachment.
- FileViewer: studio_view artifact fires once per (project, file) at
  the dispatcher level, before any sub-viewer renders, with
  artifact_kind derived from the renderer registry / file.kind table.
- Widen TrackingExportFormat to include markdown and cloudflare_pages
  so the worktree branch's full share menu can emit verbatim.
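
For reference, a 64-bit FNV-1a hash rendered as 16 hex characters could look like the sketch below. It uses the standard FNV-1a offset basis and prime; `fnv1a64Hex` is a hypothetical name, and how the real helper combines `projectId` and `fileName` into the hashed input is not stated here, so this version just hashes one string.

```typescript
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS = BigInt("0xcbf29ce484222325");
const FNV_PRIME = BigInt("0x100000001b3");

function fnv1a64Hex(input: string): string {
  const bytes = new TextEncoder().encode(input); // hash the UTF-8 bytes
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i]);
    // Multiply, then wrap to 64 bits to emulate unsigned overflow.
    hash = BigInt.asUintN(64, hash * FNV_PRIME);
  }
  return hash.toString(16).padStart(16, "0");
}
```

The function is pure and synchronous, which fits the stated goal of resolving the same id in the daemon and the web bundle without a Web Crypto round-trip.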

* feat(analytics): wire studio_click share_option + artifact_export_result

HtmlViewer's share menu now emits both events per click via a
fireShareExport helper:

- studio_click share_option fires immediately on click with the chosen
  export_format and a fresh request_id.
- artifact_export_result fires when the export resolves — success for
  sync exporters (html, markdown, template) the moment the call
  returns, success/failed for async exporters (pdf, zip, deploy)
  via .then/.catch. The same request_id threads both events so
  PostHog stitches click → result via $insert_id.

DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages
slots; markdown is now a first-class export_format value.

Also ignore .env.local so local POSTHOG_KEY / .env-style secrets
don't get committed.

* feat(analytics): emit run_created and run_finished from the daemon

POST /api/runs now reads the analytics context off the
x-od-analytics-* headers the web client sets on every fetch, then:

- Captures run_created with project_id, conversation_id, run_id,
  model_id, agent_provider_id (mapped via agentIdToTracking),
  skill_id, design_system_id, plus the token_count_source marker.
- Schedules a run_finished capture on runs.wait(run) resolution,
  mapping succeeded/canceled/failed to success/cancelled/failed and
  reporting total_duration_ms.

Both events use a stable insert_id derived from the same uuid so
PostHog dedupes the daemon-side mirror against any future
web-side capture without double-counting.

Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay
omitted in v1 — the claude-stream parser only exposes input/output
totals today. See tracking-doc-issues.md §3.2.

* feat(analytics): emit settings_cli_test_result + settings_byok_test_result

The original BLOCKING-list assumed these CSV P0 events were not
implementable in this branch because main lacked Test buttons. The
worktree HEAD actually wires `handleTestAgent` and `handleTestProvider`
in SettingsDialog, so both events are now in scope.

- handleTestAgent emits settings_cli_test_result on success and
  failure paths with cli_provider_id mapped via agentIdToTracking,
  result drawn from result.ok / catch branch, error_code from
  result.kind or the thrown error name, and duration_ms timed via
  performance.now().
- handleTestProvider emits settings_byok_test_result analogously,
  using apiProtocol (anthropic|openai|azure|ollama|google) directly
  as provider_id — wider than the CSV's 5-value enum, documented in
  tracking-doc-issues.md §2.5.

Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps
plus matching track* helpers. AnalyticsEventName union now covers all
14 P0 events this branch supports.

* feat(analytics): gate PostHog on the existing telemetry.metrics consent

The integration now reuses the same first-launch privacy banner +
Settings → Privacy toggle that gates Langfuse, so a single user
decision controls both telemetry sinks.

- /api/analytics/config now consults the persisted AppConfigPrefs:
  it returns enabled=true only when POSTHOG_KEY is set AND the user
  has chosen "Share usage data" (telemetry.metrics === true). The
  response also echoes installationId so the web client uses the
  same anonymous id Langfuse keys off of — one identity per install,
  shared across both sinks.
- Web AnalyticsProvider:
  - Bootstrap fetch resolves installationId and threads it through
    the x-od-analytics-anonymous-id header on every /api/* fetch,
    so daemon-side captures (run_created / run_finished /
    project_create_result) land on the same person record.
  - Exposes a setConsent(granted) method that calls posthog-js's
    opt_in_capturing / opt_out_capturing, wired from App.tsx via a
    useEffect watching config.telemetry?.metrics. Toggling Privacy
    → metrics now stops/resumes events immediately, no reload.
- app_launch additionally gates on telemetry.metrics so a freshly-
  declined user fires nothing, and a freshly-opted-in user fires on
  the next reload.

* feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env

Wires PostHog product analytics through the same Langfuse-style build-
secret pipeline so official Open Design builds ship with the key while
fork builds compile without it (the integration short-circuits cleanly
when POSTHOG_KEY is absent).

tools/pack
- resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from
  process.env at packaging time, validates them (no whitespace in the
  key, http(s) URL for host, trailing-slash strip), and stamps them on
  ToolPackConfig. Fork builds without the env vars simply omit the
  fields; the daemon-side gate keeps things off in that case.
- Mac, Windows, and Linux packaged-config writers each append the two
  fields to open-design-config.json next to the existing
  telemetryRelayUrl entry.
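
The packaging-time validation could be sketched like this. The function name, the error messages, and the choice to throw on invalid input are assumptions; only the rules themselves (no whitespace in the key, http(s) host, trailing-slash strip, omit when absent) come from the description above.

```typescript
interface PosthogPackConfig {
  posthogKey?: string;
  posthogHost?: string;
}

// Hypothetical resolver: reads the two env vars and validates them.
function resolvePosthogPackConfig(
  env: Record<string, string | undefined>,
): PosthogPackConfig {
  const config: PosthogPackConfig = {};
  const key = env["POSTHOG_KEY"];
  if (key) {
    if (/\s/.test(key)) throw new Error("POSTHOG_KEY must not contain whitespace");
    config.posthogKey = key;
  }
  const host = env["POSTHOG_HOST"];
  if (host) {
    if (!/^https?:\/\/\S+$/i.test(host)) {
      throw new Error("POSTHOG_HOST must be an http(s) URL");
    }
    config.posthogHost = host.replace(/\/+$/, ""); // strip trailing slashes
  }
  // Fork builds without the env vars get an empty object.
  return config;
}
```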

apps/packaged
- RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost
  so the Electron entry and headless entry both forward them to the
  daemon sidecar.
- buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into
  the daemon child env when present. The daemon's existing analytics
  module reads these via process.env — no daemon-side changes needed.
- The headless packaged path falls back to process.env for fields the
  builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL
  is read there.

CI
- release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret)
  and POSTHOG_HOST (var) at workflow-env scope so every packaging job
  inherits them. PR / fork builds without these set simply skip the
  bake step.

Tests
- tools/pack: config.test.ts covers bake-through, fork-build omission,
  whitespace rejection, invalid-URL rejection, and trailing-slash
  normalization.
- apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv
  forwarding the keys when present and omitting them when null.

* feat(analytics): enable PostHog autocapture + perf + exceptions

Flip on the PostHog SDK's automatic diagnostic features so we capture
click paths, page transitions, web vitals, dead clicks, and browser
exceptions without scattering instrumentation through the codebase.

Privacy defense lives in one place — apps/web/src/analytics/scrub.ts —
wired in via posthog-js's `before_send` hook so every outgoing event
passes through the same audit point:

  - $autocapture / $rageclick / $dead_click / $copy_autocapture:
    strips $el_text and value/placeholder/aria-label attrs from any
    input, textarea, password input, or contenteditable element. PostHog
    autocapture does not capture input.value by default, but $el_text
    on a <textarea> reflects the typed content — that's the prompt
    body for us, so it has to be scrubbed every time.
  - $pageview / $pageleave: drops query string and fragment from
    $current_url / $referrer so any future ?q=… can't leak.
  - $exception: rewrites file:// and absolute filesystem paths in
    stack frames to app://apps/<repo-relative> so we don't ship the
    user's home directory.
  - Suppresses $opt_in entirely — duplicate of our explicit
    setConsent toggle in App.tsx.
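
As an illustration of the `$pageview` / `$pageleave` rule, dropping the query string and fragment from a URL can be as small as the helper below. `stripQueryAndFragment` is a hypothetical name and is not taken from `scrub.ts`.

```typescript
// Cut the URL at the first "?" or "#", whichever comes first.
function stripQueryAndFragment(rawUrl: string): string {
  const cut = rawUrl.search(/[?#]/);
  return cut === -1 ? rawUrl : rawUrl.slice(0, cut);
}
```

Working on the raw string rather than parsing means relative or malformed values are still truncated instead of being passed through untouched.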

Element-level defense in depth is limited to the single most sensitive
surface: the chat composer textarea gets `ph-no-capture` so PostHog
never even generates an event for clicks inside that subtree. Every
other input relies on scrub.ts — sprinkling the class through every
form would be noisy and easy to forget on new surfaces.

The existing Privacy → "Share usage data" toggle continues to gate
every new feature: posthog-js's opt_out_capturing() halts autocapture,
$pageview, $exception, web vitals, and dead clicks alongside the
explicit capture() calls — one global switch.

11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts.

* ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions

Adding posthog-js to apps/web and posthog-node to apps/daemon changed
pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by
sha256. The CI nix flake check failed with:

  specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc=
  got:       sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s=

Copy the new hash into both nix/package-web.nix and
nix/package-daemon.nix per the procedure documented in nix/README.md
§"First-build hash pinning".

* feat(analytics): unify PostHog identity with Langfuse installationId

PostHog's distinct_id is the installationId stamped by /api/analytics/
config; Langfuse already reads the same id off app-config.json to
populate trace.userId. With both sinks keying off the same anonymous
identity, dashboards can correlate user actions (PostHog events) with
LLM runs (Langfuse traces) without re-identifying.

Two gaps closed:

1. applyConsent(false) — clear posthog-js's persisted ph_*_posthog
   localStorage entry on opt-out via posthog.reset(). Without this, a
   user who opts out, then clicks Delete my data, then re-opts in
   would see PostHog stitch their new session to the deleted identity
   because bootstrap.distinctID only takes effect on first init.

2. applyIdentity(newInstallationId) — Delete my data rotates the
   installationId in app-config; App.tsx now watches config.installationId
   and calls posthog.reset() then identify(newId) so the next event
   batch is fully decoupled from the deleted one. Idempotent on
   same-id re-renders so benign config refreshes don't churn PostHog
   identities.

The fetch wrapper's x-od-analytics-anonymous-id header also flips to
the new id on rotation so daemon-side captures (run_created /
run_finished) land on the same person record from the very next API
call, not after a reload.

The end-to-end rotation flow is verified against a live PostHog
project; these unit tests pin the safety guards (no-client paths, null
inputs) since stubbing posthog-js's init-loaded callback chain is
brittle.

* fix(langfuse): require both metrics AND content consent for trace reports

Tightens the Langfuse gate so a user who shares anonymous metrics but
NOT conversation content stops emitting Langfuse traces entirely —
Langfuse is used for turn-quality evals which only make sense with
prompt/output bodies. PostHog (product analytics, content-free) stays
gated on `metrics` alone and is unaffected.

i18n: "Conversation content" → "Conversation and tool content" with
hints expanded to mention tool inputs/outputs so the consent surface
matches what the trace actually carries (en + zh-CN).

Bundled here per PR scope — change originated outside this PostHog
PR but lands cleanly on the same files; gating Langfuse strictly
on `content` makes the dual-sink consent model (PostHog = metrics,
Langfuse = metrics + content) symmetric across both i18n locales and
the daemon-side gate.

* feat(analytics): wire byok_provider_option + fix PR review P1s

Adds the BYOK protocol-chip click event (5-value provider_id mirroring
the apiProtocol Settings UI) and resolves four P1 review threads on
PR #1428.

byok_provider_option:
- New SettingsClickByokProviderOptionProps in contracts (provider_id =
  anthropic|openai|azure|google|ollama; maps to CSV's 5 values per
  tracking-doc-issues.md §2.5).
- trackSettingsClickByokProviderOption helper in apps/web/src/analytics.
- SettingsDialog hooks it on the protocol-chip onClick alongside the
  existing setApiProtocol call; is_selected reflects whether the chip
  was already active.

Review fixes:

1. client.ts (Siri-Ray): clear `initPromise` when the resolution is
   null so a Privacy → metrics opt-in after a previous decline triggers
   a fresh /api/analytics/config fetch. Without this, the disabled
   response was cached forever — first-session opt-in needed a reload
   to start sending PostHog events.

2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a
   strict same-origin + /api/ pathname check (shared
   `isSameOriginApiCall` helper). Outbound third-party URLs containing
   `/api/` (e.g. provider.example.com/api/x) no longer receive our
   x-od-analytics-* headers.

3. provider.tsx (codex-connector, lefarcen): gate header injection on
   `resolvedAnonId` being non-null. When Privacy → metrics is off,
   /api/analytics/config returns enabled=false → resolvedAnonId stays
   null → wrapper never installs → daemon can't read consent-bearing
   headers → no daemon-side PostHog event. setConsent now also clears
   resolvedAnonId on opt-out and re-fetches on opt-in.

4. daemon/analytics.ts (defense in depth): createAnalyticsService now
   takes dataDir and capture() re-reads app-config to check
   telemetry.metrics inside the fire-and-forget wrapper. Even if a
   stale header somehow reaches the daemon after opt-out, the capture
   is dropped before posthog-node.capture is called.
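
The strict check from review fix 2 can be sketched as below. The helper name matches the one mentioned above, but the signature and body are assumptions; the real helper may read the page origin from `window.location` instead of taking it as a parameter.

```typescript
// Sketch: true only for same-origin requests whose path starts with /api/.
function isSameOriginApiCall(rawUrl: string, pageOrigin: string): boolean {
  let parsed: URL;
  try {
    // Relative URLs resolve against the page origin.
    parsed = new URL(rawUrl, pageOrigin);
  } catch (e) {
    return false;
  }
  return parsed.origin === pageOrigin && parsed.pathname.startsWith("/api/");
}
```

Unlike `url.includes('/api/')`, this rejects third-party hosts and paths that merely contain `/api/` somewhere in the middle.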

* fix(web): place "Share usage data" on the right in privacy consent banner

Swap button order in PrivacyConsentModal and the in-settings ConsentCard
so the affirmative "Share usage data" lands on the right and "Not now"
on the left. Matches the OK-on-the-right pattern users expect for
primary actions.

Both buttons keep equal visual prominence (same .privacy-consent-action
styling) so the swap doesn't change the EDPB equal-prominence stance
called out in the original Langfuse telemetry spec.

* feat(analytics): populate run_finished token totals from claude-stream usage

Daemon's claude-stream parser already emits agent usage events with
input_tokens / output_tokens totals; the run service buffers them in
run.events and Langfuse reads them out the same way. The run_finished
PostHog event was leaving these fields empty.

Scan run.events for the most recent agent usage frame on terminal
transition and emit input_tokens / output_tokens / total_tokens when
present. token_count_source flips to 'provider_usage' only when at
least one count landed; runs without provider-side usage data keep
'unknown'.
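
A sketch of the scan described above, under assumed shapes: `RunEventSketch`, its camelCase count fields, and `tokenPropsFromEvents` are hypothetical, and summing whichever counts are present into `total_tokens` is an assumption about how the total is derived.

```typescript
// Hypothetical, reduced event shape.
interface RunEventSketch {
  type: string;
  inputTokens?: number;
  outputTokens?: number;
}

interface TokenProps {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
  token_count_source: "provider_usage" | "unknown";
}

function tokenPropsFromEvents(events: readonly RunEventSketch[]): TokenProps {
  // Walk backwards so the most recent usage frame wins.
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (!event || event.type !== "usage") continue;
    const { inputTokens, outputTokens } = event;
    // A usage frame without any counts does not flip the source.
    if (inputTokens === undefined && outputTokens === undefined) break;
    const props: TokenProps = { token_count_source: "provider_usage" };
    if (inputTokens !== undefined) props.input_tokens = inputTokens;
    if (outputTokens !== undefined) props.output_tokens = outputTokens;
    props.total_tokens = (inputTokens ?? 0) + (outputTokens ?? 0);
    return props;
  }
  return { token_count_source: "unknown" };
}
```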

The provider does not break the input down into the 7 sub-fields the
tracking doc lists (memory / context / attachment / system_prompt /
…); those stay omitted until a parser change exposes them.

* feat(analytics): estimate user_query_tokens from prompt length

The user_query_tokens field for run_created / run_finished was hardcoded
to 0. We can't tokenize without bundling a model-specific tokenizer, but
the character/4 heuristic is the industry-standard estimate when one
isn't available and is enough for funnel analysis (prompt-length cohorts,
short-vs-long-query conversion rates).

Extracted from req.body via the same telemetryPromptFromRunRequest
pattern the daemon already uses for langfuse-bridge (currentPrompt then
message fallback). Only the integer count goes to PostHog — the prompt
text itself never leaves the daemon.

token_count_source flips appropriately:
- run_created with a prompt: 'estimated' (was 'unknown')
- run_created with no prompt: 'unknown'
- run_finished with provider usage: 'provider_usage' (overrides
  baseProps' 'estimated' value)
- run_finished without provider usage: inherits 'estimated' or 'unknown'
  from baseProps, so missing input/output counts don't mask the estimate.
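
The character/4 estimate for run_created could be sketched as follows. The function name is hypothetical, and rounding up and reporting 0 tokens for a missing prompt are assumptions; only the divide-by-four heuristic and the 'estimated' / 'unknown' markers come from the description above.

```typescript
type EstimateSource = "estimated" | "unknown";

interface UserQueryEstimate {
  user_query_tokens: number;
  token_count_source: EstimateSource;
}

function estimateUserQueryTokens(prompt: string | undefined): UserQueryEstimate {
  // No prompt text: nothing to estimate from.
  if (!prompt) return { user_query_tokens: 0, token_count_source: "unknown" };
  // Roughly four characters per token; only the integer leaves the daemon.
  return {
    user_query_tokens: Math.ceil(prompt.length / 4),
    token_count_source: "estimated",
  };
}
```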
2026-05-12 22:32:42 +08:00
PerishFire
31e57fd773
fix(daemon): persist runStatus/endedAt on chat run termination (#1230)
* fix(daemon): persist runStatus/endedAt on chat run termination (#135)

POST /api/runs created the run but never reconciled the messages row
on terminal status. If the web failed to persist the cancel (refresh,
dropped PUT), the row stayed at run_status='running' / ended_at=NULL,
and on reload the elapsed timer kept climbing because the renderer
fell back to now - startedAt.

Mirror routine/orbit reconciliation: attach a wait-completion handler
that updates run_status and ended_at, guarded by COALESCE and a
run_status IN ('queued','running') filter so concurrent web persists
are not clobbered.
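
The guard can be modelled in memory to show what the COALESCE and the status filter protect against. This is a sketch with a hypothetical row type and function name, not the SQL the daemon actually runs.

```typescript
type RunStatus = "queued" | "running" | "succeeded" | "failed" | "canceled";

// Hypothetical, reduced view of a messages row.
interface MessageRowSketch {
  run_status: RunStatus;
  ended_at: number | null;
}

function reconcileOnRunEnd(
  row: MessageRowSketch,
  terminal: RunStatus,
  now: number,
): MessageRowSketch {
  // Mirrors the run_status IN ('queued','running') filter: a row the web
  // client already moved to a terminal state is left alone.
  if (row.run_status !== "queued" && row.run_status !== "running") return row;
  // Mirrors COALESCE(ended_at, now): keep an existing timestamp.
  return { run_status: terminal, ended_at: row.ended_at ?? now };
}
```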

Adds cancelRun helper and two regression specs under e2e/tests/dialog/.

* fix(daemon): annotate reconcile callback params for chat-routes

The chat run reconciliation block landed in chat-routes.ts after the
recent server-route split (#1043), where stricter type checking surfaces
implicit `any` parameters. Annotate the wait/then callback as
`{ status: string }` and the catch callback as `unknown`.

* refactor(daemon): extract reconcileAssistantMessageOnRunEnd helper

The inline if/wait/then/catch block in POST /api/runs read as a bolt-on
patch. Lift it to a named file-scope helper so the route handler stays
intent-level (start the run, arrange follow-up reconciliation) and the
guard for missing assistantMessageId is an internal detail.

The helper's docblock describes the invariant ("messages row reflects
the run's terminal state even without web persist"); commit history
keeps the issue context.

* test(e2e): wait for any terminal status in stop-reconcile spec

The earlier .catch fallback chained two waitForRunStatus calls (canceled,
then succeeded). waitForRunStatus throws on the first unexpected terminal
status, so a canceled run that resolves to failed (e.g. agent exits
non-zero on SIGTERM) would still abort the test before reaching the
messages-row assertion.

Add waitForRunTerminal to e2e/lib/vitest/runs.ts: it polls until any
terminal status is reached without throwing on a mismatch, since this
spec's claim is about the resulting messages row, not which terminal
status the run reached.
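
A sketch of the polling shape described above, assuming an injected
`getStatus` callback instead of the real HTTP lookup. The `waitForTerminal`
name, the option names, the defaults, and the exact terminal-status set are
assumptions; the helper in e2e/lib/vitest/runs.ts may differ.

```typescript
// Assumed terminal statuses, taken from the ones this log mentions.
const TERMINAL_STATUSES = new Set(["succeeded", "failed", "canceled"]);

async function waitForTerminal(
  getStatus: () => Promise<string>,
  opts: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<string> {
  const intervalMs = opts.intervalMs ?? 100;
  const timeoutMs = opts.timeoutMs ?? 10000;
  const deadline = Date.now() + timeoutMs;

  for (;;) {
    const status = await getStatus();
    // Any terminal status ends the wait; the caller decides what to assert.
    if (TERMINAL_STATUSES.has(status)) return status;
    // Only a run that never terminates is treated as an error.
    if (Date.now() >= deadline) {
      throw new Error(`run still ${status} after ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning the status instead of throwing on a mismatch lets a spec assert on
the messages row regardless of whether the run ended as canceled or failed.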

Addresses Codex inline review on PR #1230.
2026-05-11 15:37:52 +08:00
nettee
b1d440d2bd
refactor(daemon): split route registration (#1043)
* spec

* refactor(daemon): split server route registrars

* refactor(daemon): group route registrar dependencies

* refactor(daemon): move remaining domain routes out of server

* update doc

* revert spec

* fix daemon route context contract

Generated-By: looper 0.5.6 (runner=fixer, agent=opencode)

* fix media task persistence

Generated-By: looper 0.5.6 (runner=fixer, agent=opencode)

* fix: restore daemon route registrations

* fix: restore static resource mutation origin checks
2026-05-11 15:00:23 +08:00