open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-06-01 03:14:35 +07:00

Author	SHA1	Message	Date
Denis Redozubov	729ce2b0cb	feat(daemon): add run-scoped MCP tool bundles (#3244) * feat(daemon): add run-scoped MCP tool bundles * fix(daemon): keep sandbox runs in managed project dirs * fix(daemon): reject malformed run tool bundles * fix(contracts): model run-scoped mcp server inputs * fix(daemon): reject unsupported run tool bundles * fix(daemon): validate run tools before chat fallback * test(daemon): expect sandbox imported folder failure * fix(daemon): preflight sandbox project roots before run rows * fix(daemon): preflight sandbox chat project roots * fix(daemon): allow host editor for sandbox imports * fix(daemon): preflight sandbox routine project reuse * fix(daemon): reject undeliverable Claude tool bundles * fix(daemon): single-source chat route validation	2026-05-31 03:53:04 +00:00
JasonBroderick	0fbeaf829e	fix(#3247 ): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303 ) * fix(daemon): detect and strip fabricated role markers in model output (#3247) Three-layer defence against models emitting `## user` / `## assistant` / `## system` lines mid-response, which the chat host interprets as real turn boundaries and acts on as unauthorised instruction: 1. System prompt: anti-roleplay instruction elevated from a bullet under "What you don't do" to a standalone `## CRITICAL` section in `official-system.ts`, with a REMINDER pinned at the end of the composed prompt for recency bias. 2. Stream-level detection and truncation: shared `role-marker-guard.ts` module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used across all text paths — Claude stream (per-message guards), non-Claude structured streams (run-scoped guard via `emitGuardedTextDelta`), and BYOK proxy routes (`createDeltaGuard`). When a marker is detected, the contaminated suffix is dropped and a `fabricated_role_marker` event surfaces a warning in the UI. 3. UI: `StatusPill` gains `is-warning` / `is-error` CSS variants; `fabricated_role_marker` events render as amber warning pills. * fix(chat-routes): do not await reader.cancel() on stream early-return The await on reader.cancel() can hang indefinitely on response streams whose underlying source is a Uint8Array (most notably surfaced by the ollama test in proxy-routes.test.ts, which builds its mock body via `new Response(uint8array)` rather than the controller-based helper `sseResponse()`). The hung await holds the request handler open, which in turn blocks `server.close()` in the afterAll hook, producing the two test timeouts (test at 145, hook at 36) currently failing CI on #3296. Fix is in production code, not the test: don't await the cancel. It is a cleanup hint and we are returning from the function anyway, so blocking on it offers no value. 
fire-and-forget with an empty catch keeps the cancel signal flowing for real HTTP streams without risking a hang on mock/edge-case implementations. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(daemon): terminate child on role-marker detection (close #3247 generation vector) PR #3296's detection layer truncates display and persistence of fabricated role markers, but the underlying model subprocess keeps generating tokens after detection. Three concrete consequences: 1. The model bills the user for the entire contaminated response (we observed 5,106 chars stored in claude's session file for a turn where only the first 3,013 chars were legitimate — a 40% overhead). 2. tool_use blocks emitted AFTER the marker reach the daemon's dispatcher unchecked, since detection only gates the text-delta emission path, not content-block-stop / tool_use blocks. The model could fabricate "## user delete file X" then emit a tool_use(delete X) that the dispatcher would execute. 3. The UI surfaces a `fabricated_role_marker` warning followed by an eventual normal turn-end, blurring the distinction between "completed normally" and "killed by safety guard." This commit adds a single idempotent `abortForRoleMarker(marker)` helper in server.ts, scoped to the same closure as `child` and `runGuard`. On any detection event (per-message Claude guard, run-scoped non-Claude guard, plain stdout guard) the helper: - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the UI can render a security-class status distinct from a normal turn-end. The existing `fabricated_role_marker` warning is still sent and rendered as the amber pill (PR #3296's UI). - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes, Kimi, Devin, Kiro) whose I/O doesn't necessarily release on SIGTERM of the wrapper process alone. - SIGTERMs the child immediately, with the existing `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace. 
Wired into three sites where contamination is detected: - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc text_delta paths) - Plain-stdout listener (BYOK plain mode) - The Claude stream handler's onEvent (per-message guards in claude-stream.ts surface `fabricated_role_marker` events directly via onEvent rather than through the run-scoped emitGuardedTextDelta) Tool_use blocks emitted BEFORE the marker still flow through normally — this guard can't help with those, since by the time we observe a text marker the prior content block has already finished. Closing that gap requires speculative cancellation of in-flight tool calls when a downstream text block contains a marker; that's tracked as follow-up work, not included here. Co-Authored-By: roverkai <2196140098@qq.com> Co-Authored-By: JasonBroderick <jason@buddyboss.com> * refactor(role-marker-guard): bounded tail + drop chat-style markers Addresses two review comments on #3303: (1) O(1) memory + per-delta work (review r3323982225) Replace the unbounded `accumulated` string with a rolling tail capped at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice). A 50 KB assistant response delivered in 1000 chunks of 50 bytes was previously O(n²) on string concatenation alone; now it is O(1) per delta regardless of message length. The `tail.length` value carries the "already emitted" offset that the cut-point math needs, so the offset semantics at L74–78 of the prior implementation are preserved without re-introducing the full-text buffer. (2) Drop chat-style markers entirely (review r3323982234, option (a)) `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex. Rationale: - The host parses ONLY `## user` / `## assistant` / `## system` lines as turn boundaries (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts). A model emitting chat-style markers does NOT cause the original #3247 security failure. 
- With kill-on-detection wired in this PR (`abortForRoleMarker` in server.ts), a false positive aborts the whole run — far more expensive than a stray unflagged `User:` line in chat scrollback. Chat-style markers collide with legitimate output (form labels, email contacts, JSDoc) often enough that pairing them with kill-semantics is the wrong tradeoff. The tradeoff is now documented in the regex docblock so the kill-on-match behaviour is justified against the false-positive surface. Also aligns the prompt-side CRITICAL block in system.ts: drop the "don't emit User: / Assistant: / Human: / AI:" bullet, since we no longer enforce it. Less ambiguity for the model and the operators. Test file updated: - Chat-style positive tests flipped to negative ("does NOT match User: — chat-style out of scope") so the intentional exclusion has a permanent regression test. - Two new tests cover the bounded-tail behaviour: a marker arriving after 10 KB of clean text in small chunks, and a marker straddling a chunk boundary after 100 prior chunks. - Added test for legitimate `User: bob@example.com`-style content not triggering contamination. Test count is now 35 (up from 25); two of the new ones explicitly exercise the new bounded-tail path. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995) Blocking correctness bug introduced by commit 4 (bounded-tail refactor): once \`tail\` is a rolling slice of mid-stream text, \`^\` in the canonical regex \`(?:^\|\\n)\\s##\\s+(?:user\|...)\` no longer represents the genuine message start. As the rolling window slides forward chunk by chunk, a sliced tail can begin with whitespace + \`##\` (or just \`##\`), letting \`^\` anchor a match against text that the full-buffer implementation correctly ignored. 
With kill-on-detection wired in commit 3, that false positive now SIGTERMs the run and emits a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class called out in the docblock at L22–29. Reviewer's evidence (PerishCode, r3324060995): streaming "…take a look at the ## user content section…" one character at a time reports \`contaminated: true\` post-refactor; the same text in a single feed stays clean. Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very first non-empty feed (where \`^\` legitimately points at the message start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\` (\`\\n\\s##\\s+(?:user\|...)\` — drops the \`^\` alternative) for all subsequent feeds. A \`firstChunk\` boolean tracks the state. Real newline-preceded markers straddling chunk boundaries are still caught because the preceding \`\\n\` is retained inside the 64-char tail. Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`): - mid-line \`## user\` streamed char-by-char with no preceding \\n (mirrors the reviewer's repro) - space-preceded mid-line \`## user\` in a >130-char stream, which is long enough to force the rolling window past the marker — exercises the exact slice condition that triggered the bug - real \\n-preceded \`## user\` still caught after a long preamble (positive case must not regress) - \`## user\` as the very first chunk still caught (\`^\` legitimately anchors on the first feed) Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882) Two refinements addressing the third review on #3303: == Blocking (r3324151877) == The regex over-matched legitimate Markdown headings, and with kill-on-detection wired in commit 3 each false positive deterministically aborts a real run. 
Three changes tighten the match to the actual security surface — `## user` / `## assistant` / `## system` lines the chat host parses as turn boundaries — without losing any real attack pattern: 1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary delimiter is lowercase (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and the `## CRITICAL` system-prompt block already forbids only the lowercase forms. Title-Case headings like `## User Guide`, `## System Architecture`, `## Assistant settings` are now ignored — these are legitimate technical writing patterns LLMs emit constantly. `## USER NOTES` (all-caps) likewise no longer flags. 2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it, `## userland`, `## userspace`, `## users guide`, `## systemd`, `## assistance` all match via prefix in the alternation. The lookahead requires the next character to exist and to not be a lowercase letter, so: - `## user\\n…` → match (newline is not lowercase) - `## assistantR…` → match (R is uppercase; the glued-form attack pattern still gets caught) - `## assistant.` → match (. is not a letter) - `## users guide` → no match (s is lowercase letter) - `## userland` → no match (l is lowercase letter) POSITIVE rather than NEGATIVE `(?![a-z])` because the negative form is satisfied at end-of-string, which in a streaming context means "we have `## user` but don't know what comes next yet" — would fire prematurely if `land` arrives in a later chunk. The positive form delays detection by one character in that edge case, traded for correctness. 3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role markers are single-line by convention; restricting to space/tab prevents oddities like `##\\nuser` from matching across lines. 
Test file: added Title-Case fixtures (`## User Guide`, `## System Architecture`, `## Assistant settings`, `## USER NOTES`) and prefix-of-longer-word fixtures (`## users guide`, `## userland`, `## systemd`, `## assistance`) — each asserting NO contamination. The existing `## usability` negative test gave false confidence as the reviewer noted (only failed via alternation-miss, not via word-boundary semantics); the new fixtures actually exercise the lookahead. Also added a positive test for `## assistant.` (glued punctuation) to balance the existing `## assistantReading` (glued uppercase) coverage. Total tests: 35 → 50. == Non-blocking (r3324151882) == Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in `packages/contracts/src/errors.ts` alongside the existing agent/AMR codes, with a docblock comment explaining the emission contract: emitted by `server.ts::abortForRoleMarker` alongside the existing `fabricated_role_marker` warning event when the daemon detects a fabricated Markdown role marker in agent output; retryable. The code was already being emitted over the wire but unregistered — landing the registration here keeps the contract and emitter in sync as reviewer requested. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): defer complete-but-unconfirmed marker suffix Addresses review r3324277xxx — the boundary case where a stream chunk boundary lands between the role keyword and its lookahead character violated the documented "everything from the marker onward is silently dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')` returned `## user` as safe (no char to satisfy the lookahead → no match → pass through), so the fabricated marker line leaked into UI and app.sqlite before the next chunk confirmed contamination on the next SIGTERM cycle. 
Fix: introduce a `pending` state variable holding bytes that match the COMPLETE-but-unconfirmed marker prefix at end of buffer (/(?:^\|\\n)[ \\t]##[ \\t]+(?:user\|assistant\|assist\|system)$/, no lookahead, $ anchor instead). When the no-match branch detects this suffix, withhold it from emission until the next feed either: - Confirms it (next char non-lowercase) → main regex matches → contaminated → withheld bytes dropped along with `## user`. - Denies it (next char lowercase, e.g. `userl…`) → main regex no longer matches the role keyword → withheld suffix is released and emitted alongside the new continuation. Also tied the firstChunk transition to actual byte emission rather than feed count. Previously a message that starts with `## system` followed by a separate `\\n` chunk would lose the `^` anchor on the second feed (firstChunk had flipped after the first feed even though nothing was emitted yet), silently breaking detection for that edge case. Now `firstChunk` stays true until at least one byte has crossed the emission boundary, matching the conceptual definition of "message start". Tests added (apps/daemon/tests/role-marker-guard.test.ts): - `## user` deferred at chunk boundary, confirmed by `\\n` in next - `## user` deferred at chunk boundary, denied by `land` continuation - `## assistant` deferred, confirmed by punctuation - `## User` Title-Case still passes through unconditionally - `## system` as the very first chunk: deferred, confirmed by \\n in next chunk (tests the firstChunk-stays-true-when-nothing- emitted invariant) Total tests: 50 → 55. Co-Authored-By: JasonBroderick <jason@buddyboss.com> fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta Addresses review r3324xxxxxx — guarding the thinking channel buys no security and causes legitimate aborts. Why thinking is NOT a #3247 vector: - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only re-serializes `m.content` as `## ${m.role}\n...`. 
- Extended-thinking content is rendered to a separate `kind: 'thinking'` payload (daemon.ts:857-858) and never folded into `m.content`. - So a `## user` line in the thinking channel CANNOT become a fabricated turn boundary on the next round-trip. Why guarding it is harmful: - Models routinely emit literal `## user` / `## assistant` lines in chain-of-thought when reasoning about conversation structure ("Let me think about this. The user might phrase it as:\n## user\n …"). Common pattern in production traces. - With `abortForRoleMarker` wired in server.ts, a guard match on thinking SIGTERMs the run and surfaces a security error to the UI. The user paid for the reasoning, never sees the answer, and gets a confusing "fabricated role marker" warning for what was actually legitimate metacognition. - This directly contradicts the module's own stated philosophy ("a false positive aborts the whole run — a much more expensive failure than a stray unflagged ... line", role-marker-guard.ts). Fix: `emitSafeText` now passes thinking_delta through unconditionally, skipping both the guard and the contamination check. text_delta remains fully guarded. The single-line change at the top of emitSafeText preserves all other channels' behavior. Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts): - `## user` / `## assistant` lines in a thinking_delta — must NOT fire fabricated_role_marker, the thinking content streams intact including the marker text, and the subsequent text_delta answer still reaches the consumer (run not aborted). - Sanity check: same `## user` pattern in a text_delta DOES fire fabricated_role_marker and truncates emission at the marker. Locks in the channel-discriminated behavior. 
Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): tie firstChunk to slicing, not byte emission Blocking review r3324xxxxxx: under the prior firstChunk transition ("any byte emitted"), a role marker that arrived at the very start of a message with its prefix split across multiple chunks bypassed detection — reopening the #3247 vector on the Claude path. Concrete cases that were missed (all are routine provider tokenizations of \`## user\n…\` at message start): - \`##\` \| \` user\nDELETE…\` - \`## us\` \| \`er\nDELETE…\` - \`## \` \| \`user\nDELETE…\` Mechanism: the pending-deferral regex only catches COMPLETE role keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`, \`## us\`) was emitted in full. That emission flipped firstChunk to false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used, which requires a literal \n before \`##\`. A marker at buffer position 0 has no preceding \n, so it could no longer match. abortForRoleMarker never fired and tool_use blocks emitted after the fabricated turn boundary reached the dispatcher. Fix: change firstChunk to track "tail has not been sliced yet" rather than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE, tail still represents the entire emission so far and \`^\` in the canonical regex genuinely anchors at byte 0 of the stream — so the \`^\|\n\` alternation safely catches a chunk-split message-start marker. The transition happens at the moment we would slice: once emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\` becomes meaningless, and we switch to the newline-only variants. Earlier iterations of this code tried two other definitions, both unsound: - "any byte emitted" (this commit fixes) — lost \`^\` before a chunk-split message-start marker could finish arriving. 
- "newline emitted" (briefly considered as the reviewer's alternative suggestion) — left \`^\` valid on a sliced buffer when streams hadn't emitted a newline yet, re-introducing the rolling-tail mid-stream false positive from review r3324060995. The slice-based invariant satisfies both: while we have not sliced, \`^\` is correct; once we slice, it is not. Regression tests added (apps/daemon/tests/role-marker-guard.test.ts): - \`##\` \| \` user\nDELETE…\` → contaminated, marker=\`## user\` - \`## us\` \| \`er\nDELETE…\` → contaminated, marker=\`## user\` - \`## \` \| \`user\nDELETE…\` → contaminated, marker=\`## user\` - \`#\` \| \`# user\nDELETE…\` → contaminated, marker=\`## user\` The fourth case (single \`#\` first chunk) exercises an even more adversarial tokenization than the reviewer's examples; it is also caught. Total tests: 55 → 59. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(tests): wrap events in stream_event envelope in thinking test feedJsonl was feeding raw events without the `{ type: 'stream_event', event: ... }` wrapper that createClaudeStreamHandler requires (line 141 of claude-stream.ts). Events silently fell through all branches, making both tests pass vacuously. Also fix TS2532 on warnings[0].marker with non-null assertion (safe after the toHaveLength(1) guard). Co-Authored-By: RoverKai <roverkai@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: roverkai <2196140098@qq.com> Co-authored-by: JasonBroderick <jason@buddyboss.com> Co-authored-by: RoverKai <roverkai@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-30 03:57:56 +00:00
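The matching rules described in the commit above (case-sensitive keywords, space/tab-only inner whitespace, a positive lookahead after the keyword, and a deferred "complete but unconfirmed" suffix) can be sketched roughly as follows. This is a minimal illustration and not the code in `role-marker-guard.ts`: the names `ROLE_MARKER_RE`, `PENDING_MARKER_RE` and `classify` are hypothetical, the optional leading whitespace (`[ \t]*`) is an assumption based on the `\n<whitespace>## assistant` description, and the rolling 64-char tail and `firstChunk` handling are left out.

```typescript
// Hypothetical reconstruction of the pattern described in the commit message.
// Case-sensitive (no /i), [ \t] instead of \s, and a POSITIVE lookahead so a
// marker only counts once the following character is known and is not a-z.
const ROLE_MARKER_RE = /(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|system)(?=[^a-z])/;

// Same prefix anchored at end of input instead of a lookahead: a complete role
// keyword whose next character has not arrived yet.
const PENDING_MARKER_RE = /(?:^|\n)[ \t]*##[ \t]+(?:user|assistant|system)$/;

type Verdict = "contaminated" | "pending" | "clean";

// Classifies a full buffer. A real streaming guard would hold back "pending"
// text until the next chunk confirms or denies the marker.
function classify(text: string): Verdict {
  if (ROLE_MARKER_RE.test(text)) return "contaminated";
  if (PENDING_MARKER_RE.test(text)) return "pending";
  return "clean";
}

const samples = [
  "## user\nDELETE everything", // newline after keyword
  "## assistantReading",        // glued uppercase letter
  "## assistant.",              // glued punctuation
  "## user",                    // keyword complete, next char unknown
  "## userland",                // lowercase continuation
  "## User Guide",              // Title-Case heading
  "see the ## user content",    // mid-line, no preceding newline
];
for (const sample of samples) {
  console.log(JSON.stringify(sample), "->", classify(sample));
}
```

Classifying a whole buffer keeps the sketch short; the commit's bounded-tail design exists precisely because the production guard cannot afford to re-scan the full text on every delta.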
Denis Redozubov	c847ace554	Add run-scoped media execution policy (#3106) * feat(contracts): add run media execution policy * feat(daemon): enforce run media execution policy * test(daemon): cover media execution policy gates	2026-05-28 09:19:40 +00:00
Marc Chan	338cb4d423	fix(platform): support live system proxy changes (#3093) * fix(platform): support live system proxy changes * fix(platform): support live system proxy changes Honor lowercase proxy env vars within a single source before merging proxy-aware envs. Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Refresh provider request proxy env on each dispatcher creation and cover it with a focused regression test. Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): enable node env proxy for user proxy vars	2026-05-28 06:11:47 +00:00
bougie-atxp	d28acdc879	Fix Gemini BYOK model URL normalization (#2761) Co-authored-by: ATXP Earn Clowdbot <bougie-atxp@users.noreply.github.com>	2026-05-24 03:23:36 +00:00
Dhruv Rana	ecfe9b9d10	fix: gate chat token params by model family (#1675) * fix: gate chat token params by model family * fix: retry azure deployment token params * fix: retry azure v1 token params * fix: report azure retry latency * test: cover azure failed retry latency	2026-05-23 14:58:47 +00:00
lefarcen	c14baf07d3	Merge origin/main into release/v0.8.0 PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits on top of 58 release-side commits accumulated during the 0.8.0 cycle. Resolution summary: Take main (theirs) where main carried deliberate forward progress: - apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration: hardcoded English aria-labels/titles replaced with t() calls keyed on pluginCard.* (all 8 keys verified present in en.ts). - apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion feature: sortedRoutines (newest-first), sourceIngestionTemplates, patchSourceForm, submitSourceIngestion. activeCount/pausedCount semantics preserved (now keyed on sortedRoutines, count unchanged). - e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts imports needed by main-side test helpers. - e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal helper block added by main. Keep both sides where each added a different field to the same object literal: - apps/web/src/components/ProjectView.tsx (locale + analyticsHints spread). - apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints). Take release (ours) where release carried deliberate work that ships 0.8.0: - CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's Unreleased section was the same body of work, now finalized. - apps/landing-page/public/{apple-touch-icon,favicon}.png + apps/web/public/app-icon.svg — release-side visual refresh assets consistent with 0.8.0 stable ship. - tools/pack/src/linux.ts — packageVersion const required by line 466; taking main's empty line would build-error. - e2e/ui/project-management-flows.test.ts + e2e/ui/settings-api-protocol.test.ts + e2e/ui/settings-memory-routines.test.ts — release-side release-smoke hardening (shangxinyu1 + PerishFire) takes precedence on overlap. Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.	2026-05-23 12:17:18 +08:00
Fl0rencess	5b53c44e13	fix: enable SenseAudio BYOK TTS (#2570) * fix: enable SenseAudio BYOK TTS * fix: handle BYOK SenseAudio TTS failures * fix: harden SenseAudio BYOK TTS responses * fix: preserve BYOK speech tool failure kind * fix: handle versioned SenseAudio TTS base URLs	2026-05-22 14:04:29 +08:00
lefarcen	6690dbd5bb	feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558 ) * feat(analytics): PostHog + Langfuse instrumentation for assistant feedback Re-bases the original three-commit PR onto release/v0.8.0. The web-side feedback UI instrumentation (surface_view / ui_click / feedback_submit_result) landed on main while this branch was open, so on this rebase that wiring is taken from main; the remaining net additions are: - Contracts: TrackingFeedback* enums and the four dedicated assistant_feedback_* event payload types (click, reason_view, reason_click, reason_submit), plus normalizeCustomReason helper. The new event-name variants are added to TrackingEventName and the AnalyticsEventPayload discriminated union next to the existing surface_view/ui_click variants — both wire formats coexist. - POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts: thin route that validates rating, allowlists reasonCodes through a simple string filter, and fire-and-forgets into the daemon's reportFeedback hook. - apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon forwards the rating + reasonCodes into Langfuse as user_rating (NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code) score-create entries. Gates on telemetry.metrics + telemetry.content. - apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget fetch) and apps/web/src/components/ProjectView.tsx wiring so each thumbs-up/down + reason submission posts the side-channel. Conflicts resolved (release/v0.8.0 vs the branch's old base): - packages/contracts/src/analytics/events.ts: keep main's file_upload_result / feedback_submit_result / settings_* event variants alongside the new assistant_feedback_* additions. - apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl, add reportFeedback closure wired into registerChatRoutes telemetry. 
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new /feedback routes; merge RegisterChatRoutesDeps to include both 'paths' and 'telemetry'. Drop PR's chat-routes-local reconcileAssistantMessageOnRunEnd helper (main has the equivalent in server.ts). - apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx: keep main's projectKindForTracking prop name and its existing emission of surface_view / ui_click / feedback_submit_result; the PR's analyticsCtx-based reason_view/click/submit emission is dropped in this rebase since it would duplicate the existing wire format. - apps/web/tests/components/: rename projectKind → projectKindForTracking to match ChatPane's current prop name. Outstanding review feedback (from the pre-rebase round, will be addressed in a follow-up commit): - AssistantMessage tests not yet passing the new feedback context to the direct render path. - ProjectView clear-feedback path skips reportChatRunFeedback, leaving stale Langfuse user_rating scores. - buildFeedbackPayload has no deletion path for previously-submitted user_rating_reason scores when the user switches thumbs. - POST /api/runs/:id/feedback always returns {status:'accepted'} even when consent is off; needs to surface skipped_consent / skipped_no_sink. - reasonCodes are filtered to string[] but not allowlisted against ChatMessageFeedbackReasonCode or deduped. fix(analytics): address review on assistant feedback rebase Picks up the in-scope correctness items from the prior review round and the rebase residue without rewriting history: - chat-routes.ts: `/feedback` now awaits the daemon's preflight outcome and echoes it as the response. The contract was already shaped as `accepted \| skipped_consent \| skipped_no_sink`, but the previous handler always returned `accepted` because the network send was fire-and-forget. 
The consent + sink decision is local (a small file read and an env-var lookup); the actual Langfuse upload still runs as a detached promise. - chat-routes.ts: reasonCodes are now allowlisted against the contract's reason-code union and deduplicated before reaching Langfuse, so a stale or replayed client can't poison the Langfuse score table with unknown categorical values or duplicate stable ids in the same batch. - langfuse-bridge.ts: split the consent + sink resolution from the fire-and-forget network send so the route can claim `accepted` honestly. The legacy `skipped_no_sink` return on app-config read failure is preserved. Contracts + comment hygiene: - TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts; add `followed_design_system` and `missed_design_system` so the analytics wire format stays aligned with the persistence shape. - langfuse-trace.ts buildFeedbackPayload: the docblock claimed the raw custom-reason text is bucketed before send. Product reversed that on 2026-05-13 (raw text now ships, consent-gated). Replace the stale comment with the real semantics + a note that there is no tombstone path for reason codes the user removes in a follow-up submission (left as scope for a later PR). - AssistantMessage.tsx: remove the now-unused `AssistantFeedbackAnalyticsCtx` interface and a stray blank-line delete from the rebase; restore the analytics-context comment above the feedback hook. Left as follow-up (intentional, documented in code): - Sending a tombstone score when the user clears their rating — ProjectView still skips reportChatRunFeedback on `change===null`, so Langfuse retains the previous rating until the user re-submits. The PostHog event captures the clear separately. - Removing reason-code scores when the user re-submits with a smaller set — buildFeedbackPayload only overwrites the codes present in the current payload. 
* feat(analytics): wire PR's dedicated assistant_feedback_* events The four dedicated event types (`assistant_feedback_click` / `_reason_view` / `_reason_click` / `_reason_submit`) the PR added to contracts were sitting unused after the rebase because main's umbrella `surface_view` / `ui_click` / `feedback_submit_result` emissions covered the same user gestures. Wire the dedicated events alongside the umbrella ones so both wire formats fire on every feedback action — dashboards / evals can pick whichever schema they were built against without losing signal. Each dedicated event has stricter typing than its umbrella sibling (`project_id` / `project_kind` / `conversation_id` are non-null), so the new emissions are guarded behind a presence check and skipped on test renders that mount AssistantMessage without project context. The umbrella emissions retain their nullable fallbacks unchanged. Pairing: - surface_view (feedback reason panel) ↔ assistant_feedback_reason_view - ui_click (feedback button) ↔ assistant_feedback_click - ui_click (reason submit button) ↔ assistant_feedback_reason_click - feedback_submit_result ↔ assistant_feedback_reason_submit Reason click + submit share the existing `requestId` so PostHog can stitch click→result across both schemas, matching the spec.	2026-05-21 19:28:51 +08:00
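The reason-code handling described in the commit above (allowlist against the contract union, then dedupe before anything reaches Langfuse) could look roughly like the sketch below. It is not the code from `chat-routes.ts`: the function name `sanitizeReasonCodes` is hypothetical, and the allowlist holds only the two codes the commit message names, whereas the real union presumably has more members.

```typescript
// Hypothetical subset of the reason-code union. Only these two names
// appear in the commit message; the real contract likely lists more.
const ALLOWED_REASON_CODES = [
  "followed_design_system",
  "missed_design_system",
] as const;

type ReasonCode = (typeof ALLOWED_REASON_CODES)[number];

// Keep only known string codes, drop repeats, preserve first-seen order.
function sanitizeReasonCodes(input: unknown): ReasonCode[] {
  if (!Array.isArray(input)) return [];
  const allowed = new Set<string>(ALLOWED_REASON_CODES);
  const seen = new Set<string>();
  const out: ReasonCode[] = [];
  for (const value of input) {
    if (typeof value !== "string") continue; // non-strings never pass
    if (!allowed.has(value)) continue; // unknown categorical value
    if (seen.has(value)) continue; // duplicate within the same batch
    seen.add(value);
    out.push(value as ReasonCode);
  }
  return out;
}
```

Filtering on the server side means a stale or replayed client cannot add unknown categories, which is the failure mode the commit calls out.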
lefarcen	204599a7ae	feat(analytics): ship PostHog v2 event schema (#2285 ) * feat(analytics): ship PostHog v2 event schema Aligns the PostHog wire format with the product team's v2 tracking spec (Open Design 埋点文档 2.0). The previous v1 catalogue defined a flat per-page event name (home_view / studio_click / settings_view…); v2 collapses everything to four core events identified through the page_name + area + element triplet so dashboards can group by surface without owning a separate event per page. Key changes - packages/contracts/src/analytics: collapse to page_view / ui_click / surface_view / _result event names; bump EVENT_SCHEMA_VERSION to 2; rename the wire field anonymous_id → device_id (value unchanged); promote the configure-state triplet (has_available_configure_cli / configure_type / configure_availability) to a global PostHog register so every event inherits it without per-helper boilerplate. - apps/web/src/analytics: rewrite the 43 trackXxx helpers behind the new typed catalogue; opt out of PostHog's built-in UA bot filter so legitimate embedded webviews, fingerprinted browsers, and the Playwright-based e2e runs ingest captures (the Privacy → "Share usage data" toggle remains the single consent gate). - apps/web components: wire P0/P1/P2 click + view + result surfaces end-to-end — left nav, toolbar, home chat composer, recent projects, new project modal, plugins / design systems / integrations / automations pages, file manager, artifact toolbar/header/share popup, feedback panel, settings sidebar / language / appearance / notifications / pets / privacy / connectors. Fixes the v1 feedback bug where action=clear_feedback_rating shipped rating=null instead of the rating being cleared. 
- apps/daemon: extend run_created / run_finished with the v2 context (entry_from / project_kind / target_platforms / fidelity / connectors / etc.), add explicit error_code classification on result=failed (run.errorCode → AGENT_SIGNAL_ → AGENT_EXIT_* → AGENT_TERMINATED_UNKNOWN), and read device_id from the new x-od-analytics-device-id header. Also moves the run_created / run_finished emission to the canonical /api/runs handler in server.ts; the chat-routes copy was shadowed by Express's earlier registration and never executed, which also meant run.clientType never made it to Langfuse — fixed in the same move. Verification - pnpm guard / pnpm typecheck clean for daemon, web, and contracts. - pnpm --filter @open-design/web test: 1645/1645 passing. - End-to-end smoke through Playwright + local PostHog ingest project 420348: every page_view (home/projects/automations/design_systems/ plugins/integrations/chat_panel/file_manager), every nav element, the new_project_modal surface_view + tab + create flow, the plugin_replacement_modal surface_view, settings_view across nine sections, settings_cli_test_result (codex CLI), the project_create_result success path, and run_created + run_finished (result=failed, error_code=AGENT_EXIT_1) all reached PostHog with the v2 schema and the expected device_id / page_name / area / element / fidelity / target_platforms props. The remaining _result events (artifact_export / feedback_submit / file_upload / plugin_replacement / settings_byok_test / settings_connector_auth) are wired in code; production traffic will trigger them. fix(analytics): preserve style category on design-systems surface chip switch The merge resolution in DesignSystemsTab incorrectly re-introduced a `setCategory('All')` call alongside the new `trackDesignSystemsTopClick` emit. 
main intentionally keeps the active style category when the surface filter refines within it; the regression was caught by the existing "keeps the style category when a surface chip refines within it" test in tests/components/DesignSystemsTab.test.tsx. * fix(analytics): address review — senseaudio passthrough + daemon-side configure-state Two follow-ups from the v2 schema review on #2285: 1. `byokProtocolToTracking()` was still falling through to `null` for `senseaudio` even though the v2 BYOK provider enum now lists it. Every `SettingsDialog` BYOK call site guards on `if (byokProviderId)`, so a user on SenseAudio was silently dropping the provider-option, field-focus, and test-result captures. Added the missing case so SenseAudio gets the same analytics coverage as the other providers. 2. The daemon-authoritative `run_created` / `run_finished` events were missing the configure-state triplet (`has_available_configure_cli` / `configure_type` / `configure_availability`) that v2 promotes to a global register on the web side. Daemon captures don't go through the PostHog global register, so dashboards couldn't segment run lifecycle by execution setup after the migration. The fix derives the triplet server-side from `detectAgents()` and the request's `agentId` before `design.analytics.capture(...)`: - has_available_configure_cli: any CLI on PATH reports installed - configure_type: 'local_cli' when the run targets an installed CLI, otherwise 'unknown' (daemon can't see BYOK keys, which live in web-client storage) - configure_availability: 'available' / 'unavailable' / 'unknown' based on the requested agent's install status, with a fallback to 'available' when any CLI is installed This keeps the v2 schema consistent across both daemon-side and web-side captures. * fix(analytics): wire setConfigureGlobals so browser events carry fresh state Third follow-up from the v2 schema review on #2285. 
The previous fix addressed senseaudio + daemon-side configure-state, but reviewer flagged that `setConfigureGlobals` was still defined-only — no caller — so every browser-side capture inherited the boot defaults (`has_available_configure_cli=false`, `configure_type='unknown'`, `configure_availability='unknown'`). PostHog dashboards therefore could not segment the new `page_view` / `ui_click` / `surface_view` events by execution setup after a user configured their environment. Changes: - `packages/contracts/src/analytics/events.ts` — add a pure `deriveConfigureGlobals(mode, agentId, agents, byokConfigured)` helper so the web client and the daemon can derive the triplet from the same source of truth. The helper covers all 5 `configure_type` buckets (`local_cli` / `byok` / `both` / `none` / `unknown`) and the 3 `configure_availability` buckets (`available` / `unavailable` / `unknown`). - `apps/web/src/App.tsx` — add a useEffect that re-derives the triplet whenever the user changes execution mode, selects a new CLI, saves a BYOK key, or the detected-agent list refreshes, then pushes it to PostHog via `analytics.setConfigureGlobals(...)`. The setter goes through the provider so the analytics module stays the single source of truth. - `apps/web/src/analytics/provider.tsx` — expose `setConfigureGlobals` on the analytics context and the test stub so consumers route through the provider boundary. - `apps/daemon/src/server.ts` — switch the daemon-side derive in `/api/runs` to the shared `deriveConfigureGlobals` helper so the authoritative run_created/run_finished captures match the web-side payload. BYOK credentials live in the web client and stay invisible to the daemon, so the daemon arm passes `byokConfigured: undefined` and falls back to the installed-CLI signal. - `apps/web/tests/analytics-configure-globals.test.ts` — new regression test that pins the derive behavior across all branches and confirms the setter actually mutates the client-side store. 
Locks the wire-up so a future refactor can't silently turn the setter back into a no-op. Verification: pnpm guard clean; daemon / web typecheck clean; web tests 1703/1703 passing (up from 1696 — 7 new tests in the configure-globals suite). * fix(analytics): emit projects page_view + drop misattributed chat_panel source Fourth review pass on PR #2285. Two follow-ups from mrcfps: 1. DesignsTab (projects landing) was emitting click events but no matching page_view. Opening /projects without clicking anything left the surface invisible in PostHog. Added a once-per-mount trackPageView({ page_name: 'projects' }) with the same ref-keyed pattern HomeView / PluginsView use. 2. ChatComposer was hard-coding source: 'recent_project' on every chat_panel page_view. The web router currently only carries projectId / conversationId / fileName, so we cannot distinguish a New-project launch from a template-pick or a Recent-projects click from this layer. A false constant would over-attribute every chat launch to 'recent_project' and break the funnel slice this schema was meant to unlock. Dropped the field for now — better no source than the wrong source — until the router grows a launch-source channel; the field is still defined as optional on PageViewProps so the channel can land in a follow-up PR. Verification: web typecheck clean; web tests 1703/1703 passing. * fix(analytics): correct plugin-replacement async result + heterogeneous upload + missing requestId Three follow-ups from the fifth review pass on PR #2285: 1. plugin_replacement_result emitted before the apply settled (`apps/web/src/components/HomeView.tsx`). The modal's confirm action was a synchronous wrapper around an async `usePlugin(...)` call, so the surrounding try/catch never observed real failures and every attempt was reported as `result=success`. Changed `PendingReplacement.confirm`
to return `Promise<void>`, made the wrapper return the underlying promise, and moved the analytics emit into an async IIFE in the click handler so the success/failure branches reflect the actual outcome. 2. file_upload_result mis-typed heterogeneous batches (`apps/web/src/components/FileWorkspace.tsx`). The earlier implementation only inspected `picked[0]`, so a mixed batch like `image.png + demo.mp4` reported `file_type=image`. Per the comment above the block ("mixed batches collapse to other"), the implementation now maps every file to a tracking type, collapses to `other` when more than one distinct type is present, and falls back to the single type otherwise. 3. project_create_result lost the click→result correlation id (`apps/web/src/components/NewProjectPanel.tsx`). The click event no longer carried the locally-generated `requestId` that `project_create_result` keeps, so the two could not be joined. `trackNewProjectModalElementClick()` now accepts an optional `{ requestId }`, mirroring the other helpers, and the create-button click threads the same id used for the result. Verification: web typecheck clean; web tests 1703/1703 passing. * fix(analytics): gate configure-state on agents probe + drop unsent run_created fields Two follow-ups from the sixth review pass on PR #2285: 1. Cold-start configure-state was stamped before fetchAgents() landed (`apps/web/src/App.tsx`). The useEffect that pushes the v2 triplet into the PostHog global register fired on first paint with `agents=[]`, so the first home/projects/plugins page_view reported `has_available_configure_cli=false` / `configure_availability=unavailable` even on machines that did have an installed CLI. The effect now waits on `agentsLoading === false` and leaves the boot defaults ('unknown'/'unknown') in place until the probe resolves. 2. Daemon read run-context fields the web never sends (`apps/daemon/src/server.ts`). 
The daemon-side run_created / run_finished baseProps read `projectKind`, `entryFrom`, `projectSource`, `targetPlatforms`, `companionSurfaces`, `fidelity`, `connectors`, `useSpeakerNotes`, `includeAnimations`, `referenceTemplate`, and `aspect` from `req.body`, but `packages/contracts/src/api/chat.ts` and `apps/web/src/providers/daemon.ts` don't carry those keys on the wire. Reading them therefore always produced null/undefined. Dropped the unsent fields from the daemon capture; a follow-up can extend the create payload to thread the real context through. The `design_system_id` field stays because the chat contract does send it. Tests: added 3 regression tests in `tests/analytics-configure-globals.test.ts` covering the boot-time gating contract (empty agents + daemon mode → unavailable / local_cli; installed agent → available; undefined agents list → unavailable). Verification: web typecheck clean; daemon typecheck clean; web tests 1706/1706 passing (up from 1703 — 3 new cold-start tests). * fix(analytics): pin mode='daemon' so missing-agent run reports unavailable Eleventh review pass on PR #2285. mrcfps flagged that `apps/daemon/src/server.ts` was calling `deriveConfigureGlobals(...)` without `mode`, so the helper fell through to the generic branch. Result: a run for an uninstalled agent was tagged `configure_availability: 'available'` whenever any OTHER CLI was on PATH, because the generic branch only looks at the cohort-wide "any installed?" signal. That precisely undermines the slice the daemon emit is trying to power. The daemon's /api/runs handler is always a daemon-mode capture (daemon is the local CLI runner — BYOK lives in the web layer), so we now pin `mode: 'daemon'` on the call site. The helper then judges `configure_availability` from the REQUESTED agent's install status and reports `unavailable` when the user picked an agent that is not installed, even if peers are. 
Added a regression case in `tests/analytics-configure-globals.test.ts`: `{ mode: 'daemon', agentId: 'codex', agents: [{claude,true},{codex,false}] }` → `{ has_available_configure_cli: true, configure_type: 'local_cli', configure_availability: 'unavailable' }`. Verification: daemon typecheck clean; web tests 1707/1707 passing (up from 1706 — 1 new regression test). * fix(analytics): hoist chat_panel page_view + thread requestId - Move chat_panel page_view emit from ChatComposer to ProjectView so it survives activeConversationId-driven ChatPane remounts. ProjectView keys the dedupe ref by project.id; the composer drops its duplicate. - Thread { requestId } into trackAssistantFeedbackReasonSubmitClick so the click pairs with the existing feedback_submit_result on the same request id (mirrors the trackNewProjectModalElementClick pattern). * fix(analytics): keep v2 super-props alive across reset and stamp design_system_source - Snapshot the register payload in client.ts on PostHog init and re-register it from applyConsent(true) and applyIdentity() so a privacy-toggle or Delete-my-data rotation does not resume capture without event_schema_version / device_id / session_id / locale / configure-state globals. setConfigureGlobals() also patches the cache so a later restore picks up the current configure state. - Stamp design_system_source on daemon-side run_created / run_finished (it is required by RunCreatedProps / RunFinishedProps). Daemon can't tell default vs user_selected vs inherited from the wire, so it derives 'unknown' when designSystemId is present, 'not_applicable' otherwise — a follow-up that threads designSystemSource through CreateRunRequest can replace this with the precise source.	2026-05-20 13:04:20 +08:00
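The daemon-mode branch of `deriveConfigureGlobals` described in the commit above can be sketched as follows. This is an illustration under assumptions, not the helper in `packages/contracts`: the `DetectedAgent` shape, the `ConfigureGlobals` type name, and the function name `deriveDaemonConfigureGlobals` are hypothetical, and only the `mode: 'daemon'` arm is modelled (the BYOK and generic branches are omitted).

```typescript
// Hypothetical shapes; the real contract types may differ.
type DetectedAgent = { id: string; installed: boolean };

type ConfigureGlobals = {
  has_available_configure_cli: boolean;
  configure_type: "local_cli" | "byok" | "both" | "none" | "unknown";
  configure_availability: "available" | "unavailable" | "unknown";
};

// Daemon-mode only: availability follows the REQUESTED agent's install
// status, not the cohort-wide "is any CLI installed?" signal.
function deriveDaemonConfigureGlobals(
  agentId: string | undefined,
  agents: DetectedAgent[] | undefined,
): ConfigureGlobals {
  const list = agents ?? []; // an undefined list behaves like an empty one
  const requested = list.find((agent) => agent.id === agentId);
  return {
    has_available_configure_cli: list.some((agent) => agent.installed),
    configure_type: "local_cli", // the daemon is the local CLI runner
    configure_availability: requested?.installed ? "available" : "unavailable",
  };
}
```

With `agentId: 'codex'` and an agent list where only `claude` is installed, this returns `has_available_configure_cli: true`, `configure_type: 'local_cli'`, and `configure_availability: 'unavailable'`, matching the regression case quoted in the commit.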
mzl163	210b94069a	feat(senseaudio): BYOK chat with image + video generation tools (#2065 ) * feat(senseaudio): BYOK chat with image + video generation tools Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's chat proxy with a tool loop so BYOK users can generate images and videos without dropping to a CLI agent. - BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route + connection-test + provider-models discovery (OpenAI-compatible wire) - Tool loop: generate_image (synchronous /v1/image/sync) and generate_video (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling, periodic progress log every 30s) - Settings dropdown + chat-composer dropdown for the BYOK image model default; generate_image's model enum lets the LLM override per call - Seed-on-success: a successful BYOK chat call idempotently mirrors the key into media-config (preserves env-resolved + already-stored keys) - Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer, DesignFilesPanel, and project export pick them up automatically; legacy /api/byok-image/:id route kept for old conversation links - Markdown renderer learns ![alt](url) image syntax with a scheme allowlist (http(s) / data:image/ / blob: / relative paths) - i18n key settings.byokImageModel across all 19 locales - 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0); 1 video model (doubao-seedance-2.0) - Tests: byok-tools (29), media-senseaudio-image (8), media-config seed (7), proxy-routes (47), markdown image rendering (8) * fix(senseaudio): unblock image gen + design file preview switching - SenseAudio /v1/image/sync rejected the previous size mapping with `参数错误：size` (1664x936, 936x1664, 1280x960, 960x1280 are not in the gateway's accepted set). Switched to standard HD / SD sizes that every aspect bucket can hit: 1024×1024, 1280×720, 720×1280, 1024×768, 768×1024. 
Kept the byok-tools and media.ts tables in sync so the BYOK chat tool and the CLI agent path both stop failing on non-square aspects. - DesignFilesPanel's <DfPreview> was missing a key prop, so React reused the same iframe DOM node when the user picked a different file — the src prop changed but the iframe never navigated. Added key={previewFile.name} so the previous preview unmounts cleanly. - Updated byok-tools + media-senseaudio-image tests for the new size expectations. * docs(senseaudio): clear stale provider hint + update README - Settings → Media → SenseAudio: clear the auto-promoted "Image · TTS · 70+ voices · clone" hint; the provider label alone is enough now that the BYOK chat surface covers image + video tooling. - README: list the new senseaudio (and missing ollama) proxy routes so the BYOK section reflects what the daemon actually serves, and mention the generate_image / generate_video chat tools that ship with the SenseAudio path. * fix(senseaudio): address PR #2065 review feedback Three non-blocking review notes from @PerishCode on PR #2065: 1. Drop the dead /api/byok-image/:id route. The PR description claimed it was "legacy fallback for old chat history" but that storage layout never existed on main, so the route can only ever 400 or 404 — never 200. Removed the handler, the isSafeByokImageId export, the unused createReadStream / stat / path / Request / Response imports, and the two byok-image regression tests. 2. Add rejectProxyPluginContext guard to the senseaudio proxy handler so it matches the invariant the other five proxy paths already enforce (plugin runs must go through /api/runs for snapshot pinning). Extended the existing "API fallback rejects plugin runs" describe to also cover /api/proxy/senseaudio/stream with the 409 PLUGIN_REQUIRES_DAEMON expectation. 3. 
Wrap the secondary image / video downloads (the URLs the SenseAudio gateway hands back in /v1/image/sync .url and /v1/video/status .video_url) in validateBaseUrlResolved so a malicious gateway can't point us at 169.254.169.254 (AWS / Azure metadata) or RFC1918 hosts via the response payload. Also passed `redirect: 'error'` on both fetches to match the SSRF posture the primary proxy fetch already uses. The new assertExternalAssetUrl helper lives next to executeGenerateImage so future tool downloads can reuse it. Tests: 120/120 daemon tests pass; guard + typecheck green. * fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path Follow-up to `01b1260a` — the chat-tool fix in byok-tools.ts wasn't mirrored onto the parallel renderSenseAudioImage path in media.ts. Same attacker-controllable shape (gateway-returned `data.url`), same one-line fix. - Hoist assertExternalAssetUrl from byok-tools.ts into connectionTest.ts next to validateBaseUrlResolved so both call sites (the BYOK chat tool loop AND the CLI agent media dispatcher) share one helper. Made the error strings provider-agnostic so a future caller doesn't get a misleading "senseaudio" attribution for a Volcengine / Grok / etc. download. - renderSenseAudioImage now runs the response url through assertExternalAssetUrl before fetching bytes, and passes redirect: 'error' to block a 3xx hop into private space. Scope intentionally limited to the senseaudio path PerishCode flagged; the other unguarded fetch(entry.url) call sites in media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing patterns and belong in a separate follow-up if the daemon wants defense-in-depth across every provider. Tests: 127/127 daemon tests pass; guard + typecheck green. --------- Co-authored-by: unknown <mazeliang@sensetime.com>	2026-05-19 23:14:56 +08:00
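The scheme allowlist the first commit in this PR gives the Markdown image renderer (http(s), `data:image/`, `blob:`, relative paths) might be checked along these lines. This is a sketch, not the renderer's code: `isAllowedImageUrl` is a hypothetical name, and rejecting protocol-relative `//host/...` URLs is an assumption the commit message does not address.

```typescript
const ALLOWED_SCHEMES = new Set(["http", "https", "blob"]);

// Returns true when a Markdown image URL is safe to render.
function isAllowedImageUrl(raw: string): boolean {
  const url = raw.trim();
  if (url === "") return false;
  // Assumption: protocol-relative URLs are rejected rather than
  // treated as relative paths.
  if (url.startsWith("//")) return false;
  // RFC 3986 scheme: a letter followed by letters, digits, "+", "-", ".".
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(url);
  if (match === null) return true; // no scheme, so a relative path
  const scheme = (match[1] ?? "").toLowerCase();
  if (scheme === "data") {
    // Only image payloads are allowed for data: URLs.
    return url.toLowerCase().startsWith("data:image/");
  }
  return ALLOWED_SCHEMES.has(scheme);
}
```

An allowlist keeps `javascript:` and non-image `data:` URLs out of rendered chat output without having to enumerate every dangerous scheme.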
@aaronjmars	9a64fccdc0	fix(security): resolve hostname before approving external API base URLs (#1176) * fix(security): resolve hostname before approving external API base URLs Before this change the daemon-side base-URL validator only inspected the literal hostname string. A public DNS record that points at an internal address ('internal.example.com' → 10.0.0.5) passed validation, and the daemon would issue the upstream request anyway — turning the BYOK proxy into an SSRF primitive against internal infrastructure. Add a small companion ('validateBaseUrlResolved') that runs the existing sync check, resolves the hostname with 'dns.lookup({ all: true })', and re-applies the block-list against every resolved address. Wire it into the wrapper the daemon already uses ('validateExternalApiBaseUrl'), so every proxy/finalize handler picks it up without further edits. Carve-outs match the existing sync validator: - Loopback hostnames skip DNS (Ollama-style local LLMs still work, including '.localhost' / 'lvh.me'-style names that resolve to 127.0.0.1 per RFC 6761). - IP literals are already vetted by the sync pass; no need to re-resolve. - DNS resolver errors fall through to the existing fetch error path — a transient ENOTFOUND should not turn into a 403. 
The 6 callers that previously consumed the sync result now 'await' the async wrapper. All call sites are already inside async route handlers. Vitest coverage in apps/daemon/tests/connection-test.test.ts covers: sync rejection passthrough, loopback / IP-literal short-circuits, private IPv4 and IPv6 resolution, dual-stack with one private record, public→public passes (api.openai.com), '.localhost' resolved→loopback, and resolver-error fallback. Detected by: aeon (manual review + semgrep + osv + trufflehog). Class: CWE-918 (SSRF) — DNS-based bypass. * fix(security): cover remaining daemon baseUrl fetch surfaces + Ollama redirects Addresses PR #1176 review feedback (lefarcen / mrcfps): the resolved-IP wrapper only covered the proxy/finalize routes, leaving three adjacent SSRF gaps open. - testProviderConnection (/api/test/connection provider mode): switch from sync validateBaseUrl to await validateBaseUrlResolved so a hostname that resolves to a private IP is rejected before the daemon POSTs the smoke prompt upstream. - listProviderModels (/api/provider/models): same swap. Import the DNS-aware helper from ./connectionTest.js since it carries the dns binding the daemon owns; contracts stays pure. - Ollama proxy stream fetch: align with the other four proxy routes by setting redirect: 'error', so a validated public host cannot 3xx the daemon to a private/internal URL after the pre-fetch check. Regression coverage: - POST /api/provider/models — DNS spy returns 10.0.0.5 for a synthetic hostname; route must respond { ok: false, kind: 'forbidden' } and must not invoke upstream fetch. - POST /api/test/connection provider mode — same shape. - /api/proxy/ollama/stream — fetch mock asserts redirect: 'error' on the upstream Ollama call. The existing /api/provider/models timeout test now stubs dnsPromises so it doesn't race the probe timer against real DNS. --------- Co-authored-by: aeon <aeon@aaronjmars.com>	2026-05-15 23:12:52 +08:00
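The resolve-then-recheck idea behind `validateBaseUrlResolved` can be sketched as below. This is not the daemon's implementation: `isBlockedAddress` and `resolvesToBlockedAddress` are hypothetical names, the block list (RFC 1918, link-local, and IPv6 unique-local / link-local ranges) is an assumption about what the real sync validator rejects, and loopback is left out because the commit treats it as a carve-out. The sketch also checks IP literals directly, whereas the commit leaves them to the sync pass.

```typescript
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

// True for addresses in private or link-local space (assumed block list).
function isBlockedAddress(address: string): boolean {
  const addr = address.toLowerCase();
  // Unwrap IPv4-mapped IPv6 such as ::ffff:10.0.0.5.
  const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/.exec(addr);
  const candidate = mapped ? (mapped[1] ?? addr) : addr;
  if (isIP(candidate) === 4) {
    const [a = 0, b = 0] = candidate.split(".").map(Number);
    return (
      a === 10 || // 10.0.0.0/8
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168) || // 192.168.0.0/16
      (a === 169 && b === 254) // 169.254.0.0/16, cloud metadata lives here
    );
  }
  if (isIP(candidate) === 6) {
    // fc00::/7 (unique local) and fe80::/10 (link-local).
    return /^f[cd][0-9a-f]{2}:/.test(candidate) || /^fe[89ab][0-9a-f]:/.test(candidate);
  }
  return false;
}

// Resolve every record for the hostname and reject if any one is blocked,
// so a dual-stack name with a single private record still fails.
async function resolvesToBlockedAddress(
  hostname: string,
  resolve = (host: string) => lookup(host, { all: true }),
): Promise<boolean> {
  if (isIP(hostname) !== 0) return isBlockedAddress(hostname);
  try {
    const records = await resolve(hostname);
    return records.some((record) => isBlockedAddress(record.address));
  } catch {
    // Resolver errors fall through; the later fetch reports the failure.
    return false;
  }
}
```

The `resolve` parameter exists only so a test can stub DNS, in the spirit of the commit's "DNS spy returns 10.0.0.5" regression cases. A check like this does not by itself prevent DNS rebinding between validation and fetch.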
lefarcen	22a3b99a47	Merge origin/main into preview/v0.8.0 Sync 49 commits from main. Conflicts resolved: - .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's linux specs + release-stable.yml + release-preview.yml triggers - .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder - apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText - apps/web/src/components/ChatPane.tsx: kept both new imports - apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks - e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's inline dialog navigation (UI was redesigned in v0.8.0) - nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh	2026-05-15 18:23:33 +08:00
Chris Seifert	9cf265e520	feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite (#1743) * feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite Claude calls `AskUserQuestion` for mid-conversation clarifications when the natural answer is one of a small finite set of choices. Until now the tool round trip hit two dead ends in headless mode: claude-code -p cannot prompt the user, so it auto-errored the tool and retried 4x; the model then hedged by also writing the same options as a markdown bulleted list. The host had no way to feed a real `tool_result` back. This change makes the AskUserQuestion round trip work end to end: * Switch Claude to `--input-format stream-json`. The daemon wraps the prompt as a JSONL `user` message on stdin and keeps stdin OPEN, so later writes (a `tool_result` for the open AskUserQuestion) feed back into the same child instead of needing a fresh spawn. * New `RuntimeAdapter.promptInputFormat()` ('text' default, 'stream-json' for Claude) so the spawn loop keeps the old close-on-prompt behavior for every other agent. * New `POST /api/runs/:id/tool-result` daemon endpoint and `submitChatRunToolResult` web helper. Body carries `toolUseId` and `content`; daemon writes a JSONL `user` message with the matching `tool_result` content block. * Track outstanding host answers on the run (`pendingHostAnswers`) and close stdin on either a `usage` event or a synthesized `turn_end` event (extracted from `assistant.message.stop_reason` in `claude-stream`). Without the per-turn `turn_end` signal stdin would never close after the follow up turn finished and the run would hang until the inactivity watchdog killed it. * System prompt: tell Claude to use AskUserQuestion for follow ups with 2-4 finite choices, and to STOP after the tool call instead of writing a markdown duplicate. 
Web UI: * New `AskUserQuestionCard` renders the tool input as labelled chip buttons (single or multi select) with a Submit button styled like the composer's Send. On submit the answer routes through `submitChatRunToolResult` (live tool_result path) and falls back to `onSubmitForm` (plain user message) only if the run has already terminated. Selected chips persist across page reloads by re parsing the stored `tool_result.content`. * Hide markdown text that follows an AskUserQuestion in the same turn — defense in depth against the model emitting the duplicate. * Collapse identical `AskUserQuestion` / `TodoWrite` retries inside any tool group to a single card. TodoWrite is a snapshot tool, so older calls are duplicates of state. * Pinned TodoCard above the chat composer. The latest TodoWrite snapshot across the conversation renders once, expandable / collapsible header, count shows in-progress + completed (1/4), Done button dismisses when all tasks finish, soft fade gradient above so scrolling chat text dissolves into the panel instead of hard clipping under the card. * Composer gains a top shadow that only appears when the pinned todo slot sits directly above it (dark mode strengthened). * Accordion expand / collapse motion shared between TodoCard, the ToolGroupCard disclosure, and BashCard output via `grid-template-rows: 0fr -> 1fr` with `cubic-bezier(0.23, 1, 0.32, 1)` and asymmetric durations (200ms enter, 140ms exit) per Emil Kowalski's animation framework. * Jump-to-latest button no longer unmounts on hide; slides up with scale 0.9 -> 1 + fade on show, slides down with scale + fade on hide. Always horizontally centered via `margin: 0 auto`. i18n: * `tool.askQuestion`, `tool.askQuestionSubmit`, `tool.askQuestionPending`, `tool.askQuestionAnswered`, `tool.todosExpand`, `tool.todosCollapse`, `tool.todosDone`, `tool.todosDismiss` added to all 18 locales. Unblocker: * Fix a pre-existing render loop in `ProjectView` when the user clicks "New conversation". 
`handleNewConversation` now navigates to the fresh conversation id synchronously after `setActiveConversationId` so the route-sync effect at L512 and the URL-sync effect at L851 do not ping pong (route mismatch triggered repeated reverts; React's nested-update guard fired). * fix(claude): order turn_end after content blocks + cover chat switching Two follow-up fixes to the AskUserQuestion + new-conversation work: * `claude-stream.ts` emitted `turn_end` BEFORE iterating the assistant message's content blocks. When claude-code lacks `--include-partial-messages` (older builds), tool_use events surface only from that loop, so the daemon's stdin-close handler saw an empty `pendingHostAnswers` set and closed stdin before the AskUserQuestion tool_use was even registered. The result: the model retried, hit the same race, and gave up writing the questions in prose. Emit `turn_end` AFTER the content loop so tool_use ids land in `pendingHostAnswers` first. * `server.ts` now ignores `turn_end` events with `stop_reason: 'tool_use'`. That stop reason means the model paused to wait for a tool execution (claude-code's internal tool runner for Bash / Edit / Read, or a host-answered tool like AskUserQuestion). Either way the conversation is still in flight — closing stdin there would kill the follow-up response. Only the natural turn-end stop reasons (`end_turn`, etc.) close stdin. * `ProjectView.handleSelectConversation` now navigates to the picked conversation id synchronously, mirroring the fix already in handleNewConversation. The route-sync effect at L512 was reverting the active conversation on every switch, ping-ponging with the URL-sync effect at L851 until React's nested-update guard fired with "Maximum update depth exceeded". Same bug class as the pre-existing new-conversation render loop. 
* docs(agents): capture AskUserQuestion runtime + chat UI conventions Record the patterns this PR introduces so future contributors can find them without spelunking server.ts: * Agent runtime conventions — `RuntimeAgentDef.promptInputFormat`, `run.pendingHostAnswers` / `run.stdinOpen` lifecycle, `turn_end` ordering rule, `POST /api/runs/:id/tool-result` endpoint shape, the Claude only system prompt block that nudges AskUserQuestion, and the `suppressAskUserQuestionFallbackText` defense in depth. * Chat UI conventions — URL-load vs srcDoc render mode dispatch with bridge disqualifiers, the dual iframe visibility swap pattern, `isOurIframe` plus the active-iframe re-check for signals that must only come from the visible iframe, pinned TodoCard via `PinnedTodoSlot`, count includes `in_progress`, `dedupeSnapshotToolRetries` for AskUserQuestion / TodoWrite stacks. * i18n keys — 18 locale files, add the key to `types.ts` first. * UI animation philosophy — `cubic-bezier(0.23, 1, 0.32, 1)` ease out, asymmetric 200/140ms enter/exit, accordion via `grid-template-rows`, no `transform: scale(0)`, keep mounted + toggle class for exit transitions instead of relying on React unmount. * fix(claude): read promptInputFormat as field, close stdin on deferred answer Two PR review follow-ups on the AskUserQuestion stream-json wiring. * server.ts:4616 referenced `runtimeAdapter.promptInputFormat()` — but `runtimeAdapter` is not declared, imported, or assigned anywhere. The prior adapter abstraction was deleted in #1656; when the changes were folded back into the inline handler the format was moved onto `RuntimeAgentDef.promptInputFormat`, but this call site was missed. `server.ts` starts with `// @ts-nocheck` so typecheck never caught it — every chat run hit `ReferenceError: runtimeAdapter is not defined` the moment we wrote the prompt to a stdin-fed child, which is every agent with `promptViaStdin: true` (claude, codex, copilot, cursor-agent, gemini, opencode, pi, qoder). 
Read the format off the in-scope `def` and default missing values to `'text'`. * `submitToolResultToRun` cleared the answered id from `pendingHostAnswers` but never closed stdin if a `turn_end` / `usage` event had already fired with the set non-empty (deferred by the event handler). The child then waited indefinitely for further input until the inactivity watchdog killed it, losing the model's follow-up response. Close stdin on the last-answer transition when stream-json stdin is still open. Test: pin `promptInputFormat` for every `promptViaStdin: true` agent so future regressions of the field-vs-method contract fail at typecheck-adjacent test time instead of in production. The new test asserts `typeof def.promptInputFormat` is a string (or undefined), not a function — exactly the shape mistake the original line made. * fix(web): keep AskUserQuestion multi-select chips selected after reload when labels contain commas `handleSubmit` joined multi-select answers with `', '` while the reload parser split them on `','`. The pair is asymmetric: a valid model-generated option like `"Yes, including images"` round-tripped as `["Yes", "including images"]`, so after a page reload the locked question card showed the user's pick as unselected — even though the `tool_result` content the daemon actually wrote into the run was correct, and the model saw the right answer. Bounded to post-reload visual state, but silently confusing. Switch to a `- ` bullet list per option, one per line, with the parser stripping the leading `- ` back off. Newlines never appear inside a label so the round trip is exact. The outer pairs separator stays `\n\n` because individual answer bodies still never contain that double-newline. * chore: drop accidental personal design-system file `design-systems/foldar/DESIGN.md` was added to the AskUserQuestion branch in `31ac531` by mistake — it's a personal brand spec that does not belong in the upstream design-systems catalogue. 
Removing it keeps the branch's surface area scoped to the feature.	2026-05-15 15:50:27 +08:00
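Illustrative note on the multi-select fix in 9cf265e520: the message describes replacing a `', '` join with a `- ` bullet list, one option per line, so that labels containing commas survive a reload. A minimal sketch of that round trip follows. The function names `encodeSelections` and `decodeSelections` are hypothetical (the commit does not name its helpers), and the trimming in the legacy path is an assumption made to reproduce the failure the message quotes.

```typescript
// Hypothetical helper names; the commit message does not show the real ones.

/** Serialize selected option labels as a "- " bullet list, one per line. */
function encodeSelections(labels: string[]): string {
  return labels.map((label) => `- ${label}`).join("\n");
}

/** Parse the bullet list back into labels, stripping the leading "- ". */
function decodeSelections(content: string): string[] {
  return content
    .split("\n")
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2));
}

const picked = ["Yes, including images", "No"];

// New format: exact as long as a label never contains a newline.
const roundTripped = decodeSelections(encodeSelections(picked));

// Legacy format (assumed shape): join on ", " and split on ",".
// A comma inside a label is indistinguishable from the separator.
const legacy = picked
  .join(", ")
  .split(",")
  .map((part) => part.trim());
```

Here `roundTripped` contains the same two labels as `picked`, while `legacy` splits the first label in two, which is the post-reload mismatch the commit describes.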
Tom Huang	76defffb93	Garnet hemisphere (#1702 ) * feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. 
- Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. 
This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec	2026-05-14 21:12:50 +08:00
lefarcen	e1bc83a476	feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428 ) * feat(analytics): scaffold PostHog product-analytics integration - Add @open-design/contracts/analytics subpath with the 17 P0 event payload types, header constants, and code↔CSV enum mapping helpers. - Add apps/daemon/src/analytics.ts with env-gated posthog-node client, request-scoped analytics context reader, and artifact-id anonymizer. - Expose GET /api/analytics/config so the web bundle never embeds the PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST. - Add apps/web/src/analytics module (identity + lazy posthog-js client + React provider) and mount it under <I18nProvider> in app/layout. No event wiring yet — that lands in the next commit alongside trigger points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer, runs.ts). * feat(analytics): wire app_launch, home_view, home_click, project_create_result - App.tsx: fire app_launch once after first effect tick. handleCreateProject now emits project_create_result on both success and failure paths. - EntryView.tsx: home_view (page) gated on agents loading so has_available_cli isn't transiently false; home_view (asset_panel) fires per top-tab change with the right result_count. - NewProjectPanel.tsx: home_click create_button fires before delegating to the parent; a fresh request_id is generated here and threaded through onCreate so the matching project_create_result stitches via $insert_id. - contracts/analytics: tighten createTabToTracking and topTabToTracking for the worktree branch's renamed tabs (live-artifact, templates). * feat(analytics): wire settings_view + 3 settings_click events - settings_view fires on dialog mount and on every section switch, carrying the active section (mapped via settingsSectionToTracking for the 16-section worktree layout), execution_mode, and the selected CLI provider id when present. 
- settings_click execution_mode_tab: setMode now emits before/after values whenever the user toggles between Local CLI and BYOK. - settings_click cli_provider_card: agent card onClick reports cli_provider_id via agentIdToTracking (kiro → other). - settings_click byok_field: onFocus added to api_key, model select, and base_url inputs; provider_id widened to include google so the worktree's Gemini protocol slot type-checks. * feat(analytics): wire studio_view + studio_click chat, studio_view artifact - packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper produces a 16-hex anonymized id for (projectId, fileName). Stable cross-platform so the daemon and the web bundle resolve the same id without a Web Crypto round-trip; daemon now re-exports it. - ChatComposer: studio_view chat_panel fires once per project mount, studio_click chat_composer fires on attachment + send buttons with estimated user_query_tokens (length/4) and has_attachment. - FileViewer: studio_view artifact fires once per (project, file) at the dispatcher level, before any sub-viewer renders, with artifact_kind derived from the renderer registry / file.kind table. - Widen TrackingExportFormat to include markdown and cloudflare_pages so the worktree branch's full share menu can emit verbatim. * feat(analytics): wire studio_click share_option + artifact_export_result HtmlViewer's share menu now emits both events per click via a fireShareExport helper: - studio_click share_option fires immediately on click with the chosen export_format and a fresh request_id. - artifact_export_result fires when the export resolves — success for sync exporters (html, markdown, template) the moment the call returns, success/failed for async exporters (pdf, zip, deploy) via .then/.catch. The same request_id threads both events so PostHog stitches click → result via $insert_id. DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages slots; markdown is now a first-class export_format value. 
Also ignore .env.local so local POSTHOG_KEY / .env-style secrets don't get committed. * feat(analytics): emit run_created and run_finished from the daemon POST /api/runs now reads the analytics context off the x-od-analytics-* headers the web client sets on every fetch, then: - Captures run_created with project_id, conversation_id, run_id, model_id, agent_provider_id (mapped via agentIdToTracking), skill_id, design_system_id, plus the token_count_source marker. - Schedules a run_finished capture on runs.wait(run) resolution, mapping succeeded/canceled/failed to success/cancelled/failed and reporting total_duration_ms. Both events use a stable insert_id derived from the same uuid so PostHog dedupes the daemon-side mirror against any future web-side capture without double-counting. Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay omitted in v1 — the claude-stream parser only exposes input/output totals today. See tracking-doc-issues.md §3.2. * feat(analytics): emit settings_cli_test_result + settings_byok_test_result The original BLOCKING-list assumed these CSV P0 events were not implementable in this branch because main lacked Test buttons. The worktree HEAD actually wires `handleTestAgent` and `handleTestProvider` in SettingsDialog, so both events are now in scope. - handleTestAgent emits settings_cli_test_result on success and failure paths with cli_provider_id mapped via agentIdToTracking, result drawn from result.ok / catch branch, error_code from result.kind or the thrown error name, and duration_ms timed via performance.now(). - handleTestProvider emits settings_byok_test_result analogously, using apiProtocol (anthropic\|openai\|azure\|ollama\|google) directly as provider_id — wider than the CSV's 5-value enum, documented in tracking-doc-issues.md §2.5. Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps plus matching track* helpers. AnalyticsEventName union now covers all 14 P0 events this branch supports. 
* feat(analytics): gate PostHog on the existing telemetry.metrics consent The integration now reuses the same first-launch privacy banner + Settings → Privacy toggle that gates Langfuse, so a single user decision controls both telemetry sinks. - /api/analytics/config now consults the persisted AppConfigPrefs: it returns enabled=true only when POSTHOG_KEY is set AND the user has chosen "Share usage data" (telemetry.metrics === true). The response also echoes installationId so the web client uses the same anonymous id Langfuse keys off of — one identity per install, shared across both sinks. - Web AnalyticsProvider: - Bootstrap fetch resolves installationId and threads it through the x-od-analytics-anonymous-id header on every /api/* fetch, so daemon-side captures (run_created / run_finished / project_create_result) land on the same person record. - Exposes a setConsent(granted) method that calls posthog-js's opt_in_capturing / opt_out_capturing, wired from App.tsx via a useEffect watching config.telemetry?.metrics. Toggling Privacy → metrics now stops/resumes events immediately, no reload. - app_launch additionally gates on telemetry.metrics so a freshly- declined user fires nothing, and a freshly-opted-in user fires on the next reload. * feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env Wires PostHog product analytics through the same Langfuse-style build- secret pipeline so official Open Design builds ship with the key while fork builds compile without it (the integration short-circuits cleanly when POSTHOG_KEY is absent). tools/pack - resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from process.env at packaging time, validates them (no whitespace in the key, http(s) URL for host, trailing-slash strip), and stamps them on ToolPackConfig. Fork builds without the env vars simply omit the fields; the daemon-side gate keeps things off in that case. 
- Mac, Windows, and Linux packaged-config writers each append the two fields to open-design-config.json next to the existing telemetryRelayUrl entry. apps/packaged - RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost so the Electron entry and headless entry both forward them to the daemon sidecar. - buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into the daemon child env when present. The daemon's existing analytics module reads these via process.env — no daemon-side changes needed. - The headless packaged path falls back to process.env for fields the builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL is read there. CI - release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret) and POSTHOG_HOST (var) at workflow-env scope so every packaging job inherits them. PR / fork builds without these set simply skip the bake step. Tests - tools/pack: config.test.ts covers bake-through, fork-build omission, whitespace rejection, invalid-URL rejection, and trailing-slash normalization. - apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv forwarding the keys when present and omitting them when null. * feat(analytics): enable PostHog autocapture + perf + exceptions Flip on the PostHog SDK's automatic diagnostic features so we capture click paths, page transitions, web vitals, dead clicks, and browser exceptions without scattering instrumentation through the codebase. Privacy defense lives in one place — apps/web/src/analytics/scrub.ts — wired in via posthog-js's `before_send` hook so every outgoing event passes through the same audit point: - $autocapture / $rageclick / $dead_click / $copy_autocapture: strips $el_text and value/placeholder/aria-label attrs from any input, textarea, password input, or contenteditable element. 
PostHog autocapture does not capture input.value by default, but $el_text on a <textarea> reflects the typed content — that's the prompt body for us, so it has to be scrubbed every time. - $pageview / $pageleave: drops query string and fragment from $current_url / $referrer so any future ?q=… can't leak. - $exception: rewrites file:// and absolute filesystem paths in stack frames to app://apps/<repo-relative> so we don't ship the user's home directory. - Suppresses $opt_in entirely — duplicate of our explicit setConsent toggle in App.tsx. Element-level defense in depth is limited to the single most sensitive surface: the chat composer textarea gets `ph-no-capture` so PostHog never even generates an event for clicks inside that subtree. Every other input relies on scrub.ts — sprinkling the class through every form would be noisy and easy to forget on new surfaces. The existing Privacy → "Share usage data" toggle continues to gate every new feature: posthog-js's opt_out_capturing() halts autocapture, $pageview, $exception, web vitals, and dead clicks alongside the explicit capture() calls — one global switch. 11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts. * ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions Adding posthog-js to apps/web and posthog-node to apps/daemon changed pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by sha256. The CI nix flake check failed with: specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc= got: sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s= Copy the new hash into both nix/package-web.nix and nix/package-daemon.nix per the procedure documented in nix/README.md §"First-build hash pinning". * feat(analytics): unify PostHog identity with Langfuse installationId PostHog's distinct_id is the installationId stamped by /api/analytics/ config; Langfuse already reads the same id off app-config.json to populate trace.userId. 
With both sinks keying off the same anonymous identity, dashboards can correlate user actions (PostHog events) with LLM runs (Langfuse traces) without re-identifying. Two gaps closed: 1. applyConsent(false) — clear posthog-js's persisted ph__posthog localStorage entry on opt-out via posthog.reset(). Without this, a user who opts out, then clicks Delete my data, then re-opts in would see PostHog stitch their new session to the deleted identity because bootstrap.distinctID only takes effect on first init. 2. applyIdentity(newInstallationId) — Delete my data rotates the installationId in app-config; App.tsx now watches config.installationId and calls posthog.reset() then identify(newId) so the next event batch is fully decoupled from the deleted one. Idempotent on same-id re-renders so benign config refreshes don't churn PostHog identities. The fetch wrapper's x-od-analytics-anonymous-id header also flips to the new id on rotation so daemon-side captures (run_created / run_finished) land on the same person record from the very next API call, not after a reload. The end-to-end rotation flow is verified against a live PostHog project; these unit tests pin the safety guards (no-client paths, null inputs) since stubbing posthog-js's init-loaded callback chain is brittle. fix(langfuse): require both metrics AND content consent for trace reports Tightens the Langfuse gate so a user who shares anonymous metrics but NOT conversation content stops emitting Langfuse traces entirely — Langfuse is used for turn-quality evals which only make sense with prompt/output bodies. PostHog (product analytics, content-free) stays gated on `metrics` alone and is unaffected. i18n: "Conversation content" → "Conversation and tool content" with hints expanded to mention tool inputs/outputs so the consent surface matches what the trace actually carries (en + zh-CN). 
Bundled here per PR scope — change originated outside this PostHog PR but lands cleanly on the same files; gating Langfuse strictly on `content` makes the dual-sink consent model (PostHog = metrics, Langfuse = metrics + content) symmetric across both i18n locales and the daemon-side gate. * feat(analytics): wire byok_provider_option + fix PR review P1s Adds the BYOK protocol-chip click event (5-value provider_id mirroring the apiProtocol Settings UI) and resolves four P1 review threads on PR #1428. byok_provider_option: - New SettingsClickByokProviderOptionProps in contracts (provider_id = anthropic\|openai\|azure\|google\|ollama; maps to CSV's 5 values per tracking-doc-issues.md §2.5). - trackSettingsClickByokProviderOption helper in apps/web/src/analytics. - SettingsDialog hooks it on the protocol-chip onClick alongside the existing setApiProtocol call; is_selected reflects whether the chip was already active. Review fixes: 1. client.ts (Siri-Ray): clear `initPromise` when the resolution is null so a Privacy → metrics opt-in after a previous decline triggers a fresh /api/analytics/config fetch. Without this, the disabled response was cached forever — first-session opt-in needed a reload to start sending PostHog events. 2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a strict same-origin + /api/ pathname check (shared `isSameOriginApiCall` helper). Outbound third-party URLs containing `/api/` (e.g. provider.example.com/api/x) no longer receive our x-od-analytics-* headers. 3. provider.tsx (codex-connector, lefarcen): gate header injection on `resolvedAnonId` being non-null. When Privacy → metrics is off, /api/analytics/config returns enabled=false → resolvedAnonId stays null → wrapper never installs → daemon can't read consent-bearing headers → no daemon-side PostHog event. setConsent now also clears resolvedAnonId on opt-out and re-fetches on opt-in. 4. 
daemon/analytics.ts (defense in depth): createAnalyticsService now takes dataDir and capture() re-reads app-config to check telemetry.metrics inside the fire-and-forget wrapper. Even if a stale header somehow reaches the daemon after opt-out, the capture is dropped before posthog-node.capture is called. * fix(web): place "Share usage data" on the right in privacy consent banner Swap button order in PrivacyConsentModal and the in-settings ConsentCard so the affirmative "Share usage data" lands on the right and "Not now" on the left. Matches the OK-on-the-right pattern users expect for primary actions. Both buttons keep equal visual prominence (same .privacy-consent-action styling) so the swap doesn't change the EDPB equal-prominence stance called out in the original Langfuse telemetry spec. * feat(analytics): populate run_finished token totals from claude-stream usage Daemon's claude-stream parser already emits agent usage events with input_tokens / output_tokens totals; the run service buffers them in run.events and Langfuse reads them out the same way. The run_finished PostHog event was leaving these fields empty. Scan run.events for the most recent agent usage frame on terminal transition and emit input_tokens / output_tokens / total_tokens when present. token_count_source flips to 'provider_usage' only when at least one count landed; runs without provider-side usage data keep 'unknown'. Provider does not break the input down into the 7 sub-fields the tracking doc lists (memory / context / attachment / system_prompt / …); those stay omitted until a parser change exposes them. * feat(analytics): estimate user_query_tokens from prompt length The user_query_tokens field for run_created / run_finished was hardcoded to 0. We can't tokenize without bundling a model-specific tokenizer, but the character/4 heuristic is the industry-standard estimate when one isn't available and is enough for funnel analysis (prompt-length cohorts, short-vs-long-query conversion rates). 
Extracted from req.body via the same telemetryPromptFromRunRequest pattern the daemon already uses for langfuse-bridge (currentPrompt then message fallback). Only the integer count goes to PostHog — the prompt text itself never leaves the daemon. token_count_source flips appropriately: - run_created with a prompt: 'estimated' (was 'unknown') - run_created with no prompt: 'unknown' - run_finished with provider usage: 'provider_usage' (overrides baseProps' 'estimated' value) - run_finished without provider usage: inherits 'estimated' or 'unknown' from baseProps so input/output absent doesn't mask the estimate.	2026-05-12 22:32:42 +08:00
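Illustrative note on the token fields in e1bc83a476: the message describes estimating `user_query_tokens` as character count divided by 4, and setting `token_count_source` to 'estimated', 'unknown', or 'provider_usage' depending on what is available. The sketch below models those rules under stated assumptions. The function names and the `ProviderUsage` shape are hypothetical, and rounding up is an assumption since the commit does not say how the division is rounded.

```typescript
type TokenCountSource = "unknown" | "estimated" | "provider_usage";

// Hypothetical shape for the provider-side usage totals.
interface ProviderUsage {
  input_tokens?: number;
  output_tokens?: number;
}

/** Character/4 heuristic. Rounding up is an assumption, not from the commit. */
function estimateUserQueryTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

/**
 * Pick the source marker: provider usage wins when at least one count
 * landed, otherwise a non-empty prompt yields an estimate.
 */
function resolveTokenCountSource(
  prompt: string | undefined,
  usage?: ProviderUsage,
): TokenCountSource {
  const hasUsage =
    usage !== undefined &&
    (usage.input_tokens !== undefined || usage.output_tokens !== undefined);
  if (hasUsage) return "provider_usage";
  return prompt !== undefined && prompt.length > 0 ? "estimated" : "unknown";
}
```

Only the integer count and the source marker would be sent to analytics in this model; the prompt text is an input to the estimate and nothing more.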
PerishFire	31e57fd773	fix(daemon): persist runStatus/endedAt on chat run termination (#1230 ) * fix(daemon): persist runStatus/endedAt on chat run termination (#135) POST /api/runs created the run but never reconciled the messages row on terminal status. If the web failed to persist the cancel (refresh, dropped PUT), the row stayed at run_status='running' / ended_at=NULL, and on reload the elapsed timer kept climbing because the renderer fell back to now - startedAt. Mirror routine/orbit reconciliation: attach a wait-completion handler that updates run_status and ended_at, guarded by COALESCE and a run_status IN ('queued','running') filter so concurrent web persists are not clobbered. Adds cancelRun helper and two regression specs under e2e/tests/dialog/. * fix(daemon): annotate reconcile callback params for chat-routes The chat run reconciliation block landed in chat-routes.ts after the recent server-route split (#1043), where stricter type checking surfaces implicit `any` parameters. Annotate the wait/then callback as `{ status: string }` and the catch callback as `unknown`. * refactor(daemon): extract reconcileAssistantMessageOnRunEnd helper The inline if/wait/then/catch block in POST /api/runs read as a bolt-on patch. Lift it to a named file-scope helper so the route handler stays intent-level (start the run, arrange follow-up reconciliation) and the guard for missing assistantMessageId is an internal detail. The helper's docblock describes the invariant ("messages row reflects the run's terminal state even without web persist"); commit history keeps the issue context. * test(e2e): wait for any terminal status in stop-reconcile spec The earlier .catch fallback chained two waitForRunStatus calls (canceled then succeeded). waitForRunStatus throws on the first non-expected terminal, so a canceled run that resolves to failed (e.g. agent exits non-zero on SIGTERM) would still abort the test before reaching the messages-row assertion. 
Add waitForRunTerminal to e2e/lib/vitest/runs.ts: polls until any terminal status without throwing on mismatch, since this spec's claim is about the resulting messages row, not which terminal the run took. Addresses Codex inline review on PR #1230.	2026-05-11 15:37:52 +08:00
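Illustrative note on the reconciliation guard in 31e57fd773: the message describes updating `run_status` and `ended_at` on run termination, guarded by COALESCE and a `run_status IN ('queued','running')` filter so a concurrent web persist is not overwritten. The sketch below is an in-memory model of that guard, not the daemon's SQL. The `MessageRow` type and `reconcileOnRunEnd` name are hypothetical.

```typescript
type RunStatus = "queued" | "running" | "succeeded" | "failed" | "canceled";

// Hypothetical in-memory stand-in for the messages row.
interface MessageRow {
  runStatus: RunStatus;
  endedAt: number | null;
}

/** Apply the terminal status only to rows that are still in flight. */
function reconcileOnRunEnd(
  row: MessageRow,
  terminal: RunStatus,
  now: number,
): MessageRow {
  // Models the `run_status IN ('queued','running')` filter:
  // a row the web client already finalized is left untouched.
  if (row.runStatus !== "queued" && row.runStatus !== "running") {
    return row;
  }
  // Models COALESCE(ended_at, now): keep an existing timestamp.
  return { runStatus: terminal, endedAt: row.endedAt ?? now };
}

const stale = reconcileOnRunEnd({ runStatus: "running", endedAt: null }, "canceled", 1000);
const persisted = reconcileOnRunEnd({ runStatus: "canceled", endedAt: 500 }, "failed", 1000);
```

In this model `stale` ends up canceled with an end time, so a reloaded timer would stop climbing, while `persisted` keeps the values the web client wrote first.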
nettee	b1d440d2bd	refactor(daemon): split route registration (#1043 ) * spec * refactor(daemon): split server route registrars * refactor(daemon): group route registrar dependencies * refactor(daemon): move remaining domain routes out of server * update doc * revert spec * fix daemon route context contract Generated-By: looper 0.5.6 (runner=fixer, agent=opencode) * fix media task persistence Generated-By: looper 0.5.6 (runner=fixer, agent=opencode) * fix: restore daemon route registrations * fix: restore static resource mutation origin checks	2026-05-11 15:00:23 +08:00

18 commits