mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
18 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
729ce2b0cb
|
feat(daemon): add run-scoped MCP tool bundles (#3244)
* feat(daemon): add run-scoped MCP tool bundles * fix(daemon): keep sandbox runs in managed project dirs * fix(daemon): reject malformed run tool bundles * fix(contracts): model run-scoped mcp server inputs * fix(daemon): reject unsupported run tool bundles * fix(daemon): validate run tools before chat fallback * test(daemon): expect sandbox imported folder failure * fix(daemon): preflight sandbox project roots before run rows * fix(daemon): preflight sandbox chat project roots * fix(daemon): allow host editor for sandbox imports * fix(daemon): preflight sandbox routine project reuse * fix(daemon): reject undeliverable Claude tool bundles * fix(daemon): single-source chat route validation |
||
|
|
0fbeaf829e
|
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247) Three-layer defence against models emitting `## user` / `## assistant` / `## system` lines mid-response, which the chat host interprets as real turn boundaries and acts on as unauthorised instruction: 1. **System prompt**: anti-roleplay instruction elevated from a bullet under "What you don't do" to a standalone `## CRITICAL` section in `official-system.ts`, with a REMINDER pinned at the end of the composed prompt for recency bias. 2. **Stream-level detection and truncation**: shared `role-marker-guard.ts` module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used across all text paths — Claude stream (per-message guards), non-Claude structured streams (run-scoped guard via `emitGuardedTextDelta`), and BYOK proxy routes (`createDeltaGuard`). When a marker is detected, the contaminated suffix is dropped and a `fabricated_role_marker` event surfaces a warning in the UI. 3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants; `fabricated_role_marker` events render as amber warning pills. * fix(chat-routes): do not await reader.cancel() on stream early-return The await on reader.cancel() can hang indefinitely on response streams whose underlying source is a Uint8Array (most notably surfaced by the ollama test in proxy-routes.test.ts, which builds its mock body via `new Response(uint8array)` rather than the controller-based helper `sseResponse()`). The hung await holds the request handler open, which in turn blocks `server.close()` in the afterAll hook, producing the two test timeouts (test at 145, hook at 36) currently failing CI on #3296. Fix is in production code, not the test: don't await the cancel. It is a cleanup hint and we are returning from the function anyway, so blocking on it offers no value. fire-and-forget with an empty catch keeps the cancel signal flowing for real HTTP streams without risking a hang on mock/edge-case implementations. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(daemon): terminate child on role-marker detection (close #3247 generation vector) PR #3296's detection layer truncates display and persistence of fabricated role markers, but the underlying model subprocess keeps generating tokens after detection. Three concrete consequences: 1. The model bills the user for the entire contaminated response (we observed 5,106 chars stored in claude's session file for a turn where only the first 3,013 chars were legitimate — a 40% overhead). 2. tool_use blocks emitted AFTER the marker reach the daemon's dispatcher unchecked, since detection only gates the text-delta emission path, not content-block-stop / tool_use blocks. The model could fabricate "## user delete file X" then emit a tool_use(delete X) that the dispatcher would execute. 3. The UI surfaces a `fabricated_role_marker` warning followed by an eventual normal turn-end, blurring the distinction between "completed normally" and "killed by safety guard." This commit adds a single idempotent `abortForRoleMarker(marker)` helper in server.ts, scoped to the same closure as `child` and `runGuard`. On any detection event (per-message Claude guard, run-scoped non-Claude guard, plain stdout guard) the helper: - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the UI can render a security-class status distinct from a normal turn-end. The existing `fabricated_role_marker` warning is still sent and rendered as the amber pill (PR #3296's UI). - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes, Kimi, Devin, Kiro) whose I/O doesn't necessarily release on SIGTERM of the wrapper process alone. - SIGTERMs the child immediately, with the existing `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace. Wired into three sites where contamination is detected: - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc text_delta paths) - Plain-stdout listener (BYOK plain mode) - The Claude stream handler's onEvent (per-message guards in claude-stream.ts surface `fabricated_role_marker` events directly via onEvent rather than through the run-scoped emitGuardedTextDelta) Tool_use blocks emitted BEFORE the marker still flow through normally — this guard can't help with those, since by the time we observe a text marker the prior content block has already finished. Closing that gap requires speculative cancellation of in-flight tool calls when a downstream text block contains a marker; that's tracked as follow-up work, not included here. Co-Authored-By: roverkai <2196140098@qq.com> Co-Authored-By: JasonBroderick <jason@buddyboss.com> * refactor(role-marker-guard): bounded tail + drop chat-style markers Addresses two review comments on #3303: (1) O(1) memory + per-delta work (review r3323982225) Replace the unbounded `accumulated` string with a rolling tail capped at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice). A 50 KB assistant response delivered in 1000 chunks of 50 bytes was previously O(n²) on string concatenation alone; now it is O(1) per delta regardless of message length. The `tail.length` value carries the "already emitted" offset that the cut-point math needs, so the offset semantics at L74–78 of the prior implementation are preserved without re-introducing the full-text buffer. (2) Drop chat-style markers entirely (review r3323982234, option (a)) `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex. Rationale: - The host parses ONLY `## user` / `## assistant` / `## system` lines as turn boundaries (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts). A model emitting chat-style markers does NOT cause the original #3247 security failure. - With kill-on-detection wired in this PR (`abortForRoleMarker` in server.ts), a false positive aborts the whole run — far more expensive than a stray unflagged `User:` line in chat scrollback. Chat-style markers collide with legitimate output (form labels, email contacts, JSDoc) often enough that pairing them with kill-semantics is the wrong tradeoff. The tradeoff is now documented in the regex docblock so the kill-on-match behaviour is justified against the false-positive surface. Also aligns the prompt-side CRITICAL block in system.ts: drop the "don't emit User: / Assistant: / Human: / AI:" bullet, since we no longer enforce it. Less ambiguity for the model and the operators. Test file updated: - Chat-style positive tests flipped to negative ("does NOT match User: — chat-style out of scope") so the intentional exclusion has a permanent regression test. - Two new tests cover the bounded-tail behaviour: a marker arriving after 10 KB of clean text in small chunks, and a marker straddling a chunk boundary after 100 prior chunks. - Added test for legitimate `User: bob@example.com`-style content not triggering contamination. Test count is now 35 (up from 25); two of the new ones explicitly exercise the new bounded-tail path. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995) Blocking correctness bug introduced by commit 4 (bounded-tail refactor): once \`tail\` is a rolling slice of mid-stream text, \`^\` in the canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents the genuine message start. As the rolling window slides forward chunk by chunk, a sliced tail can begin with whitespace + \`##\` (or just \`##\`), letting \`^\` anchor a match against text that the full-buffer implementation correctly ignored. With kill-on-detection wired in commit 3, that false positive now SIGTERMs the run and emits a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class called out in the docblock at L22–29. Reviewer's evidence (PerishCode, r3324060995): streaming "…take a look at the ## user content section…" one character at a time reports \`contaminated: true\` post-refactor; the same text in a single feed stays clean. Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very first non-empty feed (where \`^\` legitimately points at the message start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\` (\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all subsequent feeds. A \`firstChunk\` boolean tracks the state. Real newline-preceded markers straddling chunk boundaries are still caught because the preceding \`\\n\` is retained inside the 64-char tail. Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`): - mid-line \`## user\` streamed char-by-char with no preceding \\n (mirrors the reviewer's repro) - space-preceded mid-line \`## user\` in a >130-char stream, which long enough to force the rolling window past the marker — exercises the exact slice condition that triggered the bug - real \\n-preceded \`## user\` still caught after a long preamble (positive case must not regress) - \`## user\` as the very first chunk still caught (\`^\` legitimately anchors on the first feed) Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882) Two refinements addressing the third review on #3303: == Blocking (r3324151877) == The regex over-matched legitimate Markdown headings, and with kill-on-detection wired in commit 3 each false positive deterministically aborts a real run. Three changes tighten the match to the actual security surface — `## user` / `## assistant` / `## system` lines the chat host parses as turn boundaries — without losing any real attack pattern: 1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary delimiter is lowercase (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and the `## CRITICAL` system-prompt block already forbids only the lowercase forms. Title-Case headings like `## User Guide`, `## System Architecture`, `## Assistant settings` are now ignored — these are legitimate technical writing patterns LLMs emit constantly. `## USER NOTES` (all-caps) likewise no longer flags. 2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it, `## userland`, `## userspace`, `## users guide`, `## systemd`, `## assistance` all match via prefix in the alternation. The lookahead requires the next character to exist and to not be a lowercase letter, so: - `## user\\n…` → match (newline is not lowercase) - `## assistantR…` → match (R is uppercase; the glued-form attack pattern still gets caught) - `## assistant.` → match (. is not a letter) - `## users guide` → no match (s is lowercase letter) - `## userland` → no match (l is lowercase letter) POSITIVE rather than NEGATIVE `(?![a-z])` because the negative form is satisfied at end-of-string, which in a streaming context means "we have `## user` but don't know what comes next yet" — would fire prematurely if `land` arrives in a later chunk. The positive form delays detection by one character in that edge case, traded for correctness. 3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role markers are single-line by convention; restricting to space/tab prevents oddities like `##\\nuser` from matching across lines. Test file: added Title-Case fixtures (`## User Guide`, `## System Architecture`, `## Assistant settings`, `## USER NOTES`) and prefix-of-longer-word fixtures (`## users guide`, `## userland`, `## systemd`, `## assistance`) — each asserting NO contamination. The existing `## usability` negative test gave false confidence as the reviewer noted (only failed via alternation-miss, not via word-boundary semantics); the new fixtures actually exercise the lookahead. Also added a positive test for `## assistant.` (glued punctuation) to balance the existing `## assistantReading` (glued uppercase) coverage. Total tests: 35 → 50. == Non-blocking (r3324151882) == Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in `packages/contracts/src/errors.ts` alongside the existing agent/AMR codes, with a docblock comment explaining the emission contract: emitted by `server.ts::abortForRoleMarker` alongside the existing `fabricated_role_marker` warning event when the daemon detects a fabricated Markdown role marker in agent output; retryable. The code was already being emitted over the wire but unregistered — landing the registration here keeps the contract and emitter in sync as reviewer requested. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): defer complete-but-unconfirmed marker suffix Addresses review r3324277xxx — the boundary case where a stream chunk boundary lands between the role keyword and its lookahead character violated the documented "everything from the marker onward is silently dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')` returned `## user` as safe (no char to satisfy the lookahead → no match → pass through), so the fabricated marker line leaked into UI and app.sqlite before the next chunk confirmed contamination on the next SIGTERM cycle. Fix: introduce a `pending` state variable holding bytes that match the COMPLETE-but-unconfirmed marker prefix at end of buffer (/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no lookahead, $ anchor instead). When the no-match branch detects this suffix, withhold it from emission until the next feed either: - Confirms it (next char non-lowercase) → main regex matches → contaminated → withheld bytes dropped along with `## user`. - Denies it (next char lowercase, e.g. `userl…`) → main regex no longer matches the role keyword → withheld suffix is released and emitted alongside the new continuation. Also tied the firstChunk transition to actual byte emission rather than feed count. Previously a message that starts with `## system` followed by a separate `\\n` chunk would lose the `^` anchor on the second feed (firstChunk had flipped after the first feed even though nothing was emitted yet), silently breaking detection for that edge case. Now `firstChunk` stays true until at least one byte has crossed the emission boundary, matching the conceptual definition of "message start". Tests added (apps/daemon/tests/role-marker-guard.test.ts): - `## user` deferred at chunk boundary, confirmed by `\\n` in next - `## user` deferred at chunk boundary, denied by `land` continuation - `## assistant` deferred, confirmed by punctuation - `## User` Title-Case still passes through unconditionally - `## system` as the very first chunk: deferred, confirmed by \\n in next chunk (tests the firstChunk-stays-true-when-nothing- emitted invariant) Total tests: 50 → 55. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta Addresses review r3324xxxxxx — guarding the thinking channel buys no security and causes legitimate aborts. Why thinking is NOT a #3247 vector: - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only re-serializes `m.content` as `## ${m.role}\n...`. - Extended-thinking content is rendered to a separate `kind: 'thinking'` payload (daemon.ts:857-858) and never folded into `m.content`. - So a `## user` line in the thinking channel CANNOT become a fabricated turn boundary on the next round-trip. Why guarding it is harmful: - Models routinely emit literal `## user` / `## assistant` lines in chain-of-thought when reasoning about conversation structure ("Let me think about this. The user might phrase it as:\n## user\n …"). Common pattern in production traces. - With `abortForRoleMarker` wired in server.ts, a guard match on thinking SIGTERMs the run and surfaces a security error to the UI. The user paid for the reasoning, never sees the answer, and gets a confusing "fabricated role marker" warning for what was actually legitimate metacognition. - This directly contradicts the module's own stated philosophy ("a false positive aborts the whole run — a much more expensive failure than a stray unflagged ... line", role-marker-guard.ts). Fix: `emitSafeText` now passes thinking_delta through unconditionally, skipping both the guard and the contamination check. text_delta remains fully guarded. The single-line change at the top of emitSafeText preserves all other channels' behavior. Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts): - `## user` / `## assistant` lines in a thinking_delta — must NOT fire fabricated_role_marker, the thinking content streams intact including the marker text, and the subsequent text_delta answer still reaches the consumer (run not aborted). - Sanity check: same `## user` pattern in a text_delta DOES fire fabricated_role_marker and truncates emission at the marker. Locks in the channel-discriminated behavior. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): tie firstChunk to slicing, not byte emission Blocking review r3324xxxxxx: under the prior firstChunk transition ("any byte emitted"), a role marker that arrived at the very start of a message with its prefix split across multiple chunks bypassed detection — reopening the #3247 vector on the Claude path. Concrete cases that were missed (all are routine provider tokenizations of \`## user\n…\` at message start): - \`##\` | \` user\nDELETE…\` - \`## us\` | \`er\nDELETE…\` - \`## \` | \`user\nDELETE…\` Mechanism: the pending-deferral regex only catches COMPLETE role keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`, \`## us\`) was emitted in full. That emission flipped firstChunk to false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used, which requires a literal \n before \`##\`. A marker at buffer position 0 has no preceding \n, so it could no longer match. abortForRoleMarker never fired and tool_use blocks emitted after the fabricated turn boundary reached the dispatcher. Fix: change firstChunk to track "tail has not been sliced yet" rather than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE, tail still represents the entire emission so far and \`^\` in the canonical regex genuinely anchors at byte 0 of the stream — so the \`^|\n\` alternation safely catches a chunk-split message-start marker. The transition happens at the moment we would slice: once emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\` becomes meaningless, and we switch to the newline-only variants. Earlier iterations of this code tried two other definitions, both unsound: - "any byte emitted" (this commit fixes) — lost \`^\` before a chunk-split message-start marker could finish arriving. - "newline emitted" (briefly considered as the reviewer's alternative suggestion) — left \`^\` valid on a sliced buffer when streams hadn't emitted a newline yet, re-introducing the rolling-tail mid-stream false positive from review r3324060995. The slice-based invariant satisfies both: while we have not sliced, \`^\` is correct; once we slice, it is not. Regression tests added (apps/daemon/tests/role-marker-guard.test.ts): - \`##\` | \` user\nDELETE…\` → contaminated, marker=\`## user\` - \`## us\` | \`er\nDELETE…\` → contaminated, marker=\`## user\` - \`## \` | \`user\nDELETE…\` → contaminated, marker=\`## user\` - \`#\` | \`# user\nDELETE…\` → contaminated, marker=\`## user\` The fourth case (single \`#\` first chunk) exercises an even more adversarial tokenization than the reviewer's examples; it is also caught. Total tests: 55 → 59. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(tests): wrap events in stream_event envelope in thinking test feedJsonl was feeding raw events without the `{ type: 'stream_event', event: ... }` wrapper that createClaudeStreamHandler requires (line 141 of claude-stream.ts). Events silently fell through all branches, making both tests pass vacuously. Also fix TS2532 on warnings[0].marker with non-null assertion (safe after the toHaveLength(1) guard). Co-Authored-By: RoverKai <roverkai@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: roverkai <2196140098@qq.com> Co-authored-by: JasonBroderick <jason@buddyboss.com> Co-authored-by: RoverKai <roverkai@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c847ace554
|
Add run-scoped media execution policy (#3106)
* feat(contracts): add run media execution policy * feat(daemon): enforce run media execution policy * test(daemon): cover media execution policy gates |
||
|
|
338cb4d423
|
fix(platform): support live system proxy changes (#3093)
* fix(platform): support live system proxy changes * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Honor lowercase proxy env vars within a single source before merging proxy-aware envs.\n\nGenerated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Refresh provider request proxy env on each dispatcher creation and cover it with a focused regression test. Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): enable node env proxy for user proxy vars * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) * fix(platform): support live system proxy changes Generated-By: looper 0.9.1 (runner=fixer, agent=opencode) |
||
|
|
d28acdc879
|
Fix Gemini BYOK model URL normalization (#2761)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
ci / Detect CI change scopes (push) Successful in 1s
nix-check / build (push) Failing after 2s
ci / Validate Nix flake (push) Has been skipped
ci / Preflight (push) Failing after 1s
ci / Workspace unit tests (push) Failing after 1s
ci / Daemon workspace tests (push) Failing after 1s
ci / Web workspace tests (push) Failing after 1s
ci / Browser tests (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
Co-authored-by: ATXP Earn Clowdbot <bougie-atxp@users.noreply.github.com> |
||
|
|
ecfe9b9d10
|
fix: gate chat token params by model family (#1675)
* fix: gate chat token params by model family * fix: retry azure deployment token params * fix: retry azure v1 token params * fix: report azure retry latency * test: cover azure failed retry latency |
||
|
|
c14baf07d3 |
Merge origin/main into release/v0.8.0
PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits on top of 58 release-side commits accumulated during the 0.8.0 cycle. Resolution summary: Take main (theirs) where main carried deliberate forward progress: - apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration: hardcoded English aria-labels/titles replaced with t() calls keyed on pluginCard.* (all 8 keys verified present in en.ts). - apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion feature: sortedRoutines (newest-first), sourceIngestionTemplates, patchSourceForm, submitSourceIngestion. activeCount/pausedCount semantics preserved (now keyed on sortedRoutines, count unchanged). - e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts imports needed by main-side test helpers. - e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal helper block added by main. Keep both sides where each added a different field to the same object literal: - apps/web/src/components/ProjectView.tsx (locale + analyticsHints spread). - apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints). Take release (ours) where release carried deliberate work that ships 0.8.0: - CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's Unreleased section was the same body of work, now finalized. - apps/landing-page/public/{apple-touch-icon,favicon}.png + apps/web/public/app-icon.svg — release-side visual refresh assets consistent with 0.8.0 stable ship. - tools/pack/src/linux.ts — packageVersion const required by line 466; taking main's empty line would build-error. - e2e/ui/project-management-flows.test.ts + e2e/ui/settings-api-protocol.test.ts + e2e/ui/settings-memory-routines.test.ts — release-side release-smoke hardening (shangxinyu1 + PerishFire) takes precedence on overlap. Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main. |
||
|
|
5b53c44e13
|
fix: enable SenseAudio BYOK TTS (#2570)
* fix: enable SenseAudio BYOK TTS * fix: handle BYOK SenseAudio TTS failures * fix: harden SenseAudio BYOK TTS responses * fix: preserve BYOK speech tool failure kind * fix: handle versioned SenseAudio TTS base URLs |
||
|
|
6690dbd5bb
|
feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback
Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:
- Contracts: TrackingFeedback* enums and the four dedicated
assistant_feedback_* event payload types (click, reason_view,
reason_click, reason_submit), plus normalizeCustomReason helper.
The new event-name variants are added to TrackingEventName and the
AnalyticsEventPayload discriminated union next to the existing
surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
thin route that validates rating, allowlists reasonCodes through a
simple string filter, and fire-and-forgets into the daemon's
reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
forwards the rating + reasonCodes into Langfuse as user_rating
(NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
fetch) and apps/web/src/components/ProjectView.tsx wiring so each
thumbs-up/down + reason submission posts the side-channel.
Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
file_upload_result / feedback_submit_result / settings_* event
variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
/feedback routes; merge RegisterChatRoutesDeps to include both
'paths' and 'telemetry'. Drop PR's chat-routes-local
reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
keep main's projectKindForTracking prop name and its existing
emission of surface_view / ui_click / feedback_submit_result; the
PR's analyticsCtx-based reason_view/click/submit emission is dropped
in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
to match ChatPane's current prop name.
Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
ChatMessageFeedbackReasonCode or deduped.
* fix(analytics): address review on assistant feedback rebase
Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:
- chat-routes.ts: `/feedback` now awaits the daemon's preflight
outcome and echoes it as the response. The contract was already
shaped as `accepted | skipped_consent | skipped_no_sink`, but the
previous handler always returned `accepted` because the network
send was fire-and-forget. The consent + sink decision is local
(a small file read and an env-var lookup); the actual Langfuse
upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
contract's reason-code union and deduplicated before reaching
Langfuse, so a stale or replayed client can't poison the
Langfuse score table with unknown categorical values or
duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
fire-and-forget network send so the route can claim `accepted`
honestly. The legacy `skipped_no_sink` return on app-config read
failure is preserved.
Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
add `followed_design_system` and `missed_design_system` so the
analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
raw custom-reason text is bucketed before send. Product reversed
that on 2026-05-13 (raw text now ships, consent-gated). Replace
the stale comment with the real semantics + a note that there is
no tombstone path for reason codes the user removes in a
follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
`AssistantFeedbackAnalyticsCtx` interface and a stray blank-line
delete from the rebase; restore the analytics-context comment
above the feedback hook.
Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
ProjectView still skips reportChatRunFeedback on `change===null`,
so Langfuse retains the previous rating until the user re-submits.
The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
smaller set — buildFeedbackPayload only overwrites the codes
present in the current payload.
* feat(analytics): wire PR's dedicated assistant_feedback_* events
The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.
Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.
Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button) ↔ assistant_feedback_click
- ui_click (reason submit button) ↔ assistant_feedback_reason_click
- feedback_submit_result ↔ assistant_feedback_reason_submit
Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
|
||
|
|
204599a7ae
|
feat(analytics): ship PostHog v2 event schema (#2285)
* feat(analytics): ship PostHog v2 event schema
Aligns the PostHog wire format with the product team's v2 tracking
spec (Open Design 埋点文档 2.0). The previous v1 catalogue defined a
flat per-page event name (home_view / studio_click / settings_view…);
v2 collapses everything to four core events identified through the
page_name + area + element triplet so dashboards can group by surface
without owning a separate event per page.
Key changes
- packages/contracts/src/analytics: collapse to page_view / ui_click /
surface_view / *_result event names; bump EVENT_SCHEMA_VERSION to 2;
rename the wire field anonymous_id → device_id (value unchanged);
promote the configure-state triplet (has_available_configure_cli /
configure_type / configure_availability) to a global PostHog register
so every event inherits it without per-helper boilerplate.
- apps/web/src/analytics: rewrite the 43 trackXxx helpers behind the
new typed catalogue; opt out of PostHog's built-in UA bot filter so
legitimate embedded webviews, fingerprinted browsers, and the
Playwright-based e2e runs ingest captures (the Privacy → "Share
usage data" toggle remains the single consent gate).
- apps/web components: wire P0/P1/P2 click + view + result surfaces
end-to-end — left nav, toolbar, home chat composer, recent projects,
new project modal, plugins / design systems / integrations /
automations pages, file manager, artifact toolbar/header/share popup,
feedback panel, settings sidebar / language / appearance /
notifications / pets / privacy / connectors. Fixes the v1 feedback
bug where action=clear_feedback_rating shipped rating=null instead of
the rating being cleared.
- apps/daemon: extend run_created / run_finished with the v2 context
(entry_from / project_kind / target_platforms / fidelity /
connectors / etc.), add explicit error_code classification on
result=failed (run.errorCode → AGENT_SIGNAL_* → AGENT_EXIT_* →
AGENT_TERMINATED_UNKNOWN), and read device_id from the new
x-od-analytics-device-id header. Also moves the run_created /
run_finished emission to the canonical /api/runs handler in
server.ts; the chat-routes copy was shadowed by Express's earlier
registration and never executed, which also meant run.clientType
never made it to Langfuse — fixed in the same move.
Verification
- pnpm guard / pnpm typecheck clean for daemon, web, and contracts.
- pnpm --filter @open-design/web test: 1645/1645 passing.
- End-to-end smoke through Playwright + local PostHog ingest project
420348: every page_view (home/projects/automations/design_systems/
plugins/integrations/chat_panel/file_manager), every nav element,
the new_project_modal surface_view + tab + create flow, the
plugin_replacement_modal surface_view, settings_view across nine
sections, settings_cli_test_result (codex CLI), the
project_create_result success path, and run_created + run_finished
(result=failed, error_code=AGENT_EXIT_1) all reached PostHog with
the v2 schema and the expected device_id / page_name / area /
element / fidelity / target_platforms props. The remaining
*_result events (artifact_export / feedback_submit / file_upload /
plugin_replacement / settings_byok_test / settings_connector_auth)
are wired in code; production traffic will trigger them.
* fix(analytics): preserve style category on design-systems surface chip switch
The merge resolution in DesignSystemsTab incorrectly re-introduced a
`setCategory('All')` call alongside the new `trackDesignSystemsTopClick`
emit. main intentionally keeps the active style category when the surface
filter refines within it; the regression was caught by the existing
"keeps the style category when a surface chip refines within it" test
in tests/components/DesignSystemsTab.test.tsx.
* fix(analytics): address review — senseaudio passthrough + daemon-side configure-state
Two follow-ups from the v2 schema review on #2285:
1. `byokProtocolToTracking()` was still falling through to `null` for
`senseaudio` even though the v2 BYOK provider enum now lists it. Every
`SettingsDialog` BYOK call site guards on `if (byokProviderId)`, so a
user on SenseAudio was silently dropping the provider-option,
field-focus, and test-result captures. Added the missing case so
SenseAudio gets the same analytics coverage as the other providers.
2. The daemon-authoritative `run_created` / `run_finished` events were
missing the configure-state triplet (`has_available_configure_cli` /
`configure_type` / `configure_availability`) that v2 promotes to a
global register on the web side. Daemon captures don't go through the
PostHog global register, so dashboards couldn't segment run lifecycle
by execution setup after the migration.
The fix derives the triplet server-side from `detectAgents()` and the
request's `agentId` before `design.analytics.capture(...)`:
- has_available_configure_cli: any CLI on PATH reports installed
- configure_type: 'local_cli' when the run targets an installed CLI,
otherwise 'unknown' (daemon can't see BYOK keys, which live in
web-client storage)
- configure_availability: 'available' / 'unavailable' / 'unknown'
based on the requested agent's install status, with a fallback to
'available' when any CLI is installed
This keeps the v2 schema consistent across both daemon-side and
web-side captures.
* fix(analytics): wire setConfigureGlobals so browser events carry fresh state
Third follow-up from the v2 schema review on #2285. The previous fix
addressed senseaudio + daemon-side configure-state, but reviewer flagged
that `setConfigureGlobals` was still defined-only — no caller — so every
browser-side capture inherited the boot defaults
(`has_available_configure_cli=false`, `configure_type='unknown'`,
`configure_availability='unknown'`). PostHog dashboards therefore could
not segment the new `page_view` / `ui_click` / `surface_view` events by
execution setup after a user configured their environment.
Changes:
- `packages/contracts/src/analytics/events.ts` — add a pure
`deriveConfigureGlobals(mode, agentId, agents, byokConfigured)` helper
so the web client and the daemon can derive the triplet from the same
source of truth. The helper covers all 5 `configure_type` buckets
(`local_cli` / `byok` / `both` / `none` / `unknown`) and the 3
`configure_availability` buckets (`available` / `unavailable` /
`unknown`).
- `apps/web/src/App.tsx` — add a useEffect that re-derives the triplet
whenever the user changes execution mode, selects a new CLI, saves a
BYOK key, or the detected-agent list refreshes, then pushes it to
PostHog via `analytics.setConfigureGlobals(...)`. The setter goes
through the provider so the analytics module stays the single source
of truth.
- `apps/web/src/analytics/provider.tsx` — expose
`setConfigureGlobals` on the analytics context and the test stub so
consumers route through the provider boundary.
- `apps/daemon/src/server.ts` — switch the daemon-side derive in
`/api/runs` to the shared `deriveConfigureGlobals` helper so the
authoritative run_created/run_finished captures match the web-side
payload. BYOK credentials live in the web client and stay invisible
to the daemon, so the daemon arm passes `byokConfigured: undefined`
and falls back to the installed-CLI signal.
- `apps/web/tests/analytics-configure-globals.test.ts` — new regression
test that pins the derive behavior across all branches and confirms
the setter actually mutates the client-side store. Locks the wire-up
so a future refactor can't silently turn the setter back into a
no-op.
Verification: pnpm guard clean; daemon / web typecheck clean; web tests
1703/1703 passing (up from 1696 — 7 new tests in the configure-globals
suite).
* fix(analytics): emit projects page_view + drop misattributed chat_panel source
Fourth review pass on PR #2285. Two follow-ups from mrcfps:
1. DesignsTab (projects landing) was emitting click events but no
matching page_view. Opening /projects without clicking anything left
the surface invisible in PostHog. Added a once-per-mount
trackPageView({ page_name: 'projects' }) with the same ref-keyed
pattern HomeView / PluginsView use.
2. ChatComposer was hard-coding source: 'recent_project' on every
chat_panel page_view. The web router currently only carries
projectId / conversationId / fileName, so we cannot distinguish a
New-project launch from a template-pick or a Recent-projects click
from this layer. A false constant would over-attribute every chat
launch to 'recent_project' and break the funnel slice this schema
was meant to unlock. Dropped the field for now — better no source
than the wrong source — until the router grows a launch-source
channel; the field is still defined as optional on PageViewProps so
the channel can land in a follow-up PR.
Verification: web typecheck clean; web tests 1703/1703 passing.
* fix(analytics): correct plugin-replacement async result + heterogeneous upload + missing requestId
Three follow-ups from the fifth review pass on PR #2285:
1. **plugin_replacement_result emitted before the apply settled**
(`apps/web/src/components/HomeView.tsx`). The modal's confirm action
was a synchronous wrapper around an async `usePlugin(...)` call, so
the surrounding try/catch never observed real failures and every
attempt was reported as `result=success`. Changed `PendingReplacement.
confirm` to return `Promise<void>`, made the wrapper return the
underlying promise, and moved the analytics emit into an async
IIFE in the click handler so the success/failure branches reflect
the actual outcome.
2. **file_upload_result mis-typed heterogeneous batches**
(`apps/web/src/components/FileWorkspace.tsx`). The earlier
implementation only inspected `picked[0]`, so a mixed batch like
`image.png + demo.mp4` reported `file_type=image`. Per the comment
above the block ("mixed batches collapse to other"), the
implementation now maps every file to a tracking type, collapses to
`other` when more than one distinct type is present, and falls
back to the single type otherwise.
3. **project_create_result lost the click→result correlation id**
(`apps/web/src/components/NewProjectPanel.tsx`). The click event
no longer carried the locally-generated `requestId` that
`project_create_result` keeps, so the two could not be joined.
`trackNewProjectModalElementClick()` now accepts an optional
`{ requestId }`, mirroring the other helpers, and the create-button
click threads the same id used for the result.
Verification: web typecheck clean; web tests 1703/1703 passing.
* fix(analytics): gate configure-state on agents probe + drop unsent run_created fields
Two follow-ups from the sixth review pass on PR #2285:
1. **Cold-start configure-state was stamped before fetchAgents() landed**
(`apps/web/src/App.tsx`). The useEffect that pushes the v2 triplet
into the PostHog global register fired on first paint with
`agents=[]`, so the first home/projects/plugins page_view reported
`has_available_configure_cli=false` / `configure_availability=
unavailable` even on machines that did have an installed CLI. The
effect now waits on `agentsLoading === false` and leaves the boot
defaults ('unknown'/'unknown') in place until the probe resolves.
2. **Daemon read run-context fields the web never sends**
(`apps/daemon/src/server.ts`). The daemon-side run_created /
run_finished baseProps read `projectKind`, `entryFrom`,
`projectSource`, `targetPlatforms`, `companionSurfaces`, `fidelity`,
`connectors`, `useSpeakerNotes`, `includeAnimations`,
`referenceTemplate`, and `aspect` from `req.body`, but
`packages/contracts/src/api/chat.ts` and
`apps/web/src/providers/daemon.ts` don't carry those keys on the
wire. Reading them therefore always produced null/undefined.
Dropped the unsent fields from the daemon capture; a follow-up can
extend the create payload to thread the real context through. The
`design_system_id` field stays because the chat contract does send
it.
Tests: added 3 regression tests in `tests/analytics-configure-globals.
test.ts` covering the boot-time gating contract (empty agents +
daemon mode → unavailable / local_cli; installed agent → available;
undefined agents list → unavailable). Verification: web typecheck
clean; daemon typecheck clean; web tests 1706/1706 passing (up from
1703 — 3 new cold-start tests).
* fix(analytics): pin mode='daemon' so missing-agent run reports unavailable
Eleventh review pass on PR #2285. mrcfps flagged that
`apps/daemon/src/server.ts` was calling `deriveConfigureGlobals(...)`
without `mode`, so the helper fell through to the generic branch.
Result: a run for an uninstalled agent was tagged
`configure_availability: 'available'` whenever any OTHER CLI was on
PATH, because the generic branch only looks at the cohort-wide
"any installed?" signal. That precisely undermines the slice the
daemon emit is trying to power.
The daemon's /api/runs handler is always a daemon-mode capture
(daemon is the local CLI runner — BYOK lives in the web layer), so we
now pin `mode: 'daemon'` on the call site. The helper then judges
`configure_availability` from the REQUESTED agent's install status and
reports `unavailable` when the user picked an agent that is not
installed, even if peers are.
Added a regression case in `tests/analytics-configure-globals.test.ts`:
`{ mode: 'daemon', agentId: 'codex', agents: [{claude,true},{codex,false}] }`
→ `{ has_available_configure_cli: true, configure_type: 'local_cli',
configure_availability: 'unavailable' }`.
Verification: daemon typecheck clean; web tests 1707/1707 passing
(up from 1706 — 1 new regression test).
* fix(analytics): hoist chat_panel page_view + thread requestId
- Move chat_panel page_view emit from ChatComposer to ProjectView so
it survives activeConversationId-driven ChatPane remounts. ProjectView
keys the dedupe ref by project.id; the composer drops its duplicate.
- Thread { requestId } into trackAssistantFeedbackReasonSubmitClick so
the click pairs with the existing feedback_submit_result on the same
request id (mirrors the trackNewProjectModalElementClick pattern).
* fix(analytics): keep v2 super-props alive across reset and stamp design_system_source
- Snapshot the register payload in client.ts on PostHog init and
re-register it from applyConsent(true) and applyIdentity() so a
privacy-toggle or Delete-my-data rotation does not resume capture
without event_schema_version / device_id / session_id / locale /
configure-state globals. setConfigureGlobals() also patches the
cache so a later restore picks up the current configure state.
- Stamp design_system_source on daemon-side run_created / run_finished
(it is required by RunCreatedProps / RunFinishedProps). Daemon
can't tell default vs user_selected vs inherited from the wire, so
it derives 'unknown' when designSystemId is present, 'not_applicable'
otherwise — a follow-up that threads designSystemSource through
CreateRunRequest can replace this with the precise source.
|
||
|
|
210b94069a
|
feat(senseaudio): BYOK chat with image + video generation tools (#2065)
* feat(senseaudio): BYOK chat with image + video generation tools
Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.
- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
(async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
DesignFilesPanel, and project export pick them up automatically;
legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns  image syntax with a scheme
allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
(7), proxy-routes (47), markdown image rendering (8)
* fix(senseaudio): unblock image gen + design file preview switching
- SenseAudio /v1/image/sync rejected the previous size mapping with
`参数错误:size` (1664x936, 936x1664, 1280x960, 960x1280 are not in
the gateway's accepted set). Switched to standard HD / SD sizes that
every aspect bucket can hit: 1024×1024, 1280×720, 720×1280,
1024×768, 768×1024. Kept the byok-tools and media.ts tables in sync
so the BYOK chat tool and the CLI agent path both stop failing on
non-square aspects.
- DesignFilesPanel's <DfPreview> was missing a key prop, so React
reused the same iframe DOM node when the user picked a different
file — the src prop changed but the iframe never navigated. Added
key={previewFile.name} so the previous preview unmounts cleanly.
- Updated byok-tools + media-senseaudio-image tests for the new size
expectations.
* docs(senseaudio): clear stale provider hint + update README
- Settings → Media → SenseAudio: clear the auto-promoted
"Image · TTS · 70+ voices · clone" hint; the provider label alone is
enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
the BYOK section reflects what the daemon actually serves, and
mention the generate_image / generate_video chat tools that ship
with the SenseAudio path.
* fix(senseaudio): address PR #2065 review feedback
Three non-blocking review notes from @PerishCode on PR #2065:
1. Drop the dead /api/byok-image/:id route. The PR description claimed
it was "legacy fallback for old chat history" but that storage
layout never existed on main, so the route can only ever 400 or
404 — never 200. Removed the handler, the isSafeByokImageId
export, the unused createReadStream / stat / path / Request /
Response imports, and the two byok-image regression tests.
2. Add rejectProxyPluginContext guard to the senseaudio proxy
handler so it matches the invariant the other five proxy paths
already enforce (plugin runs must go through /api/runs for
snapshot pinning). Extended the existing "API fallback rejects
plugin runs" describe to also cover /api/proxy/senseaudio/stream
with the 409 PLUGIN_REQUIRES_DAEMON expectation.
3. Wrap the secondary image / video downloads (the URLs the
SenseAudio gateway hands back in /v1/image/sync .url and
/v1/video/status .video_url) in validateBaseUrlResolved so a
malicious gateway can't point us at 169.254.169.254 (AWS / Azure
metadata) or RFC1918 hosts via the response payload. Also passed
`redirect: 'error'` on both fetches to match the SSRF posture
the primary proxy fetch already uses. The new
assertExternalAssetUrl helper lives next to executeGenerateImage
so future tool downloads can reuse it.
Tests: 120/120 daemon tests pass; guard + typecheck green.
* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path
Follow-up to
|
||
|
|
9a64fccdc0
|
fix(security): resolve hostname before approving external API base URLs (#1176)
Some checks failed
ci / Packaged mac smoke (push) Blocked by required conditions
ci / Packaged windows smoke (push) Blocked by required conditions
ci / Detect PR change scopes (push) Failing after 2s
ci / Validate workspace (push) Has been skipped
Docker image / build-and-push (push) Failing after 27s
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-deploy / Deploy landing page (push) Has been skipped
github-metrics / Generate repository metrics SVG (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Packaged linux headless smoke (push) Has been skipped
* fix(security): resolve hostname before approving external API base URLs
Before this change the daemon-side base-URL validator only inspected the
literal hostname string. A public DNS record that points at an internal
address ('internal.example.com' → 10.0.0.5) passed validation, and the
daemon would issue the upstream request anyway — turning the BYOK proxy
into an SSRF primitive against internal infrastructure.
Add a small companion ('validateBaseUrlResolved') that runs the existing
sync check, resolves the hostname with 'dns.lookup({ all: true })', and
re-applies the block-list against every resolved address. Wire it into
the wrapper the daemon already uses ('validateExternalApiBaseUrl'), so
every proxy/finalize handler picks it up without further edits.
Carve-outs match the existing sync validator:
- Loopback hostnames skip DNS (Ollama-style local LLMs still work,
including '*.localhost' / 'lvh.me'-style names that resolve to 127.0.0.1
per RFC 6761).
- IP literals are already vetted by the sync pass; no need to re-resolve.
- DNS resolver errors fall through to the existing fetch error path —
a transient ENOTFOUND should not turn into a 403.
The 6 callers that previously consumed the sync result now 'await' the
async wrapper. All call sites are already inside async route handlers.
Vitest coverage in apps/daemon/tests/connection-test.test.ts covers:
sync rejection passthrough, loopback / IP-literal short-circuits,
private IPv4 and IPv6 resolution, dual-stack with one private record,
public→public passes (api.openai.com), '*.localhost' resolved→loopback,
and resolver-error fallback.
Detected by: aeon (manual review + semgrep + osv + trufflehog).
Class: CWE-918 (SSRF) — DNS-based bypass.
* fix(security): cover remaining daemon baseUrl fetch surfaces + Ollama redirects
Addresses PR #1176 review feedback (lefarcen / mrcfps): the resolved-IP
wrapper only covered the proxy/finalize routes, leaving three adjacent
SSRF gaps open.
- testProviderConnection (/api/test/connection provider mode): switch
from sync validateBaseUrl to await validateBaseUrlResolved so a
hostname that resolves to a private IP is rejected before the daemon
POSTs the smoke prompt upstream.
- listProviderModels (/api/provider/models): same swap. Import the
DNS-aware helper from ./connectionTest.js since it carries the dns
binding the daemon owns; contracts stays pure.
- Ollama proxy stream fetch: align with the other four proxy routes
by setting redirect: 'error', so a validated public host cannot 3xx
the daemon to a private/internal URL after the pre-fetch check.
Regression coverage:
- POST /api/provider/models — DNS spy returns 10.0.0.5 for a synthetic
hostname; route must respond { ok: false, kind: 'forbidden' } and
must not invoke upstream fetch.
- POST /api/test/connection provider mode — same shape.
- /api/proxy/ollama/stream — fetch mock asserts redirect: 'error' on
the upstream Ollama call.
The existing /api/provider/models timeout test now stubs dnsPromises
so it doesn't race the probe timer against real DNS.
---------
Co-authored-by: aeon <aeon@aaronjmars.com>
|
||
|
|
22a3b99a47 |
Merge origin/main into preview/v0.8.0
Sync 49 commits from main. Conflicts resolved:
- .github/workflows/ci.yml: kept v0.8.0 granular per-area gating, added main's
linux specs + release-stable.yml + release-preview.yml triggers
- .github/workflows/release-preview.yml: kept v0.8.0's full workflow over main's placeholder
- apps/web/src/components/AssistantMessage.tsx: combined v0.8.0 file-ops
summary with main's stripTodoToolGroups + suppressAskUserQuestionFallbackText
- apps/web/src/components/ChatPane.tsx: kept both new imports
- apps/web/src/index.css: kept both .msg-plugin-chip and .user-copy-btn blocks
- e2e/ui/*.test.ts: kept v0.8.0 openEntrySettingsDialog helper over main's
inline dialog navigation (UI was redesigned in v0.8.0)
- nix/package-{daemon,web}.nix: kept v0.8.0 pnpmDepsHash; rerun nix build to refresh
|
||
|
|
9cf265e520
|
feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite (#1743)
* feat(claude): wire AskUserQuestion tool through chat + pin TodoWrite
Claude calls `AskUserQuestion` for mid-conversation clarifications when
the natural answer is one of a small finite set of choices. Until now
the tool round trip hit two dead ends in headless mode: claude-code -p
cannot prompt the user, so it auto-errored the tool and retried 4x;
the model then hedged by also writing the same options as a markdown
bulleted list. The host had no way to feed a real `tool_result` back.
This change makes the AskUserQuestion round trip work end to end:
* Switch Claude to `--input-format stream-json`. The daemon wraps the
prompt as a JSONL `user` message on stdin and keeps stdin OPEN, so
later writes (a `tool_result` for the open AskUserQuestion) feed
back into the same child instead of needing a fresh spawn.
* New `RuntimeAdapter.promptInputFormat()` ('text' default,
'stream-json' for Claude) so the spawn loop keeps the old close-on-
prompt behavior for every other agent.
* New `POST /api/runs/:id/tool-result` daemon endpoint and
`submitChatRunToolResult` web helper. Body carries `toolUseId` and
`content`; daemon writes a JSONL `user` message with the matching
`tool_result` content block.
* Track outstanding host answers on the run (`pendingHostAnswers`)
and close stdin on either a `usage` event or a synthesized
`turn_end` event (extracted from `assistant.message.stop_reason`
in `claude-stream`). Without the per-turn `turn_end` signal stdin
would never close after the follow up turn finished and the run
would hang until the inactivity watchdog killed it.
* System prompt: tell Claude to use AskUserQuestion for follow ups
with 2-4 finite choices, and to STOP after the tool call instead
of writing a markdown duplicate.
Web UI:
* New `AskUserQuestionCard` renders the tool input as labelled chip
buttons (single or multi select) with a Submit button styled like
the composer's Send. On submit the answer routes through
`submitChatRunToolResult` (live tool_result path) and falls back
to `onSubmitForm` (plain user message) only if the run has already
terminated. Selected chips persist across page reloads by re
parsing the stored `tool_result.content`.
* Hide markdown text that follows an AskUserQuestion in the same
turn — defense in depth against the model emitting the duplicate.
* Collapse identical `AskUserQuestion` / `TodoWrite` retries inside
any tool group to a single card. TodoWrite is a snapshot tool,
so older calls are duplicates of state.
* Pinned TodoCard above the chat composer. The latest TodoWrite
snapshot across the conversation renders once, expandable /
collapsible header, count shows in-progress + completed (1/4),
Done button dismisses when all tasks finish, soft fade gradient
above so scrolling chat text dissolves into the panel instead of
hard clipping under the card.
* Composer gains a top shadow that only appears when the pinned
todo slot sits directly above it (dark mode strengthened).
* Accordion expand / collapse motion shared between TodoCard, the
ToolGroupCard disclosure, and BashCard output via
`grid-template-rows: 0fr -> 1fr` with `cubic-bezier(0.23, 1, 0.32, 1)`
and asymmetric durations (200ms enter, 140ms exit) per Emil
Kowalski's animation framework.
* Jump-to-latest button no longer unmounts on hide; slides up with
scale 0.9 -> 1 + fade on show, slides down with scale + fade on
hide. Always horizontally centered via `margin: 0 auto`.
i18n:
* `tool.askQuestion`, `tool.askQuestionSubmit`, `tool.askQuestionPending`,
`tool.askQuestionAnswered`, `tool.todosExpand`, `tool.todosCollapse`,
`tool.todosDone`, `tool.todosDismiss` added to all 18 locales.
Unblocker:
* Fix a pre-existing render loop in `ProjectView` when the user
clicks "New conversation". `handleNewConversation` now navigates
to the fresh conversation id synchronously after
`setActiveConversationId` so the route-sync effect at L512 and
the URL-sync effect at L851 do not ping pong (route mismatch
triggered repeated reverts; React's nested-update guard fired).
* fix(claude): order turn_end after content blocks + cover chat switching
Two follow-up fixes to the AskUserQuestion + new-conversation work:
* `claude-stream.ts` emitted `turn_end` BEFORE iterating the assistant
message's content blocks. When claude-code lacks
`--include-partial-messages` (older builds), tool_use events surface
only from that loop, so the daemon's stdin-close handler saw an
empty `pendingHostAnswers` set and closed stdin before the
AskUserQuestion tool_use was even registered. The result: the model
retried, hit the same race, and gave up writing the questions in
prose. Emit `turn_end` AFTER the content loop so tool_use ids land
in `pendingHostAnswers` first.
* `server.ts` now ignores `turn_end` events with
`stop_reason: 'tool_use'`. That stop reason means the model paused
to wait for a tool execution (claude-code's internal tool runner
for Bash / Edit / Read, or a host-answered tool like
AskUserQuestion). Either way the conversation is still in flight —
closing stdin there would kill the follow-up response. Only the
natural turn-end stop reasons (`end_turn`, etc.) close stdin.
* `ProjectView.handleSelectConversation` now navigates to the picked
conversation id synchronously, mirroring the fix already in
handleNewConversation. The route-sync effect at L512 was reverting
the active conversation on every switch, ping-ponging with the
URL-sync effect at L851 until React's nested-update guard fired
with "Maximum update depth exceeded". Same bug class as the
pre-existing new-conversation render loop.
* docs(agents): capture AskUserQuestion runtime + chat UI conventions
Record the patterns this PR introduces so future contributors can find
them without spelunking server.ts:
* Agent runtime conventions — `RuntimeAgentDef.promptInputFormat`,
`run.pendingHostAnswers` / `run.stdinOpen` lifecycle, `turn_end`
ordering rule, `POST /api/runs/:id/tool-result` endpoint shape, the
Claude only system prompt block that nudges AskUserQuestion, and the
`suppressAskUserQuestionFallbackText` defense in depth.
* Chat UI conventions — URL-load vs srcDoc render mode dispatch with
bridge disqualifiers, the dual iframe visibility swap pattern,
`isOurIframe` plus the active-iframe re-check for signals that must
only come from the visible iframe, pinned TodoCard via
`PinnedTodoSlot`, count includes `in_progress`,
`dedupeSnapshotToolRetries` for AskUserQuestion / TodoWrite stacks.
* i18n keys — 18 locale files, add the key to `types.ts` first.
* UI animation philosophy — `cubic-bezier(0.23, 1, 0.32, 1)` ease out,
asymmetric 200/140ms enter/exit, accordion via `grid-template-rows`,
no `transform: scale(0)`, keep mounted + toggle class for exit
transitions instead of relying on React unmount.
* fix(claude): read promptInputFormat as field, close stdin on deferred answer
Two PR review follow-ups on the AskUserQuestion stream-json wiring.
* server.ts:4616 referenced `runtimeAdapter.promptInputFormat()` — but
`runtimeAdapter` is not declared, imported, or assigned anywhere. The
prior adapter abstraction was deleted in #1656; when the changes
were folded back into the inline handler the format was moved onto
`RuntimeAgentDef.promptInputFormat`, but this call site was missed.
`server.ts` starts with `// @ts-nocheck` so typecheck never caught
it — every chat run hit `ReferenceError: runtimeAdapter is not
defined` the moment we wrote the prompt to a stdin-fed child, which
is every agent with `promptViaStdin: true` (claude, codex, copilot,
cursor-agent, gemini, opencode, pi, qoder). Read the format off the
in-scope `def` and default missing values to `'text'`.
* `submitToolResultToRun` cleared the answered id from
`pendingHostAnswers` but never closed stdin if a `turn_end` /
`usage` event had already fired with the set non-empty (deferred
by the event handler). The child then waited indefinitely for
further input until the inactivity watchdog killed it, losing the
model's follow-up response. Close stdin on the last-answer
transition when stream-json stdin is still open.
Test: pin `promptInputFormat` for every `promptViaStdin: true` agent
so future regressions of the field-vs-method contract fail at
typecheck-adjacent test time instead of in production. The new test
asserts `typeof def.promptInputFormat` is a string (or undefined),
not a function — exactly the shape mistake the original line made.
* fix(web): keep AskUserQuestion multi-select chips selected after reload when labels contain commas
`handleSubmit` joined multi-select answers with `', '` while the
reload parser split them on `','`. The pair is asymmetric: a valid
model-generated option like `"Yes, including images"` round-tripped
as `["Yes", "including images"]`, so after a page reload the locked
question card showed the user's pick as unselected — even though the
`tool_result` content the daemon actually wrote into the run was
correct, and the model saw the right answer. Bounded to post-reload
visual state, but silently confusing.
Switch to a `- ` bullet list per option, one per line, with the
parser stripping the leading `- ` back off. Newlines never appear
inside a label so the round trip is exact. The outer pairs separator
stays `\n\n` because individual answer bodies still never contain
that double-newline.
* chore: drop accidental personal design-system file
`design-systems/foldar/DESIGN.md` was added to the AskUserQuestion
branch in
|
||
|
|
76defffb93
|
Garnet hemisphere (#1702)
* feat(chat-composer): enhance mention handling and input overlay - Introduced a new overlay for inline mentions in the chat composer, improving user experience by visually indicating mentions as users type. - Updated the `ChatComposer` component to manage mention entities and integrate them into the input field, allowing for better context and interaction. - Enhanced the `AssistantMessage` component to support the display of plugin action panels based on the current project context, facilitating easier plugin management. - Refactored related components to ensure consistent handling of project files and mentions across the application. This update significantly improves the chat interaction model, making it more intuitive for users to engage with mentions and plugins. * feat(plugin-management): enhance plugin action panels and UI components - Updated the `AssistantMessage` component to include plugin action panels based on the latest project context, improving user interaction with generated plugins. - Refactored the `PluginsView` to support detailed views for available marketplace entries, allowing users to access more information and actions for each plugin. - Introduced new CSS styles for improved visual representation of plugin-related UI elements, enhancing overall user experience. - Enhanced the `listPlugins` function to include an option for fetching hidden plugins, providing more flexibility in plugin management. This update significantly improves the usability and functionality of the plugin management system, making it easier for users to interact with and manage their plugins. * fix(assistant-message): refine plugin folder candidate selection logic - Updated the `pluginFoldersTouchedThisTurn` function to improve the logic for selecting plugin folder candidates based on touched paths and message content. - Introduced a new helper function, `pathMatchesFolderFileBasename`, to enhance the matching criteria for folder candidates. - Added a check for explicit folder matches before falling back to a single candidate, improving accuracy in folder selection. - Modified the `shouldRenderSlotAsText` function in `HomeHero` to include the name parameter, refining the rendering logic for slot text. These changes enhance the functionality and reliability of the assistant message component in managing plugin folder candidates. * feat(plugin-folder-actions): implement agent-routed CLI actions for plugin management - Introduced a new `PluginFolderAgentAction` type to streamline actions related to plugin folders, including install, publish, and contribute. - Updated the `DesignFilesPanel`, `FileWorkspace`, and `AssistantMessage` components to utilize the new agent action handling, improving user interaction with generated plugins. - Refactored the action handling logic to send commands to the agent, enhancing the workflow for managing plugin folders. - Added corresponding tests to ensure the new functionality works as expected and integrates seamlessly with existing components. This update significantly enhances the plugin management experience by routing actions through the agent, allowing for a more cohesive and interactive user experience. * Fix PR 1702 CI blockers * Fix PR 1702 remaining CI checks * Prebuild AGUI adapter after install * Restore plugin project snapshot wiring * feat(marketplace): refactor marketplace URL handling and enhance fetching logic - Introduced new functions to normalize marketplace URLs and manage fetching of marketplace manifests, improving the reliability of marketplace integrations. - Updated the server and plugin logic to utilize the new fetching mechanisms, ensuring consistent handling of marketplace data. - Enhanced tests to cover new URL normalization and fetching scenarios, ensuring robustness in marketplace management. This update significantly improves the marketplace experience by streamlining URL handling and enhancing data fetching capabilities. * Fix project auto-send cleanup spec |
||
|
|
e1bc83a476
|
feat(analytics): PostHog product analytics (P0 events, consent-gated, packaged) (#1428)
* feat(analytics): scaffold PostHog product-analytics integration
- Add @open-design/contracts/analytics subpath with the 17 P0 event
payload types, header constants, and code↔CSV enum mapping helpers.
- Add apps/daemon/src/analytics.ts with env-gated posthog-node client,
request-scoped analytics context reader, and artifact-id anonymizer.
- Expose GET /api/analytics/config so the web bundle never embeds the
PostHog key at build time; daemon owns POSTHOG_KEY / POSTHOG_HOST.
- Add apps/web/src/analytics module (identity + lazy posthog-js client
+ React provider) and mount it under <I18nProvider> in app/layout.
No event wiring yet — that lands in the next commit alongside trigger
points (App.tsx, EntryView, NewProjectPanel, SettingsDialog, FileViewer,
runs.ts).
* feat(analytics): wire app_launch, home_view, home_click, project_create_result
- App.tsx: fire app_launch once after first effect tick. handleCreateProject
now emits project_create_result on both success and failure paths.
- EntryView.tsx: home_view (page) gated on agents loading so
has_available_cli isn't transiently false; home_view (asset_panel) fires
per top-tab change with the right result_count.
- NewProjectPanel.tsx: home_click create_button fires before delegating to
the parent; a fresh request_id is generated here and threaded through
onCreate so the matching project_create_result stitches via $insert_id.
- contracts/analytics: tighten createTabToTracking and topTabToTracking
for the worktree branch's renamed tabs (live-artifact, templates).
* feat(analytics): wire settings_view + 3 settings_click events
- settings_view fires on dialog mount and on every section switch,
carrying the active section (mapped via settingsSectionToTracking
for the 16-section worktree layout), execution_mode, and the
selected CLI provider id when present.
- settings_click execution_mode_tab: setMode now emits before/after
values whenever the user toggles between Local CLI and BYOK.
- settings_click cli_provider_card: agent card onClick reports
cli_provider_id via agentIdToTracking (kiro → other).
- settings_click byok_field: onFocus added to api_key, model select,
and base_url inputs; provider_id widened to include google so the
worktree's Gemini protocol slot type-checks.
* feat(analytics): wire studio_view + studio_click chat, studio_view artifact
- packages/contracts/src/analytics/artifact-id.ts: FNV-1a 64-bit helper
produces a 16-hex anonymized id for (projectId, fileName). Stable
cross-platform so the daemon and the web bundle resolve the same id
without a Web Crypto round-trip; daemon now re-exports it.
- ChatComposer: studio_view chat_panel fires once per project mount,
studio_click chat_composer fires on attachment + send buttons with
estimated user_query_tokens (length/4) and has_attachment.
- FileViewer: studio_view artifact fires once per (project, file) at
the dispatcher level, before any sub-viewer renders, with
artifact_kind derived from the renderer registry / file.kind table.
- Widen TrackingExportFormat to include markdown and cloudflare_pages
so the worktree branch's full share menu can emit verbatim.
* feat(analytics): wire studio_click share_option + artifact_export_result
HtmlViewer's share menu now emits both events per click via a
fireShareExport helper:
- studio_click share_option fires immediately on click with the chosen
export_format and a fresh request_id.
- artifact_export_result fires when the export resolves — success for
sync exporters (html, markdown, template) the moment the call
returns, success/failed for async exporters (pdf, zip, deploy)
via .then/.catch. The same request_id threads both events so
PostHog stitches click → result via $insert_id.
DEPLOY_PROVIDER_OPTIONS maps to the CSV's vercel / cloudflare_pages
slots; markdown is now a first-class export_format value.
Also ignore .env.local so local POSTHOG_KEY / .env-style secrets
don't get committed.
* feat(analytics): emit run_created and run_finished from the daemon
POST /api/runs now reads the analytics context off the
x-od-analytics-* headers the web client sets on every fetch, then:
- Captures run_created with project_id, conversation_id, run_id,
model_id, agent_provider_id (mapped via agentIdToTracking),
skill_id, design_system_id, plus the token_count_source marker.
- Schedules a run_finished capture on runs.wait(run) resolution,
mapping succeeded/canceled/failed to success/cancelled/failed and
reporting total_duration_ms.
Both events use a stable insert_id derived from the same uuid so
PostHog dedupes the daemon-side mirror against any future
web-side capture without double-counting.
Token sub-fields (user_query_tokens/system_prompt_tokens/...) stay
omitted in v1 — the claude-stream parser only exposes input/output
totals today. See tracking-doc-issues.md §3.2.
* feat(analytics): emit settings_cli_test_result + settings_byok_test_result
The original BLOCKING-list assumed these CSV P0 events were not
implementable in this branch because main lacked Test buttons. The
worktree HEAD actually wires `handleTestAgent` and `handleTestProvider`
in SettingsDialog, so both events are now in scope.
- handleTestAgent emits settings_cli_test_result on success and
failure paths with cli_provider_id mapped via agentIdToTracking,
result drawn from result.ok / catch branch, error_code from
result.kind or the thrown error name, and duration_ms timed via
performance.now().
- handleTestProvider emits settings_byok_test_result analogously,
using apiProtocol (anthropic|openai|azure|ollama|google) directly
as provider_id — wider than the CSV's 5-value enum, documented in
tracking-doc-issues.md §2.5.
Contracts: add SettingsCliTestResultProps / SettingsByokTestResultProps
plus matching track* helpers. AnalyticsEventName union now covers all
14 P0 events this branch supports.
* feat(analytics): gate PostHog on the existing telemetry.metrics consent
The integration now reuses the same first-launch privacy banner +
Settings → Privacy toggle that gates Langfuse, so a single user
decision controls both telemetry sinks.
- /api/analytics/config now consults the persisted AppConfigPrefs:
it returns enabled=true only when POSTHOG_KEY is set AND the user
has chosen "Share usage data" (telemetry.metrics === true). The
response also echoes installationId so the web client uses the
same anonymous id Langfuse keys off of — one identity per install,
shared across both sinks.
- Web AnalyticsProvider:
- Bootstrap fetch resolves installationId and threads it through
the x-od-analytics-anonymous-id header on every /api/* fetch,
so daemon-side captures (run_created / run_finished /
project_create_result) land on the same person record.
- Exposes a setConsent(granted) method that calls posthog-js's
opt_in_capturing / opt_out_capturing, wired from App.tsx via a
useEffect watching config.telemetry?.metrics. Toggling Privacy
→ metrics now stops/resumes events immediately, no reload.
- app_launch additionally gates on telemetry.metrics so a freshly-
declined user fires nothing, and a freshly-opted-in user fires on
the next reload.
* feat(packaging): bake POSTHOG_KEY into packaged daemon spawn env
Wires PostHog product analytics through the same Langfuse-style build-
secret pipeline so official Open Design builds ship with the key while
fork builds compile without it (the integration short-circuits cleanly
when POSTHOG_KEY is absent).
tools/pack
- resolveToolPackConfig reads POSTHOG_KEY / POSTHOG_HOST from
process.env at packaging time, validates them (no whitespace in the
key, http(s) URL for host, trailing-slash strip), and stamps them on
ToolPackConfig. Fork builds without the env vars simply omit the
fields; the daemon-side gate keeps things off in that case.
- Mac, Windows, and Linux packaged-config writers each append the two
fields to open-design-config.json next to the existing
telemetryRelayUrl entry.
apps/packaged
- RawPackagedConfig / PackagedConfig surface posthogKey / posthogHost
so the Electron entry and headless entry both forward them to the
daemon sidecar.
- buildPackagedDaemonSpawnEnv emits POSTHOG_KEY / POSTHOG_HOST into
the daemon child env when present. The daemon's existing analytics
module reads these via process.env — no daemon-side changes needed.
- The headless packaged path falls back to process.env for fields the
builder hasn't injected, mirroring how OPEN_DESIGN_TELEMETRY_RELAY_URL
is read there.
CI
- release-beta.yml and release-stable.yml expose POSTHOG_KEY (secret)
and POSTHOG_HOST (var) at workflow-env scope so every packaging job
inherits them. PR / fork builds without these set simply skip the
bake step.
Tests
- tools/pack: config.test.ts covers bake-through, fork-build omission,
whitespace rejection, invalid-URL rejection, and trailing-slash
normalization.
- apps/packaged: sidecars.test.ts covers buildPackagedDaemonSpawnEnv
forwarding the keys when present and omitting them when null.
* feat(analytics): enable PostHog autocapture + perf + exceptions
Flip on the PostHog SDK's automatic diagnostic features so we capture
click paths, page transitions, web vitals, dead clicks, and browser
exceptions without scattering instrumentation through the codebase.
Privacy defense lives in one place — apps/web/src/analytics/scrub.ts —
wired in via posthog-js's `before_send` hook so every outgoing event
passes through the same audit point:
- $autocapture / $rageclick / $dead_click / $copy_autocapture:
strips $el_text and value/placeholder/aria-label attrs from any
input, textarea, password input, or contenteditable element. PostHog
autocapture does not capture input.value by default, but $el_text
on a <textarea> reflects the typed content — that's the prompt
body for us, so it has to be scrubbed every time.
- $pageview / $pageleave: drops query string and fragment from
$current_url / $referrer so any future ?q=… can't leak.
- $exception: rewrites file:// and absolute filesystem paths in
stack frames to app://apps/<repo-relative> so we don't ship the
user's home directory.
- Suppresses $opt_in entirely — duplicate of our explicit
setConsent toggle in App.tsx.
Element-level defense in depth is limited to the single most sensitive
surface: the chat composer textarea gets `ph-no-capture` so PostHog
never even generates an event for clicks inside that subtree. Every
other input relies on scrub.ts — sprinkling the class through every
form would be noisy and easy to forget on new surfaces.
The existing Privacy → "Share usage data" toggle continues to gate
every new feature: posthog-js's opt_out_capturing() halts autocapture,
$pageview, $exception, web vitals, and dead clicks alongside the
explicit capture() calls — one global switch.
11 unit tests pin the scrub rules in apps/web/tests/analytics-scrub.test.ts.
* ci(nix): bump pnpmDepsHash for posthog-js + posthog-node additions
Adding posthog-js to apps/web and posthog-node to apps/daemon changed
pnpm-lock.yaml, which Nix's fixed-output pnpmDeps derivation pins by
sha256. The CI nix flake check failed with:
specified: sha256-KF3Mld72/iau+pJmA7HvnanRx8VLtDP0N624SKrtrrc=
got: sha256-PGFgX4lYyeH2TRAXfUq52A3EOa6bb1gO59hPsXhEk3s=
Copy the new hash into both nix/package-web.nix and
nix/package-daemon.nix per the procedure documented in nix/README.md
§"First-build hash pinning".
* feat(analytics): unify PostHog identity with Langfuse installationId
PostHog's distinct_id is the installationId stamped by /api/analytics/
config; Langfuse already reads the same id off app-config.json to
populate trace.userId. With both sinks keying off the same anonymous
identity, dashboards can correlate user actions (PostHog events) with
LLM runs (Langfuse traces) without re-identifying.
Two gaps closed:
1. applyConsent(false) — clear posthog-js's persisted ph_*_posthog
localStorage entry on opt-out via posthog.reset(). Without this, a
user who opts out, then clicks Delete my data, then re-opts in
would see PostHog stitch their new session to the deleted identity
because bootstrap.distinctID only takes effect on first init.
2. applyIdentity(newInstallationId) — Delete my data rotates the
installationId in app-config; App.tsx now watches config.installationId
and calls posthog.reset() then identify(newId) so the next event
batch is fully decoupled from the deleted one. Idempotent on
same-id re-renders so benign config refreshes don't churn PostHog
identities.
The fetch wrapper's x-od-analytics-anonymous-id header also flips to
the new id on rotation so daemon-side captures (run_created /
run_finished) land on the same person record from the very next API
call, not after a reload.
The end-to-end rotation flow is verified against a live PostHog
project; these unit tests pin the safety guards (no-client paths, null
inputs) since stubbing posthog-js's init-loaded callback chain is
brittle.
* fix(langfuse): require both metrics AND content consent for trace reports
Tightens the Langfuse gate so a user who shares anonymous metrics but
NOT conversation content stops emitting Langfuse traces entirely —
Langfuse is used for turn-quality evals which only make sense with
prompt/output bodies. PostHog (product analytics, content-free) stays
gated on `metrics` alone and is unaffected.
i18n: "Conversation content" → "Conversation and tool content" with
hints expanded to mention tool inputs/outputs so the consent surface
matches what the trace actually carries (en + zh-CN).
Bundled here per PR scope — change originated outside this PostHog
PR but lands cleanly on the same files; gating Langfuse strictly
on `content` makes the dual-sink consent model (PostHog = metrics,
Langfuse = metrics + content) symmetric across both i18n locales and
the daemon-side gate.
* feat(analytics): wire byok_provider_option + fix PR review P1s
Adds the BYOK protocol-chip click event (5-value provider_id mirroring
the apiProtocol Settings UI) and resolves four P1 review threads on
PR #1428.
byok_provider_option:
- New SettingsClickByokProviderOptionProps in contracts (provider_id =
anthropic|openai|azure|google|ollama; maps to CSV's 5 values per
tracking-doc-issues.md §2.5).
- trackSettingsClickByokProviderOption helper in apps/web/src/analytics.
- SettingsDialog hooks it on the protocol-chip onClick alongside the
existing setApiProtocol call; is_selected reflects whether the chip
was already active.
Review fixes:
1. client.ts (Siri-Ray): clear `initPromise` when the resolution is
null so a Privacy → metrics opt-in after a previous decline triggers
a fresh /api/analytics/config fetch. Without this, the disabled
response was cached forever — first-session opt-in needed a reload
to start sending PostHog events.
2. provider.tsx (Siri-Ray): replace `url.includes('/api/')` with a
strict same-origin + /api/ pathname check (shared
`isSameOriginApiCall` helper). Outbound third-party URLs containing
`/api/` (e.g. provider.example.com/api/x) no longer receive our
x-od-analytics-* headers.
3. provider.tsx (codex-connector, lefarcen): gate header injection on
`resolvedAnonId` being non-null. When Privacy → metrics is off,
/api/analytics/config returns enabled=false → resolvedAnonId stays
null → wrapper never installs → daemon can't read consent-bearing
headers → no daemon-side PostHog event. setConsent now also clears
resolvedAnonId on opt-out and re-fetches on opt-in.
4. daemon/analytics.ts (defense in depth): createAnalyticsService now
takes dataDir and capture() re-reads app-config to check
telemetry.metrics inside the fire-and-forget wrapper. Even if a
stale header somehow reaches the daemon after opt-out, the capture
is dropped before posthog-node.capture is called.
* fix(web): place "Share usage data" on the right in privacy consent banner
Swap button order in PrivacyConsentModal and the in-settings ConsentCard
so the affirmative "Share usage data" lands on the right and "Not now"
on the left. Matches the OK-on-the-right pattern users expect for
primary actions.
Both buttons keep equal visual prominence (same .privacy-consent-action
styling) so the swap doesn't change the EDPB equal-prominence stance
called out in the original Langfuse telemetry spec.
* feat(analytics): populate run_finished token totals from claude-stream usage
Daemon's claude-stream parser already emits agent usage events with
input_tokens / output_tokens totals; the run service buffers them in
run.events and Langfuse reads them out the same way. The run_finished
PostHog event was leaving these fields empty.
Scan run.events for the most recent agent usage frame on terminal
transition and emit input_tokens / output_tokens / total_tokens when
present. token_count_source flips to 'provider_usage' only when at
least one count landed; runs without provider-side usage data keep
'unknown'.
Provider does not break the input down into the 7 sub-fields the
tracking doc lists (memory / context / attachment / system_prompt /
…); those stay omitted until a parser change exposes them.
* feat(analytics): estimate user_query_tokens from prompt length
The user_query_tokens field for run_created / run_finished was hardcoded
to 0. We can't tokenize without bundling a model-specific tokenizer, but
the character/4 heuristic is the industry-standard estimate when one
isn't available and is enough for funnel analysis (prompt-length cohorts,
short-vs-long-query conversion rates).
Extracted from req.body via the same telemetryPromptFromRunRequest
pattern the daemon already uses for langfuse-bridge (currentPrompt then
message fallback). Only the integer count goes to PostHog — the prompt
text itself never leaves the daemon.
token_count_source flips appropriately:
- run_created with a prompt: 'estimated' (was 'unknown')
- run_created with no prompt: 'unknown'
- run_finished with provider usage: 'provider_usage' (overrides
baseProps' 'estimated' value)
- run_finished without provider usage: inherits 'estimated' or 'unknown'
from baseProps so input/output absent doesn't mask the estimate.
|
||
|
|
31e57fd773
|
fix(daemon): persist runStatus/endedAt on chat run termination (#1230)
* fix(daemon): persist runStatus/endedAt on chat run termination (#135) POST /api/runs created the run but never reconciled the messages row on terminal status. If the web failed to persist the cancel (refresh, dropped PUT), the row stayed at run_status='running' / ended_at=NULL, and on reload the elapsed timer kept climbing because the renderer fell back to now - startedAt. Mirror routine/orbit reconciliation: attach a wait-completion handler that updates run_status and ended_at, guarded by COALESCE and a run_status IN ('queued','running') filter so concurrent web persists are not clobbered. Adds cancelRun helper and two regression specs under e2e/tests/dialog/. * fix(daemon): annotate reconcile callback params for chat-routes The chat run reconciliation block landed in chat-routes.ts after the recent server-route split (#1043), where stricter type checking surfaces implicit `any` parameters. Annotate the wait/then callback as `{ status: string }` and the catch callback as `unknown`. * refactor(daemon): extract reconcileAssistantMessageOnRunEnd helper The inline if/wait/then/catch block in POST /api/runs read as a bolt-on patch. Lift it to a named file-scope helper so the route handler stays intent-level (start the run, arrange follow-up reconciliation) and the guard for missing assistantMessageId is an internal detail. The helper's docblock describes the invariant ("messages row reflects the run's terminal state even without web persist"); commit history keeps the issue context. * test(e2e): wait for any terminal status in stop-reconcile spec The earlier .catch fallback chained two waitForRunStatus calls (canceled then succeeded). waitForRunStatus throws on the first non-expected terminal, so a canceled run that resolves to failed (e.g. agent exits non-zero on SIGTERM) would still abort the test before reaching the messages-row assertion. Add waitForRunTerminal to e2e/lib/vitest/runs.ts: polls until any terminal status without throwing on mismatch, since this spec's claim is about the resulting messages row, not which terminal the run took. Addresses Codex inline review on PR #1230. |
||
|
|
b1d440d2bd
|
refactor(daemon): split route registration (#1043)
* spec * refactor(daemon): split server route registrars * refactor(daemon): group route registrar dependencies * refactor(daemon): move remaining domain routes out of server * update doc * revert spec * fix daemon route context contract Generated-By: looper 0.5.6 (runner=fixer, agent=opencode) * fix media task persistence Generated-By: looper 0.5.6 (runner=fixer, agent=opencode) * fix: restore daemon route registrations * fix: restore static resource mutation origin checks |