open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-05-31 19:04:39 +07:00

Author	SHA1	Message	Date
lefarcen	98a2c63973	feat(daemon): add Antigravity agent adapter (#3157) * feat(daemon): add Antigravity agent adapter Adds Google Antigravity (`agy` CLI) as a coding-agent runtime. Detection picks up `agy` on PATH, the daemon spawns `agy -p "<prompt>"` for a single non-interactive turn, and the assistant text reply streams back on stdout. OAuth is shared with the Antigravity IDE through the system keyring, so users who have signed into the desktop app are authenticated on first run with no extra step. `agy` v1.0.3 has no JSON / stream-json / ACP output mode (upstream issue #119), no `--model` flag (issue #35), and no MCP forwarding hook yet — the adapter ships with `streamFormat: 'plain'` and a single `default` fallback model so the model picker doesn't mislead users into thinking their choice is wired through. We will upgrade buildArgs + add a dedicated event parser when upstream ships structured output. Also gitignores `.antigravitycli/`, the project-local config directory `agy` auto-creates on every run (upstream issue #175). * fix(daemon): Antigravity adapter — stdin prompt, brand icon, form loop, empty-output guard - Switch prompt delivery from argv to stdin (`agy -p -`) to avoid the 30KB maxPromptArgBytes limit that blocked real-world composed prompts - Add official Antigravity brand SVG icon to the agent picker - Fix repeated question-form loop for plain agents by injecting an OVERRIDE block when form answers are already present in the transcript - Add empty-output guard for plain agents so expired auth or silent failures surface a user-visible error instead of a blank "Done" turn * feat(daemon): expand Antigravity adapter — model picker, form-loop fix, OAuth launcher, log-file classification PR #3157 follow-up integrating four iterations from end-to-end manual testing on Gemini 3.5 Flash + GPT-OSS 120B Medium through `agy` v1.0.3. Each section is independently verifiable; combined they're what made the first successful artifact generation work end-to-end. 
## Model picker via settings.json (agy has no --model flag) agy v1.0.3 ships no `--model` CLI flag (upstream issue #35), but the TUI Switch-Model picker writes the chosen label to `~/.gemini/antigravity-cli/settings.json`'s `"model"` field, and every `-p` invocation re-reads that file on startup — verified by capturing the `--log-file` line `Propagating selected model override to backend: label="<model>"`. Antigravity's `fallbackModels` now lists the 8 labels its TUI exposes (Gemini 3.1 Pro / 3.5 Flash variants, Claude Sonnet/Opus 4.6 Thinking, GPT-OSS 120B Medium) and `buildArgs` persists the user's choice to settings.json right before spawn. The synthetic `default` id is preserved — picking it leaves settings.json untouched so a user who switches models from agy's own TUI keeps their choice. Introduces `RuntimeAgentDef.supportsCustomModel?: boolean`. AMR's hardcoded blocklist in `SettingsDialog.tsx` migrates to the declarative flag (AMR rejects free-form ids at the ACP layer), and antigravity opts out because its label set is a server-side enum that silently fails on unrecognised strings. ## Form-loop fix (transcript sanitizer + stronger OVERRIDE) The discovery form loop on weak/medium plain-stream models (GPT-OSS 120B Medium, Gemini 3.5 Flash) had two reinforcing causes: 1. `buildDaemonTranscript` packed the prior assistant turn's literal `<question-form>` markup into the user request on the next turn, giving the model a template to echo. New `sanitizePriorAssistantTurnForTranscript` strips `<question-form>...</question-form>` blocks and ```json fences that match form-schema shape, replacing them with a brief placeholder. User content is preserved verbatim (a user who legitimately mentions `<question-form>` in chat keeps their message intact). 2. 
The OVERRIDE block on form-answered turns was 4 lines and only banned the bare `<question-form>` tag — models still emitted the fenced JSON, form-asking prose ("Got it — tell me the following"), and fake system events ("subagents stopped"). The new `FORM_ANSWERED_SYSTEM_OVERRIDE` enumerates each anti-pattern and pins them via tests, so weakening any line fails a test instead of silently reintroducing the regression. Also adds `RuntimeAgentDef.resumesSessionViaCli` + `RuntimeContext.hasPriorAssistantTurn` as forward-looking abstractions (skipTranscript option on composeChatUserRequestForAgent). Antigravity does NOT opt in — agy's `-c` resume activates an internal agentic loop with tool retries and fallback-to-cached-response on tool errors that the OD system prompt cannot steer; reverted after seeing byte-identical form re-emissions caused by agy's own retry logic, not OD's transcript. ## One-click OAuth via system terminal agy print mode can't complete Google Sign-In on its own (the OAuth callback page asks the user to paste an auth code back into agy, but `-p` has no input field). Before this commit the auth banner only told the user to "open a terminal yourself." Adds `POST /api/agents/antigravity/oauth-launch` and a cross-platform launcher in `runtimes/terminal-launch.ts`: - macOS: osascript → Terminal.app `do script "agy"` + activate - Linux: tries x-terminal-emulator, gnome-terminal, konsole, xfce4-terminal, xterm in order - Windows: `cmd /c start "Open Design" cmd /k agy` The endpoint hardcodes the `agy` command (no user input → no shell injection surface) and is loopback-gated like the other daemon endpoints. The chat's `AGENT_AUTH_REQUIRED` banner now renders a "Sign in via terminal" button next to Retry; clicking it spawns the terminal so the user can finish OAuth in one click. 
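The per-platform launcher list above can be modeled as a pure table from platform to candidate commands. That keeps it unit-testable without opening a real terminal. This is a minimal sketch and not the code in `runtimes/terminal-launch.ts`. The function name, the AppleScript text, and the per-emulator argument shapes (`-e` for most emulators, `--` for gnome-terminal) are all assumptions.

```typescript
// Hypothetical sketch: the commands a cross-platform "open agy in a terminal"
// launcher could try. Pure function, so tests never spawn a real terminal.
type LaunchCandidate = { cmd: string; args: string[] };

// Hardcoded on purpose: no user input ever reaches the command line.
const AGENT_COMMAND = "agy";

function terminalCandidates(platform: string): LaunchCandidate[] {
  switch (platform) {
    case "darwin":
      // osascript drives Terminal.app: run the command, then bring it to the front.
      return [
        {
          cmd: "osascript",
          args: [
            "-e", `tell application "Terminal" to do script "${AGENT_COMMAND}"`,
            "-e", `tell application "Terminal" to activate`,
          ],
        },
      ];
    case "linux":
      // Tried in this order; the first emulator that spawns successfully wins.
      return [
        { cmd: "x-terminal-emulator", args: ["-e", AGENT_COMMAND] },
        { cmd: "gnome-terminal", args: ["--", AGENT_COMMAND] },
        { cmd: "konsole", args: ["-e", AGENT_COMMAND] },
        { cmd: "xfce4-terminal", args: ["-e", AGENT_COMMAND] },
        { cmd: "xterm", args: ["-e", AGENT_COMMAND] },
      ];
    case "win32":
      // `start` treats its first quoted argument as the window title.
      return [{ cmd: "cmd", args: ["/c", "start", "Open Design", "cmd", "/k", AGENT_COMMAND] }];
    default:
      // Unsupported platform (e.g. "aix"): the caller reports a failure.
      return [];
  }
}
```

A caller could hand each candidate to Node's `child_process.spawn` with `{ detached: true, stdio: 'ignore' }`, call `unref()`, and treat the `'spawn'` event as success. On `'error'` it would fall through to the next entry. That mirrors the later `launchOnLinux` fix in this PR, so a long-lived terminal is never killed mid-OAuth.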
## Silent-failure classification (auth vs quota via --log-file) agy print mode is silent on stdout/stderr for both missing-OAuth AND quota-exhausted failures — the upstream `RESOURCE_EXHAUSTED (code 429): Individual quota reached` and the `not logged into Antigravity` line only surface in agy's `--log-file`. Without log inspection the daemon misread quota as "auth required" and showed the wrong banner. `RuntimeContext.agentLogFilePath` carries a daemon-owned per-run temp path that antigravity's buildArgs translates to `--log-file <path>`. The empty-output guard now reads that log on a `code === 0 && !childStdoutSeen` exit, feeds the tail to `classifyAgentServiceFailure`, and routes: - "not logged into Antigravity" → AGENT_AUTH_REQUIRED with antigravityAuthGuidance - "RESOURCE_EXHAUSTED" / "quota" / "Individual quota reached" → RATE_LIMITED with antigravityQuotaGuidance - none of the above (rare) → fall back to auth guidance as the most likely cause Both surface a terminal launcher in the auth banner: auth gets "Sign in via terminal", quota gets "Switch model in terminal" — same endpoint, contextual label. The handler is identical (open agy in a terminal); the user either signs in or uses agy's Switch Model picker to pick a model with available quota. ## Validation - `pnpm guard` pass - `pnpm --filter @open-design/daemon` runtime + telemetry suites: 192 passed, 1 skipped (the 1 pre-existing `task-type` failure on origin/main is unrelated to this change) - `pnpm --filter @open-design/web` typecheck pass; sse / amr-guidance / AgentIcon suites pass (51 web tests) - Manual end-to-end on darwin + Gemini 3.5 Flash and GPT-OSS 120B Medium: turn-1 question-form rendered correctly, turn-2 produced `<artifact>` with full HTML (3.3KB Modern Minimal design) instead of re-emitting the form. agy `--log-file` content correctly classified as RATE_LIMITED when Gemini Pro quota was exhausted, and as AGENT_AUTH_REQUIRED when keychain was cleared. 
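The log-tail routing described in the silent-failure section could look roughly like the sketch below. It is a hypothetical stand-in for `classifyAgentServiceFailure`, not that function. The return shape, the tail size, and checking the auth marker before the quota markers are all assumptions. The marker strings, error codes, and button labels are taken from the description above.

```typescript
// Hypothetical sketch of the silent-exit classifier; the return shape is illustrative.
type SilentFailure = {
  code: "AGENT_AUTH_REQUIRED" | "RATE_LIMITED";
  guidance: "antigravityAuthGuidance" | "antigravityQuotaGuidance";
  launcherLabel: "Sign in via terminal" | "Switch model in terminal";
};

const AUTH_FAILURE: SilentFailure = {
  code: "AGENT_AUTH_REQUIRED",
  guidance: "antigravityAuthGuidance",
  launcherLabel: "Sign in via terminal",
};

const QUOTA_FAILURE: SilentFailure = {
  code: "RATE_LIMITED",
  guidance: "antigravityQuotaGuidance",
  launcherLabel: "Switch model in terminal",
};

// Only the end of the log is inspected; this cap is an assumed value.
const LOG_TAIL_CHARS = 8_000;

function classifySilentAgyExit(logText: string): SilentFailure {
  const tail = logText.slice(-LOG_TAIL_CHARS);
  if (tail.includes("not logged into Antigravity")) return AUTH_FAILURE;
  // Covers "RESOURCE_EXHAUSTED (code 429): Individual quota reached" and similar lines.
  if (/RESOURCE_EXHAUSTED|quota/i.test(tail)) return QUOTA_FAILURE;
  // Neither marker present (rare): auth is treated as the most likely cause.
  return AUTH_FAILURE;
}
```

Both outcomes point at the same terminal-launch endpoint, so only the label and guidance text differ.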
* fix(web/test): align amrAgent fixture with supportsCustomModel contract The AMR agent definition in the daemon ships `supportsCustomModel: false` so the Settings model picker hides the free-text "Custom…" option. The PR changed `allowCustomModel` from `selected.id !== 'amr'` (hardcoded) to `selected.supportsCustomModel !== false` (declarative), but the test fixture was not updated to carry the same field — causing the `__custom__` sentinel to appear in the picker under test. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * fix(daemon): align formAnswerTransition wording with main + scope build directive to discovery CI surfaced two failures on the merge with main: - chat-route.test marks submitted discovery form answers ... expected the main-version wording 'Do not emit another <formId> form.' - telemetry-message-finalization keeps non-discovery form answers active ... expected task-type to fall through the else branch ('Treat these form answers as the active user turn'), not the discovery RULE 2/RULE 3 build branch. The colleague's earlier `fba1e40b` form-loop fix tightened both pieces (stronger wording + grouped discovery|task-type into the build branch) but didn't update the tests that pin the contract. Revert the transition wording to main and re-scope the build directive to 'discovery' only. The aggressive form-loop suppression we added in this PR now lives in the system-prompt FORM_ANSWERED_SYSTEM_OVERRIDE block, which is far stronger than the user-request transition text this commit reverts. * fix(daemon): scope formOverride by form id, detach Linux terminal, move agy log cleanup to finally - FORM_ANSWERED_GENERIC_OVERRIDE: new exported constant for non-discovery/non-task-type form ids; contains only the "do not re-ask" suppression without the RULE 2 / RULE 3 / artifact directive. - formAnswerTransitionForCurrentPrompt: extend build-transition branch to include task-type alongside discovery, keeping user-turn and system override consistent. 
- Prompt assembly (server.ts ~10848): derive formOverride from the parsed form id — FORM_ANSWERED_SYSTEM_OVERRIDE for discovery/task-type, FORM_ANSWERED_GENERIC_OVERRIDE for all other form ids, empty otherwise. - launchOnLinux: replace execFileAsync (waited for terminal exit, 3 s cap) with spawn({ detached: true, stdio: 'ignore' }) + unref(); resolve on the 'spawn' event so long-lived interactive terminals (xterm, konsole) are not killed mid-OAuth-flow. - Antigravity log cleanup: move fs.promises.unlink(agentLogFilePath) into a try/finally wrapper around the close handler so every exit path (success, failure, cancel, non-zero exit) cleans up the per-run temp file, preventing unbounded /tmp accumulation. - Tests: rename task-type case to assert build-transition behaviour; add generic-form-id case (preferences) pinning the non-build path; add FORM_ANSWERED_GENERIC_OVERRIDE content assertions. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * fix(daemon): switch Antigravity buildArgs to chat subcommand invocation Replace top-level `-p -` with `agy chat [--log-file …] -` so the adapter uses the documented chat subcommand and stdin sentinel instead of the unrecognised global -p flag. Update the agent-args test description and all four deepEqual assertions to assert the ['chat', '-'] shape. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * test(daemon): drop real-platform default-launch assertion from terminal-launch suite The removed test called launchAgentInSystemTerminal('agy') with no platform override, which invokes the real system terminal on every developer machine running the daemon test suite (Terminal.app on macOS, cmd.exe on Windows, xterm/gnome-terminal on Linux). That is an unacceptable OS side effect for a unit test. The behaviour being asserted — that omitting platform selects process.platform — is a TypeScript default-parameter guarantee, not a runtime invariant that needs an integration test. 
The remaining 'aix' case continues to pin the unsupported-platform failure shape. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * fix(daemon): buffer Antigravity stdout to suppress auth URL before close-time classifier The plain-stream close handler at code===0 can detect an agy OAuth prompt in agentStdoutTail and emit AGENT_AUTH_REQUIRED, but by the time close fires the stdout chunk has already been forwarded to the client via the plain-stream `send('stdout', { chunk })` path. This leaves both the raw OAuth URL and the terminal-launch guidance visible in chat. Buffer all stdout chunks for the `antigravity` agent instead of forwarding them immediately. The existing close-time auth-prompt guard (code===0, !trackingSubstantiveOutput, childStdoutSeen) returns early when it detects the auth pattern, leaving the buffer unflushed and the OAuth URL out of the SSE stream. For legitimate assistant output the buffer is flushed in order just before design.runs.finish so the chunks still arrive before the run's finished event. Adds a chat-route integration test using a fake `agy` that exits 0 after printing the canonical auth prompt; asserts that the run emits AGENT_AUTH_REQUIRED with no event: stdout delta containing the URL. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * test(daemon): isolate antigravity buildArgs argv test from real settings file Pass a temp antigravitySettingsPath in the RuntimeContext for the withModel argv assertion so unit tests do not touch ~/.gemini/antigravity-cli/settings.json. Adds the optional antigravitySettingsPath field to RuntimeContext and threads it through buildArgs to writeAntigravityModelSelection; production callers leave it undefined, preserving the existing default path. 
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * fix(daemon): revert Antigravity buildArgs to `-p -` (the only working agy v1.0.3 invocation) The looper-reviewer-bot reported `chat` as agy's headless subcommand based on its environment's agy build, and looper-fixer applied that shape. The installed CLI (`agy --version` reports `1.0.3`) does NOT expose a `chat` subcommand — `agy --help`'s `Available subcommands` section lists only `changelog / help / install / plugin / update`, and `agy chat - < prompt` exits 0 with empty stdout (the daemon then forwards it as a 'successful' empty reply, exactly the failure mode the auth/quota guard at server.ts ~12090 is meant to catch — for the wrong reason). `-p` is the documented print-mode flag (`Short alias for --print`) and `agy -p -` reads the prompt from stdin and prints the model reply, which the entire end-to-end test sequence in this PR has verified against (form-loop fix, settings.json model routing, log-file classification all confirmed working on Gemini 3.5 Flash + GPT-OSS 120B Medium with this invocation). Updates the agent-args test to pin `['-p', '-']` instead of `['chat', '-']` and adds an inline comment in antigravity.ts noting that `chat` may exist in a future agy build but is not the contract on the installed CLI today. * fix(daemon): serialize Antigravity concrete-model spawns to dodge settings.json race Reviewer (looper) flagged a concurrency race in the model-routing path: ~/.gemini/antigravity-cli/settings.json is process-global, so two OD runs starting close together with different concrete models can race the file — run A writes model A, run B writes model B, then A's agy finally reads settings.json and executes on model B. The Settings model picker becomes nondeterministic under parallel conversations. 
Adds a per-process promise chain in antigravity.ts: - acquireAntigravityModelLock(): chain-await + return release fn - waitForAgyToReadModel(logPath, expected): polls agy's --log-file for the upstream signal 'Propagating selected model override to backend: label="<X>"' which model_config_manager.go emits once agy has finished reading settings.json. Returns true on observed match, false on timeout. Regex-escapes the expected label so '(' / ')' in 'GPT-OSS 120B (Medium)' match literally, not as a capture group. server.ts spawn pipeline now acquires the lock BEFORE buildArgs (which performs the settings.json write) and schedules a release-once handler that fires when EITHER (a) the log-file confirms agy read the model or (b) the child exits — the exit fallback prevents a stuck/crashed agy from starving the queue for every subsequent antigravity spawn. Default-model spawns bypass the lock entirely: their buildArgs doesn't touch settings.json, so there's nothing to serialize. Tests pin: - FIFO ordering across 2 / 3 concurrent acquirers - Wait helper's regex correctly matches parenthesized labels - Wait helper does NOT match a different model with shared prefix - Wait helper swallows missing-log-file errors and returns false on timeout (no spawn-pipeline crash if the log never appears) 194 → 198 passing runtime tests, 0 regressions. * fix(daemon): close Antigravity lock release race on slow agy startup (looper #263fd2fe7) Reviewer flagged that the previous serialization scheduled `releaseOnce` in `.finally()` on waitForAgyToReadModel — meaning the helper's `false` timeout return ALSO released the lock. If agy took longer than the 15s polling window to read settings.json (cold start, swap-thrash, slow network handshake to the upstream backend), run A's lock dropped at 15s, run B rewrote settings.json with model B, and run A's still-starting agy then read the wrong model. Same race the original mutex was meant to close. 
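The per-process promise chain behind `acquireAntigravityModelLock()` ("chain-await + return release fn") can be sketched as a FIFO lock. This is a minimal illustration under that one-line description, not the code in `antigravity.ts`. The names `lockTail` and `acquireModelLock` are hypothetical.

```typescript
// Hypothetical sketch of a per-process FIFO lock built from a promise chain.
// Each acquirer waits for every holder queued before it, then receives a release function.
let lockTail: Promise<void> = Promise.resolve();

function acquireModelLock(): Promise<() => void> {
  let release!: () => void;
  // Settles when the current holder calls release().
  const held = new Promise<void>((resolve) => {
    release = () => resolve();
  });
  const previous = lockTail;
  // The next acquirer queues behind this holder.
  lockTail = previous.then(() => held);
  // Resolving a promise twice is a no-op, so calling release() more than once is harmless.
  return previous.then(() => release);
}
```

Because release is idempotent, both the log-confirmation path and the child-exit fallback can call it without double-releasing. Under the release-on-confirmation-only fix that follows, only a confirmed log match or the child's exit would call it. A polling timeout would not.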
Fix the release semantics to be release-on-confirmation-only: - waitForAgyToReadModel: `false` now strictly means 'I gave up polling,' not 'agy definitely did not read this.' Document the contract so a future caller can't conflate the two. Add an optional AbortSignal so server.ts can stop polling when the child exits — without it, the leftover watcher could outlive the run and accidentally match a later concurrent run's log content, releasing the wrong lock. - server.ts: schedule `releaseOnce` only when waitForAgyToReadModel returns true. The exit handler (which fires for crashes, fast exits, normal completion) is now the canonical fallback that releases the lock no matter what — the queue can't starve permanently because agy always exits eventually. The exit handler also fires the AbortController so the watcher cleans up. New tests pin: - timeout returns false WITHOUT any release-implying side effect - already-aborted signal short-circuits (no readFile calls) - abort mid-poll wakes the helper from its setTimeout (no multi-hundred-ms hang waiting out a poll interval that no longer matters) 198 → 201 passing runtime tests, 0 regressions. --------- Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-05-29 05:43:37 +00:00
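The `waitForAgyToReadModel` contract above has four properties: regex-escaped labels, no prefix matches, missing-log errors swallowed, and an abort that wakes the sleeping poller. All four can be illustrated with a small poller. This sketch is under stated assumptions and is not the daemon's helper. The log reader is injected as `readLog` (in the daemon it would presumably wrap a file read of the `--log-file` path). The helper names and option shape are hypothetical.

```typescript
// Hypothetical sketch: poll agy's log for its model-confirmation line.
function escapeRegExp(literal: string): string {
  // Escape every regex metacharacter so "(Medium)" matches literally, not as a group.
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function logConfirmsModel(log: string, expectedLabel: string): boolean {
  // The closing quote keeps "Gemini 3.5 Flash" from matching a longer label with the same prefix.
  const pattern = new RegExp(
    `Propagating selected model override to backend: label="${escapeRegExp(expectedLabel)}"`,
  );
  return pattern.test(log);
}

// Sleep that resolves early if the signal aborts, so an exit does not wait out a poll interval.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    if (signal?.aborted) {
      resolve();
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      resolve();
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

async function waitForModelInLog(
  readLog: () => Promise<string>,
  expectedLabel: string,
  opts: { timeoutMs: number; intervalMs: number; signal?: AbortSignal },
): Promise<boolean> {
  const deadline = Date.now() + opts.timeoutMs;
  while (!opts.signal?.aborted && Date.now() < deadline) {
    try {
      if (logConfirmsModel(await readLog(), expectedLabel)) return true;
    } catch {
      // Log not created yet or unreadable: keep polling instead of failing the spawn.
    }
    await sleep(opts.intervalMs, opts.signal);
  }
  // false means "stopped polling" (timeout or abort), never "agy read a different model".
  return false;
}
```

An already-aborted signal skips the loop entirely, so the reader is never called. That matches the "short-circuits (no readFile calls)" test described in the commit.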
kami	59a9867cf3	fix(daemon): surface discovery form answers to agents (#2071)	2026-05-20 10:58:51 +08:00
lefarcen	4d8d233ce0	Fix Langfuse report finalization hook (#1402)	2026-05-12 19:22:49 +08:00
lefarcen	afb331a288	feat: add opt-in Langfuse telemetry (#800) * docs(specs): add langfuse telemetry change spec Captures the design for forwarding completed agent runs to Langfuse, including data-model mapping, field-budget caps, privacy gates, build-secret injection, GDPR right-to-deletion approach, and the resolved decisions on default consent, identifier shape, region, and ownership. * feat(daemon): add langfuse-trace module and telemetry prefs Adds the dependency-free building blocks for forwarding completed agent runs to Langfuse. Two layers: - AppConfigPrefs gains installationId and a TelemetryPrefs object with metrics / content / artifactManifest gates. The daemon validator treats telemetry like agentModels — replace-on-write, drop-when-empty, reject non-boolean inner values. - New langfuse-trace.ts builds a {trace-create, generation-create} pair from a ReportContext, capping prompt at 8 KB, output at 16 KB, artifacts at 50 entries, and dropping any batch larger than 1 MB before send. reportRunCompleted is a no-op when LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are unset (so dev runs and forks never emit) and short-circuits on prefs.metrics === false. Server-side wiring into the run-close path lands in a follow-up. * fix(langfuse): default to US Langfuse region End-to-end smoke against the project's actual dev key on 2026-05-07 returned 401 from cloud.langfuse.com (EU) and 207 from us.cloud.langfuse.com (US), confirming the org lives in the US region. Update the default base URL, the matching test, and the spec's Q3 decision row to match. Self-hosted or EU-region operators can still override via the LANGFUSE_BASE_URL env var. * feat(daemon): wire langfuse trace forwarding into run-close Adds the daemon-side glue to forward completed agent runs: - runs.ts gains an optional onTerminate hook fired once per run after it reaches a terminal state. Errors thrown from the hook are caught and logged, never propagated, so telemetry can never break the run path. 
- New langfuse-bridge.ts assembles a ReportContext from the in-memory run record, the conversation's persisted assistant message, and the user's app-config preferences. It tolerates a missing message (e.g. when web has not yet PUT the final delta) and a missing app-config. - server.ts stashes the original user prompt on the run object inside startChatRun so the bridge can include it without crossing the createChatRunService boundary, and registers the hook callback when building the run service. Behavior remains a no-op unless LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are set in the daemon env AND telemetry.metrics is true in app-config. A live smoke against us.cloud.langfuse.com on 2026-05-07 confirmed the matching trace + generation schema is accepted (HTTP 207, both events 201 created). * fix(langfuse): address PR #800 review feedback P1 — Move trace forwarding off the daemon-internal run-close hook and onto the message-persistence path. The original onTerminate hook ran inside finish() the moment the SSE 'end' event was emitted, which is before the web client's onDone handler refreshes project files and PUTs producedFiles + final assistant content back to SQLite. Reading SQLite at that moment routinely missed both. The fix: drop the runs.ts hook entirely and trigger from PUT /api/projects/:id/conversations/:cid/messages/:mid when the saved row carries a terminal runStatus. A reportedRuns Set guards against the multiple PUT calls web makes per turn (each retry / state update). Set entries auto-evict after the same 30 min TTL the runs map uses. Web persists a terminal-status message in all three completion paths — onDone (succeeded), onError (failed), and cancel (canceled) — so this catches every run shape. P2 — postLangfuseBatch now parses the 207 Multi-Status response body. Langfuse legacy ingestion always returns 207, and response.ok is true for 207, so per-event validation errors used to slip through silently. We now warn when body.errors is non-empty. 
Two new unit tests. P2 — truncate() and the HARD_BATCH cap now compare UTF-8 byte length, not String.length (which counts UTF-16 code units). A 4096-character CJK prompt occupies 12 KB, well over the 8 KB input cap. truncate also walks backwards to a UTF-8 leading byte so the cut never lands inside a multi-byte codepoint. New unit test covers '设'.repeat(4096). P2 — Spec R7 now lists the actual Langfuse trace deletion endpoint (DELETE /api/public/traces/{traceId} for single, DELETE /api/public/traces with body for batch). Verified by curl on us.cloud.langfuse.com: DELETE /api/public/traces/X → 200; the path the original spec named (POST /api/public/trace/X) returns 404. Reference link points at langfuse.com/docs/administration/data-deletion. P3 — Q4 (legacy ingestion vs OTel) moved from Open Questions to Resolved Decisions. The implementation already commits to legacy and the trade-off was discussed during design; the open-question status was stale. * feat(web): privacy consent surface + Settings → Privacy tab Adds the user-facing half of the telemetry feature so the daemon-side hook from PR #800 has something to talk to. - AppConfig gains optional `installationId` (anonymous v4 uuid generated on first opt-in; null after explicit decline; undefined when the user has never seen the consent surface) and `telemetry: TelemetryConfig` ({metrics, content, artifactManifest}). syncConfigToDaemon round-trips both fields so the bridge module sees the same prefs. - SettingsDialog grows a Privacy section with two states. When the user has never made a consent decision (typical first-run path), the section renders the GDPR-aligned consent card: a kicker, the disclosure body listing both metrics and conversation content as separate bullets, and two equally-prominent buttons ("Share usage data" / "Don't share"). The Don't-share path keeps the app fully usable (core app must work with all tracking declined). 
After a decision the same panel switches to three independent toggles + the anonymous ID + a "Delete my data" button that rotates the ID and turns everything off. - App.tsx points the welcome modal at the new Privacy section so the consent decision is the first thing a fresh installation sees. - 17 i18n keys land in en + zh-CN + zh-TW with hand-translated copy, and as English placeholders in the remaining 14 locales — enough for the parity check to pass while leaving room for proper localisation in a follow-up. Dict type updated. - Minimal index.css for the consent card + toggle rows so the panel is legible without depending on follow-up design polish. Telemetry remains a no-op end-to-end until the user clicks Share usage data: the daemon gate (prefs.metrics === true) keeps every code path short-circuited otherwise. * refactor(web): rebuild Privacy panel using project-native settings primitives The first cut used custom .settings-privacy-* classes + raw HTML checkboxes that didn't match any other Settings tab. Replace with the shell other sections already use: - settings-subsection containers with section-head + h4 + .hint - seg-control / seg-btn pill toggles ("active" / "offline") for each of the three telemetry preferences, mirroring NotificationsSection - a 2-cell seg-control for the consent card so Share usage data and Don't share carry identical visual weight (the GDPR equal-prominence requirement that the previous accent / outline split missed) - ghost button + readonly text input for the installation id row, mirroring the API-key field pattern elsewhere Drop the bespoke CSS block in favor of inheriting the existing settings-section / seg-control / ghost styling. The only privacy-specific style left is a tight definition list inside the consent card for the metrics + content disclosure rows. 
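The PR #800 review fix moves `truncate()` from UTF-16 length to UTF-8 byte length and walks back to a leading byte. That fix is easy to get subtly wrong, so here is a minimal sketch of the idea. The name `truncateUtf8`, its return shape, and reading "8 KB" as 8 * 1024 bytes are assumptions, not the daemon's code.

```typescript
// Hypothetical sketch: cap a string by UTF-8 byte length without splitting a code point.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

function truncateUtf8(text: string, maxBytes: number): { text: string; truncated: boolean } {
  const bytes = encoder.encode(text);
  if (bytes.length <= maxBytes) return { text, truncated: false };
  // bytes[cut] is the first byte that would be dropped. While it is a continuation
  // byte (bit pattern 10xxxxxx), the cut falls inside a character, so step back
  // until it lands on that character's leading byte and drop the whole character.
  let cut = maxBytes;
  while (cut > 0 && (bytes[cut] & 0xc0) === 0x80) cut--;
  return { text: decoder.decode(bytes.subarray(0, cut)), truncated: true };
}
```

For the commit's `'设'.repeat(4096)` case, each character is 3 bytes, so the input is 12,288 bytes. An 8,192-byte cap keeps 2,730 whole characters (8,190 bytes) instead of 4,096 UTF-16 units.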
* refactor(web): use .toggle-row iOS switch for Privacy preferences Active/offline pills (the seg-control single-cell pattern that NotificationsSection uses) read awkwardly for a flat preference list. Switch the three telemetry toggles to .toggle-row — the same control NewProjectPanel uses for "speaker notes" / "animations": label + hint on the left, iOS-style sliding switch on the right, full-row click target. The consent card's two-button seg-control stays as-is — there the equal-weight pill pair is exactly what GDPR equal-prominence wants. * feat(web): standalone first-run privacy consent banner Replaces the Settings-dialog-as-onboarding hack with a dedicated bottom-right banner card that mounts whenever the user has never made a privacy decision (cfg.installationId === undefined). The banner is prominent (anchored to the corner with a soft shadow) but non-blocking, mirrors cookie-consent UX, and shares the project's panel styling — same .modal-elevated background, --radius-lg corners, --shadow-lg lift. Wiring: - App.tsx imports PrivacyConsentModal and renders it at the root, gated on installationId === undefined && !settingsOpen so it doesn't double up with the Privacy tab's own consent card when Settings is already showing. - Share / Don't share both go through handleConfigPersist, so the resulting installationId + telemetry prefs land in localStorage and the daemon at the same time, reusing the existing autosave plumbing. - The previous attempt that pinned the welcome SettingsDialog to the Privacy section is reverted; onboarding now stays focused on agent configuration, and the consent decision lives in its own surface. * fix(web): keep privacy banner visible while Settings welcome modal is open The banner gated itself on `!settingsOpen` to avoid double-rendering with the Privacy tab's consent card. 
But the first-run path opens the Settings welcome modal automatically when `onboardingCompleted=false`, which fired immediately after bootstrap — so the banner flashed for a moment and then vanished behind the modal backdrop. Drop the `!settingsOpen` clause so the banner stays mounted whenever the user has not yet made a privacy decision, and bump its z-index above the modal backdrop (200 vs 100) so first-run users can actually reach the consent buttons. The minor visual overlap with the Privacy tab's own card is fine: clicking either copy resolves both surfaces. * copy(privacy): soften consent button labels Banner action buttons now read "Help improve Open Design" / "Not now" (en, with hand translations in zh-CN / zh-TW and English placeholders in the other 13 locales) instead of "Share usage data" / "Don't share". The new wording aligns the affirmative action with the kicker copy ("Help us improve Open Design") and reads less alarming, while the disclosure list above still names both data categories explicitly so the consent stays informed under GDPR. The decline button stays as a soft "Not now" rather than an aggressive "Don't share" so the reject path doesn't read as hostile to the user. No structural change — the two-cell seg-control still gives the buttons identical visual weight, and the underlying side-effects are unchanged (installationId is generated on Help / nulled on Not now, and the telemetry prefs flip the same way). * feat(telemetry): expand trace fields for evals & dataset construction Each Langfuse trace now ships the full per-turn + per-install fact sheet that the eval/dataset workflow needs, instead of only the bare turn id + token count from before. Everything below is gated by `prefs.metrics === true`; nothing here is content (those gates remain separate). Per-turn: - model — first-class generation.model field, drives Langfuse cost lookup and model-grouping in the UI; also mirrored in trace.metadata and trace.tags so list-view filters work. 
- reasoning — generation.modelParameters.{ reasoning } so the Model Parameters card lights up; mirrored in metadata. - skillId / designSystemId — metadata + tags, so dataset slices can group by which skill/DS produced which output. Per-process / build (constant within one daemon run, cached at start): - appVersion / appChannel / packaged from app-version.ts - nodeVersion (process.version), os (platform()), osRelease, arch (os.arch()) - clientType — desktop vs web, derived from a new X-OD-Client header the web layer sets in providers/daemon.ts (with a User-Agent sniff fallback for third-party callers). Plumbing: - startChatRun stashes model / reasoning / skillId / designSystemId on the run object alongside the existing userPrompt stash. - POST /api/runs reads X-OD-Client and stores run.clientType. - langfuse-bridge collects RuntimeInfo once per process and merges per-run client carrier; ReportContext gains optional `turn` + `runtime` blocks; existing fields stay backward compatible. Spec gains a "Telemetry Fields Catalog" section enumerating every field, its source, and the gate it lives under, so the eval team has a single place to look up what's available without reading the trace schema by example. Tests: - new langfuse-trace tests cover turn tags, runtime tags, generation model/modelParameters promotion, modelParameters omission when reasoning is unset, and metadata mirroring. - langfuse-bridge gains an end-to-end "turn-level config" test that threads model/reasoning/skill/DS/clientType + appVersion through the bridge and asserts the Langfuse payload shape. - existing tests adjusted to tolerate host-dependent os tag. * copy(privacy): trim Share button to verb phrase only "Help improve Open Design" overflowed the equal-width 2-cell seg-control on the consent banner — the product name is already in the kicker + headline above the buttons, so the button itself only needs the verb phrase. 
Drop the product name from all locales:
- en: Help improve Open Design → Help improve
- zh-CN: 帮助改进 Open Design → 帮助改进
- zh-TW: 協助改進 Open Design → 協助改進

The decline button ("Not now" / "暂不" / "暫不") was already short, so the two buttons now have comparable length and the equal-prominence seg-control fits cleanly. The standalone Settings → Privacy panel uses the same labels for consistency.

* fix(web): defer Settings welcome modal until privacy decision is made

Previously bootstrap raced two surfaces against each other on first launch: the privacy consent banner (gated on installationId === undefined) and the Settings welcome modal (gated on onboardingCompleted === false). The banner's higher z-index kept it above the backdrop visually, but having two foreground surfaces at once is still confusing UX. Sequence them instead: bootstrap only opens the welcome modal when the user has already resolved consent (installationId !== undefined). Until then the banner owns the foreground alone. Once the user clicks Help improve / Not now, the corresponding handler hands off to the welcome modal if onboarding is still pending. The end state matches what it was before, just without the simultaneous-render flash.

* debug(privacy): log banner gate state to track sudden disappearance

Two console.log points to find which setCfg call (or stale bundle) is flipping cfg.installationId from undefined to a value while the banner is visible. To be removed once the regression is reproduced.

* fix(privacy): keep installationId + telemetry out of localStorage

Daemon is now the single source of truth for the privacy decision.
Why this matters: the consent banner gates on `config.installationId === undefined`, but loadConfig() merges localStorage on top of the daemon's reply, so a stale uuid in `open-design:config` (left over from a previous opt-in) was re-hydrating the React state and immediately syncing back to the daemon, defeating "Delete my data" and re-suppressing the banner within milliseconds of every page load.

The deeper reason to fix it here rather than just patch the gate: a privacy identifier persisted in browser storage that the user can't see or clear without DevTools is a compliance liability. Anything users can revoke needs one canonical place to store it. The daemon's `app-config.json` already serves that role for everything else gated through syncConfigToDaemon, so installationId + telemetry now ride that path exclusively:
- saveConfig() strips both keys before writing localStorage.
- loadConfig() strips both keys when reading older stale payloads, so existing installs migrate transparently on next launch.
- syncConfigToDaemon() / mergeDaemonConfig still round-trip them, so the React state stays in sync with the daemon as before.

Net effect: clearing app-config.json (or hitting "Delete my data") now fully resets the install identity, with no residual cohort key in browser storage.

* feat(privacy): scrub secrets + PII from prompt/output before send

When prefs.content is on, the daemon now runs the prompt and assistant text through a regex scrubber (apps/daemon/src/redact.ts) before posting to Langfuse. The scrubber is the simplest thing that lets the user-facing copy make a truthful claim: pure regex, zero new dependencies, fully auditable in this Apache-2.0 repo (vs. pulling a single-maintainer, 5-month-old npm package into a core process).

Categories covered (each replaced with [REDACTED:<kind>]):
- Anthropic / OpenAI sk- keys (incl. proj/live/test/ant variants)
- Langfuse pk-lf- / sk-lf- (specific rule wins over generic sk-)
- GitHub gh[opsur]_ tokens
- AWS access key ids (AKIA + 16 uppercase)
- Google API keys (AIza + 35)
- Slack xox[abprs]- tokens
- Stripe live/test keys
- JWT header.payload.signature triples
- Bearer-header values (scheme word stays readable)
- Emails, IPv4, US-style phone numbers
- Credit cards: 13–19 digit runs that pass a Luhn check, so order ids and unix-nanos timestamps that fail Luhn pass through unchanged

Not covered, stated openly in spec + i18n: names, postal addresses, business-secret semantics, raw 40-hex tokens (too high a false-positive cost for artifact slugs). Those would require an ML layer.

Wired in:
- apps/daemon/src/redact.ts: exports redactSecrets() plus a redactSecretsWithCounts() helper for future audit-summary metadata.
- apps/daemon/src/langfuse-bridge.ts: runs both prompt and output through redactSecrets() before they reach the trace builder.
- 18 unit tests cover every pattern plus negative cases (Luhn-failing digit runs, out-of-range IPv4 octets, idempotence on re-redacted text, ordinary prose passthrough).
- i18n privacyContentHint on en + zh-CN + zh-TW (plus 14 locale placeholders) enumerates the categories so the consent disclosure matches the implementation, as GDPR's informed-consent requirement demands.
- spec gains a Pre-send Redaction subsection with the regex shape table + intentional non-coverage list.

Drive-by: dropped the [privacy] debug logs that traced the now-fixed bootstrap regression.

* fix(telemetry): make Langfuse reporting resilient
* feat(telemetry): nest Langfuse turn observations
* feat(telemetry): emit Langfuse tool spans
* fix(telemetry): report after finalized message writes
* fix(telemetry): honor persisted terminal status
* fix(web): let consent banner yield page clicks
* fix(telemetry): report current turn prompt only	2026-05-09 10:06:01 +08:00
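The pre-send redaction pass described in the scrub commit can be pictured with a small TypeScript sketch. It is not the code in apps/daemon/src/redact.ts: the function name redactSecretsSketch, the rule list, and every regex shape are assumptions, and they cover only a handful of the listed categories (Langfuse keys, generic sk- keys, AWS key ids, Google API keys, emails, Luhn-checked card numbers). What it does illustrate are two properties the commit message relies on: rule order (the Langfuse-specific rule runs before the generic sk- rule) and a Luhn gate on 13–19 digit runs, so digit runs that fail the checksum pass through unchanged.

```typescript
// Hypothetical sketch of a regex-based pre-send scrubber. NOT the actual
// apps/daemon/src/redact.ts: names and pattern shapes are assumptions.

type Rule = { kind: string; pattern: RegExp };

// Order matters: the Langfuse-specific rule runs before the generic sk- rule,
// so "sk-lf-..." is labelled as a Langfuse key rather than a generic key.
const RULES: Rule[] = [
  { kind: "langfuse_key", pattern: /\b(?:pk|sk)-lf-[A-Za-z0-9-]{8,}/g },
  { kind: "sk_key", pattern: /\bsk-[A-Za-z0-9_-]{16,}/g },
  { kind: "aws_access_key", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  { kind: "google_api_key", pattern: /\bAIza[0-9A-Za-z_-]{35}\b/g },
  { kind: "email", pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g },
];

// Luhn checksum: walking from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require the sum % 10 === 0.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

function redactSecretsSketch(text: string): string {
  let out = text;
  for (const { kind, pattern } of RULES) {
    out = out.replace(pattern, `[REDACTED:${kind}]`);
  }
  // Card-like runs: only 13-19 digit runs that pass Luhn are replaced, so
  // Luhn-failing ids and timestamps are left untouched.
  return out.replace(/\b\d{13,19}\b/g, (run) =>
    passesLuhn(run) ? "[REDACTED:credit_card]" : run,
  );
}
```

Because the replacement tokens contain no digits, `@` signs, or key prefixes, running the sketch a second time returns its input unchanged, which is the idempotence property the commit's negative-case tests mention.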

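The localStorage fix and the banner/modal sequencing in the privacy commits fit together: once the privacy keys can no longer be re-hydrated from browser storage, `installationId === undefined` reliably means "no decision yet". The TypeScript sketch that follows is illustrative only. AppConfig, KeyValueStore, the telemetry shape, and the *Sketch / should* helpers are hypothetical names; only the storage key `open-design:config`, the merge order (localStorage on top of the daemon reply), and the two gate conditions come from the commit message.

```typescript
// Hypothetical sketch: types and helper names are assumptions, not the
// actual web-layer saveConfig()/loadConfig() code.

interface AppConfig {
  installationId?: string | null; // undefined = no decision yet, null = declined
  telemetry?: { metrics?: boolean; content?: boolean }; // assumed shape
  onboardingCompleted?: boolean;
}

// Minimal storage interface so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "open-design:config";

// Drop the privacy keys; the daemon's app-config.json stays their only home.
function stripPrivacyKeys(cfg: AppConfig): AppConfig {
  const copy: AppConfig = { ...cfg };
  delete copy.installationId;
  delete copy.telemetry;
  return copy;
}

function saveConfigSketch(store: KeyValueStore, cfg: AppConfig): void {
  store.setItem(STORAGE_KEY, JSON.stringify(stripPrivacyKeys(cfg)));
}

// Local values are still merged on top of the daemon reply, but stale privacy
// keys from older payloads are stripped first, so they cannot override it.
function loadConfigSketch(store: KeyValueStore, daemonCfg: AppConfig): AppConfig {
  const raw = store.getItem(STORAGE_KEY);
  const local: AppConfig = raw ? stripPrivacyKeys(JSON.parse(raw) as AppConfig) : {};
  return { ...daemonCfg, ...local };
}

// The consent banner shows only while no privacy decision has been made.
function shouldShowConsentBanner(cfg: AppConfig): boolean {
  return cfg.installationId === undefined;
}

// The welcome modal waits until consent is resolved and onboarding is pending.
function shouldOpenWelcomeModal(cfg: AppConfig): boolean {
  return cfg.installationId !== undefined && cfg.onboardingCompleted === false;
}
```

With the keys stripped on both write and read, a leftover uuid from an earlier opt-in can no longer hide the banner, and "Delete my data" on the daemon side is enough to bring it back.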