Commit graph

4 commits

Author SHA1 Message Date
lefarcen
98a2c63973
feat(daemon): add Antigravity agent adapter (#3157)
* feat(daemon): add Antigravity agent adapter

Adds Google Antigravity (`agy` CLI) as a coding-agent runtime. Detection
picks up `agy` on PATH, the daemon spawns `agy -p "<prompt>"` for a
single non-interactive turn, and the assistant text reply streams back
on stdout. OAuth is shared with the Antigravity IDE through the system
keyring, so users who have signed into the desktop app are authenticated
on first run with no extra step.

`agy` v1.0.3 has no JSON / stream-json / ACP output mode (upstream issue
#119), no `--model` flag (issue #35), and no MCP forwarding hook yet —
the adapter ships with `streamFormat: 'plain'` and a single `default`
fallback model so the model picker doesn't mislead users into thinking
their choice is wired through. We will upgrade buildArgs + add a
dedicated event parser when upstream ships structured output.

Also gitignores `.antigravitycli/`, the project-local config directory
`agy` auto-creates on every run (upstream issue #175).

* fix(daemon): Antigravity adapter — stdin prompt, brand icon, form loop, empty-output guard

- Switch prompt delivery from argv to stdin (`agy -p -`) to avoid the
  30KB maxPromptArgBytes limit that blocked real-world composed prompts
- Add official Antigravity brand SVG icon to agent picker
- Fix repeated question-form loop for plain agents by injecting an
  OVERRIDE block when form answers are already present in the transcript
- Add empty-output guard for plain agents so expired auth or silent
  failures surface a user-visible error instead of a blank "Done" turn

* feat(daemon): expand Antigravity adapter — model picker, form-loop fix, OAuth launcher, log-file classification

PR #3157 follow-up integrating four iterations from end-to-end manual
testing on Gemini 3.5 Flash + GPT-OSS 120B Medium through `agy` v1.0.3.
Each section is independently verifiable; combined they're what made
the first successful artifact generation work end-to-end.

## Model picker via settings.json (agy has no --model flag)

agy v1.0.3 ships no `--model` CLI flag (upstream issue #35), but the
TUI Switch-Model picker writes the chosen label to
`~/.gemini/antigravity-cli/settings.json`'s `"model"` field, and every
`-p` invocation re-reads that file on startup — verified by capturing
the `--log-file` line `Propagating selected model override to backend:
label="<model>"`. Antigravity's `fallbackModels` now lists the 8
labels its TUI exposes (Gemini 3.1 Pro / 3.5 Flash variants, Claude
Sonnet/Opus 4.6 Thinking, GPT-OSS 120B Medium) and `buildArgs`
persists the user's choice to settings.json right before spawn. The
synthetic `default` id is preserved — picking it leaves settings.json
untouched so a user who switches models from agy's own TUI keeps
their choice.

Introduces `RuntimeAgentDef.supportsCustomModel?: boolean`. AMR's
hardcoded blocklist in `SettingsDialog.tsx` migrates to the
declarative flag (it rejects free-form ids at the ACP layer), and
antigravity opts out because its label set is a server-side enum that
silently fails on unrecognised strings.

## Form-loop fix (transcript sanitizer + stronger OVERRIDE)

The discovery form loop on weak/medium plain-stream models (GPT-OSS
120B Medium, Gemini 3.5 Flash) had two reinforcing causes:

  1. `buildDaemonTranscript` packed the prior assistant turn's
     literal `<question-form>` markup into the user request on the
     next turn, giving the model a template to echo. New
     `sanitizePriorAssistantTurnForTranscript` strips
     `<question-form>...</question-form>` blocks and ```json fences
     that match form-schema shape, replacing them with a brief
     placeholder. User content is preserved verbatim (a user who
     legitimately mentions `<question-form>` in chat keeps their
     message intact).
  2. The OVERRIDE block on form-answered turns was 4 lines and only
     banned the bare `<question-form>` tag — models still emitted the
     fenced JSON, form-asking prose ("Got it — tell me the following"),
     and fake system events ("subagents stopped"). The new
     `FORM_ANSWERED_SYSTEM_OVERRIDE` enumerates each anti-pattern and
     pins them via tests, so silently weakening any line reintroduces
     the regression.

Also adds RuntimeAgentDef.resumesSessionViaCli + RuntimeContext.
hasPriorAssistantTurn as forward-looking abstractions (skipTranscript
option on composeChatUserRequestForAgent). Antigravity does NOT opt
in — agy's `-c` resume activates an internal agentic loop with tool
retries and fallback-to-cached-response on tool errors that the OD
system prompt cannot steer; reverted after seeing byte-identical
form re-emissions caused by agy's own retry logic, not OD's transcript.

## One-click OAuth via system terminal

agy print mode can't complete Google Sign-In on its own (the OAuth
callback page asks the user to paste an auth code back into agy, but
`-p` has no input field). Before this commit the auth banner only
told the user to "open a terminal yourself."

Adds `POST /api/agents/antigravity/oauth-launch` and a cross-platform
launcher in `runtimes/terminal-launch.ts`:

  - macOS:    osascript → Terminal.app `do script "agy"` + activate
  - Linux:    tries x-terminal-emulator, gnome-terminal, konsole,
              xfce4-terminal, xterm in order
  - Windows:  `cmd /c start "Open Design" cmd /k agy`

The endpoint hardcodes the `agy` command (no user input → no shell
injection surface) and is loopback-gated like the other daemon
endpoints. The chat's `AGENT_AUTH_REQUIRED` banner now renders a
"Sign in via terminal" button next to Retry; clicking it spawns the
terminal so the user can finish OAuth in one click.

## Silent-failure classification (auth vs quota via --log-file)

agy print mode is silent on stdout/stderr for both missing-OAuth AND
quota-exhausted failures — the upstream
`RESOURCE_EXHAUSTED (code 429): Individual quota reached` and the
`not logged into Antigravity` line only surface in agy's
`--log-file`. Without log inspection the daemon misread quota as
"auth required" and showed the wrong banner.

`RuntimeContext.agentLogFilePath` carries a daemon-owned per-run temp
path that antigravity's buildArgs translates to `--log-file <path>`.
The empty-output guard now reads that log on a `code === 0 &&
!childStdoutSeen` exit, feeds the tail to
`classifyAgentServiceFailure`, and routes:

  - "not logged into Antigravity"     → AGENT_AUTH_REQUIRED with
                                        antigravityAuthGuidance
  - "RESOURCE_EXHAUSTED" / "quota" /  → RATE_LIMITED with
    "Individual quota reached"          antigravityQuotaGuidance
  - none of the above (rare)          → fall back to auth guidance
                                        as the most likely cause

Both surface a terminal launcher in the auth banner: auth gets "Sign
in via terminal", quota gets "Switch model in terminal" — same
endpoint, contextual label. The handler is identical (open agy in a
terminal); the user either signs in or uses agy's Switch Model
picker to pick a model with available quota.

## Validation

- `pnpm guard` pass
- `pnpm --filter @open-design/daemon` runtime + telemetry suites:
  192 passed, 1 skipped (the 1 pre-existing `task-type` failure on
  origin/main is unrelated to this change)
- `pnpm --filter @open-design/web` typecheck pass; sse / amr-guidance
  / AgentIcon suites pass (51 web tests)
- Manual end-to-end on darwin + Gemini 3.5 Flash and GPT-OSS 120B
  Medium: turn-1 question-form rendered correctly, turn-2 produced
  `<artifact>` with full HTML (3.3KB Modern Minimal design) instead
  of re-emitting the form. agy `--log-file` content correctly
  classified as RATE_LIMITED when Gemini Pro quota was exhausted,
  and as AGENT_AUTH_REQUIRED when keychain was cleared.

* fix(web/test): align amrAgent fixture with supportsCustomModel contract

The AMR agent definition in the daemon ships `supportsCustomModel: false`
so the Settings model picker hides the free-text "Custom…" option. The
PR changed `allowCustomModel` from `selected.id !== 'amr'` (hardcoded)
to `selected.supportsCustomModel !== false` (declarative), but the test
fixture was not updated to carry the same field — causing the
`__custom__` sentinel to appear in the picker under test.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): align formAnswerTransition wording with main + scope build directive to discovery

CI surfaced two failures on the merge with main:
- chat-route.test marks submitted discovery form answers ... expected
  the main-version wording 'Do not emit another <formId> form.'
- telemetry-message-finalization keeps non-discovery form answers
  active ... expected task-type to fall through the else branch
  ('Treat these form answers as the active user turn'), not the
  discovery RULE 2/RULE 3 build branch.

The colleague's earlier fba1e40b form-loop fix tightened both pieces
(stronger wording + grouped discovery|task-type into the build branch)
but didn't update the tests that pin the contract. Revert the
transition wording to main and re-scope the build directive to
'discovery' only. The aggressive form-loop suppression we added in
this PR now lives in the system-prompt FORM_ANSWERED_SYSTEM_OVERRIDE
block, which is far stronger than the user-request transition text
this commit reverts.

* fix(daemon): scope formOverride by form id, detach Linux terminal, move agy log cleanup to finally

- FORM_ANSWERED_GENERIC_OVERRIDE: new exported constant for non-discovery/
  non-task-type form ids; contains only the "do not re-ask" suppression
  without the RULE 2 / RULE 3 / artifact directive.
- formAnswerTransitionForCurrentPrompt: extend build-transition branch to
  include task-type alongside discovery, keeping user-turn and system
  override consistent.
- Prompt assembly (server.ts ~10848): derive formOverride from the parsed
  form id — FORM_ANSWERED_SYSTEM_OVERRIDE for discovery/task-type,
  FORM_ANSWERED_GENERIC_OVERRIDE for all other form ids, empty otherwise.
- launchOnLinux: replace execFileAsync (waited for terminal exit, 3 s cap)
  with spawn({ detached: true, stdio: 'ignore' }) + unref(); resolve on
  the 'spawn' event so long-lived interactive terminals (xterm, konsole)
  are not killed mid-OAuth-flow.
- Antigravity log cleanup: move fs.promises.unlink(agentLogFilePath) into
  a try/finally wrapper around the close handler so every exit path
  (success, failure, cancel, non-zero exit) cleans up the per-run temp
  file, preventing unbounded /tmp accumulation.
- Tests: rename task-type case to assert build-transition behaviour; add
  generic-form-id case (preferences) pinning the non-build path; add
  FORM_ANSWERED_GENERIC_OVERRIDE content assertions.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): switch Antigravity buildArgs to chat subcommand invocation

Replace top-level `-p -` with `agy chat [--log-file …] -` so the adapter
uses the documented chat subcommand and stdin sentinel instead of the
unrecognised global -p flag.  Update the agent-args test description and
all four deepEqual assertions to assert the ['chat', '-'] shape.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* test(daemon): drop real-platform default-launch assertion from terminal-launch suite

The removed test called launchAgentInSystemTerminal('agy') with no
platform override, which invokes the real system terminal on every
developer machine running the daemon test suite (Terminal.app on macOS,
cmd.exe on Windows, xterm/gnome-terminal on Linux). That is an
unacceptable OS side effect for a unit test.

The behaviour being asserted — that omitting platform selects
process.platform — is a TypeScript default-parameter guarantee, not a
runtime invariant that needs an integration test. The remaining 'aix'
case continues to pin the unsupported-platform failure shape.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): buffer Antigravity stdout to suppress auth URL before close-time classifier

The plain-stream close handler at code===0 can detect an agy OAuth
prompt in agentStdoutTail and emit AGENT_AUTH_REQUIRED, but by the
time close fires the stdout chunk has already been forwarded to the
client via the plain-stream `send('stdout', { chunk })` path. This
leaves both the raw OAuth URL and the terminal-launch guidance visible
in chat.

Buffer all stdout chunks for the `antigravity` agent instead of
forwarding them immediately. The existing close-time auth-prompt guard
(code===0, !trackingSubstantiveOutput, childStdoutSeen) returns early
when it detects the auth pattern, leaving the buffer unflushed and the
OAuth URL out of the SSE stream. For legitimate assistant output the
buffer is flushed in order just before design.runs.finish so the
chunks still arrive before the run's finished event.

Adds a chat-route integration test using a fake `agy` that exits 0
after printing the canonical auth prompt; asserts that the run emits
AGENT_AUTH_REQUIRED with no event: stdout delta containing the URL.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* test(daemon): isolate antigravity buildArgs argv test from real settings file

Pass a temp antigravitySettingsPath in the RuntimeContext for the
withModel argv assertion so unit tests do not touch
~/.gemini/antigravity-cli/settings.json. Adds the optional
antigravitySettingsPath field to RuntimeContext and threads it
through buildArgs to writeAntigravityModelSelection; production
callers leave it undefined, preserving the existing default path.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): revert Antigravity buildArgs to `-p -` (the only working agy v1.0.3 invocation)

The looper-reviewer-bot reported `chat` as agy's headless subcommand
based on its environment's agy build, and looper-fixer applied that
shape. The installed CLI (`agy --version` reports `1.0.3`) does NOT
expose a `chat` subcommand — `agy --help`'s `Available subcommands`
section lists only `changelog / help / install / plugin / update`,
and `agy chat - < prompt` exits 0 with empty stdout (the daemon then
forwards it as a 'successful' empty reply, exactly the failure mode
the auth/quota guard at server.ts ~12090 is meant to catch — for the
wrong reason).

`-p` is the documented print-mode flag (`Short alias for --print`)
and `agy -p -` reads the prompt from stdin and prints the model
reply, which the entire end-to-end test sequence in this PR has
verified against (form-loop fix, settings.json model routing,
log-file classification all confirmed working on Gemini 3.5 Flash
+ GPT-OSS 120B Medium with this invocation).

Updates the agent-args test to pin `['-p', '-']` instead of
`['chat', '-']` and adds an inline comment in antigravity.ts noting
that `chat` may exist in a future agy build but is not the contract
on the installed CLI today.

* fix(daemon): serialize Antigravity concrete-model spawns to dodge settings.json race

Reviewer (looper) flagged a concurrency race in the model-routing path:
~/.gemini/antigravity-cli/settings.json is process-global, so two OD
runs starting close together with different concrete models can race
the file — run A writes model A, run B writes model B, then A's agy
finally reads settings.json and executes on model B. The Settings
model picker becomes nondeterministic under parallel conversations.

Adds a per-process promise chain in antigravity.ts:
  - acquireAntigravityModelLock(): chain-await + return release fn
  - waitForAgyToReadModel(logPath, expected): polls agy's --log-file
    for the upstream signal
      'Propagating selected model override to backend: label="<X>"'
    which model_config_manager.go emits once agy has finished reading
    settings.json. Returns true on observed match, false on timeout.
    Regex-escapes the expected label so '(' / ')' in 'GPT-OSS 120B
    (Medium)' match literally, not as a capture group.

server.ts spawn pipeline now acquires the lock BEFORE buildArgs (which
performs the settings.json write) and schedules a release-once handler
that fires when EITHER (a) the log-file confirms agy read the model
or (b) the child exits — the exit fallback prevents a stuck/crashed
agy from starving the queue for every subsequent antigravity spawn.

Default-model spawns bypass the lock entirely: their buildArgs doesn't
touch settings.json, so there's nothing to serialize.

Tests pin:
  - FIFO ordering across 2 / 3 concurrent acquirers
  - Wait helper's regex correctly matches parenthesized labels
  - Wait helper does NOT match a different model with shared prefix
  - Wait helper swallows missing-log-file errors and returns false on
    timeout (no spawn-pipeline crash if the log never appears)

194 → 198 passing runtime tests, 0 regressions.

* fix(daemon): close Antigravity lock release race on slow agy startup (looper #263fd2fe7)

Reviewer flagged that the previous serialization scheduled
`releaseOnce` in `.finally()` on waitForAgyToReadModel — meaning the
helper's `false` timeout return ALSO released the lock. If agy took
longer than the 15s polling window to read settings.json (cold start,
swap-thrash, slow network handshake to the upstream backend), run A's
lock dropped at 15s, run B rewrote settings.json with model B, and
run A's still-starting agy then read the wrong model. Same race the
original mutex was meant to close.

Fix the release semantics to be release-on-confirmation-only:

  - waitForAgyToReadModel: `false` now strictly means 'I gave up
    polling,' not 'agy definitely did not read this.' Document the
    contract so a future caller can't conflate the two. Add an
    optional AbortSignal so server.ts can stop polling when the child
    exits — without it, the leftover watcher could outlive the run
    and accidentally match a later concurrent run's log content,
    releasing the wrong lock.
  - server.ts: schedule `releaseOnce` only when waitForAgyToReadModel
    returns true. The exit handler (which fires for crashes, fast
    exits, normal completion) is now the canonical fallback that
    releases the lock no matter what — the queue can't starve
    permanently because agy always exits eventually. The exit
    handler also fires the AbortController so the watcher cleans up.

New tests pin:
  - timeout returns false WITHOUT any release-implying side effect
  - already-aborted signal short-circuits (no readFile calls)
  - abort mid-poll wakes the helper from its setTimeout (no
    multi-hundred-ms hang waiting out a poll interval that no longer
    matters)

198 → 201 passing runtime tests, 0 regressions.

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-29 05:43:37 +00:00
kami
59a9867cf3
fix(daemon): surface discovery form answers to agents (#2071) 2026-05-20 10:58:51 +08:00
lefarcen
4d8d233ce0
Fix Langfuse report finalization hook (#1402) 2026-05-12 19:22:49 +08:00
lefarcen
afb331a288
feat: add opt-in Langfuse telemetry (#800)
* docs(specs): add langfuse telemetry change spec

Captures the design for forwarding completed agent runs to Langfuse,
including data-model mapping, field-budget caps, privacy gates,
build-secret injection, GDPR right-to-deletion approach, and the
resolved decisions on default consent, identifier shape, region, and
ownership.

* feat(daemon): add langfuse-trace module and telemetry prefs

Adds the dependency-free building blocks for forwarding completed
agent runs to Langfuse. Two layers:

- AppConfigPrefs gains installationId and a TelemetryPrefs object with
  metrics / content / artifactManifest gates. The daemon validator
  treats telemetry like agentModels — replace-on-write, drop-when-empty,
  reject non-boolean inner values.

- New langfuse-trace.ts builds a {trace-create, generation-create}
  pair from a ReportContext, capping prompt at 8 KB, output at 16 KB,
  artifacts at 50 entries, and dropping any batch larger than 1 MB
  before send. reportRunCompleted is no-op when LANGFUSE_PUBLIC_KEY /
  LANGFUSE_SECRET_KEY are unset (so dev runs and forks never emit) and
  short-circuits on prefs.metrics === false.

Server-side wiring into the run-close path lands in a follow-up.

* fix(langfuse): default to US Langfuse region

End-to-end smoke against the project's actual dev key on 2026-05-07
returned 401 from cloud.langfuse.com (EU) and 207 from
us.cloud.langfuse.com (US), confirming the org lives in US. Update the
default base URL, the matching test, and the spec's Q3 decision row to
match. Self-hosted or EU-region operators can still override via the
LANGFUSE_BASE_URL env var.

* feat(daemon): wire langfuse trace forwarding into run-close

Adds the daemon-side glue to forward completed agent runs:

- runs.ts gains an optional onTerminate hook fired once per run after it
  reaches a terminal state. Errors thrown from the hook are caught and
  logged, never propagated, so telemetry can never break the run path.

- New langfuse-bridge.ts assembles a ReportContext from the in-memory
  run record, the conversation's persisted assistant message, and the
  user's app-config preferences. It tolerates a missing message (e.g.
  when web has not yet PUT the final delta) and a missing app-config.

- server.ts stashes the original user prompt on the run object inside
  startChatRun so the bridge can include it without crossing the
  createChatRunService boundary, and registers the hook callback when
  building the run service.

Behavior remains a no-op unless LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY
are set in the daemon env AND telemetry.metrics is true in app-config.
A live smoke against us.cloud.langfuse.com on 2026-05-07 confirmed the
matching trace + generation schema is accepted (HTTP 207, both events
201 created).

* fix(langfuse): address PR #800 review feedback

P1 — Move trace forwarding off the daemon-internal run-close hook and
onto the message-persistence path. The original onTerminate hook ran
inside finish() the moment the SSE 'end' event was emitted, which is
*before* the web client's onDone handler refreshes project files and
PUTs producedFiles + final assistant content back to SQLite. Reading
SQLite at that moment routinely missed both. The fix: drop the runs.ts
hook entirely and trigger from PUT /api/projects/:id/conversations/:cid/
messages/:mid when the saved row carries a terminal runStatus. A
reportedRuns Set guards against the multiple PUT calls web makes per
turn (each retry / state update). Set entries auto-evict after the same
30 min TTL the runs map uses. Web persists a terminal-status message in
all three completion paths — onDone (succeeded), onError (failed), and
cancel (canceled) — so this catches every run shape.

P2 — postLangfuseBatch now parses the 207 Multi-Status response body.
Langfuse legacy ingestion always returns 207, and response.ok is true
for 207, so per-event validation errors used to slip through silently.
We now warn when body.errors is non-empty. Two new unit tests.

P2 — truncate() and the HARD_BATCH cap now compare UTF-8 byte length,
not String.length (which counts UTF-16 code units). A 4096-character
CJK prompt occupies 12 KB, well over the 8 KB input cap. truncate also
walks backwards to a UTF-8 leading byte so the cut never lands inside a
multi-byte codepoint. New unit test covers '设'.repeat(4096).

P2 — Spec R7 now lists the actual Langfuse trace deletion endpoint
(DELETE /api/public/traces/{traceId} for single, DELETE /api/public/traces
with body for batch). Verified by curl on us.cloud.langfuse.com:
DELETE /api/public/traces/X → 200; the path the original spec named
(POST /api/public/trace/X) returns 404. Reference link points at
langfuse.com/docs/administration/data-deletion.

P3 — Q4 (legacy ingestion vs OTel) moved from Open Questions to
Resolved Decisions. The implementation already commits to legacy and
the trade-off was discussed during design; the open-question status was
stale.

* feat(web): privacy consent surface + Settings → Privacy tab

Adds the user-facing half of the telemetry feature so the daemon-side
hook from PR #800 has something to talk to.

- AppConfig gains optional `installationId` (anonymous v4 uuid generated
  on first opt-in; null after explicit decline; undefined when the user
  has never seen the consent surface) and `telemetry: TelemetryConfig`
  ({metrics, content, artifactManifest}). syncConfigToDaemon round-trips
  both fields so the bridge module sees the same prefs.

- SettingsDialog grows a Privacy section with two states. When the user
  has never made a consent decision (typical first-run path), the
  section renders the GDPR-aligned consent card: a kicker, the disclosure
  body listing both metrics and conversation content as separate bullets,
  and two equally-prominent buttons ("Share usage data" / "Don't share").
  The Don't-share path keeps the app fully usable (core app must work
  with all tracking declined). After a decision the same panel switches
  to three independent toggles + the anonymous ID + a "Delete my data"
  button that rotates the ID and turns everything off.

- App.tsx points the welcome modal at the new Privacy section so the
  consent decision is the first thing a fresh installation sees.

- 17 i18n keys land in en + zh-CN + zh-TW with hand-translated copy,
  and as English placeholders in the remaining 14 locales — enough for
  the parity check to pass while leaving room for proper localisation
  in a follow-up. Dict type updated.

- Minimal index.css for the consent card + toggle rows so the panel is
  legible without depending on follow-up design polish.

Telemetry remains a no-op end-to-end until the user clicks Share usage
data: the daemon gate (prefs.metrics === true) keeps every code path
short-circuited otherwise.

* refactor(web): rebuild Privacy panel using project-native settings primitives

The first cut used custom .settings-privacy-* classes + raw HTML
checkboxes that didn't match any other Settings tab. Replace with the
shell other sections already use:

- settings-subsection containers with section-head + h4 + .hint
- seg-control / seg-btn pill toggles ("active" / "offline") for each of
  the three telemetry preferences, mirroring NotificationsSection
- a 2-cell seg-control for the consent card so Share usage data and
  Don't share carry identical visual weight (the GDPR equal-prominence
  requirement that the previous accent / outline split missed)
- ghost button + readonly text input for the installation id row,
  mirroring the API-key field pattern elsewhere

Drop the bespoke CSS block in favor of inheriting the existing
settings-section / seg-control / ghost styling. The only privacy-
specific style left is a tight definition list inside the consent card
for the metrics + content disclosure rows.

* refactor(web): use .toggle-row iOS switch for Privacy preferences

Active/offline pills (the seg-control single-cell pattern that
NotificationsSection uses) read awkwardly for a flat preference list.
Switch the three telemetry toggles to .toggle-row — the same control
NewProjectPanel uses for "speaker notes" / "animations": label + hint
on the left, iOS-style sliding switch on the right, full-row click
target. The consent card's two-button seg-control stays as-is — there
the equal-weight pill pair is exactly what GDPR equal-prominence wants.

* feat(web): standalone first-run privacy consent banner

Replaces the Settings-dialog-as-onboarding hack with a dedicated
bottom-right banner card that mounts whenever the user has never made
a privacy decision (cfg.installationId === undefined). The banner is
prominent (anchored to the corner with a soft shadow) but
non-blocking, mirrors cookie-consent UX, and shares the project's
panel styling — same .modal-elevated background, --radius-lg corners,
--shadow-lg lift.

Wiring:

- App.tsx imports PrivacyConsentModal and renders it at the root,
  gated on installationId === undefined && !settingsOpen so it doesn't
  double up with the Privacy tab's own consent card when Settings is
  already showing.
- Share / Don't share both go through handleConfigPersist, so the
  resulting installationId + telemetry prefs land in localStorage and
  the daemon at the same time, reusing the existing autosave plumbing.
- The previous attempt that pinned the welcome SettingsDialog to the
  Privacy section is reverted; onboarding now stays focused on agent
  configuration, and the consent decision lives in its own surface.

* fix(web): keep privacy banner visible while Settings welcome modal is open

The banner gated itself on `!settingsOpen` to avoid double-rendering
with the Privacy tab's consent card. But the first-run path opens the
Settings welcome modal automatically when `onboardingCompleted=false`,
which fired immediately after bootstrap — so the banner flashed for a
moment and then vanished behind the modal backdrop.

Drop the `!settingsOpen` clause so the banner stays mounted whenever
the user has not yet made a privacy decision, and bump its z-index
above the modal backdrop (200 vs 100) so first-run users can actually
reach the consent buttons. The minor visual overlap with the Privacy
tab's own card is fine: clicking either copy resolves both surfaces.

* copy(privacy): soften consent button labels

Banner action buttons now read "Help improve Open Design" / "Not now"
(en, with hand translations in zh-CN / zh-TW and English placeholders
in the other 13 locales) instead of "Share usage data" / "Don't share".

The new wording aligns the affirmative action with the kicker copy
("Help us improve Open Design") and reads less alarming, while the
disclosure list above still names both data categories explicitly so
the consent stays informed under GDPR. The decline button stays as a
soft "Not now" rather than an aggressive "Don't share" so the reject
path doesn't read as hostile to the user.

No structural change — the two-cell seg-control still gives the buttons
identical visual weight, and the underlying side-effects are unchanged
(installationId is generated on Help / nulled on Not now, and the
telemetry prefs flip the same way).

* feat(telemetry): expand trace fields for evals & dataset construction

Each Langfuse trace now ships the full per-turn + per-install fact
sheet that the eval/dataset workflow needs, instead of only the bare
turn id + token count from before. Everything below is gated by
`prefs.metrics === true`; nothing here is content (those gates remain
separate).

Per-turn:
- model — first-class generation.model field, drives Langfuse cost
  lookup and model-grouping in the UI; also mirrored in trace.metadata
  and trace.tags so list-view filters work.
- reasoning — generation.modelParameters.{ reasoning } so the Model
  Parameters card lights up; mirrored in metadata.
- skillId / designSystemId — metadata + tags, so dataset slices can
  group by which skill/DS produced which output.

Per-process / build (constant within one daemon run, cached at start):
- appVersion / appChannel / packaged from app-version.ts
- nodeVersion (process.version), os (platform()), osRelease,
  arch (os.arch())
- clientType — desktop vs web, derived from a new X-OD-Client header
  the web layer sets in providers/daemon.ts (with a User-Agent sniff
  fallback for third-party callers).

Plumbing:
- startChatRun stashes model / reasoning / skillId / designSystemId
  on the run object alongside the existing userPrompt stash.
- POST /api/runs reads X-OD-Client and stores run.clientType.
- langfuse-bridge collects RuntimeInfo once per process and merges
  per-run client carrier; ReportContext gains optional `turn` +
  `runtime` blocks; existing fields stay backward compatible.

Spec gains a "Telemetry Fields Catalog" section enumerating every
field, its source, and the gate it lives under, so the eval team has a
single place to look up what's available without reading the trace
schema by example.

Tests:
- new langfuse-trace tests cover turn tags, runtime tags, generation
  model/modelParameters promotion, modelParameters omission when
  reasoning is unset, and metadata mirroring.
- langfuse-bridge gains an end-to-end "turn-level config" test that
  threads model/reasoning/skill/DS/clientType + appVersion through
  the bridge and asserts the Langfuse payload shape.
- existing tests adjusted to tolerate host-dependent os tag.

* copy(privacy): trim Share button to verb phrase only

"Help improve Open Design" overflowed the equal-width 2-cell
seg-control on the consent banner — the product name is already in
the kicker + headline above the buttons, so the button itself only
needs the verb phrase. Drop the product name from all locales:

- en: Help improve Open Design → Help improve
- zh-CN: 帮助改进 Open Design → 帮助改进
- zh-TW: 協助改進 Open Design → 協助改進

The decline button ("Not now" / "暂不" / "暫不") was already short, so
the two buttons now have comparable length and the equal-prominence
seg-control fits cleanly. Standalone Settings → Privacy panel uses
the same labels for consistency.

* fix(web): defer Settings welcome modal until privacy decision is made

Previously bootstrap raced two surfaces against each other on first
launch: the privacy consent banner (gated on installationId ===
undefined) and the Settings welcome modal (gated on
onboardingCompleted === false). The banner's higher z-index kept it
above the backdrop visually, but having two foreground surfaces at
once is still confusing UX.

Sequence them instead: bootstrap only opens the welcome modal when
the user has *already* resolved consent (installationId !== undefined).
Until then the banner owns the foreground alone. Once the user clicks
Help improve / Not now, the corresponding handler hands off to the
welcome modal if onboarding is still pending. End state matches what
it was before — just without the simultaneous-render flash.

* debug(privacy): log banner gate state to track sudden disappearance

Two console.log points to find which setCfg call (or stale bundle) is
flipping cfg.installationId from undefined to a value while the banner
is visible. To remove once the regression is reproduced.

* fix(privacy): keep installationId + telemetry out of localStorage

Daemon is now the single source of truth for the privacy decision.
Why this matters: the consent banner gates on
\`config.installationId === undefined\`, but loadConfig() merges
localStorage on top of the daemon's reply, so a stale uuid in
\`open-design:config\` (left over from a previous opt-in) was
re-hydrating the React state and immediately syncing back to the
daemon — defeating "Delete my data" and re-suppressing the banner
within milliseconds of every page load.

The deeper reason to fix it here, not just patch the gate: a privacy
identifier persisted in browser storage that the user can't see or
clear without DevTools is a compliance liability. Anything users can
revoke needs one canonical place to store it. Daemon \`app-config.json\`
already serves that role for everything else gated through
syncConfigToDaemon, so installationId + telemetry now ride that path
exclusively:

- saveConfig() strips both keys before writing localStorage.
- loadConfig() strips both keys when reading older stale payloads,
  so existing installs migrate transparently on next launch.
- syncConfigToDaemon() / mergeDaemonConfig still round-trip them, so
  the React state stays in sync with the daemon as before.

Net effect: clearing app-config.json (or hitting "Delete my data") now
fully resets the install identity, with no residual cohort key in
browser storage.

* feat(privacy): scrub secrets + PII from prompt/output before send

When prefs.content is on, daemon now runs the prompt and assistant
text through a regex scrubber (apps/daemon/src/redact.ts) before
posting to Langfuse. The scrubber is the simplest thing that gives
the user-facing copy a truthful claim — pure regex, zero new
dependencies, fully auditable in this Apache-2.0 repo (vs. pulling a
single-maintainer 5-month-old npm package into a core process).

Categories covered (each replaced with [REDACTED:<kind>]):

- Anthropic / OpenAI sk- keys (incl. proj/live/test/ant variants)
- Langfuse pk-lf- / sk-lf- (specific rule wins over generic sk-)
- GitHub gh[opsur]_ tokens
- AWS access key ids (AKIA + 16 uppercase)
- Google API keys (AIza + 35)
- Slack xox[abprs]- tokens
- Stripe live/test keys
- JWT header.payload.signature triples
- Bearer-header values (scheme word stays readable)
- Emails, IPv4, US-style phone numbers
- Credit cards — 13–19 digit runs that pass a Luhn check, so order ids
  and unix-nanos timestamps that fail Luhn pass through unchanged

Not covered, stated openly in spec + i18n: names, postal addresses,
business-secret semantics, raw 40-hex tokens (too high a false-positive
cost for artifact slugs). Those would require an ML layer.

Wired in:
- apps/daemon/src/redact.ts — exports redactSecrets() +
  redactSecretsWithCounts() helper for future audit-summary metadata.
- apps/daemon/src/langfuse-bridge.ts — runs both prompt and output
  through redactSecrets() before they reach the trace builder.
- 18 unit tests cover every pattern plus negative cases (Luhn-failing
  digit runs, out-of-range IPv4 octets, idempotence on re-redacted
  text, ordinary prose passthrough).
- i18n privacyContentHint on en + zh-CN + zh-TW (plus 14 locale
  placeholders) enumerates the categories so the consent disclosure
  matches the implementation — the GDPR informed-consent requirement.
- spec gains a Pre-send Redaction subsection with the regex shape
  table + intentional non-coverage list.

Drive-by: dropped the [privacy] debug logs that traced the now-fixed
bootstrap regression.

* fix(telemetry): make Langfuse reporting resilient

* feat(telemetry): nest Langfuse turn observations

* feat(telemetry): emit Langfuse tool spans

* fix(telemetry): report after finalized message writes

* fix(telemetry): honor persisted terminal status

* fix(web): let consent banner yield page clicks

* fix(telemetry): report current turn prompt only
2026-05-09 10:06:01 +08:00