Commit graph

200 commits

Author SHA1 Message Date
Alex Lucero
56a4e97871 feat: surface linked repo changes 2026-05-31 08:07:19 +02:00
BayesWang
af4a62b69a
Add configurable project locations (#2041)
* add daemon project location support

* wire project locations into web settings

* localize project location settings

* move default project location to settings

* polish project location selection cards

* fix project location i18n gaps

* fix external project validation cleanup
2026-05-31 04:47:45 +00:00
Denis Redozubov
729ce2b0cb
feat(daemon): add run-scoped MCP tool bundles (#3244)
* feat(daemon): add run-scoped MCP tool bundles

* fix(daemon): keep sandbox runs in managed project dirs

* fix(daemon): reject malformed run tool bundles

* fix(contracts): model run-scoped mcp server inputs

* fix(daemon): reject unsupported run tool bundles

* fix(daemon): validate run tools before chat fallback

* test(daemon): expect sandbox imported folder failure

* fix(daemon): preflight sandbox project roots before run rows

* fix(daemon): preflight sandbox chat project roots

* fix(daemon): allow host editor for sandbox imports

* fix(daemon): preflight sandbox routine project reuse

* fix(daemon): reject undeliverable Claude tool bundles

* fix(daemon): single-source chat route validation
2026-05-31 03:53:04 +00:00
蓝宙
e8c179d3a6
fix: show cumulative conversation duration (#3354)
* fix: show cumulative conversation duration

* fix: include usage-only run durations

---------

Co-authored-by: Lanzhou3 <217479610+Lanzhou3@users.noreply.github.com>
2026-05-31 03:52:12 +00:00
RyanCheng77
f12679185c
fix(web): send Anthropic proxy image attachments (#3273)
* fix(web): send Anthropic proxy image attachments

* fix(web): omit image attachment stubs for Anthropic proxy

* fix(web): keep image fallback context aligned

* fix(web): align Anthropic image attachment omission

---------

Co-authored-by: 116405 <116405@ky-tech.com.cn>
2026-05-30 04:47:47 +00:00
JasonBroderick
0fbeaf829e
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247)

Three-layer defence against models emitting `## user` / `## assistant` /
`## system` lines mid-response, which the chat host interprets as real
turn boundaries and acts on as unauthorised instruction:

1. **System prompt**: anti-roleplay instruction elevated from a bullet
   under "What you don't do" to a standalone `## CRITICAL` section in
   `official-system.ts`, with a REMINDER pinned at the end of the
   composed prompt for recency bias.

2. **Stream-level detection and truncation**: shared `role-marker-guard.ts`
   module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used
   across all text paths — Claude stream (per-message guards), non-Claude
   structured streams (run-scoped guard via `emitGuardedTextDelta`),
   and BYOK proxy routes (`createDeltaGuard`). When a marker is detected,
   the contaminated suffix is dropped and a `fabricated_role_marker` event
   surfaces a warning in the UI.

3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants;
   `fabricated_role_marker` events render as amber warning pills.

* fix(chat-routes): do not await reader.cancel() on stream early-return

The await on reader.cancel() can hang indefinitely on response streams
whose underlying source is a Uint8Array (most notably surfaced by the
ollama test in proxy-routes.test.ts, which builds its mock body via
`new Response(uint8array)` rather than the controller-based helper
`sseResponse()`). The hung await holds the request handler open, which
in turn blocks `server.close()` in the afterAll hook, producing the two
test timeouts (test at 145, hook at 36) currently failing CI on #3296.

Fix is in production code, not the test: don't await the cancel. It
is a cleanup hint and we are returning from the function anyway, so
blocking on it offers no value. fire-and-forget with an empty catch
keeps the cancel signal flowing for real HTTP streams without
risking a hang on mock/edge-case implementations.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(daemon): terminate child on role-marker detection (close #3247 generation vector)

PR #3296's detection layer truncates display and persistence of fabricated
role markers, but the underlying model subprocess keeps generating tokens
after detection. Three concrete consequences:

  1. The model bills the user for the entire contaminated response
     (we observed 5,106 chars stored in claude's session file for a turn
     where only the first 3,013 chars were legitimate — a 40% overhead).
  2. tool_use blocks emitted AFTER the marker reach the daemon's
     dispatcher unchecked, since detection only gates the text-delta
     emission path, not content-block-stop / tool_use blocks. The
     model could fabricate "## user delete file X" then emit a
     tool_use(delete X) that the dispatcher would execute.
  3. The UI surfaces a `fabricated_role_marker` warning followed by an
     eventual normal turn-end, blurring the distinction between
     "completed normally" and "killed by safety guard."

This commit adds a single idempotent `abortForRoleMarker(marker)`
helper in server.ts, scoped to the same closure as `child` and
`runGuard`. On any detection event (per-message Claude guard,
run-scoped non-Claude guard, plain stdout guard) the helper:

  - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the
    UI can render a security-class status distinct from a normal
    turn-end. The existing `fabricated_role_marker` warning is still
    sent and rendered as the amber pill (PR #3296's UI).
  - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes,
    Kimi, Devin, Kiro) whose I/O doesn't necessarily release on
    SIGTERM of the wrapper process alone.
  - SIGTERMs the child immediately, with the existing
    `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace.

Wired into three sites where contamination is detected:
  - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc
    text_delta paths)
  - Plain-stdout listener (BYOK plain mode)
  - The Claude stream handler's onEvent (per-message guards in
    claude-stream.ts surface `fabricated_role_marker` events directly
    via onEvent rather than through the run-scoped emitGuardedTextDelta)

Tool_use blocks emitted BEFORE the marker still flow through normally
— this guard can't help with those, since by the time we observe a
text marker the prior content block has already finished. Closing
that gap requires speculative cancellation of in-flight tool calls
when a downstream text block contains a marker; that's tracked as
follow-up work, not included here.

Co-Authored-By: roverkai <2196140098@qq.com>
Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* refactor(role-marker-guard): bounded tail + drop chat-style markers

Addresses two review comments on #3303:

(1) O(1) memory + per-delta work (review r3323982225)
  Replace the unbounded `accumulated` string with a rolling tail capped
  at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest
  marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice).
  A 50 KB assistant response delivered in 1000 chunks of 50 bytes was
  previously O(n²) on string concatenation alone; now it is O(1) per
  delta regardless of message length. The `tail.length` value carries
  the "already emitted" offset that the cut-point math needs, so the
  offset semantics at L74–78 of the prior implementation are preserved
  without re-introducing the full-text buffer.

(2) Drop chat-style markers entirely (review r3323982234, option (a))
  `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex.
  Rationale:
    - The host parses ONLY `## user` / `## assistant` / `## system`
      lines as turn boundaries (see `buildDaemonTranscript` in
      apps/web/src/providers/daemon.ts). A model emitting chat-style
      markers does NOT cause the original #3247 security failure.
    - With kill-on-detection wired in this PR (`abortForRoleMarker`
      in server.ts), a false positive aborts the whole run — far
      more expensive than a stray unflagged `User:` line in chat
      scrollback. Chat-style markers collide with legitimate output
      (form labels, email contacts, JSDoc) often enough that pairing
      them with kill-semantics is the wrong tradeoff.
  The tradeoff is now documented in the regex docblock so the
  kill-on-match behaviour is justified against the false-positive
  surface.

Also aligns the prompt-side CRITICAL block in system.ts: drop the
"don't emit User: / Assistant: / Human: / AI:" bullet, since we no
longer enforce it. Less ambiguity for the model and the operators.

Test file updated:
  - Chat-style positive tests flipped to negative ("does NOT match
    User: — chat-style out of scope") so the intentional exclusion
    has a permanent regression test.
  - Two new tests cover the bounded-tail behaviour: a marker arriving
    after 10 KB of clean text in small chunks, and a marker
    straddling a chunk boundary after 100 prior chunks.
  - Added test for legitimate `User: bob@example.com`-style content
    not triggering contamination.
Test count is now 35 (up from 25); two of the new ones explicitly
exercise the new bounded-tail path.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995)

Blocking correctness bug introduced by commit 4 (bounded-tail refactor):
once \`tail\` is a rolling slice of mid-stream text, \`^\` in the
canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents
the genuine message start. As the rolling window slides forward chunk
by chunk, a sliced tail can begin with whitespace + \`##\` (or just
\`##\`), letting \`^\` anchor a match against text that the
full-buffer implementation correctly ignored. With kill-on-detection
wired in commit 3, that false positive now SIGTERMs the run and emits
a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class
called out in the docblock at L22–29.

Reviewer's evidence (PerishCode, r3324060995): streaming
"…take a look at the ## user content section…" one character at a
time reports \`contaminated: true\` post-refactor; the same text in a
single feed stays clean.

Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very
first non-empty feed (where \`^\` legitimately points at the message
start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\`
(\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all
subsequent feeds. A \`firstChunk\` boolean tracks the state. Real
newline-preceded markers straddling chunk boundaries are still caught
because the preceding \`\\n\` is retained inside the 64-char tail.

Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`):
  - mid-line \`## user\` streamed char-by-char with no preceding \\n
    (mirrors the reviewer's repro)
  - space-preceded mid-line \`## user\` in a >130-char stream, which
    long enough to force the rolling window past the marker — exercises
    the exact slice condition that triggered the bug
  - real \\n-preceded \`## user\` still caught after a long preamble
    (positive case must not regress)
  - \`## user\` as the very first chunk still caught (\`^\` legitimately
    anchors on the first feed)

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882)

Two refinements addressing the third review on #3303:

== Blocking (r3324151877) ==
The regex over-matched legitimate Markdown headings, and with
kill-on-detection wired in commit 3 each false positive
deterministically aborts a real run. Three changes tighten the match
to the actual security surface — `## user` / `## assistant` /
`## system` lines the chat host parses as turn boundaries — without
losing any real attack pattern:

1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary
   delimiter is lowercase (see `buildDaemonTranscript` in
   apps/web/src/providers/daemon.ts), and the `## CRITICAL`
   system-prompt block already forbids only the lowercase forms.
   Title-Case headings like `## User Guide`, `## System Architecture`,
   `## Assistant settings` are now ignored — these are legitimate
   technical writing patterns LLMs emit constantly. `## USER NOTES`
   (all-caps) likewise no longer flags.

2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it,
   `## userland`, `## userspace`, `## users guide`, `## systemd`,
   `## assistance` all match via prefix in the alternation. The
   lookahead requires the next character to exist and to not be a
   lowercase letter, so:
     - `## user\\n…`     → match (newline is not lowercase)
     - `## assistantR…` → match (R is uppercase; the glued-form
                          attack pattern still gets caught)
     - `## assistant.`  → match (. is not a letter)
     - `## users guide` → no match (s is lowercase letter)
     - `## userland`    → no match (l is lowercase letter)
   POSITIVE rather than NEGATIVE `(?![a-z])` because the negative
   form is satisfied at end-of-string, which in a streaming context
   means "we have `## user` but don't know what comes next yet" —
   would fire prematurely if `land` arrives in a later chunk. The
   positive form delays detection by one character in that edge
   case, traded for correctness.

3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role
   markers are single-line by convention; restricting to space/tab
   prevents oddities like `##\\nuser` from matching across lines.

Test file: added Title-Case fixtures (`## User Guide`,
`## System Architecture`, `## Assistant settings`, `## USER NOTES`)
and prefix-of-longer-word fixtures (`## users guide`, `## userland`,
`## systemd`, `## assistance`) — each asserting NO contamination.
The existing `## usability` negative test gave false confidence as
the reviewer noted (only failed via alternation-miss, not via
word-boundary semantics); the new fixtures actually exercise the
lookahead. Also added a positive test for `## assistant.` (glued
punctuation) to balance the existing `## assistantReading`
(glued uppercase) coverage. Total tests: 35 → 50.

== Non-blocking (r3324151882) ==
Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in
`packages/contracts/src/errors.ts` alongside the existing agent/AMR
codes, with a docblock comment explaining the emission contract:
emitted by `server.ts::abortForRoleMarker` alongside the existing
`fabricated_role_marker` warning event when the daemon detects a
fabricated Markdown role marker in agent output; retryable. The code
was already being emitted over the wire but unregistered — landing
the registration here keeps the contract and emitter in sync as
reviewer requested.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): defer complete-but-unconfirmed marker suffix

Addresses review r3324277xxx — the boundary case where a stream chunk
boundary lands between the role keyword and its lookahead character
violated the documented "everything from the marker onward is silently
dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')`
returned `## user` as safe (no char to satisfy the lookahead → no match
→ pass through), so the fabricated marker line leaked into UI and
app.sqlite before the next chunk confirmed contamination on the next
SIGTERM cycle.

Fix: introduce a `pending` state variable holding bytes that match the
COMPLETE-but-unconfirmed marker prefix at end of buffer
(/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no
lookahead, $ anchor instead). When the no-match branch detects this
suffix, withhold it from emission until the next feed either:
  - Confirms it (next char non-lowercase) → main regex matches →
    contaminated → withheld bytes dropped along with `## user`.
  - Denies it (next char lowercase, e.g. `userl…`) → main regex no
    longer matches the role keyword → withheld suffix is released
    and emitted alongside the new continuation.

Also tied the firstChunk transition to actual byte emission rather
than feed count. Previously a message that starts with `## system`
followed by a separate `\\n` chunk would lose the `^` anchor on the
second feed (firstChunk had flipped after the first feed even though
nothing was emitted yet), silently breaking detection for that edge
case. Now `firstChunk` stays true until at least one byte has crossed
the emission boundary, matching the conceptual definition of "message
start".

Tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - `## user` deferred at chunk boundary, confirmed by `\\n` in next
  - `## user` deferred at chunk boundary, denied by `land` continuation
  - `## assistant` deferred, confirmed by punctuation
  - `## User` Title-Case still passes through unconditionally
  - `## system` as the very first chunk: deferred, confirmed by \\n
    in next chunk (tests the firstChunk-stays-true-when-nothing-
    emitted invariant)

Total tests: 50 → 55.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta

Addresses review r3324xxxxxx — guarding the thinking channel buys no
security and causes legitimate aborts.

Why thinking is NOT a #3247 vector:
  - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only
    re-serializes `m.content` as `## ${m.role}\n...`.
  - Extended-thinking content is rendered to a separate
    `kind: 'thinking'` payload (daemon.ts:857-858) and never folded
    into `m.content`.
  - So a `## user` line in the thinking channel CANNOT become a
    fabricated turn boundary on the next round-trip.

Why guarding it is harmful:
  - Models routinely emit literal `## user` / `## assistant` lines
    in chain-of-thought when reasoning about conversation structure
    ("Let me think about this. The user might phrase it as:\n## user\n
    …"). Common pattern in production traces.
  - With `abortForRoleMarker` wired in server.ts, a guard match on
    thinking SIGTERMs the run and surfaces a security error to the
    UI. The user paid for the reasoning, never sees the answer, and
    gets a confusing "fabricated role marker" warning for what was
    actually legitimate metacognition.
  - This directly contradicts the module's own stated philosophy
    ("a false positive aborts the whole run — a much more expensive
    failure than a stray unflagged ... line", role-marker-guard.ts).

Fix: `emitSafeText` now passes thinking_delta through unconditionally,
skipping both the guard and the contamination check. text_delta
remains fully guarded. The single-line change at the top of
emitSafeText preserves all other channels' behavior.

Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts):
  - `## user` / `## assistant` lines in a thinking_delta — must NOT
    fire fabricated_role_marker, the thinking content streams intact
    including the marker text, and the subsequent text_delta answer
    still reaches the consumer (run not aborted).
  - Sanity check: same `## user` pattern in a text_delta DOES fire
    fabricated_role_marker and truncates emission at the marker. Locks
    in the channel-discriminated behavior.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): tie firstChunk to slicing, not byte emission

Blocking review r3324xxxxxx: under the prior firstChunk transition
("any byte emitted"), a role marker that arrived at the very start of
a message with its prefix split across multiple chunks bypassed
detection — reopening the #3247 vector on the Claude path.

Concrete cases that were missed (all are routine provider
tokenizations of \`## user\n…\` at message start):
  - \`##\`     | \` user\nDELETE…\`
  - \`## us\`  | \`er\nDELETE…\`
  - \`## \`    | \`user\nDELETE…\`

Mechanism: the pending-deferral regex only catches COMPLETE role
keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`,
\`## us\`) was emitted in full. That emission flipped firstChunk to
false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used,
which requires a literal \n before \`##\`. A marker at buffer
position 0 has no preceding \n, so it could no longer match.
abortForRoleMarker never fired and tool_use blocks emitted after the
fabricated turn boundary reached the dispatcher.

Fix: change firstChunk to track "tail has not been sliced yet" rather
than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE,
tail still represents the entire emission so far and \`^\` in the
canonical regex genuinely anchors at byte 0 of the stream — so the
\`^|\n\` alternation safely catches a chunk-split message-start
marker. The transition happens at the moment we would slice: once
emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\`
becomes meaningless, and we switch to the newline-only variants.

Earlier iterations of this code tried two other definitions, both
unsound:
  - "any byte emitted" (this commit fixes) — lost \`^\` before a
    chunk-split message-start marker could finish arriving.
  - "newline emitted" (briefly considered as the reviewer's
    alternative suggestion) — left \`^\` valid on a sliced buffer
    when streams hadn't emitted a newline yet, re-introducing the
    rolling-tail mid-stream false positive from review r3324060995.
The slice-based invariant satisfies both: while we have not sliced,
\`^\` is correct; once we slice, it is not.

Regression tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - \`##\`    | \` user\nDELETE…\`   → contaminated, marker=\`## user\`
  - \`## us\` | \`er\nDELETE…\`      → contaminated, marker=\`## user\`
  - \`## \`   | \`user\nDELETE…\`    → contaminated, marker=\`## user\`
  - \`#\`     | \`# user\nDELETE…\`  → contaminated, marker=\`## user\`
The fourth case (single \`#\` first chunk) exercises an even more
adversarial tokenization than the reviewer's examples; it is also
caught.

Total tests: 55 → 59.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(tests): wrap events in stream_event envelope in thinking test

feedJsonl was feeding raw events without the `{ type: 'stream_event',
event: ... }` wrapper that createClaudeStreamHandler requires (line 141
of claude-stream.ts). Events silently fell through all branches, making
both tests pass vacuously. Also fix TS2532 on warnings[0].marker with
non-null assertion (safe after the toHaveLength(1) guard).

Co-Authored-By: RoverKai <roverkai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: roverkai <2196140098@qq.com>
Co-authored-by: JasonBroderick <jason@buddyboss.com>
Co-authored-by: RoverKai <roverkai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 03:57:56 +00:00
Ramiro
e30a4a2202
fix(platform): search mise shims dir so mise-installed CLIs are detected (#3319)
* fix(platform): search mise shims dir so mise-installed CLIs are detected

- Add ~/.local/share/mise/shims (and MISE_DATA_DIR override + legacy ~/.mise/shims) to wellKnownUserToolchainBins.
- This makes Pi, Kimi, and other mise-managed coding agents visible to the daemon even when launched from GUI contexts with stripped PATH.
- Added tests for default and MISE_DATA_DIR cases.
- Also pinned pnpm@10.33.2 in root mise.toml for better mise ergonomics.

Before/after: more local CLIs now appear in the runtime picker (Kimi, Pi, Antigravity, Kilo, etc.).

Refs: discussion in session around improving detection for common mise users.

* fix(platform): address Copilot review on mise shims logic

- Generalize the shims comment (no hard-coded CLI examples).
- Make per-version Node toolchain scanning respect MISE_DATA_DIR
  (use the same mise root for installs as for shims).
- Avoid duplicate shims entries when MISE_DATA_DIR makes legacy path
  identical to the primary one.

Addresses the three inline comments from copilot-pull-request-reviewer
on PR #3319.

* test(platform): extend MISE_DATA_DIR test to cover installs scanning

Addresses non-blocking review feedback from @nettee on PR #3319.

The previous test only asserted shims behavior under a custom
MISE_DATA_DIR. This extends it to also create fixture trees under
customMise/installs/node/... and customMise/installs/npm-openai-codex/...
and assert that the install paths are discovered while default-root
paths are excluded.

This makes the test robust against regressions in the installs
scanning logic (existingMiseNpmPackageBinDirs + node version dirs).

* fix(platform): only fall back to ~/.mise/shims when no MISE_DATA_DIR is set

Addresses the remaining non-blocking review comment from @nettee on PR #3319.

When an explicit MISE_DATA_DIR is provided, we no longer inject the
legacy ~/.mise/shims path. This prevents stale shims from a previous
mise layout from being re-introduced into detection.

Also added a regression assertion in the MISE_DATA_DIR test.

* fix(daemon): make claude-stream dedup robust when final assistant wrapper lacks msgId

Prevents duplicated text and thinking output (especially visible during
design system generation with AMR/Vela).

Root cause: the textStreamed guard fell back to  whenever the
final  message arrived without a string uid=501(ramarivera) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),399(com.apple.access_ssh),33(_appstore),98(_lpadmin),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),400(com.apple.access_remote_ae) (common in some
AMR flows and design system tasks), causing the full content to be
re-emitted even if it had already been delivered via streaming deltas.

Fix: track whether any text or thinking was streamed via deltas for the
current message and use that as a reliable fallback for the final wrapper
instead of only trusting  presence.

* revert: remove dedup from claude-stream (PR #3319 should stay clean)
2026-05-30 03:21:04 +00:00
saifullakhan
73b2dc853f
Fix project empty state create action (#3082)
Co-authored-by: saifulla-khan <saifulla-khan@users.noreply.github.com>
Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-05-29 08:30:43 +00:00
lefarcen
98a2c63973
feat(daemon): add Antigravity agent adapter (#3157)
* feat(daemon): add Antigravity agent adapter

Adds Google Antigravity (`agy` CLI) as a coding-agent runtime. Detection
picks up `agy` on PATH, the daemon spawns `agy -p "<prompt>"` for a
single non-interactive turn, and the assistant text reply streams back
on stdout. OAuth is shared with the Antigravity IDE through the system
keyring, so users who have signed into the desktop app are authenticated
on first run with no extra step.

`agy` v1.0.3 has no JSON / stream-json / ACP output mode (upstream issue
#119), no `--model` flag (issue #35), and no MCP forwarding hook yet —
the adapter ships with `streamFormat: 'plain'` and a single `default`
fallback model so the model picker doesn't mislead users into thinking
their choice is wired through. We will upgrade buildArgs + add a
dedicated event parser when upstream ships structured output.

Also gitignores `.antigravitycli/`, the project-local config directory
`agy` auto-creates on every run (upstream issue #175).

* fix(daemon): Antigravity adapter — stdin prompt, brand icon, form loop, empty-output guard

- Switch prompt delivery from argv to stdin (`agy -p -`) to avoid the
  30KB maxPromptArgBytes limit that blocked real-world composed prompts
- Add official Antigravity brand SVG icon to agent picker
- Fix repeated question-form loop for plain agents by injecting an
  OVERRIDE block when form answers are already present in the transcript
- Add empty-output guard for plain agents so expired auth or silent
  failures surface a user-visible error instead of a blank "Done" turn

* feat(daemon): expand Antigravity adapter — model picker, form-loop fix, OAuth launcher, log-file classification

PR #3157 follow-up integrating four iterations from end-to-end manual
testing on Gemini 3.5 Flash + GPT-OSS 120B Medium through `agy` v1.0.3.
Each section is independently verifiable; combined they're what made
the first successful artifact generation work end-to-end.

## Model picker via settings.json (agy has no --model flag)

agy v1.0.3 ships no `--model` CLI flag (upstream issue #35), but the
TUI Switch-Model picker writes the chosen label to
`~/.gemini/antigravity-cli/settings.json`'s `"model"` field, and every
`-p` invocation re-reads that file on startup — verified by capturing
the `--log-file` line `Propagating selected model override to backend:
label="<model>"`. Antigravity's `fallbackModels` now lists the 8
labels its TUI exposes (Gemini 3.1 Pro / 3.5 Flash variants, Claude
Sonnet/Opus 4.6 Thinking, GPT-OSS 120B Medium) and `buildArgs`
persists the user's choice to settings.json right before spawn. The
synthetic `default` id is preserved — picking it leaves settings.json
untouched so a user who switches models from agy's own TUI keeps
their choice.

Introduces `RuntimeAgentDef.supportsCustomModel?: boolean`. AMR's
hardcoded blocklist in `SettingsDialog.tsx` migrates to the
declarative flag (it rejects free-form ids at the ACP layer), and
antigravity opts out because its label set is a server-side enum that
silently fails on unrecognised strings.

## Form-loop fix (transcript sanitizer + stronger OVERRIDE)

The discovery form loop on weak/medium plain-stream models (GPT-OSS
120B Medium, Gemini 3.5 Flash) had two reinforcing causes:

  1. `buildDaemonTranscript` packed the prior assistant turn's
     literal `<question-form>` markup into the user request on the
     next turn, giving the model a template to echo. New
     `sanitizePriorAssistantTurnForTranscript` strips
     `<question-form>...</question-form>` blocks and ```json fences
     that match form-schema shape, replacing them with a brief
     placeholder. User content is preserved verbatim (a user who
     legitimately mentions `<question-form>` in chat keeps their
     message intact).
  2. The OVERRIDE block on form-answered turns was 4 lines and only
     banned the bare `<question-form>` tag — models still emitted the
     fenced JSON, form-asking prose ("Got it — tell me the following"),
     and fake system events ("subagents stopped"). The new
     `FORM_ANSWERED_SYSTEM_OVERRIDE` enumerates each anti-pattern and
     pins them via tests, so silently weakening any line reintroduces
     the regression.

Also adds RuntimeAgentDef.resumesSessionViaCli + RuntimeContext.
hasPriorAssistantTurn as forward-looking abstractions (skipTranscript
option on composeChatUserRequestForAgent). Antigravity does NOT opt
in — agy's `-c` resume activates an internal agentic loop with tool
retries and fallback-to-cached-response on tool errors that the OD
system prompt cannot steer; reverted after seeing byte-identical
form re-emissions caused by agy's own retry logic, not OD's transcript.

## One-click OAuth via system terminal

agy print mode can't complete Google Sign-In on its own (the OAuth
callback page asks the user to paste an auth code back into agy, but
`-p` has no input field). Before this commit the auth banner only
told the user to "open a terminal yourself."

Adds `POST /api/agents/antigravity/oauth-launch` and a cross-platform
launcher in `runtimes/terminal-launch.ts`:

  - macOS:    osascript → Terminal.app `do script "agy"` + activate
  - Linux:    tries x-terminal-emulator, gnome-terminal, konsole,
              xfce4-terminal, xterm in order
  - Windows:  `cmd /c start "Open Design" cmd /k agy`

The endpoint hardcodes the `agy` command (no user input → no shell
injection surface) and is loopback-gated like the other daemon
endpoints. The chat's `AGENT_AUTH_REQUIRED` banner now renders a
"Sign in via terminal" button next to Retry; clicking it spawns the
terminal so the user can finish OAuth in one click.

## Silent-failure classification (auth vs quota via --log-file)

agy print mode is silent on stdout/stderr for both missing-OAuth AND
quota-exhausted failures — the upstream
`RESOURCE_EXHAUSTED (code 429): Individual quota reached` and the
`not logged into Antigravity` line only surface in agy's
`--log-file`. Without log inspection the daemon misread quota as
"auth required" and showed the wrong banner.

`RuntimeContext.agentLogFilePath` carries a daemon-owned per-run temp
path that antigravity's buildArgs translates to `--log-file <path>`.
The empty-output guard now reads that log on a `code === 0 &&
!childStdoutSeen` exit, feeds the tail to
`classifyAgentServiceFailure`, and routes:

  - "not logged into Antigravity"     → AGENT_AUTH_REQUIRED with
                                        antigravityAuthGuidance
  - "RESOURCE_EXHAUSTED" / "quota" /  → RATE_LIMITED with
    "Individual quota reached"          antigravityQuotaGuidance
  - none of the above (rare)          → fall back to auth guidance
                                        as the most likely cause

Both surface a terminal launcher in the auth banner: auth gets "Sign
in via terminal", quota gets "Switch model in terminal" — same
endpoint, contextual label. The handler is identical (open agy in a
terminal); the user either signs in or uses agy's Switch Model
picker to pick a model with available quota.

## Validation

- `pnpm guard` pass
- `pnpm --filter @open-design/daemon` runtime + telemetry suites:
  192 passed, 1 skipped (the 1 pre-existing `task-type` failure on
  origin/main is unrelated to this change)
- `pnpm --filter @open-design/web` typecheck pass; sse / amr-guidance
  / AgentIcon suites pass (51 web tests)
- Manual end-to-end on darwin + Gemini 3.5 Flash and GPT-OSS 120B
  Medium: turn-1 question-form rendered correctly, turn-2 produced
  `<artifact>` with full HTML (3.3KB Modern Minimal design) instead
  of re-emitting the form. agy `--log-file` content correctly
  classified as RATE_LIMITED when Gemini Pro quota was exhausted,
  and as AGENT_AUTH_REQUIRED when keychain was cleared.

* fix(web/test): align amrAgent fixture with supportsCustomModel contract

The AMR agent definition in the daemon ships `supportsCustomModel: false`
so the Settings model picker hides the free-text "Custom…" option. The
PR changed `allowCustomModel` from `selected.id !== 'amr'` (hardcoded)
to `selected.supportsCustomModel !== false` (declarative), but the test
fixture was not updated to carry the same field — causing the
`__custom__` sentinel to appear in the picker under test.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): align formAnswerTransition wording with main + scope build directive to discovery

CI surfaced two failures on the merge with main:
- chat-route.test marks submitted discovery form answers ... expected
  the main-version wording 'Do not emit another <formId> form.'
- telemetry-message-finalization keeps non-discovery form answers
  active ... expected task-type to fall through the else branch
  ('Treat these form answers as the active user turn'), not the
  discovery RULE 2/RULE 3 build branch.

The colleague's earlier fba1e40b form-loop fix tightened both pieces
(stronger wording + grouped discovery|task-type into the build branch)
but didn't update the tests that pin the contract. Revert the
transition wording to main and re-scope the build directive to
'discovery' only. The aggressive form-loop suppression we added in
this PR now lives in the system-prompt FORM_ANSWERED_SYSTEM_OVERRIDE
block, which is far stronger than the user-request transition text
this commit reverts.

* fix(daemon): scope formOverride by form id, detach Linux terminal, move agy log cleanup to finally

- FORM_ANSWERED_GENERIC_OVERRIDE: new exported constant for non-discovery/
  non-task-type form ids; contains only the "do not re-ask" suppression
  without the RULE 2 / RULE 3 / artifact directive.
- formAnswerTransitionForCurrentPrompt: extend build-transition branch to
  include task-type alongside discovery, keeping user-turn and system
  override consistent.
- Prompt assembly (server.ts ~10848): derive formOverride from the parsed
  form id — FORM_ANSWERED_SYSTEM_OVERRIDE for discovery/task-type,
  FORM_ANSWERED_GENERIC_OVERRIDE for all other form ids, empty otherwise.
- launchOnLinux: replace execFileAsync (waited for terminal exit, 3 s cap)
  with spawn({ detached: true, stdio: 'ignore' }) + unref(); resolve on
  the 'spawn' event so long-lived interactive terminals (xterm, konsole)
  are not killed mid-OAuth-flow.
- Antigravity log cleanup: move fs.promises.unlink(agentLogFilePath) into
  a try/finally wrapper around the close handler so every exit path
  (success, failure, cancel, non-zero exit) cleans up the per-run temp
  file, preventing unbounded /tmp accumulation.
- Tests: rename task-type case to assert build-transition behaviour; add
  generic-form-id case (preferences) pinning the non-build path; add
  FORM_ANSWERED_GENERIC_OVERRIDE content assertions.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): switch Antigravity buildArgs to chat subcommand invocation

Replace top-level `-p -` with `agy chat [--log-file …] -` so the adapter
uses the documented chat subcommand and stdin sentinel instead of the
unrecognised global -p flag.  Update the agent-args test description and
all four deepEqual assertions to assert the ['chat', '-'] shape.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* test(daemon): drop real-platform default-launch assertion from terminal-launch suite

The removed test called launchAgentInSystemTerminal('agy') with no
platform override, which invokes the real system terminal on every
developer machine running the daemon test suite (Terminal.app on macOS,
cmd.exe on Windows, xterm/gnome-terminal on Linux). That is an
unacceptable OS side effect for a unit test.

The behaviour being asserted — that omitting platform selects
process.platform — is a TypeScript default-parameter guarantee, not a
runtime invariant that needs an integration test. The remaining 'aix'
case continues to pin the unsupported-platform failure shape.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): buffer Antigravity stdout to suppress auth URL before close-time classifier

The plain-stream close handler at code===0 can detect an agy OAuth
prompt in agentStdoutTail and emit AGENT_AUTH_REQUIRED, but by the
time close fires the stdout chunk has already been forwarded to the
client via the plain-stream `send('stdout', { chunk })` path. This
leaves both the raw OAuth URL and the terminal-launch guidance visible
in chat.

Buffer all stdout chunks for the `antigravity` agent instead of
forwarding them immediately. The existing close-time auth-prompt guard
(code===0, !trackingSubstantiveOutput, childStdoutSeen) returns early
when it detects the auth pattern, leaving the buffer unflushed and the
OAuth URL out of the SSE stream. For legitimate assistant output the
buffer is flushed in order just before design.runs.finish so the
chunks still arrive before the run's finished event.

Adds a chat-route integration test using a fake `agy` that exits 0
after printing the canonical auth prompt; asserts that the run emits
AGENT_AUTH_REQUIRED with no event: stdout delta containing the URL.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* test(daemon): isolate antigravity buildArgs argv test from real settings file

Pass a temp antigravitySettingsPath in the RuntimeContext for the
withModel argv assertion so unit tests do not touch
~/.gemini/antigravity-cli/settings.json. Adds the optional
antigravitySettingsPath field to RuntimeContext and threads it
through buildArgs to writeAntigravityModelSelection; production
callers leave it undefined, preserving the existing default path.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): revert Antigravity buildArgs to `-p -` (the only working agy v1.0.3 invocation)

The looper-reviewer-bot reported `chat` as agy's headless subcommand
based on its environment's agy build, and looper-fixer applied that
shape. The installed CLI (`agy --version` reports `1.0.3`) does NOT
expose a `chat` subcommand — `agy --help`'s `Available subcommands`
section lists only `changelog / help / install / plugin / update`,
and `agy chat - < prompt` exits 0 with empty stdout (the daemon then
forwards it as a 'successful' empty reply, exactly the failure mode
the auth/quota guard at server.ts ~12090 is meant to catch — for the
wrong reason).

`-p` is the documented print-mode flag (`Short alias for --print`)
and `agy -p -` reads the prompt from stdin and prints the model
reply, which the entire end-to-end test sequence in this PR has
verified against (form-loop fix, settings.json model routing,
log-file classification all confirmed working on Gemini 3.5 Flash
+ GPT-OSS 120B Medium with this invocation).

Updates the agent-args test to pin `['-p', '-']` instead of
`['chat', '-']` and adds an inline comment in antigravity.ts noting
that `chat` may exist in a future agy build but is not the contract
on the installed CLI today.

* fix(daemon): serialize Antigravity concrete-model spawns to dodge settings.json race

Reviewer (looper) flagged a concurrency race in the model-routing path:
~/.gemini/antigravity-cli/settings.json is process-global, so two OD
runs starting close together with different concrete models can race
the file — run A writes model A, run B writes model B, then A's agy
finally reads settings.json and executes on model B. The Settings
model picker becomes nondeterministic under parallel conversations.

Adds a per-process promise chain in antigravity.ts:
  - acquireAntigravityModelLock(): chain-await + return release fn
  - waitForAgyToReadModel(logPath, expected): polls agy's --log-file
    for the upstream signal
      'Propagating selected model override to backend: label="<X>"'
    which model_config_manager.go emits once agy has finished reading
    settings.json. Returns true on observed match, false on timeout.
    Regex-escapes the expected label so '(' / ')' in 'GPT-OSS 120B
    (Medium)' match literally, not as a capture group.

server.ts spawn pipeline now acquires the lock BEFORE buildArgs (which
performs the settings.json write) and schedules a release-once handler
that fires when EITHER (a) the log-file confirms agy read the model
or (b) the child exits — the exit fallback prevents a stuck/crashed
agy from starving the queue for every subsequent antigravity spawn.

Default-model spawns bypass the lock entirely: their buildArgs doesn't
touch settings.json, so there's nothing to serialize.

Tests pin:
  - FIFO ordering across 2 / 3 concurrent acquirers
  - Wait helper's regex correctly matches parenthesized labels
  - Wait helper does NOT match a different model with shared prefix
  - Wait helper swallows missing-log-file errors and returns false on
    timeout (no spawn-pipeline crash if the log never appears)

194 → 198 passing runtime tests, 0 regressions.

* fix(daemon): close Antigravity lock release race on slow agy startup (looper #263fd2fe7)

Reviewer flagged that the previous serialization scheduled
`releaseOnce` in `.finally()` on waitForAgyToReadModel — meaning the
helper's `false` timeout return ALSO released the lock. If agy took
longer than the 15s polling window to read settings.json (cold start,
swap-thrash, slow network handshake to the upstream backend), run A's
lock dropped at 15s, run B rewrote settings.json with model B, and
run A's still-starting agy then read the wrong model. Same race the
original mutex was meant to close.

Fix the release semantics to be release-on-confirmation-only:

  - waitForAgyToReadModel: `false` now strictly means 'I gave up
    polling,' not 'agy definitely did not read this.' Document the
    contract so a future caller can't conflate the two. Add an
    optional AbortSignal so server.ts can stop polling when the child
    exits — without it, the leftover watcher could outlive the run
    and accidentally match a later concurrent run's log content,
    releasing the wrong lock.
  - server.ts: schedule `releaseOnce` only when waitForAgyToReadModel
    returns true. The exit handler (which fires for crashes, fast
    exits, normal completion) is now the canonical fallback that
    releases the lock no matter what — the queue can't starve
    permanently because agy always exits eventually. The exit
    handler also fires the AbortController so the watcher cleans up.

New tests pin:
  - timeout returns false WITHOUT any release-implying side effect
  - already-aborted signal short-circuits (no readFile calls)
  - abort mid-poll wakes the helper from its setTimeout (no
    multi-hundred-ms hang waiting out a poll interval that no longer
    matters)

198 → 201 passing runtime tests, 0 regressions.

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-29 05:43:37 +00:00
chaoxiaoche
912c7e380a
fix(plugin): infer semantic roles for token maps (#3231)
Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
2026-05-29 03:50:56 +00:00
lefarcen
08c350fb0f
fix(analytics): bucket feedback agent/model directly on the event (#3240)
* fix(analytics): bucket feedback agent/model directly on the event

Reason × agent / reason × model splits on
`assistant_feedback_reason_submit` were 25-74% `unknown` because the
event only carried `run_id` — analyses had to join back to
`run_created/run_finished`, which loses rows whenever the feedback is
given to a message whose run sits outside the query window (the common
case for feedback on older messages), and whose `model_id` was `null`
to begin with (the user didn't pick a specific model — went with the
agent's default).

Carry `agent_provider_id` and `model_id` directly on every feedback
event so the analyses no longer need to join. Replace `null/unknown`
with the `default` bucket via `modelIdForTracking` (and let
`agentIdToTracking` fall through to `other`) at every emit site —
`null` was an analyst-hostile mix of "no selection" and "join failed";
`default` is a real, analysable bucket. On `run_finished`, upgrade the
model to the agent-reported value from initializing/model status
events when the user did not pick one — covers ACP, claude-stream,
copilot-stream, json-event-stream, qoder, pi-rpc.

* fix(analytics): use feedbackAgentProviderIdToTracking and assistantFeedbackModelId for feedback events

Wire API-mode agent ids (anthropic-api → anthropic) and agentName-parsed
model ids through the feedback emit path. Previously the feedback props used
agentIdToTracking (no anthropic-api case) and assistantModelDetail (no
agentName fallback), causing model_id='default' and agent_provider_id='other'
for API-mode agents.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(analytics): extend feedback/run schema for full agent/model coverage

Layered on top of the conflict resolution and the v1 emit switchover
in 0c1b30440. Three things the prior commits did not cover:

1) The v2 `assistant_feedback_*` family (page='studio') shares
   `AssistantFeedbackBase`. Add `agent_provider_id` + `model_id` once on
   the base so all four derived emits (reason_view, click, reason_click,
   reason_submit) carry the same context as the v1 family, instead of
   leaving the v2 dashboard with the same `unknown` gap the v1 PR was
   trying to close.

2) Tighten `FeedbackSubmitResultProps.model_id` and
   `feedbackAgentProviderIdToTracking` from `string | null` /
   `TrackingFeedbackProviderId | null` to non-null. The web emit paths
   already bucket null/empty through `modelIdForTracking` and the
   `?? 'other'` fallback; collapsing that at the helper / contract
   layer means `null` becomes a TS error at every new emit site, so we
   can't regress the unknown bucket again in a future event.

3) Comment on `run_finished.model_id` so reviewers reading
   `finishedModelId` see why the agent-reported value upgrades the
   request-side one.

* fix(analytics): continue event scan past usage to find agent-reported model

The reverse scan for agentReportedModel was broken: the loop broke on
the first usage event (terminal) before ever reaching the status:initializing
or status:model event (emitted at run start, lower index). This meant
run_finished.model_id always fell through to modelIdForTracking(null) =
'default' for any run that reported usage tokens.

Fix: track haveUsageTokens as a flag and defer the break until both usage
tokens are found and either the model is not needed (user picked one) or
the agent-reported model has been captured. Extract the logic into
scanRunEventsForFinishedProps for unit testability.

Tests: six new cases in run-lifecycle-analytics.test.ts cover the
initializing→usage append order, ACP status:model, detail field fallback,
early exit when reqBodyModel is set, no-status event, and empty events.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(analytics): guard usage block with !haveUsageTokens to prevent early events overwriting terminal tokens

In the reverse-scan loop of scanRunEventsForFinishedProps, the usage block
lacked a !haveUsageTokens guard. When needAgentModel is true and the
agentReportedModel lives at the start of the run (lower index), the loop
walks all the way back past multiple usage events (one per step/turn in
multi-step runs), overwriting inputTokens/outputTokens on each pass. The
surviving values were those of the earliest step, not the terminal total.

Adding !haveUsageTokens to the usage block condition ensures only the first
(terminal) usage event seen in reverse sets the token counts; subsequent
earlier usage events are skipped while the scan continues for agentReportedModel.

Adds a test case for initializing(model) → usage(step1) → usage(terminal)
asserting both terminal token counts and agentReportedModel.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
2026-05-29 03:06:06 +00:00
jinha-hwang-hajong
94ad2284a0
Fix pnpm native executable invocation (#2144) 2026-05-28 13:43:57 +00:00
lefarcen
b8cdf5f0ea
feat(mcp): generation loop + one-click Codex install (#3141)
* feat(mcp): add project creation, capability discovery, and generation tools

Lets an external coding agent (Codex, Cursor, …) drive a full design
loop over `od mcp`, not just read/write files: create a project,
discover what Open Design can make, commission a generation run, poll
it, and open the result in a browser. Complements the existing
write_file / delete_file / delete_project management tools.

New tools:
- create_project — make an empty project to generate into (start_run
  needs one). Derives a slug id from the name unless given.
- list_skills / list_plugins — discover what you can ask OD to make.
- start_run / get_run / cancel_run — commission a run (OD spawns its
  own agent), poll to completion, cancel. start+poll because MCP is
  request/response and generation is minutes-long.
- get_run / get_project now return a browser-openable previewUrl
  (entry file served raw; HTML entries render directly).

The external agent never runs a skill itself — it commissions OD to,
so the prior "skills not on MCP" boundary no longer applies.

* feat(mcp): make get_run preview hint directive

Reword the hint MCP clients receive when a run finishes so the agent
is more likely to surface the previewUrl to the user proactively —
mention the user-facing browser explicitly and call out that clients
with a built-in browser pane (e.g. Codex CLI's right-side browser)
should navigate to it directly. Also nudge start_run's hint to flag
that a previewUrl will arrive on success, so the agent knows what to
do with it before it ever sees get_run.

Pure text change; no behavior change in the tool surface or daemon.

* feat(mcp): one-click Install / Remove for Codex from Settings

Adds a toggle button on Settings → Integrations → Codex panel that runs
`codex mcp add open-design …` / `codex mcp remove open-design` via the
daemon, so users no longer need to copy TOML and paste it into
~/.codex/config.toml by hand. The copy-snippet path is unchanged and
remains the fallback when the Codex CLI isn't on PATH.

The daemon shells out to Codex CLI rather than rewriting config.toml
itself — that way we inherit Codex's own merge / dedupe / validation
rules and only track its argv. The runner is dependency-injected for
testability.

New endpoints (under /api/mcp/install/codex/*):
- GET status — probes `codex mcp get open-design`; returns
  { available, installed } so the UI can render the toggle state.
- POST — runs `codex mcp add open-design --env K=V … -- <node> <cli.js> mcp`,
  reusing the same payload as /api/mcp/install-info.
- DELETE — runs `codex mcp remove open-design`.

The web UI renders the toggle only inside the Codex client panel
(`client.id === 'codex'`). When Codex CLI is missing it shows a
disabled button with an explanatory hint instead of vanishing, so users
know why one-click isn't available.

* feat(mcp): teach agents to clarify ambiguous format requests

When the user asks for a "PPT" / "deck" / "slides" / "PDF" / "doc",
that's two very different deliverables: Open Design natively produces
browser-viewable HTML/SVG (including HTML-rendered decks), but the
user may actually want a binary .pptx / .docx / .pdf — which OD does
NOT produce and which the agent would have to export from OD's output
itself. Add a paragraph to the MCP server instructions telling the
agent to ASK which one is wanted before kicking off work, rather than
silently picking one or dual-tracking both paths.

Pure prompt-text change in the instructions block; no tool surface or
behavior change. Costs ~10 lines of session-init context (one-time
per MCP session), versus dual-tracked .pptx hedging Codex was
otherwise doing on every ambiguous request.

* feat(mcp): surface agent messages, skip OD discovery, slim list_plugins

Three fixes uncovered while exercising the full MCP-driven generation
loop end-to-end with a real Codex client. Each one is a real
blocker / footgun for the external agent.

1. get_run now includes agentMessage — the inner agent's textual
   output reassembled from the SSE event stream. Without this, runs
   that ended in a discovery-style clarifying question (e.g. a
   <question-form>) looked like "succeeded with empty output" mysteries
   to the outer agent. The hint now branches on whether previewUrl
   exists: with preview = show preview + relay agentMessage as the
   inner agent's note; no preview = relay agentMessage as the actual
   deliverable (almost always a clarifying question).

2. create_project sets skipDiscoveryBrief:true by default. The outer
   agent IS the user-facing surface for MCP-driven runs, so OD's own
   interactive discovery stage just creates a confusing
   nested-clarification loop where its question form ends up dropped
   (no files = no artifact). Better to let the outer agent gather
   requirements and pass a precise prompt or plugin to start_run.

3. list_plugins flattens the daemon's bulky 16-field plugin record
   (fsPath, sourceMarketplaceId, installedAt, …) into the few fields
   an agent actually picks plugins on: id, title, description, kind,
   tags. description / kind come from manifest.description /
   manifest.od.{taskKind,kind} which the previous pass-through dropped
   on the floor.

* feat(mcp): smart entry fallback + list_agents

Two fixes uncovered by exercising the full Codex-driven loop on a real
machine. Both close the gap between "Open Design has the data" and
"the external agent can find it".

1. get_project / get_run now fall back to scanning the project's file
   list when metadata.entryFile is missing. We hit the case where
   write_file (and a half-finished inner-agent run) put a perfectly
   viewable index.html into the project, but metadata.entryFile stayed
   null — so the outer agent got no previewUrl from MCP and resorted
   to guessing a file:// path. Priority: declared entryFile, then
   index.html anywhere, then a single .html at the project root.
   Pure read-side change; no extra fetch when entryFile is already
   set.

2. list_agents lets the outer agent stop guessing 'claude' / 'codex' /
   'gemini' for start_run.agent. The daemon already exposed
   /api/agents with 19 supported CLIs and an `available` flag. The
   MCP wrapper defaults to filtering to installed agents only (so the
   agent never picks one whose binary won't spawn), with
   includeUnavailable:true as an opt-in to see uninstalled ones plus
   their installUrl. Models truncated to 10 with modelsCount carrying
   the real total — keeps the response token-economical even for
   agents (opencode) with 100+ models.

* feat(mcp): tell the outer agent runs take 5–30 min, don't bypass

Direct response to a real Codex client observably cancelling an
in-flight run after 3 polls and substituting its own write_file
output ("文件时间戳没推进 → 我直接覆盖生成") — exactly the failure
mode this MCP surface exists to avoid.

start_run's hint and the session-init instructions block now both
state explicitly:
  - Runs typically take 5–30 minutes.
  - status:running with unchanged file mtimes is the inner agent
    thinking, NOT a hang.
  - Do not cancel_run out of impatience.
  - Do not substitute write_file as a "faster" workaround — that
    discards OD's pipeline-driven design quality.
  - Poll every 30–60 seconds; report "still working" to the user
    between polls.
  - Only call cancel_run if the user explicitly asks.

Pure prompt-text change; no surface or behavior change. Costs ~10
lines of one-time session-init tokens + ~80 more tokens per
start_run response, in exchange for the outer agent actually
trusting the run.

* feat(mcp): persist run events to disk + expose tail-able path

Closes the in-flight visibility gap that made real Codex clients
cancel a 24-min run after 3 polls and substitute their own
write_file output, simply because polling get_run showed no change.

Daemon: every SSE event is now mirrored to a JSON-Lines file at
<RUNTIME_DATA_DIR>/runs/<runId>/events.jsonl. The path is wired
through createChatRunService's new `runsLogDir` option (null
disables, preserving legacy in-memory-only behavior). statusBody
exposes the path as `eventsLogPath`. Failures are best-effort — a
broken stream destroys itself and the run keeps going on the
in-memory event log (SSE clients are unaffected).

MCP: get_run already passed statusBody through, so eventsLogPath
surfaces automatically. The new value is that get_run during a
running status now adds a directive hint telling the outer agent to
`tail -n 50 -f <path>` in its own shell to see live progress —
that's the signal that makes the agent trust the run and stop
cancelling. The succeeded-status hint mentions the path too, for
forensics. No new tool; the field rides existing get_run polls.

Spec-first throughout:
  - runs.test.ts adds 4 tests covering write-per-emit, statusBody
    field, null-runsLogDir back-compat, and the no-IO guarantee
    when persistence is disabled.
  - mcp-runs.test.ts adds 1 test for the running-status hint.

* fix(mcp): get_run hint directs callers to pass project explicitly

The success hint in get_run previously said "project defaults to this
run's project", which is misleading: get_artifact has no run context and
falls back to /api/active when project is omitted, not to the run's
project. A client following the old guidance after creating a fresh or
non-active project could fetch the wrong project's files or fail with
"no active project".

The hint now embeds the run's projectId and tells callers to pass it
explicitly: get_artifact({ project: "<id>" }). A focused regression test
in mcp-runs.test.ts verifies the hint contains the projectId and does
not contain the incorrect active-context fallback guidance.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(contracts): add eventsLogPath to ChatRunStatusResponse

The daemon's statusBody() returns eventsLogPath but the shared DTO
lacked this field, leaving web/CLI/MCP callers without a typed
accessor.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mcp): bind MCP runs to OD conversations + studio deep links

Closes the last gap that made MCP-driven runs feel like a parallel
side door: the user could not see the conversation in OD's studio
page even though the run was real, finished, and had files.

Daemon side: POST /api/runs now falls back to the project's default
conversation when the caller (MCP / SDK) only supplied projectId.
It synthesizes an assistantMessageId, writes a user message with the
prompt as content, and lets the existing
`pinAssistantMessageOnRunCreate` helper create the empty assistant
row. The existing `appendMessageAgentEvent` accumulation path then
streams text_delta events into the assistant row's content — same
as the web /api/chat flow. The response body now echoes the
resolved conversationId + assistantMessageId so MCP callers can
build a deep link.

`buildMcpInstallPayload` now also surfaces `webBaseUrl` (read from
OD_WEB_PORT, the env tools-dev exports for the web listener). MCP
clients use it to build studio deep links.

MCP side: `start_run`, `get_run`, `get_project` now return a
`studioUrl` — a browser-facing OD URL pointing at the studio page
that shows the file preview AND the chat history side by side. The
hint on each tool was updated to tell the outer agent to hand
studioUrl to the user as the primary link (previewUrl falls back to
raw-file when the user only wants the rendered output). The
webBaseUrl is fetched once via /api/mcp/install-info and cached for
5s to keep per-poll cost flat; a tiny `_resetWebBaseUrlCache` export
lets tests start each case with a clean cache.

Contracts: `ChatRunCreateResponse` gains optional conversationId +
assistantMessageId; `ChatRunStatusResponse` gains optional
eventsLogPath. Both additive, no consumer breakage.

Spec-first throughout:
  - get_run includes studioUrl on success when webBaseUrl + conversationId are available
  - get_run omits studioUrl when webBaseUrl is null
  - start_run returns studioUrl and conversationId for the new run
  - get_project returns studioUrl using the project default conversation

* fix(mcp): add skill/skillId to start_run so listed skills are actionable

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(test): update mcp-get-project test to handle getWebBaseUrl fetch

The get_project handler now calls getWebBaseUrl (added with the studio
deep-link feature), which fetches /api/mcp/install-info. The test mock
only handled the /api/projects/:id URL and expected a single fetch call,
causing the assertion to fail with "called 2 times" instead of 1.

Fix: handle the /api/mcp/install-info URL in the fetch mock (returning
webBaseUrl: null), update the call count expectation to 2, and call
_resetWebBaseUrlCache in afterEach to prevent cache bleed between tests.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mcp): tell agents to render studioUrl as a clickable markdown link

Observed in a real Codex client: Codex received studioUrl correctly
but rendered it as inline code (gray code-span), which its built-in
browser pane does NOT make clickable. The user had to copy-paste the
URL into a browser by hand even though Codex / Cursor / Zed all
auto-link markdown `[label](url)` syntax and would navigate it in
their right-side preview pane.

The three studioUrl-mentioning hints now explicitly tell the agent
to render the URL as a markdown link (e.g.
`[Open Open Design studio](URL)`) and never as inline code or bare
text. Pure prompt-text change.

* fix(runs): resolve default agent when MCP caller omits agentId; add McpRunCreateRequest contract type

- POST /api/runs: when no agentId is provided, resolve from app-config
  or first available CLI before spawning — mirrors the pattern the
  routine handler already uses. Prevents 'unknown agent: undefined'
  failures on the create_project -> start_run(prompt) MCP path.
- packages/contracts: add McpRunCreateRequest interface for the
  projectId-only / SDK caller shape so typed callers can construct the
  request without casts. Exported via index.ts's existing chat re-export.
- packages/contracts/tests: add compile fixture verifying projectId-only,
  projectId+message, and projectId+message+agentId shapes all type-check.
- apps/daemon/tests: add mcp-runs test asserting agent arg omitted in
  start_run does not include agentId in the POSTed body.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
2026-05-28 11:29:11 +00:00
Mason
50f85b509a
fix(analytics): fill run and feedback metadata (#3194)
* fix(analytics): fill run and feedback metadata

* fix(analytics): map feedback API providers
2026-05-28 11:05:56 +00:00
Denis Redozubov
c847ace554
Add run-scoped media execution policy (#3106)
* feat(contracts): add run media execution policy

* feat(daemon): enforce run media execution policy

* test(daemon): cover media execution policy gates
2026-05-28 09:19:40 +00:00
kami
0bfb4803e7
feat(daemon): add Phase 2C CLI wrappers (#2179)
* feat(daemon): add phase 2c cli wrappers

Co-authored-by: multica-agent <github@multica.ai>

* fix: handle desktop-gated CLI imports

Co-authored-by: multica-agent <github@multica.ai>

* fix: pass sidecar ipc path to agent wrappers

Co-authored-by: multica-agent <github@multica.ai>

* fix: make agent wrapper env explicit

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): preserve CLI import and diff edge cases

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 09:08:31 +00:00
lefarcen
9a4816b101
feat(plugins): site-wide plugin detail pages, share-to-site links, landing deploy trigger (#2999)
* feat(plugins): site-wide plugin detail pages, share-to-site links, landing deploy trigger

Why: a merged plugin PR didn't redeploy the landing site (plugins/** was missing from the deploy paths), and the desktop Share menu copied a local/404 link instead of the public marketplace URL. The landing plugin routing left by the detail-page rework also 404'd: the locale listing's cards used a multi-segment href while detail pages were single-segment, and only 388 bundled _official plugins had pages.

What changed:
- Deploy: landing-page deploy/ci trigger on plugins/**, and skip the slow previews step on an exact cache hit (cache key aligned across both workflows so a PR-built cache is reused by main).
- Share URL: packages/contracts/plugin-url.ts owns the single-segment plugin URL scheme; the web Share menu and the landing site both derive links from it. Web links now point at https://open-design.ai/plugins/<slug>/.
- Full detail coverage: detail pages now cover all 403 local plugins (_official incl. atoms + community), each rendered from its local manifest. Fixes the locale-listing 404s and the community manifest-name/catalog-id (- vs /) mismatch.
- Self-host: daemon exposes OD_SITE_ORIGIN via /api/app-config; web falls back to the canonical origin until the daemon answers.

Validation: pnpm guard, pnpm typecheck (all packages), contracts + web tests green, and a full build E2E confirming all 403 catalog ids and locale-listing cards resolve to built detail pages (0 missing).

* chore: retrigger CI

* ci(landing): carry plugins/** trigger + previews cache-hit into #2994 split workflows

Merged origin/main, which split landing deploy into staging + manual production (#2994). git auto-migrated my landing-page-deploy.yml changes into landing-page-staging.yml via rename detection (plugins/** path, fallback-preview-card.ts cache key, cache-hit skip all carried). The new manual landing-page-production.yml didn't have them, so add the previews cache-key alignment + cache-hit skip there too (plugins/** path is N/A — production is workflow_dispatch only).

* fix(ci): wrangler-action uses pnpm so it tolerates landing's workspace dep

This PR added @open-design/contracts (workspace:*) to apps/landing-page/package.json so the landing site can share the plugin-url slug rules. But the landing deploy/preview steps run cloudflare/wrangler-action with packageManager: npm in workingDirectory apps/landing-page, and 'npm i wrangler' chokes on the workspace: protocol (EUNSUPPORTEDPROTOCOL), failing 'Validate landing page'. Switch all three landing wrangler-action steps (staging / ci preview / production) to packageManager: pnpm, which is workspace-aware.

* test(e2e): bundled plugins now offer the README badge

After this branch, buildPluginShareUrl returns a public open-design.ai link for bundled plugins (not just official-marketplace ones), so the home-starter share menu now shows 'Copy README badge'. Update the assertion from toHaveCount(0) to toBeVisible().

* fix(landing): drop @open-design/contracts dep, use a landing-local slug helper

Per review on #2999: the marketing site must not import @open-design/contracts (AGENTS.md boundary — it's the web/daemon product-runtime contract layer). Move the slug/path helpers into landing-local app/_lib/plugin-slug.ts; the web client keeps contracts' plugin-url. The two derive the same scheme and are verified in lockstep by the e2e route check (403 share URLs -> 403 detail pages, 0 missing). landing no longer has a workspace dep, so revert the wrangler-action packageManager back to npm.

* fix(landing): include plugins/_official in previews cache key

Per review on #2999: generate-previews.ts builds bundled-plugin preview jobs from plugins/_official/**/open-design.json and renders fallback cards from manifest fields (title/description/mode/scenario/tags). With plugins/** now triggering the workflow but the cache key not hashing plugin inputs, a plugin-only PR/merge could exact-hit an old cache and skip the preview regen, shipping with a stale or missing /previews/plugins/<manifest-id>.png. Add plugins/_official/** to the cache key in all three landing workflows (ci, staging, production). community is not currently covered by generate-previews so its glob is omitted.

* fix(plugins): include community marketplace installs in share gate

hasPublicPage now covers sourceMarketplaceId === 'community' so the
README badge and public detail link surface for community installs.
Community manifest names carry a community- prefix that diverges from
the landing-page route slug, so URL derivation uses sourceMarketplaceEntryName
(community/<folder>) instead — pluginDetailSlug takes the last segment,
matching the /plugins/<folder>/ route the landing page emits.
Adds component tests for buildPluginShareUrl, badge copy, and the
Open-in-marketplace link for a community/registry-starter record.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-28 09:07:12 +00:00
Marc Chan
338cb4d423
fix(platform): support live system proxy changes (#3093)
* fix(platform): support live system proxy changes

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Honor lowercase proxy env vars within a single source before merging proxy-aware envs.\n\nGenerated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Refresh provider request proxy env on each dispatcher creation and cover it with a focused regression test.

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): enable node env proxy for user proxy vars

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)
2026-05-28 06:11:47 +00:00
lefarcen
df8a0faff6
feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent

AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.

The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.

Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.

Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)

Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).

* fix(runtimes/amr): pin a concrete default model and bare openai ids

End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:

1. vela rejects `session/prompt` with `session/set_model must be called
   before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
   skips set_model whenever the picked model is the synthetic 'default'
   id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
   def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
   default option and `fallbackModels[0]`, which makes attachAcpSession
   always send a real `session/set_model` for AMR turns.

2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
   it forwards to opencode's openai provider. With OpenRouter-style ids
   like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
   `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
   The new fallback list ships the bare ids opencode's openai registry
   actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).

Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
  way real vela does, so a regression that silently goes back to
  model: 'default' would surface as a fatal error in tests instead of a
  hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
  no 'openai/' prefix in fallbackModels, and a negative case that
  asserts session/prompt fails when no model is set.

Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.

Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.

* fix(runtimes/amr): substitute concrete model when chat run sends 'default'

A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:

  json-rpc id 3: session/set_model must be called before session/prompt

The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.

This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:

  1. If the caller picked a real id, pass it through.
  2. Else, if `def.defaultModelEnvVar` is set and the daemon process
     env has a non-empty value for it, return that (operator escape
     hatch — see below).
  3. Else, if the def's `fallbackModels` does NOT contain a 'default'
     id, return `fallbackModels[0].id`.
  4. Else, return the original value (the historic shape — defs that
     list 'default' themselves are untouched).

AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Worth noting the env var must live in the daemon's `process.env`
(Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver) — the new field's docblock spells this out.

Coverage:
- `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all
  four resolver branches plus the env-override happy path / fallback /
  ignore-when-user-picked-a-real-id case.
- `pnpm --filter @open-design/daemon typecheck` clean.

* chore(runtimes/amr): move AMR to the top of the base agent list

So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.

* feat(amr): Sign-in / Sign-out button on the AMR Settings card

The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:

  - GET  /api/integrations/vela/status   reads ~/.vela/config.json
    for the active profile and returns { loggedIn, profile, user }
    (without leaking the runtime/control keys themselves).
  - POST /api/integrations/vela/login    spawns `vela login` once
    (409 if one is already in flight). The vela CLI opens the user's
    browser to the device-authorization page itself — Open Design
    only needs to kick the subprocess off.
  - POST /api/integrations/vela/logout   removes ~/.vela/config.json
    so the next status read returns logged-out.

`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(nested interactive elements). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.

i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.

Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
  against a tmp HOME — including the negative case where a profile
  has user info but no runtimeKey (still logged-out), and the
  secret-leak guard ("rt-secret-*" must not appear in the projection
  payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
  states (logged-out, logging-in, logged-in, logging-out) plus the
  click-propagation invariant the divergent card was built to keep.

`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.

Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.

* fix(connection-test): substitute model for AMR before attachAcpSession

The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, which made the Test Connection button on the AMR Settings
card deadlock with the same `session/set_model must be called before
session/prompt` error the chat-run path already handles — surfaced as a
permanent "Testing connection…" spinner in the UI.

Reuse the same helper here so Test Connection mirrors chat-run behavior.

* test(amr): three-layer end-to-end coverage for the AMR login + turn flow

The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:

1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
   Spins up the real daemon Express app via `startServer({port:0,...})`,
   persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
   and exercises every /api/integrations/vela/* endpoint against the
   extended fake-vela stub:
     - status reads ~/.vela/config.json under various states
     - login spawns the fake, waits for config.json to appear, returns
       pid + startedAt + profile
     - 409 already-running guard with the stub's delay knob
     - logout removes the file (idempotent)
     - secrets (runtimeKey / controlKey) never leak in the projection
     - login → status round-trip flips loggedIn=false → true

2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
   Boots a namespaced daemon + web pair through `createSmokeSuite`,
   inlines a self-contained fake `vela` binary that handles BOTH
   `vela login` (writes ~/.vela/config.json) and
   `vela agent run --runtime opencode` (ACP stdio with the
   `session/set_model must precede session/prompt` gate the real binary
   enforces), then drives a complete /api/runs lifecycle for
   `agentId: 'amr', model: 'default'` and asserts the assistant message
   captures the fake's streamed text. This is the test that would have
   surfaced today's plugin-default-model regression (the `set_model
   before prompt` error) at PR time instead of demo time.

3. e2e/ui/amr-login-pill.test.ts (Playwright)
   Mocks /api/agents + /api/integrations/vela/{status,login,logout}
   to drive the Settings AMR card through the full Sign in → Signed in
   → Sign out cycle. Pins the AmrLoginPill polling contract and the
   aria-label semantics (the pill's accessible name is "Sign out" once
   logged in, regardless of which label the hover-state text shows).

fake-vela.mjs extensions:
   - Handles `vela login` argv by writing
     ~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
     mirrors real vela's on-disk side-effect without the device-auth
     loop.
   - FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
     in-flight state of the spawn lifecycle.
   - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
     user fields end-to-end.

Validated:
   - `pnpm guard` + `pnpm typecheck` (all workspace projects)
   - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
     including the new 8-test integration suite.
   - `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
   - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
     1 / 1 passing (6.7s).

* feat(amr): package native cli and refine login ui

* feat(amr): wire vela cli beta packaging

* docs(amr): document vela ci packaging review

* docs(amr): refine vela ci integration review

* fix(ci): refresh nix pnpm dependency hashes

* fix(pack): clean up Vela CLI packaging

* fix(pack): bundle Vela CLI support files

* fix(amr): recover login attempts from stale auth state

* test: expand AMR and automations coverage

* fix(amr): address review follow-ups

* test(web): align tasks fixtures with contracts

* fix(daemon): type wildcard route params

* fix(ci): refresh PR merge validation

* fix(amr): clear env credentials on logout

* feat(settings): inline local CLI model configuration

* fix(amr): recognize daemon env credentials

* [codex] Fix Vela companion packaging (#2979)

* Fix Vela companion packaging

* Update Nix pnpm dependency hashes

* [codex] Surface AMR account failures (#2980)

* fix: surface AMR account failures

* fix: cover AMR recovery error guidance

* chore: bump beta base version to 0.8.1 (#2990)

* Fix AMR profile and packaged runtime review issues

* Detect packaged AMR OpenCode companion tree

* feat(web): polish AMR frontend flows

* Polish AMR onboarding card

* fix: read AMR login state from dot-amr config (#3048)

* test: tighten AMR credential and packaging coverage

* test: restore AMR executable test env helper

* [codex] Fix packaged mac Dock identity and AMR label (#3076)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR live models and dot-amr login state (#3073)

* fix: read AMR login state from dot-amr config

* fix: load live AMR models before runs

* fix: point AMR onboarding link to production wallet

* fix: address AMR model review feedback

* fix: persist live AMR model fallback

* [codex] Fix AMR link catalog model ids (#3088)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR link catalog model ids

* Fix AMR model normalization typecheck

* Use live AMR model for default runs

* fix: polish AMR runtime settings UI

* Accelerate AMR startup defaults (#3092)

* Surface AMR insufficient balance wallet URL (#3099)

* fix(web): polish onboarding controls (#3112)

* fix(web): show CLI scan loading state

* Avoid duplicate AMR wallet recharge links (#3117)

* Avoid duplicate AMR wallet recharge links

* Use Vela CLI 0.0.3 test package

* chore(nix): refresh pnpm deps hash

* Fix AMR wallet guidance display

---------

Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>

* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)

* chore(nix): refresh pnpm deps hash

* chore(pack): pin Vela CLI 0.0.3

* chore(nix): refresh pnpm deps hash

* fix(web): suppress AMR exit 130 fallback (#3136)

* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)

* feat(web): nudge users to hosted AMR on model/auth/quota failures

When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.

Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.

* feat(web): rework failed-run AMR guidance into per-case error UI

Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:

- The error card is now neutral gray (was red) and always carries a
  retry button; it is driven by the persisted per-message error event so
  it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
  card under the error card offers "switch to AMR & retry" — switches the
  run to AMR, opens Settings on the AMR card, and auto-retries once the
  account signs in (ProjectView polls vela login status, independent of
  the Settings pill lifecycle, with success / 5-min-timeout / unmount
  exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
  wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
  pulses, and an authorize-button coachmark (a fake hand cursor that
  rises in and dismisses on hover) points at the sign-in control when not
  yet authorized.

analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.

* fix(daemon): require status context for numeric service-failure codes

Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.

* fix(web): keep error pill for failed runs ChatPane's card doesn't cover

Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.

* fix(daemon): don't treat a process exit code as an HTTP status

Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.

* chore(web): translate AMR card / error keys for 16 remaining locales

PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.

* fix(amr): address review feedback on #2355

Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.

- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
  verifies the inner `opencode` executable exists + is runnable, not
  just the directory. This closes the false-positive availability path
  that let `detectAgents()` surface AMR as available even when the
  packaged companion was empty / partially copied (mrcfps, 4 threads).

- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
  the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
  stale `opencode` on the user's PATH, so packaged AMR builds can't be
  hijacked by a global installation.

- web/EntryShell.tsx: when the Local CLI scan returns an available
  agent and the previously-selected agent is AMR, switch the selection
  to the first available local agent so the runtime and persisted
  agent agree before Continue.

- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
  BEFORE rejecting on an empty live-model catalog — a signed-out user
  was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
  the correct `AMR_AUTH_REQUIRED` (sign-in affordance).

- server.ts (default model fallback): if the user asked for the AMR
  agent default and the cached id is no longer in the FRESH catalog,
  fall back to `liveModels[0]` from the probe instead of rejecting the
  run as `AMR_MODEL_UNAVAILABLE`.

- integrations/vela.ts: route `vela login` through
  `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
  shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
  verbatim args (matches `execAgentFile` / chat-run spawning).

- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
  the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
  to the container-side path. The host path was being passed in as-is
  even though the default container only mounts /project, /tools-pack
  and cache/home — `copyOptionalVelaCliBinary` saw a missing path.

Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
  UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
  for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
  prematurely blocked — `@powerformer/vela-cli` only publishes the
  `darwin-arm64` platform binary today; adding the flag elsewhere
  would fail the builds. Revisit once win/x64/linux binaries ship.

* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)

The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
`const` is TDZ-aware, that branch would have thrown `ReferenceError:
Cannot access 'sendAmrAccountFailure' before initialization` for the
exact users it tries to help — defeating the original intent.

Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.

Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch
uses packaged built-in Vela for AMR` was creating the
`<resourceRoot>/bin/libexec/opencode/` companion *directory* only, but
this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree`
also requires the inner `opencode` executable. Add it to that fixture
to match the new contract; the test was a sibling of the executables /
env-and-detection fixtures already updated in 13fc4f4.

Addresses #2355 review (mrcfps, 2026-05-28).

* feat(web): add hover cancel for AMR login (#3158)

* feat(web): add hover cancel for AMR login

* fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel

Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR
flagged the same race in the new local-cancel path: `handleCancelLogin`
dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately
after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener
unconditionally re-enters `refresh()` and then restarts polling
whenever `/api/integrations/vela/status` still reports
`loginInFlight: true`.

That is a real race because the daemon's `cancelVelaLogin()` only sends
SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` =
2000 ms) and keeps the child in `activeLoginProcs` until it actually
exits — so the first `/status` read after a successful cancel can
legally still come back as in-flight. Under that window the pill flips
back to 'Signing in…' and can later surface the timeout/error path even
though the user already canceled, defeating the behavior promised in
the PR description.

Fix the listener instead of every dispatch site: in the
`login-canceled` branch, after the local reset (stopPolling +
setPending(null) + clear refs), optimistically mark every subscribed
pill instance as not-in-flight (`setStatus((c) => c ? { ...c,
loginInFlight: false } : c)`) and `return` — skip the
refresh-and-reconcile branch below entirely. The next explicit refresh
(component mount, user interaction, or a `status-changed` event) will
pick up the daemon's confirmed state once the child has actually
exited.

Add a focused regression test that holds `/api/integrations/vela/status`
at `loginInFlight: true` even after a successful `/login/cancel`,
asserting that the pill stays at the Canceled → Authorize sequence and
never bounces back to 'Signing in…'. This test fails on the pre-fix
listener and passes on the new behavior; existing
'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser
completion to Signed in after local cancel' tests continue to pass.

Addresses review feedback on #3158 (chatgpt-codex-connector, nettee).

---------

Co-authored-by: lefarcen <935902669@qq.com>

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
Co-authored-by: Amy <1184569493@qq.com>
Co-authored-by: Mason <jinmeihong0201@gmail.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-28 05:09:55 +00:00
elihahah666
dadf8a5bc3
Add tracking for Automations, Plugin Detail & Loop (#3103)
* Add tracking for Automations, Plugin Detail, and Plugin Loop

- RoutinesSection: track new_automation, create, save, cancel, run_now,
  edit, pause, resume, delete, history button clicks
- PluginDetailView: track back navigation and use_plugin action
- PluginLoopHome: track clear_active, submit, card_details, card_use
- Extend AutomationsClickProps with new CRUD elements
- Add PluginDetailClickProps and PluginLoopClickProps contracts

* fix: address review comments on plugin/automation tracking

- Extract onBack handler in PluginDetailView to cover both error-path
  and success-path back buttons with tracking
- Move create/save tracking from submit button onClick into the form
  submit handler to capture keyboard submissions and avoid false
  positives from validation failures

* fix: move submit tracking into submit() handler in PluginLoopHome

Same fix as RoutinesSection: tracking now fires inside submit() so
keyboard Enter submissions are captured and the !trimmed guard
prevents false positives.

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-27 16:48:41 +00:00
lefarcen
b8cfee0c60
fix(diagnostics): capture daemon/web logs in packaged bundles (#3126)
Packaged diagnostics bundles never contained the daemon or web
`latest.log` — the very logs that hold the agent/critique run flow — so
support exports could not explain "sent prompt to the agent, then
nothing happened" reports.

Root cause: the sidecar `base` means different things per launch path.
tools-dev passes the pre-namespace source root, so
`resolveNamespaceRoot(base, namespace)` is correct. But the packaged
orchestrator launches every child with `base = <namespaceRoot>/runtime`
(apps/packaged/src/{paths,sidecars}.ts) while logs live a level up at
`<namespaceRoot>/logs`. The diagnostics builders re-appended the
namespace and resolved every log to
`<namespaceRoot>/runtime/<namespace>/logs/...` → ENOENT. renderer.log
only survived by accident: the desktop main process wrote it to the
same wrong path the reader looked in.

Add `resolveRuntimeNamespaceRoot(runtime, contract, runtimeMode)` to
`@open-design/sidecar` which walks up out of the `runtime/` dir in
packaged (runtime-mode) launches and falls back to the dev layout
otherwise. Route the desktop renderer-log path and both diagnostics
exporters (desktop IPC + daemon HTTP) through it so writer and reader
stay in lockstep and renderer.log lands next to the desktop log dir.

Tests: sidecar unit specs for both layouts; a daemon export spec that
writes a real `<namespaceRoot>/logs/daemon/latest.log` and asserts the
bundle captures its contents (red on main → ENOENT placeholder, green
here).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:48:14 +00:00
elihahah666
1e9be2fdb5
Add tracking for Comment panel save/send actions (#3098)
Track "Save comment" and "Send to chat" button clicks in the comment
popover with a new `comment_popover` area, so we can measure the
distribution of save vs send-to-chat usage.

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-27 09:23:19 +00:00
Siri-Ray
170a05f5d2
Formalize skill artifacts into plugins (#3085)
* Add skill-to-plugin candidate flow

* Fix skill plugin candidate card reuse

Generated-By: looper 0.9.1 (runner=fixer, agent=codex)

* Fix skill plugin candidate dismiss and URL gates

Generated-By: looper 0.9.1 (runner=fixer, agent=codex)

* Polish skill plugin candidate copy
2026-05-27 08:26:00 +00:00
chaoxiaoche
fce444bcab
Consolidate chat comments preview on main (#2906)
* feat(web): queue chat sends

* feat(web): render code comment directives

* feat(web): add preview comments and manual edits

* fix(web): polish shared chrome controls

* fix(web): align queued send loading state

* feat(web): open primary project artifacts

* fix(web): keep queued sends and tests aligned

* fix(web): restore docked comment tools layout

* fix(web): align preview comment toolbar

* fix(web): place local cli beside handoff

* fix(web): move agent menu beside handoff

* fix(web): make project instructions a direct header action

* fix(web): compact handoff and toolbar labels

* fix(web): clarify handoff menu and annotation label

* fix(web): restore compact cursor handoff trigger

* fix(web): align agent menu trigger with handoff

* fix(web): add draw toolbar close action

* fix(web): move inspect editing into edit mode

* fix(web): avoid reserving comment sidebar in annotation mode

* fix(web): float preview comments panel

* fix(web): keep edit canvas full width

* fix(web): polish preview annotation tools

* fix(web): highlight active preview comments

* fix(web): open comments panel after annotation save

* fix(web): polish comment handoff controls

* fix(web): remove palette preview tool

* fix(web): simplify draw annotation toolbar

* fix(web): restore queued tasks into composer

* fix(web): restore queued send strip styling

* fix(web): hide internal comment target ids

* fix(web): align manual edit panel header

* test(web): cover visual interaction contracts

* fix(web): address PR feedback regressions

* fix(web): preserve artifact chrome state

* fix(daemon): restore project raw file routes

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-26 10:31:19 +00:00
YOMXXX
2bd83b6e23
feat(daemon): structured diagnostics for agent connection test results (#2248 PR 1/N) (#2419)
* feat(daemon): attach structured diagnostics to agent connection test results

Local agent connection-test failures currently flatten everything into
a single free-form `detail` string (e.g. "exit 1"). Settings UI and CLI
consumers can't tell what phase failed, which binary the daemon picked,
or what the child's exit metadata looked like — they have to scrape the
human-readable text.

Add an optional `diagnostics` block on the connection-test response so
callers can read structured fields instead. The existing `kind` and
`detail` strings are kept bit-for-bit identical, so older UIs keep
rendering unchanged.

- packages/contracts: add `ConnectionTestPhase`
  (binary_resolution / version_probe / model_list / spawn /
  connection_smoke_test / output_parse) and a `ConnectionTestDiagnostics`
  interface with optional `binaryPath`, `binaryVersion`, `exitCode`,
  `signal`, `stdoutTail`, `stderrTail`; extend
  `ConnectionTestResponse.diagnostics?` to carry it.
- apps/daemon/connectionTest.ts: thread a `phase` tracker through
  testAgentConnectionInternal, flip it at the meaningful boundaries
  (binary_resolution → spawn → connection_smoke_test / output_parse),
  and stamp diagnostics into every result return point — the four
  result helpers plus both early returns. Tail data already buffered
  by `createAgentSink` is reused; nothing new is captured.
- tests: three regressions per #2248 — success path attaches
  phase='connection_smoke_test' + exitCode 0, exit-failed path
  attaches phase='spawn' + the failing exitCode + the stderr tail,
  and a missing-CLI path attaches an early-phase diagnostics block.

This is PR 1 of the #2248 plan (contracts + minimum daemon fill);
follow-ups will introduce a normalized failure classifier
(binary_not_found, unsupported_version, auth_failed, quota_exceeded,
network_failed, unsupported_flags, no_text_output, output_parse_failed,
spawn_failed), candidate-alternative reporting via
inspectAgentExecutableResolution, and the Settings "View details"
disclosure.

Refs #2248.

* fix(connectionTest): honor diagnostics contract on all local return paths

Two follow-ups from review of #2419:

- packages/contracts/src/api/connectionTest.ts advertises diagnostics
  as 'Always set on local agent test responses', but three local
  returns still bypassed buildDiagnostics(): the buildArgs failure
  around 1295, the preflight probeAgentAuthStatus().status === 'missing'
  branch around 1317, and the outer catch around 1566. Thread
  buildDiagnostics() through all three; phase is still 'binary_resolution'
  at the first two and whatever the runtime advanced to at the catch.
- resultFromAgentText() hard-coded exitCode: 0 even though
  resultFromChildExit() routes ACP clean-SIGTERM completion through
  this success helper (winner.code === null, winner.signal ===
  'SIGTERM' with acpCleanCompletion). Add an optional exit argument
  threaded from both call sites so the diagnostics reflect the actual
  child code/signal pair instead of a synthesized 0 that masks the
  SIGTERM teardown. Only synthesize 0 when no exit context is
  available (theoretical text-without-exit path).

Tests:
- regression locking the diagnostics contract for the preflight auth
  path on Cursor Agent (phase: binary_resolution, binaryPath set)

* docs(contracts): widen diagnostics contract to match early-failure paths

Reviewer flagged that the JSDoc-style comment on
ConnectionTestResponse.diagnostics still said 'Populated only when the
test actually spawned an agent CLI', but the previous follow-up made
the daemon stamp diagnostics on three pre-spawn local-agent failures
too: the unknown-agent and unresolved-binary branches around
connectionTest.ts:1123-1148 and the preflight auth return around
1338-1353. Reword the contract so Settings/CLI consumers do not
incorrectly special-case those early local failures as
diagnostics === undefined.

* fix(connectionTest): keep contracts browser-safe and fold probe output into preflight diagnostics

Two follow-ups from review of #2419:

- ConnectionTestDiagnostics.signal was typed as
  `NodeJS.Signals | string | null`, which made the generated .d.ts of
  the shared @open-design/contracts surface depend on ambient Node
  types. Downstream consumers reading a plain HTTP response shape
  should not need @types/node. Narrow to `string | null` (NodeJS.Signals
  literals are strings, so the daemon write site is unchanged) and
  document the boundary in the field comment.
- The Cursor-style preflight auth path stamped diagnostics built from
  the smoke-test sink, which is always empty at that point because the
  smoke spawn never happened. As a result the diagnostics block
  silently dropped `cursor-agent status`'s own stderr/stdout/exit
  context — the only structured failure information available on that
  path. Thread the probe output back out of probeAgentAuthStatus()
  via new optional stdoutTail/stderrTail/exitCode/signal fields, then
  merge them into the diagnostics overrides in connectionTest.ts so
  Settings/CLI consumers can render the auth-failure context instead
  of just the guidance string.

Tests:
- extended the Cursor preflight regression to assert that diagnostics
  carries the probe's stderr ("Not logged in") and exit code (1).
2026-05-26 03:17:05 +00:00
chaoxiaoche
2b7b6590ae
feat(comments): add comment attachment API (#2869)
* feat(comments): add comment attachment API

* ci: add fork PR workflow approval script

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
2026-05-25 07:24:21 +00:00
lefarcen
c14baf07d3 Merge origin/main into release/v0.8.0
PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits
on top of 58 release-side commits accumulated during the 0.8.0 cycle.

Resolution summary:

Take main (theirs) where main carried deliberate forward progress:
- apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration:
  hardcoded English aria-labels/titles replaced with t() calls keyed
  on pluginCard.* (all 8 keys verified present in en.ts).
- apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion
  feature: sortedRoutines (newest-first), sourceIngestionTemplates,
  patchSourceForm, submitSourceIngestion. activeCount/pausedCount
  semantics preserved (now keyed on sortedRoutines, count unchanged).
- e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts
  imports needed by main-side test helpers.
- e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal
  helper block added by main.

Keep both sides where each added a different field to the same object
literal:
- apps/web/src/components/ProjectView.tsx (locale + analyticsHints
  spread).
- apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints).

Take release (ours) where release carried deliberate work that ships
0.8.0:
- CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's
  Unreleased section was the same body of work, now finalized.
- apps/landing-page/public/{apple-touch-icon,favicon}.png +
  apps/web/public/app-icon.svg — release-side visual refresh assets
  consistent with 0.8.0 stable ship.
- tools/pack/src/linux.ts — packageVersion const required by line 466;
  taking main's empty line would build-error.
- e2e/ui/project-management-flows.test.ts +
  e2e/ui/settings-api-protocol.test.ts +
  e2e/ui/settings-memory-routines.test.ts — release-side release-smoke
  hardening (shangxinyu1 + PerishFire) takes precedence on overlap.

Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main.
2026-05-23 12:17:18 +08:00
Matt Van Horn
715ed04f5d
fix(prompt): instruct discovery form to follow user's chat language (#2534)
* fix(prompt): instruct discovery form to follow user's chat language

The discovery form was reaching users in English even when their UI
language was Chinese (#1416). The form is generated by the LLM under
guidance from packages/contracts/src/prompts/discovery.ts, but the
prompt only mentioned that option labels MAY follow the user's
language. The example form embedded English text for title,
description, per-question labels, and placeholders, and the LLM
copied that text verbatim instead of localizing.

Two minimal changes to the prompt:

1. Add a sentence under RULE 1 making the language-match expectation
   explicit before the example forms.
2. Expand the Form authoring rules bullet so it covers every
   user-facing string (title, description, label, placeholder, option
   label) and pins the unlocalized identifiers (id, type, option
   value, branch values) for the runtime branch logic.

Fixes #1416

* fix(prompts): mirror discovery localization rule to daemon prompt copy

Apply the same 'Match the user's chat language' paragraph and the
expanded 'Localize every user-facing string' bullet to
apps/daemon/src/prompts/discovery.ts, which the daemon-backed chat
path uses (it imports ./discovery.js, not the contracts copy).

Also add apps/daemon/tests/prompts/discovery-localization-drift.test.ts,
which reads both prompt copies and asserts each one contains both rules,
so the contracts and daemon files cannot silently drift on this behavior.

Apply-anyway reason: pnpm install / pnpm vitest could not run locally
(registry DNS blocked in sandbox + node v26 vs required v24). Direct
Node content assertion over both files passes. CI will run vitest.

---------

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
2026-05-23 11:48:17 +08:00
Marc Chan
a3872b97a9
fix(tools-dev): preserve web origin trust on web start (#2715)
* fix(tools-dev): preserve web origin trust on web start

Restart daemon/web when the trusted web port is missing, and reuse the active web port during repeated starts so run web and start web keep app-config origin checks aligned.

Generated-By: looper 0.0.0-dev (runner=worker, agent=opencode)

* fix(plugins): refresh official registry bundled count

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): preserve daemon/web reserved ports

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): preserve daemon reuse on web start

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): preserve running daemon port on web reuse

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): reserve explicit web port before daemon allocation

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* test(web): stabilize media provider reload flash timing

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(web): restore merged reattach workspace coverage

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* fix(tools-dev): reserve allocated daemon port

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)

* test(e2e): wait for artifact manifest persistence

Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode)
2026-05-23 00:25:43 +08:00
Devayan Dewri
1b908a8481
fix(daemon): restore full assistant turn after mid-flight reload reattach (#2383)
* fix(daemon): restore full assistant turn after mid-flight reload reattach

When a daemon run is in progress and the browser reloads, the client
reattaches and the artifact recovers, but the restored chat turn drops
assistant text, thinking events, and producedFiles. Three independent
defects combine to cause this:

1. The reattach onDone never populated producedFiles. The pre-turn file
   snapshot used as the diff baseline lived only in a closure. Now it is
   persisted on the assistant message as preTurnFileNames so the reattach
   path can rebuild the diff after reload.

2. The SSE replay used a strict `>` cursor compare. A client that had
   already persisted lastRunEventId equal to the final event id received
   zero replay events on terminal-run reattach, fell into the status-only
   REST fallback, and never fired a clean onDone. The server now replays
   the final buffered event on terminal-run reattach when the cursor is
   at or past the end, so the client always sees a terminal signal.

3. The text buffer flushed on visibilitychange but not on pagehide.
   Hard reloads on browsers where visibilitychange does not fire before
   teardown could lose the last ~250ms of streamed text from the
   persisted message. A pagehide listener now flushes synchronously.

Refactor: extracted computeProducedFiles helper so the send and reattach
flows share the diff logic and cannot drift apart again.

Tests:
- apps/web/tests/components/ProjectView.reattach-restore.test.tsx
  covers: reattach onDone populates producedFiles from preTurnFileNames;
  reattach reaches succeeded via SSE even when only the end event replays;
  computeProducedFiles unit cases.
- apps/daemon/tests/runs.test.ts adds replay-cursor coverage for both
  the terminal-replay safety branch and the no-duplicate normal branch.

* fix(daemon): persist preTurnFileNames end-to-end on the messages table

Review on #2383 caught that `ChatMessage.preTurnFileNames` (added in
packages/contracts) had no daemon-side persistence: the messages
schema, upsertMessage, and normalizeMessage all ignored the field.
saveMessage() would PUT the field, the daemon would silently drop it,
and a real page reload would read a row without `preTurnFileNames`, so
the reattach onDone fell back to `new Set(nextFiles.map(...))` and
still missed files produced earlier in the turn.

This commit closes the round trip:

- New `pre_turn_file_names_json TEXT` column on the messages table,
  with a forward-compatible ALTER for existing databases (same pattern
  as agent_id / feedback_json / run_status).
- Both upsertMessage branches (UPDATE and INSERT) now serialize
  m.preTurnFileNames into the new column.
- listMessages, the post-upsert readback SELECT, and normalizeMessage
  surface the column back to callers.

Round-trip tests in apps/daemon/tests/db-pre-turn-file-names.test.ts
cover: write+listMessages, the UPDATE upsert path preserving the
baseline, and a legacy-row case returning undefined.

* fix(web): preserve terminal status + full multi-file diff on reattach

Two correctness issues caught in review of the prior reattach commits:

1. The reattach onDone path hard-coded `runStatus: 'succeeded'`, which
   overwrote a 'failed' or 'canceled' status that the replayed terminal
   event had already recorded via onRunStatus. Restored messages would
   come back as success even when the run had actually failed or been
   canceled. Now derives the final status from `prev.runStatus` via the
   existing `resolveSucceededRunStatus` helper, mirroring the send path
   at line 2333.

2. When `findExistingArtifactProjectFile()` recovered an existing
   on-disk artifact, the produced-files list was replaced with that
   single file, dropping any other files the turn had created earlier.
   Now always computes the full diff against `preTurnFileNames`, then
   appends the recovered artifact only if it isn't already in that
   set. Extracted as `mergeRecoveredArtifact(diff, recovered)` so the
   logic is a unit-testable invariant.

Tests in ProjectView.reattach-restore.test.tsx:
- mergeRecoveredArtifact: three cases (recovered appended to pre-files,
  no duplication when already in the diff, passthrough on no recovery).
- reattach failed-status: onRunStatus('failed') → onDone → final
  saveMessage has runStatus 'failed', not 'succeeded'.
- reattach canceled-status: same shape for cancellation.

* fix(web): force keepalive PUT on pagehide so the last buffered chunk survives reload

Review on #2383 caught that onPageHide() only called flush(), which
updates React state then schedules persistSoon() — a 500ms debounce.
On a hard reload the page tears down before that timer fires, so the
final ~250ms of streamed text never reaches the daemon.

Threaded a new flushAndPersistNow() callback through
createBufferedTextUpdates(). Both buffer call sites (send-path +
reattach-path) supply it backed by persistMessageById(id, { keepalive:
true }). saveMessage in state/projects.ts forwards the new
SaveMessageOptions.keepalive flag onto fetch's keepalive option, which
the browser honors specifically for unload-time requests.

onPageHide now calls flush() followed by flushAndPersistNow?.(), so:
- flush() pushes the buffered delta into React state synchronously
- the immediate persistMessageById then PUTs the updated message with
  keepalive:true, surviving document teardown

Regression test in ProjectView.reattach-restore.test.tsx: stream a
delta, dispatch pagehide, assert saveMessage was called with the
flushed content AND { keepalive: true } before the 500ms debounce
would otherwise have fired.
2026-05-22 18:47:12 +08:00
PerishFire
64f077d366
fix(download): handle pid reuse in stale locks (#2714) 2026-05-22 18:00:11 +08:00
lefarcen
85228a2b05
fix(analytics): capture About-you survey across rapid-finish flow (#2713)
PR #2590 wired About-you dropdown selections to fire one
`onboarding/ui_click` per pick (organization_size / use_case /
hear_about_us) with the chosen value attached. Live PostHog data on
nightly.11 showed a real session where every survey value was missing:
the user filled all four dropdowns and clicked Finish setup inside a
~3-second window, the route navigated away before posthog-js could
flush the per-dropdown rows, and the resulting `onboarding_complete_result`
only said `has_about_you: True` without surfacing what the user picked.
Two specific gaps surfaced.

`role` was never tracked at all. `OnboardingDropdown` for role had no
`emitOnboardingClick` call; `OnboardingClickProps` had no `role` field;
`TrackingOnboardingClickElement` had no `'role'` element. Even on a
slow-path session the dashboard could not see what role the user
identified as.

The other three fields (organization_size / use_case / discovery_source)
fired the right click events but were one round-trip away from a route
change. With per-dropdown rows as the only carrier, a Finish-setup
within a ~3-second batch lost them all to navigation-side flush failure.

Fix is two complementary delivery paths plus a closure-staleness repair:

- Contract: `TrackingOnboardingRole` (open-string, like the other
  About-you survey types so adding a future role doesn't force a
  contract bump); `role` element on `TrackingOnboardingClickElement`;
  `role?` and `use_cases?` fields on `OnboardingClickProps`. New
  `about_you_submit` click element + survey-snapshot fields
  (`role`/`organization_size`/`use_cases`/`discovery_source`) on
  `OnboardingCompleteResultProps`.

- EntryShell: emit `role/select_option` on the role dropdown so the
  per-pick funnel is symmetric with the other three. Introduce
  `profileRef` (live mirror via `useEffect`) so closures that fire
  faster than React commits (rapid multi-pick on `use_case`, the
  Finish-setup click after the last onChange) read the latest
  selection instead of stale render-time state. On Finish setup,
  emit a single `about_you_submit/continue` click with the full
  survey snapshot BEFORE `continue` + `onboarding_complete_result`,
  so the highest-value row is queued first. Mirror the same survey
  fields onto `onboarding_complete_result`, giving the dashboard two
  independent carriers for the same data — losing either path still
  leaves the funnel a complete picture.

`use_case` multi-select also had a stale-closure bug separate from
the navigation issue: the delta computation read `profile.useCase`
which was closure-captured one tick behind the latest pick. Reading
`profileRef.current.useCase` makes the delta correct even when two
picks land in the same commit. Live data from the fix session shows
`use_case` rows firing one-per-pick in real time.

Validation against the live PostHog stream on namespace
`onboarding-survey` (dev build, fresh install via wipe + restart):

  09:51:13  role/select_option            role=pm
  09:51:15  organization_size/select_option organization_size=solo
  09:51:16  use_case/select_option         use_case=product
  09:51:18  hear_about_us/select_option    discovery_source=github
  09:51:20  about_you_submit/continue      role=pm + organization_size=solo
                                           + use_cases=['product']
                                           + discovery_source=github
  09:51:20  continue/continue
  09:51:20  onboarding_complete_result     has_about_you=True
                                           role=pm + organization_size=solo
                                           + use_cases=['product']
                                           + discovery_source=github

An earlier pre-fix session on the same window (same user, no reload)
shows the original bug: 4 use_case rows, no role, no
`about_you_submit`, no survey fields on complete despite
`has_about_you: True`.

Targets release/v0.8.0.

  pnpm --filter @open-design/contracts build  green
  pnpm --filter @open-design/contracts test    111/111
  pnpm --filter @open-design/web typecheck     clean
  pnpm --filter @open-design/web test          1843/1843
2026-05-22 17:57:42 +08:00
Eli-tangerine
10e11531a1
Improve deck home previews and plugin gallery performance (#2698)
Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-22 17:47:28 +08:00
Siri-Ray
9d7e4658df
fix(host): accept folder imports without entry files (#2701)
Generated-By: looper 0.8.1 (runner=worker, agent=codex)
2026-05-22 17:21:22 +08:00
lefarcen
9912fa899a
feat(analytics): full design-system event family + DS run variant (#2706)
Lands the v2 PostHog spec's P0 design-system event family: five new
result events covering source ingest, create, review, status, and
picker apply; the existing file_upload_result + run_created/run_finished
schemas widened to discriminate DS workspaces from regular chat runs.

Contract (packages/contracts/src/analytics/events.ts):
- AnalyticsEventName gains design_system_{source_ingest,create,review,
  status,apply}_result.
- Props interfaces + bucket/origin/method/status enums per spec.
- TrackingProjectKind gains 'design_system' for DS-as-project runs.
- RunCreatedProps / RunFinishedProps widen page_name+area to discriminate
  chat_panel vs design_system_project; entry_from union accepts DS values;
  DS-variant context fields (ds_source_origin, source_count, brand
  description length bucket, per-source counts, design_system_created,
  preview_module_count, missing_font_count).
- FileUploadSurface union adds design_systems / design_system_source.
- Bucket helpers (designSystemLengthBucket, folderCountBucket,
  totalSizeBucket), module slug + type derivation, repo host parser.

Web emission sites:
- DesignSystemFlow.generate(): create_result + threads
  prepareCreatedDesignSystemProject with analyticsTrack so each of the
  4 source paths emits source_ingest_result (success / partial / failed
  / empty), repo-host dominance, fallback type from connector status.
- DropZone onFiles handlers: file_upload_result with deriveUploadCohort.
- DesignSystemDetailView: status_result on togglePublished + Make-default,
  review_result on Looks-good / Needs-work; module_id from markdown
  section header slug (designSystemModuleSlug), module_type via keyword
  heuristic.
- DesignSystemsTab: status_result on publish toggle, set/unset default,
  delete (incl. cancelled when window.confirm dismissed).
- NewProjectPanel: apply_result on DS picker change (manual select +
  clear) plus an auto_select emit when the picker mounts with a default
  DS not yet user-touched.
- ProjectView.streamViaDaemon: when project.metadata.importedFrom ===
  'design-system', pass analyticsHints with entry_from
  (onboarding_design_system for the auto-sent first message,
  regenerate_from_review for subsequent sends), projectKind=design_system,
  designSystemRunContext.

Daemon:
- ChatRequest gains optional analyticsHints (entryFrom / projectKind /
  designSystemRunContext). Behavior never depends on these; only PostHog
  props do.
- /api/runs handler reads analyticsHints to flip baseProps to the DS
  variant (page_name=design_system_project, area=design_system_generation,
  project_kind=design_system) when the run is DS-flagged, and spreads the
  DS context fields onto run_created.
- run_finished mirrors the DS area + adds design_system_created (true iff
  the run wrote DESIGN.md), preview_module_count (distinct preview/*.html
  writes), missing_font_count (0 placeholder; pending font-audit hook).
- run-artifacts.ts: extracts collectWrittenPathsMatching as the shared
  Write/Edit + isError-pair core; adds didRunCreateDesignSystemFile and
  countDesignSystemPreviewModules using the same dedup + failure-skip
  invariants as countNewHtmlArtifacts.

Tests:
- packages/contracts/tests/analytics-design-system-helpers.test.ts: 18
  new test cases over the bucket helpers, module slug + type mapping,
  repo host parser.
- apps/daemon/tests/run-artifacts.test.ts: 9 new tests for
  didRunCreateDesignSystemFile + countDesignSystemPreviewModules covering
  Write-then-Edit dedupe, case-insensitive DESIGN.md match, isError pair
  skip, preview/index.html as a module, non-preview path rejection.

Targets release/v0.8.0.
2026-05-22 17:18:57 +08:00
Siri-Ray
e6da01e998
Add i18n metadata for official content (#2692) 2026-05-22 16:39:32 +08:00
Patrick A
c5e38bbe58
fix(prompts): remove 10-item cap from discovery TodoWrite plan (#2298)
* fix(daemon): remove 10-item cap from discovery TodoWrite plan prompt

The RULE 3 sentence in DISCOVERY_AND_PHILOSOPHY told the model to write
'a plan of 5–10 short imperative items'. That upper bound caused the agent
to cap every plan at exactly ten steps even when the task genuinely needed
more. The TodoWrite JSON schema imposes no maxItems constraint, so the cap
was entirely prompt-driven.

Replace '5–10 short imperative items' with 'short imperative items covering
the work'. TodoWrite intent, RULE 3 label, and planning-before-building
requirement all survive unchanged.

Red spec: apps/daemon/tests/prompts/discovery-todo-cap.test.ts

* fix(prompts): remove 10-item cap from contracts discovery copy and harden tests

[pass-6,7 BLOCKER] packages/contracts/src/prompts/discovery.ts still had
the old '5-10 short imperative items' wording. apps/web imports
composeSystemPrompt from @open-design/contracts (ProjectView.tsx:43),
so web-originated chat runs were still subject to the cap.

[pass-8 WARNING] discovery-todo-cap.test.ts did not cover the contracts
copy, leaving that path unguarded. Also no guard against semantically
equivalent re-introduction via 'at most / maximum / no more than'.

Changes:
- packages/contracts/src/prompts/discovery.ts: apply same wording fix as
  apps/daemon; add inline rationale comment
- apps/daemon/src/prompts/discovery.ts: add inline rationale comment
- apps/daemon/tests/prompts/discovery-todo-cap.test.ts: add 4th assertion
  blocking 'at most|maximum|no more than N item' re-introduction
- packages/contracts/tests/system-prompt.test.ts: add 5-assertion suite
  guarding the contracts copy and composed prompt output
2026-05-22 16:23:37 +08:00
PerishFire
b4e94b0534
Harden packaged updater downloads and install handoff (#2677)
* Add managed download package for updater resumes

* fix(download): clear stale pid locks

* test(e2e): harden windows updater resume smoke

* feat(updater): make update downloads silent in ui

* fix(updater): keep install handoff prompt visible

* fix(ci): build platform before download in postinstall
2026-05-22 15:44:28 +08:00
Eli-tangerine
72c8e34bc9
Polish home onboarding and community presets (#2658)
Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-22 14:26:56 +08:00
shangxinyu1
cc6edb9afe
Proxy GitHub metadata through the daemon (#2654)
* Proxy GitHub metadata through the daemon

* fix(contracts): share GitHub metadata responses

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* fix(contracts): align GitHub fetchedAt payload types

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)

* Proxy GitHub metadata through the daemon

Generated-By: looper 0.6.0 (runner=fixer, agent=codex)
2026-05-22 14:06:07 +08:00
lefarcen
e1818f2677
feat(analytics): onboarding ui_click + lifecycle events + update_popover surface_view (#2590)
* feat(analytics): onboarding ui_click + lifecycle + update_popover surface_view

Spec rows 1-3 of the Onboarding family (ui_click,
onboarding_runtime_scan_result, onboarding_complete_result) and the
home `update_popover` surface_view were all listed as P0 in the v2
doc but unwired — PostHog showed 0 events for every onboarding
ui_click, 0 for the scan/complete result events, and 0 for the
update-popover exposure.

Contract (`packages/contracts/src/analytics/events.ts`):
- Adds event names `onboarding_runtime_scan_result` /
  `onboarding_complete_result` and wires them into
  `AnalyticsEventPayload`.
- Adds `OnboardingClickProps` (page_name=onboarding, area/element/
  action discriminators + optional runtime/about_you/source rider
  fields) and threads it into `UiClickProps`.
- Adds `OnboardingRuntimeScanResultProps` and
  `OnboardingCompleteResultProps` with the doc's full field set —
  enums for runtime_type / scan result / completion result /
  completion_type, plus the lifecycle context (has_about_you,
  has_design_system_request, source_count, exit_step_name).
- Extends `TrackingFileUploadSurface` with an `onboarding /
  design_system_source` shape so the design-system-step source ingest
  can ride the same `file_upload_result` event the file_manager /
  chat composer already use. `source_type` is required on this shape
  so the dashboard can split by `local_code|fig|assets` without
  inspecting `file_type`.
- Adds `UpdatePopoverSurfaceViewProps` for the home toolbar's
  "Update ready" panel.

Onboarding wiring (`apps/web/src/components/EntryShell.tsx`):
- Centralises step/runtime-context derivation in `emitOnboardingClick`
  + `emitOnboardingComplete` helpers; every interactive control inside
  OnboardingView now fires through one of them so a future spec tweak
  changes one place.
- Click rows for runtime cards (local_coding_agent / byok), design-
  source cards (github_repo / local_code / fig_upload), about_you
  selects (organization_size / use_case / hear_about_us), and the
  Continue / Back / Skip navigation buttons. Multi-select use_case
  emits one row per added value, not per render.
- `scanCliAgents` now emits `onboarding_runtime_scan_result` with
  detected/available counts on every terminal state — success when
  any CLI is available, failed when scan returned zero or threw.
  `duration_ms` measures wall-clock from start to terminal.
- `onboarding_complete_result` fires from the Skip / last-step
  Continue / Generate paths with the right `completion_type`. The
  Generate path uses a new `DesignSystemCreationFlow.onBeforeGenerate`
  callback so the embedded flow can expose its local source-count
  state to the wrapper.

DS creation flow (`apps/web/src/components/DesignSystemFlow.tsx`):
- New `onBeforeGenerate(snapshot)` prop with a typed
  `DesignSystemGenerateSnapshot` shape. Fired right before the async
  generate() work; OnboardingView consumes it for both the `generate`
  ui_click (with source_type derived from which-counts-equal-total)
  and the completion lifecycle event.
- `renderDesignSystemCreation` in `EntryView` / `EntryShell` / `App`
  grows a second `hooks` arg that plumbs `onBeforeGenerate` through.

Update popover (`apps/web/src/components/UpdaterPopup.tsx`):
- Fires `surface_view page_name=home area=update_popover` once per
  panel-open transition, deduped by `app_version_before ->
  app_version_after` so a re-render of the same offer doesn't
  inflate the count.

Validation:
- `pnpm guard` 
- `pnpm --filter @open-design/web typecheck` 
- `pnpm --filter @open-design/web test`  203 files / 1828 tests
- `pnpm --filter @open-design/daemon test`  249 files / 2977 tests

* fix(analytics): generation_progress fires from chat_panel + complete_result uses snapshot

E2E (2026-05-21, distinct_id=e2e-onboarding-test-001) drove the full
welcome flow and exposed two issues in the previous commit:

1. `page_view page_name=onboarding area=generation_progress` (step 4)
   never fired. PR #2590's commit wired this from
   `DesignSystemDetailView`, but the Generate path actually navigates
   to ProjectView (`page_name=chat_panel`), not to the DS detail
   surface. PostHog showed `chat_panel` and `file_manager` page_views
   landing right after the Generate click but no
   `area=generation_progress` row.

   Fix: fire `area=generation_progress` from `ProjectView` right
   alongside its `chat_panel` page_view when an onboarding session
   id is still in sessionStorage. Clear the session id immediately
   after so a later unrelated project visit doesn't inherit the
   onboarding attribution. The `DesignSystemDetailView` site can
   stay as a defense-in-depth — same dedup guard, no double-fire.

2. `onboarding_complete_result` from the Generate path shipped with
   `has_design_system_request: false` and `source_count: 0`. The
   `emitOnboardingComplete` helper read `designSource` (the click
   state on the three source-type cards), but E2E showed users
   click Generate without clicking those cards — they type a brand
   description and add a GitHub URL directly in the embedded form,
   so `designSource` stays null even when a request is clearly in
   flight.

   Fix: thread `DesignSystemGenerateSnapshot` from the
   `onBeforeGenerate` callback into `emitOnboardingComplete` via a
   new `extra.sourceSnapshot` option. When present, derive
   `has_design_system_request` from `sourceCount > 0 ||
   hasBrandDescription` and `source_count` from the snapshot's
   `sourceCount`. Skip / last-step Continue paths still fall back
   to the `designSource` heuristic since no snapshot exists there.

* fix(analytics): emit artifact_count from new-html count + remove unmount session-id clear

Cherry-picked from the orphaned `fix/analytics-app-version-zero` HEAD
(commit 5b5a7ed5 — pushed after PR #2453 had already squash-merged,
never made it into release/v0.8.0). Two P0 data bugs:

1. `run_finished.artifact_count` was hard-coded `0` at
   `server.ts:11061` (now `:11394`). Every run on PostHog reported
   zero artifacts, breaking the "generation success → artifact
   produced" funnel.

   Fix: count incremental `.html` paths the run wrote or edited,
   deduped per path so a Write-then-Edit cycle on the same file
   counts as one artifact. Pure helper in
   `apps/daemon/src/run-artifacts.ts` with 10 unit tests covering
   empty / no-html runs, single Write, dedup across Write+Edit+
   MultiEdit, distinct paths, Codex aliases (create_file,
   str_replace_edit), both `file_path` and `path` input shapes,
   case-insensitive extension, non-agent / malformed payloads, and
   Read/Grep/Bash always ignored. Wired into server.ts's
   `run_finished` properties block.

2. `OnboardingView` cleared `onboardingSessionId` on unmount. The
   Generate path unmounts OnboardingView *before* the post-Generate
   page_view fires elsewhere, so an unmount-clear consistently
   wiped the id before the 4th-step emission could read it.
   PostHog showed zero `area=generation_progress` events.

   Fix: drop the unmount cleanup effect entirely. Skip / Back /
   last-step Continue paths clear inline in their respective
   handlers (already in place from this PR's earlier instrumentation
   commit). The Generate path's clear now lives in `ProjectView`
   right after the `chat_panel` page_view (and the
   `generation_progress` page_view that rides with it). Abandoned
   sessions clear on sessionStorage tab close.

* fix(analytics): emit onboarding complete after generate settles + text source_type

Two review fixes on PR #2590 from mrcfps (2026-05-21 14:11):

1. `onboarding_complete_result` was emitted from `onBeforeGenerate`,
   which fires synchronously BEFORE
   `DesignSystemCreationFlow.generate()` runs the async draft-create
   / workspace-open work. Both of those have failure branches that
   bounce the user back to the setup form with an error. In that
   case the lifecycle row would have shipped as
   `result=completed` / `completion_type=completed_with_design_system`
   even though no design system was actually generated.

   Fix: add a new `onGenerateSettled(snapshot, outcome)` callback to
   `DesignSystemCreationFlow` and fire it from each branch of the
   `generate()` function (success after `onCreated` / failed on
   draft-create returning null / failed on workspace-open returning
   null / failed on catch). OnboardingView keeps the `onBeforeGenerate`
   hook for the intent-only `generate` ui_click row, and moves the
   lifecycle complete emit into `onGenerateSettled`. Failed outcomes
   ship as `result=failed` + `completion_type=completed_without_design_system`
   + the daemon's error code, and clear the onboarding session id
   since the user stays in the wrapper.

2. The `source_type` ternary in OnboardingView's `generate` ui_click
   mapped `sourceCount === 0` to `'none'` unconditionally, so a
   prompt-only generate ("user only typed a brand description, no
   GitHub / local / fig / assets sources") was indistinguishable on
   PostHog from "no input at all". The v2 contract reserves the
   `'text'` literal precisely for that prompt-only path.

   Fix: extract a `deriveOnboardingSourceType(snapshot)` helper that
   returns `'text'` when `sourceCount === 0 && hasBrandDescription`,
   `'none'` only when both are absent, single-source literal when one
   kind dominates, `'mixed'` otherwise. Single source of truth for
   the mapping so the ui_click and any future complete-row tagging
   stay consistent.

* fix(analytics): countNewHtmlArtifacts skips failed tool ops

Review fix on PR #2590 from mrcfps (2026-05-21 14:30, on commit
9e9a0019). `countNewHtmlArtifacts` counted every `Write` / `Edit`
tool_use on a `.html` path regardless of whether the matching
`tool_result` came back with `isError: true`. A permission denied
`Write index.html`, a path-outside-cwd refusal, or a
parent-missing failure all still bumped `run_finished.artifact_count`
to 1 — which is exactly the corruption pattern this helper was
introduced to fix (hard-coded zero → spuriously > 0 is the same
class of broken funnel signal).

Fix: mirror the web-side `apps/web/src/runtime/file-ops.ts` pattern.
Build a `resultByToolUseId` map in a first pass, then in the
second pass only count a tool_use whose paired result exists AND
`isError !== true`. A tool_use with no matching result is treated
as "still in flight" and not counted; the dashboard would rather
under-count attempts than promise artifacts we can't confirm
landed.

Tests grow 3 → 13:
- successful Write pair counts (canonical path)
- isError=true result does NOT count
- unpaired tool_use does NOT count
- Write-success-then-Edit-fail on same path still counts (artifact
  is on disk; later edit failure doesn't unmake it)
- existing dedup / distinct-paths / alias / case / malformed /
  read-skip cases all updated to use the new pair() helper

* fix(analytics): re-arm onboarding lifecycle on generate failure for retry

Review fix on PR #2590 from mrcfps (2026-05-21 14:45, on commit
2cd05f09). The previous `onGenerateSettled` failure branch did two
things that together broke the retry path:

1. Flipped `lifecycleReportedRef.current` to `true` (via
   `emitOnboardingComplete`), which the same guard then uses to
   short-circuit every subsequent complete emit.
2. Called `clearOnboardingSessionId()`, wiping the sessionStorage id
   that downstream surfaces (ProjectView's `generation_progress`
   page_view, subsequent ui_click rows) need to attribute under the
   same funnel session.

But `DesignSystemCreationFlow.generate()` doesn't bail out on
failure — it `setStep('setup')` and leaves the user in the same
embedded form to try again. So the retry sequence used to look
like:

  click Generate → fails → complete(failed) → flag locked + id cleared
  user fixes input → click Generate again
    ui_click `generate` row → fires under the STALE in-memory ref
      (sessionStorage was cleared but `onboardingSessionIdRef.current`
       still holds the old uuid)
    generate succeeds → onGenerateSettled(success)
      → emitOnboardingComplete → lifecycleReportedRef guard returns
        early → second complete row never lands
    navigate to ProjectView → peekOnboardingSessionId() = null
      → step-4 `area=generation_progress` row never lands

Fix: the failure handler keeps the session id intact and just
re-arms `lifecycleReportedRef.current = false`. A retry then
emits a fresh complete row under the same `onboarding_session_id`
(useful for "N retries until success" analysis) and an eventual
success can still hand off through ProjectView with the id available
for the step-4 emission. The Skip / last-step Continue paths still
clear via the inline `clearOnboardingSessionId()` next to their
`onFinish()` because those terminate the flow explicitly.
2026-05-21 22:50:46 +08:00
lefarcen
6690dbd5bb
feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback

Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:

- Contracts: TrackingFeedback* enums and the four dedicated
  assistant_feedback_* event payload types (click, reason_view,
  reason_click, reason_submit), plus normalizeCustomReason helper.
  The new event-name variants are added to TrackingEventName and the
  AnalyticsEventPayload discriminated union next to the existing
  surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
  thin route that validates rating, allowlists reasonCodes through a
  simple string filter, and fire-and-forgets into the daemon's
  reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
  forwards the rating + reasonCodes into Langfuse as user_rating
  (NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
  score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
  fetch) and apps/web/src/components/ProjectView.tsx wiring so each
  thumbs-up/down + reason submission posts the side-channel.

Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
  file_upload_result / feedback_submit_result / settings_* event
  variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
  add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
  /feedback routes; merge RegisterChatRoutesDeps to include both
  'paths' and 'telemetry'. Drop PR's chat-routes-local
  reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
  server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
  keep main's projectKindForTracking prop name and its existing
  emission of surface_view / ui_click / feedback_submit_result; the
  PR's analyticsCtx-based reason_view/click/submit emission is dropped
  in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
  to match ChatPane's current prop name.

Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
  the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
  stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
  user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
  when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
  ChatMessageFeedbackReasonCode or deduped.

* fix(analytics): address review on assistant feedback rebase

Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:

- chat-routes.ts: `/feedback` now awaits the daemon's preflight
  outcome and echoes it as the response. The contract was already
  shaped as `accepted | skipped_consent | skipped_no_sink`, but the
  previous handler always returned `accepted` because the network
  send was fire-and-forget. The consent + sink decision is local
  (a small file read and an env-var lookup); the actual Langfuse
  upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
  contract's reason-code union and deduplicated before reaching
  Langfuse, so a stale or replayed client can't poison the
  Langfuse score table with unknown categorical values or
  duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
  fire-and-forget network send so the route can claim `accepted`
  honestly. The legacy `skipped_no_sink` return on app-config read
  failure is preserved.

Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
  drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
  add `followed_design_system` and `missed_design_system` so the
  analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
  raw custom-reason text is bucketed before send. Product reversed
  that on 2026-05-13 (raw text now ships, consent-gated). Replace
  the stale comment with the real semantics + a note that there is
  no tombstone path for reason codes the user removes in a
  follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
  `AssistantFeedbackAnalyticsCtx` interface and a stray blank-line
  delete from the rebase; restore the analytics-context comment
  above the feedback hook.

Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
  ProjectView still skips reportChatRunFeedback on `change===null`,
  so Langfuse retains the previous rating until the user re-submits.
  The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
  smaller set — buildFeedbackPayload only overwrites the codes
  present in the current payload.

* feat(analytics): wire PR's dedicated assistant_feedback_* events

The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.

Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.

Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button)           ↔ assistant_feedback_click
- ui_click (reason submit button)      ↔ assistant_feedback_reason_click
- feedback_submit_result               ↔ assistant_feedback_reason_submit

Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
2026-05-21 19:28:51 +08:00
Siri-Ray
3a33a7b475
fix(web): localize quick brief prompt (#2520)
* fix(web): localize quick brief prompt

Generated-By: looper 0.8.1 (runner=worker, agent=codex)

* fix(web): pass locale from design system chat

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* fix(web): preserve task-type routing options

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* fix(web): preserve task-type routing options

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)
2026-05-21 19:18:13 +08:00
lefarcen
fab172b782
feat(analytics): emit file_upload_result from all three upload entries (#2459)
* feat(analytics): emit file_upload_result from all three upload entries

`file_upload_result` was wired only on the Design Files Upload button in
FileWorkspace. The chat composer paperclip (project page) and the home
hero composer paperclip uploaded files silently — PostHog dashboards
saw upload activity from one of three real entry points, so per-surface
funnels were invisible and totals undercounted.

Three problems are fixed together:

1. `FileUploadResultProps` hard-coded `page_name: 'file_manager' /
   area: 'file_manager'`, which prevented the other two surfaces from
   type-checking. Widened to a discriminated union over the three v2
   doc surfaces (`file_manager` / `chat_panel` chat_composer /
   `home` chat_composer).

2. `HomeChatComposerClickProps.element` was missing `'attachment'`, so
   the home composer paperclip had no usable click value even if we
   wanted to instrument it. Added the literal, mirroring the
   chat_panel composer.

3. Three call sites for `file_upload_result` would duplicate the
   per-file mime + total-bytes cohort math. Extracted to
   `apps/web/src/analytics/upload-tracking.ts#deriveUploadCohort` so
   FileWorkspace, ChatComposer, and the App.tsx Home submit path all
   compute the same `file_count` / `file_type` / `file_size_bucket`
   triplet. FileWorkspace's inline math is replaced with the shared
   helper to prevent drift.

Call-site wiring:

- HomeHero attach button: `ui_click` (`element='attachment'`) at
  click time. The actual upload is deferred to submit, so the
  `file_upload_result` for this surface fires from App.tsx after
  `uploadProjectFiles` resolves.
- ChatComposer.uploadFiles: `file_upload_result` on success / failed /
  throw branches; existing `ui_click` (`element='attachment'`) at the
  paperclip stays as-is.
- FileWorkspace.uploadFiles: refactored to use `deriveUploadCohort`;
  behavior unchanged.

* test(analytics): cover deriveUploadCohort matrix

Reviewer flagged that deriveUploadCohort silently fans out to three
upload entry points (file_manager / chat_panel / home) but has no
focused coverage, so a regression in zip detection, mixed-type
collapsing, or the 1/10/100 MB thresholds would skew analytics
without breaking any visible UI behavior.

Adds homogeneous-image, zip-by-mime, zip-by-extension, mixed-type,
empty-batch, bucket-boundary (1/10/100 MB), and defensive
empty-mime cases.
2026-05-21 18:28:56 +08:00
lefarcen
b2b94dbde7
feat(desktop): follow OS language in packaged builds (cherry-pick of #2544 into release/v0.8.0) (#2560)
* feat(desktop): follow OS language in packaged builds

Packaged Electron currently shows Open Design in en-US regardless of
the OS language setting, because the renderer's i18n picks its locale
from `navigator.language` and Chromium hard-codes that to en-US unless
the host process intervenes. Browser users and `tools-dev` users are
unaffected because their `navigator.language` already reflects the
OS / browser preference.

This change:

- Adds `applyOsLocaleSwitch(app)` in `@open-design/desktop/main`. It
  reads `app.getPreferredSystemLanguages()[0]` and (when called before
  Electron's `ready` event) points Chromium's `--lang` flag at it, so
  the renderer's `navigator.language` follows the OS. Safe to call
  more than once: `appendSwitch` is a no-op once `app.isReady()`.
- Calls the helper from both Electron entries: `apps/packaged` before
  its own `whenReady`, and `runDesktopMain` for tools-dev parity.
- Forwards the resolved locale through
  `BrowserWindow.webPreferences.additionalArguments` as
  `--od-os-locale=<bcp-47>`, parsed by the preload and exposed at
  `window.__od__.client.osLocale`. The host bridge type
  (`OpenDesignHostClient.osLocale`) is extended accordingly.
- Updates `detectInitialLocale` in `apps/web/src/i18n/index.tsx` to
  read that field as a new step between the existing localStorage and
  navigator fallbacks. Browser/web continues to fall through to
  `navigator.languages` unchanged.

The explicit `osLocale` channel exists in addition to `--lang` because
some `app.getPreferredSystemLanguages()` strings (e.g. `zh-Hant-TW`,
`pt-PT`) need to round-trip through `resolveSystemLocale` to land on
the right supported locale, which Chromium's `navigator.language`
cannot do on its own.

* fix(web): route OS locale read through getOpenDesignHost

The first cut of detectInitialLocale read `window.__od__.client.osLocale`
directly, which trips `tests/host-boundary.test.ts` — that guard test
keeps web source from referencing preload globals by name so the
boundary stays single-source. Switch to `getOpenDesignHost()` from
`@open-design/host`, and rewrite the i18n test to install the host via
`installMockOpenDesignHost` instead of poking the global directly.

* fix(tools-pack): unblock mac packaged build on pnpm workspaces

Two independent issues prevented `pnpm tools-pack mac build --to all`
from completing on a clean macOS workspace, both unrelated to the
desktop OS-locale change in this PR but bundled here because verifying
that change end-to-end required the packaged pipeline to actually
finish.

1. `apps/web/.next/standalone/node_modules/.pnpm/node_modules/<pkg>`
   contained dangling symlinks left by Next's nft trace (e.g. a
   `semver -> ../semver@5.7.2/node_modules/semver` link to a
   `.pnpm/<pkg>@<ver>` directory pnpm never created). The downstream
   `cp { dereference: true }` aborted the whole packaged pipeline
   with ENOENT. Walk every artifact tree before copy and unlink
   symlinks whose target doesn't resolve. Targets that *do* resolve
   stay untouched.

2. Next 16's standalone build under pnpm workspaces does not hoist
   peer-dep packages (react, react-dom, styled-jsx) into
   `<standalone>/apps/web/node_modules`. The downstream
   `web-standalone-after-pack.cjs` audit then does
   `createRequire(server.js).resolve('react/package.json')`, whose
   module walk falls out of the standalone tree and aborts the
   electron-builder phase. Add a `hoistStandaloneNextPeerDeps` step
   for the web standalone artifact only: it locates the
   `<pkg>@<version>` (not peer-resolved sibling) directory under
   `.pnpm` and symlinks it into `apps/web/node_modules/<pkg>`. The
   subsequent `cp { dereference: true }` then writes the real
   directory into the cache so the packaged tree stays self-contained.

Verified by `pnpm tools-pack mac build --to all` succeeding end-to-end
(zip + dmg + app), then `pnpm tools-pack mac install` and
`pnpm exec tools-pack mac inspect --expr` reading the desired
`__od__.client.osLocale` from the packaged renderer.

* feat(desktop): fold encodeURIComponent + manual locale source + pet window from #2554

Three defensive improvements lifted from @Eli-tangerine's parallel
implementation on #2554, kept consistent with the OS-locale chain
already on this branch:

- The argv value crossing main → preload is now wrapped with
  encodeURIComponent / decodeURIComponent so a locale string with `;`,
  `=`, or any other Chromium argv special char round-trips cleanly.
  BCP-47 region tags don't carry those today, but the renderer parser
  no longer has to assume it.
- `setLocale` now also writes `open-design:locale-source = "manual"`
  to localStorage, and `detectInitialLocale` only treats the stored
  locale as winning when that marker is present. An untagged value
  (left over from a future auto-write path, or a stale install) no
  longer pins the app to an old language once the host injects a
  fresh OS locale. Today `setLocale` is the only writer so the marker
  has no behaviour difference yet — this is a defensive net.
- `createDesktopPetWindow` now receives `osLocale` and forwards the
  same `additionalArguments` as the main `BrowserWindow`, so the
  pet renderer's `__od__.client.osLocale` is consistent with the main
  window's instead of being silently undefined.

Co-authored idea credit: changes mirror the locale-piece of
@Eli-tangerine on #2554 — that PR is closing in favour of this one.

Tests: detect-initial-locale gets a new "untagged localStorage value
loses to host locale" case. desktop 62/62, host 13/13, web i18n +
host-boundary 15/15 stay green.

* feat(web): fold onboarding view styles from #2554

Pulls the 747-line addition to `apps/web/src/styles/home/entry-layout.css`
from @Eli-tangerine's #2554 — the visual layer for the global onboarding
flow (`/onboarding` view, Connect / About-you / Design-system steps).
The view itself was already plumbed through `EntryShell.tsx`; this adds
the styling that makes it shippable on v0.8.0.

#2554 is closing in favour of this branch, so the CSS lands here so the
onboarding work doesn't get dropped on the floor.

Co-authored idea credit: @Eli-tangerine — original styling on #2554.

* fix(tools-pack): make hoistStandaloneNextPeerDeps idempotent across builds

Addresses non-blocking review by @PerishCode on #2560: the previous
`if (await pathExists(linkPath)) continue;` guard uses `access()`,
which follows symlinks. A stale symlink from a previous build whose
`.pnpm/<pkg>@<version>` target moved (e.g. after a react/react-dom
version bump that invalidates the workspace-build cache key and forces
a re-run) reports as missing through `pathExists`, then `symlink()`
rejects with EEXIST and the unhandled rejection aborts the packaged
build.

Switch to `lstat` (which does not follow the link) so we can tell
"genuinely empty slot", "real directory left by Next" and "stale
symlink" apart, then unlink stale entries before re-creating. Also
move `stripBrokenSymlinks` ahead of `hoistStandaloneNextPeerDeps` in
`copyWorkspaceBuildArtifactsToCache` so any leftover dangling links
that survived a previous run are cleared before hoist tries to write.
2026-05-21 18:23:20 +08:00
lefarcen
6bb0f0fd91
feat(observability): web lifecycle telemetry + stable installationId migration (#2527)
* feat(observability): web lifecycle telemetry + stable installationId migration

Two intertwined safety-telemetry additions for the 0.8.0 release.

Web lifecycle observability
---------------------------
New `apps/web/src/observability/` module installed at module load via
client-app.tsx — alongside the existing error-tracking exception hooks
from #2521. Reuses error-tracking's direct-fetch transport (the same
consent-bypass + early-buffer guarantees) so every event flows even when
the user has opted out of general analytics:

  - client_long_task         PerformanceObserver longtask >100ms (real
                             "feels janky" signal, FPS proxy)
  - client_white_screen      app fails to mount after 5s; MutationObserver
                             cancels the timer the moment the React root
                             renders so a normal boot is zero events
  - client_resource_error    capture-phase window.error catches failed
                             <script>/<link>/<img>/<iframe> loads
                             (chunk-load failures, broken artifact refs)
  - client_boot_timing       navigationStart → load timings via
                             Navigation Timing v2
  - client_visibility_change visibilitychange + page lifetime
  - client_session_summary   real foreground duration emitted on pagehide
  - client_run_stuck         5min watchdog on SSE runs that don't progress
                             (#2464 / #2405 / #1451 in data form)
  - client_iframe_error      FileViewer iframe load failures (iframe
                             errors don't bubble to window, so the global
                             resource-error observer can't see them)
  - desktop_renderer_crash   Electron main observes render-process-gone
                             and forwards to daemon /api/observability/event
  - daemon_uncaught_exception
    daemon_unhandled_rejection
                             process-level handlers on the daemon

error-tracking.ts is generalised: `reportSafetyEvent(name, props)` now
exposes the same buffer + direct-fetch transport that `reportHandledException`
used, with identical $exception wire shape preserved for the existing
exception path.

Daemon cross-process bridge
---------------------------
New `AnalyticsService.captureSafety()` skips the consent re-check and
posts via posthog-node with installationId as distinct_id. Wired into:

  - `POST /api/observability/event` for desktop main and any future
    helper process that needs to ship a safety event (no consent check —
    same contract as web's direct-fetch path)
  - `process.on('uncaughtException')` / `unhandledRejection` on the
    daemon itself

Stable installationId across reinstalls (critical for 0.8.0 rollout)
--------------------------------------------------------------------
installationId previously lived in `<namespace>/data/app-config.json`,
so a packaged reinstall that churned the namespace token (or any future
namespace-scoped data wipe) rotated the id and the user showed up as a
brand-new PostHog person. This is the immediate trigger: when 0.8.0
ships, every 0.7.x user upgrading would silently double the user count.

New module `apps/daemon/src/installation.ts` reads/writes
`<installationDir>/installation.json` at the channel root. The daemon
gets the path from `OD_INSTALLATION_DIR`, set by
`apps/packaged/src/sidecars.ts` to `paths.installationRoot`
(one level above `namespaces/` — e.g.
`~/Library/Application Support/Open Design Nightly/` on mac).

`readAppConfig` transparently merges: if installation.json has an id it
wins; if only app-config.json has one (the 0.7.x state), it gets mirrored
to installation.json on the next read. `writeAppConfig` mirrors any
explicit installationId write, including the null-clear path used by
Settings → "Delete my data". 7 call sites of readAppConfig keep their
signatures unchanged.

Survives:
  - same-channel reinstall (DMG drag-replace, NSIS reinstall)
  - namespace churn between packaged builds
  - per-namespace data reset (future installer that clears `<ns>/data/`)

Still rotates (intentionally):
  - explicit "Delete my data"
  - manual `rm -rf "~/Library/Application Support/Open Design <Channel>/"`
  - different channel (Stable vs Nightly stay distinct because userData
    paths differ; that's the existing channel-isolation contract)

What this changes for posthog-js
--------------------------------
client.ts had `capture_exceptions: false` from #2521; nothing else
changes. autocapture / $pageview / $autocapture / track() / daemon
analyticsService.capture() — all unchanged. New events are additive.

Validation
----------
  - pnpm guard                              pass
  - pnpm typecheck                          whole repo pass
  - pnpm --filter @open-design/web test     200 files / 1824 tests
  - pnpm --filter @open-design/daemon test  251 files / 2981 tests
    (includes 10 new tests in installation.test.ts pinning the 0.7.x →
    0.8.0 migration, namespace-wipe survival, delete-my-data clear, and
    fresh-id rotation)
  - pnpm --filter @open-design/packaged test 9 files / 89 tests
  - Pre-existing baseline: apps/desktop/src/main/updater.ts has typecheck
    references to RELEASE_CHANNEL_NAMES.PREVIEW/NIGHTLY on release/v0.8.0;
    unrelated to this PR.

* fix(observability): preserve fatal exit on uncaught + skip loading shell in white-screen check

Addresses codex review on PR #2527 (Siri-Ray).

1) Daemon process handlers must keep Node fatal semantics

Installing an uncaughtException listener silences Node's default
crash/exit; Node 15+ does the same for unhandledRejection when a
listener is present. The previous handlers logged telemetry and let
control return to the event loop, leaving a corrupted daemon serving
requests instead of letting the supervisor restart it cleanly.

triggerFatalShutdown() now:
  - dispatches captureSafety once (guarded against re-entry from
    cascading faults)
  - races posthog-node's shutdown against a 1s bounded timeout so a
    slow flush can't keep the process alive
  - calls process.exit(1) after the race resolves
Both uncaughtException and unhandledRejection route through it.

apps/daemon/tests/uncaught-fatal-shutdown.test.ts pins:
  - captureSafety is invoked exactly once even on repeated faults
  - exit(1) fires on the happy path
  - exit(1) still fires when shutdown hangs past the timeout
  - exit(1) still fires when captureSafety itself throws

2) White-screen detector treated the loading shell as a successful mount

apps/web/app/[[...slug]]/client-app.tsx renders the dynamic-import
fallback as <div class="od-loading-shell">Loading Open Design…</div>
whose visible text (19 chars) exceeded the previous 10-char floor.
monitorMount() would therefore cancel the 5s timer the instant Next
swapped the loading shell in, completely missing the white-screen
signal the observer is meant to add.

isAppMounted() now:
  - primary signal: <html data-od-app-mounted="1"> set by App.tsx's
    first useEffect — authoritative because once App has mounted at
    least once, any later tree crash is an $exception story, not a
    white-screen story
  - fallback: only counts children of the root container whose
    classList does NOT include known loading-shell markers
    (od-loading-shell). Their visible text drives the > MIN_VISIBLE_TEXT
    check, so the loading sentinel can never be mistaken for a mount.

apps/web/tests/observability/white-screen.test.ts pins:
  - fires client_white_screen when only the loading shell is present
    after the timeout
  - does NOT fire when data-od-app-mounted is set before the timeout
  - cancels the timer the moment a real workspace-shell child appears
    alongside the loading shell
  - still fires when only sub-MIN_VISIBLE_TEXT non-shell content is
    present (effectively blank)

Validation:
  - pnpm guard pass
  - pnpm typecheck pass
  - pnpm --filter @open-design/daemon test  252 files / 2985 tests
  - pnpm --filter @open-design/web test     201 files / 1828 tests

* fix(observability): await captureSafety enqueue before fatal shutdown flush

Addresses second-pass codex review on PR #2527 (Siri-Ray, 3279268246).

The previous fatal-shutdown path called `analyticsService.captureSafety()`
synchronously and immediately raced `analyticsService.shutdown()` against
the bounded timeout. captureSafety in apps/daemon/src/analytics.ts does
its real `client.capture()` call only inside an async IIFE after
`await readInstallationIdSafe()` — so shutdown could win the race,
drain an empty posthog-node queue, and let `process.exit(1)` run BEFORE
the daemon crash event ever got enqueued. We'd then preserve the
process-lifecycle contract but lose the exact signal this PR is adding.

Changes:

  - AnalyticsService.captureSafety now returns Promise<void>. The async
    IIFE is gone; the body awaits readInstallationIdSafe directly so the
    returned promise resolves only AFTER client.capture() has been
    invoked (which is when posthog-node's local buffer contains the
    event).
  - server.ts triggerFatalShutdown awaits captureSafety, then calls
    shutdown, and races that whole sequence against the 1s bounded
    timeout. Capture failures still don't block exit (try/catch around
    the await).
  - NOOP_SERVICE.captureSafety becomes `async () => undefined` to
    match the new signature.
  - Fire-and-forget callers (/api/observability/event) are unaffected;
    voiding the returned promise keeps them non-blocking.

apps/daemon/tests/uncaught-fatal-shutdown.test.ts adds the reviewer-
requested fixture:

  - 'waits for the captureSafety promise to settle before invoking
    shutdown' — gives capture a 50ms delay and shutdown a separate 50ms
    delay so the intermediate "capture done / shutdown not yet" state
    is observable.
  - 'still aborts and exits if captureSafety hangs past the bounded
    timeout' — captureSafety never resolves; the outer 1s timeout still
    forces process.exit(1).

Validation:
  - pnpm guard                                pass
  - pnpm typecheck                            whole repo pass
  - pnpm --filter @open-design/daemon test    252 files / 2987 tests
2026-05-21 15:37:48 +08:00
lefarcen
88dee44892
feat(analytics): always-on $exception capture with early window hooks (#2521)
PostHog Error tracking was missing the vast majority of real exceptions:

  1. posthog-js's capture_exceptions: true is silenced by opt_out_capturing,
     so every opted-out user vanished from the error feed even though we
     could perfectly safely keep collecting their stacks (the consent
     toggle's user copy gates analytics, not safety telemetry).
  2. posthog-js is dynamically imported only after /api/analytics/config
     resolves AND the user has consented. Errors thrown during the first
     1-2 seconds (React hydration, early effects) had no listener to
     catch them.

Net effect: 14d $exception count was 54 events / 10 users across ~5k DAU,
producing the misleading 99.93% crash-free curve in PostHog's dashboard.

This PR makes exception capture independent of both gates:

  - apps/web/src/analytics/error-tracking.ts (new): own window.error +
    unhandledrejection handlers, in-memory buffer (capped at 50 entries),
    direct fetch to https://<host>/i/v0/e/ with the public phc_ key. Same
    scrub layer as the posthog-js path so file paths still get redacted.
  - apps/web/app/[[...slug]]/client-app.tsx: installErrorHandlers() at
    module-load, before React or any feature code can throw.
  - apps/web/src/analytics/provider.tsx: bootstrapExceptionTracking() in
    the identity useEffect, parallel to getAnalyticsClient() — runs
    regardless of consent state, fetches /api/analytics/config, hands the
    phc_ key + host + distinctId to the error tracker so buffered events
    can flush.
  - apps/web/src/analytics/client.ts: capture_exceptions: false so
    posthog-js stops also emitting $exception (would have produced
    duplicate events server-side); also re-bridges the error-tracking
    context inside the loaded() callback so future events inherit the
    fully-resolved appVersion / sessionId.
  - apps/daemon/src/server.ts + packages/contracts: /api/analytics/config
    now returns key + host even when consent=false. enabled still reflects
    only the analytics consent toggle (posthog-js full autocapture stays
    off when enabled=false), but the always-on error tracker can read key
    directly. Forks without POSTHOG_KEY still get key=null and the whole
    pipeline becomes a no-op — fork-safe by construction.
  - apps/web/src/analytics/scrub.ts: regex fix so packaged-mac paths like
    /Applications/Open Design.app/Contents/Resources/apps/web/... (which
    contain a space) get fully rewritten to app://apps/web/...; previously
    the [^\s] guard stopped at 'Open' and leaked the install dir.

Validation:

  - pnpm --filter @open-design/web typecheck: pass
  - pnpm --filter @open-design/web test: 199 files / 1823 tests pass
    (includes 8 new error-tracking.test.ts cases for buffer cap, hook
    install, scrub, and direct dispatch)
  - pnpm --filter @open-design/daemon test: 250 files / 2971 tests pass
  - pnpm guard: pass

After release/v0.8.0 ships and rolls out, expect the crash-free curve to
drop from the artificial 99.93% to a realistic 95-98% — that's not a
regression, it's the first time we're measuring it.
2026-05-21 13:07:26 +08:00
lefarcen
f5f8937421 Merge origin/main into release/v0.8.0
Conflict resolved by taking origin/main:

- apps/web/src/components/EntryNavRail.tsx  design-systems rail
  button icon name palette-filled (release-side) -> blocks (main);
  main's icon swap is part of the more recent design-systems rail
  pass.
2026-05-21 10:52:08 +08:00
Eli-tangerine
ce95266586
[codex] Polish home composer working-directory controls (#2468)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
ci / Detect CI change scopes (push) Successful in 1s
nix-check / build (push) Failing after 3s
ci / Preflight (push) Failing after 2s
ci / Core package tests (push) Failing after 1s
ci / Tools workspace tests (push) Failing after 1s
ci / Daemon workspace tests (1/2) (push) Failing after 1s
ci / Daemon workspace tests (2/2) (push) Failing after 1s
ci / Web workspace tests (push) Failing after 1s
ci / E2E vitest (push) Failing after 1s
ci / Playwright critical (starters) (push) Failing after 1s
ci / Playwright critical (core) (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / App workspace tests (push) Failing after 0s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
* Polish design system home flows

* Polish home prompt presets

* Polish home working directory controls

* test: align home hero chrome smoke

* fix: stabilize home composer ci checks

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-21 00:22:46 +08:00
lefarcen
722ddfa235 Merge origin/main into release/v0.8.0
Conflicts resolved by taking origin/main on both files. Root cause:
main's PR #2460 (fix(landing): align logo.webp with brand icon) changed
HomeHero.tsx's .home-hero__brand-mark to render <img src=/app-icon.svg>
instead of an inlined <HeroBrandIcon /> SVG, and bundled the matching
CSS (26px round badge with bg-panel + border + padding 2px) plus a
gap/font-size tune. The release-side visual-refresh CSS still targeted
the SVG layout (38px square, transparent, inset SVG selector). Keeping
release's CSS would leave main's <img> unstyled.

- apps/web/src/styles/home/home-hero.css  three blocks, all taken from
  main: .home-hero__brand gap 8px, .home-hero__brand-mark redesigned for
  <img> child, .home-hero__brand-name font-size 16px.
- apps/web/src/index.css  two blocks, both taken from main: workspace
  tab close column 22px and .workspace-tab__close 18x18 (paired
  tune-down of tab UI spacing).
2026-05-20 22:28:38 +08:00