Commit graph

512 commits

Author SHA1 Message Date
BayesWang
af4a62b69a
Add configurable project locations (#2041)
* add daemon project location support

* wire project locations into web settings

* localize project location settings

* move default project location to settings

* polish project location selection cards

* fix project location i18n gaps

* fix external project validation cleanup
2026-05-31 04:47:45 +00:00
Dan Porat
3395d2c855
feat(daemon): implement fal.ai renderer for image + video generation (#1606)
* feat(daemon): implement fal.ai renderer for image + video generation

Adds renderFalImage and renderFalVideo backed by the fal queue API
(queue.fal.run). Any fal-ai/* model path can be used directly without
a catalog entry, enabling the full fal model library without code
changes. Catalogued shortcuts are mapped via FAL_ENDPOINTS to their
fal-ai/* paths; OD_FAL_MAX_POLL_MS controls the poll ceiling.

Expands the fal model catalog with flux-pro-ultra, flux-dev-fal,
flux-schnell-fal, ideogram-v3-fal, recraft-v3-fal (images) and
veo-3-fal, veo-2-fal, wan-2.1-t2v, wan-2.1-i2v, seedance-1-pro-fal,
kling-2.1-t2v-fal (video). Marks fal provider as integrated: true in
both daemon and web model registries.

* fix(daemon): address fal renderer review comments

- Correct Wan 2.1 endpoints: wan-video/v2.1/* → fal-ai/wan-t2v / fal-ai/wan-i2v
- Correct Kling 2.1 t2v endpoint: .../pro/... → .../master/text-to-video
- Add FAL_IMAGE_USES_ASPECT_RATIO: flux-pro-ultra sends aspect_ratio not image_size
- Add FAL_VIDEO_NO_DURATION: Wan models reject the duration field
- Add FAL_VIDEO_STRING_DURATION: Veo expects duration as "5s" not 5
- Fix falQueueBase() to use anchored regex replace, avoiding mangled
  custom base URLs
- Do not wrap payload under input — raw fal queue HTTP API expects flat
  body; the input wrapper is an SDK abstraction only (confirmed by 422
  validation error from fal showing prompt missing at body.prompt)

* fix(daemon): correct fal queue protocol comment (flat body, no SDK input wrapper)

* fix(daemon): clamp Veo duration to valid fal buckets (4s/6s/8s)

* fix(daemon): report effective fal Veo duration in providerNote (with snap warning)

* fix(daemon): reduce image generation latency from 4m37s to ~73s

Five layered fixes targeting the overhead that padded a ~10s fal API call
into a 4m37s user-facing wait:

1. Skip DISCOVERY_AND_PHILOSOPHY for media surfaces (image/video/audio).
   The ~3000-token HTML-artifact discovery layer is irrelevant for media
   generation and forced the agent to parse and override all its rules
   before dispatching. Removes it from the system prompt entirely for
   these surfaces; MEDIA_GENERATION_CONTRACT is the sole authority.

2. Broaden the wait-loop contract to cover ALL slow models, not just
   "Volcengine i2v / hyperframes-html". Any model whose generation
   exceeds 25s — including fal flux-pro-ultra, Veo, Sora — returns exit 2
   from od media generate. The contract now makes this universal and
   provides a python3-based bash pattern (jq is not guaranteed to be
   installed on all agent runtimes).

3. Increase od media wait polling budget from 25s to 120s. od media
   generate keeps its 25s budget for fast feedback; od media wait is
   purpose-built to sit and poll, so it can safely use the full 2-minute
   bash-tool window. Reduces re-entries for a 3-minute generation from
   ~7 to ~2.

4. First fal poll is now immediate instead of always sleeping 3s before
   the first status check. Saves 3s for all fal jobs.

5. Project metadata no longer emits "(unknown — ask)" for imageModel and
   aspectRatio when unset. Emits the actual defaults (gpt-image-2,
   aspect-ratio scene heuristic) so the agent can dispatch without
   extended reasoning about model selection. Also adds dispatch-immediately
   defaults and a brief-reply rule (2–3 sentences max after generation).

Measured end-to-end on the exact problem prompt before/after:
  Before: 4m37s (discovery form + 7x LLM re-entries + jq failure)
  After:  ~73s   (single bash loop, no question turn, image delivered)

* feat(daemon): inject media dispatch hint for non-media project surfaces

Agents running inside prototype, deck, and other non-image/video/audio
projects previously had no knowledge of `od media generate`, so when
asked to create an image with fal they would try to call provider REST
APIs directly and ask the user for API keys — even though the daemon
already holds credentials in .od/media-config.json.

Add MEDIA_DISPATCH_HINT to composeSystemPrompt for all non-media
surfaces. The hint tells the agent to always route media generation
through the daemon dispatcher, and explicitly forbids prompting for
API keys.

Verified end-to-end: a prototype project generates a 952 KB image via
flux-pro-ultra in ~52s with no key errors.

* fix(daemon): prevent agent from converting bash env vars to PowerShell syntax

MEDIA_DISPATCH_HINT now explicitly labels the shell as POSIX bash and
shows the correct $VAR form side-by-side with a warning NOT to use
PowerShell $env:VAR. Without this, claude-sonnet running on a Windows
host converts the example to PowerShell syntax (`& $env:OD_NODE_BIN`)
which then fails at the bash executor with 'syntax error near unexpected
token &'.

* fix(daemon): add generate→wait loop to MEDIA_DISPATCH_HINT for slow models

MEDIA_DISPATCH_HINT previously showed only a bare  call.
flux-pro-ultra and other slow models always exit 2 after ~25s — without
the wait loop the agent would treat exit 2 as a failure and report an
error to the user.

Replace the single-command example with the canonical generate→wait loop
(matching media-contract.ts), add an explicit note that exit 2 means
'keep polling', and reinforce the POSIX bash / no-PowerShell rule
directly inside the code block.

* fix(daemon): allow fal-ai/* passthrough in media-agent contract

The media-agent prompt instructed the agent to warn and substitute the
default model for any ID not in the catalogue. This blocked the custom
fal-ai/* passthrough path the daemon already supports, so users could
not reach uncatalogued fal models from the normal chat flow.

Carve out the fal-ai/* exception so the agent passes those IDs through
directly instead of warning or substituting.

* fix(daemon): align MEDIA_DISPATCH_HINT with exit-0 generate contract

media generate now always exits 0 (handoff included). The non-media
agent hint still checked ec==2 to decide whether to keep polling, so
slow fal models (flux-pro-ultra, veo-3-fal) would stop after printing
the handoff JSON instead of entering the wait loop.

- generate error check: drop the ec!=2 exception (exits 0 always)
- while loop: drive on taskId presence, not ec==2; stop on ec==0/5
- footer: remove --surface inference claim; CLI requires it explicitly

* fix(guard): add test-fal-webui.ts to e2e scripts allowlist

CI failed: guard flagged e2e/scripts/test-fal-webui.ts as an
unapproved package-owned entrypoint. Add it to allowedE2eScripts.

* fix(daemon): update prompt test expectations to match exit-0 handoff wording

The two stale assertions checked for the old generate-exits-2 copy which
no longer exists in the contract. Update them to match the current
always-exits-0 wording.

* fix(daemon): move skipDiscoveryBrief override before discovery block

* chore(e2e): remove ad-hoc fal webui test script

The script was a one-time developer helper used to manually validate fal
image generation through the live UI. It relied on a real fal API key and
hardcoded local port, so it cannot participate in the e2e package's
fixture/reporting/CI conventions. Removing it per reviewer feedback.

- Delete e2e/scripts/test-fal-webui.ts
- Remove its guard.ts allowlist entry
- Gitignore the file and its screenshots to prevent accidental re-addition

* chore: remove accidental local scratch files from branch

Remove bash.exe.stackdump (MSYS crash dump) and fix_loop.py (one-off
local rewrite helper) — neither is a repo-owned source artifact.

* fix(prompts): document fal-ai/* passthrough in non-media dispatch hint

Prototype/deck agents now know arbitrary fal-ai/* model ids are valid
--model values and should be forwarded as-is, mirroring the exception
already present in media-contract.ts. Adds a prompt regression test.

* fix(daemon): use renderMediaGenerationContract(mediaExecution) for media surfaces

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-31 04:44:44 +00:00
kami
333a62cda6
fix: link od bin after fresh install (#2069)
* fix: link od bin after fresh install

* test: lock root od bin shim path

* test: cover root workspace deps in postinstall scan

* chore(nix): refresh pnpm deps hash
2026-05-31 04:36:49 +00:00
Denis Redozubov
729ce2b0cb
feat(daemon): add run-scoped MCP tool bundles (#3244)
* feat(daemon): add run-scoped MCP tool bundles

* fix(daemon): keep sandbox runs in managed project dirs

* fix(daemon): reject malformed run tool bundles

* fix(contracts): model run-scoped mcp server inputs

* fix(daemon): reject unsupported run tool bundles

* fix(daemon): validate run tools before chat fallback

* test(daemon): expect sandbox imported folder failure

* fix(daemon): preflight sandbox project roots before run rows

* fix(daemon): preflight sandbox chat project roots

* fix(daemon): allow host editor for sandbox imports

* fix(daemon): preflight sandbox routine project reuse

* fix(daemon): reject undeliverable Claude tool bundles

* fix(daemon): single-source chat route validation
2026-05-31 03:53:04 +00:00
蓝宙
e8c179d3a6
fix: show cumulative conversation duration (#3354)
* fix: show cumulative conversation duration

* fix: include usage-only run durations

---------

Co-authored-by: Lanzhou3 <217479610+Lanzhou3@users.noreply.github.com>
2026-05-31 03:52:12 +00:00
mehmet turac
8448b1105c
fix: preserve OpenClaude fallback credentials (#3361)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
ci / Detect CI change scopes (push) Successful in 0s
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-staging / Deploy landing page to staging (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Validate Nix flake (push) Has been skipped
ci / Preflight (push) Failing after 2s
ci / Workspace unit tests (push) Failing after 2s
ci / Daemon workspace tests (push) Failing after 2s
ci / Web workspace tests (push) Failing after 2s
ci / Browser tests (push) Failing after 2s
ci / Build workspaces (push) Failing after 2s
ci / Validate workspace (push) Failing after 1s
ci / Runtime trace (push) Has been skipped
2026-05-31 03:49:25 +00:00
Denis Redozubov
f4c5d22f22
fix(daemon): confine sandbox project roots and host discovery (#3243)
* fix(daemon): confine sandbox project and host discovery

* fix(daemon): resolve sandbox data dir for toolchain discovery

* fix(daemon): resolve sandbox data dir for agent env

* fix(daemon): fail fast for sandbox imported folders

* test(daemon): assert sandbox imported folder rejection

* fix(daemon): keep sandbox import guard at run start

* fix(daemon): reject sandbox imported project file roots

* fix(daemon): preserve imported project detail roots

* test(daemon): expect sandbox profiles to stay scoped

* fix(daemon): bypass proxies for agent tool callbacks

* test(daemon): isolate media policy route memory extraction

* fix(daemon): keep loopback no-proxy scoped to sandbox
2026-05-30 16:57:04 +00:00
Denis Redozubov
9a3424d68c
feat(daemon): add sandbox runtime foundation (#3242)
* feat(daemon): add sandbox runtime foundation

* fix(daemon): preserve sandbox roots after agent env overrides

* fix(daemon): keep readiness probes pathless

* fix(daemon): harden headless run fallbacks

* fix(daemon): bootstrap sandbox runtime discovery

* fix(daemon): preserve explicit sandbox agent profile mounts

* fix(daemon): keep sandbox profile lookup run scoped

* fix(daemon): normalize sandbox data dir input

* fix(daemon): pin sandbox env roots to base data dir
2026-05-30 15:06:05 +00:00
RyanCheng77
b76e7196db
fix(daemon): dedupe Claude stream wrappers (#3334)
* fix(daemon): dedupe Claude stream wrappers

* fix(daemon): split Claude stream dedupe state

---------

Co-authored-by: 116405 <116405@ky-tech.com.cn>
2026-05-30 06:12:29 +00:00
Weston Houghton
7a9dcf38d7
fix(memory): deliver OpenCode extraction prompt on stdin (#3238)
`opencode run`'s `-f, --file` is a yargs array option that greedily
consumes every trailing non-flag token, so the memory extractor's
`--file <prompt-file> "<message>"` invocation made OpenCode treat the
message text as a second attachment and exit 1 with "File not found".
Every LLM memory extraction failed for OpenCode Local CLI users.

Deliver the prompt on stdin like the chat-run path (def.promptViaStdin)
and drop the --file attachment. The connector-memory test now models
the real yargs --file array-greediness so it would catch a regression.
2026-05-30 04:48:42 +00:00
Ramiro
c33641e592
fix(daemon): normalize cumulative ACP message chunks (#3333)
* fix(daemon): normalize cumulative acp message chunks

- apps/daemon/src/acp.ts
- apps/daemon/tests/acp.test.ts
- apps/web/src/providers/daemon.ts
- apps/web/src/components/DesignSystemFlow.tsx

Convert cumulative ACP message snapshots into suffix deltas and keep temporary browser debug instrumentation for trace verification.

* chore(web): remove temporary stream debug hooks

- apps/web/src/providers/daemon.ts
- apps/web/src/components/DesignSystemFlow.tsx

Remove the browser debug accumulator after validating the ACP duplication trace.
2026-05-30 04:17:32 +00:00
xinsngx
41b1cd763e
fix(media): hide OpenAI OAuth-only image credentials (#3308)
* fix(media): ignore OpenAI OAuth tokens

Agent-Model: gpt-5

Agent-Family: openai

Agent-Session: 019e6ceb-c33d-7cd3-bff0-cbc20c642197

Agent-Step: 0.0.1

* fix(media): hide unavailable model providers

Agent-Model: gpt-5

Agent-Family: openai

Agent-Session: 019e6ceb-c33d-7cd3-bff0-cbc20c642197

Agent-Step: 0.0.2

* fix(media): clear unavailable picker models

Agent-Model: gpt-5
Agent-Family: openai
Agent-Session: 019e6ceb-c33d-7cd3-bff0-cbc20c642197
Agent-Step: 0.0.3

* fix(media): keep missing-model projects executable

Agent-Model: gpt-5
Agent-Family: openai
Agent-Session: 019e6ceb-c33d-7cd3-bff0-cbc20c642197
Agent-Step: 0.0.8

---------

Co-authored-by: Codex <gpt-5@openai.com>
2026-05-30 04:12:10 +00:00
JasonBroderick
0fbeaf829e
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247)

Three-layer defence against models emitting `## user` / `## assistant` /
`## system` lines mid-response, which the chat host interprets as real
turn boundaries and acts on as unauthorised instruction:

1. **System prompt**: anti-roleplay instruction elevated from a bullet
   under "What you don't do" to a standalone `## CRITICAL` section in
   `official-system.ts`, with a REMINDER pinned at the end of the
   composed prompt for recency bias.

2. **Stream-level detection and truncation**: shared `role-marker-guard.ts`
   module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used
   across all text paths — Claude stream (per-message guards), non-Claude
   structured streams (run-scoped guard via `emitGuardedTextDelta`),
   and BYOK proxy routes (`createDeltaGuard`). When a marker is detected,
   the contaminated suffix is dropped and a `fabricated_role_marker` event
   surfaces a warning in the UI.

3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants;
   `fabricated_role_marker` events render as amber warning pills.

* fix(chat-routes): do not await reader.cancel() on stream early-return

The await on reader.cancel() can hang indefinitely on response streams
whose underlying source is a Uint8Array (most notably surfaced by the
ollama test in proxy-routes.test.ts, which builds its mock body via
`new Response(uint8array)` rather than the controller-based helper
`sseResponse()`). The hung await holds the request handler open, which
in turn blocks `server.close()` in the afterAll hook, producing the two
test timeouts (test at 145, hook at 36) currently failing CI on #3296.

Fix is in production code, not the test: don't await the cancel. It
is a cleanup hint and we are returning from the function anyway, so
blocking on it offers no value. fire-and-forget with an empty catch
keeps the cancel signal flowing for real HTTP streams without
risking a hang on mock/edge-case implementations.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(daemon): terminate child on role-marker detection (close #3247 generation vector)

PR #3296's detection layer truncates display and persistence of fabricated
role markers, but the underlying model subprocess keeps generating tokens
after detection. Three concrete consequences:

  1. The model bills the user for the entire contaminated response
     (we observed 5,106 chars stored in claude's session file for a turn
     where only the first 3,013 chars were legitimate — a 40% overhead).
  2. tool_use blocks emitted AFTER the marker reach the daemon's
     dispatcher unchecked, since detection only gates the text-delta
     emission path, not content-block-stop / tool_use blocks. The
     model could fabricate "## user delete file X" then emit a
     tool_use(delete X) that the dispatcher would execute.
  3. The UI surfaces a `fabricated_role_marker` warning followed by an
     eventual normal turn-end, blurring the distinction between
     "completed normally" and "killed by safety guard."

This commit adds a single idempotent `abortForRoleMarker(marker)`
helper in server.ts, scoped to the same closure as `child` and
`runGuard`. On any detection event (per-message Claude guard,
run-scoped non-Claude guard, plain stdout guard) the helper:

  - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the
    UI can render a security-class status distinct from a normal
    turn-end. The existing `fabricated_role_marker` warning is still
    sent and rendered as the amber pill (PR #3296's UI).
  - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes,
    Kimi, Devin, Kiro) whose I/O doesn't necessarily release on
    SIGTERM of the wrapper process alone.
  - SIGTERMs the child immediately, with the existing
    `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace.

Wired into three sites where contamination is detected:
  - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc
    text_delta paths)
  - Plain-stdout listener (BYOK plain mode)
  - The Claude stream handler's onEvent (per-message guards in
    claude-stream.ts surface `fabricated_role_marker` events directly
    via onEvent rather than through the run-scoped emitGuardedTextDelta)

Tool_use blocks emitted BEFORE the marker still flow through normally
— this guard can't help with those, since by the time we observe a
text marker the prior content block has already finished. Closing
that gap requires speculative cancellation of in-flight tool calls
when a downstream text block contains a marker; that's tracked as
follow-up work, not included here.

Co-Authored-By: roverkai <2196140098@qq.com>
Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* refactor(role-marker-guard): bounded tail + drop chat-style markers

Addresses two review comments on #3303:

(1) O(1) memory + per-delta work (review r3323982225)
  Replace the unbounded `accumulated` string with a rolling tail capped
  at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest
  marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice).
  A 50 KB assistant response delivered in 1000 chunks of 50 bytes was
  previously O(n²) on string concatenation alone; now it is O(1) per
  delta regardless of message length. The `tail.length` value carries
  the "already emitted" offset that the cut-point math needs, so the
  offset semantics at L74–78 of the prior implementation are preserved
  without re-introducing the full-text buffer.

(2) Drop chat-style markers entirely (review r3323982234, option (a))
  `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex.
  Rationale:
    - The host parses ONLY `## user` / `## assistant` / `## system`
      lines as turn boundaries (see `buildDaemonTranscript` in
      apps/web/src/providers/daemon.ts). A model emitting chat-style
      markers does NOT cause the original #3247 security failure.
    - With kill-on-detection wired in this PR (`abortForRoleMarker`
      in server.ts), a false positive aborts the whole run — far
      more expensive than a stray unflagged `User:` line in chat
      scrollback. Chat-style markers collide with legitimate output
      (form labels, email contacts, JSDoc) often enough that pairing
      them with kill-semantics is the wrong tradeoff.
  The tradeoff is now documented in the regex docblock so the
  kill-on-match behaviour is justified against the false-positive
  surface.

Also aligns the prompt-side CRITICAL block in system.ts: drop the
"don't emit User: / Assistant: / Human: / AI:" bullet, since we no
longer enforce it. Less ambiguity for the model and the operators.

Test file updated:
  - Chat-style positive tests flipped to negative ("does NOT match
    User: — chat-style out of scope") so the intentional exclusion
    has a permanent regression test.
  - Two new tests cover the bounded-tail behaviour: a marker arriving
    after 10 KB of clean text in small chunks, and a marker
    straddling a chunk boundary after 100 prior chunks.
  - Added test for legitimate `User: bob@example.com`-style content
    not triggering contamination.
Test count is now 35 (up from 25); two of the new ones explicitly
exercise the new bounded-tail path.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995)

Blocking correctness bug introduced by commit 4 (bounded-tail refactor):
once \`tail\` is a rolling slice of mid-stream text, \`^\` in the
canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents
the genuine message start. As the rolling window slides forward chunk
by chunk, a sliced tail can begin with whitespace + \`##\` (or just
\`##\`), letting \`^\` anchor a match against text that the
full-buffer implementation correctly ignored. With kill-on-detection
wired in commit 3, that false positive now SIGTERMs the run and emits
a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class
called out in the docblock at L22–29.

Reviewer's evidence (PerishCode, r3324060995): streaming
"…take a look at the ## user content section…" one character at a
time reports \`contaminated: true\` post-refactor; the same text in a
single feed stays clean.

Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very
first non-empty feed (where \`^\` legitimately points at the message
start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\`
(\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all
subsequent feeds. A \`firstChunk\` boolean tracks the state. Real
newline-preceded markers straddling chunk boundaries are still caught
because the preceding \`\\n\` is retained inside the 64-char tail.

Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`):
  - mid-line \`## user\` streamed char-by-char with no preceding \\n
    (mirrors the reviewer's repro)
  - space-preceded mid-line \`## user\` in a >130-char stream, which
    long enough to force the rolling window past the marker — exercises
    the exact slice condition that triggered the bug
  - real \\n-preceded \`## user\` still caught after a long preamble
    (positive case must not regress)
  - \`## user\` as the very first chunk still caught (\`^\` legitimately
    anchors on the first feed)

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882)

Two refinements addressing the third review on #3303:

== Blocking (r3324151877) ==
The regex over-matched legitimate Markdown headings, and with
kill-on-detection wired in commit 3 each false positive
deterministically aborts a real run. Three changes tighten the match
to the actual security surface — `## user` / `## assistant` /
`## system` lines the chat host parses as turn boundaries — without
losing any real attack pattern:

1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary
   delimiter is lowercase (see `buildDaemonTranscript` in
   apps/web/src/providers/daemon.ts), and the `## CRITICAL`
   system-prompt block already forbids only the lowercase forms.
   Title-Case headings like `## User Guide`, `## System Architecture`,
   `## Assistant settings` are now ignored — these are legitimate
   technical writing patterns LLMs emit constantly. `## USER NOTES`
   (all-caps) likewise no longer flags.

2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it,
   `## userland`, `## userspace`, `## users guide`, `## systemd`,
   `## assistance` all match via prefix in the alternation. The
   lookahead requires the next character to exist and to not be a
   lowercase letter, so:
     - `## user\\n…`     → match (newline is not lowercase)
     - `## assistantR…` → match (R is uppercase; the glued-form
                          attack pattern still gets caught)
     - `## assistant.`  → match (. is not a letter)
     - `## users guide` → no match (s is lowercase letter)
     - `## userland`    → no match (l is lowercase letter)
   POSITIVE rather than NEGATIVE `(?![a-z])` because the negative
   form is satisfied at end-of-string, which in a streaming context
   means "we have `## user` but don't know what comes next yet" —
   would fire prematurely if `land` arrives in a later chunk. The
   positive form delays detection by one character in that edge
   case, traded for correctness.

3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role
   markers are single-line by convention; restricting to space/tab
   prevents oddities like `##\\nuser` from matching across lines.

Test file: added Title-Case fixtures (`## User Guide`,
`## System Architecture`, `## Assistant settings`, `## USER NOTES`)
and prefix-of-longer-word fixtures (`## users guide`, `## userland`,
`## systemd`, `## assistance`) — each asserting NO contamination.
The existing `## usability` negative test gave false confidence as
the reviewer noted (only failed via alternation-miss, not via
word-boundary semantics); the new fixtures actually exercise the
lookahead. Also added a positive test for `## assistant.` (glued
punctuation) to balance the existing `## assistantReading`
(glued uppercase) coverage. Total tests: 35 → 50.

== Non-blocking (r3324151882) ==
Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in
`packages/contracts/src/errors.ts` alongside the existing agent/AMR
codes, with a docblock comment explaining the emission contract:
emitted by `server.ts::abortForRoleMarker` alongside the existing
`fabricated_role_marker` warning event when the daemon detects a
fabricated Markdown role marker in agent output; retryable. The code
was already being emitted over the wire but unregistered — landing
the registration here keeps the contract and emitter in sync as
reviewer requested.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): defer complete-but-unconfirmed marker suffix

Addresses review r3324277xxx — the boundary case where a stream chunk
boundary lands between the role keyword and its lookahead character
violated the documented "everything from the marker onward is silently
dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')`
returned `## user` as safe (no char to satisfy the lookahead → no match
→ pass through), so the fabricated marker line leaked into UI and
app.sqlite before the next chunk confirmed contamination on the next
SIGTERM cycle.

Fix: introduce a `pending` state variable holding bytes that match the
COMPLETE-but-unconfirmed marker prefix at end of buffer
(/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no
lookahead, $ anchor instead). When the no-match branch detects this
suffix, withhold it from emission until the next feed either:
  - Confirms it (next char non-lowercase) → main regex matches →
    contaminated → withheld bytes dropped along with `## user`.
  - Denies it (next char lowercase, e.g. `userl…`) → main regex no
    longer matches the role keyword → withheld suffix is released
    and emitted alongside the new continuation.

Also tied the firstChunk transition to actual byte emission rather
than feed count. Previously a message that starts with `## system`
followed by a separate `\\n` chunk would lose the `^` anchor on the
second feed (firstChunk had flipped after the first feed even though
nothing was emitted yet), silently breaking detection for that edge
case. Now `firstChunk` stays true until at least one byte has crossed
the emission boundary, matching the conceptual definition of "message
start".

Tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - `## user` deferred at chunk boundary, confirmed by `\\n` in next
  - `## user` deferred at chunk boundary, denied by `land` continuation
  - `## assistant` deferred, confirmed by punctuation
  - `## User` Title-Case still passes through unconditionally
  - `## system` as the very first chunk: deferred, confirmed by \\n
    in next chunk (tests the firstChunk-stays-true-when-nothing-
    emitted invariant)

Total tests: 50 → 55.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta

Addresses review r3324xxxxxx — guarding the thinking channel buys no
security and causes legitimate aborts.

Why thinking is NOT a #3247 vector:
  - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only
    re-serializes `m.content` as `## ${m.role}\n...`.
  - Extended-thinking content is rendered to a separate
    `kind: 'thinking'` payload (daemon.ts:857-858) and never folded
    into `m.content`.
  - So a `## user` line in the thinking channel CANNOT become a
    fabricated turn boundary on the next round-trip.

Why guarding it is harmful:
  - Models routinely emit literal `## user` / `## assistant` lines
    in chain-of-thought when reasoning about conversation structure
    ("Let me think about this. The user might phrase it as:\n## user\n
    …"). Common pattern in production traces.
  - With `abortForRoleMarker` wired in server.ts, a guard match on
    thinking SIGTERMs the run and surfaces a security error to the
    UI. The user paid for the reasoning, never sees the answer, and
    gets a confusing "fabricated role marker" warning for what was
    actually legitimate metacognition.
  - This directly contradicts the module's own stated philosophy
    ("a false positive aborts the whole run — a much more expensive
    failure than a stray unflagged ... line", role-marker-guard.ts).

Fix: `emitSafeText` now passes thinking_delta through unconditionally,
skipping both the guard and the contamination check. text_delta
remains fully guarded. The single-line change at the top of
emitSafeText preserves all other channels' behavior.

Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts):
  - `## user` / `## assistant` lines in a thinking_delta — must NOT
    fire fabricated_role_marker, the thinking content streams intact
    including the marker text, and the subsequent text_delta answer
    still reaches the consumer (run not aborted).
  - Sanity check: same `## user` pattern in a text_delta DOES fire
    fabricated_role_marker and truncates emission at the marker. Locks
    in the channel-discriminated behavior.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(role-marker-guard): tie firstChunk to slicing, not byte emission

Blocking review r3324xxxxxx: under the prior firstChunk transition
("any byte emitted"), a role marker that arrived at the very start of
a message with its prefix split across multiple chunks bypassed
detection — reopening the #3247 vector on the Claude path.

Concrete cases that were missed (all are routine provider
tokenizations of \`## user\n…\` at message start):
  - \`##\`     | \` user\nDELETE…\`
  - \`## us\`  | \`er\nDELETE…\`
  - \`## \`    | \`user\nDELETE…\`

Mechanism: the pending-deferral regex only catches COMPLETE role
keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`,
\`## us\`) was emitted in full. That emission flipped firstChunk to
false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used,
which requires a literal \n before \`##\`. A marker at buffer
position 0 has no preceding \n, so it could no longer match.
abortForRoleMarker never fired and tool_use blocks emitted after the
fabricated turn boundary reached the dispatcher.

Fix: change firstChunk to track "tail has not been sliced yet" rather
than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE,
tail still represents the entire emission so far and \`^\` in the
canonical regex genuinely anchors at byte 0 of the stream — so the
\`^|\n\` alternation safely catches a chunk-split message-start
marker. The transition happens at the moment we would slice: once
emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\`
becomes meaningless, and we switch to the newline-only variants.

Earlier iterations of this code tried two other definitions, both
unsound:
  - "any byte emitted" (this commit fixes) — lost \`^\` before a
    chunk-split message-start marker could finish arriving.
  - "newline emitted" (briefly considered as the reviewer's
    alternative suggestion) — left \`^\` valid on a sliced buffer
    when streams hadn't emitted a newline yet, re-introducing the
    rolling-tail mid-stream false positive from review r3324060995.
The slice-based invariant satisfies both: while we have not sliced,
\`^\` is correct; once we slice, it is not.

Regression tests added (apps/daemon/tests/role-marker-guard.test.ts):
  - \`##\`    | \` user\nDELETE…\`   → contaminated, marker=\`## user\`
  - \`## us\` | \`er\nDELETE…\`      → contaminated, marker=\`## user\`
  - \`## \`   | \`user\nDELETE…\`    → contaminated, marker=\`## user\`
  - \`#\`     | \`# user\nDELETE…\`  → contaminated, marker=\`## user\`
The fourth case (single \`#\` first chunk) exercises an even more
adversarial tokenization than the reviewer's examples; it is also
caught.

Total tests: 55 → 59.

Co-Authored-By: JasonBroderick <jason@buddyboss.com>

* fix(tests): wrap events in stream_event envelope in thinking test

feedJsonl was feeding raw events without the `{ type: 'stream_event',
event: ... }` wrapper that createClaudeStreamHandler requires (line 141
of claude-stream.ts). Events silently fell through all branches, making
both tests pass vacuously. Also fix TS2532 on warnings[0].marker with
non-null assertion (safe after the toHaveLength(1) guard).

Co-Authored-By: RoverKai <roverkai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: roverkai <2196140098@qq.com>
Co-authored-by: JasonBroderick <jason@buddyboss.com>
Co-authored-by: RoverKai <roverkai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 03:57:56 +00:00
xinsngx
c88a83cd5e
fix(web): preserve preview scroll across tools (#3313)
* fix(web): preserve preview scroll across tools

Capture URL-loaded preview scroll state before tool handoff and restore it through an opt-in raw HTML bridge to avoid jumping back to the top.

Agent-Model: gpt-5

Agent-Family: openai

Agent-Session: 019e6ceb-c33d-7cd3-bff0-cbc20c642197

Agent-Step: 0.0.6

* test(daemon): cover scroll bridge injection paths

Agent-Model: gpt-5
Agent-Family: openai
Agent-Session: 019e6ceb-c33d-7cd3-bff0-cbc20c642197
Agent-Step: 0.0.6

---------

Co-authored-by: Codex <gpt-5@openai.com>
2026-05-30 03:53:50 +00:00
Weston Houghton
65802542a2
fix(chat): surface OpenCode usage-limit/provider failures instead of a bare timeout (#3316)
* fix(chat): surface OpenCode provider failures from its log on a silent stall

OpenCode's headless `run --format json` mode swallows provider failures: a
429 usage-limit is marked retryable and retried silently with nothing on
stdout/stderr, so the chat run only dies via the inactivity watchdog and the
daemon shows a bare "request timed out" with no reason. The real error
(statusCode + "Monthly usage limit reached…") is recorded only in OpenCode's
own session log.

On a failed OpenCode close where stdout/stderr carry no signal, read the
newest OpenCode session log, extract the latest `service=llm` provider error
(scoped to that one line so the embedded request body can't contaminate the
classification), and emit a structured, retryable SSE error (RATE_LIMITED /
AGENT_AUTH_REQUIRED / UPSTREAM_UNAVAILABLE) carrying the provider's message.

Refs #982.

* fix(chat): emit recovered OpenCode failure from the watchdog path, bound to the run

Addresses review on #3316.

Blocking: the recovery previously ran only in the child-close handler, but in
the inactivity-watchdog stall path (the exact case this targets)
failForInactivity sends its error and finish()es the run — which clears
run.clients — before the child closes. So the structured error reached zero
live SSE clients and only surfaced on reload. Recover and send the OpenCode
failure inside failForInactivity, before finish(), on the same pre-teardown
send path the generic stall message already uses. Keep the close-handler
branch for the case where OpenCode exits non-zero on its own (clients still
attached).

Non-blocking: bind the log lookup to the current run via an mtime gate
(since=run.createdAt) so a stale or concurrent session's error can't be
misattributed — skip log files last written before the run started.

* docs(opencode-log): note the concurrent-run limitation of the mtime gate

* fix(chat): skip close-handler failure emit when the watchdog already finished the run

Non-blocking review follow-up on #3316: on the silent-stall path both
failForInactivity and the child-close handler fired for the same run, so the
recovered RATE_LIMITED error was sent twice and the events-log stream was
reopened after finish() had closed it. Guard the close-handler failure emit
with !design.runs.isTerminal(run.status) — the watchdog already sent the
error and finalized the run; finalization below still runs (finish() no-ops
once terminal).
2026-05-30 03:23:58 +00:00
YOMXXX
9b9a18af5b
fix(daemon): validate skillId on POST/PATCH /api/projects against runtime source-of-truth (#3293)
* fix(daemon): validate skillId on POST /api/projects against runtime source of truth

* fix(daemon): validate skillId on PATCH /api/projects/:id, sharing the POST validator

* test(daemon): cover skillId canonicalization, design-template ids, empty-string + null normalization, type rejection
2026-05-30 03:22:16 +00:00
Sriram Sivakumar
0bd07b2a3d
fix(daemon): grok-build — pass prompt inline as -p value, drop stdin (#2259)
* fix(daemon): grok-build runtime — pass prompt inline as -p value, drop stdin

Grok Build CLI 0.1.212 enforces `-p, --single <PROMPT>` as a value-requiring
flag — invoking with bare `-p` and piping the prompt to stdin now fails with:

  error: a value is required for '--single <PROMPT>' but none was supplied

The previous runtime def used `promptViaStdin: true` + `buildArgs` returning
`['-p']`, which only worked against earlier grok builds that read the prompt
from stdin when `-p` had no inline value.

This change inlines the prompt as the `-p` argument value and flips
`promptViaStdin: false`. Linux `MAX_ARG_STRLEN` (128 KB) is enough headroom
for typical Open Design prompts; if we ever hit `E2BIG` on a very large
brief, a follow-up could shell out to `--prompt-file <tempfile>`.

Verified against grok 0.1.212 (b7b8204a4) — single-turn invocations now
return clean text replies instead of exit 2.

* fix(daemon): declare grok-build argv prompt budget + regression coverage

@mrcfps' review on #2259 flagged that moving the Grok Build adapter from
the (no-longer-working) stdin path to argv would regress oversized
composed prompts from the actionable AGENT_PROMPT_TOO_LARGE error we
already emit for DeepSeek to a raw spawn ENAMETOOLONG / E2BIG instead.
Fixed by mirroring the DeepSeek argv-budget shape:

- grok-build.ts: `maxPromptArgBytes: 30_000` (same headroom as DeepSeek,
  ~2.7 KB under the Windows CreateProcess 32_767-char cap) so
  `checkPromptArgvBudget` pre-flights composed prompts (system + history
  + skills + design-system content + user message) before spawn.
- prompt-budget.ts: Grok-Build-specific message — names the `-p /
  --single` flag, the xAI CLI 0.1.212+ behavior change, and points the
  user at stdin-capable adapters (claude / codex / hermes) when they
  need to ship large local context.
- Tests: 3 new vitest cases in prompt-budget.test.ts — pin the budget
  field, exercise the strict-overrun + at-limit + CJK byte-count guards
  exactly like the DeepSeek regression set, and assert the Grok-named
  diagnostic copy. New `grokBuild` + `grokBuildMaxPromptArgBytes`
  helpers exported alongside the existing `deepseek*` ones.

All 23 prompt-budget tests pass locally (`pnpm exec vitest run
tests/runtimes/prompt-budget.test.ts`).

---------

Co-authored-by: Sriram Sivakumar <sriram155@gmail.com>
Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-05-29 08:45:57 +00:00
maybeyourking
881571dea7
fix(media): route custom-image edits through images API (#3087)
* fix(media): route custom-image edits through images API

* fix(media): normalize custom-image endpoint suffixes

---------

Co-authored-by: Artist Ning <dingkuake@yeah.net>
Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-05-29 08:09:44 +00:00
lefarcen
da19ff3ca0
feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241)
* feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok

Drops in a `mocks/` top-level dir that pretends to be the real agent
CLIs by streaming pre-recorded sessions in each CLI's native stdout
protocol. Zero LLM tokens.

## Use cases

- **E2E tests** in `apps/daemon/tests/` — exercise the full chat-server
  pipeline against a known trace, assert UI events / artifacts.
- **Self-validation during dev** — iterate on `claude-stream.ts` /
  `json-event-stream.ts` parser changes without burning provider budget.
- **Regression harness** — replay the same trace before and after a
  charter / parser change; diff the daemon events the UI surfaces.
- **Demo / onboarding** — show what a 17-tool claude editing session
  looks like end-to-end, offline.

## How

- 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid.
- `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks
  one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` /
  `_BY_PROMPT_HASH`), streams the trace in the requested format.
- Each format renderer matches the EXACT JSON shape the OD daemon
  parser expects, verified line-by-line against
  `apps/daemon/src/{json-event-stream,claude-stream}.ts`:

  | CLI                       | streamFormat              | parser source                              |
  | ------------------------- | ------------------------- | ------------------------------------------ |
  | `opencode`                | `json-event-stream`       | `handleOpenCodeEvent`                      |
  | `codex`                   | `json-event-stream`       | `handleCodexEvent`                         |
  | `claude`                  | `claude-stream-json`      | `createClaudeStreamHandler`                |
  | `deepseek` `qwen` `grok`  | `plain`                   | `server.ts` (raw stdout)                   |

## Quick start

```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377   # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1

echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```

The mock binary announces the picked trace id on stderr:
`[mock-opencode] picked 04097377… via fixed`.

Recording selection (env, in priority order):
- `SYNCLO_EXPLORE_MOCK_TRACE=<id>` — fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt — `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>` — random within `agent:claude` /
  `skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random
- `SYNCLO_EXPLORE_MOCK_SEED=<str>` — reproducible "random"
- `SYNCLO_EXPLORE_MOCK_NO_DELAY=1` — skip inter-event waits

## Dataset

179 anonymized Langfuse traces from this project's own production
telemetry:

- 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 ·
  cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1
- outcomes: succeeded 144 · failed 35
- skills: default 71 · ad-creative 50 · algorithmic-art 30 ·
  agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt /
  brainstorming / data-report / penpot-flutter-design-source 1 each
- 124 multi-turn (sessions with ≥2 turns)
- 18 produce `<artifact>` output
- ~4.5 MB on disk total

Anonymization: `/Users/<name>/` → `${HOME}/`,
`C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs →
stable `proj-001`, `proj-002`, …. Tool input/output payloads
preserved verbatim (templated UI, no cell-level PII).

## Smoke test

`bash mocks/scripts/smoke-test.sh` — 6 checks across all 6 agents.
All pass on this branch (verified locally):

```
  ✓ opencode first event = step_start
  ✓ codex first event = thread.started
  ✓ claude first event = system
  ✓ deepseek emitted plain text (144 chars on first line)
  ✓ qwen emitted plain text (144 chars on first line)
  ✓ grok emitted plain text (144 chars on first line)
All mock CLIs working. 
```

## Adding more recordings

The exporter that produced this set lives in
[nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore)
(see `cli/src/local/orchestrator/langfuse-import.ts` + the `local
langfuse-import` CLI command). Operators with the Langfuse keys can pull
more by tag / outcome / artifact / multi-turn filter, then run
`local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`.
`mocks/README.md` has the full instructions.

## Out of scope (follow-ups)

- **ACP agents** (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need
  a JSON-RPC server on stdio rather than a one-shot stream — separate
  `format-acp.mjs` module not yet written.
- **Per-agent json-event-stream variants** (`cursor-agent`, `gemini`,
  `qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer;
  their parsers are in `apps/daemon/src/json-event-stream.ts` and follow
  the same template as `format-codex.mjs`.

## AGENTS.md updates

- Added `mocks/` to the top-level content directories listing
- Added a Validation strategy bullet pointing here for agent-stream /
  parser changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout

- Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary
  RuntimeAgentDef bin names OD resolves before any fallback. Without these,
  a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli,
  kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback).
- Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop
  so coverage is verified end-to-end.
- Call .unref() on the 30s safety timeout in format-acp.mjs so a completed
  ACP session exits promptly instead of waiting the full 30 seconds.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mocks): add vela (AMR) — login / models / ACP with strict set_model gate

Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name
`apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`,
`streamFormat: 'acp-json-rpc'`). It's richer than the generic ACP
agents — covers full login + models + chat-session lifecycle.

### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs)

1. `vela login` — writes ~/.amr/config.json with a fake profile (controlKey,
   runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl).
   The on-disk projection is what OD's daemon login route + AmrLoginPill
   poller read; production goes through device-auth, the mock skips
   straight to the file write.

2. `vela models` — prints the production-shaped public model catalog as
   newline-separated `public_model_*    vela` lines. Override via
   FAKE_VELA_MODELS env.

3. `vela agent run --runtime opencode` — ACP JSON-RPC server with three
   vela-specific protocol extensions:

   a. `initialize` response carries `agentCapabilities`
      (`promptCapabilities.embeddedContext`) + `models`
      (`currentModelId` + `availableModels`).
   b. `session/new` response carries the same `models` block.
   c. **Strict set_model gate**: `session/prompt` is rejected with
      JSON-RPC -32602 ("session/set_model must be called before
      session/prompt") UNLESS `session/set_model` (or
      `session/set_config_option`) has been called for the current
      sessionId. Mirrors real vela 0.0.1 contract; catches regressions
      in `attachAcpSession` that silently skip set_model.

### Error injection envs (in sync with fake-vela.mjs)

  FAKE_VELA_SESSION_ID            - sessionId returned by session/new
  FAKE_VELA_TEXT                  - override assistant text
  FAKE_VELA_THOUGHT               - optional thought_chunk before text
  FAKE_VELA_SESSION_NEW_ERROR     - fail session/new
  FAKE_VELA_SET_MODEL_ERROR       - fail session/set_model
  FAKE_VELA_PROMPT_ERROR          - fail session/prompt
  FAKE_VELA_REQUIRE_SET_MODEL='0' - disable the strict gate (legacy)
  FAKE_VELA_LOGIN_USER_EMAIL      - email written into config profile
  FAKE_VELA_LOGIN_USER_PLAN       - plan written into config profile
  FAKE_VELA_LOGIN_DELAY_MS        - sleep before write (test in-flight)
  FAKE_VELA_LOGIN_FAIL            - print + exit 1
  FAKE_VELA_MODELS                - override models stdout
  VELA_PROFILE                    - profile slot (prod | test | local)

### Components

`mocks/lib/format-vela.mjs` (~205 LOC)
  - Full ACP server with vela protocol extensions
  - Strict set_model gate
  - Error injection plumbing

`mocks/lib/vela-subcommands.mjs` (~90 LOC)
  - runVelaLogin() — writes ~/.amr/config.json
  - runVelaModels() — prints catalog

`mocks/bin/vela` — dispatcher wrapper. Forwards `vela <subcmd>` to
mock-agent.mjs which routes to login/models or falls through to ACP.

`mocks/mock-agent.mjs` — parseArgs now collects positionals so the vela
dispatcher can read subcommand from there; switch case added for vela.

`mocks/scripts/smoke-test.sh` — +4 assertions:
  vela models prints ≥10 catalog lines
  vela login writes ~/.amr/config.json with the requested email
  vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
  vela strict set_model gate rejects prompt without prior set_model

### Verified locally

  ✓ vela models printed 15 catalog lines
  ✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
  ✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
  ✓ vela strict set_model gate rejects session/prompt without prior set_model

All 21 smoke checks pass (up from 17 with previous P3 ACP commit).

### AGENTS.md + README updates

  AGENTS.md — mention `vela (AMR — vela CLI)` alongside ACP agents in
  the directory listing entry.
  mocks/README.md — protocol table row + dedicated vela section with
  subcommand contract, strict gate explanation, env-injection cheat
  sheet. Mock-tree listing updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): honor REPORT_FILE env when --report-file flag not given

Harnesses that spawn the mock without translating their report-path
contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's
orchestrator, which passes REPORT_FILE as env per the existing
opencode/claude/codex agent launchers) wouldn't get a report file
written, so the harness's "agent exit 0 but produced no report"
check would always fire and mark mock runs as failure even though the
stdout stream was complete.

Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE
when --report-file wasn't provided on argv. Each format renderer already
accepts opts.reportFile and writes the recording's final assistant text
to it (`format-*.mjs` already had this — only the wiring was missing).

Verified: synclo-explore run with `mock=true, mock_trace=04097377`
against the opencode wrapper now produces a plan.md with the recording's
17-tool claude editing session report. ~1.5s per run vs ~70s real opencode.

* mocks: move recordings to Cloudflare R2; PR→main→Action upload path

The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has
been moved off git into Cloudflare R2 at the bucket open-design-mocks
under recordings/v1/. The repo now ships:

- mocks/manifest.json — the canonical catalog (renamed from
  recordings/index.json) with sha256 + storage hints; consumers
  fetch this to discover what exists, then pull individual jsonl
  files on demand
- mocks/scripts/fetch-recordings.sh — parallel, sha256-verified,
  idempotent puller for the public r2.dev URL
- mocks/scripts/add-recording.sh — local maintainer helper that
  validates a new .jsonl and copies it into recordings-staging/
  (no R2 calls; no credentials needed)
- mocks/scripts/upload-to-r2.mjs — called only by the CI workflow
- mocks/scripts/lib/manifest-utils.mjs — shared sha256/meta/
  rebuild-histograms logic, used by both add-recording (preview)
  and upload-to-r2 (actual write) so the entry shape never drifts
- .github/workflows/sync-mocks-to-r2.yml — fires on push to main
  when mocks/recordings-staging/ changes; uploads to R2, updates
  manifest, commits cleanup back; serialized via concurrency group

Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN,
CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a
laptop. Read stays public via the r2.dev URL.

Why not pnpm install integration: contributors who do not touch
agent code do not pay the fetch cost. Fetch happens on first
smoke-test run (auto-fallback) or when a mock spawn needs data.

Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest +
scripts). Smoke test (21 checks) still green against the fetched
corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scope R2 write token to a dedicated secret name

Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared
CLOUDFLARE_API_TOKEN that landing-page-*.yml uses for Pages deploys)
so the R2 write capability can be scoped to just the
open-design-mocks bucket without bleeding extra capability into the
Pages workflows.

Also hardcode the powerformer CF account_id directly in the workflow
(account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID
secret may point at a different account).

Workflow now fails fast with an actionable error message + dashboard
link if the secret is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate)

wrangler 4.x calls /memberships before any r2 action, requiring
user:read scope. R2 "Object Read & Write" tokens deliberately lack
that scope (defense in depth — a leaked token should not enumerate
account-level resources). The workflow now uses the aws CLI talking
straight to the R2 S3-compatible endpoint with SigV4, no membership
lookup.

Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by
CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the
existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention). End-to-end
tested locally: PUT recording → manifest rebuild → manifest PUT →
staging cleanup all green.

aws CLI is pre-installed on ubuntu-latest, so no install step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout

These mocks were copy-pasted from synclo-explore, where they
originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var
convention. That brand-bleed is not appropriate in OD: rename the
public env surface to OD_MOCKS_* (matching OD-native prefixes like
OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS).

Renames:
  SYNCLO_EXPLORE_MOCK_TRACE             → OD_MOCKS_TRACE
  SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH    → OD_MOCKS_BY_PROMPT_HASH
  SYNCLO_EXPLORE_MOCK_POOL              → OD_MOCKS_POOL
  SYNCLO_EXPLORE_MOCK_SEED              → OD_MOCKS_SEED
  SYNCLO_EXPLORE_MOCK_NO_DELAY          → OD_MOCKS_NO_DELAY
  SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR    → OD_MOCKS_RECORDINGS_DIR
  SYNCLO_EXPLORE_MOCK_SMOKE_TRACE       → OD_MOCKS_SMOKE_TRACE
  SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING  → OD_MOCKS_ALLOW_LOCAL_UPLOAD

Also drop the inline harvester usage from README. The harvester is an
external CLI in nexu-io/agent-pr-explore — its README is the right
place for langfuse-import flags, anonymization options, etc. OD only
documents its own staging→PR→Action workflow.

Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end
verified to route correctly.

Consumers of the OLD env names (notably the orchestrator in
nexu-io/agent-pr-explore) need a matching rename. No back-compat
shim here — the explore side has zero external users today and a
one-line follow-up is cleaner than a permanent deprecation layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_*)

Missed in the prior commit (a30b868a) — only grepped mocks/ subdir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: drop staging dir + GH Action; back to local-script upload

The staging-dir + Action design (added earlier in this PR) had a flaw
the user caught: new recordings briefly entered the repo on their way
through staging, leaving them in git history forever even after the
Action cleanup commit removed them from HEAD. That defeats the whole
point of moving recordings to R2.

Replace with the simpler local-maintainer flow:

  bash mocks/scripts/upload-recording.sh /path/to/<trace>.jsonl
  # → validates, wrangler r2 put, updates manifest.json, wrangler r2 put manifest
  git add mocks/manifest.json && git commit && git push
  # → only the ~200B manifest delta enters git

The wrangler-OAuth gate replaces the CI secret + Action duo. For a
solo / small maintainer team this collapses the trust chain down to
"do you have wrangler login to the powerformer account?" — no GH
secrets to rotate, no concurrency window to worry about, no
inevitable repo-history bloat.

Deletes:
- .github/workflows/sync-mocks-to-r2.yml
- mocks/scripts/upload-to-r2.mjs   (CI-only)
- mocks/scripts/add-recording.sh   (staging helper, now obsolete)
- mocks/recordings-staging/        (empty dir, never to be repopulated)

Adds:
- mocks/scripts/upload-recording.sh

Kept:
- mocks/scripts/fetch-recordings.sh
- mocks/scripts/lib/manifest-utils.mjs (still used by upload-recording.sh)
- mocks/manifest.json (committed; the only mocks artifact in git)

End-to-end tested locally: re-upload an existing recording is
idempotent, manifest math is stable, fetch + smoke test still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address review — guard allowlist + safe ~/.amr + loud OD_MOCKS_TRACE typo

Three concrete issues raised across recent Siri-Ray (Looper) review
threads on #3241:

1. scripts/guard.ts only allowlisted mocks/lib/ + mocks/mock-agent.mjs,
   leaving mocks/scripts/lib/manifest-utils.mjs outside the residual-
   JS guard. Result: Preflight fail on every push. Extend the allowlist
   to mocks/scripts/ — same precedent as the lib/ entry directly above.

2. mocks/scripts/smoke-test.sh moved the caller real ~/.amr to
   ~/.amr-smoke-backup, ran vela login (which writes a fake config),
   then rm -rf the .amr and restored the backup. Two failure modes:
   crash mid-run loses the user real config, and re-running before
   restore overwrites the backup with the fake login. Fix: sandbox
   vela login into a mktemp -d HOME via env (HOME=$amr_sandbox vela
   login). Never touches the real ~/.amr at all. trap cleans up.

3. mocks/lib/recording-picker.mjs silently fell through to
   prompt-hash → pool → random when OD_MOCKS_TRACE was set but did
   not match any recording (typo, prefix too short, corpus not
   fetched). Tests using a pinned trace would silently get a
   different trace, hiding regressions. Fix: throw an explicit error
   with the failing value + a pointer at fetch-recordings.sh.

Verified locally: pnpm guard prints "Residual JavaScript check
passed", smoke-test still 21/21, ~/.amr mtime unchanged after run,
typo on OD_MOCKS_TRACE now produces "mock-agent: OD_MOCKS_TRACE=...
set but no matching recording in <dir>" on stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fetch-recordings: detect empty filter result before line-counting

printf '%s\n' on an empty string emits a single empty line, so the
previous TOTAL=$(printf ... | grep -c "") math returned 1 on an
empty $ENTRIES_TSV — a typo like `--agent no-such-agent` printed
"Fetching up to 1 recordings", downloaded zero, and exited 0
("ready"). Check `-z $ENTRIES_TSV` first.

Reproduced + fix verified per the reviewer thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address mrcfps review — goldens + provenance + contract check

Three durability improvements suggested in the PR #3241 top-level
review:

## 1. Golden daemon-event snapshots (mocks/golden/*.events.json + apps/daemon/tests/mocks-golden.test.ts)

Smoke-test verified that mocks RUN; that catches crashes but not a
parser change that semantically reshapes the events the daemon emits.

Commit the daemon-event sequence for 3 representative traces:
- claude  314d6833 — median-complexity agent-browser session
- codex   dcdff3b3 — 14-tool refactor
- opencode 9a9522ec — 7-tool data-report

apps/daemon/tests/mocks-golden.test.ts spawns the mock, feeds stdout
through the real createClaudeStreamHandler / createJsonEventStreamHandler,
normalizes per-spawn volatile fields (only sessionId today, only on
claude), and deep-equals against the committed snapshot. A parser
regression fails the test loudly.

After an intentional parser change, regenerate:

  MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
  git diff mocks/golden/
  # eyeball; commit if shapes match intent

## 2. Provenance fields on every manifest entry (mocks/scripts/lib/manifest-utils.mjs + mocks/manifest.json)

Augment inspectRecording() to write:

  captured_at         — ISO 8601 from existing meta.timestamp
  cli_version         — null until harvester writes it
  protocol_version    — null until harvester writes it
  anonymization_version — null until harvester writes it

captured_at is now populated for all 179 existing entries from the
meta event the harvester already emits. The harvester in
nexu-io/agent-pr-explore is the next step for cli_version /
protocol_version / anonymization_version — once those are
populated, consumers can detect when a recording is older than ~1
minor version behind the live CLI and flag for re-harvest.

No matrix of (cli_version × agent) recordings — that explodes
maintenance. Just metadata per recording so trust decay is visible.

## 3. Real-CLI contract check (mocks/scripts/contract-check.sh + docs/MOCKS-CONTRACT-CHECK.md)

Mocks catch parser regressions against recordings; they do NOT
catch recordings drifting away from the live agent CLI as that CLI
evolves. The contract check spawns the real CLI alongside the mock
with a fixed deterministic prompt + diffs top-level event-type
distributions.

Deliberately human-driven, not cron-scheduled:
- costs real LLM tokens per invocation
- requires real CLI auth
- maintainer reads the output, not a regex

Suggested triggers per doc: real-CLI release notes mentioning
"output format" / "stream" / "JSON" / "events"; before a parser
refactor; ad-hoc when something looks off.

## Coverage note

README updated to position mocks as "deterministic protocol/parser
coverage" (not "e2e replacement") per mrcfps framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks-golden test): drop import of non-exported ParserKind

Use plain string (the type alias is `string` anyway) — Preflight
typecheck on a31fa71a failed:

  tests/mocks-golden.test.ts(29,8): error TS2459: Module
  "../src/json-event-stream.js" declares "ParserKind" locally, but
  it is not exported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* recording-picker: structured OD_MOCKS_POOL + hard-fail no-match

Siri-Ray review: \`OD_MOCKS_POOL=outcome:failed\` was documented as a
supported selection knob, but the matcher only checked tags and
\`meta.agent\` — so the negative-path pool found 0 candidates and
silently fell through to global random, validating against any
recording instead of a failed trace.

Fix:
- Parse \`<dim>:<value>\` shape and route each dim to the right meta
  field: \`outcome\` → \`meta.outcome\`, \`agent\` → \`meta.agent\`,
  \`skill\` → \`tags[]\`. Bare values still fall back to tag substring.
- If the env was set and matched nothing, throw with the failing
  value and a jq one-liner for inspection. Same loud-fail policy as
  OD_MOCKS_TRACE — silent fallback was the original bug.

Verified locally: outcome:failed, agent:codex, skill:agent-browser
all route correctly; outcome:nonsense throws the explicit error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* contract-check.sh: fix lost $PROMPT in mock invocation

Siri-Ray review on e576074a: the mock side wrapped its pipeline in
`bash -c "printf %s \"\$PROMPT\" | ..."` — but $PROMPT was a parent
shell variable, not exported, so the child bash expanded it to an
empty string. Result: the contract check sent the real prompt to the
real CLI and an empty string to the mock, defeating the
same-input invariant the whole script rests on. Also let the mock
randomly select a different trace whenever a maintainer happens to
have OD_MOCKS_BY_PROMPT_HASH=1 in their env.

Fix: drop the inner bash -c entirely; use a subshell that scopes the
PATH overlay and pipes printf into the PATH-resolved mock binary
directly. The subshell limits the PATH change without var-passing.

Verified locally: with prompt-A the mock picks trace 54ec02ee via
hash; prompt-B → 2667e851 via hash; empty prompt (old broken
behavior) → random — confirms the prompt is now actually reaching
the mock under PATH overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 07:17:20 +00:00
Caprika
76c7d31c53
chore: bump vela cli to 0.0.4 (#3239)
* chore: bump vela cli to 0.0.4-test.0

* chore: refresh lockfile for vela cli 0.0.4-test.0

* chore(nix): refresh pnpm deps hash

* fix: materialize electron before mac release checks

* fix: rebuild electron when mac framework links are invalid

* revert: drop release workflow experiments

* chore(nix): refresh pnpm deps hash

* fix: stop blocking beta mac release on electron symlink preflight

* fix: stop using custom electron dist for beta mac packaging

* fix: guard oversized chat images and opencode overflow

* chore: bump vela cli to 0.0.4

* chore(nix): refresh pnpm deps hash

* fix(daemon): surface prompt-image stat failures instead of dropping them

resolveSafePromptImagePaths only swallowed unresolvable path input; once a
path was confirmed inside UPLOAD_DIR and existed, a statSync failure
(EACCES/EPERM, a file vanishing mid-run) silently dropped the image and let
the run continue without that prompt context. Since this helper is now also
the 1 MB enforcement point, that turned an infra/validation failure into a
'successful' run with missing required context.

Collect those into a new failedImages bucket and fail the run with
INTERNAL_ERROR at the call site, mirroring the oversized-image guard. Add a
unit test covering statSync throwing.

---------

Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
Co-authored-by: lefarcen <935902669@qq.com>
2026-05-29 06:41:17 +00:00
lefarcen
98a2c63973
feat(daemon): add Antigravity agent adapter (#3157)
* feat(daemon): add Antigravity agent adapter

Adds Google Antigravity (`agy` CLI) as a coding-agent runtime. Detection
picks up `agy` on PATH, the daemon spawns `agy -p "<prompt>"` for a
single non-interactive turn, and the assistant text reply streams back
on stdout. OAuth is shared with the Antigravity IDE through the system
keyring, so users who have signed into the desktop app are authenticated
on first run with no extra step.

`agy` v1.0.3 has no JSON / stream-json / ACP output mode (upstream issue
#119), no `--model` flag (issue #35), and no MCP forwarding hook yet —
the adapter ships with `streamFormat: 'plain'` and a single `default`
fallback model so the model picker doesn't mislead users into thinking
their choice is wired through. We will upgrade buildArgs + add a
dedicated event parser when upstream ships structured output.

Also gitignores `.antigravitycli/`, the project-local config directory
`agy` auto-creates on every run (upstream issue #175).

* fix(daemon): Antigravity adapter — stdin prompt, brand icon, form loop, empty-output guard

- Switch prompt delivery from argv to stdin (`agy -p -`) to avoid the
  30KB maxPromptArgBytes limit that blocked real-world composed prompts
- Add official Antigravity brand SVG icon to agent picker
- Fix repeated question-form loop for plain agents by injecting an
  OVERRIDE block when form answers are already present in the transcript
- Add empty-output guard for plain agents so expired auth or silent
  failures surface a user-visible error instead of a blank "Done" turn

* feat(daemon): expand Antigravity adapter — model picker, form-loop fix, OAuth launcher, log-file classification

PR #3157 follow-up integrating four iterations from end-to-end manual
testing on Gemini 3.5 Flash + GPT-OSS 120B Medium through `agy` v1.0.3.
Each section is independently verifiable; combined they're what made
the first successful artifact generation work end-to-end.

## Model picker via settings.json (agy has no --model flag)

agy v1.0.3 ships no `--model` CLI flag (upstream issue #35), but the
TUI Switch-Model picker writes the chosen label to
`~/.gemini/antigravity-cli/settings.json`'s `"model"` field, and every
`-p` invocation re-reads that file on startup — verified by capturing
the `--log-file` line `Propagating selected model override to backend:
label="<model>"`. Antigravity's `fallbackModels` now lists the 8
labels its TUI exposes (Gemini 3.1 Pro / 3.5 Flash variants, Claude
Sonnet/Opus 4.6 Thinking, GPT-OSS 120B Medium) and `buildArgs`
persists the user's choice to settings.json right before spawn. The
synthetic `default` id is preserved — picking it leaves settings.json
untouched so a user who switches models from agy's own TUI keeps
their choice.

Introduces `RuntimeAgentDef.supportsCustomModel?: boolean`. AMR's
hardcoded blocklist in `SettingsDialog.tsx` migrates to the
declarative flag (it rejects free-form ids at the ACP layer), and
antigravity opts out because its label set is a server-side enum that
silently fails on unrecognised strings.

## Form-loop fix (transcript sanitizer + stronger OVERRIDE)

The discovery form loop on weak/medium plain-stream models (GPT-OSS
120B Medium, Gemini 3.5 Flash) had two reinforcing causes:

  1. `buildDaemonTranscript` packed the prior assistant turn's
     literal `<question-form>` markup into the user request on the
     next turn, giving the model a template to echo. New
     `sanitizePriorAssistantTurnForTranscript` strips
     `<question-form>...</question-form>` blocks and ```json fences
     that match form-schema shape, replacing them with a brief
     placeholder. User content is preserved verbatim (a user who
     legitimately mentions `<question-form>` in chat keeps their
     message intact).
  2. The OVERRIDE block on form-answered turns was 4 lines and only
     banned the bare `<question-form>` tag — models still emitted the
     fenced JSON, form-asking prose ("Got it — tell me the following"),
     and fake system events ("subagents stopped"). The new
     `FORM_ANSWERED_SYSTEM_OVERRIDE` enumerates each anti-pattern and
     pins them via tests, so silently weakening any line reintroduces
     the regression.

Also adds RuntimeAgentDef.resumesSessionViaCli + RuntimeContext.
hasPriorAssistantTurn as forward-looking abstractions (skipTranscript
option on composeChatUserRequestForAgent). Antigravity does NOT opt
in — agy's `-c` resume activates an internal agentic loop with tool
retries and fallback-to-cached-response on tool errors that the OD
system prompt cannot steer; reverted after seeing byte-identical
form re-emissions caused by agy's own retry logic, not OD's transcript.

## One-click OAuth via system terminal

agy print mode can't complete Google Sign-In on its own (the OAuth
callback page asks the user to paste an auth code back into agy, but
`-p` has no input field). Before this commit the auth banner only
told the user to "open a terminal yourself."

Adds `POST /api/agents/antigravity/oauth-launch` and a cross-platform
launcher in `runtimes/terminal-launch.ts`:

  - macOS:    osascript → Terminal.app `do script "agy"` + activate
  - Linux:    tries x-terminal-emulator, gnome-terminal, konsole,
              xfce4-terminal, xterm in order
  - Windows:  `cmd /c start "Open Design" cmd /k agy`

The endpoint hardcodes the `agy` command (no user input → no shell
injection surface) and is loopback-gated like the other daemon
endpoints. The chat's `AGENT_AUTH_REQUIRED` banner now renders a
"Sign in via terminal" button next to Retry; clicking it spawns the
terminal so the user can finish OAuth in one click.

## Silent-failure classification (auth vs quota via --log-file)

agy print mode is silent on stdout/stderr for both missing-OAuth AND
quota-exhausted failures — the upstream
`RESOURCE_EXHAUSTED (code 429): Individual quota reached` and the
`not logged into Antigravity` line only surface in agy's
`--log-file`. Without log inspection the daemon misread quota as
"auth required" and showed the wrong banner.

`RuntimeContext.agentLogFilePath` carries a daemon-owned per-run temp
path that antigravity's buildArgs translates to `--log-file <path>`.
The empty-output guard now reads that log on a `code === 0 &&
!childStdoutSeen` exit, feeds the tail to
`classifyAgentServiceFailure`, and routes:

  - "not logged into Antigravity"     → AGENT_AUTH_REQUIRED with
                                        antigravityAuthGuidance
  - "RESOURCE_EXHAUSTED" / "quota" /  → RATE_LIMITED with
    "Individual quota reached"          antigravityQuotaGuidance
  - none of the above (rare)          → fall back to auth guidance
                                        as the most likely cause

Both surface a terminal launcher in the auth banner: auth gets "Sign
in via terminal", quota gets "Switch model in terminal" — same
endpoint, contextual label. The handler is identical (open agy in a
terminal); the user either signs in or uses agy's Switch Model
picker to pick a model with available quota.

## Validation

- `pnpm guard` pass
- `pnpm --filter @open-design/daemon` runtime + telemetry suites:
  192 passed, 1 skipped (the 1 pre-existing `task-type` failure on
  origin/main is unrelated to this change)
- `pnpm --filter @open-design/web` typecheck pass; sse / amr-guidance
  / AgentIcon suites pass (51 web tests)
- Manual end-to-end on darwin + Gemini 3.5 Flash and GPT-OSS 120B
  Medium: turn-1 question-form rendered correctly, turn-2 produced
  `<artifact>` with full HTML (3.3KB Modern Minimal design) instead
  of re-emitting the form. agy `--log-file` content correctly
  classified as RATE_LIMITED when Gemini Pro quota was exhausted,
  and as AGENT_AUTH_REQUIRED when keychain was cleared.

* fix(web/test): align amrAgent fixture with supportsCustomModel contract

The AMR agent definition in the daemon ships `supportsCustomModel: false`
so the Settings model picker hides the free-text "Custom…" option. The
PR changed `allowCustomModel` from `selected.id !== 'amr'` (hardcoded)
to `selected.supportsCustomModel !== false` (declarative), but the test
fixture was not updated to carry the same field — causing the
`__custom__` sentinel to appear in the picker under test.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): align formAnswerTransition wording with main + scope build directive to discovery

CI surfaced two failures on the merge with main:
- chat-route.test marks submitted discovery form answers ... expected
  the main-version wording 'Do not emit another <formId> form.'
- telemetry-message-finalization keeps non-discovery form answers
  active ... expected task-type to fall through the else branch
  ('Treat these form answers as the active user turn'), not the
  discovery RULE 2/RULE 3 build branch.

The colleague's earlier fba1e40b form-loop fix tightened both pieces
(stronger wording + grouped discovery|task-type into the build branch)
but didn't update the tests that pin the contract. Revert the
transition wording to main and re-scope the build directive to
'discovery' only. The aggressive form-loop suppression we added in
this PR now lives in the system-prompt FORM_ANSWERED_SYSTEM_OVERRIDE
block, which is far stronger than the user-request transition text
this commit reverts.

* fix(daemon): scope formOverride by form id, detach Linux terminal, move agy log cleanup to finally

- FORM_ANSWERED_GENERIC_OVERRIDE: new exported constant for non-discovery/
  non-task-type form ids; contains only the "do not re-ask" suppression
  without the RULE 2 / RULE 3 / artifact directive.
- formAnswerTransitionForCurrentPrompt: extend build-transition branch to
  include task-type alongside discovery, keeping user-turn and system
  override consistent.
- Prompt assembly (server.ts ~10848): derive formOverride from the parsed
  form id — FORM_ANSWERED_SYSTEM_OVERRIDE for discovery/task-type,
  FORM_ANSWERED_GENERIC_OVERRIDE for all other form ids, empty otherwise.
- launchOnLinux: replace execFileAsync (waited for terminal exit, 3 s cap)
  with spawn({ detached: true, stdio: 'ignore' }) + unref(); resolve on
  the 'spawn' event so long-lived interactive terminals (xterm, konsole)
  are not killed mid-OAuth-flow.
- Antigravity log cleanup: move fs.promises.unlink(agentLogFilePath) into
  a try/finally wrapper around the close handler so every exit path
  (success, failure, cancel, non-zero exit) cleans up the per-run temp
  file, preventing unbounded /tmp accumulation.
- Tests: rename task-type case to assert build-transition behaviour; add
  generic-form-id case (preferences) pinning the non-build path; add
  FORM_ANSWERED_GENERIC_OVERRIDE content assertions.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): switch Antigravity buildArgs to chat subcommand invocation

Replace top-level `-p -` with `agy chat [--log-file …] -` so the adapter
uses the documented chat subcommand and stdin sentinel instead of the
unrecognised global -p flag.  Update the agent-args test description and
all four deepEqual assertions to assert the ['chat', '-'] shape.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* test(daemon): drop real-platform default-launch assertion from terminal-launch suite

The removed test called launchAgentInSystemTerminal('agy') with no
platform override, which invokes the real system terminal on every
developer machine running the daemon test suite (Terminal.app on macOS,
cmd.exe on Windows, xterm/gnome-terminal on Linux). That is an
unacceptable OS side effect for a unit test.

The behaviour being asserted — that omitting platform selects
process.platform — is a TypeScript default-parameter guarantee, not a
runtime invariant that needs an integration test. The remaining 'aix'
case continues to pin the unsupported-platform failure shape.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): buffer Antigravity stdout to suppress auth URL before close-time classifier

The plain-stream close handler at code===0 can detect an agy OAuth
prompt in agentStdoutTail and emit AGENT_AUTH_REQUIRED, but by the
time close fires the stdout chunk has already been forwarded to the
client via the plain-stream `send('stdout', { chunk })` path. This
leaves both the raw OAuth URL and the terminal-launch guidance visible
in chat.

Buffer all stdout chunks for the `antigravity` agent instead of
forwarding them immediately. The existing close-time auth-prompt guard
(code===0, !trackingSubstantiveOutput, childStdoutSeen) returns early
when it detects the auth pattern, leaving the buffer unflushed and the
OAuth URL out of the SSE stream. For legitimate assistant output the
buffer is flushed in order just before design.runs.finish so the
chunks still arrive before the run's finished event.

Adds a chat-route integration test using a fake `agy` that exits 0
after printing the canonical auth prompt; asserts that the run emits
AGENT_AUTH_REQUIRED with no event: stdout delta containing the URL.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* test(daemon): isolate antigravity buildArgs argv test from real settings file

Pass a temp antigravitySettingsPath in the RuntimeContext for the
withModel argv assertion so unit tests do not touch
~/.gemini/antigravity-cli/settings.json. Adds the optional
antigravitySettingsPath field to RuntimeContext and threads it
through buildArgs to writeAntigravityModelSelection; production
callers leave it undefined, preserving the existing default path.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(daemon): revert Antigravity buildArgs to `-p -` (the only working agy v1.0.3 invocation)

The looper-reviewer-bot reported `chat` as agy's headless subcommand
based on its environment's agy build, and looper-fixer applied that
shape. The installed CLI (`agy --version` reports `1.0.3`) does NOT
expose a `chat` subcommand — `agy --help`'s `Available subcommands`
section lists only `changelog / help / install / plugin / update`,
and `agy chat - < prompt` exits 0 with empty stdout (the daemon then
forwards it as a 'successful' empty reply, exactly the failure mode
the auth/quota guard at server.ts ~12090 is meant to catch — for the
wrong reason).

`-p` is the documented print-mode flag (`Short alias for --print`)
and `agy -p -` reads the prompt from stdin and prints the model
reply, which the entire end-to-end test sequence in this PR has
verified against (form-loop fix, settings.json model routing,
log-file classification all confirmed working on Gemini 3.5 Flash
+ GPT-OSS 120B Medium with this invocation).

Updates the agent-args test to pin `['-p', '-']` instead of
`['chat', '-']` and adds an inline comment in antigravity.ts noting
that `chat` may exist in a future agy build but is not the contract
on the installed CLI today.

* fix(daemon): serialize Antigravity concrete-model spawns to dodge settings.json race

Reviewer (looper) flagged a concurrency race in the model-routing path:
~/.gemini/antigravity-cli/settings.json is process-global, so two OD
runs starting close together with different concrete models can race
the file — run A writes model A, run B writes model B, then A's agy
finally reads settings.json and executes on model B. The Settings
model picker becomes nondeterministic under parallel conversations.

Adds a per-process promise chain in antigravity.ts:
  - acquireAntigravityModelLock(): chain-await + return release fn
  - waitForAgyToReadModel(logPath, expected): polls agy's --log-file
    for the upstream signal
      'Propagating selected model override to backend: label="<X>"'
    which model_config_manager.go emits once agy has finished reading
    settings.json. Returns true on observed match, false on timeout.
    Regex-escapes the expected label so '(' / ')' in 'GPT-OSS 120B
    (Medium)' match literally, not as a capture group.

server.ts spawn pipeline now acquires the lock BEFORE buildArgs (which
performs the settings.json write) and schedules a release-once handler
that fires when EITHER (a) the log-file confirms agy read the model
or (b) the child exits — the exit fallback prevents a stuck/crashed
agy from starving the queue for every subsequent antigravity spawn.

Default-model spawns bypass the lock entirely: their buildArgs doesn't
touch settings.json, so there's nothing to serialize.

Tests pin:
  - FIFO ordering across 2 / 3 concurrent acquirers
  - Wait helper's regex correctly matches parenthesized labels
  - Wait helper does NOT match a different model with shared prefix
  - Wait helper swallows missing-log-file errors and returns false on
    timeout (no spawn-pipeline crash if the log never appears)

194 → 198 passing runtime tests, 0 regressions.

* fix(daemon): close Antigravity lock release race on slow agy startup (looper #263fd2fe7)

Reviewer flagged that the previous serialization scheduled
`releaseOnce` in `.finally()` on waitForAgyToReadModel — meaning the
helper's `false` timeout return ALSO released the lock. If agy took
longer than the 15s polling window to read settings.json (cold start,
swap-thrash, slow network handshake to the upstream backend), run A's
lock dropped at 15s, run B rewrote settings.json with model B, and
run A's still-starting agy then read the wrong model. Same race the
original mutex was meant to close.

Fix the release semantics to be release-on-confirmation-only:

  - waitForAgyToReadModel: `false` now strictly means 'I gave up
    polling,' not 'agy definitely did not read this.' Document the
    contract so a future caller can't conflate the two. Add an
    optional AbortSignal so server.ts can stop polling when the child
    exits — without it, the leftover watcher could outlive the run
    and accidentally match a later concurrent run's log content,
    releasing the wrong lock.
  - server.ts: schedule `releaseOnce` only when waitForAgyToReadModel
    returns true. The exit handler (which fires for crashes, fast
    exits, normal completion) is now the canonical fallback that
    releases the lock no matter what — the queue can't starve
    permanently because agy always exits eventually. The exit
    handler also fires the AbortController so the watcher cleans up.

New tests pin:
  - timeout returns false WITHOUT any release-implying side effect
  - already-aborted signal short-circuits (no readFile calls)
  - abort mid-poll wakes the helper from its setTimeout (no
    multi-hundred-ms hang waiting out a poll interval that no longer
    matters)

198 → 201 passing runtime tests, 0 regressions.

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-05-29 05:43:37 +00:00
kami
055680a67d
fix(daemon): dedupe scheduled routine slots (#1971)
* fix(daemon): dedupe scheduled routine slots

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): claim scheduled routine runs atomically

Co-authored-by: multica-agent <github@multica.ai>

* Fix routine loser snapshot rollback

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): defer scheduled routine side effects

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): terminate in-memory run on scheduled prepare failure

If `prepare()` throws after `persistPreparedRun()` has mutated the
routine run with real project/conversation/agentRunId values, the catch
in `RoutineService.start_` previously left the in-memory chat run
queued (no `discard()`), so its `completion` promise hung waiting on
`design.runs.wait(run)` forever, and the `routine_runs` row stayed
pinned to `routine-pending-*` placeholders even though the underlying
project/conversation rows for those real IDs had been created.

The catch now calls `handlerStart.discard?.()` so the in-memory run
terminates as `canceled`, releasing `completion`, and passes the real
IDs through `updateRun` so the persisted failed row reflects what was
attempted instead of the placeholder sentinels. A cleanup failure
inside `discard()` is logged via `console.error` rather than swallowed,
following the same surface-don't-swallow rule the loser cleanup path
uses. The original prepare error is still rethrown so the scheduler
advances to the next cadence (the slot claim is already terminal, so
retrying the same slot would just duplicate-claim and lose).

Added regression coverage in `apps/daemon/tests/routines.test.ts` for
both the normal prepare-failure path (real IDs persisted, discard
fired, completion resolved) and the case where the cleanup itself also
throws (failure surfaces via console.error, the row is still finalized
with the real IDs).

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): clear placeholder IDs on scheduled prepare failure

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): finalize routine prepare failures

* fix(daemon): defer manual routine setup cleanup

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): drop loser chat runs and rollback partial snapshot pins

Two follow-ups from the latest scheduler-claim review:

- Duplicate scheduled losers used to call `design.runs.finish(run, 'canceled')`,
  exposing a phantom canceled routine run on `/api/runs` even though no
  `routine_runs` row, conversation, or messages were ever committed. Split
  the handler tear-down into `discardUnstarted` (used for never-inserted
  paths — drops the in-memory run via the new `design.runs.drop()`) and the
  existing `discard` (used after `prepare()` runs — still finalizes as
  canceled and rolls back partial state).
- `resolvePluginSnapshot()` calls `linkSnapshotToProject()` before linking
  the conversation/run, so a failure mid-link could leave the reused project
  pinned to a snapshot the routine never durably claimed while
  `resolvedRoutineSnapshot` stayed null. Capture the intermediate snapshot
  id in `partiallyAppliedSnapshotId` when the resolver throws, and let
  `discard()` fall back to it for `restoreProjectSnapshotLink` so the
  previous project pin is restored either way.

Regression coverage added in `tests/routine-schedule-claims.test.ts`:

- A scheduled loser does not surface a phantom canceled chat run via
  `/api/runs` after the slot is lost.
- A resolver that throws after `linkSnapshotToProject()` (forced via a
  SQLite trigger on `conversations.applied_plugin_snapshot_id`) still
  restores the reused project's previous pin in `discard()`.

* fix(daemon): return prepared routine run ids

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: kami.c <kami.c@chative.com>
2026-05-29 03:20:47 +00:00
Weston Houghton
20136c4da9
fix(skills): stream-copy fallback when skill staging hits cross-fs EPERM (#3249)
* fix(skills): fall back to a stream copy when skill staging hits EPERM

`fs.cp` copies each file with copy_file_range(2), which the kernel rejects
across some filesystem pairs — e.g. a container image layer (`/app`) copied
onto a ZFS/overlay bind mount (`/data`) — surfacing EPERM. Node doesn't fall
back to a userspace copy, so skill staging failed and degraded to absolute
paths, losing the `.od-skills` write barrier.

Retry recoverable copy errors (EPERM/EXDEV/ENOTSUP/EOPNOTSUPP) with a
dereferencing read/write copy that works across any source/dest filesystem;
non-recoverable errors still degrade as before. A test seam injects a
synthetic EPERM since the real errno only reproduces on those mounts.

* fix(skills): preserve source file mode in the EPERM stream-copy fallback

The cross-filesystem fallback copied contents with createWriteStream, which
opens the destination at the default 0644 and drops the source's exec bit.
Skills shell out to staged helper scripts (e.g.
skills/pptx-html-fidelity-audit/scripts/*.py), so on the EPERM/EXDEV path
this fallback repairs they would fail with EACCES.

chmod (masked to 0o777, so the agent-writable staging copy never inherits
setuid/setgid/sticky) + utimes each copied file from the source stat so the
fallback matches fs.cp's mode/timestamp preservation. Adds a regression test
that stages an executable fixture through the synthetic-EPERM seam and
asserts the exec bit survives.
2026-05-29 03:17:04 +00:00
lefarcen
08c350fb0f
fix(analytics): bucket feedback agent/model directly on the event (#3240)
* fix(analytics): bucket feedback agent/model directly on the event

Reason × agent / reason × model splits on
`assistant_feedback_reason_submit` were 25-74% `unknown` because the
event only carried `run_id` — analyses had to join back to
`run_created/run_finished`, which loses rows whenever the feedback is
given to a message whose run sits outside the query window (the common
case for feedback on older messages), and whose `model_id` was `null`
to begin with (the user didn't pick a specific model — went with the
agent's default).

Carry `agent_provider_id` and `model_id` directly on every feedback
event so the analyses no longer need to join. Replace `null/unknown`
with the `default` bucket via `modelIdForTracking` (and let
`agentIdToTracking` fall through to `other`) at every emit site —
`null` was an analyst-hostile mix of "no selection" and "join failed";
`default` is a real, analysable bucket. On `run_finished`, upgrade the
model to the agent-reported value from initializing/model status
events when the user did not pick one — covers ACP, claude-stream,
copilot-stream, json-event-stream, qoder, pi-rpc.

* fix(analytics): use feedbackAgentProviderIdToTracking and assistantFeedbackModelId for feedback events

Wire API-mode agent ids (anthropic-api → anthropic) and agentName-parsed
model ids through the feedback emit path. Previously the feedback props used
agentIdToTracking (no anthropic-api case) and assistantModelDetail (no
agentName fallback), causing model_id='default' and agent_provider_id='other'
for API-mode agents.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(analytics): extend feedback/run schema for full agent/model coverage

Layered on top of the conflict resolution and the v1 emit switchover
in 0c1b30440. Three things the prior commits did not cover:

1) The v2 `assistant_feedback_*` family (page='studio') shares
   `AssistantFeedbackBase`. Add `agent_provider_id` + `model_id` once on
   the base so all four derived emits (reason_view, click, reason_click,
   reason_submit) carry the same context as the v1 family, instead of
   leaving the v2 dashboard with the same `unknown` gap the v1 PR was
   trying to close.

2) Tighten `FeedbackSubmitResultProps.model_id` and
   `feedbackAgentProviderIdToTracking` from `string | null` /
   `TrackingFeedbackProviderId | null` to non-null. The web emit paths
   already bucket null/empty through `modelIdForTracking` and the
   `?? 'other'` fallback; collapsing that at the helper / contract
   layer means `null` becomes a TS error at every new emit site, so we
   can't regress the unknown bucket again in a future event.

3) Comment on `run_finished.model_id` so reviewers reading
   `finishedModelId` see why the agent-reported value upgrades the
   request-side one.

* fix(analytics): continue event scan past usage to find agent-reported model

The reverse scan for agentReportedModel was broken: the loop broke on
the first usage event (terminal) before ever reaching the status:initializing
or status:model event (emitted at run start, lower index). This meant
run_finished.model_id always fell through to modelIdForTracking(null) =
'default' for any run that reported usage tokens.

Fix: track haveUsageTokens as a flag and defer the break until both usage
tokens are found and either the model is not needed (user picked one) or
the agent-reported model has been captured. Extract the logic into
scanRunEventsForFinishedProps for unit testability.

Tests: six new cases in run-lifecycle-analytics.test.ts cover the
initializing→usage append order, ACP status:model, detail field fallback,
early exit when reqBodyModel is set, no-status event, and empty events.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(analytics): guard usage block with !haveUsageTokens to prevent early events overwriting terminal tokens

In the reverse-scan loop of scanRunEventsForFinishedProps, the usage block
lacked a !haveUsageTokens guard. When needAgentModel is true and the
agentReportedModel lives at the start of the run (lower index), the loop
walks all the way back past multiple usage events (one per step/turn in
multi-step runs), overwriting inputTokens/outputTokens on each pass. The
surviving values were those of the earliest step, not the terminal total.

Adding !haveUsageTokens to the usage block condition ensures only the first
(terminal) usage event seen in reverse sets the token counts; subsequent
earlier usage events are skipped while the scan continues for agentReportedModel.

Adds a test case for initializing(model) → usage(step1) → usage(terminal)
asserting both terminal token counts and agentReportedModel.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
2026-05-29 03:06:06 +00:00
Amy
1c2a1c4459
Add launch review regression coverage and stabilize daemon tests (#3207)
* Add launch review E2E regression coverage

* Harden daemon launch review regressions

* Stabilize daemon runtime tests

* fix(tests): restore e2e preflight typing

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* fix(tests): make fake plugin runtime ESM-safe

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* Stabilize e2e fake agent and regression tests

* fix(tests): repair fake agent cjs runtime

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* fix(review): harden plugin authoring checks

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)

* fix(tests): bind plugin authoring run to seeded conversation

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)
2026-05-29 02:39:33 +00:00
Caprika
cd1790abab
Harden AMR Link startup model discovery (#3198)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
actionlint / Lint GitHub Actions workflows (push) Failing after 1s
ci / Detect CI change scopes (push) Successful in 0s
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-staging / Deploy landing page to staging (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Validate Nix flake (push) Has been skipped
ci / Preflight (push) Failing after 1s
ci / Workspace unit tests (push) Failing after 2s
ci / Daemon workspace tests (push) Failing after 2s
ci / Web workspace tests (push) Failing after 1s
ci / Browser tests (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
2026-05-28 14:45:23 +00:00
Caprika
56fe5c5036
fix(amr): stage external image attachments into workspace (#3226)
* fix(daemon): forward AMR image attachments through ACP

* fix(amr): stage external image attachments into workspace

* fix(amr): stage prompt image paths safely
2026-05-28 14:44:00 +00:00
lefarcen
b8cdf5f0ea
feat(mcp): generation loop + one-click Codex install (#3141)
* feat(mcp): add project creation, capability discovery, and generation tools

Lets an external coding agent (Codex, Cursor, …) drive a full design
loop over `od mcp`, not just read/write files: create a project,
discover what Open Design can make, commission a generation run, poll
it, and open the result in a browser. Complements the existing
write_file / delete_file / delete_project management tools.

New tools:
- create_project — make an empty project to generate into (start_run
  needs one). Derives a slug id from the name unless given.
- list_skills / list_plugins — discover what you can ask OD to make.
- start_run / get_run / cancel_run — commission a run (OD spawns its
  own agent), poll to completion, cancel. start+poll because MCP is
  request/response and generation is minutes-long.
- get_run / get_project now return a browser-openable previewUrl
  (entry file served raw; HTML entries render directly).

The external agent never runs a skill itself — it commissions OD to,
so the prior "skills not on MCP" boundary no longer applies.

* feat(mcp): make get_run preview hint directive

Reword the hint MCP clients receive when a run finishes so the agent
is more likely to surface the previewUrl to the user proactively —
mention the user-facing browser explicitly and call out that clients
with a built-in browser pane (e.g. Codex CLI's right-side browser)
should navigate to it directly. Also nudge start_run's hint to flag
that a previewUrl will arrive on success, so the agent knows what to
do with it before it ever sees get_run.

Pure text change; no behavior change in the tool surface or daemon.

* feat(mcp): one-click Install / Remove for Codex from Settings

Adds a toggle button on Settings → Integrations → Codex panel that runs
`codex mcp add open-design …` / `codex mcp remove open-design` via the
daemon, so users no longer need to copy TOML and paste it into
~/.codex/config.toml by hand. The copy-snippet path is unchanged and
remains the fallback when the Codex CLI isn't on PATH.

The daemon shells out to Codex CLI rather than rewriting config.toml
itself — that way we inherit Codex's own merge / dedupe / validation
rules and only track its argv. The runner is dependency-injected for
testability.

New endpoints (under /api/mcp/install/codex/*):
- GET status — probes `codex mcp get open-design`; returns
  { available, installed } so the UI can render the toggle state.
- POST — runs `codex mcp add open-design --env K=V … -- <node> <cli.js> mcp`,
  reusing the same payload as /api/mcp/install-info.
- DELETE — runs `codex mcp remove open-design`.

The web UI renders the toggle only inside the Codex client panel
(`client.id === 'codex'`). When Codex CLI is missing it shows a
disabled button with an explanatory hint instead of vanishing, so users
know why one-click isn't available.

* feat(mcp): teach agents to clarify ambiguous format requests

When the user asks for a "PPT" / "deck" / "slides" / "PDF" / "doc",
that's two very different deliverables: Open Design natively produces
browser-viewable HTML/SVG (including HTML-rendered decks), but the
user may actually want a binary .pptx / .docx / .pdf — which OD does
NOT produce and which the agent would have to export from OD's output
itself. Add a paragraph to the MCP server instructions telling the
agent to ASK which one is wanted before kicking off work, rather than
silently picking one or dual-tracking both paths.

Pure prompt-text change in the instructions block; no tool surface or
behavior change. Costs ~10 lines of session-init context (one-time
per MCP session), versus dual-tracked .pptx hedging Codex was
otherwise doing on every ambiguous request.

* feat(mcp): surface agent messages, skip OD discovery, slim list_plugins

Three fixes uncovered while exercising the full MCP-driven generation
loop end-to-end with a real Codex client. Each one is a real
blocker / footgun for the external agent.

1. get_run now includes agentMessage — the inner agent's textual
   output reassembled from the SSE event stream. Without this, runs
   that ended in a discovery-style clarifying question (e.g. a
   <question-form>) looked like "succeeded with empty output" mysteries
   to the outer agent. The hint now branches on whether previewUrl
   exists: with preview = show preview + relay agentMessage as the
   inner agent's note; no preview = relay agentMessage as the actual
   deliverable (almost always a clarifying question).

2. create_project sets skipDiscoveryBrief:true by default. The outer
   agent IS the user-facing surface for MCP-driven runs, so OD's own
   interactive discovery stage just creates a confusing
   nested-clarification loop where its question form ends up dropped
   (no files = no artifact). Better to let the outer agent gather
   requirements and pass a precise prompt or plugin to start_run.

3. list_plugins flattens the daemon's bulky 16-field plugin record
   (fsPath, sourceMarketplaceId, installedAt, …) into the few fields
   an agent actually picks plugins on: id, title, description, kind,
   tags. description / kind come from manifest.description /
   manifest.od.{taskKind,kind} which the previous pass-through dropped
   on the floor.

* feat(mcp): smart entry fallback + list_agents

Two fixes uncovered by exercising the full Codex-driven loop on a real
machine. Both close the gap between "Open Design has the data" and
"the external agent can find it".

1. get_project / get_run now fall back to scanning the project's file
   list when metadata.entryFile is missing. We hit the case where
   write_file (and a half-finished inner-agent run) put a perfectly
   viewable index.html into the project, but metadata.entryFile stayed
   null — so the outer agent got no previewUrl from MCP and resorted
   to guessing a file:// path. Priority: declared entryFile, then
   index.html anywhere, then a single .html at the project root.
   Pure read-side change; no extra fetch when entryFile is already
   set.

2. list_agents lets the outer agent stop guessing 'claude' / 'codex' /
   'gemini' for start_run.agent. The daemon already exposed
   /api/agents with 19 supported CLIs and an `available` flag. The
   MCP wrapper defaults to filtering to installed agents only (so the
   agent never picks one whose binary won't spawn), with
   includeUnavailable:true as an opt-in to see uninstalled ones plus
   their installUrl. Models truncated to 10 with modelsCount carrying
   the real total — keeps the response token-economical even for
   agents (opencode) with 100+ models.

* feat(mcp): tell the outer agent runs take 5–30 min, don't bypass

Direct response to a real Codex client observably cancelling an
in-flight run after 3 polls and substituting its own write_file
output ("文件时间戳没推进 → 我直接覆盖生成") — exactly the failure
mode this MCP surface exists to avoid.

start_run's hint and the session-init instructions block now both
state explicitly:
  - Runs typically take 5–30 minutes.
  - status:running with unchanged file mtimes is the inner agent
    thinking, NOT a hang.
  - Do not cancel_run out of impatience.
  - Do not substitute write_file as a "faster" workaround — that
    discards OD's pipeline-driven design quality.
  - Poll every 30–60 seconds; report "still working" to the user
    between polls.
  - Only call cancel_run if the user explicitly asks.

Pure prompt-text change; no surface or behavior change. Costs ~10
lines of one-time session-init tokens + ~80 more tokens per
start_run response, in exchange for the outer agent actually
trusting the run.

* feat(mcp): persist run events to disk + expose tail-able path

Closes the in-flight visibility gap that made real Codex clients
cancel a 24-min run after 3 polls and substitute their own
write_file output, simply because polling get_run showed no change.

Daemon: every SSE event is now mirrored to a JSON-Lines file at
<RUNTIME_DATA_DIR>/runs/<runId>/events.jsonl. The path is wired
through createChatRunService's new `runsLogDir` option (null
disables, preserving legacy in-memory-only behavior). statusBody
exposes the path as `eventsLogPath`. Failures are best-effort — a
broken stream destroys itself and the run keeps going on the
in-memory event log (SSE clients are unaffected).

MCP: get_run already passed statusBody through, so eventsLogPath
surfaces automatically. The new value is that get_run during a
running status now adds a directive hint telling the outer agent to
`tail -n 50 -f <path>` in its own shell to see live progress —
that's the signal that makes the agent trust the run and stop
cancelling. The succeeded-status hint mentions the path too, for
forensics. No new tool; the field rides existing get_run polls.

Spec-first throughout:
  - runs.test.ts adds 4 tests covering write-per-emit, statusBody
    field, null-runsLogDir back-compat, and the no-IO guarantee
    when persistence is disabled.
  - mcp-runs.test.ts adds 1 test for the running-status hint.

* fix(mcp): get_run hint directs callers to pass project explicitly

The success hint in get_run previously said "project defaults to this
run's project", which is misleading: get_artifact has no run context and
falls back to /api/active when project is omitted, not to the run's
project. A client following the old guidance after creating a fresh or
non-active project could fetch the wrong project's files or fail with
"no active project".

The hint now embeds the run's projectId and tells callers to pass it
explicitly: get_artifact({ project: "<id>" }). A focused regression test
in mcp-runs.test.ts verifies the hint contains the projectId and does
not contain the incorrect active-context fallback guidance.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(contracts): add eventsLogPath to ChatRunStatusResponse

The daemon's statusBody() returns eventsLogPath but the shared DTO
lacked this field, leaving web/CLI/MCP callers without a typed
accessor.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mcp): bind MCP runs to OD conversations + studio deep links

Closes the last gap that made MCP-driven runs feel like a parallel
side door: the user could not see the conversation in OD's studio
page even though the run was real, finished, and had files.

Daemon side: POST /api/runs now falls back to the project's default
conversation when the caller (MCP / SDK) only supplied projectId.
It synthesizes an assistantMessageId, writes a user message with the
prompt as content, and lets the existing
`pinAssistantMessageOnRunCreate` helper create the empty assistant
row. The existing `appendMessageAgentEvent` accumulation path then
streams text_delta events into the assistant row's content — same
as the web /api/chat flow. The response body now echoes the
resolved conversationId + assistantMessageId so MCP callers can
build a deep link.

`buildMcpInstallPayload` now also surfaces `webBaseUrl` (read from
OD_WEB_PORT, the env tools-dev exports for the web listener). MCP
clients use it to build studio deep links.

MCP side: `start_run`, `get_run`, `get_project` now return a
`studioUrl` — a browser-facing OD URL pointing at the studio page
that shows the file preview AND the chat history side by side. The
hint on each tool was updated to tell the outer agent to hand
studioUrl to the user as the primary link (previewUrl falls back to
raw-file when the user only wants the rendered output). The
webBaseUrl is fetched once via /api/mcp/install-info and cached for
5s to keep per-poll cost flat; a tiny `_resetWebBaseUrlCache` export
lets tests start each case with a clean cache.

Contracts: `ChatRunCreateResponse` gains optional conversationId +
assistantMessageId; `ChatRunStatusResponse` gains optional
eventsLogPath. Both additive, no consumer breakage.

Spec-first throughout:
  - get_run includes studioUrl on success when webBaseUrl + conversationId are available
  - get_run omits studioUrl when webBaseUrl is null
  - start_run returns studioUrl and conversationId for the new run
  - get_project returns studioUrl using the project default conversation

* fix(mcp): add skill/skillId to start_run so listed skills are actionable

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* fix(test): update mcp-get-project test to handle getWebBaseUrl fetch

The get_project handler now calls getWebBaseUrl (added with the studio
deep-link feature), which fetches /api/mcp/install-info. The test mock
only handled the /api/projects/:id URL and expected a single fetch call,
causing the assertion to fail with "called 2 times" instead of 1.

Fix: handle the /api/mcp/install-info URL in the fetch mock (returning
webBaseUrl: null), update the call count expectation to 2, and call
_resetWebBaseUrlCache in afterEach to prevent cache bleed between tests.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mcp): tell agents to render studioUrl as a clickable markdown link

Observed in a real Codex client: Codex received studioUrl correctly
but rendered it as inline code (gray code-span), which its built-in
browser pane does NOT make clickable. The user had to copy-paste the
URL into a browser by hand even though Codex / Cursor / Zed all
auto-link markdown `[label](url)` syntax and would navigate it in
their right-side preview pane.

The three studioUrl-mentioning hints now explicitly tell the agent
to render the URL as a markdown link (e.g.
`[Open Open Design studio](URL)`) and never as inline code or bare
text. Pure prompt-text change.

* fix(runs): resolve default agent when MCP caller omits agentId; add McpRunCreateRequest contract type

- POST /api/runs: when no agentId is provided, resolve from app-config
  or first available CLI before spawning — mirrors the pattern the
  routine handler already uses. Prevents 'unknown agent: undefined'
  failures on the create_project -> start_run(prompt) MCP path.
- packages/contracts: add McpRunCreateRequest interface for the
  projectId-only / SDK caller shape so typed callers can construct the
  request without casts. Exported via index.ts's existing chat re-export.
- packages/contracts/tests: add compile fixture verifying projectId-only,
  projectId+message, and projectId+message+agentId shapes all type-check.
- apps/daemon/tests: add mcp-runs test asserting agent arg omitted in
  start_run does not include agentId in the POSTed body.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
2026-05-28 11:29:11 +00:00
Mason
50f85b509a
fix(analytics): fill run and feedback metadata (#3194)
* fix(analytics): fill run and feedback metadata

* fix(analytics): map feedback API providers
2026-05-28 11:05:56 +00:00
Denis Redozubov
71ad9eb292
fix(media): enforce legacy media policy for run tokens (#3205) 2026-05-28 11:02:04 +00:00
Bernardxu123
0f1c95e1de
feat(agents): add DeepSeek Reasonix CLI support via ACP protocol (#2952)
* feat(agents): add DeepSeek Reasonix CLI support via ACP protocol

* fix(auth): tighten Reasonix auth detection + add live model discovery

- Anchor isReasonixAuthFailureText to Reasonix-specific markers
  (~/.reasonix/config.json instead of generic api_key matches)
- Add fetchModels with detectAcpModels for live model discovery
  in the agent picker (matches kimi/hermes/kilo/kiro/vibe pattern)

* feat(web): add Reasonix short description in agent picker

---------

Co-authored-by: Bernardxu123 <Bernardxu123@users.noreply.github.com>
2026-05-28 09:36:22 +00:00
Denis Redozubov
c847ace554
Add run-scoped media execution policy (#3106)
* feat(contracts): add run media execution policy

* feat(daemon): enforce run media execution policy

* test(daemon): cover media execution policy gates
2026-05-28 09:19:40 +00:00
kami
0bfb4803e7
feat(daemon): add Phase 2C CLI wrappers (#2179)
* feat(daemon): add phase 2c cli wrappers

Co-authored-by: multica-agent <github@multica.ai>

* fix: handle desktop-gated CLI imports

Co-authored-by: multica-agent <github@multica.ai>

* fix: pass sidecar ipc path to agent wrappers

Co-authored-by: multica-agent <github@multica.ai>

* fix: make agent wrapper env explicit

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): preserve CLI import and diff edge cases

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 09:08:31 +00:00
leessju
381e9a96e2
Make share deploys visibly complete (#2843)
* Make share deploys visibly complete

Share deploys were uploading only the referenced entry graph, so sibling screens could fall through to provider fallback pages after deployment. They also completed silently except for the result link block inside the deploy dialog, leaving users unsure whether a redeploy finished.

This includes visible files for Open Design-managed projects in real deploy/preflight payloads while preserving the selected entry as provider-root index.html. Linked-folder projects stay on the referenced-file graph so repo files that are visible in the file panel, like README.md or src/**, do not become public by accident. The web UI also shows a localized success toast at the top of the app after a successful Vercel or Cloudflare Pages upload.

Constraint: Cloudflare Pages Direct Upload serves missing files through its fallback behavior, so deployment payload completeness must be handled before upload.

Constraint: Linked-folder projects can expose arbitrary repository content through the file panel, so whole-project deploy expansion is limited to Open Design-managed project directories.

Rejected: Reintroduce an entry-file dropdown | users wanted full project deployment semantics rather than selecting a root-only artifact.

Rejected: Upload every visible linked-folder file | would make non-runtime repo content publicly reachable after Share deploy.

Confidence: high

Scope-risk: moderate

Directive: Do not remove the selected-entry-to-index.html mapping; it keeps alternate entries like index-v1.html deployable as the root without overwriting them with the launcher.

Directive: Do not expand linked-folder deploys beyond referenced web assets without an explicit user opt-in and review of the privacy model.

Tested: pnpm --filter @open-design/daemon test tests/deploy.test.ts tests/deploy-routes.test.ts

Tested: pnpm --filter @open-design/web test tests/components/FileViewer.test.tsx

Tested: pnpm --filter @open-design/web typecheck

Tested: pnpm guard

* fix(web): gate share-deploy ready hint on actual ready state

The 'Ready · Deployed URL' hint was unconditionally rendered whenever
deployResultCards was non-empty, so a successful deploy that came back
as link-delayed or protected showed contradictory copy next to the
'Public link pending' / 'Deployment protection enabled' badge.

Render the hint only when deployResultState(activeDeployment?.status)
is 'ready' so the success line stays consistent with the badge below.

---------

Co-authored-by: nicejames <nicejames@gmail.com>
2026-05-28 08:56:11 +00:00
hahalolo
fe24c8addf
chore: remove dead inline /api/agents route handler in server.ts (#2945)
Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-05-28 08:54:11 +00:00
初晨
693176457e
fix(daemon): skip chat probe for SenseAudio media models (#3181) 2026-05-28 08:52:34 +00:00
Marc Chan
338cb4d423
fix(platform): support live system proxy changes (#3093)
* fix(platform): support live system proxy changes

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Honor lowercase proxy env vars within a single source before merging proxy-aware envs.\n\nGenerated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Refresh provider request proxy env on each dispatcher creation and cover it with a focused regression test.

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): enable node env proxy for user proxy vars

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)

* fix(platform): support live system proxy changes

Generated-By: looper 0.9.1 (runner=fixer, agent=opencode)
2026-05-28 06:11:47 +00:00
lefarcen
df8a0faff6
feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent

AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.

The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.

Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.

Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)

Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).

* fix(runtimes/amr): pin a concrete default model and bare openai ids

End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:

1. vela rejects `session/prompt` with `session/set_model must be called
   before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
   skips set_model whenever the picked model is the synthetic 'default'
   id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
   def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
   default option and `fallbackModels[0]`, which makes attachAcpSession
   always send a real `session/set_model` for AMR turns.

2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
   it forwards to opencode's openai provider. With OpenRouter-style ids
   like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
   `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
   The new fallback list ships the bare ids opencode's openai registry
   actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).

Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
  way real vela does, so a regression that silently goes back to
  model: 'default' would surface as a fatal error in tests instead of a
  hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
  no 'openai/' prefix in fallbackModels, and a negative case that
  asserts session/prompt fails when no model is set.

Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.

Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.

* fix(runtimes/amr): substitute concrete model when chat run sends 'default'

A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:

  json-rpc id 3: session/set_model must be called before session/prompt

The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.

This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:

  1. If the caller picked a real id, pass it through.
  2. Else, if `def.defaultModelEnvVar` is set and the daemon process
     env has a non-empty value for it, return that (operator escape
     hatch — see below).
  3. Else, if the def's `fallbackModels` does NOT contain a 'default'
     id, return `fallbackModels[0].id`.
  4. Else, return the original value (the historic shape — defs that
     list 'default' themselves are untouched).

AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Worth noting the env var must live in the daemon's `process.env`
(Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver) — the new field's docblock spells this out.

Coverage:
- `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all
  four resolver branches plus the env-override happy path / fallback /
  ignore-when-user-picked-a-real-id case.
- `pnpm --filter @open-design/daemon typecheck` clean.

* chore(runtimes/amr): move AMR to the top of the base agent list

So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.

* feat(amr): Sign-in / Sign-out button on the AMR Settings card

The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:

  - GET  /api/integrations/vela/status   reads ~/.vela/config.json
    for the active profile and returns { loggedIn, profile, user }
    (without leaking the runtime/control keys themselves).
  - POST /api/integrations/vela/login    spawns `vela login` once
    (409 if one is already in flight). The vela CLI opens the user's
    browser to the device-authorization page itself — Open Design
    only needs to kick the subprocess off.
  - POST /api/integrations/vela/logout   removes ~/.vela/config.json
    so the next status read returns logged-out.

`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(nested interactive elements). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.

i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.

Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
  against a tmp HOME — including the negative case where a profile
  has user info but no runtimeKey (still logged-out), and the
  secret-leak guard ("rt-secret-*" must not appear in the projection
  payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
  states (logged-out, logging-in, logged-in, logging-out) plus the
  click-propagation invariant the divergent card was built to keep.

`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.

Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.

* fix(connection-test): substitute model for AMR before attachAcpSession

The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, which made the Test Connection button on the AMR Settings
card deadlock with the same `session/set_model must be called before
session/prompt` error the chat-run path already handles — surfaced as a
permanent "Testing connection…" spinner in the UI.

Reuse the same helper here so Test Connection mirrors chat-run behavior.

* test(amr): three-layer end-to-end coverage for the AMR login + turn flow

The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:

1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
   Spins up the real daemon Express app via `startServer({port:0,...})`,
   persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
   and exercises every /api/integrations/vela/* endpoint against the
   extended fake-vela stub:
     - status reads ~/.vela/config.json under various states
     - login spawns the fake, waits for config.json to appear, returns
       pid + startedAt + profile
     - 409 already-running guard with the stub's delay knob
     - logout removes the file (idempotent)
     - secrets (runtimeKey / controlKey) never leak in the projection
     - login → status round-trip flips loggedIn=false → true

2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
   Boots a namespaced daemon + web pair through `createSmokeSuite`,
   inlines a self-contained fake `vela` binary that handles BOTH
   `vela login` (writes ~/.vela/config.json) and
   `vela agent run --runtime opencode` (ACP stdio with the
   `session/set_model must precede session/prompt` gate the real binary
   enforces), then drives a complete /api/runs lifecycle for
   `agentId: 'amr', model: 'default'` and asserts the assistant message
   captures the fake's streamed text. This is the test that would have
   surfaced today's plugin-default-model regression (the `set_model
   before prompt` error) at PR time instead of demo time.

3. e2e/ui/amr-login-pill.test.ts (Playwright)
   Mocks /api/agents + /api/integrations/vela/{status,login,logout}
   to drive the Settings AMR card through the full Sign in → Signed in
   → Sign out cycle. Pins the AmrLoginPill polling contract and the
   aria-label semantics (the pill's accessible name is "Sign out" once
   logged in, regardless of which label the hover-state text shows).

fake-vela.mjs extensions:
   - Handles `vela login` argv by writing
     ~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
     mirrors real vela's on-disk side-effect without the device-auth
     loop.
   - FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
     in-flight state of the spawn lifecycle.
   - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
     user fields end-to-end.

Validated:
   - `pnpm guard` + `pnpm typecheck` (all workspace projects)
   - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
     including the new 8-test integration suite.
   - `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
   - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
     1 / 1 passing (6.7s).

* feat(amr): package native cli and refine login ui

* feat(amr): wire vela cli beta packaging

* docs(amr): document vela ci packaging review

* docs(amr): refine vela ci integration review

* fix(ci): refresh nix pnpm dependency hashes

* fix(pack): clean up Vela CLI packaging

* fix(pack): bundle Vela CLI support files

* fix(amr): recover login attempts from stale auth state

* test: expand AMR and automations coverage

* fix(amr): address review follow-ups

* test(web): align tasks fixtures with contracts

* fix(daemon): type wildcard route params

* fix(ci): refresh PR merge validation

* fix(amr): clear env credentials on logout

* feat(settings): inline local CLI model configuration

* fix(amr): recognize daemon env credentials

* [codex] Fix Vela companion packaging (#2979)

* Fix Vela companion packaging

* Update Nix pnpm dependency hashes

* [codex] Surface AMR account failures (#2980)

* fix: surface AMR account failures

* fix: cover AMR recovery error guidance

* chore: bump beta base version to 0.8.1 (#2990)

* Fix AMR profile and packaged runtime review issues

* Detect packaged AMR OpenCode companion tree

* feat(web): polish AMR frontend flows

* Polish AMR onboarding card

* fix: read AMR login state from dot-amr config (#3048)

* test: tighten AMR credential and packaging coverage

* test: restore AMR executable test env helper

* [codex] Fix packaged mac Dock identity and AMR label (#3076)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR live models and dot-amr login state (#3073)

* fix: read AMR login state from dot-amr config

* fix: load live AMR models before runs

* fix: point AMR onboarding link to production wallet

* fix: address AMR model review feedback

* fix: persist live AMR model fallback

* [codex] Fix AMR link catalog model ids (#3088)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR link catalog model ids

* Fix AMR model normalization typecheck

* Use live AMR model for default runs

* fix: polish AMR runtime settings UI

* Accelerate AMR startup defaults (#3092)

* Surface AMR insufficient balance wallet URL (#3099)

* fix(web): polish onboarding controls (#3112)

* fix(web): show CLI scan loading state

* Avoid duplicate AMR wallet recharge links (#3117)

* Avoid duplicate AMR wallet recharge links

* Use Vela CLI 0.0.3 test package

* chore(nix): refresh pnpm deps hash

* Fix AMR wallet guidance display

---------

Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>

* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)

* chore(nix): refresh pnpm deps hash

* chore(pack): pin Vela CLI 0.0.3

* chore(nix): refresh pnpm deps hash

* fix(web): suppress AMR exit 130 fallback (#3136)

* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)

* feat(web): nudge users to hosted AMR on model/auth/quota failures

When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.

Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.

* feat(web): rework failed-run AMR guidance into per-case error UI

Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:

- The error card is now neutral gray (was red) and always carries a
  retry button; it is driven by the persisted per-message error event so
  it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
  card under the error card offers "switch to AMR & retry" — switches the
  run to AMR, opens Settings on the AMR card, and auto-retries once the
  account signs in (ProjectView polls vela login status, independent of
  the Settings pill lifecycle, with success / 5-min-timeout / unmount
  exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
  wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
  pulses, and an authorize-button coachmark (a fake hand cursor that
  rises in and dismisses on hover) points at the sign-in control when not
  yet authorized.

analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.

* fix(daemon): require status context for numeric service-failure codes

Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.

* fix(web): keep error pill for failed runs ChatPane's card doesn't cover

Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.

* fix(daemon): don't treat a process exit code as an HTTP status

Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.

* chore(web): translate AMR card / error keys for 16 remaining locales

PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.

* fix(amr): address review feedback on #2355

Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.

- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
  verifies the inner `opencode` executable exists + is runnable, not
  just the directory. This closes the false-positive availability path
  that let `detectAgents()` surface AMR as available even when the
  packaged companion was empty / partially copied (mrcfps, 4 threads).

- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
  the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
  stale `opencode` on the user's PATH, so packaged AMR builds can't be
  hijacked by a global installation.

- web/EntryShell.tsx: when the Local CLI scan returns an available
  agent and the previously-selected agent is AMR, switch the selection
  to the first available local agent so the runtime and persisted
  agent agree before Continue.

- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
  BEFORE rejecting on an empty live-model catalog — a signed-out user
  was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
  the correct `AMR_AUTH_REQUIRED` (sign-in affordance).

- server.ts (default model fallback): if the user asked for the AMR
  agent default and the cached id is no longer in the FRESH catalog,
  fall back to `liveModels[0]` from the probe instead of rejecting the
  run as `AMR_MODEL_UNAVAILABLE`.

- integrations/vela.ts: route `vela login` through
  `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
  shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
  verbatim args (matches `execAgentFile` / chat-run spawning).

- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
  the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
  to the container-side path. The host path was being passed in as-is
  even though the default container only mounts /project, /tools-pack
  and cache/home — `copyOptionalVelaCliBinary` saw a missing path.

Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
  UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
  for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
  prematurely blocked — `@powerformer/vela-cli` only publishes the
  `darwin-arm64` platform binary today; adding the flag elsewhere
  would fail the builds. Revisit once win/x64/linux binaries ship.

* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)

The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
`const` is TDZ-aware, that branch would have thrown `ReferenceError:
Cannot access 'sendAmrAccountFailure' before initialization` for the
exact users it tries to help — defeating the original intent.

Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.

Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch
uses packaged built-in Vela for AMR` was creating the
`<resourceRoot>/bin/libexec/opencode/` companion *directory* only, but
this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree`
also requires the inner `opencode` executable. Add it to that fixture
to match the new contract; the test was a sibling of the executables /
env-and-detection fixtures already updated in 13fc4f4.

Addresses #2355 review (mrcfps, 2026-05-28).

* feat(web): add hover cancel for AMR login (#3158)

* feat(web): add hover cancel for AMR login

* fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel

Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR
flagged the same race in the new local-cancel path: `handleCancelLogin`
dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately
after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener
unconditionally re-enters `refresh()` and then restarts polling
whenever `/api/integrations/vela/status` still reports
`loginInFlight: true`.

That is a real race because the daemon's `cancelVelaLogin()` only sends
SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` =
2000 ms) and keeps the child in `activeLoginProcs` until it actually
exits — so the first `/status` read after a successful cancel can
legally still come back as in-flight. Under that window the pill flips
back to 'Signing in…' and can later surface the timeout/error path even
though the user already canceled, defeating the behavior promised in
the PR description.

Fix the listener instead of every dispatch site: in the
`login-canceled` branch, after the local reset (stopPolling +
setPending(null) + clear refs), optimistically mark every subscribed
pill instance as not-in-flight (`setStatus((c) => c ? { ...c,
loginInFlight: false } : c)`) and `return` — skip the
refresh-and-reconcile branch below entirely. The next explicit refresh
(component mount, user interaction, or a `status-changed` event) will
pick up the daemon's confirmed state once the child has actually
exited.

Add a focused regression test that holds `/api/integrations/vela/status`
at `loginInFlight: true` even after a successful `/login/cancel`,
asserting that the pill stays at the Canceled → Authorize sequence and
never bounces back to 'Signing in…'. This test fails on the pre-fix
listener and passes on the new behavior; existing
'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser
completion to Signed in after local cancel' tests continue to pass.

Addresses review feedback on #3158 (chatgpt-codex-connector, nettee).

---------

Co-authored-by: lefarcen <935902669@qq.com>

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
Co-authored-by: Amy <1184569493@qq.com>
Co-authored-by: Mason <jinmeihong0201@gmail.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-28 05:09:55 +00:00
吴杨帆
72426b942a
fix(daemon): align Codex launch permissions on Windows and WSL (#3037)
Use danger-full-access when WSL_DISTRO_NAME is set and pass
default_permissions=":workspace" so newer Codex builds can write
inside the project directory instead of staying read-only.
2026-05-28 04:12:26 +00:00
YOMXXX
269a385ee2
fix(daemon): reconcile missing artifact manifests on run end (#2893) (#3110)
* fix(daemon): reconcile missing artifact manifests on run end (#2893)

When an agent writes HTML via write_file instead of create_artifact,
no .artifact.json manifest sidecar is created. If the run then
terminates (inactivity watchdog, user cancel, or process exit), the
HTML file exists on disk but the manifest is missing — breaking the
artifact panel, finalize, and export flows.

Add a best-effort reconciliation step in the child.on('close') handler
that lists project HTML files and calls reconcileHtmlArtifactManifest
for any missing sidecars. The IIFE runs asynchronously after
design.runs.finish() so it never blocks run finalisation.

* fix(daemon): scope run-end reconciliation to files modified during the run

The review on #3110 flagged that listing the entire project tree and
reconciling every HTML file without a sidecar is too broad — for
imported-folder projects (metadata.baseDir), pre-existing HTML files
would receive spurious manifests.

Record runStartTimeMs at the beginning of startChatRun and filter the
reconciliation loop to only touch HTML files whose mtime >= that
timestamp. Add a regression test that backdates a pre-existing HTML
file and verifies it is skipped while a new file is reconciled.

* test(daemon): fix mtime ordering in reconciliation regression test

The runStartTimeMs was recorded after writing the new file, so its
mtime fell before the threshold and the reconciliation filter skipped
it. Move the timestamp capture to before the write to match the real
startChatRun semantics.
2026-05-28 03:54:48 +00:00
lefarcen
2302efa5db
fix(amr): close ACP stdin on abort so vela tears down OpenCode (#3097)
* fix(amr): close ACP stdin on abort so vela tears down OpenCode

When an AMR (vela) run is cancelled, attachAcpSession.abort() sent a
`session/cancel` RPC but left the child's stdin open. The vela ACP bridge
keeps running until it sees EOF (or is signaled), and it only shuts down
its private OpenCode `serve` process on a clean exit — so on abort the
OpenCode server lingered until the caller's SIGTERM fallback, and leaked
entirely if the parent was killed before cleanup ran.

End stdin after sending the cancel (mirroring the clean-completion path)
so the agent receives EOF and shuts down its own runtime promptly,
independent of signal timing.

* fix(amr): end stdin on abort even before session/new resolves

Addresses review on #3097: abort() still returned early when sessionId
was unset, so the stdin EOF only happened after session/new completed.
Cancelling during ACP startup (before the session exists) left the
OpenCode-teardown window open until the caller's SIGTERM fallback — and a
parent hard-kill before that could still strand the private OpenCode
process.

Move stdin.end() out of the sessionId guard so abort always closes stdin
when the pipe is writable; gate only the session/cancel RPC on sessionId.
Add a regression test that aborts during startup and asserts stdin is
ended with no session/cancel emitted.
2026-05-27 15:40:22 +00:00
lefarcen
b8cfee0c60
fix(diagnostics): capture daemon/web logs in packaged bundles (#3126)
Packaged diagnostics bundles never contained the daemon or web
`latest.log` — the very logs that hold the agent/critique run flow — so
support exports could not explain "sent prompt to the agent, then
nothing happened" reports.

Root cause: the sidecar `base` means different things per launch path.
tools-dev passes the pre-namespace source root, so
`resolveNamespaceRoot(base, namespace)` is correct. But the packaged
orchestrator launches every child with `base = <namespaceRoot>/runtime`
(apps/packaged/src/{paths,sidecars}.ts) while logs live a level up at
`<namespaceRoot>/logs`. The diagnostics builders re-appended the
namespace and resolved every log to
`<namespaceRoot>/runtime/<namespace>/logs/...` → ENOENT. renderer.log
only survived by accident: the desktop main process wrote it to the
same wrong path the reader looked in.

Add `resolveRuntimeNamespaceRoot(runtime, contract, runtimeMode)` to
`@open-design/sidecar` which walks up out of the `runtime/` dir in
packaged (runtime-mode) launches and falls back to the dev layout
otherwise. Route the desktop renderer-log path and both diagnostics
exporters (desktop IPC + daemon HTTP) through it so writer and reader
stay in lockstep and renderer.log lands next to the desktop log dir.

Tests: sidecar unit specs for both layouts; a daemon export spec that
writes a real `<namespaceRoot>/logs/daemon/latest.log` and asserts the
bundle captures its contents (red on main → ENOENT placeholder, green
here).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:48:14 +00:00
初晨
3abcb3a4d2
fix(connectors): expire stale auth credentials (#2385)
* fix(connectors): expire stale auth credentials

Mark connector credentials as expired when provider reads report auth-shaped failures so Memory stops presenting stale connected apps as healthy.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(connectors): avoid expiring grants on platform 401

Only delete connector credentials for provider tool errors attributable to the current connector so Composio platform auth failures do not wipe valid grants.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 14:44:10 +00:00
Siri-Ray
170a05f5d2
Formalize skill artifacts into plugins (#3085)
* Add skill-to-plugin candidate flow

* Fix skill plugin candidate card reuse

Generated-By: looper 0.9.1 (runner=fixer, agent=codex)

* Fix skill plugin candidate dismiss and URL gates

Generated-By: looper 0.9.1 (runner=fixer, agent=codex)

* Polish skill plugin candidate copy
2026-05-27 08:26:00 +00:00
吴杨帆
2a56f1bca9
fix(daemon): map Claude Not logged in output to /login guidance (#1928) (#3050)
Treat Claude Code stdout like "Not logged in · Please run /login." as an
auth failure in diagnoseClaudeCliFailure so connection tests and chat
runs surface actionable login guidance instead of raw CLI text.
2026-05-27 06:34:09 +00:00
吴杨帆
916438d919
fix(daemon): hide agent executable paths from chat status (#2874) (#3046)
Stop emitting resolved filesystem paths in chat start events and
inactivity-timeout diagnostics; surface agent ids instead.
Complements web-side redaction in #2894.
2026-05-27 06:22:56 +00:00
吴杨帆
8268253f61
fix(daemon): detect CodeWhale as DeepSeek TUI fallback binary (#3025)
* fix(daemon): detect CodeWhale as DeepSeek TUI fallback binary

The renamed CodeWhale CLI installs the `codewhale` dispatcher instead of
`deepseek`. Probe it via fallbackBins so agent detection works without
requiring DEEPSEEK_BIN overrides.

Fixes #2983

* test(daemon): align deepseek docsUrl expectation with CodeWhale metadata

Update env-and-detection coverage to match the runtime metadata URL
changed for issue #2983.
2026-05-27 03:04:40 +00:00
吴杨帆
8e52617738
fix(daemon): resolve imported design systems in user catalog (#3014)
Import/install routes compared bare directory slugs against catalog ids
prefixed with user:, causing a false 500 after a successful write and
duplicate entries on retry. Normalize lookup and reserved slug ids.

Fixes #2489
2026-05-26 15:00:25 +00:00
Amy
5563e7eca6
test: expand home entry and html preview coverage (#2992)
* test: cover entry topbar and hero flows

* test: expand entry and html preview coverage

* test: isolate mocked github stars in home entry e2e

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* chore: retrigger CI for PR 2992
2026-05-26 14:48:35 +00:00
chaoxiaoche
fce444bcab
Consolidate chat comments preview on main (#2906)
* feat(web): queue chat sends

* feat(web): render code comment directives

* feat(web): add preview comments and manual edits

* fix(web): polish shared chrome controls

* fix(web): align queued send loading state

* feat(web): open primary project artifacts

* fix(web): keep queued sends and tests aligned

* fix(web): restore docked comment tools layout

* fix(web): align preview comment toolbar

* fix(web): place local cli beside handoff

* fix(web): move agent menu beside handoff

* fix(web): make project instructions a direct header action

* fix(web): compact handoff and toolbar labels

* fix(web): clarify handoff menu and annotation label

* fix(web): restore compact cursor handoff trigger

* fix(web): align agent menu trigger with handoff

* fix(web): add draw toolbar close action

* fix(web): move inspect editing into edit mode

* fix(web): avoid reserving comment sidebar in annotation mode

* fix(web): float preview comments panel

* fix(web): keep edit canvas full width

* fix(web): polish preview annotation tools

* fix(web): highlight active preview comments

* fix(web): open comments panel after annotation save

* fix(web): polish comment handoff controls

* fix(web): remove palette preview tool

* fix(web): simplify draw annotation toolbar

* fix(web): restore queued tasks into composer

* fix(web): restore queued send strip styling

* fix(web): hide internal comment target ids

* fix(web): align manual edit panel header

* test(web): cover visual interaction contracts

* fix(web): address PR feedback regressions

* fix(web): preserve artifact chrome state

* fix(daemon): restore project raw file routes

---------

Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-26 10:31:19 +00:00