mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
257 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
79c039efdf |
feat(daemon): introduce session mode for conversations
- Added a new `session_mode` column to the `conversations` table with a default value of 'design'. - Implemented logic to handle `session_mode` in conversation creation, updates, and retrieval. - Enhanced the API to support `session_mode` in conversation requests, allowing for 'chat' or 'design' modes. - Updated the web application to include a session mode toggle, enabling users to switch between chat and design modes seamlessly. - Adjusted system prompts to reflect the current session mode, providing context-aware responses. This feature enhances the user experience by allowing for more flexible conversation management, catering to different interaction styles. |
||
|
|
8cc11e38cc |
feat(daemon): add interactive terminal support with node-pty
- Introduced a new terminal service to manage interactive terminal sessions. - Added routes for creating, streaming, and managing terminal sessions in the daemon. - Integrated terminal functionality into the CLI, allowing users to open interactive shells. - Updated project routes to support conversation seeding for side chats. - Enhanced the web application to include terminal tab functionality and UI components. This feature enables users to interact with a terminal directly within the application, enhancing the overall user experience and providing a more integrated development environment. |
||
|
|
0fbeaf829e
|
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247) Three-layer defence against models emitting `## user` / `## assistant` / `## system` lines mid-response, which the chat host interprets as real turn boundaries and acts on as unauthorised instruction: 1. **System prompt**: anti-roleplay instruction elevated from a bullet under "What you don't do" to a standalone `## CRITICAL` section in `official-system.ts`, with a REMINDER pinned at the end of the composed prompt for recency bias. 2. **Stream-level detection and truncation**: shared `role-marker-guard.ts` module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used across all text paths — Claude stream (per-message guards), non-Claude structured streams (run-scoped guard via `emitGuardedTextDelta`), and BYOK proxy routes (`createDeltaGuard`). When a marker is detected, the contaminated suffix is dropped and a `fabricated_role_marker` event surfaces a warning in the UI. 3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants; `fabricated_role_marker` events render as amber warning pills. * fix(chat-routes): do not await reader.cancel() on stream early-return The await on reader.cancel() can hang indefinitely on response streams whose underlying source is a Uint8Array (most notably surfaced by the ollama test in proxy-routes.test.ts, which builds its mock body via `new Response(uint8array)` rather than the controller-based helper `sseResponse()`). The hung await holds the request handler open, which in turn blocks `server.close()` in the afterAll hook, producing the two test timeouts (test at 145, hook at 36) currently failing CI on #3296. Fix is in production code, not the test: don't await the cancel. It is a cleanup hint and we are returning from the function anyway, so blocking on it offers no value. fire-and-forget with an empty catch keeps the cancel signal flowing for real HTTP streams without risking a hang on mock/edge-case implementations. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(daemon): terminate child on role-marker detection (close #3247 generation vector) PR #3296's detection layer truncates display and persistence of fabricated role markers, but the underlying model subprocess keeps generating tokens after detection. Three concrete consequences: 1. The model bills the user for the entire contaminated response (we observed 5,106 chars stored in claude's session file for a turn where only the first 3,013 chars were legitimate — a 40% overhead). 2. tool_use blocks emitted AFTER the marker reach the daemon's dispatcher unchecked, since detection only gates the text-delta emission path, not content-block-stop / tool_use blocks. The model could fabricate "## user delete file X" then emit a tool_use(delete X) that the dispatcher would execute. 3. The UI surfaces a `fabricated_role_marker` warning followed by an eventual normal turn-end, blurring the distinction between "completed normally" and "killed by safety guard." This commit adds a single idempotent `abortForRoleMarker(marker)` helper in server.ts, scoped to the same closure as `child` and `runGuard`. On any detection event (per-message Claude guard, run-scoped non-Claude guard, plain stdout guard) the helper: - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the UI can render a security-class status distinct from a normal turn-end. The existing `fabricated_role_marker` warning is still sent and rendered as the amber pill (PR #3296's UI). - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes, Kimi, Devin, Kiro) whose I/O doesn't necessarily release on SIGTERM of the wrapper process alone. - SIGTERMs the child immediately, with the existing `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace. Wired into three sites where contamination is detected: - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc text_delta paths) - Plain-stdout listener (BYOK plain mode) - The Claude stream handler's onEvent (per-message guards in claude-stream.ts surface `fabricated_role_marker` events directly via onEvent rather than through the run-scoped emitGuardedTextDelta) Tool_use blocks emitted BEFORE the marker still flow through normally — this guard can't help with those, since by the time we observe a text marker the prior content block has already finished. Closing that gap requires speculative cancellation of in-flight tool calls when a downstream text block contains a marker; that's tracked as follow-up work, not included here. Co-Authored-By: roverkai <2196140098@qq.com> Co-Authored-By: JasonBroderick <jason@buddyboss.com> * refactor(role-marker-guard): bounded tail + drop chat-style markers Addresses two review comments on #3303: (1) O(1) memory + per-delta work (review r3323982225) Replace the unbounded `accumulated` string with a rolling tail capped at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice). A 50 KB assistant response delivered in 1000 chunks of 50 bytes was previously O(n²) on string concatenation alone; now it is O(1) per delta regardless of message length. The `tail.length` value carries the "already emitted" offset that the cut-point math needs, so the offset semantics at L74–78 of the prior implementation are preserved without re-introducing the full-text buffer. (2) Drop chat-style markers entirely (review r3323982234, option (a)) `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex. Rationale: - The host parses ONLY `## user` / `## assistant` / `## system` lines as turn boundaries (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts). A model emitting chat-style markers does NOT cause the original #3247 security failure. - With kill-on-detection wired in this PR (`abortForRoleMarker` in server.ts), a false positive aborts the whole run — far more expensive than a stray unflagged `User:` line in chat scrollback. Chat-style markers collide with legitimate output (form labels, email contacts, JSDoc) often enough that pairing them with kill-semantics is the wrong tradeoff. The tradeoff is now documented in the regex docblock so the kill-on-match behaviour is justified against the false-positive surface. Also aligns the prompt-side CRITICAL block in system.ts: drop the "don't emit User: / Assistant: / Human: / AI:" bullet, since we no longer enforce it. Less ambiguity for the model and the operators. Test file updated: - Chat-style positive tests flipped to negative ("does NOT match User: — chat-style out of scope") so the intentional exclusion has a permanent regression test. - Two new tests cover the bounded-tail behaviour: a marker arriving after 10 KB of clean text in small chunks, and a marker straddling a chunk boundary after 100 prior chunks. - Added test for legitimate `User: bob@example.com`-style content not triggering contamination. Test count is now 35 (up from 25); two of the new ones explicitly exercise the new bounded-tail path. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995) Blocking correctness bug introduced by commit 4 (bounded-tail refactor): once \`tail\` is a rolling slice of mid-stream text, \`^\` in the canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents the genuine message start. As the rolling window slides forward chunk by chunk, a sliced tail can begin with whitespace + \`##\` (or just \`##\`), letting \`^\` anchor a match against text that the full-buffer implementation correctly ignored. With kill-on-detection wired in commit 3, that false positive now SIGTERMs the run and emits a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class called out in the docblock at L22–29. Reviewer's evidence (PerishCode, r3324060995): streaming "…take a look at the ## user content section…" one character at a time reports \`contaminated: true\` post-refactor; the same text in a single feed stays clean. Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very first non-empty feed (where \`^\` legitimately points at the message start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\` (\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all subsequent feeds. A \`firstChunk\` boolean tracks the state. Real newline-preceded markers straddling chunk boundaries are still caught because the preceding \`\\n\` is retained inside the 64-char tail. Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`): - mid-line \`## user\` streamed char-by-char with no preceding \\n (mirrors the reviewer's repro) - space-preceded mid-line \`## user\` in a >130-char stream, which long enough to force the rolling window past the marker — exercises the exact slice condition that triggered the bug - real \\n-preceded \`## user\` still caught after a long preamble (positive case must not regress) - \`## user\` as the very first chunk still caught (\`^\` legitimately anchors on the first feed) Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882) Two refinements addressing the third review on #3303: == Blocking (r3324151877) == The regex over-matched legitimate Markdown headings, and with kill-on-detection wired in commit 3 each false positive deterministically aborts a real run. Three changes tighten the match to the actual security surface — `## user` / `## assistant` / `## system` lines the chat host parses as turn boundaries — without losing any real attack pattern: 1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary delimiter is lowercase (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and the `## CRITICAL` system-prompt block already forbids only the lowercase forms. Title-Case headings like `## User Guide`, `## System Architecture`, `## Assistant settings` are now ignored — these are legitimate technical writing patterns LLMs emit constantly. `## USER NOTES` (all-caps) likewise no longer flags. 2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it, `## userland`, `## userspace`, `## users guide`, `## systemd`, `## assistance` all match via prefix in the alternation. The lookahead requires the next character to exist and to not be a lowercase letter, so: - `## user\\n…` → match (newline is not lowercase) - `## assistantR…` → match (R is uppercase; the glued-form attack pattern still gets caught) - `## assistant.` → match (. is not a letter) - `## users guide` → no match (s is lowercase letter) - `## userland` → no match (l is lowercase letter) POSITIVE rather than NEGATIVE `(?![a-z])` because the negative form is satisfied at end-of-string, which in a streaming context means "we have `## user` but don't know what comes next yet" — would fire prematurely if `land` arrives in a later chunk. The positive form delays detection by one character in that edge case, traded for correctness. 3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role markers are single-line by convention; restricting to space/tab prevents oddities like `##\\nuser` from matching across lines. Test file: added Title-Case fixtures (`## User Guide`, `## System Architecture`, `## Assistant settings`, `## USER NOTES`) and prefix-of-longer-word fixtures (`## users guide`, `## userland`, `## systemd`, `## assistance`) — each asserting NO contamination. The existing `## usability` negative test gave false confidence as the reviewer noted (only failed via alternation-miss, not via word-boundary semantics); the new fixtures actually exercise the lookahead. Also added a positive test for `## assistant.` (glued punctuation) to balance the existing `## assistantReading` (glued uppercase) coverage. Total tests: 35 → 50. == Non-blocking (r3324151882) == Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in `packages/contracts/src/errors.ts` alongside the existing agent/AMR codes, with a docblock comment explaining the emission contract: emitted by `server.ts::abortForRoleMarker` alongside the existing `fabricated_role_marker` warning event when the daemon detects a fabricated Markdown role marker in agent output; retryable. The code was already being emitted over the wire but unregistered — landing the registration here keeps the contract and emitter in sync as reviewer requested. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): defer complete-but-unconfirmed marker suffix Addresses review r3324277xxx — the boundary case where a stream chunk boundary lands between the role keyword and its lookahead character violated the documented "everything from the marker onward is silently dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')` returned `## user` as safe (no char to satisfy the lookahead → no match → pass through), so the fabricated marker line leaked into UI and app.sqlite before the next chunk confirmed contamination on the next SIGTERM cycle. Fix: introduce a `pending` state variable holding bytes that match the COMPLETE-but-unconfirmed marker prefix at end of buffer (/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no lookahead, $ anchor instead). When the no-match branch detects this suffix, withhold it from emission until the next feed either: - Confirms it (next char non-lowercase) → main regex matches → contaminated → withheld bytes dropped along with `## user`. - Denies it (next char lowercase, e.g. `userl…`) → main regex no longer matches the role keyword → withheld suffix is released and emitted alongside the new continuation. Also tied the firstChunk transition to actual byte emission rather than feed count. Previously a message that starts with `## system` followed by a separate `\\n` chunk would lose the `^` anchor on the second feed (firstChunk had flipped after the first feed even though nothing was emitted yet), silently breaking detection for that edge case. Now `firstChunk` stays true until at least one byte has crossed the emission boundary, matching the conceptual definition of "message start". Tests added (apps/daemon/tests/role-marker-guard.test.ts): - `## user` deferred at chunk boundary, confirmed by `\\n` in next - `## user` deferred at chunk boundary, denied by `land` continuation - `## assistant` deferred, confirmed by punctuation - `## User` Title-Case still passes through unconditionally - `## system` as the very first chunk: deferred, confirmed by \\n in next chunk (tests the firstChunk-stays-true-when-nothing- emitted invariant) Total tests: 50 → 55. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta Addresses review r3324xxxxxx — guarding the thinking channel buys no security and causes legitimate aborts. Why thinking is NOT a #3247 vector: - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only re-serializes `m.content` as `## ${m.role}\n...`. - Extended-thinking content is rendered to a separate `kind: 'thinking'` payload (daemon.ts:857-858) and never folded into `m.content`. - So a `## user` line in the thinking channel CANNOT become a fabricated turn boundary on the next round-trip. Why guarding it is harmful: - Models routinely emit literal `## user` / `## assistant` lines in chain-of-thought when reasoning about conversation structure ("Let me think about this. The user might phrase it as:\n## user\n …"). Common pattern in production traces. - With `abortForRoleMarker` wired in server.ts, a guard match on thinking SIGTERMs the run and surfaces a security error to the UI. The user paid for the reasoning, never sees the answer, and gets a confusing "fabricated role marker" warning for what was actually legitimate metacognition. - This directly contradicts the module's own stated philosophy ("a false positive aborts the whole run — a much more expensive failure than a stray unflagged ... line", role-marker-guard.ts). Fix: `emitSafeText` now passes thinking_delta through unconditionally, skipping both the guard and the contamination check. text_delta remains fully guarded. The single-line change at the top of emitSafeText preserves all other channels' behavior. Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts): - `## user` / `## assistant` lines in a thinking_delta — must NOT fire fabricated_role_marker, the thinking content streams intact including the marker text, and the subsequent text_delta answer still reaches the consumer (run not aborted). - Sanity check: same `## user` pattern in a text_delta DOES fire fabricated_role_marker and truncates emission at the marker. Locks in the channel-discriminated behavior. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): tie firstChunk to slicing, not byte emission Blocking review r3324xxxxxx: under the prior firstChunk transition ("any byte emitted"), a role marker that arrived at the very start of a message with its prefix split across multiple chunks bypassed detection — reopening the #3247 vector on the Claude path. Concrete cases that were missed (all are routine provider tokenizations of \`## user\n…\` at message start): - \`##\` | \` user\nDELETE…\` - \`## us\` | \`er\nDELETE…\` - \`## \` | \`user\nDELETE…\` Mechanism: the pending-deferral regex only catches COMPLETE role keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`, \`## us\`) was emitted in full. That emission flipped firstChunk to false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used, which requires a literal \n before \`##\`. A marker at buffer position 0 has no preceding \n, so it could no longer match. abortForRoleMarker never fired and tool_use blocks emitted after the fabricated turn boundary reached the dispatcher. Fix: change firstChunk to track "tail has not been sliced yet" rather than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE, tail still represents the entire emission so far and \`^\` in the canonical regex genuinely anchors at byte 0 of the stream — so the \`^|\n\` alternation safely catches a chunk-split message-start marker. The transition happens at the moment we would slice: once emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\` becomes meaningless, and we switch to the newline-only variants. Earlier iterations of this code tried two other definitions, both unsound: - "any byte emitted" (this commit fixes) — lost \`^\` before a chunk-split message-start marker could finish arriving. - "newline emitted" (briefly considered as the reviewer's alternative suggestion) — left \`^\` valid on a sliced buffer when streams hadn't emitted a newline yet, re-introducing the rolling-tail mid-stream false positive from review r3324060995. The slice-based invariant satisfies both: while we have not sliced, \`^\` is correct; once we slice, it is not. Regression tests added (apps/daemon/tests/role-marker-guard.test.ts): - \`##\` | \` user\nDELETE…\` → contaminated, marker=\`## user\` - \`## us\` | \`er\nDELETE…\` → contaminated, marker=\`## user\` - \`## \` | \`user\nDELETE…\` → contaminated, marker=\`## user\` - \`#\` | \`# user\nDELETE…\` → contaminated, marker=\`## user\` The fourth case (single \`#\` first chunk) exercises an even more adversarial tokenization than the reviewer's examples; it is also caught. Total tests: 55 → 59. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(tests): wrap events in stream_event envelope in thinking test feedJsonl was feeding raw events without the `{ type: 'stream_event', event: ... }` wrapper that createClaudeStreamHandler requires (line 141 of claude-stream.ts). Events silently fell through all branches, making both tests pass vacuously. Also fix TS2532 on warnings[0].marker with non-null assertion (safe after the toHaveLength(1) guard). Co-Authored-By: RoverKai <roverkai@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: roverkai <2196140098@qq.com> Co-authored-by: JasonBroderick <jason@buddyboss.com> Co-authored-by: RoverKai <roverkai@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
65802542a2
|
fix(chat): surface OpenCode usage-limit/provider failures instead of a bare timeout (#3316)
* fix(chat): surface OpenCode provider failures from its log on a silent stall OpenCode's headless `run --format json` mode swallows provider failures: a 429 usage-limit is marked retryable and retried silently with nothing on stdout/stderr, so the chat run only dies via the inactivity watchdog and the daemon shows a bare "request timed out" with no reason. The real error (statusCode + "Monthly usage limit reached…") is recorded only in OpenCode's own session log. On a failed OpenCode close where stdout/stderr carry no signal, read the newest OpenCode session log, extract the latest `service=llm` provider error (scoped to that one line so the embedded request body can't contaminate the classification), and emit a structured, retryable SSE error (RATE_LIMITED / AGENT_AUTH_REQUIRED / UPSTREAM_UNAVAILABLE) carrying the provider's message. Refs #982. * fix(chat): emit recovered OpenCode failure from the watchdog path, bound to the run Addresses review on #3316. Blocking: the recovery previously ran only in the child-close handler, but in the inactivity-watchdog stall path (the exact case this targets) failForInactivity sends its error and finish()es the run — which clears run.clients — before the child closes. So the structured error reached zero live SSE clients and only surfaced on reload. Recover and send the OpenCode failure inside failForInactivity, before finish(), on the same pre-teardown send path the generic stall message already uses. Keep the close-handler branch for the case where OpenCode exits non-zero on its own (clients still attached). Non-blocking: bind the log lookup to the current run via an mtime gate (since=run.createdAt) so a stale or concurrent session's error can't be misattributed — skip log files last written before the run started. * docs(opencode-log): note the concurrent-run limitation of the mtime gate * fix(chat): skip close-handler failure emit when the watchdog already finished the run Non-blocking review follow-up on #3316: on the silent-stall path both failForInactivity and the child-close handler fired for the same run, so the recovered RATE_LIMITED error was sent twice and the events-log stream was reopened after finish() had closed it. Guard the close-handler failure emit with !design.runs.isTerminal(run.status) — the watchdog already sent the error and finalized the run; finalization below still runs (finish() no-ops once terminal). |
||
|
|
9b9a18af5b
|
fix(daemon): validate skillId on POST/PATCH /api/projects against runtime source-of-truth (#3293)
* fix(daemon): validate skillId on POST /api/projects against runtime source of truth * fix(daemon): validate skillId on PATCH /api/projects/:id, sharing the POST validator * test(daemon): cover skillId canonicalization, design-template ids, empty-string + null normalization, type rejection |
||
|
|
76c7d31c53
|
chore: bump vela cli to 0.0.4 (#3239)
* chore: bump vela cli to 0.0.4-test.0 * chore: refresh lockfile for vela cli 0.0.4-test.0 * chore(nix): refresh pnpm deps hash * fix: materialize electron before mac release checks * fix: rebuild electron when mac framework links are invalid * revert: drop release workflow experiments * chore(nix): refresh pnpm deps hash * fix: stop blocking beta mac release on electron symlink preflight * fix: stop using custom electron dist for beta mac packaging * fix: guard oversized chat images and opencode overflow * chore: bump vela cli to 0.0.4 * chore(nix): refresh pnpm deps hash * fix(daemon): surface prompt-image stat failures instead of dropping them resolveSafePromptImagePaths only swallowed unresolvable path input; once a path was confirmed inside UPLOAD_DIR and existed, a statSync failure (EACCES/EPERM, a file vanishing mid-run) silently dropped the image and let the run continue without that prompt context. Since this helper is now also the 1 MB enforcement point, that turned an infra/validation failure into a 'successful' run with missing required context. Collect those into a new failedImages bucket and fail the run with INTERNAL_ERROR at the call site, mirroring the oversized-image guard. Add a unit test covering statSync throwing. --------- Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com> Co-authored-by: lefarcen <935902669@qq.com> |
||
|
|
98a2c63973
|
feat(daemon): add Antigravity agent adapter (#3157)
* feat(daemon): add Antigravity agent adapter
Adds Google Antigravity (`agy` CLI) as a coding-agent runtime. Detection
picks up `agy` on PATH, the daemon spawns `agy -p "<prompt>"` for a
single non-interactive turn, and the assistant text reply streams back
on stdout. OAuth is shared with the Antigravity IDE through the system
keyring, so users who have signed into the desktop app are authenticated
on first run with no extra step.
`agy` v1.0.3 has no JSON / stream-json / ACP output mode (upstream issue
#119), no `--model` flag (issue #35), and no MCP forwarding hook yet —
the adapter ships with `streamFormat: 'plain'` and a single `default`
fallback model so the model picker doesn't mislead users into thinking
their choice is wired through. We will upgrade buildArgs + add a
dedicated event parser when upstream ships structured output.
Also gitignores `.antigravitycli/`, the project-local config directory
`agy` auto-creates on every run (upstream issue #175).
* fix(daemon): Antigravity adapter — stdin prompt, brand icon, form loop, empty-output guard
- Switch prompt delivery from argv to stdin (`agy -p -`) to avoid the
30KB maxPromptArgBytes limit that blocked real-world composed prompts
- Add official Antigravity brand SVG icon to agent picker
- Fix repeated question-form loop for plain agents by injecting an
OVERRIDE block when form answers are already present in the transcript
- Add empty-output guard for plain agents so expired auth or silent
failures surface a user-visible error instead of a blank "Done" turn
* feat(daemon): expand Antigravity adapter — model picker, form-loop fix, OAuth launcher, log-file classification
PR #3157 follow-up integrating four iterations from end-to-end manual
testing on Gemini 3.5 Flash + GPT-OSS 120B Medium through `agy` v1.0.3.
Each section is independently verifiable; combined they're what made
the first successful artifact generation work end-to-end.
## Model picker via settings.json (agy has no --model flag)
agy v1.0.3 ships no `--model` CLI flag (upstream issue #35), but the
TUI Switch-Model picker writes the chosen label to
`~/.gemini/antigravity-cli/settings.json`'s `"model"` field, and every
`-p` invocation re-reads that file on startup — verified by capturing
the `--log-file` line `Propagating selected model override to backend:
label="<model>"`. Antigravity's `fallbackModels` now lists the 8
labels its TUI exposes (Gemini 3.1 Pro / 3.5 Flash variants, Claude
Sonnet/Opus 4.6 Thinking, GPT-OSS 120B Medium) and `buildArgs`
persists the user's choice to settings.json right before spawn. The
synthetic `default` id is preserved — picking it leaves settings.json
untouched so a user who switches models from agy's own TUI keeps
their choice.
Introduces `RuntimeAgentDef.supportsCustomModel?: boolean`. AMR's
hardcoded blocklist in `SettingsDialog.tsx` migrates to the
declarative flag (it rejects free-form ids at the ACP layer), and
antigravity opts out because its label set is a server-side enum that
silently fails on unrecognised strings.
## Form-loop fix (transcript sanitizer + stronger OVERRIDE)
The discovery form loop on weak/medium plain-stream models (GPT-OSS
120B Medium, Gemini 3.5 Flash) had two reinforcing causes:
1. `buildDaemonTranscript` packed the prior assistant turn's
literal `<question-form>` markup into the user request on the
next turn, giving the model a template to echo. New
`sanitizePriorAssistantTurnForTranscript` strips
`<question-form>...</question-form>` blocks and ```json fences
that match form-schema shape, replacing them with a brief
placeholder. User content is preserved verbatim (a user who
legitimately mentions `<question-form>` in chat keeps their
message intact).
2. The OVERRIDE block on form-answered turns was 4 lines and only
banned the bare `<question-form>` tag — models still emitted the
fenced JSON, form-asking prose ("Got it — tell me the following"),
and fake system events ("subagents stopped"). The new
`FORM_ANSWERED_SYSTEM_OVERRIDE` enumerates each anti-pattern and
pins them via tests, so silently weakening any line reintroduces
the regression.
Also adds RuntimeAgentDef.resumesSessionViaCli + RuntimeContext.
hasPriorAssistantTurn as forward-looking abstractions (skipTranscript
option on composeChatUserRequestForAgent). Antigravity does NOT opt
in — agy's `-c` resume activates an internal agentic loop with tool
retries and fallback-to-cached-response on tool errors that the OD
system prompt cannot steer; reverted after seeing byte-identical
form re-emissions caused by agy's own retry logic, not OD's transcript.
## One-click OAuth via system terminal
agy print mode can't complete Google Sign-In on its own (the OAuth
callback page asks the user to paste an auth code back into agy, but
`-p` has no input field). Before this commit the auth banner only
told the user to "open a terminal yourself."
Adds `POST /api/agents/antigravity/oauth-launch` and a cross-platform
launcher in `runtimes/terminal-launch.ts`:
- macOS: osascript → Terminal.app `do script "agy"` + activate
- Linux: tries x-terminal-emulator, gnome-terminal, konsole,
xfce4-terminal, xterm in order
- Windows: `cmd /c start "Open Design" cmd /k agy`
The endpoint hardcodes the `agy` command (no user input → no shell
injection surface) and is loopback-gated like the other daemon
endpoints. The chat's `AGENT_AUTH_REQUIRED` banner now renders a
"Sign in via terminal" button next to Retry; clicking it spawns the
terminal so the user can finish OAuth in one click.
## Silent-failure classification (auth vs quota via --log-file)
agy print mode is silent on stdout/stderr for both missing-OAuth AND
quota-exhausted failures — the upstream
`RESOURCE_EXHAUSTED (code 429): Individual quota reached` and the
`not logged into Antigravity` line only surface in agy's
`--log-file`. Without log inspection the daemon misread quota as
"auth required" and showed the wrong banner.
`RuntimeContext.agentLogFilePath` carries a daemon-owned per-run temp
path that antigravity's buildArgs translates to `--log-file <path>`.
The empty-output guard now reads that log on a `code === 0 &&
!childStdoutSeen` exit, feeds the tail to
`classifyAgentServiceFailure`, and routes:
- "not logged into Antigravity" → AGENT_AUTH_REQUIRED with
antigravityAuthGuidance
- "RESOURCE_EXHAUSTED" / "quota" / → RATE_LIMITED with
"Individual quota reached" antigravityQuotaGuidance
- none of the above (rare) → fall back to auth guidance
as the most likely cause
Both surface a terminal launcher in the auth banner: auth gets "Sign
in via terminal", quota gets "Switch model in terminal" — same
endpoint, contextual label. The handler is identical (open agy in a
terminal); the user either signs in or uses agy's Switch Model
picker to pick a model with available quota.
## Validation
- `pnpm guard` pass
- `pnpm --filter @open-design/daemon` runtime + telemetry suites:
192 passed, 1 skipped (the 1 pre-existing `task-type` failure on
origin/main is unrelated to this change)
- `pnpm --filter @open-design/web` typecheck pass; sse / amr-guidance
/ AgentIcon suites pass (51 web tests)
- Manual end-to-end on darwin + Gemini 3.5 Flash and GPT-OSS 120B
Medium: turn-1 question-form rendered correctly, turn-2 produced
`<artifact>` with full HTML (3.3KB Modern Minimal design) instead
of re-emitting the form. agy `--log-file` content correctly
classified as RATE_LIMITED when Gemini Pro quota was exhausted,
and as AGENT_AUTH_REQUIRED when keychain was cleared.
* fix(web/test): align amrAgent fixture with supportsCustomModel contract
The AMR agent definition in the daemon ships `supportsCustomModel: false`
so the Settings model picker hides the free-text "Custom…" option. The
PR changed `allowCustomModel` from `selected.id !== 'amr'` (hardcoded)
to `selected.supportsCustomModel !== false` (declarative), but the test
fixture was not updated to carry the same field — causing the
`__custom__` sentinel to appear in the picker under test.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(daemon): align formAnswerTransition wording with main + scope build directive to discovery
CI surfaced two failures on the merge with main:
- chat-route.test marks submitted discovery form answers ... expected
the main-version wording 'Do not emit another <formId> form.'
- telemetry-message-finalization keeps non-discovery form answers
active ... expected task-type to fall through the else branch
('Treat these form answers as the active user turn'), not the
discovery RULE 2/RULE 3 build branch.
The colleague's earlier
|
||
|
|
055680a67d
|
fix(daemon): dedupe scheduled routine slots (#1971)
* fix(daemon): dedupe scheduled routine slots Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): claim scheduled routine runs atomically Co-authored-by: multica-agent <github@multica.ai> * Fix routine loser snapshot rollback Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): defer scheduled routine side effects Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): terminate in-memory run on scheduled prepare failure If `prepare()` throws after `persistPreparedRun()` has mutated the routine run with real project/conversation/agentRunId values, the catch in `RoutineService.start_` previously left the in-memory chat run queued (no `discard()`), so its `completion` promise hung waiting on `design.runs.wait(run)` forever, and the `routine_runs` row stayed pinned to `routine-pending-*` placeholders even though the underlying project/conversation rows for those real IDs had been created. The catch now calls `handlerStart.discard?.()` so the in-memory run terminates as `canceled`, releasing `completion`, and passes the real IDs through `updateRun` so the persisted failed row reflects what was attempted instead of the placeholder sentinels. A cleanup failure inside `discard()` is logged via `console.error` rather than swallowed, following the same surface-don't-swallow rule the loser cleanup path uses. The original prepare error is still rethrown so the scheduler advances to the next cadence (the slot claim is already terminal, so retrying the same slot would just duplicate-claim and lose). Added regression coverage in `apps/daemon/tests/routines.test.ts` for both the normal prepare-failure path (real IDs persisted, discard fired, completion resolved) and the case where the cleanup itself also throws (failure surfaces via console.error, the row is still finalized with the real IDs). Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): clear placeholder IDs on scheduled prepare failure Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): finalize routine prepare failures * fix(daemon): defer manual routine setup cleanup Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): drop loser chat runs and rollback partial snapshot pins Two follow-ups from the latest scheduler-claim review: - Duplicate scheduled losers used to call `design.runs.finish(run, 'canceled')`, exposing a phantom canceled routine run on `/api/runs` even though no `routine_runs` row, conversation, or messages were ever committed. Split the handler tear-down into `discardUnstarted` (used for never-inserted paths — drops the in-memory run via the new `design.runs.drop()`) and the existing `discard` (used after `prepare()` runs — still finalizes as canceled and rolls back partial state). - `resolvePluginSnapshot()` calls `linkSnapshotToProject()` before linking the conversation/run, so a failure mid-link could leave the reused project pinned to a snapshot the routine never durably claimed while `resolvedRoutineSnapshot` stayed null. Capture the intermediate snapshot id in `partiallyAppliedSnapshotId` when the resolver throws, and let `discard()` fall back to it for `restoreProjectSnapshotLink` so the previous project pin is restored either way. Regression coverage added in `tests/routine-schedule-claims.test.ts`: - A scheduled loser does not surface a phantom canceled chat run via `/api/runs` after the slot is lost. - A resolver that throws after `linkSnapshotToProject()` (forced via a SQLite trigger on `conversations.applied_plugin_snapshot_id`) still restores the reused project's previous pin in `discard()`. * fix(daemon): return prepared routine run ids Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: kami.c <kami.c@chative.com> |
||
|
|
08c350fb0f
|
fix(analytics): bucket feedback agent/model directly on the event (#3240)
* fix(analytics): bucket feedback agent/model directly on the event
Reason × agent / reason × model splits on
`assistant_feedback_reason_submit` were 25-74% `unknown` because the
event only carried `run_id` — analyses had to join back to
`run_created/run_finished`, which loses rows whenever the feedback is
given to a message whose run sits outside the query window (the common
case for feedback on older messages), and whose `model_id` was `null`
to begin with (the user didn't pick a specific model — went with the
agent's default).
Carry `agent_provider_id` and `model_id` directly on every feedback
event so the analyses no longer need to join. Replace `null/unknown`
with the `default` bucket via `modelIdForTracking` (and let
`agentIdToTracking` fall through to `other`) at every emit site —
`null` was an analyst-hostile mix of "no selection" and "join failed";
`default` is a real, analysable bucket. On `run_finished`, upgrade the
model to the agent-reported value from initializing/model status
events when the user did not pick one — covers ACP, claude-stream,
copilot-stream, json-event-stream, qoder, pi-rpc.
* fix(analytics): use feedbackAgentProviderIdToTracking and assistantFeedbackModelId for feedback events
Wire API-mode agent ids (anthropic-api → anthropic) and agentName-parsed
model ids through the feedback emit path. Previously the feedback props used
agentIdToTracking (no anthropic-api case) and assistantModelDetail (no
agentName fallback), causing model_id='default' and agent_provider_id='other'
for API-mode agents.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(analytics): extend feedback/run schema for full agent/model coverage
Layered on top of the conflict resolution and the v1 emit switchover
in
|
||
|
|
1c2a1c4459
|
Add launch review regression coverage and stabilize daemon tests (#3207)
* Add launch review E2E regression coverage * Harden daemon launch review regressions * Stabilize daemon runtime tests * fix(tests): restore e2e preflight typing Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(tests): make fake plugin runtime ESM-safe Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * Stabilize e2e fake agent and regression tests * fix(tests): repair fake agent cjs runtime Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(review): harden plugin authoring checks Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * fix(tests): bind plugin authoring run to seeded conversation Generated-By: looper 0.9.2 (runner=fixer, agent=codex) |
||
|
|
cd1790abab
|
Harden AMR Link startup model discovery (#3198)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
actionlint / Lint GitHub Actions workflows (push) Failing after 1s
ci / Detect CI change scopes (push) Successful in 0s
landing-page-ci / Validate landing page (push) Failing after 1s
landing-page-staging / Deploy landing page to staging (push) Has been skipped
nix-check / build (push) Failing after 2s
ci / Validate Nix flake (push) Has been skipped
ci / Preflight (push) Failing after 1s
ci / Workspace unit tests (push) Failing after 2s
ci / Daemon workspace tests (push) Failing after 2s
ci / Web workspace tests (push) Failing after 1s
ci / Browser tests (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
|
||
|
|
56fe5c5036
|
fix(amr): stage external image attachments into workspace (#3226)
* fix(daemon): forward AMR image attachments through ACP * fix(amr): stage external image attachments into workspace * fix(amr): stage prompt image paths safely |
||
|
|
b8cdf5f0ea
|
feat(mcp): generation loop + one-click Codex install (#3141)
* feat(mcp): add project creation, capability discovery, and generation tools
Lets an external coding agent (Codex, Cursor, …) drive a full design
loop over `od mcp`, not just read/write files: create a project,
discover what Open Design can make, commission a generation run, poll
it, and open the result in a browser. Complements the existing
write_file / delete_file / delete_project management tools.
New tools:
- create_project — make an empty project to generate into (start_run
needs one). Derives a slug id from the name unless given.
- list_skills / list_plugins — discover what you can ask OD to make.
- start_run / get_run / cancel_run — commission a run (OD spawns its
own agent), poll to completion, cancel. start+poll because MCP is
request/response and generation is minutes-long.
- get_run / get_project now return a browser-openable previewUrl
(entry file served raw; HTML entries render directly).
The external agent never runs a skill itself — it commissions OD to,
so the prior "skills not on MCP" boundary no longer applies.
* feat(mcp): make get_run preview hint directive
Reword the hint MCP clients receive when a run finishes so the agent
is more likely to surface the previewUrl to the user proactively —
mention the user-facing browser explicitly and call out that clients
with a built-in browser pane (e.g. Codex CLI's right-side browser)
should navigate to it directly. Also nudge start_run's hint to flag
that a previewUrl will arrive on success, so the agent knows what to
do with it before it ever sees get_run.
Pure text change; no behavior change in the tool surface or daemon.
* feat(mcp): one-click Install / Remove for Codex from Settings
Adds a toggle button on Settings → Integrations → Codex panel that runs
`codex mcp add open-design …` / `codex mcp remove open-design` via the
daemon, so users no longer need to copy TOML and paste it into
~/.codex/config.toml by hand. The copy-snippet path is unchanged and
remains the fallback when the Codex CLI isn't on PATH.
The daemon shells out to Codex CLI rather than rewriting config.toml
itself — that way we inherit Codex's own merge / dedupe / validation
rules and only track its argv. The runner is dependency-injected for
testability.
New endpoints (under /api/mcp/install/codex/*):
- GET status — probes `codex mcp get open-design`; returns
{ available, installed } so the UI can render the toggle state.
- POST — runs `codex mcp add open-design --env K=V … -- <node> <cli.js> mcp`,
reusing the same payload as /api/mcp/install-info.
- DELETE — runs `codex mcp remove open-design`.
The web UI renders the toggle only inside the Codex client panel
(`client.id === 'codex'`). When Codex CLI is missing it shows a
disabled button with an explanatory hint instead of vanishing, so users
know why one-click isn't available.
* feat(mcp): teach agents to clarify ambiguous format requests
When the user asks for a "PPT" / "deck" / "slides" / "PDF" / "doc",
that's two very different deliverables: Open Design natively produces
browser-viewable HTML/SVG (including HTML-rendered decks), but the
user may actually want a binary .pptx / .docx / .pdf — which OD does
NOT produce and which the agent would have to export from OD's output
itself. Add a paragraph to the MCP server instructions telling the
agent to ASK which one is wanted before kicking off work, rather than
silently picking one or dual-tracking both paths.
Pure prompt-text change in the instructions block; no tool surface or
behavior change. Costs ~10 lines of session-init context (one-time
per MCP session), versus dual-tracked .pptx hedging Codex was
otherwise doing on every ambiguous request.
* feat(mcp): surface agent messages, skip OD discovery, slim list_plugins
Three fixes uncovered while exercising the full MCP-driven generation
loop end-to-end with a real Codex client. Each one is a real
blocker / footgun for the external agent.
1. get_run now includes agentMessage — the inner agent's textual
output reassembled from the SSE event stream. Without this, runs
that ended in a discovery-style clarifying question (e.g. a
<question-form>) looked like "succeeded with empty output" mysteries
to the outer agent. The hint now branches on whether previewUrl
exists: with preview = show preview + relay agentMessage as the
inner agent's note; no preview = relay agentMessage as the actual
deliverable (almost always a clarifying question).
2. create_project sets skipDiscoveryBrief:true by default. The outer
agent IS the user-facing surface for MCP-driven runs, so OD's own
interactive discovery stage just creates a confusing
nested-clarification loop where its question form ends up dropped
(no files = no artifact). Better to let the outer agent gather
requirements and pass a precise prompt or plugin to start_run.
3. list_plugins flattens the daemon's bulky 16-field plugin record
(fsPath, sourceMarketplaceId, installedAt, …) into the few fields
an agent actually picks plugins on: id, title, description, kind,
tags. description / kind come from manifest.description /
manifest.od.{taskKind,kind} which the previous pass-through dropped
on the floor.
* feat(mcp): smart entry fallback + list_agents
Two fixes uncovered by exercising the full Codex-driven loop on a real
machine. Both close the gap between "Open Design has the data" and
"the external agent can find it".
1. get_project / get_run now fall back to scanning the project's file
list when metadata.entryFile is missing. We hit the case where
write_file (and a half-finished inner-agent run) put a perfectly
viewable index.html into the project, but metadata.entryFile stayed
null — so the outer agent got no previewUrl from MCP and resorted
to guessing a file:// path. Priority: declared entryFile, then
index.html anywhere, then a single .html at the project root.
Pure read-side change; no extra fetch when entryFile is already
set.
2. list_agents lets the outer agent stop guessing 'claude' / 'codex' /
'gemini' for start_run.agent. The daemon already exposed
/api/agents with 19 supported CLIs and an `available` flag. The
MCP wrapper defaults to filtering to installed agents only (so the
agent never picks one whose binary won't spawn), with
includeUnavailable:true as an opt-in to see uninstalled ones plus
their installUrl. Models truncated to 10 with modelsCount carrying
the real total — keeps the response token-economical even for
agents (opencode) with 100+ models.
* feat(mcp): tell the outer agent runs take 5–30 min, don't bypass
Direct response to a real Codex client observably cancelling an
in-flight run after 3 polls and substituting its own write_file
output ("文件时间戳没推进 → 我直接覆盖生成") — exactly the failure
mode this MCP surface exists to avoid.
start_run's hint and the session-init instructions block now both
state explicitly:
- Runs typically take 5–30 minutes.
- status:running with unchanged file mtimes is the inner agent
thinking, NOT a hang.
- Do not cancel_run out of impatience.
- Do not substitute write_file as a "faster" workaround — that
discards OD's pipeline-driven design quality.
- Poll every 30–60 seconds; report "still working" to the user
between polls.
- Only call cancel_run if the user explicitly asks.
Pure prompt-text change; no surface or behavior change. Costs ~10
lines of one-time session-init tokens + ~80 more tokens per
start_run response, in exchange for the outer agent actually
trusting the run.
* feat(mcp): persist run events to disk + expose tail-able path
Closes the in-flight visibility gap that made real Codex clients
cancel a 24-min run after 3 polls and substitute their own
write_file output, simply because polling get_run showed no change.
Daemon: every SSE event is now mirrored to a JSON-Lines file at
<RUNTIME_DATA_DIR>/runs/<runId>/events.jsonl. The path is wired
through createChatRunService's new `runsLogDir` option (null
disables, preserving legacy in-memory-only behavior). statusBody
exposes the path as `eventsLogPath`. Failures are best-effort — a
broken stream destroys itself and the run keeps going on the
in-memory event log (SSE clients are unaffected).
MCP: get_run already passed statusBody through, so eventsLogPath
surfaces automatically. The new value is that get_run during a
running status now adds a directive hint telling the outer agent to
`tail -n 50 -f <path>` in its own shell to see live progress —
that's the signal that makes the agent trust the run and stop
cancelling. The succeeded-status hint mentions the path too, for
forensics. No new tool; the field rides existing get_run polls.
Spec-first throughout:
- runs.test.ts adds 4 tests covering write-per-emit, statusBody
field, null-runsLogDir back-compat, and the no-IO guarantee
when persistence is disabled.
- mcp-runs.test.ts adds 1 test for the running-status hint.
* fix(mcp): get_run hint directs callers to pass project explicitly
The success hint in get_run previously said "project defaults to this
run's project", which is misleading: get_artifact has no run context and
falls back to /api/active when project is omitted, not to the run's
project. A client following the old guidance after creating a fresh or
non-active project could fetch the wrong project's files or fail with
"no active project".
The hint now embeds the run's projectId and tells callers to pass it
explicitly: get_artifact({ project: "<id>" }). A focused regression test
in mcp-runs.test.ts verifies the hint contains the projectId and does
not contain the incorrect active-context fallback guidance.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(contracts): add eventsLogPath to ChatRunStatusResponse
The daemon's statusBody() returns eventsLogPath but the shared DTO
lacked this field, leaving web/CLI/MCP callers without a typed
accessor.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* feat(mcp): bind MCP runs to OD conversations + studio deep links
Closes the last gap that made MCP-driven runs feel like a parallel
side door: the user could not see the conversation in OD's studio
page even though the run was real, finished, and had files.
Daemon side: POST /api/runs now falls back to the project's default
conversation when the caller (MCP / SDK) only supplied projectId.
It synthesizes an assistantMessageId, writes a user message with the
prompt as content, and lets the existing
`pinAssistantMessageOnRunCreate` helper create the empty assistant
row. The existing `appendMessageAgentEvent` accumulation path then
streams text_delta events into the assistant row's content — same
as the web /api/chat flow. The response body now echoes the
resolved conversationId + assistantMessageId so MCP callers can
build a deep link.
`buildMcpInstallPayload` now also surfaces `webBaseUrl` (read from
OD_WEB_PORT, the env tools-dev exports for the web listener). MCP
clients use it to build studio deep links.
MCP side: `start_run`, `get_run`, `get_project` now return a
`studioUrl` — a browser-facing OD URL pointing at the studio page
that shows the file preview AND the chat history side by side. The
hint on each tool was updated to tell the outer agent to hand
studioUrl to the user as the primary link (previewUrl falls back to
raw-file when the user only wants the rendered output). The
webBaseUrl is fetched once via /api/mcp/install-info and cached for
5s to keep per-poll cost flat; a tiny `_resetWebBaseUrlCache` export
lets tests start each case with a clean cache.
Contracts: `ChatRunCreateResponse` gains optional conversationId +
assistantMessageId; `ChatRunStatusResponse` gains optional
eventsLogPath. Both additive, no consumer breakage.
Spec-first throughout:
- get_run includes studioUrl on success when webBaseUrl + conversationId are available
- get_run omits studioUrl when webBaseUrl is null
- start_run returns studioUrl and conversationId for the new run
- get_project returns studioUrl using the project default conversation
* fix(mcp): add skill/skillId to start_run so listed skills are actionable
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* fix(test): update mcp-get-project test to handle getWebBaseUrl fetch
The get_project handler now calls getWebBaseUrl (added with the studio
deep-link feature), which fetches /api/mcp/install-info. The test mock
only handled the /api/projects/:id URL and expected a single fetch call,
causing the assertion to fail with "called 2 times" instead of 1.
Fix: handle the /api/mcp/install-info URL in the fetch mock (returning
webBaseUrl: null), update the call count expectation to 2, and call
_resetWebBaseUrlCache in afterEach to prevent cache bleed between tests.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
* feat(mcp): tell agents to render studioUrl as a clickable markdown link
Observed in a real Codex client: Codex received studioUrl correctly
but rendered it as inline code (gray code-span), which its built-in
browser pane does NOT make clickable. The user had to copy-paste the
URL into a browser by hand even though Codex / Cursor / Zed all
auto-link markdown `[label](url)` syntax and would navigate it in
their right-side preview pane.
The three studioUrl-mentioning hints now explicitly tell the agent
to render the URL as a markdown link (e.g.
`[Open Open Design studio](URL)`) and never as inline code or bare
text. Pure prompt-text change.
* fix(runs): resolve default agent when MCP caller omits agentId; add McpRunCreateRequest contract type
- POST /api/runs: when no agentId is provided, resolve from app-config
or first available CLI before spawning — mirrors the pattern the
routine handler already uses. Prevents 'unknown agent: undefined'
failures on the create_project -> start_run(prompt) MCP path.
- packages/contracts: add McpRunCreateRequest interface for the
projectId-only / SDK caller shape so typed callers can construct the
request without casts. Exported via index.ts's existing chat re-export.
- packages/contracts/tests: add compile fixture verifying projectId-only,
projectId+message, and projectId+message+agentId shapes all type-check.
- apps/daemon/tests: add mcp-runs test asserting agent arg omitted in
start_run does not include agentId in the POSTed body.
Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)
|
||
|
|
50f85b509a
|
fix(analytics): fill run and feedback metadata (#3194)
* fix(analytics): fill run and feedback metadata * fix(analytics): map feedback API providers |
||
|
|
71ad9eb292
|
fix(media): enforce legacy media policy for run tokens (#3205) | ||
|
|
c847ace554
|
Add run-scoped media execution policy (#3106)
* feat(contracts): add run media execution policy * feat(daemon): enforce run media execution policy * test(daemon): cover media execution policy gates |
||
|
|
0bfb4803e7
|
feat(daemon): add Phase 2C CLI wrappers (#2179)
* feat(daemon): add phase 2c cli wrappers Co-authored-by: multica-agent <github@multica.ai> * fix: handle desktop-gated CLI imports Co-authored-by: multica-agent <github@multica.ai> * fix: pass sidecar ipc path to agent wrappers Co-authored-by: multica-agent <github@multica.ai> * fix: make agent wrapper env explicit Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): preserve CLI import and diff edge cases Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> |
||
|
|
fe24c8addf
|
chore: remove dead inline /api/agents route handler in server.ts (#2945)
Co-authored-by: Siri-Ray <2667192167@qq.com> |
||
|
|
df8a0faff6
|
feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent
AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.
The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.
Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.
Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)
Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).
* fix(runtimes/amr): pin a concrete default model and bare openai ids
End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:
1. vela rejects `session/prompt` with `session/set_model must be called
before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
skips set_model whenever the picked model is the synthetic 'default'
id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
default option and `fallbackModels[0]`, which makes attachAcpSession
always send a real `session/set_model` for AMR turns.
2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
it forwards to opencode's openai provider. With OpenRouter-style ids
like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
`openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
The new fallback list ships the bare ids opencode's openai registry
actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).
Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
way real vela does, so a regression that silently goes back to
model: 'default' would surface as a fatal error in tests instead of a
hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
no 'openai/' prefix in fallbackModels, and a negative case that
asserts session/prompt fails when no model is set.
Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.
Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.
* fix(runtimes/amr): substitute concrete model when chat run sends 'default'
A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:
json-rpc id 3: session/set_model must be called before session/prompt
The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.
This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:
1. If the caller picked a real id, pass it through.
2. Else, if `def.defaultModelEnvVar` is set and the daemon process
env has a non-empty value for it, return that (operator escape
hatch — see below).
3. Else, if the def's `fallbackModels` does NOT contain a 'default'
id, return `fallbackModels[0].id`.
4. Else, return the original value (the historic shape — defs that
list 'default' themselves are untouched).
AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Worth noting the env var must live in the daemon's `process.env`
(Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver) — the new field's docblock spells this out.
Coverage:
- `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all
four resolver branches plus the env-override happy path / fallback /
ignore-when-user-picked-a-real-id case.
- `pnpm --filter @open-design/daemon typecheck` clean.
* chore(runtimes/amr): move AMR to the top of the base agent list
So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.
* feat(amr): Sign-in / Sign-out button on the AMR Settings card
The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:
- GET /api/integrations/vela/status reads ~/.vela/config.json
for the active profile and returns { loggedIn, profile, user }
(without leaking the runtime/control keys themselves).
- POST /api/integrations/vela/login spawns `vela login` once
(409 if one is already in flight). The vela CLI opens the user's
browser to the device-authorization page itself — Open Design
only needs to kick the subprocess off.
- POST /api/integrations/vela/logout removes ~/.vela/config.json
so the next status read returns logged-out.
`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(nested interactive elements). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.
i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.
Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
against a tmp HOME — including the negative case where a profile
has user info but no runtimeKey (still logged-out), and the
secret-leak guard ("rt-secret-*" must not appear in the projection
payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
states (logged-out, logging-in, logged-in, logging-out) plus the
click-propagation invariant the divergent card was built to keep.
`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.
Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.
* fix(connection-test): substitute model for AMR before attachAcpSession
The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, which made the Test Connection button on the AMR Settings
card deadlock with the same `session/set_model must be called before
session/prompt` error the chat-run path already handles — surfaced as a
permanent "Testing connection…" spinner in the UI.
Reuse the same helper here so Test Connection mirrors chat-run behavior.
* test(amr): three-layer end-to-end coverage for the AMR login + turn flow
The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:
1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
Spins up the real daemon Express app via `startServer({port:0,...})`,
persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
and exercises every /api/integrations/vela/* endpoint against the
extended fake-vela stub:
- status reads ~/.vela/config.json under various states
- login spawns the fake, waits for config.json to appear, returns
pid + startedAt + profile
- 409 already-running guard with the stub's delay knob
- logout removes the file (idempotent)
- secrets (runtimeKey / controlKey) never leak in the projection
- login → status round-trip flips loggedIn=false → true
2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
Boots a namespaced daemon + web pair through `createSmokeSuite`,
inlines a self-contained fake `vela` binary that handles BOTH
`vela login` (writes ~/.vela/config.json) and
`vela agent run --runtime opencode` (ACP stdio with the
`session/set_model must precede session/prompt` gate the real binary
enforces), then drives a complete /api/runs lifecycle for
`agentId: 'amr', model: 'default'` and asserts the assistant message
captures the fake's streamed text. This is the test that would have
surfaced today's plugin-default-model regression (the `set_model
before prompt` error) at PR time instead of demo time.
3. e2e/ui/amr-login-pill.test.ts (Playwright)
Mocks /api/agents + /api/integrations/vela/{status,login,logout}
to drive the Settings AMR card through the full Sign in → Signed in
→ Sign out cycle. Pins the AmrLoginPill polling contract and the
aria-label semantics (the pill's accessible name is "Sign out" once
logged in, regardless of which label the hover-state text shows).
fake-vela.mjs extensions:
- Handles `vela login` argv by writing
~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
mirrors real vela's on-disk side-effect without the device-auth
loop.
- FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
in-flight state of the spawn lifecycle.
- FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
user fields end-to-end.
Validated:
- `pnpm guard` + `pnpm typecheck` (all workspace projects)
- `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
including the new 8-test integration suite.
- `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
- `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
1 / 1 passing (6.7s).
* feat(amr): package native cli and refine login ui
* feat(amr): wire vela cli beta packaging
* docs(amr): document vela ci packaging review
* docs(amr): refine vela ci integration review
* fix(ci): refresh nix pnpm dependency hashes
* fix(pack): clean up Vela CLI packaging
* fix(pack): bundle Vela CLI support files
* fix(amr): recover login attempts from stale auth state
* test: expand AMR and automations coverage
* fix(amr): address review follow-ups
* test(web): align tasks fixtures with contracts
* fix(daemon): type wildcard route params
* fix(ci): refresh PR merge validation
* fix(amr): clear env credentials on logout
* feat(settings): inline local CLI model configuration
* fix(amr): recognize daemon env credentials
* [codex] Fix Vela companion packaging (#2979)
* Fix Vela companion packaging
* Update Nix pnpm dependency hashes
* [codex] Surface AMR account failures (#2980)
* fix: surface AMR account failures
* fix: cover AMR recovery error guidance
* chore: bump beta base version to 0.8.1 (#2990)
* Fix AMR profile and packaged runtime review issues
* Detect packaged AMR OpenCode companion tree
* feat(web): polish AMR frontend flows
* Polish AMR onboarding card
* fix: read AMR login state from dot-amr config (#3048)
* test: tighten AMR credential and packaging coverage
* test: restore AMR executable test env helper
* [codex] Fix packaged mac Dock identity and AMR label (#3076)
* Fix packaged mac sidecar Dock identity
* Rename AMR assistant label
* Fix AMR live models and dot-amr login state (#3073)
* fix: read AMR login state from dot-amr config
* fix: load live AMR models before runs
* fix: point AMR onboarding link to production wallet
* fix: address AMR model review feedback
* fix: persist live AMR model fallback
* [codex] Fix AMR link catalog model ids (#3088)
* Fix packaged mac sidecar Dock identity
* Rename AMR assistant label
* Fix AMR link catalog model ids
* Fix AMR model normalization typecheck
* Use live AMR model for default runs
* fix: polish AMR runtime settings UI
* Accelerate AMR startup defaults (#3092)
* Surface AMR insufficient balance wallet URL (#3099)
* fix(web): polish onboarding controls (#3112)
* fix(web): show CLI scan loading state
* Avoid duplicate AMR wallet recharge links (#3117)
* Avoid duplicate AMR wallet recharge links
* Use Vela CLI 0.0.3 test package
* chore(nix): refresh pnpm deps hash
* Fix AMR wallet guidance display
---------
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)
* chore(nix): refresh pnpm deps hash
* chore(pack): pin Vela CLI 0.0.3
* chore(nix): refresh pnpm deps hash
* fix(web): suppress AMR exit 130 fallback (#3136)
* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)
* feat(web): nudge users to hosted AMR on model/auth/quota failures
When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.
Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.
* feat(web): rework failed-run AMR guidance into per-case error UI
Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:
- The error card is now neutral gray (was red) and always carries a
retry button; it is driven by the persisted per-message error event so
it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
card under the error card offers "switch to AMR & retry" — switches the
run to AMR, opens Settings on the AMR card, and auto-retries once the
account signs in (ProjectView polls vela login status, independent of
the Settings pill lifecycle, with success / 5-min-timeout / unmount
exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
pulses, and an authorize-button coachmark (a fake hand cursor that
rises in and dismisses on hover) points at the sign-in control when not
yet authorized.
analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.
* fix(daemon): require status context for numeric service-failure codes
Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.
* fix(web): keep error pill for failed runs ChatPane's card doesn't cover
Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.
* fix(daemon): don't treat a process exit code as an HTTP status
Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.
* chore(web): translate AMR card / error keys for 16 remaining locales
PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.
* fix(amr): address review feedback on #2355
Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.
- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
verifies the inner `opencode` executable exists + is runnable, not
just the directory. This closes the false-positive availability path
that let `detectAgents()` surface AMR as available even when the
packaged companion was empty / partially copied (mrcfps, 4 threads).
- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
stale `opencode` on the user's PATH, so packaged AMR builds can't be
hijacked by a global installation.
- web/EntryShell.tsx: when the Local CLI scan returns an available
agent and the previously-selected agent is AMR, switch the selection
to the first available local agent so the runtime and persisted
agent agree before Continue.
- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
BEFORE rejecting on an empty live-model catalog — a signed-out user
was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
the correct `AMR_AUTH_REQUIRED` (sign-in affordance).
- server.ts (default model fallback): if the user asked for the AMR
agent default and the cached id is no longer in the FRESH catalog,
fall back to `liveModels[0]` from the probe instead of rejecting the
run as `AMR_MODEL_UNAVAILABLE`.
- integrations/vela.ts: route `vela login` through
`createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
verbatim args (matches `execAgentFile` / chat-run spawning).
- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
to the container-side path. The host path was being passed in as-is
even though the default container only mounts /project, /tools-pack
and cache/home — `copyOptionalVelaCliBinary` saw a missing path.
Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
prematurely blocked — `@powerformer/vela-cli` only publishes the
`darwin-arm64` platform binary today; adding the flag elsewhere
would fail the builds. Revisit once win/x64/linux binaries ship.
* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)
The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
`const` is TDZ-aware, that branch would have thrown `ReferenceError:
Cannot access 'sendAmrAccountFailure' before initialization` for the
exact users it tries to help — defeating the original intent.
Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.
Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch
uses packaged built-in Vela for AMR` was creating the
`<resourceRoot>/bin/libexec/opencode/` companion *directory* only, but
this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree`
also requires the inner `opencode` executable. Add it to that fixture
to match the new contract; the test was a sibling of the executables /
env-and-detection fixtures already updated in
|
||
|
|
269a385ee2
|
fix(daemon): reconcile missing artifact manifests on run end (#2893) (#3110)
* fix(daemon): reconcile missing artifact manifests on run end (#2893) When an agent writes HTML via write_file instead of create_artifact, no .artifact.json manifest sidecar is created. If the run then terminates (inactivity watchdog, user cancel, or process exit), the HTML file exists on disk but the manifest is missing — breaking the artifact panel, finalize, and export flows. Add a best-effort reconciliation step in the child.on('close') handler that lists project HTML files and calls reconcileHtmlArtifactManifest for any missing sidecars. The IIFE runs asynchronously after design.runs.finish() so it never blocks run finalisation. * fix(daemon): scope run-end reconciliation to files modified during the run The review on #3110 flagged that listing the entire project tree and reconciling every HTML file without a sidecar is too broad — for imported-folder projects (metadata.baseDir), pre-existing HTML files would receive spurious manifests. Record runStartTimeMs at the beginning of startChatRun and filter the reconciliation loop to only touch HTML files whose mtime >= that timestamp. Add a regression test that backdates a pre-existing HTML file and verifies it is skipped while a new file is reconciled. * test(daemon): fix mtime ordering in reconciliation regression test The runStartTimeMs was recorded after writing the new file, so its mtime fell before the threshold and the reconciliation filter skipped it. Move the timestamp capture to before the write to match the real startChatRun semantics. |
||
|
|
170a05f5d2
|
Formalize skill artifacts into plugins (#3085)
* Add skill-to-plugin candidate flow * Fix skill plugin candidate card reuse Generated-By: looper 0.9.1 (runner=fixer, agent=codex) * Fix skill plugin candidate dismiss and URL gates Generated-By: looper 0.9.1 (runner=fixer, agent=codex) * Polish skill plugin candidate copy |
||
|
|
916438d919
|
fix(daemon): hide agent executable paths from chat status (#2874) (#3046)
Stop emitting resolved filesystem paths in chat start events and inactivity-timeout diagnostics; surface agent ids instead. Complements web-side redaction in #2894. |
||
|
|
fce444bcab
|
Consolidate chat comments preview on main (#2906)
* feat(web): queue chat sends * feat(web): render code comment directives * feat(web): add preview comments and manual edits * fix(web): polish shared chrome controls * fix(web): align queued send loading state * feat(web): open primary project artifacts * fix(web): keep queued sends and tests aligned * fix(web): restore docked comment tools layout * fix(web): align preview comment toolbar * fix(web): place local cli beside handoff * fix(web): move agent menu beside handoff * fix(web): make project instructions a direct header action * fix(web): compact handoff and toolbar labels * fix(web): clarify handoff menu and annotation label * fix(web): restore compact cursor handoff trigger * fix(web): align agent menu trigger with handoff * fix(web): add draw toolbar close action * fix(web): move inspect editing into edit mode * fix(web): avoid reserving comment sidebar in annotation mode * fix(web): float preview comments panel * fix(web): keep edit canvas full width * fix(web): polish preview annotation tools * fix(web): highlight active preview comments * fix(web): open comments panel after annotation save * fix(web): polish comment handoff controls * fix(web): remove palette preview tool * fix(web): simplify draw annotation toolbar * fix(web): restore queued tasks into composer * fix(web): restore queued send strip styling * fix(web): hide internal comment target ids * fix(web): align manual edit panel header * test(web): cover visual interaction contracts * fix(web): address PR feedback regressions * fix(web): preserve artifact chrome state * fix(daemon): restore project raw file routes --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> Co-authored-by: mrcfps <mrc@powerformer.com> |
||
|
|
fb1e0c819f
|
fix(plugins): reject symlinked plugin assets (#2036)
* fix(plugins): reject symlinked plugin assets * test(plugins): cover asset directory symlink escapes * fix(plugins): reject symlinked asset path segments |
||
|
|
53b9d779ac
|
fix(daemon): widen HTTP keep-alive on the daemon listener (#2557)
* fix(daemon): widen HTTP keep-alive so SSE survives idle gaps The daemon's `/api/runs/:id/events` SSE stream emits an in-band `: keepalive` comment every 25s (`SSE_KEEPALIVE_INTERVAL_MS`), but Node's default `server.keepAliveTimeout` is 5_000ms. When a run is quiet for more than five seconds — e.g. the agent is still composing, or the user briefly walks away — Node closes the underlying TCP connection from under the SSE writer, the next 25s ping lands on a dead socket, and the browser surfaces it as a generic "network error" mid-stream. This is most visible behind any keep-alive-aware middlebox (the nginx running in the desktop bundle, the socat/docker bridges users set up for remote access, EC2 security-group idle timers): the default 5s window is shorter than every reasonable in-band keepalive cadence, so the connection dies before the application gets a chance to assert it's still alive. Set the listener to: - `keepAliveTimeout = 120_000` — 4.8× the in-band keepalive, plenty of slack for clock skew and slow flushes. - `headersTimeout = 125_000` — must exceed `keepAliveTimeout` per the Node docs, otherwise a misbehaving client can stall request parsing indefinitely. - `requestTimeout = 0` — disable the per-request timeout entirely; an SSE response intentionally runs for as long as the agent runs. Verified by curling `/api/runs/<id>/events` from inside the daemon container and watching the connection stay open through three full 25s keepalive cycles where it previously RST'd at ~5s. * fix(daemon): address PR #2557 review — drop requestTimeout, add regression test Three changes responding to @PerishCode's review (#2557): 1. Drop `server.requestTimeout = 0`. The reviewer is correct: that knob bounds how long the server waits to *receive* a complete request (headers + body) and is cleared the moment the request is fully parsed — it does not gate the duration of an SSE response. Setting it to 0 only removes Node 18+'s default 300s slow-loris guard, which is a real regression on a daemon that binds to 0.0.0.0 / Tailscale. 2. Rewrite the comment block. The previous comment claimed `keepAliveTimeout` "closes any idle SSE connection." Per the Node docs, `keepAliveTimeout` arms *after* a response finishes writing — it bounds the between-request idle gap on a kept-alive socket, not an in-flight streaming response. SSE drops mid-stream are almost always middlebox idle timers (nginx, socat/docker, EC2 NAT), not Node's own socket timeout, and this listener-side change cannot extend a connection past those middleboxes. What this PR actually fixes: routine kept-alive sockets used around an SSE stream (status polls, run-status fetches, the initial GET before the SSE upgrade) surviving normal client pauses. 120s gives comfortable headroom over the 25s in-band cadence so chat clients stop reconnect-storming between bursts. 3. Add `apps/daemon/tests/server-keepalive.test.ts` so a future refactor cannot silently restore the Node defaults. The test uses the existing `startServer({ port: 0, returnServer: true })` fixture (mirroring version-route.test.ts) and asserts the listener's `keepAliveTimeout` and `headersTimeout` invariants. Verified: - pnpm --filter @open-design/daemon run typecheck passes - pnpm vitest run tests/server-keepalive.test.ts → 2 passed |
||
|
|
7bc11b398d
|
chore(deps): upgrade express 4 -> 5 in daemon (#2311)
* chore(deps): upgrade express 4.22.1 -> 5.2.1 and @types/express
Breaking changes addressed:
- Renamed all bare wildcard route segments from * to *splat across
src/server.ts, src/static-resource-routes.ts, src/project-routes.ts,
src/import-export-routes.ts, and all three test stubs that define
app.get/options/delete routes using /raw/* or /raw/* patterns
- Updated wildcard param access from (req.params as any)[0] / req.params[0]
to Array.isArray(req.params.splat) ? req.params.splat.join('/') : String(...)
to handle the Express 5 / path-to-regexp v8 change where wildcard params
are now string[] instead of string
- Updated app.get('*') SPA fallback to app.get('/*splat') in server.ts
- Annotated five connector route handlers with Request<{ connectorId: string }>
so the typed param resolves as string, not string | string[], fixing the
10 TS2345 / TS2322 errors that surfaced when @types/express moved to 5.0.6
- Fixed two app.listen() beforeAll callbacks in origin-validation.test.ts to
accept and propagate the optional Error argument Express 5 now passes to
the listen callback, resolving TS2769 overload mismatch
* chore(nix): refresh daemonHash for rebased lockfile
* fix(daemon): await res.sendFile() in async route handlers for Express 5 compatibility
Express 5 res.sendFile() returns a Promise. Without await, async route
handlers return before the response is sent, causing Express to call
next() and fall through to a 404. Add await to all res.sendFile() calls
in async handlers in static-resource-routes.ts and server.ts.
* fix(daemon): use readFile+send for spritesheet route instead of sendFile
Express 5 res.sendFile() returns undefined (not a Promise). ENOENT errors
call next() asynchronously after the route handler's try/catch has returned,
causing unhandled 404 responses. Replacing with fs.promises.readFile + res.send
keeps the error path fully within the handler's try/catch.
---------
Co-authored-by: Patrick A <259201958+eefynet@users.noreply.github.com>
|
||
|
|
2b7b6590ae
|
feat(comments): add comment attachment API (#2869)
* feat(comments): add comment attachment API * ci: add fork PR workflow approval script --------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> |
||
|
|
024e6d86a9
|
fix: validate plugin connector refs in doctor (#2164)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
ci / Detect CI change scopes (push) Successful in 0s
nix-check / build (push) Failing after 1s
ci / Validate Nix flake (push) Has been skipped
ci / Preflight (push) Failing after 1s
ci / Workspace unit tests (push) Failing after 1s
ci / Daemon workspace tests (push) Failing after 1s
ci / Web workspace tests (push) Failing after 1s
ci / Browser tests (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
* fix: validate plugin connector refs in doctor Co-authored-by: multica-agent <github@multica.ai> * chore: refresh pool review queue Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> |
||
|
|
8e3d1360bd
|
fix(daemon): close # Instructions block with an explicit do-not-echo guard (#2827)
The composed chat prompt prepends a '# Instructions (read first)' block in front of '# User request' so a single user message carries both the system rules and the actual request — the shape every agent CLI (Claude, Codex, OpenCode, Gemini) expects on stdin. In practice claude-opus-4-7 (and a few other instruction-tuned models, particularly with --include-partial-messages on the stream) start their reply by echoing the top of that user message verbatim. The chat UI then shows the system prompt as a literal block leading the visible answer, e.g.: Instructions Always respond in Korean. Use Korean for all explanations… …Maintain full orthographic correctness… ).네, 완료했습니다. 전달하신 4가지 보강 포인트를 … (The closing token of the instructions block runs straight into the real answer without a newline — the telltale of a model-side echo rather than a UI render bug.) Close every Instructions block with one trailing line: (Do not quote, restate, or echo the # Instructions block above in your reply. Begin your response with the answer to the # User request below.) This kills the regression in practice without changing the turn shape (still one user message), so no agent CLI plumbing has to move. Tested via tests/chat-route.test.ts — pins the literal guard string so a future refactor cannot silently drop it. Co-authored-by: nicejames <nicejames@gmail.com> |
||
|
|
db90cb0bdb
|
fix(daemon): reject unsafe plugin manifest names (#2757)
Co-authored-by: Zerocracy Assistant <zerocracy-assistant@example.com> |
||
|
|
c14baf07d3 |
Merge origin/main into release/v0.8.0
PR #2461 sync prep — resolves 14 conflicts merging 84 main-side commits on top of 58 release-side commits accumulated during the 0.8.0 cycle. Resolution summary: Take main (theirs) where main carried deliberate forward progress: - apps/web/src/components/PluginCard.tsx — 7 hunks, i18n migration: hardcoded English aria-labels/titles replaced with t() calls keyed on pluginCard.* (all 8 keys verified present in en.ts). - apps/web/src/components/TasksView.tsx — 1 hunk, source-ingestion feature: sortedRoutines (newest-first), sourceIngestionTemplates, patchSourceForm, submitSourceIngestion. activeCount/pausedCount semantics preserved (now keyed on sortedRoutines, count unchanged). - e2e/ui/app.test.ts — new node:fs/promises + tmpdir + path + @/timeouts imports needed by main-side test helpers. - e2e/ui/settings-local-cli-codex-fallback.test.ts — menu-dismissal helper block added by main. Keep both sides where each added a different field to the same object literal: - apps/web/src/components/ProjectView.tsx (locale + analyticsHints spread). - apps/web/src/components/DesignSystemFlow.tsx (locale + analyticsHints). Take release (ours) where release carried deliberate work that ships 0.8.0: - CHANGELOG.md — release-side 0.8.0 entry + PR link refs; main's Unreleased section was the same body of work, now finalized. - apps/landing-page/public/{apple-touch-icon,favicon}.png + apps/web/public/app-icon.svg — release-side visual refresh assets consistent with 0.8.0 stable ship. - tools/pack/src/linux.ts — packageVersion const required by line 466; taking main's empty line would build-error. - e2e/ui/project-management-flows.test.ts + e2e/ui/settings-api-protocol.test.ts + e2e/ui/settings-memory-routines.test.ts — release-side release-smoke hardening (shangxinyu1 + PerishFire) takes precedence on overlap. Closes-issue / unblocks: PR #2461 sync release/v0.8.0 → main. |
||
|
|
48ed23c72f
|
fix(daemon): finish live-artifact chat runs via watchdog quiet-period handoff (#1451) (#2585)
* fix(daemon): finish live-artifact chat runs via watchdog quiet-period handoff (#1451) Live-artifact runs were staying in `Working` for the full 10-minute inactivity window even after the deliverable had been registered, and sometimes finishing as `failed` with `Agent stalled without emitting any new output for 600s`. The agent process kept its stdin/stdout alive (claude-code stream-json idle stdin, post-write reasoning that never reaches the chat) so the existing watchdog could not tell the deliverable was already in the user's hands. Wire `/api/tools/live-artifacts/create` back into the chat run via a small per-run handle registry: on the first `created` event, the run flips a local `artifactRegistered` flag and rearms the watchdog with the shorter `OD_CHAT_RUN_ARTIFACT_QUIET_PERIOD_MS` (default 60s) instead of the 10-minute pre-artifact ceiling. When that quiet timer trips, the watchdog no longer emits a stalled-error / `failed` finish; it SIGTERMs the child and lets the existing child-exit handler do final classification — the close handler now treats a SIGTERM exit after a registered artifact as `succeeded`, matching what the user actually got (a delivered artifact, not a failed run). The handoff stays with the existing child-exit lifecycle, so tool token revocation, cancel semantics, and exit-status classification keep their current owner — addressing the PR #1543 review history where finishing the run from the tool route bypassed those guarantees. Closes #1451. * fix(daemon): gate artifact quiet-period close on daemon-initiated flag (#1451 review follow-up) Reviewer (#2585) found that the close-handler branch reclassifying SIGTERM/SIGKILL as `succeeded` only checked `artifactRegistered`, so an unrelated later termination (external `kill`, OOM, container shutdown) after a successful artifact write would silently flip the run from `failed` to `succeeded` — the exact "completed without producing anything visible" failure mode the existing close handler is trying to prevent. Track the watchdog-initiated shutdown explicitly: set `artifactQuietShutdownRequested = true` immediately before `failForInactivity()` sends SIGTERM (covering the kill-grace SIGKILL escalation under the same flag), and require that flag in the close handler's quiet-period branch. Extract the final-status decision into a pure `classifyChatRunCloseStatus` so the daemon-initiated vs external signal cases can be pinned with focused unit tests instead of asserting closure-internal state via end-to-end timing. * fix(daemon): treat OD_CHAT_RUN_ARTIFACT_QUIET_PERIOD_MS=0 as disabled (#1451 review follow-up) Reviewer (#2585 non-blocking) found that an operator override of `OD_CHAT_RUN_ARTIFACT_QUIET_PERIOD_MS=0` no longer behaved as a "disable the quiet period" knob: once the artifact was registered, `activeInactivityTimeoutMs()` dropped to 0, `noteAgentActivity()` early-returned without clearing the prior timer, and the pre-artifact 10-minute timer kept running while further agent activity stopped refreshing it. Make the quiet-period switch conditional on a positive value. A 0 override now means "do not shorten after artifact registration" — the pre-artifact ceiling stays active, subsequent activity continues to reschedule it, and the existing pre-artifact stalled-error path still fires when the agent genuinely hangs. Pin the resolver as a pure `resolveActiveInactivityTimeoutMs` helper so the four quiet-vs-pre matrix cases are unit-tested directly. * fix(daemon): arm the quiet-period watchdog when pre-artifact timeout is disabled (#1451 review follow-up) Reviewer (#2585 non-blocking, round 3) found that `OD_CHAT_RUN_INACTIVITY_TIMEOUT_MS=0` paired with `OD_CHAT_RUN_ARTIFACT_QUIET_PERIOD_MS>0` left the watchdog disarmed forever. The `noteAgentActivity()` call at run start exited early because the pre-artifact delay was 0, so `inactivityTimer` was still `null` when the artifact was registered, and the prior `if (inactivityTimer) noteAgentActivity()` guard inside `noteArtifactRegistered()` then skipped the re-arm. The newly-positive quiet-period delay never armed a timer at all — a chat run that went silent right after artifact creation would stay `running` indefinitely. Drop the guard. `noteAgentActivity()` is already the function that decides whether to schedule (it bails when the active delay is 0), so calling it unconditionally keeps the behavior coherent across the four pre/quiet combinations: both non-zero (was already fine), pre=0 + quiet>0 (now arms the quiet timer), pre>0 + quiet=0 (still falls back to the pre-artifact ceiling via the existing resolver), both zero (still no watchdog at all — operator opted out). Pure-function coverage of the ceiling decision stays in `resolveActiveInactivityTimeoutMs` — exercised across the same four combinations in the existing unit suite. |
||
|
|
9912fa899a
|
feat(analytics): full design-system event family + DS run variant (#2706)
Lands the v2 PostHog spec's P0 design-system event family: five new
result events covering source ingest, create, review, status, and
picker apply; the existing file_upload_result + run_created/run_finished
schemas widened to discriminate DS workspaces from regular chat runs.
Contract (packages/contracts/src/analytics/events.ts):
- AnalyticsEventName gains design_system_{source_ingest,create,review,
status,apply}_result.
- Props interfaces + bucket/origin/method/status enums per spec.
- TrackingProjectKind gains 'design_system' for DS-as-project runs.
- RunCreatedProps / RunFinishedProps widen page_name+area to discriminate
chat_panel vs design_system_project; entry_from union accepts DS values;
DS-variant context fields (ds_source_origin, source_count, brand
description length bucket, per-source counts, design_system_created,
preview_module_count, missing_font_count).
- FileUploadSurface union adds design_systems / design_system_source.
- Bucket helpers (designSystemLengthBucket, folderCountBucket,
totalSizeBucket), module slug + type derivation, repo host parser.
Web emission sites:
- DesignSystemFlow.generate(): create_result + threads
prepareCreatedDesignSystemProject with analyticsTrack so each of the
4 source paths emits source_ingest_result (success / partial / failed
/ empty), repo-host dominance, fallback type from connector status.
- DropZone onFiles handlers: file_upload_result with deriveUploadCohort.
- DesignSystemDetailView: status_result on togglePublished + Make-default,
review_result on Looks-good / Needs-work; module_id from markdown
section header slug (designSystemModuleSlug), module_type via keyword
heuristic.
- DesignSystemsTab: status_result on publish toggle, set/unset default,
delete (incl. cancelled when window.confirm dismissed).
- NewProjectPanel: apply_result on DS picker change (manual select +
clear) plus an auto_select emit when the picker mounts with a default
DS not yet user-touched.
- ProjectView.streamViaDaemon: when project.metadata.importedFrom ===
'design-system', pass analyticsHints with entry_from
(onboarding_design_system for the auto-sent first message,
regenerate_from_review for subsequent sends), projectKind=design_system,
designSystemRunContext.
Daemon:
- ChatRequest gains optional analyticsHints (entryFrom / projectKind /
designSystemRunContext). Behavior never depends on these; only PostHog
props do.
- /api/runs handler reads analyticsHints to flip baseProps to the DS
variant (page_name=design_system_project, area=design_system_generation,
project_kind=design_system) when the run is DS-flagged, and spreads the
DS context fields onto run_created.
- run_finished mirrors the DS area + adds design_system_created (true iff
the run wrote DESIGN.md), preview_module_count (distinct preview/*.html
writes), missing_font_count (0 placeholder; pending font-audit hook).
- run-artifacts.ts: extracts collectWrittenPathsMatching as the shared
Write/Edit + isError-pair core; adds didRunCreateDesignSystemFile and
countDesignSystemPreviewModules using the same dedup + failure-skip
invariants as countNewHtmlArtifacts.
Tests:
- packages/contracts/tests/analytics-design-system-helpers.test.ts: 18
new test cases over the bucket helpers, module slug + type mapping,
repo host parser.
- apps/daemon/tests/run-artifacts.test.ts: 9 new tests for
didRunCreateDesignSystemFile + countDesignSystemPreviewModules covering
Write-then-Edit dedupe, case-insensitive DESIGN.md match, isError pair
skip, preview/index.html as a module, non-preview path rejection.
Targets release/v0.8.0.
|
||
|
|
e6da01e998
|
Add i18n metadata for official content (#2692) | ||
|
|
cc6edb9afe
|
Proxy GitHub metadata through the daemon (#2654)
* Proxy GitHub metadata through the daemon * fix(contracts): share GitHub metadata responses Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * fix(contracts): align GitHub fetchedAt payload types Generated-By: looper 0.6.0 (runner=fixer, agent=codex) * Proxy GitHub metadata through the daemon Generated-By: looper 0.6.0 (runner=fixer, agent=codex) |
||
|
|
e1818f2677
|
feat(analytics): onboarding ui_click + lifecycle events + update_popover surface_view (#2590)
* feat(analytics): onboarding ui_click + lifecycle + update_popover surface_view Spec rows 1-3 of the Onboarding family (ui_click, onboarding_runtime_scan_result, onboarding_complete_result) and the home `update_popover` surface_view were all listed as P0 in the v2 doc but unwired — PostHog showed 0 events for every onboarding ui_click, 0 for the scan/complete result events, and 0 for the update-popover exposure. Contract (`packages/contracts/src/analytics/events.ts`): - Adds event names `onboarding_runtime_scan_result` / `onboarding_complete_result` and wires them into `AnalyticsEventPayload`. - Adds `OnboardingClickProps` (page_name=onboarding, area/element/ action discriminators + optional runtime/about_you/source rider fields) and threads it into `UiClickProps`. - Adds `OnboardingRuntimeScanResultProps` and `OnboardingCompleteResultProps` with the doc's full field set — enums for runtime_type / scan result / completion result / completion_type, plus the lifecycle context (has_about_you, has_design_system_request, source_count, exit_step_name). - Extends `TrackingFileUploadSurface` with an `onboarding / design_system_source` shape so the design-system-step source ingest can ride the same `file_upload_result` event the file_manager / chat composer already use. `source_type` is required on this shape so the dashboard can split by `local_code|fig|assets` without inspecting `file_type`. - Adds `UpdatePopoverSurfaceViewProps` for the home toolbar's "Update ready" panel. Onboarding wiring (`apps/web/src/components/EntryShell.tsx`): - Centralises step/runtime-context derivation in `emitOnboardingClick` + `emitOnboardingComplete` helpers; every interactive control inside OnboardingView now fires through one of them so a future spec tweak changes one place. - Click rows for runtime cards (local_coding_agent / byok), design- source cards (github_repo / local_code / fig_upload), about_you selects (organization_size / use_case / hear_about_us), and the Continue / Back / Skip navigation buttons. Multi-select use_case emits one row per added value, not per render. - `scanCliAgents` now emits `onboarding_runtime_scan_result` with detected/available counts on every terminal state — success when any CLI is available, failed when scan returned zero or threw. `duration_ms` measures wall-clock from start to terminal. - `onboarding_complete_result` fires from the Skip / last-step Continue / Generate paths with the right `completion_type`. The Generate path uses a new `DesignSystemCreationFlow.onBeforeGenerate` callback so the embedded flow can expose its local source-count state to the wrapper. DS creation flow (`apps/web/src/components/DesignSystemFlow.tsx`): - New `onBeforeGenerate(snapshot)` prop with a typed `DesignSystemGenerateSnapshot` shape. Fired right before the async generate() work; OnboardingView consumes it for both the `generate` ui_click (with source_type derived from which-counts-equal-total) and the completion lifecycle event. - `renderDesignSystemCreation` in `EntryView` / `EntryShell` / `App` grows a second `hooks` arg that plumbs `onBeforeGenerate` through. Update popover (`apps/web/src/components/UpdaterPopup.tsx`): - Fires `surface_view page_name=home area=update_popover` once per panel-open transition, deduped by `app_version_before -> app_version_after` so a re-render of the same offer doesn't inflate the count. Validation: - `pnpm guard` ✅ - `pnpm --filter @open-design/web typecheck` ✅ - `pnpm --filter @open-design/web test` ✅ 203 files / 1828 tests - `pnpm --filter @open-design/daemon test` ✅ 249 files / 2977 tests * fix(analytics): generation_progress fires from chat_panel + complete_result uses snapshot E2E (2026-05-21, distinct_id=e2e-onboarding-test-001) drove the full welcome flow and exposed two issues in the previous commit: 1. `page_view page_name=onboarding area=generation_progress` (step 4) never fired. PR #2590's commit wired this from `DesignSystemDetailView`, but the Generate path actually navigates to ProjectView (`page_name=chat_panel`), not to the DS detail surface. PostHog showed `chat_panel` and `file_manager` page_views landing right after the Generate click but no `area=generation_progress` row. Fix: fire `area=generation_progress` from `ProjectView` right alongside its `chat_panel` page_view when an onboarding session id is still in sessionStorage. Clear the session id immediately after so a later unrelated project visit doesn't inherit the onboarding attribution. The `DesignSystemDetailView` site can stay as a defense-in-depth — same dedup guard, no double-fire. 2. `onboarding_complete_result` from the Generate path shipped with `has_design_system_request: false` and `source_count: 0`. The `emitOnboardingComplete` helper read `designSource` (the click state on the three source-type cards), but E2E showed users click Generate without clicking those cards — they type a brand description and add a GitHub URL directly in the embedded form, so `designSource` stays null even when a request is clearly in flight. Fix: thread `DesignSystemGenerateSnapshot` from the `onBeforeGenerate` callback into `emitOnboardingComplete` via a new `extra.sourceSnapshot` option. When present, derive `has_design_system_request` from `sourceCount > 0 || hasBrandDescription` and `source_count` from the snapshot's `sourceCount`. Skip / last-step Continue paths still fall back to the `designSource` heuristic since no snapshot exists there. * fix(analytics): emit artifact_count from new-html count + remove unmount session-id clear Cherry-picked from the orphaned `fix/analytics-app-version-zero` HEAD (commit |
||
|
|
052f8097de
|
fix(daemon): inject @-mention skills into system prompt (#2552)
* fix(daemon): inject @-mention skills into system prompt Generated-By: looper 0.8.1 (runner=worker, agent=opencode) * fix(daemon): compose ad-hoc skill mode and aliases Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(daemon): lazily load and stage ad-hoc skills Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * test(daemon): assert staged skill files before spawn Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(daemon): compose skill metadata across @ mentions Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * test(daemon): cover ad-hoc critique skill policy Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(daemon): preserve plugin skill composition Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(daemon): resolve conflicting composed skill surfaces Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(daemon): preserve primary skill surface Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) * fix(daemon): share resolved critique surface Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) |
||
|
|
6690dbd5bb
|
feat(analytics): PostHog + Langfuse instrumentation for assistant feedback (#1558)
* feat(analytics): PostHog + Langfuse instrumentation for assistant feedback
Re-bases the original three-commit PR onto release/v0.8.0. The web-side
feedback UI instrumentation (surface_view / ui_click / feedback_submit_result)
landed on main while this branch was open, so on this rebase that wiring
is taken from main; the remaining net additions are:
- Contracts: TrackingFeedback* enums and the four dedicated
assistant_feedback_* event payload types (click, reason_view,
reason_click, reason_submit), plus normalizeCustomReason helper.
The new event-name variants are added to TrackingEventName and the
AnalyticsEventPayload discriminated union next to the existing
surface_view/ui_click variants — both wire formats coexist.
- POST /api/runs/:id/feedback in apps/daemon/src/chat-routes.ts:
thin route that validates rating, allowlists reasonCodes through a
simple string filter, and fire-and-forgets into the daemon's
reportFeedback hook.
- apps/daemon/src/langfuse-bridge.ts reportRunFeedbackFromDaemon
forwards the rating + reasonCodes into Langfuse as user_rating
(NUMERIC ±1) + user_rating_reason (CATEGORICAL, one per code)
score-create entries. Gates on telemetry.metrics + telemetry.content.
- apps/web/src/providers/daemon.ts reportChatRunFeedback (fire-and-forget
fetch) and apps/web/src/components/ProjectView.tsx wiring so each
thumbs-up/down + reason submission posts the side-channel.
Conflicts resolved (release/v0.8.0 vs the branch's old base):
- packages/contracts/src/analytics/events.ts: keep main's
file_upload_result / feedback_submit_result / settings_* event
variants alongside the new assistant_feedback_* additions.
- apps/daemon/src/server.ts: keep DNS-aware validateExternalApiBaseUrl,
add reportFeedback closure wired into registerChatRoutes telemetry.
- apps/daemon/src/chat-routes.ts: keep both /tool-result and the new
/feedback routes; merge RegisterChatRoutesDeps to include both
'paths' and 'telemetry'. Drop PR's chat-routes-local
reconcileAssistantMessageOnRunEnd helper (main has the equivalent in
server.ts).
- apps/web/src/components/ChatPane.tsx & AssistantMessage.tsx & ProjectView.tsx:
keep main's projectKindForTracking prop name and its existing
emission of surface_view / ui_click / feedback_submit_result; the
PR's analyticsCtx-based reason_view/click/submit emission is dropped
in this rebase since it would duplicate the existing wire format.
- apps/web/tests/components/*: rename projectKind → projectKindForTracking
to match ChatPane's current prop name.
Outstanding review feedback (from the pre-rebase round, will be
addressed in a follow-up commit):
- AssistantMessage tests not yet passing the new feedback context to
the direct render path.
- ProjectView clear-feedback path skips reportChatRunFeedback, leaving
stale Langfuse user_rating scores.
- buildFeedbackPayload has no deletion path for previously-submitted
user_rating_reason scores when the user switches thumbs.
- POST /api/runs/:id/feedback always returns {status:'accepted'} even
when consent is off; needs to surface skipped_consent / skipped_no_sink.
- reasonCodes are filtered to string[] but not allowlisted against
ChatMessageFeedbackReasonCode or deduped.
* fix(analytics): address review on assistant feedback rebase
Picks up the in-scope correctness items from the prior review round
and the rebase residue without rewriting history:
- chat-routes.ts: `/feedback` now awaits the daemon's preflight
outcome and echoes it as the response. The contract was already
shaped as `accepted | skipped_consent | skipped_no_sink`, but the
previous handler always returned `accepted` because the network
send was fire-and-forget. The consent + sink decision is local
(a small file read and an env-var lookup); the actual Langfuse
upload still runs as a detached promise.
- chat-routes.ts: reasonCodes are now allowlisted against the
contract's reason-code union and deduplicated before reaching
Langfuse, so a stale or replayed client can't poison the
Langfuse score table with unknown categorical values or
duplicate stable ids in the same batch.
- langfuse-bridge.ts: split the consent + sink resolution from the
fire-and-forget network send so the route can claim `accepted`
honestly. The legacy `skipped_no_sink` return on app-config read
failure is preserved.
Contracts + comment hygiene:
- TrackingFeedbackReasonCode in packages/contracts/src/analytics/events.ts
drifted from ChatMessageFeedbackReasonCode in packages/contracts/src/api/chat.ts;
add `followed_design_system` and `missed_design_system` so the
analytics wire format stays aligned with the persistence shape.
- langfuse-trace.ts buildFeedbackPayload: the docblock claimed the
raw custom-reason text is bucketed before send. Product reversed
that on 2026-05-13 (raw text now ships, consent-gated). Replace
the stale comment with the real semantics + a note that there is
no tombstone path for reason codes the user removes in a
follow-up submission (left as scope for a later PR).
- AssistantMessage.tsx: remove the now-unused
`AssistantFeedbackAnalyticsCtx` interface and a stray blank-line
delete from the rebase; restore the analytics-context comment
above the feedback hook.
Left as follow-up (intentional, documented in code):
- Sending a tombstone score when the user clears their rating —
ProjectView still skips reportChatRunFeedback on `change===null`,
so Langfuse retains the previous rating until the user re-submits.
The PostHog event captures the clear separately.
- Removing reason-code scores when the user re-submits with a
smaller set — buildFeedbackPayload only overwrites the codes
present in the current payload.
* feat(analytics): wire PR's dedicated assistant_feedback_* events
The four dedicated event types (`assistant_feedback_click` /
`_reason_view` / `_reason_click` / `_reason_submit`) the PR added to
contracts were sitting unused after the rebase because main's
umbrella `surface_view` / `ui_click` / `feedback_submit_result`
emissions covered the same user gestures. Wire the dedicated events
alongside the umbrella ones so both wire formats fire on every
feedback action — dashboards / evals can pick whichever schema they
were built against without losing signal.
Each dedicated event has stricter typing than its umbrella sibling
(`project_id` / `project_kind` / `conversation_id` are non-null), so
the new emissions are guarded behind a presence check and skipped on
test renders that mount AssistantMessage without project context. The
umbrella emissions retain their nullable fallbacks unchanged.
Pairing:
- surface_view (feedback reason panel) ↔ assistant_feedback_reason_view
- ui_click (feedback button) ↔ assistant_feedback_click
- ui_click (reason submit button) ↔ assistant_feedback_reason_click
- feedback_submit_result ↔ assistant_feedback_reason_submit
Reason click + submit share the existing `requestId` so PostHog can
stitch click→result across both schemas, matching the spec.
|
||
|
|
10e2019c59
|
Fix plugin publish and Open Design PR workflow UX (#2564)
* Fix plugin publish and PR workflow UX * Update plugin workflow test expectations * Fix fake gh repo view verification path * Fix plugin publish headless tests and preserve PATH in shell wrappers. The publish-repo flow needs real git commits and fake gh auth output that matches gh auth status parsing. Login shells no longer drop PATH so test fakes and agent wrappers stay visible to nested gh/git calls. Co-authored-by: Cursor <cursoragent@cursor.com> * Restore plugin action card when share-task startup fails. If startGeneratedPluginShareTask rejects before a task is created, clear hiddenAssistantPluginActionPaths so the assistant action card reappears. Co-authored-by: Cursor <cursoragent@cursor.com> * Make daemon vitest self-contained for publish-github CLI shell-outs. Build dist/cli.js in tests/setup.ts when missing and set OD_DAEMON_CLI_PATH before server.ts resolves OD_BIN, so headless plugin tests pass from a clean checkout without a prior manual daemon build. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> |
||
|
|
3a33a7b475
|
fix(web): localize quick brief prompt (#2520)
* fix(web): localize quick brief prompt Generated-By: looper 0.8.1 (runner=worker, agent=codex) * fix(web): pass locale from design system chat Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(web): preserve task-type routing options Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(web): preserve task-type routing options Generated-By: looper 0.8.1 (runner=fixer, agent=codex) |
||
|
|
047d2bdd95
|
fix(orbit): respect selected app language (#2522)
* fix(orbit): respect selected app language Generated-By: looper 0.8.1 (runner=worker, agent=opencode) * fix(orbit): respect selected app language Generated-By: looper 0.8.1 (runner=fixer, agent=opencode) |
||
|
|
6bb0f0fd91
|
feat(observability): web lifecycle telemetry + stable installationId migration (#2527)
* feat(observability): web lifecycle telemetry + stable installationId migration Two intertwined safety-telemetry additions for the 0.8.0 release. Web lifecycle observability --------------------------- New `apps/web/src/observability/` module installed at module load via client-app.tsx — alongside the existing error-tracking exception hooks from #2521. Reuses error-tracking's direct-fetch transport (the same consent-bypass + early-buffer guarantees) so every event flows even when the user has opted out of general analytics: - client_long_task PerformanceObserver longtask >100ms (real "feels janky" signal, FPS proxy) - client_white_screen app fails to mount after 5s; MutationObserver cancels the timer the moment the React root renders so a normal boot is zero events - client_resource_error capture-phase window.error catches failed <script>/<link>/<img>/<iframe> loads (chunk-load failures, broken artifact refs) - client_boot_timing navigationStart → load timings via Navigation Timing v2 - client_visibility_change visibilitychange + page lifetime - client_session_summary real foreground duration emitted on pagehide - client_run_stuck 5min watchdog on SSE runs that don't progress (#2464 / #2405 / #1451 in data form) - client_iframe_error FileViewer iframe load failures (iframe errors don't bubble to window, so the global resource-error observer can't see them) - desktop_renderer_crash Electron main observes render-process-gone and forwards to daemon /api/observability/event - daemon_uncaught_exception daemon_unhandled_rejection process-level handlers on the daemon error-tracking.ts is generalised: `reportSafetyEvent(name, props)` now exposes the same buffer + direct-fetch transport that `reportHandledException` used, with identical $exception wire shape preserved for the existing exception path. Daemon cross-process bridge --------------------------- New `AnalyticsService.captureSafety()` skips the consent re-check and posts via posthog-node with installationId as distinct_id. Wired into: - `POST /api/observability/event` for desktop main and any future helper process that needs to ship a safety event (no consent check — same contract as web's direct-fetch path) - `process.on('uncaughtException')` / `unhandledRejection` on the daemon itself Stable installationId across reinstalls (critical for 0.8.0 rollout) -------------------------------------------------------------------- installationId previously lived in `<namespace>/data/app-config.json`, so a packaged reinstall that churned the namespace token (or any future namespace-scoped data wipe) rotated the id and the user showed up as a brand-new PostHog person. This is the immediate trigger: when 0.8.0 ships, every 0.7.x user upgrading would silently double the user count. New module `apps/daemon/src/installation.ts` reads/writes `<installationDir>/installation.json` at the channel root. The daemon gets the path from `OD_INSTALLATION_DIR`, set by `apps/packaged/src/sidecars.ts` to `paths.installationRoot` (one level above `namespaces/` — e.g. `~/Library/Application Support/Open Design Nightly/` on mac). `readAppConfig` transparently merges: if installation.json has an id it wins; if only app-config.json has one (the 0.7.x state), it gets mirrored to installation.json on the next read. `writeAppConfig` mirrors any explicit installationId write, including the null-clear path used by Settings → "Delete my data". 7 call sites of readAppConfig keep their signatures unchanged. Survives: - same-channel reinstall (DMG drag-replace, NSIS reinstall) - namespace churn between packaged builds - per-namespace data reset (future installer that clears `<ns>/data/`) Still rotates (intentionally): - explicit "Delete my data" - manual `rm -rf "~/Library/Application Support/Open Design <Channel>/"` - different channel (Stable vs Nightly stay distinct because userData paths differ; that's the existing channel-isolation contract) What this changes for posthog-js -------------------------------- client.ts had `capture_exceptions: false` from #2521; nothing else changes. autocapture / $pageview / $autocapture / track() / daemon analyticsService.capture() — all unchanged. New events are additive. Validation ---------- - pnpm guard pass - pnpm typecheck whole repo pass - pnpm --filter @open-design/web test 200 files / 1824 tests - pnpm --filter @open-design/daemon test 251 files / 2981 tests (includes 10 new tests in installation.test.ts pinning the 0.7.x → 0.8.0 migration, namespace-wipe survival, delete-my-data clear, and fresh-id rotation) - pnpm --filter @open-design/packaged test 9 files / 89 tests - Pre-existing baseline: apps/desktop/src/main/updater.ts has typecheck references to RELEASE_CHANNEL_NAMES.PREVIEW/NIGHTLY on release/v0.8.0; unrelated to this PR. * fix(observability): preserve fatal exit on uncaught + skip loading shell in white-screen check Addresses codex review on PR #2527 (Siri-Ray). 1) Daemon process handlers must keep Node fatal semantics Installing an uncaughtException listener silences Node's default crash/exit; Node 15+ does the same for unhandledRejection when a listener is present. The previous handlers logged telemetry and let control return to the event loop, leaving a corrupted daemon serving requests instead of letting the supervisor restart it cleanly. triggerFatalShutdown() now: - dispatches captureSafety once (guarded against re-entry from cascading faults) - races posthog-node's shutdown against a 1s bounded timeout so a slow flush can't keep the process alive - calls process.exit(1) after the race resolves Both uncaughtException and unhandledRejection route through it. apps/daemon/tests/uncaught-fatal-shutdown.test.ts pins: - captureSafety is invoked exactly once even on repeated faults - exit(1) fires on the happy path - exit(1) still fires when shutdown hangs past the timeout - exit(1) still fires when captureSafety itself throws 2) White-screen detector treated the loading shell as a successful mount apps/web/app/[[...slug]]/client-app.tsx renders the dynamic-import fallback as <div class="od-loading-shell">Loading Open Design…</div> whose visible text (19 chars) exceeded the previous 10-char floor. monitorMount() would therefore cancel the 5s timer the instant Next swapped the loading shell in, completely missing the white-screen signal the observer is meant to add. isAppMounted() now: - primary signal: <html data-od-app-mounted="1"> set by App.tsx's first useEffect — authoritative because once App has mounted at least once, any later tree crash is an $exception story, not a white-screen story - fallback: only counts children of the root container whose classList does NOT include known loading-shell markers (od-loading-shell). Their visible text drives the > MIN_VISIBLE_TEXT check, so the loading sentinel can never be mistaken for a mount. apps/web/tests/observability/white-screen.test.ts pins: - fires client_white_screen when only the loading shell is present after the timeout - does NOT fire when data-od-app-mounted is set before the timeout - cancels the timer the moment a real workspace-shell child appears alongside the loading shell - still fires when only sub-MIN_VISIBLE_TEXT non-shell content is present (effectively blank) Validation: - pnpm guard pass - pnpm typecheck pass - pnpm --filter @open-design/daemon test 252 files / 2985 tests - pnpm --filter @open-design/web test 201 files / 1828 tests * fix(observability): await captureSafety enqueue before fatal shutdown flush Addresses second-pass codex review on PR #2527 (Siri-Ray, 3279268246). The previous fatal-shutdown path called `analyticsService.captureSafety()` synchronously and immediately raced `analyticsService.shutdown()` against the bounded timeout. captureSafety in apps/daemon/src/analytics.ts does its real `client.capture()` call only inside an async IIFE after `await readInstallationIdSafe()` — so shutdown could win the race, drain an empty posthog-node queue, and let `process.exit(1)` run BEFORE the daemon crash event ever got enqueued. We'd then preserve the process-lifecycle contract but lose the exact signal this PR is adding. Changes: - AnalyticsService.captureSafety now returns Promise<void>. The async IIFE is gone; the body awaits readInstallationIdSafe directly so the returned promise resolves only AFTER client.capture() has been invoked (which is when posthog-node's local buffer contains the event). - server.ts triggerFatalShutdown awaits captureSafety, then calls shutdown, and races that whole sequence against the 1s bounded timeout. Capture failures still don't block exit (try/catch around the await). - NOOP_SERVICE.captureSafety becomes `async () => undefined` to match the new signature. - Fire-and-forget callers (/api/observability/event) are unaffected; voiding the returned promise keeps them non-blocking. apps/daemon/tests/uncaught-fatal-shutdown.test.ts adds the reviewer- requested fixture: - 'waits for the captureSafety promise to settle before invoking shutdown' — gives capture a 50ms delay and shutdown a separate 50ms delay so the intermediate "capture done / shutdown not yet" state is observable. - 'still aborts and exits if captureSafety hangs past the bounded timeout' — captureSafety never resolves; the outer 1s timeout still forces process.exit(1). Validation: - pnpm guard pass - pnpm typecheck whole repo pass - pnpm --filter @open-design/daemon test 252 files / 2987 tests |
||
|
|
88dee44892
|
feat(analytics): always-on $exception capture with early window hooks (#2521)
PostHog Error tracking was missing the vast majority of real exceptions:
1. posthog-js's capture_exceptions: true is silenced by opt_out_capturing,
so every opted-out user vanished from the error feed even though we
could perfectly safely keep collecting their stacks (the consent
toggle's user copy gates analytics, not safety telemetry).
2. posthog-js is dynamically imported only after /api/analytics/config
resolves AND the user has consented. Errors thrown during the first
1-2 seconds (React hydration, early effects) had no listener to
catch them.
Net effect: 14d $exception count was 54 events / 10 users across ~5k DAU,
producing the misleading 99.93% crash-free curve in PostHog's dashboard.
This PR makes exception capture independent of both gates:
- apps/web/src/analytics/error-tracking.ts (new): own window.error +
unhandledrejection handlers, in-memory buffer (capped at 50 entries),
direct fetch to https://<host>/i/v0/e/ with the public phc_ key. Same
scrub layer as the posthog-js path so file paths still get redacted.
- apps/web/app/[[...slug]]/client-app.tsx: installErrorHandlers() at
module-load, before React or any feature code can throw.
- apps/web/src/analytics/provider.tsx: bootstrapExceptionTracking() in
the identity useEffect, parallel to getAnalyticsClient() — runs
regardless of consent state, fetches /api/analytics/config, hands the
phc_ key + host + distinctId to the error tracker so buffered events
can flush.
- apps/web/src/analytics/client.ts: capture_exceptions: false so
posthog-js stops also emitting $exception (would have produced
duplicate events server-side); also re-bridges the error-tracking
context inside the loaded() callback so future events inherit the
fully-resolved appVersion / sessionId.
- apps/daemon/src/server.ts + packages/contracts: /api/analytics/config
now returns key + host even when consent=false. enabled still reflects
only the analytics consent toggle (posthog-js full autocapture stays
off when enabled=false), but the always-on error tracker can read key
directly. Forks without POSTHOG_KEY still get key=null and the whole
pipeline becomes a no-op — fork-safe by construction.
- apps/web/src/analytics/scrub.ts: regex fix so packaged-mac paths like
/Applications/Open Design.app/Contents/Resources/apps/web/... (which
contain a space) get fully rewritten to app://apps/web/...; previously
the [^\s] guard stopped at 'Open' and leaked the install dir.
Validation:
- pnpm --filter @open-design/web typecheck: pass
- pnpm --filter @open-design/web test: 199 files / 1823 tests pass
(includes 8 new error-tracking.test.ts cases for buffer cap, hook
install, scrub, and direct dispatch)
- pnpm --filter @open-design/daemon test: 250 files / 2971 tests pass
- pnpm guard: pass
After release/v0.8.0 ships and rolls out, expect the crash-free curve to
drop from the artificial 99.93% to a realistic 95-98% — that's not a
regression, it's the first time we're measuring it.
|
||
|
|
f5f8937421 |
Merge origin/main into release/v0.8.0
Conflict resolved by taking origin/main: - apps/web/src/components/EntryNavRail.tsx design-systems rail button icon name palette-filled (release-side) -> blocks (main); main's icon swap is part of the more recent design-systems rail pass. |
||
|
|
ce95266586
|
[codex] Polish home composer working-directory controls (#2468)
Some checks failed
visual-baseline / Capture visual baselines (push) Waiting to run
ci / Detect CI change scopes (push) Successful in 1s
nix-check / build (push) Failing after 3s
ci / Preflight (push) Failing after 2s
ci / Core package tests (push) Failing after 1s
ci / Tools workspace tests (push) Failing after 1s
ci / Daemon workspace tests (1/2) (push) Failing after 1s
ci / Daemon workspace tests (2/2) (push) Failing after 1s
ci / Web workspace tests (push) Failing after 1s
ci / E2E vitest (push) Failing after 1s
ci / Playwright critical (starters) (push) Failing after 1s
ci / Playwright critical (core) (push) Failing after 1s
ci / Build workspaces (push) Failing after 1s
ci / App workspace tests (push) Failing after 0s
ci / Validate workspace (push) Failing after 0s
ci / Runtime trace (push) Has been skipped
* Polish design system home flows * Polish home prompt presets * Polish home working directory controls * test: align home hero chrome smoke * fix: stabilize home composer ci checks --------- Co-authored-by: qiongyu1999 <2694684348@qq.com> |
||
|
|
722ddfa235 |
Merge origin/main into release/v0.8.0
Conflicts resolved by taking origin/main on both files. Root cause: main's PR #2460 (fix(landing): align logo.webp with brand icon) changed HomeHero.tsx's .home-hero__brand-mark to render <img src=/app-icon.svg> instead of an inlined <HeroBrandIcon /> SVG, and bundled the matching CSS (26px round badge with bg-panel + border + padding 2px) plus a gap/font-size tune. The release-side visual-refresh CSS still targeted the SVG layout (38px square, transparent, inset SVG selector). Keeping release's CSS would leave main's <img> unstyled. - apps/web/src/styles/home/home-hero.css three blocks, all taken from main: .home-hero__brand gap 8px, .home-hero__brand-mark redesigned for <img> child, .home-hero__brand-name font-size 16px. - apps/web/src/index.css two blocks, both taken from main: workspace tab close column 22px and .workspace-tab__close 18x18 (paired tune-down of tab UI spacing). |
||
|
|
8193981511
|
Keep PR 2400 changes without folder pickers (#2462)
* feat(daemon): add project working directory management and editor hand-off functionality - Introduced new flags for project commands to manage working directories, including `--working-dir` and `--dir`. - Implemented API routes for listing available editors and opening projects in selected editors. - Added a hand-off button in the ChatPane header to facilitate opening project folders in local applications. - Enhanced the HomeHero component to include working directory and design system settings, improving user experience in project creation. - Created HomeHeroSettingsChips component for inline management of working directory and design system selection. * feat(chat): implement voice transcription proxy and enhance UI components - Added a new API route for voice transcription using OpenAI's `/audio/transcriptions` endpoint, allowing users to send audio blobs directly for transcription. - Integrated multer for handling audio file uploads in memory, ensuring efficient processing without disk storage. - Updated the HomeHero component to include example prompt suggestions for plugins, enhancing user interaction. - Introduced the EditorIcon component to visually represent different editors in the hand-off menu, improving the user experience. - Refined the HandoffButton component to utilize the new EditorIcon, providing a more cohesive interface for selecting editors. - Enhanced CSS styles for various components to improve layout and responsiveness, including adjustments to tab and button sizes for better usability. * style(workspace-shell): enhance layout and overflow handling - Updated CSS for .workspace-shell to ensure full viewport width and height, with proper overflow management. - Adjusted grid layout to prevent content overflow and maintain responsiveness. - Modified styles for .workspace-tabs-chrome to improve width handling and prevent overflow issues. * refactor(chat): remove voice transcription proxy and related components - Deleted the voice transcription proxy implementation, including the associated API route and multer configuration. - Removed the MicButton component from the ChatComposer and HomeHero components to streamline the UI. - Updated HomeHero to include example suggestions without the voice input functionality. - Adjusted CSS styles for various components to maintain layout consistency after the removal of the MicButton. * feat(daemon): implement minting of HMAC tokens for working directory management - Added a new function `mintImportTokenFromCurrentSecret` to generate HMAC tokens bound to a specified base directory, enhancing security for working directory operations. - Updated the `desktop-auth.ts` file to include the new token minting functionality, which returns structured errors when the desktop auth secret is cleared. - Introduced new IPC message types for minting import tokens in the sidecar protocol, allowing seamless integration with the daemon's working directory management. - Enhanced the `WorkingDirPill` component to utilize the new token minting flow for secure directory selection in desktop builds. - Updated CSS styles for the HomeHero component to accommodate new example suggestion features and maintain layout consistency. * fix(HomeView): import HOME_HERO_CHIPS constant for improved chip management - Updated the HomeView component to import the HOME_HERO_CHIPS constant from the chips module, enhancing the management of hero chips within the component. * feat(daemon): implement mintImportTokenViaSidecar for secure working directory management - Introduced the `mintImportTokenViaSidecar` function to facilitate the minting of HMAC tokens for desktop-import operations via the daemon's sidecar IPC. This allows CLI commands to bypass authentication when the desktop-auth gate is active. - Updated the CLI to utilize the new token minting function when setting the working directory, ensuring secure access to trust-gated API endpoints. - Enhanced the sidecar server to handle minting requests and return structured error messages for improved user feedback. - Added tests to validate the new token minting functionality and its integration with the working directory management process. - Refactored related components to support the new token flow, improving overall security and user experience. * feat(HomeHero): enhance UI components and styles for improved user experience - Updated HomeHero component to replace active dot indicators with Plug icons for better visual representation of active plugins. - Adjusted CSS styles for various elements, including padding and dimensions, to enhance layout consistency and responsiveness. - Introduced new styles for active type icons and improved hover effects for buttons. - Updated HomeHeroSettingsChips to change button titles and icons for clarity. - Added tests to ensure proper rendering and functionality of updated components. * feat(ProjectDesignSystemPicker): enhance design system selection with preview functionality - Updated the ProjectDesignSystemPicker component to include a preview feature for design systems, allowing users to see a preview of the selected design system. - Implemented hover functionality to update the preview based on the hovered design system. - Added fullscreen preview capability for a more immersive experience. - Enhanced CSS styles for the design system picker to improve layout and responsiveness. - Introduced tests to validate the new preview functionality and ensure proper interaction within the component. * feat: refactor project metadata handling and enhance design system picker - Updated the default scenario plugin ID retrieval to use project metadata, improving the logic for determining the appropriate plugin based on project intent. - Enhanced the ProjectDesignSystemPicker and related components to support localized design system summaries and categories, improving user experience. - Introduced new translations for working directory and design system picker components, ensuring better accessibility and usability across different locales. - Added a new 'live-artifact' project type to the HomeHero chips, expanding the functionality for users creating refreshable artifacts. - Updated tests to validate the new project metadata handling and design system picker functionalities. * feat: enhance localization and styling for design system components - Added French translations for working directory and design system picker components, improving accessibility for French-speaking users. - Updated CSS styles for the pet task item to ensure consistent padding and layout. - Introduced a new test suite for HomeHeroSettingsChips to validate localization and design system selection functionality. - Enhanced ProjectDesignSystemPicker tests to ensure proper localization and interaction with design system categories. * fix: update .gitignore to include all claude-sessions directories and remove specific session files - Modified .gitignore to ensure all claude-sessions directories are ignored by using a wildcard pattern. - Deleted two specific claude-sessions markdown files to clean up unnecessary session data. * fix: repair home automation ci regressions * fix: stabilize artifact consistency e2e * Remove folder picker changes from PR 2400 --------- Co-authored-by: pftom <1043269994@qq.com> Co-authored-by: qiongyu1999 <2694684348@qq.com> |
||
|
|
255c3058c5
|
fix(analytics): app_version=0.0.0 + media providers clicks + lock run_finished error_code (#2453)
* fix(analytics): use state for runtime app version so PostHog gets the real value
`useAppVersion()` stored the fetched `/api/version` result in a `useRef`,
but ref writes do NOT trigger a re-render. The hook therefore kept
returning '0.0.0' forever and the downstream `useEffect` that calls
`client.register({ app_version, ui_version })` never re-ran with the
real version. PostHog dashboards then showed `app_version=0.0.0` and
`ui_version=0.0.0` on every event ever shipped from the web client.
Switching to `useState` lets the resolved version flow through React's
render cycle so the register-on-change effect picks it up. The boot
placeholder still ships as '0.0.0' for the first events before the
fetch resolves (we don't re-emit those), but every event after init
now carries the real daemon-pinned version.
Adds a red-spec at apps/web/tests/analytics-app-version.test.tsx that
went red on the `useRef` shape (`expected '0.0.0' to be '1.2.3'`) and
green on the `useState` shape, so a future refactor can't silently
regress it.
* feat(analytics): wire media providers click events + lock run_finished error_code invariant
Two analytics gaps shipped together because both came out of the same
PostHog spot-check after PR #2390 landed:
1. Settings → Media providers (CSV row "client_type=desktop / mason /
media_providers") wasn't emitting any ui_click events. The contract
type `SettingsMediaProvidersClickProps` and helper
`trackSettingsMediaProvidersClick` were defined but no call site
used them, so the dashboard showed zero traffic on every element.
Added the four v2 elements:
- `reload` on the "Reload from daemon" button
- `key_input` on every per-provider API key field (onFocus, mirrors
the BYOK key field pattern in this same dialog)
- `url_input` on every per-provider base-URL field
- `clear` on each row's Clear button (fires before the confirm
dialog so the intent signal is recorded even if the user backs
out)
Each event carries `providers_id` (provider.id) and `is_configured`
(truthy when the row has a stored entry).
2. `run_finished` with `result=failed` was reported as missing
`error_code` on PostHog. Audited every failure path: the daemon's
`child.on('close', ...)` handler has several branches that call
`runs.finish('failed', code, signal)` directly without first
emitting an SSE `error` event (ACP fatal, agentStreamError fall
through, child close without diagnostic), leaving
`run.errorCode === null` in the status body. The existing fallback
in `server.ts` already derives `AGENT_SIGNAL_*` / `AGENT_EXIT_*` /
`AGENT_TERMINATED_UNKNOWN` from `signal` / `exitCode` for those
cases, so the wire emission should never blank out — but the logic
was inline and had no unit coverage.
Extracted the result/error_code derivation into
`apps/daemon/src/run-result.ts` and added 12 unit tests covering:
- explicit errorCode forwarding
- signal-only failures
- exit-code-only failures
- clean (code=0) failures (ACP fatal shape)
- cancelled runs (with and without stamped code)
- empty-string errorCode defensive case
- status→result mapping for succeeded/canceled/failed/unknown
All 12 pass — confirming the invariant "result=failed always
carries error_code" holds for every failure shape the daemon
produces. The refactor pins that invariant so a future change
loses test coverage rather than silently regressing on PostHog.
If `error_code` still looks empty on a live event, share the
PostHog event JSON + the agent id and I'll dig further — at this
point the daemon emission itself is exercised end-to-end.
|
||
|
|
1cfe274a90 |
Merge origin/main into release/v0.8.0
Conflicts resolved by taking origin/main on all six points:
- apps/web/src/components/HomeHero.tsx:479-487 brand div removed
(main dropped the .home-hero__brand wrapper; the release-side visual
refresh still had it).
- apps/web/src/components/HomeHero.tsx:894-898 attach Icon size
18 (main's update) replaces 20 from release.
- apps/web/src/components/HomeHero.tsx:913-927 submit button uses
<Icon name="arrow-up" size={22} /> (main's component refactor)
instead of the release-side inline SVG.
- apps/web/src/components/EntryShell.tsx:578-582 Discord Icon size
14 (main) instead of 16 (release).
- apps/web/src/styles/home/home-hero.css drop .home-hero__brand /
__brand-mark / __brand-name rules — main removed both the component
div and these CSS rules together; keeping the CSS would be dead code.
- apps/web/src/styles/home/entry-layout.css Discord badge icon color
#5865f2 (main, the brand color introduced by PR #2386) instead of
release's neutral var(--text-strong).
|
||
|
|
31ca20f2c6
|
Add packaged update apply observations (#2429) |