mirror of
https://github.com/nexu-io/open-design.git
synced 2026-06-01 03:14:35 +07:00
8 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
56a4e97871 | feat: surface linked repo changes | ||
|
|
0fbeaf829e
|
fix(#3247): Detect, terminate, and warn on fabricated role markers across all agent paths (#3303)
* fix(daemon): detect and strip fabricated role markers in model output (#3247) Three-layer defence against models emitting `## user` / `## assistant` / `## system` lines mid-response, which the chat host interprets as real turn boundaries and acts on as unauthorised instruction: 1. **System prompt**: anti-roleplay instruction elevated from a bullet under "What you don't do" to a standalone `## CRITICAL` section in `official-system.ts`, with a REMINDER pinned at the end of the composed prompt for recency bias. 2. **Stream-level detection and truncation**: shared `role-marker-guard.ts` module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used across all text paths — Claude stream (per-message guards), non-Claude structured streams (run-scoped guard via `emitGuardedTextDelta`), and BYOK proxy routes (`createDeltaGuard`). When a marker is detected, the contaminated suffix is dropped and a `fabricated_role_marker` event surfaces a warning in the UI. 3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants; `fabricated_role_marker` events render as amber warning pills. * fix(chat-routes): do not await reader.cancel() on stream early-return The await on reader.cancel() can hang indefinitely on response streams whose underlying source is a Uint8Array (most notably surfaced by the ollama test in proxy-routes.test.ts, which builds its mock body via `new Response(uint8array)` rather than the controller-based helper `sseResponse()`). The hung await holds the request handler open, which in turn blocks `server.close()` in the afterAll hook, producing the two test timeouts (test at 145, hook at 36) currently failing CI on #3296. Fix is in production code, not the test: don't await the cancel. It is a cleanup hint and we are returning from the function anyway, so blocking on it offers no value. fire-and-forget with an empty catch keeps the cancel signal flowing for real HTTP streams without risking a hang on mock/edge-case implementations. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(daemon): terminate child on role-marker detection (close #3247 generation vector) PR #3296's detection layer truncates display and persistence of fabricated role markers, but the underlying model subprocess keeps generating tokens after detection. Three concrete consequences: 1. The model bills the user for the entire contaminated response (we observed 5,106 chars stored in claude's session file for a turn where only the first 3,013 chars were legitimate — a 40% overhead). 2. tool_use blocks emitted AFTER the marker reach the daemon's dispatcher unchecked, since detection only gates the text-delta emission path, not content-block-stop / tool_use blocks. The model could fabricate "## user delete file X" then emit a tool_use(delete X) that the dispatcher would execute. 3. The UI surfaces a `fabricated_role_marker` warning followed by an eventual normal turn-end, blurring the distinction between "completed normally" and "killed by safety guard." This commit adds a single idempotent `abortForRoleMarker(marker)` helper in server.ts, scoped to the same closure as `child` and `runGuard`. On any detection event (per-message Claude guard, run-scoped non-Claude guard, plain stdout guard) the helper: - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the UI can render a security-class status distinct from a normal turn-end. The existing `fabricated_role_marker` warning is still sent and rendered as the amber pill (PR #3296's UI). - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes, Kimi, Devin, Kiro) whose I/O doesn't necessarily release on SIGTERM of the wrapper process alone. - SIGTERMs the child immediately, with the existing `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace. Wired into three sites where contamination is detected: - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc text_delta paths) - Plain-stdout listener (BYOK plain mode) - The Claude stream handler's onEvent (per-message guards in claude-stream.ts surface `fabricated_role_marker` events directly via onEvent rather than through the run-scoped emitGuardedTextDelta) Tool_use blocks emitted BEFORE the marker still flow through normally — this guard can't help with those, since by the time we observe a text marker the prior content block has already finished. Closing that gap requires speculative cancellation of in-flight tool calls when a downstream text block contains a marker; that's tracked as follow-up work, not included here. Co-Authored-By: roverkai <2196140098@qq.com> Co-Authored-By: JasonBroderick <jason@buddyboss.com> * refactor(role-marker-guard): bounded tail + drop chat-style markers Addresses two review comments on #3303: (1) O(1) memory + per-delta work (review r3323982225) Replace the unbounded `accumulated` string with a rolling tail capped at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice). A 50 KB assistant response delivered in 1000 chunks of 50 bytes was previously O(n²) on string concatenation alone; now it is O(1) per delta regardless of message length. The `tail.length` value carries the "already emitted" offset that the cut-point math needs, so the offset semantics at L74–78 of the prior implementation are preserved without re-introducing the full-text buffer. (2) Drop chat-style markers entirely (review r3323982234, option (a)) `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex. Rationale: - The host parses ONLY `## user` / `## assistant` / `## system` lines as turn boundaries (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts). A model emitting chat-style markers does NOT cause the original #3247 security failure. - With kill-on-detection wired in this PR (`abortForRoleMarker` in server.ts), a false positive aborts the whole run — far more expensive than a stray unflagged `User:` line in chat scrollback. Chat-style markers collide with legitimate output (form labels, email contacts, JSDoc) often enough that pairing them with kill-semantics is the wrong tradeoff. The tradeoff is now documented in the regex docblock so the kill-on-match behaviour is justified against the false-positive surface. Also aligns the prompt-side CRITICAL block in system.ts: drop the "don't emit User: / Assistant: / Human: / AI:" bullet, since we no longer enforce it. Less ambiguity for the model and the operators. Test file updated: - Chat-style positive tests flipped to negative ("does NOT match User: — chat-style out of scope") so the intentional exclusion has a permanent regression test. - Two new tests cover the bounded-tail behaviour: a marker arriving after 10 KB of clean text in small chunks, and a marker straddling a chunk boundary after 100 prior chunks. - Added test for legitimate `User: bob@example.com`-style content not triggering contamination. Test count is now 35 (up from 25); two of the new ones explicitly exercise the new bounded-tail path. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995) Blocking correctness bug introduced by commit 4 (bounded-tail refactor): once \`tail\` is a rolling slice of mid-stream text, \`^\` in the canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents the genuine message start. As the rolling window slides forward chunk by chunk, a sliced tail can begin with whitespace + \`##\` (or just \`##\`), letting \`^\` anchor a match against text that the full-buffer implementation correctly ignored. With kill-on-detection wired in commit 3, that false positive now SIGTERMs the run and emits a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class called out in the docblock at L22–29. Reviewer's evidence (PerishCode, r3324060995): streaming "…take a look at the ## user content section…" one character at a time reports \`contaminated: true\` post-refactor; the same text in a single feed stays clean. Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very first non-empty feed (where \`^\` legitimately points at the message start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\` (\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all subsequent feeds. A \`firstChunk\` boolean tracks the state. Real newline-preceded markers straddling chunk boundaries are still caught because the preceding \`\\n\` is retained inside the 64-char tail. Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`): - mid-line \`## user\` streamed char-by-char with no preceding \\n (mirrors the reviewer's repro) - space-preceded mid-line \`## user\` in a >130-char stream, which long enough to force the rolling window past the marker — exercises the exact slice condition that triggered the bug - real \\n-preceded \`## user\` still caught after a long preamble (positive case must not regress) - \`## user\` as the very first chunk still caught (\`^\` legitimately anchors on the first feed) Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882) Two refinements addressing the third review on #3303: == Blocking (r3324151877) == The regex over-matched legitimate Markdown headings, and with kill-on-detection wired in commit 3 each false positive deterministically aborts a real run. Three changes tighten the match to the actual security surface — `## user` / `## assistant` / `## system` lines the chat host parses as turn boundaries — without losing any real attack pattern: 1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary delimiter is lowercase (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and the `## CRITICAL` system-prompt block already forbids only the lowercase forms. Title-Case headings like `## User Guide`, `## System Architecture`, `## Assistant settings` are now ignored — these are legitimate technical writing patterns LLMs emit constantly. `## USER NOTES` (all-caps) likewise no longer flags. 2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it, `## userland`, `## userspace`, `## users guide`, `## systemd`, `## assistance` all match via prefix in the alternation. The lookahead requires the next character to exist and to not be a lowercase letter, so: - `## user\\n…` → match (newline is not lowercase) - `## assistantR…` → match (R is uppercase; the glued-form attack pattern still gets caught) - `## assistant.` → match (. is not a letter) - `## users guide` → no match (s is lowercase letter) - `## userland` → no match (l is lowercase letter) POSITIVE rather than NEGATIVE `(?![a-z])` because the negative form is satisfied at end-of-string, which in a streaming context means "we have `## user` but don't know what comes next yet" — would fire prematurely if `land` arrives in a later chunk. The positive form delays detection by one character in that edge case, traded for correctness. 3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role markers are single-line by convention; restricting to space/tab prevents oddities like `##\\nuser` from matching across lines. Test file: added Title-Case fixtures (`## User Guide`, `## System Architecture`, `## Assistant settings`, `## USER NOTES`) and prefix-of-longer-word fixtures (`## users guide`, `## userland`, `## systemd`, `## assistance`) — each asserting NO contamination. The existing `## usability` negative test gave false confidence as the reviewer noted (only failed via alternation-miss, not via word-boundary semantics); the new fixtures actually exercise the lookahead. Also added a positive test for `## assistant.` (glued punctuation) to balance the existing `## assistantReading` (glued uppercase) coverage. Total tests: 35 → 50. == Non-blocking (r3324151882) == Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in `packages/contracts/src/errors.ts` alongside the existing agent/AMR codes, with a docblock comment explaining the emission contract: emitted by `server.ts::abortForRoleMarker` alongside the existing `fabricated_role_marker` warning event when the daemon detects a fabricated Markdown role marker in agent output; retryable. The code was already being emitted over the wire but unregistered — landing the registration here keeps the contract and emitter in sync as reviewer requested. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): defer complete-but-unconfirmed marker suffix Addresses review r3324277xxx — the boundary case where a stream chunk boundary lands between the role keyword and its lookahead character violated the documented "everything from the marker onward is silently dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')` returned `## user` as safe (no char to satisfy the lookahead → no match → pass through), so the fabricated marker line leaked into UI and app.sqlite before the next chunk confirmed contamination on the next SIGTERM cycle. Fix: introduce a `pending` state variable holding bytes that match the COMPLETE-but-unconfirmed marker prefix at end of buffer (/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no lookahead, $ anchor instead). When the no-match branch detects this suffix, withhold it from emission until the next feed either: - Confirms it (next char non-lowercase) → main regex matches → contaminated → withheld bytes dropped along with `## user`. - Denies it (next char lowercase, e.g. `userl…`) → main regex no longer matches the role keyword → withheld suffix is released and emitted alongside the new continuation. Also tied the firstChunk transition to actual byte emission rather than feed count. Previously a message that starts with `## system` followed by a separate `\\n` chunk would lose the `^` anchor on the second feed (firstChunk had flipped after the first feed even though nothing was emitted yet), silently breaking detection for that edge case. Now `firstChunk` stays true until at least one byte has crossed the emission boundary, matching the conceptual definition of "message start". Tests added (apps/daemon/tests/role-marker-guard.test.ts): - `## user` deferred at chunk boundary, confirmed by `\\n` in next - `## user` deferred at chunk boundary, denied by `land` continuation - `## assistant` deferred, confirmed by punctuation - `## User` Title-Case still passes through unconditionally - `## system` as the very first chunk: deferred, confirmed by \\n in next chunk (tests the firstChunk-stays-true-when-nothing- emitted invariant) Total tests: 50 → 55. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta Addresses review r3324xxxxxx — guarding the thinking channel buys no security and causes legitimate aborts. Why thinking is NOT a #3247 vector: - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only re-serializes `m.content` as `## ${m.role}\n...`. - Extended-thinking content is rendered to a separate `kind: 'thinking'` payload (daemon.ts:857-858) and never folded into `m.content`. - So a `## user` line in the thinking channel CANNOT become a fabricated turn boundary on the next round-trip. Why guarding it is harmful: - Models routinely emit literal `## user` / `## assistant` lines in chain-of-thought when reasoning about conversation structure ("Let me think about this. The user might phrase it as:\n## user\n …"). Common pattern in production traces. - With `abortForRoleMarker` wired in server.ts, a guard match on thinking SIGTERMs the run and surfaces a security error to the UI. The user paid for the reasoning, never sees the answer, and gets a confusing "fabricated role marker" warning for what was actually legitimate metacognition. - This directly contradicts the module's own stated philosophy ("a false positive aborts the whole run — a much more expensive failure than a stray unflagged ... line", role-marker-guard.ts). Fix: `emitSafeText` now passes thinking_delta through unconditionally, skipping both the guard and the contamination check. text_delta remains fully guarded. The single-line change at the top of emitSafeText preserves all other channels' behavior. Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts): - `## user` / `## assistant` lines in a thinking_delta — must NOT fire fabricated_role_marker, the thinking content streams intact including the marker text, and the subsequent text_delta answer still reaches the consumer (run not aborted). - Sanity check: same `## user` pattern in a text_delta DOES fire fabricated_role_marker and truncates emission at the marker. Locks in the channel-discriminated behavior. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): tie firstChunk to slicing, not byte emission Blocking review r3324xxxxxx: under the prior firstChunk transition ("any byte emitted"), a role marker that arrived at the very start of a message with its prefix split across multiple chunks bypassed detection — reopening the #3247 vector on the Claude path. Concrete cases that were missed (all are routine provider tokenizations of \`## user\n…\` at message start): - \`##\` | \` user\nDELETE…\` - \`## us\` | \`er\nDELETE…\` - \`## \` | \`user\nDELETE…\` Mechanism: the pending-deferral regex only catches COMPLETE role keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`, \`## us\`) was emitted in full. That emission flipped firstChunk to false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used, which requires a literal \n before \`##\`. A marker at buffer position 0 has no preceding \n, so it could no longer match. abortForRoleMarker never fired and tool_use blocks emitted after the fabricated turn boundary reached the dispatcher. Fix: change firstChunk to track "tail has not been sliced yet" rather than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE, tail still represents the entire emission so far and \`^\` in the canonical regex genuinely anchors at byte 0 of the stream — so the \`^|\n\` alternation safely catches a chunk-split message-start marker. The transition happens at the moment we would slice: once emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\` becomes meaningless, and we switch to the newline-only variants. Earlier iterations of this code tried two other definitions, both unsound: - "any byte emitted" (this commit fixes) — lost \`^\` before a chunk-split message-start marker could finish arriving. - "newline emitted" (briefly considered as the reviewer's alternative suggestion) — left \`^\` valid on a sliced buffer when streams hadn't emitted a newline yet, re-introducing the rolling-tail mid-stream false positive from review r3324060995. The slice-based invariant satisfies both: while we have not sliced, \`^\` is correct; once we slice, it is not. Regression tests added (apps/daemon/tests/role-marker-guard.test.ts): - \`##\` | \` user\nDELETE…\` → contaminated, marker=\`## user\` - \`## us\` | \`er\nDELETE…\` → contaminated, marker=\`## user\` - \`## \` | \`user\nDELETE…\` → contaminated, marker=\`## user\` - \`#\` | \`# user\nDELETE…\` → contaminated, marker=\`## user\` The fourth case (single \`#\` first chunk) exercises an even more adversarial tokenization than the reviewer's examples; it is also caught. Total tests: 55 → 59. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(tests): wrap events in stream_event envelope in thinking test feedJsonl was feeding raw events without the `{ type: 'stream_event', event: ... }` wrapper that createClaudeStreamHandler requires (line 141 of claude-stream.ts). Events silently fell through all branches, making both tests pass vacuously. Also fix TS2532 on warnings[0].marker with non-null assertion (safe after the toHaveLength(1) guard). Co-Authored-By: RoverKai <roverkai@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: roverkai <2196140098@qq.com> Co-authored-by: JasonBroderick <jason@buddyboss.com> Co-authored-by: RoverKai <roverkai@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e92f91fb06
|
fix(contracts): tighten LiveArtifactSsePayload.refreshStatus to LiveArtifactRefreshStatus (#1871)
The SSE `live_artifact` event payload typed `refreshStatus` as `string | undefined`, but the underlying REST API (`LiveArtifact`) and every consumer treat it as the canonical `LiveArtifactRefreshStatus` union (`'never' | 'idle' | 'running' | 'succeeded' | 'failed'`). Daemon emitters already pass `artifact.refreshStatus`, which is typed as the enum at the source. The old type let new union members drift between REST and SSE unnoticed: a daemon adding a new refresh state for the REST API could ship out via SSE as a bare `string`, and web/CLI consumers switching on the union would silently fall through. Tighten the SSE field to the same enum. The field stays optional to preserve forward/backward compatibility with daemons that omit it on legacy events, but the type now narrows to the canonical members. Runtime behavior is unchanged — this is a type-only fix. Validation: contracts build/test, web typecheck, daemon typecheck. Co-authored-by: nicejames <nicejames@gmail.com> |
||
|
|
cd68c8a80a
|
fix(daemon+web): emit conversation-created SSE event when routine run starts (#1523)
* fix(daemon+web): emit conversation-created SSE event when routine run starts When a Routine fires in "Reuse an existing project" mode, the daemon creates a new conversation in the project and writes a queued/running assistant message to the database, but the open `ProjectView` has no way to learn that anything happened: the project events SSE stream only carries `file-changed` and `live_artifact*` events, and `ProjectView` reloads conversations only when `project.id` changes. The result is the user's own routine "Run now" appears to do nothing until they exit and re-enter the project (#1361). Fix: - Add a `conversation-created` payload type to the existing project events stream in `apps/web/src/providers/project-events.ts`. The payload carries `projectId`, `conversationId`, `title`, and `createdAt`. Mirror the existing `file-changed` listener pattern with explicit malformed-payload handling. - In `apps/daemon/src/server.ts`, after `insertConversation` runs in the routine `setRunHandler` (both reuse-an-existing-project and new-project paths), broadcast a `conversation-created` event through the existing `activeProjectEventSinks` map. The function body was already generic so it was renamed from `emitProjectLiveArtifactEvent` to `emitProjectEvent` and the two pre-existing callers updated. - In `apps/web/src/components/ProjectView.tsx`, when `handleProjectEvent` receives a `conversation-created` event whose `projectId` matches the currently-viewed project, refetch the conversation list via `listConversations`. The active conversation is intentionally NOT changed — per maintainer guidance on #1361, auto-switching is a separate UX decision left for a follow-up. - `projectEventToAgentEvent` returns null for `conversation-created` so it doesn't get routed into the live-artifact path. Tests (`apps/web/tests/providers/project-events.test.ts`): - A single `conversation-created` event reaches the consumer with the parsed payload. - Two consecutive `conversation-created` events from concurrent routine runs both reach the consumer (covers the multiple-concurrent-runs case reported in #1502). - Malformed `conversation-created` payloads are swallowed without throwing, matching the existing `file-changed` / `live_artifact` defensive behavior. Manual verification: - Built locally with `pnpm exec tools-pack mac build --to app --portable` and installed. - Created a routine in `Reuse an existing project` mode targeting an existing project. - With the project view open, clicked `Run now`. The new "Routine" conversation appeared in the project's conversation list within about a second, without exiting and re-entering the project, and the active conversation was not changed. - Clicked `Run now` twice in quick succession; both new conversations appeared in the list, covering the concurrent-runs case in #1502. - `pnpm guard` and `pnpm --filter @open-design/web typecheck` clean; full web test suite is 1016/1016 passing. Fixes #1361 * fix(#1523 review): share SSE type via contracts; guard conversation refresh against project-switch and reordering races Addresses Codex P1 + lefarcen P2 inline review on #1523 (#1361): 1. Move `ProjectConversationCreatedSsePayload` to `@open-design/contracts` (`packages/contracts/src/sse/chat.ts`) so the daemon producer and the web consumer share one type. The web provider re-exports it under the local `ProjectConversationCreatedEvent` name to keep the existing import shape stable for callers; the daemon emit site picks up the same shape via a JSDoc typedef so producer and consumer can't drift as this stream grows. (Addresses lefarcen P2 on project-events.ts:17.) 2. Guard the `conversation-created` async refresh in `ProjectView` against two distinct races: - Project-switch race: capture `project.id` at dispatch time and re-check it via a live `projectIdRef` after `listConversations` resolves; bail if the user switched projects while the request was in flight. The existing project-load effects use the same cancellation pattern. (Addresses Codex P1 on ProjectView.tsx:767.) - Concurrent-refresh re-ordering race: bump a monotonic `conversationsRefreshTokenRef` on every dispatch and capture each request's token; only the request whose captured token still equals the live ref at await-return applies its result. Two rapid `conversation-created` events (the #1502 concurrent Run-now case) can no longer drop the newest conversation when the earlier request resolves last with a stale, shorter list. (Addresses lefarcen P2 on ProjectView.tsx:767.) Both guards are documented inline with comments that point back at the review threads. The existing project-events tests (single delivery, concurrent delivery, malformed payloads) are unchanged — the new guards are defensive logic on the consumer, not new event shapes. `pnpm guard`, `pnpm --filter @open-design/web typecheck`, `pnpm --filter @open-design/daemon typecheck`, and the full web test suite (1016/1016) remain green. |
||
|
|
56bf6ee1b6
|
feat: agent-callable research command and /search (#615)
* feat: pre-generation research (Tavily) for grounded generation
Adds an optional pre-generation research step so the agent can produce
slides / prototypes / decks grounded in real sources instead of guessing.
User flow:
1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY).
2. Click the new Research button in the chat composer.
3. On send, the daemon runs a Tavily search, prepends the findings
as a <research_context> block ahead of the system prompt, and
spawns the agent. Research progress shows up as status pills in
the chat stream; the agent cites sources inline as [1]/[2]/...
Phase 1 surface:
- Single provider (Tavily), single depth ('shallow'), no LLM
synthesis pass (Tavily's `answer` is the summary).
- Composer toggle only; no popover / depth picker yet.
- Reuses the existing `status` SSE agent payload + StatusPill UI
so no new event variants or renderer code are needed.
Layers touched:
- contracts: ResearchOptions / Source / Findings DTOs;
ChatRequest.research; export from index.
- daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator
+ provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook
in startChatRun before prompt assembly.
- web: ChatComposer toggle + ChatSendMeta; threaded through
ChatPane / ProjectView / streamViaDaemon into ChatRequest.
Side fix (required to land the feature, but useful on its own):
contracts internal relative imports lacked the `.js` suffix that
NodeNext module resolution requires. This was already breaking
`pnpm --filter @open-design/daemon typecheck` on main; without the
fix, none of the new research types were visible to the daemon.
All internal contracts imports now carry `.js`.
Spec: specs/current/research-feature.md (phases 2-4 outlined for
follow-up: composer popover, multi-provider, deep recursion, example
skills with research_recommends).
Verified:
- pnpm --filter @open-design/contracts typecheck/test
- pnpm --filter @open-design/daemon typecheck (the chokidar
project-watchers test is a pre-existing flake, unrelated)
- pnpm --filter @open-design/web typecheck
- node scripts/verify-media-models.mjs
* fix(daemon): clamp Tavily max_results to 20
Tavily's /search endpoint requires `max_results` in [0, 20]; sending a
larger value (e.g. when `research.depth: "deep"` resolves to 30) returns
400 and `runResearch` silently falls back to no-research. Clamp at the
provider boundary so Phase 2 depth tiers above 20 still produce results
instead of failing the request.
Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)
* Remove stale research merge leftovers
* Add agent-callable research search
* Fix Indonesian locale typecheck
* Fix research command invocation edge cases
* Harden slash search prompt expansion
* Honor research source caps in command contract
* Require search reports in design files
* Add research data provider settings
* Wire web research provider fallback order
* Update research provider fallback wording
* Revert "Update research provider fallback wording"
This reverts commit
|
||
|
|
c3d9136a0c
|
Add live artifacts and Composio connector catalog (#381)
* docs: add live artifacts implementation spec * docs: align live artifacts implementation plan * Ralph iteration 1: work in progress * Ralph iteration 2: work in progress * Ralph iteration 3: work in progress * Ralph iteration 4: work in progress * Ralph iteration 5: work in progress * Ralph iteration 6: work in progress * Ralph iteration 7: work in progress * Ralph iteration 8: work in progress * Ralph iteration 9: work in progress * Ralph iteration 10: work in progress * Ralph iteration 11: work in progress * Ralph iteration 12: work in progress * Ralph iteration 13: work in progress * Ralph iteration 14: work in progress * Ralph iteration 15: work in progress * Ralph iteration 16: work in progress * Ralph iteration 17: work in progress * Ralph iteration 18: work in progress * Ralph iteration 19: work in progress * Ralph iteration 20: work in progress * Ralph iteration 21: work in progress * Ralph iteration 22: work in progress * Ralph iteration 23: work in progress * Ralph iteration 24: work in progress * Ralph iteration 25: work in progress * Ralph iteration 26: work in progress * Ralph iteration 27: work in progress * Ralph iteration 28: work in progress * Ralph iteration 29: work in progress * Ralph iteration 30: work in progress * Ralph iteration 31: work in progress * Ralph iteration 32: work in progress * Ralph iteration 33: work in progress * Ralph iteration 34: work in progress * Ralph iteration 35: work in progress * Ralph iteration 36: work in progress * Ralph iteration 37: work in progress * Ralph iteration 38: work in progress * Ralph iteration 39: work in progress * Ralph iteration 40: work in progress * Ralph iteration 41: work in progress * Ralph iteration 42: work in progress * Ralph iteration 43: work in progress * Ralph iteration 44: work in progress * Ralph iteration 45: work in progress * Ralph iteration 46: work in progress * Ralph iteration 47: work in progress * Ralph iteration 48: work in progress * Ralph iteration 49: work in progress * Ralph iteration 50: work in progress * Ralph iteration 51: work in progress * Ralph iteration 52: work in progress * Ralph iteration 53: work in progress * Ralph iteration 54: work in progress * Ralph iteration 55: work in progress * Ralph iteration 56: work in progress * Ralph iteration 57: work in progress * Ralph iteration 58: work in progress * Ralph iteration 59: work in progress * Ralph iteration 60: work in progress * Ralph iteration 61: work in progress * Ralph iteration 62: work in progress * Ralph iteration 63: work in progress * Ralph iteration 64: work in progress * Ralph iteration 65: work in progress * Ralph iteration 1: work in progress * Ralph iteration 2: work in progress * Ralph iteration 3: work in progress * Ralph iteration 4: work in progress * Ralph iteration 5: work in progress * Ralph iteration 6: work in progress * Ralph iteration 8: work in progress * Ralph iteration 9: work in progress * Ralph iteration 17: work in progress * Add Composio-backed connectors * Add Composio-backed connector catalog * Fix connector callback flow * Update live artifact connector refresh * Fix live artifact refresh updates * Improve live artifact viewer toolbar * Refine live artifact source tabs * Expand Composio connector catalog * Improve Composio connector browsing * Fix artifact refresh source safety checks Generated-By: looper 0.4.1 (runner=fixer, agent=opencode) * Fix live artifacts PR feedback Generated-By: looper 0.5.0 (runner=fixer, agent=opencode) * Fix live artifact preview CORS validation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix connector OAuth IPv6 loopback hosts Allow bracketed IPv6 loopback Host headers when deriving connector OAuth callback URLs so IPv6-bound daemons can complete connection flow. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Preserve live artifact refresh permissions Respect explicit refresh permission choices during live artifact create and update flows so revoked connector sources remain gated. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix live artifact preview cache freshness Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix live artifact refresh validation Guard manual refreshes with local daemon checks and reject daemon_tool sources without a toolName before refresh execution. Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix Composio credential invalidation Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix live artifact CORS methods Generated-By: looper 0.0.0-dev (runner=fixer, agent=opencode) * Fix workspace validation Restore media config test isolation under Vitest setup data-dir overrides and add the missing French live artifact display copy so the workspace test suite stays aligned.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix connector safety filtering Keep agent-preview connector listings aligned with execution safety policy and prune stale Composio OAuth state records before they accumulate. Generated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix agent runtime cleanup Generated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix live artifact daemon access Validate local-only live artifact routes against the peer socket address and pass daemon-resolved CLI paths to ACP MCP descriptors.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix connector run limit pruning Evict stale connector rate-limit buckets so long-lived daemon processes do not retain per-run entries indefinitely.\n\nGenerated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Fix connector compact schemas Generated-By: looper 0.5.2 (runner=fixer, agent=opencode) * Improve connector connection feedback * Adjust connector gate positioning * Fix live artifact refresh commits Avoid marking refresh candidates failed after snapshot or state persistence errors by deferring live artifact mutations until the durable refresh metadata is written. Also align connector OAuth callback host validation with daemon loopback handling.\n\nGenerated-By: looper 0.5.4 (runner=fixer, agent=opencode) * Improve connector search relevance * fix(daemon): harden connector connection state Require loopback daemon validation before connector connect side effects and only clear provider-owned connector statuses during credential reset. Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): guard connector disconnect route Require local daemon request validation before connector disconnect side effects. Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): guard composio config updates Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): dispatch live artifacts mcp first Route the live-artifacts MCP server before the generic MCP CLI so od mcp live-artifacts starts the dedicated server instead of failing generic argument parsing.\n\nGenerated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix(daemon): handle integer connector schemas Allow JSON Schema integer connector inputs while preserving fractional-value validation so generated connector tool schemas accept valid page sizes and limits. Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * fix: align live artifact refresh error codes Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * Fix live artifact connector refresh flow * Update live artifact design cards * Add beta badge to live artifact form * Remove live artifact tile model * Fix live artifact refresh sync * Fix live artifact MCP refresh durability Generated-By: looper 0.5.4 (runner=fixer, agent=opencode) * Fix live artifact refresh safety Enforce persisted refresh opt-out and connector auto-read gating before refresh sources execute. Generated-By: looper 0.5.5 (runner=fixer, agent=opencode) |
||
|
|
3fb849d047
|
Fix chat runs surviving web disconnects (#146)
* fix chat runs surviving web disconnects * fix chat run create abort propagation Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5) * fix daemon keepalive reconnect budget Generated-By: looper 0.0.0-dev (runner=fixer, agent=gpt-5.5) * fix daemon stream disconnect cancellation Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5) * fix daemon stream abort cancellation race Generated-By: looper 0.0.0-dev (runner=fixer, agent=openai/gpt-5.5) * fix daemon run cancellation semantics * fix load * doc * 2 * add run refresh recovery * fix active run refresh status * fix reattach abort handling * fix * fix chat initial scroll * fix daemon start failures Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * fix background run recovery Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * fix stop run status Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * fix background run recovery Generated-By: looper 0.2.7 (runner=fixer, agent=openai/gpt-5.5) * extract daemon run service * move prompt composition to daemon * fix prompt module resolution * fix project id generation * add project run status * add designs kanban view with awaiting_input status - add grid/kanban view toggle on Designs tab; persist choice in localStorage - introduce awaiting_input project display status (daemon-derived from unanswered <question-form>) so projects asking the user aren't shown as Completed; ordered between Running and Completed with amber accent - hide transient queued state from users: coerce queued/starting to running in daemon /api/projects projection and drop the queued kanban column - a11y polish on Designs cards: Space activation, aria-labels on delete, focus-visible outlines, reveal delete on focus-within and touch, prefers-reduced-motion handling - kanban layout uses flex sizing instead of viewport math; scoped icon- only pill button rule fixes view-toggle icon alignment --------- Co-authored-by: mrcfps <mrc@powerformer.com> |
||
|
|
56d08b8c5f
|
Add shared contracts and migrate project code to TypeScript (#118) |