mirror of
https://github.com/nexu-io/open-design.git
synced 2026-05-31 19:04:39 +07:00
* fix(daemon): detect and strip fabricated role markers in model output (#3247) Three-layer defence against models emitting `## user` / `## assistant` / `## system` lines mid-response, which the chat host interprets as real turn boundaries and acts on as unauthorised instruction: 1. **System prompt**: anti-roleplay instruction elevated from a bullet under "What you don't do" to a standalone `## CRITICAL` section in `official-system.ts`, with a REMINDER pinned at the end of the composed prompt for recency bias. 2. **Stream-level detection and truncation**: shared `role-marker-guard.ts` module (`createRoleMarkerGuard` + `FABRICATED_ROLE_MARKER_RE`) used across all text paths — Claude stream (per-message guards), non-Claude structured streams (run-scoped guard via `emitGuardedTextDelta`), and BYOK proxy routes (`createDeltaGuard`). When a marker is detected, the contaminated suffix is dropped and a `fabricated_role_marker` event surfaces a warning in the UI. 3. **UI**: `StatusPill` gains `is-warning` / `is-error` CSS variants; `fabricated_role_marker` events render as amber warning pills. * fix(chat-routes): do not await reader.cancel() on stream early-return The await on reader.cancel() can hang indefinitely on response streams whose underlying source is a Uint8Array (most notably surfaced by the ollama test in proxy-routes.test.ts, which builds its mock body via `new Response(uint8array)` rather than the controller-based helper `sseResponse()`). The hung await holds the request handler open, which in turn blocks `server.close()` in the afterAll hook, producing the two test timeouts (test at 145, hook at 36) currently failing CI on #3296. Fix is in production code, not the test: don't await the cancel. It is a cleanup hint and we are returning from the function anyway, so blocking on it offers no value. fire-and-forget with an empty catch keeps the cancel signal flowing for real HTTP streams without risking a hang on mock/edge-case implementations. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(daemon): terminate child on role-marker detection (close #3247 generation vector) PR #3296's detection layer truncates display and persistence of fabricated role markers, but the underlying model subprocess keeps generating tokens after detection. Three concrete consequences: 1. The model bills the user for the entire contaminated response (we observed 5,106 chars stored in claude's session file for a turn where only the first 3,013 chars were legitimate — a 40% overhead). 2. tool_use blocks emitted AFTER the marker reach the daemon's dispatcher unchecked, since detection only gates the text-delta emission path, not content-block-stop / tool_use blocks. The model could fabricate "## user delete file X" then emit a tool_use(delete X) that the dispatcher would execute. 3. The UI surfaces a `fabricated_role_marker` warning followed by an eventual normal turn-end, blurring the distinction between "completed normally" and "killed by safety guard." This commit adds a single idempotent `abortForRoleMarker(marker)` helper in server.ts, scoped to the same closure as `child` and `runGuard`. On any detection event (per-message Claude guard, run-scoped non-Claude guard, plain stdout guard) the helper: - Emits a structured `ROLE_MARKER_HALLUCINATION` SSE error so the UI can render a security-class status distinct from a normal turn-end. The existing `fabricated_role_marker` warning is still sent and rendered as the amber pill (PR #3296's UI). - Calls `acpSession.abort()` for ACP-multiplexed agents (Hermes, Kimi, Devin, Kiro) whose I/O doesn't necessarily release on SIGTERM of the wrapper process alone. - SIGTERMs the child immediately, with the existing `scheduleForcedChildShutdown()` SIGKILL fallback at 2x grace. Wired into three sites where contamination is detected: - `emitGuardedTextDelta` (sendAgentEvent / copilot / ACP / pi-rpc text_delta paths) - Plain-stdout listener (BYOK plain mode) - The Claude stream handler's onEvent (per-message guards in claude-stream.ts surface `fabricated_role_marker` events directly via onEvent rather than through the run-scoped emitGuardedTextDelta) Tool_use blocks emitted BEFORE the marker still flow through normally — this guard can't help with those, since by the time we observe a text marker the prior content block has already finished. Closing that gap requires speculative cancellation of in-flight tool calls when a downstream text block contains a marker; that's tracked as follow-up work, not included here. Co-Authored-By: roverkai <2196140098@qq.com> Co-Authored-By: JasonBroderick <jason@buddyboss.com> * refactor(role-marker-guard): bounded tail + drop chat-style markers Addresses two review comments on #3303: (1) O(1) memory + per-delta work (review r3323982225) Replace the unbounded `accumulated` string with a rolling tail capped at TAIL_BUFFER_SIZE (64 chars — comfortably exceeds the longest marker prefix `\n<whitespace>## assistant` ≈ 16–24 chars in practice). A 50 KB assistant response delivered in 1000 chunks of 50 bytes was previously O(n²) on string concatenation alone; now it is O(1) per delta regardless of message length. The `tail.length` value carries the "already emitted" offset that the cut-point math needs, so the offset semantics at L74–78 of the prior implementation are preserved without re-introducing the full-text buffer. (2) Drop chat-style markers entirely (review r3323982234, option (a)) `User:` / `Assistant:` / `Human:` / `AI:` are removed from the regex. Rationale: - The host parses ONLY `## user` / `## assistant` / `## system` lines as turn boundaries (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts). A model emitting chat-style markers does NOT cause the original #3247 security failure. - With kill-on-detection wired in this PR (`abortForRoleMarker` in server.ts), a false positive aborts the whole run — far more expensive than a stray unflagged `User:` line in chat scrollback. Chat-style markers collide with legitimate output (form labels, email contacts, JSDoc) often enough that pairing them with kill-semantics is the wrong tradeoff. The tradeoff is now documented in the regex docblock so the kill-on-match behaviour is justified against the false-positive surface. Also aligns the prompt-side CRITICAL block in system.ts: drop the "don't emit User: / Assistant: / Human: / AI:" bullet, since we no longer enforce it. Less ambiguity for the model and the operators. Test file updated: - Chat-style positive tests flipped to negative ("does NOT match User: — chat-style out of scope") so the intentional exclusion has a permanent regression test. - Two new tests cover the bounded-tail behaviour: a marker arriving after 10 KB of clean text in small chunks, and a marker straddling a chunk boundary after 100 prior chunks. - Added test for legitimate `User: bob@example.com`-style content not triggering contamination. Test count is now 35 (up from 25); two of the new ones explicitly exercise the new bounded-tail path. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): drop \`^\` anchor after first chunk (review r3324060995) Blocking correctness bug introduced by commit 4 (bounded-tail refactor): once \`tail\` is a rolling slice of mid-stream text, \`^\` in the canonical regex \`(?:^|\\n)\\s*##\\s+(?:user|...)\` no longer represents the genuine message start. As the rolling window slides forward chunk by chunk, a sliced tail can begin with whitespace + \`##\` (or just \`##\`), letting \`^\` anchor a match against text that the full-buffer implementation correctly ignored. With kill-on-detection wired in commit 3, that false positive now SIGTERMs the run and emits a \`ROLE_MARKER_HALLUCINATION\` error — exactly the failure class called out in the docblock at L22–29. Reviewer's evidence (PerishCode, r3324060995): streaming "…take a look at the ## user content section…" one character at a time reports \`contaminated: true\` post-refactor; the same text in a single feed stays clean. Fix: keep the canonical \`FABRICATED_ROLE_MARKER_RE\` for the very first non-empty feed (where \`^\` legitimately points at the message start), and switch to an internal \`NEWLINE_ANCHORED_ROLE_MARKER_RE\` (\`\\n\\s*##\\s+(?:user|...)\` — drops the \`^\` alternative) for all subsequent feeds. A \`firstChunk\` boolean tracks the state. Real newline-preceded markers straddling chunk boundaries are still caught because the preceding \`\\n\` is retained inside the 64-char tail. Regression tests added (\`apps/daemon/tests/role-marker-guard.test.ts\`): - mid-line \`## user\` streamed char-by-char with no preceding \\n (mirrors the reviewer's repro) - space-preceded mid-line \`## user\` in a >130-char stream, which long enough to force the rolling window past the marker — exercises the exact slice condition that triggered the bug - real \\n-preceded \`## user\` still caught after a long preamble (positive case must not regress) - \`## user\` as the very first chunk still caught (\`^\` legitimately anchors on the first feed) Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): case-sensitive + tighter prefix scope (reviews r3324151877 / r3324151882) Two refinements addressing the third review on #3303: == Blocking (r3324151877) == The regex over-matched legitimate Markdown headings, and with kill-on-detection wired in commit 3 each false positive deterministically aborts a real run. Three changes tighten the match to the actual security surface — `## user` / `## assistant` / `## system` lines the chat host parses as turn boundaries — without losing any real attack pattern: 1. CASE-SENSITIVE. Dropped the `/i` flag. The host's turn-boundary delimiter is lowercase (see `buildDaemonTranscript` in apps/web/src/providers/daemon.ts), and the `## CRITICAL` system-prompt block already forbids only the lowercase forms. Title-Case headings like `## User Guide`, `## System Architecture`, `## Assistant settings` are now ignored — these are legitimate technical writing patterns LLMs emit constantly. `## USER NOTES` (all-caps) likewise no longer flags. 2. POSITIVE LOOKAHEAD `(?=[^a-z])` after the role keyword. Without it, `## userland`, `## userspace`, `## users guide`, `## systemd`, `## assistance` all match via prefix in the alternation. The lookahead requires the next character to exist and to not be a lowercase letter, so: - `## user\\n…` → match (newline is not lowercase) - `## assistantR…` → match (R is uppercase; the glued-form attack pattern still gets caught) - `## assistant.` → match (. is not a letter) - `## users guide` → no match (s is lowercase letter) - `## userland` → no match (l is lowercase letter) POSITIVE rather than NEGATIVE `(?![a-z])` because the negative form is satisfied at end-of-string, which in a streaming context means "we have `## user` but don't know what comes next yet" — would fire prematurely if `land` arrives in a later chunk. The positive form delays detection by one character in that edge case, traded for correctness. 3. `[ \\t]` instead of `\\s` for inner whitespace. Markdown role markers are single-line by convention; restricting to space/tab prevents oddities like `##\\nuser` from matching across lines. Test file: added Title-Case fixtures (`## User Guide`, `## System Architecture`, `## Assistant settings`, `## USER NOTES`) and prefix-of-longer-word fixtures (`## users guide`, `## userland`, `## systemd`, `## assistance`) — each asserting NO contamination. The existing `## usability` negative test gave false confidence as the reviewer noted (only failed via alternation-miss, not via word-boundary semantics); the new fixtures actually exercise the lookahead. Also added a positive test for `## assistant.` (glued punctuation) to balance the existing `## assistantReading` (glued uppercase) coverage. Total tests: 35 → 50. == Non-blocking (r3324151882) == Added `ROLE_MARKER_HALLUCINATION` to `API_ERROR_CODES` in `packages/contracts/src/errors.ts` alongside the existing agent/AMR codes, with a docblock comment explaining the emission contract: emitted by `server.ts::abortForRoleMarker` alongside the existing `fabricated_role_marker` warning event when the daemon detects a fabricated Markdown role marker in agent output; retryable. The code was already being emitted over the wire but unregistered — landing the registration here keeps the contract and emitter in sync as reviewer requested. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): defer complete-but-unconfirmed marker suffix Addresses review r3324277xxx — the boundary case where a stream chunk boundary lands between the role keyword and its lookahead character violated the documented "everything from the marker onward is silently dropped" contract. With (?=[^a-z]) as the lookahead, `feedText('## user')` returned `## user` as safe (no char to satisfy the lookahead → no match → pass through), so the fabricated marker line leaked into UI and app.sqlite before the next chunk confirmed contamination on the next SIGTERM cycle. Fix: introduce a `pending` state variable holding bytes that match the COMPLETE-but-unconfirmed marker prefix at end of buffer (/(?:^|\\n)[ \\t]*##[ \\t]+(?:user|assistant|assist|system)$/, no lookahead, $ anchor instead). When the no-match branch detects this suffix, withhold it from emission until the next feed either: - Confirms it (next char non-lowercase) → main regex matches → contaminated → withheld bytes dropped along with `## user`. - Denies it (next char lowercase, e.g. `userl…`) → main regex no longer matches the role keyword → withheld suffix is released and emitted alongside the new continuation. Also tied the firstChunk transition to actual byte emission rather than feed count. Previously a message that starts with `## system` followed by a separate `\\n` chunk would lose the `^` anchor on the second feed (firstChunk had flipped after the first feed even though nothing was emitted yet), silently breaking detection for that edge case. Now `firstChunk` stays true until at least one byte has crossed the emission boundary, matching the conceptual definition of "message start". Tests added (apps/daemon/tests/role-marker-guard.test.ts): - `## user` deferred at chunk boundary, confirmed by `\\n` in next - `## user` deferred at chunk boundary, denied by `land` continuation - `## assistant` deferred, confirmed by punctuation - `## User` Title-Case still passes through unconditionally - `## system` as the very first chunk: deferred, confirmed by \\n in next chunk (tests the firstChunk-stays-true-when-nothing- emitted invariant) Total tests: 50 → 55. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(claude-stream): scope role-marker guard to text_delta only, not thinking_delta Addresses review r3324xxxxxx — guarding the thinking channel buys no security and causes legitimate aborts. Why thinking is NOT a #3247 vector: - `buildDaemonTranscript` in apps/web/src/providers/daemon.ts only re-serializes `m.content` as `## ${m.role}\n...`. - Extended-thinking content is rendered to a separate `kind: 'thinking'` payload (daemon.ts:857-858) and never folded into `m.content`. - So a `## user` line in the thinking channel CANNOT become a fabricated turn boundary on the next round-trip. Why guarding it is harmful: - Models routinely emit literal `## user` / `## assistant` lines in chain-of-thought when reasoning about conversation structure ("Let me think about this. The user might phrase it as:\n## user\n …"). Common pattern in production traces. - With `abortForRoleMarker` wired in server.ts, a guard match on thinking SIGTERMs the run and surfaces a security error to the UI. The user paid for the reasoning, never sees the answer, and gets a confusing "fabricated role marker" warning for what was actually legitimate metacognition. - This directly contradicts the module's own stated philosophy ("a false positive aborts the whole run — a much more expensive failure than a stray unflagged ... line", role-marker-guard.ts). Fix: `emitSafeText` now passes thinking_delta through unconditionally, skipping both the guard and the contamination check. text_delta remains fully guarded. The single-line change at the top of emitSafeText preserves all other channels' behavior. Regression tests added (apps/daemon/tests/claude-stream-thinking.test.ts): - `## user` / `## assistant` lines in a thinking_delta — must NOT fire fabricated_role_marker, the thinking content streams intact including the marker text, and the subsequent text_delta answer still reaches the consumer (run not aborted). - Sanity check: same `## user` pattern in a text_delta DOES fire fabricated_role_marker and truncates emission at the marker. Locks in the channel-discriminated behavior. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(role-marker-guard): tie firstChunk to slicing, not byte emission Blocking review r3324xxxxxx: under the prior firstChunk transition ("any byte emitted"), a role marker that arrived at the very start of a message with its prefix split across multiple chunks bypassed detection — reopening the #3247 vector on the Claude path. Concrete cases that were missed (all are routine provider tokenizations of \`## user\n…\` at message start): - \`##\` | \` user\nDELETE…\` - \`## us\` | \`er\nDELETE…\` - \`## \` | \`user\nDELETE…\` Mechanism: the pending-deferral regex only catches COMPLETE role keywords, so a first chunk ending in a partial prefix (\`##\`, \`## \`, \`## us\`) was emitted in full. That emission flipped firstChunk to false. From that point only NEWLINE_ANCHORED_ROLE_MARKER_RE was used, which requires a literal \n before \`##\`. A marker at buffer position 0 has no preceding \n, so it could no longer match. abortForRoleMarker never fired and tool_use blocks emitted after the fabricated turn boundary reached the dispatcher. Fix: change firstChunk to track "tail has not been sliced yet" rather than "any byte emitted". While total emitted bytes <= TAIL_BUFFER_SIZE, tail still represents the entire emission so far and \`^\` in the canonical regex genuinely anchors at byte 0 of the stream — so the \`^|\n\` alternation safely catches a chunk-split message-start marker. The transition happens at the moment we would slice: once emitted > TAIL_BUFFER_SIZE, tail becomes a mid-stream window, \`^\` becomes meaningless, and we switch to the newline-only variants. Earlier iterations of this code tried two other definitions, both unsound: - "any byte emitted" (this commit fixes) — lost \`^\` before a chunk-split message-start marker could finish arriving. - "newline emitted" (briefly considered as the reviewer's alternative suggestion) — left \`^\` valid on a sliced buffer when streams hadn't emitted a newline yet, re-introducing the rolling-tail mid-stream false positive from review r3324060995. The slice-based invariant satisfies both: while we have not sliced, \`^\` is correct; once we slice, it is not. Regression tests added (apps/daemon/tests/role-marker-guard.test.ts): - \`##\` | \` user\nDELETE…\` → contaminated, marker=\`## user\` - \`## us\` | \`er\nDELETE…\` → contaminated, marker=\`## user\` - \`## \` | \`user\nDELETE…\` → contaminated, marker=\`## user\` - \`#\` | \`# user\nDELETE…\` → contaminated, marker=\`## user\` The fourth case (single \`#\` first chunk) exercises an even more adversarial tokenization than the reviewer's examples; it is also caught. Total tests: 55 → 59. Co-Authored-By: JasonBroderick <jason@buddyboss.com> * fix(tests): wrap events in stream_event envelope in thinking test feedJsonl was feeding raw events without the `{ type: 'stream_event', event: ... }` wrapper that createClaudeStreamHandler requires (line 141 of claude-stream.ts). Events silently fell through all branches, making both tests pass vacuously. Also fix TS2532 on warnings[0].marker with non-null assertion (safe after the toHaveLength(1) guard). Co-Authored-By: RoverKai <roverkai@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: roverkai <2196140098@qq.com> Co-authored-by: JasonBroderick <jason@buddyboss.com> Co-authored-by: RoverKai <roverkai@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1303 lines
70 KiB
TypeScript
1303 lines
70 KiB
TypeScript
/**
|
||
* Prompt composer. The base is the OD-adapted "expert designer" system
|
||
* prompt (see ./official-system.ts) — a full identity, workflow, and
|
||
* content-philosophy charter. Stacked on top:
|
||
*
|
||
* 1. The discovery + planning + huashu-philosophy layer (./discovery.ts)
|
||
* — interactive question-form syntax, direction-picker fork,
|
||
* brand-spec extraction, TodoWrite reinforcement, 5-dim critique,
|
||
* and the embedded `directions.ts` library.
|
||
* 2. The active design system's DESIGN.md (if any) — palette, typography,
|
||
* spacing rules treated as authoritative tokens.
|
||
* 3. The active skill's SKILL.md (if any) — workflow specific to the
|
||
* kind of artifact being built. When the skill ships a seed
|
||
* (`assets/template.html`) and references (`references/layouts.md`,
|
||
* `references/checklist.md`), we inject a hard pre-flight rule above
|
||
* the skill body so the agent reads them BEFORE writing any code.
|
||
* 4. For decks (skillMode === 'deck' OR metadata.kind === 'deck'), the
|
||
* deck framework directive (./deck-framework.ts) is pinned LAST so it
|
||
* overrides any softer slide-handling wording earlier in the stack —
|
||
* this is the load-bearing nav / counter / scroll JS / print
|
||
* stylesheet contract that PDF stitching depends on. We also fire on
|
||
* the metadata path so deck-kind projects without a bound skill
|
||
* (skill_id null) still get a framework, instead of having the agent
|
||
* re-author scaling / nav / print logic from scratch each turn. When
|
||
* the active skill ships its own seed (skill body references
|
||
* `assets/template.html`), we defer to that seed and skip the generic
|
||
* skeleton — the skill's framework wins to avoid double-injection.
|
||
*
|
||
* The composed string is what the daemon sees as `systemPrompt` and what
|
||
* the Anthropic path sends as `system`.
|
||
*/
|
||
import { OFFICIAL_DESIGNER_PROMPT } from './official-system.js';
|
||
import { DISCOVERY_AND_PHILOSOPHY } from './discovery.js';
|
||
import { DECK_FRAMEWORK_DIRECTIVE } from './deck-framework.js';
|
||
import { renderMediaGenerationContract } from './media-contract.js';
|
||
import { IMAGE_MODELS } from '../media-models.js';
|
||
import { renderPanelPrompt } from './panel.js';
|
||
import { defaultCritiqueConfig, type CritiqueConfig } from '@open-design/contracts/critique';
|
||
import type { MediaExecutionPolicy, MediaSurface } from '@open-design/contracts';
|
||
|
||
const ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT = 100;
|
||
const ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX = 'ElevenLabs voice list could not be loaded';
|
||
const PROMPT_SAFE_HTTP_STATUS_LABELS: Record<string, string> = {
|
||
'400': 'Bad Request',
|
||
'401': 'Unauthorized',
|
||
'403': 'Forbidden',
|
||
'404': 'Not Found',
|
||
'429': 'Too Many Requests',
|
||
'500': 'Internal Server Error',
|
||
'502': 'Bad Gateway',
|
||
'503': 'Service Unavailable',
|
||
'504': 'Gateway Timeout',
|
||
};
|
||
|
||
function renderUiLocalePrompt(locale: string | undefined): string {
|
||
const normalized = locale?.trim();
|
||
if (!normalized || normalized.toLowerCase() === 'en') return '';
|
||
const languageName = normalized === 'zh-CN'
|
||
? 'Simplified Chinese'
|
||
: normalized === 'zh-TW'
|
||
? 'Traditional Chinese'
|
||
: normalized;
|
||
const lines = [
|
||
'# UI locale override',
|
||
'',
|
||
`The Open Design UI locale for this run is \`${normalized}\` (${languageName}). All user-visible chat prose and generated UI controls must follow this locale, especially \`<question-form>\` titles, descriptions, labels, placeholders, helper text, and option labels. Keep machine-readable ids and object option \`value\` fields exact and unlocalized.`,
|
||
'Exception: for the default task-type form, keep the `taskType` option labels as the canonical routing choices: `Prototype`, `Live artifact`, `Slide deck`, `Image`, `Video`, `HyperFrames`, `Audio`, `Other`. Do not translate, reorder, or rewrite those option labels.',
|
||
];
|
||
if (normalized === 'zh-CN') {
|
||
lines.push(
|
||
'',
|
||
'For the default quick brief in Simplified Chinese, use copy like:',
|
||
'- title: `快速简报 — 30 秒`',
|
||
'- description: `开始生成前我会先确认这些信息。不适用的可以跳过,我会补上默认值。`',
|
||
'- output label/options: `我们要做什么?` / `幻灯片 / 路演稿`, `单页网页原型 / 落地页`, `多屏应用原型`, `数据看板 / 工具界面`, `编辑式 / 营销页面`, `其他 — 我来描述`',
|
||
'- platform label/options: `目标平台` / `响应式网页`, `桌面网页`, `iOS 应用`, `Android 应用`, `平板应用`, `桌面应用`, `固定画布 (1920×1080)`',
|
||
'- audience label/placeholder: `目标用户` / `例如:早期投资人、开发者工具采购者、内部高管评审`',
|
||
'- tone label/options: `视觉调性` / `编辑 / 杂志感`, `现代极简`, `活泼 / 插画感`, `科技 / 工具型`, `奢华 / 精致`, `粗野 / 实验性`, `人性化 / 亲切`',
|
||
'- brand label/options: `品牌背景` / `帮我选一个方向`, `我有品牌规范 — 稍后分享`, `参考网站 / 截图 — 稍后附上`',
|
||
'- scale label/placeholder: `大概需要多少内容?` / `例如:8 页幻灯片、1 个落地页 + 3 个子页面、4 个移动端界面`',
|
||
'- constraints label/placeholder: `还有什么需要知道的吗?` / `真实文案、必须使用的字体、需要避免的内容、截止时间…`',
|
||
);
|
||
}
|
||
return lines.join('\n');
|
||
}
|
||
|
||
function normalizePromptText(value: string): string {
|
||
return value
|
||
.replace(/[\r\n]+/g, ' ')
|
||
.replace(/\s+/g, ' ')
|
||
.trim();
|
||
}
|
||
|
||
function formatElevenLabsVoiceOptionsErrorForPrompt(
|
||
error: string | undefined,
|
||
): string | undefined {
|
||
const trimmed = normalizePromptText(error ?? '');
|
||
if (!trimmed) return undefined;
|
||
|
||
if (/no ElevenLabs API key/i.test(trimmed)) {
|
||
return `${ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX} because the ElevenLabs API key is missing. Tell the user to configure it in Settings or paste a voice id manually.`;
|
||
}
|
||
|
||
const statusMatch = trimmed.match(
|
||
/(?:\((\d{3})(?:\s+([^)]+))?\)|\b(\d{3})(?:\s+([A-Za-z][A-Za-z -]{0,40}))?\b)/,
|
||
);
|
||
if (statusMatch) {
|
||
const statusCode = statusMatch[1] ?? statusMatch[3];
|
||
const statusText = statusCode ? PROMPT_SAFE_HTTP_STATUS_LABELS[statusCode] ?? '' : '';
|
||
const suffix = statusText ? ` ${statusText}` : '';
|
||
return `${ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX} (${statusCode}${suffix}). Tell the user to retry the lookup or paste a voice id manually.`;
|
||
}
|
||
|
||
return `${ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX}. Tell the user to retry the lookup or paste a voice id manually.`;
|
||
}
|
||
|
||
type ProjectMetadata = {
|
||
kind?: string;
|
||
intent?: string | null;
|
||
fidelity?: string | null;
|
||
speakerNotes?: boolean | null;
|
||
slideCount?: string | null;
|
||
animations?: boolean | null;
|
||
includeLandingPage?: boolean | null;
|
||
includeOsWidgets?: boolean | null;
|
||
templateId?: string | null;
|
||
templateLabel?: string | null;
|
||
platform?: string | null;
|
||
platformTargets?: string[] | null;
|
||
inspirationDesignSystemIds?: string[];
|
||
skipDiscoveryBrief?: boolean | null;
|
||
imageModel?: string | null;
|
||
imageAspect?: string | null;
|
||
imageStyle?: string | null;
|
||
videoModel?: string | null;
|
||
videoLength?: number | null;
|
||
videoAspect?: string | null;
|
||
audioKind?: string | null;
|
||
audioModel?: string | null;
|
||
audioDuration?: number | null;
|
||
voice?: string | null;
|
||
promptTemplate?: {
|
||
id?: string | null;
|
||
surface?: 'image' | 'video' | null;
|
||
title?: string | null;
|
||
prompt?: string | null;
|
||
summary?: string | null;
|
||
category?: string | null;
|
||
tags?: string[] | null;
|
||
model?: string | null;
|
||
aspect?: string | null;
|
||
source?: {
|
||
repo?: string | null;
|
||
license?: string | null;
|
||
author?: string | null;
|
||
url?: string | null;
|
||
} | null;
|
||
} | null;
|
||
contextPlugins?: Array<{
|
||
id?: string | null;
|
||
title?: string | null;
|
||
description?: string | null;
|
||
}> | null;
|
||
contextMcpServers?: Array<{
|
||
id?: string | null;
|
||
label?: string | null;
|
||
transport?: string | null;
|
||
url?: string | null;
|
||
command?: string | null;
|
||
}> | null;
|
||
contextConnectors?: Array<{
|
||
id?: string | null;
|
||
name?: string | null;
|
||
provider?: string | null;
|
||
category?: string | null;
|
||
status?: string | null;
|
||
accountLabel?: string | null;
|
||
}> | null;
|
||
};
|
||
type ProjectTemplate = { name: string; description?: string | null; files: Array<{ name: string; content: string }> };
|
||
type AudioVoiceOption = {
|
||
name: string;
|
||
voiceId: string;
|
||
category?: string | null;
|
||
labels?: Record<string, string> | null;
|
||
};
|
||
|
||
type ExclusiveSurfaceMode = 'deck' | 'image' | 'video' | 'audio';
|
||
|
||
const EXCLUSIVE_SURFACE_MODES = new Set<ExclusiveSurfaceMode>(['deck', 'image', 'video', 'audio']);
|
||
|
||
export function resolveExclusiveSurface(args: {
|
||
metadata?: ProjectMetadata | undefined;
|
||
skillMode?: ComposeInput['skillMode'] | undefined;
|
||
skillModes?: ComposeInput['skillModes'] | undefined;
|
||
}): ExclusiveSurfaceMode | null {
|
||
const activeSkillModes = new Set(
|
||
Array.isArray(args.skillModes)
|
||
? args.skillModes.filter(Boolean)
|
||
: args.skillMode
|
||
? [args.skillMode]
|
||
: [],
|
||
);
|
||
const metadataSurface = EXCLUSIVE_SURFACE_MODES.has(args.metadata?.kind as ExclusiveSurfaceMode)
|
||
? args.metadata?.kind as ExclusiveSurfaceMode
|
||
: null;
|
||
const primarySkillSurface = EXCLUSIVE_SURFACE_MODES.has(args.skillMode as ExclusiveSurfaceMode)
|
||
? args.skillMode as ExclusiveSurfaceMode
|
||
: null;
|
||
const composedSurfaceModes = Array.from(activeSkillModes).filter((mode): mode is ExclusiveSurfaceMode =>
|
||
EXCLUSIVE_SURFACE_MODES.has(mode as ExclusiveSurfaceMode),
|
||
);
|
||
|
||
return metadataSurface
|
||
?? primarySkillSurface
|
||
?? (composedSurfaceModes.length === 1 ? composedSurfaceModes[0] ?? null : null);
|
||
}
|
||
|
||
export const BASE_SYSTEM_PROMPT = OFFICIAL_DESIGNER_PROMPT;
|
||
|
||
export const SKIP_DISCOVERY_BRIEF_OVERRIDE = `# Automated project mode — skip discovery form
|
||
|
||
This project was created through the daemon API with \`skipDiscoveryBrief: true\`. Override the discovery rules below: do NOT emit \`<question-form id="discovery">\`, do NOT show "Quick brief — 30 seconds", and do NOT ask a first-turn clarification form. Treat the user's first message and project metadata as the brief, then proceed directly to planning/building under the normal artifact workflow. Ask at most one concise follow-up only if a required detail is impossible to infer safely.`;
|
||
|
||
const ACTIVE_DESIGN_SYSTEM_VISUAL_DIRECTION_OVERRIDE = `
|
||
|
||
---
|
||
|
||
## Active design system visual direction
|
||
|
||
Active design system exception: the active design system is the visual direction for this project. Use its DESIGN.md palette, typography, spacing, component rules, and theme tokens as the source of truth for color and mood.
|
||
|
||
- Do not ask the user to pick a separate theme color, visual direction, palette, typography mood, or direction card.
|
||
- Do not emit a direction question-form, a \`direction-cards\` picker, or any visual-direction card while an active design system is present.
|
||
- If an earlier discovery answer asks to "Pick a direction for me", treat that as already satisfied by the active design system and continue with the plan.
|
||
- When a downstream framework mentions "active direction" or "theme tokens", bind those fields from the active design system instead of the built-in direction library.
|
||
`;
|
||
|
||
const DEFAULT_DESIGN_SYSTEM_USAGE = `Read DESIGN.md for visual principles, paste tokens.css verbatim into the first <style> when it is provided, and match component shapes from the reference component manifest or fixture when available. Treat any pull-layer index as optional context for deeper inspection; do not assume those files have already been loaded.`;
|
||
|
||
function renderDesignSystemImportModeGuidance(
|
||
importMode: ComposeInput['designSystemImportMode'],
|
||
): string | undefined {
|
||
if (importMode === 'normalized') {
|
||
return 'This package is normalized. Treat tokens.css and DESIGN.md as the contract, and prefer OD token names over source-project names. Use pull-layer source evidence only as optional background.';
|
||
}
|
||
if (importMode === 'hybrid') {
|
||
return 'This package is hybrid. Build with OD-normalized tokens first, then inspect pull-layer source evidence or snippets only when original component behavior, density, or naming would materially improve fidelity.';
|
||
}
|
||
if (importMode === 'verbatim') {
|
||
return 'This package is verbatim-oriented. Preserve source semantics and source naming as much as possible. Before translating component behavior, inspect the relevant pull-layer source evidence or snippets when the runtime tool is available.';
|
||
}
|
||
return undefined;
|
||
}
|
||
|
||
export interface ComposeInput {
|
||
agentId?: string | null | undefined;
|
||
includeCodexImagegenOverride?: boolean | undefined;
|
||
streamFormat?: string | undefined;
|
||
skillBody?: string | undefined;
|
||
skillName?: string | undefined;
|
||
skillMode?:
|
||
| 'prototype'
|
||
| 'deck'
|
||
| 'template'
|
||
| 'design-system'
|
||
| 'image'
|
||
| 'video'
|
||
| 'audio'
|
||
| undefined;
|
||
skillModes?: Array<'prototype' | 'deck' | 'template' | 'design-system' | 'image' | 'video' | 'audio'> | undefined;
|
||
designSystemBody?: string | undefined;
|
||
designSystemTitle?: string | undefined;
|
||
// Compiled (machine-readable) form of the active brand's design system,
|
||
// shipped as sibling files to DESIGN.md when available. Both fields are
|
||
// optional; the daemon populates them by default for every brand that
|
||
// ships `tokens.css` / `components.html` (today: `default` and
|
||
// `kami`). `OD_DESIGN_TOKEN_CHANNEL=0` disables the channel as a kill
|
||
// switch. When present they are appended AFTER the DESIGN.md block so
|
||
// prose still sets the high-level voice and the structured form
|
||
// disambiguates token names + worked component shapes.
|
||
//
|
||
// - `designSystemUsageMd` — optional USAGE.md router that tells
|
||
// agents how to consume this package.
|
||
// - `designSystemTokensCss` — verbatim `tokens.css` :root contract
|
||
// that the agent pastes into the
|
||
// artifact's <style>.
|
||
// - `designSystemComponentsManifest` — concise structured summary
|
||
// derived from components.html.
|
||
// - `designSystemFixtureHtml` — verbatim `components.html`
|
||
// fallback when no manifest can
|
||
// be derived.
|
||
// - `designSystemPullIndex` — lightweight manifest-derived
|
||
// list of richer files available
|
||
// for later pull-channel work.
|
||
designSystemUsageMd?: string | undefined;
|
||
designSystemTokensCss?: string | undefined;
|
||
designSystemComponentsManifest?: string | undefined;
|
||
designSystemFixtureHtml?: string | undefined;
|
||
designSystemPullIndex?: string | undefined;
|
||
designSystemImportMode?: 'normalized' | 'hybrid' | 'verbatim' | undefined;
|
||
// Craft references the active skill opted into via `od.craft.requires`.
|
||
// The daemon resolves the slug list to file contents and concatenates
|
||
// them with section headers; we inject them between the DESIGN.md and
|
||
// the skill body so brand tokens win on conflict but craft rules
|
||
// (letter-spacing, accent caps, anti-slop) cover everything below.
|
||
craftBody?: string | undefined;
|
||
craftSections?: string[] | undefined;
|
||
// Markdown built from the user's auto-memory store
|
||
// (<dataDir>/memory/*.md). Folded in before the active design system so
|
||
// tone/voice/preferences extracted from past chats win over the
|
||
// built-in identity charter but still defer to the brand's hard tokens
|
||
// and the active skill's workflow. Empty/undefined skips the block.
|
||
memoryBody?: string | undefined;
|
||
// Project-level metadata captured by the new-project panel. Drives the
|
||
// agent's understanding of artifact kind, fidelity, speaker-notes intent
|
||
// and animation intent. Missing fields here are exactly what the
|
||
// discovery form should re-ask the user about on turn 1.
|
||
metadata?: ProjectMetadata | undefined;
|
||
// The template the user picked in the From-template tab, when present.
|
||
// Snapshot of HTML files that the agent should treat as a starting
|
||
// reference rather than a fixed deliverable.
|
||
template?: ProjectTemplate | undefined;
|
||
// Provider voice choices fetched by the daemon/web before composing the
|
||
// prompt. Used for ElevenLabs speech discovery so the agent can render
|
||
// a select question-form instead of asking the user to paste raw ids.
|
||
audioVoiceOptions?: AudioVoiceOption[] | undefined;
|
||
// When voice discovery fails, surface the error reason so the agent
|
||
// can tell the user why the dropdown is unavailable instead of
|
||
// pretending there were simply no voices.
|
||
audioVoiceOptionsError?: string | undefined;
|
||
// When present and enabled, the Critique Theater protocol addendum is
|
||
// concatenated to the end of the composed prompt. Omitting this field
|
||
// (or passing cfg.enabled === false) preserves legacy behavior unchanged.
|
||
critique?: CritiqueConfig | undefined;
|
||
// Brand name and DESIGN.md body. Required when critique is enabled;
|
||
// ignored when critique is disabled or omitted.
|
||
critiqueBrand?: { name: string; design_md: string } | undefined;
|
||
// Skill identifier. Required when critique is enabled;
|
||
// ignored when critique is disabled or omitted.
|
||
critiqueSkill?: { id: string } | undefined;
|
||
// External MCP servers the daemon already holds a valid OAuth Bearer
|
||
// token for at spawn time. We surface the list to the model so it does
|
||
// NOT chase Claude Code's synthetic `*_authenticate` /
|
||
// `*_complete_authentication` tools that get injected when the HTTP
|
||
// transport's first connect transiently flips a server into
|
||
// needs-auth state — the Bearer is in `.mcp.json`, the real tools are
|
||
// available, and burning a turn on a redundant OAuth dance just
|
||
// confuses the user.
|
||
connectedExternalMcp?: ReadonlyArray<{ id: string; label?: string | undefined }>
|
||
| undefined;
|
||
// Optional `## Active plugin` / `## Plugin inputs` block. The daemon's
|
||
// plugin module renders this from an AppliedPluginSnapshot; we splice
|
||
// it in after the active skill so the plugin description sits next to
|
||
// its companion skill body in the prompt. Pass undefined when no
|
||
// plugin is bound to the run.
|
||
pluginBlock?: string | undefined;
|
||
// Plan §3.L2 / spec §23.4 — pre-rendered `## Active stage: <id>`
|
||
// blocks (one per pipeline stage active for the run). The daemon's
|
||
// pipeline runner builds these from `loadAtomBodies()` +
|
||
// `renderActiveStageBlock()` when the OD_BUNDLED_ATOM_PROMPTS env
|
||
// flag is set; otherwise this stays undefined and the prompt
|
||
// composer's hard-coded constants keep their precedence (back-compat).
|
||
activeStageBlocks?: ReadonlyArray<string> | undefined;
|
||
// Free-form instructions the user set at the global (user-level)
|
||
// settings panel. Injected after personal memory and before the
|
||
// project-level instructions.
|
||
userInstructions?: string | undefined;
|
||
// Free-form instructions the user set on this specific project.
|
||
// Injected after user-level instructions and before the design system.
|
||
projectInstructions?: string | undefined;
|
||
// UI locale selected by the client. User-visible generated form copy
|
||
// must follow this locale even when the user's initial prompt is brief.
|
||
locale?: string | undefined;
|
||
// Run-scoped media policy. Defaults to enabled when omitted so existing
|
||
// local OD behavior keeps the same media prompt contract.
|
||
mediaExecution?: MediaExecutionPolicy | undefined;
|
||
}
|
||
|
||
export function composeSystemPrompt({
|
||
agentId,
|
||
includeCodexImagegenOverride = true,
|
||
skillBody,
|
||
skillName,
|
||
skillMode,
|
||
skillModes,
|
||
designSystemBody,
|
||
designSystemTitle,
|
||
designSystemUsageMd,
|
||
designSystemTokensCss,
|
||
designSystemComponentsManifest,
|
||
designSystemFixtureHtml,
|
||
designSystemPullIndex,
|
||
designSystemImportMode,
|
||
craftBody,
|
||
craftSections,
|
||
memoryBody,
|
||
metadata,
|
||
template,
|
||
audioVoiceOptions,
|
||
audioVoiceOptionsError,
|
||
critique,
|
||
critiqueBrand,
|
||
critiqueSkill,
|
||
connectedExternalMcp,
|
||
pluginBlock,
|
||
activeStageBlocks,
|
||
streamFormat,
|
||
locale,
|
||
userInstructions,
|
||
projectInstructions,
|
||
mediaExecution,
|
||
}: ComposeInput): string {
|
||
// Discovery + philosophy goes FIRST so its hard rules ("emit a form on
|
||
// turn 1", "branch on brand on turn 2", "TodoWrite on turn 3", run
|
||
// checklist + critique before <artifact>) win precedence over softer
|
||
// wording later in the official base prompt.
|
||
const parts: string[] = [];
|
||
const activeDesignSystemBody = designSystemBody?.trim();
|
||
const activeSkillModes = new Set(
|
||
Array.isArray(skillModes)
|
||
? skillModes.filter(Boolean)
|
||
: skillMode
|
||
? [skillMode]
|
||
: [],
|
||
);
|
||
const resolvedExclusiveSurface = resolveExclusiveSurface({ metadata, skillMode, skillModes });
|
||
|
||
// API/BYOK mode (streamFormat === 'plain'): mirrors the same fix from
|
||
// `@open-design/contracts`'s composer. The daemon hits this path for
|
||
// any plain-stream adapter (e.g. DeepSeek), so without pinning the
|
||
// override above DISCOVERY_AND_PHILOSOPHY here too, those daemon
|
||
// agents still emit the `<todo-list>` / `[读取 X]` pseudo-tool
|
||
// markup described in #313. Keep the wording byte-identical to the
|
||
// contracts copy so both code paths produce the same observable
|
||
// behaviour.
|
||
if (streamFormat === 'plain') {
|
||
parts.push(API_MODE_OVERRIDE);
|
||
parts.push('\n\n---\n\n');
|
||
}
|
||
|
||
if (metadata?.skipDiscoveryBrief === true) {
|
||
parts.push(SKIP_DISCOVERY_BRIEF_OVERRIDE);
|
||
parts.push('\n\n---\n\n');
|
||
}
|
||
|
||
const localePrompt = renderUiLocalePrompt(locale);
|
||
if (localePrompt) {
|
||
parts.push(localePrompt);
|
||
parts.push('\n\n---\n\n');
|
||
}
|
||
|
||
parts.push(
|
||
DISCOVERY_AND_PHILOSOPHY,
|
||
'\n\n---\n\n# Identity and workflow charter (background)\n\n',
|
||
BASE_SYSTEM_PROMPT,
|
||
);
|
||
|
||
if (memoryBody && memoryBody.trim().length > 0) {
|
||
parts.push(
|
||
`\n\n## Personal memory (auto-extracted from past chats)\n\nThe following facts have been sedimented from this user's previous conversations and edited in the settings panel. Treat them as preferences and context, NOT hard rules: when they collide with the active design system tokens, the brand wins; when they collide with the active skill's workflow, the skill wins. They are still authoritative for tone, voice, terminology, and what the user already told you about themselves and their goals — never re-ask the user about something already captured here.\n\n${memoryBody.trim()}`,
|
||
);
|
||
}
|
||
|
||
if (userInstructions && userInstructions.trim().length > 0) {
|
||
parts.push(
|
||
`\n\n## Custom instructions (user-level)\n\nThe user has set the following persistent instructions. Apply them as defaults to every project. When a project-level instruction below contradicts a point here, the project-level version wins.\n\n${userInstructions.trim()}`,
|
||
);
|
||
}
|
||
|
||
if (projectInstructions && projectInstructions.trim().length > 0) {
|
||
parts.push(
|
||
`\n\n## Custom instructions (project-level)\n\nThe user has set the following instructions for this specific project. They take precedence over user-level custom instructions whenever both address the same topic (e.g. if user-level says "use spaces" but project-level says "use tabs", use tabs).\n\n${projectInstructions.trim()}`,
|
||
);
|
||
}
|
||
|
||
if (activeDesignSystemBody && activeDesignSystemBody.length > 0) {
|
||
const usageBlock =
|
||
designSystemUsageMd && designSystemUsageMd.trim().length > 0
|
||
? designSystemUsageMd.trim()
|
||
: DEFAULT_DESIGN_SYSTEM_USAGE;
|
||
parts.push(
|
||
`\n\n## How to use this design system${designSystemTitle ? ` — ${designSystemTitle}` : ''}\n\n${usageBlock}`,
|
||
);
|
||
|
||
parts.push(
|
||
`\n\n## Active design system${designSystemTitle ? ` — ${designSystemTitle}` : ''}\n\nTreat the following DESIGN.md as authoritative for color, typography, spacing, and component rules. Do not invent tokens outside this palette. When you copy the active skill's seed template, bind these tokens into its \`:root\` block before generating any layout.\n\n${activeDesignSystemBody}`,
|
||
);
|
||
|
||
const importModeGuidance = renderDesignSystemImportModeGuidance(designSystemImportMode);
|
||
if (importModeGuidance) {
|
||
parts.push(
|
||
`\n\n## Design system import mode${designSystemTitle ? ` — ${designSystemTitle}` : ''}\n\n${importModeGuidance}`,
|
||
);
|
||
}
|
||
}
|
||
|
||
// Structured (compiled) form of the active brand. The DESIGN.md above
|
||
// sets voice and intent; the tokens.css block below is the SAME
|
||
// contract in machine-readable form — names + values the agent pastes
|
||
// verbatim instead of re-deriving from prose. The components.html
|
||
// manifest grounds the token vocabulary in worked component shapes
|
||
// (button / card / type roles) without injecting the full HTML fixture.
|
||
// If manifest extraction fails or is unavailable, the composer falls
|
||
// back to the verbatim components.html fixture. Both blocks are
|
||
// individually gated: missing files skip silently, preserving the
|
||
// legacy DESIGN.md-only behaviour for prose-only brands.
|
||
if (designSystemTokensCss && designSystemTokensCss.trim().length > 0) {
|
||
parts.push(
|
||
`\n\n## Active design system tokens${designSystemTitle ? ` — ${designSystemTitle}` : ''}\n\nThe block below is this brand's tokens.css contract — every \`:root\` custom property and any scoped override (e.g. \`:root[lang=...]\`) the brand defines. **Paste the unscoped \`:root { ... }\` block verbatim into the artifact's first \`<style>\`** so every \`var(--*)\` reference resolves at runtime.\n\nDo not invent new tokens. Do not redefine these values. Do not write raw hex outside this :root block. The DESIGN.md above is prose; this is the binding contract.\n\n\`\`\`css\n${designSystemTokensCss.trim()}\n\`\`\``,
|
||
);
|
||
}
|
||
|
||
if (designSystemComponentsManifest && designSystemComponentsManifest.trim().length > 0) {
|
||
parts.push(
|
||
`\n\n## Reference component manifest${designSystemTitle ? ` — ${designSystemTitle}` : ''}\n\nA compact structured summary derived from this brand's components.html fixture. Use it as the component inventory for generated artifacts: match the listed selectors, component groups, class names, token references, focus behavior, and spacing cadence. Prefer these manifest entries over inventing new component shapes.\n\n\`\`\`text\n${designSystemComponentsManifest.trim()}\n\`\`\``,
|
||
);
|
||
} else if (designSystemFixtureHtml && designSystemFixtureHtml.trim().length > 0) {
|
||
parts.push(
|
||
`\n\n## Reference fixture${designSystemTitle ? ` — ${designSystemTitle}` : ''}\n\nA self-contained worked artifact in this design system. Match its component shapes (button structure, card structure, type-scale rhythm, focus ring, spacing cadence) when generating new artifacts. Copying fragments is encouraged as long as you keep the \`var(--*)\` references intact — they are already wired to the tokens above.\n\n\`\`\`html\n${designSystemFixtureHtml.trim()}\n\`\`\``,
|
||
);
|
||
}
|
||
|
||
if (designSystemPullIndex && designSystemPullIndex.trim().length > 0) {
|
||
parts.push(
|
||
`\n\n## Pull-layer files available on demand${designSystemTitle ? ` — ${designSystemTitle}` : ''}\n\nThis design-system package declares richer files for inspection, source evidence, or human preview. Keep the push prompt light: use the index below to decide what to read later. When the runtime tool environment is available, read a listed path with \`\"$OD_NODE_BIN\" \"$OD_BIN\" tools design-systems read --path <path>\`; the daemon will reject paths outside this manifest allowlist.\n\n\`\`\`text\n${designSystemPullIndex.trim()}\n\`\`\``,
|
||
);
|
||
}
|
||
|
||
if (craftBody && craftBody.trim().length > 0) {
|
||
const sectionLabel =
|
||
Array.isArray(craftSections) && craftSections.length > 0
|
||
? ` — ${craftSections.join(', ')}`
|
||
: '';
|
||
parts.push(
|
||
`\n\n## Active craft references${sectionLabel}\n\nThe following craft rules are universal — they apply on top of the active design system above, regardless of brand. The DESIGN.md decides *which* tokens to use; craft rules decide *how* to use them. On any conflict between a craft rule and a brand DESIGN.md, the brand wins for token values; craft rules still apply to anything the brand does not override (letter-spacing, accent overuse caps, anti-slop patterns).\n\n${craftBody.trim()}`,
|
||
);
|
||
}
|
||
|
||
if (skillBody && skillBody.trim().length > 0) {
|
||
const preflight = derivePreflight(skillBody);
|
||
parts.push(
|
||
`\n\n## Active skill${skillName ? ` — ${skillName}` : ''}\n\nFollow this skill's workflow exactly.${preflight}\n\n${skillBody.trim()}`,
|
||
);
|
||
}
|
||
|
||
if (pluginBlock && pluginBlock.trim().length > 0) {
|
||
parts.push(pluginBlock);
|
||
}
|
||
|
||
// Plan §3.L2 / spec §23.4 — splice per-stage atom blocks immediately
|
||
// after the active plugin block. Empty entries are skipped so a
|
||
// pipeline whose stages don't resolve any bundled atom bodies
|
||
// produces zero extra prompt mass. The active-skill body above
|
||
// remains the precedence carrier; these blocks add the stage-by-
|
||
// stage atom guidance that spec §23.3.2 calls out.
|
||
if (Array.isArray(activeStageBlocks) && activeStageBlocks.length > 0) {
|
||
for (const block of activeStageBlocks) {
|
||
if (typeof block === 'string' && block.trim().length > 0) {
|
||
parts.push(block);
|
||
}
|
||
}
|
||
}
|
||
|
||
const metaBlock = renderMetadataBlock(
|
||
metadata,
|
||
template,
|
||
audioVoiceOptions,
|
||
audioVoiceOptionsError,
|
||
mediaExecution,
|
||
);
|
||
if (metaBlock) parts.push(metaBlock);
|
||
|
||
// Decks have a load-bearing framework (nav, counter, scroll JS, print
|
||
// stylesheet for PDF stitching). Pin it last so it overrides any softer
|
||
// wording earlier in the stack ("write a script that handles arrows…").
|
||
//
|
||
// We fire on either (a) the active skill is a deck skill OR (b) the
|
||
// project metadata declares kind=deck. Case (b) catches projects created
|
||
// without a skill (skill_id null) — without this, a deck-kind project
|
||
// with no bound skill gets neither a skill seed nor the framework
|
||
// skeleton, and the agent writes scaling / nav / print logic from scratch
|
||
// with the same buggy `place-items: center` + transform pattern we keep
|
||
// having to fix at runtime. Skill seeds (when present) win — they
|
||
// already define their own opinionated framework (simple-deck's
|
||
// scroll-snap, guizang-ppt's magazine layout) and re-pinning the generic
|
||
// skeleton would conflict. The skill-seed path takes over via
|
||
// `derivePreflight` above, so we only fire the generic skeleton when no
|
||
// skill seed is on offer.
|
||
const isDeckProject = resolvedExclusiveSurface === 'deck';
|
||
const isFreeformProject = activeSkillModes.size === 0 && (!metadata || metadata.kind === 'other');
|
||
const hasSkillSeed =
|
||
!!skillBody && /assets\/template\.html/.test(skillBody);
|
||
if (isDeckProject && !hasSkillSeed) {
|
||
parts.push(`\n\n---\n\n${DECK_FRAMEWORK_DIRECTIVE}`);
|
||
} else if (isFreeformProject && !hasSkillSeed) {
|
||
// Freeform / kind=other projects skip the kind picker entirely and
|
||
// land here. If the user's brief is a deck/keynote/slides ("讲解",
|
||
// "presentation", "make a deck"), the agent used to invent its own
|
||
// scale-to-fit + slide visibility + nav script from scratch and
|
||
// shipped subtle CSS specificity bugs (per-slide layout classes
|
||
// overriding `.slide { display:none }`). Inject the same framework
|
||
// here, prefixed with a one-line conditional so the agent only
|
||
// adopts it when the brief actually is a deck — otherwise the
|
||
// directive is read as background reference and ignored.
|
||
parts.push(
|
||
`\n\n---\n\n## If this brief is a slide deck / keynote / presentation\n\nThe user did not pre-select a "Slide deck" surface, but their request may still call for one. **If — and only if — the brief reads as slides, keynote, presentation, deck, PPT, or 讲解, follow the framework below.** Otherwise ignore everything in this section and continue with the freeform output you would have written anyway.\n\n${DECK_FRAMEWORK_DIRECTIVE}`,
|
||
);
|
||
}
|
||
|
||
const isMediaSurface =
|
||
resolvedExclusiveSurface === 'image'
|
||
|| resolvedExclusiveSurface === 'video'
|
||
|| resolvedExclusiveSurface === 'audio';
|
||
if (isMediaSurface) {
|
||
parts.push(renderMediaGenerationContract(mediaExecution));
|
||
}
|
||
|
||
if (includeCodexImagegenOverride && shouldAllowCodexImagegenOverride(metadata, mediaExecution)) {
|
||
const codexImagegenOverride = renderCodexImagegenOverride(
|
||
agentId,
|
||
metadata,
|
||
);
|
||
if (codexImagegenOverride) {
|
||
parts.push(codexImagegenOverride);
|
||
}
|
||
}
|
||
|
||
// Critique Theater addendum. When cfg.enabled is true the panel protocol
|
||
// is pinned last so it overrides any softer critique wording earlier in the
|
||
// stack. When disabled (the default) this block is a no-op so no consumer
|
||
// needs to opt in.
|
||
//
|
||
// The panel block requires <ARTIFACT mime="text/html"> inside <CRITIQUE_RUN>,
|
||
// which conflicts with MEDIA_GENERATION_CONTRACT (image/video/audio surfaces
|
||
// explicitly forbid HTML output). Skip the addendum on media surfaces so
|
||
// the critique flag is a no-op there until a media-aware panel template
|
||
// lands.
|
||
const cfg = critique ?? defaultCritiqueConfig();
|
||
if (cfg.enabled && critiqueBrand && critiqueSkill && !isMediaSurface) {
|
||
parts.push('\n\n' + renderPanelPrompt({ cfg, brand: critiqueBrand, skill: critiqueSkill }));
|
||
}
|
||
|
||
if (activeDesignSystemBody && activeDesignSystemBody.length > 0) {
|
||
parts.push(ACTIVE_DESIGN_SYSTEM_VISUAL_DIRECTION_OVERRIDE);
|
||
}
|
||
|
||
const mcpDirective = renderConnectedExternalMcpDirective(connectedExternalMcp);
|
||
if (mcpDirective) parts.push(mcpDirective);
|
||
|
||
// Claude only: nudge the model toward the `AskUserQuestion` tool for
|
||
// mid-conversation clarifications. Without this hint Claude tends to fall
|
||
// back to a markdown bulleted list of options, which the chat UI cannot
|
||
// turn into clickable buttons. Discovery (turn 1) is still owned by the
|
||
// `<question-form>` flow defined in DISCOVERY_AND_PHILOSOPHY; this only
|
||
// covers follow-ups where the next action depends on a small set of
|
||
// choices the user can pick quickly.
|
||
if (agentId === 'claude') {
|
||
parts.push(
|
||
"\n\n---\n\n## Clarifying questions\n\nWhen you need a mid-conversation clarification AND the natural answer is one of a small finite set of choices (2-4 options per question), call the `AskUserQuestion` tool instead of writing a bulleted list in markdown. The host chat renders the tool call as inline choice buttons; a markdown list renders as plain text and forces the user to type a reply. Skip the tool when the answer is naturally free-form text, when the answer needs more than ~4 options, or when you only have one yes/no choice to ask. First-turn discovery still uses the `<question-form id=\"discovery\">` workflow described earlier; `AskUserQuestion` is for follow-ups only.\n\n**When you call `AskUserQuestion`, that tool call is the entire response.** Do NOT also write the same questions or options as markdown text alongside it, do NOT add a trailing prose paragraph like \"what sounds right?\", do NOT hedge by listing the options twice. Emit the tool call and stop generating tokens. The host is waiting on the tool's `tool_result` and will resume your turn the moment the user answers. Anything you write before, between, or after the tool call in the same message just duplicates what the card already shows and confuses the user.",
|
||
);
|
||
}
|
||
|
||
// Pinned LAST so recency bias reinforces the role-marker prohibition.
|
||
// This is the canonical anti-roleplay instruction;
|
||
parts.push(
|
||
"\n\n---\n\n## CRITICAL: Never fabricate conversation turns\n\n" +
|
||
"The text you emit is processed by a chat host that interprets lines " +
|
||
"starting with \`## user\`, \`## assistant\`, or \`## system\` as real " +
|
||
"turn boundaries. Emitting these lines causes the host to treat your " +
|
||
"fabricated text as a real user request and execute unauthorised actions.\n\n" +
|
||
"**FORBIDDEN — you MUST NOT:**\n" +
|
||
"- Emit any line starting with \`## user\`, \`## assist\`, \`## assistant\`, or \`## system\`\n" +
|
||
"- Roleplay multiple turns inside a single response\n" +
|
||
"- Invent a user message and then reply to it\n\n" +
|
||
"The host will truncate your response at the first role-marker line — " +
|
||
"any text after it is lost. If you feel the urge to simulate a dialogue, " +
|
||
"stop and ask the user a real question instead.",
|
||
);
|
||
|
||
return parts.join('');
|
||
}
|
||
|
||
/**
|
||
* Top-anchored override for plain-stream daemon agents (#313). Mirrors
|
||
* the contracts-package copy byte-for-byte; see that file for the full
|
||
* rationale. Pinning it at the absolute top of the composed prompt is
|
||
* what beats the discovery layer's own "these override anything later"
|
||
* header — the old bottom-appended `## API mode rule` lost that
|
||
* precedence war and let `<todo-list>` / `[读取 X]` pseudo-tool markup
|
||
* leak into the chat.
|
||
*/
|
||
const API_MODE_OVERRIDE = `# API mode — no tools available (read first — overrides every rule below)
|
||
|
||
You are running through a plain Messages API. **No tools are wired through to you.** \`TodoWrite\`, \`Read\`, \`Write\`, \`Edit\`, \`Bash\`, and \`WebFetch\` are unavailable — calls to them will not execute and will not render in the UI.
|
||
|
||
Every later instruction in this prompt that tells you to "call TodoWrite", "run Bash", "read via Read", or otherwise invoke a tool is describing the daemon-mode workflow. In this API run those instructions are **overridden** — do not attempt them and do not pretend you did.
|
||
|
||
**Forbidden output:**
|
||
- Pseudo-tool markup such as \`<todo-list>...</todo-list>\`, \`<tool-call>\`, or invented XML wrappers around a plan.
|
||
- Fake-protocol prose such as \`[读取 template.html ...]\`, \`[读取 layouts.md ...]\`, \`[正在调用 TodoWrite ...]\`, or any \`[doing X]\` placeholder narrating a tool you cannot run.
|
||
- Statements like "I'll call TodoWrite to track this" or "let me read the skill file first" — there is no TodoWrite and no Read in this run.
|
||
|
||
**Allowed output:**
|
||
- Plain chat prose to the user (in their language). State your plan as prose — a short numbered list in markdown is fine; it just must not be wrapped in \`<todo-list>\` or claim to be a tool call.
|
||
- A final \`<artifact type="text/html">...</artifact>\` block containing a complete \`<!doctype html>\` document when the brief is ready to deliver.
|
||
- \`<question-form>\` blocks for discovery on turn 1, exactly as the rules below describe — question-form is markup the UI parses, not a tool call.
|
||
|
||
If the rules below tell you to plan with TodoWrite, write the plan as prose instead. If they tell you to read skill side files before writing, describe in one sentence which patterns/conventions you're going to apply and proceed. If they tell you to run brand-spec extraction via Bash + Read + WebFetch, ask the user the missing brand questions in the discovery form instead.`;
|
||
|
||
// Defense-in-depth against Claude Code's synthetic OAuth tools.
|
||
//
|
||
// When Claude Code's built-in HTTP MCP transport gets a 401 on its first
|
||
// initialize (transient propagation lag, edge cache miss, header
|
||
// re-canonicalization quirk, etc.), it injects two synthetic tools per
|
||
// server — `mcp__<server>__authenticate` and
|
||
// `mcp__<server>__complete_authentication` — that drive a per-process
|
||
// OAuth dance with a `localhost:<random>/callback` redirect_uri. That
|
||
// listener dies with the agent process, so the round-trip never
|
||
// completes, and meanwhile the model burns a turn pasting an
|
||
// unreachable URL into the chat. By the time the user is back, our
|
||
// daemon-issued Bearer is already in `.mcp.json` and the real tools
|
||
// (`generate_image`, `models_explore`, …) are reachable on the next
|
||
// turn — but the model doesn't know that and keeps escalating the
|
||
// fake auth flow.
|
||
//
|
||
// The fix is to tell the model up front: these specific servers are
|
||
// already authenticated by the daemon, do NOT call any
|
||
// `*_authenticate` / `*_complete_authentication` tool for them. If
|
||
// the real tools really are missing, surface that as a separate
|
||
// failure instead of pivoting to the synthetic flow.
|
||
function renderConnectedExternalMcpDirective(
|
||
connectedExternalMcp:
|
||
| ReadonlyArray<{ id: string; label?: string | undefined }>
|
||
| undefined,
|
||
): string {
|
||
if (!connectedExternalMcp || connectedExternalMcp.length === 0) return '';
|
||
const lines = connectedExternalMcp
|
||
.map((s) => {
|
||
const id = typeof s?.id === 'string' ? s.id.trim() : '';
|
||
if (!id) return null;
|
||
const label = typeof s?.label === 'string' && s.label.trim() ? s.label.trim() : id;
|
||
return `- \`${id}\`${label !== id ? ` (${label})` : ''}`;
|
||
})
|
||
.filter((line): line is string => typeof line === 'string');
|
||
if (lines.length === 0) return '';
|
||
return [
|
||
'\n\n---\n\n',
|
||
'## External MCP servers — already authenticated\n\n',
|
||
'The following external MCP servers are already authenticated for this run via an OAuth Bearer token the daemon injected into `.mcp.json`. You can call their real tools directly:\n\n',
|
||
lines.join('\n'),
|
||
'\n\n',
|
||
'**Do NOT call any tool whose name matches `mcp__<server>__authenticate` or `mcp__<server>__complete_authentication` for the servers above.** Those are synthetic fallback tools Claude Code exposes when its first HTTP connect briefly flipped the server into a needs-auth state. The flow they drive (a `localhost:<random>/callback` redirect) cannot complete in this environment, and the real tools (e.g. `generate_image`, `models_explore`, `balance`, …) are already reachable.\n\n',
|
||
'If a real tool actually fails with an auth-related error, report the exact tool name and error text and stop — the user will reconnect the server in Settings → External MCP. Do not retry by invoking any `*_authenticate` tool.\n',
|
||
].join('');
|
||
}
|
||
|
||
const CODEX_IMAGEGEN_MODEL_IDS = new Set(
|
||
IMAGE_MODELS.filter(
|
||
(model) =>
|
||
model?.provider === 'openai' &&
|
||
typeof model?.id === 'string' &&
|
||
model.id.startsWith('gpt-image-'),
|
||
).map((model) => model.id),
|
||
);
|
||
|
||
export function resolveCodexImagegenModelId(
|
||
metadata: ProjectMetadata | undefined,
|
||
): string {
|
||
const imageModel =
|
||
typeof metadata?.imageModel === 'string' ? metadata.imageModel.trim() : '';
|
||
return CODEX_IMAGEGEN_MODEL_IDS.has(imageModel) ? imageModel : '';
|
||
}
|
||
|
||
export function shouldRenderCodexImagegenOverride(
|
||
agentId: string | null | undefined,
|
||
metadata: ProjectMetadata | undefined,
|
||
): boolean {
|
||
const normalizedAgentId =
|
||
typeof agentId === 'string' ? agentId.trim().toLowerCase() : '';
|
||
return (
|
||
normalizedAgentId === 'codex' &&
|
||
metadata?.kind === 'image' &&
|
||
resolveCodexImagegenModelId(metadata).length > 0
|
||
);
|
||
}
|
||
|
||
function shouldAllowCodexImagegenOverride(
|
||
metadata: ProjectMetadata | undefined,
|
||
mediaExecution: MediaExecutionPolicy | undefined,
|
||
): boolean {
|
||
const mode = mediaExecution?.mode ?? 'enabled';
|
||
if (mode !== 'enabled') return false;
|
||
if (
|
||
Array.isArray(mediaExecution?.allowedSurfaces) &&
|
||
mediaExecution.allowedSurfaces.length > 0 &&
|
||
!mediaExecution.allowedSurfaces.includes('image')
|
||
) {
|
||
return false;
|
||
}
|
||
const model = resolveCodexImagegenModelId(metadata);
|
||
if (
|
||
model &&
|
||
Array.isArray(mediaExecution?.allowedModels) &&
|
||
mediaExecution.allowedModels.length > 0 &&
|
||
!mediaExecution.allowedModels.includes(model)
|
||
) {
|
||
return false;
|
||
}
|
||
return true;
|
||
}
|
||
|
||
export function renderCodexImagegenOverride(
|
||
agentId: string | null | undefined,
|
||
metadata: ProjectMetadata | undefined,
|
||
): string {
|
||
if (!shouldRenderCodexImagegenOverride(agentId, metadata)) {
|
||
return '';
|
||
}
|
||
const imageModel = resolveCodexImagegenModelId(metadata);
|
||
|
||
return `
|
||
|
||
---
|
||
|
||
## Codex built-in imagegen override (load-bearing — Codex only)
|
||
|
||
The active agent is Codex and this image project selected \`${imageModel}\`.
|
||
For this specific case, use Codex's built-in image generation capability
|
||
instead of \`"$OD_NODE_BIN" "$OD_BIN" media generate\` for the first generation
|
||
attempt. This is an intentional exception to the media generation contract and
|
||
the active image skill's dispatcher wording.
|
||
|
||
Do not require, request, or mention \`OPENAI_API_KEY\` before trying the
|
||
built-in path. Reuse the project metadata, reference prompt template, aspect
|
||
ratio, style notes, and the user's current brief to form the final image
|
||
prompt. Generate the image with Codex built-in imagegen, then use the actual
|
||
output path returned by the built-in imagegen result as the source file first.
|
||
Only if the built-in result does not return a usable path should you search
|
||
\`\${CODEX_HOME:-$HOME/.codex}/generated_images/.../ig_*.png\` as a fallback
|
||
source. Never leave a project-referenced asset only under \`$CODEX_HOME\`.
|
||
|
||
Copy or move the selected generated file into \`$OD_PROJECT_DIR\` with a short
|
||
descriptive filename, then verify the exact destination file exists under
|
||
\`$OD_PROJECT_DIR\` before claiming success. If reading the source path,
|
||
creating the destination directory, copying/moving, or verifying the copied
|
||
asset fails, report the exact source path, destination path, and access/copy
|
||
error. Do not claim success, silently fall back, or ask about OpenAI/Azure
|
||
fallback after a generated image exists but the project copy fails; stop after
|
||
reporting the failure unless the user explicitly chooses fallback in a later
|
||
turn, because fallback may create a different image.
|
||
|
||
After the file exists under \`$OD_PROJECT_DIR\`, reply with the project-local
|
||
filename and a short summary of the prompt used. Do not emit an \`<artifact>\`
|
||
block for media.
|
||
|
||
If Codex built-in imagegen is unavailable or generation fails before producing
|
||
an image, surface the actual failure message and ask the user for one-time
|
||
confirmation before falling back to the existing OpenAI/Azure API-key provider
|
||
path via \`"$OD_NODE_BIN" "$OD_BIN" media generate --surface image --model ${imageModel}\`.
|
||
Do not silently fall back.`;
|
||
}
|
||
|
||
function renderMetadataBlock(
|
||
metadata: ProjectMetadata | undefined,
|
||
template: ProjectTemplate | undefined,
|
||
audioVoiceOptions: AudioVoiceOption[] | undefined,
|
||
audioVoiceOptionsError: string | undefined,
|
||
mediaExecution: MediaExecutionPolicy | undefined,
|
||
): string {
|
||
if (!metadata) return '';
|
||
const lines: string[] = [];
|
||
lines.push('\n\n## Project metadata');
|
||
lines.push(
|
||
'These are the structured choices the user made (or skipped) when creating this project. Treat known fields as authoritative; for any field marked "(unknown — ask)" you MUST include a matching question in your turn-1 discovery form.',
|
||
);
|
||
lines.push('');
|
||
lines.push(`- **kind**: ${metadata.kind}`);
|
||
if (metadata.platform) {
|
||
lines.push(`- **platform**: ${metadata.platform}`);
|
||
} else if (metadata.kind === 'prototype' || metadata.kind === 'template' || metadata.kind === 'other') {
|
||
lines.push('- **platform**: (unknown — ask: responsive web, desktop web, iOS app, Android app, tablet app, or desktop app?)');
|
||
}
|
||
if (Array.isArray(metadata.platformTargets) && metadata.platformTargets.length > 0) {
|
||
lines.push(`- **platformTargets**: ${metadata.platformTargets.join(', ')}`);
|
||
}
|
||
if (metadata.platform === 'responsive' || metadata.platformTargets?.includes('responsive')) {
|
||
lines.push(
|
||
'- **responsive web contract**: `responsive` means one web product experience that adapts across modern browser/device ranges, not only legacy desktop/tablet/mobile buckets. It is not an iOS app, Android app, or native tablet app target. Show responsive behavior through real product layout changes; do not render viewport labels as user-facing product content. Cover 2025–2026 breakpoints: mobile compact 360px, mobile standard 390–430px, foldable/small tablet 600–744px, tablet portrait 768–834px, tablet landscape/large tablet 1024–1180px, laptop 1280–1366px, desktop 1440–1536px, and wide 1920px. Use fluid `clamp()` scales, container queries where useful, and explicit layout changes at semantic thresholds. Verify no horizontal scroll at 360px, 390px, 430px, 768px, 820px, 1024px, 1366px, 1440px, and 1920px unless the brief explicitly asks for a pan/board canvas.',
|
||
);
|
||
}
|
||
if ((metadata.platformTargets?.length ?? 0) > 1) {
|
||
lines.push(
|
||
'- **cross-platform deliverable rule**: each selected target keeps the same product goal but MUST be delivered as its own product screen/file when more than one concrete target is selected. Use clear files such as `landing.html` (if enabled), `mobile-ios.html`, `mobile-android.html`, `tablet.html`, `desktop.html`, plus shared `css/` and `js/` when useful. `index.html` may be a launcher/overview that links to these files, but it must not be the only place where mobile/tablet/desktop designs live. Do not collapse cross-platform work into a single tabbed demo, selector UI, comparison board, platform map, or labelled documentation section inside one mock product page.',
|
||
);
|
||
}
|
||
if (metadata.kind === 'prototype' || metadata.kind === 'template' || metadata.kind === 'other') {
|
||
lines.push(
|
||
'- **screen-file-first rule**: each distinct user-facing screen or surface MUST be delivered as its own HTML file unless the user explicitly asks for a single-page scroll or single-file artifact. Do not combine landing pages, product app screens, dashboards, history, pricing, settings, mobile app, tablet app, desktop app, or OS widget surfaces into one long page. Use `index.html` as a launcher/overview that links to screen files when more than one screen exists; it may summarize the product and show screen cards, but it must not contain the full design for every screen.',
|
||
);
|
||
lines.push(
|
||
'- **product-realism rule**: final artifacts must look like real end-user product UI. Do not render project metadata, screen counts, target counts, state counts, "demo only" labels, "settings" panels for choosing platforms, "full design target" badges, viewport/device selector controls, theme/style knobs, platform output maps, behavior-spec sections, or design-process cards inside the product unless the user explicitly asks for a design spec/dashboard. Any navigation/tabs inside the artifact must be real product navigation, not designer controls for switching generated mockups.',
|
||
);
|
||
lines.push(
|
||
'- **visual-system rule**: when the user does not specify colors, layout, or visual direction, you must still make an intentional product-appropriate visual system. Infer a palette from the product category and audience with at least: neutral surface tokens, a primary action color, a secondary/domain accent, and status colors. Avoid plain monochrome/unstyled greyscale outputs. Use tasteful gradients, illustrations, iconography, device/product mockups, and colored state moments where they clarify the product, while still avoiding generic beige/peach/pink/brown AI washes.',
|
||
);
|
||
lines.push(
|
||
'- **app-specific modules rule**: include domain-specific in-app modules/components by default (cards, panels, controls, charts, lists, quick actions, status modules, mini players, checkout/cart summaries, etc. as appropriate). These are product UI modules, not OS home-screen widgets. Give each major module a clear purpose, states, and responsive behavior instead of generic card grids.',
|
||
);
|
||
lines.push(
|
||
'- **CJX-ready UX rule**: the artifact must be implementation-ready, not a static screenshot. Structure CSS tokens/components/responsive sections clearly; include real JavaScript behavior for meaningful UX such as tabs, dialogs, drawers, filters, generation/copy actions, validation, playback controls, or state transitions. If keeping a self-contained `index.html`, put the CSS/JS in clearly labelled blocks; for complex UX, generate `css/` and `js/` files when useful.',
|
||
);
|
||
lines.push(
|
||
'- **interaction-fidelity rule**: when the requested screen includes user input, generation, copying, validation, login, checkout, filtering, or any action verb, build real interactive controls for that screen. Do not substitute static text rows, prefilled-only mockups, screenshot-like device frames, or decorative state cards for editable inputs and working actions.',
|
||
);
|
||
lines.push(
|
||
'- **artifact-output rule**: when you generate an HTML artifact, keep conversational prose concise and product-facing. Do not dump the full raw HTML source back into chat; the artifact/file is the source of truth and the assistant message should only summarize the result.',
|
||
);
|
||
}
|
||
if (metadata.includeLandingPage) {
|
||
lines.push(
|
||
'- **includeLandingPage**: true — create `landing.html` as a separate responsive marketing companion surface in addition to the selected product/app screens. Do not implement the landing page only as a section inside `index.html`, even for responsive-web-only projects. If there is a working product/app screen, create it as a separate file such as `app.html`, `dashboard.html`, or a domain-specific screen name. `index.html` should be a lightweight launcher/overview when multiple files exist. Include hero, value props, product screenshots/device mockups, proof/features, and an appropriate CTA such as waitlist, download, or contact sales.',
|
||
);
|
||
}
|
||
if (metadata.includeOsWidgets) {
|
||
lines.push(
|
||
'- **includeOsWidgets**: true — add platform-native OS home-screen / lock-screen / quick-access widget surfaces where relevant. These are outside-the-app widgets (for example iOS WidgetKit, Android home screen widget, Live Activity/lock screen, tablet glance panel), not in-app cards. Include realistic widget sizes and direct quick actions for the domain.',
|
||
);
|
||
}
|
||
if (metadata.intent === 'live-artifact') {
|
||
lines.push(
|
||
'- **intent**: live-artifact — the user chose New live artifact. The first output should be a live artifact/dashboard/report, not a one-off static mockup. Prefer the `live-artifact` skill workflow when available, keep source data compact, and register through the daemon live-artifact tool path once that wrapper/tooling is available.',
|
||
);
|
||
lines.push(
|
||
'- **connector-source rule**: if the user names a connector/source (for example Notion) and daemon connector tools are available, list connectors before asking where the data comes from. When the named connector is `connected`, use its read-only tools and ask follow-up questions only for missing topic/page/database details, multiple equally plausible matches, or an unconnected/missing connector.',
|
||
);
|
||
}
|
||
|
||
if (metadata.kind === 'prototype') {
|
||
lines.push(
|
||
`- **fidelity**: ${metadata.fidelity ?? '(unknown — ask: wireframe vs high-fidelity)'}`,
|
||
);
|
||
}
|
||
if (metadata.kind === 'deck') {
|
||
lines.push(
|
||
`- **slideCount**: ${metadata.slideCount ?? '(unknown — ask only if the Active plugin / Plugin inputs block does not already include slideCount)'}`,
|
||
);
|
||
lines.push(
|
||
`- **speakerNotes**: ${typeof metadata.speakerNotes === 'boolean' ? metadata.speakerNotes : '(unknown — ask: include speaker notes?)'}`,
|
||
);
|
||
}
|
||
if (metadata.kind === 'template') {
|
||
lines.push(
|
||
`- **animations**: ${typeof metadata.animations === 'boolean' ? metadata.animations : '(unknown — ask: include motion/animations?)'}`,
|
||
);
|
||
if (metadata.templateLabel) {
|
||
lines.push(`- **template**: ${metadata.templateLabel}`);
|
||
}
|
||
}
|
||
if (metadata.kind === 'image') {
|
||
lines.push(
|
||
`- **imageModel**: ${metadata.imageModel ?? '(unknown — ask: which image model to use)'}`,
|
||
);
|
||
lines.push(
|
||
`- **aspectRatio**: ${metadata.imageAspect ?? '(unknown — ask: 1:1, 16:9, 9:16, 4:3, 3:4)'}`,
|
||
);
|
||
if (metadata.imageStyle) {
|
||
lines.push(`- **styleNotes**: ${metadata.imageStyle}`);
|
||
}
|
||
if (
|
||
metadata.promptTemplate?.title &&
|
||
typeof metadata.promptTemplate.prompt === 'string' &&
|
||
metadata.promptTemplate.prompt.trim().length > 0
|
||
) {
|
||
lines.push(`- **referenceTemplate**: ${metadata.promptTemplate.title}`);
|
||
}
|
||
lines.push('');
|
||
lines.push(renderMediaMetadataAction(
|
||
'image',
|
||
'`"$OD_NODE_BIN" "$OD_BIN" media generate --surface image --model <imageModel>`',
|
||
mediaExecution,
|
||
));
|
||
}
|
||
if (metadata.kind === 'video') {
|
||
lines.push(
|
||
`- **videoModel**: ${metadata.videoModel ?? '(unknown — ask: which video model to use)'}`,
|
||
);
|
||
lines.push(
|
||
`- **lengthSeconds**: ${typeof metadata.videoLength === 'number' ? metadata.videoLength : '(unknown — ask: 3s / 5s / 10s)'}`,
|
||
);
|
||
lines.push(
|
||
`- **aspectRatio**: ${metadata.videoAspect ?? '(unknown — ask: 16:9, 9:16, 1:1)'}`,
|
||
);
|
||
if (
|
||
metadata.promptTemplate?.title &&
|
||
typeof metadata.promptTemplate.prompt === 'string' &&
|
||
metadata.promptTemplate.prompt.trim().length > 0
|
||
) {
|
||
lines.push(`- **referenceTemplate**: ${metadata.promptTemplate.title}`);
|
||
}
|
||
lines.push('');
|
||
lines.push(renderMediaMetadataAction(
|
||
'video',
|
||
'`"$OD_NODE_BIN" "$OD_BIN" media generate --surface video --model <videoModel> --length <seconds> --aspect <ratio>`',
|
||
mediaExecution,
|
||
));
|
||
if (metadata.videoModel === 'hyperframes-html') {
|
||
lines.push(
|
||
'Special case: `hyperframes-html` is a local HTML-to-MP4 renderer, not a photoreal text-to-video model. Treat it like a motion design renderer, ask at most one clarifying question, then dispatch immediately.',
|
||
);
|
||
}
|
||
}
|
||
if (metadata.kind === 'audio') {
|
||
lines.push(
|
||
`- **audioKind**: ${metadata.audioKind ?? '(unknown — ask: music / speech / sfx)'}`,
|
||
);
|
||
lines.push(
|
||
`- **audioModel**: ${metadata.audioModel ?? '(unknown — ask: which audio model to use)'}`,
|
||
);
|
||
lines.push(
|
||
`- **durationSeconds**: ${typeof metadata.audioDuration === 'number' ? metadata.audioDuration : '(unknown — ask: target duration)'}`,
|
||
);
|
||
if (metadata.voice) {
|
||
lines.push(`- **voice**: ${metadata.voice}`);
|
||
} else if (metadata.audioKind === 'speech') {
|
||
lines.push('- **voice**: (unknown — ask: voice id / accent / pacing)');
|
||
}
|
||
const voiceOptions = shouldRenderElevenLabsVoiceOptions(metadata, audioVoiceOptions)
|
||
? audioVoiceOptions ?? []
|
||
: [];
|
||
if (voiceOptions.length > 0) {
|
||
lines.push(
|
||
'- **ElevenLabs voice options**: Ask the user to choose from a dropdown select. The visible labels are voice descriptions; the selected value must be the exact `voice_id` passed to `--voice`. Do not ask the user to type an id.',
|
||
);
|
||
if (voiceOptions.length > ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT) {
|
||
lines.push(`- **ElevenLabs voice options**: showing the first ${ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT} of ${voiceOptions.length} available voices.`);
|
||
}
|
||
lines.push('');
|
||
lines.push('<question-form id="elevenlabs-voice" title="Choose an ElevenLabs voice">');
|
||
lines.push(JSON.stringify(renderElevenLabsVoiceQuestionForm(voiceOptions), null, 2));
|
||
lines.push('</question-form>');
|
||
} else {
|
||
const audioVoiceOptionsPromptError = formatElevenLabsVoiceOptionsErrorForPrompt(audioVoiceOptionsError);
|
||
if (audioVoiceOptionsPromptError) {
|
||
lines.push(
|
||
`- **ElevenLabs voice options**: ${audioVoiceOptionsPromptError}`,
|
||
);
|
||
}
|
||
}
|
||
if (metadata.audioKind === 'sfx') {
|
||
lines.push(
|
||
'- **SFX discovery**: Ask about the sound source/action, materials, intensity, acoustic space, timing/tail, loop/non-loop, and "avoid" constraints. Do not ask for language or voice for SFX.',
|
||
);
|
||
}
|
||
lines.push('');
|
||
lines.push(renderMediaMetadataAction(
|
||
'audio',
|
||
'`"$OD_NODE_BIN" "$OD_BIN" media generate --surface audio --audio-kind <kind> --model <audioModel> --duration <seconds>` and add `--voice <voice-id>` for speech when you have a provider-specific voice id',
|
||
mediaExecution,
|
||
));
|
||
}
|
||
|
||
if (metadata.inspirationDesignSystemIds && metadata.inspirationDesignSystemIds.length > 0) {
|
||
lines.push(
|
||
`- **inspirationDesignSystemIds**: ${metadata.inspirationDesignSystemIds.join(', ')} — the user picked these systems as *additional* inspiration alongside the primary one. Borrow palette accents, typographic personality, or component patterns from them; don't replace the primary system's tokens.`,
|
||
);
|
||
}
|
||
|
||
if (Array.isArray(metadata.contextPlugins) && metadata.contextPlugins.length > 0) {
|
||
lines.push('');
|
||
lines.push('### @ plugin context');
|
||
lines.push(
|
||
'The user selected these plugins as additive context via @ mentions. Treat them as requested references to combine with the brief; only the explicit active plugin block, if present, is the executable/pinned plugin snapshot.',
|
||
);
|
||
for (const plugin of metadata.contextPlugins) {
|
||
const id = typeof plugin.id === 'string' ? plugin.id : '';
|
||
const title = typeof plugin.title === 'string' && plugin.title.trim().length > 0
|
||
? plugin.title.trim()
|
||
: id;
|
||
if (!id && !title) continue;
|
||
const description = typeof plugin.description === 'string' && plugin.description.trim().length > 0
|
||
? ` — ${plugin.description.trim()}`
|
||
: '';
|
||
lines.push(`- ${title}${id ? ` (\`${id}\`)` : ''}${description}`);
|
||
}
|
||
}
|
||
|
||
if (Array.isArray(metadata.contextMcpServers) && metadata.contextMcpServers.length > 0) {
|
||
lines.push('');
|
||
lines.push('### @ MCP context');
|
||
lines.push(
|
||
'The user selected these MCP servers as context. Prefer their tools when mounted and relevant before asking where data should come from.',
|
||
);
|
||
for (const server of metadata.contextMcpServers) {
|
||
const id = typeof server.id === 'string' ? server.id : '';
|
||
const label = typeof server.label === 'string' && server.label.trim().length > 0
|
||
? server.label.trim()
|
||
: id;
|
||
if (!id && !label) continue;
|
||
const transport = typeof server.transport === 'string' && server.transport.trim().length > 0
|
||
? ` — ${server.transport.trim()}`
|
||
: '';
|
||
lines.push(`- ${label}${id ? ` (\`${id}\`)` : ''}${transport}`);
|
||
}
|
||
}
|
||
|
||
if (Array.isArray(metadata.contextConnectors) && metadata.contextConnectors.length > 0) {
|
||
lines.push('');
|
||
lines.push('### @ connector context');
|
||
lines.push(
|
||
'The user selected these connectors as context. Use daemon connector tools through the OD CLI wrapper when data from these sources is needed; do not ask the user to identify a source that is already selected.',
|
||
);
|
||
for (const connector of metadata.contextConnectors) {
|
||
const id = typeof connector.id === 'string' ? connector.id : '';
|
||
const name = typeof connector.name === 'string' && connector.name.trim().length > 0
|
||
? connector.name.trim()
|
||
: id;
|
||
if (!id && !name) continue;
|
||
const meta = [connector.provider, connector.status, connector.accountLabel]
|
||
.filter((value): value is string => typeof value === 'string' && value.trim().length > 0)
|
||
.join(' · ');
|
||
lines.push(`- ${name}${id ? ` (\`${id}\`)` : ''}${meta ? ` — ${meta}` : ''}`);
|
||
}
|
||
}
|
||
|
||
// Curated prompt template reference for image/video projects. Inlined
|
||
// verbatim (with light truncation) so the agent can borrow structure,
|
||
// mood and phrasing without a separate fetch. The user may have edited
|
||
// the body before clicking Create — those edits land here and are now
|
||
// authoritative for the brief.
|
||
if (
|
||
(metadata.kind === 'image' || metadata.kind === 'video') &&
|
||
metadata.promptTemplate &&
|
||
typeof metadata.promptTemplate.prompt === 'string' &&
|
||
metadata.promptTemplate.prompt.trim().length > 0
|
||
) {
|
||
const tpl = metadata.promptTemplate;
|
||
lines.push('');
|
||
lines.push(`### Reference prompt template — "${tpl.title ?? 'untitled'}"`);
|
||
const meta = [];
|
||
if (tpl.category) meta.push(`category: ${tpl.category}`);
|
||
if (tpl.model) meta.push(`suggested model: ${tpl.model}`);
|
||
if (tpl.aspect) meta.push(`aspect: ${tpl.aspect}`);
|
||
if (Array.isArray(tpl.tags) && tpl.tags.length > 0) {
|
||
meta.push(`tags: ${tpl.tags.join(', ')}`);
|
||
}
|
||
if (meta.length > 0) lines.push(meta.join(' · '));
|
||
if (tpl.summary) {
|
||
lines.push('');
|
||
lines.push(tpl.summary);
|
||
}
|
||
lines.push('');
|
||
lines.push(
|
||
'The user picked this template as inspiration. Treat it as a structural and stylistic reference: borrow composition, palette cues, lighting language, lens/motion direction, and the level of detail. Adapt the wording to the user\'s actual subject and brief — do NOT generate the template subject verbatim. If a field above is unknown the user wants you to follow the template\'s defaults.',
|
||
);
|
||
// Escape triple-backticks so a user who pastes ``` into the editable
|
||
// template body can't break out of the markdown fence below and inject
|
||
// free-form instructions into the agent's system prompt.
|
||
const safe = (tpl.prompt ?? '').replace(/```/g, '`\u200b`\u200b`');
|
||
const truncated =
|
||
safe.length > 4000
|
||
? `${safe.slice(0, 4000)}\n… (truncated ${safe.length - 4000} chars)`
|
||
: safe;
|
||
lines.push('');
|
||
lines.push('```text');
|
||
lines.push(truncated);
|
||
lines.push('```');
|
||
if (tpl.source) {
|
||
const author = tpl.source.author ? ` by ${tpl.source.author}` : '';
|
||
lines.push('');
|
||
lines.push(
|
||
`Source: ${tpl.source.repo}${author} — license ${tpl.source.license ?? 'unspecified'}. Preserve attribution if you echo the template language directly.`,
|
||
);
|
||
}
|
||
}
|
||
|
||
if (metadata.kind === 'template' && template && template.files.length > 0) {
|
||
lines.push('');
|
||
lines.push(
|
||
`### Template reference — "${template.name}"${template.description ? ` (${template.description})` : ''}`,
|
||
);
|
||
lines.push(
|
||
'These HTML snapshots are what the user wants to start FROM. Read them as a stylistic + structural reference. You may copy structure, palette, typography, and component patterns; you may adapt them to the new brief; do NOT ship them verbatim. The agent should still produce its own artifact, just one that visibly inherits this template\'s design language.',
|
||
);
|
||
for (const f of template.files) {
|
||
// Cap each file at ~12k chars so a giant template doesn't blow out
|
||
// the system prompt budget. The agent gets enough to read structure.
|
||
const truncated =
|
||
f.content.length > 12000
|
||
? `${f.content.slice(0, 12000)}\n<!-- … truncated (${f.content.length - 12000} chars omitted) -->`
|
||
: f.content;
|
||
lines.push('');
|
||
lines.push(`#### \`${f.name}\``);
|
||
lines.push('```html');
|
||
lines.push(truncated);
|
||
lines.push('```');
|
||
}
|
||
}
|
||
|
||
return lines.join('\n');
|
||
}
|
||
|
||
function renderMediaMetadataAction(
|
||
surface: MediaSurface,
|
||
command: string,
|
||
mediaExecution: MediaExecutionPolicy | undefined,
|
||
): string {
|
||
const article = surface === 'audio' ? 'an' : 'a';
|
||
const mode = mediaExecution?.mode ?? 'enabled';
|
||
if (mode === 'disabled') {
|
||
return `This is ${article} **${surface}** project, but Open Design-owned media execution is disabled for this run. Plan the creative brief only unless an external MCP media tool is explicitly configured. Do NOT call OD media generation tools and do NOT emit \`<artifact>\` HTML for media surfaces.`;
|
||
}
|
||
return `This is ${article} **${surface}** project. Plan the creative brief carefully, then dispatch via the **media generation contract** using ${command}. Do NOT emit \`<artifact>\` HTML for media surfaces.`;
|
||
}
|
||
|
||
function shouldRenderElevenLabsVoiceOptions(
|
||
metadata: ProjectMetadata,
|
||
audioVoiceOptions: AudioVoiceOption[] | undefined,
|
||
): boolean {
|
||
return metadata.kind === 'audio'
|
||
&& metadata.audioKind === 'speech'
|
||
&& metadata.audioModel === 'elevenlabs-v3'
|
||
&& !metadata.voice
|
||
&& Array.isArray(audioVoiceOptions)
|
||
&& audioVoiceOptions.length > 0;
|
||
}
|
||
|
||
function renderElevenLabsVoiceQuestionForm(voiceOptions: AudioVoiceOption[]): {
|
||
description: string;
|
||
questions: Array<{
|
||
id: string;
|
||
label: string;
|
||
type: 'select';
|
||
required: boolean;
|
||
placeholder: string;
|
||
help: string;
|
||
options: Array<{ label: string; value: string }>;
|
||
}>;
|
||
submitLabel: string;
|
||
} {
|
||
const options = voiceOptions.slice(0, ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT).map((option) => ({
|
||
label: formatElevenLabsVoiceLabel(option),
|
||
value: option.voiceId,
|
||
}));
|
||
return {
|
||
description:
|
||
'Pick a voice by description. The selected answer will be the exact voice_id passed to the renderer.',
|
||
questions: [
|
||
{
|
||
id: 'voice',
|
||
label: 'Voice',
|
||
type: 'select',
|
||
required: true,
|
||
placeholder: 'Choose a voice',
|
||
help: 'Select a voice description; the answer submits the matching Voice ID.',
|
||
options,
|
||
},
|
||
],
|
||
submitLabel: 'Use voice',
|
||
};
|
||
}
|
||
|
||
function formatElevenLabsVoiceLabel(option: AudioVoiceOption): string {
|
||
const labels = option.labels && typeof option.labels === 'object'
|
||
? Object.values(option.labels)
|
||
.map((value) => (typeof value === 'string' ? value.trim() : ''))
|
||
.filter(Boolean)
|
||
: [];
|
||
const bits = [...labels];
|
||
if (bits.length > 0) return `${option.name} — ${bits.join(' · ')}`;
|
||
const category = typeof option.category === 'string' ? option.category.trim() : '';
|
||
return category ? `${option.name} — ${category}` : option.name;
|
||
}
|
||
|
||
/**
|
||
* Detect the seed/references pattern shipped by the upgraded
|
||
* web-prototype / mobile-app / simple-deck / guizang-ppt skills, and
|
||
* inject a hard pre-flight rule that lists which side files to Read
|
||
* before doing anything else. The skill body's own workflow already says
|
||
* this — but skills get truncated under context pressure and the agent
|
||
* sometimes skips Step 0. A short up-front directive helps.
|
||
*
|
||
* Returns an empty string when the skill ships no side files (legacy
|
||
* SKILL.md-only skills) so we don't add noise.
|
||
*/
|
||
function derivePreflight(skillBody: string): string {
|
||
const refs: string[] = [];
|
||
if (/assets\/template\.html/.test(skillBody)) refs.push('`assets/template.html`');
|
||
if (/references\/layouts\.md/.test(skillBody)) refs.push('`references/layouts.md`');
|
||
if (/references\/themes\.md/.test(skillBody)) refs.push('`references/themes.md`');
|
||
if (/references\/components\.md/.test(skillBody)) refs.push('`references/components.md`');
|
||
if (/references\/checklist\.md/.test(skillBody)) refs.push('`references/checklist.md`');
|
||
// The hyperframes skill ships an html-in-canvas reference next to the
|
||
// VFX catalog blocks. The chat handler at server.ts:4138 routes through
|
||
// this composer (not the contracts copy), so the case must live here
|
||
// too — otherwise live agent runs miss the preflight directive even
|
||
// when the skill body explicitly lists the file.
|
||
if (/references\/html-in-canvas\.md|html-in-canvas\.md/.test(skillBody)) {
|
||
refs.push('`references/html-in-canvas.md`');
|
||
}
|
||
if (refs.length === 0) return '';
|
||
return ` **Pre-flight (do this before any other tool):** Read ${refs.join(', ')} via the path written in the skill-root preamble. The seed template defines the class system you'll paste into; the layouts file is the only acceptable source of section/screen/slide skeletons; the checklist is your P0/P1/P2 gate before emitting \`<artifact>\`. Skipping this step is the #1 reason output regresses to generic AI-slop.`;
|
||
}
|