open-design/apps/daemon/src/prompts/system.ts
Dan Porat 3395d2c855
feat(daemon): implement fal.ai renderer for image + video generation (#1606)
* feat(daemon): implement fal.ai renderer for image + video generation

Adds renderFalImage and renderFalVideo backed by the fal queue API
(queue.fal.run). Any fal-ai/* model path can be used directly without
a catalog entry, enabling the full fal model library without code
changes. Catalogued shortcuts are mapped via FAL_ENDPOINTS to their
fal-ai/* paths; OD_FAL_MAX_POLL_MS controls the poll ceiling.

Expands the fal model catalog with flux-pro-ultra, flux-dev-fal,
flux-schnell-fal, ideogram-v3-fal, recraft-v3-fal (images) and
veo-3-fal, veo-2-fal, wan-2.1-t2v, wan-2.1-i2v, seedance-1-pro-fal,
kling-2.1-t2v-fal (video). Marks fal provider as integrated: true in
both daemon and web model registries.

* fix(daemon): address fal renderer review comments

- Correct Wan 2.1 endpoints: wan-video/v2.1/* → fal-ai/wan-t2v / fal-ai/wan-i2v
- Correct Kling 2.1 t2v endpoint: .../pro/... → .../master/text-to-video
- Add FAL_IMAGE_USES_ASPECT_RATIO: flux-pro-ultra sends aspect_ratio not image_size
- Add FAL_VIDEO_NO_DURATION: Wan models reject the duration field
- Add FAL_VIDEO_STRING_DURATION: Veo expects duration as "5s" not 5
- Fix falQueueBase() to use anchored regex replace, avoiding mangled
  custom base URLs
- Do not wrap payload under input — raw fal queue HTTP API expects flat
  body; the input wrapper is an SDK abstraction only (confirmed by 422
  validation error from fal showing prompt missing at body.prompt)

* fix(daemon): correct fal queue protocol comment (flat body, no SDK input wrapper)

* fix(daemon): clamp Veo duration to valid fal buckets (4s/6s/8s)

* fix(daemon): report effective fal Veo duration in providerNote (with snap warning)

* fix(daemon): reduce image generation latency from 4m37s to ~73s

Five layered fixes targeting the overhead that padded a ~10s fal API call
into a 4m37s user-facing wait:

1. Skip DISCOVERY_AND_PHILOSOPHY for media surfaces (image/video/audio).
   The ~3000-token HTML-artifact discovery layer is irrelevant for media
   generation and forced the agent to parse and override all its rules
   before dispatching. Removes it from the system prompt entirely for
   these surfaces; MEDIA_GENERATION_CONTRACT is the sole authority.

2. Broaden the wait-loop contract to cover ALL slow models, not just
   "Volcengine i2v / hyperframes-html". Any model whose generation
   exceeds 25s — including fal flux-pro-ultra, Veo, Sora — returns exit 2
   from od media generate. The contract now makes this universal and
   provides a python3-based bash pattern (jq is not guaranteed to be
   installed on all agent runtimes).

3. Increase od media wait polling budget from 25s to 120s. od media
   generate keeps its 25s budget for fast feedback; od media wait is
   purpose-built to sit and poll, so it can safely use the full 2-minute
   bash-tool window. Reduces re-entries for a 3-minute generation from
   ~7 to ~2.

4. First fal poll is now immediate instead of always sleeping 3s before
   the first status check. Saves 3s for all fal jobs.

5. Project metadata no longer emits "(unknown — ask)" for imageModel and
   aspectRatio when unset. Emits the actual defaults (gpt-image-2,
   aspect-ratio scene heuristic) so the agent can dispatch without
   extended reasoning about model selection. Also adds dispatch-immediately
   defaults and a brief-reply rule (2–3 sentences max after generation).

Measured end-to-end on the exact problem prompt before/after:
  Before: 4m37s (discovery form + 7x LLM re-entries + jq failure)
  After:  ~73s   (single bash loop, no question turn, image delivered)

* feat(daemon): inject media dispatch hint for non-media project surfaces

Agents running inside prototype, deck, and other non-image/video/audio
projects previously had no knowledge of `od media generate`, so when
asked to create an image with fal they would try to call provider REST
APIs directly and ask the user for API keys — even though the daemon
already holds credentials in .od/media-config.json.

Add MEDIA_DISPATCH_HINT to composeSystemPrompt for all non-media
surfaces. The hint tells the agent to always route media generation
through the daemon dispatcher, and explicitly forbids prompting for
API keys.

Verified end-to-end: a prototype project generates a 952 KB image via
flux-pro-ultra in ~52s with no key errors.

* fix(daemon): prevent agent from converting bash env vars to PowerShell syntax

MEDIA_DISPATCH_HINT now explicitly labels the shell as POSIX bash and
shows the correct $VAR form side-by-side with a warning NOT to use
PowerShell $env:VAR. Without this, claude-sonnet running on a Windows
host converts the example to PowerShell syntax (`& $env:OD_NODE_BIN`)
which then fails at the bash executor with 'syntax error near unexpected
token &'.

* fix(daemon): add generate→wait loop to MEDIA_DISPATCH_HINT for slow models

MEDIA_DISPATCH_HINT previously showed only a bare  call.
flux-pro-ultra and other slow models always exit 2 after ~25s — without
the wait loop the agent would treat exit 2 as a failure and report an
error to the user.

Replace the single-command example with the canonical generate→wait loop
(matching media-contract.ts), add an explicit note that exit 2 means
'keep polling', and reinforce the POSIX bash / no-PowerShell rule
directly inside the code block.

* fix(daemon): allow fal-ai/* passthrough in media-agent contract

The media-agent prompt instructed the agent to warn and substitute the
default model for any ID not in the catalogue. This blocked the custom
fal-ai/* passthrough path the daemon already supports, so users could
not reach uncatalogued fal models from the normal chat flow.

Carve out the fal-ai/* exception so the agent passes those IDs through
directly instead of warning or substituting.

* fix(daemon): align MEDIA_DISPATCH_HINT with exit-0 generate contract

media generate now always exits 0 (handoff included). The non-media
agent hint still checked ec==2 to decide whether to keep polling, so
slow fal models (flux-pro-ultra, veo-3-fal) would stop after printing
the handoff JSON instead of entering the wait loop.

- generate error check: drop the ec!=2 exception (exits 0 always)
- while loop: drive on taskId presence, not ec==2; stop on ec==0/5
- footer: remove --surface inference claim; CLI requires it explicitly

* fix(guard): add test-fal-webui.ts to e2e scripts allowlist

CI failed: guard flagged e2e/scripts/test-fal-webui.ts as an
unapproved package-owned entrypoint. Add it to allowedE2eScripts.

* fix(daemon): update prompt test expectations to match exit-0 handoff wording

The two stale assertions checked for the old generate-exits-2 copy which
no longer exists in the contract. Update them to match the current
always-exits-0 wording.

* fix(daemon): move skipDiscoveryBrief override before discovery block

* chore(e2e): remove ad-hoc fal webui test script

The script was a one-time developer helper used to manually validate fal
image generation through the live UI. It relied on a real fal API key and
hardcoded local port, so it cannot participate in the e2e package's
fixture/reporting/CI conventions. Removing it per reviewer feedback.

- Delete e2e/scripts/test-fal-webui.ts
- Remove its guard.ts allowlist entry
- Gitignore the file and its screenshots to prevent accidental re-addition

* chore: remove accidental local scratch files from branch

Remove bash.exe.stackdump (MSYS crash dump) and fix_loop.py (one-off
local rewrite helper) — neither is a repo-owned source artifact.

* fix(prompts): document fal-ai/* passthrough in non-media dispatch hint

Prototype/deck agents now know arbitrary fal-ai/* model ids are valid
--model values and should be forwarded as-is, mirroring the exception
already present in media-contract.ts. Adds a prompt regression test.

* fix(daemon): use renderMediaGenerationContract(mediaExecution) for media surfaces

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-31 04:44:44 +00:00

1382 lines
74 KiB
TypeScript
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

/**
* Prompt composer. The base is the OD-adapted "expert designer" system
* prompt (see ./official-system.ts) — a full identity, workflow, and
* content-philosophy charter. Stacked on top:
*
* 1. The discovery + planning + huashu-philosophy layer (./discovery.ts)
* — interactive question-form syntax, direction-picker fork,
* brand-spec extraction, TodoWrite reinforcement, 5-dim critique,
* and the embedded `directions.ts` library.
* 2. The active design system's DESIGN.md (if any) — palette, typography,
* spacing rules treated as authoritative tokens.
* 3. The active skill's SKILL.md (if any) — workflow specific to the
* kind of artifact being built. When the skill ships a seed
* (`assets/template.html`) and references (`references/layouts.md`,
* `references/checklist.md`), we inject a hard pre-flight rule above
* the skill body so the agent reads them BEFORE writing any code.
* 4. For decks (skillMode === 'deck' OR metadata.kind === 'deck'), the
* deck framework directive (./deck-framework.ts) is pinned LAST so it
* overrides any softer slide-handling wording earlier in the stack —
* this is the load-bearing nav / counter / scroll JS / print
* stylesheet contract that PDF stitching depends on. We also fire on
* the metadata path so deck-kind projects without a bound skill
* (skill_id null) still get a framework, instead of having the agent
* re-author scaling / nav / print logic from scratch each turn. When
* the active skill ships its own seed (skill body references
* `assets/template.html`), we defer to that seed and skip the generic
* skeleton — the skill's framework wins to avoid double-injection.
*
* The composed string is what the daemon sees as `systemPrompt` and what
* the Anthropic path sends as `system`.
*/
import { OFFICIAL_DESIGNER_PROMPT } from './official-system.js';
import { DISCOVERY_AND_PHILOSOPHY } from './discovery.js';
import { DECK_FRAMEWORK_DIRECTIVE } from './deck-framework.js';
import { renderMediaGenerationContract } from './media-contract.js';
import { IMAGE_MODELS } from '../media-models.js';
import { renderPanelPrompt } from './panel.js';
import { defaultCritiqueConfig, type CritiqueConfig } from '@open-design/contracts/critique';
import type { MediaExecutionPolicy, MediaSurface } from '@open-design/contracts';
const ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT = 100;
const ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX = 'ElevenLabs voice list could not be loaded';
const PROMPT_SAFE_HTTP_STATUS_LABELS: Record<string, string> = {
'400': 'Bad Request',
'401': 'Unauthorized',
'403': 'Forbidden',
'404': 'Not Found',
'429': 'Too Many Requests',
'500': 'Internal Server Error',
'502': 'Bad Gateway',
'503': 'Service Unavailable',
'504': 'Gateway Timeout',
};
function renderUiLocalePrompt(locale: string | undefined): string {
const normalized = locale?.trim();
if (!normalized || normalized.toLowerCase() === 'en') return '';
const languageName = normalized === 'zh-CN'
? 'Simplified Chinese'
: normalized === 'zh-TW'
? 'Traditional Chinese'
: normalized;
const lines = [
'# UI locale override',
'',
`The Open Design UI locale for this run is \`${normalized}\` (${languageName}). All user-visible chat prose and generated UI controls must follow this locale, especially \`<question-form>\` titles, descriptions, labels, placeholders, helper text, and option labels. Keep machine-readable ids and object option \`value\` fields exact and unlocalized.`,
'Exception: for the default task-type form, keep the `taskType` option labels as the canonical routing choices: `Prototype`, `Live artifact`, `Slide deck`, `Image`, `Video`, `HyperFrames`, `Audio`, `Other`. Do not translate, reorder, or rewrite those option labels.',
];
if (normalized === 'zh-CN') {
lines.push(
'',
'For the default quick brief in Simplified Chinese, use copy like:',
'- title: `快速简报 — 30 秒`',
'- description: `开始生成前我会先确认这些信息。不适用的可以跳过,我会补上默认值。`',
'- output label/options: `我们要做什么?` / `幻灯片 / 路演稿`, `单页网页原型 / 落地页`, `多屏应用原型`, `数据看板 / 工具界面`, `编辑式 / 营销页面`, `其他 — 我来描述`',
'- platform label/options: `目标平台` / `响应式网页`, `桌面网页`, `iOS 应用`, `Android 应用`, `平板应用`, `桌面应用`, `固定画布 (1920×1080)`',
'- audience label/placeholder: `目标用户` / `例如:早期投资人、开发者工具采购者、内部高管评审`',
'- tone label/options: `视觉调性` / `编辑 / 杂志感`, `现代极简`, `活泼 / 插画感`, `科技 / 工具型`, `奢华 / 精致`, `粗野 / 实验性`, `人性化 / 亲切`',
'- brand label/options: `品牌背景` / `帮我选一个方向`, `我有品牌规范 — 稍后分享`, `参考网站 / 截图 — 稍后附上`',
'- scale label/placeholder: `大概需要多少内容?` / `例如8 页幻灯片、1 个落地页 + 3 个子页面、4 个移动端界面`',
'- constraints label/placeholder: `还有什么需要知道的吗?` / `真实文案、必须使用的字体、需要避免的内容、截止时间…`',
);
}
return lines.join('\n');
}
function normalizePromptText(value: string): string {
return value
.replace(/[\r\n]+/g, ' ')
.replace(/\s+/g, ' ')
.trim();
}
function formatElevenLabsVoiceOptionsErrorForPrompt(
error: string | undefined,
): string | undefined {
const trimmed = normalizePromptText(error ?? '');
if (!trimmed) return undefined;
if (/no ElevenLabs API key/i.test(trimmed)) {
return `${ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX} because the ElevenLabs API key is missing. Tell the user to configure it in Settings or paste a voice id manually.`;
}
const statusMatch = trimmed.match(
/(?:\((\d{3})(?:\s+([^)]+))?\)|\b(\d{3})(?:\s+([A-Za-z][A-Za-z -]{0,40}))?\b)/,
);
if (statusMatch) {
const statusCode = statusMatch[1] ?? statusMatch[3];
const statusText = statusCode ? PROMPT_SAFE_HTTP_STATUS_LABELS[statusCode] ?? '' : '';
const suffix = statusText ? ` ${statusText}` : '';
return `${ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX} (${statusCode}${suffix}). Tell the user to retry the lookup or paste a voice id manually.`;
}
return `${ELEVENLABS_VOICE_OPTIONS_PROMPT_PREFIX}. Tell the user to retry the lookup or paste a voice id manually.`;
}
type ProjectMetadata = {
kind?: string;
intent?: string | null;
fidelity?: string | null;
speakerNotes?: boolean | null;
slideCount?: string | null;
animations?: boolean | null;
includeLandingPage?: boolean | null;
includeOsWidgets?: boolean | null;
templateId?: string | null;
templateLabel?: string | null;
platform?: string | null;
platformTargets?: string[] | null;
inspirationDesignSystemIds?: string[];
skipDiscoveryBrief?: boolean | null;
imageModel?: string | null;
imageAspect?: string | null;
imageStyle?: string | null;
videoModel?: string | null;
videoLength?: number | null;
videoAspect?: string | null;
audioKind?: string | null;
audioModel?: string | null;
audioDuration?: number | null;
voice?: string | null;
promptTemplate?: {
id?: string | null;
surface?: 'image' | 'video' | null;
title?: string | null;
prompt?: string | null;
summary?: string | null;
category?: string | null;
tags?: string[] | null;
model?: string | null;
aspect?: string | null;
source?: {
repo?: string | null;
license?: string | null;
author?: string | null;
url?: string | null;
} | null;
} | null;
contextPlugins?: Array<{
id?: string | null;
title?: string | null;
description?: string | null;
}> | null;
contextMcpServers?: Array<{
id?: string | null;
label?: string | null;
transport?: string | null;
url?: string | null;
command?: string | null;
}> | null;
contextConnectors?: Array<{
id?: string | null;
name?: string | null;
provider?: string | null;
category?: string | null;
status?: string | null;
accountLabel?: string | null;
}> | null;
};
type ProjectTemplate = { name: string; description?: string | null; files: Array<{ name: string; content: string }> };
type AudioVoiceOption = {
name: string;
voiceId: string;
category?: string | null;
labels?: Record<string, string> | null;
};
type ExclusiveSurfaceMode = 'deck' | 'image' | 'video' | 'audio';
const EXCLUSIVE_SURFACE_MODES = new Set<ExclusiveSurfaceMode>(['deck', 'image', 'video', 'audio']);
export function resolveExclusiveSurface(args: {
metadata?: ProjectMetadata | undefined;
skillMode?: ComposeInput['skillMode'] | undefined;
skillModes?: ComposeInput['skillModes'] | undefined;
}): ExclusiveSurfaceMode | null {
const activeSkillModes = new Set(
Array.isArray(args.skillModes)
? args.skillModes.filter(Boolean)
: args.skillMode
? [args.skillMode]
: [],
);
const metadataSurface = EXCLUSIVE_SURFACE_MODES.has(args.metadata?.kind as ExclusiveSurfaceMode)
? args.metadata?.kind as ExclusiveSurfaceMode
: null;
const primarySkillSurface = EXCLUSIVE_SURFACE_MODES.has(args.skillMode as ExclusiveSurfaceMode)
? args.skillMode as ExclusiveSurfaceMode
: null;
const composedSurfaceModes = Array.from(activeSkillModes).filter((mode): mode is ExclusiveSurfaceMode =>
EXCLUSIVE_SURFACE_MODES.has(mode as ExclusiveSurfaceMode),
);
return metadataSurface
?? primarySkillSurface
?? (composedSurfaceModes.length === 1 ? composedSurfaceModes[0] ?? null : null);
}
export const BASE_SYSTEM_PROMPT = OFFICIAL_DESIGNER_PROMPT;
export const SKIP_DISCOVERY_BRIEF_OVERRIDE = `# Automated project mode — skip discovery form
This project was created through the daemon API with \`skipDiscoveryBrief: true\`. Override the discovery rules below: do NOT emit \`<question-form id="discovery">\`, do NOT show "Quick brief — 30 seconds", and do NOT ask a first-turn clarification form. Treat the user's first message and project metadata as the brief, then proceed directly to planning/building under the normal artifact workflow. Ask at most one concise follow-up only if a required detail is impossible to infer safely.`;
// Injected into non-media projects so the agent knows how to dispatch
// media generation if the user asks for it mid-session (e.g. "generate an
// image with fal"). Without this, agents in prototype/deck projects try to
// call provider REST APIs directly and ask the user for keys that the daemon
// already holds in .od/media-config.json.
const MEDIA_DISPATCH_HINT = `
---
## Media generation (if asked)
If the user asks you to generate an image, video, or audio file — regardless of which provider or model they mention (fal, Replicate, OpenAI, etc.) — use the daemon dispatcher via your **Bash tool**. Do NOT call provider REST APIs directly.
The daemon injects these env vars into your shell (**POSIX bash — not PowerShell**):
- \`OD_NODE_BIN\` — absolute path to the Node runtime
- \`OD_BIN\` — absolute path to the OD CLI script
- \`OD_PROJECT_ID\` — the active project id
**Always use the generate→wait loop below.** \`media generate\` always exits 0 — either with \`{"file":{...}}\` if done within ~25s, or with \`{"taskId":"..."}\` as a handoff for slow models (flux-pro-ultra ~60180s, veo-3-fal longer). Whenever the output contains a \`taskId\`, keep polling with \`media wait\` until exit 0 (done) or exit 5 (failed).
Use **POSIX \`$VAR\` syntax** — do NOT translate to PowerShell (\`$env:VAR\`, \`&\` operator). Uses \`python3\` for JSON parsing (do NOT use \`jq\`):
\`\`\`bash
# POSIX bash — do NOT convert to PowerShell
out=\$("$OD_NODE_BIN" "$OD_BIN" media generate \\
--project "$OD_PROJECT_ID" \\
--surface image \\
--model flux-pro-ultra \\
--prompt "..." \\
--aspect 16:9)
ec=\$?
if [ "\$ec" -ne 0 ]; then echo "\$out" >&2; exit "\$ec"; fi
last=\$(printf '%s\\n' "\$out" | tail -1)
task_id=\$(printf '%s\\n' "\$last" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('taskId',''))" 2>/dev/null)
since=\$(printf '%s\\n' "\$last" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('nextSince',0))" 2>/dev/null)
since="\${since:-0}"
while [ -n "\$task_id" ]; do
out=\$("$OD_NODE_BIN" "$OD_BIN" media wait "\$task_id" --since "\$since")
ec=\$?
last=\$(printf '%s\\n' "\$out" | tail -1)
since=\$(printf '%s\\n' "\$last" | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('nextSince',\$since))" 2>/dev/null)
since="\${since:-0}"
if [ "\$ec" -eq 0 ]; then
task_id=""
elif [ "\$ec" -ne 2 ]; then
echo "\$out" >&2; exit "\$ec"
fi
done
printf '%s\\n' "\$last"
\`\`\`
**Never ask the user for an API key.** The daemon reads provider credentials from its config; keys are never passed through the shell. If the provider returns an auth error, tell the user to open Settings → AI Providers and confirm the key is configured there.
For the best fal image model use \`--model flux-pro-ultra\`. For video use \`--model veo-3-fal\` or \`--model wan-2.1-t2v\`. Always pass \`--surface\` explicitly (\`image\`, \`video\`, or \`audio\`). Any \`fal-ai/*\` path (e.g. \`fal-ai/flux/schnell\`, \`fal-ai/wan-i2v\`) is also a valid \`--model\` value for image/video — pass it through as-is without substitution.`;
const ACTIVE_DESIGN_SYSTEM_VISUAL_DIRECTION_OVERRIDE = `
---
## Active design system visual direction
Active design system exception: the active design system is the visual direction for this project. Use its DESIGN.md palette, typography, spacing, component rules, and theme tokens as the source of truth for color and mood.
- Do not ask the user to pick a separate theme color, visual direction, palette, typography mood, or direction card.
- Do not emit a direction question-form, a \`direction-cards\` picker, or any visual-direction card while an active design system is present.
- If an earlier discovery answer asks to "Pick a direction for me", treat that as already satisfied by the active design system and continue with the plan.
- When a downstream framework mentions "active direction" or "theme tokens", bind those fields from the active design system instead of the built-in direction library.
`;
const DEFAULT_DESIGN_SYSTEM_USAGE = `Read DESIGN.md for visual principles, paste tokens.css verbatim into the first <style> when it is provided, and match component shapes from the reference component manifest or fixture when available. Treat any pull-layer index as optional context for deeper inspection; do not assume those files have already been loaded.`;
function renderDesignSystemImportModeGuidance(
importMode: ComposeInput['designSystemImportMode'],
): string | undefined {
if (importMode === 'normalized') {
return 'This package is normalized. Treat tokens.css and DESIGN.md as the contract, and prefer OD token names over source-project names. Use pull-layer source evidence only as optional background.';
}
if (importMode === 'hybrid') {
return 'This package is hybrid. Build with OD-normalized tokens first, then inspect pull-layer source evidence or snippets only when original component behavior, density, or naming would materially improve fidelity.';
}
if (importMode === 'verbatim') {
return 'This package is verbatim-oriented. Preserve source semantics and source naming as much as possible. Before translating component behavior, inspect the relevant pull-layer source evidence or snippets when the runtime tool is available.';
}
return undefined;
}
export interface ComposeInput {
agentId?: string | null | undefined;
includeCodexImagegenOverride?: boolean | undefined;
streamFormat?: string | undefined;
skillBody?: string | undefined;
skillName?: string | undefined;
skillMode?:
| 'prototype'
| 'deck'
| 'template'
| 'design-system'
| 'image'
| 'video'
| 'audio'
| undefined;
skillModes?: Array<'prototype' | 'deck' | 'template' | 'design-system' | 'image' | 'video' | 'audio'> | undefined;
designSystemBody?: string | undefined;
designSystemTitle?: string | undefined;
// Compiled (machine-readable) form of the active brand's design system,
// shipped as sibling files to DESIGN.md when available. Both fields are
// optional; the daemon populates them by default for every brand that
// ships `tokens.css` / `components.html` (today: `default` and
// `kami`). `OD_DESIGN_TOKEN_CHANNEL=0` disables the channel as a kill
// switch. When present they are appended AFTER the DESIGN.md block so
// prose still sets the high-level voice and the structured form
// disambiguates token names + worked component shapes.
//
// - `designSystemUsageMd` — optional USAGE.md router that tells
// agents how to consume this package.
// - `designSystemTokensCss` — verbatim `tokens.css` :root contract
// that the agent pastes into the
// artifact's <style>.
// - `designSystemComponentsManifest` — concise structured summary
// derived from components.html.
// - `designSystemFixtureHtml` — verbatim `components.html`
// fallback when no manifest can
// be derived.
// - `designSystemPullIndex` — lightweight manifest-derived
// list of richer files available
// for later pull-channel work.
designSystemUsageMd?: string | undefined;
designSystemTokensCss?: string | undefined;
designSystemComponentsManifest?: string | undefined;
designSystemFixtureHtml?: string | undefined;
designSystemPullIndex?: string | undefined;
designSystemImportMode?: 'normalized' | 'hybrid' | 'verbatim' | undefined;
// Craft references the active skill opted into via `od.craft.requires`.
// The daemon resolves the slug list to file contents and concatenates
// them with section headers; we inject them between the DESIGN.md and
// the skill body so brand tokens win on conflict but craft rules
// (letter-spacing, accent caps, anti-slop) cover everything below.
craftBody?: string | undefined;
craftSections?: string[] | undefined;
// Markdown built from the user's auto-memory store
// (<dataDir>/memory/*.md). Folded in before the active design system so
// tone/voice/preferences extracted from past chats win over the
// built-in identity charter but still defer to the brand's hard tokens
// and the active skill's workflow. Empty/undefined skips the block.
memoryBody?: string | undefined;
// Project-level metadata captured by the new-project panel. Drives the
// agent's understanding of artifact kind, fidelity, speaker-notes intent
// and animation intent. Missing fields here are exactly what the
// discovery form should re-ask the user about on turn 1.
metadata?: ProjectMetadata | undefined;
// The template the user picked in the From-template tab, when present.
// Snapshot of HTML files that the agent should treat as a starting
// reference rather than a fixed deliverable.
template?: ProjectTemplate | undefined;
// Provider voice choices fetched by the daemon/web before composing the
// prompt. Used for ElevenLabs speech discovery so the agent can render
// a select question-form instead of asking the user to paste raw ids.
audioVoiceOptions?: AudioVoiceOption[] | undefined;
// When voice discovery fails, surface the error reason so the agent
// can tell the user why the dropdown is unavailable instead of
// pretending there were simply no voices.
audioVoiceOptionsError?: string | undefined;
// When present and enabled, the Critique Theater protocol addendum is
// concatenated to the end of the composed prompt. Omitting this field
// (or passing cfg.enabled === false) preserves legacy behavior unchanged.
critique?: CritiqueConfig | undefined;
// Brand name and DESIGN.md body. Required when critique is enabled;
// ignored when critique is disabled or omitted.
critiqueBrand?: { name: string; design_md: string } | undefined;
// Skill identifier. Required when critique is enabled;
// ignored when critique is disabled or omitted.
critiqueSkill?: { id: string } | undefined;
// External MCP servers the daemon already holds a valid OAuth Bearer
// token for at spawn time. We surface the list to the model so it does
// NOT chase Claude Code's synthetic `*_authenticate` /
// `*_complete_authentication` tools that get injected when the HTTP
// transport's first connect transiently flips a server into
// needs-auth state — the Bearer is in `.mcp.json`, the real tools are
// available, and burning a turn on a redundant OAuth dance just
// confuses the user.
connectedExternalMcp?: ReadonlyArray<{ id: string; label?: string | undefined }>
| undefined;
// Optional `## Active plugin` / `## Plugin inputs` block. The daemon's
// plugin module renders this from an AppliedPluginSnapshot; we splice
// it in after the active skill so the plugin description sits next to
// its companion skill body in the prompt. Pass undefined when no
// plugin is bound to the run.
pluginBlock?: string | undefined;
// Plan §3.L2 / spec §23.4 — pre-rendered `## Active stage: <id>`
// blocks (one per pipeline stage active for the run). The daemon's
// pipeline runner builds these from `loadAtomBodies()` +
// `renderActiveStageBlock()` when the OD_BUNDLED_ATOM_PROMPTS env
// flag is set; otherwise this stays undefined and the prompt
// composer's hard-coded constants keep their precedence (back-compat).
activeStageBlocks?: ReadonlyArray<string> | undefined;
// Free-form instructions the user set at the global (user-level)
// settings panel. Injected after personal memory and before the
// project-level instructions.
userInstructions?: string | undefined;
// Free-form instructions the user set on this specific project.
// Injected after user-level instructions and before the design system.
projectInstructions?: string | undefined;
// UI locale selected by the client. User-visible generated form copy
// must follow this locale even when the user's initial prompt is brief.
locale?: string | undefined;
// Run-scoped media policy. Defaults to enabled when omitted so existing
// local OD behavior keeps the same media prompt contract.
mediaExecution?: MediaExecutionPolicy | undefined;
}
export function composeSystemPrompt({
agentId,
includeCodexImagegenOverride = true,
skillBody,
skillName,
skillMode,
skillModes,
designSystemBody,
designSystemTitle,
designSystemUsageMd,
designSystemTokensCss,
designSystemComponentsManifest,
designSystemFixtureHtml,
designSystemPullIndex,
designSystemImportMode,
craftBody,
craftSections,
memoryBody,
metadata,
template,
audioVoiceOptions,
audioVoiceOptionsError,
critique,
critiqueBrand,
critiqueSkill,
connectedExternalMcp,
pluginBlock,
activeStageBlocks,
streamFormat,
locale,
userInstructions,
projectInstructions,
mediaExecution,
}: ComposeInput): string {
// Discovery + philosophy goes FIRST so its hard rules ("emit a form on
// turn 1", "branch on brand on turn 2", "TodoWrite on turn 3", run
// checklist + critique before <artifact>) win precedence over softer
// wording later in the official base prompt.
const parts: string[] = [];
const activeDesignSystemBody = designSystemBody?.trim();
const activeSkillModes = new Set(
Array.isArray(skillModes)
? skillModes.filter(Boolean)
: skillMode
? [skillMode]
: [],
);
const resolvedExclusiveSurface = resolveExclusiveSurface({ metadata, skillMode, skillModes });
// API/BYOK mode (streamFormat === 'plain'): mirrors the same fix from
// `@open-design/contracts`'s composer. The daemon hits this path for
// any plain-stream adapter (e.g. DeepSeek), so without pinning the
// override above DISCOVERY_AND_PHILOSOPHY here too, those daemon
// agents still emit the `<todo-list>` / `[读取 X]` pseudo-tool
// markup described in #313. Keep the wording byte-identical to the
// contracts copy so both code paths produce the same observable
// behaviour.
if (streamFormat === 'plain') {
parts.push(API_MODE_OVERRIDE);
parts.push('\n\n---\n\n');
}
// Skip the HTML-artifact discovery layer for media surfaces (image / video /
// audio). DISCOVERY_AND_PHILOSOPHY is ~3 000 tokens of rules about question
// forms, brand extraction, direction pickers, and HTML artifact checklist —
// none of which apply to media generation. Including it forces the agent to
// parse and override all of those rules before it can start, adding tokens
// and LLM inference time. The MEDIA_GENERATION_CONTRACT (pushed below) is
// the sole workflow authority for these surfaces.
const isMediaSurfaceEarly =
skillMode === 'image' ||
skillMode === 'video' ||
skillMode === 'audio' ||
metadata?.kind === 'image' ||
metadata?.kind === 'video' ||
metadata?.kind === 'audio';
if (metadata?.skipDiscoveryBrief === true) {
parts.push(SKIP_DISCOVERY_BRIEF_OVERRIDE);
parts.push('\n\n---\n\n');
}
const localePrompt = renderUiLocalePrompt(locale);
if (localePrompt) {
parts.push(localePrompt);
parts.push('\n\n---\n\n');
}
if (!isMediaSurfaceEarly) {
parts.push(DISCOVERY_AND_PHILOSOPHY, '\n\n---\n\n');
}
parts.push(
'# Identity and workflow charter (background)\n\n',
BASE_SYSTEM_PROMPT,
);
if (memoryBody && memoryBody.trim().length > 0) {
parts.push(
`\n\n## Personal memory (auto-extracted from past chats)\n\nThe following facts have been sedimented from this user's previous conversations and edited in the settings panel. Treat them as preferences and context, NOT hard rules: when they collide with the active design system tokens, the brand wins; when they collide with the active skill's workflow, the skill wins. They are still authoritative for tone, voice, terminology, and what the user already told you about themselves and their goals — never re-ask the user about something already captured here.\n\n${memoryBody.trim()}`,
);
}
if (userInstructions && userInstructions.trim().length > 0) {
parts.push(
`\n\n## Custom instructions (user-level)\n\nThe user has set the following persistent instructions. Apply them as defaults to every project. When a project-level instruction below contradicts a point here, the project-level version wins.\n\n${userInstructions.trim()}`,
);
}
if (projectInstructions && projectInstructions.trim().length > 0) {
parts.push(
`\n\n## Custom instructions (project-level)\n\nThe user has set the following instructions for this specific project. They take precedence over user-level custom instructions whenever both address the same topic (e.g. if user-level says "use spaces" but project-level says "use tabs", use tabs).\n\n${projectInstructions.trim()}`,
);
}
if (activeDesignSystemBody && activeDesignSystemBody.length > 0) {
const usageBlock =
designSystemUsageMd && designSystemUsageMd.trim().length > 0
? designSystemUsageMd.trim()
: DEFAULT_DESIGN_SYSTEM_USAGE;
parts.push(
`\n\n## How to use this design system${designSystemTitle ? `${designSystemTitle}` : ''}\n\n${usageBlock}`,
);
parts.push(
`\n\n## Active design system${designSystemTitle ? `${designSystemTitle}` : ''}\n\nTreat the following DESIGN.md as authoritative for color, typography, spacing, and component rules. Do not invent tokens outside this palette. When you copy the active skill's seed template, bind these tokens into its \`:root\` block before generating any layout.\n\n${activeDesignSystemBody}`,
);
const importModeGuidance = renderDesignSystemImportModeGuidance(designSystemImportMode);
if (importModeGuidance) {
parts.push(
`\n\n## Design system import mode${designSystemTitle ? `${designSystemTitle}` : ''}\n\n${importModeGuidance}`,
);
}
}
// Structured (compiled) form of the active brand. The DESIGN.md above
// sets voice and intent; the tokens.css block below is the SAME
// contract in machine-readable form — names + values the agent pastes
// verbatim instead of re-deriving from prose. The components.html
// manifest grounds the token vocabulary in worked component shapes
// (button / card / type roles) without injecting the full HTML fixture.
// If manifest extraction fails or is unavailable, the composer falls
// back to the verbatim components.html fixture. Both blocks are
// individually gated: missing files skip silently, preserving the
// legacy DESIGN.md-only behaviour for prose-only brands.
if (designSystemTokensCss && designSystemTokensCss.trim().length > 0) {
parts.push(
`\n\n## Active design system tokens${designSystemTitle ? `${designSystemTitle}` : ''}\n\nThe block below is this brand's tokens.css contract — every \`:root\` custom property and any scoped override (e.g. \`:root[lang=...]\`) the brand defines. **Paste the unscoped \`:root { ... }\` block verbatim into the artifact's first \`<style>\`** so every \`var(--*)\` reference resolves at runtime.\n\nDo not invent new tokens. Do not redefine these values. Do not write raw hex outside this :root block. The DESIGN.md above is prose; this is the binding contract.\n\n\`\`\`css\n${designSystemTokensCss.trim()}\n\`\`\``,
);
}
if (designSystemComponentsManifest && designSystemComponentsManifest.trim().length > 0) {
parts.push(
`\n\n## Reference component manifest${designSystemTitle ? `${designSystemTitle}` : ''}\n\nA compact structured summary derived from this brand's components.html fixture. Use it as the component inventory for generated artifacts: match the listed selectors, component groups, class names, token references, focus behavior, and spacing cadence. Prefer these manifest entries over inventing new component shapes.\n\n\`\`\`text\n${designSystemComponentsManifest.trim()}\n\`\`\``,
);
} else if (designSystemFixtureHtml && designSystemFixtureHtml.trim().length > 0) {
parts.push(
`\n\n## Reference fixture${designSystemTitle ? `${designSystemTitle}` : ''}\n\nA self-contained worked artifact in this design system. Match its component shapes (button structure, card structure, type-scale rhythm, focus ring, spacing cadence) when generating new artifacts. Copying fragments is encouraged as long as you keep the \`var(--*)\` references intact — they are already wired to the tokens above.\n\n\`\`\`html\n${designSystemFixtureHtml.trim()}\n\`\`\``,
);
}
if (designSystemPullIndex && designSystemPullIndex.trim().length > 0) {
parts.push(
`\n\n## Pull-layer files available on demand${designSystemTitle ? `${designSystemTitle}` : ''}\n\nThis design-system package declares richer files for inspection, source evidence, or human preview. Keep the push prompt light: use the index below to decide what to read later. When the runtime tool environment is available, read a listed path with \`\"$OD_NODE_BIN\" \"$OD_BIN\" tools design-systems read --path <path>\`; the daemon will reject paths outside this manifest allowlist.\n\n\`\`\`text\n${designSystemPullIndex.trim()}\n\`\`\``,
);
}
if (craftBody && craftBody.trim().length > 0) {
const sectionLabel =
Array.isArray(craftSections) && craftSections.length > 0
? `${craftSections.join(', ')}`
: '';
parts.push(
`\n\n## Active craft references${sectionLabel}\n\nThe following craft rules are universal — they apply on top of the active design system above, regardless of brand. The DESIGN.md decides *which* tokens to use; craft rules decide *how* to use them. On any conflict between a craft rule and a brand DESIGN.md, the brand wins for token values; craft rules still apply to anything the brand does not override (letter-spacing, accent overuse caps, anti-slop patterns).\n\n${craftBody.trim()}`,
);
}
if (skillBody && skillBody.trim().length > 0) {
const preflight = derivePreflight(skillBody);
parts.push(
`\n\n## Active skill${skillName ? `${skillName}` : ''}\n\nFollow this skill's workflow exactly.${preflight}\n\n${skillBody.trim()}`,
);
}
if (pluginBlock && pluginBlock.trim().length > 0) {
parts.push(pluginBlock);
}
// Plan §3.L2 / spec §23.4 — splice per-stage atom blocks immediately
// after the active plugin block. Empty entries are skipped so a
// pipeline whose stages don't resolve any bundled atom bodies
// produces zero extra prompt mass. The active-skill body above
// remains the precedence carrier; these blocks add the stage-by-
// stage atom guidance that spec §23.3.2 calls out.
if (Array.isArray(activeStageBlocks) && activeStageBlocks.length > 0) {
for (const block of activeStageBlocks) {
if (typeof block === 'string' && block.trim().length > 0) {
parts.push(block);
}
}
}
const metaBlock = renderMetadataBlock(
metadata,
template,
audioVoiceOptions,
audioVoiceOptionsError,
mediaExecution,
);
if (metaBlock) parts.push(metaBlock);
// Decks have a load-bearing framework (nav, counter, scroll JS, print
// stylesheet for PDF stitching). Pin it last so it overrides any softer
// wording earlier in the stack ("write a script that handles arrows…").
//
// We fire on either (a) the active skill is a deck skill OR (b) the
// project metadata declares kind=deck. Case (b) catches projects created
// without a skill (skill_id null) — without this, a deck-kind project
// with no bound skill gets neither a skill seed nor the framework
// skeleton, and the agent writes scaling / nav / print logic from scratch
// with the same buggy `place-items: center` + transform pattern we keep
// having to fix at runtime. Skill seeds (when present) win — they
// already define their own opinionated framework (simple-deck's
// scroll-snap, guizang-ppt's magazine layout) and re-pinning the generic
// skeleton would conflict. The skill-seed path takes over via
// `derivePreflight` above, so we only fire the generic skeleton when no
// skill seed is on offer.
const isDeckProject = resolvedExclusiveSurface === 'deck';
const isFreeformProject = activeSkillModes.size === 0 && (!metadata || metadata.kind === 'other');
const hasSkillSeed =
!!skillBody && /assets\/template\.html/.test(skillBody);
if (isDeckProject && !hasSkillSeed) {
parts.push(`\n\n---\n\n${DECK_FRAMEWORK_DIRECTIVE}`);
} else if (isFreeformProject && !hasSkillSeed) {
// Freeform / kind=other projects skip the kind picker entirely and
// land here. If the user's brief is a deck/keynote/slides ("讲解",
// "presentation", "make a deck"), the agent used to invent its own
// scale-to-fit + slide visibility + nav script from scratch and
// shipped subtle CSS specificity bugs (per-slide layout classes
// overriding `.slide { display:none }`). Inject the same framework
// here, prefixed with a one-line conditional so the agent only
// adopts it when the brief actually is a deck — otherwise the
// directive is read as background reference and ignored.
parts.push(
`\n\n---\n\n## If this brief is a slide deck / keynote / presentation\n\nThe user did not pre-select a "Slide deck" surface, but their request may still call for one. **If — and only if — the brief reads as slides, keynote, presentation, deck, PPT, or 讲解, follow the framework below.** Otherwise ignore everything in this section and continue with the freeform output you would have written anyway.\n\n${DECK_FRAMEWORK_DIRECTIVE}`,
);
}
const isMediaSurface =
resolvedExclusiveSurface === 'image'
|| resolvedExclusiveSurface === 'video'
|| resolvedExclusiveSurface === 'audio';
if (isMediaSurface) {
parts.push(renderMediaGenerationContract(mediaExecution));
} else {
// Non-media projects (prototype, deck, etc.): inject a lightweight hint
// so the agent uses `od media generate` if the user asks for an image/video
// mid-session, rather than hunting for provider API keys in the environment.
parts.push(MEDIA_DISPATCH_HINT);
}
if (includeCodexImagegenOverride && shouldAllowCodexImagegenOverride(metadata, mediaExecution)) {
const codexImagegenOverride = renderCodexImagegenOverride(
agentId,
metadata,
);
if (codexImagegenOverride) {
parts.push(codexImagegenOverride);
}
}
// Critique Theater addendum. When cfg.enabled is true the panel protocol
// is pinned last so it overrides any softer critique wording earlier in the
// stack. When disabled (the default) this block is a no-op so no consumer
// needs to opt in.
//
// The panel block requires <ARTIFACT mime="text/html"> inside <CRITIQUE_RUN>,
// which conflicts with MEDIA_GENERATION_CONTRACT (image/video/audio surfaces
// explicitly forbid HTML output). Skip the addendum on media surfaces so
// the critique flag is a no-op there until a media-aware panel template
// lands.
const cfg = critique ?? defaultCritiqueConfig();
if (cfg.enabled && critiqueBrand && critiqueSkill && !isMediaSurface) {
parts.push('\n\n' + renderPanelPrompt({ cfg, brand: critiqueBrand, skill: critiqueSkill }));
}
if (activeDesignSystemBody && activeDesignSystemBody.length > 0) {
parts.push(ACTIVE_DESIGN_SYSTEM_VISUAL_DIRECTION_OVERRIDE);
}
const mcpDirective = renderConnectedExternalMcpDirective(connectedExternalMcp);
if (mcpDirective) parts.push(mcpDirective);
// Claude only: nudge the model toward the `AskUserQuestion` tool for
// mid-conversation clarifications. Without this hint Claude tends to fall
// back to a markdown bulleted list of options, which the chat UI cannot
// turn into clickable buttons. Discovery (turn 1) is still owned by the
// `<question-form>` flow defined in DISCOVERY_AND_PHILOSOPHY; this only
// covers follow-ups where the next action depends on a small set of
// choices the user can pick quickly.
if (agentId === 'claude') {
parts.push(
"\n\n---\n\n## Clarifying questions\n\nWhen you need a mid-conversation clarification AND the natural answer is one of a small finite set of choices (2-4 options per question), call the `AskUserQuestion` tool instead of writing a bulleted list in markdown. The host chat renders the tool call as inline choice buttons; a markdown list renders as plain text and forces the user to type a reply. Skip the tool when the answer is naturally free-form text, when the answer needs more than ~4 options, or when you only have one yes/no choice to ask. First-turn discovery still uses the `<question-form id=\"discovery\">` workflow described earlier; `AskUserQuestion` is for follow-ups only.\n\n**When you call `AskUserQuestion`, that tool call is the entire response.** Do NOT also write the same questions or options as markdown text alongside it, do NOT add a trailing prose paragraph like \"what sounds right?\", do NOT hedge by listing the options twice. Emit the tool call and stop generating tokens. The host is waiting on the tool's `tool_result` and will resume your turn the moment the user answers. Anything you write before, between, or after the tool call in the same message just duplicates what the card already shows and confuses the user.",
);
}
// Pinned LAST so recency bias reinforces the role-marker prohibition.
// This is the canonical anti-roleplay instruction;
parts.push(
"\n\n---\n\n## CRITICAL: Never fabricate conversation turns\n\n" +
"The text you emit is processed by a chat host that interprets lines " +
"starting with \`## user\`, \`## assistant\`, or \`## system\` as real " +
"turn boundaries. Emitting these lines causes the host to treat your " +
"fabricated text as a real user request and execute unauthorised actions.\n\n" +
"**FORBIDDEN — you MUST NOT:**\n" +
"- Emit any line starting with \`## user\`, \`## assist\`, \`## assistant\`, or \`## system\`\n" +
"- Roleplay multiple turns inside a single response\n" +
"- Invent a user message and then reply to it\n\n" +
"The host will truncate your response at the first role-marker line — " +
"any text after it is lost. If you feel the urge to simulate a dialogue, " +
"stop and ask the user a real question instead.",
);
return parts.join('');
}
/**
* Top-anchored override for plain-stream daemon agents (#313). Mirrors
* the contracts-package copy byte-for-byte; see that file for the full
* rationale. Pinning it at the absolute top of the composed prompt is
* what beats the discovery layer's own "these override anything later"
* header — the old bottom-appended `## API mode rule` lost that
* precedence war and let `<todo-list>` / `[读取 X]` pseudo-tool markup
* leak into the chat.
*/
const API_MODE_OVERRIDE = `# API mode — no tools available (read first — overrides every rule below)
You are running through a plain Messages API. **No tools are wired through to you.** \`TodoWrite\`, \`Read\`, \`Write\`, \`Edit\`, \`Bash\`, and \`WebFetch\` are unavailable — calls to them will not execute and will not render in the UI.
Every later instruction in this prompt that tells you to "call TodoWrite", "run Bash", "read via Read", or otherwise invoke a tool is describing the daemon-mode workflow. In this API run those instructions are **overridden** — do not attempt them and do not pretend you did.
**Forbidden output:**
- Pseudo-tool markup such as \`<todo-list>...</todo-list>\`, \`<tool-call>\`, or invented XML wrappers around a plan.
- Fake-protocol prose such as \`[读取 template.html ...]\`, \`[读取 layouts.md ...]\`, \`[正在调用 TodoWrite ...]\`, or any \`[doing X]\` placeholder narrating a tool you cannot run.
- Statements like "I'll call TodoWrite to track this" or "let me read the skill file first" — there is no TodoWrite and no Read in this run.
**Allowed output:**
- Plain chat prose to the user (in their language). State your plan as prose — a short numbered list in markdown is fine; it just must not be wrapped in \`<todo-list>\` or claim to be a tool call.
- A final \`<artifact type="text/html">...</artifact>\` block containing a complete \`<!doctype html>\` document when the brief is ready to deliver.
- \`<question-form>\` blocks for discovery on turn 1, exactly as the rules below describe — question-form is markup the UI parses, not a tool call.
If the rules below tell you to plan with TodoWrite, write the plan as prose instead. If they tell you to read skill side files before writing, describe in one sentence which patterns/conventions you're going to apply and proceed. If they tell you to run brand-spec extraction via Bash + Read + WebFetch, ask the user the missing brand questions in the discovery form instead.`;
// Defense-in-depth against Claude Code's synthetic OAuth tools.
//
// When Claude Code's built-in HTTP MCP transport gets a 401 on its first
// initialize (transient propagation lag, edge cache miss, header
// re-canonicalization quirk, etc.), it injects two synthetic tools per
// server — `mcp__<server>__authenticate` and
// `mcp__<server>__complete_authentication` — that drive a per-process
// OAuth dance with a `localhost:<random>/callback` redirect_uri. That
// listener dies with the agent process, so the round-trip never
// completes, and meanwhile the model burns a turn pasting an
// unreachable URL into the chat. By the time the user is back, our
// daemon-issued Bearer is already in `.mcp.json` and the real tools
// (`generate_image`, `models_explore`, …) are reachable on the next
// turn — but the model doesn't know that and keeps escalating the
// fake auth flow.
//
// The fix is to tell the model up front: these specific servers are
// already authenticated by the daemon, do NOT call any
// `*_authenticate` / `*_complete_authentication` tool for them. If
// the real tools really are missing, surface that as a separate
// failure instead of pivoting to the synthetic flow.
function renderConnectedExternalMcpDirective(
connectedExternalMcp:
| ReadonlyArray<{ id: string; label?: string | undefined }>
| undefined,
): string {
if (!connectedExternalMcp || connectedExternalMcp.length === 0) return '';
const lines = connectedExternalMcp
.map((s) => {
const id = typeof s?.id === 'string' ? s.id.trim() : '';
if (!id) return null;
const label = typeof s?.label === 'string' && s.label.trim() ? s.label.trim() : id;
return `- \`${id}\`${label !== id ? ` (${label})` : ''}`;
})
.filter((line): line is string => typeof line === 'string');
if (lines.length === 0) return '';
return [
'\n\n---\n\n',
'## External MCP servers — already authenticated\n\n',
'The following external MCP servers are already authenticated for this run via an OAuth Bearer token the daemon injected into `.mcp.json`. You can call their real tools directly:\n\n',
lines.join('\n'),
'\n\n',
'**Do NOT call any tool whose name matches `mcp__<server>__authenticate` or `mcp__<server>__complete_authentication` for the servers above.** Those are synthetic fallback tools Claude Code exposes when its first HTTP connect briefly flipped the server into a needs-auth state. The flow they drive (a `localhost:<random>/callback` redirect) cannot complete in this environment, and the real tools (e.g. `generate_image`, `models_explore`, `balance`, …) are already reachable.\n\n',
'If a real tool actually fails with an auth-related error, report the exact tool name and error text and stop — the user will reconnect the server in Settings → External MCP. Do not retry by invoking any `*_authenticate` tool.\n',
].join('');
}
const CODEX_IMAGEGEN_MODEL_IDS = new Set(
IMAGE_MODELS.filter(
(model) =>
model?.provider === 'openai' &&
typeof model?.id === 'string' &&
model.id.startsWith('gpt-image-'),
).map((model) => model.id),
);
export function resolveCodexImagegenModelId(
metadata: ProjectMetadata | undefined,
): string {
const imageModel =
typeof metadata?.imageModel === 'string' ? metadata.imageModel.trim() : '';
return CODEX_IMAGEGEN_MODEL_IDS.has(imageModel) ? imageModel : '';
}
export function shouldRenderCodexImagegenOverride(
agentId: string | null | undefined,
metadata: ProjectMetadata | undefined,
): boolean {
const normalizedAgentId =
typeof agentId === 'string' ? agentId.trim().toLowerCase() : '';
return (
normalizedAgentId === 'codex' &&
metadata?.kind === 'image' &&
resolveCodexImagegenModelId(metadata).length > 0
);
}
function shouldAllowCodexImagegenOverride(
metadata: ProjectMetadata | undefined,
mediaExecution: MediaExecutionPolicy | undefined,
): boolean {
const mode = mediaExecution?.mode ?? 'enabled';
if (mode !== 'enabled') return false;
if (
Array.isArray(mediaExecution?.allowedSurfaces) &&
mediaExecution.allowedSurfaces.length > 0 &&
!mediaExecution.allowedSurfaces.includes('image')
) {
return false;
}
const model = resolveCodexImagegenModelId(metadata);
if (
model &&
Array.isArray(mediaExecution?.allowedModels) &&
mediaExecution.allowedModels.length > 0 &&
!mediaExecution.allowedModels.includes(model)
) {
return false;
}
return true;
}
export function renderCodexImagegenOverride(
agentId: string | null | undefined,
metadata: ProjectMetadata | undefined,
): string {
if (!shouldRenderCodexImagegenOverride(agentId, metadata)) {
return '';
}
const imageModel = resolveCodexImagegenModelId(metadata);
return `
---
## Codex built-in imagegen override (load-bearing — Codex only)
The active agent is Codex and this image project selected \`${imageModel}\`.
For this specific case, use Codex's built-in image generation capability
instead of \`"$OD_NODE_BIN" "$OD_BIN" media generate\` for the first generation
attempt. This is an intentional exception to the media generation contract and
the active image skill's dispatcher wording.
Do not require, request, or mention \`OPENAI_API_KEY\` before trying the
built-in path. Reuse the project metadata, reference prompt template, aspect
ratio, style notes, and the user's current brief to form the final image
prompt. Generate the image with Codex built-in imagegen, then use the actual
output path returned by the built-in imagegen result as the source file first.
Only if the built-in result does not return a usable path should you search
\`\${CODEX_HOME:-$HOME/.codex}/generated_images/.../ig_*.png\` as a fallback
source. Never leave a project-referenced asset only under \`$CODEX_HOME\`.
Copy or move the selected generated file into \`$OD_PROJECT_DIR\` with a short
descriptive filename, then verify the exact destination file exists under
\`$OD_PROJECT_DIR\` before claiming success. If reading the source path,
creating the destination directory, copying/moving, or verifying the copied
asset fails, report the exact source path, destination path, and access/copy
error. Do not claim success, silently fall back, or ask about OpenAI/Azure
fallback after a generated image exists but the project copy fails; stop after
reporting the failure unless the user explicitly chooses fallback in a later
turn, because fallback may create a different image.
After the file exists under \`$OD_PROJECT_DIR\`, reply with the project-local
filename and a short summary of the prompt used. Do not emit an \`<artifact>\`
block for media.
If Codex built-in imagegen is unavailable or generation fails before producing
an image, surface the actual failure message and ask the user for one-time
confirmation before falling back to the existing OpenAI/Azure API-key provider
path via \`"$OD_NODE_BIN" "$OD_BIN" media generate --surface image --model ${imageModel}\`.
Do not silently fall back.`;
}
function renderMetadataBlock(
metadata: ProjectMetadata | undefined,
template: ProjectTemplate | undefined,
audioVoiceOptions: AudioVoiceOption[] | undefined,
audioVoiceOptionsError: string | undefined,
mediaExecution: MediaExecutionPolicy | undefined,
): string {
if (!metadata) return '';
const lines: string[] = [];
lines.push('\n\n## Project metadata');
lines.push(
'These are the structured choices the user made (or skipped) when creating this project. Treat known fields as authoritative; for any field marked "(unknown — ask)" you MUST include a matching question in your turn-1 discovery form.',
);
lines.push('');
lines.push(`- **kind**: ${metadata.kind}`);
if (metadata.platform) {
lines.push(`- **platform**: ${metadata.platform}`);
} else if (metadata.kind === 'prototype' || metadata.kind === 'template' || metadata.kind === 'other') {
lines.push('- **platform**: (unknown — ask: responsive web, desktop web, iOS app, Android app, tablet app, or desktop app?)');
}
if (Array.isArray(metadata.platformTargets) && metadata.platformTargets.length > 0) {
lines.push(`- **platformTargets**: ${metadata.platformTargets.join(', ')}`);
}
if (metadata.platform === 'responsive' || metadata.platformTargets?.includes('responsive')) {
lines.push(
'- **responsive web contract**: `responsive` means one web product experience that adapts across modern browser/device ranges, not only legacy desktop/tablet/mobile buckets. It is not an iOS app, Android app, or native tablet app target. Show responsive behavior through real product layout changes; do not render viewport labels as user-facing product content. Cover 20252026 breakpoints: mobile compact 360px, mobile standard 390430px, foldable/small tablet 600744px, tablet portrait 768834px, tablet landscape/large tablet 10241180px, laptop 12801366px, desktop 14401536px, and wide 1920px. Use fluid `clamp()` scales, container queries where useful, and explicit layout changes at semantic thresholds. Verify no horizontal scroll at 360px, 390px, 430px, 768px, 820px, 1024px, 1366px, 1440px, and 1920px unless the brief explicitly asks for a pan/board canvas.',
);
}
if ((metadata.platformTargets?.length ?? 0) > 1) {
lines.push(
'- **cross-platform deliverable rule**: each selected target keeps the same product goal but MUST be delivered as its own product screen/file when more than one concrete target is selected. Use clear files such as `landing.html` (if enabled), `mobile-ios.html`, `mobile-android.html`, `tablet.html`, `desktop.html`, plus shared `css/` and `js/` when useful. `index.html` may be a launcher/overview that links to these files, but it must not be the only place where mobile/tablet/desktop designs live. Do not collapse cross-platform work into a single tabbed demo, selector UI, comparison board, platform map, or labelled documentation section inside one mock product page.',
);
}
if (metadata.kind === 'prototype' || metadata.kind === 'template' || metadata.kind === 'other') {
lines.push(
'- **screen-file-first rule**: each distinct user-facing screen or surface MUST be delivered as its own HTML file unless the user explicitly asks for a single-page scroll or single-file artifact. Do not combine landing pages, product app screens, dashboards, history, pricing, settings, mobile app, tablet app, desktop app, or OS widget surfaces into one long page. Use `index.html` as a launcher/overview that links to screen files when more than one screen exists; it may summarize the product and show screen cards, but it must not contain the full design for every screen.',
);
lines.push(
'- **product-realism rule**: final artifacts must look like real end-user product UI. Do not render project metadata, screen counts, target counts, state counts, "demo only" labels, "settings" panels for choosing platforms, "full design target" badges, viewport/device selector controls, theme/style knobs, platform output maps, behavior-spec sections, or design-process cards inside the product unless the user explicitly asks for a design spec/dashboard. Any navigation/tabs inside the artifact must be real product navigation, not designer controls for switching generated mockups.',
);
lines.push(
'- **visual-system rule**: when the user does not specify colors, layout, or visual direction, you must still make an intentional product-appropriate visual system. Infer a palette from the product category and audience with at least: neutral surface tokens, a primary action color, a secondary/domain accent, and status colors. Avoid plain monochrome/unstyled greyscale outputs. Use tasteful gradients, illustrations, iconography, device/product mockups, and colored state moments where they clarify the product, while still avoiding generic beige/peach/pink/brown AI washes.',
);
lines.push(
'- **app-specific modules rule**: include domain-specific in-app modules/components by default (cards, panels, controls, charts, lists, quick actions, status modules, mini players, checkout/cart summaries, etc. as appropriate). These are product UI modules, not OS home-screen widgets. Give each major module a clear purpose, states, and responsive behavior instead of generic card grids.',
);
lines.push(
'- **CJX-ready UX rule**: the artifact must be implementation-ready, not a static screenshot. Structure CSS tokens/components/responsive sections clearly; include real JavaScript behavior for meaningful UX such as tabs, dialogs, drawers, filters, generation/copy actions, validation, playback controls, or state transitions. If keeping a self-contained `index.html`, put the CSS/JS in clearly labelled blocks; for complex UX, generate `css/` and `js/` files when useful.',
);
lines.push(
'- **interaction-fidelity rule**: when the requested screen includes user input, generation, copying, validation, login, checkout, filtering, or any action verb, build real interactive controls for that screen. Do not substitute static text rows, prefilled-only mockups, screenshot-like device frames, or decorative state cards for editable inputs and working actions.',
);
lines.push(
'- **artifact-output rule**: when you generate an HTML artifact, keep conversational prose concise and product-facing. Do not dump the full raw HTML source back into chat; the artifact/file is the source of truth and the assistant message should only summarize the result.',
);
}
if (metadata.includeLandingPage) {
lines.push(
'- **includeLandingPage**: true — create `landing.html` as a separate responsive marketing companion surface in addition to the selected product/app screens. Do not implement the landing page only as a section inside `index.html`, even for responsive-web-only projects. If there is a working product/app screen, create it as a separate file such as `app.html`, `dashboard.html`, or a domain-specific screen name. `index.html` should be a lightweight launcher/overview when multiple files exist. Include hero, value props, product screenshots/device mockups, proof/features, and an appropriate CTA such as waitlist, download, or contact sales.',
);
}
if (metadata.includeOsWidgets) {
lines.push(
'- **includeOsWidgets**: true — add platform-native OS home-screen / lock-screen / quick-access widget surfaces where relevant. These are outside-the-app widgets (for example iOS WidgetKit, Android home screen widget, Live Activity/lock screen, tablet glance panel), not in-app cards. Include realistic widget sizes and direct quick actions for the domain.',
);
}
if (metadata.intent === 'live-artifact') {
lines.push(
'- **intent**: live-artifact — the user chose New live artifact. The first output should be a live artifact/dashboard/report, not a one-off static mockup. Prefer the `live-artifact` skill workflow when available, keep source data compact, and register through the daemon live-artifact tool path once that wrapper/tooling is available.',
);
lines.push(
'- **connector-source rule**: if the user names a connector/source (for example Notion) and daemon connector tools are available, list connectors before asking where the data comes from. When the named connector is `connected`, use its read-only tools and ask follow-up questions only for missing topic/page/database details, multiple equally plausible matches, or an unconnected/missing connector.',
);
}
if (metadata.kind === 'prototype') {
lines.push(
`- **fidelity**: ${metadata.fidelity ?? '(unknown — ask: wireframe vs high-fidelity)'}`,
);
}
if (metadata.kind === 'deck') {
lines.push(
`- **slideCount**: ${metadata.slideCount ?? '(unknown — ask only if the Active plugin / Plugin inputs block does not already include slideCount)'}`,
);
lines.push(
`- **speakerNotes**: ${typeof metadata.speakerNotes === 'boolean' ? metadata.speakerNotes : '(unknown — ask: include speaker notes?)'}`,
);
}
if (metadata.kind === 'template') {
lines.push(
`- **animations**: ${typeof metadata.animations === 'boolean' ? metadata.animations : '(unknown — ask: include motion/animations?)'}`,
);
if (metadata.templateLabel) {
lines.push(`- **template**: ${metadata.templateLabel}`);
}
}
if (metadata.kind === 'image') {
lines.push(
`- **imageModel**: ${metadata.imageModel ?? 'gpt-image-2 (default — override if the user asks for a specific model or provider)'}`,
);
lines.push(
`- **aspectRatio**: ${metadata.imageAspect ?? '1:1 (default — use 16:9 for landscape/outdoor scenes, 9:16 for portrait/vertical)'}`,
);
if (metadata.imageStyle) {
lines.push(`- **styleNotes**: ${metadata.imageStyle}`);
}
if (
metadata.promptTemplate?.title &&
typeof metadata.promptTemplate.prompt === 'string' &&
metadata.promptTemplate.prompt.trim().length > 0
) {
lines.push(`- **referenceTemplate**: ${metadata.promptTemplate.title}`);
}
lines.push('');
lines.push(renderMediaMetadataAction(
'image',
'`"$OD_NODE_BIN" "$OD_BIN" media generate --surface image --model <imageModel>`',
mediaExecution,
));
}
if (metadata.kind === 'video') {
lines.push(
`- **videoModel**: ${metadata.videoModel ?? '(unknown — ask: which video model to use)'}`,
);
lines.push(
`- **lengthSeconds**: ${typeof metadata.videoLength === 'number' ? metadata.videoLength : '(unknown — ask: 3s / 5s / 10s)'}`,
);
lines.push(
`- **aspectRatio**: ${metadata.videoAspect ?? '(unknown — ask: 16:9, 9:16, 1:1)'}`,
);
if (
metadata.promptTemplate?.title &&
typeof metadata.promptTemplate.prompt === 'string' &&
metadata.promptTemplate.prompt.trim().length > 0
) {
lines.push(`- **referenceTemplate**: ${metadata.promptTemplate.title}`);
}
lines.push('');
lines.push(renderMediaMetadataAction(
'video',
'`"$OD_NODE_BIN" "$OD_BIN" media generate --surface video --model <videoModel> --length <seconds> --aspect <ratio>`',
mediaExecution,
));
if (metadata.videoModel === 'hyperframes-html') {
lines.push(
'Special case: `hyperframes-html` is a local HTML-to-MP4 renderer, not a photoreal text-to-video model. Treat it like a motion design renderer, ask at most one clarifying question, then dispatch immediately.',
);
}
}
if (metadata.kind === 'audio') {
lines.push(
`- **audioKind**: ${metadata.audioKind ?? '(unknown — ask: music / speech / sfx)'}`,
);
lines.push(
`- **audioModel**: ${metadata.audioModel ?? '(unknown — ask: which audio model to use)'}`,
);
lines.push(
`- **durationSeconds**: ${typeof metadata.audioDuration === 'number' ? metadata.audioDuration : '(unknown — ask: target duration)'}`,
);
if (metadata.voice) {
lines.push(`- **voice**: ${metadata.voice}`);
} else if (metadata.audioKind === 'speech') {
lines.push('- **voice**: (unknown — ask: voice id / accent / pacing)');
}
const voiceOptions = shouldRenderElevenLabsVoiceOptions(metadata, audioVoiceOptions)
? audioVoiceOptions ?? []
: [];
if (voiceOptions.length > 0) {
lines.push(
'- **ElevenLabs voice options**: Ask the user to choose from a dropdown select. The visible labels are voice descriptions; the selected value must be the exact `voice_id` passed to `--voice`. Do not ask the user to type an id.',
);
if (voiceOptions.length > ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT) {
lines.push(`- **ElevenLabs voice options**: showing the first ${ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT} of ${voiceOptions.length} available voices.`);
}
lines.push('');
lines.push('<question-form id="elevenlabs-voice" title="Choose an ElevenLabs voice">');
lines.push(JSON.stringify(renderElevenLabsVoiceQuestionForm(voiceOptions), null, 2));
lines.push('</question-form>');
} else {
const audioVoiceOptionsPromptError = formatElevenLabsVoiceOptionsErrorForPrompt(audioVoiceOptionsError);
if (audioVoiceOptionsPromptError) {
lines.push(
`- **ElevenLabs voice options**: ${audioVoiceOptionsPromptError}`,
);
}
}
if (metadata.audioKind === 'sfx') {
lines.push(
'- **SFX discovery**: Ask about the sound source/action, materials, intensity, acoustic space, timing/tail, loop/non-loop, and "avoid" constraints. Do not ask for language or voice for SFX.',
);
}
lines.push('');
lines.push(renderMediaMetadataAction(
'audio',
'`"$OD_NODE_BIN" "$OD_BIN" media generate --surface audio --audio-kind <kind> --model <audioModel> --duration <seconds>` and add `--voice <voice-id>` for speech when you have a provider-specific voice id',
mediaExecution,
));
}
if (metadata.inspirationDesignSystemIds && metadata.inspirationDesignSystemIds.length > 0) {
lines.push(
`- **inspirationDesignSystemIds**: ${metadata.inspirationDesignSystemIds.join(', ')} — the user picked these systems as *additional* inspiration alongside the primary one. Borrow palette accents, typographic personality, or component patterns from them; don't replace the primary system's tokens.`,
);
}
if (Array.isArray(metadata.contextPlugins) && metadata.contextPlugins.length > 0) {
lines.push('');
lines.push('### @ plugin context');
lines.push(
'The user selected these plugins as additive context via @ mentions. Treat them as requested references to combine with the brief; only the explicit active plugin block, if present, is the executable/pinned plugin snapshot.',
);
for (const plugin of metadata.contextPlugins) {
const id = typeof plugin.id === 'string' ? plugin.id : '';
const title = typeof plugin.title === 'string' && plugin.title.trim().length > 0
? plugin.title.trim()
: id;
if (!id && !title) continue;
const description = typeof plugin.description === 'string' && plugin.description.trim().length > 0
? `${plugin.description.trim()}`
: '';
lines.push(`- ${title}${id ? ` (\`${id}\`)` : ''}${description}`);
}
}
if (Array.isArray(metadata.contextMcpServers) && metadata.contextMcpServers.length > 0) {
lines.push('');
lines.push('### @ MCP context');
lines.push(
'The user selected these MCP servers as context. Prefer their tools when mounted and relevant before asking where data should come from.',
);
for (const server of metadata.contextMcpServers) {
const id = typeof server.id === 'string' ? server.id : '';
const label = typeof server.label === 'string' && server.label.trim().length > 0
? server.label.trim()
: id;
if (!id && !label) continue;
const transport = typeof server.transport === 'string' && server.transport.trim().length > 0
? `${server.transport.trim()}`
: '';
lines.push(`- ${label}${id ? ` (\`${id}\`)` : ''}${transport}`);
}
}
if (Array.isArray(metadata.contextConnectors) && metadata.contextConnectors.length > 0) {
lines.push('');
lines.push('### @ connector context');
lines.push(
'The user selected these connectors as context. Use daemon connector tools through the OD CLI wrapper when data from these sources is needed; do not ask the user to identify a source that is already selected.',
);
for (const connector of metadata.contextConnectors) {
const id = typeof connector.id === 'string' ? connector.id : '';
const name = typeof connector.name === 'string' && connector.name.trim().length > 0
? connector.name.trim()
: id;
if (!id && !name) continue;
const meta = [connector.provider, connector.status, connector.accountLabel]
.filter((value): value is string => typeof value === 'string' && value.trim().length > 0)
.join(' · ');
lines.push(`- ${name}${id ? ` (\`${id}\`)` : ''}${meta ? `${meta}` : ''}`);
}
}
// Curated prompt template reference for image/video projects. Inlined
// verbatim (with light truncation) so the agent can borrow structure,
// mood and phrasing without a separate fetch. The user may have edited
// the body before clicking Create — those edits land here and are now
// authoritative for the brief.
if (
(metadata.kind === 'image' || metadata.kind === 'video') &&
metadata.promptTemplate &&
typeof metadata.promptTemplate.prompt === 'string' &&
metadata.promptTemplate.prompt.trim().length > 0
) {
const tpl = metadata.promptTemplate;
lines.push('');
lines.push(`### Reference prompt template — "${tpl.title ?? 'untitled'}"`);
const meta = [];
if (tpl.category) meta.push(`category: ${tpl.category}`);
if (tpl.model) meta.push(`suggested model: ${tpl.model}`);
if (tpl.aspect) meta.push(`aspect: ${tpl.aspect}`);
if (Array.isArray(tpl.tags) && tpl.tags.length > 0) {
meta.push(`tags: ${tpl.tags.join(', ')}`);
}
if (meta.length > 0) lines.push(meta.join(' · '));
if (tpl.summary) {
lines.push('');
lines.push(tpl.summary);
}
lines.push('');
lines.push(
'The user picked this template as inspiration. Treat it as a structural and stylistic reference: borrow composition, palette cues, lighting language, lens/motion direction, and the level of detail. Adapt the wording to the user\'s actual subject and brief — do NOT generate the template subject verbatim. If a field above is unknown the user wants you to follow the template\'s defaults.',
);
// Escape triple-backticks so a user who pastes ``` into the editable
// template body can't break out of the markdown fence below and inject
// free-form instructions into the agent's system prompt.
const safe = (tpl.prompt ?? '').replace(/```/g, '`\u200b`\u200b`');
const truncated =
safe.length > 4000
? `${safe.slice(0, 4000)}\n… (truncated ${safe.length - 4000} chars)`
: safe;
lines.push('');
lines.push('```text');
lines.push(truncated);
lines.push('```');
if (tpl.source) {
const author = tpl.source.author ? ` by ${tpl.source.author}` : '';
lines.push('');
lines.push(
`Source: ${tpl.source.repo}${author} — license ${tpl.source.license ?? 'unspecified'}. Preserve attribution if you echo the template language directly.`,
);
}
}
if (metadata.kind === 'template' && template && template.files.length > 0) {
lines.push('');
lines.push(
`### Template reference — "${template.name}"${template.description ? ` (${template.description})` : ''}`,
);
lines.push(
'These HTML snapshots are what the user wants to start FROM. Read them as a stylistic + structural reference. You may copy structure, palette, typography, and component patterns; you may adapt them to the new brief; do NOT ship them verbatim. The agent should still produce its own artifact, just one that visibly inherits this template\'s design language.',
);
for (const f of template.files) {
// Cap each file at ~12k chars so a giant template doesn't blow out
// the system prompt budget. The agent gets enough to read structure.
const truncated =
f.content.length > 12000
? `${f.content.slice(0, 12000)}\n<!-- … truncated (${f.content.length - 12000} chars omitted) -->`
: f.content;
lines.push('');
lines.push(`#### \`${f.name}\``);
lines.push('```html');
lines.push(truncated);
lines.push('```');
}
}
return lines.join('\n');
}
function renderMediaMetadataAction(
surface: MediaSurface,
command: string,
mediaExecution: MediaExecutionPolicy | undefined,
): string {
const article = surface === 'audio' ? 'an' : 'a';
const mode = mediaExecution?.mode ?? 'enabled';
if (mode === 'disabled') {
return `This is ${article} **${surface}** project, but Open Design-owned media execution is disabled for this run. Plan the creative brief only unless an external MCP media tool is explicitly configured. Do NOT call OD media generation tools and do NOT emit \`<artifact>\` HTML for media surfaces.`;
}
return `This is ${article} **${surface}** project. Plan the creative brief carefully, then dispatch via the **media generation contract** using ${command}. Do NOT emit \`<artifact>\` HTML for media surfaces.`;
}
function shouldRenderElevenLabsVoiceOptions(
metadata: ProjectMetadata,
audioVoiceOptions: AudioVoiceOption[] | undefined,
): boolean {
return metadata.kind === 'audio'
&& metadata.audioKind === 'speech'
&& metadata.audioModel === 'elevenlabs-v3'
&& !metadata.voice
&& Array.isArray(audioVoiceOptions)
&& audioVoiceOptions.length > 0;
}
function renderElevenLabsVoiceQuestionForm(voiceOptions: AudioVoiceOption[]): {
description: string;
questions: Array<{
id: string;
label: string;
type: 'select';
required: boolean;
placeholder: string;
help: string;
options: Array<{ label: string; value: string }>;
}>;
submitLabel: string;
} {
const options = voiceOptions.slice(0, ELEVENLABS_VOICE_PROMPT_OPTION_LIMIT).map((option) => ({
label: formatElevenLabsVoiceLabel(option),
value: option.voiceId,
}));
return {
description:
'Pick a voice by description. The selected answer will be the exact voice_id passed to the renderer.',
questions: [
{
id: 'voice',
label: 'Voice',
type: 'select',
required: true,
placeholder: 'Choose a voice',
help: 'Select a voice description; the answer submits the matching Voice ID.',
options,
},
],
submitLabel: 'Use voice',
};
}
function formatElevenLabsVoiceLabel(option: AudioVoiceOption): string {
const labels = option.labels && typeof option.labels === 'object'
? Object.values(option.labels)
.map((value) => (typeof value === 'string' ? value.trim() : ''))
.filter(Boolean)
: [];
const bits = [...labels];
if (bits.length > 0) return `${option.name}${bits.join(' · ')}`;
const category = typeof option.category === 'string' ? option.category.trim() : '';
return category ? `${option.name}${category}` : option.name;
}
/**
* Detect the seed/references pattern shipped by the upgraded
* web-prototype / mobile-app / simple-deck / guizang-ppt skills, and
* inject a hard pre-flight rule that lists which side files to Read
* before doing anything else. The skill body's own workflow already says
* this — but skills get truncated under context pressure and the agent
* sometimes skips Step 0. A short up-front directive helps.
*
* Returns an empty string when the skill ships no side files (legacy
* SKILL.md-only skills) so we don't add noise.
*/
function derivePreflight(skillBody: string): string {
const refs: string[] = [];
if (/assets\/template\.html/.test(skillBody)) refs.push('`assets/template.html`');
if (/references\/layouts\.md/.test(skillBody)) refs.push('`references/layouts.md`');
if (/references\/themes\.md/.test(skillBody)) refs.push('`references/themes.md`');
if (/references\/components\.md/.test(skillBody)) refs.push('`references/components.md`');
if (/references\/checklist\.md/.test(skillBody)) refs.push('`references/checklist.md`');
// The hyperframes skill ships an html-in-canvas reference next to the
// VFX catalog blocks. The chat handler at server.ts:4138 routes through
// this composer (not the contracts copy), so the case must live here
// too — otherwise live agent runs miss the preflight directive even
// when the skill body explicitly lists the file.
if (/references\/html-in-canvas\.md|html-in-canvas\.md/.test(skillBody)) {
refs.push('`references/html-in-canvas.md`');
}
if (refs.length === 0) return '';
return ` **Pre-flight (do this before any other tool):** Read ${refs.join(', ')} via the path written in the skill-root preamble. The seed template defines the class system you'll paste into; the layouts file is the only acceptable source of section/screen/slide skeletons; the checklist is your P0/P1/P2 gate before emitting \`<artifact>\`. Skipping this step is the #1 reason output regresses to generic AI-slop.`;
}