open-design/apps/daemon/src/media-models.ts
Dan Porat 3395d2c855
feat(daemon): implement fal.ai renderer for image + video generation (#1606)
* feat(daemon): implement fal.ai renderer for image + video generation

Adds renderFalImage and renderFalVideo backed by the fal queue API
(queue.fal.run). Any fal-ai/* model path can be used directly without
a catalog entry, enabling the full fal model library without code
changes. Catalogued shortcuts are mapped via FAL_ENDPOINTS to their
fal-ai/* paths; OD_FAL_MAX_POLL_MS controls the poll ceiling.

Expands the fal model catalog with flux-pro-ultra, flux-dev-fal,
flux-schnell-fal, ideogram-v3-fal, recraft-v3-fal (images) and
veo-3-fal, veo-2-fal, wan-2.1-t2v, wan-2.1-i2v, seedance-1-pro-fal,
kling-2.1-t2v-fal (video). Marks fal provider as integrated: true in
both daemon and web model registries.

* fix(daemon): address fal renderer review comments

- Correct Wan 2.1 endpoints: wan-video/v2.1/* → fal-ai/wan-t2v / fal-ai/wan-i2v
- Correct Kling 2.1 t2v endpoint: .../pro/... → .../master/text-to-video
- Add FAL_IMAGE_USES_ASPECT_RATIO: flux-pro-ultra sends aspect_ratio not image_size
- Add FAL_VIDEO_NO_DURATION: Wan models reject the duration field
- Add FAL_VIDEO_STRING_DURATION: Veo expects duration as "5s" not 5
- Fix falQueueBase() to use anchored regex replace, avoiding mangled
  custom base URLs
- Do not wrap payload under input — raw fal queue HTTP API expects flat
  body; the input wrapper is an SDK abstraction only (confirmed by 422
  validation error from fal showing prompt missing at body.prompt)

* fix(daemon): correct fal queue protocol comment (flat body, no SDK input wrapper)

* fix(daemon): clamp Veo duration to valid fal buckets (4s/6s/8s)

* fix(daemon): report effective fal Veo duration in providerNote (with snap warning)

* fix(daemon): reduce image generation latency from 4m37s to ~73s

Five layered fixes targeting the overhead that padded a ~10s fal API call
into a 4m37s user-facing wait:

1. Skip DISCOVERY_AND_PHILOSOPHY for media surfaces (image/video/audio).
   The ~3000-token HTML-artifact discovery layer is irrelevant for media
   generation and forced the agent to parse and override all its rules
   before dispatching. Removes it from the system prompt entirely for
   these surfaces; MEDIA_GENERATION_CONTRACT is the sole authority.

2. Broaden the wait-loop contract to cover ALL slow models, not just
   "Volcengine i2v / hyperframes-html". Any model whose generation
   exceeds 25s — including fal flux-pro-ultra, Veo, Sora — returns exit 2
   from od media generate. The contract now makes this universal and
   provides a python3-based bash pattern (jq is not guaranteed to be
   installed on all agent runtimes).

3. Increase od media wait polling budget from 25s to 120s. od media
   generate keeps its 25s budget for fast feedback; od media wait is
   purpose-built to sit and poll, so it can safely use the full 2-minute
   bash-tool window. Reduces re-entries for a 3-minute generation from
   ~7 to ~2.

4. First fal poll is now immediate instead of always sleeping 3s before
   the first status check. Saves 3s for all fal jobs.

5. Project metadata no longer emits "(unknown — ask)" for imageModel and
   aspectRatio when unset. Emits the actual defaults (gpt-image-2,
   aspect-ratio scene heuristic) so the agent can dispatch without
   extended reasoning about model selection. Also adds dispatch-immediately
   defaults and a brief-reply rule (2–3 sentences max after generation).

Measured end-to-end on the exact problem prompt before/after:
  Before: 4m37s (discovery form + 7x LLM re-entries + jq failure)
  After:  ~73s   (single bash loop, no question turn, image delivered)

* feat(daemon): inject media dispatch hint for non-media project surfaces

Agents running inside prototype, deck, and other non-image/video/audio
projects previously had no knowledge of `od media generate`, so when
asked to create an image with fal they would try to call provider REST
APIs directly and ask the user for API keys — even though the daemon
already holds credentials in .od/media-config.json.

Add MEDIA_DISPATCH_HINT to composeSystemPrompt for all non-media
surfaces. The hint tells the agent to always route media generation
through the daemon dispatcher, and explicitly forbids prompting for
API keys.

Verified end-to-end: a prototype project generates a 952 KB image via
flux-pro-ultra in ~52s with no key errors.

* fix(daemon): prevent agent from converting bash env vars to PowerShell syntax

MEDIA_DISPATCH_HINT now explicitly labels the shell as POSIX bash and
shows the correct $VAR form side-by-side with a warning NOT to use
PowerShell $env:VAR. Without this, claude-sonnet running on a Windows
host converts the example to PowerShell syntax (`& $env:OD_NODE_BIN`)
which then fails at the bash executor with 'syntax error near unexpected
token &'.

* fix(daemon): add generate→wait loop to MEDIA_DISPATCH_HINT for slow models

MEDIA_DISPATCH_HINT previously showed only a bare  call.
flux-pro-ultra and other slow models always exit 2 after ~25s — without
the wait loop the agent would treat exit 2 as a failure and report an
error to the user.

Replace the single-command example with the canonical generate→wait loop
(matching media-contract.ts), add an explicit note that exit 2 means
'keep polling', and reinforce the POSIX bash / no-PowerShell rule
directly inside the code block.

* fix(daemon): allow fal-ai/* passthrough in media-agent contract

The media-agent prompt instructed the agent to warn and substitute the
default model for any ID not in the catalogue. This blocked the custom
fal-ai/* passthrough path the daemon already supports, so users could
not reach uncatalogued fal models from the normal chat flow.

Carve out the fal-ai/* exception so the agent passes those IDs through
directly instead of warning or substituting.

* fix(daemon): align MEDIA_DISPATCH_HINT with exit-0 generate contract

media generate now always exits 0 (handoff included). The non-media
agent hint still checked ec==2 to decide whether to keep polling, so
slow fal models (flux-pro-ultra, veo-3-fal) would stop after printing
the handoff JSON instead of entering the wait loop.

- generate error check: drop the ec!=2 exception (exits 0 always)
- while loop: drive on taskId presence, not ec==2; stop on ec==0/5
- footer: remove --surface inference claim; CLI requires it explicitly

* fix(guard): add test-fal-webui.ts to e2e scripts allowlist

CI failed: guard flagged e2e/scripts/test-fal-webui.ts as an
unapproved package-owned entrypoint. Add it to allowedE2eScripts.

* fix(daemon): update prompt test expectations to match exit-0 handoff wording

The two stale assertions checked for the old generate-exits-2 copy which
no longer exists in the contract. Update them to match the current
always-exits-0 wording.

* fix(daemon): move skipDiscoveryBrief override before discovery block

* chore(e2e): remove ad-hoc fal webui test script

The script was a one-time developer helper used to manually validate fal
image generation through the live UI. It relied on a real fal API key and
hardcoded local port, so it cannot participate in the e2e package's
fixture/reporting/CI conventions. Removing it per reviewer feedback.

- Delete e2e/scripts/test-fal-webui.ts
- Remove its guard.ts allowlist entry
- Gitignore the file and its screenshots to prevent accidental re-addition

* chore: remove accidental local scratch files from branch

Remove bash.exe.stackdump (MSYS crash dump) and fix_loop.py (one-off
local rewrite helper) — neither is a repo-owned source artifact.

* fix(prompts): document fal-ai/* passthrough in non-media dispatch hint

Prototype/deck agents now know arbitrary fal-ai/* model ids are valid
--model values and should be forwarded as-is, mirroring the exception
already present in media-contract.ts. Adds a prompt regression test.

* fix(daemon): use renderMediaGenerationContract(mediaExecution) for media surfaces

---------

Co-authored-by: mrcfps <mrc@powerformer.com>
2026-05-31 04:44:44 +00:00

210 lines
15 KiB
TypeScript
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

// Daemon-side mirror of src/media/models.ts. The two files are kept in sync by hand — any model added to
// src/media/models.ts must be added here too. Drift is enforced by
// `node scripts/verify-media-models.mjs` (also exposed as
// `npm run verify:media-models`); CI should call it before publish so
// the moment one side adds a model and the other doesn't, the build
// fails with a precise diff.
export type MediaSurface = 'image' | 'video' | 'audio';
export type AudioKind = 'music' | 'speech' | 'sfx';
export type MediaProvider = {
id: string;
label: string;
hint: string;
integrated: boolean;
defaultBaseUrl?: string;
docsUrl?: string;
credentialsRequired?: boolean;
settingsVisible?: boolean;
supportsCustomModel?: boolean;
customModelPlaceholder?: string;
};
export type MediaModel = {
id: string;
label: string;
hint: string;
provider: string;
caps: string[];
default?: boolean;
};
export const MEDIA_PROVIDERS: MediaProvider[] = [
{ id: 'openai', label: 'OpenAI', hint: 'gpt-image-2 / dall-e-3', integrated: true, defaultBaseUrl: 'https://api.openai.com/v1' },
{ id: 'volcengine', label: 'Volcengine Ark (Doubao)', hint: 'Seedance 2.0 / Seedream', integrated: true, defaultBaseUrl: 'https://ark.cn-beijing.volces.com/api/v3' },
{ id: 'grok', label: 'xAI Grok Imagine', hint: 'grok-imagine — image + video with native audio', integrated: true, defaultBaseUrl: 'https://api.x.ai/v1' },
{ id: 'hyperframes', label: 'HyperFrames', hint: 'Local HTML -> MP4 renderer', integrated: true, credentialsRequired: false, settingsVisible: false },
{ id: 'nanobanana', label: 'Nano Banana', hint: 'Google official by default; custom gateway configurable', integrated: true, defaultBaseUrl: 'https://generativelanguage.googleapis.com', supportsCustomModel: true },
{ id: 'imagerouter', label: 'ImageRouter', hint: 'OpenAI-compatible image + video routing', integrated: true, defaultBaseUrl: 'https://api.imagerouter.io/v1/openai', docsUrl: 'https://docs.imagerouter.io/api-reference/image-generation/', supportsCustomModel: true, customModelPlaceholder: 'openai/gpt-image-2 or xAI/grok-imagine-video' },
{ id: 'custom-image', label: 'Custom Image API', hint: 'OpenAI-compatible images/generations + images/edits (local or cloud)', integrated: true, docsUrl: 'https://platform.openai.com/docs/api-reference/images', supportsCustomModel: true, customModelPlaceholder: 'my-image-model' },
{ id: 'comfyui', label: 'ComfyUI', hint: 'Local JSON workflow server (planned adapter)', integrated: false, defaultBaseUrl: 'http://127.0.0.1:8188', docsUrl: 'https://docs.comfy.org/development/core-concepts/workflow' },
{ id: 'bfl', label: 'Black Forest Labs', hint: 'FLUX 1.1 Pro / FLUX Pro / Dev', integrated: false, defaultBaseUrl: 'https://api.bfl.ai' },
{ id: 'fal', label: 'Fal.ai', hint: 'FLUX / Sora / Veo / Wan / Ideogram / Recraft and any fal-ai/* model', integrated: true, defaultBaseUrl: 'https://fal.run', supportsCustomModel: true },
{ id: 'leonardo', label: 'Leonardo.ai', hint: 'Phoenix / Kino XL / FLUX', integrated: true, credentialsRequired: true, settingsVisible: true, defaultBaseUrl: 'https://cloud.leonardo.ai/api/rest/v1' },
{ id: 'replicate', label: 'Replicate', hint: 'FLUX / SDXL / Ideogram', integrated: false, defaultBaseUrl: 'https://api.replicate.com' },
{ id: 'google', label: 'Google AI / Vertex', hint: 'Imagen 4 / Veo 3 / Lyria', integrated: false },
{ id: 'kling', label: 'Kuaishou Kling', hint: 'Kling 1.6 / 2.0 video', integrated: false },
{ id: 'midjourney', label: 'Midjourney (proxy)', hint: 'midjourney-v7', integrated: false },
{ id: 'minimax', label: 'MiniMax', hint: 'TTS / video-01', integrated: true, defaultBaseUrl: 'https://api.minimaxi.chat/v1' },
{ id: 'suno', label: 'Suno', hint: 'Music generation', integrated: false },
{ id: 'udio', label: 'Udio', hint: 'Music generation', integrated: false },
{
id: 'elevenlabs',
label: 'ElevenLabs',
hint: 'Voice / SFX',
integrated: true,
defaultBaseUrl: 'https://api.elevenlabs.io',
docsUrl: 'https://elevenlabs.io/app/settings/api-keys',
},
{ id: 'fishaudio', label: 'FishAudio', hint: 'Speech / voice clone', integrated: true, defaultBaseUrl: 'https://api.fish.audio' },
{
id: 'senseaudio',
label: 'SenseAudio',
hint: '',
integrated: true,
defaultBaseUrl: 'https://api.senseaudio.cn',
docsUrl: 'https://docs.senseaudio.cn',
},
{ id: 'tavily', label: 'Tavily Search', hint: 'Agent-callable web research', integrated: true, defaultBaseUrl: 'https://api.tavily.com' },
{ id: 'stub', label: 'Stub (placeholder)', hint: 'Deterministic local placeholder bytes', integrated: true },
];
export const IMAGE_MODELS: MediaModel[] = [
{ id: 'gpt-image-2', label: 'gpt-image-2', hint: 'OpenAI · 4K, native multimodal', provider: 'openai', caps: ['t2i', 'i2i', 'inpaint'], default: true },
{ id: 'gpt-image-1.5', label: 'gpt-image-1.5', hint: 'OpenAI · 4× faster than gpt-image-1', provider: 'openai', caps: ['t2i', 'i2i', 'inpaint'] },
{ id: 'gpt-image-1', label: 'gpt-image-1', hint: 'OpenAI · ChatGPT native', provider: 'openai', caps: ['t2i', 'i2i', 'inpaint'] },
{ id: 'gpt-image-1-mini', label: 'gpt-image-1-mini', hint: 'OpenAI · low-cost variant', provider: 'openai', caps: ['t2i', 'i2i'] },
{ id: 'dall-e-3', label: 'dall-e-3', hint: 'OpenAI · classic', provider: 'openai', caps: ['t2i'] },
{ id: 'dall-e-2', label: 'dall-e-2', hint: 'OpenAI · legacy', provider: 'openai', caps: ['t2i'] },
{ id: 'doubao-seedream-3-0-t2i-250415', label: 'seedream-3.0', hint: 'ByteDance · Doubao image', provider: 'volcengine', caps: ['t2i'] },
{ id: 'doubao-seededit-3-0-i2i-250628', label: 'seededit-3.0', hint: 'ByteDance · image edit', provider: 'volcengine', caps: ['i2i'] },
{ id: 'senseaudio-image-2.0-260319', label: 'senseaudio-image-2.0', hint: 'SenseAudio · multi-aspect, latest', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'senseaudio-image-1.0-260319', label: 'senseaudio-image-1.0', hint: 'SenseAudio · standard', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'doubao-seedream-5-0-260128', label: 'seedream-5.0', hint: 'SenseAudio · ByteDance Seedream 5.0 hi-res', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'grok-imagine-image', label: 'grok-imagine-image', hint: 'xAI · 2K text-to-image', provider: 'grok', caps: ['t2i'] },
{ id: 'gemini-3.1-flash-image-preview', label: 'nano-banana-2', hint: 'Nano Banana · text-to-image', provider: 'nanobanana', caps: ['t2i'] },
{ id: 'openai/gpt-image-2', label: 'openai/gpt-image-2', hint: 'ImageRouter · routed GPT Image', provider: 'imagerouter', caps: ['t2i'] },
{ id: 'openai/gpt-image-1.5', label: 'openai/gpt-image-1.5', hint: 'ImageRouter · routed GPT Image', provider: 'imagerouter', caps: ['t2i'] },
{ id: 'black-forest-labs/FLUX-1.1-pro', label: 'FLUX-1.1-pro', hint: 'ImageRouter · Black Forest Labs', provider: 'imagerouter', caps: ['t2i'] },
{ id: 'custom-image', label: 'custom-image', hint: 'Custom · OpenAI-compatible endpoint', provider: 'custom-image', caps: ['t2i', 'i2i'] },
{ id: 'flux-1.1-pro', label: 'flux-1.1-pro', hint: 'BFL · flagship', provider: 'bfl', caps: ['t2i', 'i2i'] },
{ id: 'flux-pro', label: 'flux-pro', hint: 'BFL', provider: 'bfl', caps: ['t2i'] },
{ id: 'flux-dev', label: 'flux-dev', hint: 'BFL · open weights', provider: 'bfl', caps: ['t2i'] },
{ id: 'flux-schnell', label: 'flux-schnell', hint: 'BFL · fast', provider: 'bfl', caps: ['t2i'] },
{ id: 'flux-kontext-pro', label: 'flux-kontext-pro', hint: 'BFL · in-context edits', provider: 'bfl', caps: ['t2i', 'i2i'] },
{ id: 'imagen-4', label: 'imagen-4', hint: 'Google · latest', provider: 'google', caps: ['t2i'] },
{ id: 'imagen-3', label: 'imagen-3', hint: 'Google', provider: 'google', caps: ['t2i'] },
{ id: 'gemini-3-pro-image-preview', label: 'gemini-3-pro-image', hint: 'Google · Nano Banana Pro', provider: 'google', caps: ['t2i', 'i2i'] },
{ id: 'ideogram-v2', label: 'ideogram-v2', hint: 'Replicate · typography', provider: 'replicate', caps: ['t2i'] },
{ id: 'sdxl', label: 'stable-diffusion-xl', hint: 'Replicate · SDXL', provider: 'replicate', caps: ['t2i'] },
{ id: 'flux-pro-ultra', label: 'flux-pro-ultra', hint: 'Fal · FLUX 1.1 Pro Ultra · highest quality (~60180s)', provider: 'fal', caps: ['t2i'] },
{ id: 'flux-dev-fal', label: 'flux-dev (fal)', hint: 'Fal · FLUX Dev · balanced quality/speed (~1540s)', provider: 'fal', caps: ['t2i'] },
{ id: 'flux-schnell-fal', label: 'flux-schnell (fal)', hint: 'Fal · FLUX Schnell · fastest (~38s)', provider: 'fal', caps: ['t2i'] },
{ id: 'ideogram-v3-fal', label: 'ideogram-v3', hint: 'Fal · Ideogram v3 · typography + design (~1530s)', provider: 'fal', caps: ['t2i'] },
{ id: 'recraft-v3-fal', label: 'recraft-v3', hint: 'Fal · Recraft v3 · vector + illustration (~1530s)', provider: 'fal', caps: ['t2i'] },
{ id: 'sd-3.5', label: 'stable-diffusion-3.5', hint: 'Fal · SD 3.5 (~2040s)', provider: 'fal', caps: ['t2i'] },
{ id: 'leonardo-phoenix', label: 'Phoenix', hint: 'Leonardo · versatile', provider: 'leonardo', caps: ['t2i'] },
{ id: 'leonardo-kino-xl', label: 'Kino XL', hint: 'Leonardo · cinematic', provider: 'leonardo', caps: ['t2i'] },
{ id: 'leonardo-flux-dev', label: 'FLUX Dev', hint: 'Leonardo · FLUX', provider: 'leonardo', caps: ['t2i'] },
{ id: 'leonardo-flux-schnell', label: 'FLUX Schnell', hint: 'Leonardo · fast', provider: 'leonardo', caps: ['t2i'] },
{ id: 'leonardo-anime-pastel', label: 'Anime Pastel Dream', hint: 'Leonardo · anime', provider: 'leonardo', caps: ['t2i'] },
{ id: 'midjourney-v7', label: 'midjourney-v7', hint: 'Midjourney · via proxy', provider: 'midjourney', caps: ['t2i'] },
];
export const VIDEO_MODELS: MediaModel[] = [
{ id: 'doubao-seedance-2-0-260128', label: 'seedance-2.0', hint: 'ByteDance · t2v + i2v + audio', provider: 'volcengine', caps: ['t2v', 'i2v', 'audio'], default: true },
{ id: 'doubao-seedance-2-0-fast-260128', label: 'seedance-2.0-fast', hint: 'ByteDance · faster, cheaper', provider: 'volcengine', caps: ['t2v', 'i2v', 'audio'] },
{ id: 'doubao-seedance-1-0-pro-250528', label: 'seedance-1.0-pro', hint: 'ByteDance · 1.0', provider: 'volcengine', caps: ['t2v', 'i2v'] },
{ id: 'doubao-seedance-1-0-lite-i2v-250428', label: 'seedance-1.0-lite-i2v', hint: 'ByteDance · image-to-video', provider: 'volcengine', caps: ['i2v'] },
{ id: 'doubao-seedance-1-0-lite-t2v-250428', label: 'seedance-1.0-lite-t2v', hint: 'ByteDance · text-to-video', provider: 'volcengine', caps: ['t2v'] },
{ id: 'grok-imagine-video', label: 'grok-imagine-video', hint: 'xAI · 720p t2v + i2v + native audio', provider: 'grok', caps: ['t2v', 'i2v', 'audio'] },
{ id: 'xAI/grok-imagine-video', label: 'xAI/grok-imagine-video', hint: 'ImageRouter · routed video', provider: 'imagerouter', caps: ['t2v', 'audio'] },
{ id: 'bytedance/seedance-1.5-pro', label: 'seedance-1.5-pro', hint: 'ImageRouter · Bytedance', provider: 'imagerouter', caps: ['t2v'] },
{ id: 'google/veo-3.1-lite', label: 'veo-3.1-lite', hint: 'ImageRouter · Google', provider: 'imagerouter', caps: ['t2v'] },
{ id: 'kling-2.0', label: 'kling-2.0', hint: 'Kuaishou · latest', provider: 'kling', caps: ['t2v', 'i2v'] },
{ id: 'kling-1.6', label: 'kling-1.6', hint: 'Kuaishou', provider: 'kling', caps: ['t2v', 'i2v'] },
{ id: 'kling-1.5', label: 'kling-1.5', hint: 'Kuaishou', provider: 'kling', caps: ['t2v', 'i2v'] },
{ id: 'veo-3', label: 'veo-3', hint: 'Google · sound-on', provider: 'google', caps: ['t2v', 'audio'] },
{ id: 'veo-2', label: 'veo-2', hint: 'Google', provider: 'google', caps: ['t2v'] },
{ id: 'veo-3-fal', label: 'veo-3 (fal)', hint: 'Fal · Google Veo 3 · sound-on', provider: 'fal', caps: ['t2v', 'audio'] },
{ id: 'veo-2-fal', label: 'veo-2 (fal)', hint: 'Fal · Google Veo 2', provider: 'fal', caps: ['t2v'] },
{ id: 'wan-2.1-t2v', label: 'wan-2.1-t2v', hint: 'Fal · Wan 2.1 text-to-video', provider: 'fal', caps: ['t2v'] },
{ id: 'wan-2.1-i2v', label: 'wan-2.1-i2v', hint: 'Fal · Wan 2.1 image-to-video', provider: 'fal', caps: ['i2v'] },
{ id: 'seedance-1-pro-fal', label: 'seedance-1-pro (fal)', hint: 'Fal · Seedance 1 Pro', provider: 'fal', caps: ['t2v', 'i2v'] },
{ id: 'kling-2.1-t2v-fal', label: 'kling-2.1 (fal)', hint: 'Fal · Kling 2.1 Pro text-to-video', provider: 'fal', caps: ['t2v'] },
{ id: 'sora-2', label: 'sora-2', hint: 'Fal · OpenAI Sora 2', provider: 'fal', caps: ['t2v'] },
{ id: 'sora-2-pro', label: 'sora-2-pro', hint: 'Fal · OpenAI Sora 2 Pro', provider: 'fal', caps: ['t2v'] },
{ id: 'minimax-video-01', label: 'video-01', hint: 'MiniMax · Hailuo', provider: 'minimax', caps: ['t2v', 'i2v'] },
{ id: 'hyperframes-html', label: 'hyperframes-html', hint: 'HyperFrames · local HTML renderer', provider: 'hyperframes', caps: ['t2v'] },
];
export const AUDIO_MODELS_BY_KIND: Record<AudioKind, MediaModel[]> = {
music: [
{ id: 'suno-v5', label: 'suno-v5', hint: 'Suno · default', provider: 'suno', caps: ['music'], default: true },
{ id: 'suno-v4-5', label: 'suno-v4.5', hint: 'Suno', provider: 'suno', caps: ['music'] },
{ id: 'udio-v2', label: 'udio-v2', hint: 'Udio', provider: 'udio', caps: ['music'] },
{ id: 'lyria-2', label: 'lyria-2', hint: 'Google', provider: 'google', caps: ['music'] },
],
speech: [
{ id: 'minimax-tts', label: 'minimax-tts', hint: 'MiniMax', provider: 'minimax', caps: ['tts'], default: true },
{ id: 'fish-speech-2', label: 'fish-speech-2', hint: 'FishAudio', provider: 'fishaudio', caps: ['tts', 'voice-clone'] },
{ id: 'elevenlabs-v3', label: 'elevenlabs-v3', hint: 'ElevenLabs', provider: 'elevenlabs', caps: ['tts', 'voice-clone'] },
{ id: 'senseaudio-tts', label: 'senseaudio-tts', hint: 'SenseAudio', provider: 'senseaudio', caps: ['tts', 'voice-clone'] },
{ id: 'doubao-tts', label: 'doubao-tts', hint: 'Volcengine', provider: 'volcengine', caps: ['tts'] },
{ id: 'gpt-4o-mini-tts', label: 'gpt-4o-mini-tts', hint: 'OpenAI', provider: 'openai', caps: ['tts'] },
// xAI TTS — multilingual; uses the same SuperGrok OAuth as image / video.
{ id: 'grok-tts', label: 'grok-tts', hint: 'xAI · multilingual · uses Grok subscription', provider: 'grok', caps: ['tts'] },
],
sfx: [
{ id: 'elevenlabs-sfx', label: 'elevenlabs-sfx', hint: 'ElevenLabs SFX', provider: 'elevenlabs', caps: ['sfx'], default: true },
{ id: 'audiocraft', label: 'audiocraft', hint: 'Meta · open', provider: 'replicate', caps: ['sfx', 'music'] },
],
};
export const MEDIA_ASPECTS = ['1:1', '16:9', '9:16', '4:3', '3:4'];
export const VIDEO_LENGTHS_SEC = [3, 5, 8, 10, 15, 30];
export const AUDIO_DURATIONS_SEC = [5, 10, 15, 30, 60, 120];
export function findMediaModel(id: string): MediaModel | null {
const all = [
...IMAGE_MODELS,
...VIDEO_MODELS,
...AUDIO_MODELS_BY_KIND.music,
...AUDIO_MODELS_BY_KIND.speech,
...AUDIO_MODELS_BY_KIND.sfx,
];
return all.find((m) => m.id === id) || null;
}
export function findProvider(id: string): MediaProvider | null {
return MEDIA_PROVIDERS.find((p) => p.id === id) || null;
}
export function modelsForSurface(surface: MediaSurface, audioKind?: AudioKind): MediaModel[] {
if (surface === 'image') return IMAGE_MODELS;
if (surface === 'video') return VIDEO_MODELS;
if (surface === 'audio') {
const k = audioKind || 'music';
return AUDIO_MODELS_BY_KIND[k] || AUDIO_MODELS_BY_KIND.music;
}
return [];
}