open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-05-31 19:04:39 +07:00

Author	SHA1	Message	Date
Dan Porat	3395d2c855	feat(daemon): implement fal.ai renderer for image + video generation (#1606) * feat(daemon): implement fal.ai renderer for image + video generation Adds renderFalImage and renderFalVideo backed by the fal queue API (queue.fal.run). Any fal-ai/* model path can be used directly without a catalog entry, enabling the full fal model library without code changes. Catalogued shortcuts are mapped via FAL_ENDPOINTS to their fal-ai/* paths; OD_FAL_MAX_POLL_MS controls the poll ceiling. Expands the fal model catalog with flux-pro-ultra, flux-dev-fal, flux-schnell-fal, ideogram-v3-fal, recraft-v3-fal (images) and veo-3-fal, veo-2-fal, wan-2.1-t2v, wan-2.1-i2v, seedance-1-pro-fal, kling-2.1-t2v-fal (video). Marks fal provider as integrated: true in both daemon and web model registries. * fix(daemon): address fal renderer review comments - Correct Wan 2.1 endpoints: wan-video/v2.1/* → fal-ai/wan-t2v / fal-ai/wan-i2v - Correct Kling 2.1 t2v endpoint: .../pro/... → .../master/text-to-video - Add FAL_IMAGE_USES_ASPECT_RATIO: flux-pro-ultra sends aspect_ratio not image_size - Add FAL_VIDEO_NO_DURATION: Wan models reject the duration field - Add FAL_VIDEO_STRING_DURATION: Veo expects duration as "5s" not 5 - Fix falQueueBase() to use anchored regex replace, avoiding mangled custom base URLs - Do not wrap payload under input: raw fal queue HTTP API expects flat body; the input wrapper is an SDK abstraction only (confirmed by 422 validation error from fal showing prompt missing at body.prompt) * fix(daemon): correct fal queue protocol comment (flat body, no SDK input wrapper) * fix(daemon): clamp Veo duration to valid fal buckets (4s/6s/8s) * fix(daemon): report effective fal Veo duration in providerNote (with snap warning) * fix(daemon): reduce image generation latency from 4m37s to ~73s Five layered fixes targeting the overhead that padded a ~10s fal API call into a 4m37s user-facing wait: 1. Skip DISCOVERY_AND_PHILOSOPHY for media surfaces (image/video/audio). 
The ~3000-token HTML-artifact discovery layer is irrelevant for media generation and forced the agent to parse and override all its rules before dispatching. Removes it from the system prompt entirely for these surfaces; MEDIA_GENERATION_CONTRACT is the sole authority. 2. Broaden the wait-loop contract to cover ALL slow models, not just "Volcengine i2v / hyperframes-html". Any model whose generation exceeds 25s — including fal flux-pro-ultra, Veo, Sora — returns exit 2 from od media generate. The contract now makes this universal and provides a python3-based bash pattern (jq is not guaranteed to be installed on all agent runtimes). 3. Increase od media wait polling budget from 25s to 120s. od media generate keeps its 25s budget for fast feedback; od media wait is purpose-built to sit and poll, so it can safely use the full 2-minute bash-tool window. Reduces re-entries for a 3-minute generation from ~7 to ~2. 4. First fal poll is now immediate instead of always sleeping 3s before the first status check. Saves 3s for all fal jobs. 5. Project metadata no longer emits "(unknown — ask)" for imageModel and aspectRatio when unset. Emits the actual defaults (gpt-image-2, aspect-ratio scene heuristic) so the agent can dispatch without extended reasoning about model selection. Also adds dispatch-immediately defaults and a brief-reply rule (2–3 sentences max after generation). Measured end-to-end on the exact problem prompt before/after: Before: 4m37s (discovery form + 7x LLM re-entries + jq failure) After: ~73s (single bash loop, no question turn, image delivered) * feat(daemon): inject media dispatch hint for non-media project surfaces Agents running inside prototype, deck, and other non-image/video/audio projects previously had no knowledge of `od media generate`, so when asked to create an image with fal they would try to call provider REST APIs directly and ask the user for API keys — even though the daemon already holds credentials in .od/media-config.json. 
Add MEDIA_DISPATCH_HINT to composeSystemPrompt for all non-media surfaces. The hint tells the agent to always route media generation through the daemon dispatcher, and explicitly forbids prompting for API keys. Verified end-to-end: a prototype project generates a 952 KB image via flux-pro-ultra in ~52s with no key errors. * fix(daemon): prevent agent from converting bash env vars to PowerShell syntax MEDIA_DISPATCH_HINT now explicitly labels the shell as POSIX bash and shows the correct $VAR form side-by-side with a warning NOT to use PowerShell $env:VAR. Without this, claude-sonnet running on a Windows host converts the example to PowerShell syntax (`& $env:OD_NODE_BIN`) which then fails at the bash executor with 'syntax error near unexpected token &'. * fix(daemon): add generate→wait loop to MEDIA_DISPATCH_HINT for slow models MEDIA_DISPATCH_HINT previously showed only a bare call. flux-pro-ultra and other slow models always exit 2 after ~25s — without the wait loop the agent would treat exit 2 as a failure and report an error to the user. Replace the single-command example with the canonical generate→wait loop (matching media-contract.ts), add an explicit note that exit 2 means 'keep polling', and reinforce the POSIX bash / no-PowerShell rule directly inside the code block. * fix(daemon): allow fal-ai/* passthrough in media-agent contract The media-agent prompt instructed the agent to warn and substitute the default model for any ID not in the catalogue. This blocked the custom fal-ai/* passthrough path the daemon already supports, so users could not reach uncatalogued fal models from the normal chat flow. Carve out the fal-ai/* exception so the agent passes those IDs through directly instead of warning or substituting. * fix(daemon): align MEDIA_DISPATCH_HINT with exit-0 generate contract media generate now always exits 0 (handoff included). 
The non-media agent hint still checked ec==2 to decide whether to keep polling, so slow fal models (flux-pro-ultra, veo-3-fal) would stop after printing the handoff JSON instead of entering the wait loop. - generate error check: drop the ec!=2 exception (exits 0 always) - while loop: drive on taskId presence, not ec==2; stop on ec==0/5 - footer: remove --surface inference claim; CLI requires it explicitly * fix(guard): add test-fal-webui.ts to e2e scripts allowlist CI failed: guard flagged e2e/scripts/test-fal-webui.ts as an unapproved package-owned entrypoint. Add it to allowedE2eScripts. * fix(daemon): update prompt test expectations to match exit-0 handoff wording The two stale assertions checked for the old generate-exits-2 copy which no longer exists in the contract. Update them to match the current always-exits-0 wording. * fix(daemon): move skipDiscoveryBrief override before discovery block * chore(e2e): remove ad-hoc fal webui test script The script was a one-time developer helper used to manually validate fal image generation through the live UI. It relied on a real fal API key and hardcoded local port, so it cannot participate in the e2e package's fixture/reporting/CI conventions. Removing it per reviewer feedback. - Delete e2e/scripts/test-fal-webui.ts - Remove its guard.ts allowlist entry - Gitignore the file and its screenshots to prevent accidental re-addition * chore: remove accidental local scratch files from branch Remove bash.exe.stackdump (MSYS crash dump) and fix_loop.py (one-off local rewrite helper) — neither is a repo-owned source artifact. * fix(prompts): document fal-ai/* passthrough in non-media dispatch hint Prototype/deck agents now know arbitrary fal-ai/* model ids are valid --model values and should be forwarded as-is, mirroring the exception already present in media-contract.ts. Adds a prompt regression test. 
* fix(daemon): use renderMediaGenerationContract(mediaExecution) for media surfaces --------- Co-authored-by: mrcfps <mrc@powerformer.com>	2026-05-31 04:44:44 +00:00
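
The per-model duration handling described in the fal renderer commit above (Wan models reject `duration`, Veo expects a string such as "5s", and Veo values are clamped to the 4s/6s/8s buckets) could be sketched roughly as follows. The set names come from the commit message; their contents, the `falDurationField` and `snapToBucket` helpers, and the nearest-bucket tie rule are assumptions made for illustration, not the daemon's actual code.

```typescript
// Hypothetical sketch. Set names are from the commit message; the members
// and the nearest-bucket rule below are assumptions for illustration.
const FAL_VIDEO_NO_DURATION = new Set<string>(["wan-2.1-t2v", "wan-2.1-i2v"]);
const FAL_VIDEO_STRING_DURATION = new Set<string>(["veo-3-fal", "veo-2-fal"]);
const VEO_DURATION_BUCKETS = [4, 6, 8] as const;

interface DurationField {
  duration?: number | string; // omitted entirely when the model rejects it
  snappedFrom?: number; // set only when the requested value was adjusted
}

// Pick the bucket closest to the request; ties go to the shorter bucket.
function snapToBucket(seconds: number): number {
  let best: number = VEO_DURATION_BUCKETS[0];
  for (const bucket of VEO_DURATION_BUCKETS) {
    if (Math.abs(bucket - seconds) < Math.abs(best - seconds)) best = bucket;
  }
  return best;
}

// Build the duration part of a flat request body for a given model id.
function falDurationField(model: string, seconds: number): DurationField {
  if (FAL_VIDEO_NO_DURATION.has(model)) return {};
  if (FAL_VIDEO_STRING_DURATION.has(model)) {
    const snapped = snapToBucket(seconds);
    const field: DurationField = { duration: `${snapped}s` };
    if (snapped !== seconds) field.snappedFrom = seconds;
    return field;
  }
  return { duration: seconds };
}
```

A caller could use `snappedFrom` to build the snap warning that the commit says is reported in providerNote; how the daemon words that note is not shown here.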
maybeyourking	881571dea7	fix(media): route custom-image edits through images API (#3087) * fix(media): route custom-image edits through images API * fix(media): normalize custom-image endpoint suffixes --------- Co-authored-by: Artist Ning <dingkuake@yeah.net> Co-authored-by: Siri-Ray <2667192167@qq.com>	2026-05-29 08:09:44 +00:00
nettee	6c6ee44e07	docs(media): clarify custom providers and ComfyUI status (#479) (#2942) Closes #479 Generated-By: looper 0.9.0 (runner=worker, agent=opencode)	2026-05-26 06:22:02 +00:00
mzl163	210b94069a	feat(senseaudio): BYOK chat with image + video generation tools (#2065) * feat(senseaudio): BYOK chat with image + video generation tools Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's chat proxy with a tool loop so BYOK users can generate images and videos without dropping to a CLI agent. - BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route + connection-test + provider-models discovery (OpenAI-compatible wire) - Tool loop: generate_image (synchronous /v1/image/sync) and generate_video (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling, periodic progress log every 30s) - Settings dropdown + chat-composer dropdown for the BYOK image model default; generate_image's model enum lets the LLM override per call - Seed-on-success: a successful BYOK chat call idempotently mirrors the key into media-config (preserves env-resolved + already-stored keys) - Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer, DesignFilesPanel, and project export pick them up automatically; legacy /api/byok-image/:id route kept for old conversation links - Markdown renderer learns ![alt](url) image syntax with a scheme allowlist (http(s) / data:image/ / blob: / relative paths) - i18n key settings.byokImageModel across all 19 locales - 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0); 1 video model (doubao-seedance-2.0) - Tests: byok-tools (29), media-senseaudio-image (8), media-config seed (7), proxy-routes (47), markdown image rendering (8) * fix(senseaudio): unblock image gen + design file preview switching - SenseAudio /v1/image/sync rejected the previous size mapping with `参数错误：size` ("parameter error: size"); 1664x936, 936x1664, 1280x960, 960x1280 are not in the gateway's accepted set. Switched to standard HD / SD sizes that every aspect bucket can hit: 1024×1024, 1280×720, 720×1280, 1024×768, 768×1024. 
Kept the byok-tools and media.ts tables in sync so the BYOK chat tool and the CLI agent path both stop failing on non-square aspects. - DesignFilesPanel's <DfPreview> was missing a key prop, so React reused the same iframe DOM node when the user picked a different file — the src prop changed but the iframe never navigated. Added key={previewFile.name} so the previous preview unmounts cleanly. - Updated byok-tools + media-senseaudio-image tests for the new size expectations. * docs(senseaudio): clear stale provider hint + update README - Settings → Media → SenseAudio: clear the auto-promoted "Image · TTS · 70+ voices · clone" hint; the provider label alone is enough now that the BYOK chat surface covers image + video tooling. - README: list the new senseaudio (and missing ollama) proxy routes so the BYOK section reflects what the daemon actually serves, and mention the generate_image / generate_video chat tools that ship with the SenseAudio path. * fix(senseaudio): address PR #2065 review feedback Three non-blocking review notes from @PerishCode on PR #2065: 1. Drop the dead /api/byok-image/:id route. The PR description claimed it was "legacy fallback for old chat history" but that storage layout never existed on main, so the route can only ever 400 or 404 — never 200. Removed the handler, the isSafeByokImageId export, the unused createReadStream / stat / path / Request / Response imports, and the two byok-image regression tests. 2. Add rejectProxyPluginContext guard to the senseaudio proxy handler so it matches the invariant the other five proxy paths already enforce (plugin runs must go through /api/runs for snapshot pinning). Extended the existing "API fallback rejects plugin runs" describe to also cover /api/proxy/senseaudio/stream with the 409 PLUGIN_REQUIRES_DAEMON expectation. 3. 
Wrap the secondary image / video downloads (the URLs the SenseAudio gateway hands back in /v1/image/sync .url and /v1/video/status .video_url) in validateBaseUrlResolved so a malicious gateway can't point us at 169.254.169.254 (AWS / Azure metadata) or RFC1918 hosts via the response payload. Also passed `redirect: 'error'` on both fetches to match the SSRF posture the primary proxy fetch already uses. The new assertExternalAssetUrl helper lives next to executeGenerateImage so future tool downloads can reuse it. Tests: 120/120 daemon tests pass; guard + typecheck green. * fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path Follow-up to `01b1260a` — the chat-tool fix in byok-tools.ts wasn't mirrored onto the parallel renderSenseAudioImage path in media.ts. Same attacker-controllable shape (gateway-returned `data.url`), same one-line fix. - Hoist assertExternalAssetUrl from byok-tools.ts into connectionTest.ts next to validateBaseUrlResolved so both call sites (the BYOK chat tool loop AND the CLI agent media dispatcher) share one helper. Made the error strings provider-agnostic so a future caller doesn't get a misleading "senseaudio" attribution for a Volcengine / Grok / etc. download. - renderSenseAudioImage now runs the response url through assertExternalAssetUrl before fetching bytes, and passes redirect: 'error' to block a 3xx hop into private space. Scope intentionally limited to the senseaudio path PerishCode flagged; the other unguarded fetch(entry.url) call sites in media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing patterns and belong in a separate follow-up if the daemon wants defense-in-depth across every provider. Tests: 127/127 daemon tests pass; guard + typecheck green. --------- Co-authored-by: unknown <mazeliang@sensetime.com>	2026-05-19 23:14:56 +08:00
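
The SSRF concern in the SenseAudio review fixes above (a gateway-returned URL pointing at 169.254.169.254 or an RFC 1918 host) can be illustrated with a small check. This is a simplified sketch, not the repository's assertExternalAssetUrl: the names `isPrivateIPv4` and `checkExternalAssetUrl` are hypothetical, and it inspects only the literal host in the URL, whereas the commit message says the real guard goes through validateBaseUrlResolved.

```typescript
// Simplified sketch with hypothetical names. It checks only the literal host
// in the URL; resolving DNS names to addresses is out of scope here.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 || // 10.0.0.0/8 (RFC 1918)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (RFC 1918)
    (a === 192 && b === 168) || // 192.168.0.0/16 (RFC 1918)
    (a === 169 && b === 254) || // link-local, includes 169.254.169.254
    a === 127 || // loopback
    a === 0 // "this network"
  );
}

// Parse a gateway-returned asset URL and reject obviously internal targets.
function checkExternalAssetUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("asset URL is not a valid URL");
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error(`asset URL uses unsupported scheme ${url.protocol}`);
  }
  const host = url.hostname;
  if (host === "localhost" || host === "[::1]" || isPrivateIPv4(host)) {
    throw new Error(`asset URL points at a private or loopback host: ${host}`);
  }
  return url;
}
```

In line with the commit message, a download using a URL that passes this check would still pass `redirect: 'error'` to `fetch`, so that a 3xx response cannot hop into private address space after validation. Other IPv6 ranges are not covered by this sketch.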
Quang Do	13c8bc4193	feat(daemon): add OpenAI-compatible media providers (#1712) * feat(daemon): add openai-compatible media providers * fix(web): sync media registry with routed providers	2026-05-15 23:05:03 +08:00
Nicholas-Xiong	f78b0d3a2a	feat: add Leonardo.ai image provider integration (#1123) * feat: add Leonardo.ai image provider integration Implements Leonardo.ai as a fully supported image provider with the following models: - Phoenix (leonardo-phoenix) - versatile general-purpose model - Kino XL (leonardo-kino-xl) - cinematic quality - FLUX Dev (leonardo-flux-dev) - FLUX.1 [dev] - FLUX Schnell (leonardo-flux-schnell) - fast generation - Anime Pastel Dream (leonardo-anime-pastel) - anime style Features: - Async generation with polling (2-minute timeout) - Support for standard aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4 - Bearer token authentication - Automatic image format detection Implementation: - Frontend: apps/web/src/media/models.ts - Backend: apps/daemon/src/media-models.ts - Renderer: renderLeonardoImage() function (~130 lines) - Dispatcher: integrated into media generation pipeline API Integration: - Submit: POST /generations - Poll: GET /generations/{id} - Response: generations_by_pk.generated_images[0].url Addresses #984 * fix: Add leonardo to MediaProviderId type and register env vars 1. Add 'leonardo' to MediaProviderId union in apps/web/src/media/models.ts to fix TypeScript build error 2. Register LEONARDO_API_KEY env vars in apps/daemon/src/media-config.ts following the same pattern as other providers This fixes the CI TypeScript build failure and enables proper env-based credential lookup for Leonardo.ai provider. --------- Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 17:16:28 +08:00
chaoxiaoche	bcc58af931	refactor(web): rename Execution mode and tighten settings dialog UI (#1568) * refactor(web): rename Execution mode and tighten settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): polish Settings dialog: media providers, skills, MCP Media providers - Hide internal Stub fixture provider (settingsVisible: false) - Split provider list into Available (integrated, editable) and Coming Soon (collapsed <details> drawer with name/hint/Docs link only) - Drop right-side Integrated/Configured badges from every row; all rows in the main list are integrated by definition; inline grey "Saved" chip next to the provider name is the only status indicator now - "Saved" badge moves inline to the right of the provider name and uses a neutral grey treatment (was a standalone green pill below the name) - "Reload from daemon" button shows a 2s green "✓ Reloaded" flash on success instead of leaving a permanent paragraph under the header; errors remain sticky Skills - Replace three pill-row filter banks (Source, Type, Category) with a compact single-row toolbar: search + three inline <select> dropdowns side by side; active filter highlighted with a stronger border MCP server - Shorten section hint to one line - Move WHAT YOUR AGENT CAN DO capabilities above the 
client dropdown (motivate before asking to act) - Move "Build the daemon first" warning below the code block where it contextually explains why the command might fail, not as a top-level error before the user has done anything - Downgrade "Restart your client" left-border from accent orange to border-strong grey — it is a next step, not a warning External MCP - Shorten section hint to one line Misc CSS - Add .sr-only utility for accessible off-screen live regions - Add button.ghost.is-success-flash for transient success feedback - Add .library-filter-selects / .library-filter-select for dropdown filter rows - Add .media-provider-coming-soon-* for the roadmap drawer Co-authored-by: Cursor <cursoragent@cursor.com> * [codex] Add Cursor Agent auth diagnostics (#1538) * Add Cursor Agent auth diagnostics * Handle Cursor not logged in auth status * Address Cursor auth review feedback * Classify Cursor stdout auth failures * test: expand Memory and Routines coverage (#1521) * test: expand settings and packaged coverage * test: extend memory settings coverage * test: cover routine settings failure states * test: cover routine operation failures * test: fix daemon test typing on CI * test: decouple packaged smoke from orbit bug * test: avoid live memory LLM calls in route tests * test: fix daemon fetch typing in CI * fix: restore preview comment and inspect toggles * test: align manual edit flow with current inspector UX * test: align comment attachment flow with current preview comments UI * fix: probe resolved Codex launch path during detection * fix: remove duplicate board activation helper after rebase * test: update ghost cli detection mock * test: align FileViewer toolbar expectation * ci: move full app tests to extended lane * ci: run app tests by changed scope * ci: cover shared app inputs in test scopes * ci: avoid setup-node cache in windows packaged smoke * test: align extended settings and manual edit flows * refactor(web): rename Execution mode and tighten 
settings dialog UI - Rename "Settings → Execution & model" to "Settings → Execution mode" across the web UI, i18n keys, docs, and e2e selectors. - Redesign SettingsDialog: kicker + title row in the modal head, a flatMap-driven agent grid that renders the inline test-result row beside the selected card, compact unavailable cards with right-aligned install/docs links, and an install guide that only shows when the user has no working agent picked. - Trim verbose subtitle / hint copy across chat model, CLI proxy, media providers, custom instructions, and memory sections. - Add an `info` Icon variant for the redesigned settings hints. - Update e2e selectors and docs that referenced the old menu label. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(web): settings dialog UX polish — layout, dedup, and interactions - Remove duplicate section headers from all settings sections (Notifications, Appearance, Privacy, About, Design Systems, Skills, MCP server, Connectors, Media providers, Routines) - Restructure Notifications cards: title + toggle on same row, hint below - Restructure Skills toolbar: search + New skill button in row 1, filter dropdowns in row 2 with left-aligned labels - Restructure Pet section: tabs and Wake button on same row - MCP server: group capabilities and setup into separate cards, remove nested double border on client picker - Connectors: show connect errors as toast instead of inline card text, position toast inside panel, hide single-provider tab - Media providers: move Reload button to left-aligned small ghost button - Memory: info icon shows path on hover, Path copied badge inline; Extraction history and MEMORY.md as standalone collapsible cards; group header hidden when only one type visible - Pet grid cards: Adopt button hidden until hover, icon-only when adopted, description truncated to 2 lines, text fills full width via abs positioning - Agent cards: selected state uses accent border only, no background change - Add sun/moon icons 
to Appearance theme buttons (Light/Dark) - Shorten several hint strings for clarity Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): resolve i18n review comments from PR #1568 - Update settings.title and settings.envConfigure to localized "Execution mode" in all 17 non-English locale files - Add settings.memoryFlashPathCopied to all locales and use t() in MemorySection instead of hardcoded English "Path copied" - Add settings.agentModelHead to all locales and use t() in SettingsDialog for "Model for:" agent model row header Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): update tests to match settings dialog redesign - Add role prop to Toast (alert/status) so error toasts from ConnectorsBrowser are announced immediately by screen readers - Clear connectErrorToast on successful connector retry - Update SettingsDialog.execution tests: - Remove heading assertions for About and MCP server (headers were intentionally removed as duplicate nav labels) - Rewrite CLI env test to use codex-only fields (per-agent filtering means only selected agent's fields are shown) - Update Composio key hint text assertion to match shortened copy - Replace filter button click with select change for Type filter - Replace Configured/Unsupported/Integrated badge checks with updated assertions matching the new media provider UI - Replace disabled BFL row test with coming-soon section check - Update SettingsDialog.media test: remove Fal.ai input assertions (non-integrated providers no longer have editable fields) Co-authored-by: Cursor <cursoragent@cursor.com> * fix(web): unblock CI for #1568 Three small fixes to get Playwright back to green on the settings dialog redesign: 1. `en.ts`: revert `settings.envConfigure` to "Configure execution mode". 
This PR collapsed both `settings.title` (header gear) and `settings.envConfigure` (entry-side foot pill) to the same string "Execution mode", so `getByRole('button', { name: 'Execution mode' })` resolved to two elements and tripped Playwright strict mode in the three Composio-flow tests (entry-configuration-flows.test.ts:174, 228, 285). Restoring the distinct label also gives screen readers a clearer hint for the pill, which doubles as a status display. Non-English locales still alias the two keys; happy to follow up on those, but they don't gate the (English-only) Playwright suite. 2. entry-configuration-flows.test.ts:167 — `Connectors` heading is now rendered at `<h2>` in the modal-head (SettingsDialog.tsx:1545), with the inner `<h3>` removed by design (see comment around line 1448). Updated the assertion from `level: 3` to `level: 2`. 3. project-management-flows.test.ts:360 — same change for the `Pets` heading. Verified locally with `pnpm --filter @open-design/web typecheck` and `pnpm --filter @open-design/e2e typecheck`. The actual Playwright specs need the dev server up; I didn't rerun them here, but the locator changes are mechanical and match the new DOM. * fix(web): use exact match for Execution mode button locator Playwright's `getByRole({ name })` defaults to substring matching, so `{ name: 'Execution mode' }` still resolved to both the header gear (aria-label "Execution mode") and the entry-side foot pill (aria-label "Configure execution mode" — substring contains "Execution mode"). Strict mode tripped in the three composio-flow tests at lines 202, 257, and 319. Adding `exact: true` makes each call resolve to just the header gear, which opens the same dialog the foot pill does — the test outcomes are unchanged. 
--------- Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: shangxinyu1 <shangxinyu@refly.ai> Co-authored-by: lefarcen <935902669@qq.com>	2026-05-15 14:35:06 +08:00
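
The strict-mode failure described in the last two fixes of the commit above comes down to substring versus exact name matching. The toy model below is not Playwright's implementation; `matchesName`, `resolveButtons`, and the two labels are stand-ins chosen to mirror the header gear and foot pill from the commit message.

```typescript
// Toy model of accessible-name matching; names here are hypothetical.
// Default matching is modelled as case-insensitive substring search,
// exact matching as a case-sensitive whole-string comparison.
function matchesName(accessibleName: string, query: string, exact: boolean): boolean {
  const name = accessibleName.trim();
  if (exact) return name === query;
  return name.toLowerCase().includes(query.toLowerCase());
}

// The two buttons the commit message says were both resolved.
const buttonLabels = ["Execution mode", "Configure execution mode"];

function resolveButtons(query: string, exact: boolean): string[] {
  return buttonLabels.filter((label) => matchesName(label, query, exact));
}

// Substring matching finds both buttons, which is what trips strict mode.
const loose = resolveButtons("Execution mode", false);
// Exact matching leaves only the header gear.
const strict = resolveButtons("Execution mode", true);
```

In the specs themselves, the commit message states the fix was to add `exact: true` to the `getByRole` locator so that each call resolves to a single element.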
Fl0rencess	53148d52c8	feat(media): add SenseAudio TTS provider (#1633) * feat(media): add SenseAudio TTS provider Add SenseAudio (https://docs.senseaudio.cn) as a new TTS provider alongside ElevenLabs / MiniMax / FishAudio / Volcengine. Surfaced as the `senseaudio-tts` catalogue id, mapped on the wire to `senseaudio-tts-1.5-260319`, SenseAudio's flagship model with emotion / polyphonic-character / formula-reading / clone / text-generated voice support. Scope here is HTTP non-streaming (POST /v1/t2a_v2 with stream=false) only; SSE and WebSocket transports are intentionally out of scope. - Mirror provider + model entries in apps/daemon and apps/web registries (catalogue drift check stays green). - ENV_KEYS gets `OD_SENSEAUDIO_API_KEY` / `SENSEAUDIO_API_KEY` so the alias scheme matches every other integrated provider. - `renderSenseAudioTTS` in media.ts mirrors renderMinimaxTTS: Bearer auth, voice_setting / audio_setting body, hex-decoded audio under `data.audio`, base_resp envelope split from HTTP-level failures. - NewProjectPanel's audio supportedProviders allowlist now includes `senseaudio` so the picker actually surfaces the new entry. - Audio shape (mp3 / 32kHz / 128kbps / stereo) and default voice (`female_0033_b`) hard-coded for parity with the other TTS paths; MediaContext is unchanged. - New apps/daemon/tests/media-senseaudio.test.ts (8 specs) covers defaults, custom voice, default base URL fall-back, env-key path, missing-key error, base_resp failures, missing audio, and HTTP non-2xx; patterned on media-elevenlabs.test.ts. * docs(media): drop Chinese from SenseAudio provider comment Translate the model-capabilities line in the SenseAudio block comment (media.ts) into English. Keeping the source comments in a single language matches the rest of the daemon and avoids reviewer churn over mixed-locale prose. 
* fix(web): unblock openai and volcengine speech models in audio picker Per review on #1633, supportedModels()'s audio allowlist in NewProjectPanel was still filtering out gpt-4o-mini-tts (openai) and doubao-tts (volcengine) even though both are marked `integrated: true` in the shared media-models catalogue. Add the two ids so the picker matches the registry and the PR body's "alongside doubao-tts" claim holds true. * style(media): normalize speech hints to bare provider names Strip the trailing descriptions on the speech catalogue hints so every entry shows just the provider name (matching FishAudio / ElevenLabs / SenseAudio): `gpt-4o-mini-tts` → "OpenAI", `minimax-tts` → "MiniMax", `doubao-tts` → "Volcengine". Also move `gpt-4o-mini-tts` to the end of the list so the OpenAI entry sits after the upstream-focused providers, matching the recent picker grouping discussion on #1633. Mirrored in both apps/daemon/src/media-models.ts and apps/web/src/media/ models.ts; catalogue drift check + daemon (1848) + web (1150) suites all green.	2026-05-14 15:26:38 +08:00
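
The SenseAudio TTS commit above describes a response with hex-encoded audio under `data.audio` and a `base_resp` envelope for API-level failures. A minimal decoding sketch under those two facts follows; the `status_code` / `status_msg` field names, the convention that 0 means success, and the helper names `hexToBytes` and `decodeTtsAudio` are assumptions, not taken from the source.

```typescript
// Assumed response shape. Only data.audio (hex) and the existence of a
// base_resp envelope come from the commit message; the field names inside
// base_resp and the "0 means success" convention are assumptions.
interface TtsResponse {
  data?: { audio?: string };
  base_resp?: { status_code?: number; status_msg?: string };
}

// Convert a hex string into raw bytes, rejecting malformed input.
function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error("audio payload is not valid hex");
  }
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Separate API-level failures from a missing payload, then decode.
function decodeTtsAudio(body: TtsResponse): Uint8Array {
  const status = body.base_resp?.status_code ?? 0;
  if (status !== 0) {
    const msg = body.base_resp?.status_msg ?? "unknown error";
    throw new Error(`TTS request failed (${status}): ${msg}`);
  }
  const hex = body.data?.audio;
  if (!hex) throw new Error("TTS response contained no audio");
  return hexToBytes(hex);
}
```

HTTP-level failures (non-2xx) would be handled before this function is reached, which matches the split the commit message describes between the envelope and transport errors.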
kami	4f76e836ae	feat(audio): add ElevenLabs audio support (#1384) * docs: add ElevenLabs audio support design * docs: add ElevenLabs audio implementation plan * feat(daemon): add ElevenLabs speech renderer * feat(daemon): add ElevenLabs sound effects renderer * fix(daemon): preserve ElevenLabs sfx durations * feat(web): expose ElevenLabs media providers * feat(daemon): document ElevenLabs audio contract * feat(audio): add ElevenLabs voice selection * chore: ignore superpowers scratch docs * fix(daemon): cache ElevenLabs voice options * fix(audio): expand ElevenLabs voice and SFX selection * fix(audio): align ElevenLabs SFX controls * fix(audio): tighten ElevenLabs SFX prompt budget * fix(audio): preflight ElevenLabs SFX prompt length * fix(audio): surface ElevenLabs lookup failures * fix(audio): sanitize ElevenLabs prompt errors	2026-05-13 15:53:41 +08:00
Tom Huang	56bf6ee1b6	feat: agent-callable research command and /search (#615) * feat: pre-generation research (Tavily) for grounded generation Adds an optional pre-generation research step so the agent can produce slides / prototypes / decks grounded in real sources instead of guessing. User flow: 1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY). 2. Click the new Research button in the chat composer. 3. On send, the daemon runs a Tavily search, prepends the findings as a <research_context> block ahead of the system prompt, and spawns the agent. Research progress shows up as status pills in the chat stream; the agent cites sources inline as [1]/[2]/... Phase 1 surface: - Single provider (Tavily), single depth ('shallow'), no LLM synthesis pass (Tavily's `answer` is the summary). - Composer toggle only; no popover / depth picker yet. - Reuses the existing `status` SSE agent payload + StatusPill UI so no new event variants or renderer code are needed. Layers touched: - contracts: ResearchOptions / Source / Findings DTOs; ChatRequest.research; export from index. - daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator + provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook in startChatRun before prompt assembly. - web: ChatComposer toggle + ChatSendMeta; threaded through ChatPane / ProjectView / streamViaDaemon into ChatRequest. Side fix (required to land the feature, but useful on its own): contracts internal relative imports lacked the `.js` suffix that NodeNext module resolution requires. This was already breaking `pnpm --filter @open-design/daemon typecheck` on main; without the fix, none of the new research types were visible to the daemon. All internal contracts imports now carry `.js`. Spec: specs/current/research-feature.md (phases 2-4 outlined for follow-up: composer popover, multi-provider, deep recursion, example skills with research_recommends). 
Verified: - pnpm --filter @open-design/contracts typecheck/test - pnpm --filter @open-design/daemon typecheck (the chokidar project-watchers test is a pre-existing flake, unrelated) - pnpm --filter @open-design/web typecheck - node scripts/verify-media-models.mjs * fix(daemon): clamp Tavily max_results to 20 Tavily's /search endpoint requires `max_results` in [0, 20]; sending a larger value (e.g. when `research.depth: "deep"` resolves to 30) returns 400 and `runResearch` silently falls back to no-research. Clamp at the provider boundary so Phase 2 depth tiers above 20 still produce results instead of failing the request. Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code) * Remove stale research merge leftovers * Add agent-callable research search * Fix Indonesian locale typecheck * Fix research command invocation edge cases * Harden slash search prompt expansion * Honor research source caps in command contract * Require search reports in design files * Add research data provider settings * Wire web research provider fallback order * Update research provider fallback wording * Revert "Update research provider fallback wording" This reverts commit `86fb6001e3`. * Revert "Wire web research provider fallback order" This reverts commit `4c9e16036b`. * Revert "Add research data provider settings" This reverts commit `23630d1746`. * Add Dexter and Last30Days research skills * Add DCF and Last30Days OD skills * Add Last30Days and Dexter skills * Resolve research review threads --------- Co-authored-by: a1chzt <chizblank@gmail.com>	2026-05-08 10:33:44 +08:00
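
The Tavily fix in the commit above clamps `max_results` at the provider boundary because the /search endpoint accepts only values in [0, 20]. A minimal sketch of such a clamp follows; the constant and function names are hypothetical, and the handling of fractional or non-finite input is an assumption.

```typescript
// Upper bound taken from the commit message: max_results must be in [0, 20].
const TAVILY_MAX_RESULTS = 20;

// Hypothetical helper. Truncating fractions and mapping non-finite input to
// the cap are assumptions; the commit only states that values are clamped.
function clampMaxResults(requested: number): number {
  if (!Number.isFinite(requested)) return TAVILY_MAX_RESULTS;
  return Math.min(TAVILY_MAX_RESULTS, Math.max(0, Math.trunc(requested)));
}
```

With this in place, a depth tier that resolves to 30 would be sent as 20 instead of triggering the 400 response and the silent no-research fallback the commit message describes.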
zztdan	f3024fdc22	feat(media): add Nano Banana image provider (#631) * feat(media): add Nano Banana image provider * fix(media): support Gemini API key headers for Nano Banana * refactor(media): move Nano Banana model override flag into provider metadata	2026-05-06 20:26:31 +08:00
lefarcen	9e8177d80a	feat(media): integrate xAI Grok Imagine (image + video + native audio) (#276) * feat(media): integrate xAI Grok Imagine (image + video + native audio) Adds a real provider integration for xAI's Imagine API alongside OpenAI, Volcengine and HyperFrames. The route surfaces as two model entries: * grok-imagine-image: POST /v1/images/generations, synchronous, asks for b64_json so the bytes arrive in one round-trip; sniffs the returned magic bytes so JPEG payloads land with the right extension * grok-imagine-video: POST /v1/videos/generations + GET /v1/videos/{id} polling, with a 4s tick + onProgress heartbeat (mirrors the Volcengine handler so the agent's bash watchdog doesn't kill long polls). Native audio (AAC) ships in the same file as the H.264 video; that's the differentiator vs Seedance and Sora. Picker visibility is gated by NewProjectPanel's hardcoded surface→provider allowlist; grok is added to image + video. Settings UI picks up the provider automatically off MEDIA_PROVIDERS. Credentials: XAI_API_KEY (canonical, matches the official SDK) or OD_GROK_API_KEY override; both are honoured ahead of any value pasted into Settings, so users who already export XAI_API_KEY don't have to re-enter it in the UI. Verified end-to-end with a real key: * image: 1024×1024 JPEG, single round-trip * video: 5s 16:9 H.264 + AAC, ~46s wall clock, pending→done * fix(i18n): include pl in EXPECTED_LOCALES so locales.test passes Drive-by unblock for CI on this branch: `pl.ts` and `LOCALES`/`Locale` in types.ts already include Polish, but the test's hardcoded EXPECTED_LOCALES didn't, so every PR has been red on locales.test.ts since the locale was added. 
* fix(media): grok review feedback — credential precedence comment + poll timeout diagnostics Two small fixes from PR #276 review: * media-config.ts — comment claimed XAI_API_KEY won, but readEnvKey iterates the array in order so OD_GROK_API_KEY actually wins (matches every other provider's OD_* override convention). Rewrite the comment to describe what the code does instead of flipping the array — the precedence is correct, the comment was wrong. * media.ts renderGrokVideo — single throw at the bottom couldn't tell "timed out after N seconds with status still pending" from "submit returned neither inline video nor a request_id to poll." Split into two branches so operators know whether to bump OD_GROK_VIDEO_MAX_POLL_MS or file an upstream contract bug.	2026-05-02 17:06:17 +08:00
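
The Grok image path in the commit above sniffs the returned magic bytes so that JPEG payloads get the right file extension. One way such sniffing could look is sketched below; `sniffImageExtension` and `hasBytesAt` are hypothetical names, and the set of formats checked is an assumption (the commit only mentions JPEG).

```typescript
// Returns true when `sig` appears in `bytes` starting at `offset`.
function hasBytesAt(bytes: Uint8Array, sig: number[], offset = 0): boolean {
  if (bytes.length < offset + sig.length) return false;
  return sig.every((value, i) => bytes[offset + i] === value);
}

// Hypothetical helper: map well-known file signatures to an extension.
// Returns undefined when the payload matches none of them.
function sniffImageExtension(bytes: Uint8Array): "jpg" | "png" | "gif" | "webp" | undefined {
  if (hasBytesAt(bytes, [0xff, 0xd8, 0xff])) return "jpg";
  if (hasBytesAt(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (hasBytesAt(bytes, [0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  // WebP is a RIFF container: "RIFF" at offset 0 and "WEBP" at offset 8.
  if (hasBytesAt(bytes, [0x52, 0x49, 0x46, 0x46]) && hasBytesAt(bytes, [0x57, 0x45, 0x42, 0x50], 8)) {
    return "webp";
  }
  return undefined;
}
```

A caller would decode the b64_json payload to bytes first and fall back to a default extension when the function returns undefined; which default the daemon uses is not stated in the source.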
Tom Huang	3f266103b0	feat(media): port generation workflow onto main (#12) Co-authored-by: Elian <elian@EliandeMacBook-Pro.local>	2026-04-30 22:44:00 +08:00

13 commits