Commit graph

13 commits

Author SHA1 Message Date
maybeyourking
881571dea7
fix(media): route custom-image edits through images API (#3087)
* fix(media): route custom-image edits through images API

* fix(media): normalize custom-image endpoint suffixes

---------

Co-authored-by: Artist Ning <dingkuake@yeah.net>
Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-05-29 08:09:44 +00:00
nettee
6c6ee44e07
docs(media): clarify custom providers and ComfyUI status (#479) (#2942)
Closes #479

Generated-By: looper 0.9.0 (runner=worker, agent=opencode)
2026-05-26 06:22:02 +00:00
mzl163
210b94069a
feat(senseaudio): BYOK chat with image + video generation tools (#2065)
* feat(senseaudio): BYOK chat with image + video generation tools

Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.

- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
  connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
  (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
  periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
  default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
  key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
  DesignFilesPanel, and project export pick them up automatically;
  legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns ![alt](url) image syntax with a scheme
  allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
  1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
  (7), proxy-routes (47), markdown image rendering (8)
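
The scheme allowlist for `![alt](url)` described above can be pictured roughly as follows. This is a minimal sketch, not the renderer's actual code: the helper name `isAllowedImageUrl` and the choice to reject protocol-relative URLs are assumptions made for illustration.

```typescript
// Hypothetical helper; name and exact rules are assumptions, not the renderer's code.
const ALLOWED_PREFIXES = ["http://", "https://", "data:image/", "blob:"];

export function isAllowedImageUrl(raw: string): boolean {
  const url = raw.trim();
  if (url === "") return false;
  const lower = url.toLowerCase();
  if (ALLOWED_PREFIXES.some((prefix) => lower.startsWith(prefix))) return true;
  // Any other explicit scheme (javascript:, data:text/html, ...) is rejected.
  const hasScheme = /^[a-z][a-z0-9+.-]*:/i.test(url);
  // Assumption: protocol-relative URLs (//host/x) do not count as relative paths.
  return !hasScheme && !url.startsWith("//");
}
```

Under these assumptions a relative path such as `./img/a.png` passes, while `javascript:alert(1)` and `data:text/html,...` do not.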

* fix(senseaudio): unblock image gen + design file preview switching

- SenseAudio /v1/image/sync rejected the previous size mapping with
  `参数错误:size` ("parameter error: size"; 1664x936, 936x1664, 1280x960, 960x1280 are not in
  the gateway's accepted set). Switched to standard HD / SD sizes that
  every aspect bucket can hit: 1024×1024, 1280×720, 720×1280,
  1024×768, 768×1024. Kept the byok-tools and media.ts tables in sync
  so the BYOK chat tool and the CLI agent path both stop failing on
  non-square aspects.

- DesignFilesPanel's <DfPreview> was missing a key prop, so React
  reused the same iframe DOM node when the user picked a different
  file — the src prop changed but the iframe never navigated. Added
  key={previewFile.name} so the previous preview unmounts cleanly.

- Updated byok-tools + media-senseaudio-image tests for the new size
  expectations.
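
One way to picture the shared size table is a single lookup that both call sites import. The sizes come from the message above; the aspect-bucket names, the `WxH` string format, the pairing of each bucket with a size, and the square fallback are assumptions for illustration.

```typescript
// Hypothetical table; bucket names and the "WxH" wire format are assumptions.
const SIZE_BY_ASPECT = new Map<string, string>([
  ["1:1", "1024x1024"],
  ["16:9", "1280x720"],
  ["9:16", "720x1280"],
  ["4:3", "1024x768"],
  ["3:4", "768x1024"],
]);

export function sizeForAspect(aspect: string): string {
  // Assumption: unknown aspects fall back to the square size.
  return SIZE_BY_ASPECT.get(aspect) ?? "1024x1024";
}
```

Keeping one table avoids the drift between the BYOK chat tool and the CLI agent path that this fix had to repair.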

* docs(senseaudio): clear stale provider hint + update README

- Settings → Media → SenseAudio: clear the auto-promoted
  "Image · TTS · 70+ voices · clone" hint; the provider label alone is
  enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
  the BYOK section reflects what the daemon actually serves, and
  mention the generate_image / generate_video chat tools that ship
  with the SenseAudio path.

* fix(senseaudio): address PR #2065 review feedback

Three non-blocking review notes from @PerishCode on PR #2065:

1. Drop the dead /api/byok-image/:id route. The PR description claimed
   it was "legacy fallback for old chat history" but that storage
   layout never existed on main, so the route can only ever 400 or
   404 — never 200. Removed the handler, the isSafeByokImageId
   export, the unused createReadStream / stat / path / Request /
   Response imports, and the two byok-image regression tests.

2. Add rejectProxyPluginContext guard to the senseaudio proxy
   handler so it matches the invariant the other five proxy paths
   already enforce (plugin runs must go through /api/runs for
   snapshot pinning). Extended the existing "API fallback rejects
   plugin runs" describe to also cover /api/proxy/senseaudio/stream
   with the 409 PLUGIN_REQUIRES_DAEMON expectation.

3. Wrap the secondary image / video downloads (the URLs the
   SenseAudio gateway hands back in /v1/image/sync .url and
   /v1/video/status .video_url) in validateBaseUrlResolved so a
   malicious gateway can't point us at 169.254.169.254 (AWS / Azure
   metadata) or RFC1918 hosts via the response payload. Also passed
   `redirect: 'error'` on both fetches to match the SSRF posture
   the primary proxy fetch already uses. The new
   assertExternalAssetUrl helper lives next to executeGenerateImage
   so future tool downloads can reuse it.
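
The shape of the download guard in note 3 can be sketched as below. This is an illustrative stand-in, not `assertExternalAssetUrl` itself: the function names are hypothetical, only IPv4 literals are checked, and the real helper may also resolve DNS before deciding.

```typescript
// Hypothetical stand-in for the guard; checks IPv4 literals only, no DNS, no IPv6.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 || // 10.0.0.0/8 (RFC 1918)
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (RFC 1918)
    (a === 192 && b === 168) || // 192.168.0.0/16 (RFC 1918)
    (a === 169 && b === 254) || // link-local, includes 169.254.169.254
    a === 127 || // loopback
    a === 0
  );
}

export function checkExternalAssetUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("asset URL is not a valid URL");
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error("asset URL must use http(s)");
  }
  if (url.hostname === "localhost" || isPrivateIPv4(url.hostname)) {
    throw new Error("asset URL points at a private or loopback host");
  }
  return url;
}
```

A caller would run the gateway-returned URL through the check first and then fetch it with `redirect: 'error'`, so a 3xx hop cannot bypass the check.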

Tests: 120/120 daemon tests pass; guard + typecheck green.

* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path

Follow-up to 01b1260a — the chat-tool fix in byok-tools.ts wasn't
mirrored onto the parallel renderSenseAudioImage path in media.ts.
Same attacker-controllable shape (gateway-returned `data.url`),
same one-line fix.

- Hoist assertExternalAssetUrl from byok-tools.ts into
  connectionTest.ts next to validateBaseUrlResolved so both call
  sites (the BYOK chat tool loop AND the CLI agent media dispatcher)
  share one helper. Made the error strings provider-agnostic so a
  future caller doesn't get a misleading "senseaudio" attribution
  for a Volcengine / Grok / etc. download.
- renderSenseAudioImage now runs the response url through
  assertExternalAssetUrl before fetching bytes, and passes
  redirect: 'error' to block a 3xx hop into private space.

Scope intentionally limited to the senseaudio path PerishCode
flagged; the other unguarded fetch(entry.url) call sites in
media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing
patterns and belong in a separate follow-up if the daemon wants
defense-in-depth across every provider.

Tests: 127/127 daemon tests pass; guard + typecheck green.

---------

Co-authored-by: unknown <mazeliang@sensetime.com>
2026-05-19 23:14:56 +08:00
Joey-nexu
56988e406c
feat: integrate xAI SuperGrok subscription as a credential source for Grok media + X search (#2134)
* feat(daemon): add xAI OAuth client with PKCE + token storage

Wraps mcp-oauth.ts PKCE primitives for xAI's auth.x.ai OAuth server.
xAI doesn't speak MCP and doesn't expose Dynamic Client Registration,
so issuer / endpoints / client_id / scope / loopback :56121 are
hardcoded constants.
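
For readers unfamiliar with PKCE, the verifier/challenge pair can be sketched as follows. It assumes the S256 challenge method from RFC 7636; the function names are hypothetical and this is not the code in mcp-oauth.ts.

```typescript
import { createHash, randomBytes } from "node:crypto";

// base64url without padding, as RFC 7636 requires.
function base64Url(buf: Buffer): string {
  return buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Hypothetical name: 32 random bytes encode to a 43-character verifier.
export function newVerifier(): string {
  return base64Url(randomBytes(32));
}

// S256: challenge = BASE64URL(SHA256(ASCII(verifier))).
export function challengeFor(verifier: string): string {
  return base64Url(createHash("sha256").update(verifier, "ascii").digest());
}
```

The client sends the challenge with the authorize request and the verifier with the token exchange, so an intercepted authorization code is useless on its own.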

Adds xai-tokens.ts for persistent storage, mirroring mcp-tokens.ts:
atomic write + chmod 0600 + per-dataDir in-memory mutex. Simplified
for the single-token case (no per-server-id map).

Reference: NousResearch/hermes-agent hermes_cli/auth.py:93-100.
PoC reuses Hermes client_id (b1a00492-...); replace before stable
release once Open Design has its own.

Tests: 11 + 20, all green. tsc --noEmit clean. pnpm guard clean.

* feat(daemon): expose xAI Grok models in Hermes runtime fallbackModels

Lists grok-4.3, grok-4.20-reasoning, grok-4.20-non-reasoning, and
grok-4.20-multi-agent-0309 as discoverable Hermes fallback models.
A user who has not installed Hermes yet now sees these xAI options
in the model picker, signalling that `hermes auth add xai-oauth`
(SuperGrok subscription) or XAI_API_KEY unlocks Grok in Open Design
without OD itself implementing OAuth-for-chat.

`fetchModels` (which calls `hermes acp` to enumerate the user's
actually-installed providers) is unchanged; this list only kicks
in when probing fails (e.g. Hermes off PATH).

Reference: xAI × Nous Research grok-hermes integration announcement,
2026-05-15. https://x.ai/news/grok-hermes

* feat(media): route Grok Imagine through xAI OAuth credentials

Adds resolveXAIBearer() — a refresh-aware helper on top of the
xai-tokens.json store written by the daemon's OAuth client. Returns
a fresh access_token, transparently refreshing in-place when the
stored token enters the 120 s expiry skew window.
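
The skew-window check amounts to a single comparison. A minimal sketch, assuming the store keeps an absolute expiry in milliseconds (the `expiresAtMs` parameter and the function name are hypothetical):

```typescript
const EXPIRY_SKEW_MS = 120_000; // the 120 s window named in the message above

// Hypothetical helper: true once the token is inside the skew window or already expired.
export function needsRefresh(
  expiresAtMs: number,
  nowMs: number,
  skewMs: number = EXPIRY_SKEW_MS,
): boolean {
  return nowMs >= expiresAtMs - skewMs;
}
```

Refreshing slightly early keeps a request that starts just before expiry from failing mid-flight.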

Wires it into media-config.ts so the existing Grok provider gets the
same OAuth-fallback treatment OD already gives the OpenAI provider:
env keys win, then stored Settings keys, then OD-native xAI OAuth,
then a borrowed Hermes-side xai-oauth token from ~/.hermes/auth.json.
SuperGrok subscribers who already authorized Hermes get OD image /
video generation routed through their subscription with zero extra
setup.

Updates the "no xAI API key" error in renderGrokImage / renderGrokVideo
to point at the new OAuth path so users hitting it know they have a
zero-cost option.

Also exposes mediaConfigDir() so credential helpers next to
media-config.json (like xai-tokens.json) reuse the same precedence:
OD_MEDIA_CONFIG_DIR > OD_DATA_DIR > <projectRoot>/.od.

Tests: 7 new xai-credentials cases (refresh on expiry, refresh
failure, missing refresh_token, response without refresh_token) +
8 new media-config Grok OAuth fallback cases (OD-native, Hermes
borrow, OD vs Hermes precedence, env precedence, stored precedence,
unconfigured, expired-without-refresh). All green; tsc / guard clean.

* feat(media): add xAI Grok TTS provider

Registers grok-tts in the speech model catalog and wires up
renderXAITTS to dispatch (provider=grok, surface=audio, kind=speech)
to https://api.x.ai/v1/tts. xAI exposes a dedicated /tts endpoint
that returns raw audio bytes — distinct from OpenAI's /audio/speech
JSON shape — so TTS gets its own renderer rather than reusing
renderOpenAISpeech.

Credentials route through the same OAuth-aware path as Grok image
and video (PR follow-up to media-config.ts), so a SuperGrok
subscriber gets TTS for free once they have authorized once.

Default request body matches the documented minimal shape
(text / voice_id / language); sample_rate / bit_rate / codec are
left unset so the server applies its mp3 / 24 kHz / 128 kbps
defaults. Plumbing for explicit overrides is left for a later PR
once the agent-facing contract grows the corresponding flags.

Tests: 5 cases covering documented body shape, voice / language
override, env-key fallback, server-error surfacing, and the
no-credentials error. All green; tsc / guard clean.

Reference: https://docs.x.ai/developers/model-capabilities/audio/text-to-speech

* feat(daemon, web): expose xAI OAuth flow in Settings UI

Closes the loop on the Grok integration: a SuperGrok subscriber can
now authorize Open Design directly from Settings → Media Providers →
Grok, with no API key and no Hermes install. After authorizing, image,
video, and TTS routes pick up the bearer through the OAuth fallback
chain added in 'route Grok Imagine through xAI OAuth credentials'.

Daemon side
- xai-oauth-server.ts opens a one-shot HTTP listener on
  127.0.0.1:56121 to receive the OAuth callback. The redirect URI is
  hard-locked to that port because the PoC reuses the Hermes-issued
  client_id. Listener self-closes on first matching callback or after
  a 30 min timeout.
- xai-routes.ts wires three endpoints onto the daemon's HTTP app:
    POST /api/xai/oauth/start       — mint state, open listener,
                                       return authorize URL
    GET  /api/xai/auth/status       — has-token / expiry / in-flight
    POST /api/xai/oauth/disconnect  — wipe stored token, stop listener
- server.ts registers xai-routes alongside the existing mcp-routes.

Web side
- XaiOAuthControl.tsx renders a Sign in / Reconnect / Disconnect
  surface mirroring McpOAuthControl, but polls /api/xai/auth/status
  exclusively because the :56121 callback page lives in a separate
  process and can't postMessage back to the OD UI. SettingsDialog
  embeds it inside the Grok provider row.

Tests: 9 listener cases (bind / state mismatch / replay / favicon /
EADDRINUSE / timeout / explicit error param / one-shot consume /
early stop) + 8 route cases (start mints PKCE URL, second start
replaces in-flight listener, status reports listening + connected,
callback ok stores token, callback error skips storage, disconnect
wipes, cross-origin guard rejects all three endpoints). All 17 +
the 74 from prior commits pass; tsc / web typecheck / pnpm guard
clean.

PoC client_id stays Hermes-issued; user-visible strings are
hardcoded English pending an i18n pass before stable.

* fix(daemon, web): xAI OAuth follow-up — paste-back, X search, UX polish

PoC testing surfaced four real-world rough edges in the Sign in flow
that were not obvious before getting an actual SuperGrok subscription
in front of it. None alter the architecture in 'expose xAI OAuth flow
in Settings UI'; they round it off so the path the user actually walks
matches the one the design assumed.

1. Layout. XaiOAuthControl was a grid item inside .media-provider-body
   and got squeezed into the API-key column. Moves it out of the body
   so the row's flex-column layout gives it the full width — matches
   what every other Settings provider OAuth surface gets.

2. Paste-back. xAI's `auth.x.ai` page often shows a "cannot connect to
   your application" fallback that hands the user a code instead of
   redirecting back to 127.0.0.1:56121, even when the loopback listener
   is reachable (browser DOES quietly redirect in the background, but
   the page lies and shows the manual-paste UI anyway). Adds:
     - POST /api/xai/oauth/complete that takes {state, code} and runs
       completeXAIAuth + setXAIToken + stops the listener.
     - A paste-back input row in XaiOAuthControl that surfaces while
       the dance is in flight; submitting either via Enter or the
       button calls /complete and falls through to the same connected
       state the loopback path lands on.

3. X search. New POST /api/xai/search wraps Grok's native x_search tool
   through the Responses API, gated on the same OAuth-first credential
   chain as Grok image / video / TTS. Body accepts query (required),
   allowed_x_handles, excluded_x_handles, from_date, to_date, model.
   Returns { answer, citations[], model } parsed from the Responses
   payload via two newly exported helpers (extractAnswerText,
   extractUrlCitations).

4. State machine + warning banner. Three issues collapsed into one:
     - Polling that flipped busy → 'idle' the moment the loopback
       listener self-closed disabled the paste-back input even though
       the dance was still recoverable. Removed that branch; awaiting
       state now only ends on connected=true or explicit cancel.
     - paste-input `disabled` was over-eager (`busy !== 'awaiting' &&
       busy !== 'refreshing'`); now it's only blocked while a submit
       is in flight (`busy === 'refreshing'`).
     - Added a heads-up banner inside the awaiting region explaining
       that xAI's "cannot connect" page is a UX bug on their side and
       the OD panel is the source of truth for sign-in success. The
       connected message picks up the cue too: "You can close any
       open xAI browser tabs now."

Tests: +12 cases on top of the existing 17. The complete endpoint
covers happy path, blank-field rejection, and unknown-state error.
The search endpoint covers blank-query rejection, no-credentials 401,
full bearer / x_search-options forwarding with response parsing, and
upstream-error pass-through. Two helper functions get four direct
parser cases. All 29 in the file pass; 225 across the daemon test
suite pass; tsc / web tsc / pnpm guard all clean.

* fix(daemon): satisfy tsconfig.tests.json strictness in xai test files

The CI workspace typecheck step runs tsconfig.tests.json (which extends
tsconfig.json's strict + exactOptionalPropertyTypes settings and adds
the tests/ directory to the include set) — but the local
`tsc -p tsconfig.json --noEmit` I ran while iterating only covered
src/. That gap let three classes of strict-mode errors slip into the
PR's CI:

- `let outcome: CallbackOutcome | null = null` mutated from inside an
  async callback narrowed to `never` after `outcome?.kind` because TS
  doesn't track cross-function mutation. Switched the seven sites in
  xai-oauth-server.test.ts to a `{ current: CallbackOutcome | null }`
  ref object — TS does narrow .current correctly, so `kind` / `error`
  field access stops collapsing to `never`.
- `r.json()` is typed as `Promise<unknown>` in the lib.dom typings
  shipped with TS 5.x, so every `body.field` / `status.connected`
  access in xai-routes.test.ts tripped TS18046. Added a one-line
  `jsonOf<T = any>` helper at the top of the file and switched all
  call sites (both `await r.json()` and `.then((r) => r.json())`).
- The cross-origin guard test iterated `for (const [method, path] of
  [...])` — under noUncheckedIndexedAccess that destructures to
  `string | undefined`, which RequestInit.method (a `string` under
  exactOptionalPropertyTypes) won't accept. Hoisted the cases to a
  typed `ReadonlyArray<readonly [string, string]>` so the elements
  stay non-optional.

Behaviour is unchanged; vitest still reports 29/29 across these two
files. tsc -p tsconfig.tests.json --noEmit now passes locally,
matching what CI will run.

* fix(xai-oauth): preserve refresh_token + release :56121 on cancel

Two lifecycle issues Looper flagged on the prior commit:

1. resolveXAIBearer dropped the existing refresh_token whenever the
   refresh response omitted one. RFC 6749 §6 explicitly allows the
   server to skip refresh_token rotation and keep the old one valid;
   xAI's behaviour is currently to rotate, but a future change could
   silently break OD users. With the old code the first refresh
   succeeded but persisted a token with no refresh credential, so the
   next expiry forced the user back through Sign in even though their
   grant was still good. Carries the previous refresh_token forward
   when fresh.refresh_token is absent. Updates the matching
   xai-credentials test to assert the carried-forward value instead of
   the previous (incorrect) "drop it" assertion.

2. The Cancel button in XaiOAuthControl only cleared React-side
   pending state; the daemon's one-shot 127.0.0.1:56121 listener kept
   running for the full 30 min server timeout. /api/xai/auth/status
   would still report listening=true, and that singleton port could
   block the next Sign in (or a Hermes session on the same machine).
   Adds POST /api/xai/oauth/cancel that calls stopActiveListener()
   without touching the stored token (Disconnect is the destructive
   path; this is the narrow "release the port" affordance), wires the
   UI Cancel handler to fire it, and adds two route tests covering
   the listener-stopped-but-token-preserved invariant and the no-op
   behaviour when no listener is in flight.
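
The carry-forward rule from item 1 can be sketched as a pure merge step. The `access_token`, `refresh_token`, and `expires_in` fields are the standard OAuth token-response fields; the stored shape (`expires_at` in milliseconds) and the function name are assumptions.

```typescript
interface StoredToken {
  access_token: string;
  refresh_token?: string;
  expires_at: number; // assumed storage field, epoch milliseconds
}

interface RefreshResponse {
  access_token: string;
  refresh_token?: string; // may be omitted when the server does not rotate
  expires_in: number; // seconds
}

// Hypothetical helper: keep the previous refresh_token when the response omits one.
export function mergeRefreshed(
  prev: StoredToken,
  fresh: RefreshResponse,
  nowMs: number,
): StoredToken {
  const refreshToken = fresh.refresh_token ?? prev.refresh_token;
  return {
    access_token: fresh.access_token,
    expires_at: nowMs + fresh.expires_in * 1000,
    ...(refreshToken !== undefined ? { refresh_token: refreshToken } : {}),
  };
}
```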

All 38 xai tests + tsconfig.tests.json typecheck + web typecheck +
pnpm guard pass.

* fix(xai-oauth): close two more lifecycle gaps Looper flagged

Both are non-blocking but cheap and right.

1. window.open used 'noopener=no,noreferrer=no' (carried over from the
   sibling McpOAuthControl), which deliberately KEEPS the auth.x.ai
   tab's window.opener reference back to the Settings tab. Reverse
   tabnabbing risk if the auth page or any redirect target along the
   OAuth chain ever turns hostile, with no upside — the xAI flow
   doesn't use postMessage, the daemon receives the code through the
   :56121 listener (or paste-back), so opener access buys nothing.
   Switched to 'noopener,noreferrer'.

2. PendingAuthCache was constructed with its default 10 min TTL while
   the loopback listener self-closes at 30 min and the UI shows a
   pending state for the same 30 min. After 10 min, a user looking at
   a live paste-back input would hit `xAI OAuth state not found or
   expired` even though everything visible (and the daemon socket)
   still claimed the dance was live. Constructed the cache with
   30 * 60 * 1000 so the PKCE state, the open :56121 socket, and the
   paste-back UI all expire together.

The third inline comment (XaiOAuthControl.tsx:248 — "Cancel only
clears React-side state") was a stale reference: the previous commit
fd04887 wired the Cancel button to fire `cancelInFlightOAuth()` which
hits the new `POST /api/xai/oauth/cancel` endpoint. Looper carried
the old comment forward when re-reviewing the rebased file; no code
change needed.

All 38 xai tests still green; tsconfig.tests.json clean; web tsc
clean; pnpm guard clean.

* fix(xai-oauth): keep loopback listener open on stale-tab callbacks

The one-shot listener marked itself consumed at the top of every
/callback request, then closed itself in the finally block whether
or not the state actually matched. A stray browser tab replaying an
old /callback?state=… (real-world scenario: user re-clicked Sign in
before closing the previous tab) would therefore close the singleton
:56121 listener with a state-mismatch error before the real xAI
redirect could arrive.

Now we only tear the listener down on outcomes that actually
terminate the dance:
  - ok callback (matched state, code present)
  - explicit ?error= from xAI (auth provider terminated; we should
    propagate, not wait for the 30 min timeout). xAI's error
    redirects may or may not echo state, but a stale tab can't
    fabricate ?error= without colluding with the auth server, so
    this branch is safe to consume.

Stale tabs / browser prefetches / malformed redirects still get the
HTTP 400 / "Sign-in failed" page, but the listener stays open and
the matching xAI redirect that arrives next is what closes it.

Tests: replaces the previous "rejects state mismatch with kind=error"
test with the recovery scenario (stale-then-real callbacks both hit
the listener; only the real one fires onCallback). Adds a sibling
case for missing-code / missing-state callbacks. xai-oauth-server
suite is now 10/10; full xai sweep 39/39.

* fix(xai-oauth): scope error-callback consume to matching/missing state

c00252c simplified the consume rule to "any explicit ?error= closes
the listener", which was broader than the stale-tab protection added
in the same commit. A browser history replay of an old
`/callback?error=access_denied&state=stale` would set `consumed`,
fire `onCallback`, and tear down the singleton 127.0.0.1:56121 socket
before the current dance's real callback could land — undoing the
defence the commit was supposed to add.

Tighten the rule so error-callbacks consume only when:
  - the URL carries no state (xAI rejected before issuing one, so
    there's nothing to compare against — safe to terminate), or
  - the carried state matches our expectedState (xAI explicitly
    rejected this dance; propagate immediately rather than wait for
    the 30 min timeout).

An ?error= replay carrying a *different* state is now treated like
the stale success replay above: returns the 400 page to the browser,
keeps the listener live, lets the real callback close it.
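
The consume rule described in this commit and the previous one can be restated as one decision function. This is a sketch of the rule as written here, with hypothetical type and function names; the listener's actual code may be structured differently.

```typescript
type CallbackAction = "consume-ok" | "consume-error" | "ignore";

interface CallbackParams {
  state?: string;
  code?: string;
  error?: string;
}

// Hypothetical helper: decides whether a /callback hit ends the sign-in attempt.
export function classifyCallback(p: CallbackParams, expectedState: string): CallbackAction {
  if (p.error !== undefined) {
    // Error callbacks terminate only when they carry no state or the matching state.
    return p.state === undefined || p.state === expectedState ? "consume-error" : "ignore";
  }
  if (p.state === expectedState && p.code !== undefined && p.code !== "") {
    return "consume-ok";
  }
  // Stale tab, prefetch, or malformed redirect: answer 400 and keep listening.
  return "ignore";
}
```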

Tests: adds two cases — error+wrong-state followed by real success
must still resolve to ok; error+matching-state still consumes the
listener and surfaces the error to onCallback. xai-oauth-server
suite goes 10 → 12; full xai sweep 39 → 41.
2026-05-19 11:10:34 +08:00
Quang Do
13c8bc4193
feat(daemon): add OpenAI-compatible media providers (#1712)
* feat(daemon): add openai-compatible media providers

* fix(web): sync media registry with routed providers
2026-05-15 23:05:03 +08:00
Nicholas-Xiong
f78b0d3a2a
feat: add Leonardo.ai image provider integration (#1123)
* feat: add Leonardo.ai image provider integration

Implements Leonardo.ai as a fully supported image provider with the following models:
- Phoenix (leonardo-phoenix) - versatile general-purpose model
- Kino XL (leonardo-kino-xl) - cinematic quality
- FLUX Dev (leonardo-flux-dev) - FLUX.1 [dev]
- FLUX Schnell (leonardo-flux-schnell) - fast generation
- Anime Pastel Dream (leonardo-anime-pastel) - anime style

Features:
- Async generation with polling (2-minute timeout)
- Support for standard aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4
- Bearer token authentication
- Automatic image format detection

Implementation:
- Frontend: apps/web/src/media/models.ts
- Backend: apps/daemon/src/media-models.ts
- Renderer: renderLeonardoImage() function (~130 lines)
- Dispatcher: integrated into media generation pipeline

API Integration:
- Submit: POST /generations
- Poll: GET /generations/{id}
- Response: generations_by_pk.generated_images[0].url

Addresses #984

* fix: Add leonardo to MediaProviderId type and register env vars

1. Add 'leonardo' to MediaProviderId union in apps/web/src/media/models.ts
   to fix TypeScript build error
2. Register LEONARDO_API_KEY env vars in apps/daemon/src/media-config.ts
   following the same pattern as other providers

This fixes the CI TypeScript build failure and enables proper env-based
credential lookup for Leonardo.ai provider.

---------

Co-authored-by: lefarcen <935902669@qq.com>
2026-05-15 17:16:28 +08:00
Fl0rencess
53148d52c8
feat(media): add SenseAudio TTS provider (#1633)
* feat(media): add SenseAudio TTS provider

Add SenseAudio (https://docs.senseaudio.cn) as a new TTS provider
alongside ElevenLabs / MiniMax / FishAudio / Volcengine. Surfaced as
the `senseaudio-tts` catalogue id, mapped on the wire to
`senseaudio-tts-1.5-260319` — SenseAudio's flagship model with
emotion / polyphonic-character / formula read-aloud / clone / text-generated voice support.

Scope here is HTTP non-streaming (POST /v1/t2a_v2 with stream=false)
only; SSE and WebSocket transports are intentionally out of scope.

- Mirror provider + model entries in apps/daemon and apps/web
  registries (catalogue drift check stays green).
- ENV_KEYS gets `OD_SENSEAUDIO_API_KEY` / `SENSEAUDIO_API_KEY` so the
  alias scheme matches every other integrated provider.
- `renderSenseAudioTTS` in media.ts mirrors renderMinimaxTTS: Bearer
  auth, voice_setting / audio_setting body, hex-decoded audio under
  `data.audio`, base_resp envelope split from HTTP-level failures.
- NewProjectPanel's audio supportedProviders allowlist now includes
  `senseaudio` so the picker actually surfaces the new entry.
- Audio shape (mp3 / 32kHz / 128kbps / stereo) and default voice
  (`female_0033_b`) hard-coded for parity with the other TTS paths;
  MediaContext is unchanged.
- New apps/daemon/tests/media-senseaudio.test.ts (8 specs) covers
  defaults, custom voice, default base URL fall-back, env-key path,
  missing-key error, base_resp failures, missing audio, and HTTP
  non-2xx — patterned on media-elevenlabs.test.ts.
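
The response handling described for `renderSenseAudioTTS` can be sketched as below. Only `data.audio` (hex) and the existence of a `base_resp` envelope come from the message above; the `status_code` / `status_msg` field names, the "0 means success" convention, and the function names are assumptions.

```typescript
interface SenseAudioTtsResponse {
  data?: { audio?: string }; // hex-encoded audio bytes
  base_resp?: { status_code?: number; status_msg?: string }; // field names assumed
}

export function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) {
    throw new Error("audio payload is not valid hex");
  }
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Hypothetical helper: envelope errors are reported separately from HTTP-level failures.
export function extractAudio(body: SenseAudioTtsResponse): Uint8Array {
  const code = body.base_resp?.status_code;
  if (code !== undefined && code !== 0) {
    throw new Error(`provider error ${code}: ${body.base_resp?.status_msg ?? "unknown"}`);
  }
  const hex = body.data?.audio;
  if (!hex) throw new Error("response carried no audio");
  return hexToBytes(hex);
}
```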

* docs(media): drop Chinese from SenseAudio provider comment

Translate the model-capabilities line in the SenseAudio block comment
(media.ts) into English. Keeping the source comments in a single
language matches the rest of the daemon and avoids reviewer churn over
mixed-locale prose.

* fix(web): unblock openai and volcengine speech models in audio picker

Per review on #1633, supportedModels()'s audio allowlist in
NewProjectPanel was still filtering out gpt-4o-mini-tts (openai) and
doubao-tts (volcengine) even though both are marked `integrated: true`
in the shared media-models catalogue. Add the two ids so the picker
matches the registry and the PR body's "alongside doubao-tts" claim
holds true.

* style(media): normalize speech hints to bare provider names

Strip the trailing descriptions on the speech catalogue hints so every
entry shows just the provider name (matching FishAudio / ElevenLabs /
SenseAudio): `gpt-4o-mini-tts` → "OpenAI", `minimax-tts` → "MiniMax",
`doubao-tts` → "Volcengine". Also move `gpt-4o-mini-tts` to the end of
the list so the OpenAI entry sits after the upstream-focused providers,
matching the recent picker grouping discussion on #1633.

Mirrored in both apps/daemon/src/media-models.ts and apps/web/src/media/
models.ts; catalogue drift check + daemon (1848) + web (1150) suites all
green.
2026-05-14 15:26:38 +08:00
kami
4f76e836ae
feat(audio): add ElevenLabs audio support (#1384)
* docs: add ElevenLabs audio support design

* docs: add ElevenLabs audio implementation plan

* feat(daemon): add ElevenLabs speech renderer

* feat(daemon): add ElevenLabs sound effects renderer

* fix(daemon): preserve ElevenLabs sfx durations

* feat(web): expose ElevenLabs media providers

* feat(daemon): document ElevenLabs audio contract

* feat(audio): add ElevenLabs voice selection

* chore: ignore superpowers scratch docs

* fix(daemon): cache ElevenLabs voice options

* fix(audio): expand ElevenLabs voice and SFX selection

* fix(audio): align ElevenLabs SFX controls

* fix(audio): tighten ElevenLabs SFX prompt budget

* fix(audio): preflight ElevenLabs SFX prompt length

* fix(audio): surface ElevenLabs lookup failures

* fix(audio): sanitize ElevenLabs prompt errors
2026-05-13 15:53:41 +08:00
nettee
ef9ca7baff
fix(daemon): typecheck core server paths (#952)
2026-05-08 20:43:51 +08:00
Tom Huang
56bf6ee1b6
feat: agent-callable research command and /search (#615)
* feat: pre-generation research (Tavily) for grounded generation

Adds an optional pre-generation research step so the agent can produce
slides / prototypes / decks grounded in real sources instead of guessing.

User flow:
  1. Settings -> Tavily Search -> paste API key (or set TAVILY_API_KEY).
  2. Click the new Research button in the chat composer.
  3. On send, the daemon runs a Tavily search, prepends the findings
     as a <research_context> block ahead of the system prompt, and
     spawns the agent. Research progress shows up as status pills in
     the chat stream; the agent cites sources inline as [1]/[2]/...

Phase 1 surface:
  - Single provider (Tavily), single depth ('shallow'), no LLM
    synthesis pass (Tavily's `answer` is the summary).
  - Composer toggle only; no popover / depth picker yet.
  - Reuses the existing `status` SSE agent payload + StatusPill UI
    so no new event variants or renderer code are needed.

Layers touched:
  - contracts: ResearchOptions / Source / Findings DTOs;
    ChatRequest.research; export from index.
  - daemon: apps/daemon/src/research/{index,tavily}.ts orchestrator
    + provider; tavily added to MEDIA_PROVIDERS and ENV_KEYS; hook
    in startChatRun before prompt assembly.
  - web: ChatComposer toggle + ChatSendMeta; threaded through
    ChatPane / ProjectView / streamViaDaemon into ChatRequest.

Side fix (required to land the feature, but useful on its own):
  contracts internal relative imports lacked the `.js` suffix that
  NodeNext module resolution requires. This was already breaking
  `pnpm --filter @open-design/daemon typecheck` on main; without the
  fix, none of the new research types were visible to the daemon.
  All internal contracts imports now carry `.js`.

Spec: specs/current/research-feature.md (phases 2-4 outlined for
follow-up: composer popover, multi-provider, deep recursion, example
skills with research_recommends).

Verified:
  - pnpm --filter @open-design/contracts typecheck/test
  - pnpm --filter @open-design/daemon typecheck (the chokidar
    project-watchers test is a pre-existing flake, unrelated)
  - pnpm --filter @open-design/web typecheck
  - node scripts/verify-media-models.mjs

* fix(daemon): clamp Tavily max_results to 20

Tavily's /search endpoint requires `max_results` in [0, 20]; sending a
larger value (e.g. when `research.depth: "deep"` resolves to 30) returns
400 and `runResearch` silently falls back to no-research. Clamp at the
provider boundary so Phase 2 depth tiers above 20 still produce results
instead of failing the request.
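
The clamp at the provider boundary can be as small as the following sketch. The [0, 20] range is from the message above; the function name and the handling of non-integer or NaN input are assumptions.

```typescript
const TAVILY_MAX_RESULTS = 20; // upper bound stated in the message above

// Hypothetical helper: clamp a requested result count into [0, 20].
export function clampMaxResults(requested: number): number {
  // Assumption: NaN falls back to the cap rather than being forwarded.
  if (Number.isNaN(requested)) return TAVILY_MAX_RESULTS;
  return Math.min(TAVILY_MAX_RESULTS, Math.max(0, Math.trunc(requested)));
}
```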

Generated-By: looper 0.6.1 (runner=fixer, agent=claude-code)

* Remove stale research merge leftovers

* Add agent-callable research search

* Fix Indonesian locale typecheck

* Fix research command invocation edge cases

* Harden slash search prompt expansion

* Honor research source caps in command contract

* Require search reports in design files

* Add research data provider settings

* Wire web research provider fallback order

* Update research provider fallback wording

* Revert "Update research provider fallback wording"

This reverts commit 86fb6001e3.

* Revert "Wire web research provider fallback order"

This reverts commit 4c9e16036b.

* Revert "Add research data provider settings"

This reverts commit 23630d1746.

* Add Dexter and Last30Days research skills

* Add DCF and Last30Days OD skills

* Add Last30Days and Dexter skills

* Resolve research review threads

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
2026-05-08 10:33:44 +08:00
zztdan
f3024fdc22
feat(media): add Nano Banana image provider (#631)
* feat(media): add Nano Banana image provider

* fix(media): support Gemini API key headers for Nano Banana

* refactor(media): move Nano Banana model override flag into provider metadata
2026-05-06 20:26:31 +08:00
lefarcen
9e8177d80a
feat(media): integrate xAI Grok Imagine (image + video + native audio) (#276)
* feat(media): integrate xAI Grok Imagine (image + video + native audio)

Adds a real provider integration for xAI's Imagine API alongside OpenAI,
Volcengine and HyperFrames. The route surfaces as two model entries:

  * grok-imagine-image — POST /v1/images/generations, synchronous, asks
    for b64_json so the bytes arrive in one round-trip; sniffs the
    returned magic bytes so JPEG payloads land with the right extension
  * grok-imagine-video — POST /v1/videos/generations + GET /v1/videos/{id}
    polling, with a 4s tick + onProgress heartbeat (mirrors the
    Volcengine handler so the agent's bash watchdog doesn't kill long
    polls). Native audio (AAC) ships in the same file as the H.264
    video — that's the differentiator vs Seedance and Sora.
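
The magic-byte sniff mentioned for grok-imagine-image can be sketched like this. The JPEG and PNG signatures are standard; the function name and the `bin` fallback are assumptions, and the real renderer may recognise more formats.

```typescript
// Hypothetical helper: choose a file extension from the leading bytes of the payload.
export function sniffImageExtension(bytes: Uint8Array): "jpg" | "png" | "bin" {
  const startsWith = (sig: number[]): boolean => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0xff, 0xd8, 0xff])) return "jpg"; // JPEG start-of-image marker
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png"; // PNG signature
  return "bin"; // assumption: unknown payloads get a neutral extension
}
```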

Picker visibility is gated by NewProjectPanel's hardcoded surface→
provider allowlist; grok is added to image + video. Settings UI picks up
the provider automatically off MEDIA_PROVIDERS.

Credentials: XAI_API_KEY (canonical, matches the official SDK) or
OD_GROK_API_KEY override; both are honoured ahead of any value pasted
into Settings, so users who already export XAI_API_KEY don't have to
re-enter it in the UI.

Verified end-to-end with a real key:
  * image: 1024×1024 JPEG, single round-trip
  * video: 5s 16:9 H.264 + AAC, ~46s wall clock, pending→done

* fix(i18n): include pl in EXPECTED_LOCALES so locales.test passes

Drive-by unblock for CI on this branch — `pl.ts` and `LOCALES`/`Locale`
in types.ts already include Polish, but the test's hardcoded
EXPECTED_LOCALES didn't, so every PR has been red on
locales.test.ts since the locale was added.

* fix(media): grok review feedback — credential precedence comment + poll timeout diagnostics

Two small fixes from PR #276 review:

* media-config.ts — comment claimed XAI_API_KEY won, but readEnvKey
  iterates the array in order so OD_GROK_API_KEY actually wins
  (matches every other provider's OD_* override convention). Rewrite
  the comment to describe what the code does instead of flipping the
  array — the precedence is correct, the comment was wrong.

* media.ts renderGrokVideo — single throw at the bottom couldn't tell
  "timed out after N seconds with status still pending" from "submit
  returned neither inline video nor a request_id to poll." Split into
  two branches so operators know whether to bump
  OD_GROK_VIDEO_MAX_POLL_MS or file an upstream contract bug.
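
The "first entry in the array wins" behaviour behind the comment fix can be illustrated with a small sketch. The function below is a hypothetical stand-in for `readEnvKey`, whose real signature is not shown in this log.

```typescript
// Hypothetical stand-in: return the first non-empty value, in array order.
export function readEnvKeySketch(
  names: readonly string[],
  env: Record<string, string | undefined>,
): string | undefined {
  for (const name of names) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return undefined;
}
```

With the names ordered `["OD_GROK_API_KEY", "XAI_API_KEY"]`, the OD_* override is returned whenever both variables are set, which is the precedence the corrected comment describes.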
2026-05-02 17:06:17 +08:00
Tom Huang
3f266103b0
feat(media): port generation workflow onto main (#12)
Co-authored-by: Elian <elian@EliandeMacBook-Pro.local>
2026-04-30 22:44:00 +08:00