feat(senseaudio): BYOK chat with image + video generation tools (#2065)

* feat(senseaudio): BYOK chat with image + video generation tools

Adds SenseAudio as a first-class BYOK chat protocol and wires the daemon's
chat proxy with a tool loop so BYOK users can generate images and videos
without dropping to a CLI agent.

- BYOK protocol: new senseaudio tab + /api/proxy/senseaudio/stream route +
  connection-test + provider-models discovery (OpenAI-compatible wire)
- Tool loop: generate_image (synchronous /v1/image/sync) and generate_video
  (async /v1/video/create + 5s polling /v1/video/status, 10-min ceiling,
  periodic progress log every 30s)
- Settings dropdown + chat-composer dropdown for the BYOK image model
  default; generate_image's model enum lets the LLM override per call
- Seed-on-success: a successful BYOK chat call idempotently mirrors the
  key into media-config (preserves env-resolved + already-stored keys)
- Generated artifacts land in <projectsRoot>/<projectId>/ so FileViewer,
  DesignFilesPanel, and project export pick them up automatically;
  legacy /api/byok-image/:id route kept for old conversation links
- Markdown renderer learns ![alt](url) image syntax with a scheme
  allowlist (http(s) / data:image/ / blob: / relative paths)
- i18n key settings.byokImageModel across all 19 locales
- 3 SenseAudio image models registered (2.0, 1.0, doubao-seedream-5.0);
  1 video model (doubao-seedance-2.0)
- Tests: byok-tools (29), media-senseaudio-image (8), media-config seed
  (7), proxy-routes (47), markdown image rendering (8)
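
As a rough illustration of the generate_video polling listed above (5 s
interval, 10-min ceiling, one progress line every 30 s), here is a minimal
TypeScript sketch. `pollVideoJob`, `JobStatus`, and `PollOptions` are
hypothetical names invented for this example, and the status shape is an
assumption rather than the SenseAudio wire format; the loop that actually
ships is the one in the byok-tools source.

```typescript
// Hypothetical sketch: names and status shape are assumptions, not the
// daemon's real types.
type JobStatus = { state: 'pending' | 'done' | 'failed'; videoUrl?: string };

interface PollOptions {
  intervalMs: number; // 5000 per the commit message
  maxPolls: number; // 120 polls x 5 s = 10 min ceiling
  logEvery: number; // 6 polls = one progress line roughly every 30 s
  sleep: (ms: number) => Promise<void>; // injected so tests can skip real waits
  log: (line: string) => void;
}

type PollResult = { ok: true; url: string } | { ok: false; error: string };

async function pollVideoJob(
  checkStatus: () => Promise<JobStatus>,
  opts: PollOptions,
): Promise<PollResult> {
  for (let attempt = 1; attempt <= opts.maxPolls; attempt++) {
    const status = await checkStatus();
    if (status.state === 'done') {
      return status.videoUrl
        ? { ok: true, url: status.videoUrl }
        : { ok: false, error: 'video job finished without a url' };
    }
    if (status.state === 'failed') {
      return { ok: false, error: 'video job failed' };
    }
    // Emit a periodic progress line instead of one line per poll.
    if (attempt % opts.logEvery === 0) {
      opts.log(`video job still pending after ${attempt} polls`);
    }
    await opts.sleep(opts.intervalMs);
  }
  return { ok: false, error: `video job timed out after ${opts.maxPolls} polls` };
}
```

With intervalMs = 5000, maxPolls = 120, and logEvery = 6 this reproduces the
ceiling and log cadence described above. Returning a result object instead of
throwing mirrors the `{ ok, error }` convention the tool executors use.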

* fix(senseaudio): unblock image gen + design file preview switching

- SenseAudio /v1/image/sync rejected the previous size mapping with
  `参数错误:size` ("parameter error: size"): 1664x936, 936x1664,
  1280x960, and 960x1280 are not in the gateway's accepted set.
  Switched to standard HD / SD sizes that
  every aspect bucket can hit: 1024×1024, 1280×720, 720×1280,
  1024×768, 768×1024. Kept the byok-tools and media.ts tables in sync
  so the BYOK chat tool and the CLI agent path both stop failing on
  non-square aspects.

- DesignFilesPanel's <DfPreview> was missing a key prop, so React
  reused the same iframe DOM node when the user picked a different
  file — the src prop changed but the iframe never navigated. Added
  key={previewFile.name} so the previous preview unmounts cleanly.

- Updated byok-tools + media-senseaudio-image tests for the new size
  expectations.

* docs(senseaudio): clear stale provider hint + update README

- Settings → Media → SenseAudio: clear the auto-promoted
  "Image · TTS · 70+ voices · clone" hint; the provider label alone is
  enough now that the BYOK chat surface covers image + video tooling.
- README: list the new senseaudio (and missing ollama) proxy routes so
  the BYOK section reflects what the daemon actually serves, and
  mention the generate_image / generate_video chat tools that ship
  with the SenseAudio path.

* fix(senseaudio): address PR #2065 review feedback

Three non-blocking review notes from @PerishCode on PR #2065:

1. Drop the dead /api/byok-image/:id route. The PR description claimed
   it was "legacy fallback for old chat history" but that storage
   layout never existed on main, so the route can only ever 400 or
   404 — never 200. Removed the handler, the isSafeByokImageId
   export, the unused createReadStream / stat / path / Request /
   Response imports, and the two byok-image regression tests.

2. Add rejectProxyPluginContext guard to the senseaudio proxy
   handler so it matches the invariant the other five proxy paths
   already enforce (plugin runs must go through /api/runs for
   snapshot pinning). Extended the existing "API fallback rejects
   plugin runs" describe to also cover /api/proxy/senseaudio/stream
   with the 409 PLUGIN_REQUIRES_DAEMON expectation.

3. Wrap the secondary image / video downloads (the URLs the
   SenseAudio gateway hands back in /v1/image/sync .url and
   /v1/video/status .video_url) in validateBaseUrlResolved so a
   malicious gateway can't point us at 169.254.169.254 (AWS / Azure
   metadata) or RFC1918 hosts via the response payload. Also passed
   `redirect: 'error'` on both fetches to match the SSRF posture
   the primary proxy fetch already uses. The new
   assertExternalAssetUrl helper lives next to executeGenerateImage
   so future tool downloads can reuse it.
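
To make the address ranges in note 3 concrete, the sketch below shows one way
a gateway-returned URL could be screened before download. It is illustrative
only: `isPrivateOrLinkLocalIPv4` and `checkAssetUrl` are hypothetical helpers
written for this example, not the daemon's validateBaseUrlResolved or
assertExternalAssetUrl, and the sketch assumes IPv4 literals only (no DNS
resolution, no IPv6), which a production guard would also need to handle.

```typescript
// Hypothetical helpers for illustration; not the project's implementation.
type UrlCheck = { ok: true } | { ok: false; error: string };

function isPrivateOrLinkLocalIPv4(host: string): boolean {
  const parts = host.split('.');
  // Only dotted-quad literals are inspected; hostnames fall through.
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const [a, b] = parts.map(Number) as [number, number, number, number];
  if (a === 10) return true; // 10.0.0.0/8 (RFC 1918)
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12 (RFC 1918)
  if (a === 192 && b === 168) return true; // 192.168.0.0/16 (RFC 1918)
  if (a === 169 && b === 254) return true; // link-local, incl. 169.254.169.254
  return false;
}

function checkAssetUrl(raw: string): UrlCheck {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return { ok: false, error: 'asset url is not a valid URL' };
  }
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    return { ok: false, error: `asset url scheme not allowed: ${parsed.protocol}` };
  }
  if (isPrivateOrLinkLocalIPv4(parsed.hostname)) {
    return { ok: false, error: 'asset url points at a private or link-local host' };
  }
  return { ok: true };
}
```

A literal check like this cannot catch a public hostname that resolves to a
private address, which is presumably why the real guard works on the resolved
address.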

Tests: 120/120 daemon tests pass; guard + typecheck green.

* fix(senseaudio): mirror SSRF guard onto renderSenseAudioImage CLI path

Follow-up to 01b1260a — the chat-tool fix in byok-tools.ts wasn't
mirrored onto the parallel renderSenseAudioImage path in media.ts.
Same attacker-controllable shape (gateway-returned `data.url`),
same one-line fix.

- Hoist assertExternalAssetUrl from byok-tools.ts into
  connectionTest.ts next to validateBaseUrlResolved so both call
  sites (the BYOK chat tool loop AND the CLI agent media dispatcher)
  share one helper. Made the error strings provider-agnostic so a
  future caller doesn't get a misleading "senseaudio" attribution
  for a Volcengine / Grok / etc. download.
- renderSenseAudioImage now runs the response url through
  assertExternalAssetUrl before fetching bytes, and passes
  redirect: 'error' to block a 3xx hop into private space.
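
The validate-then-fetch order described above can be sketched as follows.
`downloadAsset`, `Check`, and `FetchLike` are hypothetical names for this
example; the validator and fetch implementation are injected so the sketch
makes no claim about the real assertExternalAssetUrl signature beyond the
`{ ok, error }` result shape visible at its call sites.

```typescript
// Hypothetical sketch of "validate first, then fetch with redirect: 'error'".
type Check = { ok: true } | { ok: false; error: string };

type FetchLike = (
  url: string,
  init: { redirect: 'error' },
) => Promise<{ ok: boolean; status: number; arrayBuffer: () => Promise<ArrayBuffer> }>;

type DownloadResult = { ok: true; bytes: Uint8Array } | { ok: false; error: string };

async function downloadAsset(
  url: string,
  validate: (url: string) => Promise<Check>,
  fetchImpl: FetchLike,
): Promise<DownloadResult> {
  // Reject before any network I/O if the URL fails the guard.
  const check = await validate(url);
  if (!check.ok) return { ok: false, error: check.error };
  try {
    // redirect: 'error' makes fetch reject on any 3xx, so a validated
    // public URL cannot bounce the request into private address space.
    const resp = await fetchImpl(url, { redirect: 'error' });
    if (!resp.ok) return { ok: false, error: `asset download ${resp.status}` };
    return { ok: true, bytes: new Uint8Array(await resp.arrayBuffer()) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Keeping the guard and the `redirect: 'error'` option in one function means a
new call site cannot adopt one without the other.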

Scope intentionally limited to the senseaudio path PerishCode
flagged; the other unguarded fetch(entry.url) call sites in
media.ts (OpenAI / Volcengine / Grok / Nano-Banana) are pre-existing
patterns and belong in a separate follow-up if the daemon wants
defense-in-depth across every provider.

Tests: 127/127 daemon tests pass; guard + typecheck green.

---------

Co-authored-by: unknown <mazeliang@sensetime.com>
mzl163 2026-05-19 23:14:56 +08:00 committed by GitHub
parent 431a5e2d79
commit 210b94069a
52 changed files with 3305 additions and 55 deletions


@ -63,7 +63,7 @@ OD stands on four open-source shoulders:
 | | What you get |
 |---|---|
 | **Coding-agent CLIs (16)** | Claude Code · Codex CLI · Devin for Terminal · Cursor Agent · Gemini CLI · OpenCode · Qwen Code · Qoder CLI · GitHub Copilot CLI · Hermes (ACP) · Kimi CLI (ACP) · Pi (RPC) · Kiro CLI (ACP) · Kilo (ACP) · Mistral Vibe CLI (ACP) · DeepSeek TUI — auto-detected on `PATH`, swap with one click |
-| **BYOK fallback** | Protocol-specific API proxy at `/api/proxy/{anthropic,openai,azure,google}/stream` — paste `baseUrl` + `apiKey` + `model`, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini, and the daemon normalizes SSE back to the same chat stream. Internal-IP/SSRF blocked at the daemon edge. |
+| **BYOK fallback** | Protocol-specific API proxy at `/api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` — paste `baseUrl` + `apiKey` + `model`, choose Anthropic / OpenAI / Azure OpenAI / Google Gemini / Ollama Cloud / SenseAudio, and the daemon normalizes SSE back to the same chat stream. SenseAudio chat additionally exposes `generate_image` and `generate_video` tools so the model can write rendered artifacts straight into the active project's folder. Internal-IP/SSRF blocked at the daemon edge. |
 | **Design systems built-in** | **129** — 2 hand-authored starters + 70 product systems (Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Anthropic, Apple, Cursor, Supabase, Figma, Xiaohongshu, …) from [`awesome-design-md`][acd2], plus 57 design skills from [`awesome-design-skills`][ads] added directly under `design-systems/` |
 | **Skills built-in** | **31** — 27 in `prototype` mode (web-prototype, saas-landing, dashboard, mobile-app, gamified-app, social-carousel, magazine-poster, dating-web, sprite-animation, motion-frames, critique, tweaks, wireframe-sketch, pm-spec, eng-runbook, finance-report, hr-onboarding, invoice, kanban-board, team-okrs, …) + 4 in `deck` mode (`guizang-ppt` · `simple-deck` · `replit-deck` · `weekly-update`). Grouped in the picker by `scenario`: design / marketing / operation / engineering / product / finance / hr / sale / personal. |
 | **Media generation** | Image · video · audio surfaces ship alongside the design loop. **gpt-image-2** (Azure / OpenAI) for posters, avatars, infographics, illustrated maps · **Seedance 2.0** (ByteDance) for cinematic 15s text-to-video and image-to-video · **HyperFrames** ([heygen-com/hyperframes](https://github.com/heygen-com/hyperframes)) for HTML→MP4 motion graphics (product reveals, kinetic typography, data charts, social overlays, logo outros). **93** ready-to-replicate prompts gallery — 43 gpt-image-2 + 39 Seedance + 11 HyperFrames — under [`prompt-templates/`](prompt-templates/), with preview thumbnails and source attribution. Same chat surface as code; outputs a real `.mp4` / `.png` chip into the project workspace. |
@ -304,7 +304,7 @@ Every layer is composable. Every layer is a file you can edit. Read [`apps/daemo
 | Frontend | Next.js 16 App Router + React 18 + TypeScript, Vercel-deployable |
 | Daemon | Node 24 · Express · SSE streaming · `better-sqlite3`; tables: `projects` · `conversations` · `messages` · `tabs` · `templates` |
 | Agent transport | `child_process.spawn`; typed-event parsers for `claude-stream-json` (Claude Code), `qoder-stream-json` (Qoder CLI), `copilot-stream-json` (Copilot), `json-event-stream` per-CLI parsers (Codex / Gemini / OpenCode / Cursor Agent), `acp-json-rpc` (Devin / Hermes / Kimi / Kiro / Kilo / Mistral Vibe via Agent Client Protocol), `pi-rpc` (Pi via stdio JSON-RPC), `plain` (Qwen Code / DeepSeek TUI) |
-| BYOK proxy | `POST /api/proxy/{anthropic,openai,azure,google}/stream` → provider-specific upstream APIs, normalized `delta/end/error` SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge |
+| BYOK proxy | `POST /api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` → provider-specific upstream APIs, normalized `delta/end/error` SSE; allows loopback local LLM providers, rejects non-loopback private/link-local/CGNAT/multicast/reserved hosts, and disables upstream redirects at the daemon edge |
 | Storage | Plain files in `.od/projects/<id>/` + SQLite at `.od/app.sqlite` + credentials at `.od/media-config.json` (gitignored, auto-created). `OD_DATA_DIR=<dir>` relocates all daemon data (used for test isolation and read-only-install setups); `OD_MEDIA_CONFIG_DIR=<dir>` further narrows the override to just `media-config.json` for setups that want to keep API keys outside the data dir |
 | Preview | Sandboxed iframe via `srcdoc` + per-skill `<artifact>` parser ([`apps/web/src/artifacts/parser.ts`](apps/web/src/artifacts/parser.ts)) |
 | Export | HTML (inline assets) · PDF (browser print, deck-aware) · PPTX (agent-driven via skill) · ZIP (archiver) · Markdown |
@ -872,7 +872,7 @@ Pattern is the same as the rest: pick a template, edit the brief, send. The agen
 The chat / artifact loop gets the spotlight, but a handful of less-visible capabilities are already wired and worth knowing before you compare OD to anything else:
 - **Claude Design ZIP import.** Drop an export from claude.ai onto the welcome dialog. `POST /api/import/claude-design` extracts it into a real `.od/projects/<id>/`, opens the entry file as a tab, and stages a continue-where-Anthropic-left-off prompt for your local agent. No re-prompting, no "ask the model to re-create what we just had". ([`apps/daemon/src/server.ts`](apps/daemon/src/server.ts) — `/api/import/claude-design`)
-- **Multi-provider BYOK proxy.** `POST /api/proxy/{anthropic,openai,azure,google}/stream` takes `{ baseUrl, apiKey, model, messages }`, builds the provider-specific upstream request, normalizes SSE chunks into `delta/end/error`, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry `/openai/v1`, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + `api-version`; Google uses Gemini `:streamGenerateContent`.
+- **Multi-provider BYOK proxy.** `POST /api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream` takes `{ baseUrl, apiKey, model, messages }`, builds the provider-specific upstream request, normalizes SSE chunks into `delta/end/error`, and allows loopback local LLM providers while rejecting non-loopback private, link-local, CGNAT, multicast, reserved, and redirect targets to head off SSRF. OpenAI-compatible covers OpenAI, Azure AI Foundry `/openai/v1`, DeepSeek, Groq, MiMo, OpenRouter, Ollama, LM Studio, and self-hosted vLLM; Azure OpenAI adds deployment URL + `api-version`; Google uses Gemini `:streamGenerateContent`.
 - **User-saved templates.** Once you like a render, `POST /api/templates` snapshots the HTML + metadata into the SQLite `templates` table. The next project picks it from a "your templates" row in the picker — same surface as the shipped 31, but yours.
 - **Tab persistence.** Every project remembers its open files and active tab in the `tabs` table. Reopen the project tomorrow and the workspace looks exactly the way you left it.
 - **Artifact lint API.** `POST /api/artifacts/lint` runs structural checks on a generated artifact (broken `<artifact>` framing, missing required side files, stale palette tokens) and returns findings the agent can read back into its next turn. The five-dim self-critique uses this to ground its score in real evidence, not vibes.
@ -974,7 +974,7 @@ Long-form provenance write-up — what we take from each, what we deliberately d
 - [x] Web app + chat + question form + 5-direction picker + todo progress + sandboxed preview
 - [x] 31 skills + 72 design systems + 5 visual directions + 5 device frames
 - [x] SQLite-backed projects · conversations · messages · tabs · templates
-- [x] Multi-provider BYOK proxy (`/api/proxy/{anthropic,openai,azure,google}/stream`) with SSRF guard
+- [x] Multi-provider BYOK proxy (`/api/proxy/{anthropic,openai,azure,google,ollama,senseaudio}/stream`) with SSRF guard
 - [x] Claude Design ZIP import (`/api/import/claude-design`)
 - [x] Sidecar protocol + Electron desktop with IPC automation (STATUS / EVAL / SCREENSHOT / CONSOLE / CLICK / SHUTDOWN)
 - [x] Artifact lint API + 5-dim self-critique pre-emit gate


@ -0,0 +1,598 @@
// Tool definitions and executors exposed to BYOK chat sessions.
//
// Why this file exists: the BYOK chat proxy (e.g. /api/proxy/senseaudio/stream)
// is a thin pass-through that doesn't have the agent-runtime scaffolding the
// CLI agents (Claude Code / Codex / ...) carry. To let users ask their BYOK
// chat to "draw me a cat" and get an actual rendered PNG back, the daemon
// injects an OpenAI-shaped `tools` definition into the upstream completion
// request, then loops on the model's tool_calls: execute → feed the result
// back as a `role: 'tool'` message → re-issue the completion. The chat surface
// stays the same; the tool dispatch happens entirely daemon-side.
//
// Today we ship two tools, `generate_image` and `generate_video`, backed by
// SenseAudio's /v1/image/sync and /v1/video/create + /v1/video/status
// endpoints, since the BYOK chat session already authenticates against
// SenseAudio with the same API key. Additional tools (TTS, research) can be
// added here as the BYOK surface expands.
import path from 'node:path';
import { writeFile } from 'node:fs/promises';
import { randomBytes } from 'node:crypto';
import { assertExternalAssetUrl } from './connectionTest.js';
import { resolveProviderConfig } from './media-config.js';
import { IMAGE_MODELS } from './media-models.js';
import { ensureProject } from './projects.js';
// SenseAudio image model allowlist — derived from the shared media-models
// registry so adding a new SenseAudio image model in one place (media-models)
// auto-extends the BYOK tool param enum, the Settings dropdown, and the
// daemon-side validation. No drift, no hand-maintained constant.
export const BYOK_SENSEAUDIO_IMAGE_MODELS: readonly string[] = IMAGE_MODELS
  .filter((m) => m.provider === 'senseaudio')
  .map((m) => m.id);
// Default falls back to the first entry from the registry (today
// `senseaudio-image-2.0-260319` — the multi-aspect latest). Kept as a
// computed constant so re-ordering the registry rotates the default
// without code edits here.
export const BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL =
  BYOK_SENSEAUDIO_IMAGE_MODELS[0] ?? 'senseaudio-image-2.0-260319';
export function isSenseAudioImageModel(value: unknown): value is string {
  return typeof value === 'string' && BYOK_SENSEAUDIO_IMAGE_MODELS.includes(value);
}
const SENSEAUDIO_DEFAULT_BASE_URL = 'https://api.senseaudio.cn';
const PROMPT_MAX_LENGTH = 2000;
// SenseAudio video — the API only documents one model today, so the
// wire id is a const. The chat tool's `generate_video` param surface
// (prompt, aspect_ratio, duration, resolution, generate_audio) covers
// every knob the doubao-seedance gateway accepts.
const SENSEAUDIO_VIDEO_MODEL = 'doubao-seedance-2-0-260128';
const SENSEAUDIO_VIDEO_ASPECT_RATIOS = ['16:9', '9:16', '4:3', '3:4', '1:1'] as const;
const SENSEAUDIO_VIDEO_RESOLUTIONS = ['480p', '720p', '1080p'] as const;
const SENSEAUDIO_VIDEO_DURATION_MIN = 4;
const SENSEAUDIO_VIDEO_DURATION_MAX = 15;
const SENSEAUDIO_VIDEO_DURATION_DEFAULT = 5;
// Polling: SenseAudio docs recommend 5-10 s intervals; we pick 5 s and
// cap total attempts so a stuck job can't pin the chat stream forever.
// 120 attempts × 5 s = 10 min ceiling, which covers the real-world
// doubao-seedance latency range (1080p + audio jobs frequently spend
// 3-8 min on the gateway). The earlier 5-min cap timed out otherwise
// valid jobs; a higher ceiling would make the chat surface feel stuck.
const SENSEAUDIO_VIDEO_POLL_INTERVAL_MS_DEFAULT = 5000;
const SENSEAUDIO_VIDEO_MAX_POLLS = 120;
// Periodic progress log every N polls so a long-running job emits some
// signal to the daemon log — without flooding it with one line per
// 5 s. 6 polls = ~30 s between progress lines.
const SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY = 6;
// SenseAudio's image gateway rejects non-standard pixel sizes with a 400
// `参数错误:size` ("parameter error: size"; verified against logs from a
// failed call on 2026-05-16). We stick to common 16-multiple HD / SD sizes
// that the gateway is known to accept: 1024×1024 for square, 1280×720 /
// 720×1280 for widescreen / portrait, 1024×768 / 768×1024 for the 4:3
// family. The table is duplicated in renderSenseAudioImage (media.ts) for
// the CLI-agent path so both surfaces stay in sync.
const ASPECT_TO_SIZE: Record<string, string> = {
  '1:1': '1024x1024',
  '16:9': '1280x720',
  '9:16': '720x1280',
  '4:3': '1024x768',
  '3:4': '768x1024',
};
/**
 * OpenAI-compatible tool definitions for image and video generation.
 * Injected into the upstream `tools` array on every
 * /api/proxy/senseaudio/stream request so the LLM can decide on its own
 * when to call them. The generate_image description deliberately tells
 * the model to embed the returned URL in markdown: the chat UI already
 * renders markdown images inline, so no client-side wiring is required
 * for the bytes to show up.
 */
export const BYOK_SENSEAUDIO_TOOLS = [
  {
    type: 'function' as const,
    function: {
      name: 'generate_image',
      description:
        'Generate an image from a text prompt using SenseAudio image models. Returns a URL pointing to the rendered PNG. After this tool succeeds, embed the URL in your reply with markdown image syntax — ![alt](url) — so the user sees the image inline. Use this whenever the user asks to draw, create, generate, design, or illustrate something visual.',
      parameters: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description:
              'Detailed visual description of the image (Chinese or English are both fine). Include subject, style, lighting, composition. Maximum 2000 characters.',
          },
          aspect_ratio: {
            type: 'string',
            enum: ['1:1', '16:9', '9:16', '4:3', '3:4'],
            description:
              'Output aspect ratio. 1:1 for square avatars and product shots, 16:9 for hero banners, 9:16 for vertical phone posters, 4:3 for editorial covers, 3:4 for posters. Defaults to 1:1 when omitted.',
          },
          model: {
            type: 'string',
            enum: [...BYOK_SENSEAUDIO_IMAGE_MODELS],
            description:
              'Optional model override. Omit this to use the user-configured default from Settings (or the SenseAudio 2.0 multi-aspect model when unset). Choose senseaudio-image-2.0-260319 for multi-aspect generation, senseaudio-image-1.0-260319 for standard sizes, or doubao-seedream-5-0-260128 for high-resolution output through the ByteDance Seedream gateway. The user explicitly picked a default in their Settings — only override when the user asks for a different style/resolution.',
          },
        },
        required: ['prompt'],
      },
    },
  },
  {
    type: 'function' as const,
    function: {
      name: 'generate_video',
      description:
        'Generate a short video (4-15 seconds) from a text prompt using SenseAudio\'s ByteDance Seedance gateway. This is an asynchronous call that can take 30 s to a few minutes — the daemon polls the job for you, so the user just sees the chat waiting. After this tool succeeds, embed the returned URL in your reply as a markdown link, e.g. `[▶ Play video](url)`, because the chat\'s markdown renderer does not currently render `<video>` tags inline. Use this whenever the user asks for a video, clip, animation, or motion graphic.',
      parameters: {
        type: 'object',
        properties: {
          prompt: {
            type: 'string',
            description:
              'Detailed motion description of the video. Include subject, action / camera move / scene transitions, style, lighting. Chinese or English. Maximum 2000 characters.',
          },
          aspect_ratio: {
            type: 'string',
            enum: [...SENSEAUDIO_VIDEO_ASPECT_RATIOS],
            description:
              'Output aspect ratio. 16:9 for cinematic, 9:16 for vertical (phone / TikTok), 1:1 for social square, 4:3 / 3:4 for editorial. Defaults to 16:9.',
          },
          duration: {
            type: 'integer',
            minimum: SENSEAUDIO_VIDEO_DURATION_MIN,
            maximum: SENSEAUDIO_VIDEO_DURATION_MAX,
            description:
              `Video length in seconds (integer). Allowed range ${SENSEAUDIO_VIDEO_DURATION_MIN}-${SENSEAUDIO_VIDEO_DURATION_MAX}; defaults to ${SENSEAUDIO_VIDEO_DURATION_DEFAULT}. Shorter durations finish faster.`,
          },
          resolution: {
            type: 'string',
            enum: [...SENSEAUDIO_VIDEO_RESOLUTIONS],
            description:
              'Output resolution. 480p (fastest), 720p (default, balanced), 1080p (best quality, slowest). Pick 1080p only when the user explicitly asks for high resolution.',
          },
          generate_audio: {
            type: 'boolean',
            description:
              'Whether the model also synthesises an audio track for the clip (background sound, ambience). Defaults to false to keep generation fast; flip to true when the user asks for sound, music, or a "video with audio".',
          },
        },
        required: ['prompt'],
      },
    },
  },
];
/**
 * Runtime context the BYOK tool executor needs. Passed by the chat
 * route on every call so the tool layer stays free of global state and
 * can be unit-tested with a temp directory.
 */
export interface BYOKToolContext {
  /** Daemon project root used to look up media-config when the chat
   * session key is missing. */
  projectRoot: string;
  /** Daemon's PROJECTS_DIR (the `<projectRoot>/.od/projects/` folder
   * that holds per-project file trees). Generated images land in
   * `<projectsRoot>/<projectId>/byok-<id>.png` so the project's
   * FileViewer / DesignFilesPanel discover them automatically and
   * the file travels with the project on export, archive, rename. */
  projectsRoot: string;
  /** Active project id from the chat surface. Required: the BYOK
   * chat always runs inside a project, so the tool dispatch refuses
   * to fire without one rather than dump bytes into a global cache.
   * Validated upstream via `isSafeId`. */
  projectId: string;
  /** The BYOK chat session's API key, the first credential we try.
   * Bypasses the media-config indirection so the key the user just
   * pasted for chat is the same key the image call uses. */
  upstreamApiKey: string;
  /** The BYOK chat session's base URL (may be a custom gateway). Falls
   * back to api.senseaudio.cn. */
  upstreamBaseUrl?: string;
  /** Default image model the user picked in BYOK Settings, used when the
   * LLM didn't pass `model` in tool args. Validated upstream: anything
   * outside `BYOK_SENSEAUDIO_IMAGE_MODELS` is dropped so a stale
   * client-side config can't smuggle an unregistered model id through.
   * Falls back to `BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL` (the registry's
   * first SenseAudio image entry) when missing. */
  defaultImageModel?: string;
  /** Test-only override for the video polling interval (ms). Production
   * uses 5 s (SenseAudio's recommendation); tests pass small values
   * (e.g. 1 ms) to keep the suite fast without changing the polling
   * semantics. */
  videoPollIntervalMs?: number;
}
export interface ImageToolResult {
  ok: boolean;
  /** Daemon-served URL on success. */
  url?: string;
  /** Short human-readable failure reason. Stuffed into the `tool` role
   * reply so the LLM can apologize / retry. */
  error?: string;
}
function sanitizeAspectRatio(raw: unknown): string {
  if (typeof raw !== 'string') return '1:1';
  return ASPECT_TO_SIZE[raw] ? raw : '1:1';
}
/**
 * Execute the `generate_image` tool. Calls SenseAudio /v1/image/sync,
 * downloads the rendered bytes, writes them to
 * <projectsRoot>/<projectId>/byok-<id>.png, and returns a daemon-served
 * URL. Pure async: the caller is responsible for emitting any SSE
 * events (e.g. "tool result ready").
 *
 * Failure modes return `{ok: false, error}` rather than throwing so the
 * caller can feed the message back to the LLM as a tool_result; that
 * lets the model apologize / suggest a retry instead of the chat
 * silently stopping.
 */
export async function executeGenerateImage(
  args: { prompt?: unknown; aspect_ratio?: unknown; model?: unknown },
  ctx: BYOKToolContext,
): Promise<ImageToolResult> {
  const promptRaw = typeof args.prompt === 'string' ? args.prompt.trim() : '';
  if (!promptRaw) return { ok: false, error: 'prompt is required' };
  const prompt =
    promptRaw.length > PROMPT_MAX_LENGTH
      ? promptRaw.slice(0, PROMPT_MAX_LENGTH)
      : promptRaw;
  const aspect = sanitizeAspectRatio(args.aspect_ratio);
  const size = ASPECT_TO_SIZE[aspect];
  // Model resolution order — LLM args > user's Settings default > registry
  // default. The allowlist guards every step so a hallucinated or stale id
  // can never reach the senseaudio /v1/image/sync wire — the catalogue is
  // the source of truth.
  const senseAudioImageModel = isSenseAudioImageModel(args.model)
    ? args.model
    : isSenseAudioImageModel(ctx.defaultImageModel)
      ? ctx.defaultImageModel
      : BYOK_SENSEAUDIO_DEFAULT_IMAGE_MODEL;
  // Resolve the project folder up front. ensureProject runs
  // `isSafeId` internally, so an attacker who somehow bypassed the
  // chat-routes guard and slipped `../escape` into projectId fails
  // here before we make any upstream call. The returned `dir` is
  // reused at writeFile time below.
  let dir: string;
  try {
    dir = await ensureProject(ctx.projectsRoot, ctx.projectId);
  } catch (err) {
    return {
      ok: false,
      error: `invalid projectId for image storage: ${err instanceof Error ? err.message : String(err)}`,
    };
  }
  // Prefer the BYOK session's key (what the user is actively using).
  // Fall back to media-config (env var > stored) so a user who set
  // OD_SENSEAUDIO_API_KEY but forgot to fill the chat panel still
  // gets a working tool call.
  let apiKey = ctx.upstreamApiKey;
  let baseUrl = ctx.upstreamBaseUrl || SENSEAUDIO_DEFAULT_BASE_URL;
  if (!apiKey) {
    const resolved = await resolveProviderConfig(ctx.projectRoot, 'senseaudio');
    apiKey = resolved.apiKey || '';
    if (resolved.baseUrl) baseUrl = resolved.baseUrl;
  }
  if (!apiKey) {
    return { ok: false, error: 'no SenseAudio API key available' };
  }
  const trimmedBase = baseUrl.replace(/\/+$/, '');
  let imageUrl: string;
  try {
    const resp = await fetch(`${trimmedBase}/v1/image/sync`, {
      method: 'POST',
      headers: {
        authorization: `Bearer ${apiKey}`,
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: senseAudioImageModel,
        prompt,
        size,
      }),
    });
    if (!resp.ok) {
      const text = await resp.text().catch(() => '');
      return {
        ok: false,
        error: `senseaudio image ${resp.status}: ${text.slice(0, 240)}`,
      };
    }
    const data = (await resp.json()) as {
      url?: string;
      error_message?: string;
      base_resp?: { status_code?: number; status_msg?: string };
    };
    if (data?.base_resp && data.base_resp.status_code !== 0) {
      return {
        ok: false,
        error: `senseaudio image api error ${data.base_resp.status_code}: ${data.base_resp.status_msg || 'unknown'}`,
      };
    }
    if (typeof data?.error_message === 'string' && data.error_message) {
      return { ok: false, error: `senseaudio image: ${data.error_message}` };
    }
    if (typeof data?.url !== 'string' || !data.url) {
      return { ok: false, error: 'senseaudio image response missing url' };
    }
    imageUrl = data.url;
  } catch (err) {
    return {
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    };
  }
  const imageUrlCheck = await assertExternalAssetUrl(imageUrl);
  if (!imageUrlCheck.ok) return { ok: false, error: imageUrlCheck.error };
  let bytes: Buffer;
  try {
    const imgResp = await fetch(imageUrl, { redirect: 'error' });
    if (!imgResp.ok) {
      return { ok: false, error: `image download ${imgResp.status}` };
}
bytes = Buffer.from(await imgResp.arrayBuffer());
} catch (err) {
return {
ok: false,
error: `image download failed: ${err instanceof Error ? err.message : String(err)}`,
};
}
if (bytes.length === 0) {
return { ok: false, error: 'image download returned zero bytes' };
}
// Persist into the active project's folder. `dir` was resolved up
// front via ensureProject — no DB write, no metadata side-effects —
// and the resulting path slots straight into the existing project
// file plumbing: listFiles enumerates it for the FileViewer,
// readProjectFile serves it via GET /api/projects/<id>/files/<filename>,
// and project archive / export pick it up automatically because it
// lives under the project's own directory.
//
// Filename pattern `byok-<timestamp>-<random>.png` keeps tool
// outputs distinguishable from user uploads at a glance while
// staying url-safe.
const id = `${Date.now().toString(36)}-${randomBytes(4).toString('hex')}`;
const filename = `byok-${id}.png`;
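// Disk failures follow the same `{ok: false}` contract as network failures.
try {
await writeFile(path.join(dir, filename), bytes);
} catch (err) {
return {
ok: false,
error: `image write failed: ${err instanceof Error ? err.message : String(err)}`,
};
}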
// Return a relative URL through the project file serving route. The
// web's Next.js rewrites `/api/:path*` to the daemon (see
// apps/web/next.config.ts), so the chat UI loads the image
// same-origin — satisfying the strict CSP (`img-src 'self' data:
// blob:`) without any CORS plumbing.
return {
ok: true,
url: `/api/projects/${encodeURIComponent(ctx.projectId)}/files/${filename}`,
};
}
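
The model resolution order used by `executeGenerateImage` (LLM argument, then the Settings default, then the registry default, each checked against the allowlist) can be sketched in isolation. This is a minimal illustration, not the code from `byok-tools.ts`: the names `IMAGE_MODEL_IDS`, `isImageModel`, and `resolveImageModel` are hypothetical, and treating the first id as the registry default is an assumption.

```typescript
// Hypothetical allowlist. The ids mirror the three SenseAudio image models
// registered in this change; the constant and function names are illustrative.
const IMAGE_MODEL_IDS = [
  'senseaudio-image-2.0-260319',
  'senseaudio-image-1.0-260319',
  'doubao-seedream-5-0-260128',
] as const;
type ImageModelId = (typeof IMAGE_MODEL_IDS)[number];

// Assumption: the registry default is the first entry of the list.
const DEFAULT_IMAGE_MODEL: ImageModelId = IMAGE_MODEL_IDS[0];

// Type guard: only strings that appear in the allowlist pass.
function isImageModel(raw: unknown): raw is ImageModelId {
  return typeof raw === 'string' && (IMAGE_MODEL_IDS as readonly string[]).includes(raw);
}

// Precedence: per-call LLM argument > user's Settings default > registry default.
function resolveImageModel(llmArg: unknown, settingsDefault: unknown): ImageModelId {
  if (isImageModel(llmArg)) return llmArg;
  if (isImageModel(settingsDefault)) return settingsDefault;
  return DEFAULT_IMAGE_MODEL;
}
```

Because every step goes through the same guard, a hallucinated id from the model simply falls through to the next candidate instead of reaching the upstream request.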
function sanitizeVideoAspectRatio(raw: unknown): (typeof SENSEAUDIO_VIDEO_ASPECT_RATIOS)[number] {
if (typeof raw !== 'string') return '16:9';
return (SENSEAUDIO_VIDEO_ASPECT_RATIOS as readonly string[]).includes(raw)
? (raw as (typeof SENSEAUDIO_VIDEO_ASPECT_RATIOS)[number])
: '16:9';
}
function sanitizeVideoResolution(raw: unknown): (typeof SENSEAUDIO_VIDEO_RESOLUTIONS)[number] {
if (typeof raw !== 'string') return '720p';
return (SENSEAUDIO_VIDEO_RESOLUTIONS as readonly string[]).includes(raw)
? (raw as (typeof SENSEAUDIO_VIDEO_RESOLUTIONS)[number])
: '720p';
}
function sanitizeVideoDuration(raw: unknown): number {
if (typeof raw !== 'number' || !Number.isFinite(raw)) return SENSEAUDIO_VIDEO_DURATION_DEFAULT;
const rounded = Math.round(raw);
if (rounded < SENSEAUDIO_VIDEO_DURATION_MIN) return SENSEAUDIO_VIDEO_DURATION_MIN;
if (rounded > SENSEAUDIO_VIDEO_DURATION_MAX) return SENSEAUDIO_VIDEO_DURATION_MAX;
return rounded;
}
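
The three sanitizers above share one idea: never reject a tool call over a bad optional argument, coerce it to a safe value instead. A minimal sketch of the duration clamp follows. The bounds `DURATION_MIN`, `DURATION_MAX`, and `DURATION_DEFAULT` are hypothetical placeholder values, since the real `SENSEAUDIO_VIDEO_DURATION_*` constants are not shown in this diff.

```typescript
// Placeholder bounds for illustration only; the real constants are defined
// elsewhere and may differ.
const DURATION_MIN = 4;
const DURATION_MAX = 12;
const DURATION_DEFAULT = 5;

// Non-numbers and non-finite numbers fall back to the default; everything
// else is rounded to an integer and clamped into [DURATION_MIN, DURATION_MAX].
function clampDuration(raw: unknown): number {
  if (typeof raw !== 'number' || !Number.isFinite(raw)) return DURATION_DEFAULT;
  return Math.min(DURATION_MAX, Math.max(DURATION_MIN, Math.round(raw)));
}
```

With these placeholder bounds, a string such as `'8'` yields the default rather than being parsed, which matches the strict `typeof raw !== 'number'` check in `sanitizeVideoDuration`.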
const sleep = (ms: number): Promise<void> =>
new Promise((resolve) => setTimeout(resolve, ms));
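
The commit description quotes a 5 s poll interval, a 10-minute ceiling, and a progress log every 30 s. The arithmetic that turns those figures into loop bounds can be sketched as below. `pollBudget` is a hypothetical helper, and the derived values are plain arithmetic on the quoted figures, not confirmed values of `SENSEAUDIO_VIDEO_MAX_POLLS` or `SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY`.

```typescript
// Hypothetical helper: derive poll-loop bounds from wall-clock targets.
function pollBudget(ceilingMs: number, intervalMs: number, logEveryMs: number) {
  if (intervalMs <= 0) throw new RangeError('intervalMs must be positive');
  return {
    // Number of sleep-then-poll iterations that fit under the ceiling.
    maxPolls: Math.ceil(ceilingMs / intervalMs),
    // Emit a progress line every N-th poll, never less often than each poll.
    logEvery: Math.max(1, Math.round(logEveryMs / intervalMs)),
  };
}

// Figures taken from the commit description: 10 min ceiling, 5 s interval, 30 s log cadence.
const budget = pollBudget(10 * 60_000, 5_000, 30_000);
```

Since each iteration also spends time on the status request itself, the real wall-clock ceiling is somewhat longer than `maxPolls * intervalMs`.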
/**
* Execute the `generate_video` tool. SenseAudio's video API is
* asynchronous-only: POST /v1/video/create returns a task_id, then
* GET /v1/video/status?id=<task_id> reports `pending` / `processing` /
* `completed` (with `video_url`) or `failed` (with `error_message`).
* We poll every `videoPollIntervalMs` (default 5 s) and bail after
* `SENSEAUDIO_VIDEO_MAX_POLLS` so a stuck upstream can't pin the
* chat stream forever.
*
* The chat tool waits for the whole loop, so the daemon's outbound
* SSE response from /api/proxy/senseaudio/stream stays open for the
* duration. That's intentional: the next chat turn cannot begin
* until we have a URL to feed back into the tool_result.
*/
export async function executeGenerateVideo(
args: {
prompt?: unknown;
aspect_ratio?: unknown;
duration?: unknown;
resolution?: unknown;
generate_audio?: unknown;
},
ctx: BYOKToolContext,
): Promise<ImageToolResult> {
const promptRaw = typeof args.prompt === 'string' ? args.prompt.trim() : '';
if (!promptRaw) return { ok: false, error: 'prompt is required' };
const prompt =
promptRaw.length > PROMPT_MAX_LENGTH
? promptRaw.slice(0, PROMPT_MAX_LENGTH)
: promptRaw;
const ratio = sanitizeVideoAspectRatio(args.aspect_ratio);
const resolution = sanitizeVideoResolution(args.resolution);
const duration = sanitizeVideoDuration(args.duration);
const generateAudio = args.generate_audio === true;
let dir: string;
try {
dir = await ensureProject(ctx.projectsRoot, ctx.projectId);
} catch (err) {
return {
ok: false,
error: `invalid projectId for video storage: ${err instanceof Error ? err.message : String(err)}`,
};
}
let apiKey = ctx.upstreamApiKey;
let baseUrl = ctx.upstreamBaseUrl || SENSEAUDIO_DEFAULT_BASE_URL;
if (!apiKey) {
const resolved = await resolveProviderConfig(ctx.projectRoot, 'senseaudio');
apiKey = resolved.apiKey || '';
if (resolved.baseUrl) baseUrl = resolved.baseUrl;
}
if (!apiKey) {
return { ok: false, error: 'no SenseAudio API key available' };
}
const trimmedBase = baseUrl.replace(/\/+$/, '');
// Step 1: POST /v1/video/create → task_id.
let taskId: string;
try {
const resp = await fetch(`${trimmedBase}/v1/video/create`, {
method: 'POST',
headers: {
authorization: `Bearer ${apiKey}`,
'content-type': 'application/json',
},
body: JSON.stringify({
model: SENSEAUDIO_VIDEO_MODEL,
content: [{ type: 'text', text: prompt }],
duration,
resolution,
ratio,
provider_specific: { generate_audio: generateAudio },
}),
});
if (!resp.ok) {
const text = await resp.text().catch(() => '');
return {
ok: false,
error: `senseaudio video create ${resp.status}: ${text.slice(0, 240)}`,
};
}
const data = (await resp.json()) as { task_id?: string };
if (typeof data?.task_id !== 'string' || !data.task_id) {
return { ok: false, error: 'senseaudio video create response missing task_id' };
}
taskId = data.task_id;
} catch (err) {
return {
ok: false,
error: err instanceof Error ? err.message : String(err),
};
}
// Step 2: poll /v1/video/status until completed / failed / timeout.
const pollIntervalMs = ctx.videoPollIntervalMs ?? SENSEAUDIO_VIDEO_POLL_INTERVAL_MS_DEFAULT;
let videoUrl = '';
for (let attempt = 0; attempt < SENSEAUDIO_VIDEO_MAX_POLLS; attempt++) {
await sleep(pollIntervalMs);
let statusResp: Response;
try {
statusResp = await fetch(
`${trimmedBase}/v1/video/status?id=${encodeURIComponent(taskId)}`,
{
method: 'GET',
headers: { authorization: `Bearer ${apiKey}` },
},
);
} catch (err) {
return {
ok: false,
error: `senseaudio video poll failed: ${err instanceof Error ? err.message : String(err)}`,
};
}
if (!statusResp.ok) {
const text = await statusResp.text().catch(() => '');
return {
ok: false,
error: `senseaudio video status ${statusResp.status}: ${text.slice(0, 240)}`,
};
}
const data = (await statusResp.json()) as {
status?: string;
progress?: number;
video_url?: string;
error_message?: string;
};
if (data?.status === 'completed') {
if (typeof data.video_url !== 'string' || !data.video_url) {
return { ok: false, error: 'senseaudio video status completed but missing video_url' };
}
videoUrl = data.video_url;
break;
}
if (data?.status === 'failed') {
return {
ok: false,
error: `senseaudio video failed: ${data.error_message || 'unknown reason'}`,
};
}
// pending / processing — continue polling. Emit a periodic log line
// so a stuck job surfaces in the daemon log instead of silently
// burning attempts.
if ((attempt + 1) % SENSEAUDIO_VIDEO_PROGRESS_LOG_EVERY === 0) {
const pct = typeof data.progress === 'number' ? data.progress : '?';
console.log(
`[proxy:senseaudio] generate_video poll ${attempt + 1}/${SENSEAUDIO_VIDEO_MAX_POLLS} task=${taskId} status=${data.status ?? 'unknown'} progress=${pct}`,
);
}
}
if (!videoUrl) {
return {
ok: false,
error: `senseaudio video timed out after ${SENSEAUDIO_VIDEO_MAX_POLLS} polls`,
};
}
// Step 3: download the mp4 bytes and persist into the project folder.
// Validate the returned URL with assertExternalAssetUrl (which routes
// through validateBaseUrlResolved) so a malicious gateway can't point
// us at 169.254.169.254 (AWS / Azure metadata service) or RFC1918
// hosts via the response payload.
const videoUrlCheck = await assertExternalAssetUrl(videoUrl);
if (!videoUrlCheck.ok) return { ok: false, error: videoUrlCheck.error };
let bytes: Buffer;
try {
const videoResp = await fetch(videoUrl, { redirect: 'error' });
if (!videoResp.ok) {
return { ok: false, error: `video download ${videoResp.status}` };
}
bytes = Buffer.from(await videoResp.arrayBuffer());
} catch (err) {
return {
ok: false,
error: `video download failed: ${err instanceof Error ? err.message : String(err)}`,
};
}
if (bytes.length === 0) {
return { ok: false, error: 'video download returned zero bytes' };
}
const id = `${Date.now().toString(36)}-${randomBytes(4).toString('hex')}`;
const filename = `byok-video-${id}.mp4`;
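// Disk failures follow the same `{ok: false}` contract as network failures.
try {
await writeFile(path.join(dir, filename), bytes);
} catch (err) {
return {
ok: false,
error: `video write failed: ${err instanceof Error ? err.message : String(err)}`,
};
}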
return {
ok: true,
url: `/api/projects/${encodeURIComponent(ctx.projectId)}/files/${filename}`,
};
}
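
The per-poll decision inside `executeGenerateVideo` (stop with a URL, stop with an error, or keep polling) can be read as a small pure function over the status payload. The sketch below is an illustrative restatement, not a refactor that exists in the PR; `classifyStatus` and the `PollStep` type are hypothetical names.

```typescript
// Shape of the fields the poll loop reads from GET /v1/video/status.
type StatusPayload = { status?: string; video_url?: string; error_message?: string };

// Hypothetical result type: what the loop should do after one poll.
type PollStep =
  | { kind: 'done'; url: string }
  | { kind: 'failed'; error: string }
  | { kind: 'continue' };

function classifyStatus(data: StatusPayload): PollStep {
  if (data.status === 'completed') {
    // A completed task without a URL is treated as a failure, not a retry.
    return typeof data.video_url === 'string' && data.video_url
      ? { kind: 'done', url: data.video_url }
      : { kind: 'failed', error: 'completed but missing video_url' };
  }
  if (data.status === 'failed') {
    return { kind: 'failed', error: data.error_message || 'unknown reason' };
  }
  // pending, processing, or any unrecognised status: keep polling.
  return { kind: 'continue' };
}
```

Keeping the classification separate from the `fetch` and `sleep` calls would make the status handling testable without a network or timers.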

@ -1,13 +1,22 @@
import type { Express } from 'express';
import type { RouteDeps } from './server-context.js';
import { newInsertId } from './analytics.js';
import { seedProviderIfMissing } from './media-config.js';
import {
BYOK_SENSEAUDIO_TOOLS,
executeGenerateImage,
executeGenerateVideo,
isSenseAudioImageModel,
type BYOKToolContext,
} from './byok-tools.js';
import { isSafeId as isSafeProjectId } from './projects.js';
import {
agentIdToTracking,
projectKindToTracking,
} from '@open-design/contracts/analytics';
import { validateBaseUrlResolved } from './connectionTest.js';
-export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'http' | 'chat' | 'agents' | 'critique' | 'validation' | 'lifecycle'> {}
+export interface RegisterChatRoutesDeps extends RouteDeps<'db' | 'design' | 'http' | 'chat' | 'agents' | 'critique' | 'validation' | 'lifecycle' | 'paths'> {}
// Invariant: a chat assistant message row reflects its run's terminal state
// even when the web client never persists the cancel/finish itself (refresh
@ -310,13 +319,13 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
const protocol = body.protocol;
if (
typeof protocol !== 'string' ||
-!['anthropic', 'openai', 'azure', 'google', 'ollama'].includes(protocol)
+!['anthropic', 'openai', 'azure', 'google', 'ollama', 'senseaudio'].includes(protocol)
) {
return sendApiError(
res,
400,
'BAD_REQUEST',
-'protocol must be one of anthropic|openai|azure|google|ollama',
+'protocol must be one of anthropic|openai|azure|google|ollama|senseaudio',
);
}
if (
@ -371,13 +380,13 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
const protocol = body.protocol;
if (
typeof protocol !== 'string' ||
-!['anthropic', 'openai', 'azure', 'google', 'ollama'].includes(protocol)
+!['anthropic', 'openai', 'azure', 'google', 'ollama', 'senseaudio'].includes(protocol)
) {
return sendApiError(
res,
400,
'BAD_REQUEST',
-'protocol must be one of anthropic|openai|azure|google|ollama',
+'protocol must be one of anthropic|openai|azure|google|ollama|senseaudio',
);
}
if (
@ -1172,4 +1181,354 @@ export function registerChatRoutes(app: Express, ctx: RegisterChatRoutesDeps) {
}
});
// SenseAudio chat completions. Wire-compatible with OpenAI (POST
// /v1/chat/completions, Bearer auth, SSE `data: {...}` + `data: [DONE]`)
// plus a daemon-side tool loop: the handler injects an OpenAI
// `tools` array on every upstream request and, when the model
// responds with a `tool_calls` finish_reason, executes the call
// locally, appends the assistant + tool messages to the conversation,
// and re-issues the completion. This is how BYOK chat — which has
// no agent-runtime scaffolding — gets image-generation parity with
// the CLI agent path. Loop is bounded by MAX_BYOK_TOOL_LOOPS so a
// misbehaving model can't pin the daemon in an infinite tool dance.
const MAX_BYOK_TOOL_LOOPS = 3;
type AccumulatedToolCall = { id: string; name: string; arguments: string };
type TurnResult =
| { kind: 'text_end' }
| { kind: 'error' }
| {
kind: 'tool_calls';
assistantMessage: any;
toolCalls: Array<{ id: string; type: 'function'; function: { name: string; arguments: string } }>;
};
app.post('/api/proxy/senseaudio/stream', async (req, res) => {
const proxyBody = req.body || {};
if (rejectProxyPluginContext(proxyBody, res)) return;
const {
baseUrl,
apiKey,
model,
systemPrompt,
messages,
maxTokens,
projectId,
byokImageModel,
} = proxyBody;
if (!apiKey || !model) {
return sendApiError(
res,
400,
'BAD_REQUEST',
'apiKey and model are required',
);
}
// projectId is required because the BYOK generate_image tool writes
// into the active project's folder; without one we'd have to fall
// back to a daemon-global cache that orphans the file. The web
// client always passes project.id from ProjectView, so a missing
// value means the request did not come through the chat surface.
if (typeof projectId !== 'string' || !isSafeProjectId(projectId)) {
return sendApiError(
res,
400,
'BAD_REQUEST',
'projectId is required and must be a safe identifier',
);
}
const effectiveBaseUrl = baseUrl || 'https://api.senseaudio.cn';
const validated = await validateExternalApiBaseUrl(effectiveBaseUrl);
if (validated.error) {
return sendApiError(
res,
validated.forbidden ? 403 : 400,
validated.forbidden ? 'FORBIDDEN' : 'BAD_REQUEST',
validated.error,
);
}
const url = appendVersionedApiPath(effectiveBaseUrl, '/chat/completions');
console.log(
`[proxy:senseaudio] ${req.method} ${validated.parsed?.hostname ?? '?'} model=${model} project=${projectId}`,
);
const workingMessages: any[] = Array.isArray(messages) ? [...messages] : [];
if (typeof systemPrompt === 'string' && systemPrompt) {
workingMessages.unshift({ role: 'system', content: systemPrompt });
}
// Tool execution context — built once per request. The image tool
// writes into `<projectsRoot>/<projectId>/byok-<id>.png` and returns
// a relative URL via `/api/projects/:id/files/:filename`. The web's
// Next.js rewrites `/api/:path*` to the daemon, so the chat UI
// loads images same-origin through the standard project file
// route — no CSP / CORS exceptions needed.
// User-configured BYOK default image model. Drop silently if the
// client sent an id outside the SenseAudio registry — the tool
// will fall back to the registry default and the LLM can still
// override per-call via the tool's `model` arg.
const validDefaultImageModel = isSenseAudioImageModel(byokImageModel)
? byokImageModel
: undefined;
const toolCtx: BYOKToolContext = {
projectRoot: ctx.paths.PROJECT_ROOT,
projectsRoot: ctx.paths.PROJECTS_DIR,
projectId,
upstreamApiKey: apiKey,
upstreamBaseUrl: effectiveBaseUrl,
// Spread-conditional because tsconfig's exactOptionalPropertyTypes
// forbids `field: undefined` on an optional slot. The byok-tools
// executor reads `ctx.defaultImageModel` with `isSenseAudioImageModel`
// anyway, so a missing key and an undefined value behave the same.
...(validDefaultImageModel
? { defaultImageModel: validDefaultImageModel }
: {}),
};
// Run one round-trip: POST to upstream, stream text deltas to the
// client as they arrive, accumulate any tool_call deltas. Returns
// a typed result describing what to do next (loop on tool calls,
// close the stream, or bail on error). Closures capture all the
// SSE helpers from registerChatRoutes.
const runSenseAudioTurn = async (
sse: any,
messagesForTurn: any[],
): Promise<TurnResult> => {
const payload: any = {
model,
messages: messagesForTurn,
max_tokens:
typeof maxTokens === 'number' && maxTokens > 0 ? maxTokens : 8192,
stream: true,
tools: BYOK_SENSEAUDIO_TOOLS,
tool_choice: 'auto',
};
const response = await fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify(payload),
redirect: 'error',
});
if (!response.ok) {
const errorText = await response.text();
console.error(
`[proxy:senseaudio] upstream error: ${response.status} ${redactAuthTokens(errorText)}`,
);
sendProxyError(sse, `Upstream error: ${response.status}`, {
code: proxyErrorCode(response.status),
details: errorText,
retryable: response.status === 429 || response.status >= 500,
});
return { kind: 'error' };
}
const accum: Record<number, AccumulatedToolCall> = {};
let finishReason = '';
let providerError = '';
await streamUpstreamSse(response, ({ payload, data }: any) => {
if (payload === '[DONE]') return true;
if (!data) return false;
const streamErr = extractStreamErrorMessage(data);
if (streamErr) {
providerError = streamErr;
return true;
}
const choices = (data as any).choices;
if (!Array.isArray(choices) || choices.length === 0) return false;
const choice = choices[0] || {};
const delta = choice.delta || {};
// Text content streams to the client unchanged. Tool turns and
// text turns can both share this path — the OpenAI protocol
// never emits text+tool_calls in the same chunk, but it can
// emit text before / after a tool_call in the same turn, and
// we want the user to see whatever the model decided to say.
if (typeof delta.content === 'string' && delta.content) {
sse.send('delta', { delta: delta.content });
}
// Tool call deltas stream as fragments — `id` arrives once at
// the start, `function.name` once at the start, and
// `function.arguments` accumulates a chunked JSON string we
// have to concatenate. Parallel calls use the `index` field to
// distinguish slots. Default to 0 when omitted (older models).
if (Array.isArray(delta.tool_calls)) {
for (const tc of delta.tool_calls) {
const idx = typeof tc?.index === 'number' ? tc.index : 0;
if (!accum[idx]) {
accum[idx] = { id: '', name: '', arguments: '' };
}
const slot = accum[idx];
if (typeof tc.id === 'string' && tc.id) slot.id = tc.id;
if (typeof tc.function?.name === 'string' && tc.function.name) {
slot.name = tc.function.name;
}
if (typeof tc.function?.arguments === 'string') {
slot.arguments += tc.function.arguments;
}
}
}
if (typeof choice.finish_reason === 'string' && choice.finish_reason) {
finishReason = choice.finish_reason;
}
return false;
});
if (providerError) {
sendProxyError(sse, `Provider error: ${providerError}`, {
details: providerError,
});
return { kind: 'error' };
}
if (finishReason === 'tool_calls' && Object.keys(accum).length > 0) {
const indices = Object.keys(accum)
.map(Number)
.sort((a, b) => a - b);
const toolCalls = indices.map((i) => ({
id: accum[i]!.id || `call_${i}`,
type: 'function' as const,
function: {
name: accum[i]!.name,
arguments: accum[i]!.arguments,
},
}));
return {
kind: 'tool_calls',
assistantMessage: {
role: 'assistant',
content: null,
tool_calls: toolCalls,
},
toolCalls,
};
}
return { kind: 'text_end' };
};
const executeOneTool = async (call: {
id: string;
function: { name: string; arguments: string };
}): Promise<{ ok: boolean; url?: string; error?: string; kind?: 'image' | 'video' }> => {
const fnName = call?.function?.name ?? '';
if (fnName !== 'generate_image' && fnName !== 'generate_video') {
return {
ok: false,
error: `unknown tool: ${fnName || 'unnamed'}`,
};
}
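let args: any = {};
try {
args = JSON.parse(call.function.arguments || '{}');
} catch {
return { ok: false, error: 'tool arguments were not valid JSON' };
}
// JSON.parse also accepts `null`, numbers, and arrays; the executors
// read named properties, so anything but a plain object is rejected.
if (typeof args !== 'object' || args === null || Array.isArray(args)) {
return { ok: false, error: 'tool arguments must be a JSON object' };
}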
if (fnName === 'generate_image') {
const result = await executeGenerateImage(args, toolCtx);
return { ...result, kind: 'image' };
}
// generate_video: longer-running (polls up to the 10-min ceiling), async-with-polling.
const result = await executeGenerateVideo(args, toolCtx);
return { ...result, kind: 'video' };
};
const sse = createSseResponse(res);
sse.send('start', { model });
// SenseAudio's gateway issues one API key that works for both
// /v1/chat/completions and the image / TTS surfaces. Mirror the
// BYOK key into media-config so the CLI agent path (`od media
// generate`) picks it up automatically — fire-and-forget; the
// chat stream must not block on the disk write. seedProviderIfMissing
// is idempotent and preserves env-var-resolved keys.
seedProviderIfMissing(ctx.paths.PROJECT_ROOT, 'senseaudio', {
apiKey,
baseUrl: effectiveBaseUrl,
})
.then((seeded) => {
if (seeded) {
console.log(
'[proxy:senseaudio] seeded media-config.senseaudio from BYOK key',
);
}
})
.catch((err: unknown) => {
console.warn(
`[proxy:senseaudio] seed media-config failed: ${
err instanceof Error ? err.message : String(err)
}`,
);
});
try {
for (let loop = 0; loop < MAX_BYOK_TOOL_LOOPS; loop++) {
const turn = await runSenseAudioTurn(sse, workingMessages);
if (turn.kind === 'error') return sse.end();
if (turn.kind === 'text_end') {
sse.send('end', {});
return sse.end();
}
// turn.kind === 'tool_calls'
workingMessages.push(turn.assistantMessage);
for (const call of turn.toolCalls) {
const result = await executeOneTool(call);
// The tool result is delivered to the model as a `tool` role
// message — a structured payload the model can interpret. We
// also surface a daemon-side log line so a user reporting "no
// image showed up" can grep for the call id. The kind field
// distinguishes image vs video so the daemon picks the right
// embedding hint for the model (markdown image syntax for
// PNG, markdown link for MP4 since the chat renderer doesn't
// currently render <video> tags).
const toolName = call?.function?.name ?? 'unknown';
if (result.ok) {
console.log(
`[proxy:senseaudio] ${toolName} OK: ${call.id} -> ${result.url}`,
);
} else {
console.warn(
`[proxy:senseaudio] ${toolName} FAILED: ${call.id} -> ${result.error}`,
);
}
const content = result.ok
? result.kind === 'video'
? `Video generated successfully. URL: ${result.url}. Reply to the user with a clickable markdown link, e.g. [▶ Play video](${result.url}). Do NOT use markdown image syntax — the chat renderer does not embed <video> tags.`
: `Image generated successfully. URL: ${result.url}. Reply to the user with: ![generated image](${result.url})`
: result.kind === 'video'
? `Video generation failed: ${result.error}. Apologize briefly and suggest a retry with a more specific prompt or a shorter duration.`
: `Image generation failed: ${result.error}. Apologize briefly and suggest a retry with a more specific prompt.`;
workingMessages.push({
role: 'tool',
tool_call_id: call.id,
content,
});
}
}
// Tool loop exhausted — the model still wants to call tools but we
// refuse a 4th round. Close the stream gracefully; the last text
// delta the model emitted (if any) is already on the wire.
console.warn(
'[proxy:senseaudio] tool loop bounded at MAX_BYOK_TOOL_LOOPS=3',
);
sse.send('end', {});
return sse.end();
} catch (err: any) {
console.error(`[proxy:senseaudio] internal error: ${err.message}`);
sendProxyError(sse, err.message, { code: 'INTERNAL_ERROR' });
sse.end();
}
});
}
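
The streaming handler above reassembles tool calls from fragments: `id` and `function.name` arrive once, `function.arguments` arrives as pieces of a JSON string, and `index` separates parallel calls. A minimal standalone sketch of that accumulation is shown below. `accumulateToolCalls` and its types are hypothetical names for illustration; the route itself keeps this state in a closure rather than a helper.

```typescript
// One entry of `choices[0].delta.tool_calls` in an OpenAI-style stream chunk.
type ToolCallDelta = {
  index?: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type Accumulated = { id: string; name: string; arguments: string };

// Hypothetical helper: fold a sequence of per-chunk delta arrays into
// complete tool calls, ordered by their stream index.
function accumulateToolCalls(chunks: ToolCallDelta[][]): Accumulated[] {
  const slots = new Map<number, Accumulated>();
  for (const chunk of chunks) {
    for (const tc of chunk) {
      // Default to slot 0 when the provider omits `index`.
      const idx = typeof tc.index === 'number' ? tc.index : 0;
      const slot = slots.get(idx) ?? { id: '', name: '', arguments: '' };
      if (tc.id) slot.id = tc.id;
      if (tc.function?.name) slot.name = tc.function.name;
      // Argument fragments are concatenated verbatim; parse only at the end.
      if (typeof tc.function?.arguments === 'string') {
        slot.arguments += tc.function.arguments;
      }
      slots.set(idx, slot);
    }
  }
  return Array.from(slots.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([, slot]) => slot);
}

const calls = accumulateToolCalls([
  [{ index: 0, id: 'call_a', function: { name: 'generate_image', arguments: '' } }],
  [{ index: 0, function: { arguments: '{"prompt":' } }],
  [{ index: 0, function: { arguments: '"a red fox"}' } }],
]);
```

The concatenated `arguments` string is only valid JSON once the stream finishes, which is why the route defers `JSON.parse` to `executeOneTool`.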

@ -119,6 +119,41 @@ export async function validateBaseUrlResolved(
return sync;
}
/**
* SSRF guard for asset URLs handed back inside a successful API
* response: typically a `data.url` or `data.video_url` that points
* at the gateway's CDN, but is attacker-controllable when the
* upstream gateway is compromised or misconfigured. Routes the URL
* through `validateBaseUrlResolved` (DNS-resolve, then reject loopback,
* RFC1918, link-local, CGNAT, metadata-service IPs) and returns a
* discriminated union so callers don't have to repeat the
* `validated.error || !validated.parsed` plumbing.
*
* Two callers today:
* - `byok-tools.ts` for the chat-tool image/video downloads
* - `media.ts` `renderSenseAudioImage` for the CLI agent path
* Both hand the URL straight to `fetch(...)` next, so pair this
* guard with `redirect: 'error'` on the fetch to also block a
* 3xx hop into private space.
*/
export async function assertExternalAssetUrl(
rawUrl: string,
): Promise<{ ok: true } | { ok: false; error: string }> {
if (typeof rawUrl !== 'string' || !rawUrl) {
return { ok: false, error: 'empty download url' };
}
const validated = await validateBaseUrlResolved(rawUrl);
if (validated.error || !validated.parsed) {
return {
ok: false,
error: validated.forbidden
? `blocked download url (${validated.error ?? 'internal address'})`
: `invalid download url: ${validated.error ?? 'unknown reason'}`,
};
}
return { ok: true };
}
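
The docstring above lists the address classes the guard rejects: loopback, RFC1918, link-local, CGNAT, and metadata-service IPs. The sketch below shows what such a check can look like for a resolved IPv4 address. It is not the implementation of `validateBaseUrlResolved`, whose internals are not part of this diff; `isBlockedIPv4` is a hypothetical name, and a real guard would also need DNS resolution and IPv6 handling.

```typescript
// Hypothetical IPv4-only classifier for the address classes named above.
// Anything that is not a well-formed dotted quad is refused rather than guessed at.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split('.');
  if (parts.length !== 4) return true;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return true;
  const [a, b] = octets as [number, number, number, number];
  return (
    a === 0 || // 0.0.0.0/8, "this network"
    a === 10 || // 10.0.0.0/8, RFC 1918
    a === 127 || // 127.0.0.0/8, loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10, CGNAT
    (a === 169 && b === 254) || // 169.254.0.0/16, link-local incl. 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12, RFC 1918
    (a === 192 && b === 168) // 192.168.0.0/16, RFC 1918
  );
}
```

A check like this only covers the address the hostname resolves to at validation time, which is why the docstring pairs the guard with `redirect: 'error'` on the subsequent `fetch`.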
// Aggressive but not punitive; happy paths usually return in under 2 s.
// Override with OD_CONNECTION_TEST_PROVIDER_TIMEOUT_MS for slow networks
// or distant providers; invalid values fall back to the default.
@ -315,10 +350,10 @@ function inspectProviderCompletion(
const obj = data && typeof data === 'object' ? data as Record<string, unknown> : null;
if (!obj) return { valid: false };
-if (protocol === 'openai' || protocol === 'azure') {
+if (protocol === 'openai' || protocol === 'azure' || protocol === 'senseaudio') {
const responseModel = typeof obj.model === 'string' ? obj.model : '';
if (
-protocol === 'openai' &&
+(protocol === 'openai' || protocol === 'senseaudio') &&
enforceResponseModel &&
responseModel &&
requestedModel &&
@ -518,6 +553,12 @@ function buildProviderCall(input: ProviderTestRequest): ProviderCallShape {
},
};
case 'openai':
case 'senseaudio':
// SenseAudio is wire-compatible with OpenAI (POST /v1/chat/completions,
// Bearer auth, identical body + response shape), so the connection
// smoke test reuses the same call shape. We default the base URL
// upstream-side in chat-routes; this layer assumes the caller passed
// a concrete URL via the BYOK form.
return {
url: appendVersionedApiPath(baseUrl, '/chat/completions'),
headers: {

@ -521,3 +521,53 @@ export async function writeConfig(projectRoot: string, body: unknown) {
await writeStored(projectRoot, next);
return readMaskedConfig(projectRoot);
}
/**
* Idempotent "seed if empty" write for a single provider slot. The chat
* proxy uses this to mirror a BYOK key into media-config so the agent's
* image / TTS path picks up the same credential without the user having
* to paste it twice. Strict rules:
* * No-op when an apiKey is ALREADY stored for `providerId` (the user
* may have configured Media independently and we never overwrite).
* * No-op when an env-var key resolves for `providerId` (env wins
* regardless of disk state, so seeding would be invisible).
* * No-op when the incoming `apiKey` is empty (we only seed values
* the chat layer has just verified upstream).
* * Otherwise merge `{ [providerId]: entry }` into the existing
* provider map and persist. All other provider slots and aliases
* are preserved byte-for-byte.
*
* Returns `true` when a write happened (caller can log), `false` when
* the call was a no-op. Errors are surfaced; the caller decides
* whether to swallow them (fire-and-forget) or propagate.
*/
export async function seedProviderIfMissing(
projectRoot: string,
providerId: string,
entry: { apiKey?: string; baseUrl?: string; model?: string },
): Promise<boolean> {
if (!PROVIDER_IDS.includes(providerId)) return false;
const apiKey = entry.apiKey?.trim() ?? '';
if (!apiKey) return false;
// Env var wins at resolution time, so seeding when env is set would
// be invisible to the user. Skip to avoid confusing on-disk state.
if (readEnvKey(providerId)) return false;
const prior = await readStored(projectRoot);
const priorApiKey =
typeof prior[providerId]?.apiKey === 'string' && prior[providerId].apiKey.trim()
? prior[providerId].apiKey.trim()
: '';
if (priorApiKey) return false;
const baseUrl = entry.baseUrl?.trim() ?? '';
const model = entry.model?.trim() ?? '';
const next: ProviderMap = { ...prior };
next[providerId] = {
apiKey,
...(baseUrl ? { baseUrl } : {}),
...(model ? { model } : {}),
};
await writeStored(projectRoot, next);
return true;
}
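
The guard order in `seedProviderIfMissing` (unknown provider, empty incoming key, env key present, key already stored, otherwise write) can be expressed as a pure decision function, which makes each no-op case easy to exercise without touching disk. This is an illustrative restatement only; `decideSeed`, `SeedInputs`, and the decision labels are hypothetical names.

```typescript
// Hypothetical inputs: the facts the seeding function looks up before writing.
type SeedInputs = {
  knownProvider: boolean;
  incomingKey?: string;
  envKey?: string;
  storedKey?: string;
};
type SeedDecision =
  | 'write'
  | 'skip:unknown-provider'
  | 'skip:empty-key'
  | 'skip:env-wins'
  | 'skip:already-stored';

// Checks run in the same order as the guards in seedProviderIfMissing.
function decideSeed(i: SeedInputs): SeedDecision {
  if (!i.knownProvider) return 'skip:unknown-provider';
  if (!i.incomingKey?.trim()) return 'skip:empty-key';
  if (i.envKey) return 'skip:env-wins';
  if (i.storedKey?.trim()) return 'skip:already-stored';
  return 'write';
}
```

Only the final branch leads to a write, so calling the seeding function repeatedly with the same key is harmless: the second call lands in `skip:already-stored`.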

@ -60,7 +60,7 @@ export const MEDIA_PROVIDERS: MediaProvider[] = [
{
id: 'senseaudio',
label: 'SenseAudio',
-hint: 'TTS · 70+ system voices · clone',
+hint: '',
integrated: true,
defaultBaseUrl: 'https://api.senseaudio.cn',
docsUrl: 'https://docs.senseaudio.cn',
@ -80,6 +80,10 @@ export const IMAGE_MODELS: MediaModel[] = [
{ id: 'doubao-seedream-3-0-t2i-250415', label: 'seedream-3.0', hint: 'ByteDance · Doubao image', provider: 'volcengine', caps: ['t2i'] },
{ id: 'doubao-seededit-3-0-i2i-250628', label: 'seededit-3.0', hint: 'ByteDance · image edit', provider: 'volcengine', caps: ['i2i'] },
{ id: 'senseaudio-image-2.0-260319', label: 'senseaudio-image-2.0', hint: 'SenseAudio · multi-aspect, latest', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'senseaudio-image-1.0-260319', label: 'senseaudio-image-1.0', hint: 'SenseAudio · standard', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'doubao-seedream-5-0-260128', label: 'seedream-5.0', hint: 'SenseAudio · ByteDance Seedream 5.0 hi-res', provider: 'senseaudio', caps: ['t2i', 'i2i'] },
{ id: 'grok-imagine-image', label: 'grok-imagine-image', hint: 'xAI · 2K text-to-image', provider: 'grok', caps: ['t2i'] },
{ id: 'gemini-3.1-flash-image-preview', label: 'nano-banana-2', hint: 'Nano Banana · text-to-image', provider: 'nanobanana', caps: ['t2i'] },


@@ -57,6 +57,7 @@ import {
findProvider,
modelsForSurface,
} from './media-models.js';
import { assertExternalAssetUrl } from './connectionTest.js';
import { resolveModelAlias, resolveProviderConfig } from './media-config.js';
import {
ensureProject,
@@ -559,6 +560,11 @@ export async function generateMedia(args: {
bytes = result.bytes;
providerNote = result.providerNote;
suggestedExt = result.suggestedExt;
} else if (def.provider === 'senseaudio' && surface === 'image') {
const result = await renderSenseAudioImage(ctx, credentials);
bytes = result.bytes;
providerNote = result.providerNote;
suggestedExt = result.suggestedExt;
} else if (def.provider === 'fishaudio' && surface === 'audio') {
const result = await renderFishAudioTTS(ctx, credentials);
bytes = result.bytes;
@@ -2243,6 +2249,131 @@ async function renderSenseAudioTTS(ctx: MediaContext, credentials: ProviderConfi
};
}
// ---------------------------------------------------------------------------
// Provider: SenseAudio image — POST /v1/image/sync (synchronous text-to-image).
//
// Docs: https://docs.senseaudio.cn/guides/image/overview
// * Models: senseaudio-image-2.0-260319 (multi-aspect), senseaudio-image-1.0-260319
// (standard), doubao-seedream-5-0-260128 (hi-res). The wire `model` field
// accepts the catalog id directly so no alias map is needed.
// * Body: { model, prompt (≤2000 chars), size (WxH, required when no
// reference), reference (URL or data URI, optional), seed (optional int) }.
// * Response: { url: string } pointing at the rendered PNG; we fetch it
// once to materialise bytes the dispatcher can write to disk.
// * Auth: Authorization: Bearer <API_KEY>; shares the senseaudio provider
// slot with the TTS path (OD_SENSEAUDIO_API_KEY / SENSEAUDIO_API_KEY).
// We default to the /sync endpoint because the chat runtime already streams
// progress and a single round-trip keeps the dispatcher contract identical
// to OpenAI / Volcengine image. Switching to /v1/image/async + GET
// /v1/image/pending is a future option if the upstream model latency
// outgrows the daemon's request timeout.
// ---------------------------------------------------------------------------
const SENSEAUDIO_IMAGE_PROMPT_LIMIT = 2000;
// SenseAudio's image gateway rejects non-standard pixel sizes with a 400
// `参数错误:size` ("parameter error: size"). Keep this table in sync with
// byok-tools.ts's ASPECT_TO_SIZE; both paths hit the same /v1/image/sync endpoint.
function senseAudioImageSize(aspect?: string): string {
if (aspect === '16:9') return '1280x720';
if (aspect === '9:16') return '720x1280';
if (aspect === '4:3') return '1024x768';
if (aspect === '3:4') return '768x1024';
return '1024x1024';
}
async function renderSenseAudioImage(ctx: MediaContext, credentials: ProviderConfig): Promise<RenderResult> {
if (!credentials.apiKey) {
throw new Error(
'no SenseAudio API key — configure it in Settings or set OD_SENSEAUDIO_API_KEY',
);
}
const baseUrl = (credentials.baseUrl || SENSEAUDIO_DEFAULT_BASE_URL).replace(
/\/$/,
'',
);
const promptRaw = (ctx.prompt && ctx.prompt.trim()) || 'A high-quality reference image.';
// SenseAudio rejects >2000-char prompts with a 4xx; trim defensively so a
// verbose agent plan doesn't dead-end the generation. Only the first
// SENSEAUDIO_IMAGE_PROMPT_LIMIT characters are sent; the providerNote
// returned below does not record the truncation.
const prompt =
promptRaw.length > SENSEAUDIO_IMAGE_PROMPT_LIMIT
? promptRaw.slice(0, SENSEAUDIO_IMAGE_PROMPT_LIMIT)
: promptRaw;
const size = senseAudioImageSize(ctx.aspect);
const reference = ctx.imageRef?.dataUrl;
const body: Record<string, unknown> = {
model: ctx.wireModel,
prompt,
size,
};
if (reference) {
// When a reference image is supplied the API documents `size` as
// optional; we still send it so the output dimensions stay
// deterministic across t2i / i2i runs of the same project.
body.reference = reference;
}
const resp = await fetch(`${baseUrl}/v1/image/sync`, {
method: 'POST',
headers: {
authorization: `Bearer ${credentials.apiKey}`,
'content-type': 'application/json',
},
body: JSON.stringify(body),
});
const respText = await resp.text();
if (!resp.ok) {
throw new Error(`senseaudio image ${resp.status}: ${truncate(respText, 240)}`);
}
let data: any;
try {
data = JSON.parse(respText);
} catch {
throw new Error(`senseaudio image non-JSON: ${truncate(respText, 200)}`);
}
// Mirror the TTS base_resp envelope check: HTTP 200 can still encode an
// upstream logical failure. The image API uses the same shape on the
// failure path documented for /v1/image/pending (status=failed +
// error_message), so surface either source verbatim.
if (data?.base_resp && data.base_resp.status_code !== 0) {
throw new Error(
`senseaudio image api error ${data.base_resp.status_code}: ${data.base_resp.status_msg || 'unknown'}`,
);
}
if (typeof data?.error_message === 'string' && data.error_message) {
throw new Error(`senseaudio image api error: ${data.error_message}`);
}
const url = typeof data?.url === 'string' ? data.url : '';
if (!url) {
throw new Error('senseaudio image response missing url');
}
// Mirror the chat-tool SSRF guard (byok-tools.ts): the gateway-returned
// `url` is attacker-controllable inside a successful response, so DNS-
// resolve it through validateBaseUrlResolved and refuse loopback /
// RFC1918 / metadata-service hosts. Pair with `redirect: 'error'` so a
// 3xx hop into private space is also blocked.
const urlCheck = await assertExternalAssetUrl(url);
if (!urlCheck.ok) {
throw new Error(`senseaudio image ${urlCheck.error}`);
}
const imgResp = await fetch(url, { redirect: 'error' });
if (!imgResp.ok) {
throw new Error(`senseaudio image fetch ${imgResp.status}`);
}
const bytes = Buffer.from(await imgResp.arrayBuffer());
if (bytes.length === 0) {
throw new Error('senseaudio image fetch returned zero bytes');
}
return {
bytes,
providerNote: `senseaudio/${ctx.wireModel} · ${size}${reference ? ' · i2i' : ''} · ${bytes.length} bytes`,
suggestedExt: '.png',
};
}
// ---------------------------------------------------------------------------
// Provider: FishAudio — Speech-1.x family text-to-speech (synchronous).
//
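
The comment on `senseAudioImageSize` asks that its table stay in sync with `ASPECT_TO_SIZE` in byok-tools.ts. One possible way to avoid drift is a single shared lookup that both paths import. The sketch below assumes such a shared module could exist; `SKETCH_ASPECT_TO_SIZE`, `sizeForAspect`, and `clampPrompt` are hypothetical names, and the size values are copied from the function in this hunk.

```typescript
// Hypothetical shared table; sizes are the ones the gateway accepts per this PR.
const SKETCH_DEFAULT_SIZE = '1024x1024';
const SKETCH_ASPECT_TO_SIZE: ReadonlyMap<string, string> = new Map([
  ['1:1', '1024x1024'],
  ['16:9', '1280x720'],
  ['9:16', '720x1280'],
  ['4:3', '1024x768'],
  ['3:4', '768x1024'],
]);

// Unknown or missing aspect ratios fall back to the square size.
function sizeForAspect(aspect?: string): string {
  if (aspect === undefined) return SKETCH_DEFAULT_SIZE;
  return SKETCH_ASPECT_TO_SIZE.get(aspect) ?? SKETCH_DEFAULT_SIZE;
}

// Trim and cap a prompt, reporting how many characters were cut so a caller
// could surface the truncation. Lengths are UTF-16 code units, as with
// String.prototype.slice.
function clampPrompt(
  raw: string,
  limit = 2000,
): { prompt: string; droppedChars: number } {
  const trimmed = raw.trim();
  if (trimmed.length <= limit) return { prompt: trimmed, droppedChars: 0 };
  return {
    prompt: trimmed.slice(0, limit),
    droppedChars: trimmed.length - limit,
  };
}
```

Returning `droppedChars` is a design choice of the sketch: it would let the dispatcher append a truncation marker to the provider note if that is wanted.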


@@ -142,6 +142,15 @@ const PROVIDER_DEFAULTS = {
model: 'gemma3:4b',
baseUrl: 'https://ollama.com',
},
// SenseAudio's chat API is OpenAI-compatible (POST /v1/chat/completions,
// Bearer auth), so the extractor falls through to callOpenAI with this
// base URL and the user's SenseAudio API key. The default model is the
// small/fast variant so auto-pick stays cheap; users can swap in
// senseaudio-s2 or any gateway model via the picker.
senseaudio: {
model: 'senseaudio-s2-flash',
baseUrl: 'https://api.senseaudio.cn',
},
};
// Map an explicit override provider to the env var the daemon should
@@ -169,6 +178,13 @@ function envKeyFor(provider) {
if (provider === 'ollama') {
return process.env.OLLAMA_API_KEY?.trim() || '';
}
if (provider === 'senseaudio') {
return (
process.env.OD_SENSEAUDIO_API_KEY?.trim()
|| process.env.SENSEAUDIO_API_KEY?.trim()
|| ''
);
}
return '';
}
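
The SenseAudio branch of `envKeyFor` prefers `OD_SENSEAUDIO_API_KEY` and falls back to `SENSEAUDIO_API_KEY`, treating whitespace-only values as unset. A small sketch of the same precedence with the environment passed in explicitly, which makes the behaviour easy to check in isolation; `senseAudioKeyFrom` and `EnvLike` are hypothetical names, not part of the PR.

```typescript
// Hypothetical stand-in for process.env so the lookup can be exercised without
// mutating the real environment.
type EnvLike = Record<string, string | undefined>;

// Prefer the OD_-prefixed variable; a blank or whitespace-only value counts as
// missing because '' is falsy after trim().
function senseAudioKeyFrom(env: EnvLike): string {
  return (
    env['OD_SENSEAUDIO_API_KEY']?.trim()
    || env['SENSEAUDIO_API_KEY']?.trim()
    || ''
  );
}
```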


@@ -149,7 +149,9 @@ function extractGoogleModels(data: unknown): ProviderModelOption[] {
}
function providerModelsUrl(protocol: ConnectionTestProtocol, baseUrl: string, apiKey: string): string {
- if (protocol === 'openai') return appendVersionedApiPath(baseUrl, '/models');
+ if (protocol === 'openai' || protocol === 'senseaudio') {
+ return appendVersionedApiPath(baseUrl, '/models');
+ }
if (protocol === 'anthropic') {
const url = new URL(appendVersionedApiPath(baseUrl, '/models'));
url.searchParams.set('limit', '1000');
@@ -167,7 +169,9 @@ function providerModelsHeaders(
protocol: ConnectionTestProtocol,
apiKey: string,
): Record<string, string> {
- if (protocol === 'openai') return { authorization: `Bearer ${apiKey}` };
+ if (protocol === 'openai' || protocol === 'senseaudio') {
+ return { authorization: `Bearer ${apiKey}` };
+ }
if (protocol === 'anthropic') {
return {
'x-api-key': apiKey,
@@ -178,7 +182,9 @@ function providerModelsHeaders(
}
function extractModels(protocol: ConnectionTestProtocol, data: unknown): ProviderModelOption[] {
- if (protocol === 'openai') return extractOpenAiModels(data);
+ // SenseAudio's /v1/models response follows the OpenAI envelope
+ // (`{ data: [{ id, ... }] }`), so the same extractor handles both.
+ if (protocol === 'openai' || protocol === 'senseaudio') return extractOpenAiModels(data);
if (protocol === 'anthropic') return extractAnthropicModels(data);
if (protocol === 'google') return extractGoogleModels(data);
return [];
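
The comment in `extractModels` relies on the `{ data: [{ id, ... }] }` envelope being shared by OpenAI and SenseAudio. The body of `extractOpenAiModels` is not shown in this diff, so the following is only a sketch of how such an envelope can be read defensively from `unknown` input; `extractModelIds` is a hypothetical helper and returns plain id strings rather than the PR's `ProviderModelOption` objects.

```typescript
// Hypothetical helper: pull non-empty string ids out of an OpenAI-style
// `{ data: [{ id }] }` payload, skipping anything malformed.
function extractModelIds(data: unknown): string[] {
  if (typeof data !== 'object' || data === null) return [];
  const list = (data as { data?: unknown }).data;
  if (!Array.isArray(list)) return [];
  const ids: string[] = [];
  for (const item of list) {
    if (typeof item !== 'object' || item === null) continue;
    const id = (item as { id?: unknown }).id;
    if (typeof id === 'string' && id.length > 0) ids.push(id);
  }
  return ids;
}
```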


@@ -10859,6 +10859,7 @@ export async function startServer({
db,
design,
http: httpDeps,
paths: pathDeps,
chat: { startChatRun, submitToolResultToRun },
agents: agentDeps,
critique: critiqueDeps,


@@ -0,0 +1,686 @@
import { mkdir, mkdtemp, readFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import path from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import {
BYOK_SENSEAUDIO_TOOLS,
executeGenerateImage,
executeGenerateVideo,
} from '../src/byok-tools.js';
describe('BYOK_SENSEAUDIO_TOOLS', () => {
it('exports an OpenAI-shaped generate_image tool definition', () => {
const tool = BYOK_SENSEAUDIO_TOOLS.find(
(t) => t.function.name === 'generate_image',
);
expect(tool).toBeDefined();
expect(tool!.type).toBe('function');
expect(tool!.function.parameters.required).toEqual(['prompt']);
expect(tool!.function.parameters.properties.aspect_ratio.enum).toEqual([
'1:1',
'16:9',
'9:16',
'4:3',
'3:4',
]);
});
it('exposes both generate_image and generate_video tools', () => {
const names = BYOK_SENSEAUDIO_TOOLS.map((t) => t.function.name).sort();
expect(names).toEqual(['generate_image', 'generate_video']);
});
});
describe('executeGenerateImage', () => {
let root: string;
let projectsRoot: string;
const PROJECT_ID = 'test-project';
const realFetch = globalThis.fetch;
beforeEach(async () => {
root = await mkdtemp(path.join(tmpdir(), 'od-byok-tools-'));
projectsRoot = path.join(root, 'projects');
});
afterEach(async () => {
globalThis.fetch = realFetch;
vi.unstubAllGlobals();
await rm(root, { recursive: true, force: true });
});
const baseCtx = () => ({
projectRoot: root,
projectsRoot,
projectId: PROJECT_ID,
upstreamApiKey: 'sa-byok-key',
upstreamBaseUrl: 'https://api.senseaudio.cn',
});
it('calls /v1/image/sync, downloads the URL, persists bytes, and returns a daemon URL', async () => {
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
expect(init?.method).toBe('POST');
expect(init?.headers).toMatchObject({
authorization: 'Bearer sa-byok-key',
'content-type': 'application/json',
});
expect(JSON.parse(String(init?.body))).toEqual({
model: 'senseaudio-image-2.0-260319',
prompt: 'a tabby cat playing with yarn',
size: '1024x1024',
});
return new Response(
JSON.stringify({
url: 'https://cdn.example.test/generated/cat.png',
base_resp: { status_code: 0, status_msg: 'success' },
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/generated/cat.png') {
return new Response(pngBytes, {
status: 200,
headers: { 'content-type': 'image/png' },
});
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'a tabby cat playing with yarn' },
baseCtx(),
);
expect(result.ok).toBe(true);
// Returns a relative URL through the project file route so the
// chat UI loads same-origin via Next.js's /api/:path* rewrite,
// satisfying the strict CSP `img-src 'self'`. Path component is
// url-encoded so unusual (but isSafeId-passing) project ids don't
// break the URL.
expect(result.url).toMatch(
new RegExp(`^/api/projects/${PROJECT_ID}/files/byok-[a-z0-9-]+\\.png$`),
);
expect(fetchMock).toHaveBeenCalledTimes(2);
// Persisted file lives inside the project folder where listFiles /
// readProjectFile / archive plumbing will all discover it.
const filename = result.url!.split('/').pop()!;
const onDisk = await readFile(path.join(projectsRoot, PROJECT_ID, filename));
expect(onDisk.equals(pngBytes)).toBe(true);
});
it('honours args.model when the LLM picks a SenseAudio image model', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).model).toBe('doubao-seedream-5-0-260128');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/hi.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'wallpaper', model: 'doubao-seedream-5-0-260128' },
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('falls back to ctx.defaultImageModel when args.model is missing', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-1.0-260319');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/std.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'standard' },
{ ...baseCtx(), defaultImageModel: 'senseaudio-image-1.0-260319' },
);
expect(result.ok).toBe(true);
});
it('ignores args.model when it is not in the SenseAudio allowlist', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
// Falls through to ctx.defaultImageModel (registry-valid).
expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-1.0-260319');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/x.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'spoofed', model: 'evil-model-id' },
{ ...baseCtx(), defaultImageModel: 'senseaudio-image-1.0-260319' },
);
expect(result.ok).toBe(true);
});
it('falls back to registry default when both args.model and ctx.defaultImageModel are missing/invalid', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
// Registry default is the first SenseAudio entry — 2.0 today.
expect(JSON.parse(String(init?.body)).model).toBe('senseaudio-image-2.0-260319');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/d.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'no model anywhere' },
{ ...baseCtx(), defaultImageModel: 'also-bogus' },
);
expect(result.ok).toBe(true);
});
it('rejects unsafe projectId before any upstream call', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'x' },
{ ...baseCtx(), projectId: '../escape' },
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/invalid projectId/);
// ensureProject runs up front so the unsafe id is caught BEFORE
// any senseaudio upstream call goes out — no token spent, no
// attempt to write outside the project tree.
expect(fetchMock).not.toHaveBeenCalled();
});
it('maps aspect_ratio to the SenseAudio size string', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).size).toBe('1280x720');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/wide.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'widescreen banner', aspect_ratio: '16:9' },
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('falls back to 1:1 for unknown aspect_ratio values', async () => {
const pngBytes = Buffer.from([0x89, 0x50]);
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
expect(JSON.parse(String(init?.body)).size).toBe('1024x1024');
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/square.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(pngBytes, { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage(
{ prompt: 'square thing', aspect_ratio: 'something-else' },
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('returns { ok: false } on missing prompt', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({}, baseCtx());
expect(result).toEqual({ ok: false, error: 'prompt is required' });
expect(fetchMock).not.toHaveBeenCalled();
});
it('returns { ok: false } when no API key is available', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const ctx = { ...baseCtx(), upstreamApiKey: '' };
const result = await executeGenerateImage({ prompt: 'whatever' }, ctx);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/no SenseAudio API key/);
expect(fetchMock).not.toHaveBeenCalled();
});
it('surfaces HTTP failures with status code and truncated body', async () => {
const fetchMock = vi.fn(async () =>
new Response('unauthorized', {
status: 401,
headers: { 'content-type': 'text/plain' },
}),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/senseaudio image 401/);
});
it('surfaces error_message envelope verbatim', async () => {
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ error_message: 'sensitive_content_blocked' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/sensitive_content_blocked/);
});
it('surfaces base_resp non-zero status_code', async () => {
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({
base_resp: { status_code: 1004, status_msg: 'quota exhausted' },
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/api error 1004/);
expect(result.error).toMatch(/quota exhausted/);
});
it('returns { ok: false } when upstream returns no url', async () => {
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ base_resp: { status_code: 0, status_msg: 'ok' } }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/missing url/);
});
it('returns { ok: false } when the image download fails', async () => {
const fetchMock = vi.fn(async (input: unknown) => {
const url = String(input);
if (url.endsWith('/v1/image/sync')) {
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/will-404.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response('not found', { status: 404 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateImage({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/image download 404/);
});
});
describe('BYOK_SENSEAUDIO_TOOLS — video', () => {
it('exposes a generate_video tool definition with the documented param surface', () => {
const video = BYOK_SENSEAUDIO_TOOLS.find(
(t) => t.function.name === 'generate_video',
);
expect(video).toBeDefined();
const props = video!.function.parameters.properties as Record<string, any>;
expect(video!.function.parameters.required).toEqual(['prompt']);
expect(props.aspect_ratio.enum).toEqual(['16:9', '9:16', '4:3', '3:4', '1:1']);
expect(props.resolution.enum).toEqual(['480p', '720p', '1080p']);
expect(props.duration).toMatchObject({ type: 'integer', minimum: 4, maximum: 15 });
expect(props.generate_audio.type).toBe('boolean');
});
});
describe('executeGenerateVideo', () => {
let root: string;
let projectsRoot: string;
const PROJECT_ID = 'test-project';
const realFetch = globalThis.fetch;
beforeEach(async () => {
root = await mkdtemp(path.join(tmpdir(), 'od-byok-video-'));
projectsRoot = path.join(root, 'projects');
});
afterEach(async () => {
globalThis.fetch = realFetch;
vi.unstubAllGlobals();
await rm(root, { recursive: true, force: true });
});
const baseCtx = () => ({
projectRoot: root,
projectsRoot,
projectId: PROJECT_ID,
upstreamApiKey: 'sa-byok-key',
upstreamBaseUrl: 'https://api.senseaudio.cn',
// Keep tests fast — 1 ms between polls instead of the production 5 s.
videoPollIntervalMs: 1,
});
it('creates, polls until completed, downloads, and writes the mp4 into the project folder', async () => {
const mp4Bytes = Buffer.from([0x00, 0x00, 0x00, 0x18, 0x66, 0x74, 0x79, 0x70]);
let pollCount = 0;
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url === 'https://api.senseaudio.cn/v1/video/create') {
expect(init?.method).toBe('POST');
expect(init?.headers).toMatchObject({
authorization: 'Bearer sa-byok-key',
'content-type': 'application/json',
});
const body = JSON.parse(String(init?.body));
expect(body).toEqual({
model: 'doubao-seedance-2-0-260128',
content: [{ type: 'text', text: 'a sunset over the ocean' }],
duration: 8,
resolution: '1080p',
ratio: '16:9',
provider_specific: { generate_audio: true },
});
return new Response(
JSON.stringify({ task_id: 'task-abc' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status?id=task-abc')) {
pollCount++;
if (pollCount === 1) {
return new Response(
JSON.stringify({ status: 'pending', progress: 0 }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (pollCount === 2) {
return new Response(
JSON.stringify({ status: 'processing', progress: 50 }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(
JSON.stringify({
status: 'completed',
progress: 100,
video_url: 'https://cdn.example.test/video/done.mp4',
duration: 8,
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/video/done.mp4') {
return new Response(mp4Bytes, {
status: 200,
headers: { 'content-type': 'video/mp4' },
});
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{
prompt: 'a sunset over the ocean',
aspect_ratio: '16:9',
duration: 8,
resolution: '1080p',
generate_audio: true,
},
baseCtx(),
);
expect(result.ok).toBe(true);
expect(result.url).toMatch(
new RegExp(`^/api/projects/${PROJECT_ID}/files/byok-video-[a-z0-9-]+\\.mp4$`),
);
// 1× create + 3× poll + 1× download = 5 fetches total.
expect(fetchMock).toHaveBeenCalledTimes(5);
expect(pollCount).toBe(3);
const filename = result.url!.split('/').pop()!;
const onDisk = await readFile(path.join(projectsRoot, PROJECT_ID, filename));
expect(onDisk.equals(mp4Bytes)).toBe(true);
});
it('defaults duration / resolution / aspect when caller omits them', async () => {
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
const body = JSON.parse(String(init?.body));
expect(body).toMatchObject({
duration: 5,
resolution: '720p',
ratio: '16:9',
provider_specific: { generate_audio: false },
});
return new Response(
JSON.stringify({ task_id: 'task-defaults' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({
status: 'completed',
video_url: 'https://cdn.example.test/video/d.mp4',
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(Buffer.from([0x01]), { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({ prompt: 'minimal' }, baseCtx());
expect(result.ok).toBe(true);
});
it('clamps duration outside the 4–15 range and rejects non-enum aspect_ratio / resolution', async () => {
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
const body = JSON.parse(String(init?.body));
// 99 → clamped to 15; 'octagonal' → falls back to '16:9';
// '8k' → falls back to '720p'.
expect(body).toMatchObject({
duration: 15,
resolution: '720p',
ratio: '16:9',
});
return new Response(
JSON.stringify({ task_id: 'task-clamp' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({
status: 'completed',
video_url: 'https://cdn.example.test/clamp.mp4',
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
return new Response(Buffer.from([0x02]), { status: 200 });
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{
prompt: 'overflow',
duration: 99,
aspect_ratio: 'octagonal',
resolution: '8k',
},
baseCtx(),
);
expect(result.ok).toBe(true);
});
it('surfaces a failed status as a tool error so the model can apologize', async () => {
const fetchMock = vi.fn(async (input: unknown) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
return new Response(
JSON.stringify({ task_id: 'task-fail' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({
status: 'failed',
error_message: 'sensitive_content_blocked',
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{ prompt: 'blocked content' },
baseCtx(),
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/senseaudio video failed/);
expect(result.error).toMatch(/sensitive_content_blocked/);
});
it('times out after SENSEAUDIO_VIDEO_MAX_POLLS polls when the job stays pending', async () => {
const fetchMock = vi.fn(async (input: unknown) => {
const url = String(input);
if (url.endsWith('/v1/video/create')) {
return new Response(
JSON.stringify({ task_id: 'task-stuck' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url.startsWith('https://api.senseaudio.cn/v1/video/status')) {
return new Response(
JSON.stringify({ status: 'pending', progress: 0 }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{ prompt: 'stuck job' },
baseCtx(),
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/timed out/);
// 1× create + 120× poll = 121 fetches (10-min ceiling at 5 s
// intervals — kept generous because doubao-seedance frequently
// spends 3–8 min on the gateway for 1080p+audio jobs).
expect(fetchMock).toHaveBeenCalledTimes(121);
}, 30_000);
it('returns a tool error when create response is missing task_id', async () => {
const fetchMock = vi.fn(async () =>
new Response('{"oops": true}', {
status: 200,
headers: { 'content-type': 'application/json' },
}),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/missing task_id/);
});
it('returns a tool error when create call returns non-2xx', async () => {
const fetchMock = vi.fn(async () =>
new Response('unauthorized', {
status: 401,
headers: { 'content-type': 'text/plain' },
}),
);
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({ prompt: 'x' }, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/senseaudio video create 401/);
});
it('rejects an unsafe projectId before any upstream call', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo(
{ prompt: 'x' },
{ ...baseCtx(), projectId: '../escape' },
);
expect(result.ok).toBe(false);
expect(result.error).toMatch(/invalid projectId/);
expect(fetchMock).not.toHaveBeenCalled();
});
it('rejects empty prompt before any upstream call', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const result = await executeGenerateVideo({}, baseCtx());
expect(result.ok).toBe(false);
expect(result.error).toMatch(/prompt is required/);
expect(fetchMock).not.toHaveBeenCalled();
});
});
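
The video tests above pin down two behaviours: out-of-range arguments are normalised (duration 99 becomes 15, a missing duration becomes 5), and polling stops after a fixed number of attempts (120 polls at the production 5 s interval, which is 120 × 5 s = 600 s, the 10-minute ceiling). A minimal sketch of both, assuming an injected status fetcher. `clampDuration`, `pollVideo`, `VideoStatus`, and `PollResult` are hypothetical names, and rounding fractional durations is an assumption of the sketch rather than something the tests confirm.

```typescript
// Hypothetical status shape, modelled on the mocked /v1/video/status bodies.
type VideoStatus =
  | { status: 'pending' | 'processing'; progress?: number }
  | { status: 'completed'; video_url: string }
  | { status: 'failed'; error_message?: string };

type PollResult =
  | { ok: true; videoUrl: string; polls: number }
  | { ok: false; error: string; polls: number };

// Clamp to [min, max]; non-numeric input falls back to the default duration.
function clampDuration(value: unknown, min = 4, max = 15, fallback = 5): number {
  if (typeof value !== 'number' || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.round(value)));
}

// Poll until the job completes or fails, giving up after maxPolls attempts.
async function pollVideo(
  fetchStatus: () => Promise<VideoStatus>,
  opts: { maxPolls: number; intervalMs: number },
): Promise<PollResult> {
  for (let attempt = 1; attempt <= opts.maxPolls; attempt++) {
    const s = await fetchStatus();
    if (s.status === 'completed') {
      return { ok: true, videoUrl: s.video_url, polls: attempt };
    }
    if (s.status === 'failed') {
      return {
        ok: false,
        error: `senseaudio video failed: ${s.error_message ?? 'unknown'}`,
        polls: attempt,
      };
    }
    // Only wait when another attempt will follow.
    if (attempt < opts.maxPolls) {
      await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
    }
  }
  return {
    ok: false,
    error: `senseaudio video timed out after ${opts.maxPolls} polls`,
    polls: opts.maxPolls,
  };
}
```

Injecting `fetchStatus` and `intervalMs` mirrors the `videoPollIntervalMs: 1` override used in `baseCtx()`, so the timeout path can be exercised without waiting ten minutes.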


@@ -8,6 +8,7 @@ import {
readMaskedConfig,
resolveModelAlias,
resolveProviderConfig,
seedProviderIfMissing,
writeConfig,
} from '../src/media-config.js';
@@ -868,3 +869,159 @@ describe('media-config model alias resolution (issue #1277)', () => {
).toBe('doubao-seedream-5-0');
});
});
describe('seedProviderIfMissing', () => {
let projectRoot: string;
const SENSEAUDIO_ENV_KEYS = ['OD_SENSEAUDIO_API_KEY', 'SENSEAUDIO_API_KEY'];
const originalEnv = Object.fromEntries(
SENSEAUDIO_ENV_KEYS.map((key) => [key, process.env[key]]),
);
const originalMediaConfigDir = process.env.OD_MEDIA_CONFIG_DIR;
const originalDataDir = process.env.OD_DATA_DIR;
beforeEach(async () => {
projectRoot = await mkdtemp(path.join(tmpdir(), 'od-media-seed-'));
for (const key of SENSEAUDIO_ENV_KEYS) {
delete process.env[key];
}
delete process.env.OD_MEDIA_CONFIG_DIR;
delete process.env.OD_DATA_DIR;
});
afterEach(async () => {
for (const key of SENSEAUDIO_ENV_KEYS) {
if (originalEnv[key] == null) {
delete process.env[key];
} else {
process.env[key] = originalEnv[key];
}
}
if (originalMediaConfigDir == null) {
delete process.env.OD_MEDIA_CONFIG_DIR;
} else {
process.env.OD_MEDIA_CONFIG_DIR = originalMediaConfigDir;
}
if (originalDataDir == null) {
delete process.env.OD_DATA_DIR;
} else {
process.env.OD_DATA_DIR = originalDataDir;
}
await rm(projectRoot, { recursive: true, force: true });
});
async function writeStored(data: unknown) {
const file = path.join(projectRoot, '.od', 'media-config.json');
await mkdir(path.dirname(file), { recursive: true });
await writeFile(file, JSON.stringify(data), 'utf8');
}
async function readStoredJson(): Promise<unknown> {
const file = path.join(projectRoot, '.od', 'media-config.json');
const raw = await readFile(file, 'utf8');
return JSON.parse(raw);
}
it('writes a fresh entry when the slot is empty', async () => {
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-test-key',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(true);
const stored = await readStoredJson();
expect(stored).toEqual({
providers: {
senseaudio: {
apiKey: 'sa-test-key',
baseUrl: 'https://api.senseaudio.cn',
},
},
});
});
it('no-ops and preserves the stored key when one is already configured', async () => {
await writeStored({
providers: {
senseaudio: { apiKey: 'pre-existing-key', baseUrl: 'https://existing.example' },
},
});
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'newer-byok-key',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(false);
const stored = (await readStoredJson()) as { providers: Record<string, unknown> };
expect(stored.providers.senseaudio).toEqual({
apiKey: 'pre-existing-key',
baseUrl: 'https://existing.example',
});
});
it('preserves every other provider and aliases when seeding', async () => {
await writeStored({
providers: {
openai: { apiKey: 'sk-openai', baseUrl: 'https://api.openai.com/v1' },
volcengine: { apiKey: 'ark-key', baseUrl: 'https://ark.cn-beijing.volces.com/api/v3' },
},
aliases: { 'doubao-seedream-3-0-t2i-250415': 'doubao-seedream-5-0' },
});
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-new',
});
expect(wrote).toBe(true);
const stored = (await readStoredJson()) as {
providers: Record<string, unknown>;
aliases: Record<string, string>;
};
expect(stored.providers.openai).toEqual({
apiKey: 'sk-openai',
baseUrl: 'https://api.openai.com/v1',
});
expect(stored.providers.volcengine).toEqual({
apiKey: 'ark-key',
baseUrl: 'https://ark.cn-beijing.volces.com/api/v3',
});
expect(stored.providers.senseaudio).toEqual({ apiKey: 'sa-new' });
expect(stored.aliases).toEqual({
'doubao-seedream-3-0-t2i-250415': 'doubao-seedream-5-0',
});
});
it('no-ops when an env var resolves a key for the provider', async () => {
process.env.OD_SENSEAUDIO_API_KEY = 'env-key';
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-byok-key',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(false);
await expect(readStoredJson()).rejects.toThrow();
});
it('no-ops on empty apiKey', async () => {
const wrote = await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: '',
baseUrl: 'https://api.senseaudio.cn',
});
expect(wrote).toBe(false);
await expect(readStoredJson()).rejects.toThrow();
});
it('no-ops for unknown provider ids', async () => {
const wrote = await seedProviderIfMissing(projectRoot, 'not-a-provider', {
apiKey: 'whatever',
});
expect(wrote).toBe(false);
await expect(readStoredJson()).rejects.toThrow();
});
it('resolves the seeded key through resolveProviderConfig', async () => {
await seedProviderIfMissing(projectRoot, 'senseaudio', {
apiKey: 'sa-final',
baseUrl: 'https://api.senseaudio.cn',
});
const resolved = await resolveProviderConfig(projectRoot, 'senseaudio');
expect(resolved).toEqual({
apiKey: 'sa-final',
baseUrl: 'https://api.senseaudio.cn',
});
});
});
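Reviewer note: the `seedProviderIfMissing` cases above pin down a small decision table (unknown provider, empty key, env-resolved key, and already-stored key all no-op; otherwise the entry is written and everything else in the file is preserved). A minimal in-memory sketch of that contract follows. The names `seedIfMissing`, `KNOWN_PROVIDERS`, and `ENV_KEYS` are hypothetical, and the order of the checks is an assumption; the real function in `media-config.ts` reads and writes `.od/media-config.json` and may be structured differently.

```typescript
// Illustrative only: an in-memory model of the seed-if-missing contract the
// tests above describe. Every name here is hypothetical, not the daemon's API.
interface ProviderEntry {
  apiKey: string;
  baseUrl?: string;
}

interface StoredConfig {
  providers?: Record<string, ProviderEntry>;
  aliases?: Record<string, string>;
}

// Assumed lookup tables; the real provider registry lives elsewhere.
const KNOWN_PROVIDERS = new Set(['senseaudio', 'openai', 'volcengine']);
const ENV_KEYS: Record<string, string[]> = {
  senseaudio: ['OD_SENSEAUDIO_API_KEY', 'SENSEAUDIO_API_KEY'],
};

function seedIfMissing(
  stored: StoredConfig,
  env: Record<string, string | undefined>,
  providerId: string,
  entry: ProviderEntry,
): { wrote: boolean; config: StoredConfig } {
  const unchanged = { wrote: false, config: stored };
  // Unknown provider ids and empty keys never write.
  if (!KNOWN_PROVIDERS.has(providerId)) return unchanged;
  if (!entry.apiKey) return unchanged;
  // An env-resolved key wins, so nothing is persisted.
  if ((ENV_KEYS[providerId] ?? []).some((name) => Boolean(env[name]))) return unchanged;
  // A key that is already stored is never overwritten.
  if (stored.providers?.[providerId]?.apiKey) return unchanged;
  // Otherwise add the entry, keeping other providers and aliases untouched.
  return {
    wrote: true,
    config: {
      ...stored,
      providers: { ...stored.providers, [providerId]: { ...entry } },
    },
  };
}
```

Returning a new config object instead of mutating the input keeps the no-op paths trivially side-effect free, which is what the `rejects.toThrow()` assertions above check on disk.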

@@ -0,0 +1,305 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import path from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { generateMedia } from '../src/media.js';
const TEST_SENSEAUDIO_BASE_URL = 'https://senseaudio-gateway.example.test';
const TEST_IMAGE_URL = 'https://cdn.example.test/generated/abc.png';
const TEST_IMAGE_BYTES = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x01]);
function buildOkResponse(url = TEST_IMAGE_URL) {
return new Response(
JSON.stringify({ url, base_resp: { status_code: 0, status_msg: 'success' } }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
function buildImageFetchResponse(bytes: Buffer) {
return new Response(bytes, {
status: 200,
headers: { 'content-type': 'image/png' },
});
}
describe('senseaudio image generation', () => {
let root: string;
let projectRoot: string;
let projectsRoot: string;
const realFetch = globalThis.fetch;
const originalMediaConfigDir = process.env.OD_MEDIA_CONFIG_DIR;
const originalDataDir = process.env.OD_DATA_DIR;
beforeEach(async () => {
root = await mkdtemp(path.join(tmpdir(), 'od-senseaudio-image-'));
projectRoot = path.join(root, 'project-root');
projectsRoot = path.join(projectRoot, '.od', 'projects');
await mkdir(projectsRoot, { recursive: true });
delete process.env.OD_MEDIA_CONFIG_DIR;
delete process.env.OD_DATA_DIR;
delete process.env.OD_SENSEAUDIO_API_KEY;
delete process.env.SENSEAUDIO_API_KEY;
});
afterEach(async () => {
globalThis.fetch = realFetch;
if (originalMediaConfigDir == null) {
delete process.env.OD_MEDIA_CONFIG_DIR;
} else {
process.env.OD_MEDIA_CONFIG_DIR = originalMediaConfigDir;
}
if (originalDataDir == null) {
delete process.env.OD_DATA_DIR;
} else {
process.env.OD_DATA_DIR = originalDataDir;
}
delete process.env.OD_SENSEAUDIO_API_KEY;
delete process.env.SENSEAUDIO_API_KEY;
await rm(root, { recursive: true, force: true });
});
async function writeConfig(data: unknown) {
const file = path.join(projectRoot, '.od', 'media-config.json');
await mkdir(path.dirname(file), { recursive: true });
await writeFile(file, JSON.stringify(data), 'utf8');
}
it('renders a SenseAudio image with the documented sync defaults', async () => {
await writeConfig({
providers: {
senseaudio: {
apiKey: 'sense-test-key',
baseUrl: TEST_SENSEAUDIO_BASE_URL,
},
},
});
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const urlStr = String(input);
if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
expect(init?.method).toBe('POST');
expect(init?.headers).toMatchObject({
authorization: 'Bearer sense-test-key',
'content-type': 'application/json',
});
expect(JSON.parse(String(init?.body))).toEqual({
model: 'senseaudio-image-2.0-260319',
prompt: 'A magazine-style hero poster.',
size: '1024x1024',
});
return buildOkResponse();
}
if (urlStr === TEST_IMAGE_URL) {
return buildImageFetchResponse(TEST_IMAGE_BYTES);
}
throw new Error(`unexpected fetch: ${urlStr}`);
});
vi.stubGlobal('fetch', fetchMock);
const result = await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'A magazine-style hero poster.',
output: 'sa-hero.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
expect(result.providerId).toBe('senseaudio');
expect(result.providerNote).toContain('senseaudio/senseaudio-image-2.0-260319');
expect(result.providerNote).toContain('1024x1024');
const bytes = await readFile(path.join(projectsRoot, 'project-1', 'sa-hero.png'));
expect(bytes.equals(TEST_IMAGE_BYTES)).toBe(true);
});
it('maps aspect ratios to the SenseAudio size strings', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
const urlStr = String(input);
if (urlStr === `${TEST_SENSEAUDIO_BASE_URL}/v1/image/sync`) {
expect(JSON.parse(String(init?.body)).size).toBe('1280x720');
return buildOkResponse();
}
return buildImageFetchResponse(TEST_IMAGE_BYTES);
});
vi.stubGlobal('fetch', fetchMock);
await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-1.0-260319',
aspect: '16:9',
prompt: 'Widescreen banner.',
output: 'sa-banner.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
});
it('falls back to the canonical base URL when none is configured', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key' },
},
});
const fetchMock = vi.fn(async (input: unknown) => {
const urlStr = String(input);
if (urlStr === 'https://api.senseaudio.cn/v1/image/sync') {
return buildOkResponse();
}
return buildImageFetchResponse(TEST_IMAGE_BYTES);
});
vi.stubGlobal('fetch', fetchMock);
await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'doubao-seedream-5-0-260128',
prompt: 'Default base url.',
output: 'sa-default-base.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
});
it('reads the API key from OD_SENSEAUDIO_API_KEY when storage is empty', async () => {
process.env.OD_SENSEAUDIO_API_KEY = 'env-sense-key';
const fetchMock = vi.fn(async (input: unknown, init?: RequestInit) => {
if (String(input).endsWith('/v1/image/sync')) {
expect(init?.headers).toMatchObject({ authorization: 'Bearer env-sense-key' });
return buildOkResponse();
}
return buildImageFetchResponse(TEST_IMAGE_BYTES);
});
vi.stubGlobal('fetch', fetchMock);
await generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Env-only key.',
output: 'sa-env.png',
});
expect(fetchMock).toHaveBeenCalledTimes(2);
});
it('errors when no API key is configured', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Should fail.',
output: 'sa-no-key.png',
}),
).rejects.toThrow(/no SenseAudio API key/);
expect(fetchMock).not.toHaveBeenCalled();
});
it('surfaces HTTP-level failures with the status code and truncated body', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async () =>
new Response('unauthorized', {
status: 401,
headers: { 'content-type': 'text/plain' },
}),
);
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Bad auth.',
output: 'sa-401.png',
}),
).rejects.toThrow('senseaudio image 401: unauthorized');
});
it('surfaces upstream error_message verbatim when the body reports failure', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ error_message: 'sensitive_content_blocked' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Blocked.',
output: 'sa-blocked.png',
}),
).rejects.toThrow('senseaudio image api error: sensitive_content_blocked');
});
it('errors when the response body is missing the image url', async () => {
await writeConfig({
providers: {
senseaudio: { apiKey: 'sense-test-key', baseUrl: TEST_SENSEAUDIO_BASE_URL },
},
});
const fetchMock = vi.fn(async () =>
new Response(
JSON.stringify({ base_resp: { status_code: 0, status_msg: 'success' } }),
{ status: 200, headers: { 'content-type': 'application/json' } },
),
);
vi.stubGlobal('fetch', fetchMock);
await expect(
generateMedia({
projectRoot,
projectsRoot,
projectId: 'project-1',
surface: 'image',
model: 'senseaudio-image-2.0-260319',
prompt: 'Missing url.',
output: 'sa-missing-url.png',
}),
).rejects.toThrow('senseaudio image response missing url');
});
});
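Reviewer note: the tests above confirm only two points of the aspect-to-size mapping (no aspect gives `1024x1024`, `16:9` gives `1280x720`). The commit message lists the full accepted size set, so the mapping could be sketched as below. The helper name `sizeForAspect`, the pairing of the remaining aspects with sizes, and the fallback for unknown aspects are assumptions for illustration, not the table in `media.ts`.

```typescript
// Illustrative only: a hypothetical aspect-to-size lookup. Only the default
// ('1024x1024') and the '16:9' entry are confirmed by the tests above; the
// other pairings are assumed from the size list in the commit message.
const SENSEAUDIO_SIZE_BY_ASPECT = new Map<string, string>([
  ['1:1', '1024x1024'],
  ['16:9', '1280x720'],
  ['9:16', '720x1280'],
  ['4:3', '1024x768'],
  ['3:4', '768x1024'],
]);

function sizeForAspect(aspect?: string): string {
  // Missing or unrecognised aspects fall back to the square default (assumed).
  const size = aspect !== undefined ? SENSEAUDIO_SIZE_BY_ASPECT.get(aspect) : undefined;
  return size ?? '1024x1024';
}
```

Keeping one table shared by the BYOK tool and the CLI agent path would avoid the drift the follow-up fix commit describes.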

@@ -523,6 +523,497 @@ describe('API proxy routes', () => {
expect(upstreamInit?.redirect).toBe('error');
});
it('streams delta + end for SenseAudio chat completions', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse([
'data: {"choices":[{"delta":{"content":"sense"}}]}',
'',
'data: [DONE]',
'',
].join('\n')));
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hello' }],
}),
});
await expect(res.text()).resolves.toContain('event: delta\ndata: {"delta":"sense"}');
expect(fetchMock).toHaveBeenCalledWith(
'https://api.senseaudio.cn/v1/chat/completions',
expect.objectContaining({
headers: expect.objectContaining({ Authorization: 'Bearer sa-test' }),
redirect: 'error',
}),
);
});
it('defaults SenseAudio base URL to api.senseaudio.cn when caller omits it', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse('data: [DONE]\n\n'));
});
vi.stubGlobal('fetch', fetchMock);
await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(String(fetchMock.mock.calls[0]![0])).toBe(
'https://api.senseaudio.cn/v1/chat/completions',
);
});
it('rejects SenseAudio requests that omit apiKey or model', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const missingKey = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(missingKey.status).toBe(400);
const missingModel = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(missingModel.status).toBe(400);
expect(fetchMock).not.toHaveBeenCalled();
});
it('disables upstream redirects for senseaudio proxy requests', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse('data: [DONE]\n\n'));
});
vi.stubGlobal('fetch', fetchMock);
await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'model-one',
messages: [{ role: 'user', content: 'hi' }],
}),
});
const upstreamCall = fetchMock.mock.calls.find(([input]) =>
!String(input).startsWith(baseUrl),
);
expect(upstreamCall).toBeDefined();
const upstreamInit = upstreamCall![1] as FetchInit;
expect(upstreamInit?.redirect).toBe('error');
});
it('injects generate_image tool definition on every SenseAudio request', async () => {
const fetchMock = vi.fn((input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
return Promise.resolve(sseResponse([
'data: {"choices":[{"delta":{"content":"ok"}}]}',
'',
'data: [DONE]',
'',
].join('\n')));
});
vi.stubGlobal('fetch', fetchMock);
await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
}),
});
const upstreamCall = fetchMock.mock.calls.find(([input]) =>
!String(input).startsWith(baseUrl),
);
expect(upstreamCall).toBeDefined();
const body = JSON.parse(String((upstreamCall![1] as FetchInit)?.body));
expect(body.tool_choice).toBe('auto');
expect(Array.isArray(body.tools)).toBe(true);
expect(body.tools[0]).toMatchObject({
type: 'function',
function: { name: 'generate_image' },
});
});
it('runs the BYOK image tool loop end-to-end', async () => {
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x01]);
const upstreamChatBodies: any[] = [];
let chatCallIndex = 0;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
// SenseAudio image generation
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({
url: 'https://cdn.example.test/cat.png',
base_resp: { status_code: 0, status_msg: 'success' },
}),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
// Image bytes download (initiated by the tool, not via the proxy)
if (url === 'https://cdn.example.test/cat.png') {
return new Response(pngBytes, {
status: 200,
headers: { 'content-type': 'image/png' },
});
}
// Upstream chat completions — capture bodies, return different SSE per call
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
upstreamChatBodies.push(JSON.parse(String(init?.body || '{}')));
chatCallIndex++;
if (chatCallIndex === 1) {
// First turn: model decides to call generate_image
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"a cat\\"}"}}]},"finish_reason":null}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
// Second turn: model summarises with image embedded in markdown
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"content":"Here is your cat: "}}]}',
'',
'data: {"choices":[{"index":0,"delta":{"content":"![cat](generated)"}}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'draw a cat' }],
}),
});
expect(res.status).toBe(200);
const body = await res.text();
// Final assistant text streams through to the client
expect(body).toContain('event: delta');
expect(body).toContain('Here is your cat');
expect(body).toContain('![cat](generated)');
expect(body).toContain('event: end');
// Two upstream chat completions calls happened (loop ran exactly once)
expect(upstreamChatBodies).toHaveLength(2);
// Second upstream call includes assistant{tool_calls} + tool{result}
const secondMessages = upstreamChatBodies[1].messages;
expect(secondMessages).toHaveLength(3);
expect(secondMessages[0]).toEqual({ role: 'user', content: 'draw a cat' });
expect(secondMessages[1]).toMatchObject({
role: 'assistant',
content: null,
tool_calls: [
{
id: 'call_abc',
type: 'function',
function: {
name: 'generate_image',
arguments: '{"prompt":"a cat"}',
},
},
],
});
expect(secondMessages[2]).toMatchObject({
role: 'tool',
tool_call_id: 'call_abc',
content: expect.stringMatching(
/Image generated successfully\. URL: \/api\/projects\/test-project\/files\/byok-[a-z0-9-]+\.png/,
),
});
});
it('feeds a tool error message back to the model when generate_image fails', async () => {
const upstreamChatBodies: any[] = [];
let chatCallIndex = 0;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({ error_message: 'sensitive_content_blocked' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
upstreamChatBodies.push(JSON.parse(String(init?.body || '{}')));
chatCallIndex++;
if (chatCallIndex === 1) {
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_err","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"...\\"}"}}]},"finish_reason":null}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"content":"Sorry, that one was blocked."}}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'draw something blocked' }],
}),
});
expect(res.status).toBe(200);
const body = await res.text();
expect(body).toContain('Sorry, that one was blocked');
expect(upstreamChatBodies).toHaveLength(2);
const toolMsg = upstreamChatBodies[1].messages[2];
expect(toolMsg.role).toBe('tool');
expect(toolMsg.tool_call_id).toBe('call_err');
expect(toolMsg.content).toMatch(/Image generation failed/);
expect(toolMsg.content).toMatch(/sensitive_content_blocked/);
});
it('bounds the BYOK tool loop at MAX_BYOK_TOOL_LOOPS=3', async () => {
let chatCallIndex = 0;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/x.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/x.png') {
return new Response(Buffer.from([0x89, 0x50]), { status: 200 });
}
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
chatCallIndex++;
// Always return tool_calls — the model never returns text
return sseResponse([
`data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_${chatCallIndex}","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"x\\"}"}}]},"finish_reason":null}]}`,
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'infinite' }],
}),
});
expect(res.status).toBe(200);
const body = await res.text();
expect(body).toContain('event: end');
// Loop ran exactly MAX_BYOK_TOOL_LOOPS times before bailing.
expect(chatCallIndex).toBe(3);
});
it('writes the generated image into the project folder and serves it via /api/projects/:id/files/*', async () => {
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x42, 0x59]);
let capturedUrl: string | undefined;
const fetchMock = vi.fn(async (input: FetchInput, init?: FetchInit) => {
const url = String(input);
if (url.startsWith(baseUrl)) return realFetch(input, init);
if (url === 'https://api.senseaudio.cn/v1/image/sync') {
return new Response(
JSON.stringify({ url: 'https://cdn.example.test/served.png' }),
{ status: 200, headers: { 'content-type': 'application/json' } },
);
}
if (url === 'https://cdn.example.test/served.png') {
return new Response(pngBytes, { status: 200 });
}
if (url === 'https://api.senseaudio.cn/v1/chat/completions') {
const body = JSON.parse(String(init?.body || '{}'));
// Capture URL the tool produced from the second turn's tool message.
const toolMsg = body.messages?.find((m: any) => m.role === 'tool');
if (toolMsg) {
const match = /URL: (\/api\/projects\/[A-Za-z0-9._-]+\/files\/byok-[a-z0-9-]+\.png)/.exec(toolMsg.content);
if (match) capturedUrl = match[1];
}
const isToolTurn = !toolMsg;
if (isToolTurn) {
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_serve","type":"function","function":{"name":"generate_image","arguments":"{\\"prompt\\":\\"s\\"}"}}]},"finish_reason":null}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
return sseResponse([
'data: {"choices":[{"index":0,"delta":{"content":"done"}}]}',
'',
'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
'',
'data: [DONE]',
'',
].join('\n'));
}
throw new Error(`unexpected fetch: ${url}`);
});
vi.stubGlobal('fetch', fetchMock);
const proxyRes = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
baseUrl: 'https://api.senseaudio.cn',
apiKey: 'sa-test',
projectId: 'test-project',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'gen' }],
}),
});
// Drain the SSE body so the tool loop fully completes before we assert.
await proxyRes.text();
expect(capturedUrl).toBeDefined();
// The URL the tool emits is relative — same-origin via Next.js
// rewrite in production, hits this test server directly here.
// We GET the captured URL through the standard project file route
// and assert the bytes come back. This proves both halves:
// (1) the image landed in <projectsRoot>/<projectId>/ as expected
// (so listFiles / FileViewer / archive will find it), and
// (2) /api/projects/:id/files/* serves it without needing any
// byok-specific route.
const imgRes = await realFetch(`${baseUrl}${capturedUrl!}`);
expect(imgRes.status).toBe(200);
expect(imgRes.headers.get('content-type')).toMatch(/^image\/png/);
const served = Buffer.from(await imgRes.arrayBuffer());
expect(served.equals(pngBytes)).toBe(true);
});
it('rejects senseaudio chat requests without a projectId', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
model: 'senseaudio-s2',
messages: [{ role: 'user', content: 'hi' }],
// no projectId — should 400
}),
});
expect(res.status).toBe(400);
expect(fetchMock).not.toHaveBeenCalled();
});
it('rejects senseaudio chat requests with an unsafe projectId', async () => {
const fetchMock = vi.fn();
vi.stubGlobal('fetch', fetchMock);
const res = await realFetch(`${baseUrl}/api/proxy/senseaudio/stream`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({
apiKey: 'sa-test',
model: 'senseaudio-s2',
projectId: '../etc/passwd',
messages: [{ role: 'user', content: 'hi' }],
}),
});
expect(res.status).toBe(400);
expect(fetchMock).not.toHaveBeenCalled();
});
// Plan §3.A4 / spec §11.8 (e2e-7): the API-fallback proxy paths must
// never carry plugin context. The web sidecar's fallback mode bypasses
// the daemon snapshot bus, so any pluginId / appliedPluginSnapshotId in
@@ -534,6 +1025,7 @@ describe('API proxy routes', () => {
'/api/proxy/openai/stream',
'/api/proxy/azure/stream',
'/api/proxy/google/stream',
'/api/proxy/senseaudio/stream',
];
for (const path of proxies) {
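
Reviewer note: the tool-loop tests in this file assert that one tool call costs two upstream chat calls and that a model which always asks for a tool is cut off after three chat calls (`MAX_BYOK_TOOL_LOOPS=3`). A minimal sketch of a loop with that shape follows. The types, the name `runToolLoop`, and the simplified tool-call shape are hypothetical, and the callbacks are synchronous only to keep the sketch short; the daemon's loop streams SSE and awaits real network calls.

```typescript
// Illustrative only: hypothetical types and names sketching a bounded tool
// loop. The real wire format nests name/arguments under `function`.
interface ToolCall {
  id: string;
  name: string;
  arguments: string;
}

interface ChatMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

interface ModelTurn {
  text: string;
  toolCalls: ToolCall[];
}

const MAX_TOOL_LOOPS = 3; // mirrors the bound named in the test title

function runToolLoop(
  messages: ChatMessage[],
  callModel: (history: ChatMessage[]) => ModelTurn,
  runTool: (call: ToolCall) => string,
): { text: string; chatCalls: number } {
  const history = [...messages];
  let chatCalls = 0;
  while (chatCalls < MAX_TOOL_LOOPS) {
    const turn = callModel(history);
    chatCalls++;
    // A turn without tool calls is the final answer.
    if (turn.toolCalls.length === 0) return { text: turn.text, chatCalls };
    // Otherwise record the assistant turn and one tool result per call,
    // so the next chat call sees assistant{tool_calls} + tool{result}.
    history.push({ role: 'assistant', content: null, tool_calls: turn.toolCalls });
    for (const call of turn.toolCalls) {
      history.push({ role: 'tool', tool_call_id: call.id, content: runTool(call) });
    }
  }
  // Bound reached: stop calling upstream and end with whatever text exists.
  return { text: '', chatCalls };
}
```

Counting chat calls instead of tool executions matches what the bound test measures (`chatCallIndex`), so a turn that requests several tools at once still counts as one iteration.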

@@ -14,6 +14,7 @@ import {
trackStudioClickChatComposer,
trackStudioViewChatPanel,
} from '../analytics/events';
import { IMAGE_MODELS } from "../media/models";
import { projectRawUrl, uploadProjectFiles, openFolderDialog, fetchConnectors } from "../providers/registry";
import { patchProject } from "../state/projects";
import { fetchMcpServers } from "../state/mcp";
@@ -126,6 +127,14 @@ interface Props {
researchAvailable?: boolean;
projectMetadata?: ProjectMetadata;
onProjectMetadataChange?: (metadata: ProjectMetadata) => void;
// SenseAudio BYOK image-model picker shown above the textarea. Hidden
// when the active chat protocol is anything other than 'senseaudio',
// so the composer stays clean for every other BYOK tab. The state
// owner is ProjectView (per-session, reset on refresh); ChatComposer
// is a fully controlled select.
byokApiProtocol?: AppConfig['apiProtocol'];
byokImageModel?: string;
onChangeByokImageModel?: (model: string) => void;
currentSkillId?: string | null;
onProjectSkillChange?: (skillId: string | null) => void;
// Set when the project was created with a plugin already pinned
@@ -188,6 +197,9 @@ export const ChatComposer = forwardRef<ChatComposerHandle, Props>(
researchAvailable = false,
projectMetadata,
onProjectMetadataChange,
byokApiProtocol,
byokImageModel,
onChangeByokImageModel,
currentSkillId = null,
onProjectSkillChange,
pinnedPluginId = null,
@@ -1186,6 +1198,53 @@ export const ChatComposer = forwardRef<ChatComposerHandle, Props>(
t={t}
/>
) : null}
{byokApiProtocol === 'senseaudio' && onChangeByokImageModel ? (
<div
className="composer-byok-image-model"
data-testid="composer-byok-image-model"
style={{
display: 'flex',
alignItems: 'center',
gap: 8,
padding: '4px 8px',
fontSize: 12,
color: 'var(--text-muted, #888)',
}}
>
<Icon name="image" size={13} />
<label
htmlFor="composer-byok-image-model-select"
style={{ flexShrink: 0 }}
>
{t('settings.byokImageModel')}
</label>
<select
id="composer-byok-image-model-select"
value={byokImageModel ?? ''}
onChange={(e) => onChangeByokImageModel(e.target.value)}
style={{
background: 'transparent',
border: '1px solid var(--border, #444)',
borderRadius: 4,
padding: '2px 6px',
color: 'inherit',
fontSize: 12,
}}
>
<option value="">
{(IMAGE_MODELS.find((m) => m.provider === 'senseaudio')?.label
?? 'senseaudio-image-2.0') + ' (default)'}
</option>
{IMAGE_MODELS.filter((m) => m.provider === 'senseaudio').map(
(m) => (
<option key={m.id} value={m.id}>
{m.label}
</option>
),
)}
</select>
</div>
) : null}
{/*
Spec §8.4 context bar above the composer input. The
section now behaves as a pure context bar: it renders the

@@ -279,6 +279,12 @@ interface Props {
// message" without forcing a separate side widget.
activePluginSnapshot?: AppliedPluginSnapshot | null;
onCollapse?: () => void;
// SenseAudio BYOK only — wired straight through to ChatComposer for the
// in-composer image-model picker. Active protocol is read so the picker
// hides when the user is on any other BYOK tab (azure / openai / …).
byokApiProtocol?: AppConfig['apiProtocol'];
byokImageModel?: string;
onChangeByokImageModel?: (model: string) => void;
}
type Tab = 'chat' | 'comments';
@@ -327,6 +333,9 @@ export function ChatPane({
activePluginSnapshot,
skills = [],
onCollapse,
byokApiProtocol,
byokImageModel,
onChangeByokImageModel,
}: Props) {
const t = useT();
const logRef = useRef<HTMLDivElement | null>(null);
@@ -872,6 +881,9 @@ export function ChatPane({
researchAvailable={researchAvailable}
projectMetadata={projectMetadata}
onProjectMetadataChange={onProjectMetadataChange}
byokApiProtocol={byokApiProtocol}
byokImageModel={byokImageModel}
onChangeByokImageModel={onChangeByokImageModel}
currentSkillId={currentSkillId}
onProjectSkillChange={onProjectSkillChange}
pinnedPluginId={activePluginSnapshot?.pluginId ?? null}

@@ -1192,7 +1192,14 @@ export function DesignFilesPanel({
</div>
</div>
{preview && previewFile ? (
// Key on the file name so React unmounts the previous DfPreview
// (and its iframe / image element) when the user clicks a
// different file. Without this, React diffing reuses the same
// iframe DOM node and the browser keeps showing the first
// file's contents — only the `src` prop changes but the iframe
// never actually navigates.
<DfPreview
key={previewFile.name}
projectId={projectId}
file={previewFile}
onOpen={() => onOpenFile(previewFile.name)}
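
The remount behavior that the comment in this hunk describes can be illustrated without React. The sketch below is a toy model of keyed reconciliation, not React's actual algorithm; `FakeNode`, `reconcile`, and the file names are hypothetical and exist only for illustration. It assumes a single rule: a node is reused when its key matches the previous render and is recreated otherwise.

```typescript
// Toy model of keyed reconciliation (illustrative only, not React internals).
interface FakeNode {
  id: number;          // identity of the underlying "DOM node"
  key: string | null;  // key the node was mounted with
  src: string;         // last src prop applied
}

let nextId = 1;

// Reuse the previous node when the key matches; otherwise mount a new one.
function reconcile(prev: FakeNode | null, key: string | null, src: string): FakeNode {
  if (prev !== null && prev.key === key) {
    prev.src = src; // same node, only the prop is patched
    return prev;
  }
  return { id: nextId++, key, src };
}

// Without a key, switching files patches the same node in place.
const unkeyedA = reconcile(null, null, 'a.html');
const unkeyedB = reconcile(unkeyedA, null, 'b.html');
const unkeyedReused = unkeyedA.id === unkeyedB.id;

// Keyed on the file name, switching files mounts a fresh node.
const keyedA = reconcile(null, 'a.html', 'a.html');
const keyedB = reconcile(keyedA, 'b.html', 'b.html');
const keyedReused = keyedA.id === keyedB.id;

console.log({ unkeyedReused, keyedReused });
```

In this model the unkeyed path keeps the same node across file switches, which matches the stale-iframe symptom the comment reports, while the keyed path always produces a new node.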

@@ -486,6 +486,15 @@ export function ProjectView({
const [liveArtifacts, setLiveArtifacts] = useState<LiveArtifactSummary[]>([]);
const [liveArtifactEvents, setLiveArtifactEvents] = useState<LiveArtifactEventItem[]>([]);
const [workspaceFocused, setWorkspaceFocused] = useState(false);
// Per-session override for the BYOK SenseAudio chat's generate_image
// tool. Seeded once from Settings (config.byokImageModel) so the
// composer dropdown opens on the user's chosen default; subsequent
// selections live only in this component's state — page refresh /
// project switch resets to the Settings default. Persistent defaults
// live in Settings → BYOK → SenseAudio → Image generation model.
const [byokImageModelOverride, setByokImageModelOverride] = useState<string>(
config.byokImageModel ?? '',
);
// `closed` → no surface; `review` → read-only saved-state panel with a
// preview + reopen-to-edit action (#1822); `edit` → the textarea editor.
const [instructionsMode, setInstructionsMode] = useState<'closed' | 'review' | 'edit'>('closed');
@@ -2202,6 +2211,13 @@ export function ProjectView({
});
},
onError: handlers.onError,
}, {
projectId: project.id,
// SenseAudio BYOK chat reads this to pre-fill the tool param's
// default model. Prefer the live composer override; fall back
// to the Settings default when the composer dropdown is on
// "use default". Other protocols ignore unknown body fields.
byokImageModel: byokImageModelOverride || config.byokImageModel,
});
}
},
@@ -3375,6 +3391,9 @@ export function ProjectView({
onTogglePet={onTogglePet}
onOpenPetSettings={onOpenPetSettings}
researchAvailable={config.mode === 'daemon'}
byokApiProtocol={config.apiProtocol}
byokImageModel={byokImageModelOverride}
onChangeByokImageModel={setByokImageModelOverride}
projectMetadata={project.metadata}
onProjectMetadataChange={(metadata) => {
onProjectChange({ ...project, metadata });
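
The precedence between the composer override and the Settings default in ProjectView reduces to one expression, `byokImageModelOverride || config.byokImageModel`. A minimal sketch of that rule follows; `resolveByokImageModel` is a hypothetical helper name, not a function in this PR.

```typescript
// Hypothetical helper mirroring `byokImageModelOverride || config.byokImageModel`.
function resolveByokImageModel(
  composerOverride: string,
  settingsDefault: string | undefined,
): string | undefined {
  // '' is the dropdown's "use default" option, so it has to fall through.
  return composerOverride || settingsDefault;
}

const fromSettings = resolveByokImageModel('', 'senseaudio-image-1.0-260319');
const fromComposer = resolveByokImageModel(
  'doubao-seedream-5-0-260128',
  'senseaudio-image-1.0-260319',
);
const unset = resolveByokImageModel('', undefined);

console.log({ fromSettings, fromComposer, unset });
```

Using `||` rather than `??` matters here: the composer state is always a string, and `??` would pass the empty "use default" value through instead of falling back.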

@@ -68,7 +68,7 @@ import type {
import { testAgent, testApiProvider } from '../providers/connection-test';
import { fetchProviderModels } from '../providers/provider-models';
import { fetchConnectors, fetchDesignTemplates } from '../providers/registry';
-import { MEDIA_PROVIDERS } from '../media/models';
+import { IMAGE_MODELS, MEDIA_PROVIDERS } from '../media/models';
import { XaiOAuthControl } from './XaiOAuthControl';
import type { MediaProvider } from '../media/models';
import { Toast } from './Toast';
@@ -444,6 +444,7 @@ function currentApiProtocolConfig(config: AppConfig): ApiProtocolConfig {
model: config.model,
apiVersion: config.apiVersion ?? '',
apiProviderBaseUrl: config.apiProviderBaseUrl ?? null,
byokImageModel: config.byokImageModel ?? '',
};
}
@@ -460,6 +461,11 @@ function applyApiProtocolConfig(
model: apiConfig.model,
apiProviderBaseUrl: apiConfig.apiProviderBaseUrl ?? null,
apiVersion: protocol === 'azure' ? (apiConfig.apiVersion ?? '') : '',
// byokImageModel is SenseAudio-only — flipping to another BYOK tab
// shouldn't carry a SenseAudio image-model choice into, say, the
// OpenAI form. Mirrors the apiVersion guarding above.
byokImageModel:
protocol === 'senseaudio' ? (apiConfig.byokImageModel ?? '') : '',
};
}
@@ -2683,6 +2689,34 @@ export function SettingsDialog({
/>
</label>
) : null}
{apiProtocol === 'senseaudio' ? (
<label className="field">
<span className="field-label">{t('settings.byokImageModel')}</span>
<select
value={cfg.byokImageModel ?? ''}
onChange={(e) =>
updateApiConfig({ byokImageModel: e.target.value })
}
>
{/* Default-empty option resolves to the registry default
on the daemon side (senseaudio-image-2.0-260319 today).
Listing it explicitly lets the picker show what the
unconfigured state actually means. */}
<option value="">
{IMAGE_MODELS.find((m) => m.provider === 'senseaudio')?.label
?? 'senseaudio-image-2.0'}
{' (default)'}
</option>
{IMAGE_MODELS.filter((m) => m.provider === 'senseaudio').map(
(m) => (
<option key={m.id} value={m.id}>
{m.label}
</option>
),
)}
</select>
</label>
) : null}
<p className="hint">{t('settings.apiHint')}</p>
</section>
)}
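
The guard in `applyApiProtocolConfig` scopes each protocol-specific field to the tab that owns it. The sketch below isolates that rule; `ApiProtocol`, `ProtocolScopedFields`, and `scopeProtocolFields` are hypothetical stand-ins (the real `AppConfig['apiProtocol']` union may differ), and the `apiVersion` value is an arbitrary placeholder.

```typescript
// Assumption: a trimmed stand-in for AppConfig['apiProtocol']; the real union may differ.
type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';

interface ProtocolScopedFields {
  apiVersion: string;
  byokImageModel: string;
}

// Keep each protocol-specific field only for the protocol that owns it.
function scopeProtocolFields(
  protocol: ApiProtocol,
  saved: Partial<ProtocolScopedFields>,
): ProtocolScopedFields {
  return {
    apiVersion: protocol === 'azure' ? (saved.apiVersion ?? '') : '',
    byokImageModel: protocol === 'senseaudio' ? (saved.byokImageModel ?? '') : '',
  };
}

// Placeholder values for illustration only.
const saved = { apiVersion: '2024-06-01', byokImageModel: 'senseaudio-image-1.0-260319' };
const onSenseAudio = scopeProtocolFields('senseaudio', saved);
const onOpenAI = scopeProtocolFields('openai', saved);

console.log(onSenseAudio, onOpenAI);
```

Switching to the OpenAI tab clears both fields, while the SenseAudio tab keeps only the image model, which is the behavior the hunk's comment asks for.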

@@ -202,6 +202,7 @@ export const ar: Dict = {
'settings.azureDeploymentModelHint':
'في Azure OpenAI، يُستخدم هذا الحقل كاسم النشر في /openai/deployments/<model>. أدخل اسم النشر الذي أنشأته في Azure.',
'settings.apiVersion': 'إصدار API',
'settings.byokImageModel': 'نموذج إنشاء الصور',
'settings.maxTokens': 'أقصى عدد من الرموز (اختياري)',
'settings.maxTokensHint':
'الحد الأقصى لطول الاستجابة. لكل نموذج قيمة افتراضية؛ اتركها فارغة لاستخدامها، أو أدخل رقماً للتجاوز.',

@@ -202,6 +202,7 @@ export const de: Dict = {
'settings.azureDeploymentModelHint':
'Fuer Azure OpenAI wird dieses Feld als Deployment-Name in /openai/deployments/<model> verwendet. Geben Sie den in Azure angelegten Deployment-Namen ein.',
'settings.apiVersion': 'API-Version',
'settings.byokImageModel': 'Bilderzeugungsmodell',
'settings.maxTokens': 'Max. Tokens (optional)',
'settings.maxTokensHint':
'Obergrenze für die Antwortlänge. Jedes Modell hat einen abgestimmten Standardwert (im Platzhalter sichtbar); leer lassen, um ihn zu verwenden, oder eine Zahl eingeben, um ihn zu überschreiben.',

@@ -227,6 +227,7 @@ export const en: Dict = {
'settings.azureModelFetchHint':
'For Azure OpenAI, enter the deployment name you created in Azure. Automatic deployment discovery is not available from this BYOK endpoint.',
'settings.apiVersion': 'API version',
'settings.byokImageModel': 'Image generation model',
'settings.maxTokens': 'Max tokens (optional)',
'settings.maxTokensHint':
'Cap on the response length. Each model has a tuned default (shown as a placeholder); leave blank to use it, or enter a number to override.',

@@ -202,6 +202,7 @@ export const esES: Dict = {
'settings.azureDeploymentModelHint':
'Para Azure OpenAI, este campo se usa como nombre del despliegue en /openai/deployments/<model>. Introduce el nombre del despliegue que creaste en Azure.',
'settings.apiVersion': 'Versión de API',
'settings.byokImageModel': 'Modelo de generación de imágenes',
'settings.maxTokens': 'Tokens máx. (opcional)',
'settings.maxTokensHint':
'Tope para la longitud de la respuesta. Cada modelo tiene un valor por defecto ajustado (visible en el placeholder); déjalo vacío para usarlo o introduce un número para anularlo.',

@@ -202,6 +202,7 @@ export const fa: Dict = {
'settings.azureDeploymentModelHint':
'در Azure OpenAI، این فیلد به عنوان نام استقرار در /openai/deployments/<model> استفاده می‌شود. نام استقراری را که در Azure ساخته‌اید وارد کنید.',
'settings.apiVersion': 'نسخه API',
'settings.byokImageModel': 'مدل تولید تصویر',
'settings.maxTokens': 'حداکثر توکن (اختیاری)',
'settings.maxTokensHint':
'سقف طول پاسخ. هر مدل مقدار پیش‌فرض تنظیم‌شدهٔ خود را دارد (در placeholder نمایش داده می‌شود)؛ برای استفاده از آن خالی بگذارید، یا برای جایگزینی، عددی وارد کنید.',

@@ -202,6 +202,7 @@ export const fr: Dict = {
'settings.azureDeploymentModelHint':
'Pour Azure OpenAI, ce champ est utilisé comme nom du déploiement dans /openai/deployments/<model>. Saisissez le nom du déploiement créé dans Azure.',
'settings.apiVersion': 'Version API',
'settings.byokImageModel': "Modèle de génération d'images",
'settings.maxTokens': 'Tokens max (optionnel)',
'settings.maxTokensHint':
'Limite de la longueur de réponse. Chaque modèle a une valeur par défaut (affichée à titre indicatif) ; laissez vide pour l\'utiliser, ou entrez un nombre pour la remplacer.',

@@ -202,6 +202,7 @@ export const hu: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI esetén ez a mező a /openai/deployments/<model> deployment neveként szerepel. Add meg az Azure-ban létrehozott deployment nevét.',
'settings.apiVersion': 'API-verzió',
'settings.byokImageModel': 'Képgenerálási modell',
'settings.maxTokens': 'Max tokenek (opcionális)',
'settings.maxTokensHint':
'A válasz hosszának felső határa. Minden modellnek van hangolt alapértelmezése (placeholderként látható); hagyd üresen az alkalmazásához, vagy adj meg számot a felülíráshoz.',

@@ -202,6 +202,7 @@ export const id: Dict = {
'settings.azureDeploymentModelHint':
'Untuk Azure OpenAI, field ini digunakan sebagai nama deployment di /openai/deployments/<model>. Masukkan nama deployment yang kamu buat di Azure.',
'settings.apiVersion': 'Versi API',
'settings.byokImageModel': 'Model pembuatan gambar',
'settings.maxTokens': 'Token maks (opsional)',
'settings.maxTokensHint':
'Batas panjang respons. Setiap model punya default sendiri; kosongkan untuk memakainya, atau isi angka untuk menimpa.',

@@ -199,6 +199,7 @@ export const it: Dict = {
'settings.azureDeploymentModelHint':
'Per Azure OpenAI, questo campo viene utilizzato come nome del deployment in /openai/deployments/<model>. Inserisci il nome del deployment creato in Azure.',
'settings.apiVersion': 'Versione API',
'settings.byokImageModel': 'Modello di generazione immagini',
'settings.maxTokens': 'Token massimi (opzionale)',
'settings.maxTokensHint':
'Limite della lunghezza della risposta. Ogni modello ha un valore predefinito (mostrato nel placeholder); lascia vuoto per usarlo, o inserisci un numero per sostituirlo.',

@@ -202,6 +202,7 @@ export const ja: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI では、このフィールドが /openai/deployments/<model> のデプロイ名として使われます。Azure で作成したデプロイ名を入力してください。',
'settings.apiVersion': 'API バージョン',
'settings.byokImageModel': '画像生成モデル',
'settings.maxTokens': '最大トークン(任意)',
'settings.maxTokensHint':
'応答長の上限。各モデルにチューニング済みのデフォルト値があります(プレースホルダーに表示)。空のままにすればそれを使用し、数値を入力すれば上書きされます。',

@@ -205,6 +205,7 @@ export const ko: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI에서는 이 필드가 /openai/deployments/<model>의 배포 이름으로 사용됩니다. Azure에서 만든 배포 이름을 입력하세요.',
'settings.apiVersion': 'API 버전',
'settings.byokImageModel': '이미지 생성 모델',
'settings.apiHint': '요청은 로컬 daemon 프록시를 통해 설정한 Base URL로 전송됩니다. 키는 이 브라우저에만 저장되며 제공자 요청과 함께 전송됩니다.',
'settings.skipForNow': '지금은 건너뛰기',
'settings.getStarted': '시작하기',

@@ -202,6 +202,7 @@ export const pl: Dict = {
'settings.azureDeploymentModelHint':
'Dla Azure OpenAI to pole jest używane jako nazwa wdrożenia w /openai/deployments/<model>. Wpisz nazwę wdrożenia utworzonego w Azure.',
'settings.apiVersion': 'Wersja API',
'settings.byokImageModel': 'Model generowania obrazów',
'settings.maxTokens': 'Maks. liczba tokenów (opcjonalnie)',
'settings.maxTokensHint':
'Limit długości odpowiedzi. Każdy model ma dostrojony domyślny limit (widoczny jako placeholder); pozostaw puste, aby go użyć, lub wpisz liczbę.',

@@ -202,6 +202,7 @@ export const ptBR: Dict = {
'settings.azureDeploymentModelHint':
'No Azure OpenAI, este campo e usado como nome do deployment em /openai/deployments/<model>. Informe o nome do deployment criado no Azure.',
'settings.apiVersion': 'Versão da API',
'settings.byokImageModel': 'Modelo de geração de imagens',
'settings.maxTokens': 'Tokens máx. (opcional)',
'settings.maxTokensHint':
'Limite para o comprimento da resposta. Cada modelo tem um valor padrão ajustado (visível no placeholder); deixe em branco para usá-lo ou insira um número para substituí-lo.',

@@ -202,6 +202,7 @@ export const ru: Dict = {
'settings.azureDeploymentModelHint':
'Для Azure OpenAI это поле используется как имя развертывания в /openai/deployments/<model>. Укажите имя развертывания, созданного в Azure.',
'settings.apiVersion': 'Версия API',
'settings.byokImageModel': 'Модель генерации изображений',
'settings.maxTokens': 'Макс. токенов (опционально)',
'settings.maxTokensHint':
'Ограничение длины ответа. У каждой модели свой настроенный дефолт (виден в плейсхолдере); оставьте поле пустым, чтобы использовать его, или введите число, чтобы переопределить.',

@@ -198,6 +198,7 @@ export const th: Dict = {
'settings.azureDeploymentModel': 'ชื่อ Deployment',
'settings.azureDeploymentModelHint': 'สำหรับ Azure OpenAI ฟิลด์นี้ใช้เป็นชื่อ Deployment ใน /openai/deployments/<model> ป้อนชื่อ Deployment ที่คุณสร้างใน Azure',
'settings.apiVersion': 'เวอร์ชัน API',
'settings.byokImageModel': 'โมเดลสร้างภาพ',
'settings.maxTokens': 'Max tokens (เลือกได้)',
'settings.maxTokensHint': 'ขีดจำกัดความยาวในการตอบกลับ',
'settings.apiHint': 'คำสั่งจะถูกส่งผ่าน local daemon proxy ไปยัง base URL ที่คุณตั้งไว้ API Key จะถูกเก็บในเบราว์เซอร์นี้เท่านั้น',

@@ -202,6 +202,7 @@ export const tr: Dict = {
'settings.azureDeploymentModelHint':
'Azure OpenAI icin bu alan /openai/deployments/<model> icindeki dagitim adi olarak kullanilir. Azureda olusturdugunuz dagitim adini girin.',
'settings.apiVersion': 'API sürümü',
'settings.byokImageModel': 'Görüntü oluşturma modeli',
'settings.maxTokens': 'Maks. token (isteğe bağlı)',
'settings.maxTokensHint':
'Yanıt uzunluğu sınırı. Her modelin ayarlanmış bir varsayılanı vardır (yer tutucuda görünür); kullanmak için boş bırakın, üzerine yazmak için bir sayı girin.',

@@ -203,6 +203,7 @@ export const uk: Dict = {
'settings.azureDeploymentModelHint':
'Для Azure OpenAI це поле використовується як назва розгортання в /openai/deployments/<model>. Введіть назву розгортання, створену в Azure.',
'settings.apiVersion': 'Версія API',
'settings.byokImageModel': 'Модель генерації зображень',
'settings.maxTokens': 'Макс. токенів (необов\'язково)',
'settings.maxTokensHint':
'Обмеження на довжину відповіді. Кожна модель має налаштовану за замовчуванням (показано в заповнювачі); залиште поле порожнім, щоб використовувати її, або введіть число, щоб переопрацювати.',

@@ -227,6 +227,7 @@ export const zhCN: Dict = {
'settings.azureModelFetchHint':
'对于 Azure OpenAI请填写你在 Azure 中创建的部署名称。当前 BYOK 端点无法自动发现 deployment。',
'settings.apiVersion': 'API 版本',
'settings.byokImageModel': '图片生成模型',
'settings.maxTokens': '最大 tokens可选',
'settings.maxTokensHint':
'响应长度上限。每个 model 有调优过的默认值(在 placeholder 里显示),留空即使用,输入数字则覆盖。',

@@ -201,6 +201,7 @@ export const zhTW: Dict = {
'settings.azureDeploymentModelHint':
'對於 Azure OpenAI此欄位會作為 /openai/deployments/<model> 中的部署名稱使用。請填入你在 Azure 中建立的部署名稱。',
'settings.apiVersion': 'API 版本',
'settings.byokImageModel': '圖片生成模型',
'settings.maxTokens': '最大 tokens可選',
'settings.maxTokensHint':
'回應長度上限。每個 model 有調過的預設值(在 placeholder 顯示),留空即使用,輸入數字則覆蓋。',

@@ -252,6 +252,7 @@ export interface Dict {
'settings.azureDeploymentModelHint': string;
'settings.azureModelFetchHint': string;
'settings.apiVersion': string;
'settings.byokImageModel': string;
'settings.apiHint': string;
'settings.skipForNow': string;
'settings.getStarted': string;

@@ -234,7 +234,7 @@ export const MEDIA_PROVIDERS: MediaProvider[] = [
{
id: 'senseaudio',
label: 'SenseAudio',
-hint: 'TTS · 70+ system voices · clone',
+hint: '',
integrated: true,
defaultBaseUrl: 'https://api.senseaudio.cn',
docsUrl: 'https://docs.senseaudio.cn',
@@ -344,6 +344,29 @@ export const IMAGE_MODELS: MediaModel[] = [
caps: ['i2i'],
},
// SenseAudio — synchronous /v1/image/sync, Bearer auth, reference URL or data URI.
{
id: 'senseaudio-image-2.0-260319',
label: 'senseaudio-image-2.0',
hint: 'SenseAudio · multi-aspect, latest',
provider: 'senseaudio',
caps: ['t2i', 'i2i'],
},
{
id: 'senseaudio-image-1.0-260319',
label: 'senseaudio-image-1.0',
hint: 'SenseAudio · standard',
provider: 'senseaudio',
caps: ['t2i', 'i2i'],
},
{
id: 'doubao-seedream-5-0-260128',
label: 'seedream-5.0',
hint: 'SenseAudio · ByteDance Seedream 5.0 hi-res',
provider: 'senseaudio',
caps: ['t2i', 'i2i'],
},
// xAI Grok Imagine — text-to-image (1k/2k, 11+ aspect ratios).
{
id: 'grok-imagine-image',
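
Both pickers in this PR derive their options from the registry the same way: an empty-valued "default" entry labelled after the first SenseAudio model, followed by every SenseAudio model. A minimal sketch of that derivation follows; `MediaModelLite`, `SAMPLE_MODELS`, and `byokImageOptions` are hypothetical names, and the non-SenseAudio entry is a made-up placeholder rather than a real registry row.

```typescript
// Assumption: a reduced shape of the registry's MediaModel type.
interface MediaModelLite {
  id: string;
  label: string;
  provider: string;
}

// The three SenseAudio rows come from the diff; the last row is a made-up placeholder.
const SAMPLE_MODELS: MediaModelLite[] = [
  { id: 'senseaudio-image-2.0-260319', label: 'senseaudio-image-2.0', provider: 'senseaudio' },
  { id: 'senseaudio-image-1.0-260319', label: 'senseaudio-image-1.0', provider: 'senseaudio' },
  { id: 'doubao-seedream-5-0-260128', label: 'seedream-5.0', provider: 'senseaudio' },
  { id: 'example-other-model', label: 'example-other', provider: 'example-provider' },
];

interface SelectOption {
  value: string;
  label: string;
}

// Build the <option> list: '' means "let the daemon pick the registry default".
function byokImageOptions(models: MediaModelLite[]): SelectOption[] {
  const own = models.filter((m) => m.provider === 'senseaudio');
  const defaultLabel = (own[0]?.label ?? 'senseaudio-image-2.0') + ' (default)';
  return [
    { value: '', label: defaultLabel },
    ...own.map((m) => ({ value: m.id, label: m.label })),
  ];
}

const options = byokImageOptions(SAMPLE_MODELS);
console.log(options);
```

Because the default label is taken from the first SenseAudio row, the order of entries in `IMAGE_MODELS` decides what the "(default)" option displays.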

@@ -11,10 +11,12 @@ import Anthropic from '@anthropic-ai/sdk';
import { effectiveMaxTokens } from '../state/maxTokens';
import type { AppConfig, ChatMessage } from '../types';
import { streamMessageAnthropicProxy } from './anthropic-compatible';
import type { ProxyContext } from './api-proxy';
import { streamMessageAzure } from './azure-compatible';
import { streamMessageGoogle } from './google-compatible';
import { streamMessageOllama } from './ollama-compatible';
import { isOpenAICompatible, streamMessageOpenAI } from './openai-compatible';
import { streamMessageSenseAudio } from './senseaudio-compatible';
// Re-export for convenience
export { isOpenAICompatible } from './openai-compatible';
@@ -39,6 +41,12 @@ export async function streamMessage(
history: ChatMessage[],
signal: AbortSignal,
handlers: StreamHandlers,
// Only the senseaudio branch reads `context.projectId` today (so the
// daemon-side `generate_image` tool can write into the active
// project's folder). Other branches accept and ignore — keeping the
// signature uniform means the single call site in ProjectView passes
// the same shape regardless of protocol.
context?: ProxyContext,
): Promise<void> {
// Prefer the explicit Settings protocol; keep the legacy heuristic as a
// fallback for configs saved before apiProtocol existed.
@@ -51,6 +59,9 @@ export async function streamMessage(
if (cfg.apiProtocol === 'google') {
return streamMessageGoogle(cfg, system, history, signal, handlers);
}
if (cfg.apiProtocol === 'senseaudio') {
return streamMessageSenseAudio(cfg, system, history, signal, handlers, context);
}
if (cfg.apiProtocol === 'openai' || (!cfg.apiProtocol && isOpenAICompatible(cfg.model, cfg.baseUrl))) {
return streamMessageOpenAI(cfg, system, history, signal, handlers);
}

@@ -3,6 +3,22 @@ import type { AppConfig, ChatMessage } from '../types';
import type { StreamHandlers } from './anthropic';
import { parseSseFrame } from './sse';
/**
* Optional per-request context that some protocols thread into the
* proxy body. Today only the senseaudio proxy reads these fields:
* - `projectId` lets the `generate_image` tool write into the active
* project's folder instead of a daemon-global cache.
* - `byokImageModel` is the user's BYOK Settings default for the
* image tool. The LLM can still override per-call via the tool's
* `model` arg; this is just the fallback when it omits one.
* Other protocols ignore unknown body fields, so callers are free to
* pass this for every protocol.
*/
export interface ProxyContext {
projectId?: string;
byokImageModel?: string;
}
export async function streamProxyEndpoint( export async function streamProxyEndpoint(
endpoint: string, endpoint: string,
cfg: AppConfig, cfg: AppConfig,
@ -10,6 +26,7 @@ export async function streamProxyEndpoint(
history: ChatMessage[], history: ChatMessage[],
signal: AbortSignal, signal: AbortSignal,
handlers: StreamHandlers, handlers: StreamHandlers,
context?: ProxyContext,
): Promise<void> { ): Promise<void> {
if (!cfg.apiKey) { if (!cfg.apiKey) {
handlers.onError(new Error('Missing API key — open Settings and paste one in.')); handlers.onError(new Error('Missing API key — open Settings and paste one in.'));
@ -30,6 +47,10 @@ export async function streamProxyEndpoint(
messages: history.map((m) => ({ role: m.role, content: m.content })), messages: history.map((m) => ({ role: m.role, content: m.content })),
maxTokens: effectiveMaxTokens(cfg), maxTokens: effectiveMaxTokens(cfg),
apiVersion: cfg.apiVersion, apiVersion: cfg.apiVersion,
...(context?.projectId ? { projectId: context.projectId } : {}),
...(context?.byokImageModel
? { byokImageModel: context.byokImageModel }
: {}),
}), }),
signal, signal,
}); });
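The conditional spreads in this hunk keep the optional context fields out of the request body unless they carry a value. A minimal sketch of that pattern, assuming trimmed-down shapes (`ProxyBody` and `buildProxyBody` are hypothetical names for illustration, not part of the PR):

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ProxyContext {
  projectId?: string;
  byokImageModel?: string;
}

interface ProxyBody {
  model: string;
  messages: { role: string; content: string }[];
  projectId?: string;
  byokImageModel?: string;
}

// Optional context fields are spread in only when they are non-empty
// strings, so they never serialize as `undefined` or "".
function buildProxyBody(
  model: string,
  messages: { role: string; content: string }[],
  context?: ProxyContext,
): ProxyBody {
  return {
    model,
    messages,
    ...(context?.projectId ? { projectId: context.projectId } : {}),
    ...(context?.byokImageModel
      ? { byokImageModel: context.byokImageModel }
      : {}),
  };
}

const bare = buildProxyBody('senseaudio-s2', []);
const withCtx = buildProxyBody('senseaudio-s2', [], {
  projectId: 'p1',
  byokImageModel: '',
});
```

Because the keys are absent rather than set to an empty value, proxies for other protocols see the same body they saw before the change.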

@ -0,0 +1,33 @@
/**
* SenseAudio chat completions provider. Wire-compatible with OpenAI
* (POST /v1/chat/completions, Bearer auth, SSE delta frames + [DONE]),
* so the only thing that differs from streamMessageOpenAI is the
* daemon proxy endpoint. Keeping a dedicated client makes the picker
* tab, daemon log line, and upstream call chain readable end-to-end
* and leaves room for SenseAudio-specific divergence in the future.
*
* Routes through the daemon proxy to avoid browser CORS issues.
* BYOK: the key stays on the user's machine.
*/
import type { AppConfig, ChatMessage } from '../types';
import type { StreamHandlers } from './anthropic';
import { streamProxyEndpoint, type ProxyContext } from './api-proxy';
export async function streamMessageSenseAudio(
cfg: AppConfig,
system: string,
history: ChatMessage[],
signal: AbortSignal,
handlers: StreamHandlers,
context?: ProxyContext,
): Promise<void> {
return streamProxyEndpoint(
'/api/proxy/senseaudio/stream',
cfg,
system,
history,
signal,
handlers,
context,
);
}

@ -262,6 +262,24 @@ function renderBlock(block: Block, key: number): ReactNode {
return null; return null;
} }
// Allowed schemes / forms for image `src` attributes. The BYOK chat
// tool loop emits relative URLs like `/api/byok-image/<id>.png` which
// the web's Next.js rewrites proxy to the daemon — that's the common
// case. data: + blob: cover inline / generated images. http(s):// is
// allowed so a model can reference public images. Anything else
// (javascript:, file:, vbscript:, …) is rejected so a hallucinated
// or adversarial URL cannot exfiltrate or execute.
function isSafeMarkdownImageSrc(src: string): boolean {
if (!src) return false;
if (src.startsWith('/') && !src.startsWith('//')) return true;
return (
src.startsWith('http://')
|| src.startsWith('https://')
|| src.startsWith('data:image/')
|| src.startsWith('blob:')
);
}
// Inline pass: tokenize into runs of `code`, **bold**, *italic*, links, // Inline pass: tokenize into runs of `code`, **bold**, *italic*, links,
// and plain text. We walk the string with a regex that matches whichever // and plain text. We walk the string with a regex that matches whichever
// delimiter shows up next; everything between delimiters becomes a text // delimiter shows up next; everything between delimiters becomes a text
@ -270,14 +288,19 @@ function renderInline(text: string): ReactNode {
const out: ReactNode[] = []; const out: ReactNode[] = [];
// Order matters: // Order matters:
// 1. inline code first so its contents are not re-tokenized as bold/italic. // 1. inline code first so its contents are not re-tokenized as bold/italic.
// 2. explicit `[text](url)` markdown links before bare URL autolink so the // 2. image syntax `![alt](url)` BEFORE the link branch. Both share
// `[…](…)` and the image is only distinguished by the leading `!`;
// letting the link branch win would render `[alt](url)` as a text
// link with `!` stranded as a sibling text node and the user would
// see the link copy but never the image.
// 3. explicit `[text](url)` markdown links before bare URL autolink so the
// autolink does not greedily swallow the closing paren. // autolink does not greedily swallow the closing paren.
// 3. bare http(s) URL autolink BEFORE italic markers — chat output often // 4. bare http(s) URL autolink BEFORE italic markers — chat output often
// contains OAuth-style links with `_type=` / `_id=` query params, and // contains OAuth-style links with `_type=` / `_id=` query params, and
// leaving italic to win turns the URL into an italic-fragmented mess. // leaving italic to win turns the URL into an italic-fragmented mess.
// 4. bold (**a** / __a__) before italic (*a* / _a_). // 5. bold (**a** / __a__) before italic (*a* / _a_).
const re = const re =
/(`[^`]+`)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g; /(`[^`]+`)|!\[([^\]]*)\]\(([^)\s]+)\)|\[([^\]]+)\]\(([^)\s]+)\)|(https?:\/\/[^\s)<>]+)|(\*\*[^*]+\*\*)|(__[^_]+__)|(\*[^*\n]+\*)|(_[^_\n]+_)/g;
let lastIndex = 0; let lastIndex = 0;
let m: RegExpExecArray | null; let m: RegExpExecArray | null;
let key = 0; let key = 0;
@ -291,40 +314,61 @@ function renderInline(text: string): ReactNode {
{m[1].slice(1, -1)} {m[1].slice(1, -1)}
</code>, </code>,
); );
} else if (m[2] && m[3]) { } else if (m[3] !== undefined) {
// Image: m[2] = alt (may be empty), m[3] = src
const src = m[3];
const alt = m[2] || '';
if (isSafeMarkdownImageSrc(src)) {
out.push(
<img
key={key++}
className="md-image"
src={src}
alt={alt}
loading="lazy"
referrerPolicy="no-referrer"
style={{ maxWidth: '100%', height: 'auto', borderRadius: 6 }}
/>,
);
} else {
// Unsafe scheme — drop the image tag but keep the alt text so
// the user sees what the model meant to show.
pushText(out, alt, key++);
}
} else if (m[4] && m[5]) {
out.push( out.push(
<a <a
key={key++} key={key++}
className="md-link" className="md-link"
href={m[3]} href={m[5]}
target="_blank"
rel="noreferrer noopener"
>
{m[2]}
</a>,
);
} else if (m[4]) {
// Bare URL — autolink with the URL as both href and visible text,
// matching the Markdown `<https://…>` autolink convention.
out.push(
<a
key={key++}
className="md-link md-link-bare"
href={m[4]}
target="_blank" target="_blank"
rel="noreferrer noopener" rel="noreferrer noopener"
> >
{m[4]} {m[4]}
</a>, </a>,
); );
} else if (m[5]) {
out.push(<strong key={key++}>{m[5].slice(2, -2)}</strong>);
} else if (m[6]) { } else if (m[6]) {
out.push(<strong key={key++}>{m[6].slice(2, -2)}</strong>); // Bare URL — autolink with the URL as both href and visible text,
// matching the Markdown `<https://…>` autolink convention.
out.push(
<a
key={key++}
className="md-link md-link-bare"
href={m[6]}
target="_blank"
rel="noreferrer noopener"
>
{m[6]}
</a>,
);
} else if (m[7]) { } else if (m[7]) {
out.push(<em key={key++}>{m[7].slice(1, -1)}</em>); out.push(<strong key={key++}>{m[7].slice(2, -2)}</strong>);
} else if (m[8]) { } else if (m[8]) {
out.push(<em key={key++}>{m[8].slice(1, -1)}</em>); out.push(<strong key={key++}>{m[8].slice(2, -2)}</strong>);
} else if (m[9]) {
out.push(<em key={key++}>{m[9].slice(1, -1)}</em>);
} else if (m[10]) {
out.push(<em key={key++}>{m[10].slice(1, -1)}</em>);
} }
lastIndex = re.lastIndex; lastIndex = re.lastIndex;
} }
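The ordering comment in this hunk says the image alternative must precede the link alternative. A reduced sketch of just those two branches, assuming a plain token list instead of React nodes (`Token` and `tokenize` are hypothetical names; the other inline branches are omitted):

```typescript
type Token =
  | { kind: 'image'; alt: string; src: string }
  | { kind: 'link'; text: string; href: string }
  | { kind: 'text'; value: string };

function tokenize(input: string): Token[] {
  // Image alternative first (groups 1-2), then link (groups 3-4).
  const re = /!\[([^\]]*)\]\(([^)\s]+)\)|\[([^\]]+)\]\(([^)\s]+)\)/g;
  const out: Token[] = [];
  let lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    // Everything between two matches is plain text.
    if (m.index > lastIndex) {
      out.push({ kind: 'text', value: input.slice(lastIndex, m.index) });
    }
    if (m[2] !== undefined) {
      // The alt text may be empty, so test the src group, not the alt group.
      out.push({ kind: 'image', alt: m[1] ?? '', src: m[2] });
    } else if (m[3] !== undefined && m[4] !== undefined) {
      out.push({ kind: 'link', text: m[3], href: m[4] });
    }
    lastIndex = re.lastIndex;
  }
  if (lastIndex < input.length) {
    out.push({ kind: 'text', value: input.slice(lastIndex) });
  }
  return out;
}

const mixed = tokenize('Click [here](https://example.com) and look ![image](/a.png)');
const emptyAlt = tokenize('![](/a.png)');
```

Adding the two image groups ahead of the link groups is also why every later capture index in the hunk moves up by two.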

@ -65,6 +65,22 @@ export const SUGGESTED_MODELS_BY_PROTOCOL: Record<ApiProtocol, readonly string[]
'gemini-1.5-pro', 'gemini-1.5-pro',
'gemini-1.5-flash', 'gemini-1.5-flash',
], ],
senseaudio: [
// SenseAudio is an OpenAI-compatible gateway that fronts both its own
// models (senseaudio-s2 family) and aggregator routes to deepseek /
// glm / kimi / minimax. Listing the headline house models first keeps
// the picker's default selection on a SenseAudio-native checkpoint;
// the aggregator IDs trail so users who arrived for a specific
// upstream still find it in this tab without retyping it.
'senseaudio-s2',
'senseaudio-s2-flash',
'deepseek-v4-flash',
'deepseek-v4-pro',
'glm-5.1',
'kimi-k2.6',
'MiniMax-M2.7-highspeed',
'MiniMax-M2.7',
],
ollama: [ ollama: [
'cogito-2.1:671b', 'cogito-2.1:671b',
'deepseek-v3.1:671b', 'deepseek-v3.1:671b',
@ -123,6 +139,7 @@ export const FAST_MODEL_BY_PROTOCOL: Record<ApiProtocol, string> = {
// pick produces a deterministic answer; users who care can override // pick produces a deterministic answer; users who care can override
// through the Memory model picker. // through the Memory model picker.
ollama: 'gemma3:4b', ollama: 'gemma3:4b',
senseaudio: 'senseaudio-s2-flash',
}; };
export const API_PROTOCOL_TABS: ReadonlyArray<{ export const API_PROTOCOL_TABS: ReadonlyArray<{
@ -134,6 +151,7 @@ export const API_PROTOCOL_TABS: ReadonlyArray<{
{ id: 'azure', title: 'Azure OpenAI' }, { id: 'azure', title: 'Azure OpenAI' },
{ id: 'google', title: 'Google Gemini' }, { id: 'google', title: 'Google Gemini' },
{ id: 'ollama', title: 'Ollama Cloud' }, { id: 'ollama', title: 'Ollama Cloud' },
{ id: 'senseaudio', title: 'SenseAudio' },
]; ];
export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = { export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
@ -142,6 +160,7 @@ export const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
azure: 'Azure OpenAI', azure: 'Azure OpenAI',
google: 'Google Gemini', google: 'Google Gemini',
ollama: 'Ollama Cloud API', ollama: 'Ollama Cloud API',
senseaudio: 'SenseAudio API',
}; };
export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = { export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = {
@ -150,6 +169,7 @@ export const API_KEY_PLACEHOLDERS: Record<ApiProtocol, string> = {
azure: 'azure key', azure: 'azure key',
google: 'AIza...', google: 'AIza...',
ollama: 'Ollama API key', ollama: 'Ollama API key',
senseaudio: 'SenseAudio API key',
}; };
// Default base URL the daemon assumes when the user leaves the field // Default base URL the daemon assumes when the user leaves the field
@ -161,4 +181,5 @@ export const DEFAULT_BASE_URL_BY_PROTOCOL: Record<ApiProtocol, string> = {
azure: '', azure: '',
google: 'https://generativelanguage.googleapis.com', google: 'https://generativelanguage.googleapis.com',
ollama: 'https://ollama.com', ollama: 'https://ollama.com',
senseaudio: 'https://api.senseaudio.cn',
}; };
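The default base URL table is consulted when the user leaves the base URL field empty. A small sketch of that fallback, assuming only the four defaults visible in this hunk (`Protocol` is a deliberately narrowed type and `resolveBaseUrl` is a hypothetical helper; the trailing-slash trimming is an assumption, not something the PR shows):

```typescript
// Narrowed to the protocols whose defaults appear in the hunk above.
type Protocol = 'azure' | 'google' | 'ollama' | 'senseaudio';

const DEFAULT_BASE_URL: Record<Protocol, string> = {
  azure: '',
  google: 'https://generativelanguage.googleapis.com',
  ollama: 'https://ollama.com',
  senseaudio: 'https://api.senseaudio.cn',
};

// Hypothetical helper: prefer the user's value (trimmed, without trailing
// slashes); fall back to the per-protocol default when it is blank.
function resolveBaseUrl(protocol: Protocol, userValue?: string): string {
  const trimmed = (userValue ?? '').trim().replace(/\/+$/, '');
  return trimmed || DEFAULT_BASE_URL[protocol];
}
```

Typing the table as a `Record` over the protocol union means the compiler reports a missing key as soon as a new member such as `senseaudio` joins the union.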

@ -249,6 +249,22 @@ export const KNOWN_PROVIDERS: KnownProvider[] = [
model: 'mimo-v2.5-pro', model: 'mimo-v2.5-pro',
models: ['mimo-v2.5-pro'], models: ['mimo-v2.5-pro'],
}, },
{
label: 'SenseAudio',
protocol: 'senseaudio',
baseUrl: 'https://api.senseaudio.cn',
model: 'senseaudio-s2',
models: [
'senseaudio-s2',
'senseaudio-s2-flash',
'deepseek-v4-flash',
'deepseek-v4-pro',
'glm-5.1',
'kimi-k2.6',
'MiniMax-M2.7-highspeed',
'MiniMax-M2.7',
],
},
]; ];
function normalizePet(input: Partial<PetConfig> | undefined): PetConfig { function normalizePet(input: Partial<PetConfig> | undefined): PetConfig {
@ -290,6 +306,10 @@ function inferApiProtocol(model: string, baseUrl: string): ApiProtocol {
// protocol so both chat and the connection test hit the native Ollama // protocol so both chat and the connection test hit the native Ollama
// proxy instead of the Anthropic or OpenAI paths. // proxy instead of the Anthropic or OpenAI paths.
if (normalized.includes('ollama.com')) return 'ollama'; if (normalized.includes('ollama.com')) return 'ollama';
// SenseAudio host gets routed to its own proxy so the daemon log line
// and the BYOK tab UI stay consistent with the protocol the user
// picked — even though the on-wire shape is OpenAI-compatible.
if (normalized.includes('senseaudio.cn')) return 'senseaudio';
return isOpenAICompatible(model, baseUrl) ? 'openai' : 'anthropic'; return isOpenAICompatible(model, baseUrl) ? 'openai' : 'anthropic';
} catch { } catch {
// Preserve the rest of the user's settings even if an old saved base URL is // Preserve the rest of the user's settings even if an old saved base URL is

@ -91,7 +91,7 @@ export type {
} from '@open-design/contracts'; } from '@open-design/contracts';
export type ExecMode = 'daemon' | 'api'; export type ExecMode = 'daemon' | 'api';
export type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama'; export type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';
export type LiveArtifactTabId = `live:${string}`; export type LiveArtifactTabId = `live:${string}`;
export type ProjectWorkspaceTabId = string | LiveArtifactTabId; export type ProjectWorkspaceTabId = string | LiveArtifactTabId;
@ -180,6 +180,13 @@ export interface ApiProtocolConfig {
model: string; model: string;
apiVersion?: string; apiVersion?: string;
apiProviderBaseUrl?: string | null; apiProviderBaseUrl?: string | null;
/** SenseAudio BYOK only: default image model the daemon-side
* `generate_image` tool uses when the LLM doesn't pass one. Carries
* one of the SenseAudio image model ids (`senseaudio-image-2.0-260319`,
* `senseaudio-image-1.0-260319`, `doubao-seedream-5-0-260128`). Stored
* per-protocol so flipping between BYOK tabs doesn't reset the
* SenseAudio image-model choice. */
byokImageModel?: string;
} }
// Per-CLI model + reasoning the user picked in the model menu. Each agent // Per-CLI model + reasoning the user picked in the model menu. Each agent
@ -294,6 +301,11 @@ export interface AppConfig {
model: string; model: string;
apiProtocol?: ApiProtocol; apiProtocol?: ApiProtocol;
apiVersion?: string; apiVersion?: string;
/** SenseAudio BYOK only: default image model for the daemon-side
* generate_image tool. Mirrors apiProtocolConfigs.senseaudio.byokImageModel
* so the active protocol's value lives at the top level (consistent
* with how apiKey / baseUrl / model are projected onto AppConfig). */
byokImageModel?: string;
apiProtocolConfigs?: Partial<Record<ApiProtocol, ApiProtocolConfig>>; apiProtocolConfigs?: Partial<Record<ApiProtocol, ApiProtocolConfig>>;
/** Internal config schema/migration version for localStorage upgrades. */ /** Internal config schema/migration version for localStorage upgrades. */
configMigrationVersion?: number; configMigrationVersion?: number;
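The doc comments above describe `byokImageModel` as stored per protocol and mirrored onto the top level for the active protocol. One way that projection could look, as a sketch under stated assumptions (the `apiKey` and `baseUrl` fields on the per-protocol config and the `switchProtocol` helper are hypothetical; the PR's actual projection code is not shown in this hunk):

```typescript
type ApiProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';

// Trimmed-down shapes; field sets are assumptions for illustration.
interface ApiProtocolConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
  byokImageModel?: string;
}

interface AppConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
  apiProtocol?: ApiProtocol;
  byokImageModel?: string;
  apiProtocolConfigs?: Partial<Record<ApiProtocol, ApiProtocolConfig>>;
}

// Hypothetical helper: copy the stored per-protocol values onto the top
// level when the user switches tabs. The stored map itself is untouched,
// so switching away and back restores the SenseAudio image-model choice.
function switchProtocol(cfg: AppConfig, next: ApiProtocol): AppConfig {
  const stored = cfg.apiProtocolConfigs?.[next];
  return {
    ...cfg,
    apiProtocol: next,
    apiKey: stored?.apiKey ?? '',
    baseUrl: stored?.baseUrl ?? '',
    model: stored?.model ?? '',
    byokImageModel: stored?.byokImageModel,
  };
}

const start: AppConfig = {
  apiKey: 'k-sense',
  baseUrl: 'https://api.senseaudio.cn',
  model: 'senseaudio-s2',
  apiProtocol: 'senseaudio',
  byokImageModel: 'senseaudio-image-2.0-260319',
  apiProtocolConfigs: {
    senseaudio: {
      apiKey: 'k-sense',
      baseUrl: 'https://api.senseaudio.cn',
      model: 'senseaudio-s2',
      byokImageModel: 'senseaudio-image-2.0-260319',
    },
    openai: { apiKey: 'k-openai', baseUrl: '', model: 'some-openai-model' },
  },
};

const onOpenAI = switchProtocol(start, 'openai');
const backAgain = switchProtocol(onOpenAI, 'senseaudio');
```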

@ -6,6 +6,7 @@ const API_PROTOCOL_LABELS: Record<ApiProtocol, string> = {
azure: 'Azure OpenAI', azure: 'Azure OpenAI',
google: 'Google Gemini', google: 'Google Gemini',
ollama: 'Ollama Cloud API', ollama: 'Ollama Cloud API',
senseaudio: 'SenseAudio API',
}; };
const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = { const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = {
@ -14,6 +15,7 @@ const API_PROTOCOL_AGENT_IDS: Record<ApiProtocol, string> = {
azure: 'azure-openai-api', azure: 'azure-openai-api',
google: 'google-gemini-api', google: 'google-gemini-api',
ollama: 'ollama-cloud-api', ollama: 'ollama-cloud-api',
senseaudio: 'senseaudio-api',
}; };
export function apiProtocolLabel(protocol: ApiProtocol | undefined): string { export function apiProtocolLabel(protocol: ApiProtocol | undefined): string {

@ -105,4 +105,67 @@ describe('renderMarkdown', () => {
const bodyTd = (out.match(/<tbody>[\s\S]*<\/tbody>/)?.[0] ?? '').match(/<td/g) ?? []; const bodyTd = (out.match(/<tbody>[\s\S]*<\/tbody>/)?.[0] ?? '').match(/<td/g) ?? [];
expect(bodyTd.length).toBe(2); expect(bodyTd.length).toBe(2);
}); });
it('renders ![alt](url) as <img> for relative BYOK image URLs', () => {
const out = html('Here is your cat: ![cute kitten](/api/byok-image/abc-123.png)');
expect(out).toContain('<img');
expect(out).toContain('class="md-image"');
expect(out).toContain('src="/api/byok-image/abc-123.png"');
expect(out).toContain('alt="cute kitten"');
expect(out).toContain('loading="lazy"');
expect(out).toContain('referrerPolicy="no-referrer"');
// Image syntax must NOT be turned into an <a> link — `[alt](url)`
// with a leading `!` is image, not link.
expect(out).not.toContain('<a class="md-link"');
});
it('renders ![](url) with empty alt text', () => {
const out = html('![](/api/byok-image/abc.png)');
expect(out).toContain('<img');
expect(out).toContain('alt=""');
});
it('renders https image URLs', () => {
const out = html('![logo](https://example.com/logo.png)');
expect(out).toContain('<img');
expect(out).toContain('src="https://example.com/logo.png"');
});
it('renders data: image URIs', () => {
const out = html('![inline](data:image/png;base64,iVBORw0KGgo=)');
expect(out).toContain('<img');
expect(out).toContain('src="data:image/png;base64,iVBORw0KGgo="');
});
it('drops image tags with unsafe schemes and keeps alt text as plain text', () => {
const out = html('![hacked](javascript:alert(1))');
expect(out).not.toContain('<img');
expect(out).not.toContain('javascript:');
expect(out).toContain('hacked');
});
it('rejects protocol-relative image URLs (could load cross-origin)', () => {
// `//evil.com/track.png` would inherit the page protocol; not in our
// allowlist. Should fall through to alt-as-text.
const out = html('![track](//evil.com/track.png)');
expect(out).not.toContain('<img');
expect(out).toContain('track');
});
it('keeps regular [text](url) links working alongside image syntax', () => {
const out = html('Click [here](https://example.com) and look ![image](/api/byok-image/a.png)');
expect(out).toContain('<a class="md-link"');
expect(out).toContain('href="https://example.com"');
expect(out).toContain('>here</a>');
expect(out).toContain('<img');
expect(out).toContain('src="/api/byok-image/a.png"');
});
it('preserves bold + italic + code after the image regex addition', () => {
const out = html('**b** and *i* and `c` and ![a](/p.png)');
expect(out).toContain('<strong>b</strong>');
expect(out).toContain('<em>i</em>');
expect(out).toContain('<code class="md-inline-code">c</code>');
expect(out).toContain('<img');
});
}); });
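The tests above exercise the scheme allowlist and the alt-text fallback through rendered HTML. The same decision can be sketched without React, assuming a plain result object (`Rendered` and `renderImageToken` are hypothetical names; the prefix list mirrors the checks in `isSafeMarkdownImageSrc`):

```typescript
// Same prefix checks as the renderer's allowlist, restated for a
// standalone example.
function isSafeImageSrc(src: string): boolean {
  if (!src) return false;
  // Root-relative paths are fine; protocol-relative `//host` is not.
  if (src.startsWith('/') && !src.startsWith('//')) return true;
  return ['http://', 'https://', 'data:image/', 'blob:'].some((prefix) =>
    src.startsWith(prefix),
  );
}

type Rendered =
  | { kind: 'img'; src: string; alt: string }
  | { kind: 'text'; value: string };

// Unsafe sources degrade to the alt text instead of an image element.
function renderImageToken(alt: string, src: string): Rendered {
  return isSafeImageSrc(src)
    ? { kind: 'img', src, alt }
    : { kind: 'text', value: alt };
}
```

Note that this check accepts only root-relative paths: a path without a leading slash, such as `images/a.png`, falls through to the alt text.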

@ -229,7 +229,7 @@ export interface SettingsClickByokProviderOptionProps {
// Tracking doc names azure/google/ollama as azure_openai/google_gemini/ // Tracking doc names azure/google/ollama as azure_openai/google_gemini/
// ollama_cloud — we forward the code value verbatim and let dashboards // ollama_cloud — we forward the code value verbatim and let dashboards
// map; see tracking-doc-issues.md §2.5. // map; see tracking-doc-issues.md §2.5.
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google'; provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
// True when the clicked chip was already the active protocol (no-op // True when the clicked chip was already the active protocol (no-op
// toggle); false when the click switches protocol. // toggle); false when the click switches protocol.
is_selected: boolean; is_selected: boolean;
@ -242,10 +242,10 @@ export interface SettingsClickByokFieldProps {
action: 'focus_byok_field'; action: 'focus_byok_field';
field_id: 'api_key' | 'base_url' | 'model'; field_id: 'api_key' | 'base_url' | 'model';
// Code's `apiProtocol` is wider than the CSV's BYOK provider enum // Code's `apiProtocol` is wider than the CSV's BYOK provider enum
// (anthropic|openai|azure|ollama|google). We forward the code value // (anthropic|openai|azure|ollama|google|senseaudio). We forward the code
// verbatim so dashboards can group by the actual protocol; the CSV enum // value verbatim so dashboards can group by the actual protocol; the CSV
// is a strict subset the product team can revise. // enum is a strict subset the product team can revise.
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google'; provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
has_value: boolean; has_value: boolean;
} }
@ -261,7 +261,7 @@ export interface SettingsCliTestResultProps {
export interface SettingsByokTestResultProps { export interface SettingsByokTestResultProps {
page: 'settings'; page: 'settings';
area: 'execution_model'; area: 'execution_model';
provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google'; provider_id: 'anthropic' | 'openai' | 'azure' | 'ollama' | 'google' | 'senseaudio';
result: 'success' | 'failed' | 'timeout'; result: 'success' | 'failed' | 'timeout';
error_code?: string; error_code?: string;
duration_ms: number; duration_ms: number;

@ -139,7 +139,7 @@ export type ConnectionTestKind =
| 'agent_spawn_failed' | 'agent_spawn_failed'
| 'unknown'; | 'unknown';
export type ConnectionTestProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama'; export type ConnectionTestProtocol = 'anthropic' | 'openai' | 'azure' | 'google' | 'ollama' | 'senseaudio';
export interface ProviderTestRequest { export interface ProviderTestRequest {
protocol: ConnectionTestProtocol; protocol: ConnectionTestProtocol;

@ -80,16 +80,19 @@ export interface MemoryListResponse {
/** Provider/protocol the memory extractor calls. Mirrors the chat /** Provider/protocol the memory extractor calls. Mirrors the chat
* BYOK form's protocols: anthropic + openai-compatible + azure * BYOK form's protocols: anthropic + openai-compatible + azure
* (openai-compatible at a different URL/header) + google gemini + * (openai-compatible at a different URL/header) + google gemini +
* ollama (also openai-compatible, just hosted on Ollama Cloud) so * ollama (also openai-compatible, just hosted on Ollama Cloud) +
* the memory picker can offer the same options as the chat picker * senseaudio (also openai-compatible, SenseAudio's OpenAI-shaped
* above it. The daemon routes ollama through the same callOpenAI * /v1/chat/completions gateway) so the memory picker can offer the
* path since the wire protocol is identical. */ * same options as the chat picker above it. The daemon routes both
* ollama and senseaudio through the same callOpenAI path since the
* wire protocol is identical. */
export type MemoryExtractionProvider = export type MemoryExtractionProvider =
| 'anthropic' | 'anthropic'
| 'openai' | 'openai'
| 'azure' | 'azure'
| 'google' | 'google'
| 'ollama'; | 'ollama'
| 'senseaudio';
/** Masked version of MemoryExtractionConfig returned by GET endpoints /** Masked version of MemoryExtractionConfig returned by GET endpoints
* the api key field is replaced with a 4-char tail so the settings UI * the api key field is replaced with a 4-char tail so the settings UI